
data format draft 0.1 and note

The project page is http://dataio.sourceforge.net/.

See also: dataio

Note: supported primitive data types

The supported primitive data types are:

bool, char, wchar_t (wchar_t does not work?), string, int, long, unsigned, unsigned long, float, double, long double.

To add a new primitive type T, specialize the following two template methods:

_dataiorec<T>::stringtoitem(string &s, T &item); // may change s, if desired

_dataiorec<T>::itemtostring(T const &item, string &s);

If T is a supported primitive type (that is, the two methods above are specialized), vector<T> and vector<vector<T> > are supported automatically.
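As a rough illustration of such a specialization, the sketch below uses a stand-in class template, since the real declaration of _dataiorec is not shown on this page. The names dataiorec_sketch and rgb, the static member functions and the "r/g/b" text form are all assumptions made for the example; the real methods may be declared differently.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Hypothetical stand-in for dataio's _dataiorec<T>; the real class
// template lives in the dataio headers and may be declared differently.
template <class T>
struct dataiorec_sketch {
    static void stringtoitem(std::string &s, T &item);
    static void itemtostring(T const &item, std::string &s);
};

// Hypothetical new primitive type: a colour written as "r/g/b".
struct rgb {
    int r, g, b;
};

// Specialization for rgb: parse "r/g/b" from s into item.
template <>
void dataiorec_sketch<rgb>::stringtoitem(std::string &s, rgb &item) {
    std::istringstream in(s);
    char slash1 = 0, slash2 = 0;
    in >> item.r >> slash1 >> item.g >> slash2 >> item.b;
    // Sketch only: a real specialization would report errors properly.
    assert(in && slash1 == '/' && slash2 == '/');
}

// Specialization for rgb: format item as "r/g/b" into s.
template <>
void dataiorec_sketch<rgb>::itemtostring(rgb const &item, std::string &s) {
    std::ostringstream out;
    out << item.r << '/' << item.g << '/' << item.b;
    s = out.str();
}
```

The two functions are inverses of each other, which is what lets the same type be used for both input and output.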

CSV (comma separated values) style: This is the basic data format used by dataio.

Data items are separated by columnseparator.

Some conventions:

The dataio library skips extra spaces between words, except inside a block delimited by stringdelimiter.
If space is used as the column separator, extra spaces are skipped.
In the simple text mode (charspecifier == '\0'??), a delimited string may contain any text except the linedelimiter, and no special representation is required. The string delimiter itself is represented by two consecutive string delimiters (the charspecifier?? is not implemented yet).
A standard delimited string is assumed to be a single delimited block (this is suitable for checking against the standard if extendedmode is off).
dataio (change this to apply only if extendedmode is on??) recognizes a sequence of several blocks and words. In this case:
- extra space between words is deleted, and space between a word and a delimited block is ignored.
- adjacent delimited blocks (possibly separated by white space) are joined.
In the simple text mode (charspecifier == '\0'??), a string is delimited on output if it contains the column separator, comment markers, a sequence of more than one !isgraph() character, or string delimiters. If it contains the lineseparator, output will cause an error.
CAUTION: the EOF character (normally Ctrl-Z) should not be used except as the lineseparator, because the input methods treat it as a lineseparator.
todo:
- will a delimited string contain \?? as a decimal specification?
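The output-side quoting rule for the simple text mode can be sketched as follows. This is not dataio code: needs_delimiter and delimit are hypothetical helpers, and the markers are assumed to be single characters collected into one string.

```cpp
#include <cassert>
#include <cctype>
#include <string>

// True if s must be written as a delimited string: it contains one of the
// characters in special (assumed here to hold the column separator, the
// comment markers and the string delimiter), or a run of more than one
// !isgraph() character.
bool needs_delimiter(std::string const &s, std::string const &special) {
    bool prev_nongraph = false;
    for (std::string::size_type i = 0; i < s.size(); ++i) {
        if (special.find(s[i]) != std::string::npos) return true;
        bool nongraph = !std::isgraph(static_cast<unsigned char>(s[i]));
        if (nongraph && prev_nongraph) return true;
        prev_nongraph = nongraph;
    }
    return false;
}

// Wrap s in strdelim, writing each embedded delimiter twice.
std::string delimit(std::string const &s, char strdelim) {
    assert(strdelim != '\0');  // '\0' would mean "no delimiter configured"
    std::string out(1, strdelim);
    for (std::string::size_type i = 0; i < s.size(); ++i) {
        if (s[i] == strdelim) out += strdelim;  // double the delimiter
        out += s[i];
    }
    out += strdelim;
    return out;
}
```

A writer would call delimit() only when needs_delimiter() returns true, so plain words stay readable in a spreadsheet.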

Each data block has the form

[variable 1][separator][values 11]

...

[separator][values 1n]

...

[variable n][separator][values n1]

...

[separator][values nn]

single primitive type

Primitive data with a name is written in the data as

[var name] [separator] [value]

and without a name as

[value]

vector of primitive type

A vector is written as values separated by columnseparator on a single line (if the line is long, the linewrap feature can be used, but remember that this makes the file unsupported by spreadsheet applications).

Vector data with a name has the form

[var name] [separator] [values list]

or

[var name]

[separator] [values list]

A vector without a name has the form

[values list]

vector of vector of primitive type

A vector<vector> is formed by several lines, each containing data separated by the column separator, as for a vector.

With a name, it is written as

[var name] [separator] [first line values list]

[separator] [second line values list]

...

[separator] [last line values list]

or

[var name] [separator]

[first line values list]

[separator] [second line values list]

...

[separator] [last line values list]

Without a name, this data has the form

[first line values list]

[second line values list]

...

[last line values list]

[empty line]

Note: Primitive data types and their vectors are added to the loader using the dataio::add() methods. If a name is passed, the field is treated as a named field; if the name is omitted, it is treated as a nameless field.

composed data type

The data for the members of a composed type obeys the rules above, but every member field must be shifted by one column to indicate that it is a member. For example, the record "person" with fields name = Paul, telephon = xxxxx, age = 21 is stored as

person [separator] name [separator] Paul

[separator] telephon [separator] xxxxx

[separator] age [separator] 21

with item name, or

[separator] name [separator] Paul

[separator] telephon [separator] xxxxx

[separator] age [separator] 21

without names. A record field may or may not use a name. For example, the data above with the item name, but with the name field of the record written without its field name, has the form

person [separator] Paul

[separator] telephon [separator] xxxxx

[separator] age [separator] 21

The only requirement is that one empty column is added at the left of each line of composed data (relative to the current position). The other rules are the same as for non-composed data.

For composed data fields with names, the end of the data is taken to be the first line that is not shifted to the current position (that is, a line that contains non-empty data to the left of the current shift position; the shift position is one column to the right of the name column).

For composed data without names, the end of the data is detected from its field components. There is no special rule to determine the end of a specific composed data field.
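The column shift and the end-of-record rule for named composed data can be illustrated with a small writer and a predicate. Both write_record and is_shifted are hypothetical helpers written for this note, not dataio methods, and the sketch ignores spaces around the separator.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

typedef std::vector<std::pair<std::string, std::string> > fields_t;

// Write a composed record. The record name (possibly empty) goes in the
// first column of the first line; every field is written one column to
// the right, so continuation lines start with an empty column.
std::string write_record(std::string const &name, fields_t const &fields,
                         char sep) {
    assert(!fields.empty());  // sketch: a record has at least one field
    std::string out;
    for (fields_t::size_type i = 0; i < fields.size(); ++i) {
        if (i == 0) out += name;
        out += sep;
        out += fields[i].first;
        out += sep;
        out += fields[i].second;
        out += '\n';
    }
    return out;
}

// True while a line still belongs to the current named record, i.e. its
// first column is empty. A line with data in the first column ends it.
bool is_shifted(std::string const &line, char sep) {
    return !line.empty() && line[0] == sep;
}
```

Passing an empty record name produces the unnamed form, in which every line, including the first, starts with the empty column.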

Note: Standard CSV skips all extra spaces (whether or not space is the column separator) and any white space that is not the column separator or the new line marker. At its current stage, dataio skips extra spaces only if space is the column separator.

To think: perhaps it would be useful to add an option so that all empty columns in a line are skipped?
Todo: except when the column separator is space, an empty column at the end of a line normally counts as data. In dataio, empty columns at the end of a line are ignored. A flag is needed to turn the clearing of empty columns on and off.
A string that contains more than one space between words, or contains a special character or word, needs to be delimited with the string delimiter. Most spreadsheet applications support only this form of input. dataio does not support string delimiters, but in future it will support input and output of delimited strings.
To think: incorporate character delimiters? Standard spreadsheet applications do not implement char, because it is treated as a string containing one character.

Note:

A composed data field must be placed inside a dataio class and added to the dataio loader class. The recommendation is to use composed data only with names, to obtain more portability.
A dataio is itself treated as composed data (its fields are the added items). Sometimes the composed data is stored in (or associated with an item of) a class derived from dataio, to perform validation or to manage complex data such as a table. In this case, a pointer cast to dataio is required. For example, if mydataio is a derived class declared as

class mydataio : dataio {

....

};

Suppose that mydata is of type mydataio and io is of type dataio.

To add mydata as an item of io with the name "subfield", call

io.add("subfield", (dataio *)&mydata); // if derived class, need casting
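One possible reading of the cast requirement, assuming mydataio is declared exactly as shown: a class declared with the class keyword and no access specifier inherits privately, so the implicit conversion to dataio * is not accessible outside the class, while a C-style cast is still allowed. The sketch below demonstrates only this C++ rule, using hypothetical stand-in types (dataio_sketch, private_io, public_io). The real dataio::add() overloads may require the cast for other reasons.

```cpp
#include <cassert>

// Hypothetical stand-in for dataio; it only counts the items added.
struct dataio_sketch {
    int items;
    dataio_sketch() : items(0) {}
    void add(char const *name, dataio_sketch *item) {
        assert(name != 0 && item != 0);
        ++items;
    }
};

// "class D : B" with no access specifier means private inheritance.
class private_io : dataio_sketch {};

// With a public base the derived-to-base conversion is implicit.
class public_io : public dataio_sketch {};

int add_both(dataio_sketch &io) {
    private_io a;
    public_io b;
    io.add("a", (dataio_sketch *)&a);  // inaccessible base: C-style cast
    io.add("b", &b);                   // public base: no cast needed
    return io.items;
}
```

Removing the cast on the first call makes the code fail to compile, because the base class is inaccessible at that point.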

important note:

Data without names may appear only before the named data (independent of the order in which the fields are added to the loader). The unnamed data is processed in order until it is complete. After that, reading of the named data fields starts. The named data need not be ordered.
Empty columns at the end of a row are ignored. If all columns are empty (blank, or comment only), the line is treated as a blank line (used as a data field separator).
dataio is very configurable, for example:
- case sensitive or not (using dataio::ignorecase() )
- define the column separator (using dataio::columnseparator)
- change the decimal notation for float (using char dataio::decimal(char) )
- and much more.
The lineseparator is configurable in order to support different data formats, but cross-platform users need to take special care. The new line marker differs depending on the operating system:
- UNIX: '\n' (LF)
- ANSI (DOS/WINDOWS): '\r\n' (CR LF)
- VAX/VMS: '\r' (CR)
- UNKNOWN (does it exist?): '\n\r' (LF CR)
dataio detects the new line marker automatically (default).
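This page does not say how the automatic detection works. One simple heuristic, shown here only as an assumption, is to look at the first line break in the input and check whether it is a CR/LF pair. detect_lineseparator is a hypothetical helper, not a dataio method.

```cpp
#include <cassert>
#include <string>

// Return the first line terminator found in text: "\n", "\r\n", "\r" or
// "\n\r". Returns an empty string if text contains no line break.
std::string detect_lineseparator(std::string const &text) {
    std::string::size_type pos = text.find_first_of("\r\n");
    if (pos == std::string::npos) return std::string();
    char first = text[pos];
    assert(first == '\r' || first == '\n');
    if (pos + 1 < text.size()) {
        char second = text[pos + 1];
        // A two-character terminator pairs CR with LF in either order;
        // two equal characters are two separate line breaks.
        if ((second == '\r' || second == '\n') && second != first)
            return text.substr(pos, 2);
    }
    return std::string(1, first);
}
```

The heuristic cannot tell an LF CR terminator from an LF-terminated line followed by a stray CR, which is one reason to keep the lineseparator configurable.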

multi-block style: If some name appears more than once, the data is assumed to be a list broken into its elements. For example, the data item = {x, y, z} will be written in the form

item [separator] x

item [separator] y

item [separator] z

This is suitable for describing a list of composed data (see the table features). Note that this feature cannot be used without an item name, or within table data.
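The idea of collecting repeated names into a list can be sketched independently of dataio. collect_blocks is a hypothetical helper that handles only simple "name, separator, value" lines without spaces or delimiters.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

typedef std::map<std::string, std::vector<std::string> > blocks_t;

// Collect "name<sep>value" lines. A name that appears more than once
// appends to the list already stored under that name.
blocks_t collect_blocks(std::vector<std::string> const &lines, char sep) {
    blocks_t blocks;
    for (std::vector<std::string>::size_type i = 0; i < lines.size(); ++i) {
        std::string::size_type pos = lines[i].find(sep);
        assert(pos != std::string::npos);  // sketch: named lines only
        blocks[lines[i].substr(0, pos)].push_back(lines[i].substr(pos + 1));
    }
    return blocks;
}
```

Because the grouping key is the name, the same approach cannot work for unnamed data, which matches the restriction stated above.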

Note: To manage this correctly, item must be added as a member of the loader, and item must be associated with a class derived from dataio in which the methods

void dataio::startinblock(unsigned i) // prepare to input column i

bool dataio::startoutblock(unsigned i) // prepare to output column i

and optionally

void dataio::validate(unsigned i) // validate column i that was input

are implemented.

column oriented data: Data may be column oriented. In this case, it is treated as the transpose (columns are rows and rows are columns) of the standard layout. As in the table format, a blank line is taken as the end of the data block.

If columnoriented and istable are both activated, the data takes the most popular spreadsheet-like table format.

table data: In this case, vector<vector> cannot be used as field data (vector is supported, and its elements are stored one in each column). For example, suppose that ';' is used as the column separator; then

name ; Alfred ; Paul ; Mary

point ; 12 ; 15 ; 17

; 21 ; 52 ; 37

; 9 ; 7 ; 27

sum ; 123 ; 74 ; 81

stores the values:

column 1:

name = Alfred
point = 12, 21, 9
sum = 123

column 2:

name = Paul
point = 15, 52, 7
sum = 74

column 3:

name = Mary
point = 17, 37, 27
sum = 81

A blank line is taken as the end of the table.
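The mapping from table rows to per-column records in the example can be reproduced with a short stand-alone reader. This is only a sketch of the layout rule, not the dataio implementation; trim, split and read_table are hypothetical helpers, and string delimiters and comments are ignored.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

typedef std::map<std::string, std::vector<std::string> > record_t;

// Strip leading and trailing spaces and tabs.
std::string trim(std::string const &s) {
    std::string::size_type b = s.find_first_not_of(" \t");
    if (b == std::string::npos) return std::string();
    return s.substr(b, s.find_last_not_of(" \t") - b + 1);
}

// Split one line into trimmed cells; always returns at least one cell.
std::vector<std::string> split(std::string const &line, char sep) {
    std::vector<std::string> cells;
    std::string::size_type start = 0;
    for (;;) {
        std::string::size_type pos = line.find(sep, start);
        if (pos == std::string::npos) {
            cells.push_back(trim(line.substr(start)));
            return cells;
        }
        cells.push_back(trim(line.substr(start, pos - start)));
        start = pos + 1;
    }
}

// Read rows up to the first blank line. Cell c (c >= 1) of each row goes
// to record c - 1; a row with an empty first cell continues the previous
// name, which is how a vector such as "point" spans several rows.
std::vector<record_t> read_table(std::vector<std::string> const &lines,
                                 char sep) {
    std::vector<record_t> records;
    std::string name;
    for (std::vector<std::string>::size_type i = 0; i < lines.size(); ++i) {
        if (trim(lines[i]).empty()) break;  // blank line ends the table
        std::vector<std::string> cells = split(lines[i], sep);
        if (!cells[0].empty()) name = cells[0];
        assert(!name.empty());  // sketch: the first row must carry a name
        if (records.size() < cells.size() - 1)
            records.resize(cells.size() - 1);
        for (std::vector<std::string>::size_type c = 1; c < cells.size(); ++c)
            records[c - 1][name].push_back(cells[c]);
    }
    return records;
}
```

Fed the five rows of the example with ';' as separator, the reader yields three records, one per data column.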

Note: To process the table data format, extend the dataio class and override the methods:

void dataio::startinblock(unsigned i) // prepare to input column i

bool dataio::startoutblock(unsigned i) // prepare to output column i

and optionally

void dataio::validate(unsigned i) // validate column i that was input

attributor enabled mode: In a config file, a value is normally assigned using an attributor. To support this, dataio permits the use of an attributor to assign a value to a name (replacing the column separator) for the first-level named variables.

Currently, the attributor enabled mode also inputs and outputs unnamed data, but this will be disabled for compatibility with the most common standard.

The attributor mode is similar to CSV, except that the separator between variables and data is [attribseparator]. Currently, both [column separator] and [attribseparator] may be used as the separator of the first column.

If extendedmode is disabled, variables without names are not used, following the most popular standard.

Todo: need to check for correct use of this (only the attributor may be used to assign values) in the standard mode (in the extended mode this is impossible, because unnamed data may be used; will check whether unnamed is empty()).

Note: this mode is ignored by column oriented data and table data.

section enabled mode: The Windows ini file uses sections, which let an application read only the data associated with it. If one of sectionnameopen or sectionnameclose is enabled (is an isgraph character), dataio assumes that the first-level (not shifted) name is used as a section.

Currently, unnamed data can be used in the section enabled mode, but this will be deactivated for standardization; i.e., on the first level (not subfield) only named data will be used in the section enabled mode. For example, the Windows ini file does not permit the use of variables outside a section definition, but the current implementation does not check this (and an empty section definition ends the current section).

The standard is the same as CSV, except that composed data fields are affected in the following way: the name of the composed data is written delimited by the section name markers, and the one-column shift of its data fields is skipped:

[sectionnameopen][var name][sectionnameclose] [datalist 1] ... [datalist n]

The end of this block is taken to be the occurrence of the next section definition.
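Recognizing a section line can be sketched as below. parse_section is a hypothetical helper, not a dataio method; it assumes single-character markers and does not look at any data that follows the closing marker.

```cpp
#include <cassert>
#include <string>

// If line begins with a section definition, store the section name in
// name and return true. An empty name corresponds to the empty section
// marker described in this document.
bool parse_section(std::string const &line, char open, char close,
                   std::string &name) {
    assert(open != '\0' && close != '\0');  // both markers enabled
    std::string::size_type b = line.find_first_not_of(" \t");
    if (b == std::string::npos || line[b] != open) return false;
    std::string::size_type e = line.find(close, b + 1);
    if (e == std::string::npos) return false;
    name = line.substr(b + 1, e - b - 1);
    return true;
}
```

A reader would close the current composed block whenever this returns true, since the next section definition ends the previous one.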

The section escape is made by an empty section marker, and subsections are not supported. For convenience, dataio allows data definitions outside a section (data fields that are not composed data) only before the composed data, but the section escape will be disabled. Note that non-section data before a section is not recommended, because it is not the Windows standard.

If extendedmode is disabled, variables without names are not used, to follow the standard. Note that non-composed data types can still be used because this is useful, although it is not the most popular standard.

Todo: should the output methods be changed to output the non-composed data first, followed by the composed data??
To think: on input, should an empty section be ignored for compatibility with the standard, or should it continue to be supported??

Note: the section enabled mode is ignored by column oriented data and table data.

Note: The mode with section enabled, attrib separator enabled and extendedmode disabled assumes the standard sectioned style, in which each composed element contains no unnamed data members and its members are simple values or vectors. Thus, the attrib separator is inserted between the first and second columns.

comment on data

Two kinds of comment are supported:
1. block comment, delimited by the commentopen and commentclose markers (if one of these is empty, block comments are disabled)
2. line comment: if the commentline marker is found, the rest of the current line is assumed to be a comment.

A member must be added before its comment. Adding a comment searches the already added members; if the member is not found, nothing is done. For example, to add a comment to item foo, foo must be added first.
Comment output:
1. The global comment is output first, using one line for each item. Line comments are the preferred mode, and an extra blank line is inserted after the end of the global comment (if the global comment is not empty).
2. If istable and columnoriented are disabled, each item comment is output, using one line for each item. Line comments are the preferred mode.
3. For comments on subitems, block comments are used. Each comment item is delimited by the comment delimiters. The comment list is treated like a list of values, and the output depends on columnoriented. If columnoriented is not applied, it is a single line. (Each columnoriented between the subitem and the main class transposes the data.)
4. istable active mode: no member comments are output, but submember comments are output.
5. column oriented mode: all member comments are output in the same way as subitem comments.
Todo: in future, be able to read a comment line (a line whose only non-white-space value is a comment) as a single comment and store it in the comment list associated with the next data to be read.

line wrapping: if linewrap is enabled, a long line is broken into several lines. The linewrap marker says to join the line with the next one.

For example, if '\' is linewrap marker,

a [separator] b \

[separator] 7 [separator] 9

is interpreted as

a [separator] b [separator] 7 [separator] 9
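The joining step can be sketched as a pass over the physical lines before any column splitting takes place. join_wrapped is a hypothetical helper, not part of dataio; it treats the marker as a wrap only when it is the last non-blank character of the line.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Join physical lines into logical lines: a line whose last non-blank
// character is wrapmarker continues on the next physical line.
std::vector<std::string> join_wrapped(std::vector<std::string> const &lines,
                                      char wrapmarker) {
    assert(wrapmarker != '\0');  // '\0' would mean wrapping is disabled
    std::vector<std::string> out;
    std::string current;
    for (std::vector<std::string>::size_type i = 0; i < lines.size(); ++i) {
        std::string const &line = lines[i];
        std::string::size_type last = line.find_last_not_of(" \t");
        if (last != std::string::npos && line[last] == wrapmarker) {
            current += line.substr(0, last);  // drop the marker, keep going
        } else {
            out.push_back(current + line);
            current.clear();
        }
    }
    if (!current.empty()) out.push_back(current);  // wrap on the last line
    return out;
}
```

Doing the join first keeps the rest of the parser unaware of wrapping, at the cost of losing the original physical line numbers for error messages.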

Notes:

flags related:
--------------
1. Mapped into members during the input/output setvalue/getvalue process: lineseparator, columnseparator, linewrap, commentline, commentopenmark, commentclosemark, attribseparator, stringdelimiter, parseallinputstring.
   todo: charmarker, used to specify special characters as in C/C++ (if it is white space, it is assumed disabled).

2. Set by method, and optionally mapped into members when the flag is set: ignorecase, emptyisvalid, decimal, validateall, collectnames.
   todo: collectunknowdata, used to decide whether or not to collect unknown data.

3. Mapped by the maprecomemdedflagstomember method: ignorecase, emptyisvalid, decimal, validateall, collectnames.

4. Not mapped into members (do it manually, if desired): sectionnameopen, sectionnameclose, istable, columnoriented, maxcolumnonline, clearemptytail, extendedmode (is it checked only by operator>>() and operator<<()??).

obs.: dataio::setted(string const &name) does not apply parsestring() to name. Do it yourself, if desired.

string on cell
--------------
1. A field that is not set as a string is taken as is, if parseallinputstring is off.
2. In a name or a string item, each sequence of white space is replaced by one single space, and space at the head or tail of the string is ignored. If the string delimiter is used, the space before and after it is skipped.
3. If parseallinputstring is on, all items are parsed as in (2).
4. A name or string item is output as a delimited string if it contains any of the markers: commentline, commentopen, commentclose, lineseparator, columnseparator, attribseparator, stringdelimiter, linewrap (if it is the last non-white character of the string), or the section name marker (if it is a name in the first column). The lineseparator is deleted from a string item, but in a name or other data it is kept (pay attention to this, because a lineseparator inside a name, or inside other data that is not a string, will cause errors).
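The white space rule for names and string items (each run of white space becomes one space, and leading and trailing white space is dropped) can be sketched as follows. normalize_spaces is a hypothetical helper and does not handle delimited strings.

```cpp
#include <cassert>
#include <cctype>
#include <string>

// Collapse each run of white space into a single space and drop white
// space at the head and tail of the string.
std::string normalize_spaces(std::string const &s) {
    std::string out;
    bool pending_space = false;
    for (std::string::size_type i = 0; i < s.size(); ++i) {
        if (std::isspace(static_cast<unsigned char>(s[i]))) {
            // Remember the gap, but only once some text has been emitted.
            pending_space = !out.empty();
        } else {
            if (pending_space) out += ' ';
            out += s[i];
            pending_space = false;
        }
    }
    // A space is only ever written before a non-space character.
    assert(out.empty() || out[out.size() - 1] != ' ');
    return out;
}
```

Deferring the space until the next word arrives is what removes trailing white space without a second pass.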

Todo: implement an extended text mode that works with representations of non-white characters, as in the C/C++ style specification.

add/delete:
-----------
1. Adding with a name that already exists replaces the old one. If a record list needs to be output, use the multi-block feature.

2. Fields cannot be deleted. The only delete feature is clear(), which deletes all fields. Note that in the istable enabled mode, deleting inside startinblock or validate will cause errors (setvalue does not check the variable type).

Generated at Thu Sep 6 13:45:43 2001 for dataio.