CPD2 Data File Format

Internally in the DB system most data is exchanged in the form of ASCII CSV files with special headers at the start of the file. To work with this data, it is recommended that you either export it into a simple form or use the internal programming interfaces (see “perldoc DATA_Interface”).

The basic structure of the file is a number of header lines beginning with “!”, followed by zero or more data lines. Lines are delimited by a newline only (“\n”, that is, Unix line endings).

Each data line begins with a record type identifier that designates the interpretation of the CSV fields that follow. This record type is a reference to the definition in the header lines. Records without the minimum corresponding headers (“row;colhdr”, “row;mvc”, and “row;varfmt”) are not allowed.

Header Lines

Header lines always begin with “!” and are completely contained at the start of the file. That is, the first line not beginning with “!” is a data line, and all lines that follow it are also data lines. Ordering of values within the header does not matter. Conceptually headers take the form of a hierarchical tree with a single value per leaf.
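
As a minimal sketch of this rule, assuming the file has already been read into a string, the Python below collects the leading “!” lines as headers and treats everything from the first other line onward as data. The function name split_cpd2_lines is hypothetical and not part of the CPD2 libraries; skipping empty lines is an assumption made only so that a trailing newline does not produce an empty record.

```python
def split_cpd2_lines(text):
    """Split CPD2 file text into (header_lines, data_lines)."""
    headers, data = [], []
    in_header = True
    for line in text.split("\n"):
        if line == "":
            continue  # assumption: empty lines carry no information
        if in_header and line.startswith("!"):
            headers.append(line)
        else:
            # The first non-"!" line ends the header block for good.
            in_header = False
            data.append(line)
    return headers, data


sample = (
    "!row;colhdr;N21f,N21f;STN;EPOCH\n"
    "!row;mvc;N21f,N21f;ZZZ;0\n"
    "!row;varfmt;N21f,N21f;%s;%u\n"
    "N21f,BRW,1270080000\n"
    "N21f,BRW,1270081800\n"
)
headers, data = split_cpd2_lines(sample)
print(len(headers), len(data))
```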

Each header line has two components: the path to the leaf node and the value of that leaf. These are delimited by a “,”, and anything after a second comma is ignored. The path takes the form of one or more values separated by “;”, with all spaces ignored. The value is a single string with no commas in it; the interpretation of that string depends on the context of the leaf.

An example line:

!var;ZP1sB1_XH;FieldDesc,Gamma: ZP2*(1-RH/100)^(-ZP1) total blue (PM1)

This designates the leaf in the following tree:

var
|- ZP1sB1_XH
   |- FieldDesc

With the value “Gamma: ZP2*(1-RH/100)^(-ZP1) total blue (PM1)”.
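
The path and value rules above can be sketched in Python as follows. This is an illustration only: parse_header_line and insert_leaf are hypothetical names, the tree is modelled as nested dictionaries, and the sketch assumes that no path is used both as a leaf and as a branch.

```python
def parse_header_line(line):
    """Return (path, value) for one "!" header line."""
    if not line.startswith("!"):
        raise ValueError("not a header line: " + line)
    parts = line[1:].split(",")
    # Spaces in the path are ignored; components are separated by ";".
    path = parts[0].replace(" ", "").split(";")
    # The value is the text between the first and second comma.
    value = parts[1] if len(parts) > 1 else ""
    return path, value


def insert_leaf(tree, path, value):
    """Store value at path inside a nested-dict tree."""
    node = tree
    for key in path[:-1]:
        node = node.setdefault(key, {})
    node[path[-1]] = value
    return tree


line = "!var;ZP1sB1_XH;FieldDesc,Gamma: ZP2*(1-RH/100)^(-ZP1) total blue (PM1)"
path, value = parse_header_line(line)
tree = insert_leaf({}, path, value)
print(path)
print(tree["var"]["ZP1sB1_XH"]["FieldDesc"])
```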

Any data line that appears in a file is required to have a corresponding entry in the “row;colhdr”, “row;varfmt” and “row;mvc” hierarchies. Variables may also have entries under the “var” hierarchy, while some general information about the file may be under the “fil” hierarchy. Other entries are allowed but may or may not be understood by components of the system; unless new data is being generated, however, they will still be passed through filters.

Each record type must contain a time specifier of some form: any one of “EPOCH” (Unix epoch time: seconds since 1970-01-01T00:00:00Z), “DateTime” (ISO 8601 date and time, in the form of “2010-06-18T15:55:06Z”), or both “Year” and “DOY” fields specifying the four digit year and the decimal DOY (starting with Jan 1st = 1.00000). Additionally, many components of the system require a “STN” field that contains a station code string.
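
To illustrate how the three time specifiers relate, the following Python sketch converts any of them to a UTC datetime. The function name record_time is hypothetical, the order in which the fields are tried is an assumption, and “Year”/“DOY” are assumed to be expressed in UTC like the other two forms.

```python
from datetime import datetime, timedelta, timezone


def record_time(fields):
    """Return the UTC time of a record given as {field name: string}."""
    if "EPOCH" in fields:
        # Seconds since 1970-01-01T00:00:00Z.
        return datetime.fromtimestamp(float(fields["EPOCH"]), tz=timezone.utc)
    if "DateTime" in fields:
        parsed = datetime.strptime(fields["DateTime"], "%Y-%m-%dT%H:%M:%SZ")
        return parsed.replace(tzinfo=timezone.utc)
    if "Year" in fields and "DOY" in fields:
        # Decimal day of year, where Jan 1st 00:00 is 1.00000.
        start = datetime(int(fields["Year"]), 1, 1, tzinfo=timezone.utc)
        return start + timedelta(days=float(fields["DOY"]) - 1.0)
    raise ValueError("record has no time specifier")


print(record_time({"EPOCH": "1270080000"}))
print(record_time({"DateTime": "2010-06-18T15:55:06Z"}))
print(record_time({"Year": "2010", "DOY": "91.50000"}))
```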

Generally, all data in the file must be in strictly ascending time order, although certain programs (such as data.archive.put) allow exceptions. Additionally, a variable with the same name, station, and archive source must have the same value in every record for a given time in which it appears. That is, the combination of variable, station, archive, and time designates a globally unique value.
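
The uniqueness rule can be checked with a short Python sketch like the one below. It is only an illustration: find_conflicts is a hypothetical name, records are assumed to be dictionaries that already contain “STN” and “EPOCH”, values are compared as raw strings, and the archive is left out of the key because it is not a field of the file itself.

```python
def find_conflicts(records):
    """Return (variable, station, time) keys that occur with differing values."""
    seen = {}
    conflicts = []
    for rec in records:
        for name, value in rec.items():
            if name in ("STN", "EPOCH"):
                continue  # these form the key, they are not variables here
            key = (name, rec["STN"], rec["EPOCH"])
            if key in seen and seen[key] != value and key not in conflicts:
                conflicts.append(key)
            seen.setdefault(key, value)
    return conflicts


records = [
    {"STN": "BRW", "EPOCH": "1270080000", "ZP1_N21": "05.599e+02"},
    {"STN": "BRW", "EPOCH": "1270080000", "ZP1_N21": "05.599e+02"},
    {"STN": "BRW", "EPOCH": "1270080000", "ZP1_N21": "09.831e+02"},
]
print(find_conflicts(records))
```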

row;colhdr

This component of the headers names the fields in a given record. The basic path looks like “!row;colhdr;<RECORD>,<RECORD>;<VARIABLE1>;<VARIABLE2>…”. That is, the path lies under the “row” and “colhdr” hierarchy, with the record name as its final component. The value repeats the record name and then gives a “;” separated list of variable names for that record, in the same order as the other “row” header lines and as the data of that record type.

Here “<RECORD>” is the name of the record being described, for example “S11a”. Each variable has a name that by convention follows a standard form, for example “Tu_S11”.

For example:

!row;colhdr;N21f,N21f;STN;EPOCH;DateTime;ZMethod_N21;ZEquation_N21;ZF1_N21;ZP1_N21;ZP2_N21
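
A “row” header line of this shape can be taken apart as in the following Python sketch. The function name parse_row_header is hypothetical; the check that the value starts with the record name is inferred from the path pattern and example above.

```python
def parse_row_header(line):
    """Return (kind, record, items) for a "!row;<kind>;<RECORD>,..." line."""
    if not line.startswith("!"):
        raise ValueError("not a header line: " + line)
    path_text, _, rest = line[1:].partition(",")
    value = rest.split(",")[0]  # anything after a second comma is ignored
    path = path_text.replace(" ", "").split(";")
    if len(path) != 3 or path[0] != "row":
        raise ValueError("not a row header: " + line)
    kind, record = path[1], path[2]
    items = value.split(";")
    if items[0] != record:
        raise ValueError("value does not begin with the record name")
    return kind, record, items[1:]


line = ("!row;colhdr;N21f,N21f;STN;EPOCH;DateTime;ZMethod_N21;"
        "ZEquation_N21;ZF1_N21;ZP1_N21;ZP2_N21")
kind, record, names = parse_row_header(line)
print(kind, record, names)
```

The same function applies unchanged to “row;mvc” and “row;varfmt” lines, since all three share this layout.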

row;mvc

This component of the headers provides the MVCs (missing value codes, the value placed in a field when the data was not available) for a given record. The basic path looks like “!row;mvc;<RECORD>,<RECORD>;<MVC1>;<MVC2>…”. That is, the path lies under the “row” and “mvc” hierarchy, with the record name as its final component. The value repeats the record name and then gives a “;” separated list of MVCs for that record, in the same order as the other “row” header lines and as the data of that record type.

For most decimal numbers this consists of filling all digits with “9”.

For example:

!row;mvc;N21f,N21f;ZZZ;0;9999-99-99T99:99:99Z;Z;Z;9.999e-99;9.999e-99;9.999e-99
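
One way a reader might apply the MVCs is sketched below in Python. This is an assumption about usage, not the system's own comparison logic: is_missing and mask_missing are hypothetical names, and the numeric fallback exists because a formatted field such as “09.999e-99” does not match the MVC string “9.999e-99” character for character. The first numeric value in the sample is an invented missing value for illustration.

```python
def is_missing(value, mvc):
    """True if a field value equals its MVC, as text or as a number."""
    if value == mvc:
        return True
    try:
        return float(value) == float(mvc)
    except ValueError:
        return False  # at least one side is not numeric


def mask_missing(values, mvcs):
    """Replace every missing field with None, keeping the others as strings."""
    return [None if is_missing(v, m) else v for v, m in zip(values, mvcs)]


mvcs = ["ZZZ", "0", "9999-99-99T99:99:99Z", "Z", "Z",
        "9.999e-99", "9.999e-99", "9.999e-99"]
values = ["BRW", "1270080000", "2010-04-01T00:00:00Z", "LevenbergMarquardt",
          "TwoParameter", "09.999e-99", "05.599e+02", "09.126e-01"]
masked = mask_missing(values, mvcs)
print(masked)
```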

row;varfmt

This component of the headers provides the formats for a given record. Formats can be either in standard printf(3) form or in CPD2 extended form, which is similar to printf format but begins with “*” instead of “%” (see $CPD2/src/libcpd2/sfmt.h for a full definition). The basic path looks like “!row;varfmt;<RECORD>,<RECORD>;<FORMAT1>;<FORMAT2>…”. That is, the path lies under the “row” and “varfmt” hierarchy, with the record name as its final component. The value repeats the record name and then gives a “;” separated list of formats for that record, in the same order as the other “row” header lines and as the data of that record type.

The primary distinction of CPD2 extended formats is that they specify the number of digits before and after the decimal point instead of the whole length. By default they also clip the value to that range; this is disabled with the “@” modifier. They can be generated with the Perl library CPD2NL and the function SFMTsprintf. For example, the format “*@04.2f” generates numbers of the form “0012.34”.

For example:

!row;varfmt;N21f,N21f;%s;%u;%04d-%02d-%02dT%02d:%02d:%02dZ;%s;%s;%010.3e;%010.3e;%010.3e
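
The digit-count behaviour of the extended form can be approximated in Python as sketched below. This covers only zero-padded fixed-point formats with the “@” modifier, such as “*@04.2f”, and is inferred from the examples in this document rather than from sfmt.h; clipping and every other conversion are deliberately not modelled. The name format_extended_fixed is hypothetical and is not the SFMTsprintf implementation.

```python
import re

# Matches only the "*@0<before>.<after>f" shape, e.g. "*@04.2f".
_EXT_FIXED = re.compile(r"^\*@0(\d+)\.(\d+)f$")


def format_extended_fixed(fmt, value):
    """Format value with <before> integer digits and <after> decimals."""
    match = _EXT_FIXED.match(fmt)
    if match is None:
        raise ValueError("format not covered by this sketch: " + fmt)
    before, after = int(match.group(1)), int(match.group(2))
    # printf-style width counts the whole field, including the point.
    width = before + (1 + after if after else 0)
    return format(value, "0{}.{}f".format(width, after))


print(format_extended_fixed("*@04.2f", 12.34))
print(format_extended_fixed("*@03.1f", 27.0))
print(format_extended_fixed("*@04.2f", -0.3))
```

Under these assumptions a minus sign takes the place of the leading digit, which matches values such as “-000.30” in the nephelometer excerpt.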

var;<VariableName>;<VariableField>[;<StartTime>]

The “var” component of the headers has, as the second level in the hierarchy, the name of the variable as defined in “row;colhdr”. The field defines a string related to that specific variable rather than to the whole file. The following fields are recognized to have special meanings, but others may be defined:

  • FieldDesc - A human readable description of the variable.
  • Wavelength - The “center” wavelength of the variable in nm and a “type” code, separated by “;”. For example “467;PSAP-3W”. This also requires a start time (see below). This is used explicitly to adjust wavelengths, for example by data.edit.wl.

An example of a simple (non-start time) header:

!var;ZP1_N21;FieldDesc,C: (ZP1)*(SS)^(ZP2)

A start time consists of any convertible time format, with infinity allowed (designating the time of the first line of data in the file). If a start time is given, even an infinite one, the header is assumed to apply only from that time onwards, until a new value for that header is received (in the context of archiving). That is, the header applies until one with a later start time is defined.

An example of a wavelength start timed header:

!var;ZF1bsB0_XH;Wavelength;2010-04-01T00:00:00Z,450;TSI Neph
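
The “applies until a later one is defined” rule can be sketched in Python as a lookup over (start time, value) pairs. The names parse_time and value_in_effect are hypothetical, an infinite start time is represented here by None (how infinity is spelled in a real header is not covered), and the two entries are an invented combination of values used only for illustration.

```python
from datetime import datetime, timezone

# Stand-in for an "infinite" start time: earlier than any real timestamp.
EARLIEST = datetime.min.replace(tzinfo=timezone.utc)


def parse_time(text):
    """Parse an ISO 8601 time of the form 2010-04-01T00:00:00Z."""
    return datetime.strptime(text, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)


def value_in_effect(entries, when):
    """Return the value whose start time is the latest one not after `when`."""
    dated = [(start if start is not None else EARLIEST, value)
             for start, value in entries]
    active = [item for item in dated if item[0] <= when]
    if not active:
        return None
    return max(active, key=lambda item: item[0])[1]


entries = [
    (None, "467;PSAP-3W"),
    (parse_time("2010-04-01T00:00:00Z"), "450;TSI Neph"),
]
print(value_in_effect(entries, parse_time("2010-03-31T23:59:59Z")))
print(value_in_effect(entries, parse_time("2010-06-17T00:10:00Z")))
```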

Record Lines

After the last header line the record lines begin. These consist of a CSV string (with fields containing commas quoted), starting with the record type identifier. This record identifier is a reference into the “row” header lines and defines the contents of that record.

An example record line:

N21f,BRW,1270080000,2010-04-01T00:00:00Z,LevenbergMarquardt,TwoParameter,03.832e-01,05.599e+02,09.126e-01
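
A record line can be paired with its “row;colhdr” names as in the Python sketch below. It relies on the standard csv module, which assumes that quoted fields use ordinary double-quote CSV quoting; parse_record_line is a hypothetical name, and the colhdr dictionary is written out by hand here rather than read from a file.

```python
import csv

# Variable names per record type, as given by the "row;colhdr" header.
colhdr = {
    "N21f": ["STN", "EPOCH", "DateTime", "ZMethod_N21", "ZEquation_N21",
             "ZF1_N21", "ZP1_N21", "ZP2_N21"],
}


def parse_record_line(line, colhdr):
    """Return (record_type, {variable: string}) for one record line."""
    fields = next(csv.reader([line]))
    record_type, values = fields[0], fields[1:]
    if record_type not in colhdr:
        raise ValueError("no row;colhdr header for record " + record_type)
    names = colhdr[record_type]
    if len(values) != len(names):
        raise ValueError("field count does not match row;colhdr")
    return record_type, dict(zip(names, values))


line = ("N21f,BRW,1270080000,2010-04-01T00:00:00Z,LevenbergMarquardt,"
        "TwoParameter,03.832e-01,05.599e+02,09.126e-01")
record_type, row = parse_record_line(line, colhdr)
print(record_type, row["STN"], float(row["ZP1_N21"]))
```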

Example File Excerpts

An example of the generated data for a CCN fit:

!row;colhdr;N21f,N21f;STN;EPOCH;DateTime;ZMethod_N21;ZEquation_N21;ZF1_N21;ZP1_N21;ZP2_N21
!row;mvc;N21f,N21f;ZZZ;0;9999-99-99T99:99:99Z;Z;Z;9.999e-99;9.999e-99;9.999e-99
!row;varfmt;N21f,N21f;%s;%u;%04d-%02d-%02dT%02d:%02d:%02dZ;%s;%s;%010.3e;%010.3e;%010.3e
!var;DateTime;FieldDesc,Date String (YYYY-MM-DDThh:mm:ssZ)
!var;EPOCH;FieldDesc,Epoch time: seconds from Jan 1 1970
!var;STN;FieldDesc,Station ID code
!var;ZEquation_N21;FieldDesc,Equation fitted
!var;ZF1_N21;FieldDesc,chi^2
!var;ZMethod_N21;FieldDesc,Fit method used
!var;ZP1_N21;FieldDesc,C: (ZP1)*(SS)^(ZP2)
!var;ZP2_N21;FieldDesc,k: (ZP1)*(SS)^(ZP2)
N21f,BRW,1270080000,2010-04-01T00:00:00Z,LevenbergMarquardt,TwoParameter,03.832e-01,05.599e+02,09.126e-01
N21f,BRW,1270081800,2010-04-01T00:30:00Z,LevenbergMarquardt,TwoParameter,09.611e-01,09.831e+02,01.310e+00
N21f,BRW,1270083600,2010-04-01T01:00:00Z,LevenbergMarquardt,TwoParameter,01.659e+00,01.032e+03,01.444e+00
N21f,BRW,1270085400,2010-04-01T01:30:00Z,LevenbergMarquardt,TwoParameter,04.959e+00,01.086e+03,01.395e+00

An example of some TSI neph data:

!StationID,SFB
!fil;FileName,S11_20100608T191505Z
!fil;ProcessedBy;S11,cpd
!fil;Project,Bondville Illinois USA
!fil;cpd2;librev, 2010-05-27T16:51:02Z
!fil;cpd2;revision,$Revision: 2632 $ May 27 2010 10:51:09
!fil;name,S11
!fil;version,cpd2
!row;colhdr;S11a,S11a;STN;EPOCH;DateTime;F1_S11;F2_S11;Tu_S11;T_S11;Uu_S11;U_S11;P_S11;BsB_S11;BsG_S11;BsR_S11;BbsB_S11;BbsG_S11;BbsR_S11
!row;mvc;S11a,S11a;ZZZ;0;9999-99-99T99:99:99Z;FFFF;FFFF;999.9;999.9;999.9;999.9;9999.9;9999.99;9999.99;9999.99;9999.99;9999.99;9999.99
!row;varfmt;S11a,S11a;%s;%u;%04d-%02d-%02dT%02d:%02d:%02dZ;%04X;%04X;*@03.1f;*@03.1f;*@03.1f;*@03.1f;*@04.1f;*@04.2f;*@04.2f;*@04.2f;*@04.2f;*@04.2f;*@04.2f
!var;BbsB_S11;FieldDesc,Aerosol backwards-hemispheric light scattering coefficient (Mm-1) blue
!var;BbsB_S11;Wavelength;2010-06-17T00:10:00Z,450;TSI Neph
!var;BbsG_S11;FieldDesc,Aerosol backwards-hemispheric light scattering coefficient (Mm-1) green
!var;BbsG_S11;Wavelength;2010-06-17T00:10:00Z,550;TSI Neph
!var;BbsR_S11;FieldDesc,Aerosol backwards-hemispheric light scattering coefficient (Mm-1) red
!var;BbsR_S11;Wavelength;2010-06-17T00:10:00Z,700;TSI Neph
!var;BsB_S11;FieldDesc,Aerosol total light scattering coefficient (Mm-1) blue
!var;BsB_S11;Wavelength;2010-06-17T00:10:00Z,450;TSI Neph
!var;BsG_S11;FieldDesc,Aerosol total light scattering coefficient (Mm-1) green
!var;BsG_S11;Wavelength;2010-06-17T00:10:00Z,550;TSI Neph
!var;BsR_S11;FieldDesc,Aerosol total light scattering coefficient (Mm-1) red
!var;BsR_S11;Wavelength;2010-06-17T00:10:00Z,700;TSI Neph
!var;DateTime;FieldDesc,Date String (YYYY-MM-DDThh:mm:ssZ)
!var;EPOCH;FieldDesc,Epoch time: seconds from Jan 1 1970
!var;F1_S11;FieldDesc,System flags
!var;F2_S11;FieldDesc,Nephelometer flags
!var;P_S11;FieldDesc,Presure inside nephelometer (hPa)
!var;STN;FieldDesc,Station ID code
!var;T_S11;FieldDesc,Temperature inside nephelometer (C)
!var;Tu_S11;FieldDesc,Inlet temperature (C)
!var;U_S11;FieldDesc,Relative humidity (percent) inside nephelometer
!var;Uu_S11;FieldDesc,Relative humidity (percent) at nephelometer  inlet (calculated)
S11a,SFB,1276733400,2010-06-17T00:10:00Z,0000,0000,027.0,032.0,027.4,020.2,0823.7,-000.30,0000.03,0000.07,0000.10,0000.01,0000.24
S11a,SFB,1276733460,2010-06-17T00:11:00Z,0000,0000,027.0,032.0,027.5,020.3,0823.7,0000.16,0000.18,0000.02,0000.11,-000.01,0000.16
S11a,SFB,1276733520,2010-06-17T00:12:00Z,0000,0000,027.0,032.0,027.4,020.3,0823.6,0000.34,0000.20,-000.12,0000.22,0000.07,-000.04
S11a,SFB,1276733580,2010-06-17T00:13:00Z,0000,0000,027.0,032.0,027.6,020.4,0823.6,0000.06,0000.02,0000.32,0000.27,-000.09,-000.22
S11a,SFB,1276733640,2010-06-17T00:14:00Z,0000,0000,027.0,032.0,027.7,020.5,0823.6,-000.64,0000.40,-000.09,0000.11,0000.01,0000.05