DB System FAQ

See http://www.esrl.noaa.gov/gmd/aero/about/faq.html for general aerosol system questions.

In this document “$STATION” refers to the lower case station code (e.g. “mlo”) of the station in question.

What are the programs to perform some common tasks?

General data access

Editing

  • data.pass - To pass data into the clean archive.
  • data.lost - To mark data as known to be missing.

Viewing data

  • cpx2 - General purpose data viewer.

How do I submit my data to the World Data Center for Aerosols?

The DB system provides utilities to automate the generation and submission of data. This system is broken up into two primary components: the generator, which aggregates and packages the data into the collective units required by the WDCA, and the uploader, which runs the generator for all the types of data being submitted and uploads the results to the destination FTP server. For more information about the data format used by the WDCA see this document.

This system requires fairly extensive configuration and metadata to operate within the constraints defined by the WDCA; however, most of this is standard boilerplate and is defined in global defaults. The result is that in most cases you will only have to enter a small amount of very general metadata about the station.

Before you begin, please ensure you have the following information available:

  1. A GAWSIS-registered station code; this is a three letter code like “BND”.
  2. A laboratory code as issued by NILU; this is a code like “US06L”.
  3. A station code as issued by NILU; this is a code like “US0035R”.
  4. A platform code as issued by NILU; this is a code like “US0035S”.
  5. A valid GAW WDCA ID; this is issued by GAWSIS and takes the form GAWA<CC><ST><STN>, where “<CC>” is the country code (e.g. “US”), “<ST>” is the state or province code or “__” (e.g. “IL”), and “<STN>” is the GAW station ID (e.g. “BND”). For example, for BND this is “GAWAUSILBND”.
  6. Contact information for the data submitter, originator, and principal investigator (these are often the same person). This is usually also the information in GAWSIS.
  7. The station site geographic information, including latitude, longitude, altitude, land use and setting, GAW type and WMO region. With the exception of the land use and setting, this information is the same as is in GAWSIS.
  8. Information about the inlet system (e.g. the height and whether it has a hood or similar).
  9. If the station has a CPC, the nominal station pressure and temperature.
  10. If available, the timeline of instruments at the station, including serial numbers.

Much of this information is required by GAWSIS and available through it. Additionally, first-time submissions should see the NILU submission page for information on how to obtain the NILU-required fields.

Once you have this information, the next step is to enter it into the station specific EBAS configuration, located in /aer/db/etc/$STATION/ebas/global.$STATION.conf. How this file is structured is discussed in detail in the EBAS configuration page. An example/template is below:

,,BLOCK
    OriginatorName,Ogren,John
    OriginatorEMail,John.A.Ogren@noaa.gov
    PrincipalInvestigatorName,Ogren,John
    PrincipalInvestigatorEMail,John.A.Ogren@noaa.gov

    DataSubmitterName,Ogren,John
    DataSubmitterEMail,John.A.Ogren@noaa.gov

    OrganizationName,"National Oceanic and Atmospheric Administration/Earth System Research Laboratory/Global Monitoring Division"
    OrganizationAcronym,"NOAA/ESRL/GMD"
    OrganizationUnit,
    OrganizationAddress1,"325 Broadway"
    OrganizationAddress2,
    OrganizationZIP,"80305"
    OrganizationTown,"Boulder, CO"
    OrganizationCountry,"USA"
    
    Laboratory,US06L
    Station,US0035R
    Platform,US0035S
    StationWDCAID,GAWAUSILBND
    StationWDCAName,"Bondville, Illinois"
    StationState,Illinois
    Latitude,40.050
    Longitude,-88.367
    Altitude,213m
    LandUse,Agricultural
    StationSetting,Rural
    GAWType,R
    WMORegion,4

    InletHeightAGL,10m
    InletType,Hat or hood
END

Note that some of these fields have restricted values they can take (e.g. LandUse), so please ensure they are in compliance with the EBAS configuration page.

If the station has a CPC, the nominal station pressure and laboratory temperature are needed for converting the number concentrations to standard temperature and pressure (1013.25 hPa, 273.15 K). These values need to be entered in the file /aer/db/etc/$STATION/ebas/cpc_L1.$STATION.conf with information similar to:

,,Comment,"Lab-Nominal-T=25C,Lab-Nominal-P=880hPa"
,,VARIABLE,conc
    ProcessingData,25,880
END
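For reference, these two values feed the standard ideal-gas adjustment used to report concentrations at STP; assuming the usual convention, the scaling is:

conc_stp = conc_measured * (1013.25 / P_station) * (T_lab / 273.15)

with T_lab in kelvin, so the example above (25 °C = 298.15 K, 880 hPa) scales the measured concentrations by about 1.26.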

Then create /aer/db/etc/$STATION/ebas/cpc_L2.$STATION.conf like:

,,Comment,"Lab-Nominal-T=25C,Lab-Nominal-P=880hPa"
,,VARIABLE,conc
    ProcessingData,25,880
END
,,VARIABLE,16_perc
    ProcessingData,25,880
END
,,VARIABLE,84_perc
    ProcessingData,25,880
END

Once the above are complete, you can verify whether the system already contains the instrument type and serial number timeline using data.instruments.get. This will generate a report of the instruments the DB system knows about. If this is inaccurate or inconsistent with the information you have, the segmentation can be modified using CPX2 (in the menu CPX2→Segmentation). The segments themselves are named with the instrument base (e.g. “S11”) and contain the manufacturer code, the model, and the serial number separated by “;” in their data field.
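For example, to review the current timeline for BND:

data.instruments.get bnd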

The system should now be ready to generate valid WDCA EBAS 1.1 files. You can generate these manually using data.aggregate.ebas. This program will create files in the current directory that are in compliance with the WDCA format (note that it does not enforce the submission timing requirement of full years). To use it, you will need to specify a type, generally of the form “$INSTRUMENT_L$LEVEL[_cut]”, where $INSTRUMENT is one of “cpc”, “neph” or “psap”, $LEVEL is one of “0”, “1”, or “2”, and the trailing “_cut” is optional to enable cut size splitting. For example, “neph_L0_cut” is the level 0 (raw) neph data split by cut size. Normally all levels and all instruments are submitted once a complete year has been edited (passed). A final call might look like:

data.aggregate.ebas bnd neph_L2_cut 2010 2011

You can either run this directly for all the types you want to submit, or you can create a configuration file for data.aggregate.upload to run many of them in a batch (and, usually, automatically submit them). This is done by creating /aer/db/etc/$STATION/upload.$TYPE.$STATION.conf, where $TYPE is a name used to tell data.aggregate.upload what to generate. Note that if you create one with a type of “nilu_wdca” (e.g. /aer/db/etc/bnd/upload.nilu_wdca.bnd.conf) then it will automatically be run on completely edited (passed) years. If you do not want to enable automatic submission, use a different type, for example “wdca_manual” in /aer/db/etc/$STATION/upload.wdca_manual.$STATION.conf. In either case the file should have a structure like:

EBAS:neph_L0_cut,gzftp://gaw-wdca.nilu.no/incoming,anonymous,aerosol@gmd
EBAS:neph_L1_cut,gzftp://gaw-wdca.nilu.no/incoming,anonymous,aerosol@gmd
EBAS:neph_L2_cut,gzftp://gaw-wdca.nilu.no/incoming,anonymous,aerosol@gmd
EBAS:psap_L0_cut,gzftp://gaw-wdca.nilu.no/incoming,anonymous,aerosol@gmd
EBAS:psap_L1_cut,gzftp://gaw-wdca.nilu.no/incoming,anonymous,aerosol@gmd
EBAS:psap_L2_cut,gzftp://gaw-wdca.nilu.no/incoming,anonymous,aerosol@gmd
EBAS:cpc_L0,gzftp://gaw-wdca.nilu.no/incoming,anonymous,aerosol@gmd
EBAS:cpc_L1,gzftp://gaw-wdca.nilu.no/incoming,anonymous,aerosol@gmd
EBAS:cpc_L2,gzftp://gaw-wdca.nilu.no/incoming,anonymous,aerosol@gmd

Remove any lines for instruments that do not exist at the station. The “aerosol@gmd” field is the password given to the NILU FTP server, and can be set to anything.

Once you have this file created you can see what data.aggregate.upload would submit by doing:

data.aggregate.upload --dryrun wdca_manual $STATION 2009

This assumes you created the manual file as above (if you created the automatic one, replace “wdca_manual” with “nilu_wdca”). The command will create a number of files in the current directory; these files are what would be uploaded if you ran it without the --dryrun switch. Note that WDCA submissions must be for complete years, but this is not enforced anywhere, so ensure you always run it with a time range that specifies exactly one year.
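Once the dry-run output looks correct, the actual generation and upload is the same command without the switch (assuming the same argument convention):

data.aggregate.upload wdca_manual $STATION 2009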

As mentioned above, if you create the file as type “nilu_wdca” the system will automatically run it whenever it detects that a year has been completely edited and passed.

It is also recommended that you manually send any changes you make to these files back to NOAA if you made them on an AER_VM system, as they will not be transferred back automatically. If you are using an AER_VM system and sending edited data back to NOAA, it is also recommended that any automatic submissions be run at NOAA instead of from your AER_VM, as the system there supports better retry failsafes.

Old (deprecated) answer

First the exporting system must be configured. It is often sufficient to supply only the following in the global.conf instance:

DataOriginator,"Ogren John"
Organization,"NOAA/ERSL/GMD 1 325 Broadway Boulder CO 80305, USA  phone:+1 303 497 6210, John.A.Ogren@noaa.gov"
DataSubmitter,"Ogren John"
Station,US0035R
Platform,US0035S
Laboratory,US06L

As well as the serial numbers (via instrument names) in the respective PSAP, Neph and CPC configuration files.

Once the exporting system is configured, the upload targets must be set in niluarchive.conf and/or nrt.conf.

At this point data uploading will happen automatically. The upload will occur once complete years have been passed for the archive data and as it is processed for NRT data. On AER_VM systems it may be necessary to run data.aggregate.station to execute the pending jobs. To explicitly upload data to the archive you can use data.aggregate.nilu.archive:

data.aggregate.nilu.archive bnd 2009 2010

How do I adjust Neph/PSAP data to different (non-standard) wavelengths?

Use data.edit.wl like:

data.get bnd A11a 2009:1 2009:2 clean | data.edit.wl --target=300,500,900

The adjustment is done like:

output = exp(log(inputLow) + (log(inputHigh) - log(inputLow)) * (log(targetWL/sourceWLLow)/log(sourceWLHigh/sourceWLLow)))
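For example, adjusting a scattering value to 500 nm from measured 450 nm and 550 nm channels (Bs450 and Bs550 here are just placeholder names for the two inputs):

Bs500 = exp(log(Bs450) + (log(Bs550) - log(Bs450)) * (log(500/450)/log(550/450)))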

In normal processing, raw data is at the instrument native wavelengths. In clean data, most PSAPs will have a BaO channel with the data adjusted to 550 nm: for 1-wavelength PSAPs that originate at 574 nm this will be the only output field; for ones with only a 530 nm channel it will not exist; and for 3-wavelength PSAPs it will exist in addition to the channels at the native wavelengths. In the default configuration for intensive (XI) records, all data is adjusted to 450, 550, and 700 nm.

How do I get a specific parameter from multiple stations?

Use data.consolidate.station like:

data.consolidate.station bnd,brw,mlo 2009:10 2009:11 BsG_S11

See also the examples in /aer/prg/r/examples/multi-station-box.r for working with this data in R.

How do I get rid of the data gap reports in the processing emails?

Clean data gaps are only created when no data has been passed for the period. To resolve them, simply pass the data with data.pass.
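For example (a sketch; data.pass is assumed to take the same station and time-range arguments as data.lost shown below):

data.pass sfb 2010-05-11 2010-05-14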

The raw gaps are generated based on what data the system has processed. You can use data.lost to inform it that a gap is unrecoverable and it should not be reported anymore. For example, if you know that data from 2010-05-11 to 2010-05-14 was lost due to no station power, you could do:

data.lost sfb 2010-05-11 2010-05-14 

If you have information available about all the gaps in the report, you can use the “--interactive” switch to data.lost to have it prompt about each one:

data.lost --interactive sfb

Use the “--no-comment” switch to disable the comment prompt (not recommended).

How do I convert data into CPD2 format?

Generally this involves writing a conversion program using the DB system libraries. See “perldoc DATA_Interface” for the main library used to do this.

There is a simple converter for a_[HDM] files that can be used as is or as a starting point for a specialized converter. It is located in /aer/db/bin/templates/read_a_X.example
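For example, to start a specialized converter from the template (the destination file name is hypothetical):

cp /aer/db/bin/templates/read_a_X.example ~/read_mydata.pl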

How do I tell what data has been passed?

'latest' will tell you the latest data passed, and data.status can generate a report. For more detail you can use:

data.comments.get bnd none none clean | grep '^=== clean add'

How do I view Box-Whisker plots for a station?

To make them out of daily averages (the default):

cpx2 --config=/aer/db/etc/plots/yearly.box.xml --source=avgH --dbavg=86400 app 2009:152 2010:60

For plots with the latest year difference:

CURRENT_START=1293840000 cpx2 --config=/aer/db/etc/plots/dyearly.box.xml --source=avgH --dbavg=86400 app 2009:152 2010:60

Where 1293840000 is the Epoch time of the start of the current year (from “doy.pl”, for example).
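If doy.pl is not at hand, GNU date can produce the same value:

date --utc --date=2011-01-01 +%s
==>
1293840000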

For hourly averages:

cpx2 --config=/aer/db/etc/plots/yearly.box.xml --source=avgH app 2009:152 2010:60

To generate the default plot sets in the default locations (/aer/www/net/$STATION/images/stats), use data.plots.update:

data.plots.update --reprocess app avgH

How do I view correct intensive hourly averages while editing data?

Normally the right-click menu in CPX2 has an option to smooth to hourly averages. This only smooths the final value of the variable, so for intensive parameters the result is the average of the ratios, which is incorrect. To get the correct ratio of averages for edited data you can use:

cpx2 --mode=avgH --source=avgHe app 2009w50 1w

Note that switching display modes within CPX2 will cause the source to reset and no longer load edited data.

How do I create a new station?

To create a new station in the DB system, use data.newstation. Note that this will use all the system defaults for configuration; override any needed configuration in /aer/db/etc/$STATION. The CPX2 configuration for the station will also be blank; the simplest way to set that up is to copy an existing configuration file to $CPX2/etc/cpx2.$STATION.xml.
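For example, creating a hypothetical station “sfb” and seeding its CPX2 configuration from an existing station's:

data.newstation sfb
cp $CPX2/etc/cpx2.bnd.xml $CPX2/etc/cpx2.sfb.xml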

To set up a station for legacy processing use make_new_stn.

How do I transfer a snapshot of CPD2 data to a VM?

To create a snapshot of some data:

data.get bnd S11a,A11a,N61a 2009 2010 avgH | bzip2 -c -9 - > data.bz2

To ingest that snapshot on a VM:

data.newstation -f bnd
bzcat data.bz2 | data.archive.put bnd avgH -

How do I get a summary of the neph status over a long period of time?

cpx2 --config=/aer/prg/cpx2/etc/cpx2.neph_status.xml sgp 2010:1 2010:160

This contains plots of the spancheck percent error and sensitivity factors as well as the zero background reference. The initial load may take some time for CPD1 data as it requires parsing the t* zip files.

How do I get CCN data when the supersaturation is within a bound and stable?

data.get sgp N11a,S11a,N61a 2010:1 1d clean | data.consolidate --source=- N2_N11 Uc_N11 N_N61 BbsB_S11 F2_N11 | data.edit.corr.field --variables='*' --test='$variables->{Uc_N11} <= 0.15 || $variables->{Uc_N11} >= 0.25 || $variables->{F2_N11} & 0x3' --mod='map { undef $variables->{$_} if $_ ne "time" } keys(%{$variables})'
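Reading the pipeline: data.consolidate gathers the CCN concentration (N2_N11), the supersaturation (Uc_N11) and its flags (F2_N11), and the companion variables into a single stream; data.edit.corr.field then invalidates every field (except the time) of any record where the supersaturation is outside 0.15-0.25% or the flag bits (presumably supersaturation-stability bits) are set. Adjust the bounds and flag mask to match your definition of stable.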

How do I determine what instruments are available and when they were operational?

To determine what instruments were at a station, use data.instruments.get. This reads the instrument segmentation to display a general summary of known “major” instruments. For example:

data.instruments.get mlo

Will display the summary for MLO. With the instrument names (e.g. “A11”) from this, you can get more detailed information using data.coverage. For example, to see how much PSAP data was available in 2010, you can do:

data.coverage --records=A11a --interval=quarter mlo 2010 2011

Where the “A11a” record name is defined by the instrument name as above and the “quarter” specifies the resolution (see the data.coverage documentation for available resolutions).

How do I apply a flow or spot calibration correction to PSAP or CLAP data?

Add a line to corr.conf using data.edit.corr.psap_cal. In general the line will look something like:

A12a,2010-12-01,2010-12-13,A12a;A12m,data.edit.corr.psap_cal --input=A12 --old=0,0.909 --new=0.0;0.7486

How do I get hourly averaged data for the neph and psap in CSV format?

data.get thd A11a,S11a 2010:001 2010:032 avgH | data.export --mode=csv --mvc-type=mvc --station=on

How do I get 1 minute clean sub-um only data for the neph and psap in CSV format?

data.avg --interval=1m --stddev=off --count=off --contam thd S11a,A11a 2010:1 2010:20 clean | data.consolidate --source=- --regex --noautoavg '.+[^0]_.+' | data.export --mode=csv --mvc-type=mvc --station=on

How do I get MET data?

Use getmet on vortex2.cmdl.noaa.gov. This is a wrapper for several DB system programs that sets up the necessary environment to call them. For example, to get one day of BRW MET data:

getmet brw 2011-01-01 2011-01-02

Or the same, but averaged to one hour:

getmet --interval=1h brw 2011-01-01 2011-01-02

For more help on the getmet command type “getmet --help” or “perldoc getmet”.

See the wx record definition for the definition of the variables, data.export for more information about controlling the export format, data.avg for information about controlling the averaging, and timeformat for information about the various time formats accepted by getmet.

What are the commands used to view/edit/pass MET data?

To view and edit MET data, use cpx2.wx.

cpx2.wx mlo 2011-06-01 10d

To pass MET data (i.e., inform the system that a time period has passed QC) use data.wx.pass:

data.wx.pass mlo 2011-06-01 10d

Both are only available on systems that can run the DB system (e.g., aero.cmdl.noaa.gov).

How do I update instrument metadata? (manufacturer, model, or serial number)

The simplest way to update instrument metadata (manufacturer, model, and serial number) is to use CPX2's segmentation editing interface. To do this, open CPX2 for a time range that intersects the times that need to be changed. For example, if the station only has a single instrument segment or you only need to update the latest one, you can just run “cpx2 $STATION”. If you needed to edit a segment that ended on 2009-12-12, you could do “cpx2 $STATION 2009-12-11 2d”.

Once in CPX2, go to the menu “CPX2→Segmentation” or press CTRL-S to bring up the segmentation window. The instrument segments are those with the name of the instrument code (e.g. “A11”). To filter the list to only the instrument you need to change, use the drop-down box in the upper right. You can then modify the segment to change the metadata.

The format of the segment data is “$MANUFACTURER;$MODEL;$SERIALNUMBER”, for example “RR;PSAP-3W;107” for a 3-W PSAP or “TSI;3563;1001” for a 3-W Neph.

To update an existing segment for a change in the instrument:

  1. Open CPX2's segmentation at the time of the change as described above.
  2. Modify the existing segment by selecting it, then pressing the “Edit” button.
  3. In the modification dialog add or update the end time (if it doesn't have an end time defined, click the checkbox to set one).
  4. Press “Ok” to accept the modification.
  5. Add a new segment by pressing the “Add” button.
  6. Within the add dialog, set the start time to the time of the change, and set the end time (if the instrument is ongoing, uncheck the box).
  7. Set the segment type to the instrument code (e.g. “A11”).
  8. Set the segment data to the format as described above.
  9. Press “Ok” to accept the addition.
  10. Press “Save” on the segment list table. CPX2 may ask for confirmation to modify the segments outside the visible time range. Tell it to continue anyway.

How do I get a fixed format from data.export?

To get a fixed output order from data.export or to reorder any simple CSV data you can use the alias “colreorder”. It takes as a single argument the output column header:

data.get bnd S11a 2010:1 1d | data.export | colreorder Year,DOY,F_aer,BsB_S11,BsG_S11,BsR_S11

It assumes that the first line of the input is the column names, and it will fill any columns specified in the output order that are not in the input with empty fields.
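A hypothetical illustration of the fill behavior (assuming simple CSV on stdin):

printf 'A,B\n1,2\n' | colreorder B,C,A
==>
B,C,A
2,,1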

How do I generate summary statistics and plots for variables?

Use quantile_stats. This can generate summary statistics and plots for a combination of station and variables.

For example:

quantile_stats alt cleanstart cleanend U_S11

How do I get the message log from the command line?

Use data.get and the “X” record:

data.get bnd X 2011:20 2011:25

How do I generate a report of exceptions in data?

Use data.faultreport to generate a report of the times when data exceeded predefined exceptional limits. The configuration is defined in faultreport.conf. When invoked with no explicit configuration it will use the (possibly station specific) “editquality” configuration that has some general predefined limits for extensive and intensive parameters. For example:

data.faultreport --source=avgH bnd 2010:1 1d

How do I correct the CLAP data when a spot is re-sampled?

Power interruptions or uncontrolled shutdowns occasionally cause the CLAP to restart sampling on spot #1, even though the filter was not changed. If this happens, the transmittance values after the restart will be incorrect, causing the Bond or CTS corrections to return invalid values. The easiest way to deal with this problem is to invalidate the CLAP data for the time period after the restart until the filter was changed.

It is possible to correct the filter transmittance data after the restart by multiplying them by the transmittance when sampling on the affected spot originally ended. Note, however, that the original ending transmittance is generally close to 0.7, which means that the transmittances after the restart will be below 0.7 and likely will yield invalid data. Note furthermore that the CTS correction cannot be applied after the restart, because the system will lose track of the particle scattering and absorption optical depths prior to the restart. However, the Bond correction can still be applied.

To determine the transmittances at the end of the original sampling on the spot, use data.get to determine the approximate end time of the spot, and then use data.consolidate to display the transmittances around the ending time. A multiplicative mentor edit can be used to multiply the transmittances after the restart by the ending transmittances when the spot was originally sampled.

For example, the site operator at BRW made a log entry on January 6, 2014 reporting that the filter was restarted:

data.get brw X 6 7 |grep USER
==>
X,BRW,1389036566,2014-01-06T19:29:26Z,"USER: Changed CLAP filter...Monday JD006 found on spot #1 when on Friday JD003 was on spot #5.  "

Use data.get to retrieve spot changes from the previous three weeks:

data.get brw A12n 3w 2014:7 |data.consolidate --source=- --regex F[fn]_A12 |data.export
==>
Year,DOY,Ff_A12,Fn_A12
2013,351.36329,0057,06
2013,352.22159,0057,07
2013,355.31837,0057,08
2013,357.03351,0058,01
2013,358.38059,0058,02  <== approximate ending time of spot #1
2013,361.19267,0058,03
2013,363.89606,0058,04
2014,002.17119,0058,05
2014,003.92414,0058,01  <== approximate starting time of spot #1 after restart
2014,006.81258,0059,01  <== start time of spot #1 after filter change

This report shows that spot #1 originally ended around DOY 2013,358.38059 (i.e., the starting time of spot #2). Use data.consolidate to see the transmittance data around that time:

data.consolidate --regex brw 2013,358.38059-5m 10m F[fn]_A12 Ir[BGR]_A12 |data.export
==>
Year,DOY,Ff_A12,Fn_A12,IrB_A12,IrG_A12,IrR_A12
2013,358.37778,0058,01,0.7006955,0.7303835,0.7794262
2013,358.37847,0058,01,0.7004939,0.7302014,0.7792826
2013,358.37917,0058,01,0.7002968,0.7300103,0.7791096
2013,358.37986,0058,02,0.7000950,0.7298169,0.7789416
2013,358.38056,0058,02,0.9999361,0.9999445,0.9999444
2013,358.38125,0058,02,0.9996371,0.9996883,0.9997713
2013,358.38194,0058,02,0.9993476,0.9994285,0.9995861
2013,358.38264,0058,02,0.9990442,0.9991682,0.9993553
2013,358.38333,0058,02,0.9987232,0.9989042,0.9991759
2013,358.38403,0058,02,0.9984139,0.9986447,0.9989556

This report shows that the ending transmittances for spot #1 were 0.7000950,0.7298169,0.7789416 for the blue, green, and red channels, respectively. Multiplicative mentor edits using these factors should be applied for the time period when spot #1 was resampled, which the first report shows was from about 2014,003.924 until 2014,006.812.

The edit corrections can be entered from the mentor edit dialog in cpx2, or they can be entered from the command line using data.edit.mentor.modify.

echo 'add,"2014,3.924","2014,6.812",A12a,poly,"variables=IrB_A12 0.0 0.7000950",JAO,"correct for re-started spot 1"' |data.edit.mentor.modify brw
echo 'add,"2014,3.924","2014,6.812",A12a,poly,"variables=IrG_A12 0.0 0.7298169",JAO,"correct for re-started spot 1"' |data.edit.mentor.modify brw
echo 'add,"2014,3.924","2014,6.812",A12a,poly,"variables=IrR_A12 0.0 0.7789416",JAO,"correct for re-started spot 1"' |data.edit.mentor.modify brw
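In these edits the poly parameters (“0.0 0.7000950” and so on) are the polynomial coefficients, i.e. an offset of zero and a slope equal to the spot's original ending transmittance, so each edit is exactly the multiplicative correction described above.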