See http://www.esrl.noaa.gov/gmd/aero/about/faq.html for general aerosol system questions.
In this document “$STATION” refers to the lowercase station code (e.g. “mlo”) of the station in question.
The DB system provides utilities to automate the generation and submission of data. This system is broken into two primary components: the generator, which aggregates and packages the data into the collective units required by the WDCA, and the uploader, which runs the generator for all the types of data being submitted and uploads the results to the destination FTP server. For more information about the data format used by the WDCA, see this document.
This system requires fairly extensive configuration and metadata information to operate within the constraints defined by the WDCA; however, most of this is standard boilerplate and is defined in global defaults. The result is that in most cases you will only have to enter a small amount of very general metadata about the station.
Before you begin, please ensure you have the following information available:
Much of this information is required by GAWSIS and available through it. Additionally, first-time submissions should see the NILU submission page for information on how to obtain the NILU-required fields.
Once you have this information, the next step is to enter it into the station specific EBAS configuration, located in /aer/db/etc/$STATION/ebas/global.$STATION.conf. How this file is structured is discussed in detail in the EBAS configuration page. An example/template is below:
,,BLOCK
OriginatorName,Ogren,John
OriginatorEMail,John.A.Ogren@noaa.gov
PrincipalInvestigatorName,Ogren,John
PrincipalInvestigatorEMail,John.A.Ogren@noaa.gov
DataSubmitterName,Ogren,John
DataSubmitterEMail,John.A.Ogren@noaa.gov
OrganizationName,"National Oceanic and Atmospheric Administration/Earth System Research Laboratory/Global Monitoring Division"
OrganizationAcronym,"NOAA/ESRL/GMD"
OrganizationUnit,
OrganizationAddress1,"325 Broadway"
OrganizationAddress2,
OrganizationZIP,"80305"
OrganizationTown,"Boulder, CO"
OrganizationCountry,"USA"
Laboratory,US06L
Station,US0035R
Platform,US0035S
StationWDCAID,GAWAUSILBND
StationWDCAName,"Bondville, Illinois"
StationState,Illinois
Latitude,40.050
Longitude,-88.367
Altitude,213m
LandUse,Agricultural
StationSetting,Rural
GAWType,R
WMORegion,4
InletHeightAGL,10m
InletType,Hat or hood
END
Note that some of these fields have restricted values they can take (e.g. LandUse), so please ensure they are in compliance with the EBAS configuration page.
If the station has a CPC, the nominal station pressure and laboratory temperature are needed for converting the number concentrations to standard temperature and pressure (1013.25 hPa, 273.15 K). These values need to be entered in the file /aer/db/etc/$STATION/ebas/cpc_L1.$STATION.conf with information similar to:
,,Comment,"Lab-Nominal-T=25C,Lab-Nominal-P=880hPa"
,,VARIABLE,conc
ProcessingData,25,880
END
Then create /aer/db/etc/$STATION/ebas/cpc_L2.$STATION.conf like:
,,Comment,"Lab-Nominal-T=25C,Lab-Nominal-P=880hPa"
,,VARIABLE,conc
ProcessingData,25,880
END
,,VARIABLE,16_perc
ProcessingData,25,880
END
,,VARIABLE,84_perc
ProcessingData,25,880
END
Once the above are complete, you can verify whether the system already contains the instrument type and serial number timeline using data.instruments.get. This will generate a report of the instruments the DB system knows about. If this is inaccurate or inconsistent with the information you have, the segmentation can be modified using CPX2 (in the menu CPX2→Segmentation); the segments themselves are named with the instrument base (e.g. “S11”) and contain the manufacturer code, the model, and the serial number separated by “;” in their data field.
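For example, to display the instrument timeline for BND:

data.instruments.get bnd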
The system should now be ready to generate valid WDCA EBAS 1.1 files. You can generate these manually using data.aggregate.ebas. This program will create files in the current directory that are in compliance with the WDCA format (note that this does not enforce submission timing requirements, i.e. full years). To use it, you will need to specify a type, generally of the form “$INSTRUMENT_L$LEVEL[_cut]”, where $INSTRUMENT is one of “cpc”, “neph”, or “psap”, $LEVEL is one of “0”, “1”, or “2”, and the trailing “_cut” is optional to enable cut size splitting. For example, “neph_L0_cut” is the level 0 (raw) neph data split by cut size. Normally all levels and all instruments are submitted once a complete year has been edited (passed). A final call might look like:
data.aggregate.ebas bnd neph_L2_cut 2010 2011
You can either run this directly for all the types you want to submit, or you can create a configuration file for data.aggregate.upload to run many of them in a batch (and, usually, automatically submit them). This is done by creating /aer/db/etc/$STATION/upload.$TYPE.$STATION.conf, where $TYPE is a name used to tell data.aggregate.upload what to generate. Note that if you create one with a type of “nilu_wdca” (e.g. /aer/db/etc/bnd/upload.nilu_wdca.bnd.conf), it will automatically be run on completely edited (passed) years. If you do not want to enable automatic submission, use a different type, for example “wdca_manual” in /aer/db/etc/$STATION/upload.wdca_manual.$STATION.conf. In either case the file should have a structure like:
EBAS:neph_L0_cut,gzftp://gaw-wdca.nilu.no/incoming,anonymous,aerosol@gmd
EBAS:neph_L1_cut,gzftp://gaw-wdca.nilu.no/incoming,anonymous,aerosol@gmd
EBAS:neph_L2_cut,gzftp://gaw-wdca.nilu.no/incoming,anonymous,aerosol@gmd
EBAS:psap_L0_cut,gzftp://gaw-wdca.nilu.no/incoming,anonymous,aerosol@gmd
EBAS:psap_L1_cut,gzftp://gaw-wdca.nilu.no/incoming,anonymous,aerosol@gmd
EBAS:psap_L2_cut,gzftp://gaw-wdca.nilu.no/incoming,anonymous,aerosol@gmd
EBAS:cpc_L0,gzftp://gaw-wdca.nilu.no/incoming,anonymous,aerosol@gmd
EBAS:cpc_L1,gzftp://gaw-wdca.nilu.no/incoming,anonymous,aerosol@gmd
EBAS:cpc_L2,gzftp://gaw-wdca.nilu.no/incoming,anonymous,aerosol@gmd
Remove any lines for instruments that do not exist at the station. The “aerosol@gmd” entry is the password given to the NILU FTP server and can be set to anything.
Once you have this file created you can see what data.aggregate.upload would submit by doing:
data.aggregate.upload --dryrun wdca_manual $STATION 2009
This assumes you created the manual file as above (if you created the automatic one, replace “wdca_manual” with “nilu_wdca”). This will create a number of files in the current directory; these files are what would be uploaded if you ran it without the --dryrun switch. Note that WDCA submissions must be for complete years, but this is not enforced anywhere, so ensure you always run it with a time range that specifies exactly one year.
As mentioned above, if you create the file as type “nilu_wdca”, the system will automatically run it whenever it detects that a year has been completely edited and passed.
It is also recommended that you manually send any changes you make to these files back to NOAA if you made them on an AER_VM system, as they will not be transferred back automatically. If you are using an AER_VM system and sending edited data back to NOAA, it is also recommended that, if you want automatic submissions, you have them run at NOAA instead of from your AER_VM, as the system there supports better retry failsafes.
First the exporting system must be configured. It is often sufficient to supply only the following in the global.conf instance:
DataOriginator,"Ogren John"
Organization,"NOAA/ESRL/GMD 1 325 Broadway Boulder CO 80305, USA phone:+1 303 497 6210, John.A.Ogren@noaa.gov"
DataSubmitter,"Ogren John"
Station,US0035R
Platform,US0035S
Laboratory,US06L
The serial numbers (via instrument names) must also be supplied in the respective PSAP, Neph, and CPC configuration files.
Once the exporting system is configured, the upload targets must be set in niluarchive.conf and/or nrt.conf.
At this point data uploading will happen automatically. The upload will occur once complete years have been passed for the archive data, and as data are processed for NRT data. On AER_VM systems it may be necessary to run data.aggregate.station to execute the pending jobs (a sketch follows the example below). To explicitly upload data to the archive you can use data.aggregate.nilu.archive:
data.aggregate.nilu.archive bnd 2009 2010
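A minimal sketch of executing the pending jobs on an AER_VM, assuming data.aggregate.station takes the station code like the other utilities described here:

data.aggregate.station bnd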
Use data.edit.wl like:
data.get bnd A11a 2009:1 2009:2 clean | data.edit.wl --target=300,500,900
The adjustment is done like:
output = exp(log(inputLow) + (log(inputHigh) - log(inputLow)) * (log(targetWL/sourceWLLow)/log(sourceWLHigh/sourceWLLow)))
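This is a log-log interpolation (extrapolation when the target wavelength lies outside the source pair). As a worked example with illustrative numbers only, adjusting values of 10 at 530 nm and 8 at 660 nm to a target of 550 nm:

output = exp(log(10) + (log(8) - log(10)) * (log(550/530)/log(660/530)))
       = exp(2.3026 + (-0.2231)*(0.0370/0.2194))
       = exp(2.3026 - 0.0377)
       ≈ 9.63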
In normal processing, raw data are at the instrument's native wavelengths. In clean data for most PSAPs there will be a BaO channel with the data adjusted to 550 nm: for 1-wavelength PSAPs that measure at 574 nm this will be the only output field; for ones with only a 530 nm channel it will not exist; and for 3-wavelength PSAPs it will exist in addition to the channels at the native wavelengths. In the default configuration for intensive (XI) records, all data are adjusted to 450, 550, and 700 nm.
Use data.consolidate.station like:
data.consolidate.station bnd,brw,mlo 2009:10 2009:11 BsG_S11
See also the examples in /aer/prg/r/examples/multi-station-box.r for working with this data in R.
Clean data gaps are only created when no data has been passed. To resolve that, simply pass the data with data.pass.
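For example, assuming data.pass takes the usual station and time range arguments used by the other utilities in this document:

data.pass bnd 2010:1 2010:32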
The raw gaps are generated based on what data the system has processed. You can use data.lost to inform it that a gap is unrecoverable and it should not be reported anymore. For example, if you know that data from 2010-05-11 to 2010-05-14 was lost due to no station power, you could do:
data.lost sfb 2010-05-11 2010-05-14
If you have information available about all the gaps in the report, you can use the “--interactive” switch to data.lost to have it prompt about each one:
data.lost --interactive sfb
Use the “--no-comment” switch to disable the comment prompt (not recommended).
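For example, combining the two switches (again, not recommended):

data.lost --interactive --no-comment sfb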
Generally this involves writing a conversion program using the DB system libraries. See “perldoc DATA_Interface” for the main library used to do this.
There is a simple converter for a_[HDM] files that can be used as-is or as a starting point for a specialized converter; it is located in /aer/db/bin/templates/read_a_X.example.
'latest' will tell you the latest data passed, and data.status can generate a report. For more detail you can use:
data.comments.get bnd none none clean | grep '^=== clean add'
To generate them from daily averages (the default):
cpx2 --config=/aer/db/etc/plots/yearly.box.xml --source=avgH --dbavg=86400 app 2009:152 2010:60
For plots with the latest year difference:
CURRENT_START=1293840000 cpx2 --config=/aer/db/etc/plots/dyearly.box.xml --source=avgH --dbavg=86400 app 2009:152 2010:60
Where 1293840000 is the Epoch time of the start of the current year (from “doy.pl”, for example).
For hourly averages:
cpx2 --config=/aer/db/etc/plots/yearly.box.xml --source=avgH app 2009:152 2010:60
To generate the default plot sets in the default locations (/aer/www/net/$STATION/images/stats), use data.plots.update:
data.plots.update --reprocess app avgH
Normally the right click menu in CPX2 has an option to smooth to hourly averages. This only smooths the final value of the variable, so for intensive parameters it is the average of the ratios, which is incorrect. To get the correct ratio of averages for edited data you can use:
cpx2 --mode=avgH --source=avgHe app 2009w50 1w
Note that switching display modes within CPX2 will cause the source to reset and no longer load edited data.
To create a new station in the DB system, use data.newstation. Note that this will use all the system defaults for configuration; override any needed configuration in /aer/db/etc/$STATION. The CPX2 configuration for the station will also be blank; the simplest way to set it up is to copy an existing configuration file to $CPX2/etc/cpx2.$STATION.xml.
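For example, for a hypothetical new station code “xyz”, using BND's CPX2 configuration as the template:

data.newstation xyz
cp $CPX2/etc/cpx2.bnd.xml $CPX2/etc/cpx2.xyz.xml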
To set up a station for legacy processing use make_new_stn.
To create a snapshot of some data:
data.get bnd S11a,A11a,N61a 2009 2010 avgH | bzip2 -c -9 - > data.bz2
To ingest that snapshot on a VM:
data.newstation -f bnd
bzcat data.bz2 | data.archive.put bnd avgH -
cpx2 --config=/aer/prg/cpx2/etc/cpx2.neph_status.xml sgp 2010:1 2010:160
This contains plots of the spancheck percent error and sensitivity factors as well as the zero background reference. The initial load may take some time for CPD1 data as it requires parsing the t* zip files.
data.get sgp N11a,S11a,N61a 2010:1 1d clean | data.consolidate --source=- N2_N11 Uc_N11 N_N61 BbsB_S11 F2_N11 | data.edit.corr.field --variables='*' --test='$variables->{Uc_N11} <= 0.15 || $variables->{Uc_N11} >= 0.25 || $variables->{F2_N11} & 0x3' --mod='map { undef $variables->{$_} if $_ ne "time" } keys(%{$variables})'
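As written, this invalidates every field in each consolidated record (everything except the time column) whenever the test matches, i.e. whenever Uc_N11 is at or outside the range 0.15–0.25 or either of the two lowest bits of the F2_N11 flags is set.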
To determine what instruments were at a station, use data.instruments.get. This reads the instrument segmentation to display a general summary of known “major” instruments. For example:
data.instruments.get mlo
This will display the summary for MLO. With the instrument names (e.g. “A11”) from this, you can get more detailed information using data.coverage. For example, to see how much PSAP data was available in 2010, you can do:
data.coverage --records=A11a --interval=quarter mlo 2010 2011
Where the “A11a” record name is defined by the instrument name as above and the “quarter” specifies the resolution (see the data.coverage documentation for available resolutions).
Add a line to corr.conf using data.edit.corr.psap_cal. In general the line will look something like:
A12a,2010-12-01,2010-12-13,A12a;A12m,data.edit.corr.psap_cal --input=A12 --old=0,0.909 --new=0.0;0.7486
data.get thd A11a,S11a 2010:001 2010:032 avgH | data.export --mode=csv --mvc-type=mvc --station=on
data.avg --interval=1m --stddev=off --count=off --contam thd S11a,A11a 2010:1 2010:20 clean | data.consolidate --source=- --regex --noautoavg '.+[^0]_.+' | data.export --mode=csv --mvc-type=mvc --station=on
Use getmet on vortex2.cmdl.noaa.gov. This is a wrapper for several DB system programs that sets up the necessary environment to call them. For example, to get one day of BRW MET data:
getmet brw 2011-01-01 2011-01-02
Or the same, but averaged to one-hour:
getmet --interval=1h brw 2011-01-01 2011-01-02
For more help on the getmet command type “getmet --help” or “perldoc getmet”.
See the wx record definition for the definitions of the variables, data.export for more information about controlling the export format, data.avg for information about controlling the averaging, and timeformat for information about the various time formats accepted by getmet.
To view and edit MET data, use cpx2.wx.
cpx2.wx mlo 2011-06-01 10d
To pass MET data (i.e., inform the system that a time period has passed QC) use data.wx.pass:
data.wx.pass mlo 2011-06-01 10d
Both are only available on systems that can run the DB system (e.g., aero.cmdl.noaa.gov).
The simplest way to update instrument metadata (manufacturer, model, and serial number) is to use CPX2's segmentation editing interface. To do this, open CPX2 for a time range that intersects the times that need to be changed. For example, if the station only has a single instrument segment or you only need to update the latest one, you can just run “cpx2 $STATION”. If you needed to edit a segment that ended on 2009-12-12, you could do “cpx2 $STATION 2009-12-11 2d”.
Once in CPX2, go to the menu “CPX2→Segmentation” or press CTRL-S to bring up the segmentation window. The instrument segments are those with the name of the instrument code (e.g. “A11”). To filter the list to only the instrument you need to change, use the drop-down box in the upper right. You can then modify the segment to change the metadata.
The format of the segment data is “$MANUFACTURER;$MODEL;$SERIALNUMBER”, for example “RR;PSAP-3W;107” for a 3-W PSAP or “TSI;3563;1001” for a 3-W Neph.
To update an existing segment for a change in the instrument, end the existing segment at the time of the change and create a new segment starting at that time containing the new “$MANUFACTURER;$MODEL;$SERIALNUMBER” data.
To get a fixed output order from data.export, or to reorder any simple CSV data, you can use the alias “colreorder”. It takes the desired output column header as a single argument:
data.get bnd S11a 2010:1 1d | data.export | colreorder Year,DOY,F_aer,BsB_S11,BsG_S11,BsR_S11
It assumes that the first line of the input contains the column names, and it will fill any columns specified in the output order that are not in the input with empty fields.
Use quantile_stats. This can generate summary statistics and plots for a given station and set of variables.
For example:
quantile_stats alt cleanstart cleanend U_S11
Use data.get and the “X” record:
data.get bnd X 2011:20 2011:25
Use data.faultreport to generate a report of the times when data exceeded predefined exceptional limits. The configuration is defined in faultreport.conf. When invoked with no explicit configuration it will use the (possibly station specific) “editquality” configuration that has some general predefined limits for extensive and intensive parameters. For example:
data.faultreport --source=avgH bnd 2010:1 1d
Power interruptions or uncontrolled shutdowns occasionally cause the CLAP to restart sampling on spot #1, even though the filter was not changed. If this happens, the transmittance values after the restart will be incorrect, causing the Bond or CTS corrections to return invalid values. The easiest way to deal with this problem is to invalidate the CLAP data for the time period after the restart until the filter was changed.
It is possible to correct the filter transmittance data after the restart by multiplying them by the transmittance when sampling on the affected spot originally ended. Note, however, that the original ending transmittance is generally close to 0.7, which means that the transmittances after the restart will be below 0.7 and likely will yield invalid data. Note furthermore that the CTS correction cannot be applied after the restart, because the system will lose track of the particle scattering and absorption optical depths prior to the restart. However, the Bond correction can still be applied.
To determine the transmittances at the end of the original sampling on the spot, use data.get to determine the approximate end time of the spot, and then use data.consolidate to display the transmittances around the ending time. A multiplicative mentor edit can then be used to multiply the transmittances after the restart by the ending transmittances when the spot was originally sampled.
For example, the site operator at BRW made a log entry on January 6, 2014 reporting that the filter was restarted:
data.get brw X 6 7 |grep USER
==>
X,BRW,1389036566,2014-01-06T19:29:26Z,"USER: Changed CLAP filter...Monday JD006 found on spot #1 when on Friday JD003 was on spot #5. "
Use data.get to retrieve spot changes from the previous three weeks:
data.get brw A12n 3w 2014:7 |data.consolidate --source=- --regex F[fn]_A12 |data.export
==>
Year,DOY,Ff_A12,Fn_A12
2013,351.36329,0057,06
2013,352.22159,0057,07
2013,355.31837,0057,08
2013,357.03351,0058,01
2013,358.38059,0058,02   <== approximate ending time of spot #1
2013,361.19267,0058,03
2013,363.89606,0058,04
2014,002.17119,0058,05
2014,003.92414,0058,01   <== approximate starting time of spot #1 after restart
2014,006.81258,0059,01   <== start time of spot #1 after filter change
This report shows that spot #1 originally ended around DOY 2013,358.38059 (i.e., the starting time of spot #2). Use data.consolidate to see the transmittance data around that time:
data.consolidate --regex brw 2013,358.38059-5m 10m F[fn]_A12 Ir[BGR]_A12 |data.export
==>
Year,DOY,Ff_A12,Fn_A12,IrB_A12,IrG_A12,IrR_A12
2013,358.37778,0058,01,0.7006955,0.7303835,0.7794262
2013,358.37847,0058,01,0.7004939,0.7302014,0.7792826
2013,358.37917,0058,01,0.7002968,0.7300103,0.7791096
2013,358.37986,0058,02,0.7000950,0.7298169,0.7789416
2013,358.38056,0058,02,0.9999361,0.9999445,0.9999444
2013,358.38125,0058,02,0.9996371,0.9996883,0.9997713
2013,358.38194,0058,02,0.9993476,0.9994285,0.9995861
2013,358.38264,0058,02,0.9990442,0.9991682,0.9993553
2013,358.38333,0058,02,0.9987232,0.9989042,0.9991759
2013,358.38403,0058,02,0.9984139,0.9986447,0.9989556
This report shows that the ending transmittances for spot #1 were 0.7000950,0.7298169,0.7789416 for the blue, green, and red channels, respectively. Multiplicative mentor edits using these factors should be applied for the time period when spot #1 was resampled, which the first report shows was from about 2014,003.924 until 2014,006.812.
The edit corrections can be entered from the mentor edit dialog in cpx2, or they can be entered from the command line using data.edit.mentor.modify.
echo 'add,"2014,3.924","2014,6.812",A12a,poly,"variables=IrB_A12 0.0 0.7000950",JAO,"correct for re-started spot 1"' |data.edit.mentor.modify brw
echo 'add,"2014,3.924","2014,6.812",A12a,poly,"variables=IrG_A12 0.0 0.7298169",JAO,"correct for re-started spot 1"' |data.edit.mentor.modify brw
echo 'add,"2014,3.924","2014,6.812",A12a,poly,"variables=IrR_A12 0.0 0.7789416",JAO,"correct for re-started spot 1"' |data.edit.mentor.modify brw
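In each “poly” edit above, the 0.0 and the ending transmittance act as the polynomial coefficients (offset and slope), so each channel is simply multiplied by its ending transmittance, which is the multiplicative correction described above.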