data.avg

data.avg can average, fill, and convert data to be rectangular (a single record type). It always converts data to be rectangular, but if the averaging interval is less than or equal to the spacing of the input data, the output is just the input (possibly filled). The result is written to standard output.

The normal mode of operation is to average on a one-hour interval with no filling, with variables split by cut size and a standard deviation generated for each. data.avg attempts to align averaging bins on sensible time boundaries (e.g. one-hour bins start on the hour).

It can either request data from data.get itself or operate on its standard input (as part of a pipe).

Command Line Usage

data.avg [--interval=seconds|month] [--stddev=on/off | --nostddev] 
         [--cut=on/off | --nocut] [--cutflags] 
         [--ignorecut=pattern1,pattern2,...]
         [--count=on/off] [--source=archive] [--fill]
         [--noalign] [--contam] [--cpx]
         [station rec start end]

Arguments

If the station, record, start, and end are omitted, data.avg works on standard input instead of requesting data. Specifying a toggle argument without an “on/off” value enables that toggle. That is, “--count” turns on count generation.
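The toggle rule above can be sketched in a few lines of Python. This is only an illustration of the behaviour described here, not data.avg's actual argument parser; the function name parse_toggle and the NEGATABLE set are hypothetical, and the set simply mirrors the two “--no…” forms listed in the usage summary.

```python
# Toggles that the usage summary lists with a "--no<name>" form.
NEGATABLE = {"stddev", "cut"}

def parse_toggle(arg, name):
    """Return True/False if `arg` sets the toggle `name`, else None."""
    if arg == f"--{name}":
        return True                      # bare toggle enables it
    if name in NEGATABLE and arg == f"--no{name}":
        return False                     # e.g. --nostddev, --nocut
    if arg.startswith(f"--{name}="):
        value = arg.split("=", 1)[1]
        if value not in ("on", "off"):
            raise ValueError(f"expected on/off, got {value!r}")
        return value == "on"
    return None                          # argument is not this toggle

print(parse_toggle("--count", "count"))
print(parse_toggle("--count=off", "count"))
print(parse_toggle("--nostddev", "stddev"))
```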

start and end

The time specifiers for the data to be retrieved. Start is inclusive and end is exclusive, so all data within the half-open interval [start,end) is returned. Any convertible time format is accepted.
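As a small illustration of the half-open interval, the following Python sketch selects timestamps in [start,end). The times are plain epoch-style seconds chosen for the example; in_range is a hypothetical helper, not part of data.avg.

```python
def in_range(t, start, end):
    """True when t lies in the half-open interval [start, end)."""
    return start <= t < end

# Sample record times, 30 minutes apart.
times = [0, 1800, 3600, 5400, 7200]

# A record exactly at `start` is kept; one exactly at `end` is not.
selected = [t for t in times if in_range(t, 3600, 7200)]
print(selected)  # [3600, 5400]
```

One consequence of this convention is that back-to-back requests such as [a,b) and [b,c) never return the same record twice.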

station

The station identifier code. For example 'brw'. Case insensitive.

records

The cpd2 record type to be retrieved. For example: 'S11a'. Case sensitive. Multiple record types may be separated by “,”, “;”, or “:”. Note that this is a single argument and that spaces are not allowed.

--interval=seconds|month

Averaging interval in seconds, defaulting to 3600. If the interval is divisible by 60 seconds, the minute field of the first bin's start is rounded down. If it is divisible by 3600, the hour field is also rounded down, and likewise the day field if the interval is divisible by a whole day. May also be “month” to average on month intervals.
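The alignment rule can be read in more than one way. The Python sketch below takes one reading: “rounded down” means truncating the first bin's start to the enclosing minute, hour, or day, depending on what the interval divides into. That reading, the use of UTC, and the helper name align_first_bin are assumptions; the real behaviour may differ for intervals such as 300 or 7200 seconds.

```python
from datetime import datetime, timezone

def align_first_bin(start, interval):
    """Truncate the first bin start according to the interval (seconds)."""
    aligned = start
    if interval % 60 == 0:       # whole minutes: drop the seconds
        aligned = aligned.replace(second=0, microsecond=0)
    if interval % 3600 == 0:     # whole hours: also drop the minutes
        aligned = aligned.replace(minute=0)
    if interval % 86400 == 0:    # whole days: also drop the hours
        aligned = aligned.replace(hour=0)
    return aligned

t = datetime(2008, 10, 1, 13, 47, 25, tzinfo=timezone.utc)
print(align_first_bin(t, 3600))    # hourly bins start on the hour
print(align_first_bin(t, 86400))   # daily bins start at midnight
print(align_first_bin(t, 45))      # not minute-divisible: unchanged
```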

--stddev=on/off --nostddev

Enable or disable (default enabled) standard deviation calculation for each field. For each field, a separate field is generated that contains that field's standard deviation for the record.

--cut=on/off --nocut

Enable or disable (default enabled) cut size splitting. If cut size splitting is enabled, all fields will be separated into 0 (default/coarse/10um) and 1 (flagged/fine/1um) variants based on the cut size flag for that record.

--cutflags

Split flags based on the cut size (default off).

--ignorecut=pattern1,pattern2,...

Set patterns on which the cut split is ignored. If a variable matches the regular expression formed by anchoring the pattern as “^pattern$”, cut size splitting is not done on that variable.
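The effect of the “^pattern$” anchoring is that a pattern must match the whole variable name, not just part of it. A minimal Python sketch follows, assuming Python's re syntax is close enough to the tool's regular expression dialect for simple patterns; the variable names and the helper cut_split_ignored are made up for illustration.

```python
import re

def cut_split_ignored(variable, patterns):
    """True if any pattern, anchored as ^pattern$, matches the variable."""
    return any(re.match(f"^{p}$", variable) for p in patterns)

# Equivalent of --ignorecut=T_.*,P_S11 (hypothetical variable names).
patterns = "T_.*,P_S11".split(",")

for name in ["T_S11", "P_S11", "P_S11a", "BsG_S11"]:
    print(name, cut_split_ignored(name, patterns))
```

Here “P_S11a” is still cut split, because the anchored pattern “^P_S11$” does not match the trailing “a”.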

--count=on/off

Enable or disable (default disabled) generating a count field for each variable (after cut splitting). The count field is the variable name with an “N” added to the end.

--source=archive

Select the source archive to request data from, defaulting to clean. Has no effect when working on standard input.

--fill

Enable filling of missing time ranges. This fills time ranges that are missing entirely from the source archive with MVCs. Alignment and normal binning are preserved. That is, a gap of two hours when generating one-hour averages results in two records of all MVCs.
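The two-hour-gap example can be sketched in Python as follows. This only illustrates the idea of emitting one all-MVC record per missing bin; the MVC placeholder, the row layout, and the helper fill_missing_bins are assumptions, not data.avg's internal representation.

```python
MVC = None  # stand-in for whatever marker the real output uses

def fill_missing_bins(rows, start, end, interval, n_fields):
    """Return one (bin_start, row) pair for every bin in [start, end)."""
    filled = []
    for bin_start in range(start, end, interval):
        row = rows.get(bin_start)
        if row is None:
            row = [MVC] * n_fields   # bin missing entirely: all MVCs
        filled.append((bin_start, row))
    return filled

# One-hour averages with the bins at 3600 and 7200 missing (a two-hour gap).
rows = {0: [1.0, 2.0], 10800: [3.0, 4.0]}
for bin_start, row in fill_missing_bins(rows, 0, 14400, 3600, 2):
    print(bin_start, row)
```

The bin starts stay on the same grid as the existing records, which is what preserving alignment and normal binning amounts to in this sketch.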

--noalign

Disable bin alignment as described above.

--contam

Enable averaging of contaminated data.

--cpx

Pass --cpx to data.get.

Example Usage

Single record one hour average

data.avg sgp S11a 2008:10 2008:11

Multiple records of raw data in one day average

data.avg --interval=86400 --source=raw bnd S11a,A11a 2003W02 2003W03

Filling data from a pipe to a file

data.get sgp A11a 2008 2009 raw | data.avg --fill > 2008_hourly