normal_stats

normal_stats generates statistics related to the normal distribution for variables from a station. For each variable these are the mean, the standard deviation, the A value from the Anderson-Darling test and the number of points used. They are calculated for both the data and the base 10 logarithm of the data, so each variable is compared against both a normal and a log-normal distribution.
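
As a rough illustration of the statistics listed above, the following Python sketch computes the count, mean, standard deviation and an Anderson-Darling A² value for a list of numbers and for their base 10 logarithms. It is not the code normal_stats runs: the function names are hypothetical, and the use of the sample standard deviation, the absence of any small-sample correction to A² and the dropping of non-positive values before taking logarithms are all assumptions.

```python
import math
import statistics


def anderson_darling_a2(values):
    """A^2 against a normal distribution fitted to the sample itself."""
    n = len(values)
    mean = statistics.fmean(values)
    std = statistics.stdev(values)  # sample standard deviation (assumption)
    z = sorted((v - mean) / std for v in values)
    total = 0.0
    for i, zi in enumerate(z, start=1):
        # ln F(z_i), with F the standard normal CDF
        log_cdf = math.log(0.5 * math.erfc(-zi / math.sqrt(2)))
        # ln(1 - F(z_(n+1-i))); z[n - i] is that element in 0-based indexing
        log_sf = math.log(0.5 * math.erfc(z[n - i] / math.sqrt(2)))
        total += (2 * i - 1) * (log_cdf + log_sf)
    return -n - total / n


def summarize(values):
    """Needs at least two distinct values, otherwise std is undefined or zero."""
    return {
        "n": len(values),
        "mean": statistics.fmean(values),
        "std": statistics.stdev(values),
        "A": anderson_darling_a2(values),
    }


def normal_and_lognormal(values):
    # Non-positive values have no logarithm; skipping them is an assumption.
    logs = [math.log10(v) for v in values if v > 0]
    return {"linear": summarize(values), "log10": summarize(logs)}
```

A smaller A value indicates a closer fit, so comparing the "linear" and "log10" results suggests which of the two distributions describes a variable better.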

Additionally, it can split the data into a series of selection intervals (to show the change over time) and bin the data within each interval by a number of natural time divisions. The two can be combined, so the time series of all Mondays and Tuesdays by year is available by specifying “year” as the selection interval, “wday” as the binning and “1,2” as the bin selection (Monday = 1).

Command Line Usage

normal_stats [--binning=<units>] [--bin-select=1,2,3,...] 
             [--select=<units>] [--source=avgH] [--output=csv|cpd2|report]
             station start end [Variable1 Variable2 ...]

Arguments

start and end

The time specifiers for the data to be retrieved. Start is inclusive while end is exclusive, so all data contained within the half open interval [start,end) will be used. Any convertible time format is accepted.
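
The half open interval can be pictured with a short Python sketch. It assumes the bounds have already been parsed into datetime objects; how normal_stats parses its time formats is not shown here, and the function name is hypothetical.

```python
from datetime import datetime


def in_range(ts, start, end):
    """True when ts lies in the half open interval [start, end)."""
    return start <= ts < end


start = datetime(2008, 1, 1)
end = datetime(2010, 1, 1)

first_included = in_range(datetime(2008, 1, 1), start, end)          # start is inclusive
last_second = in_range(datetime(2009, 12, 31, 23, 59, 59), start, end)
end_excluded = in_range(datetime(2010, 1, 1), start, end)            # end is exclusive
```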

station

The station identifier code. Case insensitive.

Variable1 Variable2 ...

List of variables to generate statistics for. These are regular expressions defining variables as per data.consolidate. Note that expressions for variables split by cut size must make the cut selection optional (e.g. “[01]?”), otherwise the variables will not be located.

The default is all extensive parameters and their standard deviations, SSA, and the scattering Blue-Red Angstrom exponent.
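
The effect of making the cut selection optional can be seen with Python's re module. This is only a sketch: the variable names in the list are hypothetical, whole-name matching is an assumption, and the regular expression dialect used by data.consolidate may differ from Python's.

```python
import re

# Hypothetical variable names, with and without a cut size digit.
names = ["BsG_S11", "BsG0_S11", "BsG1_S11", "BsB0_S11"]


def select(pattern, candidates):
    """Return the names that the pattern matches in full (assumed behaviour)."""
    rx = re.compile(pattern)
    return [name for name in candidates if rx.fullmatch(name)]


optional_cut = select(r"BsG[01]?_S11", names)   # cut digit may be absent
required_cut = select(r"BsG[01]_S11", names)    # cut digit must be present
```

With the optional form the name without a cut digit is still selected; with the required form it is missed.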

--source=avgH

Set the source archive, defaulting to avgH.

--binning=<units>

Select the binning within each selection interval. The possible units are:

  • mo, mon or month - Month, Jan = 1
  • d, day, mday - Day of month
  • week or wday - Day of week, Monday = 1
  • yday or doy - Day of year, Jan 1st = 1
  • h or hour - Hour of day, Midnight = 0
  • m or min - Minute of hour
  • s or sec - Second of minute

Defaults to no binning (include the entire selection interval).
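
The bin numbering in the list above can be sketched in Python as a mapping from a timestamp to a bin number. The names BIN_FUNCS, bin_number and keep are hypothetical, only one spelling of each unit is shown, and the sketch follows the numbering stated in the list rather than the program's actual code.

```python
from datetime import datetime

# One entry per unit, using the numbering given in the list above.
BIN_FUNCS = {
    "month": lambda t: t.month,                # Jan = 1
    "mday": lambda t: t.day,                   # day of month
    "wday": lambda t: t.isoweekday(),          # Monday = 1
    "yday": lambda t: t.timetuple().tm_yday,   # Jan 1st = 1
    "hour": lambda t: t.hour,                  # midnight = 0
    "min": lambda t: t.minute,
    "sec": lambda t: t.second,
}


def bin_number(ts, unit):
    return BIN_FUNCS[unit](ts)


def keep(ts, unit, selected=None):
    """selected=None mimics the default of outputting all bins."""
    return selected is None or bin_number(ts, unit) in selected


monday = datetime(2008, 12, 1, 6, 30)  # 1 December 2008 was a Monday
```

For example, keep(monday, "month", {12}) corresponds to the kind of filtering requested by --binning=month --bin-select=12.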

--bin-select=1,2,3,...

Select the bin numbers to output, defaulting to all bins.

--select=<units>

Set the interval to select data on. Intervals are output in ascending time order. Available units are:

  • y or year
  • mo, mon or month
  • d or day
  • h or hour
  • m or min
  • s or sec

Defaults to no selection (include the entire specified data range).
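
How a selection interval combines with binning can be sketched in Python as follows. Each sample is assigned to the start of its interval, an optional weekday filter stands in for --binning=wday with --bin-select, and the groups are returned in ascending time order. All names are hypothetical and the sample data is made up purely for illustration.

```python
from collections import defaultdict
from datetime import datetime

_FIELDS = ["year", "month", "day", "hour", "min", "sec"]
_DEFAULTS = [1, 1, 1, 0, 0, 0]


def interval_start(ts, unit):
    """Truncate ts to the start of its selection interval."""
    n = _FIELDS.index(unit) + 1
    parts = [ts.year, ts.month, ts.day, ts.hour, ts.minute, ts.second][:n]
    return datetime(*(parts + _DEFAULTS[n:]))


def group_by_interval(samples, unit, weekdays=None):
    """Group (timestamp, value) pairs by interval, optionally keeping only
    the given ISO weekdays (Monday = 1)."""
    groups = defaultdict(list)
    for ts, value in samples:
        if weekdays is not None and ts.isoweekday() not in weekdays:
            continue
        groups[interval_start(ts, unit)].append(value)
    return sorted(groups.items())  # ascending by interval start


samples = [
    (datetime(2009, 1, 5), 1.0),   # Monday
    (datetime(2008, 12, 1), 2.0),  # Monday
    (datetime(2008, 12, 3), 3.0),  # Wednesday, filtered out below
    (datetime(2008, 12, 2), 4.0),  # Tuesday
]
by_year = group_by_interval(samples, "year", weekdays={1, 2})
```

Each resulting group would then be summarized separately, giving one output line per interval.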

--output=csv|cpd2|report

Set the output mode. Defaults to CSV mode when a selection interval is set, or to report mode when none is set (that is, when the result is a single line).

CSV mode outputs a header as the first line, and the first field of each following line is a time stamp. Report mode is the same as CSV mode with the output passed through csv_transpose.
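
The relationship between the two modes can be pictured as a transposition of the CSV rows and columns. The Python sketch below assumes that csv_transpose does a plain row/column swap, which is inferred from its name only; the field names in the usage line are hypothetical and do not reflect the real header.

```python
import csv
import io


def transpose_csv(text):
    """Swap rows and columns of CSV text (assumed csv_transpose behaviour).
    Rows of unequal length are truncated to the shortest one by zip()."""
    rows = list(csv.reader(io.StringIO(text)))
    out = io.StringIO()
    csv.writer(out, lineterminator="\n").writerows(zip(*rows))
    return out.getvalue()


report = transpose_csv("time,mean,sd\n2008,1.5,0.2\n")
```

A single CSV data line therefore becomes one "name,value" line per field, which is easier to read when no selection interval is set.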

Example Usage

Default output report

normal_stats sgp 2008 2010

Only December across multiple years

normal_stats --binning=month --bin-select=12 sgp 2008 2010

All Mondays and Tuesdays by year for a single variable

normal_stats --binning=wday --bin-select=1,2 --select=year sgp 2008 2010 'BsG0?_S11'

Yearly values for multiple variables

normal_stats --select=year sgp 2008 2010 'BsG0?_S11' 'BaG[01]?_A11'