spikefilter.conf

This file format defines sets of processes used to generate segments around detected spikes. It is handled directly by data.segmentation.spikefilter, which is called on different instances by data.editquality and data.edit.mentor.generate.spikefilter.

Format

Lines beginning with '#' are treated as comments. Each non-comment line is a comma-separated (CSV) list consisting of a key followed by its values; keys are case insensitive. The file is further broken down into a number of nested definitions (scopes), each defining part of the filter. The highest level, the “set” level, defines a complete set of filters.

Within each set there can be a number of independent filters, each of which gets a vote on whether a specific point is part of a spike. These votes are combined with a voting function and checked against a threshold. If they exceed the threshold, the value can be “bled” to neighboring points and the process repeated. By default each filter has a vote weight of one and no bleeding is done; that is, if any filter marks a point as a spike, the output point is considered a spike.

Set Scope Definition

A set is begun with the key “BeginSet” and ends with the key “EndSet”. It can take up to four parameters: the start, the end, the tag name, and the tag records, in that order.
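A minimal Python sketch of reading this format. It assumes each line is a key followed by its values, that a bare scope name (for example Limits or VoteBleeding) opens a nested scope closed by End, and that the scope-opening keys are exactly the hypothetical SCOPE_KEYS set below. The record and variable names in the sample are made up, and the real data.segmentation.spikefilter handler may parse the file differently.

```python
import csv
import io

# Assumption: keys that open a nested scope terminated by "End". The names come
# from the scope descriptions in this document; the real handler may differ.
SCOPE_KEYS = {"votebleeding", "limits", "residualsplp", "probitsvm",
              "votescalar", "voteweight", "noiseestimate"}


def parse_spikefilter(text):
    """Parse spikefilter.conf-style text into a list of set dictionaries.

    Every scope is a dict holding "keys" (lower-cased key -> list of values)
    and "scopes" (list of (lower-cased scope name, child scope) pairs).
    """
    sets = []
    stack = []  # open scopes, innermost last
    for row in csv.reader(io.StringIO(text)):
        if not row or not row[0].strip() or row[0].lstrip().startswith("#"):
            continue  # blank line or comment
        key = row[0].strip().lower()  # keys are case insensitive
        values = [v.strip() for v in row[1:]]
        if key == "beginset":
            # Up to four parameters: start, end, tag name, tag records.
            params = (values + [""] * 4)[:4]
            scope = {"params": dict(zip(("start", "end", "tag", "records"), params)),
                     "keys": {}, "scopes": []}
            sets.append(scope)
            stack = [scope]
        elif not stack:
            raise ValueError(f"key {row[0]!r} outside BeginSet/EndSet")
        elif key == "endset":
            stack = []
        elif key == "end":
            if len(stack) < 2:
                raise ValueError("End without an open scope")
            stack.pop()
        elif key in SCOPE_KEYS:
            child = {"keys": {}, "scopes": []}
            stack[-1]["scopes"].append((key, child))
            stack.append(child)
        else:
            stack[-1]["keys"][key] = values
    return sets


# Hypothetical example; the record and variable names are made up.
EXAMPLE = """\
# Flag anything negative
BeginSet,,,invalidate,S11a
VoteThreshold,1.0
Limits
Min,0
Variables,BsG_S11
End
EndSet
"""

sets = parse_spikefilter(EXAMPLE)
```

Representing every scope with the same "keys"/"scopes" shape keeps the parser independent of which keys a given filter accepts; validation of individual keys is left to the caller.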

Start and End

Determines the times for which this set of filters is applied to any input. Either bound can be zero or blank to indicate an unlimited time; otherwise, it can be in any convertible time format.
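A small sketch of the Start/End rule. It assumes both bounds have already been converted to numeric epoch seconds and that the range is half-open (start inclusive, end exclusive); neither detail is specified here, and set_applies is a hypothetical name.

```python
def set_applies(t, start=None, end=None):
    """Return True if time t falls inside a set's [start, end) bounds.

    A bound of 0, None or "" means unlimited on that side.
    """
    def unlimited(bound):
        return bound in (None, "", 0)

    if not unlimited(start) and t < start:
        return False
    if not unlimited(end) and t >= end:
        return False
    return True
```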

Tag Name

The “name” or “type” of any segments that this set of filters generates. See the output format. This can be used by a calling program to determine handling of the segments. For example, data.edit.mentor.generate recognizes “invalidate” and “contaminate” in this field for the two types of edits it can generate. In that example, it ignores the tag records and variables (below) for contaminate edits, but uses them for invalidate edits.

Tag Records

The records tagged in the output for this filter set; these follow the tag name in the output format. By default these are all the records used by the filters.

Set Scope

There are several direct keys that can be placed within a set scope as well as several other scopes that can be nested within it:

  • VoteThreshold - A numeric key determining the threshold for triggering, defaulting to 1.0.
  • TagVariables - A list of variables to tag in the output.
  • RemoveMentorEdits - Causes any invalidating mentor edits not to be processed in filters. Optionally takes a list of variables to handle, defaulting to all variables used.
  • RemoveNephErrors - Excludes from processing any data recorded while a neph is reporting an error condition. Optionally takes a list (separated by “;” or “:”) of nephs and another list (separated by “;” or “:”) of variables to handle. The nephs default to “S11” and the variables default to all scattering and backscattering for the R, G, and B channels.
  • VoteBleeding … End - Defines the scope specifying the bleeding function of votes.
  • Limits … End - Defines the scope of a filter that triggers whenever variables go beyond predefined constant values.
  • ResidualSPLP … End - Defines the scope of a filter that triggers based on the residual from a single pole low pass digital filter.
  • ProbitSVM … End - Defines the scope of a filter based on an SVM regression fit (epsilon regression using radial basis functions) that estimates the noise from a probit boundary of the residual and triggers on a threshold of that noise.
  • Any of the general keys below, to set that value for all filters.

General Keys

These keys are valid in both the global set scope and specific filter scopes:

  • Cut - If “true” or non-zero then the input variables are split based on cut size and the filters are run independently on each (if supported).
  • RemoveContam and AllowContam - Control whether contaminated values are used in the filters being run. If set to “true” or non-zero, RemoveContam excludes contaminated values and AllowContam allows them.

VoteBleeding Scope

All keys until “End” are part of the scope. The scope defines the bleeding function applied when any vote value exceeds the threshold. Bleeding allows a large vote to cause nearby points to trigger as well. The following keys are valid within the scope:

  • Forward - The number of points forward in time to apply bleeding to.
  • Backward - The number of points backward in time to apply bleeding to.
  • Distance - The number of points forward and backward to apply bleeding to (effectively sets both forward and backward above).
  • Amount - The fraction of the vote value to bleed into points.
  • Iterations or Itter - The number of bleeding iterations to perform.
  • Threshold - The threshold to apply bleeding at.
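One possible reading of the bleeding keys, as a Python sketch. It assumes that on each iteration every point whose vote exceeds Threshold adds Amount times its vote to each point up to Backward points before it and Forward points after it, with the sources taken from a snapshot so bleeding does not cascade within a single iteration. The function name and the additive rule are assumptions. With the defaults (no forward or backward spread) the votes are unchanged, matching the documented no-bleeding default.

```python
def bleed(votes, threshold, forward=0, backward=0, amount=0.0, iterations=1):
    """Spread a fraction of above-threshold votes into neighbouring points."""
    votes = list(votes)
    n = len(votes)
    for _ in range(iterations):
        snapshot = list(votes)  # sources for this iteration only
        for i, v in enumerate(snapshot):
            if v <= threshold:
                continue  # only votes exceeding the threshold bleed
            for j in range(max(0, i - backward), min(n, i + forward + 1)):
                if j != i:
                    votes[j] += amount * v
    return votes
```

Distance would simply set forward and backward to the same value before calling this.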

VoteFunction Scope

All keys until “End” are part of the scope. This scope defines how the input vote value of a point (usually proportional to the confidence of the filter) is transformed into the value passed to the global voting handler. It can consist of the following keys:

  • Constant - Add this constant value regardless of the input value (default 1.0).
  • Prescalar or Fraction - Multiply the noise normalized value by this before any other handling is done (default 1.0).
  • Linear or LinearWeight - Multiply the input value by this (default 0.0).
  • LinearLimit or LinearMax - If enabled (greater than zero) the maximum value the linear component can have (before weighting).
  • Log or LogWeight - The weight of the natural log of the value plus one added (default 0.0).
  • LogA or LogConstant - The constant to multiply the value by inside the log (default 1.0).
  • Exp or ExpWeight - The weight of the exponential of the value plus one added (default 0.0).
  • ExpA or ExpConstant - The constant to multiply the value by inside the exp (default 1.0).
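A sketch of the VoteFunction transform as it reads from the key list. Prescalar is applied first, the linear term is capped by LinearLimit when that is positive, and the Log term is taken as LogWeight * ln(LogA * x + 1), which is an assumed reading. The Exp and ExpA keys are left out because the description does not make clear where the “plus one” applies. The function and parameter names are hypothetical.

```python
import math


def vote_function(x, constant=1.0, prescalar=1.0, linear=0.0, linear_max=0.0,
                  log_weight=0.0, log_a=1.0):
    """Map a filter's noise-normalized vote value to the vote it passes on.

    Defaults mirror the documented VoteFunction defaults; Exp/ExpA omitted.
    """
    x *= prescalar  # Prescalar/Fraction: applied before any other handling
    lin = min(x, linear_max) if linear_max > 0 else x  # LinearLimit, pre-weighting
    out = constant + linear * lin
    if log_weight:
        out += log_weight * math.log(log_a * x + 1.0)
    return out
```

With the defaults every triggering point contributes exactly 1.0, which is what makes a single filter's vote enough to reach the default threshold.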

VoteWeight Scope

All keys until “End” are part of the scope. This scope defines the weighting of a specific filter's vote output into the global vote value. That value is the sum of all the weighted inputs and is compared against the global threshold. It can consist of the following keys:

  • Linear or LinearWeight - Multiply the input value by this (default 1.0).
  • LinearLimit or LinearMax - If enabled (greater than zero) the maximum value the linear component can have (before weighting).
  • Log or LogWeight - The weight of the natural log of the value plus one added (default 0.0).
  • LogA or LogConstant - The constant to multiply the value by inside the log (default 1.0).
  • Exp or ExpWeight - The weight of the exponential of the value plus one added (default 0.0).
  • ExpA or ExpConstant - The constant to multiply the value by inside the exp (default 1.0).
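The following sketch shows how per-filter weights could feed the global vote described above: each filter's vote passes through a VoteWeight transform (by default the vote itself), the results are summed per point, and the sum is compared with VoteThreshold. Using >= rather than > is an inference from the documented default that any single filter voting 1.0 marks a spike. The Exp terms are omitted because their exact form is not specified, the 0.0-for-unflagged representation is an assumption, and all names are hypothetical.

```python
import math


def vote_weight(v, linear=1.0, linear_max=0.0, log_weight=0.0, log_a=1.0):
    """Weight one filter's output vote (VoteWeight defaults; Exp terms omitted)."""
    lin = min(v, linear_max) if linear_max > 0 else v
    out = linear * lin
    if log_weight:
        out += log_weight * math.log(log_a * v + 1.0)
    return out


def combine_votes(filter_votes, threshold=1.0):
    """Flag points whose summed, weighted vote reaches the set's threshold.

    filter_votes holds one equal-length list per filter, with 0.0 wherever
    the filter did not flag the point.
    """
    n = len(filter_votes[0])
    totals = [sum(vote_weight(votes[i]) for votes in filter_votes) for i in range(n)]
    return [total >= threshold for total in totals]
```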

ResidualNoiseFunction Scope

All keys until “End” are part of the scope. This scope defines the transformation from a residual of some smoother to a single value representing the noise of the variable. The output is the sum of the values transformed by the parameters below, divided by the number of points used. The following keys are valid:

  • Perc, Percentile, or Fraction - The fraction of the sorted residuals to use. A value below 1.0 discards the largest residuals (presumably those that are spikes) (default 0.9).
  • Linear or LinearWeight - Multiply the input value by this (default 1.0).
  • LinearLimit or LinearMax - If enabled (greater than zero) the maximum value the linear component can have (before weighting).
  • Log or LogWeight - The weight of the natural log of the value plus one added (default 0.0).
  • LogA or LogConstant - The constant to multiply the value by inside the log (default 1.0).
  • Exp or ExpWeight - The weight of the exponential of the value plus one added (default 0.0).
  • ExpA or ExpConstant - The constant to multiply the value by inside the exp (default 1.0).
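A sketch of the ResidualNoiseFunction as described: sort the residuals, keep the smallest Perc of them, transform each value, and divide the sum by the number kept. Taking absolute residuals and rounding the kept count are assumptions, the Exp terms are omitted because their exact form is not specified, and the names are hypothetical.

```python
import math


def residual_noise(residuals, fraction=0.9, linear=1.0, linear_max=0.0,
                   log_weight=0.0, log_a=1.0):
    """Reduce a smoother's residuals to one noise value (Exp terms omitted)."""
    if not residuals:
        raise ValueError("no residuals")
    ordered = sorted(abs(r) for r in residuals)  # magnitudes: an assumption
    keep = max(1, round(len(ordered) * fraction))  # drop the largest (spikes)
    used = ordered[:keep]
    total = 0.0
    for r in used:
        lin = min(r, linear_max) if linear_max > 0 else r
        total += linear * lin
        if log_weight:
            total += log_weight * math.log(log_a * r + 1.0)
    return total / len(used)
```

With the defaults this is the mean absolute residual after discarding the largest 10%, so a single large spike does not inflate the noise estimate.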

Limits Scope

All keys until “End” are part of the scope. The scope defines a spike trigger that fires whenever any of its input variables falls below a minimum or exceeds a maximum value. Points that fail a limits check are not considered by the other filters. The following keys are valid:

  • VoteScalar - Defines the VoteFunction for this filter as above. The input value is the triggering value, normalized by the difference between the maximum and minimum or, if only one of them is defined, by that one.
  • VoteWeight - Defines the VoteWeight for the output of this filter.
  • Min - The minimum value; the filter triggers when any variable falls below it. Defaults to no limit.
  • Max - The maximum value; the filter triggers when any variable exceeds it. Defaults to no limit.
  • Variables - A list of variables to consider.
  • Records - A list of the records containing the variables (if not already present from another loading record).
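A sketch of the Limits trigger and its normalization factor. None stands for an unset bound (the documented default of no limit); how the filter uses the normalization factor beyond that is not covered, and the names are hypothetical.

```python
def limits_trigger(values, vmin=None, vmax=None):
    """Per-point trigger mask: below vmin or above vmax (None = no limit)."""
    return [(vmin is not None and v < vmin) or (vmax is not None and v > vmax)
            for v in values]


def limits_normalization(vmin=None, vmax=None):
    """Normalization factor: max - min, or whichever bound is defined."""
    if vmin is not None and vmax is not None:
        return vmax - vmin
    return vmax if vmax is not None else vmin
```

Points flagged here would then be removed before the remaining filters run.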

ResidualSPLP Scope

All keys until “End” are part of the scope. The scope defines a spike trigger based on the residual from a single pole low pass digital filter, run forwards and backwards in time. The noise estimate is based on a median smoother. The following keys are valid:

  • NoiseMedian - The number of points to median smooth, default 5.
  • NoiseEstimate - The ResidualNoiseFunction scope defining the noise estimate.
  • VoteScalar - Defines the VoteFunction for this filter as above. The input value is the triggering value, normalized by the difference between the maximum and minimum or, if only one of them is defined, by that one.
  • VoteWeight - Defines the VoteWeight for the output of this filter.
  • TC or TimeConstant - The time constant of the filter, default 10.
  • BackoutNoiseFactor - A point whose residual exceeds this value times the noise estimate is backed out (not added to the running smoother), default 4.0.
  • TriggerNoiseFactor - A residual of more than this value times the noise estimate causes a point to trigger a spike vote, default 2.0.
  • Require - A list of names whose results are required before this filter runs; those results are not themselves considered. This can be used to sequence filters (for example, CCN flows provide a result that CCN counts require).
  • Provide - A list of names that this filter provides results for, used by any that require them.
  • Variables - A list of variables to consider.
  • Records - A list of the records containing the variables (if not already present from another loading record).
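A compact Python sketch of the ResidualSPLP idea, under several assumptions not stated in this document: TC is measured in samples (so the smoothing factor is 1 - exp(-1/TC)), the noise is the default ResidualNoiseFunction applied to residuals from a NoiseMedian-point running median, each pass starts from the median-smoothed value, and the forward and backward passes are averaged before the trigger test. Require/Provide sequencing, vote scaling, and multiple variables are not handled, and all names are hypothetical.

```python
import math
import statistics


def median_smooth(x, width=5):
    """Centred running median; the window is truncated at the ends."""
    half = width // 2
    return [statistics.median(x[max(0, i - half):i + half + 1]) for i in range(len(x))]


def splp_pass(x, start, alpha, backout):
    """One single-pole low-pass pass; points with large residuals are backed out."""
    out, y = [], start
    for v in x:
        if abs(v - y) <= backout:
            y += alpha * (v - y)  # y[n] = y[n-1] + alpha * (x[n] - y[n-1])
        out.append(y)
    return out


def residual_splp(x, tc=10.0, noise_median=5, backout_factor=4.0,
                  trigger_factor=2.0, fraction=0.9):
    """Return a per-point spike mask using the documented key defaults."""
    if not x:
        return []
    smooth = median_smooth(x, noise_median)
    # Noise: mean of the smallest `fraction` of |x - median| (default noise function).
    res = sorted(abs(a - b) for a, b in zip(x, smooth))
    kept = res[:max(1, round(len(res) * fraction))]
    noise = sum(kept) / len(kept)
    alpha = 1.0 - math.exp(-1.0 / tc)  # assumes tc is measured in samples
    limit = backout_factor * noise
    fwd = splp_pass(x, smooth[0], alpha, limit)
    bwd = splp_pass(x[::-1], smooth[-1], alpha, limit)[::-1]
    # Assumption: the forward and backward passes are averaged.
    return [abs(v - (f + b) / 2) > trigger_factor * noise
            for v, f, b in zip(x, fwd, bwd)]
```

Backing out large residuals keeps a spike from dragging the smoother toward it, which is why the trigger residual at the spike stays large.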

ProbitSVM Scope

All keys until “End” are part of the scope. The scope defines a spike trigger based on the residual from an SVM regression fit (epsilon regression with radial basis functions) of the variables. The trigger threshold is based on some portion of the residuals in cumulative frequency space (in probit values). The following keys are valid:

  • PresmoothMedian - The number of points to median smooth before anything else, default 5.
  • VoteScalar - Defines the VoteFunction for this filter as above. The input value is the triggering value, normalized by the difference between the maximum and minimum or, if only one of them is defined, by that one.
  • VoteWeight - Defines the VoteWeight for the output of this filter.
  • Gamma or g - The SVM regression gamma, default 300.
  • Epsilon or e - The SVM regression epsilon, default 0.025.
  • Variability or ProbitVariablityCDF - The cumulative fraction of the data to use as a noise estimate, in CDF space. The default of 0.0547993 corresponds to a probit value of 1.6.
  • Threshold or ProbitThresholdFraction - The multiple of the value at the variability point to use as the spike threshold. The default of 5 corresponds to a probit threshold of 8, assuming the default variability (8/1.6 = 5).
  • MinThreshold - The absolute minimum the threshold can be. It will always be this large even if the probit variability estimate generates a smaller one. Default 0.5.
  • MinThresholdFraction - Sets a minimum for the threshold as a fraction of the global median of the data. Default 0.25 (the threshold must be at least 25% of the median of the data).
  • Require - A list of names whose results are required before this filter runs; those results are not themselves considered. This can be used to sequence filters (for example, CCN flows provide a result that CCN counts require).
  • Provide - A list of names that this filter provides results for, used by any that require them.
  • Variables - A list of variables to consider.
  • Records - A list of the records containing the variables (if not already present from another loading record).
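The probit defaults can be checked with Python's statistics.NormalDist: the standard normal CDF at -1.6 is about 0.0548, and a threshold of 5 times a 1.6-sigma noise estimate is an 8-sigma threshold. The sketch below also combines the threshold floors. It assumes the noise estimate is the magnitude of the residual at the Variability point of the empirical CDF (nearest rank) and that the three limits are combined with max(); the SVM fit that produces the residuals is not shown, and probit_threshold is a hypothetical name.

```python
import statistics

# Documented defaults: Phi(-1.6) is about 0.0547993, and 8 / 1.6 = 5.
DEFAULT_CDF = statistics.NormalDist().cdf(-1.6)


def probit_threshold(residuals, data, variability=0.0547993, threshold=5.0,
                     min_threshold=0.5, min_threshold_fraction=0.25):
    """Spike threshold for a ProbitSVM-style filter, given the fit residuals."""
    ordered = sorted(residuals)
    idx = min(len(ordered) - 1, int(variability * len(ordered)))
    noise = abs(ordered[idx])  # lower-tail residual at the variability point
    floor = max(min_threshold, min_threshold_fraction * statistics.median(data))
    return max(threshold * noise, floor)
```

For normally distributed residuals with standard deviation sigma, the residual at the 0.0548 CDF point is about -1.6 sigma, so the default threshold works out to roughly 8 sigma unless one of the floors is larger.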