## 3.5 Experimental week 2 forecasts of extreme events using the operational NCEP ensemble

The existence of a significant spread-skill relationship (at relatively short forecast ranges) means that changes of both the mean and width of the forecast probability distribution from their climatological values can be used to estimate the probability that the verification will lie in the tails of the climatological probability distribution (see Fig. 2.6). The statistical models discussed earlier assume that the spread is constant, and that only shifts of the mean are important in altering the probability that the verification will be an "extreme event".
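The role played by the mean shift and the spread change can be illustrated with a simple Gaussian calculation (a schematic sketch only; the threshold of 1.28 standardized anomalies, corresponding to the climatological upper decile, and the function name are illustrative choices, not part of the operational scheme):

```python
import math

def tail_probability(mean, spread, threshold=1.28):
    """P(verification > threshold) when the forecast distribution is
    Gaussian with the given mean and spread, in standardized units.
    threshold = 1.28 corresponds to the climatological upper decile."""
    z = (threshold - mean) / spread
    return 0.5 * math.erfc(z / math.sqrt(2.0))

# Climatological forecast (mean 0, spread 1): ~10% chance of an extreme.
p_clim = tail_probability(0.0, 1.0)
# A shifted mean alone raises that probability substantially.
p_shift = tail_probability(1.0, 1.0)
# With a strong signal, a reduced spread raises confidence further,
# which is the extra information a spread-skill relationship provides.
p_sharp = tail_probability(1.5, 0.5)
```

In Week 2, where the spread saturates, only the first kind of change (the mean shift) remains available.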

Unfortunately, this advantage of ensemble forecasts, which is modest
but significant in Week 1, is lost by the middle of Week 2. The main
reason is that by Week 2 the forecast ensemble spread nearly saturates
to its climatological mean value, so that there are no significant
spread variations from case to case. In other words, most of the
predictable variation of forecast skill in Week 2 is associated with
predictable variations of the signal, not of noise. For several years
CDC has exploited this fact in producing an experimental real-time
Week 2 forecast product based on the NCEP ensemble (http://www.esrl.noaa.gov/psd/~jsw/week2/). Tercile
probability forecasts of 500 mb height, 850 mb temperature, 250 mb
zonal wind, sea-level pressure and precipitation are provided. Only
the signal, not noise, is used to construct these probability
forecasts. The procedure involves converting maps of the predicted
standardized anomalies into maps of extreme quantile (in this case,
tercile) probabilities. This calibration is done empirically, using
the available historical record of ensemble forecasts and verifying
analyses. The procedure is as follows: 1) for a positive standardized
forecast anomaly *z*, all instances in which a forecast *exceeded* this
value in the data record are found, and the probability *p* that the
verifying analysis fell in the upper tercile of the climatological
distribution is computed; 2) the standardized anomaly contour *z* is
relabeled as a probability of above-normal equal to *p*. If *z* is
negative, the probability that the verifying analysis fell into the
lower tercile is computed instead, and the contour is relabeled
"probability of below-normal". If the model has systematic errors,
these probabilities need not be symmetric, i.e. the probability of
below-normal for a negative *z* need not equal the probability of
above-normal for a positive *z* of the same magnitude. Our calibration
thus provides one simple way of accounting for model error in
probabilistic predictions.
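The relabeling step can be sketched as follows (a simplified illustration with hypothetical names and a toy data record, not the operational code):

```python
def calibrated_probability(z, past_forecasts, upper_hits, lower_hits):
    """Empirical tercile probability for a standardized forecast anomaly z.

    past_forecasts : historical standardized ensemble-mean anomalies
    upper_hits[i]  : 1 if verification i fell in the upper climatological tercile
    lower_hits[i]  : 1 if it fell in the lower tercile

    For z >= 0, pool all past cases whose forecast exceeded z and return
    the fraction that verified above normal; for z < 0, the mirror-image
    computation on the lower tercile.  Returns None if no case qualifies.
    """
    if z >= 0:
        pool = [u for f, u in zip(past_forecasts, upper_hits) if f >= z]
    else:
        pool = [l for f, l in zip(past_forecasts, lower_hits) if f <= z]
    return sum(pool) / len(pool) if pool else None

# Toy record of six past forecasts and their verifications.
fcst  = [-2.0, -1.0, -0.5, 0.5, 1.0, 2.0]
upper = [0, 0, 0, 1, 0, 1]
lower = [1, 1, 0, 0, 0, 0]
p_above = calibrated_probability(0.5, fcst, upper, lower)
p_below = calibrated_probability(-1.0, fcst, upper, lower)
```

Because the upper- and lower-tercile pools are built separately, any asymmetry caused by systematic model error is retained automatically.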

Figure 3.11 shows an example of such a
probability forecast. Note that the interpretation of this map is
slightly different from that for a conventional probability
forecast. If all the points on the map inside the yellow contour (as
opposed to those inside the yellow *band*) are counted over a
large sample of forecasts, 50-60% of these points will verify in the
upper tercile of the climatological distribution. Similarly, for
points falling in the darkest red regions on the map, over 90% will
verify in the upper tercile. The conventional interpretation would be
that points in the yellow *band* would have a 50-60% chance of
verifying in the upper tercile. Such a calibration would require many
more forecasts to compute reliably, since far fewer points fall inside
the yellow band than inside the yellow contour.
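The distinction between counting points inside a contour and inside a band can be made concrete with a toy verification count (all names and numbers below are illustrative):

```python
def verified_fraction(prob_map, verified, level, band_width=None):
    """Fraction of grid points verifying in the upper tercile, counted
    either inside a probability contour (prob >= level) or, if band_width
    is given, inside a narrow band (level <= prob < level + band_width).
    Returns (fraction, sample size)."""
    if band_width is None:
        pts = [v for p, v in zip(prob_map, verified) if p >= level]
    else:
        pts = [v for p, v in zip(prob_map, verified)
               if level <= p < level + band_width]
    return (sum(pts) / len(pts), len(pts)) if pts else (None, 0)

# Toy map: forecast probabilities at 8 points and whether each verified.
probs    = [0.45, 0.55, 0.55, 0.65, 0.65, 0.75, 0.85, 0.95]
verified = [0,    0,    1,    1,    1,    1,    1,    1]
frac_contour, n_contour = verified_fraction(probs, verified, 0.5)
frac_band, n_band = verified_fraction(probs, verified, 0.5, band_width=0.1)
```

Even in this tiny example the contour count pools several times as many points as the band count, which is why the contour interpretation can be calibrated with a shorter forecast record.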

**Fig. 3.11** Example of an experimental week 2 forecast verifying the last week of October 1998.

Since we assume that the signal, not the noise, contains all of the
useful predictive information, the useful subspace of the ensemble can
be isolated through an EOF analysis of the correlation matrix of the
ensemble-mean predictions. (The idea here is similar to that in Fig 2.8). The right panels of Fig. 3.12 show the three leading EOFs thus
obtained. For comparison, the three leading EOFs of the correlation
matrix of observed 7-day averages are also shown in the left
panels. There are two notable aspects to Fig. 3.12: 1) the signal and observed EOF patterns
are similar, and 2) the three leading EOFs explain considerably more
variance of the ensemble-mean forecasts than they do of the observed
variability (36% vs. 22%). To understand this better, note that the
total forecast covariance can be decomposed into a part due to the
predictable signal (**C**_{signal}) and a part due to
unpredictable noise (**C**_{noise}). If the forecast model
is unbiased and the noise is uncorrelated with the signal, the
observed variance (**C**_{obs}) is approximately the sum of
the two. This relationship is exact for the LIM discussed in section 3.1. The fact that the signal variation
occurs in a lower dimensional subspace than the observational 7-day
averages then simply means that the variance contained in the noise is
non-trivial. The similarity of the observed and signal EOF patterns
has a subtler interpretation: it implies that the noise component of
the covariance is nearly white, and that the ensemble-mean does indeed
capture most of the extractable signal with coherent spatial
structure.
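This variance argument can be illustrated with synthetic data in which the "signal" is confined by construction to three spatial patterns and the "noise" is spatially white (a schematic sketch; the dimensions, seed, and amplitudes are arbitrary and stand in for the actual forecast and observed fields):

```python
import numpy as np

rng = np.random.default_rng(42)
npts, ntime = 40, 2000

# Synthetic "signal": maps built from only three fixed spatial patterns,
# playing the role of the ensemble-mean forecasts.
patterns = rng.standard_normal((3, npts))
signal = rng.standard_normal((ntime, 3)) @ patterns
# Spatially white "noise", uncorrelated with the signal.
noise = rng.standard_normal((ntime, npts))
obs = signal + noise  # analogue of C_obs ~ C_signal + C_noise

def leading_variance_fraction(data, k=3):
    """Fraction of total variance explained by the k leading EOFs of the
    correlation matrix of data (rows = times, columns = grid points)."""
    eigvals = np.linalg.eigvalsh(np.corrcoef(data, rowvar=False))[::-1]
    return eigvals[:k].sum() / eigvals.sum()

# The rank-3 signal is captured entirely by three EOFs, while the white
# noise spreads the "observed" variance across many more dimensions.
frac_signal = leading_variance_fraction(signal)
frac_obs = leading_variance_fraction(obs)

# The additive covariance decomposition holds up to sampling error.
resid = (np.cov(obs, rowvar=False)
         - np.cov(signal, rowvar=False) - np.cov(noise, rowvar=False))
```

The leading EOFs thus explain a much larger variance fraction of the signal than of the signal-plus-noise field, mirroring the 36% vs. 22% contrast in Fig. 3.12.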

**Fig. 3.12** Rotated EOFs of weekly average 500 mb height computed using the correlation matrix of observations for DJF 1958-1994 (left panels) and the correlation matrix of week 2 operational ensemble-mean forecasts for DJF 1995/96 to 1997/98 (right panels).

The product shown in Fig. 3.11 has been quite popular with operational forecasters. A similar method has been adopted in operations by NCEP/CPC. A detailed analysis of the performance of this scheme, and its implications for Week 2 predictability, is underway.