## 3.6 Unifying ensemble forecasting and data assimilation

The fundamental goal in subseasonal prediction, just as in seasonal prediction, is to predict the forecast probability distribution function (PDF) accurately. The previous sections discussed research efforts at CDC toward this goal. It is hoped that statistical methods like the LIM or CCA, when combined with an NWP ensemble, will improve the mean of the forecast PDF. The spread-skill relationship discussed in section 3.4 shows that useful information can be extracted at short forecast ranges from the second moment of the NWP ensemble. One obvious way to improve the accuracy of the forecast PDF is to improve the accuracy of the initial PDF. Currently, all operational centers construct an ensemble of initial conditions by perturbing a single control analysis, obtained from a three-dimensional (as at NCEP) or a simplified four-dimensional (as at ECMWF) data assimilation system. The methods used to generate the perturbations to the control analysis, breeding vectors at NCEP and singular vectors at ECMWF, are fundamentally ad hoc and do not represent analysis uncertainty. CDC scientists have been investigating new ways of coupling the ensemble forecast and data assimilation steps in order to improve both the initial and forecast PDFs.

The coupling of ensemble forecasting and data assimilation is natural. The essence of data assimilation is statistical, in that it amounts to blending "first guess" forecasts with new observations using weights determined by their respective error statistics. Carefully constructed forecast ensembles can provide such statistics. Currently, operational methods make rather simplistic assumptions about the error statistics. They assume, for example, that the correlation of forecast errors at two locations depends only on the distance between them, and not on the location or on whether the atmosphere has recently been quiescent or stormy (Fig. 3.13a). Results from simple model experiments using sophisticated "Ensemble Kalman Filter" techniques suggest that the quality of initial conditions can be dramatically improved by using forecast error statistics estimated from a specially constructed ensemble. For example, error statistics from the ensemble permit a single observation at a fixed location to make very different corrections to the first guess depending on the flow of the day (Fig. 3.13b). By estimating the analysis increment to the first guess in this flow-dependent manner, ensembles of initial conditions can be dramatically improved, perhaps even to the point that they are more accurate than analyses based on four-dimensional variational methods.
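For readers who prefer symbols, the blending described above is commonly written in the standard Kalman-filter form below. The notation is an assumption introduced here for illustration and is not defined elsewhere in this report: $\mathbf{x}^b$ is the first guess, $\mathbf{x}^a$ the analysis, $\mathbf{y}$ the observations, $\mathbf{H}$ a (linear) operator mapping the model state to the observed quantities, $\mathbf{R}$ the observation error covariance, and $\mathbf{P}^b$ the first-guess error covariance.

$$
\begin{aligned}
\mathbf{x}^a &= \mathbf{x}^b + \mathbf{K}\,\bigl(\mathbf{y} - \mathbf{H}\mathbf{x}^b\bigr), \\
\mathbf{K} &= \mathbf{P}^b\mathbf{H}^{\mathsf T}\bigl(\mathbf{H}\mathbf{P}^b\mathbf{H}^{\mathsf T} + \mathbf{R}\bigr)^{-1}, \\
\mathbf{P}^b &\approx \frac{1}{N-1}\sum_{i=1}^{N}\bigl(\mathbf{x}^b_i - \bar{\mathbf{x}}^b\bigr)\bigl(\mathbf{x}^b_i - \bar{\mathbf{x}}^b\bigr)^{\mathsf T}.
\end{aligned}
$$

The first two lines apply whether $\mathbf{P}^b$ is prescribed once and for all or estimated anew for each analysis. The third line is the ensemble estimate from $N$ first-guess members $\mathbf{x}^b_i$ with mean $\bar{\mathbf{x}}^b$, which is what allows the weights to vary with the flow of the day.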

**Fig. 3.13** Structure of the "analysis increment" (the initial condition minus the prior "first guess" forecast) for the traditional method of data assimilation, in which error statistics do not change from location to location or from day to day, and for an ensemble-based method. In this experiment, an observation that is 1 K warmer than the prior forecast is found at the location denoted by the dot. (a) Analysis increments using the "3D-Var" data assimilation methodology. The "one size fits all" increments decrease smoothly with increasing distance from the observation location. (b) Analysis increments using the new ensemble data assimilation methodology. Changes to the prior forecast are now stretched out along the frontal zone, so that the position of the entire warm front is changed by the one observation.
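The contrast between panels (a) and (b) can be reproduced qualitatively in a one-dimensional toy problem. The sketch below is not the experiment shown in the figure: the grid, the tanh-shaped "front" with uncertain position, the Gaussian static covariance, and all parameter values are assumptions chosen only for illustration. It computes the increment from a single observation that is 1 K warmer than the first guess, once with a distance-only covariance and once with a covariance estimated from an ensemble.

```python
import numpy as np

def single_obs_increment(cov_with_obs, bg_var_at_obs, obs_err_var, innovation):
    """Kalman analysis increment on the whole grid for one scalar observation."""
    return cov_with_obs / (bg_var_at_obs + obs_err_var) * innovation

rng = np.random.default_rng(0)
x = np.linspace(-10.0, 10.0, 201)   # toy 1-D grid (arbitrary units), spacing 0.1
j = 110                             # observation location index, x[j] = 1.0
innovation = 1.0                    # observation is 1 K warmer than the first guess
obs_err_var = 0.25                  # assumed observation error variance (K^2)

# (a) Static covariance: depends only on distance from the observation.
bg_var = 1.0                        # assumed first-guess error variance (K^2)
length_scale = 2.0                  # assumed correlation length
static_cov = bg_var * np.exp(-0.5 * ((x - x[j]) / length_scale) ** 2)
static_inc = single_obs_increment(static_cov, bg_var, obs_err_var, innovation)

# (b) Ensemble covariance: each member is a front whose position is uncertain.
n_members = 50
centers = rng.normal(0.0, 1.0, size=n_members)            # front positions
ensemble = 5.0 * np.tanh(x[None, :] - centers[:, None])   # (members, grid)
perts = ensemble - ensemble.mean(axis=0)                   # deviations from mean
ens_cov = perts.T @ perts[:, j] / (n_members - 1)          # cov(grid, obs point)
ens_inc = single_obs_increment(ens_cov, ens_cov[j], obs_err_var, innovation)

print("static increment at obs point:  ", static_inc[j])
print("ensemble increment at obs point:", ens_inc[j])
print("ensemble increment peaks at x = ", x[np.argmax(ens_inc)])
```

In this toy setting the static increment is symmetric about the observation regardless of where the front lies, whereas the ensemble-based increment is confined to the region where the members disagree, that is, around the front. In two dimensions the same mechanism would be expected to stretch the increment along the frontal zone, as in panel (b).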

Recent CDC efforts in this area have focused on algorithmic details of ensemble-based data assimilation experiments. We have sought to understand how the statistics of forecast errors estimated from an ensemble depend on the size of the ensemble, and how one might extract useful information from smaller ensembles, an important issue because larger ensembles make heavier demands on computational resources. This research has demonstrated that, with an accurate specification of forecast error statistics, new problems can be tackled in a theoretically justifiable manner. One such problem is determining where supplementary observations would be most beneficial for reducing analysis or forecast error (the problem of "targeting" observations). In addition, since ensemble-based data assimilation techniques are particularly useful when observations are sparse, CDC scientists are planning to adapt such techniques to extend the NCEP reanalysis back into the pre-radiosonde era (pre-1948).
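The dependence on ensemble size can be illustrated with a generic sampling experiment. The sketch below is not a reproduction of the CDC experiments; the function name, trial count, and ensemble sizes are hypothetical choices. It draws two variables that are truly uncorrelated and measures how large the correlation estimated from an ensemble of a given size appears to be purely by chance.

```python
import numpy as np

def spurious_correlation_rms(n_members, n_trials=2000, seed=0):
    """RMS sample correlation between two truly uncorrelated Gaussian variables.

    Each trial draws an independent ensemble of size n_members for both
    variables, so any nonzero correlation is pure sampling noise.
    """
    rng = np.random.default_rng(seed)
    a = rng.standard_normal((n_trials, n_members))
    b = rng.standard_normal((n_trials, n_members))
    a -= a.mean(axis=1, keepdims=True)   # remove each trial's ensemble mean
    b -= b.mean(axis=1, keepdims=True)
    corr = (a * b).sum(axis=1) / np.sqrt((a**2).sum(axis=1) * (b**2).sum(axis=1))
    return float(np.sqrt(np.mean(corr**2)))

rms_by_size = {n: spurious_correlation_rms(n) for n in (10, 40, 160)}
for n, rms in rms_by_size.items():
    # For independent Gaussian samples the expected value is 1/sqrt(n - 1).
    print(f"members={n:4d}  rms spurious correlation={rms:.3f}  "
          f"theory={1.0 / np.sqrt(n - 1):.3f}")
```

The noise shrinks only as the inverse square root of the ensemble size, so quadrupling the number of members roughly halves it. An observation would therefore produce spurious increments at unrelated locations when the ensemble is small, which is one reason the treatment of small ensembles deserves care.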