xhydro.optimal_interpolation package
Optimal Interpolation module.
Submodules
xhydro.optimal_interpolation.ECF_climate_correction module
Empirical Covariance Function variogram calibration package.
- xhydro.optimal_interpolation.ECF_climate_correction.calculate_ECF_stats(distance: ndarray, covariance: ndarray, covariance_weights: ndarray, valid_heights: ndarray) -> tuple[ndarray, ndarray, ndarray]
Calculate statistics for the Empirical Covariance Function (ECF), climatological version.
Uses the histogram data from all previous days and reapplies the same steps, with inputs of size (timesteps x variogram_bins). When many days are used to compute the histogram bins, there is one histogram per day; this function combines them into a single output histogram.
Parameters
- distance : np.ndarray
Array of distances.
- covariance : np.ndarray
Array of covariances.
- covariance_weights : np.ndarray
Array of weights for covariances.
- valid_heights : np.ndarray
Array of valid heights.
Returns
- tuple[np.ndarray, np.ndarray, np.ndarray]
A tuple containing the following:
- h_b: Array of mean distances for each height bin.
- cov_b: Array of weighted average covariances for each height bin.
- std_b: Array of standard deviations for each height bin.
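The pooling of per-day histograms described above can be illustrated with a minimal sketch. It is not the xhydro implementation: the function name `pooled_bin_stats`, the use of NaN to mark empty bins, and the exact weighting scheme are assumptions made for illustration.

```python
import numpy as np


def pooled_bin_stats(distance, covariance, weights):
    """Collapse per-timestep bin statistics into one value per bin.

    Hypothetical helper. All inputs have shape (timesteps, variogram_bins);
    NaN in `covariance` marks a bin that was empty on that timestep.
    """
    # Empty bins contribute zero weight.
    w = np.where(np.isnan(covariance), 0.0, weights)
    w_sum = w.sum(axis=0)

    # Mean distance of each bin over the timesteps that populated it.
    h_b = np.nanmean(distance, axis=0)

    # Weighted average covariance per bin.
    cov_b = np.nansum(w * covariance, axis=0) / w_sum

    # Weighted standard deviation of the covariances around cov_b.
    var_b = np.nansum(w * (covariance - cov_b) ** 2, axis=0) / w_sum
    std_b = np.sqrt(var_b)
    return h_b, cov_b, std_b


# Two timesteps, two bins; the second bin is empty on the second timestep.
distance = np.array([[10.0, 20.0], [12.0, np.nan]])
covariance = np.array([[1.0, 0.5], [0.8, np.nan]])
weights = np.array([[1.0, 2.0], [3.0, 1.0]])
h_b, cov_b, std_b = pooled_bin_stats(distance, covariance, weights)
```

Each output has one entry per bin, matching the (h_b, cov_b, std_b) layout documented above.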
- xhydro.optimal_interpolation.ECF_climate_correction.correction(da_qobs: DataArray, da_qsim: DataArray, centroid_lon_obs: ndarray, centroid_lat_obs: ndarray, variogram_bins: int = 10, form: int = 3, hmax_divider: float = 2.0, p1_bnds: list | None = None, hmax_mult_range_bnds: list | None = None) -> tuple
Perform correction on flow observations using optimal interpolation.
Parameters
- da_qobs : xr.DataArray
An xarray DataArray of observed flow data.
- da_qsim : xr.DataArray
An xarray DataArray of simulated flow data.
- centroid_lon_obs : np.ndarray
Longitude vector of the catchment centroids for the observed stations.
- centroid_lat_obs : np.ndarray
Latitude vector of the catchment centroids for the observed stations.
- variogram_bins : int, optional
Number of bins to split the data to fit the semi-variogram for the ECF. Defaults to 10.
- form : int
The form of the ECF equation to use (1, 2, 3 or 4). Defaults to 3. See Notes below.
- hmax_divider : float
Maximum distance for binning is set as the maximum distance in the input data divided by hmax_divider. Defaults to 2.
- p1_bnds : list, optional
The lower and upper bounds of the first parameter of the ECF equation for variogram fitting. Defaults to [0.95, 1.0].
- hmax_mult_range_bnds : list, optional
The lower and upper bounds of the second parameter of the ECF equation for variogram fitting. These bounds are multiplied by "hmax", which is calculated to be the threshold limit for the variogram sill. Defaults to [0.05, 3.0].
Returns
- tuple
A tuple containing the following:
- ecf_fun: Partial function for the error covariance function.
- par_opt: Optimized parameters for the interpolation.
Notes
- The possible forms for the ECF function fitting are as follows:
- Form 1 (from Lachance-Cloutier et al. 2017 and Garand & Grassotti 1995):
ecf_fun = par[0] * (1 + h / par[1]) * np.exp(-h / par[1])
- Form 2 (Gaussian form):
ecf_fun = par[0] * np.exp(-0.5 * np.power(h / par[1], 2))
- Form 3:
ecf_fun = par[0] * np.exp(-h / par[1])
- Form 4:
ecf_fun = par[0] * np.exp(-(h ** par[1]) / par[0])
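The four forms listed above can be gathered into one function and frozen with functools.partial, which is how a returned ecf_fun of the kind described in Returns could be built. This is a sketch only: the name `ecf_sketch`, the error raised for an unknown form, and the parameter values are illustrative assumptions, not fitted values or the library's code.

```python
from functools import partial

import numpy as np


def ecf_sketch(h, par, form):
    """Evaluate one of the four ECF forms from the Notes (hypothetical helper)."""
    h = np.asarray(h, dtype=float)
    if form == 1:
        return par[0] * (1 + h / par[1]) * np.exp(-h / par[1])
    if form == 2:
        return par[0] * np.exp(-0.5 * np.power(h / par[1], 2))
    if form == 3:
        return par[0] * np.exp(-h / par[1])
    if form == 4:
        return par[0] * np.exp(-(h ** par[1]) / par[0])
    raise ValueError("form must be 1, 2, 3 or 4")


# Freeze illustrative (not fitted) parameters so that only h remains free.
par_example = [1.0, 50.0]
ecf_fun = partial(ecf_sketch, par=par_example, form=3)

# Covariance at distances of 0, 50 and 100 (same unit as par[1]).
values = ecf_fun(np.array([0.0, 50.0, 100.0]))
```

With form 3, the covariance equals par[0] at h = 0 and decays by a factor of e every par[1] units of distance.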
- xhydro.optimal_interpolation.ECF_climate_correction.eval_covariance_bin(distances: ndarray, values: ndarray, hmax_divider: float = 2.0, variogram_bins: int = 10) -> tuple[ndarray, ndarray, ndarray, ndarray]
Evaluate the covariance of the data binned by distance.
Parameters
- distances : np.ndarray
Array of distances for each data point.
- values : np.ndarray
Array of values corresponding to each data point.
- hmax_divider : float
Maximum distance for binning is set as the maximum distance in the input data divided by hmax_divider. Defaults to 2.
- variogram_bins : int, optional
Number of bins to split the data to fit the semi-variogram for the ECF. Defaults to 10.
Returns
- tuple[np.ndarray, np.ndarray, np.ndarray, np.ndarray]
Arrays for heights, covariance, standard deviation and row length.
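A minimal sketch of distance binning follows, to show how hmax_divider and variogram_bins interact. It assumes that the binning limit is the maximum distance divided by hmax_divider and that bins are equally spaced; the function name `bin_by_distance` and the handling of empty bins are hypothetical and may differ from the library.

```python
import numpy as np


def bin_by_distance(distances, values, hmax_divider=2.0, variogram_bins=10):
    """Average `values` inside equally spaced distance bins (hypothetical helper)."""
    distances = np.asarray(distances, dtype=float)
    values = np.asarray(values, dtype=float)

    # Assumed binning limit: a fraction of the largest distance.
    hmax = distances.max() / hmax_divider
    edges = np.linspace(0.0, hmax, variogram_bins + 1)

    # Bin index in 1..variogram_bins for points in [0, hmax); larger is out of range.
    idx = np.digitize(distances, edges)

    heights = np.full(variogram_bins, np.nan)
    cov = np.full(variogram_bins, np.nan)
    std = np.full(variogram_bins, np.nan)
    counts = np.zeros(variogram_bins, dtype=int)
    for b in range(1, variogram_bins + 1):
        mask = idx == b
        if mask.any():
            heights[b - 1] = distances[mask].mean()
            cov[b - 1] = values[mask].mean()
            std[b - 1] = values[mask].std()
            counts[b - 1] = mask.sum()
    return heights, cov, std, counts


heights, cov, std, counts = bin_by_distance(
    distances=[1.0, 2.0, 3.0, 4.0, 10.0],
    values=[1.0, 0.8, 0.6, 0.4, 0.1],
    hmax_divider=2.0,
    variogram_bins=5,
)
```

In this example hmax is 5, so the pair at distance 10 falls outside the binning range and is ignored, and the first bin stays empty (NaN).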
- xhydro.optimal_interpolation.ECF_climate_correction.general_ecf(h: ndarray, par: list | ndarray, form: int)
Define the form of the Error Covariance Function (ECF) equations.
Parameters
- h : float or array
The distance or distances at which to evaluate the ECF.
- par : list or array-like
Parameters for the ECF equation.
- form : int
The form of the ECF equation to use (1, 2, 3 or 4). See correction() for details.
Returns
- float or array
The calculated ECF values based on the specified form.
- xhydro.optimal_interpolation.ECF_climate_correction.initialize_stats_variables(heights: ndarray, covariances: ndarray, standard_deviations: ndarray, variogram_bins: int = 10) -> tuple
Initialize variables for statistical calculations in an Empirical Covariance Function (ECF).
Parameters
- heights : np.ndarray
Array of heights.
- covariances : np.ndarray
Array of covariances.
- standard_deviations : np.ndarray
Array of standard deviations.
- variogram_bins : int
Number of bins to split the data to fit the semi-variogram for the ECF. Defaults to 10.
Returns
- tuple
A tuple containing the following:
- distance: Array of distances.
- covariance: Array of covariances.
- covariance_weights: Array of weights for covariances.
- valid_heights: Array of valid heights.
xhydro.optimal_interpolation.compare_result module
Compare results between simulations and observations.
- xhydro.optimal_interpolation.compare_result.compare(qobs: Dataset, qsim: Dataset, flow_l1o: Dataset, station_correspondence: Dataset, observation_stations: list, percentile_to_plot: int = 50, show_comparison: bool = True)
Start the computation of the comparison method.
Parameters
- qobs : xr.Dataset
Streamflow and catchment properties dataset for observed data.
- qsim : xr.Dataset
Streamflow and catchment properties dataset for simulated data.
- flow_l1o : xr.Dataset
Streamflow and catchment properties dataset for simulated leave-one-out cross-validation results.
- station_correspondence : xr.Dataset
Matching between the tag in the simulated files and the observed station number for the obs dataset.
- observation_stations : list
Observed hydrometric dataset stations to be used in the cross-validation step.
- percentile_to_plot : int
Percentile value to plot (default is 50).
- show_comparison : bool
Whether to display the comparison plots (default is True).
xhydro.optimal_interpolation.optimal_interpolation_fun module
Package containing the optimal interpolation functions.
- xhydro.optimal_interpolation.optimal_interpolation_fun.execute_interpolation(qobs: Dataset, qsim: Dataset, station_correspondence: Dataset, observation_stations: list, ratio_var_bg: float = 0.15, percentiles: list[float] | None = None, variogram_bins: int = 10, parallelize: bool = False, max_cores: int = 1, leave_one_out_cv: bool = False, form: int = 3, hmax_divider: float = 2.0, p1_bnds: list | None = None, hmax_mult_range_bnds: list | None = None) -> Dataset
Run the interpolation algorithm for leave-one-out cross-validation or operational use.
Parameters
- qobs : xr.Dataset
Streamflow and catchment properties dataset for observed data.
- qsim : xr.Dataset
Streamflow and catchment properties dataset for simulated data.
- station_correspondence : xr.Dataset
Correspondence between the tag in the simulated files and the observed station number for the obs dataset.
- observation_stations : list
Observed hydrometric dataset stations to be used in the ECF function building and optimal interpolation application step.
- ratio_var_bg : float
Ratio for background variance (default is 0.15).
- percentiles : list(float), optional
List of percentiles to analyze (default is [25.0, 50.0, 75.0, 100.0]).
- variogram_bins : int, optional
Number of bins to split the data to fit the semi-variogram for the ECF. Defaults to 10.
- parallelize : bool
Whether to run the computation in parallel (True) or in series (False). Defaults to False.
- max_cores : int
Maximum number of cores to use for parallel processing. Defaults to 1.
- leave_one_out_cv : bool
Whether to run in leave-one-out cross-validation mode (True) or to apply the interpolation operationally (False). Defaults to False.
- form : int
The form of the ECF equation to use (1, 2, 3 or 4). Defaults to 3. See the Notes of ECF_climate_correction.correction() for the equations.
- hmax_divider : float
Maximum distance for binning is set as the maximum distance in the input data divided by hmax_divider. Defaults to 2.
- p1_bnds : list, optional
The lower and upper bounds of the first parameter of the ECF equation for variogram fitting. Defaults to [0.95, 1].
- hmax_mult_range_bnds : list, optional
The lower and upper bounds of the second parameter of the ECF equation for variogram fitting. These bounds are multiplied by "hmax", which is calculated to be the threshold limit for the variogram sill. Defaults to [0.05, 3].
Returns
- xr.Dataset
An xarray dataset containing the flow quantiles and all the associated metadata.
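The relationship between p1_bnds, hmax_mult_range_bnds and hmax can be made concrete with a small sketch. It only illustrates the documented scaling of the second pair of bounds by hmax and the use of None as a placeholder for the list defaults; the helper name `variogram_bounds` and the hmax value are hypothetical.

```python
def variogram_bounds(hmax, p1_bnds=None, hmax_mult_range_bnds=None):
    """Return (lower, upper) bounds for the two ECF parameters (hypothetical helper)."""
    # None stands in for the documented list defaults, which avoids
    # sharing a mutable default list between calls.
    if p1_bnds is None:
        p1_bnds = [0.95, 1.0]
    if hmax_mult_range_bnds is None:
        hmax_mult_range_bnds = [0.05, 3.0]

    # The second parameter's bounds are expressed as multiples of hmax.
    p2_bnds = [hmax * hmax_mult_range_bnds[0], hmax * hmax_mult_range_bnds[1]]
    return [tuple(p1_bnds), tuple(p2_bnds)]


# With hmax = 200 (illustrative), the second parameter may range from
# about 10 to 600, in the same distance unit as hmax.
bounds = variogram_bounds(hmax=200.0)
```

Expressing the second pair as multiples of hmax keeps the same defaults usable for station networks of very different spatial extent.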
- xhydro.optimal_interpolation.optimal_interpolation_fun.optimal_interpolation(lat_obs: ndarray, lon_obs: ndarray, lat_est: ndarray, lon_est: ndarray, ecf: partial, bg_var_obs: ndarray, bg_var_est: ndarray, var_obs: ndarray, bg_departures: ndarray, bg_est: ndarray, precalcs: dict) -> tuple[ndarray, ndarray, dict]
Perform optimal interpolation to estimate values at specified locations.
Parameters
- lat_obs : np.ndarray
Vector of latitudes of the observation stations catchment centroids.
- lon_obs : np.ndarray
Vector of longitudes of the observation stations catchment centroids.
- lat_est : np.ndarray
Vector of latitudes of the estimation/simulation stations catchment centroids.
- lon_est : np.ndarray
Vector of longitudes of the estimation/simulation stations catchment centroids.
- ecf : partial
The error covariance function (ECF) to use, given as a functools.partial object. It expresses the error covariance as a function of the distance h.
- bg_var_obs : np.ndarray
Background field variance at the observation stations (vector of size "observation stations").
- bg_var_est : np.ndarray
Background field variance at estimation sites (vector of size "estimation stations").
- var_obs : np.ndarray
Observation variance at observation sites (vector of size "observation stations").
- bg_departures : np.ndarray
Difference between observation and background field at observation sites (vector of size "observation stations").
- bg_est : np.ndarray
Background field values at estimation sites (vector of size "estimation stations").
- precalcs : dict
Additional arguments and state information for the interpolation process, to accelerate calculations between timesteps.
Returns
- v_est : np.ndarray
Estimated values at the estimation sites (vector of size "estimation stations").
- var_est : np.ndarray
Estimated variance at the estimation sites (vector of size "estimation stations").
- precalcs : dict
Additional arguments and state information for the interpolation process, to accelerate calculations between timesteps. This variable returns the pre-calculated distance matrices.
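To show how the inputs above combine, here is a sketch of the textbook optimal interpolation update. It is not the xhydro implementation: it takes precomputed distance matrices instead of coordinates, normalises the ECF into a correlation by dividing by its value at h = 0, and the names `oi_sketch` and `exponential_ecf` are hypothetical.

```python
from functools import partial

import numpy as np


def exponential_ecf(h, par):
    """Exponential covariance model, used here only as an example ECF."""
    return par[0] * np.exp(-np.asarray(h, dtype=float) / par[1])


def oi_sketch(dist_oo, dist_eo, ecf, bg_var_obs, bg_var_est, var_obs,
              bg_departures, bg_est):
    """Textbook optimal interpolation update (illustrative only)."""
    # Correlations derived from the ECF, equal to 1 at zero distance.
    corr_oo = ecf(dist_oo) / ecf(0.0)
    corr_eo = ecf(dist_eo) / ecf(0.0)

    # Background error covariances (obs-obs and est-obs).
    b_oo = np.sqrt(np.outer(bg_var_obs, bg_var_obs)) * corr_oo
    b_eo = np.sqrt(np.outer(bg_var_est, bg_var_obs)) * corr_eo

    # Observation errors are assumed uncorrelated between stations.
    r = np.diag(var_obs)

    # Gain K = B_eo (B_oo + R)^-1, via a linear solve rather than an inverse.
    gain = np.linalg.solve(b_oo + r, b_eo.T).T

    v_est = bg_est + gain @ bg_departures
    var_est = bg_var_est - np.sum(gain * b_eo, axis=1)
    return v_est, var_est


# One observation station co-located with one estimation site.
ecf = partial(exponential_ecf, par=[1.0, 50.0])
v_est, var_est = oi_sketch(
    dist_oo=np.array([[0.0]]),
    dist_eo=np.array([[0.0]]),
    ecf=ecf,
    bg_var_obs=np.array([1.0]),
    bg_var_est=np.array([1.0]),
    var_obs=np.array([1.0]),
    bg_departures=np.array([2.0]),
    bg_est=np.array([10.0]),
)
```

With equal background and observation variances at a co-located site, half of the departure is applied to the background value and the estimated variance is halved.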
xhydro.optimal_interpolation.utilities module
Utilities required for managing data in the interpolation toolbox.
- xhydro.optimal_interpolation.utilities.plot_results(kge, kge_l1o, nse, nse_l1o)
Generate a plot of the results of model evaluation using various metrics.
Parameters
- kge : array-like
Kling-Gupta Efficiency for the entire dataset.
- kge_l1o : array-like
Kling-Gupta Efficiency for leave-one-out cross-validation.
- nse : array-like
Nash-Sutcliffe Efficiency for the entire dataset.
- nse_l1o : array-like
Nash-Sutcliffe Efficiency for leave-one-out cross-validation.
Returns
- None
No return.
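plot_results expects scores that have already been computed. For reference, the sketch below implements the standard definitions of the Nash-Sutcliffe Efficiency and of the 2009 formulation of the Kling-Gupta Efficiency. Whether the package computes its scores with exactly these variants is an assumption; the function names are illustrative.

```python
import numpy as np


def nse(obs, sim):
    """Nash-Sutcliffe Efficiency: 1 is a perfect fit, 0 matches the mean of obs."""
    obs = np.asarray(obs, dtype=float)
    sim = np.asarray(sim, dtype=float)
    return 1.0 - np.sum((sim - obs) ** 2) / np.sum((obs - obs.mean()) ** 2)


def kge(obs, sim):
    """Kling-Gupta Efficiency (2009 formulation): 1 is a perfect fit."""
    obs = np.asarray(obs, dtype=float)
    sim = np.asarray(sim, dtype=float)
    r = np.corrcoef(obs, sim)[0, 1]    # linear correlation
    alpha = sim.std() / obs.std()      # variability ratio
    beta = sim.mean() / obs.mean()     # bias ratio
    return 1.0 - np.sqrt((r - 1) ** 2 + (alpha - 1) ** 2 + (beta - 1) ** 2)


obs = np.array([1.0, 2.0, 3.0, 4.0])
perfect_nse = nse(obs, obs)
biased_kge = kge(obs, obs + 1.0)  # correct timing and variability, biased mean
```

Computing each score once on the full results and once on the leave-one-out results gives the pairs (kge, kge_l1o) and (nse, nse_l1o) that the function plots.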
- xhydro.optimal_interpolation.utilities.prepare_flow_percentiles_dataset(station_id, lon, lat, drain_area, time, percentile, discharge)
Write discharge data as an xarray.Dataset.
Parameters
- station_id : array-like
List of station IDs.
- lon : array-like
List of longitudes corresponding to each station.
- lat : array-like
List of latitudes corresponding to each station.
- drain_area : array-like
List of drainage areas corresponding to each station.
- time : array-like
List of datetime objects representing time.
- percentile : list or None
List of percentiles or None if not applicable.
- discharge : numpy.ndarray
3D array of discharge data, dimensions (percentile, station, time).
Returns
- xr.Dataset
The dataset containing the flow percentiles as generated by the optimal interpolation code.
Notes
The function creates and returns an xarray Dataset using the provided data.
The function includes appropriate metadata and attributes for each variable.
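The expected layout of the discharge argument can be illustrated with NumPy alone. The sketch below stacks one (station, time) array per percentile into the documented (percentile, station, time) order; the sizes and fill values are placeholders, not real flows.

```python
import numpy as np

percentiles = [25.0, 50.0, 75.0]
n_stations, n_times = 4, 6

# One (station, time) array per percentile; each is filled with its
# percentile value so that the ordering is easy to check.
per_percentile = [np.full((n_stations, n_times), p) for p in percentiles]

# Stack along a new leading axis to get (percentile, station, time).
discharge = np.stack(per_percentile, axis=0)
```

The first axis of discharge then lines up with the percentile list, the second with station_id, lon, lat and drain_area, and the third with time.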