xhydro.optimal_interpolation package

Optimal Interpolation module.

Submodules

xhydro.optimal_interpolation.ECF_climate_correction module

Empirical Covariance Function variogram calibration package.

xhydro.optimal_interpolation.ECF_climate_correction.calculate_ECF_stats(distance: ndarray, covariance: ndarray, covariance_weights: ndarray, valid_heights: ndarray) -> tuple[ndarray, ndarray, ndarray]

Calculate statistics for Empirical Covariance Function (ECF), climatological version.

Uses the histogram data from all previous days and reapplies the same steps, but with inputs of size (timesteps x variogram_bins). When many days are used to compute the histogram bins, there is one histogram per day; this function builds a new histogram from them and generates a single output.

Parameters

distance : np.ndarray

Array of distances.

covariance : np.ndarray

Array of covariances.

covariance_weights : np.ndarray

Array of weights for covariances.

valid_heights : np.ndarray

Array of valid heights.

Returns

tuple[np.ndarray, np.ndarray, np.ndarray]

A tuple containing the following:

  • h_b: Array of mean distances for each height bin.

  • cov_b: Array of weighted average covariances for each height bin.

  • std_b: Array of standard deviations for each height bin.
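The reduction from one histogram per day to a single set of bin statistics can be sketched as follows. This is an illustration only, not the library's implementation: the helper name pooled_bin_stats is hypothetical, the valid_heights argument is left out, and the sketch assumes that every bin holds data on every day and that the standard deviation is the weighted spread of the daily covariances around their weighted mean.

```python
import numpy as np


def pooled_bin_stats(distance, covariance, covariance_weights):
    """Collapse (timesteps x variogram_bins) arrays into one value per bin.

    Hypothetical helper: assumes no empty bins and strictly positive weight sums.
    """
    distance = np.asarray(distance, dtype=float)
    covariance = np.asarray(covariance, dtype=float)
    weights = np.asarray(covariance_weights, dtype=float)

    weight_sum = weights.sum(axis=0)
    # Mean distance of each bin across all timesteps.
    h_b = distance.mean(axis=0)
    # Weighted average covariance of each bin.
    cov_b = (weights * covariance).sum(axis=0) / weight_sum
    # Weighted standard deviation of the daily covariances around cov_b.
    var_b = (weights * (covariance - cov_b) ** 2).sum(axis=0) / weight_sum
    std_b = np.sqrt(var_b)
    return h_b, cov_b, std_b


# Two timesteps, two bins.
distance = [[1.0, 2.0], [3.0, 4.0]]
covariance = [[1.0, 0.5], [3.0, 0.5]]
covariance_weights = [[1.0, 1.0], [3.0, 1.0]]
h_b, cov_b, std_b = pooled_bin_stats(distance, covariance, covariance_weights)
```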

xhydro.optimal_interpolation.ECF_climate_correction.correction(da_qobs: DataArray, da_qsim: DataArray, centroid_lon_obs: ndarray, centroid_lat_obs: ndarray, variogram_bins: int = 10, form: int = 3, hmax_divider: float = 2.0, p1_bnds: list | None = None, hmax_mult_range_bnds: list | None = None) -> tuple

Perform correction on flow observations using optimal interpolation.

Parameters

da_qobs : xr.DataArray

An xarray DataArray of observed flow data.

da_qsim : xr.DataArray

An xarray DataArray of simulated flow data.

centroid_lon_obs : np.ndarray

Longitude vector of the catchment centroids for the observed stations.

centroid_lat_obs : np.ndarray

Latitude vector of the catchment centroids for the observed stations.

variogram_bins : int, optional

Number of bins to split the data to fit the semi-variogram for the ECF. Defaults to 10.

form : int

The form of the ECF equation to use (1, 2, 3 or 4; see Notes below). Defaults to 3.

hmax_divider : float

Maximum distance for binning is set as hmax_divider times the maximum distance in the input data. Defaults to 2.

p1_bnds : list, optional

The lower and upper bounds of the first parameter of the ECF equation for variogram fitting. Defaults to [0.95, 1.0].

hmax_mult_range_bnds : list, optional

The lower and upper bounds of the second parameter of the ECF equation for variogram fitting. Both bounds are multiplied by "hmax", which is calculated to be the threshold limit for the variogram sill. Defaults to [0.05, 3.0].

Returns

tuple

A tuple containing the following:

  • ecf_fun: Partial function for the error covariance function.

  • par_opt: Optimized parameters for the interpolation.
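Both p1_bnds and hmax_mult_range_bnds default to None in the signature, with the documented defaults applied when no value is given. A minimal sketch of how such bounds could be resolved is shown below. The helper resolve_bounds and its hmax argument are hypothetical and are not part of xhydro; the sketch only assumes, as the parameter description states, that the second pair of bounds is scaled by hmax.

```python
def resolve_bounds(hmax, p1_bnds=None, hmax_mult_range_bnds=None):
    """Return [(low, high), (low, high)] bounds for the two ECF parameters.

    Hypothetical helper: None is replaced by the documented defaults so that
    no mutable list is shared between calls.
    """
    if p1_bnds is None:
        p1_bnds = [0.95, 1.0]
    if hmax_mult_range_bnds is None:
        hmax_mult_range_bnds = [0.05, 3.0]

    # The second parameter's bounds are expressed as multiples of hmax.
    low_mult, high_mult = hmax_mult_range_bnds
    return [tuple(p1_bnds), (low_mult * hmax, high_mult * hmax)]


default_bounds = resolve_bounds(hmax=100.0)
custom_bounds = resolve_bounds(hmax=100.0, p1_bnds=[0.9, 1.0], hmax_mult_range_bnds=[0.5, 2.0])
```

Using None as the default and filling in the list inside the function is the usual Python idiom for avoiding a shared mutable default argument.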

Notes

The possible forms for fitting the ECF are as follows:

Form 1 (from Lachance-Cloutier et al. 2017, and Garand & Grassotti 1995):

ecf_fun = par[0] * (1 + h / par[1]) * np.exp(-h / par[1])

Form 2 (Gaussian form):

ecf_fun = par[0] * np.exp(-0.5 * np.power(h / par[1], 2))

Form 3:

ecf_fun = par[0] * np.exp(-h / par[1])

Form 4:

ecf_fun = par[0] * np.exp(-(h ** par[1]) / par[0])
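The four forms can be compared side by side with a small standalone function. The sketch below copies the equations listed in these Notes into a hypothetical function named ecf_sketch (it does not call xhydro), then freezes the parameters with functools.partial, which is how the returned ecf_fun is described. The parameter values are arbitrary examples.

```python
from functools import partial

import numpy as np


def ecf_sketch(h, par, form):
    """Evaluate one of the four ECF forms at distance(s) h (illustrative copy)."""
    h = np.asarray(h, dtype=float)
    if form == 1:
        return par[0] * (1 + h / par[1]) * np.exp(-h / par[1])
    if form == 2:
        return par[0] * np.exp(-0.5 * np.power(h / par[1], 2))
    if form == 3:
        return par[0] * np.exp(-h / par[1])
    if form == 4:
        return par[0] * np.exp(-(h ** par[1]) / par[0])
    raise ValueError("form must be 1, 2, 3 or 4")


par = [0.98, 2.0]  # arbitrary example parameters

# At zero distance every form reduces to par[0].
at_zero = [float(ecf_sketch(0.0, par, form)) for form in (1, 2, 3, 4)]

# Freeze the parameters so that only the distance remains as an argument.
ecf_fun = partial(ecf_sketch, par=par, form=3)
decay = ecf_fun(np.array([0.0, 2.0, 4.0]))
```

With form 3, the value at h = par[1] is par[0] / e, so par[1] acts as a decorrelation length.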

xhydro.optimal_interpolation.ECF_climate_correction.eval_covariance_bin(distances: ndarray, values: ndarray, hmax_divider: float = 2.0, variogram_bins: int = 10) -> tuple[ndarray, ndarray, ndarray, ndarray]

Evaluate the covariance of the data within each distance bin.

Parameters

distances : np.ndarray

Array of distances for each data point.

values : np.ndarray

Array of values corresponding to each data point.

hmax_divider : float

Maximum distance for binning is set as hmax_divider times the maximum distance in the input data. Defaults to 2.

variogram_bins : int, optional

Number of bins to split the data to fit the semi-variogram for the ECF. Defaults to 10.

Returns

tuple[np.ndarray, np.ndarray, np.ndarray, np.ndarray]

Arrays for heights, covariance, standard deviation, row length.
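Grouping data points into distance bins and summarizing each bin can be sketched with NumPy as follows. This is a simplified illustration under stated assumptions, not the library's code: the helper bin_by_distance is hypothetical, the maximum binning distance hmax is passed in directly rather than derived from hmax_divider, the bins are equal-width, and the per-bin statistic is a plain mean and standard deviation of the values.

```python
import numpy as np


def bin_by_distance(distances, values, hmax, variogram_bins=10):
    """Summarize values in equal-width distance bins between 0 and hmax."""
    distances = np.asarray(distances, dtype=float)
    values = np.asarray(values, dtype=float)

    edges = np.linspace(0.0, hmax, variogram_bins + 1)
    # Bin index of each point; points at or beyond hmax fall outside all bins.
    idx = np.digitize(distances, edges) - 1

    heights = np.full(variogram_bins, np.nan)
    means = np.full(variogram_bins, np.nan)
    stds = np.full(variogram_bins, np.nan)
    counts = np.zeros(variogram_bins, dtype=int)

    for b in range(variogram_bins):
        in_bin = idx == b
        counts[b] = in_bin.sum()
        if counts[b] > 0:
            heights[b] = distances[in_bin].mean()
            means[b] = values[in_bin].mean()
            stds[b] = values[in_bin].std()
    return heights, means, stds, counts


heights, means, stds, counts = bin_by_distance(
    distances=[1.0, 2.0, 6.0, 7.0, 20.0],
    values=[1.0, 3.0, 5.0, 5.0, 9.0],
    hmax=10.0,
    variogram_bins=2,
)
```

The last point lies beyond hmax and is ignored, so each of the two bins summarizes two points.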

xhydro.optimal_interpolation.ECF_climate_correction.general_ecf(h: ndarray, par: list | ndarray, form: int)

Define the form of the Error Covariance Function (ECF) equations.

Parameters

h : float or array

The distance or distances at which to evaluate the ECF.

par : list or array-like

Parameters for the ECF equation.

form : int

The form of the ECF equation to use (1, 2, 3 or 4). See correction() for details.

Returns

float or array

The calculated ECF values based on the specified form.

xhydro.optimal_interpolation.ECF_climate_correction.initialize_stats_variables(heights: ndarray, covariances: ndarray, standard_deviations: ndarray, variogram_bins: int = 10) -> tuple

Initialize variables for statistical calculations in an Empirical Covariance Function (ECF).

Parameters

heights : np.ndarray

Array of heights.

covariances : np.ndarray

Array of covariances.

standard_deviations : np.ndarray

Array of standard deviations.

variogram_bins : int

Number of bins to split the data to fit the semi-variogram for the ECF. Defaults to 10.

Returns

tuple

A tuple containing the following:

  • distance: Array of distances.

  • covariance: Array of covariances.

  • covariance_weights: Array of weights for covariances.

  • valid_heights: Array of valid heights.

xhydro.optimal_interpolation.compare_result module

Compare results between simulations and observations.

xhydro.optimal_interpolation.compare_result.compare(qobs: Dataset, qsim: Dataset, flow_l1o: Dataset, station_correspondence: Dataset, observation_stations: list, percentile_to_plot: int = 50, show_comparison: bool = True)

Compare the simulated and leave-one-out cross-validated flows against the observations.

Parameters

qobs : xr.Dataset

Streamflow and catchment properties dataset for observed data.

qsim : xr.Dataset

Streamflow and catchment properties dataset for simulated data.

flow_l1o : xr.Dataset

Streamflow and catchment properties dataset for simulated leave-one-out cross-validation results.

station_correspondence : xr.Dataset

Matching between the tag in the simulated files and the observed station number for the obs dataset.

observation_stations : list

Observed hydrometric dataset stations to be used in the cross-validation step.

percentile_to_plot : int

Percentile value to plot (default is 50).

show_comparison : bool

Whether to display the comparison plots (default is True).

xhydro.optimal_interpolation.optimal_interpolation_fun module

Package containing the optimal interpolation functions.

xhydro.optimal_interpolation.optimal_interpolation_fun.execute_interpolation(qobs: Dataset, qsim: Dataset, station_correspondence: Dataset, observation_stations: list, ratio_var_bg: float = 0.15, percentiles: list[float] | None = None, variogram_bins: int = 10, parallelize: bool = False, max_cores: int = 1, leave_one_out_cv: bool = False, form: int = 3, hmax_divider: float = 2.0, p1_bnds: list | None = None, hmax_mult_range_bnds: list | None = None) -> Dataset

Run the interpolation algorithm for leave-one-out cross-validation or operational use.

Parameters

qobs : xr.Dataset

Streamflow and catchment properties dataset for observed data.

qsim : xr.Dataset

Streamflow and catchment properties dataset for simulated data.

station_correspondence : xr.Dataset

Correspondence between the tag in the simulated files and the observed station number for the obs dataset.

observation_stations : list

Observed hydrometric dataset stations to be used in the ECF function building and optimal interpolation application step.

ratio_var_bg : float

Ratio for background variance (default is 0.15).

percentiles : list(float), optional

List of percentiles to analyze (default is [25.0, 50.0, 75.0, 100.0]).

variogram_bins : int, optional

Number of bins to split the data to fit the semi-variogram for the ECF. Defaults to 10.

parallelize : bool

Whether to execute the interpolation in parallel (True) or in series (False). Defaults to False.

max_cores : int

Maximum number of cores to use for parallel processing. Defaults to 1.

leave_one_out_cv : bool

Flag to determine if the code should be run in leave-one-out cross-validation (True) or should be applied operationally (False). Defaults to False.

form : int

The form of the ECF equation to use (1, 2, 3 or 4; see the Notes of correction()). Defaults to 3.

hmax_divider : float

Maximum distance for binning is set as hmax_divider times the maximum distance in the input data. Defaults to 2.

p1_bnds : list, optional

The lower and upper bounds of the first parameter of the ECF equation for variogram fitting. Defaults to [0.95, 1].

hmax_mult_range_bnds : list, optional

The lower and upper bounds of the second parameter of the ECF equation for variogram fitting. Both bounds are multiplied by "hmax", which is calculated to be the threshold limit for the variogram sill. Defaults to [0.05, 3].

Returns

xr.Dataset

An xarray dataset containing the flow quantiles and all the associated metadata.
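In leave-one-out cross-validation, each observation station is withheld in turn and estimated from the remaining stations. The station bookkeeping can be sketched as below. The generator leave_one_out_splits and the station identifiers are hypothetical and only illustrate the principle; they do not reproduce how execute_interpolation organizes its loop.

```python
def leave_one_out_splits(observation_stations):
    """Yield (withheld_station, remaining_stations) for each station in turn."""
    for i, target in enumerate(observation_stations):
        # Every station except the withheld one is available as a donor.
        donors = observation_stations[:i] + observation_stations[i + 1:]
        yield target, donors


# Hypothetical station identifiers.
splits = list(leave_one_out_splits(["A", "B", "C"]))
```

Each station appears exactly once as the withheld target, so the number of interpolation runs equals the number of observation stations.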

xhydro.optimal_interpolation.optimal_interpolation_fun.optimal_interpolation(lat_obs: ndarray, lon_obs: ndarray, lat_est: ndarray, lon_est: ndarray, ecf: partial, bg_var_obs: ndarray, bg_var_est: ndarray, var_obs: ndarray, bg_departures: ndarray, bg_est: ndarray, precalcs: dict) -> tuple[ndarray, ndarray, dict]

Perform optimal interpolation to estimate values at specified locations.

Parameters

lat_obs : np.ndarray

Vector of latitudes of the catchment centroids of the observation stations.

lon_obs : np.ndarray

Vector of longitudes of the catchment centroids of the observation stations.

lat_est : np.ndarray

Vector of latitudes of the catchment centroids of the estimation/simulation stations.

lon_est : np.ndarray

Vector of longitudes of the catchment centroids of the estimation/simulation stations.

ecf : partial

The function to use for the empirical distribution correction. It is a partial function from functools. The error covariance is a function of distance h, and this partial function represents this relationship.

bg_var_obs : np.ndarray

Background field variance at the observation stations (vector of size "observation stations").

bg_var_est : np.ndarray

Background field variance at estimation sites (vector of size "estimation stations").

var_obs : np.ndarray

Observation variance at observation sites (vector of size "observation stations").

bg_departures : np.ndarray

Difference between observation and background field at observation sites (vector of size "observation stations").

bg_est : np.ndarray

Background field values at estimation sites (vector of size "estimation stations").

precalcs : dict

Additional arguments and state information for the interpolation process, to accelerate calculations between timesteps.

Returns

v_est : np.ndarray

Estimated values at the estimation sites (vector of size "estimation stations").

var_est : np.ndarray

Estimated variance at the estimation sites (vector of size "estimation stations").

precalcs : dict

Additional arguments and state information for the interpolation process, to accelerate calculations between timesteps. This variable returns the pre-calculated distance matrices.
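The role of each input can be illustrated with the textbook optimal interpolation update, in which the background at each estimation site is corrected by a weighted sum of the background departures. The sketch below is not the library's implementation: oi_sketch, haversine_km and exponential_ecf are hypothetical names, distances are assumed to be great-circle distances in kilometres, the precalcs cache is omitted, and the background covariance is assumed to be the product of the background standard deviations and the ECF value at the separating distance.

```python
from functools import partial

import numpy as np


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance matrix (km) between two sets of points in degrees."""
    lat1, lon1 = np.radians(lat1)[:, None], np.radians(lon1)[:, None]
    lat2, lon2 = np.radians(lat2)[None, :], np.radians(lon2)[None, :]
    a = (np.sin((lat2 - lat1) / 2) ** 2
         + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * np.arcsin(np.sqrt(a))


def oi_sketch(lat_obs, lon_obs, lat_est, lon_est, ecf,
              bg_var_obs, bg_var_est, var_obs, bg_departures, bg_est):
    """Textbook optimal interpolation: returns (v_est, var_est)."""
    h_oo = haversine_km(lat_obs, lon_obs, lat_obs, lon_obs)
    h_eo = haversine_km(lat_est, lon_est, lat_obs, lon_obs)

    # Background error covariances between sites (assumed form, see lead-in).
    b_oo = np.sqrt(np.outer(bg_var_obs, bg_var_obs)) * ecf(h_oo)
    b_eo = np.sqrt(np.outer(bg_var_est, bg_var_obs)) * ecf(h_eo)
    r = np.diag(var_obs)  # observation errors assumed uncorrelated

    # Weights W = B_eo (B_oo + R)^-1, computed with a linear solve.
    weights = np.linalg.solve(b_oo + r, b_eo.T).T
    v_est = bg_est + weights @ bg_departures
    var_est = bg_var_est - np.sum(weights * b_eo, axis=1)
    return v_est, var_est


def exponential_ecf(h, par):
    return par[0] * np.exp(-h / par[1])


ecf = partial(exponential_ecf, par=[1.0, 100.0])

# One observation station; two estimation sites: one collocated, one far away.
v_est, var_est = oi_sketch(
    lat_obs=np.array([46.0]), lon_obs=np.array([-71.0]),
    lat_est=np.array([46.0, 0.0]), lon_est=np.array([-71.0, 90.0]),
    ecf=ecf,
    bg_var_obs=np.array([2.0]), bg_var_est=np.array([2.0, 2.0]),
    var_obs=np.array([0.0]),
    bg_departures=np.array([3.0]), bg_est=np.array([10.0, 10.0]),
)
```

With a perfect observation (var_obs of zero), the collocated site takes the full departure and its variance drops to zero, while the distant site keeps its background value and variance.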

xhydro.optimal_interpolation.utilities module

Utilities required for managing data in the interpolation toolbox.

xhydro.optimal_interpolation.utilities.plot_results(kge, kge_l1o, nse, nse_l1o)

Generate a plot of the results of model evaluation using various metrics.

Parameters

kge : array-like

Kling-Gupta Efficiency for the entire dataset.

kge_l1o : array-like

Kling-Gupta Efficiency for leave-one-out cross-validation.

nse : array-like

Nash-Sutcliffe Efficiency for the entire dataset.

nse_l1o : array-like

Nash-Sutcliffe Efficiency for leave-one-out cross-validation.

Returns

None

No return.
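plot_results receives scores that have already been computed. For reference, the standard definitions of the two metrics can be written in a few lines of NumPy. The helpers nse_score and kge_score below are hypothetical and follow the usual textbook formulas (the 2009 formulation for KGE); they are not taken from xhydro.

```python
import numpy as np


def nse_score(obs, sim):
    """Nash-Sutcliffe Efficiency: 1 is a perfect fit, 0 matches the observed mean."""
    obs, sim = np.asarray(obs, dtype=float), np.asarray(sim, dtype=float)
    return 1.0 - np.sum((sim - obs) ** 2) / np.sum((obs - obs.mean()) ** 2)


def kge_score(obs, sim):
    """Kling-Gupta Efficiency from correlation, variability ratio and bias ratio."""
    obs, sim = np.asarray(obs, dtype=float), np.asarray(sim, dtype=float)
    r = np.corrcoef(obs, sim)[0, 1]   # linear correlation
    alpha = sim.std() / obs.std()     # variability ratio
    beta = sim.mean() / obs.mean()    # bias ratio
    return 1.0 - np.sqrt((r - 1) ** 2 + (alpha - 1) ** 2 + (beta - 1) ** 2)


obs = np.array([1.0, 2.0, 3.0, 4.0])
perfect = (nse_score(obs, obs), kge_score(obs, obs))
doubled = kge_score(obs, 2.0 * obs)
```

A simulation that doubles every observed flow keeps a perfect correlation but has variability and bias ratios of 2, which gives a KGE of 1 - sqrt(2).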

xhydro.optimal_interpolation.utilities.prepare_flow_percentiles_dataset(station_id, lon, lat, drain_area, time, percentile, discharge)

Write discharge data as an xarray.Dataset.

Parameters

station_id : array-like

List of station IDs.

lon : array-like

List of longitudes corresponding to each station.

lat : array-like

List of latitudes corresponding to each station.

drain_area : array-like

List of drainage areas corresponding to each station.

time : array-like

List of datetime objects representing time.

percentile : list or None

List of percentiles or None if not applicable.

discharge : numpy.ndarray

3D array of discharge data, dimensions (percentile, station, time).

Returns

xr.Dataset

The dataset containing the flow percentiles as generated by the optimal interpolation code.

Notes

  • The function creates and returns an xarray Dataset using the provided data.

  • The function includes appropriate metadata and attributes for each variable.
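A dataset with the (percentile, station, time) layout described above can be assembled directly with xarray, as in the sketch below. The variable names, dimension names, attributes and values are assumptions made for illustration; the dataset returned by prepare_flow_percentiles_dataset may name and describe its variables differently.

```python
import numpy as np
import xarray as xr

# Hypothetical inputs: 3 percentiles, 2 stations, 3 daily timesteps.
station_id = ["A", "B"]
time = np.array(["2020-01-01", "2020-01-02", "2020-01-03"], dtype="datetime64[ns]")
percentile = [25.0, 50.0, 75.0]
discharge = np.arange(18, dtype=float).reshape(3, 2, 3)

ds = xr.Dataset(
    data_vars={
        # Discharge with dimensions (percentile, station, time).
        "streamflow": (("percentile", "station_id", "time"), discharge),
        # Per-station properties share the station dimension.
        "lon": (("station_id",), [-71.2, -72.5]),
        "lat": (("station_id",), [46.8, 47.1]),
        "drainage_area": (("station_id",), [1200.0, 350.0]),
    },
    coords={"percentile": percentile, "station_id": station_id, "time": time},
    attrs={"description": "Illustrative flow percentiles dataset"},
)
ds["streamflow"].attrs["units"] = "m3 s-1"

# Select the median series for one station.
median_b = ds["streamflow"].sel(percentile=50.0, station_id="B").values
```

Keeping the station properties on the same station dimension as the discharge lets a single selection by station return both the flows and the catchment metadata.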