xhydro.optimal_interpolation package

Optimal Interpolation module.

Submodules

xhydro.optimal_interpolation.ECF_climate_correction module

Empirical Covariance Function variogram calibration package.

xhydro.optimal_interpolation.ECF_climate_correction.calculate_ECF_stats(distance: ndarray, covariance: ndarray, covariance_weights: ndarray, valid_heights: ndarray) tuple[source]

Calculate statistics for Empirical Covariance Function (ECF), climatological version.

Uses the histogram data from all previous days and reapplies the same steps, but inputs are of size (timesteps x variogram_bins). So if we use many days to compute the histogram bins, we get a histogram per day. This function generates a single output from a new histogram.

Parameters

distancenp.ndarray

Array of distances.

covariancenp.ndarray

Array of covariances.

covariance_weightsnp.ndarray

Array of weights for covariances.

valid_heightsnp.ndarray

Array of valid heights.

Returns

tuple

A tuple containing the following: - h_b: Array of mean distances for each height bin. - cov_b: Array of weighted average covariances for each height bin. - std_b: Array of standard deviations for each height bin.

xhydro.optimal_interpolation.ECF_climate_correction.correction(da_qobs: DataArray, da_qsim: DataArray, centroid_lon_obs: ndarray, centroid_lat_obs: ndarray, variogram_bins: int = 10, form: int = 3, hmax_divider: float = 2.0, p1_bnds: list = None, hmax_mult_range_bnds: list = None) tuple[source]

Perform correction on flow observations using optimal interpolation.

Parameters

da_qobsxr.DataArray

An xarray DataArray of observed flow data.

da_qsimxr.DataArray

An xarray DataArray of simulated flow data.

centroid_lon_obsnp.ndarray

Longitude vector of the catchment centroids for the observed stations.

centroid_lat_obsnp.ndarray

Latitude vector of the catchment centroids for the observed stations.

variogram_binsint, optional

Number of bins to split the data to fit the semi-variogram for the ECF. Defaults to 10.

formint

The form of the ECF equation to use (1, 2, 3 or 4. See Notes below).

hmax_dividerfloat

Maximum distance for binning is set as hmax_divider times the maximum distance in the input data. Defaults to 2.

p1_bndslist, optional

The lower and upper bounds of the parameters for the first parameter of the ECF equation for variogram fitting. Defaults to [0.95, 1.0].

hmax_mult_range_bndslist, optional

The lower and upper bounds of the parameters for the second parameter of the ECF equation for variogram fitting. It is multiplied by “hmax”, which is calculated to be the threshold limit for the variogram sill. Defaults to [0.05, 3.0].

Returns

tuple

A tuple containing the following: - ecf_fun: Partial function for the error covariance function. - par_opt: Optimized parameters for the interpolation.

Notes

The possible forms for the ecf function fitting are as follows:
Form 1 (From Lachance-Cloutier et al. 2017; and Garand & Grassotti 1995) :

ecf_fun = par[0] * (1 + h / par[1]) * np.exp(-h / par[1])

Form 2 (Gaussian form) :

ecf_fun = par[0] * np.exp(-0.5 * np.power(h / par[1], 2))

Form 3 :

ecf_fun = par[0] * np.exp(-h / par[1])

Form 4 :

ecf_fun = par[0] * np.exp(-(h ** par[1]) / par[0])

xhydro.optimal_interpolation.ECF_climate_correction.eval_covariance_bin(distances: ndarray, values: ndarray, hmax_divider: float = 2.0, variogram_bins: int = 10)[source]

Evaluate the covariance of a binomial distribution.

Parameters

distancesnp.ndarray

Array of distances for each data point.

valuesnp.ndarray

Array of values corresponding to each data point.

hmax_dividerfloat

Maximum distance for binning is set as hmax_divider times the maximum distance in the input data. Defaults to 2.

variogram_binsint, optional

Number of bins to split the data to fit the semi-variogram for the ECF. Defaults to 10.

Returns

tuple

Arrays for heights, covariance, standard deviation, row length.

xhydro.optimal_interpolation.ECF_climate_correction.general_ecf(h: ndarray, par: list, form: int)[source]

Define the form of the Error Covariance Function (ECF) equations.

Parameters

hfloat or array

The distance or distances at which to evaluate the ECF.

parlist

Parameters for the ECF equation.

formint

The form of the ECF equation to use (1, 2, 3 or 4). See correction() for details.

Returns

float or array:

The calculated ECF values based on the specified form.

xhydro.optimal_interpolation.ECF_climate_correction.initialize_stats_variables(heights: ndarray, covariances: ndarray, standard_deviations: ndarray, variogram_bins: int = 10) tuple[source]

Initialize variables for statistical calculations in an Empirical Covariance Function (ECF).

Parameters

heightsnp.ndarray

Array of heights.

covariancesnp.ndarray

Array of covariances.

standard_deviationsnp.ndarray

Array of standard deviations.

variogram_binsint, optional

Number of bins to split the data to fit the semi-variogram for the ECF. Defaults to 10.

Returns

tuple

A tuple containing the following: - distance: Array of distances. - covariance: Array of covariances. - covariance_weights: Array of weights for covariances. - valid_heights: Array of valid heights.

xhydro.optimal_interpolation.compare_result module

Compare results between simulations and observations.

xhydro.optimal_interpolation.compare_result.compare(qobs: Dataset, qsim: Dataset, flow_l1o: Dataset, station_correspondence: Dataset, observation_stations: list, percentile_to_plot: int = 50, show_comparison: bool = True)[source]

Start the computation of the comparison method.

Parameters

qobsxr.Dataset

Streamflow and catchment properties dataset for observed data.

qsimxr.Dataset

Streamflow and catchment properties dataset for simulated data.

flow_l1oxr.Dataset

Streamflow and catchment properties dataset for simulated leave-one-out cross-validation results.

station_correspondencexr.Dataset

Matching between the tag in the simulated files and the observed station number for the obs dataset.

observation_stationslist

Observed hydrometric dataset stations to be used in the cross-validation step.

percentile_to_plotint

Percentile value to plot (default is 50).

show_comparisonbool

Whether to display the comparison plots (default is True).

xhydro.optimal_interpolation.optimal_interpolation_fun module

Package containing the optimal interpolation functions.

xhydro.optimal_interpolation.optimal_interpolation_fun.execute_interpolation(qobs: Dataset, qsim: Dataset, station_correspondence: Dataset, observation_stations: list, ratio_var_bg: float = 0.15, percentiles: list[float] | None = None, variogram_bins: int = 10, parallelize: bool = False, max_cores: int = 1, leave_one_out_cv: bool = False, form: int = 3, hmax_divider: float = 2.0, p1_bnds: list = None, hmax_mult_range_bnds: list = None)[source]

Run the interpolation algorithm for leave-one-out cross-validation or operational use.

Parameters

qobsxr.Dataset

Streamflow and catchment properties dataset for observed data.

qsimxr.Dataset

Streamflow and catchment properties dataset for simulated data.

station_correspondencexr.Dataset

Correspondence between the tag in the simulated files and the observed station number for the obs dataset.

observation_stationslist

Observed hydrometric dataset stations to be used in the ECF function building and optimal interpolation application step.

ratio_var_bgfloat

Ratio for background variance (default is 0.15).

percentileslist(float), optional

List of percentiles to analyze (default is [25.0, 50.0, 75.0, 100.0]).

variogram_binsint, optional

Number of bins to split the data to fit the semi-variogram for the ECF. Defaults to 10.

parallelizebool

Execute the profiler in parallel or in series (default is False).

max_coresint

Maximum number of cores to use for parallel processing.

leave_one_out_cvbool

Flag to determine if the code should be run in leave-one-out cross-validation (True) or should be applied operationally (False).

formint

The form of the ECF equation to use (1, 2, 3 or 4. See documentation).

hmax_dividerfloat

Maximum distance for binning is set as hmax_divider times the maximum distance in the input data. Defaults to 2.

p1_bndslist, optional

The lower and upper bounds of the parameters for the first parameter of the ECF equation for variogram fitting. Defaults to [0.95, 1].

hmax_mult_range_bndslist, optional

The lower and upper bounds of the parameters for the second parameter of the ECF equation for variogram fitting. It is multiplied by “hmax”, which is calculated to be the threshold limit for the variogram sill. Defaults to [0.05, 3].

Returns

flow_quantileslist

A list containing the flow quantiles for each desired percentile.

dsxr.Dataset

An xarray dataset containing the flow quantiles and all the associated metadata.

xhydro.optimal_interpolation.optimal_interpolation_fun.optimal_interpolation(lat_obs: ndarray, lon_obs: ndarray, lat_est: ndarray, lon_est: ndarray, ecf: partial, bg_var_obs: ndarray, bg_var_est: ndarray, var_obs: ndarray, bg_departures: ndarray, bg_est: ndarray, precalcs: dict)[source]

Perform optimal interpolation to estimate values at specified locations.

Parameters

lat_obsnp.ndarray

Vector of latitudes of the observation stations catchment centroids.

lon_obsnp.ndarray

Vector of longitudes of the observation stations catchment centroids.

lat_estnp.ndarray

Vector of latitudes of the estimation/simulation stations catchment centroids.

lon_estnp.ndarray

Vector of longitudes of the estimation/simulation stations catchment centroids.

ecfpartial

The function to use for the empirical distribution correction. It is a partial function from functools. The error covariance is a function of distance h, and this partial function represents this relationship.

bg_var_obsnp.ndarray

Background field variance at the observation stations (vector of size “observation stations”).

bg_var_estnp.ndarray

Background field variance at estimation sites (vector of size “estimation stations”).

var_obsnp.ndarray

Observation variance at observation sites (vector of size “observation stations”).

bg_departuresnp.ndarray

Difference between observation and background field at observation sites (vector of size “observation stations”).

bg_estnp.ndarray

Background field values at estimation sites (vector of size “estimation stations”).

precalcsdict

Additional arguments and state information for the interpolation process, to accelerate calculations between timesteps.

Returns

v_estfloat

Estimated values at the estimation sites (vector of size “estimation stations”).

var_estfloat

Estimated variance at the estimation sites (vector of size “estimation stations”).

precalcsdict

Additional arguments and state information for the interpolation process, to accelerate calculations between timesteps. This variable returns the pre-calcualted distance matrices.

xhydro.optimal_interpolation.utilities module

Utilities required for managing data in the interpolation toolbox.

xhydro.optimal_interpolation.utilities.plot_results(kge, kge_l1o, nse, nse_l1o)[source]

Generate a plot of the results of model evaluation using various metrics.

Parameters

kgearray-like

Kling-Gupta Efficiency for the entire dataset.

kge_l1oarray-like

Kling-Gupta Efficiency for leave-one-out cross-validation.

nsearray-like

Nash-Sutcliffe Efficiency for the entire dataset.

nse_l1oarray-like

Nash-Sutcliffe Efficiency for leave-one-out cross-validation.

Returns

None :

No return.

xhydro.optimal_interpolation.utilities.prepare_flow_percentiles_dataset(station_id, lon, lat, drain_area, time, percentile, discharge)[source]

Write discharge data as an xarray.Dataset.

Parameters

station_idarray-like

List of station IDs.

lonarray-like

List of longitudes corresponding to each station.

latarray-like

List of latitudes corresponding to each station.

drain_areaarray-like

List of drainage areas corresponding to each station.

timearray-like

List of datetime objects representing time.

percentilelist or None

List of percentiles or None if not applicable.

dischargenumpy.ndarray

3D array of discharge data, dimensions (percentile, station, time).

Returns

xr.Dataset :

The dataset containing the flow percentiles as generated by the optimal interpolation code.

Notes

  • The function creates and returns an xarray Dataset using the provided data.

  • The function includes appropriate metadata and attributes for each variable.