xhydro.optimal_interpolation package¶

Optimal Interpolation module.

Submodules¶

xhydro.optimal_interpolation.ECF_climate_correction module¶

Empirical Covariance Function variogram calibration package.

xhydro.optimal_interpolation.ECF_climate_correction.calculate_ECF_stats(distance: ndarray, covariance: ndarray, covariance_weights: ndarray, valid_heights: ndarray) → tuple[source]¶

Calculate statistics for Empirical Covariance Function (ECF), climatological version.

Uses the histogram data from all previous days and reapplies the same steps, but inputs are of size (timesteps x variogram_bins). So if we use many days to compute the histogram bins, we get a histogram per day. This function generates a single output from a new histogram.

Parameters¶

distancenp.ndarray: Array of distances.
covariancenp.ndarray: Array of covariances.
covariance_weightsnp.ndarray: Array of weights for covariances.
valid_heightsnp.ndarray: Array of valid heights.

Returns¶

tuple: A tuple containing the following: - h_b: Array of mean distances for each height bin. - cov_b: Array of weighted average covariances for each height bin. - std_b: Array of standard deviations for each height bin.

xhydro.optimal_interpolation.ECF_climate_correction.correction(da_qobs: DataArray, da_qsim: DataArray, centroid_lon_obs: ndarray, centroid_lat_obs: ndarray, variogram_bins: int = 10, form: int = 3, hmax_divider: float = 2.0, p1_bnds: list = None, hmax_mult_range_bnds: list = None) → tuple[source]¶

Perform correction on flow observations using optimal interpolation.

Parameters¶

da_qobsxr.DataArray: An xarray DataArray of observed flow data.
da_qsimxr.DataArray: An xarray DataArray of simulated flow data.
centroid_lon_obsnp.ndarray: Longitude vector of the catchment centroids for the observed stations.
centroid_lat_obsnp.ndarray: Latitude vector of the catchment centroids for the observed stations.
variogram_binsint, optional: Number of bins to split the data to fit the semi-variogram for the ECF. Defaults to 10.
formint: The form of the ECF equation to use (1, 2, 3 or 4. See Notes below).
hmax_dividerfloat: Maximum distance for binning is set as hmax_divider times the maximum distance in the input data. Defaults to 2.
p1_bndslist, optional: The lower and upper bounds of the parameters for the first parameter of the ECF equation for variogram fitting. Defaults to [0.95, 1.0].
hmax_mult_range_bndslist, optional: The lower and upper bounds of the parameters for the second parameter of the ECF equation for variogram fitting. It is multiplied by “hmax”, which is calculated to be the threshold limit for the variogram sill. Defaults to [0.05, 3.0].

Returns¶

tuple: A tuple containing the following: - ecf_fun: Partial function for the error covariance function. - par_opt: Optimized parameters for the interpolation.

Notes¶

The possible forms for the ecf function fitting are as follows:

Form 1 (From Lachance-Cloutier et al. 2017; and Garand & Grassotti 1995) :: ecf_fun = par[0] * (1 + h / par[1]) * np.exp(-h / par[1])
Form 2 (Gaussian form) :: ecf_fun = par[0] * np.exp(-0.5 * np.power(h / par[1], 2))
Form 3 :: ecf_fun = par[0] * np.exp(-h / par[1])
Form 4 :: ecf_fun = par[0] * np.exp(-(h ** par[1]) / par[0])

xhydro.optimal_interpolation.ECF_climate_correction.eval_covariance_bin(distances: ndarray, values: ndarray, hmax_divider: float = 2.0, variogram_bins: int = 10)[source]¶

Evaluate the covariance of a binomial distribution.

Parameters¶

distancesnp.ndarray: Array of distances for each data point.
valuesnp.ndarray: Array of values corresponding to each data point.
hmax_dividerfloat: Maximum distance for binning is set as hmax_divider times the maximum distance in the input data. Defaults to 2.
variogram_binsint, optional: Number of bins to split the data to fit the semi-variogram for the ECF. Defaults to 10.

Returns¶

tuple: Arrays for heights, covariance, standard deviation, row length.

xhydro.optimal_interpolation.ECF_climate_correction.general_ecf(h: ndarray, par: list, form: int)[source]¶

Define the form of the Error Covariance Function (ECF) equations.

Parameters¶

hfloat or array: The distance or distances at which to evaluate the ECF.
parlist: Parameters for the ECF equation.
formint: The form of the ECF equation to use (1, 2, 3 or 4). See correction() for details.

Returns¶

float or array:: The calculated ECF values based on the specified form.

xhydro.optimal_interpolation.ECF_climate_correction.initialize_stats_variables(heights: ndarray, covariances: ndarray, standard_deviations: ndarray, variogram_bins: int = 10) → tuple[source]¶

Initialize variables for statistical calculations in an Empirical Covariance Function (ECF).

Parameters¶

heightsnp.ndarray: Array of heights.
covariancesnp.ndarray: Array of covariances.
standard_deviationsnp.ndarray: Array of standard deviations.
variogram_binsint, optional: Number of bins to split the data to fit the semi-variogram for the ECF. Defaults to 10.

Returns¶

tuple: A tuple containing the following: - distance: Array of distances. - covariance: Array of covariances. - covariance_weights: Array of weights for covariances. - valid_heights: Array of valid heights.

xhydro.optimal_interpolation.compare_result module¶

Compare results between simulations and observations.

xhydro.optimal_interpolation.compare_result.compare(qobs: Dataset, qsim: Dataset, flow_l1o: Dataset, station_correspondence: Dataset, observation_stations: list, percentile_to_plot: int = 50, show_comparison: bool = True)[source]¶

Start the computation of the comparison method.

Parameters¶

qobsxr.Dataset: Streamflow and catchment properties dataset for observed data.
qsimxr.Dataset: Streamflow and catchment properties dataset for simulated data.
flow_l1oxr.Dataset: Streamflow and catchment properties dataset for simulated leave-one-out cross-validation results.
station_correspondencexr.Dataset: Matching between the tag in the simulated files and the observed station number for the obs dataset.
observation_stationslist: Observed hydrometric dataset stations to be used in the cross-validation step.
percentile_to_plotint: Percentile value to plot (default is 50).
show_comparisonbool: Whether to display the comparison plots (default is True).

xhydro.optimal_interpolation.optimal_interpolation_fun module¶

Package containing the optimal interpolation functions.

xhydro.optimal_interpolation.optimal_interpolation_fun.execute_interpolation(qobs: Dataset, qsim: Dataset, station_correspondence: Dataset, observation_stations: list, ratio_var_bg: float = 0.15, percentiles: list[float] | None = None, variogram_bins: int = 10, parallelize: bool = False, max_cores: int = 1, leave_one_out_cv: bool = False, form: int = 3, hmax_divider: float = 2.0, p1_bnds: list = None, hmax_mult_range_bnds: list = None)[source]¶

Run the interpolation algorithm for leave-one-out cross-validation or operational use.

Parameters¶

qobsxr.Dataset: Streamflow and catchment properties dataset for observed data.
qsimxr.Dataset: Streamflow and catchment properties dataset for simulated data.
station_correspondencexr.Dataset: Correspondence between the tag in the simulated files and the observed station number for the obs dataset.
observation_stationslist: Observed hydrometric dataset stations to be used in the ECF function building and optimal interpolation application step.
ratio_var_bgfloat: Ratio for background variance (default is 0.15).
percentileslist(float), optional: List of percentiles to analyze (default is [25.0, 50.0, 75.0, 100.0]).
variogram_binsint, optional: Number of bins to split the data to fit the semi-variogram for the ECF. Defaults to 10.
parallelizebool: Execute the profiler in parallel or in series (default is False).
max_coresint: Maximum number of cores to use for parallel processing.
leave_one_out_cvbool: Flag to determine if the code should be run in leave-one-out cross-validation (True) or should be applied operationally (False).
formint: The form of the ECF equation to use (1, 2, 3 or 4. See documentation).
hmax_dividerfloat: Maximum distance for binning is set as hmax_divider times the maximum distance in the input data. Defaults to 2.
p1_bndslist, optional: The lower and upper bounds of the parameters for the first parameter of the ECF equation for variogram fitting. Defaults to [0.95, 1].
hmax_mult_range_bndslist, optional: The lower and upper bounds of the parameters for the second parameter of the ECF equation for variogram fitting. It is multiplied by “hmax”, which is calculated to be the threshold limit for the variogram sill. Defaults to [0.05, 3].

Returns¶

flow_quantileslist: A list containing the flow quantiles for each desired percentile.
dsxr.Dataset: An xarray dataset containing the flow quantiles and all the associated metadata.

xhydro.optimal_interpolation.optimal_interpolation_fun.optimal_interpolation(lat_obs: ndarray, lon_obs: ndarray, lat_est: ndarray, lon_est: ndarray, ecf: partial, bg_var_obs: ndarray, bg_var_est: ndarray, var_obs: ndarray, bg_departures: ndarray, bg_est: ndarray, precalcs: dict)[source]¶

Perform optimal interpolation to estimate values at specified locations.

Parameters¶

lat_obsnp.ndarray: Vector of latitudes of the observation stations catchment centroids.
lon_obsnp.ndarray: Vector of longitudes of the observation stations catchment centroids.
lat_estnp.ndarray: Vector of latitudes of the estimation/simulation stations catchment centroids.
lon_estnp.ndarray: Vector of longitudes of the estimation/simulation stations catchment centroids.
ecfpartial: The function to use for the empirical distribution correction. It is a partial function from functools. The error covariance is a function of distance h, and this partial function represents this relationship.
bg_var_obsnp.ndarray: Background field variance at the observation stations (vector of size “observation stations”).
bg_var_estnp.ndarray: Background field variance at estimation sites (vector of size “estimation stations”).
var_obsnp.ndarray: Observation variance at observation sites (vector of size “observation stations”).
bg_departuresnp.ndarray: Difference between observation and background field at observation sites (vector of size “observation stations”).
bg_estnp.ndarray: Background field values at estimation sites (vector of size “estimation stations”).
precalcsdict: Additional arguments and state information for the interpolation process, to accelerate calculations between timesteps.

Returns¶

v_estfloat: Estimated values at the estimation sites (vector of size “estimation stations”).
var_estfloat: Estimated variance at the estimation sites (vector of size “estimation stations”).
precalcsdict: Additional arguments and state information for the interpolation process, to accelerate calculations between timesteps. This variable returns the pre-calcualted distance matrices.

xhydro.optimal_interpolation.utilities module¶

Utilities required for managing data in the interpolation toolbox.

xhydro.optimal_interpolation.utilities.plot_results(kge, kge_l1o, nse, nse_l1o)[source]¶

Generate a plot of the results of model evaluation using various metrics.

Parameters¶

kgearray-like: Kling-Gupta Efficiency for the entire dataset.
kge_l1oarray-like: Kling-Gupta Efficiency for leave-one-out cross-validation.
nsearray-like: Nash-Sutcliffe Efficiency for the entire dataset.
nse_l1oarray-like: Nash-Sutcliffe Efficiency for leave-one-out cross-validation.

Returns¶

None :: No return.

xhydro.optimal_interpolation.utilities.prepare_flow_percentiles_dataset(station_id, lon, lat, drain_area, time, percentile, discharge)[source]¶

Write discharge data as an xarray.Dataset.

Parameters¶

station_idarray-like: List of station IDs.
lonarray-like: List of longitudes corresponding to each station.
latarray-like: List of latitudes corresponding to each station.
drain_areaarray-like: List of drainage areas corresponding to each station.
timearray-like: List of datetime objects representing time.
percentilelist or None: List of percentiles or None if not applicable.
dischargenumpy.ndarray: 3D array of discharge data, dimensions (percentile, station, time).

Returns¶

xr.Dataset :: The dataset containing the flow percentiles as generated by the optimal interpolation code.

Notes¶

The function creates and returns an xarray Dataset using the provided data.
The function includes appropriate metadata and attributes for each variable.