xhydro.modelling package

The Hydrotel Hydrological Model module.

Submodules

xhydro.modelling._hm module

Hydrological model class.

class xhydro.modelling._hm.HydrologicalModel[source]

Bases: ABC

Hydrological model class.

This class is a wrapper for the different hydrological models that can be used in xhydro.

abstractmethod get_inputs(**kwargs) Dataset[source]

Get the input data for the hydrological model.

Parameters

**kwargsdict

Additional keyword arguments for the hydrological model.

Returns

xr.Dataset

Input data for the hydrological model, in xarray Dataset format.

abstractmethod get_streamflow(**kwargs) Dataset[source]

Get the simulated streamflow data from the hydrological model.

Parameters

**kwargsdict

Additional keyword arguments for the hydrological model.

Returns

xr.Dataset

Simulated streamflow from the hydrological model, in xarray Dataset format.

abstractmethod run(**kwargs) Dataset[source]

Run the hydrological model.

Parameters

**kwargsdict

Additional keyword arguments for the hydrological model.

Returns

xr.Dataset

Simulated streamflow from the hydrological model, in xarray Dataset format.
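As a rough illustration of the interface above, the sketch below defines a stand-in abstract base class with the same three abstract methods, plus a toy subclass. The names HydrologicalModelSketch and ConstantRunoffModel are hypothetical and are not part of xhydro. Plain dictionaries stand in for xr.Dataset to keep the example dependency-free.

```python
from abc import ABC, abstractmethod


class HydrologicalModelSketch(ABC):
    """Stand-in for the abstract interface (hypothetical, not xhydro's class)."""

    @abstractmethod
    def run(self, **kwargs) -> dict:
        """Run the model and return the simulated streamflow."""

    @abstractmethod
    def get_inputs(self, **kwargs) -> dict:
        """Return the input data of the model."""

    @abstractmethod
    def get_streamflow(self, **kwargs) -> dict:
        """Return the simulated streamflow."""


class ConstantRunoffModel(HydrologicalModelSketch):
    """Toy model: streamflow is a fixed fraction of precipitation."""

    def __init__(self, precipitation, runoff_coefficient):
        self.precipitation = list(precipitation)
        self.runoff_coefficient = runoff_coefficient
        self._streamflow = None

    def get_inputs(self, **kwargs) -> dict:
        return {"pr": list(self.precipitation)}

    def run(self, **kwargs) -> dict:
        self._streamflow = [self.runoff_coefficient * p for p in self.precipitation]
        return self.get_streamflow()

    def get_streamflow(self, **kwargs) -> dict:
        if self._streamflow is None:
            raise RuntimeError("Call run() before get_streamflow().")
        return {"q": list(self._streamflow)}


model = ConstantRunoffModel(precipitation=[0.0, 10.0, 4.0], runoff_coefficient=0.5)
simulated = model.run()
```

Because all three methods are abstract, the base class itself cannot be instantiated. Only subclasses that implement every method can.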

xhydro.modelling._hydrotel module

Class to handle Hydrotel simulations.

class xhydro.modelling._hydrotel.Hydrotel(project_dir: str | PathLike, project_file: str, executable: str | PathLike, *, project_config: dict | None = None, simulation_config: dict | None = None, output_config: dict | None = None)[source]

Bases: HydrologicalModel

Class to handle HYDROTEL simulations.

Parameters

project_dirstr or Path

Path to the project folder.

project_filestr

Name of the project file (e.g. ‘projet.csv’).

executablestr or Path

Command to execute HYDROTEL. On Windows, this should be the path to hydrotel.exe.

project_configdict, optional

Dictionary of configuration options to overwrite in the project file.

simulation_configdict, optional

Dictionary of configuration options to overwrite in the simulation file. See the Notes section for more details.

output_configdict, optional

Dictionary of configuration options to overwrite in the output file (output.csv).

Notes

The name of the simulation file must match the name of the ‘SIMULATION COURANTE’ option in the project file.

This class is designed to handle the execution of HYDROTEL simulations, with the ability to overwrite configuration options, but it does not handle the creation of the project folder itself. The project folder must be created beforehand.

For more information on how to configure the project, refer to the documentation of HYDROTEL: https://github.com/INRS-Modelisation-hydrologique/hydrotel
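The naming constraint in the first note can be checked before launching a run. The sketch below is only an illustration: simulation_name_matches is a hypothetical helper, the project options are assumed to be already parsed into a dictionary, and the comparison ignores the file extension, which is an assumption about how the match is made.

```python
from pathlib import Path


def simulation_name_matches(project_options: dict, simulation_file: str) -> bool:
    """Compare 'SIMULATION COURANTE' with the simulation file name (extension ignored)."""
    current = str(project_options.get("SIMULATION COURANTE", "")).strip()
    return current != "" and current == Path(simulation_file).stem


# Hypothetical content of an already-parsed project file.
project_options = {"SIMULATION COURANTE": "simulation"}

ok = simulation_name_matches(project_options, "simulation/simulation.csv")
mismatch = simulation_name_matches(project_options, "simulation/other.csv")
```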

aggregate_outputs(to: Literal['subbasin', 'drainage_area'], subset: list[str] | None = None, **kwargs) None[source]

Aggregate the model outputs to a different spatial unit. See the Notes section for more details.

Parameters

to{“subbasin”, “drainage_area”}

The spatial unit to aggregate to.

subsetlist[str] | None

The list of variables to aggregate. If None, all variables will be processed. The strings should match the names produced by the HYDROTEL model.

**kwargsdict

Keyword arguments to pass to xarray.open_dataset().

Returns

None

The aggregated outputs will be saved as new NetCDF files in the output directory, with a name pattern roughly following what is produced by HYDROTEL (e.g. “{variable}_By{aggregation}.nc”). Aggregation will be ‘BySubbasin’ or ‘ByDrainageArea’, depending on the ‘to’ parameter.

Notes

Unlike Raven, HYDROTEL always produces output files at the RHHU level, which is the finest spatial unit in the model. Therefore, unlike its Raven variant, this method does not need a ‘by’ parameter to specify the spatial unit of the input files. Furthermore, this method expects that the ‘standardize_outputs’ method has been called beforehand to ensure that the output files are in a consistent format and contain the necessary spatial information for the aggregation.

get_inputs(subset_time: bool = False, return_config=False, **kwargs) Dataset | tuple[Dataset, dict][source]

Get the weather file from the simulation.

Parameters

subset_timebool

If True, only return the weather data for the time period specified in the simulation configuration file.

return_configbool

Whether to return the configuration file as well. If True, returns a tuple of (dataset, configuration).

**kwargsdict

Keyword arguments to pass to xarray.open_dataset().

Returns

xr.Dataset

If ‘return_config’ is False, returns the weather file.

Tuple[xr.Dataset, dict]

If ‘return_config’ is True, returns the weather file and its configuration.

get_outputs(output: str, return_paths: bool = False, **kwargs) Dataset | Path | list[Path][source]

Get the outputs of the simulation.

Parameters

outputstr

“path” to return the output directory. Otherwise, the name of the output to retrieve, or “q” for the streamflow. This should match the name of the output file without the extension (e.g. “neige” for “neige.nc”).

return_pathsbool

If True, return the path to the output file(s) instead of the dataset. Default is False.

**kwargsdict

Keyword arguments to pass to xarray.open_dataset().

Returns

xr.Dataset

The requested output variable.

Path

The path to the output directory if output is set to “path”.

list[Path]

The path to the output file(s) if return_paths is True.

get_streamflow(**kwargs) Dataset[source]

Get the streamflow from the simulation.

Parameters

**kwargsdict

Keyword arguments to pass to xarray.open_dataset().

Returns

xr.Dataset

The streamflow file.

get_watershed_properties()[source]

Retrieve the properties of the watershed from the input files and store them in the class attributes for later use.

It is assumed that the properties of the RHHUs are created by Physitel and follow the standard HYDROTEL structure. See https://github.com/INRS-Modelisation-hydrologique/hydrotel/tree/main/Docs for more information on the input files.

run(*, run_options: list[str] | None = None, dry_run: bool = False, overwrite: bool = False, standardize: bool = True, return_streamflow: bool = True) str | Dataset[source]

Run the simulation.

Parameters

run_optionslist[str] | None

Additional options to pass to the HYDROTEL executable. Common arguments include:

  • -t NUM: Run the simulation using a given number of threads (default is 1).

  • -c: Skip the validation of the input files.

  • -s: Skip the interpolation of missing values in the input files. Only use this if you are sure that the input files are complete.

Call the executable without arguments to see the full list of available options.

dry_runbool

If True, returns the command to run the simulation without actually running it.

overwritebool

If True, overwrite the output files if they already exist. Default is False.

standardizebool

If True, standardize the output files to ensure they are in a consistent format. Default is True.

return_streamflowbool

If True, return the simulated streamflow. Default is True.

Returns

str

The command to run the simulation, if ‘dry_run’ is True.

xr.Dataset

The streamflow file, if ‘dry_run’ is False.

standardize_outputs(files: list[str] | None = None, **kwargs)[source]

Standardize the outputs of the simulation to be more consistent with CF conventions.

Parameters

fileslist[str] | None

Names of the output files to standardize. If None, all output files will be standardized. The strings can be part of the file name (e.g. “debit_aval”, “neige”, “debit*”, etc.).

**kwargsdict

Keyword arguments to pass to xarray.open_dataset().

Notes

Be aware that since some systems, such as Windows, do not allow overwriting files that are currently open, a temporary file will be created and then renamed to overwrite the original file.
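The temporary-file approach described in this note can be pictured with the standard library alone. This is a generic sketch of the pattern, not the code used by xhydro. The helper rewrite_via_temporary_file and the file content are hypothetical, and bytes stand in for a NetCDF dataset.

```python
import os
import tempfile
from pathlib import Path


def rewrite_via_temporary_file(path: Path, new_content: bytes) -> None:
    """Write to a temporary file in the same folder, then rename it over `path`."""
    fd, tmp_name = tempfile.mkstemp(dir=path.parent, suffix=".tmp")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(new_content)
        # The temporary file is closed at this point, so it can be renamed safely.
        os.replace(tmp_name, path)
    except BaseException:
        # Clean up the temporary file if anything went wrong, then re-raise.
        if os.path.exists(tmp_name):
            os.remove(tmp_name)
        raise


with tempfile.TemporaryDirectory() as workdir:
    target = Path(workdir) / "neige.nc"
    target.write_bytes(b"original")
    rewrite_via_temporary_file(target, b"standardized")
    result = target.read_bytes()
    leftovers = sorted(p.name for p in Path(workdir).iterdir())
```

Creating the temporary file in the same folder as the target keeps the rename on a single filesystem.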

update_config(*, project_config: dict | None = None, simulation_config: dict | None = None, output_config: dict | None = None)[source]

Update the configuration options in the project, simulation, and output files.

Parameters

project_configdict, optional

Dictionary of configuration options to overwrite in the project file.

simulation_configdict, optional

Dictionary of configuration options to overwrite in the simulation file.

output_configdict, optional

Dictionary of configuration options to overwrite in the output file (output.csv).

xhydro.modelling._model_utils module

Hidden utilities for HYDROTEL and RavenPy models.

xhydro.modelling._model_utils.aggregate_output(ds: Dataset, by: Literal['hru', 'rhhu', 'unit', 'subbasin'], to: Literal['subbasin', 'drainage_area'], weights: DataArray | None = None) tuple[Dataset, DataArray][source]

Aggregate the model outputs to a different spatial unit. See the Notes section for more details.

Parameters

dsxr.Dataset

The dataset to aggregate. The ‘standardize_outputs’ method must have been called on this dataset beforehand.

by{“hru”, “rhhu”, “unit”, “subbasin”}

The spatial unit to aggregate from. “unit” is the generic term for either “hru” or “rhhu”, depending on the hydrological model used.

to{“subbasin”, “drainage_area”}

The spatial unit to aggregate to.

weightsxr.DataArray, optional

The weights to use for the aggregation. If None, the method will compute them based on the drainage area of the units.

Returns

xr.Dataset

The aggregated dataset.

xr.DataArray

The weights used for the aggregation.
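To make the role of the weights concrete, here is a dependency-free sketch of an area-weighted aggregation from units to subbasins. It assumes that each unit's weight is its share of the subbasin area. The function aggregate_to_subbasin, its arguments, and the numbers are hypothetical, and the actual implementation operates on xarray objects and may differ.

```python
from collections import defaultdict


def aggregate_to_subbasin(values: dict, areas: dict, unit_to_subbasin: dict):
    """Area-weighted mean of unit values within each subbasin.

    Returns the aggregated value per subbasin and the weight of each unit.
    """
    subbasin_area = defaultdict(float)
    for unit, subbasin in unit_to_subbasin.items():
        subbasin_area[subbasin] += areas[unit]

    # Weight of a unit = its share of the subbasin area (sums to 1 per subbasin).
    weights = {
        unit: areas[unit] / subbasin_area[subbasin]
        for unit, subbasin in unit_to_subbasin.items()
    }

    aggregated = defaultdict(float)
    for unit, subbasin in unit_to_subbasin.items():
        aggregated[subbasin] += weights[unit] * values[unit]
    return dict(aggregated), weights


# Hypothetical snow water equivalent (mm) on three units spread over two subbasins.
values = {1: 100.0, 2: 50.0, 3: 80.0}
areas = {1: 3.0, 2: 1.0, 3: 2.0}
unit_to_subbasin = {1: "A", 2: "A", 3: "B"}

aggregated, weights = aggregate_to_subbasin(values, areas, unit_to_subbasin)
```

Returning the weights alongside the result mirrors the tuple documented above, so the same weights can be reused for other variables.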

xhydro.modelling._model_utils.standardize_output(ds, spatial_info: DataFrame | None = None, alt_names: dict[str, str] | None = None) Dataset[source]

Standardize the output dataset by renaming dimensions and variables, adding relevant coordinates, and correcting attributes.

Parameters

dsxr.Dataset

The dataset to standardize.

spatial_infopd.DataFrame | None, optional

A dataframe containing the spatial information of the model (RavenpyModel.hru[“hru”] or Hydrotel.rhhu).

alt_namesdict[str, str] | None, optional

A dictionary mapping original variable names to their standardized names.

Returns

xr.Dataset

The standardized dataset.

xhydro.modelling._ravenpy_models module

Implement the ravenpy handler class for emulating raven models in ravenpy.

class xhydro.modelling._ravenpy_models.RavenpyModel(overwrite: bool = False, *, workdir: str | PathLike | None = None, executable: str | PathLike | None = None, run_name: str | None = None, model_name: Literal['Blended', 'GR4JCN', 'HBVEC', 'HMETS', 'HYPR', 'Mohyse', 'SACSMA'] | None = None, start_date: datetime | str | None = None, end_date: datetime | str | None = None, parameters: ndarray | list[float] | None = None, qobs_file: PathLike | str | None = None, alt_name_flow: str | None = 'q', hru: GeoDataFrame | dict | PathLike | str | None = None, output_subbasins: Literal['all', 'qobs'] | list[int] | None = None, minimum_reservoir_area: str | None = None, meteo_file: PathLike | str | None = None, data_type: list[str] | None = None, alt_names_meteo: dict | None = None, meteo_station_properties: dict | None = None, gridweights: str | PathLike | None = None, **kwargs)[source]

Bases: HydrologicalModel

Initialize the RavenPy model class.

Parameters

overwritebool

If True, overwrite the existing project files. Default is False.

workdirstr | Path | None

Path to save the .rv files and model outputs. Default is None, which creates a temporary directory.

executablestr | os.PathLike | None, optional

Path to the Raven executable, bypassing RavenPy. If None (default), the Raven executable from your current Python environment (‘raven-hydro’) will be used.

run_namestr, optional

Name of the run, which will be used to name the project files. Defaults to “raven” if not provided.

model_name{“Blended”, “GR4JCN”, “HBVEC”, “HMETS”, “HYPR”, “Mohyse”, “SACSMA”}, optional

The name of the RavenPy model to run. Only optional if the project files already exist.

start_datedt.datetime | str, optional

The first date of the simulation. Only optional if the project files already exist.

end_datedt.datetime | str, optional

The last date of the simulation. Only optional if the project files already exist.

parametersnp.ndarray | list[float], optional

The model parameters for simulation or calibration. Only optional if the project files already exist.

qobs_filestr | Path, optional

Path to the file containing the observed streamflow data. If there are multiple stations, the file should contain a ‘basin_id’ variable that identifies the subbasin for each time series. If a ‘station_id’ variable is present, it will be used to identify the station.

alt_name_flowstr, optional

Name of the streamflow variable in the observed data file. If not provided, it will be assumed to be “q”.

hrugpd.GeoDataFrame | dict | os.PathLike, optional

A GeoDataFrame, or dictionary containing the HRU properties. Only optional if the project files already exist. For distributed models, it should be readable by ravenpy.extractors.BasinMakerExtractor. For lumped models, should contain the following variables:

  • area: The watershed drainage area, in km².

  • elevation: The elevation of the watershed, in meters.

  • latitude: The latitude of the watershed centroid.

  • longitude: The longitude of the watershed centroid.

  • HRU_ID: The ID of the HRU (required for gridded data, optional for station data).

If the meteorological data is gridded, the HRU dataset must also contain a SubId, DowSubId, valid geometry and crs. If the input is modified, a new shapefile will be created in the workdir/weights subdirectory.

output_subbasins{“all”, “qobs”} | list[int] | None, optional

If “all”, all subbasins will be outputted. If “qobs”, only the subbasins with observed flow will be outputted. Leave as None to use the value as defined in the HRU file (‘Has_Gauge’ column). Only applicable for distributed HBVEC models.

minimum_reservoir_areastr, optional

Quantified string (e.g. “20 km2”) representing the minimum lake area to consider the lake explicitly as a reservoir. If not provided, all lakes with the ‘HRU_IsLake’ column set to 1 in the HRU file will be considered as reservoirs. Note that ‘reservoirs’ in Raven can also refer to natural lakes with weir-like outflows. Only applicable for distributed HBVEC models.

meteo_filestr | Path, optional

Path to the file containing the observed meteorological data. Only optional if the project files already exist. The meteorological data can be either station or gridded data. Use the ‘xhydro.modelling.format_input’ function to ensure the data is in the correct format. Unless the input is a single station accompanied by ‘meteo_station_properties’, the file should contain the following coordinates:

  • elevation: The elevation of the station / grid cell, in meters.

  • latitude: The latitude of the station / grid cell centroid.

  • longitude: The longitude of the station / grid cell centroid.

data_typelist[str], optional

The list of types of data provided to Raven in the meteorological file. Only optional if the project files already exist. See https://github.com/CSHS-CWRA/RavenPy/blob/master/src/ravenpy/config/conventions.py for the list of available types.

alt_names_meteodict, optional

A dictionary that allows users to link the names of meteorological variables in their dataset to Raven-compliant names. The keys should be the Raven names as listed in the data_type parameter.

meteo_station_propertiesdict, optional

Additional properties of the weather stations providing the meteorological data. Only required if absent from the ‘meteo_file’. For single stations, the format is {“ALL”: {“elevation”: elevation, “latitude”: latitude, “longitude”: longitude}}. This has not been tested for multiple stations or gridded data.

gridweightsstr | Path | None

If using gridded meteorological data, path to a text file containing the weights linking the grid cells to the HRUs. If None, the weights will be computed using ravenpy.extractors.GridWeightExtractor and saved in a ‘weights’ subdirectory of the project folder, using a “{meteo_file}_vs_{hru_file}_weights.txt” pattern.

**kwargsdict, optional

Additional parameters to pass to the RavenPy emulator, to modify the default modules used by a given hydrological model. Typical entries include RainSnowFraction, Evaporation, GlobalParameters, etc. See https://raven.uwaterloo.ca/Downloads.html for the latest Raven documentation. Currently, model templates are listed in Appendix F.
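The dictionary-based inputs described above (a lumped ‘hru’ and a single-station ‘meteo_station_properties’) could be assembled and checked as in the sketch below. The helper missing_keys and all numeric values are hypothetical and only illustrate the documented keys. No xhydro call is made.

```python
REQUIRED_LUMPED_HRU_KEYS = {"area", "elevation", "latitude", "longitude"}
REQUIRED_STATION_KEYS = {"elevation", "latitude", "longitude"}


def missing_keys(mapping: dict, required: set) -> list:
    """Return the required keys absent from `mapping`, sorted for stable output."""
    return sorted(required - set(mapping))


# Illustrative values only; they do not describe a real watershed.
hru = {
    "area": 1250.0,  # km²
    "elevation": 420.0,  # m
    "latitude": 47.5,
    "longitude": -72.1,
    "HRU_ID": 1,  # required for gridded data, optional for station data
}

# Single-station format: properties are stored under the "ALL" key.
meteo_station_properties = {
    "ALL": {"elevation": 410.0, "latitude": 47.6, "longitude": -72.0}
}

hru_problems = missing_keys(hru, REQUIRED_LUMPED_HRU_KEYS)
station_problems = missing_keys(meteo_station_properties["ALL"], REQUIRED_STATION_KEYS)
```

Both dictionaries would then be passed as the ‘hru’ and ‘meteo_station_properties’ arguments when creating the model.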

aggregate_outputs(by: Literal['hru', 'unit', 'subbasin'], to: Literal['subbasin', 'drainage_area'], subset: list[str] | None = None, **kwargs) None[source]

Aggregate the model outputs to a different spatial unit. See the Notes section for more details.

Parameters

by{“hru”, “unit”, “subbasin”}

The spatial unit to aggregate from. “unit” is the generic term for “hru”.

to{“subbasin”, “drainage_area”}

The spatial unit to aggregate to.

subsetlist[str] | None

The list of variables to aggregate. If None, all variables will be processed. The strings should match the names produced by the Raven model, typically found under “:CustomOutput” in the .rvi file.

**kwargsdict

Keyword arguments to pass to xarray.open_dataset().

Returns

None

The aggregated outputs will be saved as new NetCDF files in the output directory, with a name pattern following what is produced by the Raven model (e.g. “{run_name}_{variable}_By{aggregation}.nc”). Aggregation will be ‘ByHRU’, ‘BySubbasin’, or ‘ByDrainageArea’, depending on the ‘to’ parameter. If a file with the same name already exists, a new file will be saved with a “_v{n}” suffix.

Notes

This method expects that relevant spatial information has been provided to the RavenPy model, either through the initial configuration or through the update_data method. Furthermore, that spatial information should be consistent with ravenpy.extractors.BasinMakerExtractor expectations, as well as the Data Specifications of Basin Maker (https://hydrology.uwaterloo.ca/basinmaker/) and the outputs of BasinMaker’s Generate_HRUs function. In particular, the following variables should be present in the HRU file:

  • Always:
    • SubId: The ID of the subbasins.

    • BasArea: The area of the subbasins.

  • by == “hru”:
    • HRU_ID: The ID of the HRUs.

    • HRU_Area: The area of the HRUs, in units consistent with the area of the subbasins.

  • to == “drainage_area”:
    • DowSubId: The ID of the downstream subbasin for each HRU.
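The output naming rule described under Returns, including the “_v{n}” suffix used when a file already exists, can be sketched as follows. The helper aggregated_file_name is hypothetical, as are the variable name in the example and the choice of starting the suffix at 2. The numbering actually used by the method is not specified here.

```python
def aggregated_file_name(run_name: str, variable: str, to: str, existing: set) -> str:
    """Build '{run_name}_{variable}_By{aggregation}.nc', adding '_v{n}' if taken."""
    aggregation = {"subbasin": "Subbasin", "drainage_area": "DrainageArea"}[to]
    stem = f"{run_name}_{variable}_By{aggregation}"
    if f"{stem}.nc" not in existing:
        return f"{stem}.nc"
    n = 2  # assumption: the first alternate file is numbered 2
    while f"{stem}_v{n}.nc" in existing:
        n += 1
    return f"{stem}_v{n}.nc"


# Hypothetical run and variable names.
first = aggregated_file_name("raven", "SNOW_Daily_Average", "subbasin", existing=set())
second = aggregated_file_name("raven", "SNOW_Daily_Average", "subbasin", existing={first})
```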

create_rv(*, overwrite: bool = False)[source]

Write the RavenPy project files.

Parameters

overwritebool

If True, overwrite the existing project files. Default is False. Note that to prevent inconsistencies, all files containing the ‘run_name’ will be removed, including the output files.

get_inputs(subset_time: bool = False, **kwargs) Dataset[source]

Return the inputs used to run the Raven model.

Parameters

subset_timebool

If True, only return the weather data for the time period specified in the configuration file.

**kwargsdict

Keyword arguments to pass to xarray.open_mfdataset().

Returns

xr.Dataset

The meteorological data used to run the Raven model simulation.

get_outputs(output: str, return_paths: bool = False, **kwargs) Dataset | Path | list[Path][source]

Return the outputs of the Raven model.

Parameters

outputstr

“path” to return the output directory. “q” to only return the streamflow variable. Alternatively, a string matching the name of the output file to return (e.g. “Hydrographs”, “Storage”, “ByHRU”, etc.).

return_pathsbool

If True, return the path to the output file(s) instead of the dataset. Default is False.

**kwargsdict

Keyword arguments to pass to xarray.open_dataset().

Returns

xr.Dataset

The requested output variable.

Path

The path to the output directory if output is set to “path”.

list[Path]

The path to the output file(s) if return_paths is True.

get_streamflow(output: Literal['q', 'all', 'path'] = 'q', **kwargs) Dataset | Path[source]

Return the simulated streamflow from the Raven model.

Parameters

output{“q”, “all”, “path”}

The type of output to return. If “q”, return only the streamflow variable. If “all”, return the entire hydrograph dataset. If “path”, return the path to the streamflow file.

**kwargsdict

Keyword arguments to pass to xarray.open_dataset().

Returns

xr.Dataset

The streamflow file.

Path

The path to the streamflow file if output is set to “path”.

run(*, overwrite: bool = False, standardize: bool = True, return_streamflow: bool = True) Dataset | None[source]

Run the Raven hydrological model and return simulated streamflow.

Parameters

overwritebool

If True, overwrite the existing output files. Default is False.

standardizebool

If True, standardize the output files to ensure they are in a consistent format. Default is True.

return_streamflowbool

If True, return the simulated streamflow. Default is True.

Returns

xr.Dataset

The simulated streamflow, if ‘return_streamflow’ is True. Otherwise, nothing is returned.

standardize_outputs(files: list[str] | None = None, **kwargs)[source]

Standardize the outputs of the simulation to be more consistent with CF conventions.

Parameters

fileslist[str] | None

Names of the output files to standardize. If None, all output files will be standardized. The strings can be part of the file name (e.g. “Hydrographs”, “Storage”, “ByHRU”, etc.).

**kwargsdict

Keyword arguments to pass to xarray.open_dataset().

Notes

Be aware that since some systems, such as Windows, do not allow overwriting files that are currently open, a temporary file will be created and then renamed to overwrite the original file.

update_config(*, rvi_dates: bool = False, rvi_commands: list[str] | None = None, rvt: bool = False, rvh: bool = False) None[source]

Manually update some aspects of the configuration of the RavenPy model.

Parameters

rvi_datesbool

If True, update the .rvi file with the ‘start_date’ and ‘end_date’ defined in the model.

rvi_commandslist[str] | None

A list of commands to include in the .rvi file. If None, no additional commands will be added. Warning: These commands will be added at the end of the .rvi file, with no checks. Use with caution.

rvtbool

If True, update the .rvt file with the meteorological data and observed streamflow data defined in the model.

rvhbool

If True, update the .rvh file with the list of subbasins to output. Nothing else will be changed in that file.

Notes

Ideally, users should favor using the update_data method to update the model configuration, then call the create_rv method to recreate the project files from scratch. This method assumes that the changes brought to the model configuration are minimal, such as wanting to change the meteorological data or the simulation start and end dates.

Be aware that:
  • The .rvt will be rewritten entirely. If multiple sources of data were mentioned, such as both meteorological and observed streamflow data, all of them must be included in the RavenpyModel instance.

  • If the meteorological data is gridded, new weights will be computed using the HRU file in the RavenpyModel instance. If that HRU file is different from the one used to create the original .rvh file, it may lead to inconsistencies or errors.

  • Similarly, only the list of subbasins to output will be modified in the new .rvh file. Any additional changes to the HRU or other components might also lead to inconsistencies or errors.

A backup of the original files will be created before any modifications are made.

update_data(*, qobs_file: PathLike | str | None = None, alt_name_flow: str | None = 'q', hru: GeoDataFrame | dict | PathLike | str | None = None, output_subbasins: Literal['all', 'qobs'] | list[int] | None = None, minimum_reservoir_area: str | None = None, meteo_file: PathLike | str | None = None, data_type: list[str] | None = None, alt_names_meteo: dict | None = None, meteo_station_properties: dict | None = None, gridweights: str | PathLike | None = None)[source]

Update the model configuration with new observed data (self.qobs), HRU properties (self.hru), or meteorological data (self.meteo).

Parameters

qobs_fileos.PathLike | str

Path to the NetCDF file containing the observed streamflow data. If there are multiple stations, the file should contain a ‘basin_id’ variable that identifies the subbasin for each time series. If a ‘station_id’ variable is present, it will be used to identify the station.

alt_name_flowstr, optional

Alternative name for the streamflow variable in the observed data.

hrugpd.GeoDataFrame | dict | os.PathLike | str

A GeoDataFrame, or dictionary containing the HRU properties. Alternatively, a path to a shapefile containing the HRU properties. For distributed models, it should be readable by ravenpy.extractors.BasinMakerExtractor. For lumped models, should contain the following variables:

  • area: The watershed drainage area, in km².

  • elevation: The elevation of the watershed, in meters.

  • latitude: The latitude of the watershed centroid.

  • longitude: The longitude of the watershed centroid.

  • HRU_ID: The ID of the HRU (required for gridded data, optional for station data).

If the meteorological data is gridded, the HRU dataset must also contain a SubId, DowSubId, valid geometry and crs. If the input is modified, a new shapefile will be created in the workdir/weights subdirectory.

output_subbasins{“all”, “qobs”} | list[int] | None, optional

If “all”, all subbasins will be outputted. If “qobs”, subbasins with observed flow will be outputted, as defined by the basin IDs in the observed streamflow data. If a list of integers is provided, it should contain the basin IDs to output. Leave as None to use the value as defined in the HRU file (‘Has_Gauge’ column).

minimum_reservoir_areastr, optional

Quantified string (e.g. “20 km2”) representing the minimum lake area to consider the lake explicitly as a reservoir. If not provided, all lakes with the ‘HRU_IsLake’ column set to 1 in the HRU file will be considered as reservoirs. Note that ‘reservoirs’ in Raven can also refer to natural lakes with weir-like outflows. Only applicable for distributed HBVEC models.

meteo_filestr | Path, optional

Path to the file containing the observed meteorological data. Only optional if the project files already exist. The meteorological data can be either station or gridded data. Use the ‘xhydro.modelling.format_input’ function to ensure the data is in the correct format. Unless the input is a single station accompanied by ‘meteo_station_properties’, the file should contain the following coordinates:

  • elevation: The elevation of the station / grid cell, in meters.

  • latitude: The latitude of the station / grid cell centroid.

  • longitude: The longitude of the station / grid cell centroid.

data_typelist[str], optional

The list of types of data provided to Raven in the meteorological file. Only optional if the project files already exist. See https://github.com/CSHS-CWRA/RavenPy/blob/master/src/ravenpy/config/conventions.py for the list of available types.

alt_names_meteodict, optional

A dictionary that allows users to link the names of meteorological variables in their dataset to Raven-compliant names. The keys should be the Raven names as listed in the data_type parameter.

meteo_station_propertiesdict, optional

Additional properties of the weather stations providing the meteorological data. Only required if absent from the ‘meteo_file’. For single stations, the format is {“ALL”: {“elevation”: elevation, “latitude”: latitude, “longitude”: longitude}}. This has not been tested for multiple stations or gridded data.

gridweightsstr | Path | None

If using gridded meteorological data, path to a text file containing the weights linking the grid cells to the HRUs. If None, the weights will be computed using ravenpy.extractors.GridWeightExtractor and saved in a ‘weights’ subdirectory of the project folder, using a “{meteo_file}_vs_{hru_file}_weights.txt” pattern.

Notes

If the meteorological data is gridded, new weights will be computed using the HRU file in the RavenpyModel instance and saved in a ‘weights’ subdirectory of the project folder, under the name ‘meteo-name_vs_hru-name.txt’.

xhydro.modelling.calibration module

Calibration package for hydrological models.

This package contains the main framework for hydrological model calibration. It uses the spotpy calibration package applied on a “model_config” object. This object is meant to be a container that can be used as needed by any hydrologic model. For example, it can store datasets directly, paths to datasets (NetCDF or other files), csv files, or anything else that can be stored in a dictionary.

It then becomes the user’s responsibility to ensure that required data for a given model be provided in the model_config object both in the data preparation stage and in the hydrological model implementation. This can be addressed by a set of pre-defined codes for given model structures.

For example, for GR4J, only small datasets are required and can be stored directly in the model_config dictionary. For Hydrotel or Raven models, however, it is likely better to store paths to NetCDF files, which are then passed to the models. This requires pre- and post-processing, which can be handled at the stage where the hydrological model is created and the data is prepared.

The calibration aspect then becomes trivial:

  1. A model_config object is passed to the calibrator.

  2. Lower and upper bounds for calibration parameters are defined and passed.

  3. An objective function, optimizer and hyperparameters are also passed.

  4. The calibrator uses this information to develop parameter sets that are then passed as inputs to the “model_config” object.

  5. The calibrator launches the desired hydrological model with the model_config object (now containing the parameter set) as input.

  6. The appropriate hydrological model function then parses “model_config”, takes the parameters and required data, launches a simulation and returns simulated flow (Qsim).

  7. The calibration package then compares Qobs and Qsim and computes the objective function value, and returns this to the sampler that will then repeat the process to find optimal parameter sets.

  8. The code returns the best parameter set, the objective function value and, for user convenience, the simulated streamflow over the calibration period.
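The steps above can be sketched with a toy example. This is not the package's implementation: a plain random search stands in for the spotpy samplers (DDS or SCEUA), toy_model is a hypothetical linear model, and the "precipitation" key is an illustrative addition to model_config. Only "Qobs", "model_name" and "parameters" are keys named in this documentation.

```python
import random


def toy_model(model_config: dict) -> list:
    """Hypothetical model: Qsim = a * precipitation + b, with (a, b) in 'parameters'."""
    a, b = model_config["parameters"]
    return [a * p + b for p in model_config["precipitation"]]


def nse(qobs: list, qsim: list) -> float:
    """Nash-Sutcliffe Efficiency (1 is a perfect fit)."""
    mean_obs = sum(qobs) / len(qobs)
    numerator = sum((s - o) ** 2 for o, s in zip(qobs, qsim))
    denominator = sum((o - mean_obs) ** 2 for o in qobs)
    return 1.0 - numerator / denominator


def calibrate(model_config, bounds_low, bounds_high, evaluations, seed=0):
    """Random search, used here as a simple stand-in for DDS or SCEUA."""
    rng = random.Random(seed)
    best_params, best_score, best_qsim = None, float("-inf"), None
    for _ in range(evaluations):
        # Steps 4-5: draw a parameter set within the bounds and store it in the config.
        params = [rng.uniform(lo, hi) for lo, hi in zip(bounds_low, bounds_high)]
        trial_config = {**model_config, "parameters": params}
        # Step 6: run the model. Step 7: score the simulation against Qobs.
        qsim = toy_model(trial_config)
        score = nse(model_config["Qobs"], qsim)
        if score > best_score:
            best_params, best_score, best_qsim = params, score, qsim
    # Step 8: best parameter set, objective function value and simulated flow.
    return best_params, best_score, best_qsim


precipitation = [0.0, 5.0, 12.0, 3.0, 8.0]
model_config = {
    "model_name": "toy",
    "precipitation": precipitation,
    "Qobs": [0.5 * p + 1.0 for p in precipitation],  # synthetic "observations"
}
best_params, best_score, best_qsim = calibrate(
    model_config, bounds_low=[0.0, 0.0], bounds_high=[1.0, 2.0], evaluations=500
)
```

The model function only ever receives the configuration dictionary, which is what lets the same loop drive very different models.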

This system has the advantage of being extremely flexible, robust, and efficient as all data can be either in-memory or only the reference to the required datasets on disk is passed around the callstack.

Currently, the model_config object has 3 mandatory keywords for the package to run correctly in all instances:

  • model_config[“Qobs”]: Contains the observed streamflow used as the calibration target.

  • model_config[“model_name”]: Contains a string referring to the hydrological model to be run.

  • model_config[“parameters”]: While not necessary to provide this, it is a reserved keyword used by the optimizer.

Any comments are welcome!

xhydro.modelling.calibration.perform_calibration(model_config: dict, obj_func: str, bounds_high: ndarray | list[float | int], bounds_low: ndarray | list[float | int], evaluations: int, qobs: PathLike | ndarray | Dataset | DataArray, algorithm: str = 'DDS', mask: ndarray | list[float | int] | None = None, transform: str | None = None, epsilon: float = 0.01, sampler_kwargs: dict | None = None)[source]

Perform calibration using SPOTPY.

This is the entry point for the model calibration. After setting up the model_config object and other arguments, calling “perform_calibration” will return the optimal parameter set, objective function value and simulated flows on the calibration period.

Parameters

model_configdict

The model configuration object that contains all the information needed to run the model. The model function called to run this model should always use this object and read in the data it requires. It is up to the user to provide the data that the model requires.

obj_funcstr

The objective function used for calibrating. Can be any one of these:

  • “abs_bias”: Absolute value of the “bias” metric

  • “abs_pbias”: Absolute value of the “pbias” metric

  • “abs_volume_error”: Absolute value of the volume_error metric

  • “agreement_index”: Index of agreement

  • “correlation_coeff”: Correlation coefficient

  • “high_flow_rel_error”: High flow relative error

  • “kge”: Kling Gupta Efficiency metric (2009 version)

  • “kge_mod”: Kling Gupta Efficiency metric (2012 version)

  • “kge_2021”: Kling Gupta Efficiency metric (2021 version)

  • “lce”: Least-squares combined efficiency

  • “low_flow_rel_error”: Low flow relative error

  • “mae”: Mean Absolute Error metric

  • “mare”: Mean Absolute Relative Error metric

  • “mse”: Mean Square Error metric

  • “nse”: Nash-Sutcliffe Efficiency metric

  • “persistence_index”: Persistence index

  • “r2”: r-squared, i.e. square of correlation_coeff.

  • “rmse”: Root Mean Square Error

  • “rrmse”: Relative Root Mean Square Error (RMSE-to-mean ratio)

  • “rsr”: Ratio of RMSE to standard deviation.

  • “volumetric_efficiency”: Volumetric efficiency

bounds_highnp.array

High bounds for the model parameters to be calibrated. SPOTPY will sample parameter sets from within these bounds. The size must be equal to the number of parameters to calibrate.

bounds_lownp.array

Low bounds for the model parameters to be calibrated. SPOTPY will sample parameter sets from within these bounds. The size must be equal to the number of parameters to calibrate.

evaluationsint

Maximum number of model evaluations (calibration budget) to perform before stopping the calibration process.

qobsos.PathLike or np.ndarray or xr.Dataset or xr.DataArray

Observed streamflow dataset (or path to it), used to compute the objective function. If using a dataset, it must contain a “streamflow” variable.

algorithmstr

The optimization algorithm to use. Currently, “DDS” and “SCEUA” are available, but more can be easily added.

masknp.array, optional

A vector indicating which values to preserve/remove from the objective function computation. 0=remove, 1=preserve.

transformstr, optional

The method to transform streamflow prior to computing the objective function. Can be one of: Square root (‘sqrt’), inverse (‘inv’), or logarithmic (‘log’) transformation.

epsilonscalar float

Used to add a small delta to observations for log and inverse transforms, to eliminate errors caused by zero flow days (1/0 and log(0)). The added perturbation is equal to the mean observed streamflow times this value of epsilon.

sampler_kwargsdict

Contains the keywords and hyperparameter values for the optimization algorithm. Keywords depend on the algorithm choice. Currently, SCEUA and DDS are supported with the following default values:

  • SCEUA: dict(ngs=7, kstop=3, peps=0.1, pcento=0.1)

  • DDS: dict(trials=1)
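The defaults listed for sampler_kwargs can be pictured as a per-algorithm dictionary that user values override. The sketch below is an assumption about how such defaults might be combined; resolve_sampler_kwargs is a hypothetical helper, and whether xhydro merges a partial dictionary or expects a complete one is not stated here.

```python
# Default hyperparameters, copied from the sampler_kwargs description.
DEFAULT_SAMPLER_KWARGS = {
    "SCEUA": {"ngs": 7, "kstop": 3, "peps": 0.1, "pcento": 0.1},
    "DDS": {"trials": 1},
}


def resolve_sampler_kwargs(algorithm, sampler_kwargs=None):
    """Merge user-supplied hyperparameters over the defaults (hypothetical helper)."""
    if algorithm not in DEFAULT_SAMPLER_KWARGS:
        raise ValueError(f"unsupported algorithm: {algorithm}")
    # Values given by the user take precedence over the defaults.
    return {**DEFAULT_SAMPLER_KWARGS[algorithm], **(sampler_kwargs or {})}


sceua_kwargs = resolve_sampler_kwargs("SCEUA", {"ngs": 10})
dds_kwargs = resolve_sampler_kwargs("DDS")
```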

Returns

best_parametersarray_like

The optimized parameter set.

qsimxr.Dataset

Simulated streamflow using the optimized parameter set.

bestobjffloat

The best objective function value.
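To give an idea of what a calibration budget and parameter bounds mean in practice, the following dependency-free sketch runs a simplified DDS-style search on a toy objective. It is not SPOTPY's implementation and not what perform_calibration executes; the names dds_minimize and toy_objective, the perturbation factor r=0.2 and the fixed seed are all assumptions made for illustration.

```python
import math
import random


def dds_minimize(objective, bounds_low, bounds_high, evaluations, r=0.2, seed=0):
    """Simplified DDS-style minimisation; returns (best_parameters, best_objective)."""
    rng = random.Random(seed)
    n = len(bounds_low)
    # Start from a random point inside the bounds (counts as the first evaluation).
    best = [rng.uniform(lo, hi) for lo, hi in zip(bounds_low, bounds_high)]
    best_obj = objective(best)

    for i in range(1, evaluations):
        # The probability of perturbing each parameter shrinks as the budget is used up.
        p = 1.0 - math.log(i) / math.log(evaluations)
        selected = [j for j in range(n) if rng.random() < p]
        if not selected:
            selected = [rng.randrange(n)]  # always perturb at least one parameter

        candidate = list(best)
        for j in selected:
            lo, hi = bounds_low[j], bounds_high[j]
            value = best[j] + r * (hi - lo) * rng.gauss(0.0, 1.0)
            # Reflect at the bounds; fall back to the bound itself if still outside.
            if value < lo:
                value = lo + (lo - value)
                if value > hi:
                    value = lo
            elif value > hi:
                value = hi - (value - hi)
                if value < lo:
                    value = hi
            candidate[j] = value

        candidate_obj = objective(candidate)
        if candidate_obj <= best_obj:  # greedy: keep the candidate only if it is not worse
            best, best_obj = candidate, candidate_obj

    return best, best_obj


def toy_objective(params):
    """Made-up objective with its minimum at (3, 7)."""
    return (params[0] - 3.0) ** 2 + (params[1] - 7.0) ** 2


best_parameters, best_objf = dds_minimize(
    toy_objective, bounds_low=[0.0, 0.0], bounds_high=[10.0, 10.0], evaluations=500
)
```

As in the parameter descriptions above, the two bounds lists have one entry per calibrated parameter, and evaluations caps the total number of objective function calls.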

xhydro.modelling.hydrological_modelling module

Hydrological modelling framework.

xhydro.modelling.hydrological_modelling.format_input(ds: Dataset, model: str, convert_calendar_missing: float | str | dict | bool = nan, save_as: str | PathLike | None = None, **kwargs) tuple[Dataset, dict][source]

Reformat CF-compliant meteorological data for use in hydrological models. See the “Notes” section for important details.

Parameters

dsxr.Dataset

A dataset containing the meteorological data. See the “Notes” section for more information on the expected format.

modelstr

The name of the hydrological model to use. Currently supported models are: “HYDROTEL”, “Raven” (which is an alias for all RavenPy models), “Blended”, “GR4JCN”, “HBVEC”, “HMETS”, “HYPR”, “Mohyse”, “SACSMA”.

convert_calendar_missingfloat | str | dict | bool, optional

The value to use for missing values when converting the calendar to “standard”. If the value is a float, it will be used as the fill value for all variables. If the value is a string “interpolate”, the new dates will be linearly interpolated over time. A dictionary can be used to specify a different fill value for each variable. Keys should be the names of the variables as they appear in the first entry in the “variable_name” lists of the “Notes” section. If True, temperatures will be interpolated and precipitation will be filled with 0. If False, the calendar will not be converted. Only possible for “Raven” models.

save_asstr, optional

Where to save the reformatted data. If None, the data will not be saved. This can be useful when multiple files are needed for a single model run (e.g. HYDROTEL needs a configuration file).

**kwargsdict

Additional keyword arguments to pass to the save function.
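The effect of convert_calendar_missing can be pictured on a single leap year: a “noleap” series has no value for February 29, so that date must be filled or interpolated when moving to the standard calendar. The sketch below uses only the standard library and is an illustration of the idea, not the function's actual implementation; noleap_to_standard and the sample series are hypothetical.

```python
from datetime import date, timedelta


def noleap_to_standard(values, year, missing=float("nan")):
    """Place a 365-value 'noleap' daily series for one year on the standard calendar."""
    if len(values) != 365:
        raise ValueError("a noleap year has exactly 365 daily values")
    n_days = (date(year + 1, 1, 1) - date(year, 1, 1)).days
    dates = [date(year, 1, 1) + timedelta(days=k) for k in range(n_days)]
    source = iter(values)
    # February 29 has no counterpart in the noleap calendar, so it starts as a gap.
    filled = [None if (d.month, d.day) == (2, 29) else next(source) for d in dates]
    for k, value in enumerate(filled):
        if value is None:
            if missing == "interpolate":
                # Linear interpolation between February 28 and March 1.
                filled[k] = 0.5 * (filled[k - 1] + filled[k + 1])
            else:
                filled[k] = missing
    return dates, filled


tas = [float(k) for k in range(365)]  # made-up temperature series
dates, tas_interp = noleap_to_standard(tas, 2020, missing="interpolate")
_, pr_filled = noleap_to_standard([1.0] * 365, 2020, missing=0.0)
```

The two calls mirror the behaviour described for convert_calendar_missing=True: temperature is interpolated, while precipitation is filled with 0.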

Returns

xr.Dataset

The reformatted dataset.

dict

For HYDROTEL, a dictionary containing the configuration for the meteorological data. If save_as is provided, the configuration will have been saved to a file with the same name as save_as, but with a “.nc.config” extension. For Raven, a dictionary containing the ‘data_type’ and ‘alt_names_meteo’ keys required for the ‘model_config’ argument.

Notes

The input dataset should ideally be CF-compliant and follow CMIP6’s Controlled Vocabulary, but this function will attempt to detect the variables based on the standard_name attribute, the cell_methods attribute, or the variable name. More information on those attributes can be found here: https://wcrp-cmip.org/cmip-model-and-experiment-documentation/, and specifically the ‘CMIP6 MIP table’ link provided in the ‘Search for variables’ section.

Specifically:

  • If using 1D time series, the station dimension should have an attribute cf_role set to “timeseries_id”.

  • Units don’t need to be canonical, but they should be convertible to the expected units and be understood by xclim.

  • Elevation represents the altitude of the meteorological data / model grid cell, not the altitude of the ground.

  • Snowfall units should be in water equivalent of precipitation (e.g. mm/day or kg/m²/s), NOT height (e.g. cm of fresh snow on the ground).

  • The function will try to detect the variables based on the attributes and the variable name. The following attempts will be made:
    • Longitude:
      • standard_name: “longitude”

      • variable name: “longitude”, “lon”

    • Latitude:
      • standard_name: “latitude”

      • variable name: “latitude”, “lat”

    • Elevation:
      • standard_name: “surface_altitude”

      • variable name: “elevation”, “orog”, “z”, “altitude”, “height”

    • Precipitation:
      • standard_name: “precipitation” (e.g. “lwe_thickness_of_precipitation_amount”)

      • variable name: “pr”, “precip”, “precipitation”

    • Rainfall:
      • standard_name: “rainfall” (e.g. “rainfall_flux”, “rainfall_amount”)

      • variable name: “prra”, “prlp”, “rainfall”, “rain”, “precipitation_rain”

    • Snowfall:
      • standard_name: “snowfall” (e.g. “snowfall_flux”, “snowfall_amount”)

      • variable name: “prsn”, “snowfall”, “precipitation_snow”

    • Maximum temperature:
      • standard_name: “air_temperature”

      • cell_methods: “time: maximum”

      • variable name: “tasmax”, “tmax”, “t2m_max”, “temperature_max”

    • Minimum temperature:
      • standard_name: “air_temperature”

      • cell_methods: “time: minimum”

      • variable name: “tasmin”, “tmin”, “t2m_min”, “temperature_min”

    • Mean temperature:
      • standard_name: “air_temperature”

      • cell_methods: “time: mean”

      • variable name: “tas”, “tmean”, “t2m”, “temperature_mean”
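The detection attempts listed above can be sketched as a lookup on a variable's name and attributes. This is a simplified illustration covering only precipitation and temperature; detect_variable is a hypothetical helper, and both the substring matching on standard_name and the order of the checks are assumptions rather than the function's documented behaviour.

```python
# target name: (standard_name fragment, cell_methods fragment or None, accepted variable names)
DETECTION_RULES = {
    "pr": ("precipitation", None, ("pr", "precip", "precipitation")),
    "tasmax": ("air_temperature", "time: maximum", ("tasmax", "tmax", "t2m_max", "temperature_max")),
    "tasmin": ("air_temperature", "time: minimum", ("tasmin", "tmin", "t2m_min", "temperature_min")),
    "tas": ("air_temperature", "time: mean", ("tas", "tmean", "t2m", "temperature_mean")),
}


def detect_variable(name, attrs):
    """Return the detected target name for a variable, or None if nothing matches."""
    for target, (std_fragment, cell_fragment, names) in DETECTION_RULES.items():
        std_ok = std_fragment in attrs.get("standard_name", "")
        cell_ok = cell_fragment is None or cell_fragment in attrs.get("cell_methods", "")
        # Accept a match on the attributes, or fall back to the variable name.
        if (std_ok and cell_ok) or name in names:
            return target
    return None


detected = detect_variable(
    "t_daily_max", {"standard_name": "air_temperature", "cell_methods": "time: maximum"}
)
```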

HYDROTEL requires the following variables: [“longitude”, “latitude”, “elevation”, “time”, “tasmax”, “tasmin”, “pr”]. Raven requires the following variables: [“longitude”, “latitude”, “elevation”, “time”, “tasmax/tasmin” or “tas”, “pr” or “prlp/prsn”].
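The requirements above can be checked before formatting a dataset. The following sketch assumes that “tasmax/tasmin” means both variables together and that “prlp/prsn” likewise means both; missing_inputs is a hypothetical helper, and any model other than “HYDROTEL” is treated as a Raven model for simplicity.

```python
def missing_inputs(model, available):
    """List the required inputs that are absent from the available variable names."""
    available = set(available)
    missing = [v for v in ("longitude", "latitude", "elevation", "time") if v not in available]
    if model == "HYDROTEL":
        missing += [v for v in ("tasmax", "tasmin", "pr") if v not in available]
    else:  # treated as a Raven model in this sketch
        if not ({"tasmax", "tasmin"} <= available or "tas" in available):
            missing.append("tasmax/tasmin or tas")
        if not ("pr" in available or {"prlp", "prsn"} <= available):
            missing.append("pr or prlp/prsn")
    return missing


coords = ["longitude", "latitude", "elevation", "time"]
hydrotel_missing = missing_inputs("HYDROTEL", coords + ["tasmax", "tasmin", "pr"])
raven_missing = missing_inputs("GR4JCN", coords + ["tasmax"])
```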

xhydro.modelling.hydrological_modelling.get_hydrological_model_inputs(model_name: str, required_only: bool = False) tuple[dict, str][source]

Get the required inputs for a given hydrological model.

Parameters

model_namestr

The name of the hydrological model to use. Currently supported models are [“HYDROTEL”, “Blended”, “GR4JCN”, “HBVEC”, “HMETS”, “HYPR”, “Mohyse”, “SACSMA”].

required_onlybool

If True, only the required inputs will be returned.

Returns

dict

A dictionary containing the required configuration for the hydrological model.

str

The documentation for the hydrological model.

xhydro.modelling.hydrological_modelling.hydrological_model(model_config: dict) Hydrotel | RavenpyModel[source]

Initialize an instance of a hydrological model.

Parameters

model_configdict

A dictionary containing the configuration for the hydrological model. Must contain a key “model_name” with the name of the model to use: e.g. “Hydrotel”. The required keys depend on the model being used. Use the function get_hydrological_model_inputs to get the required keys for a given model.

Returns

Hydrotel or RavenpyModel

An instance of the hydrological model.

xhydro.modelling.obj_funcs module

Objective function package for xhydro, for calibration and model evaluation.

This package provides a flexible suite of popular objective function metrics in hydrological modelling and hydrological model calibration. The main function ‘get_objective_function’ returns the value of the desired objective function while allowing users to customize many aspects:

  1. Select the objective function to run.

  2. Provide a mask to remove certain elements from the objective function calculation (e.g. for odd/even year calibration, calibration on high or low flows only, or any custom setup).

  3. Apply a transformation to the flows to modify the behaviour of the objective function calculation (e.g. taking the log, inverse or square root transform of the flows before computing the objective function).

This function also contains some tools and inputs reserved for the calibration toolbox, such as the ability to take the negative of the objective function to maximize instead of minimize a metric according to the needs of the optimizing algorithm.
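To make the role of the metric choice and of the sign flip concrete, here is a dependency-free sketch of three of the listed metrics (RMSE, NSE and the 2009 KGE) using their standard textbook definitions. It is not the module's code; the function names, the METRICS table and the sample flows are assumptions for illustration.

```python
import math


def rmse(qobs, qsim):
    """Root mean square error (lower is better)."""
    return math.sqrt(sum((s - o) ** 2 for o, s in zip(qobs, qsim)) / len(qobs))


def nse(qobs, qsim):
    """Nash-Sutcliffe efficiency (higher is better, 1 is a perfect fit)."""
    mean_obs = sum(qobs) / len(qobs)
    numerator = sum((s - o) ** 2 for o, s in zip(qobs, qsim))
    denominator = sum((o - mean_obs) ** 2 for o in qobs)
    return 1.0 - numerator / denominator


def kge_2009(qobs, qsim):
    """Kling-Gupta efficiency, 2009 formulation (higher is better, 1 is a perfect fit)."""
    n = len(qobs)
    mean_o, mean_s = sum(qobs) / n, sum(qsim) / n
    std_o = math.sqrt(sum((o - mean_o) ** 2 for o in qobs) / n)
    std_s = math.sqrt(sum((s - mean_s) ** 2 for s in qsim) / n)
    cov = sum((o - mean_o) * (s - mean_s) for o, s in zip(qobs, qsim)) / n
    r = cov / (std_o * std_s)  # linear correlation
    alpha = std_s / std_o      # variability ratio
    beta = mean_s / mean_o     # bias ratio
    return 1.0 - math.sqrt((r - 1) ** 2 + (alpha - 1) ** 2 + (beta - 1) ** 2)


METRICS = {"rmse": rmse, "nse": nse, "kge": kge_2009}


def objective(qobs, qsim, obj_func="rmse", take_negative=False):
    """Evaluate a metric; negate it so that a minimizer can maximize it."""
    value = METRICS[obj_func](qobs, qsim)
    return -value if take_negative else value


qobs = [1.0, 2.0, 3.0, 4.0]  # made-up observed flows
qsim = [2.0, 3.0, 4.0, 5.0]  # made-up simulated flows
score_for_minimizer = objective(qobs, qsim, "nse", take_negative=True)
```

A minimizing optimizer fed with score_for_minimizer will push the NSE upwards, which is the purpose of the sign flip described above.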

xhydro.modelling.obj_funcs.get_objective_function(qobs: ndarray | Dataset, qsim: ndarray | Dataset, obj_func: str = 'rmse', take_negative: bool = False, mask: ndarray | Dataset | None = None, transform: str | None = None, epsilon: float | None = None)[source]

Entrypoint function for the objective function calculation.

More objective functions can be added by defining them in this file and adding the corresponding option to this function.

Parameters

qobsarray_like

Vector containing the observed streamflow to be used in the objective function calculation. It is the target to attain.

qsimarray_like

Vector containing the simulated streamflow generated by the hydrological model. It changes as the parameters are modified and the hydrological model is re-run.

obj_funcstr

String representing the objective function to use in the calibration. Must be one of the accepted objective functions:

  • “abs_bias”: Absolute value of the “bias” metric

  • “abs_pbias”: Absolute value of the “pbias” metric

  • “abs_volume_error”: Absolute value of the volume_error metric

  • “agreement_index”: Index of agreement

  • “bias”: Bias metric

  • “correlation_coeff”: Correlation coefficient

  • “high_flow_rel_error”: High flow relative error

  • “kge”: Kling Gupta Efficiency metric (2009 version)

  • “kge_mod”: Kling Gupta Efficiency metric (2012 version)

  • “kge_2021”: Kling Gupta Efficiency metric (2021 version)

  • “lce”: Least-squares combined efficiency

  • “low_flow_rel_error”: Low flow relative error

  • “mae”: Mean Absolute Error metric

  • “mare”: Mean Absolute Relative Error metric

  • “mse”: Mean Square Error metric

  • “nse”: Nash-Sutcliffe Efficiency metric

  • “pbias”: Percent bias (relative bias)

  • “persistence_index”: Measure of the relative magnitude of the residual variance to the variance of the errors

  • “r2”: r-squared, i.e. square of correlation_coeff.

  • “rmse”: Root Mean Square Error

  • “rrmse”: Relative Root Mean Square Error (RMSE-to-mean ratio)

  • “rsr”: Ratio of RMSE to standard deviation.

  • “volume_error”: Total volume error over the period.

  • “volumetric_efficiency”: Fraction of volume delivered at the proper time

The default is ‘rmse’.

take_negativebool

Used to force the objective function to be multiplied by minus one (-1) such that it is possible to maximize it if the optimizer is a minimizer and vice versa. Should always be set to False unless required by an optimization setup, which is handled internally and transparently to the user. The default is False.

maskarray_like

Array of 0 or 1 on which the objective function should be applied. Values of 1 indicate that the value is included in the calculation, and values of 0 indicate that the value is excluded and will have no impact on the objective function calculation. This can be useful for specific optimization strategies such as odd/even year calibration, seasonal calibration or calibration based on high/low flows. The default is None and all data are preserved.

transformstr

Indicates the type of transformation required. Can be one of the following values:

  • “sqrt”: Square root transformation of the flows [sqrt(Q)]

  • “log”: Logarithmic transformation of the flows [log(Q)]

  • “inv”: Inverse transformation of the flows [1/Q]

The default value is “None”, by which no transformation is performed.

epsilonfloat

Indicates the perturbation to add to the flow time series during a transformation, to avoid division by zero (inverse transform) and the logarithm of zero (log transform). The perturbation is equal to: perturbation = epsilon * mean(qobs). The default value is 0.01.

Returns

float

Value of the selected objective function (obj_func).

Notes

If any NaNs are present in the qobs dataset, all corresponding data in qobs, qsim and the mask are removed before being passed to the processing function, so they have no effect on the calculation. If a mask is passed, it must be the same size as the qsim and qobs vectors.
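The filtering described in these notes can be sketched in a few lines. The example below builds an odd-year mask and then drops both masked-out values and time steps with NaN observations; filter_valid, the sample flows and the years are hypothetical and only illustrate the idea.

```python
import math


def filter_valid(qobs, qsim, mask=None):
    """Keep only the time steps with a valid observation and a mask value of 1."""
    if mask is None:
        mask = [1] * len(qobs)  # no mask: all data are preserved
    if not (len(qobs) == len(qsim) == len(mask)):
        raise ValueError("qobs, qsim and mask must have the same size")
    keep = [(not math.isnan(o)) and m == 1 for o, m in zip(qobs, mask)]
    kept_obs = [o for o, k in zip(qobs, keep) if k]
    kept_sim = [s for s, k in zip(qsim, keep) if k]
    return kept_obs, kept_sim


qobs = [1.0, float("nan"), 3.0, 4.0, 5.0]  # made-up observations with one gap
qsim = [1.1, 2.0, 2.9, 4.2, 5.5]           # made-up simulated flows
years = [2001, 2001, 2002, 2002, 2003]     # year of each time step

# Odd/even year calibration: 1 = preserve, 0 = remove.
odd_year_mask = [1 if year % 2 == 1 else 0 for year in years]
kept_obs, kept_sim = filter_valid(qobs, qsim, odd_year_mask)
```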

xhydro.modelling.obj_funcs.transform_flows(qsim: ndarray, qobs: ndarray, transform: str | None = None, epsilon: float = 0.01) tuple[ndarray, ndarray][source]

Transform flows before computing the objective function.

This transforms the flows so that the objective function is computed on a transformed flow metric rather than on the original units of flow (e.g. inverse, log-transformed, or square-root flows).

Parameters

qsimarray_like

Simulated streamflow vector.

qobsarray_like

Observed streamflow vector.

transformstr, optional

Indicates the type of transformation required. Can be one of the following values:

  • “sqrt”: Square root transformation of the flows [sqrt(Q)]

  • “log”: Logarithmic transformation of the flows [log(Q)]

  • “inv”: Inverse transformation of the flows [1/Q]

The default value is “None”, by which no transformation is performed.

epsilonfloat

Indicates the perturbation to add to the flow time series during a transformation, to avoid division by zero (inverse transform) and the logarithm of zero (log transform). The perturbation is equal to: perturbation = epsilon * mean(qobs). The default value is 0.01.

Returns

qsimarray_like

Transformed simulated flow according to user request.

qobsarray_like

Transformed observed flow according to user request.
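The transformations and the epsilon perturbation can be illustrated with plain Python. This sketch assumes the perturbation epsilon * mean(qobs) is added to both the simulated and the observed flows for the “log” and “inv” transforms; transform_flows_sketch and the sample data are hypothetical, and the real function operates on arrays.

```python
import math


def transform_flows_sketch(qsim, qobs, transform=None, epsilon=0.01):
    """Return (qsim, qobs) after the requested transformation (illustrative only)."""
    if transform is None:
        return list(qsim), list(qobs)
    if transform == "sqrt":
        return [math.sqrt(q) for q in qsim], [math.sqrt(q) for q in qobs]

    # Small offset that keeps log(0) and 1/0 from occurring on zero-flow days.
    perturbation = epsilon * (sum(qobs) / len(qobs))
    if transform == "log":
        return (
            [math.log(q + perturbation) for q in qsim],
            [math.log(q + perturbation) for q in qobs],
        )
    if transform == "inv":
        return (
            [1.0 / (q + perturbation) for q in qsim],
            [1.0 / (q + perturbation) for q in qobs],
        )
    raise ValueError(f"unknown transform: {transform}")


qobs = [0.0, 2.0, 4.0]  # made-up observations including a zero-flow day
qsim = [0.5, 1.5, 4.5]  # made-up simulated flows
inv_sim, inv_obs = transform_flows_sketch(qsim, qobs, transform="inv")
log_sim, log_obs = transform_flows_sketch(qsim, qobs, transform="log")
```

With these values the mean observed flow is 2, so the perturbation is 0.02 and the zero-flow day no longer causes an error under either transform.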