2. Hydrological modelling module

INFO xhydro provides tools to execute and calibrate hydrological models, but will not prepare the model itself. This should be done beforehand.

xhydro provides a collection of functions that can serve as the main entry point for hydrological modelling. The entire framework is based on the xh.modelling.hydrological_model function and its model_config dictionary, which is meant to contain all necessary information to execute the given hydrological model. For example, depending on the model, it can store meteorological datasets directly, paths to datasets (netCDF files or other), csv configuration files, parameters, and basically anything that is required to configure and execute an hydrological model.

It then becomes the User’s responsibility to ensure that all required information for a given model are provided in the model_config object, both in the data preparation stage and in the hydrological model implementation. This can be addressed by calling the xh.modelling.get_hydrological_model_inputs function to get a list of the required keys for a given model, as well as the documentation. Parameters for that function are: - model_name: As listed below. - required_only: Whether to return all possible inputs, or only the required ones.

Currently available models are: - Hydrotel - Dummy (only used for testing purposes)

[1]:
import xhydro as xh
import xhydro.modelling as xhm

# This function can be called to get a list of the keys for a given model, as well as its documentation.
inputs, docs = xhm.get_hydrological_model_inputs("Hydrotel", required_only=False)

inputs
/home/docs/checkouts/readthedocs.org/user_builds/xhydro/conda/latest/lib/python3.12/site-packages/clisops/core/regrid.py:42: UserWarning: xarray version >= 2023.03.0 is not supported for regridding operations with cf-time indexed arrays. Please use xarray version < 2023.03.0. For more information, see: https://github.com/pydata/xarray/issues/7794.
  warnings.warn(
ERROR 1: PROJ: proj_create_from_database: Open of /home/docs/checkouts/readthedocs.org/user_builds/xhydro/conda/latest/share/proj failed
[1]:
{'model_name': 'Hydrotel',
 'project_dir': typing.Union[str, os.PathLike],
 'project_file': str,
 'project_config': typing.Optional[dict],
 'simulation_config': typing.Optional[dict],
 'output_config': typing.Optional[dict],
 'use_defaults': bool,
 'executable': typing.Union[str, os.PathLike]}
[2]:
print(docs)
Class to handle Hydrotel simulations.

    Parameters
    ----------
    project_dir : str or os.PathLike
        Path to the project folder.
    project_file : str
        Name of the project file (e.g. 'projet.csv').
    project_config : dict, optional
        Dictionary of configuration options to overwrite in the project file.
    simulation_config : dict, optional
        Dictionary of configuration options to overwrite in the simulation file (simulation.csv).
    output_config : dict, optional
        Dictionary of configuration options to overwrite in the output file (output.csv).
    use_defaults : bool
        If True, use default configuration options loaded from xhydro/modelling/data/hydrotel_defaults/.
        If False, read the configuration options directly from the files in the project folder.
    executable : str or os.PathLike
        Command to run the simulation.
        On Windows, this should be the path to Hydrotel.exe.

    Notes
    -----
    At minimum, the project folder must already exist when this function is called
    and either 'use_defaults' must be True or 'SIMULATION COURANTE' must be specified
    as a keyword argument in 'project_config'.

Hydrological models can differ from one another in terms of required inputs and available functions, but an effort will be made to homogenize them as much as possible as new models get added. Currently, all models have these 3 functions: - .run() which will execute the model, reformat the outputs to be compatible with analysis tools in xhydro, then return the simulated streamflows as a xarray.Dataset. - The streamflow will be called streamflow and have units in m3 s-1. - In the case of 1D data (such as hydrometric stations), that dimension in the dataset will be identified trough a cf_role: timeseries_id attribute. - .get_inputs() to retrieve the meteorological inputs. - .get_streamflow() to retrieve the simulated streamflow.

2.1. Initialising the model (e.g. Hydrotel)

The following example will use the Hydrotel model. It is on the more complex side, with most of its parameters hidden within configurations files, but xhydro can be used to easily update configuration files, validate the project directory and the meteorological inputs, execute the model, and reformat the outputs to be more inline with CF conventions and other functions within xhydro.

Do note that xhydro does not prepare the project directory itself, which should be done beforehand. What the class does, when initiating a new instance of xhydro.modelling.Hydrotel, is allow control on the entries located in the three main configuration files: the project file, simulation.csv, and output.csv. The other arguments for the class, as obtained from get_hydrological_model_inputs, are listed above. At any time after initialising the class, update_config() can be called to update the three configuration files. When called, this function will overwrite the CSV files on disk.

[4]:
import os
from pathlib import Path

# The executable depends on the platform
if os.name == "nt":
    executable = "path/to/Hydrotel.exe"
else:
    executable = "hydrotel"

# Prepare the model configuration options
model_config = {
    "model_name": "Hydrotel",
    "project_dir": notebook_folder / "_data" / "example_hydrotel",
    "project_file": "projet.csv",
    "simulation_config": {
        "DATE DEBUT": "2001-01-01",
        "DATE FIN": "2001-12-31",
        "FICHIER STATIONS METEO": "meteo/SLNO_meteo_GC3H.nc",
        "PAS DE TEMPS": 24,
    },
    "output_config": {"TRONCONS": 1, "DEBITS_AVAL": 1},
    "use_defaults": True,
    "executable": executable,
}

For HYDROTEL, DATE DEBUT (start date), DATE FIN (end date), and PAS DE TEMPS (timestep frequency) will always need to be specified, so these need to be added to simulation_config if they don’t already exist in simulation.csv. Additionally, either FICHIER STATIONS METEO (meteorological stations file) or FICHIER GRILLE METEO (meteorological grid file) need to be specified to guide the model towards the meteorological data.

If using the defaults, streamflow for all river reaches will be outputted. You can modify output.csv to change that behaviour.

[6]:
# Note that xhm.Hydrotel(**model_config) could also be used to initiate the model.
ht = xhm.hydrological_model(model_config)

print(f"Simulation directory, taken from the project file: '{ht.simulation_dir}'\n")
print(f"Project configuration: '{ht.project_config}'\n")
print(f"Simulation configuration: '{ht.simulation_config}'\n")
print(f"Output configuration: '{ht.output_config}'")
Simulation directory, taken from the project file: '/home/docs/checkouts/readthedocs.org/user_builds/xhydro/checkouts/latest/docs/notebooks/_data/example_hydrotel/simulation/simulation'

Project configuration: '{'PROJET HYDROTEL VERSION': '', 'FICHIER ALTITUDE': 'physitel/altitude.tif', 'FICHIER PENTE': 'physitel/pente.tif', 'FICHIER ORIENTATION': 'physitel/orientation.tif', 'FICHIER ZONE': 'physitel/uhrh.tif', 'FICHIER NOEUD': 'physitel/noeuds.nds', 'FICHIER TRONCON': 'physitel/troncon.trl', 'FICHIER PIXELS': 'physitel/point.rdx', 'FICHIER MILIEUX HUMIDES PROFONDEUR TRONCONS': '', 'SIMULATION COURANTE': 'simulation'}'

Simulation configuration: '{'SIMULATION HYDROTEL VERSION': '', 'FICHIER OCCUPATION SOL': 'physitel/occupation_sol.cla', 'FICHIER PROPRIETE HYDROLIQUE': 'physitel/proprietehydrolique.sol', 'FICHIER TYPE SOL COUCHE1': 'physitel/type_sol.cla', 'FICHIER TYPE SOL COUCHE2': 'physitel/type_sol.cla', 'FICHIER TYPE SOL COUCHE3': 'physitel/type_sol.cla', 'COEFFICIENT ADDITIF PROPRIETE HYDROLIQUE': '0', 'FICHIER INDICE FOLIERE': 'physio/ind_fol.def', 'FICHIER PROFONDEUR RACINAIRE': 'physio/pro_rac.def', 'FICHIER GRILLE METEO': '', 'FICHIER STATIONS METEO': 'meteo/SLNO_meteo_GC3H.nc', 'FICHIER STATIONS HYDRO': '', 'PREVISION METEO': '0', 'FICHIER GRILLE PREVISION': '', 'DATE DEBUT PREVISION': '', 'DATE DEBUT': '2001-01-01 00:00', 'DATE FIN': '2001-12-31 00:00', 'PAS DE TEMPS': 24, 'TRONCON EXUTOIRE': '1', 'TRONCONS DECONNECTER': 'off', 'NOM FICHIER CORRECTIONS': 'correction.csv', 'LECTURE ETAT FONTE NEIGE': '', 'LECTURE ETAT TEMPERATURE DU SOL': '', 'LECTURE ETAT BILAN VERTICAL': '', 'LECTURE ETAT RUISSELEMENT SURFACE': '', 'LECTURE ETAT ACHEMINEMENT RIVIERE': '', 'ECRITURE ETAT FONTE NEIGE': '', 'ECRITURE ETAT TEMPERATURE DU SOL': '', 'ECRITURE ETAT BILAN VERTICAL': '', 'ECRITURE ETAT RUISSELEMENT SURFACE': '', 'ECRITURE ETAT ACHEMINEMENT RIVIERE': '', 'REPERTOIRE ECRITURE ETAT FONTE NEIGE': 'etat', 'REPERTOIRE ECRITURE ETAT TEMPERATURE DU SOL': 'etat', 'REPERTOIRE ECRITURE ETAT BILAN VERTICAL': 'etat', 'REPERTOIRE ECRITURE ETAT RUISSELEMENT SURFACE': 'etat', 'REPERTOIRE ECRITURE ETAT ACHEMINEMENT RIVIERE': 'etat', 'INTERPOLATION DONNEES': 'THIESSEN', 'FONTE DE NEIGE': 'DEGRE JOUR MODIFIE', 'TEMPERATURE DU SOL': '', 'EVAPOTRANSPIRATION': 'HYDRO-QUEBEC', 'BILAN VERTICAL': 'BV3C', 'RUISSELEMENT': 'ONDE CINEMATIQUE', 'ACHEMINEMENT RIVIERE': 'ONDE CINEMATIQUE MODIFIEE', 'MILIEUX HUMIDES ISOLES': '0', 'MILIEUX HUMIDES RIVERAINS': '0', 'FICHIER DE PARAMETRE GLOBAL': '0', 'LECTURE INTERPOLATION DONNEES': 'lecture_interpolation.csv', 'THIESSEN': 'thiessen.csv', 'MOYENNE 3 STATIONS': 'moyenne_3_stations.csv', 'LECTURE FONTE NEIGE': 'lecture_fonte_neige.csv', 'DEGRE JOUR MODIFIE': 'degre_jour_modifie.csv', 'LECTURE TEMPERATURE DU SOL': 'lecture_tempsol.csv', 'RANKINEN': 'rankinen.csv', 'THORSEN': 'thorsen.csv', 'LECTURE EVAPOTRANSPIRATION': 'lecture_etp.csv', 'HYDRO-QUEBEC': 'hydro_quebec.csv', 'THORNTHWAITE': 'thornthwaite.csv', 'LINACRE': 'linacre.csv', 'PENMAN': 'penman.csv', 'PRIESTLAY-TAYLOR': 'priestlay_taylor.csv', 'PENMAN-MONTEITH': 'penman_monteith.csv', 'ETP-MC-GUINESS': 'etp-mc-guiness.csv', 'LECTURE BILAN VERTICAL': 'lecture_bilan_vertical.csv', 'BV3C': 'bv3c.csv', 'CEQUEAU': 'cequeau.csv', 'LECTURE RUISSELEMENT SURFACE': 'lecture_ruisselement.csv', 'ONDE CINEMATIQUE': 'onde_cinematique.csv', 'LECTURE ACHEMINEMENT RIVIERE': 'lecture_acheminement.csv', 'ONDE CINEMATIQUE MODIFIEE': 'onde_cinematique_modifiee.csv', 'FICHIER MILIEUX HUMIDES ISOLES': '', 'FICHIER MILIEUX HUMIDES RIVERAINS': ''}'

Output configuration: '{'TRONCONS': 1, 'TMIN': '0', 'TMAX': '0', 'TMIN_JOUR': '0', 'TMAX_JOUR': '0', 'PLUIE': '0', 'NEIGE': '0', 'APPORT': '0', 'COUVERT_NIVAL': '0', 'HAUTEUR_NEIGE': '0', 'ALBEDO_NEIGE': '0', 'PROFONDEUR_GEL': '0', 'ETP': '0', 'ETR1': '0', 'ETR2': '0', 'ETR3': '0', 'ETR_TOTAL': '0', 'PRODUCTION_BASE': '0', 'PRODUCTION_HYPO': '0', 'PRODUCTION_SURF': '0', 'Q12': '0', 'Q23': '0', 'THETA1': '0', 'THETA2': '0', 'THETA3': '0', 'APPORT_LATERAL': '0', 'DEBITS_AMONT': '0', 'DEBITS_AVAL': 1, 'FICHIERS_ETATS_SEPARATEUR': ';', 'SEPARATEUR': ';', 'OUTPUT_NETCDF': '1'}'

2.2. Validating the meteorological data

A few basic checks will be automatically performed prior to executing hydrological models, but a user might want to perform more advanced health checks (e.g. unrealistic meteorological inputs). This is possible through the use of xhydro.utils.health_checks. Consult the ‘xscen’ documentation for the full list of possible checks.

In this example, we’ll make sure that there are no abnormal meteorological values or sequence of values. Since the data used for this example is fake, this will raise some flags.

[7]:
health_checks = {
    "raise_on": [],  # If an entry is not here, it will warn the user instead of raising an exception.
    "flags": {
        "pr": {  # You can have specific flags per variable.
            "negative_accumulation_values": {},
            "very_large_precipitation_events": {},
            "outside_n_standard_deviations_of_climatology": {"n": 5},
            "values_repeating_for_n_or_more_days": {"n": 5},
        },
        "tasmax": {
            "tasmax_below_tasmin": {},
            "temperature_extremely_low": {},
            "temperature_extremely_high": {},
            "outside_n_standard_deviations_of_climatology": {"n": 5},
            "values_repeating_for_n_or_more_days": {"n": 5},
        },
        "tasmin": {
            "temperature_extremely_low": {},
            "temperature_extremely_high": {},
            "outside_n_standard_deviations_of_climatology": {"n": 5},
            "values_repeating_for_n_or_more_days": {"n": 5},
        },
    },
}
[8]:
# We can use get_inputs() to automatically retrieve the meteorological data. This is very useful for instances like Hydrotel, where this information is hidden within configuration files.
ds_in = ht.get_inputs()
ds_in["pr"].attrs[
    "units"
] = "mm d-1"  # Hydrotel-to-xclim compatibility. Precipitation in xclim needs to be a flux.

# This example will raise warnings, this is normal since we're using fake data.
xh.utils.health_checks(ds_in, **health_checks)

2.3. Executing the model

In most cases, a few basic checkups will be performed prior to executing the model, when the run() function is called. In the case of Hydrotel, these checks will be made:

  • All files mentioned in the configuration exist.

  • The meteorological dataset has the dimensions, coordinates, and variables named in its configuration file (e.g. SLNO_meteo_GC3H.nc.config, in this example).

  • The dataset has a standard calendar.

  • The frequency is uniform (i.e. all time steps are equally spaced).

  • The start and end dates are contained in the dataset.

  • The dataset is complete (i.e. no missing values).

Only when those checks are satisfied will the function actually execute the model. In addition, specific to Hydrotel, the following arguments can be called:

  • check_missing: bool. Whether to verify for missing data or not. Since this can be very time-consuming, it is False by default.

  • dry_run: bool. Put at True to simply print the command line, without executing it.

Once the model has been executed, xhydro will automatically reformat the NetCDF to bring it closer to CF conventions and make it compatible with other xhydro modules. Note that for Hydrotel, this reformatting currently only supports the DEBITS_AVAL (outgoing streamflow) output option.

[9]:
# For the purpose of this example, we'll leave 'dry_run' as True.
print("Command that would be run in the terminal:")
ht.run(check_missing=True, dry_run=True)
Command that would be run in the terminal:
[9]:
'hydrotel /home/docs/checkouts/readthedocs.org/user_builds/xhydro/checkouts/latest/docs/notebooks/_data/example_hydrotel/projet.csv -t 1'
[10]:
# This is how the output would look like after reformatting (which was skipped by the dry_run argument)
ht._standardise_outputs()
ht.get_streamflow()
[10]:
<xarray.Dataset> Size: 9kB
Dimensions:     (time: 730, station_id: 1)
Coordinates:
  * time        (time) datetime64[ns] 6kB 2001-01-01 2001-01-02 ... 2002-12-31
    troncon     (station_id) int64 8B ...
  * station_id  (station_id) int64 8B 0
Data variables:
    streamflow  (station_id, time) float32 3kB ...
Attributes:
    Hydrotel_version:  

2.4. Model calibration

WARNING This is still a work-in-progress. Only the Dummy is currently implemented.

Model calibration consists of a loop of multiple instances where: model parameters are chosen, the model is run, the results are compared to observations. The calibration functions in xhydro rely on SPOTPY to perform the optimization process.

When using the calibration module, 2 additional keywords are required for the model_config: - Qobs: Contains the observed streamflow used as the calibration target. - parameters: While not necessary to provide this, it is a reserved keyword used by the optimizer.

The calibration function, xh.modelling.perform_calibration, has these parameters:

[11]:
xh.modelling.perform_calibration?
[12]:
import numpy as np

# Prepare the model_config dictionary for the Dummy model
model_config = {
    "model_name": "Dummy",
    "precip": np.array([10, 11, 12, 13, 14, 15]),
    "temperature": np.array([10, 3, -5, 1, 15, 0]),
    "drainage_area": np.array([10]),
    "qobs": np.array([120, 130, 140, 150, 160, 170]),  # Required for calibration
}

# This model has 3 parameters. This will be their possible range.
bounds_low = np.array([0, 0, 0])
bounds_high = np.array([10, 10, 10])

mask = np.array([0, 0, 0, 0, 1, 1])
[13]:
# Run the calibration
best_parameters, best_simulation, best_objfun = xhm.perform_calibration(
    model_config,
    obj_func="mae",
    bounds_low=bounds_low,
    bounds_high=bounds_high,
    evaluations=1000,
    algorithm="DDS",
    mask=mask,
    sampler_kwargs={"trials": 1},
)
Initializing the  Dynamically Dimensioned Search (DDS) algorithm  with  1000  repetitions
The objective function will be maximized
Starting the DDS algotrithm with 1000 repetitions...
Finding best starting point for trial 1 using 5 random samples.
Initialize database...
['csv', 'hdf5', 'ram', 'sql', 'custom', 'noData']
Best solution found has obj function value of -0.4314797367772485 at 5



*** Final SPOTPY summary ***
Total Duration: 0.79 seconds
Total Repetitions: 1000
Maximal objective value: -0.43148
Corresponding parameter setting:
param0: 7.8018
param1: 0.0238488
param2: 0.145928
******************************

Best parameter set:
param0=7.801801777690843, param1=0.023848845865538794, param2=7.326740478016068
Run number 968 has the lowest objectivefunction with: -8136.5627
[14]:
# The first output corresponds to the best set of parameters
best_parameters
[14]:
[7.801801777690843, 0.023848845865538794, 7.326740478016068]
[15]:
# The second output corresponds to the timeseries for the best set of parameters
best_simulation
[15]:
<xarray.Dataset> Size: 96B
Dimensions:     (time: 6)
Coordinates:
  * time        (time) datetime64[ns] 48B 2024-01-01 2024-01-02 ... 2024-01-06
Data variables:
    streamflow  (time) float64 48B 5.734e+03 6.293e+03 ... 8.029e+03 8.574e+03
[16]:
# The second output is the value of the objective function for the best set of parameters
best_objfun
[16]:
8136.562721306232