1. GIS operations

Geographic Information System (GIS) operations are crucial in hydrological analysis. This section illustrates how to leverage the xHydro library for performing key GIS tasks, including delineating watershed boundaries and extracting critical physiographic, climatological, and geographical variables at the watershed scale.

[13]:
import leafmap
import numpy as np
import pandas as pd
import xarray as xr
import xclim
import xdatasets
from IPython.display import clear_output

import xhydro as xh
import xhydro.gis as xhgis

clear_output(wait=False)

1.1. Watershed delineation

Currently, watershed delineation utilizes HydroSHEDS’ HydroBASINS (hybas_na_lev01-12_v1c) and is compatible with any location in North America. The process involves identifying all upstream sub-basins from a specified outlet and consolidating them into a unified watershed. The leafmap library is used to generate interactive maps, which allow for the selection of outlets or the visualization of the resulting watershed boundaries. While the use of the map is not mandatory for performing the calculations, it greatly enhances the visualization and understanding of the watershed delineation process.

[14]:
m = leafmap.Map(center=(48.63, -74.71), zoom=5, basemap="USGS Hydrography")

image_feneral.png

1.1.1. a) From a list of coordinates

A first option is to provide a list of coordinates. In the example below, we select two pour points, with each one representing the outlet for the watersheds of Montmorency and the Beaurivage River, respectively.

[15]:
lng_lat = [
    (-71.15069, 46.89431),  # Montmorency river watershed
    (-71.28512, 46.66141),  # Beaurivage river watershed
]

1.1.2. b) From markers on a map

Rather than using a list, a more interactive approach allows for directly selecting outlets from the existing map m, using the Draw a marker button located on the left of the map. The image below demonstrates how to select pour points by dragging markers to the desired locations on the map.

image_marker.png

After selecting points using either approach a) or b), or a combination of both, we can initiate the watershed delineation calculation. This is done using the function xhydro.gis.watershed_delineation.

Help on function watershed_delineation in module xhydro.gis:

watershed_delineation(
    *,
    coordinates: 'list[tuple] | tuple | None' = None,
    m: 'leafmap.Map | None' = None
) -> 'gpd.GeoDataFrame'
    Calculate watershed delineation from pour point.

    Watershed delineation can be computed at any location in North America using HydroBASINS (hybas_na_lev01-12_v1c).
    The process involves assessing all upstream sub-basins from a specified pour point and consolidating them into a
    unified watershed.

    Parameters
    ----------
    coordinates : list of tuple, tuple, optional
        Geographic coordinates (longitude, latitude) for the location where watershed delineation will be conducted.
    m : leafmap.Map, optional
        If the function receives both a map and coordinates as inputs, it will generate and display watershed
        boundaries on the map. Additionally, any markers present on the map will be transformed into
        corresponding watershed boundaries for each marker.

    Returns
    -------
    gpd.GeoDataFrame
        GeoDataFrame containing the watershed boundaries.

    Warnings
    --------
    This function relies on an Amazon S3-hosted dataset to delineate watersheds.

[17]:
gdf = xhgis.watershed_delineation(coordinates=lng_lat, m=m)
gdf
[17]:
HYBAS_ID Upstream Area (sq. km) geometry
0 7120034480 1157.6 POLYGON ((-71.15208 46.8838, -71.15612 46.8869...
1 7120365812 693.5 POLYGON ((-71.09758 46.40035, -71.09409 46.403...

The outcomes are stored in a GeoPandas gpd.GeoDataFrame object, enabling us to save the polygons in various common formats, such as ESRI Shapefile or GeoJSON. If a map m is available, the polygons will be automatically added to it. To visualize the map, simply type m in the code cell to render it. If displaying the map directly is not compatible with your notebook interpreter, you can use the following code to extract the HTML from the map and plot it:

[18]:
m.zoom_to_gdf(gdf)

image_gdf.png

1.1.3. c) From xdatasets

While automatically delineating watershed boundaries is a valuable tool, users are encouraged to utilize official watershed boundaries when available, rather than generating new ones. The xdatasets library, for example, hosts a few official boundaries that can be accessed using the xdatasets.Query function. As of today, the following watershed sources are available:

[19]:
xdatasets.Query(
    **{
        "datasets": {
            "deh_polygons": {
                "id": ["031502", "0421*"],
                "regulated": ["Natural"],
            }
        }
    }
).data.reset_index()
[19]:
Station Superficie geometry
0 031502 15.708960 POLYGON ((-72.50126 46.21216, -72.50086 46.213...
1 042103 579.479614 POLYGON ((-78.49014 46.64514, -78.4901 46.6450...

1.2. Extraction of watershed properties

Once the watershed boundaries are obtained, we can extract valuable properties such as geographical information, land use classification, and climatological data from the delineated watersheds.

1.2.1. a) Geographical watershed properties

First, xhydro.gis.watershed_properties can be used to extract the geographical properties of the watershed, including the perimeter, total area, Gravelius coefficient, and basin centroid. It is important to note that this function returns all the columns present in the provided gpd.GeoDataFrame argument.

Help on function watershed_properties in module xhydro.gis:

watershed_properties(
    gdf: 'gpd.GeoDataFrame',
    *,
    unique_id: 'str | None' = None,
    projected_crs: 'int | str | None' = 'NAD83',
    output_format: "Literal['xarray', 'xr.Dataset', 'geopandas', 'gpd.GeoDataFrame']" = 'geopandas'
) -> 'gpd.GeoDataFrame | xr.Dataset'
    Watershed properties extracted from a gpd.GeoDataFrame.

    The calculated properties are :
    - area
    - perimeter
    - gravelius
    - centroid coordinates

    Parameters
    ----------
    gdf : gpd.GeoDataFrame
        GeoDataFrame containing watershed polygons. Must have a defined .crs attribute.
    unique_id : str, optional
        The column name in the GeoDataFrame that serves as a unique identifier.
    projected_crs : int | str
        The projected coordinate reference system (crs) to utilize for calculations, such as determining watershed area.
        If a string is provided, it should be a valid Geodetic CRS for the `gpd.estimate_utm_crs()` method.
        If None, the function will use the `gpd.estimate_utm_crs()` default (WGS 84).
        Default is an estimated CRS based on NAD83.
    output_format : str
        One of either `xarray` (or `xr.Dataset`) or `geopandas` (or `gpd.GeoDataFrame`).

    Returns
    -------
    gpd.GeoDataFrame or xr.Dataset
        Output dataset containing the watershed properties.

[21]:
HYBAS_ID Upstream Area (sq. km) area (m2) estimated_area_diff (%) perimeter (m) gravelius (m/m) centroid_lon centroid_lat
0 7120034480 1157.6 1.160550e+09 0.254842 263428.762122 2.181355 -71.091980 47.257839
1 7120365812 693.5 5.855856e+08 -15.560840 158295.639822 1.845308 -71.260464 46.452161

For added convenience, we can also retrieve the same results in the form of an xarray.Dataset:

[22]:
xhgis.watershed_properties(gdf, unique_id="HYBAS_ID", output_format="xarray")
[22]:
<xarray.Dataset> Size: 128B
Dimensions:                 (HYBAS_ID: 2)
Coordinates:
  * HYBAS_ID                (HYBAS_ID) int64 16B 7120034480 7120365812
Data variables:
    Upstream Area (sq. km)  (HYBAS_ID) float64 16B 1.158e+03 693.5
    area                    (HYBAS_ID) float64 16B 1.161e+09 5.856e+08
    estimated_area_diff     (HYBAS_ID) float64 16B 0.2548 -15.56
    perimeter               (HYBAS_ID) float64 16B 2.634e+05 1.583e+05
    gravelius               (HYBAS_ID) float64 16B 2.181 1.845
    centroid_lon            (HYBAS_ID) float64 16B -71.09 -71.26
    centroid_lat            (HYBAS_ID) float64 16B 47.26 46.45

1.2.2. b) Surface properties

We can use xhydro.gis.surface_properties to extract surface properties for the same gpd.GeoDataFrame, such as slope and aspect. By default, these properties are calculated using Copernicus’ GLO-90 Digital Elevation Model as of 2021-04-22. However, both the source and the date can be modified through the function’s arguments.

Help on function surface_properties in module xhydro.gis:

surface_properties(
    gdf: 'gpd.GeoDataFrame',
    *,
    unique_id: 'str | None' = None,
    projected_crs: 'int | str | None' = 'NAD83',
    operation: 'str' = 'mean',
    dataset_date: 'str' = '2021-04-22',
    collection: 'str' = 'cop-dem-glo-90',
    output_format: "Literal['xarray', 'xr.Dataset', 'geopandas', 'gpd.GeoDataFrame']" = 'geopandas'
) -> 'gpd.GeoDataFrame | xr.Dataset'
    Surface properties for watersheds.

    Surface properties are calculated using Copernicus's GLO-90 Digital Elevation Model.
    By default, the dataset has a geographic coordinate system (EPSG: 4326) and this function expects a projected crs for more accurate results.

    The calculated properties are :
    - elevation (meters)
    - slope (degrees)
    - aspect ratio (degrees)

    Parameters
    ----------
    gdf : gpd.GeoDataFrame
        GeoDataFrame containing watershed polygons. Must have a defined .crs attribute.
    unique_id : str, optional
        The column name in the GeoDataFrame that serves as a unique identifier.
    projected_crs : int | str
        The projected coordinate reference system (crs) to utilize for calculations, such as determining watershed area.
        If a string is provided, it should be a valid Geodetic CRS for the `gpd.estimate_utm_crs()` method.
        If None, the function will use the `gpd.estimate_utm_crs()` default (WGS 84).
        Default is an estimated CRS based on NAD83.
    operation : str
        Aggregation statistics such as `mean` or `sum`.
    dataset_date : str
        Date (%Y-%m-%d) for which to select the imagery from the dataset. Date must be available.
    collection : str
        Collection name from the Planetary Computer STAC Catalog. Default is `cop-dem-glo-90`.
    output_format : str
        One of either `xarray` (or `xr.Dataset`) or `geopandas` (or `gpd.GeoDataFrame`).

    Returns
    -------
    gpd.GeoDataFrame or xr.Dataset
        Output dataset containing the surface properties.

    Warnings
    --------
    This function relies on the Microsoft Planetary Computer's STAC Catalog to retrieve the Digital Elevation Model (DEM) data.

[24]:
elevation slope aspect time spatial_ref
geometry
0 725.898315 5.307655 186.393692 2021-04-22 0
1 222.474442 1.356003 212.094788 2021-04-22 0

Again, for convenience, we can output the results in xarray.Dataset format :

[25]:
xhgis.surface_properties(gdf, unique_id="HYBAS_ID", output_format="xarray")
[25]:
<xarray.Dataset> Size: 72B
Dimensions:      (HYBAS_ID: 2)
Coordinates:
  * HYBAS_ID     (HYBAS_ID) int64 16B 7120034480 7120365812
    time         datetime64[ns] 8B 2021-04-22
    spatial_ref  int64 8B 0
    geometry     (HYBAS_ID) geometry 16B POLYGON ((336034.51117111975 5194499...
Data variables:
    elevation    (HYBAS_ID) float32 8B 725.9 222.5
    slope        (HYBAS_ID) float32 8B 5.308 1.356
    aspect       (HYBAS_ID) float32 8B 186.4 212.1

1.2.3. c) Land use classification

Finally, we can retrieve land use classifications using xhydro.gis.land_use_classification. This function is powered by the Planetary Computer’s STAC catalog and, by default, uses the 10m Annual Land Use Land Cover (9-class) V2 dataset (“io-lulc-annual-v02”). However, alternative collections can be specified as arguments to the function.

Help on function land_use_classification in module xhydro.gis:

land_use_classification(
    gdf: 'gpd.GeoDataFrame',
    *,
    unique_id: 'str | None' = None,
    collection='io-lulc-annual-v02',
    year: 'str | int' = 'latest',
    output_format: "Literal['xarray', 'xr.Dataset', 'geopandas', 'gpd.GeoDataFrame']" = 'geopandas'
) -> 'gpd.GeoDataFrame | xr.Dataset'
    Calculate land use classification.

    Calculate land use classification for each polygon from a gpd.GeoDataFrame. By default,
    the classes are generated from the Planetary Computer's STAC catalog using the
    `10m Annual Land Use Land Cover` dataset.

    Parameters
    ----------
    gdf : gpd.GeoDataFrame
        GeoDataFrame containing watershed polygons. Must have a defined .crs attribute.
    unique_id : str
        GeoDataFrame containing watershed polygons. Must have a defined .crs attribute.
    collection : str
        Collection name from the Planetary Computer STAC Catalog.
    year : str | int
        Land use dataset year between 2017 and 2022.
    output_format : str
        One of either `xarray` (or `xr.Dataset`) or `geopandas` (or `gpd.GeoDataFrame`).

    Returns
    -------
    gpd.GeoDataFrame or xr.Dataset
        Output dataset containing the watershed properties.

    Warnings
    --------
    This function relies on the Microsoft Planetary Computer's STAC Catalog to retrieve the Digital Elevation Model (DEM) data.

[27]:
df = xhgis.land_use_classification(gdf, unique_id="HYBAS_ID", output_format="xarray")
df
[27]:
<xarray.Dataset> Size: 168B
Dimensions:                 (HYBAS_ID: 2)
Coordinates:
  * HYBAS_ID                (HYBAS_ID) int64 16B 7120034480 7120365812
    spatial_ref             int64 8B 0
Data variables:
    pct_trees               (HYBAS_ID) float64 16B 0.9088 0.5992
    pct_rangeland           (HYBAS_ID) float64 16B 0.05135 0.08354
    pct_water               (HYBAS_ID) float64 16B 0.02159 0.002957
    pct_built_area          (HYBAS_ID) float64 16B 0.01713 0.03685
    pct_snow/ice            (HYBAS_ID) float64 16B 0.0005296 0.0002937
    pct_crops               (HYBAS_ID) float64 16B 0.0002972 0.277
    pct_flooded_vegetation  (HYBAS_ID) float64 16B 0.0001879 0.0
    pct_bare_ground         (HYBAS_ID) float64 16B 0.0001111 6.814e-05
    pct_clouds              (HYBAS_ID) float64 16B 0.0 5.926e-05
Attributes:
    year:                2023
    collection:          io-lulc-annual-v02
    spatial_resolution:  10.0
[28]:
ax = xhgis.land_use_plot(gdf, unique_id="HYBAS_ID", idx=1)
../_images/notebooks_gis_34_0.png

1.2.4. d) Climate indicators

The extraction of climate indicators is the most complex step, as it requires accessing a weather dataset, then subsetting and averaging the data over the various watersheds within the GeoDataFrame object. These steps are outside the scope of xHydro, and users will need to rely on other libraries for this task. One approach, outlined in the Use Case Example, involves using a combination of xscen and xESMF. Another approach, demonstrated here, utilizes xdatasets. Indeed, xdatasets enables the extraction of all the pixels from a gridded dataset within a watershed while accounting for the weighting of each pixel intersecting the watershed.

For this example, we will use ERA5-Land reanalysis data covering the period from 1981 to 2010 as our climatological dataset.

[29]:
datasets = {
    "era5_land_reanalysis": {"variables": ["t2m", "tp"]},
}
space = {
    "clip": "polygon",  # bbox, point or polygon
    "averaging": True,
    "geometry": gdf,
    "unique_id": "HYBAS_ID",
}
time = {
    "start": "1981-01-01",
    "end": "1982-12-31",
    "timezone": "America/Montreal",
    "timestep": "D",
    "aggregation": {"tp": np.nansum, "t2m": np.nanmean},
}

ds = xdatasets.Query(datasets=datasets, space=space, time=time).data.squeeze()
Temporal operations: processing tp with era5_land_reanalysis: 100%|██████████| 2/2 [00:00<00:00, 34.55it/s]

Because the next steps utilize xclim under the hood, the dataset must be CF-compliant. At a minimum, the xarray.DataArray used should adhere to the following principles:

  • The dataset must include a time dimension, typically at a daily frequency, with no missing timesteps (although NaNs are supported). If your data deviates from this format, extra caution should be taken when interpreting the results.

  • If there is a 1D spatial dimension, such as HYBAS_ID in the example below, it must have an attribute cf_role with the value timeseries_id.

  • The variable must include at least a units attribute. While not mandatory, other attributes such as long_name and cell_methods are expected by xclim, and warnings will be raised if they are missing.

  • Variable names should match those supported by xclim.

The following code will format the ERA5-Land data obtained from xdatasets to add missing metadata and change the variable names and units:

[30]:
ds = ds.rename({"t2m_nanmean": "tas", "tp_nansum": "pr"})
ds["tas"] = xclim.core.units.convert_units_to(ds["tas"], "degC")
ds["tas"].attrs.update({"cell_methods": "time: mean"})

ds["pr"].attrs.update({"units": "m d-1", "cell_methods": "time: mean within days"})
ds["pr"] = xclim.core.units.convert_units_to(ds["pr"], "mm d-1")
ds
[30]:
<xarray.Dataset> Size: 29kB
Dimensions:      (HYBAS_ID: 2, time: 730)
Coordinates:
  * HYBAS_ID     (HYBAS_ID) int64 16B 7120034480 7120365812
  * time         (time) datetime64[ns] 6kB 1981-01-01 1981-01-02 ... 1982-12-31
    spatial_agg  <U7 28B 'polygon'
    timestep     <U1 4B 'D'
    source       <U20 80B 'era5_land_reanalysis'
Data variables:
    tas          (HYBAS_ID, time) float64 12kB -19.64 -15.9 ... -6.387 -5.33
    pr           (HYBAS_ID, time) float64 12kB 0.01237 3.258 ... 0.07566 0.176
Attributes: (12/30)
    GRIB_NV:                                  0
    GRIB_Nx:                                  1171
    GRIB_Ny:                                  701
    GRIB_cfName:                              unknown
    GRIB_cfVarName:                           t2m
    GRIB_dataType:                            fc
    ...                                       ...
    GRIB_totalNumber:                         0
    GRIB_typeOfLevel:                         surface
    GRIB_units:                               K
    long_name:                                2 metre temperature
    standard_name:                            unknown
    units:                                    K

Climate indicators can be calculated in various ways, and for most simple tasks, directly using xclim is always a viable option. xclim offers an extensive list of available indicators within a user-friendly and flexible framework.

[31]:
xclim.atmos.tg_max(ds.tas)
[31]:
<xarray.DataArray 'tg_max' (HYBAS_ID: 2, time: 2)> Size: 32B
array([[292.65401005, 295.1619582 ],
       [296.13479571, 298.57350323]])
Coordinates:
  * HYBAS_ID     (HYBAS_ID) int64 16B 7120034480 7120365812
  * time         (time) datetime64[ns] 16B 1981-01-01 1982-01-01
    spatial_agg  <U7 28B 'polygon'
    timestep     <U1 4B 'D'
    source       <U20 80B 'era5_land_reanalysis'
Attributes:
    units:           K
    units_metadata:  temperature: unknown
    cell_methods:    time: mean time: maximum over days
    history:         [2026-02-19 14:33:32] tg_max: TG_MAX(tas=tas, freq='YS')...
    standard_name:   air_temperature
    long_name:       Maximum daily mean temperature
    description:     Annual maximum of daily mean temperature.

For more complex tasks, or when computing multiple indicators simultaneously, xhydro.indicators.compute_indicators is often the preferred method. This function allows users to build and pass multiple indicators at once, either by providing a list of custom indicators created with xclim.core.indicator.Indicator.from_dict (see the INFO box below) or by referencing a YAML file (see the ``xscen` documentation <https://xscen.readthedocs.io/en/latest/notebooks/2_getting_started.html#Computing-indicators>`__).

INFO

Custom indicators in xHydro are built by following the YAML formatting required by xclim.

A custom indicator built using xclim.core.indicator.Indicator.from_dict will need these elements:

  • “data”: A dictionary with the following information:

    • “base”: The “YAML ID” obtained from here.

    • “input”: A dictionary linking the default xclim input to the name of your variable. Needed only if it is different. In the link above, they are the string following “Uses:”.

    • “parameters”: A dictionary containing all other parameters for a given indicator. In the link above, the easiest way to access them is by clicking the link in the top-right corner of the box describing a given indicator.

    • More entries can be used here, as described in the xclim documentation under “identifier”.

  • “identifier”: A custom name for your indicator. This will be the name returned in the results.

  • “module”: Needed, but can be anything. To prevent an accidental overwriting of xclim indicators, it is best to use something different from: [“atmos”, “land”, “generic”].

Help on function compute_indicators in module xscen.indicators:

compute_indicators(
    ds: xarray.core.dataset.Dataset,
    indicators: str | os.PathLike | collections.abc.Sequence[xclim.core.indicator.Indicator] | collections.abc.Sequence[tuple[str, xclim.core.indicator.Indicator]] | module,
    *,
    periods: list[str] | list[list[str]] | None = None,
    restrict_years: bool = True,
    to_level: str | None = 'indicators',
    rechunk_input: bool = False
) -> dict
    Calculate variables and indicators based on a YAML call to xclim.

    The function cuts the output to be the same years as the inputs.
    Hence, if an indicator creates a timestep outside the original year range (e.g. the first DJF for QS-DEC),
    it will not appear in the output.

    Parameters
    ----------
    ds : xr.Dataset
        Dataset to use for the indicators.
    indicators : str | os.PathLike | Sequence[Indicator] | Sequence[tuple[str, Indicator]] | ModuleType
        Path to a YAML file that instructs on how to calculate missing variables.
        Can also be only the "stem", if translations and custom indices are implemented.
        Can be the indicator module directly, or a sequence of indicators or a sequence of
        tuples (indicator name, indicator) as returned by `iter_indicators()`.
    periods : list of str or list of lists of str, optional
        Either [start, end] or list of [start, end] of continuous periods over which to compute the indicators.
        This is needed when the time axis of ds contains some jumps in time.
        If None, the dataset will be considered continuous.
    restrict_years : bool
        If True, cut the time axis to be within the same years as the input.
        This is mostly useful for frequencies that do not start in January, such as QS-DEC.
        In that instance, `xclim` would start on previous_year-12-01 (DJF), with a NaN.
        `restrict_years` will cut that first timestep. This should have no effect on YS and MS indicators.
    to_level : str, optional
        The processing level to assign to the output.
        If None, the processing level of the inputs is preserved.
    rechunk_input : bool
        If True, the dataset is rechunked with :py:func:`flox.xarray.rechunk_for_blockwise`
        according to the resampling frequency of the indicator. Each rechunking is done
        once per frequency with :py:func:`xscen.utils.rechunk_for_resample`.

    Returns
    -------
    dict
        Dictionary (keys = timedeltas) with indicators separated by temporal resolution.

    See Also
    --------
    xclim.indicators, xclim.core.indicator.build_indicator_module_from_yaml

[33]:
indicators = [
    # Maximum summer temperature
    xclim.core.indicator.Indicator.from_dict(
        data={
            "base": "tg_max",
            "parameters": {"indexer": {"month": [6, 7, 8]}},
        },
        identifier="tg_max_summer",
        module="example",
    ),
    # Number of days with more than 5 mm of precipitation
    xclim.core.indicator.Indicator.from_dict(
        data={
            "base": "wetdays",
            "parameters": {"thresh": "5 mm d-1"},
        },
        identifier="wet5mm",
        module="example",
    ),
]

xh.indicators.compute_indicators(ds, indicators)["YS-JAN"]
[33]:
<xarray.Dataset> Size: 208B
Dimensions:        (HYBAS_ID: 2, time: 2)
Coordinates:
  * HYBAS_ID       (HYBAS_ID) int64 16B 7120034480 7120365812
  * time           (time) datetime64[ns] 16B 1981-01-01 1982-01-01
    spatial_agg    <U7 28B 'polygon'
    timestep       <U1 4B 'D'
    source         <U20 80B 'era5_land_reanalysis'
Data variables:
    tg_max_summer  (HYBAS_ID, time) float64 32B 292.7 295.2 296.1 298.6
    wet5mm         (HYBAS_ID, time) float64 32B 85.0 83.0 80.0 78.0
Attributes: (12/34)
    GRIB_NV:                                  0
    GRIB_Nx:                                  1171
    GRIB_Ny:                                  701
    GRIB_cfName:                              unknown
    GRIB_cfVarName:                           t2m
    GRIB_dataType:                            fc
    ...                                       ...
    standard_name:                            unknown
    units:                                    K
    cat:xrfreq:                               YS-JAN
    cat:frequency:                            yr
    cat:processing_level:                     indicators
    cat:variable:                             ('tg_max_summer', 'wet5mm')

Finally, xhydro.indicators.get_yearly_op, also built upon the xclim library, provides a flexible method for obtaining yearly values of specific statistics, such as annual or seasonal maxima. This can be particularly useful, for instance, when extracting raw data needed for frequency analyses.

Help on function get_yearly_op in module xhydro.indicators.generic:

get_yearly_op(
    ds,
    op,
    *,
    input_var: str = 'q',
    window: int = 1,
    timeargs: dict | None = None,
    missing: str = 'skip',
    missing_options: dict | None = None,
    interpolate_na: bool = False
) -> xarray.core.dataset.Dataset
    Compute yearly operations on a variable.

    Parameters
    ----------
    ds : xr.Dataset
        Dataset containing the variable to compute the operation on.
    op : str
        Operation to compute. One of ["max", "min", "mean", "sum"].
    input_var : str
        Name of the input variable. Defaults to "q".
    window : int
        Size of the rolling window. A "mean" operation is performed on the rolling window before the call to xclim.
        This parameter cannot be used with the "sum" operation.
    timeargs : dict, optional
        Dictionary of time arguments for the operation.
        Keys are the name of the period that will be added to the results (e.g. "winter", "summer", "annual").
        Values are up to two dictionaries, with both being optional.
        The first is {'freq': str}, where str is a frequency supported by xarray (e.g. "YS", "YS-JAN", "YS-DEC").
        It needs to be a yearly frequency. Defaults to "YS-JAN".
        The second is an indexer as supported by :py:func:`xclim.core.calendar.select_time`.
        Defaults to {}, which means the whole year.
        See :py:func:`xclim.core.calendar.select_time` for more information.
        Examples: {"winter": {"freq": "YS-DEC", "date_bounds": ["12-01", "02-28"]}}, {"jan": {"freq": "YS", "month": 1}}, {"annual": {}}.
    missing : str
        How to handle missing values. One of "skip", "any", "at_least_n", "pct", "wmo".
        See :py:func:`xclim.core.missing` for more information.
    missing_options : dict, optional
        Dictionary of options for the missing values' method. See :py:func:`xclim.core.missing` for more information.
    interpolate_na : bool
        Whether to interpolate missing values before computing the operation. Only used with the "sum" operation.
        Defaults to False.

    Returns
    -------
    xr.Dataset
        Dataset containing the computed operations, with one variable per indexer.
        The name of the variable follows the pattern `{input_var}{window}_{op}_{indexer}`.

    Notes
    -----
    If you want to perform a frequency analysis on a frequency that is finer than annual, simply use multiple timeargs
    (e.g. 1 per month) to create multiple distinct variables.

The timeargs argument relies on indexers that are compatible with xclim.core.calendar.select_time. Four types of indexers are currently accepted:

  • month, followed by a sequence of month numbers.

  • season, followed by one or more of 'DJF', 'MAM', 'JJA', and 'SON'.

  • doy_bounds, followed by a sequence representing the inclusive bounds of the period to be considered ('start', 'end').

  • date_bounds, similar to doy_bounds, but using a month-day ('%m-%d') format.

Subsequently, we specify the operations to be calculated for each variable. Supported operations include "max", "min", "mean", and "sum".

[35]:
timeargs = {
    "january": {"month": [1]},
    "spring": {"date_bounds": ["02-11", "06-19"]},
    "summer_fall": {"date_bounds": ["06-20", "11-19"]},
    "year": {"date_bounds": ["01-01", "12-31"]},
}

operations = {
    "tas": ["max", "mean"],
    "pr": ["sum"],
}

The combination of timeargs and operations through the Cartesian product enables the efficient generation of a comprehensive array of climate indicators.

[36]:
ds_climatology = xr.merge(
    [
        xh.indicators.get_yearly_op(ds, input_var=variable, op=op, timeargs=timeargs)
        for (variable, ops) in operations.items()
        for op in ops
    ]
)
ds_climatology
[36]:
<xarray.Dataset> Size: 528B
Dimensions:               (HYBAS_ID: 2, time: 2)
Coordinates:
  * HYBAS_ID              (HYBAS_ID) int64 16B 7120034480 7120365812
  * time                  (time) datetime64[ns] 16B 1981-01-01 1982-01-01
    spatial_agg           <U7 28B 'polygon'
    timestep              <U1 4B 'D'
    source                <U20 80B 'era5_land_reanalysis'
Data variables:
    tas_max_january       (HYBAS_ID, time) float64 32B -4.165 -7.795 ... -3.176
    tas_max_spring        (HYBAS_ID, time) float64 32B 17.11 18.26 21.04 20.31
    tas_max_summer_fall   (HYBAS_ID, time) float64 32B 19.5 22.01 22.98 25.42
    tas_max_year          (HYBAS_ID, time) float64 32B 19.5 22.01 22.98 25.42
    tas_mean_january      (HYBAS_ID, time) float64 32B -18.11 -19.82 ... -16.8
    tas_mean_spring       (HYBAS_ID, time) float64 32B 1.696 -0.8479 5.592 2.777
    tas_mean_summer_fall  (HYBAS_ID, time) float64 32B 9.367 9.219 12.23 12.57
    tas_mean_year         (HYBAS_ID, time) float64 32B 1.734 0.5356 4.881 4.094
    pr_sum_january        (HYBAS_ID, time) float64 32B 34.29 88.93 28.49 85.93
    pr_sum_spring         (HYBAS_ID, time) float64 32B 591.9 343.5 453.4 333.9
    pr_sum_summer_fall    (HYBAS_ID, time) float64 32B 603.6 626.1 743.2 535.9
    pr_sum_year           (HYBAS_ID, time) float64 32B 1.417e+03 ... 1.163e+03
Attributes: (12/35)
    GRIB_NV:                                  0
    GRIB_Nx:                                  1171
    GRIB_Ny:                                  701
    GRIB_cfName:                              unknown
    GRIB_cfVarName:                           t2m
    GRIB_dataType:                            fc
    ...                                       ...
    units:                                    K
    cat:xrfreq:                               YS-JAN
    cat:frequency:                            yr
    cat:processing_level:                     indicators
    cat:variable:                             ('tas_max_january', 'tas_max_sp...
    cat:id:                                   

The same data can also be visualized as a pd.DataFrame as well :

[37]:
pd.set_option("display.max_rows", 100)
ds_climatology.mean("time").to_dataframe().T
[37]:
HYBAS_ID 7120034480 7120365812
spatial_agg polygon polygon
timestep D D
source era5_land_reanalysis era5_land_reanalysis
tas_max_january -5.979976 -1.652047
tas_max_spring 17.686927 20.677712
tas_max_summer_fall 20.757984 24.204149
tas_max_year 20.757984 24.204149
tas_mean_january -18.967689 -16.441736
tas_mean_spring 0.424023 4.184447
tas_mean_summer_fall 9.293055 12.396911
tas_mean_year 1.134716 4.487463
pr_sum_january 61.612786 57.209695
pr_sum_spring 467.706516 393.650887
pr_sum_summer_fall 614.891367 639.558307
pr_sum_year 1351.15288 1290.998159