1. GIS operations¶
Geographic Information System (GIS) operations are crucial in hydrological analysis. This section illustrates how to leverage the xHydro library for performing key GIS tasks, including delineating watershed boundaries and extracting critical physiographic, climatological, and geographical variables at the watershed scale.
[13]:
import leafmap
import numpy as np
import pandas as pd
import xarray as xr
import xclim
import xdatasets
from IPython.display import clear_output
import xhydro as xh
import xhydro.gis as xhgis
clear_output(wait=False)
1.1. Watershed delineation¶
Currently, watershed delineation utilizes HydroSHEDS’ HydroBASINS (hybas_na_lev01-12_v1c) and is compatible with any location in North America. The process involves identifying all upstream sub-basins from a specified outlet and consolidating them into a unified watershed. The leafmap library is used to generate interactive maps, which allow for the selection of outlets or the visualization of the resulting watershed boundaries. While the use of the map is not mandatory for performing the calculations, it greatly enhances the visualization and understanding of the watershed delineation process.
[14]:
m = leafmap.Map(center=(48.63, -74.71), zoom=5, basemap="USGS Hydrography")
1.1.1. a) From a list of coordinates¶
A first option is to provide a list of coordinates. In the example below, we select two pour points, with each one representing the outlet for the watersheds of Montmorency and the Beaurivage River, respectively.
[15]:
lng_lat = [
(-71.15069, 46.89431), # Montmorency river watershed
(-71.28512, 46.66141), # Beaurivage river watershed
]
1.1.2. b) From markers on a map¶
Rather than using a list, a more interactive approach allows for directly selecting outlets from the existing map m, using the Draw a marker button located on the left of the map. The image below demonstrates how to select pour points by dragging markers to the desired locations on the map.
After selecting points using either approach a) or b), or a combination of both, we can initiate the watershed delineation calculation. This is done using the function xhydro.gis.watershed_delineation.
[16]:
Help on function watershed_delineation in module xhydro.gis:
watershed_delineation(
*,
coordinates: 'list[tuple] | tuple | None' = None,
m: 'leafmap.Map | None' = None
) -> 'gpd.GeoDataFrame'
Calculate watershed delineation from pour point.
Watershed delineation can be computed at any location in North America using HydroBASINS (hybas_na_lev01-12_v1c).
The process involves assessing all upstream sub-basins from a specified pour point and consolidating them into a
unified watershed.
Parameters
----------
coordinates : list of tuple, tuple, optional
Geographic coordinates (longitude, latitude) for the location where watershed delineation will be conducted.
m : leafmap.Map, optional
If the function receives both a map and coordinates as inputs, it will generate and display watershed
boundaries on the map. Additionally, any markers present on the map will be transformed into
corresponding watershed boundaries for each marker.
Returns
-------
gpd.GeoDataFrame
GeoDataFrame containing the watershed boundaries.
Warnings
--------
This function relies on an Amazon S3-hosted dataset to delineate watersheds.
[17]:
gdf = xhgis.watershed_delineation(coordinates=lng_lat, m=m)
gdf
[17]:
| HYBAS_ID | Upstream Area (sq. km) | geometry | |
|---|---|---|---|
| 0 | 7120034480 | 1157.6 | POLYGON ((-71.15208 46.8838, -71.15612 46.8869... |
| 1 | 7120365812 | 693.5 | POLYGON ((-71.09758 46.40035, -71.09409 46.403... |
The outcomes are stored in a GeoPandas gpd.GeoDataFrame object, enabling us to save the polygons in various common formats, such as ESRI Shapefile or GeoJSON. If a map m is available, the polygons will be automatically added to it. To visualize the map, simply type m in the code cell to render it. If displaying the map directly is not compatible with your notebook interpreter, you can use the following code to extract the HTML from the map and plot it:
[18]:
m.zoom_to_gdf(gdf)
1.1.3. c) From xdatasets¶
While automatically delineating watershed boundaries is a valuable tool, users are encouraged to utilize official watershed boundaries when available, rather than generating new ones. The xdatasets library, for example, hosts a few official boundaries that can be accessed using the xdatasets.Query function. As of today, the following watershed sources are available:
Source |
Dataset name |
|---|---|
hydat_polygons |
|
deh_polygons |
|
hq_polygons |
[19]:
xdatasets.Query(
**{
"datasets": {
"deh_polygons": {
"id": ["031502", "0421*"],
"regulated": ["Natural"],
}
}
}
).data.reset_index()
[19]:
| Station | Superficie | geometry | |
|---|---|---|---|
| 0 | 031502 | 15.708960 | POLYGON ((-72.50126 46.21216, -72.50086 46.213... |
| 1 | 042103 | 579.479614 | POLYGON ((-78.49014 46.64514, -78.4901 46.6450... |
1.2. Extraction of watershed properties¶
Once the watershed boundaries are obtained, we can extract valuable properties such as geographical information, land use classification, and climatological data from the delineated watersheds.
1.2.1. a) Geographical watershed properties¶
First, xhydro.gis.watershed_properties can be used to extract the geographical properties of the watershed, including the perimeter, total area, Gravelius coefficient, and basin centroid. It is important to note that this function returns all the columns present in the provided gpd.GeoDataFrame argument.
[20]:
Help on function watershed_properties in module xhydro.gis:
watershed_properties(
gdf: 'gpd.GeoDataFrame',
*,
unique_id: 'str | None' = None,
projected_crs: 'int | str | None' = 'NAD83',
output_format: "Literal['xarray', 'xr.Dataset', 'geopandas', 'gpd.GeoDataFrame']" = 'geopandas'
) -> 'gpd.GeoDataFrame | xr.Dataset'
Watershed properties extracted from a gpd.GeoDataFrame.
The calculated properties are :
- area
- perimeter
- gravelius
- centroid coordinates
Parameters
----------
gdf : gpd.GeoDataFrame
GeoDataFrame containing watershed polygons. Must have a defined .crs attribute.
unique_id : str, optional
The column name in the GeoDataFrame that serves as a unique identifier.
projected_crs : int | str
The projected coordinate reference system (crs) to utilize for calculations, such as determining watershed area.
If a string is provided, it should be a valid Geodetic CRS for the `gpd.estimate_utm_crs()` method.
If None, the function will use the `gpd.estimate_utm_crs()` default (WGS 84).
Default is an estimated CRS based on NAD83.
output_format : str
One of either `xarray` (or `xr.Dataset`) or `geopandas` (or `gpd.GeoDataFrame`).
Returns
-------
gpd.GeoDataFrame or xr.Dataset
Output dataset containing the watershed properties.
[21]:
[21]:
| HYBAS_ID | Upstream Area (sq. km) | area (m2) | estimated_area_diff (%) | perimeter (m) | gravelius (m/m) | centroid_lon | centroid_lat | |
|---|---|---|---|---|---|---|---|---|
| 0 | 7120034480 | 1157.6 | 1.160550e+09 | 0.254842 | 263428.762122 | 2.181355 | -71.091980 | 47.257839 |
| 1 | 7120365812 | 693.5 | 5.855856e+08 | -15.560840 | 158295.639822 | 1.845308 | -71.260464 | 46.452161 |
For added convenience, we can also retrieve the same results in the form of an xarray.Dataset:
[22]:
xhgis.watershed_properties(gdf, unique_id="HYBAS_ID", output_format="xarray")
[22]:
<xarray.Dataset> Size: 128B
Dimensions: (HYBAS_ID: 2)
Coordinates:
* HYBAS_ID (HYBAS_ID) int64 16B 7120034480 7120365812
Data variables:
Upstream Area (sq. km) (HYBAS_ID) float64 16B 1.158e+03 693.5
area (HYBAS_ID) float64 16B 1.161e+09 5.856e+08
estimated_area_diff (HYBAS_ID) float64 16B 0.2548 -15.56
perimeter (HYBAS_ID) float64 16B 2.634e+05 1.583e+05
gravelius (HYBAS_ID) float64 16B 2.181 1.845
centroid_lon (HYBAS_ID) float64 16B -71.09 -71.26
centroid_lat (HYBAS_ID) float64 16B 47.26 46.451.2.2. b) Surface properties¶
We can use xhydro.gis.surface_properties to extract surface properties for the same gpd.GeoDataFrame, such as slope and aspect. By default, these properties are calculated using Copernicus’ GLO-90 Digital Elevation Model as of 2021-04-22. However, both the source and the date can be modified through the function’s arguments.
[23]:
help(xhgis.surface_properties)
Help on function surface_properties in module xhydro.gis:
surface_properties(
gdf: 'gpd.GeoDataFrame',
*,
unique_id: 'str | None' = None,
projected_crs: 'int | str | None' = 'NAD83',
operation: 'str' = 'mean',
dataset_date: 'str' = '2021-04-22',
collection: 'str' = 'cop-dem-glo-90',
output_format: "Literal['xarray', 'xr.Dataset', 'geopandas', 'gpd.GeoDataFrame']" = 'geopandas'
) -> 'gpd.GeoDataFrame | xr.Dataset'
Surface properties for watersheds.
Surface properties are calculated using Copernicus's GLO-90 Digital Elevation Model.
By default, the dataset has a geographic coordinate system (EPSG: 4326) and this function expects a projected crs for more accurate results.
The calculated properties are :
- elevation (meters)
- slope (degrees)
- aspect ratio (degrees)
Parameters
----------
gdf : gpd.GeoDataFrame
GeoDataFrame containing watershed polygons. Must have a defined .crs attribute.
unique_id : str, optional
The column name in the GeoDataFrame that serves as a unique identifier.
projected_crs : int | str
The projected coordinate reference system (crs) to utilize for calculations, such as determining watershed area.
If a string is provided, it should be a valid Geodetic CRS for the `gpd.estimate_utm_crs()` method.
If None, the function will use the `gpd.estimate_utm_crs()` default (WGS 84).
Default is an estimated CRS based on NAD83.
operation : str
Aggregation statistics such as `mean` or `sum`.
dataset_date : str
Date (%Y-%m-%d) for which to select the imagery from the dataset. Date must be available.
collection : str
Collection name from the Planetary Computer STAC Catalog. Default is `cop-dem-glo-90`.
output_format : str
One of either `xarray` (or `xr.Dataset`) or `geopandas` (or `gpd.GeoDataFrame`).
Returns
-------
gpd.GeoDataFrame or xr.Dataset
Output dataset containing the surface properties.
Warnings
--------
This function relies on the Microsoft Planetary Computer's STAC Catalog to retrieve the Digital Elevation Model (DEM) data.
[24]:
[24]:
| elevation | slope | aspect | time | spatial_ref | |
|---|---|---|---|---|---|
| geometry | |||||
| 0 | 725.898315 | 5.307655 | 186.393692 | 2021-04-22 | 0 |
| 1 | 222.474442 | 1.356003 | 212.094788 | 2021-04-22 | 0 |
Again, for convenience, we can output the results in xarray.Dataset format :
[25]:
xhgis.surface_properties(gdf, unique_id="HYBAS_ID", output_format="xarray")
[25]:
<xarray.Dataset> Size: 72B
Dimensions: (HYBAS_ID: 2)
Coordinates:
* HYBAS_ID (HYBAS_ID) int64 16B 7120034480 7120365812
time datetime64[ns] 8B 2021-04-22
spatial_ref int64 8B 0
geometry (HYBAS_ID) geometry 16B POLYGON ((336034.51117111975 5194499...
Data variables:
elevation (HYBAS_ID) float32 8B 725.9 222.5
slope (HYBAS_ID) float32 8B 5.308 1.356
aspect (HYBAS_ID) float32 8B 186.4 212.11.2.3. c) Land use classification¶
Finally, we can retrieve land use classifications using xhydro.gis.land_use_classification. This function is powered by the Planetary Computer’s STAC catalog and, by default, uses the 10m Annual Land Use Land Cover (9-class) V2 dataset (“io-lulc-annual-v02”). However, alternative collections can be specified as arguments to the function.
[26]:
Help on function land_use_classification in module xhydro.gis:
land_use_classification(
gdf: 'gpd.GeoDataFrame',
*,
unique_id: 'str | None' = None,
collection='io-lulc-annual-v02',
year: 'str | int' = 'latest',
output_format: "Literal['xarray', 'xr.Dataset', 'geopandas', 'gpd.GeoDataFrame']" = 'geopandas'
) -> 'gpd.GeoDataFrame | xr.Dataset'
Calculate land use classification.
Calculate land use classification for each polygon from a gpd.GeoDataFrame. By default,
the classes are generated from the Planetary Computer's STAC catalog using the
`10m Annual Land Use Land Cover` dataset.
Parameters
----------
gdf : gpd.GeoDataFrame
GeoDataFrame containing watershed polygons. Must have a defined .crs attribute.
unique_id : str
GeoDataFrame containing watershed polygons. Must have a defined .crs attribute.
collection : str
Collection name from the Planetary Computer STAC Catalog.
year : str | int
Land use dataset year between 2017 and 2022.
output_format : str
One of either `xarray` (or `xr.Dataset`) or `geopandas` (or `gpd.GeoDataFrame`).
Returns
-------
gpd.GeoDataFrame or xr.Dataset
Output dataset containing the watershed properties.
Warnings
--------
This function relies on the Microsoft Planetary Computer's STAC Catalog to retrieve the Digital Elevation Model (DEM) data.
[27]:
df = xhgis.land_use_classification(gdf, unique_id="HYBAS_ID", output_format="xarray")
df
[27]:
<xarray.Dataset> Size: 168B
Dimensions: (HYBAS_ID: 2)
Coordinates:
* HYBAS_ID (HYBAS_ID) int64 16B 7120034480 7120365812
spatial_ref int64 8B 0
Data variables:
pct_trees (HYBAS_ID) float64 16B 0.9088 0.5992
pct_rangeland (HYBAS_ID) float64 16B 0.05135 0.08354
pct_water (HYBAS_ID) float64 16B 0.02159 0.002957
pct_built_area (HYBAS_ID) float64 16B 0.01713 0.03685
pct_snow/ice (HYBAS_ID) float64 16B 0.0005296 0.0002937
pct_crops (HYBAS_ID) float64 16B 0.0002972 0.277
pct_flooded_vegetation (HYBAS_ID) float64 16B 0.0001879 0.0
pct_bare_ground (HYBAS_ID) float64 16B 0.0001111 6.814e-05
pct_clouds (HYBAS_ID) float64 16B 0.0 5.926e-05
Attributes:
year: 2023
collection: io-lulc-annual-v02
spatial_resolution: 10.0[28]:
ax = xhgis.land_use_plot(gdf, unique_id="HYBAS_ID", idx=1)
1.2.4. d) Climate indicators¶
The extraction of climate indicators is the most complex step, as it requires accessing a weather dataset, then subsetting and averaging the data over the various watersheds within the GeoDataFrame object. These steps are outside the scope of xHydro, and users will need to rely on other libraries for this task. One approach, outlined in the Use Case Example, involves using a combination of xscen and xESMF. Another approach, demonstrated here, utilizes
xdatasets. Indeed, xdatasets enables the extraction of all the pixels from a gridded dataset within a watershed while accounting for the weighting of each pixel intersecting the watershed.
For this example, we will use ERA5-Land reanalysis data covering the period from 1981 to 2010 as our climatological dataset.
[29]:
datasets = {
"era5_land_reanalysis": {"variables": ["t2m", "tp"]},
}
space = {
"clip": "polygon", # bbox, point or polygon
"averaging": True,
"geometry": gdf,
"unique_id": "HYBAS_ID",
}
time = {
"start": "1981-01-01",
"end": "1982-12-31",
"timezone": "America/Montreal",
"timestep": "D",
"aggregation": {"tp": np.nansum, "t2m": np.nanmean},
}
ds = xdatasets.Query(datasets=datasets, space=space, time=time).data.squeeze()
Temporal operations: processing tp with era5_land_reanalysis: 100%|██████████| 2/2 [00:00<00:00, 34.55it/s]
Because the next steps utilize xclim under the hood, the dataset must be CF-compliant. At a minimum, the xarray.DataArray used should adhere to the following principles:
The dataset must include a
timedimension, typically at a daily frequency, with no missing timesteps (although NaNs are supported). If your data deviates from this format, extra caution should be taken when interpreting the results.If there is a 1D spatial dimension, such as
HYBAS_IDin the example below, it must have an attributecf_rolewith the valuetimeseries_id.The variable must include at least a
unitsattribute. While not mandatory, other attributes such aslong_nameandcell_methodsare expected byxclim, and warnings will be raised if they are missing.Variable names should match those supported by
xclim.
The following code will format the ERA5-Land data obtained from xdatasets to add missing metadata and change the variable names and units:
[30]:
ds = ds.rename({"t2m_nanmean": "tas", "tp_nansum": "pr"})
ds["tas"] = xclim.core.units.convert_units_to(ds["tas"], "degC")
ds["tas"].attrs.update({"cell_methods": "time: mean"})
ds["pr"].attrs.update({"units": "m d-1", "cell_methods": "time: mean within days"})
ds["pr"] = xclim.core.units.convert_units_to(ds["pr"], "mm d-1")
ds
[30]:
<xarray.Dataset> Size: 29kB
Dimensions: (HYBAS_ID: 2, time: 730)
Coordinates:
* HYBAS_ID (HYBAS_ID) int64 16B 7120034480 7120365812
* time (time) datetime64[ns] 6kB 1981-01-01 1981-01-02 ... 1982-12-31
spatial_agg <U7 28B 'polygon'
timestep <U1 4B 'D'
source <U20 80B 'era5_land_reanalysis'
Data variables:
tas (HYBAS_ID, time) float64 12kB -19.64 -15.9 ... -6.387 -5.33
pr (HYBAS_ID, time) float64 12kB 0.01237 3.258 ... 0.07566 0.176
Attributes: (12/30)
GRIB_NV: 0
GRIB_Nx: 1171
GRIB_Ny: 701
GRIB_cfName: unknown
GRIB_cfVarName: t2m
GRIB_dataType: fc
... ...
GRIB_totalNumber: 0
GRIB_typeOfLevel: surface
GRIB_units: K
long_name: 2 metre temperature
standard_name: unknown
units: KClimate indicators can be calculated in various ways, and for most simple tasks, directly using xclim is always a viable option. xclim offers an extensive list of available indicators within a user-friendly and flexible framework.
[31]:
xclim.atmos.tg_max(ds.tas)
[31]:
<xarray.DataArray 'tg_max' (HYBAS_ID: 2, time: 2)> Size: 32B
array([[292.65401005, 295.1619582 ],
[296.13479571, 298.57350323]])
Coordinates:
* HYBAS_ID (HYBAS_ID) int64 16B 7120034480 7120365812
* time (time) datetime64[ns] 16B 1981-01-01 1982-01-01
spatial_agg <U7 28B 'polygon'
timestep <U1 4B 'D'
source <U20 80B 'era5_land_reanalysis'
Attributes:
units: K
units_metadata: temperature: unknown
cell_methods: time: mean time: maximum over days
history: [2026-02-19 14:33:32] tg_max: TG_MAX(tas=tas, freq='YS')...
standard_name: air_temperature
long_name: Maximum daily mean temperature
description: Annual maximum of daily mean temperature.For more complex tasks, or when computing multiple indicators simultaneously, xhydro.indicators.compute_indicators is often the preferred method. This function allows users to build and pass multiple indicators at once, either by providing a list of custom indicators created with xclim.core.indicator.Indicator.from_dict (see the INFO box below) or by referencing a YAML file (see the ``xscen`
documentation <https://xscen.readthedocs.io/en/latest/notebooks/2_getting_started.html#Computing-indicators>`__).
INFO
Custom indicators in xHydro are built by following the YAML formatting required by xclim.
A custom indicator built using xclim.core.indicator.Indicator.from_dict will need these elements:
“data”: A dictionary with the following information:
“base”: The “YAML ID” obtained from here.
“input”: A dictionary linking the default xclim input to the name of your variable. Needed only if it is different. In the link above, they are the string following “Uses:”.
“parameters”: A dictionary containing all other parameters for a given indicator. In the link above, the easiest way to access them is by clicking the link in the top-right corner of the box describing a given indicator.
More entries can be used here, as described in the xclim documentation under “identifier”.
“identifier”: A custom name for your indicator. This will be the name returned in the results.
“module”: Needed, but can be anything. To prevent an accidental overwriting of
xclimindicators, it is best to use something different from: [“atmos”, “land”, “generic”].
[32]:
Help on function compute_indicators in module xscen.indicators:
compute_indicators(
ds: xarray.core.dataset.Dataset,
indicators: str | os.PathLike | collections.abc.Sequence[xclim.core.indicator.Indicator] | collections.abc.Sequence[tuple[str, xclim.core.indicator.Indicator]] | module,
*,
periods: list[str] | list[list[str]] | None = None,
restrict_years: bool = True,
to_level: str | None = 'indicators',
rechunk_input: bool = False
) -> dict
Calculate variables and indicators based on a YAML call to xclim.
The function cuts the output to be the same years as the inputs.
Hence, if an indicator creates a timestep outside the original year range (e.g. the first DJF for QS-DEC),
it will not appear in the output.
Parameters
----------
ds : xr.Dataset
Dataset to use for the indicators.
indicators : str | os.PathLike | Sequence[Indicator] | Sequence[tuple[str, Indicator]] | ModuleType
Path to a YAML file that instructs on how to calculate missing variables.
Can also be only the "stem", if translations and custom indices are implemented.
Can be the indicator module directly, or a sequence of indicators or a sequence of
tuples (indicator name, indicator) as returned by `iter_indicators()`.
periods : list of str or list of lists of str, optional
Either [start, end] or list of [start, end] of continuous periods over which to compute the indicators.
This is needed when the time axis of ds contains some jumps in time.
If None, the dataset will be considered continuous.
restrict_years : bool
If True, cut the time axis to be within the same years as the input.
This is mostly useful for frequencies that do not start in January, such as QS-DEC.
In that instance, `xclim` would start on previous_year-12-01 (DJF), with a NaN.
`restrict_years` will cut that first timestep. This should have no effect on YS and MS indicators.
to_level : str, optional
The processing level to assign to the output.
If None, the processing level of the inputs is preserved.
rechunk_input : bool
If True, the dataset is rechunked with :py:func:`flox.xarray.rechunk_for_blockwise`
according to the resampling frequency of the indicator. Each rechunking is done
once per frequency with :py:func:`xscen.utils.rechunk_for_resample`.
Returns
-------
dict
Dictionary (keys = timedeltas) with indicators separated by temporal resolution.
See Also
--------
xclim.indicators, xclim.core.indicator.build_indicator_module_from_yaml
[33]:
indicators = [
# Maximum summer temperature
xclim.core.indicator.Indicator.from_dict(
data={
"base": "tg_max",
"parameters": {"indexer": {"month": [6, 7, 8]}},
},
identifier="tg_max_summer",
module="example",
),
# Number of days with more than 5 mm of precipitation
xclim.core.indicator.Indicator.from_dict(
data={
"base": "wetdays",
"parameters": {"thresh": "5 mm d-1"},
},
identifier="wet5mm",
module="example",
),
]
xh.indicators.compute_indicators(ds, indicators)["YS-JAN"]
[33]:
<xarray.Dataset> Size: 208B
Dimensions: (HYBAS_ID: 2, time: 2)
Coordinates:
* HYBAS_ID (HYBAS_ID) int64 16B 7120034480 7120365812
* time (time) datetime64[ns] 16B 1981-01-01 1982-01-01
spatial_agg <U7 28B 'polygon'
timestep <U1 4B 'D'
source <U20 80B 'era5_land_reanalysis'
Data variables:
tg_max_summer (HYBAS_ID, time) float64 32B 292.7 295.2 296.1 298.6
wet5mm (HYBAS_ID, time) float64 32B 85.0 83.0 80.0 78.0
Attributes: (12/34)
GRIB_NV: 0
GRIB_Nx: 1171
GRIB_Ny: 701
GRIB_cfName: unknown
GRIB_cfVarName: t2m
GRIB_dataType: fc
... ...
standard_name: unknown
units: K
cat:xrfreq: YS-JAN
cat:frequency: yr
cat:processing_level: indicators
cat:variable: ('tg_max_summer', 'wet5mm')Finally, xhydro.indicators.get_yearly_op, also built upon the xclim library, provides a flexible method for obtaining yearly values of specific statistics, such as annual or seasonal maxima. This can be particularly useful, for instance, when extracting raw data needed for frequency analyses.
[34]:
Help on function get_yearly_op in module xhydro.indicators.generic:
get_yearly_op(
ds,
op,
*,
input_var: str = 'q',
window: int = 1,
timeargs: dict | None = None,
missing: str = 'skip',
missing_options: dict | None = None,
interpolate_na: bool = False
) -> xarray.core.dataset.Dataset
Compute yearly operations on a variable.
Parameters
----------
ds : xr.Dataset
Dataset containing the variable to compute the operation on.
op : str
Operation to compute. One of ["max", "min", "mean", "sum"].
input_var : str
Name of the input variable. Defaults to "q".
window : int
Size of the rolling window. A "mean" operation is performed on the rolling window before the call to xclim.
This parameter cannot be used with the "sum" operation.
timeargs : dict, optional
Dictionary of time arguments for the operation.
Keys are the name of the period that will be added to the results (e.g. "winter", "summer", "annual").
Values are up to two dictionaries, with both being optional.
The first is {'freq': str}, where str is a frequency supported by xarray (e.g. "YS", "YS-JAN", "YS-DEC").
It needs to be a yearly frequency. Defaults to "YS-JAN".
The second is an indexer as supported by :py:func:`xclim.core.calendar.select_time`.
Defaults to {}, which means the whole year.
See :py:func:`xclim.core.calendar.select_time` for more information.
Examples: {"winter": {"freq": "YS-DEC", "date_bounds": ["12-01", "02-28"]}}, {"jan": {"freq": "YS", "month": 1}}, {"annual": {}}.
missing : str
How to handle missing values. One of "skip", "any", "at_least_n", "pct", "wmo".
See :py:func:`xclim.core.missing` for more information.
missing_options : dict, optional
Dictionary of options for the missing values' method. See :py:func:`xclim.core.missing` for more information.
interpolate_na : bool
Whether to interpolate missing values before computing the operation. Only used with the "sum" operation.
Defaults to False.
Returns
-------
xr.Dataset
Dataset containing the computed operations, with one variable per indexer.
The name of the variable follows the pattern `{input_var}{window}_{op}_{indexer}`.
Notes
-----
If you want to perform a frequency analysis on a frequency that is finer than annual, simply use multiple timeargs
(e.g. 1 per month) to create multiple distinct variables.
The timeargs argument relies on indexers that are compatible with xclim.core.calendar.select_time. Four types of indexers are currently accepted:
month, followed by a sequence of month numbers.season, followed by one or more of'DJF','MAM','JJA', and'SON'.doy_bounds, followed by a sequence representing the inclusive bounds of the period to be considered ('start','end').date_bounds, similar todoy_bounds, but using a month-day ('%m-%d') format.
Subsequently, we specify the operations to be calculated for each variable. Supported operations include "max", "min", "mean", and "sum".
[35]:
timeargs = {
"january": {"month": [1]},
"spring": {"date_bounds": ["02-11", "06-19"]},
"summer_fall": {"date_bounds": ["06-20", "11-19"]},
"year": {"date_bounds": ["01-01", "12-31"]},
}
operations = {
"tas": ["max", "mean"],
"pr": ["sum"],
}
The combination of timeargs and operations through the Cartesian product enables the efficient generation of a comprehensive array of climate indicators.
[36]:
ds_climatology = xr.merge(
[
xh.indicators.get_yearly_op(ds, input_var=variable, op=op, timeargs=timeargs)
for (variable, ops) in operations.items()
for op in ops
]
)
ds_climatology
[36]:
<xarray.Dataset> Size: 528B
Dimensions: (HYBAS_ID: 2, time: 2)
Coordinates:
* HYBAS_ID (HYBAS_ID) int64 16B 7120034480 7120365812
* time (time) datetime64[ns] 16B 1981-01-01 1982-01-01
spatial_agg <U7 28B 'polygon'
timestep <U1 4B 'D'
source <U20 80B 'era5_land_reanalysis'
Data variables:
tas_max_january (HYBAS_ID, time) float64 32B -4.165 -7.795 ... -3.176
tas_max_spring (HYBAS_ID, time) float64 32B 17.11 18.26 21.04 20.31
tas_max_summer_fall (HYBAS_ID, time) float64 32B 19.5 22.01 22.98 25.42
tas_max_year (HYBAS_ID, time) float64 32B 19.5 22.01 22.98 25.42
tas_mean_january (HYBAS_ID, time) float64 32B -18.11 -19.82 ... -16.8
tas_mean_spring (HYBAS_ID, time) float64 32B 1.696 -0.8479 5.592 2.777
tas_mean_summer_fall (HYBAS_ID, time) float64 32B 9.367 9.219 12.23 12.57
tas_mean_year (HYBAS_ID, time) float64 32B 1.734 0.5356 4.881 4.094
pr_sum_january (HYBAS_ID, time) float64 32B 34.29 88.93 28.49 85.93
pr_sum_spring (HYBAS_ID, time) float64 32B 591.9 343.5 453.4 333.9
pr_sum_summer_fall (HYBAS_ID, time) float64 32B 603.6 626.1 743.2 535.9
pr_sum_year (HYBAS_ID, time) float64 32B 1.417e+03 ... 1.163e+03
Attributes: (12/35)
GRIB_NV: 0
GRIB_Nx: 1171
GRIB_Ny: 701
GRIB_cfName: unknown
GRIB_cfVarName: t2m
GRIB_dataType: fc
... ...
units: K
cat:xrfreq: YS-JAN
cat:frequency: yr
cat:processing_level: indicators
cat:variable: ('tas_max_january', 'tas_max_sp...
cat:id: The same data can also be visualized as a pd.DataFrame as well :
[37]:
pd.set_option("display.max_rows", 100)
ds_climatology.mean("time").to_dataframe().T
[37]:
| HYBAS_ID | 7120034480 | 7120365812 |
|---|---|---|
| spatial_agg | polygon | polygon |
| timestep | D | D |
| source | era5_land_reanalysis | era5_land_reanalysis |
| tas_max_january | -5.979976 | -1.652047 |
| tas_max_spring | 17.686927 | 20.677712 |
| tas_max_summer_fall | 20.757984 | 24.204149 |
| tas_max_year | 20.757984 | 24.204149 |
| tas_mean_january | -18.967689 | -16.441736 |
| tas_mean_spring | 0.424023 | 4.184447 |
| tas_mean_summer_fall | 9.293055 | 12.396911 |
| tas_mean_year | 1.134716 | 4.487463 |
| pr_sum_january | 61.612786 | 57.209695 |
| pr_sum_spring | 467.706516 | 393.650887 |
| pr_sum_summer_fall | 614.891367 | 639.558307 |
| pr_sum_year | 1351.15288 | 1290.998159 |