1. GIS module

GIS operations are integral to hydrology processes. This page demonstrates how to use xhydro to perform GIS manipulations such as delineating watershed boundaries and extracting physiographic, climatological and geographical variables at the watershed scale.

[17]:
import leafmap
import pandas as pd
import xarray as xr
import xclim
import xdatasets as xd
from IPython.core.display import HTML, display

import xhydro.gis as xhgis
from xhydro.indicators import get_yearly_op

1.1. Watershed delineation

Currently, watershed delineation uses HydroBASINS (hybas_na_lev01-12_v1c) and can work in any location in North America. The process involves assessing all upstream sub-basins from a specified outlet and consolidating them into a unified watershed. The leafmap library is employed for generating interactive maps. This map serves the purpose of selecting outlets or visualizing the resulting watershed boundaries. Although utilizing the map is not essential for conducting the calculations, it proves useful for visualization purposes.

[18]:
m = leafmap.Map(center=(48.63, -74.71), zoom=5, basemap="USGS Hydrography")
m
[18]:

1.1.1. a) From a list of coordinates

In this scenario, we select two pour points, with each one representing the outlet for the watersheds of Lac Saint-Jean and the Ottawa River, respectively.

[19]:
lng_lat = [
    (-69.81971, 48.14720),  # Lac Saint-Jean watershed
    (-74.393438, 45.572442),  # Ottawa river watershed
]

1.1.2. b) From markers on a map

Instead of using a list, a more interactive approach is to directly select outlets from the existing map m. The following image illustrates the process of selecting pour points by dragging markers to the desired locations on the map.

test

The next cell is only useful for the documentation as it simulates a user selecting an outlet from the map m. You should instead remove this code and interact with the map in object m as shown above by positionning markers at sites of interest

[20]:
m.draw_features = [
    {
        "type": "Feature",
        "properties": {},
        "geometry": {"type": "Point", "coordinates": [-73.118597, 46.042467]},
    }
]  # Richelieu watershed

After selecting points using either approach a) or b), or a combination of both, we can initiate the watershed delineation calculation.

[21]:
gdf = xhgis.watershed_delineation(coordinates=lng_lat, map=m)
gdf
[21]:
HYBAS_ID Upstream Area (sq. km). geometry category color
0 7120034330 87595.8 POLYGON ((-74.37864 48.88141, -74.37452 48.886... 3 #41b6c4
1 7120398781 144026.8 POLYGON ((-80.07991 46.77860, -80.08529 46.782... 5 #081d58
2 7120382860 23717.7 POLYGON ((-73.77437 43.36757, -73.77557 43.388... 1 #ffffd9

The outcomes are stored in a GeoPandas gpd.GeoDataFrame (gdf) object, allowing us to save our polygons in various common formats such as an ESRI Shapefile or GeoJSON. If a map m is present, the polygons will automatically be added to it. If you want to visualize the map, simply type m in the code cell to render it. If displaying the map directly is not compatible with your notebook interpreter, you can utilize the following code to extract the HTML from the map and plot it:

[22]:
m.zoom_to_gdf(gdf)

1.1.3. c) From xdatasets

Automatically delineating watershed boundaries is a valuable tool in the toolbox, but users are encouraged to utilize official watershed boundaries if they already exist, instead of creating new ones. This functionality fetches a list of basins from xdatasets supported datasets, and upon request, xdatasets provides a gpd.GeoDataFrame containing the precalculated boundaries for these basins. Currently, the following watershed sources are available as of today.:

Source

Dataset name

DEH

deh_polygons

HYDAT

hydat_polygons

HQ

hq_polygons

[23]:
gdf = xd.Query(
    **{
        "datasets": {
            "deh_polygons": {
                "id": ["031*", "0421*"],
                "regulated": ["Natural"],
            }
        }
    }
).data.reset_index()

gdf
[23]:
Station Superficie geometry
0 031501 21.868620 POLYGON ((-72.47379 46.23340, -72.46888 46.228...
1 031502 15.708960 POLYGON ((-72.50127 46.21216, -72.50086 46.213...
2 042103 579.479614 POLYGON ((-78.49014 46.64514, -78.49010 46.645...

1.2. Extract watershed properties

After obtaining our watershed boundaries, we can extract valuable properties such as geographical information, land use classification and climatological data from the delineated watersheds.

1.2.1. a) Geographical watershed properties

Initially, we extract geographical properties of the watershed, including the perimeter, total area, Gravelius coefficient and basin centroid. It’s important to note that this function returns all the columns present in the provided gpd.GeoDataFrame argument.

[24]:
Station Superficie area perimeter gravelius centroid
0 031501 21.868620 2.186862e+07 27186.996845 1.640007 (-72.48631199105834, 46.22277542928622)
1 031502 15.708960 1.570896e+07 20263.293021 1.442220 (-72.47966677792694, 46.21359517038631)
2 042103 579.479614 5.794796e+08 283765.058390 3.325331 (-78.37036445281987, 46.48287117609677)

For added convenience, we can also retrieve the same results in the form of an xarray.Dataset:

[25]:
xhgis.watershed_properties(
    gdf[["Station", "geometry"]], unique_id="Station", output_format="xarray"
)
[25]:
<xarray.Dataset> Size: 120B
Dimensions:    (Station: 3)
Coordinates:
  * Station    (Station) object 24B '031501' '031502' '042103'
Data variables:
    area       (Station) float64 24B 2.187e+07 1.571e+07 5.795e+08
    perimeter  (Station) float64 24B 2.719e+04 2.026e+04 2.838e+05
    gravelius  (Station) float64 24B 1.64 1.442 3.325
    centroid   (Station) object 24B (-72.48631199105834, 46.22277542928622) ....

1.2.2. b) Land-use classification

Land use classification is powered by the Planetary Computer’s STAC catalog. It uses the 10m Annual Land Use Land Cover dataset by default (“io-lulc-9-class”), but other collections can be specified by using the collection argument.

[26]:
df = xhgis.land_use_classification(gdf, unique_id="Station")
df
[26]:
pct_crops pct_built_area pct_trees pct_rangeland pct_water pct_bare_ground pct_flooded_vegetation pct_snow/ice
Station
031501 0.776151 0.030159 0.191648 0.002042 0.000000 0.000000 0.000000 0.000000e+00
031502 0.755217 0.023085 0.218869 0.002830 0.000000 0.000000 0.000000 0.000000e+00
042103 0.000000 0.000101 0.863602 0.026126 0.109987 0.000021 0.000162 3.780111e-07
[27]:
ax = xhgis.land_use_plot(gdf, unique_id="Station", idx=2)
../_images/notebooks_gis_25_0.png

1.2.3. c) Climate indicators

The step of extracting climatic indicators is the most complex. Indeed, to accomplish this, access to a weather dataset for the various watersheds within our gdf object is required. Fortunately, xdatasets precisely facilitates such operations. Indeed, xdatasets allows extracting from a gridded dataset all the pixels contained within a watershed while respecting the weighting of the watershed intersecting each pixel.Subsequently, the function get_yearly_op, built upon the xclim library, offers impressive flexibility in defining indicators tailored to the user’s needs.

To initiate the process, we employ ERA5-Land reanalysis data spanning the period from 1981 to 2010 as our climatological dataset.

[28]:
datasets = {
    "era5_land_reanalysis": {"variables": ["t2m", "tp", "sd"]},
}
space = {
    "clip": "polygon",  # bbox, point or polygon
    "averaging": True,
    "geometry": gdf,  # 3 polygons
    "unique_id": "Station",
}
time = {
    "start": "1981-01-01",
    "end": "2010-12-31",
    "timezone": "America/Montreal",
}

ds = xd.Query(datasets=datasets, space=space, time=time).data.squeeze()

Because the next few steps use xclim under the hood, the dataset is required to be CF-compliant. At a minimum, the xarray.DataArray used must follow these principles:

  • The dataset needs a time dimension, usually at a daily frequency with no missing timesteps (NaNs are supported). If your data differs from that, you’ll need to be extra careful on the results provided.

  • If there is a spatial dimension, such as “Station” in the example below, it needs an attribute cf_role with timeseries_id as its value.

  • The variable will at the very least need a units attribute, although other attributes such as long_name and cell_methods are also expected by xclim and warnings will be generated if they are missing.

  • While this is not necessary for get_yearly_op, variable names should be one of those supported here for maximum compatibility.

The following code adds the missing attributes :

[29]:
ds = ds.rename({"t2m": "tas", "tp": "pr", "sd": "snd"})
ds["tas"] = xclim.core.units.convert_units_to(
    ds["tas"], "degC"
)  # Convert from degK to degC
ds["tas"].attrs.update({"cell_methods": "time: mean"})

ds["pr"].attrs.update({"units": "m d-1", "cell_methods": "time: mean within days"})
ds["pr"] = xclim.core.units.convert_units_to(
    ds["pr"], "mm d-1"
)  # Convert from m/d to mm/d

ds["snd"].attrs.update({"units": "m", "cell_methods": "time: mean within days"})
ds["snd"] = xclim.core.units.convert_units_to(ds["snd"], "mm")  # Convert from m to mm
ds
[29]:
<xarray.Dataset> Size: 21MB
Dimensions:  (time: 262968, Station: 3)
Coordinates:
  * time     (time) datetime64[ns] 2MB 1981-01-01 ... 2010-12-31T23:00:00
  * Station  (Station) object 24B '031501' '031502' '042103'
    source   <U20 80B 'era5_land_reanalysis'
Data variables:
    tas      (Station, time) float64 6MB -23.29 -23.28 -23.49 ... 2.389 2.339
    pr       (Station, time) float64 6MB 0.0 0.0 0.0 ... 0.003698 0.0006662
    snd      (Station, time) float64 6MB 55.07 55.07 55.07 ... 64.58 64.21 63.84

In the second step, we can define seasons using indexers that are compatible with xclim.core.calendar.select_time. There are currently four accepted types of indexers:

  • month, followed by a sequence of month numbers.

  • season, followed by one or more of ‘DJF’, ‘MAM’, ‘JJA’, and ‘SON’.

  • doy_bounds, followed by a sequence representing the inclusive bounds of the period to be considered ('start', 'end').

  • date_bounds, which is the same as above, but using a month-day ('%m-%d') format.

Following this, we specify the operations we intend to calculate for each variable. The supported operations include "max", "min", "mean", and "sum".

[30]:
timeargs = {
    "01": {"month": [1]},
    "02": {"month": [2]},
    "03": {"month": [3]},
    "04": {"month": [4]},
    "05": {"month": [5]},
    "06": {"month": [6]},
    "07": {"month": [7]},
    "08": {"month": [8]},
    "09": {"month": [9]},
    "10": {"month": [10]},
    "11": {"month": [11]},
    "12": {"month": [12]},
    "spring": {"date_bounds": ["02-11", "06-19"]},
    "summer_fall": {"date_bounds": ["06-20", "11-19"]},
    "year": {"date_bounds": ["01-01", "12-31"]},
}

operations = {
    "tas": ["max", "mean", "min"],
    "pr": ["sum"],
    "snd": ["mean"],
}

The combination of timeargs and operations through the Cartesian product yields a rapid generation of an extensive array of climate indicators.

[31]:
ds_climatology = xr.merge(
    [
        get_yearly_op(ds, input_var=variable, op=op, timeargs=timeargs)
        for (variable, ops) in operations.items()
        for op in ops
    ]
)
ds_climatology
[31]:
<xarray.Dataset> Size: 54kB
Dimensions:               (Station: 3, time: 30)
Coordinates:
  * Station               (Station) object 24B '031501' '031502' '042103'
    source                <U20 80B 'era5_land_reanalysis'
  * time                  (time) datetime64[ns] 240B 1981-01-01 ... 2010-01-01
Data variables: (12/75)
    tas_max_01            (Station, time) float64 720B 5.1 3.616 ... 3.161
    tas_max_02            (Station, time) float64 720B 13.23 1.781 ... 3.544
    tas_max_03            (Station, time) float64 720B 11.61 10.72 ... 12.78
    tas_max_04            (Station, time) float64 720B 19.4 20.66 ... 25.98
    tas_max_05            (Station, time) float64 720B 24.39 28.17 ... 32.24
    tas_max_06            (Station, time) float64 720B 29.32 28.33 ... 25.42
    ...                    ...
    snd_mean_10           (Station, time) float64 720B 0.3611 ... 0.4031
    snd_mean_11           (Station, time) float64 720B 4.29 2.369 ... 7.307
    snd_mean_12           (Station, time) float64 720B 44.07 5.757 ... 52.67
    snd_mean_spring       (Station, time) float64 720B 6.949 92.08 ... 33.31
    snd_mean_summer_fall  (Station, time) float64 720B 0.1345 0.1227 ... 0.7276
    snd_mean_year         (Station, time) float64 720B 14.36 47.45 ... 25.75
Attributes:
    cat:variable:          ('tas_max_01',)
    cat:xrfreq:            YS-JAN
    cat:frequency:         yr
    cat:processing_level:  indicators
    cat:id:                

The same data can also be visualized as a pd.DataFrame as well :

[32]:
pd.set_option("display.max_rows", 100)
ds_climatology.mean("time").to_dataframe().T
[32]:
Station 031501 031502 042103
source era5_land_reanalysis era5_land_reanalysis era5_land_reanalysis
tas_max_01 5.79328 5.83979 3.640586
tas_max_02 5.709544 5.74949 4.044133
tas_max_03 12.451726 12.531728 10.525447
tas_max_04 21.224634 21.254693 19.973493
tas_max_05 26.618743 26.62262 25.819365
tas_max_06 29.915254 29.914234 28.824476
tas_max_07 30.538177 30.541674 29.448428
tas_max_08 29.401676 29.396607 28.537646
tas_max_09 26.409743 26.420074 25.724547
tas_max_10 20.433635 20.457859 19.723571
tas_max_11 14.840299 14.884756 11.896061
tas_max_12 7.110035 7.155206 5.333868
tas_max_spring 29.27284 29.27054 28.397473
tas_max_summer_fall 31.171837 31.172982 30.377644
tas_max_year 31.373761 31.372966 30.683164
tas_mean_01 -11.098958 -11.070966 -12.603846
tas_mean_02 -8.575376 -8.555364 -10.618004
tas_mean_03 -3.011887 -2.994205 -5.155519
tas_mean_04 5.168779 5.193515 2.977892
tas_mean_05 12.849464 12.856895 10.543722
tas_mean_06 18.246934 18.250927 15.93364
tas_mean_07 20.60195 20.605273 18.610954
tas_mean_08 19.556622 19.560292 17.645789
tas_mean_09 15.100446 15.109986 13.367451
tas_mean_10 8.060541 8.074052 6.448969
tas_mean_11 1.330069 1.348995 -0.077759
tas_mean_12 -6.698234 -6.672344 -7.776868
tas_mean_spring 5.031998 5.047103 2.825004
tas_mean_summer_fall 14.466165 14.474516 12.628117
tas_mean_year 6.028106 6.042794 4.177412
tas_min_01 -28.778872 -28.809396 -30.041543
tas_min_02 -25.183904 -25.231202 -26.180241
tas_min_03 -20.468419 -20.494124 -22.264767
tas_min_04 -7.592577 -7.580951 -10.628039
tas_min_05 1.642597 1.65338 -1.659695
tas_min_06 8.113888 8.126952 4.091024
tas_min_07 12.12059 12.114125 9.005369
tas_min_08 10.297272 10.30083 8.655322
tas_min_09 4.89994 4.900005 3.395228
tas_min_10 -1.616498 -1.610984 -2.296447
tas_min_11 -11.544179 -11.551885 -9.818372
tas_min_12 -24.749929 -24.769652 -25.083827
tas_min_spring -24.09364 -24.113668 -25.060736
tas_min_summer_fall -6.582494 -6.571843 -7.22109
tas_min_year -30.032823 -30.080185 -31.024786
pr_sum_01 79.061732 79.090602 63.558415
pr_sum_02 66.888931 66.917058 51.602254
pr_sum_03 76.444733 76.478452 64.438532
pr_sum_04 93.790183 93.890757 71.526317
pr_sum_05 99.64739 99.727621 86.51231
pr_sum_06 110.389847 110.467514 89.062977
pr_sum_07 118.354004 118.343317 96.532123
pr_sum_08 108.421897 108.597886 96.009423
pr_sum_09 105.164435 105.113248 100.346083
pr_sum_10 104.317676 104.318942 96.030414
pr_sum_11 105.9594 105.997777 94.391328
pr_sum_12 92.276419 92.311662 72.925842
pr_sum_spring 376.994477 377.308115 310.340259
pr_sum_summer_fall 552.196939 552.293122 488.178244
pr_sum_year 1160.716646 1161.254838 982.936018
snd_mean_01 67.3871 66.592959 79.101277
snd_mean_02 95.084661 93.688919 113.425968
snd_mean_03 95.804241 93.601279 130.332078
snd_mean_04 20.813106 19.620342 59.455446
snd_mean_05 0.02517 0.021946 0.88963
snd_mean_06 -0.0 -0.0 0.003253
snd_mean_07 -0.0 -0.0 -0.0
snd_mean_08 -0.0 -0.0 -0.0
snd_mean_09 0.000001 0.000001 0.001507
snd_mean_10 0.218364 0.213499 0.825938
snd_mean_11 4.655122 4.610496 8.643051
snd_mean_12 30.770399 30.495989 39.301119
snd_mean_spring 41.788584 40.769412 61.98516
snd_mean_summer_fall 0.319896 0.315362 0.928477
snd_mean_year 25.923512 25.435699 35.612642