4. Frequency analysis module

[1]:
# Basic imports
import hvplot.xarray
import numpy as np
import xarray as xr
import xdatasets as xd

import xhydro as xh
import xhydro.frequency_analysis as xhfa

4.1. Extracting and preparing the data

For this example, we'll conduct a frequency analysis using historical time series from various sites. We begin by obtaining a dataset comprising hydrological information. Here, we use the xdatasets library to acquire hydrological data from the Ministère de l'Environnement, de la Lutte contre les changements climatiques, de la Faune et des Parcs in Québec, Canada. Specifically, our query targets stations whose IDs begin with 020, that have a natural (unregulated) flow regime, and is limited to streamflow data.

Users may prefer to generate their own xarray.DataArray using their individual dataset. At a minimum, the xarray.DataArray used for frequency analysis must follow these principles (a minimal compliant example is sketched after this list):

  • The dataset needs a time dimension.

  • If there is a spatial dimension, such as id in the example below, it needs an attribute cf_role with timeseries_id as its value.

  • The variable will, at the very least, need a units attribute. Other attributes such as long_name and cell_methods are also expected by xclim (which is called at various points during the frequency analysis), and warnings will be generated if they are missing.
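
As a minimal sketch (with hypothetical station IDs and random data), a compliant dataset could be built from scratch as follows:

# A minimal, hypothetical dataset that satisfies the requirements above.
import numpy as np
import pandas as pd
import xarray as xr

time = pd.date_range("2000-01-01", "2010-12-31", freq="D")
ids = ["A123", "B456"]  # hypothetical station IDs

ds_min = xr.Dataset(
    {"streamflow": (("id", "time"), np.random.rand(len(ids), len(time)))},
    coords={"id": ids, "time": time},
)
# Required: tag the spatial dimension as the timeseries identifier.
ds_min["id"].attrs["cf_role"] = "timeseries_id"
# Required: units. Recommended: long_name, cell_methods, etc.
ds_min["streamflow"].attrs = {
    "units": "m3 s-1",
    "long_name": "Streamflow",
    "cell_methods": "time: mean",
}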

[2]:
ds = (
    xd.Query(
        **{
            "datasets": {
                "deh": {
                    "id": ["020*"],
                    "regulated": ["Natural"],
                    "variables": ["streamflow"],
                }
            },
            "time": {"start": "1970-01-01", "minimum_duration": (15 * 365, "d")},
        }
    )
    .data.squeeze()
    .load()
)

# This dataset lacks some of the aforementioned attributes, so we need to add them.
ds["id"].attrs["cf_role"] = "timeseries_id"
ds["streamflow"].attrs = {
    "long_name": "Streamflow",
    "units": "m3 s-1",
    "standard_name": "water_volume_transport_in_river_channel",
    "cell_methods": "time: mean",
}

ds
[2]:
<xarray.Dataset> Size: 574kB
Dimensions:        (id: 5, time: 20454)
Coordinates: (12/15)
    drainage_area  (id) float32 20B 1.09e+03 647.0 59.8 626.0 1.2e+03
    end_date       (id) datetime64[ns] 40B 2006-10-13 2024-02-27 ... 1996-08-13
  * id             (id) object 40B '020302' '020404' '020502' '020602' '020802'
    latitude       (id) float32 20B 48.77 48.81 48.98 48.98 49.2
    longitude      (id) float32 20B -64.52 -64.92 -64.43 -64.7 -65.29
    name           (id) object 40B 'Saint' 'York' ... 'Dartmouth' 'Madeleine'
    ...             ...
    spatial_agg    <U9 36B 'watershed'
    start_date     (id) datetime64[ns] 40B 1989-08-12 1980-10-01 ... 1970-01-01
  * time           (time) datetime64[ns] 164kB 1970-01-01 ... 2025-12-31
    time_agg       <U4 16B 'mean'
    timestep       <U1 4B 'D'
    variable       <U10 40B 'streamflow'
Data variables:
    streamflow     (id, time) float32 409kB nan nan nan nan ... nan nan nan nan
[3]:
ds.streamflow.dropna("time", how="all").hvplot(
    x="time", grid=True, widget_location="bottom", groupby="id"
)
[3]:
(Interactive plot: daily streamflow time series, one panel per station id.)

4.2. Customizing the analysis settings

4.2.1. a) Defining seasons

We can define seasons using indexers that are compatible with xclim.core.calendar.select_time. There are currently four accepted types of indexers:

  • month, followed by a sequence of month numbers.

  • season, followed by one or more of 'DJF', 'MAM', 'JJA', and 'SON'.

  • doy_bounds, followed by a sequence representing the inclusive bounds of the period to be considered ("start", "end").

  • date_bounds, which is the same as above, but using a month-day ('%m-%d') format.

For the purpose of getting block maxima through xhydro.indicators.get_yearly_op, the indexers need to be grouped within a dictionary, with the key being the label to be given to the requested period of the year. A second entry, freq, can be used to specify the resampling frequency, for example to correctly wrap around the calendar year for winter.

[4]:
# Some examples
timeargs = {
    "spring": {"date_bounds": ["02-11", "06-19"]},
    "summer": {"doy_bounds": [152, 243]},
    "fall": {"month": [9, 10, 11]},
    "winter": {
        "season": ["DJF"],
        "freq": "YS-DEC",
    },  # To correctly wrap around the year, we need to specify the resampling frequency.
    "august": {"month": [8]},
    "annual": {},
}
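
To preview what a given indexer selects on its own, it can be passed directly to xclim.core.calendar.select_time. A quick sketch using the spring bounds above (with the default drop=False, values outside the period are masked with NaN):

from xclim.core.calendar import select_time

# Keep only the spring window defined above; other dates become NaN.
spring = select_time(ds.streamflow, date_bounds=("02-11", "06-19"))
spring.sel(time="1980")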

4.2.2. b) Getting block maxima

Once the desired seasons have been defined, we can extract block maxima time series for every station using xhydro.indicators.get_yearly_op. The main arguments are:

  • op: the operation to compute. One of "max", "min", "mean", or "sum".

  • input_var: the name of the variable. Defaults to "streamflow".

  • window: the size of the rolling window. A "mean" is performed on the rolling window prior to the op operation.

  • timeargs: as defined previously. Leave as None to get the annual maxima.

  • missing and missing_options: to define tolerances for missing data. See the xclim documentation on missing values for more information.

  • interpolate_na: whether to interpolate missing data prior to the op operation. Only used for sum.

The function returns an xarray.Dataset with one variable per indexer, named following the pattern {input_var}_{op}_{label} (e.g. streamflow_max_spring).

[5]:
# Here, we mask out years with more than 15% of missing data.
ds_4fa = xh.indicators.get_yearly_op(
    ds, op="max", timeargs=timeargs, missing="pct", missing_options={"tolerance": 0.15}
)

ds_4fa
[5]:
<xarray.Dataset> Size: 8kB
Dimensions:                (id: 5, time: 56)
Coordinates: (12/15)
    drainage_area          (id) float32 20B 1.09e+03 647.0 59.8 626.0 1.2e+03
    end_date               (id) datetime64[ns] 40B 2006-10-13 ... 1996-08-13
  * id                     (id) object 40B '020302' '020404' ... '020802'
    latitude               (id) float32 20B 48.77 48.81 48.98 48.98 49.2
    longitude              (id) float32 20B -64.52 -64.92 -64.43 -64.7 -65.29
    name                   (id) object 40B 'Saint' 'York' ... 'Madeleine'
    ...                     ...
    spatial_agg            <U9 36B 'watershed'
    start_date             (id) datetime64[ns] 40B 1989-08-12 ... 1970-01-01
    time_agg               <U4 16B 'mean'
    timestep               <U1 4B 'D'
    variable               <U10 40B 'streamflow'
  * time                   (time) datetime64[ns] 448B 1970-01-01 ... 2025-01-01
Data variables:
    streamflow_max_spring  (id, time) float32 1kB nan nan nan ... nan nan nan
    streamflow_max_summer  (id, time) float32 1kB nan nan nan ... nan nan nan
    streamflow_max_fall    (id, time) float32 1kB nan nan nan ... nan nan nan
    streamflow_max_august  (id, time) float32 1kB nan nan nan ... nan nan nan
    streamflow_max_annual  (id, time) float32 1kB nan nan nan ... nan nan nan
    streamflow_max_winter  (id, time) float32 1kB nan nan nan ... nan nan nan
Attributes:
    cat:frequency:         yr
    cat:processing_level:  indicators
    cat:id:                
[6]:
ds_4fa.streamflow_max_summer.dropna("time", how="all").hvplot(
    x="time", grid=True, widget_location="bottom", groupby="id"
)
[6]:
(Interactive plot: yearly summer maxima, streamflow_max_summer, per station id.)

4.2.3. c) Using custom seasons per year or per station

Using individualized date ranges for each year or each catchment is not explicitly supported, so users should instead mask their data prior to calling get_yearly_op. Note that when doing this, missing should be adjusted accordingly.

[7]:
# Create a mask beforehand

nyears = np.unique(ds.time.dt.year).size
dom_start = xr.DataArray(
    np.random.randint(1, 30, size=(nyears,)).astype("str"),
    dims=("year"),
    coords={"year": np.unique(ds.time.dt.year)},
)
dom_end = xr.DataArray(
    np.random.randint(1, 30, size=(nyears,)).astype("str"),
    dims=("year"),
    coords={"year": np.unique(ds.time.dt.year)},
)

mask = xr.zeros_like(ds["streamflow"])
for y in dom_start.year.values:
    # Random mask of dates per year, between April and June.
    mask.loc[
        {
            "time": slice(
                str(y) + "-04-" + str(dom_start.sel(year=y).item()),
                str(y) + "-06-" + str(dom_end.sel(year=y).item()),
            )
        }
    ] = 1
[8]:
mask.hvplot(x="time", grid=True, widget_location="bottom", groupby="id")
[8]:
(Interactive plot: the random yearly mask per station id.)
[9]:
# The name of the indexer will be used to identify the variable created here
timeargs_custom = {"custom": {}}

# We use where() to mask the data that we want to ignore
masked = ds.where(mask == 1)
# Since we masked almost all of the year, our tolerance for missing data should be changed accordingly
missing = "at_least_n"
missing_options = {"n": 45}

# We use xr.merge() to combine the results with the previous dataset.
ds_4fa = xr.merge(
    [
        ds_4fa,
        xh.indicators.get_yearly_op(
            masked,
            op="max",
            timeargs=timeargs_custom,
            missing=missing,
            missing_options=missing_options,
        ),
    ]
)
[10]:
ds_4fa.streamflow_max_custom.dropna("time", how="all").hvplot(
    x="time", grid=True, widget_location="bottom", groupby="id"
)
[10]:
(Interactive plot: yearly maxima over the custom masked periods, streamflow_max_custom, per station id.)

4.2.4. d) Computing volumes

The frequency analysis can also be performed on volumes, using a similar workflow. The main difference is that if we're starting from streamflow, we first need to convert it into volumes using xhydro.indicators.compute_volume (i.e., integrating the flow over each time step). Also, if required, get_yearly_op has an argument interpolate_na that can be used to interpolate missing data prior to the sum.

[11]:
# Get a daily volume from a daily streamflow
ds["volume"] = xh.indicators.compute_volume(ds["streamflow"], out_units="hm3")

# We'll take slightly different indexers
timeargs_vol = {"spring": {"date_bounds": ["04-30", "06-15"]}, "annual": {}}

# The operation that we want here is the sum, not the max.
ds_4fa = xr.merge(
    [
        ds_4fa,
        xh.indicators.get_yearly_op(
            ds,
            op="sum",
            input_var="volume",
            timeargs=timeargs_vol,
            missing="pct",
            missing_options={"tolerance": 0.15},
            interpolate_na=True,
        ),
    ]
)
[12]:
ds_4fa.volume_sum_spring.dropna("time", how="all").hvplot(
    x="time", grid=True, widget_location="bottom", groupby="id"
)
[12]:
(Interactive plot: spring volumes, volume_sum_spring, per station id.)

4.3. Local frequency analysis

Once we have our yearly maxima (or volumes/minima), the first step in a local frequency analysis is to call xhfa.local.fit to obtain distribution parameters. The options are:

  • distributions: a list of SciPy distribution names. Defaults to: ["expon", "gamma", "genextreme", "genpareto", "gumbel_r", "pearson3", "weibull_min"]. A sketch restricting this list is shown right after these options.

  • min_years: the minimum number of years required to fit the data.

  • method: the fitting method. Defaults to maximum likelihood.
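
For instance, a minimal sketch restricting the fit to two candidate distributions (the call otherwise mirrors the cell below):

# A sketch: fit only two candidate distributions instead of the default seven.
params_subset = xhfa.local.fit(
    ds_4fa[["streamflow_max_spring"]],
    distributions=["gumbel_r", "pearson3"],
    min_years=15,
)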

[13]:
# To speed up the Notebook, we'll only perform the analysis on a subset of variables
params = xhfa.local.fit(
    ds_4fa[["streamflow_max_spring", "volume_sum_spring"]], min_years=15
)

params
[13]:
<xarray.Dataset> Size: 4kB
Dimensions:                (id: 5, dparams: 5, scipy_dist: 7)
Coordinates: (12/16)
  * id                     (id) object 40B '020302' '020404' ... '020802'
  * dparams                (dparams) <U5 100B 'a' 'c' 'skew' 'loc' 'scale'
  * scipy_dist             (scipy_dist) <U11 308B 'expon' ... 'weibull_min'
    drainage_area          (id) float32 20B 1.09e+03 647.0 59.8 626.0 1.2e+03
    end_date               (id) datetime64[ns] 40B 2006-10-13 ... 1996-08-13
    latitude               (id) float32 20B 48.77 48.81 48.98 48.98 49.2
    ...                     ...
    source                 <U102 408B 'Ministère de l’Environnement, de la Lu...
    spatial_agg            <U9 36B 'watershed'
    start_date             (id) datetime64[ns] 40B 1989-08-12 ... 1970-01-01
    time_agg               <U4 16B 'mean'
    timestep               <U1 4B 'D'
    variable               <U10 40B 'streamflow'
Data variables:
    streamflow_max_spring  (scipy_dist, id, dparams) float64 1kB dask.array<chunksize=(1, 5, 5), meta=np.ndarray>
    volume_sum_spring      (scipy_dist, id, dparams) float64 1kB dask.array<chunksize=(1, 5, 5), meta=np.ndarray>
Attributes:
    cat:frequency:         yr
    cat:processing_level:  indicators
    cat:id:                

Information criteria such as the AIC, BIC, and AICC are useful for determining which statistical distribution is best suited to a given location. These three criteria can be computed using xhfa.local.criteria.

[14]:
criteria = xhfa.local.criteria(
    ds_4fa[["streamflow_max_spring", "volume_sum_spring"]], params
)

criteria
[14]:
<xarray.Dataset> Size: 3kB
Dimensions:                (id: 5, scipy_dist: 7, criterion: 3)
Coordinates: (12/16)
    drainage_area          (id) float32 20B 1.09e+03 647.0 59.8 626.0 1.2e+03
    end_date               (id) datetime64[ns] 40B 2006-10-13 ... 1996-08-13
  * id                     (id) object 40B '020302' '020404' ... '020802'
    latitude               (id) float32 20B 48.77 48.81 48.98 48.98 49.2
    longitude              (id) float32 20B -64.52 -64.92 -64.43 -64.7 -65.29
    name                   (id) object 40B 'Saint' 'York' ... 'Madeleine'
    ...                     ...
    start_date             (id) datetime64[ns] 40B 1989-08-12 ... 1970-01-01
    time_agg               <U4 16B 'mean'
    timestep               <U1 4B 'D'
    variable               <U10 40B 'streamflow'
  * scipy_dist             (scipy_dist) <U11 308B 'expon' ... 'weibull_min'
  * criterion              (criterion) <U4 48B 'aic' 'bic' 'aicc'
Data variables:
    streamflow_max_spring  (scipy_dist, id, criterion) float64 840B dask.array<chunksize=(1, 5, 3), meta=np.ndarray>
    volume_sum_spring      (scipy_dist, id, criterion) float64 840B dask.array<chunksize=(1, 5, 3), meta=np.ndarray>
Attributes:
    cat:frequency:         yr
    cat:processing_level:  indicators
    cat:id:                
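
As a sketch, the distribution that minimizes a given criterion (here the AICC; lower is better) can be extracted per station with plain xarray:

# Find the distribution with the lowest AICC for each station.
best_dist = (
    criteria["streamflow_max_spring"].sel(criterion="aicc").idxmin(dim="scipy_dist")
)
best_dist.load()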

Finally, return periods can be obtained using xhfa.local.parametric_quantiles. The options are:

  • t: the return period(s) in years.

  • mode: whether the return period corresponds to a probability of exceedance ("max", e.g. for flood peaks) or non-exceedance ("min", e.g. for low flows). Defaults to "max".
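
Note that a return period T is converted internally into a quantile: for mode="max", the non-exceedance probability is p = 1 - 1/T (e.g. T = 20 gives p = 0.95 and T = 100 gives p = 0.99, as shown by the p_quantile coordinate in the output below), while for mode="min" it is p = 1/T.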

[15]:
rp = xhfa.local.parametric_quantiles(params, t=[20, 100])

rp.load()
[15]:
<xarray.Dataset> Size: 2kB
Dimensions:                (id: 5, scipy_dist: 7, return_period: 2)
Coordinates: (12/17)
  * id                     (id) object 40B '020302' '020404' ... '020802'
  * scipy_dist             (scipy_dist) <U11 308B 'expon' ... 'weibull_min'
    drainage_area          (id) float32 20B 1.09e+03 647.0 59.8 626.0 1.2e+03
    end_date               (id) datetime64[ns] 40B 2006-10-13 ... 1996-08-13
    latitude               (id) float32 20B 48.77 48.81 48.98 48.98 49.2
    longitude              (id) float32 20B -64.52 -64.92 -64.43 -64.7 -65.29
    ...                     ...
    start_date             (id) datetime64[ns] 40B 1989-08-12 ... 1970-01-01
    time_agg               <U4 16B 'mean'
    timestep               <U1 4B 'D'
    variable               <U10 40B 'streamflow'
  * return_period          (return_period) int64 16B 20 100
    p_quantile             (return_period) float64 16B 0.95 0.99
Data variables:
    streamflow_max_spring  (scipy_dist, return_period, id) float64 560B nan ....
    volume_sum_spring      (scipy_dist, return_period, id) float64 560B 751.2...
Attributes:
    cat:frequency:         yr
    cat:processing_level:  indicators
    cat:id:                

In a future release, plotting will be handled by a proper function. For now, we’ll show an example in this Notebook using preliminary utilities.

xhfa.local._prepare_plots generates the data points required to plot the results of the frequency analysis. If log=True, it returns log-spaced x values between xmin and xmax.

[16]:
data = xhfa.local._prepare_plots(params, xmin=1, xmax=1000, npoints=50, log=True)
data.load()
[16]:
<xarray.Dataset> Size: 30kB
Dimensions:                (id: 5, scipy_dist: 7, return_period: 50)
Coordinates: (12/17)
  * id                     (id) object 40B '020302' '020404' ... '020802'
  * scipy_dist             (scipy_dist) <U11 308B 'expon' ... 'weibull_min'
    drainage_area          (id) float32 20B 1.09e+03 647.0 59.8 626.0 1.2e+03
    end_date               (id) datetime64[ns] 40B 2006-10-13 ... 1996-08-13
    latitude               (id) float32 20B 48.77 48.81 48.98 48.98 49.2
    longitude              (id) float32 20B -64.52 -64.92 -64.43 -64.7 -65.29
    ...                     ...
    start_date             (id) datetime64[ns] 40B 1989-08-12 ... 1970-01-01
    time_agg               <U4 16B 'mean'
    timestep               <U1 4B 'D'
    variable               <U10 40B 'streamflow'
  * return_period          (return_period) float64 400B 1.0 1.151 ... 1e+03
    p_quantile             (return_period) float64 400B 0.0 0.1315 ... 0.999
Data variables:
    streamflow_max_spring  (scipy_dist, return_period, id) float64 14kB nan ....
    volume_sum_spring      (scipy_dist, return_period, id) float64 14kB 130.9...
Attributes:
    cat:frequency:         yr
    cat:processing_level:  indicators
    cat:id:                

xhfa.local._get_plotting_positions allows you to get plotting positions for all variables in the dataset. It accepts alpha and beta arguments; see the SciPy documentation (e.g. scipy.stats.mstats.plotting_positions) for typical values. By default, (0.4, 0.4) is used, which corresponds to the approximately quantile-unbiased Cunnane positions, i.e. p = (i - alpha) / (n + 1 - alpha - beta) for the i-th smallest of n values. The positions are returned as return periods, which allows overlaying them with the fitted curves in the final plot below.

[17]:
pp = xhfa.local._get_plotting_positions(ds_4fa[["streamflow_max_spring"]])
pp
[17]:
<xarray.Dataset> Size: 5kB
Dimensions:                   (id: 5, time: 56)
Coordinates: (12/15)
    drainage_area             (id) float32 20B 1.09e+03 647.0 59.8 626.0 1.2e+03
    end_date                  (id) datetime64[ns] 40B 2006-10-13 ... 1996-08-13
  * id                        (id) object 40B '020302' '020404' ... '020802'
    latitude                  (id) float32 20B 48.77 48.81 48.98 48.98 49.2
    longitude                 (id) float32 20B -64.52 -64.92 -64.43 -64.7 -65.29
    name                      (id) object 40B 'Saint' 'York' ... 'Madeleine'
    ...                        ...
    spatial_agg               <U9 36B 'watershed'
    start_date                (id) datetime64[ns] 40B 1989-08-12 ... 1970-01-01
    time_agg                  <U4 16B 'mean'
    timestep                  <U1 4B 'D'
    variable                  <U10 40B 'streamflow'
  * time                      (time) datetime64[ns] 448B 1970-01-01 ... 2025-...
Data variables:
    streamflow_max_spring_pp  (id, time) float64 2kB nan nan nan ... nan nan nan
    streamflow_max_spring     (id, time) float32 1kB nan nan nan ... nan nan nan
[18]:
# Let's plot the fitted distributions
p1 = data.streamflow_max_spring.hvplot(
    x="return_period", by="scipy_dist", grid=True, groupby=["id"], logx=True
)
[19]:
# Let's now plot the observations at their plotting positions
p2 = pp.hvplot.scatter(
    x="streamflow_max_spring_pp",
    y="streamflow_max_spring",
    grid=True,
    groupby=["id"],
    logx=True,
)
[20]:
# And now combining the plots
p1 * p2
[20]:
(Interactive plot: fitted distribution curves overlaid with the observations at their plotting positions.)