Regionalization of model parameters

Here we call the Regionalization WPS service to provide estimated streamflow (best estimate and ensemble) at an ungauged site using three pre-calibrated hydrological models and a large hydrometeorological database with catchment attributes (Extended CANOPEX). Multiple regionalization strategies are allowed.

import datetime as dt

from matplotlib import pyplot as plt

from ravenpy.config import commands as rc
from ravenpy.config.emulators import GR4JCN
from ravenpy.utilities.regionalization import (
    read_gauged_params,
    read_gauged_properties,
    regionalize,
)
from ravenpy.utilities.testdata import get_file

We can first start by setting up our model. This model will be setup on our ungauged basin, for which we want to generate streamflow. We still need to provide meteorological forcings and other descriptors (HRUs), however we do not provide a parameter set. This will be done by regionalization later.

# Get the forcing dataset for the ungauged watershed
ts = get_file("notebook_inputs/ERA5_weather_data_Salmon.nc")

# Get HRUs of ungauged watershed
hru = dict(
    area=4250.6,
    elevation=843.0,
    latitude=54.4848,
    longitude=-123.3659,
    hru_type="land",
)

# Set alternative names for netCDF variables
alt_names = {
    "TEMP_MIN": "tmin",
    "TEMP_MAX": "tmax",
    "PRECIP": "pr",
}

# Data types to extract from netCDF
data_type = ["TEMP_MAX", "TEMP_MIN", "PRECIP"]
data_kwds = {
    "ALL": {
        "elevation": hru["elevation"],
        "Latitude": hru["latitude"],
        "Longitude": hru["longitude"],
    },
}

# Model configuration for the ungauged watershed. Notice we are not providing parameters, because,
# by definition, we do not have the optimal parameters for an ungauged basin.
# Also note that, for now, only the GR4JCN, HMETS and MOHYSE models are supported, as they are the only ones
# for which we have a pre-computed database of parameters to use to estimate relationships between descriptors
# and model parameters.
model_config = GR4JCN(
    Gauge=[
        rc.Gauge.from_nc(
            ts, data_type=data_type, alt_names=alt_names, data_kwds=data_kwds
        )
    ],
    HRUs=[hru],
    StartDate=dt.datetime(1990, 1, 1),
    EndDate=dt.datetime(2010, 12, 31),
    RunName="regionalization",
)

We can now start working on the regionalization method and the required information.

# We need to provide the name of the model structure we are using. Can be "GR4JCN", "HMETS" or "MOHYSE"
model_structure = "GR4JCN"

# Read the table of model parameters and calibrated NSE values for all the basins in the donors dataset
nash, params = read_gauged_params(model_structure)

# Which variables do we want to use to estimate the parameter relationships?
# Possible values and their description are provided here:
"""
latitude  (catchment centroid latitude, degrees)
longitude (catchment centroid longitude, degrees)
area      (drainage area, km²)
gravelius (Gravelius index)
perimeter (catchment perimeter, m)
elevation (mean catchment elevation, m)
slope     (mean catchment slope, %)
aspect    (catchment orientation vs. North, degrees)
forest    (Land-use percentage as forest (%))
grass     (Land-use percentage as grass (%))
wetland   (Land-use percentage as wetlands (%))
urban     (Land-use percentage as urban areas (%))
shrubs    (Land-use percentage as shrubs (%))
crops     (Land-use percentage as crops (%))
snowIce   (Land-use percentage as permanent snow/ice (%))
"""
variables = ["latitude", "longitude", "area", "forest"]

# Read the desired properties from the donors table
props = read_gauged_properties(variables)

# Provide the values for the desired variables for the ungauged basin (used to estimate relationships)
ungauged_props = {
    "latitude": 40.4848,
    "longitude": -103.3659,
    "area": 4250.6,
    "forest": 0.4,
}

# Choice of the regionalization method. You can choose between the following methods (with their description):
"""
SP   (Spatial Proximity: Uses the latitude and longitude only by default, returns the nearest donors)
PS   (Physical Similarity: Finds the most similar donor catchments according to your desired variables)
MLR  (Multiple Linear Regression: Build a linear regression between the desired variables and the model
     parameters from the donor database. Then estimate parameters from the linear regression using
     the ungauged basin's properties.)
SP-IDW (Spatial Proximity but average the results of multiple donors using the inverse distance weighting
        based on distance)
PS-IDW (Physical Similarity but average the results of multiple donors using the inverse distance weighting
        of degree of similarity)
SP-IDW-RA (SP-IDW while adding regression-based parameters to the donor parameter dataset
          [Arsenault and Brissette, 2014])
PS-IDW-RA (PS-IDW while adding regression-based parameters to the donor parameter dataset
          [Arsenault and Brissette, 2014])
---
Arsenault, R., and Brissette, F. P. (2014), Continuous streamflow prediction in ungauged basins:
The effects of equifinality and parameter set selection on uncertainty in regionalization approaches,
Water Resour. Res., 50, 6135–6153, doi:10.1002/2013WR014898.
"""
regionalization_method = "SP-IDW-RA"

# Here we provide a threshold to exclude donor catchments. Basically, any donors whose calibration NSE is lower
# than this threshold is considered unreliable and is removed from the database prior to processing. 0.6-0.7 are
# generally well-accepted values in the literature. The higher the threshold, the fewer donors remain so an
# equilibrium must be found.
minimum_donor_NSE = 0.7

# Finally, we can choose how many donors we want to use. The value is only used for SP- and PS-based methods.
# The hydrographs generated by running the model using the parameters of multiple donors are averaged (either
# using a simple mean, or using IDW if we used the IDW tag) which results in generally better hydrographs than
# any of the single hydrographs.
number_donors = 5

# Launch the regionalization method and get
#  - hydrograph: the mean hydrograph, and
#  - ensemble_hydrograph: the hydrographs of each of the individual donors before averaging
hydrograph, ensemble_hydrograph = regionalize(
    config=model_config,
    method=regionalization_method,
    nash=nash,
    params=params,
    props=props,
    target_props=ungauged_props,
    min_NSE=minimum_donor_NSE,
    size=number_donors,
)

The hydrograph and ensemble outputs are netCDF files storing the time series. These files are opened by default using xarray, which provides convenient and powerful time series analysis and plotting tools.

display(hydrograph)

display(ensemble_hydrograph)

qq = ensemble_hydrograph.q_sim[0, :, 0]

ensemble_hydrograph.q_sim[:, :, 0].plot.line("b", x="time", add_legend=False)
ensemble_hydrograph.q_sim[1, :, 0].plot.line(
    "b", x="time", label="Regionalized hydrographs"
)
qq.plot.line("r", x="time", label="observations")
plt.legend(loc="upper right")
plt.show()

print("Max: ", hydrograph.max())
print("Mean: ", hydrograph.mean())
print("Monthly means: ", hydrograph.groupby("time.month").mean(dim="time"))