04 - Emulating hydrological models

Using Ravenpy to emulate an existing hydrological model

In this notebook, we will demonstrate the versatility of the Raven modelling framework to emulate one of eight hydrological models that are currently supported. We will walk through the different configuration parameters required to build the model and simulate streamflow on a catchments. We will also show how to import files from a pre-configured Raven configuration that users can inport into Ravenpy instead of using one of the default emulators.

A note on datasets

There are numerous ways to run a Raven model and to pass its required input data. For this introduction to RavenPy, we will use our ERA5 data we generated in the previous notebook and we will configure the Raven model instance on the fly! In the next tutorials, we will see how users can import and use their own datasets to make the entire process flexible and tailored to the user needs.

Using templated model emulators

The first thing we need to run the raven model is… a Raven model! Raven is not a model per se, but a modelling framework that can be used to build hydrological models from their underlying components. For now, PAVICS-Hydro allows building a set of pre-determined models. The Python wrapper offers at present eight model emulators: GR4J-CN, HMETS, MOHYSE, HBV-EC, Canadian Shield, HYPR, Sacramento and Blended. For each of these, templated configuration files are available to facilitate launching the model with options passed by Python at run-time.

In the next cell, we are going to import the possible models, and later, we will configure and run the GR4J-CN model. Please see the documentation for more details on the mandatory vs optional parameters, and what they represent. A small glimpse is provided here.

# Import the list of possible model templates.
from ravenpy.config.emulators import (
    GR4JCN,
    HBVEC,
    HMETS,
    HYPR,
    SACSMA,
    Blended,
    CanadianShield,
    Mohyse,
)

import datetime as dt

# Import other required packages:
import tempfile
from pathlib import Path

from ravenpy.config import commands as rc
from ravenpy.utilities.testdata import get_file

In this next step, we will define the hydrological response unit (HRU). For lumped models, there is only one unit so the following structure should be good. However, for distributed modelling, there will be more than one HRU, so we would use another tool to help us build the HRUs in that case. The HRU provides information on the area, elevation, and location of the catchment.

For now, let’s provide the basin properties such that Raven can run. These are the minimal values that must always be provided, but some models might require other inputs. Please see the Raven documentation for more information.

# Define the hydrological response unit. We can use the information from the tutorial notebook #02! Here we are using
# arbitrary data for a test catchment.
hru = dict(
    area=4250.6,
    elevation=843.0,
    latitude=54.4848,
    longitude=-123.3659,
    hru_type="land",
)

The next required inputs are the start and end dates for the simulation. The start_date and end_date arguments indicate when a simulation should start and end. As long as the forcing data covers the simulation period, it should work. If these parameters are not defined, then start and end dates default to the start and end of the driving data.

To keep things simple, we will use a short 5-year period. Note that the dates are python datetime.datetime objects.

start_date = dt.datetime(1985, 1, 1)
end_date = dt.datetime(1990, 1, 1)

We are now ready to build our first Raven-based hydrological model. the model will be the GR4JCN model. The following code block will show and describe every step. However, more control options are available for users. Please see the documentation for a more detailed explanation on model options.

# Import required packages. We already imported the GR4JCN emulator in the first cell, but let's keep it here for
# completeness.
from ravenpy.config.emulators import GR4JCN

# Since our meteorological gauge data is all included in a single file, we need to tell the model which variables
# we are providing. We will generate the list now and pass it later to Ravenpy as an argument to the model.
data_type = ["TEMP_MAX", "TEMP_MIN", "PRECIP"]

# Alternative variable names are useful for allowing Raven to read variables from NetCDF files even if the variable
# names are not those that are expected by Raven. For example, our ERA5-dernived temperature variables are named
# "tmax" and "tmin", whereas Raven expects "TEMP_MAX" and "TEMP_MIN". Therefore, instead of forcing users to rename
# their variables, we provide a mechanism to tell Raven which variables in the NetCDF files correspond to which
# meteorological variable.
alt_names = {
    "TEMP_MIN": "tmin",
    "TEMP_MAX": "tmax",
    "PRECIP": "pr",
}

# As per the Raven model itself, the gauge station elevation, latitude and longitude are required to let the model run.
# For lumped models such as the ones we are currently emulating, there is only one "station" that corresponds to the basin-averaged weather.
# In this case, we set the station elevation to that of the mean catchment elevation to remove any adiabatic gradient modifications
# to the data. We also provide the catchment centroid latitude and longitude which we take from the only HRU defining the entire catchment.
# For multi-gauge basins and semi-distributed models, the latitude and longitude must be correctly identified for each station.
data_kwds = {
    "ALL": {  # Use this for all gauges (there is only one gauge, as it is a lumped model using basin-averaged weather)
        "elevation": hru[
            "elevation"
        ],  # extract the values directly from the "hru" we previously built
        "latitude": hru["latitude"],
        "longitude": hru["longitude"],
    }
}

# Provide a run name used to generate output Raven files.
run_name = "test_NB_04"

# Get the weather data. It can either be to a data file that was already in the same folder/workspace as this
# notebook, as we generated in the previous notebook, or it could be a path to a file stored elsewhere.
# Example for using the data we just generated in Notebook 03:
"""
ERA5_full = ERA5_weather_data.nc
"""

# In our case, we will prefer to link to existing, pre-computed and locally stored files to keep things tidy:
ERA5_full = get_file("notebook_inputs/ERA5_weather_data.nc")


# We need to define some configuration options that all models will need. See each line for more details on their use.
default_emulator_config = dict(
    # The HRU as defined earlier must be provided. This is the physical representation of the catchment that Raven
    # needs for certain models and processes.
    HRUs=[hru],
    # Model simulation start and end dates.
    StartDate=start_date,
    EndDate=end_date,
    # Name of the simulation. Raven will prefix all .rvX files and model outputs with the runName.
    RunName=run_name,
    # Custom outputs allow pre-processing of certain statistics and variables after the model runs. Please see the
    # documentation for more details on possible options.
    CustomOutput=[
        rc.CustomOutput(
            time_per="YEARLY",
            stat="AVERAGE",
            variable="PRECIP",
            space_agg="ENTIRE_WATERSHED",
        )
    ],
    # Here we will prepare the weather gauge data to be fed to the model. The data could also come from other
    # sources such as the ERA5 reanalysis product.
    Gauge=[
        rc.Gauge.from_nc(
            ERA5_full,  # Path to the ERA5 file containing all three meteorological variables
            data_type=data_type,  # Note that this is the list of all the variables
            alt_names=alt_names,  # Note that all variables here are mapped to their names in the netcdf file.
            data_kwds=data_kwds,
        )
    ],
)

# Here is where we build the raven configuration. We will first configure the model in memory based on the user
# preferences, and then we will write the Raven configuration files to disk such that Raven can read them and
# execute a simulation. In this case, we are building a GR4JCN model, but you could change this to any of the
# eight models that are preconfigured for direct emulation in Ravenpy.
m = GR4JCN(
    # Raven requires parameters for the GR4JCN model. For now, we provide default values, but in a later notebook
    # we will show how to calibrate and find new parameters for our model.
    params=[0.529, -3.396, 407.29, 1.072, 16.9, 0.947],
    # GR4JCN needs an extra constant parameter for the catchment, corresponding to the G50 parameter in the CEMANEIGE
    # description. Here is how we provide it.
    GlobalParameter={"AVG_ANNUAL_SNOW": 408.480},
    **default_emulator_config,
)

# We are now ready to write this newly-configured model to disk through the use of .rvX files that Raven will read.

# In the following code snippet, there is a "workdir" path that must be provided. This indicates the path
# where the data used to run the model (RAVEN .RV files) will be made available from the PAVICS Jupyter environment.
# The default "workdir" setting puts model outputs in the temporary directory ("/tmp"),
# which is not visible from the Jupyter file explorer. Therefore, You can change the last subfolder,
# but '/notebook_dir/writable-workspace/' must be the beginning  of the path when running on the PAVICS platform.

workdir = Path(tempfile.mkdtemp(prefix="NB4"))
m.write_rv(workdir=workdir)

It can be seen that .rvX files were generated in the indicated path. This makes the Ravenpy platform more flexible for various use-cases and can be exported/imported as needed.

The above code only created the .rvX files. The model did not actually run yet. To do so, we must ask it to, as follows. Don’t worry about the warnings: Raven is informing us that it has generated some internal parameters to build the model configuration based on some of the parameters we provided.

# If we want to import our own raven configuration files and forcing data, we can do so by importing them
# using the ravenpy.run  method. This will run the model exactly as the users will have designed it.
from ravenpy import OutputReader
from ravenpy.ravenpy import run

# This is used to specify the raven configuration files prefixes. In this case, we will retake the previously created files
run_name = run_name

# This is the path where the files were uploaded by the user. Model outputs will also be placed there in a
# subfolder called "outputs"
configdir = workdir

# Run the model and get the path to the outputs folder that can be used in the output reader.
outputs_path = run(modelname=run_name, configdir=configdir)

# Get the outputs using the Output Reader object.
outputs = OutputReader(run_name=run_name, path=outputs_path)

# If we already have a model configuration that we built in-memory (such as the "m" GR4JCN model we built above),
# then we can use the Emulator object to simply emulate the model we were working on and get outputs directly
from ravenpy import Emulator

# Prepare the emulator by writing files on disk
e = Emulator(config=m)

# Run the model and get the outputs.
outputs = e.run()

However, the above demonstration shows how to use one of the eight emulators to run a Raven simulation. Ravenpy also contains other powerful tools to run other user-defined raven models by reading existing configuration files and running the model in Ravenpy. This means that users that already have Raven models of their systems can upload the configuration and hydrometeorological data netcdf files to their private account on the PAVICS-Hydro server and run their model there. Here is an example of how this could be done.

At this stage, no matter the method we used, we have the outputs of the model simulations in the “outputs” object. Let’s explore it:

# Show the files available in the outputs. Each of these can be accessed to get information about the simulation.
outputs.files

The outputs are as follow:

hydrograph: The actual simulated hydrograph (q_sim), in netcdf format. It also contains the observed discharge (q_obs) if observed streamflow was provided as a forcing file.
storage: The state variables of the simulation duration, in netcdf format
solution: The state variables at the end of the simulation, which are saved as a “.rvc” file that can be used to hot-start a model (for forecasting, for example)
messages: A list of messages returned by Raven when executing the run.

You can explore the outputs using othe following syntax. This loads the data into memory to be used directly in another cell for processing or analysis.

# The model outputs are actually already loaded as Python objects in memory, thus we can access the data directly.
print("----------------HYDROGRAPH----------------")
display(outputs.hydrograph)
print("")
print("-----------------STORAGE------------------")
display(outputs.storage)
print("")
print("-----------------SOLUTION-----------------")
display(outputs.solution)
print("")

We can see in the “hydrograph” section that the model has generated a simulation using the forcing data we provided, but it only used the period between the start_date and end_date we asked it for. We can see that the dates of the ERA5 data we requested in the previous notebook cover the period 1980-01-01 to 1991-01-01. In our simulation, we only ask to run over the period from 1985-01-01 to 1990-01-01. Raven takes care of subsetting the data for the required period. We can look at the simulated streamflow from Raven to confirm this:

# Import the graphing utility built to handle Raven model outputs
from ravenpy.utilities.nb_graphs import hydrographs

hydrograph_objects = outputs.hydrograph
hydrographs(hydrograph_objects)

As you can see, the simulated flow covers only the period we asked for. The results probably don’t look good, but that’s OK! We will soon calibrate our model to get reasonable parameters.

We could also simply do basic plots using:

outputs.hydrograph.q_sim.plot()

Finally, we can inspect and work with other state variables in the model outputs. For example, say we want to investigate the snow water equivalent timeseries. We can first get the list of available state variables:

print(list(outputs.storage.keys()))

And then plot the variable of interest:

# Plot the "Snow" variable
outputs.storage["Snow"].plot()

As you can see, PAVICS-Hydro makes it easy to build a hydrological model, run it with forcing data, and then interact with the results! In the next notebooks, we will see how to adjust configuration files (the .rvX files) to setup and run a model, and also how to calibrate its parameters.

Supplementary information on Hydrological response unit definition

Raven requires a description of the watershed streamflow is simulated in. Different models require different parameters, but minimally, area, elevation, latitude and longitude are required. These data need to be provided for a few reasons:

Area is required since the size of the watershed will directly influence the simulated streamflow. Units are in square kilometers (km²).
Elevation (average elevation of the watershed) is required, although in many models the value is not actually used and therefore can be set to an arbitrary number. We strongly recommend using the real elevation as that will ensure that the value is present if you decide to switch to another model that requires elevation. Elevation is expressed in meters above mean sea level.
Latitude and longitude refer to the catchment centroid, and are used, among others, for evapotranspiration computations. They are expressed in decimal degrees (°), with longitudes within [-180, 180].

These values should be either precomputed externally, or they can be computed using the PAVICS-Hydro geophysical extraction toolbox that we used in the second tutorial notebook.

Supplementary information on model parameters

Each model requires a set of tuning parameters to represent and compensate for unknown quantities in certain hydrological processes. Some models have more parameters than others, for example:

GR4JCN = 6 parameters
HMETS = 21 parameters
MOHYSE = 10 parameters
HBVEC = 21 parameters

These parameters are found through calibration by tuning their values until the simulated streamflow matches the observations as much as possible. PAVICS-Hydro provides an integrated calibration toolbox that will be explored in the the 6th step of this tutorial. For now, we simply provided a set of parameters but it is not yet fully calibrated. This explains the poor quality of the simulated hydrograph.

Explore!

With this information in mind, you can now explore running different models and parameters and on different periods, and display the simulated hydrographs. You can change the start and end dates, the area, latitude, and even add other options that you might find in the documentation or in later tutorials.

If you want to run other models than GR4JCN, you can use these preset models:

HMETS:

m = HMETS(
    params=(
        9.5019,
        0.2774,
        6.3942,
        0.6884,
        1.2875,
        5.4134,
        2.3641,
        0.0973,
        0.0464,
        0.1998,
        0.0222,
        -1.0919,
        2.6851,
        0.3740,
        1.0000,
        0.4739,
        0.0114,
        0.0243,
        0.0069,
        310.7211,
        916.1947,
    ),
    **default_emulator_config,
)

Mohyse:

m = Mohyse(
    params=(1.0, 0.0468, 4.2952, 2.658, 0.4038, 0.0621, 0.0273, 0.0453, 0.9039, 5.6167),
    **default_emulator_config,
)

HBVEC:

m = HBVEC(
    params=(
        0.059845,
        4.07223,
        2.00157,
        0.034737,
        0.09985,
        0.506,
        3.4385,
        38.32455,
        0.46066,
        0.06304,
        2.2778,
        4.8737,
        0.5718813,
        0.04505643,
        0.877607,
        18.94145,
        2.036937,
        0.4452843,
        0.6771759,
        1.141608,
        1.024278,
    ),
    **default_emulator_config,
)

CanadianShield:

# The CanadianShield model needs atleast 2 HRUs. We have to modify the default config before executing it.
default_emulator_config["HRUs"] = [hru, hru]

m = CanadianShield(
    params=(
        4.72304300e-01,
        8.16392200e-01,
        9.86197600e-02,
        3.92699900e-03,
        4.69073600e-02,
        4.95528400e-01,
        6.803492000e00,
        4.33050200e-03,
        1.01425900e-05,
        1.823470000e00,
        5.12215400e-01,
        9.017555000e00,
        3.077103000e01,
        5.094095000e01,
        1.69422700e-01,
        8.23412200e-02,
        2.34595300e-01,
        7.30904000e-02,
        1.284052000e00,
        3.653415000e00,
        2.306515000e01,
        2.402183000e00,
        2.522095000e00,
        5.80344900e-01,
        1.614157000e00,
        6.031781000e00,
        3.11129800e-01,
        6.71695100e-02,
        5.83759500e-05,
        9.824723000e00,
        9.00747600e-01,
        8.04057300e-01,
        1.179003000e00,
        7.98001300e-01,
    ),
    **default_emulator_config,
)

HYPR:

m = HYPR(
    params=(
        -1.856410e-01,
        2.92301100e00,
        3.1194200e-02,
        4.3982810e-01,
        4.6509760e-01,
        1.1770040e-01,
        1.31236800e01,
        4.0417950e-01,
        1.21225800e00,
        5.91273900e01,
        1.6612030e-01,
        4.10501500e00,
        8.2296110e-01,
        4.15635200e01,
        5.85111700e00,
        6.9090140e-01,
        9.2459950e-01,
        1.64358800e00,
        1.59920500e00,
        2.51938100e00,
        1.14820100e00,
    ),
    **default_emulator_config,
)

SACSMA:

m = SACSMA(
    params=(
        0.0100000,
        0.0500000,
        0.3000000,
        0.0500000,
        0.0500000,
        0.1300000,
        0.0250000,
        0.0600000,
        0.0600000,
        1.0000000,
        40.000000,
        0.0000000,
        0.0000000,
        0.1000000,
        0.0000000,
        0.0100000,
        1.5000000,
        0.4827523,
        4.0998200,
        1.0000000,
        1.0000000,
    ),
    **default_emulator_config,
)

Blended:

m = Blended(
    params=(
        2.930702e-02,
        2.211166e00,
        2.166229e00,
        0.0002254976,
        2.173976e01,
        1.565091e00,
        6.211146e00,
        9.313578e-01,
        3.486263e-02,
        0.251835,
        0.0002279250,
        1.214339e00,
        4.736668e-02,
        0.2070342,
        7.806324e-02,
        -1.336429e00,
        2.189741e-01,
        3.845617e00,
        2.950022e-01,
        4.827523e-01,
        4.099820e00,
        1.283144e01,
        5.937894e-01,
        1.651588e00,
        1.705806,
        3.719308e-01,
        7.121015e-02,
        1.906440e-02,
        4.080660e-01,
        9.415693e-01,
        -1.856108e00,
        2.356995e00,
        1.0e00,
        1.0e00,
        7.510967e-03,
        5.321608e-01,
        2.891977e-02,
        9.605330e-01,
        6.128669e-01,
        9.558293e-01,
        1.008196e-01,
        9.275730e-02,
        7.469583e-01,
    ),
    **default_emulator_config,
)