05 - Advanced RavenPy configuration

In this notebook, we will explore alternative ways to setup a Raven model and how to parameterize and customize a raven-based hydrological model

Running Raven using pre-existing configuration files

To run Raven, we need configuration (.rvX) files defining hydrological processes, watersheds and meteorological data. If you already have those configuration files ready, or want to see how to import an existing Raven model into PAVICS-Hydro, this tutorial is for you. It shows how to run Raven from a Python programming environment using RavenPy.

Let’s start by importing some utilities that will make our life easier to get data on the servers. If you already have raven model setups, you could simply upload the files here and create your own “config” list:

# Utility that simplifies getting data hosted on the remote PAVICS-Hydro data server.
from ravenpy.utilities.testdata import get_file

A note on datasets

For this part of the tutorial, we will use pre-existing datasets that are hosted on the PAVICS-Hydro servers to setup the Raven model. This means that the .rv files are all built and the forcing file already exists. We could apply all of the same logic to a RavenPy model we would have built at the previous step, but this way lets us show that we can also work on an imported model. Let’s import the configuration files:

# Get the .rv files. It could also be the .rv files returned from the previous notebook, but here we are using a new basin that contains observed streamflow
# to make the calibration possible in the next notebook. Note that these configuration files also include links to the
# required hydrometeorological database (NetCDF file).
config = [
    get_file(f"raven-gr4j-cemaneige/raven-gr4j-salmon.{ext}")
    for ext in ["rvt", "rvc", "rvi", "rvh", "rvp"]
]
config

So “config” is just a set of paths to the various .rvX files (.rvt, .rvc, .rvi. .rvh and .rvp). Therefore, if you have your own .rv files that describe your model, you can upload them and replace “config” with your own files!

Building a hydrological model on-the-fly using existing configuration files.

Here we create a Raven model instance, configuring it using the pre-defined configuration files and running it by providing the full path to the NetCDF driving datasets. The configuration we provide is for a GR4J-CN model emulator that Raven will run for us. We provide the configuration files for GR4J-CN as well as the forcing data (precipitation, temperature, observed streamflow, etc.) that will be used to run the model.

from ravenpy import OutputReader
from ravenpy.ravenpy import run

run_name = "raven-gr4j-salmon"  # As can be seen in the config above, this is the name of the .rvX files.
configdir = config[
    0
].parent  # We can get the path to the folder containing the .rvX files this way

# Run the model and get the path to outputs
outputs_path = run(modelname=run_name, configdir=configdir, overwrite=True)

# Note. The modelname parameter can be confusing. You need to give the FILES extension name (run_name in our case),
# not the name of the model.

outputs_path
# Read the output files at the output_path

outputs = OutputReader(run_name=None, path=outputs_path)  # Get the outputs
# Note. We set up the run_name to None, because we didn't rename the output files. If you gave a different name to your file
# compared to the one above, you should change the run_name value to this new name. It's important though that you keep the end
# of the filename the same

# Show the list of files that were retrived by the OutputReader
outputs.files

The model should have run! But you also might have seen some warnings that Raven is giving us, depending on the input files used:

  • Some might be saying that we are providing rain and snow independently, but in the configuration files, we are asking the model to recompute the separation using an algorithm based on total precipitation and air temperature. This is OK, and we can live with this (alternatively, we could reconfigure the model to remove this but that will be for another notebook!).

  • Others could be saying that we supply PET data, but the model is configured to compute PET from the available temperature and latitude/longitude data. This is also acceptable to us for now, so these warnings can be disregarded.

  • And others might simply explain that our configuration provided some parameters but others were computed internally based on our parameter set rather than being explicitly set in our configuration, which is OK.

Evaluating the model response

That’s it! The code above has launched the GR4J-CN model using weather data and the configuration we provided. There are many other options we could provide, but for now we left everything to the default options to keep things simple. We will explore those in a future tutorial as well.

Now, let’s look at the modeled hydrographs. Note that there is a “q_obs” hydrograph, representing the observations we provided ourselves. This is to facilitate the comparison between observations and simulations, and it is not required per se to run the model. The “q_sim” variable is the simulated streamflow and is the one we are interested in.

Note that RavenPy assumes that model outputs are always saved in netCDF format, and relies on xarray to access data.

To see results, we must first tell the model to read them from the files Raven has written in the output folder:

We can visualize the simulated streamflow using xarray’s built-in plotting tool, as follows:

outputs.hydrograph.q_sim.plot()

We also now have access to diagnostics! This is because along with the simulated discharge, the model has access to observed discharge to compute error metrics such as RMSE and NSE. Let’s see where the file has been generated:

print("-----------------DIAGNOSTICS-----------------")
print(outputs.diagnostics)
print("")

print("-----------------NASH_SUTCLIFFE-----------------")
print(outputs.diagnostics["DIAG_NASH_SUTCLIFFE"])
print("")

print("-----------------RMSE-----------------")
print(outputs.diagnostics["DIAG_RMSE"])

We can see that the Nash-Sutcliffe value is quite poor. This is due to the short simulation period in the configuration (see the hydrograph above!) and the lack of a spin-up period, combined to a poor parameter set choice. We will improve upon all of these shortcomings in the next notebooks!

Advanced RavenPy configuration options

Raven can perform many operations and has multiple configuration options. Here we provide a list of configuration options to explore which you can eventually use to taylor the codes to your own specifications. These can only be run on RavenPy-built hydrological models, and will not operate on Raven models imported by users since those configuration files are not modifiable for the time being.

We will give an overview of the various configuration keywords after this code block, but users should read the Raven documentation for more options for each of these processes.

Let’s first define some variables we will need for all of our tests:

# Get required packages
import datetime as dt

import matplotlib.pyplot as plt

from ravenpy import Emulator
from ravenpy.config import commands as rc
from ravenpy.config.emulators import GR4JCN

# Observed weather data for the Salmon river. We extracted this using Tutorial Notebook 03 and the
# salmon_river.geojson file as the contour.
ts = get_file("notebook_inputs/ERA5_weather_data_Salmon.nc")

# Set alternate variable names in the timeseries data file
alt_names = {
    "TEMP_MIN": "tmin",
    "TEMP_MAX": "tmax",
    "PRECIP": "pr",
}

# Provide the type of data made available to Raven
data_type = ["TEMP_MAX", "TEMP_MIN", "PRECIP"]

# Prepare the catchment properties
hru = dict(
    area=4250.6,
    elevation=843.0,
    latitude=54.4848,
    longitude=-123.3659,
    hru_type="land",
)

# Add some information regarding station data
data_kwds = {
    "ALL": {
        "elevation": hru["elevation"],
        "latitude": hru["latitude"],
        "longitude": hru["longitude"],
    }
}

# Start and end dates of the simulation
start_date = dt.datetime(1985, 1, 1)
end_date = dt.datetime(1990, 1, 1)

# Set parameters
parameters = [0.529, -3.396, 407.29, 1.072, 16.9, 0.947]

We can now perform a “basic” run, with no modifications.

# Run the model (See Notebook 04 for more details on implementation)
m = GR4JCN(
    params=parameters,
    Gauge=[
        rc.Gauge.from_nc(
            ts,
            data_type=data_type,  # Note that this is the list of all the variables
            alt_names=alt_names,  # Note that all variables here are mapped to their names in the netcdf file.
            data_kwds=data_kwds,
        )
    ],
    HRUs=[hru],
    StartDate=start_date,
    EndDate=end_date,
    RunName="NB05_test1",
    # GlobalParameter={"AVG_ANNUAL_RUNOFF": 208.480},
)

# Run the model and get the outputs.
outputs1 = Emulator(m).run()

# Plot the generated hydrograph
outputs1.hydrograph.q_sim.plot.line(x="time", label="Base case")
plt.legend(loc="upper left")
plt.show()

We can now run another model by adding some other properties. To start, we can add some Global Parameters to the model to make Raven adjust the simulations based on the information we provide. Some options of Global Parameters are indicated here, but more can be found in the official Raven documentation.

Examples of GlobalParameter options (Note that some are only available for certain models and others can be mutually exclusive. Please refer to the documentation for this type of adjustment):

Temperature interval of transformation between rain and snow. Set the midpoint of the range and the width of the range, in degrees C:

“RAINSNOW_TEMP”: midpoint_temp // Ex: “RAINSNOW_TEMP”: -1.0

“RAINSNOW_DELTA”: delta_temp // Ex: “RAINSNOW_DELTA”: 3.0

Maximum liquid water content of snow, as a percentage of SWE (0-1). Usually ~0.05.

“SNOW_SWI”: saturation // Ex: “SNOW_SWI”: 0.1

Average annual snow for the entire watershed in mm of SWE. Used in CemaNeige.

“AVG_ANNUAL_SNOW”: average_snow_per_year // Ex: “AVG_ANNUAL_SNOW”: 400.0

There are many others, but this should clarify the implementation. Let’s try some of them out!

# Run the model (See Notebook 04 for more details on implementation)
m = GR4JCN(
    params=parameters,
    Gauge=[
        rc.Gauge.from_nc(
            ts,
            data_type=data_type,  # Note that this is the list of all the variables
            alt_names=alt_names,  # Note that all variables here are mapped to their names in the netcdf file.
            data_kwds=data_kwds,
        )
    ],
    HRUs=[hru],
    StartDate=start_date,
    EndDate=end_date,
    RunName="NB05_test2",
    GlobalParameter={"AVG_ANNUAL_SNOW": 350.0},
)

# Run the model and get the outputs.
outputs2 = Emulator(m).run()

# Plot the generated hydrograph
outputs1.hydrograph.q_sim.plot.line(x="time", label="Base case")
outputs2.hydrograph.q_sim.plot.line(x="time", label="With AVG_ANNUAL_SNOW")

plt.legend(loc="upper left")
plt.show()

We can also adjust the time series data to play with the scaling of units.

By default, RavenPy and Raven will detect units from the forcing data netcdf files. However, in some instances, units might be lacking, or their format might require some tinkering. One such case is for precipitation data that is cumulative in the netcdf file. In these cases, Raven can decumulate the precipitation, but the scaling might lead to undesirable results. For this reason, it is highly recommended to pass the scaling and offsetting variables directly. To do so, add some context in the data_kwds:

# Add some information regarding station data
data_kwds = {
    "ALL": {
        "elevation": hru["elevation"],
        "Latitude": hru["latitude"],
        "Longitude": hru["longitude"],
    },
    # HOW TO PROCESS THE PRECIPITATION DATA: For the Precip variable, we tell Raven we want to Deaccumulate
    # values, shift them in time by 6 hours (for UTC time zone management), and then apply a linear transform
    # to the values to get new scaled values. The linear transform can take two inputs:
    #     "scale"   is the "a" variable in the linear relationship y = ax + b. Usually used to multiply precipitation.
    #     "offset"  is the "b" variable in the linear relationship y = ax + b. Usually used to convert temperatures(K to °C)
    "PRECIP": {
        "Deaccumulate": True,
        "TimeShift": -0.25,
        "LinearTransform": {
            "scale": 1000.0  # # Converting meters to mm (multiply by 1000).
        },
    },
    "TEMP_AVE": {
        "TimeShift": -0.25,
    },
}

In our example, our precipitation is not actually accumulated and the timestep is daily, so we don’t need the “Deaccumulate” or the “TimeShift” parameters. So let’s generate a new data_kwds that is applicable in our case. More complex cases that require “Deaccumulate” and “TimeShift” will be presented in later notebooks that use accumulated precipitation in forecasting applications, in Notebook 12.

# Add some information regarding station data
data_kwds = {
    "ALL": {
        "elevation": hru["elevation"],
        "Latitude": hru["latitude"],
        "Longitude": hru["longitude"],
    },
    # Let's simulate a very rough estimation of the impacts of climate change where precipitation is expected
    # to increase by 10% and temperatures to increase by 3°C. This will be applied to all data on the entire
    # period and is thus not realistic. We will explore more realistic methods in Notebook 08.
    "PRECIP": {"LinearTransform": {"scale": 1.1}},
    "TEMP_AVE": {"LinearTransform": {"offset": 3.0}},
}

Now we can use this new setup to generate another series of streamflow

# Run the model (See Notebook 04 for more details on implementation)
m = GR4JCN(
    params=parameters,
    Gauge=[
        rc.Gauge.from_nc(
            ts,
            data_type=data_type,  # Note that this is the list of all the variables
            alt_names=alt_names,  # Note that all variables here are mapped to their names in the netcdf file.
            data_kwds=data_kwds,
        )
    ],
    HRUs=[hru],
    StartDate=start_date,
    EndDate=end_date,
    RunName="NB05_test3",
    GlobalParameter={"AVG_ANNUAL_SNOW": 350.0},
)

# Run the model and get the outputs.
outputs3 = Emulator(m).run()

# Plot the generated hydrograph
outputs1.hydrograph.q_sim.plot.line(x="time", label="Base case")
outputs2.hydrograph.q_sim.plot.line(x="time", label="With AVG_ANNUAL_SNOW")
outputs3.hydrograph.q_sim.plot.line(x="time", label="With AVG_ANNUAL_SNOW and Scaling")

plt.legend(loc="upper left")
plt.show()

We can see that the scaling increased the flows almost everywhere except in the first year which is the warm-up period.

Other options that can be implemented are indicated here, although more exist and are documented in the official Raven manual.

RainSnowFraction:

Algorithm to use to separate the total precipitation into rainfall and snowfall.

Ex: RainSnowFraction=’RAINSNOW_DINGMAN’

Evaporation

Evaporation: Formula to use to compute the evapotranspiration from the land HRUs.

Ex: Evaporation=”PET_OUDIN”

Suppress model outputs / files

Boolean that indicates if you wish for Raven to provide information after the model evaluation by writing to file. For a single run this can be left to False, but for calibration and other intensive tasks, it is faster to leave it to True.

Ex: SuppressOutputs=True

Finally, let’s see how to implement these commands:

# Run the model (See Notebook 04 for more details on implementation)
m = GR4JCN(
    params=parameters,
    Gauge=[
        rc.Gauge.from_nc(
            ts,
            data_type=data_type,  # Note that this is the list of all the variables
            alt_names=alt_names,  # Note that all variables here are mapped to their names in the netcdf file.
            data_kwds=data_kwds,
        )
    ],
    HRUs=[hru],
    StartDate=start_date,
    EndDate=end_date,
    RunName="NB05_test3",
    GlobalParameter={"AVG_ANNUAL_SNOW": 350.0},
    RainSnowFraction="RAINSNOW_DINGMAN",
    Evaporation="PET_HARGREAVES_1985",
    SuppressOutput=False,  # We can't read the hydrographs if they are not written to disk, so set to False here.
)

# Run the model and get the outputs.
outputs4 = Emulator(m).run()

# Plot the generated hydrograph
outputs1.hydrograph.q_sim.plot.line(x="time", label="Base case")
outputs2.hydrograph.q_sim.plot.line(x="time", label="With AVG_ANNUAL_SNOW")
outputs3.hydrograph.q_sim.plot.line(x="time", label="With AVG_ANNUAL_SNOW and Scaling")
outputs4.hydrograph.q_sim.plot.line(
    x="time", label="With AVG_ANNUAL_SNOW, Scaling and Options"
)

plt.legend(loc="upper left")
plt.show()

A note on the above results

We can see that the results change significantly according to the options we have passed, namely the evaporation algorithm modified the hydrograph quite significantly. However, this is caused by the fact that the parameter set we have used has not been calibrated using this PET method, and thereore the model cannot be expected to perform as well. This means that when using these model options, it is important to recalibrate the model parameters such that they represent the actual model being used!

Finally, we can also ask Raven to supply custom outputs using this line in the model configuration:

CustomOutput=rc.CustomOutput() and by providing a list of desired pre-processed variables. Here we ask for the yearly average of precipitation over the entire watershed:

CustomOutput=rc.CustomOutput(“YEARLY”, “AVERAGE”, “PRECIP”, “ENTIRE_WATERSHED”)

Please see the documentation for more details on using custom outputs.