Comparing hindcasts to a climatological ensemble streamflow prediction (ESP)

This notebook shows how to use climatological weather to perform a Climatology-based Extended Streamflow Prediction (ESP) forecast. Then using the same initial states, uses the CaSPar archived weather forecasts to generate streamflow hindcasts over the same period. It is thus possible to compare both approaches.

CaSPAr (Canadian Surface Prediction Archive) is an archive of historical ECCC forecasts developed by Juliane Mai at the University of Waterloo, Canada. More details on CaSPAr can be found here https://caspar-data.ca/.

Mai, J., Kornelsen, K.C., Tolson, B.A., Fortin, V., Gasset, N., Bouhemhem, D., Schäfer, D., Leahy, M., Anctil, F. and Coulibaly, P., 2020. The Canadian Surface Prediction Archive (CaSPAr): A Platform to Enhance Environmental Modeling in Canada and Globally. Bulletin of the American Meteorological Society, 101(3), pp.E341-E356.

%matplotlib inline
# This entire section is cookie-cutter template to allow calling the servers and instantiating the connection
# to the WPS server. Do not modify this block.
import datetime as dt

import matplotlib.pyplot as plt
import xarray as xr
from clisops.core import average, subset

from ravenpy import Emulator
from ravenpy.config import commands as rc
from ravenpy.config.emulators import GR4JCN
from ravenpy.extractors.forecasts import get_CASPAR_dataset
from ravenpy.utilities import forecasting
from ravenpy.utilities.testdata import get_file, open_dataset

Setting up the warm-up file

Here we tell the model that we want to forecast over the Salmon River catchment and provide its properties (area, lat/long, elevation). We will run it using the GR4JCN hydrological model and have provided some parameters. Other information on the forecast conditions is provided. Thr first step is to generate a hotstart file to prepare the model to generate forecasts.

# Define the warmup period dates
start_date_wu = dt.datetime(2010, 1, 1)
end_date_wu = dt.datetime(2018, 6, 30)

# Define the catchment contour. Here we use the Salmon River file we previously generated using the Delineator
# in Tutorial Notebook 01.
basin_contour = get_file("notebook_inputs/salmon_river.geojson")

# Define some of the catchment properties. Could also be replaced by a call to the properties WPS as in
# the Tutorial Notebook 02.
hru = {}
hru = dict(
    area=4250.6,
    elevation=843.0,
    latitude=54.4848,
    longitude=-123.3659,
    hru_type="land",
)

# Observed weather data for the Salmon river. We extracted this using Tutorial Notebook 03 and the
# salmon_river.geojson file as the contour.
ts = get_file("notebook_inputs/ERA5_weather_data_Salmon.nc")

# Set alternative names for netCDF variables
alt_names = {
    "TEMP_MIN": "tmin",
    "TEMP_MAX": "tmax",
    "PRECIP": "pr",
}

# Data types to extract from netCDF
data_type = ["TEMP_MAX", "TEMP_MIN", "PRECIP"]
data_kwds = {
    "ALL": {
        "elevation": hru["elevation"],
        "Latitude": hru["latitude"],
        "Longitude": hru["longitude"],
    },
}

# Model configuration
model_config_warmup = GR4JCN(
    params=[0.529, -3.396, 407.29, 1.072, 16.9, 0.947],
    Gauge=[
        rc.Gauge.from_nc(
            ts, data_type=data_type, alt_names=alt_names, data_kwds=data_kwds
        )
    ],
    HRUs=[hru],
    StartDate=start_date_wu,
    EndDate=end_date_wu,
    RunName="ESP_vs_NWP_warmup",
)

# Run the model and get the outputs.
out1 = Emulator(config=model_config_warmup).run()

# Extract the path to the final states file that will be used as the next initial states
hotstart = out1.files["solution"]

dss = open_dataset(ts)
dss

Hindcasting using Climatological Ensemble Streamflow Prediction (ESP)

Now that we have the hotstart file ready to go, we can configure our model for forecasting in climatology ESP mode:

# Date of the hindcast
hdate = dt.datetime(2018, 7, 1)

# Duration of the hindcast, in days
duration = 7

# Build a new model config:
# Model configuration
model_config_ESP = GR4JCN(
    params=[0.529, -3.396, 407.29, 1.072, 16.9, 0.947],
    Gauge=[
        rc.Gauge.from_nc(
            ts, data_type=data_type, alt_names=alt_names, data_kwds=data_kwds
        )
    ],
    HRUs=[hru],
    StartDate=hdate,
    Duration=duration,
    RunName="ESP_vs_NWP_ESPfcst",
)

# Set the initial states of this new config to the correct values, i.e. the end of the previous forecast.
model_config_ESP = model_config_ESP.set_solution(hotstart)

# Simulate the climatological ESP:
ESP_sims = forecasting.climatology_esp(config=model_config_ESP)

# Show the results in an xarray dataset, ready to use:
ESP_sims.hydrograph

We have now run the hindcast using Climatological ESP and retrieved the results. Let’s take a look at the resulting forecast.

# Invent an observation so we can compute metrics later, and display as Qobs here. TODO: Add real streamflow data.
qq = ESP_sims.hydrograph.q_sim[0, :, 0]

# This is to be replaced with a call to the forecast graphing WPS as soon as merged.
# model.q_sim.plot.line("b", x="time")
ESP_sims.hydrograph.q_sim[:, :, 0].plot.line("b", x="time", add_legend=False)
ESP_sims.hydrograph.q_sim[1, :, 0].plot.line("b", x="time", label="ESP forecasts")
qq.plot.line("r", x="time", label="observations")
plt.legend(loc="upper left")
plt.show()

Hindcasting using archived weather forecasts from a weather forecast model

In this next part, we will use the CaSPAr dataset (archived weather forecasts from Environment and Climate Change Canada) to forecast flows on the same period using the same hotstart file.

# Get the Forecast data from GEPS via CASPAR.
# Take an extra day to ensure time-shift doesn't remove a part of our day
ts_hindcast, _ = get_CASPAR_dataset("GEPS", hdate - dt.timedelta(days=1))

# Subset the data for the region of interest and take the mean to get a single vector
with xr.set_options(keep_attrs=True):
    ts_subset = subset.subset_shape(ts_hindcast, basin_contour).mean(
        dim=("rlat", "rlon")
    )

ts_subset = ts_subset.resample(time="6H").nearest(
    tolerance="1H"
)  # To make the timesteps identical accross the entire duration

# We need to write the hindcast data as a file for Raven to be able to access it.
fname = "/tmp/hindcast.nc"
ts_subset.to_netcdf(fname)

# We need to adjust the data_type and alt_names according to the data in the forecast:
# Set alternative names for netCDF variables
alt_names = {
    "TEMP_AVE": "tas",
    "PRECIP": "pr",
}

# Data types to extract from netCDF
data_type = ["TEMP_AVE", "PRECIP"]

# We will need to reuse this for GR4J. Update according to your needs. For example, here we will also pass
# the catchment latitude and longitude as our CaSPAr data has been averaged at the catchment scale.
# We also need to tell the model to deaccumulate the precipitation and shift it in time by 6 hours for our
# catchment (UTC timezones):
data_kwds = {
    "ALL": {
        "elevation": hru["elevation"],
        "Latitude": hru["latitude"],
        "Longitude": hru["longitude"],
    },
    "PRECIP": {
        "Deaccumulate": True,
        "TimeShift": -0.25,
        "LinearTransform": {
            "scale": 1000.0
        },  # Since we are deaccumulating, we need to manually specify scale.
    },  # Converting meters to mm (multiply by 1000).
    "TEMP_AVE": {
        "TimeShift": -0.25,
    },
}

# Model configuration for forecasting, including correct start date and forecast duration
model_config_fcst = GR4JCN(
    params=[0.529, -3.396, 407.29, 1.072, 16.9, 0.947],
    Gauge=[
        rc.Gauge.from_nc(
            fname, data_type=data_type, alt_names=alt_names, data_kwds=data_kwds
        )
    ],
    HRUs=[hru],
    StartDate=hdate,
    Duration=duration,
    RunName="NB12_forecast_run",
)

# Update the initial states
model_config_fcst = model_config_fcst.set_solution(hotstart)

# Generate the hindcast by providing all necessary information to generate virtual stations representing
# the forecast members
hindcast_sims = forecasting.hindcast_from_meteo_forecast(
    model_config_fcst,
    forecast=fname,
    overwrite=True,
    # We also need to provide the necessary information to create gauges inside the forecasting model:
    data_kwds=data_kwds,
    data_type=data_type,
    alt_names=alt_names,
)

# Display the hydrographs
display(hindcast_sims.hydrograph)

hindcast_sims.hydrograph.q_sim[:, :, 0].plot.line("b", x="time", add_legend=False)
hindcast_sims.hydrograph.q_sim[1, :, 0].plot.line("b", x="time", label="hindcasts")
qq.plot.line("r", x="time", label="observations")
plt.legend(loc="upper left")
plt.show()

The model has run in forecast mode and we can now easily compare results:

hindcast_sims.hydrograph.q_sim[:, :, 0].plot.line("b", x="time", add_legend=False)
ESP_sims.hydrograph.q_sim[:, :, 0].plot.line("g", x="time", add_legend=False)
ESP_sims.hydrograph.q_sim[1, :, 0].plot.line("g", x="time", label="ESP forecasts")
hindcast_sims.hydrograph.q_sim[1, :, 0].plot.line("b", x="time", label="hindcasts")
qq.plot.line("r", x="time", label="observations")
plt.legend(loc="upper left")
plt.show()