Performing a sensitivity analysis

In this notebook, we perform a sensitivity analysis on GR4JCN to determine the importance of each parameter using the Sobol’ sensitivity analysis method. The example shown here uses very few parameter samples, so the results will be poor and should not be interpreted as-is. However, you can run this code locally with RavenPy to perform a much larger sampling on your own computer.

Prepare data for GR4JCN

We will use GR4JCN for this analysis. Since the sensitivity analysis acts on the model response to different inputs, we need a scalar metric that measures the impact of the parameters on that response. Here we use the Nash-Sutcliffe and Absolute Error metrics as responses, but it could be any scalar value: mean flow, peak flow, lowest flow, flow volume, etc. For this exercise, we suppose that we want to know the impact of a parameter set on an objective function value, so we use a dataset that contains observed streamflow to compute the evaluation metrics.

Let’s now import the required packages, get the data, and set up the model HRU with the physiographic information.

# Import required packages:
import datetime as dt
import tempfile
from pathlib import Path

import numpy as np
from SALib.analyze import sobol as sobol_analyzer
from SALib.sample import sobol as sobol_sampler
from tqdm.notebook import tqdm

from ravenpy import OutputReader
from ravenpy.config import commands as rc
from ravenpy.config.emulators import GR4JCN
from ravenpy.ravenpy import run
from ravenpy.utilities.testdata import get_file

# We get the netCDF file from a server. You can replace the `get_file` call with a string containing the path to your own netCDF file.
nc_file = get_file(
    "raven-gr4j-cemaneige/Salmon-River-Near-Prince-George_meteo_daily.nc"
)

# Here, we map the variable names in the netCDF file to the forcing names expected by Raven models.
alt_names = {
    "RAINFALL": "rain",
    "SNOWFALL": "snow",
    "TEMP_MIN": "tmin",
    "TEMP_MAX": "tmax",
    "PET": "pet",
    "HYDROGRAPH": "qobs",
}
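
# Optional: if you are unsure which variable names your own netCDF file uses, you can
# inspect it with xarray (installed alongside RavenPy) and list its variables. This is
# only a convenience check and is not required for the rest of the notebook.
import xarray as xr

print(list(xr.open_dataset(nc_file).data_vars))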

# The HRU of your watershed (area in km², elevation in m, coordinates in decimal degrees)
hru = dict(area=4250.6, elevation=843.0, latitude=54.4848, longitude=-123.3659)

# The evaluation metrics. Multiple options are possible, as can be found in Tutorial Notebook 06. Here we use Nash-Sutcliffe and Absolute Error.
eval_metrics = ("NASH_SUTCLIFFE", "ABSERR")

# Data keywords for meteorological data stations
data_kwds = {
    "ALL": {
        "elevation": hru[
            "elevation"
        ],  # extract the values directly from the "hru" we previously built
        "latitude": hru["latitude"],
        "longitude": hru["longitude"],
    }
}

Sensitivity analysis step 1: Define the Sobol problem to analyze

Sobol sensitivity analysis requires three distinct steps:

  1. Sample parameter sets from the possible parameter space;

  2. Run the model and gather the model response for each of these parameter sets;

  3. Analyze the change in model responses as a function of changes in parameter sets.

Therefore, the first step is to sample the parameter space.

# Define the model inputs:
problem = {
    "num_vars": 6,  # Number of variables
    "names": [
        "x1",
        "x2",
        "x3",
        "x4",
        "CN1",
        "CN2",
    ],  # Names of these variables, to make it easier to follow. Can be any string defined by the user
    "bounds": [
        [
            0.01,
            2.5,
        ],  # We must provide lower and upper bounds for each parameter to sample. Must be adjusted for each model.
        [-15.0, 10.0],
        [10.0, 700.0],
        [0.0, 7.0],
        [1.0, 30.0],
        [0.0, 1.0],
    ],
}

# Generate samples. With SALib's default calc_second_order=True, the number of parameter sets generated will be N * (2D + 2),
# where N is defined below and D is the number of model inputs (6 for GR4JCN).
N = 4
param_values = sobol_sampler.sample(problem, N)

# Display the size of the param_values matrix. We will run the model with each set of parameters, i.e. one per row.
print("The number of parameter sets to evaluate is " + str(param_values.shape[0]))

Sensitivity analysis step 2: Run the model for each parameter set

At this stage, we have parameter sets sampled according to the Sobol’ / Saltelli scheme. We now need to run the GR4JCN model for each of these parameter sets and compute the objective function (model response) that we want. Here we ask the model to compute two objective functions (Nash-Sutcliffe and Absolute Error), so we will be able to perform the sensitivity analysis on both metrics while running the model only once per parameter set.

We use a simple loop to run the model here, but advanced users could parallelize this, as it is an “embarrassingly parallel” problem; a minimal sketch is shown right after the run loop below.

# Use a single working directory so that each run overwrites the files of the previous one (this avoids creating hundreds or
# thousands of folders, one per run).
workdir = Path(tempfile.mkdtemp())

# Pre-allocate the results vectors based on the number of parameter sets we will test (and thus how many runs we will need to do).
# We will perform the SA on two objective functions (NSE and AbsErr), so let's pre-allocate both vectors now.
Y_NSE = np.zeros([param_values.shape[0]])
Y_ABS = np.zeros([param_values.shape[0]])

# Define a run name for files
run_name = "SA_Sobol"

config = dict(
    ObservationData=[rc.ObservationData.from_nc(nc_file, alt_names="qobs")],
    Gauge=[rc.Gauge.from_nc(nc_file, alt_names=alt_names, data_kwds=data_kwds)],
    HRUs=[hru],
    StartDate=dt.datetime(1990, 1, 1),
    EndDate=dt.datetime(1999, 12, 31),
    RunName=run_name,
    EvaluationMetrics=eval_metrics,  # We add this option to tell Raven which evaluation metrics we want it to compute.
    SuppressOutput=True,  # This suppresses the writing of files to disk, returning only basic information such as the evaluation metrics values.
)

# Now we have a loop that runs the model iteratively, once per parameter set:
for i, X in enumerate(tqdm(param_values)):
    # We need to create the desired model with its parameters the same way as in the Notebook 04_Emulating_hydrological_models.
    m = GR4JCN(
        params=X.tolist(),  # Here is where we pass the current parameter set X (from the loop) to the model.
        **config,
    )

    # Write the files to disk, and overwrite existing files in the folder (we already got the values we needed from previous runs)
    m.write_rv(workdir=workdir, overwrite=True)

    # Run the model and get the path to the outputs folder that can be used in the output reader.
    outputs_path = run(modelname=run_name, configdir=workdir)

    # Get the outputs using the Output Reader object.
    outputs = OutputReader(run_name=run_name, path=outputs_path)

    # Gather the results for both of the desired objective functions. We will see how the choice of objective function impacts sensitivity.
    Y_NSE[i] = outputs.diagnostics["DIAG_NASH_SUTCLIFFE"][0]
    Y_ABS[i] = outputs.diagnostics["DIAG_ABSERR"][0]
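
As noted above, this loop is embarrassingly parallel. Below is a minimal parallelization sketch (not executed in this notebook): it assumes that each evaluation writes to its own temporary directory so concurrent runs do not overwrite each other's files, and it uses a thread pool since most of the runtime is spent waiting on the external Raven executable.

from concurrent.futures import ThreadPoolExecutor


def evaluate(params):
    # Each evaluation gets its own working directory to avoid file collisions.
    workdir_i = Path(tempfile.mkdtemp())
    model = GR4JCN(params=params.tolist(), **config)
    model.write_rv(workdir=workdir_i, overwrite=True)
    outputs_path_i = run(modelname=run_name, configdir=workdir_i)
    out = OutputReader(run_name=run_name, path=outputs_path_i)
    return (
        out.diagnostics["DIAG_NASH_SUTCLIFFE"][0],
        out.diagnostics["DIAG_ABSERR"][0],
    )


# Uncomment to compute the responses in parallel instead of using the loop above:
# with ThreadPoolExecutor(max_workers=4) as pool:
#     results = np.array(list(pool.map(evaluate, param_values)))
#     Y_NSE, Y_ABS = results[:, 0], results[:, 1]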

Sensitivity analysis step 3: Analyze results and obtain parameter sensitivity indices

At this point, we have a model response for each of the parameter sets. We can analyze the results using the code below. We will focus on the total-order and first-order sensitivities.

# Perform analysis for the NSE objective function first
Si = sobol_analyzer.analyze(problem, Y_NSE, print_to_console=True)

# Print the first-order sensitivity indices
print(Si["S1"])
# Now perform the sensitivity analysis for the Absolute Error objective function
Si = sobol_analyzer.analyze(problem, Y_ABS, print_to_console=True)

# Print the first-order sensitivity indices
print(Si["S1"])

Result analysis

We can see that parameters x2 and x3 are more sensitive than the others, with higher total-order (ST) and first-order (S1) sensitivity indices. This is true for both objective functions, but it could be different for other metrics, so it is important to keep this in mind when using a sensitivity analysis to determine parameter importance! A common example is a parameter related to snowmelt: it will have no impact if there is no snow during the simulation period, but it would become critical if snow were to fall in subsequent years.

Note that the tables above present the Sobol’ sensitivity indices in the left column and their 95% confidence intervals in the right column. The values look odd because far too few parameter sets are used to adequately sample the parameter space here, but increasing “N” to 1024 or 2048 would allow a much better estimation for a 6-parameter model.
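
For reference, re-running the sampling with a larger N requires changing only one line; with N = 1024 and the default second-order computation, this would mean 1024 * (2 * 6 + 2) = 14336 model evaluations, which is feasible on a local machine but too slow for this demonstration notebook.

# Not executed here: a larger sample for a more reliable analysis.
# param_values = sobol_sampler.sample(problem, 1024)  # 1024 * (2 * 6 + 2) = 14336 runs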