{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Hindcasting with CaSPAr-Archived ECCC forecasts\n",
    "\n",
    "This notebook shows how to perform a streamflow hindcast, using CaSPar archived weather forecasts. It generates the hindcasts and plots them.\n",
    "\n",
    "CaSPAr (Canadian Surface Prediction Archive) is an archive of historical ECCC forecasts developed by Juliane Mai at the University of Waterloo, Canada. More details on CaSPAr can be found here https://caspar-data.ca/.\n",
    "\n",
    "\n",
    "Mai, J., Kornelsen, K.C., Tolson, B.A., Fortin, V., Gasset, N., Bouhemhem, D., Schäfer, D., Leahy, M., Anctil, F. and Coulibaly, P., 2020. The Canadian Surface Prediction Archive (CaSPAr): A Platform to Enhance Environmental Modeling in Canada and Globally. Bulletin of the American Meteorological Society, 101(3), pp.E341-E356.\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# This entire section is cookiecutter template to import required packages and prepare the temporary writing space.\n",
    "import datetime as dt\n",
    "import tempfile\n",
    "import warnings\n",
    "from pathlib import Path\n",
    "\n",
    "import xarray as xr\n",
    "from clisops.core import subset\n",
    "\n",
    "# ignore warnings from xsdba\n",
    "warnings.simplefilter(\"ignore\", category=UserWarning)\n",
    "\n",
    "from ravenpy import Emulator\n",
    "from ravenpy.config import commands as rc\n",
    "from ravenpy.config.emulators import GR4JCN\n",
    "from ravenpy.extractors.forecasts import get_CASPAR_dataset\n",
    "\n",
    "# Utility that simplifies working with test data hosted on GitHub\n",
    "from ravenpy.testing.utils import get_file\n",
    "from ravenpy.utilities import forecasting\n",
    "\n",
    "tmp = Path(tempfile.mkdtemp())"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Run the model simulations\n",
    "\n",
    "Here we set model parameters somewhat arbitrarily, but you can set the parameters to the calibrated parameters as seen in the \"06_Raven_calibration\" notebook we previously encountered. We can then specify the start date for the hindcast ESP simulations and run the simulations.This means we need to choose the forecast (hindcast) date. Available data include May 2017 onwards."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Date of the hindcast\n",
    "hdate = dt.datetime(2018, 6, 1)\n",
    "\n",
    "# Get the Forecast data from GEPS via CASPAR\n",
    "ts_hindcast, _ = get_CASPAR_dataset(\"GEPS\", hdate)\n",
    "\n",
    "# Get basin contour\n",
    "basin_contour = get_file(\"notebook_inputs/salmon_river.geojson\")\n",
    "\n",
    "# Subset the data for the region of interest and take the mean to get a single vector\n",
    "with xr.set_options(keep_attrs=True):\n",
    "    ts_subset = subset.subset_shape(ts_hindcast, basin_contour).mean(\n",
    "        dim=(\"rlat\", \"rlon\")\n",
    "    )\n",
    "# To make the timesteps identical across the entire duration\n",
    "ts_subset = ts_subset.resample(time=\"6h\").nearest(tolerance=\"1h\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# See how many members we have available\n",
    "len(ts_subset.members)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now that we have the correct weather forecasts, we can set up the hydrological model for a warm-up run:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Prepare a RAVEN model run using historical data, GR4JCN in this case.\n",
    "# This is a dummy run to get initial states. In a real forecast situation,\n",
    "# this run would end on the day before the forecast, but process is the same.\n",
    "\n",
    "# Here we need a file of observation data to run a simulation to generate initial conditions for our forecast.\n",
    "# ts = str(\n",
    "#    get_file(\"raven-gr4j-cemaneige/Salmon-River-Near-Prince-George_meteo_daily.nc\")\n",
    "# )\n",
    "\n",
    "# TODO: We will use ERA5 data for Salmon River because it covers the correct period.\n",
    "ts = get_file(\"notebook_inputs/ERA5_weather_data_Salmon.nc\")\n",
    "\n",
    "# This is the model start date, on which the simulation will be launched for a certain duration\n",
    "# to set up the initial states. We will then save the final states as a launching point for the\n",
    "# forecasts.\n",
    "\n",
    "start_date = dt.datetime(2000, 1, 1)\n",
    "end_date = dt.datetime(2018, 6, 2)\n",
    "\n",
    "# Define HRU to build the hydrological model\n",
    "hru = dict(\n",
    "    area=4250.6,\n",
    "    elevation=843.0,\n",
    "    latitude=54.4848,\n",
    "    longitude=-123.3659,\n",
    "    hru_type=\"land\",\n",
    ")\n",
    "\n",
    "# Set alternative names for netCDF variables\n",
    "alt_names = {\n",
    "    \"TEMP_MIN\": \"tmin\",\n",
    "    \"TEMP_MAX\": \"tmax\",\n",
    "    \"PRECIP\": \"pr\",\n",
    "}\n",
    "\n",
    "# Data types to extract from netCDF\n",
    "data_type = [\"TEMP_MAX\", \"TEMP_MIN\", \"PRECIP\"]\n",
    "data_kwds = {\n",
    "    \"ALL\": {\n",
    "        \"elevation\": hru[\"elevation\"],\n",
    "        \"Latitude\": hru[\"latitude\"],\n",
    "        \"Longitude\": hru[\"longitude\"],\n",
    "    },\n",
    "}\n",
    "# Model configuration\n",
    "model_config_warmup = GR4JCN(\n",
    "    params=[0.529, -3.396, 407.29, 1.072, 16.9, 0.947],\n",
    "    Gauge=[\n",
    "        rc.Gauge.from_nc(\n",
    "            ts, data_type=data_type, alt_names=alt_names, data_kwds=data_kwds\n",
    "        )\n",
    "    ],\n",
    "    HRUs=[hru],\n",
    "    StartDate=start_date,\n",
    "    EndDate=end_date,\n",
    "    RunName=\"NB12_warmup_run\",\n",
    ")\n",
    "\n",
    "# Run the model and get the outputs.\n",
    "out1 = Emulator(config=model_config_warmup).run()\n",
    "\n",
    "\n",
    "# Extract the path to the final states file that will be used as the next initial states\n",
    "hotstart = out1.files[\"solution\"]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We now have the initial states ready for the next step, which is to launch the forecasts in hindcasting mode:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Explore the forecast data to see which variables we have:\n",
    "display(ts_subset)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Configure and run a new model by setting the initial states (equal to the previous run's final states) and prepare\n",
    "# the configuration for the forecasts (including forecast start date, which should be equal to the final simulation\n",
    "# date + 1, as well as the forecast duration.)\n",
    "\n",
    "# We need to write the hindcast data as a file for Raven to be able to access it.\n",
    "fname = tmp / \"hindcast.nc\"\n",
    "ts_subset.to_netcdf(fname)\n",
    "\n",
    "# We need to adjust the data_type and alt_names according to the data in the forecast:\n",
    "# Set alternative names for netCDF variables\n",
    "alt_names = {\n",
    "    \"TEMP_AVE\": \"tas\",\n",
    "    \"PRECIP\": \"pr\",\n",
    "}\n",
    "\n",
    "# Data types to extract from netCDF\n",
    "data_type = [\"TEMP_AVE\", \"PRECIP\"]\n",
    "\n",
    "\n",
    "# We will need to reuse this for GR4J. Update according to your needs. For example, here we will also pass\n",
    "# the catchment latitude and longitude as our CaSPAr data has been averaged at the catchment scale.\n",
    "# We also need to tell the model to deaccumulate the precipitation and shift it in time by 6 hours for our\n",
    "# catchment (UTC timezones):\n",
    "data_kwds = {\n",
    "    \"ALL\": {\n",
    "        \"elevation\": hru[\"elevation\"],\n",
    "        \"Latitude\": hru[\"latitude\"],\n",
    "        \"Longitude\": hru[\"longitude\"],\n",
    "    },\n",
    "    \"PRECIP\": {\n",
    "        \"Deaccumulate\": True,\n",
    "        \"TimeShift\": -0.25,\n",
    "        \"LinearTransform\": {\n",
    "            \"scale\": 1000.0\n",
    "        },  # Since we are deaccumulating, we need to manually specify scale.\n",
    "    },  # Converting meters to mm (multiply by 1000).\n",
    "    \"TEMP_AVE\": {\n",
    "        \"TimeShift\": -0.25,\n",
    "    },\n",
    "}\n",
    "\n",
    "\n",
    "# Model configuration for forecasting, including correct start date and forecast duration\n",
    "model_config_fcst = GR4JCN(\n",
    "    params=[0.529, -3.396, 407.29, 1.072, 16.9, 0.947],\n",
    "    Gauge=[\n",
    "        rc.Gauge.from_nc(\n",
    "            fname, data_type=data_type, alt_names=alt_names, data_kwds=data_kwds\n",
    "        )\n",
    "    ],\n",
    "    HRUs=[hru],\n",
    "    StartDate=end_date + dt.timedelta(days=1),\n",
    "    Duration=7,\n",
    "    RunName=\"NB12_forecast_run\",\n",
    ")\n",
    "\n",
    "# Update the initial states\n",
    "model_config_fcst = model_config_fcst.set_solution(hotstart)\n",
    "\n",
    "# Generate the hindcast by providing all necessary information to generate virtual stations representing\n",
    "# the forecast members\n",
    "hindcast = forecasting.hindcast_from_meteo_forecast(\n",
    "    model_config_fcst,\n",
    "    forecast=fname,\n",
    "    # We also need to provide the necessary information to create gauges inside the forecasting model:\n",
    "    data_kwds=data_kwds,\n",
    "    data_type=data_type,\n",
    "    alt_names=alt_names,\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Explore the hindcast data:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "hindcast.hydrograph"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "\n",
    "And, for visual representation of the forecasts:\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import matplotlib.pyplot as plt\n",
    "\n",
    "# Simulate an observed streamflow timeseries: Here we take a member from the ensemble, but you should use your own\n",
    "# observed timeseries:\n",
    "qq = hindcast.hydrograph.q_sim[0, :, 0]\n",
    "\n",
    "hindcast.hydrograph.q_sim[:, :, 0].plot.line(\"b\", x=\"time\", add_legend=False)\n",
    "hindcast.hydrograph.q_sim[1, :, 0].plot.line(\"b\", x=\"time\", label=\"forecasts\")\n",
    "qq.plot.line(\"r\", x=\"time\", label=\"observations\")\n",
    "plt.legend(loc=\"upper left\")\n",
    "plt.show()"
   ]
  }
 ],
 "metadata": {
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.12.9"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}