ravenpy.utilities package

Submodules

ravenpy.utilities.analysis module

ravenpy.utilities.analysis.circular_mean_aspect(angles: ndarray) ndarray[source]

Return the mean angular aspect based on a circular arithmetic approach.

Parameters:

angles (np.ndarray) – Array of aspect angles

Returns:

Circular mean of aspect array.

Return type:

np.ndarray
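
Examples

A minimal sketch; the angle values are illustrative:

>>> import numpy as np
>>> from ravenpy.utilities.analysis import circular_mean_aspect
>>> mean_aspect = circular_mean_aspect(np.array([350.0, 10.0]))  # circular mean near 0 (north), not 180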

ravenpy.utilities.analysis.dem_prop(dem: str | Path, geom: shapely.geometry.Polygon | shapely.geometry.MultiPolygon | List[shapely.geometry.Polygon | shapely.geometry.MultiPolygon] | None = None, directory: str | Path | None = None) dict[source]

Return raster properties for each geometry.

Parameters:
  • dem (str or Path) – DEM raster in reprojected coordinates.

  • geom (Polygon or MultiPolygon or List[Polygon or MultiPolygon]) – Geometry over which aggregate properties will be computed. If None, compute properties over the entire raster.

  • directory (str or Path) – Folder to save the GDAL terrain analysis outputs.

Returns:

Dictionary storing mean elevation [m], slope [deg] and aspect [deg].

Return type:

dict
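
Examples

A minimal sketch; the DEM path, output folder and polygon coordinates are hypothetical, and the polygon must be expressed in the DEM’s projected CRS:

>>> from shapely.geometry import box
>>> from ravenpy.utilities.analysis import dem_prop
>>> basin = box(500000, 5000000, 510000, 5010000)  # hypothetical polygon in projected coordinates
>>> props = dem_prop("dem_utm.tif", geom=basin, directory="terrain_out")  # mean elevation, slope and aspect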

ravenpy.utilities.analysis.gdal_aspect_analysis(dem: str | Path, set_output: str | Path | bool = False, flat_values_are_zero: bool = False) ndarray | osgeo.gdal.Dataset[source]

Return the aspect of the terrain from the DEM.

The aspect is the compass direction of the steepest slope (0: North, 90: East, 180: South, 270: West).

Parameters:
  • dem (str or Path) – Path to file storing DEM.

  • set_output (str or Path or bool) – If set to a valid filepath, will write to this path, otherwise will use an in-memory gdal.Dataset.

  • flat_values_are_zero (bool) – If True, assign flat areas a value of zero rather than -9999. Default: False.

Returns:

Aspect array.

Return type:

np.ndarray or osgeo.gdal.Dataset

Notes

Ensure that the DEM is in a projected coordinate, not a geographic coordinate system, so that the horizontal scale is the same as the vertical scale (m).

ravenpy.utilities.analysis.gdal_slope_analysis(dem: str | Path, set_output: str | Path | None = None, units: str = 'degree') ndarray[source]

Return the slope of the terrain from the DEM.

The slope is the magnitude of the gradient of the elevation.

Parameters:
  • dem (str or Path) – Path to file storing DEM.

  • set_output (str or Path, optional) – If set to a valid filepath, will write to this path, otherwise will use an in-memory gdal.Dataset.

  • units (str) – Slope units. Default: ‘degree’.

Returns:

Slope array.

Return type:

np.ndarray

Notes

Ensure that the DEM is in a projected coordinate, not a geographic coordinate system, so that the horizontal scale is the same as the vertical scale (m).
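
Examples

A minimal sketch; the input and output paths are hypothetical:

>>> from ravenpy.utilities.analysis import gdal_slope_analysis
>>> slope = gdal_slope_analysis("dem_utm.tif", set_output="slope.tif", units="degree")  # slope array in degrees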

ravenpy.utilities.analysis.geom_prop(geom: shapely.geometry.Polygon | shapely.geometry.MultiPolygon | shapely.geometry.GeometryCollection) dict[source]

Return a dictionary of geometry properties.

Parameters:

geom (Polygon or MultiPolygon or GeometryCollection) – Geometry to analyze.

Returns:

Dictionary storing polygon area, centroid location, perimeter and gravelius shape index.

Return type:

dict

Notes

Some of the properties should be computed using an equal-area projection.
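
Examples

A minimal sketch; the square below stands in for a watershed polygon in an equal-area projection:

>>> from shapely.geometry import Polygon
>>> from ravenpy.utilities.analysis import geom_prop
>>> poly = Polygon([(0, 0), (0, 1000), (1000, 1000), (1000, 0)])
>>> props = geom_prop(poly)  # area, centroid, perimeter and gravelius shape index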

ravenpy.utilities.calibration module

class ravenpy.utilities.calibration.SpotSetup(config: ~ravenpy.config.rvs.Config, low: ~ravenpy.config.base.Params | ~typing.Sequence, high: ~ravenpy.config.base.Params | ~typing.Sequence, workdir: str | ~pathlib.Path | None = None)[source]

Bases: object

evaluation()[source]

Return the observation.

Since Raven computes the objective function itself, we simply return a placeholder.

init_params(low: ~ravenpy.config.base.Params | ~typing.Sequence, high: ~ravenpy.config.base.Params | ~typing.Sequence)[source]

objectivefunction(evaluation, simulation)[source]

Return the objective function.

Note that we short-circuit the evaluation and simulation entries, since the objective function has already been computed by Raven.

parameters()[source]

Return a random parameter combination.

simulation(x)[source]

Run the model, but return a placeholder value instead of the model output.
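
Examples

A minimal calibration sketch using the spotpy DDS sampler; model_config (a symbolic emulator configuration) and the parameter bounds low and high are assumed to be defined beforehand:

>>> import spotpy
>>> from ravenpy.utilities.calibration import SpotSetup
>>> spot_setup = SpotSetup(config=model_config, low=low, high=high)
>>> sampler = spotpy.algorithms.dds(spot_setup, dbname="raven_calib", dbformat="ram")
>>> sampler.sample(50)  # run 50 model evaluations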

ravenpy.utilities.checks module

Checks for various geospatial and IO conditions.

ravenpy.utilities.checks.boundary_check(*args: str | Path, max_y: int | float = 60, min_y: int | float = -60) None[source]

Verify that boundaries do not exceed specific latitudes for geographic coordinate data. Emit a UserWarning if so.

Parameters:
  • *args (Sequence of str or Path) – str or Path to file(s)

  • max_y (int or float) – Maximum value allowed for latitude. Default: 60.

  • min_y (int or float) – Minimum value allowed for latitude. Default: -60.

ravenpy.utilities.checks.feature_contains(point: Tuple[str | float | int, str | float | int] | shapely.geometry.Point, shp: str | Path | List[str | Path]) dict | bool[source]

Return the first feature containing a location.

Parameters:
  • point (Tuple[str or float or int, str or float or int] or Point) – Geographic coordinates of a point (lon, lat) or a shapely Point.

  • shp (str or Path or list of str or Path) – The path to the file storing the geometries.

Returns:

The feature found.

Return type:

dict or bool

Notes

This is really slow. Another approach is to use the fiona.Collection.filter method.
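
Examples

A minimal sketch; the coordinates and the vector file name are hypothetical:

>>> from ravenpy.utilities.checks import feature_contains
>>> feat = feature_contains((-68.724444, 50.646667), "watersheds.geojson")  # first feature containing the point, or False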

ravenpy.utilities.checks.multipolygon_check(geom: shapely.geometry.GeometryCollection) None[source]

Verify that a geometry is a MultiPolygon.

Parameters:

geom (GeometryCollection)

Return type:

None

ravenpy.utilities.checks.single_file_check(file_list: Sequence[str | Path]) Any[source]

Return the first element of a file list. Raise an error if the list is empty or contains more than one element.

Parameters:

file_list (Sequence of str or Path)

ravenpy.utilities.coords module

ravenpy.utilities.coords.infer_scale_and_offset(da: DataArray, data_type: str, cumulative: bool = False) Tuple[float, float][source]

Return scale and offset parameters from data.

Infer scale and offset parameters describing the linear transformation from the units in file to Raven compliant units.

Parameters:
  • da (xr.DataArray) – Input data.

  • data_type (str) – Raven data type, e.g. ‘PRECIP’, ‘TEMP_AVE’, etc.

  • cumulative (bool) – Default: False.

Returns:

Scale and offset parameters.

Return type:

float, float

Notes

Does not work with accumulated variables.

ravenpy.utilities.coords.param(model)[source]

Return a parameter coordinate.

Parameters:

model (str) – Model name.

ravenpy.utilities.coords.realization(n)[source]

Return a realization coordinate.

Parameters:

n (int) – Size of the ensemble.

ravenpy.utilities.coords.units_transform(source, target, context='hydro')[source]

Return linear transform parameters to convert one unit to another.

If the target unit is given by y = ax + b, where x is the value of the source unit, then this function returns a, b.

Parameters:
  • source (str, pint.Unit) – Source unit string, pint-recognized.

  • target (str) – Target unit string, pint-recognized.

  • context (str, optional) – Context of unit conversion. Default: “hydro”.
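
Examples

A minimal sketch converting cubic metres per second to litres per second; since these units differ by a pure scaling factor, the expected result is a = 1000 and b = 0:

>>> from ravenpy.utilities.coords import units_transform
>>> a, b = units_transform("m**3 / s", "l / s")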

ravenpy.utilities.forecasting module

Created on Fri Jul 17 09:11:58 2020

@author: ets

ravenpy.utilities.forecasting.climatology_esp(config, workdir: str | Path | None = None, years: List[int] | None = None, overwrite: bool = False) EnsembleReader[source]

Ensemble Streamflow Prediction based on historical variability.

Run the model using forcing for different years. No model warm-up is performed by this function, make sure the initial states are consistent with the start date.

Parameters:
  • config (ravenpy.config.rvs.Config) – Model configuration.

  • years (List[int]) – Years from which forcing time series will be drawn. If None, run for all years where forcing data is available.

  • workdir (str or Path) – The path to rv files and model outputs. If None, create a temporary directory.

  • overwrite (bool) – Whether to overwrite existing values or not. Default: False

Returns:

Class facilitating the analysis of multiple Raven outputs.

Return type:

EnsembleReader
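
Examples

A minimal sketch; model_config is assumed to be a ravenpy.config.rvs.Config whose forcing data covers the listed years and whose initial states are consistent with its start date:

>>> from ravenpy.utilities.forecasting import climatology_esp
>>> esp = climatology_esp(model_config, years=[1982, 1983, 1984])  # one ensemble member per forcing year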

ravenpy.utilities.forecasting.compute_forecast_flood_risk(forecast: Dataset, flood_level: float, thredds: str = 'https://pavics.ouranos.ca/twitcher/ows/proxy/thredds/') Dataset[source]

Return the empirical exceedance probability for each forecast day, based on a flood level threshold.

Parameters:
  • forecast (xr.Dataset) – Ensemble or deterministic streamflow forecast.

  • flood_level (float) – Flood level threshold. Will be used to determine if forecasts exceed this specified flood threshold. Should be in the same units as the forecasted streamflow.

  • thredds (str) – The THREDDS server URL. Default: “https://pavics.ouranos.ca/twitcher/ows/proxy/thredds/”.

Returns:

Time series of probabilities of flood level exceedance.

Return type:

xr.Dataset

ravenpy.utilities.forecasting.ensemble_prediction(config, forecast: str | Path, ens_dim: str = 'member', workdir=None, overwrite=True, **kwds) EnsembleReader[source]

Ensemble Streamflow Prediction based on historical weather forecasts (CASPAR or other).

Run the model using forcing for different years. No model warm-up is performed by this function, make sure the initial states are consistent with the start date.

Parameters:
  • config (ravenpy.config.rvs.Config) – Model configuration.

  • forecast (str or Path) – Forecast subsetted to the catchment location (.nc).

  • ens_dim (str) – Name of dimension to iterate over.

  • workdir (str or Path) – The path to rv files and model outputs. If None, create temporary directory.

  • overwrite (bool) – Overwrite files when writing to disk.

  • **kwds – Keywords for the Gauge.from_nc function.

Returns:

Class facilitating the analysis of multiple Raven outputs.

Return type:

EnsembleReader

ravenpy.utilities.forecasting.hindcast_climatology_esp(config: Config, warm_up_duration: int, years: List[int] | None = None, hindcast_years: List[int] | None = None, workdir: str | Path | None = None, overwrite: bool = False) Dataset[source]

Hindcast of Ensemble Prediction Streamflow.

This function runs an emulator initialized for each year in hindcast_years, using the forcing time series for each year in years. This allows an assessment of the performance of the ESP. The total number of simulations is given by len(years) * len(hindcast_years).

Parameters:
  • config (ravenpy.config.rvs.Config) – Model configuration. Initial states will be overwritten.

  • warm_up_duration (int) – Number of days to run the model prior to the starting date to initialize the state variables.

  • workdir (Path) – Work directory. If None, creates a temporary directory.

  • years (List[int]) – Years from which forcing time series will be drawn. If None, run for all years where forcing data is available.

  • hindcast_years (List[int]) – Years for which the model will be initialized and the climatology_esp function run. Defaults to all years when forcing data is available.

  • overwrite (bool) – If True, overwrite existing files.

Returns:

The simulated streamflow (qsim) with dimensions (init, member, lead), ready for use in climpred.

Return type:

xarray.DataArray

Notes

The dataset output dimensions are:
  • init: hindcast issue date,

  • member: ESP members of the hindcasting experiment,

  • lead: number of lead days of the forecast.
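
Examples

A minimal sketch; model_config is assumed to be defined, and the chosen years must be covered by the forcing data:

>>> from ravenpy.utilities.forecasting import hindcast_climatology_esp
>>> hcast = hindcast_climatology_esp(
...     model_config,
...     warm_up_duration=365,
...     years=list(range(1981, 1991)),
...     hindcast_years=list(range(1985, 1990)),
... )  # len(years) * len(hindcast_years) simulations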

ravenpy.utilities.forecasting.hindcast_from_meteo_forecast(config, forecast: str | Path, ens_dim: str = 'member', workdir=None, overwrite=True, **kwds) EnsembleReader

Ensemble Streamflow Prediction based on historical weather forecasts (CASPAR or other).

Run the model using forcing for different years. No model warm-up is performed by this function, make sure the initial states are consistent with the start date.

Parameters:
  • config (ravenpy.config.rvs.Config) – Model configuration.

  • forecast (str or Path) – Forecast subsetted to the catchment location (.nc).

  • ens_dim (str) – Name of dimension to iterate over.

  • workdir (str or Path) – The path to rv files and model outputs. If None, create temporary directory.

  • overwrite (bool) – Overwrite files when writing to disk.

  • **kwds – Keywords for the Gauge.from_nc function.

Returns:

Class facilitating the analysis of multiple Raven outputs.

Return type:

EnsembleReader

ravenpy.utilities.forecasting.to_climpred_hindcast_ensemble(hindcast: Dataset, observations: Dataset) climpred.HindcastEnsemble[source]

Create a hindcasting object that can be used by the climpred toolbox for hindcast verification.

Parameters:
  • hindcast (xarray.Dataset) – The hindcasted streamflow data for a given period.

  • observations (xarray.Dataset) – The streamflow observations that are used to verify the hindcasts.

Returns:

The hindcast ensemble formatted to be used in climpred.

Return type:

climpred.HindcastEnsemble
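
Examples

A minimal sketch; hindcast (e.g. the output of hindcast_climatology_esp) and q_obs are assumed to be xarray Datasets with matching time coordinates:

>>> from ravenpy.utilities.forecasting import to_climpred_hindcast_ensemble
>>> hindcast_ensemble = to_climpred_hindcast_ensemble(hindcast, q_obs)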

ravenpy.utilities.forecasting.warm_up(config, duration: int, workdir: str | Path | None = None, overwrite: bool = False) Config[source]

Run the model on a time series preceding the start date.

Parameters:
  • config (ravenpy.config.rvs.Config) – Model configuration.

  • duration (int) – Number of days the warm-up simulation should last before the start date.

  • workdir (Path) – Work directory.

  • overwrite (bool) – If True, overwrite existing files.

Returns:

Model configuration with initial state set by running the model prior to the start date.

Return type:

ravenpy.config.rvs.Config
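
Examples

A minimal sketch; model_config is assumed to be defined, with forcing data covering the year preceding its start date:

>>> from ravenpy.utilities.forecasting import warm_up
>>> warmed_config = warm_up(model_config, duration=365)  # initial states set from a one-year spin-up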

ravenpy.utilities.geo module

Tools for performing geospatial translations and transformations.

ravenpy.utilities.geo.determine_upstream_ids(fid: str, df: DataFrame | geopandas.GeoDataFrame, basin_field: str | None = None, downstream_field: str | None = None, basin_family: str | None = None) DataFrame | geopandas.GeoDataFrame[source]

Return a list of upstream features by evaluating the downstream networks.

Parameters:
  • fid (str) – feature ID of the downstream feature of interest.

  • df (pd.DataFrame) – A Dataframe comprising the watershed attributes.

  • basin_field (str) – The field used to determine the id of the basin according to hydro project.

  • downstream_field (str) – The field identifying the downstream sub-basin for the hydro project.

  • basin_family (str, optional) – Regional watershed code (For HydroBASINS dataset).

Returns:

Basin IDs, including fid and its upstream contributors.

Return type:

pd.DataFrame
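
Examples

A minimal sketch on a toy three-basin network using HydroBASINS-style field names; basin 3 drains into 2, which drains into 1:

>>> import pandas as pd
>>> from ravenpy.utilities.geo import determine_upstream_ids
>>> df = pd.DataFrame({"HYBAS_ID": [1, 2, 3], "NEXT_DOWN": [0, 1, 2]})
>>> up = determine_upstream_ids(fid=1, df=df, basin_field="HYBAS_ID", downstream_field="NEXT_DOWN")  # expected: rows for basins 1, 2 and 3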

ravenpy.utilities.geo.find_geometry_from_coord(lon: float, lat: float, df: geopandas.GeoDataFrame) geopandas.GeoDataFrame[source]

Return the geometry containing the given coordinates.

Parameters:
  • lon (float) – Longitude.

  • lat (float) – Latitude.

  • df (GeoDataFrame) – Data.

Returns:

Record whose geometry contains the point.

Return type:

GeoDataFrame

ravenpy.utilities.geo.generic_raster_clip(raster: str | Path, output: str | Path, geometry: shapely.geometry.Polygon | shapely.geometry.MultiPolygon | List[shapely.geometry.Polygon | shapely.geometry.MultiPolygon], touches: bool = False, fill_with_nodata: bool = True, padded: bool = True, raster_compression: str = 'lzw') None[source]

Crop a raster file to a given geometry.

Parameters:
  • raster (Union[str, Path]) – Path to input raster.

  • output (Union[str, Path]) – Path to output raster.

  • geometry (Union[Polygon, MultiPolygon, List[Union[Polygon, MultiPolygon]]]) – Geometry defining the region to crop.

  • touches (bool) – Whether to include cells that intersect the geometry. Default: False.

  • fill_with_nodata (bool) – Whether to set pixel values outside the shape to nodata or keep their original values. Default: True.

  • padded (bool) – Whether to add a half-pixel buffer to the shape before masking. Default: True.

  • raster_compression (str) – Raster data compression method. Default: ‘lzw’.

Return type:

None

ravenpy.utilities.geo.generic_raster_warp(raster: str | Path, output: str | Path, target_crs: str | dict | pyproj.CRS, raster_compression: str = 'lzw') None[source]

Reproject a raster file.

Parameters:
  • raster (Union[str, Path]) – Path to input raster.

  • output (Union[str, Path]) – Path to output raster.

  • target_crs (str or dict or pyproj.CRS) – Target projection identifier.

  • raster_compression (str) – Raster data compression method. Default: ‘lzw’.

Return type:

None

ravenpy.utilities.geo.generic_vector_reproject(vector: str | Path, projected: str | Path, source_crs: str | pyproj.CRS = 4326, target_crs: str | pyproj.CRS | None = None) None[source]

Reproject all features and layers within a vector file, writing the output as GeoJSON.

Parameters:
  • vector (Union[str, Path]) – Path to a file containing a valid vector layer.

  • projected (Union[str, Path]) – Path to a file to be written.

  • source_crs (Union[str, pyproj.crs.CRS]) – CRS for the source geometry. Default: 4326.

  • target_crs (Union[str, pyproj.crs.CRS]) – CRS for the target geometry.

Return type:

None

ravenpy.utilities.geo.geom_transform(geom: shapely.geometry.GeometryCollection | shapely.geometry.shape, source_crs: str | int | pyproj.CRS = 4326, target_crs: str | int | pyproj.CRS | None = None) shapely.geometry.GeometryCollection[source]

Change the projection of a geometry.

Assuming a geometry’s coordinates are in a source_crs, compute the new coordinates under the target_crs.

Parameters:
  • geom (Union[GeometryCollection, shape]) – Source geometry.

  • source_crs (Union[str, int, CRS]) – Projection identifier (proj4) for the source geometry, e.g. ‘+proj=longlat +datum=WGS84 +no_defs’.

  • target_crs (Union[str, int, CRS]) – Projection identifier (proj4) for the target geometry.

Returns:

Reprojected geometry.

Return type:

GeometryCollection
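
Examples

A minimal sketch reprojecting a point from WGS84 to UTM zone 19N (EPSG:32619, chosen here only as an illustrative projected CRS):

>>> from shapely.geometry import Point
>>> from ravenpy.utilities.geo import geom_transform
>>> pt = Point(-68.724444, 50.646667)  # lon, lat
>>> pt_utm = geom_transform(pt, source_crs=4326, target_crs=32619)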

ravenpy.utilities.geoserver module

GeoServer interaction operations.

Working assumptions for this module:
  • Point coordinates are passed as shapely.geometry.Point instances.

  • BBox coordinates are passed as (lon1, lat1, lon2, lat2).

  • Shapes (polygons) are passed as shapely.geometry.shape parsable objects.

  • All functions that require a CRS have a CRS argument with a default set to WGS84.

  • GEOSERVER_URL points to the GeoServer instance hosting all files.

  • For legacy reasons, we also accept the GEO_URL environment variable.

TODO: Refactor to remove functions that are only two lines of code. For example, the logic of many functions essentially consists of creating the layer name. We could have a function that returns the layer name, and have the other functions expect the layer name.

ravenpy.utilities.geoserver.filter_hydro_routing_attributes_wfs(attribute: str | None = None, value: str | float | int | None = None, level: int = 12, lakes: str = '1km', geoserver: str = 'https://pavics.ouranos.ca/geoserver/') str[source]

Return a URL that formats and returns a remote GetFeatures request from hydro routing dataset.

For geographic rasters, subsetting is based on WGS84 (Long, Lat) boundaries. If not geographic, subsetting based on projected coordinate system (Easting, Northing) boundaries.

Parameters:
  • attribute (str, optional) – Attribute/field to be queried.

  • value (str or int or float) – The requested value for the attribute.

  • level (int) – Level of granularity requested for the lakes vector (range(7,13)). Default: 12.

  • lakes ({"1km", "all"}) – Query the version of dataset with lakes under 1km in width removed (“1km”) or return all lakes (“all”).

  • geoserver (str) – The address of the geoserver housing the layer to be queried. Default: https://pavics.ouranos.ca/geoserver/.

Returns:

URL to the GeoJSON-encoded WFS response.

Return type:

str

ravenpy.utilities.geoserver.filter_hydrobasins_attributes_wfs(attribute: str, value: str | float | int, domain: str, geoserver: str = 'https://pavics.ouranos.ca/geoserver/') str[source]

Return a URL that formats and returns a remote GetFeatures request from the USGS HydroBASINS dataset.

For geographic raster grids, subsetting is based on WGS84 (Long, Lat) boundaries. If not geographic, subsetting based on projected coordinate system (Easting, Northing) boundaries.

Parameters:
  • attribute (str) – Attribute/field to be queried.

  • value (str or float or int) – Value for attribute queried.

  • domain ({"na", "ar"}) – The domain of the HydroBASINS data.

  • geoserver (str) – The address of the geoserver housing the layer to be queried. Default: https://pavics.ouranos.ca/geoserver/.

Returns:

URL to the GeoJSON-encoded WFS response.

Return type:

str

ravenpy.utilities.geoserver.get_hydro_routing_attributes_wfs(attribute: Sequence[str], level: int = 12, lakes: str = '1km', geoserver: str = 'https://pavics.ouranos.ca/geoserver/') str[source]

Return a URL that formats and returns a remote GetFeatures request from hydro routing dataset.

For geographic rasters, subsetting is based on WGS84 (Long, Lat) boundaries. If not geographic, subsetting based on projected coordinate system (Easting, Northing) boundaries.

Parameters:
  • attribute (Sequence of str) – Attributes/fields to be queried.

  • level (int) – Level of granularity requested for the lakes vector (range(7,13)). Default: 12.

  • lakes ({"1km", "all"}) – Query the version of dataset with lakes under 1km in width removed (“1km”) or return all lakes (“all”).

  • geoserver (str) – The address of the geoserver housing the layer to be queried. Default: https://pavics.ouranos.ca/geoserver/.

Returns:

URL to the GeoJSON-encoded WFS response.

Return type:

str

ravenpy.utilities.geoserver.get_hydro_routing_location_wfs(coordinates: Tuple[str | float | int, str | float | int], lakes: str, level: int = 12, geoserver: str = 'https://pavics.ouranos.ca/geoserver/') dict[source]

Return features from the hydro routing data set using bounding box coordinates.

For geographic rasters, subsetting is based on WGS84 (Long, Lat) boundaries. If not geographic, subsetting based on projected coordinate system (Easting, Northing) boundaries.

Parameters:
  • coordinates (Tuple[str or float or int, str or float or int]) – Geographic coordinates of the bounding box (left, down, right, up).

  • lakes ({"1km", "all"}) – Query the version of dataset with lakes under 1km in width removed (“1km”) or return all lakes (“all”).

  • level (int) – Level of granularity requested for the lakes vector (range(7,13)). Default: 12.

  • geoserver (str) – The address of the geoserver housing the layer to be queried. Default: https://pavics.ouranos.ca/geoserver/.

Returns:

A GeoJSON-derived dictionary of vector features (FeatureCollection).

Return type:

dict

ravenpy.utilities.geoserver.get_hydrobasins_location_wfs(coordinates: Tuple[str | float | int, str | float | int], domain: str | None = None, geoserver: str = 'https://pavics.ouranos.ca/geoserver/') Dict[str, str | int | float][source]

Return features from the USGS HydroBASINS data set using bounding box coordinates.

For geographic raster grids, subsetting is based on WGS84 (Long, Lat) boundaries. If not geographic, subsetting based on projected coordinate system (Easting, Northing) boundaries.

Parameters:
  • coordinates (Tuple[str or float or int, str or float or int]) – Geographic coordinates of the bounding box (left, down, right, up).

  • domain ({"na", "ar"}) – The domain of the HydroBASINS data.

  • geoserver (str) – The address of the geoserver housing the layer to be queried. Default: https://pavics.ouranos.ca/geoserver/.

Returns:

A GeoJSON-encoded vector feature.

Return type:

dict

ravenpy.utilities.geoserver.get_raster_wcs(coordinates: Iterable | Sequence[float | str], geographic: bool = True, layer: str | None = None, geoserver: str = 'https://pavics.ouranos.ca/geoserver/') bytes[source]

Return a subset of a raster image from the local GeoServer via WCS 2.0.1 protocol.

For geographic raster grids, subsetting is based on WGS84 (Long, Lat) boundaries. If not geographic, subsetting based on projected coordinate system (Easting, Northing) boundaries.

Parameters:
  • coordinates (Sequence of int or float or str) – Geographic coordinates of the bounding box (left, down, right, up)

  • geographic (bool) – If True, uses “Long” and “Lat” in WCS call. Otherwise, uses “E” and “N”.

  • layer (str) – Layer name of raster exposed on GeoServer instance, e.g. ‘public:CEC_NALCMS_LandUse_2010’

  • geoserver (str) – The address of the geoserver housing the layer to be queried. Default: https://pavics.ouranos.ca/geoserver/.

Returns:

A GeoTIFF array.

Return type:

bytes

ravenpy.utilities.geoserver.hydro_routing_upstream(fid: str | float | int, level: int = 12, lakes: str = '1km', geoserver: str = 'https://pavics.ouranos.ca/geoserver/') Series[source]

Return a list of hydro routing features located upstream.

Parameters:
  • fid (str or float or int) – Basin feature ID code of the downstream feature.

  • level (int) – Level of granularity requested for the lakes vector (range(7,13)). Default: 12.

  • lakes ({"1km", "all"}) – Query the version of dataset with lakes under 1km in width removed (“1km”) or return all lakes (“all”).

  • geoserver (str) – The address of the geoserver housing the layer to be queried. Default: https://pavics.ouranos.ca/geoserver/.

Returns:

Basin IDs, including fid and its upstream contributors.

Return type:

pd.Series

ravenpy.utilities.geoserver.hydrobasins_aggregate(gdf: DataFrame) DataFrame[source]

Aggregate multiple HydroBASINS watersheds into a single geometry.

Parameters:

gdf (pd.DataFrame) – Watershed attributes indexed by HYBAS_ID

Return type:

pd.DataFrame

ravenpy.utilities.geoserver.hydrobasins_upstream(feature: dict, domain: str) DataFrame[source]

Return a list of HydroBASINS features located upstream.

Parameters:
  • feature (dict) – Basin feature attributes, including the fields [“HYBAS_ID”, “NEXT_DOWN”, “MAIN_BAS”].

  • domain ({"na", "ar"}) – Domain of the feature, North America or Arctic.

Returns:

Basin IDs, including the queried feature and its upstream contributors.

Return type:

pd.DataFrame

ravenpy.utilities.geoserver.select_hybas_domain(bbox: Tuple[int | float, int | float, int | float, int | float] | None = None, point: Tuple[int | float, int | float] | None = None) str[source]

Given a coordinate or bounding box, return the domain name of the geographic region within which it is located.

Parameters:
  • bbox (Optional[Tuple[Union[float, int], Union[float, int], Union[float, int], Union[float, int]]]) – Geographic coordinates of the bounding box (left, down, right, up).

  • point (Optional[Tuple[Union[float, int], Union[float, int]]]) – Geographic coordinates of an intersecting point (lon, lat).

Returns:

The domain that the coordinate falls within. Possible results: “na”, “ar”.

Return type:

str
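
Examples

A minimal sketch; the point lies in southern Quebec, so the expected domain is “na”:

>>> from ravenpy.utilities.geoserver import select_hybas_domain
>>> domain = select_hybas_domain(point=(-68.724444, 50.646667))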

ravenpy.utilities.graphs module

Library to perform graphs for the streamflow time series analysis.

The following graphs can be plotted:
  • hydrograph

  • mean_annual_hydrograph

  • spaghetti_annual_hydrograph

ravenpy.utilities.graphs.forecast(file: str | Path, fcst_var: str = 'q_sim') Figure[source]

Create a graphic of the hydrograph for each forecast member.

Parameters:
  • file (str or Path) – Raven output file containing simulated streamflows.

  • fcst_var (str) – Name of the streamflow variable.

Return type:

matplotlib.pyplot.Figure

ravenpy.utilities.graphs.hindcast(file: str | Path, fcst_var: str, qobs: str | Path, qobs_var: str) Figure[source]

Create a graphic of the hydrograph for each hindcast member.

Parameters:
  • file (str or Path) – Raven output file containing simulated streamflows.

  • fcst_var (str) – Name of the streamflow variable.

  • qobs (str or Path) – Streamflow observation file, with times matching the hindcast.

  • qobs_var (str) – Name of the streamflow observation variable.

Return type:

matplotlib.pyplot.Figure

ravenpy.utilities.graphs.hydrograph(file_list: Sequence[str | Path])[source]

Create a graphic of the hydrograph for each model simulation.

Parameters:

file_list (Sequence of str or Path) – Raven output files containing simulated streamflows.

ravenpy.utilities.graphs.mean_annual_hydrograph(file_list: Sequence[str | Path])[source]

Create a graphic of the mean hydrological cycle for each model simulation.

Parameters:

file_list (Sequence of str or Path) – Raven output files containing simulated streamflows.

ravenpy.utilities.graphs.spaghetti_annual_hydrograph(file: str | Path)[source]

Create a spaghetti plot of the mean hydrological cycle for one model simulation.

The mean simulation is also displayed.

Parameters:

file (str or Path) – Raven output file containing simulated streamflows of one model.

ravenpy.utilities.graphs.ts_fit_graph(ts: DataArray, params: DataArray) Figure[source]

Create graphic showing a histogram of the data and the distribution fitted to it.

The graphic contains one panel per watershed.

Parameters:
  • ts (xr.DataArray) – Stream flow time series with dimensions (time, nbasins).

  • params (xr.DataArray) – Fitted distribution parameters returned by xclim.land.fit indicator.

Returns:

Figure showing a histogram and the parameterized pdf.

Return type:

matplotlib.pyplot.Figure

ravenpy.utilities.graphs.ts_graphs(file, trend=True, alpha=0.05)[source]

Create a figure of time series statistics so one can assess whether a trend is present in the data.

Parameters:

file (str or Path) – xarray-compatible file containing streamflow statistics for one run.

ravenpy.utilities.io module

Tools for reading and writing geospatial data formats.

ravenpy.utilities.io.address_append(address: str | Path) str[source]

Format a URL/URI to be more easily read with libraries such as “rasterstats”.

Parameters:

address (Union[str, Path]) – URL/URI to a potential zip or tar file

Returns:

URL/URI prefixed for archive type

Return type:

str

ravenpy.utilities.io.archive_sniffer(archives: str | Path | List[str | Path], working_dir: str | Path | None = None, extensions: Sequence[str] | None = None) List[str | Path][source]

Return a list of locally unarchived files that match the desired extensions.

Parameters:
  • archives (str or Path or list of str or Path) – Archive location or list of archive locations.

  • working_dir (str or Path, optional) – String or Path to a working location.

  • extensions (Sequence of str, optional) – List of accepted extensions.

Returns:

List of files with matching accepted extensions.

Return type:

list of str or Path

ravenpy.utilities.io.crs_sniffer(*args: str | Path | Sequence[str | Path]) List[int | str] | str | int[source]

Return the list of CRS found in files.

Parameters:

args (Union[str, Path, Sequence[Union[str, Path]]]) – Path(s) to the file(s) to examine.

Returns:

Returns either a list of CRSes or a single CRS definition, depending on the number of instances found.

Return type:

Union[List[str], str]

ravenpy.utilities.io.generic_extract_archive(resources: str | Path | List[bytes | str | Path], output_dir: str | Path | None = None) List[str][source]

Extract archives (tar/zip) to a working directory.

Parameters:
  • resources (str or Path or list of bytes or str or Path) – List of archive files (if netCDF files are in list, they are passed and returned as well in the return).

  • output_dir (str or Path, optional) – String or Path to a working location (default: temporary folder).

Returns:

List of original or of extracted files.

Return type:

list

ravenpy.utilities.io.get_bbox(vector: str | Path, all_features: bool = True) Tuple[float, float, float, float][source]

Return bounding box of all features or the first feature in file.

Parameters:
  • vector (str or Path) – A path to file storing vector features.

  • all_features (bool) – Return the bounding box for all features. Default: True.

Returns:

Geographic coordinates of the bounding box (lon0, lat0, lon1, lat1).

Return type:

float, float, float, float
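
Examples

A minimal sketch; the vector file name is hypothetical:

>>> from ravenpy.utilities.io import get_bbox
>>> lon0, lat0, lon1, lat1 = get_bbox("watershed.geojson")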

ravenpy.utilities.io.is_within_directory(directory: str | PathLike, target: str | PathLike) bool[source]

ravenpy.utilities.io.raster_datatype_sniffer(file: str | Path) str[source]

Return the type of the raster stored in the file.

Parameters:

file (Union[str, Path]) – Path to file.

Returns:

rasterio datatype of array values

Return type:

str

ravenpy.utilities.io.safe_extract(tar: TarFile, path: str = '.', members=None, *, numeric_owner=False) None[source]

ravenpy.utilities.mk_test module

Created on Wed Jul 29 09:16:06 2015 @author: Michael Schramm

ravenpy.utilities.mk_test.check_num_samples(beta: float, delta: float, std_dev: float, alpha: float = 0.05, n: float = 4, num_iter: int = 1000, tol: float = 1e-06, num_cycles: int = 10000, m: int = 5) int | float[source]

Check number of samples.

This function is an implementation of the “Calculation of Number of Samples Required to Detect a Trend” section written by Sat Kumar Tomer (satkumartomer@gmail.com), which can be found at: http://vsp.pnnl.gov/help/Vsample/Design_Trend_Mann_Kendall.htm

As stated on the webpage above, the method uses a Monte-Carlo simulation to determine the required number of points in time, n, to take a measurement in order to detect a linear trend for specified small probabilities that the MK test will make decision errors. If a non-linear trend is actually present, then the value of n computed by VSP is only an approximation to the correct n. If non-detects are expected in the resulting data, then the value of n computed by VSP is only an approximation to the correct n, and this approximation will tend to be less accurate as the number of non-detects increases.

Parameters:
  • beta (float) – Probability of falsely accepting the null hypothesis

  • delta (float) – Change per sample period, i.e., the change that occurs between two adjacent sampling times

  • std_dev (float) – Standard deviation of the sample points.

  • alpha (float) – Significance level (0.05 default)

  • n (int) – Initial number of sample points (4 default).

  • num_iter (int) – Number of iterations of the Monte-Carlo simulation (1000 default).

  • tol (float) – Tolerance level to decide if the predicted probability is close enough to the required statistical power value (1e-6 default).

  • num_cycles (int) – Total number of cycles of the simulation. This is to ensure that the simulation does finish regardless of convergence or not (10000 default).

  • m (int) – If the tolerance is too small then the simulation could continue to cycle through the same sample numbers over and over. This parameter determines how many cycles to look back. If the same number of samples has been determined m cycles ago then the simulation will stop.

Examples

>>> num_samples = check_num_samples(0.2, 1, 0.1)

ravenpy.utilities.mk_test.mk_test_calc(x: ndarray, alpha: float = 0.05) Tuple[str, float, float, float] | None[source]

Mann-Kendall test calculation.

This function is derived from code originally posted by Sat Kumar Tomer (satkumartomer@gmail.com). See also: http://vsp.pnnl.gov/help/Vsample/Design_Trend_Mann_Kendall.htm

The purpose of the Mann-Kendall (MK) test (Mann 1945, Kendall 1975, Gilbert 1987) is to statistically assess if there is a monotonic upward or downward trend of the variable of interest over time. A monotonic upward (downward) trend means that the variable consistently increases (decreases) through time, but the trend may or may not be linear. The MK test can be used in place of a parametric linear regression analysis, which can be used to test if the slope of the estimated linear regression line is different from zero. The regression analysis requires that the residuals from the fitted regression line be normally distributed; this assumption is not required by the MK test, that is, the MK test is a non-parametric (distribution-free) test. Hirsch, Slack and Smith (1982, page 107) indicate that the MK test is best viewed as an exploratory analysis and is most appropriately used to identify stations where changes are significant or of large magnitude and to quantify these findings.

Parameters:
  • x (np.array) – a vector of data.

  • alpha (float) – significance level (0.05 default)

Return type:

str, float, float, float

Notes

  • trend: the direction of the trend (increasing, decreasing or no trend)

  • h: True if a trend is present, False otherwise

  • p: p-value of the significance test

  • z: normalized test statistic

Examples

>>> import numpy as np
>>> x = np.random.rand(100)
>>> trend, h, p, z = mk_test_calc(x, 0.05)

ravenpy.utilities.nb_graphs module

This module contains functions creating web-friendly interactive graphics using holoviews.

The graphic outputs are meant to be displayed in a notebook. In a console, use hvplot.show(fig) to render the figures.

ravenpy.utilities.nb_graphs.hydrographs(ds: Dataset)[source]

Return a graphic showing the discharge simulations and observations.

ravenpy.utilities.nb_graphs.mean_annual_hydrograph(ds: Dataset)[source]

Return a graphic showing the discharge simulations and observations.

ravenpy.utilities.nb_graphs.spaghetti_annual_hydrograph(ds: Dataset)[source]

Create a spaghetti plot of the mean hydrological cycle for one model simulation.

ravenpy.utilities.nb_graphs.ts_fit_graph(ts: DataArray, params: DataArray) Figure[source]

Create graphic showing a histogram of the data and the distribution fitted to it.

The graphic contains one panel per watershed.

Parameters:
  • ts (xr.DataArray) – Stream flow time series with dimensions (time, nbasins).

  • params (xr.DataArray) – Fitted distribution parameters returned by xclim.land.fit indicator.

Returns:

Figure showing a histogram and the parameterized pdf.

Return type:

matplotlib.pyplot.Figure

ravenpy.utilities.ravenio module

Tools for reading outputs and writing inputs for the Raven executable.

ravenpy.utilities.ravenio.parse_configuration(fn) Dict[str, Any][source]

Parse Raven configuration file.

Returns a dictionary keyed by parameter name.

ravenpy.utilities.regionalization module

Tools for hydrological regionalization.

ravenpy.utilities.regionalization.IDW(qsims: DataArray, dist: Series) DataArray[source]

Inverse distance weighting.

Parameters:
  • qsims (xr.DataArray) – Ensemble of hydrographs stacked along the members dimension.

  • dist (pd.Series) – Distance from the catchment which generated each hydrograph to the target catchment.

Returns:

Inverse distance weighted average of ensemble.

Return type:

xr.DataArray

ravenpy.utilities.regionalization.distance(gauged: DataFrame, ungauged: Series) Series[source]

Return the geographic distance [km] between the ungauged catchment and a database of gauged catchments.

Parameters:
  • gauged (pd.DataFrame) – Table containing columns for longitude and latitude of catchment’s centroid.

  • ungauged (pd.Series) – Coordinates of the ungauged catchment.

Return type:

pd.Series

ravenpy.utilities.regionalization.multiple_linear_regression(source: DataFrame, params: DataFrame, target: DataFrame) Tuple[List[Any], List[int]][source]

Multiple Linear Regression for model parameters over catchment properties.

Uses known catchment properties and model parameters to estimate the model parameters of an ungauged catchment from its properties.

Parameters:
  • source (pd.DataFrame) – Properties of gauged catchments.

  • params (pd.DataFrame) – Model parameters of gauged catchments.

  • target (pd.DataFrame) – Properties of the ungauged catchment.

Returns:

A named tuple of the estimated model parameters and the R2 of the linear regression.

Return type:

list of Any, list of int

ravenpy.utilities.regionalization.read_gauged_params(model)[source]

Return a table of Nash-Sutcliffe Efficiency values and model parameters for North American catchments.

Returns:

  • pd.DataFrame – Nash-Sutcliffe Efficiency keyed by catchment ID.

  • pd.DataFrame – Model parameters keyed by catchment ID.

ravenpy.utilities.regionalization.read_gauged_properties(properties) DataFrame[source]

Return table of gauged catchments properties over North America.

Returns:

Catchment properties keyed by catchment ID.

Return type:

pd.DataFrame

ravenpy.utilities.regionalization.regionalization_params(method: str, gauged_params: DataFrame, gauged_properties: DataFrame, ungauged_properties: DataFrame, filtered_params: DataFrame, filtered_prop: DataFrame) List[float][source]

Return the model parameters to use for the regionalization.

Parameters:
  • method ({'MLR', 'SP', 'PS', 'SP_IDW', 'PS_IDW', 'SP_IDW_RA', 'PS_IDW_RA'}) – Name of the regionalization method to use.

  • gauged_params (pd.DataFrame) – A DataFrame of parameters for donor catchments (size = number of donors)

  • gauged_properties (pd.DataFrame) – A DataFrame of properties of the donor catchments (size = number of donors)

  • ungauged_properties (pd.DataFrame) – A DataFrame of properties of the ungauged catchment (size = 1)

  • filtered_params (pd.DataFrame) – A DataFrame of parameters of all filtered catchments (size = all catchments with NSE > min_NSE)

  • filtered_prop (pd.DataFrame) – A DataFrame of properties of all filtered catchments (size = all catchments with NSE > min_NSE)

Returns:

List of model parameters to be used for the regionalization.

Return type:

list

ravenpy.utilities.regionalization.regionalize(config: Config, method: str, nash: Series, params: DataFrame | None = None, props: DataFrame | None = None, target_props: Series | dict | None = None, size: int = 5, min_NSE: float = 0.6, workdir: str | Path | None = None, overwrite: bool = False, **kwds) Tuple[DataArray, Dataset][source]

Perform regionalization for catchment whose outlet is defined by coordinates.

Parameters:
  • config (ravenpy.config.rvs.Config) – Symbolic emulator configuration. Only GR4JCN, HMETS and Mohyse are supported.

  • method ({'MLR', 'SP', 'PS', 'SP_IDW', 'PS_IDW', 'SP_IDW_RA', 'PS_IDW_RA'}) – Name of the regionalization method to use.

  • nash (pd.Series) – NSE values for the parameters of gauged catchments.

  • params (pd.DataFrame) – Model parameters of gauged catchments. Needed for all but the MLR method.

  • props (pd.DataFrame) – Properties of gauged catchments to be analyzed for the regionalization. Needed for MLR and RA methods.

  • target_props (pd.Series or dict) – Properties of ungauged catchment. Needed for MLR and RA methods.

  • size (int) – Number of catchments to use in the regionalization.

  • min_NSE (float) – Minimum calibration NSE value required to be considered as a donor.

  • workdir (Union[str, Path]) – Work directory. If None, a temporary directory will be created.

  • overwrite (bool) – If True, existing files will be overwritten.

  • **kwds – Model configuration parameters, including the forcing files (ts).

Returns:

  • (qsim, ensemble)

  • qsim (DataArray (time, )) – Multi-donor averaged predicted streamflow.

  • ensemble (Dataset) – Dataset with the following variables:

    – q_sim (DataArray (realization, time)) – Ensemble of members based on the number of donors.

    – parameter (DataArray (realization, param)) – Parameters used to run the model.
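
Examples

A minimal sketch of a spatial-proximity regionalization with inverse distance weighting; model_config (a symbolic GR4JCN emulator configuration), the gauged-catchment tables nash, params and props, and target_props are assumed to be prepared beforehand, for instance with read_gauged_params and read_gauged_properties:

>>> from ravenpy.utilities.regionalization import regionalize
>>> qsim, ens = regionalize(
...     model_config,
...     method="SP_IDW",
...     nash=nash,
...     params=params,
...     props=props,
...     target_props=target_props,
...     size=5,
...     min_NSE=0.6,
... )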

ravenpy.utilities.regionalization.similarity(gauged: DataFrame, ungauged: DataFrame, kind: str = 'ptp') Series[source]

Return similarity measure between gauged and ungauged catchments.

Parameters:
  • gauged (pd.DataFrame) – Gauged catchment properties.

  • ungauged (pd.DataFrame) – Ungauged catchment properties

  • kind ({'ptp', 'std', 'iqr'}) – Normalization method: peak to peak (maximum - minimum), standard deviation, inter-quartile range.

Return type:

pd.Series

ravenpy.utilities.testdata module

Tools for searching for and acquiring test data.

ravenpy.utilities.testdata.get_file(name: str | Path | Sequence[str | Path], github_url: str = 'https://github.com/Ouranosinc/raven-testdata', branch: str = 'master', cache_dir: str | Path = '/home/docs/.cache/raven_testing_data') Path | List[Path][source]

Return a file from an online GitHub-like repository. If a local copy is found then always use that to avoid network traffic.

Parameters:
  • name (str or Path or Sequence of str or Path) – Name of the file or list/tuple of names of files containing the dataset(s) including suffixes.

  • github_url (str) – URL to GitHub repository where the data is stored.

  • branch (str) – For GitHub-hosted files, the branch to download from. Default: “master”.

  • cache_dir (str or Path) – The directory in which to search for and write cached data.

Return type:

Path or list of Path

ravenpy.utilities.testdata.get_local_testdata(patterns: str | Sequence[str], temp_folder: str | Path, branch: str = 'master', _local_cache: str | Path = '/home/docs/.cache/raven_testing_data') Path | List[Path][source]

Copy specific testdata from a default cache to a temporary folder.

Return files matching the pattern in the default cache directory, copying them to a local temporary folder.

Parameters:
  • patterns (str or Sequence of str) – Glob patterns, which must include the folder.

  • temp_folder (str or Path) – Target folder to copy files and filetree to.

  • branch (str) – For GitHub-hosted files, the branch to download from. Default: “master”.

  • _local_cache (str or Path) – Local cache of testing data.

Return type:

Union[Path, List[Path]]

ravenpy.utilities.testdata.open_dataset(name: str, suffix: str | None = None, dap_url: str | None = None, github_url: str = 'https://github.com/Ouranosinc/raven-testdata', branch: str = 'master', cache: bool = True, cache_dir: str | Path = '/home/docs/.cache/raven_testing_data', **kwds) Dataset[source]

Open a dataset from the online GitHub-like repository.

If a local copy is found then always use that to avoid network traffic.

Parameters:
  • name (str) – Name of the file containing the dataset. If no suffix is given, assumed to be netCDF (‘.nc’ is appended).

  • suffix (str, optional) – If no suffix is given, assumed to be netCDF (‘.nc’ is appended). For no suffix, set “”.

  • dap_url (str, optional) – URL to OPeNDAP folder where the data is stored. If supplied, supersedes github_url.

  • github_url (str) – URL to GitHub repository where the data is stored.

  • branch (str, optional) – For GitHub-hosted files, the branch to download from.

  • cache (bool) – If True, then cache data locally for use on subsequent calls.

  • cache_dir (str or Path) – The directory in which to search for and write cached data.

  • **kwds – For NetCDF files, keywords passed to xarray.open_dataset.

Return type:

xr.Dataset

See also

xarray.open_dataset
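
Examples

A minimal sketch; the file name is hypothetical and must exist in the raven-testdata repository (or at the dap_url if one is given):

>>> from ravenpy.utilities.testdata import open_dataset
>>> ds = open_dataset("my-watershed/meteo_daily.nc")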

ravenpy.utilities.testdata.query_folder(folder: str | None = None, pattern: str | None = None, github_url: str = 'https://github.com/Ouranosinc/raven-testdata', branch: str = 'master') List[str][source]

List the files available for retrieval from a remote git repository with get_file. If a folder name is provided, perform a globbing-like filtering operation on parent folders.

Parameters:
  • folder (str, optional) – Relative pathname of the sub-folder from the top-level.

  • pattern (str, optional) – Regex pattern to identify a file.

  • github_url (str) – URL to GitHub repository where the data is stored.

  • branch (str) – For GitHub-hosted files, the branch to download from. Default: “master”.

Return type:

list of str