00 - Introduction to JupyterLab

Before going any further:

These notebooks are best run from a copy in the writable-workspace folder of your PAVICS account, because they create and write files and therefore need a location with write access. Please copy the tutorial notebooks (00-12) to that folder before continuing.

Please see the PAVICS-Hydro documentation on the PAVICS website for more details.

JupyterLab

Welcome to this PAVICS-Hydro tutorial, where we will explore the various hydrological modelling and forecasting possibilities offered by PAVICS-Hydro. The platform uses the Raven hydrological modelling framework to emulate different hydrological models, and relies on various scientific Python libraries to analyze the results. This tutorial starts by exploring the JupyterLab environment that this notebook is currently running on.

The file explorer

The file explorer to the left of your screen works in much the same way as any file explorer on Windows, Mac or Linux. Here, we have files and folders that contain notebooks and data that we will want to use in our research or operations. You can:

  • Upload files from your computer to the server (e.g. watershed boundaries, model files, input data, streamflow data, etc.), either by drag-and-drop or with the upload button above the file explorer;

  • Cut, copy and paste files from one folder to another on the JupyterLab server;

  • Download files by right-clicking the file and saving to your computer locally;

  • Open notebook files (.ipynb) in the editor to modify and run the code within;

  • Shut down a running notebook by right-clicking it and selecting “Shut Down Kernel”.

The file explorer lets you manage files and code. To modify and run the code, double-click a notebook to open it in the file editor.

The file editor

The file editor is what is being used right now to read the contents of this notebook! It is on the right side of your screen. If you open multiple notebooks, each one opens in its own tab at the top of the file editor. This is where the magic happens! Once a notebook has been opened, you can see that there are some “Text” cells and some “Code” cells.

The text cells give context to what is happening and can be seen as meta-comments on top of the regular code comments, so that everything is clear to users. The cell you are currently reading, for example, is a text cell. If you double-click it, you can modify its contents. To make it appear as formatted text again, press the “play” or “run” button in the toolbar at the top of the file editor. Text cells are written in Markdown.

The code cells perform the actual work on the PAVICS-Hydro server. Code cells in this notebook are written in Python. For example, here is a simple code cell that imports a Python package into our notebook.

import xarray as xr

The above cell does nothing until we tell the notebook to run it. To run a cell, select it (click on it) and press the “play”/“run” button. This tells the PAVICS-Hydro server to run that piece of code. While a cell is running, the brackets to the left of it show an asterisk; once it has finished, they show a number. If there is an error, a red box with the error message appears under the executed cell.

We can then check that importing the xarray package has worked with a quick test. Run the code below. If it displays an error, the import has failed and the import cell should be run again. If it has worked, you should see a number in the brackets and the xarray version displayed under the cell. A cell that has not been run yet shows empty brackets.

print(xr.__version__)

Order of operations and variables in memory

Jupyter Notebooks function in the same way as scripts in most programming languages. That is to say:

  • Within a cell, lines execute from top to bottom: the first line runs before the second, and so on;

  • If a cell tries to use a variable that has not been created yet, it will cause an error;

  • If a cell creates a variable, then that variable will be available to all other cells from that point on;

  • If a cell deletes a variable, then that variable is no longer available to any cell.
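
The rules above can be tried out in a single cell. The sketch below is only an illustration: the names demo_value and demo_total are hypothetical, and each commented step stands in for what would normally be a separate cell.

```python
# Illustrative sketch only: "demo_value" and "demo_total" are hypothetical names.

# Step 1 (think "cell 1"): create a variable.
demo_value = 10

# Step 2 (think "cell 2"): the variable is available to later code.
demo_total = demo_value + 1

# Step 3 (think "cell 3"): delete the variable.
del demo_value

# Step 4 (think "cell 4"): using the deleted variable now raises a NameError.
try:
    demo_value + 1
    still_defined = True
except NameError as err:
    still_defined = False
    print(f"NameError: {err}")
```

Catching the error with try/except keeps the cell from stopping, which makes it easier to see that the variable is really gone after step 3.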

To test this, skip the next cell and run the one after it first, which will return an error:

# SKIP THIS CELL FOR NOW!

# Run this cell to create variable "b"
b = 5

# You can also add comments by prefixing a line with a hashtag, like this.
a = b + 3

You will get an error saying that “name ‘b’ is not defined”.

This is normal, because the code expects variable “b” to exist somewhere in its memory, but it hasn’t been created yet! So, let’s now create variable “b” by running the cell that we skipped. (Note that we are presenting the cells in this order because otherwise errors would break our quality testing checks!)

Now that variable “b” has been created, try re-running the cell that gave an error previously. It should work, because now “b” exists! So you can see how the order in which cells are run is important. It is also possible to use this to create small tests within a notebook, by changing some variable at key points between larger blocks of code.

Managing files

JupyterLab allows loading and saving files in the file explorer. Let’s explore this capability.

First, we will create a random array of numbers and save it to a file. We will then read that file back into memory and compare the results.

# We will do this with the numpy package:
import numpy as np

c = np.random.rand(100)  # Create 100 random values and store them in variable "c"
np.savetxt("array_c.txt", c)  # Write the array to a file named "array_c.txt"

If you look in your workspace, you will see a new file named “array_c.txt”. All files you generate this way are written to the folder that your notebook is running from.

Now let’s read it back in and verify that the values are the same. We can do this by taking the sum of the absolute differences between each element. If the sum is 0, that means we have succeeded:

d = np.loadtxt("array_c.txt")
print(sum(abs(c - d)))
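
The check above expects a difference of exactly 0, which works here because the default text format of np.savetxt keeps enough digits to restore each value. As a hedged sketch of what happens otherwise, the example below deliberately saves with a coarser format (fmt="%.6f") and compares within a tolerance instead. The file name array_coarse.txt and the temporary folder are assumptions made for this illustration only.

```python
import os
import tempfile

import numpy as np

values = np.random.rand(100)

# Write to a temporary folder so the example leaves no files behind.
with tempfile.TemporaryDirectory() as tmp_dir:
    coarse_path = os.path.join(tmp_dir, "array_coarse.txt")  # hypothetical file name
    np.savetxt(coarse_path, values, fmt="%.6f")  # keep only 6 decimal places
    reloaded = np.loadtxt(coarse_path)

# The largest absolute difference is bounded by the rounding to 6 decimals.
max_diff = np.max(np.abs(values - reloaded))
print(max_diff)

# Compare within a tolerance rather than expecting an exact match.
print(np.allclose(values, reloaded, atol=1e-6))
```

Comparing with a tolerance is a common choice whenever data has been rounded on its way to or from a file.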

As you can see, files can be accessed simply by referring to them by name if they are in the same folder as the notebook. If they are not, you also need to specify the folder path, for example “writable-workspace/array_c.txt”. This applies to all files, including those you upload yourself. You can also access data hosted online on an accessible server using its URL, but we will get to that in a later notebook.

IMPORTANT: Multiple notebooks running concurrently
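
As a minimal sketch of working with a file in another folder, the example below builds the path with Python's pathlib module. The subfolder name example_data is hypothetical, and a temporary folder stands in for a real location such as your writable-workspace.

```python
import tempfile
from pathlib import Path

import numpy as np

# A temporary folder stands in for a real folder such as "writable-workspace".
with tempfile.TemporaryDirectory() as base_dir:
    data_dir = Path(base_dir) / "example_data"  # hypothetical subfolder
    data_dir.mkdir()

    # Join the folder and the file name into a single path.
    file_path = data_dir / "array_c.txt"

    values = np.arange(5, dtype=float)
    np.savetxt(file_path, values)  # write using the full path
    file_exists = file_path.exists()
    reloaded = np.loadtxt(file_path)  # read back using the same path

print(file_exists)
print(reloaded)
```

Building paths with pathlib avoids having to type the folder separators by hand, but a plain string such as “writable-workspace/array_c.txt” works as well.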

Notebooks are independent instances and do not communicate with one another. If you load a package or a variable in memory in one notebook, it will not be available in your other notebooks, so you would need to import the data again in each of them. This consumes more memory on the server, which can slow down computations for everyone. Therefore, we ask that you kindly close and shut down notebooks once you are done with them. This can be done by right-clicking the notebook in the file explorer and selecting “Shut Down Kernel”. This will close the instance and free all the memory the notebook was taking up. Thanks!

Important closing remarks

JupyterLab environments make using code easy and repeatable, without having to worry too much about packages, data access and other such elements that can be difficult to work with. However, there are some drawbacks:

  • You are running code on a remote server, so it might be slower than on a high-performance local computer;

  • You might require more resources than are available on the remote server;

  • You might want to implement major changes that are not compatible with the Python packages available in the PAVICS-Hydro environment.

To add packages, you can simply add a cell containing “! pip install package” (replacing “package” with the name of the package you need), which will install it on your server. It will need to be reinstalled every time you close and re-spawn your server. You can also use the Jupyter conda plugin to install via conda (Settings > Conda Packages Manager).

Otherwise, you can always install the PAVICS-Hydro environment on your local computer following the instructions found [here](https://pavics-sdi.readthedocs.io/projects/raven/en/latest/index.html). Note that these instructions are for more advanced Python users / developers.
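
Before installing anything, it can help to check whether a package is already available in the environment. The sketch below is one possible way to do this with Python's standard importlib module. The helper name is_installed and the second package name are hypothetical, and the helper is meant for top-level package names only.

```python
import importlib.util


def is_installed(module_name):
    """Return True if a top-level module can be found in the current environment."""
    return importlib.util.find_spec(module_name) is not None


# "json" ships with Python, so it should always be found.
print(is_installed("json"))

# A made-up name that should not exist in any environment.
print(is_installed("surely_not_a_real_package_name_12345"))
```

If the check reports that a package is missing, it can then be installed in a separate cell as described above.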

In the next notebooks, we will start using your JupyterLab instance to do hydrological and hydroclimatological science!