Jupyterhub Configuration

Jan 1, 2019

Status Report

Okay so where are we at?

  • Spawner needs to be sussed out. Right now I really like DockerSpawner, and it seems to persist storage.
    • Each student gets a user account but it’s only used for authentication right now. Can’t figure out volume-mapping.
  • With dockerspawner.DockerSpawner:

    • Each time someone logs in, a container is either loaded up or built from the jupyterlab_img image. These are very flexible, many stacks available.
    • In this set-up, each student gets a container, which is a full-fledged linux machine. Since Docker is managing these alongside Dockerhub in the network, communication between containers is not possible right now, though it should be.
    • One hour of inactivity results in shutdown of container thanks to a python script from jupyter.
  • What we want: dockerspawner.SystemUserSpawner

    • with maps to home directories that exist on the jupyterhub container.
    • this way a teacher opens the docker container with the “hub” and all the students are there.
  • What we have now:

    • each student has a container with their name on it. each is its own linux machine
    • to get into their linux machines, run docker exec -ti jupyter-{surname} /bin/bash, do your thing, Ctrl-D to exit.
    • The script that gets installed in the hub container stops idle single-user servers (I think this means it shuts down the containers that are inactive).
    • The containers are spun up based on a jupyterlab image.
      • What happens when we update this? Perhaps to include files for every student?
    • Alternatively, you have all of them learn to manage push/pull from a class repository
    • It appears that to do this, we sub-class the Spawner.
    from dockerspawner import DockerSpawner
    class MyDockerSpawner(DockerSpawner):
        team_map = {
            'user1': 'team1',
            'user2': 'team1',
            'user3': 'team2',
        }

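        def start(self):
            # users missing from team_map get no team volume
            team = self.team_map.get(self.user.name)
            if team is not None:
                # add team volume to volumes
                self.volumes['jupyterhub-team-{}'.format(team)] = {
                    'bind': '/home/shared/{}'.format(team),
                    'mode': 'rw',  # or ro for read-only
                }
            # hand off to DockerSpawner so the container actually starts
            return super().start()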

    c.JupyterHub.spawner_class = MyDockerSpawner

New Server

As root:

curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo apt-key add -
sudo add-apt-repository "deb [arch=amd64] https://download.docker.com/linux/ubuntu $(lsb_release -cs) stable"
sudo apt-get update
apt-cache policy docker-ce
sudo apt-get install -y docker-ce

sudo groupadd docker
sudo usermod -aG docker $USER
# also add any users you want to be running docker. 
# I added `michael` on my machine. will have to log out/in to refresh group membership. 

sudo apt-get install -y docker-compose

As user michael:

cd repos/
git clone https://github.com/mathematicalmichael/hubsetup.git
cd hubsetup/

# make sure you clean up images/volumes/containers. I didn't have much there from before, did have hello-world.

docker-compose build

This came up, may be a problem?

WARNING: The COMPOSE_PROJECT_NAME variable is not set. Defaulting to a blank string.

But it kept going…

but then.

ERROR: Service 'jupyterlab' failed to build: failed to register layer: Error processing tar file(exit status 1): write /opt/conda/lib/python3.6/site-packages/pandas/_libs/tslibs/timestamps.cpython-36m-x86_64-linux-gnu.so: no space left on device

So I went ahead and deleted a couple GB of space by removing unused conda environments and Lucas’ user account.

docker-compose up 

fairly sure this will fail because of a mis-specified IP address. Should also enable security since now I have them on this server.

the SSL is messing with me since I already have it set up on the server.

trying to launch jupyterhub with dockerspawner with jupyterhub local install.

export DOCKER_JUPYTER_IMAGE=jupyter/datascience-notebook:7254cdcfa22b

Okay well the hub worked and spawner did not.

I dove into making a custom spawner (may be necessary?), but it was a rabbit hole.

Note from 1/8/19: The image needs to be available on the machine. It is separate from the hub, so just make sure the names line up correctly by checking docker images against the jupyterhub_config.py file.
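A quick sketch of that name check, assuming the list of local images comes from docker images --format "{{.Repository}}:{{.Tag}}". The helper name image_available and the sample text are made up for illustration; the idea is just to compare the image string in jupyterhub_config.py against what is actually on the machine.

```python
def image_available(docker_images_output, wanted):
    """Return True if `wanted` (repo:tag) appears in the listing.

    Hypothetical helper. Expects the text printed by:
        docker images --format "{{.Repository}}:{{.Tag}}"
    """
    # a bare repository name implies the `latest` tag
    # (only look for ":" after the last "/" so registry ports are ignored)
    if ":" not in wanted.rsplit("/", 1)[-1]:
        wanted += ":latest"
    available = {
        line.strip()
        for line in docker_images_output.splitlines()
        if line.strip()
    }
    return wanted in available


# sample listing, not real output from my server
sample = """\
jupyter/datascience-notebook:7254cdcfa22b
postgres:9.5
"""

print(image_available(sample, "jupyter/datascience-notebook:7254cdcfa22b"))
print(image_available(sample, "jupyter/datascience-notebook"))
```

The second call looks for the latest tag, which is not in the sample listing, so a pinned tag in the config and an untagged name on the machine (or the reverse) would show up as a mismatch.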


Jan 2-3, 2019

See proxy page.

Useful Reading

Here is something interesting for version-controlling notebooks. https://github.com/mwouts/jupytext

Here is an introduction to the notebook format. https://nbformat.readthedocs.io/en/latest/format_description.html TODO: You should turn this into a write-up.

The basic examples in here are actually a great demo of publishing LaTeX documents right from Jupyter. https://github.com/jupyter/nbconvert-examples

Nice article https://blog.dominodatalab.com/data-science-vs-engineering-tension-points/

This might be how to set up binderhub on your own – minikube https://github.com/jupyterhub/binderhub/blob/master/CONTRIBUTING.md

Other Projects for sharing results: https://github.com/minrk/thebelab https://github.com/QuantStack/voila Write about them in a new section summarizing sharing results.

https://github.com/jupyter/dashboards

Adding extra libraries to the jupyter-stacks image https://github.com/binder-examples/jupyter-stacks

Allowing students to get latest files without knowing git https://github.com/jupyterhub/nbgitpuller

When a link is clicked, we try to make opinionated intelligent guesses on how to do a merge automatically, without making the user do a conflict resolution. nbgitpuller is designed to be used by folks who do not know that git is being used underneath, and are only pulling content one way from a source and modifying it - not pushing it back. So we have made the following opinionated decisions.

  • If content has changed in both places, prefer local changes over remote changes.
  • If a file was deleted locally but present in the remote, remote file is restored to local repository. This allows users to get a ‘fresh copy’ of a file by just deleting the file locally & clicking the link again.
  • If a file exists locally but is untracked by git (maybe someone uploaded it manually), then rename the file, and pull in remote copy.
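To keep the three rules straight, here is a rough sketch of them as a lookup. This is not nbgitpuller’s code; the function name resolve and the state labels are my own, and it only restates the decisions quoted above for a single file.

```python
def resolve(local, remote):
    """Pick an action for one file (hypothetical restatement of the rules).

    local:  "unchanged", "modified", "deleted", or "untracked"
    remote: "unchanged" or "modified"
    """
    if local == "untracked":
        # someone uploaded it by hand: keep it under a new name
        return "rename local file, then pull remote copy"
    if local == "deleted":
        # deleting a file is how a user asks for a fresh copy
        return "restore remote copy"
    if local == "modified":
        # changed in both places (or only locally): local wins
        return "keep local changes"
    if remote == "modified":
        return "pull remote changes"
    return "nothing to do"


for state in [("modified", "modified"), ("deleted", "unchanged"),
              ("untracked", "modified"), ("unchanged", "modified")]:
    print(state, "->", resolve(*state))
```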

Hippylib-Hub, example to follow. https://github.com/g2s3-2018/hippylib-hub

Dockerspawner

Want:

  • Dockerspawner is nice (can restart hub without issues).
  • Although, restarting if all-in-one isn’t that bad, either. Temporary inconvenience.

Add this to installations!!! It’s amazing. https://github.com/yuvipanda/nbresuse

This might be a good thing to test on our server. https://kubernetes.io/docs/setup/independent/create-cluster-kubeadm/

USEFUL: in Vim, :r! openssl rand -hex 32 will paste a token into a file like config.yaml

proxy:
    secretToken: xxxx
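If openssl isn’t handy, the same kind of token can be made from Python’s standard library. A minimal sketch, assuming all that is needed is 32 random bytes written as hex (which is what openssl rand -hex 32 prints):

```python
import secrets

# 32 random bytes rendered as 64 hexadecimal characters
token = secrets.token_hex(32)
print(token)
```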

File Storage

Wow I can’t believe this exists. https://www.katacoda.com/

This is a good file-storage solution: https://www.youtube.com/watch?v=hqE5c5pyfrk https://storageos.com/developers/

Another alternative, which appears to be a bit more complicated to set up (though a helm chart should be available by now), but is open-source and from Red Hat: https://www.youtube.com/watch?v=Fgpr2lMnBVY [16:30]

Kubernetes

Kubernetes 101 introduction https://medium.com/google-cloud/kubernetes-101-pods-nodes-containers-and-clusters-c1509e409e16

The monitoring of my memory usage led me to discover that repeated execution of plotting cells led to memory usage going through the roof. The solution was to add this line magic to the top of any plotting cell: %reset -f out. It clears IPython’s output cache (the Out history) without asking for confirmation, so old cell outputs are no longer kept in memory.


Jan 4-5, 2019

Tried and failed to get Traefik working. See proxy notes. Took some time to relax.

Note

If you base a Dockerfile on this image:

FROM jupyterhub/jupyterhub-onbuild:0.6

… then your jupyterhub_config.py adjacent to your Dockerfile will be loaded into the image and used by JupyterHub.

File permissions

Correct permissions for exposed shared directories require chmod -R 777 for the folder at the top

What this means:

Permissions:

  • 1 – can execute
  • 2 – can write
  • 4 – can read

Each octal digit is the sum of those three permissions, e.g. 3 (1+2) – can execute and write; 6 (2+4) – can write and read; 7 (1+2+4) – can do all three.

Position of the digit in the value:

  1. what the owner can do
  2. what users in the file’s group can do
  3. what users not in the file’s group can do

So the third digit is the one we care about, since we don’t want to create users on the machine running docker.
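A small sketch to sanity-check a mode before running chmod. The helper describe_mode is made up for these notes; it just splits a three-digit octal value into the owner / group / other digits and tests the 4, 2 and 1 bits.

```python
def describe_mode(mode):
    """Decode a three-digit octal mode such as 0o777 into rwx flags."""
    result = {}
    for position, who in enumerate(("owner", "group", "other")):
        # shift the wanted digit into the lowest three bits
        digit = (mode >> (3 * (2 - position))) & 0o7
        flags = ""
        flags += "r" if digit & 4 else "-"
        flags += "w" if digit & 2 else "-"
        flags += "x" if digit & 1 else "-"
        result[who] = flags
    return result


print(describe_mode(0o777))
print(describe_mode(0o750))
```

With 0o777 every position comes back as rwx, which is why it works for the shared folder: the "other" digit covers container users that don’t exist on the host.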

Database

https://jupyterhub.readthedocs.io/en/stable/reference/database.html It comes pre-packaged with one, but it is recommended to use something else for production, which we will do!

Hash authentication… Very nice https://github.com/thedataincubator/jupyterhub-hashauthenticator

You can generate a good secret key with openssl rand -hex 32.

c.JupyterHub.authenticator_class = 'hashauthenticator.HashAuthenticator'
c.HashAuthenticator.secret_key = 'my secret key'  # Defaults to ''
c.HashAuthenticator.password_length = 10          # Defaults to 6
c.HashAuthenticator.show_logins = True            # Optional, defaults to False

If the show_logins option is set to True, a CSV file containing login names and passwords will be served (to admins only) at /hub/login_list. Do we want this? Maybe for analytics? If possible.

To figure out my password, I used hashauthpw --length 10 mathematicalmichael [secret key] on any computer that has run pip install jupyterhub-hashauthenticator

Multiple Spawners

multiple spawners! definitely my favorite way to go. Says this will allow them to choose upon login. If so… amazing.

Multiple spawners issue on GitHub: https://github.com/jupyterhub/dockerspawner/issues/236

from dockerspawner import SystemUserSpawner

class MultiDockerImageSpawner(SystemUserSpawner):
    images = {
        'SciPy': 'jupyter/scipy-notebook:0f73f7488fa0',
        'Tensorflow': 'jupyter/tensorflow-notebook:59904dd7776a',
        'R': 'jupyter/r-notebook:59904dd7776a',
    }
    def _options_form_default(self):
        outval = """
        <label for="image">Docker Image</label>
        <select name="image">
        """
        for name, image in self.images.items():
            outval += "<option value=\"%s\">%s (%s)</option>" % (name, name, image)

        outval += """
        </select>
        """
        return outval

    def options_from_form(self, formdata):
        options = {}
        options['image'] = formdata.get('image', ['SciPy'])[0]
        self.image = self.images[options['image']]
        return options
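To poke at the form handling without a running hub, here is a stand-alone sketch of the same lookup. It assumes the form data arrives as a dict mapping each field name to a list of strings; pick_image is a made-up helper, and the fallback to the default on an unknown name is my addition, not something in the snippet above.

```python
IMAGES = {
    "SciPy": "jupyter/scipy-notebook:0f73f7488fa0",
    "Tensorflow": "jupyter/tensorflow-notebook:59904dd7776a",
    "R": "jupyter/r-notebook:59904dd7776a",
}


def pick_image(formdata, images=IMAGES, default="SciPy"):
    """Return (choice, image) for submitted form data (hypothetical helper)."""
    # each form field maps to a list of values; take the first one
    choice = formdata.get("image", [default])[0]
    # unknown names fall back to the default instead of raising KeyError
    if choice not in images:
        choice = default
    return choice, images[choice]


print(pick_image({"image": ["R"]}))
print(pick_image({}))
```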

Everything is now up and running! New images can be added with total ease, and a hub restart causes only minimal disruption.

A note on startup files:

“IPython startup files, placed in ~/.ipython/profile_default/startup will be executed. These can be Python scripts (.py) or IPython scripts (.ipy with %magic commands). Notebooks aren’t supported as startup files, but if it really needs to be a notebook, you can use %run /path/to/notebook.ipynb in a .ipy startup file.”

This means that for testing, we can create an image with a file that gets run at startup, set it as the temporary default, and launch servers.
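A sketch of dropping such a test file in place, assuming a plain .py startup script is enough. The helper write_startup_file and the file name 00-hello.py are made up; the demo writes into a temporary directory rather than the real ~/.ipython/profile_default/startup so it is safe to run anywhere.

```python
from pathlib import Path
import tempfile

# body of the startup script; anything here runs when a kernel starts
STARTUP_SNIPPET = 'print("startup file ran")\n'


def write_startup_file(startup_dir, name="00-hello.py", body=STARTUP_SNIPPET):
    """Write a startup script into the given directory and return its path."""
    target = Path(startup_dir)
    target.mkdir(parents=True, exist_ok=True)
    path = target / name
    path.write_text(body)
    return path


# stand-in for ~/.ipython/profile_default/startup
demo_dir = Path(tempfile.mkdtemp()) / "profile_default" / "startup"
written = write_startup_file(demo_dir)
print(written)
```

In an image, the same file would be copied into the real startup directory at build time.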

I GOT RSTUDIO WORKING. Okay, so just, build any image you want, and reference it in the spawner. If you can execute the following and see the same output, you’ll have something working properly.

mpilosov@math-ws-204:~/Packages/deploy/singleuser$ docker run --rm -ti rstudio_test
Executing the command: jupyter notebook
[I 22:39:24.130 NotebookApp] Writing notebook server cookie secret to /home/jovyan/.local/share/jupyter/runtime/notebook_cookie_secret
[I 22:39:24.745 NotebookApp] JupyterLab extension loaded from /opt/conda/lib/python3.7/site-packages/jupyterlab
[I 22:39:24.745 NotebookApp] JupyterLab application directory is /opt/conda/share/jupyter/lab
[I 22:39:24.748 NotebookApp] Serving notebooks from local directory: /home/jovyan
[I 22:39:24.748 NotebookApp] The Jupyter Notebook is running at:
[I 22:39:24.748 NotebookApp] http://(620acce394ce or 127.0.0.1):8888/?token=cbf208a110db20a3fce4814d5cf1bf2e41aca5e4a165c69d
[I 22:39:24.749 NotebookApp] Use Control-C to stop this server and shut down all kernels (twice to skip confirmation).
[C 22:39:24.749 NotebookApp]

    Copy/paste this URL into your browser when you connect for the first time,
    to login with a token:
        http://(620acce394ce or 127.0.0.1):8888/?token=cbf208a110db20a3fce4814d5cf1bf2e41aca5e4a165c69d
^C[I 22:39:36.445 NotebookApp] interrupted
Serving notebooks from local directory: /home/jovyan
0 active kernels
The Jupyter Notebook is running at:
http://(620acce394ce or 127.0.0.1):8888/?token=cbf208a110db20a3fce4814d5cf1bf2e41aca5e4a165c69d
Shutdown this notebook server (y/[n])? ^C[C 22:39:37.364 NotebookApp] received signal 2, stopping
[I 22:39:37.366 NotebookApp] Shutting down 0 kernels

Jan 6-7, 2019

Math-hub

everything is up and running on math-hub; 100 idle notebooks take up 8 GB of RAM (roughly 80 MB each).

I managed to mount volumes easily, but if you create a new volume with docker, use docker inspect to find out where it is and change the permissions.

pilosovm@math-hub:~/repos/deploy$ docker inspect rw_shared_volume
[
    {
        "CreatedAt": "2019-01-04T17:57:04-07:00",
        "Driver": "local",
        "Labels": {},
        "Mountpoint": "/var/lib/docker/volumes/rw_shared_volume/_data",
        "Name": "rw_shared_volume",
        "Options": {},
        "Scope": "local"
    }
]
pilosovm@math-hub:~/repos/deploy$ sudo chmod 777 /var/lib/docker/volumes/rw_shared_volume/_data
[sudo] password for pilosovm:
pilosovm@math-hub:~/repos/deploy$
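The mount point can also be pulled out of that JSON instead of reading it by eye. A minimal sketch, assuming the text is exactly what docker inspect printed above; volume_mountpoint is a made-up helper name, and the chmod step is left as a manual command.

```python
import json


def volume_mountpoint(inspect_output):
    """Return the Mountpoint field from `docker inspect <volume>` output."""
    entries = json.loads(inspect_output)
    if not entries:
        raise ValueError("docker inspect returned no entries")
    # docker inspect prints a list with one object per inspected item
    return entries[0]["Mountpoint"]


# copied from the transcript above
sample = """
[
    {
        "CreatedAt": "2019-01-04T17:57:04-07:00",
        "Driver": "local",
        "Labels": {},
        "Mountpoint": "/var/lib/docker/volumes/rw_shared_volume/_data",
        "Name": "rw_shared_volume",
        "Options": {},
        "Scope": "local"
    }
]
"""

print(volume_mountpoint(sample))
```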

startup scripts: bootstrap scripts https://github.com/jupyterhub/jupyterhub/tree/master/examples/bootstrap-script


Jan 8, 2019

https://www.paraview.org/web/ Paraview in-browser is now a thing… Can we somehow create a container that includes the ability to launch this application?

https://jupyter-docker-stacks.readthedocs.io/en/latest/using/recipes.html Jupyter Docker-Stacks

Python 2

Adding Python 2: add a Python 2.x environment. Python 2.x was removed from all images on August 10th, 2017, starting in tag cc9feab481f7. You can add a Python 2.x environment by defining your own Dockerfile inheriting from one of the images like so:

# Choose your desired base image
FROM jupyter/scipy-notebook:latest

# Create a Python 2.x environment using conda including at least the ipython kernel
# and the kernda utility. Add any additional packages you want available for use
# in a Python 2 notebook to the first line here (e.g., pandas, matplotlib, etc.)
RUN conda create --quiet --yes -p $CONDA_DIR/envs/python2 python=2.7 ipython ipykernel kernda numpy pandas matplotlib ipywidgets yaml && \
    conda clean -tipsy

USER root

# Create a global kernelspec in the image and modify it so that it properly activates
# the python2 conda environment.
RUN $CONDA_DIR/envs/python2/bin/python -m ipykernel install && \
    $CONDA_DIR/envs/python2/bin/kernda -o -y /usr/local/share/jupyter/kernels/python2/kernel.json

USER $NB_USER

Mixed Authentication

The main authentication page on the Jupyterhub wiki is pretty useful but also kind of incomplete.

Right now I am creating users based on the folders in /home/math, authenticating with HashAuthenticator, but this example demonstrates how to mix Authentication methods.

TO DO Try google authentication on your own website.


Jan 8-10, 2019

Final Challenges

Proxy stress, user testing, configuration, security.

I spoke with Audrey about setting up a hub. She wants a very clear set of instructions.


Jan 11, 2019

Finishing Touches

Need to write up an entire summary.

Here’s what needs to happen:

  1. Project Name (minimal changes)
  2. Document Process on Hub using newest version of repo.
  3. Nginx / Letsencrypt Instructions
  4. Re-do with math.computer on a new Droplet, screen cap?
  5. Wrap procedure into bash script?

Changes for a user to make to set up a new hub:

  • touch secrets/postgres.env and touch userlist
  • rename project folder before running anything else.
  • change .env to reflect project name, include the port in there (no changes should be necessary to any other files)
  • (optional) tweak limits in jupyterhub_config.py
  • (goal: no secrets, auth, etc). try to edit makefile so that it takes care of all of that stuff. (or wrap into a first_run.sh script).
  • get password the first time? hashauth… (or automate it so that it’s printed to a file)?
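A rough sketch of the first step as a script, written in Python rather than the first_run.sh mentioned above. It only covers the two touch commands; the function name first_run is made up, and the demo runs against a temporary directory instead of the real project folder.

```python
from pathlib import Path
import tempfile


def first_run(project_dir):
    """Create the empty files a new hub needs (hypothetical helper)."""
    root = Path(project_dir)
    created = []
    for relative in ("secrets/postgres.env", "userlist"):
        path = root / relative
        path.parent.mkdir(parents=True, exist_ok=True)
        path.touch(exist_ok=True)  # leaves existing contents alone
        created.append(path)
    return created


# stand-in for the renamed project folder
demo = Path(tempfile.mkdtemp())
for path in first_run(demo):
    print(path.relative_to(demo))
```

Renaming the folder, editing .env, and the password step would still need to be added before this could replace the manual checklist.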

Customizing Spawning Options

Based on users…

from dockerspawner import DockerSpawner
class MyDockerSpawner(DockerSpawner):
    team_map = {
        'username1': 'team-a',
        'username2': 'team-b',
        'username3': 'team-a',
    }

    def start(self):
        if self.user.name in self.team_map:
            team = self.team_map[self.user.name]
            # add team volume to volumes
            self.volumes['/directory/jupyterhub-team-{}'.format(team)] = {
                'bind': '/home/jovyan/teamfolder',
                'mode': 'rw',  # or ro for read-only
            }
        return super().start()

c.JupyterHub.spawner_class = MyDockerSpawner

So, with that, refer to jupyterhub_config.py for instances of DockerSpawner and its subclasses. You can modify their properties (such as image).

mathematicalmichael@math-hub:~$ docker images
REPOSITORY                      TAG                 IMAGE ID            CREATED             SIZE
math-user                       latest              64698da59274        3 hours ago         11.5GB
math                            latest              7368bc798b20        4 hours ago         1.05GB
postgres                        9.5                 fc003c9dded6        29 hours ago        227MB
jupyter/datascience-notebook    latest              18c805bb3afb        3 days ago          6.32GB
jupyterhub/jupyterhub-onbuild   0.9.4               9ca16c1a77c3        3 months ago        812MB

However, my disk usage on Digital Ocean reads 15GB. This tells me that the size reported for math-user includes jupyter/datascience-notebook, since the sum of these two alone (11.5GB + 6.32GB = 17.82GB) would exceed 15GB.


Jan 15

Note: Copying over from walkthrough.

We will use this guy’s walkthrough.

curl -L https://raw.githubusercontent.com/wmnnd/nginx-certbot/master/init-letsencrypt.sh > init-letsencrypt.sh
sed 's/example.com/mathfight.club/g' init-letsencrypt.sh > letsencrypt.sh
mv letsencrypt.sh init-letsencrypt.sh
chmod +x init-letsencrypt.sh
sudo ./init-letsencrypt.sh

Had to edit some lines, ran out of requests for letsencrypt.