Open Science: Working towards reproducibility

Jan 1, 2019

Today I split up the documentation here on the devlog and began the write-ups for JupyterHub. I think I should segment out a section that goes through the entire process of deployment.

Notes:

  • bash script to give you nice shortcuts
  • SSH key generated ahead of time, ready to log in without passwords
  • get the Traefik proxy working. It seems extremely straightforward, except that I do not know how to direct traffic to a folder. I suppose that since it is running port-mappings, I could have an nginx server

This is helpful, but it’s not using docker. https://github.com/jupyterhub/the-littlest-jupyterhub/tree/master/tljh

Okay, the only thing I've managed to set up correctly is nginx outside of docker. This is annoying.

Crontab entry for certificate renewals (minute 0, hour 3, every Monday):

0 3 * * 1 certbot renew --pre-hook "service nginx stop" --post-hook "service nginx start"

hub.conf inside of /etc/nginx/sites-enabled (to be mounted)

# top-level http config for websocket headers
# If Upgrade is defined, Connection = upgrade
# If Upgrade is empty, Connection = close
map $http_upgrade $connection_upgrade {
    default upgrade;
    ''      close;
}

# HTTP server to redirect all 80 traffic to SSL/HTTPS
server {
    listen 80;
    server_name hub.consistentbayes.com;

    # Tell all requests to port 80 to be 302 redirected to HTTPS
    return 302 https://$host$request_uri;
}

# HTTPS server to handle JupyterHub
server {
    listen 443 ssl;

    server_name hub.consistentbayes.com;

    ssl_certificate /etc/letsencrypt/live/consistentbayes.com/fullchain.pem;
    ssl_certificate_key /etc/letsencrypt/live/consistentbayes.com/privkey.pem;

    ssl_protocols TLSv1 TLSv1.1 TLSv1.2;
    ssl_prefer_server_ciphers on;
    ssl_dhparam /etc/ssl/certs/dhparam.pem;
    ssl_ciphers 'ECDHE-RSA-AES128-GCM-SHA256:ECDHE-ECDSA-AES128-GCM-SHA256:ECDHE-RSA-AES256-GCM-SHA384:ECDHE-ECDSA-AES256-GCM-SHA384:DHE-RSA-AES128-GCM-SHA256:DHE-DSS-AES128-GCM-SHA256:kEDH+AESGCM:ECDHE-RSA-AES128-SHA256:ECDHE-ECDSA-AES128-SHA256:ECDHE-RSA-AES128-SHA:ECDHE-ECDSA-AES128-SHA:ECDHE-RSA-AES256-SHA384:ECDHE-ECDSA-AES256-SHA384:ECDHE-RSA-AES256-SHA:ECDHE-ECDSA-AES256-SHA:DHE-RSA-AES128-SHA256:DHE-RSA-AES128-SHA:DHE-DSS-AES128-SHA256:DHE-RSA-AES256-SHA256:DHE-DSS-AES256-SHA:DHE-RSA-AES256-SHA:AES128-GCM-SHA256:AES256-GCM-SHA384:AES128-SHA256:AES256-SHA256:AES128-SHA:AES256-SHA:AES:CAMELLIA:DES-CBC3-SHA:!aNULL:!eNULL:!EXPORT:!DES:!RC4:!MD5:!PSK:!aECDH:!EDH-DSS-DES-CBC3-SHA:!EDH-RSA-DES-CBC3-SHA:!KRB5-DES-CBC3-SHA';
    
    ssl_session_timeout 1d;
    ssl_session_cache shared:SSL:50m;
    ssl_stapling on;
    ssl_stapling_verify on;
    add_header Strict-Transport-Security max-age=15768000;

    # Managing literal requests to the JupyterHub front end
    location / {
        proxy_pass http://127.0.0.1:8000;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header Host $host;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;

        # websocket headers
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection $connection_upgrade;
    }

    # Managing requests to verify letsencrypt host
    location ~ /.well-known {
        allow all;
    }
}

whereas the website config lives in /etc/nginx/sites-enabled/consistentbayes.conf:

server {
    listen 80;
    server_name consistentbayes.com;

    # Tell all requests to port 80 to be 302 redirected to HTTPS
    return 302 https://$host$request_uri;
}

server {
    listen 443 ssl;

    # INSERT OTHER SSL PARAMETERS HERE AS ABOVE
    ssl_certificate /etc/letsencrypt/live/consistentbayes.com/fullchain.pem;
    ssl_certificate_key /etc/letsencrypt/live/consistentbayes.com/privkey.pem;

    # Set the appropriate root directory
    root /var/www/consistentbayes.com/public_html;

    # Set URI handling
    location / {
        try_files $uri $uri/ =404;
    }

    # Managing requests to verify letsencrypt host
    location ~ /.well-known {
        allow all;
    }

}

Traefik problems. To see what is already listening on port 80:

netstat -ltnp | grep -w ':80'

So far, this is the easiest time I've had configuring the setup: with nginx. Talk to Joe about getting Traefik to work. https://blog.raveland.org/post/traefik_le/ maybe that will help? Or https://www.bennadel.com/blog/3420-obtaining-a-wildcard-ssl-certificate-from-letsencrypt-using-the-dns-challenge.htm (it is important to follow instructions written after Feb 2018, since Let's Encrypt changed something major around then).


Jan 2, 2019

Spent the day configuring the proxy. Eventually succeeded on my domain, but not with a reverse proxy. Just a regular one…

Messed around with Apache but couldn't make it work. The documentation on Jupyter's website is god-awful: numerous omissions, a typo in the configuration files (including the ones from the repository I got my original setup files from), and no context or setup instructions. They just assume you know what you are doing.

I do not, though. So let’s go through it.

Here's what a reverse proxy does: it catches incoming requests to a website and routes each one to whatever backend should handle it.

When you hit consistentbayes.com, it directs you to a folder with a bunch of files that make up my website. I had that working with Apache but couldn’t do what I wanted, which was direct traffic to “hub.consistentbayes.com” instead.

I got the "hub" part set up by adding an A record in my DNS control panel (more about that later). Now the proxy just had to direct those requests to a JupyterHub instance running in a docker container, exposed on http://127.0.0.1:8000 (I'll show the config files later). The hub lives in a container that is basically just a Linux machine, with users and everything in there.
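For reference, the A record looks something like this in zone-file form (the IP below is a placeholder, not my actual server address):

```
hub.consistentbayes.com.    3600    IN    A    203.0.113.10
```

All the record does is point the subdomain at the same server IP; nginx then picks the right server block by matching server_name.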

If the container name is "nostalgic_colden" (TO DO: figure out how to name these; docker run's --name flag, used further below, should do it)… then running

docker exec -ti -u mpilosov nostalgic_colden /bin/bash

will log you into the Linux machine as "mpilosov".

The user accounts are handled at build-time, but can be managed inside the container just as you would on a Linux machine (to log in as root, omit -u mpilosov above).

So… those accounts. GitHub usernames. By far the best authentication method I found, but it relies on having a fully qualified domain name (a website name, which the server at school would not let me have). But… we can point the authenticator at whatever we want.

Here are my nginx configurations:

In /etc/nginx/sites-enabled/consistentbayes.conf

server {
    listen 80;
    server_name NO_HUB.DOMAIN.TLD;

    # Tell all requests to port 80 to be 302 redirected to HTTPS
    return 302 https://$host$request_uri;
}

server {
    listen 443 ssl;

    # INSERT OTHER SSL PARAMETERS HERE AS ABOVE
    # SSL cert may differ

    # Set the appropriate root directory
    root /var/www/html;

    # Set URI handling
    location / {
        try_files $uri $uri/ =404;
    }

    # Managing requests to verify letsencrypt host
    location ~ /.well-known {
        allow all;
    }

}

In /etc/nginx/sites-enabled/jupyterhub.conf (the file names don't seem to matter).

# top-level http config for websocket headers
# If Upgrade is defined, Connection = upgrade
# If Upgrade is empty, Connection = close
map $http_upgrade $connection_upgrade {
    default upgrade;
    ''      close;
}

# HTTP server to redirect all 80 traffic to SSL/HTTPS
server {
    listen 80;
    server_name HUB.DOMAIN.TLD;

    # Tell all requests to port 80 to be 302 redirected to HTTPS
    return 302 https://$host$request_uri;
}

# HTTPS server to handle JupyterHub
server {
    listen 443 ssl;

    server_name HUB.DOMAIN.TLD;

    ssl_certificate /etc/letsencrypt/live/HUB.DOMAIN.TLD/fullchain.pem;
    ssl_certificate_key /etc/letsencrypt/live/HUB.DOMAIN.TLD/privkey.pem;

    ssl_protocols TLSv1 TLSv1.1 TLSv1.2;
    ssl_prefer_server_ciphers on;
    ssl_dhparam /etc/ssl/certs/dhparam.pem;
    ssl_ciphers 'ECDHE-RSA-AES128-GCM-SHA256:ECDHE-ECDSA-AES128-GCM-SHA256:ECDHE-RSA-AES256-GCM-SHA384:ECDHE-ECDSA-AES256-GCM-SHA384:DHE-RSA-AES128-GCM-SHA256:DHE-DSS-AES128-GCM-SHA256:kEDH+AESGCM:ECDHE-RSA-AES128-SHA256:ECDHE-ECDSA-AES128-SHA256:ECDHE-RSA-AES128-SHA:ECDHE-ECDSA-AES128-SHA:ECDHE-RSA-AES256-SHA384:ECDHE-ECDSA-AES256-SHA384:ECDHE-RSA-AES256-SHA:ECDHE-ECDSA-AES256-SHA:DHE-RSA-AES128-SHA256:DHE-RSA-AES128-SHA:DHE-DSS-AES128-SHA256:DHE-RSA-AES256-SHA256:DHE-DSS-AES256-SHA:DHE-RSA-AES256-SHA:AES128-GCM-SHA256:AES256-GCM-SHA384:AES128-SHA256:AES256-SHA256:AES128-SHA:AES256-SHA:AES:CAMELLIA:DES-CBC3-SHA:!aNULL:!eNULL:!EXPORT:!DES:!RC4:!MD5:!PSK:!aECDH:!EDH-DSS-DES-CBC3-SHA:!EDH-RSA-DES-CBC3-SHA:!KRB5-DES-CBC3-SHA';
    ssl_session_timeout 1d;
    ssl_session_cache shared:SSL:50m;
    ssl_stapling on;
    ssl_stapling_verify on;
    add_header Strict-Transport-Security max-age=15768000;

    # Managing literal requests to the JupyterHub front end
    location / {
        proxy_pass http://127.0.0.1:8000;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header Host $host;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;

        # websocket headers
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection $connection_upgrade;
    }

    # Managing requests to verify letsencrypt host
    location ~ /.well-known {
        allow all;
    }
}

Talk about how you had to generate the Let's Encrypt certificates for nginx with certbot.

sudo apt-get update
sudo apt-get install software-properties-common
sudo add-apt-repository universe
sudo add-apt-repository ppa:certbot/certbot
sudo apt-get update
sudo apt-get install python-certbot-nginx
sudo certbot certonly

I actually ran into trouble being unable to start nginx with the sites enabled, since the configs reference certificate files that don't exist yet. Kind of a catch-22. Removed the letsencrypt references until the certificates were issued.

$ sudo certbot certonly --webroot -w /var/www/example -d example.com -d www.example.com -w /var/www/thing -d thing.is -d m.thing.is

This command will obtain a single cert for example.com, www.example.com, thing.is, and m.thing.is; it will place files below /var/www/example to prove control of the first two domains, and under /var/www/thing for the second pair.

I don't quite want that. Let's try the interactive version: sudo certbot certonly. Hit 1, then enter the website names (consistentbayes.com, www.consistentbayes.com).

Then do the same again for hub.consistentbayes.com.

And we’re golden. The files have been placed where nginx is expecting them to be. (e.g. /etc/letsencrypt/live/hub.consistentbayes.com/fullchain.pem)

You will also need to issue the following command once (generating a 4096-bit dhparam can take several minutes):

openssl dhparam -out /etc/ssl/certs/dhparam.pem 4096

And then go ahead and make the Dockerfile:

# Designed to be run as
#
# docker run -it -p 8000:8000 jupyterhub/oauthenticator

FROM jupyterhub/jupyterhub

MAINTAINER Project Jupyter <ipython-dev@scipy.org>

# Install oauthenticator from git
RUN python3 -m pip install oauthenticator
# Quote the version spec so the shell doesn't treat '>=' as a redirect
RUN python3 -m pip install "notebook>=4.0"
# Create oauthenticator directory and put necessary files in it
RUN mkdir /srv/oauthenticator
WORKDIR /srv/oauthenticator
ENV OAUTHENTICATOR_DIR /srv/oauthenticator
ADD jupyterhub_config.py jupyterhub_config.py
ADD addusers.sh /srv/oauthenticator/addusers.sh
ADD userlist /srv/oauthenticator/userlist
ADD ssl /srv/oauthenticator/ssl
RUN chmod 700 /srv/oauthenticator

RUN ["sh", "/srv/oauthenticator/addusers.sh"]

Then this bash script addusers.sh:

#!/bin/sh

IFS="
"
for line in `cat userlist`; do
  test -z "$line" && continue
  user=`echo $line | cut -f 1 -d' '`
  echo "adding user $user"
  useradd -m -s /bin/bash $user
#  cp -r /srv/ipython/examples /home/$user/examples
  mkdir /home/$user/examples
# may only be necessary since we are copying files from root above.
  chown -R $user /home/$user/examples
done

And, for now, the simplest version of our jupyterhub_config.py:

# Configuration file for Jupyter Hub

c = get_config()

c.JupyterHub.log_level = 10
from oauthenticator.github import LocalGitHubOAuthenticator
c.JupyterHub.authenticator_class = LocalGitHubOAuthenticator
c.GenericOAuthenticator.login_service = 'my service'

c.LocalGitHubOAuthenticator.create_system_users = True

c.Authenticator.whitelist = whitelist = set()
c.JupyterHub.admin_users = admin = set()

import os
import sys

join = os.path.join

here = os.path.dirname(__file__)
root = os.environ.get('OAUTHENTICATOR_DIR', here)
sys.path.insert(0, root)

with open(join(root, 'userlist')) as f:
    for line in f:
        if not line.strip():  # skip blank lines (a bare newline is truthy)
            continue
        parts = line.split()
        name = parts[0]
        whitelist.add(name)
        if len(parts) > 1 and parts[1] == 'admin':
            admin.add(name)

c.GitHubOAuthenticator.oauth_callback_url = os.environ['OAUTH_CALLBACK_URL']

A userlist file with GitHub usernames [1]:

mpilosov admin
mathematicalmichael
eescu 
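The parsing rule applied to this file is simple: the first whitespace-separated field is the username, and a second field equal to 'admin' grants admin rights. A standalone sketch of that logic (mirroring the loop in jupyterhub_config.py, with blank-line handling made explicit):

```python
def parse_userlist(lines):
    """Return (whitelist, admins) parsed from userlist-style lines."""
    whitelist, admins = set(), set()
    for line in lines:
        parts = line.split()  # split on whitespace; also drops the trailing newline
        if not parts:         # skip blank lines
            continue
        name = parts[0]
        whitelist.add(name)
        if len(parts) > 1 and parts[1] == 'admin':
            admins.add(name)
    return whitelist, admins

users, admins = parse_userlist(["mpilosov admin\n", "mathematicalmichael\n", "eescu \n"])
```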

An ssl folder with encryption keys in it (which we won't use! We don't use them because we expose JupyterHub over plain HTTP and let the reverse proxy handle the security).

And an env file

# add your github oauth config to this file,
# and run the container with `docker run -it -p 9000:8000 --env-file=env jupyterhub-oauth`
OAUTH_CLIENT_ID=
OAUTH_CLIENT_SECRET=
OAUTH_CALLBACK_URL=https://hub.consistentbayes.com/hub/oauth_callback

You get these when you register the application on GitHub. (I filled in the third to show you; the first two are secrets!)

So from here on, we can start tweaking the spawner, volume persistence, memory-limit checks, etc. Write up detailed instructions. JupyterHub from docker like this is nice.

I think we can test directory-mounting right into the Linux machine, to be honest.

OMG, I RAN THIS AND IT WORKED. I mounted volumes to folders on a per-user basis (since users live inside the container that jupyterhub is in). No write permissions, but they can see the files just fine. Can't duplicate.

docker run -it -p 8000:8000 -v /home/michael/repos/:/home/mathematicalmichael/examples -v /home/michael/repos:/home/mpilosov/examples --env-file=env --name hubtest jupyterhub-oauth

The problem with relying on user data living in the hub container is that updates could cause loss of files. Mounting volumes would fix this for sure. But users can't write files in this new directory, so…

I think the best storage solution is for each student to have their own container, hubname-studentname, that a teacher can get into. For workshops, the solution presented as-is works great.

  • Make sure to upgrade pip; no one seems to.
  • Volume permissions are probably something Joe knows about.

VOLUMES WORKING!

With bind mounts, -v creates the host directory if it doesn't exist, whereas --mount does not. It's good to create one /shared directory up at root. Additionally, we can link them to docker-managed volumes to host their data instead. Ideally we use a mix of the two. Adding a named volume (even if it doesn't exist yet) to be managed by docker is accomplished with -v volume-name:/directory/to/mount.

So we should think about what in our docker-compose we can automatically mount. Additionally (or instead), volumes can be handled via the spawner.

In either setup, we can shut down and restart the container all we want and the data persists. We can even re-build the image. Volumes seem to just be symbolic links to directories on your unix machine, anyway, visible via docker inspect volume-name.
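docker inspect returns JSON, and the Mountpoint field is where a volume's data actually lives on the host. A quick sketch of reading it, with sample output hard-coded instead of shelling out to docker (the volume name and path are illustrative):

```python
import json

# Shape of `docker volume inspect volume-name` output (illustrative values).
sample = '''[{"Name": "volume-name",
              "Driver": "local",
              "Mountpoint": "/var/lib/docker/volumes/volume-name/_data"}]'''

info = json.loads(sample)[0]
print(info["Mountpoint"])  # the host directory backing the volume
```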

docker run -d -it -p 8000:8000 --mount source=/home/michael/repos/packages/lyricalart,type=bind,target=/home/mpilosov/examples/shared-folder-mp/,bind-propagation=rshared --mount source=/home/michael/repos/packages/lyricalart,type=bind,target=/home/mathematicalmichael/shared-folder-mm/,bind-propagation=rshared --env-file=env --name hub jupyterhub-oauth

Now that we've got this figured out… it would be nice to get that docker-compose up and running, including the nginx server, all based on a configuration file that takes in the website name, IP of the host, etc., so you can get in, git clone, run a bash script, edit the IP, and close out knowing it will work.

To build everything, we did docker build -t jupyterhub-oauth . in the directory we set up.

Definitely want to containerize each student. Currently it's possible to get into the directories mounted above as shared by going through the Terminal. Users can move files in/out… not good.

To destroy containers after they shut down, simply set c.DockerSpawner.remove = True in the config.
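Pulling those pieces together, the spawner side of jupyterhub_config.py comes out roughly like this (a sketch, not my exact file; the image tag and mount paths are placeholders to adapt):

```python
# Sketch of the DockerSpawner portion of jupyterhub_config.py.
# Image name and paths below are placeholders.
c = get_config()

c.JupyterHub.spawner_class = 'dockerspawner.DockerSpawner'
c.DockerSpawner.image = 'jupyter/base-notebook'  # single-user image to spawn
c.DockerSpawner.remove = True                    # throw containers away on stop

# Per-user named volume for persistence, plus a shared read-only directory.
# DockerSpawner expands {username} for each user.
c.DockerSpawner.volumes = {
    'jupyterhub-user-{username}': '/home/jovyan/work',
    '/srv/shared': {'bind': '/home/jovyan/shared', 'mode': 'ro'},
}
```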

[8:59 PM] Pilosov, Michael: officially did it on hub.consistentbayes.com! Holy shit, that took forever, but I linked up the pieces. Now it spawns containers per user, with BOTH volumes that persist even as containers change and shared directories accessible by all users, automatically mounted for every user. One environment file controls everything: the version of jupyterhub, the docker image you want to spawn, etc.

[9:04 PM] Pilosov, Michael: that is our ideal setup. The shared folders can be determined by rules such as group membership (students within a class), and the environment to spawn can be similarly chosen. This means we can use one hub for every student across every class with total ease. No containers stick around. At all. They just get appropriately mounted to volumes when they are created. The decision to make one hub per class is purely an aesthetic one; the hub can spin up dozens of different configurations depending on what the user needs. I'm going to package this all up nicely so that I can deploy to a powerful server that I can rent for like… a couple hours of testing. Throw a whole bunch of simultaneous use-cases at it.

[9:08 PM] Pilosov, Michael: we can even use the admin panel to start/stop servers instead of logging in simultaneously. If each single-user server has a heavy python script to run on startup, we can simulate heavy loads. I played around a little already and saw the containers being created (not restarted, created!) when I "started" the single-user server, and then thrown away when I clicked "stop", which keeps our physical memory literally AT THE LOWEST possible limit at any given time, since stopped containers don't have to sit around.

So, how?

I cloned the jupyterhub-deploy-docker repo (again), and made my fixes…

I don't know why the Makefile doesn't handle this, but echo "POSTGRES_PASSWORD=$( openssl rand -hex 32)" >> secrets/postgres.env will create a file necessary for make build to work. And after that works, run make notebook_image to create the image to be spawned, based on the .env file in the root directory.

Note from 1/8/2019: just run make secrets/postgres.env (or whatever other file it needs), and it will create the file and set permissions. If it doesn't create the file, simply add it with touch.

I commented out lines 40-49 in the Makefile, since I want to handle certificates on my own through the reverse proxy.

Would be nice to get this working with nginx as well. But all that stuff can be handled from the bash script.

In the jupyterhub_config.py file, I removed references to SSL and set up shared volumes. It would be great to learn how to subclass the spawner now and create rules for mounting volumes. A simple restart lets students pick up new files.

I also removed the SSL references from Dockerfile.jupyterhub.

Docker-compose guidelines

Just tested the containerized solution and love it (even though I built small containers). I can log in, add nbextensions as root, and the changes are reflected without touching docker. No restarting required.

Can add user accounts as admin through the control panel. If they don't exist, a home directory or volume is created on their behalf. I wouldn't suggest sharing directories automatically for the containerized solution. That part should be handled by a custom spawner (and require a hub restart, which can also be done through the control panel, since it runs on another port!).

Admin is powerful! You can start/stop servers at your whim and launch them to poke around on your own. Very cool.

For a smaller class, auto-mounting some shared directory is a great idea.


Jan 5, 2019

To get a bash script for the set-up, you need to package the reverse proxy with docker-compose. Let's figure out how to use Traefik. Between these two sources, you should be able to figure it out: https://github.com/defeo/jupyterhub-docker/blob/master/docker-compose.yml https://github.com/containous/traefik/blob/master/examples/quickstart/docker-compose.yml
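Merging the two, a minimal docker-compose sketch might look like this (untested; Traefik 1.x syntax to match the quickstart above, and the image tags, network details, and HTTPS/ACME settings are placeholders that still need to be filled in):

```yaml
# Hypothetical docker-compose.yml combining Traefik with the hub.
version: "3"

services:
  traefik:
    image: traefik:1.7
    command: --docker  # route based on container labels
    ports:
      - "80:80"
      - "443:443"
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock

  jupyterhub:
    build: .  # the jupyterhub-oauth image from earlier
    labels:
      - "traefik.frontend.rule=Host:hub.consistentbayes.com"
      - "traefik.port=8000"
```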


  1. Fun note: it may be possible to accidentally revoke admin privileges from everyone. Test this. But editing the config file in the container as root should be able to make it work again.