We are creating docker containers for students to run a Jupyter Notebook and then embedding that notebook on Learn.co in an iframe. Because Learn.co is using SSL, the connection with the Jupyter Notebook must use SSL as well otherwise…
This is happening because when the user's page loads, it makes a request to connect to a Jupyter Notebook. This request goes through a GeoDNS server, then a location-based load balancer (where the SSL is terminated), until it finally hits an instance of our app (Phoeyonce), which creates a Docker container running a Jupyter Notebook for them. The app then sends back the server and port that the user's Docker container is running on so that the user's webpage can connect directly to the Jupyter Notebook.
As you can see from the diagram though, our app was in charge of terminating the SSL cert. Now, we no longer have a secure connection when the client connects directly to its Jupyter Notebook container.
No big deal, why not just…
Why not just follow the same path to the server as you did the first time?
The problem with trying to connect through the GeoDNS and load balancer again is that we cannot be sure we'll end up at the same server. We need to get back to the same server and port because that is where our Jupyter Notebook is already running. But it's not the load balancer's job to send us back to the same place; its job is to send us to the server with the most available resources. Right from the start, it was clear that this solution wasn't going to work.
Why not just have the app that sets up the Jupyter Notebook terminate the SSL cert?
Doing this would require that each Docker container have a copy of the cert. This would end up exposing the cert to any user that knew to dig around and look in the right place. That's definitely not something we want to allow.
Our Solution
Our solution was to set up a proxy server that would terminate the SSL and then forward the connection to the correct server. For this to work, we needed every request sent to the Jupyter Notebook to go through our ide-proxy.ide.learn.co proxy server and have it contain the address of the server that the user's container was already running on, along with the port. This setup looks something like this:
This seemed like a great job for query params! We tried something like ide-proxy.ide.learn.co/notebook?server=nyc-01&port=6578. Unfortunately, we ran into quite a large problem when the notebook started loading on the page and requesting more assets from the container. When the notebook requested additional assets, it did not know that it needed to add these particular query params to the end of each request. After trying for some time to add these query params to all requests made by the notebook (or to the header of all requests made by the notebook), we realized this was not a viable solution.
The solution that we landed on was using a specially formatted subdomain to act as the server and port identifier! We constructed the URL sent back from the initial request to look something like servername-port.ide-proxy.ide.learn.co (e.g. nyc-01-6578.ide-proxy.ide.learn.co). Because nyc-01-6578 is a subdomain, the request still came into the proxy server we set up at ide-proxy.ide.learn.co. We were able to accomplish this by making the new GeoDNS a Wildcard DNS. Now all requests made by the Jupyter Notebook will, in some way, contain the info needed to find the exact server and port that it's running on.
The next step was to extract this oddly constructed subdomain at our proxy server to forward the request to the right place. To do this, we configured our nginx.conf to look as follows:
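http {
    server {
        listen 443 ssl;
        # generic ssl cert settings...
        # Named captures become the variables $phoeyonce_host and $jupyter_port.
        server_name ~^(?<phoeyonce_host>.*?)-(?<jupyter_port>\d+)\.ide-proxy\.ide\.learn\.co$;
        location / {
            # Because proxy_pass uses variables, nginx resolves this hostname at
            # request time, which typically needs a resolver directive (not shown here).
            proxy_pass http://$phoeyonce_host.ide.learn.co:$jupyter_port;
            proxy_read_timeout 300s;
            proxy_set_header Host $host;
            proxy_set_header X-Real-Ip $remote_addr;
            proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        }
    }
}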
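As a rough way to check what the server_name pattern captures, here is a small Python sketch that mirrors it outside of nginx. It assumes the named groups are phoeyonce_host and jupyter_port (the variable names used in proxy_pass) and that the dots in the domain are escaped. The upstream_for helper is hypothetical and only imitates how the proxy_pass target is built; it is not part of the actual setup.

```python
import re

# Python mirror of the nginx server_name pattern.
# Python spells named groups (?P<name>...) where nginx/PCRE accepts (?<name>...).
PROXY_HOST = re.compile(
    r"^(?P<phoeyonce_host>.*?)-(?P<jupyter_port>\d+)\.ide-proxy\.ide\.learn\.co$"
)


def upstream_for(host: str) -> str:
    """Return the upstream URL that proxy_pass would target for this hostname."""
    match = PROXY_HOST.match(host)
    if match is None:
        raise ValueError(f"not a notebook subdomain: {host}")
    # Same shape as: proxy_pass http://$phoeyonce_host.ide.learn.co:$jupyter_port;
    return "http://{phoeyonce_host}.ide.learn.co:{jupyter_port}".format(
        **match.groupdict()
    )


if __name__ == "__main__":
    print(upstream_for("nyc-01-6578.ide-proxy.ide.learn.co"))
    # prints: http://nyc-01.ide.learn.co:6578
```

Note that the server name itself contains a hyphen (nyc-01). The lazy `.*?` still captures all of it, because the pattern only matches when the final hyphen is followed by digits and then the proxy domain.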
I think the trickiest bit here is the regex at server_name, ~^(?<phoeyonce_host>.*?)-(?<jupyter_port>\d+)\.ide-proxy\.ide\.learn\.co$, which assigns phoeyonce_host as nyc-01 and jupyter_port as 6578.
We can then use those variables in proxy_pass to forward all traffic to the correct location! This setup allows the ide-proxy.ide.learn.co proxy server to terminate the SSL cert, forward traffic to the correct server and port for a user's Jupyter container, and allow all subsequent requests to securely follow the same path!
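To illustrate why the subdomain survives the notebook's follow-up requests while the query params did not, here is a minimal Python sketch using the standard library's URL resolution, which follows the same relative-reference rules a browser applies. The https scheme and the asset path static/style.css are assumptions made for the example; the hostnames are the ones from this post.

```python
from urllib.parse import urljoin, urlsplit

# The two addressing schemes we tried (https scheme assumed for illustration).
query_base = "https://ide-proxy.ide.learn.co/notebook?server=nyc-01&port=6578"
subdomain_base = "https://nyc-01-6578.ide-proxy.ide.learn.co/notebook"

# Hypothetical relative asset path requested by the notebook page.
asset = "static/style.css"

# Resolving a relative reference keeps the scheme and host but drops the query.
via_query = urljoin(query_base, asset)
via_subdomain = urljoin(subdomain_base, asset)

print(via_query)      # https://ide-proxy.ide.learn.co/static/style.css
print(via_subdomain)  # https://nyc-01-6578.ide-proxy.ide.learn.co/static/style.css

# The server and port info is gone in the first case and intact in the second.
print(repr(urlsplit(via_query).query))    # ''
print(urlsplit(via_subdomain).hostname)   # nyc-01-6578.ide-proxy.ide.learn.co
```

Because the identifier lives in the hostname, every asset request carries it without the notebook needing to know anything about our routing.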