Using Keras and R

Introduction

What is Keras?

Keras is a library that lets you create neural networks. Its selling point is that it wants to get you from 0 to trained model in a jiffy. For more detail, read about the integration with R. In this tutorial, we are going to step through using Keras (via R) on a high performance computing (HPC) cluster at Stanford, specifically the Sherlock 2 cluster.

Credits

This tutorial was made possible by community members:

  • @rdrr1990: who took the initiative to open up an issue about some of the troubles discovered with Keras! This is a great example of taking open source initiative.

Container Usage

If you are like me, you don’t want to have to do this twice! The benefit of using a container is exactly that. Once it’s built and in a container registry, you don’t need to do it again. Today we are going to be writing a Docker recipe called a Dockerfile. The benefit of using Docker is that we can convert a Docker image directly to Singularity, and with Singularity we can use the container on a shared resource. This means that we kill two birds with one stone, and only need to maintain one recipe file. I’ll save the “how do I create a container recipe” for another post, let’s look at the keras-r container!

1. Grab an Interactive Node

You should generally not run compute on a login node. If you are working on your local machine, you can skip this step. If you are on a shared resource, you should jump onto an interactive node. Here is how to do that, asking for a GPU node:

$ srun -p gpu --gres gpu:1 --pty bash
  • srun is the SLURM executable to run parallel jobs.
  • -p means partition, which is in reference to a group of worker nodes you have access to.
  • --gres means “generic consumable resource,” which you can think of as a kind of attribute you are asking for on your node. The count of how many comes after the name, so gpu:1 means that you want one GPU.
  • --pty means “pseudo terminal mode” and without it, we wouldn’t get an interactive node.

Asking for an interactive node, period, is important because assembling the container can take up enough memory that the process would otherwise be killed on a login node. It’s nothing personal, it’s just a friendly reminder that you are sharing the login nodes with other users!
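
If the defaults turn out to be too small for your build, SLURM also lets you ask for more memory or time up front. Here is a sketch (the exact limits and partition names vary by cluster):

$ srun -p gpu --gres gpu:1 --mem 16G --time 02:00:00 --pty bash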

2. Setup Singularity

Another terrible situation you can avoid is filling up the default cache (in your $HOME) so that you use your entire quota, are unable to write files, and then get locked out of your own $HOME. This has happened to me before, and it’s terrible and humiliating because you have to ask research computing to bail you out. By default, Singularity would use $HOME/.singularity as a cache, so let’s tell it to use somewhere else (that doesn’t have the space limitation!).


$ SINGULARITY_CACHEDIR=$SCRATCH/.singularity
$ export SINGULARITY_CACHEDIR

For a one-time definition, you can prepend the variable to a command:

$ SINGULARITY_CACHEDIR=$SCRATCH/.singularity singularity pull docker://ubuntu

I find it easiest to put those export lines in my $HOME/.bash_profile so that I don’t need to remember to do it. If you just added them to yours, source $HOME/.bash_profile to make sure they’re active!
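
In other words, the end of your $HOME/.bash_profile would have a line like this (a one-line version of the export above):

export SINGULARITY_CACHEDIR=$SCRATCH/.singularity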

You also need to load the module for Singularity, of course, if it’s not already loaded or on your path:

$ module load singularity

# search for packages
# module spider singularity

3. Pull the Image

This is the last step! Let’s pull the image. It’s here on Docker Hub, and we pull as follows:


# One time definition of cache
$ SINGULARITY_CACHEDIR=/scratch/users/vsochat/.singularity singularity pull docker://vanessa/keras-r

# After you've exported
$ singularity pull docker://vanessa/keras-r
WARNING: pull for Docker Hub is not guaranteed to produce the
WARNING: same image on repeated pull. Use Singularity Registry
WARNING: (shub://) to pull exactly equivalent images.
Docker image path: index.docker.io/vanessa/keras-r:latest
Cache folder set to /scratch/users/vsochat/.singularity/docker
[13/13] |===================================| 100.0% 
Importing: base Singularity environment
...
WARNING: Building container as an unprivileged user. If you run this container as root
WARNING: it may be missing some functionality.
Building Singularity image...
Singularity container built: /scratch/users/vsochat/.singularity/keras-r.simg
Cleaning up...
Done. Container is at: /scratch/users/vsochat/.singularity/keras-r.simg

4. Test the Image

We can actually just run the image to get an interactive R session!


$ singularity run /scratch/users/vsochat/.singularity/keras-r.simg 

R version 3.5.0 (2018-04-23) -- "Joy in Playing"
Copyright (C) 2018 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)

R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.

  Natural language support but running in an English locale

R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.

Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.

> library('keras')
> 

R Version 3.5 is called “Joy in Playing.” What exactly are you playing with, R? :) I digress! You can have the same kind of interaction with shell, and then start R on your own:


$ singularity shell /scratch/users/vsochat/.singularity/keras-r.simg 
Singularity: Invoking an interactive shell within container...

Singularity keras-r.simg:~> which R
/usr/bin/R
Singularity keras-r.simg:~> R
...
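
You can also use exec to run a one-off command without an interactive session at all, which is handy for batch scripts. A quick sketch:

$ singularity exec /scratch/users/vsochat/.singularity/keras-r.simg R --version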

For a more substantial test, the container has the additional packages kerasformula and ggplot2:


> require(kerasformula)
> require(ggplot2)

> out = kms(mpg ~ ., mtcars)   # use X11 to see epoch-by-epoch progress
> plot(out$history)

The above should run without error, and the plot produced is the following:

[keras-r.png: the training history plot produced by plot(out$history)]

Extend the Container

What is the intended use of your container? If it serves as a working environment, you probably can extend it most easily by installing packages that save to your $HOME on the cluster (where you have write access). If it is intended for a reproducible analysis, you can use the container for development but ultimately save all your changes to the final container for dissemination. We discuss both in more detail below.

Use it as a Working Environment

While containers on the cluster are read only, one nice (if potentially error prone) feature is that you can maintain a library of packages, installed in your $HOME, for just your user. Since Singularity mounts your $HOME (where you have write access), it will feel just like installing locally. You should be careful, however, because these packages are not being installed into your container! If you just need a working environment to mess around in, this is OK. But if you are creating what you hope to be a reproducible environment, you really need to go back to the container recipe and add the packages to be installed (along with your scripts). If you need help with this step, please reach out.
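
Here is a sketch of what installing to a personal library looks like from inside the container (dplyr is just an example package, and R_LIBS_USER is R’s default per-user library location):

$ singularity shell /scratch/users/vsochat/.singularity/keras-r.simg
Singularity keras-r.simg:~> R
> # this lands in your (writable) $HOME, not in the (read only) container
> dir.create(Sys.getenv('R_LIBS_USER'), recursive = TRUE, showWarnings = FALSE)
> install.packages('dplyr', lib = Sys.getenv('R_LIBS_USER'))
> library('dplyr', lib.loc = Sys.getenv('R_LIBS_USER'))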

Use it as a Base for Development

A development container is the same as a working container, but the goals are different. Your aim is to develop a recipe (and container to share in a registry) that delivers a frozen version of your software. While this is somewhat possible given that you can install packages to your $HOME in a read only environment, it’s sometimes easier to develop with a writable image on your local machine. How to do this? You would create a container recipe that uses this keras-r container (or another of your choice) as a base, and add commands onto it. Here is how you would use it as a base in a Dockerfile:

FROM vanessa/keras-r

and in a Singularity Recipe


Bootstrap: docker
From: vanessa/keras-r

For each, you would start with this base in a Dockerfile or Singularity recipe on your local machine (text files with the above text) and then create writable images that you can shell into to test commands. Your workflow should look like this:

  1. Build the base image with writable (default with Docker)
  2. Shell into your base container
  3. Test commands, and write those in your recipe that you want to keep

And at the end of each iteration when you are happy with local testing, you can move the container onto your shared resource and test for the intended use case, which is likely running something at scale.
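
Here is a minimal sketch of that loop with Docker (the mylab/keras-r-dev tag is just a hypothetical name for your development image):

# 1. build the base image (Docker layers are writable by default)
$ docker build -t mylab/keras-r-dev .

# 2. shell into your base container
$ docker run -it mylab/keras-r-dev bash

# 3. test commands interactively, and copy the keepers into your recipe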

Singularity Example

Given the above text in a file called Singularity, to build a writable (sandbox) image, you can do:

$ sudo singularity build --sandbox sandbox Singularity

and then you would want to shell inside, also with writable:

$ sudo singularity shell --writable sandbox 

If you don’t have the Singularity file yet and want a quick development environment, you can do this:

$ sudo singularity build --sandbox sandbox docker://vanessa/keras-r

Notice that the only difference is the source of the bootstrap, which is a Docker identifier. And when you like a command, add it to the %post section of your Singularity file (there is a sketch of this below). When you are ready to give the entire thing a test run, build in squashfs (read only) format:

$ sudo singularity build my-keras.simg Singularity
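
For reference, a recipe with a %post section might look something like this (the dplyr install is just a hypothetical example of a command you tested in the sandbox and decided to keep):

Bootstrap: docker
From: vanessa/keras-r

%post
    # commands you tested in the sandbox go here
    R -e "install.packages('dplyr', repos='https://cloud.r-project.org')"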

Docker Example

My preference for development is to use Docker, because the build layers get cached and I don’t need to wait for long, complicated compile routines as I would with Singularity. To build a Docker image, given a Dockerfile with the text above:

$ docker build -t vanessa/keras-r .

If you need to disable the cache (for example, a remote resource changes but the file doesn’t have changes and so the cache would be used) you can do this:

$ docker build --no-cache -t vanessa/keras-r .
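
As a hypothetical example of “adding commands on,” the Dockerfile might grow to something like this (run_analysis.R is a made-up name for your script):

FROM vanessa/keras-r

# add an R package on top of the base image
RUN R -e "install.packages('dplyr', repos='https://cloud.r-project.org')"

# add your analysis script to the image
ADD run_analysis.R /code/run_analysis.R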

And there you have it, a handy-dandy container for bringing around Keras, R, and other cool things! You can use it as is, as a working environment that you customize, or as a base for your own work.

Native Usage

If you aren’t into using containers, you can of course get this working natively. We need Python, R, and a few libraries. In the example below, we will use LMOD on Sherlock to manage software modules. This basically means you can type “module load” to fuss around with the path. Without a container, you need to have the libraries and software on the host, and yes, versions matter.

Step 1: Get an Interactive Node

In the examples below, we are first going to get an interactive node. The reason is that we should generally not run things on the login nodes! You must get a GPU node, or you will get a bunch of weird errors.


# Grab an interactive node, you get an hour default!
$ srun -p gpu --gres gpu:1 --pty bash

Step 2: Load Tensorflow

Let’s pretend we have forgotten how to find tensorflow. We can use spider to see what is available:

[vsochat@sh-08-37 ~]$ module spider tensorflow

----------------------------------------------------------------------------
  py-tensorflow:
----------------------------------------------------------------------------
    Description:
      TensorFlow™ is an open source software library for numerical
      computation using data flow graphs.

     Versions:
        py-tensorflow/1.4.0_py27
        py-tensorflow/1.5.0_py27
        py-tensorflow/1.5.0_py36
        py-tensorflow/1.6.0_py27
        py-tensorflow/1.6.0_py36
        py-tensorflow/1.7.0_py27
        py-tensorflow/1.8.0_py27

You can also use module avail tensorflow for more detailed output.


[vsochat@sh-08-37 ~]$ module avail tensorflow

--- math -- numerical libraries, statistics, deep-learning, computer science ---
   py-tensorflow/1.4.0_py27 (g)      py-tensorflow/1.6.0_py36 (g)
   py-tensorflow/1.5.0_py27 (g)      py-tensorflow/1.7.0_py27 (g)
   py-tensorflow/1.5.0_py36 (g)      py-tensorflow/1.8.0_py27 (g)
   py-tensorflow/1.6.0_py27 (g,D)

  Where:
   D:  Default Module
   g:  GPU support

Use "module spider" to find all possible modules.
Use "module keyword key1 key2 ..." to search for all possible modules matching
any of the "keys".

We are interested in having GPU support, so it’s good to see that we have it. Note that in the above we can choose between Python 2 or 3, and if we just loaded py-tensorflow without a version string, the default we would get is Python 2.7 with tensorflow 1.6.0. Let’s make the assumption that the sys admin had some wise rationale for this default (and the others are user requests?) and use it.


$ module load py-tensorflow
The following have been reloaded with a version change:
  1) python/3.6.1 => python/2.7.13
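
If you ever want to pin a specific version instead of taking the default, include the version string from the listing above, for example:

$ module load py-tensorflow/1.8.0_py27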

The load output should show that you are also getting python 2.7. The default module needs this Python version to work, so if you don’t see that, make sure you have a Python 2.7 somewhere! Peeking into the install base for py-tensorflow, we see that loading the module basically adds one of these to the path:


[vsochat@sh-08-37 ~]$ ls /share/software/user/open/py-tensorflow/
1.3.0_py27  1.4.0_py27  1.5.0_py27  1.6.0_py27  1.7.0_py27
1.3.0_py36  1.4.0_py36  1.5.0_py36  1.6.0_py36  1.8.0_py27

and here is how to check your python:


[vsochat@sh-101-58 ~]$ which python
/share/software/user/open/python/2.7.13/bin/python

Step 3: Install Keras API

The Keras API is a Python module, so we install it with pip. On Sherlock, we need an ssl library loaded first:

$ module load libressl/2.5.3

Install keras as a user:


$ pip install keras utils np_utils tensorflow --user

The --user flag will install the extra modules to your user $HOME. If we don’t do this, pip will try to install to the system site-packages and you will get a permissions error.
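
Before moving on to R, it doesn’t hurt to sanity check that the Python side works (a quick sketch; you should see a version printed without an import error):

$ python -c "import keras; print(keras.__version__)"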

Step 4: Install R Packages

We now need to install the wrapper for Keras in R, along with other supporting packages. But first! We need… additional libraries! These will be needed to run R.


$ module load openblas/0.2.19

And of course we need to load R

$ module load R/3.4.0
$ which R
/share/software/user/open/R/3.4.0/bin/R
$ R

Oh darn, looks like we only have version 3.4.0, code name “You Stupid Darkness.” Darkness is good for sleeping, and it inspires insightful reflection. If you have issues with Darkness, you should complain to the sun. I’m pretty sure the sun would just engulf you in a fiery arm and go on with his day. Oup, I am digressing again…

Once in R, we can install keras, tensorflow, and reticulate. I’m adding the extra packages kerasformula and ggplot2 so we can do the same test as we did previously. To find the python on your path:


# Here is how I found the path
> system('which python')
/share/software/user/open/python/2.7.13/bin/python

Note that the below is the same script that I use to generate the container!

install.packages('reticulate')
reticulate::use_python('/share/software/user/open/python/2.7.13/bin/python')
install.packages('devtools')
devtools::install_github('rstudio/keras')

require(tensorflow)
require(reticulate)
require(keras)

Sys.setenv(TENSORFLOW_PYTHON='/share/software/user/open/python/2.7.13/bin/python')
use_python('/share/software/user/open/python/2.7.13/bin/python')

py_discover_config('tensorflow')
py_discover_config('keras')
is_keras_available()

packages = c("kerasformula",
             "kerasR",
             "ggplot2", 
             "dplyr",
             "magrittr",
             "zeallot",
             "tfruns")

for (package in packages) {
    install.packages(package)
}

Debugging

If you didn’t get a GPU node, you’d see an error like this:


> reticulate:::import('keras')
Error in py_module_import(module, convert = convert) : 
  ImportError: cannot import name np_utils

Detailed traceback: 
  File "/home/users/vsochat/.virtualenvs/r-tensorflow/lib/python2.7/site-packages/keras/__init__.py", line 3, in <module>
    from . import utils
  File "/home/users/vsochat/.virtualenvs/r-tensorflow/lib/python2.7/site-packages/keras/utils/__init__.py", line 2, in <module>
    from . import np_utils

The real issue is that the tensorflow backend couldn’t load the CUDA library. You can verify this by trying to load tensorflow directly:


> library(tensorflow)
Error: package or namespace load failed for ‘tensorflow’:
 .onLoad failed in loadNamespace() for 'tensorflow', details:
  call: py_module_import(module, convert = convert)
  error: ImportError: Traceback (most recent call last):
  File "/share/software/user/open/py-tensorflow/1.6.0_py27/lib/python2.7/site-packages/tensorflow/python/pywrap_tensorflow.py", line 58, in <module>
    from tensorflow.python.pywrap_tensorflow_internal import *
  File "/share/software/user/open/py-tensorflow/1.6.0_py27/lib/python2.7/site-packages/tensorflow/python/pywrap_tensorflow_internal.py", line 28, in <module>
    _pywrap_tensorflow_internal = swig_import_helper()
  File "/share/software/user/open/py-tensorflow/1.6.0_py27/lib/python2.7/site-packages/tensorflow/python/pywrap_tensorflow_internal.py", line 24, in swig_import_helper
    _mod = imp.load_module('_pywrap_tensorflow_internal', fp, pathname, description)
ImportError: libcuda.so.1: cannot open shared object file: No such file or directory

And this, ladies and gents, is why wrapping things in layers of other things is so dangerous! Here is another issue you might hit, if you want to plot things:

In (function (display = "", width, height, pointsize, gamma, bg,  :
  unable to open connection to X11 display ''
> 

You might try going back and adding --x11 to your node request.

From personal experience, you can get different results depending on your internet connectivity, the mirror that you use, and even different system environment variables. This is why (in some cases) it’s nice to use a container. Some other bugs you might run into:

  • a system library like openssl or curl not being installed or found
  • a lock file left over from a previous install attempt that needs to be removed (see the sketch after this list)
  • another package dependency missing
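
Removing a stale lock file might look like this (a sketch; the library path depends on your R version and setup, so check .libPaths() in your R session first):

$ rm -rf $HOME/R/x86_64-pc-linux-gnu-library/3.4/00LOCK-*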

This part of R - installing and loading packages - is the most fragile step. Importantly, when you get an error message please read the entire text output. Most of the time, an error message will tell you exactly what the issue is. If you aren’t sure what it means, try a Google search. If you are still lost, then contact research computing support.

Step 5: Test it Out!

Activate your new packages in the session:


> require(tensorflow)
> require(reticulate)
> require(keras)
 

Let’s now test as we did before!

> require(kerasformula)
> require(ggplot2)

> out = kms(mpg ~ ., mtcars)   # use X11 to see epoch-by-epoch progress
> plot(out$history)

I’ll spare you the redundant picture! It looks the same.

The Next Time

The next time around, since you have everything installed, you mostly just need to load things again. You can have a small script to do that. You would run this on an interactive node.

$ srun -p gpu --gres gpu:1 --x11 --pty bash

#!/bin/bash
module load py-tensorflow
module load R/3.4.0
module load openblas/0.2.19
exec R

# And then interact with keras
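
If you drop those lines into a file (say keras-env.sh, a name I just made up), next time becomes a one-liner. Depending on your cluster setup you may need a login shell (bash -l) so that the module command is defined:

$ bash -l keras-env.sh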

I hope that you can see the huge benefit of using a container here. With the container, you can go to any cluster that has Singularity installed, and be up and running in just the amount of time it takes to pull the container. The native method, although it uses modules, is fragile because of the huge number of dependencies. You very likely will log in at some future date, it won’t reproduce, and you may not know why. A colleague on a different resource also can’t easily reproduce your software. As a strong example, just deriving this native tutorial took me several days based on the notes from my colleague, primarily because of subtle differences in the modules and libraries that were loaded. Save your future self the time: figure out your software base once, put it in a container, and be done!