As a follow-up to my Primer On Universal Function Approximation with Deep Learning, I’ve created a project on GitHub that provides a working example of building, training, and evaluating a neural network. Included are helper functions in Lua that I wrote to simplify creating the data and to make use of some functional programming techniques.
The basic workflow for the example is this:
Create/acquire a training set;
Analyze the data for traits, distributions, noise, etc.;
Design a deep learning architecture, including the layers and activation functions (also make sure you understand the type of problem you are trying to solve);
Choose hyperparameters, such as cost function, optimizer, and learning rate;
Train the model;
Evaluate in-sample and out-of-sample performance.
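The parts of this workflow that sit outside the framework can be sketched in a few lines. The following is a minimal illustration in plain Python, not code from the project: the names `make_dataset`, `summarize`, and `rmse` are hypothetical, the target function is an arbitrary choice, and a constant predictor stands in for the trained network.

```python
import math
import random

def make_dataset(f, n, noise_sd, seed=0):
    """First step: sample n noisy observations of f on [-1, 1]."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x = rng.uniform(-1.0, 1.0)
        rows.append((x, f(x) + rng.gauss(0.0, noise_sd)))
    return rows

def summarize(values):
    """Second step: mean and sample standard deviation of a column."""
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / (len(values) - 1)
    return mean, math.sqrt(var)

def rmse(model, rows):
    """Last step: root mean squared error of model over (x, y) rows."""
    return math.sqrt(sum((model(x) - y) ** 2 for x, y in rows) / len(rows))

def target(x):
    # Hypothetical function to approximate.
    return x ** 2

data = make_dataset(target, n=200, noise_sd=0.05)
train, test = data[:150], data[150:]  # hold out the last 50 rows

# Designing, configuring, and training the network belong to the framework.
# Here a constant predictor (the mean training output) stands in for it.
baseline_value, _ = summarize([y for _, y in train])

def baseline(x):
    return baseline_value

in_sample = rmse(baseline, train)
out_of_sample = rmse(baseline, test)
print(in_sample, out_of_sample)
```

A baseline like this is also a useful yardstick: a trained network should beat it both in-sample and out-of-sample.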
My personal preference is to limit the use of a deep learning framework to building and training models. To construct the datasets and analyze performance, I find it easier to use R (YMMV of course). What’s nice about this approach is that if you primarily work in Python or R, then you can continue to use the tools you’re most familiar with. It also means that it’s easy to swap one deep learning framework for another without having to start over. These frameworks are also a bit of a bear to set up (I’m looking at you, TensorFlow), particularly if you want to leverage GPUs. For this reason it’s convenient to use a Docker image, which isolates the effects of a specialized configuration and makes it repeatable if you want to work on *gasp* a second computer.
Which Deep Learning Framework?
Having some experience with TensorFlow, Theano, and Torch, I find Torch to have the friendliest high-level semantics. Theano and TensorFlow are much more low-level, which makes them less well suited to practitioners or applied researchers and a little harder to get started with. The trade-off with Torch is that you have to learn Lua, which is a simple scripting language but also has some awkward paradigms (I’ve never been a fan of the prototype object model).
On the other hand, Theano and TensorFlow are built on Python, so most people will be familiar with the language. However, my time could be better spent if I didn’t have to write my own mini-batch algorithm. As an alternative, Keras provides a semantically rich high-level interface that works with both Theano and TensorFlow. I will be adding a corresponding function approximation example to the deep_learning_ex project. At that point, it will be easier to compare compute performance across frameworks, as well as how close the optimizers are to each other.
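To give a sense of the bookkeeping a low-level interface leaves to you, here is a minimal sketch of a mini-batch iterator in plain Python. It is not taken from Theano, TensorFlow, or Keras; the function name `minibatches` and its parameters are assumptions made for illustration.

```python
import random

def minibatches(rows, batch_size, seed=None):
    """Yield shuffled mini-batches of rows; the last batch may be smaller."""
    order = list(range(len(rows)))
    random.Random(seed).shuffle(order)  # new random order of row indices
    for start in range(0, len(order), batch_size):
        yield [rows[i] for i in order[start:start + batch_size]]

# Ten (input, output) rows split into batches of at most four.
rows = [(x, 2 * x) for x in range(10)]
batches = list(minibatches(rows, batch_size=4, seed=1))
print([len(b) for b in batches])
```

A training loop would call this once per epoch and apply one gradient update per batch. A high-level interface hides this loop behind a single training call.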
Deep learning can be painfully slow
As for TensorFlow, unless you plan on working for Google, I wouldn’t recommend using it. The fact that it requires Google’s own Bazel build system means it’s DOA for me. When I want to work with deep learning, I really don’t have the patience to wait for a 1.1 GB download of just the build system. I mean, I only have Time Warner Cable for crissakes. Others have reported that even pre-built models, like SyntaxNet, are slow, so unless you have the compute power and storage capacity of Google’s data centers along with the bandwidth of Google Fiber, you’re better off watching YouTube.
Learning Deep Learning Frameworks
Learning Torch can be split into two tasks: learning Lua, and then understanding the Torch framework, specifically the nn package. Most people will find that learning Lua will take the majority of the time, as nn is nicely organized and easy to use.
If you are already comfortable with programming languages, then this 15-minute tutorial is good. Alternatively, this other 15-minute tutorial is a bit more terse but rather comprehensive. Either one will cover the basics of Lua. Beyond that, you need to understand how to work with data, which is less well covered. The simplecsv module can simplify I/O.
The actual data format that the optimizer needs is a table object with an attached size method. Each element of this table is itself a table with two elements: an input and its corresponding output. So this can be considered a row-major matrix representation of the data. To use the provided StochasticGradient optimizer, the data must be constructed this way, as shown in ex_fun_approx.lua. It is up to you to reserve some data for testing.
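The shape of that structure is easy to picture in any language. The sketch below is a Python analogue only, not Torch code and not the contents of ex_fun_approx.lua: the class `PairDataset` and the helper `holdout` are hypothetical names, and indexing is 0-based here whereas Lua tables are 1-based.

```python
class PairDataset:
    """Row-major container: each row is an [input, output] pair."""

    def __init__(self, inputs, outputs):
        if len(inputs) != len(outputs):
            raise ValueError("inputs and outputs must have the same length")
        self.rows = [[x, y] for x, y in zip(inputs, outputs)]

    def size(self):
        # Mirrors the size method the optimizer expects on the table.
        return len(self.rows)

    def __getitem__(self, index):
        return self.rows[index]

def holdout(inputs, outputs, n_test):
    """Reserve the last n_test rows for testing."""
    cut = len(inputs) - n_test
    train = PairDataset(inputs[:cut], outputs[:cut])
    test = PairDataset(inputs[cut:], outputs[cut:])
    return train, test

# Ten one-dimensional samples of y = x^2, two of them held out.
inputs = [[x / 10.0] for x in range(10)]
outputs = [[v[0] ** 2] for v in inputs]
train, test = holdout(inputs, outputs, n_test=2)
print(train.size(), test.size(), train[0])
```

In Torch the inputs and outputs would be tensors rather than Python lists, but the row-by-row pairing is the same idea.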
From a practical perspective, you don’t need to know much about Torch itself. It’s probably more efficient to familiarize yourself with the nn package first. I spend most of my time in this documentation. At some later point, it might be worthwhile learning how Torch itself works, in which case their GitHub repo is flush with documentation and examples. I haven’t needed to look elsewhere.
If Theano is like Torch, then Keras is like the nn package. Unless you need to descend into the bits, it’s probably best to stay high-level. Unlike nn, Keras has alternatives for Theano, which I won’t cover. Like Torch, Keras comes pre-installed in the Docker image provided in the deep_learning_ex repository. The best way to get started is to read the Keras documentation, which includes a working example of a simple neural network.
As with nn, the trick is understanding the framework’s interface, particularly around what expectations it has for the data. Keras essentially expects four pieces of data: training input, training output, testing input, and testing output. Their built-in datasets all return data organized as two pairs: an (input, output) pair for training and an (input, output) pair for testing.
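That two-pair layout can be reproduced for your own data without any framework. Here is a minimal sketch in plain Python; the function name `load_data` echoes the Keras dataset loaders, but this helper and its `n_test` parameter are hypothetical and not part of Keras.

```python
def load_data(inputs, outputs, n_test):
    """Return ((x_train, y_train), (x_test, y_test)) from parallel lists."""
    if len(inputs) != len(outputs):
        raise ValueError("inputs and outputs must have the same length")
    if not 0 <= n_test <= len(inputs):
        raise ValueError("n_test must be between 0 and the number of rows")
    cut = len(inputs) - n_test
    return (inputs[:cut], outputs[:cut]), (inputs[cut:], outputs[cut:])

# Ten samples of y = x^2, with the last three reserved for testing.
xs = [i / 10.0 for i in range(10)]
ys = [x ** 2 for x in xs]
(x_train, y_train), (x_test, y_test) = load_data(xs, ys, n_test=3)
print(len(x_train), len(x_test))
```

Note the contrast with the Torch layout: here inputs and outputs are kept in separate column-wise collections rather than paired row by row.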
Deep learning doesn’t need to be hard to learn. By following the prescribed workflow, using the provided Docker image, and streamlining your learning of deep learning frameworks to the essentials, you can get up to speed quickly.
Have any resources you’d like to share? Add them in the comments!