Building a Machine Learning Framework from Scratch

Great article by Florian Cäsar on how his team developed a new machine learning framework. From scratch. In 491 steps!

He sums up the entire process in this great quote:

From images, text files, or your cat videos, bits are fed to the data pipeline that transforms them into usable data chunks and in turn to data sets,
which are then fed in small pieces to a trainer that manages all the training and passes it right on to the underlying neural network,
which consists of many underlying neural network layers connected through an arbitrarily linear or funky architecture,
which consist of many underlying neurons that form the smallest computational unit and are nudged in the right direction according to the trainer’s optimiser,
which takes the network and the transient training data in the shape of layer buffers, marks the parameters it can improve, runs every layer, and calculates a “how well did we do” score based on the calculated and correct answers from the supplied small pieces of the given dataset according to the optimiser’s settings, 
which computes the gradient of every parameter with respect to the score and then nudges the individual neurons correspondingly,
which then is run again and again until the optimiser reports results that are good enough as set in a rich criteria and hook system,
which is based on global and local nested parameter-identifier-registries that contain the shared parameters and distribute them safely to all workers
which are the actual workhorses of the training process that do as their operator says using individual and separate mathematical backends, 
which use the layer-defined placeholder computation graphs and put in the raw data and then execute it on their computational backend,
which are all also managed by the operator that distributes the worker’s work as needed and configured and also functions as a coordinator to the owning trainer,
which connects the network, the optimiser, the operator, the initialisers, 
which tell the trainer with which distribution to initialise what parameters, which work similar to hooks that act as a bridge between them all and communicate with external things using the Sigma environment,
which is the container and laid-back manager to everything that also supplies and runs these external things called monitors, 
which can be truly anything that makes use of the training data and
which finally display the learned funny cat image
… from the hooks from the workers from their operator from its assigned network from its dozens of layers from its millions of individual neurons derived from some data records from data chunks from data sets from data extractors.

In other words, they created a new framework called Sigma.Core.


Sigma.Core appears to be Windows-based machine learning software that uses deep learning. Its feature list is small but impressive:

  • Different deep learning layers (e.g. dropout, recurrent, etc.)
  • Both linear and nonlinear networks
  • Four different optimizers
  • Hooks for storing and restoring checkpoints, plus CPU and runtime metrics
  • Runs on single and multiple CPUs, and on CUDA GPUs
  • Native Windows GUI
  • Functional automatic differentiation

How long did it take?

According to Florian, it took about 700 hours of intro/research, 2,000 hours of development, and two souls sold to the devil. That's well over a full year of work for one person, assuming a standard 40-hour work week!

Keras and NLTK

Lately I’ve been doing a lot more Python hacking, especially around text mining, using the deep learning library Keras together with NLTK. Normally I’d do most of my work in RapidMiner, but I wanted to do some grunt work and learn something along the way. It was really about educating myself on Recurrent Neural Networks (RNNs) and doing it the hard way, I guess.


As usual, I went to Google to do some sleuthing about how to text mine using an LSTM implementation in Keras, and boy did I find some goodies.

The best tutorials are easy to understand and follow along. My introduction to Deep Learning with Keras was via Jason’s excellent tutorial called Text Generation with LSTM Recurrent Neural Networks in Python with Keras.

Jason took a very easy, bite-sized approach: use Keras to read in the Alice in Wonderland book character by character and then try to generate text in the ‘style’ of what was written before. It was a great proof of concept, but fraught with some strange results. He acknowledges that and offers some additional guidance at the end of the tutorial, mainly removing punctuation and training for more epochs.
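Jason's full tutorial is worth reading in the original, but the core recipe can be sketched roughly like this. The hyperparameters, variable names, and the short inline sample text below are my own illustrative stand-ins; the real script trains on the full book text loaded from a file:

```python
import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense

# In practice you'd load the whole book, e.g. text = open("wonderland.txt").read().lower();
# a short inline string just keeps this sketch self-contained.
text = "alice was beginning to get very tired of sitting by her sister"
chars = sorted(set(text))
char_to_idx = {c: i for i, c in enumerate(chars)}

seq_len = 10  # the tutorial uses 100-character windows
X, y = [], []
for i in range(len(text) - seq_len):
    X.append([char_to_idx[c] for c in text[i:i + seq_len]])  # sliding window of chars
    y.append(char_to_idx[text[i + seq_len]])                 # next char is the target

X = np.reshape(X, (len(X), seq_len, 1)) / float(len(chars))  # (samples, timesteps, features)
y = np.eye(len(chars))[y]                                    # one-hot the targets

model = Sequential([
    LSTM(128, input_shape=(seq_len, 1)),
    Dense(len(chars), activation="softmax"),  # probability over the next character
])
model.compile(loss="categorical_crossentropy", optimizer="adam")
model.fit(X, y, epochs=1, batch_size=32, verbose=0)
```

Generation then works by seeding the model with a window of text, sampling the predicted next character, sliding the window forward, and repeating.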

The text processing is one thing, but model optimization is another. Since I have a crappy laptop, I can just forget about optimizing a Keras script, so I went the text processing route and used NLTK.

Now that I’ve been around the text mining/processing block a bunch of times, the NLTK Python library makes more sense in this application. I much prefer using the RapidMiner Text Processing implementation for 90% of what I do with text, but every so often you need something special and atypical.

Initial Results

The first results were terrible, as my tweet can attest to!

So I added a short function to Jason’s script that preprocesses a new file loaded with haikus. I removed all punctuation and stop words with the express goal of generating haiku.

While this script was learning, I started to dig around the Internet for some other interesting and related posts on LSTMs, NLTK, and text generation until I found Click-O-Tron. That cracked me up. Leave it to us humans to take some cool piece of technology and implement it for lulz.


I have grandiose dreams of using this script, so I would need to put it into production one day. This is where everything got to be a pain in the ass. My first thought was to run the training on a smaller machine and then use the trained weights to autogenerate new haikus in a separate script. This is not an atypical implementation. Right now I don’t care if this will take days to train.

While Python is great in many ways, dealing with libraries on one machine can be different on another machine and its hardware, especially when GPUs and the like are involved. It gets tricky and annoying, considering I work on many different workstations these days. I have a crappy little Acer laptop that I use to cron Python scripts for my Twitter-related work, and it happens to have an AMD processor.

I do most of my ‘hacking’ on a larger laptop that happens to have an Intel processor. To transfer my scripts from one machine to another, I always have to make sure that every single Python package is installed on each machine. PITA!

Despite these annoyances, I ended up learning A LOT about Deep Learning architectures, their applications, and their shortcomings. In the end, it’s another tool in the Data Science toolkit; just don’t expect it to be a miracle savior.

Additional reading list


The Python Script