How StockTwits Uses Machine Learning

Fascinating behind-the-scenes interview with StockTwits' Senior Data Scientist Garrett Hoffman. He shares great tidbits on how StockTwits uses machine learning for sentiment analysis. I've summarized the highlights below:

  • Idea generation is a huge barrier for active trading
  • Next gen of traders uses social media to make decisions
  • Garrett solves data problems and builds features for the StockTwits platform
  • This includes: production data science, product analytics, and insights research
  • Understanding social dynamics makes for a better user experience
  • Focus is on understanding the social dynamics of the StockTwits (ST) community and what's happening inside it
  • ST’s market sentiment model helps users with decision making
  • Users 'tag' content as bullish or bearish
  • Only 20 to 30% of content is tagged
  • Using ST's market sentiment model increases coverage to 100% (a rough sketch of the idea follows this list)
  • For Data Science work, a Python stack is used
  • Uses NumPy, SciPy, pandas, and scikit-learn
  • Jupyter Notebooks for research and prototyping
  • Flask for API deployment
  • For Deep Learning, uses TensorFlow with AWS EC2 instances
  • Can spin up GPUs as needed
  • Deep Learning methods used are Recurrent Neural Nets, Word2Vec, and Autoencoders
  • Stays abreast of new machine learning techniques from blogs, conferences and Twitter
  • Follows Twitter accounts from Google, Spotify, Apple, and small tech companies
  • One area ST wants to improve on is DevOps around Data Science
  • Bridge the gap between research/prototype phase and embedding it into tech stack for deployment
  • Misconception that complex solutions are best
  • Complexity ONLY ok if it leads to deeper insight
  • Simple solutions are best
  • Future long-term ideas: use AI around natural language
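
To make the tagging idea above concrete, here is a minimal sketch of a bullish/bearish classifier in scikit-learn. This is only my illustration of the concept, not ST's actual model, and the example messages are made up:

    # Illustrative sketch only: train on the ~20-30% of user-tagged messages,
    # then score untagged messages to push sentiment coverage toward 100%.
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.linear_model import LogisticRegression
    from sklearn.pipeline import make_pipeline

    # Hypothetical user-tagged messages and their bullish/bearish tags
    tagged_messages = ["$AAPL breaking out, loading up here",
                       "$TSLA looks toppy, selling into this pop"]
    tags = ["bullish", "bearish"]

    model = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LogisticRegression())
    model.fit(tagged_messages, tags)

    # Score messages the user never tagged; prints a predicted tag for each one
    print(model.predict(["$SPY grinding higher all day"]))

A model along these lines, trained on the user-tagged slice and then applied to every untagged message, is what pushes coverage toward 100%; serving it behind a Flask endpoint would be where the API deployment piece fits in.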

Keras and NLTK

Lately I've been doing a lot more Python hacking, especially around text mining, using the deep learning library Keras together with NLTK. Normally I'd do most of my work in RapidMiner, but I wanted to do some grunt work and learn something along the way. It was really about educating myself on Recurrent Neural Networks (RNNs) and doing it the hard way, I guess.


As usual, I went to Google to do some sleuthing about how to text mine using an LSTM implementation in Keras, and boy did I find some goodies.

The best tutorials are easy to understand and follow along with. My introduction to Deep Learning with Keras was via Jason's excellent tutorial called Text Generation with LSTM Recurrent Neural Networks in Python with Keras.

Jason took an easy, bite-sized approach to implementing Keras: read in the Alice in Wonderland book character by character, then try to generate some text in the 'style' of what was written before. It was a great proof of concept but fraught with some strange results. He acknowledges that and offers some additional guidance at the end of the tutorial, mainly removing punctuation and adding more training epochs.
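
The heart of that character-by-character approach looks roughly like this. It's a stripped-down sketch of the same idea, not Jason's exact script, and it assumes a local plain-text copy of the book saved as wonderland.txt:

    # Rough sketch of a character-level LSTM for text generation in Keras
    import numpy as np
    from keras.models import Sequential
    from keras.layers import LSTM, Dense

    SEQ_LEN = 100  # characters of context used to predict the next character

    raw_text = open("wonderland.txt", encoding="utf-8").read().lower()
    chars = sorted(set(raw_text))
    char_to_int = {c: i for i, c in enumerate(chars)}

    # Build (input sequence, next character) training pairs
    X, y = [], []
    for i in range(len(raw_text) - SEQ_LEN):
        X.append([char_to_int[c] for c in raw_text[i:i + SEQ_LEN]])
        y.append(char_to_int[raw_text[i + SEQ_LEN]])

    # Reshape to [samples, time steps, features], normalize, one-hot the targets
    X = np.reshape(X, (len(X), SEQ_LEN, 1)) / float(len(chars))
    y = np.eye(len(chars))[y]

    model = Sequential()
    model.add(LSTM(256, input_shape=(SEQ_LEN, 1)))
    model.add(Dense(len(chars), activation="softmax"))
    model.compile(loss="categorical_crossentropy", optimizer="adam")
    model.fit(X, y, epochs=20, batch_size=128)

Generation then works by feeding the trained model a seed sequence and repeatedly sampling the next character.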

The text processing is one thing, but model optimization is another. Since I have a crappy laptop, I can just forget about optimizing a Keras script, so I went the text processing route and used NLTK.

Now that I've been around the text mining/processing block a bunch of times, the NLTK Python library makes more sense in this application. I much prefer using the RapidMiner Text Processing implementation for 90% of what I do with text, but every so often you need something special and atypical.

Initial Results

The first results were terrible, as my tweet can attest to!

So I added a short function to Jason’s script that preprocesses a new file loaded with haikus. I removed all punctuation and stop words with the express goal of generating haiku.
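
A minimal sketch of that kind of preprocessing with NLTK looks like the following; it's not the exact function, and haikus.txt is just a placeholder file name:

    # Minimal sketch: strip punctuation and English stop words with NLTK
    import string
    import nltk
    from nltk.corpus import stopwords
    from nltk.tokenize import word_tokenize

    nltk.download("punkt")
    nltk.download("stopwords")

    def preprocess(path):
        """Return lowercased text with punctuation and stop words removed."""
        raw = open(path, encoding="utf-8").read().lower()
        stops = set(stopwords.words("english"))
        tokens = [t for t in word_tokenize(raw)
                  if t not in stops and t not in string.punctuation]
        return " ".join(tokens)

    cleaned_text = preprocess("haikus.txt")  # placeholder file of haikus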

While this script was learning, I started to dig around the Internet for some other interesting and related posts on LSTMs, NLTK, and text generation until I found Click-O-Tron. That cracked me up. Leave it to us humans to take some cool piece of technology and implement it for lulz.


I have grandiose dreams of using this script, so I would need to put it in production one day. This is where everything got to be a pain in the ass. My first thought was to run the training on a smaller machine and then use the trained weights to autogenerate new haikus in a separate script. This is not an atypical implementation. Right now I don't care if it takes days to train.
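
That split is straightforward to sketch in Keras: save the weights at the end of the training run, then rebuild the same architecture in the generation script and load them back in. The file name and sizes below are placeholders, not my actual setup:

    # Sketch of the train-once, generate-elsewhere split (placeholder names/sizes)
    import numpy as np
    from keras.models import Sequential
    from keras.layers import LSTM, Dense

    SEQ_LEN, N_CHARS = 100, 40  # must match whatever the training script used

    def build_model():
        # Architecture has to be identical to the one used during training
        model = Sequential()
        model.add(LSTM(256, input_shape=(SEQ_LEN, 1)))
        model.add(Dense(N_CHARS, activation="softmax"))
        model.compile(loss="categorical_crossentropy", optimizer="adam")
        return model

    # Training script ends with:
    #   model.save_weights("haiku_lstm_weights.h5")

    # Generation script rebuilds the model and loads the saved weights
    model = build_model()
    model.load_weights("haiku_lstm_weights.h5")

    # Predict the next character from a (random, for illustration) seed sequence
    seed = np.random.rand(1, SEQ_LEN, 1)
    next_char_index = int(np.argmax(model.predict(seed, verbose=0)))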

While Python is great in many ways, the libraries that work on one machine might behave differently on another machine and its hardware, especially when GPUs and the like are involved. It gets tricky and annoying considering I work on many different workstations these days. I have a crappy little Acer laptop with an AMD processor that I use to cron Python scripts for my Twitter-related work.

I do most of my 'hacking' on a larger laptop that happens to have an Intel processor. To transfer my scripts from one machine to another, I always have to make sure that every single Python package is installed on each machine. PITA!

Despite these annoyances, I ended up learning A LOT about Deep Learning architectures, their applications, and shortcomings. In the end, it's another tool in the Data Science toolkit; just don't expect it to be a miracle savior.

Additional reading list


The Python Script




Best Adsense month so far

Last month I made $6.51 from Adsense revenue, the best month so far since I started this experiment. Although I didn’t hit the magic “1 roll of Portra 400” mark, it came pretty close.


I credit a lot of the new Adsense revenue to switching to a WordPress theme and using the Adsense plugin by Google. Everything appears to be better optimized, but I’m sure I could do more.

Python Script Experiments

I also started experimenting with some modified Python scripts to automate some of my Twitter tasks. My automation bot R2D2 spends time each morning scanning popular #ai and #machinelearning posts and then retweets them.
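
For context, the retweeting half of a bot like that can be sketched in a few lines with the tweepy library. This is just an illustration, not the actual R2D2 script, and it assumes tweepy 4.x, v1.1 search access, and placeholder credentials:

    # Illustrative retweet bot sketch using tweepy (placeholder credentials)
    import tweepy

    auth = tweepy.OAuthHandler("CONSUMER_KEY", "CONSUMER_SECRET")
    auth.set_access_token("ACCESS_TOKEN", "ACCESS_TOKEN_SECRET")
    api = tweepy.API(auth, wait_on_rate_limit=True)

    # Scan popular #ai and #machinelearning posts and retweet a handful of them
    for query in ("#ai", "#machinelearning"):
        for tweet in api.search_tweets(q=query, result_type="popular", count=5):
            try:
                api.retweet(tweet.id)
            except tweepy.TweepyException:
                continue  # already retweeted, protected account, etc.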

Since I've been doing that, I've noticed a deluge of Twitter users putting me on a list. I suspect that's some sort of bot scanning retweets and then auto-populating me onto a list. I will monitor this as I go along.

I’ve also noticed a bump in new followers but also a strange unfollowing within 24 hours. I think there is some sort of automated script running that autofollows me in the hope that I’ll follow them back and then it unfollows me. I’ve noticed the same handful of Tweeple follow me and then follow me again. So they must be unfollowing me between the two follows.