Keras and NLTK

Lately I’ve been doing a lot more Python hacking, especially around text mining with the deep learning library Keras and with NLTK. Normally I’d do most of my work in RapidMiner, but I wanted to do some grunt work and learn something along the way. It was really about educating myself on Recurrent Neural Networks (RNNs) and doing it the hard way, I guess.

As usual, I went to Google to do some sleuthing on how to text mine using Keras’s LSTM implementation, and boy did I find some goodies.

The best tutorials are easy to understand and follow. My introduction to Deep Learning with Keras was via Jason’s excellent tutorial, Text Generation with LSTM Recurrent Neural Networks in Python with Keras.

Jason took a very easy-to-digest approach: use Keras to read in the Alice in Wonderland book character by character and then try to generate some text in the ‘style’ of what was written before. It was a great proof of concept but fraught with some strange results. He acknowledges that and offers some additional guidance at the end of the tutorial, mainly removing punctuation and training for more epochs.
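
To make the character-by-character idea concrete, here is a rough sketch of how text can be sliced into fixed-length windows, each paired with the character that follows it. This is not Jason’s code; the function name and the window length are my own assumptions, and the Keras model that would consume these windows is left out.

```python
def make_char_windows(text, seq_length=100):
    """Turn text into (window, next character) pairs of integer codes.

    seq_length is an assumed default, not a value taken from the tutorial.
    """
    # Map every distinct character to an integer.
    chars = sorted(set(text))
    char_to_int = {c: i for i, c in enumerate(chars)}

    inputs, targets = [], []
    # Slide a window over the text one character at a time.
    for start in range(len(text) - seq_length):
        window = text[start:start + seq_length]
        next_char = text[start + seq_length]
        inputs.append([char_to_int[c] for c in window])
        targets.append(char_to_int[next_char])
    return inputs, targets, char_to_int


if __name__ == "__main__":
    sample = "alice was beginning to get very tired"
    inputs, targets, vocab = make_char_windows(sample, seq_length=5)
    print(len(inputs), "windows,", len(vocab), "distinct characters")
```

The network is then trained to predict each target from its window, which is why a short or noisy training text tends to produce strange output.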

The text processing is one thing but the model optimization is another. Since I have a crappy laptop I can just forget about optimizing a Keras script, so I went the text processing route and used NLTK.

Now that I’ve been around the text mining/processing block a bunch of times, the NLTK Python library makes more sense in this application. I much prefer using the RapidMiner Text Processing implementation for 90% of what I do with text, but every so often you need something special and atypical.

Initial Results

The first results were terrible, as my tweet can attest to!

So I added a short function to Jason’s script that preprocesses a new file loaded with haikus. I removed all punctuation and stop words with the express goal of generating haiku.
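
A preprocessing step like that can be sketched roughly as follows. This is not the function from my script; the names are made up for illustration, and the tiny stop-word set is a stand-in so the example runs without any downloads (NLTK’s `stopwords.words("english")` would be the fuller list).

```python
import string

# Tiny illustrative stop-word set (an assumption for this sketch).
# NLTK's stopwords.words("english") is the fuller replacement.
STOP_WORDS = {"a", "an", "and", "the", "of", "in", "on", "to", "is", "it"}


def clean_line(line, stop_words=STOP_WORDS):
    """Lowercase a line, strip ASCII punctuation, and drop stop words."""
    lowered = line.lower()
    # string.punctuation only covers ASCII, so curly quotes would survive.
    no_punct = lowered.translate(str.maketrans("", "", string.punctuation))
    kept = [word for word in no_punct.split() if word not in stop_words]
    return " ".join(kept)


def clean_text(raw_text, stop_words=STOP_WORDS):
    """Clean every line and skip lines that end up empty."""
    cleaned = (clean_line(line, stop_words) for line in raw_text.splitlines())
    return "\n".join(line for line in cleaned if line)


if __name__ == "__main__":
    print(clean_text("An old silent pond...\nA frog jumps into the pond,"))
```

Keeping the line breaks matters for haiku, since the three-line shape is part of what the model is supposed to pick up.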

While this script was learning, I started to dig around the Internet for other interesting and related posts on LSTMs, NLTK and text generation until I found Click-O-Tron. That cracked me up. Leave it to us humans to take some cool piece of technology and implement it for lulz.

Implementation

I have grandiose dreams of using this script, so I would need to put it in production one day. This is where everything got to be a pain in the ass. My first thought was to run the training on a smaller machine and then use the trained weights to autogenerate new haikus in a separate script. This is not an atypical type of implementation. Right now I don’t care if this takes days to train.

While Python is great in many ways, libraries that work on one machine can behave differently on another machine with different hardware, especially when dealing with GPUs and stuff like that. It gets tricky and annoying considering I work on many different workstations these days. I have a crappy little Acer laptop, which happens to have an AMD processor, that I use to cron Python scripts for my Twitter-related work.

I do most of my ‘hacking’ on a larger laptop that happens to have an Intel processor. To transfer my scripts from one machine to another, I always have to make sure that every single Python package is installed on each machine. PITA!
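
One way to take some of the sting out of this is a small check that reports which packages are missing before a script runs. This is only a sketch using the standard library’s `importlib.metadata` (Python 3.8+); the package list is a hypothetical example, not my actual requirements.

```python
from importlib import metadata

# Hypothetical package list for illustration; adjust to the script's real needs.
REQUIRED = ["keras", "nltk", "twython"]


def missing_packages(required):
    """Return the names in `required` that are not installed here."""
    missing = []
    for name in required:
        try:
            metadata.version(name)  # raises if the distribution is absent
        except metadata.PackageNotFoundError:
            missing.append(name)
    return missing


if __name__ == "__main__":
    absent = missing_packages(REQUIRED)
    if absent:
        print("Missing packages:", ", ".join(absent))
    else:
        print("All required packages are installed.")
```

It does not solve hardware-specific differences such as GPU support, but it catches the plain “forgot to install it on this machine” case early.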

Despite these annoyances, I ended up learning A LOT about deep learning architectures, their applications, and their shortcomings. In the end, it’s another tool in a Data Science toolkit; just don’t expect it to be a miracle savior.

Additional reading list

  • http://h6o6.com/2013/03/using-python-and-the-nltk-to-find-haikus-in-the-public-twitter-stream/
  • https://github.com/fchollet/keras/blob/master/examples/lstm_text_generation.py

The Python Script

Best Adsense month so far

Last month I made $6.51 from Adsense revenue, the best month so far since I started this experiment. Although I didn’t hit the magic “1 roll of Portra 400” mark, it came pretty close.

I credit a lot of the new Adsense revenue to switching to a WordPress theme and using the Adsense plugin by Google. Everything appears to be better optimized, but I’m sure I could do more.

Python Script Experiments

I also started experimenting with some modified Python scripts to automate some of my Twitter tasks. My automation bot R2D2 spends time each morning scanning popular #ai and #machinelearning posts and then retweets them.
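
The scan-and-retweet loop looks roughly like the sketch below. It is not the actual R2D2 code: the function name and the `per_tag` parameter are made up, and `client` is assumed to be a Twython-style object exposing `search(...)` and `retweet(id=...)`.

```python
def retweet_popular(client, hashtags, per_tag=5):
    """Search each hashtag for popular tweets and retweet each one once.

    `client` is assumed to behave like a Twython instance, for example
    Twython(app_key, app_secret, oauth_token, oauth_token_secret).
    """
    retweeted = []
    for tag in hashtags:
        results = client.search(q=tag, result_type="popular", count=per_tag)
        for tweet in results.get("statuses", []):
            tweet_id = tweet["id"]
            if tweet_id in retweeted:
                continue  # same tweet matched more than one hashtag
            client.retweet(id=tweet_id)
            retweeted.append(tweet_id)
    return retweeted
```

Passing the client in as an argument keeps the Twitter keys out of the function and makes it easy to try the logic with a stand-in object. A real run would probably also need error handling for tweets that were already retweeted.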

Since I’ve been doing that, I’ve noticed a deluge of Twitter users putting me on lists. I suspect that’s some sort of bot scanning retweets and then automatically adding me to a list. I will monitor this as I go along.

I’ve also noticed a bump in new followers but also a strange unfollowing within 24 hours. I think there is some sort of automated script running that autofollows me in the hope that I’ll follow them back and then it unfollows me. I’ve noticed the same handful of Tweeple follow me and then follow me again. So they must be unfollowing me between the two follows.

 

A Twitter Bot in Groovy Script – Part 1

For the last two years I’ve been working with the Twython package to build my (R2D2) Twitter Bot. It’s been successful and I’ve learned how to hack Python better than I ever did before. Now I’m setting my sights on building a Twitter Bot in Groovy Script.

Why Groovy Script? Groovy is a lot like Python in the sense that it’s a dynamic programming language. It can be used anywhere the JVM runs. Plus, I always wanted to teach myself Java, so I thought this would be a good way to get started.

Of course you can argue that I should just jump into Java but this feels more fun right now. I always learn better if I have a fun little project to work on, so I want to rebuild my R2D2 Bot in Groovy.

Hello World!

Everyone starts with “Hello World” and I did too. The simple program looks different and a bit more complex than in Python.

In Python it’s:
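
Something along these lines (a stand-in of mine, not necessarily the exact snippet I ran):

```python
# Store the greeting and print it.
message = "Hello World"
print(message)
```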

In Groovy Script my version is way more complex. I have to create an R2D2 class, tell the class that it will be using strings, and then print my “hello world-ish” message.

 

Posting a Tweet

After I finished that introductory task, I started to scour the Internet for a Twython-like Java package. I discovered Twitter4j.

At first glance it reminded me of the Twython package, but it seemed a lot harder. It wasn’t until I started poking through the example code and questions on Stack Overflow that I found a script I could modify. Note: This is not 100% my script; I’ll give attribution to the author when I figure out where I got it from again. The closest post I got is this one.

The first thing I needed to do was download the Twitter4j package and I discovered a handy way to do just that. I just added

 

to the top of my script, and every time it runs it downloads the twitter4j-core JAR. Probably not a good thing to do every time, but I’ve only just learned how to do this. In the future I’ll make sure that it’s already downloaded and properly referenced in the script.

Next I had to import the classes I was going to use. They are “.Status,” “.Twitter,” “.TwitterFactory,” etc. This is just the same as it is in Python when I want to import only select names into the script. Note: a key one is “.ConfigurationBuilder,” which is what you need to do the Twitter OAuth stuff.

 

The original author of this script then creates a class called TweetMain. He tells the class that he’s going to use some strings and assigns them to the ConfigurationBuilder. Those strings happen to be ConsumerKey, ConsumerSecret, OAuthAccessToken, and OAuthAccessSecret, also known as your Twitter keys. All those keys are needed to auto post to Twitter, and you can get this info from apps.twitter.com.

What happens next is opening a connection to Twitter by using the TwitterFactory class and passing the keys to Twitter. Once the instance is established, you can use the “.Status” method to post your tweet and then print to your console that the status was successfully updated.

A Twitter Bot in Groovy Script

The final script looks like this:

 

You’ll have to replace the “XXXXs” with your own Twitter keys, but this test script works.

There you have it, a simple Twitter Bot in Groovy Script. Of course, there is more work to do. The next task is to have Groovy open a text file, pick a random line and post that as a Tweet. That’ll be Part 2 when I figure out how to do it.