Monthly Archives: November 2010

An Example of R and Rapidminer for Trading

Ingo over at the Rapid-I blog found this link from a Rapid-I forum member about using R and Rapidminer for trading. It’s a pretty wild process developed by Neural Concepts, and he goes into detail about the win/loss ratios for the system! An utterly fascinating read and a job well done indeed!

It goes to show you that Rapidminer, with its growing list of plugins, is flexible enough for ANY application you need!

Can Twitter Sentiment Analysis Predict the Stock Market?

Ugly over at Uglychart.com just posted a link about research that mined the sentiment of over 10 million tweets from 2008 and was able to predict daily market behavior with an accuracy of 87.6%. While the post is vastly interesting from a text & sentiment mining perspective using social media, and for its application to the stock market, I’m not 100% convinced it’s very viable.

Why? Well I tend to echo some of the comments left by readers at the bottom of the original post.  For example, once this “edge” is discovered by general market participants, it tends to get discounted and the edge goes away.  So what we read here today is probably already discounted by the market and is just routine “business as usual.”

Now, I certainly don’t mean we should abandon text & sentiment mining for the markets, but rather that we should continue to use these tools to develop our own secret edges and evolve them as the market changes. Follow the advice of poker players and underarm deodorant manufacturers: never show your hand and never let them see you sweat.

Rapidminer Text Mining Videos

There’s a whole new set of text mining tutorial videos currently being produced, and they’re not by me!  Neil over at Vancouver Data Blog is rolling out 5 brand new tutorial videos over the course of the week on how to use Rapidminer for text mining.  His first video on how to load text in Rapidminer is a great way for novice text miners to get started and learn how to wield unstructured data.

I’m definitely checking out his posts this week, especially the ones toward Friday because they intersect with what I’m doing with my Twitter project, and you should too!

Tweeting Sentiment

I finally got my hands on some Twitter data from my collaborative partner and began the process of text mining it. Creating the Rapidminer model and then executing it took all of 10 minutes. That’s the beauty of the Rapidminer system: you can build templates and have processes ready to go! Just add data!

But adding the data is usually the hardest and most time-consuming part of text mining, especially getting the right data in the right format! Since we’re working on a proof of concept model for now, my collaborative partner had to crawl Twitter, parse the tweets, and then hand-classify 1,500 Twitter posts into Positive, Neutral, and Negative labels! Whew!

Once I got the data, I built a 10-fold cross-validation model to train on the tweets and test how accurately it classified their sentiment. Then I identified the words most strongly correlated with each sentiment classification. Our results are definitely promising: we achieved a near 80% classification accuracy and nailed all the correlated words. There were some issues with misclassification of positive sentiment as negative and vice versa, which we have to work on, but overall this is a great start.
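For readers who want to see the idea outside of Rapidminer, here is a rough Python sketch of k-fold cross validation on labeled tweets. It is not the Rapidminer process described above. The learner (a simple Naive Bayes word counter), the function names, and the toy tweets are all assumptions made up for illustration, and the demo uses 4 folds only because the toy set is tiny.

```python
import math
import random
from collections import Counter, defaultdict


def k_fold_indices(n, k, seed=0):
    """Shuffle indices 0..n-1 and deal them into k nearly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def train_nb(texts, labels):
    """Count words per label for a multinomial Naive Bayes model."""
    word_counts = defaultdict(Counter)
    label_counts = Counter(labels)
    vocab = set()
    for text, label in zip(texts, labels):
        tokens = text.lower().split()
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return word_counts, label_counts, vocab


def predict_nb(model, text):
    """Return the label with the highest log-probability (Laplace smoothing)."""
    word_counts, label_counts, vocab = model
    total = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label, count in label_counts.items():
        score = math.log(count / total)
        denom = sum(word_counts[label].values()) + len(vocab)
        for tok in text.lower().split():
            if tok in vocab:  # words never seen in training are ignored
                score += math.log((word_counts[label][tok] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


def cross_validate(texts, labels, k=10, seed=0):
    """Return overall accuracy and a Counter of (actual, predicted) pairs."""
    confusion = Counter()
    for test_idx in k_fold_indices(len(texts), k, seed):
        held_out = set(test_idx)
        train_x = [t for i, t in enumerate(texts) if i not in held_out]
        train_y = [l for i, l in enumerate(labels) if i not in held_out]
        model = train_nb(train_x, train_y)
        for i in test_idx:
            confusion[(labels[i], predict_nb(model, texts[i]))] += 1
    correct = sum(n for (actual, pred), n in confusion.items() if actual == pred)
    return correct / len(texts), confusion


# Made-up toy tweets, for illustration only (NOT the 1,500 hand-labeled posts).
texts = [
    "love this rally", "great earnings today", "love the great numbers",
    "rally looks great", "hate this selloff", "terrible earnings today",
    "hate the terrible numbers", "selloff looks terrible",
    "market opens at nine", "earnings call is tuesday",
    "numbers come out tuesday", "market closes at four",
]
labels = ["Positive"] * 4 + ["Negative"] * 4 + ["Neutral"] * 4

accuracy, confusion = cross_validate(texts, labels, k=4)
print("accuracy:", accuracy)
for (actual, predicted), n in sorted(confusion.items()):
    print(actual, "->", predicted, n)
```

The (actual, predicted) counts are where a mix-up like positive tweets being labeled negative would show up, so they are worth printing alongside the single accuracy number.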

We now know how to fine-tune the process and the data, and hopefully we can squeeze out more accuracy through parameter optimization and better crawled data.

Now it’s back to civil engineering for a while, unless you guys want to hire me full time. :)