Tweeting Sentiment

I finally got my hands on some Twitter data from my collaborative partner and began the process of text mining it.  Creating the RapidMiner model and then running it took all of 10 minutes.  That’s the beauty of the RapidMiner system: you can build templates and have processes ready to go! Just add data!

But adding the data is usually the hardest and most time-consuming part of text mining, especially getting the right data in the right format!  Since we’re working on a proof-of-concept model for now, my collaborative partner had to crawl Twitter, parse the tweets, and then hand-classify 1,500 Twitter posts into Positive, Neutral, and Negative labels!  Whew!

Once I got the data, I built a 10-fold cross-validation model to train and test sentiment classification on the tweets and measure its accuracy. Then I identified the words most strongly correlated with each sentiment class.  Our results are definitely promising: we achieved close to 80% classification accuracy and nailed all the correlated words.  There were some issues with misclassification of positive sentiment as negative and vice versa, which we have to work on, but overall this is a great start.
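
For anyone who wants to see the idea outside of RapidMiner, here is a minimal Python sketch of 10-fold cross-validation around a tiny Naive Bayes text classifier. To be clear, this is not the RapidMiner process I ran: the choice of Naive Bayes, the function names (`tokenize`, `train_naive_bayes`, `predict`, `cross_validate`), and the toy tweets are all assumptions made up for illustration, and the toy data is far too clean to say anything about real accuracy.

```python
import math
import random
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase and split on whitespace; real tweets need more careful cleaning."""
    return text.lower().split()


def train_naive_bayes(samples):
    """Count documents per label and words per label from (text, label) pairs."""
    label_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in samples:
        label_counts[label] += 1
        for word in tokenize(text):
            word_counts[label][word] += 1
            vocab.add(word)
    return label_counts, word_counts, vocab


def predict(model, text):
    """Return the label with the highest log-probability for the given text."""
    label_counts, word_counts, vocab = model
    total_docs = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label, n_docs in label_counts.items():
        total_words = sum(word_counts[label].values())
        score = math.log(n_docs / total_docs)
        for word in tokenize(text):
            # Add-one (Laplace) smoothing so unseen words do not zero out a class.
            count = word_counts[label][word]
            score += math.log((count + 1) / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


def cross_validate(samples, k=10, seed=0):
    """Average accuracy over k folds; assumes len(samples) >= k."""
    shuffled = samples[:]
    random.Random(seed).shuffle(shuffled)
    folds = [shuffled[i::k] for i in range(k)]
    accuracies = []
    for i, test_fold in enumerate(folds):
        # Train on every fold except the one held out for testing.
        train = [s for j, fold in enumerate(folds) if j != i for s in fold]
        model = train_naive_bayes(train)
        correct = sum(predict(model, text) == label for text, label in test_fold)
        accuracies.append(correct / len(test_fold))
    return sum(accuracies) / k


# Made-up toy data standing in for the hand-classified tweets.
toy_tweets = (
    [("love this great phone", "Positive")] * 10
    + [("hate this awful phone", "Negative")] * 10
    + [("bought a phone today", "Neutral")] * 10
)

if __name__ == "__main__":
    print("mean accuracy:", cross_validate(toy_tweets, k=10))
```

The point of the held-out fold is that each tweet is scored by a model that never saw it during training, which is what makes the accuracy figure an honest estimate.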

We now know how to fine-tune the process and the data, and hopefully we can squeeze out more accuracy through parameter optimization and better-crawled data.

Now it’s back to civil engineering for a while, unless you guys want to hire me full time. :)

  • http://decisionomics.blogspot.com RT

    Sounds great. Is there any chance you can share some of the outputs with us?

  • Tom

    RT: I really can’t say a whole lot more than what I’ve posted, sorry :(

  • http://pequaswans.blogspot.com/ Ron McEwan

    Nice work Tom. Have you ever checked out TweetGrid.com? It lets you monitor multiple Twitter searches at once. I was thinking that I could put the RM results of my RSS Feeds into TweetGrid to monitor the updates. It would be great if TweetGrid could be read by RM. The next great trick is to put together a news feed with a price prediction routine to monitor the impact of a pending event on a stock. You could see if the stock moves before the news (insider info) or if the event (news) moves the stock.

  • http://www.neuralmarkettrends.com Tom

    @Ron: I think mining StockTwits might be interesting too, especially prior to earnings release and then seeing the direction of the stock after the announcement.

  • http://pequaswans.blogspot.com/ Ron McEwan

    Yes, I agree StockTwits is a good feed. So far StockTwits has been working out well with the RSS Feed Reader. Last night I was getting indications on the metals which worked out well today. Also got some good indications on FX markets recently. Thnxs

  • http://pequaswans.blogspot.com/ Ron McEwan

    Tom, for the Twitter Feed I use the Twitter Breaking News Feed. Saves me the distraction of getting updates on Lindsay Lohan in Rehab.
    http://twitter.com/statuses/user_timeline/6017542.rss

  • http://pequaswans.blogspot.com/ Ron McEwan

    If you go to this Yahoo page you will see that all the topics in the left column have an RSS feed.

    http://finance.yahoo.com/news

    I put the Earnings feed into RM and instantly got back the information on Abiomed Inc (ABMD)

    http://feeds.finance.yahoo.com/rss/2.0/category-earnings?region=US&lang=en-US