September 22, 2010

Text Mining Annual Reports


I’m playing around with RapidMiner’s powerful text mining tools this evening to dig through annual reports, and I’m making progress. RapidMiner can text mine all sorts of formats, but the operators are still a bit tough to use if you don’t know what you’re doing, like me! Still, I did pick up a thing or two at RCOMM and I’m putting that to good use.

For tonight I decided to mine through the annual reports of $CSCO, $XOM, $INTC, $AMD, and $BP. Granted, these stocks are in three different industry groups, but I’m just poking around to see how they use buzzwords like “sustainability” and “greenhouse.” It’s all rather fun and silly, but wait till I post about my Twitter mining experiment. LOL.
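If you want to try the same idea outside of RapidMiner, here is a minimal sketch in plain Python. To be clear, this is not the RapidMiner process I built; it only assumes you have already extracted each annual report to plain text. The function names (`count_buzzwords`, `count_by_company`) and the toy report snippets are made up for illustration and are not real excerpts from any filing.

```python
import re
from collections import Counter

# Buzzwords from this post; extend the tuple to track more terms.
BUZZWORDS = ("sustainability", "greenhouse")


def count_buzzwords(text, buzzwords=BUZZWORDS):
    """Count whole-word, case-insensitive occurrences of each buzzword."""
    # Lowercase the text and split it into alphabetic tokens.
    tokens = re.findall(r"[a-z]+", text.lower())
    counts = Counter(tokens)
    # A Counter returns 0 for words it never saw.
    return {word: counts[word] for word in buzzwords}


def count_by_company(reports, buzzwords=BUZZWORDS):
    """Map each ticker to its buzzword counts.

    `reports` is a dict of ticker -> report text (hypothetical input shape).
    """
    return {
        ticker: count_buzzwords(text, buzzwords)
        for ticker, text in reports.items()
    }


if __name__ == "__main__":
    # Placeholder text only, NOT taken from real annual reports.
    toy_reports = {
        "AAA": "We ship chips. Nothing else to say here.",
        "BBB": "Sustainability is key. We cut greenhouse gas. Sustainability!",
    }
    for ticker, counts in count_by_company(toy_reports).items():
        print(ticker, counts)
```

Raw counts like these favor longer reports, so dividing by the total number of tokens would make a fairer comparison across companies.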

“Sustainability” buzzword

(Note: AMD never used it, but BP used it the most.)

“Greenhouse” buzzword

(Note: AMD never used it, but BP used it the most.)


