Trends come and go in the blink of an eye these days. Usually some market disrupter comes along and changes the game with a shiny new thing. Sometimes it’s a service or product that gets shut down. That’s exactly what happened here. This is the story of Google Reader and the rise of Social Sharing.
Longtime readers know that I’ve wrestled with different CMS engines on this site. I started with WordPress, then switched to ExpressionEngine, and then went back to WordPress.
I’ve been traveling a lot lately and managed to catch up on a bit of reading while cruising at 30,000 feet. On my nook right now is a fascinating book that all text miners should at least browse in a bookstore. It’s called “The Secret Life of Pronouns,” by James Pennebaker.
The premise of the book is that your social status, sex, personality, and secret intentions can be determined by analyzing pronouns (I, you, they), articles (a, an, the), and a few other function words. In the beginning of his research, James used the Linguistic Inquiry and Word Count (LIWC) program, but he appears to have since modified it with proprietary word dictionaries.
On the surface, LIWC looks similar to the word frequency routine that RapidMiner runs in the Process Documents operator, but James and his team went further and added a bit more “intelligence” to the analysis. They also rolled out a fun service called Analyze Words. You just enter your Twitter handle, click the button, and it gives you a snapshot of your tweet sentiment.
So how does this work? I suspect that James and team use their dictionaries to categorize the words in incoming text documents, and then score each document for the author’s sex, social status, personality, and sentiment. I’m sure that a lot of hard “up front” work was done to build these dictionaries. That kind of “up front” work is the norm with text mining, and if you try using shortcuts, you’ll likely get crappy models.
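To make the dictionary idea concrete, here is a minimal Python sketch of what I imagine the first step looks like: count how often words from each category show up in a piece of text. This is only my guess at the general approach, not the actual LIWC or Analyze Words code. The `CATEGORIES` dictionaries and the `category_rates` function are hypothetical names I made up for illustration, and the word lists are tiny stand-ins for the real, carefully built dictionaries.

```python
import re
from collections import Counter

# Hypothetical, tiny category dictionaries for illustration only.
# These are NOT the LIWC dictionaries, which are far larger.
CATEGORIES = {
    "pronoun": {"i", "you", "they", "we", "he", "she", "it", "me", "my"},
    "article": {"a", "an", "the"},
}


def category_rates(text):
    """Return each category's share of all words in the text."""
    # Lowercase the text and pull out simple word tokens.
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter()
    for word in words:
        for category, vocabulary in CATEGORIES.items():
            if word in vocabulary:
                counts[category] += 1
    # Avoid dividing by zero when the text has no words.
    total = len(words) or 1
    return {category: counts[category] / total for category in CATEGORIES}


if __name__ == "__main__":
    print(category_rates("I saw the dog and you saw a cat"))
```

The rates on their own don’t tell you anything about personality or status. In a setup like this, they would just be the features you feed into a model trained on text with known labels, which is where all that “up front” work comes in.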
I think a model like his could be built quite easily in RapidMiner, especially if you build a good crawling and sentiment system to test against. All it requires is a bit of thought and the will to do it. Most likely their system is written in Python, but it would be fun to replicate it. Isn’t the data-driven world we live in cool?