
The Secret Life of Pronouns

I’ve been traveling a lot lately and managed to catch up on a bit of reading while cruising at 30,000 feet. On my Nook right now is a fascinating book that all text miners should at least browse in a bookstore. It’s called “The Secret Life of Pronouns,” by James Pennebaker.

The premise of the book is that your social status, sex, personality, and secret intentions can be inferred by analyzing pronouns (I, you, they), articles (a, an, the), and a few other function words. Early in his research, James used the Linguistic Inquiry and Word Count (LIWC) program, but he appears to have since modified it with proprietary word dictionaries.
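To make the idea concrete, here is a minimal Python sketch of dictionary-based word counting in the spirit of LIWC. The category names and word lists are hypothetical and deliberately tiny; LIWC’s real dictionaries are much larger, and this is not how LIWC or James’s modified version is actually implemented. It simply reports what percentage of a text’s words fall into each category.

```python
import re
from collections import Counter

# Hypothetical, drastically simplified category dictionaries.
# Real LIWC-style dictionaries contain far more words and categories.
CATEGORIES = {
    "first_person_singular": {"i", "me", "my", "mine", "myself"},
    "second_person": {"you", "your", "yours", "yourself"},
    "third_person_plural": {"they", "them", "their", "theirs"},
    "articles": {"a", "an", "the"},
}


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def category_percentages(text):
    """Return the percentage of tokens that fall into each category."""
    tokens = tokenize(text)
    if not tokens:
        # Avoid dividing by zero on empty input.
        return {name: 0.0 for name in CATEGORIES}
    counts = Counter()
    for token in tokens:
        for name, words in CATEGORIES.items():
            if token in words:
                counts[name] += 1
    return {name: 100.0 * counts[name] / len(tokens) for name in CATEGORIES}


print(category_percentages("I think they gave me the book"))
```

For “I think they gave me the book”, two of the seven tokens (I, me) are first-person singular, so that category comes out at about 28.6%, while “they” and “the” each account for about 14.3%.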

On the surface, LIWC looks similar to the word frequency routine in RapidMiner’s Process Documents operator, but James and his team went further and added a bit more “intelligence” to the analysis. They also rolled out a fun service called Analyze Words: you enter your Twitter handle, click the button, and it gives you a snapshot of your tweet sentiment.
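For contrast, here is a plain word-frequency count in Python, roughly the kind of term counts a bag-of-words step produces. It is a sketch, not RapidMiner’s implementation: every token is counted the same way, with no notion of which words are pronouns or articles, which is exactly what the dictionary categories add on top.

```python
import re
from collections import Counter


def word_frequencies(text, top_n=5):
    """Count how often each word occurs and return the top_n most common."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return Counter(tokens).most_common(top_n)


sample = "The cat sat on the mat and I saw the cat"
print(word_frequencies(sample, top_n=2))  # [('the', 3), ('cat', 2)]
```

Note how the article “the” tops the list: raw counts surface function words, but only a category dictionary tells you what they mean for the analysis.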

So how does this work? I suspect that James and his team use their dictionaries to categorize incoming text documents and then test them for the author’s sex, social status, personality, and sentiment. I’m sure a lot of hard “up front” work went into building these dictionaries. That kind of up-front work is the norm in text mining, and if you try to take shortcuts, you’ll likely get crappy models.

I think a model like his could be built quite easily in RapidMiner, especially if you build a good crawling and sentiment system to test against. All it takes is a bit of thought and the will to do it. Isn’t the data-driven world we live in cool?
