Looks like lots of volume activity around the 1300 level for the S&P500. That’s the line in the sand, for now.
Now, if only I could figure out how to make the dates appear correctly on the x-axis…
I’ve been traveling a lot lately and have managed to catch up on a bit of reading while cruising at 30,000 feet. On my Nook right now is a fascinating book that all text miners should at least browse in a bookstore. It’s called “The Secret Life of Pronouns,” by James Pennebaker.
The premise of the book is that your social status, sex, personality, and secret intentions can be determined by analyzing pronouns (I, you, they), articles (a, an, the), and a few other function words. Early in his research, James used the Linguistic Inquiry and Word Count (LIWC) program, but he appears to have since modified it with proprietary word dictionaries.
On the surface, LIWC looks similar to the word frequency routine that Rapidminer runs in the Process Documents operator, but they went further and added a bit more “intelligence” to the analysis. They also rolled out a fun service called Analyze Words. You just enter your Twitter handle, click the button, and it gives you a snapshot of your tweet sentiment.
So how does this work? I suspect that James and his team use their dictionaries to categorize incoming text documents and then test them for the author’s sex, social status, personality, and sentiment. I’m sure that a lot of hard “up front” work went into building these dictionaries. That kind of “up front” work is the norm with text mining, and if you try to take shortcuts, you’ll likely get crappy models.
I think a model like his can be built quite easily in Rapidminer, especially if you build a good crawling and sentiment system to test against. All it requires is a bit of thought and the will to do it. Isn’t the data-driven world we live in cool?
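To make the dictionary idea concrete, here is a minimal Python sketch of category counting. It is only my guess at the general approach, not how LIWC or Analyze Words is actually implemented. The two tiny word lists and the function name category_rates are made up for illustration; the real dictionaries are far larger and proprietary.

```python
import re
from collections import Counter

# Hypothetical mini-dictionaries, for illustration only.
CATEGORIES = {
    "pronoun": {"i", "me", "my", "you", "your", "he", "she", "we", "they", "them"},
    "article": {"a", "an", "the"},
}

def category_rates(text):
    """Return the share of words in `text` that fall in each category."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter()
    for word in words:
        for name, vocab in CATEGORIES.items():
            if word in vocab:
                counts[name] += 1
    total = len(words) or 1  # avoid dividing by zero on empty text
    return {name: counts[name] / total for name in CATEGORIES}

sample = "I think the model works, and they liked a demo."
rates = category_rates(sample)
print(rates)
```

The rates, not the raw counts, are what you would feed into a model, since they stay comparable between a short tweet and a long blog post.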
I think I finally got all the hacker code cleared from this site. What a terrible, time-consuming mess!
The next step is to get Google to check if I’m still hosting any malware.
Thanks for your patience.
Hi Readers,
It’s been a while since my last post, but life has been ridiculously busy for me. I have a mountain of questions and comments to answer from readers and I apologize that it’s taking so long. On top of all that, I still need to start writing my chapter in the Rapidminer Book!
I ask for your patience as I work this all out in the coming month. Thanks.
This was an interesting and funny Colbert Report about how Target “knows” a lot about its customers. Of course we know this as predictive analytics.
What I found hilarious was the father who complained that his daughter was getting “baby coupons” by mistake, when in fact her spending patterns at Target revealed she was pregnant. Enjoy!
Link to the video if it doesn’t show up for you: http://www.colbertnation.com/the-colbert-report-videos/408981/february-22-2012/the-word—surrender-to-a-buyer-power
I wanted to share two research papers that are invaluable to anyone trying to use Support Vector Machines (SVM) for modeling the stock market. One was written by an author well known to the Rapid-I team, and the other by a Korean researcher. I’ve used both of these papers as blueprints for some of my past stock market analysis processes.
The first one is by Kyoung-jae Kim and titled “Financial time series forecasting using support vector machines.”
The second is by Stefan Ruping (forgive the missing umlaut) and titled “SVM Kernels for Time Series Analysis.”
I often use the Multiply operator to make copies of my data set and feed them into different learners. I do this because sometimes I don’t know whether a Neural Net operator or an SVM operator will give me better performance. Once I know which operator performs my task better, I then use the parameter optimization process to see if I can squeeze more accuracy out of it.
The sample process below uses the Iris data set; just switch it out with your data set and enjoy.
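For readers who think better in code, here is a rough Python sketch of the same idea: hand identical data to two learners and compare their scores. It is not a translation of the Rapidminer process. The toy rows are made up, and I swapped in two deliberately simple stand-in learners (1-nearest-neighbor and nearest centroid) in place of a neural net and an SVM so the example runs with nothing but the standard library.

```python
import math

# Hypothetical toy data: (features, label). Swap in your own rows.
DATA = [
    ((1.0, 1.2), "a"), ((0.8, 1.0), "a"), ((1.2, 0.9), "a"), ((1.1, 1.1), "a"),
    ((3.0, 3.1), "b"), ((3.2, 2.9), "b"), ((2.9, 3.3), "b"), ((3.1, 3.0), "b"),
]

def nearest_neighbor(train, x):
    """Predict the label of the closest training row (1-NN)."""
    return min(train, key=lambda row: math.dist(row[0], x))[1]

def nearest_centroid(train, x):
    """Predict the label whose class mean is closest to x."""
    groups = {}
    for features, label in train:
        groups.setdefault(label, []).append(features)
    centroids = {
        label: tuple(sum(col) / len(rows) for col in zip(*rows))
        for label, rows in groups.items()
    }
    return min(centroids, key=lambda label: math.dist(centroids[label], x))

def leave_one_out_accuracy(learner, data):
    """Hold out each row once, train on the rest, and score the prediction."""
    hits = 0
    for i, (x, label) in enumerate(data):
        train = data[:i] + data[i + 1:]
        hits += learner(train, x) == label
    return hits / len(data)

# Both learners see exactly the same data, which is the job Multiply does.
LEARNERS = {"1-NN": nearest_neighbor, "nearest centroid": nearest_centroid}
scores = {name: leave_one_out_accuracy(fn, DATA) for name, fn in LEARNERS.items()}
best = max(scores, key=scores.get)  # on a tie, the first learner listed wins
print(scores, best)
```

The important part is that every learner is scored on the same rows with the same validation scheme; otherwise the comparison tells you nothing.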
This is the sample Rapidminer process I used in Video #14. Just download the text file and import it into RM using the import process function. Please note, you will need to create the Excel spreadsheet yourself, as I show you in the video. Just save the Excel file in the 2003 format and you’re done.
Enjoy!
Below is a simple parameter optimization process in Rapidminer using the Iris data set. Download the TXT file and import it into Rapidminer. Of course, you may use whatever data set you want and switch out the learner. Make sure to update the parameter optimization operator parameters. :)
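If you want to see what a parameter optimization boils down to, here is a small grid search sketched in Python. This is an assumption-laden illustration, not what the Rapidminer operator does internally: the toy rows, the k-nearest-neighbor learner, and the two parameters being tuned (k and the Minkowski exponent p) are all my own picks.

```python
import itertools
from collections import Counter

# Hypothetical toy data: (features, label). The classes overlap a little on purpose.
DATA = [
    ((1.0, 1.1), "a"), ((1.3, 0.9), "a"), ((0.9, 1.4), "a"),
    ((1.8, 1.9), "a"), ((2.1, 1.7), "a"),
    ((3.0, 3.2), "b"), ((2.8, 2.9), "b"), ((3.3, 3.0), "b"),
    ((2.2, 2.1), "b"), ((1.9, 2.3), "b"),
]

def knn_predict(train, x, k, p):
    """Majority vote among the k nearest rows under the Minkowski-p distance."""
    def distance(u, v):
        return sum(abs(a - b) ** p for a, b in zip(u, v)) ** (1 / p)
    nearest = sorted(train, key=lambda row: distance(row[0], x))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

def loo_accuracy(k, p):
    """Leave-one-out accuracy of k-NN on DATA for one parameter combination."""
    hits = 0
    for i, (x, label) in enumerate(DATA):
        train = DATA[:i] + DATA[i + 1:]
        hits += knn_predict(train, x, k, p) == label
    return hits / len(DATA)

# Every combination in the grid gets trained and scored once.
GRID = {"k": [1, 3, 5], "p": [1, 2]}
results = {
    (k, p): loo_accuracy(k, p)
    for k, p in itertools.product(GRID["k"], GRID["p"])
}
best_params = max(results, key=results.get)  # first combination with the top score
print(best_params, results[best_params])
```

The grid grows multiplicatively with each parameter you add, so start with a coarse grid and narrow it down around the best combination.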
Expression Engine = Terrible
Textpattern = Terrible
Never again will I be switching from WordPress.
I came across a fantastic R script from blogger Milk Trader. It’s about generating something called violin plots of volatility for the S&P500 index and the VIX, which he got from this CBOE paper. I took that script a bit further and added in one of my current trend positions, $ARLP, just for fun.
The plot is essentially a “combination of a box plot and a kernel density plot” and shows us the absolute value of volatility (negative returns are represented as positive) on the y axis.
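The two ingredients of a violin plot can be computed by hand. The Python sketch below is not Milk Trader’s R script; it is a standard-library illustration under my own assumptions: made-up closing prices, absolute daily returns standing in for volatility, and a common rule-of-thumb bandwidth for the Gaussian kernel.

```python
import math
import statistics

# Hypothetical closing prices, for illustration only.
prices = [100.0, 101.0, 99.5, 100.5, 102.0, 101.0, 103.0, 102.5]

# Absolute daily returns: a loss and a gain of the same size look identical.
abs_returns = [abs(b / a - 1) for a, b in zip(prices, prices[1:])]

# Box plot half: the three quartile cut points.
q1, median, q3 = statistics.quantiles(abs_returns, n=4)

def gaussian_kde(samples, bandwidth):
    """Return a function that estimates the density of `samples` at a point."""
    norm = len(samples) * bandwidth * math.sqrt(2 * math.pi)
    def density(x):
        return sum(math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples) / norm
    return density

# Rule-of-thumb bandwidth (an assumption): 1.06 * stdev * n^(-1/5).
bandwidth = 1.06 * statistics.stdev(abs_returns) * len(abs_returns) ** -0.2
density = gaussian_kde(abs_returns, bandwidth)

# Kernel density half: evaluate the curve on an evenly spaced grid.
top = max(abs_returns)
curve = [(i * top / 20, density(i * top / 20)) for i in range(21)]
print(q1, median, q3)
```

A plotting library would mirror the density curve on both sides of the y axis and draw the quartiles inside it; that mirrored shape is the “violin.”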
A very simple piece of R code (you can get it on his site) with a great visual impact. Great job Milk Trader!