16 Oct 2007
Posted by Tom as Neural Nets
Sarah left a comment about modeling the microstructure of financial time series data. I think she means “high frequency data,” which is what all the tick and sales data is commonly called. She specifically asked:
I’ve never modeled this type of data because I don’t have access to it, so unfortunately I don’t have plots or figures to share with her. However, I did think about how to set up this type of experiment and what the challenges might be.
I would think the hardest part of modeling this data would be figuring out which patterns you want Rapidminer to look for. My guess is that you want to know the price outcome when similar patterns of large block buying or selling occur before the stock rises or falls rapidly. It makes sense, right? If Goldman Sachs starts buying 10,000 shares of XYZ, other traders will jump on.
The biggest challenge would be to find those large blocks (possibly through a filter using Excel) and tag those records with a nominal or numerical label. Of course, this would be an insane task if you had thousands of records for hundreds of stocks over a month’s time! To overcome this, I would suggest mining your data at a higher level first, such as identifying high-volume and large-moving stocks, tagging them with a label (UP, DOWN, INTERESTING CANDIDATE, etc.), and then selecting those for your final analysis.
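To make the tagging step a bit more concrete, here is a minimal Python sketch of the idea. Everything in it is an assumption for illustration: the `Trade` record layout, the 10,000-share block threshold, the 60-second look-ahead window, the 0.2% move cutoff, the FLAT label, and the sample trades themselves are all made up. It only shows one way to filter out block trades and label each one by what the price did shortly afterward.

```python
from dataclasses import dataclass


@dataclass
class Trade:
    symbol: str
    time: int      # seconds since the open (hypothetical unit)
    price: float
    size: int      # shares traded


def label_blocks(trades, block_size=10_000, horizon=60, move=0.002):
    """Tag each block trade UP, DOWN, or FLAT based on the later price.

    block_size, horizon, and move are illustrative thresholds, not
    recommendations. FLAT is a hypothetical catch-all label.
    """
    ordered = sorted(trades, key=lambda t: (t.symbol, t.time))
    labeled = []
    for i, t in enumerate(ordered):
        if t.size < block_size:
            continue  # not a block trade, skip it
        # Prices of later trades in the same symbol inside the window
        later = [u.price for u in ordered[i + 1:]
                 if u.symbol == t.symbol and u.time <= t.time + horizon]
        if not later:
            continue  # nothing to measure the outcome against
        change = (later[-1] - t.price) / t.price
        if change > move:
            label = "UP"
        elif change < -move:
            label = "DOWN"
        else:
            label = "FLAT"
        labeled.append((t, label))
    return labeled


# Made-up sample records, for demonstration only
trades = [
    Trade("XYZ", 0, 100.00, 500),
    Trade("XYZ", 10, 100.00, 12_000),
    Trade("XYZ", 30, 100.20, 800),
    Trade("XYZ", 50, 100.50, 300),
    Trade("XYZ", 200, 100.40, 15_000),
    Trade("XYZ", 230, 100.00, 400),
]

labels = [(t.time, label) for t, label in label_blocks(trades)]
print(labels)
```

The labeled records could then be exported and used as the label column for whatever learner you pick. Scanning every later trade for each block is slow on real tick data, so this is only meant to show the logic.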
This problem is nothing more than trying to find a needle in a haystack, but it has its merits. After all, a lot of traders I know who read the tape do quite well using their biological neural network.
FYI, I would start out using a classification operator in Rapidminer, then maybe a backpropagation operator.
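For readers who want to see what the classification step looks like outside of Rapidminer, here is a rough Python sketch. It is not a Rapidminer operator and not a full neural net: it trains a single logistic unit by gradient descent, which is the simplest case of what a backpropagation learner does layer by layer. The two features (signed block size in units of 10,000 shares, and relative volume) and all of the numbers are invented for illustration.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train(rows, targets, epochs=2000, lr=0.1):
    """Fit one logistic unit with per-sample gradient descent."""
    w = [0.0] * len(rows[0])
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(rows, targets):
            p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
            err = p - y  # gradient of the log loss w.r.t. the pre-activation
            w = [wi - lr * err * xi for wi, xi in zip(w, x)]
            b -= lr * err
    return w, b


def predict(w, b, x):
    p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
    return "UP" if p >= 0.5 else "DOWN"


# Invented toy data: (signed block size / 10,000 shares, relative volume)
# Positive block size stands for buying, negative for selling.
rows = [(1.2, 2.0), (0.8, 1.5), (1.5, 3.0),
        (-1.0, 2.5), (-1.4, 1.8), (-0.9, 2.2)]
targets = [1, 1, 1, 0, 0, 0]  # 1 = UP, 0 = DOWN

w, b = train(rows, targets)
print(predict(w, b, (1.0, 1.75)))
print(predict(w, b, (-1.2, 2.15)))
```

Real tick data will be nowhere near this cleanly separable, so treat this as a way to check your labeling pipeline end to end before trying a proper multi-layer network.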
17 Responses
Sarah
October 18th, 2007 at 2:50 pm
Hi Tom,
Thanks for writing a post on my question.
Knowing what to look for in the data is certainly a very important question.
You have put your finger on the pulse.
I will try to see if I can extract more information. If I can, I will share it with everyone.
As they say, “The devil is in the detail!”
~Sarah
Tom
October 19th, 2007 at 5:23 am
Sarah,
Let me know how you make out. I would love to help you troubleshoot this interesting project.
Shane B
November 25th, 2007 at 7:54 pm
Sarah,
I too have been wondering whether it would be possible to do some analysis on Time and Sales data. I did some searching and came up with this term: ‘data stream mining’. There is a page on Wikipedia about it here: http://en.wikipedia.org/wiki/Data_stream_mining
and much to my surprise, it lists Rapid Miner (Yale) as software for data stream mining! There is a plugin for RM called the Data Stream plugin.
Now if only someone could figure out how to use it.
Tom
November 27th, 2007 at 6:11 am
Shane: I wish I could help, but I don’t have access to a data stream.
Dan
November 28th, 2007 at 4:36 am
Tom, have you looked at OpenTick? They have the streaming data you require for free.
Dan
Tom
November 28th, 2007 at 6:11 am
Dan,
This is awesome, thanks so much. I will check into this when I have some more time.
Shane B
November 29th, 2007 at 9:46 am
I tried OpenTick a while back and couldn’t get their sample apps to work, so I gave up on them. I decided to give it another go for this project. Their sample app still doesn’t work, but I set up a Java project, ran their code sample with my login, and changed the stock symbol to \YMZ7, which is the current Dow mini futures contract (the futures contract on the DJIA 30; real-time data for it is free), and voila, it works. The quotes were streaming in real time to the console window!
But I wondered if this data was valid, so I ran Tradestation and opened up a time and sales window for the YM. I put both programs side by side, hit the print screen button, and copied the result to Paintbrush. Every trade in both programs matched up perfectly (all in the same order). I did this about 5 times with no differences. That’s pretty amazing!
Not bad for free.
Tom
November 30th, 2007 at 6:05 am
You know, I’ll have to check it out when I have time. Unless some readers want to help me out? I could use a few “assistants.”
Sarah
February 4th, 2008 at 5:36 pm
Dan and Shane,
Thanks for the information about OpenTick. It works like a charm.
I just downloaded two years of minute data for GOOG for free! Just awesome!
Thanks,
~Sarah
linda
March 30th, 2008 at 11:51 am
Hi,
For my master’s thesis I want to do some analyses on high frequency data, using a model which has stochastic volatility. I found it hard to find data to use. The only thing I could find was daily data on Yahoo, but my model doesn’t work with that.
I have a Mac, so I cannot use the OpenTick software. Moreover, it is currently not possible to sign up for OpenTick.
My question is whether one of you could send me some high frequency data which you have already downloaded?
Hope to hear from you.
linda.vos1 ‘at’ gmail.com
Shane
March 30th, 2008 at 12:52 pm
Linda,
OpenTick has a Java library that will work on the Mac (any operating system w/ a JVM).
Here is some market depth data from the S&P 500 emini futures contract:
http://www.spectrumtech.net/jbt/es-Feb12-Mar26.zip
Shane
Shane
March 30th, 2008 at 1:48 pm
Linda,
Please let me know once you have downloaded the file.
Shane
linda
March 30th, 2008 at 2:03 pm
Thanks a lot, Shane, for your help! I’m downloading at the moment, so I will probably be finished in an hour.
I will try the Java library as soon as OpenTick accepts new users again. Thanks a lot
Linda
linda
March 30th, 2008 at 6:13 pm
I just got a message that the download failed. Can you please give me some more time?
Shane
March 30th, 2008 at 7:36 pm
You must be on dialup.
Ok, try again.
Tom
March 31st, 2008 at 5:14 am
Shane: Thanks for helping Linda out.
Linda: Good luck with your graduate work!
linda
March 31st, 2008 at 6:16 am
Thanks a lot for your help, Shane. It is just perfect!!!