Sarah left a comment about modeling the microstructure of financial time series data. I think she means “high frequency data,” which is what tick and sales data is commonly referred to as. She specifically asked:

  1. Have you ever analyzed financial time series data? I mean digging into microstructure data or processing tick data.
  2. What are the things one looks for?
  3. What are the biggest challenges of such data set?
  4. Do you have some plots or figures you can share with me?

I’ve never modeled this type of data because I don’t have access to it, so unfortunately I don’t have plots or figures to share with her. However, I did think about how to set up this type of experiment and what the challenges might be.

I would think the hardest part of modeling this data would be figuring out which patterns you want RapidMiner to look for. My guess is that you want to know the price outcome when similar patterns of large block buying or selling occur before the stock rises or falls rapidly. It makes sense, right? If Goldman Sachs starts buying 10,000 shares of XYZ, other traders will jump on.

The biggest challenge would be to find those large blocks (possibly through a filter in Excel) and tag those records with a nominal or numerical label. Of course, this would be an insane task if you had thousands of records for hundreds of stocks over a month’s time! To overcome this, I would suggest mining your data at a higher level first: identify the large-volume and large-moving stocks, tag them with a label (UP, DOWN, INTERESTING CANDIDATE, etc.), and then select those for your final analysis.
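For anyone who would rather script the filtering and tagging step than do it by hand in Excel, here is a rough Python sketch of the idea. Everything in it is my own assumption for illustration: the `Tick` record layout, the `label_block_trades` helper, the 10,000-share block threshold, the 300-second look-ahead window, the 0.5% move cutoff, and the FLAT label for blocks that go nowhere. The sample trades are made up, and the list is assumed to be sorted by time.

```python
from dataclasses import dataclass


@dataclass
class Tick:
    # Hypothetical record layout; real tick data will have its own fields.
    symbol: str
    time: int     # seconds since the open
    price: float
    size: int     # shares traded


def label_block_trades(ticks, block_size=10_000, horizon=300, move=0.005):
    """Tag each trade of at least block_size shares by the price move that follows.

    ticks must be sorted by time. All thresholds are illustrative guesses.
    """
    labeled = []
    for i, block in enumerate(ticks):
        if block.size < block_size:
            continue  # not a large block, skip it
        # Later trades in the same symbol within `horizon` seconds of the block.
        later = [
            t for t in ticks[i + 1:]
            if t.symbol == block.symbol and t.time <= block.time + horizon
        ]
        if not later:
            continue  # nothing to measure the outcome against
        # Relative price change from the block to the last trade in the window.
        change = (later[-1].price - block.price) / block.price
        if change >= move:
            label = "UP"
        elif change <= -move:
            label = "DOWN"
        else:
            label = "FLAT"
        labeled.append((block.symbol, label))
    return labeled


# Made-up sample data, sorted by time.
sample = [
    Tick("XYZ", 0, 100.0, 10_000),
    Tick("ABC", 10, 50.0, 20_000),
    Tick("XYZ", 60, 100.2, 200),
    Tick("ABC", 100, 49.5, 300),
    Tick("XYZ", 200, 101.0, 500),
    Tick("XYZ", 400, 99.0, 100),
]
result = label_block_trades(sample)
print(result)
```

The labeled rows are what you would then hand to a classifier as the target column. The look-ahead window and cutoff are the knobs I would expect to spend the most time tuning.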

This problem is nothing more than trying to find a needle in a haystack, but it has its merits. After all, a lot of traders I know who read the tape do quite well using their biological neural network. :)

FYI, I would start out using a classification operator in RapidMiner, then maybe a backpropagation operator.