In this video we continue building a financial time series model using S&P500 daily OHLCV data and the Windowing, Sliding Validation, and Forecasting Performance operators. We then test the model on some out-of-sample S&P500 data.
This video can be viewed in HQ by clicking this link. Please make sure you have QuickTime or another MP4-capable player installed in your browser.
Here are the XLS training and out of sample files.
Nicolas, I suppose you could add the next day’s (Tuesday) data to your training set and then retrain. Or you can use the Join operator to join the two (or more) days together into one training set.
Yes, I tried that, but the data set is a little big. I was trying to figure out how to retrain the model with only the new data (Tuesday), not with the whole data set (Monday + Tuesday).
Read Monday’s data
train model
write model to file
…
…
… (24 hours pass)
Read Tuesday’s data
Read model from file
retrain model
write new model to file
I don’t know if this is possible.
Thanks again, man!
Just load in only the Tuesday data and train on that, then write out the new model as the Tuesday model?
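The read, train, write, reload, retrain loop described above can be sketched outside RapidMiner. This is a minimal Python illustration, not a RapidMiner process: it assumes a tiny linear model updated by stochastic gradient descent, and every name and number in it (`sgd_update`, `toy_model.json`, the Monday and Tuesday rows) is made up for the example. Whether a particular RapidMiner learner can be updated this way is not something this thread confirms.

```python
import json
import os
import tempfile


def sgd_update(model, rows, learning_rate=0.01, epochs=50):
    """Update a linear model in place with SGD on (features, target) rows."""
    weights, bias = model["weights"], model["bias"]
    for _ in range(epochs):
        for features, target in rows:
            prediction = bias + sum(w * x for w, x in zip(weights, features))
            error = prediction - target
            for i, x in enumerate(features):
                weights[i] -= learning_rate * error * x
            bias -= learning_rate * error
    model["bias"] = bias
    return model


def save_model(model, path):
    with open(path, "w") as handle:
        json.dump(model, handle)


def load_model(path):
    with open(path) as handle:
        return json.load(handle)


# Made-up toy rows: ([feature_1, feature_2], target)
monday = [([0.1, 0.2], 0.5), ([0.3, 0.1], 0.7)]
tuesday = [([0.2, 0.4], 0.8), ([0.4, 0.3], 1.1)]

model_path = os.path.join(tempfile.gettempdir(), "toy_model.json")

# Day 1: train on Monday only and write the model to a file.
model = sgd_update({"weights": [0.0, 0.0], "bias": 0.0}, monday)
save_model(model, model_path)

# Day 2: read the stored model, update it with Tuesday only, write it back.
model = load_model(model_path)
monday_weights = list(model["weights"])
model = sgd_update(model, tuesday)
save_model(model, model_path)
```

The Monday rows are never re-read on day 2. The trade-off is that each update pulls the model toward the newest data, so it is not guaranteed to match a model trained on Monday and Tuesday together.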
Thanks for this tutorial Thomas,
I duplicated this tutorial with daily EURUSD data and added a couple of extra inputs which I exported from Metatrader. What would you consider an acceptable prediction trend accuracy with an SVM dot kernel and forex data? In the end the model seemed much better at predicting highs and lows than the close, so now I am predicting all three. Is there a way to merge the resulting example sets so that the predicted H, L, and C are on one chart?
Thanks again,
Alex
Hi Alex. Glad you liked the tutorials. I would suggest trying an RBF kernel instead of the dot kernel; RBF kernels usually work better for forex data. The acceptable prediction trend accuracy really depends on you, so I don’t know what that would be.
For merging example sets, I haven’t done it but I suspect it can be done. Are you creating three different models for H, L, and C? If yes, they output three different example sets, which you can then perhaps combine using the Join operator and save as one example set. Then you can use the Series Multiple graph to display them. Like I said, I haven’t done it, but it should be feasible in RM.
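The idea behind joining the three example sets can be illustrated outside RapidMiner. The sketch below is plain Python, not the Join operator itself: it assumes each model's predictions are keyed by date and keeps only the dates present in all three, like an inner join. The function name `join_on_date` and all the prices are made up for the example.

```python
def join_on_date(**named_series):
    """Inner-join several {date: value} mappings on the dates they share."""
    shared_dates = set.intersection(*(set(s) for s in named_series.values()))
    return [
        {"date": date,
         **{name: series[date] for name, series in named_series.items()}}
        for date in sorted(shared_dates)
    ]


# Made-up predictions from three separate models (ISO dates sort correctly).
pred_high = {"2010-03-01": 1.3650, "2010-03-02": 1.3680, "2010-03-03": 1.3720}
pred_low = {"2010-03-01": 1.3540, "2010-03-02": 1.3575, "2010-03-03": 1.3610}
pred_close = {"2010-03-02": 1.3630, "2010-03-03": 1.3690}  # first day missing

merged = join_on_date(pred_high=pred_high, pred_low=pred_low,
                      pred_close=pred_close)

for row in merged:
    print(row)
```

Each merged row carries the date plus one column per model, which is the shape a multi-series chart needs. Dates missing from any one set are dropped, so it is worth checking the row count after the join.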
Thanks Thomas,
After some head scratching, I finally got the Join operator to work. I did try RBF but seem to get better results with the dot kernel. I have added extra inputs and the model is doing a good job of finding support and resistance. Are your forums open?
Alex
My forums are closed because of intense spam bot activity. I have to find a better forum system and then figure out how to migrate them to it.
Hey Tom.
First of all: Great videos! Thanks so much for creating/sharing those. They are of great help!
But I have a question: Why exactly is windowing necessary? I see and understand what it does. But even without windowing, RapidMiner would still “see” all the previous prices of previous days.
Is it because windowing creates an emphasis on, let’s say, the last 5 days (or whatever else you set as your windowing range) instead of the previous 1000 days, and is therefore more of a short-term indicator than it would be without windowing? In other words: if I only trade long-term, windowing would be of little or no use to me?
Or is it something completely different?
Thanks in advance!!
Hi Carsten,
Thanks, I’m glad you liked the videos. The Rapid-I guys, if they still read this blog, will hopefully chime in here, but the current Windowing operator evolved from two separate operators in previous versions. Windowing is necessary because of how the time series data itself can be collected. It comes in handy when you have one long column of time series data, say data from drilling operations. You could have the timestamp in cell A1, the depth in cell A2, the intensity in A3, and then it repeats with A4 for the timestamp, A5 for the depth, and so on. Essentially you create a series of vertical windows.
The way I represent the financial time series data in my videos is through example rows, where each row is windowed horizontally and a new window is created for the next example row.
I hope this clears it up for you. Just post more questions if you have them.
Thanks for your answer. I totally understand what you did and how the Windowing operator behaves. My question is: Why is it necessary?
Because what you do is basically just duplicate data you already have. That’s what my question was about: Why do you have to duplicate data?
Example: If you didn’t use windowing, RapidMiner would still know that Jan 30 is one step before Jan 31 and so on, right?
Sorry I missed that part of your original question. I use the windowing operator to create the “one step ahead” label for prediction. For example, I use the “create label” parameter to shift my closing price one day ahead before feeding into the Sliding X-Val Operator to build a model. I’m not “duplicating” data per se, but transforming it.
You can, of course, manually shift your label ahead by one day in your data set and then feed it into the Sliding X-Val, but that can become tedious.
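The “one step ahead” label can be illustrated with a short sketch. This is plain Python rather than RapidMiner, and the names `window_with_label`, `window_size`, and `horizon` are illustrative choices here, not the Windowing operator's actual parameters. Each output row holds the last few closes as inputs and the close that follows them as the label.

```python
def window_with_label(series, window_size, horizon=1):
    """Turn a single series into (window, label) rows.

    The label is the value `horizon` steps after the end of the window.
    """
    rows = []
    last_start = len(series) - window_size - horizon
    for start in range(last_start + 1):
        window = series[start:start + window_size]
        label = series[start + window_size + horizon - 1]
        rows.append((window, label))
    return rows


# Made-up closing prices, oldest first.
closes = [10.0, 11.0, 12.0, 13.0, 14.0, 15.0]
rows = window_with_label(closes, window_size=3)

for window, label in rows:
    print(window, "->", label)
```

The same prices do appear in several rows, but each row pairs them with a different target. A learner that treats every row independently only sees the columns of that row, so without the window the earlier prices and the next-day label would not be part of the example.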
You can analyze it another way, without windowing, by building a model from just your example rows. I’ve done this and it works just fine. Of course, you wouldn’t use the Sliding X-Val operator if you do it this way; you’d use the regular X-Val operator.
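For comparison, the idea behind sliding-window validation can be sketched as a list of train and test index ranges that move forward through the data. This is a minimal Python illustration under assumed names (`sliding_splits`, `train_size`, `test_size`, `step`), not the RapidMiner operator or its parameters.

```python
def sliding_splits(n_rows, train_size, test_size, step):
    """Return (train_indices, test_indices) pairs that slide forward in time.

    Every test block starts right after the training block it is paired with.
    """
    splits = []
    start = 0
    while start + train_size + test_size <= n_rows:
        train = list(range(start, start + train_size))
        test = list(range(start + train_size,
                          start + train_size + test_size))
        splits.append((train, test))
        start += step
    return splits


splits = sliding_splits(n_rows=10, train_size=4, test_size=2, step=2)

for train, test in splits:
    print("train", train, "test", test)
```

Every test row comes after the rows the model was trained on, which mimics forecasting. A regular cross-validation may mix earlier and later rows in the same training fold, which is fine when the example rows are treated as independent of their order.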