Rapidminer 5.0 Video Tutorial #10 – Financial Time Series Modeling Part 2

In this video we continue building a financial time series model using daily S&P500 OHLCV data and the Windowing, Sliding Window Validation, and Forecasting Performance operators. We then test the model on some out-of-sample S&P500 data.
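
The core idea behind sliding window validation fits in a few lines of Python. This is a minimal sketch that assumes the examples are already sorted by date. The function and its parameters (`train_width`, `test_width`, `step`, `gap`) are hypothetical and only loosely mirror the settings of RapidMiner's Sliding Window Validation operator. Unlike ordinary cross-validation, every test window lies strictly after its training window, so the model is never trained on days that come after the days it is tested on.

```python
def sliding_window_splits(n_examples, train_width, test_width, step, gap=0):
    """Yield (train_indices, test_indices) pairs that move forward in time.

    `gap` is a hypothetical parameter: the number of examples skipped
    between the end of a training window and the start of its test window.
    """
    if min(train_width, test_width, step) <= 0 or gap < 0:
        raise ValueError("widths and step must be positive, gap non-negative")
    start = 0
    while True:
        train_end = start + train_width      # exclusive end of the training window
        test_start = train_end + gap         # test data always follows training data
        test_end = test_start + test_width   # exclusive end of the test window
        if test_end > n_examples:
            break                            # not enough data left for a full test window
        yield list(range(start, train_end)), list(range(test_start, test_end))
        start += step                        # slide both windows forward


# Usage: 10 daily examples, train on 4 days, test on the next 2, slide by 2.
splits = list(sliding_window_splits(10, train_width=4, test_width=2, step=2))
for train, test in splits:
    print(train, "->", test)
```

With these settings the loop prints three splits, from `[0, 1, 2, 3] -> [4, 5]` to `[4, 5, 6, 7] -> [8, 9]`.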


This video can be viewed in HQ by clicking this link here. Please make sure you have QuickTime or another MP4-capable plugin installed in your browser.

Here are the XLS training and out-of-sample files:

  1. S&P500 Training XLS
  2. S&P500 Out of Sample XLS
  • http://www.neuralmarkettrends.com Thomas Ott

    Nicolas, I suppose you could add the next day’s (Tuesday’s) data to your training set and then retrain. Or you could use the Join operator to join the two (or more) days into one training set.

  • Nicolás

    Yes, I tried it, but the data set is a little big. I was trying to figure out how to retrain the model with only the new data (Tuesday), not with the whole data set (Monday + Tuesday).

    Read Monday’s data
    Train model
    Write model to file

    … (24 hours pass)
    Read Tuesday’s data
    Read model from file
    Retrain model
    Write new model to file

    I don’t know if this is possible.

    Thanks again, man!

  • http://www.neuralmarkettrends.com Thomas Ott

    Just load only the Tuesday data, train on that, and then write the new model out as the Tuesday model?
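
Tom's suggestion trains a fresh model on the Tuesday data alone, which replaces Monday's model rather than updating it. Nicolás's pseudocode asks for something different: reading the saved model back and updating it with only the new rows. That only works with a learner that supports incremental updates; batch learners such as a standard SVM are normally retrained from scratch. The sketch below is a minimal Python illustration under those assumptions. `OnlineLinearModel` is a hypothetical stochastic-gradient-descent regressor (no intercept, to keep it short), the file names are placeholders, the data is a toy example, and none of this is a RapidMiner operator.

```python
import json
import tempfile
from pathlib import Path


class OnlineLinearModel:
    """Hypothetical linear regressor (no intercept) updated by SGD."""

    def __init__(self, n_features, learning_rate=0.01):
        self.weights = [0.0] * n_features
        self.learning_rate = learning_rate

    def predict(self, x):
        return sum(w * xi for w, xi in zip(self.weights, x))

    def partial_fit(self, rows, labels, epochs=50):
        # Adjust the existing weights using only the rows passed in; what was
        # learned earlier lives in the current weights, not in the old data.
        for _ in range(epochs):
            for x, y in zip(rows, labels):
                error = self.predict(x) - y
                self.weights = [w - self.learning_rate * error * xi
                                for w, xi in zip(self.weights, x)]
        return self


def save_model(model, path):
    # Store only the learned state, as JSON, so it can be reloaded the next day.
    path.write_text(json.dumps({"weights": model.weights,
                                "learning_rate": model.learning_rate}))


def load_model(path):
    state = json.loads(path.read_text())
    model = OnlineLinearModel(len(state["weights"]), state["learning_rate"])
    model.weights = state["weights"]
    return model


model_dir = Path(tempfile.gettempdir())

# Monday: read Monday's data, train the model, write it to a file.
monday_x, monday_y = [[1.0], [2.0], [3.0]], [2.0, 4.0, 6.0]   # toy data, y = 2x
save_model(OnlineLinearModel(n_features=1).partial_fit(monday_x, monday_y),
           model_dir / "model_monday.json")

# Tuesday: read Tuesday's data, read the model from file, update it, write it back.
tuesday_x, tuesday_y = [[4.0], [5.0]], [8.0, 10.0]
model = load_model(model_dir / "model_monday.json").partial_fit(tuesday_x, tuesday_y)
save_model(model, model_dir / "model_tuesday.json")
print(round(model.predict([6.0]), 2))
```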

  • Alex Fleming

    Thanks for this tutorial, Thomas,

    I duplicated this tutorial with daily EURUSD data and added a couple of extra inputs that I exported from MetaTrader. What would you consider an acceptable prediction trend accuracy with an SVM dot kernel on forex data? In the end the model seemed much better at predicting highs and lows than the close, so now I am predicting all three. Is there a way to merge the resulting example sets so that the predicted H, L, and C appear in one chart?

    Thanks again,

    Alex
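
One way to judge whether a prediction trend accuracy is acceptable is to compare it against a naive baseline, such as always predicting an up move. The Python sketch below assumes a common definition of trend accuracy: the fraction of days on which the predicted move and the actual move, both measured from the previous actual value, have the same sign. RapidMiner's Forecasting Performance operator may define it slightly differently (for example, in how flat days are counted), and all prices here are made up.

```python
def prediction_trend_accuracy(actual, predicted):
    """Share of steps where predicted and actual moves point the same way.

    Assumed definition: both moves are measured from the previous actual
    value, and a step counts as a hit only if the product of the two moves
    is strictly positive (flat moves count as misses).
    """
    if len(actual) != len(predicted) or len(actual) < 2:
        raise ValueError("need two equally long series with at least two values")
    hits = 0
    for i in range(1, len(actual)):
        actual_move = actual[i] - actual[i - 1]
        predicted_move = predicted[i] - actual[i - 1]
        if actual_move * predicted_move > 0:
            hits += 1
    return hits / (len(actual) - 1)


actual = [10.0, 11.0, 10.5, 11.5, 12.0]       # made-up closing prices
model_pred = [10.0, 10.8, 10.9, 11.2, 11.3]   # made-up one-step-ahead predictions
always_up = [actual[0]] + [a + 1.0 for a in actual[:-1]]  # naive "price goes up" baseline

print(prediction_trend_accuracy(actual, model_pred))
print(prediction_trend_accuracy(actual, always_up))
```

Here both the model and the always-up baseline score 0.75, which would suggest the model has learned little about direction beyond the fact that this made-up series mostly rises.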

  • http://www.neuralmarkettrends.com Thomas Ott

    Hi Alex. Glad you liked the tutorials. I would suggest trying an RBF kernel instead of the dot kernel; RBF kernels usually work better for forex data. What counts as an acceptable prediction trend accuracy really depends on you, so I can’t say what that would be.

    As for merging example sets, I haven’t done it, but I suspect it can be done. Are you creating three different models for H, L, and C? If so, they output three different example sets, which you can then perhaps combine using the Join operator and save as one example set. Then you can use the Series Multiple graph to display them. Like I said, I haven’t done it, but it should be feasible in RM.
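
Conceptually, a join lines rows up on a shared key (here the date) and places the attributes from each input side by side, which is what puts the predicted H, L, and C into one example set for charting. The Python sketch below illustrates that inner join with plain dictionaries. The column names (`date`, `pred_high`, and so on) and values are hypothetical, and in RapidMiner the key would typically be an ID attribute.

```python
def inner_join(key, *tables):
    """Inner-join lists of row dicts on a shared key column (e.g. the date)."""
    indexed = [{row[key]: row for row in table} for table in tables]
    common = set(indexed[0]).intersection(*indexed[1:])  # keys present in every table
    joined = []
    for k in sorted(common):                             # ISO dates sort chronologically
        merged = {}
        for index in indexed:
            merged.update(index[k])                      # add this model's columns
        joined.append(merged)
    return joined


high = [{"date": "2010-03-01", "pred_high": 1.3650}, {"date": "2010-03-02", "pred_high": 1.3640}]
low = [{"date": "2010-03-01", "pred_low": 1.3520}, {"date": "2010-03-02", "pred_low": 1.3510}]
close = [{"date": "2010-03-01", "pred_close": 1.3600}]   # one day missing: dropped by the inner join

for row in inner_join("date", high, low, close):
    print(row)
```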

  • Alex Fleming

    Thanks Thomas,

    After some head scratching, I finally got the Join operator to work. I did try RBF, but I seem to get better results with DOT. I have added extra inputs, and the model is doing a good job of finding support and resistance. Are your forums open?

    Alex
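
An SVM sees the data only through its kernel function. The dot kernel is the plain inner product (a linear model), while the RBF kernel measures similarity by squared distance, scaled by a width parameter gamma. The Python sketch below shows both functions under their textbook definitions; it is not RapidMiner code, and the vectors are made up. Because RBF results depend heavily on gamma (and on the SVM's C parameter), an untuned RBF kernel can score below the dot kernel, so it is usually worth trying several values.

```python
import math


def dot_kernel(x, y):
    # Linear ("dot") kernel: the inner product of the two feature vectors.
    return sum(a * b for a, b in zip(x, y))


def rbf_kernel(x, y, gamma=1.0):
    # RBF kernel: exp(-gamma * ||x - y||^2); 1.0 for identical inputs,
    # decaying toward 0 as the points move apart (faster for larger gamma).
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * squared_distance)


x, y = [1.0, 2.0], [2.0, 0.0]   # two made-up feature vectors
print(dot_kernel(x, y))
for gamma in (0.01, 0.1, 1.0):
    print(gamma, rbf_kernel(x, y, gamma))
```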

  • http://www.neuralmarkettrends.com Thomas Ott

    My forums are closed because of intense spam bot activity. I have to find a better forum system and then figure out how to migrate them to it.

  • Carsten

    Hey Tom.

    First of all: Great videos! Thanks so much for creating/sharing those. They are of great help!

    But I have a question: why exactly is windowing necessary? I see and understand what it does, but even without windowing, RapidMiner would still “see” all the prices of the previous days.
    Is it because windowing puts the emphasis on, let’s say, the last 5 days (or whatever window size you set) instead of the previous 1000 days, and is therefore more of a short-term indicator than it would be without windowing? In other words: if I only trade long-term, would windowing be of little or no use to me?

    Or is it something completely different?

    Thanks in advance!!

  • http://www.neuralmarkettrends.com Tom

    Hi Carsten,

    Thanks, I’m glad you liked the videos. Hopefully the Rapid-I guys, if they still read this blog, will chime in here, but the current Windowing operator evolved from two separate operators in previous versions. Windowing lets you restructure time series data according to how it was collected. It comes in handy when you have one long column of time series data, say data from drilling operations. You could have the timestamp in cell A1, the depth in cell A2, the intensity in A3, and then the pattern repeats with the timestamp in A4, the depth in A5, etc. Essentially, you create a series of vertical windows.

    In my videos, I represent the financial time series data as example rows: each row is a horizontal window, and a new window is created for the next example row.

    I hope this clears it up for you. Just post more questions if you have them.
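
Tom's drilling example, where one column repeats timestamp, depth, and intensity, amounts to cutting the column into consecutive, non-overlapping windows of three cells and turning each window into one record. A minimal Python sketch of that idea follows. The field names and readings are hypothetical, and it illustrates the concept rather than the Windowing operator's own parameters.

```python
def column_to_records(values, fields):
    """Cut one long column into non-overlapping windows, one record per window."""
    width = len(fields)
    if len(values) % width != 0:
        raise ValueError("column length is not a multiple of the record width")
    # Step size equals window size, so each cell lands in exactly one record.
    return [dict(zip(fields, values[i:i + width]))
            for i in range(0, len(values), width)]


# Hypothetical drilling log stored in cells A1..A6.
column = ["08:00", 120.5, 0.82, "08:01", 121.0, 0.79]
for record in column_to_records(column, ["timestamp", "depth", "intensity"]):
    print(record)
```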

  • Carsten

    Thanks for your answer. I totally understand what you did and how the Windowing operator behaves. My question is: why is it necessary?

    Because what you do is basically duplicate data you already have. That’s what my question was about: why do you have to duplicate data?

    For example: if you didn’t use windowing, RapidMiner would still know that Jan 30 is one step before Jan 31 and so on, right?

  • http://www.neuralmarkettrends.com Tom

    Sorry, I missed that part of your original question. I use the Windowing operator to create the “one step ahead” label for prediction. For example, I use the “create label” parameter to shift my closing price one day ahead before feeding it into the Sliding X-Val operator to build a model. I’m not “duplicating” data per se, but transforming it.

    You can, of course, manually shift your label ahead by one day in your data set and then feed it into the Sliding X-Val, but that can become tedious.

    You can also analyze it another way, without windowing, by building a model from just your example rows; I’ve done this and it works just fine. Of course, if you do it this way you wouldn’t use the Sliding X-Val operator; you’d use the regular X-Val operator.
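
The transformation Tom describes, windowing plus a label shifted one day ahead, can be sketched in Python. It shows why the overlapping values are not redundant: each row pairs a window of past closes with the close that follows it, which turns a single series into a supervised learning table. The function name and the exact meaning of `horizon` are assumptions; RapidMiner's Windowing operator may count the horizon offset differently.

```python
def window_with_label(series, window_size, horizon=1):
    """Build (features, label) rows from one series.

    features: `window_size` consecutive values
    label:    the value `horizon` steps after the last value in the window
    """
    if window_size < 1 or horizon < 1:
        raise ValueError("window_size and horizon must be at least 1")
    rows = []
    for end in range(window_size, len(series) - horizon + 1):
        features = series[end - window_size:end]
        label = series[end - 1 + horizon]
        rows.append((features, label))
    return rows


closes = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]   # made-up daily closes
for features, label in window_with_label(closes, window_size=3):
    print(features, "->", label)
```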

  • Mazda

    Hi Tom,
    First, thanks for the great job on the videos; they are very helpful. I have a question about your Time Series Video #10.
    I’d like to forecast more than one day out, but I’m not sure how to do this, since I would first have to predict the values of the other variables. How are you generating the values for open, high, low, and volume for your out-of-sample data?
    When I test mine following your Video #10, no matter what value I use for the horizon, I always get a 1-day-out forecast, but I’d like to forecast a few days out.
    Any suggestions?
    Thanks,
    Mazda
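
One common way to forecast several days out, not necessarily the approach used in the video, is recursive forecasting: if the model's inputs are only past closing prices, a one-day-ahead model can be applied repeatedly, with each prediction fed back in as the newest input. This sidesteps the need for future open, high, low, and volume values, at the cost of errors compounding with each step. The alternative "direct" approach trains a separate model per horizon on a label shifted that many days ahead. The Python sketch below assumes a close-only window; `linear_trend_model` is a hypothetical stand-in for a trained model, and the prices are made up.

```python
from typing import Callable, List, Sequence


def recursive_forecast(history: Sequence[float],
                       predict_next: Callable[[List[float]], float],
                       window_size: int, steps: int) -> List[float]:
    """Forecast `steps` values ahead by feeding each prediction back as input."""
    if len(history) < window_size:
        raise ValueError("history is shorter than the window")
    window = list(history[-window_size:])
    forecasts = []
    for _ in range(steps):
        next_value = predict_next(window)
        forecasts.append(next_value)
        window = window[1:] + [next_value]  # drop the oldest value, append the prediction
    return forecasts


def linear_trend_model(window: List[float]) -> float:
    # Hypothetical one-day model: extend the average daily change in the window.
    avg_change = (window[-1] - window[0]) / (len(window) - 1)
    return window[-1] + avg_change


closes = [100.0, 101.0, 102.0, 103.0]   # made-up closing prices
print(recursive_forecast(closes, linear_trend_model, window_size=3, steps=3))
```

With this steadily rising toy series, the three forecasts are 104.0, 105.0, and 106.0; with a real model, each step's error would feed into the next.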