Rapidminer 5.0 Video Tutorial #3 – Building a Gold Trend Classification Model Part 2

In this video I discuss how to use the cross-validation and simple validation operators to split your data into two sets: a training set and a validation set. I also highlight the new, intuitive “quick fix” error suggestions in RapidMiner 5.0. Enjoy!
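
For readers who want to see the two splitting schemes outside the GUI, here is a minimal Python sketch. It is not RapidMiner's implementation; the function names, the 70/30 ratio, the shuffled sampling and the fixed seed are all assumptions made for illustration.

```python
import random

def split_validation(rows, train_ratio=0.7, seed=42):
    """Shuffle the rows once and cut them into a training and a validation part."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_ratio)
    return shuffled[:cut], shuffled[cut:]

def cross_validation_folds(rows, k=10, seed=42):
    """Yield (training, validation) pairs; every row is validated exactly once."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    folds = [shuffled[i::k] for i in range(k)]
    for i in range(k):
        validation = folds[i]
        training = [row for j, fold in enumerate(folds) if j != i for row in fold]
        yield training, validation

# Hypothetical stand-in for the example rows: 20 record ids.
rows = list(range(20))
train, valid = split_validation(rows)
pairs = list(cross_validation_folds(rows, k=5))
```

A simple (split) validation trains once and validates once, while cross-validation repeats the train/validate cycle k times so that every record is used for validation exactly once.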

Video download link (HQ): Rapidminer 5.0 Video Tutorial #3

See the Rapidminer 5.0 Video Tutorial #2 post for the data files used in this video.

  • http://rapid-i.com Ingo Mierswa

    Hi Thomas,
    as always, a really great video! I look forward to seeing your next videos, and I am glad that you are providing your community and the RapidMiner community with such a great resource. Thanks for that! I have linked your blog entries and your YouTube channel on our video tutorial web site for the first three videos.
    I have a small comment on this video which might help some readers: after the single split / cross validation has run, the estimate of the model performance can be taken from the "average" port (avg) of the validation operator. Just connect the "avg" port to another result port at the right border of the process, and the estimate of correctness will be reported together with the model's predictions. But don't expect this correctness estimate to be too high for the default model ;-)
    All the best and thanks again for your great work,
    Ingo (Rapid-I)
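
The averaged estimate Ingo describes can be sketched in plain Python. This is only an illustration under stated assumptions: the labels are made up, the "default model" is taken to be a majority-class predictor, and `average_accuracy` is a hypothetical helper, not a RapidMiner call.

```python
from collections import Counter

def majority_label(labels):
    """The 'default model' assumed here: always predict the most frequent label."""
    return Counter(labels).most_common(1)[0][0]

def average_accuracy(labels, k):
    """Score the majority-class guess on each fold, then average the fold scores."""
    folds = [labels[i::k] for i in range(k)]
    scores = []
    for i, validation in enumerate(folds):
        training = [lab for j, fold in enumerate(folds) if j != i for lab in fold]
        guess = majority_label(training)
        hits = sum(1 for lab in validation if lab == guess)
        scores.append(hits / len(validation))
    return sum(scores) / k, scores

# Made-up labels, not the gold data from the video.
labels = ["UP", "DOWN", "DOWN", "UP", "DOWN",
          "DOWN", "UP", "DOWN", "DOWN", "DOWN"]
avg, per_fold = average_accuracy(labels, k=5)
```

The single averaged number plays the role of what the "avg" port reports, and the per-fold scores show how much the estimate varies from fold to fold.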

  • http://www.neuralmarkettrends.com Tom

    Hi Ingo,
    Thanks so much for linking Rapid-I to NMT, I’m truly honored.

    You are right, I forgot about reporting the performance estimation in the nested process! I'll have to discuss the reporting aspects of RM 5.0 in an upcoming video. Thanks for pointing this out. Yes, the correctness estimate wasn't high at all; this was merely for tutorial purposes. =)

  • Nir

    Hi Thomas,
    Thanks for the amazing help you provide in your blog and videos for RapidMiner users.
    Can you please check the link for this video tutorial? It looks broken :)
    "Video download link (HQ): Rapidminer 5.0 Video Tutorial #3"
    Thanks,
    Nir.

  • http://www.neuralmarkettrends.com Tom

    Hi Nir,

    The link should be fixed. Thanks for pointing that out.

    Tom

  • m.d-r

    Tom, great videos. You really help to drill down into the basics of applying models, learners, and training and testing sets, which can then be expanded upon.
    Thanks.
    mdr

  • http://www.neuralmarkettrends.com Tom

    m.d-r: It's all about sharing what this powerful open source system can do. I just scratch the surface in these tutorials.

  • Lenny

    Nicely done, Tom. Thanks for the useful resources.

  • Prashant Nagpure

    Thanks for the simple but effective videos.

  • Anonymous

    You're welcome, Prashant.

  • Nicolas Meng

    Dear Tom, many thanks for these helpful tutorials. Just one question about the gold trend model: you are using labels for the gold trend such as up and down. How did you determine these? By hand, or using specific rules for determining the trend, such as regression? Thanks for an answer.

    Nicolas

  • Anonymous

    Nicolas: I originally looked at the chart and identified which areas were in the “UP” trend and which were in the “DOWN” trend. Of course, you can use something like an ATR function, compare the ATR price to the closing price, and label “UP” if the closing price > ATR and “DOWN” if ATR > closing price to mark a trend break.
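
One possible reading of the ATR rule above, sketched in Python. The reply does not say how the "ATR price" is built, so this sketch assumes it is a trailing level: the highest close of the previous `window` bars minus `multiplier` times the average true range of those bars. The window, the multiplier, the function names and the prices are all made up for illustration.

```python
def true_ranges(highs, lows, closes):
    """True range per bar; the first bar has no previous close, so use high - low."""
    ranges = [highs[0] - lows[0]]
    for i in range(1, len(closes)):
        prev_close = closes[i - 1]
        ranges.append(max(highs[i] - lows[i],
                          abs(highs[i] - prev_close),
                          abs(lows[i] - prev_close)))
    return ranges

def label_trend(highs, lows, closes, window=3, multiplier=2.0):
    """Label a bar UP if its close is above the assumed ATR level, else DOWN."""
    trs = true_ranges(highs, lows, closes)
    labels = [None] * len(closes)  # the first `window` bars cannot be labeled
    for i in range(window, len(closes)):
        atr = sum(trs[i - window:i]) / window
        level = max(closes[i - window:i]) - multiplier * atr
        labels[i] = "UP" if closes[i] > level else "DOWN"
    return labels

# Made-up prices: a short rise followed by a break lower.
closes = [100.0, 102.0, 104.0, 106.0, 99.0, 95.0, 94.0]
highs = [c + 1.0 for c in closes]
lows = [c - 1.0 for c in closes]
labels = label_trend(highs, lows, closes)
```

A rule like this labels every bar mechanically, which is convenient but also means a single sharp move can flip the label; hand labeling avoids that at the cost of effort.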

  • Nicolas Meng

    All right, I just thought of an alternative, referring to one of your tutorials about fitting a trendline with the Fit Trend operator using a nonlinear SVM kernel. What you get is a trendline which is almost perfect but subject to data snooping. However, I thought that since it is based on future data, you could use it to label the trend direction automatically and then train your supervised learner on exactly these labels. What do you reckon? It's just an idea; I haven't tried it so far, but I intend to.
    Cheers
    Nicolas

    P.S. Many thanks for the great RapidMiner videos; they are very helpful for getting started. I learned a lot of theory doing an MSc in Statistics, but this definitely gives me some practical insights (professors at my university are also introducing RapidMiner, by the way).
    However, my forex trials, which should predict trading direction, have been rather frustrating, since I haven’t got anything statistically significantly different from chance.
    I think there is a problem with directly using technical indicators of financial market time series, such as MAs, which are by definition non-stationary. It means that one trains an SVM or neural net on training data whose level moves as time passes. Once you apply it to your out-of-sample data, markets may be at a totally different level. Am I getting something wrong, or shouldn’t we first transform the data into a form in which repeating patterns are comparable?
    Sorry for posing so many questions, but I was curious since this is my first time applying machine learning to forex trading …

    P.P.S. One last personal question: are you American or German? Thomas Ott sounds quite German, and RapidMiner is from there as well (however, your English seems to be 100% American ;-) )
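
Nicolas's question about making patterns comparable across price levels can be illustrated with two common transformations: percentage returns and the relative distance of the close from its moving average. This is a sketch only; the function names and prices are made up, and it does not claim that either transformation makes a forex model profitable.

```python
def pct_returns(closes):
    """Percentage change from one close to the next."""
    return [(closes[i] - closes[i - 1]) / closes[i - 1]
            for i in range(1, len(closes))]

def distance_from_ma(closes, window):
    """Relative distance of each close from its simple moving average."""
    out = []
    for i in range(window - 1, len(closes)):
        ma = sum(closes[i - window + 1:i + 1]) / window
        out.append(closes[i] / ma - 1.0)
    return out

# The same made-up pattern at two very different price levels.
early = [100.0, 101.0, 102.0, 101.0]
late = [price * 10 for price in early]

early_returns, late_returns = pct_returns(early), pct_returns(late)
early_dist, late_dist = distance_from_ma(early, 3), distance_from_ma(late, 3)
```

The raw closes and a raw moving average differ by a factor of ten between the two series, while both transformed series come out the same, so a learner trained on the early period sees inputs on the same scale in the later one.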

  • Anonymous

    To answer the first and second parts of your email: this is the problem with trying to predict financial markets; it's better to look for repeatable patterns. In the video example with the Fit Trend operator, I do note in the comments that it's subject to data snooping, because we are fitting a trend line on past time series data. What happens today in the market can be radically different from what we trained for, as is typically the case.
    To answer your last question, yes. =)

  • Alex Fleming

    Hi Tom,
    I created a table with daily EURUSD, Dollar Index, Bund, Dax, Spot Gold, Brent Crude, S&P500, and 10-year notes. I set EURUSD as the label and used a Classify by Trend operator to create the up and down labels. Using 70 days out of sample, the prediction (label) is always up. I have been able to optimize weights and there are no apparent errors. Any ideas?

    kind regards,

    Alex Fleming

  • Alex Fleming

    Never mind Tom, I found a solution another way. Thanks for the tutorial.

    Alex

  • http://www.neuralmarkettrends.com Thomas Ott

    Essentially, your model isn’t sensitive enough to pick up the changes in trend. I usually try to stay away from the Classify by Trend operator because it doesn’t give me the ability to remove the noise from, say, one or two consolidation/down days that are part of an uptrend. I usually hand-code the parts of the trend and the turning points.

    Glad you found another solution!

  • Andrew

    Hi Tom,

    I’m confused as to what the prediction for ‘DOWN’ in Videos #2 and #3 is based on. Is it actually a combination of the other variables, or is it purely based on the number of UPs and DOWNs in the training data? The confidences for both (0.469 and 0.531) would suggest it is just the latter, but that wouldn’t make any sense.

    Could you please shed some light on how it is taking all the variables into consideration in its prediction, or how we can make this happen?

    Thanks…

  • http://www.neuralmarkettrends.com Tom

    Hi Andrew. The confidence levels you point out are indeed horrible. I did this video as an example to show how you would build something like this. The model takes all input variables into account when it tries to classify the trend as DOWN or UP.

  • Andrew

    Hi,

    Sorry to bug you again… I still don’t quite understand how it’s taking all the variables into account. The 0.469 and 0.531 confidence measures seem to just indicate the proportion of UPs and DOWNs in the initial ‘training dataset’ (i.e. 46.9% of the 516 records in the training set are UP and 53.1% are DOWN). It seems like that is all the predictions are based on when the model is applied to the prediction dataset. I tried my own makeshift dataset with this process and came out with the same result: the confidence measures just show the proportion of each value of the ‘label’ variable in the initial training set.

    Would using a Support Vector Machine work for this?

    Apologies if I’m not making sense, still trying to get my head around these processes…

    Thanks for your help so far,
    Andrew
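
Andrew's observation is consistent with the process using a default (majority-class) model, which Ingo's earlier comment also mentions. The sketch below assumes that is the learner; the counts of 242 UP and 274 DOWN are hypothetical, chosen so that the proportions of 516 records round to the figures Andrew quotes, and the function names and attribute values are made up.

```python
from collections import Counter

def fit_default_model(labels):
    """Learn only the label distribution; the input attributes are never looked at."""
    counts = Counter(labels)
    total = len(labels)
    confidences = {label: count / total for label, count in counts.items()}
    prediction = max(confidences, key=confidences.get)
    return prediction, confidences

def apply_default_model(model, rows):
    """Every row gets the same prediction and the same confidences."""
    prediction, confidences = model
    return [(prediction, confidences) for _ in rows]

# Hypothetical counts that add up to the 516 training records Andrew mentions.
training_labels = ["UP"] * 242 + ["DOWN"] * 274
model = fit_default_model(training_labels)

# Made-up attribute values; they have no effect on the output.
results = apply_default_model(model, [{"close": 1100.0}, {"close": 950.0}])
```

A model of this kind is only a baseline: the confidences equal the class proportions in the training set whatever the attributes are. For the attributes to influence the prediction, the default model would have to be replaced by a learner that actually uses them, such as the Support Vector Machine Andrew asks about.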