In this video I discuss how to use the cross validation and simple validation operators to split your data into two sets: a training set and a validation set. I also highlight the new, intuitive “quick fix” error suggestions in RapidMiner 5.0. Enjoy!
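For readers who want to see the idea behind the two validation styles outside RapidMiner, here is a minimal plain-Python sketch. It is not RapidMiner's implementation. `holdout_split` mimics a single (simple) split, and `k_fold_splits` mimics cross validation, where every row is used for validation exactly once. Both function names, the 70/30 ratio and the fixed seed are illustrative assumptions, and stratified sampling is not implemented.

```python
import random


def holdout_split(rows, train_ratio=0.7, seed=42):
    """Simple validation: shuffle once, then cut into a training and a validation set."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # fixed seed so the split is reproducible
    cut = round(len(shuffled) * train_ratio)
    return shuffled[:cut], shuffled[cut:]


def k_fold_splits(rows, k=10, seed=42):
    """Cross validation: yield (train, validation) pairs, one per fold.

    Each row lands in exactly one fold, so it is used for validation exactly once
    and for training in the other k - 1 iterations.
    """
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    folds = [shuffled[i::k] for i in range(k)]  # deal rows round-robin into k folds
    for i in range(k):
        validation = folds[i]
        train = [row for j, fold in enumerate(folds) if j != i for row in fold]
        yield train, validation
```

With 20 rows, `holdout_split` returns 14 training and 6 validation rows, and `k_fold_splits(rows, k=4)` yields four splits of 15 training and 5 validation rows each.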
Video download link (HQ): Rapidminer 5.0 Video Tutorial #3
See the Rapidminer 5.0 Video Tutorial #2 post for the data files used in this video
Hi Thomas,
as always, a really great video! I look forward to seeing your next videos, and I am glad that you are providing your readers and the RapidMiner community with such a great resource. Thanks for that! On our video tutorial website, I have linked your blog entries for the first three videos and your YouTube channel.
I have a small comment on this video which might help some readers: after the single split or cross validation has been performed, the estimated model performance can be taken from the "average" (avg) port of the validation operator. Just connect the "avg" port to another result port at the right border of the process, and the performance estimate will be reported together with the model's predictions. But don't expect this estimate to be too high for the default model ;-)
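To make the "average" idea concrete, here is a small plain-Python sketch of what a cross-validated estimate is: the accuracy measured on each held-out fold, averaged over all folds. It assumes the default model simply predicts the most frequent training label, which is why its estimate tends to be low. The function names are hypothetical and this is not RapidMiner code.

```python
from collections import Counter
from statistics import mean


def default_model(train_labels):
    """Assumed 'default model': always predict the most frequent training label."""
    return Counter(train_labels).most_common(1)[0][0]


def fold_accuracy(prediction, validation_labels):
    """Share of validation labels that match the single constant prediction."""
    return sum(1 for y in validation_labels if y == prediction) / len(validation_labels)


def cross_validated_accuracy(labels, k=5):
    """Average the per-fold accuracies, analogous to the value an 'avg' output reports."""
    folds = [labels[i::k] for i in range(k)]  # round-robin assignment to k folds
    scores = []
    for i in range(k):
        train = [y for j, fold in enumerate(folds) if j != i for y in fold]
        scores.append(fold_accuracy(default_model(train), folds[i]))
    return mean(scores)
```

For example, with 7 "UP" and 3 "DOWN" labels and `k=5`, the default model always predicts "UP" and the averaged estimate is 0.7, no better than the class balance itself.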
All the best and thanks again for your great work,
Ingo (Rapid-I)
Hi Ingo,
Thanks so much for linking Rapid-I to NMT, I’m truly honored.
You are right, I forgot about reporting the performance estimate in the nested process! I'll have to discuss the reporting aspects of RM 5.0 in an upcoming video. Thanks for pointing this out. Yes, the correctness estimate wasn't that high at all; this was merely for tutorial purposes. =)
Hi Thomas,
Thanks for the amazing help you provide in your blog and videos for RapidMiner users.
Can you please check the link for this video tutorial? It looks broken :)
"Video download link (HQ): Rapidminer 5.0 Video Tutorial #3"
Thanks,
Nir.
Hi Nir,
The link should be fixed. Thanks for pointing that out.
Tom
Tom, great videos. You really help drill down into the basics of applying models, learners, and training and testing sets, which can then be expanded on.
Thanks.
mdr
m.d-r: It's all about sharing what this powerful open source system can do. I just scratch the surface in these tutorials.
Nicely done, Tom. Thanks for the useful resources.
Thanks for the simple but effective videos.
You're welcome, Prashant.
Dear Tom, many thanks for these helpful tutorials. Just one question about the gold trend model: you are using labels for the gold trend such as up and down. How did you determine them? By hand, or using specific rules for determining the trend, such as regression? Thanks in advance for an answer.
Nicolas
Nicolas: I originally looked at the chart and then identified which areas were in the “UP” trend and which were in the “DOWN” trend. Of course, you could also use something like an ATR function: compare the ATR to the closing price, then label “UP” if the closing price > ATR and “DOWN” if ATR > closing price to label a trend break.
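Here is a minimal sketch of the ATR rule exactly as stated in the reply above. True range is max(high − low, |high − previous close|, |low − previous close|). As a simplifying assumption, ATR is computed here as a plain moving average of true range (Wilder's original ATR uses a smoothed average), and a close equal to the ATR is labeled "DOWN". The function names are illustrative only.

```python
def true_ranges(highs, lows, closes):
    """True range per bar; the first bar has no previous close, so it uses high - low."""
    trs = [highs[0] - lows[0]]
    for i in range(1, len(closes)):
        prev_close = closes[i - 1]
        trs.append(max(highs[i] - lows[i],
                       abs(highs[i] - prev_close),
                       abs(lows[i] - prev_close)))
    return trs


def atr(highs, lows, closes, period=14):
    """ATR as a simple moving average of true range (assumption; Wilder smooths instead)."""
    trs = true_ranges(highs, lows, closes)
    return [None if i + 1 < period else sum(trs[i + 1 - period:i + 1]) / period
            for i in range(len(trs))]


def label_trend(highs, lows, closes, period=14):
    """UP if close > ATR, otherwise DOWN (ties count as DOWN here); None until ATR exists."""
    values = atr(highs, lows, closes, period)
    return [None if a is None else ("UP" if c > a else "DOWN")
            for c, a in zip(closes, values)]
```

Because ATR measures the typical bar range rather than a price level, this literal comparison will label most bars "UP" for an instrument trading far above its daily range. You may prefer to compare the close against an ATR-based level instead; that variant is an assumption and is not shown here.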
All right, I just thought of an alternative, referring to one of your tutorials about fitting a trendline with the Fit Trend operator using a nonlinear SVM kernel. What you get is a trendline that fits almost perfectly but is subject to data snooping. However, since it is based on future data, I thought you could use it to label the trend direction automatically and then train your supervised learner on exactly these labels. What do you reckon? It's just an idea; I haven't tried it yet, but I intend to.
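The labeling idea can be sketched without the SVM. As a stand-in for the Fit Trend operator, this sketch uses a centered moving average, which, like the fitted trendline, looks at future bars and therefore can only be computed in hindsight. Each bar is labeled "UP" if the smoothed line rose into it and "DOWN" otherwise. The swap from SVM kernel to moving average and the function names are assumptions made for illustration. Such labels are only usable as training targets; any features fed to the learner must still come from past data.

```python
def centered_moving_average(values, half_window=2):
    """Average each point with up to half_window neighbours on BOTH sides.

    Using future neighbours is what makes this a hindsight (data-snooping) trendline.
    """
    n = len(values)
    smoothed = []
    for i in range(n):
        lo, hi = max(0, i - half_window), min(n, i + half_window + 1)
        smoothed.append(sum(values[lo:hi]) / (hi - lo))
    return smoothed


def labels_from_trendline(values, half_window=2):
    """Label bar i UP if the smoothed line rises from i-1 to i, else DOWN; bar 0 gets None."""
    line = centered_moving_average(values, half_window)
    return [None] + ["UP" if line[i] > line[i - 1] else "DOWN"
                     for i in range(1, len(line))]
```

For the series 1, 2, 3, 4, 5, 4, 3, 2, 1 with `half_window=1`, the first labeled bars come out "UP" up to the peak and "DOWN" afterwards.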
Cheers
Nicolas
P.S. Many thanks for the great RapidMiner videos; they are very helpful for getting started. I learned a lot of theory doing an MSc in Statistics, but this definitely gives me some practical insights (by the way, the professors at my university are also introducing RapidMiner).
However, my forex trials, which should predict trading direction, have been rather frustrating, since I haven't gotten anything statistically significantly different from chance.
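One simple way to check "significantly different from chance" is a one-sided binomial test: if the true hit rate were p = 0.5, how likely is it to get at least this many correct direction calls purely by luck? The sketch below computes P(X ≥ hits) for X ~ Binomial(trials, p) using only the standard library. It assumes the predictions are independent, which overlapping trades or autocorrelated returns can violate. It also assumes 50% is the right baseline; with unbalanced up/down days, the majority-class rate is the fairer comparison.

```python
from math import comb


def binomial_p_value(hits, trials, p=0.5):
    """One-sided p-value P(X >= hits) for X ~ Binomial(trials, p).

    A small value (e.g. below 0.05) means this many hits would be unlikely by luck alone.
    """
    return sum(comb(trials, k) * p**k * (1 - p)**(trials - k)
               for k in range(hits, trials + 1))
```

For instance, 55 correct calls out of 100 gives a p-value of roughly 0.18, so a 55% hit rate on 100 trades is not yet distinguishable from coin flipping.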
I think there is a problem with directly using technical indicators of financial time series, such as moving averages, which are by definition non-stationary. It means that one trains an SVM or neural net on training data whose level keeps moving as time passes. Once you apply it to your out-of-sample data, markets may be at a totally different level. Am I getting something wrong, or shouldn't we first transform the data into a form in which repeating patterns are comparable?
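Two common transformations address the level problem described above, although neither is taken from the original post. Log returns, ln(P_t / P_{t−1}), fluctuate around zero instead of drifting with the price. Expressing a price relative to its own moving average, P_t / MA_t − 1, turns the indicator into a scale-free percentage. The sketch below shows both. The window length and function names are illustrative assumptions, and these transforms reduce, but do not guarantee the removal of, non-stationarity.

```python
from math import log


def log_returns(prices):
    """r_t = ln(P_t / P_{t-1}); one value fewer than the number of prices."""
    return [log(curr / prev) for prev, curr in zip(prices, prices[1:])]


def price_to_ma_ratio(prices, window=3):
    """P_t / MA_t - 1 using a trailing simple moving average; None until the window fills."""
    out = []
    for i in range(len(prices)):
        if i + 1 < window:
            out.append(None)
        else:
            ma = sum(prices[i + 1 - window:i + 1]) / window
            out.append(prices[i] / ma - 1)
    return out
```

Both features are invariant to the price level: multiplying the whole series by 100 leaves the outputs unchanged, so patterns learned at one price level remain comparable at another.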
Sorry for asking so many questions, but I was curious since this is the first time I am applying machine learning to forex trading.
P.P.S. One last personal question: are you American or German? Thomas Ott sounds quite German, and RapidMiner is from there as well (however, your English seems 100% American ;-) )
To answer the first and second parts of your email: this is the problem with trying to predict financial markets; it's better to look for repeatable patterns. In the video example with the Fit Trend operator, I do note in the comments that it's subject to data snooping, because we are fitting a trend line on past time series data. What happens today in the market can be radically different from what we trained for, as is typically the case.
To answer your last question, yes. =)
Hi Tom,
I created a table with daily EURUSD, Dollar Index, Bund, DAX, Spot Gold, Brent Crude, S&P 500, and 10-year note data. I set EURUSD as the label and used a Classify by Trend operator to create the up and down labels. Using 70 days out of sample, the prediction (label) is always up. I have been able to optimize weights and there are no apparent errors. Any ideas?
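A prediction that is "always up" often means the model has collapsed onto the majority class. A quick diagnostic, sketched below in plain Python, keeps the rows in chronological order (train on everything before the last 70 days, test on those 70 days), then compares the model's accuracy with a baseline that always predicts the most common actual label. The function names and the returned dictionary are hypothetical; this is not a RapidMiner operator.

```python
from collections import Counter


def time_ordered_holdout(rows, holdout=70):
    """Chronological split: train on all but the last `holdout` rows, test on the rest."""
    if holdout >= len(rows):
        raise ValueError("holdout must be smaller than the number of rows")
    return rows[:-holdout], rows[-holdout:]


def diagnose_predictions(predictions, actual):
    """Compare accuracy with an always-predict-the-majority-class baseline."""
    accuracy = sum(p == a for p, a in zip(predictions, actual)) / len(actual)
    _, majority_count = Counter(actual).most_common(1)[0]
    return {
        "distinct_predictions": set(predictions),
        "accuracy": accuracy,
        "majority_baseline": majority_count / len(actual),
        "collapsed_to_one_class": len(set(predictions)) == 1,
    }
```

If `collapsed_to_one_class` is true and `accuracy` equals `majority_baseline`, the model has learned nothing beyond the label balance of the test period.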
kind regards,
Alex Fleming
Never mind, Tom, I found a solution in another way. Thanks for the tutorial.
Alex
Essentially, your model isn't sensitive enough to pick up the changes in trend. I usually try to stay away from the Classify by Trend operator because it doesn't let me remove the noise from, say, one or two consolidation/down days that are part of an uptrend. I usually hand code the parts of the trend and the turning points.
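Hand coding is what the reply describes. As an automated approximation of the same idea (an assumption, not the author's method), the sketch below removes short runs of labels: any run shorter than `min_run` bars takes the label of the run before it. A one- or two-day "DOWN" blip inside an uptrend therefore becomes "UP".

```python
def smooth_labels(labels, min_run=3):
    """Relabel runs shorter than min_run with the label of the preceding (smoothed) run.

    The first run is always kept as-is because it has no predecessor.
    """
    # Collapse the label sequence into [label, length] runs.
    runs = []
    for label in labels:
        if runs and runs[-1][0] == label:
            runs[-1][1] += 1
        else:
            runs.append([label, 1])

    smoothed = []
    for i, (label, length) in enumerate(runs):
        if i > 0 and length < min_run:
            label = smoothed[-1]  # absorb the short run into the previous trend
        smoothed.extend([label] * length)
    return smoothed
```

One design consequence: a genuine turning point in the last few bars is not recognized until the new run reaches `min_run` bars. Smoothing trades responsiveness for less noise, which is the same trade-off you make by eye when hand coding turning points.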
Glad you found another solution!