Calling all marketers! In this video we discuss how we can use Rapidminer to create a decision tree to help us find "sweet spots" in a particular market segment. This video tutorial uses the Rapidminer direct mail marketing data generator and a split validation operator to build the decision tree.
Video download link (HQ): Rapidminer 5.0 Video Tutorial #6
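For readers who want to see the split validation idea outside the RapidMiner GUI, here is a minimal Python sketch. It is not the process built in the video: the age and response data are invented, and a one-split "stump" stands in for the full Decision Tree operator. The function names and the 70/30 ratio are assumptions for illustration only.

```python
import random
from typing import List, Tuple

Row = Tuple[int, str]  # (age, label); both fields are made up for illustration


def split_validation(rows: List[Row], train_ratio: float = 0.7,
                     seed: int = 42) -> Tuple[List[Row], List[Row]]:
    """Shuffle the rows, then hold out the tail as an unseen test set."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_ratio)
    return shuffled[:cut], shuffled[cut:]


def fit_stump(train: List[Row]) -> int:
    """Pick the age threshold that classifies the most training rows correctly."""
    best_threshold, best_correct = train[0][0], -1
    for threshold in sorted({age for age, _ in train}):
        correct = sum((label == "response") == (age >= threshold)
                      for age, label in train)
        if correct > best_correct:
            best_threshold, best_correct = threshold, correct
    return best_threshold


def accuracy(threshold: int, rows: List[Row]) -> float:
    """Share of rows where 'age >= threshold' agrees with the label."""
    hits = sum((label == "response") == (age >= threshold) for age, label in rows)
    return hits / len(rows)


# Synthetic stand-in data: everyone aged 45 or older responds.
rows = [(age, "response" if age >= 45 else "no response") for age in range(20, 70)]
train, test = split_validation(rows)
threshold = fit_stump(train)
print(f"threshold={threshold} "
      f"train accuracy={accuracy(threshold, train):.2f} "
      f"test accuracy={accuracy(threshold, test):.2f}")
```

The point of the held-out set is that the test accuracy is measured on rows the model never saw, so it is a fairer estimate of how the "sweet spot" rule would do on new customers than the training accuracy.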
Hi Tom,
First, I want to thank you for your great tutorials on RapidMiner 5.0!
I am really looking forward to your time series tutorials, as I have tried to figure out by myself – with no success until now :( – how I can forecast time series (for at least one day in the future) with a neural network. So my question would be, how do I get RapidMiner to show me the prediction for (at least) the next day? I tried to fool around a bit with the "impute missing values" but the results seem to be strange…
All in all, keep on working buddy, I love your site :)
Hi Sven,
Forecasting time series, especially for markets, is a dicey business, and what works today won't work in a week or a month. It's easy to build a time series model in RM for a given stock symbol and train it, but how do you predict tomorrow's close?
That's always been troubling because you typically train your model with OHLCV attributes (where C – Close is your label) and then the prediction set (out of sample) needs to have OHLV for the next day to predict your Close. But if you don't know what the OHLV is for the next day, how can you predict the close?
One way to do it is to train the model to issue a BUY or SELL signal (binomial label) for t-1 (you want the signal to generate the day before) on training data that contains OHLCVS (S is for signal, not "sucks") and then predict the signal for the out-of-sample set using OHLCV.
I hope this helps. Thanks for the kind words about my tutorials and thanks for reading.
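The signal-labeling idea in the reply above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the rule "BUY if tomorrow's close is higher than today's, otherwise SELL" is one hypothetical way to define the signal, the prices are made up, and only the close is shown (the other OHLCV attributes would simply ride along on each row).

```python
from typing import List, Optional


def label_next_day_signal(closes: List[float]) -> List[Optional[str]]:
    """Attach to each day t a signal derived from the close on day t+1.

    The last day gets None because tomorrow's close is not known yet;
    that row is the one a trained model would be asked to predict.
    """
    if not closes:
        return []
    signals: List[Optional[str]] = []
    for today, tomorrow in zip(closes, closes[1:]):
        # Assumed rule: a higher close tomorrow means BUY today.
        signals.append("BUY" if tomorrow > today else "SELL")
    signals.append(None)
    return signals


closes = [10.0, 10.5, 10.2, 10.8, 11.0]  # made-up daily closes
signals = label_next_day_signal(closes)

# Rows with a known signal form the training set (the "S" in OHLCVS).
training_rows = [(c, s) for c, s in zip(closes, signals) if s is not None]
# The most recent day has attributes but no signal: the out-of-sample row.
row_to_predict = closes[-1]
print(training_rows, row_to_predict)
```

Because the label sits on the day before the move, the model only ever needs attributes that are already known when the prediction is made.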
Hi Tom,
Thank you for your quick answer! And you're right, I forgot about the hidden Markov process in neural networks – silly me :)
Generating a future signal is a great idea and I will try that!
Looking forward to your tutorial #7 :)
Sven
Hi Tom,
I do not know how to thank you. This is a great help for working with RapidMiner.
This is what I have been waiting on for a long time.
Looking forward to tutorials 7, 8, 9, and 10.
Thank you in advance.
regards
Hi scbhush, you can buy me a beer sometime. =)
Hi Tom,
No problem, you are always welcome.
Thank you once again
Hi Tom!
Thanks a lot for all these. I am new to rapidminer and your tutorials really help.
Looking forward to the next tutorials! Hope you can do one on the text mining capability of rapidminer. Like how to extract information from large volumes of texts using the Text Processing operators. :-)
Thanks again!
Jill: I was planning on doing a text processing one for the next batch of videos in late Spring. I plan on re-engineering this site after the tutorial #10 and taking a break. Thanks for reading and I'm glad you enjoy these tutorials.
Tom,
This was another good video. Thanks!
One question that keeps coming up for me, and possibly will be answered in a later video, is what technology is actually in the underlying model? Is it something that has logical rules, or is it something neural network based? I _think_ that this one was logical and the rules could be converted to some other environment, but maybe others are neural and couldn’t be used anywhere else.
Maybe a future video could cover how to mine and extract rules that could be converted to code in another language.
- Mark
@Mark: Are you referring to the actual formula and theory behind an artificial neural network? Or are you referring to the parameters in the operators? If you are interested in “refining” the parameters, such as learning rate and/or momentum, you can use the Parameter Optimization operator.
Tom – I’m interested in discovering non-genetic, algebraic solutions using RM. I day trade and neural networks aren’t going to play well inside of TradeStation. Personally I’m not comfortable with trying to tie Rapid Miner to TradeStation in a live setup. However if I can mine some solutions that use my technical indicators with some sort of weightings then I can write that back into TradeStation code and backtest what I’m finding in Rapid Miner for accuracy.
As an example, one of the first tutorials looks at the Golf sample data and builds a simple decision tree that appears to be completely algebraic/logic oriented. I believe one of your later tutorials looked at marketing data and decided that certain zip codes and ages made sense to target. I’d like to do models more like that and not neural based, at least for day trading.
Neural networks for multi-day trend trading make lots of sense to me.
@Mark: Ok, I see. Yes, there are algebraic formulas for all this stuff, some more complicated than others. I remember doing them by hand in my decision analysis classes; it felt like I was being kicked in the crotch repeatedly.
I suggest getting a hold of a good operations research book; it should have the formulas for a lot of what you might be looking for.
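On Mark's question about turning a tree into rules that can be rewritten in another environment, here is a rough Python sketch. The nested dictionary is a hand-written, hypothetical tree in the style of the Golf example (the conditions and the 77.5 threshold are illustrative, not exported from RapidMiner), and `tree_to_rules` is a made-up helper name. It walks every path from the root to a leaf and prints it as a plain IF/THEN rule.

```python
from typing import Dict, List, Optional, Union

# A leaf is a class label (str); an inner node maps a condition to a subtree.
Tree = Union[str, Dict[str, "Tree"]]

# Hypothetical tree, written by hand for illustration only.
golf_tree: Tree = {
    "Outlook == 'overcast'": "yes",
    "Outlook == 'rain'": {
        "Wind == True": "no",
        "Wind == False": "yes",
    },
    "Outlook == 'sunny'": {
        "Humidity > 77.5": "no",
        "Humidity <= 77.5": "yes",
    },
}


def tree_to_rules(tree: Tree, path: Optional[List[str]] = None) -> List[str]:
    """Flatten a tree into one IF/THEN rule per root-to-leaf path."""
    path = path or []
    if isinstance(tree, str):
        condition = " AND ".join(path) if path else "TRUE"
        return [f"IF {condition} THEN {tree}"]
    rules: List[str] = []
    for condition, subtree in tree.items():
        rules.extend(tree_to_rules(subtree, path + [condition]))
    return rules


rules = tree_to_rules(golf_tree)
for rule in rules:
    print(rule)
```

Each printed rule is a plain conjunction of comparisons, which is the kind of logic that can be retyped by hand into a trading platform's scripting language. A neural network has no equivalent flat rule list.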
Thanks for the tutorials, they are great. I have a question: where can I find insight on how to customize or make a new model for data generation?
@Justin: On making a random data generator or do you mean creating your own data files?
Hi Tom,
thank you so much for these great tutorials! It’s really a professional level where it is fun for the visitors to learn more!
If you plan to make more tutorials in the future, I would kindly ask for a session on the different types of operators in Rapid Miner. Something like a rough overview without having to read tons of documentation… For what types do I use a neural net? What are typical questions where I use a cross validation to find an answer? What is the difference between weighting and …
I’m pretty new to data mining, but thanks to your great videos and the easy GUI I already feel quite comfortable with the program. However, I’m struggling with choosing the right operators, or in other words, I don’t know whether the result that I get is really significant. A short overview of the general types of operators would be so great!
Thanks a million for all your effort you put in your blog and keep posting!!
Kind regards,
Kihmo
Hi Kihmo,
I’m working on a book with the Rapid-i team that addresses this very issue. We hope to have it ready for press in April 2011.
Hey Thomas, I loved your tutorials…keep up the rockin’ work. I must say though that you make it look very easy. I have been having extreme difficulty creating a decision tree using the following fields in an Excel file: bought price, lender, state, employment, pay period, monthly income, requested amt, employed months.
I want to make the Label column ‘bought price’ but then it complains about not being able to do a split test with numerical. Perhaps I’m going about this model the wrong way? Basically I want to find the relationships/correlations as it pertains to ‘bought price’ and the other variables. Any help would be sincerely appreciated.
Hi Cort! If you want to know the correlation between your variables, you could use RM’s correlation matrix operator. The Decision Tree operator can’t handle a numerical label (output), so you’d have to transform it into some sort of nominal label if possible. You can try using the discretizing operators to transform your numerical label into a nominal one. Good luck and thanks for watching the tutorials!
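To make the two suggestions in the reply above concrete, here is a small Python sketch of both: binning a numerical label into nominal classes, and computing a correlation between two numeric columns. It does not reproduce RapidMiner's operators. The column names come from Cort's comment, but the values, the bin names, and the equal-frequency binning choice are all assumptions for illustration.

```python
import math
from typing import List, Sequence


def discretize_by_frequency(values: Sequence[float],
                            labels: Sequence[str]) -> List[str]:
    """Assign each value to one of len(labels) bins of roughly equal size.

    Values are ranked from lowest to highest and the ranks are cut into
    equal slices. Tied values may land in different bins in this sketch.
    """
    n_bins = len(labels)
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [""] * len(values)
    for rank, idx in enumerate(order):
        bin_index = min(rank * n_bins // len(values), n_bins - 1)
        result[idx] = labels[bin_index]
    return result


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation; undefined (division by zero) for a constant column."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


# Made-up values for two of the columns mentioned in the comment.
bought_price = [12.0, 55.0, 30.0, 8.0, 70.0, 41.0]
monthly_income = [1800.0, 4200.0, 2500.0, 1500.0, 5100.0, 3900.0]

price_class = discretize_by_frequency(bought_price, ["low", "mid", "high"])
print(price_class)
print(round(pearson(bought_price, monthly_income), 3))
```

Once `price_class` replaces the numeric `bought price` as the label, a decision tree learner has a nominal target to split on, while the correlation gives a quick first look at which numeric attributes move together with the price.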