
March 10, 2010

Rapidminer 5.0 Video Tutorial #6 – Creating a Decision Tree with Rapidminer 5.0

Calling all marketers!  In this video we discuss how we can use Rapidminer to create a decision tree to help us find "sweet spots" in a particular market segment.  This video tutorial uses the Rapidminer direct mail marketing data generator and a split validation operator to build the decision tree.  

Video download link (HQ): Rapidminer 5.0 Video Tutorial #6

14 Responses to “Rapidminer 5.0 Video Tutorial #6 – Creating a Decision Tree with Rapidminer 5.0”

  1. Sven said:

    Hi Tom,
    First, i wanna thank you for your great tutorials on rapidminer 5.0!
    I am really looking forward to your time series tutorials, as I have tried to figure out myself – with no success so far :( – how I can forecast a time series (at least one day into the future) with a neural network. So my question is: how do I get rapidminer to show me the prediction for (at least) the next day? I tried fooling around a bit with "impute missing values", but the results seem strange…
    All in all, keep on working buddy, i love your site :)

  2. Tom said:

    Hi Sven,
    Forecasting time series, especially for markets, is a dicey business, and what works today won't work in a week or a month.  It's easy to build a time series model in RM for a given stock symbol and train it, but how do you predict tomorrow's close?
    That's always been troubling because you typically train your model with OHLCV attributes (where C – Close is your label) and then the prediction set (out of sample) needs to have OHLV for the next day to predict your Close.  But if you don't know what the OHLV is for the next day, how can you predict the close?
    One way to do it is to train the model to issue a BUY or SELL signal (binomial label) for t-1 (you want the signal to generate the day before) on training data that contains OHLCVS (S is for signal, not "sucks") and then predict the signal for the out-of-sample set using OHLCV.
    I hope this helps. Thanks for the kind words about my tutorials and thanks for reading.
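A minimal sketch of the signal-label idea from the reply above, written in plain Python rather than RapidMiner. The labeling rule (BUY when the next day's close is higher than today's, otherwise SELL), the field names, and the sample prices are all assumptions made for illustration; the reply does not say how the signal should be defined.

```python
from typing import Dict, List

Bar = Dict[str, float]  # hypothetical keys: open, high, low, close, volume


def add_signal_labels(bars: List[Bar]) -> List[dict]:
    """Attach a BUY/SELL label to every bar that has a following day.

    Assumed rule: BUY if tomorrow's close is above today's close, else SELL.
    The last bar has no following day, so it gets no label and is not returned.
    """
    labeled = []
    for today, tomorrow in zip(bars, bars[1:]):
        row = dict(today)  # copy, so the input bars are not modified
        row["signal"] = "BUY" if tomorrow["close"] > today["close"] else "SELL"
        labeled.append(row)
    return labeled


# Made-up daily bars, oldest first.
bars = [
    {"open": 10.0, "high": 10.5, "low": 9.8, "close": 10.2, "volume": 1000},
    {"open": 10.2, "high": 10.9, "low": 10.1, "close": 10.8, "volume": 1200},
    {"open": 10.8, "high": 11.0, "low": 10.3, "close": 10.4, "volume": 900},
    {"open": 10.4, "high": 10.6, "low": 10.0, "close": 10.5, "volume": 1100},
]

training_rows = add_signal_labels(bars)  # OHLCV plus signal (the "OHLCVS" set)
to_predict = bars[-1]                    # today's OHLCV, signal still unknown

for row in training_rows:
    print(row["close"], row["signal"])
```

The point of the layout is that the row you want a prediction for only needs today's OHLCV, which you already have; the unknown is the signal, not tomorrow's prices.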

  3. Sven said:

    Hi Tom,
    Thank you for your quick answer!  And you're right, I forgot about the hidden markov process in neural networks – silly me  :)
    The idea of generating a future signal is a great idea and i will try that !
    Looking forward to your tutorial #7 :)
    Sven

  4. scbhush said:

    HI Tom, 
    I do not know how to thank you.   This is a great help for working with rapid miner.
    This is the thing I have been waiting for a long time.

    Looking forward to tutorials 7, 8, 9, 10.
    Thank you in advance.
     
    regards

  5. Tom said:

    Hi scbhush, you can buy me a beer sometime. =)

  6. scbhush said:

    HI Tom
     
    No problem,  you are always welcome.
     
    Thank you once again

  7. jill said:

    Hi Tom!

    Thanks a lot for all these. I am new to rapidminer and your tutorials really help.
    Looking forward to the next tutorials! Hope you can do one on the text mining capability of rapidminer, like how to extract information from large volumes of text using the Text Processing operators. :-)
    Thanks again!

  8. Tom said:

    Jill: I was planning on doing a text processing one for the next batch of videos in late Spring.  I plan on re-engineering this site after tutorial #10 and taking a break.  Thanks for reading and I'm glad you enjoy these tutorials.

  9. Mark Knecht said:

    Tom,
    This was another good video. Thanks!

    One question that keeps coming up for me, and possibly will be answered in a later video, is what technology is actually in the underlying model? Is it something that has logical rules, or is it something neural network based? I _think_ that this one was logical and the rules could be converted to some other environment, but maybe others are neural and couldn’t be used anywhere else.

    Maybe a future video could cover how to mine and extract rules that could be converted to code in another language.

    - Mark

  10. Tom said:

    @Mark: Are you referring to the actual formula and theory behind an artificial neural network? Or are you referring to the parameters in the operators? If you are interested in “refining” the parameters, such as learning rate and/or momentum, you can use the Parameter Optimization operator.

  11. Mark Knecht said:

    Tom – I’m interested in discovering non-genetic, algebraic solutions using RM. I day trade and neural networks aren’t going to play well inside of TradeStation. Personally I’m not comfortable with trying to tie Rapid Miner to TradeStation in a live setup. However if I can mine some solutions that use my technical indicators with some sort of weightings then I can write that back into TradeStation code and backtest what I’m finding in Rapid Miner for accuracy.

    As an example, one of the first tutorials looks at the Golf sample data and builds a simple decision tree that appears to be completely algebraic/logic oriented. I believe one of your later tutorials looked at marketing data and decided that certain zip codes and ages made sense to target. I’d like to do models more like that and not neural based, at least for day trading.

    Neural networks for multi-day trend trading make lots of sense to me.
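To make the "rules that could be converted to code" idea concrete, here is a small Python sketch that flattens a decision tree into IF/THEN rules, one per leaf. The nested-dict tree format, the function name, and the Golf-style splits and thresholds are hypothetical; this is not RapidMiner's export format or the tree from the video.

```python
def tree_to_rules(node, conditions=()):
    """Walk a nested-dict tree and return one IF/THEN rule string per leaf."""
    if "label" in node:  # leaf: emit the accumulated path as a single rule
        premise = " AND ".join(conditions) if conditions else "TRUE"
        return [f"IF {premise} THEN {node['label']}"]
    rules = []
    for test, child in node["branches"]:
        rules.extend(tree_to_rules(child, conditions + (test,)))
    return rules


# Hypothetical tree in the spirit of the Golf sample data (made-up thresholds).
golf_tree = {
    "branches": [
        ("Outlook = overcast", {"label": "Play = yes"}),
        ("Outlook = sunny", {"branches": [
            ("Humidity > 75", {"label": "Play = no"}),
            ("Humidity <= 75", {"label": "Play = yes"}),
        ]}),
        ("Outlook = rain", {"branches": [
            ("Wind = true", {"label": "Play = no"}),
            ("Wind = false", {"label": "Play = yes"}),
        ]}),
    ]
}

rules = tree_to_rules(golf_tree)
for rule in rules:
    print(rule)
```

Each rule is a plain conjunction of comparisons, so it can be rewritten by hand as an if-statement in another environment. A neural network has no equivalent of this step, which is the distinction Mark is drawing.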

  12. Tom said:

    @Mark: Ok, I see. Yes, there are algebraic formulas for all this stuff, some more complicated than others. I remember doing them by hand in my decision analysis classes; it felt like I was being kicked in the crotch repeatedly.

    I suggest getting a hold of a good operations research book, it should have the formulas for a lot of what you might be looking for.

  13. Justin said:

    Thanks for the tutorials, they are great. I have a question: where can I find insight on how to customize or make a new model for data generation?

  14. Tom said:

    @Justin: Do you mean making a random data generator, or creating your own data files?
