Rapidminer 5.0 Video Tutorial #5 – Genetic Algorithmic Data Preprocessing Part 2

In this video we continue where we left off in Video Tutorial #4. We discuss some of the parameters available in the Genetic Algorithm data transformers for selecting the best attributes in the data set. We also replace the first operator with another Genetic Algorithm data transformer that lets us adjust the population size and mutation rate and change the selection scheme (tournament, roulette, etc.).
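The parameters mentioned above correspond to the usual knobs of a genetic algorithm over attribute subsets. The following is a minimal sketch in plain Python, not RapidMiner's implementation. Each individual is a 0/1 mask over the attributes. `population_size`, `mutation_rate`, and the `scheme` switch between tournament and roulette-wheel selection play the roles discussed in the video, but these parameter names are my own. The fitness function and the attribute weights are hypothetical stand-ins for the performance RapidMiner measures for each attribute subset.

```python
import random

# Hypothetical fitness: each attribute has a made-up "usefulness" weight and
# every selected attribute costs 0.1. In RapidMiner the fitness would instead
# be the performance measured by the inner validation for that attribute subset.
def fitness(mask, weights):
    return sum(w - 0.1 for bit, w in zip(mask, weights) if bit)

def tournament_select(pop, scores, size, rng):
    # Draw `size` individuals at random; the fittest of them becomes a parent.
    contenders = rng.sample(range(len(pop)), size)
    return pop[max(contenders, key=lambda i: scores[i])]

def roulette_select(pop, scores, rng):
    # Chance of selection is proportional to fitness (shifted to stay positive).
    low = min(scores)
    return rng.choices(pop, weights=[s - low + 1e-9 for s in scores], k=1)[0]

def select(pop, scores, scheme, tournament_size, rng):
    if scheme == "tournament":
        return tournament_select(pop, scores, tournament_size, rng)
    return roulette_select(pop, scores, rng)

def crossover(a, b, rng):
    # One-point crossover: head of one parent, tail of the other.
    cut = rng.randrange(1, len(a))
    return a[:cut] + b[cut:]

def mutate(mask, rate, rng):
    # Flip each attribute bit independently with probability `rate`.
    return [1 - bit if rng.random() < rate else bit for bit in mask]

def evolve(weights, population_size=20, mutation_rate=0.05, generations=30,
           scheme="tournament", tournament_size=3, seed=0):
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in weights] for _ in range(population_size)]
    best = max(pop, key=lambda m: fitness(m, weights))
    for _ in range(generations):
        scores = [fitness(m, weights) for m in pop]
        pop = [mutate(crossover(select(pop, scores, scheme, tournament_size, rng),
                                select(pop, scores, scheme, tournament_size, rng),
                                rng),
                      mutation_rate, rng)
               for _ in range(population_size)]
        # Remember the best attribute subset seen in any generation.
        best = max(pop + [best], key=lambda m: fitness(m, weights))
    return best

weights = [0.9, 0.8, 0.0, 0.05, 0.7, 0.0]  # made-up attribute usefulness
print(evolve(weights, scheme="roulette", mutation_rate=0.1))
```

Tournament selection only compares a few individuals at a time, so selection pressure is set by `tournament_size`. Roulette selection depends on the relative size of the fitness values, which is why the scores are shifted before being used as weights.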

Video download link (HQ): Rapidminer 5.0 Video Tutorial #5

About Tom

Blog owner of Neural Market Trends
This entry was posted in RapidMiner, Tutorials, Video.

11 Responses to Rapidminer 5.0 Video Tutorial #5 – Genetic Algorithmic Data Preprocessing Part 2

  1. Calastro says:

    Man! I just had an insight!
    I had just posted a comment on the last video and, "poof," it came to me.
    Imagine this scenario:
    I want to create a score for each person on a mailing list that tells me the probability that I will sell a product to him, based on gender, city, how many times I called him, the product, and the price (for both sales made and sales missed).
    Which algorithm would give me this "score"?
    Thank you!

  2. Tom says:

    Calastro: It sounds like you want to create a value like a "credit score." That's most likely a formula you'll have to create yourself, or you could use the formula results writer in RM. You could also use a Bayesian learner to find out how often a particular attribute value shows up in your data (assuming each attribute is independent).
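Tom's Bayesian suggestion can be sketched as a small naive Bayes scorer. This is a minimal illustration in plain Python, not RapidMiner's Naive Bayes operator: it assumes categorical attributes (numeric ones such as price or call counts would first be bucketed), uses add-one (Laplace) smoothing, and treats attributes as independent given the class. The attribute names and rows are hypothetical.

```python
from collections import Counter, defaultdict

def train_naive_bayes(rows, label):
    # Count classes, and for each (class, attribute) how often each value occurs.
    class_counts = Counter(r[label] for r in rows)
    value_counts = defaultdict(Counter)
    values_seen = defaultdict(set)
    for r in rows:
        for attr, val in r.items():
            if attr != label:
                value_counts[(r[label], attr)][val] += 1
                values_seen[attr].add(val)
    return class_counts, value_counts, values_seen

def score(model, example):
    # P(class | example) under the naive independence assumption,
    # with add-one smoothing so an unseen value never gives a zero probability.
    class_counts, value_counts, values_seen = model
    total = sum(class_counts.values())
    joint = {}
    for cls, n in class_counts.items():
        p = n / total  # prior P(class)
        for attr, val in example.items():
            k = len(values_seen[attr]) + 1  # +1 reserves mass for unseen values
            p *= (value_counts[(cls, attr)][val] + 1) / (n + k)
        joint[cls] = p
    norm = sum(joint.values())
    return {cls: p / norm for cls, p in joint.items()}

# Hypothetical mailing history; "calls" is already bucketed into categories.
rows = [
    {"gender": "F", "calls": "many", "sold": "yes"},
    {"gender": "F", "calls": "many", "sold": "yes"},
    {"gender": "M", "calls": "few", "sold": "no"},
    {"gender": "M", "calls": "many", "sold": "no"},
    {"gender": "F", "calls": "few", "sold": "no"},
]
model = train_naive_bayes(rows, "sold")
print(score(model, {"gender": "F", "calls": "many"}))
```

The value returned for `"yes"` is the kind of per-contact "score" Calastro describes, and it is only as trustworthy as the independence assumption behind it.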

  3. c1borg says:

    Many thanks for the videos so far; I'm getting 80% prediction accuracy using genetic optimisation. If you don't mind, I have a question: where do I put the model writer in the experiment? I would assume it goes after the evaluator, because if it goes in the testing section, the file is constantly overwritten as each generation of results is tried. However, I get errors if I put the model writer in that position in the experiment.
    Many thanks in advance, and I can't wait for the remaining 4 videos.

  4. Tom says:

    @c1borg: Attach the model writer operator to the "mod" node on the Apply Model operator in the testing section of the Split Validation operator. Make sure you give your model file a name, or else it will give you errors. See if that helps.

  5. c1borg says:

    OK, thanks for that. This is what I tried before, but I discarded it because the model file is overwritten many times and I thought that could not be correct. However, I guess you're saying the last write to the file will be the best result. Why would it not be correct to attach it to the "mod" output of the validator?
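c1borg's concern about repeated overwrites can be made concrete. The sketch below is generic Python, not RapidMiner's behaviour: `evaluate` is a hypothetical scoring function and the subset names and numbers are made up. It only shows that a writer which runs on every evaluation leaves behind whatever was evaluated last, while keeping the best result requires an explicit comparison of scores.

```python
def run_search(candidates, evaluate):
    # `evaluate` is a hypothetical function returning a performance score.
    last_written = None               # what a writer inside the loop leaves behind
    best, best_score = None, float("-inf")
    for candidate in candidates:
        performance = evaluate(candidate)
        last_written = candidate      # overwritten on every evaluation
        if performance > best_score:  # replaced only when performance improves
            best, best_score = candidate, performance
    return last_written, best, best_score

scores = {"subset_a": 0.78, "subset_b": 0.81, "subset_c": 0.74}  # made-up numbers
print(run_search(["subset_a", "subset_b", "subset_c"], scores.get))
# -> ('subset_c', 'subset_b', 0.81)
```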

  6. Tom says:

    c1borg: You could place it there and it should work too, but I rarely use the model writer now. I just create a prediction experiment at the same time and connect the "mod" node to it, so the model learns and then predicts when it's done.

  7. Calastro says:

    Thanks, Tom!
    I'll keep visiting your blog and learning more about RapidMiner!

  8. Pathros says:

    Hello, Tom!
    When I execute my process using Optimize Selection (evolutionary), I save the model using “write model”, and when the process finishes, I assume the last saved model is the best one.
    When I want to apply the model again, I have problems applying it to new data. RapidMiner says that the mapping of certain variables is wrong. It confuses me even more when it says that certain variables are not included (but those were discarded by the optimizer!).

    From 60 variables, the process chose 30 as the ones that best explain the model. So I read the new data with “read csv”; it contains those 30 variables plus the ID variable, but not the label variable (which is the one I want to predict).

    I have problems applying this model. Do you have any ideas that could help?

    Thanks.

  9. Tom says:

    Hmm, without seeing the whole process and the data, I can only offer this suggestion. Once the optimization has finished, write the selected variables to a file and then create a new input file with only those variables. Then train and save your model on the “short list” of variables and go through the prediction process as you described. Let me know how it turns out.
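Tom's workaround can be sketched in plain Python with the standard `csv` module. The file layout (one attribute name per line), the `ID` column name, and the helper names are hypothetical. The idea is that the scoring data should carry exactly the attributes the model was trained on, and that the code should fail loudly if any of them are missing.

```python
import csv
import io

def write_selected(names, stream):
    # One attribute name per line, e.g. the subset the optimizer kept.
    for name in names:
        stream.write(name + "\n")

def read_selected(stream):
    return [line.strip() for line in stream if line.strip()]

def keep_columns(csv_text, selected, id_col="ID"):
    # Return rows holding only the ID plus the selected attributes, and fail
    # loudly if the new data lacks any attribute the model was trained on.
    reader = csv.DictReader(io.StringIO(csv_text))
    wanted = [id_col] + selected
    missing = [c for c in wanted if c not in (reader.fieldnames or [])]
    if missing:
        raise ValueError(f"new data is missing columns: {missing}")
    return [{c: row[c] for c in wanted} for row in reader]

# With a real file this might look like (file name is hypothetical):
# with open("selected_attributes.txt", "w") as f:
#     write_selected(chosen_names, f)
```

The same short list would be applied to the labeled training file (keeping the label column as well) before retraining, so that training and scoring see the same attributes.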

  10. Seyhan says:

    Hi,

    I really like your blog. Thanks for sharing your knowledge of RapidMiner.

    I have a big problem scoring unlabeled data in RapidMiner after using 10-fold cross-validation for two-class classification.

    I use 10-fold cross-validation (xval) for model training and testing with LibSVM in RapidMiner. It gives me 86% classification accuracy on testing. Everything is fine up to this point.

    But when I apply the model to the unlabeled score dataset to predict its classes, it classifies every observation as the same class: the one with the highest frequency in the training dataset.

    I checked the confidences for each observation in the score dataset, and they are all the same (0.36 for No and 0.64 for Yes).

    Could you please advise me where the problem is, or share a sample process if you have one?

    Is there any option that lets me manipulate the confidences?

    I use RapidMiner 5 and have also watched the RapidMiner scoring video tutorial, but it only shows training and scoring. It does not show training, testing, and scoring.

    Thanks in advance.

    Seyhan
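Seyhan's training, testing, and scoring workflow can be sketched end to end. This is a minimal illustration in plain Python, not RapidMiner or LibSVM. A nearest-centroid classifier is a hypothetical stand-in for the SVM, and z-score normalization stands in for whatever preprocessing the real process uses. The sketch shows that preprocessing learned from labeled data is refitted inside each cross-validation fold, then refitted once on all labeled data and reused unchanged on the unlabeled score data. SVMs are sensitive to feature scale, so score data that skips this step is one thing worth checking when every prediction comes out identical. That is a possibility, not a confirmed diagnosis.

```python
import statistics

def fit_scaler(X):
    # Mean and standard deviation per column, learned from training rows only.
    cols = list(zip(*X))
    means = [statistics.fmean(c) for c in cols]
    stds = [statistics.pstdev(c) or 1.0 for c in cols]  # avoid dividing by zero
    return means, stds

def transform(X, scaler):
    means, stds = scaler
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in X]

def fit_centroids(X, y):
    # Hypothetical stand-in learner (nearest centroid), not LibSVM.
    groups = {}
    for row, label in zip(X, y):
        groups.setdefault(label, []).append(row)
    return {lab: [statistics.fmean(c) for c in zip(*rows)]
            for lab, rows in groups.items()}

def predict(centroids, X):
    def dist2(a, b):
        return sum((p - q) ** 2 for p, q in zip(a, b))
    return [min(centroids, key=lambda lab: dist2(row, centroids[lab])) for row in X]

def cross_validate(X, y, k=10):
    # Testing: k-fold cross-validation with interleaved (unshuffled) folds;
    # the scaler is refitted on the training part of every fold.
    folds = [list(range(i, len(X), k)) for i in range(min(k, len(X)))]
    accuracies = []
    for test_idx in folds:
        test_set = set(test_idx)
        train_idx = [i for i in range(len(X)) if i not in test_set]
        scaler = fit_scaler([X[i] for i in train_idx])
        model = fit_centroids(transform([X[i] for i in train_idx], scaler),
                              [y[i] for i in train_idx])
        preds = predict(model, transform([X[i] for i in test_idx], scaler))
        accuracies.append(sum(p == y[i] for p, i in zip(preds, test_idx)) / len(test_idx))
    return statistics.fmean(accuracies)

def train_and_score(X, y, X_unlabeled):
    # Training: refit on all labeled rows. Scoring: reuse that same scaler.
    scaler = fit_scaler(X)
    model = fit_centroids(transform(X, scaler), y)
    return predict(model, transform(X_unlabeled, scaler))
```

Cross-validation only estimates how well the procedure generalizes. The model that actually scores new data is the one refitted on all labeled rows in `train_and_score`.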

  11. Tom says:

    @Seyhan: Perhaps you can ask this question in the forums?
