In this video we continue where we left off in Video Tutorial #4. We discuss some of the parameters available in the Genetic Algorithm data transformers for selecting the best attributes in the data set. We also replace the first operator with another Genetic Algorithm data transformer that lets us set the population size and mutation rate and change the selection scheme (tournament, roulette, etc.).
Video download link (HQ): Rapidminer 5.0 Video Tutorial #5
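For readers who want to see what those parameters do outside of RapidMiner, here is a minimal Python sketch of genetic attribute selection. It is not RapidMiner's implementation; the function names, the default values, and the toy fitness function are all assumptions made for illustration. In a real experiment the fitness would be the validation performance of a learner trained on the selected attributes.

```python
import random

def tournament_select(population, scores, rng, k=3):
    """Tournament scheme: pick the fittest of k randomly drawn individuals."""
    contenders = rng.sample(range(len(population)), k)
    return population[max(contenders, key=lambda i: scores[i])]

def roulette_select(population, scores, rng):
    """Roulette scheme: pick with probability proportional to fitness.
    Assumes fitness values are non-negative."""
    if sum(scores) <= 0:
        return rng.choice(population)
    return rng.choices(population, weights=scores, k=1)[0]

def evolve_attribute_mask(fitness, n_attributes, population_size=20,
                          mutation_rate=0.05, generations=30,
                          selection="tournament", seed=0):
    """Search for a True/False mask over attributes (needs n_attributes >= 2)."""
    rng = random.Random(seed)
    select = tournament_select if selection == "tournament" else roulette_select
    # Start from random masks: each attribute is switched on with probability 0.5.
    population = [[rng.random() < 0.5 for _ in range(n_attributes)]
                  for _ in range(population_size)]
    best_mask, best_score = None, float("-inf")
    for _ in range(generations):
        scores = [fitness(mask) for mask in population]
        for mask, score in zip(population, scores):
            if score > best_score:
                best_mask, best_score = list(mask), score
        # Breed the next generation: selection, one-point crossover, mutation.
        offspring = []
        while len(offspring) < population_size:
            a = select(population, scores, rng)
            b = select(population, scores, rng)
            cut = rng.randrange(1, n_attributes)
            child = a[:cut] + b[cut:]
            child = [(not bit) if rng.random() < mutation_rate else bit
                     for bit in child]
            offspring.append(child)
        population = offspring
    return best_mask, best_score

def toy_fitness(mask):
    """Hypothetical fitness: attributes 0-2 help, every other attribute hurts."""
    useful = sum(mask[:3])
    noise = sum(mask[3:])
    return useful / (1 + noise)

best_mask, best_score = evolve_attribute_mask(
    toy_fitness, n_attributes=8, population_size=20,
    mutation_rate=0.05, selection="tournament")
print(best_mask, best_score)
```

A larger population explores more attribute combinations per generation, a higher mutation rate keeps the search from settling too early, and switching `selection` to `"roulette"` gives weaker individuals a better chance of being picked as parents.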
Man! I just had an insight!
I just posted a comment on the last video and "poof"
Imagine this scenario:
I want to create a score for each contact on a mailing list that tells me the probability of selling a product to him (based on gender, city, how many times I called him, the product, and the price, for sales that were both made and missed).
Which algorithm gives me this "score"?
Thank you!
Obrigado!
Merci
Calastro: It sounds like you want to create a value like a "credit score." That's most likely a formula that you'll have to create yourself or use the formula results writer in RM. You could use a Bayesian learner to find out how often a particular variable shows up in your data space (assuming each entry is independent).
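To make the Bayesian idea concrete, here is a rough Python sketch of a Naive Bayes "score" built from simple counts. It is not RM's learner, and everything in it is an assumption for illustration: the attribute names, the handful of made-up records, the "sold"/"missed" labels, and the add-one smoothing. Numeric fields such as price or number of calls would have to be binned into categories first, as "calls" is here.

```python
from collections import Counter, defaultdict

def train_naive_bayes(records, label_key):
    """Count how often each attribute value appears within each class."""
    class_counts = Counter(r[label_key] for r in records)
    value_counts = defaultdict(Counter)  # (attribute, class) -> value counts
    values_seen = defaultdict(set)       # attribute -> distinct values
    for r in records:
        for attr, value in r.items():
            if attr == label_key:
                continue
            value_counts[(attr, r[label_key])][value] += 1
            values_seen[attr].add(value)
    return class_counts, value_counts, values_seen

def sale_score(model, record, positive="sold"):
    """Return P(positive | record), treating attributes as independent."""
    class_counts, value_counts, values_seen = model
    total = sum(class_counts.values())
    joint = {}
    for cls, n_cls in class_counts.items():
        p = n_cls / total  # class prior
        for attr, value in record.items():
            if attr not in values_seen:
                continue  # attribute never seen in training: ignore it
            count = value_counts[(attr, cls)][value]
            # Add-one smoothing so an unseen value does not zero the product.
            p *= (count + 1) / (n_cls + len(values_seen[attr]))
        joint[cls] = p
    return joint[positive] / sum(joint.values())

# Made-up call history, purely for illustration.
history = [
    {"gender": "F", "city": "Lisbon", "calls": "1-2", "product": "A", "outcome": "sold"},
    {"gender": "F", "city": "Porto", "calls": "1-2", "product": "A", "outcome": "sold"},
    {"gender": "M", "city": "Lisbon", "calls": "3+", "product": "B", "outcome": "missed"},
    {"gender": "M", "city": "Porto", "calls": "3+", "product": "B", "outcome": "missed"},
    {"gender": "F", "city": "Lisbon", "calls": "3+", "product": "B", "outcome": "missed"},
]
model = train_naive_bayes(history, label_key="outcome")

likely = sale_score(model, {"gender": "F", "city": "Porto", "calls": "1-2", "product": "A"})
unlikely = sale_score(model, {"gender": "M", "city": "Lisbon", "calls": "3+", "product": "B"})
print(round(likely, 3), round(unlikely, 3))
```

The score is only as good as the independence assumption, so treat it as a ranking of prospects more than as a calibrated probability.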
Many thanks for the videos so far. I have an 80% prediction accuracy using genetic optimisation. If you don't mind, I have a question: where do I put the model writer in the experiment? I would assume it goes after the evaluator, because if it goes in the testing section the file is constantly overwritten as each generation of results is tried. However, I get errors if I try to put the model writer in this position in the experiment.
Many thanks in advance, and I can't wait for the remaining 4 videos.
@c1borg: attach the model writer operator to the "mod" node on the apply model operator in the testing section of the Split Validation operator. Make sure you give your mod a name or else it will give you errors. See if that helps.
OK, thanks for that. This is what I tried before and discarded, as I thought the model file being overwritten many times could not be correct. However, I guess you're saying the last write to the file will be the best result. Why would it not be correct to attach it to the "mod" output of the validator?
c1borg: You could place it there and it should work too, but I rarely use the model writer now. I just create a prediction experiment at the same time and connect the "mod" node to it so the model learns and predicts when it's done.
Thanks, Tom!
I'll keep visiting your blog and learning more about RM!
Hello, Tom!
When I run my process using Optimize Selection (Evolutionary), I save the model using "Write Model", and when the process finishes, the last saved model is supposed to be the best one.
When I want to apply the model again, I have problems applying it to new data. Rapidminer says that the mapping of certain variables is wrong. And it confuses me even more when it says that certain variables are not included (but those were the ones discarded by the optimizer!).
From 60 variables, the process chose 30 as the best ones to help explain the model. So I read the new data with "Read CSV", where I have the 30 variables plus the ID variable, but not the label variable (which is the one that I want to predict).
I have problems trying to apply this model. Do you have any ideas that could help me achieve this?
thanks.
Hmm, without seeing the whole experiment process and data, I can only offer this suggestion. Once the optimization has finished, write the selected variables to a file and then create a new input file with only those variables. Then train and save your model on the "short list" of variables and go through the prediction process as you described. Let me know how it turns out.
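Outside of RM, the "short list" step can be sketched in a few lines of Python. This is only an illustration of the idea, not an RM operator: the column names, the sample CSV text, and the `keep_selected` helper are all hypothetical. The point is that the training file and the new data file are both reduced to exactly the same columns before the model is trained or applied, and that a missing column is reported up front.

```python
import csv
import io

def keep_selected(rows, selected, id_column="id"):
    """Keep only the ID and the selected attributes, failing loudly on gaps."""
    wanted = [id_column] + list(selected)
    reduced = []
    for row in rows:
        missing = [col for col in wanted if col not in row]
        if missing:
            raise KeyError(f"missing columns: {missing}")
        reduced.append({col: row[col] for col in wanted})
    return reduced

# Hypothetical short list written out after the optimization run.
selected_attributes = ["att3", "att7"]

# Hypothetical new data; in practice this would be read from a CSV file.
new_data_csv = io.StringIO(
    "id,att1,att3,att7\n"
    "101,0.5,1.2,yes\n"
    "102,0.1,3.4,no\n"
)
new_rows = keep_selected(csv.DictReader(new_data_csv), selected_attributes)
print(new_rows)
```

Running the same helper over the training file keeps both files in step, which is what the retrained model needs.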
Hi,
I really like your blog. Thanks for sharing your knowledge of Rapidminer.
I have a huge problem scoring unlabeled data in Rapidminer after 10-fold cross-validation for a two-class classification.
I use 10-fold X-Validation for model training and testing using LibSVM in Rapidminer. It gives me 86% classification accuracy on testing. Everything is fine up to this point.
But when I apply the model to the score dataset with unlabeled data to predict its classes, the model assigns every observation to a single class, the one with the highest frequency in the training dataset.
I checked the confidence probabilities of each observation in the score dataset, and they are all the same (0.36 for No and 0.64 for Yes).
Could you please advise me where the problem is, or share a sample with us if you have one?
Is there any option where I can manipulate the confidences?
I use Rapidminer 5 and have also looked at the Rapidminer scoring video tutorial. But it only shows training and scoring. It does not show training, testing, and scoring.
Thanks in advance.
Seyhan
@Seyhan: Perhaps you can ask this question in the forums?