May 9, 2007
Building an AI financial market model – Lesson IV
Welcome back! In this lesson we will set our preferences to make sure everything loads in correctly before we create the model. By now, you should’ve read and built the experiment framework, as described in Lesson III. If you haven’t, then what I’ll post here might not make a lot of sense.
We’ll cover 4 items in this lesson:
- Data Loading Preferences
- Model Saving Preferences
- Performance Preferences
- Run the Experiment
Data Loading Preferences
Before you run the experiment for the first time, you have to tell it where to find your data. If you click on the ExcelExampleSource operator, you’ll see the following preferences and toggle box.
I highlighted the important preferences with red dots in the above image to avoid confusion. First select your data spreadsheet (Gold Final Input.xls) from your file folders, check the “first_row_as_names” box, enter the number 9 into the label_column field, and enter the number 1 into the id_column field.
It’s important that you get this step right. What you are doing is telling the experiment where your output variable is located and what your reference id is. The label_column field is the Excel column number of your output variable and the id_column field should be your date column number. Remember this because you’ll have to fill these fields in for your other experiments.
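To make the column numbering concrete, here is a minimal Python sketch of what those two fields mean. This is not how YALE reads the spreadsheet internally. The tiny table, the input column names (in1 to in7), and the split_columns helper are all made up for illustration. Only the column numbers (1 for the id, 9 for the label, counted from 1) and the GC Trend label come from this lesson.

```python
def split_columns(rows, label_column, id_column, first_row_as_names=True):
    """Split a table into ids, labels, and input attributes.

    Column numbers are 1-based, like the fields in the operator.
    This helper is hypothetical and only mimics the idea.
    """
    if first_row_as_names:
        names, data = rows[0], rows[1:]
    else:
        names = [f"att{i}" for i in range(1, len(rows[0]) + 1)]
        data = rows

    label_idx = label_column - 1  # convert to Python's 0-based index
    id_idx = id_column - 1
    special = {label_idx, id_idx}

    return {
        "label_name": names[label_idx],
        "id_name": names[id_idx],
        "attribute_names": [n for i, n in enumerate(names) if i not in special],
        "ids": [row[id_idx] for row in data],
        "labels": [row[label_idx] for row in data],
        "attributes": [
            [v for i, v in enumerate(row) if i not in special] for row in data
        ],
    }


# Made-up rows: a date, seven invented inputs, and the output variable.
rows = [
    ["Date", "in1", "in2", "in3", "in4", "in5", "in6", "in7", "GC Trend"],
    ["2007-05-01", 1, 2, 3, 4, 5, 6, 7, "up"],
    ["2007-05-02", 2, 3, 4, 5, 6, 7, 8, "down"],
]

table = split_columns(rows, label_column=9, id_column=1)
print("label:", table["label_name"])
print("id:", table["id_name"])
print("inputs:", table["attribute_names"])
```

If the printed label is not the column you expect, the label_column number is off, which is the same mistake the breakpoint check later in this lesson is meant to catch.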
Next, you should create a breakpoint in the experiment, which is nothing more than a pause in the experiment’s run. We’re doing this because we want the experiment to pause right after it has loaded the data. Why? With a breakpoint at this point in the experiment, you can inspect the loaded data and make sure the experiment is reading your output variable correctly.
Tip: You can skip this step, but I highly advise that you don’t. You can create breakpoints at any step in the experiment, but they’re most valuable during the data loading stage.
Model Saving Preferences
Scroll down to the ModelWriter operator and click on it. You’ll see only one field, which lets you select the path where your model will be saved. Click on it > select your data directory > type “gold_final.mod” and hit enter. Done!
Performance Preferences
Now we reach the final step, setting the performance preferences. Scroll down to the PerformanceEvaluator operator and click on it. You should see several fields available with check boxes. Scroll down and check the absolute error, relative error, correlation, squared correlation, accuracy, and classification error boxes. Make sure the field with the pull-down menu is set to correlation. Refer to the image below for the setup.
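If you’re wondering what some of those check boxes measure, here is a rough Python sketch using the usual textbook definitions. I’m assuming these definitions for illustration; YALE’s exact formulas may differ in the details, and the function names and sample numbers below are invented.

```python
def absolute_error(actual, predicted):
    """Average absolute difference between actual and predicted values."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def relative_error(actual, predicted):
    """Average absolute difference as a fraction of the actual value.

    Assumes no actual value is zero.
    """
    return sum(abs(a - p) / abs(a) for a, p in zip(actual, predicted)) / len(actual)


def accuracy(actual_labels, predicted_labels):
    """Fraction of examples whose predicted label matches the actual label."""
    hits = sum(1 for a, p in zip(actual_labels, predicted_labels) if a == p)
    return hits / len(actual_labels)


def classification_error(actual_labels, predicted_labels):
    """Fraction of examples that were labeled incorrectly."""
    return 1.0 - accuracy(actual_labels, predicted_labels)


# Invented numeric values for the two error measures.
actual_values = [1.0, 2.0, 4.0]
predicted_values = [1.5, 2.0, 3.0]

# Invented trend labels for accuracy and classification error.
actual_trend = ["up", "down", "up", "up"]
predicted_trend = ["up", "down", "down", "up"]

print("absolute error:", absolute_error(actual_values, predicted_values))
print("relative error:", relative_error(actual_values, predicted_values))
print("accuracy:", accuracy(actual_trend, predicted_trend))
print("classification error:", classification_error(actual_trend, predicted_trend))
```

Accuracy and classification error always add up to 1, so they tell you the same thing from two directions.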
You’re done now. Let’s run the experiment!
Run the Experiment
This is the best part: all your hard work is about to pay off! Find the “play” button and click it!
The experiment should load your data in a flash and then reach the breakpoint we discussed. The experiment will automatically switch to the results screen, which should look like this:
This is where the fun in data analysis begins! This results screen (only if you used the breakpoint) will tell you what the model sees as your output variable (label column). If it’s not GC Trend, then press the stop button, go back to the ExcelExampleSource operator, and check your preferences.
Take a moment and click on the “plot view” option. Here you can create scatter plots, self-organizing maps, or histograms to your heart’s content. Create a scatter plot, choosing whatever you want for the X, Y, and Point Colors. YALE should automatically create a plot for you with several dots. These dots come from your id_column preference, in this case the date.
Remember we added in the data visualization operator? It allows us to click on any one of those scatter points and find out more about that data point. This lets you determine the data composition of outliers and/or specific information about a data point of your choosing.
When you’re all done, you’ll have to resume the experiment. Click on the resume button.
Now the experiment will create the model and determine its performance. This step could take a few minutes, depending on the size of your data. While you’re waiting, take a moment to subscribe to my RSS feed (shameless plug).
When the experiment finishes you should see the information in the results tab be replaced with the following screen:
I’m not going to discuss the importance of the statistical measures here, but I will tell you that when building a classification model like this one, a high correlation is good. The correlation can be positive or negative, and the closer it is to 1 (or -1), the better.
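For the curious, here is a small Python sketch of the standard Pearson correlation, which shows why values near 1 or -1 both indicate a strong relationship. I’m assuming the textbook formula here. How YALE computes correlation for a label like GC Trend is not covered in this lesson, and the numbers below are invented.

```python
import math


def correlation(xs, ys):
    """Pearson correlation between two equally long lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    spread_y = sum((y - mean_y) ** 2 for y in ys)
    return covariance / math.sqrt(spread_x * spread_y)


actual = [1.0, 2.0, 3.0, 4.0]
moves_together = [2.0, 4.0, 6.0, 8.0]  # rises whenever actual rises
moves_opposite = [8.0, 6.0, 4.0, 2.0]  # falls whenever actual rises
mixed = [3.0, 1.0, 4.0, 2.0]           # no clear pattern

for name, predicted in [
    ("together", moves_together),
    ("opposite", moves_opposite),
    ("mixed", mixed),
]:
    r = correlation(actual, predicted)
    # Squared correlation drops the sign, so strong positive and strong
    # negative relationships both score close to 1.
    print(name, "correlation:", round(r, 3), "squared:", round(r * r, 3))
```

The sign only tells you the direction of the relationship. The squared correlation is the quickest way to compare strength regardless of sign.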
Congratulations! You’ve finished your first YALE experiment and built your first model! In Lesson V, I will show you how to build a prediction experiment and we’ll finally predict some current trends for Gold.
As always, if you have any questions regarding this lesson or the topics covered so far, please leave a comment or email me.

May 14th, 2007 at 9:15 pm
Thx for your explanation, very clear.
I managed to build my first model in one evening.
Quite a simple one.
5 years of NDX and VXN data and a prediction of the NDX in 30 days
I am very much interested and will work on it a bit more.
Thank you,
Ice
May 15th, 2007 at 8:02 am
Ernst,
I’m happy to hear that! I’m glad I could help. Let me know if you need to troubleshoot it or want to see a post about anything in particular.
How did you build the prediction experiment? I haven’t posted about it yet.
June 21st, 2007 at 9:10 am
Hello Tom,
Thanks for all the great information in this blog! I wanted to ask you a few questions about building an AI financial market model. I went through your lesson last night and found some differences between my ExcelExampleSource operator and the one in your lesson. I didn’t see fields for id_column and datamanagement (double_array) in RapidMiner. Are there minor differences between YALE and RapidMiner?
I noticed that my results were the same as yours after I ran the experiment but I’m getting some sort of “ID error” when I try to resume the experiment after the breakpoint. I’m also having trouble finding a resume experiment button.
Any information would be appreciated. Thanks
June 21st, 2007 at 9:12 am
Tim,
There are bug issues with the RapidMiner upgrade and I’ve since gone back to YALE version 3.4. Just like you, I started getting strange error messages and different results for the same work.
I should write a post about that.
September 21st, 2007 at 8:29 am
You need to be in advanced user mode to see the mentioned field, Tim. That is, the little guy icon should not be wearing a professor suit (weird ergonomics here, but...)
Nice work Tom, thanks
September 21st, 2007 at 12:35 pm
Thanks Rapidminer User!
November 6th, 2007 at 10:09 am
I am using RapidMiner and got the same results you got with Yale.
November 6th, 2007 at 10:16 am
I did get an error however that says that the PerformanceEvaluator is now deprecated and says: “Please use the operators BasicPerformance, RegressionPerformance, ClassificationPerformance, or BinominalClassificationPerformance instead.”
Any recommendations?
November 6th, 2007 at 10:56 am
Glad to hear it!
November 6th, 2007 at 10:59 am
Try using the ClassificationPerformance operator.
November 6th, 2007 at 10:59 am
The ClassificationPerformance operator seems to be the only one that has all of the same settings that you selected with the PerformanceEvaluator. It ran without any errors.
November 25th, 2007 at 6:03 pm
Hi :-),
congrats on this great site, I have found it very useful too!!!
Does anybody know if RapidMiner can deal with date formats?
For example, if I’m using the Process Wizard, I can tell RapidMiner that my value type should be date, but in this case it is not possible to load more than 120 examples. If I use the ExcelExampleSource operator I can’t specify which value type should be date :(((.
Thanks,
Cristina
November 25th, 2007 at 6:30 pm
Cristina,
I usually set the date column as the ID column.
I never really had to set the date format per se but let me check it out.
September 23rd, 2008 at 1:25 am
I’m a data analytics professional from Cognizant Technology Solutions, India. I wanted to break from the use of licensed data mining software for my anti-fraud analytics in American health insurance. I was evaluating RapidMiner over the past week and almost gave up. Your posts have changed it all. Now I know how to progress with my analyses in RapidMiner. Thank you.
September 23rd, 2008 at 11:53 am
Prakash,
I’m glad that my tutorials could be of help.
October 17th, 2008 at 6:48 pm
Hi, Tom! It’s me again!
How can we take a look at the model? For example, is there a way to visualize it like a y=x^2+x+b or something?
October 19th, 2008 at 8:03 am
Pathros: That’s a good question. I suspect for the linear regression operators you can do that by creating a results file. Look in the Root Operator and fill out where you want the results file to be saved. It should be a text file.
I don’t know about neural net operators (which are non-linear); I’d have to check that out and get back to you.
November 14th, 2008 at 10:53 pm
Thanks for your tutorial on Financial AI Modeling on RapidMiner. I have one question, about where you say to set a breakpoint. Can you be more specific on where to set it, how to set it and what kind of breakpoint? Thanks.
November 14th, 2008 at 11:06 pm
I selected ExampleVisualiser, right clicked it and selected Breakpoint After. I think this is right.
November 16th, 2008 at 4:50 pm
Alex: Great, I’m glad you found it.