09 May 2007
Posted by Tom as Neural Nets, Tutorials
Welcome back! In this lesson we will set our preferences to make sure everything loads in correctly before we create the model. By now, you should’ve read and built the experiment framework, as described in Lesson III. If you haven’t, then what I’ll post here might not make a lot of sense.
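We’ll cover four items in this lesson: data loading preferences, model saving preferences, performance preferences, and running the experiment.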
Data Loading Preferences
Before you run the experiment for the first time, you have to tell it where to find your data. If you click on the ExcelExampleSource operator, you’ll see the following preferences and toggle box.
I highlighted the important preferences with red dots in the above image to avoid confusion. First, select your data spreadsheet (Gold Final Input.xls) from your file folders, click the “first_row_as_names” box, enter the number 9 in the label_column field, and enter the number 1 in the id_column field.
It’s important that you get this step right. What you are doing is telling the experiment where your output variable is located and what your reference id is. The label_column field is the Excel column number of your output variable and the id_column field should be your date column number. Remember this because you’ll have to fill these fields in for your other experiments.
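To make the two column settings concrete, here is a minimal Python sketch of what assigning a label column and an id column amounts to. It is not how YALE does it internally. The function name `assign_roles`, the column names other than GC Trend, and the sample rows are all made up for illustration, and I'm assuming the column numbers count from 1 the way the fields above are filled in.

```python
from typing import Dict, List, Sequence, Tuple


def assign_roles(
    header: Sequence[str],
    rows: Sequence[Sequence[object]],
    label_column: int,
    id_column: int,
) -> Tuple[List[object], List[object], List[Dict[str, object]]]:
    """Split rows into ids, labels, and the remaining input attributes.

    Column numbers are 1-based (assumption: column 1 is the first
    spreadsheet column), so they are shifted by one for Python indexing.
    """
    for name, col in (("label_column", label_column), ("id_column", id_column)):
        if not 1 <= col <= len(header):
            raise ValueError(f"{name}={col} is outside 1..{len(header)}")
    if label_column == id_column:
        raise ValueError("label_column and id_column must be different")

    label_idx = label_column - 1
    id_idx = id_column - 1
    ids: List[object] = []
    labels: List[object] = []
    attributes: List[Dict[str, object]] = []
    for row in rows:
        ids.append(row[id_idx])
        labels.append(row[label_idx])
        # Everything that is neither the id nor the label is a model input.
        attributes.append(
            {header[i]: v for i, v in enumerate(row) if i not in (label_idx, id_idx)}
        )
    return ids, labels, attributes


# Hypothetical 9-column sheet; only the name "GC Trend" comes from the lesson.
header = ["Date", "In1", "In2", "In3", "In4", "In5", "In6", "In7", "GC Trend"]
rows = [
    ["2007-05-01", 1, 2, 3, 4, 5, 6, 7, "Up"],
    ["2007-05-02", 2, 3, 4, 5, 6, 7, 8, "Down"],
]
ids, labels, attributes = assign_roles(header, rows, label_column=9, id_column=1)
print("label column:", header[9 - 1])
print("ids:", ids)
print("labels:", labels)
```

If either number is off by one, the wrong column becomes the output variable, which is exactly the mistake the breakpoint in the next step is meant to catch.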
Next, you should create a breakpoint in the experiment, which is nothing more than a pause in the experiment’s run. We want the experiment to pause right after it has loaded the data. Why? With a breakpoint at this point, you can inspect the loaded data and make sure the experiment is reading in your output variable correctly.
Tip: You can skip this step, but I highly advise that you don’t. You can create breakpoints at any step in the experiment, but they’re most valuable during the data loading stage.
Model Saving Preferences
Scroll down to the ModelWriter operator and click on it. You’ll see only one field, which lets you select the path where your model will be saved. Click on it, select your data directory, type “gold_final.mod”, and hit enter. Done!
Performance Preferences
Now we reach the final step: setting the performance preferences. Scroll down to the PerformanceEvaluator operator and click on it. You should see several fields with check boxes. Scroll down and check the absolute error, relative error, correlation, squared correlation, accuracy, and classification error boxes. Make sure the pull-down menu is set to correlation. Refer to the image below for the setup.
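If you're wondering what some of those check boxes measure, here is a rough Python sketch of four of them using their common textbook definitions. I'm assuming these definitions for illustration; YALE's exact formulas may differ in details such as averaging. The function names and the sample values are made up.

```python
from typing import Sequence


def accuracy(actual: Sequence[str], predicted: Sequence[str]) -> float:
    """Fraction of examples whose predicted class equals the actual class."""
    if len(actual) != len(predicted) or len(actual) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    hits = sum(1 for a, p in zip(actual, predicted) if a == p)
    return hits / len(actual)


def classification_error(actual: Sequence[str], predicted: Sequence[str]) -> float:
    """Fraction of examples that were misclassified (1 - accuracy)."""
    return 1.0 - accuracy(actual, predicted)


def absolute_error(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean of |actual - predicted| over all examples."""
    if len(actual) != len(predicted) or len(actual) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def relative_error(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean of |actual - predicted| / |actual|; undefined if an actual is 0."""
    if len(actual) != len(predicted) or len(actual) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    if any(a == 0 for a in actual):
        raise ValueError("relative error is undefined when an actual value is 0")
    return sum(abs(a - p) / abs(a) for a, p in zip(actual, predicted)) / len(actual)


# Made-up class labels and numeric values, purely for illustration.
actual_classes = ["Up", "Up", "Down", "Down"]
predicted_classes = ["Up", "Down", "Down", "Down"]
actual_values = [2.0, 4.0, 5.0, 10.0]
predicted_values = [1.0, 5.0, 5.0, 8.0]

print("accuracy:", accuracy(actual_classes, predicted_classes))
print("classification error:", classification_error(actual_classes, predicted_classes))
print("absolute error:", absolute_error(actual_values, predicted_values))
print("relative error:", relative_error(actual_values, predicted_values))
```

Accuracy and classification error always add up to 1, so checking both boxes gives you the same information from two angles.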
You’re done now. Let’s run the experiment!
Run the Experiment
This is the best part: all your hard work is about to pay off! Find the “play” button and click it!
The experiment should load your data in a flash and then reach the breakpoint we discussed. The experiment will automatically switch to the results screen, which should look like this:
This is where the fun in data analysis begins! This results screen (which appears only if you used the breakpoint) will tell you what the model sees as your output variable (label column). If it’s not GC Trend, press the stop button, go back to the ExcelExampleSource operator, and check your preferences.
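The same check can be written down in a few lines of Python. This is only a sketch of the idea behind the breakpoint, not something YALE runs. The helper name `check_label` and the "Up"/"Down" values are assumptions for illustration; only the label name GC Trend comes from this lesson.

```python
from collections import Counter
from typing import Dict, Sequence


def check_label(
    label_name: str,
    labels: Sequence[str],
    expected_name: str = "GC Trend",
) -> Dict[str, int]:
    """Confirm the loaded label is the expected column and count its values."""
    if label_name != expected_name:
        raise ValueError(
            f"label is {label_name!r}, expected {expected_name!r}; "
            "recheck label_column in the data loading preferences"
        )
    if len(labels) == 0:
        raise ValueError("no examples were loaded")
    # A quick value count also shows whether one class dominates the data.
    return dict(Counter(labels))


# Hypothetical label values as they might appear after loading.
counts = check_label("GC Trend", ["Up", "Down", "Up"])
print(counts)
```

If the function raises, that corresponds to pressing stop and fixing the label_column setting before letting the experiment continue.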
Take a moment and click on the “plot view” option. Here you can create scatter plots, self-organizing maps, or histograms to your heart’s content. Create a scatter plot and choose whatever you want for the X, Y, and Point Colors. YALE should automatically create a plot for you with several dots. Each dot is identified by your id_column preference, in this case the date.
Remember the data visualization operator we added? It allows us to click on any one of those scatter points and find out more about that data point. Adding this operator lets you examine the makeup of outliers or pull up specific information about a data point of your choosing.
When you’re all done, you’ll have to resume the experiment. Click on the resume button.
Now the experiment will create the model and determine its performance. This step could take a few minutes, depending on the size of your data. While you’re waiting, take a moment to subscribe to my RSS feed (shameless plug).
When the experiment finishes, you should see the information in the results tab replaced with the following screen:
I’m not going to discuss the importance of the statistical measures here, but I will tell you that in building a classification model like this one, a high correlation is good. The correlation can be positive or negative, and the closer it is to 1 (or -1), the better.
Congratulations! You’ve finished your first YALE experiment and built your first model! In Lesson V, I will show you how to build a prediction experiment and we’ll finally predict some current trends for Gold.
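To see why values near 1 or -1 are the strong ones, here is a small Python sketch of the standard Pearson correlation coefficient. I'm assuming that is the kind of correlation reported; how YALE turns a nominal label into numbers for this measure is not covered in this lesson. The function name `pearson` and the sample numbers are made up.

```python
import math
from typing import Sequence


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length with at least 2 values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Made-up actual values and three made-up sets of predictions.
actual = [1.0, 2.0, 3.0, 4.0]
aligned = [2.0, 4.0, 6.0, 8.0]   # moves in lockstep with actual
inverted = [8.0, 6.0, 4.0, 2.0]  # moves exactly opposite to actual
noisy = [1.0, 3.0, 2.0, 4.0]     # mostly follows actual, with two swaps

for name, predicted in (("aligned", aligned), ("inverted", inverted), ("noisy", noisy)):
    r = pearson(actual, predicted)
    print(name, "correlation:", round(r, 4), "squared:", round(r * r, 4))
```

The squared correlation drops the sign, so a strong negative relationship scores as high as a strong positive one.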
As always, if you have any questions regarding this lesson or the topics covered so far, please leave a comment or email me.
15 Responses
Ernst
May 14th, 2007 at 9:15 pm
1. Thx for your explanation, very clear.
I managed to build my first model in one evening.
Quite a simple one.
5 years of NDX and VXN data and a prediction of the NDX in 30 days
I am very much interested and will work on it a bit more.
Thank you,
Ice
Tom
May 15th, 2007 at 8:02 am
2. Ernst,
I’m happy to hear that! I’m glad I could help. Let me know if you need to troubleshoot it or want to see a post about anything in particular.
How did you build the prediction experiment? I haven’t posted about it yet.
Tim
June 21st, 2007 at 9:10 am
3. Hello Tom,
Thanks for all the great information in this blog! I wanted to ask you a few questions about building an AI financial market model. I went through your lesson last night and found some differences between my ExcelExampleSource operator and the one in your lesson. I didn’t see fields for id_column and datamanagement (double_array) in RapidMiner. Are there minor differences between YALE and RapidMiner?
I noticed that my results were the same as yours after I ran the experiment but I’m getting some sort of “ID error” when I try to resume the experiment after the breakpoint. I’m also having trouble finding a resume experiment button.
Any information would be appreciated. Thanks
admin
June 21st, 2007 at 9:12 am
4. Tim,
There are bug issues with the RapidMiner upgrade and I’ve since gone back to Yale version 3.4. Just like you, I started getting strange error messages and different results for the same work.
I should write a post about that.
rapidminer user
September 21st, 2007 at 8:29 am
5. Tim, you need to be in the advanced user mode to see the fields you mentioned. That is, the little guy icon should not be wearing a professor suit (weird ergonomics here, but...).
Nice work Tom, thanks
Tom
September 21st, 2007 at 12:35 pm
6. Thanks, RapidMiner User!
Shane B
November 6th, 2007 at 10:09 am
7. I am using RapidMiner and got the same results you got with Yale.
Shane B
November 6th, 2007 at 10:16 am
8. I did get an error, however, that says the PerformanceEvaluator is now deprecated: “Please use the operators BasicPerformance, RegressionPerformance, ClassificationPerformance, or BinominalClassificationPerformance instead.”
Any recommendations?
Tom
November 6th, 2007 at 10:56 am
9. Glad to hear it!
Tom
November 6th, 2007 at 10:59 am
10. Try using the ClassificationPerformance operator.
Shane B
November 6th, 2007 at 10:59 am
11. The ClassificationPerformance operator seems to be the only one that has all of the same settings that you selected with the PerformanceEvaluator. It ran without any errors.
Cristina
November 25th, 2007 at 6:03 pm
12. Hi :-),
Congrats on this great site, I have also found it very useful!!!
Does anybody know if RapidMiner can deal with date formats?
For example, if I’m using the Process Wizard, I can tell RapidMiner that my value type should be date, but in this case it is not possible to load more than 120 examples. If I use the ExcelExampleSource operator, I can’t specify which value type should be date :(((.
Thanks,
Cristina
Tom
November 25th, 2007 at 6:30 pm
13. Cristina,
I usually set the date column as the ID column.
I never really had to set the date format per se but let me check it out.