Building an AI financial market model – Lesson IV

Welcome back! In this lesson we will set our preferences to make sure everything loads in correctly before we create the model. By now, you should’ve read and built the experiment framework, as described in Lesson III. If you haven’t, then what I’ll post here might not make a lot of sense.

We’ll cover 4 items in this lesson:

  1. Data Loading Preferences
  2. Model Saving Preferences
  3. Performance Preferences
  4. Run the experiment

Data Loading Preferences

Before you run the experiment for the first time, you have to tell it where to find your data. If you click on the ExcelExampleSource operator, you’ll see the following preferences and toggle box.

Data Loader Pref

I highlighted the important preferences with red dots in the above image to avoid confusion. First select your data spreadsheet (Gold Final Input.xls) from your file folders, check the “first_row_as_names” box, enter the number 9 into the label_column field, and enter the number 1 into the id_column field.

It’s important that you get this step right. What you are doing is telling the experiment where your output variable is located and what your reference id is. The label_column field is the Excel column number of your output variable and the id_column field should be your date column number. Remember this because you’ll have to fill these fields in for your other experiments.
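To make the two column numbers concrete, here is a minimal Python sketch of the same idea outside YALE. It assumes the spreadsheet has been exported to CSV text and that both fields count columns starting from 1, as the settings above suggest. Only the GC Trend label comes from this lesson; the other column names, the values, and the `split_columns` helper are made up for illustration.

```python
import csv
import io

# Hypothetical stand-in for the spreadsheet, exported as CSV text.
# Only "GC Trend" (the output variable) comes from this lesson; the other
# column names and all values are made up for illustration.
SAMPLE = """Date,Open,High,Low,Close,Volume,Input A,Input B,GC Trend
2007-01-02,1,2,3,4,5,6,7,UP
2007-01-03,2,3,4,5,6,7,8,DOWN
"""


def split_columns(text, label_column, id_column):
    """Split rows into (id, inputs, label) using 1-based column numbers."""
    rows = list(csv.reader(io.StringIO(text)))
    header, data = rows[0], rows[1:]  # first_row_as_names: row 1 holds names
    label_idx = label_column - 1      # convert 1-based to Python's 0-based
    id_idx = id_column - 1
    examples = []
    for row in data:
        # Everything that is neither the id nor the label is a model input.
        inputs = [v for i, v in enumerate(row) if i not in (label_idx, id_idx)]
        examples.append((row[id_idx], inputs, row[label_idx]))
    return header[label_idx], header[id_idx], examples


label_name, id_name, examples = split_columns(SAMPLE, label_column=9, id_column=1)
print("label column:", label_name)
print("id column:", id_name)
print("first example:", examples[0])
```

If the label name printed here were anything other than your output variable, the column number would be off by one or more, which is the same mistake the breakpoint check later in this lesson is meant to catch.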

Data Loader Pref2

Next, you should create a breakpoint in the experiment, which is nothing more than a pause in the experiment’s run. We want the experiment to pause right after it has loaded the data. Why? With a breakpoint at this point in the experiment, you can inspect the loaded data and make sure the experiment is reading your output variable correctly.

Tip: You can skip this step, but I highly advise that you don’t. You can create breakpoints at any step in the experiment if you choose, but it’s most valuable during the data loading stage.

Model Saving Preferences

Scroll down to the ModelWriter operator and click on it. You’ll see only one field, which lets you select the path where your model will be saved. Click on it > select your data directory > type “gold_final.mod” and hit enter. Done!

Data Model Pref

Performance Preferences

Now we reach the final step, setting the performance preferences. Scroll down to the Performance Evaluator operator and click on it. You should see several fields with check boxes. Scroll down and check the absolute error, relative error, correlation, squared correlation, accuracy, and classification error boxes. Make sure the field with the pull-down menu is set to correlation. Refer to the image below for the setup.

Data Performance Pref
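If you are curious what some of those check boxes measure, the sketch below works four of them out by hand in Python. It uses the usual textbook definitions, which I am assuming match what YALE reports; it is not YALE’s own code, and the labels and numbers are made up.

```python
def accuracy(actual, predicted):
    """Fraction of examples whose predicted label equals the actual label."""
    hits = sum(1 for a, p in zip(actual, predicted) if a == p)
    return hits / len(actual)


def classification_error(actual, predicted):
    """Fraction of examples that were labeled incorrectly (1 - accuracy)."""
    return 1.0 - accuracy(actual, predicted)


def absolute_error(actual, predicted):
    """Average absolute deviation of the prediction from the actual value."""
    return sum(abs(p - a) for a, p in zip(actual, predicted)) / len(actual)


def relative_error(actual, predicted):
    """Average absolute deviation divided by the actual value (actual != 0)."""
    return sum(abs(p - a) / abs(a) for a, p in zip(actual, predicted)) / len(actual)


# Made-up class labels for a trend model.
actual_labels = ["UP", "DOWN", "UP", "UP"]
predicted_labels = ["UP", "UP", "UP", "DOWN"]
print("accuracy:", accuracy(actual_labels, predicted_labels))
print("classification error:", classification_error(actual_labels, predicted_labels))

# Made-up numeric values to illustrate the two error measures.
actual_values = [100.0, 200.0]
predicted_values = [110.0, 190.0]
print("absolute error:", absolute_error(actual_values, predicted_values))
print("relative error:", relative_error(actual_values, predicted_values))
```

Accuracy and classification error always sum to 1, so checking both boxes mostly saves you the subtraction.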

You’re done now. Let’s run the experiment!

Run the Experiment

This is the best part: all your hard work is about to pay off! Find the “play” button and click it!

Yale Run

The experiment should load your data in a flash and then reach the breakpoint we discussed. The experiment will automatically switch to the results screen, which should look like this:

Data Loader Results

This is where the fun in data analysis begins! This results screen (only if you used the breakpoint) will tell you what the model sees as your output variable (label column). If it’s not GC Trend, then press the stop button, go back to the ExcelExampleSource operator, and check your preferences.

Take a moment and click on the “plot view” option. Here you can create scatter plots, self-organizing maps, or histograms to your heart’s content. Create a scatter plot and choose whatever you want for the X, Y, and Point Colors. YALE should automatically create a plot for you with several dots. These dots are identified by your id_column preference, in this case the date.

Remember we added in the data visualization operator? Doing this allows us to click on any one of those scatter points and find out more about that data point. Adding this operator lets you determine the data composition of outliers and/or specific information about a data point of your choosing.

Data Loader Results2

When you’re all done, you’ll have to resume the experiment. Click on the resume button.

Yale Resume

Now the experiment will create the model and determine its performance. This step could take a few minutes, depending on the size of your data. While you’re waiting, take a moment to subscribe to my RSS feed (shameless plug).

When the experiment finishes you should see the information in the results tab be replaced with the following screen:

Performance Output

I’m not going to discuss the importance of the statistical measures here, but I will tell you that in building a classification model like this one, a high correlation is good. The correlation can be positive or negative, and the closer it is to 1 (or -1), the better.
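To give a feel for why values near 1 or -1 are the strong ones, here is a small Python sketch of the standard Pearson correlation. I am assuming that is the measure behind the correlation check box; the function is a textbook version, not YALE’s code, and the numbers are made up.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    # Undefined (division by zero) if either sequence is constant.
    return cov / math.sqrt(var_x * var_y)


# Predictions that rise exactly with the actual values.
r_up = pearson([1, 2, 3], [2, 4, 6])
# Predictions that fall exactly as the actual values rise.
r_down = pearson([1, 2, 3], [6, 4, 2])
# Hypothetical 0/1 encoding of DOWN/UP labels with two mismatches.
r_mixed = pearson([1, 0, 1, 1], [1, 1, 1, 0])

print("perfect positive:", r_up)
print("perfect negative:", r_down)
print("mixed:", r_mixed)
print("squared correlation of mixed:", r_mixed ** 2)
```

Squaring the result gives the squared correlation, which drops the sign and keeps only the strength of the relationship.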

Congratulations! You’ve finished your first YALE experiment and built your first model! In Lesson V, I will show you how to build a prediction experiment and we’ll finally predict some current trends for Gold.

As always, if you have any questions regarding this lesson or the topics covered so far, please leave a comment or email me.

  • http://www.tanakasite.com Ernst

    Thx for your explanation, very clear.
    I managed to build my first model in one evening.

    Quite a simple one.

    5 years of NDX and VXN data and a prediction of the NDX in 30 days

    I am very much interested and will work on it a bit more.

    Thank you,
    Ice

  • http://www.neuralmarkettrends.com Tom

    Ernst,

    I’m happy to hear that! I’m glad I could help. Let me know if you need to troubleshoot it or want to see a post about anything in particular.

    How did you build the prediction experiment? I haven’t posted about it yet.

  • Tim

    Hello Tom,

    Thanks for all the great information in this blog! I wanted to ask you a few questions about building an AI financial market model. I went through your lesson last night and found some differences between my ExcelExampleSource operator and the one in your lesson. I didn’t see fields for id_column and datamanagement (double_array) in RapidMiner. Are there minor differences between YALE and RapidMiner?

    I noticed that my results were the same as yours after I ran the experiment but I’m getting some sort of “ID error” when I try to resume the experiment after the breakpoint. I’m also having trouble finding a resume experiment button.

    Any information would be appreciated. Thanks

  • admin

    Tim,

    There are bug issues with the RapidMiner upgrade and I’ve since gone back to Yale version 3.4. Just like you, I started getting strange error messages and different results for the same work.

    I should write a post about that.

  • rapidminer user

    You need to be in advanced user mode to see the fields you mentioned, Tim. That is, the little guy icon should not be wearing a professor’s suit (weird ergonomics here, but...)

    Nice work Tom, thanks

  • http://www.neuralmarkettrends.com Tom

    Thanks Rapidminer User!

  • Shane B

    I am using RapidMiner and got the same results you got with Yale.

  • Shane B

    I did get an error however that says that the PerformanceEvaluator is now deprecated and says: “Please use the operators BasicPerformance, RegressionPerformance, ClassificationPerformance, or BinominalClassificationPerformance instead.”

    Any recommendations?

  • http://www.neuralmarkettrends.com Tom

    Glad to hear it!

  • http://www.neuralmarkettrends.com Tom

    Try using the ClassificationPerformance operator.

  • Shane B

    The ClassificationPerformance operator seems to be the only one that has all of the same settings that you selected with the PerformanceEvaluator. It ran without any errors.

  • Cristina

    Hi :-),

    Congrats on this great site, I have also found it very useful!!!

    Does anybody know if RapidMiner can deal with date formats?
    For example, if I’m using the Process Wizard, I can tell RapidMiner that my value type should be date, but in this case it is not possible to load more than 120 examples. If I use the ExcelExampleSource operator I can’t specify which value type should be date :(((.

    Thanks,
    Cristina

  • http://www.neuralmarkettrends.com Tom

    Cristina,
    I usually set the date column as the ID column.

    I never really had to set the date format per se but let me check it out.

  • Prakash Sridharan

    I’m a data analytics professional from Cognizant Technology Solutions, India. I wanted to break from the use of licensed data mining software for my anti-fraud analytics in American health insurance. I was evaluating RapidMiner over the past week and almost gave up. Your posts have changed it all. Now I know how to progress with my analyses in RapidMiner. Thank you.

  • http://www.neuralmarkettrends.com Tom

    Prakash,
    I’m glad that my tutorials could be of help.

  • Pathros

    Hi, Tom! It’s me again!
    How can we take a look at the model? For example, is there a way to visualize it like a y=x^2+x+b or something?

  • Tom

    Pathros: That’s a good question. I suspect for the linear regression operators you can do that by creating a results file. Look in the Root Operator and fill out where you want the results file to be saved. It should be a text file.

    I don’t know about neural net operators (which are non-linear), I’d have to check that out and get back to you.

  • Alex

    Thanks for your tutorial on Financial AI Modeling on RapidMiner. I have one question, about where you say to set a breakpoint. Can you be more specific on where to set it, how to set it and what kind of breakpoint? Thanks.

  • Alex

    I selected ExampleVisualiser, right clicked it and selected Breakpoint After. I think this is right.

  • Tom

    Alex: Great, I’m glad you found it.