Welcome back! In this lesson we will set our preferences to make sure everything loads in correctly before we create the model. By now, you should’ve read and built the experiment framework, as described in Lesson III. If you haven’t, then what I’ll post here might not make a lot of sense.

We’ll cover 4 items in this lesson:

  1. Data Loading Preferences
  2. Model Saving Preferences
  3. Performance Preferences
  4. Run the experiment

Data Loading Preferences

Before you run the experiment for the first time, you have to tell it where to find your data. If you click on the ExcelExampleSource operator, you’ll see the following preferences and toggle box.

Data Loader Pref

I highlighted the important preferences with red dots in the above image to avoid confusion. First select your data spreadsheet (Gold Final Input.xls) from your file folders, check the “first_row_as_names” box, enter the number 9 into the label_column field, and enter the number 1 into the id_column field.

It’s important that you get this step right. What you are doing is telling the experiment where your output variable is located and what your reference id is. The label_column field is the Excel column number of your output variable and the id_column field should be your date column number. Remember this because you’ll have to fill these fields in for your other experiments.
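If it helps to see what those two numbers mean, here is a minimal Python sketch of the idea. This is not YALE code and not how ExcelExampleSource is implemented. It assumes the column numbers count from 1 (which matches entering 1 for the date column), and it uses a tiny made-up CSV stand-in for the spreadsheet: the x1 to x7 column names, the values, and the split_roles helper are all hypothetical.

```python
import csv
import io

# Hypothetical stand-in for the spreadsheet: 9 columns with made-up values.
# Column 1 is the date (the id) and column 9 is the output variable (the label).
SAMPLE = """Date,x1,x2,x3,x4,x5,x6,x7,GC Trend
2007-01-02,1,2,3,4,5,6,7,up
2007-01-03,2,3,4,5,6,7,8,down
"""


def split_roles(text, label_column, id_column, first_row_as_names=True):
    """Split each row into id, label, and regular attributes.

    label_column and id_column are 1-based, like spreadsheet column numbers.
    """
    rows = list(csv.reader(io.StringIO(text)))
    if first_row_as_names:
        names, data = rows[0], rows[1:]
    else:
        # No header row: invent generic names for the columns.
        names = [f"att{i}" for i in range(1, len(rows[0]) + 1)]
        data = rows

    label_idx = label_column - 1  # convert to Python's 0-based indexing
    id_idx = id_column - 1

    examples = []
    for row in data:
        examples.append({
            "id": row[id_idx],
            "label": row[label_idx],
            "attributes": {
                names[i]: value
                for i, value in enumerate(row)
                if i not in (label_idx, id_idx)
            },
        })
    return names[label_idx], names[id_idx], examples


label_name, id_name, examples = split_roles(SAMPLE, label_column=9, id_column=1)
print("label:", label_name, "| id:", id_name)
print(examples[0])
```

If you passed the wrong number for label_column, a different column would be treated as the output variable, which is exactly the mistake you want to catch before building the model.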

Data Loader Pref2

Next, you should create a breakpoint in the experiment, which is nothing more than a pause in the experiment’s run. We’re doing this because we want the experiment to pause right after it has loaded the data. Why? With a breakpoint at this point in the experiment, you can inspect the loaded data and make sure the experiment is reading in your output variable correctly.

Tip: You can skip this step but I highly advise that you don’t. You can create breakpoints at any step in the experiment if you choose, but they’re most valuable during the data loading stage.

Model Saving Preferences

Scroll down to the ModelWriter operator and click on it. You’ll see only one field, which lets you select the path where your model will be saved. Click on it > select your data directory > type “gold_final.mod” and hit enter. Done!

Data Model Pref

Performance Preferences

Now we reach the final step, setting the performance preferences. Scroll down to the Performance Evaluator operator and click on it. You should see several fields with check boxes. Scroll down and check the absolute error, relative error, correlation, squared correlation, accuracy, and classification error boxes. Make sure the field with the pull-down menu is set to correlation. Refer to the image below for the setup.

Data Performance Pref
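If you’re wondering what a few of those check boxes actually measure, here is a rough Python sketch using the usual textbook definitions. It is only an illustration: I’m assuming these definitions, YALE’s exact formulas may differ, and the numbers and labels below are made up. The two correlation measures are left out here.

```python
def absolute_error(actual, predicted):
    """Average of |actual - predicted| over all examples."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def relative_error(actual, predicted):
    """Average of |actual - predicted| / |actual| (actual values must be non-zero)."""
    return sum(abs(a - p) / abs(a) for a, p in zip(actual, predicted)) / len(actual)


def accuracy(actual, predicted):
    """Fraction of examples whose predicted class matches the actual class."""
    return sum(a == p for a, p in zip(actual, predicted)) / len(actual)


def classification_error(actual, predicted):
    """Fraction of examples that were classified incorrectly."""
    return 1.0 - accuracy(actual, predicted)


# Made-up numeric values for the two error measures.
actual_values = [10.0, 20.0, 40.0]
predicted_values = [12.0, 18.0, 40.0]
print("absolute error:", absolute_error(actual_values, predicted_values))
print("relative error:", relative_error(actual_values, predicted_values))

# Made-up class labels for the two classification measures.
actual_labels = ["up", "down", "up", "up"]
predicted_labels = ["up", "up", "up", "down"]
print("accuracy:", accuracy(actual_labels, predicted_labels))
print("classification error:", classification_error(actual_labels, predicted_labels))
```

Notice that the error measures only make sense for numbers, while accuracy and classification error only compare class labels.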

You’re done now. Let’s run the experiment!

Run the Experiment

This is the best part. All your hard work is about to pay off! Find the “play” button and click it!

Yale Run

The experiment should load your data in a flash and then reach the breakpoint we discussed. The experiment will automatically switch to the results screen, which should look like this:

Data Loader Results

This is where the fun in data analysis begins! This results screen (only if you used the breakpoint) will tell you what the model sees as your output variable (label column). If it’s not GC Trend, then press the stop button, go back to the ExcelExampleSource operator, and check your preferences.

Take a moment and click on the “plot view” option. Here you can create scatter plots, self-organizing maps, or histograms to your heart’s content. Create a scatter plot and choose whatever you want for the X, Y, and Point Colors. YALE should automatically create a plot for you with several dots. These dots are identified by your id_column preference, in this case the date.

Remember we added in the data visualization operator? Doing this allows us to click on any one of those scatter points and find out more about that data point. Adding this operator lets you examine the data behind outliers or look up specific information about a data point of your choosing.

Data Loader Results2

When you’re all done, you’ll have to resume the experiment. Click on the resume button.

Yale Resume

Now the experiment will create the model and determine its performance. This step could take a few minutes, depending on the size of your data. While you’re waiting, take a moment to subscribe to my RSS feed (shameless plug).

When the experiment finishes you should see the information in the results tab be replaced with the following screen:

Performance Output

I’m not going to discuss the importance of the statistical measures here, but I will tell you that in building a classification model like this one, a high correlation is good. The correlation can be positive or negative, and the closer it is to 1 (or -1) the better.
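To make the “closer to 1 (or -1)” idea concrete, here is a small Python sketch of the standard Pearson correlation. I’m assuming that this is the kind of correlation being reported; the numbers are made up and the function names are my own.

```python
import math


def correlation(xs, ys):
    """Pearson correlation between two equally long lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return covariance / (spread_x * spread_y)


def squared_correlation(xs, ys):
    """Square of the Pearson correlation, so the sign no longer matters."""
    return correlation(xs, ys) ** 2


# Made-up values: the second list rises with the first, the third falls.
actual = [1.0, 2.0, 3.0, 4.0]
moves_with = [2.0, 4.0, 6.0, 8.0]
moves_against = [8.0, 6.0, 4.0, 2.0]

print("moves with:", correlation(actual, moves_with))
print("moves against:", correlation(actual, moves_against))
print("squared:", squared_correlation(actual, moves_against))
```

The first result comes out at about 1 and the second at about -1. Both are perfect straight-line relationships, just in opposite directions, which is why the squared correlation is about 1 in either case.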

Congratulations! You’ve finished your first YALE experiment and built your first model! In Lesson V, I will show you how to build a prediction experiment and we’ll finally predict some current trends for Gold.

As always, if you have any questions regarding this lesson or the topics covered so far, please leave a comment or email me.