Building an AI financial market model – Lesson IV

In Lesson 3, I introduced the Sliding Window Validation operator to test how well we can forecast a trend in a time series. Our initial results were very poor: we forecast the trend with an average accuracy of only 55.5%, fractionally better than a simple coin flip! In this updated lesson I will introduce Parameter Optimization in RapidMiner to see if we can forecast the trend better.

Parameter Optimization

We begin with the same process from Lesson 3 but introduce a new operator called Optimize Parameters (Grid). We also do some housecleaning to prepare this process for production.

The Optimize Parameters (Grid) operator lets you do some amazing things: it varies – within your predefined limits – the parameter values of different operators. Any operator you put inside this operator’s subprocess can have its parameters automatically iterated over and the overall performance measured. This is a great way to fine-tune and optimize models for your analysis and ultimately for production.

For our process, we want to vary the training window width, testing window width, and training step width of the Sliding Window Validation operator, the C and gamma parameters of the SVM machine learning algorithm, and the forecasting horizon of the Forecast Trend Performance operator. We want to test all combinations and ultimately determine which combination of these parameters gives us the best-tuned trend prediction.
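Under the hood, a grid optimizer simply enumerates every combination of the parameter values you define and keeps the best-scoring one. Here is a minimal Python sketch of that idea – the parameter names and ranges are illustrative, and `evaluate` is just a placeholder for the actual sliding-window validation run:

```python
import itertools

# Hypothetical parameter grid mirroring the ranges varied in RapidMiner.
param_grid = {
    "training_window_width": [6, 8, 10, 12],
    "testing_window_width": [3, 5, 7],
    "step_width": [2, 4],
    "svm_c": [0, 0.1, 1.0],
    "svm_gamma": [0.01, 0.1, 1.0],
    "horizon": [1, 3, 5],
}

def evaluate(params):
    """Placeholder for one sliding-window validation run.

    In the real process this would train the SVM with these
    parameters and return the averaged trend-forecast accuracy.
    """
    return 0.0  # stub

def grid_search(grid, score_fn):
    """Try every parameter combination and return the best one."""
    keys = list(grid)
    best_params, best_score = None, float("-inf")
    for values in itertools.product(*(grid[k] for k in keys)):
        params = dict(zip(keys, values))
        score = score_fn(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score
```

Note how quickly the grid grows: the toy grid above already has 648 combinations, which is why large ranges can run for hours.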

Note: I run a weekly optimization process for my volatility trend predictions. I’ve noticed depending on market activity, the training width of the Sliding Window Validation operator needs to be tweaked between 8 and 12 weeks.

I also add a few Store operators to save the Performance and Weights from the Optimize Selection operator, and the Performance and Parameter Set from the Optimize Parameters (Grid) operator. We’ll need this data for production.

Varying Parameters Automatically

Any operator you put inside the Optimize Parameters (Grid) operator can have its parameters varied automatically; you just have to select which ones and set minimum and maximum values. Click the Edit Parameter Settings button and you are presented with a list of available operators to vary. Select an operator and a list of its available parameters is shown. Then select the parameter you want and define its min/max values.

Note: If you select a lot of parameters to vary with very large max values, you could be optimizing for hours or even days. This operator consumes your computer’s resources when you run millions of combinations!

The Log File

The Log operator is handy in optimization because it lets us create a custom log that records the values of the parameters we’re varying and the resulting forecast performance. You just name your column and select which operator and parameter you want to have an entry for.

Pro Tip: If you want to measure performance, make sure you select the Sliding Window Validation operator’s performance port and NOT the Forecast Trend Performance operator’s. Why? Because several models are generated as the window slides across the time series, and some perform better than others. The Sliding Window Validation operator averages all the results together, and that’s the measure you want!
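In other words, the number worth logging is the mean of the per-window accuracies, not any single window’s score. A tiny illustration (the accuracy values here are made up):

```python
# Hypothetical per-window accuracies produced as the validation
# window slides across the series; the number to log is their mean.
window_accuracies = [0.58, 0.61, 0.55, 0.67, 0.63]

average_accuracy = sum(window_accuracies) / len(window_accuracies)
print(f"averaged sliding-window accuracy: {average_accuracy:.3f}")
```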

This is a great way of seeing which initial parameter combinations generate the best performance. It can also be used to visualize your best parameter combinations!

The Results

The results point to a parameter combination of:

  • Training Window Width: 10
  • Testing Window Width: 5
  • Step Width: 4
  • C: 0
  • Gamma: 0.1
  • Horizon: 3

This combination generates an average Forecast Trend accuracy of 61.5% – a clear improvement over the original 55.5%.

That’s the end of Lesson 4 for your first AI financial market model. You can download the above sample process here. To install it, just go to File > Import Process. Lesson 5 will be updated shortly.

This is an update to my original 2007 YALE tutorials, revised for RapidMiner v7+. In the original set of posts I used the term AI when I really meant Machine Learning.

Using Genetic and Evolutionary Algorithms to Build a Trading Model

Ugly posted a great article from the New Scientist magazine that discusses how scientists are using Genetic and Evolutionary algorithms to solve all kinds of problems. The article highlights a few uses for these algorithms such as finding the optimal hull shape for boats or determining the best design for cochlear implants.

Still, why should you even bother using Genetic and Evolutionary algorithms in the first place? Because these algorithms use an evolutionary approach to selecting the “best fit” input variables. They forward-project outcomes to see which evolutionary path provides the best result for your output variable by transforming the input variables. In some cases they’ll even mutate the “offspring” to see what happens to your output!

What makes these algorithms so different from a standard back-propagation or regression algorithm is that they work by “preprocessing” your input data, helping build a highly correlated model by transforming your data into the most robust input data set they can.
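To make that concrete, here is a toy evolutionary-weighting loop in plain Python – not YALE’s actual implementation. It evolves one weight per input attribute, scoring each candidate by how well the weighted inputs correlate with the output, with the selection, crossover, and mutation steps described above:

```python
import random

def fitness(weights, rows, target):
    """Pearson correlation between the weighted sum of inputs and the target."""
    scores = [sum(w * x for w, x in zip(weights, row)) for row in rows]
    n = len(scores)
    ms, mt = sum(scores) / n, sum(target) / n
    cov = sum((s - ms) * (t - mt) for s, t in zip(scores, target))
    vs = sum((s - ms) ** 2 for s in scores) ** 0.5
    vt = sum((t - mt) ** 2 for t in target) ** 0.5
    return cov / (vs * vt) if vs and vt else 0.0

def evolve_weights(rows, target, n_attrs, pop=20, generations=50, seed=42):
    """Evolve attribute weights that maximize correlation with the target."""
    rng = random.Random(seed)
    population = [[rng.random() for _ in range(n_attrs)] for _ in range(pop)]
    for _ in range(generations):
        population.sort(key=lambda w: fitness(w, rows, target), reverse=True)
        parents = population[: pop // 2]            # selection: keep the fittest half
        children = []
        for _ in range(pop - len(parents)):
            a, b = rng.sample(parents, 2)           # crossover of two parents
            cut = rng.randrange(1, n_attrs) if n_attrs > 1 else 0
            child = a[:cut] + b[cut:]
            i = rng.randrange(n_attrs)              # mutate one "offspring" gene
            child[i] = min(1.0, max(0.0, child[i] + rng.gauss(0, 0.1)))
            children.append(child)
        population = parents + children
    return max(population, key=lambda w: fitness(w, rows, target))
```

The evolved weight vector plays the same role as the preprocessed inputs described above: it gets handed to the learner, which then fits a model on the re-weighted data.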

Now, I’ve only scratched the surface of using Genetic and Evolutionary algorithms in YALE and there’s tons more for me to learn, but I’ve used them in my experiments and have had good success. Here are some examples of where I’ve used them before:

  • I’ve used them to automatically select the best inputs from a list of 100 stock symbols and data points that help best explain my single output variable
  • I’ve used them to build “weights” for my fundamental data trading model (still in Beta), and
  • I’ve even crossbred ETFs to make hybrid ETFs (that’s a bit weird but I was experimenting).

All of these algorithms are found in YALE’s data preprocessing section, and all of them are used right after you load your data into the experiment. They then apply their algorithms and “preprocess” your data before the experiment “learns” a model.

To show you how easy it is to use these types of algorithms in YALE/RapidMiner, I’m posting a small example of how to build a trading model using fundamental data. I won’t go into detail about all the different settings for the EA; we’ll just use the default settings for this experiment. First download the following files:

The Excel data file: Fundamental Data

The YALE XML file (in zip format): EA Experiment


Open YALE and load in your XML file and then the Excel Data file. Your experiment should look something like the image to the left.

This experiment takes fundamental data on several stocks – such as book value, dividend payout, and EBITA – and tries to explain the output variable “1 Year Target Price” (or some other measure of your choosing).

EA Weights.

If you have 100 years of time to spare, you can assign weights to each of your input variables by hand and vary them until they match your output variable. The faster way is to let YALE’s Evolutionary Weighting algorithm preprocess the data for you.

Then the newly assigned weights are fed into the learner – in this case an SVM learner – and the model learns the relationships in the data. Once the model has finished learning, you should be left with a highly correlated model. Voila, you have now built a machine-learned model using Evolutionary algorithms!

As always, if you have any questions, please email me or leave me a comment! Thanks!


Calculating Historical Volatility

Hi there! This is one of my most popular posts, and I would love it if you subscribed to my RSS feed!

The inspiration for my S&P 500 Volatility Timing model came from rereading portions of Mandelbrot’s book, The (Mis)Behavior of Markets, and trolling around the Internet for Nassim Taleb’s research work on risk. I think both guys push the envelope on truly understanding unseen risk and those pesky financial asteroids. Since my model is currently being developed, I thought it would be worth my while to truly learn and understand how historical volatility (HV) is calculated. I first searched the Internet for any free data downloads of HV but came across several pay-for-data download sites.

One of them seemed to have comprehensive data, but it was expensive: one year’s worth of HV for one asset price would’ve cost me $5! So what does any engineering type do in the face of expensive solutions? He (or she) builds a cheaper solution. I decided to calculate HV on my own after reading about it on Wikipedia. Now I’m submitting this analysis for peer review, as I’m treading in unfamiliar waters. Please feel free to correct me if my computations or understanding of the material are wrong.

Wikipedia defines HV as:

The annualized volatility σ is the standard deviation σ of the instrument’s logarithmic returns in a year.


The generalized volatility σT for time horizon T in years is expressed as:

σ_T = σ √T
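For example, that formula lets you scale an annualized volatility down to shorter horizons (the 20% annual figure below is just an assumed value):

```python
import math

# Assume an annualized volatility of 20% (sigma = 0.20).
sigma_annual = 0.20

# Generalized volatility over horizon T (in years): sigma_T = sigma * sqrt(T)
sigma_monthly = sigma_annual * math.sqrt(1 / 12)  # one-month horizon
sigma_weekly = sigma_annual * math.sqrt(1 / 52)   # one-week horizon

print(f"monthly: {sigma_monthly:.4f}, weekly: {sigma_weekly:.4f}")
```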

Note: There’s a flaw in the Wikipedia’s formula logic after the generalized volatility formula above as pointed out by C++ Trader (thanks for the catch). Please see the related links below for more information on the correct calculation of HV.

Note that the formula used to annualize returns is not deterministic, but is an extrapolation valid for a random walk process whose steps have finite variance.

So the first step is to calculate the S&P 500’s logarithmic returns for a year. I’ll be using the weekly time series, and I’ll analyze it in a handy Excel spreadsheet here: HV Example.xls

Once again I’ll turn to Wikipedia for an explanation of logarithmic returns:

Academics use in their research the natural log return, called the logarithmic return or continuously compounded return. The continuously compounded return is asymmetric, clearly indicating that positive and negative percent returns are not equal. A 10% return results in a 9.53% continuously compounded return, while a -10% return results in -10.53%. This indicates that the investment will result in a dollar amount loss corresponding to the difference between the absolute values of the two numbers: 1% (this is an approximate equality).

  • Vi is the initial investment value
  • Vf is the final investment value

ROI_Log = ln(V_f / V_i)

  • ROILog > 0 is profit
  • ROILog < 0 is a loss
  • Doubling occurs when ROI_Log = ln(2) ≈ 69.3%
  • Total loss occurs when ROI_Log → −∞.
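The asymmetry described in the quote above is easy to verify numerically:

```python
import math

def log_return(v_initial, v_final):
    """Continuously compounded (logarithmic) return."""
    return math.log(v_final / v_initial)

# The asymmetry from the quote: +10% and -10% simple returns
up = log_return(100, 110)   # roughly +9.53%
down = log_return(100, 90)  # roughly -10.54%
print(f"up: {up:.4%}, down: {down:.4%}")

# Doubling: ln(2) is about 69.3%
assert abs(log_return(1, 2) - math.log(2)) < 1e-12
```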

This should be straightforward and I will calculate the weekly ROI for the S&P500. Why? Well I’m interested in calculating weekly HV so my Vi will be the week(1)’s closing price and Vf will be week(2)’s closing price. For the next iteration Vi will be the week(2)’s closing price and Vf will be week(3)’s closing price and so forth.

Next I created an Excel Macro that calculates the natural log returns and simultaneously calculates the HV for 10, 20, and 30 days using the standard deviation of the daily logarithmic returns multiplied by the square root of 252 (see related links below).
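A Python equivalent of that macro’s calculation might look like this – assuming daily closes and the conventional √252 annualization (swap in √52 for weekly data):

```python
import math
import statistics

def historical_volatility(closes, window, periods_per_year=252):
    """Annualized HV: rolling stdev of log returns, scaled by sqrt(252).

    Returns one HV value per full window of log returns.
    """
    log_returns = [math.log(b / a) for a, b in zip(closes, closes[1:])]
    hv = []
    for i in range(window, len(log_returns) + 1):
        sd = statistics.stdev(log_returns[i - window : i])
        hv.append(sd * math.sqrt(periods_per_year))
    return hv
```

A quick sanity check: a series that grows by a constant percentage every day has identical log returns, so its HV should be zero.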

There you have it, your very own weekly HV! Feel free to download the Excel macro and play with it. By all means, please critique my analysis and let me know if my logic is flawed! The more I learn about this, the more my ATS takes shape!

Update: The Excel Macro matches the reference site’s output for the 10-, 20-, and 30-day HVs. Check!