Building an AI financial market model - Lesson V
***The Downloads for This Lesson Have Been Updated for RapidMiner 4.0***
In this lesson we will build a prediction experiment that you can test new data against to predict whether Gold’s trend is UP or DOWN! Since I put you through four grueling lessons previously, I’m going to take it easy on you here and give you the completed model.
A prediction model, for trend analysis, is typically the same for every one you develop. All it really requires is a data loader for your test data, an experiment visualizer (this is not mandatory but highly suggested), a model loader, and finally a model applier. That’s it, only 4 operators are needed to complete your entire experiment!
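For readers who think more easily in code than in operator trees, the same four-step flow can be sketched in plain Python. This is only an illustration and not the YALE experiment itself: the function names, the `momentum` column, the threshold "model" and the sample rows are all made up for the example.

```python
import csv
import io
import json


def load_examples(csv_text):
    """Data loader: read test rows that have an id column but no label."""
    return list(csv.DictReader(io.StringIO(csv_text)))


def load_model(model_text):
    """Model loader: restore the parameters of a previously saved model."""
    return json.loads(model_text)


def apply_model(model, rows, id_column):
    """Model applier: map each row's id to a predicted trend."""
    predictions = {}
    for row in rows:
        value = float(row[model["attribute"]])
        predictions[row[id_column]] = "UP" if value > model["threshold"] else "DOWN"
    return predictions


# Made-up test data and model; column names and values are illustrative only.
test_csv = "date,momentum\n2007-05-14,0.8\n2007-05-21,-0.3\n"
saved_model = '{"attribute": "momentum", "threshold": 0.0}'

rows = load_examples(test_csv)
print(rows)  # stand-in for the optional visualizer step
predictions = apply_model(load_model(saved_model), rows, id_column="date")
print(predictions)
```

The point of the sketch is the shape of the experiment: the test data carries an id (the date) and no label, and the label is whatever the applied model returns.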
Step 1 - Build the Prediction Experiment
This is where you get off easy: I’ve already built the prediction model for you. All you have to do is download the predictive experiment (in zip format: GA-Gold-prediction) and then load it into YALE. (click here for the RapidMiner Compatible Prediction experiment)
Step 2 - Load in the Test Data & Model
Get your test data and load it into YALE through the ExcelExampleSource operator. For this example we will use the following Excel spreadsheet: GA-Gold-Test. Select this spreadsheet in the operator and make sure the label field is set to zero (your test data should not contain the output label, because that is what you are trying to predict) and change the id field to 1 (this is for your date column).
Next load in your model (gold_final.mod) that you created in Lesson IV and that I so graciously added to the GA-Gold-Prediction zip file. (click here for the RapidMiner Compatible version)
Step 3 - Run the Experiment
Click Run and YALE will spit back the results in a few seconds. Click on the data view in the Results Tab and you’ll see your predicted trend value (UP or DOWN). Congratulations! You’ve built your first trend model and predicted the trend in Gold!
Conclusion
I hope that you enjoyed these lessons and learned a little bit about the powerful abilities of YALE. When I started learning YALE, no one was there to help me. It was one heck of a learning curve, but after trial and error it got significantly easier for me to create models. I would guess that it took me 9 months of learning to get to where I am now; it took you 5 lessons!
If you decide to continue building models, I highly suggest that you keep using YALE. You’ll be surprised at what it can do. Make sure you tip me when you make that cool million trading Gold.
Future Lessons
In future lessons I hope to show my readers how to build a model driven by genetic feature selection and an event driven model. If you have any questions or comments, please email me or drop me a comment. If you enjoyed these lessons, please consider subscribing to my RSS feed and passing the word about Neural Market Trends around.
***Please note, the information I’m sharing with you is very valuable and FREE, there’s nothing like this available on the Internet that I know of. Please consider becoming an RSS Reader or buying something from my sponsors.***



May 19th, 2007 at 4:57 pm
Nice one Tom! Very helpful tutorials. But one thing.. could you shed some light on the theory behind using NNs for predicting prices? Any useful references out there? Thanks, P.
May 20th, 2007 at 6:36 am
Piwi: Check out this Wikipedia link: http://en.wikipedia.org/wiki/Neural_net
It has the history behind it. The short answer is that Neural Nets are super statistical algorithms that help people forecast just about anything. Some people want to forecast how many jeans they sell at Walmart, others want to know about the trends of currencies. Hope that helps!
July 13th, 2007 at 2:48 pm
Great Work Tom!
Thanks for all your hard work. I went over all the five tutorials and now will try to figure out why SVM and others are not working on the data.
But you provided me with a great tutorial and momentum.
~Sarah
July 13th, 2007 at 2:51 pm
Sarah: Are you using Rapid Miner 4.0beta? If so, you will get different results due to several bugs in the new version.
September 13th, 2007 at 10:47 pm
Do you have any results on performance vs actual?
What is an event driven model?
September 14th, 2007 at 4:57 am
bebo, now I don’t have that posted but I suppose I can do something like that. Look for a future post about it.
An event driven model would be like trying to understand how earnings or other news events affect your particular model. Something like that would be really useful to forex markets or individual stocks.
September 15th, 2007 at 10:02 pm
Thanks Tom. Now I have clearer picture with the rapidminer stuff. I’m planning to build a model to predict a manufacturing yield. probably will need your help in future.
September 17th, 2007 at 4:59 am
Sure thing Ciku, drop me an email.
September 23rd, 2007 at 12:00 am
Tom,
To predict a yield outcome of, say, “GOOD” or “BAD”, we first need the current attributes; without them we are not able to predict anything.
How can we predict the future outcome without first waiting for all the current attributes to come in?
In manufacturing, once you have finished gathering all the current data for the attributes, it’s already too late, especially when the data collection happens at the end of the manufacturing line.
Do we need to predict all the future attributes first and then predict the class outcome?
Any idea to share?
September 23rd, 2007 at 7:50 am
Ciku, in time series data, like stock data, you can create a “one step ahead” forecast and tag your output to give you a “Buy” or “Sell” signal. Modeling time series is different than modeling defects and quality in a manufacturing process.
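As a rough illustration of the “one step ahead” idea, here is a minimal Python sketch that tags each period with the direction of the next period’s close. The function name and the closing prices are made up for the example, and treating an unchanged close as “Sell” is an arbitrary assumption.

```python
def one_step_ahead_labels(closes):
    """Pair each close with a signal based on the NEXT period's close."""
    labeled = []
    for current, following in zip(closes, closes[1:]):
        # Assumption: an unchanged close is tagged "Sell".
        signal = "Buy" if following > current else "Sell"
        labeled.append((current, signal))
    return labeled


# Made-up closing prices, oldest first.
closes = [650.0, 655.5, 648.2, 660.1]
labeled = one_step_ahead_labels(closes)
print(labeled)
```

The last close gets no label because its “next” value is not known yet; that unlabeled row is exactly the kind of example you would hand to a trained model for prediction.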
I don’t know exactly what you are manufacturing or the kind of data you’ve compiled, so I’m going to make some assumptions. I would suggest that you create different “what if” scenarios with your prediction set. Create plausible data sets, run them through the model, and then graph the results. The way to do it is to hold some attributes constant and then vary the others in the prediction set.
You should see linear or nonlinear relationships that will help you in figuring out where your product defects are.
September 23rd, 2007 at 7:54 am
Let me follow up. If you’re running a process, say making widgets, you have many variables in your control. I’m guessing things like plastic temperature, belt speed, machine speed, machine type, etc. Those variables you can set constant or vary in prediction set.
The prediction set should give you a clue as to how to set up your production line to ensure max productivity.
So in essence you don’t have to wait for the widgets to be made and then gather data at the end of the process, you want to know how to “tweak” the manufacturing process in the beginning.
Hope that helps, if you need more detailed help, drop me an email. I do provide consultant services.
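The “what if” prediction set described above can be generated mechanically. The following Python sketch holds some attributes constant and varies the others over every combination; the attribute names and values are invented for illustration and do not come from any real production line.

```python
from itertools import product


def build_scenarios(constants, varied):
    """Combine fixed attributes with every combination of the varied ones."""
    names = list(varied)
    scenarios = []
    for combo in product(*(varied[name] for name in names)):
        scenario = dict(constants)          # copy the attributes held constant
        scenario.update(zip(names, combo))  # add one combination of the rest
        scenarios.append(scenario)
    return scenarios


# Hypothetical widget-line attributes.
constants = {"machine_type": "A", "belt_speed": 1.2}
varied = {"plastic_temp": [180, 190, 200], "machine_speed": [50, 60]}

scenarios = build_scenarios(constants, varied)
print(len(scenarios))
print(scenarios[0])
```

Each scenario would then be run through the trained model, and the predicted outcomes graphed against the varied attributes to look for the relationships mentioned above.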
September 23rd, 2007 at 10:35 am
Tom,
The data is logged by a tester machine at the end of the manufacturing line. Hence, we have time series data with quite a number of attributes. Unfortunately, the linked processes before the tester machines do not form a closed-loop system, so their data is not easily available to relate to yield.
I did run RapidMiner with a selected number of attributes that are normally monitored by engineers. The results look good, with quite impressive prediction accuracy, classification error and correlation.
But as I mentioned previously, the model is not actually predicting the future. Talking about a “one step ahead” forecast, I am thinking of first forecasting each attribute one step ahead, which involves numerical values, and then predicting the yield outcome “GOOD” or “BAD”.
September 23rd, 2007 at 7:02 pm
Ciku: I think I know what you want to do. You want to find the optimal mix (and/or range) of attributes that if applied to your production process would yield a Good or Bad outcome. Right?
If that’s the case, and you have the time series data with several input variables, you could use RapidMiner’s weighting algorithms (Evolutionary or other). You then apply these weights to your attributes and your data set.
You would need to set some measure that tells you what is Good or Bad. RapidMiner can then run through several scenarios, altering your attribute weights to get the best mix of Good or Bad results. This should help you in predicting the yield/outcome of your manufacturing process.
September 23rd, 2007 at 7:06 pm
Ciku, I wrote about RapidMiner’s Evolutionary and Genetic Algorithms, check it out: http://www.neuralmarkettrends.com/2007/07/30/using-genetic-and-evolutionary-algorithms-to-build-a-trading-model/
September 25th, 2007 at 9:37 am
You are right, Tom. Ok, Let me check it out first.
Thank you.
November 12th, 2007 at 9:14 am
Is there any way to plot the data to show the date on the x axis, gold on the y axis and then color it based on the prediction? When I try to plot, it doesn’t let me select the date for the x axis.
November 12th, 2007 at 10:29 am
Shane: As far as I know, YALE/RapidMiner can’t plot those labels out. What I usually do is save the results as a DAT file, import them into Excel, and then chart the results.
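If the results are already available outside the tool, the same export-then-chart workaround can be sketched in Python. Note that this writes CSV rather than the DAT file mentioned above, and the column names and rows are made up for the example.

```python
import csv
import io


def results_to_csv(rows, fieldnames):
    """Serialize result rows to CSV text that a spreadsheet can import."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Made-up results: date for the x axis, gold for the y axis,
# prediction for the color of each point.
rows = [
    {"date": "2007-11-05", "gold": 806.5, "prediction": "UP"},
    {"date": "2007-11-12", "gold": 799.0, "prediction": "DOWN"},
]
csv_text = results_to_csv(rows, ["date", "gold", "prediction"])
print(csv_text)
```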
February 15th, 2008 at 9:36 pm
Thanks for the tutorials. One question: In the “Data View” under the “Results Tab”, is the predicted value in Row 1? Should it be UP in this case?
Thanks again.
February 21st, 2008 at 5:56 am
The predictions will be in the first group of columns, usually after the ID.
The prediction may be DOWN due to the data input. The idea behind this model is to see if the inputs of this week can “flip” the trend. It gives the modeler/trader ample warning that something *might* happen based on the market environment.
March 16th, 2008 at 8:52 pm
Hi Tom,
I have just jumped into this series of tutorials at this lesson without completing the prior lessons. When attempting to load the model file, I get the error:
Error in: ModelLoader (ModelLoader) Could not read file ‘C:\Users\Jeff\Downloads\rapidminer\ga-gold-prediction\gold_final.mod’: Cannot read from XML stream, wrong format: : only whitespace content allowed before start tag and not \uac (position: START_DOCUMENT seen \uac… @1:1)
I opened the file gold_final.mod in a text editor expecting to see something that looks like XML, but found that gold_final.mod looks more like binary data than anything else. Anyways, the tutorial isn’t working for me very easily. I will go back to lesson one and develop my own model, and I think that will probably work better. Any hints as to how to resolve the above error, for those who want to jump in to this series of tutorials without completing lessons 1 to 4?
March 16th, 2008 at 8:55 pm
Hi Tom,
Just following up. If I had to guess at the contents of gold_final.mod, I would guess that it’s java bytecode. It looks a lot like a .class or .jar file when examined in a text editor. However, it seems like RapidMiner is expecting gold_final.mod to contain data in XML format. Just an observation.
March 16th, 2008 at 9:11 pm
Jeff: These tutorials were written for YALE v3.4. The new version, called RapidMiner, has a new structure that makes the data files presented in the example unreadable. It’s not 100% backward compatible.
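One way to check what kind of file you have before loading it is to look at its first bytes. The sketch below is an assumption-laden diagnostic, not part of YALE or RapidMiner: it relies on the fact that Java object serialization streams begin with the bytes AC ED, which would be consistent with the “\uac” reported in the error above, while an XML file begins with “<” after optional whitespace. Whether gold_final.mod really is a Java-serialized object is only an inference from that error message.

```python
JAVA_SERIALIZATION_MAGIC = b"\xac\xed"  # start of a Java object stream


def sniff_model_format(first_bytes):
    """Guess a file's format from its leading bytes."""
    if first_bytes.startswith(JAVA_SERIALIZATION_MAGIC):
        return "java-serialized"
    if first_bytes.lstrip().startswith(b"<"):
        return "xml"
    return "unknown"


# In practice the bytes would come from something like:
#     with open(path_to_model, "rb") as handle:
#         first_bytes = handle.read(16)
print(sniff_model_format(b"\xac\xed\x00\x05sr"))
print(sniff_model_format(b"  <?xml version='1.0'?>"))
```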
April 7th, 2008 at 8:45 am
Hi,
I found your articles very interesting. I would have a question regarding RapidMiner:
How can we use it to predict a numerical value? Is this possible?
April 7th, 2008 at 8:53 am
Ax: Yes, it is possible, but I advise against it. Forecasting closing stock prices is very dicey; directions are better.
If you want to forecast stock prices, I’d suggest using the Multilayer Perceptron operator instead of a Classifier operator.
June 5th, 2008 at 2:00 am
Hi Tom,
I recently started looking into predictions using rapidminer after having had a bad spell in the forex market. I came across your tutorial and it helped me a lot when starting out. In the mean time, I have also been learning a lot about rapidminer and structuring prediction models (specifically time-series models). I have built, actually I am in the process of building a EURUSD prediction model, and I was wondering if anybody in this community might be interested in forming a development team and then beyond the development stage also form kind of a trading team.
I will upload all the data, processes, and tests I have built or done if there is any interest at all.
I am looking forward to your responses.
PS: I’m sure I’m leaving this comment at the wrong place, but I didn’t know where to start a new thread.
Deon
June 5th, 2008 at 5:18 am
Hi Deon,
At one time I had done the exact same thing you wanted to do and we all started energetically but then fizzled out. I’d be interested to do it again but I wouldn’t be trading because I’m too busy right now. I could be more of a reviewer and suggestion maker if you like.
You can access the project collaborating site here, http://www.neuralmarkettrends.com/collaborate
Email me via the contact form and I’ll give you access.
June 5th, 2008 at 8:41 am
Deon/Tom - I would be interested in the EURUSD Rapid Miner development work. I am also in the beginning stages of developing Rapid Miner based systems & have been trading currencies for a couple years now.
Thanks,
Eric
June 9th, 2008 at 12:33 am
In response to EUR/USD pred mod. - A great idea and will have my full attention.
June 9th, 2008 at 4:48 am
David, email me your email address and I’ll send you the login information for the project collaboration site.
August 12th, 2008 at 4:07 pm
Hi Tom,
I am interested in the EUR/USD model. How do I contact you?
September 1st, 2008 at 12:26 pm
Hi Tom,
Is anybody working on the EUR/USD prediction model? I am interested in it.
September 5th, 2008 at 11:24 pm
Hi Tom
I need help with the detailed steps for preparing/formatting the input to train the model on the past 3 years (e.g., 2004, 2005, 2006) and predict the temperature for the next year (e.g., 2007). I have data in the following format:
Temperature
Day of the year   2004   2005   2006
1                 18.6   22.4   10.9
2                 17.8   17.5   13.5
3                 18.0   17.2   18.1
4                 19.5   18.2   16.1
5                 20.8   13.4   18.4
6                 19.0   17.0   18.0
7                 17.2   19.6   19.3
8                 18.0   16.0   17.0
9                 17.6   20.5   19.0
I have gone through your lectures. They are very informative and I learned a lot, but I am still unable to build my prediction model using my time series data.
Thank you very much in advance for your kind guidance and cooperation.
regards
Adeel
September 5th, 2008 at 11:40 pm
Can I have Tom’s email address?
September 6th, 2008 at 5:44 pm
Adeel: Check your email, I replied to your query this morning. Thanks for reaching out to me!
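For other readers with a similar table, one possible way to prepare it (not necessarily what was suggested by email) is to flatten the yearly columns into a single chronological series and then cut it into sliding windows, where the previous few days are the inputs and the following day is the target. The function names and the window size of 3 are arbitrary choices for this sketch; the numbers are the nine days quoted in the question.

```python
temps_by_year = {
    2004: [18.6, 17.8, 18.0, 19.5, 20.8, 19.0, 17.2, 18.0, 17.6],
    2005: [22.4, 17.5, 17.2, 18.2, 13.4, 17.0, 19.6, 16.0, 20.5],
    2006: [10.9, 13.5, 18.1, 16.1, 18.4, 18.0, 19.3, 17.0, 19.0],
}


def to_series(temps_by_year):
    """Flatten the per-year columns into one series, oldest value first."""
    series = []
    for year in sorted(temps_by_year):
        series.extend(temps_by_year[year])
    return series


def windowed_examples(series, window):
    """Build (inputs, target) pairs: `window` past values, then the next one."""
    examples = []
    for start in range(len(series) - window):
        inputs = series[start:start + window]
        target = series[start + window]
        examples.append((inputs, target))
    return examples


series = to_series(temps_by_year)
# With only 9 days per year, windows near the year boundaries mix days that
# are not really adjacent; a full year of data per column avoids that.
examples = windowed_examples(series, window=3)
print(examples[0])
```

Each pair then becomes one training row: the inputs are the attributes and the target is the label. To forecast into 2007, the last three known values of 2006 would form the input of the first unlabeled row.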
September 29th, 2008 at 6:56 pm
Hi Tom,
I’ve gone through your tutorials, downloaded RapidMiner 4.2, and tried to model Lesson 4. I used different learners, such as decision trees instead of the suggested nearest neighbour, and I was trying it out on some telco data. Do you know if the tool can handle large amounts of data? I’m also new to the predictive modelling environment and would love to learn more. Do you know of any good sites or books you can recommend to a first timer like myself? I guess I’m asking the question “how do we interpret results from the model?”
September 30th, 2008 at 5:24 am
Vince: RapidMiner can handle large datasets; the only limitation is your hardware. I highly recommend Data Mining and Business Productivity by Stephan Kudyba and Richard Hoptroff. It will give you a straightforward discussion of data mining, neural nets, and how to interpret the results.
http://www.amazon.com/exec/obidos/ASIN/1930708033/qid=981434492/104-0089139-4760725
September 30th, 2008 at 2:51 pm
Tom,
Thanks for the quick response. I’ll have to get the book you recommended.