
May 19, 2007

Building an AI financial market model – Lesson V

***The Downloads For This Lesson Have Been Updated for RapidMiner 4.0***

In this lesson we will build a prediction experiment that you can run against new data to predict whether Gold’s trend is UP or DOWN! Since I put you through four grueling lessons previously, I’m going to take it easy on you here and give you the completed model.

A prediction experiment for trend analysis is typically the same for every model you develop. All it really requires is a data loader for your test data, an experiment visualizer (not mandatory, but highly suggested), a model loader, and finally a model applier. That’s it: only 4 operators are needed to complete your entire experiment!
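The same four-step flow can be sketched outside YALE to show what each operator does. This is a minimal Python sketch and does not use YALE's API. ThresholdModel is a hypothetical stand-in for gold_final.mod, and the model is saved as JSON rather than in YALE's .mod format. The two test rows, with made-up attr_a/attr_b values, stand in for the Excel sheet.

```python
import json
import tempfile
from pathlib import Path


class ThresholdModel:
    """Hypothetical stand-in for a trained model such as gold_final.mod."""

    def __init__(self, attribute: str, threshold: float):
        self.attribute = attribute
        self.threshold = threshold

    def predict(self, row: dict) -> str:
        # Call the trend UP when the chosen attribute is above the threshold.
        return "UP" if row[self.attribute] > self.threshold else "DOWN"


def load_examples() -> list:
    # 1. Data loader (ExcelExampleSource in YALE). Made-up rows for illustration.
    return [
        {"date": "2007-05-04", "attr_a": 1.4, "attr_b": 0.2},
        {"date": "2007-05-11", "attr_a": 0.7, "attr_b": 0.5},
    ]


def visualize(rows: list) -> None:
    # 2. Optional visualizer: print the inputs so they can be checked by eye.
    for row in rows:
        print(row)


def save_model(model: ThresholdModel, path: Path) -> None:
    # Training side (the job of Lesson IV): persist the model's parameters.
    path.write_text(json.dumps({"attribute": model.attribute, "threshold": model.threshold}))


def load_model(path: Path) -> ThresholdModel:
    # 3. Model loader: rebuild the model from the saved file.
    params = json.loads(path.read_text())
    return ThresholdModel(params["attribute"], params["threshold"])


def apply_model(model: ThresholdModel, rows: list) -> list:
    # 4. Model applier: one (id, prediction) pair per example; the date acts as the id.
    return [(row["date"], model.predict(row)) for row in rows]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        model_file = Path(tmp) / "model.json"
        save_model(ThresholdModel("attr_a", 1.0), model_file)
        rows = load_examples()
        visualize(rows)
        print(apply_model(load_model(model_file), rows))
```

Running it prints both rows and then `[('2007-05-04', 'UP'), ('2007-05-11', 'DOWN')]`. The dates serve as ids, just as the date column does in Step 2.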

Step 1 – Build the Prediction Experiment

This is where you get off easy: I’ve already built the prediction model for you. All you have to do is download the predictive experiment (in zip format: GA-Gold-prediction) and then load it into YALE. A RapidMiner-compatible version of the prediction experiment is also available.

Step 2 – Load in the Test Data & Model

Get your test data and load it into YALE through the ExcelExampleSource operator. For this example we will use the following Excel spreadsheet: GA-Gold-Test. Select this spreadsheet in the operator and make sure the label field is set to zero. Your test data should not contain your output label, because that is what you are trying to predict, so there is no label column to point to. Then change the id field to 1 (this is your date column).
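The label/id settings are easier to see in code. The sketch below assumes, as this step implies, that the operator's column parameters count from 1 and that 0 means "no such column". The function assign_roles and the column names Input1 to Input3 are hypothetical.

```python
def assign_roles(header: list, id_column: int, label_column: int) -> dict:
    """Split spreadsheet columns into id, label and regular attributes.

    Column numbers are 1-based; 0 means "this column does not exist",
    which is how a test set without a label is described.
    """
    for name, index in (("id_column", id_column), ("label_column", label_column)):
        if not 0 <= index <= len(header):
            raise ValueError(f"{name}={index} is outside 0..{len(header)}")
    if id_column and id_column == label_column:
        raise ValueError("id and label cannot share a column")

    id_name = header[id_column - 1] if id_column else None
    label_name = header[label_column - 1] if label_column else None
    attributes = [h for h in header if h not in (id_name, label_name)]
    return {"id": id_name, "label": label_name, "attributes": attributes}


# Test sheet: the date is in column 1 and there is no trend column to read.
roles = assign_roles(["Date", "Input1", "Input2", "Input3"], id_column=1, label_column=0)
print(roles)  # {'id': 'Date', 'label': None, 'attributes': ['Input1', 'Input2', 'Input3']}
```

For a training sheet, passing the trend column's number as label_column would return that column as the label instead.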

Next, load in your model (gold_final.mod), which you created in Lesson IV and which I so graciously added to the GA-Gold-Prediction zip file. A RapidMiner-compatible version of the model is also available.

Step 3 – Run the Experiment

Click Run and YALE will spit back the results in a few seconds. Click on the data view in the Results Tab and you’ll see your predicted trend value (UP or DOWN). Congratulations! You’ve built your first trend model and predicted the trend in Gold!

Conclusion

I hope that you enjoyed these lessons and learned a little bit about the powerful abilities of YALE. When I started learning YALE, no one was there to help me. It was one heck of a learning curve, but after trial and error it got significantly easier for me to create models. I would guess that it took me 9 months of learning to get to where I am now; it took you 5 lessons!

If you decide to continue with building models, I highly suggest that you continue using YALE, you’ll be surprised at what it can do. Make sure you tip me when you make that cool million trading Gold. :)

Future Lessons

In future lessons I hope to show my readers how to build a model driven by genetic feature selection and an event driven model. If you have any questions or comments, please email me or drop me a comment. If you enjoyed these lessons, please consider subscribing to my RSS feed and passing the word about Neural Market Trends around.

***Please note, the information I’m sharing with you is very valuable and FREE, there’s nothing like this available on the Internet that I know of.  Please consider becoming an RSS Reader or buying something from my sponsors.***

52 Responses to “Building an AI financial market model – Lesson V”

  1. Piwi said:

    Nice one Tom! Very helpful tutorials. But one thing: could you shed some light on the theory behind using NNs for predicting prices? Any useful references out there? Thanks, P.

  2. Tom said:

    Piwi: Check out this Wikipedia link: http://en.wikipedia.org/wiki/Neural_net

    It has the history behind it. The short answer is that neural nets are powerful statistical algorithms that help people forecast just about anything. Some people want to forecast how many jeans they sell at Walmart; others want to know about the trends of currencies. Hope that helps!

  3. Sarah said:

    Great Work Tom!
    Thanks for all your hard work. I went over all five tutorials and will now try to figure out why SVM and other learners are not working on the data.
    But you provided me with a great tutorial and momentum.
    ~Sarah

  4. Tom said:

    Sarah: Are you using Rapid Miner 4.0beta? If so, you will get different results due to several bugs in the new version.

  5. bebo said:

    Do you have any results on performance vs actual?

    What is an event driven model?

  6. Tom said:

    bebo, no, I don’t have that posted, but I suppose I can do something like that. Look for a future post about it.

    An event-driven model would be like trying to understand how earnings or other news events affect your particular model. Something like that would be really useful for forex markets or individual stocks.

  7. ciku said:

    Thanks Tom. Now I have a clearer picture of the RapidMiner stuff. I’m planning to build a model to predict a manufacturing yield. I will probably need your help in the future.

  8. Tom said:

    Sure thing Ciku, drop me an email.

  9. ciku said:

    Tom,

    To predict a yield outcome, say “GOOD” or “BAD”, we first need the current attributes; without them we are not able to predict it.

    How can we predict the future outcome without first waiting for all the current attributes to come in?

    In manufacturing, once you have finished gathering all the current data for the attributes, it’s already too late, especially when the data is collected at the end of the manufacturing line.

    Do we need to predict all the future attributes first and then predict the class outcome?

    Any idea to share?

  10. Tom said:

    Ciku, in time series data, like stock data, you can create a “one step ahead” forecast and tag your output to give you a “Buy” or “Sell” signal. Modeling time series is different than modeling defects and quality in a manufacturing process.

    I don’t know exactly what you are manufacturing or the kind of data you’ve compiled, so I’m going to make some assumptions. I would suggest that you create different “what if” scenarios with your prediction set. Create plausible data sets, run them through the model, and then graph the results. The way to do it is to hold some attributes constant and vary the others in the prediction set.

    You should see linear or nonlinear relationships that will help you in figuring out where your product defects are.
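Tom doesn't spell out how the output is tagged. One common approach, offered here only as an assumption, is to label each bar with the direction of the next close and then map UP/DOWN to Buy/Sell. The function names are hypothetical and the prices are made up.

```python
def one_step_ahead_labels(closes):
    """Tag each bar with the direction of the *next* close.

    Row t is "UP" if closes[t + 1] > closes[t], otherwise "DOWN" (an unchanged
    close counts as DOWN). The last bar has no next close yet, so it gets None:
    that is the row a trained model would be asked to predict.
    """
    if not closes:
        return []
    labels = ["UP" if tomorrow > today else "DOWN" for today, tomorrow in zip(closes, closes[1:])]
    labels.append(None)
    return labels


def to_signal(label):
    # Map a direction label to a trading signal; None stays None.
    return {"UP": "Buy", "DOWN": "Sell"}.get(label)


closes = [100.0, 101.5, 101.0, 102.2]  # made-up closing prices
labels = one_step_ahead_labels(closes)
print(labels)                          # ['UP', 'DOWN', 'UP', None]
print([to_signal(l) for l in labels])  # ['Buy', 'Sell', 'Buy', None]
```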

  11. Tom said:

    Let me follow up. If you’re running a process, say making widgets, you have many variables under your control. I’m guessing things like plastic temperature, belt speed, machine speed, machine type, etc. You can hold those variables constant or vary them in the prediction set.

    The prediction set should give you a clue as to how to set up your production line to ensure maximum productivity.

    So in essence you don’t have to wait for the widgets to be made and then gather data at the end of the process, you want to know how to “tweak” the manufacturing process in the beginning.

    Hope that helps. If you need more detailed help, drop me an email; I do provide consulting services.
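Here is a minimal sketch of the "what if" idea from the two replies above. It copies a baseline set of inputs, varies one attribute over a range while the rest stay constant, and records the model's answer for each scenario. toy_yield_model is a hypothetical rule standing in for a trained classifier. The attribute names echo the plastic temperature and belt speed Tom mentions, and the thresholds are invented.

```python
def what_if(model, baseline: dict, attribute: str, values: list) -> list:
    """Vary one attribute over `values`, hold every other attribute at its
    baseline value, and record the model's output for each scenario."""
    results = []
    for value in values:
        scenario = dict(baseline)  # copy so the baseline itself is never modified
        scenario[attribute] = value
        results.append((value, model(scenario)))
    return results


def toy_yield_model(settings: dict) -> str:
    # Hypothetical rule standing in for a trained GOOD/BAD classifier.
    ok = 180 <= settings["temperature"] <= 220 and settings["belt_speed"] < 1.5
    return "GOOD" if ok else "BAD"


baseline = {"temperature": 200, "belt_speed": 1.0}
for value, outcome in what_if(toy_yield_model, baseline, "temperature", [160, 190, 210, 240]):
    print(value, outcome)
```

Repeating this for each attribute in turn and graphing value against outcome is one way to expose the linear or nonlinear relationships Tom describes.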

  12. ciku said:

    Tom,

    The data is logged by a tester machine at the end of the manufacturing line. Hence, we have time series data with quite a number of attributes. Unfortunately, none of the linked processes before the tester machines is a closed-loop system, so their data is not easily available to relate to yield.

    I did run RapidMiner with a selected set of the attributes that engineers normally monitor. The results look good, with quite impressive prediction accuracy, classification error and correlation.

    But as I mentioned previously, the model is not actually predicting the future. Talking about a “one step ahead” forecast, I am thinking of first forecasting each numerical attribute one step ahead, and then predicting the yield outcome “GOOD” or “BAD”.

  13. Tom said:

    Ciku: I think I know what you want to do. You want to find the optimal mix (and/or range) of attributes that if applied to your production process would yield a Good or Bad outcome. Right?

    If that’s the case, and you have the time series data with several input variables, you could use Rapidminer’s weighting algorithms (Evolutionary or other). You then apply these weights to your attributes and your data set.

    You would need to set some measure that would tell you what is Good or Bad. Rapidminer can then run through several scenarios by altering your attributes weights to get the best mix of Good or Bad results. This should help you in predicting the yield/outcome of your manufacturing process.
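RapidMiner's evolutionary weighting operators are considerably more elaborate. The sketch below only illustrates the idea with a (1+1) evolution strategy: mutate the attribute weights with Gaussian noise and keep the child whenever a fitness score does not get worse. Several pieces are assumptions: the fitness (leave-one-out accuracy of a weighted 1-nearest-neighbour classifier), the function names, and the made-up GOOD/BAD data in which only the first attribute matters.

```python
import random


def loo_accuracy(weights, rows, labels):
    """Leave-one-out accuracy of a 1-nearest-neighbour classifier that uses a
    weighted squared Euclidean distance. Needs at least two rows."""
    correct = 0
    for i, x in enumerate(rows):
        best_j, best_d = None, float("inf")
        for j, y in enumerate(rows):
            if i == j:
                continue
            d = sum(w * (a - b) ** 2 for w, a, b in zip(weights, x, y))
            if d < best_d:
                best_j, best_d = j, d
        correct += labels[best_j] == labels[i]
    return correct / len(rows)


def evolve_weights(rows, labels, generations=200, sigma=0.3, seed=0):
    """(1+1) evolution strategy over non-negative attribute weights."""
    rng = random.Random(seed)
    parent = [1.0] * len(rows[0])
    parent_fit = loo_accuracy(parent, rows, labels)
    for _ in range(generations):
        # Gaussian mutation, clipped at zero so weights stay non-negative.
        child = [max(0.0, w + rng.gauss(0.0, sigma)) for w in parent]
        child_fit = loo_accuracy(child, rows, labels)
        if child_fit >= parent_fit:  # keep the child if it is no worse
            parent, parent_fit = child, child_fit
    return parent, parent_fit


# Made-up process data: attribute 0 decides the outcome, attribute 1 is large-scale noise.
data_rng = random.Random(1)
rows, labels = [], []
for _ in range(40):
    signal, noise = data_rng.random(), data_rng.uniform(0.0, 10.0)
    rows.append((signal, noise))
    labels.append("GOOD" if signal > 0.5 else "BAD")

start = loo_accuracy([1.0, 1.0], rows, labels)
weights, fitness = evolve_weights(rows, labels)
print(f"equal weights: {start:.2f}, evolved {[round(w, 2) for w in weights]}: {fitness:.2f}")
```

On data like this the noise attribute's weight tends to shrink, which is the sense in which the evolved weights point at the settings that matter.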

  14. Tom said:

    Ciku, I wrote about RapidMiner’s Evolutionary and Genetic Algorithms, check it out: http://www.neuralmarkettrends.com/2007/07/30/using-genetic-and-evolutionary-algorithms-to-build-a-trading-model/

  15. ciku said:

    You are right, Tom. Ok, Let me check it out first.

    Thank you.

  16. Shane B said:

    Is there any way to plot the data to show the date on the x axis, gold on the y axis and then color it based on the prediction? When I try to plot, it doesn’t let me select the date for the x axis.

  17. Tom said:

    Shane: As far as I know, YALE/Rapidminer can’t plot those labels. What I usually do is save the results as a DAT file, import them into Excel, and then chart the results.
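Here is a minimal sketch of the export step. It assumes the predictions are already available as (date, close, prediction) tuples; the values are made up and write_results is a hypothetical helper. A comma-separated file opens directly in Excel, where the date column can be charted on the x axis.

```python
import csv
import io


def write_results(rows, stream):
    """Write (date, close, prediction) rows as CSV with a header line."""
    writer = csv.writer(stream)
    writer.writerow(["date", "close", "prediction"])
    writer.writerows(rows)


results = [("2007-05-04", 100.0, "UP"), ("2007-05-11", 98.5, "DOWN")]  # made-up values
buffer = io.StringIO()
write_results(results, buffer)
print(buffer.getvalue())
```

To write a real file, pass `open("results.csv", "w", newline="")` instead of the StringIO buffer. The `newline=""` argument keeps Python from producing blank lines between rows on Windows.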

  18. Amed said:

    Thanks for the tutorials. One question: In the “Data View” under the “Results Tab”, is the predicted value in Row 1? Should it be UP in this case?

    Thanks again.

  19. Tom said:

    The predictions will be in the first group of columns, usually after the ID.

    The prediction may be DOWN because of the input data. The idea behind this model is to see if this week’s inputs can “flip” the trend. It gives the modeler/trader ample warning that something *might* happen based on the market environment.

  20. Jeffrey Cameron said:

    Hi Tom,

    I have just jumped into this series of tutorials at this lesson without completing the prior lessons. When attempting to load the model file, I get the error:

    Error in: ModelLoader (ModelLoader) Could not read file ‘C:\Users\Jeff\Downloads\rapidminer\ga-gold-prediction\gold_final.mod’: Cannot read from XML stream, wrong format: : only whitespace content allowed before start tag and not \uac (position: START_DOCUMENT seen \uac… @1:1)

    I opened the file gold_final.mod in a text editor expecting to see something that looks like XML, but found that gold_final.mod looks more like binary data than anything else. Anyways, the tutorial isn’t working for me very easily. I will go back to lesson one and develop my own model, and I think that will probably work better. Any hints as to how to resolve the above error, for those who want to jump in to this series of tutorials without completing lessons 1 to 4?

  21. Jeffrey Cameron said:

    Hi Tom,

    Just following up. If I had to guess at the contents of gold_final.mod, I would guess that it’s java bytecode. It looks a lot like a .class or .jar file when examined in a text editor. However, it seems like RapidMiner is expecting gold_final.mod to contain data in XML format. Just an observation.

  22. Tom said:

    Jeff: These tutorials were written for YALE v3.4, the new version called RapidMiner has a new structure which makes the data files presented in the example unreadable. It’s not 100% backward compatible.

  23. Ax said:

    Hi,

    I found your articles very interesting. I would have a question regarding RapidMiner:
    How can we use it to predict a numerical value? Is this possible?

  24. Tom said:

    Ax: Yes, it is possible, but I advise against it. Forecasting closing stock prices is very dicey; forecasting direction works better.

    If you want to forecast stock prices, I’d suggest using the Multilayer Perceptron operator instead of a Classifier operator.

  25. Deon said:

    Hi Tom,

    I recently started looking into predictions using RapidMiner after having had a bad spell in the forex market. I came across your tutorial, and it helped me a lot when starting out. In the meantime, I have also been learning a lot about RapidMiner and about structuring prediction models (specifically time-series models). I have built (actually, I am still in the process of building) a EURUSD prediction model, and I was wondering if anybody in this community might be interested in forming a development team and then, beyond the development stage, also forming a kind of trading team.
    I will upload all the data, processes, and tests I have built or done if there is any interest at all.
    I am looking forward to your responses.

    PS: I’m sure I’m leaving this comment at the wrong place, but I didn’t know where to start a new thread.

    Deon

  26. Tom said:

    Hi Deon,

    At one time I had done the exact same thing you wanted to do and we all started energetically but then fizzled out. I’d be interested to do it again but I wouldn’t be trading because I’m too busy right now. I could be more of a reviewer and suggestion maker if you like.

    You can access the project collaborating site here, http://www.neuralmarkettrends.com/collaborate

    Email me via the contact form and I’ll give you access.

  27. Eric Detterman said:

    Deon/Tom – I would be interested in the EURUSD Rapid Miner development work. I am also in the beginning stages of developing Rapid Miner based systems & have been trading currencies for a couple years now.

    Thanks,
    Eric

  28. David said:

    In response to the EUR/USD prediction model: a great idea, and it will have my full attention.

  29. Tom said:

    David, email me your email address and I’ll send you the login information for the project collaboration site.

  30. Ignacio said:

    Hi Tom,
    I am interested in the EUR/USD project. How do I get in contact?

  31. Ignacio said:

    Hi Tom,

    Is anybody working on the EUR/USD prediction model? I am interested in it.

  32. Adeel said:

    Hi Tom
    I need help with detailed steps on how to prepare/format the input for training the model on the past 3 years (e.g. 2004, 2005, 2006) and predicting the temperature for the next year (e.g. 2007). I have data in the following format:

    Temperature by day of the year:

    Day   2004   2005   2006
    1     18.6   22.4   10.9
    2     17.8   17.5   13.5
    3     18.0   17.2   18.1
    4     19.5   18.2   16.1
    5     20.8   13.4   18.4
    6     19.0   17.0   18.0
    7     17.2   19.6   19.3
    8     18.0   16.0   17.0
    9     17.6   20.5   19.0

    I have gone through your lectures; they are very informative and I learned a lot, but I am still unable to build my prediction model using my time series data.

    Thank you very much in advance for your kind guidance and cooperation.

    regards

    Adeel

  33. Adeel said:

    Can I have Tom’s email address?

  34. Tom said:

    Adeel: Check your email, I replied to your query this morning. Thanks for reaching out to me!
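Tom answered Adeel by email, so his recommended setup isn't recorded here. One common layout, offered only as an assumption, is a sliding window over years. For each day, temperatures from earlier years are the inputs, and the following year's temperature on that day is the target. The newest years then become the inputs for the 2007 forecast. The function names are hypothetical; the numbers are the first three days of Adeel's table.

```python
def year_windows(table: dict, years: list, n_inputs: int) -> list:
    """Turn a {day: {year: temp}} table into (day, inputs, target) training rows.

    Each row uses the temperatures of `n_inputs` consecutive years as inputs and
    the next year's temperature on the same day as the target.
    """
    rows = []
    for day in sorted(table):
        temps = [table[day][y] for y in years]
        for start in range(len(years) - n_inputs):
            rows.append((day, temps[start:start + n_inputs], temps[start + n_inputs]))
    return rows


def prediction_inputs(table: dict, years: list, n_inputs: int) -> list:
    """Inputs for forecasting the year after the last one in `years`."""
    return [(day, [table[day][y] for y in years[-n_inputs:]]) for day in sorted(table)]


table = {
    1: {2004: 18.6, 2005: 22.4, 2006: 10.9},
    2: {2004: 17.8, 2005: 17.5, 2006: 13.5},
    3: {2004: 18.0, 2005: 17.2, 2006: 18.1},
}
years = [2004, 2005, 2006]
print(year_windows(table, years, n_inputs=2))
# [(1, [18.6, 22.4], 10.9), (2, [17.8, 17.5], 13.5), (3, [18.0, 17.2], 18.1)]
print(prediction_inputs(table, years, n_inputs=2))
# [(1, [22.4, 10.9]), (2, [17.5, 13.5]), (3, [17.2, 18.1])]
```

With only three years of data and two years as inputs, each day yields a single training row. Adding lagged days (for example, the temperatures of the few days before) is another common way to get more rows.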

  35. Vince said:

    Hi Tom,

    I’ve gone through your tutorials, downloaded RapidMiner 4.2 and tried to model lesson 4, using different learners such as decision trees instead of the suggested nearest neighbour. But I was trying it out on some telco data. Do you know if the tool can handle large amounts of data? I’m also new to the predictive modelling environment and would love to learn more. Do you know of any good sites or books you can recommend to a first-timer like myself? I guess I’m asking: how do we interpret the results from the model?

  36. Tom said:

    Vince: Rapidminer can handle large datasets; the only limitation is your hardware. I highly recommend Data Mining and Business Productivity by Stephan Kudyba and Richard Hoptroff. It gives a straightforward discussion of data mining, neural nets, and how to interpret the results.

    http://www.amazon.com/exec/obidos/ASIN/1930708033/qid=981434492/104-0089139-4760725

  37. Vince said:

    Tom,

    Thanks for the quick response. I’ll have to get the book you recommended.

  38. Benito Soarez said:

    Hmmmmm,
    I’ve got an unpleasant feeling about predictions using the raw prices.
    Have you considered calculating the daily returns or a similar normalization?
    If prices change (because of the usual inflation or – recently – deflation) or the relation between two assets changes, the system leaves its “operating point” and the neural model won’t give you any signal anymore, even when known patterns occur, just at another price level.
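Benito's point can be shown with two made-up price series that follow the same pattern at different levels. The raw prices differ, but their daily returns are the same, so a model trained on returns sees one pattern in both. This is a generic sketch; Tom doesn't say which transformation he uses.

```python
import math


def simple_returns(prices):
    """Day-over-day returns: r_t = p_t / p_(t-1) - 1."""
    return [today / yesterday - 1.0 for yesterday, today in zip(prices, prices[1:])]


def log_returns(prices):
    """Log returns ln(p_t / p_(t-1)); unlike simple returns they add up over days."""
    return [math.log(today / yesterday) for yesterday, today in zip(prices, prices[1:])]


# Made-up prices: the same +2%, -2% pattern at two different price levels.
low = [100.0, 102.0, 99.96]
high = [200.0, 204.0, 199.92]
print([round(r, 4) for r in simple_returns(low)])   # [0.02, -0.02]
print([round(r, 4) for r in simple_returns(high)])  # [0.02, -0.02]
```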

  39. Tom said:

    Benito: This tutorial was merely an example to show how RM users can build a classification model. I transform raw data all the time before I load it into a model and use it for trading.

  40. Martin said:

    Hi Tom,

    Thanks for creating those nice lessons. It is a help indeed!!

    I have one problem, though: I cannot set my label-id to zero as you suggested; it always jumps back to 1. Do you know what I am doing wrong?

    And another question: what is the Gold_Prediction_RapidMiner.xml file used for (when I use the gold_final.mod file for the prediction)?

    Please help when you have a second!

    Thanks, Martin

  41. arachnode.net said:

    Hi Tom –

    I’d like to collaborate at: http://www.neuralmarkettrends.com/collaborate/index.php?c=access&a=login

    I have used RM in the past for text classification purposes, with great success. Currently working on a model for correlating the COT with EURUSD.

    Thanks!
    Mike

  42. Kyle Degruttola said:

    Hi Tom:

    Where can I find YALE v3.4? There seem to be some limitations in the newer version, which are creating unwanted variation and errors in the code. I think I found a nice little use for this, though. It seems that you can find out what some funds are trading by data mining the correlation between the change in NAV and different assets’ closing prices. Then you can build a prediction model after weighting this accordingly. This may open an opportunity for statistical arbitrage when the prior end-of-day spread is large. Also, this would not be defined as piggybacking under securities law, as there is variance and the relationships are not absolute. Regardless, I hope to hear back from you.

    Yours truly,

    Kyle Degruttola
    web.njit.edu/~kd42
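Here is a rough sketch of the first part of Kyle's idea. It computes day-over-day changes for the fund's NAV and for each candidate asset, then ranks the assets by Pearson correlation. The series and asset names are made up. Correlation only measures linear co-movement of daily changes, so on its own it says nothing certain about what a fund actually holds.

```python
import math


def pct_changes(series):
    """Day-over-day percentage changes of a price or NAV series."""
    return [today / yesterday - 1.0 for yesterday, today in zip(series, series[1:])]


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (fails if either is constant)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def rank_assets(nav, assets):
    """Correlate a fund's NAV changes with each asset's price changes,
    most positively correlated first."""
    nav_changes = pct_changes(nav)
    scores = {name: pearson(nav_changes, pct_changes(prices)) for name, prices in assets.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Made-up closing values for one fund and two candidate assets.
nav = [10.0, 10.2, 10.1, 10.4, 10.3]
assets = {
    "asset_a": [50.0, 51.0, 50.5, 52.0, 51.5],
    "asset_b": [20.0, 19.6, 19.8, 19.2, 19.4],
}
for name, r in rank_assets(nav, assets):
    print(f"{name}: {r:+.2f}")
```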

  43. Tom said:

    Hi Kyle,
    Yale 3.4 was morphed into Rapidminer. You can download it at http://www.rapid-i.com or http://www.rapidminer.com. The tutorials were modified to load into the new version of Rapidminer.

    I see you are from NJIT, are you a (former) student of Dr. Kudyba?

  44. Kyle Degruttola said:

    I am a former and current student of Dr. Kudyba. I am finishing my master’s this May, but I will try to continue attending courses and training under him in the future. I think everyone agrees that he is one of the best and a pioneer in this field.

  45. Tom said:

    Kyle: He is a great guy and really open to sharing his knowledge. He’s the guy who helped me get my Futures Magazine article on Gold prices and Neural Nets published.

  46. Martin said:

    Hi everybody,

    I cannot put my label-id to zero, it switches back to 1. Do you have a solution for this? The error I get is this one:

    Many operators like classification and regression methods or the PerformancEvaluator require the input example sets to have a label or class attribute. If this not the case, applying these operators is pointless. If you read the data using an ExampleSource, you can specify the label attribute by using a ‘label’ tag in the attribute description file

    I also tried not using “create label” at all, but it doesn’t work.

    Thanks!

  47. Tom said:

    Martin,
    Are you using the ExcelExampleSource Operator to load your data? What version of RM are you using?

  48. Martin said:

    Tom,

    Well, I was basically doing all the tutorials that you wrote. In tutorial V you are using new data and run the algorithm over that new data (to generate the trend signal).

    My version is RM 4.4, and when I set the label-id to zero (because there is no label in that new Excel data file), it switches back to 1. Yes, I do use the ExcelExampleSource operator (if that is the one you used in the tutorial…)

    Thanks for your quick response. I appreciate this.
    Martin

  49. Tom said:

    Martin, I’d need to see your training data files and your prediction set. I’m still using RM version 4.0, so I don’t know if it’s a bug or not.

  50. Martin said:

    Tom, sorry to bother you again, but did you receive my email with the files? If not, please let me know an email address of yours. Thanks a lot, Martin

  51. Tom said:

    Hi Martin, yes I did receive your email with files. I’ve been terribly busy and haven’t had a chance to look at them. If this is a time critical thing, please post your question to the Rapid Miner forums (see the link in my blog roll). You should have an answer in hours.
    Regards, Tom

  52. Martin said:

    @ All

    I have following problem and am looking for help:

    I worked through Tom’s lessons and tried to forecast the trend of gold. When running the .mod file over the new data set, I have to set the label_id column to zero (there is no label_id because it is the one I want to forecast; compare with Lesson V, Step 2).

    I am using RM 4.4 and I cannot enter zero in that field (it jumps back to 1). Does one of you know how to deal with this problem?

    Thank you.
    Martin
