The problem with developing a point spread betting system for football teams is that you can’t start with a neural net. This is because backfitting raw stat data could lead to poor forecasting, or handicapping. What you have to do is come up with a method where you rank the teams first and then feed those rankings into a neural net system to forecast the estimated handicap or point spread.
Why? Teams change over the season; they may lose a key player to injury or evolve their strategies per game. Although neural nets are fantastic, they can’t cope with these paradigm shifts easily. However, if they are fed a ranking system that gets updated weekly based on data from played games, those rankings can be fed into the model and compared to opposing teams in the next game.
Now the trick is to develop this ranking system without backfitting data. How do you do this? It’s not easy, but the route I have in mind is to use neural net clustering to first identify whether any pieces of data seem to drive the point spread. Once you know that, it’s a matter of devising a mathematical model to help you rank the teams.
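To make the "rankings updated weekly from played games" idea concrete, here is a minimal sketch in Python. It is not the ranking model this post is working toward; it swaps in a plain Elo-style rating as a stand-in, and the team names, the starting rating of 1500, and the K-factor of 20 are all assumptions picked for illustration. The only point is the workflow: update the ratings after each week, then use the rating difference between next week's opponents as an input to the net.

```python
# Elo-style weekly rating update (a stand-in, NOT the post's ranking model).
# START_RATING and K_FACTOR are assumed values chosen for illustration.
START_RATING = 1500.0
K_FACTOR = 20.0


def expected_score(rating_a, rating_b):
    """Expected result for team A against team B (between 0 and 1)."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def update_ratings(ratings, games, k=K_FACTOR, start=START_RATING):
    """Return new ratings after one week of played games.

    ratings: dict of team name -> rating before this week
    games:   list of (team_a, team_b, points_a, points_b) tuples
    """
    new_ratings = dict(ratings)
    for team_a, team_b, points_a, points_b in games:
        rating_a = new_ratings.get(team_a, start)
        rating_b = new_ratings.get(team_b, start)
        if points_a > points_b:
            actual = 1.0
        elif points_a < points_b:
            actual = 0.0
        else:
            actual = 0.5
        # Whatever team A gains, team B loses.
        delta = k * (actual - expected_score(rating_a, rating_b))
        new_ratings[team_a] = rating_a + delta
        new_ratings[team_b] = rating_b - delta
    return new_ratings


def rating_diff(ratings, team_a, team_b, start=START_RATING):
    """Feature for the net: rating gap between next week's opponents."""
    return ratings.get(team_a, start) - ratings.get(team_b, start)


if __name__ == "__main__":
    # Hypothetical teams and scores, just to show the weekly loop.
    week1 = [("Hawks", "Owls", 24, 17), ("Bears", "Rams", 10, 13)]
    ratings = update_ratings({}, week1)
    print(ratings)
    print(rating_diff(ratings, "Hawks", "Rams"))
```

Because the ratings only ever use games that have already been played, the net never sees information from the game it is trying to forecast, which is the whole reason for ranking first.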
FYI, I am using EasyNN Plus for this project. I may or may not post my data files.
So does this mean that soon we will have “Tom’s Power Rankings”? And could the inclusion of your rankings in college football and the BCS be far away? haha.
I’m enjoying reading your work. I basically learned RapidMiner (or a small piece of it) through your tutorials (thanks), and continue to learn from your posts.
Interesting to see you have changed from stocks to footy. A mate of mine who was into bond trading got into the horse racing prediction a while ago and has had an interesting “ride”.
After looking at numerous “predictive” variables, the best ones he found involved combinations of the owner and the trainer. Why, you ask? Partly because better trainers are good at training horses, and richer owners can afford to buy better horses for them to work with. The other interesting thing is that you get owner/trainer combinations that never fit the model. He did not know what to do with these outliers until he realized that certain owners and trainers were fixing races. He knew that if these horses appeared in a race, his model was not valid and he should not bet.
Anyway look forward to further posts in this area. Keep up the good work.
btw. speaking of predictive variables, after years of watching footy tipping competitions at the office, it is interesting to see how bad the real fans are at predicting who will win. It is not uncommon to see those who have no real knowledge of the game and just pick the home team to be in the top quartile of the leader board every time.
@Bedo: I like the sound of “Tom’s Power Rankings.” Who knows, lol.
@Caprica: I need to create a diversion for myself or else I’d shut down this blog completely. I’m pretty sure I’ll have some readers complain, “but this is a financial blog, isn’t it?” There is some information out there that says you can pick winners about 55 to 60% of the time with the proper model, and you need to hit about 53% to make money.
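For anyone wondering where a figure like 53% comes from, here is a quick back-of-the-envelope sketch. It assumes the common pricing where you risk 110 to win 100 on a point spread bet; that pricing is my assumption for the example, not something taken from a particular sportsbook, and the function names are made up for illustration.

```python
def break_even_win_rate(risk, win):
    """Win rate at which expected profit is zero when risking `risk` to win `win`."""
    return risk / (risk + win)


def expected_profit_per_bet(win_rate, risk, win):
    """Average profit per bet at a given win rate."""
    return win_rate * win - (1.0 - win_rate) * risk


if __name__ == "__main__":
    # Assumed pricing: risk 110 to win 100.
    risk, win = 110.0, 100.0
    print(f"break-even: {break_even_win_rate(risk, win):.4f}")
    for rate in (0.50, 0.53, 0.55, 0.60):
        print(f"{rate:.2f} -> {expected_profit_per_bet(rate, risk, win):+.2f}")
```

Under that assumed pricing the break-even point works out to 110/210, a bit over 52%, so a model that picks 55 to 60% against the spread would leave some room for profit.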
Tom, two years ago I did almost the exact same thing and found that you’re right: the ranking system is key. I had pretty good success with it; the model would only get 2 or 3 games wrong each week. It just took a lot of data entry time, so I stopped doing it.
@Tim: Now the trick is developing the ranking system. I already have some strong leads on how to do that.
Call me crazy, but the process of getting to the ranking system sure would make an interesting tutorial….
@Bedo: I think I’ll post the EasyNN data file.