[flashvideo file=wp-content/uploads/2010/09/Rapidminer5-Vid11.mp4 /]The video is NOT uploaded to my Youtube channel because it's 13 minutes long. Here's the HQ video tutorial #11.
September 2010 Archives
I'm back to making new videos again, at least for a little while! This new video showcases the Pattern Recognition & Landmarking plugin that was unveiled at RCOMM 2010. This plugin is fantastic! It analyzes your data, ranks the types of learners that should yield the highest accuracy, and then automatically constructs the process for you. It's so great that it helps answer one of the most frequently asked questions from my readers, "which learner should I use for my data?"
Today's guest post, about an awesome new plugin for Rapidminer, is from Milan Vukicevic. Although I walked in at the very end of his presentation at RCOMM 2010, I sat down with Milan on my last day and he gave me a personal demo of WhiBo. What I see in this plugin, as it relates to the financial world, is its ability to build algorithms on new data, find patterns, and tweak parameters in ways that were never possible before. Thanks Milan!

WhiBo is a RapidMiner plug-in for component-based design and performance testing of data mining algorithms. Users can design whole algorithms simply by connecting components. These components are building blocks that represent crucial algorithmic steps that every algorithm of a certain type should have. WhiBo has an interactive GUI where component-based algorithms can be designed and saved for reuse with just a few clicks, without writing a single line of code. This way, data mining practitioners have more freedom to construct and rebuild algorithms that better adapt to concrete data. Compared with traditional algorithms, which can only be adjusted by parameter tuning, this approach offers far greater possibilities for algorithm adjustment. A component repository for the design and testing of Decision Tree and Partitioning Clustering algorithms is provided. This repository allows users to design algorithms that can outperform traditional, well-known algorithms. If needed, component-based design allows simple extension of the repository, as well as definitions of new generic algorithms (e.g. neural networks, SVMs, etc.). When combined with RapidMiner's pre-processing and visualization operators, WhiBo becomes a powerful tool for pattern recognition and predictive analysis. For more information about WhiBo and the component-based approach to designing and applying data mining algorithms, feel free to contact me at milan.vukicevic *AT* fon.bg.ac.rs (remove *AT*).
Installation instructions, detailed user and developer documentation, and a list of our publications can be found at www.whibo.fon.bg.ac.rs.
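To make the component-based idea more concrete, here's a small sketch in Python (this is illustrative only, not WhiBo's actual API): an algorithmic step like the decision-tree split criterion becomes a swappable component, so a "new" tree variant is just a new combination of components.

```python
# Illustrative sketch of component-based algorithm design (hypothetical
# code, not WhiBo itself): the impurity measure is a pluggable component
# passed into the split-finding step.
import math
from collections import Counter

def gini(labels):
    # Gini impurity component
    n = len(labels)
    return 1 - sum((c / n) ** 2 for c in Counter(labels).values())

def entropy(labels):
    # Shannon entropy component
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def best_split(values, labels, impurity):
    """Pick the threshold minimising weighted impurity, for any
    impurity component plugged in."""
    best = None
    for t in sorted(set(values))[:-1]:
        left = [l for v, l in zip(values, labels) if v <= t]
        right = [l for v, l in zip(values, labels) if v > t]
        score = (len(left) * impurity(left)
                 + len(right) * impurity(right)) / len(labels)
        if best is None or score < best[0]:
            best = (score, t)
    return best[1]

values = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
labels = ["a", "a", "a", "b", "b", "b"]
print(best_split(values, labels, gini))     # → 3.0
print(best_split(values, labels, entropy))  # → 3.0
```

Swapping `gini` for `entropy` (or any new impurity function) changes the algorithm without touching the tree-growing code, which is the kind of flexibility WhiBo offers through its GUI.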
I'm happy to announce that today is the first of a two-part guest post series. Today's guest post is by Marin Matijas, who gave a presentation at RCOMM 2010 about Short Term Load Forecasting using Support Vector Machines (SVM). I asked Marin to elaborate a little on his use of the Radial Basis Function (RBF) in Rapidminer's SVM operator, and here's what he had to say! I did edit the post a bit for readability. Thanks Marin!

In my RCOMM 2010 presentation, titled "Application of Short Term Load Forecasting using Support Vector Machines in RapidMiner 5.0," I showed how SVMs can be used to solve a volatile Load Forecasting problem. Load Forecasting is an old problem, almost as old as modern stock-exchange-related forecasting. I compare the two because both are time-series problems, which makes them similar (and because we are all eagerly waiting for Tom's videos with more insights on how to predict financial markets). The goal of Load Forecasting is to predict exact values of an electricity (power) load in a given time interval. Typically the load for the day ahead is predicted on an hourly basis. Unlike predictions in the financial markets, where trend prediction is often more important than the 'exact' value, here the goal is to predict the (exact) value of the load itself. Depending on the problem, the Mean Absolute Percentage Error (MAPE) varies, but it is typically between 1 and 10% for 24 intervals or more. Good precision can be obtained because load does not fluctuate much. Overall we typically consume more in winter than in autumn, and more on Monday morning than on Sunday evening, but when averaged, electricity consumption follows certain patterns. Since load is serial in nature, with patterns repeating on a known basis, windowing has been used to take advantage of this property. Support Vector Machines were chosen for the regression, as they gave better results than the previously used method.
Compared to Artificial Neural Networks, the SVM is much faster, an important characteristic with large datasets. One key choice for the SVM learner was the Radial Basis Function (RBF) kernel. It was chosen for three main reasons, discussed below. The first reason is that it is good for non-linear problems. Looking at a typical graph of the electricity grid's daily load, one can easily see that Load Forecasting is a non-linear problem (see graph below).
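The windowing-plus-RBF-SVM approach Marin describes can be sketched outside RapidMiner too. Here's a minimal Python version using scikit-learn (an assumption on my part — the talk used RapidMiner's operators, and the window size and SVM parameters below are illustrative, not Marin's):

```python
# Sketch: window an hourly load series into lagged features, then fit
# an RBF-kernel SVM regressor to predict the next hour's load.
import numpy as np
from sklearn.svm import SVR

rng = np.random.default_rng(0)
hours = np.arange(24 * 60)                       # 60 days of hourly data
# Synthetic load: daily sinusoidal pattern plus noise
load = 100 + 20 * np.sin(2 * np.pi * hours / 24) + rng.normal(0, 2, hours.size)

window = 24                                      # use the previous 24 hours
X = np.array([load[i:i + window] for i in range(load.size - window)])
y = load[window:]                                # target: the next hour

split = -24 * 7                                  # hold out the last week
model = SVR(kernel="rbf", C=10.0, epsilon=0.5).fit(X[:split], y[:split])
pred = model.predict(X[split:])

mape = np.mean(np.abs((y[split:] - pred) / y[split:])) * 100
print(f"MAPE on the held-out week: {mape:.2f}%")
```

On a well-behaved series like this toy one, the MAPE lands comfortably inside the 1-10% range Marin quotes for real load data.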
I'm playing around with Rapidminer's powerful text mining tools to dig through annual reports this evening and I'm making progress. Rapidminer can text mine all sorts of formats, but the operators are still a bit tough to use if you don't know what you're doing, like me! Still, I did pick up a thing or two at RCOMM and I'm putting that to good use. For tonight I decided to mine through the annual reports of $CSCO, $XOM, $INTC, $AMD, and $BP. Granted, these stocks are in three different industry groups, but I'm just poking around to see how they use buzz words like "sustainability" and "greenhouse." It's all rather fun and silly, but wait till I post about my Twitter mining experiment. LOL.
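For anyone curious what this kind of buzzword tally boils down to, here's a toy version in plain Python (RapidMiner's text mining operators do the tokenizing and counting for you; this just shows the idea, and the sample text is made up):

```python
# Toy buzzword count: tokenize a document and tally chosen terms.
import re
from collections import Counter

terms = ["sustainability", "greenhouse"]

def count_terms(text, terms):
    # Lowercase, split into words, then look up each term of interest
    words = re.findall(r"[a-z]+", text.lower())
    counts = Counter(words)
    return {t: counts[t] for t in terms}

report = "Our sustainability goals cut greenhouse emissions. Sustainability matters."
print(count_terms(report, terms))   # → {'sustainability': 2, 'greenhouse': 1}
```

Run this over each company's annual report text and you get the per-ticker comparison I was poking at.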
(Note: AMD never used it but BP did the most)
Well, the jet lag finally caught up to me, so I apologize for this late post on RCOMM. Thursday morning was kicked off by yours truly, and I was deeply humbled that the Rapid-I team asked me to be one of their two invited speakers at RCOMM. For my presentation I chose to talk about Forecasting Historical Volatility for Option Trading. The subject of this talk was the creation, or rather recreation, of a research paper that tried to predict the rise and fall of historical volatility and then utilize option volatility strategies to make a profit. I created the Rapidminer model from this research paper back in 2007 after an astute NMT reader, who is also a full-time option trader, contacted me about collaborating on such an endeavor. Long story short, we test traded the model through summer 2007 and it seemed to be working fine until Bear Stearns blew up. We both got busy with the financial mess that began unfolding before us and the collaboration was put on indefinite hiatus. When the Rapid-I team invited me to give a talk, I decided to talk about this experiment because it yielded some interesting results that perhaps the original researchers didn't think about. The first thing I did was recreate this model using the newer Time Series Forecasting plugin and include the volatility time period from 2005 to 2010 for the S&P500. In doing so, I got results that differed from what the research paper was predicting. I proceeded to drill further down into the details and retrain the model on two distinct time periods, 2005 to 2007 and 2007 to 2009, with the two showing very different results. With the benefit of time, I was able to determine that in times of orderly/low volatility, the historical volatility forecasting trend had greater than 60% accuracy. In times of high volatility it was only slightly better than a coin flip. It seems that this strategy for forecasting historical volatility does work, but only when the markets "behave."
Marin Matijas followed my talk with a similar time-series project, applying Short Term Load Forecasting using Support Vector Machines in RapidMiner 5.0. I was able to glean some interesting insight from his talk about using SVMs for my previous talk if I wanted to supercharge the option trading system, but that's for another time. Check back next week for a guest post from Marin where he details a bit more about using an RBF kernel in an SVM for his time series analysis. Following Marin's talk there was a short break during which we chatted, networked, and drank lots of coffee. We began the next set of talks about how data analysis in Rapidminer can be improved. Alexander Arimond presented on Distributed Pattern Recognition in Data Mining, then Marco Stolpe presented how stream mining can be integrated into Rapidminer (this was really amazing) in his Implementing Hierarchical Heavy Hitters in RapidMiner talk, and lastly for the morning we heard from Olaf Laber of Ingres Vectorwise about how the way databases use memory is about to be changed forever. Sounds like a lot for one day, doesn't it? Well, that was just the morning! We kicked off the afternoon with two workshops that included the unveiling of the "R" plugin by Sebastian Land and how to use RapidAnalytics by Simon Fischer. The rumor is that RapidAnalytics will be released as open source soon. If that's true, I'll be installing it on the NMT server and pulling down lots of daily financial data!
Closing out RCOMM 2010 were two amazing text mining presentations. I realized that we are on the cusp of something amazing in text mining when I listened intently to Timur Fayruzov's talk about using the RapidMiner Framework for Protein Interaction Extraction. Timur unveiled a working system that helps researchers, doctors, and other medical practitioners find protein interactions by text mining research papers. WOW. If that didn't blow me away, Felix Jungermann's talk about the creation of a new plugin for Information Extraction did. Under development is a new text mining related plugin that attempts to extract information, not data, from text. This plugin will be a quantum leap for text mining in Rapidminer for sure, and I'll be checking for it regularly on the Rapid-I site.
Incorporating and expanding on my first RCOMM 2010 post, I'm going to write about the various presentations that I found highly interesting and applicable to financial data mining. I walked in on Milan Vukicevic, who gave a talk about an upcoming plugin release called WhiBo. Unfortunately I walked in toward the end of the talk and only caught the Q&A part. Still, I was able to catch up with him on the last day of RCOMM to discuss his application. Ingo from Rapid-I describes it best: it's like a mini Rapidminer inside Rapidminer! Essentially WhiBo works within the Decision Tree modelers and helps the user fine tune the splitting parameters. It also enhances the modelers by detecting better splitting algorithms for your particular data set. Right after Milan's talk we had another great talk about Landmarking for Meta Learning by Sarah Abdelmessih. This talk was considered a continuation of the PaREN talk I missed earlier in the day about pattern recognition. I found Sarah's discussion on determining the right learner for your particular data set to be very useful. Why? Often my readers ask me, would an SVM learner be better for this data set? Or is k-NN better? Often it's a combination of learners, not just one, that gives you the better answer! The end result is the creation of a ranking system of learners for a given data set! I can't wait for the PaREN plugin to come out. Man, so many cool things were going on in those few short hours! We closed out the day with a workshop by Tobias Malbrecht, a Rapid-I team member, about using the Reporting operators in Rapidminer, and the now famous "Who wants to be a Data Miner" game show. I think the game show was the funniest thing I've seen in a long time! Contestants pitted themselves against veteran Rapid-I developers, with the surprise of the evening coming at the end. Contestant Matko Bošnjak, from Croatia, finished surprisingly strong after only "picking up" Rapidminer 3 months ago.
Not even the veteran Rapid-I guys could finish in the five minutes given, and Matko took home the prize. I believe he said that he learned how to use Rapidminer from watching my tutorials. Dinner followed at a nice local establishment only a few hundred meters from the University. We ate, drank, and chatted the night away. I met up with Milan, Matko, Ralf Klinkenberg, Ingo & Nadja Mierswa, Markus Hoffman, and Marin Matijas. Marin was presenting the next day about load forecasting electrical demand using SVMs. Although our talks were different in subject, we both applied the time series forecasting plugin for Rapidminer and had LOTS to talk about that night and the next, but I'll leave those adventures for tomorrow.
Yes, I'm about to board my plane back to the USA, so this post will have to be a bit short. I do owe you guys a long series of posts (and new videos) about my time in Dortmund with the Rapid-I team at RCOMM 2010, which will start after I survive the jet lag again! What I can say is that I was amazed by the papers presented by the many RCOMM 2010 speakers. All of them are leveraging the power of Rapidminer in ways that I never dreamed of! BUT! That's not the best part! The best part of this trip was meeting some amazingly intelligent and dynamic people from all over the world and making new friends. Ingo posted his Day 1 and Day 2 review of RCOMM 2010, but here's yours truly in action!
Wow, RCOMM 2010 is so much fun! After an exhausting flight to Frankfurt, I made it to RCOMM 2010 late Tuesday afternoon. I got to listen to two great talks so far and watch a hilarious game show, "Who wants to be a data miner." The Rapid-I team has really done a great job of hosting this event, and it's amazing to hear how people are using Rapidminer to solve complex tasks to make everyday life better. After the game show we all went down for dinner at the Krautergarten and had some great food, drink, and of course conversation. I've made lots of new friends and went to bed very late. Now the trick is to be awake, on time, and coherent for my presentation. lol.
(from the game show event)
(the after RCOMM 2010 dinner)
(ofc I have to enjoy a good German beer, or three)
I recently installed the new MyExperiment Community plugin for Rapidminer after it was first suggested on my forums by a poster/reader (hat tip to Ronmac). I'm glad I did, because this plugin enables me to access, upload, and download the Rapidminer workflows/processes that users share as part of the community. The plugin lets you do all of this from within Rapidminer, and there are currently about 50 processes, ranging from Image Mining to Text Mining, available for you to download! Really great stuff, and I wonder why I didn't install this plugin sooner!
My old college professor, Dr. Stephan Kudyba, explains what Data Mining is for newbies. His data mining class is the reason why I started Neural Market Trends.
I'm done with my draft presentation for RCOMM 2010 and hopefully will wrap it up over the weekend, and then it's off to Germany in two weeks. It's hard to believe, but RCOMM 2010 will soon be a reality. I'm looking forward to meeting the rest of the Rapid-I team and networking with some great minds. Once I return from Germany I will post three new video tutorials and some Rapid-I goodies for my loyal readers. One of those videos will be about parameter optimization for sure! Thanks for sticking with me through my hiatus, I'm very appreciative of your understanding!