I promised my readers over a month ago that I would post about YALE/RapidMiner’s LibSVM operator. Unfortunately life got in the way, so I’m resorting to a multi-part series just to get the information out to you. Bear with me over the next few days (or weeks) as I write about this exciting, powerful, and complicated learner.
First off, I use the LibSVM operator in YALE 3.4 only occasionally: sometimes to fool around with data, and rarely to build trading models. For time series data modeling I prefer the Gaussian Regression, Multilayer Perceptron, or IBk learners. You could use the LibSVM learner for time series data modeling, but I have found it more useful for analyzing non-time-series data.
What the heck is an SVM anyway? Wikipedia defines an SVM, or Support Vector Machine, as “a set of related supervised learning methods used for classification and regression.” Wikipedia goes on to give a decent overview of how SVMs work:
A special property of SVMs is that they simultaneously minimize the empirical classification error and maximize the geometric margin; hence they are also known as maximum margin classifiers.
Support vector machines map input vectors to a higher dimensional space where a maximal separating hyperplane is constructed. Two parallel hyperplanes are constructed on each side of the hyperplane that separates the data. The separating hyperplane is the hyperplane that maximizes the distance between the two parallel hyperplanes. An assumption is made that the larger the margin or distance between these parallel hyperplanes the better the generalisation error of the classifier will be.
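To make the geometry in that description concrete, here is a minimal sketch in plain Python. It assumes the usual canonical form, where the separating hyperplane is w·x + b = 0 and the two parallel hyperplanes are w·x + b = +1 and w·x + b = -1, so the margin between them is 2/||w||. The weights, bias, and points are made-up toy values, not the output of any trained model, and the sketch skips the mapping to a higher dimensional space entirely.

```python
import math

def decision_value(w, b, x):
    """Signed score w.x + b; zero means x lies on the separating hyperplane."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def margin_width(w):
    """Distance between the hyperplanes w.x + b = +1 and w.x + b = -1."""
    return 2.0 / math.sqrt(sum(wi * wi for wi in w))

def classify(w, b, x):
    """Assign +1 or -1 depending on which side of the hyperplane x falls."""
    return 1 if decision_value(w, b, x) >= 0 else -1

# Hypothetical toy values, chosen only for illustration.
w = [3.0, 4.0]
b = -5.0
points = [[3.0, 4.0], [0.0, 0.0], [1.0, 1.0]]

labels = [classify(w, b, p) for p in points]
width = margin_width(w)
print(labels, width)
```

Training an SVM amounts to searching for the w and b that make this width as large as possible while still keeping the training points on the correct side; the sketch only evaluates a given hyperplane and does no training.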
This particular operator, LibSVM, is built on the library created by two researchers, Chih-Chung Chang and Chih-Jen Lin of National Taiwan University. YALE/RapidMiner packages their LibSVM learner into the nice operator you see to the left of this paragraph.
What makes the LibSVM learner so appealing to us is that it can do 5 specialized tasks: it does 2 types of regression (epsilon-SVR, nu-SVR), 2 types of classification (C-SVC, nu-SVC), and something called one class SVM.
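As a quick reference, the five tasks can be laid out in a small lookup table. This is only a sketch: the integer codes are the ones I understand the LIBSVM command-line tools to use for their SVM type option, and the YALE/RapidMiner operator may label its parameter and values differently. The `types_for` helper is a hypothetical name used just for this illustration.

```python
# SVM types grouped by task. The integer keys follow the codes of
# LIBSVM's command-line SVM type option (an assumption on my part for
# the RapidMiner operator, which may expose these by name instead).
LIBSVM_TYPES = {
    0: ("C-SVC", "classification"),
    1: ("nu-SVC", "classification"),
    2: ("one-class SVM", "one-class"),
    3: ("epsilon-SVR", "regression"),
    4: ("nu-SVR", "regression"),
}

def types_for(task):
    """Return the SVM type names available for a task, in code order."""
    return [name for _, (name, kind) in sorted(LIBSVM_TYPES.items())
            if kind == task]

for task in ("classification", "regression", "one-class"):
    print(task, "->", types_for(task))
```

The practical point is that the kind of label you have decides which rows apply: a nominal label calls for one of the classification types, a numeric label for one of the regression types.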
I’ll go into greater detail about each type of specialized task in part II of my LibSVM tutorial. If you want to learn more before then, visit Chang and Lin’s website for more details.
Tom,
The literature appears to suggest that SVMs do as well as or better than your preferred learners above for time series data modeling…
Would you please share your thoughts and elaborate about your preferences?
I would love to see part II of your LibSVM tutorial ;-)
Cordially,
-Digital Dude-
“We keep moving forward, opening new doors and doing new things, because we’re curious.” -Walt Disney-
DD: I forgot all about that article, I should write part 2.
Hi,
I came across this post when I tried to find information on RapidMiner SVM classification visualization. I am new to RapidMiner and I was wondering whether we could see the ‘separating hyperplanes’ and ‘support vectors’ visualized in RapidMiner for an SVM classification of a given data set.
Regards,
Seyhan
seyhan: I don’t know the answer to that one. Have you checked the RapidMiner forums?
Thanks Tom. I gave up on RapidMiner and am currently working with R and GGobi. Hopefully, I’ll be able to generate something. There is not much effective data mining visualization available for SVMs. Thanks anyway.
Hi. I have a question about SVM in RapidMiner. I don’t know why SVM couldn’t create an accuracy matrix the way k-NN does. It just shows the root mean squared error. What should I do?
I’m working on text categorization and I have 3 columns: Title, Body, and Label. Body and Title are text, and Label is numeric.
I used:
read database -> process document -> select attribute -> set role (set label column as label) -> x-validation
@Hamid: I’m not sure what you’re doing without seeing your datafile and process. I suggest you visit the Rapid-I forums; they might already have a solution for you.