Nearly 100% of the models I build with the YALE/Rapidminer data modeling suite use something called “supervised learning”. Simply put, supervised learning is nothing more than the process of creating a function from your training data that minimizes the error between the actual output variable and the predicted output.
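To make “minimum error” concrete, here is a minimal Python sketch of two common ways to score predicted outputs against actual outputs. The function names are hypothetical, and this is not necessarily how YALE/Rapidminer computes its error internally. Mean squared error is a typical training objective, and the percent error mirrors the “% error” wording used in this post.

```python
from typing import Sequence


def mean_squared_error(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Average of the squared differences between actual and predicted outputs."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def mean_percent_error(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Average absolute error as a percentage of the actual value.

    Every actual value must be non-zero, since it is used as the divisor.
    """
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    return 100.0 * sum(abs(a - p) / abs(a) for a, p in zip(actual, predicted)) / len(actual)


if __name__ == "__main__":
    actual = [2.0, 4.0, 5.0]
    predicted = [2.5, 3.5, 5.0]
    print(mean_squared_error(actual, predicted))  # (0.25 + 0.25 + 0.0) / 3
    print(mean_percent_error(actual, predicted))  # 12.5
```

A learner “seeks the minimum error” by changing its function until numbers like these stop going down.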
For example, suppose you have a data set (which we will call the training data) that contains your input and output variables, and you want to build a model that explains the output variable in terms of the inputs. Imagine that the output variable is Plankton Growth (PG) and your inputs are Sea Temperature (ST), Sunlight Intensity (SI), and Whale Population (WP).
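A training data set like this can be pictured as rows of named values, where one column plays the output (label) role and the others are inputs. The sketch below is a minimal Python illustration under that assumption. The helper name `split_inputs_and_label` is hypothetical, and every number in `training_data` is made up purely to show the layout; none of them are real measurements.

```python
from typing import Dict, List, Sequence, Tuple

Row = Dict[str, float]


def split_inputs_and_label(
    rows: Sequence[Row], label: str, inputs: Sequence[str]
) -> Tuple[List[List[float]], List[float]]:
    """Separate each row into an input vector (in the given order) and its label value."""
    X = [[row[name] for name in inputs] for row in rows]
    y = [row[label] for row in rows]
    return X, y


# Made-up rows used only to illustrate the layout; these are not real measurements.
training_data = [
    {"ST": 14.0, "SI": 0.8, "WP": 120.0, "PG": 3.1},
    {"ST": 17.5, "SI": 0.6, "WP": 95.0, "PG": 3.9},
    {"ST": 11.2, "SI": 0.9, "WP": 140.0, "PG": 2.4},
]

# PG is the output (label); ST, SI, and WP are the inputs.
X, y = split_inputs_and_label(training_data, label="PG", inputs=["ST", "SI", "WP"])
```

Keeping the input order fixed matters: every row’s input vector must line up with the same weights in the model.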
When you load this data set into a YALE/Rapidminer experiment, you tell it to label PG as your output and ST, SI, and WP as your inputs. Assume that you’re using a standard Multilayer Perceptron (MLP) learner for this experiment. When you click “run”, the MLP learner takes your training data, builds a function with weights on the inputs ST, SI, and WP, calculates the predicted output (which we will now call pPG), and compares it to the actual output (PG).
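The “build a function with weights and calculate pPG” step is the MLP’s forward pass, which can be sketched in a few lines of Python. This assumes one hidden layer of sigmoid units feeding a linear output unit, a common MLP layout but not necessarily the exact configuration YALE/Rapidminer uses. The name `mlp_predict`, all weight values, and the input row are hypothetical.

```python
import math
from typing import Sequence


def sigmoid(z: float) -> float:
    """Logistic activation used by the hidden units."""
    return 1.0 / (1.0 + math.exp(-z))


def mlp_predict(
    x: Sequence[float],
    hidden_w: Sequence[Sequence[float]],
    hidden_b: Sequence[float],
    out_w: Sequence[float],
    out_b: float,
) -> float:
    """Forward pass: weighted sums of the inputs -> sigmoid hidden units -> weighted sum for the output."""
    hidden = [
        sigmoid(sum(w * xi for w, xi in zip(weights, x)) + b)
        for weights, b in zip(hidden_w, hidden_b)
    ]
    return sum(w * h for w, h in zip(out_w, hidden)) + out_b


# Hypothetical weights for 3 inputs (ST, SI, WP, already scaled) and 2 hidden units.
hidden_w = [[0.4, -0.2, 0.1], [-0.3, 0.5, 0.2]]
hidden_b = [0.0, 0.1]
out_w = [1.2, -0.7]
out_b = 0.05

x = [0.6, 0.8, 0.3]  # one row of scaled inputs (made-up values)
pg = 0.9             # the actual output PG for that row (made-up value)
ppg = mlp_predict(x, hidden_w, hidden_b, out_w, out_b)
squared_error = (ppg - pg) ** 2  # how far pPG is from PG for this row
```

At the start of training the weights are essentially arbitrary, which is why the first pPG values are usually far from PG.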
The first few times the learner does this, the predicted output (pPG) will be way off from the actual output (PG), and the learner will register a % error between the two. The MLP learner wants to reduce this error as much as possible, so it adjusts the weights of its function slightly (in a standard MLP this is done by backpropagation, which nudges each weight in the direction that lowers the error) and then recalculates the predicted output (pPG). It’s not hard to guess what the learner does next: it goes through the same cycle, comparing pPG to PG, calculating another % error between them, and then beginning the process all over again.
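The “adjust slightly, recalculate, compare again” cycle can be illustrated with gradient descent, the idea behind backpropagation. To keep the sketch short it uses a single linear unit instead of a full MLP; a real MLP applies the same kind of update to every weight in every layer. The function names, the learning rate of 0.05, and the toy data are illustrative assumptions.

```python
from typing import List, Sequence, Tuple


def predict(w: Sequence[float], b: float, x: Sequence[float]) -> float:
    """Linear unit: weighted sum of the inputs plus a bias."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def mse(w: Sequence[float], b: float, X: Sequence[Sequence[float]], y: Sequence[float]) -> float:
    """Mean squared error between predicted and actual outputs."""
    return sum((predict(w, b, x) - t) ** 2 for x, t in zip(X, y)) / len(X)


def gradient_step(
    w: Sequence[float], b: float, X: Sequence[Sequence[float]], y: Sequence[float],
    learning_rate: float = 0.05,
) -> Tuple[List[float], float]:
    """One cycle: predict, compare to the actual outputs, and move each weight against the error gradient."""
    n = len(X)
    grad_w = [0.0] * len(w)
    grad_b = 0.0
    for x, t in zip(X, y):
        err = predict(w, b, x) - t  # predicted minus actual
        for j, xj in enumerate(x):
            grad_w[j] += 2.0 * err * xj / n  # d(MSE)/d(w_j)
        grad_b += 2.0 * err / n              # d(MSE)/d(b)
    new_w = [wj - learning_rate * gj for wj, gj in zip(w, grad_w)]
    return new_w, b - learning_rate * grad_b


# Toy data (made up): the output is exactly twice the single input.
X = [[1.0], [2.0], [3.0]]
y = [2.0, 4.0, 6.0]
w, b = [0.0], 0.0
history = [mse(w, b, X, y)]
for _ in range(5):
    w, b = gradient_step(w, b, X, y)
    history.append(mse(w, b, X, y))  # error after each cycle
```

With a small enough learning rate each cycle lowers the error; too large a rate can make the error bounce around or grow instead.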
The MLP learner stops learning once the error between pPG and PG stops improving (or once it has run a set number of training cycles), and then it saves the final function to your model (if you have the ModelWriter operator in your experiment). This type of feedback loop is a great way to build a model, but you must remember that your model will only be as good as your training data. Data modelers often throw all kinds of inputs into their training data, only to realize later that, forgive the generalization, they’re modeling apples to oranges.
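The stopping rule can be sketched as a loop that keeps cycling until the error stops improving by more than a small tolerance, with a cap on the number of cycles. This is a generic Python sketch: `train_until_converged`, `tolerance`, and `max_cycles` are hypothetical names, not YALE/Rapidminer parameters, and the `step` and `error` callables stand in for whatever update rule and error measure the learner actually uses.

```python
from typing import Callable, Tuple, TypeVar

P = TypeVar("P")


def train_until_converged(
    params: P,
    step: Callable[[P], P],
    error: Callable[[P], float],
    tolerance: float = 1e-6,
    max_cycles: int = 10_000,
) -> Tuple[P, float, int]:
    """Repeat the adjust/recalculate/compare cycle until the error stops improving.

    Returns the final parameters, their error, and the cycle at which training stopped.
    """
    best_error = error(params)
    for cycle in range(1, max_cycles + 1):
        candidate = step(params)
        candidate_error = error(candidate)
        if best_error - candidate_error < tolerance:
            # No meaningful improvement: stop and keep the current parameters.
            return params, best_error, cycle
        params, best_error = candidate, candidate_error
    return params, best_error, max_cycles


if __name__ == "__main__":
    # Toy problem: the "model" is a single number q and the best value is 3.
    q, err, cycles = train_until_converged(
        0.0,
        step=lambda q: q - 0.1 * 2.0 * (q - 3.0),  # gradient step on (q - 3)^2
        error=lambda q: (q - 3.0) ** 2,
    )
    print(q, err, cycles)
```

Stopping when improvement stalls finds a point where the error no longer drops; for a neural network that can be a local minimum rather than the lowest possible error, and it says nothing about whether the training data itself was sensible.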
A post on how to make sure you have good, robust data to model will follow.