Trimming Outliers in RapidMiner

I was inspired to write a short post about trimming outliers in RapidMiner after a comment from dc yesterday. Although I’ve never used this particular set of data pre-processing operators (I always inspect my data visually), I find them interesting and worth a look.

If you right-click and select “New Operator”, you’ll find a list of top-level operator categories. Choose the “Pre-Processing” category, then “Data”, and then “Outlier.”

Once in the outlier directory you’ll find three operators: densitybasedoutlierdetection, distancebasedoutlierdetection, LOFoutlierdetection.

Here’s what each of them does in brief:

  • The densitybasedoutlierdetection operator scans your data set and looks for outliers based on density, using a distance function you select (squared distance, Euclidean distance, angle);
  • The distancebasedoutlierdetection operator uses a k-nearest neighbor algorithm to find outliers; and
  • The LOFoutlierdetection operator (LOF stands for local outlier factor) uses lower and upper bounds on the minimal number of neighboring points, together with a density function, to find outliers.
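To make the k-nearest neighbor idea a little more concrete, here is a minimal Python sketch of a distance-based outlier check. It is not RapidMiner’s implementation. The function names `knn_outlier_scores` and `flag_top_outliers`, the choice of Euclidean distance, and the toy data are all my own assumptions for illustration. Each record is scored by the distance to its k-th nearest neighbor, and the records with the largest scores are flagged.

```python
import math

def knn_outlier_scores(points, k):
    """Score each point by the Euclidean distance to its k-th nearest neighbor."""
    scores = []
    for i, p in enumerate(points):
        # Distances from p to every other point, smallest first.
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        scores.append(dists[k - 1])
    return scores

def flag_top_outliers(points, k, n_outliers):
    """Return one boolean per point; True marks the n_outliers highest scores."""
    scores = knn_outlier_scores(points, k)
    ranked = sorted(range(len(points)), key=lambda i: scores[i], reverse=True)
    flagged = set(ranked[:n_outliers])
    return [i in flagged for i in range(len(points))]

# Toy data (hypothetical): four records close together and one far away.
data = [(1.0, 1.0), (1.1, 0.9), (0.9, 1.2), (1.2, 1.1), (8.0, 8.0)]
flags = flag_top_outliers(data, k=2, n_outliers=1)
print(flags)
```

On this toy data only the last record, the one sitting far from the cluster, should come back flagged.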

In an experiment, these operators will automatically “snip” the outlier data records and then build your neural net model from the remaining data. Check out RapidMiner’s “Pre-Processing” category for more great data “cleaning” goodies!
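The “snip” step can be pictured as a simple filter that runs before training. The Python sketch below is only an illustration under my own assumptions, not what RapidMiner does internally. It uses a DB(p, D)-style rule, where a record counts as an outlier if at least a fraction p of the other records lie farther than distance D from it. The names `is_db_outlier` and `trim_outliers`, the default thresholds, and the toy records are hypothetical.

```python
import math

def is_db_outlier(index, points, p, d):
    """True if at least a fraction p of the other points lie farther than d."""
    others = [q for j, q in enumerate(points) if j != index]
    far = sum(1 for q in others if math.dist(points[index], q) > d)
    return far >= p * len(others)

def trim_outliers(points, p=0.9, d=3.0):
    """Split points into (kept, removed) using the DB(p, D)-style rule."""
    kept, removed = [], []
    for i, pt in enumerate(points):
        (removed if is_db_outlier(i, points, p, d) else kept).append(pt)
    return kept, removed

# Toy records (hypothetical): the last one sits far from the rest.
records = [(1.0, 1.0), (1.1, 0.9), (0.9, 1.2), (1.2, 1.1), (8.0, 8.0)]
kept, removed = trim_outliers(records)
print(kept)     # records that would go on to model training
print(removed)  # records that were snipped
```

Only the `kept` list would then be handed to whatever model you train next. The thresholds need tuning for real data, so it is worth looking at what lands in `removed` before trusting the result.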

About Tom

Blog owner of Neural Market Trends