Trimming Outliers In RapidMiner

Posted on Tue 12 February 2008 in misc • 1 min read

  • Data Analytics
  • Neural Nets
  • Tutorials

  Tags:
  • RapidMiner
  • Tutorials

    I was inspired to write a short post about trimming outliers in RapidMiner after a comment from dc yesterday. Although I've never used this particular set of data pre-processing operators (I always inspect my data visually), I find them interesting and worth a look.

    If you right-click and select "New Operator", you'll find a number of top-level operator categories. Choose the "Pre-Processing" category, then "Data", and then "Outlier."

    Once in the outlier directory, you'll find three operators: densitybasedoutlierdetection, distancebasedoutlierdetection, and LOFoutlierdetection.

    Here's what each of them does, in brief:

    • The densitybasedoutlierdetection operator scans your data set and looks for outliers based on a distance function (squared distance, Euclidean distance, angle);
    • The distancebasedoutlierdetection operator uses a k-nearest neighbor algorithm to find outliers; and
    • The LOFoutlierdetection operator uses minimal upper and lower bounds (with a density function) to find outliers.
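To make the k-nearest neighbor idea a bit more concrete, here is a minimal Python sketch of one common way to do distance-based outlier detection: score each record by the Euclidean distance to its k-th nearest neighbor and flag the records with the largest scores. This is only an illustration under my own assumptions, not RapidMiner's actual implementation; the function name `knn_outlier_flags` and its parameters `k` and `n_outliers` are made up for this example.

```python
import math


def knn_outlier_flags(points, k=2, n_outliers=1):
    """Flag the n_outliers points that lie farthest from their k-th nearest neighbor.

    Hypothetical helper for illustration only; it is not a RapidMiner API.
    """
    if not 1 <= k < len(points):
        raise ValueError("k must be at least 1 and smaller than the number of points")

    # Score each point by the Euclidean distance to its k-th nearest neighbor.
    scores = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        scores.append(dists[k - 1])

    # The points with the largest scores are treated as outliers.
    ranked = sorted(range(len(points)), key=lambda i: scores[i], reverse=True)
    flagged = set(ranked[:n_outliers])
    return [i in flagged for i in range(len(points))]


# Four points in a tight cluster and one far away from it.
points = [(1.0, 1.0), (1.2, 0.9), (0.9, 1.1), (1.1, 1.0), (8.0, 8.0)]
flags = knn_outlier_flags(points, k=2, n_outliers=1)
print(flags)
```

In this toy data set only the last point, which sits far from the cluster, is flagged. A larger `k` makes the score less sensitive to small groups of outliers that sit close to each other.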

    When placed in an experiment, these operators will automatically "snip" the outlier records out of your data, so that your neural net model is built from the remaining data. Check out RapidMiner's "Pre-Processing" category for more great data "cleaning" goodies!
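For readers who want to see the "snip" step outside of RapidMiner, here is a rough Python sketch. It assumes each record already carries a boolean outlier flag from some detection step; the function name `trim_outliers` and the sample values are hypothetical and only serve the illustration.

```python
def trim_outliers(records, flags):
    """Split records into (kept, removed) according to a parallel list of outlier flags.

    Hypothetical helper for illustration only; it is not a RapidMiner API.
    """
    if len(records) != len(flags):
        raise ValueError("records and flags must have the same length")
    kept = [r for r, is_outlier in zip(records, flags) if not is_outlier]
    removed = [r for r, is_outlier in zip(records, flags) if is_outlier]
    return kept, removed


# Made-up sample values; the flags would normally come from an outlier detection step.
records = [10.1, 9.8, 10.3, 55.0, 10.0]
flags = [False, False, False, True, False]

kept, removed = trim_outliers(records, flags)
print(kept)     # the records that would be passed on to model training
print(removed)  # the records that were snipped out
```

Keeping the removed records around, rather than discarding them silently, makes it easy to eyeball what was trimmed before you train the model.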