#AI · Mean Reversion RapidMiner Trading Tutorials · 2018-03-19 · Thomas Ott

~7 min read

Lately I've been think about becoming more active in trading again. I was reviewing some strategies and decided to recreate a mean reversion trading process in RapidMiner. I found a mean reversion trading stategy that uses Python here and just recreated it in RapidMiner.

The Process

The process is quite simple. You do the following:

  1. Load in stock quote data via CSV;
  2. Calculate daily returns;
  3. Calculate a 20 day moving average;
  4. Calculate a rolling 90 day standard deviation;
  5. Generate Trading Criteria per the article;
  6. Wrap it all together and look at the Buy vs Hold and Buy Signals.

Mind you, this doesn't include commission costs and slippage. I suspect that once I add that in, the Buy and Hold strategy will be the best.

<?xml version="1.0" encoding="UTF-8"?><process version="8.1.001">
  <context>
    <input/>
    <output/>
    <macros/>
  </context>
  <operator activated="true" class="process" compatibility="8.1.001" expanded="true" name="Process">
    <parameter key="encoding" value="SYSTEM"/>
    <process expanded="true">
      <operator activated="true" class="subprocess" compatibility="8.1.001" expanded="true" height="82" name="Load Data" width="90" x="45" y="34">
        <process expanded="true">
          <operator activated="true" class="read_csv" compatibility="8.1.000" expanded="true" height="68" name="Read Downloaded S&amp;P500" width="90" x="45" y="34">
            <parameter key="csv_file" value="C:\Users\TomOtt\Downloads\INTC.csv"/>
            <parameter key="column_separators" value=","/>
            <parameter key="first_row_as_names" value="false"/>
            <list key="annotations">
              <parameter key="0" value="Name"/>
            </list>
            <parameter key="encoding" value="windows-1252"/>
            <list key="data_set_meta_data_information">
              <parameter key="0" value="Date.true.polynominal.attribute"/>
              <parameter key="1" value="Open.true.real.attribute"/>
              <parameter key="2" value="High.true.real.attribute"/>
              <parameter key="3" value="Low.true.real.attribute"/>
              <parameter key="4" value="Close.true.real.attribute"/>
              <parameter key="5" value="Adj Close.true.real.attribute"/>
              <parameter key="6" value="Volume.true.real.attribute"/>
            </list>
            <description align="center" color="transparent" colored="false" width="126">Due to Yahoo changes, must download CSV file manually for now!</description>
          </operator>
          <operator activated="true" class="series:lag_series" compatibility="7.4.000" expanded="true" height="82" name="Lag Series" width="90" x="179" y="34">
            <list key="attributes">
              <parameter key="Close" value="1"/>
            </list>
          </operator>
          <operator activated="true" class="generate_attributes" compatibility="8.1.001" expanded="true" height="82" name="Returns" width="90" x="313" y="34">
            <list key="function_descriptions">
              <parameter key="Returns" value="(Close-Open)/Open"/>
            </list>
          </operator>
          <operator activated="true" class="select_attributes" compatibility="8.1.001" expanded="true" height="82" name="Select Attributes for ETL" width="90" x="447" y="34">
            <parameter key="attribute_filter_type" value="single"/>
            <parameter key="attribute" value="^GSPC_CLOSE-1"/>
            <parameter key="invert_selection" value="true"/>
          </operator>
          <operator activated="true" class="series:moving_average" compatibility="7.4.000" expanded="true" height="82" name="MA 20 DAY" width="90" x="581" y="34">
            <parameter key="attribute_name" value="Close"/>
            <parameter key="window_width" value="20"/>
          </operator>
          <operator activated="true" class="series:windowing" compatibility="7.4.000" expanded="true" height="82" name="Window for 90 Day STDEV" width="90" x="715" y="34">
            <parameter key="window_size" value="90"/>
            <parameter key="label_attribute" value="Close"/>
          </operator>
          <operator activated="true" class="generate_aggregation" compatibility="8.1.001" expanded="true" height="82" name="Calc STD Dev for 90 day window" width="90" x="849" y="34">
            <parameter key="attribute_name" value="StDev"/>
            <parameter key="attribute_filter_type" value="regular_expression"/>
            <parameter key="attributes" value="Log Returns-0|Log Returns-1|Log Returns-2|Log Returns-3|Log Returns-4"/>
            <parameter key="regular_expression" value="Close\-[0-9]"/>
            <parameter key="aggregation_function" value="standard_deviation"/>
          </operator>
          <operator activated="true" class="select_attributes" compatibility="8.1.001" expanded="true" height="82" name="Select Attributes" width="90" x="983" y="34">
            <parameter key="attribute_filter_type" value="subset"/>
            <parameter key="attributes" value="Close-0|Date-0|High-0|Low-0|Open-0|StDev|Volume-0|average(Close)-0|Log Returns-0|Returns-0"/>
          </operator>
          <operator activated="true" class="rename" compatibility="8.1.001" expanded="true" height="82" name="Rename a bunch of stuff" width="90" x="1117" y="34">
            <parameter key="old_name" value="Date-0"/>
            <parameter key="new_name" value="Date"/>
            <list key="rename_additional_attributes">
              <parameter key="Close-0" value="Close"/>
              <parameter key="High-0" value="High"/>
              <parameter key="Returns-0" value="Returns"/>
              <parameter key="Volume-0" value="Volume"/>
              <parameter key="Low-0" value="Low"/>
              <parameter key="Open-0" value="Open"/>
              <parameter key="average(Close)-0" value="MA20"/>
              <parameter key="StDev" value="90daySTDEV"/>
            </list>
          </operator>
          <operator activated="true" class="nominal_to_date" compatibility="8.1.001" expanded="true" height="82" name="Convert dates" width="90" x="1251" y="34">
            <parameter key="attribute_name" value="Date"/>
            <parameter key="date_format" value="yyyy-MM-dd"/>
          </operator>
          <operator activated="true" class="series:lag_series" compatibility="7.4.000" expanded="true" height="82" name="Lag Low" width="90" x="1385" y="34">
            <list key="attributes">
              <parameter key="Low" value="1"/>
            </list>
          </operator>
          <operator activated="true" class="replace_missing_values" compatibility="8.1.001" expanded="true" height="103" name="Replace Missing Values" width="90" x="1519" y="34">
            <parameter key="attribute_filter_type" value="value_type"/>
            <parameter key="value_type" value="numeric"/>
            <parameter key="default" value="zero"/>
            <list key="columns"/>
          </operator>
          <connect from_op="Read Downloaded S&amp;P500" from_port="output" to_op="Lag Series" to_port="example set input"/>
          <connect from_op="Lag Series" from_port="example set output" to_op="Returns" to_port="example set input"/>
          <connect from_op="Returns" from_port="example set output" to_op="Select Attributes for ETL" to_port="example set input"/>
          <connect from_op="Select Attributes for ETL" from_port="example set output" to_op="MA 20 DAY" to_port="example set input"/>
          <connect from_op="MA 20 DAY" from_port="example set output" to_op="Window for 90 Day STDEV" to_port="example set input"/>
          <connect from_op="Window for 90 Day STDEV" from_port="example set output" to_op="Calc STD Dev for 90 day window" to_port="example set input"/>
          <connect from_op="Calc STD Dev for 90 day window" from_port="example set output" to_op="Select Attributes" to_port="example set input"/>
          <connect from_op="Select Attributes" from_port="example set output" to_op="Rename a bunch of stuff" to_port="example set input"/>
          <connect from_op="Rename a bunch of stuff" from_port="example set output" to_op="Convert dates" to_port="example set input"/>
          <connect from_op="Convert dates" from_port="example set output" to_op="Lag Low" to_port="example set input"/>
          <connect from_op="Lag Low" from_port="example set output" to_op="Replace Missing Values" to_port="example set input"/>
          <connect from_op="Replace Missing Values" from_port="example set output" to_port="out 1"/>
          <portSpacing port="source_in 1" spacing="0"/>
          <portSpacing port="sink_out 1" spacing="0"/>
          <portSpacing port="sink_out 2" spacing="0"/>
          <description align="center" color="yellow" colored="false" height="174" resized="false" width="180" x="24" y="231">The goal is to pull this data from a database in the future.&lt;br/&gt;&lt;br/&gt;The database will be populated with stock data from the Nasdaq automatically overnight using EOD</description>
        </process>
        <description align="center" color="transparent" colored="false" width="126">Load CSV data</description>
      </operator>
      <operator activated="true" class="generate_attributes" compatibility="8.1.001" expanded="true" height="82" name="Generate Trading Signal" width="90" x="179" y="34">
        <list key="function_descriptions">
          <parameter key="Criteria1" value="if((Open&lt;[Low-1]) &amp;&amp; (Open-[Low-1])&lt;-[90daySTDEV],1,0)"/>
          <parameter key="Criteria2" value="if(Open&gt;MA20,1,0)"/>
          <parameter key="Signal" value="if(([Criteria1]+[Criteria2])==2,1,0)"/>
        </list>
      </operator>
      <operator activated="true" class="select_attributes" compatibility="8.1.001" expanded="true" height="82" name="Select Final Col" width="90" x="313" y="34">
        <parameter key="attribute_filter_type" value="subset"/>
        <parameter key="attributes" value="Close|Date|Signal|Returns"/>
      </operator>
      <operator activated="true" class="multiply" compatibility="8.1.001" expanded="true" height="82" name="Multiply (2)" width="90" x="447" y="34"/>
      <operator activated="true" class="subprocess" compatibility="8.1.001" expanded="true" height="103" name="Cum Sum of Returns" width="90" x="581" y="34">
        <process expanded="true">
          <operator activated="true" class="multiply" compatibility="8.1.001" expanded="true" height="103" name="Multiply (3)" width="90" x="112" y="34"/>
          <operator activated="true" class="series:integrate_series" compatibility="7.4.000" expanded="true" height="82" name="All Signals" width="90" x="447" y="289">
            <parameter key="attribute_name" value="Returns"/>
          </operator>
          <operator activated="true" class="filter_examples" compatibility="8.1.001" expanded="true" height="103" name="Filter Examples (2)" width="90" x="313" y="34">
            <list key="filters_list">
              <parameter key="filters_entry_key" value="Signal.eq.1"/>
            </list>
          </operator>
          <operator activated="true" class="series:integrate_series" compatibility="7.4.000" expanded="true" height="82" name="Buy Signal Only" width="90" x="447" y="34">
            <parameter key="attribute_name" value="Returns"/>
          </operator>
          <operator activated="true" class="rename" compatibility="8.1.001" expanded="true" height="82" name="Buy Returns" width="90" x="581" y="34">
            <parameter key="old_name" value="cumulative(Returns)"/>
            <parameter key="new_name" value="Buy Signals Returns"/>
            <list key="rename_additional_attributes"/>
          </operator>
          <operator activated="true" class="rename" compatibility="8.1.001" expanded="true" height="82" name="Buy and Hold Returns" width="90" x="581" y="289">
            <parameter key="old_name" value="cumulative(Returns)"/>
            <parameter key="new_name" value="Buy and Hold"/>
            <list key="rename_additional_attributes"/>
          </operator>
          <connect from_port="in 1" to_op="Multiply (3)" to_port="input"/>
          <connect from_op="Multiply (3)" from_port="output 1" to_op="Filter Examples (2)" to_port="example set input"/>
          <connect from_op="Multiply (3)" from_port="output 2" to_op="All Signals" to_port="example set input"/>
          <connect from_op="All Signals" from_port="example set output" to_op="Buy and Hold Returns" to_port="example set input"/>
          <connect from_op="Filter Examples (2)" from_port="example set output" to_op="Buy Signal Only" to_port="example set input"/>
          <connect from_op="Buy Signal Only" from_port="example set output" to_op="Buy Returns" to_port="example set input"/>
          <connect from_op="Buy Returns" from_port="example set output" to_port="out 1"/>
          <connect from_op="Buy and Hold Returns" from_port="example set output" to_port="out 2"/>
          <portSpacing port="source_in 1" spacing="0"/>
          <portSpacing port="source_in 2" spacing="0"/>
          <portSpacing port="sink_out 1" spacing="0"/>
          <portSpacing port="sink_out 2" spacing="0"/>
          <portSpacing port="sink_out 3" spacing="0"/>
          <description align="center" color="yellow" colored="false" height="105" resized="false" width="180" x="337" y="147">Set the previous value to the missing value</description>
        </process>
      </operator>
      <operator activated="false" class="subprocess" compatibility="8.1.001" expanded="true" height="103" name="Subprocess" width="90" x="581" y="289">
        <process expanded="true">
          <operator activated="true" class="series:windowing" compatibility="7.4.000" expanded="true" height="82" name="Window for Training" width="90" x="45" y="187">
            <parameter key="window_size" value="1"/>
            <parameter key="create_label" value="true"/>
            <parameter key="label_attribute" value="Signal"/>
            <parameter key="add_incomplete_windows" value="true"/>
          </operator>
          <operator activated="true" class="concurrency:optimize_parameters_grid" compatibility="8.1.001" expanded="true" height="124" name="Optimize Parameters (Grid)" width="90" x="179" y="34">
            <list key="parameters">
              <parameter key="Backtesting.test_window_width" value="[2;5;4;linear]"/>
              <parameter key="Backtesting.training_window_step_size" value="[1;5;4;linear]"/>
              <parameter key="Backtesting.training_window_width" value="[2;5;4;linear]"/>
              <parameter key="SVM for HV Calc.kernel_gamma" value="[0.01;1000;5;logarithmic]"/>
              <parameter key="SVM for HV Calc.C" value="[0;10000;2;linear]"/>
            </list>
            <process expanded="true">
              <operator activated="true" class="series:sliding_window_validation" compatibility="7.4.000" expanded="true" height="124" name="Backtesting" width="90" x="179" y="34">
                <parameter key="training_window_width" value="6"/>
                <parameter key="test_window_width" value="6"/>
                <parameter key="cumulative_training" value="true"/>
                <process expanded="true">
                  <operator activated="true" class="support_vector_machine" compatibility="8.1.001" expanded="true" height="124" name="SVM for HV Calc" width="90" x="179" y="34">
                    <parameter key="kernel_type" value="radial"/>
                    <parameter key="kernel_gamma" value="0.10000000000000002"/>
                    <parameter key="C" value="200.0"/>
                  </operator>
                  <connect from_port="training" to_op="SVM for HV Calc" to_port="training set"/>
                  <connect from_op="SVM for HV Calc" from_port="model" to_port="model"/>
                  <portSpacing port="source_training" spacing="0"/>
                  <portSpacing port="sink_model" spacing="0"/>
                  <portSpacing port="sink_through 1" spacing="0"/>
                </process>
                <process expanded="true">
                  <operator activated="true" class="apply_model" compatibility="7.1.001" expanded="true" height="82" name="Apply Model In Testing" width="90" x="45" y="34">
                    <list key="application_parameters"/>
                  </operator>
                  <operator activated="true" class="series:forecasting_performance" compatibility="7.4.000" expanded="true" height="82" name="Forecast Performance" width="90" x="246" y="34">
                    <parameter key="horizon" value="1"/>
                    <parameter key="main_criterion" value="prediction_trend_accuracy"/>
                  </operator>
                  <connect from_port="model" to_op="Apply Model In Testing" to_port="model"/>
                  <connect from_port="test set" to_op="Apply Model In Testing" to_port="unlabelled data"/>
                  <connect from_op="Apply Model In Testing" from_port="labelled data" to_op="Forecast Performance" to_port="labelled data"/>
                  <connect from_op="Forecast Performance" from_port="performance" to_port="averagable 1"/>
                  <portSpacing port="source_model" spacing="0"/>
                  <portSpacing port="source_test set" spacing="0"/>
                  <portSpacing port="source_through 1" spacing="0"/>
                  <portSpacing port="sink_averagable 1" spacing="0"/>
                  <portSpacing port="sink_averagable 2" spacing="0"/>
                </process>
              </operator>
              <connect from_port="input 1" to_op="Backtesting" to_port="training"/>
              <connect from_op="Backtesting" from_port="model" to_port="model"/>
              <connect from_op="Backtesting" from_port="averagable 1" to_port="performance"/>
              <portSpacing port="source_input 1" spacing="0"/>
              <portSpacing port="source_input 2" spacing="0"/>
              <portSpacing port="sink_performance" spacing="0"/>
              <portSpacing port="sink_model" spacing="0"/>
              <portSpacing port="sink_output 1" spacing="0"/>
            </process>
          </operator>
          <operator activated="true" class="series:windowing" compatibility="7.4.000" expanded="true" height="82" name="Window for Prediction" width="90" x="179" y="289">
            <parameter key="window_size" value="1"/>
          </operator>
          <operator activated="true" class="apply_model" compatibility="7.1.001" expanded="true" height="82" name="Apply SVM model" width="90" x="380" y="136">
            <list key="application_parameters"/>
          </operator>
          <connect from_port="in 1" to_op="Window for Training" to_port="example set input"/>
          <connect from_op="Window for Training" from_port="example set output" to_op="Optimize Parameters (Grid)" to_port="input 1"/>
          <connect from_op="Window for Training" from_port="original" to_op="Window for Prediction" to_port="example set input"/>
          <connect from_op="Optimize Parameters (Grid)" from_port="performance" to_port="out 1"/>
          <connect from_op="Optimize Parameters (Grid)" from_port="model" to_op="Apply SVM model" to_port="model"/>
          <connect from_op="Window for Prediction" from_port="example set output" to_op="Apply SVM model" to_port="unlabelled data"/>
          <connect from_op="Apply SVM model" from_port="labelled data" to_port="out 2"/>
          <portSpacing port="source_in 1" spacing="0"/>
          <portSpacing port="source_in 2" spacing="0"/>
          <portSpacing port="sink_out 1" spacing="0"/>
          <portSpacing port="sink_out 2" spacing="0"/>
          <portSpacing port="sink_out 3" spacing="0"/>
        </process>
      </operator>
      <connect from_op="Load Data" from_port="out 1" to_op="Generate Trading Signal" to_port="example set input"/>
      <connect from_op="Generate Trading Signal" from_port="example set output" to_op="Select Final Col" to_port="example set input"/>
      <connect from_op="Select Final Col" from_port="example set output" to_op="Multiply (2)" to_port="input"/>
      <connect from_op="Multiply (2)" from_port="output 1" to_op="Cum Sum of Returns" to_port="in 1"/>
      <connect from_op="Cum Sum of Returns" from_port="out 1" to_port="result 1"/>
      <connect from_op="Cum Sum of Returns" from_port="out 2" to_port="result 2"/>
      <portSpacing port="source_input 1" spacing="0"/>
      <portSpacing port="sink_result 1" spacing="0"/>
      <portSpacing port="sink_result 2" spacing="0"/>
      <portSpacing port="sink_result 3" spacing="0"/>
      <description align="center" color="yellow" colored="false" height="120" resized="false" width="180" x="31" y="195">Position Sizing?&lt;br&gt;Need to add: &lt;br&gt;Volatility based position sizing&lt;br&gt;buy and sell = $7, no slippage</description>
    </process>
  </operator>
</process>

PS: to test this, just go to Yahoo Finance and download historical quote data for a stock and then repath it in the Read CSV operator. Use at least a 2 year time period.

Next Steps

I still have several 'kink's to work out but I can definitely see the opportunity for optimization here, such as:

  • Why use a rolling 90 day window? Use parameter optimization to vary that value from 50 to 100.
  • Why use a 20 day moving average? You could vary between a 10 or 30 day MA?
  • Write a python script to download EOD stock data and then have RapidMiner loop through it.
  • Write an commission and slippage subprocess to see if this method IS really profitable or not.
  • Offload the processes to a RapidMiner Server and have it spit out trading recommendations on a daily basis