Neural Market Trends |||

Mean Reversion Trading Process in RapidMiner

Lately I’ve been think about becoming more active in trading again. I was reviewing some strategies and decided to recreate a mean reversion trading process in RapidMiner. I found a mean reversion trading stategy that uses Python here and just recreated it in RapidMiner.

The process is quite simple. You do the following:

  1. Load in stock quote data via CSV;
  2. Calculate daily returns;
  3. Calculate a 20 day moving average;
  4. Calculate a rolling 90 day standard deviation;
  5. Generate Trading Criteria per the article;
  6. Wrap it all together and look at the Buy vs Hold and Buy Signals.

Mind you, this doesn’t include commission costs and slippage. I suspect that once I add that in, the Buy and Hold strategy will be the best.

    <?xml version="1.0" encoding="UTF-8"?><process version="8.1.001">
      <context>
        <input/>
        <output/>
        <macros/>
      </context>
      <operator activated="true" class="process" compatibility="8.1.001" expanded="true" name="Process">
        <parameter key="encoding" value="SYSTEM"/>
        <process expanded="true">
          <operator activated="true" class="subprocess" compatibility="8.1.001" expanded="true" height="82" name="Load Data" width="90" x="45" y="34">
            <process expanded="true">
              <operator activated="true" class="read_csv" compatibility="8.1.000" expanded="true" height="68" name="Read Downloaded S&amp;P500" width="90" x="45" y="34">
                <parameter key="csv_file" value="C:\Users\TomOtt\Downloads\INTC.csv"/>
                <parameter key="column_separators" value=","/>
                <parameter key="first_row_as_names" value="false"/>
                <list key="annotations">
                  <parameter key="0" value="Name"/>
                </list>
                <parameter key="encoding" value="windows-1252"/>
                <list key="data_set_meta_data_information">
                  <parameter key="0" value="Date.true.polynominal.attribute"/>
                  <parameter key="1" value="Open.true.real.attribute"/>
                  <parameter key="2" value="High.true.real.attribute"/>
                  <parameter key="3" value="Low.true.real.attribute"/>
                  <parameter key="4" value="Close.true.real.attribute"/>
                  <parameter key="5" value="Adj Close.true.real.attribute"/>
                  <parameter key="6" value="Volume.true.real.attribute"/>
                </list>
                <description align="center" color="transparent" colored="false" width="126">Due to Yahoo changes, must download CSV file manually for now!</description>
              </operator>
              <operator activated="true" class="series:lag_series" compatibility="7.4.000" expanded="true" height="82" name="Lag Series" width="90" x="179" y="34">
                <list key="attributes">
                  <parameter key="Close" value="1"/>
                </list>
              </operator>
              <operator activated="true" class="generate_attributes" compatibility="8.1.001" expanded="true" height="82" name="Returns" width="90" x="313" y="34">
                <list key="function_descriptions">
                  <parameter key="Returns" value="(Close-Open)/Open"/>
                </list>
              </operator>
              <operator activated="true" class="select_attributes" compatibility="8.1.001" expanded="true" height="82" name="Select Attributes for ETL" width="90" x="447" y="34">
                <parameter key="attribute_filter_type" value="single"/>
                <parameter key="attribute" value="^GSPC_CLOSE-1"/>
                <parameter key="invert_selection" value="true"/>
              </operator>
              <operator activated="true" class="series:moving_average" compatibility="7.4.000" expanded="true" height="82" name="MA 20 DAY" width="90" x="581" y="34">
                <parameter key="attribute_name" value="Close"/>
                <parameter key="window_width" value="20"/>
              </operator>
              <operator activated="true" class="series:windowing" compatibility="7.4.000" expanded="true" height="82" name="Window for 90 Day STDEV" width="90" x="715" y="34">
                <parameter key="window_size" value="90"/>
                <parameter key="label_attribute" value="Close"/>
              </operator>
              <operator activated="true" class="generate_aggregation" compatibility="8.1.001" expanded="true" height="82" name="Calc STD Dev for 90 day window" width="90" x="849" y="34">
                <parameter key="attribute_name" value="StDev"/>
                <parameter key="attribute_filter_type" value="regular_expression"/>
                <parameter key="attributes" value="Log Returns-0|Log Returns-1|Log Returns-2|Log Returns-3|Log Returns-4"/>
                <parameter key="regular_expression" value="Close\-[0-9]"/>
                <parameter key="aggregation_function" value="standard_deviation"/>
              </operator>
              <operator activated="true" class="select_attributes" compatibility="8.1.001" expanded="true" height="82" name="Select Attributes" width="90" x="983" y="34">
                <parameter key="attribute_filter_type" value="subset"/>
                <parameter key="attributes" value="Close-0|Date-0|High-0|Low-0|Open-0|StDev|Volume-0|average(Close)-0|Log Returns-0|Returns-0"/>
              </operator>
              <operator activated="true" class="rename" compatibility="8.1.001" expanded="true" height="82" name="Rename a bunch of stuff" width="90" x="1117" y="34">
                <parameter key="old_name" value="Date-0"/>
                <parameter key="new_name" value="Date"/>
                <list key="rename_additional_attributes">
                  <parameter key="Close-0" value="Close"/>
                  <parameter key="High-0" value="High"/>
                  <parameter key="Returns-0" value="Returns"/>
                  <parameter key="Volume-0" value="Volume"/>
                  <parameter key="Low-0" value="Low"/>
                  <parameter key="Open-0" value="Open"/>
                  <parameter key="average(Close)-0" value="MA20"/>
                  <parameter key="StDev" value="90daySTDEV"/>
                </list>
              </operator>
              <operator activated="true" class="nominal_to_date" compatibility="8.1.001" expanded="true" height="82" name="Convert dates" width="90" x="1251" y="34">
                <parameter key="attribute_name" value="Date"/>
                <parameter key="date_format" value="yyyy-MM-dd"/>
              </operator>
              <operator activated="true" class="series:lag_series" compatibility="7.4.000" expanded="true" height="82" name="Lag Low" width="90" x="1385" y="34">
                <list key="attributes">
                  <parameter key="Low" value="1"/>
                </list>
              </operator>
              <operator activated="true" class="replace_missing_values" compatibility="8.1.001" expanded="true" height="103" name="Replace Missing Values" width="90" x="1519" y="34">
                <parameter key="attribute_filter_type" value="value_type"/>
                <parameter key="value_type" value="numeric"/>
                <parameter key="default" value="zero"/>
                <list key="columns"/>
              </operator>
              <connect from_op="Read Downloaded S&amp;P500" from_port="output" to_op="Lag Series" to_port="example set input"/>
              <connect from_op="Lag Series" from_port="example set output" to_op="Returns" to_port="example set input"/>
              <connect from_op="Returns" from_port="example set output" to_op="Select Attributes for ETL" to_port="example set input"/>
              <connect from_op="Select Attributes for ETL" from_port="example set output" to_op="MA 20 DAY" to_port="example set input"/>
              <connect from_op="MA 20 DAY" from_port="example set output" to_op="Window for 90 Day STDEV" to_port="example set input"/>
              <connect from_op="Window for 90 Day STDEV" from_port="example set output" to_op="Calc STD Dev for 90 day window" to_port="example set input"/>
              <connect from_op="Calc STD Dev for 90 day window" from_port="example set output" to_op="Select Attributes" to_port="example set input"/>
              <connect from_op="Select Attributes" from_port="example set output" to_op="Rename a bunch of stuff" to_port="example set input"/>
              <connect from_op="Rename a bunch of stuff" from_port="example set output" to_op="Convert dates" to_port="example set input"/>
              <connect from_op="Convert dates" from_port="example set output" to_op="Lag Low" to_port="example set input"/>
              <connect from_op="Lag Low" from_port="example set output" to_op="Replace Missing Values" to_port="example set input"/>
              <connect from_op="Replace Missing Values" from_port="example set output" to_port="out 1"/>
              <portSpacing port="source_in 1" spacing="0"/>
              <portSpacing port="sink_out 1" spacing="0"/>
              <portSpacing port="sink_out 2" spacing="0"/>
              <description align="center" color="yellow" colored="false" height="174" resized="false" width="180" x="24" y="231">The goal is to pull this data from a database in the future.&lt;br/&gt;&lt;br/&gt;The database will be populated with stock data from the Nasdaq automatically overnight using EOD</description>
            </process>
            <description align="center" color="transparent" colored="false" width="126">Load CSV data</description>
          </operator>
          <operator activated="true" class="generate_attributes" compatibility="8.1.001" expanded="true" height="82" name="Generate Trading Signal" width="90" x="179" y="34">
            <list key="function_descriptions">
              <parameter key="Criteria1" value="if((Open&lt;[Low-1]) &amp;&amp; (Open-[Low-1])&lt;-[90daySTDEV],1,0)"/>
              <parameter key="Criteria2" value="if(Open&gt;MA20,1,0)"/>
              <parameter key="Signal" value="if(([Criteria1]+[Criteria2])==2,1,0)"/>
            </list>
          </operator>
          <operator activated="true" class="select_attributes" compatibility="8.1.001" expanded="true" height="82" name="Select Final Col" width="90" x="313" y="34">
            <parameter key="attribute_filter_type" value="subset"/>
            <parameter key="attributes" value="Close|Date|Signal|Returns"/>
          </operator>
          <operator activated="true" class="multiply" compatibility="8.1.001" expanded="true" height="82" name="Multiply (2)" width="90" x="447" y="34"/>
          <operator activated="true" class="subprocess" compatibility="8.1.001" expanded="true" height="103" name="Cum Sum of Returns" width="90" x="581" y="34">
            <process expanded="true">
              <operator activated="true" class="multiply" compatibility="8.1.001" expanded="true" height="103" name="Multiply (3)" width="90" x="112" y="34"/>
              <operator activated="true" class="series:integrate_series" compatibility="7.4.000" expanded="true" height="82" name="All Signals" width="90" x="447" y="289">
                <parameter key="attribute_name" value="Returns"/>
              </operator>
              <operator activated="true" class="filter_examples" compatibility="8.1.001" expanded="true" height="103" name="Filter Examples (2)" width="90" x="313" y="34">
                <list key="filters_list">
                  <parameter key="filters_entry_key" value="Signal.eq.1"/>
                </list>
              </operator>
              <operator activated="true" class="series:integrate_series" compatibility="7.4.000" expanded="true" height="82" name="Buy Signal Only" width="90" x="447" y="34">
                <parameter key="attribute_name" value="Returns"/>
              </operator>
              <operator activated="true" class="rename" compatibility="8.1.001" expanded="true" height="82" name="Buy Returns" width="90" x="581" y="34">
                <parameter key="old_name" value="cumulative(Returns)"/>
                <parameter key="new_name" value="Buy Signals Returns"/>
                <list key="rename_additional_attributes"/>
              </operator>
              <operator activated="true" class="rename" compatibility="8.1.001" expanded="true" height="82" name="Buy and Hold Returns" width="90" x="581" y="289">
                <parameter key="old_name" value="cumulative(Returns)"/>
                <parameter key="new_name" value="Buy and Hold"/>
                <list key="rename_additional_attributes"/>
              </operator>
              <connect from_port="in 1" to_op="Multiply (3)" to_port="input"/>
              <connect from_op="Multiply (3)" from_port="output 1" to_op="Filter Examples (2)" to_port="example set input"/>
              <connect from_op="Multiply (3)" from_port="output 2" to_op="All Signals" to_port="example set input"/>
              <connect from_op="All Signals" from_port="example set output" to_op="Buy and Hold Returns" to_port="example set input"/>
              <connect from_op="Filter Examples (2)" from_port="example set output" to_op="Buy Signal Only" to_port="example set input"/>
              <connect from_op="Buy Signal Only" from_port="example set output" to_op="Buy Returns" to_port="example set input"/>
              <connect from_op="Buy Returns" from_port="example set output" to_port="out 1"/>
              <connect from_op="Buy and Hold Returns" from_port="example set output" to_port="out 2"/>
              <portSpacing port="source_in 1" spacing="0"/>
              <portSpacing port="source_in 2" spacing="0"/>
              <portSpacing port="sink_out 1" spacing="0"/>
              <portSpacing port="sink_out 2" spacing="0"/>
              <portSpacing port="sink_out 3" spacing="0"/>
              <description align="center" color="yellow" colored="false" height="105" resized="false" width="180" x="337" y="147">Set the previous value to the missing value</description>
            </process>
          </operator>
          <operator activated="false" class="subprocess" compatibility="8.1.001" expanded="true" height="103" name="Subprocess" width="90" x="581" y="289">
            <process expanded="true">
              <operator activated="true" class="series:windowing" compatibility="7.4.000" expanded="true" height="82" name="Window for Training" width="90" x="45" y="187">
                <parameter key="window_size" value="1"/>
                <parameter key="create_label" value="true"/>
                <parameter key="label_attribute" value="Signal"/>
                <parameter key="add_incomplete_windows" value="true"/>
              </operator>
              <operator activated="true" class="concurrency:optimize_parameters_grid" compatibility="8.1.001" expanded="true" height="124" name="Optimize Parameters (Grid)" width="90" x="179" y="34">
                <list key="parameters">
                  <parameter key="Backtesting.test_window_width" value="[2;5;4;linear]"/>
                  <parameter key="Backtesting.training_window_step_size" value="[1;5;4;linear]"/>
                  <parameter key="Backtesting.training_window_width" value="[2;5;4;linear]"/>
                  <parameter key="SVM for HV Calc.kernel_gamma" value="[0.01;1000;5;logarithmic]"/>
                  <parameter key="SVM for HV Calc.C" value="[0;10000;2;linear]"/>
                </list>
                <process expanded="true">
                  <operator activated="true" class="series:sliding_window_validation" compatibility="7.4.000" expanded="true" height="124" name="Backtesting" width="90" x="179" y="34">
                    <parameter key="training_window_width" value="6"/>
                    <parameter key="test_window_width" value="6"/>
                    <parameter key="cumulative_training" value="true"/>
                    <process expanded="true">
                      <operator activated="true" class="support_vector_machine" compatibility="8.1.001" expanded="true" height="124" name="SVM for HV Calc" width="90" x="179" y="34">
                        <parameter key="kernel_type" value="radial"/>
                        <parameter key="kernel_gamma" value="0.10000000000000002"/>
                        <parameter key="C" value="200.0"/>
                      </operator>
                      <connect from_port="training" to_op="SVM for HV Calc" to_port="training set"/>
                      <connect from_op="SVM for HV Calc" from_port="model" to_port="model"/>
                      <portSpacing port="source_training" spacing="0"/>
                      <portSpacing port="sink_model" spacing="0"/>
                      <portSpacing port="sink_through 1" spacing="0"/>
                    </process>
                    <process expanded="true">
                      <operator activated="true" class="apply_model" compatibility="7.1.001" expanded="true" height="82" name="Apply Model In Testing" width="90" x="45" y="34">
                        <list key="application_parameters"/>
                      </operator>
                      <operator activated="true" class="series:forecasting_performance" compatibility="7.4.000" expanded="true" height="82" name="Forecast Performance" width="90" x="246" y="34">
                        <parameter key="horizon" value="1"/>
                        <parameter key="main_criterion" value="prediction_trend_accuracy"/>
                      </operator>
                      <connect from_port="model" to_op="Apply Model In Testing" to_port="model"/>
                      <connect from_port="test set" to_op="Apply Model In Testing" to_port="unlabelled data"/>
                      <connect from_op="Apply Model In Testing" from_port="labelled data" to_op="Forecast Performance" to_port="labelled data"/>
                      <connect from_op="Forecast Performance" from_port="performance" to_port="averagable 1"/>
                      <portSpacing port="source_model" spacing="0"/>
                      <portSpacing port="source_test set" spacing="0"/>
                      <portSpacing port="source_through 1" spacing="0"/>
                      <portSpacing port="sink_averagable 1" spacing="0"/>
                      <portSpacing port="sink_averagable 2" spacing="0"/>
                    </process>
                  </operator>
                  <connect from_port="input 1" to_op="Backtesting" to_port="training"/>
                  <connect from_op="Backtesting" from_port="model" to_port="model"/>
                  <connect from_op="Backtesting" from_port="averagable 1" to_port="performance"/>
                  <portSpacing port="source_input 1" spacing="0"/>
                  <portSpacing port="source_input 2" spacing="0"/>
                  <portSpacing port="sink_performance" spacing="0"/>
                  <portSpacing port="sink_model" spacing="0"/>
                  <portSpacing port="sink_output 1" spacing="0"/>
                </process>
              </operator>
              <operator activated="true" class="series:windowing" compatibility="7.4.000" expanded="true" height="82" name="Window for Prediction" width="90" x="179" y="289">
                <parameter key="window_size" value="1"/>
              </operator>
              <operator activated="true" class="apply_model" compatibility="7.1.001" expanded="true" height="82" name="Apply SVM model" width="90" x="380" y="136">
                <list key="application_parameters"/>
              </operator>
              <connect from_port="in 1" to_op="Window for Training" to_port="example set input"/>
              <connect from_op="Window for Training" from_port="example set output" to_op="Optimize Parameters (Grid)" to_port="input 1"/>
              <connect from_op="Window for Training" from_port="original" to_op="Window for Prediction" to_port="example set input"/>
              <connect from_op="Optimize Parameters (Grid)" from_port="performance" to_port="out 1"/>
              <connect from_op="Optimize Parameters (Grid)" from_port="model" to_op="Apply SVM model" to_port="model"/>
              <connect from_op="Window for Prediction" from_port="example set output" to_op="Apply SVM model" to_port="unlabelled data"/>
              <connect from_op="Apply SVM model" from_port="labelled data" to_port="out 2"/>
              <portSpacing port="source_in 1" spacing="0"/>
              <portSpacing port="source_in 2" spacing="0"/>
              <portSpacing port="sink_out 1" spacing="0"/>
              <portSpacing port="sink_out 2" spacing="0"/>
              <portSpacing port="sink_out 3" spacing="0"/>
            </process>
          </operator>
          <connect from_op="Load Data" from_port="out 1" to_op="Generate Trading Signal" to_port="example set input"/>
          <connect from_op="Generate Trading Signal" from_port="example set output" to_op="Select Final Col" to_port="example set input"/>
          <connect from_op="Select Final Col" from_port="example set output" to_op="Multiply (2)" to_port="input"/>
          <connect from_op="Multiply (2)" from_port="output 1" to_op="Cum Sum of Returns" to_port="in 1"/>
          <connect from_op="Cum Sum of Returns" from_port="out 1" to_port="result 1"/>
          <connect from_op="Cum Sum of Returns" from_port="out 2" to_port="result 2"/>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="sink_result 1" spacing="0"/>
          <portSpacing port="sink_result 2" spacing="0"/>
          <portSpacing port="sink_result 3" spacing="0"/>
          <description align="center" color="yellow" colored="false" height="120" resized="false" width="180" x="31" y="195">Position Sizing?&lt;br&gt;Need to add: &lt;br&gt;Volatility based position sizing&lt;br&gt;buy and sell = $7, no slippage</description>
        </process>
      </operator>
    </process>

PS: to test this, just go to Yahoo Finance and download historical quote data for a stock and then repath it in the Read CSV operator. Use at least a 2 year time period.

Next Steps

I still have several kink’s to work out but I can definitely see the opportunity for optimization here, such as:

  • Why use a rolling 90 day window? Use parameter optimization to vary that value from 50 to 100.
  • Why use a 20 day moving average? You could vary between a 10 or 30 day MA?
  • Write a python script to download EOD stock data and then have RapidMiner loop through it.
  • Write an commission and slippage subprocess to see if this method IS really profitable or not.
  • Offload the processes to a RapidMiner Server and have it spit out trading recommendations on a daily basis
Up next Building a Machine Learning Framework from scratch Great article by Florian Cäsar on how his team developed a new machine learning framework. From scratch. In 491 steps! He summarizes the entire Python Script to Parse Blog Feeds I recently wrote a small a python script to parse blog feeds and then tweet them out via Twitter. It randomly takes the first 5 RSS entries of a
Latest posts Machine Learning Making Pesto Tastier 5 Dangerous Things You Should Let Your Kids Do The Pyschology of Writing TensorFlow and High Level APIs Driving Marketing Performance with H2O Driverless AI Machine Learning and Data Munging in H2O Driverless AI with datatable Making AI Happen Without Getting Fired Latest Musings from a Traveling Sales Engineer The Night before H2O World 2019 Why Forex Trading is Frustrating Functional Programming in Python Automatic Feature Engineering with Driverless AI Ray Dalio's Pure Alpha Fund What's new in Driverless AI? Latest Writings Elsewhere - December 2018 House Buying Guide for Millennials Changing Pinboard Tags with Python Automate Feed Extraction and Posting it to Twitter Flux: A Machine Learning Framework for Julia Getting Started in Data Science Part 2 Makers vs Takers How Passive Investing Saved My Life Startups and Open Source The Process of Writing H2O AI World 2018 in London Ray Dalio's Pure Alpha Fund Isolation Forests in H2O.ai Living the Dream? Humility and Equanimity in Sales What is Reusable Holdout? H2O World London 2018 - Record Signups!