Neural Market Trends |||

Extracting OpenStreetMap Data in RapidMiner

A few weeks ago I wanted to play with the Enrich by Webservice operator. The operator is part of the RapidMiner Web Mining extension and is accessible through the Marketplace. I wanted to do reverse lookups based on latitude and longitude. In my searching I came across this post on how to do it using XPath and via Google. That post was most informative and I used it as a starting point for my process building. I wanted to do the same thing but use OpenStreetMaps. Why OSM? OSM is an open source database of Geographic Inforation Systems (GIS) and is rich with data. Plus, it’s a bit easier to use than Google.

After a few minutes of tinkering, I was successful. I built a process to go out to the USGS Eartquake site, grab the current CSV, load it, and then do a reverse lookup using the latitude and longitude. The process then creates a column with the country via the XPath of //reversegeocode/addressparts/country/text().”
Here’s what the process looks like:

Extracing OSM dataExtracing OSM data
Here’s the XML of the process:

<?xml version="1.0" encoding="UTF-8"?><process version="7.6.001">
  <context>
    <input/>
    <output/>
    <macros/>
  </context>
  <operator activated="true" class="process" compatibility="7.6.001" expanded="true" name="Process">
    <process expanded="true">
      <operator activated="true" class="open_file" compatibility="7.6.001" expanded="true" height="68" name="Open File" width="90" x="45" y="30">
        <parameter key="resource_type" value="URL"/>
        <parameter key="filename" value="http://earthquake.usgs.gov/earthquakes/feed/v1.0/summary/2.5_day.csv"/>
        <parameter key="url" value="https://earthquake.usgs.gov/earthquakes/feed/v1.0/summary/2.5_day.csv"/>
        <description align="center" color="transparent" colored="false" width="126">Open USGS URL</description>
      </operator>
      <operator activated="true" class="read_csv" compatibility="7.6.001" expanded="true" height="68" name="Read CSV" width="90" x="179" y="30">
        <parameter key="column_separators" value=","/>
        <list key="annotations"/>
        <list key="data_set_meta_data_information"/>
        <description align="center" color="transparent" colored="false" width="126">Read CSV data</description>
      </operator>
      <operator activated="true" class="select_attributes" compatibility="7.6.001" expanded="true" height="82" name="Select Attributes" width="90" x="380" y="30">
        <parameter key="attribute_filter_type" value="subset"/>
        <parameter key="attributes" value="latitude|longitude|time|mag"/>
        <description align="center" color="transparent" colored="false" width="126">Select Columns</description>
      </operator>
      <operator activated="true" class="rename" compatibility="7.6.001" expanded="true" height="82" name="Rename" width="90" x="514" y="30">
        <parameter key="old_name" value="latitude"/>
        <parameter key="new_name" value="Latitude"/>
        <list key="rename_additional_attributes">
          <parameter key="longitude" value="Longitude"/>
        </list>
        <description align="center" color="transparent" colored="false" width="126">Rename Columns</description>
      </operator>
      <operator activated="true" class="web:enrich_data_by_webservice" compatibility="7.3.000" expanded="true" height="68" name="Enrich Data by Webservice" width="90" x="648" y="30">
        <parameter key="query_type" value="XPath"/>
        <list key="string_machting_queries"/>
        <list key="regular_expression_queries"/>
        <list key="regular_region_queries"/>
        <list key="xpath_queries">
          <parameter key="ExtractedCountry" value="//reversegeocode/addressparts/country/text()"/>
        </list>
        <list key="namespaces"/>
        <parameter key="assume_html" value="false"/>
        <list key="index_queries"/>
        <list key="jsonpath_queries"/>
        <parameter key="url" value="http://nominatim.openstreetmap.org/reverse?format=xml&amp;lat=&lt;%Latitude%&gt;&amp;lon=&lt;%Longitude%&gt;&amp;zoom=18&amp;addressdetails=1"/>
        <list key="request_properties"/>
        <description align="center" color="transparent" colored="false" width="126">Extract Country based on Lat/Long</description>
      </operator>
      <connect from_op="Open File" from_port="file" to_op="Read CSV" to_port="file"/>
      <connect from_op="Read CSV" from_port="output" to_op="Select Attributes" to_port="example set input"/>
      <connect from_op="Select Attributes" from_port="example set output" to_op="Rename" to_port="example set input"/>
      <connect from_op="Rename" from_port="example set output" to_op="Enrich Data by Webservice" to_port="Example Set"/>
      <connect from_op="Enrich Data by Webservice" from_port="ExampleSet" to_port="result 1"/>
      <portSpacing port="source_input 1" spacing="0"/>
      <portSpacing port="sink_result 1" spacing="0"/>
      <portSpacing port="sink_result 2" spacing="0"/>
    </process>
  </operator>
</process>

**Note: Updated RapidMiner XML with new USGS URL on 2017-09-07

Up next Coding RapidMiner in Python Back in middle school we learned about log tables. We learned how to look them up in a table, interpolate them, and then use the result in our Geo Distance in RapidMiner and Python In my previous post, I showed how you can use the Enrich by Webservice operator and OpenStreetMaps to do reverse geocoding lookups. This post will
Latest posts Machine Learning Making Pesto Tastier 5 Dangerous Things You Should Let Your Kids Do The Pyschology of Writing TensorFlow and High Level APIs Driving Marketing Performance with H2O Driverless AI Machine Learning and Data Munging in H2O Driverless AI with datatable Making AI Happen Without Getting Fired Latest Musings from a Traveling Sales Engineer The Night before H2O World 2019 Why Forex Trading is Frustrating Functional Programming in Python Automatic Feature Engineering with Driverless AI Ray Dalio's Pure Alpha Fund What's new in Driverless AI? Latest Writings Elsewhere - December 2018 House Buying Guide for Millennials Changing Pinboard Tags with Python Automate Feed Extraction and Posting it to Twitter Flux: A Machine Learning Framework for Julia Getting Started in Data Science Part 2 Makers vs Takers How Passive Investing Saved My Life Startups and Open Source The Process of Writing H2O AI World 2018 in London Ray Dalio's Pure Alpha Fund Isolation Forests in H2O.ai Living the Dream? Humility and Equanimity in Sales What is Reusable Holdout? H2O World London 2018 - Record Signups!