Extracting OpenStreetMap Data in RapidMiner

A few weeks ago I wanted to play with the Enrich Data by Webservice operator. The operator is part of the RapidMiner Web Mining extension and is available through the Marketplace. I wanted to do reverse lookups based on latitude and longitude. While searching, I came across this post on how to do it using XPath and Google. That post was very informative, and I used it as a starting point for building my process. I wanted to do the same thing but with OpenStreetMap (OSM). Why OSM? OSM is an open database of geographic (GIS) data and is rich with detail. Plus, it's a bit easier to use than Google.

After a few minutes of tinkering, I was successful. I built a process that goes out to the USGS Earthquake site, grabs the current CSV, loads it, and then does a reverse lookup using the latitude and longitude. The process then creates a column with the country via the XPath "//reversegeocode/addressparts/country/text()".
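
To make the lookup step concrete outside RapidMiner, here is a minimal Python sketch of the same idea: build the reverse-geocoding URL for one latitude/longitude pair, then pull the country out of the XML with the same path. The function names and the sample XML are my own illustrations, not actual service output, and the sketch makes no network calls.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# Same endpoint and query parameters as the URL used in the RapidMiner process.
NOMINATIM_REVERSE = "http://nominatim.openstreetmap.org/reverse"


def build_reverse_url(lat, lon):
    """Build the reverse-geocoding URL for one latitude/longitude pair."""
    params = {"format": "xml", "lat": lat, "lon": lon, "zoom": 18, "addressdetails": 1}
    return NOMINATIM_REVERSE + "?" + urlencode(params)


def extract_country(xml_text):
    """Return the text at reversegeocode/addressparts/country, or None if absent."""
    root = ET.fromstring(xml_text)  # the root element is <reversegeocode>
    return root.findtext("addressparts/country")


# Hand-written sample shaped like a reverse-geocoding response (illustration only).
SAMPLE = """<reversegeocode>
  <result>Somewhere, Japan</result>
  <addressparts>
    <country>Japan</country>
    <country_code>jp</country_code>
  </addressparts>
</reversegeocode>"""

print(build_reverse_url(35.68, 139.69))
print(extract_country(SAMPLE))
```

If a response has no country element, which seems plausible for offshore earthquakes, `extract_country` returns `None` rather than raising an error.
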
Here's what the process looks like:

Extracting OSM data
Here's the XML of the process:

<?xml version="1.0" encoding="UTF-8"?><process version="7.6.001">
  <operator activated="true" class="process" compatibility="7.6.001" expanded="true" name="Process">
    <process expanded="true">
      <operator activated="true" class="open_file" compatibility="7.6.001" expanded="true" height="68" name="Open File" width="90" x="45" y="30">
        <parameter key="resource_type" value="URL"/>
        <parameter key="filename" value="http://earthquake.usgs.gov/earthquakes/feed/v1.0/summary/2.5_day.csv"/>
        <parameter key="url" value="https://earthquake.usgs.gov/earthquakes/feed/v1.0/summary/2.5_day.csv"/>
        <description align="center" color="transparent" colored="false" width="126">Open USGS URL</description>
      </operator>
      <operator activated="true" class="read_csv" compatibility="7.6.001" expanded="true" height="68" name="Read CSV" width="90" x="179" y="30">
        <parameter key="column_separators" value=","/>
        <list key="annotations"/>
        <list key="data_set_meta_data_information"/>
        <description align="center" color="transparent" colored="false" width="126">Read CSV data</description>
      </operator>
      <operator activated="true" class="select_attributes" compatibility="7.6.001" expanded="true" height="82" name="Select Attributes" width="90" x="380" y="30">
        <parameter key="attribute_filter_type" value="subset"/>
        <parameter key="attributes" value="latitude|longitude|time|mag"/>
        <description align="center" color="transparent" colored="false" width="126">Select Columns</description>
      </operator>
      <operator activated="true" class="rename" compatibility="7.6.001" expanded="true" height="82" name="Rename" width="90" x="514" y="30">
        <parameter key="old_name" value="latitude"/>
        <parameter key="new_name" value="Latitude"/>
        <list key="rename_additional_attributes">
          <parameter key="longitude" value="Longitude"/>
        </list>
        <description align="center" color="transparent" colored="false" width="126">Rename Columns</description>
      </operator>
      <operator activated="true" class="web:enrich_data_by_webservice" compatibility="7.3.000" expanded="true" height="68" name="Enrich Data by Webservice" width="90" x="648" y="30">
        <parameter key="query_type" value="XPath"/>
        <list key="string_machting_queries"/>
        <list key="regular_expression_queries"/>
        <list key="regular_region_queries"/>
        <list key="xpath_queries">
          <parameter key="ExtractedCountry" value="//reversegeocode/addressparts/country/text()"/>
        </list>
        <list key="namespaces"/>
        <parameter key="assume_html" value="false"/>
        <list key="index_queries"/>
        <list key="jsonpath_queries"/>
        <parameter key="url" value="http://nominatim.openstreetmap.org/reverse?format=xml&amp;lat=&lt;%Latitude%&gt;&amp;lon=&lt;%Longitude%&gt;&amp;zoom=18&amp;addressdetails=1"/>
        <list key="request_properties"/>
        <description align="center" color="transparent" colored="false" width="126">Extract Country based on Lat/Long</description>
      </operator>
      <connect from_op="Open File" from_port="file" to_op="Read CSV" to_port="file"/>
      <connect from_op="Read CSV" from_port="output" to_op="Select Attributes" to_port="example set input"/>
      <connect from_op="Select Attributes" from_port="example set output" to_op="Rename" to_port="example set input"/>
      <connect from_op="Rename" from_port="example set output" to_op="Enrich Data by Webservice" to_port="Example Set"/>
      <connect from_op="Enrich Data by Webservice" from_port="ExampleSet" to_port="result 1"/>
      <portSpacing port="source_input 1" spacing="0"/>
      <portSpacing port="sink_result 1" spacing="0"/>
      <portSpacing port="sink_result 2" spacing="0"/>
    </process>
  </operator>
</process>

Note: Updated the RapidMiner XML with the new USGS URL on 2017-09-07.
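
For readers without RapidMiner, the data-preparation operators in the process above (Read CSV, Select Attributes, Rename) can be roughly approximated in plain Python. This is only a sketch under my own assumptions: the helper name `select_and_rename` is hypothetical, and the sample rows are made up for illustration rather than taken from the USGS feed.

```python
import csv
import io

# Columns kept by the Select Attributes operator in the process.
KEEP = ["latitude", "longitude", "time", "mag"]
# Renames applied by the Rename operator in the process.
RENAME = {"latitude": "Latitude", "longitude": "Longitude"}


def select_and_rename(csv_text):
    """Keep only the KEEP columns and apply RENAME, returning a list of dicts."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = []
    for record in reader:
        rows.append({RENAME.get(name, name): record[name] for name in KEEP})
    return rows


# Made-up sample rows for illustration; the real feed has more columns.
SAMPLE_CSV = """time,latitude,longitude,depth,mag,place
2000-01-01T00:00:00.000Z,10.0,20.0,5.0,3.1,sample place A
2000-01-02T00:00:00.000Z,-30.5,40.25,12.0,4.5,sample place B
"""

rows = select_and_rename(SAMPLE_CSV)
print(rows[0])
```

Each resulting row carries the `Latitude` and `Longitude` names that the `<%Latitude%>` and `<%Longitude%>` placeholders in the web service URL refer to.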
