October 30, 2015

Extracting OpenStreetMap Data in RapidMiner

A few weeks ago I wanted to play with the Enrich Data by Webservice operator. The operator is part of the RapidMiner Web Mining extension, which is available through the Marketplace. I wanted to do reverse lookups, turning a latitude and longitude into a place name. While searching, I came across this post on how to do it with XPath and Google. That post was very informative, and I used it as the starting point for building my process. I wanted to do the same thing but with OpenStreetMap. Why OSM? OSM is an open source, community-built database of geographic (GIS) data, and it's rich with detail. Plus, it's a bit easier to use than Google.

After a few minutes of tinkering, I was successful. I built a process that goes out to the USGS Earthquake site, grabs the current CSV feed, loads it, and then does a reverse lookup on each quake's latitude and longitude. The process then creates a column, ExtractedCountry, holding the country returned by the XPath expression //reversegeocode/addressparts/country/text().
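To see what that XPath expression picks out of an OSM reply, here is a rough Python equivalent using only the standard library's xml.etree.ElementTree. It is a sketch, not what the Web Mining extension does internally: ElementTree supports only a subset of XPath (there is no text() step), so the function locates the country element and returns its text instead. The response layout (a reversegeocode element with an addressparts child) is taken from the XPath above, and the function name extract_country is my own, hypothetical choice. A response that lacks an addressparts/country element, which could plausibly happen for offshore quakes, yields None.

```python
import xml.etree.ElementTree as ET
from typing import Optional


def extract_country(xml_text: str) -> Optional[str]:
    """Approximate //reversegeocode/addressparts/country/text() with ElementTree."""
    root = ET.fromstring(xml_text)
    # '//' in XPath matches reversegeocode at any depth, including the document
    # root itself, so check the root first and then every descendant.
    candidates = [root] if root.tag == "reversegeocode" else []
    candidates += root.findall(".//reversegeocode")
    for node in candidates:
        country = node.find("addressparts/country")
        if country is not None and country.text:
            return country.text  # first match in document order, like XPath
    return None  # no country in this response
```

Using a full XPath engine (for example the third-party lxml package) would let you reuse the expression verbatim; the stdlib version just keeps the sketch dependency-free.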
Here’s what the process looks like:

[Image: Extracting OSM data]
Here’s the XML of the process:

<?xml version="1.0" encoding="UTF-8"?><process version="7.6.001">
  <operator activated="true" class="process" compatibility="7.6.001" expanded="true" name="Process">
    <process expanded="true">
      <operator activated="true" class="open_file" compatibility="7.6.001" expanded="true" height="68" name="Open File" width="90" x="45" y="30">
        <parameter key="resource_type" value="URL"/>
        <parameter key="filename" value="http://earthquake.usgs.gov/earthquakes/feed/v1.0/summary/2.5_day.csv"/>
        <parameter key="url" value="https://earthquake.usgs.gov/earthquakes/feed/v1.0/summary/2.5_day.csv"/>
        <description align="center" color="transparent" colored="false" width="126">Open USGS URL</description>
      </operator>
      <operator activated="true" class="read_csv" compatibility="7.6.001" expanded="true" height="68" name="Read CSV" width="90" x="179" y="30">
        <parameter key="column_separators" value=","/>
        <list key="annotations"/>
        <list key="data_set_meta_data_information"/>
        <description align="center" color="transparent" colored="false" width="126">Read CSV data</description>
      </operator>
      <operator activated="true" class="select_attributes" compatibility="7.6.001" expanded="true" height="82" name="Select Attributes" width="90" x="380" y="30">
        <parameter key="attribute_filter_type" value="subset"/>
        <parameter key="attributes" value="latitude|longitude|time|mag"/>
        <description align="center" color="transparent" colored="false" width="126">Select Columns</description>
      </operator>
      <operator activated="true" class="rename" compatibility="7.6.001" expanded="true" height="82" name="Rename" width="90" x="514" y="30">
        <parameter key="old_name" value="latitude"/>
        <parameter key="new_name" value="Latitude"/>
        <list key="rename_additional_attributes">
          <parameter key="longitude" value="Longitude"/>
        </list>
        <description align="center" color="transparent" colored="false" width="126">Rename Columns</description>
      </operator>
      <operator activated="true" class="web:enrich_data_by_webservice" compatibility="7.3.000" expanded="true" height="68" name="Enrich Data by Webservice" width="90" x="648" y="30">
        <parameter key="query_type" value="XPath"/>
        <list key="string_machting_queries"/>
        <list key="regular_expression_queries"/>
        <list key="regular_region_queries"/>
        <list key="xpath_queries">
          <parameter key="ExtractedCountry" value="//reversegeocode/addressparts/country/text()"/>
        </list>
        <list key="namespaces"/>
        <parameter key="assume_html" value="false"/>
        <list key="index_queries"/>
        <list key="jsonpath_queries"/>
        <parameter key="url" value="http://nominatim.openstreetmap.org/reverse?format=xml&amp;lat=&lt;%Latitude%&gt;&amp;lon=&lt;%Longitude%&gt;&amp;zoom=18&amp;addressdetails=1"/>
        <list key="request_properties"/>
        <description align="center" color="transparent" colored="false" width="126">Extract Country based on Lat/Long</description>
      </operator>
      <connect from_op="Open File" from_port="file" to_op="Read CSV" to_port="file"/>
      <connect from_op="Read CSV" from_port="output" to_op="Select Attributes" to_port="example set input"/>
      <connect from_op="Select Attributes" from_port="example set output" to_op="Rename" to_port="example set input"/>
      <connect from_op="Rename" from_port="example set output" to_op="Enrich Data by Webservice" to_port="Example Set"/>
      <connect from_op="Enrich Data by Webservice" from_port="ExampleSet" to_port="result 1"/>
      <portSpacing port="source_input 1" spacing="0"/>
      <portSpacing port="sink_result 1" spacing="0"/>
      <portSpacing port="sink_result 2" spacing="0"/>
    </process>
  </operator>
</process>
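For readers who want to trace the data flow outside RapidMiner, the Select Attributes, Rename, and URL-building steps of the process can be approximated in Python. This is a minimal sketch under assumptions: the names prepare_rows, fill_template, KEEP, and RENAME are hypothetical, and I'm assuming the Enrich Data by Webservice operator replaces each <%Name%> placeholder with that row's value for the attribute Name, which is how the Nominatim URL in the process is written. The template inside the process XML is entity-escaped (&amp;, &lt;, &gt;); the Python string below is the unescaped form.

```python
import csv
import io
import re
from typing import Dict, List
from urllib.parse import quote

# Unescaped form of the "url" parameter in the process XML.
URL_TEMPLATE = ("http://nominatim.openstreetmap.org/reverse?format=xml"
                "&lat=<%Latitude%>&lon=<%Longitude%>&zoom=18&addressdetails=1")

# Matches <%AttributeName%> placeholders (hypothetical mirror of the operator's syntax).
PLACEHOLDER = re.compile(r"<%(\w+)%>")

KEEP = ["latitude", "longitude", "time", "mag"]              # Select Attributes (subset)
RENAME = {"latitude": "Latitude", "longitude": "Longitude"}  # Rename operator


def prepare_rows(csv_text: str) -> List[Dict[str, str]]:
    """Keep the four columns and rename latitude/longitude, like Select Attributes + Rename."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [{RENAME.get(col, col): row[col] for col in KEEP} for row in reader]


def fill_template(template: str, row: Dict[str, str]) -> str:
    """Replace each <%Name%> with the row's value for Name, URL-encoded."""
    return PLACEHOLDER.sub(lambda m: quote(str(row[m.group(1)]), safe=""), template)
```

In practice you would pass prepare_rows the text of the USGS 2.5_day.csv feed and request each filled-in URL in turn. As I understand Nominatim's usage policy, it asks clients to stay at or below about one request per second, so a pause between calls is worth adding.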

Note: Updated the RapidMiner XML with the new USGS URL on 2017-09-07.
