Geo Distance in RapidMiner and Python

In my previous post, I showed how you can use the Enrich by Webservice operator and OpenStreetMap to do reverse geocoding lookups. This post shows how to calculate the geo distance between two latitude and longitude points, first using RapidMiner and then using the GeoPy Python module.

This was fun because it touched on my civil engineering classes. I used to calculate distances from latitude and longitude in my land surveying classes.

My first step was to select a “home” location, which was 1 Penn Plaza, New York, NY. Then I downloaded the latest list of earthquakes from the USGS website. The last step was to calculate the distance from home to each earthquake location.

The biggest time suck for me was building all the formulas in RapidMiner’s Generate Attributes (GA) operator. That took about 15 minutes. Then I had to back-check the calculations against a website to make sure they matched. RapidMiner excelled in the speed of building and analyzing this process, but I did notice the results were a bit off from the GeoPy Python process.

There was a variance of about +/- 4 km in each distance. This is because I hard-coded the Earth’s radius as 6,371,000 m (6,371 km) in the RapidMiner process, but the radius of the Earth changes based on your location: the Earth isn’t a sphere but more of an ellipsoid, so the radius isn’t uniform. GeoPy can account for this with its ellipsoidal geodesic calculation. Note that its great_circle function, like my RapidMiner formulas, assumes a sphere with a fixed mean radius.
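To make the hard-coded radius concrete, here is a minimal Python sketch of a spherical distance calculation. I'm assuming the haversine formula, which is a common choice, though it may not match the exact formulas I built in Generate Attributes. The function name `haversine_km`, the constant `EARTH_RADIUS_KM`, and the coordinates are illustrative only.

```python
import math

# Mean Earth radius in km. Hard-coding it treats the Earth as a perfect sphere.
EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1, lon1, lat2, lon2, radius_km=EARTH_RADIUS_KM):
    """Great-circle distance in km between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * radius_km * math.asin(math.sqrt(a))


# Approximate coordinates, assumed for illustration only.
home = (40.7513, -73.9935)    # roughly 1 Penn Plaza, New York
quake = (34.0522, -118.2437)  # hypothetical epicenter near Los Angeles
print(round(haversine_km(*home, *quake)), "km")
```

Because the radius is a single constant, any difference from an ellipsoidal calculation grows with the distance and depends on the latitudes involved.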

For a proof of concept, both work just fine.

Get the Geo Distance in RapidMiner Process

There were a few snags in my Python code that took me longer to work through, and I chalk this up to my novice ability at writing Python. I didn’t realize that I had to create a tuple out of the lat/long columns and then use a for loop to iterate over the entire tuple list. But this was something that my friend solved in 5 minutes. Other than that, the Python code works well. Here’s the XML of the process:

 

Here’s the Python process:
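The snippet below is not the original script, just a minimal sketch of the pattern described above: pair the lat/long columns into tuples, then loop over the list and measure each point against home. The helper name `distances_from_home` and its `distance_fn` parameter are hypothetical. The distance function is passed in so the sketch runs without GeoPy installed.

```python
def distances_from_home(home, lats, lons, distance_fn):
    """Return one distance per (lat, lon) pair, measured from `home`."""
    points = list(zip(lats, lons))  # pair the two columns into (lat, lon) tuples
    results = []
    for point in points:            # iterate over the whole list of tuples
        results.append(distance_fn(home, point))
    return results


# With GeoPy installed, the distance function could be built like this:
#   from geopy.distance import great_circle
#   km = lambda a, b: great_circle(a, b).km
#   distances_from_home((40.7513, -73.9935), lats, lons, km)
```

Here `lats` and `lons` stand in for the latitude and longitude columns of the earthquake list, and the home coordinates are only approximate.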