February 7, 2017

Big Data and Infrastructure

I have a daily downtime routine. Every evening I set aside about an hour and think. I sit or walk around the house and ruminate about all sorts of random things. Sometimes it’s with a glass of wine, but more often it’s with a cup of black tea and milk. Sometimes my mind wanders to what I did that day or what I didn’t finish. Other times I get inspired to write a new blog post or create a new tutorial. And sometimes it’s an epiphany, like this one about the impact of Big Data on Infrastructure.

Infrastructure Investigation

It’s no secret that I came from the Infrastructure field. I spent many years designing and managing infrastructure projects as a Civil Engineer. Some projects were big, some were small. I traveled to the remotest parts of Montana and North Dakota and worked all over the country. I’ve inspected bridges, roads, and sewers, and written countless reports.

Most of these reports highlighted deficiencies in some piece of infrastructure. We’d take an inventory of the structure, take photos, and make measurements. Then the report would go to an agency, which would use it to secure budget money to fix the problem.

Then I moved to the machine learning startup world, and here I am today. My first move was into Pre-sales, and right before I transferred to the Marketing group, I fielded some interesting queries from potential customers. One was from a major freight railroad and the other from a railroad car inspection company. Both of these organizations capture sensor data and take measurements on their infrastructure assets. They measure the temperature of rail gauges, wear patterns, widths, and hours of use.


Big Data Migration

The most interesting part of these queries? The data was migrating from standalone reports into Hadoop clusters. For the first time ever, at least since I was in the industry, data was coming together from all over the place. The only problem was getting the data out to work on it!

There are now many ways to get the data out and work with it (e.g., Spark, Hive, RapidMiner), but most engineering professionals aren’t familiar with them. Ask any manager in an Infrastructure firm what Hadoop is and they won’t know. Some might have heard of data science and data mining, but they might not know what all the hoopla is about.

The hoopla is this.

Engineers use data to design all kinds of things. Imagine if they had access to a deeper pool of stress-strain data for bridges. Or for rails. What if researchers adjusted the mixture ratios of concrete, or tempered steel differently, to extract more performance based on terabytes of data from a central research Hadoop cluster?

These scenarios are not far-fetched at all. Two years ago I went to a presentation on forecasting flooding from Hurricane Sandy-type events in the NY area. The room was filled with engineers, and the presenter was from Stevens Institute of Technology. He said they run several wave function calculations to help state governments like New Jersey and New York predict where flooding will occur and how severe it will be.

After the presentation I asked him where they got their data, and he said it came from a group of computers tied together in a cluster.

The Future

The reality is that more Infrastructure companies are collecting ever-increasing amounts of data. They’re using drones to do bridge inspections, tying river gauges together via the Internet, and using more sensors than ever. These sensors (aka IoT) collect this data and stream it somewhere. In the old days it was an Access database. Today it’s a more robust database, and one day it will be a big Hadoop cluster. The average Civil Engineer of my generation hasn’t heard of a Hadoop cluster, but has heard of Big Data and wonders what it’s about.

Soon they’ll crush the silos of their data stores, unlock innovation, and build their own clusters.

Can you imagine the world we can build then?

Posted by Thomas Ott




