
Big Data and Infrastructure

I have a daily downtime routine. Every evening I set aside about an hour and think. I sit or walk around the house and ruminate about all sorts of random things. Sometimes it’s with a glass of wine; more often it’s with a cup of black tea and milk. Sometimes my mind wanders to what I did that day or what I didn’t finish. Other times I get inspired to write a new blog post or create a new tutorial. Sometimes it’s an epiphany, like the impact of Big Data on Infrastructure.

Infrastructure Investigation

It’s no secret that I came from the Infrastructure field. I spent many years designing and managing infrastructure projects as a Civil Engineer. Some projects were big, some were small. I traveled to the remotest parts of Montana and North Dakota and worked all over the country. I’ve inspected bridges, roads, and sewers, and written countless reports.

Most of these reports highlighted deficiencies in some piece of infrastructure. We’d take an inventory of the structure, take photos, and make measurements. Then the report would go to an agency, which would use it to secure budget money to fix the problem.

Then I moved to the machine learning startup world, and here I am today. My first move was into Pre-sales, and right before I transferred to the Marketing group, I fielded some interesting queries from potential customers. One was from a major freight railroad and the other from a railroad car inspection company. Both of these organizations capture sensor data and make measurements on their infrastructure assets. They measure the temperature of rail gauges, wear patterns, widths, and hours of use.

"Back by hand""Back by hand"

Big Data Migration

The most interesting part of these queries? The data was migrating from standalone reports into Hadoop clusters. For the first time, at least since I’d been in the industry, data was coming together from all over the place. The only problem was getting the data back out to work on it!

There are now many ways to get the data out and work with it (e.g., Spark, Hive, RapidMiner), but engineering professionals don’t understand them. Ask any manager at an Infrastructure firm what Hadoop is and they won’t know. Some might have heard of data science and data mining, but they might not know what all the hoopla is about.
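Getting the data back out often just means running SQL against the cluster. As a minimal sketch, here is the kind of query a tool like Hive or Spark SQL would run over sensor tables; the table name, columns, readings, and the 2.0 mm threshold are all hypothetical, and an in-memory SQLite database stands in for the cluster so the example is self-contained.

```python
import sqlite3

# Stand-in for a Hive/Spark SQL table: an in-memory SQLite table of
# hypothetical rail sensor readings (asset id, measured wear in mm).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE rail_readings (asset_id TEXT, wear_mm REAL)")
conn.executemany(
    "INSERT INTO rail_readings VALUES (?, ?)",
    [("rail-001", 1.2), ("rail-001", 1.5), ("rail-002", 3.9), ("rail-002", 4.1)],
)

# The same SQL could run against the full cluster: flag assets whose
# average measured wear exceeds an (assumed) inspection threshold.
rows = conn.execute(
    """SELECT asset_id, AVG(wear_mm) AS avg_wear
       FROM rail_readings
       GROUP BY asset_id
       HAVING avg_wear > 2.0"""
).fetchall()
print(rows)
```

The point isn’t the toy data; it’s that once the reports live in one queryable place, a three-line aggregate replaces digging through stacks of standalone inspection reports.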

The hoopla is this.

Engineers use data to design all kinds of things. Imagine if they had access to a deeper pool of stress-strain data for bridges? Or for rails? What if researchers adjusted the mixture ratios of concrete or tempered steel differently to extract more performance, based on terabytes of data from a central research Hadoop cluster?

These scenarios are not far-fetched at all. I went to a presentation two years ago on forecasting flooding from Hurricane Sandy-type events in the NY area. The room was filled with engineers, and the presenter was from Stevens Institute of Technology. He said they run several wave function calculations to help state governments like New Jersey and New York predict where flooding will occur and how severe it will be.

After the presentation I asked him where they got their data, and he said from a group of computers tied together in a cluster.

The Future

The reality is that more Infrastructure companies are collecting ever-increasing amounts of data. They’re using drones to do bridge inspections, tying river gauges together via the Internet, and using more sensors than ever. These sensors (aka IoT) collect and stream this data somewhere. In the old days it was an Access database. Today it’s a more robust database, and one day it will be a big Hadoop cluster. The average Civil Engineer of my time hasn’t heard of a Hadoop cluster, but they’ve heard of Big Data and wonder what it’s about.

Soon they’ll crush the silos of their data stores, unlock innovation, and build their own clusters.

Imagine the world we can build then?
