AlphaGo vs Lee Sedol

There are lots of great videos on YouTube of the match between AlphaGo and Lee Sedol. I consider them historic because this level of AI seems almost humanlike in how it thinks about the game. Wikipedia has a great summary of the first 4 games. After those four games, it’s AlphaGo 3 wins, Lee 1 win.

Game 1

Game 2

Game 3

Game 4

Game 5

Update: I write about how cool and scary AlphaGo is in my post here.

Spark vs Hadoop

There’s a lot of hullabaloo about Spark vs Hadoop for Big Data these days. If you’re rushing to stand up a Big Data cluster, you’ve probably heard about this new Spark technology. The simplest way to think about the difference is that Hadoop (MapReduce) is for batch jobs, while Spark can do both batch and stream processing. However, the biggest promise of Spark is the ability to code in Scala, Python (PySpark), and soon R (SparkR).

Dynamic programming languages like Python have opened up new ways to program, letting you develop algorithms interactively non-stop instead of the write/compile/test/debug cycle of C, not to mention chasing the inevitable memory management bugs. (Smart Data Collective)
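To make the batch vs stream distinction concrete, here is a minimal plain-Python sketch. It does not use the Spark or Hadoop APIs; the function names and the sample `events` list are made up for illustration. The batch version sees the whole dataset before producing one answer, while the streaming version keeps a running answer that is updated as each record arrives.

```python
from collections import Counter
from typing import Iterable, Iterator


def batch_word_count(lines: Iterable[str]) -> Counter:
    """Batch style: process the complete dataset, then return one result."""
    counts = Counter()
    for line in lines:
        counts.update(line.split())
    return counts


def streaming_word_count(lines: Iterable[str]) -> Iterator[Counter]:
    """Stream style: emit an updated result after every incoming record."""
    counts = Counter()
    for line in lines:
        counts.update(line.split())
        yield counts.copy()  # snapshot of the running totals so far


# Hypothetical sample data, standing in for log lines or messages.
events = ["spark hadoop", "spark streaming", "hadoop batch"]

batch_result = batch_word_count(events)
snapshots = list(streaming_word_count(iter(events)))
```

Once the stream has been fully consumed, the last snapshot matches the batch result. The difference is that the streaming version had a usable answer after every record.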

While I don’t see Spark supplanting Hadoop, since both commonly sit on top of the HDFS data storage system, I do see Spark being leveraged to make that Hadoop elephant dance on a pinhead.

As Mr. Schmitz so eloquently pointed out in the comments, Hadoop and Spark can’t supplant each other; they coexist. What I meant to say in my last paragraph is that Spark will really let you leverage your Hadoop environment!

Forecast Transit Delays with Big Data

I used to work in the transportation field, especially with railroads. Recovering from delays and forecasting transit delays were always tough tasks because of the “ripple effects” that this article mentions, but mathematician Wilhelm Landerholm figured out an algorithm to forecast delays 2 hours in advance!

Enter big data. Cars on the highway suffer from two problems: there is no monitoring system for tracking their movements and they are operated independently. Commuter train systems, however, do not have these defects. In fact, modern networks have traffic control centers with computer systems keeping track of each train’s location at all times. Ten years ago, this mountain of data would have been unassailable, but with today’s faster machines and this new algorithm it is possible to make accurate predictions about the future state of the train network in a longer time window. It’s a bit like weather forecasting but for your commute. (ed. emphasis mine)

While forecasting these delays is similar to forecasting the weather (and we all know how inaccurate that can be at times), it’s definitely a step in the right direction. Big Data has a lot of promise, but it always comes down to the quality of the data you have. Before you jam every data point into your Hadoop cluster, think about how you will use it later. Garbage in always equals garbage out.
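As a rough illustration of the "garbage in, garbage out" point, here is a small Python sketch that filters records before they are loaded anywhere. The record schema (`train_id`, `station`, `timestamp`, `delay_minutes`) and the plausibility bounds are my own assumptions for the example; they do not come from the article or from any real transit feed.

```python
from datetime import datetime

# Hypothetical schema for a train position/delay record.
REQUIRED_FIELDS = ("train_id", "station", "timestamp", "delay_minutes")

# Assumed plausibility bounds: up to 1 hour early, up to 24 hours late.
MIN_DELAY, MAX_DELAY = -60, 24 * 60


def is_valid(record: dict) -> bool:
    """Return True only if the record is complete and plausible."""
    # Reject records with missing or empty required fields.
    if any(record.get(field) in (None, "") for field in REQUIRED_FIELDS):
        return False
    try:
        datetime.fromisoformat(record["timestamp"])  # must parse as ISO 8601
        delay = float(record["delay_minutes"])       # must be numeric
    except (TypeError, ValueError):
        return False
    return MIN_DELAY <= delay <= MAX_DELAY


# Made-up sample records, one good and three bad.
records = [
    {"train_id": "T1", "station": "Central", "timestamp": "2016-03-14T08:15:00", "delay_minutes": 4},
    {"train_id": "T2", "station": "", "timestamp": "2016-03-14T08:16:00", "delay_minutes": 1},
    {"train_id": "T3", "station": "North", "timestamp": "not a time", "delay_minutes": 2},
    {"train_id": "T4", "station": "South", "timestamp": "2016-03-14T08:20:00", "delay_minutes": 100000},
]

clean = [r for r in records if is_valid(r)]
```

Only the first record survives the filter. In practice you would also want to log the rejected records, since the reasons data gets rejected often tell you what is wrong upstream.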

Tl;dr: Big Data is used to forecast transit delays.