What's new in Driverless AI?

2018-12-26 00:00:00 AI Machine Learning h2oai

Arno, H2O’s CTO, gave a great 1+ hour overview of what’s new in Driverless AI version 1.4.1. If you check back in a few weeks or months, it’ll be even better. In all honesty, I have never seen a company innovate this fast.

Below are my notes from the video:

H2O-3 is the open source product
Driverless AI is the commercial product
Does Feature Engineering for you
When you have Domain Knowledge, Feature Engineering can give you a huge lift
Salary, Job Title, Zip Code example
For people in the same Zip Code with the same # of cars, generate the mean of their salaries as a new feature
Create out-of-fold estimates
Don’t use a row’s own target value when building its feature for training
Written in Python; CUDA and C++ run under the hood, directed by Python
Able to create good models in an automated way
Driverless AI does not handle images
Handles strings, numbers, and categorical data
Datasets can be hundreds of gigabytes
Creates hundreds of models with thousands of new features
Creates an ensemble model after it’s done
Then creates an exportable model (Java runtime or Python)
C++ version is being worked on
All standalone models
Connect with Python client or via the web browser
Changelog is on docs.h2o.ai
Tests against Kaggle datasets
On the BNP Paribas Kaggle dataset, Driverless AI ranked in the top 10 out of the box
It took Driverless AI 2 hours, whereas it took Grandmasters 2 months
Discussed how Logloss is interpreted
Uses Reusable Holdout (RH) and subsamples of RH
Driverless AI uses unsupervised methods to make supervised models
Uses XGBoost, GLM, LightGBM, TensorFlow CNN, and RuleFit
Uses R’s data.table for feature engineering and munging
Working on an open source version of R’s data.table in Python
Overview of how Driverless AI handles outliers (AutoViz)
AutoViz only plots what you should see, not hundreds of scatterplots like Tableau
Overview of the GUI and what you can do with it
Validation and Test sets: how and when to use them
Checks for data shift between the training and testing sets
Includes Machine Learning Interpretability suite
Does Time Series and NLP
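
To make the out-of-fold idea from the Zip Code and # of cars example more concrete, here is a minimal sketch in plain Python. It is not Driverless AI's implementation; the function name, the fold assignment by row index, and the toy data are all my own assumptions for illustration. Each row gets the mean salary of its group computed only from rows in the other folds, so a row never sees its own target.

```python
from collections import defaultdict


def out_of_fold_mean_encode(groups, targets, n_folds=3):
    """Encode each row's group as the mean target of that group,
    computed only from rows that fall in the other folds."""
    n = len(groups)
    encoded = [0.0] * n
    for fold in range(n_folds):
        sums = defaultdict(float)
        counts = defaultdict(int)
        total, total_count = 0.0, 0
        # Gather statistics from every row NOT in the held-out fold.
        for i in range(n):
            if i % n_folds != fold:
                sums[groups[i]] += targets[i]
                counts[groups[i]] += 1
                total += targets[i]
                total_count += 1
        fallback = total / total_count if total_count else 0.0
        # Fill in the held-out rows using only those statistics.
        for i in range(n):
            if i % n_folds == fold:
                key = groups[i]
                # Groups unseen outside this fold get the out-of-fold overall mean.
                encoded[i] = sums[key] / counts[key] if counts[key] else fallback
    return encoded


# Hypothetical toy data: (zip code, number of cars) paired with a salary.
groups = [("94043", 2), ("94043", 2), ("94043", 2),
          ("10001", 1), ("10001", 1), ("60601", 0)]
salaries = [100.0, 120.0, 140.0, 60.0, 80.0, 50.0]
encoded = out_of_fold_mean_encode(groups, salaries, n_folds=3)
for group, salary, feature in zip(groups, salaries, encoded):
    print(group, salary, feature)
```

Note how the first row (salary 100) is encoded from the other two rows in its group only. Using the plain group mean instead would leak each row's own salary into its feature, which is the mistake the notes above warn about.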

And much more! Arno’s presentation style is excellent and he makes Data Science easy to understand.