When you’re dealing with a classification problem in machine learning, good labeled data is crucial. The more time you spend labeling training data correctly, the better. This is because your model’s performance and deployment will depend on it. Always remember that garbage in means garbage out.
Thoughts on labeling data
I recently listened to a great O’Reilly podcast on this subject. They interviewed Lukas Biewald, Chief Data Scientist and Founder of CrowdFlower. CrowdFlower provides its clients with top-notch labeled training data for various machine learning tasks, and they’re busy!
A few bits that caught my ear: how much of that training data ends up being used for deep learning, and that they’re seeing more labeled image data for self-driving cars.
The best part of the interview was Lukas’s discussion on using a Raspberry Pi with TensorFlow! How cool is that?
Found this great real estate podcast from Fred Wilson’s AVC blog. Great listen if you’re into real estate. This episode talks about the NYC real estate market and what’s happening with it. I really like the “hot / room temperature / cold” game for all the different Manhattan areas.