Neural Market Trends

Understanding SSD MultiBox — Real-Time Object Detection In Deep Learning


My Notes:

  • AlexNet kicked off CNN-based image recognition in 2012
  • Deep learning now matches or beats humans at image classification on benchmarks such as ImageNet, BUT
  • Humans do more than classify: they localize and classify each element of an image
  • Example: there is a dog and a cat in an image against a wooden fence, not just “dog” or “cat”
  • Region-CNNs (R-CNNs) draw bounding boxes around the objects in an image and output a class for each box, e.g. “person”, “dog”, etc.
  • Training R-CNNs is unwieldy and takes a long time
  • Training is done in multiple phases (e.g. region proposal vs. classification)
  • Inference (scoring) on new, non-training data is too slow for real-time use
  • For background, read up on R-CNN, Fast R-CNN, and Faster R-CNN
  • New architectures have been devised to address these bottlenecks
  • Those are YOLO (You Only Look Once) and SSD (Single Shot Detector)
  • Single Shot - objects are localized and classified in a single forward pass
  • MultiBox - the bounding box regression technique
  • Detector - the network is an object detector that also classifies the objects it detects
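
To make the "single forward pass" idea concrete, here is a minimal NumPy sketch. It is a toy, not the real SSD architecture: the function name `single_shot_head`, the random `features` and `weights`, and the layout of 4 box values followed by class scores are all assumptions made for illustration. The point is only that one pass produces both the box coordinates and the class scores for every candidate box.

```python
import numpy as np

def single_shot_head(features, weights, num_classes):
    """Toy detection head (hypothetical): one matrix multiply yields
    both box coordinates and class scores for every candidate box.

    features: (num_boxes, feature_dim) array, one row per candidate box.
    weights:  (feature_dim, 4 + num_classes) array standing in for learned weights.
    """
    assert weights.shape[1] == 4 + num_classes
    raw = features @ weights          # the single forward pass
    boxes = raw[:, :4]                # localization: 4 values per candidate box
    logits = raw[:, 4:]               # classification: one score per class
    # Softmax turns the raw class scores into probabilities.
    exp = np.exp(logits - logits.max(axis=1, keepdims=True))
    probs = exp / exp.sum(axis=1, keepdims=True)
    labels = probs.argmax(axis=1)     # most likely class per box
    scores = probs.max(axis=1)        # confidence of that class
    return boxes, labels, scores

# Random stand-ins for real image features and trained weights (assumptions).
rng = np.random.default_rng(0)
features = rng.normal(size=(3, 8))
weights = rng.normal(size=(8, 4 + 2))
boxes, labels, scores = single_shot_head(features, weights, num_classes=2)
```

Each row of `boxes` lines up with one entry in `labels` and `scores`, so localization and classification come out of the same pass rather than from separate stages.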

  • MultiBox confidence loss: cross-entropy
  • MultiBox location loss: L2-norm
  • Training SSD - use the PASCAL VOC and COCO datasets
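
The two loss terms in the notes can be sketched in a few lines of NumPy. This is a rough illustration under stated assumptions, not the reference implementation: the function names, the averaging over boxes, and the `alpha` weight that balances the two terms are all choices made here, and the matching of predicted boxes to ground truth boxes is assumed to have been done already.

```python
import numpy as np

def confidence_loss(class_probs, true_labels):
    """Mean cross-entropy between predicted class probabilities and true labels."""
    eps = 1e-12  # avoids log(0)
    picked = class_probs[np.arange(len(true_labels)), true_labels]
    return float(-np.mean(np.log(picked + eps)))

def location_loss(pred_boxes, true_boxes):
    """Mean L2 norm of the difference between predicted and ground truth boxes."""
    return float(np.mean(np.linalg.norm(pred_boxes - true_boxes, axis=1)))

def multibox_loss(class_probs, true_labels, pred_boxes, true_boxes, alpha=1.0):
    """Combined loss; alpha (an assumed hyperparameter) weights the location term."""
    return (confidence_loss(class_probs, true_labels)
            + alpha * location_loss(pred_boxes, true_boxes))

# Made-up example: two boxes, two classes.
class_probs = np.array([[0.9, 0.1], [0.2, 0.8]])
true_labels = np.array([0, 1])
pred_boxes = np.array([[0.0, 0.0, 1.0, 1.0], [0.0, 0.0, 2.0, 2.0]])
true_boxes = np.array([[0.0, 0.0, 1.0, 1.0], [3.0, 4.0, 2.0, 2.0]])

total = multibox_loss(class_probs, true_labels, pred_boxes, true_boxes)
```

The confidence term shrinks as the network assigns more probability to the correct class, and the location term shrinks as the predicted boxes move toward the ground truth ones.
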
Read the entire source article here.
