Semantic Segmentation

Traditional technique using Graph Based Image Segmentation

Template from Kaggle:

From Article

There are 3 popular architectures:

  1. Fully Convolutional Network (FCN)
  2. U-Net (originally from the biomedical imaging field)
  3. Mask R-CNN

In segmentation, we group adjacent regions that are similar to each other according to some criterion, such as color or texture.
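
To make the "group adjacent similar regions" idea concrete, here is a minimal sketch of region growing on a tiny grayscale image. It is not the graph-based algorithm mentioned above, just a simplified stand-in; the function name `group_regions`, the 4-connectivity, and the intensity `threshold` are assumptions made for illustration.

```python
from collections import deque


def group_regions(image, threshold):
    """Label 4-connected regions whose neighbouring pixels differ by <= threshold.

    `image` is a list of rows of grayscale intensities (illustrative input format).
    Returns (labels, number_of_regions).
    """
    rows, cols = len(image), len(image[0])
    labels = [[-1] * cols for _ in range(rows)]  # -1 means "not yet assigned"
    next_label = 0
    for r in range(rows):
        for c in range(cols):
            if labels[r][c] != -1:
                continue
            # Breadth-first flood fill starting from this unlabelled seed pixel.
            labels[r][c] = next_label
            queue = deque([(r, c)])
            while queue:
                y, x = queue.popleft()
                for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    ny, nx = y + dy, x + dx
                    if (0 <= ny < rows and 0 <= nx < cols
                            and labels[ny][nx] == -1
                            and abs(image[ny][nx] - image[y][x]) <= threshold):
                        labels[ny][nx] = next_label
                        queue.append((ny, nx))
            next_label += 1
    return labels, next_label


image = [
    [10, 11, 200, 201],
    [12, 10, 199, 202],
    [90, 91, 92, 200],
]
labels, n_regions = group_regions(image, threshold=5)
for row in labels:
    print(row)
```

Here the similarity criterion is just the intensity difference between neighbours; swapping in a color or texture distance would follow the same pattern.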


  • Used for Lane Detection
  • Wow, this is actually very commonly used. You can treat 2D-to-3D conversion as a semantic segmentation problem, where each class is a particular depth; heard this idea from this Computerphile video.
  • Face layers, each depth is like a different class

Selective Search

Image segmentation is the process of determining which pixels in an image belong to which object class.

Label each pixel in the image with a category label. We don’t differentiate instances, we only care about pixels (i.e. if there are two cows, both are simply labeled "cow").

Semantic image segmentation marks all pixels belonging to a given class, but doesn’t separate the individual objects of that class from one another.
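
A small sketch of what a semantic segmentation output looks like: one class id per pixel, with both cows sharing the same id. The class ids and the helper names `class_mask` and `pixel_counts` are made up for this illustration.

```python
# Hypothetical class ids for this sketch.
BACKGROUND, COW, GRASS = 0, 1, 2

# One class id per pixel. The two cow blobs (left and right) share id 1,
# so nothing in this map says where one cow ends and the other begins.
label_map = [
    [0, 0, 0, 0, 0, 0],
    [0, 1, 1, 0, 1, 1],
    [2, 1, 1, 2, 1, 1],
    [2, 2, 2, 2, 2, 2],
]


def class_mask(label_map, class_id):
    """Binary mask (1/0) of all pixels carrying the given class id."""
    return [[1 if value == class_id else 0 for value in row] for row in label_map]


def pixel_counts(label_map):
    """Number of pixels per class id."""
    counts = {}
    for row in label_map:
        for value in row:
            counts[value] = counts.get(value, 0) + 1
    return counts


cow_mask = class_mask(label_map, COW)
counts = pixel_counts(label_map)
print(cow_mask)
print(counts)
```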


  • Hough transform (see OpenCV resource)
  • Canny Edge Detection (see OpenCV resource)
  • Run-Length Encoding
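
For the run-length encoding item, a minimal sketch of encoding a binary mask as (start, length) pairs. The conventions here (row-major flattening, 1-indexed starts) are assumptions for this example; some competitions use column-major order instead, so check the required format before reusing this.

```python
def rle_encode(mask):
    """Encode a binary mask as a list of (start, length) runs of 1s.

    Assumed conventions: row-major flattening, 1-indexed start positions.
    """
    flat = [value for row in mask for value in row]
    runs = []
    start = None  # index where the current run of 1s began, if any
    for i, value in enumerate(flat):
        if value and start is None:
            start = i
        elif not value and start is not None:
            runs.append((start + 1, i - start))
            start = None
    if start is not None:  # run that reaches the end of the mask
        runs.append((start + 1, len(flat) - start))
    return runs


def rle_decode(runs, rows, cols):
    """Inverse of rle_encode under the same conventions."""
    flat = [0] * (rows * cols)
    for start, length in runs:
        for i in range(start - 1, start - 1 + length):
            flat[i] = 1
    return [flat[r * cols:(r + 1) * cols] for r in range(rows)]


mask = [
    [0, 1, 1, 0],
    [0, 0, 1, 1],
    [1, 0, 0, 1],
]
runs = rle_encode(mask)
print(runs)
```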

CNN for Semantic Segmentation

We can use ConvNets to solve the image segmentation problem. Usually, there is downsampling and upsampling inside the network, kind of like an Autoencoder?

  • Downsampling, e.g. using pooling
  • Upsampling
    • Bed of nails
    • Max Unpooling
    • Transposed Convolution (learnable upsampling; it is just another convolution)
      • Other names: deconvolution, upconvolution, fractionally strided convolution, backward strided convolution
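
The down/upsampling options listed above can be sketched in plain Python on a toy 4x4 input. This is only an illustration under simplifying assumptions: 2x2 windows with stride 2, zeros as the fill value, and a 1D transposed convolution with a fixed kernel (in a real network the kernel weights would be learned). All function names are made up for this sketch.

```python
def max_pool_2x2(x):
    """2x2 max pooling with stride 2; also returns the (row, col) of each maximum."""
    pooled, indices = [], []
    for r in range(0, len(x), 2):
        pooled_row, index_row = [], []
        for c in range(0, len(x[0]), 2):
            window = [(x[r + dr][c + dc], (r + dr, c + dc))
                      for dr in (0, 1) for dc in (0, 1)]
            value, position = max(window)
            pooled_row.append(value)
            index_row.append(position)
        pooled.append(pooled_row)
        indices.append(index_row)
    return pooled, indices


def bed_of_nails(x, factor=2):
    """Put each value in the top-left corner of its factor x factor block, zeros elsewhere."""
    out = [[0] * (len(x[0]) * factor) for _ in range(len(x) * factor)]
    for r, row in enumerate(x):
        for c, value in enumerate(row):
            out[r * factor][c * factor] = value
    return out


def max_unpool(pooled, indices, rows, cols):
    """Put each pooled value back where the maximum was found, zeros elsewhere."""
    out = [[0] * cols for _ in range(rows)]
    for pooled_row, index_row in zip(pooled, indices):
        for value, (r, c) in zip(pooled_row, index_row):
            out[r][c] = value
    return out


def transposed_conv1d(x, kernel, stride=2):
    """Each input value scales a copy of the kernel; overlapping outputs are summed."""
    out = [0] * ((len(x) - 1) * stride + len(kernel))
    for i, value in enumerate(x):
        for k, weight in enumerate(kernel):
            out[i * stride + k] += value * weight
    return out


x = [
    [1, 2, 6, 3],
    [3, 5, 2, 1],
    [1, 2, 2, 1],
    [7, 3, 4, 8],
]
pooled, indices = max_pool_2x2(x)
nails = bed_of_nails(pooled)
unpooled = max_unpool(pooled, indices, rows=4, cols=4)
upsampled_1d = transposed_conv1d([1, 2], kernel=[1, 1, 1], stride=2)
print(pooled, nails, unpooled, upsampled_1d, sep="\n")
```

Note how max unpooling needs the positions remembered from the matching pooling layer, while bed of nails always uses a fixed position, and the transposed convolution sums contributions where kernel copies overlap.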

Supplementary Reading: ConvNets for Semantic Segmentation
  • Badrinarayanan, V., Kendall, A., & Cipolla, R. (2015). SegNet: A deep convolutional encoder-decoder architecture for image segmentation. arXiv preprint arXiv:1511.00561.
  • Zhao, H., Shi, J., Qi, X., Wang, X., & Jia, J. (2017, July). Pyramid scene parsing network. In IEEE Conf. on Computer Vision and Pattern Recognition (CVPR) (pp. 2881-2890). (State of the art at the time of publication)

From article

Object Detection + Semantic Segmentation = Instance Segmentation (this is really cool)

  • Mask R-CNN (2017) is used for instance segmentation. Not sure if it is still state of the art.
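
A toy sketch of the "detection + semantic segmentation = instance segmentation" intuition: take a semantic label map, take bounding boxes from a detector, and give each box's pixels of the matching class their own instance id. This is only an illustration of the idea, not how Mask R-CNN actually works (it predicts a mask per detected region with a dedicated branch). The function name and the box format `(class_id, top, left, bottom, right)` with exclusive bottom/right are assumptions.

```python
def instances_from_boxes(label_map, detections):
    """Assign an instance id to pixels of the detected class inside each box.

    detections: list of (class_id, top, left, bottom, right), with bottom and
    right exclusive (hypothetical format for this sketch).
    Returns a map with 0 for "no instance" and 1, 2, ... for each detection.
    """
    rows, cols = len(label_map), len(label_map[0])
    instance_map = [[0] * cols for _ in range(rows)]
    for instance_id, (class_id, top, left, bottom, right) in enumerate(detections, start=1):
        for r in range(top, bottom):
            for c in range(left, right):
                # Keep the first assignment if boxes overlap.
                if label_map[r][c] == class_id and instance_map[r][c] == 0:
                    instance_map[r][c] = instance_id
    return instance_map


COW = 1  # hypothetical class id
label_map = [
    [0, 0, 0, 0, 0, 0],
    [0, 1, 1, 0, 1, 1],
    [0, 1, 1, 0, 1, 1],
    [0, 0, 0, 0, 0, 0],
]
detections = [(COW, 1, 1, 3, 3), (COW, 1, 4, 3, 6)]  # two detected cows
instance_map = instances_from_boxes(label_map, detections)
for row in instance_map:
    print(row)
```

The semantic map alone labels both cows with the same id; the boxes are what let the two be told apart.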

Image segmentation tasks can be classified into three groups based on the amount and type of information they convey:

  1. Semantic segmentation
  2. Instance segmentation
  3. Panoptic segmentation

The history:

Encoder-decoder architectures for semantic segmentation became popular with the arrival of works like SegNet (by Badrinarayanan et al.) in 2015.