PointNet

https://arxiv.org/abs/1612.00593. Motivation: there is a need for deep learning that operates directly on 3D data.

3D data can be represented in several ways:

  • Point Cloud
  • Mesh
  • Volumetric
  • Projected view RGB(D)

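A point cloud is the representation closest to raw sensor data (e.g., LiDAR scans or depth cameras). It is also canonical: a simple form that meshes, voxels, and depth maps readily convert into.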

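Key requirement: permutation invariance, meaning the output must not depend on the order in which the points are listed. Among the order-invariance strategies compared in the paper, max pooling gave the best performance.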

PointNet offers a unified architecture for several 3D recognition tasks:

  • Classification
  • Part Segmentation
  • Semantic Segmentation
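A minimal pure-Python sketch of how one network serves both kinds of task, assuming tiny randomly initialised layers in place of PointNet's trained MLPs (`layer`, `classify`, `segment` and all sizes are hypothetical). Classification max-pools the per-point features into one global vector; segmentation concatenates that global vector onto every per-point feature, as the PointNet segmentation network does, so each point gets its own scores.

```python
import random

rng = random.Random(0)

def layer(d_in, d_out, relu=True):
    """Hypothetical randomly initialised layer W x (no bias), optionally followed by ReLU."""
    w = [[rng.uniform(-1.0, 1.0) for _ in range(d_in)] for _ in range(d_out)]
    def forward(x):
        y = [sum(a * b for a, b in zip(row, x)) for row in w]
        return [max(0.0, v) for v in y] if relu else y
    return forward

h = layer(3, 8)                         # shared per-point feature extractor
cls_head = layer(8, 4, relu=False)      # hypothetical: 4 object classes
seg_head = layer(8 + 8, 3, relu=False)  # hypothetical: 3 part labels; input = local ++ global

def max_pool(feats):
    """Max over points, separately for each feature channel."""
    return [max(col) for col in zip(*feats)]

def classify(points):
    """Classification: one score vector for the whole cloud."""
    return cls_head(max_pool([h(p) for p in points]))

def segment(points):
    """Segmentation: one score vector per point (local feature concatenated with global)."""
    feats = [h(p) for p in points]
    g = max_pool(feats)
    return [seg_head(f + g) for f in feats]

cloud = [(0.1, 0.2, 0.3), (0.5, -0.4, 0.0), (-0.3, 0.8, 0.6), (0.9, 0.9, -0.2)]
print(len(classify(cloud)))              # 4
print([len(s) for s in segment(cloud)])  # [3, 3, 3, 3]
```

The per-point extractor `h` is shared by both heads; only the head on top changes with the task.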

The core idea is to build the network from symmetric (permutation-invariant) functions, whose output does not depend on the order of their inputs:

  • e.g., the max or the sum over the points
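A small check of which functions are symmetric, assuming a toy cloud of three 3D points (the function names are illustrative). It evaluates each function on every ordering of the same points and counts distinct outputs: per-channel max and sum give one answer, while "take the first point" depends on the ordering.

```python
import itertools

points = [(0.0, 1.0, 2.0), (3.0, -1.0, 0.5), (1.5, 2.5, -2.0)]

def channel_max(pts):
    """Max over points, independently per coordinate channel (symmetric)."""
    return tuple(max(c) for c in zip(*pts))

def channel_sum(pts):
    """Sum over points per channel (symmetric; these values are exact in floating point)."""
    return tuple(sum(c) for c in zip(*pts))

def first_point(pts):
    """Not symmetric: depends on which point happens to be listed first."""
    return pts[0]

for f in (channel_max, channel_sum, first_point):
    outputs = {f(list(p)) for p in itertools.permutations(points)}
    print(f.__name__, "invariant" if len(outputs) == 1 else "NOT invariant")
# channel_max invariant
# channel_sum invariant
# first_point NOT invariant
```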

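Farthest point sampling (FPS) selects a well-spread subset of points as centroids for local grouping. This comes from the follow-up PointNet++, not the original PointNet.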
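A minimal greedy FPS sketch, assuming Euclidean distance and a caller-chosen starting index (`farthest_point_sampling` and its `start` parameter are hypothetical names). Each step adds the point whose distance to the nearest already-chosen centroid is largest, keeping a running nearest-centroid distance per point so the cost is O(N·k).

```python
import math

def farthest_point_sampling(points, k, start=0):
    """Return indices of k centroids chosen greedily by farthest-point sampling."""
    n = len(points)
    if not 0 < k <= n:
        raise ValueError("k must be in [1, len(points)]")
    chosen = [start]
    # dist[i] = distance from point i to its nearest already-chosen centroid
    dist = [math.dist(p, points[start]) for p in points]
    while len(chosen) < k:
        nxt = max(range(n), key=lambda i: dist[i])  # farthest from all centroids so far
        chosen.append(nxt)
        for i, p in enumerate(points):
            dist[i] = min(dist[i], math.dist(p, points[nxt]))
    return chosen

line = [(0, 0, 0), (1, 0, 0), (2, 0, 0), (10, 0, 0)]
print(farthest_point_sampling(line, 3))  # [0, 3, 2]
```

Compared with uniform random sampling, the chosen centroids spread over the whole shape instead of clustering where points are dense.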

Walkthrough (CS231n 2025 Lec 15)

The invariances a point-cloud network must satisfy

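A point cloud is an unordered set $X = \{x_1, \dots, x_N\}$ with $x_i \in \mathbb{R}^3$ (or $x_i \in \mathbb{R}^6$ when RGB is included). A point-cloud network $f$ must satisfy two invariances: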

  • Permutation invariance: $f(x_1, \dots, x_N) = f(x_{\sigma(1)}, \dots, x_{\sigma(N)})$ for any permutation $\sigma$. The ordering in the input list is arbitrary.
  • Sampling invariance: output should depend only on the underlying geometry, not on which subset of points was sampled.
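One reason max pooling helps with sampling: only points that attain the maximum in some feature channel (the PointNet paper calls these the critical point set) affect the pooled feature, so dropping other points changes nothing. The sketch below assumes a hypothetical fixed per-point feature map in place of PointNet's learned MLP, on a cube's 8 corners plus 3 interior points.

```python
def per_point_features(p):
    """Hypothetical fixed h: a few hand-picked features of (x, y, z)."""
    x, y, z = p
    return (x, y, z, -x, -y, -z, x * x + y * y + z * z)

def global_feature(points):
    """Max over points, per channel."""
    return tuple(max(col) for col in zip(*(per_point_features(p) for p in points)))

def critical_indices(points):
    """Indices of points that attain the max in at least one channel."""
    feats = [per_point_features(p) for p in points]
    g = tuple(max(col) for col in zip(*feats))
    return sorted({i for i, f in enumerate(feats) for c in range(len(g)) if f[c] == g[c]})

corners = [(sx, sy, sz) for sx in (-1.0, 1.0) for sy in (-1.0, 1.0) for sz in (-1.0, 1.0)]
interior = [(0.0, 0.0, 0.0), (0.5, 0.2, -0.3), (-0.4, 0.6, 0.1)]
cloud = corners + interior

crit = critical_indices(cloud)
print(crit)  # [0, 1, 2, 3, 4, 5, 6, 7]: only the cube corners
print(global_feature([cloud[i] for i in crit]) == global_feature(cloud))  # True
```

With a learned `h`, the critical set is whatever the network learns to respond to; the same argument holds.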

Symmetric-function decomposition

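A function is symmetric if it is invariant to permutations of its inputs. Simple examples: $\sum_i x_i$ and $\max_i x_i$. PointNet uses the factorization $f(\{x_1, \dots, x_N\}) = \gamma\big(g(h(x_1), \dots, h(x_N))\big)$, where: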

  • $h$: shared per-point MLP, applied identically to every $x_i$.
  • $g$: a symmetric aggregator (PointNet uses max pooling over the points, separately for each feature channel).
  • $\gamma$: final MLP producing the task output.

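With $g = \max$ and enough feature channels, this form can approximate any continuous symmetric set function arbitrarily well (Theorem 1 of the PointNet paper, with continuity measured in Hausdorff distance).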
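A minimal pure-Python sketch of the $\gamma(\max_i h(x_i))$ factorization, assuming small randomly initialised MLPs with no biases in place of trained ones (`make_mlp`, `pointnet`, and the layer sizes are hypothetical). Shuffling the input leaves the output bit-for-bit identical, because max over the same multiset of values is exact.

```python
import random

def make_mlp(dims, seed):
    """Hypothetical MLP with random weights (no biases), ReLU between layers."""
    rng = random.Random(seed)
    weights = [[[rng.gauss(0.0, d_in ** -0.5) for _ in range(d_in)] for _ in range(d_out)]
               for d_in, d_out in zip(dims[:-1], dims[1:])]
    def forward(x):
        for i, w in enumerate(weights):
            x = [sum(a * b for a, b in zip(row, x)) for row in w]
            if i < len(weights) - 1:
                x = [max(0.0, v) for v in x]  # ReLU on hidden layers only
        return x
    return forward

h = make_mlp([3, 32, 64], seed=0)       # h: shared per-point MLP, R^3 -> R^64
gamma = make_mlp([64, 32, 10], seed=1)  # gamma: global feature -> 10 hypothetical class scores

def pointnet(points):
    feats = [h(p) for p in points]              # apply h to every point independently
    pooled = [max(col) for col in zip(*feats)]  # g: max over points, per channel
    return gamma(pooled)                        # gamma on the order-free global feature

rng = random.Random(42)
cloud = [tuple(rng.uniform(-1.0, 1.0) for _ in range(3)) for _ in range(100)]
shuffled = cloud[:]
rng.shuffle(shuffled)
print(pointnet(cloud) == pointnet(shuffled))  # True
```

Replacing `max` with `sum` would also be symmetric in exact arithmetic, though floating-point sums can differ in the last bits across orderings.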

Point-cloud distances (for generation / reconstruction)

When the prediction is itself a point cloud, you need a permutation-invariant loss:

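  • Chamfer: $d_{CD}(S_1, S_2) = \sum_{x \in S_1} \min_{y \in S_2} \|x - y\|_2^2 + \sum_{y \in S_2} \min_{x \in S_1} \|x - y\|_2^2$. Cheap (only nearest-neighbor lookups), but each point is matched independently, so many points can share one nearest neighbor; this makes it bad at penalizing density mismatches.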
  • Earth Mover’s (EMD): $d_{EMD}(S_1, S_2) = \min_{\phi: S_1 \to S_2} \sum_{x \in S_1} \|x - \phi(x)\|_2$ over bijections $\phi$ (requires $|S_1| = |S_2|$). Expensive, but geometrically meaningful.
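A sketch of both distances exactly as defined above, assuming tiny 2D sets for readability (the code works in any dimension). EMD here brute-forces all $N!$ bijections, which only suits toy inputs; practical code uses an assignment solver (e.g., the Hungarian algorithm) or an approximation. The example shows the density-mismatch weakness: the two sets occupy the same two locations with different multiplicities, so Chamfer sees no difference but EMD does.

```python
import itertools
import math

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def chamfer(s1, s2):
    """Sum of squared nearest-neighbour distances, in both directions."""
    return (sum(min(sq_dist(x, y) for y in s2) for x in s1)
            + sum(min(sq_dist(x, y) for x in s1) for y in s2))

def emd_bruteforce(s1, s2):
    """Min over bijections of summed Euclidean distances (tiny sets only: N! cost)."""
    if len(s1) != len(s2):
        raise ValueError("EMD as defined here needs equal-size sets")
    return min(sum(math.dist(x, y) for x, y in zip(s1, perm))
               for perm in itertools.permutations(s2))

a = [(0.0, 0.0), (0.0, 0.0), (0.0, 0.0), (1.0, 0.0)]  # mostly at the origin
b = [(0.0, 0.0), (1.0, 0.0), (1.0, 0.0), (1.0, 0.0)]  # mostly at (1, 0)
print(chamfer(a, b))         # 0.0: every point has an exact match in the other set
print(emd_bruteforce(a, b))  # 2.0: two points must each move distance 1
```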

Graph extensions

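EdgeConv (Wang et al., TOG 2019) treats points as graph nodes and connects each point to its $k$ nearest neighbors. This lets the per-point function see local geometric context, not just the point itself: an edge feature is computed from $(x_i, x_j - x_i)$ for each neighbor $j$ and then max-pooled over the neighbors.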
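A minimal sketch of one EdgeConv layer, assuming a single random linear layer with ReLU as the edge function $h_\Theta(x_i, x_j - x_i)$ and brute-force kNN in xyz space (`knn`, `edge_mlp`, `edge_conv` and the sizes are hypothetical). Each point's output is the channel-wise max over its $k$ edge features, so it encodes the local neighborhood rather than the point alone.

```python
import math
import random

def knn(points, k):
    """Indices of the k nearest neighbours of each point, excluding the point itself."""
    nbrs = []
    for i, p in enumerate(points):
        order = sorted((j for j in range(len(points)) if j != i),
                       key=lambda j: math.dist(p, points[j]))
        nbrs.append(order[:k])
    return nbrs

rng = random.Random(0)
D_IN, D_OUT = 3, 8
# Hypothetical edge function: one linear layer + ReLU on [x_i, x_j - x_i]
W = [[rng.uniform(-1.0, 1.0) for _ in range(2 * D_IN)] for _ in range(D_OUT)]

def edge_mlp(xi, xj):
    inp = list(xi) + [b - a for a, b in zip(xi, xj)]  # centre point + relative offset
    return [max(0.0, sum(w * v for w, v in zip(row, inp))) for row in W]

def edge_conv(points, k=4):
    """One EdgeConv layer: per point, max over the edge features of its kNN edges."""
    out = []
    for i, nbr in enumerate(knn(points, k)):
        edges = [edge_mlp(points[i], points[j]) for j in nbr]
        out.append([max(col) for col in zip(*edges)])
    return out

cloud = [tuple(rng.uniform(-1.0, 1.0) for _ in range(D_IN)) for _ in range(20)]
features = edge_conv(cloud, k=4)
print(len(features), len(features[0]))  # 20 8
```

In DGCNN the kNN graph is recomputed in the learned feature space at every layer (hence "dynamic"); this sketch builds it once from coordinates.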

Source

CS231n 2025 Lec 15 slides ~63–75 (permutation / sampling invariance, symmetric-function decomposition, max-pool aggregator, Chamfer + EMD, graph-on-points / EdgeConv).