Descriptor
There are LOTS of popular feature descriptors
 HOG: Histogram of Oriented Gradients
 SIFT: Scale Invariant Feature Transform
 SURF: Speeded-Up Robust Features
 GLOH: Gradient Location and Orientation Histogram
 BRIEF: Binary Robust Independent Elementary Features
 ORB: Oriented FAST and rotated BRIEF
 BRISK: Binary Robust Invariant Scalable Keypoints
 FREAK: Fast REtinA Keypoint
Nowadays
There are also learning-based methods, which learn the features instead of hand-engineering them.
SIFT: Scale Invariant Feature Transform
 Hand-engineered
Image content is transformed into features that are invariant to
 image translation
 rotation
 scale
Partially invariant to
 illumination changes
 affine transformations and 3D projections
Suitable for detecting visual landmarks
 from different angles and distances
 with a different illumination
So how does it achieve that?
We are about to find out.
 The keypoint (position $p$, scale $s$, rotation $r$) is viewpoint dependent
 The 128-dim descriptor $f$ is mainly viewpoint independent
So how does it work? The illustrations in Cyrill Stachniss's lecture are super helpful.
So high level:
So basically, you split the region around the keypoint into a 4x4 grid of cells, compute an 8-bin histogram of gradient orientations (weighted by gradient magnitude) in each cell, and flatten the 16 histograms into one 128-dim vector.
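To make that concrete, here is a minimal NumPy sketch of a SIFT-like descriptor for a single 16x16 patch. It skips the real pipeline's rotation normalization, Gaussian weighting, and trilinear interpolation; the function name and patch size are my choices, not from the lecture.

```python
import numpy as np

def sift_like_descriptor(patch):
    """SIFT-style descriptor sketch: 4x4 cells x 8 orientation bins = 128 dims.

    patch: 16x16 grayscale array centered on the keypoint (assumed size).
    """
    gy, gx = np.gradient(patch.astype(float))   # image gradients
    mag = np.hypot(gx, gy)                      # gradient magnitude
    ori = np.arctan2(gy, gx)                    # gradient orientation in [-pi, pi]

    desc = []
    for i in range(4):
        for j in range(4):
            cell = np.s_[4 * i:4 * i + 4, 4 * j:4 * j + 4]
            # 8-bin orientation histogram, weighted by gradient magnitude
            hist, _ = np.histogram(ori[cell], bins=8, range=(-np.pi, np.pi),
                                   weights=mag[cell])
            desc.extend(hist)
    desc = np.asarray(desc)
    return desc / (np.linalg.norm(desc) + 1e-8)  # normalize for partial illumination invariance
```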
But you can sometimes have wrong data associations, such as in the example below
You can do a ratio test to eliminate some of the wrong matches.
3-step test to eliminate ambiguous matches for a query feature $q$ (a code sketch follows the list):
 Step 1: Find the two closest descriptors to $q$, called $p_1$ and $p_2$, based on the Euclidean distance $d$
 Step 2: Test if the distance to the best match is below a threshold: $d(q, p_1) < T$
 Step 3: Accept the match only if the best match is substantially better than the second: $\frac{d(q, p_1)}{d(q, p_2)} < \frac{1}{2}$
 Lowe's ratio test works well
 There will still remain a few outliers
 Outliers require extra treatment
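A minimal sketch of the 3-step test, assuming the candidate descriptors are rows of a NumPy array; the default values for $T$ and the ratio are assumptions (the notes use $\frac{1}{2}$ for the ratio):

```python
import numpy as np

def ratio_test_match(q, descriptors, T=0.5, ratio=0.5):
    """Return the index of the accepted match for query q, or None.

    q: (D,) query descriptor; descriptors: (N, D) candidate descriptors.
    T and ratio are assumed defaults, tune per application.
    """
    d = np.linalg.norm(descriptors - q, axis=1)  # Euclidean distances to all candidates
    i1, i2 = np.argsort(d)[:2]                   # Step 1: closest p1 and second closest p2
    if d[i1] >= T:                               # Step 2: best match close enough?
        return None
    if d[i1] / d[i2] >= ratio:                   # Step 3: clearly better than second best?
        return None
    return i1
```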
Binary Descriptors
This is about computing descriptors fast.
The issue is that SIFT takes a pretty long time to compute, so it is often not the way to go for real-time systems.
 GPU implementations of SIFT exist, but it is still compute-heavy
Fairly simple strategy
 Select a patch around a keypoint
 Select a set of pixel pairs in that patch
 For each pair, compare the intensities
$b = \begin{cases} 1 & \text{if } I(s_1) < I(s_2) \\ 0 & \text{otherwise} \end{cases}$
 $I(s_{1})$ refers to the intensity of pixel $s_{1}$
Concatenate all $b$'s into a bit string (a code sketch follows).
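A sketch of that strategy in NumPy; the patch is assumed to be already smoothed, and `pairs` is whatever sampling geometry you pick (see below):

```python
import numpy as np

def brief_descriptor(patch, pairs):
    """BRIEF-style binary descriptor sketch.

    patch: smoothed grayscale patch around the keypoint.
    pairs: sequence of ((r1, c1), (r2, c2)) pixel coordinates.
    """
    # b = 1 if I(s1) < I(s2), else 0, for each pair
    bits = [1 if patch[r1, c1] < patch[r2, c2] else 0
            for (r1, c1), (r2, c2) in pairs]
    return np.packbits(bits)  # concatenate bits into a compact byte array
```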
There is a simple strategy for how the pairs are chosen.
Key Advantages of Binary Descriptors
 Compact descriptor: The number of pairs gives the length in bits
 Fast to compute: Simple intensity value comparisons
 Trivial and fast to compare: Hamming distance, $d_{\text{Hamming}}(B_1, B_2) = \operatorname{sum}(\operatorname{xor}(B_1, B_2))$ (snippet below)
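For descriptors packed into byte arrays as in the sketch above, the Hamming distance is literally the formula:

```python
import numpy as np

def hamming_distance(B1, B2):
    """d_hamming(B1, B2) = sum(xor(B1, B2)), i.e. the number of differing bits."""
    return int(np.unpackbits(np.bitwise_xor(B1, B2)).sum())
```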
The key question is: how do we select the pixel pairs to compare?
Different Binary Descriptors Differ Mainly by the Strategy of Selecting the Pairs.
BRIEF: Binary Robust Independent Elementary Features
 First binary image descriptor
 Proposed in 2010
 256-bit descriptor
 So there are 256 different pairs
 Provides five different geometries as sampling strategies
 Noise: operations are performed on a smoothed image to deal with noise
BRIEF Sampling Pairs
 G I: Uniform random sampling
 G II: Gaussian sampling
 G III: $s_{1}$ Gaussian; $s_{2}$ Gaussian centered around $s_{1}$
 G IV: Discrete locations from a coarse polar grid
 G V: $s_1 = (0,0)$; $s_2$ takes all locations from a coarse polar grid
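As an example, here is a sketch of geometry G II (both points drawn from a Gaussian around the patch center); the sigma fraction and patch size are assumed parameters, and the output format matches the `brief_descriptor` sketch above:

```python
import numpy as np

def gaussian_pairs(n_pairs=256, patch_size=32, sigma_frac=0.2, seed=0):
    """Sample n_pairs pixel pairs, both points ~ N(center, sigma) (BRIEF G II)."""
    rng = np.random.default_rng(seed)
    center = patch_size // 2
    sigma = sigma_frac * patch_size              # assumed spread
    pts = rng.normal(center, sigma, size=(n_pairs, 2, 2))
    pts = np.clip(np.round(pts), 0, patch_size - 1).astype(int)
    return [((p[0, 0], p[0, 1]), (p[1, 0], p[1, 1])) for p in pts]
```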
Now, the issue is that BRIEF is not rotation invariant. So we introduce ORB.
ORB: Oriented FAST and Rotated BRIEF. An extension to BRIEF that
 Adds rotation compensation
 Learns the optimal sampling pairs
I skipped the notes here…
ORB vs. SIFT
 ORB is 100x faster than SIFT
 ORB: 256 bit vs. SIFT: 4096 bit
 ORB is not scale invariant (achievable via an image pyramid)
 ORB is mainly in-plane rotation invariant
 ORB has a similar matching performance as SIFT (w/o scale)
 Several modern online systems (e.g. SLAM) use binary features; a minimal OpenCV matching sketch is below
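For reference, a minimal OpenCV sketch of ORB matching with Hamming distance; the image filenames are placeholders:

```python
import cv2

img1 = cv2.imread("img1.png", cv2.IMREAD_GRAYSCALE)  # placeholder filenames
img2 = cv2.imread("img2.png", cv2.IMREAD_GRAYSCALE)

orb = cv2.ORB_create(nfeatures=1000)  # uses an image pyramid internally for scale
kp1, des1 = orb.detectAndCompute(img1, None)
kp2, des2 = orb.detectAndCompute(img2, None)

# Binary descriptors -> Hamming distance; crossCheck is a simple outlier filter
bf = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
matches = sorted(bf.match(des1, des2), key=lambda m: m.distance)
```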