Run-Length Encoding (RLE)

Found out about this through a Kaggle competition.

Idea: Replace each run of consecutive identical values with a single (value, count) pair.

  • Best when: there are long runs of the same value, like:
    • bitmaps with large solid areas
    • simple masks
    • sparse-ish sequences where zeros repeat
    • certain sensor streams after quantization

Example:
Data: AAAAAABBBCC
Encoded: (A,6)(B,3)(C,2)
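
The example above can be reproduced with a few lines of Python. This is a minimal sketch, not the code from the linked sources: the names rle_encode and rle_decode are made up for illustration, and it assumes the input is a string.

```python
from itertools import groupby


def rle_encode(data):
    """Collapse each run of equal items into a (value, count) pair."""
    return [(value, sum(1 for _ in group)) for value, group in groupby(data)]


def rle_decode(pairs):
    """Expand (value, count) pairs back into the original string."""
    return "".join(value * count for value, count in pairs)


encoded = rle_encode("AAAAAABBBCC")
print(encoded)              # [('A', 6), ('B', 3), ('C', 2)]
print(rle_decode(encoded))  # AAAAAABBBCC
```

groupby only groups adjacent equal items, which is exactly the "consecutive repeats" behaviour RLE needs.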

Pros: Extremely simple, very fast.
Cons: If the data alternates a lot (e.g., ABABAB...), every element becomes its own (value, count) pair, so the encoded output can be larger than the original.
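
A rough way to see the trade-off is to count runs. The sketch below assumes, as a simplification, that each value and each count costs one stored item; count_runs is a hypothetical helper, not part of the linked code.

```python
from itertools import groupby


def count_runs(data):
    """Number of runs of consecutive equal items in data."""
    return sum(1 for _ in groupby(data))


for sample in ("AAAAAABBBCC", "ABABABABABA"):
    runs = count_runs(sample)
    # One value plus one count is stored per run.
    print(f"{sample}: {len(sample)} items in, {2 * runs} items out")
```

Both samples have 11 characters, but the first has 3 runs (6 stored items) while the alternating one has 11 runs (22 stored items).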

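Solution taken from here: https://ccshenyltw.medium.com/run-length-encode-and-decode-a33383142e6b

The code below is the binary-mask variant: instead of (value, count) pairs, it stores only the runs of 1s, as 1-indexed (start, length) pairs over the flattened mask.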

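# ref: https://www.kaggle.com/stainsby/fast-tested-rle
import numpy as np

# DEFAULT_IMAGE_SHAPE is not defined in these notes. The value below is a
# placeholder assumption; set it to the (height, width) of your images.
DEFAULT_IMAGE_SHAPE = (768, 768)

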
def mask_to_rle(img):
    """
    img: numpy array, 1 - mask, 0 - background
    Returns the run-length encoding as a space-separated string of
    "start length" pairs, with 1-indexed starts
    """
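    pixels = img.flatten()
    # Pad with zeros so runs touching either end still produce a transition.
    pixels = np.concatenate([[0], pixels, [0]])
    # 1-indexed positions where the value changes: run starts and run ends alternate.
    runs = np.where(pixels[1:] != pixels[:-1])[0] + 1
    # Turn each end position into a length (end - start).
    runs[1::2] -= runs[::2]
    return ' '.join(str(x) for x in runs)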
 
 
def rle_to_mask(mask_rle: str, shape=DEFAULT_IMAGE_SHAPE):
    """
    mask_rle: run-length encoding as a string of "start length" pairs (1-indexed starts)
    shape: (height, width) of the array to return
    Returns numpy array, 1 - mask, 0 - background
    """
    s = mask_rle.split()
    # Even tokens are 1-indexed run starts, odd tokens are run lengths.
    starts = np.asarray(s[0::2], dtype=int) - 1  # convert to 0-indexed
    lengths = np.asarray(s[1::2], dtype=int)
    ends = starts + lengths
    img = np.zeros(shape[0] * shape[1], dtype=np.uint8)
    for lo, hi in zip(starts, ends):
        img[lo:hi] = 1
    # Row-major reshape, matching img.flatten() in mask_to_rle.
    return img.reshape(shape)
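
The "RLE direction" matters because the same mask gives a different string depending on whether pixels are numbered row by row or column by column. The sketch below is an illustration under assumptions, not the referenced kernel: encode and decode are hypothetical compact variants of the two functions above with an added order parameter ("C" for row-major, "F" for column-major) passed to NumPy's flatten and reshape. Check which order your competition or dataset expects.

```python
import numpy as np


def encode(mask, order="C"):
    """RLE of a binary mask; order="C" scans rows, order="F" scans columns."""
    pixels = np.concatenate([[0], mask.flatten(order=order), [0]])
    runs = np.where(pixels[1:] != pixels[:-1])[0] + 1
    runs[1::2] -= runs[::2]
    return " ".join(str(x) for x in runs)


def decode(rle, shape, order="C"):
    """Inverse of encode; must be called with the same order."""
    s = rle.split()
    starts = np.asarray(s[0::2], dtype=int) - 1  # 1-indexed -> 0-indexed
    lengths = np.asarray(s[1::2], dtype=int)
    flat = np.zeros(shape[0] * shape[1], dtype=np.uint8)
    for lo, n in zip(starts, lengths):
        flat[lo:lo + n] = 1
    return flat.reshape(shape, order=order)


mask = np.array([[0, 1, 1, 0],
                 [0, 0, 1, 1],
                 [1, 0, 0, 0]], dtype=np.uint8)

row_major = encode(mask, "C")
col_major = encode(mask, "F")
print(row_major)  # 2 2 7 3
print(col_major)  # 3 2 7 2 11 1

# Round trip succeeds only when encode and decode agree on the order.
assert np.array_equal(decode(row_major, mask.shape, "C"), mask)
assert np.array_equal(decode(col_major, mask.shape, "F"), mask)
```

Decoding a row-major string with order="F" (or the reverse) still returns an array of the right shape, but with the mask scrambled, so a mismatch is easy to miss without a round-trip check like the one above.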