Statistics

Distribution

For each distribution, we try to figure out:

  • What is its support?
  • What is the p.m.f.?
  • What are the parameters?
  • What is its expected value $E[X]$? What are its variance $\mathrm{Var}(X)$ and s.d.?
  • Why is it important?

Discrete vs. Continuous Distributions

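| | Discrete | Continuous |
|---|---|---|
| Expected Value | $E[X] = \sum_x x\,p(x)$ | $E[X] = \int_{-\infty}^{\infty} x\,f(x)\,dx$ |
| Variance | $\mathrm{Var}(X) = \sum_x (x-\mu)^2\,p(x)$ | $\mathrm{Var}(X) = \int_{-\infty}^{\infty} (x-\mu)^2\,f(x)\,dx$ |
| pmf to cdf / cdf to pdf | $F(x) = \sum_{t \le x} p(t)$ | $f(x) = \frac{d}{dx}F(x)$ |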
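
To make the discrete case concrete, here is a minimal sketch that computes the expected value, the variance, and the cdf from a pmf. The fair six-sided die and the function names are my own assumptions for illustration, not part of the notes above.

```python
from fractions import Fraction

# Hypothetical example: a fair six-sided die, stored as {value: probability}.
pmf = {x: Fraction(1, 6) for x in range(1, 7)}

def expected_value(pmf):
    """E[X]: sum of x * p(x) over the support."""
    return sum(x * p for x, p in pmf.items())

def variance(pmf):
    """Var(X): sum of (x - mu)^2 * p(x) over the support."""
    mu = expected_value(pmf)
    return sum((x - mu) ** 2 * p for x, p in pmf.items())

def cdf(pmf, x):
    """F(x) = P(X <= x): running sum of the pmf up to x."""
    return sum(p for t, p in pmf.items() if t <= x)

print(expected_value(pmf))  # 7/2
print(variance(pmf))        # 35/12
print(cdf(pmf, 4))          # 2/3
```

Using `Fraction` keeps the results exact, which makes it easier to check them by hand. The continuous case replaces the sums with integrals.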

Distributions

Concepts

Measures of dispersion and symmetry

  1. Range = max - min
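  2. IQR (inter-quartile range) = Q3 - Q1
  3. Variance $\sigma^2$ (standard deviation $\sigma$ is its square root)
  4. Skewness - measures asymmetry, i.e. whether the longer tail is on the left (negative) or right (positive) side
  5. Kurtosis - measures how heavy the tails are; often compared against the normal distribution, which has kurtosis 3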
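
A rough sketch of the five measures above, using only the standard library. The `describe` helper is a hypothetical name, and I am assuming population (divide-by-n) moments and the "inclusive" quartile method. Other conventions give slightly different numbers.

```python
import statistics

def describe(data):
    """Return the dispersion and symmetry measures for a list of numbers."""
    xs = sorted(data)
    n = len(xs)
    mean = statistics.fmean(xs)

    # Population central moments (assumption: divide by n, not n - 1).
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m3 = sum((x - mean) ** 3 for x in xs) / n
    m4 = sum((x - mean) ** 4 for x in xs) / n

    # Quartiles; the "inclusive" method treats the data as the whole population.
    q1, _median, q3 = statistics.quantiles(xs, n=4, method="inclusive")

    return {
        "range": xs[-1] - xs[0],
        "iqr": q3 - q1,
        "variance": m2,
        "std": m2 ** 0.5,
        "skewness": m3 / m2 ** 1.5,   # 0 for a symmetric sample
        "kurtosis": m4 / m2 ** 2,     # about 3 for normally distributed data
    }

stats = describe([1, 2, 3, 4, 5, 6, 7, 8, 9])
print(stats["range"])     # 8
print(stats["iqr"])       # 4.0
print(stats["skewness"])  # 0.0
```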

1D Distribution

I was having trouble understanding what a 1D distribution was, since I needed one to calculate the Wasserstein metric. As far as I can tell, it is just a distribution over a single variable (a univariate distribution).
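
For the 1D case, the Wasserstein-1 distance between two samples is easy to sketch. This assumes both samples have the same size and every point has equal weight, in which case the distance reduces to the mean absolute difference between the sorted values. The function name is my own. For unequal sizes or weighted samples, `scipy.stats.wasserstein_distance` handles the general case.

```python
def wasserstein_1d(xs, ys):
    """Wasserstein-1 distance between two equal-size, equally weighted 1D samples."""
    if len(xs) != len(ys):
        raise ValueError("this sketch assumes samples of equal size")
    # In 1D the optimal transport plan matches the i-th smallest x
    # with the i-th smallest y, so sorting both sides is enough.
    pairs = zip(sorted(xs), sorted(ys))
    return sum(abs(a - b) for a, b in pairs) / len(xs)

# Shifting every point by 5 moves the distribution by exactly 5.
print(wasserstein_1d([0, 1, 3], [5, 6, 8]))  # 5.0
```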

2D Distribution

This is where we talk about multivariate distributions. https://en.wikipedia.org/wiki/Multivariate_normal_distribution

Distance between Distributions

How do you quantify how far apart two distributions are? There seem to be quite a few methods; the ones below come from a Stack Overflow answer.

I ran into this problem while working on my Poker AI, since Euclidean distance is not the best measure.

Heuristic

  • Minkowski-form
  • Weighted-Mean-Variance (WMV)
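
A minimal sketch of the Minkowski-form distance between two histograms, where r = 1 gives Manhattan distance and r = 2 gives Euclidean distance. The histograms below are made up for illustration. They also show why a bin-by-bin distance can be a poor fit: it does not care how far apart the bins are.

```python
def minkowski(p, q, r=2):
    """Minkowski-form distance: (sum |p_i - q_i|^r)^(1/r), compared bin by bin."""
    return sum(abs(a - b) ** r for a, b in zip(p, q)) ** (1 / r)

# Hypothetical histograms with all mass in a single bin.
h1 = [1, 0, 0, 0]
h2 = [0, 1, 0, 0]  # mass moved one bin over
h3 = [0, 0, 0, 1]  # mass moved three bins over

# Both come out as sqrt(2): the distance ignores that h3 is "further" from h1.
print(minkowski(h1, h2) == minkowski(h1, h3))  # True
```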

Nonparametric test statistics

  • χ² (Chi-Square)
  • Kolmogorov-Smirnov (KS)
  • Cramér-von Mises (CvM)
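
As an illustration of one of these, here is a sketch of the two-sample Kolmogorov-Smirnov statistic: the largest gap between the two empirical cdfs. It only computes the statistic, not a p-value, and the function name is my own. `scipy.stats.ks_2samp` is the usual ready-made option.

```python
from bisect import bisect_right

def ks_statistic(xs, ys):
    """Two-sample KS statistic: max over t of |F_x(t) - F_y(t)|."""
    xs, ys = sorted(xs), sorted(ys)
    d = 0.0
    # The empirical cdfs only change at observed points, so checking
    # every point from both samples is sufficient.
    for t in xs + ys:
        fx = bisect_right(xs, t) / len(xs)  # fraction of xs <= t
        fy = bisect_right(ys, t) / len(ys)  # fraction of ys <= t
        d = max(d, abs(fx - fy))
    return d

print(ks_statistic([1, 2, 3, 4], [3, 4, 5, 6]))  # 0.5
print(ks_statistic([1, 2], [10, 20]))            # 1.0
```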

Information-theory divergences

  • KL Divergence
  • Jensen–Shannon divergence (its square root is a metric)
  • Jeffrey-divergence (numerically stable and symmetric)
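
A small sketch of these three divergences for discrete distributions given as probability lists. The function names are my own. For the Jeffreys divergence I am assuming the symmetrized-KL definition, KL(p‖q) + KL(q‖p). Some texts use the name for a variant measured against the average distribution instead, which is closer to Jensen-Shannon.

```python
import math

def kl(p, q):
    """KL(p || q) = sum p_i * log(p_i / q_i), with 0 * log(0) taken as 0."""
    total = 0.0
    for a, b in zip(p, q):
        if a == 0:
            continue          # zero-probability terms contribute nothing
        if b == 0:
            return math.inf   # p puts mass where q has none
        total += a * math.log(a / b)
    return total

def js(p, q):
    """Jensen-Shannon divergence: average KL of p and q to their mixture m."""
    m = [(a + b) / 2 for a, b in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

def jeffreys(p, q):
    """Assumed definition: symmetrized KL, KL(p || q) + KL(q || p)."""
    return kl(p, q) + kl(q, p)

p = [0.5, 0.5]
q = [0.9, 0.1]
print(kl(p, q), kl(q, p))      # not equal: KL is asymmetric
print(js(p, q), js(q, p))      # equal: JS is symmetric
print(js([1, 0], [0, 1]))      # log(2), the maximum with natural logs
```

Because the mixture m is nonzero wherever p or q is, JS stays finite even when the supports differ, while plain KL blows up to infinity.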

Ground distance measures