Bias vs. Variance

Statistics

I don't know about others, but when I started digging into machine learning I had trouble understanding bias and variance until I found a nice target-shooting analogy.

Target

Our little target chart :) plus a few shared parameters (marker color and size).

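  import numpy as np
  # Shared styling for the shot markers
  color = hue(1)
  size = 50
  # Start from an empty canvas, then add three concentric rings centered at (0, 0)
  p = plot([], figsize=7)
  p += plot(circle((0,0), 1))
  p += plot(circle((0,0), 3))
  p += plot(circle((0,0), 6))
  # Display the target
  p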

High bias and high variance

The worst-case scenario: hits are all over the place (high variance) and far away from the center of the target, skewed to the top right (high bias).

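  # "variance" is the side length of the square the hits are drawn from,
  # "bias" is the offset of that square from the center (0, 0)
  variance = 5
  bias = 3
  # 15 uniform hits in the square [bias, bias + variance) on both axes
  samples = variance * np.random.random_sample((15, 2)) + bias
  hbhv = p + plot(point(samples, rgbcolor=color, size=size))
  hbhv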

/img/bvsv/bvsv-hbhv.png
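
To put rough numbers on the picture, here is a minimal sketch in plain NumPy (no Sage needed). It assumes that "bias" means the distance from the average hit to the center of the target and that "variance" means the summed per-axis variance of the hits around their own average. The helper name `bias_and_spread` and the fixed seed are made up for this illustration; the sampling line follows the recipe used in this post.

```python
import numpy as np

def bias_and_spread(samples, target=(0.0, 0.0)):
    """Return (bias, spread) for an (n, 2) array of hits.

    bias   : distance from the mean hit to the target center
    spread : sum of the per-axis variances around the mean hit
    """
    samples = np.asarray(samples, dtype=float)
    mean_hit = samples.mean(axis=0)
    bias = float(np.linalg.norm(mean_hit - np.asarray(target, dtype=float)))
    spread = float(samples.var(axis=0).sum())
    return bias, spread

# Hypothetical seed, only so the sketch is repeatable
rng = np.random.default_rng(0)

# Same recipe as the post: uniform hits in the square [3, 3 + 5) on both axes
hits = 5 * rng.random((15, 2)) + 3

b, s = bias_and_spread(hits)
print(f"bias={b:.2f}, spread={s:.2f}")
```

Because the hits are drawn from a square that starts at the offset, the cloud is centered roughly at offset + width / 2, so the measured bias grows with both parameters.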

High bias and low variance

This time the shots are focused (low variance) within a small area that is still far away from the center of the target (high bias).

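  # Smaller square (tighter group), same offset from the center
  variance = 2
  bias = 3
  # 15 uniform hits in the square [bias, bias + variance) on both axes
  samples = variance * np.random.random_sample((15, 2)) + bias
  hblv = p + plot(point(samples, rgbcolor=color, size=size))
  hblv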

/img/bvsv/bvsv-hblv.png

Low bias and high variance

This time all shots are closer to the center (low bias) but still spread all over the place (high variance).

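  # Large square (wide group), small offset from the center
  variance = 4
  bias = 0.5
  # 15 uniform hits in the square [bias, bias + variance) on both axes
  samples = variance * np.random.random_sample((15, 2)) + bias
  lbhv = p + plot(point(samples, rgbcolor=color, size=size))
  lbhv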

/img/bvsv/bvsv-lbhv.png

Low bias and low variance

The best-case scenario: shots are focused (low variance) and very close to the center (low bias).

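  # Small square (tight group), small offset from the center
  variance = 1.5
  bias = 0.5
  # 15 uniform hits in the square [bias, bias + variance) on both axes
  samples = variance * np.random.random_sample((15, 2)) + bias
  lblv = p + plot(point(samples, rgbcolor=color, size=size))
  lblv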

/img/bvsv/bvsv-lblv.png
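
For a side-by-side comparison, the sketch below reruns the four setups in plain NumPy and checks a handy identity: the mean squared distance of the hits from the center equals the squared bias plus the spread. Here "bias" is taken as the distance from the average hit to (0, 0) and "spread" as the summed per-axis variance, which is one reasonable reading of the analogy rather than the only one. The `scenarios` dictionary, the `results` dictionary and the seed are assumptions made for this example; the (variance, bias) pairs are the ones used in this post.

```python
import numpy as np

# Hypothetical seed, only so the sketch is repeatable
rng = np.random.default_rng(42)

# (variance, bias) pairs as used in the four sections of the post
scenarios = {
    "high bias, high variance": (5, 3),
    "high bias, low variance": (2, 3),
    "low bias, high variance": (4, 0.5),
    "low bias, low variance": (1.5, 0.5),
}

results = {}
for name, (width, offset) in scenarios.items():
    # Same recipe as the post: uniform hits in [offset, offset + width)
    hits = width * rng.random((15, 2)) + offset

    mean_hit = hits.mean(axis=0)
    bias_sq = float(mean_hit @ mean_hit)        # squared distance of mean hit from (0, 0)
    spread = float(hits.var(axis=0).sum())      # variance around the mean hit
    mse = float((hits ** 2).sum(axis=1).mean()) # mean squared distance from (0, 0)

    results[name] = (bias_sq, spread, mse)
    print(f"{name:26s} bias^2={bias_sq:6.2f}  spread={spread:5.2f}  mse={mse:6.2f}")
```

In every row, bias^2 + spread adds up to mse (up to floating-point rounding), which is the target-shooting version of the usual bias-variance decomposition.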

Happy shooting!!!