Standard Deviation and Variance

Deviation

Definition

Deviation is the distance from the mean of a particular piece of data

Variance

Definition

The variance of a dataset is the average of the squared distances from the mean

where:

  • is the population size
  • is the mean of the dataset

Re-writing this with expected value, given a random variable :

Properties

Standard Deviation

Definition

Standard deviation is the measure of how spread out numbers are. It is defined as the square root of the vairance.

Question - Why squared differences?

Suppose we have a dataset containing .
If we just sum the differences from the mean, we would have , which won't work.
If we try absolute value, we have , which looks fine.

But consider the dataset . The mean is still , but if we use absolute value, we find our variance to also be . But this can't be right, since the data is more spread out.

Instead, if we square the numbers instead, we find:

and

so the standard deviation will be larger if the data is more spread out.