From ee62e13342af25af50d569d3b8e20bfbd3959ac6 Mon Sep 17 00:00:00 2001
From: Loic Guegan
Date: Mon, 16 Oct 2023 12:13:52 +0200
Subject: [PATCH] Minor changes

---
 source/statistics/bessel_correction.rst | 42 ++++++++++++++++++++++++-
 source/statistics/notations.rst         |  4 +++
 2 files changed, 45 insertions(+), 1 deletion(-)

diff --git a/source/statistics/bessel_correction.rst b/source/statistics/bessel_correction.rst
index e24d727..c9ef0a9 100644
--- a/source/statistics/bessel_correction.rst
+++ b/source/statistics/bessel_correction.rst
@@ -1,7 +1,47 @@
 .. _bessel_correction:
 
+This page is inspired by `Wikipedia `__.
+
 Bessel's Correction
 -----------------------
 
-Bessel's correction is the use of :math:`n-1` instead of :math:`n` in the formula for the sample
+Bessel's correction is the use of :math:`n-1` instead of :math:`n` in the formulas for sample
 variance and sample standard deviation.
+Using :math:`n` as the denominator leads to a biased estimator of the population variance.
+This biased estimator is denoted :math:`s^2_n`.
+Since :math:`\mathbb{E}\left[\frac{1}{n}\sum_{i=1}^n (x_i - \mu)^2\right] = \sigma^2`, let's compute the expected discrepancy between the population variance and the biased sample variance:
+
+.. math::
+    \mathbb{E}[\sigma^2-s_n^2] &= \mathbb{E} \left[ \frac{1}{n} \sum_{i=1}^n(x_i - \mu)^2 - \frac{1}{n}\sum_{i=1}^n (x_i - \overline{x})^2 \right]
+
+    &=\mathbb{E}\left[ \frac{1}{n} \sum_{i=1}^n\left((x_i^2 - 2 x_i \mu + \mu^2) - (x_i^2 - 2 x_i \overline{x} + \overline{x}^2)\right) \right]
+
+    &=\mathbb{E}\left[ \frac{1}{n} \sum_{i=1}^n\left(\mu^2 - \overline{x}^2 + 2 x_i (\overline{x}-\mu) \right) \right]
+
+    &=\mathbb{E}\left[ \frac{1}{n} \sum_{i=1}^n \left(\mu^2 - \overline{x}^2 \right) + \frac{1}{n} \sum_{i=1}^n 2 x_i (\overline{x} - \mu) \right]
+
+    &=\mathbb{E}\left[ \mu^2 - \overline{x}^2 + \frac{1}{n} \sum_{i=1}^n 2 x_i (\overline{x} - \mu) \right]
+
+    &=\mathbb{E}\left[ \mu^2 - \overline{x}^2 + 2\overline{x}(\overline{x} - \mu) \right]
+
+    &=\mathbb{E}\left[ \mu^2 - 2 \overline{x} \mu + \overline{x}^2 \right]
+
+    &= \mathbb{E}\left[ (\overline{x} - \mu)^2 \right]
+
+    &= \mathrm{Var}[\overline{x}]
+
+    &= \frac{\sigma^2}{n}
+
+
+This result shows that, in expectation, the biased sample variance underestimates the population variance by :math:`\frac{\sigma^2}{n}` (the last step assumes the :math:`x_i` are independent draws, so that :math:`\mathrm{Var}[\overline{x}] = \frac{\sigma^2}{n}`).
+From this result we can deduce how :math:`s_n^2` must be adjusted:
+
+.. math::
+    \mathbb{E} \left[ s^2_n \right] = \sigma^2 - \frac{\sigma^2}{n} = \frac{n-1}{n} \sigma^2
+
+Thus, :math:`s^2_n` is biased by a factor of :math:`\frac{n-1}{n}`. Dividing by this factor gives an unbiased estimator of :math:`\sigma^2`:
+
+.. math::
+    s^2 = \frac{s^2_n}{\frac{n-1}{n}} = \frac{n}{n-1} s_n^2 &= \frac{n}{n-1} \frac{1}{n} \sum_{i=1}^n (x_i-\overline{x})^2
+
+    &= \frac{1}{n-1} \sum_{i=1}^n (x_i-\overline{x})^2
diff --git a/source/statistics/notations.rst b/source/statistics/notations.rst
index d111887..90ab557 100644
--- a/source/statistics/notations.rst
+++ b/source/statistics/notations.rst
@@ -26,15 +26,19 @@ Two different notation conventions are used. The one to use depends if you are w
    * - Metric
      - Population
      - Sample
+     - Notes
    * - Sample mean
      - :math:`\mu`
      - :math:`\overline{x}`
+     -
    * - Variance
      - :math:`\sigma^2`
      - :math:`s^2`
+     - :math:`s^2_n` without `Bessel's Correction `__
    * - Standard deviation
      - :math:`\sigma`
      - :math:`s`
+     - :math:`s_n` without `Bessel's Correction `__
 
 To determine the metric of a population (say :math:`\mu`) using a sample, we use an estimator.
 In the case of :math:`\mu`, we use :math:`\overline{x}` as an estimator.
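A quick numerical check of the Bessel's correction derivation in this patch can be sketched in Python (the patch itself contains no code, so the language is our choice). The helpers `biased_variance`, `unbiased_variance` and `simulate` are hypothetical names introduced only for illustration, and the Gaussian population with sigma = 2 and sample size n = 5 is an arbitrary assumption. Averaged over many samples, the biased estimator s_n^2 should land near (n-1)/n * sigma^2 = 3.2, while the Bessel-corrected s^2 should land near sigma^2 = 4. The standard library's `statistics.pvariance` (divides by n) and `statistics.variance` (divides by n - 1) follow the same two conventions.

```python
import random
import statistics


def biased_variance(xs):
    """s_n^2: mean squared deviation from the sample mean (divides by n)."""
    n = len(xs)
    mean = sum(xs) / n
    return sum((x - mean) ** 2 for x in xs) / n


def unbiased_variance(xs):
    """s^2: Bessel-corrected sample variance (divides by n - 1, needs n >= 2)."""
    n = len(xs)
    # s^2 = n / (n - 1) * s_n^2, the last step of the derivation.
    return biased_variance(xs) * n / (n - 1)


def simulate(n=5, sigma=2.0, trials=100_000, seed=0):
    """Average both estimators over many samples of size n drawn from N(0, sigma^2).

    The Gaussian population is an assumption made for illustration only.
    """
    rng = random.Random(seed)
    total_biased = 0.0
    total_unbiased = 0.0
    for _ in range(trials):
        xs = [rng.gauss(0.0, sigma) for _ in range(n)]
        total_biased += biased_variance(xs)
        total_unbiased += unbiased_variance(xs)
    return total_biased / trials, total_unbiased / trials


if __name__ == "__main__":
    n, sigma = 5, 2.0
    mean_biased, mean_unbiased = simulate(n, sigma)
    print(f"sigma^2                    = {sigma ** 2:.3f}")
    print(f"(n-1)/n * sigma^2          = {(n - 1) / n * sigma ** 2:.3f}")
    print(f"average of s_n^2 (biased)  = {mean_biased:.3f}")
    print(f"average of s^2 (corrected) = {mean_unbiased:.3f}")

    # The standard library exposes both conventions.
    xs = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
    print(statistics.pvariance(xs), biased_variance(xs))   # both divide by n
    print(statistics.variance(xs), unbiased_variance(xs))  # both divide by n - 1
```

Building `unbiased_variance` on top of `biased_variance` mirrors the relation s^2 = n/(n-1) * s_n^2; the simulated averages depend on the seed and only approach the theoretical values as `trials` grows.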