Metrics
==================
Expected value
---------------
The expected value (*espérance*), denoted :math:`\mathbb{E}[X]`, is a **theoretical value**.
For example, when flipping a fair coin, the expected value of getting heads (counted as 1) is 0.5.
For a random variable :math:`X` with :math:`n` possible outcomes :math:`x_1,\cdots,x_n`, occurring with respective
probabilities :math:`p_1,\cdots,p_n`, we have:

.. math::

   \mathbb{E}[X]=x_1p_1+x_2p_2+\cdots+x_np_n
When working with a sample, the following is an unbiased estimator of the expected value
(`source <https://stats.stackexchange.com/questions/518084/whats-the-difference-between-the-mean-and-expected-value-of-a-normal-distributi>`__):

.. math::

   \overline{x}=\frac{\sum_{i=1}^n x_i}{n}
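
As a quick illustrative sketch (assuming R and a made-up simulation of fair coin flips), the sample mean
gets close to the theoretical expected value of 0.5 as the sample grows:

.. code-block:: R

   # Simulate fair coin flips (1 = heads, 0 = tails); the sample mean
   # should approach the theoretical expected value E[X] = 0.5.
   set.seed(42)
   flips <- rbinom(n = 10000, size = 1, prob = 0.5)
   mean(flips)  # close to 0.5
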
Variance
------------------
Variance can be seen as the expected squared deviation of a random variable :math:`X` from its expected value.

.. math::

   \mathbb{V}[X]=\mathbb{E}\left[(X-\mathbb{E}[X])^2\right]=\frac{\sum_{i=1}^n (x_i - \mathbb{E}[X])^2}{n}=\mathrm{Cov}(X,X)
When working with a sample, the following is an unbiased estimator of the variance:

.. math::

   s^2=\frac{\sum_{i=1}^n(x_i-\overline{x})^2}{n-1}

To understand why the denominator is :math:`n-1`, see :ref:`Bessel's correction <bessel_correction>`.
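
As a minimal sketch (assuming R and a made-up sample), R's ``var()`` already applies the :math:`n-1`
denominator, which can be checked against the manual computation:

.. code-block:: R

   # var() uses the unbiased (n - 1) denominator by default.
   x <- c(2, 4, 4, 4, 5, 5, 7, 9)
   n <- length(x)
   sum((x - mean(x))^2) / (n - 1)  # manual computation
   var(x)                          # same value
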
Covariance
------------------
Covariance is a way to quantify the relationship between two random variables :math:`X` and
:math:`Y` (`source <https://www.youtube.com/watch?v=qtaqvPAeEJY>`__). Covariance indicates the direction of the
relationship but **DOES NOT** quantify how strong it is! If the covariance is:
Positive
   For large values of :math:`X`, :math:`Y` tends to take large values
Negative
   For large values of :math:`X`, :math:`Y` tends to take low values
Null
   No linear relationship between :math:`X` and :math:`Y`

.. math::

   \mathrm{Cov}(X,Y)=\mathbb{E}[(X-\mathbb{E}[X])(Y-\mathbb{E}[Y])]=\frac{\sum_{i=1}^n (x_i - \mathbb{E}[X])(y_i - \mathbb{E}[Y])}{n}
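
As a small illustrative sketch (assuming R and made-up vectors), the sign of ``cov()`` gives the direction of the
relationship, while its magnitude depends on the units; ``cor()`` is the unit-free measure of strength:

.. code-block:: R

   # Two made-up samples that increase together: covariance is positive,
   # but its magnitude changes with the units, unlike the correlation.
   x <- c(1, 2, 3, 4, 5)
   y <- c(2, 4, 5, 4, 6)
   cov(x, y)        # positive: large x tends to go with large y
   cov(x, y * 100)  # same relationship, much larger covariance
   cor(x, y)        # strength of the linear relationship, unit-free
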
Standard deviation
-----------------------
The standard deviation expresses the variance in the same unit as :math:`X`, which makes it easier to interpret.

.. math::

   \sigma=\sqrt{\mathbb{V}[X]}
When working with a sample, the square root of the unbiased variance estimator is commonly used to estimate the
standard deviation:

.. math::

   s=\sqrt{\frac{\sum_{i=1}^n(x_i-\overline{x})^2}{n-1}}

To understand why the denominator is :math:`n-1`, see :ref:`Bessel's correction <bessel_correction>`.
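
A one-line check in R (with the same kind of made-up sample) confirms that ``sd()`` is simply the square root of
``var()``, so it shares the :math:`n-1` denominator:

.. code-block:: R

   # sd() is the square root of var(), hence the n - 1 denominator.
   x <- c(2, 4, 4, 4, 5, 5, 7, 9)
   sd(x)
   sqrt(var(x))  # identical result
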
.. _sem:
Standard Error of the Mean
-----------------------------
The Standard Error of the Mean (SEM) quantifies how much a sample mean is expected to deviate from the true
population mean.

.. math::

   \mathrm{SEM}=\sigma_{\overline{X}}=\sqrt{\frac{\mathbb{V}[X]}{n}}=\frac{\sigma}{\sqrt{n}}
When working with a sample of :math:`n` individuals, an estimator of the SEM is:

.. math::

   s_{\overline{x}}=\frac{s}{\sqrt{n}}
Here is how to interpret it:

* If :math:`n=1`, the SEM equals :math:`\sqrt{\mathbb{V}[X]}=\sigma_X`, the standard deviation of :math:`X`.
* The more :math:`n` increases, the lower the error becomes.
* The greater :math:`\mathbb{V}[X]`, the greater :math:`n` needs to be to keep the :math:`\mathrm{SEM}` low.

More details in `this video <https://www.youtube.com/watch?v=BwYj69LAQOI>`_.
If it is still unclear, see the following R code:

.. literalinclude:: code/sem.R
   :language: R
Output example:

.. code-block:: console

   ----- Experiment 1 -----
   Means SD: 1.22
   SEM 1.26
   ----- Experiment 2 -----
   Means SD: 1.26
   SEM 1.26
   ----- Experiment 3 -----
   Means SD: 1.27
   SEM 1.26
Degrees of Freedom
--------------------
The degrees of freedom of a computation is the number of values that are free to vary in that computation;
in other words, how many independently varying values contribute to it.
For example, when computing the mean of a sample :math:`x_1,\ldots,x_n` of a random variable :math:`X`, all
:math:`n` values are free to vary in the following formula:

.. math::

   \overline{x}=\frac{\sum_{i=1}^n x_i}{n}

Thus, the number of degrees of freedom in this computation is :math:`n`.
When computing the variance (and hence the standard deviation) of :math:`X`, we have:

.. math::

   s^2=\frac{\sum_{i=1}^n (x_i-\overline{x})^2}{n}

In this case, the number of degrees of freedom is :math:`n-1`. As the mean is already known, only :math:`n-1`
of the :math:`x_i` are free to vary. In fact, by knowing :math:`n-1` of the :math:`x_i`, we can deduce the last
one as follows:

.. math::

   \overline{x}=\frac{\left(\sum_{i=1}^{n-1} x_i\right) + x_n}{n} \Longrightarrow x_n = n\overline{x} - \sum_{i=1}^{n-1} x_i
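
As a tiny sketch in R (with a made-up sample), knowing the mean and all but one of the values pins down the last
one, which is why one degree of freedom is lost:

.. code-block:: R

   # Knowing the mean and n - 1 of the values determines the last one,
   # so the deviations from the mean carry only n - 1 degrees of freedom.
   x <- c(3, 7, 1, 9, 5)
   n <- length(x)
   x_bar <- mean(x)
   n * x_bar - sum(x[1:(n - 1)])  # recovers x[n] = 5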