Metrics
==================
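
* **Expected value** (French: *espérance*): noted :math:`\mathbb{E}[X]`, it is a **theoretical value**, the
  probability-weighted average of the values that :math:`X` can take. For example, for a fair coin flip where
  :math:`X=1` for heads and :math:`X=0` for tails, :math:`\mathbb{E}[X]=1\times 0.5+0\times 0.5=0.5`.
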
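
A minimal R sketch of the coin example, assuming heads is encoded as 1 and tails as 0 (an arbitrary encoding chosen for illustration). The theoretical value is the probability-weighted sum of the outcomes; the mean of many simulated flips should come close to it.

.. code-block:: R

   # Hypothetical encoding: heads = 1, tails = 0, each with probability 0.5
   values <- c(1, 0)
   probs <- c(0.5, 0.5)

   # Theoretical expected value: probability-weighted sum of the outcomes
   expected <- sum(values * probs)
   print(expected)  # [1] 0.5

   # Empirical check: the mean of many simulated flips approaches E[X]
   set.seed(42)
   flips <- sample(values, size = 100000, replace = TRUE, prob = probs)
   print(mean(flips))
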

Variance
------------------

Variance can be seen as the expected squared deviation of a random variable :math:`X` from its expected value.

.. math::

   \mathbb{V}[X]=\mathbb{E}\left[(X-\mathbb{E}[X])^2\right]=\frac{\sum_{i=1}^n (x_i - \mathbb{E}[X])^2}{n}=\mathrm{Cov}(X,X)

The sum form assumes that :math:`X` takes :math:`n` equally likely values :math:`x_1,\dots,x_n`.

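
A small R sketch of the formula above, on hypothetical data where each value is treated as one equally likely outcome. Note that R's built-in ``var()`` divides by :math:`n-1` (sample variance), so it has to be rescaled by :math:`(n-1)/n` to match the :math:`1/n` formula.

.. code-block:: R

   # Hypothetical data; each value is treated as an equally likely outcome
   x <- c(2, 4, 4, 4, 5, 5, 7, 9)
   n <- length(x)

   # Variance as the mean squared deviation from the mean (divides by n, as in the formula)
   var_pop <- sum((x - mean(x))^2) / n
   print(var_pop)  # [1] 4

   # Cov(X, X) with the same 1/n convention gives the same value
   cov_xx <- sum((x - mean(x)) * (x - mean(x))) / n
   stopifnot(isTRUE(all.equal(var_pop, cov_xx)))

   # R's var() divides by n - 1, so rescale it before comparing
   print(var(x) * (n - 1) / n)  # [1] 4
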

Covariance
------------------

Covariance is a way to quantify the relationship between two random variables :math:`X` and
:math:`Y` (`source <https://www.youtube.com/watch?v=qtaqvPAeEJY>`_). Covariance **DOES NOT**
quantify how strong this relationship is, because its value depends on the units of :math:`X` and :math:`Y`.
If covariance is:

Positive
   Large values of :math:`X` tend to go with large values of :math:`Y`.
Negative
   Large values of :math:`X` tend to go with small values of :math:`Y`.
Null
   No linear relationship (:math:`X` and :math:`Y` are uncorrelated, which does not imply that they are independent).

.. math::

   \mathrm{Cov}(X,Y)=\mathbb{E}[(X-\mathbb{E}[X])(Y-\mathbb{E}[Y])]=\frac{\sum_{i=1}^n (x_i - \mathbb{E}[X])(y_i - \mathbb{E}[Y])}{n}

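
A quick R check of the sign rules above, using a hypothetical helper ``cov_n`` that implements the :math:`1/n` formula (R's own ``cov()`` divides by :math:`n-1`). The last line illustrates why covariance alone does not measure strength: rescaling :math:`Y` multiplies the covariance while the relationship stays the same.

.. code-block:: R

   # Hypothetical helper: covariance with the 1/n convention used in the formula above
   cov_n <- function(x, y) {
     stopifnot(length(x) == length(y))
     sum((x - mean(x)) * (y - mean(y))) / length(x)
   }

   x <- c(1, 2, 3, 4, 5)
   y_up <- c(2, 4, 5, 8, 10)    # tends to increase with x
   y_down <- c(10, 8, 5, 4, 2)  # tends to decrease with x

   print(cov_n(x, y_up))    # [1] 4
   print(cov_n(x, y_down))  # [1] -4

   # Same relationship, y expressed in a 100x smaller unit: covariance is 100 times larger
   print(cov_n(x, 100 * y_up))
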

Standard deviation
-----------------------

Standard deviation provides a way to interpret the variance in the same unit as :math:`X` (the variance itself is
expressed in the squared unit).

.. math::

   \sigma=\sqrt{\mathbb{V}[X]}

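
A short R sketch with hypothetical heights in metres: the variance comes out in :math:`m^2`, while its square root is back in metres and scales linearly when the unit changes.

.. code-block:: R

   # Hypothetical heights in metres
   heights_m <- c(1.60, 1.70, 1.75, 1.80, 1.90)
   n <- length(heights_m)

   var_m2 <- sum((heights_m - mean(heights_m))^2) / n  # variance, in m^2
   sd_m <- sqrt(var_m2)                               # standard deviation, back in m

   print(var_m2)
   print(sd_m)

   # Same data in centimetres: the standard deviation is simply multiplied by 100
   heights_cm <- 100 * heights_m
   sd_cm <- sqrt(sum((heights_cm - mean(heights_cm))^2) / n)
   stopifnot(isTRUE(all.equal(sd_cm, 100 * sd_m)))

With these values the variance is about 0.01 :math:`m^2` and the standard deviation about 0.1 m (10 cm), which is directly comparable to the heights themselves.
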

Standard Error of the Mean
-----------------------------

The Standard Error of the Mean (SEM) quantifies the error that is potentially made when using the mean of
:math:`n` observations to estimate :math:`\mathbb{E}[X]`. It is the standard deviation of the sample mean :math:`\overline{X}`:

.. math::

   \mathrm{SEM}=\sigma_{\overline{X}}=\sqrt{\frac{\mathbb{V}[X]}{n}}=\frac{\sigma}{\sqrt{n}}

Here is how to interpret it:

* If :math:`n=1`, the SEM equals :math:`\sqrt{\mathbb{V}[X]}=\sigma_X`, the standard deviation of :math:`X`.
* As :math:`n` increases, the SEM decreases, proportionally to :math:`1/\sqrt{n}`.
* The greater :math:`\mathbb{V}[X]`, the greater :math:`n` needs to be to ensure a low :math:`\mathrm{SEM}`.

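
A worked example, with made-up numbers for illustration: with :math:`\sigma=10`, quadrupling :math:`n` halves the SEM,
and since :math:`n=(\sigma/\mathrm{SEM})^2`, a variable with twice the standard deviation needs four times as many
observations to reach the same SEM.

.. math::

   \sigma=10:\quad n=25 \Rightarrow \mathrm{SEM}=\frac{10}{\sqrt{25}}=2, \qquad n=100 \Rightarrow \mathrm{SEM}=\frac{10}{\sqrt{100}}=1

   \sigma=20:\quad \mathrm{SEM}=1 \Rightarrow n=\left(\frac{20}{1}\right)^2=400
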

More information in `this video <https://www.youtube.com/watch?v=BwYj69LAQOI>`_.
If it is still unclear, see the following R code:

.. literalinclude:: code/sem.R
   :language: R

Output example:

.. code-block:: console

   ----- Experiment 1 -----
   Means SD: 1.22
   SEM 1.26
   ----- Experiment 2 -----
   Means SD: 1.26
   SEM 1.26
   ----- Experiment 3 -----
   Means SD: 1.27
   SEM 1.26

Degree of Freedom
--------------------

The degree of freedom is a quantity defined for a given computation.
It corresponds to the number of values that are free to vary in that computation,
in other words, how many independent values contribute to it.
For example, when computing the mean of the observations :math:`X=\{x_1,\dots,x_n\}`, all :math:`n` values
are free to vary in the following formula:

.. math::

   \overline{x}=\frac{\sum_{i=1}^n x_i}{n}

Thus, the degree of freedom in this computation is :math:`n`.

When computing the (sample) standard deviation of :math:`X`, we have:

.. math::

   \hat{\sigma}_x=\sqrt{\frac{\sum_{i=1}^n (x_i-\overline{x})^2}{n-1}}

In this case, the degree of freedom is :math:`n-1`. As the mean :math:`\overline{x}` is already known, only :math:`n-1`
of the :math:`x_i` are free to vary: knowing :math:`n-1` of the :math:`x_i`, we can deduce the last
one as follows:

.. math::

   \overline{x}=\frac{(\sum_{i=1}^{n-1} x_i) + x_n}{n} \Longrightarrow x_n = n\overline{x} - (\sum_{i=1}^{n-1} x_i)

Dividing the sum of squared deviations by these :math:`n-1` degrees of freedom rather than by :math:`n` makes
:math:`\hat{\sigma}_x^2` an unbiased estimator of the variance.
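
An R sketch on a hypothetical sample: the last value can be recovered from the mean and the first :math:`n-1` values,
the deviations from the mean always sum to zero (so only :math:`n-1` of them are free), and R's ``sd()`` uses the
:math:`n-1` denominator.

.. code-block:: R

   # Hypothetical sample
   x <- c(3, 7, 8, 12, 15)
   n <- length(x)
   x_bar <- mean(x)

   # Pretend the last value is unknown: it is fully determined by the mean and the other n - 1 values
   x_n <- n * x_bar - sum(x[1:(n - 1)])
   stopifnot(isTRUE(all.equal(x_n, x[n])))

   # Deviations from the mean always sum to zero
   print(sum(x - x_bar))  # [1] 0

   # Sample standard deviation with n - 1 degrees of freedom matches R's sd()
   s <- sqrt(sum((x - x_bar)^2) / (n - 1))
   stopifnot(isTRUE(all.equal(s, sd(x))))
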