Metrics
=======

Expected value
--------------

The expected value, denoted :math:`\mathbb{E}[X]`, is a **theoretical value**.
For example, when flipping a fair coin (encoding heads as 1 and tails as 0),
the expected value is 0.5.

For a random variable :math:`X` with :math:`n` possible outcomes
:math:`x_1,\cdots,x_n` occurring with respective probabilities
:math:`p_1,\cdots,p_n`, we have:

.. math::

   \mathbb{E}[X]=x_1p_1+x_2p_2+\cdots+x_np_n

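As a quick numeric check, here is a short Python sketch (a hypothetical
example, not part of the original material) evaluating this formula for a
fair six-sided die:

.. code-block:: python

   from fractions import Fraction

   # Expected value of a fair six-sided die: E[X] = x1*p1 + ... + xn*pn
   outcomes = [1, 2, 3, 4, 5, 6]
   probabilities = [Fraction(1, 6)] * 6   # fair die: each face has probability 1/6
   expected = sum(x * p for x, p in zip(outcomes, probabilities))
   print(expected)  # 7/2, i.e. 3.5
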
When working with a sample, the following is an unbiased estimator of the expected value (`source <https://stats.stackexchange.com/questions/518084/whats-the-difference-between-the-mean-and-expected-value-of-a-normal-distributi>`__):

.. math::

   \overline{x}=\frac{\sum_{i=1}^n x_i}{n}

Variance
--------

Variance can be seen as the expected squared deviation from the expected value of a random variable :math:`X`.

.. math::

   \mathbb{V}[X]=\mathbb{E}\left[(X-\mathbb{E}[X])^2\right]=\frac{\sum_{i=1}^n (x_i - \mathbb{E}[X])^2}{n}=\mathrm{Cov}(X,X)

When working with a sample, the following is an unbiased estimator of the variance:

.. math::

   s^2=\frac{\sum_{i=1}^n(x_i-\overline{x})^2}{n-1}

To understand why the denominator is :math:`n-1`, see :ref:`Bessel's correction <bessel_correction>`.
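
To see Bessel's correction in action, here is a small Python sketch (the
data values are hypothetical) comparing the biased (:math:`n`) and unbiased
(:math:`n-1`) denominators; ``statistics.variance`` from the standard
library uses the :math:`n-1` form:

.. code-block:: python

   import statistics

   data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
   mean = sum(data) / len(data)                 # 5.0

   # Biased estimator: divide by n
   biased = sum((x - mean) ** 2 for x in data) / len(data)
   # Unbiased estimator: divide by n - 1 (Bessel's correction)
   unbiased = sum((x - mean) ** 2 for x in data) / (len(data) - 1)

   print(biased)                     # 4.0
   print(unbiased)                   # 4.571428...
   print(statistics.variance(data))  # matches the n-1 version
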

Covariance
----------

Covariance is a way to quantify the relationship between two random variables
:math:`X` and :math:`Y` (`source <https://www.youtube.com/watch?v=qtaqvPAeEJY>`__).
Covariance **DOES NOT** quantify how strong this correlation is! If the
covariance is:

Positive
   For large values of :math:`X`, :math:`Y` also tends to take large values
Negative
   For large values of :math:`X`, :math:`Y` tends to take low values
Null
   No correlation

.. math::

   \mathrm{Cov}(X,Y)=\mathbb{E}[(X-\mathbb{E}[X])(Y-\mathbb{E}[Y])]=\frac{\sum_{i=1}^n (x_i - \mathbb{E}[X])(y_i - \mathbb{E}[Y])}{n}

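The sign behaviour, and the fact that the magnitude does not measure
strength, can be illustrated with a short Python sketch (hypothetical data):

.. code-block:: python

   def covariance(xs, ys):
       """Population covariance: mean of the products of deviations."""
       n = len(xs)
       mx = sum(xs) / n
       my = sum(ys) / n
       return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n

   x = [1.0, 2.0, 3.0, 4.0, 5.0]
   print(covariance(x, [2.0, 4.0, 6.0, 8.0, 10.0]))       # 4.0: positive
   print(covariance(x, [10.0, 8.0, 6.0, 4.0, 2.0]))       # -4.0: negative
   print(covariance(x, [20.0, 40.0, 60.0, 80.0, 100.0]))  # 40.0: same relationship,
                                                          # 10x the covariance

Rescaling :math:`Y` changes the covariance without changing how strongly the
two variables are related, which is why correlation (covariance divided by
the two standard deviations) is used to measure strength.
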
Standard deviation
------------------

Standard deviation provides a way to interpret the variance in the same unit as :math:`X`.

.. math::

   \sigma=\sqrt{\mathbb{V}[X]}

When working with a sample, the standard deviation is usually estimated as the square root of the unbiased variance estimator:

.. math::

   s=\sqrt{\frac{\sum_{i=1}^n(x_i-\overline{x})^2}{n-1}}

To understand why the denominator is :math:`n-1`, see :ref:`Bessel's correction <bessel_correction>`.
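
As a quick check that :math:`s` is just the square root of the sample
variance, here is a Python sketch (hypothetical data) using the standard
library:

.. code-block:: python

   import math
   import statistics

   data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]

   # statistics.stdev is the square root of statistics.variance (both use n - 1)
   print(math.sqrt(statistics.variance(data)))
   print(statistics.stdev(data))  # same value, in the same unit as the data
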

.. _sem:

Standard Error of the Mean
--------------------------

Standard Error of the Mean (SEM) quantifies the error potentially made when estimating the mean of a population from a sample.

.. math::

   \mathrm{SEM}=\sigma_{\overline{X}}=\sqrt{\frac{\mathbb{V}[X]}{n}}=\frac{\sigma}{\sqrt{n}}

When working with a sample of :math:`n` individuals, an estimator of the SEM is:

.. math::

   s_{\overline{x}}=\frac{s}{\sqrt{n}}

Here is how to interpret it:

- If :math:`n=1`, the error is at most :math:`\sqrt{\mathbb{V}[X]}=\sigma_X`,
  which is the standard deviation of :math:`X`.
- The more :math:`n` increases, the lower the error becomes.
- The greater :math:`\mathbb{V}[X]`, the greater :math:`n` needs to be to
  ensure a low :math:`\mathrm{SEM}`.

More details in `this video <https://www.youtube.com/watch?v=BwYj69LAQOI>`_.
If it is still unclear, see the following R code:

.. literalinclude:: code/sem.R
   :language: R

Output example:

.. code-block:: console

   ----- Experiment 1 -----
   Means SD: 1.22
   SEM 1.26
   ----- Experiment 2 -----
   Means SD: 1.26
   SEM 1.26
   ----- Experiment 3 -----
   Means SD: 1.27
   SEM 1.26

Degree of Freedom
-----------------

The degree of freedom is a quantity defined for a given computation.
It corresponds to the number of parameters that are allowed to vary in that
computation, in other words, how many varying values contribute to it.
For example, when computing the mean of a random variable
:math:`X=\{x_1,\ldots,x_n\}`, there are :math:`n` parameters that are allowed
to change in the following formula:

.. math::

   \overline{x}=\frac{\sum_{i=1}^n x_i}{n}

Thus, the degree of freedom in this computation is :math:`n`.
When computing the variance of :math:`X` with the mean already known, we have:

.. math::

   s^2=\frac{\sum_{i=1}^n (x_i-\overline{x})^2}{n}

In this case, the degree of freedom is :math:`n-1`. As the mean is already
known, only :math:`n-1` of the :math:`x_i` are allowed to vary. In fact,
knowing :math:`n-1` of the :math:`x_i`, we can deduce the last one as follows:

.. math::

   \overline{x}=\frac{(\sum_{i=1}^{n-1} x_i) + x_n}{n} \Longrightarrow x_n = n\overline{x} - (\sum_{i=1}^{n-1} x_i)
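
This deduction can be checked with a short Python sketch (hypothetical values):

.. code-block:: python

   # With the mean fixed, the last value is determined by the other n-1 values:
   # x_n = n * mean - sum(x_1, ..., x_{n-1})
   data = [3.0, 5.0, 7.0, 9.0]
   n = len(data)
   mean = sum(data) / n          # 6.0

   known = data[:-1]             # pretend the last value is unknown
   x_last = n * mean - sum(known)
   print(x_last)                 # 9.0: recovered from the mean and the rest
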