
Z-Test
-------

The z-test is used to assess whether the mean :math:`\overline{x}` of a sample :math:`X` differs significantly from the mean of a known population.
The *significance level* is set by a *p-value* threshold chosen before running the test.

Conditions for using a z-test:

#. Population is normally distributed
#. Population :math:`\mu` and :math:`\sigma` are known
#. Sample size is greater than 30 (see note below)

.. note::
   According to the central limit theorem, the distribution of the sample mean is well approximated by a normal distribution once the sample size reaches about 30.
   See `here <https://statisticsbyjim.com/basics/central-limit-theorem/>`__ for more information.


To perform a z-test, compute the *standard score* (or *z-score*) of your sample mean.
It characterizes how far your sample mean :math:`\overline{x}` is from the population mean :math:`\mu`, in units of the standard error :math:`\sigma/\sqrt{n}`, where :math:`n` is the sample size.
It is computed as follows:

.. math::
   Z=\frac{\overline{x}-\mu}{\frac{\sigma}{\sqrt{n}}}
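
The computation can be sketched in Python as follows. This is a minimal illustration, not a reference implementation: the function name ``z_score`` and the sample values are hypothetical, and the sketch assumes the sample mean is scaled by the standard error :math:`\sigma/\sqrt{n}`.

.. code-block:: python

   import math

   def z_score(sample, pop_mean, pop_std):
       """Return the z-score of the sample mean against a known population."""
       n = len(sample)
       sample_mean = sum(sample) / n
       # Standard deviation of the sample mean (standard error), assumed here.
       standard_error = pop_std / math.sqrt(n)
       return (sample_mean - pop_mean) / standard_error

   # Hypothetical, deliberately tiny sample, for illustration only.
   sample = [1.0, 2.0, 3.0, 4.0]
   z = z_score(sample, pop_mean=2.0, pop_std=1.0)
   print(z)

A real z-test would use a sample that satisfies the conditions listed earlier; the four values here only keep the arithmetic easy to follow by hand.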

.. note::
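   When the population :math:`\sigma` is unknown, the following formula is also used, with the sample standard deviation :math:`s` in place of :math:`\sigma` and SEM denoting the standard error of the mean: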

   .. math::
         Z=\frac{\overline{x}-\mu}{\mathrm{SEM}}=\frac{\overline{x}-\mu}{\frac{s}{\sqrt{n}}}

   In this case, :math:`Z` technically follows a t-distribution (Student's t-test).
   However, if :math:`n` is sufficiently large, the distribution followed by :math:`Z` is very close to a normal one.
   It is so close that using a z-test in place of Student's t-test to compute *p-values* leads to negligible differences (`source <https://stats.stackexchange.com/questions/625578/why-is-the-sample-standard-deviation-used-in-the-z-test>`__).
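
The closeness of the two distributions can be checked numerically. The sketch below is an illustration under stated assumptions: the helper names ``t_pdf`` and ``t_right_tail`` are hypothetical, the t-distribution tail is approximated with Simpson's rule rather than an exact CDF, and the statistic value ``2.0`` is arbitrary.

.. code-block:: python

   import math
   from statistics import NormalDist

   def t_pdf(x, df):
       """Density of Student's t-distribution with df degrees of freedom."""
       log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                   - 0.5 * math.log(df * math.pi))
       return math.exp(log_norm - (df + 1) / 2 * math.log1p(x * x / df))

   def t_right_tail(x, df, steps=2000):
       """Approximate P(T > x) for x >= 0 with Simpson's rule on [0, x]."""
       h = x / steps
       total = t_pdf(0.0, df) + t_pdf(x, df)
       for i in range(1, steps):
           total += (4 if i % 2 else 2) * t_pdf(i * h, df)
       # By symmetry, P(T > 0) = 0.5; subtract the mass between 0 and x.
       return 0.5 - total * h / 3

   statistic = 2.0  # arbitrary value, for illustration only
   normal_tail = 1 - NormalDist().cdf(statistic)
   gaps = {n: t_right_tail(statistic, n - 1) - normal_tail
           for n in (10, 30, 100, 1000)}
   for n, gap in gaps.items():
       print(f"n={n:>4}: t tail - normal tail = {gap:.5f}")

The gap between the two right-tail probabilities shrinks as :math:`n` grows, which is why the normal approximation is acceptable for large samples.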

From :math:`Z`, the z-test *p-value* can be derived using the :math:`\mathcal{N}(0,1)` :ref:`CDF <CDF>`.
That *p-value* is computed as follows:

* Left "tail" of the :math:`\mathcal{N}(0,1)` distribution:

  .. math::
     \alpha=P(\mathcal{N}(0,1)<Z)

* Right "tail" of the :math:`\mathcal{N}(0,1)` distribution:

  .. math::
     \alpha=1-P(\mathcal{N}(0,1)<Z)

.. image:: ../../figures/normal_law_tails.svg
   :align: center

If a z-test is done over one tail (left or right) it is called a **one-tailed** z-test.
If a z-test is done over both tails (left and right) it is called a **two-tailed** z-test.
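
The tail probabilities can be sketched with the standard library's ``statistics.NormalDist``. The function name ``z_test_p_value`` and its ``tail`` argument are hypothetical, and the two-tailed value is assumed to follow the usual convention of doubling the tail probability beyond :math:`|Z|`.

.. code-block:: python

   from statistics import NormalDist

   def z_test_p_value(z, tail="two-sided"):
       """Return the p-value of a z-score under the N(0, 1) distribution."""
       cdf = NormalDist().cdf  # CDF of the standard normal distribution
       if tail == "left":
           return cdf(z)
       if tail == "right":
           return 1 - cdf(z)
       if tail == "two-sided":
           # Both tails: probability of a value at least as extreme as |z|.
           return 2 * (1 - cdf(abs(z)))
       raise ValueError(f"unknown tail: {tail!r}")

   for tail in ("left", "right", "two-sided"):
       print(tail, z_test_p_value(1.96, tail))

Which tail to use depends on the alternative hypothesis: a directional claim (smaller or larger mean) calls for a single tail, while a claim that the means simply differ calls for both.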

If the :math:`\alpha` value given by the test is lower than or equal to the *p-value* threshold chosen prior to the test,
the null hypothesis :math:`H_0` is rejected and the alternative hypothesis :math:`H_1` is considered accepted.

One-tailed vs Two-tailed
========================


For the differences between one-tailed and two-tailed tests, see `this UCLA FAQ <https://stats.oarc.ucla.edu/other/mult-pkg/faq/general/faq-what-are-the-differences-between-one-tailed-and-two-tailed-tests/>`__.

A worked two-tailed example is available `here <https://www.mathandstatistics.com/learn-stats/hypothesis-testing/two-tailed-z-test-hypothesis-test-by-hand>`__.


Examples
========
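
As a minimal end-to-end sketch, consider a two-tailed z-test on summary statistics. All numbers below (population mean, population standard deviation, sample size, sample mean, and the 0.05 threshold) are hypothetical and chosen only to keep the arithmetic simple; the sample mean is assumed to be scaled by the standard error :math:`\sigma/\sqrt{n}`.

.. code-block:: python

   import math
   from statistics import NormalDist

   # Hypothetical population parameters and sample summary.
   pop_mean, pop_std = 100.0, 15.0
   n, sample_mean = 36, 105.0
   threshold = 0.05  # p-value threshold chosen before the test

   # z-score of the sample mean, using the standard error sigma / sqrt(n).
   z = (sample_mean - pop_mean) / (pop_std / math.sqrt(n))

   # Two-tailed p-value under the N(0, 1) distribution.
   p_value = 2 * (1 - NormalDist().cdf(abs(z)))

   # Decision rule: reject H0 when the p-value is at or below the threshold.
   reject_h0 = p_value <= threshold
   print(f"Z = {z:.2f}, p-value = {p_value:.4f}, reject H0: {reject_h0}")

With these numbers the sample mean lies two standard errors above the population mean, so the two-tailed value falls just under the 0.05 threshold and :math:`H_0` is rejected.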