Z-Test
-------

The z-test is used to assess whether the mean :math:`\overline{x}` of a sample :math:`X` differs from the mean of a known population.
Whether this difference is considered significant is decided by a *p-value* threshold (the *significance level*) chosen before doing the test.

Conditions for using a z-test:

#. The population is normally distributed.
#. The population mean :math:`\mu` and standard deviation :math:`\sigma` are known.
#. The sample size is greater than 30 (see note below).

.. note::

   According to the central limit theorem, the distribution of the sample mean is well approximated by a normal distribution once the sample size reaches about 30 (a common rule of thumb).
   See `here <https://statisticsbyjim.com/basics/central-limit-theorem/>`__ for more information.
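
The rule of thumb in the note above can be checked empirically. The following R snippet is only a sketch: the exponential population, the seed and the number of repetitions are arbitrary choices made for illustration, not values taken from this page.

.. code-block:: R

   set.seed(42)   # arbitrary seed, only for reproducibility

   n <- 30        # sample size from the rule of thumb
   reps <- 10000  # number of simulated samples (arbitrary)

   # Exponential(rate = 1) is strongly skewed, with mean 1 and standard deviation 1
   sample_means <- replicate(reps, mean(rexp(n, rate = 1)))

   cat("Mean of sample means:", mean(sample_means), "(theory: 1)\n")
   cat("SD of sample means:  ", sd(sample_means), "(theory:", 1 / sqrt(n), ")\n")

   # Share of sample means within 1.96 standard errors of the true mean;
   # it would be 0.95 for an exactly normal distribution
   within <- mean(abs(sample_means - 1) <= 1.96 / sqrt(n))
   cat("Share within 1.96 standard errors:", within, "\n")

Even though the simulated population is far from normal, the sample means should spread around the population mean with a standard deviation close to :math:`1/\sqrt{n}`.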

To perform a z-test, compute the *standard score* (or *z-score*) of your sample :math:`X`.
The *z-score*, noted :math:`Z`, characterizes how far your sample mean :math:`\overline{x}` is from the population mean :math:`\mu`, in units of the standard error :math:`\sigma/\sqrt{n}`, where :math:`n` is the sample size.

It is computed as follows:

.. math::

   Z=\frac{\overline{x}-\mu}{\frac{\sigma}{\sqrt{n}}}

.. note::

   The following formula can also be seen when the original population :math:`\sigma` is unknown, where :math:`s` is the sample standard deviation and :math:`n` the sample size:

   .. math::

      Z=\frac{\overline{x}-\mu}{\mathrm{SEM}}=\frac{\overline{x}-\mu}{\frac{s}{\sqrt{n}}}

   In this case, :math:`Z` technically follows a t-distribution (Student's t-test).
   However, if :math:`n` is sufficiently large, the distribution followed by :math:`Z` is very close to a normal one.
   So close that using the z-test in place of Student's t-test to compute *p-values* leads to negligible differences (`source <https://stats.stackexchange.com/questions/625578/why-is-the-sample-standard-deviation-used-in-the-z-test>`__).
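
As a minimal sketch of the z-score computation, the R snippet below handles both the known and the unknown :math:`\sigma` case. The sample values and the population parameters ``mu`` and ``sigma`` are made up for illustration, and the known-:math:`\sigma` variant assumes that the deviation of the sample mean is scaled by the standard error :math:`\sigma/\sqrt{n}`, the usual form for a sample mean.

.. code-block:: R

   # Hypothetical data: 36 measurements from a population assumed to have
   # mean mu = 100 and standard deviation sigma = 15 (made-up values)
   x <- c(102, 98, 110, 105, 95, 101, 99, 107, 103, 100, 97, 108,
          104, 96, 111, 102, 99, 106, 100, 103, 98, 109, 105, 101,
          94, 107, 102, 100, 104, 99, 106, 103, 97, 108, 101, 105)
   mu <- 100
   sigma <- 15
   n <- length(x)
   x_bar <- mean(x)

   # z-score when the population standard deviation is known
   z_known <- (x_bar - mu) / (sigma / sqrt(n))

   # z-score when sigma is unknown: use the standard error of the mean (SEM)
   sem <- sd(x) / sqrt(n)
   z_estimated <- (x_bar - mu) / sem

   cat("n =", n, " sample mean =", x_bar, "\n")
   cat("Z (sigma known)   =", z_known, "\n")
   cat("Z (sigma unknown) =", z_estimated, "\n")

The two values differ only through the denominator: ``sigma`` comes from prior knowledge of the population, while ``sd(x)`` is estimated from the sample itself.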

From :math:`Z`, a *p-value*, noted :math:`\alpha` below, can be derived using the :math:`\mathcal{N}(0,1)` :ref:`CDF <CDF>`, noted :math:`\Phi_{0,1}(x)`.
In the derivations, :math:`\sigma=1` denotes the standard deviation of :math:`\mathcal{N}(0,1)`:

* Left "tail" of the :math:`\mathcal{N}(0,1)` distribution:

  .. math::

     \alpha &= P(\mathcal{N}(0,1)<Z\sigma) \\
     &=P(\mathcal{N}(0,1)<Z\times 1) \\
     &=P(\mathcal{N}(0,1)<Z)=\Phi_{0,1}(Z)

* Right "tail" of the :math:`\mathcal{N}(0,1)` distribution:

  .. math::

     \alpha &= 1-P(\mathcal{N}(0,1)<Z\sigma) \\
     &=1-P(\mathcal{N}(0,1)<Z\times 1) \\
     &=1-P(\mathcal{N}(0,1)<Z)=1-\Phi_{0,1}(Z)

.. image:: ../../figures/normal_law_tails.svg
   :align: center

|
| If the test is done over one tail (left OR right), it is called a **one-tailed** z-test.
| If the test is done over both tails (left AND right), it is called a **two-tailed** z-test.
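
The tail probabilities above map directly onto R's built-in normal CDF ``pnorm``. The snippet below is a small sketch with a hypothetical z-score; the two-tailed value uses the common convention of doubling the smaller tail, which is an assumption since this page gives no two-tailed formula.

.. code-block:: R

   # Hypothetical z-score, used only for illustration
   z <- -1.8

   # Left tail: P(N(0,1) < Z) = Phi(Z), about 0.0359 for z = -1.8
   p_left <- pnorm(z)

   # Right tail: 1 - Phi(Z)
   p_right <- pnorm(z, lower.tail = FALSE)

   # Two tails: both extremes count, so the smaller tail is doubled
   p_two <- 2 * pnorm(-abs(z))

   cat("Left-tailed p-value: ", p_left, "\n")
   cat("Right-tailed p-value:", p_right, "\n")
   cat("Two-tailed p-value:  ", p_two, "\n")

Using ``lower.tail = FALSE`` instead of ``1 - pnorm(z)`` avoids a loss of precision when the right-tail probability is very small.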

The following code shows how to obtain the *p-value* in R:

.. literalinclude:: ../../code/ztest_pvalue.R
   :language: R

Output example:

.. code-block:: console

   Alpha approximated is 0.0359588035958804
   Alpha from built-in CDF 0.0359303191129258

If the :math:`\alpha` value given by the test is lower than or equal to the *p-value* threshold chosen initially,
the null hypothesis :math:`H_0` is rejected and the alternative hypothesis :math:`H_1` is considered accepted.

An alternative way of doing the z-test is to build a **rejection region** from the *p-value* threshold.
This is done by using the inverse :ref:`CDF <CDF>` (quantile function) :math:`\Phi_{0,1}^{-1}(x)`, as in the following code:

.. literalinclude:: ../../code/ztest_rejection_region.R
   :language: R

Output:

.. code-block:: console

   Rejection region for left tail: Z in ]-inf,-3.43161440362327]
   Rejection region for right tail: Z in [3.4316144036233,+inf[

Thus, if the z-score is part of one of the rejection regions, :math:`H_0` is rejected and :math:`H_1` is considered accepted.
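
To illustrate the decision rule, here is a hedged sketch that builds the critical values with R's quantile function ``qnorm`` and checks a z-score against them. The threshold of 0.05 and the z-score are hypothetical, and the two-tailed region (half of the threshold in each tail) is the usual convention rather than something specified on this page.

.. code-block:: R

   # Hypothetical threshold and z-score, chosen for illustration
   threshold <- 0.05
   z <- -1.8

   # Critical values from the quantile function (inverse CDF) of N(0,1)
   z_left  <- qnorm(threshold)                      # left tail:  Z <= z_left
   z_right <- qnorm(threshold, lower.tail = FALSE)  # right tail: Z >= z_right
   z_two   <- qnorm(1 - threshold / 2)              # two tails:  |Z| >= z_two

   reject_left  <- z <= z_left
   reject_right <- z >= z_right
   reject_two   <- abs(z) >= z_two

   # The rejection region and the p-value comparison give the same decision
   same_decision <- reject_left == (pnorm(z) <= threshold)

   cat("Left tail:  reject H0?", reject_left, "\n")
   cat("Right tail: reject H0?", reject_right, "\n")
   cat("Two tails:  reject H0?", reject_two, "\n")
   cat("Same decision as the p-value approach (left tail)?", same_decision, "\n")

With these values the left-tailed test rejects :math:`H_0` while the two-tailed test does not, which shows why the choice of tails has to be made before looking at the data.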

One-tailed vs Two-tailed
========================

* One-tailed vs two-tailed tests: https://stats.oarc.ucla.edu/other/mult-pkg/faq/general/faq-what-are-the-differences-between-one-tailed-and-two-tailed-tests/
* Example of a two-tailed z-test done by hand: https://www.mathandstatistics.com/learn-stats/hypothesis-testing/two-tailed-z-test-hypothesis-test-by-hand

Examples
========