Z-Test
-------

The z-test is used to assess whether the mean :math:`\overline{x}` of a sample :math:`X` differs from the mean of a known population. Whether this difference is significant is decided against a *p-value* threshold (the *significance level*) chosen prior to doing the test.

Conditions for using a z-test:

#. Population is normally distributed
#. Population :math:`\mu` and :math:`\sigma` are known
#. Sample size is greater than 30 (see note below)

.. note::

   According to the central limit theorem, the distribution of the sample mean is generally considered well approximated by a normal distribution once the sample size reaches about 30. See `here `__ for more information.

To perform a z-test with a sample :math:`X` of size :math:`n`, you should compute the sample *standard score* (or *z-score*). The *z-score*, noted :math:`Z`, characterizes how far your sample mean :math:`\overline{x}` is from the population mean :math:`\mu`, in units of the standard error of the mean (SEM) :math:`\sigma_{\overline{x}}`. It is computed as follows:

.. math::

   Z=\frac{\overline{x}-\mu}{\sigma_{\overline{x}}}=\frac{\overline{x}-\mu}{\frac{\sigma}{\sqrt{n}}}

.. note::

   The SEM is used in the denominator to account for the sampling variability of :math:`\overline{x}`. The more samples are collected, the smaller the denominator :math:`\frac{\sigma}{\sqrt{n}}` becomes (it tends toward 0), so a given difference :math:`\overline{x}-\mu` yields a larger :math:`|Z|`. See :ref:`SEM ` for more details.

From :math:`Z`, a *p-value* can be derived using the :math:`\mathcal{N}(0,1)` :ref:`CDF ` noted :math:`\Phi_{0,1}(x)`:

* Left "tail" of the :math:`\mathcal{N}(0,1)` distribution:

  .. math::

     \alpha &= P(\mathcal{N}(0,1) < Z) \\
            &= \Phi_{0,1}(Z)

* Right "tail" of the :math:`\mathcal{N}(0,1)` distribution:

  .. math::

     \alpha &= P(\mathcal{N}(0,1) > Z) \\
            &= 1 - \Phi_{0,1}(Z)

* Both "tails" of the :math:`\mathcal{N}(0,1)` distribution, using its symmetry:

  .. math::

     \alpha = 2\,\Phi_{0,1}(-|Z|)

The null hypothesis :math:`H_0` states that there is no significant difference, and the alternative hypothesis :math:`H_1` states that there is one. The tail to use depends on :math:`H_1`:

| A *left-tailed* z-test checks whether :math:`\overline{x} < \mu`.
| A *right-tailed* z-test checks whether :math:`\overline{x} > \mu`.
| A *two-tailed* z-test checks whether :math:`\overline{x} \ne \mu`.

The following code shows you how to obtain the *p-value* in R:

.. literalinclude:: ../../code/ztest_pvalue.R
   :language: R

Output example:

.. code-block:: console

   Alpha approximated is 0.0359588035958804
   Alpha from built-in CDF 0.0359303191129258

If the :math:`\alpha` value given by the test is lower than or equal to the *p-value* threshold chosen initially, :math:`H_0` is rejected and :math:`H_1` is considered accepted.
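
As a complement, the z-score and p-value formulas can be sketched in base R as follows. This is only a minimal sketch, not the content of ``ztest_pvalue.R``: the helper names ``z_score`` and ``z_test_pvalue`` are hypothetical, the value :math:`Z=-1.8` is an assumption chosen for illustration, and the only built-in used is ``pnorm`` (the :math:`\mathcal{N}(0,1)` CDF).

.. code-block:: r

   # Hypothetical helper: z-score of a sample mean x_bar for a sample of
   # size n, given the known population mean mu and standard deviation sigma.
   z_score <- function(x_bar, mu, sigma, n) {
     (x_bar - mu) / (sigma / sqrt(n))
   }

   # Hypothetical helper: p-value (alpha) for a given z-score and tail.
   z_test_pvalue <- function(z, tail = c("left", "right", "two")) {
     tail <- match.arg(tail)
     switch(tail,
       left  = pnorm(z),                      # P(N(0,1) < Z)
       right = pnorm(z, lower.tail = FALSE),  # P(N(0,1) > Z)
       two   = 2 * pnorm(-abs(z))             # both tails, by symmetry
     )
   }

   z <- -1.8  # assumed z-score, for illustration only
   cat("Left tail :", z_test_pvalue(z, "left"), "\n")
   cat("Right tail:", z_test_pvalue(z, "right"), "\n")
   cat("Two tails :", z_test_pvalue(z, "two"), "\n")

With this assumed :math:`Z=-1.8`, the left-tail value is about 0.0359, so :math:`H_0` would be rejected at a threshold of 0.05 but not at 0.01.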
An alternative way of doing the z-test is to build a **rejection region** from the *p-value* threshold. This is done by using the inverse :ref:`CDF ` function :math:`\Phi_{0,1}^{-1}(x)` as in the following code:

.. literalinclude:: ../../code/ztest_rejection_region.R
   :language: R

Output:

.. code-block:: console

   Rejection region for left tail: Z in ]-inf,-3.43161440362327]
   Rejection region for right tail: Z in [3.4316144036233,+inf[

Thus, if the z-score is part of one of the rejection regions, :math:`H_0` is rejected and :math:`H_1` is considered accepted.

Examples
========

One-tailed
^^^^^^^^^^^

This exercise is inspired by `this video `__.

A complaint was registered stating that the boys in the municipal school are underfed. The average weight of boys of age 10 is 32kg with a standard deviation of 9kg. A sample of 25 boys of age 10 from the school is selected. Their average weight is 29.5kg. We want to check whether the complaint is true or not with a significance level (*p-value* threshold) of 0.05.

**--- Solution ---**

Hypotheses:

* :math:`H_0` : No significant difference (:math:`\overline{x} \ge 32`), the boys from the school are not underfed
* :math:`H_1` : There is a significant difference (:math:`\overline{x} < 32`), the boys from the school are underfed

.. math::

   Z=\frac{29.5-32}{\frac{9}{\sqrt{25}}}=-1.388889

From this z-score, the left-tail *p-value* is 0.08243327. As it is greater than 0.05, we cannot reject :math:`H_0`. Thus, the sample does not provide sufficient evidence that the boys from the school are underfed.

Two-tailed
^^^^^^^^^^^

This exercise is inspired by `this website `__.

The USA mean public school funding is $6800 per student per year, with a standard deviation of $400. We want to assess whether a certain state in the USA, Michigan, receives a significantly different amount of public school funding (per student) than the USA average, with a significance level (*p-value* threshold) of 0.05. A sample of 100 students reveals that, on average, they received $6873.

.. note::

   Notice that we are not saying **"significantly lower amount"** or **"significantly higher amount"** but **"significantly different amount"**. This is a sign that a two-tailed z-test is required, since we should check for both lower and higher.

**--- Solution ---**

Hypotheses:

* :math:`H_0` : No significant difference (:math:`\overline{x} = 6800`), Michigan receives the same amount of public school funding per student
* :math:`H_1` : There is a significant difference (:math:`\overline{x} \ne 6800`), Michigan does not receive the same amount of public school funding per student

.. math::

   Z=\frac{6873-6800}{\frac{400}{\sqrt{100}}}=1.825

| The *p-value* associated with the left tail (using :math:`-Z` with the CDF) is 0.03400051.
| Thus, as we are doing a *two-tailed* z-test, the *p-value* is :math:`2\times 0.03400051 \approx 0.06800103`.
| We multiply by two as the two tails of the normal distribution are symmetric.

Since :math:`0.06800103 > 0.05`, we cannot reject the null hypothesis :math:`H_0`. Thus, the sample does not provide sufficient evidence that Michigan receives a different amount of public school funding per student.
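
The two worked examples can be checked with a short base R sketch, shown here under stated assumptions: the variable names are hypothetical, ``pnorm`` and ``qnorm`` are the built-in :math:`\mathcal{N}(0,1)` CDF and inverse CDF, and the rejection-region check is an equivalent way of reaching the same decision rather than a copy of the scripts included above.

.. code-block:: r

   threshold <- 0.05  # p-value threshold used in both examples

   # One-tailed example (left tail): weight of the boys from the school.
   z_one <- (29.5 - 32) / (9 / sqrt(25))
   p_one <- pnorm(z_one)
   # Equivalent decision with a rejection region ]-inf, crit_left].
   crit_left  <- qnorm(threshold)
   reject_one <- z_one <= crit_left

   # Two-tailed example: Michigan public school funding.
   z_two <- (6873 - 6800) / (400 / sqrt(100))
   p_two <- 2 * pnorm(-abs(z_two))
   # Two-tailed rejection region: |Z| >= crit_two, threshold split in two.
   crit_two   <- qnorm(1 - threshold / 2)
   reject_two <- abs(z_two) >= crit_two

   cat("One-tailed: Z =", z_one, " p-value =", p_one,
       " reject H0:", reject_one, "\n")
   cat("Two-tailed: Z =", z_two, " p-value =", p_two,
       " reject H0:", reject_two, "\n")

In both cases the p-value is above the threshold and the z-score falls outside the rejection region, so :math:`H_0` is not rejected, in agreement with the solutions above.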