
Gradient Descent Based Polynomial Regression

First, choose a polynomial function $h_w(x)$ whose degree matches the complexity of the data. In our case, we have:

\begin{equation} h_w(x) = w_1 + w_2x + w_3x^2 \end{equation}
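As a minimal sketch, this hypothesis can be written directly in NumPy (the function name and array shapes below are my own illustrative choices, not from the original):

#+BEGIN_SRC python
import numpy as np

def h(w, x):
    """Polynomial hypothesis h_w(x) = w1 + w2*x + w3*x^2.

    w : array of shape (3,) holding [w1, w2, w3]
    x : scalar or 1-D array of inputs
    """
    return w[0] + w[1] * x + w[2] * x ** 2
#+END_SRC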

Then, we should define a cost function. A common approach is to use the Mean Square Error cost function:

\begin{equation}\label{eq:cost}
J(w) = \frac{1}{2n} \sum_{i=0}^{n} \left(h_w(x^{(i)}) - \hat{y}^{(i)}\right)^2
\end{equation}
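A short sketch of this cost, reusing ~h~ and ~numpy~ from the block above (here ~y~ holds the observed targets written $\hat{y}^{(i)}$ in Equation \ref{eq:cost}):

#+BEGIN_SRC python
def cost(w, x, y):
    """Mean squared error J(w) = 1/(2n) * sum_i (h_w(x_i) - y_i)^2."""
    n = len(x)
    residuals = h(w, x) - y
    return np.sum(residuals ** 2) / (2 * n)
#+END_SRC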

Note that in Equation \ref{eq:cost} we average over $2n$ and not $n$. This is because the factor of $2$ cancels when taking the partial derivatives, as we will see below. It is purely cosmetic and does not affect the gradient descent (see here for more information). The next step is to find $\min_w J(w)$ by updating each weight $w_i$ (performing the gradient descent). Thus we compute each partial derivative:

\begin{align}
\frac{\partial J(w)}{\partial w_1} &= \frac{\partial J(w)}{\partial h_w(x)}\frac{\partial h_w(x)}{\partial w_1}\nonumber\\
&= \frac{1}{n} \sum_{i=0}^{n} \left(h_w(x^{(i)}) - \hat{y}^{(i)}\right)\\
\text{similarly:}\nonumber\\
\frac{\partial J(w)}{\partial w_2} &= \frac{1}{n} \sum_{i=0}^{n} x^{(i)}\left(h_w(x^{(i)}) - \hat{y}^{(i)}\right)\\
\frac{\partial J(w)}{\partial w_3} &= \frac{1}{n} \sum_{i=0}^{n} \left(x^{(i)}\right)^2\left(h_w(x^{(i)}) - \hat{y}^{(i)}\right)
\end{align}
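Putting the three partial derivatives together, one plain batch gradient descent loop could look like the sketch below; the learning rate ~alpha~ and the iteration count are illustrative values of my own, not from the original, and ~h~ is reused from the first block:

#+BEGIN_SRC python
def gradient_descent(x, y, alpha=0.01, iterations=1000):
    """Fit w = [w1, w2, w3] by batch gradient descent on J(w)."""
    n = len(x)
    w = np.zeros(3)
    for _ in range(iterations):
        error = h(w, x) - y             # (h_w(x^(i)) - y^(i)) for all i
        grad = np.array([
            np.sum(error) / n,          # dJ/dw1
            np.sum(x * error) / n,      # dJ/dw2
            np.sum(x ** 2 * error) / n  # dJ/dw3
        ])
        w -= alpha * grad               # simultaneous update of all weights
    return w
#+END_SRC

In practice the inputs are usually feature-scaled first, so that a single learning rate works reasonably for the $x$, $x^2$ terms as well as the constant term.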