#+TITLE: Gradient Descent Based Polynomial Regression
#+AUTHOR: Loic Guegan
#+OPTIONS: toc:nil
#+LATEX_HEADER: \usepackage{fullpage}
#+LATEX_HEADER: \hypersetup{colorlinks=true,linkcolor=blue}

First, choose a polynomial function $h_w(x)$ whose degree matches the complexity of the data.
In our case, we have:
\begin{equation}
h_w(x) = w_1 + w_2x + w_3x^2
\end{equation}
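
For concreteness, here is a minimal sketch of this hypothesis in Python (the choice of Python and
NumPy is an assumption; the text does not name a language):
#+BEGIN_SRC python
import numpy as np

def h(w, x):
    """Hypothesis h_w(x) = w1 + w2*x + w3*x^2, with weights w = [w1, w2, w3]."""
    return w[0] + w[1] * x + w[2] * x**2
#+END_SRC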

Then, we define a cost function. A common choice is the *Mean Squared Error*
cost function:
\begin{equation}\label{eq:cost}
J(w) = \frac{1}{2n} \sum_{i=1}^{n} (h_w(x^{(i)}) - y^{(i)})^2
\end{equation}
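
As a sketch, Equation \ref{eq:cost} translates directly into Python (reusing the hypothetical ~h~
defined above):
#+BEGIN_SRC python
def cost(w, x, y):
    """Mean Squared Error cost J(w), divided by 2n as in the equation above."""
    n = len(y)
    return np.sum((h(w, x) - y) ** 2) / (2 * n)
#+END_SRC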

With $n$ the number of observations and $x^{(i)}$ the value of the independent variable associated
with the observation $y^{(i)}$. Note that in Equation \ref{eq:cost} we divide by $2n$ and not $n$:
the extra factor of 2 cancels when computing the partial derivatives, as we will see below. This is
a purely cosmetic choice which does not impact the gradient descent (see [[https://math.stackexchange.com/questions/884887/why-divide-by-2m][here]] for more information).
The next step is to compute $\min_w J(w)$ over the weights $w_i$ by performing the gradient descent
(see [[https://towardsdatascience.com/gradient-descent-demystified-bc30b26e432a][here]]). Thus, we compute each partial derivative:
\begin{align}
\frac{\partial J(w)}{\partial w_1}&=\frac{\partial J(w)}{\partial h_w(x)}\frac{\partial h_w(x)}{\partial w_1}\nonumber\\
&= \frac{1}{n} \sum_{i=1}^{n} (h_w(x^{(i)}) - y^{(i)})\\
\text{similarly:}\nonumber\\
\frac{\partial J(w)}{\partial w_2}&= \frac{1}{n} \sum_{i=1}^{n} x^{(i)}(h_w(x^{(i)}) - y^{(i)})\\
\frac{\partial J(w)}{\partial w_3}&= \frac{1}{n} \sum_{i=1}^{n} (x^{(i)})^2(h_w(x^{(i)}) - y^{(i)})
\end{align}
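
To illustrate, a minimal gradient descent loop using these derivatives could look like the sketch
below; the learning rate ~alpha~, the iteration count ~steps~ and the example data are hypothetical
choices, not values from the text:
#+BEGIN_SRC python
def gradients(w, x, y):
    """Partial derivatives of J(w) with respect to w1, w2 and w3."""
    n = len(y)
    err = h(w, x) - y  # h_w(x^(i)) - y^(i) for every observation
    grad = np.array([np.sum(err),           # dJ/dw1
                     np.sum(x * err),       # dJ/dw2
                     np.sum(x**2 * err)])   # dJ/dw3
    return grad / n

def gradient_descent(x, y, alpha=0.1, steps=5000):
    """Repeatedly move all weights against the gradient of the cost."""
    w = np.zeros(3)
    for _ in range(steps):
        w = w - alpha * gradients(w, x, y)
    return w

# Example: recover weights close to (1, 2, 3) from noisy samples of y = 1 + 2x + 3x^2
x = np.linspace(-1, 1, 50)
y = 1 + 2 * x + 3 * x**2 + np.random.normal(0, 0.1, 50)
print(gradient_descent(x, y))
#+END_SRC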