Add more details to the logistic regression

author: manzerbredes <manzerbredes@mailbox.org> 2021-02-20 20:36:33 +0100
committer: manzerbredes <manzerbredes@mailbox.org> 2021-02-20 20:36:33 +0100
commit: 14856930a79b4246bff951330e56200ba043d34d (patch)
tree: 3e04fb5ca09a8d1bf188e47705528a2cd745e148
parent: 8e79c803545ff937b529505aca938f62c4825ba3 (diff)
4 files changed, 84 insertions, 1 deletions
diff --git a/logistic_regression/binary.org b/logistic_regression/binary.org
new file mode 100644
index 0000000..59be860
--- /dev/null
+++ b/logistic_regression/binary.org
@@ -0,0 +1,79 @@
+#+TITLE: Binary Logistic Regression
+#+AUTHOR: Loic Guegan
+
+#+OPTIONS: toc:nil
+
+#+LATEX_HEADER: \usepackage{fullpage}\usepackage{cancel}
+#+latex_header: \hypersetup{colorlinks=true,linkcolor=blue}
+
+Binary logistic regression is used to predict a binary outcome (win/loss, true/false) from
+several input parameters. First, we have to choose a polynomial function $h_w(x)$ suited to the
+complexity of the data (see \textit{data/binary\_logistic.csv}). In our case, we want to predict the
+outcome (1 or 0) from two parameters. Thus:
+\begin{equation}
+h_w(x_1,x_2) = w_1 + w_2x_1 + w_3x_2
+\end{equation}
+However, the function we are looking for should return a *binary* result! To achieve this goal, we
+can use a sigmoid (or logistic) function, which maps $\mathbb{R} \to ]0;1[$ and has the following
+form:
+#+BEGIN_SRC python :results file :exports none :session 
+  import numpy as np
+  import matplotlib.pyplot as plt
+
+  x=np.arange(-5,5,0.1)
+  plt.xlabel("X")
+  plt.ylabel("Y")
+  plt.title("Sigmoid Function")
+  plt.text(-3,0.5,r'$\frac{1}{1+e^{-x}}$',fontsize=30)
+  plt.plot(x,1/(1+np.exp(-x)))
+  plt.grid()
+  plt.savefig("sigmoid.png")
+  plt.close()
+  "sigmoid.png"
+#+END_SRC
+#+ATTR_LATEX: :width 10cm
+[[file:sigmoid.png]]
+
+To this end, we can define the following function:
+\begin{equation}\label{eq:sigmoid}
+    g_w(x_1,x_2) = \frac{1}{1+e^{-h_w(x_1,x_2)}}
+\end{equation}
+
+The next step is to define a cost function. A common approach in binary logistic regression is to use
+the *Cross-Entropy* loss function. It is much more convenient than the classical Mean Square Error
+used in polynomial regression. Indeed, its gradient stays stronger even for small errors (see [[https://www.youtube.com/watch?v=gIx974WtVb4&t=110s][here]] for
+more information). It is defined as follows:
+\begin{equation}\label{eq:cost}
+    J(w) = -\frac{1}{n} \sum_{i=0}^n \left[y^{(i)}\log(g_w(x_1^{(i)},x_2^{(i)})) + (1-y^{(i)})\log(1-g_w(x_1^{(i)},x_2^{(i)}))\right]
+\end{equation}
+
+Here $n$ is the number of observations and $x_j^{(i)}$ is the value of the $j^{th}$ independent variable
+associated with the observation $y^{(i)}$. The next step is to find $\min_w J(w)$ by updating each weight $w_j$
+(performing gradient descent, see [[https://towardsdatascience.com/gradient-descent-demystified-bc30b26e432a][here]]). Thus we compute each partial derivative:
+\begin{align*}
+    \frac{\partial J(w)}{\partial w_1}&=\frac{\partial J(w)}{\partial g_w(x_1,x_2)}\frac{\partial g_w(x_1,x_2)}{\partial h_w(x_1,x_2)}\frac{\partial h_w(x_1,x_2)}{\partial w_1}\nonumber\\
+    \frac{\partial J(w)}{\partial g_w(x_1,x_2)}&=-\frac{1}{n} \sum_{i=0}^n \left[y^{(i)}\frac{1}{g_w(x_1^{(i)},x_2^{(i)})} + (1-y^{(i)})\times\frac{1}{1-g_w(x_1^{(i)},x_2^{(i)})}\times (-1)\right]\nonumber\\
+    &=-\frac{1}{n} \sum_{i=0}^n \left[\frac{y^{(i)}}{g_w(x_1^{(i)},x_2^{(i)})} - \frac{1-y^{(i)}}{1-g_w(x_1^{(i)},x_2^{(i)})}\right]\nonumber\\
+    &=-\frac{1}{n} \sum_{i=0}^n \left[\frac{y^{(i)}(1-g_w(x_1^{(i)},x_2^{(i)}))}{g_w(x_1^{(i)},x_2^{(i)})(1-g_w(x_1^{(i)},x_2^{(i)}))} - \frac{g_w(x_1^{(i)},x_2^{(i)})(1-y^{(i)})}{g_w(x_1^{(i)},x_2^{(i)})(1-g_w(x_1^{(i)},x_2^{(i)}))}\right]\nonumber\\
+    &=-\frac{1}{n} \sum_{i=0}^n \left[\frac{y^{(i)}\cancel{-y^{(i)}g_w(x_1^{(i)},x_2^{(i)})} -g_w(x_1^{(i)},x_2^{(i)})\cancel{+y^{(i)}g_w(x_1^{(i)},x_2^{(i)})}}{g_w(x_1^{(i)},x_2^{(i)})(1-g_w(x_1^{(i)},x_2^{(i)}))}\right]\nonumber\\
+    &=\frac{1}{n} \sum_{i=0}^n \left[\frac{-y^{(i)} +g_w(x_1^{(i)},x_2^{(i)})}{g_w(x_1^{(i)},x_2^{(i)})(1-g_w(x_1^{(i)},x_2^{(i)}))}\right]\nonumber\\
+    \frac{\partial g_w(x_1,x_2)}{\partial h_w(x_1,x_2)}&=\frac{\partial (1+e^{-h_w(x_1,x_2)})^{-1}}{\partial h_w(x_1,x_2)}=-(1+e^{-h_w(x_1,x_2)})^{-2}\times \frac{\partial (1+e^{-h_w(x_1,x_2)})}{\partial h_w(x_1,x_2)}\nonumber\\
+    &=-(1+e^{-h_w(x_1,x_2)})^{-2}\times -e^{-h_w(x_1,x_2)}=\frac{e^{-h_w(x_1,x_2)}}{(1+e^{-h_w(x_1,x_2)})^2}\nonumber\\
+    &=\frac{e^{-h_w(x_1,x_2)}}{(1+e^{-h_w(x_1,x_2)})(1+e^{-h_w(x_1,x_2)})}=\frac{1}{(1+e^{-h_w(x_1,x_2)})}\frac{e^{-h_w(x_1,x_2)}}{(1+e^{-h_w(x_1,x_2)})}\nonumber\\
+    &=\frac{1}{(1+e^{-h_w(x_1,x_2)})}\frac{e^{-h_w(x_1,x_2)}+1-1}{(1+e^{-h_w(x_1,x_2)})}=\frac{1}{(1+e^{-h_w(x_1,x_2)})}\left(1+\frac{-1}{(1+e^{-h_w(x_1,x_2)})}\right)\nonumber\\
+    &=g_w(x_1,x_2)(1-g_w(x_1,x_2))\nonumber\\
+    \frac{\partial h_w(x_1,x_2)}{\partial w_1}&=1\nonumber\\
+    \text{Finally:}\\
+    \frac{\partial J(w)}{\partial w_1}&=\frac{1}{n} \sum_{i=0}^n \left[\frac{-y^{(i)}+g_w(x_1^{(i)},x_2^{(i)})}{\cancel{g_w(x_1^{(i)},x_2^{(i)})(1-g_w(x_1^{(i)},x_2^{(i)}))}} \times \cancel{g_w(x_1^{(i)},x_2^{(i)})(1-g_w(x_1^{(i)},x_2^{(i)}))} \right]\nonumber\\
+    &=\frac{1}{n} \sum_{i=0}^n \left[-y^{(i)}+g_w(x_1^{(i)},x_2^{(i)})\right]
+\end{align*}
+\begin{align*}
+    \text{Similarly:}\\
+    \frac{\partial J(w)}{\partial w_2}&=\frac{1}{n} \sum_{i=0}^n x_1^{(i)}\left[-y^{(i)}+g_w(x_1^{(i)},x_2^{(i)})\right]\\
+    \frac{\partial J(w)}{\partial w_3}&=\frac{1}{n} \sum_{i=0}^n x_2^{(i)}\left[-y^{(i)}+g_w(x_1^{(i)},x_2^{(i)})\right]
+\end{align*}
+
+For more information on binary logistic regression, here are some useful links:
+- [[https://ml-cheatsheet.readthedocs.io/en/latest/logistic_regression.html][Logistic Regression -- ML Glossary  documentation]]
+- [[https://math.stackexchange.com/questions/2503428/derivative-of-binary-cross-entropy-why-are-my-signs-not-right][Derivative of the Binary Cross Entropy]]
+
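The partial derivatives derived in binary.org can be sanity-checked numerically. The following is a minimal sketch, not the code from binary.py: it assumes a small hypothetical data set (not data/binary_logistic.csv), stores the weights $w_1, w_2, w_3$ as `w[0], w[1], w[2]`, and uses an arbitrary learning rate `alpha`. It compares the closed-form gradient with central finite differences of $J(w)$, then runs a few steps of plain gradient descent.

```python
import math

def sigmoid(z):
    """Logistic function, maps any real number into ]0;1[."""
    return 1.0 / (1.0 + math.exp(-z))

def g(w, x1, x2):
    """Model output g_w(x1, x2) with h_w = w[0] + w[1]*x1 + w[2]*x2."""
    return sigmoid(w[0] + w[1] * x1 + w[2] * x2)

def cost(w, data):
    """Cross-entropy J(w), averaged over the observations (x1, x2, y)."""
    total = 0.0
    for x1, x2, y in data:
        p = g(w, x1, x2)
        total += y * math.log(p) + (1 - y) * math.log(1 - p)
    return -total / len(data)

def gradient(w, data):
    """Closed-form partial derivatives: mean of (g - y) times 1, x1 and x2."""
    grad = [0.0, 0.0, 0.0]
    for x1, x2, y in data:
        err = g(w, x1, x2) - y
        grad[0] += err
        grad[1] += err * x1
        grad[2] += err * x2
    return [gj / len(data) for gj in grad]

def numerical_gradient(w, data, eps=1e-6):
    """Central finite differences of J(w), used only as a sanity check."""
    grad = []
    for j in range(len(w)):
        up, down = list(w), list(w)
        up[j] += eps
        down[j] -= eps
        grad.append((cost(up, data) - cost(down, data)) / (2 * eps))
    return grad

# Hypothetical toy observations (x1, x2, y); not taken from the repository.
data = [(0.5, 1.0, 0), (1.5, 2.0, 0), (2.0, 0.5, 0),
        (3.0, 3.5, 1), (4.0, 2.5, 1), (2.5, 1.5, 1)]

w = [0.1, -0.2, 0.3]
analytic = gradient(w, data)
numeric = numerical_gradient(w, data)

# Plain gradient descent; alpha is an arbitrary choice for this sketch.
alpha = 0.1
initial_cost = cost(w, data)
for _ in range(500):
    w = [wj - alpha * gj for wj, gj in zip(w, gradient(w, data))]
final_cost = cost(w, data)
```

If the derivation is right, `analytic` and `numeric` agree to several decimal places and `final_cost` ends up below `initial_cost`.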
diff --git a/logistic_regression/binary.pdf b/logistic_regression/binary.pdf
new file mode 100644
index 0000000..a69a04f
--- /dev/null
+++ b/logistic_regression/binary.pdf
diff --git a/logistic_regression/binary.py b/logistic_regression/binary.py
index a8a11e2..1c3f608 100755
--- a/logistic_regression/binary.py
+++ b/logistic_regression/binary.py
@@ -97,6 +97,10 @@ scatter=plt.scatter(x_1,x_2,c=np.round(h(x_1,x_2)),marker="o")
 handles, labels = scatter.legend_elements(prop="colors", alpha=0.6)
 legend = ax.legend(handles, ["Class A","Class B"], loc="upper right", title="Legend")
 
+x=np.arange(0,10,0.2)
+plt.plot([1,2],[2,2])
+
 # Save
 plt.tight_layout()
-plt.savefig("binary.png",dpi=300)
+#plt.savefig("binary.png",dpi=300)
+plt.show()
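The placeholder `plt.plot([1,2],[2,2])` added above could later be replaced by the decision boundary of the model. This is only a sketch of one possible approach, under the assumption that the boundary is the line where the sigmoid output equals 0.5, that is $w_1 + w_2x_1 + w_3x_2 = 0$. The weights below are hypothetical stand-ins for the ones learned by binary.py, and `boundary_x2` is an illustrative helper name, not a function from the repository.

```python
import math

def sigmoid(z):
    """Logistic function 1 / (1 + e^-z)."""
    return 1.0 / (1.0 + math.exp(-z))

def boundary_x2(w, x1):
    """x2 on the line w[0] + w[1]*x1 + w[2]*x2 = 0 (requires w[2] != 0)."""
    return -(w[0] + w[1] * x1) / w[2]

# Hypothetical weights standing in for the ones learned by binary.py.
w = [-6.0, 1.0, 1.0]

# Same grid as np.arange(0, 10, 0.2), written without numpy.
xs = [0.2 * k for k in range(50)]
ys = [boundary_x2(w, x1) for x1 in xs]

# With matplotlib available, plt.plot(xs, ys) would draw this line over the scatter.
```

Every point `(xs[k], ys[k])` lies where the model output is 0.5, so the line separates the region predicted as one class from the region predicted as the other.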
diff --git a/logistic_regression/sigmoid.png b/logistic_regression/sigmoid.png
new file mode 100644
index 0000000..a7b79c4
--- /dev/null
+++ b/logistic_regression/sigmoid.png