Machine Learning

Linear Regression

Input space: $x$

Output space: $y \in \mathbb{R}$

Model space: $f: x \rightarrow y$

$f(x) = b + w_1x_1 + w_2x_2 + \cdots + w_nx_n = b + w \cdot x$
$Loss(f) = \sum\limits_{i = 1}^N(f(x^i) - y^i)^2$

Fundamental problem of ML: $\mathop{\min}\limits_{f} Loss(f) = \mathop{\min}\limits_{b, w}\sum\limits_{i = 1}^N(f(x^i) - y^i)^2$

$Loss(w, b) = \sum\limits_{i=1}^{N}(wx^i + b - y^i)^2$
$\frac{\partial L}{\partial b} = \sum\limits_{i=1}^N(wx^i + b - y^i) = 0$
  • $w\sum\limits_i x^i + bN - \sum\limits_{i}y^i = 0$
$\frac{\partial L}{\partial w} = \sum\limits_{i=1}^N x^i(wx^i + b - y^i) = 0$
  • $w\sum\limits_{i} (x^i)^2 + b\sum\limits_{i}x^i - \sum\limits_{i}x^iy^i = 0$
$\bar x = \frac{1}{N}\sum x^i, \quad \bar y = \frac{1}{N}\sum y^i, \quad b = \bar y - w \bar x$
  • $w \bar x + b - \bar y = 0$
  • $w\overline{x^2} + b \bar x - \overline{xy} = 0$
$w\overline{x^2} + (\bar y - w \bar x)\bar x - \overline{xy} = 0$
$w\overline{x^2} + \bar x \bar y - w (\bar x)^2 - \overline{xy} = 0$
$w(\overline{x^2} - (\bar x)^2) = \overline{xy} - \bar x \bar y$
$w^* = \frac{\overline{xy} - \bar x \bar y}{\overline{x^2} - (\bar x)^2}$
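
A minimal NumPy sketch of this closed-form solution (the synthetic data, noise level, and variable names such as `w_star` are illustrative assumptions, not from the notes):

```python
import numpy as np

# Synthetic 1-D data: y ≈ 2x + 1 plus noise (illustrative values)
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=100)
y = 2 * x + 1 + rng.normal(0, 0.5, size=100)

# Closed-form least-squares estimates:
#   w* = (mean(xy) - mean(x) mean(y)) / (mean(x^2) - mean(x)^2)
#   b* = mean(y) - w* mean(x)
w_star = (np.mean(x * y) - x.mean() * y.mean()) / (np.mean(x ** 2) - x.mean() ** 2)
b_star = y.mean() - w_star * x.mean()

print(w_star, b_star)  # should come out close to 2 and 1
```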

Polynomial Regression

Still Linear

Treating $x, x^2, \cdots$ as distinct independent variables
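
A sketch of this idea under assumed choices (degree 3, `np.vander` for the feature columns, `np.linalg.lstsq` as the solver); the model stays linear in the weights even though the features are powers of $x$:

```python
import numpy as np

# Degree-3 polynomial regression treated as ordinary linear regression
# over the feature columns [1, x, x^2, x^3].
rng = np.random.default_rng(0)
x = rng.uniform(-2, 2, size=200)
y = 0.5 * x ** 3 - x + 2 + rng.normal(0, 0.1, size=200)

degree = 3
X = np.vander(x, N=degree + 1, increasing=True)  # columns: x^0, x^1, x^2, x^3

# Least squares over the expanded features: still linear in the weights.
coeffs, *_ = np.linalg.lstsq(X, y, rcond=None)
print(coeffs)  # roughly [2, -1, 0, 0.5]
```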

Evaluation

  1. MAE: Mean Absolute Error
  • $MAE = \frac{1}{N}\sum\limits_{i=1}^N|f(x^i) - y^i|$
  2. MSE: Mean Squared Error
  • $MSE = \frac{1}{N}\sum\limits_{i=1}^N(f(x^i) - y^i)^2$
  3. RMSE: Root Mean Squared Error
  • $RMSE = \sqrt{MSE}$
  4. MAPE: Mean Absolute Percentage Error
  • $MAPE = \frac{1}{N}\sum\limits_{i=1}^N\frac{|f(x^i) - y^i|}{y^i}$
  5. $R^2$ Score
  • $R^2 = 1 - \frac{\sum\limits_{i=1}^N(f(x^i) - y^i)^2}{\sum\limits_{i=1}^N(y^i - \bar y)^2}$
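
A sketch of these regression metrics in NumPy (the function name and example numbers are illustrative; MAPE here assumes positive, nonzero targets):

```python
import numpy as np

def regression_metrics(y_true, y_pred):
    """Compute MAE, MSE, RMSE, MAPE and R^2 for 1-D arrays."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    err = y_pred - y_true
    mae = np.mean(np.abs(err))
    mse = np.mean(err ** 2)
    rmse = np.sqrt(mse)
    mape = np.mean(np.abs(err) / y_true)  # as in the notes; assumes positive targets
    r2 = 1 - np.sum(err ** 2) / np.sum((y_true - y_true.mean()) ** 2)
    return {"MAE": mae, "MSE": mse, "RMSE": rmse, "MAPE": mape, "R2": r2}

print(regression_metrics([3.0, 5.0, 2.5], [2.8, 5.3, 2.4]))
```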

Optimization

Objective function: $F: x \rightarrow \mathbb{R}$

Find $x$ to maximize or minimize $F$

A local optimum occurs where $F'(x) = 0$

$\frac{\partial{F}}{\partial{x_i}} = 0 \rightarrow \text{solve for } x^* = (x_1^*, x_2^*, \cdots, x_n^*)$
$\nabla_{x} F = 0$

To find a local optimum numerically, iterate:

$x_0: \text{some guess}$
$x_{t+1} = x_t - \alpha_t F'(x_t)$
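
A minimal 1-D sketch of this iteration; the objective $F(x) = (x - 3)^2$ and the step size are assumptions for illustration:

```python
# Minimize F(x) = (x - 3)^2, whose derivative is F'(x) = 2(x - 3).
x = 0.0        # some guess
alpha = 0.1    # assumed step size

for t in range(100):
    x = x - alpha * 2 * (x - 3)   # x_{t+1} = x_t - alpha * F'(x_t)

print(x)  # converges toward the minimizer x* = 3
```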

Gradient Descent

$x(t+1) = x(t) - \alpha\nabla F(x(t))$
  • Stochastic Gradient Descent (SGD)
    • Calculate the gradient using a small random subset of the observations
    • $w(0): \text{some guess}, \quad b(0): \text{some guess}$
    • $x(t+1) = x(t) - \alpha\left[\frac{1}{k}\sum\limits_{j=1}^{k}\nabla F_{i_j}(x(t))\right]$, $k: \text{batch size}$
    • At update $t+1$:
      • pick $i$ at random
      • $w(t+1) = w(t) - \alpha \nabla_w Loss_i(w(t), b(t))$
      • $b(t+1) = b(t) - \alpha \frac{\partial Loss_i}{\partial b}(w(t), b(t))$
$\frac{\partial Loss_i}{\partial w_k} = \frac{\partial}{\partial w_k}\left[(b + w \cdot x^i - y^i)^2\right] = 2x_k^i(b + w \cdot x^i - y^i)$
$\frac{\partial Loss_i}{\partial b} = \frac{\partial}{\partial b}\left[(b + w \cdot x^i - y^i)^2\right] = 2(b + w \cdot x^i - y^i)$
$\nabla_w Loss_i = 2(b + w \cdot x^i - y^i)\, x^i$
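
A sketch of SGD (batch size 1) for the 1-D linear model using these per-sample gradients; the data, learning rate, and step count are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(0, 5, size=500)
y = 3 * x - 2 + rng.normal(0, 0.3, size=500)

w, b = 0.0, 0.0    # some guess
alpha = 0.01       # assumed learning rate

for t in range(10_000):
    i = rng.integers(len(x))          # pick i at random
    residual = b + w * x[i] - y[i]
    grad_w = 2 * residual * x[i]      # dLoss_i / dw
    grad_b = 2 * residual             # dLoss_i / db
    w -= alpha * grad_w
    b -= alpha * grad_b

print(w, b)  # should approach 3 and -2
```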

Logistic Regression

Input: $x$

Output: $y \in \{0, 1\}$

Model: $F(x) = P(x \text{ is in class } 1)$

$\text{sigmoid}(z) = \frac{1}{1 + e^{-z}}$
$F(x) = \text{sigmoid}(b + w_1x_1 + w_2x_2 + \cdots + w_nx_n) = \frac{1}{1 + e^{-(b + w \cdot x)}}$
Maximize the log-likelihood: $\mathop{\max}\limits_{F} \sum\limits_{i=1}^{N} \ln\left[{F(x^i)}^{y^i}\,{(1 - F(x^i))}^{(1 - y^i)}\right]$
Equivalently, minimize the cross-entropy loss: $\mathop{\min}\limits_{F} \sum\limits_{i=1}^{N} \left[-y^i \ln F(x^i) - (1 - y^i)\ln(1 - F(x^i))\right]$
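
A sketch of the sigmoid model and this cross-entropy loss; the clipping constant and example data are assumptions added for numerical safety and illustration:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def logistic_loss(w, b, X, y):
    """Sum over samples of -y ln F(x) - (1 - y) ln(1 - F(x))."""
    p = sigmoid(X @ w + b)
    p = np.clip(p, 1e-12, 1 - 1e-12)  # avoid log(0)
    return np.sum(-y * np.log(p) - (1 - y) * np.log(1 - p))

# Tiny example: 4 samples with 2 features each (illustrative numbers)
X = np.array([[0.5, 1.0], [1.5, -0.2], [-1.0, 0.3], [2.0, 2.0]])
y = np.array([1, 1, 0, 1])
w = np.zeros(2)
print(logistic_loss(w, 0.0, X, y))  # equals 4 * ln(2) ≈ 2.77 at w = 0, b = 0
```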

For Classification

Input: $x$

Output: $y \in \{-1, +1\}$

$\text{Softmax}(v_1, v_2, \cdots, v_k)_j = \frac{e^{v_j}}{\sum\limits_{l=1}^{k} e^{v_l}}$

Softmax Regression

$F(x) = \text{Softmax}(b_1 + w^1 \cdot x, b_2 + w^2 \cdot x, \cdots, b_c + w^c \cdot x)$

Loss: Cross entropy loss
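
A sketch of softmax regression's forward pass and the cross-entropy loss; the weight shapes and example numbers are assumptions for illustration:

```python
import numpy as np

def softmax(v):
    """Row-wise softmax of a score matrix of shape (N, C)."""
    v = v - v.max(axis=1, keepdims=True)  # subtract the max for numerical stability
    e = np.exp(v)
    return e / e.sum(axis=1, keepdims=True)

def cross_entropy(probs, labels):
    """Mean of -ln p(correct class) over the samples."""
    return -np.mean(np.log(probs[np.arange(len(labels)), labels]))

# 3 classes, 2 features, 4 samples (illustrative)
W = np.array([[1.0, -1.0], [0.0, 0.5], [-0.5, 1.0]])  # one weight row w^c per class
b = np.zeros(3)
X = np.random.default_rng(0).normal(size=(4, 2))
y = np.array([0, 2, 1, 0])

probs = softmax(X @ W.T + b)  # F(x) = Softmax(b_1 + w^1·x, ..., b_c + w^c·x)
print(cross_entropy(probs, y))
```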

Metrics

$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$
$\text{Precision} = \frac{TP}{TP + FP}$
$\text{Recall} = \frac{TP}{TP + FN}$
$F_{\beta}\text{ score} = (1 + \beta^2)\,\frac{\text{Precision} \cdot \text{Recall}}{\beta^2 \cdot \text{Precision} + \text{Recall}}$
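
A sketch that computes these metrics from 0/1 predictions (the function and variable names are illustrative; it assumes at least one predicted and one actual positive so the denominators are nonzero):

```python
import numpy as np

def classification_metrics(y_true, y_pred, beta=1.0):
    """Accuracy, precision, recall and F_beta for 0/1 labels."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    tp = np.sum((y_pred == 1) & (y_true == 1))
    tn = np.sum((y_pred == 0) & (y_true == 0))
    fp = np.sum((y_pred == 1) & (y_true == 0))
    fn = np.sum((y_pred == 0) & (y_true == 1))
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f_beta = (1 + beta ** 2) * precision * recall / (beta ** 2 * precision + recall)
    return accuracy, precision, recall, f_beta

print(classification_metrics([1, 0, 1, 1, 0, 1], [1, 0, 0, 1, 1, 1]))
```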

Deep Learning

Simple Models

  • Linear Regression

    • $f(x) = w \cdot x + b$
  • Perceptron [hard classifier]

    • $f(x) = \sigma(w \cdot x + b)$
  • Logistic Regression

    • $f(x) = \text{sigmoid}(w \cdot x + b)$
  • Multiclass Classification

    • $f(x) = \text{Softmax}(w \cdot x + b)$

Complex Model (Neural Networks)

$f(x) = \sigma(w_3\,\sigma(w_2\,\sigma(w_1 x + b_1) + b_2) + b_3)$
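
A sketch of this three-layer composition as a forward pass; the layer widths, random initialization, and the ReLU/sigmoid choices are illustrative assumptions:

```python
import numpy as np

def relu(z):
    return np.maximum(0.0, z)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)

# f(x) = sigma(w3 sigma(w2 sigma(w1 x + b1) + b2) + b3), layer sizes 2 -> 8 -> 8 -> 1
W1, b1 = rng.normal(size=(8, 2)), np.zeros(8)
W2, b2 = rng.normal(size=(8, 8)), np.zeros(8)
W3, b3 = rng.normal(size=(1, 8)), np.zeros(1)

def forward(x):
    h1 = relu(W1 @ x + b1)        # first hidden layer
    h2 = relu(W2 @ h1 + b2)       # second hidden layer
    return sigmoid(W3 @ h2 + b3)  # output layer

print(forward(np.array([0.5, -1.0])))
```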

Activation Function

$\sigma(w_1 x + b_1)$

Examples:

  • $\sigma(z) = \tanh(z)$
  • $\sigma(z) = \text{sigmoid}(z)$
  • $\sigma(z) = \text{ReLU}(z)$
  • $\sigma(z) = \text{Leaky ReLU}(z)$
  • $\sigma(z) = \text{Softmax}(z)$

Activation functions introduce non-linearity into the model.
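
A sketch of the listed activation functions (the leaky-ReLU slope of 0.01 is a common but assumed default):

```python
import numpy as np

def tanh(z):
    return np.tanh(z)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def relu(z):
    return np.maximum(0.0, z)

def leaky_relu(z, slope=0.01):
    return np.where(z > 0, z, slope * z)

def softmax(z):
    e = np.exp(z - np.max(z))  # stabilize before exponentiating
    return e / e.sum()

z = np.array([-2.0, 0.0, 3.0])
for f in (tanh, sigmoid, relu, leaky_relu, softmax):
    print(f.__name__, f(z))
```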