Machine Learning
Linear Regression
Input space: $x$
Output space: $y \in \mathbb{R}$
Model space: $f: x \to y$
$f(x) = b + w_1 x_1 + w_2 x_2 + \cdots + w_n x_n = b + w \cdot x$
$\text{Loss}(f) = \sum_{i=1}^{N} (f(x_i) - y_i)^2$
Fundamental problem of ML: $\min_f \text{Loss}(f) = \min_{b,w} \sum_{i=1}^{N} (f(x_i) - y_i)^2$
$\text{Loss}(w, b) = \sum_{i=1}^{N} (w x_i + b - y_i)^2$
$\frac{\partial L}{\partial b} = \sum_{i=1}^{N} (w x_i + b - y_i) = 0$
- $w \sum_i x_i + bN - \sum_i y_i = 0$
$\frac{\partial L}{\partial w} = \sum_{i=1}^{N} x_i (w x_i + b - y_i) = 0$
- $w \sum_i x_i^2 + b \sum_i x_i - \sum_i x_i y_i = 0$
$\bar{x} = \frac{1}{N} \sum_i x_i,\quad \bar{y} = \frac{1}{N} \sum_i y_i,\quad b = \bar{y} - w\bar{x}$
- $w\bar{x} + b - \bar{y} = 0$
- $w\,\overline{x^2} + b\bar{x} - \overline{xy} = 0$
Substituting $b = \bar{y} - w\bar{x}$: $w\,\overline{x^2} + (\bar{y} - w\bar{x})\,\bar{x} - \overline{xy} = 0$
$w\,\overline{x^2} + \bar{x}\bar{y} - w(\bar{x})^2 - \overline{xy} = 0$
$w\left(\overline{x^2} - (\bar{x})^2\right) = \overline{xy} - \bar{x}\bar{y}$
$w^* = \frac{\overline{xy} - \bar{x}\bar{y}}{\overline{x^2} - (\bar{x})^2}$
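A minimal NumPy sketch of this closed-form solution; the function name fit_simple_linear and the toy data are illustrative assumptions, not from the notes:

```python
import numpy as np

def fit_simple_linear(x, y):
    """Closed-form 1-D least squares: w* = (mean(xy) - mean(x)*mean(y)) / (mean(x^2) - mean(x)^2)."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    w = (np.mean(x * y) - x.mean() * y.mean()) / (np.mean(x**2) - x.mean() ** 2)
    b = y.mean() - w * x.mean()          # b = y_bar - w * x_bar
    return w, b

# Toy usage: points lying near y = 2x + 1
x = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([1.1, 2.9, 5.2, 6.8])
w, b = fit_simple_linear(x, y)
print(w, b)   # close to the underlying slope 2 and intercept 1
```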
Polynomial Regression
Still linear in the parameters
Treat $x, x^2, \cdots$ as distinct independent variables (features)
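A short sketch of this idea, assuming plain least squares (np.linalg.lstsq) is used for the fit; the degree, data, and helper name polynomial_features are made up for illustration:

```python
import numpy as np

def polynomial_features(x, degree):
    """Stack x, x^2, ..., x^degree as separate columns (the model stays linear in the weights)."""
    return np.column_stack([x**d for d in range(1, degree + 1)])

x = np.linspace(-2, 2, 50)
y = 0.5 * x**3 - x + np.random.default_rng(0).normal(scale=0.1, size=x.shape)

X = polynomial_features(x, degree=3)
X_with_bias = np.column_stack([np.ones_like(x), X])
coeffs, *_ = np.linalg.lstsq(X_with_bias, y, rcond=None)  # [b, w1, w2, w3]
print(coeffs)
```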
Evaluation
- MAE: Mean Absolute Error
  - $\text{MAE} = \frac{1}{N}\sum_{i=1}^{N} |f(x_i) - y_i|$
- MSE: Mean Squared Error
  - $\text{MSE} = \frac{1}{N}\sum_{i=1}^{N} (f(x_i) - y_i)^2$
- RMSE: Root Mean Squared Error
  - $\text{RMSE} = \sqrt{\text{MSE}}$
- MAPE: Mean Absolute Percentage Error
  - $\text{MAPE} = \frac{1}{N}\sum_{i=1}^{N} \frac{|f(x_i) - y_i|}{|y_i|}$
- $R^2$ Score
  - $R^2 = 1 - \frac{\sum_{i=1}^{N} (f(x_i) - y_i)^2}{\sum_{i=1}^{N} (y_i - \bar{y})^2}$
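A possible NumPy implementation of the metrics above; the helper name regression_metrics and the toy numbers are assumptions, and MAPE here divides by $|y_i|$ so it assumes no zero targets:

```python
import numpy as np

def regression_metrics(y_true, y_pred):
    """Compute MAE, MSE, RMSE, MAPE and R^2 as defined above."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    err = y_pred - y_true
    mae  = np.mean(np.abs(err))
    mse  = np.mean(err**2)
    rmse = np.sqrt(mse)
    mape = np.mean(np.abs(err) / np.abs(y_true))
    r2   = 1 - np.sum(err**2) / np.sum((y_true - y_true.mean())**2)
    return {"MAE": mae, "MSE": mse, "RMSE": rmse, "MAPE": mape, "R2": r2}

print(regression_metrics([3.0, 5.0, 7.0], [2.5, 5.5, 7.5]))
```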
Optimization
Objective function: $F(x) \to \mathbb{R}$
Find $x$ to maximize or minimize $F$
A local optimum occurs where $F'(x) = 0$
$\frac{\partial F}{\partial x_i} = 0 \;\to\;$ solve for $x^* = (x_1^*, x_2^*, \cdots, x_n^*)$
$\nabla_x F = 0$
To find a local optimum numerically, iterate:
$x^{(0)}$: some guess
$x^{(t+1)} = x^{(t)} - \alpha F'(x^{(t)})$
Gradient Descent
$x^{(t+1)} = x^{(t)} - \alpha \nabla F(x^{(t)})$
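A minimal NumPy sketch of this update rule; the objective, its gradient, and the function name gradient_descent are illustrative assumptions:

```python
import numpy as np

def gradient_descent(grad_F, x0, alpha=0.1, steps=100):
    """Iterate x_{t+1} = x_t - alpha * grad F(x_t)."""
    x = np.asarray(x0, dtype=float)
    for _ in range(steps):
        x = x - alpha * grad_F(x)
    return x

# Example objective F(x) = ||x - c||^2 with known minimum at c (illustrative only)
c = np.array([1.0, -2.0])
grad_F = lambda x: 2 * (x - c)
print(gradient_descent(grad_F, x0=[0.0, 0.0]))  # converges toward [1, -2]
```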
- Stochastic Gradient Descent (SGD)
  - Calculate the gradient using a small random subset (mini-batch) of the observations
  - $w^{(0)}$: some guess, $b^{(0)}$: some guess
  - $x^{(t+1)} = x^{(t)} - \alpha \left[\frac{1}{k}\sum_{j=1}^{k} \nabla F_{i_j}(x^{(t)})\right]$, $k$: batch size
  - At update $t+1$:
    - pick $i$ at random
    - $w^{(t+1)} = w^{(t)} - \alpha \nabla_w \text{Loss}_i(w^{(t)}, b^{(t)})$
    - $b^{(t+1)} = b^{(t)} - \alpha \frac{\partial \text{Loss}_i}{\partial b}(w^{(t)}, b^{(t)})$
$\frac{\partial \text{Loss}_i}{\partial w_k} = \frac{\partial}{\partial w_k}\left[(b + w \cdot x_i - y_i)^2\right] = 2\, x_{i,k}\, (b + w \cdot x_i - y_i)$
$\frac{\partial \text{Loss}_i}{\partial b} = \frac{\partial}{\partial b}\left[(b + w \cdot x_i - y_i)^2\right] = 2\,(b + w \cdot x_i - y_i)$
$\nabla_w \text{Loss}_i = 2\,(b + w \cdot x_i - y_i)\, x_i$
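A rough sketch of SGD for linear regression using the per-sample gradients derived above; the function name, learning rate, epoch count, and toy data are all illustrative assumptions:

```python
import numpy as np

def sgd_linear_regression(X, y, alpha=0.01, epochs=50, seed=0):
    """Minimize sum_i (b + w.x_i - y_i)^2 one sample at a time,
    using grad_w = 2*(b + w.x_i - y_i)*x_i and grad_b = 2*(b + w.x_i - y_i)."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w, b = np.zeros(d), 0.0                 # w^(0), b^(0): some guess
    for _ in range(epochs):
        for i in rng.permutation(n):        # visit samples in random order
            residual = b + X[i] @ w - y[i]
            w -= alpha * 2 * residual * X[i]
            b -= alpha * 2 * residual
    return w, b

# Toy usage on data generated from y = 3*x1 - 2*x2 + 1
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 2))
y = X @ np.array([3.0, -2.0]) + 1.0
print(sgd_linear_regression(X, y))  # w close to [3, -2], b close to 1
```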
Logistic Regression
Input: $x$
Output: $y \in \{0, 1\}$
Model: $F(x) = P(x \text{ is in class } 1)$
$\text{Sigmoid}(z) = \frac{1}{1 + e^{-z}}$
$F(x) = \text{sigmoid}(b + w_1 x_1 + w_2 x_2 + \cdots + w_n x_n) = \frac{1}{1 + e^{-(b + w \cdot x)}}$
Maximize the log-likelihood: $\max_F \sum_{i=1}^{N} \ln\left[F(x_i)^{y_i}\,(1 - F(x_i))^{1 - y_i}\right]$
Equivalently, minimize the cross-entropy: $\min_F \sum_{i=1}^{N} \left[-y_i \ln F(x_i) - (1 - y_i)\ln(1 - F(x_i))\right]$
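A small sketch of the sigmoid and this cross-entropy objective; the weights, data, and the clipping constant eps are made-up illustrations:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def logistic_loss(w, b, X, y, eps=1e-12):
    """Negative log-likelihood: sum_i [-y_i ln F(x_i) - (1-y_i) ln(1-F(x_i))]."""
    p = sigmoid(X @ w + b)
    p = np.clip(p, eps, 1 - eps)            # avoid log(0)
    return np.sum(-y * np.log(p) - (1 - y) * np.log(1 - p))

# Tiny example with made-up weights and labels
X = np.array([[0.5, 1.0], [-1.0, 0.3], [2.0, -0.5]])
y = np.array([1, 0, 1])
print(logistic_loss(np.array([1.0, -0.5]), 0.1, X, y))
```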
For Classification
Input: $x$
Output: $y \in \{-1, +1\}$
$\text{Softmax}(v_1, v_2, \cdots, v_k)_j = \frac{e^{v_j}}{\sum_{i=1}^{k} e^{v_i}}$
Softmax Regression
$F(x) = \text{softmax}(b_1 + w_1 \cdot x,\; b_2 + w_2 \cdot x,\; \cdots,\; b_c + w_c \cdot x)$
Loss: Cross entropy loss
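A minimal sketch of the softmax regression forward pass and a per-example cross-entropy loss, assuming made-up weights and a 3-class toy setup:

```python
import numpy as np

def softmax(v):
    """Softmax(v)_j = exp(v_j) / sum_i exp(v_i), shifted for numerical stability."""
    z = v - np.max(v, axis=-1, keepdims=True)
    e = np.exp(z)
    return e / np.sum(e, axis=-1, keepdims=True)

def softmax_regression_forward(W, b, x):
    """F(x) = softmax(b_1 + w_1.x, ..., b_c + w_c.x); W has one row of weights per class."""
    return softmax(W @ x + b)

def cross_entropy(probs, label):
    """Cross-entropy loss for a single example with an integer class label."""
    return -np.log(probs[label])

# Toy usage: 3 classes, 2 features (weights are made up)
W = np.array([[1.0, -1.0], [0.5, 0.5], [-1.0, 2.0]])
b = np.array([0.0, 0.1, -0.1])
x = np.array([0.3, 0.7])
p = softmax_regression_forward(W, b, x)
print(p, cross_entropy(p, label=2))
```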
Metrics
$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$
$\text{Precision} = \frac{TP}{TP + FP}$
$\text{Recall} = \frac{TP}{TP + FN}$
$F_\beta \text{ score} = (1 + \beta^2)\,\frac{\text{Precision} \cdot \text{Recall}}{\beta^2 \cdot \text{Precision} + \text{Recall}}$
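A possible NumPy helper for these metrics on binary labels in {0, 1}; the function name and the toy predictions are assumptions, and no guard against zero denominators is included:

```python
import numpy as np

def binary_metrics(y_true, y_pred, beta=1.0):
    """Accuracy, precision, recall and F_beta from the confusion-matrix counts."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    tp = np.sum((y_pred == 1) & (y_true == 1))
    tn = np.sum((y_pred == 0) & (y_true == 0))
    fp = np.sum((y_pred == 1) & (y_true == 0))
    fn = np.sum((y_pred == 0) & (y_true == 1))
    accuracy  = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp)
    recall    = tp / (tp + fn)
    f_beta = (1 + beta**2) * precision * recall / (beta**2 * precision + recall)
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "F_beta": f_beta}

print(binary_metrics([1, 0, 1, 1, 0], [1, 0, 0, 1, 1]))
```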
Deep Learning
Simple Models
Complex Models (Neural Networks)
$f(x) = \sigma(w_3\,\sigma(w_2\,\sigma(w_1 x + b_1) + b_2) + b_3)$
Activation Function
$\sigma(w_1 x + b_1)$
Examples:
- $\sigma(z) = \tanh(z)$
- $\sigma(z) = \text{sigmoid}(z)$
- $\sigma(z) = \text{ReLU}(z)$
- $\sigma(z) = \text{Leaky ReLU}(z)$
- $\sigma(z) = \text{Softmax}(z)$
Activation functions introduce non-linearity into the model
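A small NumPy sketch of such a stacked model with interchangeable activations; the layer sizes, random weights, and function names are illustrative assumptions:

```python
import numpy as np

def relu(z):
    return np.maximum(0.0, z)

def mlp_forward(x, layers, activation=np.tanh):
    """Apply sigma(W h + b) layer by layer, as in f(x) = sigma(w3 sigma(w2 sigma(w1 x + b1) + b2) + b3)."""
    h = x
    for W, b in layers:
        h = activation(W @ h + b)   # non-linearity applied after each affine map
    return h

# Tiny 2-2-1 network with made-up weights, evaluated once with tanh and once with ReLU
rng = np.random.default_rng(0)
layers = [(rng.normal(size=(2, 2)), np.zeros(2)),
          (rng.normal(size=(2, 2)), np.zeros(2)),
          (rng.normal(size=(1, 2)), np.zeros(1))]
x = np.array([0.5, -1.0])
print(mlp_forward(x, layers, activation=np.tanh))
print(mlp_forward(x, layers, activation=relu))
```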