Machine Learning and Algorithms

林嶔 (Lin, Chin)

Lesson 7 Artificial Intelligence Fundamentals 4 (Logistic Regression and Data Science Research Design)

Section 1: Introduction to Logistic Regression (1)

  1. Propose a prediction function

  2. Propose a loss function

  3. Optimize it with gradient descent (this can also be called model training)

– The prediction equation of logistic regression:

\[lp_i= log(\frac{{p_i}}{1-p_i}) = b_{0} + b_{1}x_i\]

– where \(p_i\) is the predicted probability that sample \(i\) is positive. Rewriting this equation slightly gives the standard form:

\[p_i = \frac{{1}}{1+e^{-lp_i}} = \frac{{1}}{1+e^{-b_{0} - b_{1}x_i}}\]
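– The prediction function is easy to write in R. A minimal sketch (the coefficient values -0.3 and 0.02 are arbitrary, chosen only for illustration):

sigmoid <- function(lp) {
  # maps any real-valued linear predictor to a probability between 0 and 1
  1 / (1 + exp(-lp))
}

age_demo <- c(40, 60, 80)
sigmoid(-0.3 + 0.02 * age_demo)   # predicted probabilities for three example ages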

Section 1: Introduction to Logistic Regression (2)

– Please download the example data here

# read the example ECG data (the file is encoded in CP950)
dat <- read.csv("ECG_train.csv", header = TRUE, fileEncoding = 'CP950', stringsAsFactors = FALSE, na.strings = "")

– The most common types of outcome Y are:

  1. Continuous variable - the errors are assumed to be normally distributed (this is linear regression)

  2. Binary variable - the probability is assumed to follow the logistic distribution (this is logistic regression)

x <- dat[,"AGE"]   # predictor: age
y <- dat[,"LVD"]   # binary outcome: LVD
model <- glm(y ~ x, family = 'binomial')   # fit the logistic regression
model
## 
## Call:  glm(formula = y ~ x, family = "binomial")
## 
## Coefficients:
## (Intercept)            x  
##   -0.308954     0.001626  
## 
## Degrees of Freedom: 2111 Total (i.e. Null);  2110 Residual
##   (2888 observations deleted due to missingness)
## Null Deviance:       2907 
## Residual Deviance: 2906  AIC: 2910

Section 1: Introduction to Logistic Regression (3)

– Substituting the fitted coefficients back into the prediction equation gives:

\[log(\frac{{p}}{1-p}) = -0.308954 + 0.001626 \times AGE\]

\[p = \frac{{1}}{1+e^{0.308954 - 0.001626 \times AGE}}\]

– If a person's AGE is 60, the predicted probability of LVD is 0.4473474:

\[p = 0.4473474 = \frac{{1}}{1+e^{0.308954 - 0.001626 \times 60}}\]

– Applying the same formula, if a person's AGE is 80, the probability of LVD is 0.4554004, and if the AGE is 100, the probability is 0.4634767:

\[p = 0.4554004 = \frac{{1}}{1+e^{0.308954 - 0.001626 \times 80}}\]

\[p = 0.4634767 = \frac{{1}}{1+e^{0.308954 - 0.001626 \times 100}}\]
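– These hand calculations can be reproduced in R from the model object fitted above; a small sketch (the ages 60, 80, and 100 are just the examples used here):

new_age <- c(60, 80, 100)

# manual calculation with the fitted coefficients
1 / (1 + exp(-(coef(model)[1] + coef(model)[2] * new_age)))

# the same probabilities via predict(); type = "response" returns probabilities
predict(model, newdata = data.frame(x = new_age), type = "response")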

Section 1: Introduction to Logistic Regression (4)

– Next we need a loss function that measures the difference between the observed outcome \(y\) and the predicted probability \(p\):

\[loss = diff(y, p)\]

– This loss function is not easy to specify directly, so let us borrow the loss function of simple linear regression, which is the sum of squared residuals, and rewrite the expression as:

\[loss = diff(y, p) = \frac{{1}}{2n}\sum \limits_{i=1}^{n} \left(y_{i} - p_{i}\right)^{2}\]

\[loss = \frac{{1}}{2n} \sum \limits_{i=1}^{n} \left(y_{i} - \frac{{1}}{1+e^{-b_{0} - b_{1}x_{i}}}\right)^{2}\]
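– Written out in R, this loss is a one-liner. A tiny sketch on made-up data (x_demo, y_demo), evaluating it at two arbitrary coefficient pairs; the pair that fits these data better yields the smaller value:

x_demo <- c(1, 2, 3, 4, 5)
y_demo <- c(0, 0, 1, 1, 1)

sq.loss <- function(b0, b1, x, y) {
  p <- 1 / (1 + exp(-b0 - b1 * x))    # predicted probabilities
  sum((y - p)^2) / (2 * length(x))    # 1/(2n) * sum of squared differences
}

sq.loss(b0 = 0, b1 = 0, x = x_demo, y = y_demo)
sq.loss(b0 = -2, b1 = 1, x = x_demo, y = y_demo)   # a better fit for these data, so a smaller loss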

Section 1: Introduction to Logistic Regression (5)

– To run gradient descent we need the partial derivatives of the loss with respect to \(b_0\) and \(b_1\). This derivation is somewhat involved, so let us first list a few differentiation rules that will help:

  1. Chain rule

\[\frac{\partial}{\partial x}h(x) = \frac{\partial}{\partial x}f(g(x)) = \frac{\partial}{\partial g(x)}f(g(x)) \cdot\frac{\partial}{\partial x}g(x)\]

  2. Quotient rule

\[\frac{\partial}{\partial x}\frac{{f(x)}}{g(x)} = \frac{{g(x) \cdot \frac{\partial}{\partial x} f(x)} - {f(x) \cdot \frac{\partial}{\partial x} g(x)}}{g(x)^2}\]

  3. Derivative of the exponential function

\[\frac{\partial}{\partial x} e^x = e^x\]

  4. Derivative of the sigmoid (S-shaped) function

\[ \begin{align} \frac{\partial}{\partial x}S(x) & = \frac{\partial}{\partial x}\frac{{1}}{1+e^{-x}} \\ & = \frac{\partial}{\partial (1+e^{-x})}\frac{{1}}{1+e^{-x}} \cdot \frac{\partial}{\partial x}(1+e^{-x}) \\ & = \frac{-1}{(1+e^{-x})^2} \cdot (-e^{-x}) \\ & = \frac{e^{-x}}{(1+e^{-x})^2} \\ & = \frac{1}{1+e^{-x}} \cdot (1 - \frac{1}{1+e^{-x}}) \\ & = S(x)(1-S(x)) \end{align} \]
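– A quick numerical sanity check of the last identity in R, comparing a central-difference approximation of the derivative with \(S(x)(1-S(x))\) (the point x0 = 0.7 and step h are arbitrary choices for illustration):

S <- function(x) 1 / (1 + exp(-x))
x0 <- 0.7
h <- 1e-6

(S(x0 + h) - S(x0 - h)) / (2 * h)   # numerical derivative of S at x0
S(x0) * (1 - S(x0))                 # analytic form; the two values should agree closely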

Partial derivative with respect to \(b_0\):

\[ \begin{align} \frac{\partial}{\partial b_{0}} loss & = \frac{\partial}{\partial p} diff(y, p) \cdot \frac{\partial}{\partial b_{0}} p \\ & = \frac{{1}}{2n}\sum \limits_{i=1}^{n} \frac{\partial}{\partial p_i} \left(y_{i} - p_{i}\right)^{2} \cdot \frac{\partial}{\partial lp_i} \frac{{1}}{1+e^{-lp_i}} \cdot \frac{\partial}{\partial b_{0}} lp_i \\ & = \frac{{1}}{n}\sum \limits_{i=1}^{n} \left(p_{i} - y_{i} \right) \cdot p_{i} \cdot (1 - p_{i}) \cdot \frac{\partial}{\partial b_{0}} (b_{0} + b_{1}x_i) \\ & = \frac{{1}}{n} \sum \limits_{i=1}^{n} \left(p_{i} - y_{i}\right) \cdot p_{i} \cdot (1 - p_{i}) \end{align} \]

Partial derivative with respect to \(b_1\) (derivation omitted):

\[ \begin{align} \frac{\partial}{\partial b_{1}} loss & = \frac{{1}}{n}\sum \limits_{i=1}^{n} \left(p_{i} - y_{i} \right) \cdot p_{i} \cdot (1 - p_{i}) \cdot \frac{\partial}{\partial b_{1}} (b_{0} + b_{1}x_i) \\ & = \frac{{1}}{n} \sum \limits_{i=1}^{n} \left(p_{i} - y_{i}\right) \cdot p_{i} \cdot (1 - p_{i}) \cdot x_i \end{align} \]
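– These two formulas can also be verified numerically. The sketch below uses a small made-up data set (x_toy, y_toy) and an arbitrary point (b0, b1) = (0.1, -0.2), and compares the analytic gradients with central-difference approximations of the loss:

x_toy <- c(1, 2, 3, 4)
y_toy <- c(0, 1, 0, 1)
b0 <- 0.1
b1 <- -0.2

p_toy <- 1 / (1 + exp(-b0 - b1 * x_toy))

# analytic gradients from the formulas above
grad_b0 <- mean((p_toy - y_toy) * p_toy * (1 - p_toy))
grad_b1 <- mean((p_toy - y_toy) * p_toy * (1 - p_toy) * x_toy)

# central-difference approximations of the same derivatives
loss_toy <- function(b0, b1) {
  p <- 1 / (1 + exp(-b0 - b1 * x_toy))
  sum((y_toy - p)^2) / (2 * length(x_toy))
}
h <- 1e-6
c(grad_b0, (loss_toy(b0 + h, b1) - loss_toy(b0 - h, b1)) / (2 * h))
c(grad_b1, (loss_toy(b0, b1 + h) - loss_toy(b0, b1 - h)) / (2 * h))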

Exercise 1: Solving logistic regression with gradient descent

The prediction, loss, and gradient functions are given below; write the gradient descent loop that estimates \(b_0\) and \(b_1\) for the following data, and compare your result with the glm() reference.

x <- 1:10 
y <- c(0, 0, 1, 0, 1, 0, 1, 0, 1, 1)

pred.fun <- function(b0, b1, x = x) {
  # predicted probability p_i = 1 / (1 + exp(-(b0 + b1 * x_i)))
  p = 1 / (1 + exp(- b0 - b1 * x))
  return(p)
}

loss.fun <- function(b0, b1, x = x, y = y) {
  # squared-error loss: 1/(2n) * sum((y_i - p_i)^2)
  p = pred.fun(b0 = b0, b1 = b1, x = x)
  loss = 1/(2*length(x)) * sum((y - p)^2)
  return(loss)
}

differential.fun.b0 <- function(b0, b1, x = x, y = y) {
  # partial derivative of the loss with respect to b0
  p = pred.fun(b0 = b0, b1 = b1, x = x)
  return(-sum((y - p)*p*(1-p))/length(x))
}

differential.fun.b1 <- function(b0, b1, x = x, y = y) {
  # partial derivative of the loss with respect to b1
  p = pred.fun(b0 = b0, b1 = b1, x = x)
  return(-sum((y - p)*p*(1-p)*x)/length(x))
}
# reference answer from glm() for comparison
model <- glm(y ~ x, family = 'binomial')
model
## 
## Call:  glm(formula = y ~ x, family = "binomial")
## 
## Coefficients:
## (Intercept)            x  
##     -1.9957       0.3629  
## 
## Degrees of Freedom: 9 Total (i.e. Null);  8 Residual
## Null Deviance:       13.86 
## Residual Deviance: 11.67     AIC: 15.67
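– Before writing the optimization loop, it can help to call the given functions once at the starting point (b0, b1) = (0, 0) to confirm that they return a finite loss and finite gradients; a minimal check:

loss.fun(b0 = 0, b1 = 0, x = x, y = y)
differential.fun.b0(b0 = 0, b1 = 0, x = x, y = y)
differential.fun.b1(b0 = 0, b1 = 0, x = x, y = y)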

Answer to Exercise 1

num.iteration <- 2000                 # number of gradient descent iterations
lr <- 0.1                             # learning rate
ans_b0 <- rep(0, num.iteration)       # trajectory of b0, starting from 0
ans_b1 <- rep(0, num.iteration)       # trajectory of b1, starting from 0

for (i in 2:num.iteration) {
  # simultaneous update: both gradients are evaluated at the previous estimates
  ans_b0[i] <- ans_b0[i-1] - lr * differential.fun.b0(b0 = ans_b0[i-1], b1 = ans_b1[i-1], x = x, y = y)
  ans_b1[i] <- ans_b1[i-1] - lr * differential.fun.b1(b0 = ans_b0[i-1], b1 = ans_b1[i-1], x = x, y = y)
}

print(tail(ans_b0, 1))

## [1] -1.539271

print(tail(ans_b1, 1))

## [1] 0.2869881
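– Note that these estimates differ from the glm() reference values (-1.9957 and 0.3629). One reason is that glm() maximizes the binomial likelihood, whereas the loss minimized here is the mean squared error between \(y\) and \(p\), so the two optima need not coincide; plain gradient descent may also not be fully converged after 2000 iterations. A small sketch for tracking the loss across iterations (reusing loss.fun, ans_b0, ans_b1, x, and y from above):

loss_path <- sapply(seq_along(ans_b0), function(i) loss.fun(b0 = ans_b0[i], b1 = ans_b1[i], x = x, y = y))
plot(loss_path, type = "l", xlab = "iteration", ylab = "loss")   # the curve should flatten as the estimates stabilize
tail(loss_path, 1)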