Machine Learning and Algorithms

林嶔 (Lin, Chin)

Lesson 14: Overview of Machine Learning 5 (The Linear Support Vector Machine)

Section 1: Introduction to the Linear Support Vector Machine (1)

– The hope was for a method that could perform non-linear feature engineering automatically.

F01

– We first introduce the linear SVM. It was created in 1963 to perform linear separation (the same task as logistic regression); the "kernel trick", which uses this method to handle feature spaces that are not linearly separable in their original form, was only developed after 1992.

Section 1: Introduction to the Linear Support Vector Machine (2)

– In a 2-dimensional plane, we want to find a line that perfectly separates the red points from the blue points:

x1 = c(0, 2, 2, 3)
x2 = c(0, 2, 0, 0)
y = c(1, 1, -1, -1)
plot(x1, x2, col = y + 3, pch = 19, cex = 3)

Section 1: Introduction to the Linear Support Vector Machine (3)

– Of course, we can see at a glance that many lines would easily separate these four points perfectly. For example, \(x_2 = -0.5 + 0.5 \times x_1\) easily splits the red and blue points:

plot(x1, x2, col =  y + 3, pch = 19, cex = 3)
abline(a = -0.5, b = 0.5, lwd = 2, lty = 1)

– But that is not enough. Even if the reason is not yet clear, you probably feel that the line \(x_2 = -1 + 1 \times x_1\) makes a better cut!

plot(x1, x2, col =  y + 3, pch = 19, cex = 3)
abline(a = -1, b = 1, lwd = 2, lty = 1)

Section 1: Introduction to the Linear Support Vector Machine (4)

plot(x1, x2, col =  y + 3, pch = 19, cex = 3)
abline(a = -1, b = 1, lwd = 2, lty = 1)
abline(a = 0, b = 1, lwd = 2, lty = 2)
abline(a = -2, b = 1, lwd = 2, lty = 2)

  1. We want to find a line that perfectly separates these points (the constraint condition)

  2. This line should be as far as possible from the closest points (the quantity to maximize)

Section 1: Introduction to the Linear Support Vector Machine (5)

– For such a line, \(w_0 = 1\), \(w_1 = -1\), and \(w_2 = 1\), which is the kind of solution we want in this example:

distance.func = function (x1, x2, w0, w1, w2) {
  # perpendicular distance from the point (x1, x2) to the line w0 + w1*x1 + w2*x2 = 0
  dist = 1/sqrt(w1^2 + w2^2) * abs(w0 + w1 * x1 + w2 * x2)
  return(dist)
}

distance.func(0, 0, w0 = 1, w1 = -1, w2 = 1)
## [1] 0.7071068
distance.func(2, 2, w0 = 1, w1 = -1, w2 = 1)
## [1] 0.7071068
distance.func(2, 0, w0 = 1, w1 = -1, w2 = 1)
## [1] 0.7071068
distance.func(3, 0, w0 = 1, w1 = -1, w2 = 1)
## [1] 1.414214

Section 1: Introduction to the Linear Support Vector Machine (6)

distance.func = function (x1, x2, y, w0, w1, w2) {
  # label-adjusted (signed) distance: positive when the point lies on its correct side
  dist = 1/sqrt(w1^2 + w2^2) * y * (w0 + w1 * x1 + w2 * x2)
  return(dist)
}

distance.func(0, 0, 1, w0 = 1, w1 = -1, w2 = 1)
## [1] 0.7071068
distance.func(2, 2, 1, w0 = 1, w1 = -1, w2 = 1)
## [1] 0.7071068
distance.func(2, 0, -1, w0 = 1, w1 = -1, w2 = 1)
## [1] 0.7071068
distance.func(3, 0, -1, w0 = 1, w1 = -1, w2 = 1)
## [1] 1.414214

\[ \begin{align} \text{distance}(x_i, b, w) & = \frac{1}{||w||} y_i (w^Tx_i + b) \end{align} \]

Section 1: Introduction to the Linear Support Vector Machine (7)

– The distances of all the points can be expressed as:

\[ \begin{align} \text{distance}(x_1, b, w) & = \frac{1}{||w||} y_1 (w^Tx_1 + b) \\ \text{distance}(x_2, b, w) & = \frac{1}{||w||} y_2 (w^Tx_2 + b) \\ \cdots \\ \text{distance}(x_n, b, w) & = \frac{1}{||w||} y_n (w^Tx_n + b) \end{align} \]

– Therefore, the point at the smallest distance can be expressed as:

\[ \begin{align} \min \limits_{i=1, \cdots , n} \text{distance}(x_i, b, w) = \min \limits_{i=1, \cdots , n} \frac{1}{||w||} y_i (w^Tx_i + b) \end{align} \]

– What we want is to maximize the distance of the point that is at the minimum distance:

\[ \begin{align} max \left[ min \left[ \frac{1}{||w||} y_i (w^Tx_i + b) \right] \right] \end{align} \]
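– As a small added illustration of this max-min idea (margin.func below is just an illustrative name, not part of the original material), we can compute the minimum label-adjusted distance, i.e. the margin, for the two candidate lines above; the line \(x_2 = -1 + 1 \times x_1\) gives a margin of roughly 0.71 versus roughly 0.45 for the other, which is why it looked better:

margin.func = function (w0, w1, w2, x1, x2, y) {
  # the margin is the smallest label-adjusted distance over all points
  min(1/sqrt(w1^2 + w2^2) * y * (w0 + w1 * x1 + w2 * x2))
}
margin.func(w0 = 0.5, w1 = -0.5, w2 = 1, x1 = x1, x2 = x2, y = y)  # the line x2 = -0.5 + 0.5*x1
margin.func(w0 = 1, w1 = -1, w2 = 1, x1 = x1, x2 = x2, y = y)      # the line x2 = -1 + 1*x1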

Section 1: Introduction to the Linear Support Vector Machine (8)

distance.func(0, 0, 1, w0 = 1, w1 = -1, w2 = 1)
## [1] 0.7071068
distance.func(0, 0, 1, w0 = 3, w1 = -3, w2 = 3)
## [1] 0.7071068

– Since rescaling is meaningless, we can simply force the two dashed lines beside the earlier plot to be \(w_0 + w_1x_1 + w_2x_2 = 1\) and \(w_0 + w_1x_1 + w_2x_2 = -1\):

\[ \begin{align} w^Tx_i + b & = 1 \text{ or } -1 \\ y(w^Tx_i + b) & = 1 \end{align} \]

\[ min \left[ y(w^Tx_i + b) \right] = 1 \]

– Let us rearrange the problem we are solving:

\[ \begin{align} & max \frac{1}{||w||} \\ \text{subject to } & min \left[ y(w^Tx_i + b) \right] = 1 \end{align} \]

Section 1: Introduction to the Linear Support Vector Machine (9)

\[ \begin{align} & max \frac{1}{||w||} \\ \text{subject to } & y(w^Tx_i + b) \geq 1 \end{align} \]

– Suppose the minimum value really were greater than 1, say 1.2; then \(w\) and \(b\) could both be divided by that number, which would make \(\frac{1}{||w||}\) even larger.

– This rewriting is therefore valid: as long as \(\frac{1}{||w||}\) is being maximized, the constraint \(y(w^Tx_i + b) \geq 1\) yields the same solution as the constraint \(min \left[ y(w^Tx_i + b) \right] = 1\). Finally, maximizing \(\frac{1}{||w||}\) is equivalent to minimizing \(||w||\), and hence to minimizing \(\frac{1}{2}w^Tw\):

\[ \begin{align} & min \frac{1}{2}w^Tw \\ \text{subject to } & y(w^Tx_i + b) \geq 1 \end{align} \]

– With this formulation in place, let us try to plot the objective (\(loss\)) for different values of \(w\) while fixing \(b = 1\):

Section 1: Introduction to the Linear Support Vector Machine (10)

\[ \begin{align} & min \frac{1}{2}w^Tw \\ \text{subject to } & y(w^Tx_i + b) \geq 1 \end{align} \]

F02
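– A possible sketch of how a plot like F02 could be produced (an assumption about the figure's content: the objective \(\frac{1}{2}w^Tw\) over a grid of \((w_1, w_2)\) with \(b\) fixed at 1, with the feasible region marked and the feasible point of smallest objective highlighted):

w1.seq = seq(-3, 3, length.out = 100)
w2.seq = seq(-3, 3, length.out = 100)
grid = expand.grid(w1 = w1.seq, w2 = w2.seq)
loss = 0.5 * (grid$w1^2 + grid$w2^2)
# a grid point is feasible if every sample satisfies y * (1 + w1*x1 + w2*x2) >= 1
feasible = apply(grid, 1, function(w) all(y * (1 + w[1] * x1 + w[2] * x2) >= 1))
plot(grid$w1, grid$w2, col = ifelse(feasible, "gray", "white"), pch = 15, cex = 0.5, xlab = "w1", ylab = "w2")
points(grid$w1[feasible][which.min(loss[feasible])], grid$w2[feasible][which.min(loss[feasible])], col = "red", pch = 19, cex = 2)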

– The R package quadprog (through its function solve.QP) solves quadratic programs written in the following standard form, so we only need to map our problem onto it:

\[ \begin{align} & min \text{ } \frac{1}{2}b^TDb - d^Tb \\ \text{ subject to } & A^Tb \geq b_0 \end{align} \]

Section 1: Introduction to the Linear Support Vector Machine (11)

\[ \begin{align} & min \text{ } \frac{1}{2}w^Tw & \text{ subject to } y(w^Tx_i + b) \geq 1 \\ & min \text{ } \frac{1}{2}b^TDb - d^Tb & \text{ subject to } A^Tb \geq b_0 \end{align} \]

\[ \begin{align} b = \begin{pmatrix} b \\ w_1 \\ w_2 \end{pmatrix} \space D = \begin{pmatrix} 0 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix} \space d = \begin{pmatrix} 0 \\ 0 \\ 0 \end{pmatrix} \space A = \begin{pmatrix} y_1 & y_2 & y_3 & y_4 \\ y_1x_{1,1} & y_2x_{1,2} & y_3x_{1,3} & y_4x_{1,4} \\ y_1x_{2,1} & y_2x_{2,2} & y_3x_{2,3} & y_4x_{2,4} \end{pmatrix} \space b_0 = \begin{pmatrix} 1 \\ 1 \\ 1 \\ 1 \end{pmatrix} \end{align} \]

library(quadprog)
n.sample = 4
n.weight = 3 
small.value = 5e-6
# solve.QP needs a positive definite Dmat, so the zero entries of D are replaced by a tiny value
D.matrix = matrix(small.value, nrow = n.weight, ncol = n.weight)
diag(D.matrix) = 1
D.matrix[1,1] = small.value
# each column of A.matrix encodes one constraint y_i * (b + w1*x1_i + w2*x2_i) >= 1
A.matrix = rbind(rep(1, n.sample)*y, x1*y, x2*y)
fit = solve.QP(Dmat = D.matrix, dvec = rep(0, n.weight), Amat = A.matrix, bvec = rep(1, n.sample))
fit$solution
## [1]  1 -1  1
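– As a quick added check (not part of the original slides), every point should satisfy \(y(w^Tx_i + b) \geq 1\), with equality exactly on the support vectors:

w = fit$solution
y * (w[1] + w[2] * x1 + w[3] * x2)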

Section 1: Introduction to the Linear Support Vector Machine (12)

– Once the solution is obtained, we can use it to draw the lines:

COEF = fit$solution            # the fitted (w0, w1, w2)
A0 = -COEF[1]/COEF[3]          # intercept of the decision line  w0 + w1*x1 + w2*x2 = 0
A1 = A0 + 1/COEF[3]            # intercept of the margin line    w0 + w1*x1 + w2*x2 = 1
A2 = A0 - 1/COEF[3]            # intercept of the margin line    w0 + w1*x1 + w2*x2 = -1
B = -COEF[2]/COEF[3]           # common slope of all three lines

plot(x1, x2, col =  y + 3, pch = 19, cex = 3)
abline(a = A0, b = B, lwd = 2, lty = 1)
abline(a = A1, b = B, lwd = 2, lty = 2)
abline(a = A2, b = B, lwd = 2, lty = 2)

– Let us wrap this up as a complete solver function:

# hard-margin linear SVM for two features, solved in its primal form with quadprog
mysvm = function (x1, x2, y) {
  
  require(quadprog)
  
  n.sample = length(x1)
  n.weight = 3
  small.value = 5e-6
  D.matrix = matrix(small.value, nrow = n.weight, ncol = n.weight);diag(D.matrix) = 1; D.matrix[1,1] = small.value
  A.matrix = rbind(rep(1, n.sample)*y, x1*y, x2*y)
  fit = solve.QP(Dmat = D.matrix, dvec = rep(0, n.weight), Amat = A.matrix, bvec = rep(1, n.sample))
  
  COEF = fit$solution
  A0 = -COEF[1]/COEF[3]
  A1 = A0 + 1/COEF[3]
  A2 = A0 - 1/COEF[3]
  B = -COEF[2]/COEF[3]
  
  plot(x1, x2, col =  y + 3, pch = 19)
  abline(a = A0, b = B, lwd = 2, lty = 1)
  abline(a = A1, b = B, lwd = 2, lty = 2)
  abline(a = A2, b = B, lwd = 2, lty = 2)
  
}

– Let us try it with different samples:

mysvm(x1 = c(0, 2, 2, 3),
      x2 = c(0, 2, 0, 0),
      y = c(1, 1, -1, -1))

mysvm(x1 = c(0, 2, 3, 4, 5),
      x2 = c(0, 2, 0, 0, 3),
      y = c(1, 1, -1, -1, -1))

Exercise 1: Getting a Better Feel for the Linear SVM

– Let us start the experiment with the simplest example data.

# first subset: setosa vs. versicolor; recode Species into labels -1 / +1
data(iris)
sub.iris <- iris[1:100,]
X1 <- sub.iris[,1]
X2 <- sub.iris[,2]
Y <- as.integer(sub.iris[,5]) * 2 - 3

mysvm(x1 = X1, x2 = X2, y = Y)

glm.model <- glm((Y>0)~X1+X2, family = "binomial")
COEF <- glm.model$coefficients
A <- -COEF[1]/COEF[3]
B <- -COEF[2]/COEF[3]

plot(X1, X2, col = Y + 3, pch = 19)
abline(a = A, b = B, col = "darkgreen")

# second subset: versicolor vs. virginica; try the same comparison with these labels
data(iris)
sub.iris <- iris[51:150,]
X1 <- sub.iris[,1]
X2 <- sub.iris[,2]
Y <- as.integer(sub.iris[,5]) * 2 - 5

Exercise 1: Answer

glm.model <- glm((Y>0)~X1+X2, family = "binomial")
COEF <- glm.model$coefficients
A <- -COEF[1]/COEF[3]
B <- -COEF[2]/COEF[3]

plot(X1, X2, col = Y + 3, pch = 19)
abline(a = A, b = B, col = "darkgreen")

mysvm(x1 = X1, x2 = X2, y = Y)

– Why? Because our derivation so far requires every point to satisfy \(y(w^Tx_i + b) \geq 1\); if no perfectly separating line exists, there is simply no solution!
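– A minimal way to confirm this (a small added sketch): wrapping the hard-margin solver in tryCatch shows that the quadratic program has no feasible solution for this subset.

tryCatch(mysvm(x1 = X1, x2 = X2, y = Y),
         error = function(e) "no feasible solution: the constraints cannot all be satisfied")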

– After all this work, don't you feel that the SVM is rather hard to use?

Section 2: The Dual Problem (1)

\[ \begin{align} & min \frac{1}{2}w^Tw \\ \text{subject to } \space \space \space & y(w^Tx_i + b) \geq 1 \space \space \space \text{ for all } i \end{align} \]

– Let us first apply a simple rearrangement to the constraint:

\[ \begin{align} 1 - y(w^Tx_i + b) \leq 0 \end{align} \]

– We can then rewrite the optimization problem as the following Lagrangian:

\[ \begin{align} L(b, w,\alpha_i) & = \frac{1}{2}w^Tw + \sum \limits_{i=1}^{n} \alpha_i (1 - y(w^Tx_i + b)) \end{align} \]

Section 2: The Dual Problem (2)

\[ \begin{align} L(b, w,\alpha_i) & = \frac{1}{2}w^Tw + \sum \limits_{i=1}^{n} \alpha_i (1 - y(w^Tx_i + b)) \end{align} \]

– Note that when the constraint \(y(w^Tx_i + b) \geq 1\) is satisfied, the Lagrange multipliers satisfy \(\alpha_i \geq 0\) while the term behind them satisfies \(1 - y(w^Tx_i + b) \leq 0\), so the added sum is necessarily non-positive. We can therefore rewrite our optimization objective as:

\[ \begin{align} \min \limits_{w, \space b} \left( \max \limits_{\alpha_i \geq 0} \left( L(b, w,\alpha_i) \right) \right) \end{align} \]
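– To see why this min-max form recovers the original constrained problem (a step that is implicit above), consider the inner maximization for a fixed \((w, b)\):

\[ \begin{align} \max \limits_{\alpha_i \geq 0} L(b, w,\alpha_i) = \begin{cases} \frac{1}{2}w^Tw & \text{if } y(w^Tx_i + b) \geq 1 \text{ for all } i \\ \infty & \text{otherwise} \end{cases} \end{align} \]

– When every constraint is satisfied, each term \(\alpha_i (1 - y(w^Tx_i + b))\) is maximized at 0; when some constraint is violated, the corresponding \(\alpha_i\) can grow without bound. Minimizing over \(w\) and \(b\) therefore automatically avoids the infeasible region and reproduces the original problem.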

Section 2: The Dual Problem (3)

\[ \begin{align} \min \limits_{w, \space b} \left( \max \limits_{\alpha_i \geq 0} \left( L(b, w,\alpha_i) \right) \right) \geq \min \limits_{w, \space b} \left( L(b, w,\hat{\alpha_i}) \right) \end{align} \]

– This should be easy to understand; after all, in the original expression we maximize over \(\alpha_i\).

– Following this reasoning, if we first pick an arbitrary set of \(\hat{\alpha_i}\), minimize the expression over \(w\) and \(b\), and only then search for the \(\hat{\alpha_i}\) that maximizes the result, the outcome still cannot exceed the original expression:

\[ \begin{align} \min \limits_{w, \space b} \left( \max \limits_{\alpha_i \geq 0} \left( L(b, w,\alpha_i) \right) \right) \geq \max \limits_{\hat{\alpha_i} \geq 0} \left( \min \limits_{w, \space b} \left( L(b, w,\hat{\alpha_i}) \right) \right) \end{align} \]

– If we can actually solve the new expression, its value will necessarily be smaller than or equal to that of the original. But aren't we trying to minimize the original anyway? In fact, for a convex quadratic program with linear constraints such as this one, strong duality holds and the two optima coincide, so we can switch our optimization objective to the new expression and obtain exactly the same answer.

– Now, we can solve this expression directly:

\[ \begin{align} \max \limits_{\hat{\alpha_i} \geq 0} \left( \min \limits_{w, \space b} \left( L(b, w,\hat{\alpha_i}) \right) \right) \end{align} \]

Section 2: The Dual Problem (4)

\[ \begin{align} L(b, w,\alpha_i) & = \frac{1}{2}w^Tw + \sum \limits_{i=1}^{n} \alpha_i (1 - y(w^Tx_i + b)) \\ & = \frac{1}{2}w^Tw + \sum \limits_{i=1}^{n} \left( \alpha_i - \alpha_i yw^Tx_i - \alpha_i y b \right) \\\\ \frac{\partial}{\partial b} L(b, w,\hat{\alpha_i}) & = - \sum \limits_{i=1}^{n} \alpha_i y = 0 \\ \frac{\partial}{\partial w} L(b, w,\hat{\alpha_i}) & = w - \sum \limits_{i=1}^{n} \alpha_i yx_i = 0 \end{align} \]

\[ \begin{align} \sum \limits_{i=1}^{n} \alpha_i y & = 0 \space \space \space \space w = \sum \limits_{i=1}^{n} \alpha_i yx_i \end{align} \]

\[ \begin{align} L(b, w,\alpha_i) & = \frac{1}{2}w^Tw + \sum \limits_{i=1}^{n} \left( \alpha_i - \alpha_i yw^Tx_i - \alpha_i y b \right) \\ & = \frac{1}{2}w^Tw + \sum \limits_{i=1}^{n} \alpha_i - \sum \limits_{i=1}^{n} \alpha_i yw^Tx_i - \sum \limits_{i=1}^{n} \alpha_i y b \\ & = \frac{1}{2}w^Tw + \sum \limits_{i=1}^{n} \alpha_i - w^T \sum \limits_{i=1}^{n} \alpha_i yx_i - b \sum \limits_{i=1}^{n} \alpha_i y\\ \left[ \because \sum \limits_{i=1}^{n} \alpha_i y = 0 \right] & = \frac{1}{2}w^Tw + \sum \limits_{i=1}^{n} \alpha_i - w^T \sum \limits_{i=1}^{n} \alpha_i yx_i \\ \left[ \because w = \sum \limits_{i=1}^{n} \alpha_i yx_i \right] & = \frac{1}{2}w^Tw + \sum \limits_{i=1}^{n} \alpha_i - w^T w \\ & = - \frac{1}{2}w^Tw + \sum \limits_{i=1}^{n} \alpha_i\\ & = - \frac{1}{2} \sum \limits_{i=1}^{n} \sum \limits_{j=1}^{n} \alpha_i \alpha_j y_i y_j x_i x_j + \sum \limits_{i=1}^{n} \alpha_i \end{align} \]

Section 2: The Dual Problem (5)

\[ \begin{align} \max \limits_{\hat{\alpha_i} \geq 0} \left( \min \limits_{w, \space b} \left( L(b, w,\alpha_i) \right) \right) & = \max \limits_{\hat{\alpha_i} \geq 0} \left( - \frac{1}{2} \sum \limits_{i=1}^{n} \sum \limits_{j=1}^{n} \alpha_i \alpha_j y_i y_j x_i x_j + \sum \limits_{i=1}^{n} \alpha_i \right) \end{align} \]

\[ \begin{align} min & \space \space \space \frac{1}{2} \sum \limits_{i=1}^{n} \sum \limits_{j=1}^{n} \alpha_i \alpha_j y_i y_j x_i x_j - \sum \limits_{i=1}^{n} \alpha_i \\ \text{subject to } & \space \space \space \sum \limits_{i=1}^{n} \alpha_i y_i = 0 \space \space \text{and} \space \alpha_i \geq 0 \space \space \space \text{ for all } i \end{align} \]

\[ \begin{align} min & \space \space \space \space \space \frac{1}{2}b^TDb - d^Tb \\ \text{ subject to} & \space \space \space \space \space A^Tb \geq b_0 \end{align} \]

\[ \begin{align} b = \begin{pmatrix} \alpha_1 \\ \alpha_2 \\ \alpha_3 \\ \alpha_4 \end{pmatrix} \space D = YY^T \circ (XX^T) = \begin{pmatrix} y_1y_1x_1x_1 & y_1y_2x_1x_2 & y_1y_3x_1x_3 & y_1y_4x_1x_4 \\ y_2y_1x_2x_1 & y_2y_2x_2x_2 & y_2y_3x_2x_3 & y_2y_4x_2x_4 \\ y_3y_1x_3x_1 & y_3y_2x_3x_2 & y_3y_3x_3x_3 & y_3y_4x_3x_4 \\ y_4y_1x_4x_1 & y_4y_2x_4x_2 & y_4y_3x_4x_3 & y_4y_4x_4x_4 \end{pmatrix} \space d = \begin{pmatrix} 1 \\ 1 \\ 1 \\ 1 \end{pmatrix} \\ \space A = \begin{pmatrix} y_1 & 1 & 0 & 0 & 0 \\ y_2 & 0 & 1 & 0 & 0 \\ y_3 & 0 & 0 & 1 & 0 \\ y_4 & 0 & 0 & 0 & 1 \end{pmatrix} \space b_0 = \begin{pmatrix} 0 \\ 0 \\ 0 \\ 0 \\ 0 \end{pmatrix} \end{align} \]

Section 2: The Dual Problem (6)

x1 = c(0, 2, 2, 3)
x2 = c(0, 2, 0, 0)
y = c(1, 1, -1, -1)

library(quadprog)

n.sample = 4
small.value = 5e-6

X = cbind(x1, x2)

# D = (y y^T) element-wise (X X^T); a small ridge keeps it positive definite for solve.QP
D.matrix = (y%*%t(y))*(X%*%t(X))
D.matrix = D.matrix + small.value*diag(n.sample)
# first column of A.matrix: the equality constraint sum(alpha_i * y_i) = 0 (meq = 1);
# the remaining columns: alpha_i >= 0
A.matrix = t(rbind(matrix(y, ncol = n.sample), diag(n.sample)))

fit = solve.QP(Dmat = D.matrix, dvec = rep(1, n.sample), Amat = A.matrix, bvec = rep(0, n.sample + 1), meq = 1, factorized = FALSE)
qpsol <- fit$solution
print(qpsol)
## [1] 0.4999981 0.4999981 0.9999963 0.0000000
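– As a quick added check, the solution should satisfy the equality constraint \(\sum \limits_{i=1}^{n} \alpha_i y_i = 0\), and only the three support vectors should have \(\alpha_i > 0\):

sum(qpsol * y)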

– Do not forget the condition we obtained while solving for the extremum: \(w = \sum \limits_{i=1}^{n} \alpha_i y_i x_i\).

– Also remember the constraint from the original SVM, \(y(w^Tx_i + b) \geq 1\). We do not care about the points where it is strictly greater than 1; the points where it equals 1 are the support vectors (\(\alpha_i > 0\)). A support vector therefore satisfies \(y(w^Tx_i + b) = 1\) (when rearranging, note that \(y \in \{1, -1\}\), so \(y = \frac{1}{y}\)), which gives us the following two relations:

\[ \begin{align} w & = \sum \limits_{i=1}^{n} \alpha_i y_i x_i \\ b & = y_i - w^Tx_i \space \space \space \text{ for all } \alpha_i > 0 \end{align} \]

– Note one point here: if a Lagrange multiplier equals 0, that point cannot be used when computing b, and it also has no effect at all on the w vector.

findCoefs <- function(a, y, X){
  nonzero <-  abs(a) > 5e-6                                           # keep only the support vectors (alpha_i > 0)
  W <- rowSums(sapply(which(nonzero), function(i) a[i]*y[i]*X[i,]))   # w = sum of alpha_i * y_i * x_i
  b <- mean(sapply(which(nonzero), function(i) y[i]-X[i,]%*%W))       # b = y_i - w'x_i, averaged over the support vectors
  result <- c(b, W)
  names(result) = c("w0", "w1", "w2")
  return(result)
}

coefs = findCoefs(qpsol, y, X)
print(coefs)
##         w0         w1         w2 
##  0.9999975 -0.9999963  0.9999963
A = -coefs[1]/coefs[3]
B = -coefs[2]/coefs[3]

plot(x1, x2, col =  y + 3, pch = 19)
abline(a = A, b = B, lwd = 2, lty = 1)

Section 2: The Dual Problem (7)

library(quadprog)

data(iris)
sub.iris <- iris[51:150,]
x1 <- sub.iris[,1]
x2 <- sub.iris[,2]
y <- as.integer(sub.iris[,5]) * 2 - 5

n.sample = 100
small.value = 5e-6

X = cbind(x1, x2)

D.matrix = (y%*%t(y))*(X%*%t(X))
D.matrix = D.matrix + small.value*diag(n.sample)
A.matrix = t(rbind(matrix(y, ncol = n.sample), diag(n.sample)))

fit = solve.QP(Dmat = D.matrix, dvec = rep(1, n.sample), Amat = A.matrix, bvec = rep(0, n.sample + 1), meq = 1, factorized = FALSE)
qpsol <- fit$solution

coefs = findCoefs(qpsol, y, X)

A = -coefs[1]/coefs[3]
B = -coefs[2]/coefs[3]

plot(x1, x2, col =  y + 3, pch = 19)
abline(a = A, b = B, lwd = 2, lty = 1)

– Keep this result in mind: in the next lesson we will introduce the soft-margin SVM and explain, mathematically, what is actually going on here.

Section 3: Building an SVM with a Package (1)

– Writing the functions ourselves is very helpful for learning the underlying principles, but it is not a good practical solution (why reinvent the wheel?). The package "e1071" is the most commonly used R package for SVM computation, so let us use it to classify iris.

– Note the parameter cost here: it is the price paid for violating \(y(w^Tx_i + b) \geq 1\). For now we do not allow any violation at all, so we set an extremely large cost; in the next lesson we will see how this parameter enters the SVM:

library(e1071)

x1 = c(0, 2, 2, 3)
x2 = c(0, 2, 0, 0)
y = c(1, 1, -1, -1)

svm.model = svm(y ~ x1 + x2, kernel = "linear", scale = FALSE, type = "C-classification", cost = 1e5)

– These are the "support vectors" (all the points with \(\alpha_i > 0\)):

svm.model$SV
##   x1 x2
## 1  0  0
## 2  2  2
## 3  2  0

– These are the Lagrange multipliers of each support vector multiplied by its label, \(\alpha_i y_i\):

svm.model$coefs
##      [,1]
## [1,]  0.5
## [2,]  0.5
## [3,] -1.0

– This is the negative of the intercept, \(-b\):

svm.model$rho
## [1] -1
W.vector = rowSums(sapply(1:length(svm.model$coefs), function(i) svm.model$coefs[i]*svm.model$SV[i,]))
w0 = -svm.model$rho
w1 = W.vector[1]
w2 = W.vector[2]
A0 = -w0/w2
A1 = A0 + 1/w2
A2 = A0 - 1/w2
B = -w1/w2

plot(x1, x2, col =  y + 3, pch = 19, cex = 3)
abline(a = A0, b = B, lwd = 2, lty = 1)
abline(a = A1, b = B, lwd = 2, lty = 2)
abline(a = A2, b = B, lwd = 2, lty = 2)

svm.model$decision.values
##   1/-1
## 1    1
## 2    1
## 3   -1
## 4   -2

Section 3: Building an SVM with a Package (2)

data(iris)
sub.iris = iris[1:100,c(1, 2, 5)]
sub.iris[,3] = as.factor(as.character(sub.iris[,3]))
X1 = sub.iris[,1]
X2 = sub.iris[,2]
Y = as.integer(sub.iris[,3])

svm.model = svm(Species ~ Sepal.Length + Sepal.Width, data = sub.iris, kernel = "linear", scale = FALSE, type = "C-classification", cost = 1e5)

W.vector = rowSums(sapply(1:length(svm.model$coefs), function(i) svm.model$coefs[i]*svm.model$SV[i,]))
w0 = -svm.model$rho
w1 = W.vector[1]
w2 = W.vector[2]

A0 = -w0/w2
A1 = A0 + 1/w2
A2 = A0 - 1/w2
B = -w1/w2

plot(X1, X2, col =  Y * 2 + 2, pch = 19)
abline(a = A0, b = B, lwd = 2, lty = 1)
abline(a = A1, b = B, lwd = 2, lty = 2)
abline(a = A2, b = B, lwd = 2, lty = 2)

predict(svm.model, data.frame(Sepal.Length = 5, Sepal.Width = 4))
##      1 
## setosa 
## Levels: setosa versicolor
predict(svm.model, data.frame(Sepal.Length = 6, Sepal.Width = 3))
##          1 
## versicolor 
## Levels: setosa versicolor
plot(svm.model, sub.iris)

Exercise 2: Reproducing the Package's Prediction Mechanism

– The key is how you obtain w0, w1, and w2.

– Hint: first think about how decision.values is computed.

data(iris)
sub.iris = iris[1:100,c(1, 2, 5)]
sub.iris[,3] = as.factor(as.character(sub.iris[,3]))
X1 = sub.iris[,1]
X2 = sub.iris[,2]
Y = as.integer(sub.iris[,3])

svm.model = svm(Species ~ Sepal.Length + Sepal.Width, data = sub.iris, kernel = "linear", scale = FALSE, type = "C-classification", cost = 1e5)

W.vector = rowSums(sapply(1:length(svm.model$coefs), function(i) svm.model$coefs[i]*svm.model$SV[i,]))
w0 = -svm.model$rho
w1 = W.vector[1]
w2 = W.vector[2]

predict(svm.model, data.frame(Sepal.Length = 5, Sepal.Width = 4), decision.values = TRUE)
##      1 
## setosa 
## attr(,"decision.values")
##   setosa/versicolor
## 1          6.791608
## Levels: setosa versicolor

Exercise 2: Answer

– Use the point with Sepal.Length = 5 and Sepal.Width = 4; computing \(w_1 \times 5 + w_2 \times 4 + w_0\) reproduces the decision value reported by predict above.

W.vector = rowSums(sapply(1:length(svm.model$coefs), function(i) svm.model$coefs[i]*svm.model$SV[i,]))
w0 = -svm.model$rho
w1 = W.vector[1]
w2 = W.vector[2]

w1 * 5 + w2 * 4 + w0
## Sepal.Length 
##     6.791608

Lesson Summary

– The support vector machine so far requires every point to satisfy \(y(w^Tx_i + b) \geq 1\), meaning we can only handle problems that admit a perfect linear separation. Logistic regression can produce an equivalent result for those problems, and it can additionally cope with data that cannot be perfectly separated.

– We still have not addressed projection into a high-dimensional space: for now, adding more variables still requires manual feature engineering, which is inconvenient.

– It is this key transformation that will make it possible to work with an "infinite-dimensional" feature space. In the next lesson we will continue with the extensions of the support vector machine, so you can appreciate why this powerful classifier dominated the field before the deep learning breakthrough arrived.

– Even today, the support vector machine is arguably still one of the most useful tools for analyzing numerical data.