Machine Learning and Algorithms

林嶔 (Lin, Chin)

Lesson 14: Introduction to Machine Learning 5 (Linear Support Vector Machines)

Section 1: Introduction to Linear Support Vector Machines (1)

– We would like a method that can perform non-linear feature engineering automatically.

(Figure F01)

– We begin with the linear SVM. It was invented in 1963 to perform linear separation (the same task as logistic regression); the "kernel trick", which extends this method to problems that are linearly inseparable in the original feature space, was not developed until 1992.

Section 1: Introduction to Linear Support Vector Machines (2)

– In a 2-dimensional plane, we want to find a line that perfectly separates the red points from the blue points.

x1 = c(0, 2, 2, 3)
x2 = c(0, 2, 0, 0)
y = c(1, 1, -1, -1)                           # class labels are +1 or -1
plot(x1, x2, col = y + 3, pch = 19, cex = 3)  # y + 3 maps -1/+1 to colors 2 (red) / 4 (blue)

Section 1: Introduction to Linear Support Vector Machines (3)

– Of course, we can see at a glance that many different lines can easily separate these four points perfectly. For example, \(x_2 = -0.5 + 0.5 \times x_1\) splits the red and blue points with no trouble.

plot(x1, x2, col =  y + 3, pch = 19, cex = 3)
abline(a = -0.5, b = 0.5, lwd = 2, lty = 1)

– But this is not enough: even though the reason is not yet clear, you probably feel that the line \(x_2 = -1 + 1 \times x_1\) makes a better cut!

plot(x1, x2, col =  y + 3, pch = 19, cex = 3)
abline(a = -1, b = 1, lwd = 2, lty = 1)
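
– To see why, here is a quick numeric check. The helper line.dist below is assumed for illustration (it is not part of the original slides); it computes each point's perpendicular distance to a line \(x_2 = a + b \times x_1\), and the second candidate keeps every point farther away:

# assumed helper: perpendicular distance from (x1, x2) to the line x2 = a + b * x1
line.dist = function (x1, x2, a, b) {
  abs(a + b * x1 - x2) / sqrt(b^2 + 1)
}

min(line.dist(x1, x2, a = -0.5, b = 0.5))  # about 0.447
min(line.dist(x1, x2, a = -1, b = 1))      # about 0.707: the closest point is farther away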

Section 1: Introduction to Linear Support Vector Machines (4)

plot(x1, x2, col =  y + 3, pch = 19, cex = 3)
abline(a = -1, b = 1, lwd = 2, lty = 1)  # the separating line
abline(a = 0, b = 1, lwd = 2, lty = 2)   # dashed lines through the closest points on each side
abline(a = -2, b = 1, lwd = 2, lty = 2)

  1. We want to find a line that perfectly separates these points (the constraint).

  2. Among all such lines, this line's distance to its closest point should be the largest (the quantity to maximize).

Section 1: Introduction to Linear Support Vector Machines (5)

– For this line, written in the form \(w_0 + w_1x_1 + w_2x_2 = 0\), the weights are \(w_0 = 1\), \(w_1 = -1\), and \(w_2 = 1\); this is the solution we want in this example.

# perpendicular distance from the point (x1, x2) to the line w0 + w1*x1 + w2*x2 = 0
distance.func = function (x1, x2, w0, w1, w2) {
  dist = 1/sqrt(w1^2 + w2^2) * abs(w0 + w1 * x1 + w2 * x2)
  return(dist)
}

distance.func(0, 0, w0 = 1, w1 = -1, w2 = 1)
## [1] 0.7071068
distance.func(2, 2, w0 = 1, w1 = -1, w2 = 1)
## [1] 0.7071068
distance.func(2, 0, w0 = 1, w1 = -1, w2 = 1)
## [1] 0.7071068
distance.func(3, 0, w0 = 1, w1 = -1, w2 = 1)
## [1] 1.414214

Section 1: Introduction to Linear Support Vector Machines (6)

# signed version: the distance is positive when the point lies on its correct side
distance.func = function (x1, x2, y, w0, w1, w2) {
  dist = 1/sqrt(w1^2 + w2^2) * y * (w0 + w1 * x1 + w2 * x2)
  return(dist)
}

distance.func(0, 0, 1, w0 = 1, w1 = -1, w2 = 1)
## [1] 0.7071068
distance.func(2, 2, 1, w0 = 1, w1 = -1, w2 = 1)
## [1] 0.7071068
distance.func(2, 0, -1, w0 = 1, w1 = -1, w2 = 1)
## [1] 0.7071068
distance.func(3, 0, -1, w0 = 1, w1 = -1, w2 = 1)
## [1] 1.414214

\[ \begin{align} \text{distance}(x_i, b, w) & = \frac{1}{||w||} y_i (w^Tx_i + b) \end{align} \]

Section 1: Introduction to Linear Support Vector Machines (7)

– The distances of all the points can be written out as follows:

\[ \begin{align} \text{distance}(x_1, b, w) & = \frac{1}{||w||} y_1 (w^Tx_1 + b) \\ \text{distance}(x_2, b, w) & = \frac{1}{||w||} y_2 (w^Tx_2 + b) \\ \cdots \\ \text{distance}(x_n, b, w) & = \frac{1}{||w||} y_n (w^Tx_n + b) \end{align} \]

– The smallest of these distances can therefore be written as:

\[ \begin{align} \min_{i=1, \cdots , n} \text{distance}(x_i, b, w) = \min_{i=1, \cdots , n} \frac{1}{||w||} y_i (w^Tx_i + b) \end{align} \]

– What we want is to maximize the distance of the point with the smallest distance:

\[ \begin{align} \max_{b, w} \left[ \min_{i=1, \cdots , n} \left[ \frac{1}{||w||} y_i (w^Tx_i + b) \right] \right] \end{align} \]
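
– As a small sketch (the helper min.dist is assumed for illustration, not part of the original slides), we can evaluate this max-min objective for candidate lines using the signed distance.func defined above:

# assumed helper: the smallest signed distance over all points for a candidate line
min.dist = function (w0, w1, w2) {
  min(distance.func(x1, x2, y, w0 = w0, w1 = w1, w2 = w2))
}

min.dist(w0 = 1, w1 = -1, w2 = 1)      # about 0.707
min.dist(w0 = 0.5, w1 = -0.5, w2 = 1)  # about 0.447: a separating line with a smaller margin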

Section 1: Introduction to Linear Support Vector Machines (8)

– Notice that multiplying \(w_0\), \(w_1\), and \(w_2\) by the same positive constant does not change the distance at all:

distance.func(0, 0, 1, w0 = 1, w1 = -1, w2 = 1)
## [1] 0.7071068
distance.func(0, 0, 1, w0 = 3, w1 = -3, w2 = 3)
## [1] 0.7071068

– Since such rescaling is meaningless, we may as well force the two dashed lines in the earlier figure to be \(w_0 + w_1x_1 + w_2x_2 = 1\) and \(w_0 + w_1x_1 + w_2x_2 = -1\):

\[ \begin{align} w^Tx_i + b & = 1 \text{ or } -1 \\ y_i(w^Tx_i + b) & = 1 \end{align} \]

\[ \min_{i=1, \cdots , n} \left[ y_i(w^Tx_i + b) \right] = 1 \]
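
– We can check this numerically: with \(w_0 = 1\), \(w_1 = -1\), \(w_2 = 1\), the smallest value of \(y_i(w^Tx_i + b)\) over our four points is exactly 1:

y * (1 - 1 * x1 + 1 * x2)  # the quantities y_i * (w0 + w1*x1 + w2*x2)
## [1] 1 1 1 2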

– Let us now reorganize the problem we are solving:

\[ \begin{align} & \max \frac{1}{||w||} \\ \text{subject to } & \min_{i=1, \cdots , n} \left[ y_i(w^Tx_i + b) \right] = 1 \end{align} \]

Section 1: Introduction to Linear Support Vector Machines (9)

– We can relax the equality constraint into an inequality:

\[ \begin{align} & \max \frac{1}{||w||} \\ \text{subject to } & y_i(w^Tx_i + b) \geq 1 \end{align} \]

– Suppose the smallest value were actually some number greater than 1, say 1.2; then \(w\) and \(b\) could both be divided by that number, which would make \(\frac{1}{||w||}\) even larger.
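
– A quick numeric illustration of this argument (the numbers are chosen purely for illustration):

w = c(-1.2, 1.2); b = 1.2              # a separator whose smallest value of y(w^T x + b) is 1.2
min(y * (b + w[1] * x1 + w[2] * x2))   # 1.2, indeed greater than 1
1 / sqrt(sum(w^2))                     # 1/||w|| is about 0.589
1 / sqrt(sum((w / 1.2)^2))             # dividing w (and b) by 1.2 raises it to about 0.707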

– This rewriting is therefore legitimate: as long as \(\frac{1}{||w||}\) is maximized, the constraint \(y_i(w^Tx_i + b) \geq 1\) is guaranteed to yield the same solution as the constraint \(\min_i \left[ y_i(w^Tx_i + b) \right] = 1\).

– Finally, maximizing \(\frac{1}{||w||}\) is equivalent to minimizing \(||w||\), and hence to minimizing \(\frac{1}{2}w^Tw\) (the \(\frac{1}{2}\) is purely for algebraic convenience), so the problem becomes:

\[ \begin{align} & \min \frac{1}{2}w^Tw \\ \text{subject to } & y_i(w^Tx_i + b) \geq 1 \end{align} \]

– With this formulation in place, let us try plotting the \(loss\) of the problem for different values of \(w\) when \(b = 1\):

Section 1: Introduction to Linear Support Vector Machines (10)

\[ \begin{align} & \min \frac{1}{2}w^Tw \\ \text{subject to } & y_i(w^Tx_i + b) \geq 1 \end{align} \]

(Figure F02)

– This is a standard quadratic programming (QP) problem. In R, the function solve.QP() from the quadprog package solves QPs written in the following standard form:

\[ \begin{align} & \min \text{ } \frac{1}{2}b^TDb - d^Tb \\ \text{ subject to } & A^Tb \geq b_0 \end{align} \]

Section 1: Introduction to Linear Support Vector Machines (11)

\[ \begin{align} & \min \text{ } \frac{1}{2}w^Tw & \text{ subject to } y_i(w^Tx_i + b) \geq 1 \\ & \min \text{ } \frac{1}{2}b^TDb - d^Tb & \text{ subject to } A^Tb \geq b_0 \end{align} \]

\[ \begin{align} b = \begin{pmatrix} b \\ w_1 \\ w_2 \end{pmatrix} \space D = \begin{pmatrix} 0 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix} \space d = \begin{pmatrix} 0 \\ 0 \\ 0 \end{pmatrix} \space A = \begin{pmatrix} y_1 & y_2 & y_3 & y_4 \\ y_1x_{1,1} & y_2x_{1,2} & y_3x_{1,3} & y_4x_{1,4} \\ y_1x_{2,1} & y_2x_{2,2} & y_3x_{2,3} & y_4x_{2,4} \end{pmatrix} \space b_0 = \begin{pmatrix} 1 \\ 1 \\ 1 \\ 1 \end{pmatrix} \end{align} \]

library(quadprog)

n.sample = 4                 # number of points
n.weight = 3                 # unknowns: b, w1, w2
small.value = 5e-6           # tiny ridge, since solve.QP requires D to be positive definite
D.matrix = matrix(small.value, nrow = n.weight, ncol = n.weight)
diag(D.matrix) = 1
D.matrix[1,1] = small.value  # the intercept b is not penalized
A.matrix = rbind(rep(1, n.sample)*y, x1*y, x2*y)
fit = solve.QP(Dmat = D.matrix, dvec = rep(0, n.weight), Amat = A.matrix, bvec = rep(1, n.sample))
fit$solution
## [1]  1 -1  1
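
– As a sanity check (not part of the original slides), the constraints \(A^Tb \geq b_0\) should all hold at the solution, with equality for the points sitting exactly on the margin:

t(A.matrix) %*% fit$solution  # approximately (1, 1, 1, 2): every constraint is satisfied,
                              # and the first three points are the support vectors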

Section 1: Introduction to Linear Support Vector Machines (12)

– Once the solution is found, we can use it to draw the lines:

COEF = fit$solution
A0 = -COEF[1]/COEF[3]   # intercept of the decision boundary
A1 = A0 + 1/COEF[3]     # intercept of the upper margin line
A2 = A0 - 1/COEF[3]     # intercept of the lower margin line
B = -COEF[2]/COEF[3]    # common slope of all three lines

plot(x1, x2, col =  y + 3, pch = 19, cex = 3)
abline(a = A0, b = B, lwd = 2, lty = 1)
abline(a = A1, b = B, lwd = 2, lty = 2)
abline(a = A2, b = B, lwd = 2, lty = 2)

– Let's wrap everything into a complete solver function:

mysvm = function (x1, x2, y) {
  
  require(quadprog)
  
  n.sample = length(x1)
  n.weight = 3
  small.value = 5e-6
  D.matrix = matrix(small.value, nrow = n.weight, ncol = n.weight)
  diag(D.matrix) = 1
  D.matrix[1,1] = small.value
  A.matrix = rbind(rep(1, n.sample)*y, x1*y, x2*y)
  fit = solve.QP(Dmat = D.matrix, dvec = rep(0, n.weight), Amat = A.matrix, bvec = rep(1, n.sample))
  
  COEF = fit$solution
  A0 = -COEF[1]/COEF[3]
  A1 = A0 + 1/COEF[3]
  A2 = A0 - 1/COEF[3]
  B = -COEF[2]/COEF[3]
  
  plot(x1, x2, col =  y + 3, pch = 19)
  abline(a = A0, b = B, lwd = 2, lty = 1)
  abline(a = A1, b = B, lwd = 2, lty = 2)
  abline(a = A2, b = B, lwd = 2, lty = 2)
  
}

– Let's try it on a few different samples:

mysvm(x1 = c(0, 2, 2, 3),
      x2 = c(0, 2, 0, 0),
      y = c(1, 1, -1, -1))

mysvm(x1 = c(0, 2, 3, 4, 5),
      x2 = c(0, 2, 0, 0, 3),
      y = c(1, 1, -1, -1, -1))
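
– The function above only draws the result. As a minimal sketch (assuming mysvm is modified to end with return(fit$solution) so that its coefficients are available), new points could then be classified by which side of the line they fall on:

# hypothetical helper: classify points by the sign of w0 + w1*x1 + w2*x2
svm.predict = function (COEF, x1, x2) {
  sign(COEF[1] + COEF[2] * x1 + COEF[3] * x2)
}

svm.predict(c(1, -1, 1), x1 = c(1, 3), x2 = c(1, 0))  # returns  1 -1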

Exercise 1: Getting to know the linear SVM better

– Let's begin our experiments with the simplest example dataset.

data(iris)
sub.iris <- iris[1:100,]                 # keep only the setosa and versicolor rows
X1 <- sub.iris[,1]                       # Sepal.Length
X2 <- sub.iris[,2]                       # Sepal.Width
Y <- as.integer(sub.iris[,5]) * 2 - 3    # recode the two species as -1 and +1

mysvm(x1 = X1, x2 = X2, y = Y)