林嶔 (Lin, Chin)
Lesson 12 Introduction to Machine Learning 3 (Backpropagation and the Multilayer Perceptron)
– Below is a function that fits a logistic regression on any 2-dimensional distribution and visualizes the resulting classification. Try different distributions to see how well logistic regression can separate the classes.
LOGISTIC_PLOT <- function (x1, x2, y, formula) {
  require(scales)
  require(plot3D)
  # Fit a logistic regression with the supplied formula
  model = glm(formula, family = 'binomial')
  # Build a 100 x 100 grid over the observed ranges of x1 and x2
  x1_seq = seq(min(x1), max(x1), length.out = 100)
  x2_seq = seq(min(x2), max(x2), length.out = 100)
  # Predicted probability at every grid point (inverse logit of the linear predictor)
  z_matrix = sapply(x2_seq, function(x) {1/(1+exp(-predict(model, data.frame(x1 = x1_seq, x2 = x))))})
  # Draw the probability surface and overlay the observed points
  image2D(z = z_matrix,
          x = x1_seq, xlab = 'x1',
          y = x2_seq, ylab = 'x2',
          shade = 0.2, rasterImage = TRUE,
          col = colorRampPalette(c("#FFA0A0", "#FFFFFF", "#A0A0FF"))(100))
  points(x1, x2, col = (y + 1)*2, pch = 19, cex = 0.5)
}
# Case 1: the true log-odds are linear in x1 and x2
set.seed(0)
x1 = rnorm(50, sd = 1)
x2 = rnorm(50, sd = 1)
lr1 = 1 + 0.5 * x1 + 2 * x2
y = lr1 > 0
LOGISTIC_PLOT(x1 = x1, x2 = x2, y = y, formula = y ~ x1 + x2)
# Case 2: the true log-odds contain an interaction, but the model is fitted without one
set.seed(0)
x1 = rnorm(50, sd = 1)
x2 = rnorm(50, sd = 1)
lr1 = 1 + 0.5 * x1 + 0.7 * x2 + 3 * x1 * x2
y = lr1 > 0
LOGISTIC_PLOT(x1 = x1, x2 = x2, y = y, formula = y ~ x1 + x2)
# Case 3: the true log-odds are quadratic; even adding the interaction term cannot capture them
set.seed(0)
x1 = rnorm(50, sd = 1)
x2 = rnorm(50, sd = 1)
lr1 = - 0.5 + 0.5 * x1^2 + 0.3 * x2^2 + 0.4 * x1 * x2
y = lr1 > 0
LOGISTIC_PLOT(x1 = x1, x2 = x2, y = y, formula = y ~ x1 + x2 + x1:x2)
– Linear regression
\[\hat{y} = f(x) = b_{0} + b_{1}x_1 + b_{2}x_2\]
– Logistic regression
\[log(\frac{{p}}{1-p}) = b_{0} + b_{1}x_1 + b_{2}x_2\]
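– Solving the log-odds equation for p gives the sigmoid form that will be used throughout the rest of this lesson:
\[p = \frac{1}{1 + e^{-(b_{0} + b_{1}x_1 + b_{2}x_2)}}\]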
– At this point biologists entered the field: if we can understand how the brain works, can we imitate its structure to perform classification?
– The structure of a nerve cell is shown below. Regardless of type, every neuron can be divided into a receptive zone, a trigger zone, a conducting zone, and an output zone.
The dendrites receive signals from the previous neuron; some of these inputs are inhibitory and some are excitatory. The neuron integrates these signals and then sends the result onward through the axon.
Based on this biological knowledge, let us use a computer to simulate a simple neuron.
\[ \begin{align} \mbox{weighted sum} & = w_{0} + w_{1}x_1 + w_{2}x_2 + \dots \\ \hat{y} & = step(\mbox{weighted sum}) \end{align} \]
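– A minimal R sketch of this artificial neuron (the weights and threshold below are illustrative, not taken from the lesson):
# A single artificial neuron: weighted sum of the inputs followed by a step activation
step_neuron <- function (x1, x2, w0 = -1, w1 = 0.5, w2 = 2) {
  weighted_sum <- w0 + w1 * x1 + w2 * x2
  as.integer(weighted_sum > 0)   # step function: fire (1) if the sum exceeds 0, otherwise 0
}
step_neuron(x1 = 1, x2 = 0.8)    # returns 1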
– Since this is just a simplified version of logistic regression, there is little more to say; we already know the limits of logistic regression very well.
– Rather than a complicated architecture, we will use the simplest structure below, which is called a Multilayer Perceptron (MLP):
\[L^k(x_1, x_2) = w_{0}^k + w_{1}^kx_1 + w_{2}^kx_2\]
\[ \begin{align} S(x) & = \frac{{1}}{1+e^{-x}} \end{align} \]
\[ \begin{align} h_1 & = S(L^1(x_1, x_2)) \\ h_2 & = S(L^2(x_1, x_2)) \\ o & = S(L^3(h_1, h_2)) \end{align} \]
\[ \begin{align} h_1 & = S(L^1(x_1, x_2)) \\ h_2 & = S(L^2(x_1, x_2)) \\ o & = S(L^3(h_1, h_2)) \\ loss & = CE(y, o) = \frac{{1}}{n}\sum \limits_{i=1}^{n} -\left(y_{i} \cdot log(o_{i}) + (1-y_{i}) \cdot log(1-o_{i})\right) \end{align} \]
– This kind of complicated differentiation actually follows a pattern. Let us write down a few important differentiation tools:
\[\frac{\partial}{\partial x}h(x) = \frac{\partial}{\partial x}f(g(x)) = \frac{\partial}{\partial g(x)}f(g(x)) \cdot\frac{\partial}{\partial x}g(x)\]
\[ \begin{align} \frac{\partial}{\partial x}S(x) & = S(x)(1-S(x)) \end{align} \]
\[ \begin{align} \frac{\partial}{\partial p_i}CE(y_i, p_i) & = \frac{p_i - y_i}{p_i(1-p_i)} \end{align} \]
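– A quick numerical sanity check of these two derivative formulas (a sketch; the test points x = 0.3, p = 0.7, and y = 1 are arbitrary):
# Check d/dx S(x) = S(x) * (1 - S(x)) with a central finite difference
S <- function (x) {1 / (1 + exp(-x))}
eps <- 1e-6
x <- 0.3
c(numeric = (S(x + eps) - S(x - eps)) / (2 * eps), analytic = S(x) * (1 - S(x)))
# Check d/dp CE(y, p) = (p - y) / (p * (1 - p)) for a single observation
CE <- function (y, p) {-(y * log(p) + (1 - y) * log(1 - p))}
y <- 1
p <- 0.7
c(numeric = (CE(y, p + eps) - CE(y, p - eps)) / (2 * eps), analytic = (p - y) / (p * (1 - p)))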
– First compute the part closer to the output (the deeper layer):
\[ \begin{align} \frac{\partial}{\partial w_{0}^3}CE(y, o) & = \frac{{1}}{n}\sum \limits_{i=1}^{n} \left(\frac{\partial}{\partial w_{0}^3}CE(y_i, o_i)\right) \\ & = \frac{{1}}{n}\sum \limits_{i=1}^{n} \left(\frac{\partial}{\partial o_i}CE(y_i, o_i) \cdot \frac{\partial}{\partial w_{0}^3}S(L^3(h_{1i}, h_{2i})) \right) \\ & = \frac{{1}}{n}\sum \limits_{i=1}^{n} \left(\frac{o_i-y_i}{o_i(1-o_i)} \cdot \frac{\partial}{\partial L^3(h_{1i}, h_{2i})}S(L^3(h_{1i}, h_{2i})) \cdot \frac{\partial}{\partial w_{0}^3}L^3(h_{1i}, h_{2i}) \right) \\ & = \frac{{1}}{n}\sum \limits_{i=1}^{n} \left(\frac{o_i-y_i}{o_i(1-o_i)} \cdot S(L^3(h_{1i}, h_{2i})) \cdot (1 - S(L^3(h_{1i}, h_{2i}))) \cdot \frac{\partial}{\partial w_{0}^3} \left( w_{0}^3 + w_{1}^3h_{1i} + w_{2}^3h_{2i} \right) \right) \\ & = \frac{{1}}{n}\sum \limits_{i=1}^{n} \left(\frac{o_i-y_i}{o_i(1-o_i)} \cdot o_i(1-o_i) \right) \\ & = \frac{{1}}{n}\sum \limits_{i=1}^{n} \left( o_i-y_i \right) \end{align} \]
\[ \begin{align} \frac{\partial}{\partial w_{1}^3}CE(y, o) & = \frac{{1}}{n}\sum \limits_{i=1}^{n} \left( o_i-y_i \right) \cdot h_{1i} \end{align} \]
\[ \begin{align} \frac{\partial}{\partial w_{2}^3}CE(y, o) & = \frac{{1}}{n}\sum \limits_{i=1}^{n}\left( o_i-y_i \right) \cdot h_{2i} \end{align} \]
– Then compute the part farther from the output (the shallower layer):
\[ \begin{align} \frac{\partial}{\partial w_{0}^2}CE(y, o) & = \frac{{1}}{n}\sum \limits_{i=1}^{n} \left( o_i-y_i \right) \cdot \frac{\partial}{\partial w_{0}^2} \left( w_{0}^3 + w_{1}^3h_{1i} + w_{2}^3h_{2i} \right) \\ & = \frac{{1}}{n}\sum \limits_{i=1}^{n} \left( o_i-y_i \right) \cdot w_{2}^3\cdot \frac{\partial}{\partial w_{0}^2} \left( h_{2i} \right) \\ & = \frac{{1}}{n}\sum \limits_{i=1}^{n} \left( o_i-y_i \right) \cdot w_{2}^3\cdot \frac{\partial}{\partial w_{0}^2} \left( S(L^2(x_{1i}, x_{2i})) \right) \\ & = \frac{{1}}{n}\sum \limits_{i=1}^{n} \left( o_i-y_i \right) \cdot w_{2}^3 \cdot \frac{\partial}{\partial L^2(x_{1i}, x_{2i})} S(L^2(x_{1i}, x_{2i})) \cdot \frac{\partial}{\partial w_{0}^2} L^2(x_{1i}, x_{2i}) \\ & = \frac{{1}}{n}\sum \limits_{i=1}^{n} \left( o_i-y_i \right) \cdot w_{2}^3 \cdot h_{2i} (1 - h_{2i}) \cdot \frac{\partial}{\partial w_{0}^2} (w_{0}^2 + w_{1}^2x_{1i} + w_{2}^2x_{2i}) \\ & = \frac{{1}}{n}\sum \limits_{i=1}^{n} \left( o_i-y_i \right) \cdot w_{2}^3 \cdot h_{2i} (1 - h_{2i}) \end{align} \]
\[ \begin{align} \frac{\partial}{\partial w_{1}^2}CE(y, o) & = \frac{{1}}{n}\sum \limits_{i=1}^{n} \left( o_i-y_i \right) \cdot w_{2}^3 \cdot h_{2i} (1 - h_{2i}) \cdot x_{1i} \end{align} \]
\[ \begin{align} \frac{\partial}{\partial w_{2}^2}CE(y, o) & = \frac{{1}}{n}\sum \limits_{i=1}^{n} \left( o_i-y_i \right) \cdot w_{2}^3 \cdot h_{2i} (1 - h_{2i}) \cdot x_{2i} \end{align} \]
\[ \begin{align} \frac{\partial}{\partial w_{0}^1}CE(y, o) & = \frac{{1}}{n}\sum \limits_{i=1}^{n} \left( o_i-y_i \right) \cdot w_{1}^3 \cdot h_{1i} (1 - h_{1i}) \end{align} \]
\[ \begin{align} \frac{\partial}{\partial w_{1}^1}CE(y, o) & = \frac{{1}}{n}\sum \limits_{i=1}^{n} \left( o_i-y_i \right) \cdot w_{1}^3 \cdot h_{1i} (1 - h_{1i}) \cdot x_{1i} \end{align} \]
\[ \begin{align} \frac{\partial}{\partial w_{2}^1}CE(y, o) & = \frac{{1}}{n}\sum \limits_{i=1}^{n} \left( o_i-y_i \right) \cdot w_{1}^3 \cdot h_{1i} (1 - h_{1i}) \cdot x_{2i} \end{align} \]
#Sample generation
set.seed(0)
x1 = rnorm(50, sd = 1)
x2 = rnorm(50, sd = 1)
lr1 = - 1.5 + x1^2 + x2^2
y = (lr1 > 0) + 0L   # convert the logical vector to 0/1 integer labels
#Forward
S.fun = function (x, eps = 1e-5) {
  # Sigmoid activation, clipped away from 0 and 1 for numerical stability
  S = 1/(1 + exp(-x))
  S[S < eps] = eps
  S[S > 1 - eps] = 1 - eps
  return(S)
}
h1.fun = function (w10, w11, w12, x1 = x1, x2 = x2) {
  # First hidden unit: linear combination followed by the sigmoid
  L1 = w10 + w11 * x1 + w12 * x2
  return(S.fun(L1))
}
h2.fun = function (w20, w21, w22, x1 = x1, x2 = x2) {
  # Second hidden unit
  L2 = w20 + w21 * x1 + w22 * x2
  return(S.fun(L2))
}
o.fun = function (w30, w31, w32, h1, h2) {
  # Output unit: takes the two hidden activations as inputs
  L3 = w30 + w31 * h1 + w32 * h2
  return(S.fun(L3))
}
loss.fun = function (o, y = y) {
  # Average cross-entropy over the n observations
  loss = -1/length(y) * sum(y * log(o) + (1 - y) * log(1 - o))
  return(loss)
}
#Backward
# Analytic gradients derived above; each returns the average gradient over the n observations
differential.fun.w30 = function(o, y = y) {
  return(-1/length(y)*sum(y-o))
}
differential.fun.w31 = function(o, h1, y = y) {
  return(-1/length(y)*sum((y-o)*h1))
}
differential.fun.w32 = function(o, h2, y = y) {
  return(-1/length(y)*sum((y-o)*h2))
}
differential.fun.w20 = function(o, h2, w32, y = y) {
  return(-1/length(y)*sum((y-o)*w32*h2*(1-h2)))
}
differential.fun.w21 = function(o, h2, w32, y = y, x1 = x1) {
  return(-1/length(y)*sum((y-o)*w32*h2*(1-h2)*x1))
}
differential.fun.w22 = function(o, h2, w32, y = y, x2 = x2) {
  return(-1/length(y)*sum((y-o)*w32*h2*(1-h2)*x2))
}
differential.fun.w10 = function(o, h1, w31, y = y) {
  return(-1/length(y)*sum((y-o)*w31*h1*(1-h1)))
}
differential.fun.w11 = function(o, h1, w31, y = y, x1 = x1) {
  return(-1/length(y)*sum((y-o)*w31*h1*(1-h1)*x1))
}
differential.fun.w12 = function(o, h1, w31, y = y, x2 = x2) {
  return(-1/length(y)*sum((y-o)*w31*h1*(1-h1)*x2))
}
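– Before running the full training loop, we can sanity-check one of the analytic gradients against a finite-difference approximation (a quick sketch, using the sample and the functions defined above; the weight vector below is arbitrary):
# Compare the analytic gradient for w30 with a central finite difference
w = c(0.2, -0.5, 0.3, -0.1, 0.4, -0.3, 0.1, 0.6, -0.2)
loss_at_w30 = function (w30) {
  H1 = h1.fun(w10 = w[1], w11 = w[2], w12 = w[3], x1 = x1, x2 = x2)
  H2 = h2.fun(w20 = w[4], w21 = w[5], w22 = w[6], x1 = x1, x2 = x2)
  O = o.fun(w30 = w30, w31 = w[8], w32 = w[9], h1 = H1, h2 = H2)
  return(loss.fun(o = O, y = y))
}
H1 = h1.fun(w10 = w[1], w11 = w[2], w12 = w[3], x1 = x1, x2 = x2)
H2 = h2.fun(w20 = w[4], w21 = w[5], w22 = w[6], x1 = x1, x2 = x2)
O = o.fun(w30 = w[7], w31 = w[8], w32 = w[9], h1 = H1, h2 = H2)
eps = 1e-6
c(analytic = differential.fun.w30(o = O, y = y),
  numeric = (loss_at_w30(w[7] + eps) - loss_at_w30(w[7] - eps)) / (2 * eps))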
num.iteration = 10000
lr = 0.1
W_matrix = matrix(0, nrow = num.iteration + 1, ncol = 9)
loss_seq = rep(0, num.iteration)
colnames(W_matrix) = c('w10', 'w11', 'w12', 'w20', 'w21', 'w22', 'w30', 'w31', 'w32')
#Start random values
W_matrix[1,] = rnorm(9, sd = 1)
for (i in 2:(num.iteration+1)) {
  #Forward
  current_H1 = h1.fun(w10 = W_matrix[i-1,1], w11 = W_matrix[i-1,2], w12 = W_matrix[i-1,3],
                      x1 = x1, x2 = x2)
  current_H2 = h2.fun(w20 = W_matrix[i-1,4], w21 = W_matrix[i-1,5], w22 = W_matrix[i-1,6],
                      x1 = x1, x2 = x2)
  current_O = o.fun(w30 = W_matrix[i-1,7], w31 = W_matrix[i-1,8], w32 = W_matrix[i-1,9],
                    h1 = current_H1, h2 = current_H2)
  loss_seq[i-1] = loss.fun(o = current_O, y = y)
  #Backward
  W_matrix[i,1] = W_matrix[i-1,1] - lr * differential.fun.w10(o = current_O, h1 = current_H1,
                                                              w31 = W_matrix[i-1,8], y = y)
  W_matrix[i,2] = W_matrix[i-1,2] - lr * differential.fun.w11(o = current_O, h1 = current_H1,
                                                              w31 = W_matrix[i-1,8], y = y, x1 = x1)
  W_matrix[i,3] = W_matrix[i-1,3] - lr * differential.fun.w12(o = current_O, h1 = current_H1,
                                                              w31 = W_matrix[i-1,8], y = y, x2 = x2)
  W_matrix[i,4] = W_matrix[i-1,4] - lr * differential.fun.w20(o = current_O, h2 = current_H2,
                                                              w32 = W_matrix[i-1,9], y = y)
  W_matrix[i,5] = W_matrix[i-1,5] - lr * differential.fun.w21(o = current_O, h2 = current_H2,
                                                              w32 = W_matrix[i-1,9], y = y, x1 = x1)
  W_matrix[i,6] = W_matrix[i-1,6] - lr * differential.fun.w22(o = current_O, h2 = current_H2,
                                                              w32 = W_matrix[i-1,9], y = y, x2 = x2)
  W_matrix[i,7] = W_matrix[i-1,7] - lr * differential.fun.w30(o = current_O, y = y)
  W_matrix[i,8] = W_matrix[i-1,8] - lr * differential.fun.w31(o = current_O, h1 = current_H1, y = y)
  W_matrix[i,9] = W_matrix[i-1,9] - lr * differential.fun.w32(o = current_O, h2 = current_H2, y = y)
}
require(scales)
require(plot3D)
x1_seq = seq(min(x1), max(x1), length.out = 100)
x2_seq = seq(min(x2), max(x2), length.out = 100)
pre_func = function (x1, x2, W_list = W_matrix[nrow(W_matrix),]) {
  H1 = h1.fun(w10 = W_list[1], w11 = W_list[2], w12 = W_list[3], x1 = x1, x2 = x2)
  H2 = h2.fun(w20 = W_list[4], w21 = W_list[5], w22 = W_list[6], x1 = x1, x2 = x2)
  O = o.fun(w30 = W_list[7], w31 = W_list[8], w32 = W_list[9], h1 = H1, h2 = H2)
  return(O)
}
z_matrix = sapply(x2_seq, function(x) {pre_func(x1 = x1_seq, x2 = x)})
image2D(z = z_matrix,
        x = x1_seq, xlab = 'x1',
        y = x2_seq, ylab = 'x2',
        shade = 0.2, rasterImage = TRUE,
        col = colorRampPalette(c("#FFA0A0", "#FFFFFF", "#A0A0FF"))(100))
points(x1, x2, col = (y + 1)*2, pch = 19, cex = 0.5)
– Using matrices, the multilayer perceptron can be written as a simpler set of formulas (the equations below are at the level of a single sample):
\[ \begin{align} L^k_d(X, W^k_d) & = XW^k_d \\ X & = \begin{pmatrix} x_{1,1} & x_{1,2} & \cdots & x_{1,m} \end{pmatrix} \\ W^k_d & = \begin{pmatrix} w^k_{0,1} & w^k_{0,2} & \cdots & w^k_{0,d} \\ w^k_{1,1} & w^k_{1,2} & \cdots & w^k_{1,d} \\ w^k_{2,1} & w^k_{2,2} & \cdots & w^k_{2,d} \\ \vdots & \vdots & \ddots & \vdots \\ w^k_{m,1} & w^k_{m,2} & \cdots & w^k_{m,d} \end{pmatrix} \\ \frac{\partial}{\partial W^k_d}L^k_d(X) & = \begin{pmatrix} X^T & & X^T & \cdots & X^T \\ \end{pmatrix} \mbox{ [repeat } d \mbox{ times]} \end{align} \]
\[ \begin{align} S(x) & = \frac{{1}}{1+e^{-x}} \\ \frac{\partial}{\partial x}S(x) & = S(x)(1-S(x)) \end{align} \]
\[ \begin{align} l_1 & = L^1_d(x^E,W^1_d) \\ h_1 & = S(l_1) \\ l_2 & = L^2_1(h_1^E,W^2_1) \\ o & = S(l_2) \end{align} \]
\[ \begin{align} l_1 & = L^1_d(x^E,W^1_d) \\ h_1 & = S(l_1) \\ l_2 & = L^2_1(h_1^E,W^2_1) \\ o & = S(l_2) \\ loss & = CE(y, o) = \frac{{1}}{n}\sum \limits_{i=1}^{n} -\left(y \cdot log(o) + (1-y) \cdot log(1-o)\right) \end{align} \]
\[ \begin{align} grad.o & = \frac{\partial}{\partial o}loss = \frac{o-y}{o(1-o)} \\ grad.l_2 & = \frac{\partial}{\partial l_2}loss = grad.o \otimes \frac{\partial}{\partial l_2}o= o-y \\ grad.W^2_1 & = \frac{\partial}{\partial W^2_1}loss = grad.l_2 \otimes \frac{\partial}{\partial W^2_1}l_2 = \frac{1}{n} \otimes (h_1^E)^T \bullet grad.l_2\\ grad.h_1^E & = \frac{\partial}{\partial h_1^E}loss = grad.l_2 \otimes \frac{\partial}{\partial h_1^E}l_2 = grad.l_2 \bullet (W^2_1)^T \\ grad.l_1 & = \frac{\partial}{\partial l_1}loss = grad.h_1 \otimes \frac{\partial}{\partial l_1}h_1 = grad.h_1 \otimes h_1 \otimes (1-h_1) \\ grad.W^1_d & = \frac{\partial}{\partial W^1_d}loss = grad.l_1 \otimes \frac{\partial}{\partial W^1_d}l_1 = \frac{1}{n} \otimes (x^E)^T \bullet grad.l_1 \end{align} \]
– We describe the above formulas in code:
#Sample generation
set.seed(0)
x1 = rnorm(50, sd = 1)
x2 = rnorm(50, sd = 1)
lr1 = - 1.5 + x1^2 + x2^2
y = (lr1 > 0) + 0L   # 0/1 integer labels, the same sample as before
#Forward
S.fun = function (x, eps = 1e-5) {
  # Sigmoid activation, clipped away from 0 and 1 for numerical stability
  S = 1/(1 + exp(-x))
  S[S < eps] = eps
  S[S > 1 - eps] = 1 - eps
  return(S)
}
L.fun = function (X, W) {
  # Linear layer: prepend an intercept column and multiply by the weight matrix
  X.E = cbind(1, X)
  L = X.E %*% W
  return(L)
}
CE.fun = function (o, y, eps = 1e-5) {
  # Average cross-entropy loss
  loss = -1/length(y) * sum(y * log(o + eps) + (1 - y) * log(1 - o + eps))
  return(loss)
}
#Backward
grad_o.fun = function (o, y) {
  # d loss / d o
  return((o - y)/(o*(1-o)))
}
grad_l2.fun = function (grad_o, o) {
  # d loss / d l2 = grad.o * o * (1 - o)
  return(grad_o*(o*(1-o)))
}
grad_W2.fun = function (grad_l2, h1) {
  h1.E = cbind(1, h1)
  return(t(h1.E) %*% grad_l2/nrow(h1))
}
grad_h1.fun = function (grad_l2, W2) {
  # Drop the intercept row of W2 when propagating back to the hidden layer
  return(grad_l2 %*% t(W2[-1,]))
}
grad_l1.fun = function (grad_h1, h1) {
  return(grad_h1*(h1*(1-h1)))
}
grad_W1.fun = function (grad_l1, x) {
  x.E = cbind(1, x)
  return(t(x.E) %*% grad_l1/nrow(x))
}
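– One forward and backward pass with these matrix-form functions might look like this (a sketch, assuming the x1, x2, y generated above and 2 hidden units; the random initialization is arbitrary):
# A single forward and backward pass in matrix form
set.seed(123)
X = cbind(x1, x2)
W1 = matrix(rnorm(3 * 2, sd = 1), nrow = 3, ncol = 2)   # (intercept + 2 inputs) x 2 hidden units
W2 = matrix(rnorm(2 + 1, sd = 1), nrow = 3, ncol = 1)   # (intercept + 2 hidden units) x 1 output
l1 = L.fun(X = X, W = W1)
h1 = S.fun(x = l1)
l2 = L.fun(X = h1, W = W2)
o = S.fun(x = l2)
CE.fun(o = o, y = y)
grad_o = grad_o.fun(o = o, y = y)
grad_l2 = grad_l2.fun(grad_o = grad_o, o = o)
grad_W2 = grad_W2.fun(grad_l2 = grad_l2, h1 = h1)
grad_h1 = grad_h1.fun(grad_l2 = grad_l2, W2 = W2)
grad_l1 = grad_l1.fun(grad_h1 = grad_h1, h1 = h1)
grad_W1 = grad_W1.fun(grad_l1 = grad_l1, x = X)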
MLP_Trainer = function (num.iteration = 500, num.hidden = 2, lr = 0.1, x1 = x1, x2 = x2, y = y) {
  #Functions
  #Forward
  S.fun = function (x, eps = 1e-5) {
    S = 1/(1 + exp(-x))
    S[S < eps] = eps
    S[S > 1 - eps] = 1 - eps
    return(S)
  }
  L.fun = function (X, W) {
    X.E = cbind(1, X)
    L = X.E %*% W
    return(L)
  }
  CE.fun = function (o, y, eps = 1e-5) {
    loss = -1/length(y) * sum(y * log(o + eps) + (1 - y) * log(1 - o + eps))
    return(loss)
  }
  #Backward
  grad_o.fun = function (o, y) {
    return((o - y)/(o*(1-o)))
  }
  grad_l2.fun = function (grad_o, o) {
    return(grad_o*(o*(1-o)))
  }
  grad_W2.fun = function (grad_l2, h1) {
    h1.E = cbind(1, h1)
    return(t(h1.E) %*% grad_l2/nrow(h1))
  }
  grad_h1.fun = function (grad_l2, W2) {
    return(grad_l2 %*% t(W2[-1,]))
  }
  grad_l1.fun = function (grad_h1, h1) {
    return(grad_h1*(h1*(1-h1)))
  }
  grad_W1.fun = function (grad_l1, x) {
    x.E = cbind(1, x)
    return(t(x.E) %*% grad_l1/nrow(x))
  }
  #Calculating
  X_matrix = cbind(x1, x2)
  W1_list = list()
  W2_list = list()
  loss_seq = rep(0, num.iteration)
  #Start random values
  W1_list[[1]] = matrix(rnorm(3*num.hidden, sd = 1), nrow = 3, ncol = num.hidden)
  W2_list[[1]] = matrix(rnorm(num.hidden + 1, sd = 1), nrow = num.hidden + 1, ncol = 1)
  for (i in 2:(num.iteration+1)) {
    #Forward
    current_l1 = L.fun(X = X_matrix, W = W1_list[[i - 1]])
    current_h1 = S.fun(x = current_l1)
    current_l2 = L.fun(X = current_h1, W = W2_list[[i - 1]])
    current_o = S.fun(x = current_l2)
    loss_seq[i-1] = CE.fun(o = current_o, y = y, eps = 1e-5)
    #Backward
    current_grad_o = grad_o.fun(o = current_o, y = y)
    current_grad_l2 = grad_l2.fun(grad_o = current_grad_o, o = current_o)
    current_grad_W2 = grad_W2.fun(grad_l2 = current_grad_l2, h1 = current_h1)
    current_grad_h1 = grad_h1.fun(grad_l2 = current_grad_l2, W2 = W2_list[[i - 1]])
    current_grad_l1 = grad_l1.fun(grad_h1 = current_grad_h1, h1 = current_h1)
    current_grad_W1 = grad_W1.fun(grad_l1 = current_grad_l1, x = X_matrix)
    W2_list[[i]] = W2_list[[i-1]] - lr * current_grad_W2
    W1_list[[i]] = W1_list[[i-1]] - lr * current_grad_W1
  }
  require(scales)
  require(plot3D)
  x1_seq = seq(min(x1), max(x1), length.out = 100)
  x2_seq = seq(min(x2), max(x2), length.out = 100)
  pre_func = function (x1, x2, W1 = W1_list[[length(W1_list)]], W2 = W2_list[[length(W2_list)]]) {
    new_X = cbind(x1, x2)
    O = S.fun(x = L.fun(X = S.fun(x = L.fun(X = new_X, W = W1)), W = W2))
    return(O)
  }
  z_matrix = sapply(x2_seq, function(x) {pre_func(x1 = x1_seq, x2 = x)})
  par(mfrow = c(1, 2))
  image2D(z = z_matrix,
          x = x1_seq, xlab = 'x1',
          y = x2_seq, ylab = 'x2',
          shade = 0.2, rasterImage = TRUE,
          col = colorRampPalette(c("#FFA0A0", "#FFFFFF", "#A0A0FF"))(100))
  points(x1, x2, col = (y + 1)*2, pch = 19, cex = 0.5)
  plot(loss_seq, type = 'l', main = 'loss', xlab = 'iter.', ylab = 'CE loss')
}
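– A possible call (a sketch, assuming the x1, x2, y generated in the sample-generation block above); it draws the fitted probability surface next to the training loss curve, mirroring the settings of the hand-rolled loop earlier:
set.seed(0)
MLP_Trainer(num.iteration = 10000, num.hidden = 2, lr = 0.1, x1 = x1, x2 = x2, y = y)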
– Please download the example data (ECG_train.csv) from here.
dat <- read.csv("ECG_train.csv", header = TRUE, fileEncoding = 'CP950', stringsAsFactors = FALSE, na.strings = "")
– This is how to fit a model for continuous (linear) prediction:
subdat <- dat[!(dat[,'K'] %in% NA) & !(dat[,'GENDER'] %in% NA) & !(dat[,'AGE'] %in% NA), c('K', 'GENDER', 'AGE')]
subdat[,'GENDER'] <- as.factor(subdat[,'GENDER'])
library(nnet)
fit_nnet <- nnet(K ~ GENDER + AGE, data = subdat, size = 5, maxit = 100, linout = TRUE, decay = 0.001, trace = FALSE, MaxNWts = 1e5)
pred_val <- predict(fit_nnet, subdat)
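– As a rough in-sample check (a sketch, using the complete-case subdat built above), the predictions can be correlated with the observed potassium values:
cor(pred_val, subdat[,'K'])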
– This is how to fit a model for binary classification:
subdat <- dat[!(dat[,'LVD'] %in% NA) & !(dat[,'GENDER'] %in% NA) & !(dat[,'AGE'] %in% NA), c('LVD', 'GENDER', 'AGE')]
subdat[,'GENDER'] <- as.factor(subdat[,'GENDER'])
fit_nnet <- nnet(LVD ~ GENDER + AGE, data = subdat, size = 5, maxit = 100, entropy = TRUE, decay = 0.001, trace = FALSE, MaxNWts = 1e5)
pred_val <- predict(fit_nnet, subdat)
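– The predicted probabilities can be thresholded at 0.5 and cross-tabulated against the observed outcome (a sketch, assuming LVD is coded 0/1):
pred_class <- as.integer(pred_val > 0.5)
table(predicted = pred_class, observed = subdat[,'LVD'])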
– This is how to fit a model for multi-class prediction:
subdat <- dat[!(dat[,'AMI'] %in% NA) & !(dat[,'GENDER'] %in% NA) & !(dat[,'AGE'] %in% NA), c('AMI', 'GENDER', 'AGE')]
subdat[,'GENDER'] <- as.factor(subdat[,'GENDER'])
subdat[,'AMI'] <- as.factor(subdat[,'AMI'])
fit_nnet <- nnet(AMI ~ GENDER + AGE, data = subdat, size = 5, maxit = 100, decay = 0.001, trace = FALSE, MaxNWts = 1e5)
pred_val <- predict(fit_nnet, subdat)
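– For a factor outcome, nnet uses a softmax output layer; type = 'class' returns the most probable class, which can be cross-tabulated against the truth (a sketch):
pred_class <- predict(fit_nnet, subdat, type = 'class')
table(predicted = pred_class, observed = subdat[,'AMI'])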
# Keep only rows with no missing values among the retained columns (the first retained column, K, will be the outcome)
subdat <- dat[apply(is.na(dat[,c(-1, -3:-5)]), 1, sum) %in% 0, c(-1, -3:-5)]
subdat[,'GENDER'] <- as.factor(subdat[,'GENDER'])
set.seed(0)
# Randomly split the rows 60% / 20% / 20% into training, validation, and test sets
all_idx <- 1:nrow(subdat)
train_idx <- sample(all_idx, nrow(subdat) * 0.6)
valid_idx <- sample(all_idx[!all_idx %in% train_idx], nrow(subdat) * 0.2)
test_idx <- all_idx[!all_idx %in% c(train_idx, valid_idx)]
train_dat <- subdat[train_idx,]
valid_dat <- subdat[valid_idx,]
test_dat <- subdat[test_idx,]
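– A quick check that the three subsets have roughly the expected 60% / 20% / 20% sizes:
c(train = nrow(train_dat), valid = nrow(valid_dat), test = nrow(test_dat))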
– Note that the elastic net function requires its input in a specific matrix format!
library(glmnet)
# model.matrix dummy-codes the factor predictors; the intercept column is then dropped
train_X <- model.matrix(~ ., data = train_dat[,-1])
train_X <- train_X[,-1]
train_Y <- train_dat[,1]
valid_X <- model.matrix(~ ., data = valid_dat[,-1])
valid_X <- valid_X[,-1]
valid_Y <- valid_dat[,1]
glmnet_fit <- cv.glmnet(x = train_X, y = train_Y, family = 'gaussian', alpha = 0.5, nfolds = 5)
pred_valid <- predict(glmnet_fit, valid_X, s = "lambda.1se")
cor(pred_valid, valid_Y)
## [,1]
## 1 0.333089
– The correlation coefficient is 0.333. Now let us try an MLP:
library(nnet)
fit_nnet <- nnet(K ~ ., data = train_dat, size = 100, maxit = 100, linout = TRUE, decay = 0.001, trace = FALSE, MaxNWts = 1e5)
pred_valid <- predict(fit_nnet, valid_dat)
cor(pred_valid, valid_Y)
## [,1]
## [1,] 0.3116847
– The correlation coefficient is 0.312, which does not seem to be an improvement.
– This is the elastic net; the result is 0.375.
## [,1]
## 1 0.3745448
– This is the multilayer perceptron; the result is 0.515.
## [,1]
## [1,] 0.5152867
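– Once a final model has been chosen on the validation set, it should be evaluated once on the held-out test set (a sketch, assuming fit_nnet is the selected MLP and test_dat was built above):
pred_test <- predict(fit_nnet, test_dat)
cor(pred_test, test_dat[,'K'])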
– Unfortunately, there is no convenient package that lets us use an MLP for survival analysis; this will have to wait for later courses.
– Modern deep learning is in fact built on this technique, but extending it to much deeper networks requires further mathematical tools; those breakthroughs will be covered next semester.
– As models become more complex, "regularization" techniques become increasingly important to us. A complete data-science experimental workflow matters: you must tune the hyperparameters carefully to obtain good results.
– Overfitting is the most important problem faced by nonlinear models; next semester's course will emphasize the techniques for dealing with it!