林嶔 (Lin, Chin)
Lesson 6 Convolutional Neural Networks and Transfer Feature Learning
– Please download the MNIST handwritten digit data here, and let's take a look at the structure of this data set.
– A 28×28 grayscale image can in fact be represented as 784 numbers ranging from 0 to 255, which once again turns the task into an ordinary prediction problem.
library(data.table)
DAT = fread("data/MNIST.csv", data.table = FALSE)
DAT = data.matrix(DAT)
#Split data
set.seed(0)
Train.sample = sample(1:nrow(DAT), nrow(DAT)*0.6, replace = FALSE)
Train.X = DAT[Train.sample,-1]
Train.Y = DAT[Train.sample,1]
Test.X = DAT[-Train.sample,-1]
Test.Y = DAT[-Train.sample,1]
#Display
library(OpenImageR)
imageShow(t(matrix(as.numeric(Train.X[1,]), nrow = 28, byrow = TRUE)))
– At this point we run into a hardware problem: we cannot possibly read all the files into RAM up front. A better solution is to read only a small batch of training samples at each step, which keeps memory usage low.
fwrite(x = data.table(cbind(Train.Y, Train.X)),
file = 'data/train_data.csv',
col.names = FALSE, row.names = FALSE)
fwrite(x = data.table(cbind(Test.Y, Test.X)),
file = 'data/test_data.csv',
col.names = FALSE, row.names = FALSE)
library(mxnet)
my_iterator_func <- setRefClass("Custom_Iter",
fields = c("iter", "data.csv", "data.shape", "batch.size"),
contains = "Rcpp_MXArrayDataIter",
methods = list(
initialize = function(iter, data.csv, data.shape, batch.size){
csv_iter <- mx.io.CSVIter(data.csv = data.csv, data.shape = data.shape, batch.size = batch.size)
.self$iter <- csv_iter
.self
},
value = function(){
val <- as.array(.self$iter$value()$data) # one CSV batch: 785 values (label + 784 pixels) per sample
val.x <- val[-1,] # pixel rows
val.y <- t(model.matrix(~ -1 + factor(val[1,], levels = 0:9))) # one-hot encode the label row
val.y <- array(val.y, dim = c(10, ncol(val.x)))
dim(val.x) <- c(28, 28, 1, ncol(val.x)) # reshape each sample into a 28x28x1 image
val.x <- mx.nd.array(val.x)
val.y <- mx.nd.array(val.y)
list(data=val.x, label=val.y)
},
iter.next = function(){
.self$iter$iter.next()
},
reset = function(){
.self$iter$reset()
},
finalize=function(){
}
)
)
my_iter = my_iterator_func(iter = NULL, data.csv = 'data/train_data.csv', data.shape = 785, batch.size = 20)
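A quick sanity check of the iterator (assumed usage; these two calls presumably produced the two output lines below):
my_iter$iter.next()
as.array(my_iter$value()$label)[, 1] # one-hot label of the first image in the batch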
## [1] TRUE
## [1] 0 0 0 0 0 0 0 0 1 0
– Returning to our handwritten digit classification problem: when we look at these handwritten digits we recognize them at a glance, but is the path from "image" to "concept" really that simple?
– In 1962, David H. Hubel and Torsten Wiesel published the study Receptive fields, binocular interaction and functional architecture in the cat's visual cortex, which investigated how the biological visual system works and earned them the 1981 Nobel Prize.
– They found that when a cat is stimulated with images of different shapes, different brain cells tied to the receptive field respond differently.
– Convolution filters mimic these earliest receptive-field cells: each one is responsible for recognizing a specific feature. Their mathematical form is as follows:
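In written form (a standard statement for a single-channel k×k kernel with valid padding, consistent with the matrix example later in this lesson):
\[o_{i,j} = \sum_{m=1}^{k} \sum_{n=1}^{k} w_{m,n} \ x_{i+m-1, \ j+n-1} + b\]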
– What does a "feature map" mean? A convolution filter acts like a primary visual cell that specializes in recognizing one simple feature; the larger a value on the feature map, the better that location matches the feature the cell is responsible for.
After obtaining the feature maps, recall that we add non-linear functions to increase the mathematical complexity of a neural network; accordingly, in a convolutional neural network the feature maps produced by a convolutional layer pass through a non-linear transformation.
Next, because consecutive convolutions produce feature maps with redundant information, we usually apply a "pooling layer" to down-sample them. In effect this lowers the image's resolution, which also saves computation.
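A tiny sketch of these two steps in plain R (a made-up 4x4 feature map; ReLU followed by 2x2 max pooling with stride 2, the combination used later in this lesson):
X <- matrix(c(-1, 2, 0, 3,
4, -2, 1, 0,
0, 1, -3, 2,
5, 0, 2, 1), nrow = 4, byrow = TRUE)
ReLU_out <- pmax(X, 0) # non-linear transform: negative responses become 0
Pool_out <- matrix(0, 2, 2)
for (i in 1:2) {for (j in 1:2) {Pool_out[i,j] <- max(ReLU_out[i*2 + (-1:0), j*2 + (-1:0)])}}
Pool_out # the 4x4 map is down-sampled to 2x2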
– Imagine a picture of a person. Suppose filter 1 recognizes eyes, filter 2 recognizes noses, filter 3 recognizes ears, filter 4 recognizes palms, and filter 5 recognizes arms.
– The high-valued regions of feature maps 1, 2, and 3 mark where eyes, noses, and ears are most likely to be. If we stack these 3 feature maps and convolve them again, can we locate a face?
– The high-valued regions of feature maps 4 and 5 mark where palms and arms are most likely to be. If we stack these 2 feature maps and convolve them again, can we locate a hand?
– Feature maps 4 and 5 also help with face recognition: since a face contains no palms or arms, a filter that wants to recognize faces must weight feature maps 1, 2, and 3 positively and feature maps 4 and 5 negatively.
– Although the shapes involved change, the convolution operation can be viewed as a linear operation. So if we rewrite the convolution process appropriately, its gradient is very similar to that of the linear transformations we saw earlier.
– Suppose \(X\) is an input matrix, \(W\) is a convolution kernel, and \(O\) is the output of the convolution. Then we can rewrite the whole process into a form like the following (a minimal example: the input is 3×3, the kernel is 2×2, and the output is 2×2):
\[\begin{align} O & = Conv(X, W) \\ O' &= X'W' \\\\ X & = \begin{pmatrix} x_{1,1} & x_{1,2} & x_{1,3} \\ x_{2,1} & x_{2,2} & x_{2,3} \\ x_{3,1} & x_{3,2} & x_{3,3} \end{pmatrix} \ \ \ \ \ X' = \begin{pmatrix} x_{1,1} & x_{1,2} & x_{2,1} & x_{2,2} \\ x_{1,2} & x_{1,3} & x_{2,2} & x_{2,3} \\ x_{2,1} & x_{2,2} & x_{3,1} & x_{3,2} \\ x_{2,2} & x_{2,3} & x_{3,2} & x_{3,3} \end{pmatrix} \\\\ W & = \begin{pmatrix} w_{1,1} & w_{1,2} \\ w_{2,1} & w_{2,2} \end{pmatrix} \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ W' = \begin{pmatrix} w_{1,1} \\ w_{1,2} \\ w_{2,1} \\ w_{2,2} \end{pmatrix} \\\\ O & = \begin{pmatrix} o_{1,1} & o_{1,2} \\ o_{2,1} & o_{2,2} \end{pmatrix} \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ O' = \begin{pmatrix} o_{1,1} = X'_{1,\cdot}W' \\ o_{1,2} = X'_{2,\cdot}W' \\ o_{2,1} = X'_{3,\cdot}W' \\ o_{2,2} = X'_{4,\cdot}W' \end{pmatrix} \end{align}\]
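A quick numeric check of this rewrite (a sketch with made-up values; each row of X' is one flattened 2x2 patch, listed in the row-major order of O):
X <- matrix(1:9, nrow = 3, byrow = TRUE) # 3x3 input
W <- matrix(c(1, 2, 3, 4), nrow = 2, byrow = TRUE) # 2x2 kernel
O <- matrix(0, 2, 2) # direct valid convolution (cross-correlation form, as in the text)
for (i in 1:2) {for (j in 1:2) {O[i,j] <- sum(X[i + 0:1, j + 0:1] * W)}}
Xp <- rbind(c(X[1,1], X[1,2], X[2,1], X[2,2]),
c(X[1,2], X[1,3], X[2,2], X[2,3]),
c(X[2,1], X[2,2], X[3,1], X[3,2]),
c(X[2,2], X[2,3], X[3,2], X[3,3]))
Wp <- c(W[1,1], W[1,2], W[2,1], W[2,2])
all.equal(c(t(O)), as.numeric(Xp %*% Wp)) # TRUE: Conv(X, W) equals X'W'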
– Here we assume that backpropagation has already delivered the gradient \(grad.O\) of the output \(O\), and we continue downward from there:
\[\begin{align} \frac{\partial}{\partial W'} O' & = X' \\ \frac{\partial}{\partial X'} O' & = W' \\\\ grad.W' &= \frac{1}{n} \otimes (X')^T \bullet grad.O' \\ grad.X' &= grad.O' \bullet (W')^T \end{align}\]
– Let's first look at "average pooling". Suppose \(X\) is an input matrix and \(O\) is the output of the pooling operation (pooling stride 1×1, pooling kernel 2×2):
\[\begin{align} O & = Pool(X) \\\\ X & = \begin{pmatrix} x_{1,1} & x_{1,2} & x_{1,3} \\ x_{2,1} & x_{2,2} & x_{2,3} \\ x_{3,1} & x_{3,2} & x_{3,3} \end{pmatrix} \\ O & = \begin{pmatrix} o_{1,1} = mean(x_{1,1}, \ x_{1,2}, \ x_{2,1}, \ x_{2,2}) & o_{1,2} = mean(x_{1,2}, \ x_{1,3}, \ x_{2,2}, \ x_{2,3})\\ o_{2,1} = mean(x_{2,1}, \ x_{2,2}, \ x_{3,1}, \ x_{3,2}) & o_{2,2} = mean(x_{2,2}, \ x_{2,3}, \ x_{3,2}, \ x_{3,3}) \end{pmatrix} \end{align}\]
\[\begin{align} grad.O & = \begin{pmatrix} grad.o_{1,1} & grad.o_{1,2} \\ grad.o_{2,1} & grad.o_{2,2} \end{pmatrix} \\ grad.X & = \begin{pmatrix} grad.x_{1,1} = \frac{grad.o_{1,1}}{4} & grad.x_{1,2} = \frac{grad.o_{1,1} + grad.o_{1,2}}{4} & grad.x_{1,3} =\frac{grad.o_{1,2}}{4}\\ grad.x_{2,1} = \frac{grad.o_{1,1} + grad.o_{2,1}}{4} & grad.x_{2,2} = \frac{grad.o_{1,1} + grad.o_{2,1} + grad.o_{1,2} + grad.o_{2,2}}{4} & grad.x_{2,3} = \frac{grad.o_{1,2} + grad.o_{2,2}}{4} \\ grad.x_{3,1} = \frac{grad.o_{2,1}}{4} & grad.x_{3,2} = \frac{grad.o_{2,1} + grad.o_{2,2}}{4} & grad.x_{3,3} = \frac{grad.o_{2,2}}{4} \end{pmatrix} \end{align}\]
– For max pooling, since there is no standard mathematical notation for "the largest element", let's plug in a set of concrete numbers to see the result:
\[\begin{align} O & = Pool(X) \\\\ X & = \begin{pmatrix} x_{1,1} & x_{1,2} & x_{1,3} \\ x_{2,1} & x_{2,2} & x_{2,3} \\ x_{3,1} & x_{3,2} & x_{3,3} \end{pmatrix} \ \ \ \ \ \ \ X = \begin{pmatrix} 5 & 3 & 4 \\ 8 & 1 & 2 \\ 6 & 7 & 9 \end{pmatrix} \\ O & = \begin{pmatrix} o_{1,1} = x_{2,1} & o_{1,2} = x_{1,3} \\ o_{2,1} = x_{2,1} & o_{2,2} = x_{3,3} \end{pmatrix} \end{align}\]
\[\begin{align} grad.O & = \begin{pmatrix} grad.o_{1,1} & grad.o_{1,2} \\ grad.o_{2,1} & grad.o_{2,2} \end{pmatrix} \\ grad.X & = \begin{pmatrix} grad.x_{1,1} = 0 & grad.x_{1,2} = 0 & grad.x_{1,3} = grad.o_{1,2} \\ grad.x_{2,1} = grad.o_{1,1} + grad.o_{2,1} & grad.x_{2,2} = 0 & grad.x_{2,3} = 0 \\ grad.x_{3,1} = 0 & grad.x_{3,2} = 0 & grad.x_{3,3} = grad.o_{2,2} \end{pmatrix} \end{align}\]
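The same gradient routing can be reproduced in a few lines of R (a sketch of the example above, assuming an all-ones upstream gradient):
X <- matrix(c(5, 3, 4, 8, 1, 2, 6, 7, 9), nrow = 3, byrow = TRUE)
grad.O <- matrix(1, 2, 2) # assumed upstream gradient
grad.X <- matrix(0, 3, 3)
for (i in 1:2) {
for (j in 1:2) {
patch <- X[i + 0:1, j + 0:1]
pos <- which(patch == max(patch), arr.ind = TRUE)[1,] # winning position in the 2x2 patch
grad.X[i + pos[1] - 1, j + pos[2] - 1] <- grad.X[i + pos[1] - 1, j + pos[2] - 1] + grad.O[i,j]
}
}
grad.X # only the max positions receive gradient; x_{2,1} wins two patches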
my.model.FeedForward.create = function (Iterator, ctx = mx.cpu(), save.grad = FALSE,
loss_symbol, pred_symbol,
Optimizer, num_round = 30) {
require(abind)
#0. Check data shape
Iterator$reset()
Iterator$iter.next()
my_values <- Iterator$value()
input_shape <- lapply(my_values, dim)
batch_size <- tail(input_shape[[1]], 1)
#1. Build an executor to train model
exec_list = list(symbol = loss_symbol, ctx = ctx, grad.req = "write")
exec_list = append(exec_list, input_shape)
my_executor = do.call(mx.simple.bind, exec_list)
#2. Set the initial parameters
mx.set.seed(0)
new_arg = mxnet:::mx.model.init.params(symbol = loss_symbol,
input.shape = input_shape,
output.shape = NULL,
initializer = mxnet:::mx.init.uniform(0.01),
ctx = ctx)
mx.exec.update.arg.arrays(my_executor, new_arg$arg.params, match.name = TRUE)
mx.exec.update.aux.arrays(my_executor, new_arg$aux.params, match.name = TRUE)
#3. Define the updater
my_updater = mx.opt.get.updater(optimizer = Optimizer, weights = my_executor$ref.arg.arrays)
#4. Forward/Backward
message('Start training:')
set.seed(0)
if (save.grad) {epoch_grad = NULL}
for (i in 1:num_round) {
Iterator$reset()
batch_loss = list()
if (save.grad) {batch_grad = list()}
batch_seq = 0
t0 = Sys.time()
while (Iterator$iter.next()) {
my_values <- Iterator$value()
mx.exec.update.arg.arrays(my_executor, arg.arrays = my_values, match.name = TRUE)
mx.exec.forward(my_executor, is.train = TRUE)
mx.exec.backward(my_executor)
update_args = my_updater(weight = my_executor$ref.arg.arrays, grad = my_executor$ref.grad.arrays)
mx.exec.update.arg.arrays(my_executor, update_args, skip.null = TRUE)
batch_loss[[length(batch_loss) + 1]] = as.array(my_executor$ref.outputs[[1]])
if (save.grad) {
grad_list = sapply(my_executor$ref.grad.arrays, function (x) {if (!is.null(x)) {mean(abs(as.array(x)))}})
grad_list = unlist(grad_list[grepl('weight', names(grad_list), fixed = TRUE)])
batch_grad[[length(batch_grad) + 1]] = grad_list
}
batch_seq = batch_seq + 1
}
message(paste0("epoch = ", i,
": loss = ", formatC(mean(unlist(batch_loss)), format = "f", 4),
" (Speed: ", formatC(batch_seq * batch_size/as.numeric(Sys.time() - t0, units = 'secs'), format = "f", 2), " sample/secs)"))
if (save.grad) {epoch_grad = rbind(epoch_grad, apply(abind(batch_grad, along = 2), 1, mean))}
}
if (save.grad) {
epoch_grad[epoch_grad < 1e-8] = 1e-8
COL = rainbow(ncol(epoch_grad))
random_pos = 2^runif(ncol(epoch_grad), -0.5, 0.5)
plot(epoch_grad[,1] * random_pos[1], type = 'l', col = COL[1],
xlab = 'epoch', ylab = 'mean of abs(grad)', log = 'y',
ylim = range(epoch_grad))
for (i in 2:ncol(epoch_grad)) {lines(1:nrow(epoch_grad), epoch_grad[,i] * random_pos[i], col = COL[i])}
legend('topright', paste0('layer', 1:ncol(epoch_grad), '_weight'), col = COL, lwd = 1)
}
#5. Get model
my_model <- mxnet:::mx.model.extract.model(symbol = pred_symbol,
train.execs = list(my_executor))
return(my_model)
}
– This is a stripped-down LeNet: the original LeNet has 20 and 50 filters in its first and second convolutional layers, and its first fully connected layer has 500 neurons. This small stripped-down network has roughly the same total number of parameters as the 5-hidden-layer multilayer perceptron we just used.
# input
data <- mx.symbol.Variable('data')
# first conv
conv1 <- mx.symbol.Convolution(data=data, kernel=c(5,5), num_filter=10, name = 'conv1')
relu1 <- mx.symbol.Activation(data=conv1, act_type="relu")
pool1 <- mx.symbol.Pooling(data=relu1, pool_type="max",
kernel=c(2,2), stride=c(2,2))
# second conv
conv2 <- mx.symbol.Convolution(data=pool1, kernel=c(5,5), num_filter=20, name = 'conv2')
relu2 <- mx.symbol.Activation(data=conv2, act_type="relu")
pool2 <- mx.symbol.Pooling(data=relu2, pool_type="max",
kernel=c(2,2), stride=c(2,2))
# first fullc
flatten <- mx.symbol.Flatten(data=pool2)
fc1 <- mx.symbol.FullyConnected(data=flatten, num_hidden=150, name = 'fc1')
relu3 <- mx.symbol.Activation(data=fc1, act_type="relu")
# second fullc
fc2 <- mx.symbol.FullyConnected(data=relu3, num_hidden=10, name = 'fc2')
# Softmax
lenet <- mx.symbol.softmax(data = fc2, axis = 1, name = 'lenet')
# m-log loss
label = mx.symbol.Variable(name = 'label')
eps = 1e-8
m_log = 0 - mx.symbol.mean(mx.symbol.broadcast_mul(mx.symbol.log(lenet + eps), label))
m_logloss = mx.symbol.MakeLoss(m_log, name = 'm_logloss')
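As a quick check of the parameter-count claim above: conv1 has 5×5×1×10 + 10 = 260 parameters, conv2 has 5×5×10×20 + 20 = 5,020, fc1 has 320×150 + 150 = 48,150, and fc2 has 150×10 + 10 = 1,510, for a total of 54,940 trainable parameters.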
– First convolution block
The original image (28x28x1) first passes through 10 5x5 "filters" (5x5x1x10), producing 10 "first-order feature maps" (24x24x10)
These 10 "first-order feature maps" (24x24x10) then pass through ReLU, producing 10 "transformed first-order feature maps" (24x24x10)
These 10 "transformed first-order feature maps" (24x24x10) then pass through a 2x2 "pooling operator", producing 10 "down-sampled first-order feature maps" (12x12x10)
– Second convolution block
The 10 "down-sampled first-order feature maps" (12x12x10) pass through 20 5x5 "filters" (5x5x10x20), producing 20 "second-order feature maps" (8x8x20)
These 20 "second-order feature maps" (8x8x20) then pass through ReLU, producing 20 "transformed second-order feature maps" (8x8x20)
These 20 "transformed second-order feature maps" (8x8x20) then pass through a 2x2 "pooling operator", producing 20 "down-sampled second-order feature maps" (4x4x20)
– Fully connected layers
The "down-sampled second-order feature maps" (4x4x20) are rearranged and flattened into "first-order high-level features" (320)
The "first-order high-level features" (320) enter the "hidden layer", which outputs "second-order high-level features" (150)
The "second-order high-level features" (150) pass through ReLU, yielding "transformed second-order high-level features" (150)
The "transformed second-order high-level features" (150) enter the "output layer", producing the "raw output" (10)
The "raw output" (10) passes through the Softmax function to decide which class the image belongs to (the shape sketch below confirms these sizes)
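These sizes can also be checked programmatically. A minimal sketch, assuming the mx.symbol.infer.shape helper from the R package behaves as usual:
shape_info <- mx.symbol.infer.shape(lenet, data = c(28, 28, 1, 1))
shape_info$out.shapes # expected: one 10-element probability vector per image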
my_optimizer = mx.opt.create(name = "adam", learning.rate = 0.001, beta1 = 0.9, beta2 = 0.999,
epsilon = 1e-08, wd = 0)
lenet_model = my.model.FeedForward.create(Iterator = my_iter, ctx = mx.gpu(), save.grad = TRUE,
loss_symbol = m_logloss, pred_symbol = lenet,
Optimizer = my_optimizer, num_round = 20)
library(data.table)
DAT = fread("data/test_data.csv", data.table = FALSE)
DAT = data.matrix(DAT)
Test.X = t(DAT[,-1])
dim(Test.X) = c(28, 28, 1, ncol(Test.X))
Test.Y = DAT[,1]
predict_Y = predict(lenet_model, Test.X)
confusion_table = table(max.col(t(predict_Y)), Test.Y)
cat("Testing accuracy rate =", sum(diag(confusion_table))/sum(confusion_table))
## Testing accuracy rate = 0.9835119
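The confusion table itself (presumably printed by evaluating confusion_table):
confusion_table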
## Test.Y
## 0 1 2 3 4 5 6 7 8 9
## 1 1658 2 2 0 3 5 10 1 4 4
## 2 1 1832 5 1 4 0 1 2 4 2
## 3 1 4 1630 2 2 0 1 7 4 1
## 4 0 0 7 1711 1 5 1 4 1 1
## 5 0 2 1 0 1560 1 1 2 4 16
## 6 2 1 0 7 0 1527 6 1 7 1
## 7 0 1 2 0 1 2 1636 0 3 0
## 8 1 6 3 3 6 1 0 1719 0 7
## 9 0 2 6 16 4 4 5 4 1645 5
## 10 0 1 0 2 25 6 0 13 3 1605
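The trained model's parameter arrays can be listed by name; the two output lines below presumably come from a call like this:
names(lenet_model$arg.params)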
## [1] "conv1_bias" "conv1_weight" "conv2_bias" "conv2_weight" "fc1_bias"
## [6] "fc1_weight" "fc2_bias" "fc2_weight"
– First convolution block
The original image (28x28x1) first passes through 10 5x5 "filters" (5x5x1x10), producing 10 "first-order feature maps" (24x24x10)
These 10 "first-order feature maps" (24x24x10) then pass through ReLU, producing 10 "transformed first-order feature maps" (24x24x10)
These 10 "transformed first-order feature maps" (24x24x10) then pass through a 2x2 "pooling operator", producing 10 "down-sampled first-order feature maps" (12x12x10)
– Second convolution block
The 10 "down-sampled first-order feature maps" (12x12x10) pass through 20 5x5 "filters" (5x5x10x20), producing 20 "second-order feature maps" (8x8x20)
These 20 "second-order feature maps" (8x8x20) then pass through ReLU, producing 20 "transformed second-order feature maps" (8x8x20)
These 20 "transformed second-order feature maps" (8x8x20) then pass through a 2x2 "pooling operator", producing 20 "down-sampled second-order feature maps" (4x4x20)
– Fully connected layers
The "down-sampled second-order feature maps" (4x4x20) are rearranged and flattened into "first-order high-level features" (320)
The "first-order high-level features" (320) enter the "hidden layer", which outputs "second-order high-level features" (150)
The "second-order high-level features" (150) pass through ReLU, yielding "transformed second-order high-level features" (150)
The "transformed second-order high-level features" (150) enter the "output layer", producing the "raw output" (10)
The "raw output" (10) passes through the Softmax function to decide which class the image belongs to
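To verify that the network really works this way, we first classify and display a single test image, and then re-implement the whole forward pass by hand, step by step, comparing the result against predict().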
Input = Test.X[,,,1]
dim(Input) = c(28, 28, 1, 1)
preds = predict(lenet_model, Input)
pred.label = max.col(t(preds)) - 1
par(mar=rep(0,4))
plot(NA, xlim = 0:1, ylim = 0:1, xaxt = "n", yaxt = "n", bty = "n")
img = as.raster(t(matrix(as.numeric(Input)/255, nrow = 28)))
rasterImage(img, -0.04, -0.04, 1.04, 1.04, interpolate=FALSE)
text(0.05, 0.95, Test.Y[1], col = "green", cex = 2)
text(0.95, 0.95, pred.label, col = "blue", cex = 2)
PARAMS = lenet_model$arg.params
Input = Test.X[,,,1]
dim(Input) = c(28, 28, 1, 1)
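# Convolution layer 1: ten 5x5 filters, valid padding (28x28x1 -> 24x24x10)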
Conv1_out = array(0, dim = c(24, 24, 10))
for (i in 1:10) {
for (j in 1:24) {
for (k in 1:24) {
Conv1_out[j,k,i] <- sum(Input[j+(0:4),k+(0:4),,] * as.array(PARAMS$conv1_weight)[,,,i]) + as.array(PARAMS$conv1_bias)[i]
}
}
}
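# ReLU, then 2x2 max pooling with stride 2 (24x24x10 -> 12x12x10)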
ReLU1_out = Conv1_out
ReLU1_out[ReLU1_out < 0] = 0
Pool1_out = array(0, dim = c(12, 12, 10))
for (i in 1:10) {
for (j in 1:12) {
for (k in 1:12) {
Pool1_out[j,k,i] <- max(ReLU1_out[j*2+(-1:0),k*2+(-1:0),i])
}
}
}
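# Convolution layer 2: twenty 5x5x10 filters (12x12x10 -> 8x8x20)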
Conv2_out = array(0, dim = c(8, 8, 20))
for (i in 1:20) {
for (j in 1:8) {
for (k in 1:8) {
Conv2_out[j,k,i] <- sum(Pool1_out[j+(0:4),k+(0:4),] * as.array(PARAMS$conv2_weight)[,,,i]) + as.array(PARAMS$conv2_bias)[i]
}
}
}
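# ReLU, then 2x2 max pooling with stride 2 (8x8x20 -> 4x4x20)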
ReLU2_out = Conv2_out
ReLU2_out[ReLU2_out < 0] = 0
Pool2_out = array(0, dim = c(4, 4, 20))
for (i in 1:20) {
for (j in 1:4) {
for (k in 1:4) {
Pool2_out[j,k,i] <- max(ReLU2_out[j*2+(-1:0),k*2+(-1:0),i])
}
}
}
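# Flatten to 320 features, two fully connected layers, then softmax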
Flatten_out = as.numeric(Pool2_out)
fc1_out = Flatten_out %*% as.array(PARAMS$fc1_weight) + as.array(PARAMS$fc1_bias)
ReLU3_out = fc1_out
ReLU3_out[ReLU3_out < 0] = 0
fc2_out = ReLU3_out %*% as.array(PARAMS$fc2_weight) + as.array(PARAMS$fc2_bias)
Softmax_out = exp(fc2_out)/sum(exp(fc2_out))
all.equal(preds, t(Softmax_out))
## [1] TRUE
– Fei-Fei Li of Stanford University founded ImageNet in 2007, collecting large numbers of annotated images for training computer-vision models; to date the database contains over a million images.
– Download the two files resnet-50 .params and resnet-50 symbol. This is a 50-layer deep neural network trained with Kaiming He's Residual Learning approach, which we mentioned earlier.
– Also download chinese synset.txt, which describes (in Chinese) the 1000 classes this model outputs.
– Models trained on ImageNet are usually trained on 224x224 images:
library(OpenImageR)
img<- readImage('test.jpg')
resized_img <- resizeImage(img, 224, 224, method = 'bilinear')
imageShow(resized_img)
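The prediction below assumes the downloaded model and label file have already been loaded. A minimal sketch (the file paths are assumptions based on this lesson's layout):
library(mxnet)
res_model <- mx.model.load('model/resnet-50', iteration = 0) # reads the .params and symbol files
synsets <- readLines('chinese synset.txt', encoding = 'UTF-8') # the 1000 class descriptions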
– Let's run a prediction; note that the image must first be reshaped to the dimensions MxNet expects:
dim(resized_img) <- c(dim(resized_img), 1)
pred_prob <- predict(res_model, resized_img)
pred_prob <- as.numeric(pred_prob)
names(pred_prob) <- synsets
pred_prob <- sort(pred_prob, decreasing = TRUE)
pred_prob <- formatC(pred_prob, 4, format = 'f')
head(pred_prob, 5)
## n01484850 大白鯊 n01491361 虎鯊
## "0.9971" "0.0027"
## n01494475 鎚頭鯊 n02071294 殺人鯨,逆戟鯨,虎鯨
## "0.0001" "0.0001"
## n02066245 灰鯨
## "0.0000"
– Of the two remaining problems, overfitting has many viable solutions, and we can also address it by obtaining much more data. The weight-initialization problem, however, has never been truly solved.
– This idea is called transfer feature learning (transfer learning). It is inspired by the human ability to generalize from prior knowledge: for example, first-year medical students have only a high-school level of basic training and no professional medical training at all, yet because their learning builds on that high-school foundation, they can master even very difficult medical material quite quickly.
Why does this work? Mainly because the shallow layers of a deep neural network have been found to recognize only basic features such as lines and patches, so the front of the network ends up much the same regardless of the training data. Deciding what a picture shows is handled mostly by the back of the network, so in a reasonably ideal situation, a large data set can pre-train the shallow layers while a small data set later adjusts the weights of the last few layers.
Let's run a small experiment: extract the weights of the resnet-50 filters we just loaded and plot them. These are the features that the 64 shallowest filters are looking for:
library(imager)
par(mar=rep(0,4), mfrow = c(8, 8))
for (i in 1:64) {
plot(NA, xlim = 0:1, ylim = 0:1, xaxt = "n", yaxt = "n", bty = "n")
rasterImage(as.cimg(as.array(res_model$arg.params$conv0_weight)[,,,i]), 0, 0, 1, 1, interpolate=FALSE)
}
– Download 100 cat images and 100 dog images from here, and at the end use the classifier to predict the 5 cat and 5 dog test images included there.
– First, let's walk through transfer feature learning once, writing the training with the "low-level executor".
library(OpenImageR)
in_dir <- 'Dogs vs. Cats/'
out_dir <- 'processed/'
dir.create(out_dir)
img_paths <- list.files(in_dir)
for (i in 1:length(img_paths)) {
img <- readImage(paste0(in_dir, img_paths[i]))
resized_img <- resizeImage(img, 224, 224, method = 'bilinear')
save(resized_img, file = paste0(out_dir, gsub('.jpg', '.RData', img_paths[i])))
}
library(mxnet)
data_dir <- 'processed/'
img_paths <- list.files(data_dir)
# Iterator
my_iterator_core = function (batch_size) {
batch = 0
batch_per_epoch = length(img_paths)/batch_size
reset = function() {batch <<- 0}
iter.next = function() {
batch <<- batch+1
if (batch > batch_per_epoch) {return(FALSE)} else {return(TRUE)}
}
value = function() {
idx <- sample(1:length(img_paths), batch_size) # draw a random mini-batch of file names
X.array <- array(0, dim = c(224, 224, 3, batch_size))
for (i in 1:batch_size) {
load(paste0(data_dir, img_paths[idx[i]])) # loads the saved object 'resized_img'
X.array[,,,i] <- resized_img
}
Y.array <- array(0, dim = c(2, batch_size)) # one-hot labels derived from the file names
Y.array[1, grepl('cat', img_paths[idx])] <- 1
Y.array[2, grepl('dog', img_paths[idx])] <- 1
data = mx.nd.array(X.array)
label = mx.nd.array(Y.array)
return(list(data = data, label = label))
}
return(list(reset = reset, iter.next = iter.next, value = value, batch_size = batch_size, batch = batch))
}
my_iterator_func <- setRefClass("Custom_Iter",
fields = c("iter", "batch_size"),
contains = "Rcpp_MXArrayDataIter",
methods = list(
initialize = function(iter, batch_size = 100){
.self$iter <- my_iterator_core(batch_size = batch_size)
.self
},
value = function(){
.self$iter$value()
},
iter.next = function(){
.self$iter$iter.next()
},
reset = function(){
.self$iter$reset()
},
finalize=function(){
}
)
)
my_iter = my_iterator_func(iter = NULL, batch_size = 20)
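As before, a quick sanity check of the iterator (assumed usage; it matches the two output lines below):
my_iter$iter.next()
as.array(my_iter$value()$label)[, 1] # one-hot label of the first sampled image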
## [1] TRUE
## [1] 0 1
library(mxnet)
library(magrittr)
# Read Pre-training Model
res_model = mx.model.load("model/resnet-50", 0)
# Get symbol
all_layers = res_model$symbol$get.internals()
flatten0_output = which(all_layers$outputs == 'flatten0_output') %>% all_layers$get.output()
# Define Model Architecture
fc1 <- mx.symbol.FullyConnected(data = flatten0_output, num_hidden = 2, name = 'fc1')
softmax <- mx.symbol.softmax(data = fc1, axis = 1, name = 'softmax')
label = mx.symbol.Variable(name = 'label')
eps = 1e-8
m_log = 0 - mx.symbol.mean(mx.symbol.broadcast_mul(mx.symbol.log(softmax + eps), label))
m_logloss = mx.symbol.MakeLoss(m_log, name = 'm_logloss')
new_arg <- mxnet:::mx.model.init.params(symbol = softmax,
input.shape = list(data = c(224, 224, 3, 32)),
output.shape = NULL,
initializer = mxnet:::mx.init.uniform(0.01),
ctx = mx.cpu())
for (i in 1:length(new_arg$arg.params)) {
pos <- which(names(res_model$arg.params) == names(new_arg$arg.params)[i])
if (length(pos) == 1) {
if (all.equal(dim(res_model$arg.params[[pos]]), dim(new_arg$arg.params[[i]])) == TRUE) {
new_arg$arg.params[[i]] <- res_model$arg.params[[pos]]
}
}
}
for (i in 1:length(new_arg$aux.params)) {
pos <- which(names(res_model$aux.params) == names(new_arg$aux.params)[i])
if (length(pos) == 1) {
if (all.equal(dim(res_model$aux.params[[pos]]), dim(new_arg$aux.params[[i]])) == TRUE) {
new_arg$aux.params[[i]] <- res_model$aux.params[[pos]]
}
}
}
#1. Build an executor to train model
my_executor = mx.simple.bind(symbol = m_logloss,
data = c(224, 224, 3, 20), label = c(2, 20),
ctx = mx.gpu(), grad.req = "write")
#2. Set the initial parameters
mx.exec.update.arg.arrays(my_executor, new_arg$arg.params, match.name = TRUE)
mx.exec.update.aux.arrays(my_executor, new_arg$aux.params, match.name = TRUE)
#3. Define the updater
my_updater = mx.opt.get.updater(optimizer = my_optimizer, weights = my_executor$ref.arg.arrays)
for (i in 1:20) {
my_iter$reset()
batch_loss = NULL
while (my_iter$iter.next()) {
my_values <- my_iter$value()
mx.exec.update.arg.arrays(my_executor, arg.arrays = my_values, match.name = TRUE)
mx.exec.forward(my_executor, is.train = TRUE)
mx.exec.backward(my_executor)
update_args = my_updater(weight = my_executor$ref.arg.arrays, grad = my_executor$ref.grad.arrays)
mx.exec.update.arg.arrays(my_executor, update_args, skip.null = TRUE)
batch_loss = c(batch_loss, as.array(my_executor$ref.outputs$m_logloss_output))
}
message(paste0("epoch = ", i, ": m-logloss = ", formatC(mean(batch_loss), format = "f", 4)))
}
# Get model
dog_cat_model <- mxnet:::mx.model.extract.model(symbol = softmax,
train.execs = list(my_executor))
# Predict & Display
par(mar=rep(0,4), mfcol = c(2, 5))
for (i in 1:5) {
plot(NA, xlim = c(0.04, 0.96), ylim = c(0.04, 0.96), xaxt = "n", yaxt = "n", bty = "n")
cat_img <- readImage(paste0('test_cat.', i, '.jpg'))
norm_cat_img <- resizeImage(cat_img, 224, 224, method = 'bilinear')
dim(norm_cat_img) <- c(224, 224, 3, 1)
rasterImage(cat_img, 0, 0, 1, 1, interpolate=FALSE)
prob <- predict(dog_cat_model, X = norm_cat_img, ctx = mx.gpu())
text(0.5, 0.95, formatC(prob[1,1], 3, format = 'f'), col = "green", cex = 2)
plot(NA, xlim = c(0.04, 0.96), ylim = c(0.04, 0.96), xaxt = "n", yaxt = "n", bty = "n")
dog_img <- readImage(paste0('test_dog.', i, '.jpg'))
norm_dog_img <- resizeImage(dog_img, 224, 224, method = 'bilinear')
dim(norm_dog_img) <- c(224, 224, 3, 1)
rasterImage(dog_img, 0, 0, 1, 1, interpolate=FALSE)
prob <- predict(dog_cat_model, X = norm_dog_img, ctx = mx.gpu())
text(0.5, 0.95, formatC(prob[1,1], 3, format = 'f'), col = "green", cex = 2)
}
– You can also swap in other models; you don't have to use resnet-50.
– Try more approaches, and experiment with modifying the Iterator or the Optimizer.
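Note that the reference block below repeats the same executor setup but omits the step that copies the pre-trained parameters, so the identical architecture starts from random initialization; presumably this serves as a baseline for judging how much transfer learning helps.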
new_arg <- mxnet:::mx.model.init.params(symbol = softmax,
input.shape = list(data = c(224, 224, 3, 32)),
output.shape = NULL,
initializer = mxnet:::mx.init.uniform(0.01),
ctx = mx.cpu())
#1. Build an executor to train model
my_executor = mx.simple.bind(symbol = m_logloss,
data = c(224, 224, 3, 20), label = c(2, 20),
ctx = mx.gpu(), grad.req = "write")
#2. Set the initial parameters
mx.exec.update.arg.arrays(my_executor, new_arg$arg.params, match.name = TRUE)
mx.exec.update.aux.arrays(my_executor, new_arg$aux.params, match.name = TRUE)
#3. Define the updater
my_updater = mx.opt.get.updater(optimizer = my_optimizer, weights = my_executor$ref.arg.arrays)
for (i in 1:20) {
my_iter$reset()
batch_loss = NULL
while (my_iter$iter.next()) {
my_values <- my_iter$value()
mx.exec.update.arg.arrays(my_executor, arg.arrays = my_values, match.name = TRUE)
mx.exec.forward(my_executor, is.train = TRUE)
mx.exec.backward(my_executor)
update_args = my_updater(weight = my_executor$ref.arg.arrays, grad = my_executor$ref.grad.arrays)
mx.exec.update.arg.arrays(my_executor, update_args, skip.null = TRUE)
batch_loss = c(batch_loss, as.array(my_executor$ref.outputs$m_logloss_output))
}
message(paste0("epoch = ", i, ": m-logloss = ", formatC(mean(batch_loss), format = "f", 4)))
}
A CNN accounts for the real correlations between neighboring pixels instead of treating each pixel as a completely independent feature the way a multilayer perceptron does. Consider the human visual system: you will probably agree that your eyes exhibit shift invariance, and because a CNN imitates this property of the visual system, it achieves high accuracy on the test set.
Thanks to weight sharing, a CNN also uses far fewer parameters. From the perspective of overfitting, this means a complex network can be built from a small number of parameters and is therefore less likely to overfit; for a fixed parameter budget, a CNN can assemble a more elaborate structure. This gives the CNN architecture a powerful advantage in image recognition!
In addition, we have now largely learned how to deal with the three classic theoretical problems of deep learning (overfitting, vanishing gradients, and weight initialization), and we have the basic ability to write programs that train an AI model for image classification.
It is worth mentioning that in this final exercise (the cat-vs-dog task) you faced, for the first time, data close to the real world rather than carefully curated sample data.
– Are you starting to feel that training neural networks is easy? In a sense it is, but it is also hard: you still barely know what is inside the model you loaded, and many training tricks remain to be learned.
– The most effective way to learn AI is to take part in data-science competitions; we welcome everyone to use our data-science competition site!