Deep Learning: Theory and Practice

林嶔 (Lin, Chin)

Lesson 6: Convolutional Neural Networks and Transfer Learning

Section 1: Image Recognition Basics (1)

F01

Section 1: Image Recognition Basics (2)

– Please download the MNIST handwritten-digit data here, and let's get to know the structure of this dataset

– A 28×28 grayscale image can in fact be represented as 784 numbers between 0 and 255, which once again turns the task into an ordinary prediction problem.

library(data.table)

DAT = fread("data/MNIST.csv", data.table = FALSE)
DAT = data.matrix(DAT)

#Split data

set.seed(0)
Train.sample = sample(1:nrow(DAT), nrow(DAT)*0.6, replace = FALSE)

Train.X = DAT[Train.sample,-1]
Train.Y = DAT[Train.sample,1]
Test.X = DAT[-Train.sample,-1]
Test.Y = DAT[-Train.sample,1]

#Display

library(OpenImageR)

imageShow(t(matrix(as.numeric(Train.X[1,]), nrow = 28, byrow = TRUE)))

Section 1: Image Recognition Basics (3)

– At this point we face a hardware constraint: we cannot read every file into RAM up front. A better approach is to load only a small batch of training samples at each step, which keeps memory usage low.

fwrite(x = data.table(cbind(Train.Y, Train.X)),
       file = 'data/train_data.csv',
       col.names = FALSE, row.names = FALSE)

fwrite(x = data.table(cbind(Test.Y, Test.X)),
       file = 'data/test_data.csv',
       col.names = FALSE, row.names = FALSE)
library(mxnet)

my_iterator_func <- setRefClass("Custom_Iter",
                                fields = c("iter", "data.csv", "data.shape", "batch.size"),
                                contains = "Rcpp_MXArrayDataIter",
                                methods = list(
                                  initialize = function(iter, data.csv, data.shape, batch.size){
                                    csv_iter <- mx.io.CSVIter(data.csv = data.csv, data.shape = data.shape, batch.size = batch.size)
                                    .self$iter <- csv_iter
                                    .self
                                  },
                                  value = function(){
                                    # val is 785 x batch_size: row 1 holds the labels, rows 2-785 the pixels
                                    val <- as.array(.self$iter$value()$data)
                                    val.x <- val[-1,]
                                    # One-hot encode the labels into a 10 x batch_size matrix
                                    val.y <- t(model.matrix(~ -1 + factor(val[1,], levels = 0:9)))
                                    val.y <- array(val.y, dim = c(10, ncol(val.x)))
                                    # Reshape the pixels into the 4-D layout mxnet expects: (h, w, channel, batch)
                                    dim(val.x) <- c(28, 28, 1, ncol(val.x))
                                    val.x <- mx.nd.array(val.x)
                                    val.y <- mx.nd.array(val.y)
                                    list(data=val.x, label=val.y)
                                  },
                                  iter.next = function(){
                                    .self$iter$iter.next()
                                  },
                                  reset = function(){
                                    .self$iter$reset()
                                  },
                                  finalize=function(){
                                  }
                                )
)

my_iter = my_iterator_func(iter = NULL,  data.csv = 'data/train_data.csv', data.shape = 785, batch.size = 20)

Section 1: Image Recognition Basics (4)

my_iter$reset()
my_iter$iter.next()
## [1] TRUE
my_value = my_iter$value()
print(as.array(my_value$label)[,1])
##  [1] 0 0 0 0 0 0 0 0 1 0
imageShow(as.array(my_value$data)[,,,1])

Section 2: Introduction to Convolutional Neural Networks (1)

– Back to our handwritten-digit classification problem: when we look at these handwritten digits we recognize them at a glance, but is the path from "image" to "concept" really that simple?

– In 1962, David H. Hubel and Torsten Wiesel jointly published the study Receptive fields, binocular interaction and functional architecture in the cat's visual cortex, which investigated how the biological visual system works and earned them the Nobel Prize in 1981

F02

– Their research showed that when a cat is presented with visual stimuli of different shapes, the brain cells serving its receptive field respond differently

F03

Section 2: Introduction to Convolutional Neural Networks (2)

– Convolution filters model those earliest receptive-field cells: each one is responsible for recognizing a specific feature. Their mathematical form is as follows:

F04
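– Since F04 presents the formula only as an image, here is a minimal base-R sketch of what a single convolution filter computes (stride 1, no padding); the 4×4 input and 2×2 kernel are made-up numbers, purely for illustration.

# X (4x4 input) and W (2x2 kernel) are made-up numbers for illustration
X <- matrix(c(1, 0, 2, 1,
              0, 1, 1, 0,
              2, 1, 0, 1,
              1, 0, 1, 2), nrow = 4, byrow = TRUE)
W <- matrix(c( 1, -1,
              -1,  1), nrow = 2, byrow = TRUE)

out_dim <- nrow(X) - nrow(W) + 1
O <- matrix(0, out_dim, out_dim)
for (i in 1:out_dim) {
  for (j in 1:out_dim) {
    # Each output cell is the element-wise product of the kernel
    # with the patch it covers, summed up
    O[i, j] <- sum(X[i + 0:1, j + 0:1] * W)
  }
}
O  # the "feature map": larger values mark patches that better match W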

– What does a "feature map" mean? A convolution filter is like a primitive visual cell that specializes in recognizing one simple feature, so the larger a value on the feature map, the more closely that location matches the feature the cell is responsible for.

F05

Section 2: Introduction to Convolutional Neural Networks (3)

F06

F07

Section 2: Introduction to Convolutional Neural Networks (4)

– Imagine a picture of a person, and suppose the first filter recognizes eyes, the second recognizes noses, the third ears, the fourth palms, and the fifth arms

– The highest values in feature maps 1, 2, and 3 mark the most likely locations of eyes, noses, and ears. If we stack those 3 maps together and convolve once more, could we then locate a face?

– Likewise, the highest values in feature maps 4 and 5 mark the most likely locations of palms and arms. If we stack those 2 maps together and convolve once more, could we then locate an arm?

– Feature maps 4 and 5 also contribute to face recognition: since a face contains neither palms nor arms, a filter that wants to recognize faces must weight feature maps 1, 2, and 3 positively, and feature maps 4 and 5 negatively

F08

Section 2: Introduction to Convolutional Neural Networks (5)

– Despite the change of shape involved, convolution can be viewed as a linear operation, so if we rewrite the process appropriately, its gradient closely resembles that of the linear layers we saw earlier.

– Suppose \(X\) is an input matrix, \(W\) is a convolution kernel, and \(O\) is the output of the convolution. We can recast the whole computation into a form like the following (below is the simplest possible example, where the input is 3×3, the kernel is 2×2, and the output is 2×2)

\[\begin{align} O & = Conv(X, W) \\ O' &= X'W' \\\\ X & = \begin{pmatrix} x_{1,1} & x_{1,2} & x_{1,3} \\ x_{2,1} & x_{2,2} & x_{2,3} \\ x_{3,1} & x_{3,2} & x_{3,3} \end{pmatrix} \ \ \ \ \ X' = \begin{pmatrix} x_{1,1} & x_{1,2} & x_{2,1} & x_{2,2} \\ x_{1,2} & x_{1,3} & x_{2,2} & x_{2,3} \\ x_{2,1} & x_{2,2} & x_{3,1} & x_{3,2} \\ x_{2,2} & x_{2,3} & x_{3,2} & x_{3,3} \end{pmatrix} \\\\ W & = \begin{pmatrix} w_{1,1} & w_{1,2} \\ w_{2,1} & w_{2,2} \end{pmatrix} \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ W' = \begin{pmatrix} w_{1,1} \\ w_{1,2} \\ w_{2,1} \\ w_{2,2} \end{pmatrix} \\\\ O & = \begin{pmatrix} o_{1,1} & o_{1,2} \\ o_{2,1} & o_{2,2} \end{pmatrix} \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ O' = \begin{pmatrix} o_{1,1} = X'_{1,\cdot}W' \\ o_{1,2} = X'_{2,\cdot}W' \\ o_{2,1} = X'_{3,\cdot}W' \\ o_{2,2} = X'_{4,\cdot}W' \end{pmatrix} \end{align}\]

– Each row of \(X'\) holds one 2×2 patch of \(X\), flattened row-wise and ordered to match the row-major listing of \(O'\).
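– To make the unrolled form concrete, here is a small sketch with made-up numbers that builds \(X'\) and \(W'\) exactly as above and checks that \(X'W'\) reproduces the convolution.

X <- matrix(c(1, 2, 3,
              4, 5, 6,
              7, 8, 9), nrow = 3, byrow = TRUE)  # made-up 3x3 input
W <- matrix(c( 1, 0,
              -1, 2), nrow = 2, byrow = TRUE)    # made-up 2x2 kernel

# Direct convolution (stride 1, no padding)
O <- matrix(0, 2, 2)
for (i in 1:2) for (j in 1:2) O[i, j] <- sum(X[i + 0:1, j + 0:1] * W)

# Unrolled form: each row of X' is one 2x2 patch, flattened row-wise
patch <- function(i, j) as.numeric(t(X[i + 0:1, j + 0:1]))
X_prime <- rbind(patch(1, 1), patch(1, 2), patch(2, 1), patch(2, 2))
W_prime <- matrix(as.numeric(t(W)), ncol = 1)

O_prime <- X_prime %*% W_prime
all.equal(as.numeric(t(O)), as.numeric(O_prime))  # TRUE: O' lists o11, o12, o21, o22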

– Here we assume the backward pass has already delivered the gradient \(grad.O\) of the output \(O\), and we continue downward from there:

\[\begin{align} \frac{\partial}{\partial W'} O' & = X' \\ \frac{\partial}{\partial X'} O' & = W' \\\\ grad.W' &= \frac{1}{n} \otimes (X')^T \bullet grad.O' \\ grad.X' &= grad.O' \bullet (W')^T \end{align}\]
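– A quick numerical check of these formulas, continuing from the sketch above: we take the loss to be the sum of \(O\), so \(grad.O'\) is a vector of ones (and \(n = 1\) since there is a single example); the analytic gradient is then compared against finite differences.

# Numerical check of grad.W' = t(X') %*% grad.O' (n = 1, grad.O' = 1s)
# X, W, and X_prime come from the im2col sketch above
conv_loss <- function(W_vec) {
  Wm <- matrix(W_vec, 2, 2, byrow = TRUE)  # rebuild the kernel from row-major order
  s <- 0
  for (i in 1:2) for (j in 1:2) s <- s + sum(X[i + 0:1, j + 0:1] * Wm)
  s
}

grad_O_prime <- matrix(1, 4, 1)            # d loss / d O' when loss = sum(O)
analytic <- t(X_prime) %*% grad_O_prime    # the formula above, with n = 1

numeric <- sapply(1:4, function (k) {
  h <- 1e-6; e <- rep(0, 4); e[k] <- h
  w0 <- as.numeric(t(W))
  (conv_loss(w0 + e) - conv_loss(w0 - e)) / (2 * h)
})

all.equal(as.numeric(analytic), numeric)   # TRUE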

Section 2: Introduction to Convolutional Neural Networks (6)

– Let's first look at the "average pooling" form. Suppose \(X\) is an input matrix and \(O\) is the output after pooling (pooling stride 1×1, pooling kernel 2×2):

\[\begin{align} O & = Pool(X) \\\\ X & = \begin{pmatrix} x_{1,1} & x_{1,2} & x_{1,3} \\ x_{2,1} & x_{2,2} & x_{2,3} \\ x_{3,1} & x_{3,2} & x_{3,3} \end{pmatrix} \\ O & = \begin{pmatrix} o_{1,1} = mean(x_{1,1}, \ x_{1,2}, \ x_{2,1}, \ x_{2,2}) & o_{1,2} = mean(x_{1,2}, \ x_{1,3}, \ x_{2,2}, \ x_{2,3})\\ o_{2,1} = mean(x_{2,1}, \ x_{2,2}, \ x_{3,1}, \ x_{3,2}) & o_{2,2} = mean(x_{2,2}, \ x_{2,3}, \ x_{3,2}, \ x_{3,3}) \end{pmatrix} \end{align}\]

\[\begin{align} grad.O & = \begin{pmatrix} grad.o_{1,1} & grad.o_{1,2} \\ grad.o_{2,1} & grad.o_{2,2} \end{pmatrix} \\ grad.X & = \begin{pmatrix} grad.x_{1,1} = \frac{grad.o_{1,1}}{4} & grad.x_{1,2} = \frac{grad.o_{1,1} + grad.o_{1,2}}{4} & grad.x_{1,3} =\frac{grad.o_{1,2}}{4}\\ grad.x_{2,1} = \frac{grad.o_{1,1} + grad.o_{2,1}}{4} & grad.x_{2,2} = \frac{grad.o_{1,1} + grad.o_{2,1} + grad.o_{1,2} + grad.o_{2,2}}{4} & grad.x_{2,3} = \frac{grad.o_{1,2} + grad.o_{2,2}}{4} \\ grad.x_{3,1} = \frac{grad.o_{2,1}}{4} & grad.x_{3,2} = \frac{grad.o_{2,1} + grad.o_{2,2}}{4} & grad.x_{3,3} = \frac{grad.o_{2,2}}{4} \end{pmatrix} \end{align}\]
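– A small sketch of this 2×2, stride-1 average pooling and its backward pass (the input and the upstream gradient are made-up numbers); each input cell accumulates one quarter of the gradient of every window that covers it, exactly as in the formulas above.

X <- matrix(1:9, nrow = 3)                 # made-up 3x3 input
grad_O <- matrix(1:4, 2, 2, byrow = TRUE)  # made-up upstream gradient

# Forward: 2x2 average pooling with stride 1
O <- matrix(0, 2, 2)
for (i in 1:2) for (j in 1:2) O[i, j] <- mean(X[i + 0:1, j + 0:1])

# Backward: every input cell receives 1/4 of the gradient of
# each output window that covers it
grad_X <- matrix(0, 3, 3)
for (i in 1:2) for (j in 1:2) {
  grad_X[i + 0:1, j + 0:1] <- grad_X[i + 0:1, j + 0:1] + grad_O[i, j] / 4
}
grad_X  # the center cell sums all four gradients, matching the formula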

Section 2: Introduction to Convolutional Neural Networks (7)

– Since there is no mathematical symbol for "which element is the maximum", here we substitute a set of concrete numbers to see the result

\[\begin{align} O & = Pool(X) \\\\ X & = \begin{pmatrix} x_{1,1} & x_{1,2} & x_{1,3} \\ x_{2,1} & x_{2,2} & x_{2,3} \\ x_{3,1} & x_{3,2} & x_{3,3} \end{pmatrix} \ \ \ \ \ \ \ X = \begin{pmatrix} 5 & 3 & 4 \\ 8 & 1 & 2 \\ 6 & 7 & 9 \end{pmatrix} \\ O & = \begin{pmatrix} o_{1,1} = x_{2,1} & o_{1,2} = x_{1,3} \\ o_{2,1} = x_{2,1} & o_{2,2} = x_{3,3} \end{pmatrix} \end{align}\]

\[\begin{align} grad.O & = \begin{pmatrix} grad.o_{1,1} & grad.o_{1,2} \\ grad.o_{2,1} & grad.o_{2,2} \end{pmatrix} \\ grad.X & = \begin{pmatrix} grad.x_{1,1} = 0 & grad.x_{1,2} = 0 & grad.x_{1,3} = grad.o_{1,2} \\ grad.x_{2,1} = grad.o_{1,1} + grad.o_{2,1} & grad.x_{2,2} = 0 & grad.x_{2,3} = 0 \\ grad.x_{3,1} = 0 & grad.x_{3,2} = 0 & grad.x_{3,3} = grad.o_{2,2} \end{pmatrix} \end{align}\]
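– The same check in R, using the concrete \(X\) above: the forward pass records which cell was the maximum of each window, and the backward pass routes each output gradient to that single cell.

X <- matrix(c(5, 3, 4,
              8, 1, 2,
              6, 7, 9), nrow = 3, byrow = TRUE)
grad_O <- matrix(1:4, 2, 2, byrow = TRUE)  # made-up upstream gradient

O <- matrix(0, 2, 2)
grad_X <- matrix(0, 3, 3)

for (i in 1:2) for (j in 1:2) {
  window <- X[i + 0:1, j + 0:1]
  O[i, j] <- max(window)
  # Route the gradient to the argmax cell of this window
  pos <- which(window == max(window), arr.ind = TRUE)[1, ]
  grad_X[i + pos[1] - 1, j + pos[2] - 1] <-
    grad_X[i + pos[1] - 1, j + pos[2] - 1] + grad_O[i, j]
}

O       # o11 = x21 = 8, o12 = x13 = 4, o21 = x21 = 8, o22 = x33 = 9
grad_X  # x21 collects grad.o11 + grad.o21; x13 and x33 collect theirs; the rest are 0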

Section 3: Handwritten Digit Recognition with a Convolutional Neural Network (1)

F09

Section 3: Handwritten Digit Recognition with a Convolutional Neural Network (2)

my.model.FeedForward.create = function (Iterator, ctx = mx.cpu(), save.grad = FALSE,
                                        loss_symbol, pred_symbol,
                                        Optimizer, num_round = 30) {
  
  require(abind)
  
  #0. Check data shape
  Iterator$reset()
  Iterator$iter.next()
  my_values <- Iterator$value()
  input_shape <- lapply(my_values, dim)
  batch_size <- tail(input_shape[[1]], 1)
  
  #1. Build an executor to train model
  exec_list = list(symbol = loss_symbol, ctx = ctx, grad.req = "write")
  exec_list = append(exec_list, input_shape)
  my_executor = do.call(mx.simple.bind, exec_list)
  
  #2. Set the initial parameters
  mx.set.seed(0)
  new_arg = mxnet:::mx.model.init.params(symbol = loss_symbol,
                                         input.shape = input_shape,
                                         output.shape = NULL,
                                         initializer = mxnet:::mx.init.uniform(0.01),
                                         ctx = ctx)
  mx.exec.update.arg.arrays(my_executor, new_arg$arg.params, match.name = TRUE)
  mx.exec.update.aux.arrays(my_executor, new_arg$aux.params, match.name = TRUE)
  
  #3. Define the updater
  my_updater = mx.opt.get.updater(optimizer = Optimizer, weights = my_executor$ref.arg.arrays)
  
  #4. Forward/Backward
  message('Start training:')
  
  set.seed(0)
  if (save.grad) {epoch_grad = NULL}
  
  for (i in 1:num_round) {
    
    Iterator$reset()
    batch_loss = list()
    if (save.grad) {batch_grad = list()}
    batch_seq = 0
    t0 = Sys.time()
    
    while (Iterator$iter.next()) {
      
      my_values <- Iterator$value()
      mx.exec.update.arg.arrays(my_executor, arg.arrays = my_values, match.name = TRUE)
      mx.exec.forward(my_executor, is.train = TRUE)
      mx.exec.backward(my_executor)
      update_args = my_updater(weight = my_executor$ref.arg.arrays, grad = my_executor$ref.grad.arrays)
      mx.exec.update.arg.arrays(my_executor, update_args, skip.null = TRUE)
      batch_loss[[length(batch_loss) + 1]] = as.array(my_executor$ref.outputs[[1]])
      if (save.grad) {
        grad_list = sapply(my_executor$ref.grad.arrays, function (x) {if (!is.null(x)) {mean(abs(as.array(x)))}})
        grad_list = unlist(grad_list[grepl('weight', names(grad_list), fixed = TRUE)])
        batch_grad[[length(batch_grad) + 1]] = grad_list
      }
      batch_seq = batch_seq + 1
      
    }
    
    message(paste0("epoch = ", i,
                   ": loss = ", formatC(mean(unlist(batch_loss)), format = "f", 4),
                   " (Speed: ", formatC(batch_seq * batch_size/as.numeric(Sys.time() - t0, units = 'secs'), format = "f", 2), " sample/secs)"))
    
    if (save.grad) {epoch_grad = rbind(epoch_grad, apply(abind(batch_grad, along = 2), 1, mean))}
    
  }
  
  if (save.grad) {
    
    epoch_grad[epoch_grad < 1e-8] = 1e-8
    
    COL = rainbow(ncol(epoch_grad))
    random_pos = 2^runif(ncol(epoch_grad), -0.5, 0.5)
    
    plot(epoch_grad[,1] * random_pos[1], type = 'l', col = COL[1],
         xlab = 'epoch', ylab = 'mean of abs(grad)', log = 'y',
         ylim = range(epoch_grad))
    
    for (i in 2:ncol(epoch_grad)) {lines(1:nrow(epoch_grad), epoch_grad[,i] * random_pos[i], col = COL[i])}
    
    legend('topright', paste0('layer', 1:ncol(epoch_grad), '_weight'), col = COL, lwd = 1)
    
  }
  
  #5. Get model
  my_model <- mxnet:::mx.model.extract.model(symbol = pred_symbol,
                                             train.execs = list(my_executor))
  
  return(my_model)
  
}

Section 3: Handwritten Digit Recognition with a Convolutional Neural Network (3)

– This is a cut-down LeNet: in the original LeNet, the first and second convolution layers have 20 and 50 filters respectively, and the first fully connected layer has 500 neurons. This slimmed-down network has roughly the same total parameter count as the 5-hidden-layer multilayer perceptron we used earlier.
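– As a quick sanity check of that claim, we can count the parameters (weights plus biases) of the network defined below by hand:

conv1_n <- 5 * 5 * 1 * 10 + 10    # 260
conv2_n <- 5 * 5 * 10 * 20 + 20   # 5,020
fc1_n   <- 320 * 150 + 150        # 48,150 (320 = 4 x 4 x 20 after flattening)
fc2_n   <- 150 * 10 + 10          # 1,510
conv1_n + conv2_n + fc1_n + fc2_n # 54,940 parameters in total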

# input
data <- mx.symbol.Variable('data')

# first conv
conv1 <- mx.symbol.Convolution(data=data, kernel=c(5,5), num_filter=10, name = 'conv1')
relu1 <- mx.symbol.Activation(data=conv1, act_type="relu")
pool1 <- mx.symbol.Pooling(data=relu1, pool_type="max",
                          kernel=c(2,2), stride=c(2,2))
# second conv
conv2 <- mx.symbol.Convolution(data=pool1, kernel=c(5,5), num_filter=20, name = 'conv2')
relu2 <- mx.symbol.Activation(data=conv2, act_type="relu")
pool2 <- mx.symbol.Pooling(data=relu2, pool_type="max",
                          kernel=c(2,2), stride=c(2,2))
# first fullc
flatten <- mx.symbol.Flatten(data=pool2)
fc1 <- mx.symbol.FullyConnected(data=flatten, num_hidden=150, name = 'fc1')
relu3 <- mx.symbol.Activation(data=fc1, act_type="relu")

# second fullc
fc2 <- mx.symbol.FullyConnected(data=relu3, num_hidden=10, name = 'fc2')

# Softmax
lenet <- mx.symbol.softmax(data = fc2, axis = 1, name = 'lenet')

# m-log loss
label = mx.symbol.Variable(name = 'label')

eps = 1e-8
m_log = 0 - mx.symbol.mean(mx.symbol.broadcast_mul(mx.symbol.log(lenet + eps), label))
m_logloss = mx.symbol.MakeLoss(m_log, name = 'm_logloss')

– First convolution block

  1. The raw image (28x28x1) first passes through 10 5x5 "convolution filters" (5x5x1x10), turning it into 10 "first-order feature maps" (24x24x10)

  2. These 10 "first-order feature maps" (24x24x10) then pass through ReLU, producing 10 "activated first-order feature maps" (24x24x10)

  3. These 10 "activated first-order feature maps" (24x24x10) then pass through a 2x2 "pooling" step (2x2), turning them into 10 "downsampled first-order feature maps" (12x12x10)

– Second convolution block

  1. The 10 "downsampled first-order feature maps" (12x12x10) pass through 20 5x5 "convolution filters" (5x5x10x20), turning them into 20 "second-order feature maps" (8x8x20)

  2. These 20 "second-order feature maps" (8x8x20) then pass through ReLU, producing 20 "activated second-order feature maps" (8x8x20)

  3. These 20 "activated second-order feature maps" (8x8x20) then pass through a 2x2 "pooling" step (2x2), turning them into 20 "downsampled second-order feature maps" (4x4x20)

– Fully connected layers

  1. The "downsampled second-order feature maps" (4x4x20) are rearranged and flattened into "first-order high-level features" (320)

  2. The "first-order high-level features" (320) enter the "hidden layer", which outputs "second-order high-level features" (150)

  3. The "second-order high-level features" (150) pass through ReLU, yielding "activated second-order high-level features" (150)

  4. The "activated second-order high-level features" (150) enter the "output layer", producing the "raw outputs" (10)

  5. The "raw outputs" (10) pass through the Softmax function to decide which class the image belongs to (a quick shape check follows this list)
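– To double-check this shape bookkeeping, mxnet can infer every shape from the symbol itself; a short sketch (exact output names may vary slightly across mxnet versions):

# Infer all shapes for a single 28x28x1 image (batch size 1)
shapes <- mx.symbol.infer.shape(lenet, data = c(28, 28, 1, 1))
shapes$arg.shapes$conv1_weight  # 5 5 1 10
shapes$arg.shapes$fc1_weight    # 320 150
shapes$out.shapes               # the softmax output: 10 x 1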

Section 3: Handwritten Digit Recognition with a Convolutional Neural Network (4)

my_optimizer = mx.opt.create(name = "adam", learning.rate = 0.001, beta1 = 0.9, beta2 = 0.999,
                             epsilon = 1e-08, wd = 0)

lenet_model = my.model.FeedForward.create(Iterator = my_iter, ctx = mx.gpu(), save.grad = TRUE,
                                          loss_symbol = m_logloss, pred_symbol = lenet,
                                          Optimizer = my_optimizer, num_round = 20)

library(data.table)

DAT = fread("data/test_data.csv", data.table = FALSE)
DAT = data.matrix(DAT)

Test.X = t(DAT[,-1])
dim(Test.X) = c(28, 28, 1, ncol(Test.X))
Test.Y = DAT[,1]

predict_Y = predict(lenet_model, Test.X)
confusion_table = table(max.col(t(predict_Y)), Test.Y)
cat("Testing accuracy rate =", sum(diag(confusion_table))/sum(confusion_table))
## Testing accuracy rate = 0.9835119
print(confusion_table)
##     Test.Y
##         0    1    2    3    4    5    6    7    8    9
##   1  1658    2    2    0    3    5   10    1    4    4
##   2     1 1832    5    1    4    0    1    2    4    2
##   3     1    4 1630    2    2    0    1    7    4    1
##   4     0    0    7 1711    1    5    1    4    1    1
##   5     0    2    1    0 1560    1    1    2    4   16
##   6     2    1    0    7    0 1527    6    1    7    1
##   7     0    1    2    0    1    2 1636    0    3    0
##   8     1    6    3    3    6    1    0 1719    0    7
##   9     0    2    6   16    4    4    5    4 1645    5
##   10    0    1    0    2   25    6    0   13    3 1605

Exercise 1: Reproducing the CNN Inference Process

PARAMS = lenet_model$arg.params
ls(PARAMS)
## [1] "conv1_bias"   "conv1_weight" "conv2_bias"   "conv2_weight" "fc1_bias"    
## [6] "fc1_weight"   "fc2_bias"     "fc2_weight"

– First convolution block

  1. The raw image (28x28x1) first passes through 10 5x5 "convolution filters" (5x5x1x10), turning it into 10 "first-order feature maps" (24x24x10)

  2. These 10 "first-order feature maps" (24x24x10) then pass through ReLU, producing 10 "activated first-order feature maps" (24x24x10)

  3. These 10 "activated first-order feature maps" (24x24x10) then pass through a 2x2 "pooling" step (2x2), turning them into 10 "downsampled first-order feature maps" (12x12x10)

– Second convolution block

  1. The 10 "downsampled first-order feature maps" (12x12x10) pass through 20 5x5 "convolution filters" (5x5x10x20), turning them into 20 "second-order feature maps" (8x8x20)

  2. These 20 "second-order feature maps" (8x8x20) then pass through ReLU, producing 20 "activated second-order feature maps" (8x8x20)

  3. These 20 "activated second-order feature maps" (8x8x20) then pass through a 2x2 "pooling" step (2x2), turning them into 20 "downsampled second-order feature maps" (4x4x20)

– Fully connected layers

  1. The "downsampled second-order feature maps" (4x4x20) are rearranged and flattened into "first-order high-level features" (320)

  2. The "first-order high-level features" (320) enter the "hidden layer", which outputs "second-order high-level features" (150)

  3. The "second-order high-level features" (150) pass through ReLU, yielding "activated second-order high-level features" (150)

  4. The "activated second-order high-level features" (150) enter the "output layer", producing the "raw outputs" (10)

  5. The "raw outputs" (10) pass through the Softmax function to decide which class the image belongs to

Input = Test.X[,,,1]
dim(Input) = c(28, 28, 1, 1)
preds = predict(lenet_model, Input)
pred.label = max.col(t(preds)) - 1

par(mar=rep(0,4))
plot(NA, xlim = 0:1, ylim = 0:1, xaxt = "n", yaxt = "n", bty = "n")
img = as.raster(t(matrix(as.numeric(Input)/255, nrow = 28)))
rasterImage(img, -0.04, -0.04, 1.04, 1.04, interpolate=FALSE)
text(0.05, 0.95, Test.Y[1], col = "green", cex = 2)
text(0.95, 0.95, pred.label, col = "blue", cex = 2)

Exercise 1: Solution

PARAMS = lenet_model$arg.params

Input = Test.X[,,,1]
dim(Input) = c(28, 28, 1, 1)

Conv1_out = array(0, dim = c(24, 24, 10))

for (i in 1:10) {
  for (j in 1:24) {
    for (k in 1:24) {
      Conv1_out[j,k,i] <- sum(Input[j+(0:4),k+(0:4),,] * as.array(PARAMS$conv1_weight)[,,,i]) + as.array(PARAMS$conv1_bias)[i]
    } 
  }
}

ReLU1_out = Conv1_out
ReLU1_out[ReLU1_out < 0] = 0

Pool1_out = array(0, dim = c(12, 12, 10))

for (i in 1:10) {
  for (j in 1:12) {
    for (k in 1:12) {
      Pool1_out[j,k,i] <- max(ReLU1_out[j*2+(-1:0),k*2+(-1:0),i])
    } 
  }
}

Conv2_out = array(0, dim = c(8, 8, 20))

for (i in 1:20) {
  for (j in 1:8) {
    for (k in 1:8) {
      Conv2_out[j,k,i] <- sum(Pool1_out[j+(0:4),k+(0:4),] * as.array(PARAMS$conv2_weight)[,,,i]) + as.array(PARAMS$conv2_bias)[i]
    } 
  }
}

ReLU2_out = Conv2_out
ReLU2_out[ReLU2_out < 0] = 0

Pool2_out = array(0, dim = c(4, 4, 20))

for (i in 1:20) {
  for (j in 1:4) {
    for (k in 1:4) {
      Pool2_out[j,k,i] <- max(ReLU2_out[j*2+(-1:0),k*2+(-1:0),i])
    } 
  }
}

Flatten_out = as.numeric(Pool2_out)
fc1_out = Flatten_out %*% as.array(PARAMS$fc1_weight) + as.array(PARAMS$fc1_bias)

ReLU3_out = fc1_out
ReLU3_out[ReLU3_out < 0] = 0

fc2_out = ReLU3_out %*% as.array(PARAMS$fc2_weight) + as.array(PARAMS$fc2_bias)
Softmax_out = exp(fc2_out)/sum(exp(fc2_out))

all.equal(preds, t(Softmax_out))
## [1] TRUE

Section 4: Using Classic Convolutional Neural Networks (1)

– Fei-Fei Li of Stanford University founded ImageNet in 2007, collecting massive numbers of annotated images for training computer-vision models; to date the database has accumulated well over a million images

F10

F11

Section 4: Using Classic Convolutional Neural Networks (2)

– Let's download the two files resnet-50 .params and resnet-50 symbol. This is the 50-layer deep neural network trained on the basis of Kaiming He's Residual Learning research mentioned earlier

– Please also download chinese synset.txt, which describes what each of this model's 1000 output classes is.

library(mxnet)

res_model <- mx.model.load("model/resnet-50", 0)
synsets <- readLines('model/chinese synset.txt', encoding = 'UTF-8')

Section 4: Using Classic Convolutional Neural Networks (3)

– These ImageNet models are usually trained on 224x224 images:

library(OpenImageR)

img <- readImage('test.jpg')
resized_img <- resizeImage(img, 224, 224, method = 'bilinear')

imageShow(resized_img)

– Let's run a prediction. Note that the image must first be reshaped into the dimensions MxNet accepts:

dim(resized_img) <- c(dim(resized_img), 1)
pred_prob  <- predict(res_model, resized_img)

pred_prob <- as.numeric(pred_prob)
names(pred_prob) <- synsets
pred_prob <- sort(pred_prob, decreasing = TRUE)
pred_prob <- formatC(pred_prob, 4, format = 'f')
head(pred_prob, 5)
##             n01484850 大白鯊               n01491361 虎鯊 
##                     "0.9971"                     "0.0027" 
##             n01494475 鎚頭鯊 n02071294 殺人鯨,逆戟鯨,虎鯨 
##                     "0.0001"                     "0.0001" 
##               n02066245 灰鯨 
##                     "0.0000"

Section 5: Transfer Learning (1)

– Of the two remaining problems, overfitting has numerous viable remedies, or we can simply acquire much more data. The weight-initialization problem, however, has never had a real solution.

– This idea is called transfer learning, and it builds on the human capacity to generalize from prior knowledge. A newly admitted medical student, for example, has only a high-school foundation and no training in any medical specialty at all, yet because their learning rests on that foundation, they can absorb even very difficult medical material quite quickly.

F12

Section 5: Transfer Learning (2)

library(imager)

par(mar=rep(0,4), mfrow = c(8, 8))
for (i in 1:64) {
  plot(NA, xlim = 0:1, ylim = 0:1, xaxt = "n", yaxt = "n", bty = "n")
  rasterImage(as.cimg(as.array(res_model$arg.params$conv0_weight)[,,,i]), 0, 0, 1, 1, interpolate=FALSE)
}

Section 5: Transfer Learning (3)

– Let's download 100 cat and 100 dog images from here, and at the end use the classifier to predict the 5 cat and 5 dog test images.

– First, let's demonstrate once how to train a model with transfer learning, writing it against the low-level executor API.

library(OpenImageR)

in_dir <- 'Dogs vs. Cats/'
out_dir <- 'processed/'
dir.create(out_dir)

img_paths <- list.files(in_dir)

for (i in 1:length(img_paths)) {
    
    img <- readImage(paste0(in_dir, img_paths[i]))
    resized_img <- resizeImage(img, 224, 224, method = 'bilinear')
    save(resized_img, file = paste0(out_dir, gsub('.jpg', '.RData', img_paths[i], fixed = TRUE)))
    
}

Section 5: Transfer Learning (4)

library(mxnet)

data_dir <- 'processed/'
img_paths <- list.files(data_dir)

# Iterator

my_iterator_core = function (batch_size) {
  
  batch = 0
  batch_per_epoch = length(img_paths)/batch_size
  
  reset = function() {batch <<- 0}
  
  iter.next = function() {
    batch <<- batch+1
    if (batch > batch_per_epoch) {return(FALSE)} else {return(TRUE)}
  }
  
  value = function() {
    # Randomly sample one batch of preprocessed image files
    idx <- sample(1:length(img_paths), batch_size)
    X.array <- array(0, dim = c(224, 224, 3, batch_size))
    for (i in 1:batch_size) {
      load(paste0(data_dir, img_paths[idx[i]]))
      X.array[,,,i] <- resized_img
    }
    # One-hot labels taken from the file names: row 1 = cat, row 2 = dog
    Y.array <- array(0, dim = c(2, batch_size))
    Y.array[1, grepl('cat', img_paths[idx])] <- 1
    Y.array[2, grepl('dog', img_paths[idx])] <- 1
    data = mx.nd.array(X.array)
    label = mx.nd.array(Y.array)
    return(list(data = data, label = label))
  }
  
  return(list(reset = reset, iter.next = iter.next, value = value, batch_size = batch_size, batch = batch))
}

my_iterator_func <- setRefClass("Custom_Iter",
                                fields = c("iter", "batch_size"),
                                contains = "Rcpp_MXArrayDataIter",
                                methods = list(
                                  initialize = function(iter, batch_size = 100){
                                    .self$iter <- my_iterator_core(batch_size = batch_size)
                                    .self
                                  },
                                  value = function(){
                                    .self$iter$value()
                                  },
                                  iter.next = function(){
                                    .self$iter$iter.next()
                                  },
                                  reset = function(){
                                    .self$iter$reset()
                                  },
                                  finalize=function(){
                                  }
                                )
)

my_iter = my_iterator_func(iter = NULL, batch_size = 20)

Section 5: Transfer Learning (5)

my_iter$reset()
my_iter$iter.next()
## [1] TRUE
my_value = my_iter$value()
print(as.array(my_value$label)[,1])
## [1] 0 1
imageShow(as.array(my_value$data)[,,,1])

my_optimizer <- mx.opt.create(name = "adam", learning.rate = 1e-3, beta1 = 0.9, beta2 = 0.999, epsilon = 1e-08, wd = 1e-3)

Section 5: Transfer Learning (6)

library(mxnet)
library(magrittr)

# Read Pre-training Model

res_model = mx.model.load("model/resnet-50", 0)

# Get symbol

all_layers = res_model$symbol$get.internals()
flatten0_output = which(all_layers$outputs == 'flatten0_output') %>% all_layers$get.output()

# Define Model Architecture

fc1 <- mx.symbol.FullyConnected(data = flatten0_output, num_hidden = 2, name = 'fc1')
softmax <- mx.symbol.softmax(data = fc1, axis = 1, name = 'softmax')

label = mx.symbol.Variable(name = 'label')

eps = 1e-8
m_log = 0 - mx.symbol.mean(mx.symbol.broadcast_mul(mx.symbol.log(softmax + eps), label))
m_logloss = mx.symbol.MakeLoss(m_log, name = 'm_logloss')
new_arg <- mxnet:::mx.model.init.params(symbol = softmax,
                                        input.shape = list(data = c(224, 224, 3, 32)),
                                        output.shape = NULL,
                                        initializer = mxnet:::mx.init.uniform(0.01),
                                        ctx = mx.cpu())

# Copy every pretrained parameter whose name and shape both match into the new network
for (i in 1:length(new_arg$arg.params)) {
  pos <- which(names(res_model$arg.params) == names(new_arg$arg.params)[i])
  if (length(pos) == 1) {
    if (all.equal(dim(res_model$arg.params[[pos]]), dim(new_arg$arg.params[[i]])) == TRUE) {
      new_arg$arg.params[[i]] <- res_model$arg.params[[pos]]
    }
  }
}

# Do the same for auxiliary states (e.g. batch-norm moving means and variances)
for (i in 1:length(new_arg$aux.params)) {
  pos <- which(names(res_model$aux.params) == names(new_arg$aux.params)[i])
  if (length(pos) == 1) {
    if (all.equal(dim(res_model$aux.params[[pos]]), dim(new_arg$aux.params[[i]])) == TRUE) {
      new_arg$aux.params[[i]] <- res_model$aux.params[[pos]]
    }
  }
}

Section 5: Transfer Learning (7)

#1. Build an executor to train model

my_executor = mx.simple.bind(symbol = m_logloss,
                             data = c(224, 224, 3, 20), label = c(2, 20),
                             ctx = mx.gpu(), grad.req = "write")

#2. Set the initial parameters

mx.exec.update.arg.arrays(my_executor, new_arg$arg.params, match.name = TRUE)
mx.exec.update.aux.arrays(my_executor, new_arg$aux.params, match.name = TRUE)

#3. Define the updater

my_updater = mx.opt.get.updater(optimizer = my_optimizer, weights = my_executor$ref.arg.arrays)
for (i in 1:20) {
  
  my_iter$reset()
  batch_loss = NULL
  
  while (my_iter$iter.next()) {
    
    my_values <- my_iter$value()
    mx.exec.update.arg.arrays(my_executor, arg.arrays = my_values, match.name = TRUE)
    mx.exec.forward(my_executor, is.train = TRUE)
    mx.exec.backward(my_executor)
    update_args = my_updater(weight = my_executor$ref.arg.arrays, grad = my_executor$ref.grad.arrays)
    mx.exec.update.arg.arrays(my_executor, update_args, skip.null = TRUE)
    batch_loss = c(batch_loss, as.array(my_executor$ref.outputs$m_logloss_output))
    
  }
  
  message(paste0("epoch = ", i, ": m-logloss = ", formatC(mean(batch_loss), format = "f", digits = 4)))
  
}

Section 5: Transfer Learning (8)

# Get model

dog_cat_model <- mxnet:::mx.model.extract.model(symbol = softmax,
                                                train.execs = list(my_executor))

# Predict & Display

par(mar=rep(0,4), mfcol = c(2, 5))

for (i in 1:5) {
  
  plot(NA, xlim = c(0.04, 0.96), ylim = c(0.04, 0.96), xaxt = "n", yaxt = "n", bty = "n")
  cat_img <- readImage(paste0('test_cat.', i, '.jpg'))
  norm_cat_img <- resizeImage(cat_img, 224, 224, method = 'bilinear')
  dim(norm_cat_img) <- c(224, 224, 3, 1)
  rasterImage(cat_img, 0, 0, 1, 1, interpolate=FALSE)
  prob <- predict(dog_cat_model, X = norm_cat_img, ctx = mx.gpu())
  text(0.5, 0.95, formatC(prob[1,1], 3, format = 'f'), col = "green", cex = 2)
  
  plot(NA, xlim = c(0.04, 0.96), ylim = c(0.04, 0.96), xaxt = "n", yaxt = "n", bty = "n")
  dog_img <- readImage(paste0('test_dog.', i, '.jpg'))
  norm_dog_img <- resizeImage(dog_img, 224, 224, method = 'bilinear')
  dim(norm_dog_img) <- c(224, 224, 3, 1)
  rasterImage(dog_img, 0, 0, 1, 1, interpolate=FALSE)
  prob <- predict(dog_cat_model, X = norm_dog_img, ctx = mx.gpu())
  text(0.5, 0.95, formatC(prob[1,1], 3, format = 'f'), col = "green", cex = 2)
  
}

Exercise 2: The Effect of Transfer Learning

– You can also swap in a few other models; you do not have to use resnet-50.

– Try a few more approaches, and experiment with modifying the Iterator or the Optimizer; see the sketch below for one example.
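– For instance, a hedged sketch of swapping in plain SGD with momentum and a smaller batch (the hyper-parameter values are illustrative, not tuned):

# An alternative optimizer for the same training loop (illustrative values)
my_optimizer <- mx.opt.create(name = "sgd", learning.rate = 0.05,
                              momentum = 0.9, wd = 1e-4)

# A smaller batch also works, but remember to re-bind the executor with
# matching shapes (e.g. data = c(224, 224, 3, 10), label = c(2, 10))
my_iter <- my_iterator_func(iter = NULL, batch_size = 10)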

Exercise 2: Solution (1)

– Here all parameters are re-initialized randomly (the pretrained-weight copy is skipped), so the resulting training curve can be compared against the transfer-learning run above.

new_arg <- mxnet:::mx.model.init.params(symbol = softmax,
                                        input.shape = list(data = c(224, 224, 3, 32)),
                                        output.shape = NULL,
                                        initializer = mxnet:::mx.init.uniform(0.01),
                                        ctx = mx.cpu())
#1. Build an executor to train model

my_executor = mx.simple.bind(symbol = m_logloss,
                             data = c(224, 224, 3, 20), label = c(2, 20),
                             ctx = mx.gpu(), grad.req = "write")

#2. Set the initial parameters

mx.exec.update.arg.arrays(my_executor, new_arg$arg.params, match.name = TRUE)
mx.exec.update.aux.arrays(my_executor, new_arg$aux.params, match.name = TRUE)

#3. Define the updater

my_updater = mx.opt.get.updater(optimizer = my_optimizer, weights = my_executor$ref.arg.arrays)

Exercise 2: Solution (2)

for (i in 1:20) {
  
  my_iter$reset()
  batch_loss = NULL
  
  while (my_iter$iter.next()) {
    
    my_values <- my_iter$value()
    mx.exec.update.arg.arrays(my_executor, arg.arrays = my_values, match.name = TRUE)
    mx.exec.forward(my_executor, is.train = TRUE)
    mx.exec.backward(my_executor)
    update_args = my_updater(weight = my_executor$ref.arg.arrays, grad = my_executor$ref.grad.arrays)
    mx.exec.update.arg.arrays(my_executor, update_args, skip.null = TRUE)
    batch_loss = c(batch_loss, as.array(my_executor$ref.outputs$m_logloss_output))
    
  }
  
  message(paste0("epoch = ", i, ": m-logloss = ", formatC(mean(batch_loss), format = "f", digits = 4)))
  
}

Closing Remarks

– Why does a convolutional neural network outperform a multilayer perceptron on images? Two properties matter most:

  1. It takes into account the true correlations between pixels, instead of treating each pixel as a completely independent feature the way a multilayer perceptron does. Think of the human visual system: you will probably agree that your eyes exhibit shift invariance, and because the CNN mimics the visual system so faithfully, it achieves high accuracy on the test set.

  2. Because of weight sharing, a CNN is extremely economical with parameters. Viewed from the angle of overfitting, this means a complex network can be built from comparatively few parameters, so it is less prone to overfit; for a fixed parameter budget, a CNN can piece together a more elaborate structure. This is why the CNN architecture enjoys such a strong advantage in image recognition!

– Are you starting to feel that training a neural network is actually easy? In one sense it is, yet in another it is hard: you most likely still have no idea what the model you just loaded looks like inside, and there are many training tricks left to learn.

– The most effective way to learn AI is to take part in data science competitions. Everyone is welcome to use our data science competition website!