Image Recognition and Convolutional Neural Networks

林嶔 (Lin, Chin)

Lesson 6

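Image Recognition Basics (1)

F6_1

– We have already used multilayer perceptrons to solve many problems. Can we also use a multilayer perceptron on images to find a prediction function that recognizes handwritten digits accurately?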

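Image Recognition Basics (2)

– Please download the MNIST handwritten digit data here, and let's get to know the structure of this dataset.

– A 28×28 grayscale image can be represented as 784 numbers between 0 and 255, so the task once again becomes an ordinary prediction problem.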

library(data.table)

DAT = fread("data/MNIST.csv", data.table = FALSE)
DAT = data.matrix(DAT)

#Split data

set.seed(0)
Train.sample = sample(1:nrow(DAT), nrow(DAT)*0.6, replace = FALSE)

Train.X = DAT[Train.sample,-1]
Train.Y = DAT[Train.sample,1]
Test.X = DAT[-Train.sample,-1]
Test.Y = DAT[-Train.sample,1]

#Display

library(OpenImageR)

imageShow(matrix(as.numeric(Train.X[1,]), nrow = 28, byrow = TRUE))
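– As a minimal sketch of the "image as a row of numbers" idea, the toy example below flattens a tiny 2×3 "image" row by row and restores it with `byrow = TRUE`, the same option used in the `imageShow()` call above. The 2×3 matrix is a made-up stand-in for a 28×28 digit, and it assumes (as the code above does) that each CSV row stores the pixels in row-major order.

```r
# Toy "image": 2 rows x 3 columns (a stand-in for a 28 x 28 digit)
img <- matrix(1:6, nrow = 2, byrow = TRUE)

# Flatten row by row, which is how one image becomes one row of the CSV
flat <- as.vector(t(img))

# Restore the image; byrow = TRUE undoes the row-major flattening
restored <- matrix(flat, nrow = 2, byrow = TRUE)

print(flat)
print(identical(restored, img))
```

– Without `byrow = TRUE`, R would fill the matrix column by column and the displayed digit would come out transposed.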

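Image Recognition Basics (3)

– At this point we run into a hardware problem: for larger image datasets, we cannot load all the files into RAM in advance. A better solution is to read only a small batch of training samples at each training step, which effectively reduces memory usage.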

fwrite(x = data.table(cbind(Train.Y, Train.X)),
       file = 'data/train_data.csv',
       col.names = FALSE, row.names = FALSE)

fwrite(x = data.table(cbind(Test.Y, Test.X)),
       file = 'data/test_data.csv',
       col.names = FALSE, row.names = FALSE)

library(mxnet)

my_iterator_func <- setRefClass("Custom_Iter",
                                fields = c("iter", "data.csv", "data.shape", "batch.size"),
                                contains = "Rcpp_MXArrayDataIter",
                                methods = list(
                                  initialize = function(iter, data.csv, data.shape, batch.size){
                                    csv_iter <- mx.io.CSVIter(data.csv = data.csv, data.shape = data.shape, batch.size = batch.size)
                                    .self$iter <- csv_iter
                                    .self
                                  },
                                  value = function(){
                                    val <- as.array(.self$iter$value()$data)
                                    val.x <- val[-1,]
                                    val.y <- t(model.matrix(~ -1 + factor(val[1,], levels = 0:9)))
                                    dim(val.x) <- c(nrow(val.x), ncol(val.x))
                                    val.y <- array(val.y, dim = c(10, ncol(val.x)))
                                    val.x <- mx.nd.array(val.x)
                                    val.y <- mx.nd.array(val.y)
                                    list(data=val.x, label=val.y)
                                  },
                                  iter.next = function(){
                                    .self$iter$iter.next()
                                  },
                                  reset = function(){
                                    .self$iter$reset()
                                  },
                                  finalize=function(){
                                  }
                                )
)

my_iter = my_iterator_func(iter = NULL,  data.csv = 'data/train_data.csv', data.shape = 785, batch.size = 20)
my_iter$reset()
my_iter$iter.next()
## [1] TRUE
my_value = my_iter$value()

imageShow(matrix(as.numeric(as.array(my_value$data)[,1]), nrow = 28, byrow = TRUE))

print(as.array(my_value$label)[,1])
##  [1] 0 0 0 0 0 0 0 0 1 0
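– The batch-reading idea can also be sketched without MXNet. The example below is only an illustration in base R, not how `mx.io.CSVIter` works internally: the helper `read_in_batches()` and its `handle_batch` callback are hypothetical names, and the toy file has 5 rows with a label followed by 3 "pixels". Only `batch_size` rows are held in memory at any time.

```r
# Hypothetical helper: stream a label-first CSV in fixed-size batches (base R only)
read_in_batches <- function(path, batch_size, handle_batch) {
  con <- file(path, open = "r")
  on.exit(close(con))
  n_batches <- 0
  repeat {
    lines <- readLines(con, n = batch_size)   # reads at most batch_size rows
    if (length(lines) == 0) break             # end of file
    batch <- do.call(rbind, lapply(strsplit(lines, ",", fixed = TRUE), as.numeric))
    handle_batch(batch[, 1], batch[, -1, drop = FALSE])   # labels, pixels
    n_batches <- n_batches + 1
  }
  n_batches
}

# Toy file: 5 samples, first column is the label, then 3 pixel values
tmp <- tempfile(fileext = ".csv")
toy <- cbind(c(3, 1, 4, 1, 5), matrix(1:15, nrow = 5))
write.table(toy, tmp, sep = ",", row.names = FALSE, col.names = FALSE)

# Record how many samples each batch contained
batch_sizes <- c()
n_batches <- read_in_batches(tmp, batch_size = 2,
                             handle_batch = function(y, x) batch_sizes <<- c(batch_sizes, length(y)))
print(n_batches)
print(batch_sizes)
```

– The last batch is smaller than the others here; how a real iterator treats the final incomplete batch depends on the library.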

Image Recognition Basics (4)

my.model.FeedForward.create = function (Iterator, ctx = mx.cpu(), save.grad = FALSE,
                                        loss_symbol, pred_symbol,
                                        Optimizer, num_round = 30) {
  
  require(abind)
  
  #0. Check data shape
  Iterator$reset()
  Iterator$iter.next()
  my_values <- Iterator$value()
  input_shape <- lapply(my_values, dim)
  batch_size <- tail(input_shape[[1]], 1)
  
  #1. Build an executor to train model
  exec_list = list(symbol = loss_symbol, ctx = ctx, grad.req = "write")
  exec_list = append(exec_list, input_shape)
  my_executor = do.call(mx.simple.bind, exec_list)
  
  #2. Set the initial parameters
  mx.set.seed(0)
  new_arg = mxnet:::mx.model.init.params(symbol = loss_symbol,
                                         input.shape = input_shape,
                                         output.shape = NULL,
                                         initializer = mxnet:::mx.init.uniform(0.01),
                                         ctx = ctx)
  mx.exec.update.arg.arrays(my_executor, new_arg$arg.params, match.name = TRUE)
  mx.exec.update.aux.arrays(my_executor, new_arg$aux.params, match.name = TRUE)
  
  #3. Define the updater
  my_updater = mx.opt.get.updater(optimizer = Optimizer, weights = my_executor$ref.arg.arrays)
  
  #4. Forward/Backward
  message('Start training:')
  
  set.seed(0)
  if (save.grad) {epoch_grad = NULL}
  
  for (i in 1:num_round) {
    
    Iterator$reset()
    batch_loss = list()
    if (save.grad) {batch_grad = list()}
    batch_seq = 0
    t0 = Sys.time()
    
    while (Iterator$iter.next()) {
      
      my_values <- Iterator$value()
      mx.exec.update.arg.arrays(my_executor, arg.arrays = my_values, match.name = TRUE)
      mx.exec.forward(my_executor, is.train = TRUE)
      mx.exec.backward(my_executor)
      update_args = my_updater(weight = my_executor$ref.arg.arrays, grad = my_executor$ref.grad.arrays)
      mx.exec.update.arg.arrays(my_executor, update_args, skip.null = TRUE)
      batch_loss[[length(batch_loss) + 1]] = as.array(my_executor$ref.outputs[[1]])
      if (save.grad) {
        grad_list = sapply(my_executor$ref.grad.arrays, function (x) {if (!is.null(x)) {mean(abs(as.array(x)))}})
        grad_list = unlist(grad_list[grepl('weight', names(grad_list), fixed = TRUE)])
        batch_grad[[length(batch_grad) + 1]] = grad_list
      }
      batch_seq = batch_seq + 1
      
    }
    
    message(paste0("epoch = ", i,
                   ": loss = ", formatC(mean(unlist(batch_loss)), format = "f", 4),
                   " (Speed: ", formatC(batch_seq * batch_size/as.numeric(Sys.time() - t0, units = 'secs'), format = "f", 2), " sample/secs)"))
    
    if (save.grad) {epoch_grad = rbind(epoch_grad, apply(abind(batch_grad, along = 2), 1, mean))}
    
  }
  
  if (save.grad) {
    
    epoch_grad[epoch_grad < 1e-8] = 1e-8
    
    COL = rainbow(ncol(epoch_grad))
    random_pos = 2^runif(ncol(epoch_grad), -0.5, 0.5)
    
    plot(epoch_grad[,1] * random_pos[1], type = 'l', col = COL[1],
         xlab = 'epoch', ylab = 'mean of abs(grad)', log = 'y',
         ylim = range(epoch_grad))
    
    for (i in seq_len(ncol(epoch_grad))[-1]) {lines(1:nrow(epoch_grad), epoch_grad[,i] * random_pos[i], col = COL[i])}
    
    legend('topright', paste0('layer', 1:ncol(epoch_grad), '_weight'), col = COL, lwd = 1)
    
  }
  
  #5. Get model
  my_model <- mxnet:::mx.model.extract.model(symbol = pred_symbol,
                                             train.execs = list(my_executor))
  
  return(my_model)
  
}

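Image Recognition Basics (5)

– Let's first try a network with 5 hidden layers. This network did not work on the IRIS dataset yesterday; what about now?

– This is the model architecture: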

data = mx.symbol.Variable(name = 'data')
label = mx.symbol.Variable(name = 'label')
fc1 = mx.symbol.FullyConnected(data = data, num.hidden = 50, name = 'fc1')
relu1 = mx.symbol.Activation(data = fc1, act.type = 'relu', name = 'relu1')
fc2 = mx.symbol.FullyConnected(data = relu1, num.hidden = 50, name = 'fc2')
relu2 = mx.symbol.Activation(data = fc2, act.type = 'relu', name = 'relu2')
fc3 = mx.symbol.FullyConnected(data = relu2, num.hidden = 50, name = 'fc3')
relu3 = mx.symbol.Activation(data = fc3, act.type = 'relu', name = 'relu3')
fc4 = mx.symbol.FullyConnected(data = relu3, num.hidden = 50, name = 'fc4')
relu4 = mx.symbol.Activation(data = fc4, act.type = 'relu', name = 'relu4')
fc5 = mx.symbol.FullyConnected(data = relu4, num.hidden = 50, name = 'fc5')
relu5 = mx.symbol.Activation(data = fc5, act.type = 'relu', name = 'relu5')
fc6 = mx.symbol.FullyConnected(data = relu5, num.hidden = 10, name = 'fc6')
softmax_layer = mx.symbol.softmax(data = fc6, axis = 1, name = 'softmax_layer')

eps = 1e-8
m_log = 0 - mx.symbol.mean(mx.symbol.broadcast_mul(mx.symbol.log(softmax_layer + eps), label))
m_logloss = mx.symbol.MakeLoss(m_log, name = 'm_logloss')
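– To make the loss concrete, here is a minimal sketch in plain R (no MXNet) of what `softmax_layer` and `m_log` compute, assuming the layout used in this lesson: 10 rows (classes) and one column per sample. The helper names `softmax_cols` and `m_logloss_r` are hypothetical and exist only for illustration. Because the mean is taken over all 10 × batch entries, the value equals the usual per-sample cross-entropy divided by 10.

```r
# Column-wise softmax: each column holds the 10 class scores of one sample
softmax_cols <- function(z) {
  z <- sweep(z, 2, apply(z, 2, max))   # subtract the column max for numerical stability
  e <- exp(z)
  sweep(e, 2, colSums(e), "/")
}

# Same formula as m_log: minus the mean of log(p + eps) * label over all entries
m_logloss_r <- function(prob, onehot, eps = 1e-8) {
  -mean(log(prob + eps) * onehot)
}

scores <- matrix(0, nrow = 10, ncol = 2)   # an untrained network: all scores equal
onehot <- matrix(0, nrow = 10, ncol = 2)
onehot[9, 1] <- 1                          # sample 1 is the digit 8
onehot[1, 2] <- 1                          # sample 2 is the digit 0

prob <- softmax_cols(scores)
loss <- m_logloss_r(prob, onehot)
print(loss)                                # about log(10) / 10
```

– A loss close to log(10) / 10 ≈ 0.23 in the first epochs therefore means the network is still guessing uniformly among the 10 digits.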

– This is the optimizer:

my_optimizer = mx.opt.create(name = "adam", learning.rate = 0.001, beta1 = 0.9, beta2 = 0.999,
                             epsilon = 1e-08, wd = 1e-4)

Image Recognition Basics (6)

model = my.model.FeedForward.create(Iterator = my_iter, ctx = mx.cpu(), save.grad = TRUE,
                                    loss_symbol = m_logloss, pred_symbol = softmax_layer,
                                    Optimizer = my_optimizer, num_round = 30)

predict_Y = predict(model, t(Test.X), array.layout = "colmajor")
confusion_table = table(max.col(t(predict_Y)), Test.Y)
cat("Testing accuracy rate =", sum(diag(confusion_table))/sum(confusion_table))
## Testing accuracy rate = 0.9591071
print(confusion_table)
##     Test.Y
##         0    1    2    3    4    5    6    7    8    9
##   1  1645    0   12    2    6   13   11    5   11    9
##   2     0 1834    7    4    2    4   12    5   12    3
##   3     1    3 1559   12    1    0    3   12   12    1
##   4     2    2   30 1652    2   19    1    9   38   15
##   5     0    1    6    0 1548    2    9    5    1   28
##   6     8    1    8   33    1 1488   17    3   29    9
##   7     2    0    2    0    7    5 1603    0    4    0
##   8     0    0   11    8    5    2    0 1682    4   18
##   9     4    8   18   18    3   11    5    3 1550    7
##   10    1    2    3   13   31    7    0   29   14 1552

– We find that the network with 5 hidden layers not only works on the MNIST dataset, but is also fairly accurate (about 95.9%).
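– In the printed table, the rows are labelled 1 to 10 because `max.col()` returns row indices, while the columns are the true digits 0 to 9. The sketch below, using made-up probabilities and made-up true labels, shows one way to map the indices back to digits and compute the accuracy. Fixing the factor levels is a precaution that is not in the original code: it keeps the table 10 × 10 even if some digit is never predicted.

```r
# Toy class probabilities: 10 classes (rows) x 4 samples (columns)
toy_prob <- matrix(0.01, nrow = 10, ncol = 4)
toy_prob[cbind(c(8, 1, 4, 10), 1:4)] <- 0.91   # most likely rows: 8, 1, 4, 10

toy_truth <- c(7, 0, 5, 9)                      # hypothetical true digits

# max.col() gives the row index (1..10) of the largest probability; subtract 1 to get the digit
toy_pred <- max.col(t(toy_prob), ties.method = "first") - 1

# Fixed levels keep the table square, so diag() lines up predicted and true digits
toy_table <- table(factor(toy_pred, levels = 0:9), factor(toy_truth, levels = 0:9))
toy_accuracy <- sum(diag(toy_table)) / sum(toy_table)
print(toy_pred)
print(toy_accuracy)   # 3 of 4 correct: 0.75
```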

Exercise 1: Comparing Deep and Shallow Networks

sum(sapply(lapply(model$arg.params, dim), prod))
## [1] 49960

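– The 5-hidden-layer network above has 49,960 parameters. Design a network with only 1 hidden layer of 62 neurons (49,300 parameters, a comparable number) and test its accuracy!

– You can also try training an even deeper network!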
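– The choice of 62 neurons can be checked by hand: a fully connected layer with n_in inputs and n_out outputs has n_in × n_out weights plus n_out biases. The small helper below (`count_fc_params` is a hypothetical name used only for this sketch) applies that formula to a vector of layer sizes.

```r
# Number of weights and biases in a stack of fully connected layers
count_fc_params <- function(layer_sizes) {
  n_in <- head(layer_sizes, -1)    # inputs of each layer
  n_out <- tail(layer_sizes, -1)   # outputs of each layer
  sum(n_in * n_out + n_out)        # weights + biases
}

deep_params <- count_fc_params(c(784, 50, 50, 50, 50, 50, 10))
shallow_params <- count_fc_params(c(784, 62, 10))
print(deep_params)      # 49960, matching the count printed above
print(shallow_params)   # 49300
```

– Both networks therefore have roughly 50,000 parameters, so the comparison isolates the effect of depth rather than model size.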

Exercise 1 Answer

data = mx.symbol.Variable(name = 'data')
label = mx.symbol.Variable(name = 'label')
fc1 = mx.symbol.FullyConnected(data = data, num.hidden = 62, name = 'fc1')
relu1 = mx.symbol.Activation(data = fc1, act.type = 'relu', name = 'relu1')
fc2 = mx.symbol.FullyConnected(data = relu1, num.hidden = 10, name = 'fc2')
softmax_layer = mx.symbol.softmax(data = fc2, axis = 1, name = 'softmax_layer')

eps = 1e-8
m_log = 0 - mx.symbol.mean(mx.symbol.broadcast_mul(mx.symbol.log(softmax_layer + eps), label))
m_logloss = mx.symbol.MakeLoss(m_log, name = 'm_logloss')

model = my.model.FeedForward.create(Iterator = my_iter, ctx = mx.cpu(), save.grad = TRUE,
                                    loss_symbol = m_logloss, pred_symbol = softmax_layer,
                                    Optimizer = my_optimizer, num_round = 30)

predict_Y = predict(model, t(Test.X), array.layout = "colmajor")
confusion_table = table(max.col(t(predict_Y)), Test.Y)
cat("Testing accuracy rate =", sum(diag(confusion_table))/sum(confusion_table))
## Testing accuracy rate = 0.9095833

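Introduction to Convolutional Neural Networks (1)

– But back to our handwritten digit classification problem: when we see these handwritten digits, we recognize them at a glance. Is the path from "image" to "concept" really that simple?

– We are now facing a vision problem. It seems that, besides simulating how the brain thinks, we also need to simulate what the eye does!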

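Introduction to Convolutional Neural Networks (2)

F6_2

– This research found that when a cat is stimulated with images of different shapes, the brain cells of the receptive field respond differently.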

F6_3

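Introduction to Convolutional Neural Networks (3)

– A convolution filter mimics the earliest cells of the receptive field; each filter is responsible for recognizing one specific feature. Its mathematical form is as follows: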

F6_4

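– What does a "feature map" mean? A convolution filter is like the most primitive visual cell, specialized in recognizing one simple feature. The larger a value on the feature map, the better that location matches the feature the cell is responsible for.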
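– A minimal sketch of this idea in base R: slide a 3×3 filter over a small image and, at each position, sum the element-wise products. The function `conv2d_valid`, the 5×5 image, and the vertical-line filter are all made up for illustration; as in most deep learning libraries, the filter is applied without flipping (strictly speaking a cross-correlation), with no padding and a stride of 1.

```r
# Slide the kernel over the image; each output value is sum(patch * kernel)
conv2d_valid <- function(img, kernel) {
  kh <- nrow(kernel)
  kw <- ncol(kernel)
  out <- matrix(0, nrow(img) - kh + 1, ncol(img) - kw + 1)
  for (i in seq_len(nrow(out))) {
    for (j in seq_len(ncol(out))) {
      patch <- img[i:(i + kh - 1), j:(j + kw - 1)]
      out[i, j] <- sum(patch * kernel)
    }
  }
  out
}

# 5 x 5 image containing a vertical line in column 3
img <- matrix(0, nrow = 5, ncol = 5)
img[, 3] <- 1

# Filter that responds to vertical lines: every row is (-1, 2, -1)
vertical_filter <- matrix(c(-1, 2, -1), nrow = 3, ncol = 3, byrow = TRUE)

feature_map <- conv2d_valid(img, vertical_filter)
print(feature_map)
```

– The middle column of the feature map is 6 and the outer columns are -3: the response is largest exactly where the filter is centered on the line, which is the sense in which a large value means "this location matches the feature".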