Deep Learning: Theory and Practice

林嶔 (Lin, Chin)

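Lesson 12: Overview of Object Segmentation and Object Detection Models

Preface

– In addition, we learned about a new network component, the deconvolution layer. By combining convolution layers with deconvolution layers, we can freely change the spatial size of an image!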
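The resizing described above follows simple size arithmetic. The sketch below is a minimal base R illustration; the helper names `conv_out` and `deconv_out` are hypothetical, and the formulas are the standard ones for a single spatial dimension, assuming no dilation and no output padding. The kernel, stride, and padding values mirror the 3x3 convolution, 2x2 pooling, and 2x2 deconvolution layers used later in this lesson.

```r
# Output size of a convolution or pooling layer along one spatial dimension.
conv_out <- function(size, kernel, stride = 1, pad = 0) {
  floor((size + 2 * pad - kernel) / stride) + 1
}

# Output size of a deconvolution (transposed convolution) layer.
deconv_out <- function(size, kernel, stride = 1, pad = 0) {
  (size - 1) * stride - 2 * pad + kernel
}

size <- 512
size <- conv_out(size, kernel = 3, pad = 1)       # 3x3 conv with padding 1 keeps 512
size <- conv_out(size, kernel = 2, stride = 2)    # 2x2 pooling with stride 2 gives 256
size <- deconv_out(size, kernel = 2, stride = 2)  # 2x2 deconv with stride 2 gives 512 again
print(size)
```

Each pooling layer halves the size and each stride-2 deconvolution doubles it, which is why a network with four pooling layers needs four such deconvolution layers to return to the input resolution.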

F01

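Section 1: Object segmentation task (1)

– The input is an image, which is no problem for us, but what is the prediction target? Let's first look at the finished result: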

F02

F03

Section 1: Object segmentation task (2)

The ISBI challenge provides a simple dataset that lets us try to segment the cells:

F04

– You can download the archive here; unzip it before using it!

Section 1: Object segmentation task (3)

library(imager)
library(abind)
library(jpeg)

# Read every training image (grayscale JPEG) into a list
train_img_list <- list()
train_files <- list.files('ISBI/train-volume', pattern = '\\.jpg$', full.names = TRUE)

for (i in seq_along(train_files)) {
  train_img_list[[i]] <- readJPEG(train_files[i])
}

# Stack into a width x height x channel x sample array
train.x <- abind(train_img_list, along = 3)
dim(train.x) <- c(512, 512, 1, length(train_files))

par(mar=rep(0,4), mfcol = c(2, 5))
for (i in 1:10) {
  plot(NA, xlim = 0:1, ylim = 0:1, xaxt = "n", yaxt = "n", bty = "n")
  img = as.raster(train.x[,,,i])
  rasterImage(img, 0, 0, 1, 1, interpolate=FALSE)
}

Section 1: Object segmentation task (4)

# Read the label masks the same way as the images
train_label_list <- list()
train_files <- list.files('ISBI/train-labels', pattern = '\\.jpg$', full.names = TRUE)

for (i in seq_along(train_files)) {
  train_label_list[[i]] <- readJPEG(train_files[i])
}

train.y <- abind(train_label_list, along = 3)
dim(train.y) <- c(512, 512, 1, length(train_files))

# JPEG compression leaves intermediate gray values, so binarize the masks
train.y[train.y > 0.5] <- 1
train.y[train.y <= 0.5] <- 0

par(mar=rep(0,4), mfcol = c(2, 5))
for (i in 1:10) {
  plot(NA, xlim = 0:1, ylim = 0:1, xaxt = "n", yaxt = "n", bty = "n")
  img = as.raster(train.y[,,,i])
  rasterImage(img, 0, 0, 1, 1, interpolate=FALSE)
}

Section 1: Object segmentation task (5)

library(mxnet)

my_iterator_core = function(batch_size) {
  
  batch = 0
  # Round up so a final partial batch is still served (value() pads it with random samples)
  batch_per_epoch = ceiling(dim(train.y)[4]/batch_size)
  
  reset = function() {batch <<- 0}
  
  iter.next = function() {
    batch <<- batch+1
    if (batch > batch_per_epoch) {return(FALSE)} else {return(TRUE)}
  }
  
  value = function() {
    idx <- 1:batch_size + (batch - 1) * batch_size
    idx[idx > dim(train.y)[4]] <- sample(1:dim(train.y)[4], sum(idx > dim(train.y)[4]))
    data <- mx.nd.array(array(train.x[,,,idx], dim = c(dim(train.x)[1:3], batch_size)))
    label <- mx.nd.array(array(train.y[,,,idx], dim = c(dim(train.y)[1:3], batch_size)))
    return(list(data = data, label = label))
  }
  
  return(list(reset = reset, iter.next = iter.next, value = value, batch_size = batch_size, batch = batch))
}

my_iterator_func <- setRefClass("Custom_Iter",
                                fields = c("iter", "batch_size"),
                                contains = "Rcpp_MXArrayDataIter",
                                methods = list(
                                  initialize = function(iter, batch_size = 100){
                                    .self$iter <- my_iterator_core(batch_size = batch_size)
                                    .self
                                  },
                                  value = function(){
                                    .self$iter$value()
                                  },
                                  iter.next = function(){
                                    .self$iter$iter.next()
                                  },
                                  reset = function(){
                                    .self$iter$reset()
                                  },
                                  finalize=function(){
                                  }
                                )
)
my_iter <- my_iterator_func(iter = NULL, batch_size = 2)
my_iter$reset()
my_iter$iter.next()
## [1] TRUE
batch_data <- my_iter$value()

par(mar=rep(0,4), mfcol = c(2, 2))
for (i in 1:2) {
  plot(NA, xlim = 0:1, ylim = 0:1, xaxt = "n", yaxt = "n", bty = "n")
  rasterImage(as.raster(as.array(batch_data$data)[,,,i]), 0, 0, 1, 1, interpolate=FALSE)
  plot(NA, xlim = 0:1, ylim = 0:1, xaxt = "n", yaxt = "n", bty = "n")
  rasterImage(as.raster(as.array(batch_data$label)[,,,i]), 0, 0, 1, 1, interpolate=FALSE)
}

Section 1: Object segmentation task (6)

data <- mx.symbol.Variable('data')
bn_data <- mx.symbol.BatchNorm(data = data, fix.gamma = TRUE, name = 'bn_data')

conv1 <- mx.symbol.Convolution(data = bn_data, kernel = c(3, 3), pad = c(1, 1), num_filter = 8, no.bias = TRUE, name = 'conv1')
bn1 <- mx.symbol.BatchNorm(data = conv1, fix.gamma = FALSE, name = 'bn1')
relu1 <- mx.symbol.Activation(data = bn1, act_type = "relu", name = 'relu1')
pool1 <- mx.symbol.Pooling(data = relu1, pool_type = "max", kernel = c(2, 2), stride = c(2, 2), name = 'pool1')

conv2 <- mx.symbol.Convolution(data = pool1, kernel = c(3, 3), pad = c(1, 1), num_filter = 16, no.bias = TRUE, name = 'conv2')
bn2 <- mx.symbol.BatchNorm(data = conv2, fix.gamma = FALSE, name = 'bn2')
relu2 <- mx.symbol.Activation(data = bn2, act_type = "relu", name = 'relu2')
pool2 <- mx.symbol.Pooling(data = relu2, pool_type = "max", kernel = c(2, 2), stride = c(2, 2), name = 'pool2')

conv3 <- mx.symbol.Convolution(data = pool2, kernel = c(3, 3), pad = c(1, 1), num_filter = 32, no.bias = TRUE, name = 'conv3')
bn3 <- mx.symbol.BatchNorm(data = conv3, fix.gamma = FALSE, name = 'bn3')
relu3 <- mx.symbol.Activation(data = bn3, act_type = "relu", name = 'relu3')
pool3 <- mx.symbol.Pooling(data = relu3, pool_type = "max", kernel = c(2, 2), stride = c(2, 2), name = 'pool3')

conv4 <- mx.symbol.Convolution(data = pool3, kernel = c(3, 3), pad = c(1, 1), num_filter = 64, no.bias = TRUE, name = 'conv4')
bn4 <- mx.symbol.BatchNorm(data = conv4, fix.gamma = FALSE, name = 'bn4')
relu4 <- mx.symbol.Activation(data = bn4, act_type = "relu", name = 'relu4')
pool4 <- mx.symbol.Pooling(data = relu4, pool_type = "max", kernel = c(2, 2), stride = c(2, 2), name = 'pool4')

conv5 <- mx.symbol.Convolution(data = pool4, kernel = c(3, 3), pad = c(1, 1), num_filter = 128, no.bias = TRUE, name = 'conv5')
bn5 <- mx.symbol.BatchNorm(data = conv5, fix.gamma = FALSE, name = 'bn5')
relu5 <- mx.symbol.Activation(data = bn5, act_type = "relu", name = 'relu5')

deconv6 <- mx.symbol.Deconvolution(data = relu5, kernel = c(2, 2), stride = c(2, 2), num_filter = 64, name = 'deconv6')
bn6 <- mx.symbol.BatchNorm(data = deconv6, fix.gamma = FALSE, name = 'bn6')
relu6 <- mx.symbol.Activation(data = bn6, act_type = "relu", name = 'relu6')

deconv7 <- mx.symbol.Deconvolution(data = relu6, kernel = c(2, 2), stride = c(2, 2), num_filter = 32, name = 'deconv7')
bn7 <- mx.symbol.BatchNorm(data = deconv7, fix.gamma = FALSE, name = 'bn7')
relu7 <- mx.symbol.Activation(data = bn7, act_type = "relu", name = 'relu7')

deconv8 <- mx.symbol.Deconvolution(data = relu7, kernel = c(2, 2), stride = c(2, 2), num_filter = 16, name = 'deconv8')
bn8 <- mx.symbol.BatchNorm(data = deconv8, fix.gamma = FALSE, name = 'bn8')
relu8 <- mx.symbol.Activation(data = bn8, act_type = "relu", name = 'relu8')

deconv9 <- mx.symbol.Deconvolution(data = relu8, kernel = c(2, 2), stride = c(2, 2), num_filter = 8, name = 'deconv9')
bn9 <- mx.symbol.BatchNorm(data = deconv9, fix.gamma = FALSE, name = 'bn9')
relu9 <- mx.symbol.Activation(data = bn9, act_type = "relu", name = 'relu9')

linear_pred <- mx.symbol.Convolution(data = relu9, kernel = c(1, 1), num_filter = 1, name = 'linear_pred')
logistic_pred <- mx.symbol.Activation(data = linear_pred, act.type = 'sigmoid', name = 'logistic_pred')

# CE loss

label <- mx.symbol.Variable(name = 'label')

eps <- 1e-8
ce_loss_pos <-  mx.symbol.broadcast_mul(mx.symbol.log(logistic_pred + eps), label)
ce_loss_neg <-  mx.symbol.broadcast_mul(mx.symbol.log(1 - logistic_pred + eps), 1 - label)
ce_loss_mean <- 0 - mx.symbol.mean(ce_loss_pos + ce_loss_neg, axis = 0:3)
ce_loss <- mx.symbol.MakeLoss(ce_loss_mean, name = 'ce_loss')
my_optimizer <- mx.opt.create(name = "sgd", learning.rate = 0.05, momentum = 0.9, wd = 1e-4)
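
To make the symbolic "CE loss" above easier to check by hand, here is a minimal base R sketch of the same pixel-wise binary cross-entropy. The function name `bce_loss` and the toy arrays are hypothetical; the arithmetic follows `ce_loss_pos`, `ce_loss_neg`, and `ce_loss_mean`, including the same `eps` guard against `log(0)`.

```r
# Pixel-wise binary cross-entropy averaged over all pixels and samples.
bce_loss <- function(pred, label, eps = 1e-8) {
  pos <- label * log(pred + eps)            # contribution of pixels labelled 1
  neg <- (1 - label) * log(1 - pred + eps)  # contribution of pixels labelled 0
  -mean(pos + neg)
}

# Toy 2 x 2 mask in the width x height x channel x sample layout.
label <- array(c(1, 0, 1, 0), dim = c(2, 2, 1, 1))
good  <- array(c(0.9, 0.1, 0.8, 0.2), dim = c(2, 2, 1, 1))
bad   <- array(c(0.1, 0.9, 0.2, 0.8), dim = c(2, 2, 1, 1))

print(bce_loss(good, label))  # small: predictions agree with the mask
print(bce_loss(bad, label))   # large: predictions contradict the mask
```

Because the loss is averaged over every pixel, its value is comparable across batch sizes and image sizes.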

Section 1: Object segmentation task (7)

my.eval.metric.loss <- mx.metric.custom(
  name = "ce-loss", 
  function(real, pred) {
    return(as.array(pred))
  }
)

mx.set.seed(0)

model_1 <- mx.model.FeedForward.create(symbol = ce_loss, X = my_iter, optimizer = my_optimizer,
                                       eval.metric = my.eval.metric.loss,
                                       array.batch.size = 2, ctx = mx.gpu(), num.round = 20)
# Load the test images with the same layout as the training data
test_img_list <- list()
test_files <- list.files('ISBI/test-volume', pattern = '\\.jpg$', full.names = TRUE)

for (i in seq_along(test_files)) {
  test_img_list[[i]] <- readJPEG(test_files[i])
}

test.x <- abind(test_img_list, along = 3)
dim(test.x) <- c(512, 512, 1, length(test_files))
model_1$symbol <- logistic_pred
pred_y.1 <- predict(model_1, test.x)
pred_y.1[pred_y.1 > 0.5] <- 1
pred_y.1[pred_y.1 <= 0.5] <- 0

par(mar = rep(0, 4), mfcol = c(3, 4))
for (i in 1:4) {
  plot(NA, xlim = 0:1, ylim = 0:1, xaxt = "n", yaxt = "n", bty = "n")
  rasterImage(as.raster(test.x[,,,i]), 0, 0, 1, 1, interpolate = FALSE)
  plot(NA, xlim = 0:1, ylim = 0:1, xaxt = "n", yaxt = "n", bty = "n")
  pred_img <- pred_y.1[,,,i]
  pred_img[pred_img == 0] <- '#0000FF80'
  pred_img[pred_img == 1] <- '#FFFFFF00'
  rasterImage(pred_img, 0, 0, 1, 1, interpolate = FALSE)
  plot(NA, xlim = 0:1, ylim = 0:1, xaxt = "n", yaxt = "n", bty = "n")
  rasterImage(as.raster(test.x[,,,i]), 0, 0, 1, 1, interpolate = FALSE)
  rasterImage(pred_img, 0, 0, 1, 1, interpolate = FALSE)
}

Exercise 1: U-Net

F05

Exercise 1 answer (1)

data <- mx.symbol.Variable('data')
bn_data <- mx.symbol.BatchNorm(data = data, fix.gamma = TRUE, name = 'bn_data')

conv1 <- mx.symbol.Convolution(data = bn_data, kernel = c(3, 3), pad = c(1, 1), num_filter = 8, no.bias = TRUE, name = 'conv1')
bn1 <- mx.symbol.BatchNorm(data = conv1, fix.gamma = FALSE, name = 'bn1')
relu1 <- mx.symbol.Activation(data = bn1, act_type = "relu", name = 'relu1')
pool1 <- mx.symbol.Pooling(data = relu1, pool_type = "max", kernel = c(2, 2), stride = c(2, 2), name = 'pool1')

conv2 <- mx.symbol.Convolution(data = pool1, kernel = c(3, 3), pad = c(1, 1), num_filter = 16, no.bias = TRUE, name = 'conv2')
bn2 <- mx.symbol.BatchNorm(data = conv2, fix.gamma = FALSE, name = 'bn2')
relu2 <- mx.symbol.Activation(data = bn2, act_type = "relu", name = 'relu2')
pool2 <- mx.symbol.Pooling(data = relu2, pool_type = "max", kernel = c(2, 2), stride = c(2, 2), name = 'pool2')

conv3 <- mx.symbol.Convolution(data = pool2, kernel = c(3, 3), pad = c(1, 1), num_filter = 32, no.bias = TRUE, name = 'conv3')
bn3 <- mx.symbol.BatchNorm(data = conv3, fix.gamma = FALSE, name = 'bn3')
relu3 <- mx.symbol.Activation(data = bn3, act_type = "relu", name = 'relu3')
pool3 <- mx.symbol.Pooling(data = relu3, pool_type = "max", kernel = c(2, 2), stride = c(2, 2), name = 'pool3')

conv4 <- mx.symbol.Convolution(data = pool3, kernel = c(3, 3), pad = c(1, 1), num_filter = 64, no.bias = TRUE, name = 'conv4')
bn4 <- mx.symbol.BatchNorm(data = conv4, fix.gamma = FALSE, name = 'bn4')
relu4 <- mx.symbol.Activation(data = bn4, act_type = "relu", name = 'relu4')
pool4 <- mx.symbol.Pooling(data = relu4, pool_type = "max", kernel = c(2, 2), stride = c(2, 2), name = 'pool4')

conv5 <- mx.symbol.Convolution(data = pool4, kernel = c(3, 3), pad = c(1, 1), num_filter = 128, no.bias = TRUE, name = 'conv5')
bn5 <- mx.symbol.BatchNorm(data = conv5, fix.gamma = FALSE, name = 'bn5')
relu5 <- mx.symbol.Activation(data = bn5, act_type = "relu", name = 'relu5')

deconv6 <- mx.symbol.Deconvolution(data = relu5, kernel = c(2, 2), stride = c(2, 2), num_filter = 64, name = 'deconv6')
bn6 <- mx.symbol.BatchNorm(data = deconv6, fix.gamma = FALSE, name = 'bn6')
relu6 <- mx.symbol.Activation(data = bn6, act_type = "relu", name = 'relu6')

concat7 <- mx.symbol.concat(data = list(relu6, relu4), num.args = 2, dim = 1, name = 'concat7')
deconv7 <- mx.symbol.Deconvolution(data = concat7, kernel = c(2, 2), stride = c(2, 2), num_filter = 32, name = 'deconv7')
bn7 <- mx.symbol.BatchNorm(data = deconv7, fix.gamma = FALSE, name = 'bn7')
relu7 <- mx.symbol.Activation(data = bn7, act_type = "relu", name = 'relu7')

concat8 <- mx.symbol.concat(data = list(relu7, relu3), num.args = 2, dim = 1, name = 'concat8')
deconv8 <- mx.symbol.Deconvolution(data = concat8, kernel = c(2, 2), stride = c(2, 2), num_filter = 16, name = 'deconv8')
bn8 <- mx.symbol.BatchNorm(data = deconv8, fix.gamma = FALSE, name = 'bn8')
relu8 <- mx.symbol.Activation(data = bn8, act_type = "relu", name = 'relu8')

concat9 <- mx.symbol.concat(data = list(relu8, relu2), num.args = 2, dim = 1, name = 'concat9')
deconv9 <- mx.symbol.Deconvolution(data = concat9, kernel = c(2, 2), stride = c(2, 2), num_filter = 8, name = 'deconv9')
bn9 <- mx.symbol.BatchNorm(data = deconv9, fix.gamma = FALSE, name = 'bn9')
relu9 <- mx.symbol.Activation(data = bn9, act_type = "relu", name = 'relu9')

concat10 <- mx.symbol.concat(data = list(relu9, relu1), num.args = 2, dim = 1, name = 'concat10')
linear_pred <- mx.symbol.Convolution(data = concat10, kernel = c(1, 1), num_filter = 1, name = 'linear_pred')
logistic_pred <- mx.symbol.Activation(data = linear_pred, act.type = 'sigmoid', name = 'logistic_pred')

# CE loss

label <- mx.symbol.Variable(name = 'label')

eps <- 1e-8
ce_loss_pos <-  mx.symbol.broadcast_mul(mx.symbol.log(logistic_pred + eps), label)
ce_loss_neg <-  mx.symbol.broadcast_mul(mx.symbol.log(1 - logistic_pred + eps), 1 - label)
ce_loss_mean <- 0 - mx.symbol.mean(ce_loss_pos + ce_loss_neg, axis = 0:3)
ce_loss <- mx.symbol.MakeLoss(ce_loss_mean, name = 'ce_loss')
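
The skip connections in this answer join an encoder feature map and a decoder feature map along the channel axis, which is why each later deconvolution receives more input channels than in the plain encoder-decoder. The sketch below illustrates the idea with base R arrays in the width x height x channel x sample layout; `concat_channels` is a hypothetical helper written for illustration, not an MXNet function, and the small array sizes are arbitrary.

```r
# Concatenate two 4-D arrays along the channel (third) dimension.
concat_channels <- function(a, b) {
  # Width, height and sample count must match; only channels may differ.
  stopifnot(all(dim(a)[c(1, 2, 4)] == dim(b)[c(1, 2, 4)]))
  out <- array(0, dim = c(dim(a)[1:2], dim(a)[3] + dim(b)[3], dim(a)[4]))
  out[, , seq_len(dim(a)[3]), ] <- a
  out[, , dim(a)[3] + seq_len(dim(b)[3]), ] <- b
  out
}

decoder_map <- array(1, dim = c(4, 4, 3, 2))  # stands in for a decoder output such as relu6
encoder_map <- array(2, dim = c(4, 4, 5, 2))  # stands in for an encoder output such as relu4
merged <- concat_channels(decoder_map, encoder_map)
print(dim(merged))  # spatial size and batch unchanged, channels added together
```

The spatial sizes must already agree, so each decoder output is paired with the encoder output of the same resolution.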

Exercise 1 answer (2)

my.eval.metric.loss <- mx.metric.custom(
  name = "ce-loss", 
  function(real, pred) {
    return(as.array(pred))
  }
)

mx.set.seed(0)

model_2 <- mx.model.FeedForward.create(symbol = ce_loss, X = my_iter, optimizer = my_optimizer,
                                       eval.metric = my.eval.metric.loss,
                                       array.batch.size = 2, ctx = mx.gpu(), num.round = 20)
model_2$symbol <- logistic_pred
pred_y.2 <- predict(model_2, test.x)
pred_y.2[pred_y.2 > 0.5] <- 1
pred_y.2[pred_y.2 <= 0.5] <- 0
par(mar = rep(0, 4), mfcol = c(3, 4))

for (i in 1:2) {
  
  plot(NA, xlim = 0:1, ylim = 0:1, xaxt = "n", yaxt = "n", bty = "n")
  rasterImage(as.raster(test.x[,,,i]), 0, 0, 1, 1, interpolate = FALSE)
  
  plot(NA, xlim = 0:1, ylim = 0:1, xaxt = "n", yaxt = "n", bty = "n")
  pred_img <- pred_y.1[,,,i]
  pred_img[pred_img == 0] <- '#0000FF80'
  pred_img[pred_img == 1] <- '#FFFFFF00'
  rasterImage(pred_img, 0, 0, 1, 1, interpolate = FALSE)
  
  plot(NA, xlim = 0:1, ylim = 0:1, xaxt = "n", yaxt = "n", bty = "n")
  rasterImage(as.raster(test.x[,,,i]), 0, 0, 1, 1, interpolate = FALSE)
  rasterImage(pred_img, 0, 0, 1, 1, interpolate = FALSE)
  
  plot(NA, xlim = 0:1, ylim = 0:1, xaxt = "n", yaxt = "n", bty = "n")
  rasterImage(as.raster(test.x[,,,i]), 0, 0, 1, 1, interpolate = FALSE)
  
  plot(NA, xlim = 0:1, ylim = 0:1, xaxt = "n", yaxt = "n", bty = "n")
  pred_img <- pred_y.2[,,,i]
  pred_img[pred_img == 0] <- '#0000FF80'
  pred_img[pred_img == 1] <- '#FFFFFF00'
  rasterImage(pred_img, 0, 0, 1, 1, interpolate = FALSE)
  
  plot(NA, xlim = 0:1, ylim = 0:1, xaxt = "n", yaxt = "n", bty = "n")
  rasterImage(as.raster(test.x[,,,i]), 0, 0, 1, 1, interpolate = FALSE)
  rasterImage(pred_img, 0, 0, 1, 1, interpolate = FALSE)
  
}

Exercise 1 extension (1)

F06

F07

F08

Exercise 1 extension (2)

data <- mx.symbol.Variable('data')
bn_data <- mx.symbol.BatchNorm(data = data, fix.gamma = TRUE, name = 'bn_data')

conv1 <- mx.symbol.Convolution(data = bn_data, kernel = c(3, 3), pad = c(1, 1), num_filter = 8, no.bias = TRUE, name = 'conv1')
bn1 <- mx.symbol.BatchNorm(data = conv1, fix.gamma = FALSE, name = 'bn1')
relu1 <- mx.symbol.Activation(data = bn1, act_type = "relu", name = 'relu1')
pool1 <- mx.symbol.Pooling(data = relu1, pool_type = "max", kernel = c(2, 2), stride = c(2, 2), name = 'pool1')

conv2 <- mx.symbol.Convolution(data = pool1, kernel = c(3, 3), pad = c(1, 1), num_filter = 16, no.bias = TRUE, name = 'conv2')
bn2 <- mx.symbol.BatchNorm(data = conv2, fix.gamma = FALSE, name = 'bn2')
relu2 <- mx.symbol.Activation(data = bn2, act_type = "relu", name = 'relu2')
pool2 <- mx.symbol.Pooling(data = relu2, pool_type = "max", kernel = c(2, 2), stride = c(2, 2), name = 'pool2')

conv3 <- mx.symbol.Convolution(data = pool2, kernel = c(3, 3), pad = c(1, 1), num_filter = 32, no.bias = TRUE, name = 'conv3')
bn3 <- mx.symbol.BatchNorm(data = conv3, fix.gamma = FALSE, name = 'bn3')
relu3 <- mx.symbol.Activation(data = bn3, act_type = "relu", name = 'relu3')
pool3 <- mx.symbol.Pooling(data = relu3, pool_type = "max", kernel = c(2, 2), stride = c(2, 2), name = 'pool3')

conv4 <- mx.symbol.Convolution(data = pool3, kernel = c(3, 3), pad = c(1, 1), num_filter = 64, no.bias = TRUE, name = 'conv4')
bn4 <- mx.symbol.BatchNorm(data = conv4, fix.gamma = FALSE, name = 'bn4')
relu4 <- mx.symbol.Activation(data = bn4, act_type = "relu", name = 'relu4')
up4 <- mx.symbol.UpSampling(data = relu4, num_args = 1, scale = 2, sample_type = 'bilinear', num_filter = 64, name = 'up4')
pool4 <- mx.symbol.Pooling(data = relu4, pool_type = "max", kernel = c(2, 2), stride = c(2, 2), name = 'pool4')

conv5 <- mx.symbol.Convolution(data = pool4, kernel = c(3, 3), pad = c(1, 1), num_filter = 128, no.bias = TRUE, name = 'conv5')
bn5 <- mx.symbol.BatchNorm(data = conv5, fix.gamma = FALSE, name = 'bn5')
relu5 <- mx.symbol.Activation(data = bn5, act_type = "relu", name = 'relu5')
up5 <- mx.symbol.UpSampling(data = relu5, num_args = 1, scale = 4, sample_type = 'bilinear', num_filter = 128, name = 'up5')

concat6 <- mx.symbol.concat(data = list(relu3, up4, up5), num.args = 3, dim = 1, name = 'concat6')
deconv6 <- mx.symbol.Deconvolution(data = concat6, kernel = c(2, 2), stride = c(2, 2), num_filter = 16, name = 'deconv6')
bn6 <- mx.symbol.BatchNorm(data = deconv6, fix.gamma = FALSE, name = 'bn6')
relu6 <- mx.symbol.Activation(data = bn6, act_type = "relu", name = 'relu6')

deconv7 <- mx.symbol.Deconvolution(data = relu6, kernel = c(2, 2), stride = c(2, 2), num_filter = 8, name = 'deconv7')
bn7 <- mx.symbol.BatchNorm(data = deconv7, fix.gamma = FALSE, name = 'bn7')
relu7 <- mx.symbol.Activation(data = bn7, act_type = "relu", name = 'relu7')

linear_pred <- mx.symbol.Convolution(data = relu7, kernel = c(1, 1), num_filter = 1, name = 'linear_pred')
logistic_pred <- mx.symbol.Activation(data = linear_pred, act.type = 'sigmoid', name = 'logistic_pred')

# CE loss

label <- mx.symbol.Variable(name = 'label')

eps <- 1e-8
ce_loss_pos <-  mx.symbol.broadcast_mul(mx.symbol.log(logistic_pred + eps), label)
ce_loss_neg <-  mx.symbol.broadcast_mul(mx.symbol.log(1 - logistic_pred + eps), 1 - label)
ce_loss_mean <- 0 - mx.symbol.mean(ce_loss_pos + ce_loss_neg, axis = 0:3)
ce_loss <- mx.symbol.MakeLoss(ce_loss_mean, name = 'ce_loss')

Exercise 1 extension (3)

my.eval.metric.loss <- mx.metric.custom(
  name = "ce-loss", 
  function(real, pred) {
    return(as.array(pred))
  }
)

mx.set.seed(0)

model_3 <- mx.model.FeedForward.create(symbol = ce_loss, X = my_iter, optimizer = my_optimizer,
                                       eval.metric = my.eval.metric.loss,
                                       array.batch.size = 2, ctx = mx.gpu(), num.round = 20)
model_3$symbol <- logistic_pred
pred_y.3 <- predict(model_3, test.x)
pred_y.3[pred_y.3 > 0.5] <- 1
pred_y.3[pred_y.3 <= 0.5] <- 0
par(mar = rep(0, 4), mfcol = c(3, 3))

plot(NA, xlim = 0:1, ylim = 0:1, xaxt = "n", yaxt = "n", bty = "n")
rasterImage(as.raster(test.x[,,,1]), 0, 0, 1, 1, interpolate = FALSE)

plot(NA, xlim = 0:1, ylim = 0:1, xaxt = "n", yaxt = "n", bty = "n")
pred_img <- pred_y.1[,,,1]
pred_img[pred_img == 0] <- '#0000FF80'
pred_img[pred_img == 1] <- '#FFFFFF00'
rasterImage(pred_img, 0, 0, 1, 1, interpolate = FALSE)

plot(NA, xlim = 0:1, ylim = 0:1, xaxt = "n", yaxt = "n", bty = "n")
rasterImage(as.raster(test.x[,,,1]), 0, 0, 1, 1, interpolate = FALSE)
rasterImage(pred_img, 0, 0, 1, 1, interpolate = FALSE)

plot(NA, xlim = 0:1, ylim = 0:1, xaxt = "n", yaxt = "n", bty = "n")
rasterImage(as.raster(test.x[,,,1]), 0, 0, 1, 1, interpolate = FALSE)

plot(NA, xlim = 0:1, ylim = 0:1, xaxt = "n", yaxt = "n", bty = "n")
pred_img <- pred_y.2[,,,1]
pred_img[pred_img == 0] <- '#0000FF80'
pred_img[pred_img == 1] <- '#FFFFFF00'
rasterImage(pred_img, 0, 0, 1, 1, interpolate = FALSE)

plot(NA, xlim = 0:1, ylim = 0:1, xaxt = "n", yaxt = "n", bty = "n")
rasterImage(as.raster(test.x[,,,1]), 0, 0, 1, 1, interpolate = FALSE)
rasterImage(pred_img, 0, 0, 1, 1, interpolate = FALSE)

plot(NA, xlim = 0:1, ylim = 0:1, xaxt = "n", yaxt = "n", bty = "n")
rasterImage(as.raster(test.x[,,,1]), 0, 0, 1, 1, interpolate = FALSE)

plot(NA, xlim = 0:1, ylim = 0:1, xaxt = "n", yaxt = "n", bty = "n")
pred_img <- pred_y.3[,,,1]
pred_img[pred_img == 0] <- '#0000FF80'
pred_img[pred_img == 1] <- '#FFFFFF00'
rasterImage(pred_img, 0, 0, 1, 1, interpolate = FALSE)

plot(NA, xlim = 0:1, ylim = 0:1, xaxt = "n", yaxt = "n", bty = "n")
rasterImage(as.raster(test.x[,,,1]), 0, 0, 1, 1, interpolate = FALSE)
rasterImage(pred_img, 0, 0, 1, 1, interpolate = FALSE)

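Section 2: Early evolution of object detection models (1)

F09

– Take a moment to think about how you would approach this. Keep in mind that a prediction function must produce a fixed number of outputs, so predicting a variable number of bounding boxes is a major challenge!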

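Section 2: Early evolution of object detection models (2)

– For example, as in the figure below, we can run a classification task on every candidate box to check whether it contains an object we want to detect: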

F10

F11
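
To get a feel for how many candidate boxes such an approach has to classify, the sketch below enumerates square sliding windows over an image. This is only an illustration: the function name `sliding_windows`, the window size, and the stride are assumptions, and a real detector would also vary the window size and aspect ratio.

```r
# Enumerate square candidate boxes of one size over an image (1-based pixel coordinates).
sliding_windows <- function(img_w, img_h, win = 128, stride = 64) {
  xs <- seq(1, img_w - win + 1, by = stride)
  ys <- seq(1, img_h - win + 1, by = stride)
  grid <- expand.grid(xmin = xs, ymin = ys)
  grid$xmax <- grid$xmin + win - 1
  grid$ymax <- grid$ymin + win - 1
  grid
}

candidates <- sliding_windows(512, 512)
print(nrow(candidates))  # number of boxes for a single window size
print(head(candidates))
```

Even one window size on a 512 x 512 image yields dozens of boxes, and every one of them would need its own classification pass.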

Section 2: Early evolution of object detection models (3)

F12

– The concept of the model is shown in the figure below:

F13

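Section 2: Early evolution of object detection models (4)

– Roughly speaking, its idea is to use simple changes in color gradation to decide how many separable objects an image contains, and R-CNN uses this method to decide which candidate boxes to consider: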

F14

– Worse still, prediction has to rerun the entire pipeline from the start, so there is clearly a great deal of room for improvement!

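Section 2: Early evolution of object detection models (5)

– The biggest problem with R-CNN is that it runs the convolutional neural network so many times. Can't these passes be merged into one?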

F15

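Section 2: Early evolution of object detection models (6)

  1. The convolutional neural network used for feature extraction runs on the whole image rather than on each candidate box, and this network also takes part in training.

  2. Candidate boxes are located on the output feature map of the convolutional neural network rather than being cropped from the original image.

  3. In R-CNN, proposal regions of different shapes are resized to a common shape before feature extraction. Fast R-CNN, in order to fold the whole process into the network's forward pass, introduces the Region of Interest pooling layer (RoI Pooling) to resize each candidate region.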
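
The RoI pooling step in item 3 can be sketched in a few lines of base R. This is a simplified illustration, not Fast R-CNN's actual implementation: `roi_pool` is a hypothetical helper that handles one channel, takes integer region coordinates, and assumes the region has at least as many rows and columns as the output grid.

```r
# Max-pool one region of a 2-D feature map into a fixed out_h x out_w grid.
# roi = c(row_start, col_start, row_end, col_end), 1-based and inclusive.
roi_pool <- function(feature, roi, out_h = 2, out_w = 2) {
  rows <- roi[1]:roi[3]
  cols <- roi[2]:roi[4]
  # Assign every row and column of the region to one output bin.
  row_bin <- ceiling(seq_along(rows) * out_h / length(rows))
  col_bin <- ceiling(seq_along(cols) * out_w / length(cols))
  out <- matrix(NA_real_, out_h, out_w)
  for (i in seq_len(out_h)) {
    for (j in seq_len(out_w)) {
      out[i, j] <- max(feature[rows[row_bin == i], cols[col_bin == j]])
    }
  }
  out
}

feature <- matrix(1:36, nrow = 6)              # a toy 6 x 6 feature map
print(roi_pool(feature, roi = c(1, 1, 4, 6)))  # 4 x 6 region pooled to 2 x 2
print(roi_pool(feature, roi = c(2, 3, 6, 5)))  # 5 x 3 region pooled to 2 x 2
```

Regions of different shapes all come out as 2 x 2 here, which is what lets the layers after RoI pooling use a fixed input size.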

F16

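– A note on how candidate boxes are obtained in this description of Fast R-CNN: because it is much faster than R-CNN, it can afford to score a very large, predefined set of candidate boxes (also called anchor boxes). Strictly speaking, the original Fast R-CNN paper still takes its proposals from selective search; predefined anchor boxes were formalized in Faster R-CNN.

– Under this formulation, the biggest difference from R-CNN is that R-CNN's candidate boxes differ from image to image, whereas a predefined set of candidate boxes is the same for every image.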

Section 2: Early evolution of object detection models (7)

– Not long after Fast R-CNN came out, Ross Girshick, together with his colleagues Kaiming He (何愷明), Shaoqing Ren, and Jian Sun, published Faster R-CNN: Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks

F17

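Section 2: Early evolution of object detection models (8)

F18

– Faster R-CNN splits this process into two steps. First, a Region Proposal Network handles a relatively simple task (deciding whether each anchor box contains an object and predicting its offsets); then the rest of the network performs the final classification and offset prediction.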

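Section 3: Introduction to one-stage object detection models (1)

– YOLO's logic, by contrast, is that it no longer tries to find the approximate location of the boxes first. Instead, it looks directly for the "object center" and uses the information at that center to predict the size and position of the box.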

F19

F20

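Section 3: Introduction to one-stage object detection models (2)

F21

  1. The center position of the box (x and y coordinates)

  2. The height and width of the box

  3. The class of object the box belongs to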
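
The idea of assigning each object to the grid cell that contains its center can be sketched as follows. This is a hedged illustration rather than YOLO's actual target encoding: `encode_box` is a hypothetical helper, the 7 x 7 grid follows the original YOLO paper, and the class label and confidence score are omitted.

```r
# Encode one ground-truth box for an S x S grid.
# box = c(x_center, y_center, width, height), all relative to the image (0 to 1).
encode_box <- function(box, S = 7) {
  col <- min(floor(box[1] * S), S - 1)  # 0-based column of the responsible cell
  row <- min(floor(box[2] * S), S - 1)  # 0-based row of the responsible cell
  list(row = row + 1, col = col + 1,    # 1-based indices for R
       x_offset = box[1] * S - col,     # center position inside the cell (0 to 1)
       y_offset = box[2] * S - row,
       width = box[3], height = box[4])
}

target <- encode_box(c(0.5, 0.25, 0.2, 0.4))
str(target)
```

Only the cell holding the center is responsible for the object, so the network's output grid has a fixed size no matter how many objects an image contains.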

F22

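Section 3: Introduction to one-stage object detection models (3)

F23

– Besides multi-scale prediction, another difference from YOLO is that it does not predict box information from scratch. Instead, it predefines a set of anchor boxes and then predicts the offsets between the target box and those anchor boxes.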
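
Predicting offsets relative to an anchor box can be illustrated with one common parameterization: the center shift is scaled by the anchor size and the width and height are expressed as log ratios. This is a sketch under that assumption; the function names are hypothetical, and specific detectors add their own details (for example extra scaling constants) that are not shown here.

```r
# Anchors and boxes are c(x_center, y_center, width, height).
encode_offsets <- function(box, anchor) {
  c((box[1] - anchor[1]) / anchor[3],  # center shift in units of anchor width
    (box[2] - anchor[2]) / anchor[4],  # center shift in units of anchor height
    log(box[3] / anchor[3]),           # log ratio of widths
    log(box[4] / anchor[4]))           # log ratio of heights
}

decode_offsets <- function(offsets, anchor) {
  c(anchor[1] + offsets[1] * anchor[3],
    anchor[2] + offsets[2] * anchor[4],
    anchor[3] * exp(offsets[3]),
    anchor[4] * exp(offsets[4]))
}

anchor  <- c(100, 100, 64, 128)
truth   <- c(110, 90, 80, 120)
offsets <- encode_offsets(truth, anchor)
print(offsets)
print(decode_offsets(offsets, anchor))  # recovers the ground-truth box
```

The network only has to learn small corrections around each anchor, which is usually an easier regression target than raw box coordinates.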

Section 3: Introduction to one-stage object detection models (4)

– The paper is Feature Pyramid Networks for Object Detection. Look at the author list and you will find Ross Girshick and Kaiming He yet again.

F24

F25

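Homework: Object detection with a pretrained YOLO model

– This model was trained on the dataset from the COCO Detection Challenge and detects 80 object categories. You can download the full code and model here and use that code to run predictions. Try a variety of images to get a feel for its power!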

F28

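– Note that this is not the original YOLO v3 model but a smaller one with slightly lower accuracy. What matters is whether you can work out, from the code, the paper, and online resources, how this state-of-the-art model performs object detection!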

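Homework hints

  1. Because this model's prediction consists of 3 parts (downsampled by 8x, 16x, and 32x respectively), the ordinary predict function cannot be used. The code therefore includes a "my_predict" function, and you must first understand its output format!

  2. Next, since the output has to be re-encoded into bounding-box format, we need to load anchor_boxs (yolo v3).RData, which stores the width and height of 9 anchor boxes. With these anchor boxes, we decode the output as follows: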

F26

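  3. Next, note that inside Decode_fun, after the first decoding step, the second step removes redundant boxes (those that overlap too much with a high-probability box). The removal criterion is IoU, which is defined as follows and computed by IoU_function:

F27

  4. Once the predicted outputs have been converted back into boxes, the only remaining step is to display the image, which is handled by the function "Show_img"!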
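
The IoU-based removal of redundant boxes described in the hints can be sketched as below. This is not the homework's `IoU_function` or `Decode_fun`; `box_iou` and `nms` are hypothetical stand-ins that show the standard intersection-over-union computation and a greedy non-maximum suppression loop, and the 0.5 threshold is an assumed default.

```r
# Boxes are c(xmin, ymin, xmax, ymax).
box_iou <- function(a, b) {
  inter_w <- max(0, min(a[3], b[3]) - max(a[1], b[1]))
  inter_h <- max(0, min(a[4], b[4]) - max(a[2], b[2]))
  inter <- inter_w * inter_h
  union <- (a[3] - a[1]) * (a[4] - a[2]) + (b[3] - b[1]) * (b[4] - b[2]) - inter
  inter / union
}

# Greedy non-maximum suppression: keep the highest-scoring box, drop every
# remaining box whose IoU with it exceeds the threshold, then repeat.
nms <- function(boxes, scores, iou_threshold = 0.5) {
  remaining <- order(scores, decreasing = TRUE)
  keep <- integer(0)
  while (length(remaining) > 0) {
    best <- remaining[1]
    keep <- c(keep, best)
    rest <- remaining[-1]
    overlaps <- vapply(rest, function(i) box_iou(boxes[best, ], boxes[i, ]), numeric(1))
    remaining <- rest[overlaps <= iou_threshold]
  }
  keep
}

boxes <- rbind(c(10, 10, 50, 50),
               c(12, 12, 52, 52),
               c(100, 100, 150, 150))
scores <- c(0.9, 0.8, 0.7)
print(box_iou(boxes[1, ], boxes[2, ]))  # heavy overlap between the first two boxes
print(nms(boxes, scores))               # row indices of the boxes that survive
```

In this toy case the second box overlaps the first too much and is removed, while the third box, which does not overlap, is kept.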

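Conclusion

– As for how to train an object detection model, that involves many more difficulties and will seriously test your programming skills!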

F26