
林嶔 (Lin, Chin)

Lesson 12: Overview of Object Segmentation and Object Detection Models


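– We also learned a new network structure, the deconvolution layer. By combining convolution layers with deconvolution layers, we can freely change the spatial size of a feature map.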
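To make the size changes concrete, here is a minimal base-R sketch of the usual output-size formulas for convolution/pooling layers and for deconvolution (transposed convolution) layers. The helper names `conv_out_size` and `deconv_out_size` are made up for this illustration and are not MXNet functions.

```r
# Output size of a convolution or pooling layer along one spatial axis
# (hypothetical helper, not an MXNet function).
conv_out_size <- function(input, kernel, stride = 1, pad = 0) {
  floor((input + 2 * pad - kernel) / stride) + 1
}

# Output size of a deconvolution (transposed convolution) layer along one axis.
deconv_out_size <- function(input, kernel, stride = 1, pad = 0) {
  (input - 1) * stride - 2 * pad + kernel
}

size <- 512
size <- conv_out_size(size, kernel = 3, stride = 1, pad = 1)  # 3x3 conv, pad 1: size unchanged
size <- conv_out_size(size, kernel = 2, stride = 2)           # 2x2 max pooling: halves the size
half <- size
size <- deconv_out_size(size, kernel = 2, stride = 2)         # 2x2 deconv, stride 2: doubles it
print(c(half = half, restored = size))
```

With these settings a 2x2 pooling followed by a 2x2 deconvolution with stride 2 brings the feature map back to its original size, which is the pairing used by the networks later in this lesson.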



– The input is an image, which is no problem for us. But what is the prediction target? Let's first look at the finished result:




The ISBI challenge provides a simple dataset that lets us try to segment cells:


– You can download the archive here. Unzip it before using it.



library(jpeg)   # readJPEG
library(abind)  # abind

# Read the 30 training images (512 x 512, single channel)
train_img_list <- list()
train_files <- list.files('ISBI/train-volume', pattern = '.jpg', full.names = TRUE)

for (i in 1:length(train_files)) {
  train_img_list[[i]] <- readJPEG(train_files[i])
}

# Stack into a width x height x channel x sample array
train.x <- abind(train_img_list, along = 3)
dim(train.x) <- c(512, 512, 1, 30)

# Show the first 10 images
par(mar=rep(0,4), mfcol = c(2, 5))
for (i in 1:10) {
  plot(NA, xlim = 0:1, ylim = 0:1, xaxt = "n", yaxt = "n", bty = "n")
  img = as.raster(train.x[,,,i])
  rasterImage(img, 0, 0, 1, 1, interpolate=FALSE)
}


# Read the matching label images
train_label_list <- list()
train_files <- list.files('ISBI/train-labels', pattern = '.jpg', full.names = TRUE)

for (i in 1:length(train_files)) {
  train_label_list[[i]] <- readJPEG(train_files[i])
}

train.y <- abind(train_label_list, along = 3)
dim(train.y) <- c(512, 512, 1, 30)

# Binarize the labels to exactly 0 or 1
train.y[train.y > 0.5] <- 1
train.y[train.y <= 0.5] <- 0

# Show the first 10 label masks
par(mar=rep(0,4), mfcol = c(2, 5))
for (i in 1:10) {
  plot(NA, xlim = 0:1, ylim = 0:1, xaxt = "n", yaxt = "n", bty = "n")
  img = as.raster(train.y[,,,i])
  rasterImage(img, 0, 0, 1, 1, interpolate=FALSE)
}



library(mxnet)

# Core of the custom iterator: tracks the current batch and serves data/label pairs
my_iterator_core = function(batch_size) {
  batch = 0
  batch_per_epoch = dim(train.y)[4]/batch_size
  reset = function() {batch <<- 0}
  iter.next = function() {
    batch <<- batch+1
    if (batch > batch_per_epoch) {return(FALSE)} else {return(TRUE)}
  }
  value = function() {
    idx <- 1:batch_size + (batch - 1) * batch_size
    # Indices past the last sample are replaced by randomly drawn valid ones
    idx[idx > dim(train.y)[4]] <- sample(1:dim(train.y)[4], sum(idx > dim(train.y)[4]))
    data <- mx.nd.array(array(train.x[,,,idx], dim = c(dim(train.x)[1:3], batch_size)))
    label <- mx.nd.array(array(train.y[,,,idx], dim = c(dim(train.y)[1:3], batch_size)))
    return(list(data = data, label = label))
  }
  return(list(reset = reset, iter.next = iter.next, value = value, batch_size = batch_size, batch = batch))
}

my_iterator_func <- setRefClass("Custom_Iter",
                                fields = c("iter", "batch_size"),
                                contains = "Rcpp_MXArrayDataIter",
                                methods = list(
                                  initialize = function(iter, batch_size = 100){
                                    .self$iter <- my_iterator_core(batch_size = batch_size)
                                    .self
                                  },
                                  # Each method delegates to the core iterator
                                  value = function(){
                                    .self$iter$value()
                                  },
                                  iter.next = function(){
                                    .self$iter$iter.next()
                                  },
                                  reset = function(){
                                    .self$iter$reset()
                                  }
                                ))

my_iter <- my_iterator_func(iter = NULL, batch_size = 2)
my_iter$reset()
my_iter$iter.next()
## [1] TRUE
batch_data <- my_iter$value()

par(mar=rep(0,4), mfcol = c(2, 2))
for (i in 1:2) {
  plot(NA, xlim = 0:1, ylim = 0:1, xaxt = "n", yaxt = "n", bty = "n")
  rasterImage(as.raster(as.array(batch_data$data)[,,,i]), 0, 0, 1, 1, interpolate=FALSE)
  plot(NA, xlim = 0:1, ylim = 0:1, xaxt = "n", yaxt = "n", bty = "n")
  rasterImage(as.raster(as.array(batch_data$label)[,,,i]), 0, 0, 1, 1, interpolate=FALSE)
}
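
The index arithmetic inside `value()` can be checked without MXNet. The sketch below reproduces only that part in base R; `batch_index` is a hypothetical helper, and `n` stands in for `dim(train.y)[4]`.

```r
# Indices of the samples used in a given batch (hypothetical helper).
# When a batch would run past the last sample, the overflow positions are
# replaced by randomly drawn valid indices, mirroring the logic in value().
batch_index <- function(batch, batch_size, n) {
  idx <- 1:batch_size + (batch - 1) * batch_size
  overflow <- idx > n
  idx[overflow] <- sample(1:n, sum(overflow))
  idx
}

batch_index(batch = 1, batch_size = 2, n = 30)   # samples 1 and 2
batch_index(batch = 15, batch_size = 2, n = 30)  # samples 29 and 30

set.seed(1)
# Positions 25..32: 31 and 32 exceed n and are resampled
last <- batch_index(batch = 4, batch_size = 8, n = 30)
print(last)
```

With 30 samples and a batch size of 2 the batches divide evenly, so the resampling branch is only a safeguard for batch sizes that do not divide the sample count.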


data <- mx.symbol.Variable('data')
bn_data <- mx.symbol.BatchNorm(data = data, fix.gamma = TRUE, name = 'bn_data')

conv1 <- mx.symbol.Convolution(data = bn_data, kernel = c(3, 3), pad = c(1, 1), num_filter = 8, no.bias = TRUE, name = 'conv1')
bn1 <- mx.symbol.BatchNorm(data = conv1, fix.gamma = FALSE, name = 'bn1')
relu1 <- mx.symbol.Activation(data = bn1, act_type = "relu", name = 'relu1')
pool1 <- mx.symbol.Pooling(data = relu1, pool_type = "max", kernel = c(2, 2), stride = c(2, 2), name = 'pool1')

conv2 <- mx.symbol.Convolution(data = pool1, kernel = c(3, 3), pad = c(1, 1), num_filter = 16, no.bias = TRUE, name = 'conv2')
bn2 <- mx.symbol.BatchNorm(data = conv2, fix.gamma = FALSE, name = 'bn2')
relu2 <- mx.symbol.Activation(data = bn2, act_type = "relu", name = 'relu2')
pool2 <- mx.symbol.Pooling(data = relu2, pool_type = "max", kernel = c(2, 2), stride = c(2, 2), name = 'pool2')

conv3 <- mx.symbol.Convolution(data = pool2, kernel = c(3, 3), pad = c(1, 1), num_filter = 32, no.bias = TRUE, name = 'conv3')
bn3 <- mx.symbol.BatchNorm(data = conv3, fix.gamma = FALSE, name = 'bn3')
relu3 <- mx.symbol.Activation(data = bn3, act_type = "relu", name = 'relu3')
pool3 <- mx.symbol.Pooling(data = relu3, pool_type = "max", kernel = c(2, 2), stride = c(2, 2), name = 'pool3')

conv4 <- mx.symbol.Convolution(data = pool3, kernel = c(3, 3), pad = c(1, 1), num_filter = 64, no.bias = TRUE, name = 'conv4')
bn4 <- mx.symbol.BatchNorm(data = conv4, fix.gamma = FALSE, name = 'bn4')
relu4 <- mx.symbol.Activation(data = bn4, act_type = "relu", name = 'relu4')
pool4 <- mx.symbol.Pooling(data = relu4, pool_type = "max", kernel = c(2, 2), stride = c(2, 2), name = 'pool4')

conv5 <- mx.symbol.Convolution(data = pool4, kernel = c(3, 3), pad = c(1, 1), num_filter = 128, no.bias = TRUE, name = 'conv5')
bn5 <- mx.symbol.BatchNorm(data = conv5, fix.gamma = FALSE, name = 'bn5')
relu5 <- mx.symbol.Activation(data = bn5, act_type = "relu", name = 'relu5')

deconv6 <- mx.symbol.Deconvolution(data = relu5, kernel = c(2, 2), stride = c(2, 2), num_filter = 64, name = 'deconv6')
bn6 <- mx.symbol.BatchNorm(data = deconv6, fix.gamma = FALSE, name = 'bn6')
relu6 <- mx.symbol.Activation(data = bn6, act_type = "relu", name = 'relu6')

deconv7 <- mx.symbol.Deconvolution(data = relu6, kernel = c(2, 2), stride = c(2, 2), num_filter = 32, name = 'deconv7')
bn7 <- mx.symbol.BatchNorm(data = deconv7, fix.gamma = FALSE, name = 'bn7')
relu7 <- mx.symbol.Activation(data = bn7, act_type = "relu", name = 'relu7')

deconv8 <- mx.symbol.Deconvolution(data = relu7, kernel = c(2, 2), stride = c(2, 2), num_filter = 16, name = 'deconv8')
bn8 <- mx.symbol.BatchNorm(data = deconv8, fix.gamma = FALSE, name = 'bn8')
relu8 <- mx.symbol.Activation(data = bn8, act_type = "relu", name = 'relu8')

deconv9 <- mx.symbol.Deconvolution(data = relu8, kernel = c(2, 2), stride = c(2, 2), num_filter = 8, name = 'deconv9')
bn9 <- mx.symbol.BatchNorm(data = deconv9, fix.gamma = FALSE, name = 'bn9')
relu9 <- mx.symbol.Activation(data = bn9, act_type = "relu", name = 'relu9')

linear_pred <- mx.symbol.Convolution(data = relu9, kernel = c(1, 1), num_filter = 1, name = 'linear_pred')
logistic_pred <- mx.symbol.Activation(data = linear_pred, act.type = 'sigmoid', name = 'logistic_pred')

# CE loss

label <- mx.symbol.Variable(name = 'label')

eps <- 1e-8
ce_loss_pos <-  mx.symbol.broadcast_mul(mx.symbol.log(logistic_pred + eps), label)
ce_loss_neg <-  mx.symbol.broadcast_mul(mx.symbol.log(1 - logistic_pred + eps), 1 - label)
ce_loss_mean <- 0 - mx.symbol.mean(ce_loss_pos + ce_loss_neg, axis = 0:3)
ce_loss <- mx.symbol.MakeLoss(ce_loss_mean, name = 'ce_loss')
my_optimizer <- mx.opt.create(name = "sgd", learning.rate = 0.05, momentum = 0.9, wd = 1e-4)
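
For intuition, the same pixel-wise binary cross-entropy can be evaluated on plain R arrays. This is only a numeric sketch of the formula built from the symbols above; `ce_loss_value` is a hypothetical helper and the small matrices are made-up values.

```r
# Mean binary cross-entropy over all pixels (hypothetical helper).
# pred: predicted probabilities in (0, 1); label: 0/1 array of the same shape.
ce_loss_value <- function(pred, label, eps = 1e-8) {
  -mean(label * log(pred + eps) + (1 - label) * log(1 - pred + eps))
}

label <- matrix(c(1, 0, 1, 0), nrow = 2)
good  <- matrix(c(0.9, 0.1, 0.8, 0.2), nrow = 2)  # confident and correct
bad   <- matrix(c(0.1, 0.9, 0.2, 0.8), nrow = 2)  # confident and wrong

ce_loss_value(good, label)  # small loss
ce_loss_value(bad, label)   # much larger loss
```

The `eps` term plays the same role as in the symbolic version: it keeps `log()` finite when a prediction is exactly 0 or 1.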


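my.eval.metric.loss <- mx.metric.custom(
  name = "ce-loss", 
  function(real, pred) {
    # With MakeLoss the network output is the loss value itself, so report it directly
    return(as.array(pred))
  }
)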

model_1 <- mx.model.FeedForward.create(symbol = ce_loss, X = my_iter, optimizer = my_optimizer,
                                       eval.metric = my.eval.metric.loss,
                                       array.batch.size = 2, ctx = mx.gpu(), num.round = 20)
test_img_list <- list()
test_files <- list.files('ISBI/test-volume', pattern = '.jpg', full.names = TRUE)

for (i in 1:length(test_files)) {
  test_img_list[[i]] <- readJPEG(test_files[i])
}

test.x <- abind(test_img_list, along = 3)
dim(test.x) <- c(512, 512, 1, 30)
model_1$symbol <- logistic_pred
pred_y.1 <- predict(model_1, test.x)
pred_y.1[pred_y.1 > 0.5] <- 1
pred_y.1[pred_y.1 <= 0.5] <- 0

par(mar = rep(0, 4), mfcol = c(3, 4))
for (i in 1:4) {
  plot(NA, xlim = 0:1, ylim = 0:1, xaxt = "n", yaxt = "n", bty = "n")
  rasterImage(as.raster(test.x[,,,i]), 0, 0, 1, 1, interpolate = FALSE)
  plot(NA, xlim = 0:1, ylim = 0:1, xaxt = "n", yaxt = "n", bty = "n")
  pred_img <- pred_y.1[,,,i]
  pred_img[pred_img == 0] <- '#0000FF80'
  pred_img[pred_img == 1] <- '#FFFFFF00'
  rasterImage(pred_img, 0, 0, 1, 1, interpolate = FALSE)
  plot(NA, xlim = 0:1, ylim = 0:1, xaxt = "n", yaxt = "n", bty = "n")
  rasterImage(as.raster(test.x[,,,i]), 0, 0, 1, 1, interpolate = FALSE)
  rasterImage(pred_img, 0, 0, 1, 1, interpolate = FALSE)
}




data <- mx.symbol.Variable('data')
bn_data <- mx.symbol.BatchNorm(data = data, fix.gamma = TRUE, name = 'bn_data')

conv1 <- mx.symbol.Convolution(data = bn_data, kernel = c(3, 3), pad = c(1, 1), num_filter = 8, no.bias = TRUE, name = 'conv1')
bn1 <- mx.symbol.BatchNorm(data = conv1, fix.gamma = FALSE, name = 'bn1')
relu1 <- mx.symbol.Activation(data = bn1, act_type = "relu", name = 'relu1')
pool1 <- mx.symbol.Pooling(data = relu1, pool_type = "max", kernel = c(2, 2), stride = c(2, 2), name = 'pool1')

conv2 <- mx.symbol.Convolution(data = pool1, kernel = c(3, 3), pad = c(1, 1), num_filter = 16, no.bias = TRUE, name = 'conv2')
bn2 <- mx.symbol.BatchNorm(data = conv2, fix.gamma = FALSE, name = 'bn2')
relu2 <- mx.symbol.Activation(data = bn2, act_type = "relu", name = 'relu2')
pool2 <- mx.symbol.Pooling(data = relu2, pool_type = "max", kernel = c(2, 2), stride = c(2, 2), name = 'pool2')

conv3 <- mx.symbol.Convolution(data = pool2, kernel = c(3, 3), pad = c(1, 1), num_filter = 32, no.bias = TRUE, name = 'conv3')
bn3 <- mx.symbol.BatchNorm(data = conv3, fix.gamma = FALSE, name = 'bn3')
relu3 <- mx.symbol.Activation(data = bn3, act_type = "relu", name = 'relu3')
pool3 <- mx.symbol.Pooling(data = relu3, pool_type = "max", kernel = c(2, 2), stride = c(2, 2), name = 'pool3')

conv4 <- mx.symbol.Convolution(data = pool3, kernel = c(3, 3), pad = c(1, 1), num_filter = 64, no.bias = TRUE, name = 'conv4')
bn4 <- mx.symbol.BatchNorm(data = conv4, fix.gamma = FALSE, name = 'bn4')
relu4 <- mx.symbol.Activation(data = bn4, act_type = "relu", name = 'relu4')
pool4 <- mx.symbol.Pooling(data = relu4, pool_type = "max", kernel = c(2, 2), stride = c(2, 2), name = 'pool4')

conv5 <- mx.symbol.Convolution(data = pool4, kernel = c(3, 3), pad = c(1, 1), num_filter = 128, no.bias = TRUE, name = 'conv5')
bn5 <- mx.symbol.BatchNorm(data = conv5, fix.gamma = FALSE, name = 'bn5')
relu5 <- mx.symbol.Activation(data = bn5, act_type = "relu", name = 'relu5')

deconv6 <- mx.symbol.Deconvolution(data = relu5, kernel = c(2, 2), stride = c(2, 2), num_filter = 64, name = 'deconv6')
bn6 <- mx.symbol.BatchNorm(data = deconv6, fix.gamma = FALSE, name = 'bn6')
relu6 <- mx.symbol.Activation(data = bn6, act_type = "relu", name = 'relu6')

concat7 <- mx.symbol.concat(data = list(relu6, relu4), num.args = 2, dim = 1, name = 'concat7')
deconv7 <- mx.symbol.Deconvolution(data = concat7, kernel = c(2, 2), stride = c(2, 2), num_filter = 32, name = 'deconv7')
bn7 <- mx.symbol.BatchNorm(data = deconv7, fix.gamma = FALSE, name = 'bn7')
relu7 <- mx.symbol.Activation(data = bn7, act_type = "relu", name = 'relu7')

concat8 <- mx.symbol.concat(data = list(relu7, relu3), num.args = 2, dim = 1, name = 'concat8')
deconv8 <- mx.symbol.Deconvolution(data = concat8, kernel = c(2, 2), stride = c(2, 2), num_filter = 16, name = 'deconv8')
bn8 <- mx.symbol.BatchNorm(data = deconv8, fix.gamma = FALSE, name = 'bn8')
relu8 <- mx.symbol.Activation(data = bn8, act_type = "relu", name = 'relu8')

concat9 <- mx.symbol.concat(data = list(relu8, relu2), num.args = 2, dim = 1, name = 'concat9')
deconv9 <- mx.symbol.Deconvolution(data = concat9, kernel = c(2, 2), stride = c(2, 2), num_filter = 8, name = 'deconv9')
bn9 <- mx.symbol.BatchNorm(data = deconv9, fix.gamma = FALSE, name = 'bn9')
relu9 <- mx.symbol.Activation(data = bn9, act_type = "relu", name = 'relu9')

concat10 <- mx.symbol.concat(data = list(relu9, relu1), num.args = 2, dim = 1, name = 'concat10')
linear_pred <- mx.symbol.Convolution(data = concat10, kernel = c(1, 1), num_filter = 1, name = 'linear_pred')
logistic_pred <- mx.symbol.Activation(data = linear_pred, act.type = 'sigmoid', name = 'logistic_pred')
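
The skip connections only work because each decoder output has the same spatial size as the encoder activation it is concatenated with. The following base-R sketch tracks the size and channel count of each layer for a 512 x 512 input. The tables are written out by hand from the layer definitions above; nothing here queries MXNet.

```r
# Spatial size and channel count of the encoder activations (before pooling).
encoder <- data.frame(
  layer    = c("relu1", "relu2", "relu3", "relu4", "relu5"),
  size     = 512 / 2^(0:4),          # each 2x2 max pooling halves the size
  channels = c(8, 16, 32, 64, 128)
)

# Decoder activations: each 2x2 deconvolution with stride 2 doubles the size.
decoder <- data.frame(
  layer    = c("relu6", "relu7", "relu8", "relu9"),
  size     = 32 * 2^(1:4),
  channels = c(64, 32, 16, 8)
)

# Each concat pairs a decoder layer with the encoder layer of the same size.
skip <- data.frame(
  concat  = c("concat7", "concat8", "concat9", "concat10"),
  decoder = decoder$layer,
  encoder = rev(encoder$layer[1:4])
)
skip$size     <- decoder$size
skip$channels <- decoder$channels + rev(encoder$channels[1:4])
print(skip)
```

Concatenating along the channel axis doubles the number of input channels seen by the next layer, while the spatial size stays the same.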

# CE loss

label <- mx.symbol.Variable(name = 'label')

eps <- 1e-8
ce_loss_pos <-  mx.symbol.broadcast_mul(mx.symbol.log(logistic_pred + eps), label)
ce_loss_neg <-  mx.symbol.broadcast_mul(mx.symbol.log(1 - logistic_pred + eps), 1 - label)
ce_loss_mean <- 0 - mx.symbol.mean(ce_loss_pos + ce_loss_neg, axis = 0:3)
ce_loss <- mx.symbol.MakeLoss(ce_loss_mean, name = 'ce_loss')


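my.eval.metric.loss <- mx.metric.custom(
  name = "ce-loss", 
  function(real, pred) {
    # With MakeLoss the network output is the loss value itself, so report it directly
    return(as.array(pred))
  }
)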

model_2 <- mx.model.FeedForward.create(symbol = ce_loss, X = my_iter, optimizer = my_optimizer,
                                       eval.metric = my.eval.metric.loss,
                                       array.batch.size = 2, ctx = mx.gpu(), num.round = 20)
model_2$symbol <- logistic_pred
pred_y.2 <- predict(model_2, test.x)
pred_y.2[pred_y.2 > 0.5] <- 1
pred_y.2[pred_y.2 <= 0.5] <- 0
par(mar = rep(0, 4), mfcol = c(3, 4))

for (i in 1:2) {
  plot(NA, xlim = 0:1, ylim = 0:1, xaxt = "n", yaxt = "n", bty = "n")
  rasterImage(as.raster(test.x[,,,i]), 0, 0, 1, 1, interpolate = FALSE)
  plot(NA, xlim = 0:1, ylim = 0:1, xaxt = "n", yaxt = "n", bty = "n")
  pred_img <- pred_y.1[,,,i]
  pred_img[pred_img == 0] <- '#0000FF80'
  pred_img[pred_img == 1] <- '#FFFFFF00'
  rasterImage(pred_img, 0, 0, 1, 1, interpolate = FALSE)
  plot(NA, xlim = 0:1, ylim = 0:1, xaxt = "n", yaxt = "n", bty = "n")
  rasterImage(as.raster(test.x[,,,i]), 0, 0, 1, 1, interpolate = FALSE)
  rasterImage(pred_img, 0, 0, 1, 1, interpolate = FALSE)
  plot(NA, xlim = 0:1, ylim = 0:1, xaxt = "n", yaxt = "n", bty = "n")
  rasterImage(as.raster(test.x[,,,i]), 0, 0, 1, 1, interpolate = FALSE)
  plot(NA, xlim = 0:1, ylim = 0:1, xaxt = "n", yaxt = "n", bty = "n")
  pred_img <- pred_y.2[,,,i]
  pred_img[pred_img == 0] <- '#0000FF80'
  pred_img[pred_img == 1] <- '#FFFFFF00'
  rasterImage(pred_img, 0, 0, 1, 1, interpolate = FALSE)
  plot(NA, xlim = 0:1, ylim = 0:1, xaxt = "n", yaxt = "n", bty = "n")
  rasterImage(as.raster(test.x[,,,i]), 0, 0, 1, 1, interpolate = FALSE)
  rasterImage(pred_img, 0, 0, 1, 1, interpolate = FALSE)
}






data <- mx.symbol.Variable('data')
bn_data <- mx.symbol.BatchNorm(data = data, fix.gamma = TRUE, name = 'bn_data')

conv1 <- mx.symbol.Convolution(data = bn_data, kernel = c(3, 3), pad = c(1, 1), num_filter = 8, no.bias = TRUE, name = 'conv1')
bn1 <- mx.symbol.BatchNorm(data = conv1, fix.gamma = FALSE, name = 'bn1')
relu1 <- mx.symbol.Activation(data = bn1, act_type = "relu", name = 'relu1')
pool1 <- mx.symbol.Pooling(data = relu1, pool_type = "max", kernel = c(2, 2), stride = c(2, 2), name = 'pool1')

conv2 <- mx.symbol.Convolution(data = pool1, kernel = c(3, 3), pad = c(1, 1), num_filter = 16, no.bias = TRUE, name = 'conv2')
bn2 <- mx.symbol.BatchNorm(data = conv2, fix.gamma = FALSE, name = 'bn2')
relu2 <- mx.symbol.Activation(data = bn2, act_type = "relu", name = 'relu2')
pool2 <- mx.symbol.Pooling(data = relu2, pool_type = "max", kernel = c(2, 2), stride = c(2, 2), name = 'pool2')

conv3 <- mx.symbol.Convolution(data = pool2, kernel = c(3, 3), pad = c(1, 1), num_filter = 32, no.bias = TRUE, name = 'conv3')
bn3 <- mx.symbol.BatchNorm(data = conv3, fix.gamma = FALSE, name = 'bn3')
relu3 <- mx.symbol.Activation(data = bn3, act_type = "relu", name = 'relu3')
pool3 <- mx.symbol.Pooling(data = relu3, pool_type = "max", kernel = c(2, 2), stride = c(2, 2), name = 'pool3')

conv4 <- mx.symbol.Convolution(data = pool3, kernel = c(3, 3), pad = c(1, 1), num_filter = 64, no.bias = TRUE, name = 'conv4')
bn4 <- mx.symbol.BatchNorm(data = conv4, fix.gamma = FALSE, name = 'bn4')
relu4 <- mx.symbol.Activation(data = bn4, act_type = "relu", name = 'relu4')
up4 <- mx.symbol.UpSampling(data = relu4, num_args = 1, scale = 2, sample_type = 'bilinear', num_filter = 64, name = 'up4')
pool4 <- mx.symbol.Pooling(data = relu4, pool_type = "max", kernel = c(2, 2), stride = c(2, 2), name = 'pool4')

conv5 <- mx.symbol.Convolution(data = pool4, kernel = c(3, 3), pad = c(1, 1), num_filter = 128, no.bias = TRUE, name = 'conv5')
bn5 <- mx.symbol.BatchNorm(data = conv5, fix.gamma = FALSE, name = 'bn5')
relu5 <- mx.symbol.Activation(data = bn5, act_type = "relu", name = 'relu5')
up5 <- mx.symbol.UpSampling(data = relu5, num_args = 1, scale = 4, sample_type = 'bilinear', num_filter = 128, name = 'up5')

concat6 <- mx.symbol.concat(data = list(relu3, up4, up5), num.args = 3, dim = 1, name = 'concat6')
deconv6 <- mx.symbol.Deconvolution(data = concat6, kernel = c(2, 2), stride = c(2, 2), num_filter = 16, name = 'deconv6')
bn6 <- mx.symbol.BatchNorm(data = deconv6, fix.gamma = FALSE, name = 'bn6')
relu6 <- mx.symbol.Activation(data = bn6, act_type = "relu", name = 'relu6')

deconv7 <- mx.symbol.Deconvolution(data = relu6, kernel = c(2, 2), stride = c(2, 2), num_filter = 8, name = 'deconv7')
bn7 <- mx.symbol.BatchNorm(data = deconv7, fix.gamma = FALSE, name = 'bn7')
relu7 <- mx.symbol.Activation(data = bn7, act_type = "relu", name = 'relu7')

linear_pred <- mx.symbol.Convolution(data = relu7, kernel = c(1, 1), num_filter = 1, name = 'linear_pred')
logistic_pred <- mx.symbol.Activation(data = linear_pred, act.type = 'sigmoid', name = 'logistic_pred')

# CE loss

label <- mx.symbol.Variable(name = 'label')

eps <- 1e-8
ce_loss_pos <-  mx.symbol.broadcast_mul(mx.symbol.log(logistic_pred + eps), label)
ce_loss_neg <-  mx.symbol.broadcast_mul(mx.symbol.log(1 - logistic_pred + eps), 1 - label)
ce_loss_mean <- 0 - mx.symbol.mean(ce_loss_pos + ce_loss_neg, axis = 0:3)
ce_loss <- mx.symbol.MakeLoss(ce_loss_mean, name = 'ce_loss')


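my.eval.metric.loss <- mx.metric.custom(
  name = "ce-loss", 
  function(real, pred) {
    # With MakeLoss the network output is the loss value itself, so report it directly
    return(as.array(pred))
  }
)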

model_3 <- mx.model.FeedForward.create(symbol = ce_loss, X = my_iter, optimizer = my_optimizer,
                                       eval.metric = my.eval.metric.loss,
                                       array.batch.size = 2, ctx = mx.gpu(), num.round = 20)
model_3$symbol <- logistic_pred
pred_y.3 <- predict(model_3, test.x)
pred_y.3[pred_y.3 > 0.5] <- 1
pred_y.3[pred_y.3 <= 0.5] <- 0
par(mar = rep(0, 4), mfcol = c(3, 3))

plot(NA, xlim = 0:1, ylim = 0:1, xaxt = "n", yaxt = "n", bty = "n")
rasterImage(as.raster(test.x[,,,1]), 0, 0, 1, 1, interpolate = FALSE)

plot(NA, xlim = 0:1, ylim = 0:1, xaxt = "n", yaxt = "n", bty = "n")
pred_img <- pred_y.1[,,,1]
pred_img[pred_img == 0] <- '#0000FF80'
pred_img[pred_img == 1] <- '#FFFFFF00'
rasterImage(pred_img, 0, 0, 1, 1, interpolate = FALSE)

plot(NA, xlim = 0:1, ylim = 0:1, xaxt = "n", yaxt = "n", bty = "n")
rasterImage(as.raster(test.x[,,,1]), 0, 0, 1, 1, interpolate = FALSE)
rasterImage(pred_img, 0, 0, 1, 1, interpolate = FALSE)

plot(NA, xlim = 0:1, ylim = 0:1, xaxt = "n", yaxt = "n", bty = "n")
rasterImage(as.raster(test.x[,,,1]), 0, 0, 1, 1, interpolate = FALSE)

plot(NA, xlim = 0:1, ylim = 0:1, xaxt = "n", yaxt = "n", bty = "n")
pred_img <- pred_y.2[,,,1]
pred_img[pred_img == 0] <- '#0000FF80'
pred_img[pred_img == 1] <- '#FFFFFF00'
rasterImage(pred_img, 0, 0, 1, 1, interpolate = FALSE)

plot(NA, xlim = 0:1, ylim = 0:1, xaxt = "n", yaxt = "n", bty = "n")
rasterImage(as.raster(test.x[,,,1]), 0, 0, 1, 1, interpolate = FALSE)
rasterImage(pred_img, 0, 0, 1, 1, interpolate = FALSE)

plot(NA, xlim = 0:1, ylim = 0:1, xaxt = "n", yaxt = "n", bty = "n")
rasterImage(as.raster(test.x[,,,1]), 0, 0, 1, 1, interpolate = FALSE)

plot(NA, xlim = 0:1, ylim = 0:1, xaxt = "n", yaxt = "n", bty = "n")
pred_img <- pred_y.3[,,,1]
pred_img[pred_img == 0] <- '#0000FF80'
pred_img[pred_img == 1] <- '#FFFFFF00'
rasterImage(pred_img, 0, 0, 1, 1, interpolate = FALSE)

plot(NA, xlim = 0:1, ylim = 0:1, xaxt = "n", yaxt = "n", bty = "n")
rasterImage(as.raster(test.x[,,,1]), 0, 0, 1, 1, interpolate = FALSE)
rasterImage(pred_img, 0, 0, 1, 1, interpolate = FALSE)



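– Take a moment to think about how you would do this. Keep in mind that a prediction function needs a fixed number of outputs, so predicting a variable number of bounding boxes is a real challenge.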


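– For example, as in the figure below, we can run a classification task on every candidate box to check whether it contains the object we want to predict: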
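
As a rough illustration of this brute-force idea, the base-R sketch below enumerates sliding-window candidate boxes of a single size over an image. The function `sliding_windows` and its parameters are hypothetical; a real detector would also vary the box size and aspect ratio, and would run a classifier on every candidate.

```r
# Enumerate square candidate boxes by sliding a window over the image
# (hypothetical helper for illustration).
sliding_windows <- function(img_w, img_h, box = 128, step = 64) {
  xs <- seq(0, img_w - box, by = step)
  ys <- seq(0, img_h - box, by = step)
  grid <- expand.grid(x_min = xs, y_min = ys)
  grid$x_max <- grid$x_min + box
  grid$y_max <- grid$y_min + box
  grid
}

candidates <- sliding_windows(512, 512, box = 128, step = 64)
nrow(candidates)     # number of boxes a classifier would have to score
head(candidates, 3)
```

Even one box size with a coarse step already gives dozens of candidates per image, which hints at why running a network once per box is expensive.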





– The concept of the model is shown in the figure below:



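– Roughly speaking, it uses simple color-level transformations to decide how many separable objects an image contains, and R-CNN uses this method to decide which candidate boxes to consider: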


– Worse still, the whole pipeline has to be run again at prediction time, so there is clearly a lot of room for improvement.


– The biggest question about R-CNN is why the convolutional neural network has to be evaluated so many times. Can't the process be integrated?



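  1. The convolutional neural network used for feature extraction is applied to the whole image rather than to each candidate box, and this network also takes part in training.

  2. Candidate boxes are searched on the output of the convolutional neural network rather than on the original image.

  3. In R-CNN, proposal regions of different shapes are resized to a common shape before feature extraction. In Fast R-CNN, so that the whole process can be folded into the network's inference, a Region of Interest pooling layer (RoI Pooling) is introduced to resize each candidate region.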
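
The resizing idea in point 3 can be sketched in base R as follows. This is a simplified single-channel illustration, assuming the region is at least as large as the output grid; `roi_pool` is a hypothetical helper and not the MXNet operator.

```r
# Max-pool an arbitrary rectangular region of a feature map into a fixed
# out_h x out_w grid (hypothetical helper; single channel for simplicity).
roi_pool <- function(feature, rows, cols, out_h = 2, out_w = 2) {
  region <- feature[rows, cols, drop = FALSE]
  # Split the region's rows and columns into out_h and out_w roughly equal bins.
  row_bin <- cut(seq_len(nrow(region)), breaks = out_h, labels = FALSE)
  col_bin <- cut(seq_len(ncol(region)), breaks = out_w, labels = FALSE)
  out <- matrix(NA_real_, out_h, out_w)
  for (i in 1:out_h) {
    for (j in 1:out_w) {
      out[i, j] <- max(region[row_bin == i, col_bin == j])
    }
  }
  out
}

feature <- matrix(1:64, nrow = 8)          # an 8 x 8 feature map
roi_pool(feature, rows = 1:4, cols = 1:6)  # a 4 x 6 region -> 2 x 2
roi_pool(feature, rows = 3:8, cols = 2:4)  # a 6 x 3 region -> 2 x 2
```

Regions of different shapes all end up as the same fixed-size output, which is what lets the later layers share one set of weights.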


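– The candidate-box extraction of Fast R-CNN deserves a special note: because it is much faster than R-CNN, Fast R-CNN predefines a very large set of candidate boxes (also called anchor boxes).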

– The biggest difference from R-CNN is that in R-CNN every image has its own candidate boxes, whereas in Fast R-CNN every image uses the same candidate boxes.
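
A minimal sketch of what "the same candidate boxes for every image" can look like: anchors placed on a regular grid, with a few shapes per position. The helper `make_anchors`, the stride, the scales, and the ratios are all arbitrary assumptions for illustration.

```r
# Generate the same set of anchor boxes for any image of a given size
# (hypothetical helper; the scales and ratios are arbitrary example values).
make_anchors <- function(img_size = 512, stride = 32,
                         scales = c(64, 128), ratios = c(0.5, 1, 2)) {
  centers <- seq(stride / 2, img_size, by = stride)   # anchor centers on a regular grid
  shapes <- expand.grid(scale = scales, ratio = ratios)
  shapes$w <- shapes$scale * sqrt(shapes$ratio)        # ratio = w / h, area = scale^2
  shapes$h <- shapes$scale / sqrt(shapes$ratio)
  grid <- expand.grid(shape = seq_len(nrow(shapes)), cx = centers, cy = centers)
  data.frame(cx = grid$cx, cy = grid$cy,
             w = shapes$w[grid$shape], h = shapes$h[grid$shape])
}

anchors <- make_anchors()
dim(anchors)   # (16 * 16 positions) x (2 scales * 3 ratios) rows, 4 columns
head(anchors)
```

Because the anchors depend only on the image size and the chosen settings, they can be computed once and reused for every image.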


– Shortly after Fast R-CNN came out, Ross Girshick, together with his colleagues Kaiming He, Shaoqing Ren, and Jian Sun, published Faster R-CNN in the paper Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks




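– Faster R-CNN splits this process into two steps. First, a Region Proposal Network handles a relatively simple task (deciding whether each anchor box contains an object and predicting its offsets). Then the rest of the network performs the final classification and offset prediction.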


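– The logic of YOLO is that it no longer tries to find the rough location of boxes first. Instead it looks directly for the "object center" and uses the information at that center to predict the size and position of the box.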





  1. The center of the box (x and y coordinates)

  2. The height and width of the box

  3. The object class the box belongs to
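
The sketch below shows one way a ground-truth box could be turned into such a target, assuming a YOLO-style S x S grid in which the cell containing the box center is responsible for the object. The helper `encode_box`, the grid size, the number of classes, and the exact normalization are assumptions for illustration and are not taken from the course code.

```r
# Encode one ground-truth box as a YOLO-style target (hypothetical helper).
# Box coordinates are relative to the image (0..1); S is the grid size.
encode_box <- function(x_center, y_center, width, height, class_id,
                       S = 7, n_class = 3) {
  col <- min(floor(x_center * S), S - 1) + 1   # grid cell containing the center
  row <- min(floor(y_center * S), S - 1) + 1
  one_hot <- numeric(n_class)
  one_hot[class_id] <- 1
  list(cell = c(row = row, col = col),
       # center offset inside the cell, box size, then class indicator
       target = c(x = x_center * S - (col - 1),
                  y = y_center * S - (row - 1),
                  w = width, h = height, one_hot))
}

enc <- encode_box(x_center = 0.5, y_center = 0.25, width = 0.2, height = 0.4, class_id = 2)
enc$cell
enc$target
```

Every grid cell produces a vector of this fixed length, which is how the model keeps a fixed output size while still describing a variable number of objects.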




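– Besides multi-scale prediction, another difference from YOLO is that it does not predict box information from scratch. Instead it predefines a set of anchor boxes and predicts the offsets between the target box and the anchor boxes.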
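
One common way to express such offsets is the parameterization used in the R-CNN family: center shifts scaled by the anchor size, and log ratios for width and height. Whether a particular detector uses exactly this form is an assumption here, and the helper names are hypothetical.

```r
# Boxes are c(cx, cy, w, h). Hypothetical helpers for illustration.
encode_offset <- function(box, anchor) {
  c(tx = (box[[1]] - anchor[[1]]) / anchor[[3]],
    ty = (box[[2]] - anchor[[2]]) / anchor[[4]],
    tw = log(box[[3]] / anchor[[3]]),
    th = log(box[[4]] / anchor[[4]]))
}

decode_offset <- function(offset, anchor) {
  c(cx = offset[[1]] * anchor[[3]] + anchor[[1]],
    cy = offset[[2]] * anchor[[4]] + anchor[[2]],
    w  = exp(offset[[3]]) * anchor[[3]],
    h  = exp(offset[[4]]) * anchor[[4]])
}

anchor <- c(100, 100, 64, 64)
target <- c(110, 92, 80, 48)
offset <- encode_offset(target, anchor)
offset
decode_offset(offset, anchor)   # recovers the target box
```

The network is trained to output the offsets, and decoding them against the known anchors gives back boxes in image coordinates.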


– The paper is Feature Pyramid Networks for Object Detection. Look at the author list and you will find Ross Girshick and Kaiming He yet again.




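– This model was trained on the COCO Detection Challenge dataset and detects 80 object categories. You can download the complete code and model here and use the code to run predictions. Try it on a variety of images to get a feel for its power!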


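– Note that this is not the original YOLO v3 model but a smaller one with slightly lower accuracy. What matters is whether you can work out, from the code, the paper, and online resources, how this state-of-the-art model performs object detection.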


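  1. Because the model's prediction consists of 3 parts (downsampled by factors of 8, 16, and 32), the ordinary predict function cannot be used, so the code includes a "my_predict" function. The first thing to understand is its output format.

  2. Next, the output has to be re-encoded as bounding boxes. For this we need to load anchor_boxs (yolo v3).RData, which stores the width and height of 9 anchor boxes. With these anchor boxes, decoding works as follows:


  3. Inside Decode_fun, after the first step (decoding), the second step removes redundant boxes, meaning those that overlap too much with a high-probability box. The criterion for removal is the IoU (intersection over union: the area where two boxes overlap divided by the area of their union), which is computed by IoU_function:


  4. Once the predictions have been converted back to boxes, the only remaining step is to display the image, which is handled by the "Show_img" function.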
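
The box-removal step can be sketched in base R as IoU plus greedy non-maximum suppression. The helpers `iou` and `nms` below are hypothetical stand-ins written for this illustration; the course's own IoU_function and Decode_fun may differ in detail, and the threshold of 0.5 is an assumed value.

```r
# Boxes are c(x_min, y_min, x_max, y_max). Hypothetical helpers for illustration.
iou <- function(a, b) {
  inter_w <- max(0, min(a[3], b[3]) - max(a[1], b[1]))
  inter_h <- max(0, min(a[4], b[4]) - max(a[2], b[2]))
  inter <- inter_w * inter_h
  area_a <- (a[3] - a[1]) * (a[4] - a[2])
  area_b <- (b[3] - b[1]) * (b[4] - b[2])
  inter / (area_a + area_b - inter)
}

# Greedy non-maximum suppression: keep the highest-scoring box, drop every
# remaining box whose IoU with it exceeds the threshold, and repeat.
nms <- function(boxes, scores, threshold = 0.5) {
  remaining <- order(scores, decreasing = TRUE)
  keep <- integer(0)
  while (length(remaining) > 0) {
    best <- remaining[1]
    keep <- c(keep, best)
    rest <- remaining[-1]
    overlap <- vapply(rest, function(i) iou(boxes[best, ], boxes[i, ]), numeric(1))
    remaining <- rest[overlap <= threshold]
  }
  keep
}

boxes <- rbind(c(10, 10, 110, 110),
               c(20, 20, 120, 120),    # overlaps heavily with the first box
               c(200, 200, 300, 300))  # far away from the others
scores <- c(0.9, 0.8, 0.7)
iou(boxes[1, ], boxes[2, ])
nms(boxes, scores)   # indices of the boxes that survive
```

In this example the second box is dropped because it mostly duplicates the higher-scoring first box, while the distant third box is kept.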


– As for how to train an object detection model, that brings many more difficulties and will really test your programming skills!
