林嶔 (Lin, Chin)
Lesson 12: An Overview of Image Segmentation and Object Detection Models
– We also learned about a new network structure, the deconvolution layer: by pairing convolution layers with deconvolution layers, we can freely resize feature maps!
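– As a quick reminder of the mechanics (illustrative code, not from the lesson): a 2x2 deconvolution with stride 2 doubles the spatial size of its input, as the shape inference below shows:
library(mxnet)
x <- mx.symbol.Variable('x')
up <- mx.symbol.Deconvolution(data = x, kernel = c(2, 2), stride = c(2, 2), num_filter = 8, name = 'up')
# infer shapes for a 16x16 single-channel input (batch size 1; R uses w, h, c, n order)
mx.symbol.infer.shape(up, x = c(16, 16, 1, 1))$out.shapes
# the inferred output shape is 32x32x8x1: the 16x16 input has been doubled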
– The input is an image, which poses no problem for us, but what about the prediction target? Let's first look at the finished product:
– The ISBI challenge provides a simple dataset that lets us try segmenting cells:
– You can download the archive here; unzip it before use!
library(imager)
library(abind)
library(jpeg)

# Read the 30 training images and stack them into a 512 x 512 x 1 x 30 array
train_img_list <- list()
train_files <- list.files('ISBI/train-volume', pattern = '.jpg', full.names = TRUE)
for (i in 1:length(train_files)) {
  train_img_list[[i]] <- readJPEG(train_files[i])
}
train.x <- abind(train_img_list, along = 3)
dim(train.x) <- c(512, 512, 1, 30)
par(mar=rep(0,4), mfcol = c(2, 5))
for (i in 1:10) {
  plot(NA, xlim = 0:1, ylim = 0:1, xaxt = "n", yaxt = "n", bty = "n")
  img = as.raster(train.x[,,,i])
  rasterImage(img, 0, 0, 1, 1, interpolate=FALSE)
}
# Read the 30 label images and binarize them at 0.5
train_label_list <- list()
train_files <- list.files('ISBI/train-labels', pattern = '.jpg', full.names = TRUE)
for (i in 1:length(train_files)) {
  train_label_list[[i]] <- readJPEG(train_files[i])
}
train.y <- abind(train_label_list, along = 3)
dim(train.y) <- c(512, 512, 1, 30)
train.y[train.y > 0.5] <- 1
train.y[train.y <= 0.5] <- 0
par(mar=rep(0,4), mfcol = c(2, 5))
for (i in 1:10) {
  plot(NA, xlim = 0:1, ylim = 0:1, xaxt = "n", yaxt = "n", bty = "n")
  img = as.raster(train.y[,,,i])
  rasterImage(img, 0, 0, 1, 1, interpolate=FALSE)
}
library(mxnet)
my_iterator_core = function(batch_size) {

  batch = 0
  batch_per_epoch = dim(train.y)[4]/batch_size

  reset = function() {batch <<- 0}

  iter.next = function() {
    batch <<- batch+1
    if (batch > batch_per_epoch) {return(FALSE)} else {return(TRUE)}
  }

  value = function() {
    idx <- 1:batch_size + (batch - 1) * batch_size
    # if the final batch runs past the last sample, pad it with random samples
    idx[idx > dim(train.y)[4]] <- sample(1:dim(train.y)[4], sum(idx > dim(train.y)[4]))
    data <- mx.nd.array(array(train.x[,,,idx], dim = c(dim(train.x)[1:3], batch_size)))
    label <- mx.nd.array(array(train.y[,,,idx], dim = c(dim(train.y)[1:3], batch_size)))
    return(list(data = data, label = label))
  }

  return(list(reset = reset, iter.next = iter.next, value = value, batch_size = batch_size, batch = batch))
}
my_iterator_func <- setRefClass("Custom_Iter",
fields = c("iter", "batch_size"),
contains = "Rcpp_MXArrayDataIter",
methods = list(
initialize = function(iter, batch_size = 100){
.self$iter <- my_iterator_core(batch_size = batch_size)
.self
},
value = function(){
.self$iter$value()
},
iter.next = function(){
.self$iter$iter.next()
},
reset = function(){
.self$iter$reset()
},
finalize=function(){
}
)
)
# Instantiate the iterator (batch_size = 2, matching the 2-image batches shown below)
my_iter <- my_iterator_func(iter = NULL, batch_size = 2)
my_iter$iter.next()
## [1] TRUE
batch_data <- my_iter$value()
par(mar=rep(0,4), mfcol = c(2, 2))
for (i in 1:2) {
  # each column shows one sample: input image on top, label below
  plot(NA, xlim = 0:1, ylim = 0:1, xaxt = "n", yaxt = "n", bty = "n")
  rasterImage(as.raster(as.array(batch_data$data)[,,,i]), 0, 0, 1, 1, interpolate=FALSE)
  plot(NA, xlim = 0:1, ylim = 0:1, xaxt = "n", yaxt = "n", bty = "n")
  rasterImage(as.raster(as.array(batch_data$label)[,,,i]), 0, 0, 1, 1, interpolate=FALSE)
}
# Encoder: five convolution blocks; the four pooling layers halve the spatial size from 512 down to 32
data <- mx.symbol.Variable('data')
bn_data <- mx.symbol.BatchNorm(data = data, fix.gamma = TRUE, name = 'bn_data')
conv1 <- mx.symbol.Convolution(data = bn_data, kernel = c(3, 3), pad = c(1, 1), num_filter = 8, no.bias = TRUE, name = 'conv1')
bn1 <- mx.symbol.BatchNorm(data = conv1, fix.gamma = FALSE, name = 'bn1')
relu1 <- mx.symbol.Activation(data = bn1, act_type = "relu", name = 'relu1')
pool1 <- mx.symbol.Pooling(data = relu1, pool_type = "max", kernel = c(2, 2), stride = c(2, 2), name = 'pool1')
conv2 <- mx.symbol.Convolution(data = pool1, kernel = c(3, 3), pad = c(1, 1), num_filter = 16, no.bias = TRUE, name = 'conv2')
bn2 <- mx.symbol.BatchNorm(data = conv2, fix.gamma = FALSE, name = 'bn2')
relu2 <- mx.symbol.Activation(data = bn2, act_type = "relu", name = 'relu2')
pool2 <- mx.symbol.Pooling(data = relu2, pool_type = "max", kernel = c(2, 2), stride = c(2, 2), name = 'pool2')
conv3 <- mx.symbol.Convolution(data = pool2, kernel = c(3, 3), pad = c(1, 1), num_filter = 32, no.bias = TRUE, name = 'conv3')
bn3 <- mx.symbol.BatchNorm(data = conv3, fix.gamma = FALSE, name = 'bn3')
relu3 <- mx.symbol.Activation(data = bn3, act_type = "relu", name = 'relu3')
pool3 <- mx.symbol.Pooling(data = relu3, pool_type = "max", kernel = c(2, 2), stride = c(2, 2), name = 'pool3')
conv4 <- mx.symbol.Convolution(data = pool3, kernel = c(3, 3), pad = c(1, 1), num_filter = 64, no.bias = TRUE, name = 'conv4')
bn4 <- mx.symbol.BatchNorm(data = conv4, fix.gamma = FALSE, name = 'bn4')
relu4 <- mx.symbol.Activation(data = bn4, act_type = "relu", name = 'relu4')
pool4 <- mx.symbol.Pooling(data = relu4, pool_type = "max", kernel = c(2, 2), stride = c(2, 2), name = 'pool4')
conv5 <- mx.symbol.Convolution(data = pool4, kernel = c(3, 3), pad = c(1, 1), num_filter = 128, no.bias = TRUE, name = 'conv5')
bn5 <- mx.symbol.BatchNorm(data = conv5, fix.gamma = FALSE, name = 'bn5')
relu5 <- mx.symbol.Activation(data = bn5, act_type = "relu", name = 'relu5')
# Decoder: four deconvolution blocks; each doubles the spatial size (32 back up to 512)
deconv6 <- mx.symbol.Deconvolution(data = relu5, kernel = c(2, 2), stride = c(2, 2), num_filter = 64, name = 'deconv6')
bn6 <- mx.symbol.BatchNorm(data = deconv6, fix.gamma = FALSE, name = 'bn6')
relu6 <- mx.symbol.Activation(data = bn6, act_type = "relu", name = 'relu6')
deconv7 <- mx.symbol.Deconvolution(data = relu6, kernel = c(2, 2), stride = c(2, 2), num_filter = 32, name = 'deconv7')
bn7 <- mx.symbol.BatchNorm(data = deconv7, fix.gamma = FALSE, name = 'bn7')
relu7 <- mx.symbol.Activation(data = bn7, act_type = "relu", name = 'relu7')
deconv8 <- mx.symbol.Deconvolution(data = relu7, kernel = c(2, 2), stride = c(2, 2), num_filter = 16, name = 'deconv8')
bn8 <- mx.symbol.BatchNorm(data = deconv8, fix.gamma = FALSE, name = 'bn8')
relu8 <- mx.symbol.Activation(data = bn8, act_type = "relu", name = 'relu8')
deconv9 <- mx.symbol.Deconvolution(data = relu8, kernel = c(2, 2), stride = c(2, 2), num_filter = 8, name = 'deconv9')
bn9 <- mx.symbol.BatchNorm(data = deconv9, fix.gamma = FALSE, name = 'bn9')
relu9 <- mx.symbol.Activation(data = bn9, act_type = "relu", name = 'relu9')
# A 1x1 convolution + sigmoid yields a per-pixel foreground probability
linear_pred <- mx.symbol.Convolution(data = relu9, kernel = c(1, 1), num_filter = 1, name = 'linear_pred')
logistic_pred <- mx.symbol.Activation(data = linear_pred, act_type = 'sigmoid', name = 'logistic_pred')
# CE loss
label <- mx.symbol.Variable(name = 'label')
eps <- 1e-8
ce_loss_pos <- mx.symbol.broadcast_mul(mx.symbol.log(logistic_pred + eps), label)
ce_loss_neg <- mx.symbol.broadcast_mul(mx.symbol.log(1 - logistic_pred + eps), 1 - label)
ce_loss_mean <- 0 - mx.symbol.mean(ce_loss_pos + ce_loss_neg, axis = 0:3)
ce_loss <- mx.symbol.MakeLoss(ce_loss_mean, name = 'ce_loss')
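– In equation form, the loss built above is the mean binary cross-entropy over all pixels, where \(p_i\) is the sigmoid output for pixel \(i\), \(y_i\) its label, and \(\epsilon\) a small constant guarding against \(\log 0\):

\[CE = -\frac{1}{N}\sum_{i=1}^{N}\left[y_i\log(p_i+\epsilon)+(1-y_i)\log(1-p_i+\epsilon)\right]\]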
# Custom metric: the MakeLoss output is the loss itself, so we simply report it
my.eval.metric.loss <- mx.metric.custom(
  name = "ce-loss",
  function(real, pred) {
    return(as.array(pred))
  }
)
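– One caveat: the training calls below use my_optimizer, which is not defined in this excerpt. A minimal sketch, assuming an Adam optimizer with typical hyper-parameters (the original lesson may have used different settings):
# hypothetical optimizer settings
my_optimizer <- mx.opt.create(name = 'adam', learning.rate = 1e-3, beta1 = 0.9, beta2 = 0.999)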
mx.set.seed(0)
model_1 <- mx.model.FeedForward.create(symbol = ce_loss, X = my_iter, optimizer = my_optimizer,
eval.metric = my.eval.metric.loss,
array.batch.size = 2, ctx = mx.gpu(), num.round = 20)
test_img_list <- list()
test_files <- list.files('ISBI/test-volume', pattern = '.jpg', full.names = TRUE)
for (i in 1:length(test_files)) {
test_img_list[[i]] <- readJPEG(test_files[i])
}
test.x <- abind(test_img_list, along = 3)
dim(test.x) <- c(512, 512, 1, 30)
# Swap in the sigmoid output so that predict() returns probabilities instead of the loss
model_1$symbol <- logistic_pred
pred_y.1 <- predict(model_1, test.x)
pred_y.1[pred_y.1 > 0.5] <- 1
pred_y.1[pred_y.1 <= 0.5] <- 0
par(mar = rep(0, 4), mfcol = c(3, 4))
for (i in 1:4) {
  plot(NA, xlim = 0:1, ylim = 0:1, xaxt = "n", yaxt = "n", bty = "n")
  rasterImage(as.raster(test.x[,,,i]), 0, 0, 1, 1, interpolate = FALSE)
  plot(NA, xlim = 0:1, ylim = 0:1, xaxt = "n", yaxt = "n", bty = "n")
  # recode the binary mask as colors: translucent blue where the prediction is 0,
  # fully transparent where it is 1, so the mask can be overlaid on the image
  pred_img <- pred_y.1[,,,i]
  pred_img[pred_img == 0] <- '#0000FF80'
  pred_img[pred_img == 1] <- '#FFFFFF00'
  rasterImage(pred_img, 0, 0, 1, 1, interpolate = FALSE)
  plot(NA, xlim = 0:1, ylim = 0:1, xaxt = "n", yaxt = "n", bty = "n")
  rasterImage(as.raster(test.x[,,,i]), 0, 0, 1, 1, interpolate = FALSE)
  rasterImage(pred_img, 0, 0, 1, 1, interpolate = FALSE)
}
Our autoencoder-style structure already does a good job at segmentation, but let's quickly look at how others approach this task. A quick search shows that the most famous work on this task is the 2015 study by Olaf Ronneberger, Philipp Fischer, and Thomas Brox: U-Net: Convolutional Networks for Biomedical Image Segmentation.
Below is the U-Net structure from the paper. Its key idea is that, besides the straight-through (encoder-decoder) design, it adds shortcut channels connecting feature maps of matching sizes across the network:
data <- mx.symbol.Variable('data')
bn_data <- mx.symbol.BatchNorm(data = data, fix.gamma = TRUE, name = 'bn_data')
conv1 <- mx.symbol.Convolution(data = bn_data, kernel = c(3, 3), pad = c(1, 1), num_filter = 8, no.bias = TRUE, name = 'conv1')
bn1 <- mx.symbol.BatchNorm(data = conv1, fix.gamma = FALSE, name = 'bn1')
relu1 <- mx.symbol.Activation(data = bn1, act_type = "relu", name = 'relu1')
pool1 <- mx.symbol.Pooling(data = relu1, pool_type = "max", kernel = c(2, 2), stride = c(2, 2), name = 'pool1')
conv2 <- mx.symbol.Convolution(data = pool1, kernel = c(3, 3), pad = c(1, 1), num_filter = 16, no.bias = TRUE, name = 'conv2')
bn2 <- mx.symbol.BatchNorm(data = conv2, fix.gamma = FALSE, name = 'bn2')
relu2 <- mx.symbol.Activation(data = bn2, act_type = "relu", name = 'relu2')
pool2 <- mx.symbol.Pooling(data = relu2, pool_type = "max", kernel = c(2, 2), stride = c(2, 2), name = 'pool2')
conv3 <- mx.symbol.Convolution(data = pool2, kernel = c(3, 3), pad = c(1, 1), num_filter = 32, no.bias = TRUE, name = 'conv3')
bn3 <- mx.symbol.BatchNorm(data = conv3, fix.gamma = FALSE, name = 'bn3')
relu3 <- mx.symbol.Activation(data = bn3, act_type = "relu", name = 'relu3')
pool3 <- mx.symbol.Pooling(data = relu3, pool_type = "max", kernel = c(2, 2), stride = c(2, 2), name = 'pool3')
conv4 <- mx.symbol.Convolution(data = pool3, kernel = c(3, 3), pad = c(1, 1), num_filter = 64, no.bias = TRUE, name = 'conv4')
bn4 <- mx.symbol.BatchNorm(data = conv4, fix.gamma = FALSE, name = 'bn4')
relu4 <- mx.symbol.Activation(data = bn4, act_type = "relu", name = 'relu4')
pool4 <- mx.symbol.Pooling(data = relu4, pool_type = "max", kernel = c(2, 2), stride = c(2, 2), name = 'pool4')
conv5 <- mx.symbol.Convolution(data = pool4, kernel = c(3, 3), pad = c(1, 1), num_filter = 128, no.bias = TRUE, name = 'conv5')
bn5 <- mx.symbol.BatchNorm(data = conv5, fix.gamma = FALSE, name = 'bn5')
relu5 <- mx.symbol.Activation(data = bn5, act_type = "relu", name = 'relu5')
deconv6 <- mx.symbol.Deconvolution(data = relu5, kernel = c(2, 2), stride = c(2, 2), num_filter = 64, name = 'deconv6')
bn6 <- mx.symbol.BatchNorm(data = deconv6, fix.gamma = FALSE, name = 'bn6')
relu6 <- mx.symbol.Activation(data = bn6, act_type = "relu", name = 'relu6')
# U-Net skip connection: concatenate the upsampled features with the
# same-sized encoder features along the channel dimension
concat7 <- mx.symbol.concat(data = list(relu6, relu4), num.args = 2, dim = 1, name = 'concat7')
deconv7 <- mx.symbol.Deconvolution(data = concat7, kernel = c(2, 2), stride = c(2, 2), num_filter = 32, name = 'deconv7')
bn7 <- mx.symbol.BatchNorm(data = deconv7, fix.gamma = FALSE, name = 'bn7')
relu7 <- mx.symbol.Activation(data = bn7, act_type = "relu", name = 'relu7')
concat8 <- mx.symbol.concat(data = list(relu7, relu3), num.args = 2, dim = 1, name = 'concat8')
deconv8 <- mx.symbol.Deconvolution(data = concat8, kernel = c(2, 2), stride = c(2, 2), num_filter = 16, name = 'deconv8')
bn8 <- mx.symbol.BatchNorm(data = deconv8, fix.gamma = FALSE, name = 'bn8')
relu8 <- mx.symbol.Activation(data = bn8, act_type = "relu", name = 'relu8')
concat9 <- mx.symbol.concat(data = list(relu8, relu2), num.args = 2, dim = 1, name = 'concat9')
deconv9 <- mx.symbol.Deconvolution(data = concat9, kernel = c(2, 2), stride = c(2, 2), num_filter = 8, name = 'deconv9')
bn9 <- mx.symbol.BatchNorm(data = deconv9, fix.gamma = FALSE, name = 'bn9')
relu9 <- mx.symbol.Activation(data = bn9, act_type = "relu", name = 'relu9')
concat10 <- mx.symbol.concat(data = list(relu9, relu1), num.args = 2, dim = 1, name = 'concat10')
linear_pred <- mx.symbol.Convolution(data = concat10, kernel = c(1, 1), num_filter = 1, name = 'linear_pred')
logistic_pred <- mx.symbol.Activation(data = linear_pred, act_type = 'sigmoid', name = 'logistic_pred')
# CE loss
label <- mx.symbol.Variable(name = 'label')
eps <- 1e-8
ce_loss_pos <- mx.symbol.broadcast_mul(mx.symbol.log(logistic_pred + eps), label)
ce_loss_neg <- mx.symbol.broadcast_mul(mx.symbol.log(1 - logistic_pred + eps), 1 - label)
ce_loss_mean <- 0 - mx.symbol.mean(ce_loss_pos + ce_loss_neg, axis = 0:3)
ce_loss <- mx.symbol.MakeLoss(ce_loss_mean, name = 'ce_loss')
my.eval.metric.loss <- mx.metric.custom(
name = "ce-loss",
function(real, pred) {
return(as.array(pred))
}
)
mx.set.seed(0)
model_2 <- mx.model.FeedForward.create(symbol = ce_loss, X = my_iter, optimizer = my_optimizer,
eval.metric = my.eval.metric.loss,
array.batch.size = 2, ctx = mx.gpu(), num.round = 20)
model_2$symbol <- logistic_pred
pred_y.2 <- predict(model_2, test.x)
pred_y.2[pred_y.2 > 0.5] <- 1
pred_y.2[pred_y.2 <= 0.5] <- 0
par(mar = rep(0, 4), mfcol = c(3, 4))
for (i in 1:2) {
  # first column per sample: model_1 prediction (image, mask, overlay)
  plot(NA, xlim = 0:1, ylim = 0:1, xaxt = "n", yaxt = "n", bty = "n")
  rasterImage(as.raster(test.x[,,,i]), 0, 0, 1, 1, interpolate = FALSE)
  plot(NA, xlim = 0:1, ylim = 0:1, xaxt = "n", yaxt = "n", bty = "n")
  pred_img <- pred_y.1[,,,i]
  pred_img[pred_img == 0] <- '#0000FF80'
  pred_img[pred_img == 1] <- '#FFFFFF00'
  rasterImage(pred_img, 0, 0, 1, 1, interpolate = FALSE)
  plot(NA, xlim = 0:1, ylim = 0:1, xaxt = "n", yaxt = "n", bty = "n")
  rasterImage(as.raster(test.x[,,,i]), 0, 0, 1, 1, interpolate = FALSE)
  rasterImage(pred_img, 0, 0, 1, 1, interpolate = FALSE)
  # second column per sample: model_2 (U-Net) prediction
  plot(NA, xlim = 0:1, ylim = 0:1, xaxt = "n", yaxt = "n", bty = "n")
  rasterImage(as.raster(test.x[,,,i]), 0, 0, 1, 1, interpolate = FALSE)
  plot(NA, xlim = 0:1, ylim = 0:1, xaxt = "n", yaxt = "n", bty = "n")
  pred_img <- pred_y.2[,,,i]
  pred_img[pred_img == 0] <- '#0000FF80'
  pred_img[pred_img == 1] <- '#FFFFFF00'
  rasterImage(pred_img, 0, 0, 1, 1, interpolate = FALSE)
  plot(NA, xlim = 0:1, ylim = 0:1, xaxt = "n", yaxt = "n", bty = "n")
  rasterImage(as.raster(test.x[,,,i]), 0, 0, 1, 1, interpolate = FALSE)
  rasterImage(pred_img, 0, 0, 1, 1, interpolate = FALSE)
}
data <- mx.symbol.Variable('data')
bn_data <- mx.symbol.BatchNorm(data = data, fix.gamma = TRUE, name = 'bn_data')
conv1 <- mx.symbol.Convolution(data = bn_data, kernel = c(3, 3), pad = c(1, 1), num_filter = 8, no.bias = TRUE, name = 'conv1')
bn1 <- mx.symbol.BatchNorm(data = conv1, fix.gamma = FALSE, name = 'bn1')
relu1 <- mx.symbol.Activation(data = bn1, act_type = "relu", name = 'relu1')
pool1 <- mx.symbol.Pooling(data = relu1, pool_type = "max", kernel = c(2, 2), stride = c(2, 2), name = 'pool1')
conv2 <- mx.symbol.Convolution(data = pool1, kernel = c(3, 3), pad = c(1, 1), num_filter = 16, no.bias = TRUE, name = 'conv2')
bn2 <- mx.symbol.BatchNorm(data = conv2, fix.gamma = FALSE, name = 'bn2')
relu2 <- mx.symbol.Activation(data = bn2, act_type = "relu", name = 'relu2')
pool2 <- mx.symbol.Pooling(data = relu2, pool_type = "max", kernel = c(2, 2), stride = c(2, 2), name = 'pool2')
conv3 <- mx.symbol.Convolution(data = pool2, kernel = c(3, 3), pad = c(1, 1), num_filter = 32, no.bias = TRUE, name = 'conv3')
bn3 <- mx.symbol.BatchNorm(data = conv3, fix.gamma = FALSE, name = 'bn3')
relu3 <- mx.symbol.Activation(data = bn3, act_type = "relu", name = 'relu3')
pool3 <- mx.symbol.Pooling(data = relu3, pool_type = "max", kernel = c(2, 2), stride = c(2, 2), name = 'pool3')
conv4 <- mx.symbol.Convolution(data = pool3, kernel = c(3, 3), pad = c(1, 1), num_filter = 64, no.bias = TRUE, name = 'conv4')
bn4 <- mx.symbol.BatchNorm(data = conv4, fix.gamma = FALSE, name = 'bn4')
relu4 <- mx.symbol.Activation(data = bn4, act_type = "relu", name = 'relu4')
# Bilinear upsampling branch: bring the 64x64 features from relu4 up to 128x128
up4 <- mx.symbol.UpSampling(data = relu4, num_args = 1, scale = 2, sample_type = 'bilinear', num_filter = 64, name = 'up4')
pool4 <- mx.symbol.Pooling(data = relu4, pool_type = "max", kernel = c(2, 2), stride = c(2, 2), name = 'pool4')
conv5 <- mx.symbol.Convolution(data = pool4, kernel = c(3, 3), pad = c(1, 1), num_filter = 128, no.bias = TRUE, name = 'conv5')
bn5 <- mx.symbol.BatchNorm(data = conv5, fix.gamma = FALSE, name = 'bn5')
relu5 <- mx.symbol.Activation(data = bn5, act_type = "relu", name = 'relu5')
up5 <- mx.symbol.UpSampling(data = relu5, num_args = 1, scale = 4, sample_type = 'bilinear', num_filter = 128, name = 'up5')
# merge three feature maps of matching 128x128 size taken from different depths
concat6 <- mx.symbol.concat(data = list(relu3, up4, up5), num.args = 3, dim = 1, name = 'concat6')
deconv6 <- mx.symbol.Deconvolution(data = concat6, kernel = c(2, 2), stride = c(2, 2), num_filter = 16, name = 'deconv6')
bn6 <- mx.symbol.BatchNorm(data = deconv6, fix.gamma = FALSE, name = 'bn6')
relu6 <- mx.symbol.Activation(data = bn6, act_type = "relu", name = 'relu6')
deconv7 <- mx.symbol.Deconvolution(data = relu6, kernel = c(2, 2), stride = c(2, 2), num_filter = 8, name = 'deconv7')
bn7 <- mx.symbol.BatchNorm(data = deconv7, fix.gamma = FALSE, name = 'bn7')
relu7 <- mx.symbol.Activation(data = bn7, act_type = "relu", name = 'relu7')
linear_pred <- mx.symbol.Convolution(data = relu7, kernel = c(1, 1), num_filter = 1, name = 'linear_pred')
logistic_pred <- mx.symbol.Activation(data = linear_pred, act_type = 'sigmoid', name = 'logistic_pred')
# CE loss
label <- mx.symbol.Variable(name = 'label')
eps <- 1e-8
ce_loss_pos <- mx.symbol.broadcast_mul(mx.symbol.log(logistic_pred + eps), label)
ce_loss_neg <- mx.symbol.broadcast_mul(mx.symbol.log(1 - logistic_pred + eps), 1 - label)
ce_loss_mean <- 0 - mx.symbol.mean(ce_loss_pos + ce_loss_neg, axis = 0:3)
ce_loss <- mx.symbol.MakeLoss(ce_loss_mean, name = 'ce_loss')
my.eval.metric.loss <- mx.metric.custom(
name = "ce-loss",
function(real, pred) {
return(as.array(pred))
}
)
mx.set.seed(0)
model_3 <- mx.model.FeedForward.create(symbol = ce_loss, X = my_iter, optimizer = my_optimizer,
eval.metric = my.eval.metric.loss,
array.batch.size = 2, ctx = mx.gpu(), num.round = 20)
model_3$symbol <- logistic_pred
pred_y.3 <- predict(model_3, test.x)
pred_y.3[pred_y.3 > 0.5] <- 1
pred_y.3[pred_y.3 <= 0.5] <- 0
par(mar = rep(0, 4), mfcol = c(3, 3))
# first column: model_1 (plain encoder-decoder)
plot(NA, xlim = 0:1, ylim = 0:1, xaxt = "n", yaxt = "n", bty = "n")
rasterImage(as.raster(test.x[,,,1]), 0, 0, 1, 1, interpolate = FALSE)
plot(NA, xlim = 0:1, ylim = 0:1, xaxt = "n", yaxt = "n", bty = "n")
pred_img <- pred_y.1[,,,1]
pred_img[pred_img == 0] <- '#0000FF80'
pred_img[pred_img == 1] <- '#FFFFFF00'
rasterImage(pred_img, 0, 0, 1, 1, interpolate = FALSE)
plot(NA, xlim = 0:1, ylim = 0:1, xaxt = "n", yaxt = "n", bty = "n")
rasterImage(as.raster(test.x[,,,1]), 0, 0, 1, 1, interpolate = FALSE)
rasterImage(pred_img, 0, 0, 1, 1, interpolate = FALSE)
# second column: model_2 (U-Net-style skip connections)
plot(NA, xlim = 0:1, ylim = 0:1, xaxt = "n", yaxt = "n", bty = "n")
rasterImage(as.raster(test.x[,,,1]), 0, 0, 1, 1, interpolate = FALSE)
plot(NA, xlim = 0:1, ylim = 0:1, xaxt = "n", yaxt = "n", bty = "n")
pred_img <- pred_y.2[,,,1]
pred_img[pred_img == 0] <- '#0000FF80'
pred_img[pred_img == 1] <- '#FFFFFF00'
rasterImage(pred_img, 0, 0, 1, 1, interpolate = FALSE)
plot(NA, xlim = 0:1, ylim = 0:1, xaxt = "n", yaxt = "n", bty = "n")
rasterImage(as.raster(test.x[,,,1]), 0, 0, 1, 1, interpolate = FALSE)
rasterImage(pred_img, 0, 0, 1, 1, interpolate = FALSE)
# third column: model_3 (multi-scale upsampling)
plot(NA, xlim = 0:1, ylim = 0:1, xaxt = "n", yaxt = "n", bty = "n")
rasterImage(as.raster(test.x[,,,1]), 0, 0, 1, 1, interpolate = FALSE)
plot(NA, xlim = 0:1, ylim = 0:1, xaxt = "n", yaxt = "n", bty = "n")
pred_img <- pred_y.3[,,,1]
pred_img[pred_img == 0] <- '#0000FF80'
pred_img[pred_img == 1] <- '#FFFFFF00'
rasterImage(pred_img, 0, 0, 1, 1, interpolate = FALSE)
plot(NA, xlim = 0:1, ylim = 0:1, xaxt = "n", yaxt = "n", bty = "n")
rasterImage(as.raster(test.x[,,,1]), 0, 0, 1, 1, interpolate = FALSE)
rasterImage(pred_img, 0, 0, 1, 1, interpolate = FALSE)
– Take a moment to think about how you might do this. The key point is that a prediction function must produce a "fixed" number of outputs, so predicting a "variable" number of bounding boxes is itself a major challenge!
– For example, as shown in the figure below, we can turn this into a classification task on each candidate box, checking whether the box contains an object we want to detect:
– The concept of the model is illustrated in the figure below:
– Its rough idea is to use simple color-scale transformations to decide how many separable objects an image contains, and R-CNN uses this method to generate its candidate boxes:
– Worse still, the entire procedure has to be rerun at prediction time, so clearly there is still plenty of room for improvement!
– The biggest problem with R-CNN is that the convolutional network is run so many times; can't these computations be consolidated?
The convolutional network used for feature extraction operates on the entire image rather than on each candidate box, and this network is also part of the training process.
The search for candidate boxes runs on the output of the convolutional network rather than on the raw image.
In R-CNN, the differently shaped proposal regions are resized to a common shape before feature extraction. In Fast R-CNN, to fold this whole procedure into the network's inference, a Region of Interest Pooling (RoI Pooling) layer is introduced to resize each candidate region.
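– As a rough illustration (code not from the lesson), MXNet exposes this operator as mx.symbol.ROIPooling; a minimal sketch with hypothetical shapes:
feat <- mx.symbol.Variable('feat')  # shared backbone feature map
rois <- mx.symbol.Variable('rois')  # one row per region: (batch_index, x1, y1, x2, y2)
# pool every region to a fixed 7x7 grid; spatial_scale maps image coordinates
# onto a feature map that is 16x smaller than the input
roi_pool <- mx.symbol.ROIPooling(data = feat, rois = rois, pooled_size = c(7, 7), spatial_scale = 1/16, name = 'roi_pool')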
– A special note on Fast R-CNN's candidate-box extraction: a large part of its speedup over R-CNN comes from predefining a very large series of candidate boxes (also called anchor boxes).
– The biggest difference from R-CNN: in R-CNN each image gets its own candidate boxes, whereas in Fast R-CNN every image uses the same set of candidate boxes.
– Not long after Fast R-CNN appeared, Ross Girshick and his colleagues Kaiming He (何愷明), Shaoqing Ren, and Jian Sun published Faster R-CNN: Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks
– Faster R-CNN splits this process into two steps: a Region Proposal Network first handles the relatively simple task (deciding whether each anchor box contains an object and predicting its offsets), and a downstream network then performs the final classification and offset prediction.
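– As a rough sketch of this idea (hypothetical code, not the paper's implementation): with k = 9 anchors per position, the proposal head is just two small convolutions on top of the shared feature map:
feat <- mx.symbol.Variable('feat')  # shared backbone feature map
rpn_conv <- mx.symbol.Convolution(data = feat, kernel = c(3, 3), pad = c(1, 1), num_filter = 256, name = 'rpn_conv')
rpn_relu <- mx.symbol.Activation(data = rpn_conv, act_type = 'relu', name = 'rpn_relu')
rpn_cls <- mx.symbol.Convolution(data = rpn_relu, kernel = c(1, 1), num_filter = 2 * 9, name = 'rpn_cls')  # object / not object per anchor
rpn_reg <- mx.symbol.Convolution(data = rpn_relu, kernel = c(1, 1), num_filter = 4 * 9, name = 'rpn_reg')  # 4 box offsets per anchor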
Another pioneering work in object detection is YOLO, published in 2016 by Joseph Redmon, Santosh Divvala, Ross Girshick, and Ali Farhadi: You Only Look Once: Unified, Real-Time Object Detection
This was a breakthrough shift in logic. Every object detection network before it first tried to find the approximate locations of the boxes, applied RoI Pooling to the image inside those regions, and only then made its predictions.
– YOLO's logic is different: instead of trying to locate the boxes approximately, it searches directly for "object centers" and uses the information at each "object center" to directly predict the following (see the sketch after this list):
The center position of the box (x and y coordinates)
The width and height of the box
The object class the box belongs to
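– As a rough sketch of such an output head (hypothetical code, in the style of the convolutional heads used by later YOLO versions): for an S x S grid with B boxes per cell and C classes, each cell predicts B * (5 + C) numbers:
S <- 13; B <- 3; C <- 80  # e.g. the 32x-downsampled branch of YOLO v3 on COCO
feat <- mx.symbol.Variable('feat')  # an S x S backbone feature map
# each of the B boxes contributes x, y, w, h, and an objectness score, plus C class scores
yolo_head <- mx.symbol.Convolution(data = feat, kernel = c(1, 1), num_filter = B * (5 + C), name = 'yolo_head')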
Compared with the R-CNN family, YOLO drops the region proposal process entirely and predicts the target boxes and object classes from very limited information (the center information alone), which inevitably hurt accuracy quite badly. Later models therefore set out to solve this lack of information!
SSD (Single Shot MultiBox Detector) is a model built on the YOLO concept: it uses feature maps at several different scales to do something similar to YOLO, successfully raising accuracy to a level on par with Faster R-CNN:
– Besides multi-scale prediction, another difference from YOLO is that SSD does not predict box coordinates from scratch: it predefines a series of anchor boxes and predicts the offsets between the target boxes and those anchor boxes.
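– A minimal numeric sketch of this idea (hypothetical values; the exact parameterization, e.g. SSD's variance terms, differs between papers):
anchor <- c(cx = 0.5, cy = 0.5, w = 0.2, h = 0.3)  # a predefined anchor box (normalized coordinates)
offset <- c(dx = 0.1, dy = -0.05, dw = 0.2, dh = -0.1)  # predicted offsets for this anchor
pred_cx <- anchor[['cx']] + offset[['dx']] * anchor[['w']]  # shift the center
pred_cy <- anchor[['cy']] + offset[['dy']] * anchor[['h']]
pred_w <- anchor[['w']] * exp(offset[['dw']])  # rescale the width and height
pred_h <- anchor[['h']] * exp(offset[['dh']])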
– The paper is Feature Pyramid Networks for Object Detection; look at the author list and you will find Ross Girshick and Kaiming He yet again.
– This model was trained on the dataset from the COCO Detection Challenge and detects 80 object categories. You can download the complete code and model here; use the included code to make predictions, and try different images to get a feel for its power!
– Note that this is not the original YOLO v3 model but a smaller one with slightly lower accuracy. What really matters is whether you can work out, from the code, the paper, and online resources, how this state-of-the-art model performs object detection!
Because this model's predictions consist of 3 parts (downsampled by factors of 8, 16, and 32), the ordinary predict function cannot be used, so the code includes a "my_predict" function; you should first understand its output format!
Next, since the output has to be re-encoded into bounding-box format, we need to load anchor_boxs (yolo v3).RData, which records the widths and heights of 9 anchor boxes. With these anchor boxes, we decode as follows:
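– The original decoding code is not reproduced in this excerpt; a minimal sketch of the standard YOLO v3 decoding rule it follows (hypothetical helper, assuming raw outputs t for one box at grid cell (cx, cy), an anchor of size pw x ph in pixels, and a stride s of 8, 16, or 32):
sigmoid <- function(x) {1 / (1 + exp(-x))}
decode_box <- function(t, cx, cy, pw, ph, s) {
  bx <- (sigmoid(t[1]) + cx) * s  # box center x in pixels
  by <- (sigmoid(t[2]) + cy) * s  # box center y in pixels
  bw <- pw * exp(t[3])  # box width in pixels
  bh <- ph * exp(t[4])  # box height in pixels
  conf <- sigmoid(t[5])  # objectness score
  c(bx = bx, by = by, bw = bw, bh = bh, conf = conf)
}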
– As for how to train an object detection model yourself, that raises many more difficulties and will seriously test your programming skills!
You may well be amazed by the potential of deep learning models. Remember the goal we set in the first lesson? We want to develop a prediction function that maps "any \(x\)" to "any \(y\)". At the very beginning this was hard to imagine, but by this point in the course you should start to feel that it really is achievable!
In the next lesson we will walk through a simple task step by step: Pikachu detection! Since the next lesson is genuinely difficult, I suggest you first download the example from GitHub: MxNetR-YOLO, run the whole Pikachu detection pipeline once, and make sure you understand what every line of code does: