Deep Learning: Theory and Practice

林嶔 (Lin, Chin)

Lesson 13: Object Detection Model Experiments

Section 1: Training an Object Detection Model (1)

– For image classification models, we usually use pooling to shrink the feature map down to a 1×1×n feature and feed it into a Softmax regression output

– The YOLO model, by contrast, needs a whole set of outputs for every grid cell, so it skips the pooling and instead applies 1×1 convolution kernels directly, producing a 7×7×m output that we finally decode (see the shape sketch after the figure).

F01
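
– A minimal shape sketch (standalone code, with a hypothetical 512-channel feature map): a 1×1 convolution keeps the spatial grid while remapping the channels, exactly as assumed above:

library(mxnet)

feature_map <- mx.symbol.Variable('data')
yolo_head <- mx.symbol.Convolution(data = feature_map, kernel = c(1, 1), stride = c(1, 1), pad = c(0, 0),
                                   num.filter = 6, name = 'yolo_head')

mx.symbol.infer.shape(yolo_head, data = c(7, 7, 512, 1))$out.shapes
# expected shape: 7 7 6 1, i.e. one 6-value prediction vector per grid cell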

Section 1: Training an Object Detection Model (2)

F02

– So, assuming the final feature map has size 7×7×n, the output of the YOLO structure in the Pikachu detection task will be 7×7×6, with the six channels defined as follows:

  1. Confidence: a value that must lie between 0 and 1, so it has to pass through a Sigmoid transform before being output

  2. The "relative" y position (row): also a value that must lie between 0 and 1

  3. The "relative" x position (column): also a value that must lie between 0 and 1

  4. Width (x-axis): a value that must be greater than 0. An exponential transform could map any raw value into this range, but the conventional approach is to log-transform the label instead, leaving the raw output untouched

  5. Height (y-axis): treated the same way as the width

  6. Probability of class 1: YOLO v1 passes classes 1 through N through a joint Softmax, but YOLO v3 changed all of these to Sigmoid outputs to allow multi-label objects

– Two questions to think about (a numeric sketch of the encoding follows below):

  1. Why use "relative" rather than "absolute" positions?

  2. Why do the height/width outputs log-transform the label and leave the raw output unchanged, instead of using ReLU or an exponential transform?
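
– A minimal numeric sketch of this encoding, using the first training box that appears later in this lesson (an 8×8 grid is assumed here):

n.grid <- 8
col_left <- 0.6267570; col_right <- 0.7256063   # the box, in [0, 1] image units
row_top <- 0.3013253; row_bot <- 0.4658268

center_col <- (col_left + col_right) / 2 * n.grid   # 5.41: the center falls in grid column 6
center_row <- (row_top + row_bot) / 2 * n.grid      # 3.07: the center falls in grid row 4

center_col %% 1   # 0.409: the "relative" x position, always within [0, 1] no matter which cell
center_row %% 1   # 0.069: the "relative" y position, always within [0, 1]

# Width/height: the label is log-transformed, so a raw network output of any sign
# decodes back to a positive width via exp() at prediction time
log((col_right - col_left) * n.grid)   # -0.235: the stored width label
exp(-0.235) / n.grid                   # 0.099: decoding recovers the original width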

Section 1: Training an Object Detection Model (3)

– Let's first download a MobileNet v2 image recognition model from here, and try out its image classification performance:

library(mxnet)
library(imager)
library(jpeg)
library(OpenImageR)
library(magrittr)

#Load a pre-trained MobileNet v2 model

mobile_model <- mx.model.load("model/mobilev2", 0)
label_names <- readLines("model/synset.txt", encoding = "UTF-8")

#Define image processing functions

preproc.image <- function(im, width = 224, height = 224, method = 'bilinear') {
  resized <- resizeImage(image = im, width = width, height = height, method = method)
  resized <- as.array(resized) * 255
  resized[,,1] <- resized[,,1] - 123.68
  resized[,,2] <- resized[,,2] - 116.78
  resized[,,3] <- resized[,,3] - 103.94
  # Reshape to format needed by mxnet (width, height, channel, num)
  dim(resized) <- c(width, height, 3, 1)
  return(resized)
}

#Read image

img <- readJPEG("image/4.jpg")

#Pre-processing

normed <- preproc.image(img)

#Display image

par(mar = rep(0, 4))
plot(NA, xlim = c(0.04, 0.96), ylim = c(0.04, 0.96), xaxt = "n", yaxt = "n", bty = "n")
rasterImage(img, 0, 0, 1, 1, interpolate = FALSE)

#Predict

prob <- predict(mobile_model, X = normed, ctx = mx.gpu())
cat(paste0(label_names[which.max(prob)], ': ', formatC(max(prob), 4, format = 'f'), '\n'))
## n02497673 Madagascar cat, ring-tailed lemur, Lemur catta: 1.0000

Section 1: Training an Object Detection Model (4)

– The functions "DWCONV_function" and "CONV_function" here simply add more convolution layers on top of the existing base; the key part is the function "YOLO_map_function".

– From the definitions just given, you will find that every output except the height/width ones (items 4 and 5) needs a Sigmoid transform. So we first split the channels apart with the function "mx.symbol.SliceChannel", process each group accordingly, and then merge them back with the function "mx.symbol.concat".

# Libraries

library(mxnet)
library(magrittr)

## Define the model architecture
## Use a pre-trained model and fine-tune it

# Load MobileNet v2

Pre_Trained_model <- mx.model.load('model/mobilev2', 0)

# Get the internal output

Mobile_symbol <- Pre_Trained_model$symbol

Mobile_All_layer <- Mobile_symbol$get.internals()

basic_out <- which(Mobile_All_layer$outputs == 'conv6_3_linear_bn_output') %>% Mobile_All_layer$get.output()

# mx.symbol.infer.shape(basic_out, data = c(256, 256, 3, 7))$out.shapes
# conv6_3_linear_bn_output out shape = 8 8 320 n (if input shape = 256 256 3 n)

# Convolution layers for the specific task, with newly trained parameters

# 1. Add some extra architecture for better learning

DWCONV_function <- function (indata, num_filters = 256, Inverse_coef = 6, residual = TRUE, name = 'lvl1', stage = 1) {
  
  expend_conv <- mx.symbol.Convolution(data = indata, kernel = c(1, 1), stride = c(1, 1), pad = c(0, 0),
                                       no.bias = TRUE, num.filter = num_filters * Inverse_coef,
                                       name = paste0(name, '_', stage, '_expend'))
  expend_bn <- mx.symbol.BatchNorm(data = expend_conv, fix_gamma = FALSE, name = paste0(name, '_', stage, '_expend_bn'))
  expend_relu <- mx.symbol.LeakyReLU(data = expend_bn, act.type = 'leaky', slope = 0.1, name = paste0(name, '_', stage, '_expend_relu'))
  
  dwise_conv <- mx.symbol.Convolution(data = expend_relu, kernel = c(3, 3), stride = c(1, 1), pad = c(1, 1),
                                      no.bias = TRUE, num.filter = num_filters * Inverse_coef, num.group = num_filters * Inverse_coef,
                                      name = paste0(name, '_', stage, '_dwise'))
  dwise_bn <- mx.symbol.BatchNorm(data = dwise_conv, fix_gamma = FALSE, name = paste0(name, '_', stage, '_dwise_bn'))
  dwise_relu <- mx.symbol.LeakyReLU(data = dwise_bn, act.type = 'leaky', slope = 0.1, name = paste0(name, '_', stage, '_dwise_relu'))
  
  restore_conv <- mx.symbol.Convolution(data = dwise_relu, kernel = c(1, 1), stride = c(1, 1), pad = c(0, 0),
                                        no.bias = TRUE, num.filter = num_filters,
                                        name = paste0(name, '_', stage, '_restore'))
  restore_bn <- mx.symbol.BatchNorm(data = restore_conv, fix_gamma = FALSE, name = paste0(name, '_', stage, '_restore_bn'))
  
  if (residual) {
    
    block <- mx.symbol.broadcast_plus(lhs = indata, rhs = restore_bn, name = paste0(name, '_', stage, '_block'))
    return(block)
    
  } else {
    
    restore_relu <- mx.symbol.LeakyReLU(data = restore_bn, act.type = 'leaky', slope = 0.1, name = paste0(name, '_', stage, '_restore_relu'))
    return(restore_relu)
    
  }
  
}

CONV_function <- function (indata, num_filters = 256, name = 'lvl1', stage = 1) {
  
  conv <- mx.symbol.Convolution(data = indata, kernel = c(1, 1), stride = c(1, 1), pad = c(0, 0),
                                no.bias = TRUE, num.filter = num_filters,
                                name = paste0(name, '_', stage, '_conv'))
  bn <- mx.symbol.BatchNorm(data = conv, fix_gamma = FALSE, name = paste0(name, '_', stage, '_bn'))
  relu <- mx.symbol.Activation(data = bn, act.type = 'relu', name = paste0(name, '_', stage, '_relu'))
  
  return(relu)
  
}

YOLO_map_function <- function (indata, final_map = 6, num_box = 1, drop = 0.2, name = 'lvl1') {
  
  dp <- mx.symbol.Dropout(data = indata, p = drop, name = paste0(name, '_drop'))
  
  conv <- mx.symbol.Convolution(data = dp, kernel = c(1, 1), stride = c(1, 1), pad = c(0, 0),
                                no.bias = FALSE, num.filter = final_map, name = paste0(name, '_linearmap'))
  
  inter_split <- mx.symbol.SliceChannel(data = conv, num_outputs = final_map,
                                        axis = 1, squeeze_axis = FALSE, name = paste0(name, "_inter_split"))
  
  new_list <- list()
  
  for (k in 1:final_map) {
    # Features 4 and 5 (width/height) keep the raw linear output; everything else is Sigmoid-transformed.
    # Note: the feature index within each box is k %% (final_map/num_box), not k %% num_box.
    if (!(k %% (final_map/num_box)) %in% c(4:5)) {
      new_list[[k]] <- mx.symbol.Activation(inter_split[[k]], act.type = 'sigmoid', name = paste0(name, "_yolomap_", k))
    } else {
      new_list[[k]] <- inter_split[[k]]
    }
  }
  
  yolomap <- mx.symbol.concat(data = new_list, num.args = final_map, dim = 1, name = paste0(name, "_yolomap"))
  
  return(yolomap)
  
}

yolo_conv_1 <- DWCONV_function(indata = basic_out, num_filters = 320, Inverse_coef = 3, residual = TRUE, name = 'yolo', stage = 1)
yolo_conv_2 <- DWCONV_function(indata = yolo_conv_1, num_filters = 320, Inverse_coef = 3, residual = TRUE, name = 'yolo', stage = 2)
yolo_conv_3 <- CONV_function(indata = yolo_conv_2, num_filters = 320, name = 'yolo', stage = 3)

yolomap <- YOLO_map_function(indata = yolo_conv_3, final_map = 6, drop = 0.2, name = 'final')

Section 1: Training an Object Detection Model (5)

F03

  1. The first part is the loss for the y and x coordinates

  2. The second part is the loss for the width and height

  3. The third part is the confidence loss for boxes that should detect an object but get it wrong

  4. The fourth part is the confidence loss for boxes that should be skipped but get it wrong

  5. The fifth part is the loss for the class-n probabilities

– In addition, there are two parameters, \(\lambda_{coord}\) and \(\lambda_{noobj}\), which the YOLO v1 paper suggests setting to 5 and 0.5 respectively. Object detection is an extremely class-imbalanced task, so the positive samples are given the higher weight. The full formula is reproduced below for reference.
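
– This is the loss from the YOLO v1 paper; its five terms line up with the five parts listed above (here \(\mathbb{1}_{ij}^{obj}\) indicates that box \(j\) of grid cell \(i\) is responsible for an object):

\[
\begin{aligned}
L = \ & \lambda_{coord} \sum \limits_{i=0}^{S^2} \sum \limits_{j=0}^{B} \mathbb{1}_{ij}^{obj} \left[ (x_i - \hat{x}_i)^2 + (y_i - \hat{y}_i)^2 \right] + \lambda_{coord} \sum \limits_{i=0}^{S^2} \sum \limits_{j=0}^{B} \mathbb{1}_{ij}^{obj} \left[ (\sqrt{w_i} - \sqrt{\hat{w}_i})^2 + (\sqrt{h_i} - \sqrt{\hat{h}_i})^2 \right] \\
+ \ & \sum \limits_{i=0}^{S^2} \sum \limits_{j=0}^{B} \mathbb{1}_{ij}^{obj} (C_i - \hat{C}_i)^2 + \lambda_{noobj} \sum \limits_{i=0}^{S^2} \sum \limits_{j=0}^{B} \mathbb{1}_{ij}^{noobj} (C_i - \hat{C}_i)^2 + \sum \limits_{i=0}^{S^2} \mathbb{1}_{i}^{obj} \sum \limits_{c} (p_i(c) - \hat{p}_i(c))^2
\end{aligned}
\]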

Section 1: Training an Object Detection Model (6)

– Of course, there is no way for us to modify the treatment of the y-coordinate and x-coordinate parts.

# 2. Custom loss function

MSE_loss_function <- function (indata, inlabel, obj, lambda) {
  
  diff_pred_label <- mx.symbol.broadcast_minus(lhs = indata, rhs = inlabel)
  square_diff_pred_label <- mx.symbol.square(data = diff_pred_label)
  obj_square_diff_loss <- mx.symbol.broadcast_mul(lhs = obj, rhs = square_diff_pred_label)
  MSE_loss <- mx.symbol.mean(data = obj_square_diff_loss, axis = 0:3, keepdims = FALSE)
  
  return(MSE_loss * lambda)
  
}

CE_loss_function <- function (indata, inlabel, obj, lambda, eps = 1e-4) {
  
  log_pred_1 <- mx.symbol.log(data = indata + eps)
  log_pred_2 <- mx.symbol.log(data = 1 - indata + eps)
  multiple_log_pred_label_1 <- mx.symbol.broadcast_mul(lhs = log_pred_1, rhs = inlabel)
  multiple_log_pred_label_2 <- mx.symbol.broadcast_mul(lhs = log_pred_2, rhs = 1 - inlabel)
  obj_weighted_loss <- mx.symbol.broadcast_mul(lhs = obj, rhs = multiple_log_pred_label_1 + multiple_log_pred_label_2)
  average_CE_loss <- mx.symbol.mean(data = obj_weighted_loss, axis = 0:3, keepdims = FALSE)
  CE_loss <- 0 - average_CE_loss * lambda
  
  return(CE_loss)
  
}

YOLO_loss_function <- function (indata, inlabel, final_map = 6, num_box = 1, lambda = 10, weight_classification = 0.2, name = 'yolo') {
  
  num_feature <- final_map/num_box
  
  my_loss <- 0
  
  yolomap_split <- mx.symbol.SliceChannel(data = indata, num_outputs = final_map, 
                                          axis = 1, squeeze_axis = FALSE, name = paste0(name, '_yolomap_split'))
  
  label_split <- mx.symbol.SliceChannel(data = inlabel, num_outputs = final_map, 
                                        axis = 1, squeeze_axis = FALSE, name = paste0(name, '_label_split'))
  
  for (j in 1:num_box) {
    for (k in 1:num_feature) {
      if (k %in% 1:5) {weight <- 1} else {weight <- weight_classification}
      if (!k %in% c(2:5)) {
        if (k == 1) {
          my_loss <- my_loss + CE_loss_function(indata = yolomap_split[[(j-1)*num_feature+k]],
                                                inlabel = label_split[[(j-1)*num_feature+k]],
                                                obj = label_split[[(j-1)*num_feature+1]],
                                                lambda = lambda * weight,
                                                eps = 1e-4)
          my_loss <- my_loss + CE_loss_function(indata = yolomap_split[[(j-1)*num_feature+k]],
                                                inlabel = label_split[[(j-1)*num_feature+k]],
                                                obj = 1 - label_split[[(j-1)*num_feature+1]],
                                                lambda = 1,
                                                eps = 1e-4)
        } else {
          my_loss <- my_loss + CE_loss_function(indata = yolomap_split[[(j-1)*num_feature+k]],
                                                inlabel = label_split[[(j-1)*num_feature+k]],
                                                obj = label_split[[(j-1)*num_feature+1]],
                                                lambda = lambda * weight,
                                                eps = 1e-4)
        }
      } else {
        my_loss <- my_loss + MSE_loss_function(indata = yolomap_split[[(j-1)*num_feature+k]],
                                               inlabel = label_split[[(j-1)*num_feature+k]],
                                               obj = label_split[[(j-1)*num_feature+1]],
                                               lambda = lambda * weight)
      }
    }
  }
  
  return(my_loss)
  
}

label <- mx.symbol.Variable(name = "label")

yolo_loss <- YOLO_loss_function(indata = yolomap, inlabel = label, final_map = 6, num_box = 1, lambda = 10, weight_classification = 0.2, name = 'yolo')

final_yolo_loss <- mx.symbol.MakeLoss(data = yolo_loss)
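
– As a quick sanity check (a sketch following the commented infer-shape example above), the YOLO map keeps the 8×8 grid for 256×256 inputs, and the loss collapses to a scalar:

# mx.symbol.infer.shape(yolomap, data = c(256, 256, 3, 7))$out.shapes
# expected: 8 8 6 7
# mx.symbol.infer.shape(final_yolo_loss, data = c(256, 256, 3, 7), label = c(8, 8, 6, 7))$out.shapes
# expected: 1 (a single averaged loss value)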

Section 1: Training an Object Detection Model (7)

– First, let's download the required files from here

– If you want to understand how the JPG files were turned into the format we need here, please refer to the process in MxNetR-YOLO/pikachu/code/1. Processing data

# Libraries

library(OpenImageR)
library(jpeg)
library(mxnet)
library(imager)

# Load data (Training set)

load('data/train_img_list.RData')
load('data/train_box_info.RData')

head(train_box_info)
##   obj_name  col_left col_right   row_bot   row_top prob img_id
## 1  pikachu 0.6267570 0.7256063 0.4658268 0.3013253    1      1
## 2  pikachu 0.5070340 0.5993253 0.4963081 0.3682864    1      2
## 3  pikachu 0.5904536 0.6917713 0.5608004 0.3917792    1      3
## 4  pikachu 0.5722729 0.6571676 0.5396996 0.4144326    1      4
## 5  pikachu 0.3893552 0.5016431 0.4850163 0.3470082    1      5
## 6  pikachu 0.3819232 0.4916472 0.5595707 0.4213461    1      6
head(train_img_list[[1]], 20)
##  [1] ff d8 ff e0 00 10 4a 46 49 46 00 01 01 00 00 01 00 01 00 00
Show_img <- function (img, box_info = NULL, show_prob = FALSE, col_bbox = '#FFFFFF00', col_label = '#FF0000FF',
                      show_grid = FALSE, n.grid = 8, col_grid = '#0000FFFF') {
  
  require(imager)
  
  par(mar = rep(0, 4))
  plot(NA, xlim = c(0.04, 0.96), ylim = c(0.96, 0.04), xaxt = "n", yaxt = "n", bty = "n")
  img <- (img - min(img))/(max(img) - min(img))
  img <- as.raster(img)
  rasterImage(img, 0, 1, 1, 0, interpolate=FALSE)
  
  if (!is.null(box_info)) {
    
    # Clip box coordinates into the [0, 1] plotting range
    box_info[box_info[,2] < 0, 2] <- 0
    box_info[box_info[,3] > 1, 3] <- 1
    box_info[box_info[,4] > 1, 4] <- 1
    box_info[box_info[,5] < 0, 5] <- 0
    
    for (i in 1:nrow(box_info)) {
      if (is.null(box_info$col[i])) {COL_LABEL <- col_label} else {COL_LABEL <- box_info$col[i]}
      if (show_prob) {
        TEXT <- paste0(box_info[i,1], ' (', formatC(box_info[i,6]*100, 0, format = 'f'), '%)')
      } else {
        TEXT <- box_info[i,1]
      }
      size <- max(box_info[i,3] - box_info[i,2], 0.05)
      rect(xleft = box_info[i,2], xright = box_info[i,2] + 0.04*sqrt(size)*nchar(TEXT),
           ybottom = box_info[i,5] + 0.08*sqrt(size), ytop = box_info[i,5],
           col = COL_LABEL, border = COL_LABEL, lwd = 0)
      text(x = box_info[i,2] + 0.02*sqrt(size) * nchar(TEXT),
           y = box_info[i,5] + 0.04*sqrt(size),
           labels = TEXT,
           col = 'white', cex = 1.5*sqrt(size), font = 2)
      rect(xleft = box_info[i,2], xright = box_info[i,3],
           ybottom = box_info[i,4], ytop = box_info[i,5],
           col = col_bbox, border = COL_LABEL, lwd = 5*sqrt(size))
    }
  }
  
  if (show_grid) {
    for (i in 1:n.grid) {
      if (i != n.grid) {
        abline(a = i/n.grid, b = 0, col = col_grid, lwd = 12/n.grid)
        abline(v = i/n.grid, col = col_grid, lwd = 12/n.grid)
      }
      for (j in 1:n.grid) {
        text((i-0.5)/n.grid, (j-0.5)/n.grid, paste0('(', j, ', ', i, ')'), col = col_grid, cex = 8/n.grid)
      }
    }
  }
  
}

img_id <- 1

resized_img <- readJPEG(train_img_list[[img_id]])
sub_BOX_INFOS <- train_box_info[train_box_info$img_id %in% img_id,]

Show_img(img = resized_img, box_info = sub_BOX_INFOS, show_grid = FALSE)

Section 1: Training an Object Detection Model (8)

– We also need a function "IoU_function" here. When making predictions later, the model is very likely to produce multiple heavily overlapping boxes, so we use Non-Maximum Suppression (NMS) to remove the redundant ones:

F04

# Custom function

# Note: the functions below take some care to keep the coordinate system consistent.
# The main pitfall is that the origin of the "plot" function is the bottom-left,
# while the origin of an image is the top-left.

IoU_function <- function (label, pred) {
  
  overlap_width <- min(label[,2], pred[,2]) - max(label[,1], pred[,1])
  overlap_height <- min(label[,3], pred[,3]) - max(label[,4], pred[,4])
  
  if (overlap_width > 0 & overlap_height > 0) {
    
    pred_size <- (pred[,2]-pred[,1])*(pred[,3]-pred[,4])
    label_size <- (label[,2]-label[,1])*(label[,3]-label[,4])
    overlap_size <- overlap_width * overlap_height
    
    return(overlap_size/(pred_size + label_size - overlap_size))
    
  } else {
    
    return(0)
    
  }
  
}
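
# A quick numeric check with two hypothetical boxes (one-row data frames in the
# col_left / col_right / row_bot / row_top layout used above)

box_a <- data.frame(col_left = 0.2, col_right = 0.6, row_bot = 0.7, row_top = 0.3)
box_b <- data.frame(col_left = 0.3, col_right = 0.7, row_bot = 0.8, row_top = 0.4)

# Overlap = 0.3 x 0.3 = 0.09; union = 0.16 + 0.16 - 0.09 = 0.23
IoU_function(label = box_a, pred = box_b)
## [1] 0.3913043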

Encode_fun <- function (box_info, n.grid = 8, eps = 1e-8, obj_name = 'pikachu') {
  
  img_ids <- unique(box_info$img_id)
  num_pred <- 5 + length(obj_name)
  out_array <- array(0, dim = c(n.grid, n.grid, num_pred, length(img_ids)))
  
  for (j in 1:length(img_ids)) {
    
    sub_box_info <- box_info[box_info$img_id == img_ids[j],]
    
    for (i in 1:nrow(sub_box_info)) {
      
      bbox_center_row <- (sub_box_info[i,4] + sub_box_info[i,5]) / 2 * n.grid
      bbox_center_col <- (sub_box_info[i,2] + sub_box_info[i,3]) / 2 * n.grid
      bbox_width <- (sub_box_info[i,3] - sub_box_info[i,2]) * n.grid
      bbox_height <- (sub_box_info[i,4] - sub_box_info[i,5]) * n.grid
      
      center_row <- ceiling(bbox_center_row)
      center_col <- ceiling(bbox_center_col)
      
      row_related_pos <- bbox_center_row %% 1
      row_related_pos[row_related_pos == 0] <- 1
      col_related_pos <- bbox_center_col %% 1
      col_related_pos[col_related_pos == 0] <- 1
      
      out_array[center_row,center_col,1,j] <- 1
      out_array[center_row,center_col,2,j] <- row_related_pos
      out_array[center_row,center_col,3,j] <- col_related_pos
      out_array[center_row,center_col,4,j] <- log(bbox_width + eps)
      out_array[center_row,center_col,5,j] <- log(bbox_height + eps)
      out_array[center_row,center_col,5+which(obj_name %in% sub_box_info$obj_name[i]),j] <- 1 
      
    }
    
  }
  
  return(out_array)
  
}

Decode_fun <- function (encode_array, cut_prob = 0.5, cut_overlap = 0.3,
                        obj_name = 'pikachu',
                        obj_col = '#FF0000FF',
                        img_id_list = NULL) {
  
  num_img <- dim(encode_array)[4]
  num_feature <- length(obj_name) + 5
  pos_start <- (0:(dim(encode_array)[3]/num_feature-1)*num_feature)
  
  box_info <- NULL
  
  # Decoding
  
  for (j in 1:num_img) {
    
    sub_box_info <- NULL
    
    for (i in 1:length(pos_start)) {
      
      sub_encode_array <- as.array(encode_array)[,,pos_start[i]+1:num_feature,j]
      
      pos_over_cut <- which(sub_encode_array[,,1] >= cut_prob)
      
      if (length(pos_over_cut) >= 1) {
        
        pos_over_cut_row <- pos_over_cut %% dim(sub_encode_array)[1]
        pos_over_cut_row[pos_over_cut_row == 0] <- dim(sub_encode_array)[1]
        pos_over_cut_col <- ceiling(pos_over_cut/dim(sub_encode_array)[1])
        
        for (l in 1:length(pos_over_cut)) {
          
          encode_vec <- sub_encode_array[pos_over_cut_row[l],pos_over_cut_col[l],]
          
          if (encode_vec[2] < 0) {encode_vec[2] <- 0}
          if (encode_vec[2] > 1) {encode_vec[2] <- 1}
          if (encode_vec[3] < 0) {encode_vec[3] <- 0}
          if (encode_vec[3] > 1) {encode_vec[3] <- 1}
          
          center_row <- (encode_vec[2] + (pos_over_cut_row[l] - 1))/dim(sub_encode_array)[1]
          center_col <- (encode_vec[3] + (pos_over_cut_col[l] - 1))/dim(sub_encode_array)[2]
          width <- exp(encode_vec[4])/dim(sub_encode_array)[2]
          height <- exp(encode_vec[5])/dim(sub_encode_array)[1]
          
          if (is.null(img_id_list)) {new_img_id <- j} else {new_img_id <- img_id_list[j]}
          
          new_box_info <- data.frame(obj_name = obj_name[which.max(encode_vec[-c(1:5)])],
                                     col_left = center_col-width/2,
                                     col_right = center_col+width/2,
                                     row_bot = center_row+height/2,
                                     row_top = center_row-height/2,
                                     prob = encode_vec[1],
                                     img_id = new_img_id,
                                     col = obj_col[which.max(encode_vec[-c(1:5)])],
                                     stringsAsFactors = FALSE)
          
          sub_box_info <- rbind(sub_box_info, new_box_info)
          
        }
        
      }
      
    }
    
    if (!is.null(sub_box_info)) {
      
      # Remove overlapping
      
      sub_box_info <- sub_box_info[order(sub_box_info$prob, decreasing = TRUE),]
      
      for (obj in unique(sub_box_info$obj_name)) {
        
        obj_sub_box_info <- sub_box_info[sub_box_info$obj_name == obj,]
        
        if (nrow(obj_sub_box_info) == 1) {
          
          box_info <- rbind(box_info, obj_sub_box_info)
          
        } else {
          
          overlap_seq <- NULL
          
          for (m in 2:nrow(obj_sub_box_info)) {
            
            for (n in 1:(m-1)) {
              
              if (!n %in% overlap_seq) {
                
                overlap_prob <- IoU_function(label = obj_sub_box_info[m,2:5], pred = obj_sub_box_info[n,2:5])
                
                if (overlap_prob >= cut_overlap) {
                  
                  overlap_seq <- c(overlap_seq, m)
                  
                }
                
              }
              
            }
            
          }
          
          if (!is.null(overlap_seq)) {
            
            obj_sub_box_info <- obj_sub_box_info[-overlap_seq,]
            
          }
          
          box_info <- rbind(box_info, obj_sub_box_info)
          
        }
        
      }
      
    }
    
  }
  
  return(box_info)
  
}

Section 1: Training an Object Detection Model (9)

# Test Encode & Decode function

img_id <- 1

resized_img <- readJPEG(train_img_list[[img_id]])

sub_BOX_INFOS <- train_box_info[train_box_info$img_id %in% img_id,]

Encode_label <- Encode_fun(box_info = sub_BOX_INFOS)
restore_BOX_INFOS <- Decode_fun(encode_array = Encode_label)

Show_img(img = resized_img, box_info = restore_BOX_INFOS, show_grid = TRUE)

Encode_label
## , , 1, 1
## 
##      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8]
## [1,]    0    0    0    0    0    0    0    0
## [2,]    0    0    0    0    0    0    0    0
## [3,]    0    0    0    0    0    0    0    0
## [4,]    0    0    0    0    0    1    0    0
## [5,]    0    0    0    0    0    0    0    0
## [6,]    0    0    0    0    0    0    0    0
## [7,]    0    0    0    0    0    0    0    0
## [8,]    0    0    0    0    0    0    0    0
## 
## , , 2, 1
## 
##      [,1] [,2] [,3] [,4] [,5]       [,6] [,7] [,8]
## [1,]    0    0    0    0    0 0.00000000    0    0
## [2,]    0    0    0    0    0 0.00000000    0    0
## [3,]    0    0    0    0    0 0.00000000    0    0
## [4,]    0    0    0    0    0 0.06860864    0    0
## [5,]    0    0    0    0    0 0.00000000    0    0
## [6,]    0    0    0    0    0 0.00000000    0    0
## [7,]    0    0    0    0    0 0.00000000    0    0
## [8,]    0    0    0    0    0 0.00000000    0    0
## 
## , , 3, 1
## 
##      [,1] [,2] [,3] [,4] [,5]      [,6] [,7] [,8]
## [1,]    0    0    0    0    0 0.0000000    0    0
## [2,]    0    0    0    0    0 0.0000000    0    0
## [3,]    0    0    0    0    0 0.0000000    0    0
## [4,]    0    0    0    0    0 0.4094529    0    0
## [5,]    0    0    0    0    0 0.0000000    0    0
## [6,]    0    0    0    0    0 0.0000000    0    0
## [7,]    0    0    0    0    0 0.0000000    0    0
## [8,]    0    0    0    0    0 0.0000000    0    0
## 
## , , 4, 1
## 
##      [,1] [,2] [,3] [,4] [,5]       [,6] [,7] [,8]
## [1,]    0    0    0    0    0  0.0000000    0    0
## [2,]    0    0    0    0    0  0.0000000    0    0
## [3,]    0    0    0    0    0  0.0000000    0    0
## [4,]    0    0    0    0    0 -0.2347173    0    0
## [5,]    0    0    0    0    0  0.0000000    0    0
## [6,]    0    0    0    0    0  0.0000000    0    0
## [7,]    0    0    0    0    0  0.0000000    0    0
## [8,]    0    0    0    0    0  0.0000000    0    0
## 
## , , 5, 1
## 
##      [,1] [,2] [,3] [,4] [,5]      [,6] [,7] [,8]
## [1,]    0    0    0    0    0 0.0000000    0    0
## [2,]    0    0    0    0    0 0.0000000    0    0
## [3,]    0    0    0    0    0 0.0000000    0    0
## [4,]    0    0    0    0    0 0.2746061    0    0
## [5,]    0    0    0    0    0 0.0000000    0    0
## [6,]    0    0    0    0    0 0.0000000    0    0
## [7,]    0    0    0    0    0 0.0000000    0    0
## [8,]    0    0    0    0    0 0.0000000    0    0
## 
## , , 6, 1
## 
##      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8]
## [1,]    0    0    0    0    0    0    0    0
## [2,]    0    0    0    0    0    0    0    0
## [3,]    0    0    0    0    0    0    0    0
## [4,]    0    0    0    0    0    1    0    0
## [5,]    0    0    0    0    0    0    0    0
## [6,]    0    0    0    0    0    0    0    0
## [7,]    0    0    0    0    0    0    0    0
## [8,]    0    0    0    0    0    0    0    0

Section 1: Training an Object Detection Model (10)

# Build an iterator

train_ids <- unique(train_box_info[,'img_id'])

my_iterator_core <- function (batch_size, img_size = 256, resize_method = 'bilinear',
                              aug_crop = TRUE, aug_flip = TRUE) {
  
  batch <-  0
  batch_per_epoch <- floor(length(train_ids)/batch_size)
  
  reset <- function() {batch <<- 0}
  
  iter.next <- function() {
    
    batch <<- batch + 1
    if (batch > batch_per_epoch) {return(FALSE)} else {return(TRUE)}
    
  }
  
  value <- function() {
    
    idx <- 1:batch_size + (batch - 1) * batch_size
    idx[idx > length(train_ids)] <- sample(1:(idx[1]-1), sum(idx > length(train_ids)))
    idx <- sort(idx)
    
    batch.box_info <- train_box_info[train_box_info$img_id %in% train_ids[idx],]
    
    img_array <- array(0, dim = c(img_size, img_size, 3, batch_size))
    
    for (i in 1:batch_size) {
      
      read_img <- readJPEG(train_img_list[[train_ids[idx[i]]]])
      img_array[,,,i] <- preproc.image(read_img, width = img_size, height = img_size, method = resize_method)
      
    }
    
    if (aug_flip) {
      
      original_dim <- dim(img_array)
      
      if (sample(0:1, 1) == 1) {
        
        img_array <- img_array[,original_dim[2]:1,,]
        flip_left <- 1 - batch.box_info[,2]
        flip_right <- 1 - batch.box_info[,3]
        batch.box_info[,2] <- flip_right
        batch.box_info[,3] <- flip_left
        dim(img_array) <- original_dim
        
      }
      
    }
    
    if (aug_crop) {
      
      revised_dim <- dim(img_array)
      revised_dim[1:2] <- img_size - 32
      
      random.row <- sample(0:32, 1)
      random.col <- sample(0:32, 1)
      
      img_array <- img_array[random.row+1:(img_size-32),random.col+1:(img_size-32),,]
      dim(img_array) <- revised_dim
      
      batch.box_info[,4:5] <- (batch.box_info[,4:5] * img_size - random.row) / (img_size - 32)
      batch.box_info[,2:3] <- (batch.box_info[,2:3] * img_size - random.col) / (img_size - 32)
      
      for (j in 2:5) {
        batch.box_info[batch.box_info[,j] <= 0,j] <- 0
        batch.box_info[batch.box_info[,j] >= 1,j] <- 1
      }
      
    } 
    
    label <- Encode_fun(box_info = batch.box_info, n.grid = dim(img_array)[1]/32)
    label <- mx.nd.array(label)
    data <- mx.nd.array(img_array)
    
    return(list(data = data, label = label))
    
  }
  
  return(list(reset = reset, iter.next = iter.next, value = value, batch_size = batch_size, batch = batch))
  
}

my_iterator_func <- setRefClass("Custom_Iter",
                                fields = c("iter", "batch_size", "img_size", "resize_method", "aug_crop", "aug_flip"),
                                contains = "Rcpp_MXArrayDataIter",
                                methods = list(
                                  initialize = function(iter, batch_size = 16, img_size = 256, resize_method = 'nearest',
                                                        aug_crop = TRUE, aug_flip = TRUE){
                                    .self$iter <- my_iterator_core(batch_size = batch_size, img_size = img_size, resize_method = resize_method,
                                                                   aug_crop = aug_crop, aug_flip = aug_flip)
                                    .self
                                  },
                                  value = function(){
                                    .self$iter$value()
                                  },
                                  iter.next = function(){
                                    .self$iter$iter.next()
                                  },
                                  reset = function(){
                                    .self$iter$reset()
                                  },
                                  finalize=function(){
                                  }
                                )
)

Section 1: Training an Object Detection Model (11)

# Test iterator function

my_iter <- my_iterator_func(iter = NULL, batch_size = 16, img_size = 256, resize_method = 'bilinear',
                            aug_crop = TRUE, aug_flip = TRUE)

my_iter$reset()

my_iter$iter.next()
## [1] TRUE
test <- my_iter$value()

img_seq <- 1

iter_img <- as.array(test$data)[,,,img_seq]
iter_img[,,1] <- iter_img[,,1] + 123.68
iter_img[,,2] <- iter_img[,,2] + 116.78
iter_img[,,3] <- iter_img[,,3] + 103.94
iter_img <- iter_img / 255

iter_box_info <- Decode_fun(test$label)

Show_img(img = iter_img, box_info = iter_box_info[iter_box_info$img_id == img_seq,], show_grid = FALSE)

Section 1: Training an Object Detection Model (12)

# Custom callback function

my.eval.metric.loss <- mx.metric.custom(
  name = "multi_part_loss",
  function(label, pred) {
    return(as.array(pred))
  }
)

my.callback_batch <- function (batch.size = 16, frequency = 10) {
  function(iteration, nbatch, env, verbose = TRUE) {
    count <- nbatch
    if (is.null(env$count)) 
      env$count <- 0
    if (is.null(env$init)) 
      env$init <- FALSE
    if (env$count > count) 
      env$init <- FALSE
    env$count = count
    if (env$init) {
      if (count%%frequency == 0 && !is.null(env$metric)) {
        time <- as.double(difftime(Sys.time(), env$tic, 
                                   units = "secs"))
        speed <- frequency * batch.size/time
        result <- env$metric$get(env$train.metric)
        if (nbatch != 0 & verbose) {
          message(paste0("Batch [", nbatch, "] Speed: ", 
                         formatC(speed, 3, format = "f"), " samples/sec Train-", result$name, 
                         "=", as.array(result$value)))
        }
        env$tic = Sys.time()
      }
    }
    else {
      env$init <- TRUE
      env$tic <- Sys.time()
    }
  }
}


my.callback_epoch <- function (out_symbol, logger = NULL, 
                               prefix = 'model/yolo_v1',
                               fixed.params = NULL,
                               period = 1) {
  function(iteration, nbatch, env, verbose = TRUE) {
    if (iteration%%period == 0) {
      env_model <- env$model
      env_all_layers <- env_model$symbol$get.internals()
      model_write_out <- list(symbol = out_symbol,
                              arg.params = env_model$arg.params,
                              aux.params = env_model$aux.params)
      model_write_out[[2]] <- append(model_write_out[[2]], fixed.params)
      class(model_write_out) <- "MXFeedForwardModel"
      mx.model.save(model_write_out, prefix, iteration)
      if (verbose) {
        message(sprintf("Model checkpoint saved to %s-%04d.params", prefix, iteration))
      }
    }
    if (!is.null(logger)) {
      if (class(logger) != "mx.metric.logger") {
        stop("Invalid mx.metric.logger.")
      } else {
        result <- env$metric$get(env$train.metric)
        logger$train <- c(logger$train, result$value)
        if (!is.null(env$eval.metric)) {
          result <- env$metric$get(env$eval.metric)
          logger$eval <- c(logger$eval, result$value)
        }
      }
    }
    return(TRUE)
  }
}

Section 1: Training an Object Detection Model (13)

# Initialize parameters for the model

new_arg <- mxnet:::mx.model.init.params(symbol = final_yolo_loss, 
                                        input.shape = list(data = c(224, 224, 3, 13), 
                                                           label = c(7, 7, 6, 13)), 
                                        output.shape = NULL, initializer = mxnet:::mx.init.Xavier(rnd_type = "uniform", magnitude = 2.24), 
                                        ctx = mx.gpu())

# Bind pre-trained parameters into the model

Pre_trained_ARG <- Pre_Trained_model$arg.params

ARG_in_net_name <- names(Pre_trained_ARG) %>% .[. %in% names(new_arg$arg.params)]  # keep only the parameters that also exist in the new model

for (i in 1:length(ARG_in_net_name)){
  new_arg$arg.params[names(new_arg$arg.params) == ARG_in_net_name[i]] <- Pre_trained_ARG[names(Pre_trained_ARG) == ARG_in_net_name[i]]
}

ARG.PARAMS <- new_arg$arg.params

# Model Training

my_logger <- mx.metric.logger$new()
my_optimizer <- mx.opt.create(name = "sgd", learning.rate = 5e-3, momentum = 0.9, wd = 1e-4)

my_iter <- my_iterator_func(iter = NULL, batch_size = 16, img_size = 256, aug_crop = TRUE, aug_flip = TRUE)

YOLO_model <- mx.model.FeedForward.create(final_yolo_loss, X = my_iter,
                                          ctx = mx.gpu(), num.round = 1, optimizer = my_optimizer,
                                          arg.params = ARG.PARAMS,  eval.metric = my.eval.metric.loss,
                                          input.names = 'data', output.names = 'label',
                                          batch.end.callback = my.callback_batch(batch.size = 16, frequency = 10),
                                          epoch.end.callback = my.callback_epoch(out_symbol = yolomap, logger = my_logger,
                                                                                 prefix = 'model/yolo_pikachu', period = 1))

Exercise 1: Train the model yourself and use it to make predictions

– If your computer keeps you from obtaining the model quickly, you can download yolo_v1-symbol.json and yolo_v1-0000.params to get an already-trained model.

# Load validation dataset

val_img_list_path <- 'data/val_img_list.RData'
val_box_info_path <- 'data/val_box_info.RData'

load(val_img_list_path)
load(val_box_info_path)

# Select an image

used_img_id <- 3

img <- readJPEG(val_img_list[[used_img_id]])
sub_BOX_INFOS <- val_box_info[val_box_info$img_id %in% used_img_id,]

# Show image

Show_img(img = img, box_info = sub_BOX_INFOS, show_grid = FALSE)

Exercise 1 Answer (1)

– The question now is: how should we decode this output?

# Select an image

used_img_id <- 3

img <- readJPEG(val_img_list[[used_img_id]])
img_array <- preproc.image(img, width = 256, height = 256)

# Predict and decode

pred_out <- mxnet:::predict.MXFeedForwardModel(model = YOLO_model, X = img_array)

# Show output

pred_out
## , , 1, 1
## 
##              [,1]         [,2]         [,3]         [,4]         [,5]
## [1,] 2.183773e-05 7.931536e-05 6.351219e-04 4.764824e-05 3.066422e-04
## [2,] 2.112850e-05 8.728065e-06 8.421379e-05 4.261563e-05 2.267240e-05
## [3,] 3.515312e-05 6.092310e-06 6.672289e-05 1.391574e-04 4.951582e-06
## [4,] 2.894120e-05 1.691693e-05 9.405715e-05 5.073927e-05 3.202549e-04
## [5,] 3.926421e-05 7.203409e-06 3.814128e-05 1.176981e-04 4.450712e-06
## [6,] 1.580108e-04 1.356378e-04 1.882506e-04 1.024184e-04 1.296587e-04
## [7,] 1.299090e-04 1.584105e-05 6.206470e-05 1.087106e-04 4.044878e-05
## [8,] 1.478992e-04 3.950361e-05 1.172350e-05 5.081013e-06 7.446285e-06
##              [,6]         [,7]         [,8]
## [1,] 5.042905e-05 1.206236e-04 3.142923e-05
## [2,] 1.875633e-05 6.406666e-06 6.458136e-05
## [3,] 2.528928e-07 1.362063e-07 5.915533e-05
## [4,] 9.999520e-01 1.514749e-09 4.630832e-05
## [5,] 1.955162e-06 4.397059e-08 2.994973e-05
## [6,] 1.827018e-05 3.710182e-06 1.194206e-04
## [7,] 5.231993e-05 3.681783e-06 5.022814e-04
## [8,] 2.055441e-05 3.945771e-05 6.166154e-05
## 
## , , 2, 1
## 
##           [,1]      [,2]      [,3]      [,4]      [,5]       [,6]      [,7]
## [1,] 0.6811010 0.5992813 0.5588801 0.5918353 0.5709596 0.56653702 0.4892548
## [2,] 0.6025977 0.4801804 0.4766892 0.6117824 0.6155071 0.71469992 0.5594321
## [3,] 0.5267593 0.5213332 0.4998184 0.6468284 0.9106307 0.98305821 0.8658469
## [4,] 0.5180151 0.4374247 0.4843335 0.6602411 0.7039936 0.47149846 0.8244733
## [5,] 0.5525034 0.5094016 0.5309802 0.5598103 0.5592340 0.08420401 0.6220540
## [6,] 0.5059406 0.4042488 0.4696932 0.4871991 0.3894823 0.20204745 0.4235689
## [7,] 0.4832372 0.3369739 0.4324991 0.3261008 0.2914434 0.30894563 0.3775699
## [8,] 0.1947426 0.2085135 0.2981278 0.3268141 0.3252218 0.34786418 0.2280285
##           [,8]
## [1,] 0.5099480
## [2,] 0.3608977
## [3,] 0.3018216
## [4,] 0.2923717
## [5,] 0.2853925
## [6,] 0.3369421
## [7,] 0.3532733
## [8,] 0.3354525
## 
## , , 3, 1
## 
##           [,1]      [,2]      [,3]      [,4]      [,5]      [,6]        [,7]
## [1,] 0.4752454 0.4654770 0.4474602 0.4145004 0.4986147 0.4299467 0.354526907
## [2,] 0.4111670 0.3225459 0.3028007 0.3956220 0.5524070 0.4264280 0.200698659
## [3,] 0.3451282 0.2392129 0.2272664 0.3999768 0.8253816 0.5090470 0.020727312
## [4,] 0.3181776 0.2618865 0.1814105 0.4150837 0.9581521 0.4551763 0.002361933
## [5,] 0.3555844 0.2550228 0.1595805 0.3474151 0.9324422 0.5543780 0.070362419
## [6,] 0.3686740 0.2943202 0.1065822 0.1668596 0.4033784 0.3448375 0.209898204
## [7,] 0.4111833 0.4030324 0.1919183 0.1723866 0.2298892 0.2421275 0.414325953
## [8,] 0.3812090 0.4858882 0.3803023 0.3771349 0.4024045 0.4102047 0.501650393
##            [,8]
## [1,] 0.29737329
## [2,] 0.26700559
## [3,] 0.09080849
## [4,] 0.07152078
## [5,] 0.11881936
## [6,] 0.24182341
## [7,] 0.35300747
## [8,] 0.47599444
## 
## , , 4, 1
## 
##            [,1]       [,2]       [,3]       [,4]       [,5]         [,6]
## [1,] 0.16491586 0.18378334 0.28542048 0.26018864 0.14889957 1.749086e-01
## [2,] 0.09440644 0.07127112 0.28318104 0.20255308 0.10651366 7.307230e-02
## [3,] 0.06701270 0.04338539 0.23514247 0.16765966 0.02271725 3.762850e-03
## [4,] 0.07184193 0.03279034 0.19499126 0.16322628 0.01311888 3.537172e-06
## [5,] 0.06131087 0.02977058 0.17533329 0.16524935 0.07603120 3.725016e-02
## [6,] 0.05124541 0.01864368 0.10862837 0.12757657 0.05287513 6.733549e-02
## [7,] 0.04296201 0.01140163 0.03496159 0.05462516 0.06538107 9.614037e-02
## [8,] 0.16897696 0.10097314 0.11095339 0.11884030 0.12064246 1.821003e-01
##             [,7]      [,8]
## [1,] 0.323346138 0.3325693
## [2,] 0.154499650 0.3327717
## [3,] 0.003637901 0.1550119
## [4,] 0.004098119 0.1500305
## [5,] 0.005127498 0.1426961
## [6,] 0.076475367 0.1702000
## [7,] 0.053354282 0.0853233
## [8,] 0.203049257 0.2541927
## 
## , , 5, 1
## 
##           [,1]      [,2]       [,3]      [,4]      [,5]       [,6]      [,7]
## [1,] 0.4088876 0.3281469 0.31757659 0.3296816 0.2416847 0.30842003 0.2826222
## [2,] 0.3253536 0.1952580 0.23505569 0.2239144 0.1382685 0.13826904 0.2247975
## [3,] 0.3053932 0.1463771 0.14762405 0.1212323 0.1170232 0.11569682 0.2140862
## [4,] 0.3221013 0.1421157 0.12821636 0.1281368 0.2803877 0.07619502 0.2475316
## [5,] 0.2965327 0.1442205 0.11869530 0.1298239 0.3110506 0.37224638 0.3108101
## [6,] 0.2872577 0.1306140 0.09673508 0.1259071 0.1966543 0.17768337 0.2070399
## [7,] 0.2724642 0.2064336 0.21457450 0.2435248 0.3111090 0.27551523 0.2968786
## [8,] 0.3671204 0.4170501 0.36455652 0.4603936 0.4649581 0.49876365 0.4791833
##           [,8]
## [1,] 0.3009813
## [2,] 0.3539481
## [3,] 0.2280535
## [4,] 0.2340001
## [5,] 0.1893230
## [6,] 0.3498937
## [7,] 0.3536443
## [8,] 0.3997338
## 
## , , 6, 1
## 
##           [,1]      [,2]      [,3]      [,4]      [,5]      [,6]      [,7]
## [1,] 0.6730731 0.7012892 0.8282602 0.6135319 0.7431093 0.5815746 0.5686185
## [2,] 0.6672665 0.7685835 0.8880886 0.7069159 0.8173864 0.7794816 0.6700435
## [3,] 0.7810093 0.9139746 0.9600584 0.9205896 0.9022682 0.9701681 0.9467540
## [4,] 0.7860104 0.9339116 0.9637063 0.9028959 0.9725211 0.9999988 0.9427245
## [5,] 0.7916315 0.9270773 0.9462017 0.9415361 0.9460345 0.9871424 0.9548299
## [6,] 0.8096858 0.9571847 0.9718243 0.9563374 0.9810125 0.9853585 0.9216541
## [7,] 0.8482009 0.9809137 0.9700474 0.9533759 0.9575083 0.9471467 0.9486054
## [8,] 0.9243892 0.9356348 0.9095957 0.8696740 0.9020783 0.8695765 0.8862369
##           [,8]
## [1,] 0.6328783
## [2,] 0.5658334
## [3,] 0.7586090
## [4,] 0.7908683
## [5,] 0.8406799
## [6,] 0.7496398
## [7,] 0.8141131
## [8,] 0.8568270

Exercise 1 Answer (2)

# Decode output

pred_box_info <- Decode_fun(pred_out, cut_prob = 0.5, cut_overlap = 0.3)
pred_box_info
##   obj_name  col_left col_right   row_bot  row_top     prob img_id       col
## 1  pikachu 0.6193968 0.7443973 0.5013856 0.366489 0.999952      1 #FF0000FF
# Show image

Show_img(img = img, box_info = pred_box_info, show_prob = TRUE, show_grid = FALSE)

Section 2: Evaluation Metrics for Object Detection Models (1)

– But these competitions usually involve detecting many kinds of objects at once, so they generally use the "mean Average Precision (mAP)" metric, which is also the evaluation metric used in the related papers.

– "Recall" is what medicine usually calls "Sensitivity"

– By definition, computing the Average Precision amounts to drawing a "Precision x Recall curve" and calculating the area under it:
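
– In symbols (a standard formulation, matching the computation below): each true positive contributes its interpolated precision, and the sum is divided by the number of ground-truth objects:

\[AP = \int_{0}^{1} p(r) \, dr \approx \frac{1}{N_{obj}} \sum \limits_{k=1}^{N_{TP}} p_{interp}(r_k)\]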

F06

Section 2: Evaluation Metrics for Object Detection Models (2)

F07

# Sample information

num_obj <- 4
pred_value <- c(0.93, 0.75, 0.67, 0.71, 0.82, 0.91)
real_value <- c(1, 1, 0, 1, 0, 0)

# Calculation process

real_value <- real_value[order(pred_value, decreasing=TRUE)]
cum_TP <- cumsum(real_value)

P_list <- cum_TP * real_value / seq_along(real_value)
P_list <- P_list[P_list!=0]

while (sum(diff(P_list) > 0) >= 1) {
    diff_P_list <- diff(P_list)
    diff_P_list[diff_P_list < 0] <- 0
    P_list <- P_list + c(diff_P_list, 0)
}

# Average Precision

sum(P_list)/num_obj
## [1] 0.55
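
– The while loop above builds the "interpolated precision" envelope: each precision is raised to the highest precision achieved at any equal-or-higher recall. As a sketch, an equivalent one-liner is a reversed running maximum:

# Equivalent to the while loop: a running max from the right
P_list <- rev(cummax(rev(P_list)))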

Section 2: Evaluation Metrics for Object Detection Models (3)

F05

IoU_function(sub_BOX_INFOS[,2:5], pred_box_info[,2:5])
## [1] 0.4222606

Section 2: Evaluation Metrics for Object Detection Models (4)

# Sample information

num_obj <- 4
pred_value <- c(0.93, 0.75, 0.67, 0.71, 0.82, 0.91)
real_IoU <- c(0.75, 0.81, 0.42, 0.69, 0.27, 0.39)

# Calculation function

AP_function <- function (obj_IoU, obj_prob, num_obj, IoU_cut = 0.5) {
  
  sort_obj_IoU <- obj_IoU[order(obj_prob, decreasing=TRUE)]
  pred_postive <- sort_obj_IoU > IoU_cut
  
  cum_TP <- cumsum(pred_postive)
  
  P_list <- cum_TP * pred_postive / seq_along(pred_postive)
  P_list <- P_list[P_list!=0]
  
  while (sum(diff(P_list) > 0) >= 1) {
    diff_P_list <- diff(P_list)
    diff_P_list[diff_P_list < 0] <- 0
    P_list <- P_list + c(diff_P_list, 0)
  }
  
  return(sum(P_list)/num_obj)
  
}

# Show AP

AP_function(obj_IoU = real_IoU, obj_prob = pred_value, num_obj = num_obj)
## [1] 0.55

Exercise 2: Evaluate the accuracy of your trained model

– An interesting point: during training you should have saved the model from every epoch. Can you work out roughly how many epochs are enough, and stop there to avoid overfitting?

– Of course, you can also try adjusting the parameters used in training. For example...

  1. Because of the severe class imbalance, our loss function uses \(\lambda = 10\) and \(\mbox{weight_classification} = 0.2\); try tuning these and see whether the results improve

  2. At prediction time we removed redundant boxes with \(\mbox{cut_overlap} = 0.3\); lowering this value suppresses more redundant boxes. Does that raise the Average Precision?

  3. Switch to a different base model for transfer learning, for example ResNet, and feel free to revise the structure attached after it

  4. There is a pile of extra hyperparameters you can tune, such as the batch size, the strength of L2 regularization, and the learning rate

Exercise 2 Answer

# Load model

YOLO_model <- mx.model.load('model/yolo_v1', 0)

# Load validation dataset

val_img_list_path <- 'data/val_img_list.RData'
val_box_info_path <- 'data/val_box_info.RData'

load(val_img_list_path)
load(val_box_info_path)

# Read images

img_array <- array(0, dim = c(256, 256, 3, length(val_img_list)))

for (i in 1:length(val_img_list)) {
  
  img <- readJPEG(val_img_list[[i]])
  img_array[,,,i] <- preproc.image(img, width = 256, height = 256)
  
}

# Predict and decode

pred_out <- mxnet:::predict.MXFeedForwardModel(model = YOLO_model, X = img_array)
pred_box_info <- Decode_fun(pred_out, cut_prob = 0.5, cut_overlap = 0.3)

# Calculate IoU

pred_box_info$IoU <- 0

for (m in 1:nrow(pred_box_info)) {
  
  sub_label_box_info <- val_box_info[val_box_info[,'img_id'] == pred_box_info[m,'img_id'], ]
  IoUs <- numeric(nrow(sub_label_box_info))
  
  for (n in 1:nrow(sub_label_box_info)) {
    IoUs[n] <- IoU_function(label = sub_label_box_info[n,2:5], pred = pred_box_info[m,2:5])
  }
  
  pred_box_info[m,'IoU'] <- max(IoUs)
  
}

# Calculate AP

obj_IoU <- pred_box_info[,'IoU']
obj_prob <- pred_box_info[,'prob']
num_obj <- nrow(val_box_info)

AP_function(obj_IoU = obj_IoU, obj_prob = obj_prob, num_obj = num_obj, IoU_cut = 0.5)
## [1] 0.8672958

Section 3: Special Loss Functions (1)

– In the experiment above, we roughly handled this problem by weighting the positive samples 10×, but that seems overly crude.

– Kaiming He, Ross Girshick, and their Facebook colleagues therefore proposed a new loss function, Focal Loss, whose idea is essentially a Soft-OHEM (a soft version of online hard example mining); based on it they trained RetinaNet

F08

Section 3: Special Loss Functions (2)

\[CE(y, p, \alpha) = -\frac{1}{n}\sum \limits_{i=1}^{n} \left(\alpha \cdot y_{i} \cdot log(p_{i}) + (1 - \alpha) \cdot (1-y_{i}) \cdot log(1-p_{i})\right)\]

\[FL(y, p, \alpha, \gamma) = -\frac{1}{n}\sum \limits_{i=1}^{n} \left(\alpha \cdot (1 - p_{i})^{\gamma} \cdot y_{i} \cdot log(p_{i}) + (1 - \alpha) \cdot p_{i}^{\gamma} \cdot (1-y_{i}) \cdot log(1-p_{i})\right)\]

F09
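
– A small numeric sketch of the effect (standalone code, with \(\alpha\) omitted): with \(\gamma = 2\), an easy, well-classified example is down-weighted by roughly 100×, while a hard example keeps most of its loss:

gamma <- 2
p <- c(easy = 0.9, hard = 0.6)   # predicted probabilities for a positive label (y = 1)

-log(p)                   # plain CE:  0.105 (easy), 0.511 (hard)
-(1 - p)^gamma * log(p)   # focal:     0.001 (easy), 0.082 (hard)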

Section 3: Special Loss Functions (3)

CE_loss_function <- function (indata, inlabel, obj, lambda, eps = 1e-4) {
  
  log_pred_1 <- mx.symbol.log(data = indata + eps)
  log_pred_2 <- mx.symbol.log(data = 1 - indata + eps)
  multiple_log_pred_label_1 <- mx.symbol.broadcast_mul(lhs = log_pred_1, rhs = inlabel)
  multiple_log_pred_label_2 <- mx.symbol.broadcast_mul(lhs = log_pred_2, rhs = 1 - inlabel)
  obj_weighted_loss <- mx.symbol.broadcast_mul(lhs = obj, rhs = multiple_log_pred_label_1 + multiple_log_pred_label_2)
  average_CE_loss <- mx.symbol.mean(data = obj_weighted_loss, axis = 0:3, keepdims = FALSE)
  CE_loss <- 0 - average_CE_loss * lambda
  
  return(CE_loss)
  
}
Focal_loss_function <- function (indata, inlabel, obj, lambda, gamma = 0, eps = 1e-4) {
  
  log_pred_1 <- mx.symbol.log(data = indata + eps)
  log_pred_2 <- mx.symbol.log(data = 1 - indata + eps)
  multiple_log_pred_label_1 <- mx.symbol.broadcast_mul(lhs = log_pred_1, rhs = inlabel)
  multiple_log_pred_label_2 <- mx.symbol.broadcast_mul(lhs = log_pred_2, rhs = 1 - inlabel)
  obj_weighted_loss <- mx.symbol.broadcast_mul(lhs = obj, rhs = (1 - indata + eps)^gamma * multiple_log_pred_label_1 + (indata + eps)^gamma * multiple_log_pred_label_2)
  average_Focal_loss <- mx.symbol.mean(data = obj_weighted_loss, axis = 0:3, keepdims = FALSE)
  Focal_loss <- 0 - average_Focal_loss * lambda
  
  return(Focal_loss)
  
}

Section 3: Special Loss Functions (4)

\[MSE(y,\hat{y}) = \sum \limits_{i=1}^{n} (y_i - \hat{y}_i)^2\]

– But in the object detection setting this becomes a problem again: when the error is large, the loss is weighted quadratically, yet this part of the loss is responsible for the bounding box's width, height, and coordinates. If a box has drifted too far it is simply wrong, and there seems to be no need for such a huge loss.

\[MAE(y,\hat{y}) = \sum \limits_{i=1}^{n} |y_i - \hat{y}_i|\]

– So we urgently need a loss function that satisfies the property above while being continuously differentiable!

Section 3: Special Loss Functions (5)

F10

\[L(y,\hat{y}) = \sum \limits_{i=1}^{n} log(cosh(y_i - \hat{y}_i))\]

\[cosh(x) = \frac{e^x + e^{-x}}{2}\]
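
– A quick check of why log-cosh fits the bill (standalone code): near zero it behaves like \(x^2/2\), but for large errors it grows only linearly, like \(|x| - log(2)\), while staying smooth everywhere:

x <- c(0.1, 0.5, 5, 10)

log(cosh(x))      # 0.005  0.120  4.307  9.307
x^2 / 2           # 0.005  0.125 12.500 50.000 (quadratic: explodes for large errors)
abs(x) - log(2)   # -0.593 -0.193  4.307  9.307 (matches log-cosh for large errors)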

Section 3: Special Loss Functions (6)

MSE_loss_function <- function (indata, inlabel, obj, lambda) {
  
  diff_pred_label <- mx.symbol.broadcast_minus(lhs = indata, rhs = inlabel)
  square_diff_pred_label <- mx.symbol.square(data = diff_pred_label)
  obj_square_diff_loss <- mx.symbol.broadcast_mul(lhs = obj, rhs = square_diff_pred_label)
  MSE_loss <- mx.symbol.mean(data = obj_square_diff_loss, axis = 0:3, keepdims = FALSE)
  
  return(MSE_loss * lambda)
  
}
LOGCOSH_loss_function <- function (indata, inlabel, obj, lambda) {
  
  diff_pred_label <- mx.symbol.broadcast_minus(lhs = indata, rhs = inlabel)
  cosh_diff_pred_label <- mx.symbol.cosh(data = diff_pred_label)
  logcosh_diff_pred_label <- mx.symbol.log(data = cosh_diff_pred_label)
  obj_logcosh_diff_pred_label <- mx.symbol.broadcast_mul(lhs = obj, rhs = logcosh_diff_pred_label)
  LOGCOSH_loss <- mx.symbol.mean(data = obj_logcosh_diff_pred_label, axis = 0:3, keepdims = FALSE)
  
  return(LOGCOSH_loss * lambda)
  
}

Exercise 3: Train the model with the new loss functions

– This should be a very easy exercise: you only need to modify the Model Architecture part, and everything else runs without any changes!

Exercise 3 Answer

# Libraries

library(mxnet)
library(magrittr)

## Define the model architecture
## Use a pre-trained model and fine-tune it

# Load MobileNet v2

Pre_Trained_model <- mx.model.load('model/mobilev2', 0)

# Get the internal output

Mobile_symbol <- Pre_Trained_model$symbol

Mobile_All_layer <- Mobile_symbol$get.internals()

basic_out <- which(Mobile_All_layer$outputs == 'conv6_3_linear_bn_output') %>% Mobile_All_layer$get.output()

# mx.symbol.infer.shape(basic_out, data = c(256, 256, 3, 7))$out.shapes
# conv6_3_linear_bn_output out shape = 8 8 320 n (if input shape = 256 256 3 n)

# Convolution layers for the specific task, with newly trained parameters

# 1. Add some extra architecture for better learning

DWCONV_function <- function (indata, num_filters = 256, Inverse_coef = 6, residual = TRUE, name = 'lvl1', stage = 1) {
  
  expend_conv <- mx.symbol.Convolution(data = indata, kernel = c(1, 1), stride = c(1, 1), pad = c(0, 0),
                                       no.bias = TRUE, num.filter = num_filters * Inverse_coef,
                                       name = paste0(name, '_', stage, '_expend'))
  expend_bn <- mx.symbol.BatchNorm(data = expend_conv, fix_gamma = FALSE, name = paste0(name, '_', stage, '_expend_bn'))
  expend_relu <- mx.symbol.LeakyReLU(data = expend_bn, act.type = 'leaky', slope = 0.1, name = paste0(name, '_', stage, '_expend_relu'))
  
  dwise_conv <- mx.symbol.Convolution(data = expend_relu, kernel = c(3, 3), stride = c(1, 1), pad = c(1, 1),
                                      no.bias = TRUE, num.filter = num_filters * Inverse_coef, num.group = num_filters * Inverse_coef,
                                      name = paste0(name, '_', stage, '_dwise'))
  dwise_bn <- mx.symbol.BatchNorm(data = dwise_conv, fix_gamma = FALSE, name = paste0(name, '_', stage, '_dwise_bn'))
  dwise_relu <- mx.symbol.LeakyReLU(data = dwise_bn, act.type = 'leaky', slope = 0.1, name = paste0(name, '_', stage, '_dwise_relu'))
  
  restore_conv <- mx.symbol.Convolution(data = dwise_relu, kernel = c(1, 1), stride = c(1, 1), pad = c(0, 0),
                                        no.bias = TRUE, num.filter = num_filters,
                                        name = paste0(name, '_', stage, '_restore'))
  restore_bn <- mx.symbol.BatchNorm(data = restore_conv, fix_gamma = FALSE, name = paste0(name, '_', stage, '_restore_bn'))
  
  if (residual) {
    
    block <- mx.symbol.broadcast_plus(lhs = indata, rhs = restore_bn, name = paste0(name, '_', stage, '_block'))
    return(block)
    
  } else {
    
    restore_relu <- mx.symbol.LeakyReLU(data = restore_bn, act.type = 'leaky', slope = 0.1, name = paste0(name, '_', stage, '_restore_relu'))
    return(restore_relu)
    
  }
  
}

CONV_function <- function (indata, num_filters = 256, name = 'lvl1', stage = 1) {
  
  conv <- mx.symbol.Convolution(data = indata, kernel = c(1, 1), stride = c(1, 1), pad = c(0, 0),
                                no.bias = TRUE, num.filter = num_filters,
                                name = paste0(name, '_', stage, '_conv'))
  bn <- mx.symbol.BatchNorm(data = conv, fix_gamma = FALSE, name = paste0(name, '_', stage, '_bn'))
  relu <- mx.symbol.Activation(data = bn, act.type = 'relu', name = paste0(name, '_', stage, '_relu'))
  
  return(relu)
  
}

YOLO_map_function <- function (indata, final_map = 6, num_box = 1, drop = 0.2, name = 'lvl1') {
  
  dp <- mx.symbol.Dropout(data = indata, p = drop, name = paste0(name, '_drop'))
  
  conv <- mx.symbol.Convolution(data = dp, kernel = c(1, 1), stride = c(1, 1), pad = c(0, 0),
                                no.bias = FALSE, num.filter = final_map, name = paste0(name, '_linearmap'))
  
  inter_split <- mx.symbol.SliceChannel(data = conv, num_outputs = final_map,
                                        axis = 1, squeeze_axis = FALSE, name = paste0(name, "_inter_split"))
  
  new_list <- list()
  
  for (k in 1:final_map) {
    # Features 4 and 5 (width/height) keep the raw linear output; everything else is Sigmoid-transformed.
    # Note: the feature index within each box is k %% (final_map/num_box), not k %% num_box.
    if (!(k %% (final_map/num_box)) %in% c(4:5)) {
      new_list[[k]] <- mx.symbol.Activation(inter_split[[k]], act.type = 'sigmoid', name = paste0(name, "_yolomap_", k))
    } else {
      new_list[[k]] <- inter_split[[k]]
    }
  }
  
  yolomap <- mx.symbol.concat(data = new_list, num.args = final_map, dim = 1, name = paste0(name, "_yolomap"))
  
  return(yolomap)
  
}

yolo_conv_1 <- DWCONV_function(indata = basic_out, num_filters = 320, Inverse_coef = 3, residual = TRUE, name = 'yolo', stage = 1)
yolo_conv_2 <- DWCONV_function(indata = yolo_conv_1, num_filters = 320, Inverse_coef = 3, residual = TRUE, name = 'yolo', stage = 2)
yolo_conv_3 <- CONV_function(indata = yolo_conv_2, num_filters = 320, name = 'yolo', stage = 3)

yolomap <- YOLO_map_function(indata = yolo_conv_3, final_map = 6, drop = 0.2, name = 'final')

# 2. Custom loss function

LOGCOSH_loss_function <- function (indata, inlabel, obj, lambda) {
  
  diff_pred_label <- mx.symbol.broadcast_minus(lhs = indata, rhs = inlabel)
  cosh_diff_pred_label <- mx.symbol.cosh(data = diff_pred_label)
  logcosh_diff_pred_label <- mx.symbol.log(data = cosh_diff_pred_label)
  obj_logcosh_diff_pred_label <- mx.symbol.broadcast_mul(lhs = obj, rhs = logcosh_diff_pred_label)
  LOGCOSH_loss <- mx.symbol.mean(data = obj_logcosh_diff_pred_label, axis = 0:3, keepdims = FALSE)
  
  return(LOGCOSH_loss * lambda)
  
}

Focal_loss_function <- function (indata, inlabel, obj, lambda, gamma = 0, eps = 1e-4) {
  
  log_pred_1 <- mx.symbol.log(data = indata + eps)
  log_pred_2 <- mx.symbol.log(data = 1 - indata + eps)
  multiple_log_pred_label_1 <- mx.symbol.broadcast_mul(lhs = log_pred_1, rhs = inlabel)
  multiple_log_pred_label_2 <- mx.symbol.broadcast_mul(lhs = log_pred_2, rhs = 1 - inlabel)
  obj_weighted_loss <- mx.symbol.broadcast_mul(lhs = obj, rhs = (1 - indata + eps)^gamma * multiple_log_pred_label_1 + (indata + eps)^gamma * multiple_log_pred_label_2)
  average_Focal_loss <- mx.symbol.mean(data = obj_weighted_loss, axis = 0:3, keepdims = FALSE)
  Focal_loss <- 0 - average_Focal_loss * lambda
  
  return(Focal_loss)
  
}

YOLO_loss_function <- function (indata, inlabel, final_map = 6, num_box = 1, lambda = 10, gamma = 2, weight_classification = 0.2, name = 'yolo') {
  
  num_feature <- final_map/num_box
  
  my_loss <- 0
  
  yolomap_split <- mx.symbol.SliceChannel(data = indata, num_outputs = final_map, 
                                          axis = 1, squeeze_axis = FALSE, name = paste0(name, '_yolomap_split'))
  
  label_split <- mx.symbol.SliceChannel(data = inlabel, num_outputs = final_map, 
                                        axis = 1, squeeze_axis = FALSE, name = paste0(name, '_label_split'))
  
  for (j in 1:num_box) {
    for (k in 1:num_feature) {
      if (k %in% 1:5) {weight <- 1} else {weight <- weight_classification}
      if (!k %in% c(2:5)) {
        if (k == 1) {
          my_loss <- my_loss + Focal_loss_function(indata = yolomap_split[[(j-1)*num_feature+k]],
                                                   inlabel = label_split[[(j-1)*num_feature+k]],
                                                   obj = label_split[[(j-1)*num_feature+1]],
                                                   lambda = lambda * weight,
                                                   gamma = gamma,
                                                   eps = 1e-4)
          my_loss <- my_loss + Focal_loss_function(indata = yolomap_split[[(j-1)*num_feature+k]],
                                                   inlabel = label_split[[(j-1)*num_feature+k]],
                                                   obj = 1 - label_split[[(j-1)*num_feature+1]],
                                                   lambda = 1,
                                                   gamma = gamma,
                                                   eps = 1e-4)
        } else {
          my_loss <- my_loss + Focal_loss_function(indata = yolomap_split[[(j-1)*num_feature+k]],
                                                   inlabel = label_split[[(j-1)*num_feature+k]],
                                                   obj = label_split[[(j-1)*num_feature+1]],
                                                   lambda = lambda * weight,
                                                   gamma = gamma,
                                                   eps = 1e-4)
        }
      } else {
        my_loss <- my_loss + LOGCOSH_loss_function(indata = yolomap_split[[(j-1)*num_feature+k]],
                                                   inlabel = label_split[[(j-1)*num_feature+k]],
                                                   obj = label_split[[(j-1)*num_feature+1]],
                                                   lambda = lambda * weight)
      }
    }
  }
  
  return(my_loss)
  
}

label <- mx.symbol.Variable(name = "label")

yolo_loss <- YOLO_loss_function(indata = yolomap, inlabel = label, final_map = 6, num_box = 1, lambda = 10, gamma = 2, weight_classification = 0.2, name = 'yolo')

final_yolo_loss <- mx.symbol.MakeLoss(data = yolo_loss)

Section 4: Anchor Boxes and Multi-Scale Outputs (1)

– The reason is that what we trained today is a YOLO v1 model, while anchor boxes only came into use starting with YOLO v2.

– The anchor box idea is very important; without it, it would be hard to extend YOLO v1 so that each grid cell predicts more than one bounding box.

– The shapes (heights and widths) of the anchor boxes also matter a great deal. Suppose, for example, we are detecting people and vehicles on a road: pedestrians will mostly have height-to-width ratios around 3:1, while cars will be around 1:5, so we can use different anchor boxes to predict objects with similar proportions

– Once anchor boxes are introduced, the Encode and Decode steps become computing the differences relative to the anchor box (a small sketch follows the figure):

F11
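
– A minimal sketch of the width encoding against an anchor (the YOLO v2 parameterization \(b_w = p_w \cdot e^{t_w}\); the numbers are the first training box's width and its matched anchor's width, computed later in this section):

anchor_width <- 0.09873949   # p_w: the matched anchor's width
bbox_width <- 0.09884930     # b_w: the ground-truth width

t_w <- log(bbox_width / anchor_width)   # encode: the label is the log-ratio (about 0.001 here)
anchor_width * exp(t_w)                 # decode: recovers bbox_width exactly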

Section 4: Anchor Boxes and Multi-Scale Outputs (2)

– In the YOLO v2 paper, YOLO9000: Better, Faster, Stronger, Joseph Redmon proposed another way to decide the anchor boxes' shapes: run a clustering analysis on the ground-truth boxes' widths and heights, and then derive several anchor boxes from the clusters.

# box_info_path (Training and Validation set)

original_box_info_path <- 'data/train_box_info.RData'
revised_box_info_path <- 'data/train_box_info (yolo v3).RData'
anchor_boxs_path <- 'data/anchor_boxs (yolo v3).RData'

# Start to define anchor boxes 

load(original_box_info_path)

anchor_box_info <- data.frame(width = log(train_box_info[,3] - train_box_info[,2]),
                              height = log(train_box_info[,4] - train_box_info[,5]),
                              stringsAsFactors = FALSE)

kmean_model <- kmeans(x = anchor_box_info, centers = 9, iter.max = 10)

anchor_boxs <- as.data.frame(kmean_model$centers, stringsAsFactors = FALSE)
anchor_boxs$width <- exp(anchor_boxs$width)
anchor_boxs$height <- exp(anchor_boxs$height)
anchor_boxs$rank <- rank(anchor_boxs[,1] * anchor_boxs[,2])
anchor_boxs$lvl <- ceiling(anchor_boxs$rank / 3)
anchor_boxs$seq <- anchor_boxs$rank %% 3 + 1
anchor_boxs$col <- rainbow(9)[anchor_boxs$rank]

anchor_boxs
##        width    height rank lvl seq       col
## 1 0.07782336 0.1312339    2   1   3 #FFAA00FF
## 2 0.09309299 0.1455360    5   2   3 #00FFAAFF
## 3 0.11728431 0.1439148    7   3   2 #0000FFFF
## 4 0.09873949 0.1612554    6   2   1 #00AAFFFF
## 5 0.11056990 0.1719687    8   3   3 #AA00FFFF
## 6 0.08775265 0.1329813    3   1   1 #AAFF00FF
## 7 0.13994964 0.1717804    9   3   1 #FF00AAFF
## 8 0.08017669 0.1209479    1   1   2 #FF0000FF
## 9 0.10678621 0.1261838    4   2   2 #00FF00FF
# Visualization

par(mar = c(5, 4, 4, 2))

plot(exp(anchor_box_info$width), exp(anchor_box_info$height), pch = 19, cex = 1.5,
     col = anchor_boxs$col[kmean_model$cluster], 
     xlab = 'Width', ylab = 'Height', main = 'Anchor box clusters')

Section 4: Using Anchor Boxes and Multi-Scale Outputs (3)

# Add anchor box info to train_box_info

train_box_info$bbox_center_row <- (train_box_info[,4] + train_box_info[,5])/2  # (row_bot + row_top) / 2
train_box_info$bbox_center_col <- (train_box_info[,2] + train_box_info[,3])/2  # (col_left + col_right) / 2
train_box_info$bbox_width <- exp(anchor_box_info$width)    # each box's own width (anchor_box_info stores logs)
train_box_info$bbox_height <- exp(anchor_box_info$height)  # each box's own height
train_box_info$anchor_width <- anchor_boxs$width[kmean_model$cluster]    # matched anchor for each box
train_box_info$anchor_height <- anchor_boxs$height[kmean_model$cluster]
train_box_info$rank <- anchor_boxs$rank[kmean_model$cluster]
train_box_info$lvl <- anchor_boxs$lvl[kmean_model$cluster]  # which output scale handles this box
train_box_info$seq <- anchor_boxs$seq[kmean_model$cluster]  # which anchor slot within that scale

head(train_box_info)
##   obj_name  col_left col_right   row_bot   row_top prob img_id bbox_center_row
## 1  pikachu 0.6267570 0.7256063 0.4658268 0.3013253    1      1       0.3835761
## 2  pikachu 0.5070340 0.5993253 0.4963081 0.3682864    1      2       0.4322973
## 3  pikachu 0.5904536 0.6917713 0.5608004 0.3917792    1      3       0.4762898
## 4  pikachu 0.5722729 0.6571676 0.5396996 0.4144326    1      4       0.4770661
## 5  pikachu 0.3893552 0.5016431 0.4850163 0.3470082    1      5       0.4160123
## 6  pikachu 0.3819232 0.4916472 0.5595707 0.4213461    1      6       0.4904584
##   bbox_center_col bbox_width bbox_height anchor_width anchor_height rank lvl
## 1       0.6761816 0.09884930   0.1645015   0.09873949     0.1612554    6   2
## 2       0.5531797 0.09229130   0.1280217   0.08775265     0.1329813    3   1
## 3       0.6411124 0.10131770   0.1690212   0.09873949     0.1612554    6   2
## 4       0.6147203 0.08489472   0.1252670   0.08017669     0.1209479    1   1
## 5       0.4454992 0.11228791   0.1380081   0.11728431     0.1439148    7   3
## 6       0.4367852 0.10972404   0.1382246   0.11728431     0.1439148    7   3
##   seq
## 1   1
## 2   1
## 3   1
## 4   2
## 5   2
## 6   2
# Save data

save(train_box_info, file = revised_box_info_path)

anchor_boxs <- anchor_boxs[order(anchor_boxs$rank),]
rownames(anchor_boxs) <- 1:nrow(anchor_boxs)

save(anchor_boxs, file = anchor_boxs_path)

Section 4: Using Anchor Boxes and Multi-Scale Outputs (4)

Encode_fun <- function (box_info, n.grid = c(32, 16, 8), eps = 1e-8, n.anchor = 3,
                        obj_name = 'pikachu') {
  
  img_IDs <- unique(box_info$img_id)
  num_pred <- 5 + length(obj_name)
  
  out_array_list <- list()
  
  for (k in 1:length(n.grid)) {
    
    out_array_list[[k]] <- array(0, dim = c(n.grid[k], n.grid[k], n.anchor * num_pred, length(img_IDs)))
    
  }
  
  for (j in 1:length(img_IDs)) {
    
    sub_box_info <- box_info[box_info$img_id == img_IDs[j],]
    
    for (k in 1:length(n.grid)) {
      
      if (k %in% sub_box_info$lvl) {
        
        rescale_box_info <- sub_box_info[sub_box_info$lvl == k,c(1, 8:13, 15:16)]
        rescale_box_info[,2:7] <- rescale_box_info[,2:7] * n.grid[k]
        
        for (i in 1:nrow(rescale_box_info)) {
          
          center_row <- ceiling(rescale_box_info$bbox_center_row[i])
          center_col <- ceiling(rescale_box_info$bbox_center_col[i])
          
          row_related_pos <- rescale_box_info$bbox_center_row[i] %% 1
          row_related_pos[row_related_pos == 0] <- 1
          col_related_pos <- rescale_box_info$bbox_center_col[i] %% 1
          col_related_pos[col_related_pos == 0] <- 1
          
          out_array_list[[k]][center_row,center_col,(rescale_box_info$seq[i]-1)*num_pred+1,j] <- 1
          out_array_list[[k]][center_row,center_col,(rescale_box_info$seq[i]-1)*num_pred+2,j] <- row_related_pos
          out_array_list[[k]][center_row,center_col,(rescale_box_info$seq[i]-1)*num_pred+3,j] <- col_related_pos
          out_array_list[[k]][center_row,center_col,(rescale_box_info$seq[i]-1)*num_pred+4,j] <- log(rescale_box_info$bbox_width[i]/rescale_box_info$anchor_width[i] + eps)
          out_array_list[[k]][center_row,center_col,(rescale_box_info$seq[i]-1)*num_pred+5,j] <- log(rescale_box_info$bbox_height[i]/rescale_box_info$anchor_height[i] + eps)
          out_array_list[[k]][center_row,center_col,(rescale_box_info$seq[i]-1)*num_pred+5+which(obj_name %in% rescale_box_info$obj_name[i]),j] <- 1 
          
        }
        
      }
      
    }
    
  }
  
  return(out_array_list)
  
}
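– Encode_fun now returns a list with one label array per scale. A quick sanity check (a sketch assuming the default n.grid = c(32, 16, 8) and the single pikachu class, so each array carries 3 × (5 + 1) = 18 channels):

# One label array per scale, each with n.anchor * (5 + num_class) channels

lab_list <- Encode_fun(box_info = train_box_info[train_box_info$img_id == 1,])
sapply(lab_list, dim)   # expected: 32/16/8 grids, 18 channels, 1 image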

Decode_fun <- function (encode_array_list, anchor_boxs,
                        cut_prob = 0.5, cut_overlap = 0.5,
                        obj_name = 'pikachu',
                        obj_col = '#FF0000FF') {
  
  num_list <- length(encode_array_list)
  num_img <- dim(encode_array_list[[1]])[4]
  num_feature <- length(obj_name) + 5
  pos_start <- (0:(dim(encode_array_list[[1]])[3]/num_feature-1)*num_feature)
  
  box_info <- NULL
  
  # Decoding
  
  for (j in 1:num_img) {
    
    sub_box_info <- NULL
    
    for (k in 1:num_list) {
      
      for (i in 1:length(pos_start)) {
        
        sub_encode_array <- as.array(encode_array_list[[k]])[,,pos_start[i]+1:num_feature,j]
        
        pos_over_cut <- which(sub_encode_array[,,1] >= cut_prob)
        
        if (length(pos_over_cut) >= 1) {
          
          pos_over_cut_row <- pos_over_cut %% dim(sub_encode_array)[1]
          pos_over_cut_row[pos_over_cut_row == 0] <- dim(sub_encode_array)[1]
          pos_over_cut_col <- ceiling(pos_over_cut/dim(sub_encode_array)[1])
          anchor_box <- anchor_boxs[anchor_boxs$lvl == k & anchor_boxs$seq == i, 1:2]
          
          for (l in 1:length(pos_over_cut)) {
            
            encode_vec <- sub_encode_array[pos_over_cut_row[l],pos_over_cut_col[l],]
            
            if (encode_vec[2] < 0) {encode_vec[2] <- 0}
            if (encode_vec[2] > 1) {encode_vec[2] <- 1}
            if (encode_vec[3] < 0) {encode_vec[3] <- 0}
            if (encode_vec[3] > 1) {encode_vec[3] <- 1}
            
            center_row <- (encode_vec[2] + (pos_over_cut_row[l] - 1))/dim(sub_encode_array)[1]
            center_col <- (encode_vec[3] + (pos_over_cut_col[l] - 1))/dim(sub_encode_array)[2]
            width <- exp(encode_vec[4]) * anchor_box[1,1]
            height <- exp(encode_vec[5]) * anchor_box[1,2]
            
            new_box_info <- data.frame(obj_name = obj_name[which.max(encode_vec[-c(1:5)])],
                                       col_left = center_col-width/2,
                                       col_right = center_col+width/2,
                                       row_bot = center_row+height/2,
                                       row_top = center_row-height/2,
                                       prob = encode_vec[1],
                                       img_ID = j,
                                       col = obj_col[which.max(encode_vec[-c(1:5)])],
                                       stringsAsFactors = FALSE)
            
            sub_box_info <- rbind(sub_box_info, new_box_info)
            
          }
          
        }
        
      }
      
    }
    
    if (!is.null(sub_box_info)) {
      
      # Remove overlapping
      
      sub_box_info <- sub_box_info[order(sub_box_info$prob, decreasing = TRUE),]
      
      for (obj in unique(sub_box_info$obj_name)) {
        
        obj_sub_box_info <- sub_box_info[sub_box_info$obj_name == obj,]
        
        if (nrow(obj_sub_box_info) == 1) {
          
          box_info <- rbind(box_info, obj_sub_box_info)
          
        } else {
          
          overlap_seq <- NULL
          
          for (m in 2:nrow(obj_sub_box_info)) {
            
            for (n in 1:(m-1)) {
              
              if (!n %in% overlap_seq) {
                
                overlap_prob <- IoU_function(label = obj_sub_box_info[m,2:5], pred = obj_sub_box_info[n,2:5])
                
                if (overlap_prob >= cut_overlap) {
                  
                  overlap_seq <- c(overlap_seq, m)
                  
                }
                
              }
              
            }
            
          }
          
          if (!is.null(overlap_seq)) {
            
            obj_sub_box_info <- obj_sub_box_info[-overlap_seq,]
            
          }
          
          box_info <- rbind(box_info, obj_sub_box_info)
          
        }
        
      }
      
    }
    
  }
  
  return(box_info)
  
}
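– Note that Decode_fun relies on the IoU_function defined earlier in this lesson for its non-maximum suppression step. For reference, a minimal version consistent with the (col_left, col_right, row_bot, row_top) column layout used here might look like this (an assumption, not necessarily the exact earlier definition):

# A minimal IoU sketch: label and pred are 1-row slices with columns
# (col_left, col_right, row_bot, row_top), where row_bot > row_top

IoU_function <- function (label, pred) {
  overlap_width <- min(label[,2], pred[,2]) - max(label[,1], pred[,1])
  overlap_height <- min(label[,3], pred[,3]) - max(label[,4], pred[,4])
  if (overlap_width <= 0 | overlap_height <= 0) {return(0)}
  overlap_area <- overlap_width * overlap_height
  label_area <- (label[,2] - label[,1]) * (label[,3] - label[,4])
  pred_area <- (pred[,2] - pred[,1]) * (pred[,3] - pred[,4])
  return(overlap_area / (label_area + pred_area - overlap_area))
}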

Section 4: Using Anchor Boxes and Multi-Scale Outputs (5)

# Load data (Training set)

load('data/train_img_list.RData')
load('data/train_box_info (yolo v3).RData')
load('data/anchor_boxs (yolo v3).RData')

# Test Encode & Decode function

img_id <- 1

resized_img <- readJPEG(train_img_list[[img_id]])

sub_BOX_INFOS <- train_box_info[train_box_info$img_id %in% img_id,]

Encode_label <- Encode_fun(box_info = sub_BOX_INFOS)
restore_BOX_INFOS <- Decode_fun(encode_array_list = Encode_label, anchor_boxs = anchor_boxs)

Show_img(img = resized_img, box_info = restore_BOX_INFOS, show_grid = FALSE)

– The next step is to train the model and run predictions; for this part, please download the ready-made scripts: pikachu object detection (multi boxes).R and predict (multi boxes).R.

Closing Remarks

– Now that you have finished this lesson, go back and look at the homework from the previous lesson. Don't you understand how it works much better now?

– Because there are so many hyperparameters to tune, you will probably need to train the model many times; in the end, write up your experiments as a short report and share it with your classmates.

– If you really want to train an object detection model, be sure to work through the VOC2007 part of the GitHub example MxNetR-YOLO once, and then see whether you can apply the whole pipeline to a problem of your own.

– You can use the model from the previous lesson's homework as a pre-trained model for your own task; since its architecture is more similar, the transferred features should work even better!

F13