Machine Learning 3 - Neural Networks and Multilayer Perceptrons

林嶔 (Lin, Chin)

Lesson 21

Introduction to Neural Networks (1)

– Recall what we said at the start of the second half of the semester: if we could clearly describe the actual relationship between X and Y, we could make precise predictions. The current difficulty is that we simply cannot articulate that logic.

F21_1

Introduction to Neural Networks (2)

F21_2

– The structure of a nerve cell is shown below. Regardless of the type of neuron, it can be divided into a receptive zone, a trigger zone, a conducting zone, and an output zone.

F21_3

F21_4

Introduction to Neural Networks (3)

F21_5

perceptron_v1 = function (x1, x2, w0, w1, w2) {
  # Weighted sum of the inputs plus a bias, followed by a hard threshold at 0
  weighted.sum = w0 + x1 * w1 + x2 * w2
  return(weighted.sum > 0)
}

perceptron_v1(x1 = 1, x2 = 3, w0 = -1, w1 = 2, w2 = -1)
## [1] FALSE
perceptron_v1(x1 = 1, x2 = -2, w0 = -1, w1 = 2, w2 = -1)
## [1] TRUE

– Since this is just a simplified version of logistic regression, there is little more to say: we already know the limits of logistic regression very well.

Introduction to Neural Networks (4)

perceptron_v2 = function (x1, x2, w0, w1, w2) {
  weighted.sum = w0 + x1 * w1 + x2 * w2
  # Replace the hard threshold with a sigmoid, so the output is a probability
  prop = 1/(1+exp(-weighted.sum))
  return(prop)
}

perceptron_v2(x1 = 1, x2 = 3, w0 = -1, w1 = 2, w2 = -1)
## [1] 0.1192029
perceptron_v2(x1 = 1, x2 = -2, w0 = -1, w1 = 2, w2 = -1)
## [1] 0.9525741
set.seed(0)
x1 = rnorm(1000) 
x2 = rnorm(1000) 
# The class depends on x1^2 + x2^2, so the two classes are not linearly separable
lr1 = x1^2 + x2^2
p1 = 1/(1+exp(-lr1))
y1 = p1 > mean(p1)

plot(x1, x2, col = (y1 + 1)*2, pch = 19)

– Now we solve for the weights, using maximum likelihood estimation with the sample likelihood as the objective.

library(stats4)

# Despite its name, this returns the negative log-likelihood, which mle() will minimize
Accuracy.y1 = function (w0, w1, w2) {
  pred.y1 = perceptron_v2(x1 = x1, x2 = x2, w0 = w0, w1 = w1, w2 = w2)
  lr = (log(pred.y1)*y1 + log(1-pred.y1)*(1-y1))
  return(-sum(lr))
}

fit1 = mle(Accuracy.y1, start = list(w0 = 0, w1 = 0, w2 = 0), method = "SANN")

pred.y1 = perceptron_v2(x1 = x1, x2 = x2,
                        w0 = fit1@coef[1], w1 = fit1@coef[2], w2 = fit1@coef[3])
tab1 = table(pred.y1>0.5, y1)
print(tab1)
##        y1
##         FALSE TRUE
##   FALSE     0   17
##   TRUE    465  518
cat("Accuracy (Perceptron) = ", sum(diag(tab1))/sum(tab1))
## Accuracy (Perceptron) =  0.518

– But now things get interesting. After all, the brain is not made up of a single neuron, so what happens if we stack several neurons together? (Put another way: what if we combine several logistic regressions?)

Introduction to Neural Networks (5)

F21_6

perceptron_v2 = function (x1, x2, w0, w1, w2) {
  weighted.sum = w0 + x1 * w1 + x2 * w2
  prop = 1/(1+exp(-weighted.sum))
  return(prop)
}

mynet = function (x1, x2, w01, w11, w21, w02, w12, w22, z0, z1, z2) {
  # Two hidden neurons (h1, h2) take the raw inputs; one output neuron combines them
  h1 = perceptron_v2(x1 = x1, x2 = x2, w0 = w01, w1 = w11, w2 = w21)
  h2 = perceptron_v2(x1 = x1, x2 = x2, w0 = w02, w1 = w12, w2 = w22)
  o1 = perceptron_v2(x1 = h1, x2 = h2, w0 = z0, w1 = z1, w2 = z2)
  return(o1)
}

mynet(x1 = 0, x2 = 1,
      w01 = 0.1, w11 = 0.2, w21 = 0.3,
      w02 = 0.4, w12 = 0.5, w22 = 0.6,
      z0 = 0.7, z1 = 0.8, z2 = 0.9)
## [1] 0.862582
Accuracy_mynet.y1 = function (w01, w11, w21, w02, w12, w22, z0, z1, z2) {
  pred.y1 = mynet(x1 = x1, x2 = x2,
                  w01 = w01, w11 = w11, w21 = w21,
                  w02 = w02, w12 = w12, w22 = w22,
                  z0 = z0, z1 = z1, z2 = z2)
  lr = (log(pred.y1)*y1 + log(1-pred.y1)*(1-y1))
  return(-sum(lr))
}

fit3 = mle(Accuracy_mynet.y1, start = list(w01 = 0, w11 = 0, w21 = 0, w02 = 0, w12 = 0, w22 = 0, z0 = 0, z1 = 0, z2 = 0), method = "SANN")

print(fit3)
## 
## Call:
## mle(minuslogl = Accuracy_mynet.y1, start = list(w01 = 0, w11 = 0, 
##     w21 = 0, w02 = 0, w12 = 0, w22 = 0, z0 = 0, z1 = 0, z2 = 0), 
##     method = "SANN")
## 
## Coefficients:
##        w01        w11        w21        w02        w12        w22 
## 10.1333278  2.6968961 -7.9747142  7.3321474 -0.9534551  6.3121327 
##         z0         z1         z2 
## 14.3265385 -8.1730799 -7.1797248
pred.y1 = mynet(x1 = x1, x2 = x2,
                w01 = fit3@coef[1], w11 = fit3@coef[2], w21 = fit3@coef[3],
                w02 = fit3@coef[4], w12 = fit3@coef[5], w22 = fit3@coef[6],
                z0 = fit3@coef[7], z1 = fit3@coef[8], z2 = fit3@coef[9])

tab2 = table(pred.y1>0.5, y1)
print(tab2)
##        y1
##         FALSE TRUE
##   FALSE   432  181
##   TRUE     33  354
cat("Accuracy (Neural Network) = ", sum(diag(tab2))/sum(tab2))
## Accuracy (Neural Network) =  0.786

Introduction to Neural Networks (6)

– To state it mathematically: if there exists an unknown function that maps X to Y perfectly, then a neural network has the capacity to approximate that function, no matter how complex it is.
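
A brief mathematical sketch of that claim (not part of the original slides): for a single hidden layer of N sigmoid units, the universal approximation property says that for any continuous target function f on a bounded region K and any tolerance epsilon > 0, weights can be chosen so that

\hat{f}(x) = \sum_{j=1}^{N} v_j \, \sigma\!\left( w_j^{\top} x + b_j \right), \qquad \sup_{x \in K} \left| f(x) - \hat{f}(x) \right| < \epsilon

where \sigma(u) = 1/(1+e^{-u}) is the same sigmoid used above; a sufficiently large N may be needed when the target function is very complex.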

– Doesn't that make it just like our own brain? Even though we cannot say how handwritten digits are recognized (we cannot articulate the logic), we are still able to recognize them.

– However, because this structure is a combination of multiple perceptrons, it is usually called a "multilayer perceptron" to distinguish it by name from the neural networks we will meet later.

Exercise 1

– Before we start writing code, let's first try the 4 data distributions shown in the upper left, and try to approximate each of them with neural networks ranging from simple to complex.
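
As one possible illustration (a hedged sketch; the XOR-style pattern below is an assumption and may differ from the four cases in the figure), such a nonlinear dataset can be simulated in the same style as the earlier example and then approximated with mynet:

# Hypothetical XOR-like dataset (assumed pattern, for illustration only)
set.seed(1)
ex.x1 = rnorm(1000)
ex.x2 = rnorm(1000)
ex.y = (ex.x1 * ex.x2) > 0   # class depends on the sign of the product: not linearly separable

plot(ex.x1, ex.x2, col = (ex.y + 1)*2, pch = 19)
# A single perceptron (as in fit1) cannot separate this pattern,
# while a small network such as mynet (as in fit3) can.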

Introduction to the MxNet Package and Multilayer Perceptron Implementation (1)

– One thing to note: MxNet can only be installed on 64-bit operating systems.

– Its installation procedure is somewhat unusual, and a GPU version is also available; the following installs the CPU version:

cran <- getOption("repos")
cran["dmlc"] <- "https://s3-us-west-2.amazonaws.com/apache-mxnet/R/CRAN/"
options(repos = cran)
install.packages("mxnet")

Introduction to the MxNet Package and Multilayer Perceptron Implementation (2)

– Please download the MNIST handwritten digit data here

– Let's review its data structure once more

DAT = read.csv("data/train.csv")
DAT = data.matrix(DAT)

#Split data

set.seed(0)
Train.sample = sample(1:nrow(DAT), nrow(DAT)*0.6, replace = FALSE)

Train.X = DAT[Train.sample,-1]/255
Train.Y = DAT[Train.sample,1]
Test.X = DAT[-Train.sample,-1]/255
Test.Y = DAT[-Train.sample,1]

#Display

library(imager)

par(mar=rep(0,4), mfcol = c(4, 4))
for (i in 1:16) {
  plot(NA, xlim = 0:1, ylim = 0:1, xaxt = "n", yaxt = "n", bty = "n")
  img = as.raster(t(matrix(as.numeric(Train.X[i,]), nrow = 28)))
  rasterImage(img, -0.04, -0.04, 1.04, 1.04, interpolate=FALSE)
  text(0.05, 0.95, Train.Y[i], col = "green", cex = 2)
}

Introduction to the MxNet Package and Multilayer Perceptron Implementation (3)

– First, define the neural network

library(mxnet)

data <- mx.symbol.Variable("data")
fc1 <- mx.symbol.FullyConnected(data, name="fc1", num_hidden=128)
act1 <- mx.symbol.Activation(fc1, name="sigmoid1", act_type="sigmoid")
fc2 <- mx.symbol.FullyConnected(act1, name="fc2", num_hidden=64)
act2 <- mx.symbol.Activation(fc2, name="sigmoid2", act_type="sigmoid")
fc3 <- mx.symbol.FullyConnected(act2, name="fc3", num_hidden=10)
softmax <- mx.symbol.SoftmaxOutput(fc3, name="sm")

  1. The raw features, 28*28 = 784 in total, serve as the input neurons (input layer) and feed into 128 neurons (hidden layer 1)

  2. Once the weighted sums are computed, a "sigmoid" transformation is applied

  3. The 128 values produced by the 128 neurons of hidden layer 1 serve as second-order features and feed into 64 neurons (hidden layer 2)

  4. Once the weighted sums are computed, a "sigmoid" transformation is applied

  5. The 64 values produced by the 64 neurons of hidden layer 2 serve as third-order features and feed into 10 neurons (output layer)

  6. The "softmax" function is used to make the prediction (see the formula sketched after this list)
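
A brief sketch of step 6 (the formula itself is not written out in the original slides): softmax turns the 10 raw output-layer scores o_1, ..., o_10 into class probabilities that sum to 1, which is exactly what the manual Softmax.Output computation in a later section reproduces:

P(y = k \mid x) = \frac{e^{o_k}}{\sum_{j=1}^{10} e^{o_j}}, \qquad k = 1, \dots, 10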

– Start training (20 rounds for now)

mx.set.seed(0)
model = mx.model.FeedForward.create(softmax, X = Train.X, y = Train.Y,
                                    ctx = mx.cpu(), num.round = 20, array.batch.size = 100,
                                    learning.rate = 0.05, momentum = 0.9,
                                    eval.metric = mx.metric.accuracy,
                                    epoch.end.callback = mx.callback.log.train.metric(100))

Introduction to the MxNet Package and Multilayer Perceptron Implementation (4)

prop.y = predict(model, Test.X[1:2,])
round(prop.y, 3)
##        [,1]  [,2]
##  [1,] 0.993 0.000
##  [2,] 0.000 0.999
##  [3,] 0.000 0.000
##  [4,] 0.000 0.000
##  [5,] 0.000 0.000
##  [6,] 0.004 0.000
##  [7,] 0.001 0.000
##  [8,] 0.000 0.000
##  [9,] 0.000 0.000
## [10,] 0.001 0.000
params = model$arg.params

Input = matrix(Test.X[1,], nrow = 1) # 1x784

Weight_1 = as.matrix(as.array(params$fc1_weight)) #784x128
Bias_1 = t(as.matrix(as.array(params$fc1_bias))) #1x128
Hidden_1 = Input %*% Weight_1 + Bias_1 # 1x128

Sigmoid_1 = 1/(1+exp(-Hidden_1)) # 1x128

Weight_2 = as.matrix(as.array(params$fc2_weight)) #128x64
Bias_2 = t(as.matrix(as.array(params$fc2_bias))) #1x64
Hidden_2 = Sigmoid_1 %*% Weight_2 + Bias_2 # 1x64

Sigmoid_2 = 1/(1+exp(-Hidden_2)) # 1x64

Weight_3 = as.matrix(as.array(params$fc3_weight)) #64x10
Bias_3 = t(as.matrix(as.array(params$fc3_bias))) #1x10
Output = Sigmoid_2 %*% Weight_3 + Bias_3 # 1x10

Softmax.Output = exp(Output)/sum(exp(Output)) # 1x10
round(Softmax.Output, 3)
##       [,1] [,2] [,3] [,4] [,5]  [,6]  [,7] [,8] [,9] [,10]
## [1,] 0.993    0    0    0    0 0.004 0.001    0    0 0.001

Introduction to the MxNet Package and Multilayer Perceptron Implementation (5)

– The "ReLU" function is very simple: negative values are treated as 0, and positive values are left unchanged.
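
A minimal sketch of ReLU in R (not from the original slides); in the MxNet network defined above it would be used by replacing act_type="sigmoid" with act_type="relu" in mx.symbol.Activation:

relu = function (x) {
  # Negative values become 0, positive values are left unchanged
  return(ifelse(x > 0, x, 0))
}

relu(c(-2, -0.5, 0, 1.5, 3))
## [1] 0.0 0.0 0.0 1.5 3.0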