機器學習及演算法

林嶔 (Lin, Chin)

Lesson 2 統計分析實作1(描述性統計及描述統計圖表)

第一節:描述性統計(1)

– 請至這裡下載範例資料

dat <- read.csv("ECG_train.csv", header = TRUE, fileEncoding = 'CP950', stringsAsFactors = FALSE, na.strings = "")
  1. 函數「mean()」可以幫助我們計算平均值
mean(dat$AGE, na.rm = TRUE)
## [1] 61.14584
  1. 函數「sd()」可以幫助我們計算標準差
sd(dat$AGE, na.rm = TRUE)
## [1] 18.45581
  1. 函數「var()」可以幫助我們計算變異數
var(dat$AGE, na.rm = TRUE)
## [1] 340.6169
  1. 函數「median()」可以幫助我們計算中位數
median(dat$AGE, na.rm = TRUE)
## [1] 61.96178
  1. 函數「quantile()」可以幫助我們計算百分位數
quantile(dat$AGE, na.rm = TRUE)
##        0%       25%       50%       75%      100% 
##  20.03530  48.18833  61.96178  75.82827 102.63361
quantile(dat$AGE, 0.5, na.rm = TRUE)
##      50% 
## 61.96178
quantile(dat$AGE, 0.95, na.rm = TRUE)
##      95% 
## 89.31906
  1. 函數「min()」可以幫助我們找出最小值
min(dat$AGE, na.rm = TRUE)
## [1] 20.0353
  1. 函數「max()」可以幫助我們找出最大值
max(dat$AGE, na.rm = TRUE)
## [1] 102.6336

第一節:描述性統計(2)

  1. 函數「table()」可以幫助我們產生列聯表
table(dat$GENDER)
## 
## female   male 
##   2172   2828
table(dat$GENDER, dat$LVD)
##         
##            0   1
##   female 459 247
##   male   703 703
  1. 函數「prop.table()」可以幫助我們產生列聯表的百分比
tab1 <- table(dat$GENDER)
prop.table(tab1)
## 
## female   male 
## 0.4344 0.5656
tab2 <- table(dat$GENDER, dat$LVD)
prop.table(tab2)
##         
##                  0         1
##   female 0.2173295 0.1169508
##   male   0.3328598 0.3328598
prop.table(tab2, margin = 1)
##         
##                  0         1
##   female 0.6501416 0.3498584
##   male   0.5000000 0.5000000
prop.table(tab2, margin = 2)
##         
##                  0         1
##   female 0.3950086 0.2600000
##   male   0.6049914 0.7400000

第二節:基礎繪圖函數簡介(1)

– 在R裡面,我們能夠畫出任何統計圖!

  1. 直方圖:需要使用函數「hist()」
hist(dat[,"AGE"])

  1. 盒鬚圖:需要使用函數「boxplot()」
boxplot(dat[,"AGE"])

  1. 圓餅圖:需要使用函數「pie()」以及函數「table()」
pie(table(dat[,"AMI"]))

  1. 長條圖:需要使用函數「barplot()」以及函數「table()」
barplot(table(dat[,"AMI"]))

第二節:基礎繪圖函數簡介(2)

– 在R裡面的顏色可以在Colors in R裡查看

– 另外,這裡教一個新函數「par()」,他可以指定繪圖環境。其中最常見的應用為把4張圖放在同一張畫布內:

par(mfrow = c(2, 2))
hist(dat[,"AGE"], col = "red")
boxplot(dat[,"AGE"], col = "blue")
pie(table(dat[,"AMI"]), col = c("blue", "red", "green"))
barplot(table(dat[,"AMI"]), col = c("gray90", "gray50", "gray10"))

pdf("plot1.pdf", height = 8, width = 8, family = "serif")
par(mfrow = c(2, 2))
hist(dat[,"AGE"], col = "red")
boxplot(dat[,"AGE"], col = "blue")
pie(table(dat[,"AMI"]), col = c("blue", "red", "green"))
barplot(table(dat[,"AMI"]), col = c("gray90", "gray50", "gray10"))
dev.off()

練習1:簡單繪圖

練習1答案

boxplot(count ~ spray, data = InsectSprays, col = "lightgray")

head(InsectSprays)
##   count spray
## 1    10     A
## 2     7     A
## 3    20     A
## 4    14     A
## 5    14     A
## 6    12     A
boxplot(dat[,"AGE"] ~ dat[,"LVD"], col = c("blue", "red"), ylab = "Age", xlab = "LVD", main = "Age value by LVD status", lwd = 1.5)

第三節:繪圖物件簡介(1)

plot(dat[,"AGE"], dat[,"Rate"], ylab = "Heart rate", xlab = "AGE", main = "Scatter plot of AGE and Heart rate")

plot(dat[,"AGE"], dat[,"Rate"], ylab = "Heart rate", xlab = "AGE", main = "Scatter plot of AGE and Heart rate", pch = 19)

第三節:繪圖物件簡介(2)

– 函數「lines()」的效果是按照順序把幾個點連起來,舉例來說…

– 註:函數「plot.new()」及函數「plot.window()」是拿來開一張新畫布用的!

x <- c(1, 4, 7)
y <- c(2, 9, 6)
plot.new()
plot.window(xlim = c(0, 10), ylim = c(0, 10))
lines(x, y)

z <- 0:1000/100
x <- sin(z) #三角函數sin
y <- cos(z) #三角函數cos
plot.new()
plot.window(xlim = c(-1, 1), ylim = c(-1, 1))
lines(x, y)

第三節:繪圖物件簡介(3)

– 預測線的方程式,需要先學會線性回歸,在R語言裡面線性回歸是這樣建立的:

# 建立MODEL以及預測線的座標
X <- dat[,"AGE"]
Y <- dat[,"Rate"]
model <- lm(Y~X)
model
## 
## Call:
## lm(formula = Y ~ X)
## 
## Coefficients:
## (Intercept)            X  
##     78.9313       0.1021
x <- c(10, 150)
y <- 78.9313 + 0.1021 * x

plot(dat[,"AGE"], dat[,"Rate"], ylab = "Heart rate", xlab = "AGE", main = "Scatter plot of AGE and Heart rate", pch = 19)
lines(x, y, col = "red", lwd = 2)

plot(dat[,"AGE"], dat[,"Rate"], ylab = "Heart rate", xlab = "AGE", main = "Scatter plot of AGE and Heart rate", pch = 19)
abline(model, col = "red", lwd = 2)

第三節:繪圖物件簡介(4)

class(model)
## [1] "lm"

– 函數「ls()」可以協助我們看看物件中有哪些東西

– 函數「names()」也可以做到一樣的事情

ls(model)
##  [1] "assign"        "call"          "coefficients"  "df.residual"  
##  [5] "effects"       "fitted.values" "model"         "qr"           
##  [9] "rank"          "residuals"     "terms"         "xlevels"
COEF <- model$coefficients
COEF
## (Intercept)           X 
##  78.9312550   0.1021352
x <- c(10, 150)
y = COEF[1] + COEF[2] * x

plot(dat[,"AGE"], dat[,"Rate"], ylab = "Heart rate", xlab = "AGE", main = "Scatter plot of AGE and Heart rate", pch = 19)
lines(x, y, col = "red", lwd = 2)

第三節:繪圖物件簡介(5)

  1. 函數「text()」可以為你的圖片上加文字描述
x = c(1, 0, -1, 0)
y = c(0, 1, 0, -1)
t = c("A", "B", "C", "D")
plot.new()
plot.window(xlim = c(-1, 1), ylim = c(-1, 1))
text(x, y, t)

  1. 函數「points()」可以為你的圖片上加點
x = c(1, 0, -1, 0)
y = c(0, 1, 0, -1)
plot.new()
plot.window(xlim = c(-1, 1), ylim = c(-1, 1))
points(x, y, pch = 1:4)

  1. 函數「legend()」可以為你的圖片加上註釋
plot.new()
plot.window(xlim = c(-1, 1), ylim = c(-1, 1))
legend("topleft", c("Female", "Male"), col = c("red", "blue"), pch = c(15, 19), bg = "gray90")
legend(0, 0, c("estimates", "95% CI"), lty = c(1, 2), lwd = 2, col = "black")

  1. 函數「polygon()」可以畫多邊形
x = c(1, 0, -1, 0)
y = c(0, 1, 0, -1)
plot.new()
plot.window(xlim = c(-1, 1), ylim = c(-1, 1))
polygon(x, y, col = "green")