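Section 1: Introduction to Basic Plotting Functions, Part 1 (1)

Presenting information as a graphic rather than as a table or text usually lets readers take it in faster.
In R, we can draw practically any statistical chart!

– Please download this week's example data here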

dat = read.csv("Example_data.csv", header = TRUE)
head(dat)

##       eGFR Disease Survival.time Death Diabetes Cancer      SBP      DBP
## 1 34.65379       1     0.4771037     0        0      1 121.2353 121.3079
## 2 37.21183       1     3.0704424     0        1      1 122.2000 122.6283
## 3 32.60074       1     0.2607117     1        0      0 118.9136 121.7621
## 4 29.68481       1            NA    NA        0      0 118.2212 112.7043
## 5 28.35726       0     0.1681673     1        0      0 116.7469 115.7705
## 6 33.95012       1     1.2238556     0        0      0 119.9936 116.3872
##   Education Income
## 1         2      0
## 2         2      0
## 3         0      0
## 4         1      0
## 5         0      0
## 6         1      0

Section 1: Introduction to Basic Plotting Functions, Part 1 (2)

Let's start with a few simple statistical charts.

Histogram: use the function "hist()"

hist(dat[,"eGFR"])

Box plot: use the function "boxplot()"

boxplot(dat[,"eGFR"])

Pie chart: use the function "pie()" together with the function "table()"

pie(table(dat[,"Education"]))

Bar chart: use the function "barplot()" together with the function "table()"

barplot(table(dat[,"Education"]))
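
As a minimal sketch of what "table()" hands to "pie()" and "barplot()": it counts how often each distinct value occurs. The vector below holds only the six Education values printed by head(dat) above, not the full column, and the name edu_head is made up for this illustration.

```r
# The six Education values shown in the head(dat) output (not the full data set)
edu_head <- c(2, 2, 0, 1, 0, 1)

# table() counts the occurrences of each distinct value
counts <- table(edu_head)
print(counts)

# barplot() draws one bar per category and uses the table names as labels
barplot(counts, xlab = "Education", ylab = "Count")
```

Because the plotting functions only see the counts, any recoding of the categories has to happen before "table()" is called.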

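Section 1: Introduction to Basic Plotting Functions, Part 1 (3)

All of these charts can be varied by passing extra arguments, and the function "help()" lists the arguments each one accepts. For example, we can change the colors of a chart as follows.

– The colors available in R are listed in Colors in R

– Here is also a new function, "par()", which sets graphical parameters. Its most common use is placing four plots on one canvas; mfrow = c(2, 2) splits the canvas into 2 rows and 2 columns, filled row by row: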

par(mfrow = c(2, 2))
hist(dat[,"eGFR"], col = "red")
boxplot(dat[,"eGFR"], col = "blue")
pie(table(dat[,"Education"]), col = c("blue", "red", "green"))
barplot(table(dat[,"Education"]), col = c("gray90", "gray50", "gray10"))
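
One thing worth knowing about "par()": the layout stays in effect for every later plot until it is changed back. The sketch below saves the previous settings and restores them afterwards. It uses the built-in data sets faithful and InsectSprays instead of the lecture data so that it runs on its own; old_par is just an illustrative name.

```r
# par() returns the previous values of the settings it changes
old_par <- par(mfrow = c(2, 2))

hist(faithful$waiting, col = "red")
boxplot(faithful$waiting, col = "blue")
pie(table(InsectSprays$spray))
barplot(table(InsectSprays$spray))

# Restore the previous layout so later plots fill the whole canvas again
par(old_par)
```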

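If you like a plot, you can save it with the function "pdf()". Every plotting command issued after pdf() is written to the file, and you must finish with the function "dev.off()" to close the PDF.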

pdf("plot1.pdf", height = 8, width = 8, family = "serif")
par(mfrow = c(2, 2))
hist(dat[,"eGFR"], col = "red")
boxplot(dat[,"eGFR"], col = "blue")
pie(table(dat[,"Education"]), col = c("blue", "red", "green"))
barplot(table(dat[,"Education"]), col = c("gray90", "gray50", "gray10"))
dev.off()
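
A small sketch for checking that the file was really written. The file name plot_check.pdf and the use of a temporary folder are choices made for this illustration, and the built-in faithful data set stands in for the lecture data.

```r
# Write to a temporary folder so the sketch does not clutter the working directory
out_file <- file.path(tempdir(), "plot_check.pdf")

pdf(out_file, height = 8, width = 8)
hist(faithful$waiting, col = "red")
dev.off()  # without this call the PDF stays open and incomplete

# TRUE once the device has been closed and the file exists on disk
file.exists(out_file)
```

If the file cannot be opened in a PDF viewer, a missing "dev.off()" is the first thing to check.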

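Exercise 1: Adjusting plot parameters

Use the function "help()" to find out how to produce the figure below:

Exercise 1 answer

In the help page for boxplot() you should find this example code:

boxplot(count ~ spray, data = InsectSprays, col = "lightgray")

We can plug our own data into it: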

boxplot(dat[,"eGFR"] ~ dat[,"Disease"], col = c("blue", "red"), ylab = "eGFR", xlab = "Disease", main = "eGFR value by Disease status", lwd = 1.5)
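
The formula in the help example reads "count split by spray". As a sketch of the same idea with the "data =" argument, the code below runs on the built-in InsectSprays data set; with the lecture data the analogous call would presumably be boxplot(eGFR ~ Disease, data = dat, ...). The axis labels and title are made up for this illustration.

```r
# count ~ spray means "count, grouped by spray"; data = says where both columns live
b <- boxplot(count ~ spray, data = InsectSprays,
             col = "lightgray", xlab = "Spray", ylab = "Insect count",
             main = "Insect count by spray")

# boxplot() invisibly returns the statistics it drew
b$names  # group labels
b$n      # number of observations in each group
```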

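Section 2: Introduction to Basic Plotting Functions, Part 2 (1)

Next comes a powerful function, "plot()". It supports many kinds of charts, the most important being the scatter plot: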

plot(dat[,"SBP"], dat[,"DBP"], ylab = "DBP", xlab = "SBP", main = "Scatter plot of SBP and DBP")

We can also change the shape of the points with the pch argument, for example:

plot(dat[,"SBP"], dat[,"DBP"], ylab = "DBP", xlab = "SBP", main = "Scatter plot of SBP and DBP", pch = 19)

The chart below maps each pch number to its symbol:
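
The lookup chart can also be generated in R itself. This is a minimal sketch that lays out the symbols 0 to 25 on a grid; the grid arrangement and the names grid_col and grid_row are choices made here, not part of the original material.

```r
pch_values <- 0:25

# Arrange the 26 symbols in rows of 7
grid_col <- (pch_values %% 7) + 1
grid_row <- 4 - (pch_values %/% 7)

plot.new()
plot.window(xlim = c(0, 8), ylim = c(0, 5))

# bg only affects symbols 21 to 25, which have a fill colour
points(grid_col, grid_row, pch = pch_values, cex = 2, bg = "gray80")

# Print the pch number just below each symbol
text(grid_col, grid_row - 0.4, labels = pch_values, cex = 0.8)
```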

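Section 2: Introduction to Basic Plotting Functions, Part 2 (2)

You can add elements to an existing plot. We start with the function "lines()".

– The function "lines()" connects a series of points in the order given. For example:

– Note: the functions "plot.new()" and "plot.window()" open a new, empty canvas and set its coordinate range.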

x = c(1, 4, 7)
y = c(2, 9, 6)
plot.new()
plot.window(xlim = c(0, 10), ylim = c(0, 10))
lines(x, y)

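Of course, if the points are dense enough, you can even draw a circle!

z = 0:1000/100 # 1001 angles from 0 to 10 radians (more than one full turn of 2*pi)
x = sin(z) # sine
y = cos(z) # cosine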
plot.new()
plot.window(xlim = c(-1, 1), ylim = c(-1, 1))
lines(x, y)
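
On a non-square plotting window the circle above is stretched into an ellipse. A possible refinement, sketched here under the assumption that a true circle is wanted, is to stop at exactly one turn and fix the aspect ratio with the asp argument of "plot.window()". The name theta is introduced for this illustration.

```r
# One full turn: angles from 0 to 2*pi
theta <- seq(0, 2 * pi, length.out = 200)
x <- cos(theta)
y <- sin(theta)

plot.new()
# asp = 1 makes one unit on the x-axis as long as one unit on the y-axis
plot.window(xlim = c(-1, 1), ylim = c(-1, 1), asp = 1)
lines(x, y)
```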

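Section 2: Introduction to Basic Plotting Functions, Part 2 (3)

Now that we know the function "lines()", we can add a prediction line to the scatter plot.

– The equation of the prediction line comes from the function "lm()". Can you follow the code below?

# Fit the model and compute the coordinates of the prediction line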
X = dat[,"SBP"]
Y = dat[,"DBP"]
model = lm(Y~X)
COEF = model$coefficients
x = c(0, 200)
y = COEF[1] + COEF[2] * x

plot(dat[,"SBP"], dat[,"DBP"], ylab = "DBP", xlab = "SBP", main = "Scatter plot of SBP and DBP", pch = 19)
lines(x, y, col = "red", lwd = 2)
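
The manual calculation above can be cross-checked against "predict()", and "abline()" can draw the fitted line directly from the model. The sketch below uses the built-in cars data set so that it runs without the lecture file; the names fit, x_line, y_line and y_pred are illustrative.

```r
fit <- lm(dist ~ speed, data = cars)

# Manual line, as in the lecture code: intercept + slope * x
x_line <- range(cars$speed)
y_line <- coef(fit)[1] + coef(fit)[2] * x_line

# predict() gives the same values without handling the coefficients by hand
y_pred <- predict(fit, newdata = data.frame(speed = x_line))

plot(cars$speed, cars$dist, pch = 19, xlab = "speed", ylab = "dist")
lines(x_line, y_line, col = "red", lwd = 2)
abline(fit, col = "blue", lty = 2)  # draws the fitted line across the whole plot
```

The "lines()" version stops at the chosen x values, while "abline()" always spans the full plotting region.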

Section 2: Introduction to Basic Plotting Functions, Part 2 (4)

There is more you can add to a plot.

The function "text()" adds text labels to your plot

x = c(1, 0, -1, 0)
y = c(0, 1, 0, -1)
t = c("A", "B", "C", "D")
plot.new()
plot.window(xlim = c(-1, 1), ylim = c(-1, 1))
text(x, y, t)

The function "points()" adds points to your plot

x = c(1, 0, -1, 0)
y = c(0, 1, 0, -1)
plot.new()
plot.window(xlim = c(-1, 1), ylim = c(-1, 1))
points(x, y, pch = 1:4)

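The function "legend()" adds a legend to your plot; its position can be a keyword such as "topleft" or a pair of x, y coordinates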

plot.new()
plot.window(xlim = c(-1, 1), ylim = c(-1, 1))
legend("topleft", c("Female", "Male"), col = c("red", "blue"), pch = c(15, 19), bg = "gray90")
legend(0, 0, c("estimates", "95% CI"), lty = c(1, 2), lwd = 2, col = "black")

The function "polygon()" draws a polygon

x = c(1, 0, -1, 0)
y = c(0, 1, 0, -1)
plot.new()
plot.window(xlim = c(-1, 1), ylim = c(-1, 1))
polygon(x, y, col = "green")
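
A canvas opened with "plot.new()" and "plot.window()" has no axes, frame or titles. As a sketch, the base functions "axis()", "box()" and "title()" can add them back, and the building blocks above can be layered on one canvas. The label positions (scaled by 0.8) and the title text are arbitrary choices for this illustration.

```r
x <- c(1, 0, -1, 0)
y <- c(0, 1, 0, -1)

plot.new()
plot.window(xlim = c(-1, 1), ylim = c(-1, 1))

# Layer the elements: filled shape first, then points, then labels
polygon(x, y, col = "green")
points(x, y, pch = 19)
text(x * 0.8, y * 0.8, c("A", "B", "C", "D"))

axis(1)  # x-axis at the bottom
axis(2)  # y-axis on the left
box()    # frame around the plotting region
title(main = "Hand-built canvas", xlab = "x", ylab = "y")
```

Elements are painted in call order, so anything drawn later covers what was drawn earlier.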

Exercise 2: Building a chart by hand

You already know how to draw a bar chart, right? Now draw the figure below!

##   Variable Disease:0   Disease:1   p-value
## 1 "eGFR"   "33.2±6.4" "34.4±7.2" "0.020"

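Exercise 2 answer

We can draw it from the numbers above. Each bar is a group mean, and each error bar is a 95% confidence interval: the mean plus or minus qnorm(0.975) times the standard error, where the standard error is the SD divided by the square root of the group size.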

m0 = 33.2
s0 = 6.4/sqrt(sum(dat[,"Disease"] == 0, na.rm = TRUE))
m1 = 34.4
s1 = 7.2/sqrt(sum(dat[,"Disease"] == 1, na.rm = TRUE))

XXX = c(m0, m1)
barplot(XXX, col = c("gray50", "white"), xlab = "Disease", ylab = "eGFR", ylim = c(0, 43))
lines(c(1.9, 1.9), c(m1 - qnorm(0.975) * s1, m1 + qnorm(0.975) * s1), lwd = 3)
lines(c(1.75, 2.05), c(m1 + qnorm(0.975) * s1, m1 + qnorm(0.975) * s1), lwd = 3)
lines(c(1.75, 2.05), c(m1 - qnorm(0.975) * s1, m1 - qnorm(0.975) * s1), lwd = 3)
lines(c(0.7, 0.7), c(m0 - qnorm(0.975) * s0, m0 + qnorm(0.975) * s0), lwd = 3)
lines(c(0.55, 0.85), c(m0 + qnorm(0.975) * s0, m0 + qnorm(0.975) * s0), lwd = 3)
lines(c(0.55, 0.85), c(m0 - qnorm(0.975) * s0, m0 - qnorm(0.975) * s0), lwd = 3)
lines(c(0.7, 0.7, 1.9, 1.9), c(36, 38, 38, 36), lwd = 3)
text(1.3, 40, "p = 0.020")
legend("topright", c("Control", "Case"), fill = c("gray50", "white"))
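
The x positions 0.7 and 1.9 in the answer are the bar centres that "barplot()" uses by default, and the function returns them, so they need not be typed by hand. The sketch below also replaces the three "lines()" calls per error bar with one "arrows()" call. The group sizes in n are HYPOTHETICAL placeholders, since the real counts come from dat; the means and SDs are the ones in the exercise table.

```r
m <- c(33.2, 34.4)  # group means from the table in the exercise
s <- c(6.4, 7.2)    # group standard deviations from the same table
n <- c(100, 100)    # HYPOTHETICAL group sizes; replace with the real counts
half_width <- qnorm(0.975) * s / sqrt(n)

# barplot() returns the x-coordinates of the bar centres
mids <- barplot(m, col = c("gray50", "white"), names.arg = c("0", "1"),
                xlab = "Disease", ylab = "eGFR", ylim = c(0, 43))

# angle = 90 and code = 3 turn the arrow heads into flat caps at both ends
arrows(mids, m - half_width, mids, m + half_width,
       angle = 90, code = 3, length = 0.1, lwd = 3)
```

Using the returned centres keeps the error bars aligned even if the number or width of the bars changes.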

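Section 3: Color Transparency and Functions (1)

Remember the scatter plot of SBP against DBP? Many of the points seem to lie on top of one another.

– This happens often with large data sets, and then we may want to show the reader how dense the points are in different regions. Neither hollow nor solid symbols solve this on their own: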

plot(dat[,"SBP"], dat[,"DBP"], ylab = "DBP", xlab = "SBP", main = "Scatter plot of SBP and DBP", cex = 2)

plot(dat[,"SBP"], dat[,"DBP"], ylab = "DBP", xlab = "SBP", main = "Scatter plot of SBP and DBP", pch = 19, cex = 2)

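Section 3: Color Transparency and Functions (2)

R uses 6- or 8-digit hexadecimal color codes in the format #[red][green][blue][alpha]. The optional last pair is the alpha channel, which sets opacity: 00 is fully transparent and FF is fully opaque.

– For example, opaque red is "#FF0000" or "#FF0000FF"

– Red at 50% transparency is "#FF000080"

– Black at 50% transparency is "#00000080"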

x = c(1, 0, -1, 0)
y = c(0, 1, 0, -1)
plot.new()
plot.window(xlim = c(-1, 1), ylim = c(-1, 1))
points(x, y, pch = 19, cex = 2, col = c("#FF0000", "#FF0000FF", "#FF000080", "#00000080"))
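
To see how R reads such codes, "col2rgb()" with alpha = TRUE splits them into their four channels on a 0 to 255 scale. A short sketch using the four codes from the example above (the names codes, channels and opacity are illustrative):

```r
codes <- c("#FF0000", "#FF0000FF", "#FF000080", "#00000080")

# One column per colour; rows are red, green, blue and alpha (0 to 255)
channels <- col2rgb(codes, alpha = TRUE)
print(channels)

# Alpha as a fraction: 1 is fully opaque, 0 is fully transparent
opacity <- channels["alpha", ] / 255
```

The 6-digit code and the 8-digit code ending in FF decode to the same values, which is why the first two points look identical.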

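If you would rather not work out the codes yourself, the function "rgb()" can mix colors for you. Its arguments are red, green, blue and alpha, each between 0 and 1 by default: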

rgb(1, 0, 0, 0.5)

## [1] "#FF000080"

rgb(0.7, 0.5, 0.3, 0.7)

## [1] "#B3804DB3"
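
Two related conveniences, shown as a sketch: "rgb()" also accepts values on a 0 to 255 scale through its maxColorValue argument, and "adjustcolor()" adds transparency to a named color. The variable names are made up for this illustration.

```r
# Same colour as rgb(1, 0, 0, 0.5), but specified on a 0 to 255 scale
red_255 <- rgb(255, 0, 0, 128, maxColorValue = 255)

# Start from a named colour and scale its opacity by alpha.f
half_red <- adjustcolor("red", alpha.f = 0.5)

print(c(red_255, half_red))
```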

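With a semi-transparent color, overlapping points add up to darker areas, so the earlier scatter plot finally shows where the data are dense: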

plot(dat[,"SBP"], dat[,"DBP"], ylab = "DBP", xlab = "SBP", main = "Scatter plot of SBP and DBP", pch = 19, cex = 2, col = "#00000030")

Section 3: Color Transparency and Functions (3)

In fact, the function "smoothScatter()" draws a similar density-shaded scatter plot:

smoothScatter(dat[,"SBP"], dat[,"DBP"], nrpoints = 0, ylab = "DBP", xlab = "SBP", main = "Scatter plot of SBP and DBP")
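
The shading of "smoothScatter()" is controlled by its colramp argument, which expects a function that returns a palette. A minimal sketch, assuming a white-to-dark-red ramp is wanted and using the built-in faithful data set in place of the lecture data (red_ramp is an illustrative name):

```r
# HYPOTHETICAL palette choice: white for empty regions, dark red for the densest ones
red_ramp <- colorRampPalette(c("white", "darkred"))

smoothScatter(faithful$eruptions, faithful$waiting, nrpoints = 0,
              colramp = red_ramp,
              xlab = "eruptions", ylab = "waiting",
              main = "smoothScatter with a custom colour ramp")
```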

We can also give it a color legend. That is harder to do, but we can search Google to see whether a solution exists.

It looks like there is a solution, but it requires the package "fields" (install it once with install.packages("fields")).

library(fields)

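# Hook that draws a color key from the density grid smoothScatter() has just computed
fudgeit <- function() {
  # smoothScatter() calls the hook from inside its own body,
  # so parent.frame(1) is the environment holding its internal variables
  xm <- get("xm", envir = parent.frame(1))
  ym <- get("ym", envir = parent.frame(1))
  z <- get("dens", envir = parent.frame(1))
  colramp <- get("colramp", envir = parent.frame(1))
  image.plot(xm, ym, z, col = colramp(256), legend.only = TRUE, add = FALSE)
}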

par(mar = c(5,4,4,5))
smoothScatter(dat[,"SBP"], dat[,"DBP"], nrpoints = 0, ylab = "DBP", xlab = "SBP", main = "Scatter plot of SBP and DBP", postPlotHook = fudgeit)

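Exercise 3: Adapting someone else's code

Having seen the power of Google, you know that searching is the fastest route to a good-looking figure.

– Now suppose the single-hue scatter plot still does not satisfy you and you want something better. Google points the way: see R Scatter Plot: symbol color represents number of overlapping points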

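How do we apply the code from that web page to our own plot?

Exercise 3 answer

We only need to change the first few lines, which define the input data: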

x1 <- dat[,"SBP"]
x2 <- dat[,"DBP"]
df <- data.frame(x1,x2)

## Use densCols() output to get density at each point
x <- densCols(x1,x2, colramp=colorRampPalette(c("black", "white")))
df$dens <- col2rgb(x)[1,] + 1L

## Map densities to colors
cols <-  colorRampPalette(c("#000099", "#00FEFF", "#45FE4F", 
                            "#FCFF00", "#FF9400", "#FF3100"))(256)
df$col <- cols[df$dens]

## Plot it, reordering rows so that densest points are plotted on top
plot(x2~x1, data=df[order(df$dens),], pch = 19, col = col, cex = 2, ylab = "DBP", xlab = "SBP", main = "Scatter plot of SBP and DBP")
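
The key trick in this code is easy to miss: "densCols()" returns colors, not numbers, so the code asks for a black-to-white ramp and reads the red channel back as a density rank from 1 to 256. The sketch below isolates that step on the built-in faithful data set; gray_cols and dens_index are illustrative names.

```r
x1 <- faithful$eruptions
x2 <- faithful$waiting

# With a gray ramp, each point's colour encodes its local density as a gray level
gray_cols <- densCols(x1, x2, colramp = colorRampPalette(c("black", "white")))

# In a gray colour red = green = blue, so the red channel (0 to 255) is the level;
# adding 1 turns it into a valid index into a 256-colour palette
dens_index <- col2rgb(gray_cols)[1, ] + 1L

range(dens_index)
```

Any 256-color palette can then be indexed with dens_index, which is exactly what the cols[df$dens] line above does.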

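Summary

This week introduced R's basic plotting functions. Because we can also build charts by hand from lines, points, text and polygons, practically any statistical chart is now within reach.
We also felt the power of Google once again, and this time it was even more fun. The most important skill here is using Google to find code that does something close to what you want, then adapting that code to your own data.
Plotting is now part of your data analysis workflow. Next we will move on to more powerful features!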

Introduction to R Programming