Machine Learning and Algorithms

林嶔 (Lin, Chin)

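Lesson 1: A Basic Introduction to the Programming Language

Course introduction

What is R?

  1. R is completely free, can be downloaded directly from its website, and is updated on a regular schedule.

  2. Many R users share packages that cover advanced statistical methods; these are updated from time to time.

  3. R has powerful and flexible plotting capabilities.

  4. R can read many types of data. Besides data files from other statistical software, R can also read web pages, media, and online data.

  5. R is a research tool for statisticians and one of the important tools that data scientists use regularly.

– As of April 2021, R ranked 16th in the TIOBE Index.

First step: installing R (1)

– Using Download R for Windows as the example.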

F01

– Choose base

F02

– Choose Download R 4.0.4 for Windows; to download an earlier version, choose Previous releases

F03

– Where possible, install the program on drive D

First step: installing R (2)

– Choose the Free version

F04

– Choose Download RStudio Desktop

F05

– Scroll down the page to find other versions

F06

First step: installing R (3)

– Overview of the interface

F07

– The interface font size can be adjusted: Tools → Global Options → Appearance → Zoom or Editor font size

F08

– Create a Project: New Directory → New Project → enter a Directory name → choose where to save it → Create Project

F09

F10

F11

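Section 1: Vector structure (1)

– In R, a name followed by parentheses is a function call, and whatever is inside the parentheses is the set of arguments passed to that function

– The "<-" symbol stores the "object" on its right in the "variable" on its left, and that variable then becomes a new "object"

– The function "print()" prints an object.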

nums <- c(87, 78)
print(nums)

[1] 87 78

– "=" has a similar effect; try it yourself

nums = c(87, 78)
print(nums)

[1] 87 78

– In fact, typing the name of an object without "print()" also prints it.

nums

[1] 87 78
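The three points above can be tried in one short sketch. The variable names `scores` and `same_scores` are made up for illustration and are not part of the course material.

```r
# Two ways of assigning the same vector (names are hypothetical)
scores <- c(87, 78)       # c() combines values into one vector
same_scores = c(87, 78)   # "=" also assigns when used at the top level

# Both variables now hold equal objects
identical(scores, same_scores)

# length() reports how many elements the vector holds
length(scores)
```

Most R style guides prefer "<-" for assignment and keep "=" for naming arguments inside a function call, which may be why the slides introduce "<-" first.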

Section 1: Vector structure (2)

first_nums <- 11:13
second_nums <- 1:3

first_nums + second_nums

[1] 12 14 16

first_nums - second_nums

[1] 10 10 10

first_nums * second_nums

[1] 11 24 39

first_nums / second_nums

[1] 11.000000 6.000000 4.333333

first_nums ** second_nums

[1] 11 144 2197

first_nums %/% second_nums

[1] 11 6 4

first_nums %% second_nums

[1] 0 0 1
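All of the operators above work element by element on two vectors of the same length. The sketch below, which reuses the same two vectors, checks two facts that are easy to miss: R's parser treats "**" as another spelling of "^", and the integer quotient "%/%" and remainder "%%" together rebuild the original values. The names `power_caret`, `power_stars`, and `rebuilt` are illustrative only.

```r
first_nums <- 11:13
second_nums <- 1:3

# "**" is translated to "^" by the parser, so both give the same result
power_caret <- first_nums ^ second_nums
power_stars <- first_nums ** second_nums
identical(power_caret, power_stars)

# quotient * divisor + remainder gives back the dividend, element by element
rebuilt <- (first_nums %/% second_nums) * second_nums + first_nums %% second_nums
all(rebuilt == first_nums)
```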

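Section 2: Basic operations for reading files (1)

– Use the function "read.csv()" to read a csv file

dat <- read.csv("ECG_train.csv", header = TRUE, fileEncoding = 'CP950', stringsAsFactors = FALSE, na.strings = "") # change the path to wherever you saved the file
help(read.csv)
?read.csv # this also opens the help page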
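If "ECG_train.csv" is not at hand, the effect of these arguments can still be tried on a throwaway file. The sketch below writes a tiny, made-up CSV to a temporary path and reads it back; the three rows are invented for illustration, and "fileEncoding" is left out because the toy file contains only ASCII text.

```r
# Write a small hypothetical CSV to a temporary file
csv_path <- tempfile(fileext = ".csv")
writeLines(c("AMI,K,AGE",
             "STEMI,,60.1",
             "not-AMI,2.2,50.9",
             ",3.7,66.2"), csv_path)

# header = TRUE: the first line holds the column names
# stringsAsFactors = FALSE: keep text columns as character
# na.strings = "": treat empty fields as missing (NA)
toy <- read.csv(csv_path, header = TRUE, stringsAsFactors = FALSE, na.strings = "")

dim(toy)            # number of rows and columns
class(toy$AMI)      # text column stays character
is.na(toy$AMI[3])   # the empty text field became NA
is.na(toy$K[1])     # the empty numeric field became NA
```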

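Section 2: Basic operations for reading files (2)

– This dataset contains 5,000 people. The goal is to use electrocardiogram (ECG) parameters to predict several important indicators, including:

  1. AMI: a categorical variable describing myocardial infarction status, with the values STEMI, NSTEMI, and not-AMI

  2. K: a continuous variable describing the potassium concentration

  3. LVD: a binary variable; 1 means left ventricular dysfunction and 0 means normal

  4. time and death: this pair of variables describes how much time elapsed and whether the patient had died by then; it is used for survival analysis

– Besides sex (GENDER) and age (AGE), the key ECG parameters consist of 8 continuous features (Rate, PR, QRSd, QT, QTc, Axes_P, Axes_QRS, Axes_T) and 31 binary variables describing the corresponding rhythm.

– The rhythms are, in order: abnormal T wave, atrial fibrillation, atrial flutter, atrial premature complex, complete AV block, complete left bundle branch block, complete right bundle branch block, first degree AV block, incomplete left bundle branch block, incomplete right bundle branch block, ischemia/infarction, junctional rhythm, left anterior fascicular block, left atrial enlargement, left axis deviation, left posterior fascicular block, left ventricular hypertrophy, low QRS voltage, pacemaker rhythm, prolonged QT interval, right atrial enlargement, right ventricular hypertrophy, second degree AV block, sinus bradycardia, sinus pause, sinus rhythm, sinus tachycardia, supraventricular tachycardia, ventricular premature complex, ventricular tachycardia, Wolff-Parkinson-White syndrome

– In most cases, we use the later variables to predict the first 4 groups of variables.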

Section 2: Basic operations for reading files (3)

– As you would expect, the class of our object is data frame (data.frame)

class(dat)
## [1] "data.frame"

– The function "head()" shows the first 6 rows of a data frame

head(dat)
##       AMI   K LVD time death GENDER      AGE Rate  PR QRSd  QT QTc Axes_P
## 1   STEMI  NA   0  642     0   male 60.09729   80 241   95 333 385     40
## 2 not-AMI 2.2  NA    3     0   male 50.95044  112 192   99 360 492      0
## 3    <NA> 3.7  NA   NA    NA female 66.22767   76 154   95 397 447     79
## 4    <NA> 2.7  NA    4     0 female 67.46526   65 184   86 440 457     61
## 5 not-AMI  NA   1  816     0   male 50.69258   99 151  103 360 462     55
## 6   STEMI 3.8   0  949     0   male 52.39062   93 224   89 386 481     60
##   Axes_QRS Axes_T rhythm.1 rhythm.2 rhythm.3 rhythm.4 rhythm.5 rhythm.6
## 1      -63    104        0        0        0        0        0        0
## 2       41     -2        0        0        0        0        0        0
## 3       11     30        0        0        0        0        0        0
## 4       40     30        0        0        0        0        0        0
## 5       12     24        0        0        0        0        0        0
## 6      -36     -4        0        0        0        0        0        0
##   rhythm.7 rhythm.8 rhythm.9 rhythm.10 rhythm.11 rhythm.12 rhythm.13 rhythm.14
## 1        0        1        0         0         1         0         0         0
## 2        0        0        0         0         1         0         0         0
## 3        0        0        0         0         0         0         0         0
## 4        0        0        0         0         0         0         0         1
## 5        0        0        0         0         1         0         0         0
## 6        0        1        0         0         1         0         0         0
##   rhythm.15 rhythm.16 rhythm.17 rhythm.18 rhythm.19 rhythm.20 rhythm.21
## 1         0         0         1         0         0         0         0
## 2         0         0         0         0         0         0         0
## 3         0         0         0         0         0         0         0
## 4         0         0         1         0         0         0         0
## 5         0         0         0         0         0         0         0
## 6         1         0         0         0         0         0         0
##   rhythm.22 rhythm.23 rhythm.24 rhythm.25 rhythm.26 rhythm.27 rhythm.28
## 1         0         0         0         0         1         0         0
## 2         0         0         0         0         0         1         0
## 3         0         0         0         0         1         0         0
## 4         0         0         0         0         1         0         0
## 5         0         0         0         0         1         0         0
## 6         0         0         0         0         1         0         0
##   rhythm.29 rhythm.30 rhythm.31
## 1         0         0         0
## 2         0         0         0
## 3         0         0         0
## 4         0         0         0
## 5         0         0         0
## 6         0         0         0

– Make good use of "help()": you will find that the function "head()" has an argument [n = 6]. Let's try changing it

head(dat, n = 10)
##        AMI   K LVD time death GENDER      AGE Rate  PR QRSd  QT QTc Axes_P
## 1    STEMI  NA   0  642     0   male 60.09729   80 241   95 333 385     40
## 2  not-AMI 2.2  NA    3     0   male 50.95044  112 192   99 360 492      0
## 3     <NA> 3.7  NA   NA    NA female 66.22767   76 154   95 397 447     79
## 4     <NA> 2.7  NA    4     0 female 67.46526   65 184   86 440 457     61
## 5  not-AMI  NA   1  816     0   male 50.69258   99 151  103 360 462     55
## 6    STEMI 3.8   0  949     0   male 52.39062   93 224   89 386 481     60
## 7     <NA>  NA   0   NA    NA   male 44.22640   67 153  100 407 430     41
## 8     <NA> 5.7  NA    1     0 female 38.60219   63 165   94 414 424     67
## 9     <NA> 5.7   0 1769     0 female 83.70883   71 192   92 412 448     71
## 10    <NA> 6.0  NA 1130     0 female 62.12934   62 180   88 452 459     62
##    Axes_QRS Axes_T rhythm.1 rhythm.2 rhythm.3 rhythm.4 rhythm.5 rhythm.6
## 1       -63    104        0        0        0        0        0        0
## 2        41     -2        0        0        0        0        0        0
## 3        11     30        0        0        0        0        0        0
## 4        40     30        0        0        0        0        0        0
## 5        12     24        0        0        0        0        0        0
## 6       -36     -4        0        0        0        0        0        0
## 7        31     32        0        0        0        0        0        0
## 8        23     15        0        0        0        0        0        0
## 9        49    138        0        0        0        0        0        0
## 10       27     36        0        0        0        0        0        0
##    rhythm.7 rhythm.8 rhythm.9 rhythm.10 rhythm.11 rhythm.12 rhythm.13 rhythm.14
## 1         0        1        0         0         1         0         0         0
## 2         0        0        0         0         1         0         0         0
## 3         0        0        0         0         0         0         0         0
## 4         0        0        0         0         0         0         0         1
## 5         0        0        0         0         1         0         0         0
## 6         0        1        0         0         1         0         0         0
## 7         0        0        0         0         1         0         0         0
## 8         0        0        0         0         1         0         0         0
## 9         0        0        0         0         1         0         0         0
## 10        0        0        0         0         0         0         0         0
##    rhythm.15 rhythm.16 rhythm.17 rhythm.18 rhythm.19 rhythm.20 rhythm.21
## 1          0         0         1         0         0         0         0
## 2          0         0         0         0         0         0         0
## 3          0         0         0         0         0         0         0
## 4          0         0         1         0         0         0         0
## 5          0         0         0         0         0         0         0
## 6          1         0         0         0         0         0         0
## 7          0         0         0         0         0         0         0
## 8          0         0         0         0         0         0         0
## 9          0         0         0         0         0         0         0
## 10         0         0         0         0         0         0         0
##    rhythm.22 rhythm.23 rhythm.24 rhythm.25 rhythm.26 rhythm.27 rhythm.28
## 1          0         0         0         0         1         0         0
## 2          0         0         0         0         0         1         0
## 3          0         0         0         0         1         0         0
## 4          0         0         0         0         1         0         0
## 5          0         0         0         0         1         0         0
## 6          0         0         0         0         1         0         0
## 7          0         0         0         0         1         0         0
## 8          0         0         0         0         1         0         0
## 9          0         0         0         0         1         0         0
## 10         0         0         0         0         1         0         0
##    rhythm.29 rhythm.30 rhythm.31
## 1          0         0         0
## 2          0         0         0
## 3          0         0         0
## 4          0         0         0
## 5          0         0         0
## 6          0         0         0
## 7          0         0         0
## 8          0         0         0
## 9          0         0         0
## 10         0         0         0

– The function "tail()" shows the last 6 rows of a data frame and likewise has an argument [n = 6]. Let's try changing it

tail(dat, n = 3)
##       AMI   K LVD time death GENDER      AGE Rate  PR QRSd  QT QTc Axes_P
## 4998 <NA> 1.6  NA  269     0   male 24.92654   69 171   98 439 471     67
## 4999 <NA>  NA   1   47     0   male 73.05372  102 181  156 400 522    -84
## 5000 <NA> 6.5  NA   18     1   male 71.22509   95 130  101 370 425     65
##      Axes_QRS Axes_T rhythm.1 rhythm.2 rhythm.3 rhythm.4 rhythm.5 rhythm.6
## 4998      -71     33        0        0        0        0        0        0
## 4999     -121     34        0        0        0        0        0        0
## 5000       43     58        0        0        0        0        0        0
##      rhythm.7 rhythm.8 rhythm.9 rhythm.10 rhythm.11 rhythm.12 rhythm.13
## 4998        0        0        0         0         1         0         1
## 4999        1        0        0         0         0         0         0
## 5000        0        0        0         0         1         0         0
##      rhythm.14 rhythm.15 rhythm.16 rhythm.17 rhythm.18 rhythm.19 rhythm.20
## 4998         0         0         0         0         0         0         0
## 4999         0         0         0         1         0         0         1
## 5000         0         0         0         0         0         0         0
##      rhythm.21 rhythm.22 rhythm.23 rhythm.24 rhythm.25 rhythm.26 rhythm.27
## 4998         0         0         0         0         0         1         0
## 4999         0         0         0         0         0         0         0
## 5000         0         0         0         0         0         1         0
##      rhythm.28 rhythm.29 rhythm.30 rhythm.31
## 4998         0         0         0         0
## 4999         1         0         0         0
## 5000         0         1         0         0

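Section 2: Basic operations for reading files (4)

– A data frame accepts the operator "$" as a way to pull out a specific variable. Let's try it together with the functions we have already learned, "head()" and "class()":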

x <- dat$AGE
head(x, n = 10)
##  [1] 60.09729 50.95044 66.22767 67.46526 50.69258 52.39062 44.22640 38.60219
##  [9] 83.70883 62.12934
class(x)
## [1] "numeric"

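– Did you notice? The original object "dat" is a data frame (data.frame), but the "AGE" inside it is numeric (numeric)

– The same works without first saving it as a separate object "x"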

class(dat$AGE)
## [1] "numeric"
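The same behaviour can be reproduced on a small, made-up data frame. The sketch below is only an illustration (the object `toy` and its values are hypothetical); it also shows that double brackets "[[" extract a column the same way "$" does, while single brackets with a column name keep the result as a data frame.

```r
# Hypothetical toy data frame with two columns
toy <- data.frame(AGE = c(60.1, 50.9, 66.2),
                  GENDER = c("male", "male", "female"),
                  stringsAsFactors = FALSE)

age_dollar  <- toy$AGE       # extract one column with "$"
age_bracket <- toy[["AGE"]]  # "[[" extracts the same column by name

identical(age_dollar, age_bracket)
class(toy)          # the container is a data.frame
class(age_dollar)   # the extracted column is a plain numeric vector
class(toy["AGE"])   # single brackets return a one-column data.frame
```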

Section 2: Basic operations for reading files (5)

class(dat$GENDER)
## [1] "character"

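– Did you notice? "GENDER" is text (character)

– In R, a categorical variable should be represented as a factor (factor). We can change its type with the "as.XXX()" family of functions

dat$GENDER <- as.factor(dat$GENDER)

– Note what this does: we process the object "GENDER" on the right and then overwrite the "GENDER" on the left with the result

– Now let's check again which class "GENDER" belongs to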

class(dat$GENDER)
## [1] "factor"

– The function "levels()" lists the categories of a factor vector

levels(dat$GENDER)
## [1] "female" "male"

– The function "levels()" does not work on a non-factor vector, even one that looks very much like a factor: it simply returns NULL

levels(dat$death)
## NULL
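A minimal sketch of the conversion on made-up vectors follows (the names `gender`, `gender_f`, and `death` here are standalone toy objects, not columns of "dat"). It shows that "levels()" returns NULL before conversion, and that "as.factor()" sorts the categories, which is why "female" comes before "male".

```r
gender <- c("male", "female", "male")   # character vector
is.null(levels(gender))                 # no levels before conversion

gender_f <- as.factor(gender)           # convert to factor
levels(gender_f)                        # categories in sorted order
table(gender_f)                         # counts per category

death <- c(0, 1, 0)                     # numeric 0/1 codes, not a factor
is.null(levels(death))                  # still no levels
levels(as.factor(death))                # levels appear after conversion
```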

Section 2: Basic operations for reading files (6)

length(dat$GENDER)
## [1] 5000
length(dat)
## [1] 46

– So on a data frame, "length()" returns the number of variables (columns), not the number of rows

lvl <- levels(dat$GENDER)
length(lvl)
## [1] 2

– The following one-liner means the same thing

length(levels(dat$GENDER))
## [1] 2
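Because "length()" means different things for a vector and for a data frame, it can help to compare it with "nrow()", "ncol()", and "dim()". The data frame below is a made-up example for illustration only.

```r
# Hypothetical data frame: 4 rows, 3 columns
toy <- data.frame(x = 1:4,
                  y = c(0, 1, 0, 1),
                  z = c("a", "b", "c", "d"))

length(toy)     # number of columns, same as ncol(toy)
ncol(toy)
nrow(toy)       # number of rows
length(toy$x)   # length of one column equals the number of rows
dim(toy)        # rows and columns together
```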

Section 2: Basic operations for reading files (7)

dat$seq <- 1:5000

– Use the functions "head()" and "tail()" to check the result

head(dat)
##       AMI   K LVD time death GENDER      AGE Rate  PR QRSd  QT QTc Axes_P
## 1   STEMI  NA   0  642     0   male 60.09729   80 241   95 333 385     40
## 2 not-AMI 2.2  NA    3     0   male 50.95044  112 192   99 360 492      0
## 3    <NA> 3.7  NA   NA    NA female 66.22767   76 154   95 397 447     79
## 4    <NA> 2.7  NA    4     0 female 67.46526   65 184   86 440 457     61
## 5 not-AMI  NA   1  816     0   male 50.69258   99 151  103 360 462     55
## 6   STEMI 3.8   0  949     0   male 52.39062   93 224   89 386 481     60
##   Axes_QRS Axes_T rhythm.1 rhythm.2 rhythm.3 rhythm.4 rhythm.5 rhythm.6
## 1      -63    104        0        0        0        0        0        0
## 2       41     -2        0        0        0        0        0        0
## 3       11     30        0        0        0        0        0        0
## 4       40     30        0        0        0        0        0        0
## 5       12     24        0        0        0        0        0        0
## 6      -36     -4        0        0        0        0        0        0
##   rhythm.7 rhythm.8 rhythm.9 rhythm.10 rhythm.11 rhythm.12 rhythm.13 rhythm.14
## 1        0        1        0         0         1         0         0         0
## 2        0        0        0         0         1         0         0         0
## 3        0        0        0         0         0         0         0         0
## 4        0        0        0         0         0         0         0         1
## 5        0        0        0         0         1         0         0         0
## 6        0        1        0         0         1         0         0         0
##   rhythm.15 rhythm.16 rhythm.17 rhythm.18 rhythm.19 rhythm.20 rhythm.21
## 1         0         0         1         0         0         0         0
## 2         0         0         0         0         0         0         0
## 3         0         0         0         0         0         0         0
## 4         0         0         1         0         0         0         0
## 5         0         0         0         0         0         0         0
## 6         1         0         0         0         0         0         0
##   rhythm.22 rhythm.23 rhythm.24 rhythm.25 rhythm.26 rhythm.27 rhythm.28
## 1         0         0         0         0         1         0         0
## 2         0         0         0         0         0         1         0
## 3         0         0         0         0         1         0         0
## 4         0         0         0         0         1         0         0
## 5         0         0         0         0         1         0         0
## 6         0         0         0         0         1         0         0
##   rhythm.29 rhythm.30 rhythm.31 seq
## 1         0         0         0   1
## 2         0         0         0   2
## 3         0         0         0   3
## 4         0         0         0   4
## 5         0         0         0   5
## 6         0         0         0   6
tail(dat)
##       AMI   K LVD time death GENDER      AGE Rate  PR QRSd  QT QTc Axes_P
## 4995 <NA>  NA   1   71     0   male 69.58407   83 176   84 513 603    -13
## 4996 <NA> 2.9  NA 1522     0 female 72.33066  110 148  108 366 496     50
## 4997 <NA>  NA  NA    1     0   male 53.74001   88 171   95 373 452     65
## 4998 <NA> 1.6  NA  269     0   male 24.92654   69 171   98 439 471     67
## 4999 <NA>  NA   1   47     0   male 73.05372  102 181  156 400 522    -84
## 5000 <NA> 6.5  NA   18     1   male 71.22509   95 130  101 370 425     65
##      Axes_QRS Axes_T rhythm.1 rhythm.2 rhythm.3 rhythm.4 rhythm.5 rhythm.6
## 4995       48    103        0        0        0        0        0        0
## 4996      -24     62        0        0        0        0        0        0
## 4997       36     14        0        0        0        0        0        0
## 4998      -71     33        0        0        0        0        0        0
## 4999     -121     34        0        0        0        0        0        0
## 5000       43     58        0        0        0        0        0        0
##      rhythm.7 rhythm.8 rhythm.9 rhythm.10 rhythm.11 rhythm.12 rhythm.13
## 4995        0        0        0         0         0         0         0
## 4996        1        0        0         0         1         0         0
## 4997        0        0        0         0         0         0         0
## 4998        0        0        0         0         1         0         1
## 4999        1        0        0         0         0         0         0
## 5000        0        0        0         0         1         0         0
##      rhythm.14 rhythm.15 rhythm.16 rhythm.17 rhythm.18 rhythm.19 rhythm.20
## 4995         0         0         0         0         0         0         1
## 4996         0         0         0         1         0         0         0
## 4997         0         0         0         0         0         0         0
## 4998         0         0         0         0         0         0         0
## 4999         0         0         0         1         0         0         1
## 5000         0         0         0         0         0         0         0
##      rhythm.21 rhythm.22 rhythm.23 rhythm.24 rhythm.25 rhythm.26 rhythm.27
## 4995         0         0         0         0         0         1         0
## 4996         0         0         0         0         0         0         1
## 4997         0         0         0         0         0         1         0
## 4998         0         0         0         0         0         1         0
## 4999         0         0         0         0         0         0         0
## 5000         0         0         0         0         0         1         0
##      rhythm.28 rhythm.29 rhythm.30 rhythm.31  seq
## 4995         0         0         0         0 4995
## 4996         0         0         0         0 4996
## 4997         0         0         0         0 4997
## 4998         0         0         0         0 4998
## 4999         1         0         0         0 4999
## 5000         0         1         0         0 5000
dat$seq <- 1:5
head(dat)
##       AMI   K LVD time death GENDER      AGE Rate  PR QRSd  QT QTc Axes_P
## 1   STEMI  NA   0  642     0   male 60.09729   80 241   95 333 385     40
## 2 not-AMI 2.2  NA    3     0   male 50.95044  112 192   99 360 492      0
## 3    <NA> 3.7  NA   NA    NA female 66.22767   76 154   95 397 447     79
## 4    <NA> 2.7  NA    4     0 female 67.46526   65 184   86 440 457     61
## 5 not-AMI  NA   1  816     0   male 50.69258   99 151  103 360 462     55
## 6   STEMI 3.8   0  949     0   male 52.39062   93 224   89 386 481     60
##   Axes_QRS Axes_T rhythm.1 rhythm.2 rhythm.3 rhythm.4 rhythm.5 rhythm.6
## 1      -63    104        0        0        0        0        0        0
## 2       41     -2        0        0        0        0        0        0
## 3       11     30        0        0        0        0        0        0
## 4       40     30        0        0        0        0        0        0
## 5       12     24        0        0        0        0        0        0
## 6      -36     -4        0        0        0        0        0        0
##   rhythm.7 rhythm.8 rhythm.9 rhythm.10 rhythm.11 rhythm.12 rhythm.13 rhythm.14
## 1        0        1        0         0         1         0         0         0
## 2        0        0        0         0         1         0         0         0
## 3        0        0        0         0         0         0         0         0
## 4        0        0        0         0         0         0         0         1
## 5        0        0        0         0         1         0         0         0
## 6        0        1        0         0         1         0         0         0
##   rhythm.15 rhythm.16 rhythm.17 rhythm.18 rhythm.19 rhythm.20 rhythm.21
## 1         0         0         1         0         0         0         0
## 2         0         0         0         0         0         0         0
## 3         0         0         0         0         0         0         0
## 4         0         0         1         0         0         0         0
## 5         0         0         0         0         0         0         0
## 6         1         0         0         0         0         0         0
##   rhythm.22 rhythm.23 rhythm.24 rhythm.25 rhythm.26 rhythm.27 rhythm.28
## 1         0         0         0         0         1         0         0
## 2         0         0         0         0         0         1         0
## 3         0         0         0         0         1         0         0
## 4         0         0         0         0         1         0         0
## 5         0         0         0         0         1         0         0
## 6         0         0         0         0         1         0         0
##   rhythm.29 rhythm.30 rhythm.31 seq
## 1         0         0         0   1
## 2         0         0         0   2
## 3         0         0         0   3
## 4         0         0         0   4
## 5         0         0         0   5
## 6         0         0         0   1
tail(dat)
##       AMI   K LVD time death GENDER      AGE Rate  PR QRSd  QT QTc Axes_P
## 4995 <NA>  NA   1   71     0   male 69.58407   83 176   84 513 603    -13
## 4996 <NA> 2.9  NA 1522     0 female 72.33066  110 148  108 366 496     50
## 4997 <NA>  NA  NA    1     0   male 53.74001   88 171   95 373 452     65
## 4998 <NA> 1.6  NA  269     0   male 24.92654   69 171   98 439 471     67
## 4999 <NA>  NA   1   47     0   male 73.05372  102 181  156 400 522    -84
## 5000 <NA> 6.5  NA   18     1   male 71.22509   95 130  101 370 425     65
##      Axes_QRS Axes_T rhythm.1 rhythm.2 rhythm.3 rhythm.4 rhythm.5 rhythm.6
## 4995       48    103        0        0        0        0        0        0
## 4996      -24     62        0        0        0        0        0        0
## 4997       36     14        0        0        0        0        0        0
## 4998      -71     33        0        0        0        0        0        0
## 4999     -121     34        0        0        0        0        0        0
## 5000       43     58        0        0        0        0        0        0
##      rhythm.7 rhythm.8 rhythm.9 rhythm.10 rhythm.11 rhythm.12 rhythm.13
## 4995        0        0        0         0         0         0         0
## 4996        1        0        0         0         1         0         0
## 4997        0        0        0         0         0         0         0
## 4998        0        0        0         0         1         0         1
## 4999        1        0        0         0         0         0         0
## 5000        0        0        0         0         1         0         0
##      rhythm.14 rhythm.15 rhythm.16 rhythm.17 rhythm.18 rhythm.19 rhythm.20
## 4995         0         0         0         0         0         0         1
## 4996         0         0         0         1         0         0         0
## 4997         0         0         0         0         0         0         0
## 4998         0         0         0         0         0         0         0
## 4999         0         0         0         1         0         0         1
## 5000         0         0         0         0         0         0         0
##      rhythm.21 rhythm.22 rhythm.23 rhythm.24 rhythm.25 rhythm.26 rhythm.27
## 4995         0         0         0         0         0         1         0
## 4996         0         0         0         0         0         0         1
## 4997         0         0         0         0         0         1         0
## 4998         0         0         0         0         0         1         0
## 4999         0         0         0         0         0         0         0
## 5000         0         0         0         0         0         1         0
##      rhythm.28 rhythm.29 rhythm.30 rhythm.31 seq
## 4995         0         0         0         0   5
## 4996         0         0         0         0   1
## 4997         0         0         0         0   2
## 4998         0         0         0         0   3
## 4999         1         0         0         0   4
## 5000         0         1         0         0   5
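dat$seq <- 1:11 # this assignment fails with an error: 5000 rows is not a multiple of 11, so the vector cannot be recycled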
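The three assignments above illustrate recycling: a shorter vector is repeated to fill the column only when the number of rows is an exact multiple of its length. The sketch below reproduces this on a made-up 10-row data frame; `toy` and `result` are hypothetical names, and "tryCatch()" is used only so that the failing assignment does not stop the script.

```r
toy <- data.frame(id = 1:10)   # hypothetical 10-row data frame

# 10 is a multiple of 5, so 1:5 is repeated twice
toy$seq <- 1:5
toy$seq

# 10 is not a multiple of 3, so this assignment raises an error;
# tryCatch() converts the error into the string "error"
result <- tryCatch({
  toy$seq <- 1:3
  "assigned"
}, error = function(e) "error")
result

# the column keeps the values from the last successful assignment
toy$seq
```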

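Exercise 1: Creative ways of adding variables

  1. I want to add a new variable equal to PR minus Rate and append it to "dat"; the variable name is up to you

  2. This dataset has one variable called "rhythm.1" and another called "rhythm.2", each of which is a binary variable (you can check this first). Can you create a new variable "Group" that satisfies the following rules:

– "rhythm.1 = 0 & rhythm.2 = 0" gives "Group = 0"

– "rhythm.1 = 0 & rhythm.2 = 1" gives "Group = 1"

– "rhythm.1 = 1 & rhythm.2 = 0" gives "Group = 2"

– "rhythm.1 = 1 & rhythm.2 = 1" gives "Group = 3"

Exercise 1: answers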

dat$diff <- dat$PR - dat$Rate
head(dat)
##       AMI   K LVD time death GENDER      AGE Rate  PR QRSd  QT QTc Axes_P
## 1   STEMI  NA   0  642     0   male 60.09729   80 241   95 333 385     40
## 2 not-AMI 2.2  NA    3     0   male 50.95044  112 192   99 360 492      0
## 3    <NA> 3.7  NA   NA    NA female 66.22767   76 154   95 397 447     79
## 4    <NA> 2.7  NA    4     0 female 67.46526   65 184   86 440 457     61
## 5 not-AMI  NA   1  816     0   male 50.69258   99 151  103 360 462     55
## 6   STEMI 3.8   0  949     0   male 52.39062   93 224   89 386 481     60
##   Axes_QRS Axes_T rhythm.1 rhythm.2 rhythm.3 rhythm.4 rhythm.5 rhythm.6
## 1      -63    104        0        0        0        0        0        0
## 2       41     -2        0        0        0        0        0        0
## 3       11     30        0        0        0        0        0        0
## 4       40     30        0        0        0        0        0        0
## 5       12     24        0        0        0        0        0        0
## 6      -36     -4        0        0        0        0        0        0
##   rhythm.7 rhythm.8 rhythm.9 rhythm.10 rhythm.11 rhythm.12 rhythm.13 rhythm.14
## 1        0        1        0         0         1         0         0         0
## 2        0        0        0         0         1         0         0         0
## 3        0        0        0         0         0         0         0         0
## 4        0        0        0         0         0         0         0         1
## 5        0        0        0         0         1         0         0         0
## 6        0        1        0         0         1         0         0         0
##   rhythm.15 rhythm.16 rhythm.17 rhythm.18 rhythm.19 rhythm.20 rhythm.21
## 1         0         0         1         0         0         0         0
## 2         0         0         0         0         0         0         0
## 3         0         0         0         0         0         0         0
## 4         0         0         1         0         0         0         0
## 5         0         0         0         0         0         0         0
## 6         1         0         0         0         0         0         0
##   rhythm.22 rhythm.23 rhythm.24 rhythm.25 rhythm.26 rhythm.27 rhythm.28
## 1         0         0         0         0         1         0         0
## 2         0         0         0         0         0         1         0
## 3         0         0         0         0         1         0         0
## 4         0         0         0         0         1         0         0
## 5         0         0         0         0         1         0         0
## 6         0         0         0         0         1         0         0
##   rhythm.29 rhythm.30 rhythm.31 seq diff
## 1         0         0         0   1  161
## 2         0         0         0   2   80
## 3         0         0         0   3   78
## 4         0         0         0   4  119
## 5         0         0         0   5   52
## 6         0         0         0   1  131

– First, let's check that each of them really is a binary variable

levels(as.factor(dat$rhythm.1))
## [1] "0" "1"
levels(as.factor(dat$rhythm.2))
## [1] "0" "1"

– Look at this neat solution and think about why it works:

dat$Group <- dat$rhythm.1 * 2 + dat$rhythm.2
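One way to see why this works is to read "rhythm.1" and "rhythm.2" as the two digits of a binary number: "rhythm.1" is worth 2 and "rhythm.2" is worth 1, so the four combinations map to 0, 1, 2, 3. The sketch below checks this on all four combinations; the table `combos` is constructed here for illustration and is not part of the course data.

```r
# Enumerate every combination of the two binary variables.
# expand.grid() varies its first argument fastest, so rhythm.2 is listed first
# to get the rows in the same order as the exercise.
combos <- expand.grid(rhythm.2 = 0:1, rhythm.1 = 0:1)

# Same formula as the answer above
combos$Group <- combos$rhythm.1 * 2 + combos$rhythm.2

combos[, c("rhythm.1", "rhythm.2", "Group")]
```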

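Section 3: Conditional expressions and indexing (1)

– Testing conditions and filtering data both rely on logical vectors (logical). A logical vector holds only the two values TRUE and FALSE, which can also be abbreviated as T and F. Note that T and F are ordinary variables that can be reassigned, so TRUE and FALSE are the safer choice in scripts.

class(TRUE)

[1] "logical"

class(FALSE)

[1] "logical"

class(T)

[1] "logical"

class(F)

[1] "logical"

1. [==, !=: equal to and not equal to]

2. [>, >=, <, <=: greater than, greater than or equal to, less than, and less than or equal to]

3. [%in%: is contained in]

4. [!: not]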

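8 > 7 # test whether 8 is greater than 7

[1] TRUE

8 != 7 # test whether 8 is not equal to 7

[1] TRUE

7 %in% c(8, 7) # test whether 7 is contained in a numeric vector

[1] TRUE

!(8 > 7) # negate the test of whether 8 is greater than 7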

[1] FALSE
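The examples above compare single numbers, but the same operators apply element by element to whole vectors, which is what makes them useful for filtering. A small sketch with a made-up vector `ages` (an assumption for illustration):

```r
ages <- c(60, 51, 66, 38)   # hypothetical values

older      <- ages > 55            # one TRUE/FALSE per element
not_51     <- ages != 51
in_set     <- ages %in% c(51, 38)  # is each element found in the set?
not_older  <- !older               # "!" flips every element

older
in_set

# TRUE counts as 1 and FALSE as 0, so sum() counts the TRUE values
sum(older)
```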

Section 3: Conditional expressions and indexing (2)

x <- c(T, T, F, F, T)
y <- c(6, 7, 9, 8, 10)
y[x]
## [1]  6  7 10
y <- c(6, 7, 9, 8, 10)
y[y <= 8]
## [1] 6 7 8

Section 3: Conditional expressions and indexing (3)

y <= 8 & 6 > 2
## [1]  TRUE  TRUE FALSE  TRUE FALSE
y > 8 | y <= 6
## [1]  TRUE FALSE  TRUE FALSE  TRUE
y[y <= 8 & 6 > 2]
## [1] 6 7 8
y[y > 8 | y <= 6]
## [1]  6  9 10
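In these examples a logical vector placed inside the square brackets keeps the elements where it is TRUE; "&" requires both conditions and "|" requires at least one. The sketch below stores the condition in a variable first, which can make longer filters easier to read. The name `keep` is illustrative only.

```r
y <- c(6, 7, 9, 8, 10)

# Build the condition once: strictly between 6 and 10
keep <- y > 6 & y < 10
keep

y[keep]       # elements where the condition is TRUE
which(keep)   # positions of those elements
y[!keep]      # everything else
```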

Section 3: Conditional expressions and indexing (4)

subdat <- dat[dat$LVD == 0,]
levels(as.factor(subdat$LVD))
## [1] "0"
levels(as.factor(subdat[,'LVD']))
## [1] "0"

Section 3: Conditional expressions and indexing (5)

head(subdat)
##        AMI   K LVD time death GENDER      AGE Rate  PR QRSd  QT QTc Axes_P
## 1    STEMI  NA   0  642     0   male 60.09729   80 241   95 333 385     40
## NA    <NA>  NA  NA   NA    NA   <NA>       NA   NA  NA   NA  NA  NA     NA
## NA.1  <NA>  NA  NA   NA    NA   <NA>       NA   NA  NA   NA  NA  NA     NA
## NA.2  <NA>  NA  NA   NA    NA   <NA>       NA   NA  NA   NA  NA  NA     NA
## 6    STEMI 3.8   0  949     0   male 52.39062   93 224   89 386 481     60
## 7     <NA>  NA   0   NA    NA   male 44.22640   67 153  100 407 430     41
##      Axes_QRS Axes_T rhythm.1 rhythm.2 rhythm.3 rhythm.4 rhythm.5 rhythm.6
## 1         -63    104        0        0        0        0        0        0
## NA         NA     NA       NA       NA       NA       NA       NA       NA
## NA.1       NA     NA       NA       NA       NA       NA       NA       NA
## NA.2       NA     NA       NA       NA       NA       NA       NA       NA
## 6         -36     -4        0        0        0        0        0        0
## 7          31     32        0        0        0        0        0        0
##      rhythm.7 rhythm.8 rhythm.9 rhythm.10 rhythm.11 rhythm.12 rhythm.13
## 1           0        1        0         0         1         0         0
## NA         NA       NA       NA        NA        NA        NA        NA
## NA.1       NA       NA       NA        NA        NA        NA        NA
## NA.2       NA       NA       NA        NA        NA        NA        NA
## 6           0        1        0         0         1         0         0
## 7           0        0        0         0         1         0         0
##      rhythm.14 rhythm.15 rhythm.16 rhythm.17 rhythm.18 rhythm.19 rhythm.20
## 1            0         0         0         1         0         0         0
## NA          NA        NA        NA        NA        NA        NA        NA
## NA.1        NA        NA        NA        NA        NA        NA        NA
## NA.2        NA        NA        NA        NA        NA        NA        NA
## 6            0         1         0         0         0         0         0
## 7            0         0         0         0         0         0         0
##      rhythm.21 rhythm.22 rhythm.23 rhythm.24 rhythm.25 rhythm.26 rhythm.27
## 1            0         0         0         0         0         1         0
## NA          NA        NA        NA        NA        NA        NA        NA
## NA.1        NA        NA        NA        NA        NA        NA        NA
## NA.2        NA        NA        NA        NA        NA        NA        NA
## 6            0         0         0         0         0         1         0
## 7            0         0         0         0         0         1         0
##      rhythm.28 rhythm.29 rhythm.30 rhythm.31 seq diff Group
## 1            0         0         0         0   1  161     0
## NA          NA        NA        NA        NA  NA   NA    NA
## NA.1        NA        NA        NA        NA  NA   NA    NA
## NA.2        NA        NA        NA        NA  NA   NA    NA
## 6            0         0         0         0   1  131     0
## 7            0         0         0         0   2   86     0
# == returns NA wherever LVD is missing
head(dat$LVD == 0)
## [1]  TRUE    NA    NA    NA FALSE  TRUE
# %in% returns FALSE instead of NA, so rows with missing LVD are dropped
subdat <- dat[dat$LVD %in% 0,]
head(subdat)
##       AMI   K LVD time death GENDER      AGE Rate  PR QRSd  QT QTc Axes_P
## 1   STEMI  NA   0  642     0   male 60.09729   80 241   95 333 385     40
## 6   STEMI 3.8   0  949     0   male 52.39062   93 224   89 386 481     60
## 7    <NA>  NA   0   NA    NA   male 44.22640   67 153  100 407 430     41
## 9    <NA> 5.7   0 1769     0 female 83.70883   71 192   92 412 448     71
## 11 NSTEMI  NA   0  308     0   male 55.62976   67 162  110 430 454     58
## 15   <NA>  NA   0  853     0 female 73.00605   66 227   86 440 461     45
##    Axes_QRS Axes_T rhythm.1 rhythm.2 rhythm.3 rhythm.4 rhythm.5 rhythm.6
## 1       -63    104        0        0        0        0        0        0
## 6       -36     -4        0        0        0        0        0        0
## 7        31     32        0        0        0        0        0        0
## 9        49    138        0        0        0        0        0        0
## 11       18     79        0        0        0        0        0        0
## 15       -5     36        0        0        0        0        0        0
##    rhythm.7 rhythm.8 rhythm.9 rhythm.10 rhythm.11 rhythm.12 rhythm.13 rhythm.14
## 1         0        1        0         0         1         0         0         0
## 6         0        1        0         0         1         0         0         0
## 7         0        0        0         0         1         0         0         0
## 9         0        0        0         0         1         0         0         0
## 11        0        0        0         0         1         0         0         0
## 15        0        1        0         0         0         0         0         0
##    rhythm.15 rhythm.16 rhythm.17 rhythm.18 rhythm.19 rhythm.20 rhythm.21
## 1          0         0         1         0         0         0         0
## 6          1         0         0         0         0         0         0
## 7          0         0         0         0         0         0         0
## 9          0         0         0         0         0         0         0
## 11         0         0         0         0         0         0         0
## 15         0         0         0         0         0         0         0
##    rhythm.22 rhythm.23 rhythm.24 rhythm.25 rhythm.26 rhythm.27 rhythm.28
## 1          0         0         0         0         1         0         0
## 6          0         0         0         0         1         0         0
## 7          0         0         0         0         1         0         0
## 9          0         0         0         0         1         0         0
## 11         0         0         0         0         1         0         0
## 15         0         0         0         1         1         0         0
##    rhythm.29 rhythm.30 rhythm.31 seq diff Group
## 1          0         0         0   1  161     0
## 6          0         0         0   1  131     0
## 7          0         0         0   2   86     0
## 9          0         0         0   4  121     0
## 11         0         0         0   1   95     0
## 15         0         0         0   5  161     0
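
A minimal sketch of why the two filters above behave differently. It uses a small hypothetical data frame (`toy`, with made-up values rather than rows from ECG_train.csv): `==` propagates `NA` into the logical index, while `%in%` only ever returns `TRUE` or `FALSE`.

```r
# Hypothetical toy data, not taken from ECG_train.csv
toy <- data.frame(id = 1:5, LVD = c(0, NA, 1, 0, NA))

# `==` propagates NA, so the logical index itself contains NA
eq_idx <- toy$LVD == 0
print(eq_idx)

# Each NA in the index produces a row filled with NA
by_eq <- toy[eq_idx, ]
print(nrow(by_eq))

# `%in%` never returns NA, so only the true matches are kept
in_idx <- toy$LVD %in% 0
by_in <- toy[in_idx, ]
print(nrow(by_in))
```

In this toy case the `==` filter returns four rows (two real matches plus two all-NA rows), whereas the `%in%` filter returns only the two real matches.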

Section 3: Conditional Statements and Indexing (6)

# Leave the row slot empty and select columns by name
newdat <- dat[,c('K', 'LVD')]
head(newdat)
##     K LVD
## 1  NA   0
## 2 2.2  NA
## 3 3.7  NA
## 4 2.7  NA
## 5  NA   1
## 6 3.8   0
# Select the first six rows by position; this prints the same rows as head(newdat)
newdat[1:6,]
##     K LVD
## 1  NA   0
## 2 2.2  NA
## 3 3.7  NA
## 4 2.7  NA
## 5  NA   1
## 6 3.8   0
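
The two indexing slots of a data frame can be sketched as follows, assuming a small hypothetical data frame `toy` (made-up values) in place of `dat`. The slot before the comma selects rows, and the slot after it selects columns.

```r
# Hypothetical stand-in for dat (values are made up)
toy <- data.frame(K   = c(NA, 2.2, 3.7, 2.7, NA, 3.8, 4.1),
                  LVD = c(0, NA, NA, NA, 1, 0, 1),
                  AGE = c(60, 71, 55, 48, 66, 52, 80))

# Columns by name, all rows (empty row slot before the comma)
by_name <- toy[, c("K", "LVD")]

# Rows by position, all columns (empty column slot after the comma)
first_six <- by_name[1:6, ]

# head() returns the first six rows by default, so both agree here
print(isTRUE(all.equal(head(by_name), first_six)))
```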

Section 3: Conditional Statements and Indexing (7)

head(subdat)
##       AMI   K LVD time death GENDER      AGE Rate  PR QRSd  QT QTc Axes_P
## 1   STEMI  NA   0  642     0   male 60.09729   80 241   95 333 385     40
## 6   STEMI 3.8   0  949     0   male 52.39062   93 224   89 386 481     60
## 7    <NA>  NA   0   NA    NA   male 44.22640   67 153  100 407 430     41
## 9    <NA> 5.7   0 1769     0 female 83.70883   71 192   92 412 448     71
## 11 NSTEMI  NA   0  308     0   male 55.62976   67 162  110 430 454     58
## 15   <NA>  NA   0  853     0 female 73.00605   66 227   86 440 461     45
##    Axes_QRS Axes_T rhythm.1 rhythm.2 rhythm.3 rhythm.4 rhythm.5 rhythm.6
## 1       -63    104        0        0        0        0        0        0
## 6       -36     -4        0        0        0        0        0        0
## 7        31     32        0        0        0        0        0        0
## 9        49    138        0        0        0        0        0        0
## 11       18     79        0        0        0        0        0        0
## 15       -5     36        0        0        0        0        0        0
##    rhythm.7 rhythm.8 rhythm.9 rhythm.10 rhythm.11 rhythm.12 rhythm.13 rhythm.14
## 1         0        1        0         0         1         0         0         0
## 6         0        1        0         0         1         0         0         0
## 7         0        0        0         0         1         0         0         0
## 9         0        0        0         0         1         0         0         0
## 11        0        0        0         0         1         0         0         0
## 15        0        1        0         0         0         0         0         0
##    rhythm.15 rhythm.16 rhythm.17 rhythm.18 rhythm.19 rhythm.20 rhythm.21
## 1          0         0         1         0         0         0         0
## 6          1         0         0         0         0         0         0
## 7          0         0         0         0         0         0         0
## 9          0         0         0         0         0         0         0
## 11         0         0         0         0         0         0         0
## 15         0         0         0         0         0         0         0
##    rhythm.22 rhythm.23 rhythm.24 rhythm.25 rhythm.26 rhythm.27 rhythm.28
## 1          0         0         0         0         1         0         0
## 6          0         0         0         0         1         0         0
## 7          0         0         0         0         1         0         0
## 9          0         0         0         0         1         0         0
## 11         0         0         0         0         1         0         0
## 15         0         0         0         1         1         0         0
##    rhythm.29 rhythm.30 rhythm.31 seq diff Group
## 1          0         0         0   1  161     0
## 6          0         0         0   1  131     0
## 7          0         0         0   2   86     0
## 9          0         0         0   4  121     0
## 11         0         0         0   1   95     0
## 15         0         0         0   5  161     0
# Numeric index: picks rows by position within subdat (1st, 6th, 7th, ...), not by row name
idx <- c(1, 6, 7, 9, 11, 15)
subdat[idx,]
##        AMI   K LVD time death GENDER      AGE Rate  PR QRSd  QT QTc Axes_P
## 1    STEMI  NA   0  642     0   male 60.09729   80 241   95 333 385     40
## 15    <NA>  NA   0  853     0 female 73.00605   66 227   86 440 461     45
## 18   STEMI 3.2   0  463     0 female 62.38273   85 163   86 370 440     78
## 28    <NA> 5.9   0    3     1 female 88.57705   97 149  102 356 453      0
## 34    <NA> 2.4   0  626     0 female 78.70181   60  NA  116 566 566     NA
## 48 not-AMI 2.7   0   17     0 female 50.85254   84 186   89 428 507    -14
##    Axes_QRS Axes_T rhythm.1 rhythm.2 rhythm.3 rhythm.4 rhythm.5 rhythm.6
## 1       -63    104        0        0        0        0        0        0
## 15       -5     36        0        0        0        0        0        0
## 18      107    140        0        0        0        0        0        0
## 28        3    117        1        0        0        0        0        0
## 34       55    121        1        1        0        0        0        0
## 48       11     -2        0        0        0        0        0        0
##    rhythm.7 rhythm.8 rhythm.9 rhythm.10 rhythm.11 rhythm.12 rhythm.13 rhythm.14
## 1         0        1        0         0         1         0         0         0
## 15        0        1        0         0         0         0         0         0
## 18        0        0        0         0         1         0         0         0
## 28        0        0        0         0         0         0         0         0
## 34        0        0        0         0         1         0         0         0
## 48        1        0        0         0         0         0         0         0
##    rhythm.15 rhythm.16 rhythm.17 rhythm.18 rhythm.19 rhythm.20 rhythm.21
## 1          0         0         1         0         0         0         0
## 15         0         0         0         0         0         0         0
## 18         0         0         0         0         0         0         0
## 28         0         0         0         0         0         0         0
## 34         0         0         0         0         0         0         0
## 48         0         0         0         1         1         1         0
##    rhythm.22 rhythm.23 rhythm.24 rhythm.25 rhythm.26 rhythm.27 rhythm.28
## 1          0         0         0         0         1         0         0
## 15         0         0         0         1         1         0         0
## 18         0         0         0         0         1         0         0
## 28         0         0         0         0         1         0         0
## 34         0         0         0         0         0         0         0
## 48         0         0         0         0         0         1         0
##    rhythm.29 rhythm.30 rhythm.31 seq diff Group
## 1          0         0         0   1  161     0
## 15         0         0         0   5  161     0
## 18         0         0         0   3   78     0
## 28         0         0         0   3   52     2
## 34         1         0         0   4   NA     3
## 48         1         0         0   3  102     0
# Character index: picks rows by row name, matching the labels printed by head(subdat)
idx <- as.character(c(1, 6, 7, 9, 11, 15))
subdat[idx,]
##       AMI   K LVD time death GENDER      AGE Rate  PR QRSd  QT QTc Axes_P
## 1   STEMI  NA   0  642     0   male 60.09729   80 241   95 333 385     40
## 6   STEMI 3.8   0  949     0   male 52.39062   93 224   89 386 481     60
## 7    <NA>  NA   0   NA    NA   male 44.22640   67 153  100 407 430     41
## 9    <NA> 5.7   0 1769     0 female 83.70883   71 192   92 412 448     71
## 11 NSTEMI  NA   0  308     0   male 55.62976   67 162  110 430 454     58
## 15   <NA>  NA   0  853     0 female 73.00605   66 227   86 440 461     45
##    Axes_QRS Axes_T rhythm.1 rhythm.2 rhythm.3 rhythm.4 rhythm.5 rhythm.6
## 1       -63    104        0        0        0        0        0        0
## 6       -36     -4        0        0        0        0        0        0
## 7        31     32        0        0        0        0        0        0
## 9        49    138        0        0        0        0        0        0
## 11       18     79        0        0        0        0        0        0
## 15       -5     36        0        0        0        0        0        0
##    rhythm.7 rhythm.8 rhythm.9 rhythm.10 rhythm.11 rhythm.12 rhythm.13 rhythm.14
## 1         0        1        0         0         1         0         0         0
## 6         0        1        0         0         1         0         0         0
## 7         0        0        0         0         1         0         0         0
## 9         0        0        0         0         1         0         0         0
## 11        0        0        0         0         1         0         0         0
## 15        0        1        0         0         0         0         0         0
##    rhythm.15 rhythm.16 rhythm.17 rhythm.18 rhythm.19 rhythm.20 rhythm.21
## 1          0         0         1         0         0         0         0
## 6          1         0         0         0         0         0         0
## 7          0         0         0         0         0         0         0
## 9          0         0         0         0         0         0         0
## 11         0         0         0         0         0         0         0
## 15         0         0         0         0         0         0         0
##    rhythm.22 rhythm.23 rhythm.24 rhythm.25 rhythm.26 rhythm.27 rhythm.28
## 1          0         0         0         0         1         0         0
## 6          0         0         0         0         1         0         0
## 7          0         0         0         0         1         0         0
## 9          0         0         0         0         1         0         0
## 11         0         0         0         0         1         0         0
## 15         0         0         0         1         1         0         0
##    rhythm.29 rhythm.30 rhythm.31 seq diff Group
## 1          0         0         0   1  161     0
## 6          0         0         0   1  131     0
## 7          0         0         0   2   86     0
## 9          0         0         0   4  121     0
## 11         0         0         0   1   95     0
## 15         0         0         0   5  161     0
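
The difference between the two index types above can be reproduced on a small scale. This is a sketch with a hypothetical data frame `toy` (made-up values): after subsetting, the original row names are kept, so a numeric index (position) and a character index (row name) no longer point at the same rows.

```r
# Hypothetical toy data
toy <- data.frame(value = c(10, 20, 30, 40, 50))

# After subsetting, the original row names "2", "4", "5" are kept
sub <- toy[c(2, 4, 5), , drop = FALSE]
print(rownames(sub))

# Numeric index: 2nd and 3rd rows of sub (row names "4" and "5")
by_pos <- sub[c(2, 3), , drop = FALSE]
print(by_pos$value)

# Character index: rows whose names are "2" and "4"
by_name <- sub[c("2", "4"), , drop = FALSE]
print(by_name$value)
```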

Section 3: Conditional Statements and Indexing (8)

dat$LVD <- as.factor(dat$LVD)
levels(dat$LVD)
## [1] "0" "1"
subdat <- dat[dat$LVD %in% 0,]
head(subdat)
##       AMI   K LVD time death GENDER      AGE Rate  PR QRSd  QT QTc Axes_P
## 1   STEMI  NA   0  642     0   male 60.09729   80 241   95 333 385     40
## 6   STEMI 3.8   0  949     0   male 52.39062   93 224   89 386 481     60
## 7    <NA>  NA   0   NA    NA   male 44.22640   67 153  100 407 430     41
## 9    <NA> 5.7   0 1769     0 female 83.70883   71 192   92 412 448     71
## 11 NSTEMI  NA   0  308     0   male 55.62976   67 162  110 430 454     58
## 15   <NA>  NA   0  853     0 female 73.00605   66 227   86 440 461     45
##    Axes_QRS Axes_T rhythm.1 rhythm.2 rhythm.3 rhythm.4 rhythm.5 rhythm.6
## 1       -63    104        0        0        0        0        0        0
## 6       -36     -4        0        0        0        0        0        0
## 7        31     32        0        0        0        0        0        0
## 9        49    138        0        0        0        0        0        0
## 11       18     79        0        0        0        0        0        0
## 15       -5     36        0        0        0        0        0        0
##    rhythm.7 rhythm.8 rhythm.9 rhythm.10 rhythm.11 rhythm.12 rhythm.13 rhythm.14
## 1         0        1        0         0         1         0         0         0
## 6         0        1        0         0         1         0         0         0
## 7         0        0        0         0         1         0         0         0
## 9         0        0        0         0         1         0         0         0
## 11        0        0        0         0         1         0         0         0
## 15        0        1        0         0         0         0         0         0
##    rhythm.15 rhythm.16 rhythm.17 rhythm.18 rhythm.19 rhythm.20 rhythm.21
## 1          0         0         1         0         0         0         0
## 6          1         0         0         0         0         0         0
## 7          0         0         0         0         0         0         0
## 9          0         0         0         0         0         0         0
## 11         0         0         0         0         0         0         0
## 15         0         0         0         0         0         0         0
##    rhythm.22 rhythm.23 rhythm.24 rhythm.25 rhythm.26 rhythm.27 rhythm.28
## 1          0         0         0         0         1         0         0
## 6          0         0         0         0         1         0         0
## 7          0         0         0         0         1         0         0
## 9          0         0         0         0         1         0         0
## 11         0         0         0         0         1         0         0
## 15         0         0         0         1         1         0         0
##    rhythm.29 rhythm.30 rhythm.31 seq diff Group
## 1          0         0         0   1  161     0
## 6          0         0         0   1  131     0
## 7          0         0         0   2   86     0
## 9          0         0         0   4  121     0
## 11         0         0         0   1   95     0
## 15         0         0         0   5  161     0
levels(subdat$LVD)
## [1] "0" "1"
# Rebuild the factor from its character labels so the unused level "1" is dropped
subdat$LVD <- as.factor(as.character(subdat$LVD))
levels(subdat$LVD)
## [1] "0"
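
Base R also provides `droplevels()` for the same purpose. The sketch below uses a hypothetical factor `lvd` (made-up values) and is offered as an alternative to the `as.factor(as.character(...))` idiom, not as the approach the slides prescribe.

```r
# Hypothetical factor with a missing value
lvd <- factor(c("0", "1", "0", NA, "1"))

# Subsetting keeps every original level, even those no longer present
kept <- lvd[lvd %in% "0"]
print(levels(kept))

# droplevels() removes the levels that no longer occur
cleaned <- droplevels(kept)
print(levels(cleaned))
```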

Section 3: Conditional Statements and Indexing (9)

subdat <- dat[dat$LVD %in% 0,]
levels(subdat$LVD)
## [1] "0" "1"
# Pitfall: as.numeric() on a factor returns the internal level codes (1 for "0"), not the labels
subdat$LVD <- as.factor(as.numeric(subdat$LVD))
levels(subdat$LVD)
## [1] "1"
subdat <- dat[dat$LVD %in% 0,]
write.csv(subdat, "data_clean.csv")
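
The `"1"` level above appears because `as.numeric()` applied to a factor returns its internal integer codes rather than its labels. A minimal sketch with a hypothetical factor `lvd` shows the difference and one common way to recover the original numbers:

```r
# Hypothetical factor whose labels are the numbers 0 and 1
lvd <- factor(c("0", "0", "1"))

# Internal codes: the first level "0" is stored as 1, the second level "1" as 2
codes <- as.numeric(lvd)
print(codes)

# Convert to character first to recover the values the labels represent
values <- as.numeric(as.character(lvd))
print(values)
```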

Exercise 2: Finding Implausible Values

  1. We assume that QTc is always greater than PR. Try to find the records that violate this rule, that is, those with PR > QTc.

  2. You will notice that both QTc and PR can contain missing values (NA). Clean the data so that only rows where QTc is greater than PR remain; rows where QTc or PR is NA may also be kept.

Exercise 2 Answer (1)

# Naive filter: rows where QTc or PR is NA give an NA index and come back as all-NA rows
subdat <- dat[dat$QTc < dat$PR,]
head(subdat)
##       AMI  K  LVD time death GENDER AGE Rate PR QRSd QT QTc Axes_P Axes_QRS
## NA   <NA> NA <NA>   NA    NA   <NA>  NA   NA NA   NA NA  NA     NA       NA
## NA.1 <NA> NA <NA>   NA    NA   <NA>  NA   NA NA   NA NA  NA     NA       NA
## NA.2 <NA> NA <NA>   NA    NA   <NA>  NA   NA NA   NA NA  NA     NA       NA
## NA.3 <NA> NA <NA>   NA    NA   <NA>  NA   NA NA   NA NA  NA     NA       NA
## NA.4 <NA> NA <NA>   NA    NA   <NA>  NA   NA NA   NA NA  NA     NA       NA
## NA.5 <NA> NA <NA>   NA    NA   <NA>  NA   NA NA   NA NA  NA     NA       NA
##      Axes_T rhythm.1 rhythm.2 rhythm.3 rhythm.4 rhythm.5 rhythm.6 rhythm.7
## NA       NA       NA       NA       NA       NA       NA       NA       NA
## NA.1     NA       NA       NA       NA       NA       NA       NA       NA
## NA.2     NA       NA       NA       NA       NA       NA       NA       NA
## NA.3     NA       NA       NA       NA       NA       NA       NA       NA
## NA.4     NA       NA       NA       NA       NA       NA       NA       NA
## NA.5     NA       NA       NA       NA       NA       NA       NA       NA
##      rhythm.8 rhythm.9 rhythm.10 rhythm.11 rhythm.12 rhythm.13 rhythm.14
## NA         NA       NA        NA        NA        NA        NA        NA
## NA.1       NA       NA        NA        NA        NA        NA        NA
## NA.2       NA       NA        NA        NA        NA        NA        NA
## NA.3       NA       NA        NA        NA        NA        NA        NA
## NA.4       NA       NA        NA        NA        NA        NA        NA
## NA.5       NA       NA        NA        NA        NA        NA        NA
##      rhythm.15 rhythm.16 rhythm.17 rhythm.18 rhythm.19 rhythm.20 rhythm.21
## NA          NA        NA        NA        NA        NA        NA        NA
## NA.1        NA        NA        NA        NA        NA        NA        NA
## NA.2        NA        NA        NA        NA        NA        NA        NA
## NA.3        NA        NA        NA        NA        NA        NA        NA
## NA.4        NA        NA        NA        NA        NA        NA        NA
## NA.5        NA        NA        NA        NA        NA        NA        NA
##      rhythm.22 rhythm.23 rhythm.24 rhythm.25 rhythm.26 rhythm.27 rhythm.28
## NA          NA        NA        NA        NA        NA        NA        NA
## NA.1        NA        NA        NA        NA        NA        NA        NA
## NA.2        NA        NA        NA        NA        NA        NA        NA
## NA.3        NA        NA        NA        NA        NA        NA        NA
## NA.4        NA        NA        NA        NA        NA        NA        NA
## NA.5        NA        NA        NA        NA        NA        NA        NA
##      rhythm.29 rhythm.30 rhythm.31 seq diff Group
## NA          NA        NA        NA  NA   NA    NA
## NA.1        NA        NA        NA  NA   NA    NA
## NA.2        NA        NA        NA  NA   NA    NA
## NA.3        NA        NA        NA  NA   NA    NA
## NA.4        NA        NA        NA  NA   NA    NA
## NA.5        NA        NA        NA  NA   NA    NA
# Fix 1: %in% TRUE treats NA as FALSE, so only real violations remain
criteria <- dat$QTc < dat$PR
subdat <- dat[criteria %in% TRUE,]
head(subdat)
##          AMI  K  LVD time death GENDER      AGE Rate  PR QRSd  QT QTc Axes_P
## 2735 not-AMI NA    0   24     0   male 90.60016   56 423   94 381 368      0
## 4755 not-AMI  3 <NA>  261     0 female 88.56513  143  49  196  NA   0     26
##      Axes_QRS Axes_T rhythm.1 rhythm.2 rhythm.3 rhythm.4 rhythm.5 rhythm.6
## 2735       30     73        0        0        0        1        0        0
## 4755      -13     NA        0        0        0        0        0        0
##      rhythm.7 rhythm.8 rhythm.9 rhythm.10 rhythm.11 rhythm.12 rhythm.13
## 2735        0        1        0         0         1         0         0
## 4755        0        0        0         0         0         0         0
##      rhythm.14 rhythm.15 rhythm.16 rhythm.17 rhythm.18 rhythm.19 rhythm.20
## 2735         0         0         0         0         0         0         0
## 4755         0         0         0         1         0         1         0
##      rhythm.21 rhythm.22 rhythm.23 rhythm.24 rhythm.25 rhythm.26 rhythm.27
## 2735         0         0         0         1         1         0         0
## 4755         0         0         0         0         0         0         1
##      rhythm.28 rhythm.29 rhythm.30 rhythm.31 seq diff Group
## 2735         0         0         0         0   5  367     0
## 4755         0         0         1         0   5  -94     0
# Fix 2: explicitly exclude rows where QTc or PR is NA
subdat <- dat[(dat$QTc < dat$PR) & !(dat$QTc %in% NA) & !(dat$PR %in% NA), ]
head(subdat)
##          AMI  K  LVD time death GENDER      AGE Rate  PR QRSd  QT QTc Axes_P
## 2735 not-AMI NA    0   24     0   male 90.60016   56 423   94 381 368      0
## 4755 not-AMI  3 <NA>  261     0 female 88.56513  143  49  196  NA   0     26
##      Axes_QRS Axes_T rhythm.1 rhythm.2 rhythm.3 rhythm.4 rhythm.5 rhythm.6
## 2735       30     73        0        0        0        1        0        0
## 4755      -13     NA        0        0        0        0        0        0
##      rhythm.7 rhythm.8 rhythm.9 rhythm.10 rhythm.11 rhythm.12 rhythm.13
## 2735        0        1        0         0         1         0         0
## 4755        0        0        0         0         0         0         0
##      rhythm.14 rhythm.15 rhythm.16 rhythm.17 rhythm.18 rhythm.19 rhythm.20
## 2735         0         0         0         0         0         0         0
## 4755         0         0         0         1         0         1         0
##      rhythm.21 rhythm.22 rhythm.23 rhythm.24 rhythm.25 rhythm.26 rhythm.27
## 2735         0         0         0         1         1         0         0
## 4755         0         0         0         0         0         0         1
##      rhythm.28 rhythm.29 rhythm.30 rhythm.31 seq diff Group
## 2735         0         0         0         0   5  367     0
## 4755         0         0         1         0   5  -94     0
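
Another base R option for the same task is `which()`, which returns the positions of the `TRUE` elements and silently skips `NA`. The sketch below uses a hypothetical data frame `toy` (made-up values) and is an alternative to the two approaches above, not part of the original answer.

```r
# Hypothetical toy data with missing values in both columns
toy <- data.frame(PR  = c(160, 423, NA, 49, 180),
                  QTc = c(420, 368, 450, 0, NA))

# The comparison is NA wherever either value is missing
criteria <- toy$QTc < toy$PR
print(criteria)

# Indexing with the raw logical vector also returns all-NA rows
naive <- toy[criteria, ]
print(nrow(naive))

# which() keeps only the positions that are TRUE
violations <- toy[which(criteria), ]
print(violations)
```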

Exercise 2 Answer (2)

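# Approach 1: keep rows where the comparison is TRUE or NA
criteria <- dat$QTc > dat$PR
subdat <- dat[criteria %in% c(TRUE, NA),]
write.csv(subdat, "data_clean.csv")
# Approach 2: the same filter with the NA cases written out explicitly
subdat <- dat[(dat$QTc > dat$PR) | (dat$QTc %in% NA) | (dat$PR %in% NA),]
write.csv(subdat, "data_clean.csv")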
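
As a quick check that the two filters above select the same rows, the sketch below applies both to a hypothetical data frame `toy` (made-up values). It uses `is.na()` in place of `%in% NA`; the two forms give the same result here.

```r
# Hypothetical toy data with missing values in both columns
toy <- data.frame(PR  = c(160, 423, NA, 49, 180),
                  QTc = c(420, 368, 450, 0, NA))

criteria <- toy$QTc > toy$PR

# Approach 1: treat both TRUE and NA as "keep"
keep_a <- criteria %in% c(TRUE, NA)

# Approach 2: keep if the rule holds, or if either value is missing
keep_b <- criteria | is.na(toy$QTc) | is.na(toy$PR)

# Both approaches produce the same logical index
print(identical(keep_a, keep_b))

# Rows 1, 3 and 5 are kept; rows 2 and 4 violate the rule
print(toy[keep_a, ])
```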

Course Summary

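– R is in fact a powerful programming language, but the goal of this course is not to teach programming. It is to help you understand the mathematics behind machine learning and algorithms, along with simple ways to implement them.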

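– If you want to use R as a general-purpose programming language, for example for more complex data streaming or for writing an interactive web page, you should take the course 「R語言程式設計導論」 (Introduction to R Programming), which starts from the fundamentals of programming.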

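– Starting with the next lesson, we will introduce several classic algorithms from the biostatistics courses you took as undergraduates and show how to implement them in R, so that you will no longer need SPSS.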