When does performance matter?

  • Performance does not always matter
  • Correctness > Performance
  • Performance matters in repeated jobs

Why is my code so slow?

  • Slowness is contagious: using other people’s slow code also slows down your code
  • Bad practices: frequent reallocation, non-vectorized computation, etc.
  • The job is inherently too complex to run fast

Measuring code performance

system.time(lapply(1:10000, rnorm))
## user system elapsed
## 6.339 0.293 7.160
system.time(for (i in 1:10000) rnorm(i))
## user system elapsed
## 5.668 0.350 7.298
microbenchmark::microbenchmark(lapply = lapply(1:100, function(x) rnorm(1000)),
for_loop = for (i in 1:100) rnorm(1000))
## Unit: milliseconds
##      expr       min       lq     mean   median       uq      max neval
##    lapply 10.102221 11.21580 12.37603 11.95803 13.25440 20.33809   100
##  for_loop  9.807794 10.92933 11.90900 11.78046 12.73029 15.50422   100

Bad practice: Frequent reallocation

Reallocate in each iteration

mycusum1 <- function(x) {
  y <- numeric()
  sum_x <- 0
  for (xi in x) {
    sum_x <- sum_x + xi
    y <- c(y, sum_x)  # grows y, reallocating it in every iteration
  }
  return(y)
}

Allocate once before iteration

mycusum2 <- function(x) {
  y <- numeric(length(x))  # allocate the full result vector once
  y[1] <- x[1]
  for (i in 2:length(x)) {
    y[i] <- y[i - 1] + x[i]
  }
  return(y)
}
x <- rnorm(100)
library(microbenchmark)
microbenchmark(mycusum1(x), mycusum2(x), cumsum(x))
## Unit: nanoseconds
##        expr    min       lq      mean   median       uq    max neval
## mycusum1(x)  89382 101388.0 115093.35 107565.5 117922.0 277584   100
## mycusum2(x) 136493 149666.5 171331.72 157004.0 173731.5 403474   100
##   cumsum(x)    415    576.0   1107.63    697.5   1259.0   3336   100
x <- rnorm(1000)
microbenchmark(mycusum1(x), mycusum2(x), cumsum(x))
## Unit: microseconds
##        expr      min       lq       mean   median        uq       max neval
## mycusum1(x) 1844.218 2280.448 3134.49053 2883.048 3649.8135 12382.487   100
## mycusum2(x) 1131.872 1327.814 2282.94622 1616.663 1963.2655 16333.073   100
##   cumsum(x)    2.203    3.710    6.01617    5.805    7.3325    21.596   100

Bad practice: Non-vectorized computation

algo1_for <- function(n) {
  res <- 0
  for (i in seq_len(n)) {
    res <- res + 1 / i ^ 2
  }
  res
}
algo1_vec <- function(n) {
  sum(1 / seq_len(n) ^ 2)
}
microbenchmark(algo1_for(1e4), algo1_vec(1e4))
## Unit: microseconds
##             expr      min        lq      mean    median        uq       max neval
## algo1_for(10000) 4691.365 5984.8850 9216.1017 7574.1820 11114.447 28963.285   100
## algo1_vec(10000)   98.640  113.1185  139.9632  127.5585   149.615   359.271   100

The good and the bad: copy-on-modify

Every object in R has a reference count n. The following expressions trigger copy-on-modify when n > 1:

x[range] <- v
x[[i]] <- v
x$item <- v
x <- c(1, 2, 3)
y <- x
y[1] <- 0 # copy!
v <- c(1, 2, 3)
f <- function(x) {
  x[1] <- 0 # copy!
  x
}
f(v)
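
tracemem() reports every copy of a traced object, so it can confirm the copies marked above. A minimal sketch, assuming R was built with memory profiling enabled (the default for CRAN binaries); the addresses it prints will differ per session:

x <- c(1, 2, 3)
tracemem(x)   # start reporting copies of this vector
y <- x        # x and y now share the same memory (n > 1)
y[1] <- 0     # tracemem prints a copy message here
untracemem(y) # stop tracing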

The following code does not trigger any copy:

x <- c(1, 2, 3)
tracemem(x)
y <- x
tracemem(y)
rm(x)       # the reference count drops back to 1
y[1] <- 0   # no copy!

How can I make my code faster?

  • Profile code to find the bottlenecks (see the sketch after this list)
  • Use built-in functions, vectorization and matrix algorithms (10-100x)
  • Use high-performance packages such as data.table (5-1000x)
  • Use an MKL-powered R distribution (Microsoft R Open)
  • Compare code performance with system.time or microbenchmark
  • Use the compiler package to compile R code into byte-code (2-8x)
  • Use parallel functions to take advantage of multiple cores (4-6x)
  • Rewrite R code into equivalent C++ code via Rcpp and extensions (0-1000x)
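
A minimal sketch of the profiling, byte-compiling, parallel and Rcpp items, assuming the compiler, parallel, Rcpp and microbenchmark packages are installed. mycusum1 and algo1_for are the functions defined earlier; the file name prof.out, the 2-core cluster and the helper name mycusum_cpp are arbitrary illustration choices, and the speed-ups depend heavily on the machine and the workload.

# 1. Profile a toy workload with the base R profiler to locate the bottleneck
Rprof("prof.out")
invisible(mycusum1(rnorm(2e4)))
Rprof(NULL)
summaryRprof("prof.out")$by.self

# 2. Byte-compile an R function (recent R versions JIT-compile by default,
#    so the gain may be small)
library(compiler)
algo1_for_cmp <- cmpfun(algo1_for)

# 3. Spread independent tasks over multiple cores
library(parallel)
cl <- makeCluster(2)  # assumes at least 2 cores are available
res <- parLapply(cl, 1:8, function(i) mean(rnorm(1e6)))
stopCluster(cl)

# 4. Rewrite the hot loop in C++ via Rcpp (mycusum_cpp is a hypothetical name)
Rcpp::cppFunction("
NumericVector mycusum_cpp(NumericVector x) {
  int n = x.size();
  NumericVector y(n);
  double s = 0;
  for (int i = 0; i < n; ++i) {
    s += x[i];
    y[i] = s;
  }
  return y;
}")
x <- rnorm(1000)
microbenchmark::microbenchmark(mycusum1(x), mycusum_cpp(x), cumsum(x))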