- Performance does not always matter
- Correctness > Performance
- Performance matters in repeated jobs
Why is my code so slow?
- Slowness is contagious: using other people’s slow code also slows down your code
- Bad practices: frequent reallocation, non-vectorized computation, etc.
- The job is inherently too complex to run fast
```r
system.time(lapply(1:10000, rnorm))
##    user  system elapsed
##   6.339   0.293   7.160
```
```r
system.time(for (i in 1:10000) rnorm(i))
##    user  system elapsed
##   5.668   0.350   7.298
```
```r
microbenchmark::microbenchmark(
  lapply   = lapply(1:100, function(x) rnorm(1000)),
  for_loop = for (i in 1:100) rnorm(1000)
)
## Unit: milliseconds
##      expr       min       lq     mean   median       uq      max neval
##    lapply 10.102221 11.21580 12.37603 11.95803 13.25440 20.33809   100
##  for_loop  9.807794 10.92933 11.90900 11.78046 12.73029 15.50422   100
```
Bad practice: Frequent reallocation
Reallocate in each iteration
```r
mycusum1 <- function(x) {
  y <- numeric()          # start with an empty vector
  sum_x <- 0
  for (xi in x) {
    sum_x <- sum_x + xi
    y <- c(y, sum_x)      # reallocates and copies y on every iteration
  }
  return(y)
}
```
Allocate once before iteration
```r
mycusum2 <- function(x) {
  y <- numeric(length(x)) # allocate the full result once, up front
  y[1] <- x[1]
  for (i in 2:length(x)) {
    y[i] <- y[i - 1] + x[i]
  }
  return(y)
}
```
```r
x <- rnorm(100)
library(microbenchmark)
microbenchmark(mycusum1(x), mycusum2(x), cumsum(x))
## Unit: nanoseconds
##        expr    min       lq      mean   median       uq    max neval
## mycusum1(x)  89382 101388.0 115093.35 107565.5 117922.0 277584   100
## mycusum2(x) 136493 149666.5 171331.72 157004.0 173731.5 403474   100
##   cumsum(x)    415    576.0   1107.63    697.5   1259.0   3336   100
```
```r
x <- rnorm(1000)
microbenchmark(mycusum1(x), mycusum2(x), cumsum(x))
## Unit: microseconds
##        expr      min       lq       mean   median        uq       max neval
## mycusum1(x) 1844.218 2280.448 3134.49053 2883.048 3649.8135 12382.487   100
## mycusum2(x) 1131.872 1327.814 2282.94622 1616.663 1963.2655 16333.073   100
##   cumsum(x)    2.203    3.710    6.01617    5.805    7.3325    21.596   100
```
Bad practice: Non-vectorized computation
```r
algo1_for <- function(n) {
  res <- 0
  for (i in seq_len(n)) {   # accumulate term by term
    res <- res + 1 / i ^ 2
  }
  res
}

algo1_vec <- function(n) {
  sum(1 / seq_len(n) ^ 2)   # one vectorized expression
}

microbenchmark(algo1_for(1e4), algo1_vec(1e4))
## Unit: microseconds
##              expr      min        lq      mean    median        uq       max neval
##  algo1_for(10000) 4691.365 5984.8850 9216.1017 7574.1820 11114.447 28963.285   100
##  algo1_vec(10000)   98.640  113.1185  139.9632  127.5585   149.615   359.271   100
```
The good and the bad: copy-on-modify
Every object in R has a reference count n. The following assignments trigger copy-on-modify when n > 1:
```r
x[range] <- v
x[[i]] <- v
x$item <- v
```
```r
x <- c(1, 2, 3)
y <- x     # x and y now share the same object (n = 2)
y[1] <- 0  # modifying y triggers a copy; x stays c(1, 2, 3)
```
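You can watch this copy happen with tracemem(), which prints a message whenever the traced object is duplicated. A minimal sketch (the printed addresses will differ on your machine):

```r
x <- c(1, 2, 3)
y <- x           # n = 2: both names point at the same object
tracemem(x)      # prints the object's current address
y[1] <- 0        # tracemem reports the duplication triggered here
untracemem(x)    # stop tracing
```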
```r
v <- c(1, 2, 3)
f <- function(x) {
  x[1] <- 0  # the argument is copied on modification, so v is unchanged
  x
}
f(v)
```
The following code does not trigger any copy:
```r
x <- c(1, 2, 3)
tracemem(x)  # report whenever this object gets copied
y <- x
tracemem(y)
rm(x)        # y is now the only reference (n = 1)
y[1] <- 0    # modified in place; tracemem prints nothing
```
How can I make my code faster?
- Profile code to find the bottlenecks (see the Rprof sketch below)
- Use built-in functions, vectorization, and matrix algorithms (10-100x)
- Use high-performance packages such as data.table (5-1000x)
- Use an MKL-powered R distribution (e.g. Microsoft R Open)
- Compare code performance with system.time or microbenchmark
- Use the compiler package to compile R code into byte code (2-8x; sketched below)
- Use parallel functions to take advantage of multiple cores (4-6x; sketched below)
- Rewrite R code into equivalent C++ code via Rcpp and its extensions (0-1000x; sketched below)
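Profiling first: a minimal sketch with the built-in sampling profiler Rprof(); slow_job is a hypothetical stand-in for your own code.

```r
slow_job <- function() {
  y <- numeric()
  for (i in 1:20000) y <- c(y, i)  # deliberate reallocation hotspot
  y
}

out <- tempfile()
Rprof(out)                        # start the sampling profiler
invisible(slow_job())
Rprof(NULL)                       # stop profiling
head(summaryRprof(out)$by.self)   # functions ranked by self time
```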
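Byte-compiling: a sketch with the base compiler package, reusing algo1_for from above. Note that since R 3.4 the JIT byte-compiles most functions automatically, so on a modern R the extra gain is often small.

```r
library(compiler)
algo1_byte <- cmpfun(algo1_for)  # byte-compiled copy of the loop version
microbenchmark(algo1_for(1e4), algo1_byte(1e4))
```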
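Parallelism: a sketch with the base parallel package. mclapply() forks the current session and is unavailable on Windows; the cluster-based parLapply() works everywhere.

```r
library(parallel)
n_cores <- max(1L, detectCores() - 1L)  # leave one core free

# Fork-based, Unix-alikes only
res <- mclapply(1:8, function(i) mean(rnorm(1e6)), mc.cores = n_cores)

# Cluster-based, portable
cl <- makeCluster(n_cores)
res <- parLapply(cl, 1:8, function(i) mean(rnorm(1e6)))
stopCluster(cl)
```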
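The Rcpp route: a sketch that rewrites the algo1 loop in C++ with cppFunction() (assumes Rcpp is installed and a C++ toolchain is available).

```r
library(Rcpp)

cppFunction('
  double algo1_cpp(int n) {
    double res = 0.0;
    for (int i = 1; i <= n; ++i) {
      res += 1.0 / ((double) i * i);
    }
    return res;
  }
')

microbenchmark(algo1_vec(1e4), algo1_cpp(1e4))
```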