 Performance does not always matter
 Correctness > Performance
 Performance matters in repeated jobs
Why is my code so slow?
 Slowness is contagious: using other people’s slow code also slows down your code
 Bad practices: frequent reallocation, non-vectorized computation, etc.
 The job is inherently too complex to run fast
 system.time(lapply(1:10000, rnorm))

 ##    user  system elapsed
 ##   6.339   0.293   7.160

 system.time(for (i in 1:10000) rnorm(i))

 ##    user  system elapsed
 ##   5.668   0.350   7.298

 microbenchmark::microbenchmark(
   lapply = lapply(1:100, function(x) rnorm(1000)),
   for_loop = for (i in 1:100) rnorm(1000)
 )

 ## Unit: milliseconds
 ##      expr       min       lq     mean   median       uq      max neval
 ##    lapply 10.102221 11.21580 12.37603 11.95803 13.25440 20.33809   100
 ##  for_loop  9.807794 10.92933 11.90900 11.78046 12.73029 15.50422   100

Bad practice: Frequent reallocation
Reallocate in each iteration
 mycusum1 <- function(x) {
   y <- numeric()        # starts empty, grows each iteration
   sum_x <- 0
   for (xi in x) {
     sum_x <- sum_x + xi
     y <- c(y, sum_x)    # reallocates and copies y every time
   }
   return(y)
 }

Allocate once before iteration
 mycusum2 <- function(x) {
   y <- numeric(length(x))   # allocated once, up front
   y[1] <- x[1]
   for (i in 2:length(x)) {
     y[i] <- y[i - 1] + x[i]
   }
   return(y)
 }

 x <- rnorm(100)
 library(microbenchmark)
 microbenchmark(mycusum1(x), mycusum2(x), cumsum(x))

 ## Unit: nanoseconds
 ##        expr    min       lq      mean   median       uq    max neval
 ## mycusum1(x)  89382 101388.0 115093.35 107565.5 117922.0 277584   100
 ## mycusum2(x) 136493 149666.5 171331.72 157004.0 173731.5 403474   100
 ##   cumsum(x)    415    576.0   1107.63    697.5   1259.0   3336   100

 x <- rnorm(1000)
 microbenchmark(mycusum1(x), mycusum2(x), cumsum(x))

 ## Unit: microseconds
 ##        expr      min       lq       mean   median        uq       max neval
 ## mycusum1(x) 1844.218 2280.448 3134.49053 2883.048 3649.8135 12382.487   100
 ## mycusum2(x) 1131.872 1327.814 2282.94622 1616.663 1963.2655 16333.073   100
 ##   cumsum(x)    2.203    3.710    6.01617    5.805    7.3325    21.596   100
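
A common middle ground, not shown above (the name `mycusum3` is illustrative): when the final length is known, accumulate into a pre-sized list and flatten once at the end, which avoids the per-iteration copy of `c(y, ...)`:

```r
# Sketch: pre-sized list accumulator, flattened once with unlist().
mycusum3 <- function(x) {
  y <- vector("list", length(x))  # allocated once
  sum_x <- 0
  for (i in seq_along(x)) {
    sum_x <- sum_x + x[i]
    y[[i]] <- sum_x               # fills a slot; no reallocation
  }
  unlist(y)
}

x <- rnorm(1000)
stopifnot(all.equal(mycusum3(x), cumsum(x)))
```

For this task `cumsum()` remains far faster; the list pattern pays off when element sizes or types are irregular.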

Bad practice: Non-vectorized computation
 algo1_for <- function(n) {
   res <- 0
   for (i in seq_len(n)) {
     res <- res + 1 / i ^ 2
   }
   res
 }

 algo1_vec <- function(n) {
   sum(1 / seq_len(n) ^ 2)
 }

 microbenchmark(algo1_for(1e4), algo1_vec(1e4))

 ## Unit: microseconds
 ##             expr      min        lq      mean    median        uq       max neval
 ## algo1_for(10000) 4691.365 5984.8850 9216.1017 7574.1820 11114.447 28963.285   100
 ## algo1_vec(10000)   98.640  113.1185  139.9632  127.5585   149.615   359.271   100
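
Vectorization also covers conditional updates. A sketch (the `clip_loop`/`clip_vec` names are illustrative, not from the slides): an element-wise `if` inside a loop can usually be replaced by logical indexing, one mask test and one assignment for the whole vector:

```r
# Illustrative task: clamp negative values to zero.
clip_loop <- function(x) {
  for (i in seq_along(x)) {
    if (x[i] < 0) x[i] <- 0   # one scalar test per element
  }
  x
}

clip_vec <- function(x) {
  x[x < 0] <- 0               # vectorized: one logical mask, one assignment
  x
}

x <- rnorm(1e5)
stopifnot(identical(clip_loop(x), clip_vec(x)))
```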

The good and the bad: copy-on-modify
Every object in R carries a reference count n. The following expressions trigger copy-on-modify when n > 1:
 x[range] <- v
 x[[i]] <- v
 x$item <- v

 x <- c(1, 2, 3)
 y <- x        # x and y now share one vector
 y[1] <- 0     # triggers a copy; x is unchanged
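
`tracemem()` can confirm the copy in the snippet above; a sketch (the addresses it prints vary by session, so none are shown):

```r
x <- c(1, 2, 3)
tracemem(x)       # start tracing copies of this vector
y <- x            # no copy yet: x and y share memory, refcount > 1
y[1] <- 0         # copy-on-modify: tracemem reports a copy here
stopifnot(x[1] == 1, y[1] == 0)   # original x is untouched
```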

 v <- c(1, 2, 3)
 f <- function(x) {
   x[1] <- 0   # modifying the argument copies it; v is unchanged
   x
 }
 f(v)

The following code does not trigger any copy:
 x <- c(1, 2, 3)
 tracemem(x)
 y <- x
 tracemem(y)
 rm(x)       # y is now the sole reference
 y[1] <- 0   # modified in place: tracemem prints nothing

How can I make my code faster?
 Profile code to find the bottleneck
 Use built-in functions, vectorization and matrix algorithms (10-100x)
 Use high-performance packages (data.table) (5-1000x)
 Use an MKL-powered R distribution (Microsoft R Open)
 Compare code performance with system.time or microbenchmark
 Use compiler to compile R code into bytecode (2-8x)
 Use parallel functions to take advantage of multiple cores (4-6x)
 Rewrite R code into equivalent C++ code via Rcpp and extensions (0-1000x)
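
As a sketch of the bytecode route: `compiler::cmpfun()` compiles an R function ahead of time. Note that R >= 3.4 JIT-compiles functions by default, so the measured gain on a modern interpreter may be smaller than the range above suggests:

```r
library(compiler)

# The loop-based series sum from earlier in the slides.
algo1_for <- function(n) {
  res <- 0
  for (i in seq_len(n)) res <- res + 1 / i ^ 2
  res
}

# Compile to bytecode ahead of time.
algo1_cmp <- cmpfun(algo1_for)

# Same result either way; compare timings with microbenchmark if installed.
stopifnot(all.equal(algo1_for(1e4), algo1_cmp(1e4)))
system.time(algo1_for(1e5))
system.time(algo1_cmp(1e5))
```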