When does performance matter?

  • Performance does not always matter
  • Correctness > Performance
  • Performance matters in repeated jobs

Why is my code so slow?

  • Slowness is contagious: using other people’s slow code also slows down your code
  • Bad practices: frequent reallocation, non-vectorized computation, etc.
  • The job is inherently too complex to run fast

Measuring code performance

system.time(lapply(1:10000, rnorm))
## user system elapsed
## 6.339 0.293 7.160
system.time(for (i in 1:10000) rnorm(i))
## user system elapsed
## 5.668 0.350 7.298
microbenchmark::microbenchmark(lapply = lapply(1:100, function(x) rnorm(1000)),
for_loop = for (i in 1:100) rnorm(1000))
## Unit: milliseconds
##      expr       min       lq     mean   median       uq      max neval
##    lapply 10.102221 11.21580 12.37603 11.95803 13.25440 20.33809   100
##  for_loop  9.807794 10.92933 11.90900 11.78046 12.73029 15.50422   100

Bad practice: Frequent reallocation

Reallocate in each iteration

mycusum1 <- function(x) {
  y <- numeric()
  sum_x <- 0
  for (xi in x) {
    sum_x <- sum_x + xi
    y <- c(y, sum_x)  # grows y, reallocating it in every iteration
  }
  return(y)
}

Allocate once before iteration

mycusum2 <- function(x) {
  y <- numeric(length(x))  # allocate the full result vector once
  y[1] <- x[1]
  for (i in 2:length(x)) {
    y[i] <- y[i - 1] + x[i]
  }
  return(y)
}
x <- rnorm(100)
library(microbenchmark)
microbenchmark(mycusum1(x), mycusum2(x), cumsum(x))
## Unit: nanoseconds
##        expr    min       lq      mean   median       uq    max neval
## mycusum1(x)  89382 101388.0 115093.35 107565.5 117922.0 277584   100
## mycusum2(x) 136493 149666.5 171331.72 157004.0 173731.5 403474   100
##   cumsum(x)    415    576.0   1107.63    697.5   1259.0   3336   100
x <- rnorm(1000)
microbenchmark(mycusum1(x), mycusum2(x), cumsum(x))
## Unit: microseconds
##        expr      min       lq       mean   median        uq       max neval
## mycusum1(x) 1844.218 2280.448 3134.49053 2883.048 3649.8135 12382.487   100
## mycusum2(x) 1131.872 1327.814 2282.94622 1616.663 1963.2655 16333.073   100
##   cumsum(x)    2.203    3.710    6.01617    5.805    7.3325    21.596   100

Bad practice: Non-vectorized computation

algo1_for <- function(n) {
  res <- 0
  for (i in seq_len(n)) {
    res <- res + 1 / i ^ 2
  }
  res
}
algo1_vec <- function(n) {
  sum(1 / seq_len(n) ^ 2)
}
microbenchmark(algo1_for(1e4), algo1_vec(1e4))
## Unit: microseconds
##             expr      min        lq      mean    median        uq       max neval
## algo1_for(10000) 4691.365 5984.8850 9216.1017 7574.1820 11114.447 28963.285   100
## algo1_vec(10000)   98.640  113.1185  139.9632  127.5585   149.615   359.271   100

The good and the bad: copy-on-modify

Every object in R has a reference count n. The following expressions trigger copy-on-modify when n > 1:

x[range] <- v
x[[i]] <- v
x$item <- v
x <- c(1, 2, 3)
y <- x
y[1] <- 0 # copy!
v <- c(1, 2, 3)
f <- function(x) {
  x[1] <- 0 # copy!
  x
}
f(v)
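
tracemem() reports every copy of a traced object, so it can confirm the copies marked above. A minimal sketch, assuming R was built with memory profiling enabled (the default for CRAN binaries); the addresses it prints will differ per session:

x <- c(1, 2, 3)
tracemem(x)   # start reporting copies of this vector
y <- x        # x and y now share the same memory (n > 1)
y[1] <- 0     # tracemem prints a copy message here
untracemem(y) # stop tracing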

The following code does not trigger any copy:

x <- c(1, 2, 3)
tracemem(x)
y <- x
tracemem(y)
rm(x)       # the reference count drops back to 1
y[1] <- 0   # no copy!

How can I make my code faster?

  • Profile code to find the bottlenecks (see the sketch after this list)
  • Use built-in functions, vectorization and matrix algorithms (10-100x)
  • Use high-performance packages such as data.table (5-1000x)
  • Use an MKL-powered R distribution (Microsoft R Open)
  • Compare code performance with system.time or microbenchmark
  • Use the compiler package to compile R code into byte-code (2-8x)
  • Use parallel functions to take advantage of multiple cores (4-6x)
  • Rewrite R code into equivalent C++ code via Rcpp and extensions (0-1000x)
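
A minimal sketch of the profiling, byte-compiling, parallel and Rcpp items, assuming the compiler, parallel, Rcpp and microbenchmark packages are installed. mycusum1 and algo1_for are the functions defined earlier; the file name prof.out, the 2-core cluster and the helper name mycusum_cpp are arbitrary illustration choices, and the speed-ups depend heavily on the machine and the workload.

# 1. Profile a toy workload with the base R profiler to locate the bottleneck
Rprof("prof.out")
invisible(mycusum1(rnorm(2e4)))
Rprof(NULL)
summaryRprof("prof.out")$by.self

# 2. Byte-compile an R function (recent R versions JIT-compile by default,
#    so the gain may be small)
library(compiler)
algo1_for_cmp <- cmpfun(algo1_for)

# 3. Spread independent tasks over multiple cores
library(parallel)
cl <- makeCluster(2)  # assumes at least 2 cores are available
res <- parLapply(cl, 1:8, function(i) mean(rnorm(1e6)))
stopCluster(cl)

# 4. Rewrite the hot loop in C++ via Rcpp (mycusum_cpp is a hypothetical name)
Rcpp::cppFunction("
NumericVector mycusum_cpp(NumericVector x) {
  int n = x.size();
  NumericVector y(n);
  double s = 0;
  for (int i = 0; i < n; ++i) {
    s += x[i];
    y[i] = s;
  }
  return y;
}")
x <- rnorm(1000)
microbenchmark::microbenchmark(mycusum1(x), mycusum_cpp(x), cumsum(x))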