R with Intel MKL for Windows

R for Windows as downloaded from CRAN uses the reference BLAS and LAPACK implementations for linear algebra operations. These reference implementations are very stable and portable across platforms, but they are not optimized for performance. By switching to a highly optimized BLAS library such as the Intel MKL, you can see significant performance improvements in linear algebra computations. Below I show how to replace the standard BLAS and LAPACK implementations with the Intel MKL.
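To get a feeling for the kind of operation that is handed off to the BLAS, here is a minimal sketch in R. The matrix size n is an arbitrary choice for illustration and is not taken from the benchmark script; timings will of course differ between machines.

```r
# Time a BLAS-bound operation: the cross-product t(a) %*% a.
# n is an arbitrary illustrative size, not a value from the benchmark script.
set.seed(1)
n <- 500
a <- matrix(rnorm(n * n), nrow = n)

# system.time() returns user, system and elapsed time; we keep the elapsed part.
elapsed <- system.time(b <- crossprod(a))[["elapsed"]]
cat("crossprod of a", n, "x", n, "matrix took", elapsed, "seconds\n")
```

Running the same lines before and after the switch gives a quick impression of the difference without running the full benchmark.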

I assume that you already have the Intel MKL installed on your Windows computer; otherwise, installation instructions can be found here.

As a first step, run the R benchmark script 1-r-benchmark-25.R with the reference BLAS and LAPACK so that you can later compare the results with those of the Intel MKL. For example, on my computer the benchmark script reported

R Benchmark 2.5
===============
Number of times each test is run__________________________:  3


   I. Matrix calculation
   ---------------------
Creation, transp., deformation of a 2500x2500 matrix (sec):  0.689999999999979 
2400x2400 normal distributed random matrix ^1000____ (sec):  0.796666666666662 
Sorting of 7,000,000 random values__________________ (sec):  0.716666666666678 
2800x2800 cross-product matrix (b = a' * a)_________ (sec):  13.4233333333333 
Linear regr. over a 3000x3000 matrix (c = a \ b')___ (sec):  6.40333333333335 
                      --------------------------------------------
                 Trimmed geom. mean (2 extremes eliminated):  1.54051756267069 


   II. Matrix functions
   --------------------
FFT over 2,400,000 random values____________________ (sec):  0.270000000000001 
Eigenvalues of a 640x640 random matrix______________ (sec):  0.676666666666658 
Determinant of a 2500x2500 random matrix____________ (sec):  3.61666666666667 
Cholesky decomposition of a 3000x3000 matrix________ (sec):  5.39666666666669 
Inverse of a 1600x1600 random matrix________________ (sec):  3.01999999999998 
                      --------------------------------------------
                Trimmed geom. mean (2 extremes eliminated):  1.9478854034592 


   III. Programmation
   ------------------
3,500,000 Fibonacci numbers calculation (vector calc)(sec):  0.586666666666626 
Creation of a 3000x3000 Hilbert matrix (matrix calc) (sec):  0.213333333333367 
Grand common divisors of 400,000 pairs (recursion)__ (sec):  0.220000000000027 
Creation of a 500x500 Toeplitz matrix (loops)_______ (sec):  0.0433333333333318 
Escoufier's method on a 45x45 matrix (mixed)________ (sec):  0.280000000000086 
                      --------------------------------------------
                Trimmed geom. mean (2 extremes eliminated):  0.235982506922657 




Total time for all 15 tests_________________________ (sec):  36.3533333333334 
Overall mean (sum of I, II and III trimmed means/3)_ (sec):  0.891326097331175 
                      --- End of test ---
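Before swapping the libraries, it can be useful to note what R itself reports about its linear algebra libraries, so that you can confirm later that something has changed. The following is a minimal sketch, assuming R 3.4.0 or later for La_library() and for the BLAS entry of extSoftVersion(). The helper name blas_info is my own, and on some platforms the path entries may be empty strings.

```r
# Hypothetical helper: collect what R reports about its BLAS/LAPACK.
blas_info <- function() {
  list(
    lapack_version = La_version(),               # LAPACK version string
    lapack_library = La_library(),               # path of the LAPACK library, may be ""
    blas_library   = extSoftVersion()[["BLAS"]]  # path of the BLAS library, may be ""
  )
}

info <- blas_info()
str(info)
```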

Then, go to the folder where the R DLLs are installed and rename the BLAS and LAPACK DLLs so that the originals are kept as backups. On my computer I executed the following commands from a command prompt run as Administrator

cd "C:\Program Files\R\R-4.0.0\bin\x64"
rename Rblas.dll Rblas.dll.orig
rename Rlapack.dll Rlapack.dll.orig

Next, go to the folder where the Intel MKL DLLs are installed. Copy mkl_rt.dll twice into the folder where the R DLLs are installed, once under the name Rblas.dll and once under the name Rlapack.dll, and copy mkl_intel_thread.dll there unchanged. On my computer I executed the following, again from a command prompt run as Administrator

cd "C:\Program Files (x86)\IntelSWTools\compilers_and_libraries\windows\redist\intel64_win\mkl"
copy mkl_rt.dll "C:\Program Files\R\R-4.0.0\bin\x64\Rblas.dll"
copy mkl_rt.dll "C:\Program Files\R\R-4.0.0\bin\x64\Rlapack.dll"
copy mkl_intel_thread.dll "C:\Program Files\R\R-4.0.0\bin\x64"
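After copying the files, a quick sanity check from a freshly started R session shows whether the replaced libraries load and give correct answers. The following is a minimal sketch with a tiny system of equations whose solution is known by hand; it is an illustration of my own, not part of the benchmark script.

```r
# Solve the 2x2 system  2x + y = 1,  x + 3y = 2  (solution: x = 0.2, y = 0.6).
a <- matrix(c(2, 1,
              1, 3), nrow = 2, byrow = TRUE)
b <- c(1, 2)

x <- solve(a, b)     # solved via LAPACK
g <- crossprod(a)    # t(a) %*% a, computed via BLAS

# Stop with an error if the results differ from what plain arithmetic gives.
stopifnot(isTRUE(all.equal(x, c(0.2, 0.6))))
stopifnot(isTRUE(all.equal(g, matrix(c(5, 5, 5, 10), nrow = 2))))
cat("Linear algebra sanity check passed\n")
```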

Finally, add the folder where the Intel MKL DLLs are installed and another Intel MKL folder to the PATH environment variable. On my computer I added the highlighted paths
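Whether a folder actually ended up on the PATH can be checked from within R. A minimal sketch follows; the helper name on_path is hypothetical, and the folder in the usage line is simply the one used in the copy commands above, so adjust it to your installation. Changes to the PATH only reach programs started afterwards, so run this in a new R session.

```r
# Hypothetical helper: TRUE if `dir` is one of the entries of the PATH variable.
on_path <- function(dir) {
  # Split PATH at the platform's separator (";" on Windows, ":" elsewhere).
  entries <- strsplit(Sys.getenv("PATH"), .Platform$path.sep, fixed = TRUE)[[1]]
  entries <- entries[nzchar(entries)]
  # Normalize and lower-case both sides so that slashes and case do not matter.
  norm <- function(p) tolower(normalizePath(p, winslash = "\\", mustWork = FALSE))
  norm(dir) %in% norm(entries)
}

# Usage: the MKL folder from the copy commands; adjust to your own installation.
on_path("C:/Program Files (x86)/IntelSWTools/compilers_and_libraries/windows/redist/intel64_win/mkl")
```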

Now, you should be able to run the R benchmark script 1-r-benchmark-25.R with the Intel MKL BLAS and LAPACK. On my computer the benchmark script this time reported

R Benchmark 2.5
===============
Number of times each test is run__________________________:  3


   I. Matrix calculation
   ---------------------
Creation, transp., deformation of a 2500x2500 matrix (sec):  0.62 
2400x2400 normal distributed random matrix ^1000____ (sec):  0.743333333333334 
Sorting of 7,000,000 random values__________________ (sec):  0.673333333333333 
2800x2800 cross-product matrix (b = a' * a)_________ (sec):  0.183333333333332 
Linear regr. over a 3000x3000 matrix (c = a \ b')___ (sec):  0.100000000000001 
                      --------------------------------------------
                 Trimmed geom. mean (2 extremes eliminated):  0.424574994841722 


   II. Matrix functions
   --------------------
FFT over 2,400,000 random values____________________ (sec):  0.233333333333333 
Eigenvalues of a 640x640 random matrix______________ (sec):  0.206666666666666 
Determinant of a 2500x2500 random matrix____________ (sec):  0.090000000000001 
Cholesky decomposition of a 3000x3000 matrix________ (sec):  0.0899999999999999 
Inverse of a 1600x1600 random matrix________________ (sec):  0.12 
                      --------------------------------------------
                Trimmed geom. mean (2 extremes eliminated):  0.130686701541752 


   III. Programmation
   ------------------
3,500,000 Fibonacci numbers calculation (vector calc)(sec):  0.529999999999999 
Creation of a 3000x3000 Hilbert matrix (matrix calc) (sec):  0.193333333333335 
Grand common divisors of 400,000 pairs (recursion)__ (sec):  0.203333333333331 
Creation of a 500x500 Toeplitz matrix (loops)_______ (sec):  0.0333333333333338 
Escoufier's method on a 45x45 matrix (mixed)________ (sec):  0.25 
                      --------------------------------------------
                Trimmed geom. mean (2 extremes eliminated):  0.214199495002168 




Total time for all 15 tests_________________________ (sec):  4.27 
Overall mean (sum of I, II and III trimmed means/3)_ (sec):  0.228210041938561
                      --- End of test ---

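As can be seen from the benchmark results, R with the Intel MKL is significantly faster. In my example runs the total time of the benchmark script dropped from 36.35 to 4.27 seconds, which makes R with the Intel MKL about 8.5 times faster than R with the default BLAS and LAPACK. The gain comes mainly from the dense linear algebra tests such as the cross-product, linear regression, determinant, Cholesky decomposition, and matrix inverse, while the programming tests in part III barely change.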
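The speedup can be worked out directly from the two benchmark reports. The sketch below copies a few timings from the outputs above (rounded to four decimals) and divides them; the variable names are my own.

```r
# Timings in seconds, copied from the two benchmark reports (rounded).
reference <- c(total = 36.3533, crossprod = 13.4233, regression = 6.4033,
               determinant = 3.6167, cholesky = 5.3967, inverse = 3.0200)
mkl       <- c(total = 4.2700, crossprod = 0.1833, regression = 0.1000,
               determinant = 0.0900, cholesky = 0.0900, inverse = 0.1200)

# Speedup factor = reference time / MKL time, computed per test.
speedup <- reference / mkl
print(round(speedup, 1))
```

The total comes out at roughly 8.5, while the individual linear algebra tests speed up by considerably more than that.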

Adrian Trapletti
PhD, CEO

Quant, software engineer, and consultant, mostly in the investment industry. Long-term contributor to and package author for the R Project for Statistical Computing.