R的极客理想系列文章,涵盖了R的思想,使用,工具,创新等的一系列要点,以我个人的学习和体验去诠释R的强大。
R语言作为统计学一门语言,一直在小众领域闪耀着光芒。直到大数据的爆发,R语言变成了一门炙手可热的数据分析的利器。随着越来越多的工程背景的人的加入,R语言的社区在迅速扩大成长。现在已不仅仅是统计领域,教育,银行,电商,互联网….都在使用R语言。
要成为有理想的极客,我们不能停留在语法上,要掌握牢固的数学,概率,统计知识,同时还要有创新精神,把R语言发挥到各个领域。让我们一起动起来吧,开始R的极客理想。
关于作者:
- 张丹(Conan), 程序员Java,R,Javascript
- weibo:@Conan_Z
- blog: http://blog.fens.me
- email: bsspirit@gmail.com
转载请注明出处:
http://blog.fens.me/r-martix-openblas/
前言
昨天在IBM的大会上,又一次见到了OpenBlas主题的分享,这次必须要试一下。我第一次了解OpenBlas是在2年前的R语言大会上,听到了OpenBlas的各种优势,不过听完也就过去了。2年后再来这个项目,团队成员稳定,项目进展顺利,已经广泛接受,再不尝试一下就真的要落伍了。
目录
- OpenBlas介绍
- R和OpenBlas的安装
- 让R语言加速
1. OpenBlas介绍
OpenBlas是一个开源项目,是由 中科院软件所并行软件与计算科学实验室 发起的基于GotoBLAS2 1.13 BSD版的开源BLAS库高性能实现。
BLAS(Basic Linear Algebra Subprograms 基础线性代数程序集)是一个应用程序接口(API)标准,用以规范发布基础线性代数操作的数值库(如矢量或矩阵乘法)。该程序集最初发布于1979年,并 用于建立更大的数值程序包(如LAPACK)。在高性能计算领域,BLAS被广泛使用。例如,LINPACK的运算成绩则很大程度上取决于BLAS中子程 序DGEMM的表现。为提高性能,各軟硬件厂商则针对其產品对BLAS接口实现进行高度优化。
2. R和OpenBlas的安装
OpenBlas可以为各种语言底层,提供矩阵计算的性能提升,那么让我们把R和OpenBlas结合试试吧!
本机的系统环境:
- Linux Ubuntu 14.01.1
- CPU 双核 Intel(R) Xeon(R) CPU E5-2650 v2 @ 2.60GHz
- 内存 4G
通过命令查看系统参数
# 操作系统
~ cat /etc/issue
Ubuntu 14.04.1 LTS \n \l
# CPU
cat /proc/cpuinfo
processor : 0
vendor_id : GenuineIntel
cpu family : 6
model : 62
model name : Intel(R) Xeon(R) CPU E5-2650 v2 @ 2.60GHz
stepping : 4
microcode : 0x428
cpu MHz : 2600.048
cache size : 20480 KB
physical id : 0
siblings : 2
core id : 0
cpu cores : 2
apicid : 0
initial apicid : 0
fpu : yes
fpu_exception : yes
cpuid level : 13
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat clflush mmx fxsr sse sse2 ht syscall nx rdtscp lm constant_tsc rep_good nopl pni ssse3 cx16 sse4_1 sse4_2 popcnt aes hypervisor lahf_lm
bogomips : 5200.09
clflush size : 64
cache_alignment : 64
address sizes : 46 bits physical, 48 bits virtual
power management:
processor : 1
vendor_id : GenuineIntel
cpu family : 6
model : 62
model name : Intel(R) Xeon(R) CPU E5-2650 v2 @ 2.60GHz
stepping : 4
microcode : 0x428
cpu MHz : 2600.048
cache size : 20480 KB
physical id : 0
siblings : 2
core id : 1
cpu cores : 2
apicid : 2
initial apicid : 2
fpu : yes
fpu_exception : yes
cpuid level : 13
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat clflush mmx fxsr sse sse2 ht syscall nx rdtscp lm constant_tsc rep_good nopl pni ssse3 cx16 sse4_1 sse4_2 popcnt aes hypervisor lahf_lm
bogomips : 5200.09
clflush size : 64
cache_alignment : 64
address sizes : 46 bits physical, 48 bits virtual
power management:
# 内存
~ cat /proc/meminfo
MemTotal: 4046820 kB
MemFree: 1572372 kB
Buffers: 40588 kB
Cached: 709684 kB
SwapCached: 0 kB
Active: 1953940 kB
Inactive: 418084 kB
Active(anon): 1621840 kB
Inactive(anon): 5732 kB
Active(file): 332100 kB
Inactive(file): 412352 kB
Unevictable: 0 kB
Mlocked: 0 kB
SwapTotal: 0 kB
SwapFree: 0 kB
Dirty: 24 kB
Writeback: 0 kB
AnonPages: 1623792 kB
Mapped: 34936 kB
Shmem: 5828 kB
Slab: 58024 kB
SReclaimable: 45252 kB
SUnreclaim: 12772 kB
KernelStack: 1512 kB
PageTables: 8980 kB
NFS_Unstable: 0 kB
Bounce: 0 kB
WritebackTmp: 0 kB
CommitLimit: 2023408 kB
Committed_AS: 2556460 kB
VmallocTotal: 34359738367 kB
VmallocUsed: 9664 kB
VmallocChunk: 34359723308 kB
HardwareCorrupted: 0 kB
AnonHugePages: 1562624 kB
HugePages_Total: 0
HugePages_Free: 0
HugePages_Rsvd: 0
HugePages_Surp: 0
Hugepagesize: 2048 kB
DirectMap4k: 28672 kB
DirectMap2M: 4296704 kB
首先,我们要安装R语言的运行环境,在Linux Ubuntu中一条命令就可以搞定。
# 安装R语言
~ sudo apt-get install r-base
#查看R语言的版本
~ R --version
R version 3.2.2 (2015-08-14) -- "Fire Safety"
Copyright (C) 2015 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)
R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under the terms of the
GNU General Public License versions 2 or 3.
For more information about these matters see
http://www.gnu.org/licenses/.
我们安装的R语言最新版本是3.2.2。
接下来,我们跑一个矩阵的计算,让2个3000行列的方阵相乘。
# 启动R
~ R
# 方阵相乘
> x <- matrix(1:(3000 * 3000), 3000, 3000)
# 计算耗时
> system.time(tmp <- x %*% x)
user system elapsed
33.329 0.332 33.788
接下来,我们安装OpenBlas来提高计算性能。在Ubuntu中安装OpenBlas非常简单,只需要一条命令就可以搞定。
~ sudo apt-get install libopenblas-base
切换blas的计算引擎,使用openblas替换libblas。
~ sudo update-alternatives --config libblas.so.3
There are 2 choices for the alternative libblas.so.3 (providing /usr/lib/libblas.so.3).
Selection Path Priority Status
------------------------------------------------------------
* 0 /usr/lib/openblas-base/libblas.so.3 40 auto mode
1 /usr/lib/libblas/libblas.so.3 10 manual mode
2 /usr/lib/openblas-base/libblas.so.3 40 manual mode
Press enter to keep the current choice[*], or type selection number: 0
选择0,使用openblas-base引擎。
我们重新打开R运行环境,再次执行刚才的矩阵相乘计算。
~ R
> x <- matrix(1:(3000 * 3000), 3000, 3000)
# 计算耗时
> system.time(tmp <- x %*% x)
user system elapsed
7.391 0.127 3.869
神奇的事情发生了,速度提升了4倍多。由于OpenBlas可以对矩阵计算加速,那么我们对所有矩阵操作都做一下测试吧。
3. 让R语言加速
通过互联网我找到了两个用于R语言性能测试的脚本,我们可以在自己的环境中测试一下。Benchmarks脚本的发布页,脚本代码下载。
我发现Revolution Analytics公司也用这个脚本进行了测试,并对比了Revolution企业版和R的官方发行版的区别。
下载脚本
~ wget http://brettklamer.com/assets/files/statistical/faster-blas-in-r/R-benchmark-25.R
--2015-09-24 12:06:05-- http://brettklamer.com/assets/files/statistical/faster-blas-in-r/R-benchmark-25.R
Resolving brettklamer.com (brettklamer.com)... 199.96.156.242
Connecting to brettklamer.com (brettklamer.com)|199.96.156.242|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 13666 (13K)
Saving to: ‘R-benchmark-25.R’
100%[============================================================================================>] 13,666 --.-K/s in 0s
2015-09-24 12:06:06 (203 MB/s) - ‘R-benchmark-25.R’ saved [13666/13666]
执行脚本。
~ R
# 运行脚本
> source("R-benchmark-25.R")
Loading required package: Matrix
Loading required package: SuppDists
R Benchmark 2.5
===============
Number of times each test is run__________________________: 3
I. Matrix calculation
---------------------
Creation, transp., deformation of a 2500x2500 matrix (sec): 1.103
2400x2400 normal distributed random matrix ^1000____ (sec): 0.812333333333333
Sorting of 7,000,000 random values__________________ (sec): 0.962666666666667
2800x2800 cross-product matrix (b = a' * a)_________ (sec): 1.547
Linear regr. over a 3000x3000 matrix (c = a \ b')___ (sec): 0.828000000000001
--------------------------------------------
Trimmed geom. mean (2 extremes eliminated): 0.957989159036612
II. Matrix functions
--------------------
FFT over 2,400,000 random values____________________ (sec): 0.365333333333335
Eigenvalues of a 640x640 random matrix______________ (sec): 1.43466666666667
Determinant of a 2500x2500 random matrix____________ (sec): 0.895999999999998
Cholesky decomposition of a 3000x3000 matrix________ (sec): 0.832000000000003
Inverse of a 1600x1600 random matrix________________ (sec): 0.724333333333334
--------------------------------------------
Trimmed geom. mean (2 extremes eliminated): 0.814310314522547
III. Programmation
------------------
3,500,000 Fibonacci numbers calculation (vector calc)(sec): 0.776666666666661
Creation of a 3000x3000 Hilbert matrix (matrix calc) (sec): 0.269666666666671
Grand common divisors of 400,000 pairs (recursion)__ (sec): 0.570666666666663
Creation of a 500x500 Toeplitz matrix (loops)_______ (sec): 0.506666666666665
Escoufier's method on a 45x45 matrix (mixed)________ (sec): 0.533000000000001
--------------------------------------------
Trimmed geom. mean (2 extremes eliminated): 0.536138937440438
Total time for all 15 tests_________________________ (sec): 12.162
Overall mean (sum of I, II and III trimmed means/3)_ (sec): 0.747841037469598
--- End of test ---
我们再切换到,R语言默认的blas引擎运行一次。
~ sudo update-alternatives --config libblas.so.3
There are 2 choices for the alternative libblas.so.3 (providing /usr/lib/libblas.so.3).
Selection Path Priority Status
------------------------------------------------------------
* 0 /usr/lib/openblas-base/libblas.so.3 40 auto mode
1 /usr/lib/libblas/libblas.so.3 10 manual mode
2 /usr/lib/openblas-base/libblas.so.3 40 manual mode
Press enter to keep the current choice[*], or type selection number: 1
update-alternatives: using /usr/lib/libblas/libblas.so.3 to provide /usr/lib/libblas.so.3 (libblas.so.3) in manual mode
选择1,切换到libblas引擎。重启R语言环境,并执行脚本。
~ R
> source("R-benchmark-25.R")
Loading required package: Matrix
Loading required package: SuppDists
R Benchmark 2.5
===============
Number of times each test is run__________________________: 3
I. Matrix calculation
---------------------
Creation, transp., deformation of a 2500x2500 matrix (sec): 1.09366666666667
2400x2400 normal distributed random matrix ^1000____ (sec): 0.817333333333333
Sorting of 7,000,000 random values__________________ (sec): 0.954333333333333
2800x2800 cross-product matrix (b = a' * a)_________ (sec): 15.3033333333333
Linear regr. over a 3000x3000 matrix (c = a \ b')___ (sec): 7.155
--------------------------------------------
Trimmed geom. mean (2 extremes eliminated): 1.95463154033118
II. Matrix functions
--------------------
FFT over 2,400,000 random values____________________ (sec): 0.363666666666669
Eigenvalues of a 640x640 random matrix______________ (sec): 1.131
Determinant of a 2500x2500 random matrix____________ (sec): 5.061
Cholesky decomposition of a 3000x3000 matrix________ (sec): 5.634
Inverse of a 1600x1600 random matrix________________ (sec): 4.142
--------------------------------------------
Trimmed geom. mean (2 extremes eliminated): 2.87278425762591
III. Programmation
------------------
3,500,000 Fibonacci numbers calculation (vector calc)(sec): 0.775000000000006
Creation of a 3000x3000 Hilbert matrix (matrix calc) (sec): 0.259666666666665
Grand common divisors of 400,000 pairs (recursion)__ (sec): 0.633333333333345
Creation of a 500x500 Toeplitz matrix (loops)_______ (sec): 0.533666666666666
Escoufier's method on a 45x45 matrix (mixed)________ (sec): 0.647999999999996
--------------------------------------------
Trimmed geom. mean (2 extremes eliminated): 0.602780428790226
Total time for all 15 tests_________________________ (sec): 44.505
Overall mean (sum of I, II and III trimmed means/3)_ (sec): 1.5014435867612
--- End of test ---
从运行结果看到,用OpenBlas库在做矩阵计算时,性能优势是非常明显的。完成15个测试,OpenBlas库用时12秒,而默认的Blas库用时44秒。仅仅是切换一个底层算法库的成本,就可以让计算性能得到非常大的提升,各位R的小伙伴赶紧用起来吧。