librsb : A shared memory parallel sparse matrix computations library for the Recursive Sparse Blocks format


Introduction

librsb is a library for sparse matrix computations featuring the Recursive Sparse Blocks (RSB) matrix format. This format allows cache efficient and multi-threaded (that is, shared memory parallel) operations on large sparse matrices. The most common operations necessary to iterative solvers are available, e.g.: matrix-vector multiplication, triangular solution, rows/columns scaling, diagonal extraction / setting, blocks extraction, norm computation, formats conversion. The RSB format is especially well suited for symmetric and transposed multiplication variants. Most numerical kernels code is auto generated, and the supported numerical types can be chosen by the user at build time.
librsb implements the Sparse BLAS standard, as specified in the [BLAS Forum] documents.

You might be interested in:

The Recursive Sparse Blocks format

The RSB format is hierarchical. Its basic idea is a recursive subdivision of a matrix in the two dimensions. The subdivision algorithm attempts to partition the matrix until the individual submatrices occupy approximately the amount of memory contained in the CPU caches. The number of cores/thread available also plays a role.
The following picture shows the symmetric matrix audikw_1 (943695 rows, 39297771 nonzeroes, obtained from the University of Florida Sparse Matrix Collection ) partitioned for different numbers of executing threads (respectively 1,2,4 and 8). Because of the symmetry, the upper triangle representation has been omitted.

Example of matrix partitioning for 1 thread.

Example of matrix partitioning for 2 threads.

Example of matrix partitioning for 4 threads.

Example of matrix partitioning for 8 threads.

The memory layout of the individual sparse submatrices follows a Z (or Morton) ordering. Each submatrix is stored either in coordinate or CSR (Compressed Sparse Rows) format, and the numerical index type size is chosen in order to minimize memory consumption.
This layout enables efficient and multithreaded algorithms implementing the Sparse BLAS operations.
The algorithms and the performance of RSB in practice are described in a series of papers; give a look at the papers or the performance sections.

Performance results

See either autotuning results for 1.1 version or absolute (older) 1.0 version results (on the same machine).

Results for librsb-1.1

Results obtained on the Intel Sandy Bridge (2 x Intel Xeon E5-2680); autotuned versus untuned results, and versus the Intel Math Kernel Library (MKL)'s 11.0-5 Compressed Sparse Rows (CSR) implementation, using the best picked threads and structure with use of the new autotuning feature. These results are computed on large matrices publicly available on the University of Florida Sparse Matrix Collection.
The points indicate speedup (old time to new time ratio) for the different matrix-vector multiplication variants for different matrices and numerical types.
These results suggest that librsb is competitive with the Intel MKL library implementation (mkl_dcsrmv, mkl_scsrmv, mkl_zcsrmv, mkl_ccsrmv) of the CSR, especially on the symmetric and transposed matrix-vector multiplication variants.

Autotuning ("Best threads") Results for librsb-1.1, comparison to Intel MKL, all BLAS types

The (very cheap --- it costs the time of a few multiplications) best threads autotuning strategy has been applied to MKL and RSB; here, ratio of RSB speed to MKL's.

Performance Comparison Plot

Autotuning Results for librsb-1.1, unsymmetric, all BLAS types

Untransposed (N) and transposed (T) multiplication results for both threads autotuning and structure autotuning. Please note that the experiment with matrix "arabic" with type Z did not fit in the memory so the corresponding result is missing.

Performance Comparison Plot

Autotuning Results for librsb-1.1, symmetric, all BLAS types

Speedup to base RSB after either threads and structure autotuning. Structure autotuning by definition yields better results, hence no legend is necessary in the following.

Performance Comparison Plot

Results for librsb-1.0

Results obtained on the Intel Sandy Bridge (2 x Intel Xeon E5-2680), versus Intel Math Kernel Library (MKL)'s 10.3-7 Compressed Sparse Rows (CSR) implementation, using 16 threads. These results are computed on large matrices publicly available on the University of Florida Sparse Matrix Collection.
The bars in the graphs are proportional to the performance of the different matrix-vector multiplication variants for different matrices.
The "Flops/sec" measure unit we use is defined as the ratio of the nonzeroes count of each matrix by a single operation time.
Please note how different matrices exhibit different performance because of their nonzeroes pattern (structure) impacting on the multiplication algorithm performance.
These results suggest that librsb is competitive with the Intel MKL library implementation (mkl_dcsrmv) of the CSR, especially on the symmetric and transposed matrix-vector multiplication variants.

Symmetric Sparse Matrix-Vector Multiply

Performance Comparison Plot

Unsymmetric Transposed Sparse Matrix-Vector Multiply

Performance Comparison Plot

Unsymmetric Untransposed Sparse Matrix-Vector Multiply

Performance Comparison Plot

Download

User Documentation

Reference Documentation and README

The complete documentation in HTML format for the latest released version is available here.
The same documentation is included in the sources archive and provides several examples for starting using librsb.

Papers / Presentations


Build Instructions (tested on GNU/Linux)

Building the librsb library, C headers, documentation

There are many options to build a customized library (see documentation).
However, building with reasonable defaults can be as simple as:
# unpack the sources archive
./configure --help # there are many options available
./configure --enable-fortran-module-install --enable-matrix-types="double, double complex" \
	CC=gcc FC=gfortran CFLAGS=-O3 FCFLAGS=-O3 --prefix=$HOME/local
# If using Intel's compiler 'icc', use 'LD=xild'; e.g.: configure LD=xild CC=icc FC=ifort ... 
# With GCC, you may get faster code with "-march=native -mfpmath=sse -msse2"...
#
# Don't ignore configure's output diagnostics.
#
make cleanall      # cleans up generated code
make               # builds the library and examples
make qtests        # quick consistency test
make tests         # long consistency test
make install       # install to $HOME/local
Please note that if you intend to link librsb to shared libraries (e.g.: in case of sparsersb's sparsersb.oct) then in the above you should specify flags to build position indipendent code, e.g.: CFLAGS="-O3 -fPIC" FCFLAGS="-O3 -fPIC" .
If you want to have a debug build with error verbosity activated, you may configure with something like:
 ./configure --enable-allocator-wrapper --enable-interface-error-verbosity=2 \
 --enable-internals-error-verbosity=1 CFLAGS="-O0 -ggdb" FCFLAGS="-O0 -ggdb" #... 

Building the sparsersb plugin for GNU Octave

Given a working GNU Octave installation (complete with development programs), one may proceed building the sparsersb plugin:
# be sure you compiled librsb with the position independent code flag (-fPIC)
# be sure to have a fairly modern Octave version (e.g.: 3.6.3)
# 'export' the code from octave-forge (or 'checkout'):
svn export http://svn.code.sf.net/p/octave/code/trunk/octave-forge/main/sparsersb/ sparsersb
#
# Alternative 1: with pkg
#
# make an archive:
tar cvzf sparsersb-1.0.0.tar.gz  sparsersb
# invoke octave and build with installed librsb path and compile options:
CXXFLAGS=-O3 \
RSBINCDIR=`$HOME/local/bin/librsb-config --prefix`/include \
RSBLIBDIR=`$HOME/local/bin/librsb-config --libdir` \
 octave
# inside the octave prompt:
pkg install -verbose "sparsersb-1.0.0.tar.gz"
# this is it!
pkg load sparsersb
help pkg
help sparsersb
#
# Alternative 2: without pkg
#
cd sparsersb/src   # step in the build directory
sh autogen.sh      # builds the configure (you need automake, autoconf, ... for this)
./configure CXXFLAGS=-O3 LIBRSB_CONFIG=$HOME/local/bin/librsb-config
make               # builds sparsersb.oct
make tests         # consistency test
octave             # the "sparsersb" keyword is now available in GNU Octave

Usage Examples

The sparsersb plugin for GNU Octave

The sparsersb plugin for GNU Octave allows you to manipulate sparse matrices using librsb without the need of programming in C or Fortran.
You can reuse any of your GNU Octave programs operating on sparse matrices, and instead of creating matrices using the "sparse" keyword, use "sparsersb".
Example:
octave:1> R=(rand(3)>.6)
R =

   0   0   0
   0   0   0
   1   0   1

octave:2> A_octave=sparse(R)
A_octave =

Compressed Column Sparse (rows = 3, cols = 3, nnz = 2 [22%])

  (3, 1) ->  1
  (3, 3) ->  1

octave:3> A_librsb=sparsersb(R)
A_librsb =

Recursive Sparse Blocks  (rows = 3, cols = 3, nnz = 2 [22%])

  (3, 1) -> 1
  (3, 3) -> 1

octave:4> help sparsersb
...
Matrices created in this way support nearly all the operations supported by native GNU Octave ones (see the GNU Octave's Sparse Matrices documentation.
On some operations, conversion to Octave's formats may occur.
See the build instructions section on how to build sparsersb.

A simple C program using the Sparse BLAS interface

Let us consider a simple program creating a sparse matrix and multiplying it by a vector.
Create a source code program file (say, myrsb.c) with a text editor:
#include <rsb.h>         /* for rsb_lib_init */
#include <blas_sparse.h> /* Sparse BLAS on the top of librsb */
int main(const int argc, char * const argv[])
{
	blas_sparse_matrix matrix = blas_invalid_handle; /* handle for A */
	const int nnzA=4;	/* number of nonzeroes of matrix A */
	const int  nrA=3,ncA=3;	/* number of A's rows and columns */
	const int  incX=1,incB=1; /* spacing of X, B entries */
	int   IA[]={ 0, 1, 2, 2 }; /* row    indices*/
	int   JA[]={ 0, 1, 0, 2 }; /* column indices*/
	double VA[]={ 11.0, 22.0, 13.0, 33.0  }; /* coefficients */
	double X[]={  0.0,  0.0,  0.0 };
	double B[]={ -1.0, -2.0, -2.0 };
	rsb_err_t errval = RSB_ERR_NO_ERROR;
	if((errval = rsb_lib_init(RSB_NULL_INIT_OPTIONS)) != RSB_ERR_NO_ERROR) return -1;
	matrix = BLAS_duscr_begin(nrA,ncA);                             /* begin matrix creation */
	if(matrix==blas_invalid_handle) return -1;                    /* check */
	if( BLAS_ussp(matrix,blas_lower_symmetric) != 0 ) return -1;  /* set symmetry property */
	if( BLAS_duscr_insert_entries(matrix, nnzA, VA, IA, JA) != 0 ) return -1; /* insert some nonzeroes */
	if( BLAS_duscr_end(matrix) == blas_invalid_handle ) return -1; /* end creation */
	if( BLAS_dusmv(blas_no_trans,-1,matrix,B,incB,X,incX)) return -1;    /* X := X + (-1) * A * B  */
	if( BLAS_usds(matrix)) return -1;                              /* destroy matrix */
	if((errval = rsb_lib_exit(RSB_NULL_EXIT_OPTIONS)) != RSB_ERR_NO_ERROR) return -1;
	return 0;
}
then compile, build, execute:
export PATH=$PATH:$HOME/local/librsb/bin/
gcc `librsb-config --I_opts`  -c myrsb.c 
gcc -o myrsb myrsb.o `librsb-config --static --ldflags --extra_libs`
# you may have to add OpenMP flags here above (e.g.: -openmp or -fopenmp)
./myrsb
See the reference documentation for more complete examples.

A simple Fortran program using the Sparse BLAS interface

The same program (say, myrsb.F90) in Fortran:
      PROGRAM MAIN
      USE rsb             ! Fortran interface to C's rsb.h
      USE blas_sparse     ! Fortran interface librsb's Sparse BLAS
      IMPLICIT NONE
      INTEGER :: istat = 0             ! error status variable
      INTEGER :: A                     ! BLAS sparse matrix descriptor
      INTEGER :: transt=blas_no_trans  ! no transposition
      INTEGER :: incX=1, incB=1        ! X, B vectors increment
      REAL(KIND=8) :: alpha=3
      INTEGER :: nnzA=4, nrA=3, ncA=3  ! nonzeroes, rows, columns of matrix A
      INTEGER :: IA(4)=(/1, 2, 3, 3/)  ! row    indices
      INTEGER :: JA(4)=(/1, 2, 1, 3/)  ! column indices
      REAL(KIND=8) :: VA(4)=(/11.0, 22.0, 13.0, 33.0/) ! coefficients
      REAL(KIND=8) :: X(3)=(/   0,    0,    0/)
      REAL(KIND=8) :: B(3)=(/-1.0, -2.0, -2.0/)
      TYPE(C_PTR),PARAMETER :: EO = C_NULL_PTR ! 
      istat = rsb_lib_init(EO)                 ! librsb initialization
      IF(istat.NE.0)STOP                       ! check
      ! the 'd' in 'duscr_begin' specifies 'double precision' type
      CALL duscr_begin(nrA,ncA,A,istat)        ! begin matrix creation
      ! in the following calls the type is implicit due to the arguments
      CALL ussp(A,blas_lower_symmetric,istat)  ! set symmetry property
      CALL uscr_insert_entries(A,nnzA,VA,IA,JA,istat) ! insert some nonzeroes
      CALL uscr_end(A,istat)                   ! end matrix creation
      CALL usmv(transt,alpha,A,B,incB,X,incX,istat) ! X := X + (3) * A * B 
      ! please note the 'usds' is type independent
      CALL usds(A,istat)                       ! destroy matrix
      istat = rsb_lib_exit(EO)                 ! librsb finalization
      END PROGRAM MAIN
Note how default indexing in Fortran is 1-based.
To compile, build, execute:
export PATH=$PATH:$HOME/local/librsb/bin/
gfortran `librsb-config --I_opts`  -c myrsb.F90 
gfortran -o myrsb myrsb.o `librsb-config --static --ldflags --extra_libs`
# you may have to add OpenMP flags here above (e.g.: -openmp or -fopenmp)
./myrsb
The same program can be rewritten using the native RSB interface via the ISO C Binding feature:
      PROGRAM MAIN
      USE rsb
      USE ISO_C_BINDING
      IMPLICIT NONE
      INTEGER,TARGET :: errval
      INTEGER :: transt = RSB_TRANSPOSITION_N  ! no transposition
      INTEGER :: incX = 1, incB = 1        ! X, B vectors increment
      REAL(KIND=8),TARGET :: alpha = 3, beta = 1
      INTEGER :: nnzA = 4, nrA = 3, ncA = 3     ! nonzeroes, rows, columns of matrix A
      INTEGER,TARGET :: IA(4) = (/1, 2, 3, 3/)  ! row    indices
      INTEGER,TARGET :: JA(4) = (/1, 2, 1, 3/)  ! column indices
      INTEGER :: typecode = RSB_NUMERICAL_TYPE_DOUBLE
      INTEGER :: flags = RSB_FLAG_DEFAULT_MATRIX_FLAGS+RSB_FLAG_SYMMETRIC
      REAL(KIND=8),TARGET :: VA(4) = (/11.0, 22.0, 13.0, 33.0/)     ! coefficients
      REAL(KIND=8),TARGET :: X(3) = (/   0,    0,    0/)
      REAL(KIND=8),TARGET :: B(3) = (/-1.0, -2.0, -2.0/)
      TYPE(C_PTR),TARGET  :: mtxAp=C_NULL_PTR
      TYPE(C_PTR)  :: mtxApp = C_NULL_PTR
      errval = rsb_lib_init(C_NULL_PTR)                             ! librsb initialization
      IF(errval.NE.0) STOP "error calling rsb_lib_init"
      mtxAp = rsb_mtx_alloc_from_coo_begin(nnzA,typecode,nrA,ncA,flags,C_LOC(errval)) ! begin matrix creation
      errval = rsb_mtx_set_vals(mtxAp,C_LOC(VA),IA,JA,nnzA,flags) ! insert some nonzeroes
      mtxApp = C_LOC(mtxAp)                                         ! create a matrix pointer pointer
      IF(errval.NE.0) STOP "error calling rsb_mtx_set_vals"
      errval = rsb_mtx_alloc_from_coo_end(mtxApp)                   ! end matrix creation
      IF(errval.NE.0) STOP "error calling rsb_mtx_alloc_from_coo_end"
      errval = rsb_spmv(transt,C_LOC(alpha),mtxAp,C_LOC(X),incX,C_LOC(beta),C_LOC(B),incB) ! X:=X+(3)*A*B 
      IF(errval.NE.0) STOP "error calling rsb_spmv"
      mtxAp = rsb_mtx_free(mtxAp)                                   ! destroy matrix
      IF(errval.NE.0) STOP "error calling rsb_mtx_free"
      errval = rsb_lib_exit(RSB_NULL_EXIT_OPTIONS)                  ! librsb finalization
      IF(errval.NE.0) STOP "error calling rsb_lib_exit"
      PRINT *, "rsb module Fortran test is ok"
      END PROGRAM MAIN
And linked the usual way. Just make sure you enabled the Fortran headers installation at configure time.
The advantage of using this interface is that it matches exactly the C version; this applies to the documentation as well.

Change log

A summary of user visible changes is here,

librsb's license

librsb is copyright (C) 2008-2014 Michele Martone and is licensed under the Lesser GNU Public License version 3 (LGPLv3) or later.
You find a FAQ about the GPL here, while here some discussion specific to the GPLv3 .

Mailing list

Mailing list is at: http://sourceforge.net/p/librsb/mailman/.

Contact librsb's author


This webpage participates to the Viewable With Any Browser Campaign. Valid HTML 4.01 Transitional GNU Public License Frequently Asked Questions