librsb  1.2.0.11
librsb Documentation

A sparse matrix library implementing the `Recursive Sparse Blocks' (RSB) matrix storage.

This is the documentation for the application programming interface (API) of the `librsb' library.
In order to use librsb, there is no need for the user to know the RSB layout and algorithms: this documentation should be sufficient.
This library is dual-interfaced; it supports: a native (`RSB') interface (with identifiers prefixed by `rsb_' or `RSB_'), and a (mostly complete) Sparse BLAS interface, as a wrapper around the RSB interface.
Many computationally intensive operations are implemented with thread parallelism, by using OpenMP.
Thread parallelism can be turned off at configure time, if desired, or limited at execution time.
Much of the source code of the computational kernels (mostly internals) was automatically generated.
This user documentation covers only the end-user API; it covers neither the internals nor the code generator.
You should consult the remaining documentation (e.g. the README file, code comments) to find information about how to modify the generator or the library internals.

This library is research software and, as such, still experimental. To get started, we suggest going through the Example programs and code documentation section, or the quick start examples section on this page.


Information about the supported matrix types and matrix operations resides in the rsb_types.h file.

A C/C++ user can use the native API of RSB by including the rsb.h header. The same interface is available in Fortran via the ISO C Binding interface, specified in rsb.F90.
The Sparse BLAS interface to librsb is declared in the blas_sparse.h C header; its Fortran counterpart is in rsb_blas_sparse.F90.

Author
Michele Martone < michelemartone AT users DOT sourceforge DOT net >

Contents of the README file :

================================================================================

 librsb README file 
 
================================================================================
	librsb - Recursive Sparse Blocks  Matrix computations library

 A library for sparse matrix computations featuring the Recursive Sparse Blocks
 (RSB) matrix format. This format allows cache efficient and multi-threaded
 (that is, shared memory parallel) operations on large sparse matrices.
 It provides the most common operations needed by iterative solvers, like
 matrix-vector multiplication, triangular solution, rows/columns scaling,
 diagonal extraction / setting, blocks extraction, norm computation, and format
 conversion.  The RSB format is especially well suited for symmetric and
 transposed multiplication variants.
 Most of the numerical kernels' code is auto-generated, and the supported
 numerical types can be chosen by the user at build time.
 This library is dual-interfaced: it can be used via the native (`RSB')
 interface (with identifiers prefixed by `rsb_' or `RSB_'), or via a Sparse
 BLAS one (`BLAS_').
 The `RSB' interface can be used from C (rsb.h header) or via modern Fortran
 ISO-C-BINDING ("rsb" module in rsb.F90).
 The Sparse BLAS interface is usable from C via the blas_sparse.h header, and
 from Fortran via the "blas_sparse" module.

================================================================================

 This (README) is the first document you should read about librsb.
 It contains basic instructions to generate, compile, install, and use librsb.
 The reference documentation for programming with librsb is contained in the
 ./doc/ source package subdirectory and when installed, placed in the
 appropriate system directories as both Unix man pages (./doc/man/) and HTML
 (./doc/html/).
 If you are a user of a previous version of librsb, see the NEWS file listing
 the changes.
 After having read this file, you are welcome to ask the author questions.

--------------------------------------------------------------------------------
		INTRODUCTION
--------------------------------------------------------------------------------

 librsb is a library for sparse matrix algebra computations.
 It is stand-alone: it does not require any other library to build or work.
 It is shared memory parallel, using the OpenMP standard.
 It focuses on high performance and provides configure-time build options.
 A part of the library code is automatically generated from templates and
 macros, on the basis of the numerical types a user wishes to have supported.
 The configure script options (self-documented via ./configure --help, and
 not documented here) provide many build-time options, especially with
 respect to debugging and additional verbosity.

   		INTRODUCTION
   		MAIN ASPECTS,FEATURES
   		QUICK INSTALL AND TESTING
   		LIBRARY CONFIGURATION, GENERATION, BUILD 
   		INSTALLATION, USAGE
   		EXECUTION AND ENVIRONMENT VARIABLES
   		DOCUMENTATION, EXAMPLES AND PROGRAMMING GUIDELINES
   		CONFIGURE, BUILD AND BENCHMARK EXAMPLE
   		COMPATIBILITY
   		FAQ
   		POSSIBLE / POTENTIAL FUTURE FEATURES / ENHANCEMENTS
   		ABOUT THE INTERNALS
   		BUGS
   		CONTACTS
   		CREDITS
   		LICENSE

--------------------------------------------------------------------------------
		MAIN ASPECTS,FEATURES
--------------------------------------------------------------------------------

 * very efficient (see the website for benchmark performance results)
 * threads/structure autotuning feature for additional performance
 * support for multiple numerical data types which can be turned
   on/off individually (e.g. double, float, int, char, complex, double complex)
   at configure time
 * a Sparse BLAS interface for matrix assembly, computation, destruction
 * a code generator for its inner CSR, COO computational kernels
 * based on a recursive memory layout of submatrices
 * enough functionality to implement the most common iterative methods 
 * basic input sanitizing (index types overflow checks, etc)
 * parallel matrix assembly and conversion routines
 * auxiliary functions for matrix I/O (using the "Matrix Market" format:
   real, integer, complex and pattern are supported)
 * designed as a building block for solvers such as PSBLAS
 * dual implementation of kernels: with "full word" and "half word" indices
 * thread level (shared memory) parallelism by using OpenMP
 * basic (unoptimized) sparse matrices multiplication and summation
 * interactive usage possible by using the "sparsersb" plugin for GNU Octave 
 * complete with examples and a test suite
 * see the NEWS text file for a list of changes in each release
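 The "half word" indices mentioned above are the subject of the 16-bit
 indices paper listed in the FAQ: a leaf submatrix spanning at most 65536 rows
 and columns can store its nonzero coordinates as 16-bit offsets relative to
 the leaf's top-left corner. The following is a minimal sketch of that idea in
 plain C, assuming a coordinate (COO) leaf; the struct and function names are
 hypothetical and are not part of the librsb API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical COO leaf submatrix with "half word" (16-bit) indices:
 * coordinates are offsets from the leaf's top-left corner (roff, coff),
 * which is possible when the leaf spans at most 65536 rows and columns. */
struct halfword_coo_leaf {
    size_t roff, coff;     /* global row/column of the leaf's corner */
    size_t nnz;            /* number of nonzeroes in the leaf */
    const uint16_t *ri;    /* local row offsets */
    const uint16_t *ci;    /* local column offsets */
    const double *va;      /* nonzero values */
};

/* y += A_leaf * x, with x and y indexed by global row/column numbers. */
static void leaf_spmv_add(const struct halfword_coo_leaf *leaf,
                          const double *x, double *y)
{
    assert(leaf != NULL && x != NULL && y != NULL);
    const double *xl = x + leaf->coff;  /* shift base pointers once ... */
    double *yl = y + leaf->roff;        /* ... then use the 16-bit offsets */
    for (size_t k = 0; k < leaf->nnz; ++k)
        yl[leaf->ri[k]] += leaf->va[k] * xl[leaf->ci[k]];
}
```

 Compared to full-word indices, the 16-bit offsets reduce the memory occupied
 by (and the memory traffic for) the coordinates of each nonzero.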

--------------------------------------------------------------------------------
		QUICK INSTALL AND TESTING EXAMPLE
--------------------------------------------------------------------------------
	
	# unpack the archives or get them from the repositories
	./autogen.sh	# only necessary if the configure script does not exist
	./configure --prefix=$HOME/local/librsb/
        # see also ./configure --help for many other options
	# librsb has been configured
	make help	# provide information
	make		# build the library and test programs
	# librsb has been built
        make  qtests	# perform brief sanity tests
        make qqtests	# the same, but with less output
        make  tests	# perform extended sanity tests
	ls examples/*.c   # editable examples; build them with 'make'
	ls examples/*.F90 # editable examples; build them with 'make'
	make install	# install to $HOME/local/librsb/
	# librsb has been installed and can be used

	# for instance, try using one of the librsb examples as a model: 
	mkdir -p ~/rsb-test/ && cp examples/hello.c ~/rsb-test/myrsb.c
	# adapt myrsb.c (your copy of hello.c) to your needs and compile it:
	cd ~/rsb-test/
	export PATH=$PATH:$HOME/local/librsb/bin/
	gcc `librsb-config --I_opts` -c myrsb.c
	gcc -o myrsb myrsb.o `librsb-config --static --ldflags --extra_libs`
	./myrsb         # run your program

--------------------------------------------------------------------------------
 		LIBRARY CONFIGURATION, GENERATION, BUILD 
--------------------------------------------------------------------------------

 This library consists of C code (C 99), partially generated by M4 macros.
 The user wishing to build librsb can specify different initial parameters 
 determining the supported matrix operations, inner explicit loop unrolling
 factors, available numerical data types and code variations.
 These parameters have to be specified to the  ./configure  script.

 The M4 macros are used at build time to generate specialized C code.
 If building from repository sources, an M4 preprocessor is required.
 Otherwise, it is necessary only when specifying ./configure  options affecting
 code generation (see ./configure --help).
 The M4 preprocessor executable can be specified explicitly to ./configure
 with the M4 environment variable or via the --with-m4 option.
 After invoking ./configure  and before running 'make' it is possible to invoke
 'make cleanall' to make sure that auto-generated code is deleted first.
 
 At configure time, it is very important that the configure script is able to
 detect the system cache memory hierarchy parameters.
 If it fails, you are encouraged to specify cache parameters by
 re-running ./configure  with the --with-memhinfo  option.
 For instance:
    --with-memhinfo=L2:4/64/512K,L1:8/64/24K 
 These values need not be exact: approximate ones are acceptable.
 Yet they may be critical to library performance; for this reason you are
 allowed to override this default in a variety of ways.
 See the EXECUTION AND ENVIRONMENT VARIABLES section below for a description
 of the memory hierarchy info string format.

 If you want to build Fortran examples, be sure to invoke ./configure with the
 --enable-fortran-examples option.  You can specify the desired Fortran compiler
 and compilation flags via the FC and FCFLAGS variables.

 Set the CPPFLAGS variable at configure time to provide additional preprocessor
 flags; e.g. to let configure detect necessary headers in non-standard locations.
 Similarly, the LDFLAGS variable can be set to contain link time options, so
 you can use it to specify libraries to be linked to the librsb examples.
 Invoke ./configure --help  for details of other relevant environment variables.
 
 After ./configure  you will see information about the current build options
 and if satisfied, invoke 'make' to build the library and the examples.

 To check the library for consistency, run:

   make qtests # takes a short time, spots most problems
or
   make tests  # takes longer
 
 If these tests terminate with an error code, the cause may be a bug in
 librsb, so please tell us (see BUGS).

--------------------------------------------------------------------------------
		INSTALLATION, USAGE
--------------------------------------------------------------------------------
 
 Once built, the library can be installed with:

	sudo make install	#'make install' installs the library system-wide

 This installs header files, binary library files, and the librsb-config
 program.
 Then, application C programs should include the rsb.h header file with
	#include <rsb.h>
 and be compiled using include options as generated by the output of 
  	`librsb-config --I_opts`.

 To link to the librsb.a static library file and its dependencies one can use 
 the output of `librsb-config --static --ldflags --extra_libs`.
 
 If you wish to use the library without installing it in the system directories,
 make sure to include the <rsb.h> header file and link to the librsb.a library
 and all the necessary additional libraries.  

 Users of pkg-config can manually copy the librsb.pc file to the appropriate
 directory to use pkg-config in a way similar to librsb-config.

--------------------------------------------------------------------------------
		EXECUTION AND ENVIRONMENT VARIABLES
--------------------------------------------------------------------------------
 
 By default, the only environment variable read by librsb is
 RSB_USER_SET_MEM_HIERARCHY_INFO; when set, it overrides configure-time and
 auto-detected settings about the memory hierarchy.

 Its value is specified as n comma-separated (",") strings of the form:
	 L<l>:<a_l>/<b_l>/<c_l>
 where:
   n     is the cache memories hierarchy height, from 1 upwards;
   <l>   is the cache level, from 1 upwards;
   <a_l> is the cache associativity;
   <b_l> is the cache block size (cache line length);
   <c_l> is the cache capacity (size).

 The <a_l>, <b_l>, <c_l> substrings consist of an integer number with an
 optional multiplier character among {K,M,G} (to specify respectively 2^10,
 2^20 or 2^30).
 Any value is permitted, as long as it is positive. Higher-level cache
 capacities are required to be larger than lower-level ones.
 Example strings and usage in the BASH shell:
  RSB_USER_SET_MEM_HIERARCHY_INFO="L2:4/64/512K,L1:8/64/32K"  <your program>
  RSB_USER_SET_MEM_HIERARCHY_INFO="L1:8/128/2M"  <your program>
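 To make the format above concrete, here is a minimal sketch of a parser for
 such strings in plain C. It is hypothetical and not librsb's own parser; it
 assumes that every level from 1 to n must be present, and it rejects strings
 where a higher level's capacity is not larger than the level below it.

```c
#include <assert.h>
#include <stdlib.h>

/* One cache level, as given by L<l>:<a_l>/<b_l>/<c_l>. */
struct cache_level { int level; long long assoc, line, size; };

/* Parses a positive integer with an optional K, M or G multiplier
 * (2^10, 2^20, 2^30), advancing *sp past it. Returns -1 on error.
 * Overflow is not checked in this sketch. */
static long long parse_scaled(const char **sp)
{
    char *end;
    long long v = strtoll(*sp, &end, 10);
    if (end == *sp || v <= 0)
        return -1;
    switch (*end) {
    case 'K': v <<= 10; ++end; break;
    case 'M': v <<= 20; ++end; break;
    case 'G': v <<= 30; ++end; break;
    default: break;
    }
    *sp = end;
    return v;
}

/* Hypothetical parser: fills out[l-1] for each "L<l>:..." entry of s and
 * returns the hierarchy height n, or -1 if s is malformed, a level is
 * missing, or a higher level's capacity is not larger than the lower one's. */
static int parse_memhinfo(const char *s, struct cache_level *out, int maxlev)
{
    int n = 0;
    assert(s != NULL && out != NULL && maxlev > 0);
    for (int i = 0; i < maxlev; ++i)
        out[i].level = 0;                      /* 0 marks "not given" */
    while (*s) {
        char *end;
        if (*s++ != 'L')
            return -1;
        long l = strtol(s, &end, 10);
        if (end == s || l < 1 || l > maxlev || *end != ':')
            return -1;
        s = end + 1;
        struct cache_level c = { (int)l, 0, 0, 0 };
        if ((c.assoc = parse_scaled(&s)) < 0 || *s++ != '/')
            return -1;
        if ((c.line = parse_scaled(&s)) < 0 || *s++ != '/')
            return -1;
        if ((c.size = parse_scaled(&s)) < 0)
            return -1;
        out[l - 1] = c;
        if (l > n)
            n = (int)l;
        if (*s == ',')
            ++s;                               /* next "L<l>:..." entry */
        else if (*s != '\0')
            return -1;
    }
    for (int i = 0; i < n; ++i)
        if (out[i].level == 0 || (i > 0 && out[i].size <= out[i - 1].size))
            return -1;
    return n;
}
```

 A program could apply such a parser to the result of
 getenv("RSB_USER_SET_MEM_HIERARCHY_INFO") to validate a value before use.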

 You may explicitly set this environment variable to fine-tune the library
 operation.
 If it is not set, runtime detection will be attempted; if that fails,
 the value detected at configure time will be used.
 In some cases the configure time detection fails (e.g.: on very recent
 systems); this is not a fault of librsb but rather of the underlying
 environment.

 A default value for this memory hierarchy info string can be set at configure
 time by using the  --with-memhinfo  configure option.

 If you don't know values for these parameters, you can run the
  ./scripts/linux-sys-cache.sh 
 script to get an estimate on a Linux system.
 On other systems, please consult the available documentation.
 E.g.: On Mac OS 10.6 it is possible to get this information by invoking
  "sysctl -a | grep cache".
  
 The librsb library achieves parallelism by using OpenMP.
 Even though librsb does not directly read any OpenMP environment variable,
 it is still affected by them (e.g. the OMP_NUM_THREADS environment variable
 specifying the number of parallel threads).
 Please consult your compiler's OpenMP implementation documentation
 for more information.
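 As an illustration of how the OpenMP runtime (rather than librsb) interprets
 OMP_NUM_THREADS, the following self-contained sketch queries the default
 number of threads with the standard omp_get_max_threads() function. It is not
 part of librsb; when compiled without OpenMP support (e.g. without gcc's
 -fopenmp), the _OPENMP macro is undefined and the sketch reports 1 thread.

```c
#include <assert.h>
#ifdef _OPENMP
#include <omp.h>
#endif

/* Returns the number of threads an OpenMP parallel region would use by
 * default; the OpenMP runtime derives it from e.g. OMP_NUM_THREADS. */
static int default_thread_count(void)
{
#ifdef _OPENMP
    int n = omp_get_max_threads();
    assert(n >= 1);
    return n;
#else
    return 1;   /* compiled without OpenMP: serial execution */
#endif
}
```

 Built with OpenMP support and run as OMP_NUM_THREADS=4 ./a.out, a program
 calling this function would get 4; this is the same variable that, as noted
 above, affects librsb's parallel execution.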

--------------------------------------------------------------------------------
		DOCUMENTATION, EXAMPLES AND PROGRAMMING GUIDELINES
--------------------------------------------------------------------------------

 The API is entirely specified in the <rsb.h> header file. This is the only
 header file the application developer should ever include to use the library.
 
 The complete API documentation is generated by the doxygen tool in the doc
 directory in both HTML and man formats, and gets installed with 'make install'.
 If you do not wish to use doxygen (or don't have it), you can skip
 documentation generation by adding the "DOXYGEN=false" argument to ./configure .

 There are a number of working example programs in the "examples" directory.

 The library only declares symbols prefixed by `rsb_'.
 These symbols include those declared in rsb.h, as well as internal,
 undocumented service functions and variables.
 Therefore, to avoid name clashes, you should avoid declaring `rsb_' prefixed
 identifiers in programs using librsb.  

 If configure has been invoked with the --enable-sparse-blas-interface option,
 the corresponding `BLAS_' and `blas_' prefixed symbols will also be built.

 If after building the library, you find that it exports symbols with different
 prefixes (besides the system specific, compiler-generated symbols), please 
 report this to us -- it is a bug.

--------------------------------------------------------------------------------
	CONFIGURE, BUILD AND BENCHMARK EXAMPLE
--------------------------------------------------------------------------------

 First configure and build with reasonable options, such as (gcc, 64 bit):

  export MKLROOT=/opt/intel/mkl
  ./configure --disable-debug CC=gcc FC=gfortran CFLAGS='-Ofast' \
    --with-mkl="-static -L${MKLROOT}/lib/intel64 \
    -Wl,--start-group,-lmkl_intel_lp64,-lmkl_gnu_thread,-lmkl_core,--end-group \
    -fopenmp -lpthread"                        \
    --with-memhinfo=L2:4/64/512K,L1:8/64/24K   \
    --with-mkl-include=/opt/intel/mkl/include/ \
    --prefix=/opt/librsb-optimized/            \
    --enable-matrix-types="double,double complex"

 Or (icc, 64 bit):

  export MKLROOT=/opt/intel/mkl
 ./configure --disable-debug CC=icc FC=ifort CFLAGS='-Ofast' \
 --with-mkl="-static -L${MKLROOT}/lib/intel64 -openmp -lpthread \
 -Wl,--start-group,-lmkl_intel_lp64,-lmkl_intel_thread,-lmkl_core,--end-group" \
 --with-memhinfo=L2:4/64/512K,L1:8/64/24K   \
 --with-mkl-include=/opt/intel/mkl/include/ \
 --prefix=/opt/librsb-optimized/            \
 --enable-matrix-types="double,double complex"

  or (32 bit):

  ./configure --disable-debug CC=gcc FC=gfortran CFLAGS='-Ofast' \
   --with-memhinfo=L2:4/64/512K,L1:8/64/24K     \
   --with-mkl="-static -L/opt/intel/mkl/lib/ia32/ -lmkl_solver \
   -Wl,--start-group,-lmkl_intel,-lmkl_gnu_thread,-lmkl_core,--end-group \
   -fopenmp -lpthread" \
   --with-mkl-include=/opt/intel/mkl/include/   \
   --prefix=/opt/librsb-optimized/              \
   --enable-matrix-types="double,double complex"

and then

  make       # builds library and test programs
  make tests # optional

 In the above examples, optional use of the MKL library is configured in.
 However, librsb itself does not use MKL in any way: MKL is only used by the
 "rsbench" test program (e.g. for performance comparisons).

 Say you want to run a quick SpMV (sparse matrix-vector multiply) speed test.
 You have a valid Matrix Market file containing a matrix, A.mtx,
 and you want to benchmark librsb with it on 1 and 4 cores, performing
 100 sparse matrix-vector multiply iterations.
 Then do a serial test first:
 ./rsbench -oa -Ob -f A.mtx -qH -R -n1 -t100 --verbose 
 and then a parallel test:
 OMP_NUM_THREADS=4 ./rsbench -oa -Ob -f A.mtx -qH -R -n1,4 -t100 --verbose

 You can add option --compare-competitors to enable comparisons to the MKL,
 provided it has been configured in.
 If not specifying a type (argument to the -T option), the default will be
 used.
 If configured in at build time, choices may be -T D (where D is the BLAS
 prefix for "double"), -T Z (Z stands for "double complex") and so on.
 You can specify "-T :" to mean all of the configured types.
 The output of 'rsbench' should be easy to understand or parse.
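 For readers unfamiliar with what such a benchmark times: each iteration is a
 sparse matrix-vector product, here taken as y <- y + A*x. The following is a
 minimal serial sketch of that loop for a matrix in plain coordinate (COO)
 form, with a crude timer. It is not how rsbench or librsb work (librsb uses
 the recursive RSB layout and OpenMP), and the function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* y += A * x for an A in coordinate (COO) form: nonzero k has value va[k]
 * at 0-based row ia[k] and column ja[k]. */
static void coo_spmv_add(size_t nnz, const int *ia, const int *ja,
                         const double *va, const double *x, double *y)
{
    for (size_t k = 0; k < nnz; ++k)
        y[ia[k]] += va[k] * x[ja[k]];
}

/* Performs iters SpMV iterations; returns the average processor time per
 * iteration in seconds. clock() measures CPU time, which approximates
 * elapsed time only for serial code like this. */
static double time_spmv(int iters, size_t nnz, const int *ia, const int *ja,
                        const double *va, const double *x, double *y)
{
    assert(iters > 0);
    clock_t t0 = clock();
    for (int it = 0; it < iters; ++it)
        coo_spmv_add(nnz, ia, ja, va, x, y);
    return (double)(clock() - t0) / CLOCKS_PER_SEC / iters;
}
```

 Dividing 2*nnz (one multiply and one add per nonzero) by the time per
 iteration gives a flop rate, a common way of reporting SpMV speed.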

 For more options and configure information, invoke:

 ./rsbench --help

 To get the built in defaults, invoke the following:
 ./rsbench -oa -Ob --help
 ./rsbench --help
 ./rsbench --version
 ./rsbench -I
 ./rsbench -C

 Example contents of a Matrix Market file:

%%MatrixMarket matrix coordinate pattern general
% This is a comment.
% See other examples in the distributed *.mtx files.
2 2 3
1 1
2 1
2 2
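 The following minimal sketch reads exactly this kind of file (the
 "coordinate pattern general" variant only) from a string into 0-based COO
 arrays. It is a hypothetical helper for illustration, not librsb's Matrix
 Market reader: it handles no values, no symmetry and no other variants, and
 its error checking is minimal.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical minimal reader for "coordinate pattern general" Matrix Market
 * text held in memory. Fills 0-based COO coordinates (at most cap entries)
 * and returns the number of nonzeroes, or -1 on error. Pattern matrices
 * carry no values: each listed entry stands for a nonzero. */
static long mm_read_pattern(const char *text, int *nrows, int *ncols,
                            int *ia, int *ja, long cap)
{
    static const char banner[] =
        "%%MatrixMarket matrix coordinate pattern general";
    long nnz = -1, k = 0;

    assert(text != NULL && nrows != NULL && ncols != NULL);
    if (strncmp(text, banner, sizeof banner - 1) != 0)
        return -1;
    for (const char *line = text; line != NULL && *line != '\0'; ) {
        const char *next = strchr(line, '\n');
        if (*line != '%' && *line != '\n') {    /* skip comments, blanks */
            if (nnz < 0) {                      /* size line: rows cols nnz */
                if (sscanf(line, "%d %d %ld", nrows, ncols, &nnz) != 3 ||
                    nnz < 0 || nnz > cap)
                    return -1;
            } else {                            /* entry line: row col */
                int i, j;
                if (k >= nnz || sscanf(line, "%d %d", &i, &j) != 2 ||
                    i < 1 || i > *nrows || j < 1 || j > *ncols)
                    return -1;
                ia[k] = i - 1;                  /* Matrix Market is 1-based */
                ja[k] = j - 1;
                ++k;
            }
        }
        line = next != NULL ? next + 1 : NULL;
    }
    return (nnz >= 0 && k == nnz) ? nnz : -1;
}
```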

--------------------------------------------------------------------------------
		COMPATIBILITY
--------------------------------------------------------------------------------
 
 This library has been built and tested on Unix machines.
 Microsoft Windows users might try building librsb under the Cygwin environment.

 Some tricks may have to be used on IBM AIX. For instance, adding the
 --without-xdr or the --without-zlib switch to ./configure.
 Your mileage may vary.
 AIX's "make" program may give problems; use the GNU version "gmake" instead;
 likewise, use GNU M4 as the M4 preprocessor.

 This library was developed mostly on Debian Linux and using only free software.

--------------------------------------------------------------------------------
		FAQ
--------------------------------------------------------------------------------

 Q: Can you provide me good configure defaults for an optimized build ?
 A: Default './configure' options are appropriate for an optimized build.
    A good starting point for gcc is ./configure CC=gcc CFLAGS='-O3'. 
    However, if you need complex arithmetic and are using GCC, I'd advise using
    -Ofast. With many versions of GCC I observed sub-optimal complex arithmetic
    performance with -O3, regardless of the use of e.g. -mtune=native.
    For more, consult your compiler documentation (e.g. man gcc, man icc),
    and learn about the best flags for your specific platform.
    Stripping your executable (make install-strip for librsb's rsbench) may
    help.

 Q: I am a beginner and I wish librsb to be very verbose when I invoke
    library interface functions incorrectly.
    Can you provide me good configure defaults for such a "debug" build ?
 A: Yes: ./scripts/configure_for_debug.sh

 Q: I am using CC=clang and FC=gfortran.
    Linking Fortran examples fails: it seems like some library is missing.
    What should I do?
 A: Did you try `make FCLD=clang`?

 Q: I have machine X, compiler Y, compiling flags Z; is SpMV performance P with
    matrix M good ?
 A: In general, hard to tell. However you can `make hinfo.log' and send me 
    (see CONTACTS) the hinfo.log file and your matrix in Matrix Market format
    (well, please don't send matrices by email but rather upload them
    somewhere on the web and send a URL to them).
    The hinfo.log file will contain useful compile and machine information.
    Then I *may* get an idea about the performance you should get with that
    matrix on that computer.

 Q: What is the Sparse BLAS ?
 A: It's a programming interface specification:
    [sparseblas_2001]:
    BLAS Technical Forum Standard, Chapter 3, Sparse BLAS
    http://www.netlib.org/blas/blast-forum/chapter3.pdf
    [dhp_2002]:
    An Overview of the Sparse Basic Linear Algebra Subprograms:
     The New Standard from the BLAS Technical Forum
    IAIN S. DUFF, CERFACS and Rutherford Appleton Laboratory
    MICHAEL A. HEROUX, Sandia National Laboratories
    ROLDAN POZO, National Institute of Standards and Technology
    [dv_2002]:
    Algorithm 818:
     A Reference Model Implementation of the Sparse BLAS in Fortran 95
    IAIN S. DUFF, CERFACS, France and Atlas Centre, RAL, England
    CHRISTOF VÖMEL, CERFACS, France

 Q: Is there an easy way to profile librsb usage in my application ?
 A: Yes: build with --enable-librsb-stats and extract time elapsed in librsb
    via e.g.: RSB_REINIT_SINGLE_VALUE_GET(RSB_IO_WANT_LIBRSB_ETIME,&dt,errval).

 Q: Why another sparse matrix library ?
 A: This library is the fruit of the author's PhD work, focused on researching
    improved multi threaded and cache friendly matrix storage schemes for
    PSBLAS.

 Q: What are the key features of this library when compared to other ones ?
 A: Recursive storage, a code generator, parallel BLAS operations
    (including matrix assembly, matrix-matrix multiplication, transposed
    matrix-vector multiply), a battery of tests, a Sparse BLAS
    interface and free software licensing.
 
 Q: How do I detect librsb from my package's configure script ?
 A: Add to your configure.ac:
    AH_TEMPLATE([HAVE_LIBRSB])
    AC_CHECK_FUNC([rsb_lib_init],AC_DEFINE([HAVE_LIBRSB],[1],[librsb detected]))
    then rerun autoconf and invoke configure as:
    ./configure	CFLAGS=`librsb-config   --cflags` \
                LDFLAGS=`librsb-config  --ldflags --extra_libs`
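 Once configure has defined HAVE_LIBRSB (typically in config.h, if your
 configure.ac uses AC_CONFIG_HEADERS), C sources can guard librsb-specific
 code with it. The following sketch assumes the rsb_lib_init()/rsb_lib_exit()
 calls with RSB_NULL_INIT_OPTIONS/RSB_NULL_EXIT_OPTIONS as used in the librsb
 example programs (check rsb.h for the exact declarations); the app_* names
 are hypothetical. Without HAVE_LIBRSB, the code compiles without librsb.

```c
#include <assert.h>
#ifdef HAVE_LIBRSB
#include <rsb.h>
#endif

/* Hypothetical application hook: returns 1 if librsb was initialized,
 * 0 if librsb is unavailable (or failed to initialize), in which case the
 * application should fall back to another sparse matrix implementation. */
static int app_init_sparse(void)
{
#ifdef HAVE_LIBRSB
    return rsb_lib_init(RSB_NULL_INIT_OPTIONS) == RSB_ERR_NO_ERROR;
#else
    return 0;
#endif
}

/* Releases librsb, but only if app_init_sparse() returned 1. */
static void app_exit_sparse(int initialized)
{
    assert(initialized == 0 || initialized == 1);
#ifdef HAVE_LIBRSB
    if (initialized)
        rsb_lib_exit(RSB_NULL_EXIT_OPTIONS);
#endif
}
```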
   
 Q: How is correctness checked in the librsb test suite ?
 A: Different linear system generators and tester programs are used to
    brute-force-test as many routines and input combinations as possible.
    See 'make tests'; and run/edit the following tester programs if you are
    curious:
    test -f sbtc && ./sbtc||true # Sparse BLAS checker (C interface based)
    test -f sbtf && ./sbtf||true # Sparse BLAS checker (Fortran interface, opt.)
    ./rsbench -Q 10.0 # 10 seconds brute-test

 Q: Why did you write the library in C and not in C++ ?
 A: Mainly...
    Because C can be easily interfaced with C++ and Fortran.
    Because using a debugger under full fledged C++ is a headache.
    Because of the C's 'restrict' keyword.
    
 Q: Why did you use C and not Fortran ?
 A: This library is slightly system-oriented, and system calls interfacing is
    much easier in C. Also, C's pointer arithmetic support plays a crucial role.

 Q: Is there a quick and easy way to perform an artificial performance
    test with huge matrices without having to program ?
 A: Sure. The following lines generate matrices of a specified dimension.
    You can play with them by changing the matrix size, for instance. 
    ./rsbench  -oa -Ob -qH -R --dense 1                    --verbose
    ./rsbench  -oa -Ob -qH -R --dense 1024                 --verbose
    ./rsbench  -oa -Ob -qH -R --lower 1024 --as-symmetric  --verbose
    ./rsbench  -oa -Ob -qH -R --dense 1000 --gen-lband 10 --gen-uband 3
    ./rsbench  -oa -Ob -qH -R --generate-diagonal 1000
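 Judging by the option names, the last two lines appear to request a banded
 matrix (10 subdiagonals, 3 superdiagonals) and a diagonal one, of dimension
 1000; ./rsbench --help documents the exact semantics. As a rough
 illustration of what such a pattern contains, here is a hypothetical band
 generator in C producing 0-based COO coordinates; it is not rsbench's code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical generator for an n x n banded pattern with lband
 * subdiagonals and uband superdiagonals. If ia and ja are non-NULL, it
 * fills 0-based COO coordinates row by row; it always returns the
 * number of nonzeroes. */
static size_t gen_band_coo(int n, int lband, int uband, int *ia, int *ja)
{
    size_t nnz = 0;
    assert(n >= 0 && lband >= 0 && uband >= 0);
    for (int i = 0; i < n; ++i) {
        int j0 = i - lband < 0 ? 0 : i - lband;          /* first column */
        int j1 = i + uband > n - 1 ? n - 1 : i + uband;  /* last column */
        for (int j = j0; j <= j1; ++j, ++nnz)
            if (ia != NULL && ja != NULL) {
                ia[nnz] = i;
                ja[nnz] = j;
            }
    }
    return nnz;
}
```

 For n = 1000, lband = 10 and uband = 3 this yields
 14*1000 - 10*11/2 - 3*4/2 = 13939 nonzeroes, since the band is cut off in
 the top-left and bottom-right corners.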

 Q: I've found a bug! What should I do ?
 A: First please make sure it is really a bug: read the documentation, check,
    double check.
    Then you can write a description of the problem, with a minimal program
    source code and data to replicate it.
    Then you can jump to the CONTACTS details section.

 Q: Is it possible to build matrices of, say, long double or 
    long double complex or int or short int ?
 A: Yes, it's not a problem. You should invoke the configure script accordingly,
    e.g.: --enable-matrix-types="long double".
    If this breaks code compilation, feel free to contact the author
    (see the CONTACTS section).

 Q: Is there a way to compare the performance of this library to some other
    high performance libraries ?
 A: If you build rsbench with support for the Intel MKL library, then you
    can do performance comparisons with e.g.:
    # ./rsbench -oa -Ob -qH -R --gen-diag 100 --compare-competitors --verbose
    or use the following script:
    # bench/dense.sh ' '
    Or even better, check out the --write-performance-record feature ; for 
    details see the output of:
    # rsbench -oa -Ob --help

 Q: Is there a non-threaded (serial) version of librsb ?
 A: Yes: you can configure the library to work serially (with no OpenMP).
    See ./configure --help. 

 Q: Is this library thread-safe ?
 A: Probably yes: no static buffers are being used, and reentrant C standard
    library functions are invoked.

 Q: Does the librsb library run on GPUs or Intel MIC ?
 A: It has been built on Intel MIC once, but not tested.

 Q: I built and compiled the code without enabling any BLAS type (S,D,C,Z), 
     and both `make qtests' and `make tests' ran successfully outside the
     ./examples directory, but `make tests' breaks within ./examples directory.
 A: Well, the tests passed because the examples testing was simply skipped.
    The example programs need at least one of these types to work.

 Q: At build time I get many "unused variable" warnings. Why ? 
 A: librsb accommodates many code generation and build time configuration
    options. Some combinations may turn off compilation of certain parts of the
    code, leading some variables to be unused.

 Q: Are there papers to read about the RSB format and algorithms ?
 A: Yes, the following:

    Michele Martone
    Efficient Multithreaded Untransposed, Transposed or Symmetric Sparse
    Matrix-Vector Multiplication with the Recursive Sparse Blocks Format
    Parallel Computing 40(7): 251-270 (2014)
    http://dx.doi.org/10.1016/j.parco.2014.03.008

    Michele Martone
    Cache and Energy Efficiency of Sparse Matrix-Vector Multiplication for
    Different BLAS Numerical Types with the RSB Format
    Proceedings of the ParCo 2013 conference, September 2013, Munich, Germany
    PARCO 2013: 193-202
    http://dx.doi.org/10.3233/978-1-61499-381-0-193

    Michele Martone, Marcin Paprzycki, Salvatore Filippone: An Improved Sparse
    Matrix-Vector Multiply Based on Recursive Sparse Blocks Layout.
    LSSC 2011: 606-613
    http://dx.doi.org/10.1007/978-3-642-29843-1_69

    Michele Martone, Salvatore Filippone, Salvatore Tucci, Marcin Paprzycki,
    Maria Ganzha: Utilizing Recursive Storage in Sparse Matrix-Vector
    Multiplication - Preliminary Considerations. CATA 2010: 300-305
    
    Michele Martone, Salvatore Filippone, Marcin Paprzycki, Salvatore Tucci:
    Assembling Recursively Stored Sparse Matrices. IMCSIT 2010: 317-325
    http://www.proceedings2010.imcsit.org/pliks/205.pdf

    Michele Martone, Salvatore Filippone, Pawel Gepner, Marcin Paprzycki,
    Salvatore Tucci: Use of Hybrid Recursive CSR/COO Data Structures in Sparse
    Matrices-Vector Multiplication. IMCSIT 2010: 327-335

    Michele Martone, Salvatore Filippone, Marcin Paprzycki, Salvatore Tucci:
    On BLAS Operations with Recursively Stored Sparse Matrices.
    SYNASC 2010: 49-56
    http://dx.doi.org/10.1109/SYNASC.2010.72

    Michele Martone, Salvatore Filippone, Marcin Paprzycki, Salvatore Tucci:
    On the Usage of 16 Bit Indices in Recursively Stored Sparse Matrices.
    SYNASC 2010: 57-64
    http://dx.doi.org/10.1109/SYNASC.2010.77

 Q: I have M4-related problems on IBM SP5/SP6 (my M4 preprocessor tries to
    regenerate code but it fails). What should I do ?
 A: A fix is to use a GNU M4 implementation 
    e.g.: M4=/opt/freeware/bin/m4 ./configure ...
    e.g.: M4=gm4 ./configure ...
    or execute:
    touch *.h ; touch *.c ; make
    Or "./configure; make"  the library on a different machine, then build 
    a sources archive with `make dist', and use it on the original machine.
   
--------------------------------------------------------------------------------
	POSSIBLE / POTENTIAL FUTURE FEATURES / ENHANCEMENTS
--------------------------------------------------------------------------------

 * auxiliary functions for numerical vectors
 * CSC,BCSR,BCSC and other formats
 * (optional) loop unrolled kernels for BCSR/BCSC
 * performance prediction/estimation facilities (experimental)
 * user-specifiable types for blocks, nonzeroes and coordinate indices
 * a code generator for BCSR, BCSC, VBR, VBC kernels
 * full support for BCSR, BCSC storages 
 * automatic matrix blocking selection (for BCSR/BCSC) 
 * an arbitrary subset of block size kernels can be specified to be generated
 * full support for VBR,VBC storages
 * recursive storage variants of blocked formats (non uniform blocking)
 * more auto-tuning and prediction control
 * use of assembly functions or intrinsics
 * the use of context variables (scenarios with multiple libraries using
   librsb completely independently at the same time are not supported)
 * enhanced in-place matrix assembly functions (useful for really huge matrices)

--------------------------------------------------------------------------------
   		ABOUT THE INTERNALS
--------------------------------------------------------------------------------

 The following good practices are being followed during development of librsb.

 - only symbols beginning with `rsb_' or `blas_' are being exported.
 - internal functions are usually prefixed by `rsb__'.
 - no library internal function shall call any API function.

 If by using/inspecting the code you notice any of the above is being violated,
 please report it.

--------------------------------------------------------------------------------
		BUGS
--------------------------------------------------------------------------------

 If you encounter any bug (e.g.: a mismatch between library/program behaviour
 and documentation), please let me know about it by sending me (see CONTACTS)
 all relevant information (code snippet, originating data/matrix, config.log),
 in such a way that I can replicate the bug behaviour on my machines.
 If the bug occurred when using rsb interfaced to some proprietary library,
 please make sure the bug is in librsb.

 It may be of great help to you to build the library with the debug compile
 options on (e.g.: CFLAGS='-O0 -ggdb'), and with appropriate library verbosity
 levels (--enable-internals-error-verbosity, --enable-interface-error-verbosity
 and --enable-io-level  options to configure) to better understand the program 
 behaviour before sending a report.

 Make sure you have the latest version of the library when reporting a bug. 

--------------------------------------------------------------------------------
		CONTACTS
--------------------------------------------------------------------------------

 You are welcome to contact the librsb author:

  Michele Martone < michelemartone AT users DOT sourceforge DOT net >
 
 Please specify "librsb" in the "Subject:" line of your emails.

 More information and downloads at http://sourceforge.net/projects/librsb

 Mailing list: https://lists.sourceforge.net/lists/listinfo/librsb-users
 
--------------------------------------------------------------------------------
		CREDITS	(in alphabetical order)
--------------------------------------------------------------------------------

For librsb-1.2:
 Marco Atzeri provided testing, patches to build librsb under cygwin over
  nearly each release, and spotted a few bugs.
 Fabio Cassini spotted an unintended conversion via sparsersb and +.
 John Donoghue spotted a rendering corner case bug.
 Sebastian Koenig spotted a computational bug in -rc6.
 Rafael Laboissiere helped a lot in improving the documentation and the build
  system.
 Mu-Chu Lee provided a patch to fix sorting code crashing with > 10^9 nnz.
 Constanza Manassero spotted an inconsistency in the usmm/ussm interface.
 Markus Muetzel helped debugging rsb_mtx_rndr().
 Dmitri Sergatskov spotted a double free in rsb_mtx_rndr() and convinced me of
  the necessity of sanitizing memory usage.

For librsb-1.1:
 Gilles Gouaillardet provided a patch for OpenMP-encapsulated I/O.
 Marco Restelli provided testing, detailed comments and suggestions.

For librsb-1.0:
 Francis Casson helped with testing and documentation reviewing during the first
 release.
 Nitya Hariharan helped revising early versions of the documentation.

--------------------------------------------------------------------------------
		LICENSE
--------------------------------------------------------------------------------

 This software is distributed under the terms of the GNU Lesser General Public
 License, version 3 (LGPLv3) or later.
 See the COPYING file for a copy of the LGPLv3.

 librsb is free software.
 To support it, consider writing "thank you" to the author and acknowledging use
 of librsb in your publications. That would be much appreciated.

--------------------------------------------------------------------------------

For a quick start, consider the following two programs.

The first uses the native RSB interface:

/*
Copyright (C) 2008-2020 Michele Martone
This file is part of librsb.
librsb is free software; you can redistribute it and/or modify it
under the terms of the GNU Lesser General Public License as published
by the Free Software Foundation; either version 3 of the License, or
(at your option) any later version.
librsb is distributed in the hope that it will be useful, but WITHOUT
ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
FITNESS FOR A PARTICULAR PURPOSE. See the GNU Lesser General Public
License for more details.
You should have received a copy of the GNU Lesser General Public
License along with librsb; see the file COPYING.
If not, see <http://www.gnu.org/licenses/>.
*/
/*!
\ingroup rsb_doc_examples
@file
@author Michele Martone
@brief This is a first "hello RSB" example program.
\include hello.c
*/
#include <rsb.h> /* librsb header to include */
#include <stdio.h> /* printf() */
#include <stdlib.h> /* EXIT_SUCCESS, EXIT_FAILURE */
int main(const int argc, char * const argv[])
{
    /*!
    A Hello-RSB program.
    This program shows how to use the rsb.h interface correctly to:
    - initialize the library using #rsb_lib_init()
    - set library options using #rsb_lib_set_opt()
    - revert such changes
    - allocate (build) a single sparse matrix in the RSB format
      using #rsb_mtx_alloc_from_coo_const()
    - print information obtained via #rsb_mtx_get_info_str()
    - multiply the matrix times a vector using #rsb_spmv()
    - deallocate the matrix using #rsb_mtx_free()
    - finalize the library using #rsb_lib_exit(RSB_NULL_EXIT_OPTIONS)
    In this example, we use #RSB_DEFAULT_TYPE as matrix type.
    This type depends on what was configured at library build time.
    * */
    rsb_err_t errval = RSB_ERR_NO_ERROR; /* librsb error variable */
    struct rsb_mtx_t *mtxAp = NULL; /* matrix structure pointer */
    const int bs = RSB_DEFAULT_BLOCKING;
    const int brA = bs, bcA = bs;
    const RSB_DEFAULT_TYPE one = 1;
    /* numerical type code corresponding to RSB_DEFAULT_TYPE: */
    rsb_type_t typecode = RSB_NUMERICAL_TYPE_DEFAULT;
    const rsb_nnz_idx_t nnzA = 4; /* matrix nonzeroes count */
    const rsb_coo_idx_t nrA = 3; /* matrix rows count */
    const rsb_coo_idx_t ncA = 3; /* matrix columns count */
    /* nonzero row indices coordinates: */
    rsb_coo_idx_t IA[] = {0,1,2,2};
    /* nonzero column indices coordinates
       (entry (2,2) is given twice; see RSB_FLAG_DUPLICATES_SUM below): */
    rsb_coo_idx_t JA[] = {0,1,2,2};
    RSB_DEFAULT_TYPE VA[] = {11,22,32,1}; /* values of nonzeroes */
    RSB_DEFAULT_TYPE X[] = { 0, 0, 0 }; /* X vector's array */
    const RSB_DEFAULT_TYPE B[] = { -1, -2, -5 }; /* B vector's array */
    char ib[200];

    printf("Hello, RSB!\n");
    printf("Initializing the library...\n");
    if((errval = rsb_lib_init(RSB_NULL_INIT_OPTIONS)) !=
            RSB_ERR_NO_ERROR)
    {
        printf("Error initializing the library!\n");
        goto err;
    }
    printf("Correctly initialized the library.\n");

    printf("Attempting to set the"
        " RSB_IO_WANT_EXTRA_VERBOSE_INTERFACE library option.\n");
    {
        rsb_int_t evi=1;
        /* Setting a single optional library parameter. */
        errval = rsb_lib_set_opt(
            RSB_IO_WANT_EXTRA_VERBOSE_INTERFACE, &evi);
        if(errval != RSB_ERR_NO_ERROR)
        {
            char errbuf[256];
            rsb_strerror_r(errval,&errbuf[0],sizeof(errbuf));
            printf("Failed setting the"
                " RSB_IO_WANT_EXTRA_VERBOSE_INTERFACE"
                " library option (reason string:\n%s).\n",errbuf);
            if(errval&RSB_ERRS_UNSUPPORTED_FEATURES)
            {
                printf("This error may be safely ignored.\n");
            }
            else
            {
                printf("Some unexpected error occurred!\n");
                goto err;
            }
        }
        else
        {
            printf("Setting back the "
                "RSB_IO_WANT_EXTRA_VERBOSE_INTERFACE"
                " library option.\n");
            evi = 0;
            errval = rsb_lib_set_opt(RSB_IO_WANT_EXTRA_VERBOSE_INTERFACE,
                &evi);
            /* the outcome of reverting the option is deliberately ignored */
            errval = RSB_ERR_NO_ERROR;
        }
    }

    mtxAp = rsb_mtx_alloc_from_coo_const(
        VA,IA,JA,nnzA,typecode,nrA,ncA,brA,bcA,
        RSB_FLAG_NOFLAGS /* default format will be chosen */
        |RSB_FLAG_DUPLICATES_SUM /* duplicates will be summed */
        ,&errval);
    if((!mtxAp) || (errval != RSB_ERR_NO_ERROR))
    {
        printf("Error while allocating the matrix!\n");
        goto err;
    }
    printf("Correctly allocated a matrix.\n");
    printf("Summary information of the matrix:\n");
    /* print out the matrix summary information */
    rsb_mtx_get_info_str(mtxAp,"RSB_MIF_MATRIX_INFO__TO__CHAR_P",
        ib,sizeof(ib));
    printf("%s",ib);
    printf("\n");

    /* X := one * X + one * A * B */
    if((errval =
        rsb_spmv(RSB_TRANSPOSITION_N,&one,mtxAp,B,1,&one,X,1))
            != RSB_ERR_NO_ERROR )
    {
        printf("Error performing a multiplication!\n");
        goto err;
    }
    printf("Correctly performed a SPMV.\n");
    rsb_mtx_free(mtxAp);
    printf("Correctly freed the matrix.\n");
    if((errval = rsb_lib_exit(RSB_NULL_EXIT_OPTIONS))
            != RSB_ERR_NO_ERROR)
    {
        printf("Error finalizing the library!\n");
        goto err;
    }
    printf("Correctly finalized the library.\n");
    printf("Program terminating with no error.\n");
    return EXIT_SUCCESS;
err:
    rsb_perror(NULL,errval);
    printf("Program terminating with error.\n");
    return EXIT_FAILURE;
}
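The result that the rsb_spmv() call above should leave in X can be checked with the plain-C reference below. It does not use librsb, the name coo_spmv_ref() is hypothetical, and it assumes RSB_DEFAULT_TYPE is double (the actual type depends on the build configuration). It computes y := beta*y + alpha*A*x by accumulating one COO triplet at a time, so the two (2,2) triplets (values 32 and 1) act as a single entry of value 33, which is the effect RSB_FLAG_DUPLICATES_SUM requests from librsb.

```c
#include <stddef.h>

/* HYPOTHETICAL reference routine, not part of librsb.
   Computes y := beta * y + alpha * A * x, where A is given as nnz
   0-based COO triplets (IA[k], JA[k], VA[k]) and y has nr entries.
   Each triplet is accumulated separately, so duplicate (row, col)
   triplets are effectively summed. */
void coo_spmv_ref(size_t nr, size_t nnz,
                  const int *IA, const int *JA, const double *VA,
                  double alpha, const double *x,
                  double beta, double *y)
{
    /* Scale the output vector by beta first. */
    for (size_t i = 0; i < nr; ++i)
        y[i] *= beta;
    /* Add alpha times each nonzero's contribution. */
    for (size_t k = 0; k < nnz; ++k)
        y[IA[k]] += alpha * VA[k] * x[JA[k]];
}
```

With the program's data (alpha = beta = 1, X initially zero, B = { -1, -2, -5 }), coo_spmv_ref(3, 4, IA, JA, VA, 1.0, B, 1.0, X) leaves X = { -11, -44, -165 }.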

The second uses the Sparse BLAS interface:

/*
Copyright (C) 2008-2020 Michele Martone
This file is part of librsb.
librsb is free software; you can redistribute it and/or modify it
under the terms of the GNU Lesser General Public License as published
by the Free Software Foundation; either version 3 of the License, or
(at your option) any later version.
librsb is distributed in the hope that it will be useful, but WITHOUT
ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
FITNESS FOR A PARTICULAR PURPOSE. See the GNU Lesser General Public
License for more details.
You should have received a copy of the GNU Lesser General Public
License along with librsb; see the file COPYING.
If not, see <http://www.gnu.org/licenses/>.
*/
/*!
\ingroup rsb_doc_examples
@file
@author Michele Martone
@brief This is a first "hello RSB" example program using
a Sparse BLAS interface.
\include hello-spblas.c
*/
#include <rsb.h> /* for rsb_lib_init */
#include <blas_sparse.h> /* Sparse BLAS on the top of librsb */
#include <stdio.h> /* printf */
#include <stdlib.h> /* EXIT_SUCCESS, EXIT_FAILURE */
int main(const int argc, char * const argv[])
{
    /*!
     * A Hello/Sparse BLAS program.
     *
     * This program shows how to use the blas_sparse.h
     * interface correctly to:
     *
     * - initialize the library using #rsb_lib_init()
     * - allocate (build) a single sparse matrix in the RSB
     * format using #BLAS_duscr_begin()/#BLAS_duscr_insert_entries()
     * /#BLAS_duscr_end()
     * - extract one matrix element with #BLAS_dusget_element()
     * - multiply the matrix times a vector using #BLAS_dusmv()
     * - deallocate the matrix using #BLAS_usds()
     * - finalize the library using
     * #rsb_lib_exit(#RSB_NULL_EXIT_OPTIONS)
     */
#ifndef RSB_NUMERICAL_TYPE_DOUBLE
    printf("'double' type configured out."
        " Please reconfigure the library with it and recompile.\n");
    return EXIT_SUCCESS;
#else /* RSB_NUMERICAL_TYPE_DOUBLE */
    blas_sparse_matrix A = blas_invalid_handle; /* handle for A */
    const int nnz = 4; /* number of nonzeroes of matrix A */
    const int nr = 3; /* number of A's rows */
    const int nc = 3; /* number of A's columns */
    /* A's nonzero elements row indices (coordinates): */
    int IA[] = { 0, 1, 2, 2 };
    /* A's nonzero elements column indices (coordinates): */
    int JA[] = { 0, 1, 0, 2 };
    /* A's nonzero values (matrix coefficients): */
    double VA[] = { 11.0, 22.0, 13.0, 33.0 };
    /* the X vector's array: */
    double X[] = { 0.0, 0.0, 0.0 };
    /* the B vector's array: */
    double B[] = { -1.0, -2.0, -2.0 };
    /* the (known) result array: */
    double AB[] = { 11.0+26.0, 44.0, 66.0+13.0 };
    /* rsb error variable: */
    rsb_err_t errval = RSB_ERR_NO_ERROR;
    int i;

    printf("Hello, RSB!\n");
    /* initialize the library */
    if((errval = rsb_lib_init(RSB_NULL_INIT_OPTIONS))
            != RSB_ERR_NO_ERROR)
    {
        goto err;
    }
    printf("Correctly initialized the library.\n");
    /* initialize a matrix descriptor */
    A = BLAS_duscr_begin(nr,nc);
    if( A == blas_invalid_handle )
    {
        goto err;
    }
    /* specify properties (e.g.: symmetry); only the lower
       triangle is inserted, the upper one is implied */
    if( BLAS_ussp(A,blas_lower_symmetric) != 0 )
    {
        goto err;
    }
    /* get properties (e.g.: symmetry) */
    if( BLAS_usgp(A,blas_lower_symmetric) != 1 )
    {
        printf("Symmetry property not set ?!\n");
        goto err;
    }
    /* insert the nonzeroes (here, all at once);
       Sparse BLAS routines return 0 on success */
    if( BLAS_duscr_insert_entries(A, nnz, VA, IA, JA) != 0 )
    {
        goto err;
    }
    /* finalize (allocate) the matrix build */
    if( BLAS_duscr_end(A) != 0 )
    {
        goto err;
    }
    printf("Correctly allocated a matrix.\n");
    VA[0] = 0.0;
    if( BLAS_dusget_element(A, IA[0], JA[0], &VA[0]) )
    {
        goto err;
    }
    /* a check */
    if( VA[0] != 11.0 )
    {
        goto err;
    }
    /* compute X = X + (-1) * A * B */
    if(BLAS_dusmv(blas_no_trans,-1,A,B,1,X,1))
    {
        goto err;
    }
    for( i = 0 ; i < nc; ++i )
        if( X[i] != AB[i] )
        {
            printf("Computed SPMV result seems wrong. Terminating.\n");
            goto err;
        }
    printf("Correctly performed a SPMV.\n");
    /* deallocate matrix A */
    if( BLAS_usds(A) )
    {
        goto err;
    }
    printf("Correctly freed the matrix.\n");
    /* finalize the library */
    if((errval = rsb_lib_exit(RSB_NULL_EXIT_OPTIONS))
            != RSB_ERR_NO_ERROR)
    {
        goto err;
    }
    printf("Correctly finalized the library.\n");
    printf("Program terminating with no error.\n");
    return EXIT_SUCCESS;
err:
    rsb_perror(NULL,errval);
    printf("Program terminating with error.\n");
    return EXIT_FAILURE;
#endif /* RSB_NUMERICAL_TYPE_DOUBLE */
}
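The known result AB in the program above follows from treating the four inserted entries as the lower triangle of a symmetric matrix, so that the value 13 stored at (2,0) also acts at (0,2). The plain-C sketch below makes that reasoning explicit by applying each stored off-diagonal entry twice. It does not use librsb, the name coo_symv_lower_ref() is hypothetical, and it assumes that the symmetry property set in the program is lower symmetry (blas_lower_symmetric), as its symmetry check suggests.

```c
#include <stddef.h>

/* HYPOTHETICAL reference routine, not part of librsb.
   Computes y := y + alpha * A * x for a symmetric A of which only the
   lower triangle is given, as nnz 0-based COO triplets.
   Each off-diagonal entry (i, j) also stands for its mirror (j, i). */
void coo_symv_lower_ref(size_t nnz, const int *IA, const int *JA,
                        const double *VA, double alpha,
                        const double *x, double *y)
{
    for (size_t k = 0; k < nnz; ++k) {
        const int i = IA[k], j = JA[k];
        y[i] += alpha * VA[k] * x[j];      /* stored entry (i, j) */
        if (i != j)
            y[j] += alpha * VA[k] * x[i];  /* implied mirror (j, i) */
    }
}
```

With alpha = -1, B = { -1, -2, -2 } and X initially zero, coo_symv_lower_ref(4, IA, JA, VA, -1.0, B, X) yields X = { 37, 44, 79 }, that is { 11+26, 44, 66+13 } as in AB.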

For more, see the Example programs and code section.