A sparse matrix library implementing the `Recursive Sparse Blocks' (RSB) matrix storage.

This is the documentation for the application programming interface (API) of the `librsb' library.
In order to use librsb, there is no need for the user to know the RSB layout and algorithms: this documentation should be sufficient.
librsb is dual-interfaced; it supports: a native (`RSB') interface (with identifiers prefixed by `rsb_' or `RSB_'), and a (mostly complete) Sparse BLAS interface, as a wrapper around the RSB interface.
Many computationally intensive operations are implemented with thread parallelism, by using OpenMP.
Thread parallelism can be turned off at configure time, if desired, or controlled at execution time.
Many of the source code files of the computational kernels (mostly internals) were automatically generated.
This user documentation concerns the end user API only.
This library was born as research software.
Over the years it has been stabilized and its test suite expanded considerably.
There are projects making librsb available in Octave, Python or Julia; you might want to check them out as well.

For a first approach, see the Example programs and code documentation section, or the quick start examples section on this page.

Information about the supported matrix types and matrix operations resides in the rsb_types.h file.

A C/C++ user can use the native API of RSB by including the rsb.h header.
The same interface is available in Fortran via the ISO C Binding interface, specified in rsb.F90.
An experimental C++ API is available in (optional) rsb.hpp.
The C header file for the Sparse BLAS interface to librsb is blas_sparse.h, and the Fortran module resides in rsb_blas_sparse.F90 (notice there is no native C++ equivalent).

Author: Michele Martone < michelemartone AT users DOT sourceforge DOT net >

Contents of the README file :

librsb README file
================================================================================

librsb - Recursive Sparse Blocks Matrix computations library
--------------------------------------------------------------------------------

A library for sparse matrix computations featuring the Recursive Sparse Blocks
(RSB) matrix format, geared towards cache efficient and multi-threaded
(that is, shared memory parallel) operations on large sparse matrices.
It provides the most common operations needed by iterative solvers, like
matrix-vector and matrix-matrix multiplication, triangular solution, scaling of
rows/columns, diagonal extraction / setting, block extraction, norm
computation, format conversion.
The RSB format is especially well suited for symmetric and transposed
multiplication variants.
Much of the source code is machine-generated; the supported numerical types can
be chosen by the user at build time.
This library is dual-interfaced: it can be used via the native (`RSB`)
interface (with function identifiers prefixed by `rsb_` or `RSB_`), and a
Sparse BLAS one (function identifiers prefixed by `BLAS_`).
The `RSB` interface can be used from C/C++ (`rsb.h` header) or via modern
Fortran ISO-C-BINDING (`rsb` module in `rsb.F90`).
New in version 1.3 is the C++-only API in `rsb.hpp` (`Rsb`-prefixed classes).
The Sparse BLAS interface is usable from C/C++ via the `blas_sparse.h` header,
and from Fortran via the `blas_sparse` module.

--------------------------------------------------------------------------------

This (README) is the first document you should read about librsb.
It contains basic instructions to configure, build, install, and use librsb.
The reference documentation for programming with librsb is contained in the
`./doc/` source package subdirectory and, when installed, is placed in the
appropriate system directories as both Unix man pages (`./doc/man/`) and HTML
(`./doc/html/`).
If you have used a previous release version of librsb, see the `NEWS` file
for a succinct list of user-relevant changes.

INTRODUCTION
--------------------------------------------------------------------------------

librsb is a library for sparse matrix algebra computations.
It is stand-alone: it does not require any other library to build or work.
It is shared memory parallel, using constructs from the OpenMP standard.
It is focused on high performance SpMV and SpMM operations.
A part of the library code is automatically generated from templates and
macros, on the basis of the numerical types a user wishes to have supported.
The configure script provides many build time options, especially with respect
to debug and additional verbosity (see `configure --help`).
Defaults should be fine for most users, though.
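
For readers new to the acronyms used above: SpMV is the sparse matrix-vector
product `y <- y + A*x`, and SpMM is the same operation with several right hand
sides. The following minimal, serial sketch shows what SpMV computes for a
matrix given as coordinate (COO) triples. It is only an illustration: the
function name `coo_spmv`, its arguments and the 0-based indexing are
assumptions made here, and it is not how librsb implements the operation
(librsb relies on the RSB layout and OpenMP threads).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not part of librsb): y <- y + A * x,
 * with A given as nnz (row, column, value) triples, 0-based indices. */
static void coo_spmv(size_t nnz, const int *ia, const int *ja,
                     const double *va, const double *x, double *y)
{
        size_t k;

        assert(ia != NULL && ja != NULL && va != NULL);
        assert(x != NULL && y != NULL);
        /* Each nonzero contributes one multiply-add to its row of y. */
        for (k = 0; k < nnz; ++k)
                y[ia[k]] += va[k] * x[ja[k]];
}
```

For the 2 x 2 matrix with unit nonzeroes at (0,0), (1,0) and (1,1), the
vector x = {1, 2} and y initially {0, 0}, the call
`coo_spmv(3, ia, ja, va, x, y)` leaves y = {1, 3}.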

- INTRODUCTION
- MAIN ASPECTS,FEATURES
- QUICK INSTALL AND TESTING EXAMPLE
- LIBRARY CONFIGURATION, GENERATION, BUILD
- INSTALLATION, USAGE
- EXECUTION AND ENVIRONMENT VARIABLES
- DOCUMENTATION, EXAMPLES AND PROGRAMMING GUIDELINES
- CONFIGURE, BUILD AND BENCHMARK EXAMPLE
- COMPATIBILITY
- FAQ
- POSSIBLE FUTURE ENHANCEMENTS
- ABOUT THE INTERNALS
- BUGS
- CONTACTS
- CREDITS
- LICENSE

MAIN ASPECTS,FEATURES
--------------------------------------------------------------------------------

* conceived for efficient multithreaded SpMV/SpMM
* thread level (shared memory) parallelism by using OpenMP
* threads/structure auto-tuning feature for additional performance
* support for multiple numerical data types which can be turned
on/off individually (e.g. `double`, `float`, `int`, `char`, `float complex`,
`double complex`) at configure time
* a Sparse BLAS interface for matrix assembly, computation, destruction
* code generators for the inner CSR, COO computational kernels
* based on a recursive memory layout of submatrices
* enough functionality to implement the most common iterative methods
* basic input sanitizing (index types overflow checks, etc)
* parallel matrix assembly and conversion routines
* auxiliary functions for matrix I/O (using the "Matrix Market" format:
real, integer, complex and pattern are supported)
* designed as a building block for solvers such as PSBLAS, MaPHyS++
* dual implementation of inner kernels: with "full-" and "half-word" indices
* basic (unoptimized) sparse matrices multiplication and summation
* interactive usage possible by using the `sparsersb` plugin for GNU Octave
or the PyRSB Python package
* complete with examples and a test suite
* see the `NEWS` text file for a detailed list of changes in each release

QUICK INSTALL AND TESTING EXAMPLE
--------------------------------------------------------------------------------

# unpack the archives or get them from the repositories
./autogen.sh # only necessary if configure file does not exist
./configure --prefix=$HOME/local/librsb/ # with installation destination
# see also ./configure --help for many other options
# librsb has been configured
make help # provide information
make # build the library and test programs
# librsb has been built
make qtests # perform brief sanity tests
make qqtests # the same, but with less output
make tests # perform extended sanity tests
ls examples/*.c # editable C examples; build them with 'make'
ls examples/*.cpp # editable C++ examples; build them with 'make'
ls examples/*.F90 # editable Fortran examples; build them with 'make'
make install # install to $HOME/local/librsb/
make itests # perform post-installation tests on examples
# librsb has been installed and can be used

# for instance, try using one of the librsb examples as a base:
mkdir -p ~/rsb-test/ && cp examples/hello.c ~/rsb-test/myrsb.c
# adapt hello.c to your needs and recompile:
cd ~/rsb-test/
export PATH=$PATH:$HOME/local/librsb/bin/
gcc `librsb-config --I_opts` -c myrsb.c
gcc -o myrsb myrsb.o `librsb-config --static --ldflags --extra_libs`
./myrsb # run your program

LIBRARY CONFIGURATION, GENERATION, BUILD
--------------------------------------------------------------------------------

A good portion of this library is C code generated via M4 macros.
Certain generation options (notably, supported numerical types) can be set via
the `./configure` script.
The M4 preprocessor executable can be specified explicitly to `./configure`
with the `M4` environment variable or via the `--with-m4` option.
After invoking `./configure` and before running `make` it is possible to invoke
`make cleanall` to make sure that auto-generated code is deleted first.

The `configure` script attempts to detect the system cache memory hierarchy
parameters and prints them out.
If detection fails, you can pass them via the `--with-memhinfo=...` option.
For instance, to declare 3 MB of L3, 256 kB of L2, and 32 kB of L1 cache:
`./configure --with-memhinfo="L3:12/64/3072K,L2:8/64/256K,L1:8/64/32K"`
These values need not be exact: approximate values can also help achieve
better performance.
You can override the configure-time default at run time: see API documentation.
Read a few sections further for a description of the memory hierarchy info
string format.

If you want to enable Fortran examples, be sure to run `./configure` with
the `--enable-fortran-examples` option. You can specify the desired Fortran
compiler and compilation flags via the `FC` and `FCFLAGS` variables.

Set the `CPPFLAGS` variable at configure time to provide additional compilation
flags; e.g. to let `configure` detect necessary headers in a non-standard
location.
Similarly, the `LDFLAGS` variable can be set to contain link time options; so
you can use it to specify libraries to be linked to librsb examples.
Invoke `./configure --help` for details of other relevant environment
variables.

The `./configure` script will emit information about the current build options.
If all went fine, you can invoke `make` to build the library and the examples.

To check for library consistency, run:

make qtests # takes a short time, spots most problems
or

make tests # takes longer, qtests is usually enough

If these tests terminate with an error code, the cause may be a bug in librsb,
so please report it (see BUGS).
These tests also check error reporting capabilities, so don't be scared by
error messages appearing in the running output.

INSTALLATION, USAGE
--------------------------------------------------------------------------------

Once built, the library can be installed with e.g.:

sudo make install # 'sudo' may be needed for system-wide locations

This installs header files, binary library files, documentation, examples,
and the `librsb-config` program.
Then, application C programs should include the `rsb.h` header file with
`#include <rsb.h>`.
C++ programs can use that as well, or `#include <rsb.hpp>`.
The path to the librsb headers and extra options can be obtained via
`librsb-config --I_opts`.

To link to the librsb library and its dependencies one can use the output of
`librsb-config --static --ldflags --extra_libs`.

Users of `pkg-config` can manually copy the `librsb.pc` file to the appropriate
directory to use `pkg-config` in a way similar to `librsb-config`.

EXECUTION AND ENVIRONMENT VARIABLES
--------------------------------------------------------------------------------

By default, librsb reads the environment variable
`RSB_USER_SET_MEM_HIERARCHY_INFO`, which allows overriding the memory
hierarchy settings set at configure time or detected at run time.

Its value is specified as n concatenated strings of the form:

L<l>:<a_l>/<b_l>/<c_l>

These strings are separated by a comma (","), and each of them is made
up from substrings where:

<n> is the cache memories hierarchy height, from 1 upwards.
<l> is the cache level, from 1 upwards.
<a_l> is the cache associativity
<b_l> is the cache block size (cache line length)
<c_l> is the cache capacity (size)

The `<a_l>`, `<b_l>`, `<c_l>` substrings consist of an integer number with an
optional multiplier character among {K,M,G} (to specify respectively 2^10,
2^20 or 2^30).
Any value is permitted, as long as it is positive. Higher level cache
capacities are required to be larger than lower level ones.
Please note that currently only the cache capacity value is being used.
Example strings and usage in the BASH shell:

RSB_USER_SET_MEM_HIERARCHY_INFO="L2:4/64/512K,L1:8/64/32K" <your program>
RSB_USER_SET_MEM_HIERARCHY_INFO="L1:8/128/2M" <your program>
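
To make the string format described above concrete, here is a small
stand-alone C sketch that parses such a string and checks the capacity
ordering rule. It is not librsb's own parser: the `cache_level` structure and
the `parse_num`, `parse_memhinfo` and `memhinfo_is_consistent` names are
hypothetical, and the sketch does no overflow checking.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical record for one "L<l>:<a_l>/<b_l>/<c_l>" entry. */
struct cache_level {
        long long level, assoc, line, size;
};

/* Read a positive integer at *sp, with an optional K/M/G multiplier when
 * allowed; advance *sp past it. Returns -1 if no positive number is there. */
static long long parse_num(const char **sp, int allow_multiplier)
{
        char *end;
        long long v = strtoll(*sp, &end, 10);

        if (end == *sp || v <= 0)
                return -1;
        if (allow_multiplier) {
                switch (*end) {
                case 'K': v <<= 10; ++end; break;
                case 'M': v <<= 20; ++end; break;
                case 'G': v <<= 30; ++end; break;
                default: break;
                }
        }
        *sp = end;
        return v;
}

/* Parse comma-separated entries into lv[0..max-1].
 * Returns the number of entries parsed, or -1 on malformed input. */
static int parse_memhinfo(const char *s, struct cache_level *lv, int max)
{
        int n = 0;

        assert(s != NULL && lv != NULL);
        while (*s != '\0') {
                if (n == max || *s++ != 'L')
                        return -1;
                if ((lv[n].level = parse_num(&s, 0)) < 0 || *s++ != ':')
                        return -1;
                if ((lv[n].assoc = parse_num(&s, 1)) < 0 || *s++ != '/')
                        return -1;
                if ((lv[n].line = parse_num(&s, 1)) < 0 || *s++ != '/')
                        return -1;
                if ((lv[n].size = parse_num(&s, 1)) < 0)
                        return -1;
                ++n;
                if (*s == ',')
                        ++s;
                else if (*s != '\0')
                        return -1;
        }
        return n;
}

/* Check that a higher level always has a larger capacity than a lower one. */
static int memhinfo_is_consistent(const struct cache_level *lv, int n)
{
        int i, j;

        for (i = 0; i < n; ++i)
                for (j = 0; j < n; ++j)
                        if (lv[i].level > lv[j].level &&
                            lv[i].size <= lv[j].size)
                                return 0;
        return 1;
}
```

With the first example string, `parse_memhinfo` returns 2 and the L2 entry
gets a capacity of 524288 bytes (512 * 2^10); a truncated entry such as
`L1:8/64` is rejected with -1.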

Experimenting with this environment variable can help tuning performance.

Setting this environment variable may also be needed if automatic detection
fails (e.g. on very recent systems).

A default value for this memory hierarchy info string can be set at configure
time by using the `--with-memhinfo` configure option.

If you don't know values for these parameters, you can run the
`./scripts/linux-sys-cache.sh`
script to try to get a guess on a Linux system.
On other systems, please consult the available documentation.
E.g.: On Mac OS 10.6 it was possible to get this information by invoking
`sysctl -a | grep cache`.

You can control the active threads count with the OpenMP environment variable
`OMP_NUM_THREADS`, but also override it via `RSB_NUM_THREADS`.

DOCUMENTATION, EXAMPLES AND PROGRAMMING GUIDELINES
--------------------------------------------------------------------------------

The latest API documentation is reachable from http://librsb.sourceforge.net/ .

The `<rsb.h>` header file specifies the C API of librsb.
Header `<rsb.hpp>` provides two C++ classes for librsb and most of the
functionality of `<rsb.h>`.

The complete API documentation is generated by the `doxygen` tool in the `doc`
directory in both HTML and man formats, and gets installed with `make install`.
The librsb man pages can be listed with `apropos rsb` or `man -k rsb`.
If you wish not to use doxygen (or don't have it) you can skip documentation
generation by adding the `DOXYGEN=false` argument to `./configure`.

The `examples` directory has a number of working example programs.

The library declares symbols prefixed by `rsb_`.
To avoid name clashes, you should avoid declaring identifiers prefixed that way
in programs using librsb.

If `configure` has been invoked with the `--enable-sparse-blas-interface`
option, then the corresponding `BLAS_`- and `blas_`- prefixed symbols will
also be built.

CONFIGURE, BUILD AND BENCHMARK EXAMPLE
--------------------------------------------------------------------------------

The performance-critical parts of librsb are written in C and C++.
Compilation flags for C and C++ are specified by the CFLAGS and CXXFLAGS
variables.
Fortran flags can be set by FCFLAGS, but are of minor importance.
A sensible setup can be:

./configure CC=gcc CFLAGS='-Ofast' CXX=g++ CXXFLAGS='-Ofast -std=c++17' FC=gfortran --prefix=/opt/librsb

You should be able to specify other combinations of compilers, like
icx+icpx+ifc or icc+icpc+ifort. With some luck, you can even mix compilers
from different suites.

If you wish to build the librsb benchmark program with the MKL library (say,
for performance comparison purposes), you can use one of the following three
recipes as a base.

With gcc, 64 bit (and a few more options you may want to adapt):

export MKLROOT=/opt/intel/mkl
./configure \
CC=gcc CFLAGS='-Ofast' \
CXX=g++ CXXFLAGS='-Ofast -std=c++17' \
FC=gfortran \
--with-mkl="-static -L${MKLROOT}/lib/intel64 \
-Wl,--start-group,-lmkl_intel_lp64,-lmkl_gnu_thread,-lmkl_core,--end-group \
-fopenmp -lpthread" \
--with-memhinfo=L2:4/64/512K,L1:8/64/24K \
--with-mkl-include=/opt/intel/mkl/include/

With icc, 64 bit:

export MKLROOT=/opt/intel/mkl
./configure \
CC=icc CFLAGS='-Ofast' \
CXX=icpc CXXFLAGS='-Ofast -std=c++17' \
FC=ifort \
--with-mkl="-static -L${MKLROOT}/lib/intel64 -openmp -lpthread \
-Wl,--start-group,-lmkl_intel_lp64,-lmkl_intel_thread,-lmkl_core,--end-group" \
--with-memhinfo=L2:4/64/512K,L1:8/64/24K \
--with-mkl-include=/opt/intel/mkl/include/

or 32 bit:

./configure \
CC=gcc CFLAGS='-Ofast' \
CXX=g++ CXXFLAGS='-Ofast -std=c++17' \
FC=gfortran \
--with-memhinfo=L2:4/64/512K,L1:8/64/24K \
--with-mkl="-static -L/opt/intel/mkl/lib/ia32/ -lmkl_solver \
-Wl,--start-group,-lmkl_intel,-lmkl_gnu_thread,-lmkl_core,--end-group \
-fopenmp -lpthread" \
--with-mkl-include=/opt/intel/mkl/include/

Once you have chosen your configure options, you can build:

make cleanall # (optional) delete generated sources
make # build library, documentation, tests, examples
make qtests # optional
make install # optional
make itests # optional

If you modified configure options impacting code generation, you may want to
run `make cleanall`, which deletes stale sources and ensures source
re-generation before the `make`.

To speed up 'make', consider using the parallelism option, e.g.:

make -j 3 # use 3 build threads

After the build, say you want to benchmark the library for SpMV/SpMM.
You have a Matrix Market file representing a matrix, `A.mtx`, and you want
to use 1 and 4 cores and type Z (double complex), running both SpMV and SpMM
with 2 right hand sides laid out in C (row-major) order.
Then running:

./rsbench -oa -Ob --bench -f A.mtx -qH -R -n1,4 -T z --verbose --nrhs 1,2 --by-rows

will output performance and timing results in a tabular form.

If you do not specify a type (argument to the `-T` option), all will be used.
Depending on the types configured in at build time, choices may be `-T D`
(where `D` is the BLAS prefix for `double`), `-T Z` (`Z` stands for
`double complex`) and so on.
You can also pass `-T :` to specify all of the configured types.
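
The "and so on" above refers to the usual BLAS naming convention, in which
`S` and `C` stand for `float` and `float complex`, next to the `D` and `Z`
already mentioned. The sketch below spells the mapping out; the function
`blas_prefix_to_ctype` is hypothetical (not a librsb or rsbench function),
accepting lower case letters is just a convenience of this sketch, and which
types are actually usable depends on how the library was configured.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical lookup (not a librsb function): map a BLAS type prefix to
 * the corresponding C type name; returns NULL for an unknown prefix. */
static const char *blas_prefix_to_ctype(char prefix)
{
        switch (toupper((unsigned char)prefix)) {
        case 'S': return "float";
        case 'D': return "double";
        case 'C': return "float complex";
        case 'Z': return "double complex";
        default:  return NULL;
        }
}
```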

For more options and configuration information, invoke:

./rsbench -oa -Ob --help
./rsbench --help
./rsbench --version
./rsbench -I
./rsbench -C

Example contents of a Matrix Market matrix file:

%%MatrixMarket matrix coordinate pattern general
% This is a comment.
% See other examples in the distributed *.mtx files.
2 2 3
1 1
2 1
2 2
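
To clarify how the file above is laid out (a banner line, optional `%`
comment lines, a "rows columns nonzeroes" line, then one "row column" pair
per nonzero with 1-based indices), here is a minimal reader sketch in C.
It is an assumption-laden illustration, not librsb's I/O code: the function
name `mm_read_pattern` is hypothetical, it works on the file contents held
in a string, and it accepts only the exact `coordinate pattern general`
banner shown above.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical reader for the "coordinate pattern general" flavour only.
 * text: whole file contents; ia/ja: output arrays (1-based indices, as in
 * the file) with room for cap entries. Returns nnz, or -1 on error. */
static int mm_read_pattern(const char *text, int *nr, int *nc,
                           int *ia, int *ja, int cap)
{
        static const char banner[] =
                "%%MatrixMarket matrix coordinate pattern general";
        int nnz, k, used;

        assert(text != NULL && nr != NULL && nc != NULL);
        assert(ia != NULL && ja != NULL);
        if (strncmp(text, banner, sizeof banner - 1) != 0)
                return -1;
        /* Skip the banner and any comment lines following it. */
        while (*text == '%') {
                const char *eol = strchr(text, '\n');

                if (eol == NULL)
                        return -1;
                text = eol + 1;
        }
        /* Size line: rows, columns, nonzeroes. */
        if (sscanf(text, "%d %d %d%n", nr, nc, &nnz, &used) != 3)
                return -1;
        if (*nr <= 0 || *nc <= 0 || nnz < 0 || nnz > cap)
                return -1;
        text += used;
        /* One "row column" pair per nonzero; pattern files carry no values. */
        for (k = 0; k < nnz; ++k) {
                if (sscanf(text, "%d %d%n", &ia[k], &ja[k], &used) != 2)
                        return -1;
                if (ia[k] < 1 || ia[k] > *nr || ja[k] < 1 || ja[k] > *nc)
                        return -1;
                text += used;
        }
        return nnz;
}
```

Applied to the example above, it reports a 2 x 2 matrix with 3 nonzeroes at
(1,1), (2,1) and (2,2).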

COMPATIBILITY
--------------------------------------------------------------------------------

This library was developed mostly on Debian Linux and using only free software.

This library has been built and tested on Unix machines.
Microsoft Windows users can try building librsb under the Cygwin environment.

(Ancient comment kept for the nostalgic reader)
Some tricks may have to be used on IBM AIX. For instance, adding the
`--without-xdr` or the `--without-zlib` switch to `./configure`.
Your mileage may vary.
AIX's `make` program may give problems; use the GNU version `gmake` instead;
the same shall be done with the M4 interpreter.

FAQ
--------------------------------------------------------------------------------

Q: **Can you provide me good configure defaults for an optimized build ?**

A: Default `./configure` options are appropriate for an optimized build.
A good starting point for `gcc` is `./configure CC=gcc CFLAGS='-O3'`.
However, if you need complex arithmetic and are using GCC, I'd advise using
`-Ofast`. On many versions of GCC I observed sub-optimal complex arithmetic
performance with `-O3`, regardless of the use of e.g. `-mtune=native`.
For more, consult your compiler documentation (e.g. `man gcc`, `man icx`,
`man icc`),
and learn about the best flags for your specific platform.
Stripping your executable (`make install-strip` for librsb's `rsbench`) may
help.

Q: **I am a beginner and I wish librsb to be very verbose when I invoke
library interface functions incorrectly.
Can you provide me good configure defaults for such a "debug" build ?**

A: Yes: via the `./scripts/configure_for_debug.sh` script.

Q: **I have problems linking, seems like some Fortran library is missing.
What should I do?**

A: Did you try `./configure --enable-fortran-linker` ?

Q: **I have machine X, compiler Y, compiling flags Z; is SpMV performance P
with matrix M good ?**

A: In general, hard to tell. You can `make hinfo.log` and send to me (see
CONTACTS) the `hinfo.log` file and your matrix in Matrix Market format
(well, please don't send matrices by email but rather upload them
somewhere on the web and send a URL to them).
The `hinfo.log` file will contain useful compile and machine information.
Then I *may* get an idea about the performance you should get with that
matrix on that computer.

Q: **What is the Sparse BLAS ?**

A: It's a programming interface specification:

* [sparseblas_2001]:
BLAS Technical Forum Standard, Chapter 3, Sparse BLAS.
http://www.netlib.org/blas/blast-forum/chapter3.pdf

* [dhp_2002]:
An Overview of the Sparse Basic Linear Algebra Subprograms:
The New Standard from the BLAS Technical Forum.
IAIN S. DUFF, CERFACS and Rutherford Appleton Laboratory.
MICHAEL A. HEROUX, Sandia National Laboratories.
ROLDAN POZO, National Institute of Standards and Technology.

* [dv_2002]:
Algorithm 818:
A Reference Model Implementation of the Sparse BLAS in Fortran 95.
IAIN S. DUFF, CERFACS, France and Atlas Centre, RAL, England.
CHRISTOF VÖMEL, CERFACS, France.

Q: **Is there an easy way to profile librsb usage in my application ?**

A: Yes: build with `--enable-librsb-stats` and extract time elapsed in librsb
via e.g.:`rsb_lib_get_opt(RSB_IO_WANT_LIBRSB_ETIME,&etime);`.

Q: **Why another sparse matrix library ?**

A: This library is the fruit of the author's PhD work, focused on researching
improved multi threaded and cache friendly matrix storage schemes for e.g.
PSBLAS.

Q: **What are the key features of this library when compared to other ones ?**

A: Recursive storage, a code generator, parallel BLAS operations
(including matrix assembly, matrix-matrix multiplication, transposed
matrix-vector multiply), a rich test suite, a Sparse BLAS
interface and a free software license.

Q: **How do I detect librsb from my package's `configure` script ?**

A: Please check out `examples/configure.ac` and `examples/makefile.am`.

Q: **How is correctness checked in the librsb test suite ?**

A: In different ways, often configuration dependent.
Check out the `qtests` and `tests` targets in the Makefile.

Q: **Why did you originally write the library in C and not in C++ ?**

A: C is pretty easy to interface with C++ and Fortran.
Using a debugger in C++ can be a headache.
Also C's `restrict` keyword.

Q: **Why did you use C and not Fortran ?**

A: This library is slightly system-oriented, and system calls interfacing is
much easier in C. Also C's pointer arithmetic was desirable.

Q: **Is there a quick and easy way to perform an artificial performance
test with huge matrices without having to program ?**

A: Express your matrix in the Matrix Market format as the `A.mtx` file and
use it as e.g.:

./rsbench -oa -Ob --bench -f A.mtx --verbose --nrhs 1,4 --by-rows

Q: **I've found a bug! What should I do ?**

A: First please make sure it is really a bug:
read the documentation, check, double check.
Then you can write a description of the problem, with a minimal program
source code and data to replicate it.
Then you can jump to the CONTACTS details section.

Q: **Is it possible to build matrices of, say, `long double` or
`long double complex` or `int` or `short int` ?**

A: Yes. Invoke the configure script accordingly, e.g.:
`--enable-matrix-types="long double"` or
`--enable-matrix-types="double,long double,double complex"`
If this breaks code compilation, feel free to contact the author
(see the CONTACTS section).
Note that some combinations may break `make qtests` or other test recipes.

Q: **Is there a way to compare the performance of this library to some other
high performance libraries ?**

A: If you build `rsbench` with support for the Intel MKL library, then you
can do performance comparisons with e.g.:
`# ./rsbench -oa -Ob -qH -R --gen-diag 100 --compare-competitors --verbose`
or use the following script:
`# bench/dense.sh ' '`
Or even better, check out the `--write-performance-record` feature.
For details see the output of:
`# rsbench -oa -Ob --help`

Q: **Is there a non-threaded (serial) version of librsb ?**

A: Yes: you can configure the library to work serially (with no OpenMP).
See `./configure --help`.

Q: **Is this library thread-safe ?**

A: Probably yes: no static buffers are being used, and reentrant C standard
library functions are invoked.

Q: **Does the librsb library run on GPUs?**

A: Not yet.

Q: **I built and compiled the code without enabling any BLAS type (S,D,C,Z),
and both `make qtests` and `make tests` ran successfully outside the
`./examples` directory, but `make tests` breaks within `./examples`
directory.**

A: Well, the tests passed because the examples testing was simply skipped.
The example programs need at least one of these types to work.

Q: **At build time I get many "unused variable" warnings. Why ?**

A: librsb accommodates many code generation and build time configuration
options. Some combinations may turn off compilation of certain parts of the
code, leading some variables to be unused.

Q: **Are there papers to read about the RSB format and algorithms ?**

A: Yes, the following:

* Michele Martone, Simone Bacchio.
Portable performance on multi-threaded Sparse BLAS operations with PyRSB
Proceedings of SciPy 2021 (Scientific Python Conference).
https://doi.org/10.25080/majora-1b6fd038-00e

* Michele Martone.
Efficient Multithreaded Untransposed, Transposed or Symmetric Sparse
Matrix-Vector Multiplication with the Recursive Sparse Blocks Format.
Parallel Computing 40(7): 251-270 (2014).
http://dx.doi.org/10.1016/j.parco.2014.03.008

* Michele Martone.
Cache and Energy Efficiency of Sparse Matrix-Vector Multiplication for
Different BLAS Numerical Types with the RSB Format.
Proceedings of the ParCo 2013 conference, September 2013, Munich, Germany.
PARCO 2013: 193-202.
http://dx.doi.org/10.3233/978-1-61499-381-0-193

* Michele Martone, Marcin Paprzycki, Salvatore Filippone.
An Improved Sparse Matrix-Vector Multiply Based on Recursive Sparse Blocks
Layout.
LSSC 2011: 606-613.
http://dx.doi.org/10.1007/978-3-642-29843-1_69

* Michele Martone, Salvatore Filippone, Salvatore Tucci, Marcin Paprzycki,
Maria Ganzha.
Utilizing Recursive Storage in Sparse Matrix-Vector Multiplication -
Preliminary Considerations. CATA 2010: 300-305.

* Michele Martone, Salvatore Filippone, Marcin Paprzycki, Salvatore Tucci.
Assembling Recursively Stored Sparse Matrices. IMCSIT 2010: 317-325.
http://www.proceedings2010.imcsit.org/pliks/205.pdf

* Michele Martone, Salvatore Filippone, Pawel Gepner, Marcin Paprzycki,
Salvatore Tucci.
Use of Hybrid Recursive CSR/COO Data Structures in Sparse Matrices-Vector
Multiplication.
IMCSIT 2010: 327-335.
http://dx.doi.org/10.1109/SYNASC.2010.72

* Michele Martone, Salvatore Filippone, Marcin Paprzycki, Salvatore Tucci.
On BLAS Operations with Recursively Stored Sparse Matrices.
SYNASC 2010: 49-56.
http://dx.doi.org/10.1109/SYNASC.2010.72

* Michele Martone, Salvatore Filippone, Marcin Paprzycki, Salvatore Tucci.
On the Usage of 16 Bit Indices in Recursively Stored Sparse Matrices.
SYNASC 2010: 57-64.
http://dx.doi.org/10.1109/SYNASC.2010.77

Q: **I have M4-related problems on IBM SP5/SP6 (my M4 preprocessor tries to
regenerate code but it fails). What should I do ?**

A: A fix is to use a GNU M4 implementation
e.g.: `M4=/opt/freeware/bin/m4 ./configure ...`
e.g.: `M4=gm4 ./configure ...`
or execute:
`touch *.h ; touch *.c ; make`
Or `./configure; make` the library on a different machine, then build
a sources archive with `make dist`, and use it on the original machine.
And BTW, consider donating them machines to a computer museum, like
https://museo.freaknet.org/

POSSIBLE FUTURE ENHANCEMENTS
--------------------------------------------------------------------------------

* auxiliary functions for numerical vectors
* CSC,BCSR,BCSC and other block-level formats
* performance prediction/estimation facilities (experimental)
* types of the blocks, nonzeroes, and coordinates indices can be user specified
* automatic matrix blocking selection (for BCSR/BCSC)
* an arbitrary subset of block size kernels can be specified to be generated
* recursive storage variants of blocked formats (non uniform blocking)
* more auto-tuning and prediction control

ABOUT THE INTERNALS
--------------------------------------------------------------------------------

The following good practices are being followed during development of librsb:

- only symbols beginning with `rsb_` or `blas_` or `BLAS_` are being exported
- internal functions are usually prefixed by `rsb__`
- no library internal function is expected to call any API function

BUGS
--------------------------------------------------------------------------------

If you encounter any bug (e.g.: mismatch of library/program behaviour and
documentation), please let me know about it by sending me (see CONTACTS) all
relevant information (code snippet, originating data/matrix, `config.log`), in
such a way that I can replicate the bug behaviour on my machines.
If the bug occurred when using rsb interfaced to some proprietary library,
please make sure the bug is in librsb.

It may be of great help to you to build the library with the debug compile
options on (e.g.: `CFLAGS='-O0 -ggdb'`), and with appropriate library verbosity
levels, e.g. (`--enable-internals-error-verbosity=1` and
`--enable-interface-error-verbosity=1` options to configure) to better
understand the program behaviour before sending a report.

Make sure you have the latest version of the library when reporting a bug.

CONTACTS
--------------------------------------------------------------------------------

You are welcome to contact the librsb author:

Michele Martone `<michelemartone AT users DOT sourceforge DOT net>`

Please specify "librsb" in the "Subject:" line of your emails.

More information and downloads on http://sourceforge.net/projects/librsb

Mailing list: https://lists.sourceforge.net/lists/listinfo/librsb-users

CREDITS (in alphabetical order)
--------------------------------------------------------------------------------

For librsb-1.3:

* Matthieu A. Simonin for testing, discussion and suggestions.
* Rafael Laboissiere for continued help in aspects of build, documentation,
testing and distribution.

For librsb-1.2:

* Marco Atzeri provided testing, patches to build librsb under cygwin over
nearly each release, and spotted a few bugs.
* Fabio Cassini spotted an unintended conversion via sparsersb and +.
* John Donoghue spotted a rendering corner case bug.
* Sebastian Koenig spotted a computational bug in -rc6.
* Rafael Laboissiere helped a lot improving the documentation and the build
system.
* Mu-Chu Lee provided a patch to fix sorting code crashing with > 10^9 nnz.
* Constanza Manassero spotted an inconsistency in the usmm/ussm interface.
* Markus Muetzel helped debugging rsb_mtx_rndr().
* Dmitri Sergatskov spotted a double free in rsb_mtx_rndr() and convinced about
the necessity of sanitizing memory usage.

For librsb-1.1:

* Gilles Gouaillardet provided a patch for OpenMP-encapsulated I/O.
* Marco Restelli provided testing, detailed comments and suggestions.

For librsb-1.0:

* Francis Casson helped with testing and documentation reviewing during
the first release.
* Nitya Hariharan helped revising early versions of the documentation.

ACKNOWLEDGEMENTS
--------------------------------------------------------------------------------

In 2019-2021, librsb has been developed under the PRACE-6IP-WP8 (GRANT
AGREEMENT NUMBER 823767 -- PRACE-6IP) EU project LyNcs (partners:
Computation-based Science and Technology Research Centre (CaSToRC) of The
Cyprus Institute and Inria).

LICENSE
--------------------------------------------------------------------------------

This software is distributed under the terms of the GNU Lesser General Public
License version 3 (LGPLv3) or later.
See the COPYING file for a copy of the LGPLv3.

librsb is free software.
To support it, consider writing to the author and acknowledging use of librsb
in your publications.
All that would be very appreciated.

--------------------------------------------------------------------------------

For a quick startup, consider the following three example programs.

The first, using the RSB interface (for native C++, please see the optional rsblib/examples/example.cpp):

/*
Copyright (C) 2008-2021 Michele Martone
This file is part of librsb.
librsb is free software; you can redistribute it and/or modify it
under the terms of the GNU Lesser General Public License as published
by the Free Software Foundation; either version 3 of the License, or
(at your option) any later version.
librsb is distributed in the hope that it will be useful, but WITHOUT
ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
FITNESS FOR A PARTICULAR PURPOSE.  See the GNU Lesser General Public
License for more details.
You should have received a copy of the GNU Lesser General Public
License along with librsb; see the file COPYING.
If not, see <http://www.gnu.org/licenses/>.
*/
/*!
 \ingroup rsb_doc_examples
 @file
 @author Michele Martone
 @brief A first "hello RSB" example C program based on <rsb.h>.
   Uses #rsb_lib_set_opt(), #rsb_mtx_get_info_str().
 \include hello.c
*/
#include <rsb.h>        /* librsb header to include */
#include <stdio.h>      /* printf() */
#include <stdlib.h>     /* EXIT_SUCCESS, EXIT_FAILURE */
int main(const int argc, char * const argv[])
{
        /*!
          A Hello-RSB program.
         
          This program shows how to use the rsb.h interface correctly to:
         
          - initialize the library using #rsb_lib_init()
          - set library options using #rsb_lib_set_opt()
          - revert such changes 
          - allocate (build) a single sparse matrix in the RSB format
            using #rsb_mtx_alloc_from_coo_const()
          - print information obtained via #rsb_mtx_get_info_str()
          - multiply the matrix times a vector using #rsb_spmv()
          - deallocate the matrix using #rsb_mtx_free() 
          - finalize the library using #rsb_lib_exit()
         
          In this example, we use #RSB_DEFAULT_TYPE as matrix type.
          This type depends on what was configured at library build time.
         * */
        const rsb_blk_idx_t bs = RSB_DEFAULT_BLOCKING;
        const rsb_blk_idx_t brA = bs, bcA = bs;
        const RSB_DEFAULT_TYPE one = 1;
        const rsb_type_t typecode = RSB_NUMERICAL_TYPE_DEFAULT;
        const rsb_nnz_idx_t nnzA = 4;           /* matrix nonzeroes count */
        const rsb_coo_idx_t nrA = 3;            /* matrix rows count */
        const rsb_coo_idx_t ncA = 3;            /* matrix columns count */
        /* nonzero row indices coordinates: */
        const rsb_coo_idx_t IA[] = {0,1,2,2};
        /* nonzero column indices coordinates: */
        const rsb_coo_idx_t JA[] = {0,1,2,2};
        const RSB_DEFAULT_TYPE VA[] = {11,22,32,1};/* values of nonzeroes */
        RSB_DEFAULT_TYPE X[] = { 0, 0, 0 };     /* X vector's array */
        const RSB_DEFAULT_TYPE B[] = { -1, -2, -5 }; /* B vector's array */
        char ib[200];
        struct rsb_mtx_t *mtxAp = NULL; /* matrix structure pointer */
        rsb_err_t errval = RSB_ERR_NO_ERROR;
        printf("Hello, RSB!\n");
        printf("Initializing the library...\n");
        if((errval = rsb_lib_init(RSB_NULL_INIT_OPTIONS)) != 
                        RSB_ERR_NO_ERROR)
        {
                printf("Error initializing the library!\n");
                goto err;
        }
        printf("Correctly initialized the library.\n");
        printf("Attempting to set the"
               " RSB_IO_WANT_EXTRA_VERBOSE_INTERFACE library option.\n");
        {
                rsb_int_t evi=1; 
                /* Setting a single optional library parameter. */
                errval = rsb_lib_set_opt(
                        RSB_IO_WANT_EXTRA_VERBOSE_INTERFACE, &evi);
                if(errval != RSB_ERR_NO_ERROR)
                {
                        char errbuf[256];
                        rsb_strerror_r(errval,&errbuf[0],sizeof(errbuf));
                        printf("Failed setting the"
                        " RSB_IO_WANT_EXTRA_VERBOSE_INTERFACE"
                        " library option (reason string:\n%s).\n",errbuf);
                        if(errval&RSB_ERRS_UNSUPPORTED_FEATURES)
                        {
                          printf("This error may be safely ignored.\n");
                        }
                        else
                        {
                          printf("Some unexpected error occurred!\n");
                          goto err;
                        }
                }
                else
                {
                        printf("Setting back the "
                                "RSB_IO_WANT_EXTRA_VERBOSE_INTERFACE"
                                " library option.\n");
                        evi = 0;
                        errval = rsb_lib_set_opt(RSB_IO_WANT_EXTRA_VERBOSE_INTERFACE,
                                        &evi);
                        /* any error from reverting the option is discarded */
                        errval = RSB_ERR_NO_ERROR;
                }
        }
        mtxAp = rsb_mtx_alloc_from_coo_const(
                VA,IA,JA,nnzA,typecode,nrA,ncA,brA,bcA,
                RSB_FLAG_NOFLAGS    /* default format will be chosen */
                |RSB_FLAG_DUPLICATES_SUM/* duplicates will be summed */
                        ,&errval);
        if((!mtxAp) || (errval != RSB_ERR_NO_ERROR))
        {
                printf("Error while allocating the matrix!\n");
                goto err;
        }
        printf("Correctly allocated a matrix.\n");
        printf("Summary information of the matrix:\n");
        /* print out the matrix summary information  */
        rsb_mtx_get_info_str(mtxAp,"RSB_MIF_MATRIX_INFO__TO__CHAR_P",
                        ib,sizeof(ib));
        printf("%s",ib);
        printf("\n");
        if((errval = 
                rsb_spmv(RSB_TRANSPOSITION_N,&one,mtxAp,B,1,&one,X,1))
                        != RSB_ERR_NO_ERROR )
        {
                printf("Error performing a multiplication!\n");
                goto err;
        }
        printf("Correctly performed a SPMV.\n");
        rsb_mtx_free(mtxAp);
        printf("Correctly freed the matrix.\n");
        if((errval = rsb_lib_exit(RSB_NULL_EXIT_OPTIONS))
                        != RSB_ERR_NO_ERROR)
        {
                printf("Error finalizing the library!\n");
                goto err;
        }
        printf("Correctly finalized the library.\n");
        printf("Program terminating with no error.\n");
        return EXIT_SUCCESS;
err:
        rsb_perror(NULL,errval);
        printf("Program terminating with error.\n");
        return EXIT_FAILURE;
}

The second, using the Sparse BLAS interface:

/*
Copyright (C) 2008-2021 Michele Martone
This file is part of librsb.
librsb is free software; you can redistribute it and/or modify it
under the terms of the GNU Lesser General Public License as published
by the Free Software Foundation; either version 3 of the License, or
(at your option) any later version.
librsb is distributed in the hope that it will be useful, but WITHOUT
ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
FITNESS FOR A PARTICULAR PURPOSE.  See the GNU Lesser General Public
License for more details.
You should have received a copy of the GNU Lesser General Public
License along with librsb; see the file COPYING.
If not, see <http://www.gnu.org/licenses/>.
*/
/*!
 \ingroup rsb_doc_examples
 @file
 @author Michele Martone
 @brief A first C "hello RSB" example program using
        a Sparse BLAS interface and <rsb.h>.
        Uses #BLAS_duscr_begin(), #BLAS_ussp(), #BLAS_usgp(),
        #BLAS_duscr_insert_entries(), #BLAS_duscr_end(),
        #BLAS_dusget_element(),#BLAS_dusmv(),#BLAS_usds().
 \include hello-spblas.c
*/
#include <rsb.h>        /* for rsb_lib_init */
#include <blas_sparse.h>        /* Sparse BLAS on top of librsb */
#include <stdio.h>      /* printf */
#include <stdlib.h>     /* EXIT_SUCCESS, EXIT_FAILURE */
int main(const int argc, char * const argv[])
{
        /*!
         * A Hello/Sparse BLAS program.
         *
         * This program shows how to use the blas_sparse.h
         * interface correctly to:
         *
         * - initialize the library using #rsb_lib_init()
         * - allocate (build) a single sparse matrix in the RSB
         *   format using #BLAS_duscr_begin()/#BLAS_duscr_insert_entries()
         *   /#BLAS_duscr_end()
         * - extract one matrix element with #BLAS_dusget_element()
         * - multiply the matrix times a vector using #BLAS_dusmv()
         * - deallocate the matrix using #BLAS_usds() 
         * - finalize the library using
         *   #rsb_lib_exit(#RSB_NULL_EXIT_OPTIONS) 
        */
#ifndef RSB_NUMERICAL_TYPE_DOUBLE   
        printf("'double' type configured out."
        " Please reconfigure the library with it and recompile.\n");
        return EXIT_SUCCESS;
#else /* RSB_NUMERICAL_TYPE_DOUBLE */
        blas_sparse_matrix A = blas_invalid_handle; /* handle for A */
        const int nnz = 4;      /* number of nonzeroes of matrix A */
        const int  nr = 3;      /* number of A's rows */
        const int  nc = 3;      /* number of A's columns */
        /* A's nonzero elements row indices (coordinates): */
#ifdef RSB_WANT_LONG_IDX_TYPE 
        const int64_t IA[] = { 0, 1, 2, 2 };
#else /* RSB_WANT_LONG_IDX_TYPE */
        const int   IA[] = { 0, 1, 2, 2 };
#endif /* RSB_WANT_LONG_IDX_TYPE */
        /* A's nonzero elements column indices (coordinates): */
#ifdef RSB_WANT_LONG_IDX_TYPE 
        const int64_t JA[] = { 0, 1, 0, 2 };
#else /* RSB_WANT_LONG_IDX_TYPE */
        const int   JA[] = { 0, 1, 0, 2 };
#endif /* RSB_WANT_LONG_IDX_TYPE */
        /* A's nonzero values (matrix coefficients): */
        double VA[] = { 11.0, 22.0, 13.0, 33.0  };
        /* the X vector's array: */
        double X[] = { 0.0, 0.0, 0.0 };
        /* the B vector's array: */
        const double B[] = { -1.0, -2.0, -2.0 };
        /* the (known) result array: */
        const double AB[] = { 11.0+26.0, 44.0, 66.0+13.0 };
        /* rsb error variable: */
        rsb_err_t errval = RSB_ERR_NO_ERROR;
        int i;
        printf("Hello, RSB!\n");
        /* initialize the library */
        if((errval = rsb_lib_init(RSB_NULL_INIT_OPTIONS)) 
                        != RSB_ERR_NO_ERROR)
        {
                goto err;
        }
        printf("Correctly initialized the library.\n");
        /* initialize a matrix descriptor */
        A = BLAS_duscr_begin(nr,nc);
        if( A == blas_invalid_handle )
        {
                goto err;
        }
        
        /* specify properties (e.g.: symmetry)*/
        if( BLAS_ussp(A,blas_lower_symmetric) != 0 )
        {
                goto err;
        }
        /* get properties (e.g.: symmetry) */
        if( BLAS_usgp(A,blas_lower_symmetric) != 1 )
        {
                printf("Symmetry property not set?!\n");
                goto err;
        }
        /* insert the nonzeroes (here, all at once) */
        if( BLAS_duscr_insert_entries(A, nnz, VA, IA, JA)
                        == blas_invalid_handle)
        {
                goto err;
        }
        /* finalize (allocate) the matrix build  */
        if( BLAS_duscr_end(A) == blas_invalid_handle )
        {
                goto err;
        }
        printf("Correctly allocated a matrix.\n");
        VA[0] = 0.0;
        if( BLAS_dusget_element(A, IA[0], JA[0], &VA[0]) )
        {
                goto err;
        }
        /* a check */
        if( VA[0] != 11.0 )
        {
                goto err;
        }
        /* compute X = X + (-1) * A * B   */
        if(BLAS_dusmv(blas_no_trans,-1,A,B,1,X,1))
        {
                goto err;
        }
        for( i = 0 ; i < nr; ++i ) /* X has one entry per row of A */
                if( X[i] != AB[i] )
                {
                        printf("Computed SPMV result seems wrong. Terminating.\n");
                        goto err;
                }
        printf("Correctly performed a SPMV.\n");
        /* deallocate matrix A */
        if( BLAS_usds(A) )
        {
                goto err;
        }
        printf("Correctly freed the matrix.\n");
        /* finalize the library */
        if((errval = rsb_lib_exit(RSB_NULL_EXIT_OPTIONS))
                        != RSB_ERR_NO_ERROR)
        {
                goto err;
        }
        printf("Correctly finalized the library.\n");
        printf("Program terminating with no error.\n");
        return EXIT_SUCCESS;
err:
        rsb_perror(NULL,errval);
        printf("Program terminating with error.\n");
        return EXIT_FAILURE;
#endif /* RSB_NUMERICAL_TYPE_DOUBLE */
}

For more, see the Example programs and code section.