Basic Linear Algebra Subprograms (BLAS)

I ran into this while trying to implement my own accelerated version of Eigen.

NVIDIA has a GPU implementation called cuBLAS.

BLAS routines are grouped into 3 levels, by the kind of operands they work on:

Level 1 BLAS: Vector-vector Operations. $y \leftarrow α x + y$
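The Level 1 update above is what BLAS calls `axpy`. To make the formula concrete, here is a minimal, unoptimized C++ sketch. The name `axpy_ref` and the use of `std::vector<double>` are my own assumptions for illustration, not the BLAS or cuBLAS interface.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Reference (unoptimized) Level 1 update: y <- alpha * x + y.
// `axpy_ref` is a hypothetical helper name, not a BLAS or cuBLAS symbol.
void axpy_ref(double alpha, const std::vector<double>& x, std::vector<double>& y) {
    assert(x.size() == y.size());  // both vectors must have the same length
    for (std::size_t i = 0; i < x.size(); ++i) {
        y[i] += alpha * x[i];      // y is updated in place
    }
}
```

Note that `y` is both an input and the output, which is why the formula is written with an arrow rather than an equals sign.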

Level 2 BLAS: Matrix-vector operations.

$y \leftarrow α A x + β y$
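This Level 2 update is what BLAS calls `gemv`. A naive C++ sketch of it could look like the following. The name `gemv_ref`, the explicit `m` and `n` arguments, and the row-major layout of `A` are assumptions made for this example (reference BLAS follows the Fortran column-major convention).

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Reference (unoptimized) Level 2 update: y <- alpha * A * x + beta * y.
// A is m x n, stored row-major in a flat vector (an assumption of this sketch).
// `gemv_ref` is a hypothetical helper name, not a BLAS or cuBLAS symbol.
void gemv_ref(std::size_t m, std::size_t n, double alpha,
              const std::vector<double>& A, const std::vector<double>& x,
              double beta, std::vector<double>& y) {
    assert(A.size() == m * n);
    assert(x.size() == n);
    assert(y.size() == m);
    for (std::size_t i = 0; i < m; ++i) {
        double acc = 0.0;                  // dot product of row i of A with x
        for (std::size_t j = 0; j < n; ++j) {
            acc += A[i * n + j] * x[j];
        }
        y[i] = alpha * acc + beta * y[i];  // scale and accumulate into y
    }
}
```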

Level 3 BLAS: Matrix-matrix operations. $C \leftarrow α A B + βC$
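This Level 3 update is what BLAS calls `gemm`. As a rough sketch of what the formula computes, here is a triple-loop C++ version. The name `gemm_ref`, the dimension arguments, and the row-major layout are assumptions for illustration; a tuned library would block and vectorize this rather than use plain loops.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Reference (unoptimized) Level 3 update: C <- alpha * A * B + beta * C.
// A is m x k, B is k x n, C is m x n, all row-major (an assumption of this sketch).
// `gemm_ref` is a hypothetical helper name, not a BLAS or cuBLAS symbol.
void gemm_ref(std::size_t m, std::size_t n, std::size_t k, double alpha,
              const std::vector<double>& A, const std::vector<double>& B,
              double beta, std::vector<double>& C) {
    assert(A.size() == m * k);
    assert(B.size() == k * n);
    assert(C.size() == m * n);
    for (std::size_t i = 0; i < m; ++i) {
        for (std::size_t j = 0; j < n; ++j) {
            double acc = 0.0;  // entry (i, j) of the product A * B
            for (std::size_t p = 0; p < k; ++p) {
                acc += A[i * k + p] * B[p * n + j];
            }
            C[i * n + j] = alpha * acc + beta * C[i * n + j];
        }
    }
}
```

Level 3 does on the order of $mnk$ arithmetic operations on only $mk + kn + mn$ matrix entries, which is why it benefits the most from an accelerated implementation.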

🛠️ Steven Gong