Presentation
Randomized SVD on TensorCores
Session: Research Poster Session
Event Type: Research Poster
Pre-Recorded
Time: Tuesday, June 23rd, 4:45pm - 4:50pm
Location: Analog 1
Description
Low-rank approximation is vital and widely used for data compression, dimensionality reduction, and noise reduction.
Randomized SVD, which requires the computation of a QR factorization of a tall skinny matrix (TSQR), is a robust and efficient algorithm for computing a low-rank approximation.
We implement a TSQR that runs efficiently in a parallel environment with TensorCores and use it to compute a Randomized SVD.
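For context, the overall procedure fits in a few lines of NumPy. This is a minimal sketch of the standard randomized SVD of Halko et al., not the TensorCore implementation itself; the function and parameter names (e.g. `oversample`) are illustrative. The QR factorization of the tall-skinny sketch matrix `Y` is the step that TSQR accelerates.

```python
import numpy as np

def randomized_svd(A, rank, oversample=10):
    """Rank-`rank` approximate SVD of A via random projection (Halko et al.)."""
    n = A.shape[1]
    k = rank + oversample
    # Sample the range of A with a Gaussian test matrix: Y is tall and skinny.
    Y = A @ np.random.randn(n, k)                  # m x k, with m >> k
    # Orthonormalize the sample; this QR of a tall-skinny matrix is TSQR's job.
    Q, _ = np.linalg.qr(Y)
    # Project A onto the captured subspace and take the small exact SVD.
    B = Q.T @ A                                    # k x n
    Ub, s, Vt = np.linalg.svd(B, full_matrices=False)
    return (Q @ Ub)[:, :rank], s[:rank], Vt[:rank, :]
```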
In TSQR, the input matrix is divided vertically into a column of blocks, each block is QR-factorized independently, and the stacked R factors are reduced recursively until the QR factorization of the full matrix is obtained.
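A minimal serial NumPy sketch of this recursion, assuming the matrix is tall and skinny (each row block has more rows than the matrix has columns); in the actual implementation the leaf factorizations run in parallel and the small matrix products map to TensorCores.

```python
import numpy as np

def tsqr(A, block_rows=256):
    """QR of a tall-skinny A via recursive row-block reduction (TSQR sketch)."""
    m, n = A.shape
    if m <= block_rows:                      # small enough: factor directly
        return np.linalg.qr(A)
    assert block_rows > n, "this sketch assumes a tall-skinny matrix"
    # Split into a column of row blocks; merge a too-short trailing block
    # so that every leaf produces a square n x n R factor.
    blocks = [A[i:i + block_rows] for i in range(0, m, block_rows)]
    if blocks[-1].shape[0] < n:
        blocks[-2:] = [np.vstack(blocks[-2:])]
    # Factor each block independently (the parallel leaves of the tree).
    qs, rs = zip(*(np.linalg.qr(b) for b in blocks))
    # Stack the small R factors and reduce them with a recursive TSQR call.
    Q_top, R = tsqr(np.vstack(rs), block_rows)
    # Propagate the reduction back into each leaf's Q factor.
    Q = np.vstack([q @ Q_top[i * n:(i + 1) * n] for i, q in enumerate(qs)])
    return Q, R
```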
In Randomized SVD, however, the matrix whose QR factorization is required is not skinny enough for our TensorCore TSQR implementation. We therefore employ a BlockQR algorithm that splits the matrix into sufficiently skinny column panels and applies the TSQR to them consecutively.
Between the consecutive TSQR applications, additional calculations are necessary to recover the QR factorization of the full matrix, as the sketch below illustrates.
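A NumPy sketch of this outer loop, written as block Gram-Schmidt (our reading of the BlockQR step; the names `block_qr` and `panel_cols` are illustrative). Here `np.linalg.qr` stands in for the TensorCore TSQR, and the projections `Q.T @ panel` and `Q @ R[...]` are the additional calculations between TSQR applications, which are matrix products that also suit TensorCores.

```python
import numpy as np

def block_qr(A, panel_cols=64):
    """QR of A by column panels: TSQR each panel after orthogonalizing it
    against the previously computed panels (block Gram-Schmidt sketch)."""
    m, n = A.shape
    Q = np.empty((m, 0))
    R = np.zeros((n, n))
    for j in range(0, n, panel_cols):
        panel = A[:, j:j + panel_cols].copy()
        if Q.shape[1]:
            # The inter-TSQR calculations: project the panel against the
            # already-orthonormal columns and record the coupling block of R.
            R[:j, j:j + panel_cols] = Q.T @ panel
            panel -= Q @ R[:j, j:j + panel_cols]
        Qp, Rp = np.linalg.qr(panel)     # stand-in for the TensorCore TSQR
        w = panel.shape[1]
        Q = np.hstack([Q, Qp])
        R[j:j + w, j:j + w] = Rp
    return Q, R
```

A single Gram-Schmidt pass like this can lose orthogonality in finite precision; practical BlockQR variants reorthogonalize or use Householder-style updates.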
TensorCores are specialized hardware for matrix multiplication and addition and are available on the latest NVIDIA GPUs.
Converting the input matrices to half precision for the TensorCores results in a loss of accuracy. We recover this accuracy with a correction technique that leverages the single-precision multiplication and addition available on TensorCores.
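One common form of such a correction (used, e.g., by Markidis et al.; we assume the poster's technique is in this family) splits each input matrix into an FP16 main part plus an FP16 residual, computes the three significant cross products with single-precision accumulation, and sums them in FP32. A NumPy emulation of the arithmetic:

```python
import numpy as np

def split_fp16(A):
    """Split a matrix into an FP16 main part and an FP16 residual."""
    hi = A.astype(np.float16)
    lo = (A - hi.astype(np.float32)).astype(np.float16)
    return hi, lo

def corrected_matmul(A, B):
    """Emulate half-precision matmul with residual correction:
    A @ B ~= Ah@Bh + Ah@Bl + Al@Bh, accumulated in FP32 (Al@Bl is below
    FP32 precision and dropped). Real kernels feed the FP16 tiles to the
    TensorCores and accumulate in FP32; the casts below mimic that."""
    Ah, Al = split_fp16(A)
    Bh, Bl = split_fp16(B)
    Ah, Al, Bh, Bl = (M.astype(np.float32) for M in (Ah, Al, Bh, Bl))
    return Ah @ Bh + Ah @ Bl + Al @ Bh
```

In practice the residual is typically also scaled before the FP16 cast to avoid underflow; the sketch omits this.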
We evaluate the speed, accuracy, and stability of the Randomized SVD on TensorCores. Using TensorCores and the correction technique, our approach computes a Randomized SVD with little loss of accuracy. It is also 1.5x faster in some cases and reduces working memory by 33% compared to cuSOLVER.