Singular Value Decomposition (SVD) is a powerful matrix factorization technique from linear algebra. It decomposes any rectangular matrix ($\mathbf{M}$) into three simpler matrices: two orthogonal matrices ($\mathbf{U}$ and $\mathbf{V}^T$) and a diagonal matrix ($\mathbf{\Sigma}$) containing the singular values. SVD is a generalization of the eigendecomposition of a square matrix and is one of the most widely used methods for dimensionality reduction, noise reduction, and data compression in machine learning.
$$\mathbf{M} = \mathbf{U} \mathbf{\Sigma} \mathbf{V}^T$$
Context: Relation to LLMs and Search
SVD was a foundational technique for latent semantic models in classical natural language processing (NLP) and is still used today for tasks such as preprocessing, dimensionality reduction, and understanding model architecture in Generative Engine Optimization (GEO).
- Latent Semantic Indexing (LSI): Historically, SVD was the core of LSI, an early form of Vector Search that attempted to solve the lexical mismatch problem by discovering latent semantic concepts in text. By applying SVD to a document-term matrix (like a Sparse Matrix generated by TF-IDF), the model could map documents and queries to a low-dimensional Latent Space, where terms with similar meanings would be close together.
- Dimensionality Reduction: SVD can compress large feature vectors (e.g., millions of dimensions) into much smaller, dense vectors (e.g., hundreds of dimensions) by discarding the smallest singular values and retaining only the most important linear combinations of the original data, enabling efficient storage and computation.
- Preprocessing for LLMs: In specialized Large Language Models (LLMs) and component-level models (like the Retrieval component of a Retrieval-Augmented Generation (RAG) system), SVD can be used to preprocess massive input data or reduce the size of high-dimensional feature vectors before they are fed into the network.
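The LSI idea above can be sketched in a few lines of numpy. The toy corpus, term counts, and variable names here are our own illustration, not from any particular system:

```python
import numpy as np

# Toy document-term matrix (rows = documents, columns = counts of the
# terms car, truck, road, pasta). The corpus is illustrative only.
M = np.array([
    [1.0, 0.0, 1.0, 0.0],   # document about cars on roads
    [0.0, 1.0, 1.0, 0.0],   # document about trucks on roads
    [0.0, 0.0, 0.0, 2.0],   # document about cooking pasta
])

# Truncated SVD: keep only the top k = 2 singular values/vectors.
U, s, Vt = np.linalg.svd(M, full_matrices=False)
k = 2
docs_latent = U[:, :k] * s[:k]   # each document as a dense 2-D vector

# "car" and "truck" never co-occur, yet both co-occur with "road", so the
# latent space places the two vehicle documents in the same direction.
cos = docs_latent[0] @ docs_latent[1] / (
    np.linalg.norm(docs_latent[0]) * np.linalg.norm(docs_latent[1]))
print(round(cos, 3))   # close to 1.0
```

In the original term space these two documents share only one term (cosine 0.5); after projection they are nearly collinear, which is exactly the lexical-mismatch fix LSI was built for.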
The Mechanics: Decomposition and Approximation
1. The SVD Formula
Given an $m \times n$ matrix $\mathbf{M}$:
- $\mathbf{U}$ is an $m \times m$ orthogonal matrix (the left singular vectors).
- $\mathbf{\Sigma}$ is an $m \times n$ rectangular diagonal matrix whose diagonal entries are the non-negative singular values ($\sigma_1 \ge \sigma_2 \ge \ldots \ge 0$), sorted in descending order.
- $\mathbf{V}^T$ is an $n \times n$ orthogonal matrix (the transpose of the right singular vectors).
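These definitions can be checked numerically. A minimal numpy sketch, assuming an arbitrary $4 \times 3$ example matrix:

```python
import numpy as np

# Arbitrary 4x3 example matrix (values are illustrative only).
M = np.arange(12, dtype=float).reshape(4, 3)

# full_matrices=True returns U (4x4), s (3,), Vt (3x3).
U, s, Vt = np.linalg.svd(M, full_matrices=True)

# Rebuild the 4x3 "diagonal" Sigma and verify M = U @ Sigma @ Vt.
Sigma = np.zeros((4, 3))
Sigma[:3, :3] = np.diag(s)
assert np.allclose(M, U @ Sigma @ Vt)

# U and V are orthogonal, and the singular values are non-negative
# and sorted in descending order.
assert np.allclose(U.T @ U, np.eye(4))
assert np.allclose(Vt @ Vt.T, np.eye(3))
assert np.all(s >= 0) and np.all(s[:-1] >= s[1:])
```

Note that `np.linalg.svd` returns $\mathbf{\Sigma}$ as a 1-D array of singular values, so the rectangular diagonal matrix has to be rebuilt explicitly for the product above.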
2. Low-Rank Approximation
The power of SVD lies in the fact that the matrix $\mathbf{M}$ can be accurately approximated using only the $k$ largest singular values together with the corresponding columns of $\mathbf{U}$ and rows of $\mathbf{V}^T$.
$$\mathbf{M} \approx \mathbf{U}_k \mathbf{\Sigma}_k \mathbf{V}_k^T$$
By choosing a small $k$, the resulting matrix $\mathbf{M}_k = \mathbf{U}_k \mathbf{\Sigma}_k \mathbf{V}_k^T$ captures most of the original data’s variance and structure while significantly reducing complexity and discarding the noise carried by the smallest singular values. This rank-$k$ approximation is what forms the low-dimensional semantic space in LSI.
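A short numpy sketch of the truncation, using a random matrix as illustrative data. By the Eckart–Young theorem, the rank-$k$ truncation is the best rank-$k$ approximation in the Frobenius norm, and its error is exactly the root-sum-square of the discarded singular values:

```python
import numpy as np

rng = np.random.default_rng(0)
M = rng.standard_normal((6, 5))          # arbitrary data matrix

U, s, Vt = np.linalg.svd(M, full_matrices=False)

k = 2                                    # rank of the approximation
Mk = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

# Eckart-Young: the Frobenius error of the rank-k truncation equals
# the root-sum-square of the discarded singular values.
err = np.linalg.norm(M - Mk)
assert np.isclose(err, np.sqrt(np.sum(s[k:] ** 2)))
assert np.linalg.matrix_rank(Mk) == k
```

Storing $\mathbf{U}_k$, $\mathbf{\Sigma}_k$, and $\mathbf{V}_k^T$ takes $k(m + n + 1)$ numbers instead of $mn$, which is where the compression benefit comes from when $k \ll \min(m, n)$.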
Related Terms
- Latent Space: The low-dimensional vector space created by applying SVD to a document-term matrix.
- Vector Embedding: Modern, dense vectors generated by deep learning models that have largely replaced the need for SVD-based embeddings.
- Recall: SVD-based retrieval aims to improve recall by surfacing conceptual matches that share no literal keywords with the query.