Gilbert Strang SVD — Singular Value Decomposition | Sim Vattanac
Prof. Gilbert Strang · MIT 18.065

Singular Value Decomposition

A step-by-step visual guide to understanding SVD — the most powerful matrix factorization in modern data science, machine learning, and signal processing.

Linear Algebra · Matrix Factorization · Data Science · PCA · Compression
01

Why SVD? The Problem with Eigenvalues

Eigenvalues are beautiful — but they break down for rectangular matrices. If A is m×n (with m≠n), the equation Ax = λx is impossible: Ax lives in ℝᵐ while λx lives in ℝⁿ. They can't be equal.

Eigenvalue Problems
Only works for square matrices. Eigenvalues can be complex. Eigenvectors may not be orthogonal. Fails completely for data matrices.
SVD Solution
Works for any matrix — any shape. Always real, non-negative values. Always orthogonal vectors. The universal factorization.
Big idea: SVD replaces one set of eigenvectors with two sets of singular vectors — one for the input space, one for the output space — connected by singular values.
02

The Core Formula

Any matrix A — no matter how large or rectangular — can be written as the product of three special matrices:

A (m×n) = U (m×m) · Σ (m×n) · Vᵀ (n×n)
U
Left Singular Vectors
Columns are orthonormal. They form an orthogonal basis for the output / column space. Think: "where things land after transformation."
Σ
Singular Values
A diagonal matrix with values σ₁ ≥ σ₂ ≥ … ≥ 0. They measure the strength of each component. Always real and non-negative.
V
Right Singular Vectors
Columns are orthonormal. They form an orthogonal basis for the input / row space. Think: "directions to stretch along."
The Key Equation
Avᵢ = σᵢ uᵢ
Each right vector vᵢ maps to a scaled left vector uᵢ. This replaces Ax = λx.
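The key equation can be checked numerically. A minimal sketch with NumPy, using the 2×2 matrix from the worked example below, confirming A·vᵢ = σᵢ·uᵢ for each pair:

```python
import numpy as np

# Check A @ v_i == sigma_i * u_i for the 2x2 example matrix.
A = np.array([[3.0, 0.0], [4.0, 5.0]])
U, s, Vt = np.linalg.svd(A)

for i in range(len(s)):
    v_i = Vt[i]        # i-th right singular vector (row of Vt)
    u_i = U[:, i]      # i-th left singular vector (column of U)
    assert np.allclose(A @ v_i, s[i] * u_i)
```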
03

How to Derive the SVD

The magic ingredient is AᵀA. Let's see why this square, symmetric, positive semi-definite matrix unlocks everything.

1
Form AᵀA — the symmetric powerhouse

For any m×n matrix A, AᵀA is n×n, symmetric, and positive semi-definite (all eigenvalues ≥ 0). This guarantees real, non-negative eigenvalues.

Properties of AᵀA
(AᵀA)ᵀ = AᵀA   →   symmetric

xᵀ(AᵀA)x = ‖Ax‖² ≥ 0   →   PSD
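Both properties are easy to verify numerically. A quick sketch with a random rectangular matrix (the shape 5×3 is an arbitrary choice):

```python
import numpy as np

# Verify: for any rectangular A, AtA is symmetric and positive semi-definite.
rng = np.random.default_rng(0)
A = rng.standard_normal((5, 3))
AtA = A.T @ A

assert np.allclose(AtA, AtA.T)          # (AtA)^T = AtA  ->  symmetric
eigvals = np.linalg.eigvalsh(AtA)
assert np.all(eigvals >= -1e-12)        # all eigenvalues >= 0 (up to roundoff)
```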
2
Find V — eigenvectors of AᵀA

Since AᵀA is symmetric, the Spectral Theorem guarantees orthonormal eigenvectors. These become our right singular vectors V.

AᵀA = VΛVᵀ        ← spectral decomposition
AᵀA vᵢ = λᵢ vᵢ     ← the vᵢ are the right singular vectors
σᵢ = √λᵢ           ← singular values are square roots of the eigenvalues
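The relation σᵢ = √λᵢ can be confirmed directly: the sorted eigenvalues of AᵀA, square-rooted, should match the singular values NumPy reports. A small sketch (random 6×4 matrix is an arbitrary choice):

```python
import numpy as np

# Compare eigenvalues of AtA with the squared singular values of A.
rng = np.random.default_rng(1)
A = rng.standard_normal((6, 4))

eigvals = np.sort(np.linalg.eigvalsh(A.T @ A))[::-1]   # descending
s = np.linalg.svd(A, compute_uv=False)                 # singular values, descending

# sigma_i = sqrt(lambda_i), term by term
assert np.allclose(np.sqrt(np.maximum(eigvals, 0)), s)
```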
3
Find U — eigenvectors of AAᵀ

Similarly, AAᵀ is m×m, symmetric, and has the same non-zero eigenvalues as AᵀA. Its eigenvectors become our left singular vectors U.

AAᵀ = UΛUᵀ         ← same non-zero eigenvalues as AᵀA
uᵢ = Avᵢ / σᵢ       ← practical formula to get uᵢ from vᵢ
4
Verify: u's are actually orthogonal

We need to check u₁ᵀu₂ = 0. This is the key orthogonality proof:

u₁ᵀu₂ = (Av₁/σ₁)ᵀ(Av₂/σ₂)      ← substitute the definition of uᵢ
      = v₁ᵀ(AᵀA)v₂ / (σ₁σ₂)     ← rearrange
      = v₁ᵀ(σ₂²v₂) / (σ₁σ₂)     ← v₂ is an eigenvector of AᵀA with λ₂ = σ₂²
      = (σ₂/σ₁) · v₁ᵀv₂ = 0 ✓   ← the v's are orthonormal
Because the v's are orthonormal eigenvectors of AᵀA, when we multiply by A we get orthonormal u's. Orthogonal inputs → orthogonal outputs.
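The whole construction can be sketched in a few lines: take the eigenvectors of AᵀA as V, build each uᵢ = Avᵢ/σᵢ, and check that the resulting columns are indeed orthonormal (the 5×3 random matrix is an arbitrary full-rank example):

```python
import numpy as np

# Build u_i = A v_i / sigma_i from the eigenvectors of AtA, check UtU = I.
rng = np.random.default_rng(2)
A = rng.standard_normal((5, 3))

eigvals, V = np.linalg.eigh(A.T @ A)     # eigh returns ascending order
order = np.argsort(eigvals)[::-1]        # sort descending
V = V[:, order]
sigma = np.sqrt(eigvals[order])

U = (A @ V) / sigma                      # column i is A v_i / sigma_i
assert np.allclose(U.T @ U, np.eye(3))   # orthogonal inputs -> orthogonal outputs
```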
04

Worked Example — Step by Step

Let's compute the SVD of a concrete 2×2 matrix by hand.

Our Matrix
A = [[3, 0], [4, 5]]

Step 1 — Compute AᵀA

First transpose A, then multiply:

AᵀA = [3 4] · [3 0] = [25 20]
      [0 5]   [4 5]   [20 25]

Step 2 — Find Eigenvalues (σ²)

Solve det(AᵀA − λI) = 0:

det([[25−λ, 20], [20, 25−λ]]) = 0
(25−λ)² − 400 = 0
λ₁ = 45,  λ₂ = 5
σ₁ = √45 ≈ 6.71,  σ₂ = √5 ≈ 2.24   ← singular values

Step 3 — Find V (eigenvectors of AᵀA)

For λ₁ = 45: v₁ = [1/√2, 1/√2]ᵀ ≈ [0.707, 0.707]ᵀ
For λ₂ = 5:  v₂ = [1/√2, −1/√2]ᵀ ≈ [0.707, −0.707]ᵀ

V = [0.707   0.707]      Σ = [6.71   0  ]
    [0.707  −0.707]          [0     2.24]

Step 4 — Find U using uᵢ = Avᵢ / σᵢ

Av₁ = [3 0] [0.707] = [2.12]      u₁ = Av₁ / 6.71 ≈ [0.316]
      [4 5] [0.707]   [6.36]                         [0.949]

Av₂ = [3 0] [ 0.707] = [ 2.12]    u₂ = Av₂ / 2.24 ≈ [ 0.949]
      [4 5] [−0.707]   [−0.71]                       [−0.316]

Step 5 — Final Result

A            U                  Σ                Vᵀ
[3  0]  =  [0.316   0.949] · [6.71   0  ] · [0.707   0.707]
[4  5]     [0.949  −0.316]   [0     2.24]   [0.707  −0.707]
Verification: multiply U · Σ · Vᵀ back out and you recover the original matrix A = [[3,0],[4,5]] (exactly in exact arithmetic; up to rounding in the decimals shown).

Python Code

import numpy as np

# Define matrix A
A = np.array([[3, 0],
              [4, 5]], dtype=float)

# Compute SVD
U, sigma, Vt = np.linalg.svd(A)

print("U =\n", U)
print("Singular values:", sigma)
print("Vt =\n", Vt)

# Reconstruct A to verify
Sigma = np.diag(sigma)
A_reconstructed = U @ Sigma @ Vt
print("Reconstructed A =\n", A_reconstructed)

# Output:
# Singular values: [6.708  2.236]
# U ≈ [[0.316, 0.949], [0.949, -0.316]]
# Vt ≈ [[0.707, 0.707], [0.707, -0.707]]
# Note: SVD is only unique up to sign — numpy may flip the sign of a u_i
# and its matching v_i together; the product U Σ Vᵀ is unchanged.
05

Geometric Interpretation

The SVD tells us something profound: every linear transformation is a rotation, followed by a stretch, followed by another rotation.

Vᵀ — First Rotation

Multiply by Vᵀ (an orthogonal matrix). This rotates the input unit circle without changing its shape. Lengths are preserved. The unit circle stays a unit circle — just turned.

Σ — Stretching (the key part!)

Multiply by Σ. This stretches along the coordinate axes by σ₁, σ₂, … The unit circle becomes an ellipse with semi-axes of length σ₁ ≥ σ₂ ≥ …

σ₁ is always the largest. The ellipse's longest axis has length σ₁, shortest axis σ₂.
U — Second Rotation

Finally multiply by U. This rotates the ellipse to its final position in the output space. The tip of the longest axis points along u₁, shortest along u₂.

Geometric Summary
Unit circle → rotation (Vᵀ) → ellipse stretch (Σ) → final rotation (U)
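The rotate–stretch–rotate picture can be verified numerically: push points on the unit circle through A, then rotate back into the u₁/u₂ coordinate system and check they satisfy the ellipse equation with semi-axes σ₁, σ₂. A sketch using the worked example's 2×2 matrix:

```python
import numpy as np

# Map the unit circle through A = U S Vt; the image is an ellipse with
# semi-axes sigma_1, sigma_2 along u_1, u_2.
A = np.array([[3.0, 0.0], [4.0, 5.0]])
U, s, Vt = np.linalg.svd(A)

theta = np.linspace(0, 2 * np.pi, 200)
circle = np.vstack([np.cos(theta), np.sin(theta)])   # unit circle, 2 x 200

ellipse = A @ circle
coords = U.T @ ellipse                               # express in u_1/u_2 axes
lhs = (coords[0] / s[0])**2 + (coords[1] / s[1])**2
assert np.allclose(lhs, 1.0)                         # (x/s1)^2 + (y/s2)^2 = 1
```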

06

Full SVD vs. Compact SVD

There are two flavors of SVD. The compact version discards zeros and is much more efficient for large data.

| Property   | Full SVD                   | Compact (Economy) SVD          |
|------------|----------------------------|--------------------------------|
| U size     | m × m                      | m × r (smaller)                |
| Σ size     | m × n (padded with 0s)     | r × r (no zeros)               |
| V size     | n × n                      | n × r (smaller)                |
| Null space | ✓ included                 | ✗ excluded                     |
| Storage    | large                      | efficient                      |
| Use case   | theory, full decomposition | data science, compression, PCA |
r = rank(A) — only the r non-zero singular values matter. Everything else is zeros that don't contribute to the reconstruction.
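The shape difference is easy to see with NumPy's `full_matrices` flag; a sketch with a tall 8×3 matrix (generically full rank, so r = 3):

```python
import numpy as np

# Compare full vs compact (economy) SVD shapes for a tall matrix.
rng = np.random.default_rng(3)
A = rng.standard_normal((8, 3))          # m=8, n=3, r=3 generically

Uf, sf, Vtf = np.linalg.svd(A, full_matrices=True)
Uc, sc, Vtc = np.linalg.svd(A, full_matrices=False)

assert Uf.shape == (8, 8)                # full: U is m x m
assert Uc.shape == (8, 3)                # compact: U is m x r
# Both reconstruct A; the compact form stores far fewer numbers.
assert np.allclose((Uc * sc) @ Vtc, A)
```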

Rank-k Approximation

One of SVD's superpowers: you can approximate A using only the top-k terms:

Best Rank-k Approximation (Eckart–Young Theorem)
Aₖ = σ₁u₁v₁ᵀ + σ₂u₂v₂ᵀ + … + σₖuₖvₖᵀ

This is the best possible rank-k approximation — no other rank-k matrix is closer to A in Frobenius norm. It's the foundation of image compression, recommendation systems, and PCA.
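Eckart–Young also tells us exactly how big the error is: ‖A − Aₖ‖_F = √(σₖ₊₁² + … + σᵣ²). A sketch verifying both the truncation and the error formula (random 20×10 matrix, k = 3 are arbitrary choices):

```python
import numpy as np

# Rank-k approximation: keep the top-k terms sigma_i u_i v_i^T.
rng = np.random.default_rng(4)
A = rng.standard_normal((20, 10))
U, s, Vt = np.linalg.svd(A, full_matrices=False)

k = 3
A_k = (U[:, :k] * s[:k]) @ Vt[:k]        # rank-k truncation

# Frobenius error equals the root-sum-square of the discarded sigmas.
err = np.linalg.norm(A - A_k, 'fro')
assert np.isclose(err, np.sqrt(np.sum(s[k:]**2)))
```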

07

Polar Decomposition

Just as complex numbers have a polar form z = r·e^(iθ) = (magnitude) × (rotation), matrices have a polar decomposition:

Polar Form
A = S · Q

Where S is symmetric positive semi-definite (the "stretch") and Q is orthogonal (the "rotation"). We extract it directly from the SVD by inserting UᵀU = I:

A = UΣVᵀ
  = (UΣUᵀ) · (UVᵀ)     ← insert UᵀU = I between Σ and Vᵀ
  = S · Q
S
S = UΣUᵀ
Symmetric positive semi-definite. Represents the "pure stretch" part of the transformation — no rotation, just scaling.
Q
Q = UVᵀ
Orthogonal matrix. Represents the "pure rotation" part. QᵀQ = I. This is the rotation after all stretching is removed.
In mechanical engineering, this says: any deformation of an elastic body = pure stretch × internal rotation. The SVD separates these two effects cleanly.
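The factors S = UΣUᵀ and Q = UVᵀ drop straight out of `np.linalg.svd`; a sketch using the worked example's matrix, verifying each claimed property:

```python
import numpy as np

# Polar decomposition A = S Q extracted from the SVD.
A = np.array([[3.0, 0.0], [4.0, 5.0]])
U, s, Vt = np.linalg.svd(A)

S = U @ np.diag(s) @ U.T    # pure stretch: symmetric PSD
Q = U @ Vt                  # pure rotation: orthogonal

assert np.allclose(S, S.T)                         # symmetric
assert np.all(np.linalg.eigvalsh(S) >= -1e-12)     # positive semi-definite
assert np.allclose(Q.T @ Q, np.eye(2))             # orthogonal
assert np.allclose(S @ Q, A)                       # A = S . Q
```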
08

Connection to PCA

Principal Component Analysis (PCA) is SVD — just viewed through a statistics lens. For a centered data matrix A (rows = samples, columns = features):

1
The covariance matrix is AᵀA / (n−1)

The eigenvectors of the covariance matrix are the principal components — and they are exactly the right singular vectors V of A.

2
Singular values encode variance

σᵢ² / (n−1) is the variance explained by the i-th principal component. The first component (σ₁, v₁) captures the most variance.

Variance Explained
Var_i = σᵢ² / Σⱼ σⱼ²
3
Dimensionality reduction

Project data onto the top-k principal components: Z = A · Vₖ where Vₖ contains the first k columns of V. This gives the best k-dimensional representation.

import numpy as np

# Data matrix: n_samples × n_features
X = np.random.randn(100, 10)

# Step 1: Center the data
X_centered = X - X.mean(axis=0)

# Step 2: SVD
U, sigma, Vt = np.linalg.svd(X_centered, full_matrices=False)

# Step 3: Keep top-2 components
k = 2
Z = X_centered @ Vt[:k].T  # shape: 100 × 2

# Variance explained by each component
var_explained = sigma**2 / np.sum(sigma**2)
print(f"Top-2 explain: {var_explained[:2].sum()*100:.1f}%")
09

Practical Computation

⚠️ Never compute AᵀA explicitly in practice! The proof used AᵀA, but squaring a matrix squares its condition number, dramatically amplifying numerical errors.
Bidiagonalization
Golub-Kahan-Reinsch algorithm. Reduces A to bidiagonal form first, then applies QR iteration. Numerically stable — industry standard.
Randomized SVD
For huge matrices, random projection finds the top-k singular values/vectors in O(mnk) time. Used in sklearn's TruncatedSVD.
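The danger of forming AᵀA explicitly is visible directly in the condition number: since the eigenvalues of AᵀA are σᵢ², its 2-norm condition number is exactly the square of A's. A sketch (random 50×10 matrix is an arbitrary example):

```python
import numpy as np

# Squaring the matrix squares the condition number: cond(AtA) = cond(A)^2.
rng = np.random.default_rng(5)
A = rng.standard_normal((50, 10))

kappa_A = np.linalg.cond(A)          # sigma_1 / sigma_r
kappa_AtA = np.linalg.cond(A.T @ A)  # lambda_1 / lambda_r = (sigma_1/sigma_r)^2

assert np.isclose(kappa_AtA, kappa_A**2, rtol=1e-4)
```

This is why stable SVD algorithms work on A itself (via bidiagonalization) rather than on AᵀA.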

Complexity

| Method              | Time Complexity  | Use When                 |
|---------------------|------------------|--------------------------|
| Full SVD (LAPACK)   | O(min(m,n) · mn) | m, n < 10,000            |
| Randomized SVD      | O(mnk)           | want top-k, large matrix |
| Sparse SVD (ARPACK) | O(nnz · k)       | sparse data, large n     |

Quick API Reference

import numpy as np
from sklearn.utils.extmath import randomized_svd

# Full SVD (numpy)
U, s, Vt = np.linalg.svd(A, full_matrices=True)   # full U, Vt
U, s, Vt = np.linalg.svd(A, full_matrices=False)  # compact form

# Randomized SVD — top k components (fast for large A)
U, s, Vt = randomized_svd(A, n_components=50, random_state=42)

# Rank-k reconstruction (compact / randomized form)
A_approx = (U * s) @ Vt  # shape restored to original A shape

# Image compression example (k must not exceed n_components above)
m, n = A.shape
for k in [5, 10, 20, 50]:
    img_k = (U[:, :k] * s[:k]) @ Vt[:k, :]
    ratio = (m * n) / (k * (m + n + 1))  # stored numbers: k·(m + n + 1)
    print(f"k={k}: compression ratio = {ratio:.1f}x")
10

Interactive Demo

Adjust the matrix entries and watch the singular values update in real time.

[Interactive widget: live SVD calculator for a 2×2 matrix, with matrix-entry fields and presets.]