This repository contains the presentation and Python code for our research on stable and scalable QR factorizations for tall-and-skinny matrices using MapReduce. The study is based on the paper:
“Direct QR Factorizations for Tall-and-Skinny Matrices in MapReduce Architectures”
by Austin R. Benson, David F. Gleich, and James Demmel
The goal of this project is to implement and evaluate several QR factorization methods for tall-and-skinny matrices in distributed systems, focusing on Direct TSQR, a numerically stable and efficient algorithm well suited to big data environments.
The following methods are implemented:
- `cholesky_qr(A)`
- `householder_qr(A)`
- `indirect_tsqr(A)`
- `direct_tsqr(A)`
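As a rough illustration of the Direct TSQR idea named above, here is a minimal single-machine sketch in NumPy. It is not the notebook's code: the function name `direct_tsqr_sketch` and the `num_blocks` parameter are hypothetical, and the row blocks only stand in for the map tasks of a real MapReduce job. Each block is factored locally, the small R factors are stacked and factored again, and the local Q factors are then corrected with the matching slice of the second-stage Q.

```python
import numpy as np


def direct_tsqr_sketch(A, num_blocks=4):
    """Single-machine sketch of Direct TSQR (hypothetical helper, not the notebook's API)."""
    m, n = A.shape
    # Split the rows into blocks; each block plays the role of one map task.
    blocks = np.array_split(A, num_blocks, axis=0)
    if any(b.shape[0] < n for b in blocks):
        raise ValueError("every row block needs at least n rows")

    # Step 1: local reduced QR per block, A_i = Q_i R_i with R_i of size n x n.
    local = [np.linalg.qr(b) for b in blocks]

    # Step 2: stack the R_i factors and factor the (num_blocks * n) x n result.
    R_stack = np.vstack([r for _, r in local])
    Q2, R = np.linalg.qr(R_stack)

    # Step 3: multiply each Q_i by its n x n slice of Q2 to get the final Q.
    Q_parts = [q @ Q2[i * n:(i + 1) * n, :] for i, (q, _) in enumerate(local)]
    return np.vstack(Q_parts), R


# Small demonstration on a random tall-and-skinny matrix.
rng = np.random.default_rng(0)
A = rng.standard_normal((1000, 5))
Q, R = direct_tsqr_sketch(A, num_blocks=8)
print("Q shape:", Q.shape, "R shape:", R.shape)
```

In a distributed setting, steps 1 and 3 would run in parallel over the blocks, and only the small stacked R matrix from step 2 would need to be gathered in one place.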
Each method is tested for:
- Reconstruction Error: ‖A - QR‖
- Orthogonality: ‖QᵀQ - I‖
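The two metrics above can be computed with a few lines of NumPy. This is a sketch under assumptions: the helper names `reconstruction_error` and `orthogonality_error` are hypothetical, and the Frobenius norm is used by default because the norm is not specified here (pass `ord=2` for the spectral norm).

```python
import numpy as np


def reconstruction_error(A, Q, R, ord="fro"):
    """Return ||A - QR|| (hypothetical helper; the norm choice is an assumption)."""
    return np.linalg.norm(A - Q @ R, ord=ord)


def orthogonality_error(Q, ord="fro"):
    """Return ||Q^T Q - I|| (hypothetical helper; the norm choice is an assumption)."""
    n = Q.shape[1]
    return np.linalg.norm(Q.T @ Q - np.eye(n), ord=ord)


# Example: evaluate NumPy's built-in reduced QR on a random tall-and-skinny matrix.
rng = np.random.default_rng(0)
A = rng.standard_normal((500, 6))
Q, R = np.linalg.qr(A)
print("reconstruction error:", reconstruction_error(A, Q, R))
print("orthogonality error: ", orthogonality_error(Q))
```

Both values should be close to machine precision for a stable factorization of a well-conditioned matrix.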
| File | Description |
|---|---|
| `presentation.pdf` | Slide deck summarizing the background, algorithm, implementation, and results |
| `qr_methods.ipynb` | Jupyter Notebook implementing and evaluating the factorization techniques |
| `README.md` | You're here! |
- Python with Dask for parallel/distributed computation
- NumPy & SciPy for numerical operations
- Jupyter for demonstration and analysis
- Cholesky QR: Fast but unstable on ill-conditioned matrices
- Householder QR: Accurate but not scalable
- Indirect TSQR: Accurate R, unstable Q
- Direct TSQR: Best trade-off between performance and accuracy
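The first finding can be reproduced on a small scale. The sketch below is an illustration, not the notebook's experiment: `cholesky_qr_sketch` and `ill_conditioned_matrix` are hypothetical helpers, the condition number of 1e6 is an arbitrary choice, and `np.linalg.qr` (which calls LAPACK's Householder-based routine) stands in for Householder QR.

```python
import numpy as np


def ill_conditioned_matrix(m, n, cond, seed=0):
    """Build an m x n matrix with a prescribed 2-norm condition number (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    U, _ = np.linalg.qr(rng.standard_normal((m, n)))  # orthonormal columns
    V, _ = np.linalg.qr(rng.standard_normal((n, n)))  # orthogonal matrix
    s = np.logspace(0, -np.log10(cond), n)            # singular values from 1 down to 1/cond
    return (U * s) @ V.T


def cholesky_qr_sketch(A):
    """Cholesky QR: R from the Cholesky factor of A^T A, then Q = A R^{-1}."""
    G = A.T @ A                       # forming the Gram matrix squares the condition number
    R = np.linalg.cholesky(G).T       # upper-triangular R with G = R^T R
    # Solve R^T Q^T = A^T. A general solver is used here for simplicity;
    # a dedicated triangular solve would be the more efficient choice.
    Q = np.linalg.solve(R.T, A.T).T
    return Q, R


A = ill_conditioned_matrix(2000, 8, cond=1e6)
n = A.shape[1]

Q_chol, R_chol = cholesky_qr_sketch(A)
Q_hh, R_hh = np.linalg.qr(A)

chol_orth = np.linalg.norm(Q_chol.T @ Q_chol - np.eye(n))
hh_orth = np.linalg.norm(Q_hh.T @ Q_hh - np.eye(n))
print("Cholesky QR orthogonality error:   ", chol_orth)
print("Householder QR orthogonality error:", hh_orth)
```

On a matrix like this, the Cholesky QR orthogonality error is typically several orders of magnitude larger than the Householder one, because the error grows with the square of the condition number of A.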
- Distributed Linear Algebra
- Machine Learning (e.g., PCA, least squares)
- Scientific Computing with large datasets
- Big Data Systems using Hadoop/Spark
The original paper and implementation:
🔗 https://github.com/arbenson/mrtsqr
- Manami Das - MDS202423
- Maria Paul Thurkadayil - MDS202424
- Mohit Singh Sinsniwal - MDS202425
Guided by Prof. Kavita Sutar
This project is for academic and educational purposes only.