AI RESEARCH
[R] PCA on ~40k × 40k matrix in representation learning — sklearn SVD crashes even with 128GB RAM. Any practical solutions?
r/MachineLearning
•
Hi all, I'm doing ML research in representation learning and ran into a computational issue while computing PCA. My pipeline produces a feature representation where the covariance matrix A TA is roughly 40k × 40k. I need the full eigendecomposition / PCA basis, not just the top-k components. Currently I'm trying to run PCA using sklearn.decomposition. PCA(svd_solver="full"), but it crashes. This happens even on our compute cluster where I allocate ~128GB RAM, so it doesn't appear to be a simple memory limit issue. submitted by /u/nat-abhishek [link] [comments.