Efficient Embedding Initialization via Dominant Eigenvector Projections

Title: Efficient Embedding Initialization via Dominant Eigenvector Projections
Publication Type: Conference Paper
Year of Publication: 2025
Authors: Petit, Q., C. Li, N. Emad, and J. Dongarra
Conference Name: SC Workshops '25: Workshops of the International Conference for High Performance Computing, Networking, Storage and Analysis
Date Published: 2025-11
Publisher: ACM
Conference Location: St. Louis, MO, USA
ISBN Number: 9798400718717
Abstract

The embedding layer is essential in deep learning, transforming high-dimensional data into compact representations. However, growing datasets and model sizes pose challenges in training time, memory, and generalization. We propose a scalable method for embedding initialization via spectral dimensionality reduction using dominant eigenvector projections.
The proposed approach leverages MIRAMns, the multiple implicitly restarted Arnoldi method with nested subspaces, to extract the most informative directions from large and potentially sparse data representations. Unlike traditional embeddings or autoencoders, it requires few tunable parameters and is inherently parallel. We apply MIRAMns to matrix representations such as covariance and co-occurrence matrices to compute low-dimensional embeddings that preserve data structure and variance. Experiments across diverse datasets show that the proposed method achieves comparable or better accuracy at significantly reduced dimensionality, enabling smaller, faster deep networks. Additionally, our parallel implementation scales efficiently on HPC platforms, making it well suited for large-scale scientific and AI workloads.
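The sketch below illustrates the general idea the abstract describes: compute the dominant eigenvectors of a (co-occurrence or covariance) matrix and use the projections as an embedding initialization. MIRAMns itself is the authors' solver and is not assumed to be available here; SciPy's ARPACK-based implicitly restarted Lanczos solver (`scipy.sparse.linalg.eigsh`) stands in for it, and all names (`spectral_embedding_init`, `embedding_dim`, the random co-occurrence matrix) are illustrative assumptions, not part of the paper.

```python
# Minimal sketch: spectral embedding initialization via dominant eigenvectors.
# eigsh (implicitly restarted Lanczos from ARPACK) is a stand-in for MIRAMns.
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import eigsh


def spectral_embedding_init(cooc: sp.csr_matrix, embedding_dim: int) -> np.ndarray:
    """Project a co-occurrence (or covariance) matrix onto its dominant eigenvectors."""
    # Symmetrize so the Hermitian eigensolver is applicable.
    sym = (cooc + cooc.T) * 0.5
    # Dominant eigenpairs capture the directions with the most structure/variance.
    vals, vecs = eigsh(sym, k=embedding_dim, which="LM")
    # Sort by decreasing eigenvalue and scale each direction by its importance,
    # PCA-style, to form the low-dimensional embedding table.
    order = np.argsort(vals)[::-1]
    vals, vecs = vals[order], vecs[:, order]
    return vecs * np.sqrt(np.abs(vals))


if __name__ == "__main__":
    # Example: initialize a 64-dimensional embedding for a 10k-item vocabulary
    # from a sparse, synthetic co-occurrence matrix.
    vocab = 10_000
    cooc = sp.random(vocab, vocab, density=1e-3, format="csr", random_state=0)
    init = spectral_embedding_init(cooc, embedding_dim=64)
    print(init.shape)  # (10000, 64)
```

In practice the resulting matrix would be copied into the embedding layer's weights before training; the point of the sketch is only that the initialization reduces to a dominant eigenpair computation on a sparse matrix, which is exactly the workload an implicitly restarted Arnoldi solver such as MIRAMns parallelizes well.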

URL: https://doi.org/10.1145/3731599.3767541
DOI: 10.1145/3731599.3767541