NV-Embed: Training a Decoder-only Embedding Model with Latent Attention Pooling
The core idea of NV-Embed (NV-Embed: Improved Techniques for Training LLMs as Generalist Embedding Models, NVIDIA, 2024, ICLR 2025): start from Mistral 7B, drop the causal attention mask, attach a latent attention layer on top of the last-layer hidden states for pooling, then run two-stage contrastive instruction tuning: first on retrieval data with in-batch negatives, then on a blend that adds non-retrieval data, with in-batch negatives switched off. On the 56-task MTEB benchmark, NV-Embed-v1 averages 69.32, and v2 pushes the score to 72.31 with positive-aware hard-negative mining, synthetic data, and example-based multi-class labeling; the two versions took the No. 1 spot on MTEB in May and August 2024, respectively. Sketches of each piece follow below.
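Dropping the causal mask is the smallest of the three changes: architecturally, bidirectional attention is just ordinary self-attention with no causal masking. A toy PyTorch illustration of that one-flag difference (this is not NV-Embed's training code, just the attention primitive):

```python
import torch
import torch.nn.functional as F

# Toy shapes: batch, heads, sequence length, head dim.
B, H, T, Dh = 2, 4, 16, 64
q = torch.randn(B, H, T, Dh)
k = torch.randn(B, H, T, Dh)
v = torch.randn(B, H, T, Dh)

# Decoder default: each token only attends to earlier positions.
causal_out = F.scaled_dot_product_attention(q, k, v, is_causal=True)

# NV-Embed's change: remove the causal mask so every token sees the full sequence.
bidir_out = F.scaled_dot_product_attention(q, k, v, is_causal=False)
```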
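The latent attention layer is a cross-attention pooler: the last-layer hidden states act as queries, a trainable latent array supplies both keys and values, an MLP follows, and mean pooling over tokens yields the final embedding (the paper uses 512 latents at Mistral 7B's hidden size of 4096). A minimal PyTorch sketch; the head count and the residual connection are assumptions of this sketch, not details taken from the paper:

```python
import torch
import torch.nn as nn

class LatentAttentionPooling(nn.Module):
    """Pool token hidden states by cross-attending to a trainable latent array,
    then applying an MLP and masked mean pooling."""

    def __init__(self, d_model: int = 4096, num_latents: int = 512, num_heads: int = 8):
        super().__init__()
        # Trainable latent array, used as both keys and values.
        self.latents = nn.Parameter(torch.randn(num_latents, d_model) * 0.02)
        self.cross_attn = nn.MultiheadAttention(d_model, num_heads, batch_first=True)
        # Two linear layers with a GELU in between, as described in the paper.
        self.mlp = nn.Sequential(
            nn.Linear(d_model, 4 * d_model),
            nn.GELU(),
            nn.Linear(4 * d_model, d_model),
        )

    def forward(self, hidden_states: torch.Tensor, attention_mask: torch.Tensor) -> torch.Tensor:
        # hidden_states: (B, seq_len, d_model) from the bidirectional decoder.
        # attention_mask: (B, seq_len) with 1 for real tokens, 0 for padding.
        B = hidden_states.size(0)
        kv = self.latents.unsqueeze(0).expand(B, -1, -1)  # (B, num_latents, d_model)
        # Queries are the token hidden states; keys/values are the latents.
        attn_out, _ = self.cross_attn(query=hidden_states, key=kv, value=kv)
        out = self.mlp(attn_out) + attn_out  # residual here is this sketch's assumption
        # Masked mean pooling over non-padding tokens -> one embedding per sequence.
        mask = attention_mask.unsqueeze(-1).float()
        return (out * mask).sum(dim=1) / mask.sum(dim=1).clamp(min=1e-6)
```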
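The two training stages share one contrastive objective; what changes is whether other in-batch examples count as negatives. Stage one (retrieval) keeps them; stage two blends in non-retrieval tasks, where in-batch "negatives" often share the query's label and become false negatives, so they are disabled. A hedged InfoNCE-style sketch (the function name, shapes, and temperature are illustrative, not the paper's exact values):

```python
import torch
import torch.nn.functional as F

def contrastive_loss(q, pos, hard_negs, temperature: float = 0.02,
                     use_in_batch_negatives: bool = True) -> torch.Tensor:
    """q: (B, d) query embeddings; pos: (B, d) positive passage embeddings;
    hard_negs: (B, K, d) mined hard negatives.
    Stage one: use_in_batch_negatives=True; stage two: False."""
    q = F.normalize(q, dim=-1)
    pos = F.normalize(pos, dim=-1)
    hard_negs = F.normalize(hard_negs, dim=-1)

    pos_sim = (q * pos).sum(-1, keepdim=True)            # (B, 1), the target column
    hard_sim = torch.einsum("bd,bkd->bk", q, hard_negs)  # (B, K)
    logits = torch.cat([pos_sim, hard_sim], dim=1)       # (B, 1 + K)

    if use_in_batch_negatives:
        # Other queries' positives serve as extra negatives; mask each query's
        # own positive, which already occupies column 0.
        in_batch = q @ pos.t()                           # (B, B)
        eye = torch.eye(q.size(0), dtype=torch.bool, device=q.device)
        logits = torch.cat([logits, in_batch.masked_fill(eye, float("-inf"))], dim=1)

    labels = torch.zeros(q.size(0), dtype=torch.long, device=q.device)
    return F.cross_entropy(logits / temperature, labels)
```

Disabling in-batch negatives in stage two trades some free negatives for correctness: for classification- and clustering-style data, two batch items with the same label would otherwise be pushed apart.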