EvoEmbedding: Evolvable Representations for Long-Context Retrieval and Agentic Memory

Key Features

Evolvable

Generates evolvable embeddings using native latent memory.

Scalable

Scales to 10x longer contexts across diverse domains.

Versatile

Enhances RAG and memory systems as both an embedding model and a reranker.

Temporal

Built for temporal retrieval, with high sensitivity to chronological order.

Models & Performance

0.8B

EvoEmbedding-0.8B

Ultra-efficient variant, outperforming much larger static baselines.

2B

EvoEmbedding-2B

The optimal balance between inference speed and retrieval accuracy.

4B

EvoEmbedding-4B

Our SOTA flagship, dominating long-context retrieval and memory tasks.

Methodology

Latent Memory Queue · Segment Batching · Evolvable Representations

EvoEmbedding evolves its latent memory and generates representations jointly, running both in parallel.
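To make the idea of a representation that depends on earlier context more concrete, here is a toy sketch. It is not the EvoEmbedding architecture: the function `evolve_embeddings`, the parameters `queue_size` and `memory_weight`, and the mean-mixing rule are all assumptions chosen for illustration. It also runs sequentially for readability and does not reproduce the parallel segment batching described above.

```python
from collections import deque
import math


def normalize(v):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v] if n > 0 else list(v)


def evolve_embeddings(segment_vectors, queue_size=2, memory_weight=0.5):
    """Toy illustration only (hypothetical, not the paper's method).

    Each segment's output mixes its own vector with the mean of a bounded
    queue holding the vectors of the most recent earlier segments.
    """
    memory = deque(maxlen=queue_size)  # oldest entries are evicted automatically
    outputs = []
    for vec in segment_vectors:
        if memory:
            dim = len(vec)
            mean_mem = [sum(m[i] for m in memory) / len(memory) for i in range(dim)]
            mixed = [x + memory_weight * m for x, m in zip(vec, mean_mem)]
        else:
            mixed = list(vec)  # first segment: nothing to condition on
        outputs.append(normalize(mixed))
        memory.append(list(vec))  # the memory "evolves" after each segment
    return outputs


segments = [[1.0, 0.0], [0.0, 1.0], [0.0, 1.0], [0.0, 1.0]]
embeddings = evolve_embeddings(segments)
```

In this toy setup the second and third segments have identical content but receive different embeddings, because the queue holds different history when each one is processed. That is the property a static embedding model lacks.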

Conclusions

CORE FINDINGS

State-of-the-Art Retrieval Performance

EvoEmbedding achieves superior results across 10 benchmarks, outperforming both established static embedding models and larger specialist models despite having fewer parameters.

Naive RAG Surpasses Dedicated Agentic Memory

A standard naive RAG pipeline using EvoEmbedding-4B outperforms complex agentic memory architectures, and it incurs no token overhead for explicit memory construction at test time.
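For readers unfamiliar with the term, a naive RAG pipeline just embeds the stored chunks, embeds the query, keeps the top-k most similar chunks, and pastes them into the prompt. The sketch below shows that flow with a bag-of-words stand-in for the embedding model. `toy_embed`, `retrieve`, and `build_prompt` are hypothetical names, and no EvoEmbedding API is assumed.

```python
import math
from collections import Counter


def toy_embed(text):
    """Hypothetical stand-in for an embedding model: bag-of-words counts."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def retrieve(query, chunks, embed=toy_embed, top_k=2):
    """Rank chunks by similarity to the query and keep the top_k."""
    q = embed(query)
    scored = [(cosine(q, embed(c)), c) for c in chunks]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [c for _, c in scored[:top_k]]


def build_prompt(query, chunks, embed=toy_embed, top_k=2):
    """Assemble the generator prompt; no separate memory-construction step."""
    context = "\n".join(retrieve(query, chunks, embed, top_k))
    return f"Context:\n{context}\n\nQuestion: {query}"


history = [
    "alice moved to paris in 2021",
    "bob likes green tea",
    "alice adopted a cat",
]
top = retrieve("alice paris", history, top_k=1)
prompt = build_prompt("alice paris", history, top_k=1)
```

The only model-specific piece is the `embed` argument, so the rest of the pipeline stays unchanged when the embedding model is swapped.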

Plug-and-Play Compatibility with Agentic Workflows

EvoEmbedding works as a drop-in replacement in existing frameworks such as A-MEM and LightMem, improving performance without modifying the core generative LLMs.
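"Drop-in" here means the surrounding framework only depends on a function from text to vector. The sketch below illustrates that pattern with a hypothetical `MemoryStore` class and two toy embedders. It does not reflect the actual interfaces of A-MEM or LightMem, which are not described on this page.

```python
import math
from typing import Callable, List

Embedder = Callable[[str], List[float]]  # hypothetical interface: text -> vector


def _unit(v):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v] if n > 0 else list(v)


def vowel_ratio_embed(text):
    """Toy baseline embedder: [vowel count, consonant count], unit length."""
    letters = [ch for ch in text.lower() if "a" <= ch <= "z"]
    vowels = sum(ch in "aeiou" for ch in letters)
    return _unit([float(vowels), float(len(letters) - vowels)])


def letter_count_embed(text):
    """Toy replacement embedder: 26-dim letter histogram, unit length."""
    counts = [0.0] * 26
    for ch in text.lower():
        if "a" <= ch <= "z":
            counts[ord(ch) - ord("a")] += 1.0
    return _unit(counts)


class MemoryStore:
    """Hypothetical memory component that is agnostic to the embedder used."""

    def __init__(self, embed: Embedder):
        self.embed = embed
        self.items = []  # list of (text, vector) pairs

    def add(self, text):
        self.items.append((text, self.embed(text)))

    def search(self, query, top_k=1):
        q = self.embed(query)
        score = lambda item: sum(a * b for a, b in zip(q, item[1]))
        ranked = sorted(self.items, key=score, reverse=True)
        return [text for text, _ in ranked[:top_k]]


results = {}
for name, embedder in [("baseline", vowel_ratio_embed), ("replacement", letter_count_embed)]:
    store = MemoryStore(embedder)  # only this argument changes
    store.add("coffee")
    store.add("tea")
    results[name] = store.search("tea")
```

Nothing in `MemoryStore` or in the downstream generator changes when the embedder is replaced. Only the constructor argument does.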

Temporal Retrieval Capabilities

EvoEmbedding's latent space remains sensitive to chronological order, helping decouple temporal intents for queries constrained by terms such as "firstly" and "lastly".
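To make the task concrete, the sketch below spells out what a query constrained by "firstly" or "lastly" asks for, using an explicit rule over chronologically ordered chunks. This is a hand-written baseline for illustration only. The function `temporal_pick` and its `topic_word` parameter are hypothetical, and the sketch does not reproduce how EvoEmbedding handles temporal intent in its latent space.

```python
def temporal_pick(query, chunks, topic_word):
    """Rule-based illustration (hypothetical, not the paper's method).

    `chunks` must be in chronological order. Returns the earliest chunk that
    mentions `topic_word`, or the latest one if the query contains "lastly".
    """
    matches = [c for c in chunks if topic_word in c.lower().split()]
    if not matches:
        return None
    if "lastly" in query.lower().split():
        return matches[-1]
    return matches[0]  # default, and the answer for "firstly"


timeline = ["visited rome", "visited oslo", "bought a bike", "visited lima"]
first_trip = temporal_pick("which city was visited firstly", timeline, "visited")
last_trip = temporal_pick("which city was visited lastly", timeline, "visited")
```

Both queries match the same set of chunks on topic alone, so an order-agnostic similarity score cannot separate them. The distinguishing signal is chronological position.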

Citation

@article{nie2026evoembedding,
  title={EvoEmbedding: Evolvable Representations for Long-Context Retrieval and Agentic Memory},
  author={Nie, Chang and Fu, Chaoyou and Feng, Junlan and Shan, Caifeng},
  journal={arXiv preprint},
  year={2026}
}