Introducing Infinite Context Transformers with Infini-attention
Google’s latest research introduces a technique for scaling Transformer-based Large Language Models (LLMs) to infinitely long inputs with bounded memory and computation. The new attention mechanism, called “Infini-attention,” incorporates a compressive memory and combines masked local attention with long-term linear attention in a single Transformer block. This design addresses the context-length limitation of standard LLMs, whose attention cost and memory footprint grow with input length, and lets models process and generate responses conditioned on arbitrarily long context. Read the research paper, “Leave No Context Behind: Efficient Infinite Context Transformers with Infini-attention,” to learn more.
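To make the mechanism concrete, here is a minimal, single-head sketch of the Infini-attention idea as described in the paper: masked local attention within each segment, plus a compressive memory that is read with linear attention and updated after every segment. Tensor names, dimensions, the feature map, and the gating scalar below are illustrative assumptions, not the authors’ reference implementation.

```python
import torch
import torch.nn.functional as F

def elu_plus_one(x):
    # Non-negative feature map commonly used for linear attention.
    return F.elu(x) + 1.0

def infini_attention_segment(q, k, v, memory, z, beta):
    """Process one segment of shape (seg_len, d), carrying a fixed-size memory state."""
    d = q.size(-1)

    # 1) Masked (causal) local attention within the segment.
    scores = q @ k.T / d ** 0.5
    causal_mask = torch.triu(torch.ones_like(scores), diagonal=1).bool()
    scores = scores.masked_fill(causal_mask, float("-inf"))
    a_local = torch.softmax(scores, dim=-1) @ v

    # 2) Retrieve long-term context from the compressive memory via linear attention.
    sq = elu_plus_one(q)
    a_mem = (sq @ memory) / (sq @ z).clamp_min(1e-6).unsqueeze(-1)

    # 3) Update the memory with this segment's keys and values
    #    (simple associative update; the paper also describes a delta-rule variant).
    sk = elu_plus_one(k)
    memory = memory + sk.T @ v
    z = z + sk.sum(dim=0)

    # 4) Blend long-term and local attention with a learned gate.
    gate = torch.sigmoid(beta)
    out = gate * a_mem + (1.0 - gate) * a_local
    return out, memory, z

# Usage: stream an arbitrarily long sequence segment by segment while the
# memory state stays a constant size (a d x d matrix plus a d-vector).
d, seg_len = 64, 128
memory, z = torch.zeros(d, d), torch.zeros(d)
beta = torch.tensor(0.0)  # learnable parameter in a real model
for _ in range(4):  # four random segments stand in for an unbounded stream
    q, k, v = (torch.randn(seg_len, d) for _ in range(3))
    out, memory, z = infini_attention_segment(q, k, v, memory, z, beta)
```

The key point the sketch illustrates is that, unlike a growing key-value cache, the state carried across segments has a fixed size, which is what bounds memory and computation regardless of total input length.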