EMNLP 2025 Findings 2024
LongLLaVA: Scaling Multi-modal LLMs to 1000 Images Efficiently via a Hybrid Architecture
Xidong Wang*, Dingjie Song*, Shunian Chen, Chen Zhang, Benyou Wang
Abstract
An efficient hybrid architecture enabling multimodal LLMs to process up to 1000 images.