Eagle is a family of vision-centric, high-resolution multimodal large language models. It supports input resolutions above 1K and performs strongly on tasks such as optical character recognition (OCR) and document understanding.
Multimodal Fusion
Transformers