Vision Language Models: MoE-LLaVA, MOBILE-AGENT, and more

MoE-LLaVA: Mixture of Experts for Large Vision-Language Models

Routers in Vision Mixture of Experts: An Empirical Study

InternLM-XComposer2: Mastering Free-form Text-Image Composition and Comprehension in Vision-Language Large Models

LLaVA-1.6: Improved reasoning, OCR, and world knowledge

MouSi: Poly-Visual-Expert Vision-Language Models https://arxiv.org/pdf/2401.17221.pdf

https://github.com/vikhyat/moondream

https://huggingface.co/LanguageBind/MoE-LLaVA-Phi2-2.7B-4e-384

https://replicate.com/yorickvp/llava-v1.6-mistral-7b

https://qwenlm.github.io/blog/qwen-vl/