Apple-trained AI captions images better than models 10× its size
By Marcus Mendes
Published on March 25, 2026.
Apple researchers have developed a new way to train AI models for image captioning that delivers more accurate, detailed descriptions from smaller models. The study, titled "RubiCap: Rubric-Guided Reinforcement Learning for Dense Image Captioning," is a collaboration between Apple researchers and the University of Wisconsin–Madison.

The team randomly sampled 50,000 images from two training datasets and used existing vision-language models to generate candidate captions for each image. They then scored those captions against rubrics to produce precise, structured feedback on what to fix, which led to more accurate captions without relying on a single "correct" answer. The resulting model outperformed models with up to 72 billion parameters.
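To make the idea concrete, here is a toy sketch of rubric-based scoring, the core intuition behind rubric-guided reward signals: a caption earns credit for each rubric item it covers, rather than being compared against one reference caption. Everything below is illustrative; the rubric items, the example captions, and the simple substring check are invented for this sketch, while the actual paper generates and evaluates rubrics with vision-language models.

```python
# Toy sketch of a rubric-based reward (illustrative only, not the paper's method).
def rubric_reward(caption: str, rubric: list[str]) -> float:
    """Score a caption as the fraction of rubric items it satisfies.

    Here an item is "satisfied" if it appears verbatim (case-insensitively)
    in the caption; a real system would use a learned judge model instead.
    """
    if not rubric:
        return 0.0
    text = caption.lower()
    hits = sum(1 for item in rubric if item.lower() in text)
    return hits / len(rubric)

# Hypothetical rubric for a single image (invented for this example):
rubric = ["golden retriever", "red ball", "grassy park"]

captions = [
    "A dog plays outside.",
    "A golden retriever chases a red ball across a grassy park.",
]
rewards = [rubric_reward(c, rubric) for c in captions]  # → [0.0, 1.0]
```

Because the reward is a graded checklist rather than a match against a single reference, a reinforcement-learning loop can reward partially correct captions proportionally and point training toward the specific details a caption misses.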