MENLO PARK — Facebook AI has announced significant advancements in its automatic alternative text (AAT) technology, a tool designed to generate photo descriptions for users who are blind or visually impaired. Originally launched in 2016, AAT has now been upgraded to recognize and describe over 1,200 concepts in photos, making it more inclusive and detailed than ever before.
The enhanced AAT model is trained with weakly supervised learning on billions of public Instagram images and their hashtags. This approach has allowed Facebook to sharply increase the number of concepts the system can recognize and to improve the accuracy of its descriptions. For example, the system can now identify activities, landmarks, and objects with greater specificity, such as noting the presence of the Leaning Tower of Pisa in a photo or describing a scene with “two people in the center and three others scattered around.”
AAT can now also report where objects sit in the frame and how large they are relative to one another, conveying not just what is in a photo but how its elements are arranged. This allows for more nuanced descriptions for people who rely on screen readers.
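To make the idea of positional and relative-size descriptions concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not Facebook's implementation: the `Detection` class, the thirds-based grid, and the size thresholds are all hypothetical, and the boxes are assumed to be normalized to the range 0 to 1.

```python
from dataclasses import dataclass


@dataclass
class Detection:
    """A hypothetical detected object with a box normalized to [0, 1]."""
    label: str
    x_min: float
    y_min: float
    x_max: float
    y_max: float


def _third(value, names):
    """Map a coordinate in [0, 1] to one of three named bands."""
    if value < 1 / 3:
        return names[0]
    if value < 2 / 3:
        return names[1]
    return names[2]


def describe_position(det):
    """Name the region of the image that contains the box center."""
    cx = (det.x_min + det.x_max) / 2
    cy = (det.y_min + det.y_max) / 2
    horizontal = _third(cx, ("left", "center", "right"))
    vertical = _third(cy, ("top", "middle", "bottom"))
    if horizontal == "center" and vertical == "middle":
        return "center"
    if vertical == "middle":
        return horizontal
    if horizontal == "center":
        return vertical
    return f"{vertical} {horizontal}"


def describe_size(det):
    """Bucket the box by the fraction of the image it covers.

    The 0.25 and 0.05 cutoffs are assumptions chosen for illustration.
    """
    area = (det.x_max - det.x_min) * (det.y_max - det.y_min)
    if area >= 0.25:
        return "large"
    if area >= 0.05:
        return "medium"
    return "small"


def describe(det):
    """Combine label, size, and position into a short phrase."""
    return f"{det.label} ({describe_size(det)}, {describe_position(det)})"


print(describe(Detection("person", 0.4, 0.3, 0.6, 0.7)))
print(describe(Detection("tower", 0.0, 0.0, 0.2, 0.2)))
```

A production system would likely need more than this, for example handling overlapping objects or deciding which objects are worth mentioning at all, but the sketch shows how box coordinates alone can yield phrases such as "in the center" or "top left".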
“Accuracy is paramount for our users who are blind or visually impaired,” said a Facebook AI spokesperson. “We’ve focused on ensuring that the descriptions we generate are as precise and informative as possible, providing users with a richer understanding of the photos in their feeds.”
To ensure the technology meets the needs of its users, Facebook consulted extensively with screen reader users. Their feedback informed the design of AAT, leading to features such as customizable levels of detail, so that a description can be fuller or briefer depending on whether a photo is personally significant to the user.
This latest iteration of AAT reflects Facebook AI’s commitment to using technology to bridge accessibility gaps and improve the online experience for all users. By continuing to refine and expand AAT, Facebook is making strides toward a more inclusive platform where everyone, regardless of visual ability, can engage with and enjoy shared content.