Article | Open Access

Screen Reader AI: A Conversational Web-Accessibility Assistant for Blind and Low-Vision Users

Rushilkumar Patel

Abstract


Blind and low-vision users continue to face significant challenges when interacting with modern, dynamic, and visually complex web applications. Traditional screen readers often fall short when content changes rapidly, in single-page applications, and on pages with intricate layouts. This paper introduces Screen Reader AI, a conversational web-accessibility assistant implemented as a browser extension and designed to provide adaptive, context-rich support for non-visual navigation. Unlike conventional screen readers, Screen Reader AI constructs and continuously updates a live semantic scene graph by integrating the Document Object Model (DOM) and the Accessibility Object Model (AOM). Leveraging multimodal vision-language reasoning powered by GPT-4o, it generates detailed visual interpretations, detects interface structures and interactive elements, and conveys this information through natural, conversational dialogue. This approach allows users to request clarifications, discover relationships between interface components, and receive proactive notifications about dynamic content updates. The system's modular architecture ensures compatibility with evolving AI models and web standards while maintaining an intuitive user interface. Core capabilities include adaptive task guidance; an interactive dashboard with contextual summaries, nested menus, and live feeds; and predictive navigation assistance across diverse content types such as forms and multimedia. An evaluation framework outlines expected improvements in user experience, including reduced task completion times, enhanced understanding of page layouts, and greater autonomy during browsing. Initial findings suggest that conversational interaction can decrease cognitive load by reducing repetitive commands and streamlining information retrieval. Screen Reader AI represents a paradigm shift in digital accessibility: by embedding adaptive intelligence into assistive technology, it empowers independence and inclusivity while making accessibility an integral part of web innovation.
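The live semantic scene graph and its proactive update notifications described in the abstract can be sketched in plain JavaScript. This is a minimal illustrative model, not the system's actual API: the names (`makeNode`, `summarize`, `diffNames`) are assumptions, and a real implementation would populate the graph from the DOM and AOM inside the browser extension rather than from hand-built literals.

```javascript
// Hypothetical sketch of a semantic scene graph node: each node pairs an
// accessibility role with an accessible name, as the AOM would expose them.
function makeNode(role, name, children = []) {
  return { role, name, children };
}

// Flatten the graph into indented lines a conversational assistant could
// read aloud as a contextual page summary.
function summarize(node, depth = 0) {
  const lines = [`${"  ".repeat(depth)}${node.role}: ${node.name}`];
  for (const child of node.children) {
    lines.push(...summarize(child, depth + 1));
  }
  return lines;
}

// Detect dynamic content updates by diffing two snapshots of the graph,
// so the assistant can proactively announce only what changed.
function diffNames(before, after) {
  const old = new Set(summarize(before));
  return summarize(after).filter((line) => !old.has(line));
}

// Example: a dashboard whose live feed gains one new item.
const page = makeNode("document", "Dashboard", [
  makeNode("navigation", "Main menu", [makeNode("link", "Home")]),
  makeNode("main", "Live feed", [makeNode("article", "Post 1")]),
]);
const updated = makeNode("document", "Dashboard", [
  makeNode("navigation", "Main menu", [makeNode("link", "Home")]),
  makeNode("main", "Live feed", [
    makeNode("article", "Post 1"),
    makeNode("article", "Post 2"),
  ]),
]);

console.log(diffNames(page, updated)); // logs only the newly added feed item
```

In the extension itself, the "before" and "after" snapshots would come from observing DOM mutations, keeping the announcement limited to new content rather than re-reading the whole page.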


Keywords


Accessibility, AI Screen Reader, Accessibility Object Model, User Experience, Document Object Model

DOI: https://doi.org/10.52088/ijesty.v5i3.1562



Copyright (c) 2025 Rushilkumar Patel

International Journal of Engineering, Science, and Information Technology (IJESTY) eISSN 2775-2674