Fine-Grained and Continual Visual Recognition for Assisting Visually Impaired People

Time: Tue 2022-11-08 09.00

Location: F3, Lindstedtsvägen 26 & 28, Stockholm

Video link: Zoom link for online defense

Language: English

Subject area: Computer Science

Doctoral student: Marcus Klasson, Robotics, Perception and Learning (RPL)

Opponent: Associate Professor Davide Bacciu, University of Pisa, Computer Science Department

Supervisors: Hedvig Kjellström, Robotics, Perception and Learning (RPL); Cheng Zhang, Microsoft Research, Cambridge, United Kingdom

QC 20221014


In recent years, computer vision-based assistive technologies have enabled visually impaired people to use automatic visual recognition on their mobile phones. These systems should be capable of recognizing objects at a fine-grained level to provide the user with accurate predictions. Additionally, the user should have the option to update the system continually so that it recognizes new objects of interest. However, several challenges must be tackled to enable such features in assistive vision systems operating in real and highly varying environments. For instance, fine-grained image recognition usually requires large amounts of labeled data to be robust. Moreover, image classifiers struggle to retain performance on previously learned tasks when they are adapted to new ones.

This thesis is divided into two parts that address these challenges. First, we focus on the application of assistive vision systems to grocery shopping, where items are naturally structured based on fine-grained details. We demonstrate that image classifiers trained on a combination of natural images and web-scraped information about the groceries achieve more accurate classification than classifiers trained on natural images alone. Thereafter, we introduce a new approach to continual learning called replay scheduling, in which we select which tasks to replay at different times to improve memory retention. Furthermore, we propose a novel framework for learning replay scheduling policies that can generalize to new continual learning scenarios, mitigating catastrophic forgetting in image classifiers.

This thesis provides insights into practical challenges that need to be addressed to enhance the usefulness of computer vision for assisting visually impaired people in real-world scenarios.