Deep Learning Assisted Visual Odometry
Time: Fri 2020-06-12 10.00
Location: Zoom webinar link: (English)
Doctoral student: Jiexiong Tang, Robotics, Perception and Learning (RPL), Centre for Autonomous Systems (CAS)
Opponent: Andrew Davison, Imperial College London
Supervisor: Patric Jensfelt, Signals, Sensors and Systems; Numerical Analysis and Computer Science (NADA); Robotics, Perception and Learning (RPL)
Abstract
The ability to autonomously explore and interact with the environment has always been in great demand for robots. Various sensor-based SLAM methods have been investigated and have served this purpose over the past decades. Vision intuitively provides a 3D understanding of the surroundings and contains a vast amount of information that requires high-level intelligence to interpret. Sensors such as LIDAR return range measurements directly; motion estimation and scene reconstruction using a camera is a harder problem. In this thesis, we are particularly interested in the tracking front-end of vision-based SLAM, i.e. Visual Odometry (VO), with a focus on deep learning approaches. Recently, learning-based methods have come to dominate most vision applications and are gradually appearing in our daily life and in real-world applications. Unlike classical methods, deep learning based methods can potentially tackle some of the intrinsic problems in multi-view geometry and directly improve the performance of crucial components of VO, for example correspondence estimation, dense reconstruction and semantic representation.
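The correspondence estimation mentioned above can be illustrated with a minimal sketch. This is our own illustration, not the method proposed in the thesis: the function name `mutual_nn_matches` and the assumption of L2-normalised descriptors (as produced by most classical and learned extractors alike) are ours.

```python
import numpy as np

def mutual_nn_matches(desc_a, desc_b):
    """Match two sets of L2-normalised descriptors by mutual nearest neighbour.

    desc_a: (N, D) array, desc_b: (M, D) array.
    Returns a (K, 2) array of index pairs (i, j) that pick each other.
    """
    # Dot product equals cosine similarity for L2-normalised descriptors.
    sim = desc_a @ desc_b.T                # (N, M) similarity matrix
    nn_ab = sim.argmax(axis=1)             # best match in B for each A
    nn_ba = sim.argmax(axis=0)             # best match in A for each B
    idx_a = np.arange(desc_a.shape[0])
    mutual = nn_ba[nn_ab] == idx_a         # keep only A -> B -> A round trips
    return np.stack([idx_a[mutual], nn_ab[mutual]], axis=1)
```

The mutual check is a standard way to discard ambiguous matches; whether the descriptors come from a classical detector or a learned network, the matching stage itself is unchanged, which is what makes the two families directly comparable.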
In this work, we propose novel learning schemes for assisting both direct and indirect visual odometry methods. For the direct approaches, we mainly investigate the monocular setup. The lack of a baseline to provide scale, as in stereo, is one of the well-known intrinsic problems in this case. We propose a coupled single-view depth and normal estimation method that reduces scale drift and addresses the lack of observations of the absolute scale. This is achieved by providing priors for the depth optimization. Moreover, we utilize higher-order geometric information to guide the dense reconstruction in a sparse-to-dense manner. For the indirect methods, we propose novel feature-learning-based methods that noticeably improve feature matching performance in comparison with common classical feature detectors and descriptors. Finally, we discuss potential ways to make the training self-supervised. This is accomplished by incorporating differential motion estimation into the training while performing multi-view adaptation to maximize repeatability and matching performance. We also investigate using a different type of supervisory signal for the training: we add a higher-level proxy task and show that it is possible to train a feature extraction network even without an explicit loss for it.
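To make the scale problem concrete, a single-view depth prior can anchor the otherwise arbitrary scale of monocular VO. The sketch below is a hypothetical illustration of that idea, not the optimization proposed in the thesis: the function `align_scale` and the robust median-ratio estimator are our choices for the example.

```python
import numpy as np

def align_scale(vo_depth, prior_depth):
    """Estimate a single global scale aligning up-to-scale monocular VO
    depths with metric depth priors (e.g. from a single-view network).

    vo_depth, prior_depth: per-point depths for the same sparse points.
    Uses the median of per-point ratios, which is robust to outliers.
    """
    vo_depth = np.asarray(vo_depth, dtype=float)
    prior_depth = np.asarray(prior_depth, dtype=float)
    valid = (vo_depth > 0) & (prior_depth > 0)   # ignore invalid depths
    return np.median(prior_depth[valid] / vo_depth[valid])
```

In a full system the prior would enter the depth optimization itself rather than as a one-shot correction, but the sketch shows why even a coarse metric prior suffices to remove the scale ambiguity and limit drift.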
In summary, this thesis presents successful examples of incorporating deep learning techniques to assist a classical visual odometry system. The results are promising and have been extensively evaluated on challenging benchmarks, a real robot and handheld cameras. The problem we investigate is still at an early stage, but is attracting more and more interest from researchers in related fields.