
CAD Model-Based 3D Scene Reconstruction




Abstract

Accurate scene reconstruction from an image or a video is essential for various applications in robotics and augmented reality. One common method involves retrieving the best-matching CAD model for each observed object from a database and aligning it with the corresponding input. This technique yields a CAD model-based 3D scene representation that is compact, contains realistic shapes, and is well-suited for a wide range of downstream tasks. This thesis addresses the challenges of deriving such a representation by answering four key research questions. First, we investigate how to retrieve and align a CAD model from a database for an object detected in an image, assuming that an exact match exists for the detected object. We show that retrieving CAD model renders from an embedding space and predicting cross-domain keypoint correspondences between the render and the input image enables accurate alignments. Next, we tackle the problem of adapting CAD models when their shapes do not perfectly match the observed objects. Here, we show that the established keypoint correspondences can be used not only to align the CAD model but also to modify its shape, thereby better representing a wider range of object shapes. The third challenge involves accurately aligning retrieved CAD models when discrepancies exist between their shapes and the observed objects. To this end, we introduce a learned render-and-compare framework for CAD model-based scene reconstruction. In this framework, a neural network receives two input streams (information about the observed image, and the CAD model rendered in an initial pose) and is trained to iteratively refine the object’s pose. This method yields significantly more accurate alignments than existing approaches, and it improves further when alignments for multiple objects are predicted jointly, leveraging regularities in the natural arrangement of objects in indoor scenes.
Finally, we focus on achieving efficient, real-time CAD model-based scene reconstruction. For this purpose, we train a neural network to predict CAD model retrieval and alignment simultaneously and jointly for all objects present in a scene. This method reduces inference time by a factor of 50 compared with existing techniques. It can process both input point clouds and RGB videos, enabling real-time performance at 10 frames per second.
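The abstract does not specify how retrieval or alignment is implemented, so the following is only a rough, hypothetical illustration of the two generic steps it names. It assumes (i) each CAD model render is represented by a precomputed embedding vector and retrieval is a cosine nearest-neighbor lookup, and (ii) the keypoint correspondences have already been lifted to 3D-3D point pairs (for example, with a depth estimate), so that a scale, rotation, and translation can be fit in closed form with the Umeyama least-squares method. The function names `retrieve_nearest` and `umeyama_similarity`, the Umeyama solver itself, and all data are stand-ins chosen for illustration; none of them is taken from the thesis.

```python
import numpy as np


def retrieve_nearest(query: np.ndarray, database: np.ndarray) -> int:
    """Return the row index in `database` with the highest cosine similarity to `query`.

    Hypothetical stand-in for embedding-space retrieval: `database` holds one
    precomputed embedding per CAD model render, shape (M, D); `query` has shape (D,).
    """
    q = query / np.linalg.norm(query)
    db = database / np.linalg.norm(database, axis=1, keepdims=True)
    return int(np.argmax(db @ q))


def umeyama_similarity(src: np.ndarray, dst: np.ndarray):
    """Closed-form least-squares similarity transform (Umeyama, 1991).

    Finds scale s, rotation R, translation t minimising sum ||dst_i - (s R src_i + t)||^2
    for corresponding (N, 3) point sets. Assumes correspondences are already 3D-3D.
    """
    n = src.shape[0]
    mu_src, mu_dst = src.mean(axis=0), dst.mean(axis=0)
    src_c, dst_c = src - mu_src, dst - mu_dst
    cov = dst_c.T @ src_c / n                      # 3x3 cross-covariance
    U, S, Vt = np.linalg.svd(cov)
    D = np.eye(3)
    if np.linalg.det(U) * np.linalg.det(Vt) < 0:   # avoid returning a reflection
        D[2, 2] = -1.0
    R = U @ D @ Vt
    var_src = (src_c ** 2).sum() / n
    s = np.trace(np.diag(S) @ D) / var_src
    t = mu_dst - s * R @ mu_src
    return s, R, t


# Usage on synthetic data (all values hypothetical).
rng = np.random.default_rng(0)
database = rng.normal(size=(100, 16))
query = database[42] + 0.01 * rng.normal(size=16)
idx = retrieve_nearest(query, database)

cad_kps = rng.normal(size=(8, 3))                  # keypoints on the retrieved CAD model
theta = np.deg2rad(30.0)
R_true = np.array([[np.cos(theta), -np.sin(theta), 0.0],
                   [np.sin(theta),  np.cos(theta), 0.0],
                   [0.0, 0.0, 1.0]])
s_true, t_true = 1.5, np.array([0.2, -0.1, 2.0])
obs_kps = s_true * cad_kps @ R_true.T + t_true     # matching "observed" 3D keypoints
s, R, t = umeyama_similarity(cad_kps, obs_kps)
```

A single closed-form solve keeps the sketch short. The render-and-compare framework described above instead refines the pose iteratively with a learned network, which this example does not attempt to reproduce.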


Date

2024-09-26

Advisors

Cipolla, Roberto

Qualification

Doctor of Philosophy (PhD)

Awarding Institution

University of Cambridge

Rights and licensing

Except where otherwise noted, this item's license is described as: All rights reserved.