On Neural Representations for Point Cloud-based 3D Shape and Motion Modeling
Overview
This is my doctoral thesis on neural representations for point cloud-based applications.
Abstract
3D scene representations are an important consideration for many computer vision and computer graphics applications. While traditional representations such as point clouds, triangle meshes, and voxel grids have been researched and used for decades, the advent of neural networks and deep learning has led to the continuous development of new scene representations. Each of these comes with its own benefits and drawbacks, and the choice often depends on the specific task at hand. While point clouds are easy to obtain (e.g., from depth sensors) and form the backbone of many computer vision tasks, neural representations offer new advantages, such as being lightweight and providing data-driven, natural interpolation of scene properties. Using neural representations in point cloud-based applications raises new challenges, such as how to efficiently extract detailed geometry from large-scale point clouds, how to estimate the scene flow between two sets of points, or how to track human poses from point scans, to name a few. This thesis addresses some of these challenges, focusing on four main tasks: surface reconstruction of large point clouds, meshing of neural implicit representations, human body modeling, and scene flow estimation. First, we examine neural implicit functions as a scene representation for static surface reconstruction from point clouds, with a particular focus on large-scale scenes. Second, we observe that these neural implicit representations require an additional post-processing step to extract the final iso-surface, and we develop a novel meshing approach to address this. Next, we move from generic scenes to the problem of human body modeling and propose a representation that disentangles pose from shape while keeping the pose controllable.
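As a minimal illustration of the core idea (not any method from the thesis), a neural implicit function maps a 3D point to a signed distance, and the surface is its zero level set. The sketch below uses an analytic sphere SDF as a hypothetical stand-in for a trained network:

```python
# Hedged sketch: a signed distance function (SDF) as an implicit surface
# representation. sphere_sdf stands in for a trained network f_theta(x).
import numpy as np

def sphere_sdf(x, radius=0.5):
    # Signed distance to a sphere of the given radius centered at the origin:
    # negative inside, positive outside, zero exactly on the surface.
    return np.linalg.norm(x, axis=-1) - radius

pts = np.array([[0.0, 0.0, 0.0],   # inside
                [1.0, 0.0, 0.0],   # outside
                [0.5, 0.0, 0.0]])  # on the surface
print(sphere_sdf(pts))  # → [-0.5  0.5  0. ]
```

Meshing approaches such as the one developed in the thesis turn this continuous zero level set into an explicit triangle mesh as a post-processing step.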
Finally, we shift our focus to scene representations in the context of motion, in particular scene flow estimation, and propose a method that disentangles camera motion from non-rigid scene flow.
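To illustrate the decomposition in spirit (this is the classic Kabsch/Procrustes alignment, not the thesis method): given corresponding points from two frames, one can fit the best rigid transform to explain the camera motion, leaving the residual as the non-rigid flow component. All data below is synthetic.

```python
# Hedged sketch: separate rigid (camera) motion from residual non-rigid flow
# between two corresponding point sets via Kabsch alignment.
import numpy as np

def kabsch(src, dst):
    """Best-fit rotation R and translation t such that dst ≈ src @ R.T + t."""
    c_src, c_dst = src.mean(axis=0), dst.mean(axis=0)
    H = (src - c_src).T @ (dst - c_dst)          # 3x3 cross-covariance
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))       # avoid reflections
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = c_dst - R @ c_src
    return R, t

rng = np.random.default_rng(0)
p1 = rng.normal(size=(100, 3))

# Simulate a rigid camera motion plus a small non-rigid deformation.
a = 0.3
R_true = np.array([[np.cos(a), -np.sin(a), 0.0],
                   [np.sin(a),  np.cos(a), 0.0],
                   [0.0,        0.0,       1.0]])
non_rigid = 0.01 * rng.normal(size=(100, 3))
p2 = p1 @ R_true.T + np.array([0.1, 0.0, 0.0]) + non_rigid

R, t = kabsch(p1, p2)
residual_flow = p2 - (p1 @ R.T + t)  # what the rigid fit cannot explain
print(np.abs(residual_flow).max())   # small: dominated by the non-rigid part
```

The residual is on the order of the injected non-rigid deformation, showing that the rigid fit absorbs the simulated camera motion.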