Strategies to address the false-negative pairing problem in contrastive learning

Timoteos Onur Özçelik, Berk Gökberk, Lale Akarun

Signal, Image and Video Processing

Abstract

The availability of ShapeNet, a dataset with vast numbers of 3D objects, has led to the development of successful 3D reconstruction models. However, evaluating these models on similar datasets whose characteristics closely mirror ShapeNet's is often misleading. We propose a novel benchmark to tackle this assessment problem. To demonstrate its effectiveness, we selected three state-of-the-art models for comparison: the voxel-based 3D-C2FT and Pix2Vox, and the occupancy-function-based Occupancy Networks. We adapted a novel dataset, 3DCoMPaT++, which offers rich material and part annotations, for the evaluation of 3D reconstructions. We assessed reconstruction performance while changing viewpoints and varying styles in the 2D input images. The results show that the models struggle to adapt to these novel settings. We also evaluated the models at the part level to identify the most challenging parts, and we propose the Part F1-Score@0.01 metric for this evaluation. Our experiments show quantitatively that performance degrades drastically and that the methods perform poorly on finer details and thin parts.
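To make the metric name concrete, the following is a minimal sketch of how an F1 score at a distance threshold of 0.01 might be computed between a reconstructed and a ground-truth point set, and how it could be broken down by part. The abstract does not define Part F1-Score@0.01, so everything here is an assumption: the function names `f1_at_threshold` and `part_f1` are hypothetical, the point-set formulation follows the common F-score convention (precision and recall of nearest-neighbour distances under a threshold), and assigning each predicted point to the part of its nearest ground-truth point is one possible choice, not necessarily the authors'.

```python
import math


def f1_at_threshold(pred, gt, tau=0.01):
    """F1 between two 3D point sets at distance threshold tau (assumed definition)."""
    if not pred or not gt:
        return 0.0

    def frac_within(src, dst):
        # Fraction of points in src whose nearest neighbour in dst lies within tau.
        hits = sum(1 for p in src if min(math.dist(p, q) for q in dst) <= tau)
        return hits / len(src)

    precision = frac_within(pred, gt)  # predicted points close to the ground truth
    recall = frac_within(gt, pred)     # ground-truth points covered by the prediction
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def part_f1(pred, gt_by_part, tau=0.01):
    """Per-part F1 (hypothetical): each predicted point is assigned to the
    part of its nearest ground-truth point, then F1 is computed per part."""
    labelled = [(q, name) for name, pts in gt_by_part.items() for q in pts]
    if not labelled:
        return {name: 0.0 for name in gt_by_part}

    pred_by_part = {name: [] for name in gt_by_part}
    for p in pred:
        _, name = min(labelled, key=lambda item: math.dist(p, item[0]))
        pred_by_part[name].append(p)

    return {
        name: f1_at_threshold(pred_by_part[name], gt_by_part[name], tau)
        for name in gt_by_part
    }


if __name__ == "__main__":
    # Toy data: the "seat" point is reconstructed closely, the "leg" point is not.
    pred = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
    gt_by_part = {"seat": [(0.0, 0.0, 0.005)], "leg": [(1.0, 0.0, 0.5)]}
    gt = [q for pts in gt_by_part.values() for q in pts]
    print(f1_at_threshold(pred, gt))
    print(part_f1(pred, gt_by_part))
```

The brute-force nearest-neighbour search keeps the sketch dependency-free; for realistic point counts a spatial index such as a k-d tree would be the more practical choice. A per-part breakdown of this kind is what would let thin or fine parts show up as low-scoring even when the whole-object score looks acceptable.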