Self-Supervised Dense Visual Representation Learning
Timoteos Onur Özçelik, Berk Gökberk, Lale Akarun
2024 32nd Signal Processing and Communications Applications Conference (SIU)
Abstract
Self-supervised representation learning has shown promising results in recent years. However, most proposed methods are pre-trained on object-centric datasets with image-level pretext tasks. In this study, we build on DenseCL, which performs pixel-level contrastive pre-training on scene-centric datasets. Our goal is to alleviate the false-negative pairing problem in contrastive learning through consistency regularization. Our method outperforms the DenseCL and PixContrast models in most scenarios. In PASCAL VOC object detection, we observe 0.2% AP50 and 0.3% AP improvements. In COCO object detection, we obtain 0.3% AP and 0.7% AP gains. We also improve by 0.4% AP and 0.6% AP in COCO instance segmentation, and by 0.1% mAP and 0.9% mAP in PASCAL VOC semantic segmentation. Moreover, attention map visualization and k-nearest neighbour retrieval indicate qualitative improvements from the proposed method.
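To make the pixel-level contrastive objective concrete, the following is a minimal NumPy sketch of a dense InfoNCE loss of the kind DenseCL-style methods optimize: each spatial feature in one augmented view is pulled toward its matched feature in the other view, while all other features act as negatives. The function name `dense_info_nce`, the temperature value, and the flattened `(n, d)` layout are illustrative assumptions; the paper's specific consistency-regularization term for suppressing false negatives is not reproduced here.

```python
import numpy as np

def dense_info_nce(q, k, tau=0.2):
    """Dense InfoNCE over matched pixel features (illustrative sketch).

    q, k : (n, d) L2-normalized dense features from two augmented views;
           row i of q is the positive pair of row i of k. All other rows
           of k serve as negatives for q[i] - including the potential
           false negatives the paper's consistency regularizer targets.
    """
    logits = q @ k.T / tau                                 # (n, n) similarities
    logits -= logits.max(axis=1, keepdims=True)            # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_prob))                     # positives on diagonal
```

With correctly matched views the diagonal similarities dominate and the loss is small; shuffling the pairing (a proxy for wrong correspondences) raises it, which is why mismatched "false negative" pixels degrade the learned representation.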