Self-Supervised Dense Visual Representation Learning
Timoteos Onur Özçelik, Berk Gökberk, Lale Akarun
2024 32nd Signal Processing and Communications Applications Conference (SIU)
Abstract
Self-supervised representation learning has shown promising results in recent years. However, most proposed methods are pre-trained on object-centric datasets with image-level pretext tasks. In this study, we build on DenseCL, which performs pixel-level contrastive pre-training on scene-centric datasets. Our goal is to alleviate the false-negative pairing problem in contrastive learning through consistency regularization. Our method outperforms the DenseCL and PixContrast models in most scenarios. In PASCAL VOC object detection, we observe 0.2% AP50 and 0.3% AP improvements. In COCO object detection, we obtain 0.3% AP and 0.7% AP gains. We also improve by 0.4% AP and 0.6% AP in COCO instance segmentation, and by 0.1% mAP and 0.9% mAP in PASCAL VOC semantic segmentation. Moreover, attention map visualization and k-nearest neighbour retrieval indicate qualitative improvements from the proposed method.
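To make the pixel-level contrastive objective concrete, the following is a minimal NumPy sketch of a dense InfoNCE loss of the kind DenseCL-style methods optimize: each spatial feature in one augmented view is pulled toward its matched feature in the other view, while all other features act as negatives. The function name `dense_info_nce`, the temperature value, and the flattened `(n, d)` layout are illustrative assumptions; the paper's specific consistency-regularization term for suppressing false negatives is not reproduced here.

```python
import numpy as np

def dense_info_nce(q, k, tau=0.2):
    """Dense InfoNCE over matched pixel features (illustrative sketch).

    q, k : (n, d) L2-normalized dense features from two augmented views;
           row i of q is the positive pair of row i of k. All other rows
           of k serve as negatives for q[i] - including the potential
           false negatives the paper's consistency regularizer targets.
    """
    logits = q @ k.T / tau                                 # (n, n) similarities
    logits -= logits.max(axis=1, keepdims=True)            # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_prob))                     # positives on diagonal
```

With correctly matched views the diagonal similarities dominate and the loss is small; shuffling the pairing (a proxy for wrong correspondences) raises it, which is why mismatched "false negative" pixels degrade the learned representation.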