Adaptive Depth-converted-Scale Convolution for Self-supervised Monocular Depth Estimation

arXiv:2604.07665v1 Self-supervised monocular depth estimation (MDE) has received increasing interest in the last few years. The objects in a scene, including their sizes and the relationships among them, are the main cues for extracting the scene structure. However, previous works do not explicitly handle the change in an object's apparent size as its depth changes. In a monocular video especially, the apparent size of the same object changes continuously, resulting in size and depth ambiguity.