HarvestFlex: Strawberry Harvesting via Vision-Language-Action Policy Adaptation in the Wild

arXiv:2603.05982v1 Announce Type: cross This work presents the first study on transferring vision-language-action (VLA) policies to real greenhouse tabletop strawberry harvesting, a long-horizon, unstructured task challenged by occlusion and specular reflections. We built an end-to-end closed-loop system on the HarvestFlex platform using three-view RGB sensing (two fixed scene views plus a wrist-mounted view) and intentionally avoided depth clouds and explicit geometric calibration.