Correctable Landmark Discovery via Large Models for Vision-Language Navigation

Summary

Vision-Language Navigation (VLN) requires an agent to align the landmarks mentioned in an instruction with its visual observations. This paper proposes CONSOLE, a new paradigm that treats VLN as an open-world...
