Correctable Landmark Discovery via Large Models for Vision-Language Navigation
Summary
Vision-Language Navigation (VLN) requires the agent to align the landmarks mentioned in the instruction with its visual observations. This paper proposes CONSOLE, a new paradigm that treats VLN as an open-world...