Ferret-UI: Grounded Mobile UI Understanding with Multimodal LLMs
Summary
Recent advancements in multimodal large language models (MLLMs) have been noteworthy, yet these general-domain MLLMs often fall short in their ability to comprehend and interact effectively with...