community[patch]: Update model client to support vision model in Tong… (#21474)

- **Description:** Tongyi uses different clients for chat models and vision models. This PR chooses the proper client based on the model name, so that both chat models and vision models are supported. See the [Tongyi documentation](https://help.aliyun.com/zh/dashscope/developer-reference/tongyi-qianwen-vl-plus-api?spm=a2c4g.11186623.0.0.27404c9a7upm11) for details.

```
from langchain_core.messages import HumanMessage
from langchain_community.chat_models import ChatTongyi

llm = ChatTongyi(model_name='qwen-vl-max')
image_message = {
    "image": "https://lilianweng.github.io/posts/2023-06-23-agent/agent-overview.png"
}
text_message = {
    "text": "summarize this picture",
}
message = HumanMessage(content=[text_message, image_message])
llm.invoke([message])
```

- **Issue:** None
- **Dependencies:** None
- **Twitter handle:** None
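
The client selection described above can be sketched in isolation. This is a minimal illustration, not the code in `tongyi.py`: the two classes are hypothetical stand-ins for `dashscope.Generation` and `dashscope.MultiModalConversation`, so the snippet runs without `dashscope` installed, and `select_client` is a helper name invented for this example.

```python
class Generation:
    """Hypothetical stand-in for dashscope.Generation (text chat client)."""


class MultiModalConversation:
    """Hypothetical stand-in for dashscope.MultiModalConversation (vision client)."""


def select_client(model_name: str) -> type:
    """Pick a client class using the same substring rule as the patch."""
    # Any model name containing "vl" is treated as a vision model.
    if "vl" in model_name:
        return MultiModalConversation
    # Everything else falls back to the text chat client.
    return Generation


print(select_client("qwen-vl-max").__name__)
print(select_client("qwen-turbo").__name__)
```

Because the rule is a plain substring match, any model whose name happens to contain `vl` would be routed to the vision client, which is worth keeping in mind if other model names are added later.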
2025-08-06 19:48:26 +00:00 · 2024-05-22 02:58:27 +08:00
commit 4cf523949a
parent 98b64f3ae3
3 changed files with 67 additions and 1 deletions
--- a/docs/docs/integrations/chat/tongyi.ipynb
+++ b/docs/docs/integrations/chat/tongyi.ipynb
@@ -265,6 +265,50 @@
    "ai_message = chatLLM.bind(**llm_kwargs).invoke(messages)\n",
    "ai_message"
   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Tongyi With Vision\n",
+    "Qwen-VL(qwen-vl-plus/qwen-vl-max) are models that can process images."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 2,
+   "metadata": {},
+   "outputs": [
+    {
+     "data": {
+      "text/plain": [
+       "AIMessage(content=[{'text': 'The image presents a flowchart of an artificial intelligence system. The system is divided into two main components: short-term memory and long-term memory, which are connected to the \"Memory\" box.\\n\\nFrom the \"Memory\" box, there are three branches leading to different functionalities:\\n\\n1. \"Tools\" - This branch represents various tools that the AI system can utilize, including \"Calendar()\", \"Calculator()\", \"CodeInterpreter()\", \"Search()\" and others not explicitly listed.\\n\\n2. \"Action\" - This branch represents the action taken by the AI system based on its processing of information. It\\'s connected to both the \"Tools\" and the \"Agent\" boxes.\\n\\n3. \"Planning\" - This branch represents the planning process of the AI system, which involves reflection, self-critics, chain of thoughts, subgoal decomposition, and other processes not shown.\\n\\nThe central component of the system is the \"Agent\" box, which seems to orchestrate the flow of information between the different components. The \"Agent\" interacts with the \"Tools\" and \"Memory\" boxes, suggesting it plays a crucial role in the AI\\'s decision-making process. \\n\\nOverall, the image depicts a complex and interconnected artificial intelligence system, where different components work together to process information, make decisions, and take actions.'}], response_metadata={'model_name': 'qwen-vl-max', 'finish_reason': 'stop', 'request_id': '6a2b9e90-7c3b-960d-8a10-6a0cf9991ae5', 'token_usage': {'input_tokens': 1262, 'output_tokens': 260, 'image_tokens': 1232}}, id='run-fd030661-c734-4580-b977-b77d42680742-0')"
+      ]
+     },
+     "execution_count": 2,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
+   "source": [
+    "from langchain_community.chat_models import ChatTongyi\n",
+    "from langchain_core.messages import HumanMessage\n",
+    "\n",
+    "chatLLM = ChatTongyi(model_name=\"qwen-vl-max\")\n",
+    "image_message = {\n",
+    "    \"image\": \"https://lilianweng.github.io/posts/2023-06-23-agent/agent-overview.png\",\n",
+    "}\n",
+    "text_message = {\n",
+    "    \"text\": \"summarize this picture\",\n",
+    "}\n",
+    "message = HumanMessage(content=[text_message, image_message])\n",
+    "chatLLM.invoke([message])"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": []
  }
 ],
 "metadata": {
--- a/libs/community/langchain_community/chat_models/tongyi.py
+++ b/libs/community/langchain_community/chat_models/tongyi.py
@@ -281,6 +281,9 @@ class ChatTongyi(BaseChatModel):
                "Please install it with `pip install dashscope --upgrade`."
            )
        try:
-            values["client"] = dashscope.Generation
+            if "vl" in values["model_name"]:
+                values["client"] = dashscope.MultiModalConversation
+            else:
+                values["client"] = dashscope.Generation
        except AttributeError:
            raise ValueError(
--- a/libs/community/tests/integration_tests/chat_models/test_tongyi.py
+++ b/libs/community/tests/integration_tests/chat_models/test_tongyi.py
@@ -77,6 +77,25 @@ def test_model() -> None:
    assert isinstance(response.content, str)


+def test_vision_model() -> None:
+    """Test model kwarg works."""
+    chat = ChatTongyi(model="qwen-vl-max")  # type: ignore[call-arg]
+    response = chat.invoke(
+        [
+            HumanMessage(
+                content=[
+                    {
+                        "image": "https://python.langchain.com/v0.1/assets/images/run_details-806f6581cd382d4887a5bc3e8ac62569.png"
+                    },
+                    {"text": "Summarize the image"},
+                ]
+            )
+        ]
+    )
+    assert isinstance(response, BaseMessage)
+    assert isinstance(response.content, list)
+
+
 def test_functions_call_thoughts() -> None:
    chat = ChatTongyi(model="qwen-plus")  # type: ignore[call-arg]