doc: Implement langchain-xinference (#30296)

- [ ] **PR title**: Implement langchain-xinference

- [ ] **PR message**: 
Implement a standalone package for Xinference chat models and LLMs.

https://github.com/langchain-ai/langchain/issues/30045#issue-2887214214

@ -0,0 +1,245 @@
{
"cells": [
{
"cell_type": "raw",
"metadata": {},
"source": [
"---\n",
"sidebar_label: Xinference\n",
"---"
]
},
{
"cell_type": "markdown",
"id": "01ca90da",
"metadata": {},
"source": [
"# ChatXinference\n",
"\n",
"[Xinference](https://github.com/xorbitsai/inference) is a powerful and versatile library designed to serve LLMs, \n",
"speech recognition models, and multimodal models, even on your laptop. It supports a variety of models compatible with GGML, such as chatglm, baichuan, whisper, vicuna, orca, and many others.\n",
"\n",
"## Overview\n",
"### Integration details\n",
"\n",
"| Class | Package | Local | Serializable | [JS support] | Package downloads | Package latest |\n",
"| :--- | :--- | :---: | :---: | :---: | :---: | :---: |\n",
"| ChatXinference| langchain-xinference | ✅ | ❌ | ✅ | ✅ | ✅ |\n",
"\n",
"### Model features\n",
"| [Tool calling](/docs/how_to/tool_calling/) | [Structured output](/docs/how_to/structured_output/) | JSON mode | [Image input](/docs/how_to/multimodal_inputs/) | Audio input | Video input | [Token-level streaming](/docs/how_to/chat_streaming/) | Native async | [Token usage](/docs/how_to/chat_token_usage_tracking/) | [Logprobs](/docs/how_to/logprobs/) |\n",
"| :---: |:----------------------------------------------------:| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |\n",
"| ✅ | ✅ | ✅ | ❌ | ❌ | ❌ | ✅ | ✅ | ❌ | ❌ |\n",
"\n",
"## Setup\n",
"\n",
"Install `Xinference` through PyPI:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"%pip install --upgrade --quiet \"xinference[all]\""
]
},
{
"cell_type": "markdown",
"id": "7fb27b941602401d91542211134fc71a",
"metadata": {},
"source": [
"### Deploy Xinference Locally or in a Distributed Cluster.\n",
"\n",
"For local deployment, run `xinference`. \n",
"\n",
"To deploy Xinference in a cluster, first start an Xinference supervisor using the `xinference-supervisor`. You can also use the option -p to specify the port and -H to specify the host. The default port is 8080 and the default host is 0.0.0.0.\n",
"\n",
"Then, start the Xinference workers using `xinference-worker` on each server you want to run them on. \n",
"\n",
"You can consult the README file from [Xinference](https://github.com/xorbitsai/inference) for more information.\n",
"### Wrapper\n",
"\n",
"To use Xinference with LangChain, you need to first launch a model. You can use command line interface (CLI) to do so:"
]
},
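{
"cell_type": "markdown",
"metadata": {},
"source": [
"For example, a minimal cluster setup might look like the following sketch (the supervisor host and port are placeholders for your own environment):"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# On the supervisor machine (replace host and port with your own values)\n",
"!xinference-supervisor -H 0.0.0.0 -p 8080\n",
"\n",
"# On each worker machine, point the worker at the supervisor endpoint\n",
"!xinference-worker -e \"http://supervisor_host:8080\""
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Consult the README of [Xinference](https://github.com/xorbitsai/inference) for more information.\n",
"\n",
"### Wrapper\n",
"\n",
"To use Xinference with LangChain, you first need to launch a model. You can use the command-line interface (CLI) to do so:"
]
},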
{
"cell_type": "code",
"execution_count": 13,
"id": "acae54e37e7d407bbb7b55eff062a284",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Model uid: 7167b2b0-2a04-11ee-83f0-d29396a3f064\n"
]
}
],
"source": [
"%xinference launch -n vicuna-v1.3 -f ggmlv3 -q q4_0"
]
},
{
"cell_type": "markdown",
"id": "9a63283cbaf04dbcab1f6479b197f3a8",
"metadata": {},
"source": [
"A model UID is returned for you to use. Now you can use Xinference with LangChain:"
]
},
{
"cell_type": "markdown",
"id": "8dd0d8092fe74a7c96281538738b07e2",
"metadata": {},
"source": [
"## Installation\n",
"\n",
"The LangChain Xinference integration lives in the `langchain-xinference` package:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "72eea5119410473aa328ad9291626812",
"metadata": {},
"outputs": [],
"source": [
"%pip install -qU langchain-xinference"
]
},
{
"cell_type": "markdown",
"id": "8edb47106e1a46a883d545849b8ab81b",
"metadata": {},
"source": [
"Make sure you're using the latest Xinference version for structured outputs."
]
},
{
"cell_type": "markdown",
"id": "55935996",
"metadata": {},
"source": [
"## Instantiation\n",
"\n",
"Now we can instantiate our model object and generate chat completions:\n"
]
},
{
"cell_type": "code",
"execution_count": 9,
"id": "10185d26023b46108eb7d9f57d49d2b3",
"metadata": {},
"outputs": [],
"source": [
"from langchain_xinference.chat_models import ChatXinference\n",
"\n",
"llm = ChatXinference(\n",
" server_url=\"your_server_url\", model_uid=\"7167b2b0-2a04-11ee-83f0-d29396a3f064\"\n",
")\n",
"\n",
"llm.invoke(\n",
" \"Q: where can we visit in the capital of France?\",\n",
" config={\"max_tokens\": 1024},\n",
")"
]
},
{
"cell_type": "markdown",
"id": "8763a12b2bbd4a93a75aff182afb95dc",
"metadata": {},
"source": [
"## Invocation"
]
},
{
"cell_type": "code",
"execution_count": 10,
"id": "50f6c0e4",
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"from langchain_core.messages import HumanMessage, SystemMessage\n",
"from langchain_xinference.chat_models import ChatXinference\n",
"\n",
"llm = ChatXinference(\n",
" server_url=\"your_server_url\", model_uid=\"7167b2b0-2a04-11ee-83f0-d29396a3f064\"\n",
")\n",
"\n",
"system_message = \"You are a helpful assistant that translates English to French. Translate the user sentence.\"\n",
"human_message = \"I love programming.\"\n",
"\n",
"llm.invoke([HumanMessage(content=human_message), SystemMessage(content=system_message)])"
]
},
{
"cell_type": "markdown",
"id": "7623eae2785240b9bd12b16a66d81610",
"metadata": {},
"source": [
"## Chaining\n",
"\n",
"We can [chain](/docs/how_to/sequence/) our model with a prompt template like so:"
]
},
{
"cell_type": "code",
"execution_count": 12,
"id": "317d05c6",
"metadata": {},
"outputs": [],
"source": [
"from langchain.prompts import PromptTemplate\n",
"from langchain_xinference.chat_models import ChatXinference\n",
"\n",
"prompt = PromptTemplate(\n",
" input=[\"country\"], template=\"Q: where can we visit in the capital of {country}? A:\"\n",
")\n",
"\n",
"llm = ChatXinference(\n",
" server_url=\"your_server_url\", model_uid=\"7167b2b0-2a04-11ee-83f0-d29396a3f064\"\n",
")\n",
"\n",
"chain = prompt | llm\n",
"chain.invoke(input={\"country\": \"France\"})\n",
"chain.stream(input={\"country\": \"France\"})"
]
},
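{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Structured output\n",
"\n",
"The following is a minimal sketch of structured output via LangChain's standard `with_structured_output` interface, assuming `ChatXinference` supports it; the schema and model UID are placeholders:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from pydantic import BaseModel\n",
"\n",
"from langchain_xinference.chat_models import ChatXinference\n",
"\n",
"\n",
"class Joke(BaseModel):\n",
"    \"\"\"A joke with a setup and a punchline.\"\"\"\n",
"\n",
"    setup: str\n",
"    punchline: str\n",
"\n",
"\n",
"llm = ChatXinference(\n",
"    server_url=\"your_server_url\", model_uid=\"7167b2b0-2a04-11ee-83f0-d29396a3f064\"\n",
")\n",
"\n",
"# Standard LangChain structured-output interface (assumed supported here)\n",
"structured_llm = llm.with_structured_output(Joke)\n",
"structured_llm.invoke(\"Tell me a joke about programming\")"
]
},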
{
"cell_type": "markdown",
"id": "66f7fb26",
"metadata": {},
"source": [
"## API reference\n",
"\n",
"For detailed documentation of all ChatXinference features and configurations head to the API reference: https://github.com/TheSongg/langchain-xinference"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.12.4"
}
},
"nbformat": 4,
"nbformat_minor": 5
}

@ -100,3 +100,22 @@ For more information and detailed examples, refer to the
Xinference also supports embedding queries and documents. See
[example for xinference embeddings](/docs/integrations/text_embedding/xinference)
for a more detailed demo.
### Install the Xinference LangChain partner package
Install the integration package with:
```bash
pip install langchain-xinference
```
## Chat Models
```python
from langchain_xinference.chat_models import ChatXinference
```
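A minimal usage sketch (`server_url` and `model_uid` are placeholders for your own Xinference deployment):
```python
from langchain_core.messages import HumanMessage

from langchain_xinference.chat_models import ChatXinference

chat = ChatXinference(server_url="your_server_url", model_uid="your_model_uid")
chat.invoke([HumanMessage(content="Hello!")])
```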
## LLM
```python
from langchain_xinference.llms import Xinference
```
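A minimal usage sketch for the completion-style interface (same placeholders as above):
```python
from langchain_xinference.llms import Xinference

llm = Xinference(server_url="your_server_url", model_uid="your_model_uid")
llm.invoke("Q: What can we visit in the capital of France? A:")
```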

@ -516,3 +516,7 @@ packages:
- name: langchain-agentql
path: langchain
repo: tinyfish-io/agentql-integrations
- name: langchain-xinference
path: .
repo: TheSongg/langchain-xinference