doc: Implement langchain-xinference (#30296)

- [ ] **PR title**: Implement langchain-xinference

- [ ] **PR message**: 
Implement a standalone package for Xinference chat models and LLMs.

https://github.com/langchain-ai/langchain/issues/30045#issue-2887214214

@ -0,0 +1,245 @@
{
"cells": [
{
"cell_type": "raw",
"metadata": {},
"source": [
"---\n",
"sidebar_label: Xinference\n",
"---"
]
},
{
"cell_type": "markdown",
"id": "01ca90da",
"metadata": {},
"source": [
"# ChatXinference\n",
"\n",
"[Xinference](https://github.com/xorbitsai/inference) is a powerful and versatile library designed to serve LLMs, \n",
"speech recognition models, and multimodal models, even on your laptop. It supports a variety of models compatible with GGML, such as chatglm, baichuan, whisper, vicuna, orca, and many others.\n",
"\n",
"## Overview\n",
"### Integration details\n",
"\n",
"| Class | Package | Local | Serializable | [JS support] | Package downloads | Package latest |\n",
"| :--- | :--- | :---: | :---: | :---: | :---: | :---: |\n",
"| ChatXinference| langchain-xinference | ✅ | ❌ | ✅ | ✅ | ✅ |\n",
"\n",
"### Model features\n",
"| [Tool calling](/docs/how_to/tool_calling/) | [Structured output](/docs/how_to/structured_output/) | JSON mode | [Image input](/docs/how_to/multimodal_inputs/) | Audio input | Video input | [Token-level streaming](/docs/how_to/chat_streaming/) | Native async | [Token usage](/docs/how_to/chat_token_usage_tracking/) | [Logprobs](/docs/how_to/logprobs/) |\n",
"| :---: |:----------------------------------------------------:| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |\n",
"| ✅ | ✅ | ✅ | ❌ | ❌ | ❌ | ✅ | ✅ | ❌ | ❌ |\n",
"\n",
"## Setup\n",
"\n",
"Install `Xinference` through PyPI:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"%pip install --upgrade --quiet \"xinference[all]\""
]
},
{
"cell_type": "markdown",
"id": "7fb27b941602401d91542211134fc71a",
"metadata": {},
"source": [
"### Deploy Xinference Locally or in a Distributed Cluster.\n",
"\n",
"For local deployment, run `xinference`. \n",
"\n",
"To deploy Xinference in a cluster, first start an Xinference supervisor using the `xinference-supervisor`. You can also use the option -p to specify the port and -H to specify the host. The default port is 8080 and the default host is 0.0.0.0.\n",
"\n",
"Then, start the Xinference workers using `xinference-worker` on each server you want to run them on. \n",
"\n",
"You can consult the README file from [Xinference](https://github.com/xorbitsai/inference) for more information.\n",
"### Wrapper\n",
"\n",
"To use Xinference with LangChain, you need to first launch a model. You can use command line interface (CLI) to do so:"
]
},
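{
"cell_type": "markdown",
"metadata": {},
"source": [
"For example, a minimal cluster setup might look like the following sketch (the supervisor host and port are placeholders for your own environment):"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# On the supervisor machine (replace host and port with your own values)\n",
"!xinference-supervisor -H 0.0.0.0 -p 8080\n",
"\n",
"# On each worker machine, point the worker at the supervisor endpoint\n",
"!xinference-worker -e \"http://supervisor_host:8080\""
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Consult the README of [Xinference](https://github.com/xorbitsai/inference) for more information.\n",
"\n",
"### Wrapper\n",
"\n",
"To use Xinference with LangChain, you first need to launch a model. You can use the command-line interface (CLI) to do so:"
]
},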
{
"cell_type": "code",
"execution_count": 13,
"id": "acae54e37e7d407bbb7b55eff062a284",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Model uid: 7167b2b0-2a04-11ee-83f0-d29396a3f064\n"
]
}
],
"source": [
"%xinference launch -n vicuna-v1.3 -f ggmlv3 -q q4_0"
]
},
{
"cell_type": "markdown",
"id": "9a63283cbaf04dbcab1f6479b197f3a8",
"metadata": {},
"source": [
"A model UID is returned for you to use. Now you can use Xinference with LangChain:"
]
},
{
"cell_type": "markdown",
"id": "8dd0d8092fe74a7c96281538738b07e2",
"metadata": {},
"source": [
"## Installation\n",
"\n",
"The LangChain Xinference integration lives in the `langchain-xinference` package:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "72eea5119410473aa328ad9291626812",
"metadata": {},
"outputs": [],
"source": [
"%pip install -qU langchain-xinference"
]
},
{
"cell_type": "markdown",
"id": "8edb47106e1a46a883d545849b8ab81b",
"metadata": {},
"source": [
"Make sure you're using the latest Xinference version for structured outputs."
]
},
{
"cell_type": "markdown",
"id": "55935996",
"metadata": {},
"source": [
"## Instantiation\n",
"\n",
"Now we can instantiate our model object and generate chat completions:\n"
]
},
{
"cell_type": "code",
"execution_count": 9,
"id": "10185d26023b46108eb7d9f57d49d2b3",
"metadata": {},
"outputs": [],
"source": [
"from langchain_xinference.chat_models import ChatXinference\n",
"\n",
"llm = ChatXinference(\n",
" server_url=\"your_server_url\", model_uid=\"7167b2b0-2a04-11ee-83f0-d29396a3f064\"\n",
")\n",
"\n",
"llm.invoke(\n",
" \"Q: where can we visit in the capital of France?\",\n",
" config={\"max_tokens\": 1024},\n",
")"
]
},
{
"cell_type": "markdown",
"id": "8763a12b2bbd4a93a75aff182afb95dc",
"metadata": {},
"source": [
"## Invocation"
]
},
{
"cell_type": "code",
"execution_count": 10,
"id": "50f6c0e4",
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"from langchain_core.messages import HumanMessage, SystemMessage\n",
"from langchain_xinference.chat_models import ChatXinference\n",
"\n",
"llm = ChatXinference(\n",
" server_url=\"your_server_url\", model_uid=\"7167b2b0-2a04-11ee-83f0-d29396a3f064\"\n",
")\n",
"\n",
"system_message = \"You are a helpful assistant that translates English to French. Translate the user sentence.\"\n",
"human_message = \"I love programming.\"\n",
"\n",
"llm.invoke([HumanMessage(content=human_message), SystemMessage(content=system_message)])"
]
},
{
"cell_type": "markdown",
"id": "7623eae2785240b9bd12b16a66d81610",
"metadata": {},
"source": [
"## Chaining\n",
"\n",
"We can [chain](/docs/how_to/sequence/) our model with a prompt template like so:"
]
},
{
"cell_type": "code",
"execution_count": 12,
"id": "317d05c6",
"metadata": {},
"outputs": [],
"source": [
"from langchain.prompts import PromptTemplate\n",
"from langchain_xinference.chat_models import ChatXinference\n",
"\n",
"prompt = PromptTemplate(\n",
" input=[\"country\"], template=\"Q: where can we visit in the capital of {country}? A:\"\n",
")\n",
"\n",
"llm = ChatXinference(\n",
" server_url=\"your_server_url\", model_uid=\"7167b2b0-2a04-11ee-83f0-d29396a3f064\"\n",
")\n",
"\n",
"chain = prompt | llm\n",
"chain.invoke(input={\"country\": \"France\"})\n",
"chain.stream(input={\"country\": \"France\"})"
]
},
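{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Structured output\n",
"\n",
"The following is a minimal sketch of structured output via LangChain's standard `with_structured_output` interface, assuming `ChatXinference` supports it; the schema and model UID are placeholders:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from pydantic import BaseModel\n",
"\n",
"from langchain_xinference.chat_models import ChatXinference\n",
"\n",
"\n",
"class Joke(BaseModel):\n",
"    \"\"\"A joke with a setup and a punchline.\"\"\"\n",
"\n",
"    setup: str\n",
"    punchline: str\n",
"\n",
"\n",
"llm = ChatXinference(\n",
"    server_url=\"your_server_url\", model_uid=\"7167b2b0-2a04-11ee-83f0-d29396a3f064\"\n",
")\n",
"\n",
"# Standard LangChain structured-output interface (assumed supported here)\n",
"structured_llm = llm.with_structured_output(Joke)\n",
"structured_llm.invoke(\"Tell me a joke about programming\")"
]
},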
{
"cell_type": "markdown",
"id": "66f7fb26",
"metadata": {},
"source": [
"## API reference\n",
"\n",
"For detailed documentation of all ChatXinference features and configurations head to the API reference: https://github.com/TheSongg/langchain-xinference"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.12.4"
}
},
"nbformat": 4,
"nbformat_minor": 5
}

@ -100,3 +100,22 @@ For more information and detailed examples, refer to the
Xinference also supports embedding queries and documents. See
[example for xinference embeddings](/docs/integrations/text_embedding/xinference)
for a more detailed demo.
### Install the Xinference LangChain partner package
Install the integration package with:
```bash
pip install langchain-xinference
```
## Chat Models
```python
from langchain_xinference.chat_models import ChatXinference
```
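A minimal usage sketch (`server_url` and `model_uid` are placeholders for your own Xinference deployment):
```python
from langchain_core.messages import HumanMessage

from langchain_xinference.chat_models import ChatXinference

chat = ChatXinference(server_url="your_server_url", model_uid="your_model_uid")
chat.invoke([HumanMessage(content="Hello!")])
```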
## LLM
```python
from langchain_xinference.llms import Xinference
```
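A minimal usage sketch for the completion-style interface (same placeholders as above):
```python
from langchain_xinference.llms import Xinference

llm = Xinference(server_url="your_server_url", model_uid="your_model_uid")
llm.invoke("Q: What can we visit in the capital of France? A:")
```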

@ -516,3 +516,7 @@ packages:
- name: langchain-agentql
path: langchain
repo: tinyfish-io/agentql-integrations
- name: langchain-xinference
path: .
repo: TheSongg/langchain-xinference