diff --git a/docs/docs/integrations/chat/xinference.ipynb b/docs/docs/integrations/chat/xinference.ipynb
new file mode 100644
index 00000000000..9992b10d21c
--- /dev/null
+++ b/docs/docs/integrations/chat/xinference.ipynb
@@ -0,0 +1,300 @@
+{
+ "cells": [
+  {
+   "cell_type": "raw",
+   "metadata": {},
+   "source": [
+    "---\n",
+    "sidebar_label: Xinference\n",
+    "---"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "01ca90da",
+   "metadata": {},
+   "source": [
+    "# ChatXinference\n",
+    "\n",
+    "[Xinference](https://github.com/xorbitsai/inference) is a powerful and versatile library designed to serve LLMs,\n",
+    "speech recognition models, and multimodal models, even on your laptop. It supports a variety of GGML-compatible models, such as chatglm, baichuan, whisper, vicuna, orca, and many others.\n",
+    "\n",
+    "## Overview\n",
+    "### Integration details\n",
+    "\n",
+    "| Class | Package | Local | Serializable | JS support | Package downloads | Package latest |\n",
+    "| :--- | :--- | :---: | :---: | :---: | :---: | :---: |\n",
+    "| ChatXinference | langchain-xinference | ✅ | ❌ | ✅ | ✅ | ✅ |\n",
+    "\n",
+    "### Model features\n",
+    "\n",
+    "| [Tool calling](/docs/how_to/tool_calling/) | [Structured output](/docs/how_to/structured_output/) | JSON mode | [Image input](/docs/how_to/multimodal_inputs/) | Audio input | Video input | [Token-level streaming](/docs/how_to/chat_streaming/) | Native async | [Token usage](/docs/how_to/chat_token_usage_tracking/) | [Logprobs](/docs/how_to/logprobs/) |\n",
+    "| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |\n",
+    "| ✅ | ✅ | ✅ | ❌ | ❌ | ❌ | ✅ | ✅ | ❌ | ❌ |\n",
+    "\n",
+    "## Setup\n",
+    "\n",
+    "Install `Xinference` from PyPI:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "%pip install --upgrade --quiet \"xinference[all]\""
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "7fb27b941602401d91542211134fc71a",
+   "metadata": {},
+   "source": [
+    "### Deploy Xinference locally or in a distributed cluster\n",
+    "\n",
+    "For local deployment, run `xinference`.\n",
+    "\n",
+    "To deploy Xinference in a cluster, first start a supervisor with `xinference-supervisor`. You can use the `-p` option to specify the port and `-H` to specify the host; the default port is 8080 and the default host is 0.0.0.0. Then start a worker with `xinference-worker` on each server you want to run one on, as sketched below.\n",
+    "\n",
+    "Consult the README of [Xinference](https://github.com/xorbitsai/inference) for more information."
+   ]
+  },
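+  {
+   "cell_type": "markdown",
+   "id": "cluster-example-sketch",
+   "metadata": {},
+   "source": [
+    "For example, a minimal two-node setup might look like the following. This is only a sketch: the host name is a placeholder, and the worker's `-e`/`--endpoint` flag (which points the worker at the supervisor) should be checked against `xinference-worker --help` for your installed version.\n",
+    "\n",
+    "```bash\n",
+    "# On the supervisor node (defaults: host 0.0.0.0, port 8080)\n",
+    "xinference-supervisor -H 0.0.0.0 -p 8080\n",
+    "\n",
+    "# On each worker node, point the worker at the supervisor endpoint\n",
+    "xinference-worker -e \"http://<supervisor_host>:8080\"\n",
+    "```"
+   ]
+  },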
+  {
+   "cell_type": "markdown",
+   "id": "wrapper-intro",
+   "metadata": {},
+   "source": [
+    "### Wrapper\n",
+    "\n",
+    "To use Xinference with LangChain, you first need to launch a model. You can use the command line interface (CLI) to do so:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 13,
+   "id": "acae54e37e7d407bbb7b55eff062a284",
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "Model uid: 7167b2b0-2a04-11ee-83f0-d29396a3f064\n"
+     ]
+    }
+   ],
+   "source": [
+    "%xinference launch -n vicuna-v1.3 -f ggmlv3 -q q4_0"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "9a63283cbaf04dbcab1f6479b197f3a8",
+   "metadata": {},
+   "source": [
+    "A model UID is returned for you to use. Now you can use Xinference with LangChain:"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "8dd0d8092fe74a7c96281538738b07e2",
+   "metadata": {},
+   "source": [
+    "## Installation\n",
+    "\n",
+    "The LangChain Xinference integration lives in the `langchain-xinference` package:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "72eea5119410473aa328ad9291626812",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "%pip install -qU langchain-xinference"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "8edb47106e1a46a883d545849b8ab81b",
+   "metadata": {},
+   "source": [
+    "Make sure you're running the latest version of Xinference if you want structured outputs."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "55935996",
+   "metadata": {},
+   "source": [
+    "## Instantiation\n",
+    "\n",
+    "Now we can instantiate our model object and generate chat completions:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 9,
+   "id": "10185d26023b46108eb7d9f57d49d2b3",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "from langchain_xinference.chat_models import ChatXinference\n",
+    "\n",
+    "llm = ChatXinference(\n",
+    "    server_url=\"your_server_url\", model_uid=\"7167b2b0-2a04-11ee-83f0-d29396a3f064\"\n",
+    ")\n",
+    "\n",
+    "llm.invoke(\n",
+    "    \"Q: What can we visit in the capital of France?\",\n",
+    "    config={\"max_tokens\": 1024},\n",
+    ")"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "8763a12b2bbd4a93a75aff182afb95dc",
+   "metadata": {},
+   "source": [
+    "## Invocation"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 10,
+   "id": "50f6c0e4",
+   "metadata": {
+    "tags": []
+   },
+   "outputs": [],
+   "source": [
+    "from langchain_core.messages import HumanMessage, SystemMessage\n",
+    "from langchain_xinference.chat_models import ChatXinference\n",
+    "\n",
+    "llm = ChatXinference(\n",
+    "    server_url=\"your_server_url\", model_uid=\"7167b2b0-2a04-11ee-83f0-d29396a3f064\"\n",
+    ")\n",
+    "\n",
+    "system_message = \"You are a helpful assistant that translates English to French. Translate the user sentence.\"\n",
+    "human_message = \"I love programming.\"\n",
+    "\n",
+    "# The system message goes first so the model sees its instructions before the user input\n",
+    "llm.invoke([SystemMessage(content=system_message), HumanMessage(content=human_message)])"
+   ]
+  },
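+  {
+   "cell_type": "markdown",
+   "id": "streaming-example-sketch",
+   "metadata": {},
+   "source": [
+    "## Streaming\n",
+    "\n",
+    "The features table above lists token-level streaming as supported. Because `ChatXinference` is a standard LangChain chat model, the generic Runnable `.stream()` method should work; the sketch below assumes the same placeholder server URL and model UID as the earlier examples:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "streaming-example-code",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "from langchain_xinference.chat_models import ChatXinference\n",
+    "\n",
+    "llm = ChatXinference(\n",
+    "    server_url=\"your_server_url\", model_uid=\"7167b2b0-2a04-11ee-83f0-d29396a3f064\"\n",
+    ")\n",
+    "\n",
+    "# .stream() yields message chunks as the tokens arrive\n",
+    "for chunk in llm.stream(\"Write a haiku about Paris.\"):\n",
+    "    print(chunk.content, end=\"\", flush=True)"
+   ]
+  },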
A:\"\n", + ")\n", + "\n", + "llm = ChatXinference(\n", + " server_url=\"your_server_url\", model_uid=\"7167b2b0-2a04-11ee-83f0-d29396a3f064\"\n", + ")\n", + "\n", + "chain = prompt | llm\n", + "chain.invoke(input={\"country\": \"France\"})\n", + "chain.stream(input={\"country\": \"France\"})" + ] + }, + { + "cell_type": "markdown", + "id": "66f7fb26", + "metadata": {}, + "source": [ + "## API reference\n", + "\n", + "For detailed documentation of all ChatXinference features and configurations head to the API reference: https://github.com/TheSongg/langchain-xinference" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3 (ipykernel)", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.12.4" + } + }, + "nbformat": 4, + "nbformat_minor": 5 +} diff --git a/docs/docs/integrations/providers/xinference.mdx b/docs/docs/integrations/providers/xinference.mdx index 97f32b6c1a0..88a471f3b44 100644 --- a/docs/docs/integrations/providers/xinference.mdx +++ b/docs/docs/integrations/providers/xinference.mdx @@ -99,4 +99,23 @@ For more information and detailed examples, refer to the Xinference also supports embedding queries and documents. See [example for xinference embeddings](/docs/integrations/text_embedding/xinference) -for a more detailed demo. \ No newline at end of file +for a more detailed demo. + + +### Xinference LangChain partner package install +Install the integration package with: +```bash +pip install langchain-xinference +``` +## Chat Models + +```python +from langchain_xinference.chat_models import ChatXinference +``` + +## LLM + +```python +from langchain_xinference.llms import Xinference +``` + diff --git a/libs/packages.yml b/libs/packages.yml index 8b807f75a6a..32060ce8521 100644 --- a/libs/packages.yml +++ b/libs/packages.yml @@ -516,3 +516,7 @@ packages: - name: langchain-agentql path: langchain repo: tinyfish-io/agentql-integrations +- name: langchain-xinference + path: . + repo: TheSongg/langchain-xinference +