Add OpenLLM wrapper (#6578)

LLM wrapper for models served with OpenLLM

Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>
Authored-by: Aaron Pham <29749331+aarnphm@users.noreply.github.com>
Co-authored-by: Chaoyu <paranoyang@gmail.com>
2025-08-08 12:31:49 +00:00 · 2023-06-22 01:18:14 -07:00 · 2023-06-22 01:18:14 -07:00 · 4fabd02d25
commit 4fabd02d25
parent d718f3b6d0
9 changed files with 1227 additions and 98 deletions
--- a/docs/extras/ecosystem/integrations/openllm.mdx
+++ b/docs/extras/ecosystem/integrations/openllm.mdx
@ -0,0 +1,70 @@
 # OpenLLM
 This page demonstrates how to use [OpenLLM](https://github.com/bentoml/OpenLLM)
 with LangChain.
 `OpenLLM` is an open platform for operating large language models (LLMs) in
 production. It enables developers to easily run inference with any open-source
 LLMs, deploy to the cloud or on-premises, and build powerful AI apps.
 ## Installation and Setup
 Install the OpenLLM package via PyPI:
 ```bash
 pip install openllm
 ```
 ## LLM
 OpenLLM supports a wide range of open-source LLMs as well as serving users' own
 fine-tuned LLMs. Use the `openllm models` command to see all available models that
 are pre-optimized for OpenLLM.
 ## Wrappers
 There is an OpenLLM wrapper that supports loading an LLM in-process or accessing a
 remote OpenLLM server:
 ```python
 from langchain.llms import OpenLLM
 ```
 ### Wrapper for OpenLLM server
 This wrapper supports connecting to an OpenLLM server via HTTP or gRPC. The
 OpenLLM server can run either locally or in the cloud.
 To try it out locally, start an OpenLLM server:
 ```bash
 openllm start flan-t5
 ```
 Wrapper usage:
 ```python
 from langchain.llms import OpenLLM
 llm = OpenLLM(server_url='http://localhost:3000')
 llm("What is the difference between a duck and a goose? And why are there so many geese in Canada?")
 ```
 ### Wrapper for Local Inference
 You can also use the OpenLLM wrapper to load an LLM in the current Python process for
 running inference.
 ```python
 from langchain.llms import OpenLLM
 llm = OpenLLM(model_name="dolly-v2", model_id='databricks/dolly-v2-7b')
 llm("What is the difference between a duck and a goose? And why are there so many geese in Canada?")
 ```
 ### Usage
 For a more detailed walkthrough of the OpenLLM Wrapper, see the
 [example notebook](../modules/models/llms/integrations/openllm.ipynb)
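
The two modes documented above differ only in the constructor arguments. The following is a minimal sketch of choosing between them at startup. It assumes a hypothetical `OPENLLM_SERVER_URL` environment variable and a hypothetical `openllm_kwargs` helper, neither of which is part of OpenLLM or LangChain. The helper only builds keyword arguments, so it runs without either package installed.

```python
import os
from typing import Any, Dict, Mapping, Optional


def openllm_kwargs(
    env: Optional[Mapping[str, str]] = None,
    default_model: str = "flan-t5",
) -> Dict[str, Any]:
    """Build keyword arguments for ``langchain.llms.OpenLLM``.

    OPENLLM_SERVER_URL is a hypothetical variable name used for illustration.
    """
    env = os.environ if env is None else env
    server_url = env.get("OPENLLM_SERVER_URL")
    if server_url:
        # Server mode: model_name / model_id must not be passed with server_url.
        return {"server_url": server_url}
    # Local mode: the model is loaded in the current Python process.
    return {"model_name": default_model}


remote = openllm_kwargs({"OPENLLM_SERVER_URL": "http://localhost:3000"})
local = openllm_kwargs({})
print(remote)
print(local)
```

With both packages installed, the result could be passed on as `OpenLLM(**openllm_kwargs())`.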
--- a/docs/extras/guides/deployments/index.mdx
+++ b/docs/extras/guides/deployments/index.mdx
@ -21,7 +21,8 @@ This guide aims to provide a comprehensive overview of the requirements for depl
 Understanding these components is crucial when assessing serving systems. LangChain integrates with several open-source projects designed to tackle these issues, providing a robust framework for productionizing your LLM applications. Some notable frameworks include:
 - [Ray Serve](/docs/ecosystem/integrations/ray_serve.html)
- [BentoML](https://github.com/ssheng/BentoChain)
+- [BentoML](https://github.com/bentoml/BentoML)
 - [OpenLLM](/docs/ecosystem/integrations/openllm.html)
 - [Modal](/docs/ecosystem/integrations/modal.html)
 These links will provide further information on each ecosystem, assisting you in finding the best fit for your LLM deployment needs.
--- a/docs/extras/guides/deployments/template_repos.mdx
+++ b/docs/extras/guides/deployments/template_repos.mdx
@ -67,6 +67,11 @@ This repository allows users to serve local chains and agents as RESTful, gRPC,
 This repository provides an example of how to deploy a LangChain application with [BentoML](https://github.com/bentoml/BentoML). BentoML is a framework that enables the containerization of machine learning applications as standard OCI images. BentoML also allows for the automatic generation of OpenAPI and gRPC endpoints. With BentoML, you can integrate models from all popular ML frameworks and deploy them as microservices running on the most optimal hardware and scaling independently.
 ## [OpenLLM](https://github.com/bentoml/OpenLLM)
 OpenLLM is a platform for operating large language models (LLMs) in production. With OpenLLM, you can run inference with any open-source LLM, deploy to the cloud or on-premises, and build powerful AI apps. It supports a wide range of open-source LLMs, offers flexible APIs, and provides first-class support for LangChain and BentoML.
 See OpenLLM's [integration doc](https://github.com/bentoml/OpenLLM#%EF%B8%8F-integrations) for usage with LangChain.
 ## [Databutton](https://databutton.com/home?new-data-app=true)
 These templates serve as examples of how to build, deploy, and share LangChain applications using Databutton. You can create user interfaces with Streamlit, automate tasks by scheduling Python code, and store files and data in the built-in store. Examples include a Chatbot interface with conversational memory, a Personal search engine, and a starter template for LangChain apps. Deploying and sharing is just one click away.
--- a/docs/extras/modules/model_io/models/llms/integrations/openllm.ipynb
+++ b/docs/extras/modules/model_io/models/llms/integrations/openllm.ipynb
@ -0,0 +1,159 @@
 {
 "cells": [
  {
   "cell_type": "markdown",
   "id": "026cc336",
   "metadata": {},
   "source": [
    "# OpenLLM\n",
    "\n",
    "[🦾 OpenLLM](https://github.com/bentoml/OpenLLM) is an open platform for operating large language models (LLMs) in production. It enables developers to easily run inference with any open-source LLMs, deploy to the cloud or on-premises, and build powerful AI apps."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "da0ddca1",
   "metadata": {},
   "source": [
    "## Installation\n",
    "\n",
    "Install `openllm` through [PyPI](https://pypi.org/project/openllm/)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "6601c03b",
   "metadata": {},
   "outputs": [],
   "source": [
    "!pip install openllm"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "90174fe3",
   "metadata": {},
   "source": [
    "## Launch OpenLLM server locally\n",
    "\n",
    "To start an LLM server, use the `openllm start` command. For example, to start a dolly-v2 server, run the following command from a terminal:\n",
    "\n",
    "```bash\n",
    "openllm start dolly-v2\n",
    "```\n",
    "\n",
    "\n",
    "## Wrapper"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "35b6bf60",
   "metadata": {},
   "outputs": [],
   "source": [
    "from langchain.llms import OpenLLM\n",
    "\n",
    "server_url = \"http://localhost:3000\" # Replace with remote host if you are running on a remote server \n",
    "llm = OpenLLM(server_url=server_url)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "4f830f9d",
   "metadata": {},
   "source": [
    "### Optional: Local LLM Inference\n",
    "\n",
    "You may also choose to initialize an LLM managed by OpenLLM locally in the current process. This is useful for development and lets developers quickly try out different types of LLMs.\n",
    "\n",
    "When moving LLM applications to production, we recommend deploying the OpenLLM server separately and accessing it via the `server_url` option demonstrated above.\n",
    "\n",
    "To load an LLM locally via the LangChain wrapper:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "82c392b6",
   "metadata": {},
   "outputs": [],
   "source": [
    "from langchain.llms import OpenLLM\n",
    "\n",
    "llm = OpenLLM(\n",
    "    model_name=\"dolly-v2\",\n",
    "    model_id=\"databricks/dolly-v2-3b\",\n",
    "    temperature=0.94,\n",
    "    repetition_penalty=1.2,\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "f15ebe0d",
   "metadata": {},
   "source": [
    "### Integrate with an LLMChain"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "id": "8b02a97a",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "iLkb\n"
     ]
    }
   ],
   "source": [
    "from langchain import PromptTemplate, LLMChain\n",
    "\n",
    "template = \"What is a good name for a company that makes {product}?\"\n",
    "\n",
    "prompt = PromptTemplate(template=template, input_variables=[\"product\"])\n",
    "\n",
    "llm_chain = LLMChain(prompt=prompt, llm=llm)\n",
    "\n",
    "generated = llm_chain.run(product=\"mechanical keyboard\")\n",
    "print(generated)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "56cb4bc0",
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.10.10"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
 }
--- a/langchain/llms/__init__.py
+++ b/langchain/llms/__init__.py
@ -32,6 +32,7 @@ from langchain.llms.modal import Modal
 from langchain.llms.mosaicml import MosaicML
 from langchain.llms.nlpcloud import NLPCloud
 from langchain.llms.openai import AzureOpenAI, OpenAI, OpenAIChat
 from langchain.llms.openllm import OpenLLM
 from langchain.llms.openlm import OpenLM
 from langchain.llms.petals import Petals
 from langchain.llms.pipelineai import PipelineAI
@ -81,6 +82,7 @@ __all__ = [
    "NLPCloud",
    "OpenAI",
    "OpenAIChat",
    "OpenLLM",
    "OpenLM",
    "Petals",
    "PipelineAI",
@ -138,5 +140,7 @@ type_to_cls_dict: Dict[str, Type[BaseLLM]] = {
    "self_hosted_hugging_face": SelfHostedHuggingFaceLLM,
    "stochasticai": StochasticAI,
    "vertexai": VertexAI,
    "openllm": OpenLLM,
    "openllm_client": OpenLLM,
    "writer": Writer,
 }
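
Both keys map to the same class because `_llm_type` (defined in `openllm.py` in this commit) returns `"openllm_client"` for server-backed instances and `"openllm"` otherwise. The sketch below illustrates why a registry keyed by type string needs both entries. `FakeOpenLLM` and `registry` are hypothetical stand-ins, not LangChain code, and the stand-in checks `server_url` where the real wrapper checks its private client.

```python
from typing import Dict, Optional, Type


class FakeOpenLLM:
    """Stand-in for the wrapper; it only models the _llm_type logic."""

    def __init__(self, server_url: Optional[str] = None) -> None:
        self.server_url = server_url

    @property
    def _llm_type(self) -> str:
        # The real wrapper tests self._client; server_url is used here instead.
        return "openllm_client" if self.server_url else "openllm"


# Simplified analogue of type_to_cls_dict with the two new entries.
registry: Dict[str, Type[FakeOpenLLM]] = {
    "openllm": FakeOpenLLM,
    "openllm_client": FakeOpenLLM,
}

for llm in (FakeOpenLLM(), FakeOpenLLM(server_url="http://localhost:3000")):
    cls = registry[llm._llm_type]
    print(llm._llm_type, "->", cls.__name__)
```

Without the `"openllm_client"` entry, the lookup for a server-backed instance would raise `KeyError`.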
--- a/langchain/llms/openllm.py
+++ b/langchain/llms/openllm.py
@ -0,0 +1,307 @@
 """Wrapper around OpenLLM APIs."""
 from __future__ import annotations
 import copy
 import json
 import logging
 from typing import (
    TYPE_CHECKING,
    Any,
    Dict,
    List,
    Literal,
    Optional,
    TypedDict,
    Union,
    overload,
 )
 from pydantic import PrivateAttr
 from langchain.callbacks.manager import (
    AsyncCallbackManagerForLLMRun,
    CallbackManagerForLLMRun,
 )
 from langchain.llms.base import LLM
 if TYPE_CHECKING:
    import openllm
 ServerType = Literal["http", "grpc"]
 class IdentifyingParams(TypedDict):
    model_name: str
    model_id: Optional[str]
    server_url: Optional[str]
    server_type: Optional[ServerType]
    embedded: bool
    llm_kwargs: Dict[str, Any]
 logger = logging.getLogger(__name__)
 class OpenLLM(LLM):
    """Wrapper for accessing OpenLLM, supporting both in-process model
    instance and remote OpenLLM servers.
    To use, you should have the openllm library installed:
    .. code-block:: bash
        pip install openllm
    Learn more at: https://github.com/bentoml/openllm
    Example running an LLM model locally managed by OpenLLM:
        .. code-block:: python
            from langchain.llms import OpenLLM
            llm = OpenLLM(
                model_name='flan-t5',
                model_id='google/flan-t5-large',
            )
            llm("What is the difference between a duck and a goose?")
    For all available supported models, you can run 'openllm models'.
    If you have an OpenLLM server running, you can also use it remotely:
        .. code-block:: python
            from langchain.llms import OpenLLM
            llm = OpenLLM(server_url='http://localhost:3000')
            llm("What is the difference between a duck and a goose?")
    """
    model_name: Optional[str] = None
    """Model name to use. See 'openllm models' for all available models."""
    model_id: Optional[str] = None
    """Model Id to use. If not provided, will use the default model for the model name.
    See 'openllm models' for all available model variants."""
    server_url: Optional[str] = None
    """Optional server URL that currently runs an LLMServer with 'openllm start'."""
    server_type: ServerType = "http"
    """Optional server type. Either 'http' or 'grpc'."""
    embedded: bool = True
    """Initialize this LLM instance in the current process by default. Should
    only be set to False when used in conjunction with a BentoML Service."""
    llm_kwargs: Dict[str, Any]
    """Keyword arguments to be passed to openllm.LLM"""
    _runner: Optional[openllm.LLMRunner] = PrivateAttr(default=None)
    _client: Union[
        openllm.client.HTTPClient, openllm.client.GrpcClient, None
    ] = PrivateAttr(default=None)
    class Config:
        extra = "forbid"
    @overload
    def __init__(
        self,
        model_name: Optional[str] = ...,
        *,
        model_id: Optional[str] = ...,
        embedded: Literal[True, False] = ...,
        **llm_kwargs: Any,
    ) -> None:
        ...
    @overload
    def __init__(
        self,
        *,
        server_url: str = ...,
        server_type: Literal["grpc", "http"] = ...,
        **llm_kwargs: Any,
    ) -> None:
        ...
    def __init__(
        self,
        model_name: Optional[str] = None,
        *,
        model_id: Optional[str] = None,
        server_url: Optional[str] = None,
        server_type: Literal["grpc", "http"] = "http",
        embedded: bool = True,
        **llm_kwargs: Any,
    ):
        try:
            import openllm
        except ImportError as e:
            raise ImportError(
                "Could not import openllm. Make sure to install it with "
                "'pip install openllm'."
            ) from e
        llm_kwargs = llm_kwargs or {}
        if server_url is not None:
            logger.debug("'server_url' is provided, returning an openllm.Client")
            assert (
                model_id is None and model_name is None
            ), "'server_url' and {'model_id', 'model_name'} are mutually exclusive"
            client_cls = (
                openllm.client.HTTPClient
                if server_type == "http"
                else openllm.client.GrpcClient
            )
            client = client_cls(server_url)
            super().__init__(
                **{
                    "server_url": server_url,
                    "server_type": server_type,
                    "llm_kwargs": llm_kwargs,
                }
            )
            self._runner = None  # type: ignore
            self._client = client
        else:
            assert model_name is not None, "Must provide 'model_name' or 'server_url'"
            # Since LLMs are relatively large, we don't actually want to convert the
            # Runner with embedded when running the server. Instead, we only set
            # init_local here so that LangChain users can still use the LLM
            # in-process. For BentoML users, setting embedded=False is the expected
            # behaviour to invoke the runners remotely.
            runner = openllm.Runner(
                model_name=model_name,
                model_id=model_id,
                init_local=embedded,
                **llm_kwargs,
            )
            super().__init__(
                **{
                    "model_name": model_name,
                    "model_id": model_id,
                    "embedded": embedded,
                    "llm_kwargs": llm_kwargs,
                }
            )
            self._client = None  # type: ignore
            self._runner = runner
    @property
    def runner(self) -> openllm.LLMRunner:
        """
        Get the underlying openllm.LLMRunner instance for integration with BentoML.
        Example:
        .. code-block:: python
            llm = OpenLLM(
                model_name='flan-t5',
                model_id='google/flan-t5-large',
                embedded=False,
            )
            tools = load_tools(["serpapi", "llm-math"], llm=llm)
            agent = initialize_agent(
                tools, llm, agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION
            )
            svc = bentoml.Service("langchain-openllm", runners=[llm.runner])
            @svc.api(input=Text(), output=Text())
            def chat(input_text: str):
                return agent.run(input_text)
        """
        if self._runner is None:
            raise ValueError("OpenLLM must be initialized locally with 'model_name'")
        return self._runner
    @property
    def _identifying_params(self) -> IdentifyingParams:
        """Get the identifying parameters."""
        if self._client is not None:
            self.llm_kwargs.update(self._client.configuration)
            model_name = self._client.model_name
            model_id = self._client.model_id
        else:
            if self._runner is None:
                raise ValueError("Runner must be initialized.")
            model_name = self.model_name
            model_id = self.model_id
            try:
                self.llm_kwargs.update(
                    json.loads(self._runner.identifying_params["configuration"])
                )
            except (TypeError, json.JSONDecodeError):
                pass
        return IdentifyingParams(
            server_url=self.server_url,
            server_type=self.server_type,
            embedded=self.embedded,
            llm_kwargs=self.llm_kwargs,
            model_name=model_name,
            model_id=model_id,
        )
    @property
    def _llm_type(self) -> str:
        return "openllm_client" if self._client else "openllm"
    def _call(
        self,
        prompt: str,
        stop: Optional[List[str]] = None,
        run_manager: Optional[CallbackManagerForLLMRun] = None,
        **kwargs: Any,
    ) -> str:
        try:
            import openllm
        except ImportError as e:
            raise ImportError(
                "Could not import openllm. Make sure to install it with "
                "'pip install openllm'."
            ) from e
        copied = copy.deepcopy(self.llm_kwargs)
        copied.update(kwargs)
        config = openllm.AutoConfig.for_model(
            self._identifying_params["model_name"], **copied
        )
        if self._client:
            return self._client.query(prompt, **config.model_dump(flatten=True))
        else:
            assert self._runner is not None
            return self._runner(prompt, **config.model_dump(flatten=True))
    async def _acall(
        self,
        prompt: str,
        stop: Optional[List[str]] = None,
        run_manager: Optional[AsyncCallbackManagerForLLMRun] = None,
        **kwargs: Any,
    ) -> str:
        try:
            import openllm
        except ImportError as e:
            raise ImportError(
                "Could not import openllm. Make sure to install it with "
                "'pip install openllm'."
            ) from e
        copied = copy.deepcopy(self.llm_kwargs)
        copied.update(kwargs)
        config = openllm.AutoConfig.for_model(
            self._identifying_params["model_name"], **copied
        )
        if self._client:
            return await self._client.acall(
                "generate", prompt, **config.model_dump(flatten=True)
            )
        else:
            assert self._runner is not None
            (
                prompt,
                generate_kwargs,
                postprocess_kwargs,
            ) = self._runner.llm.sanitize_parameters(prompt, **kwargs)
            generated_result = await self._runner.generate.async_run(
                prompt, **generate_kwargs
            )
            return self._runner.llm.postprocess_generate(
                prompt, generated_result, **postprocess_kwargs
            )
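
Both `_call` and `_acall` above copy `llm_kwargs` with `copy.deepcopy` before applying the per-call keyword arguments, so a call-time override does not leak into the instance defaults. The sketch below isolates that merge step. `merge_generation_kwargs` is a hypothetical helper name used for illustration and is not part of the wrapper.

```python
import copy
from typing import Any, Dict


def merge_generation_kwargs(
    defaults: Dict[str, Any], overrides: Dict[str, Any]
) -> Dict[str, Any]:
    """Return the defaults updated with overrides, leaving defaults untouched."""
    merged = copy.deepcopy(defaults)  # protects nested values as well
    merged.update(overrides)  # per-call values take precedence
    return merged


defaults = {"temperature": 0.94, "repetition_penalty": 1.2}
merged = merge_generation_kwargs(defaults, {"temperature": 0.2})
print(merged)
print(defaults)
```

The per-call value wins in the merged dictionary, while `defaults` keeps its original temperature for later calls.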
--- a/poetry.lock
+++ b/poetry.lock
--- a/pyproject.toml
+++ b/pyproject.toml
@ -108,10 +108,10 @@ nebula3-python = {version = "^3.4.0", optional = true}
 langchainplus-sdk = ">=0.0.13"
 awadb = {version = "^0.3.3", optional = true}
 azure-search-documents = {version = "11.4.0a20230509004", source = "azure-sdk-dev", optional = true}
 openllm = {version = ">=0.1.6", optional = true}
 # now streamlit requires Python >=3.7, !=3.9.7 So, it is commented out.
 #streamlit = {version = "^1.18.0", optional = true}
 [tool.poetry.group.docs.dependencies]
 autodoc_pydantic = "^1.8.0"
 myst_parser = "^0.18.1"
@ -213,7 +213,7 @@ playwright = "^1.28.0"
 setuptools = "^67.6.1"
 [tool.poetry.extras]
-llms = ["anthropic", "cohere", "openai", "openlm", "nlpcloud", "huggingface_hub", "manifest-ml", "torch", "transformers"]
+llms = ["anthropic", "cohere", "openai", "openllm", "openlm", "nlpcloud", "huggingface_hub", "manifest-ml", "torch", "transformers"]
 qdrant = ["qdrant-client"]
 openai = ["openai", "tiktoken"]
 text_helpers = ["chardet"]
--- a/tests/integration_tests/llms/test_openllm.py
+++ b/tests/integration_tests/llms/test_openllm.py
@ -0,0 +1,16 @@
 """Test OpenLLM wrapper."""
 from langchain.llms.openllm import OpenLLM
 def test_openllm_llm_local() -> None:
    llm = OpenLLM(model_name="flan-t5", model_id="google/flan-t5-small")
    output = llm("Say foo:")
    assert isinstance(output, str)
 def test_openllm_with_kwargs() -> None:
    llm = OpenLLM(
        model_name="flan-t5", model_id="google/flan-t5-small", temperature=0.84
    )
    output = llm("Say bar:")
    assert isinstance(output, str)
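
The tests above exercise only local mode and need the model weights. The following is a sketch of how the server branch could be checked without a running server, using `unittest.mock`. `call_remote` is a hypothetical helper that mirrors the client branch of `OpenLLM._call` (`self._client.query(prompt, **params)`); it is an assumption for illustration and not part of this commit.

```python
from typing import Any, Dict
from unittest.mock import MagicMock


def call_remote(client: Any, prompt: str, params: Dict[str, Any]) -> str:
    """Hypothetical helper mirroring the client branch of OpenLLM._call."""
    return client.query(prompt, **params)


def test_call_remote_forwards_params() -> None:
    client = MagicMock()  # stands in for an OpenLLM HTTP or gRPC client
    client.query.return_value = "foo"
    output = call_remote(client, "Say foo:", {"temperature": 0.84})
    assert isinstance(output, str)
    # The prompt and generation parameters should be passed through unchanged.
    client.query.assert_called_once_with("Say foo:", temperature=0.84)


test_call_remote_forwards_params()
```

A test against the real wrapper would instead patch the client class that `OpenLLM.__init__` instantiates. That step is left out here because it depends on the installed `openllm` version.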