feat: adding paygo api support for Azure ML / Azure AI Studio (#14560)

- **Description:** Introducing support for LLMs and Chat models running in Azure AI studio and Azure ML using the new deployment mode pay-as-you-go (model as a service). - **Issue:** NA - **Dependencies:** None. - **Tag maintainer:** @prakharg-msft @gdyre - **Twitter handle:** @santiagofacundo Examples added: * [docs/docs/integrations/llms/azure_ml.ipynb](https://github.com/santiagxf/langchain/blob/santiagxf/azureml-endpoints-paygo-community/docs/docs/integrations/chat/azureml_endpoint.ipynb) * [docs/docs/integrations/chat/azureml_chat_endpoint.ipynb](https://github.com/santiagxf/langchain/blob/santiagxf/azureml-endpoints-paygo-community/docs/docs/integrations/chat/azureml_chat_endpoint.ipynb) --------- Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
2025-09-09 15:03:21 +00:00 · 2024-01-23 22:08:51 -03:00
parent 9ce177580a
commit 92e6a641fd
6 changed files with 630 additions and 206 deletions
--- a/docs/docs/integrations/chat/azureml_chat_endpoint.ipynb
+++ b/docs/docs/integrations/chat/azureml_chat_endpoint.ipynb
@@ -15,9 +15,9 @@
   "source": [
    "# AzureMLChatOnlineEndpoint\n",
    "\n",
-    ">[Azure Machine Learning](https://azure.microsoft.com/en-us/products/machine-learning/) is a platform used to build, train, and deploy machine learning models. Users can explore the types of models to deploy in the Model Catalog, which provides Azure Foundation Models and OpenAI Models. `Azure Foundation Models` include various open-source models and popular Hugging Face models. Users can also import models of their liking into AzureML.\n",
+    ">[Azure Machine Learning](https://azure.microsoft.com/en-us/products/machine-learning/) is a platform used to build, train, and deploy machine learning models. Users can explore the types of models to deploy in the Model Catalog, which provides foundational and general purpose models from different providers.\n",
    ">\n",
-    ">[Azure Machine Learning Online Endpoints](https://learn.microsoft.com/en-us/azure/machine-learning/concept-endpoints). After you train machine learning models or pipelines, you need to deploy them to production so that others can use them for inference. Inference is the process of applying new input data to the machine learning model or pipeline to generate outputs. While these outputs are typically referred to as \"predictions,\" inferencing can be used to generate outputs for other machine learning tasks, such as classification and clustering. In `Azure Machine Learning`, you perform inferencing by using endpoints and deployments. `Endpoints` and `Deployments` allow you to decouple the interface of your production workload from the implementation that serves it.\n",
+    ">In general, you need to deploy models in order to consume its predictions (inference). In `Azure Machine Learning`, [Online Endpoints](https://learn.microsoft.com/en-us/azure/machine-learning/concept-endpoints) are used to deploy these models with a real-time serving. They are based on the ideas of `Endpoints` and `Deployments` which allow you to decouple the interface of your production workload from the implementation that serves it.\n",
    "\n",
    "This notebook goes over how to use a chat model hosted on an `Azure Machine Learning Endpoint`."
   ]
@@ -37,10 +37,11 @@
   "source": [
    "## Set up\n",
    "\n",
-    "To use the wrapper, you must [deploy a model on AzureML](https://learn.microsoft.com/en-us/azure/machine-learning/how-to-use-foundation-models?view=azureml-api-2#deploying-foundation-models-to-endpoints-for-inferencing) and obtain the following parameters:\n",
+    "You must [deploy a model on Azure ML](https://learn.microsoft.com/en-us/azure/machine-learning/how-to-use-foundation-models?view=azureml-api-2#deploying-foundation-models-to-endpoints-for-inferencing) or [to Azure AI studio](https://learn.microsoft.com/en-us/azure/ai-studio/how-to/deploy-models-open) and obtain the following parameters:\n",
    "\n",
-    "* `endpoint_api_key`: The API key provided by the endpoint\n",
-    "* `endpoint_url`: The REST endpoint url provided by the endpoint"
+    "* `endpoint_url`: The REST endpoint url provided by the endpoint.\n",
+    "* `endpoint_api_type`: Use `endpoint_type='realtime'` when deploying models to **Realtime endpoints** (hosted managed infrastructure). Use `endpoint_type='serverless'` when deploying models using the **Pay-as-you-go** offering (model as a service).\n",
+    "* `endpoint_api_key`: The API key provided by the endpoint"
   ]
  },
  {
@@ -51,7 +52,40 @@
    "\n",
    "The `content_formatter` parameter is a handler class for transforming the request and response of an AzureML endpoint to match with required schema. Since there are a wide range of models in the model catalog, each of which may process data differently from one another, a `ContentFormatterBase` class is provided to allow users to transform data to their liking. The following content formatters are provided:\n",
    "\n",
-    "* `LLamaContentFormatter`: Formats request and response data for LLaMa2-chat"
+    "* `LLamaChatContentFormatter`: Formats request and response data for LLaMa2-chat\n",
+    "\n",
+    "*Note: `langchain.chat_models.azureml_endpoint.LLamaContentFormatter` is being deprecated and replaced with `langchain.chat_models.azureml_endpoint.LLamaChatContentFormatter`.*\n",
+    "\n",
+    "You can implement custom content formatters specific for your model deriving from the class `langchain_community.llms.azureml_endpoint.ContentFormatterBase`."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Examples\n",
+    "\n",
+    "The following section cotain examples about how to use this class:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "from langchain.schema import HumanMessage\n",
+    "from langchain_community.chat_models.azureml_endpoint import (\n",
+    "    AzureMLEndpointApiType,\n",
+    "    LlamaChatContentFormatter,\n",
+    ")"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### Example: Chat completions with real-time endpoints"
   ]
  },
  {
@@ -76,11 +110,79 @@
    "\n",
    "chat = AzureMLChatOnlineEndpoint(\n",
    "    endpoint_url=\"https://<your-endpoint>.<your_region>.inference.ml.azure.com/score\",\n",
+    "    endpoint_api_type=AzureMLEndpointApiType.realtime,\n",
    "    endpoint_api_key=\"my-api-key\",\n",
-    "    content_formatter=LlamaContentFormatter,\n",
+    "    content_formatter=LlamaChatContentFormatter(),\n",
    ")\n",
-    "response = chat(\n",
-    "    messages=[HumanMessage(content=\"Will the Collatz conjecture ever be solved?\")]\n",
+    "response = chat.invoke(\n",
+    "    [HumanMessage(content=\"Will the Collatz conjecture ever be solved?\")]\n",
+    ")\n",
+    "response"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### Example: Chat completions with pay-as-you-go deployments (model as a service)"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "chat = AzureMLChatOnlineEndpoint(\n",
+    "    endpoint_url=\"https://<your-endpoint>.<your_region>.inference.ml.azure.com/v1/chat/completions\",\n",
+    "    endpoint_api_type=AzureMLEndpointApiType.serverless,\n",
+    "    endpoint_api_key=\"my-api-key\",\n",
+    "    content_formatter=LlamaChatContentFormatter,\n",
+    ")\n",
+    "response = chat.invoke(\n",
+    "    [HumanMessage(content=\"Will the Collatz conjecture ever be solved?\")]\n",
+    ")\n",
+    "response"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "If you need to pass additional parameters to the model, use `model_kwards` argument:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "chat = AzureMLChatOnlineEndpoint(\n",
+    "    endpoint_url=\"https://<your-endpoint>.<your_region>.inference.ml.azure.com/v1/chat/completions\",\n",
+    "    endpoint_api_type=AzureMLEndpointApiType.serverless,\n",
+    "    endpoint_api_key=\"my-api-key\",\n",
+    "    content_formatter=LlamaChatContentFormatter,\n",
+    "    model_kwargs={\"temperature\": 0.8},\n",
+    ")"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Parameters can also be passed during invocation:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "response = chat.invoke(\n",
+    "    [HumanMessage(content=\"Will the Collatz conjecture ever be solved?\")],\n",
+    "    max_tokens=512,\n",
    ")\n",
    "response"
   ]
--- a/docs/docs/integrations/llms/azure_ml.ipynb
+++ b/docs/docs/integrations/llms/azure_ml.ipynb
@@ -6,9 +6,9 @@
   "source": [
    "# Azure ML\n",
    "\n",
-    "[Azure ML](https://azure.microsoft.com/en-us/products/machine-learning/) is a platform used to build, train, and deploy machine learning models. Users can explore the types of models to deploy in the Model Catalog, which provides Azure Foundation Models and OpenAI Models. Azure Foundation Models include various open-source models and popular Hugging Face models. Users can also import models of their liking into AzureML.\n",
+    "[Azure ML](https://azure.microsoft.com/en-us/products/machine-learning/) is a platform used to build, train, and deploy machine learning models. Users can explore the types of models to deploy in the Model Catalog, which provides foundational and general purpose models from different providers.\n",
    "\n",
-    "This notebook goes over how to use an LLM hosted on an `AzureML online endpoint`"
+    "This notebook goes over how to use an LLM hosted on an `Azure ML Online Endpoint`."
   ]
  },
  {
@@ -26,11 +26,12 @@
   "source": [
    "## Set up\n",
    "\n",
-    "To use the wrapper, you must [deploy a model on AzureML](https://learn.microsoft.com/en-us/azure/machine-learning/how-to-use-foundation-models?view=azureml-api-2#deploying-foundation-models-to-endpoints-for-inferencing) and obtain the following parameters:\n",
+    "You must [deploy a model on Azure ML](https://learn.microsoft.com/en-us/azure/machine-learning/how-to-use-foundation-models?view=azureml-api-2#deploying-foundation-models-to-endpoints-for-inferencing) or [to Azure AI studio](https://learn.microsoft.com/en-us/azure/ai-studio/how-to/deploy-models-open) and obtain the following parameters:\n",
    "\n",
-    "* `endpoint_api_key`: Required - The API key provided by the endpoint\n",
-    "* `endpoint_url`: Required - The REST endpoint url provided by the endpoint\n",
-    "* `deployment_name`: Not required - The deployment name of the model using the endpoint"
+    "* `endpoint_url`: The REST endpoint url provided by the endpoint.\n",
+    "* `endpoint_api_type`: Use `endpoint_type='realtime'` when deploying models to **Realtime endpoints** (hosted managed infrastructure). Use `endpoint_type='serverless'` when deploying models using the **Pay-as-you-go** offering (model as a service).\n",
+    "* `endpoint_api_key`: The API key provided by the endpoint.\n",
+    "* `deployment_name`: (Optional) The deployment name of the model using the endpoint."
   ]
  },
  {
@@ -46,31 +47,107 @@
    "* `HFContentFormatter`: Formats request and response data for text-generation Hugging Face models\n",
    "* `LLamaContentFormatter`: Formats request and response data for LLaMa2\n",
    "\n",
-    "*Note: `OSSContentFormatter` is being deprecated and replaced with `GPT2ContentFormatter`. The logic is the same but `GPT2ContentFormatter` is a more suitable name. You can still continue to use `OSSContentFormatter` as the changes are backwards compatible.*\n",
-    "\n",
-    "Below is an example using a summarization model from Hugging Face."
+    "*Note: `OSSContentFormatter` is being deprecated and replaced with `GPT2ContentFormatter`. The logic is the same but `GPT2ContentFormatter` is a more suitable name. You can still continue to use `OSSContentFormatter` as the changes are backwards compatible.*"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
-    "### Custom Content Formatter"
+    "## Examples"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### Example: LlaMa 2 completions with real-time endpoints"
   ]
  },
  {
   "cell_type": "code",
-   "execution_count": 1,
+   "execution_count": null,
   "metadata": {},
-   "outputs": [
-    {
-     "name": "stdout",
-     "output_type": "stream",
-     "text": [
-      "HaSeul won her first music show trophy with \"So What\" on Mnet's M Countdown. Loona released their second EP titled [#] (read as hash] on February 5, 2020. HaSeul did not take part in the promotion of the album because of mental health issues. On October 19, 2020, they released their third EP called [12:00]. It was their first album to enter the Billboard 200, debuting at number 112. On June 2, 2021, the group released their fourth EP called Yummy-Yummy. On August 27, it was announced that they are making their Japanese debut on September 15 under Universal Music Japan sublabel EMI Records.\n"
-     ]
-    }
-   ],
+   "outputs": [],
+   "source": [
+    "from langchain.schema import HumanMessage\n",
+    "from langchain_community.llms.azureml_endpoint import (\n",
+    "    AzureMLEndpointApiType,\n",
+    "    LlamaContentFormatter,\n",
+    ")\n",
+    "\n",
+    "llm = AzureMLOnlineEndpoint(\n",
+    "    endpoint_url=\"https://<your-endpoint>.<your_region>.inference.ml.azure.com/score\",\n",
+    "    endpoint_api_type=AzureMLEndpointApiType.realtime,\n",
+    "    endpoint_api_key=\"my-api-key\",\n",
+    "    content_formatter=LlamaContentFormatter(),\n",
+    "    model_kwargs={\"temperature\": 0.8, \"max_new_tokens\": 400},\n",
+    ")\n",
+    "response = llm.invoke(\"Write me a song about sparkling water:\")\n",
+    "response"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Model parameters can also be indicated during invocation:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "response = llm.invoke(\"Write me a song about sparkling water:\", temperature=0.5)\n",
+    "response"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### Example: Chat completions with pay-as-you-go deployments (model as a service)"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "from langchain.schema import HumanMessage\n",
+    "from langchain_community.llms.azureml_endpoint import (\n",
+    "    AzureMLEndpointApiType,\n",
+    "    LlamaContentFormatter,\n",
+    ")\n",
+    "\n",
+    "llm = AzureMLOnlineEndpoint(\n",
+    "    endpoint_url=\"https://<your-endpoint>.<your_region>.inference.ml.azure.com/v1/completions\",\n",
+    "    endpoint_api_type=AzureMLEndpointApiType.serverless,\n",
+    "    endpoint_api_key=\"my-api-key\",\n",
+    "    content_formatter=LlamaContentFormatter(),\n",
+    "    model_kwargs={\"temperature\": 0.8, \"max_new_tokens\": 400},\n",
+    ")\n",
+    "response = llm.invoke(\"Write me a song about sparkling water:\")\n",
+    "response"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### Example: Custom content formatter\n",
+    "\n",
+    "Below is an example using a summarization model from Hugging Face."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
   "source": [
    "import json\n",
    "import os\n",
@@ -104,6 +181,7 @@
    "content_formatter = CustomFormatter()\n",
    "\n",
    "llm = AzureMLOnlineEndpoint(\n",
+    "    endpoint_api_type=\"realtime\",\n",
    "    endpoint_api_key=os.getenv(\"BART_ENDPOINT_API_KEY\"),\n",
    "    endpoint_url=os.getenv(\"BART_ENDPOINT_URL\"),\n",
    "    model_kwargs={\"temperature\": 0.8, \"max_new_tokens\": 400},\n",
@@ -132,7 +210,7 @@
    "that Loona will release the double A-side single, \"Hula Hoop / Star Seed\" on September 15, with a physical CD release on October \n",
    "20.[53] In December, Chuu filed an injunction to suspend her exclusive contract with Blockberry Creative.[54][55]\n",
    "\"\"\"\n",
-    "summarized_text = llm(large_text)\n",
+    "summarized_text = llm.invoke(large_text)\n",
    "print(summarized_text)"
   ]
  },
@@ -140,22 +218,14 @@
   "cell_type": "markdown",
   "metadata": {},
   "source": [
-    "### Dolly with LLMChain"
+    "### Example: Dolly with LLMChain"
   ]
  },
  {
   "cell_type": "code",
-   "execution_count": 5,
+   "execution_count": null,
   "metadata": {},
-   "outputs": [
-    {
-     "name": "stdout",
-     "output_type": "stream",
-     "text": [
-      "Many people are willing to talk about themselves; it's others who seem to be stuck up. Try to understand others where they're coming from. Like minded people can build a tribe together.\n"
-     ]
-    }
-   ],
+   "outputs": [],
   "source": [
    "from langchain.chains import LLMChain\n",
    "from langchain.prompts import PromptTemplate\n",
@@ -177,31 +247,22 @@
    ")\n",
    "\n",
    "chain = LLMChain(llm=llm, prompt=prompt)\n",
-    "print(chain.run({\"word_count\": 100, \"topic\": \"how to make friends\"}))"
+    "print(chain.invoke({\"word_count\": 100, \"topic\": \"how to make friends\"}))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
-    "### Serializing an LLM\n",
+    "## Serializing an LLM\n",
    "You can also save and load LLM configurations"
   ]
  },
  {
   "cell_type": "code",
-   "execution_count": 6,
+   "execution_count": null,
   "metadata": {},
-   "outputs": [
-    {
-     "name": "stdout",
-     "output_type": "stream",
-     "text": [
-      "\u001b[1mAzureMLOnlineEndpoint\u001b[0m\n",
-      "Params: {'deployment_name': 'databricks-dolly-v2-12b-4', 'model_kwargs': {'temperature': 0.2, 'max_tokens': 150, 'top_p': 0.8, 'frequency_penalty': 0.32, 'presence_penalty': 0.072}}\n"
-     ]
-    }
-   ],
+   "outputs": [],
   "source": [
    "from langchain_community.llms.loading import load_llm\n",
    "\n",
@@ -224,9 +285,9 @@
 ],
 "metadata": {
  "kernelspec": {
-   "display_name": "Python 3 (ipykernel)",
+   "display_name": "langchain",
   "language": "python",
-   "name": "python3"
+   "name": "langchain"
  },
  "language_info": {
   "codemirror_mode": {
@@ -238,7 +299,7 @@
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
-   "version": "3.10.12"
+   "version": "3.11.5"
  }
 },
 "nbformat": 4,