mirror of
https://github.com/hwchase17/langchain.git
synced 2025-09-07 14:03:26 +00:00
community[patch]: Support Streaming in Azure Machine Learning (#18246)
- [x] **PR title**: "community: Support streaming in Azure ML and few naming changes" - [x] **PR message**: - **Description:** Added support for streaming for azureml_endpoint. Also, renamed and AzureMLEndpointApiType.realtime to AzureMLEndpointApiType.dedicated. Also, added new classes CustomOpenAIChatContentFormatter and CustomOpenAIContentFormatter and updated the classes LlamaChatContentFormatter and LlamaContentFormatter to now show a deprecated warning message when instantiated. --------- Co-authored-by: Sachin Paryani <saparan@microsoft.com> Co-authored-by: Bagatur <baskaryan@gmail.com>
This commit is contained in:
@@ -40,7 +40,7 @@
|
||||
"You must [deploy a model on Azure ML](https://learn.microsoft.com/en-us/azure/machine-learning/how-to-use-foundation-models?view=azureml-api-2#deploying-foundation-models-to-endpoints-for-inferencing) or [to Azure AI studio](https://learn.microsoft.com/en-us/azure/ai-studio/how-to/deploy-models-open) and obtain the following parameters:\n",
|
||||
"\n",
|
||||
"* `endpoint_url`: The REST endpoint url provided by the endpoint.\n",
|
||||
"* `endpoint_api_type`: Use `endpoint_type='realtime'` when deploying models to **Realtime endpoints** (hosted managed infrastructure). Use `endpoint_type='serverless'` when deploying models using the **Pay-as-you-go** offering (model as a service).\n",
|
||||
"* `endpoint_api_type`: Use `endpoint_type='dedicated'` when deploying models to **Dedicated endpoints** (hosted managed infrastructure). Use `endpoint_type='serverless'` when deploying models using the **Pay-as-you-go** offering (model as a service).\n",
|
||||
"* `endpoint_api_key`: The API key provided by the endpoint"
|
||||
]
|
||||
},
|
||||
@@ -52,9 +52,9 @@
|
||||
"\n",
|
||||
"The `content_formatter` parameter is a handler class for transforming the request and response of an AzureML endpoint to match with required schema. Since there are a wide range of models in the model catalog, each of which may process data differently from one another, a `ContentFormatterBase` class is provided to allow users to transform data to their liking. The following content formatters are provided:\n",
|
||||
"\n",
|
||||
"* `LLamaChatContentFormatter`: Formats request and response data for LLaMa2-chat\n",
|
||||
"* `CustomOpenAIChatContentFormatter`: Formats request and response data for models like LLaMa2-chat that follow the OpenAI API spec for request and response.\n",
|
||||
"\n",
|
||||
"*Note: `langchain.chat_models.azureml_endpoint.LLamaContentFormatter` is being deprecated and replaced with `langchain.chat_models.azureml_endpoint.LLamaChatContentFormatter`.*\n",
|
||||
"*Note: `langchain.chat_models.azureml_endpoint.LlamaChatContentFormatter` is being deprecated and replaced with `langchain.chat_models.azureml_endpoint.CustomOpenAIChatContentFormatter`.*\n",
|
||||
"\n",
|
||||
"You can implement custom content formatters specific for your model deriving from the class `langchain_community.llms.azureml_endpoint.ContentFormatterBase`."
|
||||
]
|
||||
@@ -65,20 +65,7 @@
|
||||
"source": [
|
||||
"## Examples\n",
|
||||
"\n",
|
||||
"The following section cotain examples about how to use this class:"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langchain_community.chat_models.azureml_endpoint import (\n",
|
||||
" AzureMLEndpointApiType,\n",
|
||||
" LlamaChatContentFormatter,\n",
|
||||
")\n",
|
||||
"from langchain_core.messages import HumanMessage"
|
||||
"The following section contains examples about how to use this class:"
|
||||
]
|
||||
},
|
||||
{
|
||||
@@ -105,14 +92,17 @@
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"from langchain_community.chat_models.azureml_endpoint import LlamaContentFormatter\n",
|
||||
"from langchain_community.chat_models.azureml_endpoint import (\n",
|
||||
" AzureMLEndpointApiType,\n",
|
||||
" CustomOpenAIChatContentFormatter,\n",
|
||||
")\n",
|
||||
"from langchain_core.messages import HumanMessage\n",
|
||||
"\n",
|
||||
"chat = AzureMLChatOnlineEndpoint(\n",
|
||||
" endpoint_url=\"https://<your-endpoint>.<your_region>.inference.ml.azure.com/score\",\n",
|
||||
" endpoint_api_type=AzureMLEndpointApiType.realtime,\n",
|
||||
" endpoint_api_type=AzureMLEndpointApiType.dedicated,\n",
|
||||
" endpoint_api_key=\"my-api-key\",\n",
|
||||
" content_formatter=LlamaChatContentFormatter(),\n",
|
||||
" content_formatter=CustomOpenAIChatContentFormatter(),\n",
|
||||
")\n",
|
||||
"response = chat.invoke(\n",
|
||||
" [HumanMessage(content=\"Will the Collatz conjecture ever be solved?\")]\n",
|
||||
@@ -137,7 +127,7 @@
|
||||
" endpoint_url=\"https://<your-endpoint>.<your_region>.inference.ml.azure.com/v1/chat/completions\",\n",
|
||||
" endpoint_api_type=AzureMLEndpointApiType.serverless,\n",
|
||||
" endpoint_api_key=\"my-api-key\",\n",
|
||||
" content_formatter=LlamaChatContentFormatter,\n",
|
||||
" content_formatter=CustomOpenAIChatContentFormatter,\n",
|
||||
")\n",
|
||||
"response = chat.invoke(\n",
|
||||
" [HumanMessage(content=\"Will the Collatz conjecture ever be solved?\")]\n",
|
||||
@@ -149,7 +139,7 @@
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"If you need to pass additional parameters to the model, use `model_kwards` argument:"
|
||||
"If you need to pass additional parameters to the model, use `model_kwargs` argument:"
|
||||
]
|
||||
},
|
||||
{
|
||||
@@ -162,7 +152,7 @@
|
||||
" endpoint_url=\"https://<your-endpoint>.<your_region>.inference.ml.azure.com/v1/chat/completions\",\n",
|
||||
" endpoint_api_type=AzureMLEndpointApiType.serverless,\n",
|
||||
" endpoint_api_key=\"my-api-key\",\n",
|
||||
" content_formatter=LlamaChatContentFormatter,\n",
|
||||
" content_formatter=CustomOpenAIChatContentFormatter,\n",
|
||||
" model_kwargs={\"temperature\": 0.8},\n",
|
||||
")"
|
||||
]
|
||||
@@ -204,7 +194,7 @@
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.10.12"
|
||||
"version": "3.9.1"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
|
@@ -29,7 +29,7 @@
|
||||
"You must [deploy a model on Azure ML](https://learn.microsoft.com/en-us/azure/machine-learning/how-to-use-foundation-models?view=azureml-api-2#deploying-foundation-models-to-endpoints-for-inferencing) or [to Azure AI studio](https://learn.microsoft.com/en-us/azure/ai-studio/how-to/deploy-models-open) and obtain the following parameters:\n",
|
||||
"\n",
|
||||
"* `endpoint_url`: The REST endpoint url provided by the endpoint.\n",
|
||||
"* `endpoint_api_type`: Use `endpoint_type='realtime'` when deploying models to **Realtime endpoints** (hosted managed infrastructure). Use `endpoint_type='serverless'` when deploying models using the **Pay-as-you-go** offering (model as a service).\n",
|
||||
"* `endpoint_api_type`: Use `endpoint_type='dedicated'` when deploying models to **Dedicated endpoints** (hosted managed infrastructure). Use `endpoint_type='serverless'` when deploying models using the **Pay-as-you-go** offering (model as a service).\n",
|
||||
"* `endpoint_api_key`: The API key provided by the endpoint.\n",
|
||||
"* `deployment_name`: (Optional) The deployment name of the model using the endpoint."
|
||||
]
|
||||
@@ -45,7 +45,7 @@
|
||||
"* `GPT2ContentFormatter`: Formats request and response data for GPT2\n",
|
||||
"* `DollyContentFormatter`: Formats request and response data for the Dolly-v2\n",
|
||||
"* `HFContentFormatter`: Formats request and response data for text-generation Hugging Face models\n",
|
||||
"* `LLamaContentFormatter`: Formats request and response data for LLaMa2\n",
|
||||
"* `CustomOpenAIContentFormatter`: Formats request and response data for models like LLaMa2 that follow OpenAI API compatible scheme.\n",
|
||||
"\n",
|
||||
"*Note: `OSSContentFormatter` is being deprecated and replaced with `GPT2ContentFormatter`. The logic is the same but `GPT2ContentFormatter` is a more suitable name. You can still continue to use `OSSContentFormatter` as the changes are backwards compatible.*"
|
||||
]
|
||||
@@ -72,15 +72,15 @@
|
||||
"source": [
|
||||
"from langchain_community.llms.azureml_endpoint import (\n",
|
||||
" AzureMLEndpointApiType,\n",
|
||||
" LlamaContentFormatter,\n",
|
||||
" CustomOpenAIContentFormatter,\n",
|
||||
")\n",
|
||||
"from langchain_core.messages import HumanMessage\n",
|
||||
"\n",
|
||||
"llm = AzureMLOnlineEndpoint(\n",
|
||||
" endpoint_url=\"https://<your-endpoint>.<your_region>.inference.ml.azure.com/score\",\n",
|
||||
" endpoint_api_type=AzureMLEndpointApiType.realtime,\n",
|
||||
" endpoint_api_type=AzureMLEndpointApiType.dedicated,\n",
|
||||
" endpoint_api_key=\"my-api-key\",\n",
|
||||
" content_formatter=LlamaContentFormatter(),\n",
|
||||
" content_formatter=CustomOpenAIContentFormatter(),\n",
|
||||
" model_kwargs={\"temperature\": 0.8, \"max_new_tokens\": 400},\n",
|
||||
")\n",
|
||||
"response = llm.invoke(\"Write me a song about sparkling water:\")\n",
|
||||
@@ -119,7 +119,7 @@
|
||||
"source": [
|
||||
"from langchain_community.llms.azureml_endpoint import (\n",
|
||||
" AzureMLEndpointApiType,\n",
|
||||
" LlamaContentFormatter,\n",
|
||||
" CustomOpenAIContentFormatter,\n",
|
||||
")\n",
|
||||
"from langchain_core.messages import HumanMessage\n",
|
||||
"\n",
|
||||
@@ -127,7 +127,7 @@
|
||||
" endpoint_url=\"https://<your-endpoint>.<your_region>.inference.ml.azure.com/v1/completions\",\n",
|
||||
" endpoint_api_type=AzureMLEndpointApiType.serverless,\n",
|
||||
" endpoint_api_key=\"my-api-key\",\n",
|
||||
" content_formatter=LlamaContentFormatter(),\n",
|
||||
" content_formatter=CustomOpenAIContentFormatter(),\n",
|
||||
" model_kwargs={\"temperature\": 0.8, \"max_new_tokens\": 400},\n",
|
||||
")\n",
|
||||
"response = llm.invoke(\"Write me a song about sparkling water:\")\n",
|
||||
@@ -181,7 +181,7 @@
|
||||
"content_formatter = CustomFormatter()\n",
|
||||
"\n",
|
||||
"llm = AzureMLOnlineEndpoint(\n",
|
||||
" endpoint_api_type=\"realtime\",\n",
|
||||
" endpoint_api_type=\"dedicated\",\n",
|
||||
" endpoint_api_key=os.getenv(\"BART_ENDPOINT_API_KEY\"),\n",
|
||||
" endpoint_url=os.getenv(\"BART_ENDPOINT_URL\"),\n",
|
||||
" model_kwargs={\"temperature\": 0.8, \"max_new_tokens\": 400},\n",
|
||||
|
Reference in New Issue
Block a user