docs: update huggingface inference to latest usage (#31906)

This PR updates the docs for Hugging Face's inference offering, replacing the
old 'Inference API' usage with 'Inference Providers'.
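
For context, a minimal sketch of the usage pattern the updated docs describe, assuming a recent `langchain-huggingface` release that supports the `provider` argument and a `HUGGINGFACEHUB_API_TOKEN` set in the environment:

```python
from langchain_huggingface import ChatHuggingFace, HuggingFaceEndpoint

# Serverless Inference Providers: "auto" lets Hugging Face route the
# request to the best available third-party provider for this model.
llm = HuggingFaceEndpoint(
    repo_id="deepseek-ai/DeepSeek-R1-0528",
    task="text-generation",
    provider="auto",
)
chat_model = ChatHuggingFace(llm=llm)
```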

---------

Co-authored-by: Mason Daugherty <mason@langchain.dev>
burtenshaw, 2025-07-08 16:35:10 +02:00 (committed by GitHub)
parent b8e2420865
commit 4e513539f8
4 changed files with 77 additions and 32 deletions


@@ -120,7 +120,7 @@
},
{
"cell_type": "code",
"execution_count": 10,
"execution_count": null,
"metadata": {},
"outputs": [
{
@@ -138,11 +138,36 @@
"from langchain_huggingface import ChatHuggingFace, HuggingFaceEndpoint\n",
"\n",
"llm = HuggingFaceEndpoint(\n",
" repo_id=\"HuggingFaceH4/zephyr-7b-beta\",\n",
" repo_id=\"deepseek-ai/DeepSeek-R1-0528\",\n",
" task=\"text-generation\",\n",
" max_new_tokens=512,\n",
" do_sample=False,\n",
" repetition_penalty=1.03,\n",
" provider=\"auto\", # let Hugging Face choose the best provider for you\n",
")\n",
"\n",
"chat_model = ChatHuggingFace(llm=llm)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Now let's take advantage of [Inference Providers](https://huggingface.co/docs/inference-providers) to run the model on specific third-party providers"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"llm = HuggingFaceEndpoint(\n",
" repo_id=\"deepseek-ai/DeepSeek-R1-0528\",\n",
" task=\"text-generation\",\n",
" provider=\"hyperbolic\", # set your provider here\n",
" # provider=\"nebius\",\n",
" # provider=\"together\",\n",
")\n",
"\n",
"chat_model = ChatHuggingFace(llm=llm)"


@@ -117,7 +117,7 @@
"source": [
"## Examples\n",
"\n",
"Here is an example of how you can access `HuggingFaceEndpoint` integration of the free [Serverless Endpoints](https://huggingface.co/inference-endpoints/serverless) API."
"Here is an example of how you can access `HuggingFaceEndpoint` integration of the serverless [Inference Providers](https://huggingface.co/docs/inference-providers) API.\n"
]
},
{
@@ -128,13 +128,17 @@
},
"outputs": [],
"source": [
"repo_id = \"mistralai/Mistral-7B-Instruct-v0.2\"\n",
"repo_id = \"deepseek-ai/DeepSeek-R1-0528\"\n",
"\n",
"llm = HuggingFaceEndpoint(\n",
" repo_id=repo_id,\n",
" max_length=128,\n",
" temperature=0.5,\n",
" huggingfacehub_api_token=HUGGINGFACEHUB_API_TOKEN,\n",
" provider=\"auto\", # set your provider here hf.co/settings/inference-providers\n",
" # provider=\"hyperbolic\",\n",
" # provider=\"nebius\",\n",
" # provider=\"together\",\n",
")\n",
"llm_chain = prompt | llm\n",
"print(llm_chain.invoke({\"question\": question}))"


@@ -1,6 +1,11 @@
# Hugging Face
All functionality related to the [Hugging Face Platform](https://huggingface.co/).
All functionality related to [Hugging Face Hub](https://huggingface.co/) and libraries like [transformers](https://huggingface.co/docs/transformers/index), [sentence transformers](https://sbert.net/), and [datasets](https://huggingface.co/docs/datasets/index).
> [Hugging Face](https://huggingface.co/) is an AI platform with all major open source models, datasets, MCPs, and demos.
> It supports model inference both locally and via serverless [Inference Providers](https://huggingface.co/docs/inference-providers).
>
> You can use [Inference Providers](https://huggingface.co/docs/inference-providers) to run open source models like DeepSeek R1 on scalable serverless infrastructure.
## Installation
@@ -26,6 +31,7 @@ from langchain_huggingface import ChatHuggingFace
### HuggingFaceEndpoint
We can use the `HuggingFaceEndpoint` class to run open source models via serverless [Inference Providers](https://huggingface.co/docs/inference-providers) or via dedicated [Inference Endpoints](https://huggingface.co/inference-endpoints/dedicated).
See a [usage example](/docs/integrations/llms/huggingface_endpoint).
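
A minimal serverless sketch (the model id and provider choice are illustrative):

```python
from langchain_huggingface import HuggingFaceEndpoint

llm = HuggingFaceEndpoint(
    repo_id="deepseek-ai/DeepSeek-R1-0528",
    task="text-generation",
    provider="auto",  # or a specific provider such as "together"
)
print(llm.invoke("Hugging Face is"))
```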
@@ -35,7 +41,7 @@ from langchain_huggingface import HuggingFaceEndpoint
### HuggingFacePipeline
Hugging Face models can be run locally through the `HuggingFacePipeline` class.
We can use the `HuggingFacePipeline` class to run open source models locally.
See a [usage example](/docs/integrations/llms/huggingface_pipelines).
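
A minimal local sketch (the model id and generation settings are illustrative; the weights are downloaded and run on your machine):

```python
from langchain_huggingface import HuggingFacePipeline

llm = HuggingFacePipeline.from_model_id(
    model_id="microsoft/Phi-3-mini-4k-instruct",
    task="text-generation",
    pipeline_kwargs={"max_new_tokens": 64},
)
print(llm.invoke("Hugging Face is"))
```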
@@ -47,6 +53,8 @@ from langchain_huggingface import HuggingFacePipeline
### HuggingFaceEmbeddings
We can use the `HuggingFaceEmbeddings` class to run open source embedding models locally.
See a [usage example](/docs/integrations/text_embedding/huggingfacehub).
```python
from langchain_huggingface import HuggingFaceEmbeddings
```
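
For example, a sketch with an assumed model name:

```python
from langchain_huggingface import HuggingFaceEmbeddings

# Downloads the model and computes embeddings locally.
embeddings = HuggingFaceEmbeddings(model_name="sentence-transformers/all-mpnet-base-v2")
vector = embeddings.embed_query("Hello, world!")
```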
@@ -55,6 +63,8 @@ from langchain_huggingface import HuggingFaceEmbeddings
### HuggingFaceEndpointEmbeddings
We can use the `HuggingFaceEndpointEmbeddings` class to run open source embedding models via a dedicated [Inference Endpoint](https://huggingface.co/inference-endpoints/dedicated).
See a [usage example](/docs/integrations/text_embedding/huggingfacehub).
```python
from langchain_huggingface import HuggingFaceEndpointEmbeddings
```
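
A usage sketch (the model name is an assumption; a dedicated endpoint URL can be used instead of a model id):

```python
from langchain_huggingface import HuggingFaceEndpointEmbeddings

# Embeddings are computed remotely rather than on your machine.
embeddings = HuggingFaceEndpointEmbeddings(model="sentence-transformers/all-MiniLM-L6-v2")
vector = embeddings.embed_query("Hello, world!")
```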
@@ -63,6 +73,8 @@ from langchain_huggingface import HuggingFaceEndpointEmbeddings
### HuggingFaceInferenceAPIEmbeddings
We can use the `HuggingFaceInferenceAPIEmbeddings` class to run open source embedding models via [Inference Providers](https://huggingface.co/docs/inference-providers).
See a [usage example](/docs/integrations/text_embedding/huggingfacehub).
```python
from langchain_community.embeddings import HuggingFaceInferenceAPIEmbeddings
```
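
A usage sketch mirroring the notebook below (the token placeholder must be replaced with a real key):

```python
from langchain_community.embeddings import HuggingFaceInferenceAPIEmbeddings

embeddings = HuggingFaceInferenceAPIEmbeddings(
    api_key="hf_...",  # your Hugging Face token, see hf.co/settings/tokens
    model_name="sentence-transformers/all-MiniLM-l6-v2",
)
vector = embeddings.embed_query("Hello, world!")
```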
@@ -71,6 +83,8 @@ from langchain_community.embeddings import HuggingFaceInferenceAPIEmbeddings
### HuggingFaceInstructEmbeddings
We can use the `HuggingFaceInstructEmbeddings` class to run open source embedding models locally.
See a [usage example](/docs/integrations/text_embedding/instruct_embeddings).
```python
from langchain_community.embeddings import HuggingFaceInstructEmbeddings
```
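
A local usage sketch (the default instructor model is assumed; it requires the `InstructorEmbedding` and `sentence-transformers` packages):

```python
from langchain_community.embeddings import HuggingFaceInstructEmbeddings

embeddings = HuggingFaceInstructEmbeddings(model_name="hkunlp/instructor-large")
vector = embeddings.embed_query("Hello, world!")
```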


@@ -95,35 +95,36 @@
"id": "92019ef1-5d30-4985-b4e6-c0d98bdfe265",
"metadata": {},
"source": [
"## Hugging Face Inference API\n",
"We can also access embedding models via the Hugging Face Inference API, which does not require us to install ``sentence_transformers`` and download models locally."
"## Hugging Face Inference Providers\n",
"\n",
"We can also access embedding models via the [Inference Providers](https://huggingface.co/docs/inference-providers), which let's us use open source models on scalable serverless infrastructure.\n",
"\n",
"First, we need to get a read-only API key from [Hugging Face](https://huggingface.co/settings/tokens).\n"
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "66f5c6ba-1446-43e1-b012-800d17cef300",
"execution_count": null,
"id": "c5576a6c",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Enter your HF Inference API Key:\n",
"\n",
" ········\n"
]
}
],
"outputs": [],
"source": [
"import getpass\n",
"from getpass import getpass\n",
"\n",
"inference_api_key = getpass.getpass(\"Enter your HF Inference API Key:\\n\\n\")"
"huggingfacehub_api_token = getpass()"
]
},
{
"cell_type": "markdown",
"id": "3ad10337",
"metadata": {},
"source": [
"Now we can use the `HuggingFaceInferenceAPIEmbeddings` class to run open source embedding models via [Inference Providers](https://huggingface.co/docs/inference-providers)."
]
},
{
"cell_type": "code",
"execution_count": 4,
"execution_count": null,
"id": "d0623c1f-cd82-4862-9bce-3655cb9b66ac",
"metadata": {},
"outputs": [
@@ -139,10 +140,11 @@
}
],
"source": [
"from langchain_community.embeddings import HuggingFaceInferenceAPIEmbeddings\n",
"from langchain_huggingface import HuggingFaceInferenceAPIEmbeddings\n",
"\n",
"embeddings = HuggingFaceInferenceAPIEmbeddings(\n",
" api_key=inference_api_key, model_name=\"sentence-transformers/all-MiniLM-l6-v2\"\n",
" api_key=huggingfacehub_api_token,\n",
" model_name=\"sentence-transformers/all-MiniLM-l6-v2\",\n",
")\n",
"\n",
"query_result = embeddings.embed_query(text)\n",