docs: update huggingface inference to latest usage (#31906)

This PR updates the docs for Hugging Face's inference offering, replacing the
old 'Inference API' usage with 'Inference Providers'.
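
For context, a minimal sketch of the usage pattern the updated docs describe, assuming a recent `langchain-huggingface` release that supports the `provider` argument and a `HUGGINGFACEHUB_API_TOKEN` set in the environment:

```python
from langchain_huggingface import ChatHuggingFace, HuggingFaceEndpoint

# Serverless Inference Providers: "auto" lets Hugging Face route the
# request to the best available third-party provider for this model.
llm = HuggingFaceEndpoint(
    repo_id="deepseek-ai/DeepSeek-R1-0528",
    task="text-generation",
    provider="auto",
)
chat_model = ChatHuggingFace(llm=llm)
```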

---------

Co-authored-by: Mason Daugherty <mason@langchain.dev>
burtenshaw, 2025-07-08 16:35:10 +02:00 (committed by GitHub)
parent b8e2420865
commit 4e513539f8
4 changed files with 77 additions and 32 deletions


@@ -120,7 +120,7 @@
},
{
"cell_type": "code",
"execution_count": 10,
"execution_count": null,
"metadata": {},
"outputs": [
{
@@ -138,11 +138,36 @@
"from langchain_huggingface import ChatHuggingFace, HuggingFaceEndpoint\n",
"\n",
"llm = HuggingFaceEndpoint(\n",
" repo_id=\"HuggingFaceH4/zephyr-7b-beta\",\n",
" repo_id=\"deepseek-ai/DeepSeek-R1-0528\",\n",
" task=\"text-generation\",\n",
" max_new_tokens=512,\n",
" do_sample=False,\n",
" repetition_penalty=1.03,\n",
" provider=\"auto\", # let Hugging Face choose the best provider for you\n",
")\n",
"\n",
"chat_model = ChatHuggingFace(llm=llm)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Now let's take advantage of [Inference Providers](https://huggingface.co/docs/inference-providers) to run the model on specific third-party providers"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"llm = HuggingFaceEndpoint(\n",
" repo_id=\"deepseek-ai/DeepSeek-R1-0528\",\n",
" task=\"text-generation\",\n",
" provider=\"hyperbolic\", # set your provider here\n",
" # provider=\"nebius\",\n",
" # provider=\"together\",\n",
")\n",
"\n",
"chat_model = ChatHuggingFace(llm=llm)"


@@ -117,7 +117,7 @@
"source": [
"## Examples\n",
"\n",
"Here is an example of how you can access `HuggingFaceEndpoint` integration of the free [Serverless Endpoints](https://huggingface.co/inference-endpoints/serverless) API."
"Here is an example of how you can access `HuggingFaceEndpoint` integration of the serverless [Inference Providers](https://huggingface.co/docs/inference-providers) API.\n"
]
},
{
@@ -128,13 +128,17 @@
},
"outputs": [],
"source": [
"repo_id = \"mistralai/Mistral-7B-Instruct-v0.2\"\n",
"repo_id = \"deepseek-ai/DeepSeek-R1-0528\"\n",
"\n",
"llm = HuggingFaceEndpoint(\n",
" repo_id=repo_id,\n",
" max_length=128,\n",
" temperature=0.5,\n",
" huggingfacehub_api_token=HUGGINGFACEHUB_API_TOKEN,\n",
" provider=\"auto\", # set your provider here hf.co/settings/inference-providers\n",
" # provider=\"hyperbolic\",\n",
" # provider=\"nebius\",\n",
" # provider=\"together\",\n",
")\n",
"llm_chain = prompt | llm\n",
"print(llm_chain.invoke({\"question\": question}))"


@@ -1,6 +1,11 @@
# Hugging Face
All functionality related to the [Hugging Face Platform](https://huggingface.co/).
All functionality related to [Hugging Face Hub](https://huggingface.co/) and libraries like [transformers](https://huggingface.co/docs/transformers/index), [sentence transformers](https://sbert.net/), and [datasets](https://huggingface.co/docs/datasets/index).
> [Hugging Face](https://huggingface.co/) is an AI platform with all major open source models, datasets, MCPs, and demos.
> It supports model inference both locally and via serverless [Inference Providers](https://huggingface.co/docs/inference-providers).
>
> You can use [Inference Providers](https://huggingface.co/docs/inference-providers) to run open source models like DeepSeek R1 on scalable serverless infrastructure.
## Installation
@@ -26,6 +31,7 @@ from langchain_huggingface import ChatHuggingFace
### HuggingFaceEndpoint
We can use the `HuggingFaceEndpoint` class to run open source models via serverless [Inference Providers](https://huggingface.co/docs/inference-providers) or via dedicated [Inference Endpoints](https://huggingface.co/inference-endpoints/dedicated).
See a [usage example](/docs/integrations/llms/huggingface_endpoint).
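
A minimal serverless sketch (the model id and provider choice are illustrative):

```python
from langchain_huggingface import HuggingFaceEndpoint

llm = HuggingFaceEndpoint(
    repo_id="deepseek-ai/DeepSeek-R1-0528",
    task="text-generation",
    provider="auto",  # or a specific provider such as "together"
)
print(llm.invoke("Hugging Face is"))
```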
@@ -35,7 +41,7 @@ from langchain_huggingface import HuggingFaceEndpoint
### HuggingFacePipeline
Hugging Face models can be run locally through the `HuggingFacePipeline` class.
We can use the `HuggingFacePipeline` class to run open source models locally.
See a [usage example](/docs/integrations/llms/huggingface_pipelines).
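
A minimal local sketch (the model id and generation settings are illustrative; the weights are downloaded and run on your machine):

```python
from langchain_huggingface import HuggingFacePipeline

llm = HuggingFacePipeline.from_model_id(
    model_id="microsoft/Phi-3-mini-4k-instruct",
    task="text-generation",
    pipeline_kwargs={"max_new_tokens": 64},
)
print(llm.invoke("Hugging Face is"))
```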
@@ -47,6 +53,8 @@ from langchain_huggingface import HuggingFacePipeline
### HuggingFaceEmbeddings
We can use the `HuggingFaceEmbeddings` class to run open source embedding models locally.
See a [usage example](/docs/integrations/text_embedding/huggingfacehub).
```python
from langchain_huggingface import HuggingFaceEmbeddings
```
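
For example, a sketch with an assumed model name:

```python
from langchain_huggingface import HuggingFaceEmbeddings

# Downloads the model and computes embeddings locally.
embeddings = HuggingFaceEmbeddings(model_name="sentence-transformers/all-mpnet-base-v2")
vector = embeddings.embed_query("Hello, world!")
```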
@@ -55,6 +63,8 @@ from langchain_huggingface import HuggingFaceEmbeddings
### HuggingFaceEndpointEmbeddings
We can use the `HuggingFaceEndpointEmbeddings` class to run open source embedding models via a dedicated [Inference Endpoint](https://huggingface.co/inference-endpoints/dedicated).
See a [usage example](/docs/integrations/text_embedding/huggingfacehub).
```python
from langchain_huggingface import HuggingFaceEndpointEmbeddings
```
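
A usage sketch (the model name is an assumption; a dedicated endpoint URL can be used instead of a model id):

```python
from langchain_huggingface import HuggingFaceEndpointEmbeddings

# Embeddings are computed remotely rather than on your machine.
embeddings = HuggingFaceEndpointEmbeddings(model="sentence-transformers/all-MiniLM-L6-v2")
vector = embeddings.embed_query("Hello, world!")
```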
@@ -63,6 +73,8 @@ from langchain_huggingface import HuggingFaceEndpointEmbeddings
### HuggingFaceInferenceAPIEmbeddings
We can use the `HuggingFaceInferenceAPIEmbeddings` class to run open source embedding models via [Inference Providers](https://huggingface.co/docs/inference-providers).
See a [usage example](/docs/integrations/text_embedding/huggingfacehub).
```python
from langchain_community.embeddings import HuggingFaceInferenceAPIEmbeddings
```
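
A usage sketch mirroring the notebook below (the token placeholder must be replaced with a real key):

```python
from langchain_community.embeddings import HuggingFaceInferenceAPIEmbeddings

embeddings = HuggingFaceInferenceAPIEmbeddings(
    api_key="hf_...",  # your Hugging Face token, see hf.co/settings/tokens
    model_name="sentence-transformers/all-MiniLM-l6-v2",
)
vector = embeddings.embed_query("Hello, world!")
```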
@@ -71,6 +83,8 @@ from langchain_community.embeddings import HuggingFaceInferenceAPIEmbeddings
### HuggingFaceInstructEmbeddings
We can use the `HuggingFaceInstructEmbeddings` class to run open source embedding models locally.
See a [usage example](/docs/integrations/text_embedding/instruct_embeddings).
```python
from langchain_community.embeddings import HuggingFaceInstructEmbeddings
```
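
A local usage sketch (the default instructor model is assumed; it requires the `InstructorEmbedding` and `sentence-transformers` packages):

```python
from langchain_community.embeddings import HuggingFaceInstructEmbeddings

embeddings = HuggingFaceInstructEmbeddings(model_name="hkunlp/instructor-large")
vector = embeddings.embed_query("Hello, world!")
```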


@@ -95,35 +95,36 @@
"id": "92019ef1-5d30-4985-b4e6-c0d98bdfe265",
"metadata": {},
"source": [
"## Hugging Face Inference API\n",
"We can also access embedding models via the Hugging Face Inference API, which does not require us to install ``sentence_transformers`` and download models locally."
"## Hugging Face Inference Providers\n",
"\n",
"We can also access embedding models via the [Inference Providers](https://huggingface.co/docs/inference-providers), which let's us use open source models on scalable serverless infrastructure.\n",
"\n",
"First, we need to get a read-only API key from [Hugging Face](https://huggingface.co/settings/tokens).\n"
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "66f5c6ba-1446-43e1-b012-800d17cef300",
"execution_count": null,
"id": "c5576a6c",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Enter your HF Inference API Key:\n",
"\n",
" ········\n"
]
}
],
"outputs": [],
"source": [
"import getpass\n",
"from getpass import getpass\n",
"\n",
"inference_api_key = getpass.getpass(\"Enter your HF Inference API Key:\\n\\n\")"
"huggingfacehub_api_token = getpass()"
]
},
{
"cell_type": "markdown",
"id": "3ad10337",
"metadata": {},
"source": [
"Now we can use the `HuggingFaceInferenceAPIEmbeddings` class to run open source embedding models via [Inference Providers](https://huggingface.co/docs/inference-providers)."
]
},
{
"cell_type": "code",
"execution_count": 4,
"execution_count": null,
"id": "d0623c1f-cd82-4862-9bce-3655cb9b66ac",
"metadata": {},
"outputs": [
@@ -139,10 +140,11 @@
}
],
"source": [
"from langchain_community.embeddings import HuggingFaceInferenceAPIEmbeddings\n",
"from langchain_huggingface import HuggingFaceInferenceAPIEmbeddings\n",
"\n",
"embeddings = HuggingFaceInferenceAPIEmbeddings(\n",
" api_key=inference_api_key, model_name=\"sentence-transformers/all-MiniLM-l6-v2\"\n",
" api_key=huggingfacehub_api_token,\n",
" model_name=\"sentence-transformers/all-MiniLM-l6-v2\",\n",
")\n",
"\n",
"query_result = embeddings.embed_query(text)\n",