langchain/libs/partners/huggingface/tests/integration_tests/test_llms.py
célina 868f07f8f4
partners: (langchain-huggingface) Chat Models - Integrate Hugging Face Inference Providers and remove deprecated code (#30733)
Hi there, I'm Célina from 🤗,
This PR introduces support for Hugging Face's serverless Inference
Providers (documentation
[here](https://huggingface.co/docs/inference-providers/index)), allowing
users to specify different providers for chat completion and text
generation tasks.

This PR also removes the usage of `InferenceClient.post()` method in
`HuggingFaceEndpoint`, in favor of the task-specific `text_generation`
method. `InferenceClient.post()` is deprecated and will be removed in
`huggingface_hub v0.31.0`.

---
## Changes made
- bumped the minimum required version of the `huggingface-hub` package
to ensure compatibility with the latest API usage.
- added a `provider` field to `HuggingFaceEndpoint`, enabling users to
select the inference provider (e.g., 'cerebras', 'together',
'fireworks-ai'). Defaults to `hf-inference` (HF Inference API).
- replaced the deprecated `InferenceClient.post()` call in
`HuggingFaceEndpoint` with the task-specific `text_generation` method
for future-proofing, `post()` will be removed in huggingface-hub
v0.31.0.
- updated the `ChatHuggingFace` component:
    - added async and streaming support.
    - added support for tool calling.
- exposed underlying chat completion parameters for more granular
control.
- Added integration tests for `ChatHuggingFace` and updated the
corresponding unit tests.

  All changes are backward compatible.

---------

Co-authored-by: ccurme <chester.curme@gmail.com>
2025-04-29 09:53:14 -04:00

21 lines
696 B
Python

from collections.abc import Generator
from langchain_huggingface.llms import HuggingFacePipeline
def test_huggingface_pipeline_streaming() -> None:
"""Test streaming tokens from huggingface_pipeline."""
llm = HuggingFacePipeline.from_model_id(
model_id="openai-community/gpt2",
task="text-generation",
pipeline_kwargs={"max_new_tokens": 10},
)
generator = llm.stream("Q: How do you say 'hello' in German? A:'", stop=["."])
stream_results_string = ""
assert isinstance(generator, Generator)
for chunk in generator:
assert isinstance(chunk, str)
stream_results_string = chunk
assert len(stream_results_string.strip()) > 0