langchain

mirror of https://github.com/hwchase17/langchain.git synced 2026-06-09 10:17:00 +00:00

Author	SHA1	Message	Date
Mason Daugherty	2f64d80cc6	fix(core,model-profiles): add missing `ModelProfile` fields, warn on schema drift (#36129 ) PR #35788 added 7 new fields to the `langchain-profiles` CLI output (`name`, `status`, `release_date`, `last_updated`, `open_weights`, `attachment`, `temperature`) but didn't update `ModelProfile` in `langchain-core`. Partner packages like `langchain-aws` that set `extra="forbid"` on their Pydantic models hit `extra_forbidden` validation errors when Pydantic encountered undeclared TypedDict keys at construction time. This adds the missing fields, makes `ModelProfile` forward-compatible, provides a base-class hook so partners can stop duplicating model-profile validator boilerplate, migrates all in-repo partners to the new hook, and adds runtime + CI-time warnings for schema drift. ## Changes ### `langchain-core` - Add `__pydantic_config__ = ConfigDict(extra="allow")` to `ModelProfile` so unknown profile keys pass Pydantic validation even on models with `extra="forbid"` — forward-compatibility for when the CLI schema evolves ahead of core - Declare the 7 missing fields on `ModelProfile`: `name`, `status`, `release_date`, `last_updated`, `open_weights` (metadata) and `attachment`, `temperature` (capabilities) - Add `_warn_unknown_profile_keys()` in `model_profile.py` — emits a `UserWarning` when a profile dict contains keys not in `ModelProfile`, suggesting a core upgrade. Wrapped in a bare `except` so introspection failures never crash model construction - Add `BaseChatModel._resolve_model_profile()` hook that returns `None` by default. Partners can override this single method instead of redefining the full `_set_model_profile` validator — the base validator calls it automatically - Add `BaseChatModel._check_profile_keys` as a separate `model_validator` that calls `_warn_unknown_profile_keys`. 
Uses a distinct method name so partner overrides of `_set_model_profile` don't inadvertently suppress the check ### `langchain-profiles` CLI - Add `_warn_undeclared_profile_keys()` to the CLI (`cli.py`), called after merging augmentations in `refresh()` — warns at profile-generation time (not just runtime) when emitted keys aren't declared in `ModelProfile`. Gracefully skips if `langchain-core` isn't installed - Add guard test `test_model_data_to_profile_keys_subset_of_model_profile` in model-profiles — feeds a fully-populated model dict to `_model_data_to_profile()` and asserts every emitted key exists in `ModelProfile.__annotations__`. CI fails before any release if someone adds a CLI field without updating the TypedDict ### Partner packages - Migrate all 10 in-repo partners to the `_resolve_model_profile()` hook, replacing duplicated `@model_validator` / `_set_model_profile` overrides: anthropic, deepseek, fireworks, groq, huggingface, mistralai, openai (base + azure), openrouter, perplexity, xai - Anthropic retains custom logic (context-1m beta → `max_input_tokens` override); all others reduce to a one-liner - Add `pr_lint.yml` scope for the new `model-profiles` package	2026-03-23 00:44:27 -04:00
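The schema-drift warning described in the commit above can be sketched with the standard library alone. This is a minimal illustration, not the code in `langchain-core`: the trimmed-down `ModelProfile` and the helper name `warn_unknown_profile_keys` are hypothetical stand-ins, and the sketch catches `Exception` rather than using a bare `except`.

```python
import warnings
from typing import TypedDict


class ModelProfile(TypedDict, total=False):
    """Hypothetical, trimmed-down stand-in for the real ModelProfile."""

    name: str
    status: str
    temperature: bool
    max_input_tokens: int


def warn_unknown_profile_keys(profile: dict) -> set:
    """Warn about keys not declared on ModelProfile and return them."""
    try:
        # Declared keys come from the TypedDict's annotations.
        declared = set(ModelProfile.__annotations__)
        unknown = set(profile) - declared
        if unknown:
            warnings.warn(
                f"Unrecognized profile keys: {sorted(unknown)}. "
                "Consider upgrading the package that defines ModelProfile.",
                UserWarning,
                stacklevel=2,
            )
        return unknown
    except Exception:
        # Introspection problems should never break model construction.
        return set()
```

Calling `warn_unknown_profile_keys({"name": "x", "brand_new_field": 1})` emits one `UserWarning` and returns `{"brand_new_field"}`; a profile with only declared keys stays silent.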
Mason Daugherty	5d9568b5f5	feat(model-profiles): new fields + `Makefile` target (#35788 ) Extract additional fields from models.dev into `_model_data_to_profile`: `name`, `status`, `release_date`, `last_updated`, `open_weights`, `attachment`, `temperature` Move the model profile refresh logic from an inline bash script in the GitHub Actions workflow into a `make refresh-profiles` target in `libs/model-profiles/Makefile`. This makes it runnable locally with a single command and keeps the provider map in one place instead of duplicated between CI and developer docs.	2026-03-12 13:56:25 +00:00
Mason Daugherty	70192690b1	fix(model-profiles): sort generated profiles by model ID for stable diffs (#35344 ) - Sort model profiles alphabetically by model ID (the top-level `_PROFILES` dictionary keys, e.g. `claude-3-5-haiku-20241022`, `gpt-4o-mini`) before writing `_profiles.py`, so that regenerating profiles only shows actual data changes in diffs — not random reordering from the models.dev API response order - Regenerate all 10 partner profile files with the new sorted ordering	2026-02-19 23:11:22 -05:00
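The stable-diff idea in the commit above amounts to sorting the top-level dictionary by model ID before writing it out. A minimal sketch, assuming the profiles are held in a plain dict keyed by model ID; the function name `sort_profiles` and the sample values are hypothetical:

```python
def sort_profiles(profiles: dict) -> dict:
    """Return a new dict whose keys (model IDs) are in alphabetical order."""
    # Sort on the key only, so the profile values never need to be comparable.
    return dict(sorted(profiles.items(), key=lambda item: item[0]))


# Illustrative data in arbitrary (API response) order.
unsorted_profiles = {
    "gpt-4o-mini": {"max_input_tokens": 1},
    "claude-3-5-haiku-20241022": {"max_input_tokens": 2},
}
sorted_profiles = sort_profiles(unsorted_profiles)
```

Because Python dicts preserve insertion order, serializing `sorted_profiles` yields the same key order on every regeneration, so a diff shows only real data changes.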
Mason Daugherty	82ae4fb6fa	chore: bump model profiles (#35294 )	2026-02-17 20:22:07 -05:00
Mason Daugherty	4ca586b322	feat(model-profiles): add `text_inputs` and `text_outputs` (#35084 ) - Add `text_inputs` and `text_outputs` fields to `ModelProfile` - Regenerate `_profiles.py` for all providers ## Why models.dev data includes `'text'` as both an input and output modality, but we didn't capture it. models.dev broadly contains models without text input (Whisper/ASR) and without text output (image generators, TTS). Without this, downstream consumers can't filter on model text support (e.g. preventing users from passing text input to an audio-only model). --- We'd need to also run for Google, AWS and cut releases for all to propagate	2026-02-09 14:50:09 -05:00
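The filtering use case mentioned in the commit above could look roughly like this. It is a sketch under stated assumptions: the model IDs and profile values are invented for illustration, and only the `text_inputs` and `text_outputs` field names come from the commit message.

```python
# Hypothetical profile data; model names and values are illustrative only.
PROFILES = {
    "hypothetical-chat-model": {"text_inputs": True, "text_outputs": True},
    "hypothetical-asr-model": {"text_inputs": False, "text_outputs": True},
    "hypothetical-tts-model": {"text_inputs": True, "text_outputs": False},
}


def models_accepting_text(profiles: dict) -> list:
    """Return the IDs of models whose profile declares text input support."""
    return sorted(
        model_id
        for model_id, profile in profiles.items()
        # Treat a missing field as "not supported" to stay conservative.
        if profile.get("text_inputs", False)
    )
```

A caller could use the resulting list to reject text prompts aimed at an audio-only model before any request is sent.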
XXt	689ce96016	docs: add missing module-level docstrings to partner integrations (#34838 ) Added module-level docstrings to 6 partner integration `__init__.py` files that were missing documentation:	2026-01-22 12:05:59 -05:00
Georgey	16c984ef0a	fix(langchain-classic): fix init_chat_model for HuggingFace models (#33943 )	2025-12-12 11:05:48 -05:00
Paul	bf6a5eb122	fix(huggingface): Helper logic for init_chat_model with HuggingFace backend (#34259 )	2025-12-12 10:05:16 -05:00
Mason Daugherty	ff6e3558d7	docs(fireworks,groq,huggingface,mistralai,ollama,openai): x-ref `convert_to_openai_tool` (#34276 )	2025-12-09 19:51:04 -05:00
ccurme	33e5d01f7c	feat(model-profiles): distribute data across packages (#34024 )	2025-11-21 15:47:05 -05:00
Azibek	d8b94007c1	fix(huggingface): pass llm params to `ChatHuggingFace` (#32368 ) This PR fixes #32234 and improves HuggingFace chat model integration by: Ensuring ChatHuggingFace inherits key parameters (temperature, max_tokens, top_p, streaming, etc.) from the underlying LLM when not explicitly set. Adding and updating unit tests to verify property inheritance. No breaking changes; these updates enhance reliability and maintainability. --------- Co-authored-by: Mason Daugherty <mason@langchain.dev> Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> Co-authored-by: Mason Daugherty <github@mdrxy.com>	2025-11-07 14:29:15 -05:00
Mason Daugherty	e023201d42	style: some cleanup (#33857 )	2025-11-06 23:50:46 -05:00
Hyejeong Jo	0e36185933	fix(huggingface): add `stream_usage` support for `ChatHuggingFace` invoke/stream (#32708 )	2025-11-03 14:44:32 -05:00
Mason Daugherty	123e29dc26	style: more refs fixes (#33730 )	2025-10-29 16:34:46 -04:00
Mason Daugherty	1d2273597a	docs: more fixes for refs (#33554 )	2025-10-16 22:54:16 -04:00
Mason Daugherty	15db024811	chore: more sweeping (#33533 ) more fixes for refs	2025-10-16 15:44:56 -04:00
Mason Daugherty	291a9fcea1	style: `llm` -> `model` (#33423 )	2025-10-10 13:19:13 -04:00
Mason Daugherty	6fc21afbc9	style: `.. code-block::` admonition translations (#33400 ) biiiiiiiiiiiiiiiigggggggg pass	2025-10-09 16:52:58 -04:00
Mason Daugherty	d8a680ee57	style: address Sphinx double-backtick snippet syntax (#33389 )	2025-10-09 13:35:51 -04:00
Mason Daugherty	b6132fc23e	style: remove more `Optional` syntax (#33371 )	2025-10-08 23:28:43 -04:00
Mason Daugherty	31eeb50ce0	chore: drop UP045 (#33362 ) Python 3.9 EOL	2025-10-08 21:17:53 -04:00
Mason Daugherty	d13823043d	style: monorepo pass for refs (#33359 ) * Delete some double backticks previously used by Sphinx (not done everywhere yet) * Fix some code blocks / dropdowns Ignoring CLI CI for now	2025-10-08 18:41:39 -04:00
Mason Daugherty	ae5b105d11	docs: v1 docs updates (#33173 ) Co-authored-by: Mohammad Mohtashim <45242107+keenborder786@users.noreply.github.com> Co-authored-by: Caspar Broekhuizen <caspar@langchain.dev> Co-authored-by: ccurme <chester.curme@gmail.com> Co-authored-by: Christophe Bornet <cbornet@hotmail.com> Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com> Co-authored-by: Sadra Barikbin <sadraqazvin1@yahoo.com> Co-authored-by: Vadym Barda <vadim.barda@gmail.com>	2025-10-02 18:46:26 -04:00
Mason Daugherty	eaa6dcce9e	release: v1.0.0 (#32567 ) Co-authored-by: Mohammad Mohtashim <45242107+keenborder786@users.noreply.github.com> Co-authored-by: Caspar Broekhuizen <caspar@langchain.dev> Co-authored-by: ccurme <chester.curme@gmail.com> Co-authored-by: Christophe Bornet <cbornet@hotmail.com> Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com> Co-authored-by: Sadra Barikbin <sadraqazvin1@yahoo.com> Co-authored-by: Vadym Barda <vadim.barda@gmail.com>	2025-10-02 10:49:42 -04:00
Mason Daugherty	986302322f	docs: more standardization (#33124 )	2025-09-25 20:46:20 -04:00
Mason Daugherty	b92b394804	style: repo linting pass (#33089 ) enable docstring-code-format	2025-09-24 15:25:55 -04:00
Mason Daugherty	96cbd90cba	fix: formatting issues in docstrings (#32265 ) Ensures proper reStructuredText formatting by adding the required blank line before closing docstring quotes, which resolves the "Block quote ends without a blank line; unexpected unindent" warning.	2025-07-27 23:37:47 -04:00
niceg	0d6f915442	fix: LLM mimicking Unicode responses due to forced Unicode conversion of non-ASCII characters. (#32222 ) - Description: This PR fixes an issue where the LLM would mimic Unicode escape sequences because non-ASCII characters in tool calls were forcibly converted to escaped form. The fix disables the `ensure_ascii` flag in `json.dumps()` when converting tool calls to OpenAI format. - Issue: Fixes the behavior shown below. Input: ```json {'role': 'assistant', 'tool_calls': [{'type': 'function', 'id': 'call_nv9trcehdpihr21zj9po19vq', 'function': {'name': 'create_customer', 'arguments': '{"customer_name": "你好啊集团"}'}}]} ``` Output: ```json {'role': 'assistant', 'tool_calls': [{'type': 'function', 'id': 'call_nv9trcehdpihr21zj9po19vq', 'function': {'name': 'create_customer', 'arguments': '{"customer_name": "\\u4f60\\u597d\\u554a\\u96c6\\u56e2"}'}}]} ``` The LLM then mimics the escaped output. The escape sequences take more symbols than the characters they encode, which can lengthen LLM responses and slow performance. --------- Co-authored-by: Mason Daugherty <github@mdrxy.com> Co-authored-by: Mason Daugherty <mason@langchain.dev>	2025-07-24 17:01:31 -04:00
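The effect of the `ensure_ascii` flag in the commit above can be reproduced with the standard library. This is a small demonstration of `json.dumps` behavior, not the conversion code from the PR; the variable names are illustrative.

```python
import json

arguments = {"customer_name": "你好啊集团"}

# Default behavior: non-ASCII characters become \uXXXX escape sequences.
escaped = json.dumps(arguments)

# With ensure_ascii=False the original characters are kept as-is.
readable = json.dumps(arguments, ensure_ascii=False)

print(escaped)
print(readable)
```

Both strings decode to the same object with `json.loads`, so the change affects only what the model sees in the serialized tool call, not the data itself.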
Mason Daugherty	d53ebf367e	fix(docs): capitalization, codeblock formatting, and hyperlinks, note blocks (#32235 ) widespread cleanup attempt	2025-07-24 16:55:04 -04:00
Mason Daugherty	4d9eefecab	fix: bump lockfiles (#31923 ) * bump lockfiles after upgrading ruff * resolve resulting linting fixes	2025-07-08 13:27:55 -04:00
Mason Daugherty	ae210c1590	ruff: add bugbear across packages (#31917 ) WIP, other packages will get in next PRs	2025-07-08 12:22:55 -04:00
Mason Daugherty	750721b4c3	huggingface[patch]: ruff fixes and rules (#31912 ) * bump ruff deps * add more thorough ruff rules * fix said rules	2025-07-08 10:07:57 -04:00
m27315	013ce2c47f	huggingface: fix HuggingFaceEndpoint._astream() got multiple values for argument 'stop' (#31385 )	2025-07-06 15:18:53 +00:00
Peter Schneider	cecfec5efa	huggingface: handle image-text-to-text pipeline task (#31611 ) Description: Allows for HuggingFacePipeline to handle image-text-to-text pipeline	2025-06-14 16:41:11 -04:00
अंkur गोswami	729526ff7c	huggingface: Undefined model_id fix (#31358 ) Description: This change fixes the undefined model_id issue when instantiating [ChatHuggingFace](https://github.com/langchain-ai/langchain/blob/master/libs/partners/huggingface/langchain_huggingface/chat_models/huggingface.py#L306) Issue: Fixes https://github.com/langchain-ai/langchain/issues/31357 @baskaryan @hwchase17	2025-05-29 15:59:35 -04:00
ccurme	bdb7c4a8b3	huggingface: fix embeddings return type (#31072 ) Integration tests failing cc @hanouticelina	2025-04-29 18:45:04 +00:00
célina	868f07f8f4	partners: (langchain-huggingface) Chat Models - Integrate Hugging Face Inference Providers and remove deprecated code (#30733 ) Hi there, I'm Célina from 🤗, This PR introduces support for Hugging Face's serverless Inference Providers (documentation [here](https://huggingface.co/docs/inference-providers/index)), allowing users to specify different providers for chat completion and text generation tasks. This PR also removes the usage of `InferenceClient.post()` method in `HuggingFaceEndpoint`, in favor of the task-specific `text_generation` method. `InferenceClient.post()` is deprecated and will be removed in `huggingface_hub v0.31.0`. --- ## Changes made - bumped the minimum required version of the `huggingface-hub` package to ensure compatibility with the latest API usage. - added a `provider` field to `HuggingFaceEndpoint`, enabling users to select the inference provider (e.g., 'cerebras', 'together', 'fireworks-ai'). Defaults to `hf-inference` (HF Inference API). - replaced the deprecated `InferenceClient.post()` call in `HuggingFaceEndpoint` with the task-specific `text_generation` method for future-proofing, `post()` will be removed in huggingface-hub v0.31.0. - updated the `ChatHuggingFace` component: - added async and streaming support. - added support for tool calling. - exposed underlying chat completion parameters for more granular control. - Added integration tests for `ChatHuggingFace` and updated the corresponding unit tests. ✅ All changes are backward compatible. --------- Co-authored-by: ccurme <chester.curme@gmail.com>	2025-04-29 09:53:14 -04:00
Sydney Runkle	7e926520d5	packaging: remove Python upper bound for langchain and co libs (#31025 ) Follow up to https://github.com/langchain-ai/langsmith-sdk/pull/1696, I've bumped the `langsmith` version where applicable in `uv.lock`. Type checking problems here because deps have been updated in `pyproject.toml` and `uv lock` hasn't been run - we should enforce that in the future - goes with the other dependabot todos :).	2025-04-28 14:44:28 -04:00
Sydney Runkle	8c6734325b	partners[lint]: run `pyupgrade` to get code in line with 3.9 standards (#30781 ) Using `pyupgrade` to get all `partners` code up to 3.9 standards (mostly, fixing old `typing` imports).	2025-04-11 07:18:44 -04:00
célina	68361f9c2d	partners: (langchain-huggingface) Embeddings - Integrate Inference Providers and remove deprecated code (#30735 ) Hi there, This is a complementary PR to #30733. This PR introduces support for Hugging Face's serverless Inference Providers (documentation [here](https://huggingface.co/docs/inference-providers/index)), allowing users to specify different providers This PR also removes the usage of `InferenceClient.post()` method in `HuggingFaceEndpointEmbeddings`, in favor of the task-specific `feature_extraction` method. `InferenceClient.post()` is deprecated and will be removed in `huggingface_hub` v0.31.0. ## Changes made - bumped the minimum required version of the `huggingface_hub` package to ensure compatibility with the latest API usage. - added a provider field to `HuggingFaceEndpointEmbeddings`, enabling users to select the inference provider. - replaced the deprecated `InferenceClient.post()` call in `HuggingFaceEndpointEmbeddings` with the task-specific `feature_extraction` method for future-proofing, `post()` will be removed in `huggingface-hub` v0.31.0. ✅ All changes are backward compatible. --------- Co-authored-by: Lucain <lucainp@gmail.com> Co-authored-by: ccurme <chester.curme@gmail.com>	2025-04-09 19:05:43 +00:00
Ella Charlaix	c401254770	huggingface: Add ipex support to HuggingFaceEmbeddings (#29386 ) ONNX and OpenVINO models are available by specifying the `backend` argument (the model is loaded using `optimum` https://github.com/huggingface/optimum) ```python from langchain_huggingface import HuggingFaceEmbeddings embedding = HuggingFaceEmbeddings( model_name=model_id, model_kwargs={"backend": "onnx"}, ) ``` With this PR we also enable the IPEX backend ```python from langchain_huggingface import HuggingFaceEmbeddings embedding = HuggingFaceEmbeddings( model_name=model_id, model_kwargs={"backend": "ipex"}, ) ```	2025-02-07 15:21:09 -08:00
Teruaki Ishizaki	aeb42dc900	partners: Fixed the procedure of initializing pad_token_id (#29500 ) - Description: Adds a check of the model config's `pad_token_id` and `eos_token_id` when initializing `pad_token_id`. This appears to be the same bug as the HuggingFace TGI bug, and the same bug as #29434 - Issue: #29431 - Dependencies: none - Twitter handle: tell14 Example code follows: ```python from langchain_huggingface.llms import HuggingFacePipeline hf = HuggingFacePipeline.from_model_id( model_id="meta-llama/Llama-3.2-3B-Instruct", task="text-generation", pipeline_kwargs={"max_new_tokens": 10}, ) from langchain_core.prompts import PromptTemplate template = """Question: {question} Answer: Let's think step by step.""" prompt = PromptTemplate.from_template(template) chain = prompt | hf question = "What is electroencephalography?" print(chain.invoke({"question": question})) ```	2025-02-03 21:40:33 -05:00
Ella Charlaix	6f95db81b7	huggingface: Add IPEX models support (#29179 ) Co-authored-by: Erick Friis <erick@langchain.dev>	2025-01-22 00:16:44 +00:00
Mohammad Mohtashim	8cf5f20bb5	`required` tool_choice added for ChatHuggingFace (#28851 ) - Description: HuggingFace Inference Client V3 now supports `required` as tool_choice which has been added. - Issue: #28842	2024-12-20 12:06:04 -05:00
Manuel	af2e0a7ede	partners: add 'model' alias for consistency in embedding classes (#28374 ) Description: This PR introduces a `model` alias for the embedding classes that contain the attribute `model_name`, to ensure consistency across the codebase, as suggested by a moderator in a previous PR. The change aligns the usage of attribute names across the project (see for example [here](`65deeddd5d/libs/partners/groq/langchain_groq/chat_models.py (L304)`)). Issue: This PR addresses the suggestion from the review of issue #28269. Dependencies: None --------- Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com> Co-authored-by: Erick Friis <erick@langchain.dev>	2024-12-13 22:30:00 +00:00
Wang, Yi	d834c6b618	huggingface: fix tool argument serialization in _convert_TGI_message_to_LC_message (#26075 ) Currently `_convert_TGI_message_to_LC_message` replaces `'` in the tool arguments, so an argument like "It's" will be converted to `It"s` and could cause a json parser to fail. --------- Signed-off-by: Wang, Yi A <yi.a.wang@intel.com> Co-authored-by: Erick Friis <erick@langchain.dev> Co-authored-by: Vadym Barda <vadym@langchain.dev>	2024-12-11 18:34:32 -08:00
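The failure mode described in the commit above is easy to reproduce in isolation. The sketch below is an assumption about the general pattern (stringifying a dict and swapping quote characters), not a copy of `_convert_TGI_message_to_LC_message`; the helper `is_valid_json` is hypothetical.

```python
import json

arguments = {"text": "It's late"}

# Naive approach: stringify the dict, then swap single quotes for double quotes.
# The apostrophe inside the value is swapped too, which corrupts the string.
naive = str(arguments).replace("'", '"')

# Safer approach: let the JSON encoder handle quoting and escaping.
safe = json.dumps(arguments)


def is_valid_json(text: str) -> bool:
    """Return True if `text` parses as JSON."""
    try:
        json.loads(text)
    except json.JSONDecodeError:
        return False
    return True
```

Here `naive` is `{"text": "It"s late"}`, which no JSON parser accepts, while `safe` round-trips to the original arguments.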
af su	7c7ee07d30	huggingface[fix]: HuggingFaceEndpointEmbeddings model parameter passing error when async embed (#27953 ) This change refines the handling of `_model_kwargs` in POST requests. Instead of nesting `_model_kwargs` as a dictionary under the `parameters` key, it is now directly unpacked and merged into the request's JSON payload. This ensures that the model parameters are passed correctly and avoids unnecessary nesting. E.g.: ```python import asyncio from langchain_huggingface.embeddings import HuggingFaceEndpointEmbeddings embedding_input = ["This input will get multiplied" * 10000] embeddings = HuggingFaceEndpointEmbeddings( model="http://127.0.0.1:8081/embed", model_kwargs={"truncate": True}, ) # The truncate parameter is handled correctly in the synchronous method embeddings.embed_documents(texts=embedding_input) # The truncate parameter is not handled correctly in the asynchronous method, # and 413 Request Entity Too Large is returned. asyncio.run(embeddings.aembed_documents(texts=embedding_input)) ``` Co-authored-by: af su <saf@zjuici.com> Co-authored-by: Erick Friis <erick@langchain.dev>	2024-11-20 19:08:56 +00:00
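The difference between nesting and unpacking the model kwargs, as described in the commit above, can be shown with plain dictionaries. This is a sketch only: the `inputs` key and both function names are assumptions made for illustration, and no request is actually sent.

```python
def build_payload_nested(texts: list, model_kwargs: dict) -> dict:
    """Old shape: model kwargs nested under a 'parameters' key."""
    return {"inputs": texts, "parameters": model_kwargs}


def build_payload_flat(texts: list, model_kwargs: dict) -> dict:
    """New shape: model kwargs unpacked into the top level of the payload."""
    return {"inputs": texts, **model_kwargs}


nested = build_payload_nested(["some text"], {"truncate": True})
flat = build_payload_flat(["some text"], {"truncate": True})
```

With the flat shape, an option such as `truncate` sits at the top level of the JSON body, where the commit message says the endpoint picks it up.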
Roman Solomatin	0f85dea8c8	langchain-huggingface: use separate kwargs for queries and docs (#27857 ) Currently `encode_kwargs` is used for both documents and queries, which leads to wrong embeddings. E.g.: ```python model_kwargs = {"device": "cuda", "trust_remote_code": True} encode_kwargs = {"normalize_embeddings": False, "prompt_name": "s2p_query"} model = HuggingFaceEmbeddings( model_name="dunzhang/stella_en_400M_v5", model_kwargs=model_kwargs, encode_kwargs=encode_kwargs, ) query_embedding = np.array( model.embed_query("What are some ways to reduce stress?",) ) document_embedding = np.array( model.embed_documents( [ "There are many effective ways to reduce stress. Some common techniques include deep breathing, meditation, and physical activity. Engaging in hobbies, spending time in nature, and connecting with loved ones can also help alleviate stress. Additionally, setting boundaries, practicing self-care, and learning to say no can prevent stress from building up.", "Green tea has been consumed for centuries and is known for its potential health benefits. It contains antioxidants that may help protect the body against damage caused by free radicals. Regular consumption of green tea has been associated with improved heart health, enhanced cognitive function, and a reduced risk of certain types of cancer. The polyphenols in green tea may also have anti-inflammatory and weight loss properties.", ] ) ) print(model._client.similarity(query_embedding, document_embedding)) # output: tensor([[0.8421, 0.3317]], dtype=torch.float64) ``` But the [model card](https://huggingface.co/dunzhang/stella_en_400M_v5#sentence-transformers) expects usage like this: ```python model_kwargs = {"device": "cuda", "trust_remote_code": True} encode_kwargs = {"normalize_embeddings": False} query_encode_kwargs = {"normalize_embeddings": False, "prompt_name": "s2p_query"} model = HuggingFaceEmbeddings( model_name="dunzhang/stella_en_400M_v5", model_kwargs=model_kwargs, encode_kwargs=encode_kwargs, query_encode_kwargs=query_encode_kwargs, ) query_embedding = np.array( model.embed_query("What are some ways to reduce stress?", ) ) document_embedding = np.array( model.embed_documents( [ "There are many effective ways to reduce stress. Some common techniques include deep breathing, meditation, and physical activity. Engaging in hobbies, spending time in nature, and connecting with loved ones can also help alleviate stress. Additionally, setting boundaries, practicing self-care, and learning to say no can prevent stress from building up.", "Green tea has been consumed for centuries and is known for its potential health benefits. It contains antioxidants that may help protect the body against damage caused by free radicals. Regular consumption of green tea has been associated with improved heart health, enhanced cognitive function, and a reduced risk of certain types of cancer. The polyphenols in green tea may also have anti-inflammatory and weight loss properties.", ] ) ) print(model._client.similarity(query_embedding, document_embedding)) # tensor([[0.8398, 0.2990]], dtype=torch.float64) ```	2024-11-06 17:35:39 -05:00
Andrew Effendi	49517cc1e7	partners/huggingface[patch]: fix HuggingFacePipeline model_id parameter (#27514 ) Description: Fixes issue with model parameter not getting initialized correctly when passing transformers pipeline Issue: https://github.com/langchain-ai/langchain/issues/25915	2024-10-29 14:34:46 +00:00
Hyejun An	6227396e20	partners/HuggingFacePipeline[stream]: Change to use `pipeline` instead of `pipeline.model.generate` in stream() (#26531 ) ## Description I encountered an error while using the `gemma-2-2b-it` model with the `HuggingFacePipeline` class and have implemented a fix to resolve this issue. ### What is the problem ```python model_id="google/gemma-2-2b-it" gemma_2_model = AutoModelForCausalLM.from_pretrained(model_id) gemma_2_tokenizer = AutoTokenizer.from_pretrained(model_id) gen = pipeline( task='text-generation', model=gemma_2_model, tokenizer=gemma_2_tokenizer, max_new_tokens=1024, device=0 if torch.cuda.is_available() else -1, temperature=.5, top_p=0.7, repetition_penalty=1.1, do_sample=True, ) llm = HuggingFacePipeline(pipeline=gen) for chunk in llm.stream("Hello World. Hello World. Hello World. Hello World. Hello World. Hello World. Hello World. Hello World. Hello World. Hello World."): print(chunk, end="", flush=True) ``` This code outputs the following error message: ``` /usr/local/lib/python3.10/dist-packages/transformers/generation/utils.py:1258: UserWarning: Using the model-agnostic default `max_length` (=20) to control the generation length. We recommend setting `max_new_tokens` to control the maximum length of the generation. 
warnings.warn( Exception in thread Thread-19 (generate): Traceback (most recent call last): File "/usr/lib/python3.10/threading.py", line 1016, in _bootstrap_inner self.run() File "/usr/lib/python3.10/threading.py", line 953, in run self._target(*self._args, **self._kwargs) File "/usr/local/lib/python3.10/dist-packages/torch/utils/_contextlib.py", line 116, in decorate_context return func(*args, **kwargs) File "/usr/local/lib/python3.10/dist-packages/transformers/generation/utils.py", line 1874, in generate self._validate_generated_length(generation_config, input_ids_length, has_default_max_length) File "/usr/local/lib/python3.10/dist-packages/transformers/generation/utils.py", line 1266, in _validate_generated_length raise ValueError( ValueError: Input length of input_ids is 31, but `max_length` is set to 20. This can lead to unexpected behavior. You should consider increasing `max_length` or, better yet, setting `max_new_tokens`. ``` In addition, the following error occurs when the number of tokens is reduced. ```python for chunk in llm.stream("Hello World"): print(chunk, end="", flush=True) ``` ``` /usr/local/lib/python3.10/dist-packages/transformers/generation/utils.py:1258: UserWarning: Using the model-agnostic default `max_length` (=20) to control the generation length. We recommend setting `max_new_tokens` to control the maximum length of the generation. warnings.warn( /usr/local/lib/python3.10/dist-packages/transformers/generation/utils.py:1885: UserWarning: You are calling .generate() with the `input_ids` being on a device type different than your model's device. `input_ids` is on cpu, whereas the model is on cuda. You may experience unexpected behaviors or slower generation. Please make sure that you have put `input_ids` to the correct device by calling for example input_ids = input_ids.to('cuda') before running `.generate()`. 
warnings.warn( Exception in thread Thread-20 (generate): Traceback (most recent call last): File "/usr/lib/python3.10/threading.py", line 1016, in _bootstrap_inner self.run() File "/usr/lib/python3.10/threading.py", line 953, in run self._target(*self._args, **self._kwargs) File "/usr/local/lib/python3.10/dist-packages/torch/utils/_contextlib.py", line 116, in decorate_context return func(*args, **kwargs) File "/usr/local/lib/python3.10/dist-packages/transformers/generation/utils.py", line 2024, in generate result = self._sample( File "/usr/local/lib/python3.10/dist-packages/transformers/generation/utils.py", line 2982, in _sample outputs = self(**model_inputs, return_dict=True) File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl return self._call_impl(*args, **kwargs) File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1562, in _call_impl return forward_call(*args, **kwargs) File "/usr/local/lib/python3.10/dist-packages/transformers/models/gemma2/modeling_gemma2.py", line 994, in forward outputs = self.model( File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl return self._call_impl(*args, **kwargs) File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1562, in _call_impl return forward_call(*args, **kwargs) File "/usr/local/lib/python3.10/dist-packages/transformers/models/gemma2/modeling_gemma2.py", line 803, in forward inputs_embeds = self.embed_tokens(input_ids) File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl return self._call_impl(*args, **kwargs) File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1562, in _call_impl return forward_call(*args, **kwargs) File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/sparse.py", line 164, in forward return F.embedding( File "/usr/local/lib/python3.10/dist-packages/torch/nn/functional.py", line 2267, in embedding return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse) RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu! (when checking argument for argument index in method wrapper_CUDA__index_select) ``` On the other hand, in the case of invoke, the output is normal: ``` llm.invoke("Hello World. Hello World. Hello World. Hello World. Hello World. Hello World. Hello World. Hello World. Hello World. Hello World.") ``` ``` 'Hello World. Hello World. Hello World. Hello World. Hello World. Hello World. Hello World. Hello World. Hello World. Hello World.\n\nThis is a simple program that prints the phrase "Hello World" to the console. \n\nHere\'s how it works:*\n\n `print("Hello World")`: This line of code uses the `print()` function, which is a built-in function in most programming languages (like Python). The `print()` function takes whatever you put inside its parentheses and displays it on the screen.\n* `"Hello World"`: The text within the double quotes (`"`) is called a string. It represents the message we want to print.\n\n\nLet me know if you\'d like to explore other programming concepts or see more examples! \n' ``` ### Problem Analysis - The kwargs passed when creating the pipeline are applied in `invoke()` but not in `stream()`. - In `stream()`, `inputs = self.pipeline.tokenizer(prompt, return_tensors="pt")` places the input tensors on the CPU. - This can crash when the model is on the GPU. ### Solution Use `self.pipeline` instead of `self.pipeline.model.generate`. 
- Original Code ```python stopping_criteria = StoppingCriteriaList([StopOnTokens()]) inputs = self.pipeline.tokenizer(prompt, return_tensors="pt") streamer = TextIteratorStreamer( self.pipeline.tokenizer, timeout=60.0, skip_prompt=skip_prompt, skip_special_tokens=True, ) generation_kwargs = dict( inputs, streamer=streamer, stopping_criteria=stopping_criteria, **pipeline_kwargs, ) t1 = Thread(target=self.pipeline.model.generate, kwargs=generation_kwargs) t1.start() ``` - Updated Code ```python stopping_criteria = StoppingCriteriaList([StopOnTokens()]) streamer = TextIteratorStreamer( self.pipeline.tokenizer, timeout=60.0, skip_prompt=skip_prompt, skip_special_tokens=True, ) generation_kwargs = dict( text_inputs=prompt, streamer=streamer, stopping_criteria=stopping_criteria, **pipeline_kwargs, ) t1 = Thread(target=self.pipeline, kwargs=generation_kwargs) t1.start() ``` By using the `pipeline` directly, the `kwargs` of the pipeline are applied, and there is no need to consider the `device` of the tensor created by the tokenizer. > Because of the switch to `pipeline`, `text_inputs=prompt` is now passed directly in `generation_kwargs`. ## Issue None ## Dependencies None ## Twitter handle None --------- Co-authored-by: Vadym Barda <vadym@langchain.dev>	2024-10-24 16:49:43 -04:00


72 Commits