Commit Graph

26 Commits

Author SHA1 Message Date
af su
7c7ee07d30
huggingface[fix]: HuggingFaceEndpointEmbeddings model parameter passing error when async embed (#27953)
This change refines how `_model_kwargs` is handled in POST requests.
Instead of nesting `_model_kwargs` as a dictionary under the `parameters`
key, it is now unpacked and merged directly into the request's JSON
payload. This ensures that the model parameters are passed correctly and
avoids unnecessary nesting. For example:

```python
import asyncio

from langchain_huggingface.embeddings import HuggingFaceEndpointEmbeddings

embedding_input = ["This input will get multiplied" * 10000]

embeddings = HuggingFaceEndpointEmbeddings(
    model="http://127.0.0.1:8081/embed",
    model_kwargs={"truncate": True},
)

# The truncate parameter is handled correctly in the synchronous method
embeddings.embed_documents(texts=embedding_input)
# The truncate parameter is not handled correctly in the asynchronous method,
# and 413 Request Entity Too Large is returned.
asyncio.run(embeddings.aembed_documents(texts=embedding_input))
```
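
A minimal sketch of the payload change described above, with illustrative values (the variable names are assumptions, not the actual implementation):

```python
texts = ["some text to embed"]
_model_kwargs = {"truncate": True}

# Before: kwargs nested under a "parameters" key
old_payload = {"inputs": texts, "parameters": _model_kwargs}

# After: kwargs unpacked and merged into the top-level JSON body,
# matching what the synchronous path already sends
new_payload = {"inputs": texts, **_model_kwargs}
# -> {"inputs": ["some text to embed"], "truncate": True}
```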

Co-authored-by: af su <saf@zjuici.com>
Co-authored-by: Erick Friis <erick@langchain.dev>
2024-11-20 19:08:56 +00:00
Roman Solomatin
0f85dea8c8
langchain-huggingface: use separate kwargs for queries and docs (#27857)
Currently, `encode_kwargs` is used for both documents and queries, and this
leads to wrong embeddings. For example:
```python
    model_kwargs = {"device": "cuda", "trust_remote_code": True}
    encode_kwargs = {"normalize_embeddings": False, "prompt_name": "s2p_query"}

    model = HuggingFaceEmbeddings(
        model_name="dunzhang/stella_en_400M_v5",
        model_kwargs=model_kwargs,
        encode_kwargs=encode_kwargs,
    )

    query_embedding = np.array(
        model.embed_query("What are some ways to reduce stress?",)
    )
    document_embedding = np.array(
        model.embed_documents(
            [
                "There are many effective ways to reduce stress. Some common techniques include deep breathing, meditation, and physical activity. Engaging in hobbies, spending time in nature, and connecting with loved ones can also help alleviate stress. Additionally, setting boundaries, practicing self-care, and learning to say no can prevent stress from building up.",
                "Green tea has been consumed for centuries and is known for its potential health benefits. It contains antioxidants that may help protect the body against damage caused by free radicals. Regular consumption of green tea has been associated with improved heart health, enhanced cognitive function, and a reduced risk of certain types of cancer. The polyphenols in green tea may also have anti-inflammatory and weight loss properties.",
            ]
        )
    )
    print(model._client.similarity(query_embedding, document_embedding)) # output: tensor([[0.8421, 0.3317]], dtype=torch.float64)
```
But based on the [model
card](https://huggingface.co/dunzhang/stella_en_400M_v5#sentence-transformers),
the expected usage is:
```python
    model_kwargs = {"device": "cuda", "trust_remote_code": True}
    encode_kwargs = {"normalize_embeddings": False}
    query_encode_kwargs = {"normalize_embeddings": False, "prompt_name": "s2p_query"}

    model = HuggingFaceEmbeddings(
        model_name="dunzhang/stella_en_400M_v5",
        model_kwargs=model_kwargs,
        encode_kwargs=encode_kwargs,
        query_encode_kwargs=query_encode_kwargs,
    )

    query_embedding = np.array(
        model.embed_query("What are some ways to reduce stress?", )
    )
    document_embedding = np.array(
        model.embed_documents(
            [
                "There are many effective ways to reduce stress. Some common techniques include deep breathing, meditation, and physical activity. Engaging in hobbies, spending time in nature, and connecting with loved ones can also help alleviate stress. Additionally, setting boundaries, practicing self-care, and learning to say no can prevent stress from building up.",
                "Green tea has been consumed for centuries and is known for its potential health benefits. It contains antioxidants that may help protect the body against damage caused by free radicals. Regular consumption of green tea has been associated with improved heart health, enhanced cognitive function, and a reduced risk of certain types of cancer. The polyphenols in green tea may also have anti-inflammatory and weight loss properties.",
            ]
        )
    )
    print(model._client.similarity(query_embedding, document_embedding)) # tensor([[0.8398, 0.2990]], dtype=torch.float64)
```
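
A hedged sketch of the dispatch the new `query_encode_kwargs` parameter enables (simplified; not the actual implementation):

```python
# Simplified sketch: queries use their own encode kwargs when provided,
# otherwise they fall back to the document encode kwargs.
def _pick_encode_kwargs(encode_kwargs, query_encode_kwargs, is_query):
    if is_query and query_encode_kwargs:
        return query_encode_kwargs
    return encode_kwargs

doc_kwargs = {"normalize_embeddings": False}
query_kwargs = {"normalize_embeddings": False, "prompt_name": "s2p_query"}

print(_pick_encode_kwargs(doc_kwargs, query_kwargs, is_query=False))  # document kwargs
print(_pick_encode_kwargs(doc_kwargs, query_kwargs, is_query=True))   # query kwargs
```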
2024-11-06 17:35:39 -05:00
Andrew Effendi
49517cc1e7
partners/huggingface[patch]: fix HuggingFacePipeline model_id parameter (#27514)
**Description:** Fixes an issue where the model parameter was not
initialized correctly when passing a `transformers` pipeline.
**Issue:** https://github.com/langchain-ai/langchain/issues/25915
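
A hedged sketch of the expected behaviour (the attribute access is an assumption based on the `transformers` API, not the verbatim fix):

```python
from transformers import pipeline
from langchain_huggingface import HuggingFacePipeline

pipe = pipeline("text-generation", model="gpt2")
llm = HuggingFacePipeline(pipeline=pipe)

# With the fix, the wrapper's model identifier should reflect the wrapped
# pipeline (roughly pipe.model.name_or_path, i.e. "gpt2") rather than a stale default.
print(llm.model_id)
```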
2024-10-29 14:34:46 +00:00
Hyejun An
6227396e20
partners/HuggingFacePipeline[stream]: Change to use pipeline instead of pipeline.model.generate in stream() (#26531)
## Description

I encountered an error while using the `gemma-2-2b-it` model with the
`HuggingFacePipeline` class and have implemented a fix to resolve this
issue.

### The Problem

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline
from langchain_huggingface import HuggingFacePipeline

model_id = "google/gemma-2-2b-it"

gemma_2_model = AutoModelForCausalLM.from_pretrained(model_id)
gemma_2_tokenizer = AutoTokenizer.from_pretrained(model_id)

gen = pipeline(
    task="text-generation",
    model=gemma_2_model,
    tokenizer=gemma_2_tokenizer,
    max_new_tokens=1024,
    device=0 if torch.cuda.is_available() else -1,
    temperature=0.5,
    top_p=0.7,
    repetition_penalty=1.1,
    do_sample=True,
)

llm = HuggingFacePipeline(pipeline=gen)

for chunk in llm.stream("Hello World. Hello World. Hello World. Hello World. Hello World. Hello World. Hello World. Hello World. Hello World. Hello World."):
    print(chunk, end="", flush=True)
```

This code outputs the following error message:

```
/usr/local/lib/python3.10/dist-packages/transformers/generation/utils.py:1258: UserWarning: Using the model-agnostic default `max_length` (=20) to control the generation length. We recommend setting `max_new_tokens` to control the maximum length of the generation.
  warnings.warn(
Exception in thread Thread-19 (generate):
Traceback (most recent call last):
  File "/usr/lib/python3.10/threading.py", line 1016, in _bootstrap_inner
    self.run()
  File "/usr/lib/python3.10/threading.py", line 953, in run
    self._target(*self._args, **self._kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/utils/_contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/transformers/generation/utils.py", line 1874, in generate
    self._validate_generated_length(generation_config, input_ids_length, has_default_max_length)
  File "/usr/local/lib/python3.10/dist-packages/transformers/generation/utils.py", line 1266, in _validate_generated_length
    raise ValueError(
ValueError: Input length of input_ids is 31, but `max_length` is set to 20. This can lead to unexpected behavior. You should consider increasing `max_length` or, better yet, setting `max_new_tokens`.
```

In addition, the following error occurs when the number of tokens is
reduced.

```python
for chunk in llm.stream("Hello World"):
    print(chunk, end="", flush=True)
```

```
/usr/local/lib/python3.10/dist-packages/transformers/generation/utils.py:1258: UserWarning: Using the model-agnostic default `max_length` (=20) to control the generation length. We recommend setting `max_new_tokens` to control the maximum length of the generation.
  warnings.warn(
/usr/local/lib/python3.10/dist-packages/transformers/generation/utils.py:1885: UserWarning: You are calling .generate() with the `input_ids` being on a device type different than your model's device. `input_ids` is on cpu, whereas the model is on cuda. You may experience unexpected behaviors or slower generation. Please make sure that you have put `input_ids` to the correct device by calling for example input_ids = input_ids.to('cuda') before running `.generate()`.
  warnings.warn(
Exception in thread Thread-20 (generate):
Traceback (most recent call last):
  File "/usr/lib/python3.10/threading.py", line 1016, in _bootstrap_inner
    self.run()
  File "/usr/lib/python3.10/threading.py", line 953, in run
    self._target(*self._args, **self._kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/utils/_contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/transformers/generation/utils.py", line 2024, in generate
    result = self._sample(
  File "/usr/local/lib/python3.10/dist-packages/transformers/generation/utils.py", line 2982, in _sample
    outputs = self(**model_inputs, return_dict=True)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1562, in _call_impl
    return forward_call(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/transformers/models/gemma2/modeling_gemma2.py", line 994, in forward
    outputs = self.model(
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1562, in _call_impl
    return forward_call(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/transformers/models/gemma2/modeling_gemma2.py", line 803, in forward
    inputs_embeds = self.embed_tokens(input_ids)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1562, in _call_impl
    return forward_call(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/sparse.py", line 164, in forward
    return F.embedding(
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/functional.py", line 2267, in embedding
    return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse)
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu! (when checking argument for argument index in method wrapper_CUDA__index_select)
```

On the other hand, in the case of invoke, the output is normal:

```
llm.invoke("Hello World. Hello World. Hello World. Hello World. Hello World. Hello World. Hello World. Hello World. Hello World. Hello World.")
```
```
'Hello World. Hello World. Hello World. Hello World. Hello World. Hello World. Hello World. Hello World. Hello World. Hello World.\n\nThis is a simple program that prints the phrase "Hello World" to the console. \n\n**Here\'s how it works:**\n\n* **`print("Hello World")`**: This line of code uses the `print()` function, which is a built-in function in most programming languages (like Python). The `print()` function takes whatever you put inside its parentheses and displays it on the screen.\n* **`"Hello World"`**:  The text within the double quotes (`"`) is called a string. It represents the message we want to print.\n\n\nLet me know if you\'d like to explore other programming concepts or see more examples! \n'
```

### Problem Analysis

- The kwargs passed when building the pipeline are applied by `invoke()`, but
not by `stream()`.
- When streaming, `inputs = self.pipeline.tokenizer(prompt,
return_tensors="pt")` produces tensors on the CPU.
  - This can crash when the model is on the GPU.

### Solution

Just use `self.pipeline` instead of `self.pipeline.model.generate`.

- **Original Code**

```python
stopping_criteria = StoppingCriteriaList([StopOnTokens()])

inputs = self.pipeline.tokenizer(prompt, return_tensors="pt")
streamer = TextIteratorStreamer(
    self.pipeline.tokenizer,
    timeout=60.0,
    skip_prompt=skip_prompt,
    skip_special_tokens=True,
)
generation_kwargs = dict(
    inputs,
    streamer=streamer,
    stopping_criteria=stopping_criteria,
    **pipeline_kwargs,
)
t1 = Thread(target=self.pipeline.model.generate, kwargs=generation_kwargs)
t1.start()
```

- **Updated Code**

```python
stopping_criteria = StoppingCriteriaList([StopOnTokens()])

streamer = TextIteratorStreamer(
    self.pipeline.tokenizer,
    timeout=60.0,
    skip_prompt=skip_prompt,
    skip_special_tokens=True,
)
generation_kwargs = dict(
    text_inputs=prompt,
    streamer=streamer,
    stopping_criteria=stopping_criteria,
    **pipeline_kwargs,
)
t1 = Thread(target=self.pipeline, kwargs=generation_kwargs)
t1.start()
```

By calling the `pipeline` directly, its `kwargs` are applied, and there is no
need to worry about the `device` of the tensors produced by the `tokenizer`.

> Because the code now uses the `pipeline`, the prompt is passed directly as
`text_inputs=prompt` in `generation_kwargs`.

## Issue

None

## Dependencies

None

## Twitter handle

None

---------

Co-authored-by: Vadym Barda <vadym@langchain.dev>
2024-10-24 16:49:43 -04:00
Kwan Kin Chan
6d2a76ac05
langchain_huggingface: Fix multiple GPU usage bug in from_model_id function (#23628)
- [ ] **Description:**
  - Pass `device_map` into `model_kwargs`.
  - Remove the unused `device_map` variable in the `hf_pipeline` function call.
- [ ] **Issue:** #13128
When using the `from_model_id` function to load a Hugging Face model for
text generation across multiple GPUs, the model defaults to loading on
the CPU despite multiple GPUs being available, even when using the expected
format:
``` python
llm = HuggingFacePipeline.from_model_id(
    model_id="model-id",
    task="text-generation",
    device_map="auto",
)
```
Currently, to enable multiple GPUs, we have to pass the arguments in this
format instead:
``` python
llm = HuggingFacePipeline.from_model_id(
    model_id="model-id",
    task="text-generation",
    device=None,
    model_kwargs={
        "device_map": "auto",
    }
)
```
This issue arises due to improper handling of the `device` and `device_map`
parameters.

- [ ] **Explanation:**
1. In `from_model_id`, the model is created using `model_kwargs` and passed
as the `model` argument of the `pipeline` function. At that point, to
load the model on multiple GPUs, `device_map` needs to be set to
`"auto"` within `model_kwargs`; otherwise the model defaults to loading on
the CPU.
2. The `device_map` variable in `from_model_id` is not utilized correctly.
In the `pipeline` function's source code in `transformers`:
- The `device_map` value is stored in the `model_kwargs` dictionary
(lines 867-878 of transformers/src/transformers/pipelines/\__init__.py).
```python
    if device_map is not None:
        ......
        model_kwargs["device_map"] = device_map
```
- The model is constructed with `model_kwargs` (including the `device_map`
value) ONLY IF `model` is a string (lines 893-903 of
transformers/src/transformers/pipelines/\__init__.py).
```python
    if isinstance(model, str) or framework is None:
        model_classes = {"tf": targeted_task["tf"], "pt": targeted_task["pt"]}
        framework, model = infer_framework_load_model( ... , **model_kwargs, )
```
- Consequently, since a model object is already passed to the `pipeline`
function, the `device_map` variable from `from_model_id` is never used.

3. The `device_map` variable in `from_model_id` not only appears unused but
also causes errors. Without explicitly setting `device=None`, attempting
to load the model on multiple GPUs may result in the following error:
 ```
Device has 2 GPUs available. Provide device={deviceId} to `from_model_id` to use available GPUs for execution. deviceId is -1 (default) for CPU and can be a positive integer associated with CUDA device id.
Traceback (most recent call last):
  File "foo.py", line 15, in <module>
    llm = HuggingFacePipeline.from_model_id(
  File "foo\site-packages\langchain_huggingface\llms\huggingface_pipeline.py", line 217, in from_model_id
    pipeline = hf_pipeline(
  File "foo\lib\site-packages\transformers\pipelines\__init__.py", line 1108, in pipeline
    return pipeline_class(model=model, framework=framework, task=task, **kwargs)
  File "foo\lib\site-packages\transformers\pipelines\text_generation.py", line 96, in __init__
    super().__init__(*args, **kwargs)
  File "foo\lib\site-packages\transformers\pipelines\base.py", line 835, in __init__
    raise ValueError(
ValueError: The model has been loaded with `accelerate` and therefore
cannot be moved to a specific device. Please discard the `device`
argument when creating your pipeline object.
```
This error occurs because, in `from_model_id`, the default values for `device` and `device_map` are -1 and None, respectively. The check (`device_map is not None and device < 0`) passes and `device` stays at -1, so the `pipeline` function later raises an error when trying to move a GPU-loaded model back to the CPU.
19eb82e68b/libs/community/langchain_community/llms/huggingface_pipeline.py (L204-L213)
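
A hedged sketch of the adjustment (simplified, not the verbatim patch): move a user-supplied `device_map` into `model_kwargs` before the model is built, and drop the explicit `device` so an `accelerate`-dispatched model is never moved afterwards.

```python
# Simplified sketch of the intended device handling in from_model_id.
def _resolve_device_kwargs(device, device_map, model_kwargs):
    model_kwargs = dict(model_kwargs or {})
    if device_map is not None:
        if device is not None and device >= 0:
            raise ValueError("Please pass either `device` or `device_map`, not both.")
        # Let accelerate place the weights (e.g. across multiple GPUs).
        model_kwargs["device_map"] = device_map
        device = None  # don't try to move an accelerate-dispatched model later
    return device, model_kwargs

device, model_kwargs = _resolve_device_kwargs(device=None, device_map="auto", model_kwargs={})
# -> device=None, model_kwargs={"device_map": "auto"}
```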





---------

Co-authored-by: William FH <13333726+hinthornw@users.noreply.github.com>
Co-authored-by: Erick Friis <erick@langchain.dev>
Co-authored-by: vbarda <vadym@langchain.dev>
2024-10-22 21:41:47 -04:00
Vadym Barda
0640cbf2f1
huggingface[patch]: hide client field in HuggingFaceEmbeddings (#27522) 2024-10-21 17:37:07 -04:00
Bagatur
099235da01
Revert "huggingface[patch]: make HuggingFaceEndpoint serializable (#2… (#27032)
…7027)"

This reverts commit b5e28d3a6d.
2024-10-01 21:26:38 +00:00
Bagatur
b5e28d3a6d
huggingface[patch]: make HuggingFaceEndpoint serializable (#27027) 2024-10-01 13:16:10 -07:00
Lucain
a2023a1e96
huggingface: fix huggingface_endpoint.py (initialize clients only with supported kwargs) (#26378)
## Description

By default, `HuggingFaceEndpoint` instantiates both the
`InferenceClient` and the `AsyncInferenceClient` with the
`"server_kwargs"` passed as input. This is an issue as both clients
might not support exactly the same kwargs. This has been highlighted in
https://github.com/huggingface/huggingface_hub/issues/2522 by
@morgandiverrez with the `trust_env` parameter. In order to make
`langchain` integration future-proof, I do think it's wiser to forward
only the supported parameters to each client. Parameters that are not
supported are simply ignored with a warning to the user. From a
`huggingface_hub` maintenance perspective, this allows us much more
flexibility as we are not constrained to support the exact same kwargs
in both clients.
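
A hedged sketch of the filtering idea (not the exact implementation): inspect each client's signature and forward only the kwargs it accepts, warning about the rest.

```python
import inspect
import warnings

from huggingface_hub import AsyncInferenceClient, InferenceClient

def _filtered_kwargs(client_cls, kwargs):
    # Keep only kwargs that the client's __init__ actually accepts.
    accepted = set(inspect.signature(client_cls.__init__).parameters)
    ignored = set(kwargs) - accepted
    if ignored:
        warnings.warn(f"Ignoring unsupported kwargs for {client_cls.__name__}: {ignored}")
    return {k: v for k, v in kwargs.items() if k in accepted}

# Example server kwargs; trust_env was only supported by the async client at the time.
server_kwargs = {"timeout": 30, "trust_env": True}
client = InferenceClient(**_filtered_kwargs(InferenceClient, server_kwargs))
async_client = AsyncInferenceClient(**_filtered_kwargs(AsyncInferenceClient, server_kwargs))
```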

## Issue

https://github.com/huggingface/huggingface_hub/issues/2522

## Dependencies

None

## Twitter 

https://x.com/Wauplin

---------

Co-authored-by: Erick Friis <erick@langchain.dev>
2024-09-20 16:05:24 -07:00
Erick Friis
c2a3021bb0
multiple: pydantic 2 compatibility, v0.3 (#26443)
Signed-off-by: ChengZi <chen.zhang@zilliz.com>
Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>
Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
Co-authored-by: Dan O'Donovan <dan.odonovan@gmail.com>
Co-authored-by: Tom Daniel Grande <tomdgrande@gmail.com>
Co-authored-by: Grande <Tom.Daniel.Grande@statsbygg.no>
Co-authored-by: Bagatur <baskaryan@gmail.com>
Co-authored-by: ccurme <chester.curme@gmail.com>
Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
Co-authored-by: Tomaz Bratanic <bratanic.tomaz@gmail.com>
Co-authored-by: ZhangShenao <15201440436@163.com>
Co-authored-by: Friso H. Kingma <fhkingma@gmail.com>
Co-authored-by: ChengZi <chen.zhang@zilliz.com>
Co-authored-by: Nuno Campos <nuno@langchain.dev>
Co-authored-by: Morgante Pell <morgantep@google.com>
2024-09-13 14:38:45 -07:00
Mohammad Mohtashim
9a29398fe6
huggingface: fix model param population (#24743)
- **Description:** Fix the validation error for `endpoint_url` in
HuggingFaceEndpoint. A detailed description of the issue is given in the
linked issue (a usage sketch follows below).
- **Issue:** #24742
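
A hedged usage sketch of the call this fix is meant to support (the URL is a placeholder):

```python
from langchain_huggingface import HuggingFaceEndpoint

# Placeholder URL; with the fix, passing endpoint_url (rather than repo_id)
# should no longer trip the model-param validation.
llm = HuggingFaceEndpoint(
    endpoint_url="https://your-tgi-endpoint.example.com",
    max_new_tokens=128,
)
```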

---------

Co-authored-by: Erick Friis <erick@langchain.dev>
2024-08-24 00:45:28 +00:00
Eugene Yurtsev
5f5e8c9a60
huggingface[patch], pinecone[patch], fireworks[patch], mistralai[patch], voyageai[patch], togetherai[path]: convert Pydantic extras to literals (#25384)
Backwards-compatible change that converts pydantic extras to literals,
which is consistent with pydantic 2 usage (see the sketch below).

- fireworks
- voyage ai
- mistralai
- together ai
- hugging face
- pinecone
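
A hedged before/after sketch of what converting extras to literals looks like in pydantic 2 terms (the model is illustrative):

```python
from pydantic import BaseModel, ConfigDict

# pydantic 1 style (enum/namespace-based):
#     class Config:
#         extra = Extra.forbid
#
# pydantic 2 style, using the literal string:
class ExampleSettings(BaseModel):
    model_config = ConfigDict(extra="forbid")
```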
2024-08-14 09:55:30 -04:00
Bagatur
8461934c2b
core[patch], integrations[patch]: convert TypedDict to tool schema support (#24641)
Supports the following UX:

```python
    class SubTool(TypedDict):
        """Subtool docstring"""

        args: Annotated[Dict[str, Any], {}, "this does bar"]

    class Tool(TypedDict):
        """Docstring
        Args:
            arg1: foo
        """

        arg1: str
        arg2: Union[int, str]
        arg3: Optional[List[SubTool]]
        arg4: Annotated[Literal["bar", "baz"], ..., "this does foo"]
        arg5: Annotated[Optional[float], None]
```

- can parse Google-style docstrings
- can use Annotated to specify the default value (second arg)
- can use Annotated to specify the arg description (third arg)
- can have nested complex types
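
A hedged usage sketch, continuing the example above; it assumes a chat model that supports `bind_tools` (e.g. `langchain_openai.ChatOpenAI`):

```python
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(model="gpt-4o-mini")
# `Tool` is the TypedDict defined above; it is converted to a tool schema.
llm_with_tools = llm.bind_tools([Tool])
msg = llm_with_tools.invoke("Call the tool with arg1='foo' and arg2=1")
print(msg.tool_calls)
```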
2024-07-31 18:27:24 +00:00
Ethan Yang
13855ef0c3
[HuggingFace Pipeline] add streaming support (#23852) 2024-07-09 17:02:00 -04:00
Jiejun Tan
2be66a38d8
huggingface: Fix huggingface tei support (#22653)
Updates the former pull request:
https://github.com/langchain-ai/langchain/pull/22595.

Modified
`libs/partners/huggingface/langchain_huggingface/embeddings/huggingface_endpoint.py`,
where the API call did not match the current [Text Embeddings
Inference
API](https://huggingface.github.io/text-embeddings-inference/#/Text%20Embeddings%20Inference/embed).
One example is:
```json
{
  "inputs": "string",
  "normalize": true,
  "truncate": false
}
```
Parameters in `_model_kwargs` were not passed properly in the latest
version. As a side effect, the issue *[why cause 413?
#50](https://github.com/huggingface/text-embeddings-inference/issues/50)*
might be resolved as well.
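
A hedged sketch of a request body that matches the TEI `/embed` schema shown above, with `_model_kwargs` merged at the top level (the endpoint and values are illustrative):

```python
import requests  # illustrative; the integration uses its own HTTP session

_model_kwargs = {"normalize": True, "truncate": False}
payload = {"inputs": ["some text to embed"], **_model_kwargs}

# Placeholder local TEI server exposing /embed.
response = requests.post("http://127.0.0.1:8080/embed", json=payload)
print(response.json())
```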
2024-07-03 13:30:29 -07:00
Bagatur
a0c2281540
infra: update mypy 1.10, ruff 0.5 (#23721)
```python
"""python scripts/update_mypy_ruff.py"""
import glob
import tomllib
from pathlib import Path

import toml
import subprocess
import re

ROOT_DIR = Path(__file__).parents[1]


def main():
    for path in glob.glob(str(ROOT_DIR / "libs/**/pyproject.toml"), recursive=True):
        print(path)
        with open(path, "rb") as f:
            pyproject = tomllib.load(f)
        try:
            pyproject["tool"]["poetry"]["group"]["typing"]["dependencies"]["mypy"] = (
                "^1.10"
            )
            pyproject["tool"]["poetry"]["group"]["lint"]["dependencies"]["ruff"] = (
                "^0.5"
            )
        except KeyError:
            continue
        with open(path, "w") as f:
            toml.dump(pyproject, f)
        cwd = "/".join(path.split("/")[:-1])
        completed = subprocess.run(
            "poetry lock --no-update; poetry install --with typing; poetry run mypy . --no-color",
            cwd=cwd,
            shell=True,
            capture_output=True,
            text=True,
        )
        logs = completed.stdout.split("\n")

        to_ignore = {}
        for l in logs:
            match = re.match(r"^(.*):(\d+): error:.*\[(.*)\]", l)
            if match:
                path, line_no, error_type = match.groups()
                if (path, line_no) in to_ignore:
                    to_ignore[(path, line_no)].append(error_type)
                else:
                    to_ignore[(path, line_no)] = [error_type]
        print(len(to_ignore))
        for (error_path, line_no), error_types in to_ignore.items():
            all_errors = ", ".join(error_types)
            full_path = f"{cwd}/{error_path}"
            try:
                with open(full_path, "r") as f:
                    file_lines = f.readlines()
            except FileNotFoundError:
                continue
            file_lines[int(line_no) - 1] = (
                file_lines[int(line_no) - 1][:-1] + f"  # type: ignore[{all_errors}]\n"
            )
            with open(full_path, "w") as f:
                f.write("".join(file_lines))

        subprocess.run(
            "poetry run ruff format .; poetry run ruff --select I --fix .",
            cwd=cwd,
            shell=True,
            capture_output=True,
            text=True,
        )


if __name__ == "__main__":
    main()

```
2024-07-03 10:33:27 -07:00
wenngong
ee5eedfa04
partners: support reading HuggingFace params from env (#23309)
Description:
  1. The partners/HuggingFace module supports reading params from env (see
the sketch below). The langchain_community/.../huggingfaceXX modules are not
adjusted since they are deprecated.
  2. pydantic 2 `@root_validator` migration.
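
A hedged sketch of the env-fallback pattern (the exact field wiring in the package may differ):

```python
import os
from typing import Optional

from pydantic import BaseModel, Field

class ExampleEndpointConfig(BaseModel):
    # Fall back to the conventional env var when no explicit token is passed.
    huggingfacehub_api_token: Optional[str] = Field(
        default_factory=lambda: os.getenv("HUGGINGFACEHUB_API_TOKEN")
    )

cfg = ExampleEndpointConfig()
print(cfg.huggingfacehub_api_token)  # token from the environment, or None
```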

Issue: #22448 #22819

---------

Co-authored-by: gongwn1 <gongwn1@lenovo.com>
2024-07-02 10:12:45 -04:00
Mohammad Mohtashim
4796b7eb15
[Community [HuggingFace]]: Small Fix for ChatHuggingFace. (#22925)
- **Description:** A small fix that moves `available_endpoints` in order to
avoid the token error in the issue below. Also adds a conftest file and
updates the `scipy` and `numpy` versions in the poetry files to support
newer Python versions.
- **Issue:** #22804

---------

Co-authored-by: Erick Friis <erick@langchain.dev>
Co-authored-by: ccurme <chester.curme@gmail.com>
2024-06-27 19:37:20 +00:00
Leonid Ganeline
41f7620989
huggingface: docstrings (#23148)
Added missing docstrings. Formatted docstrings to the consistent format
used in the API Reference.

Co-authored-by: ccurme <chester.curme@gmail.com>
2024-06-20 13:22:40 +00:00
Lucas Tucker
cb79e80b0b
docs: standardize ChatHuggingFace (#22693)
**Updated ChatHuggingFace doc string as per issue #22296**:
"langchain_huggingface: updated docstring for ChatHuggingFace in
langchain_huggingface to match that of the description (in the appendix)
provided in issue #22296. "

**Issue:** This PR is in response to issue #22296, and more specifically
the ChatHuggingFace model. In particular, this PR updates the docstring for
langchain/libs/partners/hugging_face/langchain_huggingface/chat_models/huggingface.py
by adding the following sections: Instantiate, Invoke, Stream, Async,
Tool calling, and Response metadata. I used the template from the
Anthropic implementation and referenced the Appendix of the original
issue post. I also noted that the langchain_community Hugging Face LLMs do
not work with langchain_huggingface's ChatHuggingFace model (at least
for me), and that the .stream(messages) functionality of ChatHuggingFace only
returned a single block of response.

---------

Co-authored-by: lucast2021 <lucast2021@headroyce.org>
Co-authored-by: Bagatur <baskaryan@gmail.com>
2024-06-10 20:54:36 +00:00
Ethan Yang
29064848f9
[Community]add option to delete the prompt from HF output (#22225)
This will help solve the pattern-mismatch issue when parsing the
output in an Agent.

https://github.com/langchain-ai/langchain/issues/21912
2024-06-05 18:38:54 -04:00
Michal Gregor
98b2e7b195
huggingface[patch]: Support for HuggingFacePipeline in ChatHuggingFace. (#22194)
- **Description:** Added support for using HuggingFacePipeline in
ChatHuggingFace (previously it was only usable with API endpoints,
probably by oversight); a usage sketch follows below.
- **Issue:** #19997 
- **Dependencies:** none
- **Twitter handle:** none
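
A hedged usage sketch of the combination this enables (the model id and generation settings are illustrative):

```python
from langchain_huggingface import ChatHuggingFace, HuggingFacePipeline

llm = HuggingFacePipeline.from_model_id(
    model_id="microsoft/Phi-3-mini-4k-instruct",
    task="text-generation",
    pipeline_kwargs={"max_new_tokens": 64},
)
chat = ChatHuggingFace(llm=llm)
print(chat.invoke("Hello!").content)
```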

---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
2024-06-04 00:47:35 +00:00
Miroslav
cbd5720011
huggingface[patch]: Skip Login to HuggingFaceHub when token is not set (#22365) 2024-06-03 15:20:32 -07:00
Erick Friis
2a984e8e3f
docs: huggingface package (#21645) 2024-05-14 03:17:40 +00:00
Erick Friis
9b51ca08bc
huggingface: fix community dep checking (#21628) 2024-05-13 21:52:18 +00:00
Jofthomas
afd85b60fc
huggingface: init package (#21097)
First PR for the langchain_huggingface partner package

- Moved some of the Hugging Face related classes from `community` to the
new `partner package`

Still needed:
- Documentation
- Tests
- Support for the new apply_chat_template in `ChatHuggingFace`
- Confirm the choice of class to support for embeddings with the
sentence-transformers team.

cc : @efriis

---------

Co-authored-by: Cyril Kondratenko <kkn1993@gmail.com>
Co-authored-by: Erick Friis <erick@langchain.dev>
2024-05-13 20:53:15 +00:00