Mirror of https://github.com/hwchase17/langchain.git (synced 2025-05-24 08:27:50 +00:00)
The Hugging Face pipeline in LangChain (used for locally hosted models) does not support batching. If you send in a batch of prompts, they are processed serially by the base implementation of `_generate`: https://github.com/docugami/langchain/blob/master/libs/langchain/langchain/llms/base.py#L1004C2-L1004C29

This PR adds batching support to this pipeline so that GPUs can be fully saturated. I also updated the accompanying notebook to demonstrate GPU batch inference.

---------

Co-authored-by: Taqi Jaffri <tjaffri@docugami.com>
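For context, here is a minimal sketch of the batched usage this PR enables, assuming the `batch_size` keyword it introduces on `HuggingFacePipeline.from_model_id`; the model id, device index, and generation kwargs below are illustrative, not prescribed by the PR:

```python
# Sketch of batched GPU inference with the updated pipeline.
# Assumes the `batch_size` keyword added by this PR; model id, device
# index, and generation kwargs are placeholders for illustration.
from langchain.llms import HuggingFacePipeline

llm = HuggingFacePipeline.from_model_id(
    model_id="gpt2",                  # any text-generation model
    task="text-generation",
    device=0,                         # GPU index; use -1 for CPU
    batch_size=4,                     # prompts per forward pass (new in this PR)
    model_kwargs={"max_length": 64},
)

prompts = [f"Question {i}: what is 2 + {i}?" for i in range(8)]

# With batching, generate() groups prompts into chunks of `batch_size`
# and runs each chunk through the model in a single forward pass,
# instead of looping over prompts one at a time.
result = llm.generate(prompts)
for generations in result.generations:
    print(generations[0].text)
```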
Directories:

- callbacks
- chat
- chat_loaders
- document_loaders
- document_transformers
- llms
- memory
- platforms
- providers
- retrievers
- text_embedding
- toolkits
- tools
- vectorstores