Commit Graph

4873 Commits

Author SHA1 Message Date
Martin Triska
7a9149f5dd community: ZeroxPDFLoader (#27800)
# OCR-based PDF loader

This implements [Zerox](https://github.com/getomni-ai/zerox) PDF
document loader.
Zerox utilizes simple but very powerful (even though slower and more
costly) approach to parsing PDF documents: it converts PDF to series of
images and passes it to a vision model requesting the contents in
markdown.

It is especially suitable for complex PDFs that are not parsed well by
other alternatives.

## Example use:
```python
from langchain_community.document_loaders.pdf import ZeroxPDFLoader

os.environ["OPENAI_API_KEY"] = "" ## your-api-key

model = "gpt-4o-mini" ## openai model
pdf_url = "https://assets.ctfassets.net/f1df9zr7wr1a/soP1fjvG1Wu66HJhu3FBS/034d6ca48edb119ae77dec5ce01a8612/OpenAI_Sacra_Teardown.pdf"

loader = ZeroxPDFLoader(file_path=pdf_url, model=model)
docs = loader.load()
```

The Zerox library supports wide range of provides/models. See Zerox
documentation for details.

- **Dependencies:** `zerox`
- **Twitter handle:** @martintriska1

If no one reviews your PR within a few days, please @-mention one of
baskaryan, efriis, eyurtsev, ccurme, vbarda, hwchase17.

---------

Co-authored-by: Erick Friis <erickfriis@gmail.com>
2024-11-07 03:14:57 +00:00
Dmitriy Prokopchuk
53b0a99f37 community: Memcached LLM Cache Integration (#27323)
## Description
This PR adds support for Memcached as a usable LLM model cache by adding
the ```MemcachedCache``` implementation relying on the
[pymemcache](https://github.com/pinterest/pymemcache) client.

Unit test-wise, the new integration is generally covered under existing
import testing. All new functionality depends on pymemcache if
instantiated and used, so to comply with the other cache implementations
the PR also adds optional integration tests for ```MemcachedCache```.

Since this is a new integration, documentation is added for Memcached as
an integration and as an LLM Cache.

## Issue
This PR closes #27275 which was originally raised as a discussion in
#27035

## Dependencies
There are no new required dependencies for langchain, but
[pymemcache](https://github.com/pinterest/pymemcache) is required to
instantiate the new ```MemcachedCache```.

## Example Usage
```python3
from langchain.globals import set_llm_cache
from langchain_openai import OpenAI

from langchain_community.cache import MemcachedCache
from pymemcache.client.base import Client

llm = OpenAI(model="gpt-3.5-turbo-instruct", n=2, best_of=2)
set_llm_cache(MemcachedCache(Client('localhost')))

# The first time, it is not yet in cache, so it should take longer
llm.invoke("Which city is the most crowded city in the USA?")

# The second time it is, so it goes faster
llm.invoke("Which city is the most crowded city in the USA?")
```

---------

Co-authored-by: Erick Friis <erick@langchain.dev>
2024-11-07 03:07:59 +00:00
Siddharth Murching
cfff2a057e community: Update UC toolkit documentation to use LangGraph APIs (#26778)
- **Description:** Update UC toolkit documentation to show an example of
using recommended LangGraph agent APIs before the existing LangChain
AgentExecutor example. Tested by manually running the updated example
notebook
- **Dependencies:** No new dependencies

---------

Signed-off-by: Sid Murching <sid.murching@databricks.com>
Co-authored-by: Erick Friis <erick@langchain.dev>
2024-11-07 02:47:41 +00:00
Baptiste Pasquier
81f7daa458 community: add InfinityRerank (#27043)
**Description:** 

- Add a Reranker for Infinity server.

**Dependencies:** 

This wrapper uses
[infinity_client](https://github.com/michaelfeil/infinity/tree/main/libs/client_infinity/infinity_client)
to connect to an Infinity server.

**Tests and docs**

- integration test: test_infinity_rerank.py
- example notebook: infinity_rerank.ipynb
[here](https://github.com/baptiste-pasquier/langchain/blob/feat/infinity-rerank/docs/docs/integrations/document_transformers/infinity_rerank.ipynb)

---------

Co-authored-by: Erick Friis <erick@langchain.dev>
2024-11-06 17:26:30 -08:00
Martin Triska
90189f5639 community: Allow other than default parsers in SharePointLoader and OneDriveLoader (#27716)
## What this PR does?

### Currently `O365BaseLoader` (and consequently both derived loaders)
are limited to `pdf`, `doc`, `docx` files.
- **Solution: here we introduce _handlers_ attribute that allows for
custom handlers to be passed in. This is done in _dict_ form:**

**Example:**
```python
from langchain_community.document_loaders.parsers.documentloader_adapter import DocumentLoaderAsParser
# PR for DocumentLoaderAsParser here: https://github.com/langchain-ai/langchain/pull/27749
from langchain_community.document_loaders.excel import UnstructuredExcelLoader

xlsx_parser = DocumentLoaderAsParser(UnstructuredExcelLoader, mode="paged")

# create dictionary mapping file types to handlers (parsers)
handlers = {
    "doc": MsWordParser()
    "pdf": PDFMinerParser()
    "txt": TextParser()
    "xlsx": xlsx_parser
}
loader = SharePointLoader(document_library_id="...",
                            handlers=handlers # pass handlers to SharePointLoader
                            )
documents = loader.load()

# works the same in OneDriveLoader
loader = OneDriveLoader(document_library_id="...",
                            handlers=handlers
                            )
```
This dictionary is then passed to `MimeTypeBasedParser` same as in the
[current
implementation](5a2cfb49e0/libs/community/langchain_community/document_loaders/parsers/registry.py (L13)).


### Currently `SharePointLoader` and `OneDriveLoader` are separate
loaders that both inherit from `O365BaseLoader`
However both of these implement the same functionality. The only
differences are:
- `SharePointLoader` requires argument `document_library_id` whereas
`OneDriveLoader` requires `drive_id`. These are just different names for
the same thing.
  - `SharePointLoader` implements significantly more features.
- **Solution: `OneDriveLoader` is replaced with an empty shell just
renaming `drive_id` to `document_library_id` and inheriting from
`SharePointLoader`**

**Dependencies:** None
**Twitter handle:** @martintriska1

If no one reviews your PR within a few days, please @-mention one of
baskaryan, efriis, eyurtsev, ccurme, vbarda, hwchase17.
2024-11-06 17:44:34 -05:00
murrlincoln
14f1827953 docs: Adding notebook for cdp agentkit toolkit (#27910)
- **Description:** Adding in the first pass of documentation for the CDP
Agentkit Toolkit
    - **Issue:** N/a
    - **Dependencies:** cdp-langchain
    - **Twitter handle:** @CoinbaseDev

---------

Co-authored-by: Erick Friis <erick@langchain.dev>
Co-authored-by: John Peterson <john.peterson@coinbase.com>
2024-11-06 13:28:27 -08:00
Hammad Randhawa
75aa82fedc docs: Completed sentence under the heading "Instantiating a Browser … (#27944)
…Toolkit" in "playwright.ipynb" integration.

- Completed the incomplete sentence in the Langchain Playwright
documentation.

- Enhanced documentation clarity to guide users on best practices for
instantiating browser instances with Langchain Playwright.

Example before:
> "It's always recommended to instantiate using the from_browser method
so that the

Example after:
> "It's always recommended to instantiate using the `from_browser`
method so that the browser context is properly initialized and managed,
ensuring seamless interaction and resource optimization."

Co-authored-by: Erick Friis <erick@langchain.dev>
2024-11-06 19:55:00 +00:00
ccurme
66966a6e72 openai[patch]: release 0.2.6 (#27924)
Some additions in support of [predicted
outputs](https://platform.openai.com/docs/guides/latency-optimization#use-predicted-outputs)
feature:
- Bump openai sdk version
- Add integration test
- Add example to integration docs

The `prediction` kwarg is already plumbed through model invocation.
2024-11-05 23:02:24 +00:00
Tomaz Bratanic
a3bbbe6a86 update llm graph transformer documentation (#27905) 2024-11-05 11:54:26 -05:00
Bagatur
6973f7214f docs: sidebar capitalization (#27894) 2024-11-04 22:09:32 +00:00
Ofer Mendelevitch
d7c39e6dbb community: update Vectara integration (#27869)
Thank you for contributing to LangChain!

- **Description:** Updated Vectara integration
- **Issue:** refresh on descriptions across all demos and added UDF
reranker
- **Dependencies:** None
- **Twitter handle:** @ofermend

---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
Co-authored-by: Erick Friis <erick@langchain.dev>
2024-11-04 20:40:39 +00:00
Daniel Vu Dao
5745f3bf78 docs: Update messages.mdx (#27856)
### Description
Updates phrasing for the header of the `Messages` section.

Co-authored-by: Erick Friis <erick@langchain.dev>
2024-11-04 20:36:31 +00:00
sifatj
e02a5ee03e docs: Update VectorStore as_retriever method url in qa_chat_history_how_to.ipynb (#27844)
**Description**: Update VectorStore `as_retriever` method api reference
url in `qa_chat_history_how_to.ipynb`

Co-authored-by: Erick Friis <erick@langchain.dev>
2024-11-04 20:34:50 +00:00
sifatj
dd1711f3c2 docs: Update max_marginal_relevance_search api reference url in multi_vector.ipynb (#27843)
**Description**: Update VectorStore `max_marginal_relevance_search` api
reference url in `multi_vector.ipynb`

Co-authored-by: Erick Friis <erick@langchain.dev>
2024-11-04 20:31:36 +00:00
sifatj
aa1f46a03a docs: Update VectorStore .as_retriever method url in vectorstore_retriever.ipynb (#27842)
**Description**: Update VectorStore `.as_retriever` method url in
`vectorstore_retriever.ipynb`

Co-authored-by: Erick Friis <erick@langchain.dev>
2024-11-04 20:28:11 +00:00
sifatj
eecf95df9b docs: Update VectorStore api reference url in rag.ipynb (#27841)
**Description**: Update VectorStore api reference url in `rag.ipynb`

Co-authored-by: Erick Friis <erick@langchain.dev>
2024-11-04 20:27:03 +00:00
sifatj
50563400fb docs: Update broken vectorstore urls in retrievers.ipynb (#27838)
**Description**: Update outdated `VectorStore` api reference urls in
`retrievers.ipynb`

Co-authored-by: Erick Friis <erick@langchain.dev>
2024-11-04 20:26:03 +00:00
Rashmi Pawar
f86a09f82c Add nvidia as provider for embedding, llm (#27810)
Documentation: Add NVIDIA as integration provider

cc: @mattf @dglogo

Co-authored-by: Erick Friis <erick@langchain.dev>
2024-11-04 19:45:51 +00:00
ono-hiroki
b7d549ae88 docs: fix undefined 'data' variable in document_loader_csv.ipynb (#27872)
**Description:** 
This PR addresses an issue in the CSVLoader example where data is not
defined, causing a NameError. The line `data = loader.load()` is added
to correctly assign the output of loader.load() to the data variable.
2024-11-04 14:10:56 +00:00
Erick Friis
9fedb04dd3 docs: INVALID_CHAT_HISTORY redirect (#27845) 2024-11-01 21:35:11 +00:00
Erick Friis
03a3670a5e infra: remove some special cases (#27839) 2024-11-01 21:13:43 +00:00
Prithvi Kannan
c3c638cd7b docs: Reference new databricks-langchain package (#27828)
Thank you for contributing to LangChain!

Update references in Databricks integration page to reference our new
partner package databricks-langchain
https://github.com/databricks/databricks-ai-bridge/tree/main/integrations/langchain

Additional guidelines:
- Make sure optional dependencies are imported within a function.
- Please do not add dependencies to pyproject.toml files (even optional
ones) unless they are required for unit tests.
- Most PRs should not touch more than one package.
- Changes should be backwards compatible.
- If you are adding something to community, do not re-import it in
langchain.

If no one reviews your PR within a few days, please @-mention one of
baskaryan, efriis, eyurtsev, ccurme, vbarda, hwchase17.

---------

Signed-off-by: Prithvi Kannan <prithvi.kannan@databricks.com>
2024-11-01 10:21:19 -07:00
sifatj
33d445550e docs: update VectorStore api reference url in retrievers.ipynb (#27814)
**Description:** Update outdated `VectorStore` api reference url in
Vector store subsection of `retrievers.ipynb`
2024-11-01 15:44:26 +00:00
sifatj
9a4a630e40 docs: Update Retrievers and Runnable links in Retrievers subsection of retrievers.ipynb (#27815)
**Description:** Update outdated links for `Retrievers` and `Runnable`
in Retrievers subsection of `retrievers.ipynb`
2024-11-01 15:42:30 +00:00
Zapiron
b0dfff4cd5 Fixed broken link for TokenTextSplitter (#27824)
Fixed the broken redirect link for `TokenTextSplitter` section
2024-11-01 11:32:07 -04:00
Eugene Yurtsev
2f6254605d docs: fix more links (#27809)
Fix more broken links
2024-10-31 17:15:46 -04:00
Eugene Yurtsev
71f590de50 docs: fix more broken links (#27806)
Fix some broken links
2024-10-31 19:46:39 +00:00
Neli Hateva
c572d663f9 docs: Ontotext GraphDB QA Chain Update Documentation (Fix versions of libraries) (#27783)
- **Description:** Update versions of libraries in the Ontotext GraphDB
QA Chain Documentation
 - **Issue:** N/A
 - **Dependencies:** N/A
 - **Twitter handle:** @OntotextGraphDB
2024-10-31 15:23:16 -04:00
Erick Friis
54cb80c778 docs: experimental case, use yq action (#27798) 2024-10-31 11:21:48 -07:00
Changyong Um
d9163e7afa community[docs]: Add content for the Lora adapter in the VLLM page. (#27788)
**Description:**
I added code for lora_request in the community package, but I forgot to
add content to the VLLM page. So, I will do that now. #27731

---------

Co-authored-by: Um Changyong <changyong.um@sfa.co.kr>
2024-10-31 12:44:35 -04:00
ccurme
0172d938b4 community: add AzureOpenAIWhisperParser (#27796)
Commandeered from https://github.com/langchain-ai/langchain/pull/26757.

---------

Co-authored-by: Sheepsta300 <128811766+Sheepsta300@users.noreply.github.com>
2024-10-31 12:37:41 -04:00
Sam Julien
0a472e2a2d community: Add Writer integration (#27646)
**Description:** Add support for Writer chat models   
**Issue:** N/A
**Dependencies:** Add `writer-sdk` to optional dependencies.
**Twitter handle:** Please tag `@samjulien` and `@Get_Writer`

**Tests and docs**
- [x] Unit test
- [x] Example notebook in `docs/docs/integrations` directory.

**Lint and test**
- [x] Run `make format` 
- [x] Run `make lint`
- [x] Run `make test`

---------

Co-authored-by: Johannes <tolstoy.work@gmail.com>
Co-authored-by: Erick Friis <erick@langchain.dev>
2024-10-30 18:06:05 +00:00
ccurme
595dc592c9 docs: run how-to guides in CI (#27615)
Add how-to guides to [Run notebooks
job](https://github.com/langchain-ai/langchain/actions/workflows/run_notebooks.yml)
and fix existing notebooks.

- As with tutorials, cassettes must be updated when HTTP calls in guides
change (by running existing
[script](https://github.com/langchain-ai/langchain/blob/master/docs/scripts/update_cassettes.sh)).
- Cassettes now total ~62mb over 474 files.
- `docs/scripts/prepare_notebooks_for_ci.py` lists a number of notebooks
that do not run (e.g., due to requiring additional infra, slowness,
requiring `input()`, etc.).
2024-10-30 12:35:38 -04:00
Yuki Watanabe
e593e017d2 Update compatibility table for ChatDatabricks (#27676)
`ChatDatabricks` added support for structured output and JSON mode in
the last release. This PR updates the feature table accordingly.

Signed-off-by: B-Step62 <yuki.watanabe@databricks.com>
2024-10-30 11:56:55 -04:00
Ankur Singh
0b97135da1 fix the grammar and markdown component (#27657)
## Before

![Screenshot from 2024-10-26
08-47-29](https://github.com/user-attachments/assets/d8ccead1-3ba3-4f67-a29f-ef8b352341cf)

## After


![image](https://github.com/user-attachments/assets/78f36d54-b2d7-4164-b334-8ac41000711e)

## Typo
`(either in PR summary of in a linked issue)` => `either in PR summary
or in a linked issue`

---------

Co-authored-by: Chester Curme <chester.curme@gmail.com>
2024-10-30 14:47:26 +00:00
Abdesselam Benameur
8fb6708ac4 Fix typo (missing letter) in elasticsearch_retriever.ipynb (#27639)
Fixed a small typo (added a missing "t" in ElasticsearchRetriever docs
page)


https://python.langchain.com/docs/integrations/retrievers/elasticsearch_retriever/#:~:text=It%20is%20possible%20to%20cusomize%20the%20function%20tha%20maps%20an%20Elasticsearch%20result%20(hit)%20to%20a%20LangChain%20document.
2024-10-30 14:38:39 +00:00
Martin Gullbrandson
8a5807a6b4 docs: Update Milvus documentation to correctly show how to filter in similarity_search (#27723)
### Description/Issue:

I had problems filtering when setting up a local Milvus db and noticed
that the `filter` option in the `similarity_search` and
`similarity_search_with_score` appeared to do nothing. Instead, the
`expr` option should be used.

The `expr` option is correctly used in the retriever example further
down in the documentation.

The `expr` option seems to be correctly passed on, for example
[here](447c0dd2f0/libs/community/langchain_community/vectorstores/milvus.py (L701))

### Solution:

Update the documentation for the functions mentioned to show intended
behavior.

---------

Co-authored-by: Chester Curme <chester.curme@gmail.com>
2024-10-30 14:15:11 +00:00
Nawaz Haider
9d2f6701e1 DOCS: Fixed import of langchain instead of langchain_nvidia_ai_endpoints for ChatNVIDIA (#27734)
* **PR title**: "docs: Replaced langchain import with
langchain-nvidia-ai-endpoints in NVIDIA Endpoints Tab"

* **PR message**:
+ **Description:** Replaced the import of `langchain` with
`langchain-nvidia-ai-endpoints` in the NVIDIA Endpoints Tab to resolve
an error caused by the documentation attempting to import the generic
`langchain` module despite the targeted import.
	+ **Issue:** 
+ **Dependencies:** No additional dependencies introduced; simply
updated the existing import to a more specific module.
	+ **Twitter handle:** https://x.com/nawaz0x1

* **Add tests and docs**:
+ **Applicability:** Not applicable in this case, as the change is a fix
to an existing integration rather than the addition of a new one.
+ **Rationale:** No new functionality or integrations are introduced,
only a corrective import change.

* **Lint and test**:
	+ **Status:** Completed
	+ **Outcome:** 
		- `make format`: **Passed**
		- `make lint`: **Passed**
		- `make test`: **Passed**


![image](https://github.com/user-attachments/assets/fbc1b597-5083-4461-875a-d32ab8ed933c)
2024-10-30 13:57:37 +00:00
Prithvi Kannan
0433b114bb docs: Add databricks-langchain package consolidation notice (#27703)
Thank you for contributing to LangChain!

Add notice of upcoming package consolidation of `langchain-databricks`
into `databricks-langchain`.

<img width="1047" alt="image"
src="https://github.com/user-attachments/assets/18eaa394-4e82-444b-85d5-7812be322674">


Additional guidelines:
- Make sure optional dependencies are imported within a function.
- Please do not add dependencies to pyproject.toml files (even optional
ones) unless they are required for unit tests.
- Most PRs should not touch more than one package.
- Changes should be backwards compatible.
- If you are adding something to community, do not re-import it in
langchain.

If no one reviews your PR within a few days, please @-mention one of
baskaryan, efriis, eyurtsev, ccurme, vbarda, hwchase17.

Signed-off-by: Prithvi Kannan <prithvi.kannan@databricks.com>
Co-authored-by: Erick Friis <erick@langchain.dev>
2024-10-29 22:00:27 +00:00
Zapiron
447c0dd2f0 docs: Fixed Grammar & Improve reading (#27672)
Updated the documentation to fix some grammar errors

    - **Description:** Some language errors exist in the documentation
    - **Issue:** the issue # Changed the structure of some sentences
2024-10-29 20:19:00 +00:00
Soham Das
913ff1b152 docs: fix typo in query analysis documentation (#27721)
**PR Title**: `docs: fix typo in query analysis documentation`

**Description**: This PR corrects a typo on line 68 in the query
analysis documentation, changing **"pharsings"** to **"phrasings"** for
clarity and accuracy. Only one instance of the typo was fixed in the
last merge, and this PR fixes the second instance.

**Issue**: N/A

**Dependencies**: None

**Additional Notes**: No functional changes were made; this is a
documentation fix only.
2024-10-29 16:15:37 -04:00
Mateusz Szewczyk
0606aabfa3 docs: Added WatsonxRerank documentation (#27424)
Thank you for contributing to LangChain!

Changes:
- docs: Added `WatsonxRerank` documentation 
- docs Updated `WatsonxEmbeddings` with docs template
- docs: Updated `ChatWatsonx` with docs template
- docs: Updated `WatsonxLLM` with docs template
- docs: Added `ChatWatsonx` to list with Chat models providers. Added
[test_chat_models_standard](https://github.com/langchain-ai/langchain-ibm/blob/main/libs/ibm/tests/integration_tests/test_chat_models_standard.py)
to `langchain_ibm` tests suite.
- docs: Added `IBM` to list with Embedding models providers. Added
[test_embeddings_standard](https://github.com/langchain-ai/langchain-ibm/blob/main/libs/ibm/tests/integration_tests/test_embeddings_standard.py)
to `langchain_ibm` tests suite.
- docs: Updated `langcahin_ibm` recommended versions compatible with
`LangChain v0.3`

---------

Co-authored-by: Erick Friis <erick@langchain.dev>
2024-10-29 16:57:47 +00:00
Zapiron
9ccd4a6ffb DOC: Tutorial Section Updates (#27675)
Edited various notebooks in the tutorial section to fix:
* Grammatical Errors
* Improve Readability by changing the sentence structure or reducing
repeated words which bears the same meaning
* Edited a code block to follow the PEP 8 Standard
* Added more information in some sentences to make the concept more
clear and reduce ambiguity

---------

Co-authored-by: Eugene Yurtsev <eugene@langchain.dev>
2024-10-29 14:51:34 +00:00
Soham Das
c3021e9322 docs: fix typo in query analysis documentation (#27697)
**PR Title**: `docs: fix typo in query analysis documentation`

**Description**: This PR corrects a typo on line 68 in the query
analysis documentation, changing **"pharsings"** to **"phrasings"** for
clarity and accuracy.

**Issue**: N/A

**Dependencies**: None

**Additional Notes**: No functional changes were made; this is a
documentation fix only.
2024-10-29 14:07:22 +00:00
Erick Friis
94e5765416 docs: packages in homepage (#27693) 2024-10-28 20:44:30 +00:00
Jorge Piedrahita Ortiz
8895d468cb community: sambastudio llm refactor (#27215)
**Description:** 
    - Sambastudio LLM refactor 
    - Sambastudio openai compatible API support added
    - docs updated
2024-10-27 11:08:15 -04:00
Erick Friis
cdb4b1980a docs: reorganize contributing docs (#27649) 2024-10-25 22:41:54 +00:00
Gabriel Faundez
ef27ce7a45 docs: add missing import for tools docs (#27650)
## Description

Added missing import from `pydantic` in the tools docs
2024-10-25 21:14:40 +00:00
Erick Friis
2683f814f4 docs: contributing index page (#27647) 2024-10-25 17:06:55 +00:00
Rashmi Pawar
83eebf549f docs: Add NVIDIA as provider in v3 integrations (#27254)
### Add NVIDIA as provider in langchain v3 integrations

cc: @sumitkbh @mattf @dglogo

---------

Co-authored-by: Erick Friis <erick@langchain.dev>
2024-10-25 16:21:22 +00:00