Commit Graph

1964 Commits

Author SHA1 Message Date
Erick Friis
86b3c6e81c community: make old stub for QuerySQLDataBaseTool private to skip api ref (#28711) 2024-12-13 10:43:23 -08:00
Martin Triska
05ebe1e66b Community: add modified_since argument to O365BaseLoader (#28708)
## What are we doing in this PR
We're adding `modified_since` optional argument to `O365BaseLoader`.
When set, O365 loader will only load documents newer than
`modified_since` datetime.

## Why?
OneDrives / Sharepoints can contain large number of documents. Current
approach is to download and parse all files and let indexer to deal with
duplicates. This can be prohibitively time-consuming. Especially when
using OCR-based parser like
[zerox](fa06188834/libs/community/langchain_community/document_loaders/pdf.py (L948)).
This argument allows to skip documents that are older than known time of
indexing.

_Q: What if a file was modfied during last indexing process?
A: Users can set the `modified_since` conservatively and indexer will
still take care of duplicates._




If no one reviews your PR within a few days, please @-mention one of
baskaryan, efriis, eyurtsev, ccurme, vbarda, hwchase17.

---------

Co-authored-by: Erick Friis <erick@langchain.dev>
2024-12-13 17:30:17 +00:00
Bagatur
fa06188834 community[patch]: fix QuerySQLDatabaseTool name (#28659)
Co-authored-by: Chester Curme <chester.curme@gmail.com>
2024-12-12 19:16:03 -08:00
Michael Chin
28cb2cefc6 docs: Fix stack diagram in community README (#28685)
- **Description:** The stack diagram illustration in the community
README fails to render due to an invalid branch reference. This PR
replaces the broken image link with a valid one referencing master
branch.
2024-12-12 13:33:50 -08:00
Botong Zhu
13c3c4a210 community: fixes json loader not getting texts with json standard (#27327)
This PR fixes JSONLoader._get_text not converting objects to json string
correctly.
If an object is serializable and is not a dict, JSONLoader will use
python built-in str() method to convert it to string. This may cause
object converted to strings not following json standard. For example, a
list will be converted to string with single quotes, and if json.loads
try to load this string, it will cause error.

---------

Co-authored-by: Erick Friis <erick@langchain.dev>
2024-12-12 19:33:45 +00:00
Lorenzo
4149c0dd8d community: add method to create branch and list files for gitlab tool (#27883)
### About

- **Description:** In the Gitlab utilities used for the Gitlab tool
there are no methods to create branches, list branches and files, as
this is already done for Github
- **Issue:** None
- **Dependencies:** None

This Pull request add the methods:
- create_branch
- list_branches_in_repo
- set_active_branch
- list_files_in_main_branch
- list_files_in_bot_branch
- list_files_from_directory

---------

Co-authored-by: Erick Friis <erick@langchain.dev>
2024-12-12 19:11:35 +00:00
Prathamesh Nimkar
ca054ed1b1 community: ChatSnowflakeCortex - Add streaming functionality (#27753)
Description:
snowflake.py
Add _stream and _stream_content methods to enable streaming
functionality
fix pydantic issues and added functionality with the overall langchain
version upgrade
added bind_tools method for agentic workflows support through langgraph
updated the _generate method to account for agentic workflows support
through langgraph
cosmetic changes to comments and if conditions

snowflake.ipynb
Added _stream example
cosmetic changes to comments
fixed lint errors

check_pydantic.sh
Decreased counter from 126 to 125 as suggested when formatting

---------

Co-authored-by: Prathamesh Nimkar <prathamesh.nimkar@snowflake.com>
Co-authored-by: Erick Friis <erick@langchain.dev>
2024-12-11 18:35:40 -08:00
Lakindu Boteju
5a31792bf1 community: Add support for cross-region inference profile IDs in Bedrock Anthropic Claude token cost calculation (#28167)
This change modifies the token cost calculation logic to support
cross-region inference profile IDs for Anthropic Claude models. Instead
of explicitly listing all regional variants of new inference profile IDs
in the cost dictionaries, the code now extracts a base model ID from the
input model ID (or inference profile ID), making it more maintainable
and automatically supporting new regional variants.

These inference profile IDs follow the format:
`<region>.<vendor>.<model-name>` (e.g.,
`us.anthropic.claude-3-haiku-xxx`, `eu.anthropic.claude-3-sonnet-xxx`).

Cross-region inference profiles are system-defined identifiers that
enable distributing model inference requests across multiple AWS
regions. They help manage unplanned traffic bursts and enhance
resilience during peak demands without additional routing costs.

References for Amazon Bedrock's cross-region inference profiles:-
-
https://docs.aws.amazon.com/bedrock/latest/userguide/cross-region-inference.html
-
https://docs.aws.amazon.com/bedrock/latest/userguide/inference-profiles-support.html

---------

Co-authored-by: Erick Friis <erick@langchain.dev>
2024-12-12 02:33:50 +00:00
fatmelon
d1e0ec7b55 community: VectorStores: Azure Cosmos DB Mongo vCore with DiskANN (#27329)
# Description
Add a new vector index type `diskann` to Azure Cosmos DB Mongo vCore
vector store. Paper of DiskANN can be found here [DiskANN: Fast Accurate
Billion-point Nearest Neighbor Search on a Single
Node](https://proceedings.neurips.cc/paper_files/paper/2019/file/09853c7fb1d3f8ee67a61b6bf4a7f8e6-Paper.pdf).

## Sample Usage
```python
from pymongo import MongoClient

# INDEX_NAME = "izzy-test-index-2"
# NAMESPACE = "izzy_test_db.izzy_test_collection"
# DB_NAME, COLLECTION_NAME = NAMESPACE.split(".")

client: MongoClient = MongoClient(CONNECTION_STRING)
collection = client[DB_NAME][COLLECTION_NAME]

model_deployment = os.getenv(
    "OPENAI_EMBEDDINGS_DEPLOYMENT", "smart-agent-embedding-ada"
)
model_name = os.getenv("OPENAI_EMBEDDINGS_MODEL_NAME", "text-embedding-ada-002")

vectorstore = AzureCosmosDBVectorSearch.from_documents(
    docs,
    openai_embeddings,
    collection=collection,
    index_name=INDEX_NAME,
)

# Read more about these variables in detail here. https://learn.microsoft.com/en-us/azure/cosmos-db/mongodb/vcore/vector-search
maxDegree = 40
dimensions = 1536
similarity_algorithm = CosmosDBSimilarityType.COS
kind = CosmosDBVectorSearchType.VECTOR_DISKANN
lBuild = 20

vectorstore.create_index(
            dimensions=dimensions,
            similarity=similarity_algorithm,
            kind=kind ,
            max_degree=maxDegree,
            l_build=lBuild,
        )
```

## Dependencies
No additional dependencies were added

---------

Co-authored-by: Yang Qiao (from Dev Box) <yangqiao@microsoft.com>
Co-authored-by: Erick Friis <erick@langchain.dev>
2024-12-12 01:54:04 +00:00
manukychen
ba9b95cd23 Community: Adding bulk_size as a setable param for OpenSearchVectorSearch (#28325)
Description:
When using langchain.retrievers.parent_document_retriever.py with
vectorstore is OpenSearchVectorSearch, I found that the bulk_size param
I passed into OpenSearchVectorSearch class did not work on my
ParentDocumentRetriever.add_documents() function correctly, it will be
overwrite with int 500 the function which OpenSearchVectorSearch class
had (e.g., add_texts(), add_embeddings()...).

So I made this PR requset to fix this, thanks!

---------

Co-authored-by: Erick Friis <erick@langchain.dev>
2024-12-12 01:45:22 +00:00
Thomas van Dongen
ee640d6bd3 community: fixed bug in model2vec embedding code (#28670)
This PR fixes a bug with the current implementation for Model2Vec
embeddings where `embed_documents` does not work as expected.

- **Description**: the current implementation uses `encode_as_sequence`
for encoding documents. This is incorrect, as `encode_as_sequence`
creates token embeddings and not mean embeddings. The normal `encode`
function handles both single and batched inputs and should be used
instead. The return type was also incorrect, as encode returns a NumPy
array. This PR converts the embedding to a list so that the output is
consistent with the Embeddings ABC.
2024-12-11 15:50:56 -08:00
Brian Sharon
b20230c800 community: use correct id_key when deleting by id in LanceDB wrapper (#28655)
- **Description:** The current version of the `delete` method assumes
that the id field will always be called `id`.
- **Issue:** n/a
- **Dependencies:** n/a
- **Twitter handle:** ugh, Twitter :D 

---

Thank you for contributing to LangChain!

- [x] **PR title**: "package: description"
- Where "package" is whichever of langchain, community, core, etc. is
being modified. Use "docs: ..." for purely docs changes, "infra: ..."
for CI changes.
  - Example: "community: add foobar LLM"


- [x] **PR message**: ***Delete this entire checklist*** and replace
with
    - **Description:** a description of the change
    - **Issue:** the issue # it fixes, if applicable
    - **Dependencies:** any dependencies required for this change
- **Twitter handle:** if your PR gets announced, and you'd like a
mention, we'll gladly shout you out!


- [x] **Add tests and docs**: If you're adding a new integration, please
include
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory.


- [x] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified. See contribution
guidelines for more: https://python.langchain.com/docs/contributing/

Additional guidelines:
- Make sure optional dependencies are imported within a function.
- Please do not add dependencies to pyproject.toml files (even optional
ones) unless they are required for unit tests.
- Most PRs should not touch more than one package.
- Changes should be backwards compatible.
- If you are adding something to community, do not re-import it in
langchain.

If no one reviews your PR within a few days, please @-mention one of
baskaryan, efriis, eyurtsev, ccurme, vbarda, hwchase17.

---------

Co-authored-by: Erick Friis <erick@langchain.dev>
2024-12-11 23:49:35 +00:00
Mohammad Mohtashim
fa155a422f [Community]: requests_kwargs not being used in _fetch (#28646)
- **Description:** `requests_kwargs` is not being passed to `_fetch`
which is fetching pages asynchronously. In this PR, making sure that we
are passing `requests_kwargs` to `_fetch` just like `_scrape`.
- **Issue:** #28634

---------

Co-authored-by: Erick Friis <erick@langchain.dev>
2024-12-11 23:46:54 +00:00
Vincent Zhang
df5008fe55 community[minor]: FAISS Filter Function Enhancement with Advanced Query Operators (#28207)
## Description
We are submitting as a team of four for a project. Other team members
are @RuofanChen03, @LikeWang10067, @TANYAL77.

This pull requests expands the filtering capabilities of the FAISS
vectorstore by adding MongoDB-style query operators indicated as
follows, while including comprehensive testing for the added
functionality.
- $eq (equals)
- $neq (not equals)
- $gt (greater than)
- $lt (less than)
- $gte (greater than or equal)
- $lte (less than or equal)
- $in (membership in list)
- $nin (not in list)
- $and (all conditions must match)
- $or (any condition must match)
- $not (negation of condition)


## Issue
This closes https://github.com/langchain-ai/langchain/issues/26379.


## Sample Usage
```python
import faiss
import asyncio
from langchain_community.vectorstores import FAISS
from langchain.schema import Document
from langchain_huggingface import HuggingFaceEmbeddings

embeddings = HuggingFaceEmbeddings(model_name="sentence-transformers/all-mpnet-base-v2")
documents = [
    Document(page_content="Process customer refund request", metadata={"schema_type": "financial", "handler_type": "refund",}),
    Document(page_content="Update customer shipping address", metadata={"schema_type": "customer", "handler_type": "update",}),
    Document(page_content="Process payment transaction", metadata={"schema_type": "financial", "handler_type": "payment",}),
    Document(page_content="Handle customer complaint", metadata={"schema_type": "customer","handler_type": "complaint",}),
    Document(page_content="Process invoice payment", metadata={"schema_type": "financial","handler_type": "payment",})
]

async def search(vectorstore, query, schema_type, handler_type, k=2):
    schema_filter = {"schema_type": {"$eq": schema_type}}
    handler_filter = {"handler_type": {"$eq": handler_type}}
    combined_filter = {
        "$and": [
            schema_filter,
            handler_filter,
        ]
    }
    base_retriever = vectorstore.as_retriever(
        search_kwargs={"k":k, "filter":combined_filter}
    )
    return await base_retriever.ainvoke(query)

async def main():
    vectorstore = FAISS.from_texts(
        texts=[doc.page_content for doc in documents],
        embedding=embeddings,
        metadatas=[doc.metadata for doc in documents]
    )
    
    def printt(title, documents):
        print(title)
        if not documents:
            print("\tNo documents found.")
            return
        for doc in documents:
            print(f"\t{doc.page_content}. {doc.metadata}")

    printt("Documents:", documents)
    printt('\nquery="process payment", schema_type="financial", handler_type="payment":', await search(vectorstore, query="process payment", schema_type="financial", handler_type="payment", k=2))
    printt('\nquery="customer update", schema_type="customer", handler_type="update":', await search(vectorstore, query="customer update", schema_type="customer", handler_type="update", k=2))
    printt('\nquery="refund process", schema_type="financial", handler_type="refund":', await search(vectorstore, query="refund process", schema_type="financial", handler_type="refund", k=2))
    printt('\nquery="refund process", schema_type="financial", handler_type="foobar":', await search(vectorstore, query="refund process", schema_type="financial", handler_type="foobar", k=2))
    print()

if __name__ == "__main__":asyncio.run(main())
```

## Output
```
Documents:
	Process customer refund request. {'schema_type': 'financial', 'handler_type': 'refund'}
	Update customer shipping address. {'schema_type': 'customer', 'handler_type': 'update'}
	Process payment transaction. {'schema_type': 'financial', 'handler_type': 'payment'}
	Handle customer complaint. {'schema_type': 'customer', 'handler_type': 'complaint'}
	Process invoice payment. {'schema_type': 'financial', 'handler_type': 'payment'}

query="process payment", schema_type="financial", handler_type="payment":
	Process payment transaction. {'schema_type': 'financial', 'handler_type': 'payment'}
	Process invoice payment. {'schema_type': 'financial', 'handler_type': 'payment'}

query="customer update", schema_type="customer", handler_type="update":
	Update customer shipping address. {'schema_type': 'customer', 'handler_type': 'update'}

query="refund process", schema_type="financial", handler_type="refund":
	Process customer refund request. {'schema_type': 'financial', 'handler_type': 'refund'}

query="refund process", schema_type="financial", handler_type="foobar":
	No documents found.

```

---------

Co-authored-by: ruofan chen <ruofan.is.awesome@gmail.com>
Co-authored-by: RickyCowboy <like.wang@mail.utoronto.ca>
Co-authored-by: Shanni Li <tanya.li@mail.utoronto.ca>
Co-authored-by: RuofanChen03 <114096642+ruofanchen03@users.noreply.github.com>
Co-authored-by: Like Wang <102838708+likewang10067@users.noreply.github.com>
2024-12-11 17:52:22 -05:00
like
3048a9a26d community: tongyi multimodal response format fix to support langchain (#28645)
Description: The multimodal(tongyi) response format "message": {"role":
"assistant", "content": [{"text": "图像"}]}}]} is not compatible with
LangChain.
Dependencies: No

---------

Co-authored-by: Erick Friis <erick@langchain.dev>
2024-12-10 21:13:26 +00:00
Bagatur
d0e662e43b community[patch]: Release 0.3.11 (#28658) 2024-12-10 20:51:13 +00:00
Bagatur
e6a62d8422 core,langchain,community[patch]: allow langsmith 0.2 (#28598) 2024-12-10 18:50:58 +00:00
Johannes Mohren
c1d348e95d doc-loader: retain Azure Doc Intelligence API metadata in Document parser (#28382)
**Description**:
This PR modifies the doc_intelligence.py parser in the community package
to include all metadata returned by the Azure Doc Intelligence API in
the Document object. Previously, only the parsed content (markdown) was
retained, while other important metadata such as bounding boxes (bboxes)
for images and tables was discarded. These image bboxes are crucial for
supporting use cases like multi-modal RAG workflows when using Azure Doc
Intelligence.

The change ensures that all information returned by the Azure Doc
Intelligence API is preserved by setting the metadata attribute of the
Document object to the entire result returned by the API, rather than an
empty dictionary. This extends the parser's utility for complex use
cases without breaking existing functionality.

**Issue**:
This change does not address a specific issue number, but it resolves a
critical limitation in supporting multimodal workflows when using the
LangChain wrapper for the Azure API.

**Dependencies**:
No additional dependencies are required for this change.

---------

Co-authored-by: jmohren <johannes.mohren@aol.de>
2024-12-10 11:22:58 -05:00
Alex Tonkonozhenko
0d20c314dd Confluence Loader: Fix CQL loading (#27620)
fix #12082

<!---
If no one reviews your PR within a few days, please @-mention one of
baskaryan, efriis, eyurtsev, ccurme, vbarda, hwchase17.
-->
2024-12-10 11:05:23 -05:00
Katarina Supe
aba2711e7f community: update Memgraph integration (#27017)
**Description:**
- **Memgraph** no longer relies on `Neo4jGraphStore` but **implements
`GraphStore`**, just like other graph databases.
- **Memgraph** no longer relies on `GraphQAChain`, but implements
`MemgraphQAChain`, just like other graph databases.
- The refresh schema procedure has been updated to try using `SHOW
SCHEMA INFO`. The fallback uses Cypher queries (a combination of schema
and Cypher) → **LangChain integration no longer relies on MAGE
library**.
- The **schema structure** has been reformatted. Regardless of the
procedures used to get schema, schema structure is the same.
- The `add_graph_documents()` method has been implemented. It transforms
`GraphDocument` into Cypher queries and creates a graph in Memgraph. It
implements the ability to use `baseEntityLabel` to improve speed
(`baseEntityLabel` has an index on the `id` property). It also
implements the ability to include sources by creating a `MENTIONS`
relationship to the source document.
- Jupyter Notebook for Memgraph has been updated.
- **Issue:** /
- **Dependencies:** /
- **Twitter handle:** supe_katarina (DX Engineer @ Memgraph)

Closes #25606
2024-12-10 10:57:21 -05:00
TamagoTorisugi
0f0df2df60 fix: Set default search_type to 'similarity' in as_retriever method of AzureSearch (#28376)
**Description**
This PR updates the `as_retriever` method in the `AzureSearch` to ensure
that the `search_type` parameter defaults to 'similarity' when not
explicitly provided.

Previously, if the `search_type` was omitted, it did not default to any
specific value. So it was inherited from
`AzureSearchVectorStoreRetriever`, which defaults to 'hybrid'.

This change ensures that the intended default behavior aligns with the
expected usage.

**Issue**
No specific issue was found related to this change.

**Dependencies**
No new dependencies are introduced with this change.

---------

Co-authored-by: prrao87 <prrao87@gmail.com>
Co-authored-by: Erick Friis <erick@langchain.dev>
2024-12-10 03:40:04 +00:00
Prashanth Rao
8c6eec5f25 community: KuzuGraph needs allow_dangerous_requests, add graph documents via LLMGraphTransformer (#27949)
- [x] **PR title**: "community: Kuzu - Add graph documents via
LLMGraphTransformer"
- This PR adds a new method `add_graph_documents` to use the
`GraphDocument`s extracted by `LLMGraphTransformer` and store in a Kùzu
graph backend.
- This allows users to transform unstructured text into a graph that
uses Kùzu as the graph store.

- [x] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified. See contribution
guidelines for more: https://python.langchain.com/docs/contributing/

---------

Co-authored-by: pookam90 <pookam@microsoft.com>
Co-authored-by: Pooja Kamath <60406274+Pookam90@users.noreply.github.com>
Co-authored-by: hsm207 <hsm207@users.noreply.github.com>
Co-authored-by: Erick Friis <erick@langchain.dev>
2024-12-10 03:15:28 +00:00
Arnav Priyadarshi
b78b2f7a28 community[fix]: Update Perplexity to pass parameters into API calls (#28421)
- [x] **PR title**: "package: description"
- Where "package" is whichever of langchain, community, core, etc. is
being modified. Use "docs: ..." for purely docs changes, "infra: ..."
for CI changes.
  - Example: "community: add foobar LLM"


- **Description:** I realized the invocation parameters were not being
passed into `_generate` so I added those in but then realized that the
parameters contained some old fields designed for an older openai client
which I removed. Parameters work fine now.
- **Issue:** Fixes #28229 
- **Dependencies:** No new dependencies.  
- **Twitter handle:** @arch_plane

- [x] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified. See contribution
guidelines for more: https://python.langchain.com/docs/contributing/

Additional guidelines:
- Make sure optional dependencies are imported within a function.
- Please do not add dependencies to pyproject.toml files (even optional
ones) unless they are required for unit tests.
- Most PRs should not touch more than one package.
- Changes should be backwards compatible.
- If you are adding something to community, do not re-import it in
langchain.

If no one reviews your PR within a few days, please @-mention one of
baskaryan, efriis, eyurtsev, ccurme, vbarda, hwchase17.

Co-authored-by: Erick Friis <erick@langchain.dev>
2024-12-10 00:23:31 +00:00
Amir Sadeghi
2c49f587aa community[fix]: could not locate runnable browser (#28289)
set open_browser to false to resolve "could not locate runnable browser"
error while default browser is None

Thank you for contributing to LangChain!

- [ ] **PR title**: "package: description"
- Where "package" is whichever of langchain, community, core, etc. is
being modified. Use "docs: ..." for purely docs changes, "infra: ..."
for CI changes.
  - Example: "community: add foobar LLM"


- [ ] **PR message**: ***Delete this entire checklist*** and replace
with
    - **Description:** a description of the change
    - **Issue:** the issue # it fixes, if applicable
    - **Dependencies:** any dependencies required for this change
- **Twitter handle:** if your PR gets announced, and you'd like a
mention, we'll gladly shout you out!


- [ ] **Add tests and docs**: If you're adding a new integration, please
include
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory.


- [ ] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified. See contribution
guidelines for more: https://python.langchain.com/docs/contributing/

Additional guidelines:
- Make sure optional dependencies are imported within a function.
- Please do not add dependencies to pyproject.toml files (even optional
ones) unless they are required for unit tests.
- Most PRs should not touch more than one package.
- Changes should be backwards compatible.
- If you are adding something to community, do not re-import it in
langchain.

If no one reviews your PR within a few days, please @-mention one of
baskaryan, efriis, eyurtsev, ccurme, vbarda, hwchase17.

Co-authored-by: Erick Friis <erick@langchain.dev>
2024-12-09 21:05:52 +00:00
Martin Triska
75bc6bb191 community: [bugfix] fix source path for office files in O365 (#28260)
# What problem are we fixing?

Currently documents loaded using `O365BaseLoader` fetch source from
`file.web_url` (where `file` is `<class 'O365.drive.File'>`). This works
well for `.pdf` documents. Unfortunately office documents (`.xlsx`,
`.docx` ...) pass their `web_url` in following format:

`https://sharepoint_address/sites/path/to/library/root/Doc.aspx?sourcedoc=%XXXXXXXX-1111-1111-XXXX-XXXXXXXXXX%7D&file=filename.xlsx&action=default&mobileredirect=true`

This obfuscates the path to the file. This PR utilizes the parrent
folder's path and file name to reconstruct the actual location of the
file. Knowing the file's location can be crucial for some RAG
applications (path to the file can carry information we don't want to
loose).

@vbarda Could you please look at this one? I'm @-mentioning you since
we've already closed some PRs together :-)

Co-authored-by: Erick Friis <erick@langchain.dev>
2024-12-09 12:34:59 -08:00
Naka Masato
ce3b69aa05 community: add include_labels option to ConfluenceLoader (#28259)
## **Description:**

Enable `ConfluenceLoader` to include labels with `include_labels` option
(`false` by default for backward compatibility). and the labels are set
to `metadata` in the `Document`. e.g. `{"labels": ["l1", "l2"]}`

## Notes

Confluence API supports to get labels by providing `metadata.labels` to
`expand` query parameter

All of the following functions support `expand` in the same way:
- confluence.get_page_by_id
- confluence.get_all_pages_by_label
- confluence.get_all_pages_from_space
- cql (internally using
[/api/content/search](https://developer.atlassian.com/cloud/confluence/rest/v1/api-group-content/#api-wiki-rest-api-content-search-get))

## **Issue:**

No issue related to this PR.

## **Dependencies:** 

No changes.

## **Twitter handle:** 

[@gymnstcs](https://x.com/gymnstcs)


- [x] **Add tests and docs**: If you're adding a new integration, please
include
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory.

- [x] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified. See contribution
guidelines for more: https://python.langchain.com/docs/contributing/

---------

Co-authored-by: Erick Friis <erick@langchain.dev>
2024-12-09 19:35:01 +00:00
Rajendra Kadam
242fee11be community[minor] Pebblo: Support for new Pinecone class PineconeVectorStore (#28253)
- **Description:** Support for new Pinecone class PineconeVectorStore in
PebbloRetrievalQA.
- **Issue:** NA
- **Dependencies:** NA
- **Tests:** -

Co-authored-by: Erick Friis <erick@langchain.dev>
2024-12-09 19:33:54 +00:00
maang-h
b64d846347 docs: Standardize MoonshotChat docstring (#28159)
- **Description:** Add docstring

Co-authored-by: Erick Friis <erick@langchain.dev>
2024-12-09 18:46:25 +00:00
WGNW_MG
eabe587787 community[patch]:Fix for get_openai_callback() return token_cost=0.0 when model is gpt-4o-11-20 (#28408)
- **Description:** update MODEL_COST_PER_1K_TOKENS for new gpt-4o-11-20.
- **Issue:** with latest gpt-4o-11-20, openai callback return
token_cost=0.0
- **Dependencies:** None (just simple dict fix.)
- **Twitter handle:** I Don't Use Twitter. 
- (However..., I have a YouTube channel. Could you upload this there, by
any chance?
https://www.youtube.com/@%EA%B2%9C%EC%B0%BD%EB%B6%80%EA%B3%A0%EB%AC%B8AI%EC%9E%90%EB%AC%B8%EC%84%BC%EC%84%B8)
2024-12-08 20:46:50 -08:00
Abhinav
317a38b83e community[minor]: Add support for modle2vec embeddings (#28507)
This PR add an embeddings integration for model2vec, the
`Model2vecEmbeddings` class.

- **Description**: [Model2Vec](https://github.com/MinishLab/model2vec)
lets you turn any sentence transformer into a really small static model
and makes running the model faster.
- **Issue**:
- **Dependencies**: model2vec
([pypi](https://pypi.org/project/model2vec/))
- **Twitter handle:**:

- [x] **Add tests and docs**: 
-
[Test](https://github.com/blacksmithop/langchain/blob/model2vec_embeddings/libs/community/langchain_community/embeddings/model2vec.py),
[docs](https://github.com/blacksmithop/langchain/blob/model2vec_embeddings/docs/docs/integrations/text_embedding/model2vec.ipynb)

- [x] **Lint and test**:

---------

Co-authored-by: Abhinav KM <abhinav.m@zerone-consulting.com>
Co-authored-by: Bagatur <baskaryan@gmail.com>
2024-12-09 02:17:22 +00:00
Mohammad Mohtashim
524ee6d9ac Invalid tool_choice being passed to ChatLiteLLM (#28198)
- **Description:** Invalid `tool_choice` is given to `ChatLiteLLM` to
`bind_tools` due to it's parent's class default value being pass through
`with_structured_output`.
- **Issue:** #28176
2024-12-07 14:33:40 -05:00
Erick Friis
07c2ac765a community: release 0.3.10 (#28600) 2024-12-07 00:07:13 +00:00
ccurme
2c6bc74cb1 multiple: combine sync/async vector store standard test suites (#28580)
Breaking change in `langchain-tests`.
2024-12-06 14:55:06 -05:00
cinqisap
482e8a7855 community: Add support for SAP HANA Vector hnsw index creation (#27884)
**Issue:** Added support for creating indexes in the SAP HANA Vector
engine.
 
**Changes**: 
1. Introduced a new function `create_hnsw_index` in `hanavector.py` that
enables the creation of indexes for SAP HANA Vector.
2. Added integration tests for the index creation function to ensure
functionality.
3. Updated the documentation to reflect the new index creation feature,
including examples and output from the notebook.
4. Fix the operator issue in ` _process_filter_object` function and
change the array argument to a placeholder in the similarity search SQL
statement.

---------

Co-authored-by: Erick Friis <erick@langchain.dev>
2024-12-05 23:29:08 +00:00
Erick Friis
e6a08355a3 docs: more api ref links, add linting step to prevent more (#28495) 2024-12-04 04:19:42 +00:00
wlleiiwang
6151ea78d5 community: implement _select_relevance_score_fn for tencent vectordb (#28036)
implement _select_relevance_score_fn for tencent vectordb
fix use external embedding for tencent vectordb

Co-authored-by: wlleiiwang <wlleiiwang@tencent.com>
Co-authored-by: Erick Friis <erick@langchain.dev>
2024-12-04 03:03:00 +00:00
Asi Greenholts
d34bf78f3b community: BM25Retriever preservation of document id (#27019)
Currently this retriever discards document ids

---------

Co-authored-by: asi-cider <88270351+asi-cider@users.noreply.github.com>
Co-authored-by: Erick Friis <erick@langchain.dev>
2024-12-04 00:36:00 +00:00
peterdhp
bc5ec63d67 community : allow using apikey for PubMedAPIWrapper (#27246)
**Description**: 

> Without an API key, any site (IP address) posting more than 3 requests
per second to the E-utilities will receive an error message. By
including an API key, a site can post up to 10 requests per second by
default.

quoted from A General Introduction to the E-utilities,NCBI :
https://www.ncbi.nlm.nih.gov/books/NBK25497/

I have simply added a api_key parameter to the PubMedAPIWrapper that can
be used to increase the number of requests per second from 3 to 10.

**Twitter handle** : @KORmaori

---------

Co-authored-by: Erick Friis <erick@langchain.dev>
2024-12-03 16:21:22 -08:00
William Smith
15e7353168 langchain_community: updated query constructor for Databricks Vector Search due to LangChainDeprecationWarning: filters was deprecated since langchain-community 0.2.11 and will be removed in 0.3. Please use filter instead. (#27974)
- **Description:** Updated the kwargs for the structured query from
filters to filter due to deprecation of 'filters' for Databricks Vector
Search. Also changed the error messages as the allowed operators and
comparators are different which can cause issues with functions such as
get_query_constructor_prompt()

- **Issue:** Fixes the Key Error for filters due to deprecation in favor
for 'filter':

LangChainDeprecationWarning: DatabricksVectorSearch received a key
`filters` in search_kwargs. `filters` was deprecated since
langchain-community 0.2.11 and will be removed in 0.3. Please use
`filter` instead.

- **Dependencies:** N/A
- **Twitter handle:** N/A

---------

Co-authored-by: Erick Friis <erick@langchain.dev>
2024-12-03 16:03:53 -08:00
Jan Heimes
ef365543cb community: add Needle retriever and document loader integration (#28157)
- [x] **PR title**: "community: add Needle retriever and document loader
integration"
- Where "package" is whichever of langchain, community, core, etc. is
being modified. Use "docs: ..." for purely docs changes, "infra: ..."
for CI changes.
  - Example: "community: add foobar LLM"

- [x] **PR message**: ***Delete this entire checklist*** and replace
with
- **Description:** This PR adds a new integration for Needle, which
includes:
- **NeedleRetriever**: A retriever for fetching documents from Needle
collections.
- **NeedleLoader**: A document loader for managing and loading documents
into Needle collections.
      - Example notebooks demonstrating usage have been added in:
        - `docs/docs/integrations/retrievers/needle.ipynb`
        - `docs/docs/integrations/document_loaders/needle.ipynb`.
- **Dependencies:** The `needle-python` package is required as an
external dependency for accessing Needle's API. It has been added to the
extended testing dependencies list.
- **Twitter handle:** Feel free to mention me if this PR gets announced:
[needlexai](https://x.com/NeedlexAI).

- [x] **Add tests and docs**: If you're adding a new integration, please
include
1. Unit tests have been added for both `NeedleRetriever` and
`NeedleLoader` in `libs/community/tests/unit_tests`. These tests mock
API calls to avoid relying on network access.
2. Example notebooks have been added to `docs/docs/integrations/`,
showcasing both retriever and loader functionality.

- [x] **Lint and test**: Run `make format`, `make lint`, and `make test`
from the root of the package(s) you've modified. See contribution
guidelines for more: https://python.langchain.com/docs/contributing/
  - `make format`: Passed
  - `make lint`: Passed
- `make test`: Passed (requires `needle-python` to be installed locally;
this package is not added to LangChain dependencies).

Additional guidelines:
- [x] Optional dependencies are imported only within functions.
- [x] No dependencies have been added to pyproject.toml files except for
those required for unit tests.
- [x] The PR does not touch more than one package.
- [x] Changes are fully backwards compatible.
- [x] Community additions are not re-imported into LangChain core.

If no one reviews your PR within a few days, please @-mention one of
baskaryan, efriis, eyurtsev, ccurme, vbarda, hwchase17.

---------

Co-authored-by: Erick Friis <erick@langchain.dev>
2024-12-03 22:06:25 +00:00
Audrey Sage Lorberfeld
6b7e93d4c7 pinecone: update pinecone client (#28320)
This PR updates the Pinecone client to `5.4.0`, as well as its
dependencies (`pinecone-plugin-inference` and
`pinecone-plugin-interface`).

Note: `pinecone-client` is now simply called `pinecone`.

**Question for reviewer(s):** should this PR also update the `pinecone`
dep in [the root dir's `poetry.lock`
file](https://github.com/langchain-ai/langchain/blob/master/poetry.lock#L6729)?
Was unsure. (I don't believe so b/c it seems pinned to a lower version
likely based on 3rd-party deps (e.g. Unstructured).)

--
TW: @audrey_sage_


---
- To see the specific tasks where the Asana app for GitHub is being
used, see below:
  - https://app.asana.com/0/0/1208693659122374
2024-12-02 22:47:09 -08:00
lucasiscovici
60021e54b5 community: Add the additonnal kward 'context' for openai (#28351)
- **Description:** 
Add the additonnal kward 'context' for openai into
`convert_dict_to_message` and `convert_message_to_dict` functions.
2024-12-02 16:43:30 -05:00
Bagatur
49914e959a community[patch]: Release 0.3.9 (#28451) 2024-12-02 16:23:37 +00:00
Greg Hinch
5141f25a20 community[patch]: support numpy2 (#28184)
Follows on from #27991, updates the langchain-community package to
support numpy 2 versions

---------

Co-authored-by: Chester Curme <chester.curme@gmail.com>
2024-11-27 11:10:58 -05:00
LuisMSotamba
0901f11b0f community: add truncation params when an openai assistant's run is created (#28158)
**Description:** When an OpenAI assistant is invoked, it creates a run
by default, allowing users to set only a few request fields. The
truncation strategy is set to auto, which includes previous messages in
the thread along with the current question until the context length is
reached. This causes token usage to grow incrementally:
consumed_tokens = previous_consumed_tokens + current_consumed_tokens.

This PR adds support for user-defined truncation strategies, giving
better control over token consumption.

**Issue:** High token consumption.
2024-11-27 10:53:53 -05:00
ccurme
bb83abd037 community[patch]: remove sqlalchemy cap (#28389) 2024-11-27 10:20:36 -05:00
Mohammad Mohtashim
06fafc6651 Community: Marqo Index Setting GET Request Updated according to 2.x API version while keep backward compatability for 1.5.x (#28342)
- **Description:** `add_texts` was using `get_setting` for marqo client
which was being used according to 1.5.x API version. However, this PR
updates the `add_text` accounting for updated response payload for 2.x
and later while maintaining backward compatibility. Plus I have verified
this was the only place where marqo client was not accounting for
updated API version.
  - **Issue:** #28323

---------

Co-authored-by: ccurme <chester.curme@gmail.com>
2024-11-26 18:26:56 +00:00
Richard Hao
c161f7d46f docs(create_sql_agent): fix reStructured Text Markup (#28356)
- **Description:** Lines of code must be indented beneath `..
code-block::` for proper formatting.
https://devguide.python.org/documentation/markup/#showing-code-examples
- **Issue:** The example code block on the `create_sql_agent` document
page is not properly rendered.

https://python.langchain.com/api_reference/community/agent_toolkits/langchain_community.agent_toolkits.sql.base.create_sql_agent.html#langchain_community.agent_toolkits.sql.base.create_sql_agent

<img width="933" alt="image"
src="https://github.com/user-attachments/assets/d764bcad-e412-408b-ab0b-9a78a11188ee">
2024-11-26 09:52:16 -05:00
Mohammad Mohtashim
195ae7baa3 Community: Adding citations in AIMessage for ChatPerplexity (#28321)
**Description**: Adding Citation in response payload of ChatPerplexity
**Issue**: #28108
2024-11-26 09:45:47 -05:00
ccurme
a5374952f8 community[patch]: fix import in test (#28339)
Library name was updated after
https://github.com/langchain-ai/langchain/pull/27879 branched off
master.
2024-11-25 19:28:01 +00:00