Commit Graph

581 Commits

Author SHA1 Message Date
maang-h
721f709dec
community: Improve QianfanChatEndpoint tool result to model (#24466)
- **Description:** `QianfanChatEndpoint` When using tool result to
answer questions, the content of the tool is required to be in Dict
format. Of course, this can require users to return Dict format when
calling the tool, but in order to be consistent with other Chat Models,
I think such modifications are necessary.
2024-07-22 11:29:00 -04:00
ccurme
dcba7df2fe
community[patch]: deprecate langchain_community Chroma in favor of langchain_chroma (#24474) 2024-07-22 11:00:13 -04:00
ZhangShenao
0f6737cbfe
[Vector Store] Fix function add_texts in TencentVectorDB (#24469)
Regardless of whether `embedding_func` is set or not, the 'text'
attribute of document should be assigned, otherwise the `page_content`
in the document of the final search result will be lost
2024-07-22 09:50:22 -04:00
maang-h
7b28359719
docs: Add ChatSparkLLM docstrings (#24449)
- **Description:** 
  - Add `ChatSparkLLM` docstrings, the issue #22296 
  - To support `stream` method
2024-07-19 20:19:14 -07:00
Erick Friis
f4ee3c8a22
infra: add min version testing to pr test flow (#24358)
xfailing some sql tests that do not currently work on sqlalchemy v1

#22207 was very much not sqlalchemy v1 compatible. 

Moving forward, implementations should be compatible with both to pass
CI
2024-07-19 22:03:19 +00:00
Philippe PRADOS
f5856680fe
community[minor]: add mongodb byte store (#23876)
The `MongoDBStore` can manage only documents.
It's not possible to use MongoDB for an `CacheBackedEmbeddings`.

With this new implementation, it's possible to use:
```python
CacheBackedEmbeddings.from_bytes_store(
    underlying_embeddings=embeddings,
    document_embedding_cache=MongoDBByteStore(
      connection_string=db_uri,
      db_name=db_name,
      collection_name=collection_name,
  ),
)
```
and use MongoDB to cache the embeddings !
2024-07-19 13:54:12 -04:00
Dristy Srivastava
020cc1cf3e
Community[minor]: Added checksum in while send data to pebblo-cloud (#23968)
- **Description:** 
            - Updated checksum in doc metadata
- Sending checksum and removing actual content, while sending data to
`pebblo-cloud` if `classifier-location `is `pebblo-cloud` in
`/loader/doc` API
            - Adding `pb_id` i.e. pebblo id to doc metadata
            - Refactoring as needed.
- Sending `content-checksum` and removing actual content, while sending
data to `pebblo-cloud` if `classifier-location `is `pebblo-cloud` in
`prmopt` API
- **Issue:** NA
- **Dependencies:** NA
- **Tests:** Updated
- **Docs** NA

---------

Co-authored-by: dristy.cd <dristy@clouddefense.io>
2024-07-19 13:52:54 -04:00
keval dekivadiya
06f47678ae
community[minor]: Add TextEmbed Embedding Integration (#22946)
**Description:**

**TextEmbed** is a high-performance embedding inference server designed
to provide a high-throughput, low-latency solution for serving
embeddings. It supports various sentence-transformer models and includes
the ability to deploy image and text embedding models. TextEmbed offers
flexibility and scalability for diverse applications.

- **PyPI Package:** [TextEmbed on
PyPI](https://pypi.org/project/textembed/)
- **Docker Image:** [TextEmbed on Docker
Hub](https://hub.docker.com/r/kevaldekivadiya/textembed)
- **GitHub Repository:** [TextEmbed on
GitHub](https://github.com/kevaldekivadiya2415/textembed)

**PR Description**
This PR adds functionality for embedding documents and queries using the
`TextEmbedEmbeddings` class. The implementation allows for both
synchronous and asynchronous embedding requests to a TextEmbed API
endpoint. The class handles batching and permuting of input texts to
optimize the embedding process.

**Example Usage:**

```python
from langchain_community.embeddings import TextEmbedEmbeddings

# Initialise the embeddings class
embeddings = TextEmbedEmbeddings(model="your-model-id", api_key="your-api-key", api_url="your_api_url")

# Define a list of documents
documents = [
    "Data science involves extracting insights from data.",
    "Artificial intelligence is transforming various industries.",
    "Cloud computing provides scalable computing resources over the internet.",
    "Big data analytics helps in understanding large datasets.",
    "India has a diverse cultural heritage."
]

# Define a query
query = "What is the cultural heritage of India?"

# Embed all documents
document_embeddings = embeddings.embed_documents(documents)

# Embed the query
query_embedding = embeddings.embed_query(query)

# Print embeddings for each document
for i, embedding in enumerate(document_embeddings):
    print(f"Document {i+1} Embedding:", embedding)

# Print the query embedding
print("Query Embedding:", query_embedding)

---------

Co-authored-by: Eugene Yurtsev <eugene@langchain.dev>
2024-07-19 17:30:25 +00:00
Ben Chambers
3691701d58
community[minor]: Add keybert-based link extractor (#24311)
- **Description:** Add a `KeybertLinkExtractor` for graph vectorstores.
This allows extracting links from keywords in a Document and linking
nodes that have common keywords.
- **Issue:** None
- **Dependencies:** None.

---------

Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>
Co-authored-by: ccurme <chester.curme@gmail.com>
2024-07-19 12:25:07 -04:00
Ben Chambers
83f3d95ffa
community[minor]: GLiNER link extraction (#24314)
- **Description:** This allows extracting links between documents with
common named entities using [GLiNER](https://github.com/urchade/GLiNER).
- **Issue:** None
- **Dependencies:** None

---------

Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>
2024-07-19 15:34:54 +00:00
Anas Khan
b5acb91080
Mask API keys for various LLM/ChatModel Modules (#13885)
**Description:** 
- Added masking of the API Keys for the modules:
  - `langchain/chat_models/openai.py`
  - `langchain/llms/openai.py`
  - `langchain/llms/google_palm.py`
  - `langchain/chat_models/google_palm.py`
  - `langchain/llms/edenai.py`

- Updated the modules to utilize `SecretStr` from pydantic to securely
manage API key.
- Added unit/integration tests
- `langchain/chat_models/asure_openai.py` used the `open_api_key` that
is derived from the `ChatOpenAI` Class and it was assuming
`openai_api_key` is a str so we changed it to expect `SecretStr`
instead.

**Issue:** https://github.com/langchain-ai/langchain/issues/12165 ,
**Dependencies:** none,
**Tag maintainer:** @eyurtsev

---------

Co-authored-by: HassanA01 <anikeboss@gmail.com>
Co-authored-by: Aneeq Hassan <aneeq.hassan@utoronto.ca>
Co-authored-by: kristinspenc <kristinspenc2003@gmail.com>
Co-authored-by: faisalt14 <faisalt14@gmail.com>
Co-authored-by: Harshil-Patel28 <76663814+Harshil-Patel28@users.noreply.github.com>
Co-authored-by: kristinspenc <146893228+kristinspenc@users.noreply.github.com>
Co-authored-by: faisalt14 <90787271+faisalt14@users.noreply.github.com>
Co-authored-by: Chester Curme <chester.curme@gmail.com>
2024-07-19 15:23:34 +00:00
ccurme
f99369a54c
community[patch]: fix formatting (#24443)
Somehow this got through CI:
https://github.com/langchain-ai/langchain/pull/24363
2024-07-19 14:38:53 +00:00
Ben Chambers
242b085be7
Merge pull request #24315
* community: Add Hierarchy link extractor

* add example

* lint
2024-07-19 09:42:26 -04:00
Brice Fotzo
034a8c7c1b
community: support advanced text extraction options for pdf documents (#20265)
**Description:** 
- Updated constructors in PyPDFParser and PyPDFLoader to handle
`extraction_mode` and additional kwargs, aligning with the capabilities
of `PageObject.extract_text()` from pypdf.

- Added `test_pypdf_loader_with_layout` along with a corresponding
example text file to validate layout extraction from PDFs.

**Issue:** fixes #19735 

**Dependencies:** This change requires updating the pypdf dependency
from version 3.4.0 to at least 4.0.0.

Additional changes include the addition of a new test
test_pypdf_loader_with_layout and an example text file to ensure the
functionality of layout extraction from PDFs aligns with the new
capabilities.

---------

Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
Co-authored-by: Bagatur <baskaryan@gmail.com>
Co-authored-by: Erick Friis <erick@langchain.dev>
2024-07-17 20:47:09 +00:00
Rafael Pereira
cf28708e7b
Neo4j: Update with non-deprecated cypher methods, and new method to associate relationship embeddings (#23725)
**Description:** At the moment neo4j wrapper is using setVectorProperty,
which is deprecated
([link](https://neo4j.com/docs/operations-manual/5/reference/procedures/#procedure_db_create_setVectorProperty)).
I replaced with the non-deprecated version.

Neo4j recently introduced a new cypher method to associate embeddings
into relations using "setRelationshipVectorProperty" method. In this PR
I also implemented a new method to perform this association maintaining
the same format used in the "add_embeddings" method which is used to
associate embeddings into Nodes.
I also included a test case for this new method.
2024-07-17 12:37:47 -04:00
Rafael Pereira
fc41730e28
neo4j: Fix test for order-insensitive comparison and floating-point precision issues (#24338)
**Description:** 
This PR addresses two main issues in the `test_neo4jvector.py`:
1. **Order-insensitive Comparison:** Modified the
`test_retrieval_dictionary` to ensure that it passes regardless of the
order of returned values by parsing `page_content` into a structured
format (dictionary) before comparison.
2. **Floating-point Precision:** Updated
`test_neo4jvector_relevance_score` to handle minor floating-point
precision differences by using the `isclose` function for comparing
relevance scores with a relative tolerance.

Errors addressed:

- **test_neo4jvector_relevance_score:**
  ```
AssertionError: assert [(Document(page_content='foo', metadata={'page':
'0'}), 1.0000014305114746), (Document(page_content='bar',
metadata={'page': '1'}), 0.9998371005058289),
(Document(page_content='baz', metadata={'page': '2'}),
0.9993508458137512)] == [(Document(page_content='foo', metadata={'page':
'0'}), 1.0), (Document(page_content='bar', metadata={'page': '1'}),
0.9998376369476318), (Document(page_content='baz', metadata={'page':
'2'}), 0.9993523359298706)]
At index 0 diff: (Document(page_content='foo', metadata={'page': '0'}),
1.0000014305114746) != (Document(page_content='foo', metadata={'page':
'0'}), 1.0)
  Full diff:
  - [(Document(page_content='foo', metadata={'page': '0'}), 1.0),
+ [(Document(page_content='foo', metadata={'page': '0'}),
1.0000014305114746),
? +++++++++++++++
- (Document(page_content='bar', metadata={'page': '1'}),
0.9998376369476318),
? ^^^ ------
+ (Document(page_content='bar', metadata={'page': '1'}),
0.9998371005058289),
? ^^^^^^^^^
- (Document(page_content='baz', metadata={'page': '2'}),
0.9993523359298706),
? ----------
+ (Document(page_content='baz', metadata={'page': '2'}),
0.9993508458137512),
? ++++++++++
  ]
  ```

- **test_retrieval_dictionary:**
  ```
AssertionError: assert [Document(page_content='skills:\n- Python\n- Data
Analysis\n- Machine Learning\nname: John\nage: 30\n')] ==
[Document(page_content='skills:\n- Python\n- Data Analysis\n- Machine
Learning\nage: 30\nname: John\n')]
At index 0 diff: Document(page_content='skills:\n- Python\n- Data
Analysis\n- Machine Learning\nname: John\nage: 30\n') !=
Document(page_content='skills:\n- Python\n- Data Analysis\n- Machine
Learning\nage: 30\nname: John\n')
  Full diff:
- [Document(page_content='skills:\n- Python\n- Data Analysis\n- Machine
Learning\nage: 30\nname: John\n')]
? ---------
+ [Document(page_content='skills:\n- Python\n- Data Analysis\n- Machine
Learning\nage: John\nage: 30\n')]
? +++++++++
  ```
2024-07-17 09:28:25 -04:00
bovlb
5caa381177
community[minor]: Add ApertureDB as a vectorstore (#24088)
Thank you for contributing to LangChain!

- [X] *ApertureDB as vectorstore**: "community: Add ApertureDB as a
vectorestore"

- **Description:** this change provides a new community integration that
uses ApertureData's ApertureDB as a vector store.
    - **Issue:** none
    - **Dependencies:** depends on ApertureDB Python SDK
    - **Twitter handle:** ApertureData

- [X] **Add tests and docs**: If you're adding a new integration, please
include
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory.

Integration tests rely on a local run of a public docker image.
Example notebook additionally relies on a local Ollama server.

- [X] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified. See contribution
guidelines for more: https://python.langchain.com/docs/contributing/

All lint tests pass.

Additional guidelines:
- Make sure optional dependencies are imported within a function.
- Please do not add dependencies to pyproject.toml files (even optional
ones) unless they are required for unit tests.
- Most PRs should not touch more than one package.
- Changes should be backwards compatible.
- If you are adding something to community, do not re-import it in
langchain.

If no one reviews your PR within a few days, please @-mention one of
baskaryan, efriis, eyurtsev, ccurme, vbarda, hwchase17.

---------

Co-authored-by: Gautam <gautam@aperturedata.io>
2024-07-16 09:32:59 -07:00
Lage Ragnarsson
a3c10fc6ce
community: Add support for specifying hybrid search for Databricks vector search (#23528)
**Description:**

Databricks Vector Search recently added support for hybrid
keyword-similarity search.
See [usage
examples](https://docs.databricks.com/en/generative-ai/create-query-vector-search.html#query-a-vector-search-endpoint)
from their documentation.

This PR updates the Langchain vectorstore interface for Databricks to
enable the user to pass the *query_type* parameter to
*similarity_search* to make use of this functionality.
By default, there will not be any changes for existing users of this
interface. To use the new hybrid search feature, it is now possible to
do

```python
# ...
dvs = DatabricksVectorSearch(index)
dvs.similarity_search("my search query", query_type="HYBRID")
```

Or using the retriever:

```python
retriever = dvs.as_retriever(
    search_kwargs={
        "query_type": "HYBRID",
    }
)
retriever.invoke("my search query")
```

---------

Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
Co-authored-by: Erick Friis <erick@langchain.dev>
2024-07-15 22:14:08 +00:00
Christopher Tee
5171ffc026
community(you): Integrate You.com conversational APIs (#23046)
You.com is releasing two new conversational APIs — Smart and Research. 

This PR:
- integrates those APIs with Langchain, as an LLM
- streaming is supported

If no one reviews your PR within a few days, please @-mention one of
baskaryan, efriis, eyurtsev, ccurme, vbarda, hwchase17.
2024-07-15 17:46:58 -04:00
maang-h
6c7d9f93b9
feat: Add ChatTongyi structured output (#24187)
- **Description:** Add `with_structured_output` method to ChatTongyi to
support structured output.
2024-07-15 15:57:21 -04:00
maang-h
9d97de34ae
community[patch]: Improve ChatBaichuan init args and role (#23878)
- **Description:** Improve ChatBaichuan init args and role
   -  ChatBaichuan adds `system` role
   - alias: `baichuan_api_base` -> `base_url`
   - `with_search_enhance` is deprecated
   - Add `max_tokens` argument
2024-07-15 15:17:00 -04:00
Harold Martin
ccdaf14eff
docs: Spell check fixes (#24217)
**Description:** Spell check fixes for docs, comments, and a couple of
strings. No code change e.g. variable names.
**Issue:** none
**Dependencies:** none
**Twitter handle:** hmartin
2024-07-15 15:51:43 +00:00
Bagatur
5fd1e67808
core[minor], integrations...[patch]: Support ToolCall as Tool input and ToolMessage as Tool output (#24038)
Changes:
- ToolCall, InvalidToolCall and ToolCallChunk can all accept a "type"
parameter now
- LLM integration packages add "type" to all the above
- Tool supports ToolCall inputs that have "type" specified
- Tool outputs ToolMessage when a ToolCall is passed as input
- Tools can separately specify ToolMessage.content and
ToolMessage.raw_output
- Tools emit events for validation errors (using on_tool_error and
on_tool_end)

Example:
```python
@tool("structured_api", response_format="content_and_raw_output")
def _mock_structured_tool_with_raw_output(
    arg1: int, arg2: bool, arg3: Optional[dict] = None
) -> Tuple[str, dict]:
    """A Structured Tool"""
    return f"{arg1} {arg2}", {"arg1": arg1, "arg2": arg2, "arg3": arg3}


def test_tool_call_input_tool_message_with_raw_output() -> None:
    tool_call: Dict = {
        "name": "structured_api",
        "args": {"arg1": 1, "arg2": True, "arg3": {"img": "base64string..."}},
        "id": "123",
        "type": "tool_call",
    }
    expected = ToolMessage("1 True", raw_output=tool_call["args"], tool_call_id="123")
    tool = _mock_structured_tool_with_raw_output
    actual = tool.invoke(tool_call)
    assert actual == expected

    tool_call.pop("type")
    with pytest.raises(ValidationError):
        tool.invoke(tool_call)

    actual_content = tool.invoke(tool_call["args"])
    assert actual_content == expected.content
```

---------

Co-authored-by: Erick Friis <erick@langchain.dev>
2024-07-11 14:54:02 -07:00
Matt
8327925ab7
community:support additional Azure Search Options (#24134)
- **Description:** Support additional kwargs options for the Azure
Search client (Described here
https://github.com/Azure/azure-sdk-for-python/blob/main/sdk/core/azure-core/README.md#configurations)
    - **Issue:** N/A
    - **Dependencies:** No additional Dependencies

---------
2024-07-11 18:22:36 +00:00
Christophe Bornet
5fc5ef2b52
community[minor]: Add graph store extractors (#24065)
This adds an extractor interface and an implementation for HTML pages.
Extractors are used to create GraphVectorStore Links on loaded content.

**Twitter handle:** cbornet_
2024-07-11 10:35:31 -04:00
Eugene Yurtsev
2c180d645e
core[minor],community[minor]: Upgrade all @root_validator() to @pre_init (#23841)
This PR introduces a @pre_init decorator that's a @root_validator(pre=True) but with all the defaults populated!
2024-07-08 16:09:29 -04:00
Rajendra Kadam
ee8aa54f53
community[patch]: Fix source path mismatch in PebbloSafeLoader (#23857)
**Description:** Fix for source path mismatch in PebbloSafeLoader. The
fix involves storing the full path in the doc metadata in VectorDB
**Issue:** NA, caught in internal testing
**Dependencies:** NA
**Add tests**:  Updated tests
2024-07-05 15:24:17 -04:00
Christophe Bornet
42d049f618
core[minor]: Add Graph Store component (#23092)
This PR introduces a GraphStore component. GraphStore extends
VectorStore with the concept of links between documents based on
document metadata. This allows linking documents based on a variety of
techniques, including common keywords, explicit links in the content,
and other patterns.

This works with existing Documents, so it’s easy to extend existing
VectorStores to be used as GraphStores. The interface can be implemented
for any Vector Store technology that supports metadata, not only graph
DBs.

When retrieving documents for a given query, the first level of search
is done using classical similarity search. Next, links may be followed
using various traversal strategies to get additional documents. This
allows documents to be retrieved that aren’t directly similar to the
query but contain relevant information.

2 retrieving methods are added to the VectorStore ones : 
* traversal_search which gets all linked documents up to a certain depth
* mmr_traversal_search which selects linked documents using an MMR
algorithm to have more diverse results.

If a depth of retrieval of 0 is used, GraphStore is effectively a
VectorStore. It enables an easy transition from a simple VectorStore to
GraphStore by adding links between documents as a second step.

An implementation for Apache Cassandra is also proposed.

See
https://github.com/datastax/ragstack-ai/blob/main/libs/knowledge-store/notebooks/astra_support.ipynb
for a notebook explaining how to use GraphStore and that shows that it
can answer correctly to questions that a simple VectorStore cannot.

**Twitter handle:** _cbornet
2024-07-05 12:24:10 -04:00
Eugene Yurtsev
6f08e11d7c
core[minor]: add upsert, streaming_upsert, aupsert, astreaming_upsert methods to the VectorStore abstraction (#23774)
This PR rolls out part of the new proposed interface for vectorstores
(https://github.com/langchain-ai/langchain/pull/23544) to existing store
implementations.

The PR makes the following changes:

1. Adds standard upsert, streaming_upsert, aupsert, astreaming_upsert
methods to the vectorstore.
2. Updates `add_texts` and `aadd_texts` to be non required with a
default implementation that delegates to `upsert` and `aupsert` if those
have been implemented. The original `add_texts` and `aadd_texts` methods
are problematic as they spread object specific information across
document and **kwargs. (e.g., ids are not a part of the document)
3. Adds a default implementation to `add_documents` and `aadd_documents`
that delegates to `upsert` and `aupsert` respectively.
4. Adds standard unit tests to verify that a given vectorstore
implements a correct read/write API.

A downside of this implementation is that it creates `upsert` with a
very similar signature to `add_documents`.
The reason for introducing `upsert` is to:
* Remove any ambiguities about what information is allowed in `kwargs`.
Specifically kwargs should only be used for information common to all
indexed data. (e.g., indexing timeout).
*Allow inheriting from an anticipated generalized interface for indexing
that will allow indexing `BaseMedia` (i.e., allow making a vectorstore
for images/audio etc.)
 
`add_documents` can be deprecated in the future in favor of `upsert` to
make sure that users have a single correct way of indexing content.

---------

Co-authored-by: ccurme <chester.curme@gmail.com>
2024-07-05 12:21:40 -04:00
André Quintino
99b1467b63
community: add support for 'cloud' parameter in JiraAPIWrapper (#23057)
- **Description:** Enhance JiraAPIWrapper to accept the 'cloud'
parameter through an environment variable. This update allows more
flexibility in configuring the environment for the Jira API.
 - **Twitter handle:** Andre_Q_Pereira

---------

Co-authored-by: André Quintino <andre.quintino@tui.com>
Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>
Co-authored-by: Chester Curme <chester.curme@gmail.com>
2024-07-05 15:11:10 +00:00
volodymyr-memsql
a4eb6d0fb1
community: add SingleStoreDB semantic cache (#23218)
This PR adds a `SingleStoreDBSemanticCache` class that implements a
cache based on SingleStoreDB vector store, integration tests, and a
notebook example.

Additionally, this PR contains minor changes to SingleStoreDB vector
store:
 - change add texts/documents methods to return a list of inserted ids
 - implement delete(ids) method to delete documents by list of ids
 - added drop() method to drop a correspondent database table
- updated integration tests to use and check functionality implemented
above


CC: @baskaryan, @hwchase17

---------

Co-authored-by: Volodymyr Tkachuk <vtkachuk-ua@singlestore.com>
2024-07-05 09:26:06 -04:00
Ikko Eltociear Ashimine
75734fbcf1
community: fix typo in unit tests for test_zenguard.py (#23819)
enviroment -> environment


- [x] **PR title**: "package: description"
- Where "package" is whichever of langchain, community, core,
experimental, etc. is being modified. Use "docs: ..." for purely docs
changes, "templates: ..." for template changes, "infra: ..." for CI
changes.
  - Example: "community: add foobar LLM"
2024-07-03 14:05:42 -04:00
Bagatur
a0c2281540
infra: update mypy 1.10, ruff 0.5 (#23721)
```python
"""python scripts/update_mypy_ruff.py"""
import glob
import tomllib
from pathlib import Path

import toml
import subprocess
import re

ROOT_DIR = Path(__file__).parents[1]


def main():
    for path in glob.glob(str(ROOT_DIR / "libs/**/pyproject.toml"), recursive=True):
        print(path)
        with open(path, "rb") as f:
            pyproject = tomllib.load(f)
        try:
            pyproject["tool"]["poetry"]["group"]["typing"]["dependencies"]["mypy"] = (
                "^1.10"
            )
            pyproject["tool"]["poetry"]["group"]["lint"]["dependencies"]["ruff"] = (
                "^0.5"
            )
        except KeyError:
            continue
        with open(path, "w") as f:
            toml.dump(pyproject, f)
        cwd = "/".join(path.split("/")[:-1])
        completed = subprocess.run(
            "poetry lock --no-update; poetry install --with typing; poetry run mypy . --no-color",
            cwd=cwd,
            shell=True,
            capture_output=True,
            text=True,
        )
        logs = completed.stdout.split("\n")

        to_ignore = {}
        for l in logs:
            if re.match("^(.*)\:(\d+)\: error:.*\[(.*)\]", l):
                path, line_no, error_type = re.match(
                    "^(.*)\:(\d+)\: error:.*\[(.*)\]", l
                ).groups()
                if (path, line_no) in to_ignore:
                    to_ignore[(path, line_no)].append(error_type)
                else:
                    to_ignore[(path, line_no)] = [error_type]
        print(len(to_ignore))
        for (error_path, line_no), error_types in to_ignore.items():
            all_errors = ", ".join(error_types)
            full_path = f"{cwd}/{error_path}"
            try:
                with open(full_path, "r") as f:
                    file_lines = f.readlines()
            except FileNotFoundError:
                continue
            file_lines[int(line_no) - 1] = (
                file_lines[int(line_no) - 1][:-1] + f"  # type: ignore[{all_errors}]\n"
            )
            with open(full_path, "w") as f:
                f.write("".join(file_lines))

        subprocess.run(
            "poetry run ruff format .; poetry run ruff --select I --fix .",
            cwd=cwd,
            shell=True,
            capture_output=True,
            text=True,
        )


if __name__ == "__main__":
    main()

```
2024-07-03 10:33:27 -07:00
maang-h
525109e506
feat: Implement ChatBaichuan asynchronous interface (#23589)
- **Description:** Add interface to `ChatBaichuan` to support
asynchronous requests
    - `_agenerate` method
    - `_astream` method

---------

Co-authored-by: ccurme <chester.curme@gmail.com>
2024-07-03 12:10:04 -04:00
maang-h
e4e28a6ff5
community[patch]: Fix MiniMaxChat validate_environment error (#23770)
- **Description:** Fix some issues in MiniMaxChat 
  - Fix `minimax_api_host` not in `values` error
- Remove `minimax_group_id` from reading environment variables, the
`minimax_group_id` no longer use in MiniMaxChat
  - Invoke callback prior to yielding token, the issus #16913
2024-07-02 13:23:32 -04:00
Jacob Lee
7791d92711
community[patch]: Fix requests alias for load_tools (#23734)
CC @baskaryan
2024-07-01 15:02:14 -07:00
Bagatur
381aedcc61
docs: standardize azure openai page (#23642)
part of #22296
2024-06-28 15:15:41 -07:00
Vadym Barda
e8d77002ea
core: add RemoveMessage (#23636)
This change adds a new message type `RemoveMessage`. This will enable
`langgraph` users to manually modify graph state (or have the graph
nodes modify the state) to remove messages by `id`

Examples:

* allow users to delete messages from state by calling

```python
graph.update_state(config, values=[RemoveMessage(id=state.values[-1].id)])
```

* allow nodes to delete messages

```python
graph.add_node("delete_messages", lambda state: [RemoveMessage(id=state[-1].id)])
```
2024-06-28 14:40:02 -07:00
Eugene Yurtsev
68f348357e
community[patch]: Test InMemoryVectorStore with RWAPI test suite (#23603)
Add standard test suite to InMemoryVectorStore implementation.
2024-06-27 16:43:43 -04:00
mackong
70834cd741
community[patch]: support convert FunctionMessage for Tongyi (#23569)
**Description:** For function call agent with Tongyi, cause the
AgentAction will be converted to FunctionMessage by

47f69fe0d8/libs/core/langchain_core/agents.py (L188)
But now Tongyi's *convert_message_to_dict* doesn't support
FunctionMessage

47f69fe0d8/libs/community/langchain_community/chat_models/tongyi.py (L184-L207)
Then next round conversation will be failed by the *TypeError*
exception.

This patch adds the support to convert FunctionMessage for Tongyi.

**Issue:** N/A
**Dependencies:** N/A
2024-06-27 15:49:26 -04:00
Nuradil
c93d9e66e4
Community: Update and fix ZenGuardTool docs and add ZenguardTool to init files (#23415)
Thank you for contributing to LangChain!

- [x] **PR title**: "community: update docs and add tool to init.py"

- [x] **PR message**: 
- **Description:** Fixed some errors and comments in the docs and added
our ZenGuardTool and additional classes to init.py for easy access when
importing
- **Question:** when will you update the langchain-community package in
pypi to make our tool available?


- [x] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified. See contribution
guidelines for more: https://python.langchain.com/docs/contributing/

Thank you for review!

---------

Co-authored-by: Baur <baur.krykpayev@gmail.com>
2024-06-25 19:26:32 +00:00
yuncliu
398b2b9c51
community[minor]: Add Ascend NPU optimized Embeddings (#20260)
- **Description:** Add NPU support for embeddings

---------

Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
Co-authored-by: Bagatur <baskaryan@gmail.com>
2024-06-24 20:15:11 +00:00
Tomaz Bratanic
aeeda370aa
Sanitize backticks from neo4j labels and types for import (#23367) 2024-06-24 19:05:31 +00:00
Rave Harpaz
f5ff7f178b
Add OCI Generative AI new model support (#22880)
- [x] PR title: 
community: Add OCI Generative AI new model support
 
- [x] PR message:
- Description: adding support for new models offered by OCI Generative
AI services. This is a moderate update of our initial integration PR
16548 and includes a new integration for our chat models under
/langchain_community/chat_models/oci_generative_ai.py
    - Issue: NA
- Dependencies: No new Dependencies, just latest version of our OCI sdk
    - Twitter handle: NA


- [x] Add tests and docs: 
  1. we have updated our unit tests
2. we have updated our documentation including a new ipynb for our new
chat integration


- [x] Lint and test: 
 `make format`, `make lint`, and `make test` run successfully

---------

Co-authored-by: RHARPAZ <RHARPAZ@RHARPAZ-5750.us.oracle.com>
Co-authored-by: Arthur Cheng <arthur.cheng@oracle.com>
2024-06-24 14:48:23 -04:00
Baur
aa358f2be4
community: Add ZenGuard tool (#22959)
** Description**
This is the community integration of ZenGuard AI - the fastest
guardrails for GenAI applications. ZenGuard AI protects against:

- Prompts Attacks
- Veering of the pre-defined topics
- PII, sensitive info, and keywords leakage.
- Toxicity
- Etc.

**Twitter Handle** : @zenguardai

- [x] **Add tests and docs**: If you're adding a new integration, please
include
  1. Added an integration test
  2. Added colab


- [x] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified.

---------

Co-authored-by: Nuradil <nuradil.maksut@icloud.com>
Co-authored-by: Nuradil <133880216+yaksh0nti@users.noreply.github.com>
2024-06-24 17:40:56 +00:00
Mathis Joffre
60103fc4a5
community: Fix OVHcloud 401 Unauthorized on embedding. (#23260)
They are now rejecting with code 401 calls from users with expired or
invalid tokens (while before they were being considered anonymous).
Thus, the authorization header has to be removed when there is no token.

Related to: #23178

---------

Signed-off-by: Joffref <mariusjoffre@gmail.com>
2024-06-24 12:58:32 -04:00
maang-h
bc4cd9c5cc
community[patch]: Update root_validators ChatModels: ChatBaichuan, QianfanChatEndpoint, MiniMaxChat, ChatSparkLLM, ChatZhipuAI (#22853)
This PR updates root validators for:

- ChatModels: ChatBaichuan, QianfanChatEndpoint, MiniMaxChat,
ChatSparkLLM, ChatZhipuAI

Issues #22819

---------

Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>
2024-06-20 16:36:41 +00:00
Michał Krassowski
710197e18c
community[patch]: restore compatibility with SQLAlchemy 1.x (#22546)
- **Description:** Restores compatibility with SQLAlchemy 1.4.x that was
broken since #18992 and adds a test run for this version on CI (only for
Python 3.11)
- **Issue:** fixes #19681
- **Dependencies:** None
- **Twitter handle:** `@krassowski_m`

---------

Co-authored-by: Erick Friis <erick@langchain.dev>
2024-06-19 17:58:57 +00:00
ccurme
ca798bc6ea
community: move test to integration tests (#23178)
Tests failing on master with

> FAILED
tests/unit_tests/embeddings/test_ovhcloud.py::test_ovhcloud_embed_documents
- ValueError: Request failed with status code: 401, {"message":"Bad
token; invalid JSON"}
2024-06-19 14:39:48 +00:00
Finlay Macklon
616d06d7fe
community: glob multiple patterns when using DirectoryLoader (#22852)
- **Description:** Updated
*community.langchain_community.document_loaders.directory.py* to enable
the use of multiple glob patterns in the `DirectoryLoader` class. Now,
the glob parameter is of type `list[str] | str` and still defaults to
the same value as before. I updated the docstring of the class to
reflect this, and added a unit test to
*community.tests.unit_tests.document_loaders.test_directory.py* named
`test_directory_loader_glob_multiple`. This test also shows an example
of how to use the new functionality.
- ~~Issue:~~**Discussion Thread:**
https://github.com/langchain-ai/langchain/discussions/18559
- **Dependencies:** None
- **Twitter handle:** N/a

- [x] **Add tests and docs**
    - Added test (described above)
    - Updated class docstring

- [x] **Lint and test**

---------

Co-authored-by: isaac hershenson <ihershenson@hmc.edu>
Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
Co-authored-by: Isaac Francisco <78627776+isahers1@users.noreply.github.com>
2024-06-18 09:24:50 -07:00
Raghav Dixit
55705c0f5e
LanceDB integration update (#22869)
Added : 

- [x] relevance search (w/wo scores)
- [x] maximal marginal search
- [x] image ingestion
- [x] filtering support
- [x] hybrid search w reranking 

make test, lint_diff and format checked.
2024-06-17 20:54:26 -07:00
Chang Liu
62c8a67f56
community: add KafkaChatMessageHistory (#22216)
Add chat history store based on Kafka.

Files added: 
`libs/community/langchain_community/chat_message_histories/kafka.py`
`docs/docs/integrations/memory/kafka_chat_message_history.ipynb`

New issue to be created for future improvement:
1. Async method implementation.
2. Message retrieval based on timestamp.
3. Support for other configs when connecting to cloud hosted Kafka (e.g.
add `api_key` field)
4. Improve unit testing & integration testing.
2024-06-17 20:34:01 -07:00
Oguz Vuruskaner
dd25d08c06
community[minor]: add tool calling for DeepInfraChat (#22745)
DeepInfra now supports tool calling for supported models.

---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
2024-06-17 15:21:49 -04:00
maang-h
c6b7db6587
community: Add Baichuan Embeddings batch size (#22942)
- **Support batch size** 
Baichuan updates the document, indicating that up to 16 documents can be
imported at a time

- **Standardized model init arg names**
    - baichuan_api_key -> api_key
    - model_name  -> model
2024-06-17 14:11:04 -04:00
Shubham Pandey
56ac94e014
community[minor]: add ChatSnowflakeCortex chat model (#21490)
**Description:** This PR adds a chat model integration for [Snowflake
Cortex](https://docs.snowflake.com/en/user-guide/snowflake-cortex/llm-functions),
which gives an instant access to industry-leading large language models
(LLMs) trained by researchers at companies like Mistral, Reka, Meta, and
Google, including [Snowflake
Arctic](https://www.snowflake.com/en/data-cloud/arctic/), an open
enterprise-grade model developed by Snowflake.

**Dependencies:** Snowflake's
[snowpark](https://pypi.org/project/snowflake-snowpark-python/) library
is required for using this integration.

**Twitter handle:** [@gethouseware](https://twitter.com/gethouseware)

- [x] **Add tests and docs**:
1. integration tests:
`libs/community/tests/integration_tests/chat_models/test_snowflake.py`
2. unit tests:
`libs/community/tests/unit_tests/chat_models/test_snowflake.py`
  3. example notebook: `docs/docs/integrations/chat/snowflake.ipynb`


- [x] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified. See contribution
guidelines for more: https://python.langchain.com/docs/contributing/
2024-06-17 09:47:05 -07:00
Bitmonkey
570d45b2a1
Update ollama.py with optional raw setting. (#21486)
Ollama has a raw option now. 

https://github.com/ollama/ollama/blob/main/docs/api.md

Thank you for contributing to LangChain!

- [ ] **PR title**: "package: description"
- Where "package" is whichever of langchain, community, core,
experimental, etc. is being modified. Use "docs: ..." for purely docs
changes, "templates: ..." for template changes, "infra: ..." for CI
changes.
  - Example: "community: add foobar LLM"


- [ ] **PR message**: ***Delete this entire checklist*** and replace
with
    - **Description:** a description of the change
    - **Issue:** the issue # it fixes, if applicable
    - **Dependencies:** any dependencies required for this change
- **Twitter handle:** if your PR gets announced, and you'd like a
mention, we'll gladly shout you out!


- [ ] **Add tests and docs**: If you're adding a new integration, please
include
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory.


- [ ] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified. See contribution
guidelines for more: https://python.langchain.com/docs/contributing/

Additional guidelines:
- Make sure optional dependencies are imported within a function.
- Please do not add dependencies to pyproject.toml files (even optional
ones) unless they are required for unit tests.
- Most PRs should not touch more than one package.
- Changes should be backwards compatible.
- If you are adding something to community, do not re-import it in
langchain.

If no one reviews your PR within a few days, please @-mention one of
baskaryan, efriis, eyurtsev, hwchase17.

---------

Co-authored-by: Isaac Francisco <78627776+isahers1@users.noreply.github.com>
Co-authored-by: isaac hershenson <ihershenson@hmc.edu>
2024-06-14 17:19:26 -07:00
Eugene Yurtsev
c72bcda4f2
community[major], experimental[patch]: Remove Python REPL from community (#22904)
Remove the REPL from community, and suggest an alternative import from
langchain_experimental.

Fix for this issue:
https://github.com/langchain-ai/langchain/issues/14345

This is not a bug in the code or an actual security risk. The python
REPL itself is behaving as expected.

The PR is done to appease blanket security policies that are just
looking for the presence of exec in the code.

---------

Co-authored-by: Erick Friis <erick@langchain.dev>
2024-06-14 17:53:29 +00:00
Thanh Nguyen
b5e2ba3a47
community[minor]: add chat model llamacpp (#22589)
- **PR title**: [community] add chat model llamacpp


- **PR message**:
- **Description:** This PR introduces a new chat model integration with
llamacpp_python, designed to work similarly to the existing ChatOpenAI
model.
      + Work well with instructed chat, chain and function/tool calling.
+ Work with LangGraph (persistent memory, tool calling), will update
soon

- **Dependencies:** This change requires the llamacpp_python library to
be installed.
    
@baskaryan

---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
2024-06-14 14:51:43 +00:00
maang-h
1055b9a309
community[minor]: Implement ZhipuAIEmbeddings interface (#22821)
- **Description:** Implement ZhipuAIEmbeddings interface, include:
     - The `embed_query` method
     - The `embed_documents` method

refer to [ZhipuAI
Embedding-2](https://open.bigmodel.cn/dev/api#text_embedding)

---------

Co-authored-by: Eugene Yurtsev <eugene@langchain.dev>
2024-06-13 19:45:11 -07:00
Christophe Bornet
d04e899b56
ci: add testing with Python 3.12 (#22813)
We need to use a different version of numpy for py3.8 and py3.12 in
pyproject.
And so do projects that use that Python version range and import
langchain.

    - **Twitter handle:** _cbornet
2024-06-12 16:31:36 -04:00
Philippe PRADOS
23c22fcbc9
langchain[minor]: Make EmbeddingsFilters async (#22737)
Add native async implementation for EmbeddingsFilter
2024-06-12 12:27:26 -04:00
Mr. Lance E Sloan «UMich»
84dc2dd059
community[patch]: Load YouTube transcripts (captions) as fixed-duration chunks with start times (#21710)
- **Description:** Add a new format, `CHUNKS`, to
`langchain_community.document_loaders.youtube.YoutubeLoader` which
creates multiple `Document` objects from YouTube video transcripts
(captions), each of a fixed duration. The metadata of each chunk
`Document` includes the start time of each one and a URL to that time in
the video on the YouTube website.
  
I had implemented this for UMich (@umich-its-ai) in a local module, but
it makes sense to contribute this to LangChain community for all to
benefit and to simplify maintenance.

- **Issue:** N/A
- **Dependencies:** N/A
- **Twitter:** lsloan_umich
- **Mastodon:**
[lsloan@mastodon.social](https://mastodon.social/@lsloan)

With regards to **tests and documentation**, most existing features of
the `YoutubeLoader` class are not tested. Only the
`YoutubeLoader.extract_video_id()` static method had a test. However,
while I was waiting for this PR to be reviewed and merged, I had time to
add a test for the chunking feature I've proposed in this PR.

I have added an example of using chunking to the
`docs/docs/integrations/document_loaders/youtube_transcript.ipynb`
notebook.

---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
2024-06-11 17:44:36 +00:00
Aayush Kataria
71811e0547
community[minor]: Adds a vector store for Azure Cosmos DB for NoSQL (#21676)
This PR add supports for Azure Cosmos DB for NoSQL vector store.

Summary:

Description: added vector store integration for Azure Cosmos DB for
NoSQL Vector Store,
Dependencies: azure-cosmos dependency,
Tag maintainer: @hwchase17, @baskaryan @efriis @eyurtsev

---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
2024-06-11 10:34:01 -07:00
am-kinetica
ad101adec8
community[patch]: Kinetica Integrations handled error in querying; quotes in table names; updated gpudb API (#22724)
- [ ] **Miscellaneous updates and fixes**: 
- **Description:** Handled error in querying; quotes in table names;
updated gpudb API
- **Issue:** Threw an error with an error message difficult to
understand if a query failed or returned no records
    - **Dependencies:** Updated GPUDB API version to `7.2.0.9`


@baskaryan @hwchase17
2024-06-11 10:01:26 -04:00
Mathis Joffre
ea43f40daf
community[minor]: Add support for OVHcloud AI Endpoints Embedding (#22667)
**Description:** Add support for [OVHcloud AI
Endpoints](https://endpoints.ai.cloud.ovh.net/) Embedding models.

Inspired by:
https://gist.github.com/gmasse/e1f99339e161f4830df6be5d0095349a

Signed-off-by: Joffref <mariusjoffre@gmail.com>
2024-06-10 21:07:25 +00:00
Eugene Yurtsev
05d31a2f00
community[patch]: Add missing type annotations (#22758)
Add missing type annotations to objects in community.
These missing type annotations will raise type errors in pydantic 2.
2024-06-10 16:59:28 -04:00
Tomaz Bratanic
76a193decc
community[patch]: Add function response to graph cypher qa chain (#22690)
LLMs struggle with Graph RAG, because it's different from vector RAG in
a way that you don't provide the whole context, only the answer and the
LLM has to believe. However, that doesn't really work a lot of the time.
However, if you wrap the context as function response the accuracy is
much better.

btw... `union[LLMChain, Runnable]` is linting fun, that's why so many
ignores
2024-06-10 13:52:17 -07:00
X-HAN
34edfe4a16
community[minor]: add Volcengine Rerank (#22700)
**Description:** this PR adds Volcengine Rerank capability to Langchain,
you can find Volcengine Rerank API from
[here](https://www.volcengine.com/docs/84313/1254474) &
[here](https://www.volcengine.com/docs/84313/1254605).
[Volcengine](https://www.volcengine.com/) is a cloud service platform
developed by ByteDance, the parent company of TikTok. You can obtain
Volcengine API AK/SK from
[here](https://www.volcengine.com/docs/84313/1254553).

**Dependencies:** VolcengineRerank depends on `volcengine` python
package.

**Twitter handle:** my twitter/x account is https://x.com/LastMonopoly
and I'd like a mention, thank you!


**Tests and docs**
  1. integration test: `test_volcengine_rerank.py`
  2. example notebook: `volcengine_rerank.ipynb`

**Lint and test**: I have run `make format`, `make lint` and `make test`
from the root of the package I've modified.
2024-06-10 13:41:05 -07:00
Max Mulatz
058a64c563
Community[minor]: Add language parser for Elixir (#22742)
Hi 👋 

First off, thanks a ton for your work on this 💚 Really appreciate what
you're providing here for the community.

## Description

This PR adds a basic language parser for the
[Elixir](https://elixir-lang.org/) programming language. The parser code
is based upon the approach outlined in
https://github.com/langchain-ai/langchain/pull/13318: it's using
`tree-sitter` under the hood and aligns with all the other `tree-sitter`
based parses added that PR.

The `CHUNK_QUERY` I'm using here is probably not the most sophisticated
one, but it worked for my application. It's a starting point to provide
"core" parsing support for Elixir in LangChain. It enables people to use
the language parser out in real world applications which may then lead
to further tweaking of the queries. I consider this PR just the ground
work.

- **Dependencies:** requires `tree-sitter` and `tree-sitter-languages`
from the extended dependencies
- **Twitter handle:**`@bitcrowd`

## Checklist

- [x] **PR title**: "package: description"
- [x] **Add tests and docs**
- [x] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified.

<!-- If no one reviews your PR within a few days, please @-mention one
of baskaryan, efriis, eyurtsev, ccurme, vbarda, hwchase17. -->
2024-06-10 15:56:57 +00:00
Philippe PRADOS
9aabb446c5
community[minor]: Add SQL storage implementation (#22207)
Hello @eyurtsev

- package: langchain-comminity
- **Description**: Add SQL implementation for docstore. A new
implementation, in line with my other PR ([async
PGVector](https://github.com/langchain-ai/langchain-postgres/pull/32),
[SQLChatMessageMemory](https://github.com/langchain-ai/langchain/pull/22065))
- Twitter handler: pprados

---------

Signed-off-by: ChengZi <chen.zhang@zilliz.com>
Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
Co-authored-by: Piotr Mardziel <piotrm@gmail.com>
Co-authored-by: ChengZi <chen.zhang@zilliz.com>
Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>
2024-06-07 21:17:02 +00:00
Cahid Arda Öz
6c07eb0c12
community[minor]: Add UpstashRatelimitHandler (#21885)
Adding `UpstashRatelimitHandler` callback for rate limiting based on
number of chain invocations or LLM token usage.

For more details, see [upstash/ratelimit-py
repository](https://github.com/upstash/ratelimit-py) or the notebook
guide included in this PR.

Twitter handle: @cahidarda

---------

Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>
2024-06-07 21:02:06 +00:00
Erick Friis
a24a9c6427
multiple: get rid of pyproject extras (#22581)
They cause `poetry lock` to take a ton of time, and `uv pip install` can
resolve the constraints from these toml files in trivial time
(addressing problem with #19153)

This allows us to properly upgrade lockfile dependencies moving forward,
which revealed some issues that were either fixed or type-ignored (see
file comments)
2024-06-06 15:45:22 -07:00
Isaac Francisco
ba3e219d83
community[patch]: recursive url loader fix and unit tests (#22521)
Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
Co-authored-by: Bagatur <baskaryan@gmail.com>
2024-06-05 17:56:20 -07:00
X-HAN
62f13f95e4
community[minor]: add DashScope Rerank (#22403)
**Description:** this PR adds DashScope Rerank capability to Langchain,
you can find DashScope Rerank API from
[here](https://help.aliyun.com/document_detail/2780058.html?spm=a2c4g.2780059.0.0.6d995024FlrJ12)
&
[here](https://help.aliyun.com/document_detail/2780059.html?spm=a2c4g.2780058.0.0.63f75024cr11N9).
[DashScope](https://dashscope.aliyun.com/) is the generative AI service
from Alibaba Cloud (Aliyun). You can create DashScope API key from
[here](https://bailian.console.aliyun.com/?apiKey=1#/api-key).

**Dependencies:** DashScopeRerank depends on `dashscope` python package.

**Twitter handle:** my twitter/x account is https://x.com/LastMonopoly
and I'd like a mention, thanks you!


**Tests and docs**
  1. integration test: `test_dashscope_rerank.py`
  2. example notebook: `dashscope_rerank.ipynb`

**Lint and test**: I have run `make format`, `make lint` and `make test`
from the root of the package I've modified.

---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
2024-06-05 15:40:21 -07:00
leila-messallem
3280a5b49b
community[patch]: improve test setup to accurately test filtering of labels in neo4j (#22531)
**Description:** This PR addresses an issue with an existing test that
was not effectively testing the intended functionality. The previous
test setup did not adequately validate the filtering of the labels in
neo4j, because the nodes and relationship in the test data did not have
any properties set. Without properties these labels would not have been
returned, regardless of the filtering.

---------

Co-authored-by: Oskar Hane <oh@oskarhane.com>
2024-06-05 15:56:53 +00:00
Stefano Lottini
328d0c99f2
community[minor]: Add support for metadata indexing policy in Cassandra vector store (#22548)
This PR adds a constructor `metadata_indexing` parameter to the
Cassandra vector store to allow optional fine-tuning of which fields of
the metadata are to be indexed.

This is a feature supported by the underlying CassIO library. Indexing
mode of "all", "none" or deny- and allow-list based choices are
available.

The rationale is, in some cases it's advisable to programmatically
exclude some portions of the metadata from the index if one knows in
advance they won't ever be used at search-time. this keeps the index
more lightweight and performant and avoids limitations on the length of
_indexed_ strings.

I added a integration test of the feature. I also added the possibility
of running the integration test with Cassandra on an arbitrary IP
address (e.g. Dockerized), via
`CASSANDRA_CONTACT_POINTS=10.1.1.5,10.1.1.6 poetry run pytest [...]` or
similar.

While I was at it, I added a line to the `.gitignore` since the mypy
_test_ cache was not ignored yet.

My X (Twitter) handle: @rsprrs.
2024-06-05 11:23:26 -04:00
Philippe PRADOS
8250c177de
community[minor]: Add native async support to SQLChatMessageHistory (#22065)
# package community: Fix SQLChatMessageHistory

## Description
Here is a rewrite of `SQLChatMessageHistory` to properly implement the
asynchronous approach. The code circumvents [issue
22021](https://github.com/langchain-ai/langchain/issues/22021) by
accepting a synchronous call to `def add_messages()` in an asynchronous
scenario. This bypasses the bug.

For the same reasons as in [PR
22](https://github.com/langchain-ai/langchain-postgres/pull/32) of
`langchain-postgres`, we use a lazy strategy for table creation. Indeed,
the promise of the constructor cannot be fulfilled without this. It is
not possible to invoke a synchronous call in a constructor. We
compensate for this by waiting for the next asynchronous method call to
create the table.

The goal of the `PostgresChatMessageHistory` class (in
`langchain-postgres`) is, among other things, to be able to recycle
database connections. The implementation of the class is problematic, as
we have demonstrated in [issue
22021](https://github.com/langchain-ai/langchain/issues/22021).

Our new implementation of `SQLChatMessageHistory` achieves this by using
a singleton of type (`Async`)`Engine` for the database connection. The
connection pool is managed by this singleton, and the code is then
reentrant.

We also accept the type `str` (optionally complemented by `async_mode`.
I know you don't like this much, but it's the only way to allow an
asynchronous connection string).

In order to unify the different classes handling database connections,
we have renamed `connection_string` to `connection`, and `Session` to
`session_maker`.

Now, a single transaction is used to add a list of messages. Thus, a
crash during this write operation will not leave the database in an
unstable state with a partially added message list. This makes the code
resilient.

We believe that the `PostgresChatMessageHistory` class is no longer
necessary and can be replaced by:
```
PostgresChatMessageHistory = SQLChatMessageHistory
```
This also fixes the bug.


## Issue
- [issue 22021](https://github.com/langchain-ai/langchain/issues/22021)
  - Bug in _exit_history()
  - Bugs in PostgresChatMessageHistory and sync usage
  - Bugs in PostgresChatMessageHistory and async usage
- [issue
36](https://github.com/langchain-ai/langchain-postgres/issues/36)
 ## Twitter handle:
pprados

## Tests
- libs/community/tests/unit_tests/chat_message_histories/test_sql.py
(add async test)

@baskaryan, @eyurtsev or @hwchase17 can you check this PR ?
And, I've been waiting a long time for validation from other PRs. Can
you take a look?
- [PR 32](https://github.com/langchain-ai/langchain-postgres/pull/32)
- [PR 15575](https://github.com/langchain-ai/langchain/pull/15575)
- [PR 13200](https://github.com/langchain-ai/langchain/pull/13200)

---------

Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>
2024-06-05 15:10:38 +00:00
Vincent Min
59bef31997
community[minor]: Improve InMemoryVectorStore with ability to persist to disk and filter on metadata. (#22186)
- **Description:** The InMemoryVectorStore is a nice and simple vector
store implementation for quick development and debugging. The current
implementation is quite limited in its functionalities. This PR extends
the functionalities by adding utility function to persist the vector
store to a json file and to load it from a json file. We choose the json
file format because it allows inspection of the database contents in a
text editor, which is great for debugging. Furthermore, it adds a
`filter` keyword that can be used to filter out documents on their
`page_content` or `metadata`.
- **Issue:** -
- **Dependencies:** -
- **Twitter handle:** @Vincent_Min
2024-06-05 10:40:34 -04:00
Ofer Mendelevitch
ad502e8d50
community[minor]: Vectara Integration Update - Streaming, FCS, Chat, updates to documentation and example notebooks (#21334)
Thank you for contributing to LangChain!

**Description:** update to the Vectara / Langchain integration to
integrate new Vectara capabilities:
- Full RAG implemented as a Runnable with as_rag()
- Vectara chat supported with as_chat()
- Both support streaming response
- Updated documentation and example notebook to reflect all the changes
- Updated Vectara templates

**Twitter handle:** ofermend

**Add tests and docs**: no new tests or docs, but updated both existing
tests and existing docs
2024-06-04 12:57:28 -07:00
Joydeep Banik Roy
3796672c67
community, milvus, pinecone, qdrant, mongo: Broadcast operation failure while using simsimd beyond v3.7.7 (#22271)
- [ ] **Packages affected**: 
  - community: fix `cosine_similarity` to support simsimd beyond 3.7.7
- partners/milvus: fix `cosine_similarity` to support simsimd beyond
3.7.7
- partners/mongodb: fix `cosine_similarity` to support simsimd beyond
3.7.7
- partners/pinecone: fix `cosine_similarity` to support simsimd beyond
3.7.7
- partners/qdrant: fix `cosine_similarity` to support simsimd beyond
3.7.7


- [ ] **Broadcast operation failure while using simsimd beyond v3.7.7**:
- **Description:** I was using simsimd 4.3.1 and the unsupported operand
type issue popped up. When I checked out the repo and ran the tests,
they failed as well (have attached a screenshot for that). Looks like it
is a variant of https://github.com/langchain-ai/langchain/issues/18022 .
Prior to 3.7.7, simd.cdist returned an ndarray but now it returns
simsimd.DistancesTensor which is ineligible for a broadcast operation
with numpy. With this change, it also remove the need to explicitly cast
`Z` to numpy array
    - **Issue:** #19905
    - **Dependencies:** No
    - **Twitter handle:** https://x.com/GetzJoydeep

<img width="1622" alt="Screenshot 2024-05-29 at 2 50 00 PM"
src="https://github.com/langchain-ai/langchain/assets/31132555/fb27b383-a9ae-4a6f-b355-6d503b72db56">

- [ ] **Considerations**: 
1. I started with community but since similar changes were there in
Milvus, MongoDB, Pinecone, and QDrant so I modified their files as well.
If touching multiple packages in one PR is not the norm, then I can
remove them from this PR and raise separate ones
2. I have run and verified that the tests work. Since, only MongoDB had
tests, I ran theirs and verified it works as well. Screenshots attached
:
<img width="1573" alt="Screenshot 2024-05-29 at 2 52 13 PM"
src="https://github.com/langchain-ai/langchain/assets/31132555/ce87d1ea-19b6-4900-9384-61fbc1a30de9">
<img width="1614" alt="Screenshot 2024-05-29 at 3 33 51 PM"
src="https://github.com/langchain-ai/langchain/assets/31132555/6ce1d679-db4c-4291-8453-01028ab2dca5">
  

I have added a test for simsimd. I feel it may not go well with the
CI/CD setup as installing simsimd is not a dependency requirement. I
have just imported simsimd to ensure simsimd cosine similarity is
invoked. However, its not a good approach. Suggestions are welcome and I
can make the required changes on the PR. Please provide guidance on the
same as I am new to the community.

---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
2024-06-04 17:36:31 +00:00
KyrianC
03178ee74f
community[minor]: Add tools calls to ChatEdenAI (#22320)
### Description  
Add tools implementation to `ChatEdenAI`:
- `bind_tools()`
- `with_structured_output()`

### Documentation 
Updated `docs/docs/integrations/chat/edenai.ipynb`

### Notes
We don´t support stream with tools as of yet. If stream is called with
tools we directly yield the whole message from `generate` (implemented
the same way as Anthropic did).
2024-06-04 10:29:28 -07:00
Rahul Triptahi
77ad857934
community[minor]: Enable retrieval api calls in PebbloRetrievalQA (#21958)
Description: Enable app discovery and Prompt/Response apis in
PebbloSafeRetrieval
Documentation: NA
Unit test: N/A

---------

Signed-off-by: Rahul Tripathi <rauhl.psit.ec@gmail.com>
Co-authored-by: Rahul Tripathi <rauhl.psit.ec@gmail.com>
2024-06-04 10:18:50 -07:00
ccurme
afe89a1411
community: add standard chat model params to Ollama (#22446) 2024-06-03 17:45:03 -04:00
maang-h
13140dc4ff
community[patch]: Update the default api_url and reqeust_body of sparkllm embedding (#22136)
- **Description:** When I was running the SparkLLMTextEmbeddings,
app_id, api_key and api_secret are all correct, but it cannot run
normally using the current URL.

    ```python
    # example
    from langchain_community.embeddings import SparkLLMTextEmbeddings

    embedding= SparkLLMTextEmbeddings(
        spark_app_id="my-app-id",
        spark_api_key="my-api-key",
        spark_api_secret="my-api-secret"
    )
    embedding= "hello"
    print(spark.embed_query(text1))
    ```

![sparkembedding](https://github.com/langchain-ai/langchain/assets/55082429/11daa853-4f67-45b2-aae2-c95caa14e38c)
   
So I updated the url and request body parameters according to
[Embedding_api](https://www.xfyun.cn/doc/spark/Embedding_api.html), now
it is runnable.
2024-06-03 12:38:11 -07:00
Yuwen Hu
ba0dca46d7
community[minor]: Add IPEX-LLM BGE embedding support on both Intel CPU and GPU (#22226)
**Description:** [IPEX-LLM](https://github.com/intel-analytics/ipex-llm)
is a PyTorch library for running LLM on Intel CPU and GPU (e.g., local
PC with iGPU, discrete GPU such as Arc, Flex and Max) with very low
latency. This PR adds ipex-llm integrations to langchain for BGE
embedding support on both Intel CPU and GPU.
**Dependencies:** `ipex-llm`, `sentence-transformers`
**Contribution maintainer**: @Oscilloscope98 
**tests and docs**: 
- langchain/docs/docs/integrations/text_embedding/ipex_llm.ipynb
- langchain/docs/docs/integrations/text_embedding/ipex_llm_gpu.ipynb
-
langchain/libs/community/tests/integration_tests/embeddings/test_ipex_llm.py

---------

Co-authored-by: Shengsheng Huang <shannie.huang@gmail.com>
2024-06-03 12:37:10 -07:00
maang-h
596c062cba
community[patch]: Standardize qianfan model init args name (#22322)
- **Description:**  
    - Standardize qianfan chat model intialization arguments name
        - qianfan_ak (qianfan api key)  -> api_key
        - qianfan_sk (qianfan secret key)  ->  secret_key
       
    - Delete unuse variable
- **Issue:** #20085
2024-05-30 11:08:32 -04:00
Pavlo Paliychuk
342df7cf83
community[minor]: Add Zep Cloud components + docs + examples (#21671)
Thank you for contributing to LangChain!

- [x] **PR title**: community: Add Zep Cloud components + docs +
examples

- [x] **PR message**: 
We have recently released our new zep-cloud sdks that are compatible
with Zep Cloud (not Zep Open Source). We have also maintained our Cloud
version of langchain components (ChatMessageHistory, VectorStore) as
part of our sdks. This PRs goal is to port these components to langchain
community repo, and close the gap with the existing Zep Open Source
components already present in community repo (added
ZepCloudMemory,ZepCloudVectorStore,ZepCloudRetriever).
Also added a ZepCloudChatMessageHistory components together with an
expression language example ported from our repo. We have left the
original open source components intact on purpose as to not introduce
any breaking changes.
    - **Issue:** -
- **Dependencies:** Added optional dependency of our new cloud sdk
`zep-cloud`
    - **Twitter handle:** @paulpaliychuk51


- [x] **Add tests and docs**


- [x] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified. See contribution
guidelines for more: https://python.langchain.com/docs/contributing/

Additional guidelines:
- Make sure optional dependencies are imported within a function.
- Please do not add dependencies to pyproject.toml files (even optional
ones) unless they are required for unit tests.
- Most PRs should not touch more than one package.
- Changes should be backwards compatible.
- If you are adding something to community, do not re-import it in
langchain.

If no one reviews your PR within a few days, please @-mention one of
baskaryan, efriis, eyurtsev, hwchase17.
2024-05-27 12:50:13 -07:00
Jirka Lhotka
7c0459faf2
community: Update costs of openai finetuned models (#22124)
- **Description:** Update costs of finetuned models and add
gpt-3-turbo-0125. Source: https://openai.com/api/pricing/
  - **Issue:** N/A
  - **Dependencies:** None
2024-05-24 15:25:17 +00:00
Christophe Bornet
c838de5027
doc: Add doc for CassandraByteStore (#22126)
Preview:
https://langchain-git-fork-cbornet-doc-cassandrabytestore-langchain.vercel.app/v0.2/docs/integrations/stores/cassandra/
2024-05-24 10:57:55 -04:00
Eugene Yurtsev
2d693c484e
docs: fix some spelling mistakes caught by newest version of code spell (#22090)
Going to merge this even though it doesn't pass all tests, and open a
separate PR for the remaining spelling mistakes.
2024-05-23 16:59:11 -04:00
Pavel Zloi
fe26f937e4
community[minor]: ManticoreSearch engine added to vectorstore (#19117)
**Description:** ManticoreSearch engine added to vectorstores
**Issue:** no issue, just a new feature
**Dependencies:** https://pypi.org/project/manticoresearch-dev/
**Twitter handle:** @EvilFreelancer

- Example notebook with test integration:

https://github.com/EvilFreelancer/langchain/blob/manticore-search-vectorstore/docs/docs/integrations/vectorstores/manticore_search.ipynb

---------

Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
Co-authored-by: Chester Curme <chester.curme@gmail.com>
Co-authored-by: Bagatur <baskaryan@gmail.com>
2024-05-23 13:56:18 -07:00
Philippe PRADOS
6dd621d636
community[minor]: Add CloudBlobLoader that supports loading data from cloud buckets (#21957)
Thank you for contributing to LangChain!

- [ ] **PR title**: "Add CloudBlobLoader"
  - community: Add CloudBlobLoader

- [ ] **PR message**: Add cloud blob loader
    - **Description:** 
 Langchain provides several approaches to read different file formats:

Specific loaders (`CVSLoader`) or blob-compatible loaders
(`FileSystemBlobLoader`). The only implementation proposed for
BlobLoader is `FileSystemBlobLoader`.
      
Many projects retrieve files from cloud storage. We propose a new
implementation of `BlobLoader` to read files from the three cloud
storage systems. The interface is strictly identical to
`FileSystemBlobLoader`. The only difference is the constructor, which
takes a cloud "url" object such as `s3://my-bucket`, `az://my-bucket`,
or `gs://my-bucket`.
      
By streamlining the process, this novel implementation eliminates the
requirement to pre-download files from cloud storage to local temporary
files (which are seldom removed).
      
The code relies on the
[CloudPathLib](https://cloudpathlib.drivendata.org/stable/) library to
interpret cloud URLs. This has been added as an optional dependency.

```Python
loader = CloudBlobLoader("s3://mybucket/id")
for blob in loader.yield_blobs():
    print(blob)
```

- [X] **Dependencies:** CloudPathLib
- [X] **Twitter handle:** pprados


- [X] **Add tests and docs**: Add unit test, but it's easy to convert to
integration test, with some files in a cloud storage (see
`test_cloud_blob_loader.py`)

- [X] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified.

Hello from Paris @hwchase17. Can you review this PR?

---------

Co-authored-by: Eugene Yurtsev <eugene@langchain.dev>
2024-05-23 10:59:55 -04:00
Christophe Bornet
74947ec894
community[minor]: Add Cassandra ByteStore (#22064) 2024-05-23 10:46:23 -04:00
Christophe Bornet
fea6b99b16
community[minor]: Add async methods to CassandraChatMessageHistory (#21975) 2024-05-23 10:13:05 -04:00
Bruno Alvisio
5eabe90494
community[patch]: Adding HEADER to the list of supported locations (#21946)
**Description:** adds headers to the list of supported locations when
generating the openai function schema
2024-05-22 22:47:56 +00:00
Bagatur
50186da0a1
infra: rm unused # noqa violations (#22049)
Updating #21137
2024-05-22 15:21:08 -07:00
acho98
45ed5f3f51
community[minor]: Add Clova Embeddings for LangChain Community (#21890)
- [ ] **PR title**: "Add Naver ClovaX embedding to LangChain community"
- HyperClovaX is a large language model developed by
[Naver](https://clova-x.naver.com/welcome).
It's a powerful and purpose-trained LLM.

- You can visit the embedding service provided by
[ClovaX](https://www.ncloud.com/product/aiService/clovaStudio)

- You may get CLOVA_EMB_API_KEY, CLOVA_EMB_APIGW_API_KEY,
CLOVA_EMB_APP_ID From
https://www.ncloud.com/product/aiService/clovaStudio

---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
2024-05-22 22:08:47 +00:00
MSubik
d948783a4c
community[patch]: standardize init args, update for javelin sdk release. (#21980)
Related to
[20085](https://github.com/langchain-ai/langchain/issues/20085) Updated
the Javelin chat model to standardize the initialization argument. Also
fixed an existing bug, where code was initialized with incorrect call to
the JavelinClient defined in the javelin_sdk, resulting in an
initialization error. See related [Javelin
Documentation](https://docs.getjavelin.io/docs/javelin-python/quickstart).
2024-05-22 21:47:28 +00:00
Mazen Ramadan
3c1d77dd64
community[minor]: Add Scrapfly Loader community integration (#22036)
Added [Scrapfly](https://scrapfly.io/) Web Loader integration. Scrapfly
is a web scraping API that allows extracting web page data into
accessible markdown or text datasets.

- __Description__: Added Scrapfly web loader for retrieving web page
data as markdown or text.
- Dependencies: scrapfly-sdk
- Twitter: @thealchemi1st

---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
2024-05-22 21:29:13 +00:00
SaschaStoll
709664a079
community[patch]: Performant filter columns option for Hanavector (#21971)
**Description:** Backwards compatible extension of the initialisation
interface of HanaDB to allow the user to specify
specific_metadata_columns that are used for metadata storage of selected
keys which yields increased filter performance. Any not-mentioned
metadata remains in the general metadata column as part of a JSON
string. Furthermore switched to executemany for batch inserts into
HanaDB.

**Issue:** N/A

**Dependencies:** no new dependencies added

**Twitter handle:** @sapopensource

---------

Co-authored-by: Martin Kolb <martin.kolb@sap.com>
Co-authored-by: Bagatur <baskaryan@gmail.com>
2024-05-22 13:21:21 -07:00
Eric Zhang
e7e41eaabe
langchain: add RankLLM Reranker (#21171)
Integrate RankLLM reranker (https://github.com/castorini/rank_llm) into
LangChain

An example notebook is given in
`docs/docs/integrations/retrievers/rankllm-reranker.ipynb`

---------

Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
2024-05-22 20:12:55 +00:00
maang-h
fc93bed8c4
community: Fix CSVLoader columns is None (#20701)
- **Bug code**: In
langchain_community/document_loaders/csv_loader.py:100

- **Description**: currently, when 'CSVLoader' reads the column as None
in the 'csv' file, it will report an error because the 'CSVLoader' does
not verify whether the column is of str type and does not consider how
to handle the corresponding 'row_data' when the column is' None 'in the
csv. This pr provides a solution.

- **Issue:**  Fix #20699 

- **thinking:**

1. Refer to the processing method for
'langchain_community/document_loaders/csv_loader.py:100' when **'v'**
equals'None', and apply the same method to '**k**'.
(Reference`csv.DictReader` ,**'k'** will only be None when `
len(columns) < len(number_row_data)` is established)
2. **‘k’** equals None only holds when it is the last column, and its
corresponding **'v'** type is a list. Therefore, I referred to the data
format in 'Document' and used ',' to concatenated the elements in the
list.(But I'm not sure if you accept this form, if you have any other
ideas, communicate)

---------

Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
2024-05-22 12:57:46 -07:00
Eugene Yurtsev
36813d2f00
community[patch]: Fix remaining __inits__ in community (#22037)
Fixes the __init__ files in community to use __all__ which is statically
defined.
2024-05-22 17:42:17 +00:00
Eugene Yurtsev
58360a1e53
community[patch]: Add unit test to verify that init is correctly defined (#22030)
Fix some __init__ files and add a unit test
2024-05-22 17:19:00 +00:00
Matthew Hoffman
4f2e3bd7fd
community[patch]: fix public interface for embeddings module (#21650)
## Description

The existing public interface for `langchain_community.emeddings` is
broken. In this file, `__all__` is statically defined, but is
subsequently overwritten with a dynamic expression, which type checkers
like pyright do not support. pyright actually gives the following
diagnostic on the line I am requesting we remove:


[reportUnsupportedDunderAll](https://github.com/microsoft/pyright/blob/main/docs/configuration.md#reportUnsupportedDunderAll):

```
Operation on "__all__" is not supported, so exported symbol list may be incorrect
```

Currently, I get the following errors when attempting to use publicablly
exported classes in `langchain_community.emeddings`:

```python
import langchain_community.embeddings

langchain_community.embeddings.HuggingFaceEmbeddings(...)  #  error: "HuggingFaceEmbeddings" is not exported from module "langchain_community.embeddings" (reportPrivateImportUsage)
```

This is solved easily by removing the dynamic expression.
2024-05-22 11:42:15 -04:00
Eugene Yurtsev
8d82160a8a
community[patch]: Clean up logic in import checking unit test (#22026)
Clean up unit test
2024-05-22 15:30:10 +00:00
Tomaz Bratanic
d8a1f1114d
community[patch]: Handle exceptions where node props aren't consistent in neo4j schema (#22027) 2024-05-22 11:21:56 -04:00
Eugene Yurtsev
aed64daabb
community[patch]: Add unit test to catch bad __all__ definitions (#21996)
This will catch all dynamic __all__ definitions.
2024-05-22 09:32:13 -04:00
Pengcheng Liu
4cf523949a
community[patch]: Update model client to support vision model in Tong… (#21474)
- **Description:** Tongyi uses different client for chat model and
vision model. This PR chooses proper client based on model name to
support both chat model and vision model. Reference [tongyi
document](https://help.aliyun.com/zh/dashscope/developer-reference/tongyi-qianwen-vl-plus-api?spm=a2c4g.11186623.0.0.27404c9a7upm11)
for details.

```
from langchain_core.messages import HumanMessage
from langchain_community.chat_models import ChatTongyi

llm = ChatTongyi(model_name='qwen-vl-max')
image_message = {
    "image": "https://lilianweng.github.io/posts/2023-06-23-agent/agent-overview.png"
}
text_message = {
    "text": "summarize this picture",
}
message = HumanMessage(content=[text_message, image_message])
llm.invoke([message])
```

- **Issue:** None
- **Dependencies:** None
- **Twitter handle:** None
2024-05-21 11:58:27 -07:00
Sevin F. Varoglu
1bc0ea5496
community[patch]: update OctoAIEmbeddings to subclass OpenAIEmbeddings (#21805) 2024-05-21 11:29:41 -07:00
Robert Caulk
54adcd9e82
community[minor]: add AskNews retriever and AskNews tool (#21581)
We add a tool and retriever for the [AskNews](https://asknews.app)
platform with example notebooks.

The retriever can be invoked with:

```py
from langchain_community.retrievers import AskNewsRetriever

retriever = AskNewsRetriever(k=3)

retriever.invoke("impact of fed policy on the tech sector")
```

To retrieve 3 documents in then news related to fed policy impacts on
the tech sector. The included notebook also includes deeper details
about controlling filters such as category and time, as well as
including the retriever in a chain.

The tool is quite interesting, as it allows the agent to decide how to
obtain the news by forming a query and deciding how far back in time to
look for the news:

```py
from langchain_community.tools.asknews import AskNewsSearch
from langchain import hub
from langchain.agents import AgentExecutor, create_openai_functions_agent
from langchain_openai import ChatOpenAI

tool = AskNewsSearch()

instructions = """You are an assistant."""
base_prompt = hub.pull("langchain-ai/openai-functions-template")
prompt = base_prompt.partial(instructions=instructions)
llm = ChatOpenAI(temperature=0)
asknews_tool = AskNewsSearch()
tools = [asknews_tool]
agent = create_openai_functions_agent(llm, tools, prompt)
agent_executor = AgentExecutor(
    agent=agent,
    tools=tools,
    verbose=True,
)

agent_executor.invoke({"input": "How is the tech sector being affected by fed policy?"})
```

---------

Co-authored-by: Emre <e@emre.pm>
2024-05-20 18:23:06 -07:00
Jesse S
fc79b372cb
community[minor]: add aerospike vectorstore integration (#21735)
Please let me know if you see any possible areas of improvement. I would
very much appreciate your constructive criticism if time allows.

**Description:**
- Added a aerospike vector store integration that utilizes
[Aerospike-Vector-Search](https://aerospike.com/products/vector-database-search-llm/)
add-on.
- Added both unit tests and integration tests
- Added a docker compose file for spinning up a test environment
- Added a notebook

 **Dependencies:** any dependencies required for this change
- aerospike-vector-search

 **Twitter handle:** 
- No twitter, you can use my GitHub handle or LinkedIn if you'd like

Thanks!

---------

Co-authored-by: Jesse Schumacher <jschumacher@aerospike.com>
Co-authored-by: Bagatur <baskaryan@gmail.com>
2024-05-21 01:01:47 +00:00
Param Singh
d07885f8b7
community[patch]: standardized sparkllm init args (#21633)
Related to #20085 
@baskaryan 

Thank you for contributing to LangChain!

community:sparkllm[patch]: standardized init args

updated `spark_api_key` so that aliased to `api_key`. Added integration
test for `sparkllm` to test that it continues to set the same underlying
attribute.

updated temperature with Pydantic Field, added to the integration test.

Ran `make format`,`make test`, `make lint`, `make spell_check`
2024-05-20 17:11:36 -07:00
Liuww
332ffed393
community[patch]: Adopting the lighter-weight xinference_client (#21900)
While integrating the xinference_embedding, we observed that the
downloaded dependency package is quite substantial in size. With a focus
on resource optimization and efficiency, if the project requirements are
limited to its vector processing capabilities, we recommend migrating to
the xinference_client package. This package is more streamlined,
significantly reducing the storage space requirements of the project and
maintaining a feature focus, making it particularly suitable for
scenarios that demand lightweight integration. Such an approach not only
boosts deployment efficiency but also enhances the application's
maintainability, rendering it an optimal choice for our current context.

---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
2024-05-20 22:05:09 +00:00
Eugene Yurtsev
8607735b80
langchain[patch],community[patch]: Move unit tests that depend on community to community (#21685) 2024-05-16 17:24:27 -04:00
Stefano Lottini
040597e832
community: init signature revision for Cassandra LLM cache classes + small maintenance (#17765)
This PR improves on the `CassandraCache` and `CassandraSemanticCache`
classes, mainly in the constructor signature, and also introduces
several minor improvements around these classes.

### Init signature

A (sigh) breaking change is tentatively introduced to the constructor.
To me, the advantages outweigh the possible discomfort: the new syntax
places the DB-connection objects `session` and `keyspace` later in the
param list, so that they can be given a default value. This is what
enables the pattern of _not_ specifying them, provided one has
previously initialized the Cassandra connection through the versatile
utility method `cassio.init(...)`.

In this way, a much less unwieldy instantiation can be done, such as
`CassandraCache()` and `CassandraSemanticCache(embedding=xyz)`,
everything else falling back to defaults.

A downside is that, compared to the earlier signature, this might turn
out to be breaking for those doing positional instantiation. As a way to
mitigate this problem, this PR typechecks its first argument trying to
detect the legacy usage.
(And to make this point less tricky in the future, most arguments are
left to be keyword-only).

If this is considered too harsh, I'd like guidance on how to further
smoothen this transition. **Our plan is to make the pattern of optional
session/keyspace a standard across all Cassandra classes**, so that a
repeatable strategy would be ideal. A possibility would be to keep
positional arguments for legacy reasons but issue a deprecation warning
if any of them is actually used, to later remove them with 0.2 - please
advise on this point.

### Other changes

- class docstrings: enriched, completely moved to class level, added
note on `cassio.init(...)` pattern, added tiny sample usage code.
- semantic cache: revised terminology to never mention "distance" (it is
in fact a similarity!). Kept the legacy constructor param with a
deprecation warning if used.
- `llm_caching` notebook: uniform flow with the Cassandra and Astra DB
separate cases; better and Cassandra-first description; all imports made
explicit and from community where appropriate.
- cache integration tests moved to community (incl. the imported tools),
env var bugfix for `CASSANDRA_CONTACT_POINTS`.

---------

Co-authored-by: Erick Friis <erick@langchain.dev>
2024-05-16 17:22:24 +00:00
Kyle Cassidy
eca8c4bcc6
Standardized openai init params (#21739)
## Patch Summary
community:openai[patch]: standardize init args

## Details
I made changes to the OpenAI Chat API wrapper test in the Langchain
open-source repository

- **File**: `libs/community/tests/unit_tests/chat_models/test_openai.py`
- **Changes**:
  - Updated `max_retries` with Pydantic Field
  - Updated the corresponding unit test
- **Related Issues**: #20085
  - Updated max_retries with Pydantic Field, updated the unit test.

---------

Co-authored-by: JuHyung Son <sonju0427@gmail.com>
2024-05-16 16:30:52 +00:00
ccurme
19e6bf814b
community: fix CI (#21766) 2024-05-16 15:41:03 +00:00
Cheese
0ead09f84d
community: Implement bind_tools for ChatTongyi (#20725)
## Description

Implement `bind_tools` in ChatTongyi. Usage example:

```py
from langchain_core.tools import tool
from langchain_community.chat_models.tongyi import ChatTongyi

@tool
def multiply(first_int: int, second_int: int) -> int:
    """Multiply two integers together."""
    return first_int * second_int

llm = ChatTongyi(model="qwen-turbo")

llm_with_tools = llm.bind_tools([multiply])

msg = llm_with_tools.invoke("What's 5 times forty two")

print(msg)
```

Streaming is also supported.

## Dependencies

No Dependency is required for this change.

---------

Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
Co-authored-by: Chester Curme <chester.curme@gmail.com>
2024-05-16 10:39:35 -04:00
Harrison Chase
15be439719
Harrison/move flashrank rerank (#21448)
third party integration, should be in community
2024-05-15 13:08:52 -07:00
Rajendra Kadam
54e003268e
langchain[minor]: Add PebbloRetrievalQA chain with Identity & Semantic Enforcement support (#20641)
- **Description:** PebbloRetrievalQA chain introduces identity
enforcement using vector-db metadata filtering
- **Dependencies:** None
- **Issue:** None
- **Documentation:** Adding documentation for PebbloRetrievalQA chain in
a separate PR(https://github.com/langchain-ai/langchain/pull/20746)
- **Unit tests:** New unit-tests added

---------

Co-authored-by: Eugene Yurtsev <eugene@langchain.dev>
2024-05-15 13:14:52 +00:00
Eugene Yurtsev
25fbe356b4
community[patch]: upgrade to recent version of mypy (#21616)
This PR upgrades community to a recent version of mypy. It inserts type:
ignore on all existing failures.
2024-05-13 14:55:07 -04:00
ccurme
3bb9bec314
bedrock: add unit test for retriever (#21485)
This was implemented in
https://github.com/langchain-ai/langchain/pull/21349 but dropped before
merge.
2024-05-09 11:37:03 -04:00
Yash
cb31c3611f
Ndb enterprise (#21233)
Description: Adds NeuralDBClientVectorStore to the langchain, which is
our enterprise client.

---------

Co-authored-by: kartikTAI <129414343+kartikTAI@users.noreply.github.com>
Co-authored-by: Kartik Sarangmath <kartik@thirdai.com>
2024-05-08 16:30:58 -07:00
Oguz Vuruskaner
5b35f077f9
[community][fix](DeepInfraEmbeddings): Implement chunking for large batches (#21189)
**Description:**
This PR introduces chunking logic to the `DeepInfraEmbeddings` class to
handle large batch sizes without exceeding maximum batch size of the
backend. This enhancement ensures that embedding generation processes
large batches by breaking them down into smaller, manageable chunks,
each conforming to the maximum batch size limit.

**Issue:**
Fixes #21189

**Dependencies:**
No new dependencies introduced.
2024-05-08 14:45:42 -07:00
Sokolov Fedor
f4ddf64faa
community: Add MarkdownifyTransformer to langchain_community.document_transformers (#21247)
- Added new document_transformer: MarkdonifyTransformer, that uses
`markdonify` package with customizable options to convert HTML to
Markdown. It's similar to Html2TextTransformer, but has more flexible
options and also I've noticed that sometimes MarkdownifyTransformer
performs better than html2text one, so that's why I use markdownify on
my project.
- Added docs and tests

- Usage:
```python
from langchain_community.document_transformers import MarkdownifyTransformer

markdownify = MarkdownifyTransformer()
docs_transform = markdownify.transform_documents(docs)
```

- Example of better performance on simple task, that I've noticed:
```
<html>
<head><title>Reports on product movement</title></head>
<body>
<p data-block-key="2wst7">The reports on product movement will be useful for forming supplier orders and controlling outcomes.</p>
</body>
```
**Html2TextTransformer**: 
```python
[Document(page_content='The reports on product movement will be useful for forming supplier orders and\ncontrolling outcomes.\n\n')]
# Here we can see 'and\ncontrolling', which has extra '\n' in it
```
**MarkdownifyTranformer**:
```python
[Document(page_content='Reports on product movement\n\nThe reports on product movement will be useful for forming supplier orders and controlling outcomes.')]
```

---------

Co-authored-by: Sokolov Fedor <f.sokolov@sokolov-macbook.bbrouter>
Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
Co-authored-by: Sokolov Fedor <f.sokolov@sokolov-macbook.local>
Co-authored-by: Sokolov Fedor <f.sokolov@192.168.1.6>
2024-05-08 14:45:13 -07:00
Eugene Yurtsev
f92006de3c
multiple: langchain 0.2 in master (#21191)
0.2rc 

migrations

- [x] Move memory
- [x] Move remaining retrievers
- [x] graph_qa chains
- [x] some dependency from evaluation code potentially on math utils
- [x] Move openapi chain from `langchain.chains.api.openapi` to
`langchain_community.chains.openapi`
- [x] Migrate `langchain.chains.ernie_functions` to
`langchain_community.chains.ernie_functions`
- [x] migrate `langchain/chains/llm_requests.py` to
`langchain_community.chains.llm_requests`
- [x] Moving `langchain_community.cross_enoders.base:BaseCrossEncoder`
->
`langchain_community.retrievers.document_compressors.cross_encoder:BaseCrossEncoder`
(namespace not ideal, but it needs to be moved to `langchain` to avoid
circular deps)
- [x] unit tests langchain -- add pytest.mark.community to some unit
tests that will stay in langchain
- [x] unit tests community -- move unit tests that depend on community
to community
- [x] mv integration tests that depend on community to community
- [x] mypy checks

Other todo

- [x] Make deprecation warnings not noisy (need to use warn deprecated
and check that things are implemented properly)
- [x] Update deprecation messages with timeline for code removal (likely
we actually won't be removing things until 0.4 release) -- will give
people more time to transition their code.
- [ ] Add information to deprecation warning to show users how to
migrate their code base using langchain-cli
- [ ] Remove any unnecessary requirements in langchain (e.g., is
SQLALchemy required?)

---------

Co-authored-by: Erick Friis <erick@langchain.dev>
2024-05-08 16:46:52 -04:00
Eugene Yurtsev
6a1d61dbf1
community[patch]: Fix in memory vectorstore to take into account ids when adding docs (#21384)
Should respect `ids` if passed
2024-05-07 15:05:16 -04:00
nrpd25
95cc8e3fc3
premai[patch]:Standardized model init args (#21308)
[Standardized model init args
#20085](https://github.com/langchain-ai/langchain/issues/20085)
- Enable premai chat model to be initialized with `model_name` as an
alias for `model`, `api_key` as an alias for `premai_api_key`.
- Add initialization test `test_premai_initialization`

---------

Co-authored-by: Chester Curme <chester.curme@gmail.com>
2024-05-06 18:12:29 -04:00
Jorge Piedrahita Ortiz
e65652c3e8
community: add SambaNova embeddings integration (#21227)
- **Description:**  SambaNova hosted embeddings integration
2024-05-06 13:29:59 -07:00
Jorge Piedrahita Ortiz
df1c10260c
community: minor changes sambanova integration (#21231)
- **Description:** fix: variable names in root validator not allowing
pass credentials as named parameters in llm instancing, also added
sambanova's sambaverse and sambastudio llms to __init__.py for module
import
2024-05-06 13:28:35 -07:00
Mark Cusack
060987d755
community[minor]: Add indexing via locality sensitive hashing to the Yellowbrick vector store (#20856)
- **Description:** Add LSH-based indexing to the Yellowbrick vector
store module
- **Twitter handle:** @markcusack

---------

Co-authored-by: markcusack <markcusack@markcusacksmac.lan>
Co-authored-by: markcusack <markcusack@Mark-Cusack-sMac.local>
Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>
2024-05-06 20:18:02 +00:00
Param Singh
fee91d43b7
baichuan[patch]:standardize chat init args (#21298)
Thank you for contributing to LangChain!

community:baichuan[patch]: standardize init args

updated `baichuan_api_key` so that aliased to `api_key`. Added test that
it continues to set the same underlying attribute. Test checks for
`SecretStr`

updated `temperature` with Pydantic Field, added unit test. 

Related to https://github.com/langchain-ai/langchain/issues/20085
2024-05-06 18:33:57 +00:00
Rohan Aggarwal
8021d2a2ab
community[minor]: Oraclevs integration (#21123)
Thank you for contributing to LangChain!

- Oracle AI Vector Search 
Oracle AI Vector Search is designed for Artificial Intelligence (AI)
workloads that allows you to query data based on semantics, rather than
keywords. One of the biggest benefit of Oracle AI Vector Search is that
semantic search on unstructured data can be combined with relational
search on business data in one single system. This is not only powerful
but also significantly more effective because you don't need to add a
specialized vector database, eliminating the pain of data fragmentation
between multiple systems.


- Oracle AI Vector Search is designed for Artificial Intelligence (AI)
workloads that allows you to query data based on semantics, rather than
keywords. One of the biggest benefit of Oracle AI Vector Search is that
semantic search on unstructured data can be combined with relational
search on business data in one single system. This is not only powerful
but also significantly more effective because you don't need to add a
specialized vector database, eliminating the pain of data fragmentation
between multiple systems.
This Pull Requests Adds the following functionalities
Oracle AI Vector Search : Vector Store
Oracle AI Vector Search : Document Loader
Oracle AI Vector Search : Document Splitter
Oracle AI Vector Search : Summary
Oracle AI Vector Search : Oracle Embeddings


- We have added unit tests and have our own local unit test suite which
verifies all the code is correct. We have made sure to add guides for
each of the components and one end to end guide that shows how the
entire thing runs.


- We have made sure that make format and make lint run clean.

Additional guidelines:
- Make sure optional dependencies are imported within a function.
- Please do not add dependencies to pyproject.toml files (even optional
ones) unless they are required for unit tests.
- Most PRs should not touch more than one package.
- Changes should be backwards compatible.
- If you are adding something to community, do not re-import it in
langchain.

If no one reviews your PR within a few days, please @-mention one of
baskaryan, efriis, eyurtsev, hwchase17.

---------

Co-authored-by: skmishraoracle <shailendra.mishra@oracle.com>
Co-authored-by: hroyofc <harichandan.roy@oracle.com>
Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
Co-authored-by: Bagatur <baskaryan@gmail.com>
2024-05-04 03:15:35 +00:00
Eugene Yurtsev
c9119b0e75
langchain[patch],community[minor]: Move some unit tests from langchain to community, use core for fake models (#21190) 2024-05-02 09:57:52 -04:00
Eugene Yurtsev
bec3eee3fa
langchain[patch]: Migrate retrievers to use optional langchain community imports (#21155) 2024-05-01 14:44:44 -04:00
East Agile
2a6f78a53f
community[minor]: Rememberizer retriever (#20052)
**Description:**
This pull request introduces a new feature for LangChain: the
integration with the Rememberizer API through a custom retriever.
This enables LangChain applications to allow users to load and sync
their data from Dropbox, Google Drive, Slack, their hard drive into a
vector database that LangChain can query. Queries involve sending text
chunks generated within LangChain and retrieving a collection of
semantically relevant user data for inclusion in LLM prompts.
User knowledge dramatically improved AI applications.
The Rememberizer integration will also allow users to access general
purpose vectorized data such as Reddit channel discussions and US
patents.

**Issue:**
N/A

**Dependencies:**
N/A

**Twitter handle:**
https://twitter.com/Rememberizer
2024-05-01 10:41:44 -04:00
MacanPN
0f7f448603
community[patch]: add delete() method to AzureSearch vector store (#21127)
**Issue:**
Currently `AzureSearch` vector store does not implement `delete` method.
This PR implements it. This also makes it compatible with LangChain
indexer.

**Dependencies:**
None

**Twitter handle:**
@martintriska1

---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
2024-04-30 23:46:18 +00:00
Jakub Pawłowski
b0b1a67771
community[patch]: Skip unexpected 404 HTTP Error in Arxiv download (#21042)
### Description:
When attempting to download PDF files from arXiv, an unexpected 404
error frequently occurs. This error halts the operation, regardless of
whether there are additional documents to process. As a solution, I
suggest implementing a mechanism to ignore and communicate this error
and continue processing the next document from the list.

Proposed Solution: To address the issue of unexpected 404 errors during
PDF downloads from arXiv, I propose implementing the following solution:

- Error Handling: Implement error handling mechanisms to catch and
handle 404 errors gracefully.
- Communication: Inform the user or logging system about the occurrence
of the 404 error.
- Continued Processing: After encountering a 404 error, continue
processing the remaining documents from the list without interruption.

This solution ensures that the application can handle unexpected errors
without terminating the entire operation. It promotes resilience and
robustness in the face of intermittent issues encountered during PDF
downloads from arXiv.

### Issue:
#20909 
### Dependencies:
none

---------

Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
2024-04-30 18:29:22 +00:00
Cahid Arda Öz
cc6191cb90
community[minor]: Add support for Upstash Vector (#20824)
## Description

Adding `UpstashVectorStore` to utilize [Upstash
Vector](https://upstash.com/docs/vector/overall/getstarted)!

#17012 was opened to add Upstash Vector to langchain but was closed to
wait for filtering. Now filtering is added to Upstash vector and we open
a new PR. Additionally, [embedding
feature](https://upstash.com/docs/vector/features/embeddingmodels) was
added and we add this to our vectorstore aswell.

## Dependencies

[upstash-vector](https://pypi.org/project/upstash-vector/) should be
installed to use `UpstashVectorStore`. Didn't update dependencies
because of [this comment in the previous
PR](https://github.com/langchain-ai/langchain/pull/17012#pullrequestreview-1876522450).

## Tests

Tests are added and they pass. Tests are naturally network bound since
Upstash Vector is offered through an API.

There was [a discussion in the previous PR about mocking the
unittests](https://github.com/langchain-ai/langchain/pull/17012#pullrequestreview-1891820567).
We didn't make changes to this end yet. We can update the tests if you
can explain how the tests should be mocked.

---------

Co-authored-by: ytkimirti <yusuftaha9@gmail.com>
Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
Co-authored-by: Bagatur <baskaryan@gmail.com>
2024-04-29 17:25:01 -04:00
chyroc
3e241956d3
community[minor]: add coze chat model (#20770)
add coze chat model, to call coze.com apis
2024-04-29 12:26:16 -04:00
Patrick McFadin
3331865f6b
community[minor]: add Cassandra Database Toolkit (#20246)
**Description**: ToolKit and Tools for accessing data in a Cassandra
Database primarily for Agent integration. Initially, this includes the
following tools:
- `cassandra_db_schema` Gathers all schema information for the connected
database or a specific schema. Critical for the agent when determining
actions.
- `cassandra_db_select_table_data` Selects data from a specific keyspace
and table. The agent can pass paramaters for a predicate and limits on
the number of returned records.
- `cassandra_db_query` Expiriemental alternative to
`cassandra_db_select_table_data` which takes a query string completely
formed by the agent instead of parameters. May be removed in future
versions.

Includes unit test and two notebooks to demonstrate usage. 

**Dependencies**: cassio
**Twitter handle**: @PatrickMcFadin

---------

Co-authored-by: Phil Miesle <phil.miesle@datastax.com>
Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
Co-authored-by: Bagatur <baskaryan@gmail.com>
2024-04-29 15:51:43 +00:00
Igor Brai
b3e74f2b98
community[minor]: add mojeek search util (#20922)
**Description:** This pull request introduces a new feature to community
tools, enhancing its search capabilities by integrating the Mojeek
search engine
**Dependencies:** None

---------

Co-authored-by: Igor Brai <igor@mojeek.com>
Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
Co-authored-by: ccurme <chester.curme@gmail.com>
2024-04-29 15:49:53 +00:00
Tomaz Bratanic
67428c4052
community[patch]: Neo4j enhanced schema (#20983)
Scan the database for example values and provide them to an LLM for
better inference of Text2cypher
2024-04-29 10:45:55 -04:00
Pengcheng Liu
1fad39be1c
community[minor]: Add LarkSuite wiki document loader. (#21016)
**Description:** Add LarkSuite wiki document loader. Refer to [LarkSuite
api document
](https://open.feishu.cn/document/server-docs/docs/wiki-v2/space-node/list)for
details.
**Issue:** None
**Dependencies:** None
**Twitter handle:** None
2024-04-29 10:37:50 -04:00
Leonid Ganeline
dc7c06bc07
community[minor]: import fix (#20995)
Issue: When the third-party package is not installed, whenever we need
to `pip install <package>` the ImportError is raised.
But sometimes, the `ValueError` or `ModuleNotFoundError` is raised. It
is bad for consistency.
Change: replaced the `ValueError` or `ModuleNotFoundError` with
`ImportError` when we raise an error with the `pip install <package>`
message.
Note: Ideally, we replace all `try: import... except... raise ... `with
helper functions like `import_aim` or just use the existing
[langchain_core.utils.utils.guard_import](https://api.python.langchain.com/en/latest/utils/langchain_core.utils.utils.guard_import.html#langchain_core.utils.utils.guard_import)
But it would be much bigger refactoring. @baskaryan Please, advice on
this.
2024-04-29 10:32:50 -04:00
WilliamEspegren
804390ba4b
community: Spider integration (#20937)
Added the [Spider.cloud](https://spider.cloud) document loader.
[Spider](https://github.com/spider-rs/spider) is the
[fastest](https://github.com/spider-rs/spider/blob/main/benches/BENCHMARKS.md)
and cheapest crawler that returns LLM-ready data.

```
- **Description:** Adds Spider data loader
- **Dependencies:** spider-client
- **Twitter handle:** @WilliamEspegren 
```

---------

Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
Co-authored-by: Bagatur <baskaryan@gmail.com>
Co-authored-by: = <=>
Co-authored-by: Chester Curme <chester.curme@gmail.com>
2024-04-27 21:45:03 +00:00
Chip Davis
e818c75f8a
infra: test directory loader multithreaded (#20281)
This is a unit test for #20230 which was a fix for using multithreaded
mode with directory loader @eyurtsev
2024-04-26 19:16:47 -07:00
Jorge Piedrahita Ortiz
40b2e2916b
community[minor]: Sambanova llm integration (#20955)
- **Description:** Added [Sambanova systems](https://sambanova.ai/)
integration, including sambaverse and sambastudio LLMs
- **Dependencies:**   sseclient-py  (optional)

---------

Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
Co-authored-by: Bagatur <baskaryan@gmail.com>
2024-04-27 01:05:13 +00:00
Mayank Solanki
8c085fc697
community[patch]: Added a function from_existing_collection in Qdrant vector database. (#20779)
Issue: #20514 
The current implementation of `construct_instance` expects a `texts:
List[str]` that will call the embedding function. This might not be
needed when we already have a client with collection and `path, you
don't want to add any text.

This PR adds a class method that returns a qdrant instance with an
existing client.

Here everytime
cb6e5e56c2/libs/community/langchain_community/vectorstores/qdrant.py (L1592)
`construct_instance` is called, this line sends some text for embedding
generation.

---------

Co-authored-by: Anush <anushshetty90@gmail.com>
2024-04-26 15:34:09 -07:00