Commit Graph

1463 Commits

Author SHA1 Message Date
Eugene Yurtsev
780e84ae79 community[minor]: SQLDatabase Add fetch mode cursor, query parameters, query by selectable, expose execution options, and documentation (#17191)
- **Description:** Improve `SQLDatabase` adapter component to promote
code re-use, see
[suggestion](https://github.com/langchain-ai/langchain/pull/16246#pullrequestreview-1846590962).
  - **Needed by:** GH-16246
  - **Addressed to:** @baskaryan, @cbornet 

## Details
- Add `cursor` fetch mode
- Accept SQL query parameters
- Accept both `str` and SQLAlchemy selectables as query expression
- Expose `execution_options`
- Documentation page (notebook) about `SQLDatabase` [^1]
See [About
SQLDatabase](https://github.com/langchain-ai/langchain/blob/c1c7b763/docs/docs/integrations/tools/sql_database.ipynb).

[^1]: Apparently there hasn't been any yet?

---------

Co-authored-by: Andreas Motl <andreas.motl@crate.io>
2024-02-07 22:23:43 -05:00
Tomaz Bratanic
7e4b676d53 community[patch]: Better error propagation for neo4jgraph (#17190)
There are other errors that could happen when refreshing the schema, so
we want to propagate specific errors for more clarity
2024-02-07 22:16:14 -05:00
Dmitry Kankalovich
f92738a6f6 langchain[minor], community[minor], core[minor]: Async Cache support and AsyncRedisCache (#15817)
* This PR adds async methods to the LLM cache. 
* Adds an implementation using Redis called AsyncRedisCache.
* Adds a docker compose file at the /docker to help spin up docker
* Updates redis tests to use a context manager so flushing always happens by default
2024-02-07 22:06:09 -05:00
Bagatur
af74301ab9 core[patch], community[patch]: link extraction continue on failure (#17200) 2024-02-07 14:15:30 -08:00
Frank
ef082c77b1 community[minor]: add github file loader to load any github file content b… (#15305)
### Description
support load any github file content based on file extension.  

Why not use [git
loader](https://python.langchain.com/docs/integrations/document_loaders/git#load-existing-repository-from-disk)
?
git loader clones the whole repo even only interested part of files,
that's too heavy. This GithubFileLoader only downloads that you are
interested files.

### Twitter handle
my twitter: @shufanhaotop

---------

Co-authored-by: Hao Fan <h_fan@apple.com>
Co-authored-by: Bagatur <baskaryan@gmail.com>
2024-02-06 09:42:33 -08:00
Ryan Kraus
f027696b5f community: Added new Utility runnables for NVIDIA Riva. (#15966)
**Please tag this issue with `nvidia_genai`**

- **Description:** Added new Runnables for integration NVIDIA Riva into
LCEL chains for Automatic Speech Recognition (ASR) and Text To Speech
(TTS).
- **Issue:** N/A
- **Dependencies:** To use these runnables, the NVIDIA Riva client
libraries are required. It they are not installed, an error will be
raised instructing how to install them. The Runnables can be safely
imported without the riva client libraries.
- **Twitter handle:** N/A

All of the Riva Runnables are inside a single folder in the Utilities
module. In this folder are four files:
- common.py - Contains all code that is common to both TTS and ASR
- stream.py - Contains a class representing an audio stream that allows
the end user to put data into the stream like a queue.
- asr.py - Contains the RivaASR runnable
- tts.py - Contains the RivaTTS runnable

The following Python function is an example of creating a chain that
makes use of both of these Runnables:

```python
def create(
    config: Configuration,
    audio_encoding: RivaAudioEncoding,
    sample_rate: int,
    audio_channels: int = 1,
) -> Runnable[ASRInputType, TTSOutputType]:
    """Create a new instance of the chain."""
    _LOGGER.info("Instantiating the chain.")

    # create the riva asr client
    riva_asr = RivaASR(
        url=str(config.riva_asr.service.url),
        ssl_cert=config.riva_asr.service.ssl_cert,
        encoding=audio_encoding,
        audio_channel_count=audio_channels,
        sample_rate_hertz=sample_rate,
        profanity_filter=config.riva_asr.profanity_filter,
        enable_automatic_punctuation=config.riva_asr.enable_automatic_punctuation,
        language_code=config.riva_asr.language_code,
    )

    # create the prompt template
    prompt = PromptTemplate.from_template("{user_input}")

    # model = ChatOpenAI()
    model = ChatNVIDIA(model="mixtral_8x7b")  # type: ignore

    # create the riva tts client
    riva_tts = RivaTTS(
        url=str(config.riva_asr.service.url),
        ssl_cert=config.riva_asr.service.ssl_cert,
        output_directory=config.riva_tts.output_directory,
        language_code=config.riva_tts.language_code,
        voice_name=config.riva_tts.voice_name,
    )

    # construct and return the chain
    return {"user_input": riva_asr} | prompt | model | riva_tts  # type: ignore
```

The following code is an example of creating a new audio stream for
Riva:

```python
input_stream = AudioStream(maxsize=1000)
# Send bytes into the stream
for chunk in audio_chunks:
    await input_stream.aput(chunk)
input_stream.close()
```

The following code is an example of how to execute the chain with
RivaASR and RivaTTS

```python
output_stream = asyncio.Queue()
while not input_stream.complete:
    async for chunk in chain.astream(input_stream):
        output_stream.put(chunk)    
```

Everything should be async safe and thread safe. Audio data can be put
into the input stream while the chain is running without interruptions.

---------

Co-authored-by: Hayden Wolff <hwolff@nvidia.com>
Co-authored-by: Hayden Wolff <hwolff@Haydens-Laptop.local>
Co-authored-by: Hayden Wolff <haydenwolff99@gmail.com>
Co-authored-by: Erick Friis <erick@langchain.dev>
2024-02-05 19:50:50 -08:00
François Paupier
929f071513 community[patch]: Fix error in LlamaCpp community LLM with Configurable Fields, 'grammar' custom type not available (#16995)
- **Description:** Ensure the `LlamaGrammar` custom type is always
available when instantiating a `LlamaCpp` LLM
  - **Issue:** #16994 
  - **Dependencies:** None
  - **Twitter handle:** @fpaupier

---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
2024-02-05 17:56:58 -08:00
Alex Boury
334b6ebdf3 community[minor]: Breebs docs retriever (#16578)
- **Description:** Implementation of breeb retriever with integration
tests ->
libs/community/tests/integration_tests/retrievers/test_breebs.py and
documentation (notebook) ->
docs/docs/integrations/retrievers/breebs.ipynb.
  - **Dependencies:** None
2024-02-05 15:51:08 -08:00
Serena Ruan
9b279ac127 community[patch]: MLflow callback update (#16687)
Signed-off-by: Serena Ruan <serena.rxy@gmail.com>
Co-authored-by: Bagatur <baskaryan@gmail.com>
2024-02-05 15:46:46 -08:00
Mohammad Mohtashim
3c4b24b69a community[patch]: Fix the _call of HuggingFaceHub (#16891)
Fixed the following identified issue: #16849

@baskaryan
2024-02-05 15:34:42 -08:00
Tyler Titsworth
304f3f5fc1 community[patch]: Add Progress bar to HuggingFaceEmbeddings (#16758)
- **Description:** Adds a function parameter to HuggingFaceEmbeddings
called `show_progress` that enables a `tqdm` progress bar if enabled.
Does not function if `multi_process = True`.
  - **Issue:** n/a
  - **Dependencies:** n/a
2024-02-05 14:33:34 -08:00
Supreet Takkar
ae33979813 community[patch]: Allow adding ARNs as model_id to support Amazon Bedrock custom models (#16800)
- **Description:** Adds an additional class variable to `BedrockBase`
called `provider` that allows sending a model provider such as amazon,
cohere, ai21, etc.
Up until now, the model provider is extracted from the `model_id` using
the first part before the `.`, such as `amazon` for
`amazon.titan-text-express-v1` (see [supported list of Bedrock model IDs
here](https://docs.aws.amazon.com/bedrock/latest/userguide/model-ids-arns.html)).
But for custom Bedrock models where the ARN of the provisioned
throughput must be supplied, the `model_id` is like
`arn:aws:bedrock:...` so the `model_id` cannot be extracted from this. A
model `provider` is required by the LangChain Bedrock class to perform
model-based processing. To allow the same processing to be performed for
custom-models of a specific base model type, passing this `provider`
argument can help solve the issues.
The alternative considered here was the use of
`provider.arn:aws:bedrock:...` which then requires ARN to be extracted
and passed separately when invoking the model. The proposed solution
here is simpler and also does not cause issues for current models
already using the Bedrock class.
  - **Issue:** N/A
  - **Dependencies:** N/A

---------

Co-authored-by: Piyush Jain <piyushjain@duck.com>
2024-02-05 14:28:03 -08:00
Bagatur
66e45e8ab7 community[patch]: chat model mypy fixes (#17061)
Related to #17048
2024-02-05 13:42:59 -08:00
Bagatur
d93de71d08 community[patch]: chat message history mypy fixes (#17059)
Related to #17048
2024-02-05 13:13:25 -08:00
Bagatur
af5ae24af2 community[patch]: callbacks mypy fixes (#17058)
Related to #17048
2024-02-05 12:37:27 -08:00
Bagatur
e7b3290d30 community[patch]: fix agent_toolkits mypy (#17050)
Related to #17048
2024-02-05 11:56:24 -08:00
Erick Friis
6ffd5b15bc pinecone: init pkg (#16556)
<!-- Thank you for contributing to LangChain!

Please title your PR "<package>: <description>", where <package> is
whichever of langchain, community, core, experimental, etc. is being
modified.

Replace this entire comment with:
  - **Description:** a description of the change, 
  - **Issue:** the issue # it fixes if applicable,
  - **Dependencies:** any dependencies required for this change,
- **Twitter handle:** we announce bigger features on Twitter. If your PR
gets announced, and you'd like a mention, we'll gladly shout you out!

Please make sure your PR is passing linting and testing before
submitting. Run `make format`, `make lint` and `make test` from the root
of the package you've modified to check this locally.

See contribution guidelines for more information on how to write/run
tests, lint, etc: https://python.langchain.com/docs/contributing/

If you're adding a new integration, please include:
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory.

If no one reviews your PR within a few days, please @-mention one of
@baskaryan, @eyurtsev, @hwchase17.
 -->
2024-02-05 11:55:01 -08:00
Harrison Chase
4eda647fdd infra: add -p to mkdir in lint steps (#17013)
Previously, if this did not find a mypy cache then it wouldnt run

this makes it always run

adding mypy ignore comments with existing uncaught issues to unblock other prs

---------

Co-authored-by: Erick Friis <erick@langchain.dev>
Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
2024-02-05 11:22:06 -08:00
Christophe Bornet
2ef69fe11b Add async methods to BaseChatMessageHistory and BaseMemory (#16728)
Adds:
   * async methods to BaseChatMessageHistory
   * async methods to ChatMessageHistory
   * async methods to BaseMemory
   * async methods to BaseChatMemory
   * async methods to ConversationBufferMemory
   * tests of ConversationBufferMemory's async methods

  **Twitter handle:** cbornet_
2024-02-05 13:20:28 -05:00
Killinsun - Ryota Takeuchi
bcfce146d8 community[patch]: Correct the calling to collection_name in qdrant (#16920)
## Description

In #16608, the calling `collection_name` was wrong.
I made a fix for it. 
Sorry for the inconvenience!

## Issue

https://github.com/langchain-ai/langchain/issues/16962

## Dependencies

N/A



<!-- Thank you for contributing to LangChain!

Please title your PR "<package>: <description>", where <package> is
whichever of langchain, community, core, experimental, etc. is being
modified.

Replace this entire comment with:
  - **Description:** a description of the change, 
  - **Issue:** the issue # it fixes if applicable,
  - **Dependencies:** any dependencies required for this change,
- **Twitter handle:** we announce bigger features on Twitter. If your PR
gets announced, and you'd like a mention, we'll gladly shout you out!

Please make sure your PR is passing linting and testing before
submitting. Run `make format`, `make lint` and `make test` from the root
of the package you've modified to check this locally.

See contribution guidelines for more information on how to write/run
tests, lint, etc: https://python.langchain.com/docs/contributing/

If you're adding a new integration, please include:
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory.

If no one reviews your PR within a few days, please @-mention one of
@baskaryan, @eyurtsev, @hwchase17.
 -->

---------

Co-authored-by: Kumar Shivendu <kshivendu1@gmail.com>
Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
2024-02-04 10:45:35 -08:00
Erick Friis
b1a847366c community: revert SQL Stores (#16912)
This reverts commit cfc225ecb3.


https://github.com/langchain-ai/langchain/pull/15909#issuecomment-1922418097

These will have existed in langchain-community 0.0.16 and 0.0.17.
2024-02-01 16:37:40 -08:00
Leonid Ganeline
c2ca6612fe refactor langchain.prompts.example_selector (#15369)
The `langchain.prompts.example_selector` [still holds several
artifacts](https://api.python.langchain.com/en/latest/langchain_api_reference.html#module-langchain.prompts)
that belongs to `community`. If they moved to
`langchain_community.example_selectors`, the `langchain.prompts`
namespace would be effectively removed which is great.
- moved a class and afunction to `langchain_community`

Note:
- Previously, the `langchain.prompts.example_selector` artifacts were
moved into the `langchain_core.exampe_selectors`. See the flattened
namespace (`.prompts` was removed)!
Similar flattening was implemented for the `langchain_core` as the
`langchain_core.exampe_selectors`.

---------

Co-authored-by: Erick Friis <erick@langchain.dev>
2024-02-01 12:05:57 -08:00
Christophe Bornet
9d458d089a community: Factorize AstraDB components constructors (#16779)
* Adds `AstraDBEnvironment` class and use it in `AstraDBLoader`,
`AstraDBCache`, `AstraDBSemanticCache`, `AstraDBBaseStore` and
`AstraDBChatMessageHistory`
* Create an `AsyncAstraDB` if we only have an `AstraDB` and vice-versa
so:
  * we always have an instance of `AstraDB`
* we always have an instance of `AsyncAstraDB` for recent versions of
astrapy
* Create collection if not exists in `AstraDBBaseStore`
* Some typing improvements

Note: `AstraDB` `VectorStore` not using `AstraDBEnvironment` at the
moment. This will be done after the `langchain-astradb` package is out.
2024-02-01 10:51:07 -08:00
Christophe Bornet
af8c5c185b langchain[minor],community[minor]: Add async methods in BaseLoader (#16634)
Adds:
* methods `aload()` and `alazy_load()` to interface `BaseLoader`
* implementation for class `MergedDataLoader `
* support for class `BaseLoader` in async function `aindex()` with unit
tests

Note: this is compatible with existing `aload()` methods that some
loaders already had.

**Twitter handle:** @cbornet_

---------

Co-authored-by: Eugene Yurtsev <eugene@langchain.dev>
2024-01-31 11:08:11 -08:00
Raphael
bf9068516e community[minor]: add the ability to load existing transcripts from AssemblyAI by their id. (#16051)
- **Description:** the existing AssemblyAI API allows to pass a path or
an url to transcribe an audio file and turn in into Langchain Documents,
this PR allows to get existing transcript by their transcript id and
turn them into Documents.
  - **Issue:** not related to an existing issue
  - **Dependencies:** requests

---------

Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
2024-01-30 13:47:45 -08:00
Bagatur
daf820c77b community[patch]: undo create_sql_agent breaking (#16797) 2024-01-30 10:00:52 -08:00
Bagatur
b0347f3e2b docs: add csv use case (#16756) 2024-01-30 09:39:46 -08:00
Alexander Conway
4acd2654a3 Report which file was errored on in DirectoryLoader (#16790)
The current implementation leaves it up to the particular file loader
implementation to report the file on which an error was encountered - in
my case pdfminer was simply saying it could not parse a file as a PDF,
but I didn't know which of my hundreds of files it was failing on.

No reason not to log the particular item on which an error was
encountered, and it should be an immense debugging assistant.

<!-- Thank you for contributing to LangChain!

Please title your PR "<package>: <description>", where <package> is
whichever of langchain, community, core, experimental, etc. is being
modified.

Replace this entire comment with:
  - **Description:** a description of the change, 
  - **Issue:** the issue # it fixes if applicable,
  - **Dependencies:** any dependencies required for this change,
- **Twitter handle:** we announce bigger features on Twitter. If your PR
gets announced, and you'd like a mention, we'll gladly shout you out!

Please make sure your PR is passing linting and testing before
submitting. Run `make format`, `make lint` and `make test` from the root
of the package you've modified to check this locally.

See contribution guidelines for more information on how to write/run
tests, lint, etc: https://python.langchain.com/docs/contributing/

If you're adding a new integration, please include:
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory.

If no one reviews your PR within a few days, please @-mention one of
@baskaryan, @eyurtsev, @hwchase17.
 -->
2024-01-30 09:14:58 -08:00
Bob Lin
546b757303 community: Add ChatGLM3 (#15265)
Add [ChatGLM3](https://github.com/THUDM/ChatGLM3) and updated
[chatglm.ipynb](https://python.langchain.com/docs/integrations/llms/chatglm)

---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
2024-01-29 20:30:52 -08:00
Marina Pliusnina
a1ce7ab672 adding parameter for changing the language in SpacyEmbeddings (#15743)
Description: Added the parameter for a possibility to change a language
model in SpacyEmbeddings. The default value is still the same:
"en_core_web_sm", so it shouldn't affect a code which previously did not
specify this parameter, but it is not hard-coded anymore and easy to
change in case you want to use it with other languages or models.

Issue: At Barcelona Supercomputing Center in Aina project
(https://github.com/projecte-aina), a project for Catalan Language
Models and Resources, we would like to use Langchain for one of our
current projects and we would like to comment that Langchain, while
being a very powerful and useful open-source tool, is pretty much
focused on English language. We would like to contribute to make it a
bit more adaptable for using with other languages.

Dependencies: This change requires the Spacy library and a language
model, specified in the model parameter.

Tag maintainer: @dev2049

Twitter handle: @projecte_aina

---------

Co-authored-by: Marina Pliusnina <marina.pliusnina@bsc.es>
Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
2024-01-29 20:30:34 -08:00
Christophe Bornet
744070ee85 Add async methods for the AstraDB VectorStore (#16391)
- **Description**: fully async versions are available for astrapy 0.7+.
For older astrapy versions or if the user provides a sync client without
an async one, the async methods will call the sync ones wrapped in
`run_in_executor`
  - **Twitter handle:** cbornet_
2024-01-29 20:22:25 -08:00
baichuan-assistant
f8f2649f12 community: Add Baichuan LLM to community (#16724)
Replace this entire comment with:
- **Description:** Add Baichuan LLM to integration/llm, also updated
related docs.

Co-authored-by: BaiChuanHelper <wintergyc@WinterGYCs-MacBook-Pro.local>
2024-01-29 20:08:24 -08:00
thiswillbeyourgithub
1d082359ee community: add support for callable filters in FAISS (#16190)
- **Description:**
Filtering in a FAISS vectorstores is very inflexible and doesn't allow
that many use case. I think supporting callable like this enables a lot:
regular expressions, condition on multiple keys etc. **Note** I had to
manually alter a test. I don't understand if it was falty to begin with
or if there is something funky going on.
- **Issue:** None
- **Dependencies:** None
- **Twitter handle:** None

Signed-off-by: thiswillbeyourgithub <26625900+thiswillbeyourgithub@users.noreply.github.com>
2024-01-29 20:05:56 -08:00
Killinsun - Ryota Takeuchi
52f4ad8216 community: Add new fields in metadata for qdrant vector store (#16608)
## Description

The PR is to return the ID and collection name from qdrant client to
metadata field in `Document` class.

## Issue

The motivation is almost same to
[11592](https://github.com/langchain-ai/langchain/issues/11592)

Returning ID is useful to update existing records in a vector store, but
we cannot know them if we use some retrievers.

In order to avoid any conflicts, breaking changes, the new fields in
metadata have a prefix `_`

## Dependencies

N/A

## Twitter handle

@kill_in_sun

<!-- Thank you for contributing to LangChain!

Please title your PR "<package>: <description>", where <package> is
whichever of langchain, community, core, experimental, etc. is being
modified.

Replace this entire comment with:
  - **Description:** a description of the change, 
  - **Issue:** the issue # it fixes if applicable,
  - **Dependencies:** any dependencies required for this change,
- **Twitter handle:** we announce bigger features on Twitter. If your PR
gets announced, and you'd like a mention, we'll gladly shout you out!

Please make sure your PR is passing linting and testing before
submitting. Run `make format`, `make lint` and `make test` from the root
of the package you've modified to check this locally.

See contribution guidelines for more information on how to write/run
tests, lint, etc: https://python.langchain.com/docs/contributing/

If you're adding a new integration, please include:
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory.

If no one reviews your PR within a few days, please @-mention one of
@baskaryan, @eyurtsev, @hwchase17.
 -->
2024-01-29 19:59:54 -08:00
hulitaitai
32cad38ec6 <langchain_community\llms\chatglm.py>: <Correcting "history"> (#16729)
Use the real "history" provided by the original program instead of
putting "None" in the history.

- **Description:** I change one line in the code to make it return the
"history" of the chat model.
- **Issue:** At the moment it returns only the answers of the chat
model. However the chat model himself provides a history more complet
with the questions of the user.
  - **Dependencies:** no dependencies required for this change,
2024-01-29 19:50:31 -08:00
Bassem Yacoube
85e93e05ed community[minor]: Update OctoAI LLM, Embedding and documentation (#16710)
This PR includes updates for OctoAI integrations:
- The LLM class was updated to fix a bug that occurs with multiple
sequential calls
- The Embedding class was updated to support the new GTE-Large endpoint
released on OctoAI lately
- The documentation jupyter notebook was updated to reflect using the
new LLM sdk
Thank you!
2024-01-29 13:57:17 -08:00
Volodymyr Machula
32c5be8b73 community[minor]: Connery Tool and Toolkit (#14506)
## Summary

This PR implements the "Connery Action Tool" and "Connery Toolkit".
Using them, you can integrate Connery actions into your LangChain agents
and chains.

Connery is an open-source plugin infrastructure for AI.

With Connery, you can easily create a custom plugin with a set of
actions and seamlessly integrate them into your LangChain agents and
chains. Connery will handle the rest: runtime, authorization, secret
management, access management, audit logs, and other vital features.
Additionally, Connery and our community offer a wide range of
ready-to-use open-source plugins for your convenience.

Learn more about Connery:

- GitHub: https://github.com/connery-io/connery-platform
- Documentation: https://docs.connery.io
- Twitter: https://twitter.com/connery_io

## TODOs

- [x] API wrapper
   - [x] Integration tests
- [x] Connery Action Tool
   - [x] Docs
   - [x] Example
   - [x] Integration tests
- [x] Connery Toolkit
  - [x] Docs
  - [x] Example
- [x] Formatting (`make format`)
- [x] Linting (`make lint`)
- [x] Testing (`make test`)
2024-01-29 12:45:03 -08:00
Harrison Chase
8457c31c04 community[patch]: activeloop ai tql deprecation (#14634)
Co-authored-by: AdkSarsen <adilkhan@activeloop.ai>
2024-01-29 12:43:54 -08:00
Neli Hateva
c95facc293 langchain[minor], community[minor]: Implement Ontotext GraphDB QA Chain (#16019)
- **Description:** Implement Ontotext GraphDB QA Chain
  - **Issue:** N/A
  - **Dependencies:** N/A
  - **Twitter handle:** @OntotextGraphDB
2024-01-29 12:25:53 -08:00
Elliot
39eb00d304 community[patch]: Adapt more parameters related to MemorySearchPayload for the search method of ZepChatMessageHistory (#15441)
- **Description:** To adapt more parameters related to
MemorySearchPayload for the search method of ZepChatMessageHistory,
  - **Issue:** None,
  - **Dependencies:** None,
  - **Twitter handle:** None
2024-01-29 11:45:55 -08:00
Jael Gu
a1aa3a657c community[patch]: Milvus supports add & delete texts by ids (#16256)
# Description

To support [langchain
indexing](https://python.langchain.com/docs/modules/data_connection/indexing)
as requested by users, vectorstore Milvus needs to support:
- document addition by id (`add_documents` method with `ids` argument)
- delete by id (`delete` method with `ids` argument)

Example usage:

```python
from langchain.indexes import SQLRecordManager, index
from langchain.schema import Document
from langchain_community.vectorstores import Milvus
from langchain_openai import OpenAIEmbeddings

collection_name = "test_index"
embedding = OpenAIEmbeddings()
vectorstore = Milvus(embedding_function=embedding, collection_name=collection_name)

namespace = f"milvus/{collection_name}"
record_manager = SQLRecordManager(
    namespace, db_url="sqlite:///record_manager_cache.sql"
)
record_manager.create_schema()

doc1 = Document(page_content="kitty", metadata={"source": "kitty.txt"})
doc2 = Document(page_content="doggy", metadata={"source": "doggy.txt"})

index(
    [doc1, doc1, doc2],
    record_manager,
    vectorstore,
    cleanup="incremental",  # None, "incremental", or "full"
    source_id_key="source",
)
```

# Fix issues

Fix https://github.com/milvus-io/milvus/issues/30112

---------

Signed-off-by: Jael Gu <mengjia.gu@zilliz.com>
Co-authored-by: Bagatur <baskaryan@gmail.com>
2024-01-29 11:19:50 -08:00
Michard Hugo
e9d3527b79 community[patch]: Add missing async similarity_distance_threshold handling in RedisVectorStoreRetriever (#16359)
Add missing async similarity_distance_threshold handling in
RedisVectorStoreRetriever

- **Description:** added method `_aget_relevant_documents` to
`RedisVectorStoreRetriever` that overrides parent method to add support
of `similarity_distance_threshold` in async mode (as for sync mode)
  - **Issue:** #16099
  - **Dependencies:** N/A
  - **Twitter handle:** N/A
2024-01-29 11:19:30 -08:00
Anthony Bernabeu
2db79ab111 community[patch]: Implement TTL for DynamoDBChatMessageHistory (#15478)
- **Description:** Implement TTL for DynamoDBChatMessageHistory, 
  - **Issue:** see #15477,
  - **Dependencies:** N/A,

---------

Co-authored-by: Piyush Jain <piyushjain@duck.com>
2024-01-29 10:22:46 -08:00
taimo
d3d9244fee langchain-community: fix unicode escaping issue with SlackToolkit (#16616)
- **Description:** fix unicode escaping issue with SlackToolkit
  - **Issue:**  #16610
2024-01-29 08:38:12 -08:00
Benito Geordie
f3fdc5c5da community: Added integrations for ThirdAI's NeuralDB with Retriever and VectorStore frameworks (#15280)
**Description:** Adds ThirdAI NeuralDB retriever and vectorstore
integration. NeuralDB is a CPU-friendly and fine-tunable text retrieval
engine.
2024-01-29 08:35:42 -08:00
Pashva Mehta
22d90800c8 community: Fixed schema discrepancy in from_texts function for weaviate vectorstore (#16693)
* Description: Fixed schema discrepancy in **from_texts** function for
weaviate vectorstore which created a redundant property "key" inside a
class.
* Issue: Fixed: https://github.com/langchain-ai/langchain/issues/16692
* Twitter handle: @pashvamehta1
2024-01-28 16:53:31 -08:00
Daniel Erenrich
0600998f38 community: Wikidata tool support (#16691)
- **Description:** Adds Wikidata support to langchain. Can read out
documents from Wikidata.
  - **Issue:** N/A
- **Dependencies:** Adds implicit dependencies for
`wikibase-rest-api-client` (for turning items into docs) and
`mediawikiapi` (for hitting the search endpoint)
  - **Twitter handle:** @derenrich

You can see an example of this tool used in a chain
[here](https://nbviewer.org/urls/d.erenrich.net/upload/Wikidata_Langchain.ipynb)
or
[here](https://nbviewer.org/urls/d.erenrich.net/upload/Wikidata_Lars_Kai_Hansen.ipynb)

<!-- Thank you for contributing to LangChain!


Please make sure your PR is passing linting and testing before
submitting. Run `make format`, `make lint` and `make test` from the root
of the package you've modified to check this locally.

See contribution guidelines for more information on how to write/run
tests, lint, etc: https://python.langchain.com/docs/contributing/

If you're adding a new integration, please include:
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory.

If no one reviews your PR within a few days, please @-mention one of
@baskaryan, @eyurtsev, @hwchase17.
 -->
2024-01-28 16:45:21 -08:00
Christophe Bornet
2e3af04080 Use Postponed Evaluation of Annotations in Astra and Cassandra doc loaders (#16694)
Minor/cosmetic change
2024-01-28 16:39:27 -08:00
Christophe Bornet
36e432672a community[minor]: Add async methods to AstraDBLoader (#16652) 2024-01-27 17:05:41 -08:00
Rashedul Hasan Rijul
481493dbce community[patch]: apply embedding functions during query if defined (#16646)
**Description:** This update ensures that the user-defined embedding
function specified during vector store creation is applied during
queries. Previously, even if a custom embedding function was defined at
the time of store creation, Bagel DB would default to using the standard
embedding function during query execution. This pull request addresses
this issue by consistently using the user-defined embedding function for
queries if one has been specified earlier.
2024-01-27 16:46:33 -08:00