Commit Graph

2272 Commits

Author SHA1 Message Date
Davis Chase
a077fddda0 bump 187 (#5504) 2023-05-31 10:13:53 -07:00
Harrison Chase
17df1cbf70 add more vars to text splitter (#5503) 2023-05-31 10:13:21 -07:00
Piyush Jain
ece2a04ced Bedrock llm and embeddings (#5464)
# Bedrock LLM and Embeddings
This PR adds a new LLM and an Embeddings class for the
[Bedrock](https://aws.amazon.com/bedrock) service. The PR also includes
example notebooks for using the LLM class in a conversation chain and
embeddings usage in creating an embedding for a query and document.

**Note**: AWS is doing a private release of the Bedrock service on
05/31/2023; users need to request access and added to an allowlist in
order to start using the Bedrock models and embeddings. Please use the
[Bedrock Home Page](https://aws.amazon.com/bedrock) to request access
and to learn more about the models available in Bedrock.

<!-- For a quicker response, figure out the right person to tag with @

  @hwchase17 - project lead

  Tracing / Callbacks
  - @agola11

  Async
  - @agola11

  DataLoaders
  - @eyurtsev

  Models
  - @hwchase17
  - @agola11

  Agents / Tools / Toolkits
  - @vowelparrot

  VectorStores / Retrievers / Memory
  - @dev2049

 -->
2023-05-31 10:13:21 -07:00
Harrison Chase
a047400ca1 code splitter docs (#5480)
Co-authored-by: Dev 2049 <dev.dev2049@gmail.com>
2023-05-31 10:13:21 -07:00
Harrison Chase
77de60366b Add matching engine vectorstore (#3350)
Co-authored-by: Tom Piaggio <tomaspiaggio@google.com>
Co-authored-by: scafati98 <jupyter@matchingengine.us-central1-a.c.scafati-joonix.internal>
Co-authored-by: scafati98 <scafatieugenio@gmail.com>
Co-authored-by: Dev 2049 <dev.dev2049@gmail.com>
2023-05-31 10:13:21 -07:00
Kacper Łukawski
36457c7d5b Feature: Qdrant filters supports (#5446)
# Support Qdrant filters

Qdrant has an [extensive filtering
system](https://qdrant.tech/documentation/concepts/filtering/) with rich
type support. This PR makes it possible to use the filters in Langchain
by passing an additional param to both the
`similarity_search_with_score` and `similarity_search` methods.

## Who can review?

@dev2049 @hwchase17

---------

Co-authored-by: Dev 2049 <dev.dev2049@gmail.com>
2023-05-31 10:13:21 -07:00
Harrison Chase
65619ce90c Harrison/html splitter (#5468)
Co-authored-by: David Revillas <26328973+r3v1@users.noreply.github.com>
2023-05-31 10:13:21 -07:00
Ankush Gola
ec4c508d50 py tracer fixes (#5377) 2023-05-31 10:13:20 -07:00
Jose Ignacio Hervás Díaz
c9dc343062 SQLite-backed Entity Memory (#5129)
# SQLite-backed Entity Memory

Following the initiative of
https://github.com/hwchase17/langchain/pull/2397 I think it would be
helpful to be able to persist Entity Memory on disk by default

Co-authored-by: Dev 2049 <dev.dev2049@gmail.com>
2023-05-31 10:12:10 -07:00
Jeff Vestal
153df9dc36 Allow ElasticsearchEmbeddings to create a connection with ES Client object (#5321)
This PR adds a new method `from_es_connection` to the
`ElasticsearchEmbeddings` class allowing users to use Elasticsearch
clusters outside of Elastic Cloud.

Users can create an Elasticsearch Client object and pass that to the new
function.
The returned object is identical to the one returned by calling
`from_credentials`

```
# Create Elasticsearch connection
es_connection = Elasticsearch(
    hosts=['https://es_cluster_url:port'], 
    basic_auth=('user', 'password')
)

# Instantiate ElasticsearchEmbeddings using es_connection
embeddings = ElasticsearchEmbeddings.from_es_connection(
  model_id,
  es_connection,
)
```

I also added examples to the elasticsearch jupyter notebook

Fixes # https://github.com/hwchase17/langchain/issues/5239

---------

Co-authored-by: Dev 2049 <dev.dev2049@gmail.com>
2023-05-31 10:12:10 -07:00
Mark Pors
4b57854de2 Allow for async use of SelfAskWithSearchChain (#5394)
# Allow for async use of SelfAskWithSearchChain


Co-authored-by: Dev 2049 <dev.dev2049@gmail.com>
2023-05-31 10:12:10 -07:00
Víctor Navarro Aránguiz
a0f0602aec added n_threads functionality for gpt4all (#5427)
# Added support for modifying the number of threads in the GPT4All model

I have added the capability to modify the number of threads used by the
GPT4All model. This allows users to adjust the model's parallel
processing capabilities based on their specific requirements.

## Changes Made
- Updated the `validate_environment` method to set the number of threads
for the GPT4All model using the `values["n_threads"]` parameter from the
`GPT4All` class constructor.

## Context
Useful in scenarios where users want to optimize the model's performance
by leveraging multi-threading capabilities.
Please note that the `n_threads` parameter was included in the `GPT4All`
class constructor but was previously unused. This change ensures that
the specified number of threads is utilized by the model .

## Dependencies
There are no new dependencies introduced by this change. It only
utilizes existing functionality provided by the GPT4All package.

## Testing
Since this is a minor change testing is not required.

---------

Co-authored-by: Dev 2049 <dev.dev2049@gmail.com>
2023-05-31 10:12:10 -07:00
Blithe
05e137e39e convert the parameter 'text' to uppercase in the function 'parse' of the class BooleanOutputParser (#5397)
when the LLMs output 'yes|no',BooleanOutputParser can parse it to
'True|False', fix the ValueError in parse().
<!--
when use the BooleanOutputParser in the chain_filter.py, the LLMs output
'yes|no',the function 'parse' will throw ValueError。
-->

Fixes # (issue)
  #5396
  https://github.com/hwchase17/langchain/issues/5396

---------

Co-authored-by: gaofeng27692 <gaofeng27692@hundsun.com>
2023-05-31 10:12:10 -07:00
Natalie
75d09059e5 Ability to specify credentials wihen using Google BigQuery as a data loader (#5466)
# Adds ability to specify credentials when using Google BigQuery as a
data loader

Fixes #5465 . Adds ability to set credentials which must be of the
`google.auth.credentials.Credentials` type. This argument is optional
and will default to `None.

Co-authored-by: Dev 2049 <dev.dev2049@gmail.com>
2023-05-31 10:12:10 -07:00
Harrison Chase
354176e835 add simple test for imports (#5461)
Co-authored-by: Dev 2049 <dev.dev2049@gmail.com>
2023-05-31 10:12:10 -07:00
Janos Tolgyesi
f1519460f6 Add maximal relevance search to SKLearnVectorStore (#5430)
# Add maximal relevance search to SKLearnVectorStore

This PR implements the maximum relevance search in SKLearnVectorStore. 

Twitter handle: jtolgyesi (I submitted also the original implementation
of SKLearnVectorStore)

## Before submitting

Unit tests are included.

Co-authored-by: Dev 2049 <dev.dev2049@gmail.com>
2023-05-31 10:12:10 -07:00
Ayan Bandyopadhyay
3065fab122 Update psychicapi version (#5471)
Update [psychicapi](https://pypi.org/project/psychicapi/) python package
dependency to the latest version 0.5. The newest python package version
addresses breaking changes in the Psychic http api.
2023-05-31 10:12:10 -07:00
Kacper Łukawski
5a179a6dda Feat: Add batching to Qdrant (#5443)
# Add batching to Qdrant

Several people requested a batching mechanism while uploading data to
Qdrant. It is important, as there are some limits for the maximum size
of the request payload, and without batching implemented in Langchain,
users need to implement it on their own. This PR exposes a new optional
`batch_size` parameter, so all the documents/texts are loaded in batches
of the expected size (64, by default).

The integration tests of Qdrant are extended to cover two cases:
1. Documents are sent in separate batches.
2. All the documents are sent in a single request.
2023-05-31 10:12:10 -07:00
Camille Van Hoffelen
cc30791a2a Added async _acall to FakeListLLM (#5439)
# Added Async _acall to FakeListLLM

FakeListLLM is handy when unit testing apps built with langchain. This
allows the use of FakeListLLM inside concurrent code with
[asyncio](https://docs.python.org/3/library/asyncio.html).

I also changed the pydocstring which was out of date.

## Who can review?

@hwchase17 - project lead
@agola11 - async
2023-05-31 10:12:10 -07:00
Leonid Ganeline
62446aacb5 docs: cleaning (#5413)
# docs cleaning

Changed docs to consistent format (probably, we need an official doc
integration template):
- ClearML - added product descriptions; changed title/headers
- Rebuff  - added product descriptions; changed title/headers
- WhyLabs  - added product descriptions; changed title/headers
- Docugami - changed title/headers/structure
- Airbyte - fixed title
- Wolfram Alpha - added descriptions, fixed title
- OpenWeatherMap -  - added product descriptions; changed title/headers
- Unstructured - changed description

## Who can review?

Community members can review the PR once tests pass. Tag
maintainers/contributors who might be interested:

@hwchase17
@dev2049
2023-05-31 10:12:10 -07:00
Matt Wells
7054826246 MRKL output parser no longer breaks well formed queries (#5432)
# Handles the edge scenario in which the action input is a well formed
SQL query which ends with a quoted column

There may be a cleaner option here (or indeed other edge scenarios) but
this seems to robustly determine if the action input is likely to be a
well formed SQL query in which we don't want to arbitrarily trim off `"`
characters

Fixes #5423

## Who can review?

Community members can review the PR once tests pass. Tag
maintainers/contributors who might be interested:

For a quicker response, figure out the right person to tag with @

  @hwchase17 - project lead

  Agents / Tools / Toolkits
  - @vowelparrot
2023-05-31 10:12:10 -07:00
Yoann Poupart
cade6f0fde encoding_kwargs for InstructEmbeddings (#5450)
# What does this PR do?

Bring support of `encode_kwargs` for ` HuggingFaceInstructEmbeddings`,
change the docstring example and add a test to illustrate with
`normalize_embeddings`.

Fixes #3605
(Similar to #3914)

Use case:
```python
from langchain.embeddings import HuggingFaceInstructEmbeddings

model_name = "hkunlp/instructor-large"
model_kwargs = {'device': 'cpu'}
encode_kwargs = {'normalize_embeddings': True}
hf = HuggingFaceInstructEmbeddings(
    model_name=model_name,
    model_kwargs=model_kwargs,
    encode_kwargs=encode_kwargs
)
```
2023-05-31 10:12:10 -07:00
Patrick Keane
395e359ecf Removes duplicated call from langchain/client/langchain.py (#5449)
This removes duplicate code presumably introduced by a cut-and-paste
error, spotted while reviewing the code in
```langchain/client/langchain.py```. The original code had back to back
occurrences of the following code block:

```
        response = self._get(
            path,
            params=params,
        )
        raise_for_status_with_text(response)
```
2023-05-31 10:12:10 -07:00
Jan Brinkmann
add07a3db5 Fixed docstring in faiss.py for load_local (#5440)
# Fix for docstring in faiss.py vectorstore (load_local)

The doctring should reflect that load_local loads something FROM the
disk.
2023-05-31 10:12:10 -07:00
Davis Chase
6ba8fcc5d6 bump 186 (#5459) 2023-05-31 10:12:10 -07:00
Davis Chase
f0ac0b83b1 fix (#5457) 2023-05-31 10:12:10 -07:00
Davis Chase
8805b090b0 bump 185 (#5442) 2023-05-31 10:12:10 -07:00
ByronHsu
de93fc91ab Add more code splitters (go, rst, js, java, cpp, scala, ruby, php, swift, rust) (#5171)
As the title says, I added more code splitters.
The implementation is trivial, so i don't add separate tests for each
splitter.
Let me know if any concerns.

Fixes # (issue)
https://github.com/hwchase17/langchain/issues/5170

## Who can review?

Community members can review the PR once tests pass. Tag
maintainers/contributors who might be interested:
@eyurtsev @hwchase17

---------

Signed-off-by: byhsu <byhsu@linkedin.com>
Co-authored-by: byhsu <byhsu@linkedin.com>
2023-05-31 10:12:10 -07:00
Paul-Emile Brotons
5cd2f833c3 adding MongoDBAtlasVectorSearch (#5338)
# Add MongoDBAtlasVectorSearch for the python library

Fixes #5337
---------

Co-authored-by: Dev 2049 <dev.dev2049@gmail.com>
2023-05-31 10:12:10 -07:00
Harrison Chase
fb379025d2 Harrison/condense q llm (#5438) 2023-05-31 10:12:10 -07:00
Lei Xu
e84f9ea812 Rename and fix typo in lancedb (#5425)
# Fix typo in LanceDB notebook filename
2023-05-31 10:12:10 -07:00
Zander Chase
f8b6bcb4a3 Set old LCTracer to default to port 8000 (#5381)
Issue from:
https://discord.com/channels/1038097195422978059/1069478035918688346/1112445980466483222
2023-05-31 10:12:10 -07:00
Harrison Chase
7c4a0aa97a Harrison/spark reader (#5405)
Co-authored-by: Rithwik Ediga Lakhamsani <rithwik.ediga@databricks.com>
Co-authored-by: Dev 2049 <dev.dev2049@gmail.com>
2023-05-31 10:12:10 -07:00
UmerHA
0cc2bd49d9 DocumentLoader for GitHub (#5408)
# Creates GitHubLoader (#5257)

GitHubLoader is a DocumentLoader that loads issues and PRs from GitHub.

Fixes #5257

---------

Co-authored-by: Dev 2049 <dev.dev2049@gmail.com>
2023-05-31 10:12:10 -07:00
German Martin
da04f08882 New Trello document loader (#4767)
# Added New Trello loader class and documentation

Simple Loader on top of py-trello wrapper. 
With a board name you can pull cards and to do some field parameter
tweaks on load operation.
I included documentation and examples.
Included unit test cases using patch and a fixture for py-trello client
class.

---------

Co-authored-by: Dev 2049 <dev.dev2049@gmail.com>
2023-05-31 10:12:10 -07:00
Harrison Chase
cb0a5cd98c Harrison/text splitter (#5417)
adds support for keeping separators around when using recursive text
splitter
2023-05-31 10:12:10 -07:00
小铭
368ffb3334 Add ToolException that a tool can throw. (#5050)
# Add ToolException that a tool can throw
This is an optional exception that tool throws when execution error
occurs.
When this exception is thrown, the agent will not stop working,but will
handle the exception according to the handle_tool_error variable of the
tool,and the processing result will be returned to the agent as
observation,and printed in pink on the console.It can be used like this:
```python 
from langchain.schema import ToolException
from langchain import LLMMathChain, SerpAPIWrapper, OpenAI
from langchain.agents import AgentType, initialize_agent
from langchain.chat_models import ChatOpenAI
from langchain.tools import BaseTool, StructuredTool, Tool, tool
from langchain.chat_models import ChatOpenAI

llm = ChatOpenAI(temperature=0)
llm_math_chain = LLMMathChain(llm=llm, verbose=True)

class Error_tool:
    def run(self, s: str):
        raise ToolException('The current search tool is not available.')
    
def handle_tool_error(error) -> str:
    return "The following errors occurred during tool execution:"+str(error)

search_tool1 = Error_tool()
search_tool2 = SerpAPIWrapper()
tools = [
    Tool.from_function(
        func=search_tool1.run,
        name="Search_tool1",
        description="useful for when you need to answer questions about current events.You should give priority to using it.",
        handle_tool_error=handle_tool_error,
    ),
    Tool.from_function(
        func=search_tool2.run,
        name="Search_tool2",
        description="useful for when you need to answer questions about current events",
        return_direct=True,
    )
]
agent = initialize_agent(tools, llm, agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION, verbose=True,
                         handle_tool_errors=handle_tool_error)
agent.run("Who is Leo DiCaprio's girlfriend? What is her current age raised to the 0.43 power?")
```

![image](https://github.com/hwchase17/langchain/assets/32786500/51930410-b26e-4f85-a1e1-e6a6fb450ada)

## Who can review?
- @vowelparrot

---------

Co-authored-by: Dev 2049 <dev.dev2049@gmail.com>
2023-05-31 10:12:10 -07:00
Harrison Chase
75efcc7e6d bump version 184 (#5407) 2023-05-31 10:12:10 -07:00
Harrison Chase
8ccce24bf4 Harrison/datetime parser (#4693)
Co-authored-by: Jacob Valdez <jacobfv@msn.com>
Co-authored-by: Jacob Valdez <jacob.valdez@limboid.ai>
Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>
2023-05-31 10:12:10 -07:00
Leonid Ganeline
118194b6f9 docs: ecosystem/integrations update 1 (#5219)
# docs: ecosystem/integrations update

It is the first in a series of `ecosystem/integrations` updates.

The ecosystem/integrations list is missing many integrations.
I'm adding the missing integrations in a consistent format: 
1. description of the integrated system
2. `Installation and Setup` section with 'pip install ...`, Key setup,
and other necessary settings
3. Sections like `LLM`, `Text Embedding Models`, `Chat Models`... with
links to correspondent examples and imports of the used classes.

This PR keeps new docs, that are presented in the
`docs/modules/models/text_embedding/examples` but missed in the
`ecosystem/integrations`. The next PRs will cover the next example
sections.

Also updated `integrations.rst`: added the `Dependencies` section with a
link to the packages used in LangChain.

## Who can review?

@hwchase17
@eyurtsev
@dev2049
2023-05-31 10:12:10 -07:00
Leonid Ganeline
e80fb4085b docs: ecosystem/integrations update 2 (#5282)
# docs: ecosystem/integrations update 2

#5219 - part 1 
The second part of this update (parts are independent of each other! no
overlap):

- added diffbot.md
- updated confluence.ipynb; added confluence.md
- updated college_confidential.md
- updated openai.md
- added blackboard.md
- added bilibili.md
- added azure_blob_storage.md
- added azlyrics.md
- added aws_s3.md

## Who can review?

@hwchase17@agola11
@agola11
 @vowelparrot
 @dev2049
2023-05-31 10:12:10 -07:00
Eduard van Valkenburg
ede5f1a78b Implemented appending arbitrary messages (#5293)
# Implemented appending arbitrary messages to the base chat message
history, the in-memory and cosmos ones.

<!--
Thank you for contributing to LangChain! Your PR will appear in our next
release under the title you set. Please make sure it highlights your
valuable contribution.

Replace this with a description of the change, the issue it fixes (if
applicable), and relevant context. List any dependencies required for
this change.

After you're done, someone will review your PR. They may suggest
improvements. If no one reviews your PR within a few days, feel free to
@-mention the same people again, as notifications can get lost.
-->

As discussed this is the alternative way instead of #4480, with a
add_message method added that takes a BaseMessage as input, so that the
user can control what is in the base message like kwargs.

<!-- Remove if not applicable -->

Fixes # (issue)

## Before submitting

<!-- If you're adding a new integration, include an integration test and
an example notebook showing its use! -->

## Who can review?

Community members can review the PR once tests pass. Tag
maintainers/contributors who might be interested:

@hwchase17

---------

Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
2023-05-31 10:12:10 -07:00
Harrison Chase
e8f75af025 Harrison/prediction guard update (#5404)
Co-authored-by: Daniel Whitenack <whitenack.daniel@gmail.com>
2023-05-31 10:12:10 -07:00
Harrison Chase
9fd0b5f699 Harrison/deep infra (#5403)
Co-authored-by: Yessen Kanapin <yessenzhar@gmail.com>
Co-authored-by: Yessen Kanapin <yessen@deepinfra.com>
2023-05-31 10:12:10 -07:00
Timothy Ji
d1fd6db81a Reformat openai proxy setting as code (#5330)
# Reformat the openai proxy setting as code


  Only affect the doc for openai Model
  - @hwchase17
  - @agola11
2023-05-31 10:12:10 -07:00
Justin Flick
9526699c62 Add pagination for Vertex AI embeddings (#5325)
Fixes #5316

---------

Co-authored-by: Justin Flick <jflick@homesite.com>
Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
2023-05-31 10:12:10 -07:00
Harrison Chase
31070269fd Harrison/llamacpp (#5402)
Co-authored-by: Gavin S <gavinswanson@gmail.com>
2023-05-31 10:12:10 -07:00
Chandan Routray
35f4bbf432 Removed deprecated llm attribute for load_chain (#5343)
# Removed deprecated llm attribute for load_chain

Currently `load_chain` for some chain types expect `llm` attribute to be
present but `llm` is deprecated attribute for those chains and might not
be persisted during their `chain.save`.

Fixes #5224
[(issue)](https://github.com/hwchase17/langchain/issues/5224)

## Who can review?
@hwchase17
@dev2049

---------

Co-authored-by: imeckr <chandanroutray2012@gmail.com>
2023-05-31 10:12:10 -07:00
Oleh Kuznetsov
d0e4def7bf Update llamacpp demonstration notebook (#5344)
# Update llamacpp demonstration notebook

Add instructions to install with BLAS backend, and update the example of
model usage.

Fixes #5071. However, it is more like a prevention of similar issues in
the future, not a fix, since there was no problem in the framework
functionality

## Who can review?

Community members can review the PR once tests pass. Tag
maintainers/contributors who might be interested:

- @hwchase17 
- @agola11
2023-05-31 10:12:10 -07:00
Martin Holecek
93df74f10b Fix update_document function, add test and documentation. (#5359)
# Fix for `update_document` Function in Chroma

## Summary
This pull request addresses an issue with the `update_document` function
in the Chroma class, as described in
[#5031](https://github.com/hwchase17/langchain/issues/5031#issuecomment-1562577947).
The issue was identified as an `AttributeError` raised when calling
`update_document` due to a missing corresponding method in the
`Collection` object. This fix refactors the `update_document` method in
`Chroma` to correctly interact with the `Collection` object.

## Changes
1. Fixed the `update_document` method in the `Chroma` class to correctly
call methods on the `Collection` object.
2. Added the corresponding test `test_chroma_update_document` in
`tests/integration_tests/vectorstores/test_chroma.py` to reflect the
updated method call.
3. Added an example and explanation of how to use the `update_document`
function in the Jupyter notebook tutorial for Chroma.

## Test Plan
All existing tests pass after this change. In addition, the
`test_chroma_update_document` test case now correctly checks the
functionality of `update_document`, ensuring that the function works as
expected and updates the content of documents correctly.

## Reviewers
@dev2049

This fix will ensure that users are able to use the `update_document`
function as expected, without encountering the previous
`AttributeError`. This will enhance the usability and reliability of the
Chroma class for all users.

Thank you for considering this pull request. I look forward to your
feedback and suggestions.
2023-05-31 10:12:10 -07:00