Compare commits

...

912 Commits

Author SHA1 Message Date
Harrison Chase
08eaf914d6 add markdown header metadata to docs 2023-06-16 06:54:51 -07:00
hp0404
b01cf0dd54 ArxivAPIWrapper - doc_content_chars_max (#6063)
This PR refactors the ArxivAPIWrapper class making
`doc_content_chars_max` parameter optional. Additionally, tests have
been added to ensure the functionality of the doc_content_chars_max
parameter.

Fixes #6027 (issue)
2023-06-15 22:16:42 -07:00
Daniel King
a9b97aa6f4 Update output format of MosaicML endpoint to be more flexible (#6060)
There will likely be another change or two coming over the next couple
weeks as we stabilize the API, but putting this one in now which just
makes the integration a bit more flexible with the response output
format.

```
(langchain) danielking@MML-1B940F4333E2 langchain % pytest tests/integration_tests/llms/test_mosaicml.py tests/integration_tests/embeddings/test_mosaicml.py 
=================================================================================== test session starts ===================================================================================
platform darwin -- Python 3.10.11, pytest-7.3.1, pluggy-1.0.0
rootdir: /Users/danielking/github/langchain
configfile: pyproject.toml
plugins: asyncio-0.20.3, mock-3.10.0, dotenv-0.5.2, cov-4.0.0, anyio-3.6.2
asyncio: mode=strict
collected 12 items                                                                                                                                                                        

tests/integration_tests/llms/test_mosaicml.py ......                                                                                                                                [ 50%]
tests/integration_tests/embeddings/test_mosaicml.py ......                                                                                                                          [100%]

=================================================================================== slowest 5 durations ===================================================================================
4.76s call     tests/integration_tests/llms/test_mosaicml.py::test_retry_logic
4.74s call     tests/integration_tests/llms/test_mosaicml.py::test_mosaicml_llm_call
4.13s call     tests/integration_tests/llms/test_mosaicml.py::test_instruct_prompt
0.91s call     tests/integration_tests/llms/test_mosaicml.py::test_short_retry_does_not_loop
0.66s call     tests/integration_tests/llms/test_mosaicml.py::test_mosaicml_extra_kwargs
=================================================================================== 12 passed in 19.70s ===================================================================================
```

#### Who can review?

  @hwchase17
  @dev2049
2023-06-15 22:15:39 -07:00
JaysonAlbert
50d9c7d5a4 Fix: change the chatgpt plugin retriever metadata format (#5920)
the current implement put the doc itself as the metadata, but the
document chatgpt plugin retriever returned already has a `metadata`
field, it's better to use that instead.

the original code will throw the following exception when using
`RetrievalQAWithSourcesChain`, becuse it can not find the field
`metadata`:

```python
Exception has occurred: ValueError       (note: full exception trace is shown but execution is paused at: _run_module_as_main)
Document prompt requires documents to have metadata variables: ['source']. Received document with missing metadata: ['source'].
  File "/home/wangjie/anaconda3/envs/chatglm/lib/python3.10/site-packages/langchain/chains/combine_documents/base.py", line 27, in format_document
    raise ValueError(
  File "/home/wangjie/anaconda3/envs/chatglm/lib/python3.10/site-packages/langchain/chains/combine_documents/stuff.py", line 65, in <listcomp>
    doc_strings = [format_document(doc, self.document_prompt) for doc in docs]
  File "/home/wangjie/anaconda3/envs/chatglm/lib/python3.10/site-packages/langchain/chains/combine_documents/stuff.py", line 65, in _get_inputs
    doc_strings = [format_document(doc, self.document_prompt) for doc in docs]
  File "/home/wangjie/anaconda3/envs/chatglm/lib/python3.10/site-packages/langchain/chains/combine_documents/stuff.py", line 85, in combine_docs
    inputs = self._get_inputs(docs, **kwargs)
  File "/home/wangjie/anaconda3/envs/chatglm/lib/python3.10/site-packages/langchain/chains/combine_documents/base.py", line 84, in _call
    output, extra_return_dict = self.combine_docs(
  File "/home/wangjie/anaconda3/envs/chatglm/lib/python3.10/site-packages/langchain/chains/base.py", line 140, in __call__
    raise e
```

Additionally, the `metadata` filed in the `chatgpt plugin retriever`
have these fileds by default:
```json
{
    "source":  "file",   //email, file or chat
    "source_id": "filename.docx", // the filename
    "url": "", 
    ...
}
```
so, we should set `source_id` to `source` in the langchain metadata.

```python
metadata = d.pop("metadata", d)
if(metadata.get("source_id")):
    metadata["source"] = metadata.pop("source_id")
```

#### Who can review?
@dev2049

<!-- For a quicker response, figure out the right person to tag with @

  @hwchase17 - project lead

  Tracing / Callbacks
  - @agola11

  Async
  - @agola11

  DataLoaders
  - @eyurtsev

  Models
  - @hwchase17
  - @agola11

  Agents / Tools / Toolkits
  - @vowelparrot

  VectorStores / Retrievers / Memory
  - @dev2049

 -->

---------

Co-authored-by: wangjie <wangjie@htffund.com>
2023-06-15 22:04:45 -07:00
Harrison Chase
e67b26eee9 Harrison/openai functions (#6261)
Co-authored-by: Francisco Ingham <24279597+fpingham@users.noreply.github.com>
2023-06-15 21:54:39 -07:00
Harrison Chase
6aafb46807 Harrison/openai functions (#6223)
Co-authored-by: Francisco Ingham <24279597+fpingham@users.noreply.github.com>
2023-06-15 21:43:33 -07:00
Zander Chase
bc9b8c8239 Improve Error Message for failed callback (#6247)
Include the handler class name in the warning
2023-06-15 19:18:37 -07:00
Alon Roth
0013256e81 Support chat history persistence in AutoGPT (#5716)
**Short Description**
Added a new argument to AutoGPT class which allows to persist the chat
history to a file.

**Changes**
1. Removed the `self.full_message_history: List[BaseMessage] = []`
2. Replaced it with `chat_history_memory` which can take any subclasses
of `BaseChatMessageHistory`

---------

Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
2023-06-15 17:49:03 -07:00
Martin Antos
1913320cbe Feature/add acreom loader (#5780)
adding new loader for [acreom](https://acreom.com) vaults. It's based on
the Obsidian loader with some additional text processing for acreom
specific markdown elements.

 @eyurtsev please take a look!

---------

Co-authored-by: rlm <pexpresss31@gmail.com>
2023-06-15 11:53:00 -07:00
Zander Chase
ae76e473e1 Add Tags for LLMs (#6229)
- [x] Add tracing tags to LLMs + Chat Models (both inheritable and
local)
- [x] Add tags for the run_on_dataset helper function(s)
2023-06-15 11:24:11 -07:00
Harrison Chase
8e1a7a8646 bump version to 201 (#6233) 2023-06-15 08:28:47 -07:00
Harrison Chase
e82687ddf4 Harrison/use functions agent (#6185)
Co-authored-by: Francisco Ingham <24279597+fpingham@users.noreply.github.com>
2023-06-15 08:18:50 -07:00
Ryo Kanazawa
7d2b946d0b Fix typo pandocs to pandoc (#6203)
Fixes https://github.com/hwchase17/langchain/issues/6204

### Context

An typo issue with `pandoc`.

#### Who can review?
@hwchase17
2023-06-15 08:18:27 -07:00
Kyle Roth
c7db9febb0 count tokens for new OpenAI model versions (#6195)
Trying to call `ChatOpenAI.get_num_tokens_from_messages` returns the
following error for the newly announced models `gpt-3.5-turbo-0613` and
`gpt-4-0613`:

```
NotImplementedError: get_num_tokens_from_messages() is not presently implemented for model gpt-3.5-turbo-0613.See https://github.com/openai/openai-python/blob/main/chatml.md for information on how messages are converted to tokens.
```

This adds support for counting tokens for those models, by counting
tokens the same way they're counted for the previous versions of
`gpt-3.5-turbo` and `gpt-4`.

#### reviewers

  - @hwchase17
  - @agola11
2023-06-15 06:16:03 -07:00
xu0o0
7ad13cdbdb feat: add content_format param to ConfluenceLoader.load() (#5922)
Confluence API supports difference format of page content. The storage
format is the raw XML representation for storage. The view format is the
HTML representation for viewing with macros rendered as though it is
viewed by users.

Add the `content_format` parameter to `ConfluenceLoader.load()` to
specify the content format, this is
set to `ContentFormat.STORAGE` by default.

#### Who can review?

Tag maintainers/contributors who might be interested: @eyurtsev

---------

Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
2023-06-14 16:56:28 -07:00
0xJordan
c5a46e7435 feat: Add support for the Solidity language (#6054)
<!--
Thank you for contributing to LangChain! Your PR will appear in our
release under the title you set. Please make sure it highlights your
valuable contribution.

Replace this with a description of the change, the issue it fixes (if
applicable), and relevant context. List any dependencies required for
this change.

After you're done, someone will review your PR. They may suggest
improvements. If no one reviews your PR within a few days, feel free to
@-mention the same people again, as notifications can get lost.

Finally, we'd love to show appreciation for your contribution - if you'd
like us to shout you out on Twitter, please also include your handle!
-->

## Add Solidity programming language support for code splitter.

Twitter: @0xjord4n_

<!-- If you're adding a new integration, please include:

1. a test for the integration - favor unit tests that does not rely on
network access.
2. an example notebook showing its use


See contribution guidelines for more information on how to write tests,
lint
etc:


https://github.com/hwchase17/langchain/blob/master/.github/CONTRIBUTING.md
-->
#### Who can review?

Tag maintainers/contributors who might be interested:
@hwchase17

<!-- For a quicker response, figure out the right person to tag with @

  @hwchase17 - project lead

  Tracing / Callbacks
  - @agola11

  Async
  - @agola11

  DataLoaders
  - @eyurtsev

  Models
  - @hwchase17
  - @agola11

  Agents / Tools / Toolkits
  - @hwchase17

  VectorStores / Retrievers / Memory
  - @dev2049

 -->
2023-06-14 14:25:02 -07:00
Nuno Campos
17c4ec4812 Add docs for tags (#6155)
<!--
Thank you for contributing to LangChain! Your PR will appear in our
release under the title you set. Please make sure it highlights your
valuable contribution.

Replace this with a description of the change, the issue it fixes (if
applicable), and relevant context. List any dependencies required for
this change.

After you're done, someone will review your PR. They may suggest
improvements. If no one reviews your PR within a few days, feel free to
@-mention the same people again, as notifications can get lost.

Finally, we'd love to show appreciation for your contribution - if you'd
like us to shout you out on Twitter, please also include your handle!
-->

<!-- Remove if not applicable -->

Fixes # (issue)

#### Before submitting

<!-- If you're adding a new integration, please include:

1. a test for the integration - favor unit tests that does not rely on
network access.
2. an example notebook showing its use


See contribution guidelines for more information on how to write tests,
lint
etc:


https://github.com/hwchase17/langchain/blob/master/.github/CONTRIBUTING.md
-->

#### Who can review?

Tag maintainers/contributors who might be interested:

<!-- For a quicker response, figure out the right person to tag with @

  @hwchase17 - project lead

  Tracing / Callbacks
  - @agola11

  Async
  - @agola11

  DataLoaders
  - @eyurtsev

  Models
  - @hwchase17
  - @agola11

  Agents / Tools / Toolkits
  - @hwchase17

  VectorStores / Retrievers / Memory
  - @dev2049

 -->
2023-06-14 14:01:58 -07:00
thiswillbeyourgithub
4a649e3b14 typo: 'following following' to 'following' (#6163)
Co-authored-by: thiswillbeyourgithub <github@32mail.33mail.com>
2023-06-14 10:58:47 -07:00
Maciej Bryński
8a44c879c6 Update readthedocs_documentation.ipynb (#6148)
Minor fix in documentation. 
Change URL in wget call to proper one.
2023-06-14 07:21:48 -07:00
Zander Chase
e0e3ef1c57 Update Name (#6136) 2023-06-13 22:25:36 -07:00
Zander Chase
4555ad5d1f Add Run Collector Callback (#6133)
Add a callback handler that can collect nested run objects. Useful for
evaluation.
2023-06-13 22:17:37 -07:00
Harrison Chase
6ac120f299 bump ver to 200 (#6130) 2023-06-13 19:33:51 -07:00
Harrison Chase
e41f0b341c add functions agent (#6113) 2023-06-13 18:51:01 -07:00
Zander Chase
b3b155d488 Return session name in runner response (#6112)
Makes it easier to then run evals w/o thinking about specifying a
session
2023-06-13 16:59:43 -07:00
Harrison Chase
e74733ab9e support streaming for functions (#6115) 2023-06-13 15:26:26 -07:00
Nuno Campos
11ab0be11a Add support for tags (#5898)
<!--
Thank you for contributing to LangChain! Your PR will appear in our
release under the title you set. Please make sure it highlights your
valuable contribution.

Replace this with a description of the change, the issue it fixes (if
applicable), and relevant context. List any dependencies required for
this change.

After you're done, someone will review your PR. They may suggest
improvements. If no one reviews your PR within a few days, feel free to
@-mention the same people again, as notifications can get lost.

Finally, we'd love to show appreciation for your contribution - if you'd
like us to shout you out on Twitter, please also include your handle!
-->

<!-- Remove if not applicable -->

Fixes # (issue)

#### Before submitting

<!-- If you're adding a new integration, please include:

1. a test for the integration - favor unit tests that does not rely on
network access.
2. an example notebook showing its use


See contribution guidelines for more information on how to write tests,
lint
etc:


https://github.com/hwchase17/langchain/blob/master/.github/CONTRIBUTING.md
-->

#### Who can review?

Tag maintainers/contributors who might be interested:

<!-- For a quicker response, figure out the right person to tag with @

  @hwchase17 - project lead

  Tracing / Callbacks
  - @agola11

  Async
  - @agola11

  DataLoaders
  - @eyurtsev

  Models
  - @hwchase17
  - @agola11

  Agents / Tools / Toolkits
  - @vowelparrot

  VectorStores / Retrievers / Memory
  - @dev2049

 -->
2023-06-13 12:30:59 -07:00
Harrison Chase
1281fdf0f2 Harrison/notebook functions (#6103) 2023-06-13 10:52:54 -07:00
Harrison Chase
34ebb29726 bump version to 199 (#6102) 2023-06-13 10:50:33 -07:00
Wenchen Li
f9edf76e7c Implement max_marginal_relevance_search in VectorStore of Pinecone (#6056)
This adds implementation of MMR search in pinecone; and I have two
semi-related observations about this vector store class:
- Maybe we should also have a
`similarity_search_by_vector_returning_embeddings` like in supabase, but
it's not in the base `VectorStore` class so I didn't implement
- Talking about the base class, there's
`similarity_search_with_relevance_scores`, but in pinecone it is called
`similarity_search_with_score`; maybe we should consider renaming it to
align with other `VectorStore` base and sub classes (or add that as an
alias for backward compatibility)

#### Who can review?

Tag maintainers/contributors who might be interested:
 - VectorStores / Retrievers / Memory - @dev2049
2023-06-13 10:46:45 -07:00
Harrison Chase
970b2f9d38 convert tools to openai (#6100) 2023-06-13 10:40:49 -07:00
Harrison Chase
292accde2b support functions (#6099) 2023-06-13 10:32:58 -07:00
Lance Martin
ee3d0513ad Add tests and update notebook for MarkdownHeaderTextSplitter (#6069)
Add test and update notebook for `MarkdownHeaderTextSplitter`.
2023-06-13 09:07:52 -07:00
Keshav Kumar
8fdf88b8e3 Fix for ModuleNotFoundError while running langchain-server. Issue #5833 (#6077)
This PR fixes the error
`ModuleNotFoundError: No module named 'langchain.cli'`
Fixes https://github.com/hwchase17/langchain/issues/5833 (issue)
2023-06-13 08:37:07 -07:00
Zander Chase
0c52275bdb Use Run object from SDK (#6067)
Update the Run object in the tracer to extend that in the SDK to include
the parameters necessary for tracking/tracing
2023-06-13 07:14:11 -07:00
Harrison Chase
cde1e8739a turn off repr (#6078) 2023-06-12 22:45:24 -07:00
Nuno Campos
a9b3b2e327 Enable serialization for anthropic (#6049) 2023-06-12 22:39:10 -07:00
Harrison Chase
6ac5d80286 propogate kwargs fully (#6076) 2023-06-12 22:37:55 -07:00
Harrison Chase
ec1a2adf9c improve tools (#6062) 2023-06-12 22:19:03 -07:00
Julius Lipp
5b6bbf4ab2 Add embaas document extraction api endpoints (#6048)
# Introduces embaas document extraction api endpoints

In this PR, we add support for embaas document extraction endpoints to
Text Embedding Models (with LLMs, in different PRs coming). We currently
offer the MTEB leaderboard top performers, will continue to add top
embedding models and soon add support for customers to deploy thier own
models. Additional Documentation + Infomation can be found
[here](https://embaas.io).

While developing this integration, I closely followed the patterns
established by other langchain integrations. Nonetheless, if there are
any aspects that require adjustments or if there's a better way to
present a new integration, let me know! :)

Additionally, I fixed some docs in the embeddings integration.

Related PR: #5976 

#### Who can review?
  DataLoaders
  - @eyurtsev
2023-06-12 19:13:52 -07:00
Zander Chase
2f0088039d Log tracer errors (#6066)
Example (would log several times if not for the helper fn. Would emit no
logs due to mulithreading previously)

![image](https://github.com/hwchase17/langchain/assets/130414180/070d25ae-1f06-4487-9617-0a6f66f3f01e)
2023-06-12 17:13:49 -07:00
Lance Martin
b023f0c0f2 Text splitter for Markdown files by header (#5860)
This creates a new kind of text splitter for markdown files.

The user can supply a set of headers that they want to split the file
on.

We define a new text splitter class, `MarkdownHeaderTextSplitter`, that
does a few things:

(1) For each line, it determines the associated set of user-specified
headers
(2) It groups lines with common headers into splits

See notebook for example usage and test cases.
2023-06-12 15:46:42 -07:00
Jens Madsen
2c91f0d750 chore: spedd up integration test by using smaller model (#6044)
Adds a new parameter `relative_chunk_overlap` for the
`SentenceTransformersTokenTextSplitter` constructor. The parameter sets
the chunk overlap using a relative factor, e.g. for a model where the
token limit is 100, a `relative_chunk_overlap=0.5` implies that
`chunk_overlap=50`

Tag maintainers/contributors who might be interested:

 @hwchase17, @dev2049
2023-06-12 13:27:10 -07:00
Harrison Chase
5922742d56 comment out 2023-06-12 10:57:31 -07:00
Harrison Chase
681ba6d520 embaas title 2023-06-12 08:00:14 -07:00
Ben Flast
7a5e36f3f5 Mongo db doc fix (#6042)
I missed a few errors in my initial fix @hwchase1.  Thanks!
2023-06-12 07:29:27 -07:00
Harrison Chase
289e9aeb9d bump ver to 198 (#6026) 2023-06-11 21:32:45 -07:00
Harrison Chase
d1561b74eb Harrison/cognitive search (#6011)
Co-authored-by: Fabrizio Ruocco <ruoccofabrizio@gmail.com>
2023-06-11 21:15:42 -07:00
wenmeng zhou
bb7ac9edb5 add dashscope text embedding (#5929)
#### What I do
Adding embedding api for
[DashScope](https://help.aliyun.com/product/610100.html), which is the
DAMO Academy's multilingual text unified vector model based on the LLM
base. It caters to multiple mainstream languages worldwide and offers
high-quality vector services, helping developers quickly transform text
data into high-quality vector data. Currently supported languages
include Chinese, English, Spanish, French, Portuguese, Indonesian, and
more.

#### Who can review?

  Models
  - @hwchase17
  - @agola11

---------

Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
2023-06-11 21:14:20 -07:00
Ben Flast
010d0bfeea Update MongoDB Atlas support docs (#6022)
Updating MongoDB Atlas support docs @hwchase17 let me know if you have
any questions
2023-06-11 20:57:15 -07:00
Harrison Chase
e05997c25e Harrison/hologres (#6012)
Co-authored-by: Changgeng Zhao <changgeng@nyu.edu>
Co-authored-by: Changgeng Zhao <zhaochanggeng.zcg@alibaba-inc.com>
2023-06-11 20:56:51 -07:00
ljeagle
c5bce4a465 add from_documents interface in awadb vector store (#6023)
added new interface from_documents in awadb vector store
  @dev2049

---------

Co-authored-by: vincent <awadb.vincent@gmail.com>
2023-06-11 19:35:03 -07:00
Zander Chase
2c9619bc1d Remove from PR template (#6018) 2023-06-11 19:34:26 -07:00
ju-bezdek
18f5c985d9 Langchain decorators (#6017)
Added description of LangChain Decorators  into the integration section

<!--
Thank you for contributing to LangChain! Your PR will appear in our
release under the title you set. Please make sure it highlights your
valuable contribution.

Replace this with a description of the change, the issue it fixes (if
applicable), and relevant context. List any dependencies required for
this change.

After you're done, someone will review your PR. They may suggest
improvements. If no one reviews your PR within a few days, feel free to
@-mention the same people again, as notifications can get lost.

Finally, we'd love to show appreciation for your contribution - if you'd
like us to shout you out on Twitter, please also include your handle!
-->

<!-- Remove if not applicable -->


#### Before submitting

<!-- If you're adding a new integration, please include:

1. a test for the integration - favor unit tests that does not rely on
network access.
2. an example notebook showing its use


See contribution guidelines for more information on how to write tests,
lint
etc:


https://github.com/hwchase17/langchain/blob/master/.github/CONTRIBUTING.md
-->

#### Who can review?

Tag maintainers/contributors who might be interested:

@hwchase17 

<!-- For a quicker response, figure out the right person to tag with @

  @hwchase17 - project lead

  Tracing / Callbacks
  - @agola11

  Async
  - @agola11

  DataLoaders
  - @eyurtsev

  Models
  - @hwchase17
  - @agola11

  Agents / Tools / Toolkits
  - @vowelparrot

  VectorStores / Retrievers / Memory
  - @dev2049

 -->
2023-06-11 19:32:24 -07:00
Zander Chase
a197acfcd3 Update check (#6020)
We were assigning the name as None in on_chat_model_start then not
updating, resulting in a validation error.
2023-06-11 17:59:09 -07:00
Nuno Campos
18af149e91 nc/load (#5733)
Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
2023-06-11 15:51:28 -07:00
Zander Chase
614cff89bc I before E (#6015) 2023-06-11 15:45:12 -07:00
Harrison Chase
a7227ee01b Harrison/embaas (#6010)
Co-authored-by: Julius Lipp <43986145+juliuslipp@users.noreply.github.com>
2023-06-11 13:35:14 -07:00
xu0o0
232faba796 fix: TypeError when loading confluence pages by cql (#5878)
The Confluence loader uses the wrong API (`Confluence.cql()` provided by
`atlassian-python-api`) to load pages by CQL.
`Confluence.cql()` is a wrapper of the `/rest/api/search` API which
searches for entities in Confluence.

To search for pages in Confluence, the loader can use the
`/rest/api/content/search` API.

#### Who can review?

Tag maintainers/contributors who might be interested: @eyurtsev
<!-- For a quicker response, figure out the right person to tag with @

  @hwchase17 - project lead

  Tracing / Callbacks
  - @agola11

  Async
  - @agola11

  DataLoaders
  - @eyurtsev

  Models
  - @hwchase17
  - @agola11

  Agents / Tools / Toolkits
  - @vowelparrot

  VectorStores / Retrievers / Memory
  - @dev2049

 -->
#### References
##### Cloud API

https://developer.atlassian.com/cloud/confluence/rest/v1/api-group-content/#api-wiki-rest-api-content-search-get

https://developer.atlassian.com/cloud/confluence/rest/v1/api-group-search/#api-wiki-rest-api-search-get

##### Server API

https://docs.atlassian.com/ConfluenceServer/rest/8.3.1/#api/content-search
https://docs.atlassian.com/ConfluenceServer/rest/8.3.1/#api/search
2023-06-11 13:23:22 -07:00
Akhil Vempali
d7d629911b feat: Added filtering option to FAISS vectorstore (#5966)
Inspired by the filtering capability available in ChromaDB, added the
same functionality to the FAISS vectorestore as well. Since FAISS does
not have an inbuilt method of filtering used the approach suggested in
this [thread](https://github.com/facebookresearch/faiss/issues/1079)
Langchain Issue inspiration:
https://github.com/hwchase17/langchain/issues/4572

- [x] Added filtering capability to semantic similarly and MMR
- [x] Added test cases for filtering in
`tests/integration_tests/vectorstores/test_faiss.py`

#### Who can review?

Tag maintainers/contributors who might be interested:

  VectorStores / Retrievers / Memory
  - @dev2049
  - @hwchase17
2023-06-11 13:20:03 -07:00
Jiaping(JP) Zhang
6e90406e0f [APIChain] enhance the robustness or url (#6008)
<!--
Thank you for contributing to LangChain! Your PR will appear in our
release under the title you set. Please make sure it highlights your
valuable contribution.

Replace this with a description of the change, the issue it fixes (if
applicable), and relevant context. List any dependencies required for
this change.

After you're done, someone will review your PR. They may suggest
improvements. If no one reviews your PR within a few days, feel free to
@-mention the same people again, as notifications can get lost.

Finally, we'd love to show appreciation for your contribution - if you'd
like us to shout you out on Twitter, please also include your handle!
-->

I used the APIChain sometimes it failed during the intermediate step
when generating the api url and calling the `request` function. After
some digging, I found the url sometimes includes the space at the
beginning, like `%20https://...api.com` which causes the `
self.requests_wrapper.get` internal function to fail.

Including a little string preprocessing `.strip` to remove the space
seems to improve the robustness of the APIchain to make sure it can send
the request and retrieve the API result more reliably.

<!-- Remove if not applicable -->

Fixes # (issue)

#### Before submitting

<!-- If you're adding a new integration, please include:

1. a test for the integration - favor unit tests that does not rely on
network access.
2. an example notebook showing its use


See contribution guidelines for more information on how to write tests,
lint
etc:


https://github.com/hwchase17/langchain/blob/master/.github/CONTRIBUTING.md
-->

#### Who can review?
@vowelparrot
Tag maintainers/contributors who might be interested:

<!-- For a quicker response, figure out the right person to tag with @

  @hwchase17 - project lead

  Tracing / Callbacks
  - @agola11

  Async
  - @agola11

  DataLoaders
  - @eyurtsev

  Models
  - @hwchase17
  - @agola11

  Agents / Tools / Toolkits
  - @vowelparrot

  VectorStores / Retrievers / Memory
  - @dev2049

 -->
2023-06-11 13:13:57 -07:00
Ikko Eltociear Ashimine
c868a3eef3 Update databricks.md (#6006)
HuggingFace -> Hugging Face


#### Before submitting

<!-- If you're adding a new integration, please include:

1. a test for the integration - favor unit tests that does not rely on
network access.
2. an example notebook showing its use


See contribution guidelines for more information on how to write tests,
lint
etc:


https://github.com/hwchase17/langchain/blob/master/.github/CONTRIBUTING.md
-->

#### Who can review?
2023-06-11 13:13:33 -07:00
Harrison Chase
20e9ce8a62 bump version to 197 (#6007) 2023-06-11 10:14:57 -07:00
Harrison Chase
704d56e241 support kwargs (#5990) 2023-06-11 10:09:22 -07:00
Mark Pors
b934677a81 Obey handler.raise_error in _ahandle_event_for_handler (#6001)
Obey `handler.raise_error` in `_ahandle_event_for_handler`

Exceptions for async callbacks were only logged as warnings, also when
`raise_error = True`

#### Who can review?

  @hwchase17

   @agola11
2023-06-11 09:49:26 -07:00
Harrison Chase
2d038b57b2 Harrison/arxiv fix (#5993)
Co-authored-by: Juanjo do Olmo <87780148+SimplyJuanjo@users.noreply.github.com>
2023-06-11 09:48:09 -07:00
Vincent
0b740c9baa add ocr_languages param for ConfluenceLoader.load() (#5823)
@eyurtsev

当Confluence文档内容中包含附件,且附件内容为非英文时,提取出来的文本是乱码的。
When the content of the document contains attachments, and the content
of the attachments is not in English, the extracted text is garbled.

这主要是因为没有为pytesseract传递lang参数,默认情况下只支持英文。
This is mainly because lang parameter is not passed to pytesseract, and
only English is supported by default.

所以我给ConfluenceLoader.load()添加了ocr_languages参数,以便支持多种语言。
So I added the ocr_languages parameter to ConfluenceLoader.load () to
support multiple languages.
2023-06-10 16:51:04 -07:00
Thomas B
ac3e6e3944 Fix IndexError in RecursiveCharacterTextSplitter (#5902)
Fixes (not reported) an error that may occur in some cases in the
RecursiveCharacterTextSplitter.

An empty `new_separators` array ([]) would end up in the else path of
the condition below and used in a function where it is expected to be
non empty.

```python
if new_separators is None:
    ...
else:
   # _split_text() expects this array to be non-empty!
   other_info = self._split_text(s, new_separators)

```
resulting in an `IndexError`

```python
def _split_text(self, text: str, separators: List[str]) -> List[str]:
        """Split incoming text and return chunks."""
        final_chunks = []
        # Get appropriate separator to use
>       separator = separators[-1]
E       IndexError: list index out of range

langchain/text_splitter.py:425: IndexError
```

#### Who can review?
@hwchase17 @eyurtsev

---------

Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
2023-06-10 16:48:53 -07:00
Satheesh Valluru
d2270a2261 Fix: Grammer fix in documentation (#5925)
Fix for grammatical errors in the documentation of `vectorstore`.  
@vowelparrot
2023-06-10 16:43:36 -07:00
Jens Madsen
1250cd4630 fix: use model token limit not tokenizer ditto (#5939)
This fixes a token limit bug in the
SentenceTransformersTokenTextSplitter. Before the token limit was taken
from tokenizer used by the model. However, for some models the token
limit of the tokenizer (from `AutoTokenizer.from_pretrained`) does not
equal the token limit of the model. This was a false assumption.
Therefore, the token limit of the text splitter is now taken from the
sentence transformers model token limit.

Twitter: @plasmajens

#### Before submitting

#### Who can review?

@hwchase17 and/or @dev2049

---------

Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
2023-06-10 16:36:03 -07:00
Ofer Mendelevitch
f8cf09a230 Update to Vectara integration (#5950)
This PR updates the Vectara integration (@hwchase17 ):
* Adds reuse of requests.session to imrpove efficiency and speed.
* Utilizes Vectara's low-level API (instead of standard API) to better
match user's specific chunking with LangChain
* Now add_texts puts all the texts into a single Vectara document so
indexing is much faster.
* updated variables names from alpha to lambda_val (to be consistent
with Vectara docs) and added n_context_sentence so it's available to use
if needed.
* Updates to documentation and tests

---------

Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
2023-06-10 16:27:01 -07:00
qued
e4224a396b feat: Add UnstructuredXMLLoader for .xml files (#5955)
# Unstructured XML Loader
Adds an `UnstructuredXMLLoader` class for .xml files. Works with
unstructured>=0.6.7. A plain text representation of the text with the
XML tags will be available under the `page_content` attribute in the
doc.

### Testing
```python
from langchain.document_loaders import UnstructuredXMLLoader

loader = UnstructuredXMLLoader(
    "example_data/factbook.xml",
)
docs = loader.load()
```


## Who can review?

@hwchase17 
@eyurtsev
2023-06-10 16:24:42 -07:00
Lance Martin
21bd16bb59 Create Airtable loader (#5958)
Create document loader for Airtable
2023-06-10 15:43:18 -07:00
Harrison Chase
9218684759 Add a new vector store - AwaDB (#5971) (#5992)
Added AwaDB vector store, which is a wrapper over the AwaDB, that can be
used as a vector storage and has an efficient similarity search. Added
integration tests for the vector store
Added jupyter notebook with the example

Delete a unneeded empty file and resolve the
conflict(https://github.com/hwchase17/langchain/pull/5886)

Please check, Thanks!

@dev2049
@hwchase17

---------

<!--
Thank you for contributing to LangChain! Your PR will appear in our
release under the title you set. Please make sure it highlights your
valuable contribution.

Replace this with a description of the change, the issue it fixes (if
applicable), and relevant context. List any dependencies required for
this change.

After you're done, someone will review your PR. They may suggest
improvements. If no one reviews your PR within a few days, feel free to
@-mention the same people again, as notifications can get lost.

Finally, we'd love to show appreciation for your contribution - if you'd
like us to shout you out on Twitter, please also include your handle!
-->

<!-- Remove if not applicable -->

Fixes # (issue)

#### Before submitting

<!-- If you're adding a new integration, please include:

1. a test for the integration - favor unit tests that does not rely on
network access.
2. an example notebook showing its use


See contribution guidelines for more information on how to write tests,
lint
etc:


https://github.com/hwchase17/langchain/blob/master/.github/CONTRIBUTING.md
-->

#### Who can review?

Tag maintainers/contributors who might be interested:

<!-- For a quicker response, figure out the right person to tag with @

  @hwchase17 - project lead

  Tracing / Callbacks
  - @agola11

  Async
  - @agola11

  DataLoaders
  - @eyurtsev

  Models
  - @hwchase17
  - @agola11

  Agents / Tools / Toolkits
  - @vowelparrot

  VectorStores / Retrievers / Memory
  - @dev2049

 -->

---------

Co-authored-by: ljeagle <vincent_jieli@yeah.net>
Co-authored-by: vincent <awadb.vincent@gmail.com>
2023-06-10 15:42:32 -07:00
Tomaz Bratanic
d5819a7ca7 Add additional parameters to Graph Cypher Chain (#5979)
Based on the inspiration from the SQL chain, the following three
parameters are added to Graph Cypher Chain.

- top_k: Limited the number of results from the database to be used as
context
- return_direct: Return database results without transforming them to
natural language
- return_intermediate_steps: Return intermediate steps
2023-06-10 14:39:55 -07:00
Daniel Grittner
0ca37e613c Fix handling of missing action & input for async MRKL agent (#5985)
Hi,

This is a fix for https://github.com/hwchase17/langchain/pull/5014. This
PR forgot to add the ability to self solve the ValueError(f"Could not
parse LLM output: {llm_output}") error for `_atake_next_step`.
2023-06-10 14:38:20 -07:00
Harrison Chase
ca1afa7213 add test for structured tools (#5989) 2023-06-10 14:37:26 -07:00
constDave
5f356b9993 Fixed typo missing "use" (#5991)
<!--
Fixed a simple typo on
https://python.langchain.com/en/latest/modules/indexes/retrievers/examples/vectorstore.html
where the word "use" was missing.

#### Who can review?

Tag maintainers/contributors who might be interested:

<!-- For a quicker response, figure out the right person to tag with @

  @hwchase17 - project lead

  Tracing / Callbacks
  - @agola11

  Async
  - @agola11

  DataLoaders
  - @eyurtsev

  Models
  - @hwchase17
  - @agola11

  Agents / Tools / Toolkits
  - @vowelparrot

  VectorStores / Retrievers / Memory
  - @dev2049

 -->
2023-06-10 14:31:58 -07:00
Kaarthik Andavar
d6f5d0c6b1 Fix: SnowflakeLoader returning empty documents (#5967)
**Fix SnowflakeLoader's Behavior of Returning Empty Documents**

**Description:**

This PR addresses the issue where the SnowflakeLoader was consistently
returning empty documents. After investigation, it was found that the
query method within the SnowflakeLoader was not properly fetching and
processing the data.

**Changes:**

1. Modified the query method in SnowflakeLoader to handle data fetch and
processing more accurately.
2. Enhanced error handling within the SnowflakeLoader to catch and log
potential issues that may arise during data loading.

**Impact:**

This fix will ensure the SnowflakeLoader reliably returns the expected
documents instead of empty ones, improving the efficiency and
reliability of data processing tasks in the LangChain project.

Before Fix:

`[
    Document(page_content='', metadata={}),
    Document(page_content='', metadata={}),
    Document(page_content='', metadata={}),
    Document(page_content='', metadata={}),
    Document(page_content='', metadata={}),
    Document(page_content='', metadata={}),
    Document(page_content='', metadata={}),
    Document(page_content='', metadata={}),
    Document(page_content='', metadata={}),
    Document(page_content='', metadata={})
]`

After Fix:

`[Document(page_content='CUSTOMER_ID: 1\nFIRST_NAME: John\nLAST_NAME:
Doe\nEMAIL: john.doe@example.com\nPHONE: 555-123-4567\nADDRESS: 123 Elm
St, San Francisco, CA 94102', metadata={}),
Document(page_content='CUSTOMER_ID: 2\nFIRST_NAME: Jane\nLAST_NAME:
Doe\nEMAIL: jane.doe@example.com\nPHONE: 555-987-6543\nADDRESS: 456 Oak
St, San Francisco, CA 94103', metadata={}),
Document(page_content='CUSTOMER_ID: 3\nFIRST_NAME: Michael\nLAST_NAME:
Smith\nEMAIL: michael.smith@example.com\nPHONE: 555-234-5678\nADDRESS:
789 Pine St, San Francisco, CA 94104', metadata={}),
Document(page_content='CUSTOMER_ID: 4\nFIRST_NAME: Emily\nLAST_NAME:
Johnson\nEMAIL: emily.johnson@example.com\nPHONE: 555-345-6789\nADDRESS:
321 Maple St, San Francisco, CA 94105', metadata={}),
Document(page_content='CUSTOMER_ID: 5\nFIRST_NAME: David\nLAST_NAME:
Williams\nEMAIL: david.williams@example.com\nPHONE:
555-456-7890\nADDRESS: 654 Birch St, San Francisco, CA 94106',
metadata={}), Document(page_content='CUSTOMER_ID: 6\nFIRST_NAME:
Emma\nLAST_NAME: Jones\nEMAIL: emma.jones@example.com\nPHONE:
555-567-8901\nADDRESS: 987 Cedar St, San Francisco, CA 94107',
metadata={}), Document(page_content='CUSTOMER_ID: 7\nFIRST_NAME:
Oliver\nLAST_NAME: Brown\nEMAIL: oliver.brown@example.com\nPHONE:
555-678-9012\nADDRESS: 147 Cherry St, San Francisco, CA 94108',
metadata={}), Document(page_content='CUSTOMER_ID: 8\nFIRST_NAME:
Sophia\nLAST_NAME: Davis\nEMAIL: sophia.davis@example.com\nPHONE:
555-789-0123\nADDRESS: 369 Walnut St, San Francisco, CA 94109',
metadata={}), Document(page_content='CUSTOMER_ID: 9\nFIRST_NAME:
James\nLAST_NAME: Taylor\nEMAIL: james.taylor@example.com\nPHONE:
555-890-1234\nADDRESS: 258 Hawthorn St, San Francisco, CA 94110',
metadata={}), Document(page_content='CUSTOMER_ID: 10\nFIRST_NAME:
Isabella\nLAST_NAME: Wilson\nEMAIL: isabella.wilson@example.com\nPHONE:
555-901-2345\nADDRESS: 963 Aspen St, San Francisco, CA 94111',
metadata={})]
`

**Tests:**

All unit and integration tests have been run and passed successfully.
Additional tests were added to validate the new behavior of the
SnowflakeLoader.

**Checklist:**

- [x] Code changes are covered by tests
- [x] Code passes `make format` and `make lint`
- [x] This PR does not introduce any breaking changes

Please review and let me know if any changes are required.
2023-06-10 13:03:50 -07:00
Harrison Chase
62ec10a7f5 bump version to 196 (#5988) 2023-06-10 09:06:35 -07:00
German Martin
736a1819aa LOTR: Lord of the Retrievers. A retriever that merge several retrievers together applying document_formatters to them. (#5798)
"One Retriever to merge them all, One Retriever to expose them, One
Retriever to bring them all and in and process them with Document
formatters."

Hi @dev2049! Here bothering people again!

I'm using this simple idea to deal with merging the output of several
retrievers into one.
I'm aware of DocumentCompressorPipeline and
ContextualCompressionRetriever but I don't think they allow us to do
something like this. Also I was getting in trouble to get the pipeline
working too. Please correct me if i'm wrong.

This allow to do some sort of "retrieval" preprocessing and then using
the retrieval with the curated results anywhere you could use a
retriever.
My use case is to generate diff indexes with diff embeddings and sources
for a more colorful results then filtering them with one or many
document formatters.

I saw some people looking for something like this, here:
https://github.com/hwchase17/langchain/issues/3991
and something similar here:
https://github.com/hwchase17/langchain/issues/5555

This is just a proposal I know I'm missing tests , etc. If you think
this is a worth it idea I can work on tests and anything you want to
change.
Let me know!

---------

Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
2023-06-10 08:41:02 -07:00
Lance Martin
f3e7ac0a2c Add load() to snowflake loader (#5956)
Quick fix for recently added [snowflake data
loader](https://github.com/hwchase17/langchain/pull/5825/files).
2023-06-09 11:27:29 -07:00
Harrison Chase
3678cba0be bump ver to 195 (#5949) 2023-06-09 09:17:08 -07:00
Harrison Chase
7af186fddf fixes to docs (#5919) 2023-06-09 09:15:53 -07:00
Kacper Łukawski
7cc200766e Expose full params in Qdrant (#5947)
# Expose full params in Qdrant

There were many questions regarding supporting some additional
parameters in Qdrant integration. Qdrant supports many vector search
optimizations that were impossible to use directly in Qdrant before.
That includes:

1. Possibility to manipulate collection params while using
`Qdrant.from_texts`. The PR allows setting things such as quantization,
HNWS config, optimizers config, etc. That makes it consistent with raw
`QdrantClient`.
2. Extended options while searching. It includes HNSW options, exact
search, score threshold filtering, and read consistency in distributed
mode.

After merging that PR, #4858 might also be closed.

## Who can review?

VectorStores / Retrievers / Memory

@dev2049 @hwchase17
2023-06-09 08:56:32 -07:00
Rubén Martínez
db7ef635c0 Add support for the endpoint URL in DynamoDBChatMesasgeHistory (#5836)
This PR adds the possibility of specifying the endpoint URL to AWS in
the DynamoDBChatMessageHistory, so that it is possible to target not
only the AWS cloud services, but also a local installation.

Specifying the endpoint URL, which is normally not done when addressing
the cloud services, is very helpful when targeting a local instance
(like [Localstack](https://localstack.cloud/)) when running local tests.

Fixes #5835

#### Who can review?

Tag maintainers/contributors who might be interested: @dev2049

<!-- For a quicker response, figure out the right person to tag with @

  @hwchase17 - project lead

  Tracing / Callbacks
  - @agola11

  Async
  - @agola11

  DataLoaders
  - @eyurtsev

  Models
  - @hwchase17
  - @agola11

  Agents / Tools / Toolkits
  - @vowelparrot

  VectorStores / Retrievers / Memory
  - @dev2049

 -->

---------

Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
2023-06-08 23:21:11 -07:00
Lior
0eb1bc1a02 Fix the issue where the parameters passed to VertexAI ignored #5889 (#5891)
Fixes #5889 and fixes the name of the argument in init_vertexai
@hwchase17
@agola11

Co-authored-by: Lior Durahly <lior.durahly@superwise.ai>
2023-06-08 23:15:22 -07:00
Fei Wang
63fcf41bea Fix openai proxy error (#5914)
Fixes proxy error.
Since openai does not parse proxy parameters and uses openai.proxy
directly, the proxy method needs to be modified.


7610c5adfa/openai/api_requestor.py (LL90)

#### Who can review?
  @hwchase17 - project lead

  Models
  - @hwchase17
  - @agola11

---------

Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
2023-06-08 23:15:06 -07:00
felpigeon
2791a753bf Add start index to metadata in TextSplitter (#5912)
<!--
Thank you for contributing to LangChain! Your PR will appear in our
release under the title you set. Please make sure it highlights your
valuable contribution.

Replace this with a description of the change, the issue it fixes (if
applicable), and relevant context. List any dependencies required for
this change.

After you're done, someone will review your PR. They may suggest
improvements. If no one reviews your PR within a few days, feel free to
@-mention the same people again, as notifications can get lost.

Finally, we'd love to show appreciation for your contribution - if you'd
like us to shout you out on Twitter, please also include your handle!
-->

#### Add start index to metadata in TextSplitter

- Modified method `create_documents` to track start position of each
chunk
- The `start_index` is included in the metadata if the `add_start_index`
parameter in the class constructor is set to `True`

This enables referencing back to the original document, particularly
useful when a specific chunk is retrieved.

<!-- If you're adding a new integration, please include:

1. a test for the integration - favor unit tests that does not rely on
network access.
2. an example notebook showing its use


See contribution guidelines for more information on how to write tests,
lint
etc:


https://github.com/hwchase17/langchain/blob/master/.github/CONTRIBUTING.md
-->

#### Who can review?

Tag maintainers/contributors who might be interested:
@eyurtsev @agola11
<!-- For a quicker response, figure out the right person to tag with @

  @hwchase17 - project lead

  Tracing / Callbacks
  - @agola11

  Async
  - @agola11

  DataLoaders
  - @eyurtsev

  Models
  - @hwchase17
  - @agola11

  Agents / Tools / Toolkits
  - @vowelparrot

  VectorStores / Retrievers / Memory
  - @dev2049

 -->
2023-06-08 23:09:32 -07:00
Philip Kiely - Baseten
a09a0e3511 Baseten integration (#5862)
This PR adds a Baseten integration. I've done my best to follow the
contributor's guidelines and add docs, an example notebook, and an
integration test modeled after similar integrations' test.

Please let me know if there is anything I can do to improve the PR. When
it is merged, please tag https://twitter.com/basetenco and
https://twitter.com/philip_kiely as contributors (the note on the PR
template said to include Twitter accounts)
2023-06-08 23:05:57 -07:00
Tamara Lazarevic
0ce8745928 Fix typo (#5894) 2023-06-08 23:05:22 -07:00
Andrew Grangaard
d8ae925425 arxiv: Correct name of search client attribute to 'arxiv_search' from incorrect 'arxiv_client' (#5917)
+ this private attribute is referenced as `arxiv_search` in internal
usage and is set when verifying the environment

twitter: @spazm 


#### Who can review?

Any of @hwchase17, @leo-gan, or @bongsang might be interested in
reviewing.

+ Mismatch between `arxiv_client` attribute vs `arxiv_search` in
validation and usage is present in the initial commit by @hwchase17.
+ @leo-gan has made most of the edits.
+ @bongsang implemented pdf download.
2023-06-08 22:49:11 -07:00
sergiolrinditex
fe8bbc2da7 Create snowflake Loader (#5825)
<!--
Thank you for contributing to LangChain! Your PR will appear in our
release under the title you set. Please make sure it highlights your
valuable contribution.

Replace this with a description of the change, the issue it fixes (if
applicable), and relevant context. List any dependencies required for
this change.

After you're done, someone will review your PR. They may suggest
improvements. If no one reviews your PR within a few days, feel free to
@-mention the same people again, as notifications can get lost.

Finally, we'd love to show appreciation for your contribution - if you'd
like us to shout you out on Twitter, please also include your handle!
-->

<!-- Remove if not applicable -->

Fixes # (issue)

#### Before submitting

<!-- If you're adding a new integration, please include:

1. a test for the integration - favor unit tests that does not rely on
network access.
2. an example notebook showing its use


See contribution guidelines for more information on how to write tests,
lint
etc:


https://github.com/hwchase17/langchain/blob/master/.github/CONTRIBUTING.md
-->

#### Who can review?

Tag maintainers/contributors who might be interested:

<!-- For a quicker response, figure out the right person to tag with @

  @hwchase17 - project lead

  Tracing / Callbacks
  - @agola11

  Async
  - @agola11

  DataLoaders
  - @eyurtsev

  Models
  - @hwchase17
  - @agola11

  Agents / Tools / Toolkits
  - @vowelparrot

  VectorStores / Retrievers / Memory
  - @dev2049

 -->

---------

Co-authored-by: rlm <pexpresss31@gmail.com>
2023-06-08 22:03:00 -07:00
Zander Chase
77c286cf02 Use LCP Client in Tracer (#5908)
Move the LCP calls to the client.
2023-06-08 21:15:14 -07:00
Frank Hübner
3ec6400d70 Feature/add AWS Kendra Index Retriever (#5856)
adding a new retriever for AWS Kendra

@dev2049 please take a look!
2023-06-08 15:44:09 -07:00
Piyush Jain
a6ebffb695 Fixes model arguments for amazon models (#5896)
Fixes #5713 
#### Who can review?

Tag maintainers/contributors who might be interested:
@hwchase17
@agola11
@aarora79
@rsgrewal-aws
2023-06-08 14:16:01 -07:00
小铭
767fa91eae Fix the shortcut conflict for document page search (#5874)
Fix the document page to open both search and Mendable when pressing
Ctrl+K.
I have changed the shortcut for Mendable to Ctrl+J.



<!-- If you're adding a new integration, please include:

1. a test for the integration - favor unit tests that does not rely on
network access.
2. an example notebook showing its use


See contribution guidelines for more information on how to write tests,
lint
etc:


https://github.com/hwchase17/langchain/blob/master/.github/CONTRIBUTING.md
-->

#### Who can review?
  @hwchase17
Tag maintainers/contributors who might be interested:

<!-- For a quicker response, figure out the right person to tag with @

  @hwchase17 - project lead

  Tracing / Callbacks
  - @agola11

  Async
  - @agola11

  DataLoaders
  - @eyurtsev

  Models
  - @hwchase17
  - @agola11

  Agents / Tools / Toolkits
  - @vowelparrot

  VectorStores / Retrievers / Memory
  - @dev2049

 -->
2023-06-08 14:15:19 -07:00
Zander Chase
5f74db4500 Update run eval imports in init (#5858) 2023-06-08 10:44:36 -07:00
warjiang
511c12dd39 fix: update qa_chain doc for "chai_type" (#5877)
<!--
Thank you for contributing to LangChain! Your PR will appear in our
release under the title you set. Please make sure it highlights your
valuable contribution.

Replace this with a description of the change, the issue it fixes (if
applicable), and relevant context. List any dependencies required for
this change.

After you're done, someone will review your PR. They may suggest
improvements. If no one reviews your PR within a few days, feel free to
@-mention the same people again, as notifications can get lost.

Finally, we'd love to show appreciation for your contribution - if you'd
like us to shout you out on Twitter, please also include your handle!
-->
`load_qa_with_sources_chain` method already support four type of chain,
including `map_rerank`. update document to prevent any misunderstandings
😀.

![image](https://github.com/hwchase17/langchain/assets/6478745/325260b2-6121-4900-aef9-001febff811a)

<!-- Remove if not applicable -->

Fixes # (issue)
No, just update document.

#### Before submitting

<!-- If you're adding a new integration, please include:

1. a test for the integration - favor unit tests that does not rely on
network access.
2. an example notebook showing its use


See contribution guidelines for more information on how to write tests,
lint
etc:


https://github.com/hwchase17/langchain/blob/master/.github/CONTRIBUTING.md
-->

#### Who can review?
@hwchase17 
Tag maintainers/contributors who might be interested:

<!-- For a quicker response, figure out the right person to tag with @

  @hwchase17 - project lead

  Tracing / Callbacks
  - @agola11

  Async
  - @agola11

  DataLoaders
  - @eyurtsev

  Models
  - @hwchase17
  - @agola11

  Agents / Tools / Toolkits
  - @vowelparrot

  VectorStores / Retrievers / Memory
  - @dev2049

 -->
2023-06-08 07:32:51 -07:00
Harrison Chase
893d20f735 bump version to 194 (#5866) 2023-06-07 22:47:48 -07:00
Harrison Chase
35cfd25db3 Harrison/nebula graph (#5865)
Co-authored-by: Wey Gu <weyl.gu@gmail.com>
Co-authored-by: chenweisomebody <chenweisomebody@gmail.com>
2023-06-07 21:56:43 -07:00
Harrison Chase
658f8bdee7 Harrison/fauna loader (#5864)
Co-authored-by: Shadid12 <Shadid12@users.noreply.github.com>
2023-06-07 21:32:23 -07:00
Liang Zhang
5518f24ec3 Implement saving and loading of RetrievalQA chain (#5818)
<!--
Thank you for contributing to LangChain! Your PR will appear in our
release under the title you set. Please make sure it highlights your
valuable contribution.

Replace this with a description of the change, the issue it fixes (if
applicable), and relevant context. List any dependencies required for
this change.

After you're done, someone will review your PR. They may suggest
improvements. If no one reviews your PR within a few days, feel free to
@-mention the same people again, as notifications can get lost.

Finally, we'd love to show appreciation for your contribution - if you'd
like us to shout you out on Twitter, please also include your handle!
-->

<!-- Remove if not applicable -->

Fixes #3983
Mimicing what we do for saving and loading VectorDBQA chain, I added the
logic for RetrievalQA chain.
Also added a unit test. I did not find how we test other chains for
their saving and loading functionality, so I just added a file with one
test case. Let me know if there are recommended ways to test it.

#### Before submitting

<!-- If you're adding a new integration, please include:

1. a test for the integration - favor unit tests that does not rely on
network access.
2. an example notebook showing its use


See contribution guidelines for more information on how to write tests,
lint
etc:


https://github.com/hwchase17/langchain/blob/master/.github/CONTRIBUTING.md
-->

#### Who can review?

Tag maintainers/contributors who might be interested:
@dev2049
<!-- For a quicker response, figure out the right person to tag with @

  @hwchase17 - project lead

  Tracing / Callbacks
  - @agola11

  Async
  - @agola11

  DataLoaders
  - @eyurtsev

  Models
  - @hwchase17
  - @agola11

  Agents / Tools / Toolkits
  - @vowelparrot

  VectorStores / Retrievers / Memory
  - @dev2049

 -->

---------

Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
2023-06-07 21:07:13 -07:00
Liang Zhang
b93638ef1e Refactor and update databricks integration page (#5575)
# Your PR Title (What it does)

<!--
Thank you for contributing to LangChain! Your PR will appear in our
release under the title you set. Please make sure it highlights your
valuable contribution.

Replace this with a description of the change, the issue it fixes (if
applicable), and relevant context. List any dependencies required for
this change.

After you're done, someone will review your PR. They may suggest
improvements. If no one reviews your PR within a few days, feel free to
@-mention the same people again, as notifications can get lost.

Finally, we'd love to show appreciation for your contribution - if you'd
like us to shout you out on Twitter, please also include your handle!
-->

<!-- Remove if not applicable -->

Fixes # (issue)

## Before submitting

<!-- If you're adding a new integration, please include:

1. a test for the integration - favor unit tests that does not rely on
network access.
2. an example notebook showing its use


See contribution guidelines for more information on how to write tests,
lint
etc:


https://github.com/hwchase17/langchain/blob/master/.github/CONTRIBUTING.md
-->

## Who can review?

Community members can review the PR once tests pass. Tag
maintainers/contributors who might be interested:

<!-- For a quicker response, figure out the right person to tag with @

  @hwchase17 - project lead

  Tracing / Callbacks
  - @agola11

  Async
  - @agola11

  DataLoaders
  - @eyurtsev

  Models
  - @hwchase17
  - @agola11

  Agents / Tools / Toolkits
  - @vowelparrot

  VectorStores / Retrievers / Memory
  - @dev2049

 -->
2023-06-07 20:45:47 -07:00
volodymyr-memsql
a1549901ce Added SingleStoreDB Vector Store (#5619)
- Added `SingleStoreDB` vector store, which is a wrapper over the
SingleStore DB database, that can be used as a vector storage and has an
efficient similarity search.
- Added integration tests for the vector store
- Added jupyter notebook with the example

@dev2049

---------

Co-authored-by: Volodymyr Tkachuk <vtkachuk-ua@singlestore.com>
Co-authored-by: Dev 2049 <dev.dev2049@gmail.com>
Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
2023-06-07 20:45:33 -07:00
jjzhuo
78aa59c68b Fix serialization issue with W&B (#5693)
The chain input_documents are not displaying properly in W&B, due to
serialization issue:

<img width="1164" alt="Screenshot 2023-06-04 at 11 58 26 AM"
src="https://github.com/hwchase17/langchain/assets/134809928/f31f14f6-0935-4cca-9913-6760cd40eadf">

---------

Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
2023-06-07 20:44:59 -07:00
Alec Flett
ec0dd6e34a propagate callbacks to ConversationalRetrievalChain (#5572)
# Allow callbacks to monitor ConversationalRetrievalChain

<!--
Thank you for contributing to LangChain! Your PR will appear in our
release under the title you set. Please make sure it highlights your
valuable contribution.

Replace this with a description of the change, the issue it fixes (if
applicable), and relevant context. List any dependencies required for
this change.

After you're done, someone will review your PR. They may suggest
improvements. If no one reviews your PR within a few days, feel free to
@-mention the same people again, as notifications can get lost.

Finally, we'd love to show appreciation for your contribution - if you'd
like us to shout you out on Twitter, please also include your handle!
-->

I ran into an issue where load_qa_chain was not passing the callbacks
down to the child LLM chains, and so made sure that callbacks are
propagated. There are probably more improvements to do here but this
seemed like a good place to stop.

Note that I saw a lot of references to callbacks_manager, which seems to
be deprecated. I left that code alone for now.



## Before submitting

<!-- If you're adding a new integration, please include:

1. a test for the integration - favor unit tests that does not rely on
network access.
2. an example notebook showing its use


See contribution guidelines for more information on how to write tests,
lint
etc:


https://github.com/hwchase17/langchain/blob/master/.github/CONTRIBUTING.md
-->

## Who can review?

Community members can review the PR once tests pass. Tag
maintainers/contributors who might be interested:
@agola11
<!-- For a quicker response, figure out the right person to tag with @

  @hwchase17 - project lead

  Tracing / Callbacks
  - @agola11

  Async
  - @agola11

  DataLoaders
  - @eyurtsev

  Models
  - @hwchase17
  - @agola11

  Agents / Tools / Toolkits
  - @vowelparrot

  VectorStores / Retrievers / Memory
  - @dev2049

 -->
2023-06-07 20:25:21 -07:00
Jeff Vestal
3294774148 Add knn and query search field options to ElasticKnnSearch (#5641)
in the `ElasticKnnSearch` class added 2 arguments that were not exposed
properly

`knn_search` added:
- `vector_query_field: Optional[str] = 'vector'`
-- vector_query_field: Field name to use in knn search if not default
'vector'

`knn_hybrid_search` added:
- `vector_query_field: Optional[str] = 'vector'`
-- vector_query_field: Field name to use in knn search if not default
'vector'
- `query_field: Optional[str] = 'text'`
-- query_field: Field name to use in search if not default 'text'



Fixes # https://github.com/hwchase17/langchain/issues/5633


cc: @dev2049 @hwchase17

---------

Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
2023-06-07 20:19:14 -07:00
Mark Marryatt
cef79ca579 Fix exporting GCP Vertex Matching Engine from vectorstores (#5793)
The Vertex Matching Engine docs include [the
line](b177a29d3f/docs/modules/indexes/vectorstores/examples/matchingengine.ipynb (L32))
`from langchain.vectorstores import MatchingEngine` which doesn't work
as it wasn't added to the vectorestores module exports.



  - @dev2049
2023-06-07 19:45:33 -07:00
Dave Ingram
106364a45c Update to Getting Started docs page for Memory (#5855)
Simply fixing a small typo in the memory page. 

Also removed an extra code block at the end of the file.

Along the way, the current outputs seem to have changed in a few places
so left that for posterity, and updated the number of runs which seems
harmless, though I can clean that up if preferred.
2023-06-07 19:45:21 -07:00
bnassivet
9355e3f5f5 qdrant vector store - search with relevancy scores (#5781)
Implementation of similarity_search_with_relevance_scores for quadrant
vector store.
As implemented the method is also compatible with other capacities such
as filtering.

Integration tests updated.


#### Who can review?

Tag maintainers/contributors who might be interested:

  VectorStores / Retrievers / Memory
  - @dev2049
2023-06-07 19:26:40 -07:00
Ning Ren
f15763518a docs: add Shale Protocol integration guide (#5814)
This PR adds documentation for Shale Protocol's integration with
LangChain.

[Shale Protocol](https://shaleprotocol.com) provides forever-free
production-ready inference APIs to the open-source community. We have
global data centers and plan to support all major open LLMs (estimated
~1,000 by 2025).

The team consists of software and ML engineers, AI researchers,
designers, and operators across North America and Asia. Combined
together, the team has 50+ years experience in machine learning, cloud
infrastructure, software engineering and product development. Team
members have worked at places like Google and Microsoft.

#### Who can review?

Tag maintainers/contributors who might be interested:

  - @hwchase17
  - @agola11

---------

Co-authored-by: Karen Sheng <46656667+karensheng@users.noreply.github.com>
2023-06-07 19:25:59 -07:00
Duarte OC
137da7e4b6 Update microsoft loader example with docx2txt dependency (#5832)
@eyurtsev
2023-06-07 19:21:48 -07:00
Aidan Holland
9f4b720a63 Add additional VertexAI Params (#5837)
## Changes

- Added the `stop` param to the `_VertexAICommon` class so it can be set
at llm initialization

## Example Usage

```python
VertexAI(
    # ...
    temperature=0.15,
    max_output_tokens=128,
    top_p=1,
    top_k=40,
    stop=["\n```"],
)
```

## Possible Reviewers

- @hwchase17 
- @agola11
2023-06-07 19:20:37 -07:00
Eduard van Valkenburg
76fcd96dae Add logging in PBI tool (#5841)
<!--
Thank you for contributing to LangChain! Your PR will appear in our
release under the title you set. Please make sure it highlights your
valuable contribution.

Replace this with a description of the change, the issue it fixes (if
applicable), and relevant context. List any dependencies required for
this change.

After you're done, someone will review your PR. They may suggest
improvements. If no one reviews your PR within a few days, feel free to
@-mention the same people again, as notifications can get lost.

Finally, we'd love to show appreciation for your contribution - if you'd
like us to shout you out on Twitter, please also include your handle!
-->

Add some logging into the powerbi tool so that you can see the queries
being sent to PBI and attempts to correct them.

<!-- Remove if not applicable -->

Fixes # (issue)

#### Before submitting

<!-- If you're adding a new integration, please include:

1. a test for the integration - favor unit tests that does not rely on
network access.
2. an example notebook showing its use


See contribution guidelines for more information on how to write tests,
lint
etc:


https://github.com/hwchase17/langchain/blob/master/.github/CONTRIBUTING.md
-->

#### Who can review?

Tag maintainers/contributors who might be interested: @vowelparrot 

<!-- For a quicker response, figure out the right person to tag with @

  @hwchase17 - project lead

  Tracing / Callbacks
  - @agola11

  Async
  - @agola11

  DataLoaders
  - @eyurtsev

  Models
  - @hwchase17
  - @agola11

  Agents / Tools / Toolkits
  - @vowelparrot

  VectorStores / Retrievers / Memory
  - @dev2049

 -->
2023-06-07 19:19:21 -07:00
Matt Robinson
11fec7d4d1 feat: Add UnstructuredCSVLoader for CSV files (#5844)
### Summary

Adds an `UnstructuredCSVLoader` for loading CSVs. One advantage of using
`UnstructuredCSVLoader` relative to the standard `CSVLoader` is that if
you use `UnstructuredCSVLoader` in `"elements"` mode, an HTML
representation of the table will be available in the metadata.

#### Who can review?

@hwchase17
 @eyurtsev
2023-06-07 19:18:01 -07:00
Soos3D
0b4a51930c Add how to use a custom scraping function with the sitemap loader. (#5847)
Hi! I just added an example of how to use a custom scraping function
with the sitemap loader. I recently used this feature and had to dig in
the source code to find it. I thought it might be useful to other devs
to have an example in the Jupyter Notebook directly.

I only added the example to the documentation page. 

@eyurtsev I was not able to run the lint. Please let me know if I have
to do anything else.

I know this is a very small contribution, but I hope it will be
valuable. My Twitter handle is @web3Dav3.

<!-- For a quicker response, figure out the right person to tag with @

  @hwchase17 - project lead

  Tracing / Callbacks
  - @agola11

  Async
  - @agola11

  DataLoaders
  - @eyurtsev

  Models
  - @hwchase17
  - @agola11

  Agents / Tools / Toolkits
  - @vowelparrot

  VectorStores / Retrievers / Memory
  - @dev2049

 -->
2023-06-07 19:16:51 -07:00
Yessen Kanapin
c66755b661 Add DeepInfra embeddings integration with tests and examples, better exception handling for Deep Infra LLM (#5854)
#### Who can review?

Tag maintainers/contributors who might be interested:
  @hwchase17 - project lead
  - @agola11

---------

Co-authored-by: Yessen Kanapin <yessen@deepinfra.com>
2023-06-07 19:14:30 -07:00
ugfly1210
4d8cda1c3b FIX: backslash escaped (#5815)
<!--
Thank you for contributing to LangChain! Your PR will appear in our
release under the title you set. Please make sure it highlights your
valuable contribution.

Replace this with a description of the change, the issue it fixes (if
applicable), and relevant context. List any dependencies required for
this change.

After you're done, someone will review your PR. They may suggest
improvements. If no one reviews your PR within a few days, feel free to
@-mention the same people again, as notifications can get lost.

Finally, we'd love to show appreciation for your contribution - if you'd
like us to shout you out on Twitter, please also include your handle!
-->

<!-- Remove if not applicable -->

LatexTextSplitter needs to use "\n\\\chapter" when separators are
escaped, such as "\n\\\chapter", otherwise it will report an error:
(re.error: bad escape \c at position 1 (line 2, column 1))


Fixes # (issue)

#### Before submitting

<!-- If you're adding a new integration, please include:

1. a test for the integration - favor unit tests that does not rely on
network access.
2. an example notebook showing its use

re.error: bad escape \c at position 1 (line 2, column 1)

See contribution guidelines for more information on how to write tests,
lint
etc:


https://github.com/hwchase17/langchain/blob/master/.github/CONTRIBUTING.md
-->

#### Who can review?

@hwchase17  @dev2049 

<!-- For a quicker response, figure out the right person to tag with @

  @hwchase17 - project lead

  Tracing / Callbacks
  - @agola11

  Async
  - @agola11

  DataLoaders
  - @eyurtsev

  Models
  - @hwchase17
  - @agola11

  Agents / Tools / Toolkits
  - @vowelparrot

  VectorStores / Retrievers / Memory
  - @dev2049

 -->

Co-authored-by: Pang <ugfly@qq.com>
2023-06-07 16:01:07 -07:00
Zander Chase
3af36943e8 Rm extraneous args to the trace group helper (#5801)
These are being ignored
2023-06-07 13:09:29 -07:00
whysage
8ef7274ee6 feat: issue-5712 add sleep tool (#5715)
Fixes # 5712 added sleep tool
2023-06-07 09:39:02 -07:00
Zander Chase
d9fcc45d05 Add in the async methods and link the run id (#5810) 2023-06-07 08:27:44 -07:00
Harrison Chase
ce7c11625f bump version to 193 (#5838) 2023-06-07 07:38:57 -07:00
warjiang
5a207cce8f fix: fullfill openai params when embedding (#5821)
<!--
Thank you for contributing to LangChain! Your PR will appear in our
release under the title you set. Please make sure it highlights your
valuable contribution.

Replace this with a description of the change, the issue it fixes (if
applicable), and relevant context. List any dependencies required for
this change.

After you're done, someone will review your PR. They may suggest
improvements. If no one reviews your PR within a few days, feel free to
@-mention the same people again, as notifications can get lost.

Finally, we'd love to show appreciation for your contribution - if you'd
like us to shout you out on Twitter, please also include your handle!
-->

<!-- Remove if not applicable -->

Fixes #5822 
I upgrade my langchain lib by execute `pip install -U langchain`, and
the verion is 0.0.192。But i found that openai.api_base not working. I
use azure openai service as openai backend, the openai.api_base is very
import for me. I hava compared tag/0.0.192 and tag/0.0.191, and figure
out that:

![image](https://github.com/hwchase17/langchain/assets/6478745/e183fdb2-8224-45c9-b3b4-26d62823999a)
openai params is moved inside `_invocation_params` function,and used in
some openai invoke:

![image](https://github.com/hwchase17/langchain/assets/6478745/5a55a048-5fa9-4bf4-aaef-3902226bec5e)

![image](https://github.com/hwchase17/langchain/assets/6478745/85b8cebc-eeb8-4538-a525-814719c8f8df)
but still some case not covered like:

![image](https://github.com/hwchase17/langchain/assets/6478745/e0297620-f2b2-4f4f-98bd-d0ed19022dac)

#### Before submitting

<!-- If you're adding a new integration, please include:

1. a test for the integration - favor unit tests that does not rely on
network access.
2. an example notebook showing its use


See contribution guidelines for more information on how to write tests,
lint
etc:


https://github.com/hwchase17/langchain/blob/master/.github/CONTRIBUTING.md
-->

#### Who can review?

Tag maintainers/contributors who might be interested:
@hwchase17 

<!-- For a quicker response, figure out the right person to tag with @

  @hwchase17 - project lead

  Tracing / Callbacks
  - @agola11

  Async
  - @agola11

  DataLoaders
  - @eyurtsev

  Models
  - @hwchase17
  - @agola11

  Agents / Tools / Toolkits
  - @vowelparrot

  VectorStores / Retrievers / Memory
  - @dev2049

 -->

---------

Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
2023-06-07 07:32:57 -07:00
Harrison Chase
b3ae6bcd3f bump ver to 192 (#5812) 2023-06-06 22:23:11 -07:00
Harrison Chase
5468528748 rm docs mongo (#5811) 2023-06-06 22:22:44 -07:00
Andrew Switlyk
69f4ffb851 Update adding_memory.ipynb (#5806)
just change "to" to "too" so it matches the above prompt

<!--
Thank you for contributing to LangChain! Your PR will appear in our
release under the title you set. Please make sure it highlights your
valuable contribution.

Replace this with a description of the change, the issue it fixes (if
applicable), and relevant context. List any dependencies required for
this change.

After you're done, someone will review your PR. They may suggest
improvements. If no one reviews your PR within a few days, feel free to
@-mention the same people again, as notifications can get lost.

Finally, we'd love to show appreciation for your contribution - if you'd
like us to shout you out on Twitter, please also include your handle!
-->

<!-- Remove if not applicable -->

Fixes # (issue)

#### Before submitting

<!-- If you're adding a new integration, please include:

1. a test for the integration - favor unit tests that does not rely on
network access.
2. an example notebook showing its use


See contribution guidelines for more information on how to write tests,
lint
etc:


https://github.com/hwchase17/langchain/blob/master/.github/CONTRIBUTING.md
-->

#### Who can review?

Tag maintainers/contributors who might be interested:

<!-- For a quicker response, figure out the right person to tag with @

  @hwchase17 - project lead

  Tracing / Callbacks
  - @agola11

  Async
  - @agola11

  DataLoaders
  - @eyurtsev

  Models
  - @hwchase17
  - @agola11

  Agents / Tools / Toolkits
  - @vowelparrot

  VectorStores / Retrievers / Memory
  - @dev2049

 -->
2023-06-06 22:10:53 -07:00
Sun bin
2be4fbb835 add doc about reusing MongoDBAtlasVectorSearch (#5805)
DOC: add doc about reusing MongoDBAtlasVectorSearch

#### Who can review?

Anyone authorized.
2023-06-06 22:10:36 -07:00
bnassivet
062c3c00a2 fixed faiss integ tests (#5808)
Fixes # 5807

Realigned tests with implementation.
Also reinforced folder unicity for the test_faiss_local_save_load test
using date-time suffix

#### Before submitting

- Integration test updated
- formatting and linting ok (locally) 

#### Who can review?

Tag maintainers/contributors who might be interested:

  @hwchase17 - project lead
  VectorStores / Retrievers / Memory
  -@dev2049
2023-06-06 22:07:27 -07:00
SvMax
92b87c2fec added support for different types in ResponseSchema class (#5789)
I added support for specifing different types with ResponseSchema
objects:

## before
`
extracted_info = ResponseSchema(name="extracted_info", description="List
of extracted information")
`
generate the following doc: ```json\n{\n\t\"extracted_info\": string //
List of extracted information}```
This brings GPT to create a JSON with only one string in the specified
field even if you requested a List in the description.

## now
`extracted_info = ResponseSchema(name="extracted_info",
type="List[string]", description="List of extracted information")
`
generate the following doc: ```json\n{\n\t\"extracted_info\":
List[string] // List of extracted information}```
This way the model responds better to the prompt generating an array of
strings.

Tag maintainers/contributors who might be interested:
  Agents / Tools / Toolkits
  @vowelparrot

Don't know who can be interested, I suppose this is a tool, so I tagged
you vowelparrot,
anyway, it's a minor change, and shouldn't impact any other part of the
framework.
2023-06-06 22:00:48 -07:00
Harrison Chase
3954bcf396 WIP: openai settings (#5792)
[] need to test more
[] make sure they arent saved when serializing
[] do for embeddings
2023-06-06 21:57:58 -07:00
Alex Lee
b7999a9bc1 Add UTF-8 json ouput support while langchain.debug is set to True. (#5802)
Before:
<img width="984" alt="image"
src="https://github.com/hwchase17/langchain/assets/4317474/2b0807b4-a1d6-4df2-87cc-92b1c8e10534">

After:
<img width="992" alt="image"
src="https://github.com/hwchase17/langchain/assets/4317474/128c2c7d-2ed5-4c95-954d-b0964c83526a">


Thanks in advance.

 @agola11
2023-06-06 21:56:33 -07:00
kourosh hakhamaneshi
a0d847f636 [Docs][Hotfix] Fix broken links (#5800)
<!--
Thank you for contributing to LangChain! Your PR will appear in our
release under the title you set. Please make sure it highlights your
valuable contribution.

Replace this with a description of the change, the issue it fixes (if
applicable), and relevant context. List any dependencies required for
this change.

After you're done, someone will review your PR. They may suggest
improvements. If no one reviews your PR within a few days, feel free to
@-mention the same people again, as notifications can get lost.

Finally, we'd love to show appreciation for your contribution - if you'd
like us to shout you out on Twitter, please also include your handle!
-->

<!-- Remove if not applicable -->

Some links were broken from the previous merge. This PR fixes them.
Tested locally.

#### Before submitting

<!-- If you're adding a new integration, please include:

1. a test for the integration - favor unit tests that does not rely on
network access.
2. an example notebook showing its use


See contribution guidelines for more information on how to write tests,
lint
etc:


https://github.com/hwchase17/langchain/blob/master/.github/CONTRIBUTING.md
-->

#### Who can review?

Tag maintainers/contributors who might be interested:

<!-- For a quicker response, figure out the right person to tag with @

  @hwchase17 - project lead

  Tracing / Callbacks
  - @agola11

  Async
  - @agola11

  DataLoaders
  - @eyurtsev

  Models
  - @hwchase17
  - @agola11

  Agents / Tools / Toolkits
  - @vowelparrot

  VectorStores / Retrievers / Memory
  - @dev2049

 -->

Signed-off-by: Kourosh Hakhamaneshi <kourosh@anyscale.com>
2023-06-06 17:17:16 -07:00
Zander Chase
217b5cc72d Base RunEvaluator Chain (#5750)
Clean up a bit and only implement the QA and reference free
implementations from https://github.com/hwchase17/langchain/pull/5618
2023-06-06 16:42:15 -07:00
Lance Martin
4092fd21dc YoutubeAudioLoader and updates to OpenAIWhisperParser (#5772)
This introduces the `YoutubeAudioLoader`, which will load blobs from a
YouTube url and write them. Blobs are then parsed by
`OpenAIWhisperParser()`, as show in this
[PR](https://github.com/hwchase17/langchain/pull/5580), but we extend
the parser to split audio such that each chuck meets the 25MB OpenAI
size limit. As shown in the notebook, this enables a very simple UX:

```
# Transcribe the video to text
loader = GenericLoader(YoutubeAudioLoader([url],save_dir),OpenAIWhisperParser())
docs = loader.load()
``` 

Tested on full set of Karpathy lecture videos:

```
# Karpathy lecture videos
urls = ["https://youtu.be/VMj-3S1tku0"
        "https://youtu.be/PaCmpygFfXo",
        "https://youtu.be/TCH_1BHY58I",
        "https://youtu.be/P6sfmUTpUmc",
        "https://youtu.be/q8SA3rM6ckI",
        "https://youtu.be/t3YJ5hKiMQ0",
        "https://youtu.be/kCc8FmEb1nY"]

# Directory to save audio files 
save_dir = "~/Downloads/YouTube"
 
# Transcribe the videos to text
loader = GenericLoader(YoutubeAudioLoader(urls,save_dir),OpenAIWhisperParser())
docs = loader.load()
```
2023-06-06 15:15:08 -07:00
Gengliang Wang
2a4b32dee2 Revise DATABRICKS_API_TOKEN as DATABRICKS_TOKEN (#5796)
<!--
Thank you for contributing to LangChain! Your PR will appear in our
release under the title you set. Please make sure it highlights your
valuable contribution.

Replace this with a description of the change, the issue it fixes (if
applicable), and relevant context. List any dependencies required for
this change.

After you're done, someone will review your PR. They may suggest
improvements. If no one reviews your PR within a few days, feel free to
@-mention the same people again, as notifications can get lost.

Finally, we'd love to show appreciation for your contribution - if you'd
like us to shout you out on Twitter, please also include your handle!
-->

In the [Databricks
integration](https://python.langchain.com/en/latest/integrations/databricks.html)
and [Databricks
LLM](https://python.langchain.com/en/latest/modules/models/llms/integrations/databricks.html),
we suggestted users to set the ENV variable `DATABRICKS_API_TOKEN`.
However, this is inconsistent with the other Databricks library. To make
it consistent, this PR changes the variable from `DATABRICKS_API_TOKEN`
to `DATABRICKS_TOKEN`

After changes, there is no more `DATABRICKS_API_TOKEN` in the doc
```
$ git grep DATABRICKS_API_TOKEN|wc -l
0

$ git grep DATABRICKS_TOKEN|wc -l
8
```
cc @hwchase17 @dev2049 @mengxr since you have reviewed the previous PRs.
2023-06-06 14:22:49 -07:00
Paul-Emile Brotons
daf3e99b96 fixing from_documents method of the MongoDB Atlas vector store (#5794)
FIxed a bug in from_documents method --> Collection objects do not
implement truth value testing or bool().
@dev2049
2023-06-06 14:22:23 -07:00
Ankush Gola
b177a29d3f support returning run info for llms, chat models and chains (#5666)
returning the run id is important for accessing the run later on
2023-06-06 10:07:46 -07:00
Yoann Poupart
65111eb2b3 Attribute support for html tags (#5782)
# What does this PR do?

Change the HTML tags so that a tag with attributes can be found.

## Before submitting

- [x] Tests added
- [x] CI/CD validated

### Who can review?

Anyone in the community is free to review the PR once the tests have
passed. Feel free to tag
members/contributors who may be interested in your PR.
2023-06-06 09:27:37 -07:00
Zander Chase
0cfaa76e45 Set Falsey (#5783)
Seems natural to try to disable logging by setting `MY_VAR=false` rather
than unsetting (especially once you've already set it in the background)
2023-06-06 09:26:38 -07:00
Harrison Chase
2ae2d6cd1d fix ver 191 (#5784) 2023-06-06 09:17:23 -07:00
Zander Chase
204a73c1d9 Use client from LCP-SDK (#5695)
- Remove the client implementation (this breaks backwards compatibility
for existing testers. I could keep the stub in that file if we want, but
not many people are using it yet
- Add SDK as dependency
- Update the 'run_on_dataset' method to be a function that optionally
accepts a client as an argument
- Remove the langchain plus server implementation (you get it for free
with the SDK now)

We could make the SDK optional for now, but the plan is to use w/in the
tracer so it would likely become a hard dependency at some point.
2023-06-06 06:51:05 -07:00
Harrison Chase
08e2352f7b bump ver 191 (#5766) 2023-06-05 20:54:08 -07:00
berkedilekoglu
f907b62526 Scores are explained in vectorestore docs (#5613)
# Scores in Vectorestores' Docs Are Explained

Following vectorestores can return scores with similar documents by
using `similarity_search_with_score`:
- chroma
- docarray_hnsw
- docarray_in_memory
- faiss
- myscale
- qdrant
- supabase
- vectara
- weaviate

However, in documents, these scores were either not explained at all or
explained in a way that could lead to misunderstandings (e.g., FAISS).
For instance in FAISS document: if we consider the score returned by the
function as a similarity score, we understand that a document returning
a higher score is more similar to the source document. However, since
the scores returned by the function are distance scores, we should
understand that smaller scores correspond to more similar documents.

For the libraries other than Vectara, I wrote the scores they use by
investigating from the source libraries. Since I couldn't be certain
about the score metric used by Vectara, I didn't make any changes in its
documentation. The links mentioned in Vectara's documentation became
broken due to updates, so I replaced them with working ones.

VectorStores / Retrievers / Memory
  - @dev2049

my twitter: [berkedilekoglu](https://twitter.com/berkedilekoglu)

---------

Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
2023-06-05 20:39:49 -07:00
Adil Ansari
233b52735e feat: Support for Tigris Vector Database for vector search (#5703)
### Changes
- New vector store integration - [Tigris](https://tigrisdata.com)
- Adds [tigrisdb](https://pypi.org/project/tigrisdb/) optional
dependency
- Example notebook demonstrating usage

Fixes #5535 
Closes tigrisdata/tigris-client-python#40

#### Twitter handles
We'd love a shoutout on our
[@TigrisData](https://twitter.com/TigrisData) and
[@adilansari](https://twitter.com/adilansari) twitter handles

#### Who can review?
@dev2049

---------

Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
2023-06-05 20:39:16 -07:00
Edrick Da Corte Henriquez
38dabdbb3a Update tutorials.md (#5761)
# Added an overview of LangChain modules

Aimed at introducing newcomers to LangChain's main modules :)

Twitter handle is @edrick_dch 

## Who can review?

@eyurtsev
2023-06-05 20:37:11 -07:00
Ankush Gola
84a46753ab Tracing Group (#5326)
Add context manager to group all runs under a virtual parent

---------

Co-authored-by: vowelparrot <130414180+vowelparrot@users.noreply.github.com>
2023-06-05 19:18:43 -07:00
Ilya
d5b1608216 fix markdown text splitter horizontal lines (#5625)
Fixes #5614 

#### Issue

The `***` combination produces an exception when used as a seperator in
`re.split`. Instead `\*\*\*` should be used for regex exprations.

#### Who can review?

@eyurtsev
2023-06-05 16:40:26 -07:00
Harrison Chase
25487fa5ee Harrison/youtube multi language (#5758)
Co-authored-by: rafly lesmana <raflylesmana111@gmail.com>
2023-06-05 16:38:07 -07:00
Shelby Jenkins
2dcda8a8ac Strips whitespace and \n from loc before filtering urls from sitemap (#5728)
Fixes #5699 



#### Who can review?

Tag maintainers/contributors who might be interested:

@woodworker @LeSphax @johannhartmann

---------

Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
2023-06-05 16:33:55 -07:00
Harrison Chase
98dd6d068a cohere retries (#5757)
…719)

A minor update to retry Cohore API call in case of errors using tenacity
as it is done for OpenAI LLMs.

#### Who can review?

@hwchase17, @agola11 

<!-- For a quicker response, figure out the right person to tag with @

  @hwchase17 - project lead

  Tracing / Callbacks
  - @agola11

  Async
  - @agola11

  DataLoaders
  - @eyurtsev

  Models
  - @hwchase17
  - @agola11

  Agents / Tools / Toolkits
  - @vowelparrot

  VectorStores / Retrievers / Memory
  - @dev2049

 -->

<!--
Thank you for contributing to LangChain! Your PR will appear in our
release under the title you set. Please make sure it highlights your
valuable contribution.

Replace this with a description of the change, the issue it fixes (if
applicable), and relevant context. List any dependencies required for
this change.

After you're done, someone will review your PR. They may suggest
improvements. If no one reviews your PR within a few days, feel free to
@-mention the same people again, as notifications can get lost.

Finally, we'd love to show appreciation for your contribution - if you'd
like us to shout you out on Twitter, please also include your handle!
-->

<!-- Remove if not applicable -->

Fixes # (issue)

#### Before submitting

<!-- If you're adding a new integration, please include:

1. a test for the integration - favor unit tests that does not rely on
network access.
2. an example notebook showing its use


See contribution guidelines for more information on how to write tests,
lint
etc:


https://github.com/hwchase17/langchain/blob/master/.github/CONTRIBUTING.md
-->

#### Who can review?

Tag maintainers/contributors who might be interested:

<!-- For a quicker response, figure out the right person to tag with @

  @hwchase17 - project lead

  Tracing / Callbacks
  - @agola11

  Async
  - @agola11

  DataLoaders
  - @eyurtsev

  Models
  - @hwchase17
  - @agola11

  Agents / Tools / Toolkits
  - @vowelparrot

  VectorStores / Retrievers / Memory
  - @dev2049

 -->

---------

Co-authored-by: Sagar Sapkota <22609549+sagar-spkt@users.noreply.github.com>
2023-06-05 16:28:58 -07:00
M Waleed Kadous
5124c1e0d9 Add aviary support (#5661)
Aviary is an open source toolkit for evaluating and deploying open
source LLMs. You can find out more about it on
[http://github.com/ray-project/aviary). You can try it out at
[http://aviary.anyscale.com](aviary.anyscale.com).

This code adds support for Aviary in LangChain. To minimize
dependencies, it connects directly to the HTTP endpoint.

The current implementation is not accelerated and uses the default
implementation of `predict` and `generate`.

It includes a test and a simple example. 

@hwchase17 and @agola11 could you have a look at this?

---------

Co-authored-by: Dev 2049 <dev.dev2049@gmail.com>
2023-06-05 16:28:42 -07:00
felpigeon
a47c8618ec Add class attribute "return_generated_question" to class "BaseConversationalRetrievalChain" (#5749)
<!--
Thank you for contributing to LangChain! Your PR will appear in our
release under the title you set. Please make sure it highlights your
valuable contribution.

Replace this with a description of the change, the issue it fixes (if
applicable), and relevant context. List any dependencies required for
this change.

After you're done, someone will review your PR. They may suggest
improvements. If no one reviews your PR within a few days, feel free to
@-mention the same people again, as notifications can get lost.

Finally, we'd love to show appreciation for your contribution - if you'd
like us to shout you out on Twitter, please also include your handle!
-->

Adding a class attribute "return_generated_question" to class
"BaseConversationalRetrievalChain". If set to `True`, the chain's output
has a key "generated_question" with the question generated by the
sub-chain `question_generator` as the value. This way the generated
question can be logged.

<!-- If you're adding a new integration, please include:

1. a test for the integration - favor unit tests that does not rely on
network access.
2. an example notebook showing its use


See contribution guidelines for more information on how to write tests,
lint
etc:


https://github.com/hwchase17/langchain/blob/master/.github/CONTRIBUTING.md
-->

#### Who can review?

<!-- For a quicker response, figure out the right person to tag with @

  @hwchase17 - project lead

  Tracing / Callbacks
  - @agola11

  Async
  - @agola11

  DataLoaders
  - @eyurtsev

  Models
  - @hwchase17
  - @agola11

  Agents / Tools / Toolkits
  - @vowelparrot

  VectorStores / Retrievers / Memory
  - @dev2049

 -->
@dev2049 @vowelparrot
2023-06-05 16:10:12 -07:00
Leonid Ganeline
87ad4fc4b2 docs: updated ecosystem/dependents (#5753)
updated `ecosystem/dependents` data (it was updated 2+ weeks ago)

#### Who can review?

@hwchase17 
@eyurtsev
@dev2049
2023-06-05 16:09:55 -07:00
Leonid Ganeline
92a5f00ffb docs: ecosystem/integrations update 5 (#5752)
- added missed integration to `docs/ecosystem/integrations/`
- updated notebooks to consistent format: changed titles, file names;
added descriptions

#### Who can review?
 @hwchase17 
 @dev2049
2023-06-05 16:08:55 -07:00
Lance Martin
aea090045b Create OpenAIWhisperParser for generating Documents from audio files (#5580)
# OpenAIWhisperParser

This PR creates a new parser, `OpenAIWhisperParser`, that uses the
[OpenAI Whisper
model](https://platform.openai.com/docs/guides/speech-to-text/quickstart)
to perform transcription of audio files to text (`Documents`). Please
see the notebook for usage.
2023-06-05 15:51:13 -07:00
Hao Chen
a4c9053d40 Integrate Clickhouse as Vector Store (#5650)
<!--
Thank you for contributing to LangChain! Your PR will appear in our
release under the title you set. Please make sure it highlights your
valuable contribution.

Replace this with a description of the change, the issue it fixes (if
applicable), and relevant context. List any dependencies required for
this change.

After you're done, someone will review your PR. They may suggest
improvements. If no one reviews your PR within a few days, feel free to
@-mention the same people again, as notifications can get lost.

Finally, we'd love to show appreciation for your contribution - if you'd
like us to shout you out on Twitter, please also include your handle!
-->

#### Description

This PR is mainly to integrate open source version of ClickHouse as
Vector Store as it is easy for both local development and adoption of
LangChain for enterprises who already have large scale clickhouse
deployment.

ClickHouse is a open source real-time OLAP database with full SQL
support and a wide range of functions to assist users in writing
analytical queries. Some of these functions and data structures perform
distance operations between vectors, [enabling ClickHouse to be used as
a vector
database](https://clickhouse.com/blog/vector-search-clickhouse-p1).
Recently added ClickHouse capabilities like [Approximate Nearest
Neighbour (ANN)
indices](https://clickhouse.com/docs/en/engines/table-engines/mergetree-family/annindexes)
support faster approximate matching of vectors and provide a promising
development aimed to further enhance the vector matching capabilities of
ClickHouse.

In LangChain, some ClickHouse based commercial variant vector stores
like
[Chroma](https://github.com/hwchase17/langchain/blob/master/langchain/vectorstores/chroma.py)
and
[MyScale](https://github.com/hwchase17/langchain/blob/master/langchain/vectorstores/myscale.py),
etc are already integrated, but for some enterprises with large scale
Clickhouse clusters deployment, it will be more straightforward to
upgrade existing clickhouse infra instead of moving to another similar
vector store solution, so we believe it's a valid requirement to
integrate open source version of ClickHouse as vector store.

As `clickhouse-connect` is already included by other integrations, this
PR won't include any new dependencies.

#### Before submitting

<!-- If you're adding a new integration, please include:

1. Added a test for the integration:
https://github.com/haoch/langchain/blob/clickhouse/tests/integration_tests/vectorstores/test_clickhouse.py
2. Added an example notebook and document showing its use: 
* Notebook:
https://github.com/haoch/langchain/blob/clickhouse/docs/modules/indexes/vectorstores/examples/clickhouse.ipynb
* Doc:
https://github.com/haoch/langchain/blob/clickhouse/docs/integrations/clickhouse.md

See contribution guidelines for more information on how to write tests,
lint
etc:


https://github.com/hwchase17/langchain/blob/master/.github/CONTRIBUTING.md
-->

1. Added a test for the integration:
https://github.com/haoch/langchain/blob/clickhouse/tests/integration_tests/vectorstores/test_clickhouse.py
2. Added an example notebook and document showing its use: 
* Notebook:
https://github.com/haoch/langchain/blob/clickhouse/docs/modules/indexes/vectorstores/examples/clickhouse.ipynb
* Doc:
https://github.com/haoch/langchain/blob/clickhouse/docs/integrations/clickhouse.md


#### Who can review?

Tag maintainers/contributors who might be interested:

<!-- For a quicker response, figure out the right person to tag with @

  @hwchase17 - project lead

  Tracing / Callbacks
  - @agola11

  Async
  - @agola11

  DataLoaders
  - @eyurtsev

  Models
  - @hwchase17
  - @agola11

  Agents / Tools / Toolkits
  - @vowelparrot

  VectorStores / Retrievers / Memory
  - @dev2049

 -->
 
@hwchase17 @dev2049 Could you please help review?

---------

Co-authored-by: Dev 2049 <dev.dev2049@gmail.com>
2023-06-05 13:32:04 -07:00
Gustavo Brian
2f2d27fd82 Error in documentation: Chroma constructor (#5731)
Chroma("langchain_store", embeddings.embed_query) must be
Chroma("langchain_store", embeddings)
2023-06-05 13:30:58 -07:00
George Geddes
019eb13681 Fix a typo in the documentation for the Slack document loader (#5745)
Fixes a typo I noticed while reading the docs.
2023-06-05 13:30:24 -07:00
Andrew Grangaard
450eb91fe2 Removes unnecessary backslash escaping for backticks in python (#5751)
Fixed python deprecation warning:
    DeprecationWarning: invalid escape sequence '`'
    
backticks (`) do not have special meaning in python strings and should
not be escaped.

-- @spazm on twitter

### Who can review:

@nfcampos ported this change from javascript, @hwchase17 wrote the
original STRUCTURED_FORMAT_INSTRUCTIONS,
2023-06-05 13:30:11 -07:00
Daniel Chalef
0551bc90a5 Zep Hybrid Search (#5742)
Zep now supports persisting custom metadata with messages and hybrid
search across both message embeddings and structured metadata. This PR
implements custom metadata and enhancements to the
`ZepChatMessageHistory` and `ZepRetriever` classes to implement this
support.

Tag maintainers/contributors who might be interested:

  VectorStores / Retrievers / Memory
  - @dev2049

---------

Co-authored-by: Daniel Chalef <daniel.chalef@private.org>
2023-06-05 12:59:28 -07:00
Tomaz Bratanic
a0ea6f6b6b Cypher search: Check if generated Cypher is provided in backticks (#5541)
# Check if generated Cypher code is wrapped in backticks

Some LLMs like the VertexAI like to explain how they generated the
Cypher statement and wrap the actual code in three backticks:

![Screenshot from 2023-06-01
08-08-23](https://github.com/hwchase17/langchain/assets/19948365/1d8eecb3-d26c-4882-8f5b-6a9bc7e93690)


I have observed a similar pattern with OpenAI chat models in a
conversational settings, where multiple user and assistant message are
provided to the LLM to generate Cypher statements, where then the LLM
wants to maybe apologize for previous steps or explain its thoughts.
Interestingly, both OpenAI and VertexAI wrap the code in three backticks
if they are doing any explaining or apologizing. Checking if the
generated cypher is wrapped in backticks seems like a low-hanging fruit
to expand the cypher search to other LLMs and conversational settings.
2023-06-05 12:48:13 -07:00
Abhijeet Malamkar
1a9ac3b1f9 Adding support to save multiple memories at a time. Cuts save time by … (#5172)
# Adding support to save multiple memories at a time. Cuts save time by
more then half

<!--
Thank you for contributing to LangChain! Your PR will appear in our next
release under the title you set. Please make sure it highlights your
valuable contribution.

Replace this with a description of the change, the issue it fixes (if
applicable), and relevant context. List any dependencies required for
this change.

After you're done, someone will review your PR. They may suggest
improvements. If no one reviews your PR within a few days, feel free to
@-mention the same people again, as notifications can get lost.
-->

## Who can review?

Community members can review the PR once tests pass. Tag
maintainers/contributors who might be interested:

<!-- For a quicker response, figure out the right person to tag with @

        @hwchase17 - project lead

        Tracing / Callbacks
        - @agola11

        Async
        - @agola11

        DataLoaders
        - @eyurtsev

        Models
        - @hwchase17
        - @agola11

  
        -
        
        VectorStores / Retrievers / Memory
        - @dev2049
        
 -->
@dev2049
 @vowelparrot

---------

Co-authored-by: Dev 2049 <dev.dev2049@gmail.com>
2023-06-05 12:47:48 -07:00
kourosh hakhamaneshi
625717daa8 docs: Added Deploying LLMs into production + a new ecosystem (#4047)
Signed-off-by: Kourosh Hakhamaneshi <kourosh@anyscale.com>
Co-authored-by: Kamil Kaczmarek <kaczmarek.poczta@gmail.com>
Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
2023-06-05 12:47:27 -07:00
Ralph Schlosser
74f8e603d9 Addresses GPT4All wrapper model_type attribute issues #5720. (#5743)
Fixes #5720.

A more in-depth discussion is in my comment here:
https://github.com/hwchase17/langchain/issues/5720#issuecomment-1577047018

In a nutshell, there has been a subtle change in the latest version of
GPT4Alls Python bindings. The change I submitted yesterday is compatible
with this version, however, this version is as of yet unreleased and
thus the code change breaks Langchain's wrapper under the currently
released version of GPT4All.

This pull request proposes a backwards-compatible solution.
2023-06-05 12:45:29 -07:00
Harrison Chase
d0d89d39ef bump version to 190 (#5704) 2023-06-04 20:04:50 -07:00
mheguy-stingray
b64c39dfe7 top_k and top_p transposed in vertexai (#5673)
Fix transposed properties in vertexai model


Co-authored-by: Dev 2049 <dev.dev2049@gmail.com>
2023-06-04 16:59:53 -07:00
Tobias Herbold
3fb0e4872a sqlalchemy MovedIn20Warning declarative_base DEPRICATION fix (#5676)
fix for the sqlalchemy deprecated declarative_base import :

```
MovedIn20Warning: The ``declarative_base()`` function is now available as sqlalchemy.orm.declarative_base(). (deprecated since: 2.0) (Background on SQLAlchemy 2.0 at: https://sqlalche.me/e/b8d9)
  Base = declarative_base()  # type: Any
```

Import is wrapped in an try catch Block to fallback to the old import if
needed.

---------

Co-authored-by: Dev 2049 <dev.dev2049@gmail.com>
2023-06-04 16:52:52 -07:00
Jens Madsen
8d9e9e013c refactor: extract token text splitter function (#5179)
# Token text splitter for sentence transformers

The current TokenTextSplitter only works with OpenAi models via the
`tiktoken` package. This is not clear from the name `TokenTextSplitter`.
In this (first PR) a token based text splitter for sentence transformer
models is added. In the future I think we should work towards injecting
a tokenizer into the TokenTextSplitter to make ti more flexible.
Could perhaps be reviewed by @dev2049

---------

Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
2023-06-04 14:41:44 -07:00
Nathan Azrak
26ec845921 Raise an exception in MKRL and Chat Output Parsers if parsing text which contains both an action and a final answer (#5609)
Raises exception if OutputParsers receive a response with both a valid
action and a final answer

Currently, if an OutputParser receives a response which includes both an
action and a final answer, they return a FinalAnswer object. This allows
the parser to accept responses which propose an action and hallucinate
an answer without the action being parsed or taken by the agent.

This PR changes the logic to:
1. store a variable checking whether a response contains the
`FINAL_ANSWER_ACTION` (this is the easier condition to check).
2. store a variable checking whether the response contains a valid
action
3. if both are present, raise a new exception stating that both are
present
4. if an action is present, return an AgentAction
5. if an answer is present, return an AgentAnswer
6. if neither is present, raise the relevant exception based around the
action format (these have been kept consistent with the prior exception
messages)

Disclaimer:
* Existing mock data included strings which did include an action and an
answer. This might indicate that prioritising returning AgentAnswer was
always correct, and I am patching out desired behaviour? @hwchase17 to
advice. Curious if there are allowed cases where this is not
hallucinating, and we do want the LLM to output an action which isn't
taken.
* I have not passed `send_to_llm` through this new exception

Fixes #5601 

## Who can review?

Community members can review the PR once tests pass. Tag
maintainers/contributors who might be interested:

@hwchase17 - project lead
@vowelparrot
2023-06-04 14:40:49 -07:00
Lucas Rodrigues
c112d7334d Update MongoDBChatMessageHistory to create an index on SessionId (#5632)
All the queries to the database are done based on the SessionId
property, this will optimize how Mongo retrieves all messages from a
session

#### Who can review?

Tag maintainers/contributors who might be interested:
@dev2049
2023-06-04 14:39:56 -07:00
Jason Weill
6c11f94013 Retitles Bedrock doc to appear in correct alphabetical order in site nav (#5639)
<!--
Thank you for contributing to LangChain! Your PR will appear in our
release under the title you set. Please make sure it highlights your
valuable contribution.

Replace this with a description of the change, the issue it fixes (if
applicable), and relevant context. List any dependencies required for
this change.

After you're done, someone will review your PR. They may suggest
improvements. If no one reviews your PR within a few days, feel free to
@-mention the same people again, as notifications can get lost.

Finally, we'd love to show appreciation for your contribution - if you'd
like us to shout you out on Twitter, please also include your handle!
-->

<!-- Remove if not applicable -->

Fixes #5638. Retitles "Amazon Bedrock" page to "Bedrock" so that the
Integrations section of the left nav is properly sorted in alphabetical
order.

#### Who can review?

Tag maintainers/contributors who might be interested:

<!-- For a quicker response, figure out the right person to tag with @

  @hwchase17 - project lead

  Tracing / Callbacks
  - @agola11

  Async
  - @agola11

  DataLoaders
  - @eyurtsev

  Models
  - @hwchase17
  - @agola11

  Agents / Tools / Toolkits
  - @vowelparrot

  VectorStores / Retrievers / Memory
  - @dev2049

 -->
2023-06-04 14:39:25 -07:00
Will Smith
6e25e65085 SQL agent : Improved prompt engineering prevents agent guessing database column names. (#5671)
@vowelparrot:

Minor change to the SQL agent:

Tells agent to introspect the schema of the most relevant tables, I
found this to dramatically decrease the chance that the agent wastes
times guessing column names.
2023-06-04 14:39:00 -07:00
Nuhman Pk
8f98592ac9 Added Dependencies Status, Open issues and releases badges in Readme.md (#5681)
[![Dependency
Status](https://img.shields.io/librariesio/github/hwchase17/langchain)](https://libraries.io/github/hwchase17/langchain)
[![Open
Issues](https://img.shields.io/github/issues-raw/hwchase17/langchain)](https://github.com/hwchase17/langchain/issues)
[![Release
Notes](https://img.shields.io/github/release/hwchase17/langchain)](https://github.com/hwchase17/langchain/releases)
2023-06-04 14:30:52 -07:00
Harrison Chase
b9040669a0 Harrison/pipeline prompt (#5540)
idea is to make prompts more composable
2023-06-04 14:29:37 -07:00
George Roberts
647210a4b9 Add args_schema to google_places tool (#5680)
Tiny change to actually add the args_schema to the tool.

@vowelparrot
2023-06-04 14:28:46 -07:00
Ralph Schlosser
8fea0529c1 This fixes issue #5651 - GPT4All wrapper loading issue (#5657)
Fixes #5651 

Small typo in wrapper code. Note the `model_type` parameter is currently
unused by GPT4All.

https://github.com/hwchase17/langchain/issues/5651

#### Who can review?
2023-06-04 07:21:16 -07:00
Jiayao Yu
6a3ceaa377 Support similarity_score_threshold retrieval with Chroma (#5655)
Fixes https://github.com/hwchase17/langchain/issues/5067

Verified the following code now works correctly:
```
db = Chroma(persist_directory=index_directory(index_name), embedding_function=embeddings)
retriever = db.as_retriever(search_type="similarity_score_threshold", search_kwargs={"score_threshold": 0.4})
docs = retriever.get_relevant_documents(query)
```
2023-06-03 16:57:00 -07:00
Hao Chen
3e45b83065 Improve Error Messaging for APOC Procedure Failure in Neo4jGraph (#5547)
## Improve Error Messaging for APOC Procedure Failure in Neo4jGraph

This commit revises the error message provided when the
'apoc.meta.data()' procedure fails. Previously, the message simply
instructed the user to install the APOC plugin in Neo4j. The new error
message is more specific.

Also removed an unnecessary newline in the Cypher statement variable:
`node_properties_query`.

Fixes #5545 

## Who can review?
  - @vowelparrot
  - @dev2049
2023-06-03 16:56:39 -07:00
Ricardo Reis
33ea606f45 Update youtube.py - Fix metadata validation error in YoutubeLoader (#5479)
This commit addresses a ValueError occurring when the YoutubeLoader
class tries to add datetime metadata from a YouTube video's publish
date. The error was happening because the ChromaDB metadata validation
only accepts str, int, or float data types.

In the `_get_video_info` method of the `YoutubeLoader` class, the
publish date retrieved from the YouTube video was of datetime type. This
commit fixes the issue by converting the datetime object to a string
before adding it to the metadata dictionary.

Additionally, this commit introduces error handling in the
`_get_video_info` method to ensure that all metadata fields have valid
values. If a metadata field is found to be None, a default value is
assigned. This prevents potential errors during metadata validation when
metadata fields are None.

The file modified in this commit is youtube.py.

# Your PR Title (What it does)

<!--
Thank you for contributing to LangChain! Your PR will appear in our
release under the title you set. Please make sure it highlights your
valuable contribution.

Replace this with a description of the change, the issue it fixes (if
applicable), and relevant context. List any dependencies required for
this change.

After you're done, someone will review your PR. They may suggest
improvements. If no one reviews your PR within a few days, feel free to
@-mention the same people again, as notifications can get lost.

Finally, we'd love to show appreciation for your contribution - if you'd
like us to shout you out on Twitter, please also include your handle!
-->

<!-- Remove if not applicable -->

Fixes # (issue)

## Before submitting

<!-- If you're adding a new integration, please include:

1. a test for the integration - favor unit tests that does not rely on
network access.
2. an example notebook showing its use


See contribution guidelines for more information on how to write tests,
lint
etc:


https://github.com/hwchase17/langchain/blob/master/.github/CONTRIBUTING.md
-->

## Who can review?

Community members can review the PR once tests pass. Tag
maintainers/contributors who might be interested:

<!-- For a quicker response, figure out the right person to tag with @

  @hwchase17 - project lead

  Tracing / Callbacks
  - @agola11

  Async
  - @agola11

  DataLoaders
  - @eyurtsev

  Models
  - @hwchase17
  - @agola11

  Agents / Tools / Toolkits
  - @vowelparrot

  VectorStores / Retrievers / Memory
  - @dev2049

 -->

---------

Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
2023-06-03 16:56:17 -07:00
Shuqian
5af2c51e78 refactor: BaseStringMessagePromptTemplate from_template method (#5332)
# refactor BaseStringMessagePromptTemplate from_template method 

Refactor the `from_template` method of the
`BaseStringMessagePromptTemplate` class to allow passing keyword
arguments to the `from_template` method of `PromptTemplate`.
Enable the usage of arguments like `template_format`.
In my scenario, I intend to utilize Jinja2 for formatting the human
message prompt in the chat template.

## Who can review?

Community members can review the PR once tests pass. Tag
maintainers/contributors who might be interested:

<!-- For a quicker response, figure out the right person to tag with @

  @hwchase17 - project lead

  Models
  - @hwchase17
  - @agola11
  - @jonasalexander 

 -->

---------

Co-authored-by: Dev 2049 <dev.dev2049@gmail.com>
2023-06-03 16:55:58 -07:00
mbchang
d3bdb8ea6d FileCallbackHandler (#5589)
# like
[StdoutCallbackHandler](https://github.com/hwchase17/langchain/blob/master/langchain/callbacks/stdout.py),
but writes to a file

When running experiments I have found myself wanting to log the outputs
of my chains in a more lightweight way than using WandB tracing. This PR
contributes a callback handler that writes to file what
`StdoutCallbackHandler` would print.

<!--
Thank you for contributing to LangChain! Your PR will appear in our
release under the title you set. Please make sure it highlights your
valuable contribution.

Replace this with a description of the change, the issue it fixes (if
applicable), and relevant context. List any dependencies required for
this change.

After you're done, someone will review your PR. They may suggest
improvements. If no one reviews your PR within a few days, feel free to
@-mention the same people again, as notifications can get lost.

Finally, we'd love to show appreciation for your contribution - if you'd
like us to shout you out on Twitter, please also include your handle!
-->

## Example Notebook

<!-- If you're adding a new integration, please include:

1. a test for the integration - favor unit tests that does not rely on
network access.
2. an example notebook showing its use



See contribution guidelines for more information on how to write tests,
lint
etc:


https://github.com/hwchase17/langchain/blob/master/.github/CONTRIBUTING.md
-->

See the included `filecallbackhandler.ipynb` notebook for usage. Would
it be better to include this notebook under `modules/callbacks` or under
`integrations/`?

![image](https://github.com/hwchase17/langchain/assets/6439365/c624de0e-343f-4eab-a55b-8808a887489f)


## Who can review?

Community members can review the PR once tests pass. Tag
maintainers/contributors who might be interested:

@agola11

<!-- For a quicker response, figure out the right person to tag with @

  @hwchase17 - project lead

  Tracing / Callbacks
  - @agola11

  Async
  - @agola11

  DataLoaders
  - @eyurtsev

  Models
  - @hwchase17
  - @agola11

  Agents / Tools / Toolkits
  - @vowelparrot

  VectorStores / Retrievers / Memory
  - @dev2049

 -->
2023-06-03 16:48:48 -07:00
rajib
1c51d3db0f Created fix for 5475 (#5659)
Created fix for 5475
Currently in PGvector, we do not have any function that returns the
instance of an existing store. The from_documents always adds embeddings
and then returns the store. This fix is to add a function that will
return the instance of an existing store

Also changed the jupyter example for PGVector to show the example of
using the function

<!-- Remove if not applicable -->

Fixes # 5475

#### Before submitting

<!-- If you're adding a new integration, please include:

1. a test for the integration - favor unit tests that does not rely on
network access.
2. an example notebook showing its use


See contribution guidelines for more information on how to write tests,
lint
etc:


https://github.com/hwchase17/langchain/blob/master/.github/CONTRIBUTING.md
-->

#### Who can review?
@dev2049
@hwchase17 

Tag maintainers/contributors who might be interested:

<!-- For a quicker response, figure out the right person to tag with @

  @hwchase17 - project lead

  Tracing / Callbacks
  - @agola11

  Async
  - @agola11

  DataLoaders
  - @eyurtsev

  Models
  - @hwchase17
  - @agola11

  Agents / Tools / Toolkits
  - @vowelparrot

  VectorStores / Retrievers / Memory
  - @dev2049

 -->

---------

Co-authored-by: rajib76 <rajib76@yahoo.com>
Co-authored-by: Dev 2049 <dev.dev2049@gmail.com>
2023-06-03 16:47:52 -07:00
Michael Landis
475007d63a fix: correct momento chat history notebook typo and title (#5646)
This PR corrects a minor typo in the Momento chat message history
notebook and also expands the title from "Momento" to "Momento Chat
History", inline with other chat history storage providers.


#### Before submitting

<!-- If you're adding a new integration, please include:

1. a test for the integration - favor unit tests that does not rely on
network access.
2. an example notebook showing its use


See contribution guidelines for more information on how to write tests,
lint
etc:


https://github.com/hwchase17/langchain/blob/master/.github/CONTRIBUTING.md
-->

#### Who can review?

cc @dev2049 who reviewed the original integration
2023-06-03 16:39:27 -07:00
Paul-Emile Brotons
92f218207b removing client+namespace in favor of collection (#5610)
removing client+namespace in favor of collection for an easier
instantiation and to be similar to the typescript library

@dev2049
2023-06-03 16:27:31 -07:00
Harrison Chase
ad09367a92 Harrison/pubmed integration (#5664)
Co-authored-by: younis basher <71520361+younis-ba@users.noreply.github.com>
Co-authored-by: Younis Bashir <younis@omicmd.com>
2023-06-03 16:25:28 -07:00
Harrison Chase
9921f8cc3a Harrison/update azure nb (#5665)
Co-authored-by: NEWTON MALLICK <38786893+N-E-W-T-O-N@users.noreply.github.com>
2023-06-03 16:25:08 -07:00
C.J. Jameson
4e71a1702b nit: pgvector python example notebook, fix variable reference (#5595)
# Your PR Title (What it does)

Fixes the pgvector python example notebook : one of the variables was
not referencing anything

## Before submitting

## Who can review?

Community members can review the PR once tests pass. Tag
maintainers/contributors who might be interested:

VectorStores / Retrievers / Memory
  - @dev2049
2023-06-03 15:29:34 -07:00
Leonid Ganeline
b201cfaa0f docs ecosystem/integrations update 4 (#5590)
# docs `ecosystem/integrations` update 4

Added missed integrations. Fixed inconsistencies. 

## Who can review?

@hwchase17 
@dev2049
2023-06-03 15:29:03 -07:00
Davis Chase
ae3611730a handle single arg to and/or (#5637)
@ryderwishart @eyurtsev thoughts on handling this in the parser itself?
related to #5570
2023-06-03 15:18:46 -07:00
khallbobo
934319fc28 Add parameters to send_message() call for vertexai chat models (PaLM2) (#5566)
# Ensure parameters are used by vertexai chat models (PaLM2)

The current version of the google aiplatform contains a bug where
parameters for a chat model are not used as intended.

See https://github.com/googleapis/python-aiplatform/issues/2263

Params can be passed both to start_chat() and send_message(); however,
the parameters passed to start_chat() will not be used if send_message()
is called without the overrides. This is due to the defaults in
send_message() being global values rather than None (there is code in
send_message() which would use the params from start_chat() if the param
passed to send_message() evaluates to False, but that won't happen as
the defaults are global values).

Fixes # 5531

@hwchase17
@agola11
2023-06-03 15:17:38 -07:00
UmerHA
44ad9628c9 QuickFix for FinalStreamingStdOutCallbackHandler: Ignore new lines & white spaces (#5497)
# Make FinalStreamingStdOutCallbackHandler more robust by ignoring new
lines & white spaces

`FinalStreamingStdOutCallbackHandler` doesn't work out of the box with
`ChatOpenAI`, as it tokenized slightly differently than `OpenAI`. The
response of `OpenAI` contains the tokens `["\nFinal", " Answer", ":"]`
while `ChatOpenAI` contains `["Final", " Answer", ":"]`.

This PR make `FinalStreamingStdOutCallbackHandler` more robust by
ignoring new lines & white spaces when determining if the answer prefix
has been reached.

Fixes #5433

## Who can review?
Community members can review the PR once tests pass. Tag
maintainers/contributors who might be interested:
Tracing / Callbacks
- @agola11

Twitter: [@UmerHAdil](https://twitter.com/@UmerHAdil) | Discord:
RicChilligerDude#7589
2023-06-03 15:05:58 -07:00
Nathan Azrak
1f4abb265a Adds the option to pass the original prompt into the AgentExecutor for PlanAndExecute agents (#5401)
# Adds the option to pass the original prompt into the AgentExecutor for
PlanAndExecute agents

This PR allows the user to optionally specify that they wish for the
original prompt/objective to be passed into the Executor agent used by
the PlanAndExecute agent. This solves a potential problem where the plan
is formed referring to some context contained in the original prompt,
but which is not included in the current prompt.

Currently, the prompt format given to the Executor is:
```
System: Respond to the human as helpfully and accurately as possible. You have access to the following tools:

<Tool and Action Description>

<Output Format Description>

Begin! Reminder to ALWAYS respond with a valid json blob of a single action. Use tools if necessary. Respond directly if appropriate. Format is Action:```$JSON_BLOB```then Observation:.
Thought:
Human: <Previous steps>

<Current step>
```

This PR changes the final part after `Human:` to optionally insert the
objective:
```
Human: <objective>

<Previous steps>

<Current step>
```

I have given a specific example in #5400 where the context of a database
path is lost, since the plan refers to the "given path".

The PR has been linted and formatted. So that existing behaviour is not
changed, I have defaulted the argument to `False` and added it as the
last argument in the signature, so it does not cause issues for any
users passing args positionally as opposed to using keywords.

Happy to take any feedback or make required changes! 

Fixes #5400

## Who can review?

Community members can review the PR once tests pass. Tag
maintainers/contributors who might be interested:
@vowelparrot

---------

Co-authored-by: Nathan Azrak <nathan.azrak@gmail.com>
2023-06-03 14:59:09 -07:00
Felipe Ferreira
ae2cf1f598 Implements support for Personal Access Token Authentication in the ConfluenceLoader (#5385)
# Implements support for Personal Access Token Authentication in the
ConfluenceLoader

Fixes #5191

Implements a new optional parameter for the ConfluenceLoader: `token`.
This allows the use of personal access authentication when using the
on-prem server version of Confluence.

## Who can review?

Community members can review the PR once tests pass. Tag
maintainers/contributors who might be interested:
@eyurtsev @Jflick58 

Twitter Handle: felipe_yyc

---------

Co-authored-by: Felipe <feferreira@ea.com>
Co-authored-by: Dev 2049 <dev.dev2049@gmail.com>
2023-06-03 14:57:49 -07:00
Gardner Bickford
b81f98b8a6 Update confluence.py to return spaces between elements (#5383)
# Update confluence.py to return spaces between elements like headers
and links.

Please see
https://stackoverflow.com/questions/48913975/how-to-return-nicely-formatted-text-in-beautifulsoup4-when-html-text-is-across-m

Given:

```html
<address>
        183 Main St<br>East Copper<br>Massachusetts<br>U S A<br>
        MA 01516-113
    </address>
```

The document loader currently returns:

```
'183 Main StEast CopperMassachusettsU S A        MA 01516-113'
```

After this change, the document loader will return:

```
183 Main St East Copper Massachusetts U S A MA 01516-113
```


@eyurtsev would you prefer this to be an option that can be passed in?
2023-06-03 14:57:25 -07:00
Zeeland
b72401b47b pref: reduce DB query error rate (#5339)
# Reduce DB query error rate

If you use sql agent of `SQLDatabaseToolkit` to query data, it is prone
to errors in query fields and often uses fields that do not exist in
database tables for queries. However, the existing prompt does not
effectively make the agent aware that there are problems with the fields
they query. At this time, we urgently need to improve the prompt so that
the agent realizes that they have queried non-existent fields and allows
them to use the `schema_sql_db`, that is,` ListSQLDatabaseTool` first
queries the corresponding fields in the table in the database, and then
uses `QuerySQLDatabaseTool` for querying.

There is a demo of my project to show this problem.

**Original Agent**

```python
def create_mysql_kit():
    db = SQLDatabase.from_uri("mysql+pymysql://xxxxxxx")
    llm = OpenAI(temperature=0)

    toolkit = SQLDatabaseToolkit(db=db, llm=llm)
    agent_executor = create_sql_agent(
        llm=OpenAI(temperature=0),
        toolkit=toolkit,
        verbose=True
    )
    agent_executor.run("Who are the users of sysuser in this system? Tell me the username of all users")


if __name__ == '__main__':
    create_mysql_kit()

```

**original output**

```
> Entering new AgentExecutor chain...
Action: list_tables_sql_db
Action Input: ""
Observation: app_sysrole_menus, app_bimfacemodel, app_project_users, app_measuringpointdata, auth_user, auth_user_groups, django_apscheduler_djangojobexecution, app_project, app_elementpoint, django_apscheduler_djangojob, django_content_type, app_sysrole, django_admin_log, app_bimfaceaccount, app_measuringpoint_warning_thresholds, app_measuringpoint, app_company, auth_group_permissions, app_sysuser, app_sysuser_companies, app_sysmenu, app_datawarningthreshold, auth_group, auth_permission, app_datawarningrecord, auth_user_user_permissions, app_bimfaceaccount_bimface_models, django_migrations, app_measuringitem, django_session
Thought: I should query the app_sysuser table to get the users in the system.
Action: query_sql_db
Action Input: SELECT username FROM app_sysuser LIMIT 10;
Observation: Error: (pymysql.err.OperationalError) (1054, "Unknown column 'username' in 'field list'")
[SQL: SELECT username FROM app_sysuser LIMIT 10;]
(Background on this error at: https://sqlalche.me/e/20/e3q8)
Thought: I should double check my query before executing it.
Action: query_checker_sql_db
Action Input: SELECT username FROM app_sysuser LIMIT 10;
Observation: 

SELECT username FROM app_sysuser LIMIT 10;
Thought: The query looks correct, so I should execute it.
Action: query_sql_db
Action Input: SELECT username FROM app_sysuser LIMIT 10;
Observation: Error: (pymysql.err.OperationalError) (1054, "Unknown column 'username' in 'field list'")
[SQL: SELECT username FROM app_sysuser LIMIT 10;]
(Background on this error at: https://sqlalche.me/e/20/e3q8)
Thought: I should double check my query before executing it.
Action: query_checker_sql_db
Action Input: SELECT name FROM app_sysuser LIMIT 10;
Observation: 

SELECT name FROM app_sysuser LIMIT 10;
Thought: The query looks correct, so I should execute it.
Action: query_sql_db
Action Input: SELECT name FROM app_sysuser LIMIT 10;
Observation: Error: (pymysql.err.OperationalError) (1054, "Unknown column 'name' in 'field list'")
[SQL: SELECT name FROM app_sysuser LIMIT 10;]
(Background on this error at: https://sqlalche.me/e/20/e3q8)
Thought: I should double check my query before executing it.
Action: query_checker_sql_db
Action Input: SELECT id, username FROM app_sysuser LIMIT 10;
Observation: 

SELECT id, username FROM app_sysuser LIMIT 10;
Thought: The query looks correct, so I should execute it.
Action: query_sql_db
Action Input: SELECT id, username FROM app_sysuser LIMIT 10;
Observation: Error: (pymysql.err.OperationalError) (1054, "Unknown column 'id' in 'field list'")
[SQL: SELECT id, username FROM app_sysuser LIMIT 10;]
(Background on this error at: https://sqlalche.me/e/20/e3q8)
Thought: I should double check my query before executing it.
Action: query_checker_sql_db
Action Input: SELECT user_id, username FROM app_sysuser LIMIT 10;
Observation: 

SELECT user_id, username FROM app_sysuser WHERE user_id IS NOT NULL AND user_id BETWEEN 0 AND 10 UNION ALL SELECT user_id, username FROM app_sysuser WHERE username IS NOT NULL ORDER BY user_id ASC LIMIT 10;
Thought: The query looks correct, so I should execute it.
Action: query_sql_db
Action Input: SELECT user_id, username FROM app_sysuser WHERE user_id IS NOT NULL AND user_id BETWEEN 0 AND 10 UNION ALL SELECT user_id, username FROM app_sysuser WHERE username IS NOT NULL ORDER BY user_id ASC LIMIT 10;
Observation: Error: (pymysql.err.OperationalError) (1054, "Unknown column 'username' in 'field list'")
[SQL: SELECT user_id, username FROM app_sysuser WHERE user_id IS NOT NULL AND user_id BETWEEN 0 AND 10 UNION ALL SELECT user_id, username FROM app_sysuser WHERE username IS NOT NULL ORDER BY user_id ASC LIMIT 10;]
(Background on this error at: https://sqlalche.me/e/20/e3q8)
Thought: I should double check my query before executing it.
Action: query_checker_sql_db
Action Input: SELECT user_id, user_name FROM app_sysuser LIMIT 10;
Observation: 

SELECT user_id, user_name FROM app_sysuser WHERE user_id NOT IN (NULL) AND user_name BETWEEN 'A' AND 'Z' UNION ALL SELECT user_id, user_name FROM app_sysuser WHERE user_id IS NOT NULL AND user_name LIKE '%a%' LIMIT 10;
Thought: The query looks correct, so I should execute it.
Action: query_sql_db
Action Input: SELECT user_id, user_name FROM app_sysuser WHERE user_id NOT IN (NULL) AND user_name BETWEEN 'A' AND 'Z' UNION ALL SELECT user_id, user_name FROM app_sysuser WHERE user_id IS NOT NULL AND user_name LIKE '%a%' LIMIT 10;
Observation: []
Thought: I now know the final answer.
Final Answer: There are no users in the sysuser table.

> Finished chain.

Process finished with exit code 0
```

As you can see, agent always use incorrect sql to query the data.
Actually, the username field name of my `app_sysuser` table is
`user_name`, not the `username` believed by the agent.

**Optimized agent output**

```
> Entering new AgentExecutor chain...
Action: list_tables_sql_db
Action Input: ""
Observation: app_project_users, django_content_type, app_measuringpoint_warning_thresholds, app_bimfaceaccount_bimface_models, django_session, app_sysuser, django_migrations, app_bimfacemodel, app_elementpoint, app_measuringpoint, django_apscheduler_djangojobexecution, auth_permission, app_sysuser_companies, app_measuringpointdata, app_measuringitem, app_sysrole_menus, django_admin_log, auth_group, django_apscheduler_djangojob, app_sysmenu, app_project, app_bimfaceaccount, app_datawarningthreshold, app_datawarningrecord, auth_user_groups, auth_user_user_permissions, auth_group_permissions, auth_user, app_company, app_sysrole
Thought: I should query the app_sysuser table to get the usernames of all the users.
Action: query_sql_db
Action Input: SELECT username FROM app_sysuser
Observation: Error: (pymysql.err.OperationalError) (1054, "Unknown column 'username' in 'field list'")
[SQL: SELECT username FROM app_sysuser]
(Background on this error at: https://sqlalche.me/e/20/e3q8)
Thought: I should use the schema_sql_db tool to get the correct table fields.
Action: schema_sql_db
Action Input: app_sysuser
Observation: 
CREATE TABLE app_sysuser (
	user_id INTEGER NOT NULL AUTO_INCREMENT, 
	user_account VARCHAR(50) COLLATE utf8mb4_general_ci NOT NULL, 
	user_password VARCHAR(255) COLLATE utf8mb4_general_ci NOT NULL, 
	user_name VARCHAR(50) COLLATE utf8mb4_general_ci NOT NULL, 
	user_avatar VARCHAR(255) COLLATE utf8mb4_general_ci, 
	user_email VARCHAR(254) COLLATE utf8mb4_general_ci, 
	user_phone VARCHAR(20) COLLATE utf8mb4_general_ci, 
	user_create_time DATETIME(6) NOT NULL, 
	user_update_time DATETIME(6) NOT NULL, 
	user_status SMALLINT, 
	role_id_id INTEGER, 
	PRIMARY KEY (user_id), 
	CONSTRAINT app_sysuser_role_id_id_51b50f39_fk_app_sysrole_role_id FOREIGN KEY(role_id_id) REFERENCES app_sysrole (role_id)
)COLLATE utf8mb4_general_ci DEFAULT CHARSET=utf8mb4 ENGINE=InnoDB

/*
3 rows from app_sysuser table:
user_id	user_account	user_password	user_name	user_avatar	user_email	user_phone	user_create_time	user_update_time	user_status	role_id_id
xxxxxxxxxxxxxx
*/
Thought: I should query the app_sysuser table to get the usernames of all the users.
Action: query_sql_db
Action Input: SELECT user_account FROM app_sysuser LIMIT 10
Observation: [('baiyun',), ('eatrice',), ('lisi',), ('pingxiang',), ('wangwu',), ('zeeland',), ('zsj',), ('zzw',)]
Thought: I now know the final answer
Final Answer: The usernames of the users in the sysuser table are baiyun, eatrice, lisi, pingxiang, wangwu, zeeland, zsj, and zzw.

> Finished chain.

Process finished with exit code 0

```

I have tested about 10 related prompts and they all work properly, with
a much lower error rate compared to before


## Who can review?

@vowelparrot

---------

Co-authored-by: Dev 2049 <dev.dev2049@gmail.com>
2023-06-03 14:55:56 -07:00
mbchang
ce6dbe41a9 minor refactor GenerativeAgentMemory (#5315)
# minor refactor of GenerativeAgentMemory

<!--
Thank you for contributing to LangChain! Your PR will appear in our
release under the title you set. Please make sure it highlights your
valuable contribution.

Replace this with a description of the change, the issue it fixes (if
applicable), and relevant context. List any dependencies required for
this change.

After you're done, someone will review your PR. They may suggest
improvements. If no one reviews your PR within a few days, feel free to
@-mention the same people again, as notifications can get lost.
-->

<!-- Remove if not applicable -->

- refactor `format_memories_detail` to be more reusable
- modified prompts for getting topics for reflection and for generating
insights
- update `characters.ipynb` to reflect changes

## Before submitting

<!-- If you're adding a new integration, please include:

1. a test for the integration - favor unit tests that does not rely on
network access.
2. an example notebook showing its use


See contribution guidelines for more information on how to write tests,
lint
etc:


https://github.com/hwchase17/langchain/blob/master/.github/CONTRIBUTING.md
-->

## Who can review?

Community members can review the PR once tests pass. Tag
maintainers/contributors who might be interested:

<!-- For a quicker response, figure out the right person to tag with @

  @hwchase17 - project lead

  Tracing / Callbacks
  - @agola11

  Async
  - @agola11

  DataLoaders
  - @eyurtsev

  Models
  - @hwchase17
  - @agola11

  Agents / Tools / Toolkits
  - @vowelparrot

  VectorStores / Retrievers / Memory
  - @dev2049
        
 -->
@vowelparrot
@hwchase17
@dev2049
2023-06-03 14:53:14 -07:00
Leonid Ganeline
95c6ed0568 docs: modules pages simplified (#5116)
# docs: modules pages simplified

Fixied #5627  issue

Merged several repetitive sections in the `modules` pages. Some texts,
that were hard to understand, were also simplified.


## Who can review?

@hwchase17
@dev2049
2023-06-03 14:44:32 -07:00
Chandan Routray
bc875a9df1 Fixed multi input prompt for MapReduceChain (#4979)
# Fixed multi input prompt for MapReduceChain

Added `kwargs` support for inner chains of `MapReduceChain` via
`from_params` method
Currently the `from_method` method of intialising `MapReduceChain` chain
doesn't work if prompt has multiple inputs. It happens because it uses
`StuffDocumentsChain` and `MapReduceDocumentsChain` underneath, both of
them require specifying `document_variable_name` if `prompt` of their
`llm_chain` has more than one `input`.

With this PR, I have added support for passing their respective `kwargs`
via the `from_params` method.

## Fixes https://github.com/hwchase17/langchain/issues/4752

## Who can review? 
@dev2049 @hwchase17 @agola11

---------

Co-authored-by: imeckr <chandanroutray2012@gmail.com>
2023-06-03 14:41:03 -07:00
Matt Robinson
a97e4252e3 feat: add UnstructuredExcelLoader for .xlsx and .xls files (#5617)
# Unstructured Excel Loader

Adds an `UnstructuredExcelLoader` class for `.xlsx` and `.xls` files.
Works with `unstructured>=0.6.7`. A plain text representation of the
Excel file will be available under the `page_content` attribute in the
doc. If you use the loader in `"elements"` mode, an HTML representation
of the Excel file will be available under the `text_as_html` metadata
key. Each sheet in the Excel document is its own document.

### Testing

```python
from langchain.document_loaders import UnstructuredExcelLoader

loader = UnstructuredExcelLoader(
    "example_data/stanley-cups.xlsx",
    mode="elements"
)
docs = loader.load()
```

## Who can review?

@hwchase17
@eyurtsev
2023-06-03 12:44:12 -07:00
Leonid Ganeline
9a7488a5ce fix import issue (#5636)
# fix for the import issue

Added document loader classes from [`figma`, `iugu`, `onedrive_file`] to
`document_loaders/__inti__.py` imports
Also sorted `__all__`

Fixed #5623 issue
2023-06-02 14:58:41 -07:00
Zander Chase
20ec1173f4 Update Tracer Auth / Reduce Num Calls (#5517)
Update the session creation and calls

---------

Co-authored-by: Ankush Gola <ankush.gola@gmail.com>
2023-06-02 12:13:56 -07:00
Sean Morgan
949729ff5c Fix bedrock llm boto3 client instantiation (#5629)
Same issue as https://github.com/hwchase17/langchain/pull/5574
2023-06-02 12:04:49 -07:00
Caleb Ellington
c5a7a85a4e fix chroma update_document to embed entire documents, fixes a characer-wise embedding bug (#5584)
# Chroma update_document full document embeddings bugfix

Chroma update_document takes a single document, but treats the
page_content sting of that document as a list when getting the new
document embedding.

This is a two-fold problem, where the resulting embedding for the
updated document is incorrect (it's only an embedding of the first
character in the new page_content) and it calls the embedding function
for every character in the new page_content string, using many tokens in
the process.

Fixes #5582


Co-authored-by: Caleb Ellington <calebellington@Calebs-MBP.hsd1.ca.comcast.net>
2023-06-02 11:12:48 -07:00
Davis Chase
3c6fa9126a bump 189 (#5620) 2023-06-02 09:09:22 -07:00
Davis Chase
d784401215 Dev2049/add argilla callback (#5621)
Co-authored-by: Alvaro Bartolome <alvarobartt@gmail.com>
Co-authored-by: Daniel Vila Suero <daniel@argilla.io>
Co-authored-by: Tom Aarsen <37621491+tomaarsen@users.noreply.github.com>
Co-authored-by: Tom Aarsen <Cubiegamedev@gmail.com>
2023-06-02 09:05:06 -07:00
Kacper Łukawski
71a7c16ee0 Fix: Qdrant ids (#5515)
# Fix Qdrant ids creation

There has been a bug in how the ids were created in the Qdrant vector
store. They were previously calculated based on the texts. However,
there are some scenarios in which two documents may have the same piece
of text but different metadata, and that's a valid case. Deduplication
should be done outside of insertion.

It has been fixed and covered with the integration tests.
---------

Co-authored-by: Dev 2049 <dev.dev2049@gmail.com>
2023-06-02 08:57:34 -07:00
Jeff Vestal
d1f65d8dc1 Es knn index search 5346 (#5569)
# Create elastic_vector_search.ElasticKnnSearch class

This extends `langchain/vectorstores/elastic_vector_search.py` by adding
a new class `ElasticKnnSearch`

Features:
- Allow creating an index with the `dense_vector` mapping compataible
with kNN search
- Store embeddings in index for use with kNN search (correct mapping
creates HNSW data structure)
- Perform approximate kNN search
- Perform hybrid BM25 (`query{}`) + kNN (`knn{}`) search
- perform knn search by either providing a `query_vector` or passing a
hosted `model_id` to use query_vector_builder to automatically generate
a query_vector at search time

Connection options
- Using `cloud_id` from Elastic Cloud
- Passing elasticsearch client object

search options
- query
- k
- query_vector
- model_id
- size
- source
- knn_boost (hybrid search)
- query_boost (hybrid search)
- fields


This also adds examples to
`docs/modules/indexes/vectorstores/examples/elasticsearch.ipynb`


Fixes # [5346](https://github.com/hwchase17/langchain/issues/5346)

cc: @dev2049

 -->

---------

Co-authored-by: Dev 2049 <dev.dev2049@gmail.com>
2023-06-02 08:40:35 -07:00
Davis Chase
8b3df18bcc human approval callback (#5581)
![Screenshot 2023-06-01 at 2 39 40
PM](https://github.com/hwchase17/langchain/assets/130488702/769f1480-7e51-46d9-bcde-698d0b091803)
2023-06-02 06:59:33 -07:00
Zander Chase
6655f43282 Rm Template Title (#5616)
Remove the redundant title from the PR template

#### Before submitting
2023-06-02 06:54:55 -07:00
Bharat Ramanathan
28d6277396 docs(integration): update colab and external links in WandbTracing docs (#5602)
# Update Wandb Tracking documentation

This PR updates the Wandb Tracking documentation for formatting, updated
broken links and colab notebook links

---------

Co-authored-by: Bharat Ramanathan <ramanathan.parameshwaran@gohuddl.com>
2023-06-02 02:58:42 -07:00
Waldecir Santos
db45970a66 Fix SQLAlchemy truncating text when it is too big (#5206)
# Fixes SQLAlchemy truncating the result if you have a big/text column
with many chars.

SQLAlchemy truncates columns if you try to convert a Row or Sequence to
a string directly

For comparison:

- Before:
```[('Harrison', 'That is my Bio That is my Bio That is my Bio That is my Bio That is my Bio That is my Bio That is my Bio That is my Bio That is my Bio That is my Bio ... (2 characters truncated) ... hat is my Bio That is my Bio That is my Bio That is my Bio That is my Bio That is my Bio That is my Bio That is my Bio That is my Bio That is my Bio ')]```

- After:
```[('Harrison', 'That is my Bio That is my Bio That is my Bio That is
my Bio That is my Bio That is my Bio That is my Bio That is my Bio That
is my Bio That is my Bio That is my Bio That is my Bio That is my Bio
That is my Bio That is my Bio That is my Bio That is my Bio That is my
Bio That is my Bio That is my Bio ')]```



## Who can review?

Community members can review the PR once tests pass. Tag
maintainers/contributors who might be interested:

I'm not sure who to tag for chains, maybe @vowelparrot ?
2023-06-01 21:33:31 -04:00
Davis Chase
4c572ffe95 nit (#5578) 2023-06-01 14:21:15 -07:00
sseide
001b147450 Documentation fixes (linting and broken links) (#5563)
# Lint sphinx documentation and fix broken links

This PR lints multiple warnings shown in generation of the project
documentation (using "make docs_linkcheck" and "make docs_build").
Additionally documentation internal links to (now?) non-existent files
are modified to point to existing documents as it seemed the new correct
target.

The documentation is not updated content wise.
There are no source code changes.

Fixes # (issue)

- broken documentation links to other files within the project
- sphinx formatting (linting)

## Before submitting

No source code changes, so no new tests added.

---------

Co-authored-by: Dev 2049 <dev.dev2049@gmail.com>
2023-06-01 13:06:17 -07:00
Sean Morgan
8441cff1d7 Fix bedrock auth validation (#5574)
https://github.com/hwchase17/langchain/pull/5523 has a small bug if
client was not passed in constructor
2023-06-01 12:35:06 -07:00
Andrew Lei
6258f72a00 Add missing comma in conv chat agent prompt json (#5573)
# Add missing comma in conversational chat agent prompt json

Inspired by: https://github.com/hwchase17/langchainjs/pull/1498
2023-06-01 12:12:44 -07:00
Ikko Eltociear Ashimine
14a611775c Fix typo in docugami.ipynb (#5571)
# Fix typo in docugami.ipynb

Fixed typo.
infromation -> information
2023-06-01 11:45:56 -07:00
Blithe
80b3fdf2f7 make the elasticsearch api support version which below 8.x (#5495)
the api which create index or search in the elasticsearch below 8.x is
different with 8.x. When use the es which below 8.x , it will throw
error. I fix the problem


Co-authored-by: gaofeng27692 <gaofeng27692@hundsun.com>
2023-06-01 10:58:20 -07:00
Davis Chase
6632188606 bump 188 (#5568) 2023-06-01 08:50:54 -07:00
Davis Chase
6afb463e9b Qdrant self query (#5567)
Add self query abilities to qdrant vectorstore
2023-06-01 08:40:31 -07:00
Patrick Keane
47c2ec2d0b Corrects inconsistently misspelled variable name. (#5559)
Corrects a spelling error (of the word separator) in several variable
names. Three cut/paste instances of this were corrected, amidst
instances of it also being named properly, which would likely would lead
to issues for someone in the future.

Here is one such example:

```
        seperators = self.get_separators_for_language(Language.PYTHON)
        super().__init__(separators=seperators, **kwargs)
```
becomes
```
        separators = self.get_separators_for_language(Language.PYTHON)
        super().__init__(separators=separators, **kwargs)
```

Make test results below:

```
============================== 708 passed, 52 skipped, 27 warnings in 11.70s ==============================
```
2023-06-01 10:27:58 -04:00
Harrison Chase
342b671d05 add brave search util (#5538)
Co-authored-by: Dev 2049 <dev.dev2049@gmail.com>
2023-06-01 01:11:51 -07:00
Davis Chase
983a213bdc add maxcompute (#5533)
cc @pengwork (fresh branch, no creds)
2023-06-01 00:54:42 -07:00
Bharat Ramanathan
22603d19e0 feat(integrations): Add WandbTracer (#4521)
# WandbTracer
This PR adds the `WandbTracer` and deprecates the existing
`WandbCallbackHandler`.

Added an example notebook under the docs section alongside the
`LangchainTracer`
Here's an example
[colab](https://colab.research.google.com/drive/1pY13ym8ENEZ8Fh7nA99ILk2GcdUQu0jR?usp=sharing)
with the same notebook and the
[trace](https://wandb.ai/parambharat/langchain-tracing/runs/8i45cst6)
generated from the colab run


Co-authored-by: Bharat Ramanathan <ramanathan.parameshwaran@gohuddl.com>
Co-authored-by: Dev 2049 <dev.dev2049@gmail.com>
2023-06-01 00:01:19 -07:00
Leonid Ganeline
373ad49157 docs ecosystem/integrations update 3 (#5470)
# docs: `ecosystem_integrations` update 3

Next cycle of updating the `ecosystem/integrations`
* Added an integration `template` file
* Added missed integration files
* Fixed several document_loaders/notebooks

## Who can review?

Is it possible to assign somebody to review PRs on docs? Thanks.
2023-05-31 17:54:05 -07:00
Aditi Viswanathan
bc66b3fb8d make BaseEntityStore inherit from BaseModel (#5478)
# Make BaseEntityStore inherit from BaseModel

This enables initializing InMemoryEntityStore by optionally passing in a
value for the store field.

## Who can review?

It's a small change so I think any of the reviewers can review, but
tagging @dev2049 who seems most relevant since the change relates to
Memory.
2023-05-31 17:32:19 -07:00
Sheng Han Lim
3bae595182 Add texts with embeddings to PGVector wrapper (#5500)
Similar to #1813 for faiss, this PR is to extend functionality to pass
text and its vector pair to initialize and add embeddings to the
PGVector wrapper.

Community members can review the PR once tests pass. Tag
maintainers/contributors who might be interested:
  - @dev2049
2023-05-31 17:31:52 -07:00
Tobias van der Werff
8d07ba0d51 Fix wrong class instantiation in docs MMR example (#5501)
# Fix wrong class instantiation in docs MMR example

<!--
Thank you for contributing to LangChain! Your PR will appear in our
release under the title you set. Please make sure it highlights your
valuable contribution.

Replace this with a description of the change, the issue it fixes (if
applicable), and relevant context. List any dependencies required for
this change.

After you're done, someone will review your PR. They may suggest
improvements. If no one reviews your PR within a few days, feel free to
@-mention the same people again, as notifications can get lost.

Finally, we'd love to show appreciation for your contribution - if you'd
like us to shout you out on Twitter, please also include your handle!
-->

When looking at the Maximal Marginal Relevance ExampleSelector example
at
https://python.langchain.com/en/latest/modules/prompts/example_selectors/examples/mmr.html,
I noticed that there seems to be an error. Initially, the
`MaxMarginalRelevanceExampleSelector` class is used as an
`example_selector` argument to the `FewShotPromptTemplate` class. Then,
according to the text, a comparison is made to regular similarity
search. However, the `FewShotPromptTemplate` still uses the
`MaxMarginalRelevanceExampleSelector` class, so the output is the same.

To fix it, I added an instantiation of the
`SemanticSimilarityExampleSelector` class, because this seems to be what
is intended.


## Who can review?

@hwchase17
2023-05-31 17:30:59 -07:00
Taras Tsugrii
b61f50665e [retrievers][knn] Replace loop appends with list comprehension. (#5529)
# Replace loop appends with list comprehension.

It's much faster, more idiomatic and slightly more readable.
2023-05-31 16:57:24 -07:00
Taras Tsugrii
0ad76c3380 Replace loop appends with list comprehension. (#5528)
# Replace loop appends with list comprehension.

It's significantly faster because it avoids repeated method lookup. It's
also more idiomatic and readable.
2023-05-31 16:56:13 -07:00
Timothy Ji
bd9e0f3934 Add param requests_kwargs for WebBaseLoader (#5485)
# Add param `requests_kwargs` for WebBaseLoader

Fixes # (issue)

#5483 

## Who can review?

@eyurtsev
2023-05-31 15:27:38 -07:00
Taras Tsugrii
359fb8fa3a Replace list comprehension with generator. (#5526)
# Replace list comprehension with generator.

Since these strings can be fairly long, it's best to not construct
unnecessary temporary list just to pass it to `join`. Generators produce
items one-by-one and even though they are slightly more expensive than
lists in terms of CPU they are much more memory-friendly and slightly
more readable.
2023-05-31 15:10:43 -07:00
Matt Robinson
4c8aad0d1b docs: unstructured no longer requires installing detectron2 from source (#5524)
# Update Unstructured docs to remove the `detectron2` install
instructions

Removes `detectron2` installation instructions from the Unstructured
docs because installing `detectron2` is no longer required for
`unstructured>=0.7.0`. The `detectron2` model now runs using the ONNX
runtime.

## Who can review?

@hwchase17 
@eyurtsev
2023-05-31 15:03:21 -07:00
Rithwik Ediga Lakhamsani
d765d77e9b Add minor fixes for PySpark Document Loader Docs (#5525)
# Add minor fixes for PySpark Document Loader Docs

Renamed "PySpack" to "PySpark" and executed the notebook to show
outputs.
2023-05-31 15:02:57 -07:00
Taras Tsugrii
af41cdfc8b Replace enumerate with zip. (#5527)
# Replace enumerate with zip.

It's more idiomatic and slightly more readable.
2023-05-31 15:02:23 -07:00
James O'Dwyer
226a7521ed Add Managed Motorhead (#5507)
# Add Managed Motorhead
This change enabled MotorheadMemory to utilize Metal's managed version
of Motorhead. We can easily enable this by passing in a `api_key` and
`client_id` in order to hit the managed url and access the memory api on
Metal.

Twitter: [@softboyjimbo](https://twitter.com/softboyjimbo)

## Who can review?

Community members can review the PR once tests pass. Tag
maintainers/contributors who might be interested:

 @dev2049 @hwchase17

---------

Co-authored-by: Dev 2049 <dev.dev2049@gmail.com>
2023-05-31 14:55:41 -07:00
Piyush Jain
5ffa924488 Skips creating boto client for Bedrock if passed in constructor (#5523)
# Skips creating boto client if passed in constructor
Current LLM and Embeddings class always creates a new boto client, even
if one is passed in a constructor. This blocks certain users from
passing in externally created boto clients, for example in SSO
authentication.

## Who can review?
@hwchase17 
@jasondotparse 
@rsgrewal-aws

<!-- For a quicker response, figure out the right person to tag with @

  @hwchase17 - project lead

  Tracing / Callbacks
  - @agola11

  Async
  - @agola11

  DataLoaders
  - @eyurtsev

  Models
  - @hwchase17
  - @agola11

  Agents / Tools / Toolkits
  - @vowelparrot

  VectorStores / Retrievers / Memory
  - @dev2049

 -->
2023-05-31 14:54:12 -07:00
Leonid Ganeline
6b47aaab82 added DeepLearing.AI course link (#5518)
# added DeepLearing.AI course link


## Who can review?

Community members can review the PR once tests pass. Tag
maintainers/contributors who might be interested:


 not @hwchase17 - hehe
2023-05-31 14:53:14 -07:00
Víctor Navarro Aránguiz
f39340ff6b Add allow_download as attribute for GPT4All (#5512)
# Added support for download GPT4All model if does not exist

I've include the class attribute `allow_download` to the GPT4All class.
By default, `allow_download` is set to False.

## Changes Made
- Added a new attribute `allow_download` to the GPT4All class.
- Updated the `validate_environment` method to pass the `allow_download`
parameter to the GPT4All model constructor.

## Context
This change provides more control over model downloading in the GPT4All
class. Previously, if the model file was not found in the cache
directory `~/.cache/gpt4all/`, the package returned error "Failed to
retrieve model (type=value_error)". Now, if `allow_download` is set as
True then it will use GPT4All package to download it . With the addition
of the `allow_download` attribute, users can now choose whether the
wrapper is allowed to download the model or not.

## Dependencies
There are no new dependencies introduced by this change. It only
utilizes existing functionality provided by the GPT4All package.

## Testing
Since this is a minor change to the existing behavior, the existing test
suite for the GPT4All package should cover this scenario

Co-authored-by: Vokturz <victornavarrrokp47@gmail.com>
2023-05-31 13:32:31 -07:00
Zander Chase
ea09c0846f Add Feedback Methods + Evaluation examples (#5166)
Add CRUD methods to interact with feedback endpoints + added eval
examples to the notebook
2023-05-31 11:14:27 -07:00
Davis Chase
46b7181f13 bump 187 (#5504) 2023-05-31 07:35:09 -07:00
Harrison Chase
f0ea77b230 add more vars to text splitter (#5503) 2023-05-31 07:21:20 -07:00
Piyush Jain
562fdfc8f9 Bedrock llm and embeddings (#5464)
# Bedrock LLM and Embeddings
This PR adds a new LLM and an Embeddings class for the
[Bedrock](https://aws.amazon.com/bedrock) service. The PR also includes
example notebooks for using the LLM class in a conversation chain and
embeddings usage in creating an embedding for a query and document.

**Note**: AWS is doing a private release of the Bedrock service on
05/31/2023; users need to request access and added to an allowlist in
order to start using the Bedrock models and embeddings. Please use the
[Bedrock Home Page](https://aws.amazon.com/bedrock) to request access
and to learn more about the models available in Bedrock.

<!-- For a quicker response, figure out the right person to tag with @

  @hwchase17 - project lead

  Tracing / Callbacks
  - @agola11

  Async
  - @agola11

  DataLoaders
  - @eyurtsev

  Models
  - @hwchase17
  - @agola11

  Agents / Tools / Toolkits
  - @vowelparrot

  VectorStores / Retrievers / Memory
  - @dev2049

 -->
2023-05-31 07:17:01 -07:00
Harrison Chase
5ce74b5958 code splitter docs (#5480)
Co-authored-by: Dev 2049 <dev.dev2049@gmail.com>
2023-05-31 07:11:53 -07:00
Harrison Chase
470b2822a3 Add matching engine vectorstore (#3350)
Co-authored-by: Tom Piaggio <tomaspiaggio@google.com>
Co-authored-by: scafati98 <jupyter@matchingengine.us-central1-a.c.scafati-joonix.internal>
Co-authored-by: scafati98 <scafatieugenio@gmail.com>
Co-authored-by: Dev 2049 <dev.dev2049@gmail.com>
2023-05-31 02:28:02 -07:00
Kacper Łukawski
8bcaca435a Feature: Qdrant filters supports (#5446)
# Support Qdrant filters

Qdrant has an [extensive filtering
system](https://qdrant.tech/documentation/concepts/filtering/) with rich
type support. This PR makes it possible to use the filters in Langchain
by passing an additional param to both the
`similarity_search_with_score` and `similarity_search` methods.

## Who can review?

@dev2049 @hwchase17

---------

Co-authored-by: Dev 2049 <dev.dev2049@gmail.com>
2023-05-31 02:26:16 -07:00
Harrison Chase
f72bb966f8 Harrison/html splitter (#5468)
Co-authored-by: David Revillas <26328973+r3v1@users.noreply.github.com>
2023-05-30 21:06:07 -07:00
Ankush Gola
1671c2afb2 py tracer fixes (#5377) 2023-05-30 18:47:06 -07:00
Jose Ignacio Hervás Díaz
ce8b7a2a69 SQLite-backed Entity Memory (#5129)
# SQLite-backed Entity Memory

Following the initiative of
https://github.com/hwchase17/langchain/pull/2397 I think it would be
helpful to be able to persist Entity Memory on disk by default

Co-authored-by: Dev 2049 <dev.dev2049@gmail.com>
2023-05-30 18:39:47 -07:00
Jeff Vestal
46e181aa8b Allow ElasticsearchEmbeddings to create a connection with ES Client object (#5321)
This PR adds a new method `from_es_connection` to the
`ElasticsearchEmbeddings` class allowing users to use Elasticsearch
clusters outside of Elastic Cloud.

Users can create an Elasticsearch Client object and pass that to the new
function.
The returned object is identical to the one returned by calling
`from_credentials`

```
# Create Elasticsearch connection
es_connection = Elasticsearch(
    hosts=['https://es_cluster_url:port'], 
    basic_auth=('user', 'password')
)

# Instantiate ElasticsearchEmbeddings using es_connection
embeddings = ElasticsearchEmbeddings.from_es_connection(
  model_id,
  es_connection,
)
```

I also added examples to the elasticsearch jupyter notebook

Fixes # https://github.com/hwchase17/langchain/issues/5239

---------

Co-authored-by: Dev 2049 <dev.dev2049@gmail.com>
2023-05-30 17:26:30 -07:00
Mark Pors
0a44bfdca3 Allow for async use of SelfAskWithSearchChain (#5394)
# Allow for async use of SelfAskWithSearchChain


Co-authored-by: Dev 2049 <dev.dev2049@gmail.com>
2023-05-30 17:02:39 -07:00
Víctor Navarro Aránguiz
8121e04200 added n_threads functionality for gpt4all (#5427)
# Added support for modifying the number of threads in the GPT4All model

I have added the capability to modify the number of threads used by the
GPT4All model. This allows users to adjust the model's parallel
processing capabilities based on their specific requirements.

## Changes Made
- Updated the `validate_environment` method to set the number of threads
for the GPT4All model using the `values["n_threads"]` parameter from the
`GPT4All` class constructor.

## Context
Useful in scenarios where users want to optimize the model's performance
by leveraging multi-threading capabilities.
Please note that the `n_threads` parameter was included in the `GPT4All`
class constructor but was previously unused. This change ensures that
the specified number of threads is utilized by the model .

## Dependencies
There are no new dependencies introduced by this change. It only
utilizes existing functionality provided by the GPT4All package.

## Testing
Since this is a minor change testing is not required.

---------

Co-authored-by: Dev 2049 <dev.dev2049@gmail.com>
2023-05-30 16:31:30 -07:00
Blithe
e31705b5ab convert the parameter 'text' to uppercase in the function 'parse' of the class BooleanOutputParser (#5397)
when the LLMs output 'yes|no',BooleanOutputParser can parse it to
'True|False', fix the ValueError in parse().
<!--
when use the BooleanOutputParser in the chain_filter.py, the LLMs output
'yes|no',the function 'parse' will throw ValueError。
-->

Fixes # (issue)
  #5396
  https://github.com/hwchase17/langchain/issues/5396

---------

Co-authored-by: gaofeng27692 <gaofeng27692@hundsun.com>
2023-05-30 16:26:17 -07:00
Natalie
199cc700a3 Ability to specify credentials wihen using Google BigQuery as a data loader (#5466)
# Adds ability to specify credentials when using Google BigQuery as a
data loader

Fixes #5465 . Adds ability to set credentials which must be of the
`google.auth.credentials.Credentials` type. This argument is optional
and will default to `None.

Co-authored-by: Dev 2049 <dev.dev2049@gmail.com>
2023-05-30 16:25:22 -07:00
Harrison Chase
eab4b4ccd7 add simple test for imports (#5461)
Co-authored-by: Dev 2049 <dev.dev2049@gmail.com>
2023-05-30 16:24:27 -07:00
Janos Tolgyesi
1111f18eb4 Add maximal relevance search to SKLearnVectorStore (#5430)
# Add maximal relevance search to SKLearnVectorStore

This PR implements the maximum relevance search in SKLearnVectorStore. 

Twitter handle: jtolgyesi (I submitted also the original implementation
of SKLearnVectorStore)

## Before submitting

Unit tests are included.

Co-authored-by: Dev 2049 <dev.dev2049@gmail.com>
2023-05-30 16:13:33 -07:00
Ayan Bandyopadhyay
8181f9e362 Update psychicapi version (#5471)
Update [psychicapi](https://pypi.org/project/psychicapi/) python package
dependency to the latest version 0.5. The newest python package version
addresses breaking changes in the Psychic http api.
2023-05-30 15:55:22 -07:00
Kacper Łukawski
f93d256190 Feat: Add batching to Qdrant (#5443)
# Add batching to Qdrant

Several people requested a batching mechanism while uploading data to
Qdrant. It is important, as there are some limits for the maximum size
of the request payload, and without batching implemented in Langchain,
users need to implement it on their own. This PR exposes a new optional
`batch_size` parameter, so all the documents/texts are loaded in batches
of the expected size (64, by default).

The integration tests of Qdrant are extended to cover two cases:
1. Documents are sent in separate batches.
2. All the documents are sent in a single request.
2023-05-30 15:33:54 -07:00
Camille Van Hoffelen
80e133f16d Added async _acall to FakeListLLM (#5439)
# Added Async _acall to FakeListLLM

FakeListLLM is handy when unit testing apps built with langchain. This
allows the use of FakeListLLM inside concurrent code with
[asyncio](https://docs.python.org/3/library/asyncio.html).

I also changed the pydocstring which was out of date.

## Who can review?

@hwchase17 - project lead
@agola11 - async
2023-05-30 14:34:36 -07:00
Leonid Ganeline
1f11f80641 docs: cleaning (#5413)
# docs cleaning

Changed docs to consistent format (probably, we need an official doc
integration template):
- ClearML - added product descriptions; changed title/headers
- Rebuff  - added product descriptions; changed title/headers
- WhyLabs  - added product descriptions; changed title/headers
- Docugami - changed title/headers/structure
- Airbyte - fixed title
- Wolfram Alpha - added descriptions, fixed title
- OpenWeatherMap -  - added product descriptions; changed title/headers
- Unstructured - changed description

## Who can review?

Community members can review the PR once tests pass. Tag
maintainers/contributors who might be interested:

@hwchase17
@dev2049
2023-05-30 13:58:16 -07:00
Matt Wells
1d861dc37a MRKL output parser no longer breaks well formed queries (#5432)
# Handles the edge scenario in which the action input is a well formed
SQL query which ends with a quoted column

There may be a cleaner option here (or indeed other edge scenarios) but
this seems to robustly determine if the action input is likely to be a
well formed SQL query in which we don't want to arbitrarily trim off `"`
characters

Fixes #5423

## Who can review?

Community members can review the PR once tests pass. Tag
maintainers/contributors who might be interested:

For a quicker response, figure out the right person to tag with @

  @hwchase17 - project lead

  Agents / Tools / Toolkits
  - @vowelparrot
2023-05-30 15:58:47 -04:00
Yoann Poupart
c1807d8408 encoding_kwargs for InstructEmbeddings (#5450)
# What does this PR do?

Bring support of `encode_kwargs` for ` HuggingFaceInstructEmbeddings`,
change the docstring example and add a test to illustrate with
`normalize_embeddings`.

Fixes #3605
(Similar to #3914)

Use case:
```python
from langchain.embeddings import HuggingFaceInstructEmbeddings

model_name = "hkunlp/instructor-large"
model_kwargs = {'device': 'cpu'}
encode_kwargs = {'normalize_embeddings': True}
hf = HuggingFaceInstructEmbeddings(
    model_name=model_name,
    model_kwargs=model_kwargs,
    encode_kwargs=encode_kwargs
)
```
2023-05-30 11:57:04 -07:00
Patrick Keane
e09afb4b44 Removes duplicated call from langchain/client/langchain.py (#5449)
This removes duplicate code presumably introduced by a cut-and-paste
error, spotted while reviewing the code in
```langchain/client/langchain.py```. The original code had back to back
occurrences of the following code block:

```
        response = self._get(
            path,
            params=params,
        )
        raise_for_status_with_text(response)
```
2023-05-30 11:52:46 -07:00
Jan Brinkmann
0d3a9d481f Fixed docstring in faiss.py for load_local (#5440)
# Fix for docstring in faiss.py vectorstore (load_local)

The doctring should reflect that load_local loads something FROM the
disk.
2023-05-30 11:41:00 -07:00
Davis Chase
4379bd4cbb bump 186 (#5459) 2023-05-30 10:47:59 -07:00
Davis Chase
2649b638dd fix (#5457) 2023-05-30 10:42:20 -07:00
Davis Chase
64b4165c8d bump 185 (#5442) 2023-05-30 08:08:11 -07:00
ByronHsu
9d658aaa5a Add more code splitters (go, rst, js, java, cpp, scala, ruby, php, swift, rust) (#5171)
As the title says, I added more code splitters.
The implementation is trivial, so i don't add separate tests for each
splitter.
Let me know if any concerns.

Fixes # (issue)
https://github.com/hwchase17/langchain/issues/5170

## Who can review?

Community members can review the PR once tests pass. Tag
maintainers/contributors who might be interested:
@eyurtsev @hwchase17

---------

Signed-off-by: byhsu <byhsu@linkedin.com>
Co-authored-by: byhsu <byhsu@linkedin.com>
2023-05-30 11:04:05 -04:00
Paul-Emile Brotons
a61b7f7e7c adding MongoDBAtlasVectorSearch (#5338)
# Add MongoDBAtlasVectorSearch for the python library

Fixes #5337
---------

Co-authored-by: Dev 2049 <dev.dev2049@gmail.com>
2023-05-30 07:59:01 -07:00
Harrison Chase
c4b502a470 Harrison/condense q llm (#5438) 2023-05-30 07:15:37 -07:00
Lei Xu
ee57054d05 Rename and fix typo in lancedb (#5425)
# Fix typo in LanceDB notebook filename
2023-05-30 00:24:17 -07:00
Zander Chase
26ff18575c Set old LCTracer to default to port 8000 (#5381)
Issue from:
https://discord.com/channels/1038097195422978059/1069478035918688346/1112445980466483222
2023-05-29 22:42:53 -07:00
Harrison Chase
760632b292 Harrison/spark reader (#5405)
Co-authored-by: Rithwik Ediga Lakhamsani <rithwik.ediga@databricks.com>
Co-authored-by: Dev 2049 <dev.dev2049@gmail.com>
2023-05-29 20:23:17 -07:00
UmerHA
8259f9b7fa DocumentLoader for GitHub (#5408)
# Creates GitHubLoader (#5257)

GitHubLoader is a DocumentLoader that loads issues and PRs from GitHub.

Fixes #5257

---------

Co-authored-by: Dev 2049 <dev.dev2049@gmail.com>
2023-05-29 20:11:21 -07:00
German Martin
0b3e0dd1d2 New Trello document loader (#4767)
# Added New Trello loader class and documentation

Simple Loader on top of py-trello wrapper. 
With a board name you can pull cards and to do some field parameter
tweaks on load operation.
I included documentation and examples.
Included unit test cases using patch and a fixture for py-trello client
class.

---------

Co-authored-by: Dev 2049 <dev.dev2049@gmail.com>
2023-05-29 19:47:56 -07:00
Harrison Chase
72f99ff953 Harrison/text splitter (#5417)
adds support for keeping separators around when using recursive text
splitter
2023-05-29 16:56:31 -07:00
小铭
cf5803e44c Add ToolException that a tool can throw. (#5050)
# Add ToolException that a tool can throw
This is an optional exception that tool throws when execution error
occurs.
When this exception is thrown, the agent will not stop working,but will
handle the exception according to the handle_tool_error variable of the
tool,and the processing result will be returned to the agent as
observation,and printed in pink on the console.It can be used like this:
```python 
from langchain.schema import ToolException
from langchain import LLMMathChain, SerpAPIWrapper, OpenAI
from langchain.agents import AgentType, initialize_agent
from langchain.chat_models import ChatOpenAI
from langchain.tools import BaseTool, StructuredTool, Tool, tool
from langchain.chat_models import ChatOpenAI

llm = ChatOpenAI(temperature=0)
llm_math_chain = LLMMathChain(llm=llm, verbose=True)

class Error_tool:
    def run(self, s: str):
        raise ToolException('The current search tool is not available.')
    
def handle_tool_error(error) -> str:
    return "The following errors occurred during tool execution:"+str(error)

search_tool1 = Error_tool()
search_tool2 = SerpAPIWrapper()
tools = [
    Tool.from_function(
        func=search_tool1.run,
        name="Search_tool1",
        description="useful for when you need to answer questions about current events.You should give priority to using it.",
        handle_tool_error=handle_tool_error,
    ),
    Tool.from_function(
        func=search_tool2.run,
        name="Search_tool2",
        description="useful for when you need to answer questions about current events",
        return_direct=True,
    )
]
agent = initialize_agent(tools, llm, agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION, verbose=True,
                         handle_tool_errors=handle_tool_error)
agent.run("Who is Leo DiCaprio's girlfriend? What is her current age raised to the 0.43 power?")
```

![image](https://github.com/hwchase17/langchain/assets/32786500/51930410-b26e-4f85-a1e1-e6a6fb450ada)

## Who can review?
- @vowelparrot

---------

Co-authored-by: Dev 2049 <dev.dev2049@gmail.com>
2023-05-29 20:05:58 +00:00
Harrison Chase
cce731c3c2 bump version 184 (#5407) 2023-05-29 07:53:32 -07:00
Harrison Chase
2da8c48be1 Harrison/datetime parser (#4693)
Co-authored-by: Jacob Valdez <jacobfv@msn.com>
Co-authored-by: Jacob Valdez <jacob.valdez@limboid.ai>
Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>
2023-05-29 07:52:30 -07:00
Leonid Ganeline
1837caa70d docs: ecosystem/integrations update 1 (#5219)
# docs: ecosystem/integrations update

It is the first in a series of `ecosystem/integrations` updates.

The ecosystem/integrations list is missing many integrations.
I'm adding the missing integrations in a consistent format: 
1. description of the integrated system
2. `Installation and Setup` section with 'pip install ...`, Key setup,
and other necessary settings
3. Sections like `LLM`, `Text Embedding Models`, `Chat Models`... with
links to correspondent examples and imports of the used classes.

This PR keeps new docs, that are presented in the
`docs/modules/models/text_embedding/examples` but missed in the
`ecosystem/integrations`. The next PRs will cover the next example
sections.

Also updated `integrations.rst`: added the `Dependencies` section with a
link to the packages used in LangChain.

## Who can review?

@hwchase17
@eyurtsev
@dev2049
2023-05-29 07:25:17 -07:00
Leonid Ganeline
a3598193a0 docs: ecosystem/integrations update 2 (#5282)
# docs: ecosystem/integrations update 2

#5219 - part 1 
The second part of this update (parts are independent of each other! no
overlap):

- added diffbot.md
- updated confluence.ipynb; added confluence.md
- updated college_confidential.md
- updated openai.md
- added blackboard.md
- added bilibili.md
- added azure_blob_storage.md
- added azlyrics.md
- added aws_s3.md

## Who can review?

@hwchase17@agola11
@agola11
 @vowelparrot
 @dev2049
2023-05-29 07:19:43 -07:00
Eduard van Valkenburg
ccb6238de1 Implemented appending arbitrary messages (#5293)
# Implemented appending arbitrary messages to the base chat message
history, the in-memory and cosmos ones.

<!--
Thank you for contributing to LangChain! Your PR will appear in our next
release under the title you set. Please make sure it highlights your
valuable contribution.

Replace this with a description of the change, the issue it fixes (if
applicable), and relevant context. List any dependencies required for
this change.

After you're done, someone will review your PR. They may suggest
improvements. If no one reviews your PR within a few days, feel free to
@-mention the same people again, as notifications can get lost.
-->

As discussed this is the alternative way instead of #4480, with a
add_message method added that takes a BaseMessage as input, so that the
user can control what is in the base message like kwargs.

<!-- Remove if not applicable -->

Fixes # (issue)

## Before submitting

<!-- If you're adding a new integration, include an integration test and
an example notebook showing its use! -->

## Who can review?

Community members can review the PR once tests pass. Tag
maintainers/contributors who might be interested:

@hwchase17

---------

Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
2023-05-29 07:18:59 -07:00
Harrison Chase
d6fb25c439 Harrison/prediction guard update (#5404)
Co-authored-by: Daniel Whitenack <whitenack.daniel@gmail.com>
2023-05-29 07:14:59 -07:00
Harrison Chase
416c8b1da3 Harrison/deep infra (#5403)
Co-authored-by: Yessen Kanapin <yessenzhar@gmail.com>
Co-authored-by: Yessen Kanapin <yessen@deepinfra.com>
2023-05-29 07:10:50 -07:00
Timothy Ji
100d6655df Reformat openai proxy setting as code (#5330)
# Reformat the openai proxy setting as code


  Only affect the doc for openai Model
  - @hwchase17
  - @agola11
2023-05-29 07:02:47 -07:00
Justin Flick
c09f8e4ddc Add pagination for Vertex AI embeddings (#5325)
Fixes #5316

---------

Co-authored-by: Justin Flick <jflick@homesite.com>
Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
2023-05-29 06:57:41 -07:00
Harrison Chase
3e16468423 Harrison/llamacpp (#5402)
Co-authored-by: Gavin S <gavinswanson@gmail.com>
2023-05-29 06:44:58 -07:00
Chandan Routray
642ae83d86 Removed deprecated llm attribute for load_chain (#5343)
# Removed deprecated llm attribute for load_chain

Currently `load_chain` for some chain types expect `llm` attribute to be
present but `llm` is deprecated attribute for those chains and might not
be persisted during their `chain.save`.

Fixes #5224
[(issue)](https://github.com/hwchase17/langchain/issues/5224)

## Who can review?
@hwchase17
@dev2049

---------

Co-authored-by: imeckr <chandanroutray2012@gmail.com>
2023-05-29 06:44:47 -07:00
Oleh Kuznetsov
f6615cac41 Update llamacpp demonstration notebook (#5344)
# Update llamacpp demonstration notebook

Add instructions to install with BLAS backend, and update the example of
model usage.

Fixes #5071. However, it is more like a prevention of similar issues in
the future, not a fix, since there was no problem in the framework
functionality

## Who can review?

Community members can review the PR once tests pass. Tag
maintainers/contributors who might be interested:

- @hwchase17 
- @agola11
2023-05-29 06:43:26 -07:00
Martin Holecek
44b48d9518 Fix update_document function, add test and documentation. (#5359)
# Fix for `update_document` Function in Chroma

## Summary
This pull request addresses an issue with the `update_document` function
in the Chroma class, as described in
[#5031](https://github.com/hwchase17/langchain/issues/5031#issuecomment-1562577947).
The issue was identified as an `AttributeError` raised when calling
`update_document` due to a missing corresponding method in the
`Collection` object. This fix refactors the `update_document` method in
`Chroma` to correctly interact with the `Collection` object.

## Changes
1. Fixed the `update_document` method in the `Chroma` class to correctly
call methods on the `Collection` object.
2. Added the corresponding test `test_chroma_update_document` in
`tests/integration_tests/vectorstores/test_chroma.py` to reflect the
updated method call.
3. Added an example and explanation of how to use the `update_document`
function in the Jupyter notebook tutorial for Chroma.

## Test Plan
All existing tests pass after this change. In addition, the
`test_chroma_update_document` test case now correctly checks the
functionality of `update_document`, ensuring that the function works as
expected and updates the content of documents correctly.

## Reviewers
@dev2049

This fix will ensure that users are able to use the `update_document`
function as expected, without encountering the previous
`AttributeError`. This will enhance the usability and reliability of the
Chroma class for all users.

Thank you for considering this pull request. I look forward to your
feedback and suggestions.
2023-05-29 06:39:25 -07:00
Louis Amaudruz
e455ba4ed5 Add async support to routing chains (#5373)
# Add async support for (LLM) routing chains

<!--
Thank you for contributing to LangChain! Your PR will appear in our
release under the title you set. Please make sure it highlights your
valuable contribution.

Replace this with a description of the change, the issue it fixes (if
applicable), and relevant context. List any dependencies required for
this change.

After you're done, someone will review your PR. They may suggest
improvements. If no one reviews your PR within a few days, feel free to
@-mention the same people again, as notifications can get lost.
-->

<!-- Remove if not applicable -->

Add asynchronous LLM calls support for the routing chains. More
specifically:
- Add async `aroute` function (i.e. async version of `route`) to the
`RouterChain` which calls the routing LLM asynchronously
- Implement the async `_acall` for the `LLMRouterChain`
- Implement the async `_acall` function for `MultiRouteChain` which
first calls asynchronously the routing chain with its new `aroute`
function, and then calls asynchronously the relevant destination chain.

<!-- If you're adding a new integration, please include:

1. a test for the integration - favor unit tests that does not rely on
network access.
2. an example notebook showing its use


See contribution guidelines for more information on how to write tests,
lint
etc:


https://github.com/hwchase17/langchain/blob/master/.github/CONTRIBUTING.md
-->

## Who can review?

- @agola11

<!-- For a quicker response, figure out the right person to tag with @

  @hwchase17 - project lead
  Async
  - @agola11
        
 -->
2023-05-29 06:37:26 -07:00
Gael Grosch
8b7721ebbb fix: Blob.from_data mimetype is lost (#5395)
# Fix lost mimetype when using Blob.from_data method

The mimetype is lost due to a typo in the class attribue name

Fixes # - (no issue opened but I can open one if needed)

## Changes

* Fixed typo in name
* Added unit-tests to validate the output Blob


## Review
@eyurtsev
2023-05-29 06:36:50 -07:00
Jacob Lee
f77f27163d Update PR template with Twitter handle request (#5382)
# Updates PR template to request Twitter handle for shoutouts!

Makes it easier for maintainers to show their appreciation 😄
2023-05-29 06:23:17 -07:00
Zander Chase
14099f1b93 Use Default Factory (#5380)
We shouldn't be calling a constructor for a default value - should use
default_factory instead. This is especially ad in this case since it
requires an optional dependency and an API key to be set.
 
Resolves #5361
2023-05-29 06:22:35 -07:00
Harrison Chase
6df90ad9fd handle json parsing errors (#5371)
adds tests cases, consolidates a lot of PRs
2023-05-29 06:18:19 -07:00
玄猫
99a1e3f3a3 Fix: Handle empty documents in ContextualCompressionRetriever (Issue #5304) (#5306)
# Fix: Handle empty documents in ContextualCompressionRetriever (Issue
#5304)

Fixes #5304 

Prevent cohere.error.CohereAPIError caused by an empty list of documents
by adding a condition to check if the input documents list is empty in
the compress_documents method. If the list is empty, return an empty
list immediately, avoiding the error and unnecessary processing.

@dev2049

---------

Co-authored-by: Dev 2049 <dev.dev2049@gmail.com>
2023-05-28 13:19:34 -07:00
os1ma
1366d070fc Add path validation to DirectoryLoader (#5327)
# Add path validation to DirectoryLoader

This PR introduces a minor adjustment to the DirectoryLoader by adding
validation for the path argument. Previously, if the provided path
didn't exist or wasn't a directory, DirectoryLoader would return an
empty document list due to the behavior of the `glob` method. This could
potentially cause confusion for users, as they might expect a
file-loading error instead.

So, I've added two validations to the load method of the
DirectoryLoader:

- Raise a FileNotFoundError if the provided path does not exist
- Raise a ValueError if the provided path is not a directory

Due to the relatively small scope of these changes, a new issue was not
created.

## Before submitting

<!-- If you're adding a new integration, please include:

1. a test for the integration - favor unit tests that does not rely on
network access.
2. an example notebook showing its use


See contribution guidelines for more information on how to write tests,
lint
etc:


https://github.com/hwchase17/langchain/blob/master/.github/CONTRIBUTING.md
-->

## Who can review?

Community members can review the PR once tests pass. Tag
maintainers/contributors who might be interested:

@eyurtsev
2023-05-28 15:31:23 -04:00
Harrison Chase
ad7f4c0317 bump to 183 (#5372) 2023-05-28 11:42:58 -07:00
Harrison Chase
b6927970f1 revert bad json (#5370) 2023-05-28 10:22:02 -07:00
Matt Wells
9a5c9df809 Fixes iter error in FAISS add_embeddings call (#5367)
# Remove re-use of iter within add_embeddings causing error

As reported in https://github.com/hwchase17/langchain/issues/5336 there
is an issue currently involving the atempted re-use of an iterator
within the FAISS vectorstore adapter

Fixes # https://github.com/hwchase17/langchain/issues/5336

## Who can review?

Community members can review the PR once tests pass. Tag
maintainers/contributors who might be interested:

  VectorStores / Retrievers / Memory
  - @dev2049
2023-05-28 09:59:30 -07:00
Davis Chase
b705f260f4 bump 182 (#5364) 2023-05-28 09:16:18 -07:00
Janos Tolgyesi
5f4552391f Add SKLearnVectorStore (#5305)
# Add SKLearnVectorStore

This PR adds SKLearnVectorStore, a simply vector store based on
NearestNeighbors implementations in the scikit-learn package. This
provides a simple drop-in vector store implementation with minimal
dependencies (scikit-learn is typically installed in a data scientist /
ml engineer environment). The vector store can be persisted and loaded
from json, bson and parquet format.

SKLearnVectorStore has soft (dynamic) dependency on the scikit-learn,
numpy and pandas packages. Persisting to bson requires the bson package,
persisting to parquet requires the pyarrow package.

## Before submitting

Integration tests are provided under
`tests/integration_tests/vectorstores/test_sklearn.py`

Sample usage notebook is provided under
`docs/modules/indexes/vectorstores/examples/sklear.ipynb`

Co-authored-by: Dev 2049 <dev.dev2049@gmail.com>
2023-05-28 08:17:42 -07:00
Aymen Furter
e2742953a6 feat: support for shopping search in SerpApi (#5259)
# Support for shopping search in SerpApi

## Who can review?
@vowelparrot
2023-05-27 21:20:24 -07:00
Eduard van Valkenburg
1daa7068b2 added cosmos kwargs option (#5292)
# Added the ability to pass kwargs to cosmos client constructor

The cosmos client has a ton of options that can be set, so allowing
those to be passed to the constructor from the chat memory constructor
with this PR.
2023-05-27 21:19:40 -07:00
Kenton
881dfe8179 Sample Notebook for DynamoDB Chat Message History (#5351)
# Sample Notebook for DynamoDB Chat Message History

@dev2049

Adding a sample notebook for the DynamoDB Chat Message History class.

<!-- For a quicker response, figure out the right person to tag with @

  @hwchase17 - project lead

  Tracing / Callbacks
  - @agola11

  Async
  - @agola11

  DataLoaders
  - @eyurtsev

  Models
  - @hwchase17
  - @agola11

  Agents / Tools / Toolkits
  - @vowelparrot

  VectorStores / Retrievers / Memory
  - @dev2049
        
 -->
2023-05-27 21:16:24 -07:00
mbchang
f079cdf479 fix: remove empty lines that cause InvalidRequestError (#5320)
# remove empty lines in GenerativeAgentMemory that cause
InvalidRequestError in OpenAIEmbeddings

<!--
Thank you for contributing to LangChain! Your PR will appear in our
release under the title you set. Please make sure it highlights your
valuable contribution.

Replace this with a description of the change, the issue it fixes (if
applicable), and relevant context. List any dependencies required for
this change.

After you're done, someone will review your PR. They may suggest
improvements. If no one reviews your PR within a few days, feel free to
@-mention the same people again, as notifications can get lost.
-->

<!-- Remove if not applicable -->

Let's say the text given to `GenerativeAgent._parse_list` is
```
text = """
Insight 1: <insight 1>

Insight 2: <insight 2>
"""
```
This creates an `openai.error.InvalidRequestError: [''] is not valid
under any of the given schemas - 'input'` because
`GenerativeAgent.add_memory()` tries to add an empty string to the
vectorstore.

This PR fixes the issue by removing the empty line between `Insight 1`
and `Insight 2`

## Before submitting

<!-- If you're adding a new integration, please include:

1. a test for the integration - favor unit tests that does not rely on
network access.
2. an example notebook showing its use


See contribution guidelines for more information on how to write tests,
lint
etc:


https://github.com/hwchase17/langchain/blob/master/.github/CONTRIBUTING.md
-->

## Who can review?

Community members can review the PR once tests pass. Tag
maintainers/contributors who might be interested:

<!-- For a quicker response, figure out the right person to tag with @

  @hwchase17 - project lead

  Tracing / Callbacks
  - @agola11

  Async
  - @agola11

  DataLoaders
  - @eyurtsev

  Models
  - @hwchase17
  - @agola11

  Agents / Tools / Toolkits
  - @vowelparrot

  VectorStores / Retrievers / Memory
  - @dev2049
        
 -->
@hwchase17
@vowelparrot
@dev2049
2023-05-27 21:15:03 -07:00
Deepak S V
c6e5d90eff Fixing blank thoughts in verbose for "_Exception" Action (#5331)
Fixed the issue of blank Thoughts being printed in verbose when
`handle_parsing_errors=True`, as below:

Before Fix:
```
Observation: There are 38175 accounts available in the dataframe.
Thought:
Observation: Invalid or incomplete response
Thought:
Observation: Invalid or incomplete response
Thought:
```

After Fix:
```
Observation: There are 38175 accounts available in the dataframe.
Thought:AI: {
    "action": "Final Answer",
    "action_input": "There are 38175 accounts available in the dataframe."
}
Observation: Invalid Action or Action Input format
Thought:AI: {
    "action": "Final Answer",
    "action_input": "The number of available accounts is 38175."
}
Observation: Invalid Action or Action Input format
```

@vowelparrot currently I have set the colour of thought to green (same
as the colour when `handle_parsing_errors=False`). If you want to change
the colour of this "_Exception" case to red or something else (when
`handle_parsing_errors=True`), feel free to change it in line 789.
2023-05-27 21:14:16 -07:00
DanConstantini
c49c6ac97a Add Chainlit to deployment options (#5314)
# Add Chainlit to deployment options

Add [Chainlit](https://github.com/Chainlit/chainlit) as deployment
options
Used links to Github examples and Chainlit doc on the LangChain
integration

Co-authored-by: Dan Constantini <danconstantini@Dan-Constantini-MacBook.local>
2023-05-27 21:12:53 -07:00
Harrison Chase
5292e855c0 add enum output parser (#5165) 2023-05-27 20:59:24 -07:00
Harrison Chase
179ddbe88b add enum output parser (#5165) 2023-05-27 20:58:23 -07:00
Leonid Ganeline
465a970724 docs: added link to LangChain Handbook (#5311)
# added a link to LangChain Handbook

## Who can review?

Community members can review the PR once tests pass.
2023-05-27 20:57:40 -07:00
Russ
6e974b5f04 Fix typos (#5323)
# Documentation typo fixes

Fixes # (issue)

Simple typos in the blockchain .ipynb documentation
2023-05-26 18:55:21 -07:00
Michael Landis
f75f0dbad6 docs: improve flow of llm caching notebook (#5309)
# docs: improve flow of llm caching notebook

The notebook `llm_caching` demos various caching providers. In the
previous version, there was setup common to all examples but under the
`In Memory Caching` heading.

If a user comes and only wants to try a particular example, they will
run the common setup, then the cells for the specific provider they are
interested in. Then they will get import and variable reference errors.
This commit moves the common setup to the top to avoid this.

## Who can review?

Community members can review the PR once tests pass. Tag
maintainers/contributors who might be interested:

@dev2049
2023-05-26 13:34:11 -04:00
Eugene Yurtsev
0a8d6bc402 Add instructions to pyproject.toml (#5138)
# Add instructions to pyproject.toml

* Add instructions to pyproject.toml about how to handle optional
dependencies.

## Before submitting


## Who can review?

---------

Co-authored-by: Davis Chase <130488702+dev2049@users.noreply.github.com>
Co-authored-by: Zander Chase <130414180+vowelparrot@users.noreply.github.com>
2023-05-26 13:29:07 -04:00
Shukri
58e95cd11e Better docs for weaviate hybrid search (#5290)
# Better docs for weaviate hybrid search

<!--
Thank you for contributing to LangChain! Your PR will appear in our next
release under the title you set. Please make sure it highlights your
valuable contribution.

Replace this with a description of the change, the issue it fixes (if
applicable), and relevant context. List any dependencies required for
this change.

After you're done, someone will review your PR. They may suggest
improvements. If no one reviews your PR within a few days, feel free to
@-mention the same people again, as notifications can get lost.
-->

<!-- Remove if not applicable -->

Fixes: NA

## Before submitting

<!-- If you're adding a new integration, include an integration test and
an example notebook showing its use! -->

## Who can review?

Community members can review the PR once tests pass. Tag
maintainers/contributors who might be interested:

<!-- For a quicker response, figure out the right person to tag with @

        @hwchase17 - project lead

        Tracing / Callbacks
        - @agola11

        Async
        - @agola11

        DataLoaders
        - @eyurtsev

        Models
        - @hwchase17
        - @agola11

        Agents / Tools / Toolkits
        - @vowelparrot
        
        VectorStores / Retrievers / Memory
        - @dev2049
        
 -->
@dev2049
2023-05-26 09:30:41 -07:00
Davis Chase
641303a361 bump 181 (#5302) 2023-05-26 08:44:19 -07:00
Leonid Kuligin
aa3c7b3271 Fixed passing creds to VertexAI LLM (#5297)
# Fixed passing creds to VertexAI LLM

Fixes  #5279 

It looks like we should drop a type annotation for Credentials.

Co-authored-by: Leonid Kuligin <kuligin@google.com>
2023-05-26 08:31:02 -07:00
Eugene Yurtsev
a669abf16b Update CONTRIBUTION guidelines and PR Template (#5140)
# Update contribution guidelines and PR template

This PR updates the contribution guidelines to include more information
on how to handle optional dependencies. 

The PR template is updated to include a link to the contribution guidelines document.
2023-05-26 10:18:11 -04:00
Peng Qu
d481d887bc Add an example to make the prompt more robust (#5291)
# Add example to LLMMath to help with power operator

Add example to LLMMath that helps the model to interpret `^` as the power operator rather than the python xor operator.
2023-05-26 09:32:35 -04:00
Xiangrui Meng
aec642febb LLM wrapper for Databricks (#5142)
This PR adds LLM wrapper for Databricks. It supports two endpoint types:
* serving endpoint
* cluster driver proxy app

An integration notebook is included to show how it works.


Co-authored-by: Davis Chase <130488702+dev2049@users.noreply.github.com>
Co-authored-by: Gengliang Wang <gengliang@apache.org>
Co-authored-by: Dev 2049 <dev.dev2049@gmail.com>
2023-05-25 19:19:37 -07:00
Ted Martinez
1cb6498fdb Tedma4/twilio tool (#5136)
# Add twilio sms tool

---------

Co-authored-by: Dev 2049 <dev.dev2049@gmail.com>
2023-05-25 19:19:22 -07:00
Moonsik Kang
a0281f5acb Fixed typo: 'ouput' to 'output' in all documentation (#5272)
# Fixed typo: 'ouput' to 'output' in all documentation

In this instance, the typo 'ouput' was amended to 'output' in all
occurrences within the documentation. There are no dependencies required
for this change.
2023-05-25 19:18:31 -07:00
Michael Landis
7047a2c1af feat: add Momento as a standard cache and chat message history provider (#5221)
# Add Momento as a standard cache and chat message history provider

This PR adds Momento as a standard caching provider. Implements the
interface, adds integration tests, and documentation. We also add
Momento as a chat history message provider along with integration tests,
and documentation.

[Momento](https://www.gomomento.com/) is a fully serverless cache.
Similar to S3 or DynamoDB, it requires zero configuration,
infrastructure management, and is instantly available. Users sign up for
free and get 50GB of data in/out for free every month.

## Before submitting

 We have added documentation, notebooks, and integration tests
demonstrating usage.

Co-authored-by: Dev 2049 <dev.dev2049@gmail.com>
2023-05-25 19:13:21 -07:00
Hassan Ouda
56ad56c812 Support bigquery dialect - SQL (#5261)
# Your PR Title (What it does)

Adding an if statement to deal with bigquery sql dialect. When I use
bigquery dialect before, it failed while using SET search_path TO. So
added a condition to set dataset as the schema parameter which is
equivalent to SET search_path TO . I have tested and it works.


## Who can review?

Community members can review the PR once tests pass. Tag
maintainers/contributors who might be interested:
@dev2049
2023-05-25 18:19:17 -07:00
Abdelsalam ElTamawy
2ef5579eae Added pipline args to HuggingFacePipeline.from_model_id (#5268)
The current `HuggingFacePipeline.from_model_id` does not allow passing
of pipeline arguments to the transformer pipeline.
This PR enables adding important pipeline parameters like setting
`max_new_tokens` for example.
Previous to this PR it would be necessary to manually create the
pipeline through huggingface transformers then handing it to langchain.

For example instead of this
```py
model_id = "gpt2"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)
pipe = pipeline(
    "text-generation", model=model, tokenizer=tokenizer, max_new_tokens=10
)
hf = HuggingFacePipeline(pipeline=pipe)
```
You can write this
```py
hf = HuggingFacePipeline.from_model_id(
    model_id="gpt2", task="text-generation", pipeline_kwargs={"max_new_tokens": 10}
)
```


Co-authored-by: Dev 2049 <dev.dev2049@gmail.com>
2023-05-25 17:54:52 -07:00
Davis Chase
f01dfe858d OpenAI lint (#5273)
Causing lint issues if you have openai installed, annoying for local dev
2023-05-25 16:20:06 -07:00
Nicholas Liu
7652d2abb0 Add Multi-CSV/DF support in CSV and DataFrame Toolkits (#5009)
Add Multi-CSV/DF support in CSV and DataFrame Toolkits
* CSV and DataFrame toolkits now accept list of CSVs/DFs
* Add default prompts for many dataframes in `pandas_dataframe` toolkit

Fixes #1958
Potentially fixes #4423

## Testing
* Add single and multi-dataframe integration tests for
`pandas_dataframe` toolkit with permutations of `include_df_in_prompt`
* Add single and multi-CSV integration tests for csv toolkit
---------

Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
2023-05-25 14:23:11 -07:00
Alex Rothberg
3223a97dc6 Add visible_only and strict_mode options to ClickTool (#4088)
Partially addresses: https://github.com/hwchase17/langchain/issues/4066
2023-05-25 14:10:39 -07:00
Ravindra Marella
b3988621c5 Add C Transformers for GGML Models (#5218)
# Add C Transformers for GGML Models
I created Python bindings for the GGML models:
https://github.com/marella/ctransformers

Currently it supports GPT-2, GPT-J, GPT-NeoX, LLaMA, MPT, etc. See
[Supported
Models](https://github.com/marella/ctransformers#supported-models).


It provides a unified interface for all models:

```python
from langchain.llms import CTransformers

llm = CTransformers(model='/path/to/ggml-gpt-2.bin', model_type='gpt2')

print(llm('AI is going to'))
```

It can be used with models hosted on the Hugging Face Hub:

```py
llm = CTransformers(model='marella/gpt-2-ggml')
```

It supports streaming:

```py
from langchain.callbacks.streaming_stdout import StreamingStdOutCallbackHandler

llm = CTransformers(model='marella/gpt-2-ggml', callbacks=[StreamingStdOutCallbackHandler()])
```

Please see [README](https://github.com/marella/ctransformers#readme) for
more details.
---------

Co-authored-by: Dev 2049 <dev.dev2049@gmail.com>
2023-05-25 13:42:44 -07:00
Davis Chase
ca88b25da6 Zep sdk version (#5267)
zep-python's sync methods no longer need an asyncio wrapper. This was
causing issues with FastAPI deployment.
Zep also now supports putting and getting of arbitrary message metadata.

Bump zep-python version to v0.30

Remove nest-asyncio from Zep example notebooks.

Modify tests to include metadata.

---------

Co-authored-by: Daniel Chalef <daniel.chalef@private.org>
Co-authored-by: Daniel Chalef <131175+danielchalef@users.noreply.github.com>
2023-05-25 13:42:10 -07:00
Janil Wörst
5525602df0 Docs link custom agent page in getting started (#5250)
# Docs: link custom agent page in getting started
2023-05-25 13:11:30 -07:00
Alon Diament
d3cd21ccf8 Fixed regression in JoplinLoader's get note url (#5265)
Fixes a regression in JoplinLoader that was introduced during the code
review (bad `page` wildcard in _get_note_url).

## Who can review?

Community members can review the PR once tests pass. Tag
maintainers/contributors who might be interested:

@dev2049
@leo-gan
2023-05-25 13:10:10 -07:00
Davis Chase
3be9ba14f3 OpenSearch top k parameter fix (#5216)
For most queries it's the `size` parameter that determines final number
of documents to return. Since our abstractions refer to this as `k`, set
this to be `k` everywhere instead of expecting a separate param. Would
be great to have someone more familiar with OpenSearch validate that
this is reasonable (e.g. that having `size` and what OpenSearch calls
`k` be the same won't lead to any strange behavior). cc @naveentatikonda

Closes #5212
2023-05-25 09:51:23 -07:00
Yves Maurer
88ed8e1cd6 Added the option of specifying a proxy for the OpenAI API (#5246)
# Added the option of specifying a proxy for the OpenAI API

Fixes #5243

Co-authored-by: Yves Maurer <>
2023-05-25 09:50:25 -07:00
mwinterde
9c0cb90997 Resolve error in StructuredOutputParser docs (#5240)
# Resolve error in StructuredOutputParser docs

Documentation for `StructuredOutputParser` currently not reproducible,
that is, `output_parser.parse(output)` raises an error because the LLM
returns a response with an invalid format

```python
_input = prompt.format_prompt(question="what's the capital of france")
output = model(_input.to_string())

output

# ?
#
# ```json
# {
# 	"answer": "Paris",
# 	"source": "https://www.worldatlas.com/articles/what-is-the-capital-of-france.html"
# }
# ```
```

Was fixed by adding a question mark to the prompt
2023-05-25 07:47:25 -07:00
Peng Qu
c7e2151a4b remove extra "\n" to ensure that the format of the description, examp… (#5232)
remove extra "\n" to ensure that the format of the description, example,
and prompt&generation are completely consistent.
2023-05-25 07:46:39 -07:00
Davis Chase
15b17f9334 bump 180 (#5248) 2023-05-25 07:09:50 -07:00
mwinterde
9e57be4b5c Fix typo in docstring of RetryWithErrorOutputParser (#5244) 2023-05-25 09:59:31 -04:00
Shukri
09e246f306 Weaviate: Add QnA with sources example (#5247)
# Add QnA with sources example 

<!--
Thank you for contributing to LangChain! Your PR will appear in our next
release under the title you set. Please make sure it highlights your
valuable contribution.

Replace this with a description of the change, the issue it fixes (if
applicable), and relevant context. List any dependencies required for
this change.

After you're done, someone will review your PR. They may suggest
improvements. If no one reviews your PR within a few days, feel free to
@-mention the same people again, as notifications can get lost.
-->

<!-- Remove if not applicable -->

Fixes: see
https://stackoverflow.com/questions/76207160/langchain-doesnt-work-with-weaviate-vector-database-getting-valueerror/76210017#76210017

## Before submitting

<!-- If you're adding a new integration, include an integration test and
an example notebook showing its use! -->

## Who can review?

Community members can review the PR once tests pass. Tag
maintainers/contributors who might be interested:

<!-- For a quicker response, figure out the right person to tag with @

        @hwchase17 - project lead

        Tracing / Callbacks
        - @agola11

        Async
        - @agola11

        DataLoaders
        - @eyurtsev

        Models
        - @hwchase17
        - @agola11

        Agents / Tools / Toolkits
        - @vowelparrot
        
        VectorStores / Retrievers / Memory
        - @dev2049
        
 -->
@dev2049
2023-05-25 09:58:33 -04:00
Archon
5cdd9ab7e1 Add MiniMax embeddings (#5174)
- Add support for MiniMax embeddings

Doc: [MiniMax
embeddings](https://api.minimax.chat/document/guides/embeddings?id=6464722084cdc277dfaa966a)

---------

Co-authored-by: Archon <archongum@outlook.com>
Co-authored-by: Dev 2049 <dev.dev2049@gmail.com>
2023-05-25 06:57:49 -07:00
Eugene Yurtsev
5cfa72a130 Bibtex integration for document loader and retriever (#5137)
# Bibtex integration

Wrap bibtexparser to retrieve a list of docs from a bibtex file.
* Get the metadata from the bibtex entries
* `page_content` get from the local pdf referenced in the `file` field
of the bibtex entry using `pymupdf`
* If no valid pdf file, `page_content` set to the `abstract` field of
the bibtex entry
* Support Zotero flavour using regex to get the file path
* Added usage example in
`docs/modules/indexes/document_loaders/examples/bibtex.ipynb`
---------

Co-authored-by: Sébastien M. Popoff <sebastien.popoff@espci.fr>
Co-authored-by: Dev 2049 <dev.dev2049@gmail.com>
2023-05-25 00:21:31 -07:00
Ati Sharma
40b086d6e8 Allow to specify ID when adding to the FAISS vectorstore. (#5190)
# Allow to specify ID when adding to the FAISS vectorstore

This change allows unique IDs to be specified when adding documents /
embeddings to a faiss vectorstore.

- This reflects the current approach with the chroma vectorstore.
- It allows rejection of inserts on duplicate IDs
- will allow deletion / update by searching on deterministic ID (such as
a hash).
- If not specified, a random UUID is generated (as per previous
behaviour, so non-breaking).

This commit fixes #5065 and #3896 and should fix #2699 indirectly. I've
tested adding and merging.

Kindly tagging @Xmaster6y @dev2049 for review.

---------

Co-authored-by: Ati Sharma <ati@agalmic.ltd>
Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
2023-05-24 22:26:46 -07:00
Nicholas Liu
f0ea093de8 Change Default GoogleDriveLoader Behavior to not Load Trashed Files (issue #5104) (#5220)
# Change Default GoogleDriveLoader Behavior to not Load Trashed Files
(issue #5104)

Fixes #5104

If the previous behavior of loading files that used to live in the
folder, but are now trashed, you can use the `load_trashed_files`
parameter:

```
loader = GoogleDriveLoader(
    folder_id="1yucgL9WGgWZdM1TOuKkeghlPizuzMYb5",
    recursive=False,
    load_trashed_files=True
)
```

As not loading trashed files should be expected behavior, should we
1. even provide the `load_trashed_files` parameter?
2. add documentation? Feels most users will stick with default behavior

## Who can review?

Community members can review the PR once tests pass. Tag
maintainers/contributors who might be interested:

DataLoaders
- @eyurtsev

Twitter: [@nicholasliu77](https://twitter.com/nicholasliu77)
2023-05-24 22:26:17 -07:00
Keno
eff31a3361 Remove API key from docs (#5223)
I found an API key for `serpapi_api_key` while reading the docs. It
seems to have been modified very recently. Removed it in this PR
@hwchase17 - project lead
2023-05-24 22:25:39 -07:00
maspotts
95c9aa1ccb Create async copy of from_text() inside GraphIndexCreator. (#5214)
Copies `GraphIndexCreator.from_text()` to make an async version called
`GraphIndexCreator.afrom_text()`.

This is (should be) a trivial change: it just adds a copy of
`GraphIndexCreator.from_text()` which is async and awaits a call to
`chain.apredict()` instead of `chain.predict()`. There is no unit test
for GraphIndexCreator, and I did not create one, but this code works for
me locally.

@agola11 @hwchase17
2023-05-24 21:54:12 -07:00
Leonid Ganeline
2ad29f410d fix a mistake in concepts.md (#5222)
# fix a mistake in concepts.md


## Who can review?

Community members can review the PR once tests pass. Tag
maintainers/contributors who might be interested:
2023-05-24 21:47:22 -07:00
Harrison Chase
a775aa6389 Harrison/vertex (#5049)
Co-authored-by: Leonid Kuligin <kuligin@google.com>
Co-authored-by: Leonid Kuligin <lkuligin@yandex.ru>
Co-authored-by: sasha-gitg <44654632+sasha-gitg@users.noreply.github.com>
Co-authored-by: Justin Flick <Justinjayflick@gmail.com>
Co-authored-by: Justin Flick <jflick@homesite.com>
2023-05-24 15:51:12 -07:00
Zander Chase
e6c4571191 Add 'status' command to get server status (#5197)
Example:


```
$ langchain plus start --expose
...
$ langchain plus status
The LangChainPlus server is currently running.

Service             Status         Published Ports
langchain-backend   Up 40 seconds  1984
langchain-db        Up 41 seconds  5433
langchain-frontend  Up 40 seconds  80
ngrok               Up 41 seconds  4040

To connect, set the following environment variables in your LangChain application:
LANGCHAIN_TRACING_V2=true
LANGCHAIN_ENDPOINT=https://5cef-70-23-89-158.ngrok.io

$ langchain plus stop
$ langchain plus status
The LangChainPlus server is not running.
$ langchain plus start
The LangChainPlus server is currently running.

Service             Status        Published Ports
langchain-backend   Up 5 seconds  1984
langchain-db        Up 6 seconds  5433
langchain-frontend  Up 5 seconds  80

To connect, set the following environment variables in your LangChain application:
LANGCHAIN_TRACING_V2=true
LANGCHAIN_ENDPOINT=http://localhost:1984
```
2023-05-24 21:43:16 +00:00
Zander Chase
e76e68b211 Add Delete Session Method (#5193) 2023-05-24 21:06:03 +00:00
Zander Chase
66113c2a62 Log warning (#5192)
Changes debug log to warning log when LC Tracer fails to instantiate
2023-05-24 21:05:13 +00:00
Ankush Gola
b7fcb35a39 add option to pass openai key to langchain plus command (#5213) 2023-05-24 21:05:03 +00:00
Davis Chase
dcee8936c1 nit (#5208) 2023-05-24 12:52:20 -07:00
Alon Diament
44abe925df Add Joplin document loader (#5153)
# Add Joplin document loader

[Joplin](https://joplinapp.org/) is an open source note-taking app.

Joplin has a [REST API](https://joplinapp.org/api/references/rest_api/)
for accessing its local database. The proposed `JoplinLoader` uses the
API to retrieve all notes in the database and their metadata. Joplin
needs to be installed and running locally, and an access token is
required.

- The PR includes an integration test.
- The PR includes an example notebook.

---------

Co-authored-by: Dev 2049 <dev.dev2049@gmail.com>
2023-05-24 12:31:55 -07:00
Rodrigo Siqueira
f10be072ff Add Iugu document loader (#5162)
Create IUGU loader
---------

Co-authored-by: Dev 2049 <dev.dev2049@gmail.com>
2023-05-24 11:47:01 -07:00
ByronHsu
f0730c6489 Allow readthedoc loader to pass custom html tag (#5175)
## Description

The html structure of readthedocs can differ. Currently, the html tag is
hardcoded in the reader, and unable to fit into some cases. This pr
includes the following changes:

1. Replace `find_all` with `find` because we just want one tag.
2. Provide `custom_html_tag` to the loader.
3. Add tests for readthedoc loader
4. Refactor code

## Issues

See more in https://github.com/hwchase17/langchain/pull/2609. The
problem was not completely fixed in that pr.
---------

Signed-off-by: byhsu <byhsu@linkedin.com>
Co-authored-by: byhsu <byhsu@linkedin.com>
Co-authored-by: Dev 2049 <dev.dev2049@gmail.com>
2023-05-24 10:40:27 -07:00
Alexander Dibrov
d8eed6018f Output parsing variation allowance (#5178)
# Output parsing variation allowance for self-ask with search

This change makes self-ask with search easier for Llama models to
follow, as they tend toward returning 'Followup:' instead of 'Follow
up:' despite an otherwise valid remaining output.


Co-authored-by: Dev 2049 <dev.dev2049@gmail.com>
2023-05-24 10:39:09 -07:00
Matt Wells
c173bf1c62 Fixes scope of query Session in PGVector (#5194)
`vectorstore.PGVector`: The transactional boundary should be increased
to cover the query itself

Currently, within the `similarity_search_with_score_by_vector` the
transactional boundary (created via the `Session` call) does not include
the select query being made.

This can result in un-intended consequences when interacting with the
PGVector instance methods directly


---------

Co-authored-by: Dev 2049 <dev.dev2049@gmail.com>
2023-05-24 10:37:45 -07:00
Tommaso De Lorenzo
52714cedd4 fixing total cost finetuned model giving zero (#5144)
# OpanAI finetuned model giving zero tokens cost

Very simple fix to the previously committed solution to allowing
finetuned Openai models.

Improves #5127 

---------

Co-authored-by: Dev 2049 <dev.dev2049@gmail.com>
2023-05-24 10:04:08 -07:00
Harrison Chase
94cf391ef1 standardize json parsing (#5168)
Co-authored-by: Dev 2049 <dev.dev2049@gmail.com>
2023-05-24 10:03:53 -07:00
Davis Chase
2b2176a3c1 tfidf retriever (#5114)
Co-authored-by: vempaliakhil96 <vempaliakhil96@gmail.com>
2023-05-24 10:02:09 -07:00
Shukri
b00c77dc62 Improve weaviate vectorstore docs (#5201)
# Improve weaviate vectorstore docs
2023-05-24 09:31:48 -07:00
Tomaz Bratanic
fd866d1801 Update Cypher QA prompt (#5173)
# Improve Cypher QA prompt

The current QA prompt is optimized for networkX answer generation, which
returns all the possible triples.
However, Cypher search is a bit more focused and doesn't necessary
return all the context information.
Due to that reason, the model sometimes refuses to generate an answer
even though the information is provided:

![Screenshot from 2023-05-24
08-36-23](https://github.com/hwchase17/langchain/assets/19948365/351cf9c1-2567-447c-91fd-284ae3fa1ccf)


To fix this issue, I have updated the prompt. Interestingly, I tried
many variations with less instructions and they didn't work properly.
However, the current fix works nicely.
![Screenshot from 2023-05-24
08-37-25](https://github.com/hwchase17/langchain/assets/19948365/fc830603-e6ec-4a23-8a86-eaf572996014)
2023-05-24 08:31:30 -07:00
Zach Schillaci
aa14e223ee Reuse length_func in MapReduceDocumentsChain (#5181)
# Reuse `length_func` in `MapReduceDocumentsChain`

Pretty straightforward refactor in `MapReduceDocumentsChain`. Reusing
the local variable `length_func`, instead of the longer alternative
`self.combine_document_chain.prompt_length`.

@hwchase17
2023-05-24 08:28:37 -07:00
Harrison Chase
11c26ebb55 Harrison/modelscope (#5156)
Co-authored-by: thomas-yanxin <yx20001210@163.com>
Co-authored-by: Dev 2049 <dev.dev2049@gmail.com>
2023-05-24 08:06:45 -07:00
Davis Chase
2d5588c5f0 bump 179 (#5200) 2023-05-24 07:55:27 -07:00
Saba Sturua
47e4ee4370 adjust docarray docstrings (#5185)
Follow up of https://github.com/hwchase17/langchain/pull/5015

Thanks for catching this! 

Just a small PR to adjust couple of strings to these changes

Signed-off-by: jupyterjazz <saba.sturua@jina.ai>
2023-05-24 07:50:35 -07:00
Jeff Vestal
cf19a2a59f example usage (#5182)
Adding example usage for elasticsearch knn embeddings
[per](https://github.com/hwchase17/langchain/pull/3401#issuecomment-1548518389)


https://github.com/hwchase17/langchain/blob/master/langchain/embeddings/elasticsearch.py
2023-05-24 07:47:15 -07:00
Ikko Eltociear Ashimine
fff21a0b35 Update rellm_experimental.ipynb (#5189)
# Your PR Title (What it does)

HuggingFace -> Hugging Face
2023-05-24 11:41:00 +00:00
Nolan Tremelling
faa26650c9 Beam (#4996)
# Beam

Calls the Beam API wrapper to deploy and make subsequent calls to an
instance of the gpt2 LLM in a cloud deployment. Requires installation of
the Beam library and registration of Beam Client ID and Client Secret.
Additional calls can then be made through the instance of the large
language model in your code or by calling the Beam API.

---------

Co-authored-by: Dev 2049 <dev.dev2049@gmail.com>
2023-05-24 01:25:18 -07:00
Ofer Mendelevitch
c81fb88035 Vectara (#5069)
# Vectara Integration

This PR provides integration with Vectara. Implemented here are:
* langchain/vectorstore/vectara.py
* tests/integration_tests/vectorstores/test_vectara.py
* langchain/retrievers/vectara_retriever.py
And two IPYNB notebooks to do more testing:
* docs/modules/chains/index_examples/vectara_text_generation.ipynb
* docs/modules/indexes/vectorstores/examples/vectara.ipynb

---------

Co-authored-by: Dev 2049 <dev.dev2049@gmail.com>
2023-05-24 01:24:58 -07:00
Jason Bosco
9c4b43b494 Add Typesense vector store (#1674)
Closes #931.

---------

Co-authored-by: Dev 2049 <dev.dev2049@gmail.com>
2023-05-23 23:20:45 -07:00
Leonid Ganeline
33929489b9 docs: added missed document_loaders examples (#5150)
# DOCS added missed document_loader examples

Added missed examples: `JSON`, `Open Document Format (ODT)`,
`Wikipedia`, `tomarkdown`.
Updated them to a consistent format.

## Who can review?

@hwchase17 
@dev2049
2023-05-23 21:56:41 -07:00
Daniel Quinteros
c111134a55 Clarification of the reference to the "get_text_legth" function in ge… (#5154)
# Clarification of the reference to the "get_text_legth" function in
getting_started.md

Reference to the function "get_text_legth" in the documentation did not
make sense. Comment added for clarification.

@hwchase17
2023-05-23 20:43:38 -07:00
Daniel Quinteros
de4ef24f75 Docs: updated getting_started.md (#5151)
# Docs: updated getting_started.md

Just accommodating some unnecessary spaces in the example of "pass few
shot examples to a prompt template".

@vowelparrot
2023-05-23 20:43:26 -07:00
mbchang
b1b7f3541c fix: fix current_time=Now bug for aadd_documents in TimeWeightedRetriever (#5155)
# Same as PR #5045, but for async

<!--
Thank you for contributing to LangChain! Your PR will appear in our next
release under the title you set. Please make sure it highlights your
valuable contribution.

Replace this with a description of the change, the issue it fixes (if
applicable), and relevant context. List any dependencies required for
this change.

After you're done, someone will review your PR. They may suggest
improvements. If no one reviews your PR within a few days, feel free to
@-mention the same people again, as notifications can get lost.
-->

<!-- Remove if not applicable -->

Fixes #4825 

I had forgotten to update the asynchronous counterpart `aadd_documents`
with the bug fix from PR #5045, so this PR also fixes `aadd_documents`
too.

## Who can review?

Community members can review the PR once tests pass. Tag
maintainers/contributors who might be interested:
@dev2049

<!-- For a quicker response, figure out the right person to tag with @

        @hwchase17 - project lead

        Tracing / Callbacks
        - @agola11

        Async
        - @agola11

        DataLoaders
        - @eyurtsev

        Models
        - @hwchase17
        - @agola11

        Agents / Tools / Toolkits
        - @vowelparrot
        
        VectorStores / Retrievers / Memory
        - @dev2049
        
 -->
2023-05-23 20:31:45 -07:00
Jeremiah Lowin
925dd3e59e Add async versions of predict() and predict_messages() (#4867)
# Add async versions of predict() and predict_messages()

#4615 introduced a unifying interface for "base" and "chat" LLM models
via the new `predict()` and `predict_messages()` methods that allow both
types of models to operate on string and message-based inputs,
respectively.

This PR adds async versions of the same (`apredict()` and
`apredict_messages()`) that are identical except for their use of
`agenerate()` in place of `generate()`, which means they repurpose all
existing work on the async backend.


## Who can review?

Community members can review the PR once tests pass. Tag
maintainers/contributors who might be interested:
        @hwchase17 (follows his work on #4615)
        @agola11 (async)

---------

Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
2023-05-23 17:22:49 -07:00
Junlin Zhou
9242998db1 Empty check before pop (#4929)
# Check whether 'other' is empty before popping

This PR could fix a potential 'popping empty set' error.

Co-authored-by: Junlin Zhou <jlzhou@zjuici.com>
2023-05-23 16:46:50 -07:00
Daniel King
de6e6c764e Add MosaicML inference endpoints (#4607)
# Add MosaicML inference endpoints
This PR adds support in langchain for MosaicML inference endpoints. We
both serve a select few open source models, and allow customers to
deploy their own models using our inference service. Docs are here
(https://docs.mosaicml.com/en/latest/inference.html), and sign up form
is here (https://forms.mosaicml.com/demo?utm_source=langchain). I'm not
intimately familiar with the details of langchain, or the contribution
process, so please let me know if there is anything that needs fixing or
this is the wrong way to submit a new integration, thanks!

I'm also not sure what the procedure is for integration tests. I have
tested locally with my api key.

## Who can review?
@hwchase17

---------

Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
2023-05-23 15:59:08 -07:00
Adheeban Manoharan
68f0d45485 Adding Weather Loader (#5056)
Co-authored-by: Tyler Hutcherson <tyler.hutcherson@redis.com>
Co-authored-by: Dev 2049 <dev.dev2049@gmail.com>
2023-05-23 15:57:33 -07:00
Jeff Vestal
0b542a9706 Add ElasticsearchEmbeddings class for generating embeddings using Elasticsearch models (#3401)
This PR introduces a new module, `elasticsearch_embeddings.py`, which
provides a wrapper around Elasticsearch embedding models. The new
ElasticsearchEmbeddings class allows users to generate embeddings for
documents and query texts using a [model deployed in an Elasticsearch
cluster](https://www.elastic.co/guide/en/machine-learning/current/ml-nlp-model-ref.html#ml-nlp-model-ref-text-embedding).

### Main features:

1. The ElasticsearchEmbeddings class initializes with an Elasticsearch
connection object and a model_id, providing an interface to interact
with the Elasticsearch ML client through
[infer_trained_model](https://elasticsearch-py.readthedocs.io/en/v8.7.0/api.html?highlight=trained%20model%20infer#elasticsearch.client.MlClient.infer_trained_model)
.
2. The `embed_documents()` method generates embeddings for a list of
documents, and the `embed_query()` method generates an embedding for a
single query text.
3. The class supports custom input text field names in case the deployed
model expects a different field name than the default `text_field`.
4. The implementation is compatible with any model deployed in
Elasticsearch that generates embeddings as output.

### Benefits:

1. Simplifies the process of generating embeddings using Elasticsearch
models.
2. Provides a clean and intuitive interface to interact with the
Elasticsearch ML client.
3. Allows users to easily integrate Elasticsearch-generated embeddings.

Related issue https://github.com/hwchase17/langchain/issues/3400

---------

Co-authored-by: Dev 2049 <dev.dev2049@gmail.com>
2023-05-23 14:50:33 -07:00
Theodore Rolle
754b5133e9 Improve PlanningOutputParser whitespace handling (#5143)
Some LLM's will produce numbered lists with leading whitespace, i.e. in
response to "What is the sum of 2 and 3?":
```
Plan:
  1. Add 2 and 3.
  2. Given the above steps taken, please respond to the users original question.
```
This commit updates the PlanningOutputParser regex to ignore leading
whitespace before the step number, enabling it to correctly parse this
format.
2023-05-23 12:47:26 -07:00
Tommaso De Lorenzo
5002f3ae35 solving #2887 (#5127)
# Allowing openAI fine-tuned models
Very simple fix that checks whether a openAI `model_name` is a
fine-tuned model when loading `context_size` and when computing call's
cost in the `openai_callback`.

Fixes #2887 
---------

Co-authored-by: Dev 2049 <dev.dev2049@gmail.com>
2023-05-23 11:18:03 -07:00
Myeongseop Kim
7a75bb2121 docs: fix minor typo + add wikipedia package installation part in human_input_llm.ipynb (#5118)
# Fix typo + add wikipedia package installation part in
human_input_llm.ipynb
This PR
1. Fixes typo ("the the human input LLM"), 
2. Addes wikipedia package installation part (in accordance with
`WikipediaQueryRun`
[documentation](https://python.langchain.com/en/latest/modules/agents/tools/examples/wikipedia.html))

in `human_input_llm.ipynb`
(`docs/modules/models/llms/examples/human_input_llm.ipynb`)
2023-05-23 10:59:30 -07:00
Davis Chase
753f4cfc26 bump 178 (#5130) 2023-05-23 07:43:56 -07:00
Ayan Bandyopadhyay
5c87dbf5a8 Add link to Psychic from document loaders documentation page (#5115)
# Add link to Psychic from document loaders documentation page

In my previous PR I forgot to update `document_loaders.rst` to link to
`psychic.ipynb` to make it discoverable from the main documentation.
2023-05-23 06:47:23 -07:00
Tian Wei
d7f807b71f Add AzureCognitiveServicesToolkit to call Azure Cognitive Services API (#5012)
# Add AzureCognitiveServicesToolkit to call Azure Cognitive Services
API: achieve some multimodal capabilities

This PR adds a toolkit named AzureCognitiveServicesToolkit which bundles
the following tools:
- AzureCogsImageAnalysisTool: calls Azure Cognitive Services image
analysis API to extract caption, objects, tags, and text from images.
- AzureCogsFormRecognizerTool: calls Azure Cognitive Services form
recognizer API to extract text, tables, and key-value pairs from
documents.
- AzureCogsSpeech2TextTool: calls Azure Cognitive Services speech to
text API to transcribe speech to text.
- AzureCogsText2SpeechTool: calls Azure Cognitive Services text to
speech API to synthesize text to speech.

This toolkit can be used to process image, document, and audio inputs.
---------

Co-authored-by: Dev 2049 <dev.dev2049@gmail.com>
2023-05-23 06:45:48 -07:00
Jamie Broomall
d4fd589638 WhyLabs callback (#4906)
# Add a WhyLabs callback handler

* Adds a simple WhyLabsCallbackHandler
* Add required dependencies as optional
* protect against missing modules with imports
* Add docs/ecosystem basic example

based on initial prototype from @andrewelizondo

> this integration gathers privacy preserving telemetry on text with
whylogs and sends stastical profiles to WhyLabs platform to monitoring
these metrics over time. For more information on what WhyLabs is see:
https://whylabs.ai

After you run the notebook (if you have env variables set for the API
Keys, org_id and dataset_id) you get something like this in WhyLabs:
![Screenshot
(443)](https://github.com/hwchase17/langchain/assets/88007022/6bdb3e1c-4243-4ae8-b974-23a8bb12edac)

Co-authored-by: Andre Elizondo <andre@whylabs.ai>
Co-authored-by: Dev 2049 <dev.dev2049@gmail.com>
2023-05-22 20:29:47 -07:00
Eugene Yurtsev
d56313acba Improve effeciency of TextSplitter.split_documents, iterate once (#5111)
# Improve TextSplitter.split_documents, collect page_content and
metadata in one iteration

## Who can review?

Community members can review the PR once tests pass. Tag
maintainers/contributors who might be interested:

@eyurtsev In the case where documents is a generator that can only be
iterated once making this change is a huge help. Otherwise a silent
issue happens where metadata is empty for all documents when documents
is a generator. So we expand the argument from `List[Document]` to
`Union[Iterable[Document], Sequence[Document]]`

---------

Co-authored-by: Steven Tartakovsky <tartakovsky.developer@gmail.com>
2023-05-22 23:00:24 -04:00
Jettro Coenradie
b950022894 Fixes issue #5072 - adds additional support to Weaviate (#5085)
Implementation is similar to search_distance and where_filter

# adds 'additional' support to Weaviate queries

Co-authored-by: Dev 2049 <dev.dev2049@gmail.com>
2023-05-22 18:57:10 -07:00
Zander Chase
87bba2e8d3 Pass Dataset Name by Name not Position (#5108)
Pass dataset name by name
2023-05-23 01:21:39 +00:00
Matt Rickard
de6a401a22 Add OpenLM LLM multi-provider (#4993)
OpenLM is a zero-dependency OpenAI-compatible LLM provider that can call
different inference endpoints directly via HTTP. It implements the
OpenAI Completion class so that it can be used as a drop-in replacement
for the OpenAI API. This changeset utilizes BaseOpenAI for minimal added
code.

---------

Co-authored-by: Dev 2049 <dev.dev2049@gmail.com>
2023-05-22 18:09:53 -07:00
Gergely Imreh
69de33e024 Add Mastodon toots loader (#5036)
# Add Mastodon toots loader.

Loader works either with public toots, or Mastodon app credentials. Toot
text and user info is loaded.

I've also added integration test for this new loader as it works with
public data, and a notebook with example output run now.

---------

Co-authored-by: Dev 2049 <dev.dev2049@gmail.com>
2023-05-22 16:43:07 -07:00
mbchang
e173e032bc fix: assign current_time to datetime.now() if current_time is None (#5045)
# Assign `current_time` to `datetime.now()` if it `current_time is None`
in `time_weighted_retriever`

Fixes #4825 

As implemented, `add_documents` in `TimeWeightedVectorStoreRetriever`
assigns `doc.metadata["last_accessed_at"]` and
`doc.metadata["created_at"]` to `datetime.datetime.now()` if
`current_time` is not in `kwargs`.
```python
    def add_documents(self, documents: List[Document], **kwargs: Any) -> List[str]:
        """Add documents to vectorstore."""
        current_time = kwargs.get("current_time", datetime.datetime.now())
        # Avoid mutating input documents
        dup_docs = [deepcopy(d) for d in documents]
        for i, doc in enumerate(dup_docs):
            if "last_accessed_at" not in doc.metadata:
                doc.metadata["last_accessed_at"] = current_time
            if "created_at" not in doc.metadata:
                doc.metadata["created_at"] = current_time
            doc.metadata["buffer_idx"] = len(self.memory_stream) + i
        self.memory_stream.extend(dup_docs)
        return self.vectorstore.add_documents(dup_docs, **kwargs)
``` 
However, from the way `add_documents` is being called from
`GenerativeAgentMemory`, `current_time` is set as a `kwarg`, but it is
given a value of `None`:
```python
    def add_memory(
        self, memory_content: str, now: Optional[datetime] = None
    ) -> List[str]:
        """Add an observation or memory to the agent's memory."""
        importance_score = self._score_memory_importance(memory_content)
        self.aggregate_importance += importance_score
        document = Document(
            page_content=memory_content, metadata={"importance": importance_score}
        )
        result = self.memory_retriever.add_documents([document], current_time=now)
```
The default of `now` was set in #4658 to be None. The proposed fix is
the following:
```python
    def add_documents(self, documents: List[Document], **kwargs: Any) -> List[str]:
        """Add documents to vectorstore."""
        current_time = kwargs.get("current_time", datetime.datetime.now())
        # `current_time` may exist in kwargs, but may still have the value of None.
        if current_time is None:
            current_time = datetime.datetime.now()
```
Alternatively, we could just set the default of `now` to be
`datetime.datetime.now()` everywhere instead. Thoughts @hwchase17? If we
still want to keep the default to be `None`, then this PR should fix the
above issue. If we want to set the default to be
`datetime.datetime.now()` instead, I can update this PR with that
alternative fix. EDIT: seems like from #5018 it looks like we would
prefer to keep the default to be `None`, in which case this PR should
fix the error.
2023-05-22 15:47:03 -07:00
Leonid Ganeline
c28cc0f1ac changed ValueError to ImportError (#5103)
# changed ValueError to ImportError

Code cleaning.
Fixed inconsistencies in ImportError handling. Sometimes it raises
ImportError and sometime ValueError.
I've changed all cases to the `raise ImportError`
Also:
- added installation instruction in the error message, where it missed;
- fixed several installation instructions in the error message;
- fixed several error handling in regards to the ImportError
2023-05-22 15:24:45 -07:00
venetisgr
5e47c648ed Update serpapi.py (#4947)
Added link option in  _process_response

<!--
In _process_respons "snippet" provided non working links for the case
that "links" had the correct answer. Thus added an elif statement before
snippet
-->

<!-- Remove if not applicable -->

Fixes # (issue)
In _process_response link provided correct answers while the snippet
reply provided non working links

@vowelparrot 
## Before submitting

<!-- If you're adding a new integration, include an integration test and
an example notebook showing its use! -->

## Who can review?

Community members can review the PR once tests pass. Tag
maintainers/contributors who might be interested:

<!-- For a quicker response, figure out the right person to tag with @

        @hwchase17 - project lead

        Tracing / Callbacks
        - @agola11

        Async
        - @agola11

        DataLoaders
        - @eyurtsev

        Models
        - @hwchase17
        - @agola11

        Agents / Tools / Toolkits
        - @vowelparrot
        
        VectorStores / Retrievers / Memory
        - @dev2049
        
 -->

---------

Co-authored-by: Dev 2049 <dev.dev2049@gmail.com>
2023-05-22 13:34:36 -07:00
Ankit Arya
5b2b436fab Fixed import error for AutoGPT e.g. from langchain.experimental.auton… (#5101)
`from langchain.experimental.autonomous_agents.autogpt.agent import
AutoGPT` results in an import error as AutoGPT is not defined in the
__init__.py file

https://python.langchain.com/en/latest/use_cases/autonomous_agents/marathon_times.html

An Alternate, way would be to be directly update the import statement to
be `from langchain.experimental import AutoGPT`

Co-authored-by: Dev 2049 <dev.dev2049@gmail.com>
2023-05-22 13:26:25 -07:00
Ankush Gola
467ca6f025 update langchainplus client and docker file to reflect port changes (#5005)
# Currently, only the dev images are updated
2023-05-22 12:53:05 -07:00
Shawn91
9e649462ce fix: add_texts method of Weaviate vector store creats wrong embeddings (#4933)
# fix a bug in the add_texts method of Weaviate vector store that creats
wrong embeddings

The following is the original code in the `add_texts` method of the
Weaviate vector store, from line 131 to 153, which contains a bug. The
code here includes some extra explanations in the form of comments and
some omissions.

```python
            for i, doc in enumerate(texts):

                # some code omitted

                if self._embedding is not None:
                    # variable texts is a list of string and doc here is just a string. 
                    # list(doc) actually breaks up the string into characters.
                    # so, embeddings[0] is just the embedding of the first character
                    embeddings = self._embedding.embed_documents(list(doc))
                    batch.add_data_object(
                        data_object=data_properties,
                        class_name=self._index_name,
                        uuid=_id,
                        vector=embeddings[0],
                    )
```

To fix this bug, I pulled the embedding operation out of the for loop
and embed all texts at once.

Co-authored-by: Shawn91 <zyx199199@qq.com>
Co-authored-by: Dev 2049 <dev.dev2049@gmail.com>
2023-05-22 12:35:52 -07:00
Eduard van Valkenburg
1cb04f2b26 PowerBI major refinement in working of tool and tweaks in the rest (#5090)
# PowerBI major refinement in working of tool and tweaks in the rest

I've gained some experience with more complex sets and the earlier
implementation had too many tries by the agent to create DAX, so
refactored the code to run the LLM to create dax based on a question and
then immediately run the same against the dataset, with retries and a
prompt that includes the error for the retry. This works much better!

Also did some other refactoring of the inner workings, making things
clearer, more concise and faster.
2023-05-22 11:58:28 -07:00
hwaking
e57ebf3922 add get_top_k_cosine_similarity method to get max top k score and index (#5059)
# Row-wise cosine similarity between two equal-width matrices and return
the max top_k score and index, the score all greater than
threshold_score.

Co-authored-by: Dev 2049 <dev.dev2049@gmail.com>
2023-05-22 11:55:48 -07:00
Donger
039f8f1abb Add the usage of SSL certificates for Elasticsearch and user password authentication (#5058)
Enhance the code to support SSL authentication for Elasticsearch when
using the VectorStore module, as previous versions did not provide this
capability.
@dev2049

---------

Co-authored-by: caidong <zhucaidong1992@gmail.com>
Co-authored-by: Dev 2049 <dev.dev2049@gmail.com>
2023-05-22 11:51:32 -07:00
Andreas Liebschner
44dc959584 Improve pinecone hybrid search retriever adding metadata support (#5098)
# Improve pinecone hybrid search retriever adding metadata support

I simply remove the hardwiring of metadata to the existing
implementation allowing one to pass `metadatas` attribute to the
constructors and in `get_relevant_documents`. I also add one missing pip
install to the accompanying notebook (I am not adding dependencies, they
were pre-existing).

First contribution, just hoping to help, feel free to critique :) 
my twitter username is `@andreliebschner`

While looking at hybrid search I noticed #3043 and #1743. I think the
former can be closed as following the example right now (even prior to
my improvements) works just fine, the latter I think can be also closed
safely, maybe pointing out the relevant classes and example. Should I
reply those issues mentioning someone?

@dev2049, @hwchase17

---------

Co-authored-by: Andreas Liebschner <a.liebschner@shopfully.com>
2023-05-22 11:42:54 -07:00
Deepak S V
5cd12102be Improving Resilience of MRKL Agent (#5014)
This is a highly optimized update to the pull request
https://github.com/hwchase17/langchain/pull/3269

Summary:
1) Added ability to MRKL agent to self solve the ValueError(f"Could not
parse LLM output: `{llm_output}`") error, whenever llm (especially
gpt-3.5-turbo) does not follow the format of MRKL Agent, while returning
"Action:" & "Action Input:".
2) The way I am solving this error is by responding back to the llm with
the messages "Invalid Format: Missing 'Action:' after 'Thought:'" &
"Invalid Format: Missing 'Action Input:' after 'Action:'" whenever
Action: and Action Input: are not present in the llm output
respectively.

For a detailed explanation, look at the previous pull request.

New Updates:
1) Since @hwchase17 , requested in the previous PR to communicate the
self correction (error) message, using the OutputParserException, I have
added new ability to the OutputParserException class to store the
observation & previous llm_output in order to communicate it to the next
Agent's prompt. This is done, without breaking/modifying any of the
functionality OutputParserException previously performs (i.e.
OutputParserException can be used in the same way as before, without
passing any observation & previous llm_output too).

---------

Co-authored-by: Deepak S V <svdeepak99@users.noreply.github.com>
2023-05-22 11:08:08 -07:00
Michael Landis
6eacd88ae7 fix: revert docarray explicit transitive dependencies and use extras instead (#5015)
tldr: The docarray [integration
PR](https://github.com/hwchase17/langchain/pull/4483) introduced a
pinned dependency to protobuf. This is a docarray dependency, not a
langchain dependency. Since this is handled by the docarray
dependencies, it is unnecessary here.

Further, as a pinned dependency, this quickly leads to incompatibilities
with application code that consumes the library. Much less with a
heavily used library like protobuf.

Detail: as we see in the [docarray

integration](https://github.com/hwchase17/langchain/pull/4483/files#diff-50c86b7ed8ac2cf95bd48334961bf0530cdc77b5a56f852c5c61b89d735fd711R81-R83),
the transitive dependencies of docarray were also listed as langchain
dependencies. This is unnecessary as the docarray project has an
appropriate
[extras](a01a05542d/pyproject.toml (L70)).
The docarray project also does not require this _pinned_ version of
protobuf, rather [a minimum
version](a01a05542d/pyproject.toml (L41)).
So this pinned version was likely in error.

To fix this, this PR reverts the explicit hnswlib and protobuf
dependencies and adds the hnswlib extras install for docarray (which
installs hnswlib and protobuf, as originally intended). Because version
`0.32.0`
of the docarray hnswlib extras added protobuf, we bump the docarray
dependency from `^0.31.0` to `^0.32.0`.

# revert docarray explicit transitive dependencies and use extras
instead

## Who can review?

@dev2049 -- reviewed the original PR
@eyurtsev -- bumped the pinned protobuf dependency a few days ago

---------

Co-authored-by: Dev 2049 <dev.dev2049@gmail.com>
2023-05-22 12:48:09 -04:00
Davis Chase
fcd88bccb3 Bump 177 (#5095) 2023-05-22 08:19:06 -07:00
Harrison Chase
10ba201d05 Harrison/neo4j (#5078)
Co-authored-by: Tomaz Bratanic <bratanic.tomaz@gmail.com>
Co-authored-by: Dev 2049 <dev.dev2049@gmail.com>
2023-05-22 07:31:48 -07:00
Deepak S V
49ca02711e Improved query, print & exception handling in REPL Tool (#4997)
Update to pull request https://github.com/hwchase17/langchain/pull/3215

Summary:
1) Improved the sanitization of query (using regex), by removing python
command (since gpt-3.5-turbo sometimes assumes python console as a
terminal, and runs python command first which causes error). Also
sometimes 1 line python codes contain single backticks.
2) Added 7 new test cases.

For more details, view the previous pull request.

---------

Co-authored-by: Deepak S V <svdeepak99@users.noreply.github.com>
2023-05-22 13:43:44 +00:00
Zander Chase
785502edb3 Add 'get_token_ids' method (#4784)
Let user inspect the token ids in addition to getting th enumber of tokens

---------

Co-authored-by: Zach Schillaci <40636930+zachschillaci27@users.noreply.github.com>
2023-05-22 13:17:26 +00:00
Zander Chase
ef7d015be5 Separate Runner Functions from Client (#5079)
Extract the methods specific to running an LLM or Chain on a dataset to
separate utility functions.

This simplifies the client a bit and lets us separate concerns of LCP
details from running examples (e.g., for evals)
2023-05-22 05:28:47 +00:00
Leonid Ganeline
443ebe22f4 docs: Deployments page moved into Ecosystem/ (#4949)
# docs: `deployments` page moved into `ecosystem/`

The `Deployments` page moved into the `Ecosystem/` group

Small fixes:
- `index` page: fixed order of items in the `Modules` list, in the `Use
Cases` list
- item `References/Installation` was lost in the `index` page (not on
the Navbar!). Restored it.
- added `|` marker in several places.

NOTE: I also thought about moving the `Additional Resources/Gallery`
page into the `Ecosystem` group but decided to leave it unchanged.
Please, advise on this.

## Who can review?

Community members can review the PR once tests pass. Tag
maintainers/contributors who might be interested:
@dev2049
2023-05-21 21:18:22 -07:00
Hans van Dam
a395ff7c90 preserve language in conversation retrieval (#4969)
Without the addition of 'in its original language', the condensing
response, more often than not, outputs the rephrased question in
English, even when the conversation is in another language. This
question in English then transfers to the question in the retrieval
prompt and the chatbot is stuck in English.

I'm sometimes surprised that this does not happen more often, but
apparently the GPT models are smart enough to understand that when the
template contains

Question: ....
Answer:

then the answer should be in in the language of the question.
2023-05-21 21:16:03 -07:00
Matt Robinson
bf3f554357 feat: batch multiple files in a single Unstructured API request (#4525)
### Submit Multiple Files to the Unstructured API

Enables batching multiple files into a single Unstructured API requests.
Support for requests with multiple files was added to both
`UnstructuredAPIFileLoader` and `UnstructuredAPIFileIOLoader`. Note that
if you submit multiple files in "single" mode, the result will be
concatenated into a single document. We recommend using this feature in
"elements" mode.

### Testing

The following should load both documents, using two of the example docs
from the integration tests folder.

```python
    from langchain.document_loaders import UnstructuredAPIFileLoader

    file_paths = ["examples/layout-parser-paper.pdf",  "examples/whatsapp_chat.txt"]

    loader = UnstructuredAPIFileLoader(
        file_paths=file_paths,
        api_key="FAKE_API_KEY",
        strategy="fast",
        mode="elements",
    )
    docs = loader.load()
```
2023-05-21 20:48:20 -07:00
Harrison Chase
0c3de0a0b3 Merge branch 'master' of github.com:hwchase17/langchain 2023-05-21 09:22:43 -07:00
Harrison Chase
224f73e978 move docs 2023-05-21 09:22:35 -07:00
Harrison Chase
6c25f860fd bump to 176 (#5064) 2023-05-21 09:19:25 -07:00
Harrison Chase
b0431c672b Harrison/psychic (#5063)
Co-authored-by: Ayan Bandyopadhyay <ayanb9440@gmail.com>
Co-authored-by: Dev 2049 <dev.dev2049@gmail.com>
2023-05-21 09:13:20 -07:00
Harrison Chase
8c661baefb change to type checking (#5062) 2023-05-21 09:09:49 -07:00
Jeffrey Zheng
424a573266 DOC: Misspelling in agents.rst documentation (#5038)
# Corrected Misspelling in agents.rst Documentation

<!--
Thank you for contributing to LangChain! Your PR will appear in our next
release under the title you set. Please make sure it highlights your
valuable contribution.

Replace this with a description of the change, the issue it fixes (if
applicable), and relevant context. List any dependencies required for
this change.

After you're done, someone will review your PR. They may suggest
improvements. If no one reviews your PR within a few days, feel free to
@-mention the same people again, as notifications can get
-->

In the
[documentation](https://python.langchain.com/en/latest/modules/agents.html)
it says "in fact, it is often best to have an Action Agent be in
**change** of the execution for the Plan and Execute agent."

**Suggested Change:** I propose correcting change to charge.

Fix for issue: #5039
2023-05-20 22:24:08 -07:00
Gengliang Wang
f9f08c4b69 Add documentation for Databricks integration (#5013)
# Add documentation for Databricks integration

This is a follow-up of https://github.com/hwchase17/langchain/pull/4702
It documents the details of how to integrate Databricks using langchain.
It also provides examples in a notebook.


## Who can review?
@dev2049 @hwchase17 since you are aware of the context. We will promote
the integration after this doc is ready. Thanks in advance!
2023-05-20 22:06:24 -07:00
tornikeo
a6ef20d7fe Fix annoying typo in docs (#5029)
# Fixes an annoying typo in docs

<!--
Thank you for contributing to LangChain! Your PR will appear in our next
release under the title you set. Please make sure it highlights your
valuable contribution.

Replace this with a description of the change, the issue it fixes (if
applicable), and relevant context. List any dependencies required for
this change.

After you're done, someone will review your PR. They may suggest
improvements. If no one reviews your PR within a few days, feel free to
@-mention the same people again, as notifications can get lost.
-->

<!-- Remove if not applicable -->

Fixes Annoying typo in docs - "Therefor" -> "Therefore". It's so
annoying to read that I just had to make this PR.
2023-05-20 22:02:21 -07:00
Davis Chase
9d1280d451 bump v175 (#5041) 2023-05-20 09:24:17 -07:00
UmerHA
7388248b3e Streaming only final output of agent (#2483) (#4630)
# Streaming only final output of agent (#2483)
As requested in issue #2483, this Callback allows to stream only the
final output of an agent (ie not the intermediate steps).

Fixes #2483

Co-authored-by: Dev 2049 <dev.dev2049@gmail.com>
2023-05-20 09:20:17 -07:00
Davis Chase
3bc0bf0079 fix prompt saving (#4987)
will add unit tests
2023-05-20 08:21:52 -07:00
Zander Chase
27e63b977a Add logs command (#5007)
to the plus server
2023-05-20 00:06:17 +00:00
Marcus Winter
2aa3754024 Check for single prompt in __call__ method of the BaseLLM class (#4892)
# Ensuring that users pass a single prompt when calling a LLM 

- This PR adds a check to the `__call__` method of the `BaseLLM` class
to ensure that it is called with a single prompt
- Raises a `ValueError` if users try to call a LLM with a list of prompt
and instructs them to use the `generate` method instead

## Why this could be useful

I stumbled across this by accident. I accidentally called the OpenAI LLM
with a list of prompts instead of a single string and still got a
result:

```
>>> from langchain.llms import OpenAI
>>> llm = OpenAI()
>>> llm(["Tell a joke"]*2)
"\n\nQ: Why don't scientists trust atoms?\nA: Because they make up everything!"
```

It might be better to catch such a scenario preventing unnecessary costs
and irritation for the user.

## Proposed behaviour

```
>>> from langchain.llms import OpenAI
>>> llm = OpenAI()
>>> llm(["Tell a joke"]*2)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/marcus/Projects/langchain/langchain/llms/base.py", line 291, in __call__
    raise ValueError(
ValueError: Argument `prompt` is expected to be a single string, not a list. If you want to run the LLM on multiple prompts, use `generate` instead.
```
2023-05-19 16:54:26 -07:00
domchan
6c60251f52 Add self query translator for weaviate vectorstore (#4804)
# Add self query translator for weaviate vectorstore

Adds support for the EQ comparator and the AND/OR operators. 

Co-authored-by: Dominic Chan <dchan@cppib.com>
Co-authored-by: Dev 2049 <dev.dev2049@gmail.com>
2023-05-19 16:41:12 -07:00
Davis Chase
9928fb2193 Revert "API update: Engines -> Models (#4915)" (#5008)
This reverts commit 8c28ad6dac.

Seems to be causing #5001
2023-05-19 16:38:08 -07:00
SimFG
f07b9fde74 Update the GPTCache example (#4985)
# Update the GPTCache example

Fixes #4757
2023-05-19 16:35:36 -07:00
Leonid Ganeline
ddc2d4c21e added instruction about pip install google-gerativeai (#5004)
# added instruction about pip install google-gerativeai

added instruction about pip install google-gerativeai
2023-05-19 15:32:24 -07:00
Nicolas
02632d52b3 docs: Big Mendable Improvements (#4964)
- Higher accuracy on the responses
- New redesigned UI
- Pretty Sources: display the sources by title / sub-section instead of
long URL.
- Fixed Reset Button bugs and some other UI issues
- Other tweaks
2023-05-19 15:31:48 -07:00
Leonid Ganeline
2ab0e1d526 changed ValueError to ImportError (#5006)
# changed ValueError to ImportError in except

Several places with this bug. ValueError does not catch ImportError.
2023-05-19 15:28:08 -07:00
Davis Chase
080eb1b3fc Fix graphql tool (#4984)
Fix construction and add unit test.
2023-05-19 15:27:50 -07:00
Mike McGarry
ddd595fe81 feature/4493 Improve Evernote Document Loader (#4577)
# Improve Evernote Document Loader

When exporting from Evernote you may export more than one note.
Currently the Evernote loader concatenates the content of all notes in
the export into a single document and only attaches the name of the
export file as metadata on the document.

This change ensures that each note is loaded as an independent document
and all available metadata on the note e.g. author, title, created,
updated are added as metadata on each document.

It also uses an existing optional dependency of `html2text` instead of
`pypandoc` to remove the need to download the pandoc application via
`download_pandoc()` to be able to use the `pypandoc` python bindings.

Fixes #4493 

Co-authored-by: Mike McGarry <mike.mcgarry@finbourne.com>
Co-authored-by: Dev 2049 <dev.dev2049@gmail.com>
2023-05-19 14:28:17 -07:00
Juanma Tristancho
729e935ea4 PGVector logger message level (#4920)
# Change the logger message level

The library is logging at `error` level a situation that is not an
error.
We noticed this error in our logs, but from our point of view it's an
expected behavior and the log level should be `warning`.
2023-05-19 14:01:26 -07:00
Peng Wang
62d0a01a0f Update python.py (#4971)
# Delete a useless "print"
2023-05-19 13:57:16 -07:00
Eugene Yurtsev
0ff59569dc Adds 'IN' metadata filter for pgvector for checking set presence (#4982)
# Adds "IN" metadata filter for pgvector to all checking for set
presence

PGVector currently supports metadata filters of the form:
```
{"filter": {"key": "value"}}
```
which will return documents where the "key" metadata field is equal to
"value".

This PR adds support for metadata filters of the form:
```
{"filter": {"key": { "IN" : ["list", "of", "values"]}}}
```

Other vector stores support this via an "$in" syntax. I chose to use
"IN" to match postgres' syntax, though happy to switch.
Tested locally with PGVector and ChatVectorDBChain.


@dev2049

---------

Co-authored-by: jade@spanninglabs.com <jade@spanninglabs.com>
2023-05-19 13:53:23 -07:00
Davis Chase
56cb77a828 Make test gha workflow manually runnable (#4998)
if https://docs.github.com/en/actions/using-workflows/events-that-trigger-workflows#workflow_dispatch
is to be believed this should make it possible to manually kick of test
workflow, but i don't know much about these things
2023-05-19 13:46:33 -07:00
Jiaping(JP) Zhang
22d844dc07 Add async search with relevance score (#4558)
Add the async version for the search with relevance score

Co-authored-by: Dev 2049 <dev.dev2049@gmail.com>
2023-05-19 13:05:24 -07:00
Adheeban Manoharan
616e9a93e0 Bug fixes and error handling in Redis - Vectorstore (#4932)
# Bug fixes in Redis - Vectorstore (Added the version of redis to the
error message and removed the cls argument from a classmethod)


Co-authored-by: Tyler Hutcherson <tyler.hutcherson@redis.com>
2023-05-19 13:02:03 -07:00
Gengliang Wang
a87a2524c7 Remove autoreload in examples (#4994)
# Remove autoreload in examples
Remove the `autoreload` in examples since it is not necessary for most
users:
```
%load_ext autoreload,
%autoreload 2
```
2023-05-19 17:35:58 +00:00
Davis Chase
2abf6b9f17 bump v0.0.174 (#4988) 2023-05-19 09:34:28 -07:00
Eugene Yurtsev
06e524416c power bi api wrapper integration tests & bug fix (#4983)
# Powerbi API wrapper bug fix + integration tests

- Bug fix by removing `TYPE_CHECKING` in in utilities/powerbi.py
- Added integration test for power bi api in
utilities/test_powerbi_api.py
- Added integration test for power bi agent in
agent/test_powerbi_agent.py
- Edited .env.examples to help set up power bi related environment
variables
- Updated demo notebook with working code in
docs../examples/powerbi.ipynb - AzureOpenAI -> ChatOpenAI

Notes: 

Chat models (gpt3.5, gpt4) are much more capable than davinci at writing
DAX queries, so that is important to getting the agent to work properly.
Interestingly, gpt3.5-turbo needed the examples=DEFAULT_FEWSHOT_EXAMPLES
to write consistent DAX queries, so gpt4 seems necessary as the smart
llm.

Fixes #4325

## Before submitting

Azure-core and Azure-identity are necessary dependencies

check integration tests with the following:
`pytest tests/integration_tests/utilities/test_powerbi_api.py`
`pytest tests/integration_tests/agent/test_powerbi_agent.py`

You will need a power bi account with a dataset id + table name in order
to test. See .env.examples for details.

## Who can review?
@hwchase17
@vowelparrot

---------

Co-authored-by: aditya-pethe <adityapethe1@gmail.com>
2023-05-19 11:25:52 -04:00
Viswanadh Rayavarapu
e68dfa7062 Update planner_prompt.py (#4967)
Typos in the OpenAPI agent Prompt.
2023-05-19 11:17:10 -04:00
Edrick Da Corte Henriquez
e80585bab0 Update tutorials.md (#4960)
# Added a YouTube Tutorial

Added a LangChain tutorial playlist aimed at onboarding newcomers to
LangChain and its use cases.

I've shared the video in the #tutorials channel and it seemed to be well
received. I think this could be useful to the greater community.

## Who can review?

@dev2049
2023-05-19 10:40:14 -04:00
Rahul Rao
13c376345e Fixed assumptions misspelling (#4961)
Fixed assumptions misspelling in the link mentioned below:-


https://python.langchain.com/en/latest/modules/chains/examples/llm_summarization_checker.html


![image](https://github.com/hwchase17/langchain/assets/16189966/94cf2be0-b3d0-495b-98ad-e1f44331727e)

Fix for Issue:- #4959 

@hwchase17
2023-05-19 10:40:04 -04:00
Gengliang Wang
bf5a3c6dec Support Databricks in SQLDatabase (#4702)
This PR adds support for Databricks runtime and Databricks SQL by using
[Databricks SQL Connector for
Python](https://docs.databricks.com/dev-tools/python-sql-connector.html).
As a cloud data platform, accessing Databricks requires a URL as follows

`databricks://token:{api_token}@{hostname}?http_path={http_path}&catalog={catalog}&schema={schema}`.

**The URL is **complicated** and it may take users a while to figure it
out**. Since the fields `api_token`/`hostname`/`http_path` fields are
known in the Databricks notebook, I am proposing a new method
`from_databricks` to simplify the connection to Databricks.

## In Databricks Notebook
After changes, Databricks users only need to specify the `catalog` and
`schema` field when using langchain.
<img width="881" alt="image"
src="https://github.com/hwchase17/langchain/assets/1097932/984b4c57-4c2d-489d-b060-5f4918ef2f37">

## In Jupyter Notebook
The method can be used on the local setup as well:
<img width="678" alt="image"
src="https://github.com/hwchase17/langchain/assets/1097932/142e8805-a6ef-4919-b28e-9796ca31ef19">
2023-05-19 00:42:06 -07:00
Harrison Chase
88a3a56c1a Add Spark SQL support (#4602) (#4956)
# Add Spark SQL support 
* Add Spark SQL support. It can connect to Spark via building a
local/remote SparkSession.
* Include a notebook example

I tried some complicated queries (window function, table joins), and the
tool works well.
Compared to the [Spark Dataframe

agent](https://python.langchain.com/en/latest/modules/agents/toolkits/examples/spark.html),
this tool is able to generate queries across multiple tables.

---------

# Your PR Title (What it does)

<!--
Thank you for contributing to LangChain! Your PR will appear in our next
release under the title you set. Please make sure it highlights your
valuable contribution.

Replace this with a description of the change, the issue it fixes (if
applicable), and relevant context. List any dependencies required for
this change.

After you're done, someone will review your PR. They may suggest
improvements. If no one reviews your PR within a few days, feel free to
@-mention the same people again, as notifications can get lost.
-->

<!-- Remove if not applicable -->

Fixes # (issue)

## Before submitting

<!-- If you're adding a new integration, include an integration test and
an example notebook showing its use! -->

## Who can review?

Community members can review the PR once tests pass. Tag
maintainers/contributors who might be interested:

<!-- For a quicker response, figure out the right person to tag with @

        @hwchase17 - project lead

        Tracing / Callbacks
        - @agola11

        Async
        - @agola11

        DataLoaders
        - @eyurtsev

        Models
        - @hwchase17
        - @agola11

        Agents / Tools / Toolkits
        - @vowelparrot
        
        VectorStores / Retrievers / Memory
        - @dev2049
        
 -->

---------

Co-authored-by: Gengliang Wang <gengliang@apache.org>
Co-authored-by: Mike W <62768671+skcoirz@users.noreply.github.com>
Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>
Co-authored-by: UmerHA <40663591+UmerHA@users.noreply.github.com>
Co-authored-by: 张城铭 <z@hyperf.io>
Co-authored-by: assert <zhangchengming@kkguan.com>
Co-authored-by: blob42 <spike@w530>
Co-authored-by: Yuekai Zhang <zhangyuekai@foxmail.com>
Co-authored-by: Richard He <he.yucheng@outlook.com>
Co-authored-by: Dev 2049 <dev.dev2049@gmail.com>
Co-authored-by: Leonid Ganeline <leo.gan.57@gmail.com>
Co-authored-by: Alexey Nominas <60900649+Chae4ek@users.noreply.github.com>
Co-authored-by: elBarkey <elbarkey@gmail.com>
Co-authored-by: Davis Chase <130488702+dev2049@users.noreply.github.com>
Co-authored-by: Jeffrey D <1289344+verygoodsoftwarenotvirus@users.noreply.github.com>
Co-authored-by: so2liu <yangliu35@outlook.com>
Co-authored-by: Viswanadh Rayavarapu <44315599+vishwa-rn@users.noreply.github.com>
Co-authored-by: Chakib Ben Ziane <contact@blob42.xyz>
Co-authored-by: Daniel Chalef <131175+danielchalef@users.noreply.github.com>
Co-authored-by: Daniel Chalef <daniel.chalef@private.org>
Co-authored-by: Jari Bakken <jari.bakken@gmail.com>
Co-authored-by: escafati <scafatieugenio@gmail.com>
2023-05-18 20:53:08 -07:00
Harrison Chase
5feb60f426 Harrison/spell executor (#4914)
Co-authored-by: Jan Minar <rdancer@rdancer.org>
2023-05-18 20:43:33 -07:00
Aidan Boland
c06973261a Fix for syntax when setting search_path for Snowflake database (#4747)
# Fixes syntax for setting Snowflake database search_path

An error occurs when using a Snowflake database and providing a schema
argument.
I have updated the syntax to run a Snowflake specific query when the
database dialect is 'snowflake'.
2023-05-18 20:30:38 -07:00
Mike Wang
db6f7ed0ba [nit] Simplify Spark Creation Validation Check A Little Bit (#4761)
- simplify the validation check a little bit.
- re-tested in jupyter notebook.

Reviewer: @hwchase17
2023-05-18 18:57:54 -07:00
escafati
e027a38f33 NIT: Instead of hardcoding k in each definition, define it as a param above. (#2675)
Co-authored-by: Dev 2049 <dev.dev2049@gmail.com>
Co-authored-by: Davis Chase <130488702+dev2049@users.noreply.github.com>
2023-05-18 17:35:31 -07:00
Jari Bakken
3df2d831f9 Fix get_num_tokens for Anthropic models (#4911)
The Anthropic classes used `BaseLanguageModel.get_num_tokens` because of
an issue with multiple inheritance. Fixed by moving the method from
`_AnthropicCommon` to both its subclasses.

This change will significantly speed up token counting for Anthropic
users.
2023-05-18 16:32:27 -07:00
Daniel Chalef
c8c2276ccb Zep Retriever - Vector Search Over Chat History (#4533)
# Zep Retriever - Vector Search Over Chat History with the Zep Long-term
Memory Service

More on Zep: https://github.com/getzep/zep

Note: This PR is related to and relies on
https://github.com/hwchase17/langchain/pull/4834. I did not want to
modify the `pyproject.toml` file to add the `zep-python` dependency a
second time.

Co-authored-by: Daniel Chalef <daniel.chalef@private.org>
2023-05-18 16:27:18 -07:00
Chakib Ben Ziane
5525b704cc Chatconv agent: output parser exception (#4923)
the output parser form chat conversational agent now raises
`OutputParserException` like the rest.

The `raise OutputParserExeption(...) from e` form also carries through
the original error details on what went wrong.

I added the `ValueError` as a base class to `OutputParserException` to
avoid breaking code that was relying on `ValueError` as a way to catch
exceptions from the agent. So catching ValuError still works. Not sure
if this is a good idea though ?
2023-05-18 16:20:35 -07:00
Leonid Ganeline
a9bb3147d7 docs: vectorstores, different updates and fixes (#4939)
# docs: vectorstores, different updates and fixes

Multiple updates:
- added/improved descriptions
- fixed header levels
- added headers
- fixed headers
2023-05-18 15:35:47 -07:00
Leonid Ganeline
8f8593aac5 docs: added ecosystem/dependents page (#4941)
# docs: added `ecosystem/dependents` page

Added `ecosystem/dependents` page. Can we propose a better page name?
2023-05-18 13:11:08 -07:00
Viswanadh Rayavarapu
c9f963e295 Update custom_multi_action_agent.ipynb (#4931)
Updated the docs from 
"An agent consists of three parts:" to 
"An agent consists of two parts:" since there are only two parts in the
documentation
2023-05-18 11:53:12 -07:00
so2liu
3002c1d508 fix: error in gptcache example nb (#4930) 2023-05-18 11:49:45 -07:00
Jeffrey D
7e8e21c914 Correct typo in APIChain example notebook (Farenheit -> Fahrenheit) (#4938)
Correct typo in APIChain example notebook (Farenheit -> Fahrenheit)
2023-05-18 11:48:02 -07:00
Leonid Ganeline
c75c0775e1 docs supabase update (#4935)
# docs: updated `Supabase` notebook

- the title of the notebook was inconsistent (included redundant
"Vectorstore"). Removed this "Vectorstore"
- added `Postgress` to the title. It is important. The `Postgres` name
is much more popular than `Supabase`.
- added description for the `Postrgress`
- added more info to the `Supabase` description
2023-05-18 10:42:08 -07:00
Davis Chase
55baa0d153 Update redis integration tests (#4937) 2023-05-18 10:22:17 -07:00
Davis Chase
440b8761f4 Redis kwargs fix (#4936)
cc @tylerhutcherson
2023-05-18 10:02:46 -07:00
elBarkey
a8ded21b69 FIX: GPTCache cache_obj creation loop (#4827)
_get_gptcache method keep creating new gptcache instance, here's the fix

# Fix GPTCache cache_obj creation loop

Fixes #4830 

Co-authored-by: Dev 2049 <dev.dev2049@gmail.com>
2023-05-18 09:42:35 -07:00
Alexey Nominas
c9e2a01875 Update GPT4ALL integration (#4567)
# Update GPT4ALL integration

GPT4ALL have completely changed their bindings. They use a bit odd
implementation that doesn't fit well into base.py and it will probably
be changed again, so it's a temporary solution.

Fixes #3839, #4628
2023-05-18 09:38:54 -07:00
Leonid Ganeline
e2d7677526 docs: compound ecosystem and integrations (#4870)
# Docs: compound ecosystem and integrations

**Problem statement:** We have a big overlap between the
References/Integrations and Ecosystem/LongChain Ecosystem pages. It
confuses users. It creates a situation when new integration is added
only on one of these pages, which creates even more confusion.
- removed References/Integrations page (but move all its information
into the individual integration pages - in the next PR).
- renamed Ecosystem/LongChain Ecosystem into Integrations/Integrations.
I like the Ecosystem term. It is more generic and semantically richer
than the Integration term. But it mentally overloads users. The
`integration` term is more concrete.
UPDATE: after discussion, the Ecosystem is the term.
Ecosystem/Integrations is the page (in place of Ecosystem/LongChain
Ecosystem).

As a result, a user gets a single place to start with the individual
integration.
2023-05-18 09:29:57 -07:00
Harrison Chase
d5a0704544 dont error on sql import (#4647)
this makes it so we dont throw errors when importing langchain when
sqlalchemy==1.3.1

we dont really want to support 1.3.1 (seems like unneccessary maintance
cost) BUT we would like it to not terribly error should someone decide
to run on it
2023-05-18 09:27:09 -07:00
Harrison Chase
c9a362e482 add alias for model (#4553)
Co-authored-by: Dev 2049 <dev.dev2049@gmail.com>
2023-05-18 09:12:23 -07:00
Richard He
7642f2159c Add human message as input variable to chat agent prompt creation (#4542)
# Add human message as input variable to chat agent prompt creation

This PR adds human message and system message input to
`CHAT_ZERO_SHOT_REACT_DESCRIPTION` agent, similar to [conversational
chat
agent](7bcf238a1a/langchain/agents/conversational_chat/base.py (L64-L71)).

I met this issue trying to use `create_prompt` function when using the
[BabyAGI agent with tools
notebook](https://python.langchain.com/en/latest/use_cases/autonomous_agents/baby_agi_with_agent.html),
since BabyAGI uses “task” instead of “input” input variable. For normal
zero shot react agent this is fine because I can manually change the
suffix to “{input}/n/n{agent_scratchpad}” just like the notebook, but I
cannot do this with conversational chat agent, therefore blocking me to
use BabyAGI with chat zero shot agent.

I tested this in my own project
[Chrome-GPT](https://github.com/richardyc/Chrome-GPT) and this fix
worked.

## Request for review
Agents / Tools / Toolkits
- @vowelparrot
2023-05-18 09:09:31 -07:00
Yuekai Zhang
1ed4228822 Fix bilibili (#4860)
# Fix bilibili api import error

bilibili-api package is depracated and there is no sync module.

<!--
Thank you for contributing to LangChain! Your PR will appear in our next
release under the title you set. Please make sure it highlights your
valuable contribution.

Replace this with a description of the change, the issue it fixes (if
applicable), and relevant context. List any dependencies required for
this change.

After you're done, someone will review your PR. They may suggest
improvements. If no one reviews your PR within a few days, feel free to
@-mention the same people again, as notifications can get lost.
-->

<!-- Remove if not applicable -->

Fixes #2673 #2724 

## Before submitting

<!-- If you're adding a new integration, include an integration test and
an example notebook showing its use! -->

## Who can review?

Community members can review the PR once tests pass. Tag
maintainers/contributors who might be interested:
@vowelparrot  @liaokongVFX 

<!-- For a quicker response, figure out the right person to tag with @

        @hwchase17 - project lead

        Tracing / Callbacks
        - @agola11

        Async
        - @agola11

        DataLoaders
        - @eyurtsev

        Models
        - @hwchase17
        - @agola11

        Agents / Tools / Toolkits
        - @vowelparrot
        
        VectorStores / Retrievers / Memory
        - @dev2049
        
 -->
2023-05-18 09:56:51 -04:00
Eugene Yurtsev
e46202829f feat #4479: TextLoader auto detect encoding and improved exceptions (#4927)
# TextLoader auto detect encoding and enhanced exception handling

- Add an option to enable encoding detection on `TextLoader`. 
- The detection is done using `chardet`
- The loading is done by trying all detected encodings by order of
confidence or raise an exception otherwise.

### New Dependencies:
- `chardet`

Fixes #4479 

## Before submitting

<!-- If you're adding a new integration, include an integration test and
an example notebook showing its use! -->

## Who can review?

Community members can review the PR once tests pass. Tag
maintainers/contributors who might be interested:

- @eyurtsev

---------

Co-authored-by: blob42 <spike@w530>
2023-05-18 09:55:14 -04:00
张城铭
8c28ad6dac API update: Engines -> Models (#4915)
# API update: Engines -> Models

see: https://community.openai.com/t/api-update-engines-models/18597

Co-authored-by: assert <zhangchengming@kkguan.com>
2023-05-18 09:54:42 -04:00
Eugene Yurtsev
c06a47a691 Load specific file types from Google Drive (issue #4878) (#4926)
# Load specific file types from Google Drive (issue #4878)
Add the possibility to define what file types you want to load from
Google Drive.
 
```
 loader = GoogleDriveLoader(
    folder_id="1yucgL9WGgWZdM1TOuKkeghlPizuzMYb5",
    file_types=["document", "pdf"]
    recursive=False
)
```

Fixes ##4878

## Who can review?
Community members can review the PR once tests pass. Tag
maintainers/contributors who might be interested:
DataLoaders
- @eyurtsev

Twitter: [@UmerHAdil](https://twitter.com/@UmerHAdil) | Discord:
RicChilligerDude#7589

---------

Co-authored-by: UmerHA <40663591+UmerHA@users.noreply.github.com>
2023-05-18 09:27:53 -04:00
Harrison Chase
dfbf45f028 bump version to 173 (#4910) 2023-05-17 23:36:45 -07:00
Harrison Chase
b8d48939a2 Harrison/unified objectives (#4905)
Co-authored-by: Matthias Samwald <samwald@gmx.at>
2023-05-17 23:03:57 -07:00
Harrison Chase
9165267f8a Harrison/improved retry tool (#4842) 2023-05-17 21:41:01 -07:00
Harrison Chase
ba023d53ca Harrison/faiss norm (#4903)
Co-authored-by: Jiaxin Shan <seedjeffwan@gmail.com>
2023-05-17 21:40:49 -07:00
Harrison Chase
9e2227ba11 Harrison/serper api bug (#4902)
Co-authored-by: Jerry Luan <xmaswillyou@gmail.com>
2023-05-17 21:40:39 -07:00
Leonid Ganeline
c998569c8f docs: text splitters improvements (#4490)
#docs: text splitters improvements

Changes are only in the Jupyter notebooks.
- added links to the source packages and a short description of these
packages
- removed " Text Splitters" suffixes from the TOC elements (they made
the list of the text splitters messy)
- moved text splitters, based on the length function into a separate
list. They can be mixed with any classes from the "Text Splitters", so
it is a different classification.

## Who can review?
        @hwchase17 - project lead
        @eyurtsev
        @vowelparrot

NOTE: please, check out the results of the `Python code` text splitter
example (text_splitters/examples/python.ipynb). It looks suboptimal.
2023-05-17 21:33:34 -07:00
Steve Kim
613bf9b514 Update getting_started.md (#4482)
# Added another helpful way for developers who want to set OpenAI API
Key dynamically

Previous methods like exporting environment variables are good for
project-wide settings.
But many use cases need to assign API keys dynamically, recently.

```python
from langchain.llms import OpenAI
llm = OpenAI(openai_api_key="OPENAI_API_KEY")
```

## Before submitting
```bash
export OPENAI_API_KEY="..."
```
Or,
```python
import os
os.environ["OPENAI_API_KEY"] = "..."
```

<hr>

Thank you.
Cheers,
Bongsang
2023-05-17 21:32:25 -07:00
Ismael G Serrano
41e2394c9c Fix AzureOpenAI embeddings documentation example. model -> deployment (#4389)
# Documentation for Azure OpenAI embeddings model

- OPENAI_API_VERSION environment variable is needed for the endpoint
- The constructor does not work with model, it works with deployment.

I fixed it in the notebook.

(This is my first contribution)

## Who can review?

@hwchase17 
@agola

Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
2023-05-17 21:05:53 -07:00
Davis Chase
a4ac006658 Update gallery (#4873) 2023-05-17 20:59:41 -07:00
Davis Chase
8966f61ca5 Zep memory (#4898)
Co-authored-by: Daniel Chalef <daniel.chalef@private.org>
Co-authored-by: Daniel Chalef <131175+danielchalef@users.noreply.github.com>
2023-05-17 20:01:01 -07:00
Davis Chase
e28bdf4453 Cadlabs/python tool sanitization (#4754)
Co-authored-by: BenSchZA <BenSchZA@users.noreply.github.com>
2023-05-17 19:46:12 -07:00
Eugene Yurtsev
0dc304ca80 Add html parsers (#4874)
# Add bs4 html parser

* Some minor refactors
* Extract the bs4 html parsing code from the bs html loader
* Move some tests from integration tests to unit tests
2023-05-17 22:39:11 -04:00
Eugene Yurtsev
8e41143bf5 Add a generic document loader (#4875)
# Add generic document loader

* This PR adds a generic document loader which can assemble a loader
from a blob loader and a parser
* Adds a registry for parsers
* Populate registry with a default mimetype based parser


## Expected changes

- Parsing involves loading content via IO so can be sped up via:
  * Threading in sync
  * Async  
- The actual parsing logic may be computatinoally involved: may need to
figure out to add multi-processing support
- May want to add suffix based parser since suffixes are easier to
specify in comparison to mime types

## Before submitting

No notebooks yet, we first need to get a few of the basic parsers up
(prior to advertising the interface)
2023-05-17 22:38:55 -04:00
Davis Chase
df0c33a005 Faiss no avx2 (#4895)
Co-authored-by: Ali Mirlou <alimirlou@gmail.com>
2023-05-17 19:18:57 -07:00
Emil Ahlbäck
5c9205d5f4 ConversationalChatAgent: Allow customizing TEMPLATE_TOOL_RESPONSE (#2361)
It's currently not possible to change the `TEMPLATE_TOOL_RESPONSE`
prompt for ConversationalChatAgent, this PR changes that.

---------

Co-authored-by: Dev 2049 <dev.dev2049@gmail.com>
2023-05-17 17:23:08 -07:00
Zander Chase
1ff7c958b0 Bold Crumbs (#4876) 2023-05-17 22:50:35 +00:00
Alexander Miasoiedov (Myasoedov)
4c3ab55e94 feat(Add FastAPI + Vercel deployment option): (#4520)
# Update deployments doc with langcorn API server

API server example 

```python
from fastapi import FastAPI

from langcorn import create_service

app: FastAPI = create_service(
    "examples.ex1:chain",
    "examples.ex2:chain",
    "examples.ex3:chain",
    "examples.ex4:sequential_chain",
    "examples.ex5:conversation",
    "examples.ex6:conversation_with_summary",
)

```
More examples: https://github.com/msoedov/langcorn/tree/main/examples

Co-authored-by: Dev 2049 <dev.dev2049@gmail.com>
2023-05-17 15:50:25 -07:00
Taqi Jaffri
ef8b5f64bc Tiny code review and docs fix for Docugami DataLoader (#4877)
# Docs and code review fixes for Docugami DataLoader

1. I noticed a couple of hyperlinks that are not loading in the
langchain docs (I guess need explicit anchor tags). Added those.
2. In code review @eyurtsev had a
[suggestion](https://github.com/hwchase17/langchain/pull/4727#discussion_r1194069347)
to allow string paths. Turns out just updating the type works (I tested
locally with string paths).

# Pre-submission checks
I ran `make lint` and `make tests` successfully.

---------

Co-authored-by: Taqi Jaffri <tjaffri@docugami.com>
2023-05-17 15:31:43 -07:00
C.J. Jameson
d6e0b9a43d fix homepage typo (#4883)
# Fix Homepage Typo

## Who can review?

Community members can review the PR once tests pass. Tag
maintainers/contributors who might be interested... not sure
2023-05-17 15:30:23 -07:00
Leonid Ganeline
b96ab4b763 docs retriever improvements (#4430)
# Docs: improvements in the `retrievers/examples/` notebooks

Its primary purpose is to make the Jupyter notebook examples
**consistent** and more suitable for first-time viewers.
- add links to the integration source (if applicable) with a short
description of this source;
- removed `_retriever` suffix from the file names (where it existed) for
consistency;
- removed ` retriever` from the notebook title (where it existed) for
consistency;
- added code to install necessary Python package(s);
- added code to set up the necessary API Key.
- very small fixes in notebooks from other folders (for consistency):
  - docs/modules/indexes/vectorstores/examples/elasticsearch.ipynb
  - docs/modules/indexes/vectorstores/examples/pinecone.ipynb
  - docs/modules/models/llms/integrations/cohere.ipynb
- fixed misspelling in langchain/retrievers/time_weighted_retriever.py
comment (sorry, about this change in a .py file )

## Who can review
@dev2049
2023-05-17 15:29:22 -07:00
Justin Levi Winter
0147f845f1 Update getting_started.ipynb (#4850)
minor grammer issue
2023-05-17 13:19:14 -07:00
Yong Fu
3e12f0957a Remove unused variables in Milvus vectorstore (#4868)
# Remove unused variables in Milvus vectorstore
This PR simply removes a variable unused in Milvus. The variable looks
like a copy-paste from other functions in Milvus but it is really
unnecessary.
2023-05-17 12:00:37 -07:00
Eugene Yurtsev
c5ab9782c6 Add beautiful soup 4 to extended testing extra (#4869)
# Add bs4 to extended testing extra

Updating extended testing extra in preparation for more refactors.
2023-05-17 14:11:26 -04:00
Ryan Culligan
6a9cdc43f5 Fix TypeError in Vectorstore Redis class methods (#4857)
# Fix TypeError in Vectorstore Redis class methods

This change resolves a TypeError that was raised when invoking the
`from_texts_return_keys` method from the `from_texts` method in the
`Redis` class. The error was due to the `cls` argument being passed
explicitly, which led to it being provided twice since it's also
implicitly passed in class methods. No relevant tests were added as the
issue appeared to be better suited for linters to catch proactively.

Changes:
- Removed `cls=cls` from the call to `from_texts_return_keys` in the
`from_texts` method.

Related to:
https://github.com/hwchase17/langchain/pull/4653
2023-05-17 10:48:09 -07:00
Eugene Yurtsev
2d20a1196e Hugging Face Loader: Add lazy load (#4799)
# Add lazy load to HF datasets loader

Unfortunately, there are no tests as far as i can tell. Verified code manually.
2023-05-17 12:04:23 -04:00
Davis Chase
a63ab7ded1 bump 172 (#4864) 2023-05-17 08:54:39 -07:00
yujiosaka
2f8eb95a91 Remove unnecessary comment (#4845)
# Remove unnecessary comment

Remove unnecessary comment accidentally included in #4800

## Before submitting

- no test
- no document

## Who can review?

Community members can review the PR once tests pass. Tag
maintainers/contributors who might be interested:
2023-05-17 11:53:03 -04:00
UmerHA
e257380deb Typos (#4851)
# Fixed typos (issues #4818 & #4668 & more typos)
- At some places, it said `model = ChatOpenAI(model='gpt-3.5-turbo')`
but should be `model = ChatOpenAI(model_name='gpt-3.5-turbo')`
- Fixes some other typos

Fixes #4818, #4668

## Who can review?
Community members can review the PR once tests pass. Tag
maintainers/contributors who might be interested:
        Models
        - @hwchase17
        - @agola11
        Agents / Tools / Toolkits
        - @vowelparrot
2023-05-17 11:52:22 -04:00
Zander Chase
8dcad0f272 Add Support for Flexible Input Format for LLM and Chat Model Runs (#4805)
Previously, the client expected a strict 'prompt' or 'messages' format
and wouldn't permit running a chat model or llm on prompts or messages
(respectively).

Since many datasets may want to specify custom key: string , relax this
requirement.
Also, add support for running a chat model on raw prompts and LLM on
chat messages through their respective fallbacks.
2023-05-17 14:24:17 +00:00
Zander Chase
a47c62fcba Add dev option (#4828)
enable running
```
langchain plus start --dev
```

To use the RC iamges instead
2023-05-17 14:09:25 +00:00
Harrison Chase
720ac49f42 2markdown loader (#4796)
Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>
2023-05-16 23:42:53 -07:00
Ankush Gola
aa73a888fa Some notebook and client fixes (add retries, clean up docs, etc) (#4820)
# Your PR Title (What it does)

<!--
Thank you for contributing to LangChain! Your PR will appear in our next
release under the title you set. Please make sure it highlights your
valuable contribution.

Replace this with a description of the change, the issue it fixes (if
applicable), and relevant context. List any dependencies required for
this change.

After you're done, someone will review your PR. They may suggest
improvements. If no one reviews your PR within a few days, feel free to
@-mention the same people again, as notifications can get lost.
-->

<!-- Remove if not applicable -->

Fixes # (issue)

## Before submitting

<!-- If you're adding a new integration, include an integration test and
an example notebook showing its use! -->

## Who can review?

Community members can review the PR once tests pass. Tag
maintainers/contributors who might be interested:

<!-- For a quicker response, figure out the right person to tag with @

        @hwchase17 - project lead

        Tracing / Callbacks
        - @agola11

        Async
        - @agola11

        DataLoaders
        - @eyurtsev

        Models
        - @hwchase17
        - @agola11

        Agents / Tools / Toolkits
        - @vowelparrot
        
        VectorStores / Retrievers / Memory
        - @dev2049
        
 -->
2023-05-16 20:23:00 -07:00
Davis Chase
0a591da6db Add weaviate by_text (#4824)
Thanks @ZouhairElhadi! Made small change

Closes #4742

---------

Co-authored-by: Zouhair Elhadi <zouhair11elhadi@gmail.com>
Co-authored-by: ZouhairElhadi <87149442+ZouhairElhadi@users.noreply.github.com>
2023-05-16 19:43:15 -07:00
Zander Chase
d1b6839d97 Retry session and tenant (#4822) 2023-05-17 01:54:40 +00:00
Nguyen Trung Duc (john)
49e4aaf673 Fix subclassing OpenAIEmbeddings (#4500)
# Fix subclassing OpenAIEmbeddings

Fixes #4498 

## Before submitting

- Problem: Due to annotated type `Tuple[()]`.
- Fix: Change the annotated type to "Iterable[str]". Even though
tiktoken use
[Collection[str]](095924e02c/tiktoken/core.py (L80))
type annotation, but pydantic doesn't support Collection type, and
[Iterable](https://docs.pydantic.dev/latest/usage/types/#typing-iterables)
is the closest to Collection.
2023-05-16 18:35:19 -07:00
Harrison Chase
08df80bed6 console callback verbose (#4696)
add verbose callback

Co-authored-by: vowelparrot <130414180+vowelparrot@users.noreply.github.com>
2023-05-17 01:28:43 +00:00
David Peterson
d5d4c0a172 Update summarize.ipynb (#4529)
# Update order in which tasks are stated (logically correct)

Fixes the order in which steps are placed under titles.

@vowelparrot
2023-05-16 18:14:00 -07:00
Django
bcffc704c1 fix: agenerate miss run_manager args in llm.py (#4566)
# fix: agenerate miss run_manager args in llm.py

<!--
Thank you for contributing to LangChain! Your PR will appear in our next
release under the title you set. Please make sure it highlights your
valuable contribution.

Replace this with a description of the change, the issue it fixes (if
applicable), and relevant context. List any dependencies required for
this change.

After you're done, someone will review your PR. They may suggest
improvements. If no one reviews your PR within a few days, feel free to
@-mention the same people again, as notifications can get lost.
-->

<!-- Remove if not applicable -->

Fixes # (issue)
fix: agenerate miss run_manager args in llm.py


<!-- For a quicker response, figure out the right person to tag with @

        @hwchase17 - project lead

        Tracing / Callbacks
        - @agola11

        Async
        - @agola11

        DataLoaders
        - @eyurtsev

        Models
        - @hwchase17
        - @agola11

        Agents / Tools / Toolkits
        - @vowelparrot
        
        VectorStores / Retrievers / Memory
        - @dev2049
        
 -->
2023-05-16 17:37:56 -07:00
Brendan Mannix
4e56d3119c update qdrant docs to reflect the proper way to initialize Qdrant() constructor (#4596)
# update qdrant docs to reflect the proper way to initialize Qdrant()
constructor

The [Qdrant
docs](https://python.langchain.com/en/latest/modules/indexes/vectorstores/examples/qdrant.html)
still contain an old reference for passing an `embedding_function` into
the constructor. This is no longer supported.

This PR updates the docs to reflect the proper way to initialize
`Qdrant()`

Old:
![Screenshot 2023-05-12 at 3 06 33
PM](https://github.com/hwchase17/langchain/assets/1552962/dd4063d2-2a07-4340-91bb-e305f7215ddd)

New:
![Screenshot 2023-05-12 at 3 21 09
PM](https://github.com/hwchase17/langchain/assets/1552962/aebc3f63-1a8b-4ca3-93c0-a2ce30dcd282)
2023-05-16 17:30:38 -07:00
Sean Morgan
5372a06a8c DOC: Fix SageMaker example (#4598)
# Fix SageMaker example typing

Since https://github.com/hwchase17/langchain/pull/3249 a new type
`LLMContentHandler` is enforced for SageMaker Endpoints

Fixes #4168
2023-05-16 17:28:16 -07:00
Steve Kim
e90654f39b Added cleaning up the downloaded PDF files (#4601)
ArxivAPIWrapper searches and downloads PDFs to get related information.
But I found that it doesn't delete the downloaded file. The reason why
this is a problem is that a lot of PDF files remain on the server. For
example, one size is about 28M.
So, I added a delete line because it's too big to maintain on the
server.

# Clean up downloaded PDF files
- Changes: Added new line to delete downloaded file
- Background: To get the information on arXiv's paper, ArxivAPIWrapper
class downloads a PDF.
It's a natural approach, but the wrapper retains a lot of PDF files on
the server.
- Problem: One size of PDFs is about 28M. It's too big to maintain on a
small server like AWS.
- Dependency: import os

Thank you.

---------

Co-authored-by: Dev 2049 <dev.dev2049@gmail.com>
2023-05-16 17:26:56 -07:00
Quinn
6fbd5e837f Update planner_prompt.py, change usery to user (#4623)
# Fix misspell in planner_prompt.py

before

```
Usery query: I want to buy a couch
```

after

```
User query: I want to buy a couch
```
2023-05-16 17:24:27 -07:00
Tony Zhang
432421ffa5 [Fix][GenerativeAgent] Get the memory importance score from regex matched group (#4636)
# Get the memory importance score from regex matched group

In `GenerativeAgentMemory`, the `_score_memory_importance()` will make a
prompt to get a rating score. The prompt is:
```
        prompt = PromptTemplate.from_template(
            "On the scale of 1 to 10, where 1 is purely mundane"
            + " (e.g., brushing teeth, making bed) and 10 is"
            + " extremely poignant (e.g., a break up, college"
            + " acceptance), rate the likely poignancy of the"
            + " following piece of memory. Respond with a single integer."
            + "\nMemory: {memory_content}"
            + "\nRating: "
        )
```
For some LLM, it will respond with, for example, `Rating: 8`. Thus we
might want to get the score from the matched regex group.
2023-05-16 16:59:50 -07:00
Daniel Maturana
be405ac139 Query_constructor.base.py function _get_prompt() not including passed examples. (#4680)
The function _get_prompt() was returning the DEFAULT_EXAMPLES even if
some custom examples were given. The return FewShotPromptTemplate was
returnong DEFAULT_EXAMPLES and not examples
2023-05-16 16:31:10 -07:00
Anam Hira
3af448d72e Update huggingface_tools.ipynb (#4700) 2023-05-16 16:28:27 -07:00
rajib
e28f4a5f39 changed cohere.py to update the default model of embedding (#4709)
# The cohere embedding model do not use large, small. It is deprecated.
Changed the modules default model

Fixes #4694


Co-authored-by: rajib76 <rajib76@yahoo.com>
Co-authored-by: Dev 2049 <dev.dev2049@gmail.com>
2023-05-16 16:27:23 -07:00
charosen
75fe9d3555 Add from_file method to message prompt template (#4713)
**Feature**: This PR adds `from_template_file` class method to
BaseStringMessagePromptTemplate. This is useful to help user to create
message prompt templates directly from template files, including
`ChatMessagePromptTemplate`, `HumanMessagePromptTemplate`,
`AIMessagePromptTemplate` & `SystemMessagePromptTemplate`.

**Tests**: Unit tests have been added in this PR.

Co-authored-by: charosen <charosen@bupt.cn>
Co-authored-by: Dev 2049 <dev.dev2049@gmail.com>
2023-05-16 16:25:17 -07:00
Chandan Routray
e8d46bdd9b Replaced SQLDatabaseChain deprecated direct initialisation with from_llm method (#4778)
# Removed usage of deprecated methods

Replaced `SQLDatabaseChain` deprecated direct initialisation with
`from_llm` method

## Who can review?

@hwchase17
@agola11

---------

Co-authored-by: imeckr <chandanroutray2012@gmail.com>
Co-authored-by: Dev 2049 <dev.dev2049@gmail.com>
2023-05-16 15:59:06 -07:00
Chandan Routray
11341fcecb Fixed query checker for SQLDatabaseChain (#4780)
# Fixed query checker for SQLDatabaseChain

When `SQLDatabaseChain`'s llm attribute was deprecated, the query
checker stopped working if `SQLDatabaseChain` is initialised via
`from_llm` method. With this fix, `SQLDatabaseChain`'s query checker
would use the same `llm` as used in the `llm_chain`


## Who can review?
@hwchase17 - project lead

Co-authored-by: imeckr <chandanroutray2012@gmail.com>
2023-05-16 15:58:58 -07:00
Yeong0228
08876ad066 Fix SelfQueryRetriever, passing new query to vector store (#4774)
# Fix SelfQueryRetriever, passing new query to vector store
2023-05-16 15:46:22 -07:00
Mark Pors
8fd4d5d117 Added dependencies to make example executable (#4790)
- Installation of non-colab packages
- Get API keys

# Added dependencies to make notebook executable on hosted notebooks

## Who can review?

Community members can review the PR once tests pass. Tag
maintainers/contributors who might be interested:

@hwchase17
@vowelparrot
2023-05-16 15:46:09 -07:00
Mark Pors
5bc7082e82 Cleanup and added dependencies to make example executable (#4795)
- Installation of non-colab packages
- Get API keys
- Get rid of warnings

# Cleanup and added dependencies to make notebook executable on hosted
notebooks
@hwchase17
@vowelparrot
2023-05-16 15:29:01 -07:00
keenangraham
bcce9a3a92 Fix age inconsistency in plan and execute Jupyter notebook example (#4814)
The current example in
https://python.langchain.com/en/latest/modules/agents/plan_and_execute.html
has inconsistent reasoning step (observing 28 years and thinking it's 26
years):

```
Observation: 28 years
Thought:Based on my search, Gigi Hadid's current age is 26 years old. 
Action:
{
  "action": "Final Answer",
  "action_input": "Gigi Hadid's current age is 26 years old."
}
```

Guessing this is model noise. Rerunning seems to give correct answer of
28 years.
2023-05-16 15:27:27 -07:00
Prateek K. Keshari
61f9c52fc7 Update twitter-the-algorithm-analysis-deeplake.ipynb (#4812)
Changed model to model_name
2023-05-16 15:27:15 -07:00
yujiosaka
6561efebb7 Accept uuids kwargs for weaviate (#4800)
# Accept uuids kwargs for weaviate

Fixes #4791
2023-05-16 15:26:46 -07:00
Adam Quigley
e78c9be312 Add Confluence Loader unit tests (#3333)
Adds some basic unit tests for the ConfluenceLoader that can be extended
later. Ports this [PR from
llama-hub](https://github.com/emptycrown/llama-hub/pull/208) and adapts
it to `langchain`.

@Jflick58 and @zywilliamli adding you here as potential reviewers

---------

Co-authored-by: Dev 2049 <dev.dev2049@gmail.com>
2023-05-16 15:17:07 -07:00
Magnus Friberg
d126276693 Specify which data to return from chromadb (#4393)
# Improve the Chroma get() method by adding the optional "include"
parameter.

The Chroma get() method excludes embeddings by default. You can
customize the response by specifying the "include" parameter to
selectively retrieve the desired data from the collection.

---------

Co-authored-by: Dev 2049 <dev.dev2049@gmail.com>
2023-05-16 14:43:09 -07:00
Raduan Al-Shedivat
00c6ec8a2d fix(document_loaders/telegram): fix pandas calls + add tests (#4806)
# Fix Telegram API loader + add tests.
I was testing this integration and it was broken with next error:
```python
message_threads = loader._get_message_threads(df)
KeyError: False
```
Also, this particular loader didn't have any tests / related group in
poetry, so I added those as well.

@hwchase17 / @eyurtsev please take a look on this fix PR.

---------

Co-authored-by: Dev 2049 <dev.dev2049@gmail.com>
2023-05-16 14:35:25 -07:00
Zander Chase
206c87d525 Change server start name (#4811)
to `langchain plus start/stop`
2023-05-16 20:04:09 +00:00
Eugene Yurtsev
255690d78e Catch changes to test group (#4802)
# Catch changes to test group

Add test to catch changes to test group.
2023-05-16 14:48:56 -04:00
Eugene Yurtsev
c3b6129beb Block sockets for unit-tests (#4803)
# Block usage of sockets during unit tests

Catch any tests that attempt to use the network.
2023-05-16 14:41:24 -04:00
了空
f7e3d97b19 Remove unnecessary spaces from document object’s page_content of BiliBiliLoader (#4619)
- Remove unnecessary spaces from document object’s page_content of
BiliBiliLoader
- Fix BiliBiliLoader document and test file
2023-05-16 13:13:57 -04:00
Eugene Yurtsev
f47ec5b4b6 Docugami docs: First cell should be a title cell (#4735)
# Make first cell a title in docugami docs

This makes the first cell a title cell in docugami notebook
2023-05-16 13:12:14 -04:00
Eugene Yurtsev
d403f659ea Update google protobuf dep (#4798)
# Update google protobuf dep

Resolve: https://github.com/hwchase17/langchain/security/dependabot/11
2023-05-16 12:25:07 -04:00
Eugene Yurtsev
3ecd7c9641 Add check to verify poetry.toml (#4794)
# Add poetry check to github action

Check poetry toml file during tests for errors
2023-05-16 11:53:06 -04:00
Ikko Eltociear Ashimine
f5a476fdd4 Fix typo in dataframe.py (#4786)
# Fix typo in dataframe.py (#4786)

Fixed typo.
```
yeild -> yield
```
2023-05-16 11:49:04 -04:00
Eugene Yurtsev
14bedf1cc5 Github Action: Fix poetry lock file checking (#4789)
Fix how poetry lock file is checked to avoid skipping caches silently.
2023-05-16 11:40:28 -04:00
Davis Chase
7ce43372c3 Version 171 (#4788) 2023-05-16 08:24:45 -07:00
Zander Chase
bee136efa4 Update Tracing Walkthrough (#4760)
Add client methods to read / list runs and sessions.

Update walkthrough to:
- Let the user create a dataset from the runs without going to the UI
- Use the new CLI command to start the server

Improve the error message when `docker` isn't found
2023-05-16 13:26:43 +00:00
Zander Chase
fc0a3c8500 Persist Volume After Stop (#4763)
Previously, the data would be removed after shutting down the server.
This mounts a db volume that isn't erased between calls
2023-05-16 13:10:13 +00:00
Harrison Chase
a7af32c274 Cassandra support for chat history (#4378) (#4764)
# Cassandra support for chat history

### Description

- Store chat messages in cassandra

### Dependency

- cassandra-driver - Python Module

## Before submitting

- Added Integration Test

## Who can review?

@hwchase17
@agola11

# Your PR Title (What it does)

<!--
Thank you for contributing to LangChain! Your PR will appear in our next
release under the title you set. Please make sure it highlights your
valuable contribution.

Replace this with a description of the change, the issue it fixes (if
applicable), and relevant context. List any dependencies required for
this change.

After you're done, someone will review your PR. They may suggest
improvements. If no one reviews your PR within a few days, feel free to
@-mention the same people again, as notifications can get lost.
-->

<!-- Remove if not applicable -->

Fixes # (issue)

## Before submitting

<!-- If you're adding a new integration, include an integration test and
an example notebook showing its use! -->

## Who can review?

Community members can review the PR once tests pass. Tag
maintainers/contributors who might be interested:

<!-- For a quicker response, figure out the right person to tag with @

        @hwchase17 - project lead

        Tracing / Callbacks
        - @agola11

        Async
        - @agola11

        DataLoaders
        - @eyurtsev

        Models
        - @hwchase17
        - @agola11

        Agents / Tools / Toolkits
        - @vowelparrot
        
        VectorStores / Retrievers / Memory
        - @dev2049
        
 -->

Co-authored-by: Jinto Jose <129657162+jj701@users.noreply.github.com>
2023-05-15 23:43:09 -07:00
Harrison Chase
c4c7936caa Harrison/wiki loader (#4765)
Co-authored-by: Guillermo Segovia <T1b4lt@users.noreply.github.com>
2023-05-15 23:42:57 -07:00
Filip Haltmayer
c632f7fc4e Add Milvus and Zilliz Retrievals (#4416)
Adds the basic retrievers for Milvus and Zilliz. Hybrid search support
will be added in the future.

Signed-off-by: Filip Haltmayer <filip.haltmayer@zilliz.com>
2023-05-15 21:22:54 -07:00
Bradley James
2e43954bc3 fixed on_llm issue (#4717)
Fixes #4714
2023-05-16 01:36:21 +00:00
Zander Chase
bf0904b676 Add Server Command (#4695)
Add Support for `langchain server {start|stop}` commands, with support for using ngrok to tunnel to a remote notebook
2023-05-16 00:44:30 +00:00
Anirudh Suresh
03ac39368f Fixing DeepLake Overwrite Flag (#4683)
# Fix DeepLake Overwrite Flag Issue

Fixes Issue #4682: essentially, setting overwrite to False in the
DeepLake constructor still triggers an overwrite, because the logic is
just checking for the presence of "overwrite" in kwargs. The fix is
simple--just add some checks to inspect if "overwrite" in kwargs AND
kwargs["overwrite"]==True.

Added a new test in
tests/integration_tests/vectorstores/test_deeplake.py to reflect the
desired behavior.


Co-authored-by: Anirudh Suresh <ani@Anirudhs-MBP.cable.rcn.com>
Co-authored-by: Anirudh Suresh <ani@Anirudhs-MacBook-Pro.local>
Co-authored-by: Dev 2049 <dev.dev2049@gmail.com>
2023-05-15 17:39:16 -07:00
d 3 n 7
8bb32d77d0 Update utils.py to make headless an optional argument (#4745)
Making headless an optional argument for
create_async_playwright_browser() and create_sync_playwright_browser()
By default no functionality is changed.

This allows for disabled people to use a web browser intelligently with
their voice, for example, while still seeing the content on the screen.
As well as many other use cases

---------

Co-authored-by: Dev 2049 <dev.dev2049@gmail.com>
2023-05-15 17:29:06 -07:00
Mose Tronci
a9dbe90447 Exponential back-off support for Google PaLM api (#4001)
This PR adds exponential back-off to the Google PaLM api to gracefully
handle rate limiting errors.

---------

Co-authored-by: Dev 2049 <dev.dev2049@gmail.com>
2023-05-15 17:21:11 -07:00
Leonid Ganeline
a6f3ec94bc docs: added additional_resources folder (#4748)
# docs: added `additional_resources` folder

The additional resource files were inside the doc top-level folder,
which polluted the top-level folder.
- added the `additional_resources` folder and moved correspondent files
to this folder;
- fixed a broken link to the "Model comparison" page (model_laboratory
notebook)
- fixed a broken link to one of the YouTube videos (sorry, it is not
directly related to this PR)

## Who can review?

@dev2049
2023-05-15 17:12:47 -07:00
Zander Chase
a128d95aeb Fix Async Shared Resource Bug (#4751)
Use an async queue to distribute tracers rather than inappropriately
sharing a single one
2023-05-16 00:04:01 +00:00
whuwxl
3f0357f94a Add summarization task type for HuggingFace APIs (#4721)
# Add summarization task type for HuggingFace APIs

Add summarization task type for HuggingFace APIs.
This task type is described by [HuggingFace inference
API](https://huggingface.co/docs/api-inference/detailed_parameters#summarization-task)

My project utilizes LangChain to connect multiple LLMs, including
various HuggingFace models that support the summarization task.
Integrating this task type is highly convenient and beneficial.

Fixes #4720
2023-05-15 16:26:17 -07:00
Zander Chase
580861e7f2 Revert "Make serpapi base url configurable via env (#4402)" (#4750)
This reverts commit 5111bec540.

This PR introduced a bug in the async API (the `url` param isn't bound);
it also didn't update the synchronous API correctly, which makes it
error-prone (the behavior of the async and sync endpoints would be
different)
2023-05-15 16:17:16 -07:00
shiyu22
21b9397342 Update the milvus example (#4706)
# Fix issue when running example

- add the query content
- update the `user` parameter with Zilliz

Signed-off-by: shiyu22 <shiyu.chen@zilliz.com>
2023-05-15 16:16:57 -07:00
hilarious-viking
7d15669b41 llama-cpp: add gpu layers parameter (#4739)
Adds gpu layers parameter to llama.cpp wrapper

Co-authored-by: andrew.khvalenski <andrew.khvalenski@behavox.com>
Co-authored-by: Dev 2049 <dev.dev2049@gmail.com>
2023-05-15 16:01:48 -07:00
Davis Chase
36c9fd1af7 Dev2049/docs edit0 (#4699) 2023-05-15 15:20:37 -07:00
Jinto Jose
1e467d9fc4 Jupyter Notebook Example for using Mongodb to store Chat Message History (#4436)
# Jupyter Notebook Example for using Mongodb Chat Message History

@dev2049
2023-05-15 14:33:42 -07:00
Leonid Ganeline
6060505a9d Add new links to Tutorials and YouTube pages (#4746)
- added an official LangChain YouTube channel :)
- added new tutorials and videos (only videos with enough subscriber or
view numbers)
- added a "New video" icon 

## Who can review?

@dev2049
2023-05-15 14:32:48 -07:00
Eduard van Valkenburg
47657fe01a Tweaks to the PowerBI toolkit and utility (#4442)
Fixes some bugs I found while testing with more advanced datasets and
queries. Includes using the output of PowerBI to parse the error and
give that back to the LLM.
2023-05-15 14:30:48 -07:00
mvhensbergen
e363e709cb Add source field to metadata (#4462)
This is needed if one want to use index.query_with_sources on git files.
Without a source field, index.query_with_sources fails with an
exception.
2023-05-15 14:30:12 -07:00
vinoyang
5111bec540 Make serpapi base url configurable via env (#4402)
Fixes #4328

Co-authored-by: Dev 2049 <dev.dev2049@gmail.com>
2023-05-15 14:25:25 -07:00
Roma
cb802edf75 [Feature] Add GraphQL Query Tool (#4409)
# Add GraphQL Query Support

This PR introduces a GraphQL API Wrapper tool that allows LLM agents to
query GraphQL databases. The tool utilizes the httpx and gql Python
packages to interact with GraphQL APIs and provides a simple interface
for running queries with LLM agents.

@vowelparrot

---------

Co-authored-by: Dev 2049 <dev.dev2049@gmail.com>
2023-05-15 14:06:12 -07:00
Eugene Yurtsev
49ce5ce1ca Only run linkcheck against docs dir on PR (#4741)
# Only run linkchecker on direct changes to docs

This is a stop-gap that will speed up PRs.

Some broken links can slip through if they're embedded in doc-strings
inside the codebase.

But we'll still be running the linkchecker on master.
2023-05-15 14:40:43 -04:00
Eugene Yurtsev
99cfe71cd0 Check poetry lock file (#4740)
# Check poetry lock file on CI

This PR checks that the lock file is up to date using poetry lock
--check.

As part of this PR, a new lock file was generated.
2023-05-15 14:38:01 -04:00
Eugene Yurtsev
09587a3201 Clean up tests for pdf parsers (#4595)
# Organize tests for pdf parsers

Clean up tests for pdf parsers, remove duplicate tests, convert to unit
tests.
2023-05-15 14:21:05 -04:00
Leonid Ganeline
70fd7cda14 docs: Concepts (#4734)
# glossary.md renamed as concepts.md and moved under the Getting Started

small PR.
`Concepts` looks right to the point. It is moved under Getting Started
(typical place). Previously it was lost in the Additional Resources
section.

## Who can review?

 @hwchase17
2023-05-15 11:09:25 -07:00
Harrison Chase
8de81d34a1 bump version to 170 (#4733) 2023-05-15 09:21:00 -07:00
Harrison Chase
dd95f0892d Harrison/add top k (#4707)
Co-authored-by: blc16 <benlc@umich.edu>
2023-05-15 09:09:22 -07:00
Harrison Chase
0551594722 add async default (#4701)
a spin on
https://github.com/hwchase17/langchain/pull/4300/files#diff-4f16071d58cd34fb3ec5cd5089e9dbd6fb06574c25c76b4d573827f8a2f48e96
2023-05-15 08:57:30 -07:00
Zander Chase
97434a64c5 Add Environment Info to Run (#4691)
Store the environment info within the `extra` fields of the Run
2023-05-15 15:38:49 +00:00
Eugene Yurtsev
d3300bd799 YouTube Loader: Replace regexp with built-in parsing (#4729) 2023-05-15 08:34:41 -07:00
Daniel Barker
c70ae562b4 Added support for streaming output response to HuggingFaceTextgenInference LLM class (#4633)
# Added support for streaming output response to
HuggingFaceTextgenInference LLM class

Current implementation does not support streaming output. Updated to
incorporate this feature. Tagging @agola11 for visibility.
2023-05-15 14:59:12 +00:00
d 3 n 7
435b70da47 Update click.py to pass errors back to Agent (#4723)
Instead of halting the entire program if this tool encounters an error,
it should pass the error back to the agent to decide what to do.

This may be best suited for @vowelparrot to review.
2023-05-15 14:54:08 +00:00
Eugene Yurtsev
3c490b5ba3 Docugami DataLoader (#4727)
### Adds a document loader for Docugami

Specifically:

1. Adds a data loader that talks to the [Docugami](http://docugami.com)
API to download processed documents as semantic XML
2. Parses the semantic XML into chunks, with additional metadata
capturing chunk semantics
3. Adds a detailed notebook showing how you can use additional metadata
returned by Docugami for techniques like the [self-querying
retriever](https://python.langchain.com/en/latest/modules/indexes/retrievers/examples/self_query_retriever.html)
4. Adds an integration test, and related documentation

Here is an example of a result that is not possible without the
capabilities added by Docugami (from the notebook):

<img width="1585" alt="image"
src="https://github.com/hwchase17/langchain/assets/749277/bb6c1ce3-13dc-4349-a53b-de16681fdd5b">

---------

Co-authored-by: Taqi Jaffri <tjaffri@docugami.com>
Co-authored-by: Taqi Jaffri <tjaffri@gmail.com>
2023-05-15 10:53:00 -04:00
KNiski
c2761aa8f4 Improve video_id extraction in YoutubeLoader (#4452)
# Improve video_id extraction in `YoutubeLoader`

`YoutubeLoader.from_youtube_url` can only deal with one specific url
format. I've introduced `YoutubeLoader.extract_video_id` which can
extract video id from common YT urls.

Fixes #4451 


@eyurtsev

---------

Co-authored-by: Kamil Niski <kamil.niski@gmail.com>
2023-05-15 10:45:19 -04:00
sqr
8b42e8a510 Update Makefile (typo) (#4725)
# Update minor typo in makefile
2023-05-15 10:34:44 -04:00
Lester Yang
cd3f9865f3 Feature: pdfplumber PDF loader with BaseBlobParser (#4552)
# Feature: pdfplumber PDF loader with BaseBlobParser

* Adds pdfplumber as a PDF loader
* Adds pdfplumber as a blob parser.
2023-05-15 09:47:02 -04:00
Harrison Chase
b6e3ac17c4 Harrison/sitemap local (#4704)
Co-authored-by: Lukas Bauer <lukas.bauer@mayflower.de>
2023-05-14 22:04:38 -07:00
Harrison Chase
12b4ee1fc7 Harrison/telegram chat loader (#4698)
Co-authored-by: Akinwande Komolafe <47945512+Sensei-akin@users.noreply.github.com>
Co-authored-by: Akinwande Komolafe <akhinoz@gmail.com>
2023-05-14 22:04:27 -07:00
Leonid Ganeline
2b181e5a6c docs: tutorials are moved on the top-level of docs (#4464)
# Added Tutorials section on the top-level of documentation

**Problem Statement**: the Tutorials section in the documentation is
top-priority. Not every project has resources to make tutorials. We have
such a privilege. Community experts created several tutorials on
YouTube.
But the tutorial links are now hidden on the YouTube page and not easily
discovered by first-time visitors.

**PR**: I've created the `Tutorials` page (from the `Additional
Resources/YouTube` page) and moved it to the top level of documentation
in the `Getting Started` section.

## Who can review?

        @dev2049
 
NOTE:
PR checks are randomly failing

3aefaafcdb

258819eadf

514d81b5b3
2023-05-14 21:22:25 -07:00
Li Yuanzheng
3b6206af49 Respect User-Specified User-Agent in WebBaseLoader (#4579)
# Respect User-Specified User-Agent in WebBaseLoader
This pull request modifies the `WebBaseLoader` class initializer from
the `langchain.document_loaders.web_base` module to preserve any
User-Agent specified by the user in the `header_template` parameter.
Previously, even if a User-Agent was specified in `header_template`, it
would always be overridden by a random User-Agent generated by the
`fake_useragent` library.

With this change, if a User-Agent is specified in `header_template`, it
will be used. Only in the case where no User-Agent is specified will a
random User-Agent be generated and used. This provides additional
flexibility when using the `WebBaseLoader` class, allowing users to
specify their own User-Agent if they have a specific need or preference,
while still providing a reasonable default for cases where no User-Agent
is specified.

This change has no impact on existing users who do not specify a
User-Agent, as the behavior in this case remains the same. However, for
users who do specify a User-Agent, their choice will now be respected
and used for all subsequent requests made using the `WebBaseLoader`
class.


Fixes #4167

## Before submitting

============================= test session starts
==============================
collecting ... collected 1 item


test_web_base.py::TestWebBaseLoader::test_respect_user_specified_user_agent

============================== 1 passed in 3.64s
===============================
PASSED [100%]

## Who can review?

Community members can review the PR once tests pass. Tag
maintainers/contributors who might be interested: @eyurtsev

---------

Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>
2023-05-14 23:09:27 -04:00
Ashish Talati
372a5113ff Update gallery.rst with chatpdf opensource (#4342) 2023-05-14 19:43:16 -07:00
Samuli Rauatmaa
66828ad231 add the existing OpenWeatherMap tool to the public api (#4292)
[OpenWeatherMapAPIWrapper](f70e18a5b3/docs/modules/agents/tools/examples/openweathermap.ipynb)
works wonderfully, but the _tool_ itself can't be used in master branch.

- added OpenWeatherMap **tool** to the public api, to be loadable with
`load_tools` by using "openweathermap-api" tool name (that name is used
in the existing
[docs](aff33d52c5/docs/modules/agents/tools/getting_started.md),
at the bottom of the page)
- updated OpenWeatherMap tool's **description** to make the input format
match what the API expects (e.g. `London,GB` instead of `'London,GB'`)
- added [ecosystem documentation page for
OpenWeatherMap](f9c41594fe/docs/ecosystem/openweathermap.md)
- added tool usage example to [OpenWeatherMap's
notebook](f9c41594fe/docs/modules/agents/tools/examples/openweathermap.ipynb)

Let me know if there's something I missed or something needs to be
updated! Or feel free to make edits yourself if that makes it easier for
you 🙂
2023-05-14 18:50:45 -07:00
Harrison Chase
6f47ab17a4 Harrison/param notion db (#4689)
Co-authored-by: Edward Park <ed.sh.park@gmail.com>
2023-05-14 18:26:25 -07:00
Harrison Chase
5d63fc65e1 add warning for combined memory (#4688) 2023-05-14 18:26:16 -07:00
Harrison Chase
a48810fb21 dont have openai_api_version by default (#4687)
an alternative to https://github.com/hwchase17/langchain/pull/4234/files
2023-05-14 18:26:08 -07:00
Harrison Chase
cdc20d1203 Harrison/json loader fix (#4686)
Co-authored-by: Triet Le <112841660+triet-lq-holistics@users.noreply.github.com>
2023-05-14 18:25:59 -07:00
Harrison Chase
ed8207b2fb Harrison/typing of return (#4685)
Co-authored-by: OlajideOgun <37077640+OlajideOgun@users.noreply.github.com>
2023-05-14 18:25:50 -07:00
Harrison Chase
c48f1301ee oops remove api key, dont worried i cycled it 2023-05-14 17:40:31 -07:00
Harrison Chase
57b2f3ffe6 add rebuff (#4637) 2023-05-14 17:38:43 -07:00
Zander Chase
d85b04be7f Add RELLM and JSONFormer experimental LLM decoding (#4185)
[RELLM](https://github.com/r2d4/rellm) is a library that wraps local
HuggingFace pipeline models for structured decoding.

RELLM works by generating tokens one at a time. At each step, it masks
tokens that don't conform to the provided partial regular expression.

[JSONFormer](https://github.com/1rgs/jsonformer) is a bit different, where it sequentially adds the keys then decodes each value directly
2023-05-14 22:40:03 +00:00
Harrison Chase
54f5523197 bump version to 169 (#4675) 2023-05-14 14:18:29 -07:00
Harrison Chase
243886be93 Harrison/virtual time (#4658)
Co-authored-by: ifsheldon <39153080+ifsheldon@users.noreply.github.com>
Co-authored-by: maple.liang <maple.liang@gempoll.com>
2023-05-14 10:29:17 -07:00
Harrison Chase
f2f2aced6d allow partials in from_template (#4638) 2023-05-13 21:47:20 -07:00
Harrison Chase
fbfa49f2c1 agent serialization (#4642) 2023-05-13 21:47:10 -07:00
Harrison Chase
ef49c659f6 add embedding router (#4644) 2023-05-13 21:47:01 -07:00
Harrison Chase
5020094e3b Harrison/azure content filter (#4645)
Co-authored-by: Rob Kopel <R0bk@users.noreply.github.com>
2023-05-13 21:46:51 -07:00
Harrison Chase
f5e2f70115 Harrison/json new line (#4646)
Co-authored-by: David Chen <davidchen@gliacloud.com>
2023-05-13 21:46:33 -07:00
Harrison Chase
87d8d221fb Harrison/headers for openai (#4648)
Co-authored-by: aakash.shah <aakash.shah@quintiles.com>
2023-05-13 21:46:20 -07:00
Harrison Chase
c09bb00959 Harrison/summary memory history (#4649)
Co-authored-by: engkheng <60956360+outday29@users.noreply.github.com>
2023-05-13 21:46:11 -07:00
Harrison Chase
44ae673388 Harrison/multithreading directory loader (#4650)
Co-authored-by: PawelFaron <42373772+PawelFaron@users.noreply.github.com>
Co-authored-by: Pawel Faron <ext-pawel.faron@vaisala.com>
2023-05-13 21:46:02 -07:00
Harrison Chase
b0c733e327 list of messages (#4651) 2023-05-13 21:45:53 -07:00
Harrison Chase
873b0c7eb6 Harrison/structured chat mem (#4652)
Co-authored-by: d 3 n 7 <29033313+d3n7@users.noreply.github.com>
2023-05-13 21:45:42 -07:00
Harrison Chase
9ba3a798c4 Harrison/from keys redis (#4653)
Co-authored-by: Christoph Kahl <christoph@zauberware.com>
2023-05-13 21:45:24 -07:00
Harrison Chase
e781ff9256 Harrison/chatopenaibase path (#4656)
Co-authored-by: Dave <dave@gray101.com>
2023-05-13 21:45:14 -07:00
Harrison Chase
279605b4d3 Harrison/metaphor search (#4657)
Co-authored-by: Jeffrey Wang <jeffreyzhiyuanwang@gmail.com>
2023-05-13 21:45:05 -07:00
Harrison Chase
9aa9fe7021 Harrison/spark connect example (#4659)
Co-authored-by: Mike Wang <62768671+skcoirz@users.noreply.github.com>
2023-05-13 21:44:54 -07:00
Prerit Das
2747ccbcf1 Allow custom base Zapier prompt (#4213)
Currently, all Zapier tools are built using the pre-written base Zapier
prompt. These small changes (that retain default behavior) will allow a
user to create a Zapier tool using the ZapierNLARunTool while providing
their own base prompt.

Their prompt must contain input fields for zapier_description and
params, checked and enforced in the tool's root validator.

An example of when this may be useful: user has several, say 10, Zapier
tools enabled. Currently, the long generic default Zapier base prompt is
attached to every single tool, using an extreme number of tokens for no
real added benefit (repeated). User prompts LLM on how to use Zapier
tools once, then overrides the base prompt.

Or: user has a few specific Zapier tools and wants to maximize their
success rate. So, user writes prompts/descriptions for those tools
specific to their use case, and provides those to the ZapierNLARunTool.

A consideration - this is the simplest way to implement this I could
think of... though ideally custom prompting would be possible at the
Toolkit level as well. For now, this should be sufficient in solving the
concerns outlined above.
2023-05-13 21:08:18 -07:00
Paresh Mathur
e2bc836571 Fix #4087 by setting the correct csv dialect (#4103)
The error in #4087 was happening because of the use of csv.Dialect.*
which is just an empty base class. we need to make a choice on what is
our base dialect. I usually use excel so I put it as excel, if
maintainers have other preferences do let me know.

Open Questions:
1. What should be the default dialect?
2. Should we rework all tests to mock the open function rather than the
csv.DictReader?
3. Should we make a separate input for `dialect` like we have for
`encoding`?

---------

Co-authored-by: = <=>
2023-05-13 20:35:01 -07:00
Leonid Ganeline
3ce78ef6c4 docs: document_loaders classification (#4069)
**Problem statement:** the
[document_loaders](https://python.langchain.com/en/latest/modules/indexes/document_loaders.html#)
section is too long and hard to comprehend.
**Proposal:** group document_loaders by 3 classes: (see `Files changed`
tab)

UPDATE: I've completely reworked the document_loader classification.
Now this PR changes only one file! 

FYI @eyurtsev @hwchase17
2023-05-13 19:17:32 -07:00
Zander Chase
928cdd57a4 [Breaking] Refactor Base Tracer(#4549)
### Refactor the BaseTracer
- Remove the 'session' abstraction from the BaseTracer
- Rename 'RunV2' object(s) to be called 'Run' objects (Rename previous
Run objects to be RunV1 objects)
- Ditto for sessions: TracerSession*V2 -> TracerSession*
- Remove now deprecated conversion from v1 run objects to v2 run objects
in LangChainTracerV2
- Add conversion from v2 run objects to v1 run objects in V1 tracer
2023-05-13 17:23:56 +00:00
Harrison Chase
1e322ffc1c change heading 2023-05-13 09:52:23 -07:00
Harrison Chase
86c1f090fd bump version to 168 (#4632) 2023-05-13 09:50:22 -07:00
Davis Chase
9ab7101182 WIP: FLARE-inspired chain (#4612)
Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
2023-05-13 09:28:28 -07:00
Harrison Chase
daa3e6dedb Harrison/prompt constructor methods (#4616) 2023-05-13 09:23:51 -07:00
Harrison Chase
6265cbfb11 Harrison/standard llm interface (#4615) 2023-05-13 09:05:31 -07:00
Harrison Chase
485ecc3580 option for csv agent to not include df in prompt (#4610) 2023-05-12 21:55:22 -07:00
Harrison Chase
7d425cbf38 improve sql prompt (#4611)
Co-authored-by: Taqi Jaffri <tjaffri@docugami.com>
Co-authored-by: Taqi Jaffri <tjaffri@gmail.com>
2023-05-12 21:55:03 -07:00
Hans van Dam
01531cb16d remove quotes from sql database prompts (caused syntax error) (#4101)
fixes a syntax error mentioned in
#2027 and #3305
another PR to remedy is in #3385, but I believe that is not tacking the
core problem.
Also #2027 mentions a solution that works:
add to the prompt:
'The SQL query should be outputted plainly, do not surround it in quotes
or anything else.'

To me it seems strange to first ask for:

SQLQuery: "SQL Query to run"

and then to tell the LLM not to put the quotes around it. Other
templates (than the sql one) do not use quotes in their steps.
This PR changes that to:

SQLQuery: SQL Query to run
2023-05-12 20:03:37 -07:00
Zander Chase
0c6ed657ef Convert Chain to a Chain Factory (#4605)
## Change Chain argument in client to accept a chain factory

The `run_over_dataset` functionality seeks to treat each iteration of an
example as an independent trial.
Chains have memory, so it's easier to permit this type of behavior if we
accept a factory method rather than the chain object directly.

There's still corner cases / UX pains people will likely run into, like:
- Caching may cause issues
- if memory is persisted to a shared object (e.g., same redis queue) ,
this could impact what is retrieved
- If we're running the async methods with concurrency using local
models, if someone naively instantiates the chain and loads each time,
it could lead to tons of disk I/O or OOM
2023-05-13 02:13:21 +00:00
Tim Asp
ed0d557ede docs: fix pdf docs hierarchy and formatting (#4593)
# Fix pdf loader docs page


![image](https://github.com/hwchase17/langchain/assets/707699/4a11f379-00ed-4f7a-9870-71f74e0cadc6)

Using h1's messes with hierarchy, this fixes that, and moves the
PyPDFium2 loader out of the middle of PDFMiner docs
2023-05-12 15:03:01 -04:00
Davis Chase
36f9e9a0ba Skip flaky unit test (#4591) 2023-05-12 11:54:40 -07:00
Eugene Yurtsev
08ed927c32 Turn on extended tests (#4588)
# Turn on strict extended tests

This PR turns on strict testing for extended tests.
2023-05-12 14:50:08 -04:00
Zander Chase
d96f6a106b Add Steamship Image Generation Tool (#4580)
Co-authored-by: Enias Cailliau <enias@steamship.com>
2023-05-12 10:35:01 -07:00
Davis Chase
739c297c94 Release 167 (#4589) 2023-05-12 10:24:59 -07:00
Davis Chase
a4a9d1f403 Improve vespa interface (#4546)
![Screenshot 2023-05-11 at 7 50 31
PM](https://github.com/hwchase17/langchain/assets/130488702/bc8ab4bb-8006-44fc-ba07-df54e84ee2c1)
2023-05-12 10:11:26 -07:00
vinoyang
72f18fd08b Provide get current date function dialect for other DBs (#4576)
# Provide get current date function dialect for other DBs

<!--
Thank you for contributing to LangChain! Your PR will appear in our next
release under the title you set. Please make sure it highlights your
valuable contribution.

Replace this with a description of the change, the issue it fixes (if
applicable), and relevant context. List any dependencies required for
this change.

After you're done, someone will review your PR. They may suggest
improvements. If no one reviews your PR within a few days, feel free to
@-mention the same people again, as notifications can get lost.
-->

<!-- Remove if not applicable -->

Fixes # (issue)

## Before submitting

<!-- If you're adding a new integration, include an integration test and
an example notebook showing its use! -->

## Who can review?

Community members can review the PR once tests pass. Tag
maintainers/contributors who might be interested:

@eyurtsev

<!-- For a quicker response, figure out the right person to tag with @

        @hwchase17 - project lead

        Tracing / Callbacks
        - @agola11

        Async
        - @agola11

        DataLoaders
        - @eyurtsev

        Models
        - @hwchase17
        - @agola11

        Agents / Tools / Toolkits
        - @vowelparrot
        
        VectorStores / Retrievers / Memory
        - @dev2049
        
 -->
2023-05-12 13:04:28 -04:00
Neil Ruaro
3a2855945b added documentation on retrieving a PG vectorstore (#4578)
This PR adds in documentation on querying an existing vectorstore in PG 

Fixes 3191 (issue)
2023-05-12 13:04:06 -04:00
Andrea Pinto
1e5d25b93c Improve error messages formatting in doc loaders (#4586)
# Cosmetic in errors formatting

Added appropriate spacing to the `ImportError` message in a bunch of
document loaders to enhance trace readability (including Google Drive,
Youtube, Confluence and others). This change ensures that the error
messages are not displayed as a single line block, and that the `pip
install xyz` commands can be copied to clipboard from terminal easily.

## Who can review?

@eyurtsev
2023-05-12 13:03:39 -04:00
kYLe
570d057db4 Expose AnyScale LLM in langchain.llms (#4585)
# Expose AnyScale LLM in  langchain.llms

Fixes # update init.py so we can from langchain.llms import Anyscale
2023-05-12 12:48:38 -04:00
Eugene Yurtsev
a5371a0fa2 Add pytest --only-extended and --only-core options (#4494)
# Adds testing options to pytest

This PR adds the following options: 

* `--only-core` will skip all extended tests, running all core tests.
* `--only-extended` will skip all core tests. Forcing alll extended
tests to be run.

Running `py.test` without specifying either option will remain
unaffected. Run
all tests that can be run within the unit_tests direction. Extended
tests will
run if required packages are installed.

## Before submitting

## Who can review?
2023-05-12 11:35:22 -04:00
Harrison Chase
5ad151ed44 Add constitutional principles from paper (#4554)
Add constitutional principles from https://arxiv.org/pdf/2212.08073.pdf

---------

Co-authored-by: Dev 2049 <dev.dev2049@gmail.com>
2023-05-12 07:34:03 -07:00
Sai Vinay G
cf4c1394a2 feat: Added class to support huggingface text generation inference server (#4447)
[Text Generation
Inference](https://github.com/huggingface/text-generation-inference) is
a Rust, Python and gRPC server for generating text using LLMs.

This pull request add support for self hosted Text Generation Inference
servers.

feature: #4280

---------

Co-authored-by: Your Name <you@example.com>
Co-authored-by: Dev 2049 <dev.dev2049@gmail.com>
2023-05-12 07:32:37 -07:00
Zander Chase
258c319855 Dereference Messages (#4557)
Update how we parse the messages now that the server splits prompts /
messages up
2023-05-12 00:12:43 -07:00
Leonid Ganeline
e17d0319d5 Add arxiv retriever (#4538) 2023-05-11 22:48:38 -07:00
vinoyang
25cd6e060a Enhance the prompt to make the LLM generate right date for real today (#4505)
# Enhance the prompt to make the LLM generate right date for real today

Fixes # (issue)

Currently, if the user's question contains `today`, the clickhouse
always points to an old date. This may be related to the fact that the
GPT training data is relatively old.
2023-05-11 22:11:14 -04:00
vinoyang
e942db3e78 Add prestodb prompt (#4516)
Add a PrestoDB prompt
2023-05-11 22:09:48 -04:00
SimFG
7bcf238a1a Optimize the initialization method of GPTCache (#4522)
Optimize the initialization method of GPTCache, so that users can use GPTCache more quickly.
2023-05-11 16:15:23 -07:00
Zander Chase
f4d3cf2dfb Add Invocation Params (#4509)
### Add Invocation Params to Logged Run


Adds an llm type to each chat model as well as an override of the dict()
method to log the invocation parameters for each call

---------

Co-authored-by: Ankush Gola <ankush.gola@gmail.com>
2023-05-11 15:34:06 -07:00
Ankush Gola
59853fc876 add invocation params as extra params in llm callbacks (#4506)
# Your PR Title (What it does)

<!--
Thank you for contributing to LangChain! Your PR will appear in our next
release under the title you set. Please make sure it highlights your
valuable contribution.

Replace this with a description of the change, the issue it fixes (if
applicable), and relevant context. List any dependencies required for
this change.

After you're done, someone will review your PR. They may suggest
improvements. If no one reviews your PR within a few days, feel free to
@-mention the same people again, as notifications can get lost.
-->

<!-- Remove if not applicable -->

Fixes # (issue)

## Before submitting

<!-- If you're adding a new integration, include an integration test and
an example notebook showing its use! -->

## Who can review?

Community members can review the PR once tests pass. Tag
maintainers/contributors who might be interested:

<!-- For a quicker response, figure out the right person to tag with @

        @hwchase17 - project lead

        Tracing / Callbacks
        - @agola11

        Async
        - @agola11

        DataLoader Abstractions
        - @eyurtsev

        LLM/Chat Wrappers
        - @hwchase17
        - @agola11

        Tools / Toolkits
        - @vowelparrot
 -->
2023-05-11 15:33:52 -07:00
Ofey Chan
1c0ec26e40 [pyproject.toml] add tiktoken when install langchain[openai] (#4514)
# Add `tiktoken` as dependency when installed as `langchain[openai]`

Fixes #4513 (issue)

## Who can review?

Community members can review the PR once tests pass. Tag
maintainers/contributors who might be interested:

@vowelparrot 

<!-- For a quicker response, figure out the right person to tag with @

        @hwchase17 - project lead

        Tracing / Callbacks
        - @agola11

        Async
        - @agola11

        DataLoaders
        - @eyurtsev

        Models
        - @hwchase17
        - @agola11

        Agents / Tools / Toolkits
        - @vowelparrot
        
        VectorStores / Retrievers / Memory
        - @dev2049
        
 -->
2023-05-11 12:21:06 -07:00
Zander Chase
4ee47926ca Add on_chat_message_start (#4499)
### Add on_chat_message_start to callback manager and base tracer

Goal: trace messages directly to permit reloading as chat messages
(store in an integration-agnostic way)

Add an `on_chat_message_start` method. Fall back to `on_llm_start()` for
handlers that don't have it implemented.

Does so in a non-backwards-compat breaking way (for now)
2023-05-11 11:06:39 -07:00
Yu Le
bbf76dbb52 fix typos in the prompts of LLMSummarizationCheckerChain (#4518) 2023-05-11 10:32:34 -07:00
Jonas Nelle
97e7dc1502 Make BaseStringMessagePromptTemplate.from_template return type generic (#4523)
# Make BaseStringMessagePromptTemplate.from_template return type generic

I use mypy to check type on my code that uses langchain. Currently after
I load a prompt and convert it to a system prompt I have to explicitly
cast it which is quite ugly (and not necessary):
```
prompt_template = load_prompt("prompt.yaml")
system_prompt_template = cast(
    SystemMessagePromptTemplate,
    SystemMessagePromptTemplate.from_template(prompt_template.template),
)
```

With this PR, the code would simply be: 
```
prompt_template = load_prompt("prompt.yaml")
system_prompt_template = SystemMessagePromptTemplate.from_template(prompt_template.template)
```

Given how much langchain uses inheritance, I think this type hinting
could be applied in a bunch more places, e.g. load_prompt also return a
`FewShotPromptTemplate` or a `PromptTemplate` but without typing the
type checkers aren't able to infer that. Let me know if you agree and I
can take a look at implementing that as well.

        @hwchase17 - project lead

        DataLoaders
        - @eyurtsev
2023-05-11 10:24:50 -07:00
kYLe
446b60d803 Fix a typo in langchain/docs/modules/models/llms/integrations/anyscale.ipynb (#4526) 2023-05-11 09:03:04 -07:00
Davis Chase
0f93de0a59 Release 0.0.166 (#4510) 2023-05-11 08:53:48 -07:00
Sunish Sheth
812e5f43f5 Add _type for all parsers (#4189)
Used for serialization. Also add test that recurses through
our subclasses to check they have them implemented

Would fix https://github.com/hwchase17/langchain/issues/3217
Blocking: https://github.com/mlflow/mlflow/pull/8297

---------

Signed-off-by: Sunish Sheth <sunishsheth2009@gmail.com>
Co-authored-by: Dev 2049 <dev.dev2049@gmail.com>
2023-05-11 01:27:58 -07:00
Akshaya Annavajhala
b21d7c138c Callback Handler for MLflow (#4150)
Rebased Mahmedk's PR with the callback refactor and added the example
requested by hwchase plus a couple minor fixes

---------

Co-authored-by: Ahmed K <77802633+mahmedk@users.noreply.github.com>
Co-authored-by: Ahmed K <mda3k27@gmail.com>
Co-authored-by: Davis Chase <130488702+dev2049@users.noreply.github.com>
Co-authored-by: Corey Zumar <39497902+dbczumar@users.noreply.github.com>
Co-authored-by: Dev 2049 <dev.dev2049@gmail.com>
2023-05-11 01:10:40 -07:00
kYLe
0d51a1f12b Add LLMs support for Anyscale Service (#4350)
Add Anyscale service integration under LLM

Co-authored-by: Dev 2049 <dev.dev2049@gmail.com>
2023-05-11 00:39:59 -07:00
Kristóf Dombi
99b2400048 [Docs]: Add Kinsta to the list of deployment providers (#4445)
We're fans of the LangChain framework thus we wanted to make sure we
provide an easy way for our customers to be able to utilize this
framework for their LLM-powered applications at our platform.
2023-05-11 00:29:48 -07:00
Evan Jones
f668251948 parameterized distance metrics; lint; format; tests (#4375)
# Parameterize Redis vectorstore index

Redis vectorstore allows for three different distance metrics: `L2`
(flat L2), `COSINE`, and `IP` (inner product). Currently, the
`Redis._create_index` method hard codes the distance metric to COSINE.

I've parameterized this as an argument in the `Redis.from_texts` method
-- pretty simple.

Fixes #4368 

## Before submitting

I've added an integration test showing indexes can be instantiated with
all three values in the `REDIS_DISTANCE_METRICS` literal. An example
notebook seemed overkill here. Normal API documentation would be more
appropriate, but no standards are in place for that yet.

## Who can review?

Not sure who's responsible for the vectorstore module... Maybe @eyurtsev
/ @hwchase17 / @agola11 ?
2023-05-11 00:20:01 -07:00
Nick Omeyer
f46710d408 Fix minor issues in self-query retriever prompt formatting (#4450)
# Fix minor issues in self-query retriever prompt formatting

I noticed a few minor issues with the self-query retriever's prompt
while using it, so here's PR to fix them 😇

## Who can review?

Community members can review the PR once tests pass. Tag
maintainers/contributors who might be interested:

<!-- For a quicker response, figure out the right person to tag with @

        @hwchase17 - project lead

        Tracing / Callbacks
        - @agola11

        Async
        - @agola11

        DataLoader Abstractions
        - @eyurtsev

        LLM/Chat Wrappers
        - @hwchase17
        - @agola11

        Tools / Toolkits
        - @vowelparrot
 -->
2023-05-11 00:10:41 -07:00
Zander Chase
d969f43ed8 Load HuggingFace Tool (#4475)
# Add option to `load_huggingface_tool`

Expose a method to load a huggingface Tool from the HF hub

---------

Co-authored-by: Dev 2049 <dev.dev2049@gmail.com>
2023-05-11 00:07:36 -07:00
Davis Chase
cd01de49cf Update contribution guidelines (#4431)
provide more guidance on pr's
2023-05-11 00:05:25 -07:00
Eugene Yurtsev
146616aa5d Test workflow, fix minor typos (#4495)
# Fix 2 minor typos in test workflow.

This PR does not result in any functional changes.
2023-05-10 22:36:50 -04:00
Eugene Yurtsev
f373883c1a Refactor test workflow (#4457)
# Refactor the test workflow

This PR refactors the tests to run using a single test workflow. This
makes it easier to relaunch failing tests and see in the UI which test
failed since the jobs are grouped together.

## Before submitting

## Who can review?
2023-05-10 21:57:39 -04:00
Davis Chase
b77e103ca6 Add aleph alpha api key attribute (#4489)
@tugot17 applied your change to master
2023-05-10 17:29:57 -07:00
Harrison Chase
3ce29cb4a6 Harrison/new search (#4359)
Co-authored-by: Jiaping(JP) Zhang <vincentzhangv@gmail.com>
2023-05-10 17:09:16 -07:00
Jakob Heyder
545ae8b756 Fix: Add run_manager on all AgentFinish returns in AgentExecutor (#4466) 2023-05-10 16:25:23 -07:00
Ankush Gola
ae8d6d5a89 Add docs for tracing environment variable (#4477) 2023-05-10 16:07:02 -07:00
Davis Chase
9ec60ad832 Add azure cognitive search retriever (#4467)
All credit to @UmerHA, made a couple small changes

---------

Co-authored-by: UmerHA <40663591+UmerHA@users.noreply.github.com>
2023-05-10 15:27:27 -07:00
Davis Chase
46b100ea63 Add DocArray vector stores (#4483)
Thanks to @anna-charlotte and @jupyterjazz for the contribution! Made
few small changes to get it across the finish line

---------

Signed-off-by: anna-charlotte <charlotte.gerhaher@jina.ai>
Signed-off-by: jupyterjazz <saba.sturua@jina.ai>
Co-authored-by: anna-charlotte <charlotte.gerhaher@jina.ai>
Co-authored-by: jupyterjazz <saba.sturua@jina.ai>
Co-authored-by: Saba Sturua <45267439+jupyterjazz@users.noreply.github.com>
2023-05-10 15:22:16 -07:00
Davis Chase
f2a536b445 release 165 (#4486)
bump version
2023-05-10 15:20:43 -07:00
Harrison Chase
b2f920e891 add tracing v2 env var (#4465)
Co-authored-by: Ankush Gola <ankush.gola@gmail.com>
2023-05-10 11:08:29 -07:00
Zander Chase
9231143f91 Fix Duplicate trust_remote_code in pipeline (#4369)
### Fix issue with duplicate specification of `trust_remote_code` in
HuggingFacePipeline

Fixes # 4351
2023-05-10 10:21:54 -07:00
Davis Chase
6fbdb9ce51 Release 0.0.164 (#4454) 2023-05-10 08:44:14 -07:00
Davis Chase
04475bea7d Mv plan and execute to experimental (#4459) 2023-05-10 08:31:53 -07:00
netseye
1ad180f6de Add request timeout to openai embedding (#4144)
Add request_timeout field to openai embedding. Defaults to None

---------

Co-authored-by: Jeakin <Jeakin@botu.cc>
2023-05-10 08:11:32 -07:00
zvrr
274dc4bc53 add clickhouse prompt (#4456)
# Add clickhouse prompt

Add clickhouse database sql prompt
2023-05-10 10:22:42 -04:00
Paresh Mathur
05e749d9fe make running specific unit tests easier (#4336)
I find it's easier to do TDD if i can run specific unit tests. I know
watch is there but some people prefer running their tests manually.
2023-05-10 09:39:22 -04:00
Eugene Yurtsev
80558b5b27 Add workflow for testing with all deps (#4410)
# Add action to test with all dependencies installed

PR adds a custom action for setting up poetry that allows specifying a
cache key:
https://github.com/actions/setup-python/issues/505#issuecomment-1273013236

This makes it possible to run 2 types of unit tests: 

(1) unit tests with only core dependencies
(2) unit tests with extended dependencies (e.g., those that rely on an
optional pdf parsing library)


As part of this PR, we're moving some pdf parsing tests into the
unit-tests section and making sure that these unit tests get executed
when running with extended dependencies.
2023-05-10 09:35:07 -04:00
Matt Robinson
3637d6da6e feat: add loader for open office odt files (#4405)
# ODF File Loader

Adds a data loader for handling Open Office ODT files. Requires
`unstructured>=0.6.3`.

### Testing

The following should work using the `fake.odt` example doc from the
[`unstructured` repo](https://github.com/Unstructured-IO/unstructured).

```python
from langchain.document_loaders import UnstructuredODTLoader

loader = UnstructuredODTLoader(file_path="fake.odt", mode="elements")
loader.load()

loader = UnstructuredODTLoader(file_path="fake.odt", mode="single")
loader.load()
```
2023-05-10 01:37:17 -07:00
Zander Chase
65f85af242 Improve math chain error msg (#4415) 2023-05-10 01:08:01 -07:00
Davis Chase
f6c97e6af4 Fix Lark import error (#4421)
Any import that touches langchain.retrievers currently requires Lark.
Here's one attempt to fix. Not very pretty, very open to other ideas.
Alternatives I thought of are 1) make Lark requirement, 2) put
everything in parser.py in the try/except. Neither sounds much better

Related to #4316, #4275
2023-05-10 01:07:34 -07:00
Harrison Chase
f0cfed636f change nb name 2023-05-09 21:22:35 -07:00
Harrison Chase
6b8d144ccc Harrison/plan and solve (#4422) 2023-05-09 21:07:56 -07:00
StephaneBereux
d383c0cb43 fixed the filtering error in chromadb (#1621)
Fixed two small bugs (as reported in issue #1619 ) in the filtering by
metadata for `chroma` databases :
- ```langchain.vectorstores.chroma.similarity_search``` takes a
```filter``` input parameter but do not forward it to
```langchain.vectorstores.chroma.similarity_search_with_score```
- ```langchain.vectorstores.chroma.similarity_search_by_vector```
doesn't take this parameter in input, although it could be very useful,
without any additional complexity - and it would thus be coherent with
the syntax of the two other functions.

Co-authored-by: Davis Chase <130488702+dev2049@users.noreply.github.com>
2023-05-09 16:43:00 -07:00
jrhe
28091c2101 Use passed LLM for default chain in MultiPromptChain (#4418)
Currently, MultiPromptChain instantiates a ChatOpenAI LLM instance for
the default chain to use if none of the prompts passed match. This seems
like an error as it means that you can't use your choice of LLM, or
configure how to instantiate the default LLM (e.g. passing in an API key
that isn't in the usual env variable).
2023-05-09 16:15:25 -07:00
Davis Chase
5c8e12558d Dev2049/pinecone try except (#4424)
Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
Co-authored-by: Bernie G <bernie.gandin2@gmail.com>
2023-05-09 16:03:19 -07:00
Rukmani
2b14036126 Update WhatsAppChatLoader to include the character ~ in the sender name (#4420)
Fixes #4153

If the sender of a message in a group chat isn't in your contact list,
they will appear with a ~ prefix in the exported chat. This PR adds
support for parsing such lines.
2023-05-09 15:00:04 -07:00
Zander Chase
f2150285a4 Fix nested runs example ID (#4413)
#### Only reference example ID on the parent run

Previously, I was assigning the example ID to every child run. 
Adds a test.
2023-05-09 12:21:53 -07:00
Davis Chase
e4ca511ec8 Delete comment (#4412) 2023-05-09 10:38:44 -07:00
mbchang
9fafe7b2b9 fix: remove unnecessary line of code (#4408)
Removes unnecessary line of code in
https://python.langchain.com/en/latest/use_cases/agent_simulations/two_agent_debate_tools.html
2023-05-09 10:35:09 -07:00
Aivin V. Solatorio
6335cb5b3a Add support for Qdrant nested filter (#4354)
# Add support for Qdrant nested filter

This extends the filter functionality for the Qdrant vectorstore. The
current filter implementation is limited to a single-level metadata
structure; however, Qdrant supports nested metadata filtering. This
extends the functionality for users to maximize the filter functionality
when using Qdrant as the vectorstore.

Reference: https://qdrant.tech/documentation/filtering/#nested-key

---------

Signed-off-by: Aivin V. Solatorio <avsolatorio@gmail.com>
2023-05-09 10:34:11 -07:00
Martin Holzhauer
872605a5c5 Add an option to extract more metadata from crawled websites (#4347)
This pr makes it possible to extract more metadata from websites for
later use.

my usecase:
parsing ld+json or microdata from sites and store it as structured data
in the metadata field
2023-05-09 10:18:33 -07:00
Leonid Ganeline
ce15ffae6a added Wikipedia retriever (#4302)
- added `Wikipedia` retriever. It is effectively a wrapper for
`WikipediaAPIWrapper`. It wrapps load() into get_relevant_documents()
- sorted `__all__` in the `retrievers/__init__`
- added integration tests for the WikipediaRetriever
- added an example (as Jupyter notebook) for the WikipediaRetriever
2023-05-09 10:08:39 -07:00
Davis Chase
ea83eed9ba Bump to version 0.0.163 (#4382) 2023-05-09 07:51:51 -07:00
Prayson Wilfred Daniel
2b4ba203f7 query correction from when to what (#4383)
# Minor Wording Documentation Change 

```python
agent_chain.run("When's my friend Eric's surname?")
# Answer with 'Zhu'
```

is change to 

```python
agent_chain.run("What's my friend Eric's surname?")
# Answer with 'Zhu'
```

I think when is a residual of the old query that was "When’s my friends
Eric`s birthday?".
2023-05-09 07:42:47 -07:00
Eugene Yurtsev
2ceb807da2 Add PDF parser implementations (#4356)
# Add PDF parser implementations

This PR separates the data loading from the parsing for a number of
existing PDF loaders.

Parser tests have been designed to help encourage developers to create a
consistent interface for parsing PDFs.

This interface can be made more consistent in the future by adding
information into the initializer on desired behavior with respect to splitting by
page etc.

This code is expected to be backwards compatible -- with the exception
of a bug fix with pymupdf parser which was returning `bytes` in the page
content rather than strings.

Also changing the lazy parser method of document loader to return an
Iterator rather than Iterable over documents.

## Before submitting

<!-- If you're adding a new integration, include an integration test and
an example notebook showing its use! -->

## Who can review?

Community members can review the PR once tests pass. Tag
maintainers/contributors who might be interested:

@

<!-- For a quicker response, figure out the right person to tag with @

        @hwchase17 - project lead

        Tracing / Callbacks
        - @agola11

        Async
        - @agola11

        DataLoader Abstractions
        - @eyurtsev

        LLM/Chat Wrappers
        - @hwchase17
        - @agola11

        Tools / Toolkits
        - @vowelparrot
 -->
2023-05-09 10:24:17 -04:00
Eugene Yurtsev
ae0c3382dd Add MimeType based parser (#4376)
# Add MimeType Based Parser

This PR adds a MimeType Based Parser. The parser inspects the mime-type
of the blob it is parsing and based on the mime-type can delegate to the sub
parser.

## Before submitting

Waiting on adding notebooks until more implementations are landed. 

## Who can review?

Community members can review the PR once tests pass. Tag
maintainers/contributors who might be interested:


@hwchase17
@vowelparrot
2023-05-09 10:22:56 -04:00
Leonid Ganeline
c485e7ab59 added GitHub star number (#4214)
added GitHub star number with a link to the `GitHub star history chart`
This is an interesting chart https://star-history.com/#hwchase17/langchain :)
2023-05-09 09:39:53 -04:00
Heath
0d568daacb Update writer integration (#4363)
# Update Writer LLM integration

Changes the parameters and base URL to be in line with Writer's current
API.
Based on the documentation on this page:
https://dev.writer.com/reference/completions-1
2023-05-08 21:59:46 -07:00
BioErrorLog
04f765b838 Fix grammar in Text Splitters docs (#4373)
# Fix grammar in Text Splitters docs

Just a small fix of grammar in the documentation:

"That means there two different axes" -> "That means there are two
different axes"
2023-05-08 22:38:40 -04:00
Zander Chase
c73cec5ac1 Add Example Notebook for LCP Client (#4207)
Add a notebook in the `experimental/` directory detailing:
- How to capture traces with the v2 endpoint
- How to create datasets
- How to run traces over the dataset
2023-05-08 18:33:19 -07:00
mbchang
f1401a6dff new example: two agent debate with tools (#4024) 2023-05-08 17:10:44 -07:00
玄猫
deffc65693 fix: vectorstore pgvector ensure compatibility #3884 (#4248)
Ensure compatibility with both SQLAlchemy v1/v2 

fix the issue when using SQLAlchemy v1 (reported at #3884)

`
langchain/vectorstores/pgvector.py", line 168, in
create_tables_if_not_exists
    self._conn.commit()
AttributeError: 'Connection' object has no attribute 'commit'
`

Ref Doc :
https://docs.sqlalchemy.org/en/14/changelog/migration_20.html#migration-20-autocommit
2023-05-08 16:43:50 -07:00
Davis Chase
ba0057c077 Check OpenAI model kwargs (#4366)
Handle duplicate and incorrectly specified OpenAI params

Thanks @PawelFaron for the fix! Made small update

Closes #4331

---------

Co-authored-by: PawelFaron <42373772+PawelFaron@users.noreply.github.com>
Co-authored-by: Pawel Faron <ext-pawel.faron@vaisala.com>
2023-05-08 16:37:34 -07:00
Davis Chase
02ebb15c4a Fix TextSplitter.from_tiktoken(#4361)
Thanks to @danb27 for the fix! Minor update

Fixes https://github.com/hwchase17/langchain/issues/4357

---------

Co-authored-by: Dan Bianchini <42096328+danb27@users.noreply.github.com>
2023-05-08 16:36:38 -07:00
Naveen Tatikonda
782df1db10 OpenSearch: Add Similarity Search with Score (#4089)
### Description
Add `similarity_search_with_score` method for OpenSearch to return
scores along with documents in the search results

Signed-off-by: Naveen Tatikonda <navtat@amazon.com>
2023-05-08 16:35:21 -07:00
Ankush Gola
b3ecce0545 fix json saving, update docs to reference anthropic chat model (#4364)
Fixes # (issue)
https://github.com/hwchase17/langchain/issues/4085
2023-05-08 15:30:52 -07:00
ImmortalZ
b04d84f6b3 fix: solve the infinite loop caused by 'add_memory' function when run… (#4318)
fix: solve the infinite loop caused by 'add_memory' function when run
'pause_to_reflect' function

run steps:
'add_memory' -> 'pause_to_reflect' -> 'add_memory':  infinite loop
2023-05-08 15:13:23 -07:00
Eugene Yurtsev
aa11f7c89b Add progress bar to filesystemblob loader, update pytest config for unit tests (#4212)
This PR adds:

* Option to show a tqdm progress bar when using the file system blob loader
* Update pytest run configuration to be stricter
* Adding a new marker that checks that required pkgs exist
2023-05-08 16:15:09 -04:00
Eduard van Valkenburg
f4c8502e61 fix for cosmos not loading old messages (#4094)
I noticed cosmos was not loading old messages properly, fixed now.
2023-05-08 12:48:15 -07:00
Simba Khadder
d84df25466 Add example on how to use Featureform with langchain (#4337)
Added an example on how to use Featureform to
connecting_to_a_feature_store.ipynb .
2023-05-08 10:32:17 -07:00
Harrison Chase
42df78d396 bump ver 162 (#4346) 2023-05-08 09:28:41 -07:00
Zander Chase
8b284f9ad0 Pass parsed inputs through to tool _run (#4309) 2023-05-08 09:13:05 -07:00
Zander Chase
35c9e6ab40 Pass Callbacks through load_tools (#4298)
- Update the load_tools method to properly accept `callbacks` arguments.
- Add a deprecation warning when `callback_manager` is passed
- Add two unit tests to check the deprecation warning is raised and to
confirm the callback is passed through.

Closes issue #4096
2023-05-08 08:44:26 -07:00
Zander Chase
0870a45a69 Add Pull Request Template (#4247) 2023-05-08 08:34:37 -07:00
Jinto Jose
8a338412fa mongodb support for chat history (#4266) 2023-05-08 08:34:05 -07:00
Harrison Chase
f510940bde add check for lower bound of lark (#4287) 2023-05-08 08:31:05 -07:00
Harrison Chase
c8b0b6e6c1 add youtube tools (#4320) 2023-05-08 08:29:30 -07:00
PawelFaron
1d1166ded6 Fixed huggingfacehub_api_token hadning in HuggingFaceEndpoint (#4335)
Reported here:
https://github.com/hwchase17/langchain/issues/4334

---------

Co-authored-by: Pawel Faron <ext-pawel.faron@vaisala.com>
2023-05-08 08:29:17 -07:00
Arjun Aravindan
637c61cffb Add support for passing binary_location to the SeleniumURLLoader when creating Chrome or Firefox web drivers (#4305)
This commit adds support for passing binary_location to the SeleniumURLLoader when creating Chrome or Firefox web drivers.

This allows users to specify the Browser binary location which is required when deploying to services such as Heroku

This change also includes updated documentation and type hints to reflect the new binary_location parameter and its usage.

fixes #4304
2023-05-08 11:05:55 -04:00
Lior Neudorfer
65c95f9fb2 Better error when running chain without any args (#4294)
Today, when running a chain without any arguments, the raised ValueError
incorrectly specifies that user provided "both positional arguments and
keyword arguments".

This PR adds a more accurate error in that case.
2023-05-07 21:11:51 -07:00
Harrison Chase
edcd171535 bring back ref (#4308) 2023-05-07 17:32:28 -07:00
Wuxian Zhang
6f386628c2 Permit unicode outputs when dumping json in GetElementsTool (#4276)
Adds ensure_ascii=False when dumping json in the GetElementsTool
Fixes issue https://github.com/hwchase17/langchain/issues/4265
2023-05-07 14:43:03 -07:00
Eugene Brodsky
a1001b29eb Incorrect docstring for PythonCodeTextSplitter (#4296)
Fixes a copy-paste error in the doctring
2023-05-07 14:04:54 -07:00
Ikko Eltociear Ashimine
f70e18a5b3 Fix typo in huggingface.py (#4277)
enviroment -> environment
2023-05-07 11:37:06 -04:00
Eugene Yurtsev
0c646bb703 Minor clean up in BlobParser (#4210)
Minor clean up to use `abstractmethod` and `ABC` instead of `abc.abstractmethod` and `abc.ABC`.
2023-05-07 11:32:53 -04:00
PawelFaron
04b74d0446 Adjusted GPT4All llm to streaming API and added support for GPT4All_J (#4131)
Fix for these issues:
https://github.com/hwchase17/langchain/issues/4126

https://github.com/hwchase17/langchain/issues/3839#issuecomment-1534258559

---------

Co-authored-by: Pawel Faron <ext-pawel.faron@vaisala.com>
2023-05-06 15:14:09 -07:00
Harrison Chase
075d9631f5 bump ver to 161 (#4239) 2023-05-06 10:20:36 -07:00
Harrison Chase
64940e9d0f docs for azure (#4238) 2023-05-06 10:16:00 -07:00
Myeongseop Kim
747b5f87c2 Add HumanInputLLM (#4160)
Related: #4028, I opened a new PR because (1) I was unable to unstage
mistakenly committed files (I'm not familiar with git enough to resolve
this issue), (2) I felt closing the original PR and opening a new PR
would be more appropriate if I changed the class name.

This PR creates HumanInputLLM(HumanLLM in #4028), a simple LLM wrapper
class that returns user input as the response. I also added a simple
Jupyter notebook regarding how and why to use this LLM wrapper. In the
notebook, I went over how to use this LLM wrapper and showed example of
testing `WikipediaQueryRun` using HumanInputLLM.
 
I believe this LLM wrapper will be useful especially for debugging,
educational or testing purpose.
2023-05-06 09:48:40 -07:00
Davis Chase
6cd51ef3d0 Simplify router chain constructor signatures (#4146) 2023-05-06 09:38:17 -07:00
玄猫
43a7a89e93 opt: document_loader notiondb to extract url (#4222) 2023-05-06 09:34:33 -07:00
Leonid Ganeline
9544b30821 added Wikipedia document loader (#4141)
- Added the `Wikipedia` document loader. It is based on the existing
`unilities/WikipediaAPIWrapper`
- Added a respective ut-s and example notebook
- Sorted list of classes in __init__
2023-05-06 09:32:45 -07:00
Eugene Yurtsev
423f497168 Add BlobParser abstraction (#3979)
This PR adds the BlobParser abstraction.

It follows the proposal described here:
https://github.com/hwchase17/langchain/pull/2833#issuecomment-1509097756
2023-05-05 21:43:38 -04:00
Davis Chase
5ca13cc1f0 Dev2049/pypdfium2 (#4209)
thanks @jerrytigerxu for the addition!

---------

Co-authored-by: Jere Xu <jtxu2008@gmail.com>
Co-authored-by: jerrytigerxu <jere.tiger.xu@gmailc.om>
2023-05-05 17:55:31 -07:00
Leonid Ganeline
59204a5033 docs: document_loaders improvements (#4200)
- made notebooks consistent: titles, service/format descriptions.
- corrected short names to full names, for example, `Word` -> `Microsoft
Word`
- added missed descriptions
- renamed notebook files to make ToC correctly sorted
2023-05-05 17:44:54 -07:00
Harrison Chase
eeb7c96e0c bump version to 160 (#4205) 2023-05-05 17:02:39 -07:00
Davis Chase
f1fc4dfebc Dev2049/obsidian patch (#4204)
thanks @shkarlsson for the fix! (just updated formatting)

---------

Co-authored-by: shkarlsson <sven.henrik.karlsson@gmail.com>
2023-05-05 16:49:19 -07:00
George
2324f19c85 Update qdrant interface (#3971)
Hello

1) Passing `embedding_function` as a callable seems to be outdated and
the common interface is to pass `Embeddings` instance

2) At the moment `Qdrant.add_texts` is designed to be used with
`embeddings.embed_query`, which is 1) slow 2) causes ambiguity due to 1.
It should be used with `embeddings.embed_documents`

This PR solves both problems and also provides some new tests
2023-05-05 16:46:40 -07:00
Harrison Chase
76ed41f48a update docs (#4194) 2023-05-05 16:45:26 -07:00
Zander Chase
1017e5cee2 Add LCP Client (#4198)
Adding a client to fetch datasets, examples, and runs from a LCP
instance and run objects over them.
2023-05-05 16:28:56 -07:00
Zander Chase
a30f42da4e Update V2 Tracer (#4193)
- Update the RunCreate object to work with recent changes
- Add optional Example ID to the tracer
- Adjust default persist_session behavior to attempt to load the session
if it exists
- Raise more useful HTTP errors for logging
- Add unit testing
- Fix the default ID to be a UUID for v2 tracer sessions


Broken out from the big draft here:
https://github.com/hwchase17/langchain/pull/4061
2023-05-05 14:55:01 -07:00
Mike Wang
c3044b1bf0 [test] Add integration_test for PandasAgent (#4056)
- confirm creation
- confirm functionality with a simple dimension check.

The test now is calling OpenAI API directly, but learning from
@vowelparrot that we’re caching the requests, so that it’s not that
expensive. I also found we’re calling OpenAI api in other integration
tests. Please lmk if there is any concern of real external API calls. I
can alternatively make a fake LLM for this test. Thanks
2023-05-05 14:49:02 -07:00
Aivin V. Solatorio
6567b73e1a JSON loader (#4067)
This implements a loader of text passages in JSON format. The `jq`
syntax is used to define a schema for accessing the relevant contents
from the JSON file. This requires dependency on the `jq` package:
https://pypi.org/project/jq/.

---------

Signed-off-by: Aivin V. Solatorio <avsolatorio@gmail.com>
2023-05-05 14:48:13 -07:00
PawelFaron
bb6d97c18c Fixed the example code (#4117)
Fixed the issue mentioned here:

https://github.com/hwchase17/langchain/issues/3799#issuecomment-1534785861

Co-authored-by: Pawel Faron <ext-pawel.faron@vaisala.com>
2023-05-05 14:22:10 -07:00
Anurag
19e28d8784 feat: Allow users to pass additional arguments to the WebDriver (#4121)
This commit adds support for passing additional arguments to the
`SeleniumURLLoader ` when creating Chrome or Firefox web drivers.
Previously, only a few arguments such as `headless` could be passed in.
With this change, users can pass any additional arguments they need as a
list of strings using the `arguments` parameter.

The `arguments` parameter allows users to configure the driver with any
options that are available for that particular browser. For example,
users can now pass custom `user_agent` strings or `proxy` settings using
this parameter.

This change also includes updated documentation and type hints to
reflect the new `arguments` parameter and its usage.

fixes #4120
2023-05-05 13:24:42 -07:00
hp0404
2a3c5f8353 Update WhatsAppChatLoader regex to handle multiple date-time formats (#4186)
This PR updates the `message_line_regex` used by `WhatsAppChatLoader` to
support different date-time formats used in WhatsApp chat exports;
resolves #4153.

The new regex handles the following input formats:
```terminal
[05.05.23, 15:48:11] James: Hi here
[11/8/21, 9:41:32 AM] User name: Message 123
1/23/23, 3:19 AM - User 2: Bye!
1/23/23, 3:22_AM - User 1: And let me know if anything changes
```

Tests have been added to verify that the loader works correctly with all
formats.
2023-05-05 13:13:05 -07:00
Nicolas
a57259ec83 docs: Mendable Fixes and Improvements (#4184)
Overall fixes and improvements.
2023-05-05 13:04:24 -07:00
Harrison Chase
7dcc698ebf bump version to 159 (#4183) 2023-05-05 09:31:08 -07:00
Harrison Chase
26534457f5 simplify csv args (#4182) 2023-05-05 09:22:08 -07:00
Eduard van Valkenburg
3095546851 PowerBI fix for table names with spaces (#4170)
small fix to make sure a table name with spaces is passed correctly to
the API for the schema lookup.
2023-05-05 09:15:47 -07:00
obbiondo
b1e2e29222 fix: remove expand parameter from ConfluenceLoader by label (#4181)
expand is not an allowed parameter for the method
confluence.get_all_pages_by_label, since it doesn't return the body of
the text but just metadata of documents

Co-authored-by: Andrea Biondo <a.biondo@reply.it>
2023-05-05 09:15:21 -07:00
Zander Chase
84cfa76e00 Update Cohere Reranker (#4180)
The forward ref annotations don't get updated if we only iimport with
type checking

---------

Co-authored-by: Abhinav Verma <abhinav_win12@yahoo.co.in>
2023-05-05 09:11:37 -07:00
Davis Chase
d84bb02881 Add Chroma self query (#4149)
Add internal query language -> chroma metadata filter translator
2023-05-05 08:43:08 -07:00
Vinoo Ganesh
905a2114d7 Fix: Typo in Docs (#4179)
Fixing small typo in docs
2023-05-05 08:35:49 -07:00
Ankush Gola
8de1b4c4c2 Revert "fix: #4128 missing run_manager parameter" (#4159)
Reverts hwchase17/langchain#4130
2023-05-05 00:52:16 -07:00
Chakib Ben Ziane
878d0c8155 fix: #4128 missing run_manager parameter (#4130)
`run_manager` was not being passed downstream. Not sure if this was a
deliberate choice but it seems like it broke many agent callbacks like
`agent_action` and `agent_finish`. This fix needs a proper review.

Co-authored-by: blob42 <spike@w530>
2023-05-04 23:59:55 -07:00
Zander Chase
6032a051e9 Add Tenant ID to V2 Tracer (#4135)
Update the V2 tracer to
- use UUIDs instead of int's
- load a tenant ID and use that when saving sessions
2023-05-04 21:35:20 -07:00
Zander Chase
fea639c1fc Vwp/sqlalchemy (#4145)
Bump threshold to 1.4 from 1.3. Change import to be compatible

Resolves #4142 and #4129

---------

Co-authored-by: ndaugreal <ndaugreal@gmail.com>
Co-authored-by: Jeremy Lopez <lopez86@users.noreply.github.com>
2023-05-04 20:46:38 -07:00
Zander Chase
2f087d63af Fix Python RePL Tool (#4137)
Filter out kwargs from inferred schema when determining if a tool is
single input.

Add a couple unit tests.

Move tool unit tests to the tools dir
2023-05-04 20:31:16 -07:00
Zander Chase
cc068f1b77 Add Issue Templates (#4021)
Add issue templates for
- bug reports
- feature suggestions
- documentation
and a link to the discord for general discussion.

Open to other suggestions here. Could also add another "Other" template
with just a raw text box if we think this is too restrictive


<img width="1464" alt="image"
src="https://user-images.githubusercontent.com/130414180/236115358-e603bcbe-282c-40c7-82eb-905eb93ccec0.png">
2023-05-04 16:33:52 -07:00
Zander Chase
ac0a9d02bd Visual Studio Code/Github Codespaces Dev Containers (#4035) (#4122)
Having dev containers makes its easier, faster and secure to setup the
dev environment for the repository.

The pull request consists of:

- .devcontainer folder with:
- **devcontainer.json :** (minimal necessary vscode extensions and
settings)
- **docker-compose.yaml :** (could be modified to run necessary services
as per need. Ex vectordbs, databases)
    - **Dockerfile:**(non root with dev tools)
- Changes to README - added the Open in Github Codespaces Badge - added
the Open in dev container Badge

Co-authored-by: Jinto Jose <129657162+jj701@users.noreply.github.com>
2023-05-04 11:37:00 -07:00
Harrison Chase
d86ed15d88 bump version to 158 (#4091) 2023-05-04 09:14:47 -07:00
OlajideOgun
624554a43a DeepLake: Pass in rest of args to self._search_helper (#4080)
As of right now when trying to use functions like
`max_marginal_relevance_search()` or
`max_marginal_relevance_search_by_vector()` the rest of the kwargs are
not propagated to `self._search_helper()`. For example a user cannot
explicitly state the distance_metric they want to use when calling
`max_marginal_relevance_search`
2023-05-04 02:14:22 -07:00
Eduard van Valkenburg
6d84541ff9 fix base url (#4095)
Noticed a mistake in the base url and group vs non-group urls
2023-05-04 02:08:21 -07:00
Harrison Chase
a9c2450330 Harrison/toml loader (#4090)
Co-authored-by: Mika Ayenson <Mikaayenson@users.noreply.github.com>
2023-05-03 23:14:39 -07:00
Harrison Chase
d4cf1eb60a Add firestore memory (#3792) (#3941)
If you have any other suggestions or feedback, please let me know.

---------

Co-authored-by: yakigac <10434946+yakigac@users.noreply.github.com>
2023-05-03 22:55:47 -07:00
Harrison Chase
fba6921b50 Harrison/one drive loader (#4081)
Co-authored-by: José Ferraz Neto <netoferraz@gmail.com>
2023-05-03 22:55:34 -07:00
golergka
bd277b5327 feat: prune summary buffer (#4004)
If the library user has to decrease the `max_token_limit`, he would
probably want to prune the summary buffer even though he haven't added
any new messages.

Personally, I need it because I want to serialise memory buffer object
and save to database, and when I load it, I may have re-configured my
code to have a shorter memory to save on tokens.
2023-05-03 22:45:48 -07:00
AndreLCanada
bf726f9d8a Update python_repl docs (#4012)
In the example for creating a Python REPL tool under the Agent module,
the ".run" was omitted in the example. I believe this is required when
defining a Tool.
2023-05-03 22:45:32 -07:00
Mike Wang
67db495fcf [agent] Add Spark Agent (#4020)
- added support for spark through pyspark library.
- added jupyter notebook as example.
2023-05-03 22:45:23 -07:00
Gengliang Wang
8af25867cb Simplify HumanMessages in the quick start guide (#4026)
In the section `Get Message Completions from a Chat Model` of the quick
start guide, the HumanMessage doesn't need to include `Translate this
sentence from English to French.` when there is a system message.

Simplify HumanMessages in these examples can further demonstrate the
power of LLM.
2023-05-03 22:45:03 -07:00
Harrison Chase
087a4bd2b8 improve agent documentation (#4062) 2023-05-03 22:44:01 -07:00
rogerserper
b1446bea5f google-serper: async + full json results + support for Google Images, Places and News (#4078)
* implemented arun, results, and aresults. Reuses aiosession if
available.
* helper tools GoogleSerperRun and GoogleSerperResults
* support for Google Images, Places and News (examples given) and
filtering based on time (e.g. past hour)
* updated docs
2023-05-03 22:35:48 -07:00
mbchang
cdea47491d refactor: refactor dialogue examples (DialogueAgent, DialogueSimulator) (#4074)
refactor dialogue examples to have same DialogueAgent and
DialogueSimulator definitions
2023-05-03 22:32:26 -07:00
Jan Philipp Harries
657f5f259f Added option to reduce verbosity of Deeplake integration (#4038)
The deeplake integration was/is very verbose (see e.g. [the
documentation
example](https://python.langchain.com/en/latest/use_cases/code/code-analysis-deeplake.html)
when loading or creating a deeplake dataset with only limited options to
dial down verbosity.

Additionally, the warning that a "Deep Lake Dataset already exists" was
confusing, as there is as far as I can tell no other way to load a
dataset.

This small PR changes that and introduces an explicit `verbose` argument
which is also passed to the deeplake library.

There should be minimal changes to the default output (the loading line
is printed instead of warned to make it consistent with `ds.summary()`
which also prints.
2023-05-03 22:16:27 -07:00
Davis Chase
7f8727bbcd Router chains (#4019)
Unpolished router examples to help flesh out abstractions and use cases 
![Screenshot 2023-05-02 at 7 02 58
PM](https://user-images.githubusercontent.com/130488702/235820394-389e5584-db0b-415e-a260-2824b5555167.png)

---------

Co-authored-by: Shreya Rajpal <shreya.rajpal@gmail.com>
2023-05-03 22:02:55 -07:00
Pulkit Mehta
bbbca10704 issue#4082 base_language had wrong code comment that it was using gpt… (#4084)
…3 to tokenize text instead of gpt-2

Co-authored-by: Pulkit <pulkit.mehta@catylex.com>
2023-05-03 21:58:29 -07:00
Leonid Ganeline
6caba8e759 docs: added a link to the Google Scholar articles (#4007)
Google Scholar outputs a nice list of scientific and research articles
that use LangChain.
I added a link to the Google Scholar page to the `gallery` doc page
2023-05-03 21:54:44 -07:00
obbiondo
d18e788ee3 bugfix: return whole document when loading with ConfluenceLoader.load by label (#3980)
Method confluence.get_all_pages_by_label, returns only metadata about
documents with a certain label (such as pageId, titles, ...). To return
all documents with a certain label we need to extract all page ids given
a certain label and get pages content by these ids.

---------

Co-authored-by: Andrea Biondo <a.biondo@reply.it>
2023-05-03 21:52:05 -07:00
Harrison Chase
5f30cc8713 Harrison/knn retriever (#4083)
Co-authored-by: Yuichi Tateno (secon) <hotchpotch@users.noreply.github.com>
2023-05-03 21:21:58 -07:00
Zander Chase
65c3b146c9 Accept str or list[str] for shell (#4060)
Relax the requirements
2023-05-03 21:11:06 -07:00
Harrison Chase
5a269d3175 Harrison/media wiki xml (#4072)
Co-authored-by: Géraud de Drouas <gdedrouas@users.noreply.github.com>
2023-05-03 20:45:33 -07:00
Zeeland
c186f18aab fix: incorrect data type when construct_path in chain (#4031)
A incorrect data type error happened when executing _construct_path in
`chain.py` as follows:

```python
Error with message replace() argument 2 must be str, not int
```

The path is always a string. But the result of `args.pop(param, "")` is
undefined.
2023-05-03 18:49:47 -07:00
engkheng
349ba88aee Export FileChatMessageHistory (#4042) 2023-05-03 18:14:47 -07:00
Nikolas Garske
1608f5dcae Remove pip stdout and fix typo (#4050) 2023-05-03 18:06:39 -07:00
Ivo Stranic
3b556eae44 Update deeplake example (#4055) 2023-05-03 18:03:51 -07:00
Steve Kim
9b830f437c Deleted importing Document from document_loaders.base because Documen… (#4068)
Hi,

- Modification:
https://python.langchain.com/en/latest/modules/indexes/document_loaders/examples/arxiv.html
- Reason: In this example, the first line is unnecessary because the
Document class does not exist in the base.
- Resolves: Issue #4052

--------
P.S: This pull-request is my first time, so please let me know if I need
to correct or write more explanation.
2023-05-03 17:54:30 -07:00
hp0404
374725a715 Refactor TelegramChatLoader and FacebookChatLoader classes and add tests (#3863)
This PR includes two main changes:

- Refactor the `TelegramChatLoader` and `FacebookChatLoader` classes by
removing the dependency on pandas and simplifying the message filtering
process.

- Add test cases for the `TelegramChatLoader` and `FacebookChatLoader`
classes. This test ensures that the class correctly loads and processes
the example chat data, providing better test coverage for this
functionality.
2023-05-03 15:59:19 -07:00
Jon Saginaw
ea64b1716d Enhancement: option to Get All Tokens with a single Blockchain Document Loader call (#3797)
The Blockchain Document Loader's default behavior is to return 100
tokens at a time which is the Alchemy API limit. The Document Loader
exposes a startToken that can be used for pagination against the API.

This enhancement includes an optional get_all_tokens param (default:
False) which will:

- Iterate over the Alchemy API until it receives all the tokens, and
return the tokens in a single call to the loader.
- Manage all/most tokenId formats (this can be int, hex16 with zero or
all the leading zeros). There aren't constraints as to how smart
contracts can represent this value, but these three are most common.

Note that a contract with 10,000 tokens will issue 100 calls to the
Alchemy API, and could take about a minute, which is why this param will
default to False. But I've been using the doc loader with these
utilities on the side, so figured it might make sense to build them in
for others to use.
2023-05-03 15:46:44 -07:00
Akash Sharma
525db1b6cb Fixed typo leading to broken link (#4034) 2023-05-03 14:45:54 -07:00
Zander Chase
afa9d1292b Re-Permit Partials in Tool (#4058)
Resolved issue #4053

Now that StructuredTool is a separate class, this constraint is no
longer needed.

Added/updated a unit test
2023-05-03 13:16:41 -07:00
Zander Chase
7e967aa4d5 Update Notebooks (#4051) 2023-05-03 09:31:02 -07:00
Nuno Campos
f3ec6d2449 Replace remaining usage of basellm with baselangmodel (#3981) 2023-05-02 21:52:29 -07:00
mbchang
f291fd7eed docs: remove stdout from pip install (for gymnasium) (#3993) 2023-05-02 21:51:40 -07:00
Harrison Chase
b67be55ab8 bump ver (#4018) 2023-05-02 19:02:02 -07:00
Harrison Chase
a5dd73c1a6 Revert "[agent][property type] Change allowed_tools to Set as Duplicate doesn’t make sense" (#4014)
Reverts hwchase17/langchain#3840
2023-05-02 18:58:05 -07:00
Davis Chase
df3bc707fc Dev2049/callback example fix (#4010)
Closes #3997

---------

Co-authored-by: Akshaj Jain <akshaj.jain@gmail.com>
2023-05-02 16:20:16 -07:00
Davis Chase
f08a76250f Better custom model handling OpenAICallbackHandler (#4009)
Thanks @maykcaldas for flagging! think this should resolve #3988. Let me
know if you still see issues after next release.
2023-05-02 16:19:57 -07:00
Zander Chase
aa38355999 Vwp/docs improved document loaders (#4006)
Huge thanks to @leo-gan for improving the document loaders notebooks

---------

Co-authored-by: Leonid Ganeline <leo.gan.57@gmail.com>
2023-05-02 15:24:53 -07:00
Zander Chase
1c68cbdb28 Fix typing of attribute (#3999) 2023-05-02 15:11:23 -07:00
MichaelMDowling
36ee60c96c Update \docs\modules\models\text_embedding\examples\openai.ipynb (#3976)
Single edit to: models/text_embedding/examples/openai.ipynb - Line 88:
changed from: "embeddings = OpenAIEmbeddings(model_name=\"ada\")" to
"embeddings = OpenAIEmbeddings()" as model_name is no longer part of the
OpenAIEmbeddings class.
2023-05-02 14:41:31 -07:00
Harrison Chase
e23391965b fix import (#4003) 2023-05-02 14:26:46 -07:00
Jinto Jose
013208cce6 Fix Documentation - Nomic - Atlas Jupyter Notebook (#3987)
Correction to Numic-Atlas Jupyter Notebook Docs
2023-05-02 14:20:01 -07:00
Ankush Gola
18f9d7b4f6 don't deepcopy handlers (#3995)
Co-authored-by: Sami Liedes <sami.liedes@iki.fi>
Co-authored-by: Sami Liedes <sami.liedes@rocket-science.ch>
2023-05-02 13:53:27 -07:00
Mike Wang
c26cf04110 [check] add import check and warning for pandas (#3944)
- as titled, add an `import` catch for pandas with a user suggestion
message.
2023-05-02 10:08:16 -07:00
Chop Tr
71a337dac6 Update output_fixing_parser.ipynb (#3978) 2023-05-02 09:33:46 -07:00
Ankush Gola
3bd5a99b83 v2 tracer with single runs endpoint (#3951) 2023-05-01 22:41:32 -07:00
Harrison Chase
8fcb56e74a bump version to 155 (#3943) 2023-05-01 22:05:52 -07:00
Harrison Chase
ca08a34a98 retry to parsing (#3696) 2023-05-01 22:05:42 -07:00
mbchang
3993166b5e docs: remove stdout from pip install (#3945) 2023-05-01 22:05:22 -07:00
Harrison Chase
2366e71bed Harrison/azure openai (#3942)
Co-authored-by: Saverio Proto <zioproto@gmail.com>
2023-05-01 21:34:16 -07:00
Harrison Chase
48ea27ba60 Harrison/blockwise sitemap (#3940)
Co-authored-by: Martin Holzhauer <martin@holzhauer.eu>
2023-05-01 21:34:07 -07:00
Harrison Chase
483fe257d9 bump timeout (#3939) 2023-05-01 21:33:57 -07:00
Jan Philipp Harries
fc3c2c4406 Async Support for LLMChainExtractor (new) (#3780)
@vowelparrot @hwchase17 Here a new implementation of
`acompress_documents` for `LLMChainExtractor ` without changes to the
sync-version, as you suggested in #3587 / [Async Support for
LLMChainExtractor](https://github.com/hwchase17/langchain/pull/3587) .

I created a new PR to avoid cluttering history with reverted commits,
hope that is the right way.
Happy for any improvements/suggestions.

(PS:
I also tried an alternative implementation with a nested helper function
like

``` python
  async def acompress_documents_old(
      self, documents: Sequence[Document], query: str
  ) -> Sequence[Document]:
      """Compress page content of raw documents."""
      async def _compress_concurrently(doc):
          _input = self.get_input(query, doc)
          output = await self.llm_chain.apredict_and_parse(**_input)
          return Document(page_content=output, metadata=doc.metadata)
      outputs=await asyncio.gather(*[_compress_concurrently(doc) for doc in documents])
      compressed_docs=list(filter(lambda x: len(x.page_content)>0,outputs))
      return compressed_docs
```

But in the end I found the commited version to be better readable and
more "canonical" - hope you agree.
2023-05-01 21:23:13 -07:00
Harrison Chase
2cecc572f9 Harrison/chroma get (#3938)
Co-authored-by: sdan <git@sdan.io>
2023-05-01 21:19:28 -07:00
liviuasnash1
6396a4ad8d Fix documentation typos (#3870)
Co-authored-by: Liviu Asnash <liviua@maximallearning.com>
2023-05-01 20:58:38 -07:00
Hristo Stoychev
109927cdb2 Make project compatible with SQLAlchemy 1.3.* (#3862)
Related to [this
issue.](https://github.com/hwchase17/langchain/issues/3655#issuecomment-1529415363)

The `Mapped` SQLAlchemy class is introduced in SQLAlchemy 1.4 but the
migration from 1.3 to 1.4 is quite challenging so, IMO, it's better to
keep backwards compatibility and not change the SQLAlchemy requirements
just because of type annotations.
2023-05-01 20:58:22 -07:00
sqr
8bbdde8f9e make ARG POETRY_HOME available in multistage (#3882) 2023-05-01 20:57:41 -07:00
玄猫
188a7bd653 fix: pgvector hang risk if table not exist #3883 (#3884) 2023-05-01 20:57:31 -07:00
tomer555
9acf80fd69 fix: invalid escape sequence error in regex pattern (#3902)
This PR fixes the "SyntaxError: invalid escape sequence" error in the
pydantic.py file. The issue was caused by the backslashes in the regular
expression pattern being treated as escape characters. By using a raw
string literal for the regex pattern (e.g., r"\{.*\}"), this fix ensures
that backslashes are treated as literal characters, thus preventing the
error.

Co-authored-by: Tomer Levy <tomer.levy@tipalti.com>
2023-05-01 20:57:19 -07:00
Samuel Dion-Girardeau
c5c33786a7 Fix bad spellings for 'convenience' (#3936)
Found in the docs for chat prompt templates:

https://python.langchain.com/en/latest/getting_started/getting_started.html#chat-prompt-templates

and fixed similar issues in neighboring notebooks.
2023-05-01 20:57:06 -07:00
Harrison Chase
f04faf8496 Harrison/spreedly (#3937)
Co-authored-by: Esmit Pérez <esmitperez@users.noreply.github.com>
2023-05-01 20:56:56 -07:00
Harrison Chase
cd3f8582cb Harrison/combined memory (#3935)
Co-authored-by: engkheng <60956360+outday29@users.noreply.github.com>
2023-05-01 20:55:56 -07:00
Zander Chase
c4cb55a0c5 [Breaking] Migrate GPT4All to use PyGPT4All (#3934)
Seems the pyllamacpp package is no longer the supported bindings from
gpt4all. Tested that this works locally.

Given that the older models weren't very performant, I think it's better
to migrate now without trying to include a lot of try / except blocks

---------

Co-authored-by: Nissan Pow <npow@users.noreply.github.com>
Co-authored-by: Nissan Pow <pownissa@amazon.com>
2023-05-01 20:42:45 -07:00
leo-gan
f0a4bbb8e2 updated YouTube links (#3916)
Added several links to fresh videos

Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
2023-05-01 20:39:59 -07:00
Mike Wang
68a18cc621 [simple] add ddg-search to __init__ for easier loading (#3933)
the same as other tools
2023-05-01 20:39:17 -07:00
Matt Robinson
c51dec5101 feat: add Unstructured API loaders (#3906)
### Summary

Adds `UnstructuredAPIFileLoaders` and `UnstructuredAPIFIleIOLoaders`
that partition documents through the Unstructured API. Defaults to the
URL for hosted Unstructured API, but can switch to a self hosted or
locally running API using the `url` kwarg. Currently, the Unstructured
API is open and does not require an API, but it will soon. A note was
added about that to the Unstructured ecosystem page.

### Testing


```python
from langchain.document_loaders import UnstructuredAPIFileIOLoader

filename = "fake-email.eml"

with open(filename, "rb") as f:
    loader = UnstructuredAPIFileIOLoader(file=f, file_filename=filename)
    docs = loader.load()

docs[0]
```

```python
from langchain.document_loaders import UnstructuredAPIFileLoader

filename = "fake-email.eml"
loader = UnstructuredAPIFileLoader(file_path=filename, mode="elements")
docs = loader.load()

docs[0]
```
2023-05-01 20:37:35 -07:00
Harrison Chase
13269fb583 Harrison/relevancy score (#3907)
Co-authored-by: Ryan Grippeling <R.Grippeling@hotmail.com>
Co-authored-by: Ryan <ryan@webgrip.nl>
Co-authored-by: Zander Chase <130414180+vowelparrot@users.noreply.github.com>
2023-05-01 20:37:24 -07:00
Zander Chase
c582f2e9e3 Add Structure Chat Agent (#3912)
Create a new chat agent that is compatible with the Multi-input tools
2023-05-01 20:34:50 -07:00
Mike Wang
ec21b7126c [agent][property type] Change allowed_tools to Set as Duplicate doesn’t make sense (#3840)
- ActionAgent has a property called, `allowed_tools`, which is declared
as `List`. It stores all provided tools which is available to use during
agent action.
- This collection shouldn’t allow duplicates. The original datatype List
doesn’t make sense. Each tool should be unique. Even when there are
variants (assuming in the future), it would be named differently in
load_tools.


Test:
- confirm the functionality in an example by initializing an agent with
a list of 2 tools and confirm everything works.
```python3
def test_agent_chain_chat_bot():
	from langchain.agents import load_tools
	from langchain.agents import initialize_agent
	from langchain.agents import AgentType
	from langchain.chat_models import ChatOpenAI
	from langchain.llms import OpenAI
	from langchain.utilities.duckduckgo_search import DuckDuckGoSearchAPIWrapper

	chat = ChatOpenAI(temperature=0)
	llm = OpenAI(temperature=0)
	tools = load_tools(["ddg-search", "llm-math"], llm=llm)

	agent = initialize_agent(tools, chat, agent=AgentType.CHAT_ZERO_SHOT_REACT_DESCRIPTION, verbose=True)
	agent.run("Who is Olivia Wilde's boyfriend? What is his current age raised to the 0.23 power?")
test_agent_chain_chat_bot()
```
Result:
<img width="863" alt="Screenshot 2023-05-01 at 7 58 11 PM"
src="https://user-images.githubusercontent.com/62768671/235572157-0937594c-ddfb-4760-acb2-aea4cacacd89.png">
2023-05-01 20:30:10 -07:00
Harrison Chase
c5cc09d4e3 Harrison/agent exec kwargs (#3917)
Co-authored-by: Zach Schillaci <40636930+zachschillaci27@users.noreply.github.com>
2023-05-01 20:28:43 -07:00
Harrison Chase
05170b6764 Harrison/from documents (#3919)
Co-authored-by: Gabriel Altay <gabriel.altay@gmail.com>
2023-05-01 20:28:14 -07:00
Davis Chase
e7e29f9937 Dev2049/add modern treasury (#3924)
Modified Modern Treasury and Strip slightly so credentials don't have to
be passed in explicitly. Thanks @mattgmarcus for adding Modern Treasury!

---------

Co-authored-by: Matt Marcus <matt.g.marcus@gmail.com>
2023-05-01 20:28:02 -07:00
Davis Chase
5db6b796cf Dev2049/hf emb encode kwargs (#3925)
Thanks @amogkam for the addition! Refactored slightly

---------

Co-authored-by: Amog Kamsetty <amogkam@users.noreply.github.com>
2023-05-01 20:27:41 -07:00
mbchang
ffc87233a1 refactor GymnasiumAgent (#3927)
refactor GymnasiumAgent (for single-agent environments) to be extensible
to PettingZooAgent (multi-agent environments)
2023-05-01 20:25:03 -07:00
mbchang
81601d886c new example: multi-agent simulations with environment (#3928) 2023-05-01 20:24:15 -07:00
Harrison Chase
f7a828685d Harrison/constitutional chain (#3931)
Co-authored-by: Sam Ching <samuel@duolingo.com>
2023-05-01 20:23:16 -07:00
Eduard van Valkenburg
43a0cb4b92 small change to allow powerbi tools to all have single inputs (#3864)
Small change in the tool input so that the single_input_tool function
works against all powerbi tools
2023-05-01 20:22:16 -07:00
Eduard van Valkenburg
c38cafd6c2 Add connection string auth to cosmos (#3867)
Adds a connection string option for the cosmos memory, in case AAD auth
is not enabled on the cosmos instance.
2023-05-01 20:21:46 -07:00
Venelin Valkov
bc7e4d5cd4 Add links to YouTube videos by Venelin Valkov (#3820)
Hi,
I've added links to my YouTube videos on LangChain. Thank you for
making/maintaining LangChain!
Venelin
2023-05-01 20:20:30 -07:00
Rafal Wojdyla
a5a4999fb7 New line should be remove only for the 1st gen embedding models (#3853)
Only 1st generation OpenAI embeddings models are negatively impacted by
new lines.

Context:
https://github.com/openai/openai-python/issues/418#issuecomment-1525939500
2023-05-01 20:09:20 -07:00
Johan Stenberg (MSFT)
6bd367916c Update adding_memory_chain_multiple_inputs.ipynb (#3895)
Fix misleading docs in memory chain example (used the term "outputs"
instead of "inputs")
2023-05-01 19:57:27 -07:00
Zander Chase
9b9b231e10 Update some Tools Docs (#3913)
Haven't gotten to all of them, but this:
- Updates some of the tools notebooks to actually instantiate a tool
(many just show a 'utility' rather than a tool. More changes to come in
separate PR)
- Move the `Tool` and decorator definitions to `langchain/tools/base.py`
(but still export from `langchain.agents`)
- Add scene explain to the load_tools() function
- Add unit tests for public apis for the langchain.tools and langchain.agents modules
2023-05-01 19:07:26 -07:00
Zander Chase
84ea17b786 Move Tool Validation (#3923)
Move tool validation to each implementation of the Agent.

Another alternative would be to adjust the `_validate_tools()` signature
to accept the output parser (and format instructions) and add logic
there. Something like

`parser.outputs_structured_actions(format_instructions)`

But don't think that's needed right now.
2023-05-01 18:44:24 -07:00
Eugene Yurtsev
7cce68a051 Add minimal file system blob loader (#3669)
This adds a minimal file system blob loader.

If looks good, this PR will be merged and a few additional enhancements will be made.
2023-05-01 21:37:26 -04:00
Bank Natchapol
487d4aeebd Motorhead Memory messages come in reversed order. (#3835)
History from Motorhead memory return in reversed order
It should be Human: 1, AI:..., Human: 2, Ai...

```
You are a chatbot having a conversation with a human.
AI: I'm sorry, I'm still not sure what you're trying to communicate. Can you please provide more context or information?
Human: 3
AI: I'm sorry, I'm not sure what you mean by "1" and "2". Could you please clarify your request or question?
Human: 2
AI: Hello, how can I assist you today?
Human: 1
Human: 4
AI:
```

So, i `reversed` the messages before putting in chat_memory.
2023-05-01 17:02:34 -07:00
Davis Chase
900ad106d3 Update google palm model signatures (#3920)
Signatures out of date after callback refactors
2023-05-01 16:19:31 -07:00
sherylZhaoCode
145ff23fb1 correct the llm type of AzureOpenAI (#3721)
The llm type of AzureOpenAI was previously set to default, which is
openai. But since AzureOpenAI has different API from openai, it creates
problems when doing chain saving and loading. This PR corrected the llm
type of AzureOpenAI to "azure"
2023-05-01 15:51:34 -07:00
engkheng
21335d43b2 Minor LLMChain docs correction (#3791)
`LLMChain` run method can take multiple input variables.
2023-05-01 15:50:57 -07:00
Rafal Wojdyla
039b672f46 Fixup OpenAI Embeddings - fix the weighted mean (#3778)
Re: https://github.com/hwchase17/langchain/issues/3777

Copy pasting from the issue:

While working on https://github.com/hwchase17/langchain/issues/3722 I
have noticed that there might be a bug in the current implementation of
the OpenAI length safe embeddings in `_get_len_safe_embeddings`, which
before https://github.com/hwchase17/langchain/issues/3722 was actually
the **default implementation** regardless of the length of the context
(via https://github.com/hwchase17/langchain/pull/2330).

It appears the weights used are constant and the length of the embedding
vector (1536) and NOT the number of tokens in the batch, as in the
reference implementation at
https://github.com/openai/openai-cookbook/blob/main/examples/Embedding_long_inputs.ipynb

<hr>

Here's some debug info:

<img width="1094" alt="image"
src="https://user-images.githubusercontent.com/1419010/235286595-a8b55298-7830-45df-b9f7-d2a2ad0356e0.png">

<hr>

We can also validate this against the reference implementation:

<details>

<summary>Reference implementation (click to unroll)</summary>

This implementation is copy pasted from
https://github.com/openai/openai-cookbook/blob/main/examples/Embedding_long_inputs.ipynb

```py
import openai
from itertools import islice
import numpy as np
from tenacity import retry, wait_random_exponential, stop_after_attempt, retry_if_not_exception_type


EMBEDDING_MODEL = 'text-embedding-ada-002'
EMBEDDING_CTX_LENGTH = 8191
EMBEDDING_ENCODING = 'cl100k_base'

# let's make sure to not retry on an invalid request, because that is what we want to demonstrate
@retry(wait=wait_random_exponential(min=1, max=20), stop=stop_after_attempt(6), retry=retry_if_not_exception_type(openai.InvalidRequestError))
def get_embedding(text_or_tokens, model=EMBEDDING_MODEL):
    return openai.Embedding.create(input=text_or_tokens, model=model)["data"][0]["embedding"]

def batched(iterable, n):
    """Batch data into tuples of length n. The last batch may be shorter."""
    # batched('ABCDEFG', 3) --> ABC DEF G
    if n < 1:
        raise ValueError('n must be at least one')
    it = iter(iterable)
    while (batch := tuple(islice(it, n))):
        yield batch
        
def chunked_tokens(text, encoding_name, chunk_length):
    encoding = tiktoken.get_encoding(encoding_name)
    tokens = encoding.encode(text)
    chunks_iterator = batched(tokens, chunk_length)
    yield from chunks_iterator


def reference_safe_get_embedding(text, model=EMBEDDING_MODEL, max_tokens=EMBEDDING_CTX_LENGTH, encoding_name=EMBEDDING_ENCODING, average=True):
    chunk_embeddings = []
    chunk_lens = []
    for chunk in chunked_tokens(text, encoding_name=encoding_name, chunk_length=max_tokens):
        chunk_embeddings.append(get_embedding(chunk, model=model))
        chunk_lens.append(len(chunk))

    if average:
        chunk_embeddings = np.average(chunk_embeddings, axis=0, weights=chunk_lens)
        chunk_embeddings = chunk_embeddings / np.linalg.norm(chunk_embeddings)  # normalizes length to 1
        chunk_embeddings = chunk_embeddings.tolist()
    return chunk_embeddings
```

</details>

```py
long_text = 'foo bar' * 5000

reference_safe_get_embedding(long_text, average=True)[:10]

# Here's the first 10 floats from the reference embeddings:
[0.004407593824276758,
 0.0017611146161865465,
 -0.019824815970984996,
 -0.02177626039794025,
 -0.012060967454897886,
 0.0017955296329155309,
 -0.015609168983609643,
 -0.012059823076681351,
 -0.016990468527792825,
 -0.004970484452089445]


# and now langchain implementation
from langchain.embeddings.openai import OpenAIEmbeddings
OpenAIEmbeddings().embed_query(long_text)[:10]

[0.003791506184693747,
 0.0025310066579390025,
 -0.019282322699514628,
 -0.021492679249899803,
 -0.012598522213242891,
 0.0022181168611315662,
 -0.015858940621301307,
 -0.011754004130791204,
 -0.016402944319627515,
 -0.004125287485127554]

# clearly they are different ^
```
2023-05-01 15:47:38 -07:00
Younis Shah
22a1896c30 [docs]: updates connecting_to_a_feature_store.ipynb (#3776)
* fixes `FeastPromptTemplate.format` example to use `driver_id`
2023-05-01 15:45:59 -07:00
Harrison Chase
e28c6403aa Harrison/cohere reranker (#3904) 2023-05-01 15:40:16 -07:00
Zura Isakadze
647bbf61c1 Add SQLiteChatMessageHistory (#3534)
It's based on already existing `PostgresChatMessageHistory`

Use case somewhere in between multiple files and Postgres storage.
2023-05-01 15:40:00 -07:00
James Brotchie
921894960b Add ChatModel, LLM, and Embeddings for Google's PaLM APIs (#3575)
- Add langchain.llms.GooglePalm for text completion,
 - Add langchain.chat_models.ChatGooglePalm for chat completion,
- Add langchain.embeddings.GooglePalmEmbeddings for sentence embeddings,
- Add example field to HumanMessage and AIMessage so that users can feed
in examples into the PaLM Chat API,
 - Add system and unit tests.

Note async completion for the Text API is not yet supported and will be
included in a future PR.

Happy for feedback on any aspect of this PR, especially our choice of
adding an example field to Human and AI Message objects to enable
passing example messages to the API.
2023-05-01 15:23:16 -07:00
Roma
d15f481352 Add unit test to output parsers (#3911)
This pull request adds unit tests for various output parsers
(BooleanOutputParser, CommaSeparatedListOutputParser, and
StructuredOutputParser) to ensure their correct functionality and to
increase code reliability and maintainability. The tests cover both
valid and invalid input cases.

Changes:

Added unit tests for BooleanOutputParser.
Added unit tests for CommaSeparatedListOutputParser.
Added unit tests for StructuredOutputParser.

Testing:

All new unit tests have been executed, and they pass successfully.
The overall test suite has been run, and all tests pass.
Notes:

These tests cover both successful parsing scenarios and error handling
for invalid inputs.
If any new output parsers are added in the future, corresponding unit
tests should also be created to maintain coverage.
2023-05-01 14:53:08 -07:00
Tim Asp
9c89ff8bd9 Increase request_timeout on ChatOpenAI (#3910)
With longer context and completions, gpt-3.5-turbo and, especially,
gpt-4, will more times than not take > 60seconds to respond.

Based on some other discussions, it seems like this is an increasingly
common problem, especially with summarization tasks.
- https://github.com/hwchase17/langchain/issues/3512
- https://github.com/hwchase17/langchain/issues/3005

OpenAI's max 600s timeout seems excessive, so I settled on 120, but I do
run into generations that take >240 seconds when using large prompts and
completions with GPT-4, so maybe 240 would be a better compromise?
2023-05-01 14:51:05 -07:00
Davis Chase
2451310975 Chroma fix mmr (#3897)
Fixes #3628, thanks @derekmoeller for the issue!
2023-05-01 10:47:15 -07:00
mbchang
3e1cb31f63 fix: add import for gymnasium (#3899) 2023-05-01 10:37:25 -07:00
Zander Chase
484707ad29 Add incremental messages token count (#3890) 2023-05-01 10:36:54 -07:00
Davis Chase
52e4fba897 Fix self query pinecone translation (#3892)
Enum to string conversion handled differently between python 3.9 and
3.11, currently breaking in 3.11 (see #3788). Thanks @peter-brady for
catching this!
2023-05-01 10:35:48 -07:00
Jef Packer
47a685adcf count tokens instead of chars in autogpt prompt (#3841)
This looks like a bug. 

Overall by using len instead of token_counter the prompt thinks it has
less context window than it actually does. Because of this it adds fewer
messages. The reduced previous message context makes the agent
repetitive when selecting tasks.
2023-05-01 09:21:42 -07:00
Nikolas Garske
c4d3d74148 Fix typos in arxiv.ipynb (#3887)
Several minor typos in the doc for the arxiv document loaders were
fixed.
2023-05-01 09:17:37 -07:00
Zander Chase
f7cb2af5f4 Export StructuredTool at /tools (#3858) 2023-04-30 19:22:21 -07:00
Ankush Gola
e87f81b3ec add more color to callbacks docs (#3856) 2023-04-30 19:13:01 -07:00
Zander Chase
19912d755e Vwp/arxiv (#3855)
Co-authored-by: Mike Wang <62768671+skcoirz@users.noreply.github.com>
2023-04-30 18:59:22 -07:00
Zander Chase
e17858470c Vwp/multi line input (#3854)
Co-authored-by: Paolo Rechia <paolorechia@gmail.com>
2023-04-30 18:59:11 -07:00
Harrison Chase
c896657d28 bump version to 154 (#3846) 2023-04-30 17:49:58 -07:00
Zander Chase
d7e17fc8fe Deprecate StdInquireTool (#3850)
- Deprecate StdInInquire tool (dup of HumanInputRun)
- Expose missing tools from `langchain.tools`
2023-04-30 16:55:50 -07:00
Zander Chase
b1d69d3e7a Vwp/fix vectorstore typing (#3851)
Co-authored-by: Jay Stakelon <stakes@users.noreply.github.com>
2023-04-30 16:45:10 -07:00
Zander Chase
fbbdf161cd Lambda Tool (#3842)
Co-authored-by: Jason Holtkamp <holtkam2@gmail.com>
2023-04-30 15:15:09 -07:00
Ankush Gola
d3ec00b566 Callbacks Refactor [base] (#3256)
Co-authored-by: Nuno Campos <nuno@boringbits.io>
Co-authored-by: Davis Chase <130488702+dev2049@users.noreply.github.com>
Co-authored-by: Zander Chase <130414180+vowelparrot@users.noreply.github.com>
Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
2023-04-30 11:14:09 -07:00
Zander Chase
18ec22fe56 Remove multi-input tool section (#3810)
Moving to new notebook. Will re-intro w/ new agent
2023-04-29 15:29:08 -07:00
mbchang
adcad98bee fix: fix filepath error in agent simulations docs (#3795) 2023-04-29 11:21:27 -07:00
Harrison Chase
20aad0bed1 stripe docs 2023-04-29 08:16:37 -07:00
Harrison Chase
378f0889eb bump version to 153 (#3774) 2023-04-29 07:31:35 -07:00
Sheldon
399065e858 update zilliz example (#3578)
1. Now the Zilliz example can't connect to Zilliz Cloud, fixed

Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
2023-04-28 22:10:13 -07:00
Harrison Chase
bd7e0a534c Harrison/csv loader (#3771)
Co-authored-by: mrT23 <tal.r@codium.ai>
2023-04-28 21:54:24 -07:00
Harrison Chase
c494ca3ad2 Harrison/doc2txt (#3772)
Co-authored-by: rishni ratnam <rishniratnam@gmail.com>
2023-04-28 21:54:16 -07:00
Mike Wang
ce4fea983b [simple] added test case and improve self class return type annotation (#3773)
a simple follow up of https://github.com/hwchase17/langchain/pull/3748
- added test case
- improve annotation when function return type is class itself.
2023-04-28 21:54:07 -07:00
Harrison Chase
0c0f14407c Harrison/tair (#3770)
Co-authored-by: Seth Huang <848849+seth-hg@users.noreply.github.com>
2023-04-28 21:25:33 -07:00
Aurélien SCHILTZ
502ba6a0be Fix type annotation for SQLDatabaseToolkit.llm (#3581)
Currently `langchain.agents.agent_toolkits.SQLDatabaseToolkit` has a
field `llm` with type `BaseLLM`. This breaks initialization for some
LLMs. For example, trying to use it with GPT4:
```

from langchain.sql_database import SQLDatabase
from langchain.chat_models import ChatOpenAI
from langchain.agents.agent_toolkits import SQLDatabaseToolkit


db = SQLDatabase.from_uri("some_db_uri")
llm = ChatOpenAI(model_name="gpt-4")
toolkit = SQLDatabaseToolkit(db=db, llm=llm)

# pydantic.error_wrappers.ValidationError: 1 validation error for SQLDatabaseToolkit
# llm
#  Can't instantiate abstract class BaseLLM with abstract methods _agenerate, _generate, _llm_type (type=type_error)
```
Seems like much of the rest of the codebase has switched from BaseLLM to
BaseLanguageModel. This PR makes the change for SQLDatabaseToolkit as
well
2023-04-28 21:19:01 -07:00
uyhcire
0a7a2b99b5 Fix Chroma integration failing when there are less than 4 items in the collection (#3674)
The code was failing to decrement the `n_results` kwarg passed to
`query(...)`
2023-04-28 21:18:19 -07:00
Rafal Wojdyla
57e028549a Expose kwargs in LLMChainExtractor.from_llm (#3748)
Re: https://github.com/hwchase17/langchain/issues/3747
2023-04-28 21:18:05 -07:00
Mike Wang
512c24fc9c [annotation improvement] Make AgentType->Class Conversion More Scalable (#3749)
In the current solution, AgentType and AGENT_TO_CLASS are placed in two
separate files and both manually maintained. This might cause
inconsistency when we update either of them.

— latest —
based on the discussion with hwchase17, we don’t know how to further use
the newly introduced AgentTypeConfig type, so it doesn’t make sense yet
to add it. Instead, it’s better to move the dictionary to another file
to keep the loading.py file clear. The consistency is a good point.
Instead of asserting the consistency during linting, we added a unittest
for consistency check. I think it works as auto unittest is triggered
every time with clear failure notice. (well, force push is possible, but
we all know what we are doing, so let’s show trust. :>)

~~This PR includes~~
- ~~Introduced AgentTypeConfig as the source of truth of all AgentType
related meta data.~~
- ~~Each AgentTypeConfig is a annotated class type which can be used for
annotation in other places.~~
- ~~Each AgentTypeConfig can be easily extended when we have more meta
data needs.~~
- ~~Strong assertion to ensure AgentType and AGENT_TO_CLASS are always
consistent.~~
- ~~Made AGENT_TO_CLASS automatically generated.~~

~~Test Plan:~~
- ~~since this change is focusing on annotation, lint is the major test
focus.~~
- ~~lint, format and test passed on local.~~
2023-04-28 21:17:28 -07:00
Harrison Chase
b7ae9f715d Langchain with reddit (#3661) (#3768)
I have added a reddit document loader which fetches the text from the
Posts of Subreddits or Reddit users, using the `praw` Python package. I
have also added an example notebook reddit.ipynb in order to guide users
to use this dataloader.
This code was made in format similar to twiiter document loader. I have
run code formating, linting and also checked the code myself for
different scenarios.

This is my first contribution to an open source project and I am really
excited about this. If you want to suggest some improvements in my code,
I will be happy to do it. :)

Co-authored-by: Taaha Bajwa <taaha.s.bajwa@gmail.com>
2023-04-28 20:59:56 -07:00
Kohei Kumazaki
fa4c35e9e5 Fix encoding issue in WebBaseLoader (#3602)
The character code mismatches occurred when character information was
not included in the response header (In my case, a Japanese web page).
I solved this issue by changing the encoding setting to
apparent_encoding.
2023-04-28 20:56:33 -07:00
Harrison Chase
be7a8e0824 Harrison/redis cache (#3766)
Co-authored-by: Tyler Hutcherson <tyler.hutcherson@redis.com>
2023-04-28 20:47:18 -07:00
Mike Wang
b588446bf9 [simple][test] Added test case for schema.py (#3692)
- added unittest for schema.py covering utility functions and token
counting.
- fixed a nit. based on huggingface doc, the tokenizer model is gpt-2.
[link](https://huggingface.co/transformers/v4.8.2/_modules/transformers/models/gpt2/tokenization_gpt2_fast.html)
- make lint && make format, passed on local
- screenshot of new test running result

<img width="1283" alt="Screenshot 2023-04-27 at 9 51 55 PM"
src="https://user-images.githubusercontent.com/62768671/235057441-c0ac3406-9541-453f-ba14-3ebb08656114.png">
2023-04-28 20:42:24 -07:00
Harrison Chase
15b92d361d Harrison/confluence stuff (#3765)
Co-authored-by: Jelmer Borst <japborst@gmail.com>
2023-04-28 20:19:44 -07:00
SimFG
5998b53596 Use the GPTCache api interface (#3693)
Use the GPTCache api interface to reduce the possibility of
compatibility issues
2023-04-28 20:18:51 -07:00
engkheng
f37a932b24 Improve chat prompt template docs (#3719)
Add a few more explanations and examples.
2023-04-28 20:16:22 -07:00
Robert Perrotta
22770f5202 Make StuffDocumentsChain doc separator configurable (#3718)
This PR makes the `"\n\n"` string with which `StuffDocumentsChain` joins
formatted documents a property so it can be configured. The new
`document_separator` property defaults to `"\n\n"` so the change is
backwards compatible.
2023-04-28 20:14:07 -07:00
Akhil Vempali
64ba24292d fix: 🐛 SQLAlchemy import error (#3716)
During the import of langchain, SQLAlchemy was throeing an errror
`ImportError: cannot import name 'Mapped' from 'sqlalchemy.orm'`. This
is becaue the Mapped name was introduced in v1.4
2023-04-28 20:13:32 -07:00
Jon Saginaw
f8d69e4e52 Enhancement: Blockchain Document Loader with better Metadata support (#3710)
This PR includes some minor alignment updates, including:

- metadata object extended to support contractAddress, blockchainType,
and tokenId
- notebook doc better aligned to standard langchain format
- startToken changed from int to str to support multiple hex value types
on the Alchemy API

The updated metadata will look like the below. It's possible for a
single contractAddress to exist across multiple blockchains (e.g.
Ethereum, Polygon, etc.) so it's important to include the
blockchainType.

```
 metadata = {"source": self.contract_address, 
                      "blockchain": self.blockchainType,
                      "tokenId": tokenId}
```
2023-04-28 20:13:05 -07:00
Davis Chase
220a7076ac Add Mathpix pdf loader (#3727)
Inspo
https://twitter.com/danielgross/status/1651695062307274754?s=46&t=1zHLap5WG4I_kQPPjfW9fA

Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
2023-04-28 20:11:22 -07:00
Rafal Wojdyla
37ed6f2177 Handle length safe embedding only if needed (#3723)
Re: https://github.com/hwchase17/langchain/issues/3722

Copy pasting context from the issue:


1bf1c37c0c/langchain/embeddings/openai.py (L210-L211)

Means that the length safe embedding method is "always" used, initial
implementation https://github.com/hwchase17/langchain/pull/991 has the
`embedding_ctx_length` set to -1 (meaning you had to opt-in for the
length safe method), https://github.com/hwchase17/langchain/pull/2330
changed that to max length of OpenAI embeddings v2, meaning the length
safe method is used at all times.

How about changing that if branch to use length safe method only when
needed, meaning when the text is longer than the max context length?
2023-04-28 20:10:04 -07:00
Harrison Chase
40f6e60e68 Harrison/stripe (#3762)
Co-authored-by: Ismail Pelaseyed <homanp@gmail.com>
2023-04-28 20:03:21 -07:00
Jelmer Borst
8cf2ff0be0 Confluence: Add page status filter for spaces (#3732)
At the moment all content in Confluence is retrieved by default,
including archived content.

Often, this is undesired as the content is not relevant anymore.

**Notes**
Fetching pages by label does not support excluding archived content.
This may lead to unexpected results.
2023-04-28 19:56:53 -07:00
Harrison Chase
7a129ac043 Harrison/pypdf loader (#3764)
Co-authored-by: Felipe Meres <felipe@felipemeres.com>
2023-04-28 19:56:21 -07:00
mbchang
4eefea0fe8 new example: single agent, simulated environment (openai gym) (#3758)
For many applications of LLM agents, the environment is real (internet,
database, REPL, etc). However, we can also define agents to interact in
simulated environments like text-based games. This is an example of how
to create a simple agent-environment interaction loop with
[Gymnasium](https://github.com/Farama-Foundation/Gymnasium) (formerly
[OpenAI Gym](https://github.com/openai/gym)).
2023-04-28 19:52:05 -07:00
0xDTE
6ce34bb4fe Fixing broken document links (#3756)
simple document url fixes. nothing fancy.
2023-04-28 19:51:23 -07:00
Rafal Wojdyla
160bfae93f Add DocstoreFn - lookup doc via arbitrary function (#3760)
This **partially** addresses
https://github.com/hwchase17/langchain/issues/1524, but it's also useful
for some of our use cases.

This `DocstoreFn` allows to lookup a document given a function that
accepts the `search` string without the need to implement a custom
`Docstore`.

This could be useful when:
* you don't want to implement a `Docstore` just to provide a custom
`search`
 * it's expensive to construct an `InMemoryDocstore`/dict
 * you retrieve documents from remote sources
 * you just want to reuse existing objects
2023-04-28 19:50:32 -07:00
Harrison Chase
c55ba43093 Harrison/vespa (#3761)
Co-authored-by: Lester Solbakken <lesters@users.noreply.github.com>
2023-04-28 19:48:43 -07:00
mbchang
ee20b3e0d0 bug fix: initialize the arxivAPIWrapper object (#3733) 2023-04-28 19:35:01 -07:00
leo-gan
e510732ad2 docs: improved vectorstore notebooks (#3724)
- Added links to the vectorstore providers
- Added installation code (it is not clear that we have to go to the
`LangChan Ecosystem` page to get installation instructions.)
2023-04-28 19:26:50 -07:00
BioErrorLog
ad4eae7ef0 Fix linting on the Quickstart Guide sample codes (#3701)
When copying and pasting the sample code from the Quickstart Guide, lint
errors ("missing whitespace around operator") occur."
2023-04-28 17:29:05 -07:00
Zander Chase
a46f1d830e Synchronous Browser (#3745)
Split out sync methods in playwright
2023-04-28 17:09:00 -07:00
Zander Chase
6c2b16e465 Add SceneXplain Tool (#3752) 2023-04-28 17:01:54 -07:00
erwanlc
72c5c15f7f Fix: Updated links for in depth explanation of chain types in the Question Answering notebooks (#3714)
In the notebook question_answering.ipynb
([link](https://github.com/hwchase17/langchain/blob/master/docs/modules/chains/index_examples/question_answering.ipynb)),
and the notebook qa_with_sources.ipynb
([link](https://github.com/hwchase17/langchain/blob/master/docs/modules/chains/index_examples/qa_with_sources.ipynb)),
the first paragraph contains a dead link:

> This notebook walks through how to use LangChain for question
answering over a list of documents. It covers four different types of
chains: stuff, map_reduce, refine, map_rerank. For a more in depth
explanation of what these chain types are, see
[here](32793f94fd/docs/modules/chains/combine_docs.md).

The file combine_docs.md doesn't exist anymore and thus provide 404 -
Page not found.

I updated the links so it redirect to
https://docs.langchain.com/docs/components/chains/index_related_chains
as in the summarize notebook
([link](https://github.com/hwchase17/langchain/blob/master/docs/modules/chains/index_examples/summarize.ipynb))
present in the same folder.
2023-04-28 15:06:46 -07:00
Alan Cha
e3b7a20454 Fix typo (#3728) 2023-04-28 13:01:09 -07:00
Zander Chase
5042bd40d3 Add Shell Tool (#3335)
Create an official bash shell tool to replace the dynamically generated one
2023-04-28 11:10:43 -07:00
Zander Chase
334c162f16 Add Other File Utilities (#3209)
Add other File Utilities, include
- List Directory
- Search for file
- Move
- Copy
- Remove file

Bundle as toolkit
Add a notebook that connects to the Chat Agent, which somewhat supports
multi-arg input tools
Update original read/write files to return the original dir paths and
better handle unsupported file paths.
Add unit tests
2023-04-28 10:53:37 -07:00
Zander Chase
491c27f861 PlayWright Web Browser Toolkit (#3262)
Adds a PlayWright web browser toolkit with the following tools:

- NavigateTool (navigate_browser) - navigate to a URL
- NavigateBackTool (previous_page) - wait for an element to appear
- ClickTool (click_element) - click on an element (specified by
selector)
- ExtractTextTool (extract_text) - use beautiful soup to extract text
from the current web page
- ExtractHyperlinksTool (extract_hyperlinks) - use beautiful soup to
extract hyperlinks from the current web page
- GetElementsTool (get_elements) - select elements by CSS selector
- CurrentPageTool (current_page) - get the current page URL
2023-04-28 10:42:44 -07:00
Zander Chase
da7b51455c Dynamic tool -> single purpose (#3697)
I think the logic of
https://github.com/hwchase17/langchain/pull/3684#pullrequestreview-1405358565
is too confusing.

I prefer this alternative because:
- All `Tool()` implementations by default will be treated the same as
before. No breaking changes.
- Less reliance on pydantic magic
- The decorator (which only is typed as returning a callable) can infer
schema and generate a structured tool
- Either way, the recommended way to create a custom tool is through
inheriting from the base tool
2023-04-28 09:38:41 -07:00
Zach Schillaci
1bf1c37c0c Update VectorDBQA to RetrievalQA in tools (#3698)
Because `VectorDBQA` and `VectorDBQAWithSourcesChain` are deprecated
2023-04-28 07:39:59 -07:00
Harrison Chase
32793f94fd bump version to 152 (#3695) 2023-04-28 00:21:53 -07:00
mbchang
1da3ee1386 Multiagent authoritarian (#3686)
This notebook showcases how to implement a multi-agent simulation where
a privileged agent decides who to speak.
This follows the polar opposite selection scheme as [multi-agent
decentralized speaker
selection](https://python.langchain.com/en/latest/use_cases/agent_simulations/multiagent_bidding.html).

We show an example of this approach in the context of a fictitious
simulation of a news network. This example will showcase how we can
implement agents that
- think before speaking
- terminate the conversation
2023-04-27 23:33:29 -07:00
Zander Chase
4654c58f72 Add validation on agent instantiation for multi-input tools (#3681)
Tradeoffs here:
- No lint-time checking for compatibility
- Differs from JS package
- The signature inference, etc. in the base tool isn't simple
- The `args_schema` is optional 

Pros:
- Forwards compatibility retained
- Doesn't break backwards compatibility
- User doesn't have to think about which class to subclass (single base
tool or dynamic `Tool` interface regardless of input)
-  No need to change the load_tools, etc. interfaces

Co-authored-by: Hasan Patel <mangafield@gmail.com>
2023-04-27 15:36:11 -07:00
1387 changed files with 113755 additions and 15149 deletions

42
.devcontainer/Dockerfile Normal file
View File

@@ -0,0 +1,42 @@
# This is a Dockerfile for Developer Container
# Use the Python base image
ARG VARIANT="3.11-bullseye"
FROM mcr.microsoft.com/vscode/devcontainers/python:0-${VARIANT} AS langchain-dev-base
USER vscode
# Define the version of Poetry to install (default is 1.4.2)
# Define the directory of python virtual environment
ARG PYTHON_VIRTUALENV_HOME=/home/vscode/langchain-py-env \
POETRY_VERSION=1.4.2
ENV POETRY_VIRTUALENVS_IN_PROJECT=false \
POETRY_NO_INTERACTION=true
# Create a Python virtual environment for Poetry and install it
RUN python3 -m venv ${PYTHON_VIRTUALENV_HOME} && \
$PYTHON_VIRTUALENV_HOME/bin/pip install --upgrade pip && \
$PYTHON_VIRTUALENV_HOME/bin/pip install poetry==${POETRY_VERSION}
ENV PATH="$PYTHON_VIRTUALENV_HOME/bin:$PATH" \
VIRTUAL_ENV=$PYTHON_VIRTUALENV_HOME
# Setup for bash
RUN poetry completions bash >> /home/vscode/.bash_completion && \
echo "export PATH=$PYTHON_VIRTUALENV_HOME/bin:$PATH" >> ~/.bashrc
# Set the working directory for the app
WORKDIR /workspaces/langchain
# Use a multi-stage build to install dependencies
FROM langchain-dev-base AS langchain-dev-dependencies
ARG PYTHON_VIRTUALENV_HOME
# Copy only the dependency files for installation
COPY pyproject.toml poetry.lock poetry.toml ./
# Install the Poetry dependencies (this layer will be cached as long as the dependencies don't change)
RUN poetry install --no-interaction --no-ansi --with dev,test,docs

View File

@@ -0,0 +1,33 @@
// For format details, see https://aka.ms/devcontainer.json. For config options, see the
// README at: https://github.com/devcontainers/templates/tree/main/src/docker-existing-dockerfile
{
"dockerComposeFile": "./docker-compose.yaml",
"service": "langchain",
"workspaceFolder": "/workspaces/langchain",
"name": "langchain",
"customizations": {
"vscode": {
"extensions": [
"ms-python.python"
],
"settings": {
"python.defaultInterpreterPath": "/home/vscode/langchain-py-env/bin/python3.11"
}
}
},
// Features to add to the dev container. More info: https://containers.dev/features.
"features": {},
// Use 'forwardPorts' to make a list of ports inside the container available locally.
// "forwardPorts": [],
// Uncomment the next line to run commands after the container is created.
// "postCreateCommand": "cat /etc/os-release",
// Uncomment to connect as an existing user other than the container default. More info: https://aka.ms/dev-containers-non-root.
// "remoteUser": "devcontainer"
"remoteUser": "vscode",
"overrideCommand": true
}

View File

@@ -0,0 +1,31 @@
version: '3'
services:
langchain:
build:
dockerfile: .devcontainer/Dockerfile
context: ../
volumes:
- ../:/workspaces/langchain
networks:
- langchain-network
# environment:
# MONGO_ROOT_USERNAME: root
# MONGO_ROOT_PASSWORD: example123
# depends_on:
# - mongo
# mongo:
# image: mongo
# restart: unless-stopped
# environment:
# MONGO_INITDB_ROOT_USERNAME: root
# MONGO_INITDB_ROOT_PASSWORD: example123
# ports:
# - "27017:27017"
# networks:
# - langchain-network
networks:
langchain-network:
driver: bridge

View File

@@ -2,60 +2,62 @@
Hi there! Thank you for even being interested in contributing to LangChain.
As an open source project in a rapidly developing field, we are extremely open
to contributions, whether it be in the form of a new feature, improved infra, or better documentation.
to contributions, whether they be in the form of new features, improved infra, better documentation, or bug fixes.
## 🗺️ Guidelines
### 👩‍💻 Contributing Code
To contribute to this project, please follow a ["fork and pull request"](https://docs.github.com/en/get-started/quickstart/contributing-to-projects) workflow.
Please do not try to push directly to this repo unless you are maintainer.
## 🗺Contributing Guidelines
Please follow the checked-in pull request template when opening pull requests. Note related issues and tag relevant
maintainers.
Pull requests cannot land without passing the formatting, linting and testing checks first. See
[Common Tasks](#-common-tasks) for how to run these checks locally.
It's essential that we maintain great documentation and testing. If you:
- Fix a bug
- Add a relevant unit or integration test when possible. These live in `tests/unit_tests` and `tests/integration_tests`.
- Make an improvement
- Update any affected example notebooks and documentation. These lives in `docs`.
- Update unit and integration tests when relevant.
- Add a feature
- Add a demo notebook in `docs/modules`.
- Add unit and integration tests.
We're a small, building-oriented team. If there's something you'd like to add or change, opening a pull request is the
best way to get our attention.
### 🚩GitHub Issues
Our [issues](https://github.com/hwchase17/langchain/issues) page is kept up to date
with bugs, improvements, and feature requests. There is a taxonomy of labels to help
with sorting and discovery of issues of interest. These include:
with bugs, improvements, and feature requests.
- prompts: related to prompt tooling/infra.
- llms: related to LLM wrappers/tooling/infra.
- chains
- utilities: related to different types of utilities to integrate with (Python, SQL, etc.).
- agents
- memory
- applications: related to example applications to build
There is a taxonomy of labels to help with sorting and discovery of issues of interest. Please use these to help
organize issues.
If you start working on an issue, please assign it to yourself.
If you are adding an issue, please try to keep it focused on a single modular bug/improvement/feature.
If the two issues are related, or blocking, please link them rather than keep them as one single one.
If you are adding an issue, please try to keep it focused on a single, modular bug/improvement/feature.
If two issues are related, or blocking, please link them rather than combining them.
We will try to keep these issues as up to date as possible, though
with the rapid rate of develop in this field some may get out of date.
If you notice this happening, please just let us know.
If you notice this happening, please let us know.
### 🙋Getting Help
Although we try to have a developer setup to make it as easy as possible for others to contribute (see below)
it is possible that some pain point may arise around environment setup, linting, documentation, or other.
Should that occur, please contact a maintainer! Not only do we want to help get you unblocked,
but we also want to make sure that the process is smooth for future contributors.
Our goal is to have the simplest developer setup possible. Should you experience any difficulty getting setup, please
contact a maintainer! Not only do we want to help get you unblocked, but we also want to make sure that the process is
smooth for future contributors.
In a similar vein, we do enforce certain linting, formatting, and documentation standards in the codebase.
If you are finding these difficult (or even just annoying) to work with,
feel free to contact a maintainer for help - we do not want these to get in the way of getting
good code into the codebase.
If you are finding these difficult (or even just annoying) to work with, feel free to contact a maintainer for help -
we do not want these to get in the way of getting good code into the codebase.
### 🏭Release process
As of now, LangChain has an ad hoc release process: releases are cut with high frequency by
a developer and published to [PyPI](https://pypi.org/project/langchain/).
LangChain follows the [semver](https://semver.org/) versioning standard. However, as pre-1.0 software,
even patch releases may contain [non-backwards-compatible changes](https://semver.org/#spec-item-4).
If your contribution has made its way into a release, we will want to give you credit on Twitter (only if you want though)!
If you have a Twitter account you would like us to mention, please let us know in the PR or in another manner.
## 🚀Quick Start
## 🚀 Quick Start
This project uses [Poetry](https://python-poetry.org/) as a dependency manager. Check out Poetry's [documentation on how to install it](https://python-poetry.org/docs/#installation) on your system before proceeding.
@@ -77,7 +79,7 @@ This will install all requirements for running the package, examples, linting, f
Now, you should be able to run the common tasks in the following section. To double check, run `make test`, all tests should pass. If they don't you may need to pip install additional dependencies, such as `numexpr` and `openapi_schema_pydantic`.
## ✅Common Tasks
## ✅ Common Tasks
Type `make` for a list of common tasks.
@@ -113,8 +115,37 @@ To get a report of current coverage, run the following:
make coverage
```
### Working with Optional Dependencies
Langchain relies heavily on optional dependencies to keep the Langchain package lightweight.
If you're adding a new dependency to Langchain, assume that it will be an optional dependency, and
that most users won't have it installed.
Users that do not have the dependency installed should be able to **import** your code without
any side effects (no warnings, no errors, no exceptions).
To introduce the dependency to the pyproject.toml file correctly, please do the following:
1. Add the dependency to the main group as an optional dependency
```bash
poetry add --optional [package_name]
```
2. Open pyproject.toml and add the dependency to the `extended_testing` extra
3. Relock the poetry file to update the extra.
```bash
poetry lock --no-update
```
4. Add a unit test that the very least attempts to import the new code. Ideally the unit
test makes use of lightweight fixtures to test the logic of the code.
5. Please use the `@pytest.mark.requires(package_name)` decorator for any tests that require the dependency.
### Testing
See section about optional dependencies.
#### Unit Tests
Unit tests cover modular logic that does not require calls to outside APIs.
To run unit tests:
@@ -131,8 +162,20 @@ make docker_tests
If you add new logic, please add a unit test.
#### Integration Tests
Integration tests cover logic that requires making calls to outside APIs (often integration with other services).
**warning** Almost no tests should be integration tests.
Tests that require making network connections make it difficult for other
developers to test the code.
Instead favor relying on `responses` library and/or mock.patch to mock
requests using small fixtures.
To run integration tests:
```bash
@@ -188,3 +231,17 @@ Finally, you can build the documentation as outlined below:
```bash
make docs_build
```
## 🏭 Release Process
As of now, LangChain has an ad hoc release process: releases are cut with high frequency by
a developer and published to [PyPI](https://pypi.org/project/langchain/).
LangChain follows the [semver](https://semver.org/) versioning standard. However, as pre-1.0 software,
even patch releases may contain [non-backwards-compatible changes](https://semver.org/#spec-item-4).
### 🌟 Recognition
If your contribution has made its way into a release, we will want to give you credit on Twitter (only if you want though)!
If you have a Twitter account you would like us to mention, please let us know in the PR or in another manner.

106
.github/ISSUE_TEMPLATE/bug-report.yml vendored Normal file
View File

@@ -0,0 +1,106 @@
name: "\U0001F41B Bug Report"
description: Submit a bug report to help us improve LangChain
labels: ["02 Bug Report"]
body:
- type: markdown
attributes:
value: >
Thank you for taking the time to file a bug report. Before creating a new
issue, please make sure to take a few moments to check the issue tracker
for existing issues about the bug.
- type: textarea
id: system-info
attributes:
label: System Info
description: Please share your system info with us.
placeholder: LangChain version, platform, python version, ...
validations:
required: true
- type: textarea
id: who-can-help
attributes:
label: Who can help?
description: |
Your issue will be replied to more quickly if you can figure out the right person to tag with @
If you know how to use git blame, that is the easiest way, otherwise, here is a rough guide of **who to tag**.
The core maintainers strive to read all issues, but tagging them will help them prioritize.
Please tag fewer than 3 people.
@hwchase17 - project lead
Tracing / Callbacks
- @agola11
Async
- @agola11
DataLoader Abstractions
- @eyurtsev
LLM/Chat Wrappers
- @hwchase17
- @agola11
Tools / Toolkits
- ...
placeholder: "@Username ..."
- type: checkboxes
id: information-scripts-examples
attributes:
label: Information
description: "The problem arises when using:"
options:
- label: "The official example notebooks/scripts"
- label: "My own modified scripts"
- type: checkboxes
id: related-components
attributes:
label: Related Components
description: "Select the components related to the issue (if applicable):"
options:
- label: "LLMs/Chat Models"
- label: "Embedding Models"
- label: "Prompts / Prompt Templates / Prompt Selectors"
- label: "Output Parsers"
- label: "Document Loaders"
- label: "Vector Stores / Retrievers"
- label: "Memory"
- label: "Agents / Agent Executors"
- label: "Tools / Toolkits"
- label: "Chains"
- label: "Callbacks/Tracing"
- label: "Async"
- type: textarea
id: reproduction
validations:
required: true
attributes:
label: Reproduction
description: |
Please provide a [code sample](https://stackoverflow.com/help/minimal-reproducible-example) that reproduces the problem you ran into. It can be a Colab link or just a code snippet.
If you have code snippets, error messages, stack traces please provide them here as well.
Important! Use code tags to correctly format your code. See https://help.github.com/en/github/writing-on-github/creating-and-highlighting-code-blocks#syntax-highlighting
Avoid screenshots when possible, as they are hard to read and (more importantly) don't allow others to copy-and-paste your code.
placeholder: |
Steps to reproduce the behavior:
1.
2.
3.
- type: textarea
id: expected-behavior
validations:
required: true
attributes:
label: Expected behavior
description: "A clear and concise description of what you would expect to happen."

6
.github/ISSUE_TEMPLATE/config.yml vendored Normal file
View File

@@ -0,0 +1,6 @@
blank_issues_enabled: true
version: 2.1
contact_links:
- name: Discord
url: https://discord.gg/6adMQxSpJS
about: General community discussions

View File

@@ -0,0 +1,19 @@
name: Documentation
description: Report an issue related to the LangChain documentation.
title: "DOC: <Please write a comprehensive title after the 'DOC: ' prefix>"
labels: [03 - Documentation]
body:
- type: textarea
attributes:
label: "Issue with current documentation:"
description: >
Please make sure to leave a reference to the document/code you're
referring to.
- type: textarea
attributes:
label: "Idea or request for content:"
description: >
Please describe as clearly as possible what topics you think are missing
from the current documentation.

View File

@@ -0,0 +1,30 @@
name: "\U0001F680 Feature request"
description: Submit a proposal/request for a new LangChain feature
labels: ["02 Feature Request"]
body:
- type: textarea
id: feature-request
validations:
required: true
attributes:
label: Feature request
description: |
A clear and concise description of the feature proposal. Please provide links to any relevant GitHub repos, papers, or other resources if relevant.
- type: textarea
id: motivation
validations:
required: true
attributes:
label: Motivation
description: |
Please outline the motivation for the proposal. Is your feature request related to a problem? e.g., I'm always frustrated when [...]. If this is related to another GitHub issue, please link here too.
- type: textarea
id: contribution
validations:
required: true
attributes:
label: Your contribution
description: |
Is there any way that you could help, e.g. by submitting a PR? Make sure to read the CONTRIBUTING.MD [readme](https://github.com/hwchase17/langchain/blob/master/.github/CONTRIBUTING.md)

18
.github/ISSUE_TEMPLATE/other.yml vendored Normal file
View File

@@ -0,0 +1,18 @@
name: Other Issue
description: Raise an issue that wouldn't be covered by the other templates.
title: "Issue: <Please write a comprehensive title after the 'Issue: ' prefix>"
labels: [04 - Other]
body:
- type: textarea
attributes:
label: "Issue you'd like to raise."
description: >
Please describe the issue you'd like to raise as clearly as possible.
Make sure to include any relevant links or references.
- type: textarea
attributes:
label: "Suggestion:"
description: >
Please outline a suggestion to improve the issue here.

56
.github/PULL_REQUEST_TEMPLATE.md vendored Normal file
View File

@@ -0,0 +1,56 @@
<!--
Thank you for contributing to LangChain! Your PR will appear in our release under the title you set. Please make sure it highlights your valuable contribution.
Replace this with a description of the change, the issue it fixes (if applicable), and relevant context. List any dependencies required for this change.
After you're done, someone will review your PR. They may suggest improvements. If no one reviews your PR within a few days, feel free to @-mention the same people again, as notifications can get lost.
Finally, we'd love to show appreciation for your contribution - if you'd like us to shout you out on Twitter, please also include your handle!
-->
<!-- Remove if not applicable -->
Fixes # (issue)
#### Before submitting
<!-- If you're adding a new integration, please include:
1. a test for the integration - favor unit tests that does not rely on network access.
2. an example notebook showing its use
See contribution guidelines for more information on how to write tests, lint
etc:
https://github.com/hwchase17/langchain/blob/master/.github/CONTRIBUTING.md
-->
#### Who can review?
Tag maintainers/contributors who might be interested:
<!-- For a quicker response, figure out the right person to tag with @
@hwchase17 - project lead
Tracing / Callbacks
- @agola11
Async
- @agola11
DataLoaders
- @eyurtsev
Models
- @hwchase17
- @agola11
Agents / Tools / Toolkits
- @hwchase17
VectorStores / Retrievers / Memory
- @dev2049
-->

76
.github/actions/poetry_setup/action.yml vendored Normal file
View File

@@ -0,0 +1,76 @@
# An action for setting up poetry install with caching.
# Using a custom action since the default action does not
# take poetry install groups into account.
# Action code from:
# https://github.com/actions/setup-python/issues/505#issuecomment-1273013236
name: poetry-install-with-caching
description: Poetry install with support for caching of dependency groups.
inputs:
python-version:
description: Python version, supporting MAJOR.MINOR only
required: true
poetry-version:
description: Poetry version
required: true
install-command:
description: Command run for installing dependencies
required: false
default: poetry install
cache-key:
description: Cache key to use for manual handling of caching
required: true
working-directory:
description: Directory to run install-command in
required: false
default: ""
runs:
using: composite
steps:
- uses: actions/setup-python@v4
name: Setup python $${ inputs.python-version }}
with:
python-version: ${{ inputs.python-version }}
- uses: actions/cache@v3
id: cache-pip
name: Cache Pip ${{ inputs.python-version }}
env:
SEGMENT_DOWNLOAD_TIMEOUT_MIN: "15"
with:
path: |
~/.cache/pip
key: pip-${{ runner.os }}-${{ runner.arch }}-py-${{ inputs.python-version }}
- run: pipx install poetry==${{ inputs.poetry-version }} --python python${{ inputs.python-version }}
shell: bash
- name: Check Poetry File
shell: bash
run: |
poetry check
- name: Check lock file
shell: bash
run: |
poetry lock --check
- uses: actions/cache@v3
id: cache-poetry
env:
SEGMENT_DOWNLOAD_TIMEOUT_MIN: "15"
with:
path: |
~/.cache/pypoetry/virtualenvs
~/.cache/pypoetry/cache
~/.cache/pypoetry/artifacts
key: poetry-${{ runner.os }}-${{ runner.arch }}-py-${{ inputs.python-version }}-poetry-${{ inputs.poetry-version }}-${{ inputs.cache-key }}-${{ hashFiles('poetry.lock') }}
- run: ${{ inputs.install-command }}
working-directory: ${{ inputs.working-directory }}
shell: bash

View File

@@ -4,6 +4,8 @@ on:
push:
branches: [master]
pull_request:
paths:
- 'docs/**'
env:
POETRY_VERSION: "1.4.2"

View File

@@ -4,6 +4,7 @@ on:
push:
branches: [master]
pull_request:
workflow_dispatch:
env:
POETRY_VERSION: "1.4.2"
@@ -18,17 +19,31 @@ jobs:
- "3.9"
- "3.10"
- "3.11"
test_type:
- "core"
- "extended"
name: Python ${{ matrix.python-version }} ${{ matrix.test_type }}
steps:
- uses: actions/checkout@v3
- name: Install poetry
run: pipx install poetry==$POETRY_VERSION
- name: Set up Python ${{ matrix.python-version }}
uses: actions/setup-python@v4
uses: "./.github/actions/poetry_setup"
with:
python-version: ${{ matrix.python-version }}
cache: "poetry"
- name: Install dependencies
run: poetry install
- name: Run unit tests
poetry-version: "1.4.2"
cache-key: ${{ matrix.test_type }}
install-command: |
if [ "${{ matrix.test_type }}" == "core" ]; then
echo "Running core tests, installing dependencies with poetry..."
poetry install
else
echo "Running extended tests, installing dependencies with poetry..."
poetry install -E extended_testing
fi
- name: Run ${{matrix.test_type}} tests
run: |
make test
if [ "${{ matrix.test_type }}" == "core" ]; then
make test
else
make extended_tests
fi
shell: bash

6
.gitignore vendored
View File

@@ -1,3 +1,4 @@
.vs/
.vscode/
.idea/
# Byte-compiled / optimized / DLL files
@@ -148,4 +149,7 @@ wandb/
# integration test artifacts
data_map*
\[('_type', 'fake'), ('stop', None)]
\[('_type', 'fake'), ('stop', None)]
# Replit files
*replit*

26
.readthedocs.yaml Normal file
View File

@@ -0,0 +1,26 @@
# Read the Docs configuration file
# See https://docs.readthedocs.io/en/stable/config-file/v2.html for details
# Required
version: 2
# Set the version of Python and other tools you might need
build:
os: ubuntu-22.04
tools:
python: "3.11"
# Build documentation in the docs/ directory with Sphinx
sphinx:
configuration: docs/conf.py
# If using Sphinx, optionally build your docs in additional formats such as PDF
# formats:
# - pdf
# Optionally declare the Python requirements required to build your docs
python:
install:
- requirements: docs/requirements.txt
- method: pip
path: .

View File

@@ -1,5 +1,7 @@
# This is a Dockerfile for running unit tests
ARG POETRY_HOME=/opt/poetry
# Use the Python base image
FROM python:3.11.2-bullseye AS builder
@@ -7,7 +9,7 @@ FROM python:3.11.2-bullseye AS builder
ARG POETRY_VERSION=1.4.2
# Define the directory to install Poetry to (default is /opt/poetry)
ARG POETRY_HOME=/opt/poetry
ARG POETRY_HOME
# Create a Python virtual environment for Poetry and install it
RUN python3 -m venv ${POETRY_HOME} && \
@@ -23,6 +25,8 @@ WORKDIR /app
# Use a multi-stage build to install dependencies
FROM builder AS dependencies
ARG POETRY_HOME
# Copy only the dependency files for installation
COPY pyproject.toml poetry.lock poetry.toml ./

View File

@@ -1,4 +1,4 @@
.PHONY: all clean format lint test tests test_watch integration_tests docker_tests help
.PHONY: all clean format lint test tests test_watch integration_tests docker_tests help extended_tests
all: help
@@ -32,11 +32,16 @@ lint lint_diff:
poetry run black $(PYTHON_FILES) --check
poetry run ruff .
test:
poetry run pytest tests/unit_tests
TEST_FILE ?= tests/unit_tests/
tests:
poetry run pytest tests/unit_tests
test:
poetry run pytest --disable-socket --allow-unix-socket $(TEST_FILE)
tests:
poetry run pytest --disable-socket --allow-unix-socket $(TEST_FILE)
extended_tests:
poetry run pytest --disable-socket --allow-unix-socket --only-extended tests/unit_tests
test_watch:
poetry run ptw --now . -- tests/unit_tests
@@ -50,13 +55,16 @@ docker_tests:
help:
@echo '----'
@echo 'coverage - run unit tests and generate coverage report'
@echo 'docs_build - build the documentation'
@echo 'docs_clean - clean the documentation build artifacts'
@echo 'docs_linkcheck - run linkchecker on the documentation'
@echo 'format - run code formatters'
@echo 'lint - run linters'
@echo 'test - run unit tests'
@echo 'test_watch - run unit tests in watch mode'
@echo 'integration_tests - run integration tests'
@echo 'docker_tests - run unit tests in docker'
@echo 'coverage - run unit tests and generate coverage report'
@echo 'docs_build - build the documentation'
@echo 'docs_clean - clean the documentation build artifacts'
@echo 'docs_linkcheck - run linkchecker on the documentation'
@echo 'format - run code formatters'
@echo 'lint - run linters'
@echo 'test - run unit tests'
@echo 'tests - run unit tests'
@echo 'test TEST_FILE=<test_file> - run all tests in file'
@echo 'extended_tests - run only extended unit tests'
@echo 'test_watch - run unit tests in watch mode'
@echo 'integration_tests - run integration tests'
@echo 'docker_tests - run unit tests in docker'

View File

@@ -2,7 +2,20 @@
⚡ Building applications with LLMs through composability ⚡
[![lint](https://github.com/hwchase17/langchain/actions/workflows/lint.yml/badge.svg)](https://github.com/hwchase17/langchain/actions/workflows/lint.yml) [![test](https://github.com/hwchase17/langchain/actions/workflows/test.yml/badge.svg)](https://github.com/hwchase17/langchain/actions/workflows/test.yml) [![linkcheck](https://github.com/hwchase17/langchain/actions/workflows/linkcheck.yml/badge.svg)](https://github.com/hwchase17/langchain/actions/workflows/linkcheck.yml) [![Downloads](https://static.pepy.tech/badge/langchain/month)](https://pepy.tech/project/langchain) [![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT) [![Twitter](https://img.shields.io/twitter/url/https/twitter.com/langchainai.svg?style=social&label=Follow%20%40LangChainAI)](https://twitter.com/langchainai) [![](https://dcbadge.vercel.app/api/server/6adMQxSpJS?compact=true&style=flat)](https://discord.gg/6adMQxSpJS)
[![Release Notes](https://img.shields.io/github/release/hwchase17/langchain)](https://github.com/hwchase17/langchain/releases)
[![lint](https://github.com/hwchase17/langchain/actions/workflows/lint.yml/badge.svg)](https://github.com/hwchase17/langchain/actions/workflows/lint.yml)
[![test](https://github.com/hwchase17/langchain/actions/workflows/test.yml/badge.svg)](https://github.com/hwchase17/langchain/actions/workflows/test.yml)
[![linkcheck](https://github.com/hwchase17/langchain/actions/workflows/linkcheck.yml/badge.svg)](https://github.com/hwchase17/langchain/actions/workflows/linkcheck.yml)
[![Downloads](https://static.pepy.tech/badge/langchain/month)](https://pepy.tech/project/langchain)
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
[![Twitter](https://img.shields.io/twitter/url/https/twitter.com/langchainai.svg?style=social&label=Follow%20%40LangChainAI)](https://twitter.com/langchainai)
[![](https://dcbadge.vercel.app/api/server/6adMQxSpJS?compact=true&style=flat)](https://discord.gg/6adMQxSpJS)
[![Open in Dev Containers](https://img.shields.io/static/v1?label=Dev%20Containers&message=Open&color=blue&logo=visualstudiocode)](https://vscode.dev/redirect?url=vscode://ms-vscode-remote.remote-containers/cloneInVolume?url=https://github.com/hwchase17/langchain)
[![Open in GitHub Codespaces](https://github.com/codespaces/badge.svg)](https://codespaces.new/hwchase17/langchain)
[![GitHub star chart](https://img.shields.io/github/stars/hwchase17/langchain?style=social)](https://star-history.com/#hwchase17/langchain)
[![Dependency Status](https://img.shields.io/librariesio/github/hwchase17/langchain)](https://libraries.io/github/hwchase17/langchain)
[![Open Issues](https://img.shields.io/github/issues-raw/hwchase17/langchain)](https://github.com/hwchase17/langchain/issues)
Looking for the JS/TS version? Check out [LangChain.js](https://github.com/hwchase17/langchainjs).

View File

@@ -13,5 +13,5 @@ pre {
}
#my-component-root *, #headlessui-portal-root * {
z-index: 1000000000000;
z-index: 10000;
}

View File

@@ -30,18 +30,17 @@ document.addEventListener('DOMContentLoaded', () => {
const icon = React.createElement('p', {
style: { color: '#ffffff', fontSize: '22px',width: '48px', height: '48px', margin: '0px', padding: '0px', display: 'flex', alignItems: 'center', justifyContent: 'center', textAlign: 'center' },
}, [iconSpan1, iconSpan2]);
const mendableFloatingButton = React.createElement(
MendableFloatingButton,
{
style: { darkMode: false, accentColor: '#010810' },
floatingButtonStyle: { color: '#ffffff', backgroundColor: '#010810' },
anon_key: '82842b36-3ea6-49b2-9fb8-52cfc4bde6bf', // Mendable Search Public ANON key, ok to be public
cmdShortcutKey:'j',
messageSettings: {
openSourcesInNewTab: false,
prettySources: true // Prettify the sources displayed now
},
icon: icon,
}
@@ -52,7 +51,7 @@ document.addEventListener('DOMContentLoaded', () => {
loadScript('https://unpkg.com/react@17/umd/react.production.min.js', () => {
loadScript('https://unpkg.com/react-dom@17/umd/react-dom.production.min.js', () => {
loadScript('https://unpkg.com/@mendable/search@0.0.83/dist/umd/mendable.min.js', initializeMendable);
loadScript('https://unpkg.com/@mendable/search@0.0.102/dist/umd/mendable.min.js', initializeMendable);
});
});
});

View File

@@ -0,0 +1,137 @@
===========================
Deploying LLMs in Production
===========================
In today's fast-paced technological landscape, the use of Large Language Models (LLMs) is rapidly expanding. As a result, it's crucial for developers to understand how to effectively deploy these models in production environments. LLM interfaces typically fall into two categories:
- **Case 1: Utilizing External LLM Providers (OpenAI, Anthropic, etc.)**
In this scenario, most of the computational burden is handled by the LLM providers, while LangChain simplifies the implementation of business logic around these services. This approach includes features such as prompt templating, chat message generation, caching, vector embedding database creation, preprocessing, etc.
- **Case 2: Self-hosted Open-Source Models**
Alternatively, developers can opt to use smaller, yet comparably capable, self-hosted open-source LLM models. This approach can significantly decrease costs, latency, and privacy concerns associated with transferring data to external LLM providers.
Regardless of the framework that forms the backbone of your product, deploying LLM applications comes with its own set of challenges. It's vital to understand the trade-offs and key considerations when evaluating serving frameworks.
Outline
=======
This guide aims to provide a comprehensive overview of the requirements for deploying LLMs in a production setting, focusing on:
- `Designing a Robust LLM Application Service <#robust>`_
- `Maintaining Cost-Efficiency <#cost>`_
- `Ensuring Rapid Iteration <#iteration>`_
Understanding these components is crucial when assessing serving systems. LangChain integrates with several open-source projects designed to tackle these issues, providing a robust framework for productionizing your LLM applications. Some notable frameworks include:
- `Ray Serve <../integrations/ray_serve.html>`_
- `BentoML <https://github.com/ssheng/BentoChain>`_
- `Modal <../integrations/modal.html>`_
These links will provide further information on each ecosystem, assisting you in finding the best fit for your LLM deployment needs.
Designing a Robust LLM Application Service
===========================================
.. _robust:
When deploying an LLM service in production, it's imperative to provide a seamless user experience free from outages. Achieving 24/7 service availability involves creating and maintaining several sub-systems surrounding your application.
Monitoring
----------
Monitoring forms an integral part of any system running in a production environment. In the context of LLMs, it is essential to monitor both performance and quality metrics.
**Performance Metrics:** These metrics provide insights into the efficiency and capacity of your model. Here are some key examples:
- Query per second (QPS): This measures the number of queries your model processes in a second, offering insights into its utilization.
- Latency: This metric quantifies the delay from when your client sends a request to when they receive a response.
- Tokens Per Second (TPS): This represents the number of tokens your model can generate in a second.
**Quality Metrics:** These metrics are typically customized according to the business use-case. For instance, how does the output of your system compare to a baseline, such as a previous version? Although these metrics can be calculated offline, you need to log the necessary data to use them later.
Fault tolerance
---------------
Your application may encounter errors such as exceptions in your model inference or business logic code, causing failures and disrupting traffic. Other potential issues could arise from the machine running your application, such as unexpected hardware breakdowns or loss of spot-instances during high-demand periods. One way to mitigate these risks is by increasing redundancy through replica scaling and implementing recovery mechanisms for failed replicas. However, model replicas aren't the only potential points of failure. It's essential to build resilience against various failures that could occur at any point in your stack.
Zero down time upgrade
----------------------
System upgrades are often necessary but can result in service disruptions if not handled correctly. One way to prevent downtime during upgrades is by implementing a smooth transition process from the old version to the new one. Ideally, the new version of your LLM service is deployed, and traffic gradually shifts from the old to the new version, maintaining a constant QPS throughout the process.
Load balancing
--------------
Load balancing, in simple terms, is a technique to distribute work evenly across multiple computers, servers, or other resources to optimize the utilization of the system, maximize throughput, minimize response time, and avoid overload of any single resource. Think of it as a traffic officer directing cars (requests) to different roads (servers) so that no single road becomes too congested.
There are several strategies for load balancing. For example, one common method is the *Round Robin* strategy, where each request is sent to the next server in line, cycling back to the first when all servers have received a request. This works well when all servers are equally capable. However, if some servers are more powerful than others, you might use a *Weighted Round Robin* or *Least Connections* strategy, where more requests are sent to the more powerful servers, or to those currently handling the fewest active requests. Let's imagine you're running a LLM chain. If your application becomes popular, you could have hundreds or even thousands of users asking questions at the same time. If one server gets too busy (high load), the load balancer would direct new requests to another server that is less busy. This way, all your users get a timely response and the system remains stable.
Maintaining Cost-Efficiency and Scalability
============================================
.. _cost:
Deploying LLM services can be costly, especially when you're handling a large volume of user interactions. Charges by LLM providers are usually based on tokens used, making a chat system inference on these models potentially expensive. However, several strategies can help manage these costs without compromising the quality of the service.
Self-hosting models
-------------------
Several smaller and open-source LLMs are emerging to tackle the issue of reliance on LLM providers. Self-hosting allows you to maintain similar quality to LLM provider models while managing costs. The challenge lies in building a reliable, high-performing LLM serving system on your own machines.
Resource Management and Auto-Scaling
------------------------------------
Computational logic within your application requires precise resource allocation. For instance, if part of your traffic is served by an OpenAI endpoint and another part by a self-hosted model, it's crucial to allocate suitable resources for each. Auto-scaling—adjusting resource allocation based on traffic—can significantly impact the cost of running your application. This strategy requires a balance between cost and responsiveness, ensuring neither resource over-provisioning nor compromised application responsiveness.
Utilizing Spot Instances
------------------------
On platforms like AWS, spot instances offer substantial cost savings, typically priced at about a third of on-demand instances. The trade-off is a higher crash rate, necessitating a robust fault-tolerance mechanism for effective use.
Independent Scaling
-------------------
When self-hosting your models, you should consider independent scaling. For example, if you have two translation models, one fine-tuned for French and another for Spanish, incoming requests might necessitate different scaling requirements for each.
Batching requests
-----------------
In the context of Large Language Models, batching requests can enhance efficiency by better utilizing your GPU resources. GPUs are inherently parallel processors, designed to handle multiple tasks simultaneously. If you send individual requests to the model, the GPU might not be fully utilized as it's only working on a single task at a time. On the other hand, by batching requests together, you're allowing the GPU to work on multiple tasks at once, maximizing its utilization and improving inference speed. This not only leads to cost savings but can also improve the overall latency of your LLM service.
In summary, managing costs while scaling your LLM services requires a strategic approach. Utilizing self-hosting models, managing resources effectively, employing auto-scaling, using spot instances, independently scaling models, and batching requests are key strategies to consider. Open-source libraries such as Ray Serve and BentoML are designed to deal with these complexities.
Ensuring Rapid Iteration
========================
.. _iteration:
The LLM landscape is evolving at an unprecedented pace, with new libraries and model architectures being introduced constantly. Consequently, it's crucial to avoid tying yourself to a solution specific to one particular framework. This is especially relevant in serving, where changes to your infrastructure can be time-consuming, expensive, and risky. Strive for infrastructure that is not locked into any specific machine learning library or framework, but instead offers a general-purpose, scalable serving layer. Here are some aspects where flexibility plays a key role:
Model composition
-----------------
Deploying systems like LangChain demands the ability to piece together different models and connect them via logic. Take the example of building a natural language input SQL query engine. Querying an LLM and obtaining the SQL command is only part of the system. You need to extract metadata from the connected database, construct a prompt for the LLM, run the SQL query on an engine, collect and feed back the response to the LLM as the query runs, and present the results to the user. This demonstrates the need to seamlessly integrate various complex components built in Python into a dynamic chain of logical blocks that can be served together.
Cloud providers
---------------
Many hosted solutions are restricted to a single cloud provider, which can limit your options in today's multi-cloud world. Depending on where your other infrastructure components are built, you might prefer to stick with your chosen cloud provider.
Infrastructure as Code (IaC)
---------------------------
Rapid iteration also involves the ability to recreate your infrastructure quickly and reliably. This is where Infrastructure as Code (IaC) tools like Terraform, CloudFormation, or Kubernetes YAML files come into play. They allow you to define your infrastructure in code files, which can be version controlled and quickly deployed, enabling faster and more reliable iterations.
CI/CD
-----
In a fast-paced environment, implementing CI/CD pipelines can significantly speed up the iteration process. They help automate the testing and deployment of your LLM applications, reducing the risk of errors and enabling faster feedback and iteration.

View File

@@ -6,51 +6,51 @@ First, you should install tracing and set up your environment properly.
You can use either a locally hosted version of this (uses Docker) or a cloud hosted version (in closed alpha).
If you're interested in using the hosted platform, please fill out the form [here](https://forms.gle/tRCEMSeopZf6TE3b6).
- [Locally Hosted Setup](./tracing/local_installation.md)
- [Cloud Hosted Setup](./tracing/hosted_installation.md)
- [Locally Hosted Setup](../tracing/local_installation.md)
- [Cloud Hosted Setup](../tracing/hosted_installation.md)
## Tracing Walkthrough
When you first access the UI, you should see a page with your tracing sessions.
An initial one "default" should already be created for you.
A session is just a way to group traces together.
If you click on a session, it will take you to a page with no recorded traces that says "No Runs."
When you first access the UI, you should see a page with your tracing sessions.
An initial one "default" should already be created for you.
A session is just a way to group traces together.
If you click on a session, it will take you to a page with no recorded traces that says "No Runs."
You can create a new session with the new session form.
![](tracing/homepage.png)
![](../tracing/homepage.png)
If we click on the `default` session, we can see that to start we have no traces stored.
![](tracing/default_empty.png)
![](../tracing/default_empty.png)
If we now start running chains and agents with tracing enabled, we will see data show up here.
To do so, we can run [this notebook](tracing/agent_with_tracing.ipynb) as an example.
To do so, we can run [this notebook](../tracing/agent_with_tracing.ipynb) as an example.
After running it, we will see an initial trace show up.
![](tracing/first_trace.png)
![](../tracing/first_trace.png)
From here we can explore the trace at a high level by clicking on the arrow to show nested runs.
We can keep on clicking further and further down to explore deeper and deeper.
![](tracing/explore.png)
![](../tracing/explore.png)
We can also click on the "Explore" button of the top level run to dive even deeper.
We can also click on the "Explore" button of the top level run to dive even deeper.
Here, we can see the inputs and outputs in full, as well as all the nested traces.
![](tracing/explore_trace.png)
![](../tracing/explore_trace.png)
We can keep on exploring each of these nested traces in more detail.
For example, here is the lowest level trace with the exact inputs/outputs to the LLM.
![](tracing/explore_llm.png)
![](../tracing/explore_llm.png)
## Changing Sessions
1. To initially record traces to a session other than `"default"`, you can set the `LANGCHAIN_SESSION` environment variable to the name of the session you want to record to:
```python
import os
os.environ["LANGCHAIN_HANDLER"] = "langchain"
os.environ["LANGCHAIN_TRACING"] = "true"
os.environ["LANGCHAIN_SESSION"] = "my_session" # Make sure this session actually exists. You can create a new session in the UI.
```

View File

@@ -0,0 +1,90 @@
# YouTube
This is a collection of `LangChain` videos on `YouTube`.
### ⛓️[Official LangChain YouTube channel](https://www.youtube.com/@LangChain)⛓️
### Introduction to LangChain with Harrison Chase, creator of LangChain
- [Building the Future with LLMs, `LangChain`, & `Pinecone`](https://youtu.be/nMniwlGyX-c) by [Pinecone](https://www.youtube.com/@pinecone-io)
- [LangChain and Weaviate with Harrison Chase and Bob van Luijt - Weaviate Podcast #36](https://youtu.be/lhby7Ql7hbk) by [Weaviate • Vector Database](https://www.youtube.com/@Weaviate)
- [LangChain Demo + Q&A with Harrison Chase](https://youtu.be/zaYTXQFR0_s?t=788) by [Full Stack Deep Learning](https://www.youtube.com/@FullStackDeepLearning)
- [LangChain Agents: Build Personal Assistants For Your Data (Q&A with Harrison Chase and Mayo Oshin)](https://youtu.be/gVkF8cwfBLI) by [Chat with data](https://www.youtube.com/@chatwithdata)
- ⛓️ [LangChain "Agents in Production" Webinar](https://youtu.be/k8GNCCs16F4) by [LangChain](https://www.youtube.com/@LangChain)
## Videos (sorted by views)
- [Building AI LLM Apps with LangChain (and more?) - LIVE STREAM](https://www.youtube.com/live/M-2Cj_2fzWI?feature=share) by [Nicholas Renotte](https://www.youtube.com/@NicholasRenotte)
- [First look - `ChatGPT` + `WolframAlpha` (`GPT-3.5` and Wolfram|Alpha via LangChain by James Weaver)](https://youtu.be/wYGbY811oMo) by [Dr Alan D. Thompson](https://www.youtube.com/@DrAlanDThompson)
- [LangChain explained - The hottest new Python framework](https://youtu.be/RoR4XJw8wIc) by [AssemblyAI](https://www.youtube.com/@AssemblyAI)
- [Chatbot with INFINITE MEMORY using `OpenAI` & `Pinecone` - `GPT-3`, `Embeddings`, `ADA`, `Vector DB`, `Semantic`](https://youtu.be/2xNzB7xq8nk) by [David Shapiro ~ AI](https://www.youtube.com/@DavidShapiroAutomator)
- [LangChain for LLMs is... basically just an Ansible playbook](https://youtu.be/X51N9C-OhlE) by [David Shapiro ~ AI](https://www.youtube.com/@DavidShapiroAutomator)
- [Build your own LLM Apps with LangChain & `GPT-Index`](https://youtu.be/-75p09zFUJY) by [1littlecoder](https://www.youtube.com/@1littlecoder)
- [`BabyAGI` - New System of Autonomous AI Agents with LangChain](https://youtu.be/lg3kJvf1kXo) by [1littlecoder](https://www.youtube.com/@1littlecoder)
- [Run `BabyAGI` with Langchain Agents (with Python Code)](https://youtu.be/WosPGHPObx8) by [1littlecoder](https://www.youtube.com/@1littlecoder)
- [How to Use Langchain With `Zapier` | Write and Send Email with GPT-3 | OpenAI API Tutorial](https://youtu.be/p9v2-xEa9A0) by [StarMorph AI](https://www.youtube.com/@starmorph)
- [Use Your Locally Stored Files To Get Response From GPT - `OpenAI` | Langchain | Python](https://youtu.be/NC1Ni9KS-rk) by [Shweta Lodha](https://www.youtube.com/@shweta-lodha)
- [`Langchain JS` | How to Use GPT-3, GPT-4 to Reference your own Data | `OpenAI Embeddings` Intro](https://youtu.be/veV2I-NEjaM) by [StarMorph AI](https://www.youtube.com/@starmorph)
- [The easiest way to work with large language models | Learn LangChain in 10min](https://youtu.be/kmbS6FDQh7c) by [Sophia Yang](https://www.youtube.com/@SophiaYangDS)
- [4 Autonomous AI Agents: “Westworld” simulation `BabyAGI`, `AutoGPT`, `Camel`, `LangChain`](https://youtu.be/yWbnH6inT_U) by [Sophia Yang](https://www.youtube.com/@SophiaYangDS)
- [AI CAN SEARCH THE INTERNET? Langchain Agents + OpenAI ChatGPT](https://youtu.be/J-GL0htqda8) by [tylerwhatsgood](https://www.youtube.com/@tylerwhatsgood)
- [Query Your Data with GPT-4 | Embeddings, Vector Databases | Langchain JS Knowledgebase](https://youtu.be/jRnUPUTkZmU) by [StarMorph AI](https://www.youtube.com/@starmorph)
- [`Weaviate` + LangChain for LLM apps presented by Erika Cardenas](https://youtu.be/7AGj4Td5Lgw) by [`Weaviate` • Vector Database](https://www.youtube.com/@Weaviate)
- [Langchain Overview — How to Use Langchain & `ChatGPT`](https://youtu.be/oYVYIq0lOtI) by [Python In Office](https://www.youtube.com/@pythoninoffice6568)
- [Langchain Overview - How to Use Langchain & `ChatGPT`](https://youtu.be/oYVYIq0lOtI) by [Python In Office](https://www.youtube.com/@pythoninoffice6568)
- [Custom langchain Agent & Tools with memory. Turn any `Python function` into langchain tool with Gpt 3](https://youtu.be/NIG8lXk0ULg) by [echohive](https://www.youtube.com/@echohive)
- [LangChain: Run Language Models Locally - `Hugging Face Models`](https://youtu.be/Xxxuw4_iCzw) by [Prompt Engineering](https://www.youtube.com/@engineerprompt)
- [`ChatGPT` with any `YouTube` video using langchain and `chromadb`](https://youtu.be/TQZfB2bzVwU) by [echohive](https://www.youtube.com/@echohive)
- [How to Talk to a `PDF` using LangChain and `ChatGPT`](https://youtu.be/v2i1YDtrIwk) by [Automata Learning Lab](https://www.youtube.com/@automatalearninglab)
- [Langchain Document Loaders Part 1: Unstructured Files](https://youtu.be/O5C0wfsen98) by [Merk](https://www.youtube.com/@merksworld)
- [LangChain - Prompt Templates (what all the best prompt engineers use)](https://youtu.be/1aRu8b0XNOQ) by [Nick Daigler](https://www.youtube.com/@nick_daigs)
- [LangChain. Crear aplicaciones Python impulsadas por GPT](https://youtu.be/DkW_rDndts8) by [Jesús Conde](https://www.youtube.com/@0utKast)
- [Easiest Way to Use GPT In Your Products | LangChain Basics Tutorial](https://youtu.be/fLy0VenZyGc) by [Rachel Woods](https://www.youtube.com/@therachelwoods)
- [`BabyAGI` + `GPT-4` Langchain Agent with Internet Access](https://youtu.be/wx1z_hs5P6E) by [tylerwhatsgood](https://www.youtube.com/@tylerwhatsgood)
- [Learning LLM Agents. How does it actually work? LangChain, AutoGPT & OpenAI](https://youtu.be/mb_YAABSplk) by [Arnoldas Kemeklis](https://www.youtube.com/@processusAI)
- [Get Started with LangChain in `Node.js`](https://youtu.be/Wxx1KUWJFv4) by [Developers Digest](https://www.youtube.com/@DevelopersDigest)
- [LangChain + `OpenAI` tutorial: Building a Q&A system w/ own text data](https://youtu.be/DYOU_Z0hAwo) by [Samuel Chan](https://www.youtube.com/@SamuelChan)
- [Langchain + `Zapier` Agent](https://youtu.be/yribLAb-pxA) by [Merk](https://www.youtube.com/@merksworld)
- [Connecting the Internet with `ChatGPT` (LLMs) using Langchain And Answers Your Questions](https://youtu.be/9Y0TBC63yZg) by [Kamalraj M M](https://www.youtube.com/@insightbuilder)
- [Build More Powerful LLM Applications for Businesss with LangChain (Beginners Guide)](https://youtu.be/sp3-WLKEcBg) by[ No Code Blackbox](https://www.youtube.com/@nocodeblackbox)
- ⛓️ [LangFlow LLM Agent Demo for 🦜🔗LangChain](https://youtu.be/zJxDHaWt-6o) by [Cobus Greyling](https://www.youtube.com/@CobusGreylingZA)
- ⛓️ [Chatbot Factory: Streamline Python Chatbot Creation with LLMs and Langchain](https://youtu.be/eYer3uzrcuM) by [Finxter](https://www.youtube.com/@CobusGreylingZA)
- ⛓️ [LangChain Tutorial - ChatGPT mit eigenen Daten](https://youtu.be/0XDLyY90E2c) by [Coding Crashkurse](https://www.youtube.com/@codingcrashkurse6429)
- ⛓️ [Chat with a `CSV` | LangChain Agents Tutorial (Beginners)](https://youtu.be/tjeti5vXWOU) by [GoDataProf](https://www.youtube.com/@godataprof)
- ⛓️ [Introdução ao Langchain - #Cortes - Live DataHackers](https://youtu.be/fw8y5VRei5Y) by [Prof. João Gabriel Lima](https://www.youtube.com/@profjoaogabriellima)
- ⛓️ [LangChain: Level up `ChatGPT` !? | LangChain Tutorial Part 1](https://youtu.be/vxUGx8aZpDE) by [Code Affinity](https://www.youtube.com/@codeaffinitydev)
- ⛓️ [KI schreibt krasses Youtube Skript 😲😳 | LangChain Tutorial Deutsch](https://youtu.be/QpTiXyK1jus) by [SimpleKI](https://www.youtube.com/@simpleki)
- ⛓️ [Chat with Audio: Langchain, `Chroma DB`, OpenAI, and `Assembly AI`](https://youtu.be/Kjy7cx1r75g) by [AI Anytime](https://www.youtube.com/@AIAnytime)
- ⛓️ [QA over documents with Auto vector index selection with Langchain router chains](https://youtu.be/9G05qybShv8) by [echohive](https://www.youtube.com/@echohive)
- ⛓️ [Build your own custom LLM application with `Bubble.io` & Langchain (No Code & Beginner friendly)](https://youtu.be/O7NhQGu1m6c) by [No Code Blackbox](https://www.youtube.com/@nocodeblackbox)
- ⛓️ [Simple App to Question Your Docs: Leveraging `Streamlit`, `Hugging Face Spaces`, LangChain, and `Claude`!](https://youtu.be/X4YbNECRr7o) by [Chris Alexiuk](https://www.youtube.com/@chrisalexiuk)
- ⛓️ [LANGCHAIN AI- `ConstitutionalChainAI` + Databutton AI ASSISTANT Web App](https://youtu.be/5zIU6_rdJCU) by [Avra](https://www.youtube.com/@Avra_b)
- ⛓️ [LANGCHAIN AI AUTONOMOUS AGENT WEB APP - 👶 `BABY AGI` 🤖 with EMAIL AUTOMATION using `DATABUTTON`](https://youtu.be/cvAwOGfeHgw) by [Avra](https://www.youtube.com/@Avra_b)
- ⛓️ [The Future of Data Analysis: Using A.I. Models in Data Analysis (LangChain)](https://youtu.be/v_LIcVyg5dk) by [Absent Data](https://www.youtube.com/@absentdata)
- ⛓️ [Memory in LangChain | Deep dive (python)](https://youtu.be/70lqvTFh_Yg) by [Eden Marco](https://www.youtube.com/@EdenMarco)
- ⛓️ [9 LangChain UseCases | Beginner's Guide | 2023](https://youtu.be/zS8_qosHNMw) by [Data Science Basics](https://www.youtube.com/@datasciencebasics)
- ⛓️ [Use Large Language Models in Jupyter Notebook | LangChain | Agents & Indexes](https://youtu.be/JSe11L1a_QQ) by [Abhinaw Tiwari](https://www.youtube.com/@AbhinawTiwariAT)
- ⛓️ [How to Talk to Your Langchain Agent | `11 Labs` + `Whisper`](https://youtu.be/N4k459Zw2PU) by [VRSEN](https://www.youtube.com/@vrsen)
- ⛓️ [LangChain Deep Dive: 5 FUN AI App Ideas To Build Quickly and Easily](https://youtu.be/mPYEPzLkeks) by [James NoCode](https://www.youtube.com/@jamesnocode)
- ⛓️ [BEST OPEN Alternative to OPENAI's EMBEDDINGs for Retrieval QA: LangChain](https://youtu.be/ogEalPMUCSY) by [Prompt Engineering](https://www.youtube.com/@engineerprompt)
- ⛓️ [LangChain 101: Models](https://youtu.be/T6c_XsyaNSQ) by [Mckay Wrigley](https://www.youtube.com/@realmckaywrigley)
- ⛓️ [LangChain with JavaScript Tutorial #1 | Setup & Using LLMs](https://youtu.be/W3AoeMrg27o) by [Leon van Zyl](https://www.youtube.com/@leonvanzyl)
- ⛓️ [LangChain Overview & Tutorial for Beginners: Build Powerful AI Apps Quickly & Easily (ZERO CODE)](https://youtu.be/iI84yym473Q) by [James NoCode](https://www.youtube.com/@jamesnocode)
- ⛓️ [LangChain In Action: Real-World Use Case With Step-by-Step Tutorial](https://youtu.be/UO699Szp82M) by [Rabbitmetrics](https://www.youtube.com/@rabbitmetrics)
- ⛓️ [Summarizing and Querying Multiple Papers with LangChain](https://youtu.be/p_MQRWH5Y6k) by [Automata Learning Lab](https://www.youtube.com/@automatalearninglab)
- ⛓️ [Using Langchain (and `Replit`) through `Tana`, ask `Google`/`Wikipedia`/`Wolfram Alpha` to fill out a table](https://youtu.be/Webau9lEzoI) by [Stian Håklev](https://www.youtube.com/@StianHaklev)
- ⛓️ [Langchain PDF App (GUI) | Create a ChatGPT For Your `PDF` in Python](https://youtu.be/wUAUdEw5oxM) by [Alejandro AO - Software & Ai](https://www.youtube.com/@alejandro_ao)
- ⛓️ [Auto-GPT with LangChain 🔥 | Create Your Own Personal AI Assistant](https://youtu.be/imDfPmMKEjM) by [Data Science Basics](https://www.youtube.com/@datasciencebasics)
- ⛓️ [Create Your OWN Slack AI Assistant with Python & LangChain](https://youtu.be/3jFXRNn2Bu8) by [Dave Ebbelaar](https://www.youtube.com/@daveebbelaar)
- ⛓️ [How to Create LOCAL Chatbots with GPT4All and LangChain [Full Guide]](https://youtu.be/4p1Fojur8Zw) by [Liam Ottley](https://www.youtube.com/@LiamOttley)
- ⛓️ [Build a `Multilingual PDF` Search App with LangChain, `Cohere` and `Bubble`](https://youtu.be/hOrtuumOrv8) by [Menlo Park Lab](https://www.youtube.com/@menloparklab)
- ⛓️ [Building a LangChain Agent (code-free!) Using `Bubble` and `Flowise`](https://youtu.be/jDJIIVWTZDE) by [Menlo Park Lab](https://www.youtube.com/@menloparklab)
- ⛓️ [Build a LangChain-based Semantic PDF Search App with No-Code Tools Bubble and Flowise](https://youtu.be/s33v5cIeqA4) by [Menlo Park Lab](https://www.youtube.com/@menloparklab)
- ⛓️ [LangChain Memory Tutorial | Building a ChatGPT Clone in Python](https://youtu.be/Cwq91cj2Pnc) by [Alejandro AO - Software & Ai](https://www.youtube.com/@alejandro_ao)
- ⛓️ [ChatGPT For Your DATA | Chat with Multiple Documents Using LangChain](https://youtu.be/TeDgIDqQmzs) by [Data Science Basics](https://www.youtube.com/@datasciencebasics)
- ⛓️ [`Llama Index`: Chat with Documentation using URL Loader](https://youtu.be/XJRoDEctAwA) by [Merk](https://www.youtube.com/@merksworld)
- ⛓️ [Using OpenAI, LangChain, and `Gradio` to Build Custom GenAI Applications](https://youtu.be/1MsmqMg3yUc) by [David Hundley](https://www.youtube.com/@dkhundley)
---------------------
⛓ icon marks a new video [last update 2023-05-15]

231
docs/dependents.md Normal file
View File

@@ -0,0 +1,231 @@
# Dependents
Dependents stats for `hwchase17/langchain`
[![](https://img.shields.io/static/v1?label=Used%20by&message=7484&color=informational&logo=slickpic)](https://github.com/hwchase17/langchain/network/dependents)
[![](https://img.shields.io/static/v1?label=Used%20by%20(public)&message=212&color=informational&logo=slickpic)](https://github.com/hwchase17/langchain/network/dependents)
[![](https://img.shields.io/static/v1?label=Used%20by%20(private)&message=7272&color=informational&logo=slickpic)](https://github.com/hwchase17/langchain/network/dependents)
[![](https://img.shields.io/static/v1?label=Used%20by%20(stars)&message=19095&color=informational&logo=slickpic)](https://github.com/hwchase17/langchain/network/dependents)
[update: 2023-06-05; only dependent repositories with Stars > 100]
| Repository | Stars |
| :-------- | -----: |
|[openai/openai-cookbook](https://github.com/openai/openai-cookbook) | 38024 |
|[LAION-AI/Open-Assistant](https://github.com/LAION-AI/Open-Assistant) | 33609 |
|[microsoft/TaskMatrix](https://github.com/microsoft/TaskMatrix) | 33136 |
|[hpcaitech/ColossalAI](https://github.com/hpcaitech/ColossalAI) | 30032 |
|[imartinez/privateGPT](https://github.com/imartinez/privateGPT) | 28094 |
|[reworkd/AgentGPT](https://github.com/reworkd/AgentGPT) | 23430 |
|[openai/chatgpt-retrieval-plugin](https://github.com/openai/chatgpt-retrieval-plugin) | 17942 |
|[jerryjliu/llama_index](https://github.com/jerryjliu/llama_index) | 16697 |
|[mindsdb/mindsdb](https://github.com/mindsdb/mindsdb) | 16410 |
|[mlflow/mlflow](https://github.com/mlflow/mlflow) | 14517 |
|[GaiZhenbiao/ChuanhuChatGPT](https://github.com/GaiZhenbiao/ChuanhuChatGPT) | 10793 |
|[databrickslabs/dolly](https://github.com/databrickslabs/dolly) | 10155 |
|[openai/evals](https://github.com/openai/evals) | 10076 |
|[AIGC-Audio/AudioGPT](https://github.com/AIGC-Audio/AudioGPT) | 8619 |
|[logspace-ai/langflow](https://github.com/logspace-ai/langflow) | 8211 |
|[imClumsyPanda/langchain-ChatGLM](https://github.com/imClumsyPanda/langchain-ChatGLM) | 8154 |
|[PromtEngineer/localGPT](https://github.com/PromtEngineer/localGPT) | 6853 |
|[StanGirard/quivr](https://github.com/StanGirard/quivr) | 6830 |
|[PipedreamHQ/pipedream](https://github.com/PipedreamHQ/pipedream) | 6520 |
|[go-skynet/LocalAI](https://github.com/go-skynet/LocalAI) | 6018 |
|[arc53/DocsGPT](https://github.com/arc53/DocsGPT) | 5643 |
|[e2b-dev/e2b](https://github.com/e2b-dev/e2b) | 5075 |
|[langgenius/dify](https://github.com/langgenius/dify) | 4281 |
|[nsarrazin/serge](https://github.com/nsarrazin/serge) | 4228 |
|[zauberzeug/nicegui](https://github.com/zauberzeug/nicegui) | 4084 |
|[madawei2699/myGPTReader](https://github.com/madawei2699/myGPTReader) | 4039 |
|[wenda-LLM/wenda](https://github.com/wenda-LLM/wenda) | 3871 |
|[GreyDGL/PentestGPT](https://github.com/GreyDGL/PentestGPT) | 3837 |
|[zilliztech/GPTCache](https://github.com/zilliztech/GPTCache) | 3625 |
|[csunny/DB-GPT](https://github.com/csunny/DB-GPT) | 3545 |
|[gkamradt/langchain-tutorials](https://github.com/gkamradt/langchain-tutorials) | 3404 |
|[mmabrouk/chatgpt-wrapper](https://github.com/mmabrouk/chatgpt-wrapper) | 3303 |
|[postgresml/postgresml](https://github.com/postgresml/postgresml) | 3052 |
|[marqo-ai/marqo](https://github.com/marqo-ai/marqo) | 3014 |
|[MineDojo/Voyager](https://github.com/MineDojo/Voyager) | 2945 |
|[PrefectHQ/marvin](https://github.com/PrefectHQ/marvin) | 2761 |
|[project-baize/baize-chatbot](https://github.com/project-baize/baize-chatbot) | 2673 |
|[hwchase17/chat-langchain](https://github.com/hwchase17/chat-langchain) | 2589 |
|[whitead/paper-qa](https://github.com/whitead/paper-qa) | 2572 |
|[Azure-Samples/azure-search-openai-demo](https://github.com/Azure-Samples/azure-search-openai-demo) | 2366 |
|[GerevAI/gerev](https://github.com/GerevAI/gerev) | 2330 |
|[OpenGVLab/InternGPT](https://github.com/OpenGVLab/InternGPT) | 2289 |
|[ParisNeo/gpt4all-ui](https://github.com/ParisNeo/gpt4all-ui) | 2159 |
|[OpenBMB/BMTools](https://github.com/OpenBMB/BMTools) | 2158 |
|[guangzhengli/ChatFiles](https://github.com/guangzhengli/ChatFiles) | 2005 |
|[h2oai/h2ogpt](https://github.com/h2oai/h2ogpt) | 1939 |
|[Farama-Foundation/PettingZoo](https://github.com/Farama-Foundation/PettingZoo) | 1845 |
|[OpenGVLab/Ask-Anything](https://github.com/OpenGVLab/Ask-Anything) | 1749 |
|[IntelligenzaArtificiale/Free-Auto-GPT](https://github.com/IntelligenzaArtificiale/Free-Auto-GPT) | 1740 |
|[Unstructured-IO/unstructured](https://github.com/Unstructured-IO/unstructured) | 1628 |
|[hwchase17/notion-qa](https://github.com/hwchase17/notion-qa) | 1607 |
|[NVIDIA/NeMo-Guardrails](https://github.com/NVIDIA/NeMo-Guardrails) | 1544 |
|[SamurAIGPT/privateGPT](https://github.com/SamurAIGPT/privateGPT) | 1543 |
|[paulpierre/RasaGPT](https://github.com/paulpierre/RasaGPT) | 1526 |
|[yanqiangmiffy/Chinese-LangChain](https://github.com/yanqiangmiffy/Chinese-LangChain) | 1485 |
|[Kav-K/GPTDiscord](https://github.com/Kav-K/GPTDiscord) | 1402 |
|[vocodedev/vocode-python](https://github.com/vocodedev/vocode-python) | 1387 |
|[Chainlit/chainlit](https://github.com/Chainlit/chainlit) | 1336 |
|[lunasec-io/lunasec](https://github.com/lunasec-io/lunasec) | 1323 |
|[psychic-api/psychic](https://github.com/psychic-api/psychic) | 1248 |
|[agiresearch/OpenAGI](https://github.com/agiresearch/OpenAGI) | 1208 |
|[jina-ai/thinkgpt](https://github.com/jina-ai/thinkgpt) | 1193 |
|[thomas-yanxin/LangChain-ChatGLM-Webui](https://github.com/thomas-yanxin/LangChain-ChatGLM-Webui) | 1182 |
|[ttengwang/Caption-Anything](https://github.com/ttengwang/Caption-Anything) | 1137 |
|[jina-ai/dev-gpt](https://github.com/jina-ai/dev-gpt) | 1135 |
|[greshake/llm-security](https://github.com/greshake/llm-security) | 1086 |
|[keephq/keep](https://github.com/keephq/keep) | 1063 |
|[juncongmoo/chatllama](https://github.com/juncongmoo/chatllama) | 1037 |
|[richardyc/Chrome-GPT](https://github.com/richardyc/Chrome-GPT) | 1035 |
|[visual-openllm/visual-openllm](https://github.com/visual-openllm/visual-openllm) | 997 |
|[mmz-001/knowledge_gpt](https://github.com/mmz-001/knowledge_gpt) | 995 |
|[jina-ai/langchain-serve](https://github.com/jina-ai/langchain-serve) | 949 |
|[irgolic/AutoPR](https://github.com/irgolic/AutoPR) | 936 |
|[microsoft/X-Decoder](https://github.com/microsoft/X-Decoder) | 908 |
|[poe-platform/api-bot-tutorial](https://github.com/poe-platform/api-bot-tutorial) | 902 |
|[peterw/Chat-with-Github-Repo](https://github.com/peterw/Chat-with-Github-Repo) | 875 |
|[cirediatpl/FigmaChain](https://github.com/cirediatpl/FigmaChain) | 822 |
|[homanp/superagent](https://github.com/homanp/superagent) | 806 |
|[seanpixel/Teenage-AGI](https://github.com/seanpixel/Teenage-AGI) | 800 |
|[chatarena/chatarena](https://github.com/chatarena/chatarena) | 796 |
|[hashintel/hash](https://github.com/hashintel/hash) | 795 |
|[SamurAIGPT/Camel-AutoGPT](https://github.com/SamurAIGPT/Camel-AutoGPT) | 786 |
|[rlancemartin/auto-evaluator](https://github.com/rlancemartin/auto-evaluator) | 770 |
|[corca-ai/EVAL](https://github.com/corca-ai/EVAL) | 769 |
|[101dotxyz/GPTeam](https://github.com/101dotxyz/GPTeam) | 755 |
|[noahshinn024/reflexion](https://github.com/noahshinn024/reflexion) | 706 |
|[eyurtsev/kor](https://github.com/eyurtsev/kor) | 695 |
|[cheshire-cat-ai/core](https://github.com/cheshire-cat-ai/core) | 681 |
|[e-johnstonn/BriefGPT](https://github.com/e-johnstonn/BriefGPT) | 656 |
|[run-llama/llama-lab](https://github.com/run-llama/llama-lab) | 635 |
|[griptape-ai/griptape](https://github.com/griptape-ai/griptape) | 583 |
|[namuan/dr-doc-search](https://github.com/namuan/dr-doc-search) | 555 |
|[getmetal/motorhead](https://github.com/getmetal/motorhead) | 550 |
|[kreneskyp/ix](https://github.com/kreneskyp/ix) | 543 |
|[hwchase17/chat-your-data](https://github.com/hwchase17/chat-your-data) | 510 |
|[Anil-matcha/ChatPDF](https://github.com/Anil-matcha/ChatPDF) | 501 |
|[whyiyhw/chatgpt-wechat](https://github.com/whyiyhw/chatgpt-wechat) | 497 |
|[SamurAIGPT/ChatGPT-Developer-Plugins](https://github.com/SamurAIGPT/ChatGPT-Developer-Plugins) | 496 |
|[microsoft/PodcastCopilot](https://github.com/microsoft/PodcastCopilot) | 492 |
|[debanjum/khoj](https://github.com/debanjum/khoj) | 485 |
|[akshata29/chatpdf](https://github.com/akshata29/chatpdf) | 485 |
|[langchain-ai/langchain-aiplugin](https://github.com/langchain-ai/langchain-aiplugin) | 462 |
|[jina-ai/agentchain](https://github.com/jina-ai/agentchain) | 460 |
|[alexanderatallah/window.ai](https://github.com/alexanderatallah/window.ai) | 457 |
|[yeagerai/yeagerai-agent](https://github.com/yeagerai/yeagerai-agent) | 451 |
|[mckaywrigley/repo-chat](https://github.com/mckaywrigley/repo-chat) | 446 |
|[michaelthwan/searchGPT](https://github.com/michaelthwan/searchGPT) | 446 |
|[mpaepper/content-chatbot](https://github.com/mpaepper/content-chatbot) | 441 |
|[freddyaboulton/gradio-tools](https://github.com/freddyaboulton/gradio-tools) | 439 |
|[ruoccofabrizio/azure-open-ai-embeddings-qna](https://github.com/ruoccofabrizio/azure-open-ai-embeddings-qna) | 429 |
|[StevenGrove/GPT4Tools](https://github.com/StevenGrove/GPT4Tools) | 422 |
|[jonra1993/fastapi-alembic-sqlmodel-async](https://github.com/jonra1993/fastapi-alembic-sqlmodel-async) | 407 |
|[msoedov/langcorn](https://github.com/msoedov/langcorn) | 405 |
|[amosjyng/langchain-visualizer](https://github.com/amosjyng/langchain-visualizer) | 395 |
|[ajndkr/lanarky](https://github.com/ajndkr/lanarky) | 384 |
|[mtenenholtz/chat-twitter](https://github.com/mtenenholtz/chat-twitter) | 376 |
|[steamship-core/steamship-langchain](https://github.com/steamship-core/steamship-langchain) | 371 |
|[langchain-ai/auto-evaluator](https://github.com/langchain-ai/auto-evaluator) | 365 |
|[xuwenhao/geektime-ai-course](https://github.com/xuwenhao/geektime-ai-course) | 358 |
|[continuum-llms/chatgpt-memory](https://github.com/continuum-llms/chatgpt-memory) | 357 |
|[opentensor/bittensor](https://github.com/opentensor/bittensor) | 347 |
|[showlab/VLog](https://github.com/showlab/VLog) | 345 |
|[daodao97/chatdoc](https://github.com/daodao97/chatdoc) | 345 |
|[logan-markewich/llama_index_starter_pack](https://github.com/logan-markewich/llama_index_starter_pack) | 332 |
|[poe-platform/poe-protocol](https://github.com/poe-platform/poe-protocol) | 320 |
|[explosion/spacy-llm](https://github.com/explosion/spacy-llm) | 312 |
|[andylokandy/gpt-4-search](https://github.com/andylokandy/gpt-4-search) | 311 |
|[alejandro-ao/langchain-ask-pdf](https://github.com/alejandro-ao/langchain-ask-pdf) | 310 |
|[jupyterlab/jupyter-ai](https://github.com/jupyterlab/jupyter-ai) | 294 |
|[BlackHC/llm-strategy](https://github.com/BlackHC/llm-strategy) | 283 |
|[itamargol/openai](https://github.com/itamargol/openai) | 281 |
|[momegas/megabots](https://github.com/momegas/megabots) | 279 |
|[personoids/personoids-lite](https://github.com/personoids/personoids-lite) | 277 |
|[yvann-hub/Robby-chatbot](https://github.com/yvann-hub/Robby-chatbot) | 267 |
|[Anil-matcha/Website-to-Chatbot](https://github.com/Anil-matcha/Website-to-Chatbot) | 266 |
|[Cheems-Seminar/grounded-segment-any-parts](https://github.com/Cheems-Seminar/grounded-segment-any-parts) | 260 |
|[sullivan-sean/chat-langchainjs](https://github.com/sullivan-sean/chat-langchainjs) | 248 |
|[bborn/howdoi.ai](https://github.com/bborn/howdoi.ai) | 245 |
|[daveebbelaar/langchain-experiments](https://github.com/daveebbelaar/langchain-experiments) | 240 |
|[MagnivOrg/prompt-layer-library](https://github.com/MagnivOrg/prompt-layer-library) | 237 |
|[ur-whitelab/exmol](https://github.com/ur-whitelab/exmol) | 234 |
|[conceptofmind/toolformer](https://github.com/conceptofmind/toolformer) | 234 |
|[recalign/RecAlign](https://github.com/recalign/RecAlign) | 226 |
|[OpenBMB/AgentVerse](https://github.com/OpenBMB/AgentVerse) | 220 |
|[alvarosevilla95/autolang](https://github.com/alvarosevilla95/autolang) | 219 |
|[JohnSnowLabs/nlptest](https://github.com/JohnSnowLabs/nlptest) | 216 |
|[kaleido-lab/dolphin](https://github.com/kaleido-lab/dolphin) | 215 |
|[truera/trulens](https://github.com/truera/trulens) | 208 |
|[NimbleBoxAI/ChainFury](https://github.com/NimbleBoxAI/ChainFury) | 208 |
|[airobotlab/KoChatGPT](https://github.com/airobotlab/KoChatGPT) | 207 |
|[monarch-initiative/ontogpt](https://github.com/monarch-initiative/ontogpt) | 200 |
|[paolorechia/learn-langchain](https://github.com/paolorechia/learn-langchain) | 195 |
|[shaman-ai/agent-actors](https://github.com/shaman-ai/agent-actors) | 185 |
|[Haste171/langchain-chatbot](https://github.com/Haste171/langchain-chatbot) | 184 |
|[plchld/InsightFlow](https://github.com/plchld/InsightFlow) | 182 |
|[su77ungr/CASALIOY](https://github.com/su77ungr/CASALIOY) | 180 |
|[jbrukh/gpt-jargon](https://github.com/jbrukh/gpt-jargon) | 177 |
|[benthecoder/ClassGPT](https://github.com/benthecoder/ClassGPT) | 174 |
|[billxbf/ReWOO](https://github.com/billxbf/ReWOO) | 170 |
|[filip-michalsky/SalesGPT](https://github.com/filip-michalsky/SalesGPT) | 168 |
|[hwchase17/langchain-streamlit-template](https://github.com/hwchase17/langchain-streamlit-template) | 168 |
|[radi-cho/datasetGPT](https://github.com/radi-cho/datasetGPT) | 164 |
|[hardbyte/qabot](https://github.com/hardbyte/qabot) | 164 |
|[gia-guar/JARVIS-ChatGPT](https://github.com/gia-guar/JARVIS-ChatGPT) | 158 |
|[plastic-labs/tutor-gpt](https://github.com/plastic-labs/tutor-gpt) | 154 |
|[yasyf/compress-gpt](https://github.com/yasyf/compress-gpt) | 154 |
|[fengyuli-dev/multimedia-gpt](https://github.com/fengyuli-dev/multimedia-gpt) | 154 |
|[ethanyanjiali/minChatGPT](https://github.com/ethanyanjiali/minChatGPT) | 153 |
|[hwchase17/chroma-langchain](https://github.com/hwchase17/chroma-langchain) | 153 |
|[edreisMD/plugnplai](https://github.com/edreisMD/plugnplai) | 148 |
|[chakkaradeep/pyCodeAGI](https://github.com/chakkaradeep/pyCodeAGI) | 145 |
|[ccurme/yolopandas](https://github.com/ccurme/yolopandas) | 145 |
|[shamspias/customizable-gpt-chatbot](https://github.com/shamspias/customizable-gpt-chatbot) | 144 |
|[realminchoi/babyagi-ui](https://github.com/realminchoi/babyagi-ui) | 143 |
|[PradipNichite/Youtube-Tutorials](https://github.com/PradipNichite/Youtube-Tutorials) | 140 |
|[gustavz/DataChad](https://github.com/gustavz/DataChad) | 140 |
|[Klingefjord/chatgpt-telegram](https://github.com/Klingefjord/chatgpt-telegram) | 140 |
|[Jaseci-Labs/jaseci](https://github.com/Jaseci-Labs/jaseci) | 139 |
|[handrew/browserpilot](https://github.com/handrew/browserpilot) | 137 |
|[jmpaz/promptlib](https://github.com/jmpaz/promptlib) | 137 |
|[SamPink/dev-gpt](https://github.com/SamPink/dev-gpt) | 135 |
|[menloparklab/langchain-cohere-qdrant-doc-retrieval](https://github.com/menloparklab/langchain-cohere-qdrant-doc-retrieval) | 135 |
|[hirokidaichi/wanna](https://github.com/hirokidaichi/wanna) | 135 |
|[steamship-core/vercel-examples](https://github.com/steamship-core/vercel-examples) | 134 |
|[pablomarin/GPT-Azure-Search-Engine](https://github.com/pablomarin/GPT-Azure-Search-Engine) | 133 |
|[ibiscp/LLM-IMDB](https://github.com/ibiscp/LLM-IMDB) | 133 |
|[shauryr/S2QA](https://github.com/shauryr/S2QA) | 133 |
|[jerlendds/osintbuddy](https://github.com/jerlendds/osintbuddy) | 132 |
|[yuanjie-ai/ChatLLM](https://github.com/yuanjie-ai/ChatLLM) | 132 |
|[yasyf/summ](https://github.com/yasyf/summ) | 132 |
|[WongSaang/chatgpt-ui-server](https://github.com/WongSaang/chatgpt-ui-server) | 130 |
|[peterw/StoryStorm](https://github.com/peterw/StoryStorm) | 127 |
|[Teahouse-Studios/akari-bot](https://github.com/Teahouse-Studios/akari-bot) | 126 |
|[vaibkumr/prompt-optimizer](https://github.com/vaibkumr/prompt-optimizer) | 125 |
|[preset-io/promptimize](https://github.com/preset-io/promptimize) | 124 |
|[homanp/vercel-langchain](https://github.com/homanp/vercel-langchain) | 124 |
|[petehunt/langchain-github-bot](https://github.com/petehunt/langchain-github-bot) | 123 |
|[eunomia-bpf/GPTtrace](https://github.com/eunomia-bpf/GPTtrace) | 118 |
|[nicknochnack/LangchainDocuments](https://github.com/nicknochnack/LangchainDocuments) | 116 |
|[jiran214/GPT-vup](https://github.com/jiran214/GPT-vup) | 112 |
|[rsaryev/talk-codebase](https://github.com/rsaryev/talk-codebase) | 112 |
|[zenml-io/zenml-projects](https://github.com/zenml-io/zenml-projects) | 112 |
|[microsoft/azure-openai-in-a-day-workshop](https://github.com/microsoft/azure-openai-in-a-day-workshop) | 112 |
|[davila7/file-gpt](https://github.com/davila7/file-gpt) | 112 |
|[prof-frink-lab/slangchain](https://github.com/prof-frink-lab/slangchain) | 111 |
|[aurelio-labs/arxiv-bot](https://github.com/aurelio-labs/arxiv-bot) | 110 |
|[fixie-ai/fixie-examples](https://github.com/fixie-ai/fixie-examples) | 108 |
|[miaoshouai/miaoshouai-assistant](https://github.com/miaoshouai/miaoshouai-assistant) | 105 |
|[flurb18/AgentOoba](https://github.com/flurb18/AgentOoba) | 103 |
|[solana-labs/chatgpt-plugin](https://github.com/solana-labs/chatgpt-plugin) | 102 |
|[Significant-Gravitas/Auto-GPT-Benchmarks](https://github.com/Significant-Gravitas/Auto-GPT-Benchmarks) | 102 |
|[kaarthik108/snowChat](https://github.com/kaarthik108/snowChat) | 100 |
_Generated by [github-dependents-info](https://github.com/nvuillam/github-dependents-info)_
`github-dependents-info --repo hwchase17/langchain --markdownfile dependents.md --minstars 100 --sort stars`

25
docs/ecosystem/baseten.md Normal file
View File

@@ -0,0 +1,25 @@
# Baseten
Learn how to use LangChain with models deployed on Baseten.
## Installation and setup
- Create a [Baseten](https://baseten.co) account and [API key](https://docs.baseten.co/settings/api-keys).
- Install the Baseten Python client with `pip install baseten`
- Use your API key to authenticate with `baseten login`
## Invoking a model
Baseten integrates with LangChain through the LLM module, which provides a standardized and interoperable interface for models that are deployed on your Baseten workspace.
You can deploy foundation models like WizardLM and Alpaca with one click from the [Baseten model library](https://app.baseten.co/explore/) or if you have your own model, [deploy it with this tutorial](https://docs.baseten.co/deploying-models/deploy).
In this example, we'll work with WizardLM. [Deploy WizardLM here](https://app.baseten.co/explore/wizardlm) and follow along with the deployed [model's version ID](https://docs.baseten.co/managing-models/manage).
```python
from langchain.llms import Baseten
wizardlm = Baseten(model="MODEL_VERSION_ID", verbose=True)
wizardlm("What is the difference between a Wizard and a Sorcerer?")
```

View File

@@ -1,25 +0,0 @@
# Cohere
This page covers how to use the Cohere ecosystem within LangChain.
It is broken into two parts: installation and setup, and then references to specific Cohere wrappers.
## Installation and Setup
- Install the Python SDK with `pip install cohere`
- Get an Cohere api key and set it as an environment variable (`COHERE_API_KEY`)
## Wrappers
### LLM
There exists an Cohere LLM wrapper, which you can access with
```python
from langchain.llms import Cohere
```
### Embeddings
There exists an Cohere Embeddings wrapper, which you can access with
```python
from langchain.embeddings import CohereEmbeddings
```
For a more detailed walkthrough of this, see [this notebook](../modules/models/text_embedding/examples/cohere.ipynb)

View File

@@ -1,25 +0,0 @@
# Databerry
This page covers how to use the [Databerry](https://databerry.ai) within LangChain.
## What is Databerry?
Databerry is an [open source](https://github.com/gmpetrov/databerry) document retrievial platform that helps to connect your personal data with Large Language Models.
![Databerry](../_static/DataberryDashboard.png)
## Quick start
Retrieving documents stored in Databerry from LangChain is very easy!
```python
from langchain.retrievers import DataberryRetriever
retriever = DataberryRetriever(
datastore_url="https://api.databerry.ai/query/clg1xg2h80000l708dymr0fxc",
# api_key="DATABERRY_API_KEY", # optional if datastore is public
# top_k=10 # optional
)
docs = retriever.get_relevant_documents("What's Databerry?")
```

View File

@@ -6,6 +6,11 @@ This section covers several options for that. Note that these options are meant
What follows is a list of template GitHub repositories designed to be easily forked and modified to use your chain. This list is far from exhaustive, and we are EXTREMELY open to contributions here.
## [Anyscale](https://www.anyscale.com/model-serving)
Anyscale is a unified compute platform that makes it easy to develop, deploy, and manage scalable LLM applications in production using Ray.
With Anyscale you can scale the most challenging LLM-based workloads and both develop and deploy LLM-based apps on a single compute platform.
## [Streamlit](https://github.com/hwchase17/langchain-streamlit-template)
This repo serves as a template for how to deploy a LangChain with Streamlit.
@@ -19,6 +24,12 @@ It implements a chatbot interface, with a "Bring-Your-Own-Token" approach (nice
It also contains instructions for how to deploy this app on the Hugging Face platform.
This is heavily influenced by James Weaver's [excellent examples](https://huggingface.co/JavaFXpert).
## [Chainlit](https://github.com/Chainlit/cookbook)
This repo is a cookbook explaining how to visualize and deploy LangChain agents with Chainlit.
You create ChatGPT-like UIs with Chainlit. Some of the key features include intermediary steps visualisation, element management & display (images, text, carousel, etc.) as well as cloud deployment.
Chainlit [doc](https://docs.chainlit.io/langchain) on the integration with LangChain
## [Beam](https://github.com/slai-labs/get-beam/tree/main/examples/langchain-question-answering)
This repo serves as a template for how deploy a LangChain with [Beam](https://beam.cloud).
@@ -29,6 +40,14 @@ It implements a Question Answering app and contains instructions for deploying t
A minimal example on how to run LangChain on Vercel using Flask.
## [FastAPI + Vercel](https://github.com/msoedov/langcorn)
A minimal example on how to run LangChain on Vercel using FastAPI and LangCorn/Uvicorn.
## [Kinsta](https://github.com/kinsta/hello-world-langchain)
A minimal example on how to deploy LangChain to [Kinsta](https://kinsta.com) using Flask.
## [Fly.io](https://github.com/fly-apps/hello-fly-langchain)
A minimal example of how to deploy LangChain to [Fly.io](https://fly.io/) using Flask.

View File

@@ -0,0 +1,20 @@
# ModelScope
This page covers how to use the modelscope ecosystem within LangChain.
It is broken into two parts: installation and setup, and then references to specific modelscope wrappers.
## Installation and Setup
* Install the Python SDK with `pip install modelscope`
## Wrappers
### Embeddings
There exists a modelscope Embeddings wrapper, which you can access with
```python
from langchain.embeddings import ModelScopeEmbeddings
```
For a more detailed walkthrough of this, see [this notebook](../modules/models/text_embedding/examples/modelscope_hub.ipynb)

View File

@@ -1,55 +0,0 @@
# OpenAI
This page covers how to use the OpenAI ecosystem within LangChain.
It is broken into two parts: installation and setup, and then references to specific OpenAI wrappers.
## Installation and Setup
- Install the Python SDK with `pip install openai`
- Get an OpenAI api key and set it as an environment variable (`OPENAI_API_KEY`)
- If you want to use OpenAI's tokenizer (only available for Python 3.9+), install it with `pip install tiktoken`
## Wrappers
### LLM
There exists an OpenAI LLM wrapper, which you can access with
```python
from langchain.llms import OpenAI
```
If you are using a model hosted on Azure, you should use different wrapper for that:
```python
from langchain.llms import AzureOpenAI
```
For a more detailed walkthrough of the Azure wrapper, see [this notebook](../modules/models/llms/integrations/azure_openai_example.ipynb)
### Embeddings
There exists an OpenAI Embeddings wrapper, which you can access with
```python
from langchain.embeddings import OpenAIEmbeddings
```
For a more detailed walkthrough of this, see [this notebook](../modules/models/text_embedding/examples/openai.ipynb)
### Tokenizer
There are several places you can use the `tiktoken` tokenizer. By default, it is used to count tokens
for OpenAI LLMs.
You can also use it to count tokens when splitting documents with
```python
from langchain.text_splitter import CharacterTextSplitter
CharacterTextSplitter.from_tiktoken_encoder(...)
```
For a more detailed walkthrough of this, see [this notebook](../modules/indexes/text_splitters/examples/tiktoken.ipynb)
### Moderation
You can also access the OpenAI content moderation endpoint with
```python
from langchain.chains import OpenAIModerationChain
```
For a more detailed walkthrough of this, see [this notebook](../modules/chains/examples/moderation.ipynb)

View File

@@ -1,56 +0,0 @@
# Prediction Guard
This page covers how to use the Prediction Guard ecosystem within LangChain.
It is broken into two parts: installation and setup, and then references to specific Prediction Guard wrappers.
## Installation and Setup
- Install the Python SDK with `pip install predictionguard`
- Get an Prediction Guard access token (as described [here](https://docs.predictionguard.com/)) and set it as an environment variable (`PREDICTIONGUARD_TOKEN`)
## LLM Wrapper
There exists a Prediction Guard LLM wrapper, which you can access with
```python
from langchain.llms import PredictionGuard
```
You can provide the name of your Prediction Guard "proxy" as an argument when initializing the LLM:
```python
pgllm = PredictionGuard(name="your-text-gen-proxy")
```
Alternatively, you can use Prediction Guard's default proxy for SOTA LLMs:
```python
pgllm = PredictionGuard(name="default-text-gen")
```
You can also provide your access token directly as an argument:
```python
pgllm = PredictionGuard(name="default-text-gen", token="<your access token>")
```
## Example usage
Basic usage of the LLM wrapper:
```python
from langchain.llms import PredictionGuard
pgllm = PredictionGuard(name="default-text-gen")
pgllm("Tell me a joke")
```
Basic LLM Chaining with the Prediction Guard wrapper:
```python
from langchain import PromptTemplate, LLMChain
from langchain.llms import PredictionGuard
template = """Question: {question}
Answer: Let's think step by step."""
prompt = PromptTemplate(template=template, input_variables=["question"])
llm_chain = LLMChain(prompt=prompt, llm=PredictionGuard(name="default-text-gen"), verbose=True)
question = "What NFL team won the Super Bowl in the year Justin Beiber was born?"
llm_chain.predict(question=question)
```

View File

@@ -1,21 +0,0 @@
# Zilliz
This page covers how to use the Zilliz Cloud ecosystem within LangChain.
Zilliz uses the Milvus integration.
It is broken into two parts: installation and setup, and then references to specific Milvus wrappers.
## Installation and Setup
- Install the Python SDK with `pip install pymilvus`
## Wrappers
### VectorStore
There exists a wrapper around Zilliz indexes, allowing you to use it as a vectorstore,
whether for semantic search or example selection.
To import this vectorstore:
```python
from langchain.vectorstores import Milvus
```
For a more detailed walkthrough of the Miluvs wrapper, see [this notebook](../modules/indexes/vectorstores/examples/zilliz.ipynb)

View File

@@ -1,346 +0,0 @@
LangChain Gallery
=================
Lots of people have built some pretty awesome stuff with LangChain.
This is a collection of our favorites.
If you see any other demos that you think we should highlight, be sure to let us know!
Open Source
-----------
.. panels::
:body: text-center
---
.. link-button:: https://github.com/bborn/howdoi.ai
:type: url
:text: HowDoI.ai
:classes: stretched-link btn-lg
+++
This is an experiment in building a large-language-model-backed chatbot. It can hold a conversation, remember previous comments/questions,
and answer all types of queries (history, web search, movie data, weather, news, and more).
---
.. link-button:: https://colab.research.google.com/drive/1sKSTjt9cPstl_WMZ86JsgEqFG-aSAwkn?usp=sharing
:type: url
:text: YouTube Transcription QA with Sources
:classes: stretched-link btn-lg
+++
An end-to-end example of doing question answering on YouTube transcripts, returning the timestamps as sources to legitimize the answer.
---
.. link-button:: https://github.com/normandmickey/MrsStax
:type: url
:text: QA Slack Bot
:classes: stretched-link btn-lg
+++
This application is a Slack Bot that uses Langchain and OpenAI's GPT3 language model to provide domain specific answers. You provide the documents.
---
.. link-button:: https://github.com/OpenBioLink/ThoughtSource
:type: url
:text: ThoughtSource
:classes: stretched-link btn-lg
+++
A central, open resource and community around data and tools related to chain-of-thought reasoning in large language models.
---
.. link-button:: https://github.com/blackhc/llm-strategy
:type: url
:text: LLM Strategy
:classes: stretched-link btn-lg
+++
This Python package adds a decorator llm_strategy that connects to an LLM (such as OpenAIs GPT-3) and uses the LLM to "implement" abstract methods in interface classes. It does this by forwarding requests to the LLM and converting the responses back to Python data using Python's @dataclasses.
---
.. link-button:: https://github.com/JohnNay/llm-lobbyist
:type: url
:text: Zero-Shot Corporate Lobbyist
:classes: stretched-link btn-lg
+++
A notebook showing how to use GPT to help with the work of a corporate lobbyist.
---
.. link-button:: https://dagster.io/blog/chatgpt-langchain
:type: url
:text: Dagster Documentation ChatBot
:classes: stretched-link btn-lg
+++
A jupyter notebook demonstrating how you could create a semantic search engine on documents in one of your Google Folders
---
.. link-button:: https://github.com/venuv/langchain_semantic_search
:type: url
:text: Google Folder Semantic Search
:classes: stretched-link btn-lg
+++
Build a GitHub support bot with GPT3, LangChain, and Python.
---
.. link-button:: https://huggingface.co/spaces/team7/talk_with_wind
:type: url
:text: Talk With Wind
:classes: stretched-link btn-lg
+++
Record sounds of anything (birds, wind, fire, train station) and chat with it.
---
.. link-button:: https://huggingface.co/spaces/JavaFXpert/Chat-GPT-LangChain
:type: url
:text: ChatGPT LangChain
:classes: stretched-link btn-lg
+++
This simple application demonstrates a conversational agent implemented with OpenAI GPT-3.5 and LangChain. When necessary, it leverages tools for complex math, searching the internet, and accessing news and weather.
---
.. link-button:: https://huggingface.co/spaces/JavaFXpert/gpt-math-techniques
:type: url
:text: GPT Math Techniques
:classes: stretched-link btn-lg
+++
A Hugging Face spaces project showing off the benefits of using PAL for math problems.
---
.. link-button:: https://colab.research.google.com/drive/1xt2IsFPGYMEQdoJFNgWNAjWGxa60VXdV
:type: url
:text: GPT Political Compass
:classes: stretched-link btn-lg
+++
Measure the political compass of GPT.
---
.. link-button:: https://github.com/hwchase17/notion-qa
:type: url
:text: Notion Database Question-Answering Bot
:classes: stretched-link btn-lg
+++
Open source GitHub project shows how to use LangChain to create a chatbot that can answer questions about an arbitrary Notion database.
---
.. link-button:: https://github.com/jerryjliu/llama_index
:type: url
:text: LlamaIndex
:classes: stretched-link btn-lg
+++
LlamaIndex (formerly GPT Index) is a project consisting of a set of data structures that are created using GPT-3 and can be traversed using GPT-3 in order to answer queries.
---
.. link-button:: https://github.com/JavaFXpert/llm-grovers-search-party
:type: url
:text: Grover's Algorithm
:classes: stretched-link btn-lg
+++
Leveraging Qiskit, OpenAI and LangChain to demonstrate Grover's algorithm
---
.. link-button:: https://huggingface.co/spaces/rituthombre/QNim
:type: url
:text: QNimGPT
:classes: stretched-link btn-lg
+++
A chat UI to play Nim, where a player can select an opponent, either a quantum computer or an AI
---
.. link-button:: https://colab.research.google.com/drive/19WTIWC3prw5LDMHmRMvqNV2loD9FHls6?usp=sharing
:type: url
:text: ReAct TextWorld
:classes: stretched-link btn-lg
+++
Leveraging the ReActTextWorldAgent to play TextWorld with an LLM!
---
.. link-button:: https://github.com/jagilley/fact-checker
:type: url
:text: Fact Checker
:classes: stretched-link btn-lg
+++
This repo is a simple demonstration of using LangChain to do fact-checking with prompt chaining.
---
.. link-button:: https://github.com/arc53/docsgpt
:type: url
:text: DocsGPT
:classes: stretched-link btn-lg
+++
Answer questions about the documentation of any project
Misc. Colab Notebooks
~~~~~~~~~~~~~~~~~~~~~
.. panels::
:body: text-center
---
.. link-button:: https://colab.research.google.com/drive/1AAyEdTz-Z6ShKvewbt1ZHUICqak0MiwR?usp=sharing
:type: url
:text: Wolfram Alpha in Conversational Agent
:classes: stretched-link btn-lg
+++
Give ChatGPT a WolframAlpha neural implant
---
.. link-button:: https://colab.research.google.com/drive/1UsCLcPy8q5PMNQ5ytgrAAAHa124dzLJg?usp=sharing
:type: url
:text: Tool Updates in Agents
:classes: stretched-link btn-lg
+++
Agent improvements (6th Jan 2023)
---
.. link-button:: https://colab.research.google.com/drive/1UsCLcPy8q5PMNQ5ytgrAAAHa124dzLJg?usp=sharing
:type: url
:text: Conversational Agent with Tools (Langchain AGI)
:classes: stretched-link btn-lg
+++
Langchain AGI (23rd Dec 2022)
Proprietary
-----------
.. panels::
:body: text-center
---
.. link-button:: https://twitter.com/sjwhitmore/status/1580593217153531908?s=20&t=neQvtZZTlp623U3LZwz3bQ
:type: url
:text: Daimon
:classes: stretched-link btn-lg
+++
A chat-based AI personal assistant with long-term memory about you.
---
.. link-button:: https://anysummary.app
:type: url
:text: Summarize any file with AI
:classes: stretched-link btn-lg
+++
Summarize not only long docs, interview audio or video files quickly, but also entire websites and YouTube videos. Share or download your generated summaries to collaborate with others, or revisit them at any time! Bonus: `@anysummary <https://twitter.com/anysummary>`_ on Twitter will also summarize any thread it is tagged in.
---
.. link-button:: https://twitter.com/dory111111/status/1608406234646052870?s=20&t=XYlrbKM0ornJsrtGa0br-g
:type: url
:text: AI Assisted SQL Query Generator
:classes: stretched-link btn-lg
+++
An app to write SQL using natural language, and execute against real DB.
---
.. link-button:: https://twitter.com/krrish_dh/status/1581028925618106368?s=20&t=neQvtZZTlp623U3LZwz3bQ
:type: url
:text: Clerkie
:classes: stretched-link btn-lg
+++
Stack Tracing QA Bot to help debug complex stack tracing (especially the ones that go multi-function/file deep).
---
.. link-button:: https://twitter.com/Raza_Habib496/status/1596880140490838017?s=20&t=6MqEQYWfSqmJwsKahjCVOA
:type: url
:text: Sales Email Writer
:classes: stretched-link btn-lg
+++
By Raza Habib, this demo utilizes LangChain + SerpAPI + HumanLoop to write sales emails. Give it a company name and a person, this application will use Google Search (via SerpAPI) to get more information on the company and the person, and then write them a sales message.
---
.. link-button:: https://twitter.com/chillzaza_/status/1592961099384905730?s=20&t=EhU8jl0KyCPJ7vE9Rnz-cQ
:type: url
:text: Question-Answering on a Web Browser
:classes: stretched-link btn-lg
+++
By Zahid Khawaja, this demo utilizes question answering to answer questions about a given website. A followup added this for `YouTube videos <https://twitter.com/chillzaza_/status/1593739682013220865?s=20&t=EhU8jl0KyCPJ7vE9Rnz-cQ>`_, and then another followup added it for `Wikipedia <https://twitter.com/chillzaza_/status/1594847151238037505?s=20&t=EhU8jl0KyCPJ7vE9Rnz-cQ>`_.
---
.. link-button:: https://mynd.so
:type: url
:text: Mynd
:classes: stretched-link btn-lg
+++
A journaling app for self-care that uses AI to uncover insights and patterns over time.

View File

@@ -0,0 +1,75 @@
# Concepts
These are concepts and terminology commonly used when developing LLM applications.
It contains reference to external papers or sources where the concept was first introduced,
as well as to places in LangChain where the concept is used.
## Chain of Thought
`Chain of Thought (CoT)` is a prompting technique used to encourage the model to generate a series of intermediate reasoning steps.
A less formal way to induce this behavior is to include “Lets think step-by-step” in the prompt.
- [Chain-of-Thought Paper](https://arxiv.org/pdf/2201.11903.pdf)
- [Step-by-Step Paper](https://arxiv.org/abs/2112.00114)
## Action Plan Generation
`Action Plan Generation` is a prompting technique that uses a language model to generate actions to take.
The results of these actions can then be fed back into the language model to generate a subsequent action.
- [WebGPT Paper](https://arxiv.org/pdf/2112.09332.pdf)
- [SayCan Paper](https://say-can.github.io/assets/palm_saycan.pdf)
## ReAct
`ReAct` is a prompting technique that combines Chain-of-Thought prompting with action plan generation.
This induces the model to think about what action to take, then take it.
- [Paper](https://arxiv.org/pdf/2210.03629.pdf)
- [LangChain Example](../modules/agents/agents/examples/react.ipynb)
## Self-ask
`Self-ask` is a prompting method that builds on top of chain-of-thought prompting.
In this method, the model explicitly asks itself follow-up questions, which are then answered by an external search engine.
- [Paper](https://ofir.io/self-ask.pdf)
- [LangChain Example](../modules/agents/agents/examples/self_ask_with_search.ipynb)
## Prompt Chaining
`Prompt Chaining` is combining multiple LLM calls, with the output of one-step being the input to the next.
- [PromptChainer Paper](https://arxiv.org/pdf/2203.06566.pdf)
- [Language Model Cascades](https://arxiv.org/abs/2207.10342)
- [ICE Primer Book](https://primer.ought.org/)
- [Socratic Models](https://socraticmodels.github.io/)
## Memetic Proxy
`Memetic Proxy` is encouraging the LLM
to respond in a certain way framing the discussion in a context that the model knows of and that
will result in that type of response.
For example, as a conversation between a student and a teacher.
- [Paper](https://arxiv.org/pdf/2102.07350.pdf)
## Self Consistency
`Self Consistency` is a decoding strategy that samples a diverse set of reasoning paths and then selects the most consistent answer.
Is most effective when combined with Chain-of-thought prompting.
- [Paper](https://arxiv.org/pdf/2203.11171.pdf)
## Inception
`Inception` is also called `First Person Instruction`.
It is encouraging the model to think a certain way by including the start of the models response in the prompt.
- [Example](https://twitter.com/goodside/status/1583262455207460865?s=20&t=8Hz7XBnK1OF8siQrxxCIGQ)
## MemPrompt
`MemPrompt` maintains a memory of errors and user feedback, and uses them to prevent repetition of mistakes.
- [Paper](https://memprompt.com/)

View File

@@ -37,6 +37,12 @@ import os
os.environ["OPENAI_API_KEY"] = "..."
```
If you want to set the API key dynamically, you can use the openai_api_key parameter when initiating OpenAI class—for instance, each user's API key.
```python
from langchain.llms import OpenAI
llm = OpenAI(openai_api_key="OPENAI_API_KEY")
```
## Building a Language Model Application: LLMs
@@ -172,9 +178,9 @@ In order to load agents, you should understand the following concepts:
- LLM: The language model powering the agent.
- Agent: The agent to use. This should be a string that references a support agent class. Because this notebook focuses on the simplest, highest level API, this only covers using the standard supported agents. If you want to implement a custom agent, see the documentation for custom agents (coming soon).
**Agents**: For a list of supported agents and their specifications, see [here](../modules/agents/agents.md).
**Agents**: For a list of supported agents and their specifications, see [here](../modules/agents/getting_started.ipynb).
**Tools**: For a list of predefined tools and their specifications, see [here](../modules/agents/tools.md).
**Tools**: For a list of predefined tools and their specifications, see [here](../modules/agents/tools/getting_started.md).
For this example, you will also need to install the SerpAPI Python package.
@@ -316,7 +322,7 @@ You can also pass in multiple messages for OpenAI's gpt-3.5-turbo and gpt-4 mode
```python
messages = [
SystemMessage(content="You are a helpful assistant that translates English to French."),
HumanMessage(content="Translate this sentence from English to French. I love programming.")
HumanMessage(content="I love programming.")
]
chat(messages)
# -> AIMessage(content="J'aime programmer.", additional_kwargs={})
@@ -327,29 +333,29 @@ You can go one step further and generate completions for multiple sets of messag
batch_messages = [
[
SystemMessage(content="You are a helpful assistant that translates English to French."),
HumanMessage(content="Translate this sentence from English to French. I love programming.")
HumanMessage(content="I love programming.")
],
[
SystemMessage(content="You are a helpful assistant that translates English to French."),
HumanMessage(content="Translate this sentence from English to French. I love artificial intelligence.")
HumanMessage(content="I love artificial intelligence.")
],
]
result = chat.generate(batch_messages)
result
# -> LLMResult(generations=[[ChatGeneration(text="J'aime programmer.", generation_info=None, message=AIMessage(content="J'aime programmer.", additional_kwargs={}))], [ChatGeneration(text="J'aime l'intelligence artificielle.", generation_info=None, message=AIMessage(content="J'aime l'intelligence artificielle.", additional_kwargs={}))]], llm_output={'token_usage': {'prompt_tokens': 71, 'completion_tokens': 18, 'total_tokens': 89}})
# -> LLMResult(generations=[[ChatGeneration(text="J'aime programmer.", generation_info=None, message=AIMessage(content="J'aime programmer.", additional_kwargs={}))], [ChatGeneration(text="J'aime l'intelligence artificielle.", generation_info=None, message=AIMessage(content="J'aime l'intelligence artificielle.", additional_kwargs={}))]], llm_output={'token_usage': {'prompt_tokens': 57, 'completion_tokens': 20, 'total_tokens': 77}})
```
You can recover things like token usage from this LLMResult:
```
result.llm_output['token_usage']
# -> {'prompt_tokens': 71, 'completion_tokens': 18, 'total_tokens': 89}
# -> {'prompt_tokens': 57, 'completion_tokens': 20, 'total_tokens': 77}
```
## Chat Prompt Templates
Similar to LLMs, you can make use of templating by using a `MessagePromptTemplate`. You can build a `ChatPromptTemplate` from one or more `MessagePromptTemplate`s. You can use `ChatPromptTemplate`'s `format_prompt` -- this returns a `PromptValue`, which you can convert to a string or `Message` object, depending on whether you want to use the formatted value as input to an llm or chat model.
For convience, there is a `from_template` method exposed on the template. If you were to use this template, this is what it would look like:
For convenience, there is a `from_template` method exposed on the template. If you were to use this template, this is what it would look like:
```python
from langchain.chat_models import ChatOpenAI
@@ -361,9 +367,9 @@ from langchain.prompts.chat import (
chat = ChatOpenAI(temperature=0)
template="You are a helpful assistant that translates {input_language} to {output_language}."
template = "You are a helpful assistant that translates {input_language} to {output_language}."
system_message_prompt = SystemMessagePromptTemplate.from_template(template)
human_template="{text}"
human_template = "{text}"
human_message_prompt = HumanMessagePromptTemplate.from_template(human_template)
chat_prompt = ChatPromptTemplate.from_messages([system_message_prompt, human_message_prompt])
@@ -387,9 +393,9 @@ from langchain.prompts.chat import (
chat = ChatOpenAI(temperature=0)
template="You are a helpful assistant that translates {input_language} to {output_language}."
template = "You are a helpful assistant that translates {input_language} to {output_language}."
system_message_prompt = SystemMessagePromptTemplate.from_template(template)
human_template="{text}"
human_template = "{text}"
human_message_prompt = HumanMessagePromptTemplate.from_template(human_template)
chat_prompt = ChatPromptTemplate.from_messages([system_message_prompt, human_message_prompt])

View File

@@ -0,0 +1,113 @@
# Tutorials
⛓ icon marks a new addition [last update 2023-05-15]
### DeepLearning.AI course
⛓[LangChain for LLM Application Development](https://learn.deeplearning.ai/langchain) by Harrison Chase presented by [Andrew Ng](https://en.wikipedia.org/wiki/Andrew_Ng)
### Handbook
[LangChain AI Handbook](https://www.pinecone.io/learn/langchain/) By **James Briggs** and **Francisco Ingham**
### Tutorials
[LangChain Tutorials](https://www.youtube.com/watch?v=FuqdVNB_8c0&list=PL9V0lbeJ69brU-ojMpU1Y7Ic58Tap0Cw6) by [Edrick](https://www.youtube.com/@edrickdch):
- ⛓ [LangChain, Chroma DB, OpenAI Beginner Guide | ChatGPT with your PDF](https://youtu.be/FuqdVNB_8c0)
- ⛓ [LangChain 101: The Complete Beginner's Guide](https://youtu.be/P3MAbZ2eMUI)
[LangChain Crash Course: Build an AutoGPT app in 25 minutes](https://youtu.be/MlK6SIjcjE8) by [Nicholas Renotte](https://www.youtube.com/@NicholasRenotte)
[LangChain Crash Course - Build apps with language models](https://youtu.be/LbT1yp6quS8) by [Patrick Loeber](https://www.youtube.com/@patloeber)
[LangChain Explained in 13 Minutes | QuickStart Tutorial for Beginners](https://youtu.be/aywZrzNaKjs) by [Rabbitmetrics](https://www.youtube.com/@rabbitmetrics)
###
[LangChain for Gen AI and LLMs](https://www.youtube.com/playlist?list=PLIUOU7oqGTLieV9uTIFMm6_4PXg-hlN6F) by [James Briggs](https://www.youtube.com/@jamesbriggs):
- #1 [Getting Started with `GPT-3` vs. Open Source LLMs](https://youtu.be/nE2skSRWTTs)
- #2 [Prompt Templates for `GPT 3.5` and other LLMs](https://youtu.be/RflBcK0oDH0)
- #3 [LLM Chains using `GPT 3.5` and other LLMs](https://youtu.be/S8j9Tk0lZHU)
- #4 [Chatbot Memory for `Chat-GPT`, `Davinci` + other LLMs](https://youtu.be/X05uK0TZozM)
- #5 [Chat with OpenAI in LangChain](https://youtu.be/CnAgB3A5OlU)
-#6 [Fixing LLM Hallucinations with Retrieval Augmentation in LangChain](https://youtu.be/kvdVduIJsc8)
-#7 [LangChain Agents Deep Dive with GPT 3.5](https://youtu.be/jSP-gSEyVeI)
-#8 [Create Custom Tools for Chatbots in LangChain](https://youtu.be/q-HNphrWsDE)
-#9 [Build Conversational Agents with Vector DBs](https://youtu.be/H6bCqqw9xyI)
###
[LangChain 101](https://www.youtube.com/playlist?list=PLqZXAkvF1bPNQER9mLmDbntNfSpzdDIU5) by [Data Independent](https://www.youtube.com/@DataIndependent):
- [What Is LangChain? - LangChain + `ChatGPT` Overview](https://youtu.be/_v_fgW2SkkQ)
- [Quickstart Guide](https://youtu.be/kYRB-vJFy38)
- [Beginner Guide To 7 Essential Concepts](https://youtu.be/2xxziIWmaSA)
- [`OpenAI` + `Wolfram Alpha`](https://youtu.be/UijbzCIJ99g)
- [Ask Questions On Your Custom (or Private) Files](https://youtu.be/EnT-ZTrcPrg)
- [Connect `Google Drive Files` To `OpenAI`](https://youtu.be/IqqHqDcXLww)
- [`YouTube Transcripts` + `OpenAI`](https://youtu.be/pNcQ5XXMgH4)
- [Question A 300 Page Book (w/ `OpenAI` + `Pinecone`)](https://youtu.be/h0DHDp1FbmQ)
- [Workaround `OpenAI's` Token Limit With Chain Types](https://youtu.be/f9_BWhCI4Zo)
- [Build Your Own OpenAI + LangChain Web App in 23 Minutes](https://youtu.be/U_eV8wfMkXU)
- [Working With The New `ChatGPT API`](https://youtu.be/e9P7FLi5Zy8)
- [OpenAI + LangChain Wrote Me 100 Custom Sales Emails](https://youtu.be/y1pyAQM-3Bo)
- [Structured Output From `OpenAI` (Clean Dirty Data)](https://youtu.be/KwAXfey-xQk)
- [Connect `OpenAI` To +5,000 Tools (LangChain + `Zapier`)](https://youtu.be/7tNm0yiDigU)
- [Use LLMs To Extract Data From Text (Expert Mode)](https://youtu.be/xZzvwR9jdPA)
- ⛓ [Extract Insights From Interview Transcripts Using LLMs](https://youtu.be/shkMOHwJ4SM)
- ⛓ [5 Levels Of LLM Summarizing: Novice to Expert](https://youtu.be/qaPMdcCqtWk)
###
[LangChain How to and guides](https://www.youtube.com/playlist?list=PL8motc6AQftk1Bs42EW45kwYbyJ4jOdiZ) by [Sam Witteveen](https://www.youtube.com/@samwitteveenai):
- [LangChain Basics - LLMs & PromptTemplates with Colab](https://youtu.be/J_0qvRt4LNk)
- [LangChain Basics - Tools and Chains](https://youtu.be/hI2BY7yl_Ac)
- [`ChatGPT API` Announcement & Code Walkthrough with LangChain](https://youtu.be/phHqvLHCwH4)
- [Conversations with Memory (explanation & code walkthrough)](https://youtu.be/X550Zbz_ROE)
- [Chat with `Flan20B`](https://youtu.be/VW5LBavIfY4)
- [Using `Hugging Face Models` locally (code walkthrough)](https://youtu.be/Kn7SX2Mx_Jk)
- [`PAL` : Program-aided Language Models with LangChain code](https://youtu.be/dy7-LvDu-3s)
- [Building a Summarization System with LangChain and `GPT-3` - Part 1](https://youtu.be/LNq_2s_H01Y)
- [Building a Summarization System with LangChain and `GPT-3` - Part 2](https://youtu.be/d-yeHDLgKHw)
- [Microsoft's `Visual ChatGPT` using LangChain](https://youtu.be/7YEiEyfPF5U)
- [LangChain Agents - Joining Tools and Chains with Decisions](https://youtu.be/ziu87EXZVUE)
- [Comparing LLMs with LangChain](https://youtu.be/rFNG0MIEuW0)
- [Using `Constitutional AI` in LangChain](https://youtu.be/uoVqNFDwpX4)
- [Talking to `Alpaca` with LangChain - Creating an Alpaca Chatbot](https://youtu.be/v6sF8Ed3nTE)
- [Talk to your `CSV` & `Excel` with LangChain](https://youtu.be/xQ3mZhw69bc)
- [`BabyAGI`: Discover the Power of Task-Driven Autonomous Agents!](https://youtu.be/QBcDLSE2ERA)
- [Improve your `BabyAGI` with LangChain](https://youtu.be/DRgPyOXZ-oE)
- ⛓ [Master `PDF` Chat with LangChain - Your essential guide to queries on documents](https://youtu.be/ZzgUqFtxgXI)
- ⛓ [Using LangChain with `DuckDuckGO` `Wikipedia` & `PythonREPL` Tools](https://youtu.be/KerHlb8nuVc)
- ⛓ [Building Custom Tools and Agents with LangChain (gpt-3.5-turbo)](https://youtu.be/biS8G8x8DdA)
- ⛓ [LangChain Retrieval QA Over Multiple Files with `ChromaDB`](https://youtu.be/3yPBVii7Ct0)
- ⛓ [LangChain Retrieval QA with Instructor Embeddings & `ChromaDB` for PDFs](https://youtu.be/cFCGUjc33aU)
- ⛓ [LangChain + Retrieval Local LLMs for Retrieval QA - No OpenAI!!!](https://youtu.be/9ISVjh8mdlA)
###
[LangChain](https://www.youtube.com/playlist?list=PLVEEucA9MYhOu89CX8H3MBZqayTbcCTMr) by [Prompt Engineering](https://www.youtube.com/@engineerprompt):
- [LangChain Crash Course — All You Need to Know to Build Powerful Apps with LLMs](https://youtu.be/5-fc4Tlgmro)
- [Working with MULTIPLE `PDF` Files in LangChain: `ChatGPT` for your Data](https://youtu.be/s5LhRdh5fu4)
- [`ChatGPT` for YOUR OWN `PDF` files with LangChain](https://youtu.be/TLf90ipMzfE)
- [Talk to YOUR DATA without OpenAI APIs: LangChain](https://youtu.be/wrD-fZvT6UI)
- ⛓️ [CHATGPT For WEBSITES: Custom ChatBOT](https://youtu.be/RBnuhhmD21U)
###
LangChain by [Chat with data](https://www.youtube.com/@chatwithdata)
- [LangChain Beginner's Tutorial for `Typescript`/`Javascript`](https://youtu.be/bH722QgRlhQ)
- [`GPT-4` Tutorial: How to Chat With Multiple `PDF` Files (~1000 pages of Tesla's 10-K Annual Reports)](https://youtu.be/Ix9WIZpArm0)
- [`GPT-4` & LangChain Tutorial: How to Chat With A 56-Page `PDF` Document (w/`Pinecone`)](https://youtu.be/ih9PBGVVOO4)
- ⛓ [LangChain & Supabase Tutorial: How to Build a ChatGPT Chatbot For Your Website](https://youtu.be/R2FMzcsmQY8)
###
[Get SH\*T Done with Prompt Engineering and LangChain](https://www.youtube.com/watch?v=muXbPpG_ys4&list=PLEJK-H61Xlwzm5FYLDdKt_6yibO33zoMW) by [Venelin Valkov](https://www.youtube.com/@venelin_valkov)
- [Getting Started with LangChain: Load Custom Data, Run OpenAI Models, Embeddings and `ChatGPT`](https://www.youtube.com/watch?v=muXbPpG_ys4)
- [Loaders, Indexes & Vectorstores in LangChain: Question Answering on `PDF` files with `ChatGPT`](https://www.youtube.com/watch?v=FQnvfR8Dmr0)
- [LangChain Models: `ChatGPT`, `Flan Alpaca`, `OpenAI Embeddings`, Prompt Templates & Streaming](https://www.youtube.com/watch?v=zy6LiK5F5-s)
- [LangChain Chains: Use `ChatGPT` to Build Conversational Agents, Summaries and Q&A on Text With LLMs](https://www.youtube.com/watch?v=h1tJZQPcimM)
- [Analyze Custom CSV Data with `GPT-4` using Langchain](https://www.youtube.com/watch?v=Ew3sGdX8at4)
- ⛓ [Build ChatGPT Chatbots with LangChain Memory: Understanding and Implementing Memory in Conversations](https://youtu.be/CyuUlf54wTs)
---------------------
⛓ icon marks a new addition [last update 2023-05-15]

View File

@@ -1,90 +0,0 @@
# Glossary
This is a collection of terminology commonly used when developing LLM applications.
It contains reference to external papers or sources where the concept was first introduced,
as well as to places in LangChain where the concept is used.
## Chain of Thought Prompting
A prompting technique used to encourage the model to generate a series of intermediate reasoning steps.
A less formal way to induce this behavior is to include “Lets think step-by-step” in the prompt.
Resources:
- [Chain-of-Thought Paper](https://arxiv.org/pdf/2201.11903.pdf)
- [Step-by-Step Paper](https://arxiv.org/abs/2112.00114)
## Action Plan Generation
A prompt usage that uses a language model to generate actions to take.
The results of these actions can then be fed back into the language model to generate a subsequent action.
Resources:
- [WebGPT Paper](https://arxiv.org/pdf/2112.09332.pdf)
- [SayCan Paper](https://say-can.github.io/assets/palm_saycan.pdf)
## ReAct Prompting
A prompting technique that combines Chain-of-Thought prompting with action plan generation.
This induces the to model to think about what action to take, then take it.
Resources:
- [Paper](https://arxiv.org/pdf/2210.03629.pdf)
- [LangChain Example](modules/agents/agents/examples/react.ipynb)
## Self-ask
A prompting method that builds on top of chain-of-thought prompting.
In this method, the model explicitly asks itself follow-up questions, which are then answered by an external search engine.
Resources:
- [Paper](https://ofir.io/self-ask.pdf)
- [LangChain Example](modules/agents/agents/examples/self_ask_with_search.ipynb)
## Prompt Chaining
Combining multiple LLM calls together, with the output of one-step being the input to the next.
Resources:
- [PromptChainer Paper](https://arxiv.org/pdf/2203.06566.pdf)
- [Language Model Cascades](https://arxiv.org/abs/2207.10342)
- [ICE Primer Book](https://primer.ought.org/)
- [Socratic Models](https://socraticmodels.github.io/)
## Memetic Proxy
Encouraging the LLM to respond in a certain way framing the discussion in a context that the model knows of and that will result in that type of response. For example, as a conversation between a student and a teacher.
Resources:
- [Paper](https://arxiv.org/pdf/2102.07350.pdf)
## Self Consistency
A decoding strategy that samples a diverse set of reasoning paths and then selects the most consistent answer.
Is most effective when combined with Chain-of-thought prompting.
Resources:
- [Paper](https://arxiv.org/pdf/2203.11171.pdf)
## Inception
Also called “First Person Instruction”.
Encouraging the model to think a certain way by including the start of the models response in the prompt.
Resources:
- [Example](https://twitter.com/goodside/status/1583262455207460865?s=20&t=8Hz7XBnK1OF8siQrxxCIGQ)
## MemPrompt
MemPrompt maintains a memory of errors and user feedback, and uses them to prevent repetition of mistakes.
Resources:
- [Paper](https://memprompt.com/)

View File

@@ -1,49 +1,63 @@
Welcome to LangChain
==========================
LangChain is a framework for developing applications powered by language models. We believe that the most powerful and differentiated applications will not only call out to a language model via an API, but will also:
| **LangChain** is a framework for developing applications powered by language models. We believe that the most powerful and differentiated applications will not only call out to a language model, but will also be:
1. *Data-aware*: connect a language model to other sources of data
2. *Agentic*: allow a language model to interact with its environment
- *Be data-aware*: connect a language model to other sources of data
- *Be agentic*: allow a language model to interact with its environment
| The LangChain framework is designed around these principles.
The LangChain framework is designed with the above principles in mind.
This is the Python specific portion of the documentation. For a purely conceptual guide to LangChain, see `here <https://docs.langchain.com/docs/>`_. For the JavaScript documentation, see `here <https://js.langchain.com/docs/>`_.
| This is the Python specific portion of the documentation. For a purely conceptual guide to LangChain, see `here <https://docs.langchain.com/docs/>`_. For the JavaScript documentation, see `here <https://js.langchain.com/docs/>`_.
Getting Started
----------------
Checkout the below guide for a walkthrough of how to get started using LangChain to create an Language Model application.
| How to get started using LangChain to create an Language Model application.
- `Getting Started Documentation <./getting_started/getting_started.html>`_
- `Quickstart Guide <./getting_started/getting_started.html>`_
| Concepts and terminology.
- `Concepts and terminology <./getting_started/concepts.html>`_
| Tutorials created by community experts and presented on YouTube.
- `Tutorials <./getting_started/tutorials.html>`_
.. toctree::
:maxdepth: 1
:maxdepth: 2
:caption: Getting Started
:name: getting_started
:hidden:
getting_started/getting_started.md
getting_started/concepts.md
getting_started/tutorials.md
Modules
-----------
There are several main modules that LangChain provides support for.
For each module we provide some examples to get started, how-to guides, reference docs, and conceptual guides.
These modules are, in increasing order of complexity:
| These modules are the core abstractions which we view as the building blocks of any LLM-powered application.
For each module LangChain provides standard, extendable interfaces. LangChain also provides external integrations and even end-to-end implementations for off-the-shelf use.
- `Models <./modules/models.html>`_: The various model types and model integrations LangChain supports.
| The docs for each module contain quickstart examples, how-to guides, reference docs, and conceptual guides.
- `Prompts <./modules/prompts.html>`_: This includes prompt management, prompt optimization, and prompt serialization.
| The modules are (from least to most complex):
- `Memory <./modules/memory.html>`_: Memory is the concept of persisting state between calls of a chain/agent. LangChain provides a standard interface for memory, a collection of memory implementations, and examples of chains/agents that use memory.
- `Models <./modules/models.html>`_: Supported model types and integrations.
- `Indexes <./modules/indexes.html>`_: Language models are often more powerful when combined with your own text data - this module covers best practices for doing exactly that.
- `Prompts <./modules/prompts.html>`_: Prompt management, optimization, and serialization.
- `Chains <./modules/chains.html>`_: Chains go beyond just a single LLM call, and are sequences of calls (whether to an LLM or a different utility). LangChain provides a standard interface for chains, lots of integrations with other tools, and end-to-end chains for common applications.
- `Memory <./modules/memory.html>`_: Memory refers to state that is persisted between calls of a chain/agent.
- `Agents <./modules/agents.html>`_: Agents involve an LLM making decisions about which Actions to take, taking that Action, seeing an Observation, and repeating that until done. LangChain provides a standard interface for agents, a selection of agents to choose from, and examples of end to end agents.
- `Indexes <./modules/indexes.html>`_: Language models become much more powerful when combined with application-specific data - this module contains interfaces and integrations for loading, querying and updating external data.
- `Chains <./modules/chains.html>`_: Chains are structured sequences of calls (to an LLM or to a different utility).
- `Agents <./modules/agents.html>`_: An agent is a Chain in which an LLM, given a high-level directive and a set of tools, repeatedly decides an action, executes the action and observes the outcome until the high-level directive is complete.
- `Callbacks <./modules/callbacks/getting_started.html>`_: Callbacks let you log and stream the intermediate steps of any chain, making it easy to observe, debug, and evaluate the internals of an application.
.. toctree::
:maxdepth: 1
@@ -53,37 +67,38 @@ These modules are, in increasing order of complexity:
./modules/models.rst
./modules/prompts.rst
./modules/indexes.md
./modules/memory.md
./modules/indexes.md
./modules/chains.md
./modules/agents.md
./modules/callbacks/getting_started.ipynb
Use Cases
----------
The above modules can be used in a variety of ways. LangChain also provides guidance and assistance in this. Below are some of the common use cases LangChain supports.
| Best practices and built-in implementations for common LangChain use cases:
- `Autonomous Agents <./use_cases/autonomous_agents.html>`_: Autonomous agents are long running agents that take many steps in an attempt to accomplish an objective. Examples include AutoGPT and BabyAGI.
- `Autonomous Agents <./use_cases/autonomous_agents.html>`_: Autonomous agents are long-running agents that take many steps in an attempt to accomplish an objective. Examples include AutoGPT and BabyAGI.
- `Agent Simulations <./use_cases/agent_simulations.html>`_: Putting agents in a sandbox and observing how they interact with each other or to events can be an interesting way to observe their long-term memory abilities.
- `Agent Simulations <./use_cases/agent_simulations.html>`_: Putting agents in a sandbox and observing how they interact with each other and react to events can be an effective way to evaluate their long-range reasoning and planning abilities.
- `Personal Assistants <./use_cases/personal_assistants.html>`_: The main LangChain use case. Personal assistants need to take actions, remember interactions, and have knowledge about your data.
- `Personal Assistants <./use_cases/personal_assistants.html>`_: One of the primary LangChain use cases. Personal assistants need to take actions, remember interactions, and have knowledge about your data.
- `Question Answering <./use_cases/question_answering.html>`_: The second big LangChain use case. Answering questions over specific documents, only utilizing the information in those documents to construct an answer.
- `Question Answering <./use_cases/question_answering.html>`_: Another common LangChain use case. Answering questions over specific documents, only utilizing the information in those documents to construct an answer.
- `Chatbots <./use_cases/chatbots.html>`_: Since language models are good at producing text, that makes them ideal for creating chatbots.
- `Chatbots <./use_cases/chatbots.html>`_: Language models love to chat, making this a very natural use of them.
- `Querying Tabular Data <./use_cases/tabular.html>`_: If you want to understand how to use LLMs to query data that is stored in a tabular format (csvs, SQL, dataframes, etc) you should read this page.
- `Querying Tabular Data <./use_cases/tabular.html>`_: Recommended reading if you want to use language models to query structured data (CSVs, SQL, dataframes, etc).
- `Code Understanding <./use_cases/code.html>`_: If you want to understand how to use LLMs to query source code from github, you should read this page.
- `Code Understanding <./use_cases/code.html>`_: Recommended reading if you want to use language models to analyze code.
- `Interacting with APIs <./use_cases/apis.html>`_: Enabling LLMs to interact with APIs is extremely powerful in order to give them more up-to-date information and allow them to take actions.
- `Interacting with APIs <./use_cases/apis.html>`_: Enabling language models to interact with APIs is extremely powerful. It gives them access to up-to-date information and allows them to take actions.
- `Extraction <./use_cases/extraction.html>`_: Extract structured information from text.
- `Summarization <./use_cases/summarization.html>`_: Summarizing longer documents into shorter, more condensed chunks of information. A type of Data Augmented Generation.
- `Summarization <./use_cases/summarization.html>`_: Compressing longer documents. A type of Data-Augmented Generation.
- `Evaluation <./use_cases/evaluation.html>`_: Generative models are notoriously hard to evaluate with traditional metrics. One new way of evaluating them is using language models themselves to do the evaluation. LangChain provides some prompts/chains for assisting in this.
- `Evaluation <./use_cases/evaluation.html>`_: Generative models are hard to evaluate with traditional metrics. One promising approach is to use language models themselves to do the evaluation.
.. toctree::
@@ -92,26 +107,29 @@ The above modules can be used in a variety of ways. LangChain also provides guid
:name: use_cases
:hidden:
./use_cases/personal_assistants.md
./use_cases/autonomous_agents.md
./use_cases/agent_simulations.md
./use_cases/personal_assistants.md
./use_cases/question_answering.md
./use_cases/chatbots.md
./use_cases/tabular.rst
./use_cases/code.md
./use_cases/apis.md
./use_cases/summarization.md
./use_cases/extraction.md
./use_cases/summarization.md
./use_cases/evaluation.rst
Reference Docs
---------------
All of LangChain's reference documentation, in one place. Full documentation on all methods, classes, installation methods, and integration setups for LangChain.
| Full documentation on all methods, classes, installation methods, and integration setups for LangChain.
- `LangChain Installation <./reference/installation.html>`_
- `Reference Documentation <./reference.html>`_
.. toctree::
:maxdepth: 1
:caption: Reference
@@ -119,47 +137,54 @@ All of LangChain's reference documentation, in one place. Full documentation on
:hidden:
./reference/installation.md
./reference/integrations.md
./reference.rst
LangChain Ecosystem
-------------------
Ecosystem
------------
Guides for how other companies/products can be used with LangChain
| LangChain integrates a lot of different LLMs, systems, and products.
| From the other side, many systems and products depend on LangChain.
| It creates a vibrant and thriving ecosystem.
- `Integrations <./integrations.html>`_: Guides for how other products can be used with LangChain.
- `Dependents <./dependents.html>`_: List of repositories that use LangChain.
- `Deployments <./ecosystem/deployments.html>`_: A collection of instructions, code snippets, and template repositories for deploying LangChain apps.
- `LangChain Ecosystem <./ecosystem.html>`_
.. toctree::
:maxdepth: 1
:maxdepth: 2
:glob:
:caption: Ecosystem
:name: ecosystem
:hidden:
./ecosystem.rst
./integrations.rst
./dependents.md
./ecosystem/deployments.md
Additional Resources
---------------------
Additional collection of resources we think may be useful as you develop your application!
| Additional resources we think may be useful as you develop your application!
- `LangChainHub <https://github.com/hwchase17/langchain-hub>`_: The LangChainHub is a place to share and explore other prompts, chains, and agents.
- `Glossary <./glossary.html>`_: A glossary of all related terms, papers, methods, etc. Whether implemented in LangChain or not!
- `Gallery <https://github.com/kyrolabs/awesome-langchain>`_: A collection of great projects that use Langchain, compiled by the folks at `Kyrolabs <https://kyrolabs.com>`_. Useful for finding inspiration and example implementations.
- `Gallery <./gallery.html>`_: A collection of our favorite projects that use LangChain. Useful for finding inspiration or seeing how things were done in other applications.
- `Deploying LLMs in Production <./additional_resources/deploy_llms.html>`_: A collection of best practices and tutorials for deploying LLMs in production.
- `Deployments <./deployments.html>`_: A collection of instructions, code snippets, and template repositories for deploying LangChain apps.
- `Tracing <./additional_resources/tracing.html>`_: A guide on using tracing in LangChain to visualize the execution of chains and agents.
- `Tracing <./tracing.html>`_: A guide on using tracing in LangChain to visualize the execution of chains and agents.
- `Model Laboratory <./model_laboratory.html>`_: Experimenting with different prompts, models, and chains is a big part of developing the best possible application. The ModelLaboratory makes it easy to do so.
- `Model Laboratory <./additional_resources/model_laboratory.html>`_: Experimenting with different prompts, models, and chains is a big part of developing the best possible application. The ModelLaboratory makes it easy to do so.
- `Discord <https://discord.gg/6adMQxSpJS>`_: Join us on our Discord to discuss all things LangChain!
- `YouTube <./youtube.html>`_: A collection of the LangChain tutorials and videos.
- `YouTube <./additional_resources/youtube.html>`_: A collection of the LangChain tutorials and videos.
- `Production Support <https://forms.gle/57d8AmXBYp8PP8tZA>`_: As you move your LangChains into production, we'd love to offer more comprehensive support. Please fill out this form and we'll set up a dedicated support Slack channel.
@@ -171,11 +196,11 @@ Additional collection of resources we think may be useful as you develop your ap
:hidden:
LangChainHub <https://github.com/hwchase17/langchain-hub>
./glossary.md
./gallery.rst
./deployments.md
./tracing.md
./use_cases/model_laboratory.ipynb
./additional_resources/deployments.md
./additional_resources/deploy_llms.rst
Gallery <https://github.com/kyrolabs/awesome-langchain>
./additional_resources/tracing.md
./additional_resources/model_laboratory.ipynb
Discord <https://discord.gg/6adMQxSpJS>
./youtube.md
./additional_resources/youtube.md
Production Support <https://forms.gle/57d8AmXBYp8PP8tZA>

View File

@@ -1,12 +1,13 @@
LangChain Ecosystem
Integrations
===================
Guides for how other companies/products can be used with LangChain
LangChain integrates with many LLMs, systems, and products.
Groups
----------
Integrations by Module
--------------------------------
| Integrations grouped by the core LangChain module they map to:
LangChain provides integration with many LLMs and systems:
- `LLM Providers <./modules/models/llms/integrations.html>`_
- `Chat Model Providers <./modules/models/chat/integrations.html>`_
@@ -18,12 +19,21 @@ LangChain provides integration with many LLMs and systems:
- `Tool Providers <./modules/agents/tools.html>`_
- `Toolkit Integrations <./modules/agents/toolkits.html>`_
Companies / Products
----------
Dependencies
----------------
| LangChain depends on `several hungered Python packages <https://github.com/hwchase17/langchain/network/dependencies>`_.
All Integrations
-------------------------------------------
| A comprehensive list of LLMs, systems, and products integrated with LangChain:
.. toctree::
:maxdepth: 1
:glob:
ecosystem/*
integrations/*

File diff suppressed because one or more lines are too long

View File

@@ -61,7 +61,6 @@
"from datetime import datetime\n",
"\n",
"from langchain.llms import OpenAI\n",
"from langchain.callbacks.base import CallbackManager\n",
"from langchain.callbacks import AimCallbackHandler, StdOutCallbackHandler"
]
},
@@ -109,8 +108,8 @@
" experiment_name=\"scenario 1: OpenAI LLM\",\n",
")\n",
"\n",
"manager = CallbackManager([StdOutCallbackHandler(), aim_callback])\n",
"llm = OpenAI(temperature=0, callback_manager=manager, verbose=True)"
"callbacks = [StdOutCallbackHandler(), aim_callback]\n",
"llm = OpenAI(temperature=0, callbacks=callbacks)"
]
},
{
@@ -177,7 +176,7 @@
"Title: {title}\n",
"Playwright: This is a synopsis for the above play:\"\"\"\n",
"prompt_template = PromptTemplate(input_variables=[\"title\"], template=template)\n",
"synopsis_chain = LLMChain(llm=llm, prompt=prompt_template, callback_manager=manager)\n",
"synopsis_chain = LLMChain(llm=llm, prompt=prompt_template, callbacks=callbacks)\n",
"\n",
"test_prompts = [\n",
" {\"title\": \"documentary about good video games that push the boundary of game design\"},\n",
@@ -249,13 +248,12 @@
],
"source": [
"# scenario 3 - Agent with Tools\n",
"tools = load_tools([\"serpapi\", \"llm-math\"], llm=llm, callback_manager=manager)\n",
"tools = load_tools([\"serpapi\", \"llm-math\"], llm=llm, callbacks=callbacks)\n",
"agent = initialize_agent(\n",
" tools,\n",
" llm,\n",
" agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION,\n",
" callback_manager=manager,\n",
" verbose=True,\n",
" callbacks=callbacks,\n",
")\n",
"agent.run(\n",
" \"Who is Leo DiCaprio's girlfriend? What is her current age raised to the 0.43 power?\"\n",

View File

@@ -0,0 +1,29 @@
# Airbyte
>[Airbyte](https://github.com/airbytehq/airbyte) is a data integration platform for ELT pipelines from APIs,
> databases & files to warehouses & lakes. It has the largest catalog of ELT connectors to data warehouses and databases.
## Installation and Setup
This instruction shows how to load any source from `Airbyte` into a local `JSON` file that can be read in as a document.
**Prerequisites:**
Have `docker desktop` installed.
**Steps:**
1. Clone Airbyte from GitHub - `git clone https://github.com/airbytehq/airbyte.git`.
2. Switch into Airbyte directory - `cd airbyte`.
3. Start Airbyte - `docker compose up`.
4. In your browser, just visit http://localhost:8000. You will be asked for a username and password. By default, that's username `airbyte` and password `password`.
5. Setup any source you wish.
6. Set destination as Local JSON, with specified destination path - lets say `/json_data`. Set up a manual sync.
7. Run the connection.
8. To see what files are created, navigate to: `file:///tmp/airbyte_local/`.
## Document Loader
See a [usage example](../modules/indexes/document_loaders/examples/airbyte_json.ipynb).
```python
from langchain.document_loaders import AirbyteJSONLoader
```

View File

@@ -0,0 +1,36 @@
# Aleph Alpha
>[Aleph Alpha](https://docs.aleph-alpha.com/) was founded in 2019 with the mission to research and build the foundational technology for an era of strong AI. The team of international scientists, engineers, and innovators researches, develops, and deploys transformative AI like large language and multimodal models and runs the fastest European commercial AI cluster.
>[The Luminous series](https://docs.aleph-alpha.com/docs/introduction/luminous/) is a family of large language models.
## Installation and Setup
```bash
pip install aleph-alpha-client
```
You have to create a new token. Please, see [instructions](https://docs.aleph-alpha.com/docs/account/#create-a-new-token).
```python
from getpass import getpass
ALEPH_ALPHA_API_KEY = getpass()
```
## LLM
See a [usage example](../modules/models/llms/integrations/aleph_alpha.ipynb).
```python
from langchain.llms import AlephAlpha
```
## Text Embedding Models
See a [usage example](../modules/models/text_embedding/examples/aleph_alpha.ipynb).
```python
from langchain.embeddings import AlephAlphaSymmetricSemanticEmbedding, AlephAlphaAsymmetricSemanticEmbedding
```

View File

@@ -0,0 +1,24 @@
# Amazon Bedrock
>[Amazon Bedrock](https://aws.amazon.com/bedrock/) is a fully managed service that makes FMs from leading AI startups and Amazon available via an API, so you can choose from a wide range of FMs to find the model that is best suited for your use case.
## Installation and Setup
```bash
pip install boto3
```
## LLM
See a [usage example](../modules/models/llms/integrations/bedrock.ipynb).
```python
from langchain import Bedrock
```
## Text Embedding Models
See a [usage example](../modules/models/text_embedding/examples/amazon_bedrock.ipynb).
```python
from langchain.embeddings import BedrockEmbeddings
```

View File

@@ -0,0 +1,18 @@
# Annoy
> [Annoy](https://github.com/spotify/annoy) (`Approximate Nearest Neighbors Oh Yeah`) is a C++ library with Python bindings to search for points in space that are close to a given query point. It also creates large read-only file-based data structures that are mmapped into memory so that many processes may share the same data.
## Installation and Setup
```bash
pip install annoy
```
## Vectorstore
See a [usage example](../modules/indexes/vectorstores/examples/annoy.ipynb).
```python
from langchain.vectorstores import Annoy
```

View File

@@ -0,0 +1,26 @@
# Anthropic
>[Anthropic](https://en.wikipedia.org/wiki/Anthropic) is an American artificial intelligence (AI) startup and
> public-benefit corporation, founded by former members of OpenAI. `Anthropic` specializes in developing general AI
> systems and language models, with a company ethos of responsible AI usage.
> `Anthropic` develops a chatbot, named `Claude`. Similar to `ChatGPT`, `Claude` uses a messaging
> interface where users can submit questions or requests and receive highly detailed and relevant responses.
## Installation and Setup
```bash
pip install anthropic
```
See the [setup documentation](https://console.anthropic.com/docs/access).
## Chat Models
See a [usage example](../modules/models/chat/integrations/anthropic.ipynb)
```python
from langchain.chat_models import ChatAnthropic
```

View File

@@ -0,0 +1,17 @@
# Anyscale
This page covers how to use the Anyscale ecosystem within LangChain.
It is broken into two parts: installation and setup, and then references to specific Anyscale wrappers.
## Installation and Setup
- Get an Anyscale Service URL, route and API key and set them as environment variables (`ANYSCALE_SERVICE_URL`,`ANYSCALE_SERVICE_ROUTE`, `ANYSCALE_SERVICE_TOKEN`).
- Please see [the Anyscale docs](https://docs.anyscale.com/productionize/services-v2/get-started) for more details.
## Wrappers
### LLM
There exists an Anyscale LLM wrapper, which you can access with
```python
from langchain.llms import Anyscale
```

View File

@@ -0,0 +1,29 @@
# Argilla
![Argilla - Open-source data platform for LLMs](https://argilla.io/og.png)
>[Argilla](https://argilla.io/) is an open-source data curation platform for LLMs.
> Using Argilla, everyone can build robust language models through faster data curation
> using both human and machine feedback. We provide support for each step in the MLOps cycle,
> from data labeling to model monitoring.
## Installation and Setup
First, you'll need to install the `argilla` Python package as follows:
```bash
pip install argilla --upgrade
```
If you already have an Argilla Server running, then you're good to go; but if
you don't, follow the next steps to install it.
If you don't you can refer to [Argilla - 🚀 Quickstart](https://docs.argilla.io/en/latest/getting_started/quickstart.html#Running-Argilla-Quickstart) to deploy Argilla either on HuggingFace Spaces, locally, or on a server.
## Tracking
See a [usage example of `ArgillaCallbackHandler`](../modules/callbacks/examples/examples/argilla.ipynb).
```python
from langchain.callbacks import ArgillaCallbackHandler
```

View File

@@ -0,0 +1,36 @@
# Arxiv
>[arXiv](https://arxiv.org/) is an open-access archive for 2 million scholarly articles in the fields of physics,
> mathematics, computer science, quantitative biology, quantitative finance, statistics, electrical engineering and
> systems science, and economics.
## Installation and Setup
First, you need to install `arxiv` python package.
```bash
pip install arxiv
```
Second, you need to install `PyMuPDF` python package which transforms PDF files downloaded from the `arxiv.org` site into the text format.
```bash
pip install pymupdf
```
## Document Loader
See a [usage example](../modules/indexes/document_loaders/examples/arxiv.ipynb).
```python
from langchain.document_loaders import ArxivLoader
```
## Retriever
See a [usage example](../modules/indexes/retrievers/examples/arxiv.ipynb).
```python
from langchain.retrievers import ArxivRetriever
```

View File

@@ -0,0 +1,21 @@
# AwaDB
>[AwaDB](https://github.com/awa-ai/awadb) is an AI Native database for the search and storage of embedding vectors used by LLM Applications.
## Installation and Setup
```bash
pip install awadb
```
## VectorStore
There exists a wrapper around AwaDB vector databases, allowing you to use it as a vectorstore,
whether for semantic search or example selection.
```python
from langchain.vectorstores import AwaDB
```
For a more detailed walkthrough of the AwaDB wrapper, see [this notebook](../modules/indexes/vectorstores/examples/awadb.ipynb)

View File

@@ -0,0 +1,25 @@
# AWS S3 Directory
>[Amazon Simple Storage Service (Amazon S3)](https://docs.aws.amazon.com/AmazonS3/latest/userguide/using-folders.html) is an object storage service.
>[AWS S3 Directory](https://docs.aws.amazon.com/AmazonS3/latest/userguide/using-folders.html)
>[AWS S3 Buckets](https://docs.aws.amazon.com/AmazonS3/latest/userguide/UsingBucket.html)
## Installation and Setup
```bash
pip install boto3
```
## Document Loader
See a [usage example for S3DirectoryLoader](../modules/indexes/document_loaders/examples/aws_s3_directory.ipynb).
See a [usage example for S3FileLoader](../modules/indexes/document_loaders/examples/aws_s3_file.ipynb).
```python
from langchain.document_loaders import S3DirectoryLoader, S3FileLoader
```

View File

@@ -0,0 +1,16 @@
# AZLyrics
>[AZLyrics](https://www.azlyrics.com/) is a large, legal, every day growing collection of lyrics.
## Installation and Setup
There isn't any special setup for it.
## Document Loader
See a [usage example](../modules/indexes/document_loaders/examples/azlyrics.ipynb).
```python
from langchain.document_loaders import AZLyricsLoader
```

View File

@@ -0,0 +1,36 @@
# Azure Blob Storage
>[Azure Blob Storage](https://learn.microsoft.com/en-us/azure/storage/blobs/storage-blobs-introduction) is Microsoft's object storage solution for the cloud. Blob Storage is optimized for storing massive amounts of unstructured data. Unstructured data is data that doesn't adhere to a particular data model or definition, such as text or binary data.
>[Azure Files](https://learn.microsoft.com/en-us/azure/storage/files/storage-files-introduction) offers fully managed
> file shares in the cloud that are accessible via the industry standard Server Message Block (`SMB`) protocol,
> Network File System (`NFS`) protocol, and `Azure Files REST API`. `Azure Files` are based on the `Azure Blob Storage`.
`Azure Blob Storage` is designed for:
- Serving images or documents directly to a browser.
- Storing files for distributed access.
- Streaming video and audio.
- Writing to log files.
- Storing data for backup and restore, disaster recovery, and archiving.
- Storing data for analysis by an on-premises or Azure-hosted service.
## Installation and Setup
```bash
pip install azure-storage-blob
```
## Document Loader
See a [usage example for the Azure Blob Storage](../modules/indexes/document_loaders/examples/azure_blob_storage_container.ipynb).
```python
from langchain.document_loaders import AzureBlobStorageContainerLoader
```
See a [usage example for the Azure Files](../modules/indexes/document_loaders/examples/azure_blob_storage_file.ipynb).
```python
from langchain.document_loaders import AzureBlobStorageFileLoader
```

View File

@@ -0,0 +1,24 @@
# Azure Cognitive Search
>[Azure Cognitive Search](https://learn.microsoft.com/en-us/azure/search/search-what-is-azure-search) (formerly known as `Azure Search`) is a cloud search service that gives developers infrastructure, APIs, and tools for building a rich search experience over private, heterogeneous content in web, mobile, and enterprise applications.
>Search is foundational to any app that surfaces text to users, where common scenarios include catalog or document search, online retail apps, or data exploration over proprietary content. When you create a search service, you'll work with the following capabilities:
>- A search engine for full text search over a search index containing user-owned content
>- Rich indexing, with lexical analysis and optional AI enrichment for content extraction and transformation
>- Rich query syntax for text search, fuzzy search, autocomplete, geo-search and more
>- Programmability through REST APIs and client libraries in Azure SDKs
>- Azure integration at the data layer, machine learning layer, and AI (Cognitive Services)
## Installation and Setup
See [set up instructions](https://learn.microsoft.com/en-us/azure/search/search-create-service-portal).
## Retriever
See a [usage example](../modules/indexes/retrievers/examples/azure_cognitive_search.ipynb).
```python
from langchain.retrievers import AzureCognitiveSearchRetriever
```

View File

@@ -0,0 +1,50 @@
# Azure OpenAI
>[Microsoft Azure](https://en.wikipedia.org/wiki/Microsoft_Azure), often referred to as `Azure` is a cloud computing platform run by `Microsoft`, which offers access, management, and development of applications and services through global data centers. It provides a range of capabilities, including software as a service (SaaS), platform as a service (PaaS), and infrastructure as a service (IaaS). `Microsoft Azure` supports many programming languages, tools, and frameworks, including Microsoft-specific and third-party software and systems.
>[Azure OpenAI](https://learn.microsoft.com/en-us/azure/cognitive-services/openai/) is an `Azure` service with powerful language models from `OpenAI` including the `GPT-3`, `Codex` and `Embeddings model` series for content generation, summarization, semantic search, and natural language to code translation.
## Installation and Setup
```bash
pip install openai
pip install tiktoken
```
Set the environment variables to get access to the `Azure OpenAI` service.
```python
import os
os.environ["OPENAI_API_TYPE"] = "azure"
os.environ["OPENAI_API_BASE"] = "https://<your-endpoint.openai.azure.com/"
os.environ["OPENAI_API_KEY"] = "your AzureOpenAI key"
os.environ["OPENAI_API_VERSION"] = "2023-03-15-preview"
```
## LLM
See a [usage example](../modules/models/llms/integrations/azure_openai_example.ipynb).
```python
from langchain.llms import AzureOpenAI
```
## Text Embedding Models
See a [usage example](../modules/models/text_embedding/examples/azureopenai.ipynb)
```python
from langchain.embeddings import OpenAIEmbeddings
```
## Chat Models
See a [usage example](../modules/models/chat/integrations/azure_chat_openai.ipynb)
```python
from langchain.chat_models import AzureChatOpenAI
```

93
docs/integrations/beam.md Normal file
View File

@@ -0,0 +1,93 @@
# Beam
>[Beam](https://docs.beam.cloud/introduction) makes it easy to run code on GPUs, deploy scalable web APIs,
> schedule cron jobs, and run massively parallel workloads — without managing any infrastructure.
## Installation and Setup
- [Create an account](https://www.beam.cloud/)
- Install the Beam CLI with `curl https://raw.githubusercontent.com/slai-labs/get-beam/main/get-beam.sh -sSfL | sh`
- Register API keys with `beam configure`
- Set environment variables (`BEAM_CLIENT_ID`) and (`BEAM_CLIENT_SECRET`)
- Install the Beam SDK:
```bash
pip install beam-sdk
```
## LLM
```python
from langchain.llms.beam import Beam
```
### Example of the Beam app
This is the environment youll be developing against once you start the app.
It's also used to define the maximum response length from the model.
```python
llm = Beam(model_name="gpt2",
name="langchain-gpt2-test",
cpu=8,
memory="32Gi",
gpu="A10G",
python_version="python3.8",
python_packages=[
"diffusers[torch]>=0.10",
"transformers",
"torch",
"pillow",
"accelerate",
"safetensors",
"xformers",],
max_length="50",
verbose=False)
```
### Deploy the Beam app
Once defined, you can deploy your Beam app by calling your model's `_deploy()` method.
```python
llm._deploy()
```
### Call the Beam app
Once a beam model is deployed, it can be called by calling your model's `_call()` method.
This returns the GPT2 text response to your prompt.
```python
response = llm._call("Running machine learning on a remote GPU")
```
An example script which deploys the model and calls it would be:
```python
from langchain.llms.beam import Beam
import time
llm = Beam(model_name="gpt2",
name="langchain-gpt2-test",
cpu=8,
memory="32Gi",
gpu="A10G",
python_version="python3.8",
python_packages=[
"diffusers[torch]>=0.10",
"transformers",
"torch",
"pillow",
"accelerate",
"safetensors",
"xformers",],
max_length="50",
verbose=False)
llm._deploy()
response = llm._call("Running machine learning on a remote GPU")
print(response)
```

View File

@@ -0,0 +1,17 @@
# BiliBili
>[Bilibili](https://www.bilibili.tv/) is one of the most beloved long-form video sites in China.
## Installation and Setup
```bash
pip install bilibili-api-python
```
## Document Loader
See a [usage example](../modules/indexes/document_loaders/examples/bilibili.ipynb).
```python
from langchain.document_loaders import BiliBiliLoader
```

View File

@@ -0,0 +1,22 @@
# Blackboard
>[Blackboard Learn](https://en.wikipedia.org/wiki/Blackboard_Learn) (previously the `Blackboard Learning Management System`)
> is a web-based virtual learning environment and learning management system developed by Blackboard Inc.
> The software features course management, customizable open architecture, and scalable design that allows
> integration with student information systems and authentication protocols. It may be installed on local servers,
> hosted by `Blackboard ASP Solutions`, or provided as Software as a Service hosted on Amazon Web Services.
> Its main purposes are stated to include the addition of online elements to courses traditionally delivered
> face-to-face and development of completely online courses with few or no face-to-face meetings.
## Installation and Setup
There isn't any special setup for it.
## Document Loader
See a [usage example](../modules/indexes/document_loaders/examples/blackboard.ipynb).
```python
from langchain.document_loaders import BlackboardLoader
```

View File

@@ -0,0 +1,23 @@
# Cassandra
>[Cassandra](https://en.wikipedia.org/wiki/Apache_Cassandra) is a free and open-source, distributed, wide-column
> store, NoSQL database management system designed to handle large amounts of data across many commodity servers,
> providing high availability with no single point of failure. `Cassandra` offers support for clusters spanning
> multiple datacenters, with asynchronous masterless replication allowing low latency operations for all clients.
> `Cassandra` was designed to implement a combination of `Amazon's Dynamo` distributed storage and replication
> techniques combined with `Google's Bigtable` data and storage engine model.
## Installation and Setup
```bash
pip install cassandra-drive
```
## Memory
See a [usage example](../modules/memory/examples/cassandra_chat_message_history.ipynb).
```python
from langchain.memory import CassandraChatMessageHistory
```

View File

@@ -1,20 +1,29 @@
# Chroma
This page covers how to use the Chroma ecosystem within LangChain.
It is broken into two parts: installation and setup, and then references to specific Chroma wrappers.
>[Chroma](https://docs.trychroma.com/getting-started) is a database for building AI applications with embeddings.
## Installation and Setup
- Install the Python package with `pip install chromadb`
## Wrappers
### VectorStore
```bash
pip install chromadb
```
## VectorStore
There exists a wrapper around Chroma vector databases, allowing you to use it as a vectorstore,
whether for semantic search or example selection.
To import this vectorstore:
```python
from langchain.vectorstores import Chroma
```
For a more detailed walkthrough of the Chroma wrapper, see [this notebook](../modules/indexes/vectorstores/getting_started.ipynb)
## Retriever
See a [usage example](../modules/indexes/retrievers/examples/chroma_self_query.ipynb).
```python
from langchain.retrievers import SelfQueryRetriever
```

View File

@@ -1,13 +1,22 @@
{
"cells": [
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"# ClearML Integration\n",
"# ClearML\n",
"\n",
"In order to properly keep track of your langchain experiments and their results, you can enable the ClearML integration. ClearML is an experiment manager that neatly tracks and organizes all your experiment runs.\n",
"> [ClearML](https://github.com/allegroai/clearml) is a ML/DL development and production suite, it contains 5 main modules:\n",
"> - `Experiment Manager` - Automagical experiment tracking, environments and results\n",
"> - `MLOps` - Orchestration, Automation & Pipelines solution for ML/DL jobs (K8s / Cloud / bare-metal)\n",
"> - `Data-Management` - Fully differentiable data management & version control solution on top of object-storage (S3 / GS / Azure / NAS)\n",
"> - `Model-Serving` - cloud-ready Scalable model serving solution!\n",
" Deploy new model endpoints in under 5 minutes\n",
" Includes optimized GPU serving support backed by Nvidia-Triton\n",
" with out-of-the-box Model Monitoring\n",
"> - `Fire Reports` - Create and share rich MarkDown documents supporting embeddable online content\n",
"\n",
"In order to properly keep track of your langchain experiments and their results, you can enable the `ClearML` integration. We use the `ClearML Experiment Manager` that neatly tracks and organizes all your experiment runs.\n",
"\n",
"<a target=\"_blank\" href=\"https://colab.research.google.com/github/hwchase17/langchain/blob/master/docs/ecosystem/clearml_tracking.ipynb\">\n",
" <img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/>\n",
@@ -15,11 +24,32 @@
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {
"tags": []
},
"source": [
"## Installation and Setup"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"!pip install clearml\n",
"!pip install pandas\n",
"!pip install textstat\n",
"!pip install spacy\n",
"!python -m spacy download en_core_web_sm"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Getting API Credentials\n",
"### Getting API Credentials\n",
"\n",
"We'll be using quite some APIs in this notebook, here is a list and where to get them:\n",
"\n",
@@ -43,24 +73,21 @@
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"## Setting Up"
"## Callbacks"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"execution_count": 2,
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"!pip install clearml\n",
"!pip install pandas\n",
"!pip install textstat\n",
"!pip install spacy\n",
"!python -m spacy download en_core_web_sm"
"from langchain.callbacks import ClearMLCallbackHandler"
]
},
{
@@ -78,8 +105,7 @@
],
"source": [
"from datetime import datetime\n",
"from langchain.callbacks import ClearMLCallbackHandler, StdOutCallbackHandler\n",
"from langchain.callbacks.base import CallbackManager\n",
"from langchain.callbacks import StdOutCallbackHandler\n",
"from langchain.llms import OpenAI\n",
"\n",
"# Setup and use the ClearML Callback\n",
@@ -93,17 +119,16 @@
" complexity_metrics=True,\n",
" stream_logs=True\n",
")\n",
"manager = CallbackManager([StdOutCallbackHandler(), clearml_callback])\n",
"callbacks = [StdOutCallbackHandler(), clearml_callback]\n",
"# Get the OpenAI model ready to go\n",
"llm = OpenAI(temperature=0, callback_manager=manager, verbose=True)"
"llm = OpenAI(temperature=0, callbacks=callbacks)"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"## Scenario 1: Just an LLM\n",
"### Scenario 1: Just an LLM\n",
"\n",
"First, let's just run a single LLM a few times and capture the resulting prompt-answer conversation in ClearML"
]
@@ -345,7 +370,6 @@
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
@@ -357,11 +381,10 @@
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"## Scenario 2: Creating an agent with tools\n",
"### Scenario 2: Creating an agent with tools\n",
"\n",
"To show a more advanced workflow, let's create an agent with access to tools. The way ClearML tracks the results is not different though, only the table will look slightly different as there are other types of actions taken when compared to the earlier, simpler example.\n",
"\n",
@@ -523,13 +546,12 @@
"from langchain.agents import AgentType\n",
"\n",
"# SCENARIO 2 - Agent with Tools\n",
"tools = load_tools([\"serpapi\", \"llm-math\"], llm=llm, callback_manager=manager)\n",
"tools = load_tools([\"serpapi\", \"llm-math\"], llm=llm, callbacks=callbacks)\n",
"agent = initialize_agent(\n",
" tools,\n",
" llm,\n",
" agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION,\n",
" callback_manager=manager,\n",
" verbose=True,\n",
" callbacks=callbacks,\n",
")\n",
"agent.run(\n",
" \"Who is the wife of the person who sang summer of 69?\"\n",
@@ -538,11 +560,10 @@
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"## Tips and Next Steps\n",
"### Tips and Next Steps\n",
"\n",
"- Make sure you always use a unique `name` argument for the `clearml_callback.flush_tracker` function. If not, the model parameters used for a run will override the previous run!\n",
"\n",
@@ -561,7 +582,7 @@
],
"metadata": {
"kernelspec": {
"display_name": ".venv",
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
@@ -575,9 +596,8 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.9"
"version": "3.10.6"
},
"orig_nbformat": 4,
"vscode": {
"interpreter": {
"hash": "a53ebf4a859167383b364e7e7521d0add3c2dbbdecce4edf676e8c4634ff3fbb"
@@ -585,5 +605,5 @@
}
},
"nbformat": 4,
"nbformat_minor": 2
"nbformat_minor": 4
}

View File

@@ -0,0 +1,52 @@
# ClickHouse
This page covers how to use ClickHouse Vector Search within LangChain.
[ClickHouse](https://clickhouse.com) is a open source real-time OLAP database with full SQL support and a wide range of functions to assist users in writing analytical queries. Some of these functions and data structures perform distance operations between vectors, enabling ClickHouse to be used as a vector database.
Due to the fully parallelized query pipeline, ClickHouse can process vector search operations very quickly, especially when performing exact matching through a linear scan over all rows, delivering processing speed comparable to dedicated vector databases.
High compression levels, tunable through custom compression codecs, enable very large datasets to be stored and queried. ClickHouse is not memory-bound, allowing multi-TB datasets containing embeddings to be queried.
The capabilities for computing the distance between two vectors are just another SQL function and can be effectively combined with more traditional SQL filtering and aggregation capabilities. This allows vectors to be stored and queried alongside metadata, and even rich text, enabling a broad array of use cases and applications.
Finally, experimental ClickHouse capabilities like [Approximate Nearest Neighbour (ANN) indices](https://clickhouse.com/docs/en/engines/table-engines/mergetree-family/annindexes) support faster approximate matching of vectors and provide a promising development aimed to further enhance the vector matching capabilities of ClickHouse.
## Installation
- Install clickhouse server by [binary](https://clickhouse.com/docs/en/install) or [docker image](https://hub.docker.com/r/clickhouse/clickhouse-server/)
- Install the Python SDK with `pip install clickhouse-connect`
### Configure clickhouse vector index
Customize `ClickhouseSettings` object with parameters
```python
from langchain.vectorstores import ClickHouse, ClickhouseSettings
config = ClickhouseSettings(host="<clickhouse-server-host>", port=8123, ...)
index = Clickhouse(embedding_function, config)
index.add_documents(...)
```
## Wrappers
supported functions:
- `add_texts`
- `add_documents`
- `from_texts`
- `from_documents`
- `similarity_search`
- `asimilarity_search`
- `similarity_search_by_vector`
- `asimilarity_search_by_vector`
- `similarity_search_with_relevance_scores`
### VectorStore
There exists a wrapper around open source Clickhouse database, allowing you to use it as a vectorstore,
whether for semantic search or similar example retrieval.
To import this vectorstore:
```python
from langchain.vectorstores import Clickhouse
```
For a more detailed walkthrough of the MyScale wrapper, see [this notebook](../modules/indexes/vectorstores/examples/clickhouse.ipynb)

View File

@@ -0,0 +1,38 @@
# Cohere
>[Cohere](https://cohere.ai/about) is a Canadian startup that provides natural language processing models
> that help companies improve human-machine interactions.
## Installation and Setup
- Install the Python SDK :
```bash
pip install cohere
```
Get a [Cohere api key](https://dashboard.cohere.ai/) and set it as an environment variable (`COHERE_API_KEY`)
## LLM
There exists an Cohere LLM wrapper, which you can access with
See a [usage example](../modules/models/llms/integrations/cohere.ipynb).
```python
from langchain.llms import Cohere
```
## Text Embedding Model
There exists an Cohere Embedding model, which you can access with
```python
from langchain.embeddings import CohereEmbeddings
```
For a more detailed walkthrough of this, see [this notebook](../modules/models/text_embedding/examples/cohere.ipynb)
## Retriever
See a [usage example](../modules/indexes/retrievers/examples/cohere-reranker.ipynb).
```python
from langchain.retrievers.document_compressors import CohereRerank
```

View File

@@ -0,0 +1,16 @@
# College Confidential
>[College Confidential](https://www.collegeconfidential.com/) gives information on 3,800+ colleges and universities.
## Installation and Setup
There isn't any special setup for it.
## Document Loader
See a [usage example](../modules/indexes/document_loaders/examples/college_confidential.ipynb).
```python
from langchain.document_loaders import CollegeConfidentialLoader
```

View File

@@ -121,7 +121,6 @@
"from datetime import datetime\n",
"\n",
"from langchain.callbacks import CometCallbackHandler, StdOutCallbackHandler\n",
"from langchain.callbacks.base import CallbackManager\n",
"from langchain.llms import OpenAI\n",
"\n",
"comet_callback = CometCallbackHandler(\n",
@@ -131,8 +130,8 @@
" tags=[\"llm\"],\n",
" visualizations=[\"dep\"],\n",
")\n",
"manager = CallbackManager([StdOutCallbackHandler(), comet_callback])\n",
"llm = OpenAI(temperature=0.9, callback_manager=manager, verbose=True)\n",
"callbacks = [StdOutCallbackHandler(), comet_callback]\n",
"llm = OpenAI(temperature=0.9, callbacks=callbacks, verbose=True)\n",
"\n",
"llm_result = llm.generate([\"Tell me a joke\", \"Tell me a poem\", \"Tell me a fact\"] * 3)\n",
"print(\"LLM result\", llm_result)\n",
@@ -153,7 +152,6 @@
"outputs": [],
"source": [
"from langchain.callbacks import CometCallbackHandler, StdOutCallbackHandler\n",
"from langchain.callbacks.base import CallbackManager\n",
"from langchain.chains import LLMChain\n",
"from langchain.llms import OpenAI\n",
"from langchain.prompts import PromptTemplate\n",
@@ -164,15 +162,14 @@
" stream_logs=True,\n",
" tags=[\"synopsis-chain\"],\n",
")\n",
"manager = CallbackManager([StdOutCallbackHandler(), comet_callback])\n",
"\n",
"llm = OpenAI(temperature=0.9, callback_manager=manager, verbose=True)\n",
"callbacks = [StdOutCallbackHandler(), comet_callback]\n",
"llm = OpenAI(temperature=0.9, callbacks=callbacks)\n",
"\n",
"template = \"\"\"You are a playwright. Given the title of play, it is your job to write a synopsis for that title.\n",
"Title: {title}\n",
"Playwright: This is a synopsis for the above play:\"\"\"\n",
"prompt_template = PromptTemplate(input_variables=[\"title\"], template=template)\n",
"synopsis_chain = LLMChain(llm=llm, prompt=prompt_template, callback_manager=manager)\n",
"synopsis_chain = LLMChain(llm=llm, prompt=prompt_template, callbacks=callbacks)\n",
"\n",
"test_prompts = [{\"title\": \"Documentary about Bigfoot in Paris\"}]\n",
"print(synopsis_chain.apply(test_prompts))\n",
@@ -194,7 +191,6 @@
"source": [
"from langchain.agents import initialize_agent, load_tools\n",
"from langchain.callbacks import CometCallbackHandler, StdOutCallbackHandler\n",
"from langchain.callbacks.base import CallbackManager\n",
"from langchain.llms import OpenAI\n",
"\n",
"comet_callback = CometCallbackHandler(\n",
@@ -203,15 +199,15 @@
" stream_logs=True,\n",
" tags=[\"agent\"],\n",
")\n",
"manager = CallbackManager([StdOutCallbackHandler(), comet_callback])\n",
"llm = OpenAI(temperature=0.9, callback_manager=manager, verbose=True)\n",
"callbacks = [StdOutCallbackHandler(), comet_callback]\n",
"llm = OpenAI(temperature=0.9, callbacks=callbacks)\n",
"\n",
"tools = load_tools([\"serpapi\", \"llm-math\"], llm=llm, callback_manager=manager)\n",
"tools = load_tools([\"serpapi\", \"llm-math\"], llm=llm, callbacks=callbacks)\n",
"agent = initialize_agent(\n",
" tools,\n",
" llm,\n",
" agent=\"zero-shot-react-description\",\n",
" callback_manager=manager,\n",
" callbacks=callbacks,\n",
" verbose=True,\n",
")\n",
"agent.run(\n",
@@ -255,7 +251,6 @@
"from rouge_score import rouge_scorer\n",
"\n",
"from langchain.callbacks import CometCallbackHandler, StdOutCallbackHandler\n",
"from langchain.callbacks.base import CallbackManager\n",
"from langchain.chains import LLMChain\n",
"from langchain.llms import OpenAI\n",
"from langchain.prompts import PromptTemplate\n",
@@ -298,10 +293,10 @@
" tags=[\"custom_metrics\"],\n",
" custom_metrics=rouge_score.compute_metric,\n",
")\n",
"manager = CallbackManager([StdOutCallbackHandler(), comet_callback])\n",
"llm = OpenAI(temperature=0.9, callback_manager=manager, verbose=True)\n",
"callbacks = [StdOutCallbackHandler(), comet_callback]\n",
"llm = OpenAI(temperature=0.9)\n",
"\n",
"synopsis_chain = LLMChain(llm=llm, prompt=prompt_template, callback_manager=manager)\n",
"synopsis_chain = LLMChain(llm=llm, prompt=prompt_template)\n",
"\n",
"test_prompts = [\n",
" {\n",
@@ -323,7 +318,7 @@
" \"\"\"\n",
" }\n",
"]\n",
"print(synopsis_chain.apply(test_prompts))\n",
"print(synopsis_chain.apply(test_prompts, callbacks=callbacks))\n",
"comet_callback.flush_tracker(synopsis_chain, finish=True)"
]
}

View File

@@ -0,0 +1,22 @@
# Confluence
>[Confluence](https://www.atlassian.com/software/confluence) is a wiki collaboration platform that saves and organizes all of the project-related material. `Confluence` is a knowledge base that primarily handles content management activities.
## Installation and Setup
```bash
pip install atlassian-python-api
```
We need to set up `username/api_key` or `Oauth2 login`.
See [instructions](https://support.atlassian.com/atlassian-account/docs/manage-api-tokens-for-your-atlassian-account/).
## Document Loader
See a [usage example](../modules/indexes/document_loaders/examples/confluence.ipynb).
```python
from langchain.document_loaders import ConfluenceLoader
```

View File

@@ -0,0 +1,57 @@
# C Transformers
This page covers how to use the [C Transformers](https://github.com/marella/ctransformers) library within LangChain.
It is broken into two parts: installation and setup, and then references to specific C Transformers wrappers.
## Installation and Setup
- Install the Python package with `pip install ctransformers`
- Download a supported [GGML model](https://huggingface.co/TheBloke) (see [Supported Models](https://github.com/marella/ctransformers#supported-models))
## Wrappers
### LLM
There exists a CTransformers LLM wrapper, which you can access with:
```python
from langchain.llms import CTransformers
```
It provides a unified interface for all models:
```python
llm = CTransformers(model='/path/to/ggml-gpt-2.bin', model_type='gpt2')
print(llm('AI is going to'))
```
If you are getting `illegal instruction` error, try using `lib='avx'` or `lib='basic'`:
```py
llm = CTransformers(model='/path/to/ggml-gpt-2.bin', model_type='gpt2', lib='avx')
```
It can be used with models hosted on the Hugging Face Hub:
```py
llm = CTransformers(model='marella/gpt-2-ggml')
```
If a model repo has multiple model files (`.bin` files), specify a model file using:
```py
llm = CTransformers(model='marella/gpt-2-ggml', model_file='ggml-model.bin')
```
Additional parameters can be passed using the `config` parameter:
```py
config = {'max_new_tokens': 256, 'repetition_penalty': 1.1}
llm = CTransformers(model='marella/gpt-2-ggml', config=config)
```
See [Documentation](https://github.com/marella/ctransformers#config) for a list of available parameters.
For a more detailed walkthrough of this, see [this notebook](../modules/models/llms/integrations/ctransformers.ipynb).

View File

@@ -0,0 +1,17 @@
# Databerry
>[Databerry](https://databerry.ai) is an [open source](https://github.com/gmpetrov/databerry) document retrieval platform that helps to connect your personal data with Large Language Models.
## Installation and Setup
We need to sign up for Databerry, create a datastore, add some data and get your datastore api endpoint url.
We need the [API Key](https://docs.databerry.ai/api-reference/authentication).
## Retriever
See a [usage example](../modules/indexes/retrievers/examples/databerry.ipynb).
```python
from langchain.retrievers import DataberryRetriever
```

View File

@@ -0,0 +1,36 @@
Databricks
==========
The [Databricks](https://www.databricks.com/) Lakehouse Platform unifies data, analytics, and AI on one platform.
Databricks embraces the LangChain ecosystem in various ways:
1. Databricks connector for the SQLDatabase Chain: SQLDatabase.from_databricks() provides an easy way to query your data on Databricks through LangChain
2. Databricks-managed MLflow integrates with LangChain: Tracking and serving LangChain applications with fewer steps
3. Databricks as an LLM provider: Deploy your fine-tuned LLMs on Databricks via serving endpoints or cluster driver proxy apps, and query it as langchain.llms.Databricks
4. Databricks Dolly: Databricks open-sourced Dolly which allows for commercial use, and can be accessed through the Hugging Face Hub
Databricks connector for the SQLDatabase Chain
----------------------------------------------
You can connect to [Databricks runtimes](https://docs.databricks.com/runtime/index.html) and [Databricks SQL](https://www.databricks.com/product/databricks-sql) using the SQLDatabase wrapper of LangChain. See the notebook [Connect to Databricks](./databricks/databricks.html) for details.
Databricks-managed MLflow integrates with LangChain
---------------------------------------------------
MLflow is an open source platform to manage the ML lifecycle, including experimentation, reproducibility, deployment, and a central model registry. See the notebook [MLflow Callback Handler](./mlflow_tracking.ipynb) for details about MLflow's integration with LangChain.
Databricks provides a fully managed and hosted version of MLflow integrated with enterprise security features, high availability, and other Databricks workspace features such as experiment and run management and notebook revision capture. MLflow on Databricks offers an integrated experience for tracking and securing machine learning model training runs and running machine learning projects. See [MLflow guide](https://docs.databricks.com/mlflow/index.html) for more details.
Databricks-managed MLflow makes it more convenient to develop LangChain applications on Databricks. For MLflow tracking, you don't need to set the tracking uri. For MLflow Model Serving, you can save LangChain Chains in the MLflow langchain flavor, and then register and serve the Chain with a few clicks on Databricks, with credentials securely managed by MLflow Model Serving.
Databricks as an LLM provider
-----------------------------
The notebook [Wrap Databricks endpoints as LLMs](../modules/models/llms/integrations/databricks.html) illustrates the method to wrap Databricks endpoints as LLMs in LangChain. It supports two types of endpoints: the serving endpoint, which is recommended for both production and development, and the cluster driver proxy app, which is recommended for interactive development.
Databricks endpoints support Dolly, but are also great for hosting models like MPT-7B or any other models from the Hugging Face ecosystem. Databricks endpoints can also be used with proprietary models like OpenAI to provide a governance layer for enterprises.
Databricks Dolly
----------------
Databricks Dolly is an instruction-following large language model trained on the Databricks machine learning platform that is licensed for commercial use. The model is available on Hugging Face Hub as databricks/dolly-v2-12b. See the notebook [Hugging Face Hub](../modules/models/llms/integrations/huggingface_hub.html) for instructions to access it through the Hugging Face Hub integration with LangChain.

View File

@@ -0,0 +1,280 @@
{
"cells": [
{
"cell_type": "markdown",
"source": [
"# Databricks\n",
"\n",
"This notebook covers how to connect to the [Databricks runtimes](https://docs.databricks.com/runtime/index.html) and [Databricks SQL](https://www.databricks.com/product/databricks-sql) using the SQLDatabase wrapper of LangChain.\n",
"It is broken into 3 parts: installation and setup, connecting to Databricks, and examples."
],
"metadata": {
"collapsed": false
}
},
{
"cell_type": "markdown",
"source": [
"## Installation and Setup"
],
"metadata": {
"collapsed": false
}
},
{
"cell_type": "code",
"execution_count": 1,
"outputs": [],
"source": [
"!pip install databricks-sql-connector"
],
"metadata": {
"collapsed": false
}
},
{
"cell_type": "markdown",
"source": [
"## Connecting to Databricks\n",
"\n",
"You can connect to [Databricks runtimes](https://docs.databricks.com/runtime/index.html) and [Databricks SQL](https://www.databricks.com/product/databricks-sql) using the `SQLDatabase.from_databricks()` method.\n",
"\n",
"### Syntax\n",
"```python\n",
"SQLDatabase.from_databricks(\n",
" catalog: str,\n",
" schema: str,\n",
" host: Optional[str] = None,\n",
" api_token: Optional[str] = None,\n",
" warehouse_id: Optional[str] = None,\n",
" cluster_id: Optional[str] = None,\n",
" engine_args: Optional[dict] = None,\n",
" **kwargs: Any)\n",
"```\n",
"### Required Parameters\n",
"* `catalog`: The catalog name in the Databricks database.\n",
"* `schema`: The schema name in the catalog.\n",
"\n",
"### Optional Parameters\n",
"There following parameters are optional. When executing the method in a Databricks notebook, you don't need to provide them in most of the cases.\n",
"* `host`: The Databricks workspace hostname, excluding 'https://' part. Defaults to 'DATABRICKS_HOST' environment variable or current workspace if in a Databricks notebook.\n",
"* `api_token`: The Databricks personal access token for accessing the Databricks SQL warehouse or the cluster. Defaults to 'DATABRICKS_TOKEN' environment variable or a temporary one is generated if in a Databricks notebook.\n",
"* `warehouse_id`: The warehouse ID in the Databricks SQL.\n",
"* `cluster_id`: The cluster ID in the Databricks Runtime. If running in a Databricks notebook and both 'warehouse_id' and 'cluster_id' are None, it uses the ID of the cluster the notebook is attached to.\n",
"* `engine_args`: The arguments to be used when connecting Databricks.\n",
"* `**kwargs`: Additional keyword arguments for the `SQLDatabase.from_uri` method."
],
"metadata": {
"collapsed": false
}
},
{
"cell_type": "markdown",
"source": [
"## Examples"
],
"metadata": {
"collapsed": false
}
},
{
"cell_type": "code",
"execution_count": 2,
"outputs": [],
"source": [
"# Connecting to Databricks with SQLDatabase wrapper\n",
"from langchain import SQLDatabase\n",
"\n",
"db = SQLDatabase.from_databricks(catalog='samples', schema='nyctaxi')"
],
"metadata": {
"collapsed": false
}
},
{
"cell_type": "code",
"execution_count": 3,
"outputs": [],
"source": [
"# Creating a OpenAI Chat LLM wrapper\n",
"from langchain.chat_models import ChatOpenAI\n",
"\n",
"llm = ChatOpenAI(temperature=0, model_name=\"gpt-4\")"
],
"metadata": {
"collapsed": false
}
},
{
"cell_type": "markdown",
"source": [
"### SQL Chain example\n",
"\n",
"This example demonstrates the use of the [SQL Chain](https://python.langchain.com/en/latest/modules/chains/examples/sqlite.html) for answering a question over a Databricks database."
],
"metadata": {
"collapsed": false
}
},
{
"cell_type": "code",
"execution_count": 4,
"id": "36f2270b",
"metadata": {},
"outputs": [],
"source": [
"from langchain import SQLDatabaseChain\n",
"\n",
"db_chain = SQLDatabaseChain.from_llm(llm, db, verbose=True)"
]
},
{
"cell_type": "code",
"execution_count": 5,
"id": "4e2b5f25",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"\n",
"\u001B[1m> Entering new SQLDatabaseChain chain...\u001B[0m\n",
"What is the average duration of taxi rides that start between midnight and 6am?\n",
"SQLQuery:\u001B[32;1m\u001B[1;3mSELECT AVG(UNIX_TIMESTAMP(tpep_dropoff_datetime) - UNIX_TIMESTAMP(tpep_pickup_datetime)) as avg_duration\n",
"FROM trips\n",
"WHERE HOUR(tpep_pickup_datetime) >= 0 AND HOUR(tpep_pickup_datetime) < 6\u001B[0m\n",
"SQLResult: \u001B[33;1m\u001B[1;3m[(987.8122786304605,)]\u001B[0m\n",
"Answer:\u001B[32;1m\u001B[1;3mThe average duration of taxi rides that start between midnight and 6am is 987.81 seconds.\u001B[0m\n",
"\u001B[1m> Finished chain.\u001B[0m\n"
]
},
{
"data": {
"text/plain": "'The average duration of taxi rides that start between midnight and 6am is 987.81 seconds.'"
},
"execution_count": 6,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"db_chain.run(\"What is the average duration of taxi rides that start between midnight and 6am?\")"
]
},
{
"cell_type": "markdown",
"source": [
"### SQL Database Agent example\n",
"\n",
"This example demonstrates the use of the [SQL Database Agent](https://python.langchain.com/en/latest/modules/agents/toolkits/examples/sql_database.html) for answering questions over a Databricks database."
],
"metadata": {
"collapsed": false
}
},
{
"cell_type": "code",
"execution_count": 7,
"id": "9918e86a",
"metadata": {},
"outputs": [],
"source": [
"from langchain.agents import create_sql_agent\n",
"from langchain.agents.agent_toolkits import SQLDatabaseToolkit\n",
"\n",
"toolkit = SQLDatabaseToolkit(db=db, llm=llm)\n",
"agent = create_sql_agent(\n",
" llm=llm,\n",
" toolkit=toolkit,\n",
" verbose=True\n",
")"
]
},
{
"cell_type": "code",
"execution_count": 8,
"id": "c484a76e",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"\n",
"\u001B[1m> Entering new AgentExecutor chain...\u001B[0m\n",
"\u001B[32;1m\u001B[1;3mAction: list_tables_sql_db\n",
"Action Input: \u001B[0m\n",
"Observation: \u001B[38;5;200m\u001B[1;3mtrips\u001B[0m\n",
"Thought:\u001B[32;1m\u001B[1;3mI should check the schema of the trips table to see if it has the necessary columns for trip distance and duration.\n",
"Action: schema_sql_db\n",
"Action Input: trips\u001B[0m\n",
"Observation: \u001B[33;1m\u001B[1;3m\n",
"CREATE TABLE trips (\n",
"\ttpep_pickup_datetime TIMESTAMP, \n",
"\ttpep_dropoff_datetime TIMESTAMP, \n",
"\ttrip_distance FLOAT, \n",
"\tfare_amount FLOAT, \n",
"\tpickup_zip INT, \n",
"\tdropoff_zip INT\n",
") USING DELTA\n",
"\n",
"/*\n",
"3 rows from trips table:\n",
"tpep_pickup_datetime\ttpep_dropoff_datetime\ttrip_distance\tfare_amount\tpickup_zip\tdropoff_zip\n",
"2016-02-14 16:52:13+00:00\t2016-02-14 17:16:04+00:00\t4.94\t19.0\t10282\t10171\n",
"2016-02-04 18:44:19+00:00\t2016-02-04 18:46:00+00:00\t0.28\t3.5\t10110\t10110\n",
"2016-02-17 17:13:57+00:00\t2016-02-17 17:17:55+00:00\t0.7\t5.0\t10103\t10023\n",
"*/\u001B[0m\n",
"Thought:\u001B[32;1m\u001B[1;3mThe trips table has the necessary columns for trip distance and duration. I will write a query to find the longest trip distance and its duration.\n",
"Action: query_checker_sql_db\n",
"Action Input: SELECT trip_distance, tpep_dropoff_datetime - tpep_pickup_datetime as duration FROM trips ORDER BY trip_distance DESC LIMIT 1\u001B[0m\n",
"Observation: \u001B[31;1m\u001B[1;3mSELECT trip_distance, tpep_dropoff_datetime - tpep_pickup_datetime as duration FROM trips ORDER BY trip_distance DESC LIMIT 1\u001B[0m\n",
"Thought:\u001B[32;1m\u001B[1;3mThe query is correct. I will now execute it to find the longest trip distance and its duration.\n",
"Action: query_sql_db\n",
"Action Input: SELECT trip_distance, tpep_dropoff_datetime - tpep_pickup_datetime as duration FROM trips ORDER BY trip_distance DESC LIMIT 1\u001B[0m\n",
"Observation: \u001B[36;1m\u001B[1;3m[(30.6, '0 00:43:31.000000000')]\u001B[0m\n",
"Thought:\u001B[32;1m\u001B[1;3mI now know the final answer.\n",
"Final Answer: The longest trip distance is 30.6 miles and it took 43 minutes and 31 seconds.\u001B[0m\n",
"\n",
"\u001B[1m> Finished chain.\u001B[0m\n"
]
},
{
"data": {
"text/plain": "'The longest trip distance is 30.6 miles and it took 43 minutes and 31 seconds.'"
},
"execution_count": 9,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"agent.run(\"What is the longest trip distance and how long did it take?\")"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.3"
}
},
"nbformat": 4,
"nbformat_minor": 5
}

View File

@@ -7,6 +7,14 @@ It is broken into two parts: installation and setup, and then references to spec
- Get your DeepInfra api key from this link [here](https://deepinfra.com/).
- Get an DeepInfra api key and set it as an environment variable (`DEEPINFRA_API_TOKEN`)
## Available Models
DeepInfra provides a range of Open Source LLMs ready for deployment.
You can list supported models [here](https://deepinfra.com/models?type=text-generation).
google/flan\* models can be viewed [here](https://deepinfra.com/models?type=text2text-generation).
You can view a list of request and response parameters [here](https://deepinfra.com/databricks/dolly-v2-12b#API)
## Wrappers
### LLM

View File

@@ -0,0 +1,18 @@
# Diffbot
>[Diffbot](https://docs.diffbot.com/docs) is a service to read web pages. Unlike traditional web scraping tools,
> `Diffbot` doesn't require any rules to read the content on a page.
>It starts with computer vision, which classifies a page into one of 20 possible types. Content is then interpreted by a machine learning model trained to identify the key attributes on a page based on its type.
>The result is a website transformed into clean-structured data (like JSON or CSV), ready for your application.
## Installation and Setup
Read [instructions](https://docs.diffbot.com/reference/authentication) how to get the Diffbot API Token.
## Document Loader
See a [usage example](../modules/indexes/document_loaders/examples/diffbot.ipynb).
```python
from langchain.document_loaders import DiffbotLoader
```

View File

@@ -0,0 +1,30 @@
# Discord
>[Discord](https://discord.com/) is a VoIP and instant messaging social platform. Users have the ability to communicate
> with voice calls, video calls, text messaging, media and files in private chats or as part of communities called
> "servers". A server is a collection of persistent chat rooms and voice channels which can be accessed via invite links.
## Installation and Setup
```bash
pip install pandas
```
Follow these steps to download your `Discord` data:
1. Go to your **User Settings**
2. Then go to **Privacy and Safety**
3. Head over to the **Request all of my Data** and click on **Request Data** button
It might take 30 days for you to receive your data. You'll receive an email at the address which is registered
with Discord. That email will have a download button using which you would be able to download your personal Discord data.
## Document Loader
See a [usage example](../modules/indexes/document_loaders/examples/discord.ipynb).
```python
from langchain.document_loaders import DiscordChatLoader
```

View File

@@ -0,0 +1,20 @@
# Docugami
>[Docugami](https://docugami.com) converts business documents into a Document XML Knowledge Graph, generating forests
> of XML semantic trees representing entire documents. This is a rich representation that includes the semantic and
> structural characteristics of various chunks in the document as an XML tree.
## Installation and Setup
```bash
pip install lxml
```
## Document Loader
See a [usage example](../modules/indexes/document_loaders/examples/docugami.ipynb).
```python
from langchain.document_loaders import DocugamiLoader
```

View File

@@ -0,0 +1,19 @@
# DuckDB
>[DuckDB](https://duckdb.org/) is an in-process SQL OLAP database management system.
## Installation and Setup
First, you need to install `duckdb` python package.
```bash
pip install duckdb
```
## Document Loader
See a [usage example](../modules/indexes/document_loaders/examples/duckdb.ipynb).
```python
from langchain.document_loaders import DuckDBLoader
```

View File

@@ -0,0 +1,24 @@
# Elasticsearch
>[Elasticsearch](https://www.elastic.co/elasticsearch/) is a distributed, RESTful search and analytics engine.
> It provides a distributed, multi-tenant-capable full-text search engine with an HTTP web interface and schema-free
> JSON documents.
## Installation and Setup
```bash
pip install elasticsearch
```
## Retriever
>In information retrieval, [Okapi BM25](https://en.wikipedia.org/wiki/Okapi_BM25) (BM is an abbreviation of best matching) is a ranking function used by search engines to estimate the relevance of documents to a given search query. It is based on the probabilistic retrieval framework developed in the 1970s and 1980s by Stephen E. Robertson, Karen Spärck Jones, and others.
>The name of the actual ranking function is BM25. The fuller name, Okapi BM25, includes the name of the first system to use it, which was the Okapi information retrieval system, implemented at London's City University in the 1980s and 1990s. BM25 and its newer variants, e.g. BM25F (a version of BM25 that can take document structure and anchor text into account), represent TF-IDF-like retrieval functions used in document retrieval.
See a [usage example](../modules/indexes/retrievers/examples/elastic_search_bm25.ipynb).
```python
from langchain.retrievers import ElasticSearchBM25Retriever
```

View File

@@ -0,0 +1,20 @@
# EverNote
>[EverNote](https://evernote.com/) is intended for archiving and creating notes in which photos, audio and saved web content can be embedded. Notes are stored in virtual "notebooks" and can be tagged, annotated, edited, searched, and exported.
## Installation and Setup
First, you need to install `lxml` and `html2text` python packages.
```bash
pip install lxml
pip install html2text
```
## Document Loader
See a [usage example](../modules/indexes/document_loaders/examples/evernote.ipynb).
```python
from langchain.document_loaders import EverNoteLoader
```

View File

@@ -0,0 +1,21 @@
# Facebook Chat
>[Messenger](https://en.wikipedia.org/wiki/Messenger_(software)) is an American proprietary instant messaging app and
> platform developed by `Meta Platforms`. Originally developed as `Facebook Chat` in 2008, the company revamped its
> messaging service in 2010.
## Installation and Setup
First, you need to install `pandas` python package.
```bash
pip install pandas
```
## Document Loader
See a [usage example](../modules/indexes/document_loaders/examples/facebook_chat.ipynb).
```python
from langchain.document_loaders import FacebookChatLoader
```

View File

@@ -0,0 +1,21 @@
# Figma
>[Figma](https://www.figma.com/) is a collaborative web application for interface design.
## Installation and Setup
The Figma API requires an `access token`, `node_ids`, and a `file key`.
The `file key` can be pulled from the URL. https://www.figma.com/file/{filekey}/sampleFilename
`Node IDs` are also available in the URL. Click on anything and look for the '?node-id={node_id}' param.
`Access token` [instructions](https://help.figma.com/hc/en-us/articles/8085703771159-Manage-personal-access-tokens).
## Document Loader
See a [usage example](../modules/indexes/document_loaders/examples/figma.ipynb).
```python
from langchain.document_loaders import FigmaFileLoader
```

19
docs/integrations/git.md Normal file
View File

@@ -0,0 +1,19 @@
# Git
>[Git](https://en.wikipedia.org/wiki/Git) is a distributed version control system that tracks changes in any set of computer files, usually used for coordinating work among programmers collaboratively developing source code during software development.
## Installation and Setup
First, you need to install `GitPython` python package.
```bash
pip install GitPython
```
## Document Loader
See a [usage example](../modules/indexes/document_loaders/examples/git.ipynb).
```python
from langchain.document_loaders import GitLoader
```

View File

@@ -0,0 +1,15 @@
# GitBook
>[GitBook](https://docs.gitbook.com/) is a modern documentation platform where teams can document everything from products to internal knowledge bases and APIs.
## Installation and Setup
There isn't any special setup for it.
## Document Loader
See a [usage example](../modules/indexes/document_loaders/examples/gitbook.ipynb).
```python
from langchain.document_loaders import GitbookLoader
```

View File

@@ -0,0 +1,20 @@
# Google BigQuery
>[Google BigQuery](https://cloud.google.com/bigquery) is a serverless and cost-effective enterprise data warehouse that works across clouds and scales with your data.
`BigQuery` is a part of the `Google Cloud Platform`.
## Installation and Setup
First, you need to install `google-cloud-bigquery` python package.
```bash
pip install google-cloud-bigquery
```
## Document Loader
See a [usage example](../modules/indexes/document_loaders/examples/google_bigquery.ipynb).
```python
from langchain.document_loaders import BigQueryLoader
```

View File

@@ -0,0 +1,26 @@
# Google Cloud Storage
>[Google Cloud Storage](https://en.wikipedia.org/wiki/Google_Cloud_Storage) is a managed service for storing unstructured data.
## Installation and Setup
First, you need to install `google-cloud-bigquery` python package.
```bash
pip install google-cloud-storage
```
## Document Loader
There are two loaders for the `Google Cloud Storage`: the `Directory` and the `File` loaders.
See a [usage example](../modules/indexes/document_loaders/examples/google_cloud_storage_directory.ipynb).
```python
from langchain.document_loaders import GCSDirectoryLoader
```
See a [usage example](../modules/indexes/document_loaders/examples/google_cloud_storage_file.ipynb).
```python
from langchain.document_loaders import GCSFileLoader
```

View File

@@ -0,0 +1,22 @@
# Google Drive
>[Google Drive](https://en.wikipedia.org/wiki/Google_Drive) is a file storage and synchronization service developed by Google.
Currently, only `Google Docs` are supported.
## Installation and Setup
First, you need to install several python package.
```bash
pip install google-api-python-client google-auth-httplib2 google-auth-oauthlib
```
## Document Loader
See a [usage example and authorizing instructions](../modules/indexes/document_loaders/examples/google_drive.ipynb).
```python
from langchain.document_loaders import GoogleDriveLoader
```

View File

@@ -1,4 +1,4 @@
# Google Search Wrapper
# Google Search
This page covers how to use the Google Search API within LangChain.
It is broken into two parts: installation and setup, and then references to the specific Google Search wrapper.

View File

@@ -1,4 +1,4 @@
# Google Serper Wrapper
# Google Serper
This page covers how to use the [Serper](https://serper.dev) Google Search API within LangChain. Serper is a low-cost Google Search API that can be used to add answer box, knowledge graph, and organic results data from Google Search.
It is broken into two parts: setup, and then references to the specific Google Serper wrapper.

View File

@@ -0,0 +1,24 @@
# Google Vertex AI
>[Vertex AI](https://cloud.google.com/vertex-ai/docs/start/introduction-unified-platform) is a machine learning (ML)
> platform that lets you train and deploy ML models and AI applications.
> `Vertex AI` combines data engineering, data science, and ML engineering workflows, enabling your teams to
> collaborate using a common toolset.
## Installation and Setup
```bash
pip install google-cloud-aiplatform
```
See the [setup instructions](../modules/models/chat/integrations/google_vertex_ai_palm.ipynb)
## Chat Models
See a [usage example](../modules/models/chat/integrations/google_vertex_ai_palm.ipynb)
```python
from langchain.chat_models import ChatVertexAI
```

View File

@@ -3,6 +3,7 @@
This page covers how to use the `GPT4All` wrapper within LangChain. The tutorial is divided into two parts: installation and setup, followed by usage with an example.
## Installation and Setup
- Install the Python package with `pip install pyllamacpp`
- Download a [GPT4All model](https://github.com/nomic-ai/pyllamacpp#supported-model) and place it in your desired directory
@@ -28,16 +29,16 @@ To stream the model's predictions, add in a CallbackManager.
```python
from langchain.llms import GPT4All
from langchain.callbacks.base import CallbackManager
from langchain.callbacks.streaming_stdout import StreamingStdOutCallbackHandler
# There are many CallbackHandlers supported, such as
# from langchain.callbacks.streamlit import StreamlitCallbackHandler
callback_manager = CallbackManager([StreamingStdOutCallbackHandler()])
model = GPT4All(model="./models/gpt4all-model.bin", n_ctx=512, n_threads=8, callback_handler=callback_handler, verbose=True)
callbacks = [StreamingStdOutCallbackHandler()]
model = GPT4All(model="./models/gpt4all-model.bin", n_ctx=512, n_threads=8)
# Generate text. Tokens are streamed through the callback manager.
model("Once upon a time, ")
model("Once upon a time, ", callbacks=callbacks)
```
## Model File

View File

@@ -0,0 +1,15 @@
# Gutenberg
>[Project Gutenberg](https://www.gutenberg.org/about/) is an online library of free eBooks.
## Installation and Setup
There isn't any special setup for it.
## Document Loader
See a [usage example](../modules/indexes/document_loaders/examples/gutenberg.ipynb).
```python
from langchain.document_loaders import GutenbergLoader
```

Some files were not shown because too many files have changed in this diff Show More