Compare commits

..

1273 Commits

Author SHA1 Message Date
Eugene Yurtsev
ff1de3d347 x 2023-12-13 16:47:03 -05:00
Bagatur
b72b19b593 experimental[patch]: Release 0.0.47 (#14617) 2023-12-12 11:09:39 -08:00
William FH
c32554a3e0 Add image (#14611) 2023-12-12 10:55:42 -08:00
Bagatur
57337b4862 langchain[patch]: Release 0.0.350 (#14612) 2023-12-12 10:10:34 -08:00
Bagatur
d388863a3b community[patch]: Release 0.0.2 (#14610) 2023-12-12 09:58:04 -08:00
Bagatur
5d1deddbfb core[minor]: Release 0.1.0 (#14607) 2023-12-12 09:33:11 -08:00
Harrison Chase
ad8d8f71aa allow other namespaces (#14606) 2023-12-12 09:09:59 -08:00
William FH
ce61a8ca98 Add Gmail Agent Example (#14567)
Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
2023-12-12 08:34:28 -08:00
Eugene Yurtsev
76905aa043 Update RunnableWithMessageHistory (#14351)
This PR updates RunnableWithMessage history to support user specific
configuration for the factory.

It extends support to passing multiple named arguments into the factory
if the factory takes more than a single argument.
2023-12-11 21:34:49 -05:00
Bagatur
8a126c5d04 docs[patch]: update installation with core and community (#14577) 2023-12-11 18:31:25 -08:00
Harutaka Kawamura
b54a1a3ef1 docs[patch]: Fix embeddings example for Databricks (#14576)
<!-- Thank you for contributing to LangChain!

Replace this entire comment with:
  - **Description:** a description of the change, 
  - **Issue:** the issue # it fixes (if applicable),
  - **Dependencies:** any dependencies required for this change,
- **Tag maintainer:** for a quicker response, tag the relevant
maintainer (see below),
- **Twitter handle:** we announce bigger features on Twitter. If your PR
gets announced, and you'd like a mention, we'll gladly shout you out!

Please make sure your PR is passing linting and testing before
submitting. Run `make format`, `make lint` and `make test` to check this
locally.

See contribution guidelines for more information on how to write/run
tests, lint, etc:

https://github.com/langchain-ai/langchain/blob/master/.github/CONTRIBUTING.md

If you're adding a new integration, please include:
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in `docs/extras`
directory.

If no one reviews your PR within a few days, please @-mention one of
@baskaryan, @eyurtsev, @hwchase17.
 -->

Fix `from langchain.llms import DatabricksEmbeddings` to `from
langchain.embeddings import DatabricksEmbeddings`.

Signed-off-by: harupy <17039389+harupy@users.noreply.github.com>
2023-12-11 16:55:23 -08:00
Bagatur
9ffca3b92a docs[patch], templates[patch]: Import from core (#14575)
Update imports to use core for the low-hanging fruit changes. Ran
following

```bash
git grep -l 'langchain.schema.runnable' {docs,templates,cookbook}  | xargs sed -i '' 's/langchain\.schema\.runnable/langchain_core.runnables/g'
git grep -l 'langchain.schema.output_parser' {docs,templates,cookbook} | xargs sed -i '' 's/langchain\.schema\.output_parser/langchain_core.output_parsers/g'
git grep -l 'langchain.schema.messages' {docs,templates,cookbook} | xargs sed -i '' 's/langchain\.schema\.messages/langchain_core.messages/g'
git grep -l 'langchain.schema.chat_histry' {docs,templates,cookbook} | xargs sed -i '' 's/langchain\.schema\.chat_history/langchain_core.chat_history/g'
git grep -l 'langchain.schema.prompt_template' {docs,templates,cookbook} | xargs sed -i '' 's/langchain\.schema\.prompt_template/langchain_core.prompts/g'
git grep -l 'from langchain.pydantic_v1' {docs,templates,cookbook} | xargs sed -i '' 's/from langchain\.pydantic_v1/from langchain_core.pydantic_v1/g'
git grep -l 'from langchain.tools.base' {docs,templates,cookbook} | xargs sed -i '' 's/from langchain\.tools\.base/from langchain_core.tools/g'
git grep -l 'from langchain.chat_models.base' {docs,templates,cookbook} | xargs sed -i '' 's/from langchain\.chat_models.base/from langchain_core.language_models.chat_models/g'
git grep -l 'from langchain.llms.base' {docs,templates,cookbook} | xargs sed -i '' 's/from langchain\.llms\.base\ /from langchain_core.language_models.llms\ /g'
git grep -l 'from langchain.embeddings.base' {docs,templates,cookbook} | xargs sed -i '' 's/from langchain\.embeddings\.base/from langchain_core.embeddings/g'
git grep -l 'from langchain.vectorstores.base' {docs,templates,cookbook} | xargs sed -i '' 's/from langchain\.vectorstores\.base/from langchain_core.vectorstores/g'
git grep -l 'from langchain.agents.tools' {docs,templates,cookbook} | xargs sed -i '' 's/from langchain\.agents\.tools/from langchain_core.tools/g'
git grep -l 'from langchain.schema.output' {docs,templates,cookbook} | xargs sed -i '' 's/from langchain\.schema\.output\ /from langchain_core.outputs\ /g'
git grep -l 'from langchain.schema.embeddings' {docs,templates,cookbook} | xargs sed -i '' 's/from langchain\.schema\.embeddings/from langchain_core.embeddings/g'
git grep -l 'from langchain.schema.document' {docs,templates,cookbook} | xargs sed -i '' 's/from langchain\.schema\.document/from langchain_core.documents/g'
git grep -l 'from langchain.schema.agent' {docs,templates,cookbook} | xargs sed -i '' 's/from langchain\.schema\.agent/from langchain_core.agents/g'
git grep -l 'from langchain.schema.prompt ' {docs,templates,cookbook} | xargs sed -i '' 's/from langchain\.schema\.prompt\ /from langchain_core.prompt_values /g'
git grep -l 'from langchain.schema.language_model' {docs,templates,cookbook} | xargs sed -i '' 's/from langchain\.schema\.language_model/from langchain_core.language_models/g'


```
2023-12-11 16:49:10 -08:00
Erick Friis
0a9d933bb2 infra: import checking bugfix (#14569) 2023-12-11 15:53:51 -08:00
Bagatur
8bdaf55e92 experimental[patch]: Release 0.0.46 (#14572) 2023-12-11 15:46:14 -08:00
Bagatur
14bfc5f9f4 langchain[patch]: Release 0.0.349 (#14570) 2023-12-11 15:30:14 -08:00
Erick Friis
482e2b94fa infra: import CI speed (#14566)
Was taking 10 mins. Now a few seconds.
2023-12-11 15:19:21 -08:00
Bagatur
6a828e60ee community[patch]: Release 0.0.1 (#14565) 2023-12-11 15:18:55 -08:00
Erick Friis
5418d8bfd6 infra: import CI fix (#14562)
TIL `**` globstar doesn't work in make

Makefile changes fix that.

`__getattr__` changes allow import of all files, but raise error when
accessing anything from the module.

file deletions were corresponding libs change from #14559
2023-12-11 14:59:10 -08:00
Bagatur
48b7a0584d infra: Turn release branch check back on (#14563) 2023-12-11 14:40:24 -08:00
Bagatur
9cb128e6e2 core[patch]: Release 0.0.13 (#14558) 2023-12-11 14:36:28 -08:00
Bagatur
a844b495c4 community[patch]: Fix agenttoolkits imports (#14559) 2023-12-11 14:19:25 -08:00
Nuno Campos
3b5b0f16c6 Move runnable context to beta (#14507)
<!-- Thank you for contributing to LangChain!

Replace this entire comment with:
  - **Description:** a description of the change, 
  - **Issue:** the issue # it fixes (if applicable),
  - **Dependencies:** any dependencies required for this change,
- **Tag maintainer:** for a quicker response, tag the relevant
maintainer (see below),
- **Twitter handle:** we announce bigger features on Twitter. If your PR
gets announced, and you'd like a mention, we'll gladly shout you out!

Please make sure your PR is passing linting and testing before
submitting. Run `make format`, `make lint` and `make test` to check this
locally.

See contribution guidelines for more information on how to write/run
tests, lint, etc:

https://github.com/langchain-ai/langchain/blob/master/.github/CONTRIBUTING.md

If you're adding a new integration, please include:
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in `docs/extras`
directory.

If no one reviews your PR within a few days, please @-mention one of
@baskaryan, @eyurtsev, @hwchase17.
 -->
2023-12-11 13:58:30 -08:00
Bagatur
ed58eeb9c5 community[major], core[patch], langchain[patch], experimental[patch]: Create langchain-community (#14463)
Moved the following modules to new package langchain-community in a backwards compatible fashion:

```
mv langchain/langchain/adapters community/langchain_community
mv langchain/langchain/callbacks community/langchain_community/callbacks
mv langchain/langchain/chat_loaders community/langchain_community
mv langchain/langchain/chat_models community/langchain_community
mv langchain/langchain/document_loaders community/langchain_community
mv langchain/langchain/docstore community/langchain_community
mv langchain/langchain/document_transformers community/langchain_community
mv langchain/langchain/embeddings community/langchain_community
mv langchain/langchain/graphs community/langchain_community
mv langchain/langchain/llms community/langchain_community
mv langchain/langchain/memory/chat_message_histories community/langchain_community
mv langchain/langchain/retrievers community/langchain_community
mv langchain/langchain/storage community/langchain_community
mv langchain/langchain/tools community/langchain_community
mv langchain/langchain/utilities community/langchain_community
mv langchain/langchain/vectorstores community/langchain_community
mv langchain/langchain/agents/agent_toolkits community/langchain_community
mv langchain/langchain/cache.py community/langchain_community
mv langchain/langchain/adapters community/langchain_community
mv langchain/langchain/callbacks community/langchain_community/callbacks
mv langchain/langchain/chat_loaders community/langchain_community
mv langchain/langchain/chat_models community/langchain_community
mv langchain/langchain/document_loaders community/langchain_community
mv langchain/langchain/docstore community/langchain_community
mv langchain/langchain/document_transformers community/langchain_community
mv langchain/langchain/embeddings community/langchain_community
mv langchain/langchain/graphs community/langchain_community
mv langchain/langchain/llms community/langchain_community
mv langchain/langchain/memory/chat_message_histories community/langchain_community
mv langchain/langchain/retrievers community/langchain_community
mv langchain/langchain/storage community/langchain_community
mv langchain/langchain/tools community/langchain_community
mv langchain/langchain/utilities community/langchain_community
mv langchain/langchain/vectorstores community/langchain_community
mv langchain/langchain/agents/agent_toolkits community/langchain_community
mv langchain/langchain/cache.py community/langchain_community
```

Moved the following to core
```
mv langchain/langchain/utils/json_schema.py core/langchain_core/utils
mv langchain/langchain/utils/html.py core/langchain_core/utils
mv langchain/langchain/utils/strings.py core/langchain_core/utils
cat langchain/langchain/utils/env.py >> core/langchain_core/utils/env.py
rm langchain/langchain/utils/env.py
```

See .scripts/community_split/script_integrations.sh for all changes
2023-12-11 13:53:30 -08:00
Eugene Yurtsev
c0f4b95aa9 RunnableWithMessageHistory: Fix input schema (#14516)
Input schema should not have history key
2023-12-10 23:33:02 -05:00
Leonid Ganeline
d9bfdc95ea docs[patch]: google platform page update (#14475)
Added missed tools

---------

Co-authored-by: Erick Friis <erickfriis@gmail.com>
2023-12-08 17:40:44 -08:00
Leonid Ganeline
2fa81739b6 docs[patch]: microsoft platform page update (#14476)
Added `presidio` and `OneNote` references to `microsoft.mdx`; added link
and description to the `presidio` notebook

---------

Co-authored-by: Erick Friis <erickfriis@gmail.com>
2023-12-08 17:40:30 -08:00
Yelin Zhang
84a57f5350 docs[patch]: add missing imports for local_llms (#14453)
<!-- Thank you for contributing to LangChain!

Replace this entire comment with:
  - **Description:** a description of the change, 
  - **Issue:** the issue # it fixes (if applicable),
  - **Dependencies:** any dependencies required for this change,
- **Tag maintainer:** for a quicker response, tag the relevant
maintainer (see below),
- **Twitter handle:** we announce bigger features on Twitter. If your PR
gets announced, and you'd like a mention, we'll gladly shout you out!

Please make sure your PR is passing linting and testing before
submitting. Run `make format`, `make lint` and `make test` to check this
locally.

See contribution guidelines for more information on how to write/run
tests, lint, etc:

https://github.com/langchain-ai/langchain/blob/master/.github/CONTRIBUTING.md

If you're adding a new integration, please include:
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in `docs/extras`
directory.

If no one reviews your PR within a few days, please @-mention one of
@baskaryan, @eyurtsev, @hwchase17.
 -->
Keeping it consistent with everywhere else in the docs and adding the
missing imports to be able to copy paste and run the code example.

---------

Co-authored-by: Erick Friis <erick@langchain.dev>
2023-12-08 17:23:29 -08:00
Harrison Chase
f5befe3b89 manual mapping (#14422) 2023-12-08 16:29:33 -08:00
Erick Friis
c24f277b7c langchain[patch], docs[patch]: use byte store in multivectorretriever (#14474) 2023-12-08 16:26:11 -08:00
Leonid Ganeline
1ef13661b9 docs[patch]: link and description cleanup (#14471)
Fixed inconsistencies; added links and descriptions

---------

Co-authored-by: Erick Friis <erickfriis@gmail.com>
2023-12-08 15:24:38 -08:00
Lance Martin
6fbfc375b9 Update README and vectorstore path for multi-modal template (#14473) 2023-12-08 15:24:05 -08:00
Anish Nag
6da0cfea0e experimental[patch]: SmartLLMChain Output Key Customization (#14466)
**Description**
The `SmartLLMChain` was was fixed to output key "resolution".
Unfortunately, this prevents the ability to use multiple `SmartLLMChain`
in a `SequentialChain` because of colliding output keys. This change
simply gives the option the customize the output key to allow for
sequential chaining. The default behavior is the same as the current
behavior.

Now, it's possible to do the following:
```
from langchain.chat_models import ChatOpenAI
from langchain.prompts import PromptTemplate
from langchain_experimental.smart_llm import SmartLLMChain
from langchain.chains import SequentialChain

joke_prompt = PromptTemplate(
    input_variables=["content"],
    template="Tell me a joke about {content}.",
)
review_prompt = PromptTemplate(
    input_variables=["scale", "joke"],
    template="Rate the following joke from 1 to {scale}: {joke}"
)

llm = ChatOpenAI(temperature=0.9, model_name="gpt-4-32k")
joke_chain = SmartLLMChain(llm=llm, prompt=joke_prompt, output_key="joke")
review_chain = SmartLLMChain(llm=llm, prompt=review_prompt, output_key="review")

chain = SequentialChain(
    chains=[joke_chain, review_chain],
    input_variables=["content", "scale"],
    output_variables=["review"],
    verbose=True
)
response = chain.run({"content": "chickens", "scale": "10"})
print(response)
```

---------

Co-authored-by: Erick Friis <erick@langchain.dev>
2023-12-08 13:55:51 -08:00
Leonid Ganeline
0797358c1b docs networkxupdate (#14426)
Added setting up instruction, package description and link
2023-12-08 13:39:50 -08:00
Bagatur
300305e5e5 infra: add langchain-community release workflow (#14469) 2023-12-08 13:31:15 -08:00
Ben Flast
b32fcb550d Update mongodb_atlas docs for GA (#14425)
Updated the MongoDB Atlas Vector Search docs to indicate the service is
Generally Available, updated the example to use the new index
definition, and added an example that uses metadata pre-filtering for
semantic search

---------

Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
2023-12-08 13:23:01 -08:00
Erick Friis
b3f226e8f8 core[patch], langchain[patch], experimental[patch]: import CI (#14414) 2023-12-08 11:28:55 -08:00
Leonid Ganeline
ba083887e5 docs Dependents updated statistics (#14461)
Updated statistics for the dependents (packages dependent on `langchain`
package). Only packages with 100+ starts
2023-12-08 14:14:41 -05:00
Eugene Yurtsev
37bee92b8a Use deepcopy in RunLogPatch (#14244)
This PR adds deepcopy usage in RunLogPatch.

I included a unit-test that shows an issue that was caused in LangServe
in the RemoteClient.

```python
import jsonpatch

s1 = {}
s2 = {'value': []}
s3 = {'value': ['a']}

ops0 = list(jsonpatch.JsonPatch.from_diff(None, s1))
ops1 = list(jsonpatch.JsonPatch.from_diff(s1, s2))
ops2 = list(jsonpatch.JsonPatch.from_diff(s2, s3))
ops = ops0 + ops1 + ops2

jsonpatch.apply_patch(None, ops)
{'value': ['a']}

jsonpatch.apply_patch(None, ops)
{'value': ['a', 'a']}

jsonpatch.apply_patch(None, ops)
{'value': ['a', 'a', 'a']}
```
2023-12-08 14:09:36 -05:00
Erick Friis
1d7e5c51aa langchain[patch]: xfail unstable vertex test (#14462) 2023-12-08 11:00:37 -08:00
Erick Friis
477b274a62 langchain[patch]: fix scheduled testing ci dep install (#14460) 2023-12-08 10:37:44 -08:00
Harrison Chase
02ee0073cf revoke serialization (#14456) 2023-12-08 10:31:05 -08:00
Erick Friis
ff0d5514c1 langchain[patch]: fix scheduled testing ci variables (#14459) 2023-12-08 10:27:21 -08:00
Erick Friis
1d725327eb langchain[patch]: Fix scheduled testing (#14428)
- integration tests in pyproject
- integration test fixes
2023-12-08 10:23:02 -08:00
Harrison Chase
7be3eb6fbd fix imports from core (#14430) 2023-12-08 09:33:35 -08:00
Leonid Ganeline
a05230a4ba docs[patch]: promptlayer pages update (#14416)
Updated provider page by adding LLM and ChatLLM references; removed a
content that is duplicate text from the LLM referenced page.
Updated the collback page
2023-12-07 15:48:10 -08:00
Leonid Ganeline
18aba7fdef docs: notebook linting (#14366)
Many jupyter notebooks didn't pass linting. List of these files are
presented in the [tool.ruff.lint.per-file-ignores] section of the
pyproject.toml . Addressed these bugs:
- fixed bugs; added missed imports; updated pyproject.toml
 Only the `document_loaders/tensorflow_datasets.ipyn`,
`cookbook/gymnasium_agent_simulation.ipynb` are not completely fixed.
I'm not sure about imports.

---------

Co-authored-by: Erick Friis <erick@langchain.dev>
2023-12-07 15:47:48 -08:00
Bagatur
52052cc7b9 experimental[patch]: Release 0.0.45 (#14418) 2023-12-07 15:01:39 -08:00
Bagatur
e4d6e55c5e langchain[patch]: Release 0.0.348 (#14417) 2023-12-07 14:52:43 -08:00
Bagatur
eb209e7ee3 core[patch]: Release 0.0.12 (#14415) 2023-12-07 14:37:00 -08:00
Bagatur
b2280fd874 core[patch], langchain[patch]: fix required deps (#14373) 2023-12-07 14:24:58 -08:00
Leonid Ganeline
7186faefb2 API Reference building script update (#13587)
The namespaces like `langchain.agents.format_scratchpad` clogging the
API Reference sidebar.
This change removes those 3-level namespaces from sidebar (this issue
was discussed with @efriis )

---------

Co-authored-by: Erick Friis <erick@langchain.dev>
2023-12-07 11:43:42 -08:00
Kacper Łukawski
76f30f5297 langchain[patch]: Rollback multiple keys in Qdrant (#14390)
This reverts commit 38813d7090. This is a
temporary fix, as I don't see a clear way on how to use multiple keys
with `Qdrant.from_texts`.

Context: #14378
2023-12-07 11:13:19 -08:00
Erick Friis
54040b00a4 langchain[patch]: fix ChatVertexAI streaming (#14369) 2023-12-07 09:46:11 -08:00
Bagatur
db6bf8b022 langchain[patch]: Release 0.0.347 (#14368) 2023-12-06 16:13:29 -08:00
Bagatur
a7271cf5bd core[patch]: Release 0.0.11 (#14367) 2023-12-06 15:53:49 -08:00
Nuno Campos
77c38df36c [core/minor] Runnables: Implement a context api (#14046)
<!-- Thank you for contributing to LangChain!

Replace this entire comment with:
  - **Description:** a description of the change, 
  - **Issue:** the issue # it fixes (if applicable),
  - **Dependencies:** any dependencies required for this change,
- **Tag maintainer:** for a quicker response, tag the relevant
maintainer (see below),
- **Twitter handle:** we announce bigger features on Twitter. If your PR
gets announced, and you'd like a mention, we'll gladly shout you out!

Please make sure your PR is passing linting and testing before
submitting. Run `make format`, `make lint` and `make test` to check this
locally.

See contribution guidelines for more information on how to write/run
tests, lint, etc:

https://github.com/langchain-ai/langchain/blob/master/.github/CONTRIBUTING.md

If you're adding a new integration, please include:
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in `docs/extras`
directory.

If no one reviews your PR within a few days, please @-mention one of
@baskaryan, @eyurtsev, @hwchase17.
 -->

---------

Co-authored-by: Brace Sproul <braceasproul@gmail.com>
2023-12-06 15:02:29 -08:00
Erick Friis
8f95a8206b core[patch]: message history error typo (#14361) 2023-12-06 14:20:10 -08:00
William FH
e5bd32ff6d Include run_id (#14331)
in the test run outputs
2023-12-06 14:07:45 -08:00
Bagatur
cc76f0e834 langchain[patch]: import nits (#14354)
import from core instead of langchain.schema
2023-12-06 11:45:05 -08:00
Bagatur
ce4d81f88b infra: ci matrix (#14306) 2023-12-06 11:43:03 -08:00
Jacob Lee
867ca6d0be Fix multi vector retriever subclassing (#14350)
Fixes #14342

@eyurtsev @baskaryan

---------

Co-authored-by: Erick Friis <erick@langchain.dev>
2023-12-06 11:12:50 -08:00
Erick Friis
7bdfc43766 core[patch], langchain[patch]: ByteStore (#14312) 2023-12-06 10:05:43 -08:00
Brace Sproul
b9087e765d docs[patch]: Fix broken link 'tip' in docs (#14349) 2023-12-06 09:44:54 -08:00
Eugene Yurtsev
0dea8cc62d Update doc-string in RunnableWithMessageHistory (#14262)
Update doc-string in RunnableWithMessageHistory
2023-12-06 12:31:46 -05:00
Erick Friis
2aaf8e11e0 docs[patch]: fix ipynb links (#14325)
Keeping it simple for now.

Still iterating on our docs build in pursuit of making everything mdxv2
compatible for docusaurus 3, and the fewer custom scripts we're reliant
on through that, the less likely the docs will break again.

Other things to consider in future:

Quarto rewriting in ipynbs:
https://quarto.org/docs/extensions/nbfilter.html (but this won't do
md/mdx files)

Docusaurus plugins for rewriting these paths
2023-12-06 09:29:07 -08:00
Jean-Baptiste dlb
38813d7090 Qdrant metadata payload keys (#13001)
- **Description:** In Qdrant allows to input list of keys as the
content_payload_key to retrieve multiple fields (the generated document
will contain the dictionary {field: value} in a string),
- **Issue:** Previously we were able to retrieve only one field from the
vector database when making a search
  - **Dependencies:** 
  - **Tag maintainer:** 
  - **Twitter handle:** @jb_dlb

---------

Co-authored-by: Jean Baptiste De La Broise <jeanbaptiste.delabroise@mdpi.com>
2023-12-06 09:12:54 -08:00
Yuchen Liang
ad6dfb6220 feat: mask api key for cerebriumai llm (#14272)
- **Description:** Masking API key for CerebriumAI LLM to protect user
secrets.
 - **Issue:** #12165 
 - **Dependencies:** None
 - **Tag maintainer:** @eyurtsev

---------

Signed-off-by: Yuchen Liang <yuchenl3@andrew.cmu.edu>
Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
2023-12-06 09:06:00 -08:00
newfinder
d4d64daa1e Mask API key for baidu qianfan (#14281)
Description: This PR masked baidu qianfan - Chat_Models API Key and
added unit tests.
Issue: the issue langchain-ai#12165.
Tag maintainer: @eyurtsev

---------

Co-authored-by: xiayi <xiayi@bytedance.com>
2023-12-06 08:47:09 -08:00
cxumol
06e3316f54 feat(add): LLM integration of Cloudflare Workers AI (#14322)
Add [Text Generation by Cloudflare Workers
AI](https://developers.cloudflare.com/workers-ai/models/text-generation/).
It's a new LLM integration.

- Dependencies: N/A
2023-12-06 08:24:19 -08:00
Harutaka Kawamura
5efaedf488 Exclude max_tokens from request if it's None (#14334)
<!-- Thank you for contributing to LangChain!

Replace this entire comment with:
  - **Description:** a description of the change, 
  - **Issue:** the issue # it fixes (if applicable),
  - **Dependencies:** any dependencies required for this change,
- **Tag maintainer:** for a quicker response, tag the relevant
maintainer (see below),
- **Twitter handle:** we announce bigger features on Twitter. If your PR
gets announced, and you'd like a mention, we'll gladly shout you out!

Please make sure your PR is passing linting and testing before
submitting. Run `make format`, `make lint` and `make test` to check this
locally.

See contribution guidelines for more information on how to write/run
tests, lint, etc:

https://github.com/langchain-ai/langchain/blob/master/.github/CONTRIBUTING.md

If you're adding a new integration, please include:
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in `docs/extras`
directory.

If no one reviews your PR within a few days, please @-mention one of
@baskaryan, @eyurtsev, @hwchase17.
 -->


We found a request with `max_tokens=None` results in the following error
in Anthropic:

```
HTTPError: 400 Client Error: Bad Request for url: https://oregon.staging.cloud.databricks.com/serving-endpoints/corey-anthropic/invocations. 
Response text: {"error_code":"INVALID_PARAMETER_VALUE","message":"INVALID_PARAMETER_VALUE: max_tokens was not of type Integer: null"}
```

This PR excludes `max_tokens` if it's None.
2023-12-06 08:23:17 -08:00
Nicolas Bondoux
86b08d7753 Fix typo in lcel example for rerank in doc (#14336)
fix typo in lcel example for rerank in doc
2023-12-06 08:21:41 -08:00
Matt Wells
e1ea191237 Demonstrate use of get_buffer_string (#13013)
**Description**

The docs for creating a RAG chain with Memory [currently use a manual
lambda](https://python.langchain.com/docs/expression_language/cookbook/retrieval#with-memory-and-returning-source-documents)
to format chat history messages. [There exists a helper method within
the
codebase](https://github.com/langchain-ai/langchain/blob/master/libs/langchain/langchain/schema/messages.py#L14C15-L14C15)
to perform this task so I've updated the documentation to demonstrate
its usage

Also worth noting that the current documented method of using the
included `_format_chat_history ` function actually results in an error:

```
TypeError: 'HumanMessage' object is not subscriptable
```

---------

Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
2023-12-05 20:08:50 -08:00
MinjiK
a1a11ffd78 Amadeus toolkit minor update (#13002)
- update `Amadeus` toolkit with ability to switch Amadeus environments 
- update minor code explanations

---------

Co-authored-by: MinjiK <minji.kim@amadeus.com>
2023-12-05 20:08:34 -08:00
Alexandre Dumont
b05c46074b OpenAIEmbeddings: retry_min_seconds/retry_max_seconds parameters (#13138)
- **Description:** new parameters in OpenAIEmbeddings() constructor
(retry_min_seconds and retry_max_seconds) that allow parametrization by
the user of the former min_seconds and max_seconds that were hidden in
_create_retry_decorator() and _async_retry_decorator()
  - **Issue:** #9298, #12986
  - **Dependencies:** none
  - **Tag maintainer:** @hwchase17
  - **Twitter handle:** @adumont

make format 
make lint 
make test 

Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
2023-12-05 20:08:17 -08:00
mogith-pn
9e5d146409 Updated integration with Clarifai python SDK functions (#13671)
Description :

Updated the functions with new Clarifai python SDK.
Enabled initialisation of Clarifai class with model URL.
Updated docs with new functions examples.
2023-12-05 20:08:00 -08:00
dudub12
8f403ea2d7 info sql tool remove whitespaces in table names (#13712)
Remove whitespaces from the input of the ListSQLDatabaseTool for better
support.
for example, the input "table1,table2,table3" will throw an exception
whiteout the change although it's a valid input.

---------

Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
2023-12-05 20:07:38 -08:00
balaba-max
64d5108f99 Feature: GitLab url from ENV (#14221)
<!-- Thank you for contributing to LangChain!

Replace this entire comment with:
  - **Description:** add gitlab url from env, 
  - **Issue:** no issue,
  - **Dependencies:** no,
- **Tag maintainer:** for a quicker response, tag the relevant
maintainer (see below),
- **Twitter handle:** we announce bigger features on Twitter. If your PR
gets announced, and you'd like a mention, we'll gladly shout you out!

Please make sure your PR is passing linting and testing before
submitting. Run `make format`, `make lint` and `make test` to check this
locally.

See contribution guidelines for more information on how to write/run
tests, lint, etc:

https://github.com/langchain-ai/langchain/blob/master/.github/CONTRIBUTING.md

If you're adding a new integration, please include:
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in `docs/extras`
directory.

If no one reviews your PR within a few days, please @-mention one of
@baskaryan, @eyurtsev, @hwchase17.
 -->

---------

Co-authored-by: Erick Friis <erick@langchain.dev>
2023-12-05 19:41:36 -08:00
kavinraj A S
ab6b41937a Fixed a typo in smart_llm prompt (#13052)
<!-- Thank you for contributing to LangChain!

Replace this entire comment with:
  - **Description:** a description of the change, 
  - **Issue:** the issue # it fixes (if applicable),
  - **Dependencies:** any dependencies required for this change,
- **Tag maintainer:** for a quicker response, tag the relevant
maintainer (see below),
- **Twitter handle:** we announce bigger features on Twitter. If your PR
gets announced, and you'd like a mention, we'll gladly shout you out!

Please make sure your PR is passing linting and testing before
submitting. Run `make format`, `make lint` and `make test` to check this
locally.

See contribution guidelines for more information on how to write/run
tests, lint, etc:

https://github.com/langchain-ai/langchain/blob/master/.github/CONTRIBUTING.md

If you're adding a new integration, please include:
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in `docs/extras`
directory.

If no one reviews your PR within a few days, please @-mention one of
@baskaryan, @eyurtsev, @hwchase17.
 -->
2023-12-05 19:16:18 -08:00
jeffpezzone
7c2ef06136 Adds "NIN" metadata filter for pgvector to all checking for set absence (#14205)
This PR adds support for metadata filters of the form:

`{"filter": {"key": { "NIN" : ["list", "of", "values"]}}}`

"IN" is already supported, so this is a quick & related update to add
"NIN"
2023-12-05 19:07:33 -08:00
lif
20d2b4a6ba feat: Increased compatibility with new and old versions for dalle (#14222)
- **Description:** Increased compatibility with all versions openai for
dalle,

This pr add support for openai version from 0 ~ 1.3.
2023-12-05 17:31:28 -08:00
Wang Wei
7205bfdd00 feat: 1. Add system parameters, 2. Align with the QianfanChatEndpoint for function calling (#14275)
- **Description:** 
1. Add system parameters to the ERNIE LLM API to set the role of the
LLM.
2. Add support for the ERNIE-Bot-turbo-AI model according from the
document https://cloud.baidu.com/doc/WENXINWORKSHOP/s/Alp0kdm0n.
3. For the function call of ErnieBotChat, align with the
QianfanChatEndpoint.

With this PR, the `QianfanChatEndpoint()` can use the `function calling`
ability with `create_ernie_fn_chain()`. The example is as the following:

```
from langchain.prompts import ChatPromptTemplate
import json
from langchain.prompts.chat import (
    ChatPromptTemplate,
)

from langchain.chat_models import QianfanChatEndpoint
from langchain.chains.ernie_functions import (
    create_ernie_fn_chain,
)

def get_current_news(location: str) -> str:
    """Get the current news based on the location.'

    Args:
        location (str): The location to query.
    
    Returs:
        str: Current news based on the location.
    """

    news_info = {
        "location": location,
        "news": [
            "I have a Book.",
            "It's a nice day, today."
        ]
    }

    return json.dumps(news_info)

def get_current_weather(location: str, unit: str="celsius") -> str:
    """Get the current weather in a given location

    Args:
        location (str): location of the weather.
        unit (str): unit of the tempuature.
    
    Returns:
        str: weather in the given location.
    """

    weather_info = {
        "location": location,
        "temperature": "27",
        "unit": unit,
        "forecast": ["sunny", "windy"],
    }
    return json.dumps(weather_info)

template = ChatPromptTemplate.from_messages([
    ("user", "{user_input}"),
])

chat = QianfanChatEndpoint(model="ERNIE-Bot-4")
chain = create_ernie_fn_chain([get_current_weather, get_current_news], chat, template, verbose=True)
res = chain.run("北京今天的新闻是什么?")
print(res)
```

The result of the above code:
```
> Entering new LLMChain chain...
Prompt after formatting:
Human: 北京今天的新闻是什么?
> Finished chain.
{'name': 'get_current_news', 'arguments': {'location': '北京'}}
```

For the `ErnieBotChat`, now can use the `system` parameter to set the
role of the LLM.

```
from langchain.prompts import ChatPromptTemplate
from langchain.chains import LLMChain
from langchain.chat_models import ErnieBotChat

llm = ErnieBotChat(model_name="ERNIE-Bot-turbo-AI", system="你是一个能力很强的机器人,你的名字叫 小叮当。无论问你什么问题,你都可以给出答案。")
prompt = ChatPromptTemplate.from_messages(
    [
        ("human", "{query}"),
    ]
)
chain = LLMChain(llm=llm, prompt=prompt, verbose=True)
res = chain.run(query="你是谁?")
print(res)
```

The result of the above code:

```
> Entering new LLMChain chain...
Prompt after formatting:
Human: 你是谁?
> Finished chain.
我是小叮当,一个智能机器人。我可以为你提供各种服务,包括回答问题、提供信息、进行计算等。如果你需要任何帮助,请随时告诉我,我会尽力为你提供最好的服务。
```
2023-12-05 17:28:31 -08:00
Leonid Kuligin
fd5be55a7b added get_num_tokens to GooglePalm (#14282)
added get_num_tokens to GooglePalm + a little bit of refactoring
2023-12-05 17:24:19 -08:00
Massimiliano Pronesti
c215a4c9ec feat(embeddings): text-embeddings-inference (#14288)
- **Description:** Added a notebook to illustrate how to use
`text-embeddings-inference` from huggingface. As
`HuggingFaceHubEmbeddings` was using a deprecated client, I made the
most of this PR updating that too.

- **Issue:** #13286 

- **Dependencies**: None

- **Tag maintainer:** @baskaryan
2023-12-05 17:22:05 -08:00
Tim Van Wassenhove
85b88c33f3 Fixes issue-14295: Correctly pass along the kwargs (#14296)
- **Description:** Update code to correctly pass the kwargs 
  - **Issue:** #14295 
  - **Dependencies:**  - 
  - **Tag maintainer:** 

<--
If no one reviews your PR within a few days, please @-mention one of
@baskaryan, @eyurtsev, @hwchase17.
 -->

#issue-14295
2023-12-05 17:14:00 -08:00
Alex Kira
62b59048de docs[patch] Add how-to doc for RunnablePassthrough and nav modifications (#14255)
- **Description:** Add How To docs for `RunnablePassthrough` with
examples. Also redo the ordering and some of the other How-To docs.
2023-12-05 17:01:07 -08:00
Bob Lin
5a23608c41 Add custom async generator example (#14299)
<img width="1172" alt="Screenshot 2023-12-05 at 11 19 16 PM"
src="https://github.com/langchain-ai/langchain/assets/10000925/6b0fbd70-9f6b-4f91-b494-9e88676b4786">
2023-12-05 16:08:19 -08:00
Bob Lin
63fdc6e818 Update docs (#14294)
### Description

Fixed 3 doc  issues:

1. `ConfigurableField ` needs to be imported in
`docs/docs/expression_language/how_to/configure.ipynb`
2. use `error` instead of `RateLimitError()` in
`docs/docs/expression_language/how_to/fallbacks.ipynb`
3. I think it might be better to output the fixed json data(when I
looked at this example, I didn't understand its purpose at first, but
then I suddenly realized):
<img width="1219" alt="Screenshot 2023-12-05 at 10 34 13 PM"
src="https://github.com/langchain-ai/langchain/assets/10000925/7623ba13-7b56-4964-8c98-b7430fabc6de">
2023-12-05 16:08:03 -08:00
Jarkko Lagus
667ad6a5de Add support for CORS options for AzureSearch (#14305)
- **Description:** Add support for setting the CORS options when using
AzureSearch indexes
2023-12-05 16:05:40 -08:00
Karim Assi
9401539e43 Allow not enforcing function usage when a single function is passed to openai function executable (#14308)
- **Description:** allows not enforcing function usage when a single
function is passed to an openAI function executable (or corresponding
legacy chain). This is a desired feature in the case where the model
does not have enough information to call a function, and needs to get
back to the user.
  - **Issue:** N/A
  - **Dependencies:** N/A
  - **Tag maintainer:** N/A
2023-12-05 15:56:31 -08:00
Ran
d22c13ec48 Mask API key for Minimax LLM (#14309)
- **Description:** Added masking for the API key for Minimax LLM + tests
inspired by https://github.com/langchain-ai/langchain/pull/12418.
- **Issue:** the issue # fixes
https://github.com/langchain-ai/langchain/issues/12165
- **Dependencies:** this fix is dependent on Minimax instantiation fix
which is introduced in
https://github.com/langchain-ai/langchain/pull/13439, so merge this one
after.
  - **Tag maintainer:** @eyurtsev

---------

Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
2023-12-05 15:42:00 -08:00
Lance Martin
29e993a5f2 Update OpenCLIP docs (#14319) 2023-12-05 15:31:10 -08:00
Eugene Yurtsev
a74c03da3c Add metadata to blob (#14162)
Add metadata to the blob object. This makes it easier
to make a pipeline that properly propagates metadata information
from raw content to the derived content.
2023-12-05 17:17:41 -05:00
Lance Martin
66848871fc Multi-modal RAG template (#14186)
* OpenCLIP embeddings
* GPT-4V

---------

Co-authored-by: Erick Friis <erick@langchain.dev>
2023-12-05 13:36:38 -08:00
James Braza
3b75d37cee Adding BaseChatMessageHistory.__str__ (#14311)
Adding __str__ to base chat message history to make it easier to debug
2023-12-05 16:22:31 -05:00
James Braza
8b0060184d Fixing empty input variable crashing PromptTemplate validations (#14314)
- Fixes `input_variables=[""]` crashing validations with a template
`"{}"`
- Uses `__cause__` for proper `Exception` chaining in
`check_valid_template`
2023-12-05 13:13:08 -08:00
Leonid Ganeline
0f02e94565 docs: integrations/providers/ update (#14315)
- added missed provider files (from `integrations/Callbacks`
- updated notebooks: added links; updated into consistent formats
2023-12-05 13:05:29 -08:00
Bagatur
6607cc6eab experimental[patch]: Release 0.0.44 (#14310) 2023-12-05 12:11:42 -08:00
Eugene Yurtsev
80637727ea hide api key: arcee (#14304)
Hide API key for Arcee

---------

Co-authored-by: raphael <raph.nunes95@gmail.com>
2023-12-05 14:49:55 -05:00
Bagatur
b2e756c0a8 langchain[patch]: Release 0.0.346 (#14307) 2023-12-05 11:38:52 -08:00
Bagatur
4a5a13aab3 core[patch]: Release 0.0.10 (#14303) 2023-12-05 10:20:57 -08:00
Eugene Yurtsev
7ad75edf8b Fix rag google cloud vertex ai template (#14300)
Fix template by exposing chain correctly
2023-12-05 09:38:04 -08:00
Eun Hye Kim
f758c8adc4 Fix #11737 issue (extra_tools option of create_pandas_dataframe_agent is not working) (#13203)
- **Description:** Fix #11737 issue (extra_tools option of
create_pandas_dataframe_agent is not working),
  - **Issue:** #11737 ,
  - **Dependencies:** no,
- **Tag maintainer:** @baskaryan, @eyurtsev, @hwchase17 I needed this
method at work, so I modified it myself and used it. There is a similar
issue(#11737) and PR(#13018) of @PyroGenesis, so I combined my code at
the original PR.
You may be busy, but it would be great help for me if you checked. Thank
you.
  - **Twitter handle:** @lunara_x 

If you need an .ipynb example about this, please tag me. 
I will share what I am working on after removing any work-related
content.

---------

Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
2023-12-04 20:54:08 -08:00
Sean Bearden
77a15fa988 Added ability to pass arguments to the Playwright browser (#13146)
- **Description:** Enhanced `create_sync_playwright_browser` and
`create_async_playwright_browser` functions to accept a list of
arguments. These arguments are now forwarded to
`browser.chromium.launch()` for customizable browser instantiation.
  - **Issue:** #13143
  - **Dependencies:** None
  - **Tag maintainer:** @eyurtsev,
  - **Twitter handle:** Dr_Bearden

---------

Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
2023-12-04 20:48:09 -08:00
Joan Fontanals
dcccf8fa66 adapt Jina Embeddings to new Jina AI Embedding API (#13658)
- **Description:** Adapt JinaEmbeddings to run with the new Jina AI
Embedding platform
- **Twitter handle:** https://twitter.com/JinaAI_

---------

Co-authored-by: Joan Fontanals Martinez <joan.fontanals.martinez@jina.ai>
Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
2023-12-04 20:40:33 -08:00
Philippe PRADOS
e0c03d6c44 Pprados/lite google drive (#13175)
- Fix bug in the document
 - Add clarification on the use of langchain-google drive.
2023-12-04 20:31:21 -08:00
guillaumedelande
ea0afd07ca Update azuresearch.py following recent change from azure-search-documents library (#13472)
- **Description:** 

Reference library azure-search-documents has been adapted in version
11.4.0:

1. Notebook explaining Azure AI Search updated with most recent info
2. HnswVectorSearchAlgorithmConfiguration --> HnswAlgorithmConfiguration
3. PrioritizedFields(prioritized_content_fields) -->
SemanticPrioritizedFields(content_fields)
4. SemanticSettings --> SemanticSearch
5. VectorSearch(algorithm_configurations) -->
VectorSearch(configurations)

--> Changes now reflected on Langchain: default vector search config
from langchain is now compatible with officially released library from
Azure.

  - **Issue:**
Issue creating a new index (due to wrong class used for default vector
search configuration) if using latest version of azure-search-documents
with current langchain version
  - **Dependencies:** azure-search-documents>=11.4.0,
  - **Tag maintainer:** ,

---------

Co-authored-by: Erick Friis <erick@langchain.dev>
2023-12-04 20:29:20 -08:00
price-deshaw
5cb3393e20 update OpenAI function agents' llm validation (#13538)
- **Description:** This PR modifies the LLM validation in OpenAI
function agents to check whether the LLM supports OpenAI functions based
on a property (`supports_oia_functions`) instead of whether the LLM
passed to the agent `isinstance` of `ChatOpenAI`. This allows classes
that extend `BaseChatModel` to be passed to these agents as long as
they've been integrated with the OpenAI APIs and have this property set,
even if they don't extend `ChatOpenAI`.
  - **Issue:** N/A
  - **Dependencies:** none
2023-12-04 20:28:13 -08:00
Max Weng
74c7b799ef migrate openai audio api (#13557)
for issue https://github.com/langchain-ai/langchain/issues/13162
migrate openai audio api, as [openai v1.0.0 Migration
Guide](https://github.com/openai/openai-python/discussions/742)

<!-- Thank you for contributing to LangChain!

Replace this entire comment with:
  - **Description:** a description of the change, 
  - **Issue:** the issue # it fixes (if applicable),
  - **Dependencies:** any dependencies required for this change,
- **Tag maintainer:** for a quicker response, tag the relevant
maintainer (see below),
- **Twitter handle:** we announce bigger features on Twitter. If your PR
gets announced, and you'd like a mention, we'll gladly shout you out!

Please make sure your PR is passing linting and testing before
submitting. Run `make format`, `make lint` and `make test` to check this
locally.

See contribution guidelines for more information on how to write/run
tests, lint, etc:

https://github.com/langchain-ai/langchain/blob/master/.github/CONTRIBUTING.md

If you're adding a new integration, please include:
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in `docs/extras`
directory.

If no one reviews your PR within a few days, please @-mention one of
@baskaryan, @eyurtsev, @hwchase17.
 -->

---------

Co-authored-by: Double Max <max@ground-map.com>
2023-12-04 20:27:54 -08:00
Arnaud Gelas
abbba6c7d8 openapi/planner.py: Deal with json in markdown output cases (#13576)
- **Description:** In openapi/planner deal with json in markdown output
cases
- **Issue:** In some cases LLMs could return json in markdown which
can't be loaded.
  - **Dependencies:**
  - **Tag maintainer:** @eyurtsev
  - **Twitter handle:**

---------

Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
2023-12-04 20:27:22 -08:00
Harrison Chase
8eab4d95c0 Harrison/delegate from template (#14266)
Co-authored-by: M.R. Sopacua <144725145+msopacua@users.noreply.github.com>
2023-12-04 20:18:15 -08:00
Erick Friis
956d55de2b docs[patch]: chat model page names (#14264) 2023-12-04 20:08:41 -08:00
Nolan
b49104c2c9 Add missing doc key to metadata field in AzureSearch Vectorstore (#13328)
- **Description:** Adds doc key to metadata field when adding document
to Azure Search.
  - **Issue:** -,
  - **Dependencies:** -,
  - **Tag maintainer:** @eyurtsev,
  - **Twitter handle:** @finnless

Right now the document key with the name FIELDS_ID is not included in
the FIELDS_METADATA field, and therefore is not included in the Document
returned from a query. This is really annoying if you want to be able to
modify that item in the vectorstore.

Other's thoughts on this are welcome.
2023-12-04 19:53:27 -08:00
Jon Watte
e042e5df35 fix: call _on_llm_error() (#13581)
Description: There's a copy-paste typo where on_llm_error() calls
_on_chain_error() instead of _on_llm_error().
Issue: #13580 
Dependencies: None
Tag maintainer: @hwchase17 
Twitter handle: @jwatte

"Run `make format`, `make lint` and `make test` to check this locally."
The test scripts don't work in a plain Ubuntu LTS 20.04 system.
It looks like the dev container pulling is stuck. Or maybe the internet
is just ornery today.

---------

Co-authored-by: jwatte <jwatte@observeinc.com>
Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
2023-12-04 19:44:50 -08:00
Hamza Ahmed
fcc8e5e839 Update geodataframe.py (#13573)
here it is validating shapely.geometry.point.Point: if not
isinstance(data_frame[page_content_column].iloc[0], gpd.GeoSeries):
raise ValueError(
f"Expected data_frame[{page_content_column}] to be a GeoSeries" you need
it to validate the geoSeries and not the shapely.geometry.point.Point

if not isinstance(data_frame[page_content_column], gpd.GeoSeries):
            raise ValueError(
f"Expected data_frame[{page_content_column}] to be a GeoSeries"

<!-- Thank you for contributing to LangChain!

Replace this entire comment with:
  - **Description:** a description of the change, 
  - **Issue:** the issue # it fixes (if applicable),
  - **Dependencies:** any dependencies required for this change,
- **Tag maintainer:** for a quicker response, tag the relevant
maintainer (see below),
- **Twitter handle:** we announce bigger features on Twitter. If your PR
gets announced, and you'd like a mention, we'll gladly shout you out!

Please make sure your PR is passing linting and testing before
submitting. Run `make format`, `make lint` and `make test` to check this
locally.

See contribution guidelines for more information on how to write/run
tests, lint, etc:

https://github.com/langchain-ai/langchain/blob/master/.github/CONTRIBUTING.md

If you're adding a new integration, please include:
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in `docs/extras`
directory.

If no one reviews your PR within a few days, please @-mention one of
@baskaryan, @eyurtsev, @hwchase17.
 -->
2023-12-04 19:44:30 -08:00
Harrison Chase
2213fc9711 Harrison/bookend ai (#14258)
Co-authored-by: stvhu-bookend <142813359+stvhu-bookend@users.noreply.github.com>
2023-12-04 19:42:15 -08:00
cxumol
0d47d15a9f add(feat): Text Embeddings by Cloudflare Workers AI (#14220)
Add [Text Embeddings by Cloudflare Workers
AI](https://developers.cloudflare.com/workers-ai/models/text-embeddings/).
It's a new integration.
Trying to align it with its langchain-js version counterpart
[here](https://api.js.langchain.com/classes/embeddings_cloudflare_workersai.CloudflareWorkersAIEmbeddings.html).
- Dependencies: N/A
- Done `make format` `make lint` `make spell_check` `make
integration_tests` and all my changes was passed
2023-12-04 19:25:05 -08:00
Harrison Chase
c51001f01e fix comet tracer (#14259) 2023-12-04 19:03:19 -08:00
Erick Friis
4351b99d2b docs[patch]: search experiment (#14254)
- npm
- search config
- custom
2023-12-04 16:58:26 -08:00
Harrison Chase
4fb72ff76f fake consistent embeddings cleanup (#14256)
delete code that could never be reached
2023-12-04 16:55:30 -08:00
Michael Landis
e26906c1dc feat: implement max marginal relevance for momento vector index (#13619)
**Description**

Implements `max_marginal_relevance_search` and
`max_marginal_relevance_search_by_vector` for the Momento Vector Index
vectorstore.

Additionally bumps the `momento` dependency in the lock file and adds
logging to the implementation.

**Dependencies**

 updates `momento` dependency in lock file

**Tag maintainer**

@baskaryan 

**Twitter handle**

Please tag @momentohq for Momento Vector Index and @mloml for the
contribution 🙇

<!-- Thank you for contributing to LangChain!

Replace this entire comment with:
  - **Description:** a description of the change, 
  - **Issue:** the issue # it fixes (if applicable),
  - **Dependencies:** any dependencies required for this change,
- **Tag maintainer:** for a quicker response, tag the relevant
maintainer (see below),
- **Twitter handle:** we announce bigger features on Twitter. If your PR
gets announced, and you'd like a mention, we'll gladly shout you out!

Please make sure your PR is passing linting and testing before
submitting. Run `make format`, `make lint` and `make test` to check this
locally.

See contribution guidelines for more information on how to write/run
tests, lint, etc:

https://github.com/langchain-ai/langchain/blob/master/.github/CONTRIBUTING.md

If you're adding a new integration, please include:
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in `docs/extras`
directory.

If no one reviews your PR within a few days, please @-mention one of
@baskaryan, @eyurtsev, @hwchase17.
 -->
2023-12-04 16:50:23 -08:00
deedy5
ee9abb6722 Bugfix duckduckgo_search news search (#13670)
- **Description:** 
Bugfix duckduckgo_search news search
  - **Issue:** 
https://github.com/langchain-ai/langchain/issues/13648
  - **Dependencies:** 
None
  - **Tag maintainer:** 
@baskaryan

---------

Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
2023-12-04 16:48:20 -08:00
Aliaksandr Kuzmik
676a077c4e Add CometTracer (#13661)
Hi! I'm Alex, Python SDK Team Lead from
[Comet](https://www.comet.com/site/).

This PR contains our new integration between langchain and Comet -
`CometTracer` class which uses new `comet_llm` python package for
submitting data to Comet.

No additional dependencies for the langchain package are required
directly, but if the user wants to use `CometTracer`, `comet-llm>=2.0.0`
should be installed. Otherwise an exception will be raised from
`CometTracer.__init__`.

A test for the feature is included.

There is also an already existing callback (and .ipynb file with
example) which ideally should be deprecated in favor of a new tracer. I
wasn't sure how exactly you'd prefer to do it. For example we could open
a separate PR for that.

I'm open to your ideas :)
2023-12-04 16:46:48 -08:00
Harrison Chase
921c4b5597 Harrison/searchapi (#14252)
Co-authored-by: SebastjanPrachovskij <86522260+SebastjanPrachovskij@users.noreply.github.com>
2023-12-04 16:34:15 -08:00
Ravidhu
224aa5151d Fix Sagemaker Endpoint documentation (#13660)
- **Description:** fixed the transform_input method in the example., 
  - **Issue:** example didn't work,
  - **Dependencies:** None,
  - **Tag maintainer:** @baskaryan,
  - **Twitter handle:** @Ravidhu87
2023-12-04 16:28:29 -08:00
Colin Ulin
9f9cb71d26 Embaas - added backoff retries for network requests (#13679)
Running a large number of requests to Embaas' servers (or any server)
can result in intermittent network failures (both from local and
external network/service issues). This PR implements exponential backoff
retries to help mitigate this issue.
2023-12-04 16:21:35 -08:00
Erick Friis
f26d88ca60 docs[patch]: fix columns (#14251) 2023-12-04 16:03:09 -08:00
Kastan Day
65faba91ad langchain[patch]: Adding new Github functions for reading pull requests (#9027)
The Github utilities are fantastic, so I'm adding support for deeper
interaction with pull requests. Agents should read "regular" comments
and review comments, and the content of PR files (with summarization or
`ctags` abbreviations).

Progress:
- [x] Add functions to read pull requests and the full content of
modified files.
- [x] Function to use Github's built in code / issues search.

Out of scope:
- Smarter summarization of file contents of large pull requests (`tree`
output, or ctags).
- Smarter functions to checkout PRs and edit the files incrementally
before bulk committing all changes.
- Docs example for creating two agents:
- One watches issues: For every new issue, open a PR with your best
attempt at fixing it.
- The other watches PRs: For every new PR && every new comment on a PR,
check the status and try to finish the job.

<!-- Thank you for contributing to LangChain!

Replace this comment with:
  - Description: a description of the change, 
  - Issue: the issue # it fixes (if applicable),
  - Dependencies: any dependencies required for this change,
- Tag maintainer: for a quicker response, tag the relevant maintainer
(see below),
- Twitter handle: we announce bigger features on Twitter. If your PR
gets announced and you'd like a mention, we'll gladly shout you out!

Please make sure you're PR is passing linting and testing before
submitting. Run `make format`, `make lint` and `make test` to check this
locally.

If you're adding a new integration, please include:
1. a test for the integration, preferably unit tests that do not rely on
network access,
  2. an example notebook showing its use.

Maintainer responsibilities:
  - General / Misc / if you don't know who to tag: @baskaryan
  - DataLoaders / VectorStores / Retrievers: @rlancemartin, @eyurtsev
  - Models / Prompts: @hwchase17, @baskaryan
  - Memory: @hwchase17
  - Agents / Tools / Toolkits: @hinthornw
  - Tracing / Callbacks: @agola11
  - Async: @agola11

If no one reviews your PR within a few days, feel free to @-mention the
same people again.

See contribution guidelines for more information on how to write/run
tests, lint, etc:
https://github.com/hwchase17/langchain/blob/master/.github/CONTRIBUTING.md
 -->

---------

Co-authored-by: Erick Friis <erick@langchain.dev>
2023-12-04 15:53:36 -08:00
Hynek Kydlíček
aa8ae31e5b core[patch]: add response kwarg to on_llm_error
# Dependencies
None

# Twitter handle
@HKydlicek

---------

Co-authored-by: Erick Friis <erick@langchain.dev>
2023-12-04 15:04:48 -08:00
Leonid Ganeline
1750cc464d docs[patch]: moved vectorstore notebook file (#14181)
The `/docs/integrations/toolkits/vectorstore` page is not the
Integration page. The best place is in `/docs/modules/agents/how_to/`
- Moved the file
- Rerouted the page URL
2023-12-04 14:44:06 -08:00
Jacob Lee
a26c4a0930 Allow base_store to be used directly with MultiVectorRetriever (#14202)
Allow users to pass a generic `BaseStore[str, bytes]` to
MultiVectorRetriever, removing the need to use the `create_kv_docstore`
method. This encoding will now happen internally.

@rlancemartin @eyurtsev

---------

Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>
2023-12-04 14:43:32 -08:00
Vincent Brouwers
67662564f3 langchain[patch]: Fix config arg detection for wrapped lambdarunnable (#14230)
**Description:**
When a RunnableLambda only receives a synchronous callback, this
callback is wrapped into an async one since #13408. However, this
wrapping with `(*args, **kwargs)` causes the `accepts_config` check at
[/libs/core/langchain_core/runnables/config.py#L342](ee94ef55ee/libs/core/langchain_core/runnables/config.py (L342))
to fail, as this checks for the presence of a "config" argument in the
method signature.

Adding a `functools.wraps` around it, resolves it.
2023-12-04 14:18:30 -08:00
Jacob Lee
de86b84a70 Prefer byte store interface for Upstash BaseStore to match other Redis (#14201)
If we are not going to make the existing Docstore class also implement
`BaseStore[str, Document]`, IMO all base store implementations should
always be `[str, bytes]` so that they are more interchangeable.

CC @rlancemartin @eyurtsev
2023-12-04 14:17:33 -08:00
Harrison Chase
411aa9a41e Harrison/nasa tool (#14245)
Co-authored-by: Jacob Matias <88005863+matiasjacob25@users.noreply.github.com>
Co-authored-by: Karam Daid <karam.daid@mail.utoronto.ca>
Co-authored-by: Jumana <jumana.fanous@mail.utoronto.ca>
Co-authored-by: KaramDaid <38271127+KaramDaid@users.noreply.github.com>
Co-authored-by: Anna Chester <74325334+CodeMakesMeSmile@users.noreply.github.com>
Co-authored-by: Jumana <144748640+jfanous@users.noreply.github.com>
2023-12-04 13:43:11 -08:00
nceccarelli
5fea63327b Support Azure gov cloud in Azure Cognitive Search retriever (#13695)
<!-- Thank you for contributing to LangChain!

Replace this entire comment with:
- **Description:** The existing version hardcoded search.windows.net in
the base url. This is not compatible with the gov cloud. I am allowing
the user to override the default for gov cloud support.,
  - **Issue:** N/A, did not write up in an issue,
  - **Dependencies:** None

Please make sure your PR is passing linting and testing before
submitting. Run `make format`, `make lint` and `make test` to check this
locally.

See contribution guidelines for more information on how to write/run
tests, lint, etc:

https://github.com/langchain-ai/langchain/blob/master/.github/CONTRIBUTING.md

If you're adding a new integration, please include:
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in `docs/extras`
directory.

If no one reviews your PR within a few days, please @-mention one of
@baskaryan, @eyurtsev, @hwchase17.
 -->

---------

Co-authored-by: Nicholas Ceccarelli <nceccarelli2@moog.com>
Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
2023-12-04 12:56:35 -08:00
ealt
e09b876863 Fixes error loading Obsidian templates (#13888)
- **Description:** Obsidian templates can include
[variables](https://help.obsidian.md/Plugins/Templates#Template+variables)
using double curly braces. `ObsidianLoader` uses PyYaml to parse the
frontmatter of documents. This parsing throws an error when encountering
variables' curly braces. This is avoided by temporarily substituting
safe strings before parsing.
  - **Issue:** #13887
  - **Tag maintainer:** @hwchase17
2023-12-04 12:55:37 -08:00
Erick Friis
f6d68d78f3 nbdoc -> quarto (#14156)
Switches to a more maintained solution for building ipynb -> md files
(`quarto`)

Also bumps us down to python3.8 because it's significantly faster in the
vercel build step. Uses default openssl version instead of upgrading as
well.
2023-12-04 12:50:56 -08:00
Nithish Raghunandanan
eecfa3f9e5 Add Couchbase document loader (#13979)
**Description:** 
Adds the document loader for [Couchbase](http://couchbase.com/), a
distributed NoSQL database.
**Dependencies:** 
Added the Couchbase SDK as an optional dependency.
**Twitter handle:** nithishr

---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
2023-12-04 12:28:12 -08:00
Bob Lin
805e9bfc24 Add doc for the development of core and experimental sections (#13966)
### **Description**

Hi, I just started learning the source code of `langchain` and hope to
contribute code. However, according to the instructions in the
[CONTRIBUTING.md](https://github.com/langchain-ai/langchain/blob/master/.github/CONTRIBUTING.md)
document, I could not run the test command `make test` to run normally.
I found that many modules did not exist after [splitting
`langchain_core`](https://github.com/langchain-ai/langchain/discussions/13823),
so I updated the document.

### **Twitter handle** 

lin_bob57617
2023-12-04 12:27:57 -08:00
Muntaqa Mahmood
25f72944a0 Add: Steam API tool (#14008)
- **Description:** Our PR is an integration of a Steam API Tool that
makes recommendations on steam games based on user's Steam profile and
provides information on games based on user provided queries.
- **Issue:** the issue # our PR implements:
https://github.com/langchain-ai/langchain/issues/12120
- **Dependencies:** python-steam-api library, steamspypi library and
decouple library
  - **Tag maintainer:** @baskaryan, @hwchase17 
  - **Twitter handle:** N/A

Hello langchain Maintainers,

We are a team of 4 University of Toronto students contributing to
langchain as part of our course [CSCD01 (link to course
page)](https://cscd01.com/work/open-source-project). We hope our changes
help the community. We have run make format, make lint and make test
locally before submitting the PR. To our knowledge, our changes do not
introduce any new errors.

Our PR integrates the python-steam-api, steamspypi and decouple
packages. We have added integration tests to test our python API
integration into langchain and an example notebook is also provided.

Our amazing team that contributed to this PR: @JohnY2002, @shenceyang,
@andrewqian2001 and @muntaqamahmood

Thank you in advance to all the maintainers for reviewing our PR!

---------

Co-authored-by: Shence <ysc1412799032@163.com>
Co-authored-by: JohnY2002 <johnyuan0526@gmail.com>
Co-authored-by: Andrew Qian <andrewqian2001@gmail.com>
Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
Co-authored-by: JohnY <94477598+JohnY2002@users.noreply.github.com>
2023-12-04 12:27:38 -08:00
Bob Lin
cd2028288e Add openai v2 adapter (#14063)
### Description

Starting from [openai version
1.0.0](17ac677995 (module-level-client)),
the camel case form of `openai.ChatCompletion` is no longer supported
and has been changed to lowercase `openai.chat.completions`. In
addition, the returned object only accepts attribute access instead of
index access:

```python
import openai

# optional; defaults to `os.environ['OPENAI_API_KEY']`
openai.api_key = '...'

# all client options can be configured just like the `OpenAI` instantiation counterpart
openai.base_url = "https://..."
openai.default_headers = {"x-foo": "true"}

completion = openai.chat.completions.create(
    model="gpt-4",
    messages=[
        {
            "role": "user",
            "content": "How do I output all files in a directory using Python?",
        },
    ],
)
print(completion.choices[0].message.content)
```

So I implemented a compatible adapter that supports both attribute
access and index access:

```python
In [1]: from langchain.adapters import openai as lc_openai
   ...: messages = [{"role": "user", "content": "hi"}]

In [2]: result = lc_openai.chat.completions.create(
   ...:     messages=messages, model="gpt-3.5-turbo", temperature=0
   ...: )

In [3]: result.choices[0].message
Out[3]: {'role': 'assistant', 'content': 'Hello! How can I assist you today?'}

In [4]: result["choices"][0]["message"]
Out[4]: {'role': 'assistant', 'content': 'Hello! How can I assist you today?'}

In [5]: result = await lc_openai.chat.completions.acreate(
   ...:     messages=messages, model="gpt-3.5-turbo", temperature=0
   ...: )

In [6]: result.choices[0].message
Out[6]: {'role': 'assistant', 'content': 'Hello! How can I assist you today?'}

In [7]: result["choices"][0]["message"]
Out[7]: {'role': 'assistant', 'content': 'Hello! How can I assist you today?'}

In [8]: for rs in lc_openai.chat.completions.create(
    ...:     messages=messages, model="gpt-3.5-turbo", temperature=0, stream=True
    ...: ):
    ...:     print(rs.choices[0].delta)
    ...:     print(rs["choices"][0]["delta"])
    ...:
{'role': 'assistant', 'content': ''}
{'role': 'assistant', 'content': ''}
{'content': 'Hello'}
{'content': 'Hello'}
{'content': '!'}
{'content': '!'}

In [20]: async for rs in await lc_openai.chat.completions.acreate(
    ...:     messages=messages, model="gpt-3.5-turbo", temperature=0, stream=True
    ...: ):
    ...:     print(rs.choices[0].delta)
    ...:     print(rs["choices"][0]["delta"])
    ...:
{'role': 'assistant', 'content': ''}
{'role': 'assistant', 'content': ''}
{'content': 'Hello'}
{'content': 'Hello'}
{'content': '!'}
{'content': '!'}
...
```

### Twitter handle

[lin_bob57617](https://twitter.com/lin_bob57617)
2023-12-04 12:12:30 -08:00
billytrend-cohere
0f02081392 Add input_type override (#14068)
Add option to override input_type for cohere's v3 embeddings models

---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
2023-12-04 12:10:24 -08:00
Dmitrii Rashchenko
aaabc1574f Support of custom hugging face inference endpoints url (#14125)
- **Description:** to support not only publicly available Hugging Face
endpoints, but also protected ones (created with "Inference Endpoints"
Hugging Face feature), I have added ability to specify custom api_url.
But if not specified, default behaviour won't change
  - **Issue:** #9181,
  - **Dependencies:** no extra dependencies
2023-12-04 12:08:51 -08:00
Bob Lin
702a6d7044 Closed #14159 (#14165)
### Description

Fix: #14159

Use `from pydantic.v1 import BaseModel, Field` instead of `from pydantic
import BaseModel, Field`

### [lin_bob57617](https://twitter.com/lin_bob57617)
2023-12-04 12:06:04 -08:00
Perry Lee
641e401ba8 Shorten wget commands (#14211)
- **Description:** The commands can be more efficient if the output name
is set to the destined filename instead of renaming in the second
command.
2023-12-04 12:03:47 -08:00
Harrison Chase
e32185193e Harrison/embass (#14242)
Co-authored-by: Julius Lipp <lipp.julius@gmail.com>
2023-12-04 11:58:52 -08:00
umair mehmood
8504ec56e4 fixed: ModuleNotFoundError: No module named 'clarifai.auth' (#14215)
Updated the clarifai imports 

fixed: #14175 

@efriis 
@baskaryan
2023-12-04 11:53:34 -08:00
Hieu Lam
ca8a022cd9 Fixed OpenAIFunctionsAgent not returning when receiving AgentFinish (#14236)
**Description:** The way the condition is checked in the
`return_stopped_response` function of `OpenAIAgent` may not be correct,
when the value returned is `AgentFinish` from the tools it does not work
properly.


Thanks for review, @baskaryan, @eyurtsev, @hwchase17.
2023-12-04 11:43:04 -08:00
Unai Garay Maestre
6826feea14 Adds llm_chain_kwargs to BaseRetrievalQA.from_llm (#14224)
- **Description:** Adds `llm_chain_kwargs` to `BaseRetrievalQA.from_llm`
so these can be passed to the LLM at runtime,
- **Issue:** https://github.com/langchain-ai/langchain/issues/14216,

---------

Signed-off-by: ugm2 <unaigaraymaestre@gmail.com>
2023-12-04 11:34:01 -08:00
James Braza
6ce5dab38c Clarifying descriptions in GuardrailsOutputParser (#14228)
Upstreaming knowledge from
https://github.com/guardrails-ai/guardrails/discussions/473 to LangChain
2023-12-04 11:33:22 -08:00
geret1
50aee687c6 langchain[patch]: Cerebrium model_api_request deprecation (#12704)
- **Description:** As part of my conversation with Cerebrium team,
`model_api_request` will be no longer available in cerebrium lib so it
needs to be replaced.
  - **Issue:** #12705 12705,
  - **Dependencies:** Cerebrium team (agreed)
  - **Tag maintainer:** @eyurtsev 
  - **Twitter handle:** No official Twitter account sorry :D

---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
2023-12-04 09:26:32 -08:00
Harutaka Kawamura
ee94ef55ee docs[patch]: Update MLflow and Databricks docs (#14011)
Depends on #13699. Updates the existing mlflow and databricks examples.

---------

Co-authored-by: Ben Wilson <39283302+BenWilson2@users.noreply.github.com>
2023-12-03 16:07:09 -08:00
Leonid Ganeline
94bf733dae docs[patch]: AWS platform page update (#14160)
The `AWS` platform page has many missed integrations.
- added missed integration references to the `AWS` platform page
- added/updated descriptions and links in the referenced notebooks
- renamed two notebook files. They have file names != page Title, which
generate unordered ToC.
- reroute the URLs for renamed files
- fixed `amazon_textract` notebook: removed failed cell outputs
2023-12-03 15:42:52 -08:00
Leonid Ganeline
74d4154bcc docs[patch]: added Templates Hub menu item (#14148)
This link was missing in Docs.
Added it.

---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
2023-12-03 15:36:35 -08:00
William FH
246dc4f9cc langchain[patch]: Pass kwargs to chat fireworks (#14183)
Otherwise `.bind()` isn't really any good
2023-12-03 15:12:02 -08:00
Kaiboon Ee
e961c57fd2 langchain[patch]: Mask API key for Arcee LLM (#14193)
- **Description:** Mask API key for Arcee LLM and its associated unit
tests
  - **Issue:** https://github.com/langchain-ai/langchain/issues/12165
  - **Dependencies:** N/A
  - **Tag maintainer:** @eyurtsev
  - **Twitter handle:** `eekaiboon`

---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
2023-12-03 15:11:43 -08:00
Daniyar Supiyev
092f302c0f langchain[patch]: Asynchronous human-in-the-loop callback (#14195)
**Description:** Adding a possibility to use asynchronous callback
handler in human-in-the-loop validation tool. Very useful, for example,
if you want to implement a validation over Telegram bot.
**Issue:** -
**Dependencies:** -

---------

Co-authored-by: Daniyar_Supiyev <daniyar_supiyev@epam.com>
Co-authored-by: Bagatur <baskaryan@gmail.com>
2023-12-03 14:57:07 -08:00
Leonid Ganeline
c660b0cf79 docs[patch]: moved semadb.mdx file (#14204)
SemaDB.mdx file was placed with additional sub-folder:
`https://python.langchain.com/docs/integrations/providers/providers/semadb`
- Moved file to the
`https://python.langchain.com/docs/integrations/providers/semadb`
- Added a redirect for the file URL
2023-12-03 14:36:47 -08:00
Mark Cusack
16c83f786c Adds the Yellowbrick Data Warehouse as a supported vector store (#13820)
- **Description** An integration to allow the Yellowbrick Data Warehouse
to function as a vector store

---------

Co-authored-by: markcusack <markcusack@markcusacksmac.lan>
Co-authored-by: markcusack <markcusack@Mark-Cusack-sMac.local>
2023-12-03 13:35:53 -08:00
Hendrik Hogertz
e6862e6e7d Fix Azure Openai function calling in streaming mode (#13768)
- **Description**: This PR addresses an issue with the OpenAI API
streaming response, where initially the key (arguments) is provided but
the value is None. Subsequently, it updates with {"arguments": "{\n"},
leading to a type inconsistency that causes an exception. The specific
error encountered is ValueError: additional_kwargs["arguments"] already
exists in this message, but with a different type. This change aims to
resolve this inconsistency and ensure smooth API interactions.
- **Issue**: None.
- **Dependencies**: None.
- **Tag maintainer**: @eyurtsev

This is an updated version of #13229 based on the refactored code.
Credit goes to @superken01.

Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
2023-12-03 12:07:15 -08:00
Nicolò Boschi
e204657b3c AstraDB VectorStore: implement pre_delete_collection (#13780)
- **Description:** some vector stores have a flag for try deleting the
collection before creating it (such as ´vectorpg´). This is a useful
flag when prototyping indexing pipelines and also for integration tests.
Added the bool flag `pre_delete_collection ` to the constructor (default
False)
  - **Tag maintainer:** @hemidactylus 
  - **Twitter handle:** nicoloboschi

---------

Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
2023-12-03 12:06:20 -08:00
Chelsea E. Manning
2780d2d4dd Extend OpenAIEmbeddings class to support non-tiktoken based embeddings (#13884)
<!-- Thank you for contributing to LangChain!

Replace this entire comment with:
- **Description:** This extends `OpenAIEmbeddings` to add support for
non-`tiktoken` based embeddings, specifically for use with the new
`text-generation-webui` API (`--extensions openai`) which does not
support `tiktoken` encodings, but rather strings
  - **Issue:** Not found,
- **Dependencies:** HuggingFace `transformers.AutoTokenizer` is new
dependency for running the model without `tiktoken`
- **Tag maintainer:** @baskaryan based on last commit for
`langchain-core` refactor
  - **Twitter handle:** @xychelsea

Modified the tokenization process to be model-agnostic, allowing for
both OpenAI and non-OpenAI model tokenizations, by setting the new
default `bool` flag `tiktoken_enabled` to `False`. This requeires
HuggingFace’s AutoTokenizer and handling tokenization for models
requiring different preprocessing steps to generate a chunked string
request rather than a list of integers.

Updated the embeddings generation process to accommodate non-OpenAI
models. This includes converting tokenized text into embeddings using
OpenAI’s and Hugging Face’s model architectures.
 -->
2023-12-03 12:04:17 -08:00
Changgeng Zhao
9b59bde93d Update Hologres vector store: use hologres-vector (#13767)
Hi,
I made some code changes on the Hologres vector store to improve the
data insertion performance.
Also, this version of the code uses `hologres-vector` library. This
library is more convenient for us to update, and more efficient in
performance.
The code has passed the format/lint/spell check. I have run the unit
test for Hologres connecting to my own database.
Please check this PR again and tell me if anything needs to change.

Best,
Changgeng,
Developer @ Alibaba Cloud

Co-authored-by: Changgeng Zhao <zhaochanggeng.zcg@alibaba-inc.com>
Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
2023-12-03 11:50:45 -08:00
Nicolò Boschi
0de7cf898d Ensure AstraDB integration tests clean up the environment (#13774)
- **Description:** currently astra_db integration tests might leave
orphan collections
  - **Tag maintainer:** @hemidactylus 
  - **Twitter handle:** nicoloboschi
2023-12-03 11:14:42 -08:00
Harrison Chase
7bc4c12477 delete stray test (#14200)
was added to an old path

also im not sure this is even really a test file? which is why i didnt
move it
2023-12-03 11:06:57 -08:00
Leonid Ganeline
283c2994de docs: Hugging Face platform page (#13831)
`Hugging Face` is definitely a platform. It includes many integrations
for many modules (LLM, Embedding, DocumentLoader, Tool)
So, a doc page was added that defines Hugging Face as a platform.
2023-12-03 11:06:43 -08:00
Chad Norvell
8a0951d934 Fix Mathpix PDF loader integration (#13949)
- **Description:** Fixes the Mathpix PDF loader API integration.
Specifically, ensures that Mathpix auth headers are provided for every
request, and ensures that we recognize all errors that can occur during
a request. Also, the option to provide API keys as kwargs never actually
worked before, but now that's fixed too.
  - **Issue:** #11249
  - **Dependencies:** None
2023-12-03 10:36:49 -08:00
gzyJoy
32d4bb4590 Added Slacktoolkit (#14012)
- **Description:** 
This PR introduces the Slack toolkit to LangChain, which allows users to
read and write to Slack using the Slack API. Specifically, we've added
the following tools.
1. get_channel: Provides a summary of all the channels in a workspace.
2. get_message: Gets the message history of a channel.
3. send_message: Sends a message to a channel.
4. schedule_message: Sends a message to a channel at a specific time and
date.

- **Issue:** This pull request addresses [Add Slack Toolkit
#11747](https://github.com/langchain-ai/langchain/issues/11747)
  - **Dependencies:** package`slack_sdk`
Note: For this toolkit to function you will need to add a Slack app to
your workspace. Additional info can be found
[here](https://slack.com/help/articles/202035138-Add-apps-to-your-Slack-workspace).

---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
Co-authored-by: ArianneLavada <ariannelavada@gmail.com>
Co-authored-by: ArianneLavada <84357335+ArianneLavada@users.noreply.github.com>
Co-authored-by: ariannelavada@gmail.com <you@example.com>
2023-12-03 10:25:38 -08:00
Richie
99e5ee6a84 fix(vectorstores): incorrect import for mongodb atlas DriverInfo (#14060)
- **Description:** fix `import` issue for `mongodb atlas` vectore store
integration
  - **Issue:** none
  - **Dependencies:** none

while trying to follow official `langchain`'s [mongodb integration
guide](https://python.langchain.com/docs/integrations/vectorstores/mongodb_atlas),
an import error will happen.

It's caused by incorrect import location:
- `from pymongo import DriverInfo` should be `from pymongo.driver_info
import DriverInfo`
- reference: [pymongo's DriverInfo
class](https://pymongo.readthedocs.io/en/stable/api/pymongo/driver_info.html#pymongo.driver_info.DriverInfo)

Thanks!
2023-12-03 10:22:13 -08:00
ggeutzzang
03d6b94c29 Fix: (issue #14066) DOC: Summarization output broken (#14078)
- **Description:** : As described in the issue below, 
https://python.langchain.com/docs/use_cases/summarization  
I've modified the Python code in the above notebook to perform well. 

I also modified the OpenAI LLM model to the latest version as shown
below.
`gpt-3.5-turbo-16k --> gpt-3.5-turbo-1106`
This is because it seems to be a bit more responsive.
  - **Issue:** : #14066
2023-12-03 10:13:57 -08:00
James Braza
3833882ab7 Removing extra StdOutCallbackHandler overridden methods (#14136)
Unnecessarily overridden methods:

- Give the idea the subclass is doing something special (when it isn't)
- Block CTRL-click to the actual method

This PR removes some unnecessarily overridden methods in
`StdOutCallbackHandler`

Supercedes https://github.com/langchain-ai/langchain/pull/12858
2023-12-03 09:38:49 -08:00
Bob Lin
ac449f186b Update docs to use new usage in openai>1.0.0 (#14163)
### Description

Use new
[APIs](https://github.com/openai/openai-python/blob/main/api.md#finetuning)

### Twitter handle

[lin_bob57617](https://twitter.com/lin_bob57617)
2023-12-03 09:37:35 -08:00
James Braza
052e23be3e Added Python logging tracer (#14190)
This PR creates a logging handler and adds a simple unit test of it

Supercedes https://github.com/langchain-ai/langchain/pull/12862

---------

Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
2023-12-03 09:36:30 -08:00
Bob Lin
1ea48a31da Update fallback cases (#14164)
### Description

The `RateLimitError` initialization method has changed after openai v1,
and the usage of `patch` needs to be changed.

### Twitter handle

[lin_bob57617](https://twitter.com/lin_bob57617)
2023-12-03 08:56:07 -08:00
Bob Lin
62505043be Closed #14069 (#14166)
### Description

Fix #14069

### Twitter handle

[lin_bob57617](https://twitter.com/lin_bob57617)
2023-12-03 08:55:25 -08:00
Yong woo Song
9938086df0 Fix Html2TextTransformer for shallow copy (#14197)
<!-- Thank you for contributing to LangChain!

Replace this entire comment with:
  - **Description:** a description of the change, 
  - **Issue:** the issue # it fixes (if applicable),
  - **Dependencies:** any dependencies required for this change,
- **Tag maintainer:** for a quicker response, tag the relevant
maintainer (see below),
- **Twitter handle:** we announce bigger features on Twitter. If your PR
gets announced, and you'd like a mention, we'll gladly shout you out!

Please make sure your PR is passing linting and testing before
submitting. Run `make format`, `make lint` and `make test` to check this
locally.

See contribution guidelines for more information on how to write/run
tests, lint, etc:

https://github.com/langchain-ai/langchain/blob/master/.github/CONTRIBUTING.md

If you're adding a new integration, please include:
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in `docs/extras`
directory.

If no one reviews your PR within a few days, please @-mention one of
@baskaryan, @eyurtsev, @hwchase17.
 -->
Hi,
There is some unintended behavior in Html2TextTransformer.
The current code is **directly modifying the original documents that are
passed as arguments to the function.**
Therefore, not only the return of the function but also the input
variables are being modified simultaneously.
**To resolve this, I added unit test code as well.**

reference link: [Shallow vs Deep Copying of Python
Objects](https://realpython.com/copying-python-objects/)

Thanks! ☺️
2023-12-03 08:45:35 -08:00
h3l
818252b1f8 Fix: (issue #14127) Volc Engine MaaS import error (#14194)
- **Description:** fix Volc Engine MaaS import error
- **Issue:** [the issue # it fixes (if
applicable),](https://github.com/langchain-ai/langchain/issues/14127)
  - **Dependencies:** None
  - **Tag maintainer:** @baskaryan 
  - **Twitter handle:**

Co-authored-by: lvzhong <lvzhong@bytedance.com>
2023-12-03 08:43:23 -08:00
Leonid Ganeline
6ae0194dc7 docs: integrations/toolkits/office365 notebook update (#14188)
Added more descriptions and authentication details.
2023-12-03 08:43:00 -08:00
Bagatur
0bdb434383 langchain[patch]: Release langchain 0.0.345 (#14184) 2023-12-02 15:53:49 -08:00
Bagatur
15c04a5670 core[patch]: Release 0.0.9 (#14182) 2023-12-02 14:40:56 -08:00
James Braza
bdb6ae2ed3 core[patch]: BaseTracer helper method for Run lookup (#14139)
I observed the same run ID extraction logic is repeated many times in
`BaseTracer`.

This PR creates a helper method for DRY code.
2023-12-02 14:05:50 -08:00
Harutaka Kawamura
41ee3be95f langchain[patch]: Support passing parameters to llms.Databricks and llms.Mlflow (#14100)
Before, we need to use `params` to pass extra parameters:

```python
from langchain.llms import Databricks

Databricks(..., params={"temperature": 0.0})
```

Now, we can directly specify extra params:

```python
from langchain.llms import Databricks

Databricks(..., temperature=0.0)
```
2023-12-01 19:27:18 -08:00
Abdul
82102c99b3 langchain[patch]: Running SQLDatabaseChain adds prefix "SQLQuery:\n" (#14058)
- **Issue:** https://github.com/langchain-ai/langchain/issues/12077

---------

Co-authored-by: Abdul Kader Maliyakkal <maliyakk@amazon.com>
2023-12-01 19:26:16 -08:00
Samuel Kemp
fd781c89cc langchain[minor]: add azure ai data document loader (#13404)
This PR adds an "Azure AI data" document loader, which allows Azure AI
users to load their registered data assets as a document object in
langchain.

---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
2023-12-01 19:25:55 -08:00
James Braza
24385a00de core[minor], langchain[patch], experimental[patch]: Added missing py.typed to langchain_core (#14143)
See PR title.

From what I can see, `poetry` will auto-include this. Please let me know
if I am missing something here.

---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
2023-12-01 19:15:23 -08:00
quantum00549
f7c257553d langchain[patch]: fixed a bug that was causing the streaming transfer to not work… (#10827)
… properly

Fixed a bug that was causing the streaming transfer to not work
properly.
 - **Description: 
1、The on_llm_new_token method in the streaming callback can now be
called properly in streaming transfer mode.
2、In streaming transfer mode, LLM can now correctly output the complete
response instead of just the first token.
- **Tag maintainer: @wangxuqi 
- **Twitter handle: @kGX7XJjuYxzX9Km

---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
2023-12-01 18:57:50 -08:00
Eugene Yurtsev
6d0209e0aa Improve file system blob loader and generic loader (#14004)
* Add support for passing a specific file to the file system blob loader
* Allow specifying a class parameter for the parser for the generic
loader

```python

class AudioLoader(GenericLoader):
  @staticmethod
  def get_parser(**kwargs):
     return MyAudioParser(**kwargs):
```

The intent of the GenericLoader is to provide on-ramps from different
sources (e.g., web, s3, file system).

An alternative is to use pipelining syntax or creating a Pipeline

```
FileSystemBlobLoader(...) | MyAudioParser
```

---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
2023-12-01 21:23:40 -05:00
Erick Friis
700428593a fix broken api docs links (#14154) 2023-12-01 17:17:52 -08:00
Bagatur
340b42d8ee docs[minor]: lcel why page (#14089) 2023-12-01 16:13:31 -08:00
Lance Martin
cbe4753e1a Update Open CLIP embd (#14155)
Prior default model required a large amt of RAM and often crashed
Jupyter ntbk kernel.
2023-12-01 15:13:20 -08:00
Erick Friis
b01d9d27d9 docs[patch]: docs local build (#14152) 2023-12-01 14:03:36 -08:00
Alex Kira
0caef3cde7 Change RunnableMap to RunnableParallel for consistency (#14142)
- **Description:** Change instances of RunnableMap to RunnableParallel,
as that should be the one used going forward. This makes it consistent
across the codebase.
2023-12-01 13:36:40 -08:00
Erick Friis
96f6b90349 templates[patch]: relock templates (#14149) 2023-12-01 13:35:54 -08:00
Martin Jul
e3a7c96a8e docs[patch]: Fix minor typos (casing) in quickstart (#14138)
Fix casing of API and LangChain in the description text for the
LangServe example server.
2023-12-01 13:29:53 -08:00
Erick Friis
8cf4cb9e48 docs[patch]: Fix templates/index (#14146) 2023-12-01 13:09:36 -08:00
Amyh102
b6d26d3f9f infra[patch]: Add unit tests for Huggingface dataset loader (#14053)
- **Description:** Add unit tests for huggingface dataset loader and
sample huggingface dataset for future tests. Updates dependencies for
`datasets` module.
- Adds coverage for [previous pull
request](https://github.com/langchain-ai/langchain/pull/13864)
  - **Tag maintainer:** @hwchase17

---------

Co-authored-by: Amy Han <amyhan@Amys-Air.lan>
Co-authored-by: Amy Han <amyhan@Amys-MacBook-Air.local>
Co-authored-by: Bagatur <baskaryan@gmail.com>
2023-12-01 12:42:31 -08:00
Alex Kira
6eb40db353 docs[patch]: Add getting started section to LCEL doc (#14045)
### Description:
Doc addition for LCEL introduction. Adds a more basic starter guide for
using LCEL.

---------

Co-authored-by: Alex Kira <akira@Alexs-MBP.local.tld>
Co-authored-by: Bagatur <baskaryan@gmail.com>
2023-12-01 12:23:43 -08:00
Govinda Totla
62a3473ac0 docs[patch]: add text_splitter.py test (#14025)
Description: Add HTMLHeaderTextSplitter unit test
Dependencies: none
2023-12-01 11:57:50 -08:00
Bagatur
7d5341dbd3 docs[patch]: add contribs to readme (#14137) 2023-12-01 11:34:28 -08:00
axiangcoding
1b36ddf16c docs[patch]: add deprecated note for ErnieChatBot (#14061)
- **Description:** just a little change of ErnieChatBot class
description, sugguesting user to use more suitable class
  - **Issue:** none,
  - **Dependencies:** none,
  - **Tag maintainer:** @baskaryan ,
  - **Twitter handle:** none
2023-12-01 11:16:31 -08:00
Alex Kira
1757258b2a docs[patch]: Add mermaid JS theme dependency to docusaurus (#14051)
- **Description:** Add mermaid JS dependency and configs to
documentation. Allows inline doc diagrams in markdown.
  - **Dependencies:** NPM package @docusaurus/theme-mermaid
2023-12-01 11:06:29 -08:00
Devin Dahoon Kim
32da0a4d71 langchain[patch]: use async_embed_with_retry in _aget_len_safe_embeddings (#14110)
**Description**

`embed_with_retry` is for sync operations and not for async operations.
Use `async_embed_with_retry` for appropriate async operations.


I'm using `OpenAIEmbedding(http_client=httpx.AsyncClient())` with only
async operations.
However, I got an error when I use `embedding.aembed_documents` because
`embed_with_retry` uses sync OpenAI client with async http client.
2023-12-01 10:47:07 -08:00
lijie
371bcb7580 langchain[patch]: set maxsplit when parse python function docstring (#14121)
Description

when the desc of arg in python docstring contains ":", the
`_parse_python_function_docstring` will raise **ValueError: too many
values to unpack (expected 2)**.

A sample desc would be:
"""
Args: 
    error_arg: this is an arg with an additional ":" symbol
"""

So, set `maxsplit` parameter to fix it.
2023-12-01 10:46:53 -08:00
Harrison Chase
ae646701c4 Harrison/ibm (#14133)
Co-authored-by: Mateusz Szewczyk <139469471+MateuszOssGit@users.noreply.github.com>
2023-12-01 12:44:11 -05:00
Eugene Yurtsev
943aa01c14 Improve indexing performance for Postgres (remote database) for refresh for async API (#14132)
This PR speeds up the indexing api on the async path by batching the uid
updates in the sql record manager (which may be remote).
2023-12-01 12:10:07 -05:00
William FH
528fc76d6a Update Prompt Format Error (#14044)
The number of times I try to format a string (especially in lcel) is
embarrassingly high. Think this may be more actionable than the default
error message. Now I get nice helpful errors


```
KeyError: "Input to ChatPromptTemplate is missing variable 'input'.  Expected: ['input'] Received: ['dialogue']"
```
2023-12-01 09:06:35 -08:00
William FH
71c2e184b4 [Nits] Evaluation - Some Rendering Improvements (#14097)
- Improve rendering of aggregate results at the end
- flatten reference if present
2023-12-01 09:06:07 -08:00
Bob Lin
f15859bd86 docs[patch]: Update discord.ipynb (#14099)
### Description

Now if `example` in Message is False, it will not be displayed. Update
the output in this document.

```python
In [22]: m = HumanMessage(content="Text")

In [23]: m
Out[23]: HumanMessage(content='Text')

In [24]: m = HumanMessage(content="Text", example=True)

In [25]: m
Out[25]: HumanMessage(content='Text', example=True)
```

### Twitter handle

[lin_bob57617](https://twitter.com/lin_bob57617)
2023-12-01 08:54:31 -08:00
Lance Martin
b07a5a9509 Template for Ollama + Multi-query retriever (#14092) 2023-12-01 08:53:17 -08:00
Bob Lin
75312c3694 docs[patch]: Update facebook.ipynb (#14102)
### Description

Openai version 1.0.0 and later no longer supports the usage of camel
case, So [the
APIs](https://github.com/openai/openai-python/blob/main/api.md#finetuning)
needs to be modified.

### Twitter handle

[lin_bob57617](https://twitter.com/lin_bob57617)
2023-12-01 08:49:56 -08:00
Erick Friis
a3ae8e0a41 templates[patch]: opensearch readme update (#14103) 2023-12-01 08:48:00 -08:00
Ean Yang
ac1c8634a8 docs[patch] Update invalid guides link (#14106) 2023-12-01 08:47:38 -08:00
Mark Scannell
9b0e46dcf0 Improve indexing performance for Postgres (remote database) for refresh (#14126)
**Description:** By combining the document timestamp refresh within a
single call to update(), this enables batching of multiple documents in
a single SQL statement. This is important for non-local databases where
tens of milliseconds has a huge impact on performance when doing
document-by-document SQL statements.
**Issue:** #11935 
**Dependencies:** None
**Tag maintainer:** @eyurtsev
2023-12-01 11:36:02 -05:00
Erick Friis
b161f302ff docs[patch]: local docs build <5s (#14096) 2023-11-30 17:39:30 -08:00
Hubert Yuan
80ed588733 docs[patch]: Update metaphor_search.ipynb (#14093)
- **Description:** Touch up of the documentation page for Metaphor
Search Tool integration. Removes documentation for old built-in tool
wrapper.

---------

Co-authored-by: Erick Friis <erick@langchain.dev>
2023-11-30 16:34:05 -08:00
Jacob Lee
3328507f11 langchain[patch], experimental[minor]: Adds OllamaFunctions wrapper (#13330)
CC @baskaryan @hwchase17 @jmorganca 

Having a bit of trouble importing `langchain_experimental` from a
notebook, will figure it out tomorrow

~Ah and also is blocked by #13226~

---------

Co-authored-by: Lance Martin <lance@langchain.dev>
Co-authored-by: Bagatur <baskaryan@gmail.com>
2023-11-30 16:13:57 -08:00
Bagatur
4063bf144a langchain[patch]: release 0.0.344 (#14095) 2023-11-30 15:57:11 -08:00
Bagatur
efce352d6b core[patch]: release 0.0.8 (#14086) 2023-11-30 15:12:06 -08:00
Harutaka Kawamura
0d08a692a3 langchain[minor]: Migrate mlflow and databricks classes to deployments APIs. (#13699)
## Description

Related to https://github.com/mlflow/mlflow/pull/10420. MLflow AI
gateway will be deprecated and replaced by the `mlflow.deployments`
module. Happy to split this PR if it's too large.

```
pip install git+https://github.com/langchain-ai/langchain.git@refs/pull/13699/merge#subdirectory=libs/langchain
```

## Dependencies

Install mlflow from https://github.com/mlflow/mlflow/pull/10420:

```
pip install git+https://github.com/mlflow/mlflow.git@refs/pull/10420/merge
```

## Testing plan

The following code works fine on local and databricks:

<details><summary>Click</summary>
<p>

```python
"""
Setup
-----
mlflow deployments start-server --config-path examples/gateway/openai/config.yaml
databricks secrets create-scope <scope>
databricks secrets put-secret <scope> openai-api-key --string-value $OPENAI_API_KEY

Run
---
python /path/to/this/file.py secrets/<scope>/openai-api-key
"""
from langchain.chat_models import ChatMlflow, ChatDatabricks
from langchain.embeddings import MlflowEmbeddings, DatabricksEmbeddings
from langchain.llms import Databricks, Mlflow
from langchain.schema.messages import HumanMessage
from langchain.chains.loading import load_chain
from mlflow.deployments import get_deploy_client
import uuid
import sys
import tempfile
from langchain.chains import LLMChain
from langchain.prompts import PromptTemplate

###############################
# MLflow
###############################
chat = ChatMlflow(
    target_uri="http://127.0.0.1:5000", endpoint="chat", params={"temperature": 0.1}
)
print(chat([HumanMessage(content="hello")]))

embeddings = MlflowEmbeddings(target_uri="http://127.0.0.1:5000", endpoint="embeddings")
print(embeddings.embed_query("hello")[:3])
print(embeddings.embed_documents(["hello", "world"])[0][:3])

llm = Mlflow(
    target_uri="http://127.0.0.1:5000",
    endpoint="completions",
    params={"temperature": 0.1},
)
print(llm("I am"))

llm_chain = LLMChain(
    llm=llm,
    prompt=PromptTemplate(
        input_variables=["adjective"],
        template="Tell me a {adjective} joke",
    ),
)
print(llm_chain.run(adjective="funny"))

# serialization/deserialization
with tempfile.TemporaryDirectory() as tmpdir:
    print(tmpdir)
    path = f"{tmpdir}/llm.yaml"
    llm_chain.save(path)
    loaded_chain = load_chain(path)
    print(loaded_chain("funny"))

###############################
# Databricks
###############################
secret = sys.argv[1]
client = get_deploy_client("databricks")

# External - chat
name = f"chat-{uuid.uuid4()}"
client.create_endpoint(
    name=name,
    config={
        "served_entities": [
            {
                "name": "test",
                "external_model": {
                    "name": "gpt-4",
                    "provider": "openai",
                    "task": "llm/v1/chat",
                    "openai_config": {
                        "openai_api_key": "{{" + secret + "}}",
                    },
                },
            }
        ],
    },
)
try:
    chat = ChatDatabricks(
        target_uri="databricks", endpoint=name, params={"temperature": 0.1}
    )
    print(chat([HumanMessage(content="hello")]))
finally:
    client.delete_endpoint(endpoint=name)

# External - embeddings
name = f"embeddings-{uuid.uuid4()}"
client.create_endpoint(
    name=name,
    config={
        "served_entities": [
            {
                "name": "test",
                "external_model": {
                    "name": "text-embedding-ada-002",
                    "provider": "openai",
                    "task": "llm/v1/embeddings",
                    "openai_config": {
                        "openai_api_key": "{{" + secret + "}}",
                    },
                },
            }
        ],
    },
)
try:
    embeddings = DatabricksEmbeddings(target_uri="databricks", endpoint=name)
    print(embeddings.embed_query("hello")[:3])
    print(embeddings.embed_documents(["hello", "world"])[0][:3])
finally:
    client.delete_endpoint(endpoint=name)

# External - completions
name = f"completions-{uuid.uuid4()}"
client.create_endpoint(
    name=name,
    config={
        "served_entities": [
            {
                "name": "test",
                "external_model": {
                    "name": "gpt-3.5-turbo-instruct",
                    "provider": "openai",
                    "task": "llm/v1/completions",
                    "openai_config": {
                        "openai_api_key": "{{" + secret + "}}",
                    },
                },
            }
        ],
    },
)
try:
    llm = Databricks(
        endpoint_name=name,
        model_kwargs={"temperature": 0.1},
    )
    print(llm("I am"))
finally:
    client.delete_endpoint(endpoint=name)


# Foundation model - chat
chat = ChatDatabricks(
    endpoint="databricks-llama-2-70b-chat", params={"temperature": 0.1}
)
print(chat([HumanMessage(content="hello")]))

# Foundation model - embeddings
embeddings = DatabricksEmbeddings(endpoint="databricks-bge-large-en")
print(embeddings.embed_query("hello")[:3])

# Foundation model - completions
llm = Databricks(
    endpoint_name="databricks-mpt-7b-instruct", model_kwargs={"temperature": 0.1}
)
print(llm("hello"))
llm_chain = LLMChain(
    llm=llm,
    prompt=PromptTemplate(
        input_variables=["adjective"],
        template="Tell me a {adjective} joke",
    ),
)
print(llm_chain.run(adjective="funny"))

# serialization/deserialization
with tempfile.TemporaryDirectory() as tmpdir:
    print(tmpdir)
    path = f"{tmpdir}/llm.yaml"
    llm_chain.save(path)
    loaded_chain = load_chain(path)
    print(loaded_chain("funny"))

```

Output:

```
content='Hello! How can I assist you today?'
[-0.025058426, -0.01938856, -0.027781019]
[-0.025058426, -0.01938856, -0.027781019]
sorry, but I cannot continue the sentence as it is incomplete. Can you please provide more information or context?
Sure, here's a classic one for you:

Why don't scientists trust atoms?

Because they make up everything!
/var/folders/dz/cd_nvlf14g9g__n3ph0d_0pm0000gp/T/tmpx_4no6ad
{'adjective': 'funny', 'text': "Sure, here's a classic one for you:\n\nWhy don't scientists trust atoms?\n\nBecause they make up everything!"}
content='Hello! How can I assist you today?'
[-0.025058426, -0.01938856, -0.027781019]
[-0.025058426, -0.01938856, -0.027781019]
 a 23 year old female and I am currently studying for my master's degree
content="\nHello! It's nice to meet you. Is there something I can help you with or would you like to chat for a bit?"
[0.051055908203125, 0.007221221923828125, 0.003879547119140625]
[0.051055908203125, 0.007221221923828125, 0.003879547119140625]

hello back
 Well, I don't really know many jokes, but I do know this funny story...
/var/folders/dz/cd_nvlf14g9g__n3ph0d_0pm0000gp/T/tmp7_ds72ex
{'adjective': 'funny', 'text': " Well, I don't really know many jokes, but I do know this funny story..."}
```

</p>
</details>

The existing workflow doesn't break:

<details><summary>click</summary>
<p>

```python
import uuid

import mlflow
from mlflow.models import ModelSignature
from mlflow.types.schema import ColSpec, Schema


class MyModel(mlflow.pyfunc.PythonModel):
    def predict(self, context, model_input):
        return str(uuid.uuid4())


with mlflow.start_run():
    mlflow.pyfunc.log_model(
        "model",
        python_model=MyModel(),
        pip_requirements=["mlflow==2.8.1", "cloudpickle<3"],
        signature=ModelSignature(
            inputs=Schema(
                [
                    ColSpec("string", "prompt"),
                    ColSpec("string", "stop"),
                ]
            ),
            outputs=Schema(
                [
                    ColSpec(name=None, type="string"),
                ]
            ),
        ),
        registered_model_name=f"lang-{uuid.uuid4()}",
    )

# Manually create a serving endpoint with the registered model and run
from langchain.llms import Databricks

llm = Databricks(endpoint_name="<name>")
llm("hello")  # 9d0b2491-3d13-487c-bc02-1287f06ecae7
```

</p>
</details> 

## Follow-up tasks

(This PR is too large. I'll file a separate one for follow-up tasks.)

- Update `docs/docs/integrations/providers/mlflow_ai_gateway.mdx` and
`docs/docs/integrations/providers/databricks.md`.

---------

Signed-off-by: harupy <17039389+harupy@users.noreply.github.com>
Co-authored-by: Bagatur <baskaryan@gmail.com>
2023-11-30 15:06:58 -08:00
Tyler Hutcherson
dc31714ec5 templates[patch]: Rag redis template dependency update (#13614)
- **Description:** Update RAG Redis template readme and dependencies.
2023-11-30 12:22:13 -08:00
Jeremy Naccache
a14cf87576 core[patch]: Add **kwargs to Langchain's dumps() to allow passing of json.dumps() … (#10628)
…parameters.

In Langchain's `dumps()` function, I've added a `**kwargs` parameter.
This allows users to pass additional parameters to the underlying
`json.dumps()` function, providing greater flexibility and control over
JSON serialization.

Many parameters available in `json.dumps()` can be useful or even
necessary in specific situations. For example, when using an Agent with
return_intermediate_steps set to true, the output is a list of
AgentAction objects. These objects can't be serialized without using
Langchain's `dumps()` function.

The issue arises when using the Agent with a language other than
English, which may contain non-ASCII characters like 'é'. The default
behavior of `json.dumps()` sets ensure_ascii to true, converting
`{"name": "José"}` into `{"name": "Jos\u00e9"}`. This can make the
output hard to read, especially in the case of intermediate steps in
agent logs.

By allowing users to pass additional parameters to `json.dumps()` via
Langchain's dumps(), we can solve this problem. For instance, users can
set `ensure_ascii=False` to maintain the original characters.

This update also enables users to pass other useful `json.dumps()`
parameters like `sort_keys`, providing even more flexibility.

The implementation takes into account edge cases where a user might pass
a "default" parameter, which is already defined by `dumps()`, or an
"indent" parameter, which is also predefined if `pretty=True` is set.

---------

Co-authored-by: Erick Friis <erick@langchain.dev>
2023-11-30 08:52:24 -08:00
Erick Friis
8078caf764 templates[patch]: rag-google-cloud-sdp readme (#14043) 2023-11-30 08:17:51 -08:00
Yong woo Song
f4d520ccb5 Fix .env file path in integration_test README.md (#14028)
<!-- Thank you for contributing to LangChain!



Replace this entire comment with:
  - **Description:** a description of the change, 
  - **Issue:** the issue # it fixes (if applicable),
  - **Dependencies:** any dependencies required for this change,
- **Tag maintainer:** for a quicker response, tag the relevant
maintainer (see below),
- **Twitter handle:** we announce bigger features on Twitter. If your PR
gets announced, and you'd like a mention, we'll gladly shout you out!

Please make sure your PR is passing linting and testing before
submitting. Run `make format`, `make lint` and `make test` to check this
locally.

See contribution guidelines for more information on how to write/run
tests, lint, etc:

https://github.com/langchain-ai/langchain/blob/master/.github/CONTRIBUTING.md

If you're adding a new integration, please include:
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in `docs/extras`
directory.

If no one reviews your PR within a few days, please @-mention one of
@baskaryan, @eyurtsev, @hwchase17.
 -->

### Description
Hello, 

The [integration_test
README](https://github.com/langchain-ai/langchain/tree/master/libs/langchain/tests)
was indicating incorrect paths for the `.env.example` and `.env` files.

`tests/.env.example` ->`tests/integration_tests/.env.example`

While it’s a minor error, it could **potentially lead to confusion** for
the document’s readers, so I’ve made the necessary corrections.

Thank you! ☺️

### Related Issue
- https://github.com/langchain-ai/langchain/pull/2806
2023-11-29 22:14:28 -05:00
Rohan Dey
41a4c06a94 Added support for a Pandas DataFrame OutputParser (#13257)
**Description:**

Added support for a Pandas DataFrame OutputParser with format
instructions, along with unit tests and a demo notebook. Namely, we've
added the ability to request data from a DataFrame, have the LLM parse
the request, and then use that request to retrieve a well-formatted
response.

Within LangChain, it seamlessly integrates with language models like
OpenAI's `text-davinci-003`, facilitating streamlined interaction using
the format instructions (just like the other output parsers).

This parser structures its requests as
`<operation/column/row>[<optional_array_params>]`. The instructions
detail permissible operations, valid columns, and array formats,
ensuring clarity and adherence to the required format.

For example:

- When the LLM receives the input: "Retrieve the mean of `num_legs` from
rows 1 to 3."
- The provided format instructions guide the LLM to structure the
request as: "mean:num_legs[1..3]".

The parser processes this formatted request, leveraging the LLM's
understanding to extract the mean of `num_legs` from rows 1 to 3 within
the Pandas DataFrame.

This integration allows users to communicate requests naturally, with
the LLM transforming these instructions into structured commands
understood by the `PandasDataFrameOutputParser`. The format instructions
act as a bridge between natural language queries and precise DataFrame
operations, optimizing communication and data retrieval.

**Issue:**

- https://github.com/langchain-ai/langchain/issues/11532

**Dependencies:**

No additional dependencies :)

**Tag maintainer:**

@baskaryan 

**Twitter handle:**

No need. :)

---------

Co-authored-by: Wasee Alam <waseealam@protonmail.com>
Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
2023-11-29 22:08:50 -05:00
Masanori Taniguchi
235bdb9fa7 Support Vald secure connection (#13269)
**Description:** 
When using Vald, only insecure grpc connection was supported, so secure
connection is now supported.
In addition, grpc metadata can be added to Vald requests to enable
authentication with a token.

<!-- Thank you for contributing to LangChain!

Replace this entire comment with:
  - **Description:** a description of the change, 
  - **Issue:** the issue # it fixes (if applicable),
  - **Dependencies:** any dependencies required for this change,
- **Tag maintainer:** for a quicker response, tag the relevant
maintainer (see below),
- **Twitter handle:** we announce bigger features on Twitter. If your PR
gets announced, and you'd like a mention, we'll gladly shout you out!

Please make sure your PR is passing linting and testing before
submitting. Run `make format`, `make lint` and `make test` to check this
locally.

See contribution guidelines for more information on how to write/run
tests, lint, etc:

https://github.com/langchain-ai/langchain/blob/master/.github/CONTRIBUTING.md

If you're adding a new integration, please include:
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in `docs/extras`
directory.

If no one reviews your PR within a few days, please @-mention one of
@baskaryan, @eyurtsev, @hwchase17.
 -->
2023-11-29 22:07:29 -05:00
Nico Puhlmann
54355b651a Update index.mdx (#13285)
grammar correction

<!-- Thank you for contributing to LangChain!

Replace this entire comment with:
  - **Description:** a description of the change, 
  - **Issue:** the issue # it fixes (if applicable),
  - **Dependencies:** any dependencies required for this change,
- **Tag maintainer:** for a quicker response, tag the relevant
maintainer (see below),
- **Twitter handle:** we announce bigger features on Twitter. If your PR
gets announced, and you'd like a mention, we'll gladly shout you out!

Please make sure your PR is passing linting and testing before
submitting. Run `make format`, `make lint` and `make test` to check this
locally.

See contribution guidelines for more information on how to write/run
tests, lint, etc:

https://github.com/langchain-ai/langchain/blob/master/.github/CONTRIBUTING.md

If you're adding a new integration, please include:
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in `docs/extras`
directory.

If no one reviews your PR within a few days, please @-mention one of
@baskaryan, @eyurtsev, @hwchase17.
 -->

Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
2023-11-29 22:06:33 -05:00
sudranga
d1d693b2a7 Fix issue where response_if_no_docs_found is not implemented on async… (#13297)
Response_if_no_docs_found is not implemented in
ConversationalRetrievalChain for async code paths. Implemented it and
added test cases

Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
2023-11-29 22:06:13 -05:00
AthulVincent
67c55cb5b0 Implemented MongoDB Atlas Self-Query Retriever (#13321)
# Description 
This PR implements Self-Query Retriever for MongoDB Atlas vector store.

I've implemented the comparators and operators that are supported by
MongoDB Atlas vector store according to the section titled "Atlas Vector
Search Pre-Filter" from
https://www.mongodb.com/docs/atlas/atlas-vector-search/vector-search-stage/.

Namely:
```
allowed_comparators = [
      Comparator.EQ,
      Comparator.NE,
      Comparator.GT,
      Comparator.GTE,
      Comparator.LT,
      Comparator.LTE,
      Comparator.IN,
      Comparator.NIN,
  ]

"""Subset of allowed logical operators."""
allowed_operators = [
    Operator.AND,
    Operator.OR
]
```
Translations from comparators/operators to MongoDB Atlas filter
operators(you can find the syntax in the "Atlas Vector Search
Pre-Filter" section from the previous link) are done using the following
dictionary:
```
map_dict = {
            Operator.AND: "$and",
            Operator.OR: "$or",
            Comparator.EQ: "$eq",
            Comparator.NE: "$ne",
            Comparator.GTE: "$gte",
            Comparator.LTE: "$lte",
            Comparator.LT: "$lt",
            Comparator.GT: "$gt",
            Comparator.IN: "$in",
            Comparator.NIN: "$nin",
        }
```

In visit_structured_query() the filters are passed as "pre_filter" and
not "filter" as in the MongoDB link above since langchain's
implementation of MongoDB atlas vector
store(libs\langchain\langchain\vectorstores\mongodb_atlas.py) in
_similarity_search_with_score() sets the "filter" key to have the value
of the "pre_filter" argument.
```
params["filter"] = pre_filter
```
Test cases and documentation have also been added.

# Issue
#11616 

# Dependencies
No new dependencies have been added.

# Documentation
I have created the notebook mongodb_atlas_self_query.ipynb outlining the
steps to get the self-query mechanism working.

I worked closely with [@Farhan-Faisal](https://github.com/Farhan-Faisal)
on this PR.

---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
2023-11-29 22:05:06 -05:00
Josef Zoller
c2e3963da4 Merriam-Webster Dictionary Tool (#12044)
# Description

We implemented a simple tool for accessing the Merriam-Webster
Collegiate Dictionary API
(https://dictionaryapi.com/products/api-collegiate-dictionary).

Here's a simple usage example:

```py
from langchain.llms import OpenAI
from langchain.agents import load_tools, initialize_agent, AgentType

llm = OpenAI()
tools = load_tools(["serpapi", "merriam-webster"], llm=llm) # Serp API gives our agent access to Google
agent = initialize_agent(
  tools, llm, agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION, verbose=True
)
agent.run("What is the english word for the german word Himbeere? Define that word.")
```

Sample output:

```
> Entering new AgentExecutor chain...
 I need to find the english word for Himbeere and then get the definition of that word.
Action: Search
Action Input: "English word for Himbeere"
Observation: {'type': 'translation_result'}
Thought: Now I have the english word, I can look up the definition.
Action: MerriamWebster
Action Input: raspberry
Observation: Definitions of 'raspberry':

1. rasp-ber-ry, noun: any of various usually black or red edible berries that are aggregate fruits consisting of numerous small drupes on a fleshy receptacle and that are usually rounder and smaller than the closely related blackberries
2. rasp-ber-ry, noun: a perennial plant (genus Rubus) of the rose family that bears raspberries
3. rasp-ber-ry, noun: a sound of contempt made by protruding the tongue between the lips and expelling air forcibly to produce a vibration; broadly : an expression of disapproval or contempt
4. black raspberry, noun: a raspberry (Rubus occidentalis) of eastern North America that has a purplish-black fruit and is the source of several cultivated varieties —called also blackcap

Thought: I now know the final answer.
Final Answer: Raspberry is an english word for Himbeere and it is defined as any of various usually black or red edible berries that are aggregate fruits consisting of numerous small drupes on a fleshy receptacle and that are usually rounder and smaller than the closely related blackberries.

> Finished chain.
```

# Issue

This closes #12039.

# Dependencies

We added no extra dependencies.

<!-- Thank you for contributing to LangChain!

Replace this entire comment with:
  - **Description:** a description of the change, 
  - **Issue:** the issue # it fixes (if applicable),
  - **Dependencies:** any dependencies required for this change,
- **Tag maintainer:** for a quicker response, tag the relevant
maintainer (see below),
- **Twitter handle:** we announce bigger features on Twitter. If your PR
gets announced, and you'd like a mention, we'll gladly shout you out!

Please make sure your PR is passing linting and testing before
submitting. Run `make format`, `make lint` and `make test` to check this
locally.

See contribution guidelines for more information on how to write/run
tests, lint, etc:

https://github.com/langchain-ai/langchain/blob/master/.github/CONTRIBUTING.md

If you're adding a new integration, please include:
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in `docs/extras`
directory.

If no one reviews your PR within a few days, please @-mention one of
@baskaryan, @eyurtsev, @hwchase17.
 -->

---------

Co-authored-by: Lara <63805048+larkgz@users.noreply.github.com>
Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
2023-11-29 20:28:29 -05:00
Mohammad Mohtashim
f3dd4a10cf DROP BOX Loader Documentation Update (#14047)
- **Description:** Update the document for drop box loader + made the
messages more verbose when loading pdf file since people were getting
confused
  - **Issue:** #13952
  - **Tag maintainer:** @baskaryan, @eyurtsev, @hwchase17,

---------

Co-authored-by: Erick Friis <erick@langchain.dev>
2023-11-29 17:25:35 -08:00
Cheng (William) Huang
a00db4b28f Add multi-input Reddit search tool (#13893)
- **Description:** Added a tool called RedditSearchRun and an
accompanying API wrapper, which searches Reddit for posts with support
for time filtering, post sorting, query string and subreddit filtering.
  - **Issue:** #13891 
  - **Dependencies:** `praw` module is used to search Reddit
- **Tag maintainer:** @baskaryan , and any of the other maintainers if
needed
  - **Twitter handle:** None.

  Hello,

This is our first PR and we hope that our changes will be helpful to the
community. We have run `make format`, `make lint` and `make test`
locally before submitting the PR. To our knowledge, our changes do not
introduce any new errors.

Our PR integrates the `praw` package which is already used by
RedditPostsLoader in LangChain. Nonetheless, we have added integration
tests and edited unit tests to test our changes. An example notebook is
also provided. These changes were put together by me, @Anika2000,
@CharlesXu123, and @Jeremy-Cheng-stack

Thank you in advance to the maintainers for their time.

---------

Co-authored-by: What-Is-A-Username <49571870+What-Is-A-Username@users.noreply.github.com>
Co-authored-by: Anika2000 <anika.sultana@mail.utoronto.ca>
Co-authored-by: Jeremy Cheng <81793294+Jeremy-Cheng-stack@users.noreply.github.com>
Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
2023-11-29 20:16:40 -05:00
Jawad Arshad
00a6e8962c langchain[minor]: Add serpapi tools (#13934)
- **Description:** Added some of the more endpoints supported by serpapi
that are not suported on langchain at the moment, like google trends,
google finance, google jobs, and google lens
- **Issue:** [Add support for many of the querying endpoints with
serpapi #11811](https://github.com/langchain-ai/langchain/issues/11811)

---------

Co-authored-by: zushenglu <58179949+zushenglu@users.noreply.github.com>
Co-authored-by: Erick Friis <erick@langchain.dev>
Co-authored-by: Ian Xu <ian.xu@mail.utoronto.ca>
Co-authored-by: zushenglu <zushenglu1809@gmail.com>
Co-authored-by: KevinT928 <96837880+KevinT928@users.noreply.github.com>
Co-authored-by: Bagatur <baskaryan@gmail.com>
2023-11-29 14:02:57 -08:00
h3l
dbaeb163aa langchain[minor]: add volcengine endpoint as LLM (#13942)
- **Description:** Volc Engine MaaS serves as an enterprise-grade,
large-model service platform designed for developers. You can visit its
homepage at https://www.volcengine.com/docs/82379/1099455 for details.
This change will facilitate developers to integrate quickly with the
platform.
  - **Issue:** None
  - **Dependencies:** volcengine
  - **Tag maintainer:** @baskaryan 
  - **Twitter handle:** @he1v3tica

---------

Co-authored-by: lvzhong <lvzhong@bytedance.com>
2023-11-29 13:16:42 -08:00
Mohammad Ahmad
1600ebe6c7 langchain[patch]: Mask API key for ForeFrontAI LLM (#14013)
- **Description:** Mask API key for ForeFrontAI LLM and associated unit
tests
  - **Issue:** https://github.com/langchain-ai/langchain/issues/12165
  - **Dependencies:** N/A
  - **Tag maintainer:** @eyurtsev 
  - **Twitter handle:** `__mmahmad__`

I made the API key non-optional since linting required adding validation
for None, but the key is required per documentation:
https://python.langchain.com/docs/integrations/llms/forefrontai
2023-11-29 13:12:19 -08:00
yoch
a0e859df51 langchain[patch]: fix cohere reranker init #12899 (#14029)
- **Description:** use post field validation for `CohereRerank`
  - **Issue:** #12899 and #13058
  - **Dependencies:** 
  - **Tag maintainer:** @baskaryan

---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
2023-11-29 12:57:06 -08:00
123-fake-st
9bd6e9df36 update pdf document loaders' metadata source to url for online pdf (#13274)
- **Description:** Update 5 pdf document loaders in
`langchain.document_loaders.pdf`, to store a url in the metadata
(instead of a temporary, local file path) if the user provides a web
path to a pdf: `PyPDFium2Loader`, `PDFMinerLoader`,
`PDFMinerPDFasHTMLLoader`, `PyMuPDFLoader`, and `PDFPlumberLoader` were
updated.
- The updates follow the approach used to update `PyPDFLoader` for the
same behavior in #12092
- The `PyMuPDFLoader` changes required additional work in updating
`langchain.document_loaders.parsers.pdf.PyMuPDFParser` to be able to
process either an `io.BufferedReader` (from local pdf) or `io.BytesIO`
(from online pdf)
- The `PDFMinerPDFasHTMLLoader` change used a simpler approach since the
metadata is assigned by the loader and not the parser
  - **Issue:** Fixes #7034
  - **Dependencies:** None


```python
# PyPDFium2Loader example:
# old behavior
>>> from langchain.document_loaders import PyPDFium2Loader
>>> loader = PyPDFium2Loader('https://arxiv.org/pdf/1706.03762.pdf')
>>> docs = loader.load()
>>> docs[0].metadata
{'source': '/var/folders/7z/d5dt407n673drh1f5cm8spj40000gn/T/tmpm5oqa92f/tmp.pdf', 'page': 0}

# new behavior
>>> from langchain.document_loaders import PyPDFium2Loader
>>> loader = PyPDFium2Loader('https://arxiv.org/pdf/1706.03762.pdf')
>>> docs = loader.load()
>>> docs[0].metadata
{'source': 'https://arxiv.org/pdf/1706.03762.pdf', 'page': 0}
```
2023-11-29 15:07:46 -05:00
Toshish Jawale
6f64cb5078 Remove deprecated param and flexibility for prompt (#13310)
- **Description:** Updated to remove deprecated parameter penalty_alpha,
and use string variation of prompt rather than json object for better
flexibility. - **Issue:** the issue # it fixes (if applicable),
  - **Dependencies:** N/A
  - **Tag maintainer:** @eyurtsev
  - **Twitter handle:** @symbldotai

---------

Co-authored-by: toshishjawale <toshish@symbl.ai>
Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
2023-11-29 14:48:25 -05:00
Tomaz Bratanic
3eb391561b langchain[minor]: Reduce the number of tokens required to describe a Cypher/Neo4j schema (#13851)
Instead of using JSON-like syntax to describe node and relationship
properties we changed to a shorter and more concise schema description

Old:

```
        Node properties are the following:
        [{'properties': [{'property': 'name', 'type': 'STRING'}], 'labels': 'Movie'}, {'properties': [{'property': 'name', 'type': 'STRING'}], 'labels': 'Actor'}]
        Relationship properties are the following:
        []
        The relationships are the following:
        ['(:Actor)-[:ACTED_IN]->(:Movie)']
```

New:

```
Node properties are the following:
Movie {name: STRING},Actor {name: STRING}
Relationship properties are the following:

The relationships are the following:
(:Actor)-[:ACTED_IN]->(:Movie)
```
2023-11-29 11:13:12 -08:00
Sauhaard
7ec4dbeb80 langchain[minor]: Add StackExchange API integration (#14002)
Implements
[#12115](https://github.com/langchain-ai/langchain/issues/12115)

Who can review?
@baskaryan , @eyurtsev , @hwchase17 

Integrated Stack Exchange API into Langchain, enabling access to diverse
communities within the platform. This addition enhances Langchain's
capabilities by allowing users to query Stack Exchange for specialized
information and engage in discussions. The integration provides seamless
interaction with Stack Exchange content, offering content from varied
knowledge repositories.

A notebook example and test cases were included to demonstrate the
functionality and reliability of this integration.

- Add StackExchange as a tool.
- Add unit test for the StackExchange wrapper and tool.
- Add documentation for the StackExchange wrapper and tool.

If you have time, could you please review the code and provide any
feedback as necessary! My team is welcome to any suggestions.

---------

Co-authored-by: Yuval Kamani <yuvalkamani@gmail.com>
Co-authored-by: Aryan Thakur <aryanthakur@Aryans-MacBook-Pro.local>
Co-authored-by: Manas1818 <79381912+manas1818@users.noreply.github.com>
Co-authored-by: aryan-thakur <61063777+aryan-thakur@users.noreply.github.com>
Co-authored-by: Bagatur <baskaryan@gmail.com>
2023-11-29 10:32:07 -08:00
Bagatur
d4405bc94e langchain[patch]: Release 0.0.343 (#14037) 2023-11-29 10:31:03 -08:00
Erick Friis
3c29b0ded5 templates[patch]: template pyproject updates (#14035) 2023-11-29 10:21:18 -08:00
Yves Zumbühl
9c0ad0cebb langchain[patch]: Improve HyDe with custom prompts and ability to supply the run_manager (#14016)
- **Description:** The class allows to only select between a few
predefined prompts from the paper. That is not ideal, since other use
cases might need a custom prompt. The changes made allow for this. To be
able to monitor those, I also added functionality to supply a custom
run_manager.
  - **Issue:** no issue, but a new feature,
  - **Dependencies:** none,
  - **Tag maintainer:** @hwchase17,
  - **Twitter handle:** @yvesloy

---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
2023-11-29 09:40:53 -08:00
Anton Romanov
4964278ce4 docs[patch]: Update typo in map.ipynb (#14030)
fix the typo in docs, using "with" instead of "when"
2023-11-29 09:14:29 -08:00
Chad Norvell
1c4bfb8c5f langchain[patch]: Mathpix PDF loader supports arbitrary extra params (#13950)
- **Description:** Support providing whatever extra parameters you want
to the Mathpix PDF loader API request.
  - **Issue:** #12773
  - **Dependencies:** None

---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
2023-11-29 02:12:32 -08:00
Unai Garay Maestre
9e2ae866c4 langchain[patch]: Adds progress bar to GooglePalmEmbeddings (#13812)
- **Description:** Adds a tqdm progress bar to GooglePalmEmbeddings when
embedding a list.
  - **Issue:** #13637
  - **Dependencies:** TQDM as a main dependency (instead of extra)


Signed-off-by: ugm2 <unaigaraymaestre@gmail.com>

---------

Signed-off-by: ugm2 <unaigaraymaestre@gmail.com>
Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
2023-11-29 01:58:53 -08:00
Richie
1cd9d5f332 docs[patch]: fix typo langchain version for mongodb integration (#14006)
- **Description:** update minimal supported langchain version for
[mongodb atlast integration
webpage](https://python.langchain.com/docs/integrations/vectorstores/mongodb_atlas)
- **Issue:** none
- **Dependencies:** none

-----

Just fixing a typo. 
In [mongodb atlas vectorstore integration
page](https://python.langchain.com/docs/integrations/vectorstores/mongodb_atlas),
`langchain` support for `$vectorSearch MQL stage` should be `0.0.305`
rather than `0.0.35`
2023-11-28 21:20:30 -08:00
David Norman
a578076aea Mask api key for Together LLM (#13981)
- **Description:** Add unit tests and mask api key for Together LLM
- **Issue:** the issue
https://github.com/langchain-ai/langchain/issues/12165 ,
  - **Dependencies:** N/A
  - **Tag maintainer:** ?,
  - **Twitter handle:** N/A

---------

Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>
2023-11-28 22:57:40 -05:00
Pavel Zwerschke
5f5c701f2c docs: Install langsmith from conda-forge (#13335)
<!-- Thank you for contributing to LangChain!

Replace this entire comment with:
  - **Description:** a description of the change, 
  - **Issue:** the issue # it fixes (if applicable),
  - **Dependencies:** any dependencies required for this change,
- **Tag maintainer:** for a quicker response, tag the relevant
maintainer (see below),
- **Twitter handle:** we announce bigger features on Twitter. If your PR
gets announced, and you'd like a mention, we'll gladly shout you out!

Please make sure your PR is passing linting and testing before
submitting. Run `make format`, `make lint` and `make test` to check this
locally.

See contribution guidelines for more information on how to write/run
tests, lint, etc:

https://github.com/langchain-ai/langchain/blob/master/.github/CONTRIBUTING.md

If you're adding a new integration, please include:
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in `docs/extras`
directory.

If no one reviews your PR within a few days, please @-mention one of
@baskaryan, @eyurtsev, @hwchase17.
 -->

langsmith is available on conda-forge as well and also a dependency of
the package so it gets installed either way by conda
306ed13308/recipe/meta.yaml (L43)
2023-11-28 22:44:02 -05:00
Piotr Ząbek
d0b818b634 DOCS: added missing imports (#13736) (#13737)
- **Description:** Fixed missing imports in docs 
- **Issue:**
[#13736](https://github.com/langchain-ai/langchain/issues/13736)
- **Dependencies:** N/A
2023-11-28 22:42:43 -05:00
Johnny
6463d2d0bd small fix matching engine AttributeError - object has no attribute (#13763)
This PR is fixing an attributeError: object endpoint has no attribute
"_public_match_client" when using gcp matching engine with private VPC
network.

@baskaryan, @eyurtsev, @hwchase17.

---------

Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
2023-11-28 22:42:29 -05:00
Amyh102
750485eaa8 Add object parsing functionality (#13864)
* **Description:** Parses huggingface dataset Sequence objects into
strings for Document loading.
* **Issue:** Fixes #10674 
* **Tag maintainter:** @baskaryan @eyurtsev

---------

Co-authored-by: Amy Han <amyhan@Amys-Air.lan>
Co-authored-by: Amy Han <amyhan@Amys-MacBook-Air.local>
2023-11-28 22:33:16 -05:00
ggeutzzang
981f78f920 Fix: (issue #13825) Getting an error with DallEAPIWrapper (#13874)
- **Description:** As of OpenAI's Python package 1.0, the existing
DallEAPIWrapper does not work correctly, so the example in the LangChain
Documentation link below does not work either.

https://python.langchain.com/docs/integrations/tools/dalle_image_generator
Also, since OpenAI only supports DALL-E version 2 or version 3, I
modified the DallEAPIWrapper to support it.

  - **Issue:** #13825 

  - **Twitter handle:** ggeutzzang
2023-11-28 22:31:25 -05:00
Kunal
74045bf5c0 max length attribute for spacy splitter for large docs (#13875)
For large size documents spacy splitter doesn't work it throws an error
as shown in below screenshot.
Reason its default max_length is 1000000 and there is no option to
increase it. So i added it in this PR.


![image](https://github.com/langchain-ai/langchain/assets/73680423/613625c3-0e21-4834-9aad-2a73cf56eecc)

---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
2023-11-28 22:30:26 -05:00
Yusuf Khan
0bc7c1b5b4 Add Outline provider doc (#13938)
- **Description:** Added a provider doc to `docs/integrations/providers`
for the new Outline integration in #13889
  - **Tag maintainer:** @baskaryan
2023-11-28 22:29:30 -05:00
colton
643d28847d [docs] fix reduce prompt in summarization example (#13726)
<!-- Thank you for contributing to LangChain!

Replace this entire comment with:
  - **Description:** a description of the change, 
  - **Issue:** the issue # it fixes (if applicable),
  - **Dependencies:** any dependencies required for this change,
- **Tag maintainer:** for a quicker response, tag the relevant
maintainer (see below),
- **Twitter handle:** we announce bigger features on Twitter. If your PR
gets announced, and you'd like a mention, we'll gladly shout you out!

Please make sure your PR is passing linting and testing before
submitting. Run `make format`, `make lint` and `make test` to check this
locally.

See contribution guidelines for more information on how to write/run
tests, lint, etc:

https://github.com/langchain-ai/langchain/blob/master/.github/CONTRIBUTING.md

If you're adding a new integration, please include:
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in `docs/extras`
directory.

If no one reviews your PR within a few days, please @-mention one of
@baskaryan, @eyurtsev, @hwchase17.
 -->

Small fix to _summarization_ example, `reduce_template` should use
`{docs}` variable.

Bug likely introduced as following code suggests using
`hub.pull("rlm/map-prompt")` instead of defined prompt.
2023-11-28 22:22:42 -05:00
Wang Wei
fe9341a29c feat: Add ERNIE-Bot-8K model support for ErnieBotChat. (#13716)
- **Description:** According to the document
https://cloud.baidu.com/doc/WENXINWORKSHOP/s/6lp69is2a, add ERNIE-Bot-8K
model support for ErnieBotChat.
- **Dependencies:** Before using the ERNIE-Bot-8K, you should have the
model's access authority.
2023-11-28 22:22:23 -05:00
Leonid Ganeline
5c28bb63dd docs microsoft page updates (#14000)
The Excel, PowerPoint and SharePoint document loaders were missed in the
`Microsoft` platform page.
- added these references
2023-11-28 22:20:21 -05:00
Leonid Ganeline
15b32cfcd4 docs OpenAI platform page update (#14001)
Missed the OpenAI adapter reference in the OpenAI platform page
- Added this reference
2023-11-28 22:08:21 -05:00
Burak Ömür
0e462b72ef Update openai/create_llm_result function to consider kwargs (#13815)
Replace this entire comment with:
- **Description:** updates `create_llm_result` function within
`openai.py` to consider latest `params`,
  - **Issue:** #8928
  - **Dependencies:** -,
  - **Tag maintainer:** -
  - **Twitter handle:** [burkomr](https://twitter.com/burkomr)

<!-- If no one reviews your PR within a few days, please @-mention one
of @baskaryan, @eyurtsev, @hwchase17. -->

---------

Co-authored-by: Burak Ömür <burakomur@retorio.com>
Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
2023-11-28 22:02:38 -05:00
chyroc
f97ab84c6b Merge pull request #13907
* feat: mask api_key for jina
2023-11-28 21:24:50 -05:00
nhywieza
9b86fb3fcb secretStr for baichuan chat model api key (#13946)
Merge pull request #13946
* secretStr for baichuan chat model api key
2023-11-28 21:20:23 -05:00
卢靖轩
aff1dba252 Merge pull request #13945
* feat: mask api key for nlpcloud
2023-11-28 21:16:36 -05:00
Leonid Kuligin
85bb3a418c Switched VertexAI models from preview (#13657)
Replace this entire comment with:
- **Description:** VertexAI models are now GA, moved away from using
preview ones from the SDK
  - **Issue:** #13606

---------

Co-authored-by: Nuno Campos <nuno@boringbits.io>
2023-11-28 20:38:04 -05:00
WaseemH
a47f1da884 docs[patch]: RAG Cookbook example fix (#13914)
### Description:
Hey 👋🏽  this is a small docs example fix. Hoping it helps future developers  who are working with Langchain.

### Problem:
Take a look at the original example code. You were not able to get the `dialogue_turn[0]` while it was a tuple.

Original code:
```python
def _format_chat_history(chat_history: List[Tuple]) -> str:
    buffer = ""
    for dialogue_turn in chat_history:
        human = "Human: " + dialogue_turn[0]
        ai = "Assistant: " + dialogue_turn[1]
        buffer += "\n" + "\n".join([human, ai])
    return buffer
```
In the original code you were getting this error:
```bash
    human = "Human: " + dialogue_turn[0].content
                        ~~~~~~~~~~~~~^^^
TypeError: 'HumanMessage' object is not subscriptable
```
### Solution:
The fix is to just for loop over the chat history and look to see if its a human or ai message and add it to the buffer.
2023-11-28 17:37:03 -08:00
Erick Friis
5eca1bd93f Library Licenses (#13300)
Same change as #8403 but in other libs

also updates (c) LangChain Inc. instead of @hwchase17
2023-11-28 17:34:27 -08:00
Bagatur
14799b139a infra[patch]: add base deps and fix docs lint (#13998) 2023-11-28 17:27:37 -08:00
Théo LEBRUN
926d4cfda7 Set default region from boto3 session for Bedrock (#13694)
- **Description:** Set default region from boto3 session for Bedrock 
- **Issue:** #13683
2023-11-28 20:26:54 -05:00
Snow
1a33e5b500 Repair Wikipedia document loader load_max_docs and improve test coverage. (#13769)
**Description:** 

Repair Wikipedia document loader `load_max_docs` and improve test
coverage.

**Issue:** 

The Wikipedia document loader was not respecting the `load_max_docs`
paramater (not reported) and would always return a maximum of 10
documents. This is because the API wrapper (in `utilities/wikipedia.py`)
wasn't passing `top_k_results` to the underlying [Wikipedia
library](https://wikipedia.readthedocs.io/en/latest/code.html#module-wikipedia).
By default this library returns 10 results.

The default number of results for the document loader has been reduced
from 100 to 25. This is because loading 100 results takes a very long
time and is an inconvenient default. It should possibly be 10.

In addition, the documentation for the loader reported that there was a
hard limit (300) on the number of documents returned. In actuality 300
is the maximum Wikipedia query character length set by the API wrapper.

Tests have been added for the document loader (previously missing) and
to test the correct numbers of documents are being returned by each
class, both by default, and when overridden. Also repaired is the
`assert_docs` test which has been updated to correctly test for the
default metadata (which includes `source` in recent releases).

**Dependencies:** 
nil

**Tag maintainer:**
@leo-gan

**Twitter handle:**
@queenvictoria
2023-11-28 20:26:40 -05:00
Bob Lin
04c4878306 Remove python_repl from _BASE_TOOLS (#13962)
### **Description:**

Previously `python_repl` was a built-in tool, but now it has been moved
to `langchain_experimental`.

When I use `load_tools` I get an error:

```python
In [1]: from langchain.agents import load_tools

In [2]: load_tools(["python_repl"])
---------------------------------------------------------------------------
ImportError                               Traceback (most recent call last)
Cell In[2], line 1
----> 1 load_tools(["python_repl"])

File ~/workspace/langchain/libs/langchain/langchain/agents/load_tools.py:530, in load_tools(tool_names, llm, callbacks, **kwargs)
    528     tool_names.extend(requests_method_tools)
    529 elif name in _BASE_TOOLS:
--> 530     tools.append(_BASE_TOOLS[name]())
    531 elif name in _LLM_TOOLS:
    532     if llm is None:

File ~/workspace/langchain/libs/langchain/langchain/agents/load_tools.py:84, in _get_python_repl()
     83 def _get_python_repl() -> BaseTool:
---> 84     raise ImportError(
     85         "This tool has been moved to langchain experiment. "
     86         "This tool has access to a python REPL. "
     87         "For best practices make sure to sandbox this tool. "
     88         "Read https://github.com/langchain-ai/langchain/blob/master/SECURITY.md "
     89         "To keep using this code as is, install langchain experimental and "
     90         "update relevant imports replacing 'langchain' with 'langchain_experimental'"
     91     )

ImportError: This tool has been moved to langchain experiment. This tool has access to a python REPL. For best practices make sure to sandbox this tool. Read https://github.com/langchain-ai/langchain/blob/master/SECURITY.md To keep using this code as is, install langchain experimental and update relevant imports replacing 'langchain' with 'langchain_experimental'
```

In this case, it will be very confusing. I think it is no longer a
built-in tool now, so it can be removed from `_BASE_TOOLS`

### **Issue:** 

https://github.com/langchain-ai/langchain/issues/13858,
https://github.com/langchain-ai/langchain/issues/13859,
https://github.com/langchain-ai/langchain/issues/13856
### **Twitter handle:** 

[lin_bob57617](https://twitter.com/lin_bob57617)
2023-11-28 20:13:54 -05:00
Leonid Ganeline
52eee458bb renamed google_vertex_ai_vector_search notebook (#13484)
The `integrations/vectorstores/matchingengine.ipynb` example has the
"Google Vertex AI Vector Search" title. This place this Title in the
wrong order in the ToC (it is sorted by the file name).
- Renamed `integrations/vectorstores/matchingengine.ipynb` into
`integrations/vectorstores/google_vertex_ai_vector_search.ipynb`.
- Updated a correspondent comment in docstring
- Rerouted old URL to a new URL

---------

Co-authored-by: Erick Friis <erick@langchain.dev>
2023-11-28 16:58:29 -08:00
Leonid Ganeline
f5326cfb4e docs[patch]: link to LangSmith docs (#13740)
It happens that there is no link to the LangSmith Docs from the LangChain Docs.
Added this link
2023-11-28 16:44:45 -08:00
Leonid Ganeline
bf5787f58b experimental[patch]: fixed namespace bug (#13585)
It was :
`from langchain.schema.prompts import BasePromptTemplate`
but because of the breaking change in the ns, it is now
`from langchain.schema.prompt_template import BasePromptTemplate`

This bug prevents building the API Reference for the langchain_experimental
2023-11-28 16:40:27 -08:00
Leonid Ganeline
1ab8a14742 docs[patch]: top menu (#13748)
Addressed this issue with the top menu: It allocates too much space. If the screen is small, then the top menu items are split into two lines and look unreadable.
Another issue is with several top menu items: "Chat our docs" and "Also by LangChain". They are compound of several words which also hurts readability. The top menu items should be 1-word size.
Updates:
- "Chat our docs" -> "Chat" (the meaning is clean after clicking/opening the item)
- "Also by LangChain" -> "🦜🔗"
- "🦜🔗" moved before "Chat" item. This new item is partially copied from the first left item, the "🦜🔗 LangChain". This design (with two 🦜🔗 elements, visually splits the top menu into two parts. The first item in each part holds the 🦜🔗 symbols and, when we click the second 🦜🔗 item, it opens the drop-down menu. So, we've got two visually similar parts, which visually split the top menu on the right side: the LangChain Docs (and Doc-related items) and the lift side: other LangChain.ai (company) products/docs.
2023-11-28 16:35:38 -08:00
Bob Lin
41b3968d39 docs[patch]: Update CONTRIBUTING.md doc (#13965)
- **Description:** The new demo notebook should be placed in
[docs/docs/modules](https://github.com/langchain-ai/langchain/tree/master/docs/docs/modules)
  - **Twitter handle:**  lin_bob57617

---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
2023-11-28 16:32:25 -08:00
Taqi Jaffri
144710ad9a langchain[minor]: Updated DocugamiLoader, includes breaking changes (#13265)
There are the following main changes in this PR:

1. Rewrite of the DocugamiLoader to not do any XML parsing of the DGML
format internally, and instead use the `dgml-utils` library we are
separately working on. This is a very lightweight dependency.
2. Added MMR search type as an option to multi-vector retriever, similar
to other retrievers. MMR is especially useful when using Docugami for
RAG since we deal with large sets of documents within which a few might
be duplicates and straight similarity based search doesn't give great
results in many cases.

We are @docugami on twitter, and I am @tjaffri

---------

Co-authored-by: Taqi Jaffri <tjaffri@docugami.com>
2023-11-28 15:56:22 -08:00
Bagatur
a20e8f8bb0 experimental[patch]: release 0.0.43 (#13570) 2023-11-28 15:38:09 -08:00
juan-calvo-datatonic
6137894008 templates[minor]: Add rag google sensitive data protection template (#13921)
This is a template demonstrating how to utilize Google Sensitive Data
Protection in conjunction with ChatVertexAI(). Tagging you @efriis as
you reviewed my last template. :) Thanks!

Proof of successful execution: 

![image](https://github.com/langchain-ai/langchain/assets/82172964/e4d678aa-85c8-482b-b09d-81fe7e912dd4)

---------

Co-authored-by: Erick Friis <erick@langchain.dev>
2023-11-28 15:15:58 -08:00
Erick Friis
8b9dc5e6d3 langchain[patch]: contributing test guide update (#13993) 2023-11-28 14:38:11 -08:00
Bagatur
95a472a85f docs[patch]: install local core (#13990) 2023-11-28 14:36:22 -08:00
Bagatur
d8fe987ef5 langchain[patch]: release 0.0.342 (#13992) 2023-11-28 14:34:57 -08:00
Bagatur
61ec71064a docs[patch]: update stack diagram (#13902) 2023-11-28 14:19:13 -08:00
david qiu
9fb6805be4 langchain[minor]: Add retriever for Knowledge Bases for Amazon Bedrock (#13980)
- **Description:** Adds a retriever implementation for [Knowledge Bases
for Amazon Bedrock](https://aws.amazon.com/bedrock/knowledge-bases/), a
new service announced at AWS re:Invent, shortly before this PR was
opened. This depends on the `bedrock-agent-runtime` service, which will
be included in a future version of `boto3` and of `botocore`. We will
open a follow-up PR documenting the minimum required versions of `boto3`
and `botocore` after that information is available.
  - **Issue:** N/A
  - **Dependencies:** `boto3>=1.33.2, botocore>=1.33.2`
  - **Tag maintainer:** @baskaryan
  - **Twitter handles:** `@pjain7` `@dead_letter_q`

This PR includes a documentation notebook under
`docs/docs/integrations/retrievers`, which I (@dlqqq) have verified
independently.

EDIT: `bedrock-agent-runtime` service is now included in
`boto3>=1.33.2`:
5cf793f493

---------

Co-authored-by: Piyush Jain <piyushjain@duck.com>
Co-authored-by: Erick Friis <erick@langchain.dev>
Co-authored-by: Bagatur <baskaryan@gmail.com>
2023-11-28 14:10:23 -08:00
Bagatur
1aed2d1f08 core[patch]: release 0.0.7 (#13989) 2023-11-28 14:05:01 -08:00
David Duong
eb67f07e32 Track RunnableAssign as a separate run trace (#13972)
Addressing incorrect order being sent to callbacks / tracers, due to the
nature of threading

---------

Co-authored-by: Nuno Campos <nuno@boringbits.io>
2023-11-28 22:02:31 +00:00
Nuno Campos
0f255bb6c4 In Runnable.stream_log build up final_output from adding output chunks (#12781)
Add arg to omit streamed_output list, in cases where final_output is
enough this saves bandwidth

<!-- Thank you for contributing to LangChain!

Replace this entire comment with:
  - **Description:** a description of the change, 
  - **Issue:** the issue # it fixes (if applicable),
  - **Dependencies:** any dependencies required for this change,
- **Tag maintainer:** for a quicker response, tag the relevant
maintainer (see below),
- **Twitter handle:** we announce bigger features on Twitter. If your PR
gets announced, and you'd like a mention, we'll gladly shout you out!

Please make sure your PR is passing linting and testing before
submitting. Run `make format`, `make lint` and `make test` to check this
locally.

See contribution guidelines for more information on how to write/run
tests, lint, etc:

https://github.com/langchain-ai/langchain/blob/master/.github/CONTRIBUTING.md

If you're adding a new integration, please include:
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in `docs/extras`
directory.

If no one reviews your PR within a few days, please @-mention one of
@baskaryan, @eyurtsev, @hwchase17.
 -->
2023-11-28 21:50:41 +00:00
Nuno Campos
970fe23feb Fixes for opengpts release (#13960) 2023-11-28 21:49:43 +00:00
David Duong
947daaf833 Exclude Bedrock client and credentials_profile_name fields from serialisation (#13603) 2023-11-28 16:34:46 -05:00
Bagatur
48fbc5513d infra[patch], langchain[patch]: fix test deps and upper bound langchain dep on core(#13984) 2023-11-28 13:26:15 -08:00
Stefano Lottini
1fd724293b Astra DB vector store, move constructor docstring to class docstring (#13784)
This PR rearranges the docstring for the `AstraDB` vector store class so
as to have all useful information in the _class_ docstring for ease of
reading.

(incidentally, due to an oversight, the docstring that was in the
constructor ended up buried below some lines of code, thereby
disappearing altogether from accessibility. Apologies.)
2023-11-28 16:25:44 -05:00
Johannes Foulds
fc40bd4cdb AnthropicFunctions function_call compatibility (#13901)
- **Description:** Updates to `AnthropicFunctions` to be compatible with
the OpenAI `function_call` functionality.
- **Issue:** The functionality to indicate `auto`, `none` and a forced
function_call was not completely implemented in the existing code.
  - **Dependencies:** None
- **Tag maintainer:** @baskaryan , and any of the other maintainers if
needed.
  - **Twitter handle:** None

I have specifically tested this functionality via AWS Bedrock with the
Claude-2 and Claude-Instant models.
2023-11-28 16:22:55 -05:00
Varun
14cc907d35 Update the stable docs link (#13798)
- **Description:** Point to the stable version of documentation, 
  - **Twitter handle:** varunzxzx
2023-11-28 21:11:16 +00:00
mengjincn
05ea4fd37d fix merge None value and non None value error (#13703)
<!-- Thank you for contributing to LangChain!

Replace this entire comment with:
  - **Description:** a description of the change, 
  - **Issue:** the issue # it fixes (if applicable),
  - **Dependencies:** any dependencies required for this change,
- **Tag maintainer:** for a quicker response, tag the relevant
maintainer (see below),
- **Twitter handle:** we announce bigger features on Twitter. If your PR
gets announced, and you'd like a mention, we'll gladly shout you out!

Please make sure your PR is passing linting and testing before
submitting. Run `make format`, `make lint` and `make test` to check this
locally.

See contribution guidelines for more information on how to write/run
tests, lint, etc:

https://github.com/langchain-ai/langchain/blob/master/.github/CONTRIBUTING.md

If you're adding a new integration, please include:
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in `docs/extras`
directory.

If no one reviews your PR within a few days, please @-mention one of
@baskaryan, @eyurtsev, @hwchase17.
 -->
2023-11-28 15:49:56 -05:00
Amélie
d2cad53ec0 Fix broken link on Meilisearch vector-store documentation (#13604)
- **Description:** dead link replacement 
  - **Issue:** no open issue

**Note:**
Hi langchain team,
Sorry to open a PR for this concern but we realized that one of the
links present in the documentation booklet was broken 😄
2023-11-28 15:49:32 -05:00
Ali Orozgani
32d794f5a3 iMessage loader: implement message content extraction from attributed… (#13634)
- **Description:** We are adding functionality to extract message
content from the `attributedBody` field of the database, in case the
content is not in the `text` field.
  - **Issue:** Closes #13326 and #10680 
  - **Dependencies:** None.
  - **Tag maintainer:** @eyurtsev, @hwchase17

---------

Co-authored-by: onotate <johnp.pham@mail.utoronto.ca>
2023-11-28 15:45:43 -05:00
William FH
e5256bcb69 [Evals] Add Project Tags (#13982)
Add them to project extra
2023-11-28 11:38:59 -08:00
Rihards Gravis
9e017ff6ba docs[patch]: Reduce largest static image file size (#13508)
- **Description:** Reduce image asset file size used in documentation by
running them via lossless image optimization
([tinypng](https://www.npmjs.com/package/tinypng-cli) was used in this
case). Images wider than 1916px (the maximum width of an image displayed
in documentation) where downsized.
- **Issue:** No issue is created for this, but the large image file
assets caused slow documentation load times
  - **Dependencies:** No dependencies affected
2023-11-28 13:00:53 -05:00
Nuno Campos
e0bcc98436 infra[patch]: Use langchain core in-tree as a dev dependency (#13957)
Using the published version means master is broken for contributors
whenever we make changes in one lib that depend on the other.
2023-11-28 09:23:43 -08:00
unifyh
2703a1b061 Fix MarkdownHeaderTextSplitter not recognizing tilde-fenced code blocks (#13511)
- **Description:** Previously `MarkdownHeaderTextSplitter` did not
consider tilde-fenced code blocks
(https://spec.commonmark.org/0.30/#fenced-code-blocks). This PR fixes
that.
   ````md
   # Bug caused by previous implementation:
   ~~~py
   foo()
   # This is a comment that would be considered header
   bar()
   ~~~
   ````
 - **Tag maintainer:** @baskaryan
2023-11-28 11:52:38 -05:00
Leonid Ganeline
7929b26017 office365 toolkit bug fixes (#13618)
Several bug fixes:
- emails: instead of `bcc` the `cc` is used.
- errors in the truncation descriptions
- no truncation of the `message_search`
Several updates:
- generalized UTC format 
- truncation limit can be changed now in _call()
2023-11-28 11:49:24 -05:00
William FH
60309341bd Eval Error Key (#13974) 2023-11-28 08:38:30 -08:00
Erick Friis
f9bef600f1 RELEASE: core 0.0.7 (#13973) 2023-11-28 10:28:28 -05:00
Nicolas Bondoux
e17edc4d0b RunnableLambda: create afunc instance from func when not provided (#13408)
Fixes #13407.

This workaround consists in letting the RunnableLambda create its
self.afunc from its self.func when self.afunc is not provided; the
change has no dependency.

<!-- Thank you for contributing to LangChain!

Replace this entire comment with:
  - **Description:** a description of the change, 
  - **Issue:** the issue # it fixes (if applicable),
  - **Dependencies:** any dependencies required for this change,
- **Tag maintainer:** for a quicker response, tag the relevant
maintainer (see below),
- **Twitter handle:** we announce bigger features on Twitter. If your PR
gets announced, and you'd like a mention, we'll gladly shout you out!

Please make sure your PR is passing linting and testing before
submitting. Run `make format`, `make lint` and `make test` to check this
locally.

See contribution guidelines for more information on how to write/run
tests, lint, etc:

https://github.com/langchain-ai/langchain/blob/master/.github/CONTRIBUTING.md

If you're adding a new integration, please include:
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in `docs/extras`
directory.

If no one reviews your PR within a few days, please @-mention one of
@baskaryan, @eyurtsev, @hwchase17.
 -->

---------

Co-authored-by: Erick Friis <erick@langchain.dev>
Co-authored-by: Nuno Campos <nuno@langchain.dev>
2023-11-28 11:18:26 +00:00
Nuno Campos
391f200eaa Implement stream() and astream() for agents (#12783)
```
---- chunk 1
{'actions': [AgentActionMessageLog(tool='Search', tool_input="Leo DiCaprio's current girlfriend", log="\nInvoking: `Search` with `Leo DiCaprio's current girlfriend`\n\n\n", message_log=[AIMessageChunk(content='', additional_kwargs={'function_call': {'name': 'Search', 'arguments': '{\n  "__arg1": "Leo DiCaprio\'s current girlfriend"\n}'}})])],
 'messages': [AIMessageChunk(content='', additional_kwargs={'function_call': {'name': 'Search', 'arguments': '{\n  "__arg1": "Leo DiCaprio\'s current girlfriend"\n}'}})]}
---- chunk 2
{'messages': [FunctionMessage(content="According to Us, the 48-year-old actor is now “exclusively” dating Italian model Vittoria Ceretti. A source told Us that DiCaprio is “completely smitten” with Ceretti, and their relationship is “going so well that Leo's actually being exclusive.”", name='Search')],
 'steps': [AgentStep(action=AgentActionMessageLog(tool='Search', tool_input="Leo DiCaprio's current girlfriend", log="\nInvoking: `Search` with `Leo DiCaprio's current girlfriend`\n\n\n", message_log=[AIMessageChunk(content='', additional_kwargs={'function_call': {'name': 'Search', 'arguments': '{\n  "__arg1": "Leo DiCaprio\'s current girlfriend"\n}'}})]), observation="According to Us, the 48-year-old actor is now “exclusively” dating Italian model Vittoria Ceretti. A source told Us that DiCaprio is “completely smitten” with Ceretti, and their relationship is “going so well that Leo's actually being exclusive.”")]}
---- chunk 3
{'actions': [AgentActionMessageLog(tool='Search', tool_input='Vittoria Ceretti age', log='\nInvoking: `Search` with `Vittoria Ceretti age`\n\n\n', message_log=[AIMessageChunk(content='', additional_kwargs={'function_call': {'name': 'Search', 'arguments': '{\n  "__arg1": "Vittoria Ceretti age"\n}'}})])],
 'messages': [AIMessageChunk(content='', additional_kwargs={'function_call': {'name': 'Search', 'arguments': '{\n  "__arg1": "Vittoria Ceretti age"\n}'}})]}
---- chunk 4
{'messages': [FunctionMessage(content='25 years', name='Search')],
 'steps': [AgentStep(action=AgentActionMessageLog(tool='Search', tool_input='Vittoria Ceretti age', log='\nInvoking: `Search` with `Vittoria Ceretti age`\n\n\n', message_log=[AIMessageChunk(content='', additional_kwargs={'function_call': {'name': 'Search', 'arguments': '{\n  "__arg1": "Vittoria Ceretti age"\n}'}})]), observation='25 years')]}
---- chunk 5
{'actions': [AgentActionMessageLog(tool='Calculator', tool_input='25^0.43', log='\nInvoking: `Calculator` with `25^0.43`\n\n\n', message_log=[AIMessageChunk(content='', additional_kwargs={'function_call': {'name': 'Calculator', 'arguments': '{\n  "__arg1": "25^0.43"\n}'}})])],
 'messages': [AIMessageChunk(content='', additional_kwargs={'function_call': {'name': 'Calculator', 'arguments': '{\n  "__arg1": "25^0.43"\n}'}})]}
---- chunk 6
{'messages': [FunctionMessage(content='Answer: 3.991298452658078', name='Calculator')],
 'steps': [AgentStep(action=AgentActionMessageLog(tool='Calculator', tool_input='25^0.43', log='\nInvoking: `Calculator` with `25^0.43`\n\n\n', message_log=[AIMessageChunk(content='', additional_kwargs={'function_call': {'name': 'Calculator', 'arguments': '{\n  "__arg1": "25^0.43"\n}'}})]), observation='Answer: 3.991298452658078')]}
---- chunk 7
{'messages': [AIMessage(content="Leonardo DiCaprio's current girlfriend is the Italian model Vittoria Ceretti, who is 25 years old. Her age raised to the 0.43 power is approximately 3.99.")],
 'output': "Leonardo DiCaprio's current girlfriend is the Italian model "
           'Vittoria Ceretti, who is 25 years old. Her age raised to the 0.43 '
           'power is approximately 3.99.'}
---- final
{'actions': [AgentActionMessageLog(tool='Search', tool_input="Leo DiCaprio's current girlfriend", log="\nInvoking: `Search` with `Leo DiCaprio's current girlfriend`\n\n\n", message_log=[AIMessageChunk(content='', additional_kwargs={'function_call': {'name': 'Search', 'arguments': '{\n  "__arg1": "Leo DiCaprio\'s current girlfriend"\n}'}})]),
             AgentActionMessageLog(tool='Search', tool_input='Vittoria Ceretti age', log='\nInvoking: `Search` with `Vittoria Ceretti age`\n\n\n', message_log=[AIMessageChunk(content='', additional_kwargs={'function_call': {'name': 'Search', 'arguments': '{\n  "__arg1": "Vittoria Ceretti age"\n}'}})]),
             AgentActionMessageLog(tool='Calculator', tool_input='25^0.43', log='\nInvoking: `Calculator` with `25^0.43`\n\n\n', message_log=[AIMessageChunk(content='', additional_kwargs={'function_call': {'name': 'Calculator', 'arguments': '{\n  "__arg1": "25^0.43"\n}'}})])],
 'messages': [AIMessageChunk(content='', additional_kwargs={'function_call': {'name': 'Search', 'arguments': '{\n  "__arg1": "Leo DiCaprio\'s current girlfriend"\n}'}}),
              FunctionMessage(content="According to Us, the 48-year-old actor is now “exclusively” dating Italian model Vittoria Ceretti. A source told Us that DiCaprio is “completely smitten” with Ceretti, and their relationship is “going so well that Leo's actually being exclusive.”", name='Search'),
              AIMessageChunk(content='', additional_kwargs={'function_call': {'name': 'Search', 'arguments': '{\n  "__arg1": "Vittoria Ceretti age"\n}'}}),
              FunctionMessage(content='25 years', name='Search'),
              AIMessageChunk(content='', additional_kwargs={'function_call': {'name': 'Calculator', 'arguments': '{\n  "__arg1": "25^0.43"\n}'}}),
              FunctionMessage(content='Answer: 3.991298452658078', name='Calculator'),
              AIMessage(content="Leonardo DiCaprio's current girlfriend is the Italian model Vittoria Ceretti, who is 25 years old. Her age raised to the 0.43 power is approximately 3.99.")],
 'output': "Leonardo DiCaprio's current girlfriend is the Italian model "
           'Vittoria Ceretti, who is 25 years old. Her age raised to the 0.43 '
           'power is approximately 3.99.',
 'steps': [AgentStep(action=AgentActionMessageLog(tool='Search', tool_input="Leo DiCaprio's current girlfriend", log="\nInvoking: `Search` with `Leo DiCaprio's current girlfriend`\n\n\n", message_log=[AIMessageChunk(content='', additional_kwargs={'function_call': {'name': 'Search', 'arguments': '{\n  "__arg1": "Leo DiCaprio\'s current girlfriend"\n}'}})]), observation="According to Us, the 48-year-old actor is now “exclusively” dating Italian model Vittoria Ceretti. A source told Us that DiCaprio is “completely smitten” with Ceretti, and their relationship is “going so well that Leo's actually being exclusive.”"),
           AgentStep(action=AgentActionMessageLog(tool='Search', tool_input='Vittoria Ceretti age', log='\nInvoking: `Search` with `Vittoria Ceretti age`\n\n\n', message_log=[AIMessageChunk(content='', additional_kwargs={'function_call': {'name': 'Search', 'arguments': '{\n  "__arg1": "Vittoria Ceretti age"\n}'}})]), observation='25 years'),
           AgentStep(action=AgentActionMessageLog(tool='Calculator', tool_input='25^0.43', log='\nInvoking: `Calculator` with `25^0.43`\n\n\n', message_log=[AIMessageChunk(content='', additional_kwargs={'function_call': {'name': 'Calculator', 'arguments': '{\n  "__arg1": "25^0.43"\n}'}})]), observation='Answer: 3.991298452658078')]}
```
2023-11-28 08:11:37 +00:00
Michael Feil
686162670e langchain[minor]: Adding infinity embedding integration. (#13928)
This adds integation to https://github.com/michaelfeil/infinity. Users
requested it in https://github.com/michaelfeil/infinity/issues/36
@saatvikshah

Follows my implementation of gradient.ai.

Feedback 1: Well done - I love your CI / repo / poetry setup - I adapted
a lot in https://github.com/michaelfeil/infinity.
Feedback 2: Not so good: The openai integration contains to much reverse
engineering - in general projects such as michaelfeil/infinity and
huggingface/text-embeddings-inference are compatible to the `pip install
openai` package.

Reverse engineering like this one is really hindering the use for me:

8e88ba16a8/libs/langchain/langchain/embeddings/openai.py (L347)

8e88ba16a8/libs/langchain/langchain/embeddings/openai.py (L351)
- it is about preventing 3rd party providers to use the same url + uses
interfaces of openai, that are not publically documented.
2023-11-27 16:43:47 -08:00
Bagatur
10a6e7cbb6 langchain[patch], core[patch]: Make common utils public (#13932)
- rename `langchain_core.chat_models.base._generate_from_stream` -> `generate_from_stream`
- rename `langchain_core.chat_models.base._agenerate_from_stream` -> `agenerate_from_stream`
- export `langchain_core.utils.utils.build_extra_kwargs` from `langchain_core.utils`
2023-11-27 15:34:46 -08:00
Oleksandr Yaremchuk
c0277d06e8 experimental[patch] Update prompt injection model (#13930)
- **Description:** Existing model used for Prompt Injection is quite
outdated but we fine-tuned and open-source a new model based on the same
model deberta-v3-base from Microsoft -
[laiyer/deberta-v3-base-prompt-injection](https://huggingface.co/laiyer/deberta-v3-base-prompt-injection).
It supports more up-to-date injections and less prone to
false-positives.
  - **Dependencies:** No
  - **Tag maintainer:** -
  - **Twitter handle:** @alex_yaremchuk

---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
2023-11-27 17:56:53 -05:00
Bob Lin
e6ebde9688 experimental[patch]: Add experimental.agent imports (#13839)
- **Description:** The experimental package needs to be compatible with
the usage of importing agents

For example, if i use `from langchain.agents import
create_pandas_dataframe_agent`, running the program will prompt the
following information:

```
Traceback (most recent call last):
   File "/Users/dongwm/test/main.py", line 1, in <module>
     from langchain.agents import create_pandas_dataframe_agent
   File "/Users/dongwm/test/venv/lib/python3.11/site-packages/langchain/agents/__init__.py", line 87, in __getattr__
     raise ImportError(
ImportError: create_pandas_dataframe_agent has been moved to langchain experimental. See https://github.com/langchain-ai/langchain/discussions/11680 for more information.
Please update your import statement from: `langchain.agents.create_pandas_dataframe_agent` to `langchain_experimental.agents.create_pandas_dataframe_agent`.
```

But when I changed to `from langchain_experimental.agents import
create_pandas_dataframe_agent`, it was actually wrong:

```python
Traceback (most recent call last):
  File "/Users/dongwm/test/main.py", line 2, in <module>
    from langchain_experimental.agents import create_pandas_dataframe_agent
ImportError: cannot import name 'create_pandas_dataframe_agent' from 'langchain_experimental.agents' (/Users/dongwm/test/venv/lib/python3.11/site-packages/langchain_experimental/agents/__init__.py)
```

I should use `from langchain_experimental.agents.agent_toolkits import
create_pandas_dataframe_agent`. In order to solve the problem and make
it compatible, I added additional import code to the
langchain_experimental package. Now it can be like this Used `from
langchain_experimental.agents import create_pandas_dataframe_agent`

  - **Twitter handle:** [lin_bob57617](https://twitter.com/lin_bob57617)
2023-11-27 14:03:47 -08:00
Tyler Titsworth
afcfa2a5e7 langchain[patch]: Add progress bar option to OllamaEmbeddings (#13882)
- **Description:** Adds a tqdm progress bar to OllamaEmbeddings when
embedding a list.
- **Issue:** Related to #13637, but extended to Ollama.
- **Dependencies:** `tqdm` made a necessary dependency.

Thanks to @ugm2 for helping identify a common problem. Embeddings take a
very long time to finish on local machines, and require a progress bar
to help identify if one should even attempt the workload.

---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
2023-11-27 13:56:13 -08:00
Kalyan
ec53d983a1 TEMPLATES Add rag-opensearch template (#13501)
<!-- Thank you for contributing to LangChain!

Replace this entire comment with:
  - **Description:** a description of the change, 
  - **Issue:** the issue # it fixes (if applicable),
  - **Dependencies:** any dependencies required for this change,
- **Tag maintainer:** for a quicker response, tag the relevant
maintainer (see below),
- **Twitter handle:** we announce bigger features on Twitter. If your PR
gets announced, and you'd like a mention, we'll gladly shout you out!

Please make sure your PR is passing linting and testing before
submitting. Run `make format`, `make lint` and `make test` to check this
locally.

See contribution guidelines for more information on how to write/run
tests, lint, etc:

https://github.com/langchain-ai/langchain/blob/master/.github/CONTRIBUTING.md

If you're adding a new integration, please include:
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in `docs/extras`
directory.

If no one reviews your PR within a few days, please @-mention one of
@baskaryan, @eyurtsev, @hwchase17.
 -->

Adding rag-opensearch template.

---------

Signed-off-by: kalyanr <kalyan.ben10@live.com>
Co-authored-by: Erick Friis <erick@langchain.dev>
2023-11-27 16:21:39 -05:00
Leonid Ganeline
e47b9c5285 DOCS: move adapters to integrations (#13862)
Current docs for adapters are in the `Guides/Adapters which is not a
good place.
- moved Adapters into `Integratons/Components/Adapters/
- simplified the OpenAI adapter notebook
- rerouted the old OpenAI adapter page URL to a new one.
2023-11-27 13:05:43 -08:00
jeremyb-data
cd77fba562 Improvement: Weaviate multitenant adddocs (#13827)
<!-- Thank you for contributing to LangChain!

Replace this entire comment with:
- **Description:** Added a line to pass the tenant parameter to
add_data_object
  - **Issue:** An extra line added from the fix for #9956
  - **Dependencies:** n/a
  - **Tag maintainer:** @baskaryan 

Tested locally, works as expected with the line change.

---------

Co-authored-by: Simon Dai <simon6752@gmail.com>
2023-11-27 12:59:57 -08:00
jiangying
3e30cd8261 NIT: comment typo (#13817) 2023-11-27 12:59:12 -08:00
Manuel Riezebosch
92b07ecaf3 DOCS: fix link to question answering (#13806)
first link in
[overview](https://python.langchain.com/docs/use_cases/question_answering/code_understanding#overview)
2023-11-27 12:56:15 -08:00
Assaf Toledo
ba62ff89cc BUGFIX: Support for elastic indices that don't return 'metadata' in '_source' (#13903)
Description: Some Elastic indexes do not return a 'metadata' field in
'_source'. However, prior to this PR, the code assumed there always is a
'metadata' field. This PR adds support for cases where the field is
missing by adding it manually.

Issue: #13869
2023-11-27 12:52:57 -08:00
Enric Soler Rastrollo
c156d0281a BUGFIX: Use embedding key in azure_cosmos_db index creation (#13919)
Description: Implement embedding key parametrisation
Issue: https://github.com/langchain-ai/langchain/issues/13918
Dependencies: None
Tag maintainer: @hwchase17 @izzymsft
Twitter handle:@MaddogoS
2023-11-27 12:51:08 -08:00
Bagatur
ac67422a3d IMPROVEMENT: import Document from core (#13905) 2023-11-27 12:48:43 -08:00
chyroc
886bc2d50a IMPROVEMENT: fix qianfan validate_environment typo (#13908) 2023-11-27 11:17:27 -08:00
Chengzu Ou
4b8e053fe8 FEATURE: Add Databricks Vector Search as a new vector store (#13621)
**Description:**
This PR adds Databricks Vector Search as a new vector store in
LangChain.

- [x] Add `DatabricksVectorSearch` in `langchain/vectorstores/`
- [x] Unit tests
- [x] Add
[`databricks-vectorsearch`](https://pypi.org/project/databricks-vectorsearch/)
as a new optional dependency

We ran the following checks:
- `make format` passed  
- `make lint` failed but the failures were caused by other files
    + Files touched by this PR passed the linter  
- `make test` passed  
- `make coverage` failed but the failures were caused by other files.
Tests added by or related to this PR all passed
+ langchain/vectorstores/databricks_vector_search.py test coverage 94% 
- `make spell_check` passed  

The example notebook and updates to the [provider's documentation
page](https://github.com/langchain-ai/langchain/blob/master/docs/docs/integrations/providers/databricks.md)
will be added later in a separate PR.

**Dependencies:**
Optional dependency:
[`databricks-vectorsearch`](https://pypi.org/project/databricks-vectorsearch/)

---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
2023-11-27 11:07:26 -08:00
Leonid Kuligin
25387db432 BUFIX: add support for various OSS images from Vertex Model Garden (#13917)
- **Description:** add support for various OSS images from Model
Garden
  - **Issue:** #13370
2023-11-27 10:31:53 -08:00
Eugene Yurtsev
e186637921 Document Runnable Binding (#13927)
Document runnable binding
2023-11-27 13:21:27 -05:00
Bagatur
46b3311190 RELEASE: 0.0.341 (#13926) 2023-11-27 09:51:12 -08:00
Nuno Campos
f6b05cacd0 Update root poetry lock with core (#13922)
<!-- Thank you for contributing to LangChain!

Replace this entire comment with:
  - **Description:** a description of the change, 
  - **Issue:** the issue # it fixes (if applicable),
  - **Dependencies:** any dependencies required for this change,
- **Tag maintainer:** for a quicker response, tag the relevant
maintainer (see below),
- **Twitter handle:** we announce bigger features on Twitter. If your PR
gets announced, and you'd like a mention, we'll gladly shout you out!

Please make sure your PR is passing linting and testing before
submitting. Run `make format`, `make lint` and `make test` to check this
locally.

See contribution guidelines for more information on how to write/run
tests, lint, etc:

https://github.com/langchain-ai/langchain/blob/master/.github/CONTRIBUTING.md

If you're adding a new integration, please include:
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in `docs/extras`
directory.

If no one reviews your PR within a few days, please @-mention one of
@baskaryan, @eyurtsev, @hwchase17.
 -->
2023-11-27 17:30:44 +00:00
umair mehmood
b3e08f9239 improvement: fix chat prompt loading from config (#13818)
Add loader for loading chat prompt from config file.

fixed: #13667

@efriis 
@baskaryan
2023-11-27 11:39:50 -05:00
Nuno Campos
8a3e0c9afa Add option to prefix config keys in configurable_alts (#13714) 2023-11-27 15:25:17 +00:00
Tomaz Bratanic
4ce5254442 Add Cypher template diagrams (#13913) 2023-11-27 10:18:51 -05:00
Taqi Jaffri
bfc12a4a76 DOCS: Simplified Docugami cookbook to remove code now available in docugami library (#13828)
The cookbook had some code to upload files, and wait for the processing
to finish.

This code is now moved to the `docugami` library so removing from the
cookbook to simplify.

Thanks @rlancemartin for suggesting this when working on evals.

---------

Co-authored-by: Taqi Jaffri <tjaffri@docugami.com>
2023-11-27 00:07:24 -08:00
ggeutzzang
3749af79ae DOCS: fixed error in the docstring of RunnablePassthrough class (#13843)
This pull request addresses an issue found in the example code within
the docstring of `libs/core/langchain_core/runnables/passthrough.py`

The original code snippet caused a `NameError` due to the missing import
of `RunnableLambda`. The error was as follows:
```
     12     return "completion"
     13 
---> 14 chain = RunnableLambda(fake_llm) | {
     15     'original': RunnablePassthrough(), # Original LLM output
     16     'parsed': lambda text: text[::-1] # Parsing logic

NameError: name 'RunnableLambda' is not defined
```
To resolve this, I have modified the example code to include the
necessary import statement for `RunnableLambda`. Additionally, I have
adjusted the indentation in the code snippet to ensure consistency and
readability.

The modified code now successfully defines and utilizes
`RunnableLambda`, ensuring that users referencing the docstring will
have a functional and clear example to follow.

There are no related GitHub issues for this particular change.

Modified Code:
```python
from langchain_core.runnables import RunnablePassthrough, RunnableParallel
from langchain_core.runnables import RunnableLambda

runnable = RunnableParallel(
    origin=RunnablePassthrough(),
    modified=lambda x: x+1
)

runnable.invoke(1) # {'origin': 1, 'modified': 2}

def fake_llm(prompt: str) -> str: # Fake LLM for the example
    return "completion"

chain = RunnableLambda(fake_llm) | {
    'original': RunnablePassthrough(), # Original LLM output
    'parsed': lambda text: text[::-1] # Parsing logic
}

chain.invoke('hello') # {'original': 'completion', 'parsed': 'noitelpmoc'}
```

---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
2023-11-27 00:06:55 -08:00
Dylan Williams
1983a39894 FEATURE: Add OneNote document loader (#13841)
- **Description:** Added OneNote document loader
  - **Issue:** #12125
  - **Dependencies:** msal

Co-authored-by: Bagatur <baskaryan@gmail.com>
2023-11-26 23:59:52 -08:00
Ikko Eltociear Ashimine
ff7d4d9c0b Update llamacpp.ipynb (#13840)
specifed -> specified
2023-11-26 23:47:19 -08:00
Tomaz Bratanic
1ad65f7a98 BUGFIX: Fix bugs with Cypher validation (#13849)
Fixes https://github.com/langchain-ai/langchain/issues/13803. Thanks to
@sakusaku-rich
2023-11-26 19:30:11 -08:00
Sᴜᴘᴇʀ Lᴇᴇ
e42e95cc11 docs: fix link to local_retrieval_qa (#13872)
\The original link in [this
section](https://python.langchain.com/docs/use_cases/question_answering/#:~:text=locally%2Drunning%20models-,here,-.):

https://python.langchain.com/docs/modules/use_cases/question_answering/local_retrieval_qa

After fix:

https://python.langchain.com/docs/use_cases/question_answering/local_retrieval_qa
2023-11-26 19:16:46 -08:00
Harrison Chase
6a35831128 BUGFIX: export more types (#13886)
Co-authored-by: Bagatur <baskaryan@gmail.com>
2023-11-26 19:15:34 -08:00
Yusuf Khan
935f78c944 FEATURE: Add retriever for Outline (#13889)
- **Description:** Added a retriever for the Outline API to ask
questions on knowledge base
  - **Issue:** resolves #11814
  - **Dependencies:** None
  - **Tag maintainer:** @baskaryan
2023-11-26 18:56:12 -08:00
ggeutzzang
f2af82058f DOCS: Fix Sample Code for Compatibility with Pydantic 2.0 (#13890)
- **Description:** 
I encountered an issue while running the existing sample code on the
page https://python.langchain.com/docs/modules/agents/how_to/agent_iter
in an environment with Pydantic 2.0 installed. The following error was
triggered:

```python
ValidationError                           Traceback (most recent call last)
<ipython-input-12-2ffff2c87e76> in <cell line: 43>()
     41 
     42 tools = [
---> 43     Tool(
     44         name="GetPrime",
     45         func=get_prime,

2 frames
/usr/local/lib/python3.10/dist-packages/pydantic/v1/main.py in __init__(__pydantic_self__, **data)
    339         values, fields_set, validation_error = validate_model(__pydantic_self__.__class__, data)
    340         if validation_error:
--> 341             raise validation_error
    342         try:
    343             object_setattr(__pydantic_self__, '__dict__', values)

ValidationError: 1 validation error for Tool
args_schema
  subclass of BaseModel expected (type=type_error.subclass; expected_class=BaseModel)
```

I have made modifications to the example code to ensure it functions
correctly in environments with Pydantic 2.0.
2023-11-26 18:21:13 -08:00
Harrison Chase
968ba6961f add skeleton of thought (#13883) 2023-11-26 19:31:41 -05:00
Bagatur
0efa59cbb8 RELEASE: 0.0.339rc3 (#13852) 2023-11-25 10:37:30 -08:00
Bagatur
7222c42077 RELEASE: core 0.0.6 (#13853) 2023-11-25 10:21:14 -08:00
raelix
c172605ea6 IMPROVEMENT: Added title metadata to GoogleDriveLoader for optional File Loaders (#13832)
- **Description:** Simple change, I just added title metadata to
GoogleDriveLoader for optional File Loaders
  - **Dependencies:** no dependencies
  - **Tag maintainer:** @hwchase17

---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
2023-11-24 18:53:55 -08:00
Stefano Lottini
19c68c7652 FEATURE: Astra DB, LLM cache classes (exact-match and semantic cache) (#13834)
This PR provides idiomatic implementations for the exact-match and the
semantic LLM caches using Astra DB as backend through the database's
HTTP JSON API. These caches require the `astrapy` library as dependency.

Comes with integration tests and example usage in the `llm_cache.ipynb`
in the docs.

@baskaryan this is the Astra DB counterpart for the Cassandra classes
you merged some time ago, tagging you for your familiarity with the
topic. Thank you!

---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
2023-11-24 18:53:37 -08:00
Stefano Lottini
272df9dcae Astra DB, chat message history (#13836)
This PR adds a chat message history component that uses Astra DB for
persistence through the JSON API.
The `astrapy` package is required for this class to work.

I have added tests and a small notebook, and updated the relevant
references in the other docs pages.

(@rlancemartin this is the counterpart of the Cassandra equivalent class
you so helpfully reviewed back at the end of June)

Thank you!
2023-11-24 18:12:29 -08:00
Bagatur
58f7e109ac BUGFIX: Add import types and typevars from core (#13829) 2023-11-24 17:04:10 -08:00
Bagatur
751226e067 bump 0.0.339rc2 (#13787) 2023-11-23 12:50:09 -08:00
Bagatur
300ff01824 RELEASE: core 0.0.5 (#13786) 2023-11-23 12:23:50 -08:00
Bagatur
bcf83988ec Revert "INFRA: temp rm master condition (#13753)" (#13759) 2023-11-22 17:22:07 -08:00
Bagatur
df471b0c0b INFRA: temp rm master condition (#13753) 2023-11-22 16:59:50 -08:00
Bagatur
72c108b003 IMPROVEMENT: filter global warnings properly (#13754) 2023-11-22 16:26:37 -08:00
William FH
163bf165ed Add Batch Size kwarg to the llm start callback (#13483)
So you can more easily use the token counts directly from the API
endpoint for batch size of 1
2023-11-22 14:47:57 -08:00
Bagatur
23566cbea9 DOCS: core editable dep api refs (#13747) 2023-11-22 14:33:30 -08:00
Bagatur
0be515f720 RELEASE: 0.0.339rc1 (#13746) 2023-11-22 14:29:49 -08:00
Bagatur
2bc5bd67f7 RELEASE: core 0.0.4 (#13745) 2023-11-22 13:57:28 -08:00
Bagatur
b6b7654f7f INFRA: run LC ci after core changes (#13742) 2023-11-22 13:38:48 -08:00
Bagatur
3d28c1a9e0 DOCS: fix core api ref build (#13744) 2023-11-22 15:42:35 -05:00
Bagatur
32d087fcb8 REFACTOR: combine core documents files (#13733) 2023-11-22 10:10:26 -08:00
h3l
14d4fb98fc DOCS: Fix typo/line break in python code (#13708) 2023-11-22 09:10:07 -08:00
William FH
5b90fe5b1c Fix locking (#13725) 2023-11-22 07:37:25 -08:00
Bagatur
16af282429 BUGFIX: add prompt imports for backwards compat (#13702) 2023-11-21 23:04:20 -08:00
Erick Friis
78da34153e TEMPLATES Metadata (#13691)
Co-authored-by: Lance Martin <lance@langchain.dev>
2023-11-22 01:41:12 -05:00
Bagatur
e327bb4ba4 IMPROVEMENT: Conditionally import core type hints (#13700) 2023-11-21 21:38:49 -08:00
dandanwei
d47ee1ae79 BUGFIX: redis vector store overwrites falsey metadata (#13652)
- **Description:** This commit fixed the problem that Redis vector store
will change the value of a metadata from 0 to empty when saving the
document, which should be an un-intended behavior.
  - **Issue:** N/A
  - **Dependencies:** N/A
2023-11-21 20:16:23 -08:00
Bagatur
a21e84faf7 BUGFIX: llm backwards compat imports (#13698) 2023-11-21 20:12:35 -08:00
Yujie Qian
ace9e64d62 IMPROVEMENT: VoyageEmbeddings embed_general_texts (#13620)
- **Description:** add method embed_general_texts in VoyageEmebddings to
support input_type
  - **Issue:** 
  - **Dependencies:** 
  - **Tag maintainer:** 
  - **Twitter handle:** @Voyage_AI_
2023-11-21 18:33:07 -08:00
tanujtiwari-at
5064890fcf BUGFIX: handle tool message type when converting to string (#13626)
**Description:** Currently, if we pass in a ToolMessage back to the
chain, it crashes with error

`Got unsupported message type: `

This fixes it. 

Tested locally

---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
2023-11-21 18:20:58 -08:00
Josep Pon Farreny
143049c90f Added partial_variables to BaseStringMessagePromptTemplate.from_template(...) (#13645)
**Description:** BaseStringMessagePromptTemplate.from_template was
passing the value of partial_variables into cls(...) via **kwargs,
rather than passing it to PromptTemplate.from_template. Which resulted
in those *partial_variables being* lost and becoming required
*input_variables*.

Co-authored-by: Josep Pon Farreny <josep.pon-farreny@siemens.com>
Co-authored-by: Bagatur <baskaryan@gmail.com>
2023-11-21 17:48:38 -08:00
Erick Friis
c5ae9f832d INFRA: Lint for imports (#13632)
- Adds pydantic/import linting to core
- Adds a check for `langchain_experimental` imports to langchain
2023-11-21 17:42:56 -08:00
Erick Friis
131db4ba68 BUGFIX: anthropic models on bedrock (#13629)
Introduced in #13403
2023-11-21 17:40:29 -08:00
David Ruan
04bddbaba4 BUGFIX: Update bedrock.py to fix provider bug (#13646)
Provider check was incorrectly failing for anything other than "meta"
2023-11-21 17:28:38 -08:00
Guangya Liu
aec8715073 DOCS: remove openai api key from cookbook (#13633) 2023-11-21 17:25:06 -08:00
Guangya Liu
bb18b0266e DOCS: fixed import error for BashOutputParser (#13680) 2023-11-21 16:33:40 -08:00
Bagatur
dc53523837 IMPROVEMENT: bump core dep 0.0.3 (#13690) 2023-11-21 15:50:19 -08:00
Bagatur
a208abe6b7 add callback import test (#13689) 2023-11-21 15:28:49 -08:00
Bagatur
083afba697 BUG: Add core utils imports (#13688) 2023-11-21 15:25:47 -08:00
Bagatur
c61e30632e BUG: more core fixes (#13665)
Fix some circular deps:
- move PromptValue into top level module bc both PromptTemplates and
OutputParsers import
- move tracer context vars to `tracers.context` and import them in
functions in `callbacks.manager`
- add core import tests
2023-11-21 15:15:48 -08:00
William FH
59df16ab92 Update name (#13676) 2023-11-21 13:39:30 -08:00
Erick Friis
bfb980b968 CLI 0.0.19 (#13677) 2023-11-21 12:34:38 -08:00
Taqi Jaffri
d65c36d60a docugami cookbook (#13183)
Adds a cookbook for semi-structured RAG via Docugami. This follows the
same outline as the semi-structured RAG with Unstructured cookbook:
https://github.com/langchain-ai/langchain/blob/master/cookbook/Semi_Structured_RAG.ipynb

The main change is this cookbook uses Docugami instead of Unstructured
to find text and tables, and shows how XML markup in the output helps
with retrieval and generation.

We are \@docugami on twitter, I am \@tjaffri

---------

Co-authored-by: Taqi Jaffri <tjaffri@docugami.com>
2023-11-21 12:02:20 -08:00
jakerachleff
249c796785 update langserve to v0.0.30 (#13673)
Upgrade langserve template version to 0.0.30 to include new improvements
2023-11-21 11:17:47 -08:00
jakerachleff
c6937a2eb4 fix templates dockerfile (#13672)
- **Description:** We need to update the Dockerfile for templates to
also copy your README.md. This is because poetry requires that a readme
exists if it is specified in the pyproject.toml
2023-11-21 11:09:55 -08:00
Bagatur
11614700a4 bump 0.0.339rc0 (#13664) 2023-11-21 08:41:59 -08:00
Bagatur
d32e511826 REFACTOR: Refactor langchain_core (#13627)
Changes:
- remove langchain_core/schema since no clear distinction b/n schema and
non-schema modules
- make every module that doesn't end in -y plural
- where easy have 1-2 classes per file
- no more than one level of nesting in directories
- only import from top level core modules in langchain
2023-11-21 08:35:29 -08:00
William FH
17c6551c18 Add error rate (#13568)
To the in-memory outputs. Separate it out from the outputs so it's
present in the dataframe.describe() results
2023-11-21 07:51:30 -08:00
Nuno Campos
8329f81072 Use pytest asyncio auto mode (#13643)
<!-- Thank you for contributing to LangChain!

Replace this entire comment with:
  - **Description:** a description of the change, 
  - **Issue:** the issue # it fixes (if applicable),
  - **Dependencies:** any dependencies required for this change,
- **Tag maintainer:** for a quicker response, tag the relevant
maintainer (see below),
- **Twitter handle:** we announce bigger features on Twitter. If your PR
gets announced, and you'd like a mention, we'll gladly shout you out!

Please make sure your PR is passing linting and testing before
submitting. Run `make format`, `make lint` and `make test` to check this
locally.

See contribution guidelines for more information on how to write/run
tests, lint, etc:

https://github.com/langchain-ai/langchain/blob/master/.github/CONTRIBUTING.md

If you're adding a new integration, please include:
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in `docs/extras`
directory.

If no one reviews your PR within a few days, please @-mention one of
@baskaryan, @eyurtsev, @hwchase17.
 -->
2023-11-21 15:00:13 +00:00
Lance Martin
611e1e0ca4 Add template for gpt-crawler (#13625)
Template for RAG using
[gpt-crawler](https://github.com/BuilderIO/gpt-crawler).

---------

Co-authored-by: Erick Friis <erick@langchain.dev>
2023-11-20 21:32:57 -08:00
Bagatur
99b4f46cbe REFACTOR: Add core as dep (#13623) 2023-11-20 14:38:10 -08:00
Harrison Chase
d82cbf5e76 Separate out langchain_core package (#13577)
Co-authored-by: Nuno Campos <nuno@boringbits.io>
Co-authored-by: Bagatur <baskaryan@gmail.com>
Co-authored-by: Erick Friis <erick@langchain.dev>
2023-11-20 13:09:30 -08:00
Bagatur
4eec47b191 DOCS: update rag use case images (#13615) 2023-11-20 10:14:52 -08:00
Bagatur
e620347a83 RELEASE: bump 339 (#13613) 2023-11-20 09:56:43 -08:00
Ofer Mendelevitch
52e23e50b1 BUG: Fix search_kwargs in Vectara retriever (#13299)
- **Description:** fix a bug that prevented as_retriever() in Vectara to
use the desired input arguments
  - **Issue:** as_retriever did not pass the arguments properly
  - **Tag maintainer:** @baskaryan
  - **Twitter handle:** @ofermend
2023-11-20 09:44:43 -08:00
Holt Skinner
1c08dbfb33 IMPROVEMENT: Reduce post-processing time for DocAIParser (#13210)
- Remove `WrappedDocument` introduced in
https://github.com/langchain-ai/langchain/pull/11413
- https://github.com/googleapis/python-documentai-toolbox/issues/198 in
Document AI Toolbox to improve initialization time for `WrappedDocument`
object.

@lkuligin

@baskaryan

@hwchase17
2023-11-20 09:41:44 -08:00
Leonid Kuligin
f3fcdea574 fixed an UnboundLocalError when no documents are found (#12995)
Replace this entire comment with:
  - **Description:** fixed a bug
  - **Issue:** the issue # #12780
2023-11-20 09:41:14 -08:00
Stijn Tratsaert
b6f70d776b VertexAI LLM count_tokens method requires list of prompts (#13451)
I encountered this during summarization with VertexAI. I was receiving
an INVALID_ARGUMENT error, as it was trying to send a list of about
17000 single characters.

The [count_tokens
method](https://github.com/googleapis/python-aiplatform/blob/main/vertexai/language_models/_language_models.py#L658)
made available by Google takes in a list of prompts. It does not fail
for small texts, but it does for longer documents because the argument
list will be exceeding Googles allowed limit. Enforcing the list type
makes it work successfully.

This change will cast the input text to count to a list of that single
text so that the input format is always correct.

[Twitter](https://www.x.com/stijn_tratsaert)
2023-11-20 09:40:48 -08:00
Wang Wei
fe7b40cb2a feat: add ERNIE-Bot-4 Function Calling (#13320)
- **Description:** ERNIE-Bot-Chat-4 Large Language Model adds the
ability of `Function Calling` by passing parameters through the
`functions` parameter in the request. To simplify function calling for
ERNIE-Bot-Chat-4, the `create_ernie_fn_chain()` function has been added.
The definition and usage of the `create_ernie_fn_chain()` function is
similar to that of the `create_openai_fn_chain()` function.

Examples as the follows:

```
import json

from langchain.chains.ernie_functions import (
    create_ernie_fn_chain,
)
from langchain.chat_models import ErnieBotChat
from langchain.prompts import ChatPromptTemplate

def get_current_news(location: str) -> str:
    """Get the current news based on the location.'

    Args:
        location (str): The location to query.
    
    Returs:
        str: Current news based on the location.
    """

    news_info = {
        "location": location,
        "news": [
            "I have a Book.",
            "It's a nice day, today."
        ]
    }

    return json.dumps(news_info)

def get_current_weather(location: str, unit: str="celsius") -> str:
    """Get the current weather in a given location

    Args:
        location (str): location of the weather.
        unit (str): unit of the tempuature.
    
    Returns:
        str: weather in the given location.
    """

    weather_info = {
        "location": location,
        "temperature": "27",
        "unit": unit,
        "forecast": ["sunny", "windy"],
    }
    return json.dumps(weather_info)

llm = ErnieBotChat(model_name="ERNIE-Bot-4")
prompt = ChatPromptTemplate.from_messages(
    [
        ("human", "{query}"),
    ]
)

chain = create_ernie_fn_chain([get_current_weather, get_current_news], llm, prompt, verbose=True)
res = chain.run("北京今天的新闻是什么?")
print(res)
```

The running results of the above program are shown below:
```
> Entering new LLMChain chain...
Prompt after formatting:
Human: 北京今天的新闻是什么?



> Finished chain.
{'name': 'get_current_news', 'thoughts': '用户想要知道北京今天的新闻。我可以使用get_current_news工具来获取这些信息。', 'arguments': {'location': '北京'}}
```
2023-11-19 22:36:12 -08:00
Adilkhan Sarsen
10418ab0c1 DeepLake Backwards compatibility fix (#13388)
- **Description:** during search with DeepLake some people are facing
backwards compatibility issues, this PR fixes it by making search
accessible for the older datasets

---------

Co-authored-by: adolkhan <adilkhan.sarsen@alumni.nu.edu.kz>
2023-11-19 21:46:01 -08:00
Tyler Hutcherson
190952fe76 IMPROVEMENT: Minor redis improvements (#13381)
- **Description:**
- Fixes a `key_prefix` bug where passing it in on
`Redis.from_existing(...)` did not work properly. Updates doc strings
accordingly.
- Updates Redis filter classes logic with best practices on typing,
string formatting, and handling "empty" filters.
- Fixes a bug that would prevent multiple tag filters from being applied
together in some scenarios.
- Added a whole new filter unit testing module. Also updated code
formatting for a number of modules that were failing the `make`
commands.
  - **Issue:** N/A
  - **Dependencies:** N/A
  - **Tag maintainer:** @baskaryan 
  - **Twitter handle:** @tchutch94
2023-11-19 19:15:45 -08:00
Sijun He
674bd90a47 DOCS: Fix typo in MongoDB memory docs (#13588)
- **Description:** Fix typo in MongoDB memory docs
  - **Tag maintainer:** @eyurtsev

<!-- Thank you for contributing to LangChain!

  - **Description:** Fix typo in MongoDB memory docs
  - **Issue:** the issue # it fixes (if applicable),
  - **Dependencies:** any dependencies required for this change,
  - **Tag maintainer:** @baskaryan
- **Twitter handle:** we announce bigger features on Twitter. If your PR
gets announced, and you'd like a mention, we'll gladly shout you out!

Please make sure your PR is passing linting and testing before
submitting. Run `make format`, `make lint` and `make test` to check this
locally.

See contribution guidelines for more information on how to write/run
tests, lint, etc:

https://github.com/langchain-ai/langchain/blob/master/.github/CONTRIBUTING.md

If you're adding a new integration, please include:
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in `docs/extras`
directory.

If no one reviews your PR within a few days, please @-mention one of
@baskaryan, @eyurtsev, @hwchase17.
 -->
2023-11-19 19:13:35 -08:00
Sergey Kozlov
df03267edf Fix tool arguments formatting in StructuredChatAgent (#10480)
In the `FORMAT_INSTRUCTIONS` template, 4 curly braces (escaping) are
used to get single curly brace after formatting:

```
"{{{ ... }}}}" -> format_instructions.format() ->  "{{ ... }}" -> template.format() -> "{ ... }".
```

Tool's `args_schema` string contains single braces `{ ... }`, and is
also transformed to `{{{{ ... }}}}` form. But this is not really correct
since there is only one `format()` call:

```
"{{{{ ... }}}}" -> template.format() -> "{{ ... }}".
```

As a result we get double curly braces in the prompt:
````
Respond to the human as helpfully and accurately as possible. You have access to the following tools:

foo: Test tool FOO, args: {{'tool_input': {{'type': 'string'}}}}    # <--- !!!
...
Provide only ONE action per $JSON_BLOB, as shown:

```
{
  "action": $TOOL_NAME,
  "action_input": $INPUT
}
```
````

This PR fixes curly braces escaping in the `args_schema` to have single
braces in the final prompt:
````
Respond to the human as helpfully and accurately as possible. You have access to the following tools:

foo: Test tool FOO, args: {'tool_input': {'type': 'string'}}    # <--- !!!
...
Provide only ONE action per $JSON_BLOB, as shown:

```
{
  "action": $TOOL_NAME,
  "action_input": $INPUT
}
```
````

---------

Co-authored-by: Sergey Kozlov <sergey.kozlov@ludditelabs.io>
2023-11-19 18:45:43 -08:00
Wouter Durnez
ef7802b325 Add llama2-13b-chat-v1 support to chat_models.BedrockChat (#13403)
Hi 👋 We are working with Llama2 on Bedrock, and would like to add it to
Langchain. We saw a [pull
request](https://github.com/langchain-ai/langchain/pull/13322) to add it
to the `llm.Bedrock` class, but since it concerns a chat model, we would
like to add it to `BedrockChat` as well.

- **Description:** Add support for Llama2 to `BedrockChat` in
`chat_models`
- **Issue:** the issue # it fixes (if applicable)
[#13316](https://github.com/langchain-ai/langchain/issues/13316)
  - **Dependencies:** any dependencies required for this change `None`
  - **Tag maintainer:** /
  - **Twitter handle:** `@SimonBockaert @WouterDurnez`

---------

Co-authored-by: wouter.durnez <wouter.durnez@showpad.com>
Co-authored-by: Simon Bockaert <simon.bockaert@showpad.com>
2023-11-19 18:44:58 -08:00
jwbeck97
a93616e972 FEAT: Add azure cognitive health tool (#13448)
- **Description:** This change adds an agent to the Azure Cognitive
Services toolkit for identifying healthcare entities
  - **Dependencies:** azure-ai-textanalytics (Optional)

---------

Co-authored-by: James Beck <James.Beck@sa.gov.au>
Co-authored-by: Bagatur <baskaryan@gmail.com>
2023-11-19 18:44:01 -08:00
Massimiliano Pronesti
6bf9b2cb51 BUG: Limit Azure OpenAI embeddings chunk size (#13425)
Hi! 
This short PR aims at:
* Fixing `OpenAIEmbeddings`' check on `chunk_size` when used with Azure
OpenAI (thus with openai < 1.0). Azure OpenAI embeddings support at most
16 chunks per batch, I believe we are supposed to take the min between
the passed value/default value and 16, not the max - which, I suppose,
was introduced by accident while refactoring the previous version of
this check from this other PR of mine: #10707
* Porting this fix to the newest class (`AzureOpenAIEmbeddings`) for
openai >= 1.0

This fixes #13539 (closed but the issue persists).  

@baskaryan @hwchase17
2023-11-19 18:34:51 -08:00
Zeyang Lin
e53f59f01a DOCS: doc-string - langchain.vectorstores.dashvector.DashVector (#13502)
- **Description:** There are several mistakes in the sample code in the
doc-string of `DashVector` class, and this pull request aims to correct
them.
The correction code has been tested against latest version (at the time
of creation of this pull request) of: `langchain==0.0.336`
`dashvector==1.0.6` .
- **Issue:** No issue is created for this.
- **Dependencies:** No dependency is required for this change,
<!-- - **Tag maintainer:** for a quicker response, tag the relevant
maintainer (see below), -->
- **Twitter handle:** `zeyanglin`

<!-- Please make sure your PR is passing linting and testing before
submitting. Run `make format`, `make lint` and `make test` to check this
locally.

See contribution guidelines for more information on how to write/run
tests, lint, etc:

https://github.com/langchain-ai/langchain/blob/master/.github/CONTRIBUTING.md

If you're adding a new integration, please include:
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in `docs/extras`
directory.

If no one reviews your PR within a few days, please @-mention one of
@baskaryan, @eyurtsev, @hwchase17.
-->
2023-11-19 18:24:05 -08:00
John Mai
16f7912e1b BUG: fix hunyuan appid type (#13496)
- **Description: fix hunyuan appid type
- **Issue:
https://github.com/langchain-ai/langchain/pull/12022#issuecomment-1815627855
2023-11-19 18:23:45 -08:00
Leonid Ganeline
43972be632 docs updating AzureML notebooks (#13492)
- Added/updated descriptions and links

---------

Co-authored-by: Erick Friis <erick@langchain.dev>
2023-11-19 18:07:12 -08:00
Nicolò Boschi
8362bd729b AstraDB: use includeSimilarity option instead of $similarity (#13512)
- **Description:** AstraDB is going to deprecate the `$similarity`
projection property in favor of the ´includeSimilarity´ option flag. I
moved all the queries to the new format.
- **Tag maintainer:** @hemidactylus 
- **Twitter handle:** nicoloboschi
2023-11-19 17:54:35 -08:00
shumpei
7100d586ef Introduce search_kwargs for Custom Parameters in BingSearchAPIWrapper (#13525)
Added a `search_kwargs` field to BingSearchAPIWrapper in
`bing_search.py,` enabling users to include extra keyword arguments in
Bing search queries. This update, like specifying language preferences,
adds more customization to searches. The `search_kwargs` seamlessly
merge with standard parameters in `_bing_search_results` method.

---------

Co-authored-by: Erick Friis <erick@langchain.dev>
2023-11-19 17:51:02 -08:00
Nicolò Boschi
ad0c3b9479 Fix Astra integration tests (#13520)
- **Description:** Fix Astra integration tests that are failing. The
`delete` always return True as the deletion is successful if no errors
are thrown. I aligned the test to verify this behaviour
  - **Tag maintainer:** @hemidactylus 
  - **Twitter handle:** nicoloboschi

---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
2023-11-19 17:50:49 -08:00
umair mehmood
69d39e2173 fix: VLLMOpenAI -- create() got an unexpected keyword argument 'api_key' (#13517)
The issue was accuring because of `openai` update in Completions. its
not accepting `api_key` and 'api_base' args.

The fix is we check for the openai version and if ats v1 then remove
these keys from args before passing them to `Compilation.create(...)`
when sending from `VLLMOpenAI`

Fixed: #13507 

@eyu
@efriis 
@hwchase17

---------

Co-authored-by: Erick Friis <erick@langchain.dev>
2023-11-19 17:49:55 -08:00
Manuel Alemán Cueto
6bc08266e0 Fix for oracle schema parsing stated on the issue #7928 (#13545)
- **Description:** In this pull request, we address an issue related to
assigning a schema to the SQLDatabase class when utilizing an Oracle
database. The current implementation encounters a bug where, upon
attempting to execute a query, the alter session parse is not
appropriately defined for Oracle, leading to an error,
  - **Issue:** #7928,
  - **Dependencies:** No dependencies,
  - **Tag maintainer:** @baskaryan,

---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
2023-11-19 17:35:27 -08:00
Andrew Teeter
325bdac673 feat: load all namespaces (#13549)
- **Description:** This change allows for the `MWDumpLoader` to load all
namespaces including custom by default instead of only loading the
[default
namespaces](https://www.mediawiki.org/wiki/Help:Namespaces#Localisation).
  - **Tag maintainer:** @hwchase17
2023-11-19 17:35:17 -08:00
Taranjeet Singh
47451764a7 Add embedchain retriever (#13553)
**Description:**

This commit adds embedchain retriever along with tests and docs.
Embedchain is a RAG framework to create data pipelines.

**Twitter handle:**
- [Taranjeet's twitter](https://twitter.com/taranjeetio) and
[Embedchain's twitter](https://twitter.com/embedchain)

**Reviewer**
@hwchase17

---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
2023-11-19 17:35:03 -08:00
rafly lesmana
420a17542d fix: Make YoutubeLoader support on demand language translation (#13583)
**Description:**
Enhance the functionality of YoutubeLoader to enable the translation of
available transcripts by refining the existing logic.

**Issue:**
Encountering a problem with YoutubeLoader (#13523) where the translation
feature is not functioning as expected.

Tag maintainers/contributors who might be interested:
@eyurtsev

---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
2023-11-19 17:34:48 -08:00
Leonid Ganeline
cc50e023d1 DOCS langchain decorators update (#13535)
added disclaimer

---------

Co-authored-by: Erick Friis <erickfriis@gmail.com>
2023-11-19 17:30:05 -08:00
Brace Sproul
02a13030c0 DOCS: updated langchain stack img to be svg (#13540) 2023-11-19 16:26:53 -08:00
Bagatur
78a1f4b264 bump 338, exp 42 (#13564) 2023-11-18 15:12:07 -08:00
Bagatur
790ed8be69 update multi index templates (#13569) 2023-11-18 14:42:22 -08:00
Harrison Chase
f4c0e3cc15 move streaming stdout (#13559) 2023-11-18 12:24:49 -05:00
Leonid Ganeline
43dad6cb91 BUG fixed openai_assistant namespace (#13543)
BUG: langchain.agents.openai_assistant has a reference as
`from langchain_experimental.openai_assistant.base import
OpenAIAssistantRunnable`
should be 
`from langchain.agents.openai_assistant.base import
OpenAIAssistantRunnable`

This prevents building of the API Reference docs
2023-11-17 17:15:33 -08:00
Bassem Yacoube
ff382b7b1b IMPROVEMENT Adds support for new OctoAI endpoints (#13521)
small fix to add support for new OctoAI LLM endpoints
2023-11-17 17:15:21 -08:00
Mark Silverberg
cda1b33270 Fix typo/line break in the middle of a word (#13314)
- **Description:** a simple typo/extra line break fix
  - **Dependencies:** none
2023-11-17 16:43:42 -08:00
William FH
cac849ae86 Use random seed (#13544)
For default eval llm
2023-11-17 16:33:31 -08:00
Martin Krasser
79ed66f870 EXPERIMENTAL Generic LLM wrapper to support chat model interface with configurable chat prompt format (#8295)
## Update 2023-09-08

This PR now supports further models in addition to Lllama-2 chat models.
See [this comment](#issuecomment-1668988543) for further details. The
title of this PR has been updated accordingly.

## Original PR description

This PR adds a generic `Llama2Chat` model, a wrapper for LLMs able to
serve Llama-2 chat models (like `LlamaCPP`,
`HuggingFaceTextGenInference`, ...). It implements `BaseChatModel`,
converts a list of chat messages into the [required Llama-2 chat prompt
format](https://huggingface.co/blog/llama2#how-to-prompt-llama-2) and
forwards the formatted prompt as `str` to the wrapped `LLM`. Usage
example:

```python
# uses a locally hosted Llama2 chat model
llm = HuggingFaceTextGenInference(
    inference_server_url="http://127.0.0.1:8080/",
    max_new_tokens=512,
    top_k=50,
    temperature=0.1,
    repetition_penalty=1.03,
)

# Wrap llm to support Llama2 chat prompt format.
# Resulting model is a chat model
model = Llama2Chat(llm=llm)

messages = [
    SystemMessage(content="You are a helpful assistant."),
    MessagesPlaceholder(variable_name="chat_history"),
    HumanMessagePromptTemplate.from_template("{text}"),
]

prompt = ChatPromptTemplate.from_messages(messages)
memory = ConversationBufferMemory(memory_key="chat_history", return_messages=True)
chain = LLMChain(llm=model, prompt=prompt, memory=memory)

# use chat model in a conversation
# ...
```

Also part of this PR are tests and a demo notebook.

- Tag maintainer: @hwchase17
- Twitter handle: `@mrt1nz`

---------

Co-authored-by: Erick Friis <erick@langchain.dev>
2023-11-17 16:32:13 -08:00
William FH
c56faa6ef1 Add execution time (#13542)
And warn instead of raising an error, since the chain API is too
inconsistent.
2023-11-17 16:04:16 -08:00
pedro-inf-custodio
0fb5f857f9 IMPROVEMENT WebResearchRetriever error handling in urls with connection error (#13401)
- **Description:** Added a method `fetch_valid_documents` to
`WebResearchRetriever` class that will test the connection for every url
in `new_urls` and remove those that raise a `ConnectionError`.
- **Issue:** [Previous
PR](https://github.com/langchain-ai/langchain/pull/13353),
  - **Dependencies:** None,
  - **Tag maintainer:** @efriis 

Please make sure your PR is passing linting and testing before
submitting. Run `make format`, `make lint` and `make test` to check this
locally.

See contribution guidelines for more information on how to write/run
tests, lint, etc:

https://github.com/langchain-ai/langchain/blob/master/.github/CONTRIBUTING.md

If you're adding a new integration, please include:
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in `docs/extras`
directory.

If no one reviews your PR within a few days, please @-mention one of
@baskaryan, @eyurtsev, @hwchase17.
2023-11-17 14:02:26 -08:00
Piyush Jain
d2335d0114 IMPROVEMENT Neptune graph updates (#13491)
## Description
This PR adds an option to allow unsigned requests to the Neptune
database when using the `NeptuneGraph` class.

```python
graph = NeptuneGraph(
    host='<my-cluster>',
    port=8182,
    sign=False
)
```

Also, added is an option in the `NeptuneOpenCypherQAChain` to provide
additional domain instructions to the graph query generation prompt.
This will be injected in the prompt as-is, so you should include any
provider specific tags, for example `<instructions>` or `<INSTR>`.

```python
chain = NeptuneOpenCypherQAChain.from_llm(
    llm=llm,
    graph=graph,
    extra_instructions="""
    Follow these instructions to build the query:
    1. Countries contain airports, not the other way around
    2. Use the airport code for identifying airports
    """
)
```
2023-11-17 13:49:31 -08:00
William FH
5a28dc3210 Override Keys Option (#13537)
Should be able to override the global key if you want to evaluate
different outputs in a single run
2023-11-17 13:32:43 -08:00
Bagatur
e584b28c54 bump 337 (#13534) 2023-11-17 12:50:52 -08:00
Wietse Venema
e80b53ff4f TEMPLATE Add VertexAI Chuck Norris template (#13531)
<!-- Thank you for contributing to LangChain!

Replace this entire comment with:
  - **Description:** a description of the change, 
  - **Issue:** the issue # it fixes (if applicable),
  - **Dependencies:** any dependencies required for this change,
- **Tag maintainer:** for a quicker response, tag the relevant
maintainer (see below),
- **Twitter handle:** we announce bigger features on Twitter. If your PR
gets announced, and you'd like a mention, we'll gladly shout you out!

Please make sure your PR is passing linting and testing before
submitting. Run `make format`, `make lint` and `make test` to check this
locally.

See contribution guidelines for more information on how to write/run
tests, lint, etc:

https://github.com/langchain-ai/langchain/blob/master/.github/CONTRIBUTING.md

If you're adding a new integration, please include:
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in `docs/extras`
directory.

If no one reviews your PR within a few days, please @-mention one of
@baskaryan, @eyurtsev, @hwchase17.
 -->

---------

Co-authored-by: Erick Friis <erick@langchain.dev>
2023-11-17 12:27:52 -08:00
Bagatur
2e2114d2d0 FEATURE: Runnable with message history (#13418)
Add RunnableWithMessageHistory class that can wrap certain runnables and manages chat history for them.
2023-11-17 12:00:01 -08:00
Bagatur
0fc3af8932 IMPROVEMENT: update assistants output and doc (#13480) 2023-11-17 11:58:54 -08:00
Bagatur
b4312aac5c TEMPLATES: Add multi-index templates (#13490)
One that routes and one that fuses

---------

Co-authored-by: Erick Friis <erick@langchain.dev>
2023-11-17 02:00:11 -08:00
Hugues Chocart
35e04f204b [LLMonitorCallbackHandler] Various improvements (#13151)
Small improvements for the llmonitor callback handler, like better
support for non-openai models.


---------

Co-authored-by: vincelwt <vince@lyser.io>
2023-11-16 23:39:36 -08:00
Noah Stapp
c1b041c188 Add Wrapping Library Metadata to MongoDB vector store (#13084)
**Description**
MongoDB drivers are used in various flavors and languages. Making sure
we exercise our due diligence in identifying the "origin" of the library
calls makes it best to understand how our Atlas servers get accessed.
2023-11-16 22:20:04 -08:00
Leonid Ganeline
21552628c8 DOCS updated data_connection index page (#13426)
- the `Index` section was missed. Created it.
- text simplification

---------

Co-authored-by: Erick Friis <erick@langchain.dev>
2023-11-16 18:16:50 -08:00
Guy Korland
7f8fd70ac4 Add optional arguments to FalkorDBGraph constructor (#13459)
**Description:** Add optional arguments to FalkorDBGraph constructor
**Tag maintainer:** baskaryan 
**Twitter handle:** @g_korland
2023-11-16 18:15:40 -08:00
Leonid Ganeline
e3a5cd7969 docs integrations/vectorstores/ cleanup (#13487)
- updated titles to consistent format
- added/updated descriptions and links
- format heading
2023-11-16 17:51:49 -08:00
Leonid Ganeline
1d2981114f DOCS updated async-faiss example (#13434)
The original notebook has the `faiss` title which is duplicated in
the`faiss.jpynb`. As a result, we have two `faiss` items in the
vectorstore ToC. And the first item breaks the searching order (it is
placed between `A...` items).
- I updated title to `Asynchronous Faiss`.
2023-11-16 17:41:26 -08:00
Erick Friis
9dfad613c2 IMPROVEMENT Allow openai v1 in all templates that require it (#13489)
- pyproject change
- lockfiles
2023-11-16 17:10:08 -08:00
chris stucchio
d7f014cd89 Bug: OpenAIFunctionsAgentOutputParser doesn't handle functions with no args (#13467)
**Description/Issue:** 
When OpenAI calls a function with no args, the args are `""` rather than
`"{}"`. Then `json.loads("")` blows up. This PR handles it correctly.

**Dependencies:** None
2023-11-16 16:47:05 -08:00
Yujie Qian
41a433fa33 IMPROVEMENT: add input_type to VoyageEmbeddings (#13488)
- **Description:** add input_type to VoyageEmbeddings
2023-11-16 16:35:36 -08:00
David Duong
ea6e017b85 Add serialisation arguments to Bedrock and ChatBedrock (#13465) 2023-11-17 01:33:24 +01:00
Erick Friis
427331d621 IMPROVEMENT Lock pydantic v1 in app template, cli 0.0.18 (#13485) 2023-11-16 15:22:11 -08:00
Erick Friis
75363f048f BUG Fix app_name in cli app new (#13482) 2023-11-16 14:19:35 -08:00
Leonid Ganeline
9ff8f69e75 DOCS updated memory Titles (#13435)
- Fixed titles for two notebooks. They were inconsistent with other
titles and clogged ToC.
- Added `Upstash` description and link
- Moved the authentication text up in the `Elasticsearch` nb, right
after package installation. It was on the end of the page which was a
wrong place.
2023-11-16 13:24:05 -08:00
ifduyue
324ab382ad Use List instead of list (#13443)
Unify List usages in libs/langchain/langchain/text_splitter.py, only one
place it's `list`, all other ocurrences are `List`
2023-11-16 13:15:58 -08:00
Stefano Lottini
b029d9f4e6 Astra DB: minor improvements to docstrings and demo notebook (#13449)
This PR brings a few minor improvements to the docs, namely class/method
docstrings and the demo notebook.

- A note on how to control concurrency levels to tune performance in
bulk inserts, both in the class docstring and the demo notebook;
- Slightly increased concurrency defaults after careful experimentation
(still on the conservative side even for clients running on
less-than-typical network/hardware specs)
- renamed the DB token variable to the standardized
`ASTRA_DB_APPLICATION_TOKEN` name (used elsewhere, e.g. in the Astra DB
docs)
- added a note and a reference (add_text docstring, demo notebook) on
allowed metadata field names.

Thank you!
2023-11-16 12:48:32 -08:00
Eugene Yurtsev
1e43fd6afe Add ahandle_event to _all_ (#13469)
Add ahandle_event for backwards compatibility as it is used by langserve
2023-11-16 12:46:20 -08:00
Leonid Ganeline
283ef1f66d DOCS fix for integratons/document_loaders sidebar (#13471)
The current `integrations/document_loaders/` sidebar has the
`example_data` item, which is a menu with a single item: "Notebook".
It is happening because the `integrations/document_loaders/` folder has
the `example_data/notebook.md` file that is used to autogenerate the
above menu item.
- removed an example_data/notebook.md file. Docusaurus doesn't have
simple ways to fix this problem (to exclude folders/files from an
autogenerated sidebar). Removing this file didn't break any existing
examples, so this fix is safe.
2023-11-16 12:02:30 -08:00
Leonid Ganeline
b1fcf5b481 DOCS: integrations/text_embeddings/ cleanup (#13476)
Updated several notebooks:
- fixed titles which are inconsistent or break the ToC sorting order.
- added missed soruce descriptions and links
- fixed formatting
2023-11-16 11:56:53 -08:00
Bagatur
6030ab9779 Update chain of note README.md (#13473) 2023-11-16 10:47:27 -08:00
Lance Martin
cf66a4737d Update multi-modal RAG cookbook (#13429)
Use example
[blog](https://cloudedjudgement.substack.com/p/clouded-judgement-111023)
w/ tables, charts as images.
2023-11-16 10:34:13 -08:00
Bagatur
10fddac4b5 Bagatur/chain of note template(#13470) 2023-11-16 10:34:04 -08:00
Leonid Ganeline
d5b1a21ae4 DOCS updated semadb example (#13431)
- the `SemaDB` notebook was placed in additional subfolder which breaks
the vectorstore ToC. I moved file up, removed this unnecessary
subfolder; updated the `vercel.json` with rerouting for the new URL
- Added SemaDB description and link
- improved text consistency
2023-11-16 09:57:22 -08:00
Leonid Ganeline
17c2007e0c DOCS updated Activeloop DeepMemory notebook (#13428)
- Fixed the title of the notebook. It created an ugly ToC element as
`Activeloop DeepLake's DeepMemory + LangChain + ragas or how to get +27%
on RAG recall.`
- Added Activeloop description
- improved consistency in text
- fixed ToC (it was using HTML tagas that break left-side in-page ToC).
Now in-page ToC works
2023-11-16 09:56:28 -08:00
Harrison Chase
f90249305a callback refactor (#13372)
Co-authored-by: Nuno Campos <nuno@boringbits.io>
2023-11-16 08:25:09 -08:00
Bagatur
9e6748e198 DOCS: rag nit (#13436) 2023-11-15 18:06:52 -08:00
Leonid Ganeline
8a52c1456b updated clickup example (#13424)
- Fixed headers (was more then 1 Titles)
- Removed security token value. It was OK to have it, because it is
temporary token, but the automatic security swippers raise warnings on
that.
- Added `ClickUp` service description and link.
2023-11-15 15:11:24 -08:00
Brace Sproul
79fa9a81f4 Fix a link in docs (#13423) 2023-11-15 15:02:26 -08:00
Nuno Campos
a632f61f3d IMPROVEMENT pirate-speak-configurable alternatives env vars (#13395)
…rnative LLMs until used

<!-- Thank you for contributing to LangChain!

Replace this entire comment with:
  - **Description:** a description of the change, 
  - **Issue:** the issue # it fixes (if applicable),
  - **Dependencies:** any dependencies required for this change,
- **Tag maintainer:** for a quicker response, tag the relevant
maintainer (see below),
- **Twitter handle:** we announce bigger features on Twitter. If your PR
gets announced, and you'd like a mention, we'll gladly shout you out!

Please make sure your PR is passing linting and testing before
submitting. Run `make format`, `make lint` and `make test` to check this
locally.

See contribution guidelines for more information on how to write/run
tests, lint, etc:

https://github.com/langchain-ai/langchain/blob/master/.github/CONTRIBUTING.md

If you're adding a new integration, please include:
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in `docs/extras`
directory.

If no one reviews your PR within a few days, please @-mention one of
@baskaryan, @eyurtsev, @hwchase17.
 -->
2023-11-15 14:38:03 -08:00
Bagatur
f0bb839506 DOCS: langchain stack img update (#13421) 2023-11-15 14:10:02 -08:00
Bagatur
a9b2c943e6 bump 336, exp 44 (#13420) 2023-11-15 14:08:34 -08:00
Bagatur
1372296dc8 FIX: Infer runnable agent single or multi action (#13412) 2023-11-15 13:58:14 -08:00
Eugene Yurtsev
accadccf8e Use secretstr for api keys for javelin-ai-gateway (#13417)
- Make javelin_ai_gateway_api_key a SecretStr

---------

Co-authored-by: Hiroshi Tashiro <hiroshitash@gmail.com>
2023-11-15 16:12:05 -05:00
William FH
ba501b27a0 Fix Runnable Lambda Afunc Repr (#13413)
Otherwise, you get an error when using async functions.


h/t to Chris Ruppelt
2023-11-15 16:11:42 -05:00
Sumukh Sridhara
1726d5dcdd Merge pull request #13232
* PGVector needs to close its connection if its garbage collected
2023-11-15 15:34:37 -05:00
Nuno Campos
85a77d2c27 IMPROVEMENT Passthrough kwargs in runnable lambda (#13405)
<!-- Thank you for contributing to LangChain!

Replace this entire comment with:
  - **Description:** a description of the change, 
  - **Issue:** the issue # it fixes (if applicable),
  - **Dependencies:** any dependencies required for this change,
- **Tag maintainer:** for a quicker response, tag the relevant
maintainer (see below),
- **Twitter handle:** we announce bigger features on Twitter. If your PR
gets announced, and you'd like a mention, we'll gladly shout you out!

Please make sure your PR is passing linting and testing before
submitting. Run `make format`, `make lint` and `make test` to check this
locally.

See contribution guidelines for more information on how to write/run
tests, lint, etc:

https://github.com/langchain-ai/langchain/blob/master/.github/CONTRIBUTING.md

If you're adding a new integration, please include:
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in `docs/extras`
directory.

If no one reviews your PR within a few days, please @-mention one of
@baskaryan, @eyurtsev, @hwchase17.
 -->
2023-11-15 11:45:16 -08:00
Bagatur
76c317ed78 DOCS: update rag use case (#13319) 2023-11-15 10:54:15 -08:00
Bagatur
a0b39a4325 DOCS: install nit (#13380) 2023-11-15 10:27:00 -08:00
Clay Elmore
8823e3831f FEAT Bedrock cohere embedding support (#13366)
- **Description:** adding cohere embedding support to bedrock embedding
class
  - **Issue:** N/A
  - **Dependencies:** None
  - **Tag maintainer:** @3coins 
  - **Twitter handle:** celmore25

---------

Co-authored-by: Erick Friis <erick@langchain.dev>
2023-11-15 10:19:12 -08:00
Bagatur
9f543634e2 Agent window management how to (#13033) 2023-11-15 09:38:02 -08:00
Nuno Campos
d5aeff706a Make it easier to subclass RunnableEach (#13346)
<!-- Thank you for contributing to LangChain!

Replace this entire comment with:
  - **Description:** a description of the change, 
  - **Issue:** the issue # it fixes (if applicable),
  - **Dependencies:** any dependencies required for this change,
- **Tag maintainer:** for a quicker response, tag the relevant
maintainer (see below),
- **Twitter handle:** we announce bigger features on Twitter. If your PR
gets announced, and you'd like a mention, we'll gladly shout you out!

Please make sure your PR is passing linting and testing before
submitting. Run `make format`, `make lint` and `make test` to check this
locally.

See contribution guidelines for more information on how to write/run
tests, lint, etc:

https://github.com/langchain-ai/langchain/blob/master/.github/CONTRIBUTING.md

If you're adding a new integration, please include:
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in `docs/extras`
directory.

If no one reviews your PR within a few days, please @-mention one of
@baskaryan, @eyurtsev, @hwchase17.
 -->
2023-11-15 13:12:57 +00:00
Erick Friis
bed06a4f4a IMPROVEMENT research-assistant configurable report type (#13312)
<!-- Thank you for contributing to LangChain!

Replace this entire comment with:
  - **Description:** a description of the change, 
  - **Issue:** the issue # it fixes (if applicable),
  - **Dependencies:** any dependencies required for this change,
- **Tag maintainer:** for a quicker response, tag the relevant
maintainer (see below),
- **Twitter handle:** we announce bigger features on Twitter. If your PR
gets announced, and you'd like a mention, we'll gladly shout you out!

Please make sure your PR is passing linting and testing before
submitting. Run `make format`, `make lint` and `make test` to check this
locally.

See contribution guidelines for more information on how to write/run
tests, lint, etc:

https://github.com/langchain-ai/langchain/blob/master/.github/CONTRIBUTING.md

If you're adding a new integration, please include:
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in `docs/extras`
directory.

If no one reviews your PR within a few days, please @-mention one of
@baskaryan, @eyurtsev, @hwchase17.
 -->

---------

Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
2023-11-14 21:04:57 -08:00
竹内謙太
3b5e8bacfa FEAT Add some properties to NotionDBLoader (#13358)
<!-- Thank you for contributing to LangChain!

Replace this entire comment with:
  - **Description:** a description of the change, 
  - **Issue:** the issue # it fixes (if applicable),
  - **Dependencies:** any dependencies required for this change,
- **Tag maintainer:** for a quicker response, tag the relevant
maintainer (see below),
- **Twitter handle:** we announce bigger features on Twitter. If your PR
gets announced, and you'd like a mention, we'll gladly shout you out!

Please make sure your PR is passing linting and testing before
submitting. Run `make format`, `make lint` and `make test` to check this
locally.

See contribution guidelines for more information on how to write/run
tests, lint, etc:

https://github.com/langchain-ai/langchain/blob/master/.github/CONTRIBUTING.md

If you're adding a new integration, please include:
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in `docs/extras`
directory.

If no one reviews your PR within a few days, please @-mention one of
@baskaryan, @eyurtsev, @hwchase17.
 -->

fix #13356

Add supports following properties for metadata to NotionDBLoader.

- `checkbox`
- `email`
- `number`
- `select`

There are no relevant tests for this code to be updated.
2023-11-14 20:31:12 -08:00
Leonid Ganeline
c9b9359647 FEAT docs integration cards site (#13379)
The `Integrations` site is hidden now.
I've added it into the `More` menu.
The name is `Integration Cards` otherwise, it is confused with the
`Integrations` menu.

---------

Co-authored-by: Erick Friis <erickfriis@gmail.com>
2023-11-14 19:49:17 -08:00
Erick Friis
0f25ea9671 api doc newlines (#13378)
cc @leo-gan 

Deploying at
https://api.python.langchain.com/en/erick-api-doc-newlines-/api_reference.html
(will take a bit)
2023-11-14 19:16:31 -08:00
Fielding Johnston
37eb44c591 BUG Add limit_to_domains to APIChain based tools (#13367)
- **Description:** Adds `limit_to_domains` param to the APIChain based
tools (open_meteo, TMDB, podcast_docs, and news_api)
- **Issue:** I didn't open an issue, but after upgrading to 0.0.328
using these tools would throw an error.
  - **Dependencies:** N/A
  - **Tag maintainer:** @baskaryan 
  
  
**Note**: I included the trailing / simply because the docs here did
fc886cc303/docs/docs/use_cases/apis.ipynb (L246)
, but I checked the code and it is using `urlparse`. SoI followed the
docs since it comes down to stylee.
2023-11-14 19:07:16 -08:00
Predrag Gruevski
91443cacdb Update templates/rag-self-query with newer dependencies without CVEs. (#13362)
The `langchain` repo was being flagged for using vulnerable
dependencies, some of which were in this template's lockfile. Updating
to newer versions should fix that.
2023-11-14 19:06:18 -08:00
Predrag Gruevski
ac7e88fbbe Update rag-timescale-conversation to dependencies without CVEs. (#13364)
Just `poetry lock` and moving `langchain` to the latest version, in case
folks copy this template.

This resolves some vulnerable dependency alerts GitHub code scanning was
flagging.
2023-11-14 19:05:12 -08:00
Leonid Ganeline
342ed5c77a Yi model from 01.ai , example (#13375)
Added an example with new soa `Yi` model to `HuggingFace-hub` notebook
2023-11-14 17:10:53 -08:00
Bagatur
38180ad25f bump openai support (#13262) 2023-11-14 16:50:23 -08:00
Erick Friis
9545f0666d fix cli release (#13373)
My thought is that the ==version would prevent pip from finding the
package on regular [pypi.org](http://pypi.org/), so it would look at
[test.pypi.org](http://test.pypi.org/) for that. Otherwise it'll pull
package from [pypi.org](http://pypi.org/) (e.g. sub deps)

Right now, the cli release is failing because it's going to
test.pypi.org by default, so it finds this incorrect FASTAPI package
instead of the real one: https://test.pypi.org/project/FASTAPI/
2023-11-14 15:08:35 -08:00
Erick Friis
7c3066f9ec more cli interactivity, bugfix (#13360) 2023-11-14 14:49:43 -08:00
Bagatur
3596be5210 DOCS: format notebooks (#13371) 2023-11-14 14:17:44 -08:00
Predrag Gruevski
d63d4994c0 Bump all libraries to the latest ruff version. (#13350)
This version of `ruff` is the one we'll be using to lint the docs and
cookbooks (#12677), so I'm making it used everywhere else too.
2023-11-14 16:00:21 -05:00
Predrag Gruevski
2ebd167dba Lint Python notebooks with ruff. (#12677)
The new ruff version fixed the blocking bugs, and I was able to fairly
easily us to a passing state: ruff fixed some issues on its own, I fixed
a handful by hand, and I added a list of narrowly-targeted exclusions
for files that are currently failing ruff rules that we probably should
look into eventually.

I went pretty lenient on the docs / cookbooks rules, allowing dead code
and such things. Perhaps in the future we may want to tighten the rules
further, but this is already a good set of checks that found real issues
and will prevent them going forward.
2023-11-14 15:58:22 -05:00
Massimiliano Pronesti
344cab0739 IMPROVEMENT: support Openai API v1 for Azure OpenAI completions (#13231)
Hi,
this PR adds support for OpenAI API v1 for Azure OpenAI completion API.
@baskaryan @hwchase17

---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
2023-11-14 12:10:18 -08:00
dependabot[bot]
fc886cc303 Bump pyarrow from 13.0.0 to 14.0.1 in /libs/langchain (#13363)
Bumps [pyarrow](https://github.com/apache/arrow) from 13.0.0 to 14.0.1.
<details>
<summary>Commits</summary>
<ul>
<li><a
href="ba53748361"><code>ba53748</code></a>
MINOR: [Release] Update versions for 14.0.1</li>
<li><a
href="529f3768fa"><code>529f376</code></a>
MINOR: [Release] Update .deb/.rpm changelogs for 14.0.1</li>
<li><a
href="b84bbcac64"><code>b84bbca</code></a>
MINOR: [Release] Update CHANGELOG.md for 14.0.1</li>
<li><a
href="f141709763"><code>f141709</code></a>
<a
href="https://redirect.github.com/apache/arrow/issues/38607">GH-38607</a>:
[Python] Disable PyExtensionType autoload (<a
href="https://redirect.github.com/apache/arrow/issues/38608">#38608</a>)</li>
<li><a
href="5a37e74198"><code>5a37e74</code></a>
<a
href="https://redirect.github.com/apache/arrow/issues/38431">GH-38431</a>:
[Python][CI] Update fs.type_name checks for s3fs tests (<a
href="https://redirect.github.com/apache/arrow/issues/38455">#38455</a>)</li>
<li><a
href="2dcee3f82c"><code>2dcee3f</code></a>
MINOR: [Release] Update versions for 14.0.0</li>
<li><a
href="297428cbf2"><code>297428c</code></a>
MINOR: [Release] Update .deb/.rpm changelogs for 14.0.0</li>
<li><a
href="3e9734f883"><code>3e9734f</code></a>
MINOR: [Release] Update CHANGELOG.md for 14.0.0</li>
<li><a
href="9f90995c8c"><code>9f90995</code></a>
<a
href="https://redirect.github.com/apache/arrow/issues/38332">GH-38332</a>:
[CI][Release] Resolve symlinks in RAT lint (<a
href="https://redirect.github.com/apache/arrow/issues/38337">#38337</a>)</li>
<li><a
href="bd61239a32"><code>bd61239</code></a>
<a
href="https://redirect.github.com/apache/arrow/issues/35531">GH-35531</a>:
[Python] C Data Interface PyCapsule Protocol (<a
href="https://redirect.github.com/apache/arrow/issues/37797">#37797</a>)</li>
<li>Additional commits viewable in <a
href="https://github.com/apache/arrow/compare/go/v13.0.0...go/v14.0.1">compare
view</a></li>
</ul>
</details>
<br />


[![Dependabot compatibility
score](https://dependabot-badges.githubapp.com/badges/compatibility_score?dependency-name=pyarrow&package-manager=pip&previous-version=13.0.0&new-version=14.0.1)](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)

Dependabot will resolve any conflicts with this PR as long as you don't
alter it yourself. You can also trigger a rebase manually by commenting
`@dependabot rebase`.

[//]: # (dependabot-automerge-start)
[//]: # (dependabot-automerge-end)

---

<details>
<summary>Dependabot commands and options</summary>
<br />

You can trigger Dependabot actions by commenting on this PR:
- `@dependabot rebase` will rebase this PR
- `@dependabot recreate` will recreate this PR, overwriting any edits
that have been made to it
- `@dependabot merge` will merge this PR after your CI passes on it
- `@dependabot squash and merge` will squash and merge this PR after
your CI passes on it
- `@dependabot cancel merge` will cancel a previously requested merge
and block automerging
- `@dependabot reopen` will reopen this PR if it is closed
- `@dependabot close` will close this PR and stop Dependabot recreating
it. You can achieve the same result by closing it manually
- `@dependabot show <dependency name> ignore conditions` will show all
of the ignore conditions of the specified dependency
- `@dependabot ignore this major version` will close this PR and stop
Dependabot creating any more for this major version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this minor version` will close this PR and stop
Dependabot creating any more for this minor version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this dependency` will close this PR and stop
Dependabot creating any more for this dependency (unless you reopen the
PR or upgrade to it yourself)
You can disable automated security fix PRs for this repo from the
[Security Alerts
page](https://github.com/langchain-ai/langchain/network/alerts).

</details>

---------

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Predrag Gruevski <2348618+obi1kenobi@users.noreply.github.com>
2023-11-14 14:23:52 -05:00
Leonid Ganeline
f5bf3bdf14 added Cookbooks link (#13078)
It is a temporary solution before major documents refactoring.
Related to #13070 (not solving it)
2023-11-14 10:52:47 -08:00
Erick Friis
c0e6045c0b cli 0.0.17 (#13359) 2023-11-14 09:56:18 -08:00
Erick Friis
927824b7cb CLI interactivity (#13148)
Will implement more later
2023-11-14 09:53:29 -08:00
billytrend-cohere
2f6fe6ddf3 Fix latest message index (#13355)
There is a bug which caused the earliest message rather than the latest
message being sent
2023-11-14 09:23:25 -08:00
Manuel Soria
58f5a4d30a Pgvector template (#13267)
Including pvector template, adapting what is covered in the
[cookbook](https://github.com/langchain-ai/langchain/blob/master/cookbook/retrieval_in_sql.ipynb).

---------

Co-authored-by: Lance Martin <lance@langchain.dev>
Co-authored-by: Erick Friis <erick@langchain.dev>
2023-11-14 07:47:48 -08:00
Harrison Chase
be854225c7 add more reasonable arxiv retriever (#13327) 2023-11-13 20:54:14 -08:00
Harrison Chase
4b7a85887e arxiv retrieval agent improvement (#13329) 2023-11-13 20:54:03 -08:00
Krish Dholakia
5a920e14c0 fix litellm openai imports (#13307) 2023-11-13 17:55:10 -08:00
Bagatur
1c67db4c18 Move OAI assistants to langchain and add callbacks (#13236) 2023-11-13 17:42:07 -08:00
Bagatur
8006919e52 DOCS: cleanup docs directory (#13301) 2023-11-13 17:38:45 -08:00
Bagatur
c3f94f4c12 Update main readme (#13298) 2023-11-13 17:37:54 -08:00
Harrison Chase
5f60439221 add retrieval agent (#13317) 2023-11-13 17:22:39 -08:00
Harrison Chase
2ff30b50f2 FEATURE gpt researcher template (#13062)
Co-authored-by: Erick Friis <erick@langchain.dev>
2023-11-13 15:52:25 -08:00
Erick Friis
280ecfd8eb IMPROVEMENT redirect root to docs in langserve app template (#13303) 2023-11-13 15:51:41 -08:00
wemysschen
a591cdb67d add cookbook for RAG with baidu QIANFAN and elasticsearch (#13287)
**Description:** 
Add cookbook for RAG with baidu QIANFAN and elasticsearch.

Co-authored-by: wemysschen <root@icoding-cwx.bcc-szzj.baidu.com>
2023-11-13 14:45:24 -08:00
mertkayhan
9b4974871d IMPROVEMENT Increase flexibility of ElasticVectorSearch (#6863)
Hey @rlancemartin, @eyurtsev ,

I did some minimal changes to the `ElasticVectorSearch` client so that
it plays better with existing ES indices.

Main changes are as follows:

1. You can pass the dense vector field name into `_default_script_query`
2. You can pass a custom script query implementation and the respective
parameters to `similarity_search_with_score`
3. You can pass functions for building page content and metadata for the
resulting `Document`

<!-- Thank you for contributing to LangChain!

Replace this comment with:
  - Description: a description of the change, 
  - Issue: the issue # it fixes (if applicable),
  - Dependencies: any dependencies required for this change,
- Tag maintainer: for a quicker response, tag the relevant maintainer
(see below),
- Twitter handle: we announce bigger features on Twitter. If your PR
gets announced and you'd like a mention, we'll gladly shout you out!

If you're adding a new integration, please include:
1. a test for the integration, preferably unit tests that do not rely on
network access,
  4. an example notebook showing its use.

Maintainer responsibilities:
  - General / Misc / if you don't know who to tag: @dev2049
  - DataLoaders / VectorStores / Retrievers: @rlancemartin, @eyurtsev
  - Models / Prompts: @hwchase17, @dev2049
  - Memory: @hwchase17
  - Agents / Tools / Toolkits: @vowelparrot
  - Tracing / Callbacks: @agola11
  - Async: @agola11

If no one reviews your PR within a few days, feel free to @-mention the
same people again.

See contribution guidelines for more information on how to write/run
tests, lint, etc:
https://github.com/hwchase17/langchain/blob/master/.github/CONTRIBUTING.md
 -->
2023-11-13 14:36:03 -08:00
Lance Martin
39852dffd2 Cookbook for multi-modal RAG eval (#13272) 2023-11-13 14:26:02 -08:00
Erick Friis
50a5c919f0 IMPROVEMENT self-query template (#13305)
- [ ]
https://github.com/langchain-ai/langchain/pull/12694#discussion_r1391334719
-> keep date
- [x]
https://github.com/langchain-ai/langchain/pull/12694#discussion_r1391336586
2023-11-13 14:03:15 -08:00
Yasin
b46f88d364 IMPROVEMENT add license file to subproject (#8403)
<!-- Thank you for contributing to LangChain!

Replace this comment with:
  - Description: a description of the change, 
  - Issue: the issue # it fixes (if applicable),
  - Dependencies: any dependencies required for this change,
- Tag maintainer: for a quicker response, tag the relevant maintainer
(see below),
- Twitter handle: we announce bigger features on Twitter. If your PR
gets announced and you'd like a mention, we'll gladly shout you out!

Please make sure you're PR is passing linting and testing before
submitting. Run `make format`, `make lint` and `make test` to check this
locally.

If you're adding a new integration, please include:
1. a test for the integration, preferably unit tests that do not rely on
network access,
  2. an example notebook showing its use.

Maintainer responsibilities:
  - General / Misc / if you don't know who to tag: @baskaryan
  - DataLoaders / VectorStores / Retrievers: @rlancemartin, @eyurtsev
  - Models / Prompts: @hwchase17, @baskaryan
  - Memory: @hwchase17
  - Agents / Tools / Toolkits: @hinthornw
  - Tracing / Callbacks: @agola11
  - Async: @agola11

If no one reviews your PR within a few days, feel free to @-mention the
same people again.

See contribution guidelines for more information on how to write/run
tests, lint, etc:
https://github.com/hwchase17/langchain/blob/master/.github/CONTRIBUTING.md
 -->

hi!
This is pretty straight-forward: The sdist package does not contain the
license file (which is needed by e.g. conda) because the package is
built from the subdir and can't see the license.
I _copied_ the license but since I'm unfamiliar with the projects
direction, I'm not sure that's correct.
thanks!

---------

Co-authored-by: Erick Friis <erick@langchain.dev>
2023-11-13 11:48:21 -08:00
Rui Ramos
ff19a62afc Fix Pinecone cosine relevance score (#8920)
Fixes: #8207

Description:
Pinecone returns scores (not distances) with cosine similarity. The
values according to the docs are [-1, 1], although I could never
reproduce negative values.

This PR ensures that the score returned from Pinecone is preserved,
rather than inverted, so the most relevant documents can be filtered (eg
when using similarity thresholds)

I'll leave this as a draft PR as I couldn't run the tests (my pinecone
account might not be enough - some errors were being thrown around
namespaces) so hopefully someone who _can_ will pick this up.

Maintainers:
@rlancemartin, @eyurtsev

---------

Co-authored-by: Erick Friis <erick@langchain.dev>
2023-11-13 11:47:38 -08:00
Bagatur
2e42ed5de6 Self-query template (#12694)
Co-authored-by: Erick Friis <erick@langchain.dev>
2023-11-13 11:44:19 -08:00
Konstantin Spieß
1e43025bf5 Fix serialization issue in Matching Engine Vector Store (#13266)
- **Description:** Fixed a serialization issue in the add_texts method
of the Matching Engine Vector Store caused by a typo, leading to an
attempt to serialize the json module itself.
  - **Issue:** #12154 
  - **Dependencies:** ./.
  - **Tag maintainer:**
2023-11-13 11:04:11 -08:00
William FH
9169d77cf6 Update error message in evaluation runner (#13296) 2023-11-13 11:03:20 -08:00
Leonie
32c493e3df Refine Weaviate docs and add RAG example (#13057)
- **Description:** Refine Weaviate tutorial and add an example for
Retrieval-Augmented Generation (RAG)
  - **Issue:** (not applicable),
  - **Dependencies:** none
  - **Tag maintainer:** @baskaryan <!--
If no one reviews your PR within a few days, please @-mention one of
@baskaryan, @eyurtsev, @hwchase17.
 -->
  - **Twitter handle:** @helloiamleonie

Co-authored-by: Leonie <leonie@Leonies-MBP-2.fritz.box>
2023-11-13 10:59:19 -08:00
takatost
f22f273f93 FIX: 'from_texts' method in Weaviate with non-existent kwargs param (#11604)
Due to the possibility of external inputs including UUIDs, there may be
additional values in **kwargs, while Weaviate's `__init__` method does
not support passing extra **kwarg parameters.

---------

Co-authored-by: Erick Friis <erick@langchain.dev>
2023-11-13 10:32:20 -08:00
Frank995
971d2b2e34 Add missing filter to max_marginal_relevance_search inner call to max_marginal_relevance_search_by_vector (#13260)
When calling max_marginal_relevance_search from PGVector the filter
param is not carried over to max_marginal_relevance_search_by_vector

---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
2023-11-13 10:31:34 -08:00
chevalmuscle
3ad78e48e2 Use endpoint_url if provided with boto3 session for dynamodb (#11622)
- **Description:** Uses `endpoint_url` if provided with a boto3 session.
When running dynamodb locally, credentials are required even if invalid.
With this change, it will be possible to pass a boto3 session with
credentials and specify an endpoint_url

---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
Co-authored-by: Erick Friis <erick@langchain.dev>
2023-11-13 10:31:16 -08:00
Erick Friis
18acc22f29 Ollama pass kwargs as options instead of top (#13280)
Noticed params are really in `options` instead while reviewing #12895
2023-11-13 10:28:47 -08:00
刘 方瑞
46af56dc4f Add MyScaleWithoutJSON which allows user to wrap columns into Document's Metadata (#13164)
<!-- Thank you for contributing to LangChain!

Replace this entire comment with:
  - **Description:** a description of the change, 
  - **Issue:** the issue # it fixes (if applicable),
  - **Dependencies:** any dependencies required for this change,
- **Tag maintainer:** for a quicker response, tag the relevant
maintainer (see below),
- **Twitter handle:** we announce bigger features on Twitter. If your PR
gets announced, and you'd like a mention, we'll gladly shout you out!

Please make sure your PR is passing linting and testing before
submitting. Run `make format`, `make lint` and `make test` to check this
locally.

See contribution guidelines for more information on how to write/run
tests, lint, etc:

https://github.com/langchain-ai/langchain/blob/master/.github/CONTRIBUTING.md

If you're adding a new integration, please include:
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in `docs/extras`
directory.

If no one reviews your PR within a few days, please @-mention one of
@baskaryan, @eyurtsev, @hwchase17.
 -->
Replace this entire comment with:
- **Description:** Add MyScaleWithoutJSON which allows user to wrap
columns into Document's Metadata
  - **Tag maintainer:** @baskaryan
2023-11-13 10:10:36 -08:00
Michael Landis
2aa13f1e10 chore: bump momento dependency version and refactor search hit usage (#13111)
**Description**

Bumps the Momento dependency to the latest version and refactors the
usage of `SearchHit` in the Momento Vector Index (MVI) vector store
integration. This change is a one liner where we use the preferred
attribute `score` to read the query-document similarity instead of
`distance`. The latest versions of Momento clients will use this
attribute going forward.

**Dependencies**

Updated the Momento dependency to latest version.

**Tests**

💚 I re-ran the existing MVI integration tests
(`tests/integration_tests/vectorstores/test_momento_vector_index.py`)
and they pass.

**Review**
cc @baskaryan @eyurtsev
2023-11-13 09:12:21 -08:00
Junlin Zhou
4da2faba41 docs: align custom_tool document headers (#13252)
On the [Defining Custom
Tools](https://python.langchain.com/docs/modules/agents/tools/custom_tools)
page, there's a 'Subclassing the BaseTool class' paragraph under the
'Completely New Tools - String Input and Output' header. Also there's
another 'Subclassing the BaseTool' paragraph under no header, which I
think may belong to the 'Custom Structured Tools' header.

Another thing is, there's a 'Using the tool decorator' and a 'Using the
decorator' paragraph, I think should belong to 'Completely New Tools -
String Input and Output' and 'Custom Structured Tools' separately.

This PR moves those paragraphs to corresponding headers.
2023-11-13 09:03:56 -08:00
Ikko Eltociear Ashimine
700293cae9 Fix typo in timescalevector.ipynb (#13239)
enviornment -> environment
2023-11-13 09:03:07 -08:00
kYLe
cc55d2fcee Add OpenAI API v1 support for ChatAnyscale and fixed a bug with openai_api_key (#13237)
1. Add OpenAI API v1 support
2. Fixed a bug to call `get_secret_value` on a str value
(values["openai_api_key"])
2023-11-13 09:01:54 -08:00
juan-calvo-datatonic
545b76b0fd Add rag google vertex ai search template (#13294)
- **Description:** This is a template demonstrating how to utilize
Google Vertex AI Search in conjunction with ChatVertexAI()
2023-11-13 08:45:36 -08:00
Govind.S.B
9024593468 added system prompt and template fields to ollama (#13022)
**Description**
the ollama api now supports passing system prompt and template directly
instead of modifying the model file , but the ollama integration in
langchain did not have this change updated . The update just adds these
two parameters to it ( there are 2 more parameters that are pending to
be updated, I was not sure about their utility wrt to langchain )
Refer :
8713ac23a8

**Issue** : None Applicable

**Dependencies** : None Changed

**Twitter handle** : https://twitter.com/violetto96

---------

Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>
2023-11-13 08:45:11 -08:00
langchain-infra
f55f67055f Add dockerfile template (#13240) 2023-11-13 10:33:01 -05:00
Shaurya Rohatgi
f70aa82c84 Update README.md - Added notebook for extraction_openai_tools (#13205)
added Parallel Function Calling for Structured Data Extraction notebook

<!-- Thank you for contributing to LangChain!

Replace this entire comment with:
  - **Description:** a description of the change, 
  - **Issue:** the issue # it fixes (if applicable),
  - **Dependencies:** any dependencies required for this change,
- **Tag maintainer:** for a quicker response, tag the relevant
maintainer (see below),
- **Twitter handle:** we announce bigger features on Twitter. If your PR
gets announced, and you'd like a mention, we'll gladly shout you out!

Please make sure your PR is passing linting and testing before
submitting. Run `make format`, `make lint` and `make test` to check this
locally.

See contribution guidelines for more information on how to write/run
tests, lint, etc:

https://github.com/langchain-ai/langchain/blob/master/.github/CONTRIBUTING.md

If you're adding a new integration, please include:
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in `docs/extras`
directory.

If no one reviews your PR within a few days, please @-mention one of
@baskaryan, @eyurtsev, @hwchase17.
 -->

---------

Co-authored-by: Erick Friis <erick@langchain.dev>
2023-11-13 00:12:46 -08:00
Guillem Orellana Trullols
0f31cd8b49 Remove _get_kwarg_value function (#13184)
`_get_kwarg_value` function is useless, one can rely on python builtin
functionalities to do the exact same thing.

- **Description:** Removed `_get_kwarg_value`. Helps with code
readability.
  - **Issue:** the issue # it fixes (if applicable),
  - **Twitter handle:** @Guillem_96
2023-11-13 00:09:54 -08:00
SuperDa Fu
e1c020dfe1 dalle add model parameter (#13201)
- **Description:** dalle_image_generator adding a new model parameter,
  - **Issue:** N/A,
  - **Dependencies:** 
  - **Tag maintainer: @hwchase17
  - **Twitter handle:**

---------

Co-authored-by: dafu <xiangbingze@wenru.wang>
Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>
Co-authored-by: Erick Friis <erickfriis@gmail.com>
2023-11-13 00:09:20 -08:00
Mario Angst
96b56a4d4f Typo fix to quickstart.mdx (#13178)
- **Description:** I fixed a very small typo in the quickstart docs
(BaeMessage -> BaseMessage)
2023-11-13 00:02:18 -08:00
Dennis de Greef
64e11592bb Improve CSV reader which can't call .strip() on NoneType (#13079)
Improve CSV reader which can't call .strip() on NoneType if there are
less cells in the row compared to the header

<!-- Thank you for contributing to LangChain!

Replace this entire comment with:
  - **Description:** 
I have a CSV file as followed

```
headerA,headerB,headerC
v1A,v1B,v1C,
v2A,v2B
v3A,v3B,v3C
```
In this case, row 2 is missing a value, which results in reading a None
type. The strip() method can not be called on None, hence raising. In
this PR I am making the change to only call strip if the value if not
None.

  - **Issue:** the issue # it fixes (if applicable),
  - **Dependencies:** any dependencies required for this change,
- **Tag maintainer:** for a quicker response, tag the relevant
maintainer (see below),
- **Twitter handle:** we announce bigger features on Twitter. If your PR
gets announced, and you'd like a mention, we'll gladly shout you out!

Please make sure your PR is passing linting and testing before
submitting. Run `make format`, `make lint` and `make test` to check this
locally.

See contribution guidelines for more information on how to write/run
tests, lint, etc:

https://github.com/langchain-ai/langchain/blob/master/.github/CONTRIBUTING.md

If you're adding a new integration, please include:
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in `docs/extras`
directory.

If no one reviews your PR within a few days, please @-mention one of
@baskaryan, @eyurtsev, @hwchase17.
 -->
2023-11-12 23:51:39 -08:00
glad4enkonm
339973db47 Update ollama.py (#12895)
duplicate option removed
**Description:**  An issue fix, http stop option duplicate removed.
**Issue:** the issue #12892 fix
**Dependencies:** no
**Tag maintainer:** @eyurtsev

---------

Co-authored-by: Erick Friis <erick@langchain.dev>
2023-11-12 23:43:59 -08:00
刘 方瑞
e89e830c55 Free knowledge base pod information update (#12813)
<!-- Thank you for contributing to LangChain!

Replace this entire comment with:
  - **Description:** a description of the change, 
  - **Issue:** the issue # it fixes (if applicable),
  - **Dependencies:** any dependencies required for this change,
- **Tag maintainer:** for a quicker response, tag the relevant
maintainer (see below),
- **Twitter handle:** we announce bigger features on Twitter. If your PR
gets announced, and you'd like a mention, we'll gladly shout you out!

Please make sure your PR is passing linting and testing before
submitting. Run `make format`, `make lint` and `make test` to check this
locally.

See contribution guidelines for more information on how to write/run
tests, lint, etc:

https://github.com/langchain-ai/langchain/blob/master/.github/CONTRIBUTING.md

If you're adding a new integration, please include:
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in `docs/extras`
directory.

If no one reviews your PR within a few days, please @-mention one of
@baskaryan, @eyurtsev, @hwchase17.
 -->

We updated MyScale free knowledge base, where you can try your RAG with
36 million paragraphs from wikipedia and 2 million paragraphs from
ArXiv.

The pod has two tables
```sql
CREATE TABLE default.ChatArXiv (
    `abstract` String, 
    `id` String, 
    `vector` Array(Float32), 
    `metadata` Object('JSON'), 
    `pubdate` DateTime,
    `title` String,
    `categories` Array(String),
    `authors` Array(String), 
    `comment` String,
    `primary_category` String,
    VECTOR INDEX vec_idx vector TYPE MSTG('metric_type=Cosine'), 
    CONSTRAINT vec_len CHECK length(vector) = 768) 
ENGINE = ReplacingMergeTree ORDER BY id;

CREATE TABLE wiki.Wikipedia (
    `id` String, 
    `title` String, 
    `text` String,
    `url` String,
    `wiki_id` UInt64,
    `views` Float32,
    `paragraph_id` UInt64,
    `langs` UInt32, 
    `emb` Array(Float32), 
    VECTOR INDEX emb_idx emb TYPE MSTG('metric_type=Cosine'), 
    CONSTRAINT emb_len CHECK length(emb) = 768) 
ENGINE = ReplacingMergeTree ORDER BY id;
```

You can connect those two tables using credentials below (just the same
to the old one)
URL: `msc-4a9e710a.us-east-1.aws.staging.myscale.cloud`
Port: `443`
Username: `chatdata`
Password: `myscale_rocks`

It's FREE and you can also use it with 
ChatData: https://github.com/myscale/ChatData
Retrieval-QA-Benchmark:
https://github.com/myscale/Retrieval-QA-Benchmark
... and also LangChain!

Request for review @baskaryan
2023-11-12 23:22:42 -08:00
Luis Valencia
c40973814d Update README.md (#8570)
- Description: updated readme.
  - Tag maintainer: @baskaryan
  - Twitter handle: @Levalencia

---------

Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
2023-11-12 22:07:49 -08:00
Isak Nyberg
8f81703d76 Add new models to openai callback (#13244)
**Description:** Adding the new models to the openai callback function,
info taken from [model
announcement](https://platform.openai.com/docs/models) and
[pricing](https://openai.com/pricing)

A short description for a short PR :)
2023-11-12 12:01:19 -08:00
Bagatur
ea6dd3a550 bump 335 (#13261) 2023-11-12 11:30:25 -08:00
William FH
a837b03e55 Update langsmith version 0.63 (#13208) 2023-11-12 11:29:25 -08:00
Harrison Chase
7f1d26160d update tools (#13243) 2023-11-12 10:22:54 -08:00
Nuno Campos
8d6faf5665 Make it easier to subclass runnable binding with custom init args (#13189) 2023-11-11 09:01:17 +00:00
Peter Vandenabeele
7f1964b264 Fix BeautifulSoupTransformer: no more duplicates and correct order of tags + tests (#12596) 2023-11-11 08:56:37 +00:00
Bagatur
937d7c41f3 update stack diagram (#13213) 2023-11-10 16:50:20 -08:00
Erick Friis
9c7afa8adb Upgrade cohere embedding model to v3 (#13219)
Just updates API docs, doesn't change default param from 2.0 (could be
breaking change)
2023-11-10 16:25:58 -08:00
Matvey Arye
180657ca7a Add template for conversational rag with timescale vector (#13041)
**Description:** This is like the rag-conversation template in many
ways. What's different is:
- support for a timescale vector store.
- support for time-based filters.
- support for metadata filters.

<!-- Thank you for contributing to LangChain!

Replace this entire comment with:
  - **Description:** a description of the change, 
  - **Issue:** the issue # it fixes (if applicable),
  - **Dependencies:** any dependencies required for this change,
- **Tag maintainer:** for a quicker response, tag the relevant
maintainer (see below),
- **Twitter handle:** we announce bigger features on Twitter. If your PR
gets announced, and you'd like a mention, we'll gladly shout you out!

Please make sure your PR is passing linting and testing before
submitting. Run `make format`, `make lint` and `make test` to check this
locally.

See contribution guidelines for more information on how to write/run
tests, lint, etc:

https://github.com/langchain-ai/langchain/blob/master/.github/CONTRIBUTING.md

If you're adding a new integration, please include:
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in `docs/extras`
directory.

If no one reviews your PR within a few days, please @-mention one of
@baskaryan, @eyurtsev, @hwchase17.
 -->

---------

Co-authored-by: Erick Friis <erick@langchain.dev>
2023-11-10 16:12:32 -08:00
Andrew Zhou
1a1a1a883f fleet_context docs update (#13221)
- **Description:** Changed the fleet_context documentation to use
`context.download_embeddings()` from the latest release from our
package. More details here:
https://github.com/fleet-ai/context/tree/main#api
  - **Issue:** n/a
  - **Dependencies:** n/a
  - **Tag maintainer:** @baskaryan 
  - **Twitter handle:** @andrewthezhou
2023-11-10 14:53:57 -08:00
Erick Friis
8fdf15c023 Fix Document Loader Unit Test - Docusaurus (#13228) 2023-11-10 14:52:01 -08:00
Lee
72ad448daa feat: Docusaurus Loader (#9138)
Added a Docusaurus Loader

Issue: #6353

I had to implement this for working with the Ionic documentation, and
wanted to open this up as a draft to get some guidance on building this
out further. I wasn't sure if having it be a light extension of the
SitemapLoader was in the spirit of a proper feature for the library --
but I'm grateful for the opportunities Langchain has given me and I'd
love to build this out properly for the sake of the community.

Any feedback welcome!
2023-11-10 14:21:55 -08:00
VAS
8fa960641a Update Documentation: Corrected Typos and Improved Clarity (#11725)
Docs updates

---------

Co-authored-by: Advaya <126754021+bluevayes@users.noreply.github.com>
Co-authored-by: Erick Friis <erick@langchain.dev>
2023-11-10 14:14:44 -08:00
Leonid Ganeline
e165daa0ae new course on DeepLearning.ai (#12755)
Added a new course on
[DeepLearning.ai](https://learn.deeplearning.ai/functions-tools-agents-langchain)
Added the LangChain `Wikipedia` link. Probably, it can be placed in the
"More" menu.
2023-11-10 13:55:27 -08:00
Erick Friis
93ae589f1b Add mongo parent template to index (#13222) 2023-11-10 11:56:44 -08:00
Tomaz Bratanic
0dc4ab0be1 Neo4j chat message history (#13008) 2023-11-10 11:53:34 -08:00
Bagatur
bf8cf7e042 Bagatur/langserve blurb (#13217) 2023-11-10 14:05:43 -05:00
fyasla
d266b3ea4a issue #12165 mask API key in chat_models/azureml_endpoint module (#12836)
- **Description:** `AzureMLChatOnlineEndpoint` object from
langchain/chat_models/azureml_endpoint.py safe to print
without having any secrets included in raw format in the string
representation.
  - **Issue:** #12165,
  - **Tag maintainer:** @eyurtsev

---------

Co-authored-by: Faysal Bougamale <faysal.bougamale@horiba.com>
Co-authored-by: Bagatur <baskaryan@gmail.com>
2023-11-10 14:05:19 -05:00
Anush
52f34de9b7 feat: FastEmbed embedding provider (#13109)
## Description:
This PR intends to add
[Qdrant/FastEmbed](https://qdrant.github.io/fastembed/) as a local
embeddings provider, associated tests and documentation.

**Documentation preview:**
https://langchain-git-fork-anush008-master-langchain.vercel.app/docs/integrations/text_embedding/fastembed

---------

Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>
2023-11-10 13:51:52 -05:00
Eugene Yurtsev
b0e8cbe0b3 Add RunnableSequence documentation (#13094)
Add RunnableSequence documentation
2023-11-10 13:44:43 -05:00
Eugene Yurtsev
869df62736 Document RunnableWithFallbacks (#13088)
Add documentation to RunnableWithFallbacks
2023-11-10 13:16:21 -05:00
Eugene Yurtsev
8313c218da Add more runnable documentation (#13083)
- Adding documentation to the runnable.
- Documentation is not organized in the best way for the runnable; i.e.,
in
terms of LCEL vs. other standard methods, will follow up with more
edits.
2023-11-10 13:14:57 -05:00
Erick Friis
a26105de8e vectara rag mq (#13214)
Description: another Vectara template for MultiQuery RAG flow
Twitter handle: @ofermend

Fixes to #13106

---------

Co-authored-by: Ofer Mendelevitch <ofer@vectara.com>
Co-authored-by: Ofer Mendelevitch <ofermend@gmail.com>
2023-11-10 10:08:45 -08:00
Bagatur
24386e0860 bump 334, exp 40 (#13211) 2023-11-10 09:43:29 -08:00
Lance Martin
d2e50b3108 Add Chroma multimodal cookbook (#12952)
Pending:
* https://github.com/chroma-core/chroma/pull/1294
* https://github.com/chroma-core/chroma/pull/1293

---------

Co-authored-by: Erick Friis <erick@langchain.dev>
Co-authored-by: Bagatur <baskaryan@gmail.com>
2023-11-10 09:43:10 -08:00
The1Bill
55912868da Update toolkit.py to remove single quotes around table names (#12445)
**Description:** Removing the single quote wrapper around the table
names in the SQL agent toolkit.py file as it misleads the LLM into
querying against tables with single quotes around their names.
**Issue:** #7457 
**Dependencies:** None
**Tag maintainer:** @hwchase17 
**Twitter handle:** None
2023-11-10 06:39:15 -08:00
Nuno Campos
362a446999 Changes to root listener (#12174)
- Implement config_specs to include session_id
- Remove Runnable method and update notebook
- Add more details to notebook, eg. show input schema and config schema
before and after adding message history

---------

Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
2023-11-10 09:53:48 +00:00
Nuno Campos
b2b94424db Update return type for Runnable.__or__ (#12880)
<!-- Thank you for contributing to LangChain!

Replace this entire comment with:
  - **Description:** a description of the change, 
  - **Issue:** the issue # it fixes (if applicable),
  - **Dependencies:** any dependencies required for this change,
- **Tag maintainer:** for a quicker response, tag the relevant
maintainer (see below),
- **Twitter handle:** we announce bigger features on Twitter. If your PR
gets announced, and you'd like a mention, we'll gladly shout you out!

Please make sure your PR is passing linting and testing before
submitting. Run `make format`, `make lint` and `make test` to check this
locally.

See contribution guidelines for more information on how to write/run
tests, lint, etc:

https://github.com/langchain-ai/langchain/blob/master/.github/CONTRIBUTING.md

If you're adding a new integration, please include:
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in `docs/extras`
directory.

If no one reviews your PR within a few days, please @-mention one of
@baskaryan, @eyurtsev, @hwchase17.
 -->
2023-11-10 09:52:38 +00:00
Bagatur
dd7959f4ac template readme's in docs (#13152) 2023-11-09 23:36:21 -08:00
Bagatur
86b93b5810 Add serve to quickstart (#13174) 2023-11-09 23:10:26 -08:00
Bagatur
fbf7047468 Bagatur/update agent docs (#13167) 2023-11-09 21:14:30 -08:00
Harrison Chase
0a2b1c7471 improve duck duck go tool (#13165) 2023-11-09 20:49:39 -08:00
Bagatur
850336bcf1 Update model i/o docs (#13160) 2023-11-09 20:35:55 -08:00
Jacob Lee
cf271784fa Add basic critique revise template (#12688)
@baskaryan @hwchase17

---------

Co-authored-by: Erick Friis <erick@langchain.dev>
2023-11-09 17:33:29 -08:00
Cweili
ee3ceb0fb8 Document: Fix "Biadu" typo (#12985)
Fix document "Baidu Cloud ElasticSearch VectorSearch" `Biadu` typo.
2023-11-09 17:32:38 -08:00
Chenyu Zhao
defd4b4f11 Clean up Fireworks provider documentation (#13157) 2023-11-09 16:35:05 -08:00
Bagatur
d9e493e96c fix module sidebar (#13158) 2023-11-09 16:31:45 -08:00
wemysschen
e76ff63125 fix baiducloud_vector_search document typo (#12976)
**Issue:**
fix baiducloud_vector_search document typo

---------

Co-authored-by: wemysschen <root@icoding-cwx.bcc-szzj.baidu.com>
2023-11-09 16:27:04 -08:00
Holt Skinner
fceae456b9 fix: Updates to formatting in Google Drive Retriever docs (#13015)
- Minor updates to formatting to make easier to read
2023-11-09 16:15:55 -08:00
Bagatur
c63eb9d797 LCEL nits (#13155) 2023-11-09 16:09:33 -08:00
Shinya Maeda
28cc60b347 Fix langchain.llms OpenAI completion doesn't work due to v1 client update (#13099)
This commit fixes the issue that langchain.llms OpenAI completion
stopped working since the V1 openai client update.

Replace this entire comment with:
- **Description:** This PR fixes the issue [AttributeError: module
'openai' has no attribute
'Completion'](https://github.com/langchain-ai/langchain/issues/12967)
similar to
8e0cb2eb84
and https://github.com/langchain-ai/langchain/pull/12969,
  - **Issue:** https://github.com/langchain-ai/langchain/issues/12967,
  - **Dependencies:** `openai` v1.x.x client,
  - **Tag maintainer:** @baskaryan,
  - **Twitter handle:** @dosuken123 

Please make sure your PR is passing linting and testing before
submitting. Run `make format`, `make lint` and `make test` to check this
locally.

See contribution guidelines for more information on how to write/run
tests, lint, etc:

https://github.com/langchain-ai/langchain/blob/master/.github/CONTRIBUTING.md

If you're adding a new integration, please include:
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in `docs/extras`
directory.

If no one reviews your PR within a few days, please @-mention one of
@baskaryan, @eyurtsev, @hwchase17.

---------

Co-authored-by: Erick Friis <erick@langchain.dev>
Co-authored-by: Bagatur <baskaryan@gmail.com>
2023-11-09 15:12:19 -08:00
Bagatur
555ce600ef Bagatur/docs serve context (#13150) 2023-11-09 15:05:18 -08:00
Bagatur
ff43cd6701 OpenAI remove httpx typing (#13154)
Addresses #13124
2023-11-09 14:32:09 -08:00
Erick Friis
8ad3b255dc Pirate Speak Configurable Template (#13153) 2023-11-09 22:13:45 +00:00
Bagatur
eb51150557 update oai tool agent doc (#13147) 2023-11-09 12:37:30 -08:00
Bagatur
b298f550fe update modules sidebar (#13141) 2023-11-09 11:57:09 -08:00
Bagatur
84e65533e9 Docs: combine LCEL index and why (#13142) 2023-11-09 11:16:45 -08:00
Bagatur
1311450646 fix langsmith links (#13144) 2023-11-09 11:12:50 -08:00
Bagatur
8b2a82b5ce Bagatur/docs smith context (#13139) 2023-11-09 10:22:49 -08:00
Erick Friis
58da6e0d47 Multimodal rag traces (#13140) 2023-11-09 09:54:00 -08:00
Bagatur
150d58304d update oai cookbooks (#13135) 2023-11-09 08:04:51 -08:00
Bagatur
f04cc4b7e1 bump 333 (#13131) 2023-11-09 07:33:15 -08:00
billytrend-cohere
b346d4a455 Add message to documents (#12552)
This adds the response message as a document to the rag retriever so
users can choose to use this. Also drops document limit.

---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
2023-11-09 07:30:48 -08:00
Harrison Chase
5f38770161 Support oai tool call (#13110)
Co-authored-by: Bagatur <baskaryan@gmail.com>
Co-authored-by: Nuno Campos <nuno@boringbits.io>
2023-11-09 07:29:29 -08:00
Stefano Lottini
c52725bdc5 (Astra DB/Cassandra) Minor clarification about dependencies in the demo notebook (#13118)
This PR helps developers trying the Astra DB / Cassandra vector store
quickstart notebook by making it clear what other dependencies are
required.
2023-11-09 09:19:15 -05:00
Holt Skinner
0fc8fd12bd feat: Vertex AI Search - Add Snippet Retrieval for Non-Advanced Website Data Stores (#13020)
https://cloud.google.com/generative-ai-app-builder/docs/snippets#snippets

---------

Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>
2023-11-08 21:52:50 -05:00
Erick Friis
3dbaaf59b2 Tool Retrieval Template (#13104)
Adds a template like
https://python.langchain.com/docs/modules/agents/how_to/custom_agent_with_tool_retrieval

Uses OpenAI functions, LCEL, and FAISS
2023-11-08 18:33:31 -08:00
Jacob Lee
76283e9625 Adds embeddings filter option to return scores in state (#12489)
CC @baskaryan @assafelovic
2023-11-08 17:50:06 -08:00
jakerachleff
18601bd4c8 Get project from langchain sdk (#13100)
## Description
We need to centralize the API we use to get the project name for our
tracers. This PR makes it so we always get this from a shared function
in the langsmith sdk.

## Dependencies
Upgraded langsmith from 0.52 to 0.62 to include the new API
`get_tracer_project`
2023-11-08 17:10:12 -08:00
Bagatur
72e12f6bcf update more azure docs (#13093) 2023-11-08 14:11:16 -08:00
Bagatur
1703f132c6 update azure embedding docs (#13091) 2023-11-08 13:39:31 -08:00
Bagatur
9fdfac22c2 bump 332 (#13089) 2023-11-08 13:23:16 -08:00
Bagatur
1f85ec34d5 bump 331rc3 exp 39 (#13086) 2023-11-08 13:00:13 -08:00
Anton Troynikov
9f077270c8 Don't pass EF to chroma (#13085)
- **Description:** 

Recently Chroma rolled out a breaking change on the way we handle
embedding functions, in order to support multi-modal collections.

This broke the way LangChain's `Chroma` objects get created, because we
were passing the EF down into the Chroma collection:
https://docs.trychroma.com/migration#migration-to-0416---november-7-2023

However, internally, we are never actually using embeddings on the
chroma collection - LangChain's `Chroma` object calls it instead. Thus
we just don't pass an `embedding_function` to Chroma itself, which fixes
the issue.
2023-11-08 12:55:35 -08:00
Erick Friis
f15f8e01cf Azure OpenAI Embeddings (#13039)
Co-authored-by: Bagatur <baskaryan@gmail.com>
2023-11-08 12:37:17 -08:00
David Peterson
37561d8986 Add Proper Import Error (#13042)
- **Description:** The issue was not listing the proper import error for
amazon textract loader.
- **Issue:** Time wasted trying to figure out what to install...
(langchain docs don't list the dependency either)
  - **Dependencies:** N/A
  - **Tag maintainer:** @sbusso 
  - **Twitter handle:** @h9ste

---------

Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
2023-11-08 10:29:08 -08:00
Eugene Yurtsev
06c503f672 Add RunnableRetry Documentation (#13074) 2023-11-08 18:20:18 +00:00
Bagatur
55aeff6777 oai assistant multiple actions (#13068) 2023-11-08 08:25:37 -08:00
Erick Friis
a9b70baef9 cli updates, 0.0.16 (#13034)
- confirm flags, serve detection
- 0.0.16
- always gen code
- pip bool
2023-11-08 07:47:30 -08:00
Bagatur
1f27104626 Fleet context (#13038)
cc @adrwz
2023-11-07 18:57:09 -08:00
Bagatur
d26fd6f0d1 redirect langsmith walkthrough (#13040) 2023-11-07 18:24:13 -08:00
Erick Friis
6f45532620 Upgrade docs postcss (#13031) 2023-11-07 15:50:25 -08:00
Erick Friis
54ad3cc2b8 template versions again (#13030)
- scipy was locked due to py version
- same guardrails-output-parser
- rag-redis
2023-11-07 15:15:18 -08:00
Erick Friis
506f81563f Update Deps in Experimental (#13029) 2023-11-07 15:15:09 -08:00
Erick Friis
db4b97d590 Relock Templates (#13028) 2023-11-07 15:01:49 -08:00
Stefano Lottini
4f4b020582 Add "Astra DB" vector store integration (#12966)
# Astra DB Vector store integration

- **Description:** This PR adds a `VectorStore` implementation for
DataStax Astra DB using its HTTP API
  - **Issue:** (no related issue)
- **Dependencies:** A new required dependency is `astrapy` (`>=0.5.3`)
which was added to pyptoject.toml, optional, as per guidelines
- **Tag maintainer:** I recently mentioned to @baskaryan this
integration was coming
  - **Twitter handle:** `@rsprrs` if you want to mention me

This PR introduces the `AstraDB` vector store class, extensive
integration test coverage, a reworking of the documentation which
conflates Cassandra and Astra DB on a single "provider" page and a new,
completely reworked vector-store example notebook (common to the
Cassandra store, since parts of the flow is shared by the two APIs). I
also took care in ensuring docs (and redirects therein) are behaving
correctly.

All style, linting, typechecks and tests pass as far as the `AstraDB`
integration is concerned.

I could build the documentation and check it all right (but ran into
trouble with the `api_docs_build` makefile target which I could not
verify: `Error: Unable to import module
'plan_and_execute.agent_executor' with error: No module named
'langchain_experimental'` was the first of many similar errors)

Thank you for a review!
Stefano

---------

Co-authored-by: Erick Friis <erick@langchain.dev>
2023-11-07 14:45:33 -08:00
Tomaz Bratanic
13bd83bd61 Add neo4j vector memory template (#12993) 2023-11-07 13:00:49 -08:00
Bagatur
5ac2fc5bb2 update stack diagram (#13021) 2023-11-07 12:59:24 -08:00
Yang, Bo
600caff03c Add Memorize tool (#11722)
- **Description:** Add `Memorize` tool
  - **Tag maintainer:** @hwchase17

This PR added a new tool `Memorize` so that an agent can use it to
fine-tune itself. This tool requires `TrainableLLM` introduced in #11721

DEMO:
6a9003d5db

![image](https://github.com/langchain-ai/langchain/assets/601530/d6f0cb45-54df-4dcf-b143-f8aefb1e76e3)
2023-11-07 12:42:10 -08:00
Bagatur
cf481c9418 bump exp 38 (#13016) 2023-11-07 11:49:23 -08:00
Bagatur
57e19989f6 Bagatur/oai assistant (#13010) 2023-11-07 11:44:53 -08:00
Erick Friis
74134dd7e1 cli pyproject updating (#12945)
`langchain app add` and `langchain app remove` will now keep the
dependencies list updated.

---------

Co-authored-by: Nuno Campos <nuno@boringbits.io>
2023-11-07 11:06:08 -08:00
Tomaz Bratanic
d9abcf1aae Neo4j conversation cypher template (#12927)
Adding custom graph memory to Cypher chain

---------

Co-authored-by: Erick Friis <erick@langchain.dev>
2023-11-07 11:05:28 -08:00
Lance Martin
2287a311cf Multi modal RAG + QA Cookbooks (#12946)
Co-authored-by: Erick Friis <erick@langchain.dev>
Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>
Co-authored-by: Vinzenz Klass <76391770+VinzenzKlass@users.noreply.github.com>
Co-authored-by: Praveen Venkateswaran <praveenv@uci.edu>
Co-authored-by: Praveen Venkateswaran <praveen.venkateswaran@ibm.com>
Co-authored-by: Kacper Łukawski <kacperlukawski@users.noreply.github.com>
Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
Co-authored-by: Bagatur <baskaryan@gmail.com>
Co-authored-by: Ofer Mendelevitch <ofermend@gmail.com>
Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
2023-11-07 09:10:24 -08:00
Bagatur
6175dc30aa bump 331rc2 (#13006) 2023-11-07 08:52:17 -08:00
Jasan
ff87f4b4f9 Fix for rag-supabase readme (#12869)
- **Description:** Correct naming for package in README
- **Issue:** README wasn't aligned with pyproject.toml, resulting in not
being able to install the rag-supabase package.
  - **Tag maintainer:** @gregnr
2023-11-06 19:38:22 -08:00
Harrison Chase
99ffeb239f add ingest for mongo (#12897) 2023-11-06 19:28:22 -08:00
Ofer Mendelevitch
ce21308f29 Vectara RAG template (#12975)
- **Description:** RAG template using Vectara
  - **Twitter handle:** @ofermend
2023-11-06 19:24:00 -08:00
Erick Friis
0c81cd923e oai v1 embeddings (#12969)
Initial PR to get OpenAIEmbeddings working with the new sdk

fyi @rlancemartin 

Fixes #12943

---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
2023-11-06 18:52:33 -08:00
Bagatur
fdbb45d79e bump 331rc1 (#12965) 2023-11-06 15:36:43 -08:00
Bagatur
3bb8030a6e fix max_tokens (#12964) 2023-11-06 15:36:05 -08:00
Bagatur
a9002a82b8 bump 331rc0 (#12963) 2023-11-06 15:19:33 -08:00
Harrison Chase
c27400efeb Support multimodal messages (#11320)
Co-authored-by: Bagatur <baskaryan@gmail.com>
2023-11-06 15:14:18 -08:00
Bagatur
388f248391 add oai v1 cookbook (#12961) 2023-11-06 14:28:32 -08:00
Bagatur
4f7dff9d66 Record system fingerprint chat openai (#12960) 2023-11-06 14:25:53 -08:00
Bagatur
8e0cb2eb84 ChatOpenAI and AzureChatOpenAI openai>=1 compatible (#12948) 2023-11-06 13:24:18 -08:00
Kacper Łukawski
52d0055a91 Add support of Cohere Embed v3 (#12940)
Cohere released the new embedding API (Embed v3:
https://txt.cohere.com/introducing-embed-v3/) that treats document and
query embeddings differently. This PR updated the `CohereEmbeddings` to
use them appropriately. It also works with the old models.
2023-11-06 15:06:58 -05:00
Praveen Venkateswaran
8e0dcb37d2 Add SecretStr for Symbl.ai Nebula API (#12896)
Description: This PR masks API key secrets for the Nebula model from
Symbl.ai
Issue: #12165 
Maintainer: @eyurtsev

---------

Co-authored-by: Praveen Venkateswaran <praveen.venkateswaran@ibm.com>
2023-11-06 14:13:59 -05:00
Vinzenz Klass
59d0bd2150 feat: acquire advisory lock before creating extension in pgvector (#12935)
- **Description:** Acquire advisory lock before attempting to create
extension on postgres server, preventing errors in concurrent
executions.
  - **Issue:** #12933
  - **Dependencies:** None

---------

Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>
2023-11-06 14:00:39 -05:00
Eugene Yurtsev
b376854b26 Fix for anyscale chat model api key (#12938)
* ChatAnyscale was missing coercion to SecretStr for anyscale api key
* The model inherits from ChatOpenAI so it should not force the openai
api key to be secret str until openai model has the same changes

https://github.com/langchain-ai/langchain/issues/12841
2023-11-06 13:28:02 -05:00
Bagatur
58889149c2 fix guides link (#12941) 2023-11-06 08:13:02 -08:00
matthieudelaro
52503a367f Remove useless line of code from sql.ipynb (#12906)
This PR remove a single line of code from a notebook of the
documentation. This line used to define a variable, which is never used
in the code.
For further context, for reviewers, here is the online documentation:
https://python.langchain.com/docs/use_cases/qa_structured/sql#case-3-sql-agents
2023-11-06 07:59:12 -08:00
hmasdev
622bf12c2e fix regex pattern of structured output parser (#12929)
- **Description:** fix the regex pattern of
[StructuredChatOutputParser](https://github.com/langchain-ai/langchain/blob/master/libs/langchain/langchain/agents/structured_chat/output_parser.py#L18)
and add unit tests for the code change.
- **Issue:** #12158 #12922
- **Dependencies:** None
- **Tag maintainer:** 
- **Twitter handle:** @hmdev3
- **NOTE:** This PR conflicts #7495 . After #7495 is merged, I am going
to update PR.
2023-11-06 07:53:14 -08:00
wemysschen
8c02f4fbd8 add baidu cloud vectorsearch document (#12928)
**Description:** 
Add BaiduCloud VectorSearch document with implement of BESVectorSearch
in langchain vectorstores

---------

Co-authored-by: wemysschen <root@icoding-cwx.bcc-szzj.baidu.com>
2023-11-06 07:52:50 -08:00
wemysschen
8d7144e6a6 fix baiducloud directory loader import file loader (#12924)
**Issue:** 
fix baiducloud BOS directory loader imports its file loader

---------

Co-authored-by: wemysschen <root@icoding-cwx.bcc-szzj.baidu.com>
2023-11-06 07:52:31 -08:00
Alex Howard
5bb2ea51a5 docs: clean up vestigial markdown (#12907)
- **Description:** Remove text "LangChain currently does not support"
which appears to be vestigial leftovers from a previous change.
  - **Issue:** N/A
  - **Dependencies:** N/A
  - **Tag maintainer:** @baskaryan, @eyurtsev
  - **Twitter handle:** thezanke
2023-11-06 07:51:56 -08:00
Praveen Venkateswaran
1eb7d3a862 docs: update hf pipeline docs (#12908)
- **Description:** Noticed that the Hugging Face Pipeline documentation
was a bit out of date.
Updated with information about passing in a pipeline directly
(consistent with docstring) and a recent contribution of mine on adding
support for multi-gpu specifications with Accelerate in
21eeba075c
2023-11-06 07:51:31 -08:00
Christoffer Bo Petersen
37da6e546b Fix typo in e2b_data_analysis.ipynb (#12930)
Just a small typo fix
2023-11-06 07:37:30 -08:00
Kacper Łukawski
621419f71e Fix normalizing the cosine distance in Qdrant (#12934)
Qdrant was incorrectly calculating the cosine similarity and returning
`0.0` for the best match, instead of `1.0`. Internally Qdrant returns a
cosine score from `-1.0` (worst match) to `1.0` (best match), and the
current formula reflects it.
2023-11-06 07:36:59 -08:00
Hech
8fe6bcc662 Fix return metadata when searching for DingoDB (#12937) 2023-11-06 07:35:36 -08:00
Jakub Novák
ada3d2cbd1 Add possibility to pass on_artifacts for a specific conversation (#12687)
Possibility to pass on_artifacts to a conversation. It can be then
achieved by adding this way:

```python
result = agent.run(
    input=message.text,
    metadata={
        "on_artifact": CALLBACK_FUNCTION
    },
)
```
2023-11-06 07:29:47 -08:00
Bagatur
0378662e1d fix langsmith link (#12939) 2023-11-06 07:17:05 -08:00
Harrison Chase
1a92d2245d Harrison/docs smith serve (#12898)
Co-authored-by: Bagatur <baskaryan@gmail.com>
2023-11-06 07:07:25 -08:00
Bagatur
53f453f01a bump 331 (#12932) 2023-11-06 05:58:12 -08:00
Priyadutt
a4d9e986fb Update csv.ipynb description (#12878)
The line removed is not required as there are no other alternative
solutions above than that.

<!-- Thank you for contributing to LangChain!

Replace this entire comment with:
  - **Description:** a description of the change, 
  - **Issue:** the issue # it fixes (if applicable),
  - **Dependencies:** any dependencies required for this change,
- **Tag maintainer:** for a quicker response, tag the relevant
maintainer (see below),
- **Twitter handle:** we announce bigger features on Twitter. If your PR
gets announced, and you'd like a mention, we'll gladly shout you out!

Please make sure your PR is passing linting and testing before
submitting. Run `make format`, `make lint` and `make test` to check this
locally.

See contribution guidelines for more information on how to write/run
tests, lint, etc:

https://github.com/langchain-ai/langchain/blob/master/.github/CONTRIBUTING.md

If you're adding a new integration, please include:
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in `docs/extras`
directory.

If no one reviews your PR within a few days, please @-mention one of
@baskaryan, @eyurtsev, @hwchase17.
 -->
2023-11-06 03:32:04 -08:00
Erick Friis
5000c7308e cli template gitignores (#12914)
- ap gitignore
- package
2023-11-05 22:34:45 -08:00
Harrison Chase
aba407f774 use keys not items (#12918) 2023-11-05 22:08:29 -08:00
Harrison Chase
60d025b83b mongo parent document retrieval (#12887) 2023-11-04 10:16:02 -07:00
Michael Hunger
e43b4079c8 template: use dashes instead of underscores for neo4j-cypher package and path in readme (#12827)
Minimal readme template update

underscores didn't work, dashes do
2023-11-03 15:54:48 -07:00
wemysschen
e14aa37d59 fix bes vector store search (#12828)
**Issue:** 
fix search body in baidu cloud vectorsearch

---------

Co-authored-by: wemysschen <root@icoding-cwx.bcc-szzj.baidu.com>
2023-11-03 15:39:19 -07:00
standby24x7
f04e4df7f9 coockbook: Fix typo in wikibase_agent.ipynb (#12839)
This patch fixes a spelling typo in message
within wikibase_agent.ipynb.

Signed-off-by: Masanari Iida <standby24x7@gmail.com>

Signed-off-by: Masanari Iida <standby24x7@gmail.com>
2023-11-03 14:57:37 -07:00
Kacper Łukawski
66c41c0dbf Add template for self-query-qdrant (#12795)
This PR adds a self-querying template using Qdrant as a vector store.
The template uses an artificial dataset and was implemented in a way
that simplifies passing different components and choosing LLM and
embedding providers.

---------

Co-authored-by: Erick Friis <erick@langchain.dev>
2023-11-03 13:37:29 -07:00
Daniel Chalef
f41f4c5e37 zep/rag conversation zep template (#12762)
LangServe template for a RAG Conversation App using Zep.

 @baskaryan, @eyurtsev

---------

Co-authored-by: Erick Friis <erick@langchain.dev>
2023-11-03 13:34:44 -07:00
Lance Martin
ea1ab391d4 Open Clip multimodal embeddings (#12754) 2023-11-03 13:33:36 -07:00
Bagatur
ebee616822 bump 330 (#12853) 2023-11-03 13:26:41 -07:00
Tomaz Bratanic
0dbdb8498a Neo4j Advanced RAG template (#12794)
Todo:

- [x] Docs
2023-11-03 13:22:55 -07:00
Harrison Chase
83cee2cec4 Template Readmes and Standardization (#12819)
Co-authored-by: Erick Friis <erick@langchain.dev>
2023-11-03 13:15:29 -07:00
Erick Friis
6c237716c4 Update readmes with new cli install (#12847)
Old command still works. Just simplifying.

Merge after releasing CLI 0.0.15
2023-11-03 12:10:32 -07:00
Erick Friis
7db49d3842 Confirm sys.path includes current dir for app serve (#12851)
- Make sure sys.path is set properly for langchain app serve
- bump
2023-11-03 11:37:20 -07:00
Erick Friis
1bc35f61cb CLI 0.0.14, Uvicorn update and no more [serve] (#12845)
Calls uvicorn directly from cli:
Reload works if you define app by import string instead of object.
(was doing subprocess in order to get reloading)

Version bump to 0.0.14

Remove the need for [serve] for simplicity.

Readmes are updated in #12847 to avoid cluttering this PR
2023-11-03 11:05:52 -07:00
Brace Sproul
76bcac5bb3 Remove admin prefix/suffix from docs for anthropic (#12849) 2023-11-03 10:54:16 -07:00
Harrison Chase
523e5803bb update mongo template (#12838) 2023-11-03 10:31:53 -07:00
William FH
18005c6384 Disable trace_on_chain_group auto-tracing (#12807)
Previously we treated trace_on_chain_group as a command to always start
tracing. This is unintuitive (makes the function do 2 things), and makes
it harder to toggle tracing
2023-11-03 10:05:09 -07:00
Erick Friis
0da75b9ebd Autopopulate module name in cli init (#12814) 2023-11-02 23:45:38 -07:00
William FH
98aff29fbd Add Dataset Page to printout (#12816) 2023-11-02 20:36:56 -07:00
Joseph Martinez
f573a4d0b3 Update quickstart.mdx (#12386)
**Description**
Removed confusing sentence. 
Not clear what "both" was referring to. The two required components
mentioned previously? The two methods listed below?

---------

Co-authored-by: Erick Friis <erick@langchain.dev>
2023-11-02 18:38:21 -07:00
Leonid Ganeline
e112b2f2e6 updated integrations/providers/google (#12226)
Added missed integrations. Updated formats.

---------

Co-authored-by: Erick Friis <erick@langchain.dev>
2023-11-02 18:35:31 -07:00
Manuel Rech
2e2b9c76d9 Keep also original query - multi_query.py (#12696)
When you use a MultiQuery it might be useful to use the original query
as well as the newly generated ones to maximise the changes to retriever
the correct document. I haven't created an issue, it seems a very small
and easy thing.

---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
2023-11-02 18:15:02 -07:00
Michael Landis
4fe9bf70b6 feat: add a rag template for momento vector index (#12757)
# Description
Add a RAG template showcasing Momento Vector Index as a vector store.
Includes a project directory and README.

# **Twitter handle** 

Tag the company @momentohq for a mention and @mlonml for the
contribution.
2023-11-02 17:59:15 -07:00
刘 方瑞
26c4ec1eaf myscale notebook url change (#12810)
<!-- Thank you for contributing to LangChain!

Replace this entire comment with:
  - **Description:** a description of the change, 
  - **Issue:** the issue # it fixes (if applicable),
  - **Dependencies:** any dependencies required for this change,
- **Tag maintainer:** for a quicker response, tag the relevant
maintainer (see below),
- **Twitter handle:** we announce bigger features on Twitter. If your PR
gets announced, and you'd like a mention, we'll gladly shout you out!

Please make sure your PR is passing linting and testing before
submitting. Run `make format`, `make lint` and `make test` to check this
locally.

See contribution guidelines for more information on how to write/run
tests, lint, etc:

https://github.com/langchain-ai/langchain/blob/master/.github/CONTRIBUTING.md

If you're adding a new integration, please include:
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in `docs/extras`
directory.

If no one reviews your PR within a few days, please @-mention one of
@baskaryan, @eyurtsev, @hwchase17.
 -->
2023-11-02 17:56:26 -07:00
Lance Martin
2683c2fc53 Update template index (#12809) 2023-11-02 17:51:40 -07:00
apeng-singlestore
5c0e9ac578 Add template for rag-singlestoredb (#12805)
This change adds a new template for simple RAG using the SingleStoreDB
vectorstore.

Twitter: @alexjpeng

---------

Co-authored-by: Erick Friis <erick@langchain.dev>
2023-11-02 17:51:00 -07:00
Bagatur
658a3a8607 FEAT: Merge TileDB vecstore (#12811) 2023-11-02 17:40:32 -07:00
Akio Nishimura
c04647bb4e Correct number of elements in config list in batch() and abatch() of BaseLLM (#12713)
- **Description:** Correct number of elements in config list in
`batch()` and `abatch()` of `BaseLLM` in case `max_concurrency` is not
None.
- **Issue:** #12643
- **Twitter handle:** @akionux

---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
2023-11-02 17:28:48 -07:00
James Braza
88b506b321 Adds missing urllib.parse for IDE warning of PubMedAPIWrapper (#12808)
Resolves an IDE (PyCharm 2023.2.3 PE) warning around
`urllib.parse.quote`, also enabling CTRL-click
2023-11-02 17:27:25 -07:00
Bagatur
a2bb0dd445 TileDB update import unit tests 2023-11-02 17:24:22 -07:00
Nikos Papailiou
2fdaa1e5fd Add TileDB vectorstore implementation (#12624)
- **Description:** Add [TileDB](https://tiledb.com) vectorstore
implementation. TileDB offers ANN search capabilities using the
[TileDB-Vector-Search](https://github.com/TileDB-Inc/TileDB-Vector-Search)
module. It provides serverless execution of ANN queries and storage of
vector indexes both on local disk and cloud object stores (i.e. AWS S3).
More details in:
- [Why TileDB as a Vector
Database](https://tiledb.com/blog/why-tiledb-as-a-vector-database)
- [TileDB 101: Vector
Search](https://tiledb.com/blog/tiledb-101-vector-search)
- **Twitter handle:** @tiledb
2023-11-02 17:21:03 -07:00
盐粒 Yanli
1b233798a0 feat: Supprt pgvecto.rs as a VectorStore (#12718)
Supprt [pgvecto.rs](https://github.com/tensorchord/pgvecto.rs) as a new
VectorStore type.

This introduces a new dependency
[pgvecto_rs](https://pypi.org/project/pgvecto_rs/) and upgrade
SQLAlchemy to ^2.

Relate to https://github.com/tensorchord/pgvecto.rs/issues/11

---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
2023-11-02 17:16:04 -07:00
Daniel Chalef
0cbdba6a9b zep: VectorStore: Use Native MMR (#12690)
- refactor to use Zep's native MMR; update example
- 
@baskaryan @eyurtsev
2023-11-02 16:45:42 -07:00
Daniel Chalef
cc3d3920e3 Zep: Summary Search and Example (#12686)
Zep now has the ability to search over chat history summaries. This PR
adds support for doing so. More here: https://blog.getzep.com/zep-v0-17/

@baskaryan @eyurtsev
2023-11-02 16:31:11 -07:00
Bagatur
526313002c add import tests to all modules (#12806) 2023-11-02 15:32:55 -07:00
Harrison Chase
6609a6033f fix vectorstore imports (#12804)
Co-authored-by: Bagatur <baskaryan@gmail.com>
2023-11-02 15:32:31 -07:00
Nuno Campos
f66a9d2adf Automatically add configurable key to config_schema if config_specs i… (#12798)
…s present

<!-- Thank you for contributing to LangChain!

Replace this entire comment with:
  - **Description:** a description of the change, 
  - **Issue:** the issue # it fixes (if applicable),
  - **Dependencies:** any dependencies required for this change,
- **Tag maintainer:** for a quicker response, tag the relevant
maintainer (see below),
- **Twitter handle:** we announce bigger features on Twitter. If your PR
gets announced, and you'd like a mention, we'll gladly shout you out!

Please make sure your PR is passing linting and testing before
submitting. Run `make format`, `make lint` and `make test` to check this
locally.

See contribution guidelines for more information on how to write/run
tests, lint, etc:

https://github.com/langchain-ai/langchain/blob/master/.github/CONTRIBUTING.md

If you're adding a new integration, please include:
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in `docs/extras`
directory.

If no one reviews your PR within a few days, please @-mention one of
@baskaryan, @eyurtsev, @hwchase17.
 -->
2023-11-02 21:46:15 +00:00
Praveen Venkateswaran
21eeba075c enable the device_map parameter in huggingface pipeline (#12731)
### Enabling `device_map` in HuggingFacePipeline 

For multi-gpu settings with large models, the
[accelerate](https://huggingface.co/docs/accelerate/usage_guides/big_modeling#using--accelerate)
library provides the `device_map` parameter to automatically distribute
the model across GPUs / disk.

The [Transformers
pipeline](3520e37e86/src/transformers/pipelines/__init__.py (L543))
enables users to specify `device` (or) `device_map`, and handles cases
(with warnings) when both are specified.

However, Langchain's HuggingFacePipeline only supports specifying
`device` when calling transformers which limits large models and
multi-gpu use-cases.
Additionally, the [default
value](8bd3ce59cd/libs/langchain/langchain/llms/huggingface_pipeline.py (L72))
of `device` is initialized to `-1` , which is incompatible with the
transformers pipeline when `device_map` is specified.

This PR addresses the addition of `device_map` as a parameter , and
solves the incompatibility of `device = -1` when `device_map` is also
specified.
An additional test has been added for this feature. 

Additionally, some existing tests no longer work since 
1. `max_new_tokens` has to be specified under `pipeline_kwargs` and not
`model_kwargs`
2. The GPT2 tokenizer raises a `ValueError: Pipeline with tokenizer
without pad_token cannot do batching`, since the `tokenizer.pad_token`
is `None` ([related
issue](https://github.com/huggingface/transformers/issues/19853) on the
transformers repo).

This PR handles fixing these tests as well.

Co-authored-by: Praveen Venkateswaran <praveen.venkateswaran@ibm.com>
2023-11-02 14:29:06 -07:00
Mark Bell
3276aa3e17 __getattr__ should rase AttributeError not ImportError on missing attributes (#12801)
[The python
spec](https://docs.python.org/3/reference/datamodel.html#object.__getattr__)
requires that `__getattr__` throw `AttributeError` for missing
attributes but there are several places throwing `ImportError` in the
current code base. This causes a specific problem with `hasattr` since
it calls `__getattr__` then looks only for `AttributeError` exceptions.
At present, calling `hasattr` on any of these modules will raise an
unexpected exception that most code will not handle as `hasattr`
throwing exceptions is not expected.

In our case this is triggered by an exception tracker (Airbrake) that
attempts to collect the version of all installed modules with code that
looks like: `if hasattr(mod, "__version__"):`. With `HEAD` this is
causing our exception tracker to fail on all exceptions.

I only changed instances of unknown attributes raising `ImportError` and
left instances of known attributes raising `ImportError`. It feels a
little weird but doesn't seem to break anything.
2023-11-02 17:08:54 -04:00
Daniel Chalef
d966e4d13a zep: Update Zep docs and messaging (#12764)
Update Zep documentation with messaging, more details.

 @baskaryan, @eyurtsev
2023-11-02 13:39:17 -07:00
Illia
71d1a48b66 Use data from all Google search results in SerpApi.com wrapper (#12770)
- **Description:** Use all Google search results data in SerpApi.com
wrapper instead of the first one only
  - **Tag maintainer:** @hwchase17 

_P.S. `libs/langchain/tests/integration_tests/utilities/test_serpapi.py`
are not executed during the `make test`._
2023-11-02 13:31:27 -07:00
ba230t
9214d8e6ed Fixed a typo in templates/docs/CONTRIBUTING.md (delimeters =>delimiters) (#12774)
- **Description:** Just fixed a minor typo in
templates/docs/CONTRIBUTING.md.
  - **Issue:** No linked issues.

Very small contribution!
2023-11-02 13:31:04 -07:00
Armin Stepanjan
185ddc573e Fix broken links to use cases (#12777)
This PR replaces broken links to end to end usecases
([/docs/use_cases](https://python.langchain.com/docs/use_cases)) with a
non-broken version
([/docs/use_cases/qa_structured/sql](https://python.langchain.com/docs/use_cases/qa_structured/sql)),
consistently with the "Use cases" navigation button at the top of the
page.

---------

Co-authored-by: Matvey Arye <mat@timescale.com>
Co-authored-by: Erick Friis <erick@langchain.dev>
2023-11-02 13:20:54 -07:00
니콜라스
25ee10ed4f Docs: 'memory' -> 'history' typo. (#12779)
The 'MessagesPlaceholder' expects 'history' but 'RunnablePassthrough' is
assigning 'memory'.
2023-11-02 13:09:39 -07:00
yudai yamamoto
1f7e811156 Fixed broken link in Quickstart page (#12516)
- **Description:** 
Corrected a specific link within the documentation.
  
  - **Issue:**
  #12490 

  - **Dependencies:**
  - **Tag maintainer:**
  - **Twitter handle:**

---------

Co-authored-by: Erick Friis <erick@langchain.dev>
2023-11-02 13:00:53 -07:00
Ikko Eltociear Ashimine
9b02f7d59c Update llamacpp.ipynb (#12791)
HuggingFace -> Hugging Face
2023-11-02 12:52:12 -07:00
Tomaz Bratanic
2a9f40ed28 Add input types to cypher templates (#12800) 2023-11-02 12:46:02 -07:00
Nuno Campos
c4fdf78d03 Fix AddableDict raising exception when used with non-addable values (#12785)
<!-- Thank you for contributing to LangChain!

Replace this entire comment with:
  - **Description:** a description of the change, 
  - **Issue:** the issue # it fixes (if applicable),
  - **Dependencies:** any dependencies required for this change,
- **Tag maintainer:** for a quicker response, tag the relevant
maintainer (see below),
- **Twitter handle:** we announce bigger features on Twitter. If your PR
gets announced, and you'd like a mention, we'll gladly shout you out!

Please make sure your PR is passing linting and testing before
submitting. Run `make format`, `make lint` and `make test` to check this
locally.

See contribution guidelines for more information on how to write/run
tests, lint, etc:

https://github.com/langchain-ai/langchain/blob/master/.github/CONTRIBUTING.md

If you're adding a new integration, please include:
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in `docs/extras`
directory.

If no one reviews your PR within a few days, please @-mention one of
@baskaryan, @eyurtsev, @hwchase17.
 -->
2023-11-02 18:56:29 +00:00
Erick Friis
49e283a0cd CLI 0.0.13, Configurable Template Demo (#12796) 2023-11-02 11:42:57 -07:00
Nuno Campos
d1c6ad7769 Fix on_llm_new_token(chunk=) for some chat models (#12784)
It was passing in message instead of generation

<!-- Thank you for contributing to LangChain!

Replace this entire comment with:
  - **Description:** a description of the change, 
  - **Issue:** the issue # it fixes (if applicable),
  - **Dependencies:** any dependencies required for this change,
- **Tag maintainer:** for a quicker response, tag the relevant
maintainer (see below),
- **Twitter handle:** we announce bigger features on Twitter. If your PR
gets announced, and you'd like a mention, we'll gladly shout you out!

Please make sure your PR is passing linting and testing before
submitting. Run `make format`, `make lint` and `make test` to check this
locally.

See contribution guidelines for more information on how to write/run
tests, lint, etc:

https://github.com/langchain-ai/langchain/blob/master/.github/CONTRIBUTING.md

If you're adding a new integration, please include:
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in `docs/extras`
directory.

If no one reviews your PR within a few days, please @-mention one of
@baskaryan, @eyurtsev, @hwchase17.
 -->
2023-11-02 16:33:44 +00:00
Erick Friis
070823f294 CLI 0.0.12 (#12787) 2023-11-02 08:29:27 -07:00
Bagatur
979501c0ca bump 329 (#12778) 2023-11-02 06:02:43 -07:00
Matvey Arye
9369d6aca0 Fixes to the docs for timescale vector template (#12756) 2023-11-01 18:48:23 -07:00
Lance Martin
33810126bd Update chat prompt structure in LLaMA SQL cookbook (#12364)
Co-authored-by: Erick Friis <erick@langchain.dev>
2023-11-01 16:37:03 -07:00
ElliotKetchup
58b90f30b0 Update llama.cpp integration (#11864)
<!-- 
- **Description:** removed redondant link, replaced it with Meta's LLaMA
repo, add resources for models' hardware requirements,
  - **Issue:** None,
  - **Dependencies:** None,
  - **Tag maintainer:** None,
  - **Twitter handle:** @ElliotAlladaye
 -->
2023-11-01 16:32:02 -07:00
Manuel Soria
a228f340f1 Semantic search within postgreSQL using pgvector (#12365)
Cookbook showing how to incoporate RAG search within a postgreSQL
database using pgvector.

---------

Co-authored-by: Lance Martin <lance@langchain.dev>
Co-authored-by: Bagatur <baskaryan@gmail.com>
Co-authored-by: Erick Friis <erick@langchain.dev>
2023-11-01 16:21:34 -07:00
Erick Friis
da821320d3 Fixes 'Nonetype' not iterable for ObsidianLoader (#12751)
Implements #12726 from @Di3mex
2023-11-01 16:07:09 -07:00
Juan Bustos
67b6f4dc71 Update google_vertex_ai_palm.ipynb (#12715)
Fixed a typo

<!-- Thank you for contributing to LangChain!

Replace this entire comment with:
  - **Description:** Fixed a typo on the code
  - **Issue:** the issue # it fixes (if applicable),


Please make sure your PR is passing linting and testing before
submitting. Run `make format`, `make lint` and `make test` to check this
locally.

See contribution guidelines for more information on how to write/run
tests, lint, etc:

https://github.com/langchain-ai/langchain/blob/master/.github/CONTRIBUTING.md

If you're adding a new integration, please include:
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in `docs/extras`
directory.

If no one reviews your PR within a few days, please @-mention one of
@baskaryan, @eyurtsev, @hwchase17.
 -->
2023-11-01 16:05:44 -07:00
Eugene Yurtsev
b1caae62fd APIChain add restrictions to domains (CVE-2023-32786) (#12747)
* Restrict the chain to specific domains by default
* This is a breaking change, but it will fail loudly upon object
instantiation -- so there should be no silent errors for users
* Resolves CVE-2023-32786
2023-11-01 18:50:34 -04:00
Erick Friis
4421ba46d7 Demo Server, Fix Timescale (#12746)
- improve demo server
- missing deps
2023-11-01 15:29:34 -07:00
Eugene Yurtsev
0e1aedb9f4 Use jinja2 sandboxing by default (#12733)
* This is an opt-in feature, so users should be aware of risks if using
jinja2.
* Regardless we'll add sandboxing by default to jinja2 templates -- this
  sandboxing is a best effort basis.
* Best strategy is still to make sure that jinja2 templates are only
loaded from trusted sources.
2023-11-01 14:54:01 -07:00
Erick Friis
ab5309f6f2 template updates (#12736)
- langchain license
- add timescale vector dep to that template
2023-11-01 13:53:26 -07:00
Lance Martin
6406c53089 Update template index w/ Timescale (#12729) 2023-11-01 12:04:54 -07:00
Erick Friis
14340ee7cd use http.client instead of urllib3 (#12660)
dep problems with requests

cloudflare debugging not worth it with urllib
2023-11-01 11:15:05 -07:00
Bagatur
eee5181b7a bump 328, exp 37 (#12722) 2023-11-01 10:27:39 -07:00
Erick Friis
3405dbbc64 dash not underscore (#12716)
template names are auto-populating with the wrong convention (with
underscores)
2023-11-01 09:48:37 -07:00
123-fake-st
8bd3ce59cd PyPDFLoader use url in metadata source if file is a web path (#12092)
**Description:** Update `langchain.document_loaders.pdf.PyPDFLoader` to
store url in metadata (instead of a temporary file path) if user
provides a web path to a pdf

- **Issue:** Related to #7034; the reporter on that issue submitted a PR
updating `PyMuPDFParser` for this behavior, but it has unresolved merge
issues as of 20 Oct 2023 #7077
- In addition to `PyPDFLoader` and `PyMuPDFParser`, these other classes
in `langchain.document_loaders.pdf` exhibit similar behavior and could
benefit from an update: `PyPDFium2Loader`, `PDFMinerLoader`,
`PDFMinerPDFasHTMLLoader`, `PDFPlumberLoader` (I'm happy to contribute
to some/all of that, including assisting with `PyMuPDFParser`, if my
work is agreeable)
- The root cause is that the underlying pdf parser classes, e.g.
`langchain.document_loaders.parsers.pdf.PyPDFParser`, never receive
information about the url; the parsers receive a
`langchain.document_loaders.blob_loaders.blob`, which contains the pdf
contents and local file path, but not the url
- This update passes the web path directly to the parser since it's
minimally invasive and doesn't require further changes to maintain
existing behavior for local files... bigger picture, I'd consider
extending `blob` so that extra information like this can be
communicated, but that has much bigger implications on the codebase
which I think warrants maintainer input

  - **Dependencies:** None

```python
# old behavior
>>> from langchain.document_loaders import PyPDFLoader
>>> loader = PyPDFLoader('https://arxiv.org/pdf/1706.03762.pdf')
>>> docs = loader.load()
>>> docs[0].metadata
{'source': '/var/folders/w2/zx77z1cs01s1thx5dhshkd58h3jtrv/T/tmpfgrorsi5/tmp.pdf', 'page': 0}

# new behavior
>>> from langchain.document_loaders import PyPDFLoader
>>> loader = PyPDFLoader('https://arxiv.org/pdf/1706.03762.pdf')
>>> docs = loader.load()
>>> docs[0].metadata
{'source': 'https://arxiv.org/pdf/1706.03762.pdf', 'page': 0}
```
2023-11-01 11:27:00 -04:00
Dave Kwon
b1954aab13 feat: Add page metadata on PDFMinerLoader (#12277)
- **Description:** #12273 's suggestion PR
Like other PDFLoader, loading pdf per each page and giving page
metadata.
  - **Issue:** #12273 
  - **Twitter handle:** @blue0_0hope

---------

Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>
2023-11-01 11:25:37 -04:00
Duda Nogueira
7148f3e1fe Weaviate - Fix schema existence check (#12711)
This will allow you create the schema beforehand. The check was failing
and preventing importing into existing classes.

<!-- Thank you for contributing to LangChain!

Replace this entire comment with:
  - **Description:** a description of the change, 
  - **Issue:** the issue # it fixes (if applicable),
  - **Dependencies:** any dependencies required for this change,
- **Tag maintainer:** for a quicker response, tag the relevant
maintainer (see below),
- **Twitter handle:** we announce bigger features on Twitter. If your PR
gets announced, and you'd like a mention, we'll gladly shout you out!

Please make sure your PR is passing linting and testing before
submitting. Run `make format`, `make lint` and `make test` to check this
locally.

See contribution guidelines for more information on how to write/run
tests, lint, etc:

https://github.com/langchain-ai/langchain/blob/master/.github/CONTRIBUTING.md

If you're adding a new integration, please include:
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in `docs/extras`
directory.

If no one reviews your PR within a few days, please @-mention one of
@baskaryan, @eyurtsev, @hwchase17.
 -->
2023-11-01 08:22:15 -07:00
Sayandip
8dbbcf0b6c Adding a template for Solo Performance Prompting Agent (#12627)
**Description:** This template creates an agent that transforms a single
LLM into a cognitive synergist by engaging in multi-turn
self-collaboration with multiple personas.
**Tag maintainer:** @hwchase17

---------

Co-authored-by: Sayandip Sarkar <sayandip.sarkar@skypointcloud.com>
Co-authored-by: Erick Friis <erick@langchain.dev>
2023-11-01 08:10:07 -07:00
Aidos Kanapyanov
ae63c186af Mask API key for Anyscale LLM (#12406)
Description: Add masking of API Key for Anyscale LLM when printed.
Issue: #12165 
Dependencies: None
Tag maintainer: @eyurtsev

---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>
2023-11-01 10:22:26 -04:00
Predrag Gruevski
5ae51a8a85 Fix typo highlighted by ruff autoformatter. (#12691)
H/t @MichaReiser for spotting it:
https://github.com/langchain-ai/langchain/pull/12585/files#r1378253045
2023-10-31 22:16:06 -04:00
Predrag Gruevski
724b92231d Remove black caching config from CI lint workflow. (#12594)
To merge after #12585 is merged.
2023-10-31 21:39:05 -04:00
Predrag Gruevski
0ea837404a Only publish to test PyPI from the _test_release.yml workflow. (#12668)
PyPI trusted publishing wants to know which workflow is expected to do
the publish. We always want to publish from the same workflow, so we're
making `_test_release.yml` the only workflow that publishes to Test
PyPI.
2023-10-31 21:36:38 -04:00
Predrag Gruevski
321cd44f13 Use separate jobs for building and publishing test releases. (#12671)
This follows the principle of least privilege. Our `poetry build` step
doesn't need, and shouldn't get, access to our GitHub OIDC capability.

This is the same structure as I used in the already-merged PR for
refactoring the regular PyPI release workflow: #12578.
2023-10-31 21:36:26 -04:00
Erick Friis
44c8b159b9 properly increment version in cli (#12685)
Went from 0.0.9 -> 0.0.11 without releasing. Back to 10, then release.
2023-10-31 17:27:43 -07:00
Erick Friis
b825dddf95 fix elastic rag template in playground (#12682)
- a few instructions in the readme (load_documents -> ingest.py)
- added docker run command for local elastic
- adds input type definition to render playground properly
2023-10-31 17:18:35 -07:00
Lance Martin
f0eba1ac63 Add RAG input types (#12684)
Co-authored-by: Erick Friis <erick@langchain.dev>
2023-10-31 17:13:44 -07:00
Erick Friis
392cfbee24 link to templates (#12680) 2023-10-31 16:19:22 -07:00
Leonid Ganeline
ddcec005bc fix for YahooFinanceNewsTool (#12665)
Added YahooFinanceNewsTool to the __init__.py 
It was missed here.
2023-10-31 14:58:09 -07:00
Predrag Gruevski
09711ad5a1 Both lint and format templates with ruff v0.1.3. (#12676)
- Both lint and format code in `templates`.
- Upgrade to ruff v0.1.3.
2023-10-31 14:52:00 -07:00
Predrag Gruevski
01a3c9b94e Use an in-project virtualenv in the CLI package. (#12678)
Keeping it in sync with how our other packages are configured.
2023-10-31 14:51:24 -07:00
Predrag Gruevski
f7f35a9102 Use black to lint notebooks and docs for now. (#12679)
Due to #12677 having lots of errors for the time being.
2023-10-31 14:51:05 -07:00
Jacob Lee
bd668fcea1 Adds version CLI command (#12619)
Will be automatically bumped with `poetry version patch`.

@efriis @hwchase17

---------

Co-authored-by: Erick Friis <erick@langchain.dev>
2023-10-31 14:50:04 -07:00
Frank
bf5805bb32 Add quip loader (#12259)
- **Description:** implement [quip](https://quip.com) loader
  - **Issue:** https://github.com/langchain-ai/langchain/issues/10352
  - **Dependencies:** No
  -  pass make format, make lint, make test

---------

Co-authored-by: Hao Fan <h_fan@apple.com>
Co-authored-by: Bagatur <baskaryan@gmail.com>
2023-10-31 14:11:24 -07:00
Roman Vasilyev
c9a6940d58 PGVector fix (#12592)
latest release broken, this fixes it

---------

Co-authored-by: Roman Vasilyev <rvasilyev@mozilla.com>
Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>
2023-10-31 17:01:15 -04:00
Lance Martin
9e17d1a225 Update Vertex template (#12644)
Co-authored-by: Erick Friis <erick@langchain.dev>
2023-10-31 14:00:22 -07:00
Predrag Gruevski
aa3f4a9bc8 Remove the CLI package's pydantic compatibility tests. (#12675)
They aren't necessary, since the CLI package doesn't have a direct
dependency on pydantic.
2023-10-31 16:57:38 -04:00
Predrag Gruevski
e8b99364b3 Use ruff for both linting and formatting in langchain-cli. (#12672)
Prior to this PR, `ruff` was used only for linting and not for
formatting, despite the names of the commands. This PR makes it be used
for both linting code and autoformatting it.
2023-10-31 13:52:25 -07:00
Harrison Chase
9a10b2b047 fix plate chain (#12673) 2023-10-31 13:45:09 -07:00
Margaret Qian
acfc485808 Update MosaicML Embedding Input Key (#12657)
This input key was missed in the last update PR:
https://github.com/langchain-ai/langchain/pull/7391

The input/output formats are intended to be like this:

```
{"inputs": [<prompt>]} 

{"outputs": [<output_text>]}
```
2023-10-31 14:43:30 -04:00
Erika Cardenas
d26ac5f999 Update README for Hybrid Search Weaviate (#12661)
- **Description:** Updated the README for Hybrid Search Weaviate
2023-10-31 11:02:34 -07:00
Predrag Gruevski
c871cc5055 Remove print() statements which seemed leftover from debugging. (#12648)
Added in #12159 presumably during debugging. Right now they cause a bit of visual noise.
2023-10-31 13:45:48 -04:00
Erick Friis
2a7e0a27cb update lc version (#12655)
also updated py version in `csv-agent` and `rag-codellama-fireworks`
because they have stricter python requirements
2023-10-31 10:19:15 -07:00
Predrag Gruevski
360cff81a3 Overwrite existing distributions when uploading to test PyPI. (#12658) 2023-10-31 10:02:50 -07:00
Lance Martin
da94c750c5 Add RAG template for Timescale Vector (#12651)
<!-- Thank you for contributing to LangChain!

Replace this entire comment with:
  - **Description:** a description of the change, 
  - **Issue:** the issue # it fixes (if applicable),
  - **Dependencies:** any dependencies required for this change,
- **Tag maintainer:** for a quicker response, tag the relevant
maintainer (see below),
- **Twitter handle:** we announce bigger features on Twitter. If your PR
gets announced, and you'd like a mention, we'll gladly shout you out!

Please make sure your PR is passing linting and testing before
submitting. Run `make format`, `make lint` and `make test` to check this
locally.

See contribution guidelines for more information on how to write/run
tests, lint, etc:

https://github.com/langchain-ai/langchain/blob/master/.github/CONTRIBUTING.md

If you're adding a new integration, please include:
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in `docs/extras`
directory.

If no one reviews your PR within a few days, please @-mention one of
@baskaryan, @eyurtsev, @hwchase17.
 -->

---------

Co-authored-by: Matvey Arye <mat@timescale.com>
2023-10-31 09:56:29 -07:00
Noam Gat
14e8c74736 LM Format Enforcer Integration + Sample Notebook (#12625)
## Description

This PR adds support for
[lm-format-enforcer](https://github.com/noamgat/lm-format-enforcer) to
LangChain.

![image](https://raw.githubusercontent.com/noamgat/lm-format-enforcer/main/docs/Intro.webp)

The library is similar to jsonformer / RELLM which are supported in
Langchain, but has several advantages such as
- Batching and Beam search support
- More complete JSON Schema support
- LLM has control over whitespace, improving quality
- Better runtime performance due to only calling the LLM's generate()
function once per generate() call.

The integration is loosely based on the jsonformer integration in terms
of project structure.

## Dependencies

No compile-time dependency was added, but if `lm-format-enforcer` is not
installed, a runtime error will occur if it is trying to be used.

## Tests

Due to the integration modifying the internal parameters of the
underlying huggingface transformer LLM, it is not possible to test
without building a real LM, which requires internet access. So, similar
to the jsonformer and RELLM integrations, the testing is via the
notebook.

## Twitter Handle

[@noamgat](https://twitter.com/noamgat)


Looking forward to hearing feedback!

---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
2023-10-31 09:49:01 -07:00
Stefano Lottini
a4e4b5a86f Relax python version and remove need for explicit setup step (#12637)
This PR addresses what seems like a unnecessary Python version
restriction in the pyroject.toml specs within both Cassandra (/Astra DB)
templates. With "^3.11" I got some version incompatibilities with the
latest "langchain add [...]" commands, so these are now relaxed in line
with the other templates I could inspect.

Incidentally, in the "entomology" template, the need for an explicit
"setup" step for the user to carry on has been removed, replaced by a
check-and-execute-if-necessary instruction on app startup.

Thank you for your attention!
2023-10-31 09:42:27 -07:00
Predrag Gruevski
5308b836c7 Upgrade to actions/checkout@v4 in the docs lint job. (#12581) 2023-10-31 12:41:18 -04:00
Predrag Gruevski
94f018f1ba Support release-testing packages with dashes in their names. (#12654) 2023-10-31 12:40:34 -04:00
Erick Friis
912ace18e9 fix template py verisons (#12650) 2023-10-31 09:20:29 -07:00
Brian McBrayer
b74468f399 Fix small typo on Founcational -> Router notebook (#12634)
- **Description:** Fix small typo on Founcational -> Router notebook
2023-10-31 09:16:29 -07:00
Predrag Gruevski
72fa5a463d Show ruff output inline in GitHub PRs. (#12647) 2023-10-31 12:16:01 -04:00
William FH
17c2e3b87e Rename Template (#12649)
To chatbot feedback. Update import
2023-10-31 09:15:30 -07:00
Erick Friis
7f6e751a3d template updates (#12646) 2023-10-31 09:13:58 -07:00
Leonid Kuligin
a53cac4508 added template to use Vertex Vector Search for q&a (#12622)
added template to use Vertex Vector Search for q&a
2023-10-31 08:49:24 -07:00
Lance Martin
944cb552bb Minor updates to READMEs (#12642) 2023-10-31 08:34:46 -07:00
William FH
88f0f1e73b Conversational Feedback (#12590)
Context in the README.

Show how score chat responses based on a followup from the user and then
log that as feedback in LangSmith
2023-10-31 08:34:17 -07:00
Predrag Gruevski
f94e24dfd7 Install and use ruff format instead of black for code formatting. (#12585)
Best to review one commit at a time, since two of the commits are 100%
autogenerated changes from running `ruff format`:
- Install and use `ruff format` instead of black for code formatting.
- Output of `ruff format .` in the `langchain` package.
- Use `ruff format` in experimental package.
- Format changes in experimental package by `ruff format`.
- Manual formatting fixes to make `ruff .` pass.
2023-10-31 10:53:12 -04:00
William FH
bfd719f9d8 bind_functions convenience method (#12518)
I always take 20-30 seconds to re-discover where the
`convert_to_openai_function` wrapper lives in our codebase. Chat
langchain [has no
clue](https://smith.langchain.com/public/3989d687-18c7-4108-958e-96e88803da86/r)
what to do either. There's the older `create_openai_fn_chain` , but we
haven't been recommending it in LCEL. The example we show in the
[cookbook](https://python.langchain.com/docs/expression_language/how_to/binding#attaching-openai-functions)
is really verbose.


General function calling should be as simple as possible to do, so this
seems a bit more ergonomic to me (feel free to disagree). Another option
would be to directly coerce directly in the class's init (or when
calling invoke), if provided. I'm not 100% set against that. That
approach may be too easy but not simple. This PR feels like a decent
compromise between simple and easy.

```
from enum import Enum
from typing import Optional

from pydantic import BaseModel, Field


class Category(str, Enum):
    """The category of the issue."""

    bug = "bug"
    nit = "nit"
    improvement = "improvement"
    other = "other"


class IssueClassification(BaseModel):
    """Classify an issue."""

    category: Category
    other_description: Optional[str] = Field(
        description="If classified as 'other', the suggested other category"
    )
    

from langchain.chat_models import ChatOpenAI

llm = ChatOpenAI().bind_functions([IssueClassification])
llm.invoke("This PR adds a convenience wrapper to the bind argument")

# AIMessage(content='', additional_kwargs={'function_call': {'name': 'IssueClassification', 'arguments': '{\n  "category": "improvement"\n}'}})
```
2023-10-31 07:15:37 -07:00
Nuno Campos
3143324984 Improve Runnable type inference for input_schemas (#12630)
- Prefer lambda type annotations over inferred dict schema
- For sequences that start with RunnableAssign infer seq input type as
"input type of 2nd item in sequence - output type of runnable assign"
2023-10-31 13:22:54 +00:00
Nuno Campos
2f563cee20 Add Runnable.with_listeners() (#12549)
- This binds start/end/error listeners to a runnable, which will be
called with the Run object
2023-10-31 11:04:51 +00:00
Bagatur
bcc62d63be bump 327 (#12623) 2023-10-31 02:18:08 -07:00
Erick Friis
a1fae1fddd Readme rewrite (#12615)
Co-authored-by: Lance Martin <lance@langchain.dev>
Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
2023-10-31 00:06:02 -07:00
Ankur Singh
00766c9f31 Improves the description of the installation command (#12354)
- **Description:**

 Before: 
`
To install modules needed for the common LLM providers, run:
`

After:
`
To install modules needed for the common LLM providers, run the
following command. Please bear in mind that this command is exclusively
compatible with the `bash` shell:
`


> This is required for the user so that the user will know if this
command is compatible with `zsh` or not.

---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
2023-10-30 18:56:48 -07:00
Yujie Qian
1dbb77d7db VoyageEmbeddings (#12608)
- **Description:** Integrate VoyageEmbeddings into LangChain, with tests
and docs
  - **Issue:** N/A
  - **Dependencies:** N/A
  - **Tag maintainer:** N/A
  - **Twitter handle:** @Voyage_AI_

---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
2023-10-30 18:37:43 -07:00
chocolate4
92bf40a921 Add a new vector store hippo for langchain #11763 (#12412)
#11763

---------

Co-authored-by: TranswarpHippo <hippo.0.assistant@gmail.com>
Co-authored-by: Bagatur <baskaryan@gmail.com>
2023-10-30 18:35:23 -07:00
Karthik Raja A
342d6c7ab6 Multi on client toolkit (#12392)
Replace this entire comment with:
-Add MultiOn close function and update key value and add async
functionality
- solved the key value TabId not found.. (updated to use latest key
value)
  
@hwchase17
2023-10-30 18:34:56 -07:00
Prabin Nepal
b109cb031b SecretStr for fireworks api (#12475)
- **Description:** This pull request removes secrets present in raw
format,
- **Issue:** Fireworks api key was exposed when printing out the
langchain object
[#12165](https://github.com/langchain-ai/langchain/issues/12165)
 - **Maintainer:** @eyurtsev

---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
2023-10-30 18:17:53 -07:00
Harrison Chase
f35a65124a improve agent templates (#12528)
Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>
2023-10-30 18:15:13 -07:00
Harrison Chase
75bb28afd8 Harrison/pii chatbot (#12523)
the pii detection in the template is pretty basic, will need to be
customized per use case

the chain it "protects" can be swapped out for any chain
2023-10-30 18:13:12 -07:00
Harrison Chase
a32c236c64 bump cli to 009 (#12611) 2023-10-30 18:12:08 -07:00
Erika Cardenas
b97b9eda21 Hybrid Search Weaviate Template (#12606)
- **Description:** This template covers hybrid search in Weaviate
  - **Dependencies:** No
  - **Twitter handle:** @ecardenas300

---------

Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
2023-10-30 18:10:48 -07:00
Martin Schade
0c7f1d8b21 Textract linearizer (#12446)
**Description:** Textract PDF Loader generating linearized output,
meaning it will replicate the structure of the source document as close
as possible based on the features passed into the call (e. g. LAYOUT,
FORMS, TABLES). With LAYOUT reading order for multi-column documents or
identification of lists and figures is supported and with TABLES it will
generate the table structure as well. FORMS will indicate "key: value"
with columms.
  - **Issue:** the issue fixes #12068 
- **Dependencies:** amazon-textract-textractor is added, which provides
the linearization
  - **Tag maintainer:** @3coins 

---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
2023-10-30 18:02:10 -07:00
Harrison Chase
a7d5e0ce8a add guardrails profanity (#12609) 2023-10-30 17:01:23 -07:00
Erick Friis
e933212a3d run poetry build in working dir (#12610)
Was failing because was trying to build from root:
https://github.com/langchain-ai/langchain/actions/runs/6700033981/job/18205251365
2023-10-30 16:58:34 -07:00
Erick Friis
f39246bd7e cli should pull instead of delete+clone (#12607)
Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
2023-10-30 16:44:09 -07:00
Harrison Chase
8b5e879171 add a template for the package readme (#12499)
Co-authored-by: Erick Friis <erick@langchain.dev>
2023-10-30 16:39:39 -07:00
Bagatur
9bedda50f2 Bagatur/lakefs loader2 (#12524)
Co-authored-by: Jonathan Rosenberg <96974219+Jonathan-Rosenberg@users.noreply.github.com>
2023-10-30 16:30:27 -07:00
Brian McBrayer
3243dcc83e Fix very small typo (#12603)
- **Description:** this is the world's smallest typo change of a typo I
saw while reading the docs
2023-10-30 16:30:18 -07:00
Ackermann Yuriy
99b69fe607 Fixed missing optional tags. Added default key value for Ollama (#12599)
Added missing Optional typings. Added default values for Ollama optional
keys.

---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
2023-10-30 16:30:10 -07:00
Lance Martin
f6f3ca12e7 Codebase RAG fireworks (#12597) 2023-10-30 16:21:56 -07:00
Harrison Chase
481bf6fae6 hosting note (#12589) 2023-10-30 15:31:31 -07:00
David Duong
b5c17ff188 Force List[Tuple[str,str]] to chat history widget (#12530)
Co-authored-by: Bagatur <baskaryan@gmail.com>
2023-10-30 15:19:32 -07:00
David Duong
d39b4b61b6 Batch apply poetry lock --no-update for all templates (#12531)
Ran the following bash script for all templates

```bash
#!/bin/bash

set -e
current_dir="$(pwd)"
for directory in */; do
    if [ -d "$directory" ]; then
        (cd "$directory" && poetry lock --no-update)
    fi
done

cd "$current_dir"
```

Co-authored-by: Bagatur <baskaryan@gmail.com>
2023-10-30 15:18:53 -07:00
Kenzie Mihardja
e914283cf9 add docs to min_chunk_size (#12537)
Minor addition to documentation to elaborate on min_chunk_size.

Co-authored-by: Kenzie Mihardja <kenzie@docugami.com>
2023-10-30 15:13:52 -07:00
Bagatur
016813d189 factor out to_secret (#12593) 2023-10-30 15:10:25 -07:00
hsuyuming
630ae24b28 implement get_num_tokens to use google's count_tokens function (#10565)
can get the correct token count instead of using gpt-2 model

**Description:** 
Implement get_num_tokens within VertexLLM to use google's count_tokens
function.
(https://cloud.google.com/vertex-ai/docs/generative-ai/get-token-count).
So we don't need to download gpt-2 model from huggingface, also when we
do the mapreduce chain we can get correct token count.

**Tag maintainer:** 
@lkuligin 
**Twitter handle:** 
My twitter: @abehsu1992626

---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
2023-10-30 15:10:05 -07:00
Pham Vu Thai Minh
33e77a1007 Async support for FAISS (#11333)
Following this tutoral about using OpenAI Embeddings with FAISS

https://python.langchain.com/docs/integrations/vectorstores/faiss

```python
from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.text_splitter import CharacterTextSplitter
from langchain.vectorstores import FAISS
from langchain.document_loaders import TextLoader
from langchain.document_loaders import TextLoader

loader = TextLoader("../../../extras/modules/state_of_the_union.txt")
documents = loader.load()
text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)
docs = text_splitter.split_documents(documents)

embeddings = OpenAIEmbeddings()
```

This works fine

```python
db = FAISS.from_documents(docs, embeddings)
query = "What did the president say about Ketanji Brown Jackson"
docs = db.similarity_search(query)
```

But the async version is not

```python
db = await FAISS.afrom_documents(docs, embeddings)  # NotImplementedError
query = "What did the president say about Ketanji Brown Jackson"

docs = await db.asimilarity_search(query) # this will use await asyncio.get_event_loop().run_in_executor under the hood and will not call OpenAIEmbeddings.aembed_query but call OpenAIEmbeddings.embed_query
```

So this PR add async/await supports for FAISS

---------

Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>
2023-10-30 15:08:53 -07:00
Lance Martin
26f0ca222d RAG template for MongoDB Atlas Vector Search (#12526) 2023-10-30 14:31:34 -07:00
Jeff Zhuo
13b89815a3 Issue: fix the issue #11648 init minimax llm (#12554)
e https://github.com/langchain-ai/langchain/issues/11648 Minimax
llm failed to initialize

The idea of this fix is
https://github.com/langchain-ai/langchain/issues/10917#issuecomment-1765606725

do not use  underscore in python model class

---------

Co-authored-by: zhuojianming@cmcm.com <zhuojianming@cmcm.com>
Co-authored-by: Bagatur <baskaryan@gmail.com>
2023-10-30 14:30:17 -07:00
Florian Valeye
bfb27324cb [Matching Engine] Update the Matching Engine to include the distance and filters (#12555)
Hello 👋,

This Pull Request adds more capability to the
[MatchingEngine](https://api.python.langchain.com/en/latest/vectorstores/langchain.vectorstores.matching_engine.MatchingEngine.html)
vectorstore of GCP. It includes the
`similarity_search_by_vector_with_relevance_scores` function and also
[filters](https://cloud.google.com/vertex-ai/docs/vector-search/filtering)
to `filter` the namespaces when retrieving the results.

- **Description:** Add
[filter](https://cloud.google.com/python/docs/reference/aiplatform/latest/google.cloud.aiplatform.MatchingEngineIndexEndpoint#google_cloud_aiplatform_MatchingEngineIndexEndpoint_find_neighbors)
in `similarity_search` and add
`similarity_search_by_vector_with_relevance_scores` method
  - **Dependencies:** None
  - **Tag maintainer:** Unknown

Thank you!

---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
2023-10-30 14:12:59 -07:00
Predrag Gruevski
3c5c384f1a Test-publish to test PyPI and separate jobs to limit permissions. (#12578)
Before making a new `langchain` release, we want to test that everything
works as expected. This PR lets us publish `langchain` to test PyPI,
then install it from there and run checks to ensure everything works
normally before publishing it "for real".

It also takes the opportunity to refactor the build process, splitting
up the build, release-creation, and PyPI upload steps into separate jobs
that do not share their elevated permissions with each other.
2023-10-30 17:10:14 -04:00
Harrison Chase
1d51363e49 change project template (#12493) 2023-10-30 14:06:30 -07:00
Holt Skinner
e53b9ccd70 feat: Add Google Cloud Text-to-Speech Tool (#12572)
- Add Tool for [Google Cloud
Text-to-Speech](https://cloud.google.com/text-to-speech)
- Follows similar structure to [Eleven Labs
Text2Speech](https://python.langchain.com/docs/integrations/tools/eleven_labs_tts)

---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
2023-10-30 14:05:39 -07:00
Bagatur
1f2c672d4a add routing by embedding doc (#12580) 2023-10-30 13:03:16 -07:00
William FH
199630ff93 Replace You with DDG in xml agent (#12504)
You requires an email to get an API key which IMO is too much friction.
Duckduck go is free and easy to install.
2023-10-30 12:51:00 -07:00
Adilkhan Sarsen
6e702b9c36 Deep memory support in LangChain (#12268)
- Description: adding support to Activeloop's DeepMemory feature that
boosts recall up to 25%. Added Jupyter notebook showcasing the feature
and also made index params explicit.
- Twitter handle: will really appreciate if we could announce this on
twitter.

---------

Co-authored-by: adolkhan <adilkhan.sarsen@alumni.nu.edu.kz>
2023-10-30 12:16:14 -07:00
Lance Martin
c57945e0a8 Formatting on ntbks (#12576) 2023-10-30 11:32:31 -07:00
Lance Martin
08103e6d48 Minor template cleaning (#12573) 2023-10-30 11:27:44 -07:00
billytrend-cohere
b1e3843931 Add client_name="langchain" to Cohere usage (#11328)
Hey, we're looking to invest more in adding cohere integrations to
langchain so would love to get more of an idea for how it's used.
Hopefully this pr is acceptable. This week I'm also going to be looking
into adding our new [retrieval augmented generation
product](https://txt.cohere.com/chat-with-rag/) to langchain.

---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
2023-10-30 11:20:55 -07:00
Bagatur
37aec1e050 bump 326 (#12569) 2023-10-30 10:11:17 -07:00
Eugene Yurtsev
1b1a2d5740 Image Caption accepts bytes for images (#12561)
Accept bytes for images in image caption

---------

Co-authored-by: webcoderz <19884161+webcoderz@users.noreply.github.com>
2023-10-30 12:29:54 -04:00
Nuno Campos
7897483819 Allow astream_log to be used inside atrace_as_chain_group (#12558)
<!-- Thank you for contributing to LangChain!

Replace this entire comment with:
  - **Description:** a description of the change, 
  - **Issue:** the issue # it fixes (if applicable),
  - **Dependencies:** any dependencies required for this change,
- **Tag maintainer:** for a quicker response, tag the relevant
maintainer (see below),
- **Twitter handle:** we announce bigger features on Twitter. If your PR
gets announced, and you'd like a mention, we'll gladly shout you out!

Please make sure your PR is passing linting and testing before
submitting. Run `make format`, `make lint` and `make test` to check this
locally.

See contribution guidelines for more information on how to write/run
tests, lint, etc:

https://github.com/langchain-ai/langchain/blob/master/.github/CONTRIBUTING.md

If you're adding a new integration, please include:
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in `docs/extras`
directory.

If no one reviews your PR within a few days, please @-mention one of
@baskaryan, @eyurtsev, @hwchase17.
 -->
2023-10-30 15:55:16 +00:00
Tomaz Bratanic
8e88ba16a8 Update neo4j template readmes (#12540) 2023-10-30 07:57:53 -07:00
Bagatur
b2138508cb google translate nb formatting (#12534) 2023-10-29 21:27:04 -07:00
Holt Skinner
e05bb938de Merge pull request #12433
* feat: Add Google Cloud Translation document transformer

* Merge branch 'langchain-ai:master' into google-translate

* Add documentation for Google Translate Document Transformer

* Fix line length error

* Merge branch 'master' into google-translate

* Merge branch 'google-translate' of https://github.com/holtskinner/lan…

* Addressed code review comments

* Merge branch 'master' into google-translate

* Merge branch 'google-translate' of https://github.com/holtskinner/lan…

* Removed extra variable

* Merge branch 'google-translate' of https://github.com/holtskinner/lan…

* Merge branch 'master' into google-translate

* Merge branch 'google-translate' of https://github.com/holtskinner/lan…

* Removed extra import
2023-10-29 21:22:36 -04:00
Samad Koita
d1fdcd4fcb Masking of API Key for GooseAI LLM (#12496)
Description: Add masking of API Key for GooseAI LLM when printed.
Issue: https://github.com/langchain-ai/langchain/issues/12165
Dependencies: None
Tag maintainer: @eyurtsev

---------

Co-authored-by: Samad Koita <>
2023-10-29 21:21:33 -04:00
Andrew Zhou
64c4a698a8 More comprehensive readthedocs document loader (#12382)
## **Description:**
When building our own readthedocs.io scraper, we noticed a couple
interesting things:

1. Text lines with a lot of nested <span> tags would give unclean text
with a bunch of newlines. For example, for [Langchain's
documentation](https://api.python.langchain.com/en/latest/document_loaders/langchain.document_loaders.readthedocs.ReadTheDocsLoader.html#langchain.document_loaders.readthedocs.ReadTheDocsLoader),
a single line is represented in a complicated nested HTML structure, and
the naive `soup.get_text()` call currently being made will create a
newline for each nested HTML element. Therefore, the document loader
would give a messy, newline-separated blob of text. This would be true
in a lot of cases.

<img width="945" alt="Screenshot 2023-10-26 at 6 15 39 PM"
src="https://github.com/langchain-ai/langchain/assets/44193474/eca85d1f-d2bf-4487-a18a-e1e732fadf19">
<img width="1031" alt="Screenshot 2023-10-26 at 6 16 00 PM"
src="https://github.com/langchain-ai/langchain/assets/44193474/035938a0-9892-4f6a-83cd-0d7b409b00a3">

Additionally, content from iframes, code from scripts, css from styles,
etc. will be gotten if it's a subclass of the selector (which happens
more often than you'd think). For example, [this
page](https://pydeck.gl/gallery/contour_layer.html#) will scrape 1.5
million characters of content that looks like this:

<img width="1372" alt="Screenshot 2023-10-26 at 6 32 55 PM"
src="https://github.com/langchain-ai/langchain/assets/44193474/dbd89e39-9478-4a18-9e84-f0eb91954eac">

Therefore, I wrote a recursive _get_clean_text(soup) class function that
1. skips all irrelevant elements, and 2. only adds newlines when
necessary.

2. Index pages (like [this
one](https://api.python.langchain.com/en/latest/api_reference.html))
would be loaded, chunked, and eventually embedded. This is really bad
not just because the user will be embedding irrelevant information - but
because index pages are very likely to show up in retrieved content,
making retrieval less effective (in our tests). Therefore, I added a
bool parameter `exclude_index_pages` defaulted to False (which is the
current behavior — although I'd petition to default this to True) that
will skip all pages where links take up 50%+ of the page. Through manual
testing, this seems to be the best threshold.



## Other Information:
  - **Issue:** n/a
  - **Dependencies:** n/a
  - **Tag maintainer:** n/a
  - **Twitter handle:** @andrewthezhou

---------

Co-authored-by: Andrew Zhou <andrew@heykona.com>
Co-authored-by: Bagatur <baskaryan@gmail.com>
2023-10-29 16:26:53 -07:00
Peter Vandenabeele
3468c038ba Add unit tests for document_transformers/beautiful_soup_transformer.py (#12520)
- **Description:**
* Add unit tests for document_transformers/beautiful_soup_transformer.py
* Basic functionality is tested (extract tags, remove tags, drop lines)
    * add a FIXME comment about the order of tags that is not preserved
      (and a passing test, but with the expected tags now out-of-order)
  - **Issue:** None
  - **Dependencies:** None
  - **Tag maintainer:** @rlancemartin 
  - **Twitter handle:** `peter_v`

Please make sure your PR is passing linting and testing before
submitting.

=> OK: I ran `make format`, `make test` (passing after install of
beautifulsoup4) and `make lint`.
2023-10-29 16:24:47 -07:00
Bagatur
d31d705407 update contributing (#12532) 2023-10-29 16:22:18 -07:00
Bagatur
0b4b9e61fc Bagatur/fix doc ci (#12529) 2023-10-29 16:15:18 -07:00
Bagatur
2424fff3f1 notebook fmt (#12498) 2023-10-29 15:50:09 -07:00
Harrison Chase
56cc5b847c Harrison/add descriptions (#12522) 2023-10-29 15:11:37 -07:00
Anirudh Gautam
b257e6a4e8 Mask API key for AI21 LLM (#12418)
- **Description:** Added masking of the API Key for AI21 LLM when
printed and improved the docstring for AI21 LLM.
- Updated the AI21 LLM to utilize SecretStr from pydantic to securely
manage API key.
- Made improvements in the docstring of AI21 LLM. It now mentions that
the API key can also be passed as a named parameter to the constructor.
    - Added unit tests.
  - **Issue:** #12165 
  - **Tag maintainer:** @eyurtsev

---------

Co-authored-by: Anirudh Gautam <anirudh@Anirudhs-Mac-mini.local>
2023-10-29 14:53:41 -07:00
Nico Baier
35d726dc15 docs(prompt_templates): fix typo in prompt template (#12497)
- **Description:** Fixes a small typo in the [Prompt template
document](https://python.langchain.com/docs/modules/model_io/prompts/prompt_templates/)
  - **Dependencies:** none
2023-10-29 14:52:37 -07:00
silvhua
9dead1034c _dalle_image_url returns list of urls if n>1 (#11800)
- **Description:** Updated the `_dalle_image_url` method to return a
list of URLs if self.n>1,
  - **Issue:** #10691,
  - **Dependencies:** unsure,
  - **Tag maintainer:** @eyurtsev,
  - **Twitter handle:** @silvhua
---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
2023-10-29 14:23:23 -07:00
Bagatur
1815ea2fdb OpenAI runnable constructor (#12455) 2023-10-29 13:40:30 -07:00
William FH
a830b809f3 Patch forward ref bug (#12508)
Currently this gives a bug:
```
from langchain.schema.runnable import RunnableLambda

bound = RunnableLambda(lambda x: x).with_config({"callbacks": []})

# ConfigError: field "callbacks" not yet prepared so type is still a ForwardRef, you might need to call RunnableConfig.update_forward_refs().
```

Rather than deal with cyclic imports and extra load time, etc., I think
it makes sense to just have a separate Callbacks definition here that is
a relaxed typehint.
2023-10-29 00:53:01 -07:00
William FH
36204c2baf Evaluation Callback Multi Response (#12505)
1. Allow run evaluators to return {"results": [list of evaluation
results]} in the evaluator callback.
2. Allows run evaluators to pick the target run ID to provide feedback
to

(1) means you could do something like a function call that populates a
full rubric in one go (not sure how reliable that is in general though)
rather than splitting off into separate LLM calls - cheaper and less
code to write
(2) means you can provide feedback to runs on subsequent calls.
Immediate use case is if you wanted to add an evaluator to a chat bot
and assign to assign to previous conversation turns


have a corresponding one in the SDK
2023-10-28 23:18:29 -07:00
Harrison Chase
9e0ae56287 various templates improvements (#12500) 2023-10-28 22:13:22 -07:00
Harrison Chase
d85d4d7822 add cookbook for selectins llms based on context length (#12486) 2023-10-28 21:50:14 -07:00
Harrison Chase
0660c06cf1 add gha for cli (#12492) 2023-10-28 21:49:28 -07:00
0xC9
79cf01366e Update tool.py (#12472)
In the GoogleSerperResults class, the name field is defined as
'google_serrper_results_json'. This looks like a typo, and perhaps
should be 'google_serper_results_json'.

<!-- Thank you for contributing to LangChain!

Replace this entire comment with:
  - **Description:** a description of the change, 
  - **Issue:** the issue # it fixes (if applicable),
  - **Dependencies:** any dependencies required for this change,
- **Tag maintainer:** for a quicker response, tag the relevant
maintainer (see below),
- **Twitter handle:** we announce bigger features on Twitter. If your PR
gets announced, and you'd like a mention, we'll gladly shout you out!

Please make sure your PR is passing linting and testing before
submitting. Run `make format`, `make lint` and `make test` to check this
locally.

See contribution guidelines for more information on how to write/run
tests, lint, etc:

https://github.com/langchain-ai/langchain/blob/master/.github/CONTRIBUTING.md

If you're adding a new integration, please include:
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in `docs/extras`
directory.

If no one reviews your PR within a few days, please @-mention one of
@baskaryan, @eyurtsev, @hwchase17.
 -->
2023-10-28 21:49:01 -07:00
Harrison Chase
61f5ea4b5e Sphinxbio nls/add plate chain template (#12502)
Co-authored-by: Nicholas Larus-Stone <7347808+nlarusstone@users.noreply.github.com>
Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>
2023-10-28 21:48:17 -07:00
Harrison Chase
221134d239 Harrison/quick start (#12491)
Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>
2023-10-28 16:26:52 -07:00
Bagatur
e130680d74 Bagatur/self query doc update (#12461) 2023-10-28 14:37:14 -07:00
Piyush Jain
689853902e Added a rag template for Kendra (#12470)
## Description
Adds a rag template for Amazon Kendra with Bedrock.

---------

Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
2023-10-28 08:58:28 -07:00
Harrison Chase
eb903e211c bump to 36 (#12487) 2023-10-28 08:51:23 -07:00
Tyler Hutcherson
4209457bdc Redis langserve template (#12443)
Add Redis langserve template! Eventually will add semantic caching to
this too. But I was struggling to get that to work for some reason with
the LCEL implementation here.

- **Description:** Introduces the Redis LangServe template. A simple RAG
based app built on top of Redis that allows you to chat with company's
public financial data (Edgar 10k filings)
  - **Issue:** None
- **Dependencies:** The template contains the poetry project
requirements to run this template
  - **Tag maintainer:** @baskaryan @Spartee 
  - **Twitter handle:** @tchutch94

**Note**: this requires the commit here that deletes the
`_aget_relevant_documents()` method from the Redis retriever class that
wasn't implemented. That was breaking the langserve app.

---------

Co-authored-by: Sam Partee <sam.partee@redis.com>
2023-10-28 08:31:12 -07:00
Erick Friis
9adaa78c65 cli improvements (#12465)
Features
- add multiple repos by their branch/repo
- generate `pip install` commands and `add_route()` code
![Screenshot 2023-10-27 at 4 49 52
PM](https://github.com/langchain-ai/langchain/assets/9557659/3aec4cbb-3f67-4f04-8370-5b54ea983b2a)

Optimizations:
- group installs by repo/branch to avoid duplicate cloning
2023-10-28 08:25:31 -07:00
Piyush Jain
5545de0466 Updated the Bedrock rag template (#12462)
Updates the bedrock rag template.
- Removes pinecone and replaces with FAISS as the vector store
- Fixes the environment variables, setting defaults
- Adds a `main.py` test file quick sanity testing
- Updates README.md with correct instructions
2023-10-27 17:02:28 -07:00
Lance Martin
5c2243ee91 Update llama.cpp and Ollama templates (#12466) 2023-10-27 16:54:54 -07:00
Lance Martin
f10c17c6a4 Update SQL templates (#12464) 2023-10-27 16:34:37 -07:00
Lance Martin
a476147189 Add Weaviate RAG template (#12460) 2023-10-27 15:19:34 -07:00
Adam Law
df4960a6d8 add reranking to azuresearch (#12454)
-**Description** Adds returning the reranking score when using semantic
search
-**Issue:* #12317

---------

Co-authored-by: Adam Law <adamlaw@microsoft.com>
Co-authored-by: Bagatur <baskaryan@gmail.com>
2023-10-27 14:14:09 -07:00
dependabot[bot]
389459af8f Bump @babel/traverse from 7.22.8 to 7.23.2 in /docs (#12453)
Bumps
[@babel/traverse](https://github.com/babel/babel/tree/HEAD/packages/babel-traverse)
from 7.22.8 to 7.23.2.
<details>
<summary>Release notes</summary>
<p><em>Sourced from <a
href="https://github.com/babel/babel/releases"><code>@​babel/traverse</code>'s
releases</a>.</em></p>
<blockquote>
<h2>v7.23.2 (2023-10-11)</h2>
<p><strong>NOTE</strong>: This release also re-publishes
<code>@babel/core</code>, even if it does not appear in the linked
release commit.</p>
<p>Thanks <a
href="https://github.com/jimmydief"><code>@​jimmydief</code></a> for
your first PR!</p>
<h4>🐛 Bug Fix</h4>
<ul>
<li><code>babel-traverse</code>
<ul>
<li><a
href="https://redirect.github.com/babel/babel/pull/16033">#16033</a>
Only evaluate own String/Number/Math methods (<a
href="https://github.com/nicolo-ribaudo"><code>@​nicolo-ribaudo</code></a>)</li>
</ul>
</li>
<li><code>babel-preset-typescript</code>
<ul>
<li><a
href="https://redirect.github.com/babel/babel/pull/16022">#16022</a>
Rewrite <code>.tsx</code> extension when using
<code>rewriteImportExtensions</code> (<a
href="https://github.com/jimmydief"><code>@​jimmydief</code></a>)</li>
</ul>
</li>
<li><code>babel-helpers</code>
<ul>
<li><a
href="https://redirect.github.com/babel/babel/pull/16017">#16017</a>
Fix: fallback to typeof when toString is applied to incompatible object
(<a href="https://github.com/JLHwung"><code>@​JLHwung</code></a>)</li>
</ul>
</li>
<li><code>babel-helpers</code>,
<code>babel-plugin-transform-modules-commonjs</code>,
<code>babel-runtime-corejs2</code>, <code>babel-runtime-corejs3</code>,
<code>babel-runtime</code>
<ul>
<li><a
href="https://redirect.github.com/babel/babel/pull/16025">#16025</a>
Avoid override mistake in namespace imports (<a
href="https://github.com/nicolo-ribaudo"><code>@​nicolo-ribaudo</code></a>)</li>
</ul>
</li>
</ul>
<h4>Committers: 5</h4>
<ul>
<li>Babel Bot (<a
href="https://github.com/babel-bot"><code>@​babel-bot</code></a>)</li>
<li>Huáng Jùnliàng (<a
href="https://github.com/JLHwung"><code>@​JLHwung</code></a>)</li>
<li>James Diefenderfer (<a
href="https://github.com/jimmydief"><code>@​jimmydief</code></a>)</li>
<li>Nicolò Ribaudo (<a
href="https://github.com/nicolo-ribaudo"><code>@​nicolo-ribaudo</code></a>)</li>
<li><a
href="https://github.com/liuxingbaoyu"><code>@​liuxingbaoyu</code></a></li>
</ul>
<h2>v7.23.1 (2023-09-25)</h2>
<p>Re-publishing <code>@babel/helpers</code> due to a publishing error
in 7.23.0.</p>
<h2>v7.23.0 (2023-09-25)</h2>
<p>Thanks <a
href="https://github.com/lorenzoferre"><code>@​lorenzoferre</code></a>
and <a
href="https://github.com/RajShukla1"><code>@​RajShukla1</code></a> for
your first PRs!</p>
<h4>🚀 New Feature</h4>
<ul>
<li><code>babel-plugin-proposal-import-wasm-source</code>,
<code>babel-plugin-syntax-import-source</code>,
<code>babel-plugin-transform-dynamic-import</code>
<ul>
<li><a
href="https://redirect.github.com/babel/babel/pull/15870">#15870</a>
Support transforming <code>import source</code> for wasm (<a
href="https://github.com/nicolo-ribaudo"><code>@​nicolo-ribaudo</code></a>)</li>
</ul>
</li>
<li><code>babel-helper-module-transforms</code>,
<code>babel-helpers</code>,
<code>babel-plugin-proposal-import-defer</code>,
<code>babel-plugin-syntax-import-defer</code>,
<code>babel-plugin-transform-modules-commonjs</code>,
<code>babel-runtime-corejs2</code>, <code>babel-runtime-corejs3</code>,
<code>babel-runtime</code>, <code>babel-standalone</code>
<ul>
<li><a
href="https://redirect.github.com/babel/babel/pull/15878">#15878</a>
Implement <code>import defer</code> proposal transform support (<a
href="https://github.com/nicolo-ribaudo"><code>@​nicolo-ribaudo</code></a>)</li>
</ul>
</li>
<li><code>babel-generator</code>, <code>babel-parser</code>,
<code>babel-types</code>
<ul>
<li><a
href="https://redirect.github.com/babel/babel/pull/15845">#15845</a>
Implement <code>import defer</code> parsing support (<a
href="https://github.com/nicolo-ribaudo"><code>@​nicolo-ribaudo</code></a>)</li>
<li><a
href="https://redirect.github.com/babel/babel/pull/15829">#15829</a> Add
parsing support for the &quot;source phase imports&quot; proposal (<a
href="https://github.com/nicolo-ribaudo"><code>@​nicolo-ribaudo</code></a>)</li>
</ul>
</li>
<li><code>babel-generator</code>,
<code>babel-helper-module-transforms</code>, <code>babel-parser</code>,
<code>babel-plugin-transform-dynamic-import</code>,
<code>babel-plugin-transform-modules-amd</code>,
<code>babel-plugin-transform-modules-commonjs</code>,
<code>babel-plugin-transform-modules-systemjs</code>,
<code>babel-traverse</code>, <code>babel-types</code>
<ul>
<li><a
href="https://redirect.github.com/babel/babel/pull/15682">#15682</a> Add
<code>createImportExpressions</code> parser option (<a
href="https://github.com/JLHwung"><code>@​JLHwung</code></a>)</li>
</ul>
</li>
<li><code>babel-standalone</code>
<ul>
<li><a
href="https://redirect.github.com/babel/babel/pull/15671">#15671</a>
Pass through nonce to the transformed script element (<a
href="https://github.com/JLHwung"><code>@​JLHwung</code></a>)</li>
</ul>
</li>
<li><code>babel-helper-function-name</code>,
<code>babel-helper-member-expression-to-functions</code>,
<code>babel-helpers</code>, <code>babel-parser</code>,
<code>babel-plugin-proposal-destructuring-private</code>,
<code>babel-plugin-proposal-optional-chaining-assign</code>,
<code>babel-plugin-syntax-optional-chaining-assign</code>,
<code>babel-plugin-transform-destructuring</code>,
<code>babel-plugin-transform-optional-chaining</code>,
<code>babel-runtime-corejs2</code>, <code>babel-runtime-corejs3</code>,
<code>babel-runtime</code>, <code>babel-standalone</code>,
<code>babel-types</code>
<ul>
<li><a
href="https://redirect.github.com/babel/babel/pull/15751">#15751</a> Add
support for optional chain in assignments (<a
href="https://github.com/nicolo-ribaudo"><code>@​nicolo-ribaudo</code></a>)</li>
</ul>
</li>
<li><code>babel-helpers</code>,
<code>babel-plugin-proposal-decorators</code>
<ul>
<li><a
href="https://redirect.github.com/babel/babel/pull/15895">#15895</a>
Implement the &quot;decorator metadata&quot; proposal (<a
href="https://github.com/nicolo-ribaudo"><code>@​nicolo-ribaudo</code></a>)</li>
</ul>
</li>
<li><code>babel-traverse</code>, <code>babel-types</code>
<ul>
<li><a
href="https://redirect.github.com/babel/babel/pull/15893">#15893</a> Add
<code>t.buildUndefinedNode</code> (<a
href="https://github.com/liuxingbaoyu"><code>@​liuxingbaoyu</code></a>)</li>
</ul>
</li>
<li><code>babel-preset-typescript</code></li>
</ul>
<!-- raw HTML omitted -->
</blockquote>
<p>... (truncated)</p>
</details>
<details>
<summary>Changelog</summary>
<p><em>Sourced from <a
href="https://github.com/babel/babel/blob/main/CHANGELOG.md"><code>@​babel/traverse</code>'s
changelog</a>.</em></p>
<blockquote>
<h2>v7.23.2 (2023-10-11)</h2>
<h4>🐛 Bug Fix</h4>
<ul>
<li><code>babel-traverse</code>
<ul>
<li><a
href="https://redirect.github.com/babel/babel/pull/16033">#16033</a>
Only evaluate own String/Number/Math methods (<a
href="https://github.com/nicolo-ribaudo"><code>@​nicolo-ribaudo</code></a>)</li>
</ul>
</li>
<li><code>babel-preset-typescript</code>
<ul>
<li><a
href="https://redirect.github.com/babel/babel/pull/16022">#16022</a>
Rewrite <code>.tsx</code> extension when using
<code>rewriteImportExtensions</code> (<a
href="https://github.com/jimmydief"><code>@​jimmydief</code></a>)</li>
</ul>
</li>
<li><code>babel-helpers</code>
<ul>
<li><a
href="https://redirect.github.com/babel/babel/pull/16017">#16017</a>
Fix: fallback to typeof when toString is applied to incompatible object
(<a href="https://github.com/JLHwung"><code>@​JLHwung</code></a>)</li>
</ul>
</li>
<li><code>babel-helpers</code>,
<code>babel-plugin-transform-modules-commonjs</code>,
<code>babel-runtime-corejs2</code>, <code>babel-runtime-corejs3</code>,
<code>babel-runtime</code>
<ul>
<li><a
href="https://redirect.github.com/babel/babel/pull/16025">#16025</a>
Avoid override mistake in namespace imports (<a
href="https://github.com/nicolo-ribaudo"><code>@​nicolo-ribaudo</code></a>)</li>
</ul>
</li>
</ul>
<h2>v7.23.0 (2023-09-25)</h2>
<h4>🚀 New Feature</h4>
<ul>
<li><code>babel-plugin-proposal-import-wasm-source</code>,
<code>babel-plugin-syntax-import-source</code>,
<code>babel-plugin-transform-dynamic-import</code>
<ul>
<li><a
href="https://redirect.github.com/babel/babel/pull/15870">#15870</a>
Support transforming <code>import source</code> for wasm (<a
href="https://github.com/nicolo-ribaudo"><code>@​nicolo-ribaudo</code></a>)</li>
</ul>
</li>
<li><code>babel-helper-module-transforms</code>,
<code>babel-helpers</code>,
<code>babel-plugin-proposal-import-defer</code>,
<code>babel-plugin-syntax-import-defer</code>,
<code>babel-plugin-transform-modules-commonjs</code>,
<code>babel-runtime-corejs2</code>, <code>babel-runtime-corejs3</code>,
<code>babel-runtime</code>, <code>babel-standalone</code>
<ul>
<li><a
href="https://redirect.github.com/babel/babel/pull/15878">#15878</a>
Implement <code>import defer</code> proposal transform support (<a
href="https://github.com/nicolo-ribaudo"><code>@​nicolo-ribaudo</code></a>)</li>
</ul>
</li>
<li><code>babel-generator</code>, <code>babel-parser</code>,
<code>babel-types</code>
<ul>
<li><a
href="https://redirect.github.com/babel/babel/pull/15845">#15845</a>
Implement <code>import defer</code> parsing support (<a
href="https://github.com/nicolo-ribaudo"><code>@​nicolo-ribaudo</code></a>)</li>
<li><a
href="https://redirect.github.com/babel/babel/pull/15829">#15829</a> Add
parsing support for the &quot;source phase imports&quot; proposal (<a
href="https://github.com/nicolo-ribaudo"><code>@​nicolo-ribaudo</code></a>)</li>
</ul>
</li>
<li><code>babel-generator</code>,
<code>babel-helper-module-transforms</code>, <code>babel-parser</code>,
<code>babel-plugin-transform-dynamic-import</code>,
<code>babel-plugin-transform-modules-amd</code>,
<code>babel-plugin-transform-modules-commonjs</code>,
<code>babel-plugin-transform-modules-systemjs</code>,
<code>babel-traverse</code>, <code>babel-types</code>
<ul>
<li><a
href="https://redirect.github.com/babel/babel/pull/15682">#15682</a> Add
<code>createImportExpressions</code> parser option (<a
href="https://github.com/JLHwung"><code>@​JLHwung</code></a>)</li>
</ul>
</li>
<li><code>babel-standalone</code>
<ul>
<li><a
href="https://redirect.github.com/babel/babel/pull/15671">#15671</a>
Pass through nonce to the transformed script element (<a
href="https://github.com/JLHwung"><code>@​JLHwung</code></a>)</li>
</ul>
</li>
<li><code>babel-helper-function-name</code>,
<code>babel-helper-member-expression-to-functions</code>,
<code>babel-helpers</code>, <code>babel-parser</code>,
<code>babel-plugin-proposal-destructuring-private</code>,
<code>babel-plugin-proposal-optional-chaining-assign</code>,
<code>babel-plugin-syntax-optional-chaining-assign</code>,
<code>babel-plugin-transform-destructuring</code>,
<code>babel-plugin-transform-optional-chaining</code>,
<code>babel-runtime-corejs2</code>, <code>babel-runtime-corejs3</code>,
<code>babel-runtime</code>, <code>babel-standalone</code>,
<code>babel-types</code>
<ul>
<li><a
href="https://redirect.github.com/babel/babel/pull/15751">#15751</a> Add
support for optional chain in assignments (<a
href="https://github.com/nicolo-ribaudo"><code>@​nicolo-ribaudo</code></a>)</li>
</ul>
</li>
<li><code>babel-helpers</code>,
<code>babel-plugin-proposal-decorators</code>
<ul>
<li><a
href="https://redirect.github.com/babel/babel/pull/15895">#15895</a>
Implement the &quot;decorator metadata&quot; proposal (<a
href="https://github.com/nicolo-ribaudo"><code>@​nicolo-ribaudo</code></a>)</li>
</ul>
</li>
<li><code>babel-traverse</code>, <code>babel-types</code>
<ul>
<li><a
href="https://redirect.github.com/babel/babel/pull/15893">#15893</a> Add
<code>t.buildUndefinedNode</code> (<a
href="https://github.com/liuxingbaoyu"><code>@​liuxingbaoyu</code></a>)</li>
</ul>
</li>
<li><code>babel-preset-typescript</code>
<ul>
<li><a
href="https://redirect.github.com/babel/babel/pull/15913">#15913</a> Add
<code>rewriteImportExtensions</code> option to TS preset (<a
href="https://github.com/nicolo-ribaudo"><code>@​nicolo-ribaudo</code></a>)</li>
</ul>
</li>
<li><code>babel-parser</code>
<ul>
<li><a
href="https://redirect.github.com/babel/babel/pull/15896">#15896</a>
Allow TS tuples to have both labeled and unlabeled elements (<a
href="https://github.com/yukukotani"><code>@​yukukotani</code></a>)</li>
</ul>
</li>
</ul>
<h4>🐛 Bug Fix</h4>
<ul>
<li><code>babel-plugin-transform-block-scoping</code>
<ul>
<li><a
href="https://redirect.github.com/babel/babel/pull/15962">#15962</a>
fix: <code>transform-block-scoping</code> captures the variables of the
method in the loop (<a
href="https://github.com/liuxingbaoyu"><code>@​liuxingbaoyu</code></a>)</li>
</ul>
</li>
</ul>
<h4>💅 Polish</h4>
<ul>
<li><code>babel-traverse</code>
<ul>
<li><a
href="https://redirect.github.com/babel/babel/pull/15797">#15797</a>
Expand evaluation of global built-ins in <code>@babel/traverse</code>
(<a
href="https://github.com/lorenzoferre"><code>@​lorenzoferre</code></a>)</li>
</ul>
</li>
<li><code>babel-plugin-proposal-explicit-resource-management</code>
<ul>
<li><a
href="https://redirect.github.com/babel/babel/pull/15985">#15985</a>
Improve source maps for blocks with <code>using</code> declarations (<a
href="https://github.com/nicolo-ribaudo"><code>@​nicolo-ribaudo</code></a>)</li>
</ul>
</li>
</ul>
<h4>🔬 Output optimization</h4>
<ul>
<li><code>babel-core</code>,
<code>babel-helper-module-transforms</code>,
<code>babel-plugin-transform-async-to-generator</code>,
<code>babel-plugin-transform-classes</code>,
<code>babel-plugin-transform-dynamic-import</code>,
<code>babel-plugin-transform-function-name</code>,
<code>babel-plugin-transform-modules-amd</code>,
<code>babel-plugin-transform-modules-commonjs</code>,
<code>babel-plugin-transform-modules-umd</code>,
<code>babel-plugin-transform-parameters</code>,
<code>babel-plugin-transform-react-constant-elements</code>,
<code>babel-plugin-transform-react-inline-elements</code>,
<code>babel-plugin-transform-runtime</code>,
<code>babel-plugin-transform-typescript</code>,
<code>babel-preset-env</code>
<ul>
<li><a
href="https://redirect.github.com/babel/babel/pull/15984">#15984</a>
Inline <code>exports.XXX =</code> update in simple variable declarations
(<a
href="https://github.com/nicolo-ribaudo"><code>@​nicolo-ribaudo</code></a>)</li>
</ul>
</li>
</ul>
<h2>v7.22.20 (2023-09-16)</h2>
<!-- raw HTML omitted -->
</blockquote>
<p>... (truncated)</p>
</details>
<details>
<summary>Commits</summary>
<ul>
<li><a
href="b4b9942a6c"><code>b4b9942</code></a>
v7.23.2</li>
<li><a
href="b13376b346"><code>b13376b</code></a>
Only evaluate own String/Number/Math methods (<a
href="https://github.com/babel/babel/tree/HEAD/packages/babel-traverse/issues/16033">#16033</a>)</li>
<li><a
href="ca58ec15cb"><code>ca58ec1</code></a>
v7.23.0</li>
<li><a
href="0f333dafcf"><code>0f333da</code></a>
Add <code>createImportExpressions</code> parser option (<a
href="https://github.com/babel/babel/tree/HEAD/packages/babel-traverse/issues/15682">#15682</a>)</li>
<li><a
href="3744545649"><code>3744545</code></a>
Fix linting</li>
<li><a
href="c7e6806e21"><code>c7e6806</code></a>
Add <code>t.buildUndefinedNode</code> (<a
href="https://github.com/babel/babel/tree/HEAD/packages/babel-traverse/issues/15893">#15893</a>)</li>
<li><a
href="38ee8b4dd6"><code>38ee8b4</code></a>
Expand evaluation of global built-ins in <code>@babel/traverse</code>
(<a
href="https://github.com/babel/babel/tree/HEAD/packages/babel-traverse/issues/15797">#15797</a>)</li>
<li><a
href="9f3dfd9021"><code>9f3dfd9</code></a>
v7.22.20</li>
<li><a
href="3ed28b29c1"><code>3ed28b2</code></a>
Fully support <code>||</code> and <code>&amp;&amp;</code> in
<code>pluginToggleBooleanFlag</code> (<a
href="https://github.com/babel/babel/tree/HEAD/packages/babel-traverse/issues/15961">#15961</a>)</li>
<li><a
href="77b0d73599"><code>77b0d73</code></a>
v7.22.19</li>
<li>Additional commits viewable in <a
href="https://github.com/babel/babel/commits/v7.23.2/packages/babel-traverse">compare
view</a></li>
</ul>
</details>
<br />


[![Dependabot compatibility
score](https://dependabot-badges.githubapp.com/badges/compatibility_score?dependency-name=@babel/traverse&package-manager=npm_and_yarn&previous-version=7.22.8&new-version=7.23.2)](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)

Dependabot will resolve any conflicts with this PR as long as you don't
alter it yourself. You can also trigger a rebase manually by commenting
`@dependabot rebase`.

[//]: # (dependabot-automerge-start)
[//]: # (dependabot-automerge-end)

---

<details>
<summary>Dependabot commands and options</summary>
<br />

You can trigger Dependabot actions by commenting on this PR:
- `@dependabot rebase` will rebase this PR
- `@dependabot recreate` will recreate this PR, overwriting any edits
that have been made to it
- `@dependabot merge` will merge this PR after your CI passes on it
- `@dependabot squash and merge` will squash and merge this PR after
your CI passes on it
- `@dependabot cancel merge` will cancel a previously requested merge
and block automerging
- `@dependabot reopen` will reopen this PR if it is closed
- `@dependabot close` will close this PR and stop Dependabot recreating
it. You can achieve the same result by closing it manually
- `@dependabot show <dependency name> ignore conditions` will show all
of the ignore conditions of the specified dependency
- `@dependabot ignore this major version` will close this PR and stop
Dependabot creating any more for this major version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this minor version` will close this PR and stop
Dependabot creating any more for this minor version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this dependency` will close this PR and stop
Dependabot creating any more for this dependency (unless you reopen the
PR or upgrade to it yourself)
You can disable automated security fix PRs for this repo from the
[Security Alerts
page](https://github.com/langchain-ai/langchain/network/alerts).

</details>

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2023-10-27 14:13:58 -07:00
Eugene Yurtsev
60d009f75a Add security note to API chain (#12452)
Add security note
2023-10-27 17:09:42 -04:00
Matvey Arye
11505f95d3 Improve handling of empty queries for timescale vector (#12393)
**Description:** Improve handling of empty queries in timescale-vector.
For timescale-vector it is more efficient to get a None embedding when
the embedding has no semantic meaning. It allows timescale-vector to
perform more optimizations. Thus, when the query is empty, use a None
embedding.

 Also pass down constructor arguments to the timescale vector client.

---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
2023-10-27 13:55:16 -07:00
Erick Friis
38cee5fae0 cli updates 2 (#12447)
- extras group
- readme
- another readme

---------

Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>
2023-10-27 13:37:03 -07:00
Lance Martin
3afa68e30e Update AWS Bedrock README.md (#12451) 2023-10-27 13:21:54 -07:00
Lance Martin
5c564e62e1 AWS Bedrock RAG template (#12450) 2023-10-27 13:15:54 -07:00
William FH
5d40e36c75 Trace if run tree set (#12444)
This code path is hit in the following case:
- Start in langchain code and manually provide a tracer
- Handoff to the traceable
- Hand back to langchain code.

Which happens for evaluating `@traceable` functions unfortunately
2023-10-27 12:29:18 -07:00
Bagatur
c2a0a6b6df make doc utils public (#12394) 2023-10-27 12:08:08 -07:00
Henter
d6888a90d0 Fix the missing temperature parameter for Baichuan-AI chat_model (#12420)
**Description:** the missing `temperature` parameter for Baichuan-AI
chat_model

Baichuan-AI api doc: https://platform.baichuan-ai.com/docs/api
2023-10-27 12:07:21 -07:00
Erick Friis
6908634428 cli updates oct27 (#12436) 2023-10-27 12:06:46 -07:00
Uxywannasleep
3fd9f2752f Fix Typo in clickhouse.ipynb file (#12429) 2023-10-27 11:55:15 -07:00
HwangJohn
d38c8369b3 added rrf argument in ApproxRetrievalStrategy class __init__() (#11987)
- **Description: To handle the hybrid search with RRF(Reciprocal Rank
Fusion) in the Elasticsearch, rrf argument was added for adjusting
'rank_constant' and 'window_size' to combine multiple result sets with
different relevance indicators into a single result set. (ref:
https://www.elastic.co/kr/blog/whats-new-elastic-enterprise-search-8-9-0),
  - **Issue:** the issue # it fixes (if applicable),
  - **Dependencies:** No dependencies changed,
  - **Tag maintainer:** @baskaryan,

Nice to meet you,
I'm a newbie for contributions and it's my first PR.

I only changed the langchain/vectorstores/elasticsearch.py file.
I did make format&lint 
I got this message,
```shell
make lint_diff  
./scripts/check_pydantic.sh .
./scripts/check_imports.sh
poetry run ruff .
[ "langchain/vectorstores/elasticsearch.py" = "" ] || poetry run black langchain/vectorstores/elasticsearch.py --check
All done!  🍰 
1 file would be left unchanged.
[ "langchain/vectorstores/elasticsearch.py" = "" ] || poetry run mypy langchain/vectorstores/elasticsearch.py
langchain/__init__.py: error: Source file found twice under different module names: "mvp.nlp.langchain.libs.langchain.langchain" and "langchain"
Found 1 error in 1 file (errors prevented further checking)
make: *** [lint_diff] Error 2
```

Thank you

---------

Co-authored-by: 황중원 <jwhwang@amorepacific.com>
2023-10-27 11:53:19 -07:00
Roman Vasilyev
2c58dca5f0 optional reusable connection (#12051)
My postgres out of connections after continuous PGVector usage, and the
reason because it constantly creates new connections, so adding a
reusable pre established connection seems like solves an issue

---------

Co-authored-by: Roman Vasilyev <rvasilyev@mozilla.com>
Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>
2023-10-27 11:52:42 -07:00
Ennio Pastore
48fde2004f Update long_context_reorder.py (#12422)
The function comment was confusing and inaccurate
2023-10-27 11:52:28 -07:00
Bagatur
a8c68d4ffa Type LLMChain.llm as runnable (#12385) 2023-10-27 11:52:01 -07:00
Prakul
224ec0cfd3 Mongo db $vector search doc update (#12404)
**Description:** 
Updates the documentation for MongoDB Atlas Vector Search
2023-10-27 11:50:29 -07:00
Bagatur
d12b88557a Bagatur/bump 325 (#12440) 2023-10-27 11:49:09 -07:00
Eugene Yurtsev
cadfce295f Deprecate PythonRepl tools and Pandas/Xorbits/Spark DataFrame/Python/CSV agents (#12427)
See discussion here:
https://github.com/langchain-ai/langchain/discussions/11680

The code is available for usage from langchain_experimental. The reason
for the deprecation is that the agents are relying on a Python REPL. The
code can only be run safely with appropriate sandboxing.

---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
2023-10-27 14:16:42 -04:00
Lance Martin
68e12d34a9 Add invoke example to LLaMA2 function template notebook (#12437) 2023-10-27 10:58:24 -07:00
Harrison Chase
0ca539eb85 Clean up deprecated agents and update __init__ in experimental (#12231)
Update init paths in experimental
2023-10-27 13:52:50 -04:00
Lance Martin
05bbf943f2 LLaMA2 with JSON schema support template (#12435) 2023-10-27 10:34:00 -07:00
Holt Skinner
134f085824 feat: Add Google Speech to Text API Document Loader (#12298)
- Add Document Loader for Google Speech to Text
  - Similar Structure to [Assembly AI Document Loader][1]

[1]:
https://python.langchain.com/docs/integrations/document_loaders/assemblyai
2023-10-27 09:34:26 -07:00
David Duong
52c194ec3a Fix templates typos (#12428) 2023-10-27 09:32:57 -07:00
Massimiliano Pronesti
c8195769f2 fix(openai-callback): completion count logic (#12383)
The changes introduced in #12267 and #12190 broke the cost computation
of the `completion` tokens for fine-tuned models because of the early
return. This PR aims at fixing this.
@baskaryan.
2023-10-27 09:08:54 -07:00
Stefan Langenbach
b22da81af8 Mask API key for Aleph Alpha LLM (#12377)
- **Description:** Add masking of API Key for Aleph Alpha LLM when
printed.
- **Issue**: #12165
- **Dependencies:** None
- **Tag maintainer:** @eyurtsev

---------

Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>
2023-10-27 11:32:43 -04:00
Lance Martin
d6acb3ed7e Clean-up template READMEs (#12403)
Normalize, and update notebooks.
2023-10-26 22:23:03 -07:00
William FH
4254028c52 Str Evaluator Mapper (#12401) 2023-10-26 21:38:47 -07:00
William FH
fcad1d2965 Add space (#12395) 2023-10-26 20:32:23 -07:00
William FH
922d7910ef Wfh/json schema evaluation (#12389)
Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
2023-10-26 20:32:05 -07:00
Erick Friis
afcc12d99e Templates CI (#12313)
Adds a `langchain-location` param to lint, so we can properly locate it.

Regular langchain and experimental lint steps are passing, so default
value seems to be working.
2023-10-26 20:29:36 -07:00
Christian Kasim Loan
a35445c65f johnsnowlabs embeddings support (#11271)
- **Description:** Introducing the
[JohnSnowLabsEmbeddings](https://www.johnsnowlabs.com/)
  - **Dependencies:** johnsnowlabs
  - **Tag maintainer:** @C-K-Loan
- **Twitter handle:** https://twitter.com/JohnSnowLabs
https://twitter.com/ChristianKasimL

---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
2023-10-26 20:22:50 -07:00
SteveLiao
c08b622b2d Add HTML Title and Page Language into metadata for AsyncHtmlLoader (#11326)
**Description:** 
Revise `libs/langchain/langchain/document_loaders/async_html.py` to
store the HTML Title and Page Language in the `metadata` of
`AsyncHtmlLoader`.
2023-10-26 20:22:31 -07:00
Erick Friis
4b16601d33 Format Templates (#12396) 2023-10-26 19:44:30 -07:00
Shorthills AI
25c98dbba9 Fixed some grammatical and Exception types issues (#12015)
Fixed some grammatical issues and Exception types.

@baskaryan , @eyurtsev

---------

Co-authored-by: Sanskar Tanwar <142409040+SanskarTanwarShorthillsAI@users.noreply.github.com>
Co-authored-by: UpneetShorthillsAI <144228282+UpneetShorthillsAI@users.noreply.github.com>
Co-authored-by: HarshGuptaShorthillsAI <144897987+HarshGuptaShorthillsAI@users.noreply.github.com>
Co-authored-by: AdityaKalraShorthillsAI <143726711+AdityaKalraShorthillsAI@users.noreply.github.com>
Co-authored-by: SakshiShorthillsAI <144228183+SakshiShorthillsAI@users.noreply.github.com>
2023-10-26 21:12:38 -04:00
William FH
923696b664 Wfh/json edit dist (#12361)
Compare predicted json to reference. First canonicalize (sort keys, rm
whitespace separators), then return normalized string edit distance.

Not a silver bullet but maybe an easy way to capture structure
differences in a less flakey way

---------

Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
2023-10-26 18:10:28 -07:00
Harrison Chase
56ee56736b add template for hyde (#12390) 2023-10-26 17:38:35 -07:00
Erick Friis
4db8d82c55 CLI CI 2 (#12387)
Will run all CI because of _test change, but future PRs against CLI will
only trigger the new CLI one

Has a bunch of file changes related to formatting/linting.

No mypy yet - coming soon
2023-10-26 17:01:31 -07:00
Tyler Hutcherson
231d553824 Update broken redis tests (#12371)
Update broken redis tests -- tiny PR :) 
- **Description:** Fixes Redis tests on master (look like it was broken
by https://github.com/langchain-ai/langchain/pull/11257)
  - **Issue:** None,
  - **Dependencies:** No
  - **Tag maintainer:** @baskaryan @Spartee 
  - **Twitter handle:** N/A

Co-authored-by: Sam Partee <sam.partee@redis.com>
2023-10-26 16:13:14 -07:00
Lance Martin
b8af5b0a8e Minor updates to ReRank template (#12388) 2023-10-26 16:05:17 -07:00
Bagatur
7cadf00570 better lint triggering (#12376) 2023-10-26 15:31:20 -07:00
Erick Friis
03e79e62c2 cli fix (#12380) 2023-10-26 15:29:49 -07:00
Lance Martin
237026c060 Cohere re-rank template (#12378) 2023-10-26 15:29:10 -07:00
Bagatur
76230d2c08 fireworks scheduled integration tests (#12373) 2023-10-26 14:24:42 -07:00
Josh Phillips
01c5cd365b Fix SupbaseVectoreStore write operation timeout (#12318)
**Description**
This small change will make chunk_size a configurable parameter for
loading documents into a Supabase database.

**Issue**
https://github.com/langchain-ai/langchain/issues/11422

**Dependencies**
No chanages

**Twitter**
@ j1philli

**Reminder**
If no one reviews your PR within a few days, please @-mention one of
@baskaryan, @eyurtsev, @hwchase17.

---------

Co-authored-by: Greg Richardson <greg.nmr@gmail.com>
2023-10-26 14:19:17 -07:00
Bagatur
b10cefb160 lint fix: rm init (#12374) 2023-10-26 14:16:25 -07:00
William FH
f65067b1da Mention other function calling/grammar support (#12369)
In our extraction doc
2023-10-26 13:59:28 -07:00
Chris Lucas
e88fdbba29 Fix langsmith walkthrough doc dataset (#12027) 2023-10-26 13:57:15 -07:00
Jacob Lee
7e5e5e87d8 Adds linter in templates (#12321)
Did not actually run/fix errors yet @efriis
2023-10-26 13:55:07 -07:00
Harrison Chase
b43996e553 Harrison/improve cli (#12368) 2023-10-26 13:53:59 -07:00
Harrison Chase
9ce38726a2 fix some stuff (#12292)
Co-authored-by: Erick Friis <erick@langchain.dev>
2023-10-26 13:30:36 -07:00
Cynthia Yang
6ce276e099 Support Fireworks batching (#8) (#12052)
Description

* Add _generate and _agenerate to support Fireworks batching.
* Add stop words test cases
* Opt out retry mechanism

Issue - Not applicable
Dependencies - None
Tag maintainer - @baskaryan
2023-10-26 16:01:08 -04:00
Bagatur
3fbb2f3e52 update chains how to (#12362) 2023-10-26 12:21:03 -07:00
Tyler Hutcherson
2f0c9d8269 Fix redis vectorfield schema defaults (#12223)
- **Description:** refactors the redis vector field schema to properly
handle default values, includes a new unit test suite.
  - **Issue:** N/A
  - **Dependencies:** nothing new.
  - **Tag maintainer:** @baskaryan @Spartee 
  - **Twitter handle:** this is a tiny fix/improvement :) 

This issue was causing some clients/cuatomers issues when building a
vector index on Redis on smaller db instances (due to fault default
values in index configuration). It would raise an error like:

```redis.exceptions.ResponseError: Vector index initial capacity 20000 exceeded server limit (852 with the given parameters)```

This PR will address this moving forward.
2023-10-26 12:17:58 -07:00
Jakub Novák
9544d64ad8 E2B tool - Improve description wuth uploaded files info (#12355) 2023-10-26 11:44:24 -07:00
Bagatur
dad16af711 langserve doc (#12357) 2023-10-26 11:40:57 -07:00
Lance Martin
0af6e64ad9 Update multi query template README, ntbk (#12356) 2023-10-26 11:24:44 -07:00
Bagatur
f3449ccd20 Docs: Add lcel to combine_docs chains (#12310) 2023-10-26 11:05:36 -07:00
Lance Martin
bc6f6e968e Add template for Pinecone + Multi-Query (#12353) 2023-10-26 10:12:23 -07:00
Bagatur
c6a733802b bump 324 and 35 (#12352) 2023-10-26 10:10:26 -07:00
Nuno Campos
683e97766d Fix json key output parser in partial (streaming) mode (#12332)
<!-- Thank you for contributing to LangChain!

Replace this entire comment with:
  - **Description:** a description of the change, 
  - **Issue:** the issue # it fixes (if applicable),
  - **Dependencies:** any dependencies required for this change,
- **Tag maintainer:** for a quicker response, tag the relevant
maintainer (see below),
- **Twitter handle:** we announce bigger features on Twitter. If your PR
gets announced, and you'd like a mention, we'll gladly shout you out!

Please make sure your PR is passing linting and testing before
submitting. Run `make format`, `make lint` and `make test` to check this
locally.

See contribution guidelines for more information on how to write/run
tests, lint, etc:

https://github.com/langchain-ai/langchain/blob/master/.github/CONTRIBUTING.md

If you're adding a new integration, please include:
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in `docs/extras`
directory.

If no one reviews your PR within a few days, please @-mention one of
@baskaryan, @eyurtsev, @hwchase17.
 -->
2023-10-26 17:45:04 +01:00
Nikhil Jha
dff24285ea Comprehend Moderation 0.2 (#11730)
This PR replaces the previous `Intent` check with the new `Prompt
Safety` check. The logic and steps to enable chain moderation via the
Amazon Comprehend service, allowing you to detect and redact PII, Toxic,
and Prompt Safety information in the LLM prompt or answer remains
unchanged.
This implementation updates the code and configuration types with
respect to `Prompt Safety`.


### Usage sample

```python
from langchain_experimental.comprehend_moderation import (BaseModerationConfig, 
                                 ModerationPromptSafetyConfig, 
                                 ModerationPiiConfig, 
                                 ModerationToxicityConfig
)

pii_config = ModerationPiiConfig(
    labels=["SSN"],
    redact=True,
    mask_character="X"
)

toxicity_config = ModerationToxicityConfig(
    threshold=0.5
)

prompt_safety_config = ModerationPromptSafetyConfig(
    threshold=0.5
)

moderation_config = BaseModerationConfig(
    filters=[pii_config, toxicity_config, prompt_safety_config]
)

comp_moderation_with_config = AmazonComprehendModerationChain(
    moderation_config=moderation_config, #specify the configuration
    client=comprehend_client,            #optionally pass the Boto3 Client
    verbose=True
)

template = """Question: {question}

Answer:"""

prompt = PromptTemplate(template=template, input_variables=["question"])

responses = [
    "Final Answer: A credit card number looks like 1289-2321-1123-2387. A fake SSN number looks like 323-22-9980. John Doe's phone number is (999)253-9876.", 
    "Final Answer: This is a really shitty way of constructing a birdhouse. This is fucking insane to think that any birds would actually create their motherfucking nests here."
]
llm = FakeListLLM(responses=responses)

llm_chain = LLMChain(prompt=prompt, llm=llm)

chain = ( 
    prompt 
    | comp_moderation_with_config 
    | {llm_chain.input_keys[0]: lambda x: x['output'] }  
    | llm_chain 
    | { "input": lambda x: x['text'] } 
    | comp_moderation_with_config 
)

try:
    response = chain.invoke({"question": "A sample SSN number looks like this 123-456-7890. Can you give me some more samples?"})
except Exception as e:
    print(str(e))
else:
    print(response['output'])

```

### Output

```python
> Entering new AmazonComprehendModerationChain chain...
Running AmazonComprehendModerationChain...
Running pii Validation...
Running toxicity Validation...
Running prompt safety Validation...

> Finished chain.


> Entering new AmazonComprehendModerationChain chain...
Running AmazonComprehendModerationChain...
Running pii Validation...
Running toxicity Validation...
Running prompt safety Validation...

> Finished chain.
Final Answer: A credit card number looks like 1289-2321-1123-2387. A fake SSN number looks like XXXXXXXXXXXX John Doe's phone number is (999)253-9876.
```

---------

Co-authored-by: Jha <nikjha@amazon.com>
Co-authored-by: Anjan Biswas <anjanavb@amazon.com>
Co-authored-by: Anjan Biswas <84933469+anjanvb@users.noreply.github.com>
2023-10-26 09:42:18 -07:00
Blake (Yung Cher Ho)
b9410f2b6f Takeoff pro support (#12070)
**Description:**
This PR adds support for the [Pro version of Titan Takeoff
Server](https://docs.titanml.co/docs/category/pro-features). Users of
the Pro version will have to import the TitanTakeoffPro model, which is
different from TitanTakeoff.

**Issue:**
Also minor fixes to docs for Titan Takeoff (Community version)

**Dependencies:**
No additional dependencies

 **Twitter handle:** @becoming_blake

@baskaryan @hwchase17
2023-10-26 09:39:32 -07:00
Leonid Kuligin
4e47fe1dce fixed error message and a check for processor name (#12200)
Replace this entire comment with:
- **Description:** a small fix on error description / a check for
processor name
  - **Issue:** the issue #11407
2023-10-26 09:38:25 -07:00
Nir Kopler
9298aff783 Finetuned openai azure models cost calculation (#12267)
**Description:**
Add cost calculation for fine tuned **Azure** with relevant unit tests.
see
https://learn.microsoft.com/en-us/azure/ai-services/openai/how-to/fine-tuning?tabs=turbo&pivots=programming-language-studio
for more information.
this PR is the result of this PR:
https://github.com/langchain-ai/langchain/pull/12190

Twitter handle: @nirkopler
2023-10-26 09:38:10 -07:00
Ken
3c168d4d2a Update code_understanding.ipynb (#12309)
- **Description:** Super simple fix for colab link on
code_understanding.ipynb,
  - **Issue:** not applicable
  - **Dependencies:** none,
  - **Tag maintainer:** ,
  - **Twitter handle:** @kengoodridge
2023-10-26 09:35:38 -07:00
Season Saw
4e4b8805d6 Fix a typo in the summarization use case. (#12316)
- **Description:** Fix a tiny typo in the summarization use case Jupyter
notebook.
  - **Issue:** N/A
  - **Dependencies:** N/A
  - **Tag maintainer:** @hwchase17
  - **Twitter handle:** @seasonsaw
2023-10-26 09:35:11 -07:00
gnakw
20fe515f20 Fix the exception from langchain.utilities import ArceeWrapper (#12342)
- **Description:** Fix the exception from langchain.utilities import
ArceeWrapper
2023-10-26 09:19:43 -07:00
ZC Wong
374f4cd2bf fix typo (#12338)
fixed a typo in docs/docs/integrations/toolkits/github.ipynb
2023-10-26 09:18:47 -07:00
Qihui Xie
6720458c7d add allowed_operators property in QdrantTranslator (#12328)
- **Description:** 
This PR adds `allowd_operators` property to `QdrantTranslator` to fix
the `TypeError: can only join an iterable` bug. This property is
required in `get_query_constructor_prompt` in
`query_constructor\base.py`:
```
allowed_operators=" | ".join(allowed_operators),
```
  - **Issue:** 
#12061

---------

Co-authored-by: XIE Qihui <qihui.xie@bopufund.com>
2023-10-26 09:18:29 -07:00
Bagatur
f5a57fc1ef fix self query constructor (#12349) 2023-10-26 09:18:15 -07:00
Laurent AJDNIK
f05c29180d Fix typos in quickstart.mdx (#12333)
- **Description:** Fixes a few typos in quickstart.mdx
2023-10-26 09:14:49 -07:00
Kishan Kumar Rai
cae6f611d3 Fix Typo in CONTRIBUTING.md (#12320)
I have corrected the typos, grammar, and formatting issues.
2023-10-26 08:56:28 -07:00
Vasek Mlejnsky
cdd75b687e e2b tool - fix initialization and improve tool description (#12345) 2023-10-26 08:47:50 -07:00
Harrison Chase
8ec7aade9f add docs for templates (#12346) 2023-10-26 08:28:01 -07:00
Jacob Lee
28c39503eb Allow index name customization via env var in rag-conversation (#12315) 2023-10-25 22:11:13 -07:00
Leonid Ganeline
869a49a0ab removed CardLists for LLMs and ChatModels (#12307)
Problem statement: 
In the `integrations/llms` and `integrations/chat` pages, we have a
sidebar with ToC, and we also have a ToC at the end of the page.
The ToC at the end of the page is not necessary, and it is confusing
when we mix the index page styles; moreover, it requires manual work.
So, I removed ToC at the end of the page (it was discussed with and
approved by @baskaryan)
2023-10-25 19:13:44 -07:00
Erick Friis
ebf998acb6 Templates (#12294)
Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
Co-authored-by: Lance Martin <lance@langchain.dev>
Co-authored-by: Jacob Lee <jacoblee93@gmail.com>
2023-10-25 18:47:42 -07:00
Erick Friis
43257a295c CLI Git Improvements (#12311)
- delete repo sources like pip
- git dep fixes
- error messaging
2023-10-25 18:30:02 -07:00
William FH
1d568e1add Better wrap traceable (#12303)
If user function is wrapped as a traceable function, this will help hand
off the trace between the two.

Also update handling fields to reflect optional values
2023-10-25 16:34:23 -07:00
Eugene Yurtsev
5a71b81609 Relax type annotation for custom input/output types (#12300)
This is needed to be able to do stuff like:

```python
runnable.with_types(input_type=List[str])
```
2023-10-25 19:00:22 -04:00
William FH
988f6d9912 Rm langchain server (#12305) 2023-10-25 15:26:46 -07:00
wemysschen
3f16acc538 Add baidu cloud vector search in vectorstore and fix some unit test in vectorstores (#11605)
**Description:** 
Add baidu cloud vector search in vectorstore

---------

Co-authored-by: root <root@icoding-cwx.bcc-szzj.baidu.com>
Co-authored-by: Bagatur <baskaryan@gmail.com>
2023-10-25 13:44:19 -07:00
mrbean
b7e559c7e1 use snippet search optionally (#12236)
Add an additional flag which allows for hitting our new endpoint.
2023-10-25 13:37:28 -07:00
felixocker
cce132d146 fix sparql queries for relations in schema description (#9136)
- **Description**: Fix for the SPARQL QA chain: fixed SPARQL queries for
retrieving information about relations in the graph to create a textual
description of the schema for the language model. This should resolve
#8907
- **Issue**: #8907
- **Dependencies**: None
- **Tag maintainer**: @baskaryan, @hwchase17
2023-10-25 13:36:57 -07:00
Donato Azevedo
d9f1bcf366 Strips leading/trailing whitespace before parsing xml (#12297)
**Description:** When llms output leading or trailing whitespace for xml
(when using XMLOutputParser) the parser would raise a `ValueError: Could
not parse output: ...`. However, leading or trailing whitespace are
"ignorable" in the sense of XML standard.

**Issue:** I did not find an issue related.

**Dependencies:** None

**Tag maintainer:**

**Twitter handle:** donatoaz

Please make sure your PR is passing linting and testing before
submitting. Run `make format`, `make lint` and `make test` to check this
locally.

Done, updated unit test and ran `make docker_test`.
2023-10-25 13:34:58 -07:00
Rohan Sharma
3da1a65fa0 Update README.md (#12286) 2023-10-25 12:59:30 -07:00
Bagatur
ab3c124ffb Add dev guide to docs(#12291)
copy CONTRIBUTING.md to docs
2023-10-25 12:28:43 -07:00
Bagatur
aa212c3d0e rm .html from local doc links (#12293) 2023-10-25 12:09:41 -07:00
Silva
04d58018e1 Update vectorstore.mdx[Make an improvement] (#12252)
correct some grammatical errors
2023-10-25 12:00:53 -07:00
Bagatur
3d74d5e24d chat loader doc titles (#12289) 2023-10-25 11:47:50 -07:00
Erick Friis
47070b8314 CLI (#12284)
Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
2023-10-25 11:06:58 -07:00
Shwu Ku
07c2649753 response parser for ArceeRetriever (#12270)
- **Description:** Response parser for arcee retriever, 
- **Issue:** follow-up pr on #11578 and
[discussion](https://github.com/arcee-ai/arcee-python/issues/15#issuecomment-1759874053),
  - **Dependencies:** NA

This pr implements a parser for the response from ArceeRetreiver to
convert to langchain `Document`. This closes the loop of generation and
retrieval for Arcee DALMs in langchain.

The reference for the response parser is
[api-docs:retrieve](https://api.arcee.ai/docs#/v2/retrieve_model)

Attaching screenshot of working implementation:
<img width="1984" alt="Screenshot 2023-10-25 at 7 42 34 PM"
src="https://github.com/langchain-ai/langchain/assets/65639964/026987b9-34b2-4e4b-b87d-69fcd0c6641a">
\*api key deleted

---
Successful tests, lints, etc.
```shell
Re-run pytest with --snapshot-update to delete unused snapshots.
==================================================================================================================== slowest 5 durations =====================================================================================================================
1.56s call     tests/unit_tests/schema/runnable/test_runnable.py::test_retrying
0.63s call     tests/unit_tests/schema/runnable/test_runnable.py::test_map_astream
0.33s call     tests/unit_tests/schema/runnable/test_runnable.py::test_map_stream_iterator_input
0.30s call     tests/unit_tests/schema/runnable/test_runnable.py::test_map_astream_iterator_input
0.20s call     tests/unit_tests/indexes/test_indexing.py::test_cleanup_with_different_batchsize
======================================================================================================= 1265 passed, 270 skipped, 32 warnings in 6.55s =======================================================================================================
[ "." = "" ] || poetry run black .
All done!  🍰 
1871 files left unchanged.
[ "." = "" ] || poetry run ruff --select I --fix .
./scripts/check_pydantic.sh .
./scripts/check_imports.sh
poetry run ruff .
[ "." = "" ] || poetry run black . --check
All done!  🍰 
1871 files would be left unchanged.
[ "." = "" ] || poetry run mypy .
Success: no issues found in 1868 source files
poetry run codespell --toml pyproject.toml
poetry run codespell --toml pyproject.toml -w
```

Co-authored-by: Shubham Kushwaha <shwu@Shubhams-MacBook-Pro.local>
2023-10-25 10:55:13 -07:00
Johanna Appel
c26ec7789f CohereEmbeddings: Add max_retries and request_timeout (#12275)
Add max_retries and request_timeout to CohereEmbeddings, akin to how it
works in OpenAIEmbeddings.

Since the Cohere client already implements these parameters, we can
simply pass them down.

Uses parameters from these two cohere client objects:

https://github.com/cohere-ai/cohere-python/blob/main/cohere/client.py

https://github.com/cohere-ai/cohere-python/blob/main/cohere/client_async.py
2023-10-25 10:37:25 -07:00
Nuno Campos
7108084947 Remove CLI (#12283)
<!-- Thank you for contributing to LangChain!

Replace this entire comment with:
  - **Description:** a description of the change, 
  - **Issue:** the issue # it fixes (if applicable),
  - **Dependencies:** any dependencies required for this change,
- **Tag maintainer:** for a quicker response, tag the relevant
maintainer (see below),
- **Twitter handle:** we announce bigger features on Twitter. If your PR
gets announced, and you'd like a mention, we'll gladly shout you out!

Please make sure your PR is passing linting and testing before
submitting. Run `make format`, `make lint` and `make test` to check this
locally.

See contribution guidelines for more information on how to write/run
tests, lint, etc:

https://github.com/langchain-ai/langchain/blob/master/.github/CONTRIBUTING.md

If you're adding a new integration, please include:
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in `docs/extras`
directory.

If no one reviews your PR within a few days, please @-mention one of
@baskaryan, @eyurtsev, @hwchase17.
 -->
2023-10-25 10:33:52 -07:00
Nuno Campos
b5b2d07681 Pop max concurrency when recursing (#12281)
<!-- Thank you for contributing to LangChain!

Replace this entire comment with:
  - **Description:** a description of the change, 
  - **Issue:** the issue # it fixes (if applicable),
  - **Dependencies:** any dependencies required for this change,
- **Tag maintainer:** for a quicker response, tag the relevant
maintainer (see below),
- **Twitter handle:** we announce bigger features on Twitter. If your PR
gets announced, and you'd like a mention, we'll gladly shout you out!

Please make sure your PR is passing linting and testing before
submitting. Run `make format`, `make lint` and `make test` to check this
locally.

See contribution guidelines for more information on how to write/run
tests, lint, etc:

https://github.com/langchain-ai/langchain/blob/master/.github/CONTRIBUTING.md

If you're adding a new integration, please include:
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in `docs/extras`
directory.

If no one reviews your PR within a few days, please @-mention one of
@baskaryan, @eyurtsev, @hwchase17.
 -->
2023-10-25 18:03:58 +01:00
Bagatur
69f4e402e4 bump 323 (#12278) 2023-10-25 09:06:12 -07:00
David Duong
c25b174db5 Add serialisation props to Fireworks and ChatFireworks (#12255) 2023-10-25 11:41:33 +01:00
Richard Adams
fd5f549a9e demonstrate use of RetrievalQAWithSourcesChain.from_chain (#12235)
**Description:** 
Documents further usage of RetrievalQAWithSourcesChain in an existing
test. I'd not found much documented usage of RetrievalQAWithSourcesChain
and how to get the sources out. This additional code will hopefully be
useful to other potential users of this retriever.

 **Issue:** No raised issue
 
**Dependencies:** No new dependencies needed to run the test (it already
needs `open-ai`, `faiss-cpu` and `unstructured`).

Note - `make lint` showed 8 linting errors  in unrelated files

---------

Co-authored-by: richarda23 <richard.c.adams@infinityworks.com>
2023-10-24 21:33:34 -07:00
James Braza
53f35c5f5c Adding STRUCTURED_FORMAT_SIMPLE_INSTRUCTIONS missing backticks (#12238)
This PR fixes the fact that `STRUCTURED_FORMAT_SIMPLE_INSTRUCTIONS` was
missing backticks at the end
2023-10-24 21:30:25 -07:00
Adam Ji
9fc28d50c3 fix: typo in pgvector.ipynb (#12243)
fix: typo in docs/docs/integrations/vectorstores/pgvector.ipynb
2023-10-24 21:26:44 -07:00
William FH
276c6ba115 Check for ls project in run tree context (#12242)
If I go traceable -> runnable when the project is manually specified,
the runnable wont be logged. This makes sure the session/project is
threaded through appropriately.
2023-10-24 17:18:59 -07:00
Vasek Mlejnsky
1f8094938f Integrate E2B's data analysis/code interpreter (#12011)
This PR adds a data [E2B's](https://e2b.dev/) analysis/code interpreter
sandbox as a tool

---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
Co-authored-by: Jakub Novak <jakub@e2b.dev>
2023-10-24 16:04:02 -07:00
Bagatur
d2cb95c39d Docs: add lcel to sequential chain (#12234) 2023-10-24 15:15:35 -07:00
Holt Skinner
e7e670805c docs: Google Cloud Documentation Cleanup (#12224)
- Move Document AI provider to the Google provider page
- Change Vertex AI Matching Engine to Vector Search
- Change references from GCP to Google Cloud
- Add Gmail chat loader to Google provider page
- Change Serper page title to "Serper - Google Search API" since it is
not a Google product.
2023-10-24 14:54:43 -07:00
Bagatur
286a29a49e bump 322 and 34 (#12228) 2023-10-24 13:52:17 -07:00
Bagatur
2008a6438c add experimental test release gha (#12229) 2023-10-24 13:49:16 -07:00
Eugene Yurtsev
583dc49477 Add type to Generation and sub-classes, handle root validator (#12220)
* Add a type literal for the generation and sub-classes for serialization purposes.
* Fix the root validator of ChatGeneration to return ValueError instead of KeyError or Attribute error if intialized improperly.
* This change is done for langserve to make sure that llm related callbacks can be serialized/deserialized properly.
2023-10-24 16:21:00 -04:00
Eugene Yurtsev
81052ee18e Fix code block in runnable doc (#12221)
Fix code block syntax in runnable doc-string
2023-10-24 16:11:58 -04:00
Mikelarg
46e28b9613 Added GigaChat chat model support (#12201)
- **Description:** Added integration with
[GigaChat](https://developers.sber.ru/portal/products/gigachat) language
model.
- **Twitter handle:** @dvoshansky
2023-10-24 12:53:51 -07:00
Dayuan Jiang
9c2c9c5274 fix typo in langchain/cookbook/stepback-qa.ipynb (#12204) 2023-10-24 12:51:51 -07:00
Bagatur
87af2360df mv old integration docs (#12217) 2023-10-24 12:38:16 -07:00
Bagatur
6e3f39963f Docs: consolidate top nav (#12219) 2023-10-24 12:28:08 -07:00
Anurag Wagh
d5c2ce7c2e [fix] create redis vector index before adding docs, add prefix to doc… (#11257)
Fix Description: 
For Redis Vector integration in add_texts method, there were two issues
that lead to this bug.
1. Vector index is not being created leading to no such_index error 
2. `doc:index` prefix was also missing for Redis Keys. 

resolves #11197 
Maintainer: @baskaryan

---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
2023-10-24 10:51:25 -07:00
Eugene Yurtsev
079d1f3b8e Expose handle_event and ahandle_events as public API (#12181)
Expose functionality to handle generic events.
2023-10-24 13:42:28 -04:00
William FH
67c4fd0ad0 Update deprecation (#12178)
in runner_utils
2023-10-24 10:37:28 -07:00
Nir Kopler
d3744175bf Finetuned OpenAI models cost calculation #11715 (#12190)
**Description:**
Add cost calculation for fine tuned models (new and legacy), this is
required after OpenAI added new models for fine tuning and separated the
costs of I/O for fine tuned models.
Also I updated the relevant unit tests
see https://platform.openai.com/docs/guides/fine-tuning for more
information.
issue: https://github.com/langchain-ai/langchain/issues/11715

  - **Issue:** 11715
  - **Twitter handle:** @nirkopler
2023-10-24 10:22:05 -07:00
Spyros
a2840a2b42 fix vertexai codey models (#12173)
**Description:**

This PR fixes issue #12156 by checking for Codey models appropriately
before result parsing.


Maintainer: @hwchase17 , @agola11
2023-10-24 10:20:05 -07:00
Leonid Ganeline
386ea48432 updated integrations/providers/microsoft (#12177)
Added several missed tools, utilities, toolkits to the `Microsoft` page.
2023-10-24 10:19:06 -07:00
Hech
d76f026d72 Fix flexible dimension and doc for DingoDB (#12187) 2023-10-24 10:16:19 -07:00
Erick Friis
95ae40ff90 Fix Anthropic Functions ainvoke (#12215)
Removes custom `NotImplementedError` in experimental anthropic
functions, allowing it to fallback on default `ainvoke` implementation.
2023-10-24 10:07:01 -07:00
Iskren Ivov Chernev
d5d7ba582a Improvements to llm/deepinfra (#10846)
- replace `requests` package with `langchain.requests`
- add `_acall` support
- add `_stream` and `_astream`
- freshen up the documentation a bit
- update vendor doc
2023-10-24 09:54:23 -07:00
sudranga
f09f82541b Expose configuration options in GraphCypherQAChain (#12159)
Allows for passing arguments into the LLM chains used by the
GraphCypherQAChain. This is to address a request by a user to include
memory in the Cypher creating chain. Will keep the prompt variables
as-is to be backward compatible. But, would be a good idea to deprecate
them and use the **kwargs variables. Added a test case.

In general, I think it would be good for any chain to automatically pass
in a readonlymemory(of its input) to its subchains whilist allowing for
an override. But, this would be a different change.
2023-10-24 09:52:55 -07:00
Leonid Ganeline
11f13aed53 docstrings update (#12093)
Added missed docstrings. Added missed Args:, Returns: Raises:
2023-10-24 09:34:10 -07:00
Johnny Oshika
ba20c14e28 Fix typo in stuff_prompt's system_template (#12063)
- **Description:** 

Add missing apostrophe in `user's` in stuff_prompt's system_template.
The first sentence in the system template went from:

> Use the following pieces of context to answer the users question.

to

> Use the following pieces of context to answer the user's question.

- **Issue:** 
- **Dependencies:** none
- **Tag maintainer:** @baskaryan
- **Twitter handle:** ojohnnyo
2023-10-24 09:21:28 -07:00
Bagatur
deb8168329 fix note callout (#12214) 2023-10-24 09:17:18 -07:00
Bagatur
8ba97cb408 separate compile integration tests (#12171)
Co-authored-by: Predrag Gruevski <2348618+obi1kenobi@users.noreply.github.com>
2023-10-24 08:55:19 -07:00
Bagatur
44dae6936b Docs: Add LCEL to chains/foundational/llm (#12213) 2023-10-24 08:53:55 -07:00
Bagatur
922193475a Docs: Add LCEL to chains/foundational/transform (#12212) 2023-10-24 08:52:47 -07:00
Bagatur
55f0f8dae8 Docs: add LCEL to chains/foundational/router (#12211) 2023-10-24 08:51:12 -07:00
Holt Skinner
69d9eae5cd feat: Add Client Info to available Google Cloud Clients (#12168)
- This is used internally to gather aggregate usage metrics for the
LangChain integrations

- Note: This cannot be added to some of the Vertex AI integrations at
this time because the SDK doesn't allow overriding the
[`ClientInfo`](https://googleapis.dev/python/google-api-core/latest/client_info.html#module-google.api_core.client_info)

- Added to:
  - BigQuery
  - Google Cloud Storage
  - Document AI
  - Vertex AI Model Garden
  - Document AI Warehouse
  - Vertex AI Search
  - Vertex AI Matching Engine (Cloud Storage Client)
 
@baskaryan, @eyurtsev, @hwchase17

---------

Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>
2023-10-24 08:49:11 -07:00
Lukas Wolf
69f5f82804 Update extraction.py (#12207)
Description: Pass tags as argument to create_extraction_chain
Issue: create_extraction_chain does not pass tags to chain yet 

@baskaryan
2023-10-24 08:25:14 -07:00
Nuno Campos
34ffb94770 Remove GetLocal, PutLocal (#12133)
Do you agree?
2023-10-24 10:16:46 +01:00
Eric Hartford
8c150ad7f6 Add COBOL parser and splitter (#11674)
- **Description:** Add COBOL parser and splitter
  - **Issue:** n/a
  - **Dependencies:** n/a
  - **Tag maintainer:** @baskaryan 
  - **Twitter handle:** erhartford

---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>
2023-10-23 15:44:31 -04:00
Ikko Eltociear Ashimine
bb137fd6e7 Fix typo in jsonformer_experimental.ipynb (#12099)
HuggingFace -> Hugging Face

\
2023-10-23 15:35:54 -04:00
Eugene Yurtsev
ace2234391 Update security.md (#11942)
Update security.md
2023-10-23 15:35:33 -04:00
John Mai
ebf749c40c Baichuan & Hunyuan set default api_base (#12059)
### Description
Baichuan & Hunyuan set default api_base env
2023-10-23 15:33:35 -04:00
Priyanshu Prajapati
283a3ecc9c Create CODE_OF_CONDUCT.md (#12105)
code of conduct.md file is missing it is generally present in good repos
which have large community

Replace this entire comment with:
- **Description:** Added a `code_of_conduct.md` file to the repository
to establish community standards and guidelines for contributors.
- **Issue:** N/A
- **Dependencies:** N/A
- **Tag maintainer:** N/A

---------

Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>
2023-10-23 15:15:24 -04:00
Shilong Dai
99afc1b4f8 Fixed hardcoded "vector" and replaced with vector_query_field variable (#12126)
- **Description:** In the max_marginal_relevance_search function of the
ElasticsearchStore vector store, the name of the field corresponding to
the vector embedding of the document is hard coded in the delete
statement that drops the field from the document metadata. This results
in an exception if the vector embedding field is customized. This PR
changes the hard-coded "vector" into the vector_query_field variable.
  - **Issue:** None
  - **Dependencies:** None
  - **Tag maintainer:** @hwchase17

Co-authored-by: Shilong Dai <sdai@viperfish.net>
2023-10-23 15:08:55 -04:00
Vikram Shitole
0d44746430 10634: Added the capability to inject boto3 client in SagemakerEndpointEmbeddings (#12146)
**Description: Allow to inject boto3 client for Cross account access
type of scenarios in using SagemakerEndpointEmbeddings and also updated
the documentation for same in the sample notebook**

**Issue:SagemakerEndpointEmbeddings cross account capability #10634
#10184**

Dependencies: None
Tag maintainer:
Twitter handle:lethargicoder

Co-authored-by: Vikram(VS) <vssht@amazon.com>
2023-10-23 15:08:26 -04:00
Deepanshu
ff79a99825 Fix Typo in CONTRIBUTING.md file (#12145)
Fix Type & add suitable pronoun in CONTRIBUTING.md file


Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>
2023-10-23 14:53:03 -04:00
aubin_mzt
66f8cb015d Add connection args for pgvector vector store (#11930)
- **Description:** sqlalchemy create_engine() does not take into account
connect_args which are mandatory for managed PGSQL instances on cloud
providers (ssl_context for example).
Also re-enabled create_vector_extension at post_init for using pgvector
class seamlessly
- **Tag maintainer:** @baskaryan, @eyurtsev, @hwchase17.

---------

Co-authored-by: Sami Bargaoui <bargaoui.sam@gmail.com>
Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>
2023-10-23 14:43:44 -04:00
NuODaniel
4d6243fa87 fix: doc string of default params in chat_models, llm qianfan (#12153)
- **Description:** a fix of the doc string in Qianfan
  - **Issue:** no
  - **Dependencies:** no
  - **Tag maintainer:** @baskaryan
  - **Twitter handle:** no
2023-10-23 14:03:18 -04:00
Predrag Gruevski
f82bdf4613 Update deprecated langchain imports with suggested new paths. (#12164)
Let's help our users find the proper import to use instead of the
deprecated top-level ones.
2023-10-23 13:52:08 -04:00
Bagatur
963ff93476 bump 321 (#12161) 2023-10-23 12:49:38 -04:00
Nuno Campos
d0505c0d47 Update default recursion_limit, update docs (#12134)
<!-- Thank you for contributing to LangChain!

Replace this entire comment with:
  - **Description:** a description of the change, 
  - **Issue:** the issue # it fixes (if applicable),
  - **Dependencies:** any dependencies required for this change,
- **Tag maintainer:** for a quicker response, tag the relevant
maintainer (see below),
- **Twitter handle:** we announce bigger features on Twitter. If your PR
gets announced, and you'd like a mention, we'll gladly shout you out!

Please make sure your PR is passing linting and testing before
submitting. Run `make format`, `make lint` and `make test` to check this
locally.

See contribution guidelines for more information on how to write/run
tests, lint, etc:

https://github.com/langchain-ai/langchain/blob/master/.github/CONTRIBUTING.md

If you're adding a new integration, please include:
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in `docs/extras`
directory.

If no one reviews your PR within a few days, please @-mention one of
@baskaryan, @eyurtsev, @hwchase17.
 -->
2023-10-23 16:29:17 +01:00
William FH
4f23aa677a Fix Pickle Error (#12141)
If non-pickleable objects (like locks) get passed to the tracing
callback, they'll fail in the deepcopy. Fallback to a shallow copy in
these instances .
2023-10-23 08:22:47 -07:00
Predrag Gruevski
95a1b598fe Update to actions/checkout@v4. (#11951)
We don't use any of the new functionality at the moment. Just making
sure we don't fall back on versions and fail to benefit from new
patches. This is an easy upgrade and it's always harder to upgrade
across multiple major versions at once.
2023-10-23 10:01:33 -04:00
William FH
7c4f340cc0 Include Parent Run ID (#12139)
If you set local callbacks
2023-10-22 17:19:11 -07:00
Sanyam Jain
3df0f03928 Improved readability of Docs (#12136)
Replace this entire comment with:
  - **Description:** a description of the change, 
 improved grammar and readability of DOCS
 
@hwchase17
2023-10-22 17:16:30 -07:00
omahs
f3cc9bba5b Fix typos (#12128)
Fix typos
2023-10-22 17:16:03 -07:00
Nuno Campos
1afdb40b48 Add optional config arg to RunnablePassthrough func arg (#12131)
<!-- Thank you for contributing to LangChain!

Replace this entire comment with:
  - **Description:** a description of the change, 
  - **Issue:** the issue # it fixes (if applicable),
  - **Dependencies:** any dependencies required for this change,
- **Tag maintainer:** for a quicker response, tag the relevant
maintainer (see below),
- **Twitter handle:** we announce bigger features on Twitter. If your PR
gets announced, and you'd like a mention, we'll gladly shout you out!

Please make sure your PR is passing linting and testing before
submitting. Run `make format`, `make lint` and `make test` to check this
locally.

See contribution guidelines for more information on how to write/run
tests, lint, etc:

https://github.com/langchain-ai/langchain/blob/master/.github/CONTRIBUTING.md

If you're adding a new integration, please include:
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in `docs/extras`
directory.

If no one reviews your PR within a few days, please @-mention one of
@baskaryan, @eyurtsev, @hwchase17.
 -->
2023-10-22 19:57:16 +01:00
Nuno Campos
325fdde8b4 Fix bug where types were lost when calling with_cconfig or bind (#12137)
<!-- Thank you for contributing to LangChain!

Replace this entire comment with:
  - **Description:** a description of the change, 
  - **Issue:** the issue # it fixes (if applicable),
  - **Dependencies:** any dependencies required for this change,
- **Tag maintainer:** for a quicker response, tag the relevant
maintainer (see below),
- **Twitter handle:** we announce bigger features on Twitter. If your PR
gets announced, and you'd like a mention, we'll gladly shout you out!

Please make sure your PR is passing linting and testing before
submitting. Run `make format`, `make lint` and `make test` to check this
locally.

See contribution guidelines for more information on how to write/run
tests, lint, etc:

https://github.com/langchain-ai/langchain/blob/master/.github/CONTRIBUTING.md

If you're adding a new integration, please include:
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in `docs/extras`
directory.

If no one reviews your PR within a few days, please @-mention one of
@baskaryan, @eyurtsev, @hwchase17.
 -->
2023-10-22 19:26:13 +01:00
Nuno Campos
2719e49718 Add how-to guide on runnable generators (#12135)
<!-- Thank you for contributing to LangChain!

Replace this entire comment with:
  - **Description:** a description of the change, 
  - **Issue:** the issue # it fixes (if applicable),
  - **Dependencies:** any dependencies required for this change,
- **Tag maintainer:** for a quicker response, tag the relevant
maintainer (see below),
- **Twitter handle:** we announce bigger features on Twitter. If your PR
gets announced, and you'd like a mention, we'll gladly shout you out!

Please make sure your PR is passing linting and testing before
submitting. Run `make format`, `make lint` and `make test` to check this
locally.

See contribution guidelines for more information on how to write/run
tests, lint, etc:

https://github.com/langchain-ai/langchain/blob/master/.github/CONTRIBUTING.md

If you're adding a new integration, please include:
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in `docs/extras`
directory.

If no one reviews your PR within a few days, please @-mention one of
@baskaryan, @eyurtsev, @hwchase17.
 -->
2023-10-22 19:02:17 +01:00
Nuno Campos
02dce74b97 Fix type hint for older py versions (#12132)
<!-- Thank you for contributing to LangChain!

Replace this entire comment with:
  - **Description:** a description of the change, 
  - **Issue:** the issue # it fixes (if applicable),
  - **Dependencies:** any dependencies required for this change,
- **Tag maintainer:** for a quicker response, tag the relevant
maintainer (see below),
- **Twitter handle:** we announce bigger features on Twitter. If your PR
gets announced, and you'd like a mention, we'll gladly shout you out!

Please make sure your PR is passing linting and testing before
submitting. Run `make format`, `make lint` and `make test` to check this
locally.

See contribution guidelines for more information on how to write/run
tests, lint, etc:

https://github.com/langchain-ai/langchain/blob/master/.github/CONTRIBUTING.md

If you're adding a new integration, please include:
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in `docs/extras`
directory.

If no one reviews your PR within a few days, please @-mention one of
@baskaryan, @eyurtsev, @hwchase17.
 -->
2023-10-22 18:01:09 +01:00
Nuno Campos
d0ce374731 Allow specifying custom input/output schemas for runnables with .with_types() (#12083)
<!-- Thank you for contributing to LangChain!

Replace this entire comment with:
  - **Description:** a description of the change, 
  - **Issue:** the issue # it fixes (if applicable),
  - **Dependencies:** any dependencies required for this change,
- **Tag maintainer:** for a quicker response, tag the relevant
maintainer (see below),
- **Twitter handle:** we announce bigger features on Twitter. If your PR
gets announced, and you'd like a mention, we'll gladly shout you out!

Please make sure your PR is passing linting and testing before
submitting. Run `make format`, `make lint` and `make test` to check this
locally.

See contribution guidelines for more information on how to write/run
tests, lint, etc:

https://github.com/langchain-ai/langchain/blob/master/.github/CONTRIBUTING.md

If you're adding a new integration, please include:
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in `docs/extras`
directory.

If no one reviews your PR within a few days, please @-mention one of
@baskaryan, @eyurtsev, @hwchase17.
 -->
2023-10-22 17:26:48 +01:00
Harrison Chase
6fcba975d0 add rag fusion notebook (#12121) 2023-10-21 15:37:11 -07:00
Harrison Chase
dd0374560a fix up notebook (#12119) 2023-10-21 14:06:16 -07:00
Harrison Chase
ee69116761 move csv agent to langchain experimental (#12113) 2023-10-21 10:26:02 -07:00
Harrison Chase
03bf6ef473 add missing init files (#12114) 2023-10-21 10:25:50 -07:00
Harrison Chase
acb82cf25e add step back notebook (#11953) 2023-10-21 10:05:52 -07:00
Harrison Chase
9d9198de0b rewrite (#12111) 2023-10-21 09:31:10 -07:00
Bagatur
ef8b180d6d bump 320 (#12108) 2023-10-21 11:52:52 -04:00
Rotem Weiss
c4f8fefe74 Update Tavily API key link (#12109)
fix broken link to generate tavily api key
2023-10-21 11:44:57 -04:00
Rotem Weiss
78d186fb44 Add Tavily Search API as a Tool (#12103)
Adding Tavily Search API as a tool. I will be the maintainer and
assaf_elovic is the twitter handler.

---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
2023-10-21 11:23:21 -04:00
Bagatur
85302a9ec1 Add CI check that integration tests compile (#12090) 2023-10-21 10:52:18 -04:00
verlocks
5dbe456aae Bug fix tongyi.py to be compatible with DashScope API (#11956)
Current ChatTongyi is not compatible with DashScope API, which will
cause error when passing api key to chat model directly.
- **Description:** Update tongyi.py to be compatible with DashScope API.
Specifically, update parameter name "dashscope_api_key" to "api_key".
  - **Issue:** None.
- **Dependencies:** Nothing new, Tongyi would require DashScope as
before.
2023-10-20 18:46:41 -04:00
Abhay Kaushik
39f65fb1c9 Fix typos in whatsapp.ipynb and telegram.ipynb (#12075)
- **Description:** 
    - Replace Telegram with Whatsapp in whatsapp.ipynb
    - Add # to mark the telegram as heading in telegram.ipynb
 
  - **Issue:** None
  - **Dependencies:** None
2023-10-20 18:45:33 -04:00
Tomaz Bratanic
82f4c0589c Add neo4j graph environment variables (#12080) 2023-10-20 14:43:01 -07:00
Mohammad Mohtashim
d5400f6502 Google Scholar Search Tool using serpapi (#11513)
- **Description:** Implementing the Google Scholar Tool as requested in
PR #11505. The tool will be using the [serpapi python
package](https://serpapi.com/integrations/python#search-google-scholar).
The main idea of the tool will be to return the results from a Google
Scholar search given a query as an input to the tool.

- **Tag maintainer:** @baskaryan, @eyurtsev, @hwchase17
2023-10-20 17:35:55 -04:00
Ofer Mendelevitch
e542bf1b6b Minor update to doc/text in IPYNB example (#12089)
- **Description:** changed sign-up link in IPYNB example
  - **Tag maintainer:** @baskaryan
  - **Twitter handle:** @ofermend
2023-10-20 17:17:36 -04:00
Shreyas S
2e8637da2f Minor typo fix (#11804)
remove redundant a
langchain > LangChain
2023-10-20 17:11:53 -04:00
Shinya Maeda
89bc73c6c3 Fix superfluous Auto-fixing parser documents (#12062)
Replace this entire comment with:
- **Description:** Fix superfluous [Auto-fixing
parser](https://python.langchain.com/docs/modules/model_io/output_parsers/output_fixing_parser)
docs. Also switching to `langchain.pydantic_v1` from the direct
reference to `pydantic`,
  - **Issue:** N/A,
  - **Dependencies:** N/A,
- **Tag maintainer:** for a quicker response, tag the relevant
maintainer (see below),
  - **Twitter handle:** @dosuken123 

Please make sure your PR is passing linting and testing before
submitting. Run `make format`, `make lint` and `make test` to check this
locally.

See contribution guidelines for more information on how to write/run
tests, lint, etc:

https://github.com/langchain-ai/langchain/blob/master/.github/CONTRIBUTING.md

If you're adding a new integration, please include:
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in `docs/extras`
directory.

If no one reviews your PR within a few days, please @-mention one of
@baskaryan, @eyurtsev, @hwchase17.
2023-10-20 16:07:03 -04:00
Holt Skinner
f5be2d525a fix: Add _serving_config property to GoogleVertexAISearchRetriever (#12084)
- Fixes error:

```
ValueError: "GoogleVertexAISearchRetriever" object has no field "_serving_config"
```

Introduced in #11736

@baskaryan, @eyurtsev, @hwchase17 if you could review and merge quickly,
that would be appreciated :)
2023-10-20 15:16:42 -04:00
Nuno Campos
5fee61a207 Support runnable factories in .configurable_alts() (#12065)
<!-- Thank you for contributing to LangChain!

Replace this entire comment with:
  - **Description:** a description of the change, 
  - **Issue:** the issue # it fixes (if applicable),
  - **Dependencies:** any dependencies required for this change,
- **Tag maintainer:** for a quicker response, tag the relevant
maintainer (see below),
- **Twitter handle:** we announce bigger features on Twitter. If your PR
gets announced, and you'd like a mention, we'll gladly shout you out!

Please make sure your PR is passing linting and testing before
submitting. Run `make format`, `make lint` and `make test` to check this
locally.

See contribution guidelines for more information on how to write/run
tests, lint, etc:

https://github.com/langchain-ai/langchain/blob/master/.github/CONTRIBUTING.md

If you're adding a new integration, please include:
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in `docs/extras`
directory.

If no one reviews your PR within a few days, please @-mention one of
@baskaryan, @eyurtsev, @hwchase17.
 -->
2023-10-20 15:22:09 +01:00
Lance Martin
b01a443ee5 Update figures in multi-modal Cookbooks (#12060) 2023-10-19 19:51:36 -07:00
Jacob Lee
34ec2da701 Fix typo in google vertex ai palm notebook documentation (#12056) 2023-10-19 21:46:35 -04:00
Bagatur
56c279015e clear nb img output (#12055) 2023-10-19 15:28:54 -07:00
Bagatur
54a8d70eb5 Bagatur/mv singlestore doc (#12053) 2023-10-19 15:06:26 -07:00
Leonid Ganeline
52b103dd13 update interface notebook (#12042)
Added a use case with parallelise on batches. Simplified text.
2023-10-19 17:06:14 -04:00
Bagatur
8cabb4ee8e add cookbook table (#12043) 2023-10-19 14:05:24 -07:00
Zhitao Xu
a4c3a44712 Fix documentation typo in Clickhouse Class (#12047)
<!-- Thank you for contributing to LangChain!

Replace this entire comment with:
  - **Description:** a description of the change, 
  - **Issue:** the issue # it fixes (if applicable),
  - **Dependencies:** any dependencies required for this change,
- **Tag maintainer:** for a quicker response, tag the relevant
maintainer (see below),
- **Twitter handle:** we announce bigger features on Twitter. If your PR
gets announced, and you'd like a mention, we'll gladly shout you out!

Please make sure your PR is passing linting and testing before
submitting. Run `make format`, `make lint` and `make test` to check this
locally.

See contribution guidelines for more information on how to write/run
tests, lint, etc:

https://github.com/langchain-ai/langchain/blob/master/.github/CONTRIBUTING.md

If you're adding a new integration, please include:
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in `docs/extras`
directory.

If no one reviews your PR within a few days, please @-mention one of
@baskaryan, @eyurtsev, @hwchase17.
 -->
- **Description:** The return info in the documentation for
similarity_search_by_vector and similarity_search_with_relevance_scores
is wrong
2023-10-19 17:00:22 -04:00
William FH
25418b9b4d Always add run ID (#12046)
in eval callback handler.

Useful if you're using a custom run evaluator and don't want to thread
things through.
2023-10-19 12:38:07 -07:00
Eugene Yurtsev
44d7763580 Add zapier deprecation warning (#12045)
Add zapier deprecation
2023-10-19 15:27:56 -04:00
John Mai
4188f046ec Add Tencent Hunyuan chat model (#12022)
### Description:
The Tencent Hunyuan model, developed by Tencent, is a large language
model by robust Chinese text generation capabilities, adeptness in
logical reasoning within complex contexts, and reliable task execution
proficiency.For more information, see
[https://cloud.tencent.com/document/product/1729](https://cloud.tencent.com/document/product/1729)
2023-10-19 15:10:12 -04:00
Eugene Yurtsev
68599d98c2 More security notes (#12040)
Add more security notes
2023-10-19 14:49:09 -04:00
Bagatur
0006075b08 bump 319 (#12041) 2023-10-19 11:45:27 -07:00
John Mai
8eb40b5fe2 baichuan_secret_key use pydantic.types.SecretStr & Add Baichuan tests (#12031)
### Description
- `baichuan_secret_key` use pydantic.types.SecretStr
- Add Baichuan tests
2023-10-19 14:37:41 -04:00
Nuno Campos
85bac75729 nc/runnable-dynamic-schemas-from-config (#12038)
<!-- Thank you for contributing to LangChain!

Replace this entire comment with:
  - **Description:** a description of the change, 
  - **Issue:** the issue # it fixes (if applicable),
  - **Dependencies:** any dependencies required for this change,
- **Tag maintainer:** for a quicker response, tag the relevant
maintainer (see below),
- **Twitter handle:** we announce bigger features on Twitter. If your PR
gets announced, and you'd like a mention, we'll gladly shout you out!

Please make sure your PR is passing linting and testing before
submitting. Run `make format`, `make lint` and `make test` to check this
locally.

See contribution guidelines for more information on how to write/run
tests, lint, etc:

https://github.com/langchain-ai/langchain/blob/master/.github/CONTRIBUTING.md

If you're adding a new integration, please include:
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in `docs/extras`
directory.

If no one reviews your PR within a few days, please @-mention one of
@baskaryan, @eyurtsev, @hwchase17.
 -->
2023-10-19 19:34:35 +01:00
Nuno Campos
85eaa4ccee Revert "nc/runnable-dynamic-schemas-from-config" (#12037)
This reverts commit a46eef64a7.

<!-- Thank you for contributing to LangChain!

Replace this entire comment with:
  - **Description:** a description of the change, 
  - **Issue:** the issue # it fixes (if applicable),
  - **Dependencies:** any dependencies required for this change,
- **Tag maintainer:** for a quicker response, tag the relevant
maintainer (see below),
- **Twitter handle:** we announce bigger features on Twitter. If your PR
gets announced, and you'd like a mention, we'll gladly shout you out!

Please make sure your PR is passing linting and testing before
submitting. Run `make format`, `make lint` and `make test` to check this
locally.

See contribution guidelines for more information on how to write/run
tests, lint, etc:

https://github.com/langchain-ai/langchain/blob/master/.github/CONTRIBUTING.md

If you're adding a new integration, please include:
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in `docs/extras`
directory.

If no one reviews your PR within a few days, please @-mention one of
@baskaryan, @eyurtsev, @hwchase17.
 -->
2023-10-19 19:27:02 +01:00
Nuno Campos
a46eef64a7 nc/runnable-dynamic-schemas-from-config 2023-10-19 19:17:48 +01:00
Nuno Campos
d392e030be Add default value (#12032)
<!-- Thank you for contributing to LangChain!

Replace this entire comment with:
  - **Description:** a description of the change, 
  - **Issue:** the issue # it fixes (if applicable),
  - **Dependencies:** any dependencies required for this change,
- **Tag maintainer:** for a quicker response, tag the relevant
maintainer (see below),
- **Twitter handle:** we announce bigger features on Twitter. If your PR
gets announced, and you'd like a mention, we'll gladly shout you out!

Please make sure your PR is passing linting and testing before
submitting. Run `make format`, `make lint` and `make test` to check this
locally.

See contribution guidelines for more information on how to write/run
tests, lint, etc:

https://github.com/langchain-ai/langchain/blob/master/.github/CONTRIBUTING.md

If you're adding a new integration, please include:
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in `docs/extras`
directory.

If no one reviews your PR within a few days, please @-mention one of
@baskaryan, @eyurtsev, @hwchase17.
 -->
2023-10-19 18:30:05 +01:00
Kenneth Choe
62efe1ffb9 support add_embeddings for elasticsearch (#11002)
- **Description:** Provide a way to use different text for embedding.
- For example, if you are ingesting stack-overflow Q&As for RAG, you
would want to embed the questions and return the answer(s) for the hits.
With this change, the consumer of langchain can implement that easily.
- I noticed the similar function is added on faiss.py with #1912 which
was for performance reason, but I see the same function can be used to
achieve what I thought. So instead of changing Document class to have
embedding_content, I mimicked the implementation of faiss.py.
- The test should provide some guidance on how to use it. It would be
more intuitive if I just pass texts and embedding_texts as separate
arguments, but I chose to use `zip`-ed object for the consistency with
faiss.py implementation.
      - I plan to make similar pull request for OpenSearch.
  - **Issue:** N/A
  - **Dependencies:** None other than the existing ones.

Co-authored-by: Bagatur <baskaryan@gmail.com>
2023-10-19 09:43:51 -07:00
Bagatur
76d3afaef0 bump 318 (#12030) 2023-10-19 09:33:39 -07:00
Dmitry Tyumentsev
5dd2161c4b add _acall method to YandexGPT (#12029)
- **Description:** Add async support for YandexGPT LLM model

Co-authored-by: Dmitry Tyumentsev <dmitry.tyumentsev@raftds.com>
2023-10-19 09:15:26 -07:00
Palau
720ecacb1c Add notebook for kay.ai press release data (#11575)
- **Description:** Adding a notebook for Press Release data from Kay.ai,
as discussed offline
  - **Tag maintainer:** @baskaryan @hwchase17 
- **Twitter handle:** https://twitter.com/kaydotai
https://twitter.com/vishalrohra_

---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
2023-10-19 08:06:56 -07:00
Peter Krenesky
8425f33363 Pydantic v2 support for OpenAPI Specs (#11936)
- **Description:** Adding Pydantic v2 support for OpenAPI Specs 

- **Issue:**
- OpenAPI spec support was disabled because `openapi-schema-pydantic`
doesn't support Pydantic v2:
     #9205
     
     - Caused errors in `get_openapi_chain`
   
    - This may be the cause of #9520.

- **Tag maintainer:** @eyurtsev
- **Twitter handle:** kreneskyp


The root cause was that `openapi-schema-pydantic` hasn't been updated in
some time but
[openapi-pydantic](https://github.com/mike-oakley/openapi-pydantic)
forked and updated the project.
2023-10-19 11:06:11 -04:00
volodymyr-memsql
4adabd33ac Add example of retriever usage with SingleStoreDB vector store (#12021)
Added a notebook with examples of the creation of a retriever from the
SingleStoreDB vector store, and further usage.

Co-authored-by: Volodymyr Tkachuk <vtkachuk-ua@singlestore.com>
2023-10-19 09:48:35 -04:00
Joe McElroy
c9f1768cb9 Elasticsearch Query Retriever: Use match + fuzziness for LIKE (#12023)
Updated the elasticsearch self query retriever to use the match clause
for LIKE operator instead of the non-analyzed fuzzy search clause.

Other small updates include:
- fixing the stack inference integration test where the index's default
pipeline didn't use the inference pipeline created
- adding a user-agent to the old implementation to track usage
- improved the documentation for ElasticsearchStore filters
2023-10-19 09:47:21 -04:00
maks-operlejn-ds
84d250f781 Docs: QA Privacy Nit (#12025)
Resize image in docs for QA Privacy
2023-10-19 09:43:47 -04:00
Nuno Campos
7db6aabf65 Update chat model output type (#11833)
---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
2023-10-19 00:55:15 -07:00
Simon Dai
ed62984cb2 update Weaviate to support multi tenancy (#11842)
- **Description:** update Weaviate to support multi tenancy
  - **Issue:** 9956
  - **Dependencies:** 
  - **Tag maintainer:** hwchase17
  - **Twitter handle:** dsx1986_
2023-10-19 00:49:30 -07:00
hiigao
f818ec49b8 Encapsulate alicloud pai-eas access method for chatmodels and llms (#11852)
### Description: 
To provide an eas llm service access methods in this pull request by
impletementing `PaiEasEndpoint` and `PaiEasChatEndpoint` classes in
`langchain.llms` and `langchain.chat_models` modules. Base on this pr,
langchain users can build up a chain to call remote eas llm service and
get the llm inference results.

### About EAS Service
EAS is a Alicloud product on Alibaba Cloud Machine Learning Platform for
AI which is short for AliCloud PAI. EAS provides model inference
deployment services for the users. We build up a llm inference services
on EAS with a general llm docker images. Therefore, end users can
quickly setup their llm remote instances to load majority of the
hugginface llm models, and serve as a backend for most of the llm apps.

### Dependencies
This pr does't involve any new dependencies.

---------

Co-authored-by: 子洪 <gaoyihong.gyh@alibaba-inc.com>
2023-10-19 00:20:18 -07:00
Shinya Maeda
1da6d92369 fix: superfluous List Parser doc (#12014) 2023-10-19 00:14:38 -07:00
John Mai
a6b483dcbc Supported RetryOutputParser & RetryWithErrorOutputParser max_retries (#11903)
Description: Supported RetryOutputParser & RetryWithErrorOutputParser
max_retries
- max_retries: Maximum number of retries to parser.

Issue: None
Dependencies: None
Tag maintainer: @baskaryan 
Twitter handle:
2023-10-18 23:57:16 -07:00
Hugues Chocart
008c7df80d [LLMonitorCallbackHandler] Refactor + add llmonitor-py dependency (#11948)
We now require uses to have the pip package `llmonitor` installed. It
allows us to have cleaner code and avoid duplicates between our library
and our code in Langchain.
2023-10-18 23:54:10 -07:00
Sian Cao
77fc2f7644 fix: impl missing embeddings method (#10823)
FAISS does not implement embeddings method and use embed_query to
embedding texts which is wrong for some embedding models.

---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
2023-10-18 23:51:28 -07:00
Holt Skinner
2661dc94f3 feat: Google Vertex AI Search Retriever - Add support for Website Data Stores (#11736)
- Only works for Data stores with Advanced Website Indexing
-
https://cloud.google.com/generative-ai-app-builder/docs/about-advanced-features
- Minor restructuring - Follow up to #10513
- Remove outdated docs (readded in
https://github.com/langchain-ai/langchain/pull/11620)
  - Move legacy class into new py file to clean up the directory
- Shouldn't cause backwards compatibility issues as the import works the
same way for users
2023-10-18 23:41:48 -07:00
Shorthills AI
4b6fdd7bf0 Update modal.py (#11588)
feat: Raise KeyError when 'prompt' key is missing in JSON response

This commit updates the error handling in the code to raise a KeyError
when the 'prompt' key is not found in the JSON response. This change
makes the code more explicit about the nature of the error, helping to
improve clarity and debugging.

@baskaryan, @eyurtsev.
2023-10-18 23:40:37 -07:00
Surav Shrestha
2038c7fd5d fix typo in multi_language.ipynb (#12009)
exprience -> experience
2023-10-18 23:33:25 -07:00
William FH
dfb4baa3f9 Fix Fireworks Callbacks (#12003)
I may be missing something but it seems like we inappropriately overrode
the 'stream()' method, losing callbacks in the process. I don't think
(?) it gave us anything in this case to customize it here?

See new trace:

https://smith.langchain.com/public/fbb82825-3a16-446b-8207-35622358db3b/r

and confirmed it streams.

Also fixes the stopwords issues from #12000
2023-10-18 23:33:09 -07:00
Lance Martin
12f8e87a0e LLaMA2 SQL cookbook clean (#12007) 2023-10-18 21:16:58 -07:00
Harrison Chase
bdecc5bade Harrison/lcel configuration (#11997) 2023-10-18 16:01:38 -07:00
Lance Martin
26d0858a60 Update LLaMA2 SQL notebook (#11995) 2023-10-18 15:01:37 -07:00
Wang Wei
e26559f512 Add ERNIE-Bot-4 model support for ErnieBotChat. (#11969)
- **Description:** According to the document
https://cloud.baidu.com/doc/WENXINWORKSHOP/s/clntwmv7t, add ERNIE-Bot-4
model support for ErnieBotChat.
- **Dependencies:** Before using the ERNIE-Bot-4, you should have the
model's access authority.
2023-10-18 14:55:29 -07:00
Alfrick Opidi
71b0f51003 Update clarifai.mdx (#11964)
Corrected broken link
2023-10-18 13:05:59 -07:00
Alfrick Opidi
5ba7a7d2bc Update clarifai.ipynb (#11963)
documents=docs not required when making a vector search on an existing
Clarifai application
2023-10-18 13:05:43 -07:00
Bagatur
642d2e4b67 caps not title for cookbooks descriptions (#11993) 2023-10-18 12:56:18 -07:00
Bagatur
fd7ab539c8 add cookbook readme (#11992) 2023-10-18 12:36:34 -07:00
Eugene Yurtsev
f4bec9686d Add more security notes (#11990)
Add more security notes
2023-10-18 15:00:56 -04:00
Eugene Yurtsev
3d81c76160 Add security notes to agent toolkits (#11989)
Add more security notes to agent toolkits.
2023-10-18 14:36:29 -04:00
Leonid Ganeline
b81a4c1d94 docstrings added (#11988)
Added docstrings. Some docsctrings formatting.
2023-10-18 13:05:49 -04:00
Bagatur
35c7c1f050 bump 317 (#11986) 2023-10-18 09:25:18 -07:00
Bagatur
122af2effe fix chroma from_texts bug (#11984) 2023-10-18 09:24:04 -07:00
Erick Friis
c149954cc5 Hub Runnable (#11946)
Adds `langchain.runnables.hub.HubRunnable` for pulling configurable
objects from the hub
2023-10-18 09:21:45 -07:00
Owen
9e24626e87 chore: remove duplicated export variables (#11962)
- **Description:** remove duplicated `__all__` variables
2023-10-18 12:08:50 -04:00
Nuno Campos
6bd9c1d2b3 Make prompt validation opt-in (#11973)
By default replace input_variables with the correct value

<!-- Thank you for contributing to LangChain!

Replace this entire comment with:
  - **Description:** a description of the change, 
  - **Issue:** the issue # it fixes (if applicable),
  - **Dependencies:** any dependencies required for this change,
- **Tag maintainer:** for a quicker response, tag the relevant
maintainer (see below),
- **Twitter handle:** we announce bigger features on Twitter. If your PR
gets announced, and you'd like a mention, we'll gladly shout you out!

Please make sure your PR is passing linting and testing before
submitting. Run `make format`, `make lint` and `make test` to check this
locally.

See contribution guidelines for more information on how to write/run
tests, lint, etc:

https://github.com/langchain-ai/langchain/blob/master/.github/CONTRIBUTING.md

If you're adding a new integration, please include:
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in `docs/extras`
directory.

If no one reviews your PR within a few days, please @-mention one of
@baskaryan, @eyurtsev, @hwchase17.
 -->
2023-10-18 16:28:47 +01:00
Nuno Campos
9bc7e1851a Ensure dict() does not raise not implemented error, which should instead be raised in our custom method save() (#11970)
.dict() is a Pydantic method that cannot raise exceptions, as it is used
eg. in `__eq__`

<!-- Thank you for contributing to LangChain!

Replace this entire comment with:
  - **Description:** a description of the change, 
  - **Issue:** the issue # it fixes (if applicable),
  - **Dependencies:** any dependencies required for this change,
- **Tag maintainer:** for a quicker response, tag the relevant
maintainer (see below),
- **Twitter handle:** we announce bigger features on Twitter. If your PR
gets announced, and you'd like a mention, we'll gladly shout you out!

Please make sure your PR is passing linting and testing before
submitting. Run `make format`, `make lint` and `make test` to check this
locally.

See contribution guidelines for more information on how to write/run
tests, lint, etc:

https://github.com/langchain-ai/langchain/blob/master/.github/CONTRIBUTING.md

If you're adding a new integration, please include:
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in `docs/extras`
directory.

If no one reviews your PR within a few days, please @-mention one of
@baskaryan, @eyurtsev, @hwchase17.
 -->
2023-10-18 16:28:33 +01:00
Nuno Campos
653cf56e0e Lint 2023-10-18 16:02:00 +01:00
Predrag Gruevski
debcf053eb Fix invalid escape sequence warnings by using raw strings for regexes. (#11943)
This code also generates warnings when our users' apps hit it, which is
annoying and doesn't look great. Let's fix it.
2023-10-18 10:55:17 -04:00
Nuno Campos
e4ae690244 Sort order 2023-10-18 15:42:13 +01:00
Bagatur
8e1b1db90d bearly api key docs (#11981) 2023-10-18 07:26:10 -07:00
Nuno Campos
b753bf3323 Make prompt validation opt-in
By default replace input_variables with the correct value
2023-10-18 10:46:22 +01:00
Nuno Campos
202acce0c9 Ensure dict() does not raise not implemented error, which should instead be raised in our custom method save() 2023-10-18 09:44:41 +01:00
Predrag Gruevski
392df7b2e3 Type hints on varargs and kwargs that take anything should be Any. (#11950)
Type hinting `*args` as `List[Any]` means that each positional argument
should be a list. Type hinting `**kwargs` as `Dict[str, Any]` means that
each keyword argument should be a dict of strings.

This is almost never what we actually wanted, and doesn't seem to be
what we want in any of the cases I'm replacing here.
2023-10-17 21:31:44 -04:00
volodymyr-memsql
7f17ce3742 SingleStoreDBChatMessageHistory: Add jupiter notebook with usage example (#11941)
The Docs folder changed its structure, and the notebook example for
SingleStoreDChatMessageHistory has not been copied to the new place due
to a merge conflict. Adding the example to the correct place.

Co-authored-by: Volodymyr Tkachuk <vtkachuk-ua@singlestore.com>
2023-10-17 21:31:19 -04:00
Eugene Yurtsev
908c7bf33e Add documentation to tools (#11938)
Add security notes to tools

---------

Co-authored-by: Predrag Gruevski <2348618+obi1kenobi@users.noreply.github.com>
2023-10-17 21:27:59 -04:00
Eugene Yurtsev
43dc669332 Update playwright documentation (#11949)
Add security note to playwright tool
2023-10-17 21:22:26 -04:00
Daniel Chalef
2beb767ae5 zep: Memory Retriever MMR Support & Docs Updates (#11954)
- Update Zep Memory and Retriever docstrings
- Zep Memory Retriever: Add support for native MMR
- Add MMR example to existing ZepRetriever Notebook

@baskaryan
2023-10-17 16:35:11 -07:00
William FH
a27fa9bf10 Use traceable context (#11896)
Example

```
from langchain.schema.runnable import RunnableLambda
from langsmith import traceable

chain = RunnableLambda(lambda x: x)

@traceable(run_type = "chain")
def my_traceable(a):
    chain.invoke(a)
my_traceable(5)
```

Would have a nested result.

This would NOT work for interleaving chains and traceables. E.g., things
like thiswould still not work well

```
from langchain.schema.runnable import RunnableLambda
from langsmith import traceable

@traceable()
def other_traceable(a):
    return a

def foo(x):
    return other_traceable(x)
    
chain = RunnableLambda(foo)

@traceable(run_type = "chain")
def my_traceable(a):
    chain.invoke(a)
my_traceable(5)
```
2023-10-17 15:10:20 -07:00
Predrag Gruevski
dcd0392423 Upgrade to newer black (23.10) and ruff (first 0.1.x!) versions. (#11944)
Minor lint dependency version upgrade to pick up latest functionality.

Ruff's new v0.1 version comes with lots of nice features, like
fix-safety guarantees and a preview mode for not-yet-stable features:
https://astral.sh/blog/ruff-v0.1.0
2023-10-17 17:24:51 -04:00
Trayan Azarov
1fd21ed21c Chroma batching (#11203)
- **Description:** Chroma >= 0.4.10 added support for batch sizes
validation of add/upsert. This batch size is dependent on the SQLite
limits of the target system and varies. In this change, for
Chroma>=0.4.10 batch splitting was added as the aforementioned
validation is starting to surface in the Chroma community (users using
LC)
 - **Issue:** N/A
 - **Dependencies:** N/A
 - **Tag maintainer:** @eyurtsev
 - **Twitter handle:** t_azarov
2023-10-17 13:59:42 -07:00
Guy Korland
9373b9c004 Add Graph interface (#11012)
Replace this entire comment with:
  - **Description:** Add a Graph interface
  - **Tag maintainer:** @baskaryan @hwchase17 
  - **Twitter handle:** @g_korland
2023-10-17 13:54:05 -07:00
DanielZzz
b647505280 feat: support ChatModels Qianfan QianfanChatEndpoint function_call (#11107)
- **Description:** 
* feature for `QianfanChatEndpoint` function_call ability, add
integration_test for it
    * add `model`, `endpoint` supported in calling params
    * add raw response in ChatModel Message
- **Issue:** 
    * #10867 
    * #11105 
    * #10215
- **Dependencies:** no
- **Tag maintainer:** @baskaryan 
- **Twitter handle:** no
2023-10-17 13:33:55 -07:00
M Bharat lal
67300567d3 GCSFileLoader retrieve blob custom metadata and append to document metadata (#11066)
- **Description:** GCSFileLoader retrieve blob's custom metadata and
append to document's metadata
- **Issue:** #9975,
- **Tag maintainer:** @baskaryan please review

Co-authored-by: b0l00ib <bharat.lal@walmart.com>
2023-10-17 12:17:59 -07:00
staoxiao
23c261ba57 Update bge_huggingface.ipynb (#8960)
- Description: Considering the similarity computation method of
[BGE](https://github.com/FlagOpen/FlagEmbedding) model is cosine
similarity, set normalize_embeddings to be True.
- Tag maintainer: @baskaryan

Co-authored-by: Erick Friis <erick@langchain.dev>
2023-10-17 11:58:29 -07:00
billytrend-cohere
f4742dce50 Add Cohere retrieval augmented generation to retrievers (#11483)
Add Cohere retrieval augmented generation to retrievers

---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
2023-10-17 11:51:04 -07:00
刘 方瑞
0a24ac7388 Revised notebook and add delete to MyScale vector store (#11848)
- **Description:** 
  - Add `.delete` to myscale vector store. 
  - Revised vector store notebooks
- **Tag maintainer:** @baskaryan 
- **Twitter handle:** @myscaledb @mpsk_liu
2023-10-17 11:42:21 -07:00
John Mai
3fb5e4d185 Add Baichuan chat model (#11923)
Description: A large language models developed by Baichuan Intelligent
Technology,https://www.baichuan-ai.com/home
Issue: None
Dependencies: None
Tag maintainer:
Twitter handle:
2023-10-17 11:30:57 -07:00
Eugene Yurtsev
9ecb7240a4 Add security note to recursive url loader (#11934)
Add security note to recursive loader
2023-10-17 13:41:43 -04:00
maks-operlejn-ds
42dcc502c7 Anonymizer small fixes (#11915) 2023-10-17 10:27:29 -07:00
Eugene Yurtsev
90e9ec6962 Sitemap specify default filter url (#11925)
Specify default filter URL in sitemap loader and add a security note

---------

Co-authored-by: Predrag Gruevski <2348618+obi1kenobi@users.noreply.github.com>
2023-10-17 13:19:27 -04:00
Bagatur
ba0d729961 bump 316 (#11928) 2023-10-17 09:47:57 -07:00
Eugene Yurtsev
83162649bb Add runnables to api reference (#11520)
Need to look at preview whether this works.
2023-10-17 11:46:08 -04:00
Eugene Yurtsev
12d7eaa0c2 Add security notices to toolkits (#11900)
This adds security notices to toolkits init, and to several toolkits.
We'll need to continue documenting the rest of the toolkits.

---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
2023-10-17 11:45:09 -04:00
Eugene Yurtsev
5f4a697ce3 Add deprecation warnings (#11899)
Add deprecation warnings

Co-authored-by: Bagatur <baskaryan@gmail.com>
2023-10-17 10:59:38 -04:00
Nuno Campos
8b79cf9566 Add lock for using global config enum weak map (#11920)
<!-- Thank you for contributing to LangChain!

Replace this entire comment with:
  - **Description:** a description of the change, 
  - **Issue:** the issue # it fixes (if applicable),
  - **Dependencies:** any dependencies required for this change,
- **Tag maintainer:** for a quicker response, tag the relevant
maintainer (see below),
- **Twitter handle:** we announce bigger features on Twitter. If your PR
gets announced, and you'd like a mention, we'll gladly shout you out!

Please make sure your PR is passing linting and testing before
submitting. Run `make format`, `make lint` and `make test` to check this
locally.

See contribution guidelines for more information on how to write/run
tests, lint, etc:

https://github.com/langchain-ai/langchain/blob/master/.github/CONTRIBUTING.md

If you're adding a new integration, please include:
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in `docs/extras`
directory.

If no one reviews your PR within a few days, please @-mention one of
@baskaryan, @eyurtsev, @hwchase17.
 -->
2023-10-17 15:50:35 +01:00
Nuno Campos
2a8ded6c8c Export merge_configs function (#11916)
<!-- Thank you for contributing to LangChain!

Replace this entire comment with:
  - **Description:** a description of the change, 
  - **Issue:** the issue # it fixes (if applicable),
  - **Dependencies:** any dependencies required for this change,
- **Tag maintainer:** for a quicker response, tag the relevant
maintainer (see below),
- **Twitter handle:** we announce bigger features on Twitter. If your PR
gets announced, and you'd like a mention, we'll gladly shout you out!

Please make sure your PR is passing linting and testing before
submitting. Run `make format`, `make lint` and `make test` to check this
locally.

See contribution guidelines for more information on how to write/run
tests, lint, etc:

https://github.com/langchain-ai/langchain/blob/master/.github/CONTRIBUTING.md

If you're adding a new integration, please include:
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in `docs/extras`
directory.

If no one reviews your PR within a few days, please @-mention one of
@baskaryan, @eyurtsev, @hwchase17.
 -->
2023-10-17 15:36:11 +01:00
Nuno Campos
57a02929d5 Add validation for configurable keys passed to .with_config() (#11910)
- Fix some typing issues found while doing that

<!-- Thank you for contributing to LangChain!

Replace this entire comment with:
  - **Description:** a description of the change, 
  - **Issue:** the issue # it fixes (if applicable),
  - **Dependencies:** any dependencies required for this change,
- **Tag maintainer:** for a quicker response, tag the relevant
maintainer (see below),
- **Twitter handle:** we announce bigger features on Twitter. If your PR
gets announced, and you'd like a mention, we'll gladly shout you out!

Please make sure your PR is passing linting and testing before
submitting. Run `make format`, `make lint` and `make test` to check this
locally.

See contribution guidelines for more information on how to write/run
tests, lint, etc:

https://github.com/langchain-ai/langchain/blob/master/.github/CONTRIBUTING.md

If you're adding a new integration, please include:
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in `docs/extras`
directory.

If no one reviews your PR within a few days, please @-mention one of
@baskaryan, @eyurtsev, @hwchase17.
 -->
2023-10-17 15:34:49 +01:00
Nuno Campos
42cd2ef329 Ensure that configurable fields with enums support deduplication (#11909)
<!-- Thank you for contributing to LangChain!

Replace this entire comment with:
  - **Description:** a description of the change, 
  - **Issue:** the issue # it fixes (if applicable),
  - **Dependencies:** any dependencies required for this change,
- **Tag maintainer:** for a quicker response, tag the relevant
maintainer (see below),
- **Twitter handle:** we announce bigger features on Twitter. If your PR
gets announced, and you'd like a mention, we'll gladly shout you out!

Please make sure your PR is passing linting and testing before
submitting. Run `make format`, `make lint` and `make test` to check this
locally.

See contribution guidelines for more information on how to write/run
tests, lint, etc:

https://github.com/langchain-ai/langchain/blob/master/.github/CONTRIBUTING.md

If you're adding a new integration, please include:
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in `docs/extras`
directory.

If no one reviews your PR within a few days, please @-mention one of
@baskaryan, @eyurtsev, @hwchase17.
 -->
2023-10-17 15:30:38 +01:00
Nuno Campos
778e7c526e Add comment 2023-10-17 15:29:39 +01:00
Nuno Campos
19319e1746 Allow configs with None values 2023-10-17 15:23:58 +01:00
Nuno Campos
b0d5882fe1 Export merge_configs function 2023-10-17 13:22:07 +01:00
Nuno Campos
12596b9a9b Add validation for configurable keys passed to .with_config()
- Fix some typing issues found while doing that
2023-10-17 08:50:31 +01:00
Nuno Campos
754aca794f remove print 2023-10-17 08:46:07 +01:00
Nuno Campos
cf448a6314 Ensure that configurable fields with enums support deduplication 2023-10-17 08:25:21 +01:00
Leonid Ganeline
31f264169d evaluation criteria (#11681)
the updated value was:
` Criteria.MISOGYNY: "Is the submission misogynistic? If so, respond Y."
`
The " If so, respond Y." should not be here. This sub-string is not
presented in any other criteria and should not be presented here.
I also added a synonym to "misogynistic" as it done in many other
criteria.
2023-10-16 21:05:08 -07:00
Lance Martin
eca8a5e5b8 Flesh out semi-structured cookbook (#11904) 2023-10-16 20:50:15 -07:00
Dmitry Tyumentsev
e8c1850369 Add YandexGPT LLM and Chat model (#11703)
**Description:** Introducing an ability to work with the
[YandexGPT](https://cloud.yandex.com/en/services/yandexgpt) language
model.
2023-10-16 20:30:07 -07:00
eryk-dsai
c4341463e8 Include information on the tools for creating gbnf grammar files in the llama-cpp notebook (#11764)
Hi,

I recently experimented with grammar-based sampling and discovered two
methods for speeding up the creation of gbnf grammar files:
1. [Online grammar generator
app](https://github.com/ggerganov/llama.cpp/discussions/2494) introduced
[here](https://github.com/ggerganov/llama.cpp/discussions/2494)
2.
[Script](https://github.com/ggerganov/llama.cpp/blob/master/examples/json-schema-to-grammar.py)
for parsing json schema to gbnf grammar

I believe it is a good idea to include the information that leads to
them in the `llama-cpp` notebook.

***

Codespell check fails but due to the unrelated script

Co-authored-by: Bagatur <baskaryan@gmail.com>
2023-10-16 20:28:32 -07:00
Bagatur
c15701eebf Revert "Add baichuan model" (#11901)
cc @cloudscool, apologies your PR wasn't actually passing CI
2023-10-16 20:01:12 -07:00
cloudscool
c1d811c4bc Add baichuan model 2023-10-16 19:27:35 -07:00
John Mai
0169d45ba8 Supported OutputFixingParser max_retries (#11754)
Description: Supported OutputFixingParser max_retries
 - max_retries: Maximum number of retries to parser.

Issue: None
Dependencies: None
Tag maintainer: @baskaryan
Twitter handle: @JohnMai95

---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
2023-10-16 19:25:47 -07:00
Leonid Ganeline
c87b5c209d docs safety update (#11789)
The current ToC on the index page and on navbar don't match. Page titles
and Titles in ToC doesn't match
Changes:
- made ToCs equal
- made titles equal
- updated some page formattings.
2023-10-16 19:14:21 -07:00
Surav Shrestha
321506fcd1 fix typos in cookbook/sales_agent_with_context.ipynb (#11790)
I have fixed some typos in file
`cookbook/sales_agent_with_context.ipynb`. I kindly request the repo
maintainers to review and merge it. Thanks!
2023-10-16 19:10:40 -07:00
Surav Shrestha
be04695554 fix typos in cookbook/Semi_structured_multi_modal_RAG_LLaMA2.ipynb (#11791)
I have fixed some typos in file
`cookbook/Semi_structured_multi_modal_RAG_LLaMA2.ipynb`. I kindly
request the repo maintainers to review and merge it. Thanks!
2023-10-16 19:09:20 -07:00
Surav Shrestha
e69218504b fix typos in cookbook/self_query_hotel_search.ipynb (#11792)
I have fixed some typos in file
`cookbook/self_query_hotel_search.ipynb`. I kindly request the repo
maintainers to review and merge it. Thanks!
2023-10-16 19:09:05 -07:00
Surav Shrestha
7f0145315a fix typos in cookbook/Semi_structured_and_multi_modal_RAG.ipynb (#11794)
I have fixed some typos in file
`cookbook/Semi_structured_and_multi_modal_RAG.ipynb`. I kindly request
the repo maintainers to review and merge it. Thanks!
2023-10-16 19:07:21 -07:00
Surav Shrestha
ab145d85ec fix typos in docs/docs/expression_language/cookbook/prompt_llm_parser.ipynb (#11796)
trasform -> transform
2023-10-16 19:07:03 -07:00
volodymyr-memsql
ff8e6981ff SingleStoreDBChatMessageHistory: Add singlestoredb support for ChatMessageHistory (#11705)
**Description**

- Added the `SingleStoreDBChatMessageHistory` class that inherits
`BaseChatMessageHistory` and allows to use of a SingleStoreDB database
as a storage for chat message history.
- Added integration test to check that everything works (requires
`singlestoredb` to be installed)
- Added notebook with usage example
- Removed custom retriever for SingleStoreDB vector store (as it is
useless)

---------

Co-authored-by: Volodymyr Tkachuk <vtkachuk-ua@singlestore.com>
2023-10-16 21:59:45 -04:00
Mohammad Mohtashim
634ccb8ccd test_stream_log_retriever Unit Test + Tool names fix (#11808)
## Description



| Tool         | Original Tool Name       |
|-----------------------------|---------------------------|
| open-meteo-api              | Open Meteo API            |
| news-api                    | News API                  |
| tmdb-api                    | TMDB API                  |
| podcast-api                 | Podcast API               |
| golden_query                | Golden Query              |
| dall-e-image-generator      | Dall-E Image Generator    |
| twilio                      | Text Message              |
| searx_search_results        | Searx Search Results      |
| dataforseo                  | DataForSeo Results JSON   |

When using these tools through `load_tools`, I encountered the following
validation error:

```console
openai.error.InvalidRequestError: 'TMDB API' does not match '^[a-zA-Z0-9_-]{1,64}$' - 'functions.0.name'
```

In order to avoid this error, I replaced spaces with hyphens in the tool
names:

| Tool           | Corrected Tool Name       |
|-----------------------------|---------------------------|
| open-meteo-api              | Open-Meteo-API            |
| news-api                    | News-API                  |
| tmdb-api                    | TMDB-API                  |
| podcast-api                 | Podcast-API               |
| golden_query                | Golden-Query              |
| dall-e-image-generator      | Dall-E-Image-Generator    |
| twilio                      | Text-Message              |
| searx_search_results        | Searx-Search-Results      |
| dataforseo                  | DataForSeo-Results-JSON   |

This correction resolved the validation error.

Additionally, a unit test,
`tests/unit_tests/schema/runnable/test_runnable.py::test_stream_log_retriever`,
was failing at random. Upon further investigation, I confirmed that the
failure was not related to the above-mentioned changes. The `stream_log`
variable was generating the order of logs in two ways at random The
reason for this behavior is unclear, but in the assertion, I included
both possible orders to account for this variability.
2023-10-16 18:46:19 -07:00
VAS
a1120e2685 Fixed a typo in bittensor.ipynb (#11821)
Fixed a typo : 

benifits -> benefits

If no one reviews your PR within a few days, please @-mention one of
@baskaryan, @eyurtsev, @hwchase17.
2023-10-16 18:43:29 -07:00
VAS
2a6d4acc9d Fixed a typo in anyscale.ipynb (#11822)
Fixed a typo : 

"asyncrhonized" > "asynchronized"

If no one reviews your PR within a few days, please @-mention one of
@baskaryan, @eyurtsev, @hwchase17.
 -->
2023-10-16 18:43:15 -07:00
Predrag Gruevski
7c0f1bf23f Upgrade experimental package dependencies and use Poetry 1.6.1. (#11339)
Part of upgrading our CI to use Poetry 1.6.1.
2023-10-16 21:13:31 -04:00
Eugene Yurtsev
c2c0814a94 Add security notice to file management tool (#11878)
Add security notice to file management tool

---------

Co-authored-by: Predrag Gruevski <2348618+obi1kenobi@users.noreply.github.com>
2023-10-16 21:12:13 -04:00
zhaoshengbo
cb7e12f6ba Adapt to the latest version of Alibaba Cloud OpenSearch vector store API (#11849)
Hello Folks,

Alibaba Cloud OpenSearch has released a new version of the vector
storage engine, which has significantly improved performance compared to
the previous version. At the same time, the sdk has also undergone
changes, requiring adjustments alibaba opensearch vector store code to
adapt.

This PR includes:

Adapt to the latest version of Alibaba Cloud OpenSearch API.
More comprehensive unit testing.
Improve documentation.

I have read your contributing guidelines. And I have passed the tests
below

- [x] make format
- [x]  make lint
- [x]  make coverage
- [x]  make test

---------

Co-authored-by: zhaoshengbo <shengbo.zsb@alibaba-inc.com>
2023-10-16 18:07:24 -07:00
Javier Aranda Santos
96e3e06d50 Fix HuggingFace notebook link (#11863)
- **Description:** While reading the docs
(https://python.langchain.com/docs/integrations/providers/huggingface),
I noticed the notebook linked in
https://python.langchain.com/docs/use_cases/evaluation/huggingface_datasets.html
was giving back 404. I made a search in the docs to see whether it was
available, so this PR updates the link in the docs.
  - **Issue:** I haven't opened an issue for this change.
  - **Dependencies:** -
  - **Tag maintainer:** -,
  - **Twitter handle:** -

---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
2023-10-16 18:03:47 -07:00
standby24x7
40d188948e Fix spelling typos in learned_prompt_optimization.ipynb (#11862)
This patch fixes some spelling typo in
learned_prompt_optimization.ipynb.
It only changed messages, no logic changed.

Signed-off-by: Masanari Iida <standby24x7@gmail.com>
2023-10-16 18:01:48 -07:00
Lee
e669f9d731 Fix: Sitemap Document Loader Tests and Documentation (#11866)
**Description:**
While working on the Docusaurus site loader #9138, I noticed some
outdated docs and tests for the Sitemap Loader.

**Issue:** 
This is tangentially related to #6691 in reference to doc links. I plan
on digging in to a few of these issue when I find time next.
2023-10-16 17:42:10 -07:00
DJZevenbergen
8bb8c56f74 Fix missing word (#11868)
- **Description:** added one missing word to a doc, 
  - **Dependencies:** N/A
2023-10-16 17:10:31 -07:00
Nuno Campos
9fdf1059a4 Fix issues in runnable docs examples (#11883) 2023-10-16 17:08:28 -07:00
Jean-Louis Queguiner
8b697ff0ee feat(llm): add together.xyz as an LLM provider (#11892)
- **Description:** added together.xyz as an LLM provider, 
  - **Issues:** fix some linting issues
  - twitter handle @jilijeanlouis 

---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
2023-10-16 17:08:04 -07:00
Leonid Kuligin
d269dd2e2f added a multiturn search based on Vertex AI Search (#11885)
Replace this entire comment with:
- **Description:** Added a retriever based on multi-turn Vertex AI
Search
  - **Twitter handle:** lkuligin
2023-10-16 17:05:12 -07:00
Leonid Kuligin
38ed55245f added Vertex examples as attributes (#11890)
- **Description:** added examples to Vertex chat models as optional
class attributes, so that a model with examples can be used inside a
chain
  - **Twitter handle:** lkuligin
2023-10-16 16:55:45 -07:00
eryk-dsai
5019f59724 fix: more robust check whether the HF model is quantized (#11891)
Removes the check of `model.is_quantized` and adds more robust way of
checking for 4bit and 8bit quantization in the `huggingface_pipeline.py`
script. I had to make the original change on the outdated version of
`transformers`, because the models had this property before. Seems
redundant now.

Fixes: https://github.com/langchain-ai/langchain/issues/11809 and
https://github.com/langchain-ai/langchain/issues/11759
2023-10-16 16:54:20 -07:00
Bagatur
efa9ef75c0 add LCEL to retriever doc (#11888) 2023-10-16 16:44:25 -07:00
Bagatur
d62369f478 Add LCEL to chain doc (#11895) 2023-10-16 16:44:12 -07:00
Harrison Chase
52bf03d786 add how to configure documentation (#11889) 2023-10-16 16:01:47 -07:00
Eugene Yurtsev
3be76ee2fa Add security.md (#11881)
Add security markdown file
2023-10-16 17:41:21 -04:00
Leonid Ganeline
ea0982eede update CONTRIBUTING.md (#11872)
Adding description of the `View deployment` button on the PR page. This
nice feature was not documented.

---------

Co-authored-by: Erick Friis <erickfriis@gmail.com>
2023-10-16 14:21:36 -07:00
Lance Martin
18a4fdded6 Add deps and minor cleaning to cookbooks (#11886) 2023-10-16 13:37:51 -07:00
Bagatur
e3664272f0 Add LCEL to output parser doc (#11880) 2023-10-16 12:35:18 -07:00
Bagatur
049a0357e7 Add LCEL to prompt doc (#11875) 2023-10-16 11:34:31 -07:00
Eugene Yurtsev
210a48cfb5 Add security considerations (#11869)
Add security considerations to existing graph tools.
2023-10-16 12:23:48 -04:00
Lance Martin
201b7ce9af Update SQL cookbook (#11870) 2023-10-16 09:12:03 -07:00
Bagatur
25b1d65305 bump 315 (#11850) 2023-10-16 00:50:54 -07:00
Bagatur
ece22b6b6a Add LCEL to LLM intro (#11835) 2023-10-15 14:59:45 -07:00
Bagatur
ffa1b3a758 Add LCEL to chat model intro (#11834) 2023-10-15 14:59:36 -07:00
Nuno Campos
4321d192ea Use a less specific return type for | on Runnables (#11762)
<!-- Thank you for contributing to LangChain!

Replace this entire comment with:
  - **Description:** a description of the change, 
  - **Issue:** the issue # it fixes (if applicable),
  - **Dependencies:** any dependencies required for this change,
- **Tag maintainer:** for a quicker response, tag the relevant
maintainer (see below),
- **Twitter handle:** we announce bigger features on Twitter. If your PR
gets announced, and you'd like a mention, we'll gladly shout you out!

Please make sure your PR is passing linting and testing before
submitting. Run `make format`, `make lint` and `make test` to check this
locally.

See contribution guidelines for more information on how to write/run
tests, lint, etc:

https://github.com/langchain-ai/langchain/blob/master/.github/CONTRIBUTING.md

If you're adding a new integration, please include:
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in `docs/extras`
directory.

If no one reviews your PR within a few days, please @-mention one of
@baskaryan, @eyurtsev, @hwchase17.
 -->

---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
2023-10-15 21:15:06 +01:00
Bagatur
6c5bb1b2e1 RM snippets (#11798) 2023-10-15 12:20:58 -07:00
Lance Martin
ccd1400423 Update multi-modal notebooks (#11827) 2023-10-15 09:00:07 -07:00
Lance Martin
8bf16d5275 LLaMA2 SQL Chat cookbook (#11685) 2023-10-15 08:54:09 -07:00
Harrison Chase
a506302772 bearly tool (#11812) 2023-10-14 16:03:58 -07:00
Harrison Chase
4a2f0c51a1 use get_llm_cache and set_llm_cache (#11741)
Co-authored-by: Bagatur <baskaryan@gmail.com>
2023-10-14 09:29:30 -07:00
Harrison Chase
f3ad22e64a pipe default key (#11788) 2023-10-14 08:39:23 +01:00
Bagatur
6e78dacd78 customize rtd build (#11797)
customize readthedocs config so that we can parallelize the api docs
build
2023-10-13 19:50:22 -07:00
Eugene Yurtsev
0d37b4c27d Add python,pandas,xorbits,spark agents to experimental (#11774)
See for contex
https://github.com/langchain-ai/langchain/discussions/11680
2023-10-13 17:36:44 -04:00
Bagatur
d6e34ca2ee fix recent docs integrations file loc (#11782) 2023-10-13 13:58:26 -07:00
Michael Feil
233a904f2e GradientLLM Docs update and model_id renaming. (#10963)
Related to #10800 

- Errors in the Docstring of GradientLLM / Gradient.ai LLM
- Renamed the `model_id` to `model` and adapting this in all tests.
Reason to so is to be in Sync with `GradientEmbeddings` and other LLM's.
- inmproving tests so they check the headers in the sent request.
- making the aiosession a private attribute in the docs, as in the
future `pip install gradientai` will be replacing aiosession.
- adding a example how to fine-tune on the Prompt Template as suggested
in #10800
2023-10-13 13:57:58 -07:00
David
6876b02c87 Move EverlyAI python notebook to the right location (#11779)
Hi,

After submitting https://github.com/langchain-ai/langchain/pull/11357,
we realized that the notebooks are moved to a new location. Sending a
new PR to update the doc.

---------

Co-authored-by: everly-studio <127131037+everly-studio@users.noreply.github.com>
2023-10-13 13:34:27 -07:00
Bagatur
1559ba4bfc fix upstash test import (#11781) 2023-10-13 13:31:36 -07:00
Leonid Kuligin
9f0a718198 added candidate_count for Vertex models (#11729)
- **Description:** added support for `candidate_count` parameter on
Vertex
2023-10-13 13:31:20 -07:00
David
9d200e6cbe Create ChatEverlyAI (#11357)
- Description: Adds the ChatEverlyAI class with llama-2 7b on [EverlyAI
Hosted
Endpoints](https://everlyai.xyz/)
- It inherits from ChatOpenAI and requires openai (probably unnecessary
but it made for a quick and easy implementation)

---------

Co-authored-by: everly-studio <127131037+everly-studio@users.noreply.github.com>
2023-10-13 12:25:11 -07:00
Hristo G
7fb25b4154 Add graceful fallback for ES vectorstore when content field is missing (#11726)
- **Description:**
- If the Elasticsearch field used for Langchain > Document.page_content
is missing because the specific document is
        somehow malformed fail gracefully.

  - **Tag maintainer:** 
    - @joemcelroy
2023-10-13 12:03:32 -07:00
Bagatur
f06fcde0d7 rm duplicate zilliz import (#11777) 2023-10-13 12:01:22 -07:00
Bagatur
a3330c4258 bump 314 (#11773) 2023-10-13 11:09:54 -07:00
Erick Friis
1861cc7100 General anthropic functions, steps towards experimental integration tests (#11727)
To match change in js here
https://github.com/langchain-ai/langchainjs/pull/2892

Some integration tests need a bit more work in experimental:
![Screenshot 2023-10-12 at 12 02 49
PM](https://github.com/langchain-ai/langchain/assets/9557659/262d7d22-c405-40e9-afef-669e8d585307)

Pretty sure the sqldatabase ones are an actual regression or change in
interface because it's returning a placeholder.

---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
2023-10-13 09:48:24 -07:00
Lance Martin
98c8516ef1 Semi-structured and Multi-modal RAG cookbooks (#11582) 2023-10-13 08:45:54 -07:00
Nuno Campos
17c69678ab Revert "New add Baichuan Model" (#11761)
Reverts langchain-ai/langchain#11714

This has linting and formatting issues, plus it's added to chat models
folder but doesn't subclass Chat Model base class
2023-10-13 08:23:15 -07:00
cloudscool
56653c53aa New add Baichuan Model (#11714)
Motivation and Context
At present, the Baichuan Large Language Model is relatively popular and
efficient in performance. Due to widespread market recognition, this
model has been added to enhance the scalability of Langchain's ability
to access the big language model, so as to facilitate application access
and usage for interested users.

System Info
langchain: 0.0.295
python:3.8.3
IDE:vs code

Description
Add the following files:

1. Add baichuan_baichuaninc_endpoint.py in the
libs/langchain/langchain/chat_models
2. Modify the __init__.py file,which is located in the
libs/langchain/langchain/chat_models/__init__.py:
a. Add "from langchain.chat_models.baichuan_baichuaninc_endpoint import
BaichuanChatEndpoint"
    b. Add "BaichuanChatEndpoint" In the file's __ All__  method

Your contribution
I am willing to help implement this feature and submit a PR, but I would
appreciate guidance from the maintainers or community to ensure the
changes are made correctly and in line with the project's standards and
practices.
2023-10-12 23:04:28 -07:00
Shreyas S
694d768174 Minor fix (#11748)
changed > to over
2023-10-12 22:36:31 -07:00
Bagatur
8e6fa5f1d7 mv self-query docs to integrations (#11744) 2023-10-12 22:36:07 -07:00
Yang, Bo
9e1e0f54d2 Add TrainableLLM (#11721)
- **Description:** Add `TrainableLLM` for those LLM support fine-tuning
  - **Tag maintainer:** @hwchase17

This PR add training methods to `GradientLLM`

---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
2023-10-12 17:38:33 -07:00
Burak Yılmaz
63e516c2b0 Upstash redis integration (#10871)
- **Description:** Introduced Upstash provider with following wrappers:
UpstashRedisCache, UpstashRedisEntityStore,
UpstashRedisChatMessageHistory, UpstashRedisStore
  - **Issue:** -,
  - **Dependencies:** upstash-redis python package is needed,
  - **Tag maintainer:** @baskaryan 
  - **Twitter handle:** @BurakY744

---------

Co-authored-by: Burak Yılmaz <burakyilmaz@Buraks-MacBook-Pro.local>
Co-authored-by: Bagatur <baskaryan@gmail.com>
2023-10-12 17:36:51 -07:00
Bagatur
a9db2b0b92 fix tongyi import (#11745) 2023-10-12 17:24:06 -07:00
Aaron Pham
6c61315067 fix(openllm): update with newer remote client implementation (#11740)
cc @baskaryan

---------

Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>
2023-10-12 17:01:18 -07:00
Richy Wang
11cdfe44af Implement Alibaba Tongyi chat model apis. (#10922)
Hi there
This PR is aim to implement chat model for Alibaba Tongyi LLM model. It
contains work below:
1.Implement ChatTongyi chat model in langchain.chat_models.tongyi. Note
this is different with tongyi llm model to another PR
https://github.com/langchain-ai/langchain/pull/10878.
For detail it implements _generate() and _stream() function in
ChatTongyi.
2. Add some examples in chat/tongyi.ipynb. 
3. Add integration test in chat_models/test_tongyi.py 

Note async completion for the Text API is not yet supported.
Dependencies: dashscope. It will be installed manually cause it is not
need by everyone.
2023-10-12 16:59:37 -07:00
Adam Demjen
008348ce71 Add ElasticsearchChatMessageHistory (#10932)
**Description**

This PR adds the `ElasticsearchChatMessageHistory` implementation that
stores chat message history in the configured
[Elasticsearch](https://www.elastic.co/elasticsearch/) deployment.

```python
from langchain.memory.chat_message_histories import ElasticsearchChatMessageHistory

history = ElasticsearchChatMessageHistory(
    es_url="https://my-elasticsearch-deployment-url:9200", index="chat-history-index", session_id="123"
)

history.add_ai_message("This is me, the AI")
history.add_user_message("This is me, the human")
```

**Dependencies**
- [elasticsearch client](https://elasticsearch-py.readthedocs.io/)
required

Co-authored-by: Bagatur <baskaryan@gmail.com>
2023-10-12 16:51:38 -07:00
Bagatur
d3a5090e12 mv semadb docs (#11743) 2023-10-12 16:31:09 -07:00
Bagatur
acdbdbddb1 clean up doc (#11742)
committed old doc in wrong place
2023-10-12 16:26:55 -07:00
Jonathan Soma
48cf978391 Allow placeholders in OpenAPI endpoints #2938 (#2940)
Use regex matches when checking endpoints instead of exact matches.
`{varname}` becomes `.*`

Fixes #2938

---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
2023-10-12 16:20:32 -07:00
Mateusz Kozak
e42a576cb2 update Qdrant documentation (#3105)
fix `from_documents` method usage for Qdrant in documentation as
previous example doesn't work

---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
2023-10-12 16:20:18 -07:00
Predrag Gruevski
9e32120cbb Deprecate direct access to globals like debug and verbose. (#11311)
Instead of accessing `langchain.debug`, `langchain.verbose`, or
`langchain.llm_cache`, please use the new getter/setter functions in
`langchain.globals`:
- `langchain.globals.set_debug()` and `langchain.globals.get_debug()`
- `langchain.globals.set_verbose()` and
`langchain.globals.get_verbose()`
- `langchain.globals.set_llm_cache()` and
`langchain.globals.get_llm_cache()`

Using the old globals directly will now raise a warning.

---------

Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
2023-10-12 15:48:04 -07:00
Bagatur
01b7b46908 reorder eval docs (#11738)
cc @leo-gan
2023-10-12 15:46:55 -07:00
Richard Adams
35965df20d Rspace doc loader (#11511)
**Description:**

Add a document loader for the RSpace Electronic Lab Notebook
(www.researchspace.com), so that scientific documents and research notes
can be easily pulled into Langchain pipelines.

**Issue**

This is an new contribution, rather than an issue fix.

 **Dependencies:** 
  
There are no new required dependencies.
In order to use the loader, clients will need to install rspace_client
SDK using `pip install rspace_client`

---------

Co-authored-by: richarda23 <richard.c.adams@infinityworks.com>
Co-authored-by: Bagatur <baskaryan@gmail.com>
2023-10-12 15:05:38 -07:00
Ryan Zotti
9d1867c77f Update docs to specify Indexing-API-compatible vectorstores (#11581)
**Description:** Update Indexing API docs to specify vectorstores that
are compatible with the Indexing API. I add a unit test to remind
developers to update the documentation whenever they add or change a
vectorstore in a way that affects compatibility. For the unit test I
repurposed existing code from
[here](https://github.com/langchain-ai/langchain/blob/v0.0.311/libs/langchain/langchain/indexes/_api.py#L245-L257).

This is my first PR to an open source project. This is a trivially
simple PR whose main purpose is to make me more comfortable submitting
Langchain PRs. If this PR goes through I plan to submit PRs with more
substantive changes in the near future.

**Issue:** Resolves
[10482](https://github.com/langchain-ai/langchain/discussions/10482).

**Dependencies:** No new dependencies.

**Twitter handle:** None.
2023-10-12 15:17:44 -04:00
Richard Wang
6402c33299 Let Notion document loader support utf-8 and make it default. (#10613)
Use utf-8 encoding by default
2023-10-12 15:13:41 -04:00
Tomaz Bratanic
3759a34229 Add graph construction to neo4j docs (#11716)
Add graph construction section to Neo4j provider docs
2023-10-12 11:37:42 -07:00
Bagatur
bd74eba152 add azure openai sched tests (#11723) 2023-10-12 10:48:45 -07:00
Nuno Campos
b54727fbad Nc/why lcel (#11717)
<!-- Thank you for contributing to LangChain!

Replace this entire comment with:
  - **Description:** a description of the change, 
  - **Issue:** the issue # it fixes (if applicable),
  - **Dependencies:** any dependencies required for this change,
- **Tag maintainer:** for a quicker response, tag the relevant
maintainer (see below),
- **Twitter handle:** we announce bigger features on Twitter. If your PR
gets announced, and you'd like a mention, we'll gladly shout you out!

Please make sure your PR is passing linting and testing before
submitting. Run `make format`, `make lint` and `make test` to check this
locally.

See contribution guidelines for more information on how to write/run
tests, lint, etc:

https://github.com/langchain-ai/langchain/blob/master/.github/CONTRIBUTING.md

If you're adding a new integration, please include:
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in `docs/extras`
directory.

If no one reviews your PR within a few days, please @-mention one of
@baskaryan, @eyurtsev, @hwchase17.
 -->
2023-10-12 17:52:20 +01:00
Bagatur
9c0584be74 bump 313 (#11718) 2023-10-12 09:48:54 -07:00
Johnny Deuss
bb2ed4615c Fix typos (#11663) 2023-10-12 11:44:03 -04:00
sudranga
361f8e1bc6 Add MMR functionality to elasticsearch retriever (#11633)
Allows MMR functionality only for the case where we have access to the
embedding function. Also allows for users to request for fields from
elasticsearch store. These are added to the document metadata.

---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
2023-10-12 08:42:32 -07:00
Dmitry Tyumentsev
ead9d5b55c Add yandex stt parser (#11435)
Description: Introducing an ability to load a transcription document of
audio file using [Yandex
SpeechKit](https://cloud.yandex.com/en-ru/services/speechkit)
Issue: None
Dependencies: yandex-speechkit
Tag maintainer: @rlancemartin, @eyurtsev
2023-10-12 08:42:03 -07:00
Janos Tolgyesi
15687a28d5 Use correct tokenizer for Bedrock/Anthropic LLMs (#11561)
**Description**

This PR implements the usage of the correct tokenizer in Bedrock LLMs,
if using anthropic models.

**Issue:** #11560

**Dependencies:** optional dependency on `anthropic` python library.

**Twitter handle:** jtolgyesi


---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
2023-10-12 08:41:52 -07:00
kYLe
467b082c34 Modify Anyscale integration to work with Anyscale Endpoint (#11569)
**Description:** Modify Anyscale integration to work with [Anyscale
Endpoint](https://docs.endpoints.anyscale.com/)
and it supports invoke, async invoke, stream and async invoke features

---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
2023-10-12 08:41:25 -07:00
plpycoin
51193309ea Update readthedocs.py (#11110)
Only parse .html files
.svg .png favicon.ico will crash processing phase

---------

Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>
2023-10-12 11:32:06 -04:00
Shreyas S
70a793ca9d Update zep_memory.ipynb (#11713)
fixed minor typos;
the your > your
on > upon
2023-10-12 10:41:19 -04:00
Surav Shrestha
e61b528c0e Fix typos in docs/docs/use_cases/question_answering/code_understandin… (#11710)
herarchy -> hierarchy
2023-10-12 10:17:23 -04:00
Surav Shrestha
f386ac3bef Fix typos in docs/docs/use_cases/tagging.ipynb (#11712)
funtion -> function
2023-10-12 10:17:10 -04:00
Surav Shrestha
ac73154005 Fix typos in docs/docs/use_cases/question_answering/conversational_re… (#11709)
neccessary -> necessary
2023-10-12 10:16:52 -04:00
Surav Shrestha
af9ce3c224 Fix typos in docs/docs/use_cases/chatbots.ipynb (#11707)
implemet -> implement
2023-10-12 10:16:34 -04:00
Surav Shrestha
77fcaa410a Fix typos in docs/docs/use_cases/extraction.ipynb (#11708)
This PR has a number of typos correction. I kindly request the repo
maintainers to review this PR and merge it.
2023-10-12 10:16:17 -04:00
Nuno Campos
ca9de26f2b Add callback function to RunnablePassthrough (#11564)
<!-- Thank you for contributing to LangChain!

Replace this entire comment with:
  - **Description:** a description of the change, 
  - **Issue:** the issue # it fixes (if applicable),
  - **Dependencies:** any dependencies required for this change,
- **Tag maintainer:** for a quicker response, tag the relevant
maintainer (see below),
- **Twitter handle:** we announce bigger features on Twitter. If your PR
gets announced, and you'd like a mention, we'll gladly shout you out!

Please make sure your PR is passing linting and testing before
submitting. Run `make format`, `make lint` and `make test` to check this
locally.

See contribution guidelines for more information on how to write/run
tests, lint, etc:

https://github.com/langchain-ai/langchain/blob/master/.github/CONTRIBUTING.md

If you're adding a new integration, please include:
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in `docs/extras`
directory.

If no one reviews your PR within a few days, please @-mention one of
@baskaryan, @eyurtsev, @hwchase17.
 -->
2023-10-12 15:10:16 +01:00
Nuno Campos
7f4734c0dd Add deploy command to repos generated by cli template (#11711)
<!-- Thank you for contributing to LangChain!

Replace this entire comment with:
  - **Description:** a description of the change, 
  - **Issue:** the issue # it fixes (if applicable),
  - **Dependencies:** any dependencies required for this change,
- **Tag maintainer:** for a quicker response, tag the relevant
maintainer (see below),
- **Twitter handle:** we announce bigger features on Twitter. If your PR
gets announced, and you'd like a mention, we'll gladly shout you out!

Please make sure your PR is passing linting and testing before
submitting. Run `make format`, `make lint` and `make test` to check this
locally.

See contribution guidelines for more information on how to write/run
tests, lint, etc:

https://github.com/langchain-ai/langchain/blob/master/.github/CONTRIBUTING.md

If you're adding a new integration, please include:
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in `docs/extras`
directory.

If no one reviews your PR within a few days, please @-mention one of
@baskaryan, @eyurtsev, @hwchase17.
 -->
2023-10-12 15:09:21 +01:00
Nuno Campos
1c0857b53e Fix default impl of aparse_result (#11702)
Should delegate to parse_result, not to aparse, as parse_result is a
method that some output parsers override

<!-- Thank you for contributing to LangChain!

Replace this entire comment with:
  - **Description:** a description of the change, 
  - **Issue:** the issue # it fixes (if applicable),
  - **Dependencies:** any dependencies required for this change,
- **Tag maintainer:** for a quicker response, tag the relevant
maintainer (see below),
- **Twitter handle:** we announce bigger features on Twitter. If your PR
gets announced, and you'd like a mention, we'll gladly shout you out!

Please make sure your PR is passing linting and testing before
submitting. Run `make format`, `make lint` and `make test` to check this
locally.

See contribution guidelines for more information on how to write/run
tests, lint, etc:

https://github.com/langchain-ai/langchain/blob/master/.github/CONTRIBUTING.md

If you're adding a new integration, please include:
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in `docs/extras`
directory.

If no one reviews your PR within a few days, please @-mention one of
@baskaryan, @eyurtsev, @hwchase17.
 -->
2023-10-12 14:13:59 +01:00
nuric
44da27c07b Add SemaDB VST wrapper (#11484)
- **Description**: Adding vectorstore wrapper for
[SemaDB](https://rapidapi.com/semafind-semadb/api/semadb).
- **Issue**: None
- **Dependencies**: None
- **Twitter handle**: semafind

Checks performed:
- [x] `make format`
- [x] `make lint`
- [x] `make test`
- [x] `make spell_check`
- [x] `make docs_build`

Documentation added:

- SemaDB vectorstore wrapper tutorial
2023-10-11 19:09:38 -07:00
hsuyuming
0b743f005b Feature/enhance huggingfacepipeline to handle different return type (#11394)
**Description:** Avoid huggingfacepipeline to truncate the response if
user setup return_full_text as False within huggingface pipeline.

**Dependencies:** : None
**Tag maintainer:**   Maybe @sam-h-bean ?

---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
2023-10-11 19:09:03 -07:00
Leonid Kuligin
2aba9ab47e Retriever based on GCP DocAI Warehouse (#11400)
- **Description:** implements a retriever on top of DocAI Warehouse (to
interact with existing enterprise documents)
  https://cloud.google.com/document-ai-warehouse?hl=en
  - **Issue:** new functionality
 
@baskaryan

---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
2023-10-11 19:08:53 -07:00
mvhensbergen
629d9b78fa Make example work during pydantic transition (#11498)
**Description:**

Make the example extraction code on
https://python.langchain.com/docs/use_cases/extraction work again by
importing the langchain.pydantic_v1 lib instead of the v2.

**Issue:**

Solves issue https://github.com/langchain-ai/langchain/issues/11468

Co-authored-by: Martin van Hensbergen <martin@mvhensbergen.com>
Co-authored-by: Bagatur <baskaryan@gmail.com>
2023-10-11 18:44:47 -07:00
Erick Friis
a477ddda45 Langsmith in readme update (#11497) 2023-10-11 18:43:52 -07:00
Leonid Kuligin
9e81ab47be Added a better error description if processor name is wrong. (#11488)
Replace this entire comment with:
  - **Description:** added a better error description for this error
  - **Issue:** #11407 
  
  @baskaryan
2023-10-11 18:43:40 -07:00
Robert Yi
e75766b759 fix: incorrect arguments in clickhouse docstring (#11693)
fix docstring for clickhouse
2023-10-11 21:41:21 -04:00
Eugene Yurtsev
17b5090c18 Add type to Agent actions (#11682)
Add `type` to agent actions.
2023-10-11 21:33:24 -04:00
April
c14a8df2ee wrap confluence attachment processing with a try-except block (#11503)
Prevents document loading from erroring out when an attachment is not
found at the url.

---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
2023-10-11 18:13:42 -07:00
Bagatur
17439daa6a add plan execute cookbook (#11690) 2023-10-11 18:03:13 -07:00
eajechiloae
4ba2c8ba75 Fix ClearML callback (#11472)
Handle different field names in dicts/dataframes, fixing the ClearML
callback.

---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
2023-10-11 17:09:02 -07:00
ElliotKetchup
7ae8b7f065 Llama doc: add 'language' to the response message (#11543)
- **Description:** add 'language' to the reponse message in the Llama
doc,
  - **Issue:** None,
  - **Dependencies:** None,
  - **Tag maintainer:** None,
  - **Twitter handle:** None

Co-authored-by: Bagatur <baskaryan@gmail.com>
2023-10-11 17:06:04 -07:00
Lawrence Wu
93bb19f69a Fix chains/loading.py error messages (#11688)
- **Description:** make the error messages consistent in
chains/loading.py
  - **Dependencies:** None
2023-10-11 17:05:42 -07:00
Harrison Chase
18ebce2032 fix tool async (#11689) 2023-10-11 16:40:23 -07:00
sudranga
9beb03e771 11474 (#11519)
No relevant documents may be found for a given question. In some use
cases, we could directly respond with a fixed message instead of doing
an LLM call with an empty context. This PR exposes this as an option:
response_if_no_docs_found.

---------

Co-authored-by: Sudharsan Rangarajan <sudranga@nile-global.com>
2023-10-11 16:30:15 -07:00
Shinya Maeda
1f7edcd08b doc: Fix documentation about n-gram overlap (#11549)
Fix the documentation in
https://python.langchain.com/docs/modules/model_io/prompts/example_selectors/ngram_overlap.
It's currently declaring unrelated variables, for example, `examples`
local variable is declared twice and the first one is overwritten
immediately.
  - **Issue:** N/A
  - **Dependencies:** N/A
- **Tag maintainer:** for a quicker response, tag the relevant
maintainer (see below),
  - **Twitter handle:** @dosuken123
2023-10-11 16:26:56 -07:00
Joaquin Menendez
ef99b06362 feature: add metadata information into the embedding file before uplo… (#11553)
Replace this entire comment with:
- **Description:** In this modified version of the function, if the
metadatas parameter is not None, the function includes the corresponding
metadata in the JSON object for each text. This allows the metadata to
be stored alongside the text's embedding in the vector store.
  - 
  - **Issue:** #10924
  - **Dependencies:** None
  - **Tag maintainer:** @hwchase17
@agola11
  - **Twitter handle:** @MelliJoaco

---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
2023-10-11 16:05:13 -07:00
maks-operlejn-ds
3c83779661 Qa with anonymization (#11658)
Added demo for QA system with anonymization. It will be part of
LangChain's privacy webinar.

@hwchase17 @baskaryan @nfcampos 

Twitter handle: @MaksOpp

---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
2023-10-11 15:38:08 -07:00
Marcin Wątroba
51a3a86022 #11655 Add SQLAlchemyMd5Cache implementation (#11660)
- **Description:** Add SQLAlchemyMd5Cache implementation, 
  - **Issue:** the issue # #11655,
  - **Dependencies:** no deps,
  - **Tag maintainer:** @markowanga

---------

Co-authored-by: Marcin Wątroba <marcin.watroba@pwr.edu.pl>
Co-authored-by: Bagatur <baskaryan@gmail.com>
2023-10-11 15:28:09 -07:00
Suresh Kumar Ponnusamy
70f7558db2 langchain-experimental: Add allow_list support in experimental/data_anonymizer (#11597)
- **Description:** Add allow_list support in langchain experimental
data-anonymizer package
  - **Issue:** no
  - **Dependencies:** no
  - **Tag maintainer:** @hwchase17
  - **Twitter handle:**
2023-10-11 14:50:41 -07:00
wemysschen
2363c02cf3 Bos loader (#11525)
**Description:**
Add  BaiduCloud BOS document loader.

---------

Co-authored-by: chenweixu01 <chenweixu01@baidu.com>
Co-authored-by: root <root@icoding-cwx.bcc-szzj.baidu.com>
Co-authored-by: Bagatur <baskaryan@gmail.com>
2023-10-11 14:43:48 -07:00
Kwanghoon Choi
fbb82608cd Fixed a bug in reporting Python code validation (#11522)
- **Description:** fixed a bug in pal-chain when it reports Python
    code validation errors. When node.func does not have any ids, the
    original code tried to print node.func.id in raising ValueError.
- **Issue:** n/a,
- **Dependencies:** no dependencies,
- **Tag maintainer:** @hazzel-cn, @eyurtsev
- **Twitter handle:** @lazyswamp

---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
2023-10-11 14:34:28 -07:00
Harrison Chase
9f39c23a13 add input type for convo retrieval chain (#11679) 2023-10-11 17:13:48 -04:00
zhaozhiming
d5e762d328 fix: Change the docs of JSONAgentOutputParser (#11594)
I am merely making some minor adjustments to the function documentation.
I hope to provide a small assistance to LangChain.
- **Description:** Change the docs of JSONAgentOutputParser. It will be
`JSON` better,
  - **Issue:** no,
  - **Dependencies:** no,
  - **Tag maintainer:** @hwchase17,
  - **Twitter handle:** Not worth mentioning.
2023-10-11 14:05:53 -07:00
Shreyas S
3cd0827785 Update kay.ipynb (#11676)
Fixed title display
2023-10-11 14:02:11 -07:00
Vinay Kakade
dd0cd98861 Add support for ChatOpenAI models in Infino callback handler (#11608)
**Description:** This PR adds support for ChatOpenAI models in the
Infino callback handler. In particular, this PR implements
`on_chat_model_start` callback, so that ChatOpenAI models are supported.
With this change, Infino callback handler can be used to track latency,
errors, and prompt tokens for ChatOpenAI models too (in addition to the
support for OpenAI and other non-chat models it has today). The existing
example notebook is updated to show how to use this integration as well.
cc/ @naman-modi @savannahar68

**Issue:** https://github.com/langchain-ai/langchain/issues/11607 

**Dependencies:** None

**Tag maintainer:** @hwchase17 

**Twitter handle:** [@vkakade](https://twitter.com/vkakade)
2023-10-11 14:00:54 -07:00
Israel Ekpo
d0603c86b6 Add Support for Azure Cosmos DB MongoDB vCore Vector Store #11627 (#11632)
This PR adds support for the Azure Cosmos DB MongoDB vCore Vector Store

https://learn.microsoft.com/en-us/azure/cosmos-db/mongodb/vcore/

https://learn.microsoft.com/en-us/azure/cosmos-db/mongodb/vcore/vector-search

Summary:
- **Description:** added vector store integration for Azure Cosmos DB
MongoDB vCore Vector Store,
  - **Issue:** the issue # it fixes #11627,
  - **Dependencies:** pymongo dependency,
  - **Tag maintainer:** @hwchase17,
  - **Twitter handle:** @izzyacademy

---------

Co-authored-by: Israel Ekpo <israel.ekpo@gmail.com>
Co-authored-by: Israel Ekpo <44282278+izzyacademy@users.noreply.github.com>
Co-authored-by: Bagatur <baskaryan@gmail.com>
2023-10-11 13:56:46 -07:00
Erick Friis
28ee6a7c12 Track ChatFireworks time to first_token (#11672) 2023-10-11 13:37:03 -07:00
Erick Friis
2c1e735403 Fix runnable docs link (#11675) 2023-10-11 13:11:23 -07:00
Eugene Yurtsev
539941281d Fix output types for BaseChatModel (#11670)
* Should use non chunked messages for Invoke/Batch
* After this PR, stream output type is not represented, do we want to
use the union?

---------

Co-authored-by: Erick Friis <erick@langchain.dev>
2023-10-11 16:02:03 -04:00
Ikko Eltociear Ashimine
7d0dda7e41 Fix typo in baidu_qianfan_endpoint.ipynb (#11667)
enviroment -> environment
2023-10-11 16:01:18 -04:00
Bagatur
cf86447623 Start cookbook and move stuff from use cases (#11636) 2023-10-11 12:27:13 -07:00
Eugene Yurtsev
99adcdb1c9 Add dedicated type attribute to be used solely for serialization purposes (#11585)
Adds standard `type` field for all messages that will be
serialized/validated by pydantic.

* The presence of `type` makes it easier for developers consuming
schemas to write client code to serialize/deserialize.
* In LangServe `type` will be used for both validation and will appear
in the generated openapi specs
2023-10-11 15:06:42 -04:00
eryk-dsai
06d5971be9 Fix issue #10985 - Skip model.to(device) if it is instantiated with bitsandbytes config (#11009)
Preventing error caused by attempting to move the model that was already
loaded on the GPU using the Accelerate module to the same or another
device. It is not possible to load model with Accelerate/PEFT to CPU for
now

Addresses:
[#10985](https://github.com/langchain-ai/langchain/issues/10985)
2023-10-11 09:28:27 -07:00
Nuno Campos
64969bc8ae Add patch_config(configurable=) arg, make with_config(configurable=) merge it with existing (#11662)
<!-- Thank you for contributing to LangChain!

Replace this entire comment with:
  - **Description:** a description of the change, 
  - **Issue:** the issue # it fixes (if applicable),
  - **Dependencies:** any dependencies required for this change,
- **Tag maintainer:** for a quicker response, tag the relevant
maintainer (see below),
- **Twitter handle:** we announce bigger features on Twitter. If your PR
gets announced, and you'd like a mention, we'll gladly shout you out!

Please make sure your PR is passing linting and testing before
submitting. Run `make format`, `make lint` and `make test` to check this
locally.

See contribution guidelines for more information on how to write/run
tests, lint, etc:

https://github.com/langchain-ai/langchain/blob/master/.github/CONTRIBUTING.md

If you're adding a new integration, please include:
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in `docs/extras`
directory.

If no one reviews your PR within a few days, please @-mention one of
@baskaryan, @eyurtsev, @hwchase17.
 -->
2023-10-11 14:45:31 +01:00
Harrison Chase
ce0019b646 make utils conditional (#11646) 2023-10-11 06:11:32 +01:00
Harrison Chase
8f06085b24 make tools conditional (#11647) 2023-10-11 06:11:05 +01:00
Bassem Yacoube
5451b724fc Adds support for llama2 and fixes MPT-7b url (#11465)
- **Description:** This is an update to OctoAI LLM provider that adds
support for llama2 endpoints hosted on OctoAI and updates MPT-7b url
with the current one.
@baskaryan
Thanks!

---------

Co-authored-by: ML Wiz <bassemgeorgi@gmail.com>
2023-10-10 20:34:35 -07:00
Todd Kerpelman
0bff399af1 Make metadata from the url_selenium loader match that of the web_base loader (#11617)
**Description:** I noticed the metadata returned by the url_selenium
loader was missing several values included by the web_base loader. (The
former returned `{source: ...}`, the latter returned `{source: ...,
title: ..., description: ..., language: ...}`.) This change fixes it so
both loaders return all 4 key value pairs.

Files have been properly formatted and all tests are passing. Note,
however, that I am not much of a python expert, so that whole "Adding
the imports inside the code so that tests pass" thing seems weird to me.
Please LMK if I did anything wrong.
2023-10-10 20:32:45 -07:00
Tarun Thotakura
c9d4d53545 Fixed the assignment of custom_llm_provider argument (#11628)
- **Description:** Assigning the custom_llm_provider to the default
params function so that it will be passed to the litellm
- **Issue:** Even though the custom_llm_provider argument is being
defined it's not being assigned anywhere in the code and hence its not
being passed to litellm, therefore any litellm call which uses the
custom_llm_provider as required parameter is being failed. This
parameter is mainly used by litellm when we are doing inference via
Custom API server.
https://docs.litellm.ai/docs/providers/custom_openai_proxy
  - **Dependencies:** No dependencies are required

@krrishdholakia , @baskaryan

---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
2023-10-10 20:29:24 -07:00
Leonid Ganeline
db67ccb0bb docstrings cleanup (#11640)
Added missed docstrings. Some reformatting.
2023-10-10 19:56:47 -07:00
Bagatur
78b4c7d5a0 collapse sidebar peer items (#11639) 2023-10-10 19:56:21 -07:00
Bagatur
6dd7362a54 start cookbook (#11638) 2023-10-10 17:37:23 -07:00
Yang, Bo
3a82bd7bdb Use raise from statement so that users can find detailed error message (#11461)
- **Description:** Use `raise from` statement so that users can find
detailed error message
  - **Tag maintainer:** @baskaryan, @eyurtsev, @hwchase17
2023-10-10 17:25:23 -07:00
Nuno Campos
9a0ed75a95 Add configurable fields with options (#11601)
<!-- Thank you for contributing to LangChain!

Replace this entire comment with:
  - **Description:** a description of the change, 
  - **Issue:** the issue # it fixes (if applicable),
  - **Dependencies:** any dependencies required for this change,
- **Tag maintainer:** for a quicker response, tag the relevant
maintainer (see below),
- **Twitter handle:** we announce bigger features on Twitter. If your PR
gets announced, and you'd like a mention, we'll gladly shout you out!

Please make sure your PR is passing linting and testing before
submitting. Run `make format`, `make lint` and `make test` to check this
locally.

See contribution guidelines for more information on how to write/run
tests, lint, etc:

https://github.com/langchain-ai/langchain/blob/master/.github/CONTRIBUTING.md

If you're adding a new integration, please include:
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in `docs/extras`
directory.

If no one reviews your PR within a few days, please @-mention one of
@baskaryan, @eyurtsev, @hwchase17.
 -->
2023-10-10 22:17:22 +01:00
Bagatur
0ca8d4449c add ls guide redirect (#11623) 2023-10-10 12:58:04 -07:00
Bagatur
eedfddac2d Restructure docs (#11620) 2023-10-10 12:55:19 -07:00
Bagatur
7232e082de bump 312 (#11621) 2023-10-10 12:34:49 -07:00
Eugene Yurtsev
58220cda72 Remove LLM Bash and related bash utilities (#11619)
Deprecate LLMBash and related bash utilities
2023-10-10 14:54:09 -04:00
ElliotKetchup
683f4a93b9 Update azureml_chat_endpoint code exemple (#11602)
- **Description:** azureml_chat_endpoint code exemple now takes
endpoint_url and endpoint_api_key parameter into consideration,
  - **Issue:** None),
  - **Dependencies:** None,
  - **Tag maintainer:** None,
  - **Twitter handle:** @ElliotAlladaye
2023-10-10 10:27:28 -07:00
Yong woo Song
fca34eb122 Fix: invalid link to chat model in openai platform docs (#11609)
There is some invalid link in open ai platform
[docs](https://python.langchain.com/docs/integrations/platforms/openai).
So i fixed it to valid links.
- `/docs/integrations/chat_models/openai` ->
`/docs/integrations/chat/openai`
- `/docs/integrations/chat_models/azure_openai` ->
`/docs/integrations/chat/azure_chat_openai`

Thanks! ☺️
2023-10-10 10:22:39 -07:00
Shubham Kushwaha
49de862076 Arcee.ai LLM & Retriever integration (#11579)
- **Description:** This PR introduces a new LLM and Retriever API to
https://arcee.ai for the python client
  - **Issue:** implements the integrations as requested in #11578 ,
  - **Dependencies:** no dependencies are required,
  - **Tag maintainer:** @hwchase17
  - **Twitter handle:** shwooobham 


** `make format`, `make lint` and `make test` runs locally.**
```shell
=========== 1245 passed, 277 skipped, 20 warnings in 16.26s ===========
./scripts/check_pydantic.sh .
./scripts/check_imports.sh
poetry run ruff .
[ "." = "" ] || poetry run black . --check
All done!  🍰 
1818 files would be left unchanged.
[ "." = "" ] || poetry run mypy .
Success: no issues found in 1815 source files
[ "." = "" ] || poetry run black .
All done!  🍰 
1818 files left unchanged.
[ "." = "" ] || poetry run ruff --select I --fix .
poetry run codespell --toml pyproject.toml
poetry run codespell --toml pyproject.toml -w
```


**Contributions**
1. Arcee (langchain/llms), ArceeRetriever (langchain/retrievers),
ArceeWrapper (langchain/utilities)
2. docs for Arcee (llms/arcee.py) and
ArceeRetriever(retrievers/arcee.py)
3.

cc: @jacobsolawetz @ben-epstein

---------

Co-authored-by: Shubham <shubham@sORo.local>
2023-10-10 10:20:45 -07:00
Eugene Yurtsev
b6a2507794 Docs to use LLMSymbolicMath and LLMBash + utilities from experimental (#11614)
Update docs in lieu of:

https://github.com/langchain-ai/langchain/discussions/11352
2023-10-10 13:11:46 -04:00
Eugene Yurtsev
b56ca0c2a4 Deprecate LLMSymbolicMath from langchain core (#11615)
Deprecate LLMSymbolicMath from langchain core package.
2023-10-10 12:33:51 -04:00
Leonid Ganeline
59adeaddb3 docs: update dependents (#11502)
A regular update of dependents.
2023-10-10 09:31:23 -07:00
Eugene Yurtsev
c9bce5bbfb Add version to langchain_experimental (#11613)
Add version to langchain experimental
2023-10-10 11:17:41 -04:00
Predrag Gruevski
22abeb9f6c Disable loading jinja2 PromptTemplate from file. (#10252)
jinja2 templates are not sandboxed and are at risk for arbitrary code
execution. To mitigate this risk:
- We no longer support loading jinja2-formatted prompt template files.
- `PromptTemplate` with jinja2 may still be constructed manually, but
the class carries a security warning reminding the user to not pass
untrusted input into it.

Resolves #4394.
2023-10-10 11:15:42 -04:00
Bagatur
b642d00f9f rm slack from community.md (#11610) 2023-10-10 07:55:26 -07:00
Nuno Campos
c7c03d4709 Fix mutation bugs in callback manager configure (#11603)
<!-- Thank you for contributing to LangChain!

Replace this entire comment with:
  - **Description:** a description of the change, 
  - **Issue:** the issue # it fixes (if applicable),
  - **Dependencies:** any dependencies required for this change,
- **Tag maintainer:** for a quicker response, tag the relevant
maintainer (see below),
- **Twitter handle:** we announce bigger features on Twitter. If your PR
gets announced, and you'd like a mention, we'll gladly shout you out!

Please make sure your PR is passing linting and testing before
submitting. Run `make format`, `make lint` and `make test` to check this
locally.

See contribution guidelines for more information on how to write/run
tests, lint, etc:

https://github.com/langchain-ai/langchain/blob/master/.github/CONTRIBUTING.md

If you're adding a new integration, please include:
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in `docs/extras`
directory.

If no one reviews your PR within a few days, please @-mention one of
@baskaryan, @eyurtsev, @hwchase17.
 -->
2023-10-10 14:50:18 +01:00
cccs-eric
e2a9072b80 Fix CohereRerank configuration (#11583)
**Description:** CohereRerank is missing `cohere_api_key` as a field and
since extras are forbidden, it is not possible to pass-in the key. The
only way is to use an env variable named `COHERE_API_KEY`.

For example, if trying to create a compressor like this:
```python
cohere_api_key = "......Cohere api key......"
compressor = CohereRerank(cohere_api_key=cohere_api_key)
```
you will get the following error:
```
  File "/langchain/.venv/lib/python3.10/site-packages/pydantic/v1/main.py", line 341, in __init__
    raise validation_error
pydantic.v1.error_wrappers.ValidationError: 1 validation error for CohereRerank
cohere_api_key
  extra fields not permitted (type=value_error.extra)
```
2023-10-09 23:26:34 -07:00
Anar
55fef4b64b implemented add files method in LLMRails (#11518)
This PR provides add files method with LLMRails. Implemented here are:

docs/extras/integrations/vectorstores/llm-rails.ipynb

---------

Co-authored-by: Anar Aliyev <aaliyev@mgmt.cloudnet.services>
2023-10-09 16:29:43 -07:00
unifyh
fd7f129f10 Docs: Fix broken line breaks in snippets (#11523)
**Description:**
This PR fix some code snippets that have raw `\n`'s instead of actual
line breaks.

**Issue:**
Currently some snippets look like this:

![image](https://github.com/langchain-ai/langchain/assets/18213435/355b4911-38e9-4ba4-8570-f928557b6c13)

Affected pages:
-
https://python.langchain.com/docs/integrations/providers/predictionguard#example-usage
-
https://python.langchain.com/docs/modules/agents/how_to/custom_llm_agent#set-up-environment
-
https://python.langchain.com/docs/modules/chains/foundational/llm_chain#get-started
-
https://python.langchain.com/docs/integrations/providers/shaleprotocol#how-to

**Tag maintainer:**
@hwchase17
2023-10-09 15:40:27 -07:00
Stephen Hankinson
316dddc7cd fix wording of query_sql_database_tool_description (#11530)
- **Description:** Fixes minor typo for the
query_sql_database_tool_description in the db toolkit
  - **Issue:** N/A
  - **Dependencies:** N/A
  - **Tag maintainer:** @nfcampos 
  - **Twitter handle:** N/A
2023-10-09 15:32:45 -07:00
Ash Vardanian
1acfe86353 Accelerating Math Utils with SimSIMD (#11566)
LangChain relies on NumPy to compute cosine distances, which becomes a
bottleneck with the growing dimensionality and number of embeddings. To
avoid this bottleneck, in our libraries at
[Unum](https://github.com/unum-cloud), we have created a specialized
package - [SimSIMD](https://github.com/ashvardanian/simsimd), that knows
how to use newer hardware capabilities. Compared to SciPy and NumPy, it
reaches 3x-200x performance for various data types. Since publication,
several LangChain users have asked me if I can integrate it into
LangChain to accelerate their workflows, so here I am 🤗

## Benchmarking

To conduct benchmarks locally, run this in your Jupyter:

```py
import numpy as np
import scipy as sp
import simsimd as simd
import timeit as tt

def cosine_similarity_np(X: np.ndarray, Y: np.ndarray) -> np.ndarray:
    X_norm = np.linalg.norm(X, axis=1)
    Y_norm = np.linalg.norm(Y, axis=1)
    with np.errstate(divide="ignore", invalid="ignore"):
        similarity = np.dot(X, Y.T) / np.outer(X_norm, Y_norm)
    similarity[np.isnan(similarity) | np.isinf(similarity)] = 0.0
    return similarity

def cosine_similarity_sp(X: np.ndarray, Y: np.ndarray) -> np.ndarray:
    return 1 - sp.spatial.distance.cdist(X, Y, metric='cosine')

def cosine_similarity_simd(X: np.ndarray, Y: np.ndarray) -> np.ndarray:
    return 1 - simd.cdist(X, Y, metric='cosine')

X = np.random.randn(1, 1536).astype(np.float32)
Y = np.random.randn(1, 1536).astype(np.float32)
repeat = 1000

print("NumPy: {:,.0f} ops/s, SciPy: {:,.0f} ops/s, SimSIMD: {:,.0f} ops/s".format(
    repeat / tt.timeit(lambda: cosine_similarity_np(X, Y), number=repeat),
    repeat / tt.timeit(lambda: cosine_similarity_sp(X, Y), number=repeat),
    repeat / tt.timeit(lambda: cosine_similarity_simd(X, Y), number=repeat),
))
```

## Results

I ran this on an M2 Pro Macbook for various data types and different
number of rows in `X` and reformatted the results as a table for
readability:

| Data Type | NumPy | SciPy | SimSIMD |
| :--- | ---: | ---: | ---: |
| `f32, 1` | 59,114 ops/s | 80,330 ops/s | 475,351 ops/s |
| `f16, 1` | 32,880 ops/s | 82,420 ops/s | 650,177 ops/s |
| `i8, 1` | 47,916 ops/s | 115,084 ops/s | 866,958 ops/s |
| `f32, 10` | 40,135 ops/s | 24,305 ops/s | 185,373 ops/s |
| `f16, 10` | 7,041 ops/s | 17,596 ops/s | 192,058 ops/s |
| `f16, 10` | 21,989 ops/s | 25,064 ops/s | 619,131 ops/s |
| `f32, 100` | 3,536 ops/s | 3,094 ops/s | 24,206 ops/s |
| `f16, 100` | 900 ops/s | 2,014 ops/s | 23,364 ops/s |
| `i8, 100` | 5,510 ops/s | 3,214 ops/s | 143,922 ops/s |

It's important to note that SimSIMD will underperform if both matrices
are huge.
That, however, seems to be an uncommon usage pattern for LangChain
users.
You can find a much more detailed performance report for different
hardware models here:

- [Apple M2
Pro](https://ashvardanian.com/posts/simsimd-faster-scipy/#appendix-1-performance-on-apple-m2-pro).
- [4th Gen Intel Xeon
Platinum](https://ashvardanian.com/posts/simsimd-faster-scipy/#appendix-2-performance-on-4th-gen-intel-xeon-platinum-8480).
- [AWS Graviton
3](https://ashvardanian.com/posts/simsimd-faster-scipy/#appendix-3-performance-on-aws-graviton-3).
  
## Additional Notes

1. Previous version used `X = np.array(X)`, to repackage lists of lists.
It's an anti-pattern, as it will use double-precision floating-point
numbers, which are slow on both CPUs and GPUs. I have replaced it with
`X = np.array(X, dtype=np.float32)`, but a more selective approach
should be discussed.
2. In numerical computations, it's recommended to explicitly define
tolerance levels, which were previously avoided in
`np.allclose(expected, actual)` calls. For now, I've set absolute
tolerance to distance computation errors as 0.01: `np.allclose(expected,
actual, atol=1e-2)`.

---

  - **Dependencies:** adds `simsimd` dependency
  - **Tag maintainer:** @hwchase17
  - **Twitter handle:** @ashvardanian

---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
2023-10-09 14:56:55 -07:00
benchello
5de64e6d60 Add option to specify metadata columns in CSV loader (#11576)
#### Description
This PR adds the option to specify additional metadata columns in the
CSVLoader beyond just `Source`.

The current CSV loader includes all columns in `page_content` and if we
want to have columns specified for `page_content` and `metadata` we have
to do something like the below.:
```
csv = pd.read_csv(
        "path_to_csv"
    ).to_dict("records")

documents = [
        Document(
            page_content=doc["content"],
            metadata={
                "last_modified_by": doc["last_modified_by"],
                "point_of_contact": doc["point_of_contact"],
            }
        ) for doc in csv
    ]
```
#### Usage
Example Usage:
```
csv_test  =  CSVLoader(
      file_path="path_to_csv", 
      metadata_columns=["last_modified_by", "point_of_contact"]
 )
```
Example CSV:
```
content, last_modified_by, point_of_contact
"hello world", "Person A", "Person B"
```

Example Result:
```
Document {
 page_content: "hello world"
 metadata: {
 row: '0',
 source: 'path_to_csv',
 last_modified_by: 'Person A',
 point_of_contact: 'Person B',
 }
```

---------

Co-authored-by: Ben Chello <bchello@dropbox.com>
Co-authored-by: Bagatur <baskaryan@gmail.com>
2023-10-09 14:56:45 -07:00
Stephen Hankinson
447a523662 fix comments in output format (#11536)
- **Description:** Fixes the comments in the ConvoOutputParser. Because
the \\\\ is escaping a single \\, they render something like:
`"action_input": string \ The input to the action` in the prompt.
Changing this to \\\\\\\\ lets it escape two slashes so that it renders
a proper comment: `"action_input": string \\ The input to the action`
  - **Issue:** N/A
  - **Dependencies:** 
  - **Tag maintainer:** @hwchase17
  - **Twitter handle:**
2023-10-09 14:55:44 -07:00
Michael Landis
8e45f720a8 feat: add momento vector index as a vector store provider (#11567)
**Description**:

- Added Momento Vector Index (MVI) as a vector store provider. This
includes an implementation with docstrings, integration tests, a
notebook, and documentation on the docs pages.
- Updated the Momento dependency in pyproject.toml and the lock file to
enable access to MVI.
- Refactored the Momento cache and chat history session store to prefer
using "MOMENTO_API_KEY" over "MOMENTO_AUTH_TOKEN" for consistency with
MVI. This change is backwards compatible with the previous "auth_token"
variable usage. Updated the code and tests accordingly.

**Dependencies**:

- Updated Momento dependency in pyproject.toml.

**Testing**:

- Run the integration tests with a Momento API key. Get one at the
[Momento Console](https://console.gomomento.com) for free. MVI is
available in AWS us-west-2 with a superuser key.
- `MOMENTO_API_KEY=<your key> poetry run pytest
tests/integration_tests/vectorstores/test_momento_vector_index.py`

**Tag maintainer:**

@eyurtsev

**Twitter handle**:

Please mention @momentohq for this addition to langchain. With the
integration of Momento Vector Index, Momento caching, and session store,
Momento provides serverless support for the core langchain data needs.

Also mention @mlonml for the integration.
2023-10-09 14:02:59 -07:00
Eugene Yurtsev
ca2eed36b7 LangChain cli fix a few bugs (#11573)
Code was assuming that `git` and `poetry` exist. In addition, it was not
ignoring pycache files that get generated during run time
2023-10-09 13:30:16 -07:00
MSFTeegarden
923e9f9596 Add Azure Redis example (#11570)
**Description**
This PR adds an additional Example to the Redis integration
documentation. [The
example](https://learn.microsoft.com/azure/azure-cache-for-redis/cache-tutorial-vector-similarity)
is a step-by-step walkthrough of using Azure Cache for Redis and Azure
OpenAI for vector similarity search, using LangChain extensively
throughout.

**Issue**
Nothing specific, just adding an additional example.

**Dependencies**
None.

**Tag Maintainer**
Tagging @hwchase17 :)
2023-10-09 13:27:03 -07:00
Hugues Chocart
258ae1ba5f [LLMonitor Callback Handler]: Add error handling (#11563)
Wraps every callback handler method in error handlers to avoid breaking
users' programs when an error occurs inside the handler.

Thanks @valdo99 for the suggestion 🙂
2023-10-09 13:26:35 -07:00
Eugene Yurtsev
2aabfafe1e Module documentation for langchain runnables (#11550)
Add in code documentation for langchain runnables module.
2023-10-09 16:02:29 -04:00
Eugene Yurtsev
d8fa94e6fa RunnablePassthrough: In code documentation (#11552)
Add in code documentation for a runnable passthrough
2023-10-09 16:02:16 -04:00
Eugene Yurtsev
b42f218cfc RunnableLambda: Add in code docs (#11521)
Add in code docs for Runnable Lambda
2023-10-09 14:37:46 -04:00
maks-operlejn-ds
f64522fbaf Reset deanonymizer mapping (#11559)
@hwchase17 @baskaryan
2023-10-09 11:11:05 -07:00
maks-operlejn-ds
b14b65d62a Support all presidio entities (#11558)
https://microsoft.github.io/presidio/supported_entities/

@baskaryan @hwchase17
2023-10-09 11:10:46 -07:00
maks-operlejn-ds
4d62def9ff Better deanonymizer matching strategy (#11557)
@baskaryan, @hwchase17
2023-10-09 11:10:29 -07:00
Ash Vardanian
a992b9670d Fix: Missing DuckDuckGo package version (#11535)
[The `duckduckgo-search` v3.9.2 was removed from
PyPi](https://pypi.org/project/duckduckgo-search/#history). That breaks
the build.

  - **Description:** refreshes the Poetry dependency to v3.9.3
  - **Tag maintainer:** @baskaryan
  - **Twitter handle:** @ashvardanian
2023-10-09 10:55:46 -07:00
Bagatur
0a754fa286 redirect langsmith guides (#11562) 2023-10-09 09:58:03 -07:00
Nuno Campos
2f2a5fd582 Update Dockerfile.base (#11556) 2023-10-09 16:43:04 +01:00
5852 changed files with 552113 additions and 231322 deletions

View File

@@ -17,13 +17,16 @@ For more info, check out the [GitHub documentation](https://docs.github.com/en/f
## VS Code Dev Containers
[![Open in Dev Containers](https://img.shields.io/static/v1?label=Dev%20Containers&message=Open&color=blue&logo=visualstudiocode)](https://vscode.dev/redirect?url=vscode://ms-vscode-remote.remote-containers/cloneInVolume?url=https://github.com/langchain-ai/langchain)
Note: If you click this link you will open the main repo and not your local cloned repo, you can use this link and replace with your username and cloned repo name:
Note: If you click the link above you will open the main repo (langchain-ai/langchain) and not your local cloned repo. This is fine if you only want to run and test the library, but if you want to contribute you can use the link below and replace with your username and cloned repo name:
```
https://vscode.dev/redirect?url=vscode://ms-vscode-remote.remote-containers/cloneInVolume?url=https://github.com/<yourusername>/<yourclonedreponame>
```
Then you will have a local cloned repo where you can contribute and then create pull requests.
If you already have VS Code and Docker installed, you can use the button above to get started. This will cause VS Code to automatically install the Dev Containers extension if needed, clone the source code into a container volume, and spin up a dev container for use.
You can also follow these steps to open this repo in a container using the VS Code Dev Containers extension:
Alternatively you can also follow these steps to open this repo in a container using the VS Code Dev Containers extension:
1. If this is your first time using a development container, please ensure your system meets the pre-reqs (i.e. have Docker installed) in the [getting started steps](https://aka.ms/vscode-remote/containers/getting-started).

132
.github/CODE_OF_CONDUCT.md vendored Normal file
View File

@@ -0,0 +1,132 @@
# Contributor Covenant Code of Conduct
## Our Pledge
We as members, contributors, and leaders pledge to make participation in our
community a harassment-free experience for everyone, regardless of age, body
size, visible or invisible disability, ethnicity, sex characteristics, gender
identity and expression, level of experience, education, socio-economic status,
nationality, personal appearance, race, caste, color, religion, or sexual
identity and orientation.
We pledge to act and interact in ways that contribute to an open, welcoming,
diverse, inclusive, and healthy community.
## Our Standards
Examples of behavior that contributes to a positive environment for our
community include:
* Demonstrating empathy and kindness toward other people
* Being respectful of differing opinions, viewpoints, and experiences
* Giving and gracefully accepting constructive feedback
* Accepting responsibility and apologizing to those affected by our mistakes,
and learning from the experience
* Focusing on what is best not just for us as individuals, but for the overall
community
Examples of unacceptable behavior include:
* The use of sexualized language or imagery, and sexual attention or advances of
any kind
* Trolling, insulting or derogatory comments, and personal or political attacks
* Public or private harassment
* Publishing others' private information, such as a physical or email address,
without their explicit permission
* Other conduct which could reasonably be considered inappropriate in a
professional setting
## Enforcement Responsibilities
Community leaders are responsible for clarifying and enforcing our standards of
acceptable behavior and will take appropriate and fair corrective action in
response to any behavior that they deem inappropriate, threatening, offensive,
or harmful.
Community leaders have the right and responsibility to remove, edit, or reject
comments, commits, code, wiki edits, issues, and other contributions that are
not aligned to this Code of Conduct, and will communicate reasons for moderation
decisions when appropriate.
## Scope
This Code of Conduct applies within all community spaces, and also applies when
an individual is officially representing the community in public spaces.
Examples of representing our community include using an official e-mail address,
posting via an official social media account, or acting as an appointed
representative at an online or offline event.
## Enforcement
Instances of abusive, harassing, or otherwise unacceptable behavior may be
reported to the community leaders responsible for enforcement at
conduct@langchain.dev.
All complaints will be reviewed and investigated promptly and fairly.
All community leaders are obligated to respect the privacy and security of the
reporter of any incident.
## Enforcement Guidelines
Community leaders will follow these Community Impact Guidelines in determining
the consequences for any action they deem in violation of this Code of Conduct:
### 1. Correction
**Community Impact**: Use of inappropriate language or other behavior deemed
unprofessional or unwelcome in the community.
**Consequence**: A private, written warning from community leaders, providing
clarity around the nature of the violation and an explanation of why the
behavior was inappropriate. A public apology may be requested.
### 2. Warning
**Community Impact**: A violation through a single incident or series of
actions.
**Consequence**: A warning with consequences for continued behavior. No
interaction with the people involved, including unsolicited interaction with
those enforcing the Code of Conduct, for a specified period of time. This
includes avoiding interactions in community spaces as well as external channels
like social media. Violating these terms may lead to a temporary or permanent
ban.
### 3. Temporary Ban
**Community Impact**: A serious violation of community standards, including
sustained inappropriate behavior.
**Consequence**: A temporary ban from any sort of interaction or public
communication with the community for a specified period of time. No public or
private interaction with the people involved, including unsolicited interaction
with those enforcing the Code of Conduct, is allowed during this period.
Violating these terms may lead to a permanent ban.
### 4. Permanent Ban
**Community Impact**: Demonstrating a pattern of violation of community
standards, including sustained inappropriate behavior, harassment of an
individual, or aggression toward or disparagement of classes of individuals.
**Consequence**: A permanent ban from any sort of public interaction within the
community.
## Attribution
This Code of Conduct is adapted from the [Contributor Covenant][homepage],
version 2.1, available at
[https://www.contributor-covenant.org/version/2/1/code_of_conduct.html][v2.1].
Community Impact Guidelines were inspired by
[Mozilla's code of conduct enforcement ladder][Mozilla CoC].
For answers to common questions about this code of conduct, see the FAQ at
[https://www.contributor-covenant.org/faq][FAQ]. Translations are available at
[https://www.contributor-covenant.org/translations][translations].
[homepage]: https://www.contributor-covenant.org
[v2.1]: https://www.contributor-covenant.org/version/2/1/code_of_conduct.html
[Mozilla CoC]: https://github.com/mozilla/diversity
[FAQ]: https://www.contributor-covenant.org/faq
[translations]: https://www.contributor-covenant.org/translations

View File

@@ -1,20 +1,19 @@
# Contributing to LangChain
Hi there! Thank you for even being interested in contributing to LangChain.
As an open source project in a rapidly developing field, we are extremely open
to contributions, whether they be in the form of new features, improved infra, better documentation, or bug fixes.
As an open-source project in a rapidly developing field, we are extremely open to contributions, whether they involve new features, improved infrastructure, better documentation, or bug fixes.
## 🗺️ Guidelines
### 👩‍💻 Contributing Code
To contribute to this project, please follow a ["fork and pull request"](https://docs.github.com/en/get-started/quickstart/contributing-to-projects) workflow.
To contribute to this project, please follow the ["fork and pull request"](https://docs.github.com/en/get-started/quickstart/contributing-to-projects) workflow.
Please do not try to push directly to this repo unless you are a maintainer.
Please follow the checked-in pull request template when opening pull requests. Note related issues and tag relevant
maintainers.
Pull requests cannot land without passing the formatting, linting and testing checks first. See [Testing](#testing) and
Pull requests cannot land without passing the formatting, linting, and testing checks first. See [Testing](#testing) and
[Formatting and Linting](#formatting-and-linting) for how to run these checks locally.
It's essential that we maintain great documentation and testing. If you:
@@ -24,19 +23,17 @@ It's essential that we maintain great documentation and testing. If you:
- Update any affected example notebooks and documentation. These live in `docs`.
- Update unit and integration tests when relevant.
- Add a feature
- Add a demo notebook in `docs/modules`.
- Add a demo notebook in `docs/docs/`.
- Add unit and integration tests.
We're a small, building-oriented team. If there's something you'd like to add or change, opening a pull request is the
We are a small, progress-oriented team. If there's something you'd like to add or change, opening a pull request is the
best way to get our attention.
### 🚩GitHub Issues
Our [issues](https://github.com/langchain-ai/langchain/issues) page is kept up to date
with bugs, improvements, and feature requests.
Our [issues](https://github.com/langchain-ai/langchain/issues) page is kept up to date with bugs, improvements, and feature requests.
There is a taxonomy of labels to help with sorting and discovery of issues of interest. Please use these to help
organize issues.
There is a taxonomy of labels to help with sorting and discovery of issues of interest. Please use these to help organize issues.
If you start working on an issue, please assign it to yourself.
@@ -59,12 +56,12 @@ we do not want these to get in the way of getting good code into the codebase.
## 🚀 Quick Start
This quick start describes running the repository locally.
This quick start guide explains how to run the repository locally.
For a [development container](https://containers.dev/), see the [.devcontainer folder](https://github.com/langchain-ai/langchain/tree/master/.devcontainer).
### Dependency Management: Poetry and other env/dependency managers
This project uses [Poetry](https://python-poetry.org/) v1.6.1+ as a dependency manager.
This project utilizes [Poetry](https://python-poetry.org/) v1.6.1+ as a dependency manager.
❗Note: *Before installing Poetry*, if you use `Conda`, create and activate a new Conda env (e.g. `conda create -n langchain python=3.9`)
@@ -73,16 +70,18 @@ Install Poetry: **[documentation on how to install it](https://python-poetry.org
❗Note: If you use `Conda` or `Pyenv` as your environment/package manager, after installing Poetry,
tell Poetry to use the virtualenv python environment (`poetry config virtualenvs.prefer-active-python true`)
### Core vs. Experimental
### Different packages
There are two separate projects in this repository:
- `langchain`: core langchain code, abstractions, and use cases
- `langchain.experimental`: see the [Experimental README](../libs/experimental/README.md) for more information.
This repository contains multiple packages:
- `langchain-core`: Base interfaces for key abstractions as well as logic for combining them in chains (LangChain Expression Language).
- `langchain-community`: Third-party integrations of various components.
- `langchain`: Chains, agents, and retrieval logic that makes up the cognitive architecture of your applications.
- `langchain-experimental`: Components and chains that are experimental, either in the sense that the techniques are novel and still being tested, or they require giving the LLM more access than would be possible in most production systems.
Each of these has their own development environment. Docs are run from the top-level makefile, but development
Each of these has its own development environment. Docs are run from the top-level makefile, but development
is split across separate test & release flows.
For this quickstart, start with langchain core:
For this quickstart, start with langchain:
```bash
cd libs/langchain
@@ -129,7 +128,25 @@ To run unit tests in Docker:
make docker_tests
```
There are also [integration tests and code-coverage](../libs/langchain/tests/README.md) available.
There are also [integration tests and code-coverage](https://github.com/langchain-ai/langchain/tree/master/libs/langchain/tests/README.md) available.
### Only develop langchain_core or langchain_experimental
If you are only developing `langchain_core` or `langchain_experimental`, you can simply install the dependencies for the respective projects and run tests:
```bash
cd libs/core
poetry install --with test
make test
```
Or:
```bash
cd libs/experimental
poetry install --with test
make test
```
### Formatting and Linting
@@ -137,14 +154,21 @@ Run these locally before submitting a PR; the CI system will check also.
#### Code Formatting
Formatting for this project is done via a combination of [Black](https://black.readthedocs.io/en/stable/) and [ruff](https://docs.astral.sh/ruff/rules/).
Formatting for this project is done via [ruff](https://docs.astral.sh/ruff/rules/).
To run formatting for this project:
To run formatting for docs, cookbook and templates:
```bash
make format
```
To run formatting for a library, run the same command from the relevant library directory:
```bash
cd libs/{LIBRARY}
make format
```
Additionally, you can run the formatter only on the files that have been modified in your current branch as compared to the master branch using the format_diff command:
```bash
@@ -155,14 +179,21 @@ This is especially useful when you have made changes to a subset of the project
#### Linting
Linting for this project is done via a combination of [Black](https://black.readthedocs.io/en/stable/), [ruff](https://docs.astral.sh/ruff/rules/), and [mypy](http://mypy-lang.org/).
Linting for this project is done via a combination of [ruff](https://docs.astral.sh/ruff/rules/) and [mypy](http://mypy-lang.org/).
To run linting for this project:
To run linting for docs, cookbook and templates:
```bash
make lint
```
To run linting for a library, run the same command from the relevant library directory:
```bash
cd libs/{LIBRARY}
make lint
```
In addition, you can run the linter only on the files that have been modified in your current branch as compared to the master branch using the lint_diff command:
```bash
@@ -203,6 +234,10 @@ ignore-words-list = 'momento,collison,ned,foor,reworkd,parth,whats,aapply,mysogy
Langchain relies heavily on optional dependencies to keep the Langchain package lightweight.
You only need to add a new dependency if a **unit test** relies on the package.
If your package is only required for **integration tests**, then you can skip these
steps and leave all pyproject.toml and poetry.lock files alone.
If you're adding a new dependency to Langchain, assume that it will be an optional dependency, and
that most users won't have it installed.
@@ -282,22 +317,64 @@ make docs_build
make api_docs_build
```
Finally, you can run the linkchecker to make sure all links are valid:
Finally, run the link checker to ensure all links are valid:
```bash
make docs_linkcheck
make api_docs_linkcheck
```
## 🏭 Release Process
### Verify Documentation changes
After pushing documentation changes to the repository, you can preview and verify that the changes are
what you wanted by clicking the `View deployment` or `Visit Preview` buttons on the pull request `Conversation` page.
This will take you to a preview of the documentation changes.
This preview is created by [Vercel](https://vercel.com/docs/getting-started-with-vercel).
## 📕 Releases & Versioning
As of now, LangChain has an ad hoc release process: releases are cut with high frequency by
a developer and published to [PyPI](https://pypi.org/project/langchain/).
a maintainer and published to [PyPI](https://pypi.org/).
The different packages are versioned slightly differently.
LangChain follows the [semver](https://semver.org/) versioning standard. However, as pre-1.0 software,
even patch releases may contain [non-backwards-compatible changes](https://semver.org/#spec-item-4).
### `langchain-core`
### 🌟 Recognition
`langchain-core` is currently on version `0.1.x`.
As `langchain-core` contains the base abstractions and runtime for the whole LangChain ecosystem, we will communicate any breaking changes with advance notice and version bumps. The exception for this is anything in `langchain_core.beta`. The reason for `langchain_core.beta` is that given the rate of change of the field, being able to move quickly is still a priority, and this module is our attempt to do so.
Minor version increases will occur for:
- Breaking changes for any public interfaces NOT in `langchain_core.beta`
Patch version increases will occur for:
- Bug fixes
- New features
- Any changes to private interfaces
- Any changes to `langchain_core.beta`
### `langchain`
`langchain` is currently on version `0.0.x`
All changes will be accompanied by a patch version increase. Any changes to public interfaces are nearly always done in a backwards compatible way and will be communicated ahead of time when they are not backwards compatible.
We are targeting January 2024 for a release of `langchain` v0.1, at which point `langchain` will adopt the same versioning policy as `langchain-core`.
### `langchain-community`
`langchain-community` is currently on version `0.0.x`
All changes will be accompanied by a patch version increase.
### `langchain-experimental`
`langchain-experimental` is currently on version `0.0.x`
All changes will be accompanied by a patch version increase.
## 🌟 Recognition
If your contribution has made its way into a release, we will want to give you credit on Twitter (only if you want though)!
If you have a Twitter account you would like us to mention, please let us know in the PR or in another manner.
If you have a Twitter account you would like us to mention, please let us know in the PR or through another means.

46
.github/scripts/check_diff.py vendored Normal file
View File

@@ -0,0 +1,46 @@
import json
import sys
ALL_DIRS = {
"libs/core",
"libs/langchain",
"libs/experimental",
"libs/community",
}
if __name__ == "__main__":
files = sys.argv[1:]
dirs_to_run = set()
for file in files:
if any(
file.startswith(dir_)
for dir_ in (
".github/workflows",
".github/tools",
".github/actions",
"libs/core",
".github/scripts/check_diff.py",
)
):
dirs_to_run = ALL_DIRS
break
elif "libs/community" in file:
dirs_to_run.update(
("libs/community", "libs/langchain", "libs/experimental")
)
elif "libs/partners" in file:
partner_dir = file.split("/")[2]
dirs_to_run.update(
(f"libs/partners/{partner_dir}", "libs/langchain", "libs/experimental")
)
elif "libs/langchain" in file:
dirs_to_run.update(("libs/langchain", "libs/experimental"))
elif "libs/experimental" in file:
dirs_to_run.add("libs/experimental")
elif file.startswith("libs/"):
dirs_to_run = ALL_DIRS
break
else:
pass
print(json.dumps(list(dirs_to_run)))

105
.github/workflows/_all_ci.yml vendored Normal file
View File

@@ -0,0 +1,105 @@
---
name: langchain CI
on:
workflow_call:
inputs:
working-directory:
required: true
type: string
description: "From which folder this pipeline executes"
workflow_dispatch:
inputs:
working-directory:
required: true
type: choice
default: 'libs/langchain'
options:
- libs/langchain
- libs/core
- libs/experimental
- libs/community
# If another push to the same PR or branch happens while this workflow is still running,
# cancel the earlier run in favor of the next run.
#
# There's no point in testing an outdated version of the code. GitHub only allows
# a limited number of job runners to be active at the same time, so it's better to cancel
# pointless jobs early so that more useful jobs can run sooner.
concurrency:
group: ${{ github.workflow }}-${{ github.ref }}-${{ inputs.working-directory }}
cancel-in-progress: true
env:
POETRY_VERSION: "1.6.1"
jobs:
lint:
uses: ./.github/workflows/_lint.yml
with:
working-directory: ${{ inputs.working-directory }}
secrets: inherit
test:
uses: ./.github/workflows/_test.yml
with:
working-directory: ${{ inputs.working-directory }}
secrets: inherit
compile-integration-tests:
uses: ./.github/workflows/_compile_integration_test.yml
with:
working-directory: ${{ inputs.working-directory }}
secrets: inherit
dependencies:
uses: ./.github/workflows/_dependencies.yml
with:
working-directory: ${{ inputs.working-directory }}
secrets: inherit
extended-tests:
runs-on: ubuntu-latest
strategy:
matrix:
python-version:
- "3.8"
- "3.9"
- "3.10"
- "3.11"
name: Python ${{ matrix.python-version }} extended tests
defaults:
run:
working-directory: ${{ inputs.working-directory }}
steps:
- uses: actions/checkout@v4
- name: Set up Python ${{ matrix.python-version }} + Poetry ${{ env.POETRY_VERSION }}
uses: "./.github/actions/poetry_setup"
with:
python-version: ${{ matrix.python-version }}
poetry-version: ${{ env.POETRY_VERSION }}
working-directory: ${{ inputs.working-directory }}
cache-key: extended
- name: Install dependencies
shell: bash
run: |
echo "Running extended tests, installing dependencies with poetry..."
poetry install -E extended_testing --with test
- name: Run extended tests
run: make extended_tests
- name: Ensure the tests did not create any additional files
shell: bash
run: |
set -eu
STATUS="$(git status)"
echo "$STATUS"
# grep will exit non-zero if the target message isn't found,
# and `set -e` above will cause the step to fail.
echo "$STATUS" | grep 'nothing to commit, working tree clean'

View File

@@ -0,0 +1,57 @@
name: compile-integration-test
on:
workflow_call:
inputs:
working-directory:
required: true
type: string
description: "From which folder this pipeline executes"
env:
POETRY_VERSION: "1.6.1"
jobs:
build:
defaults:
run:
working-directory: ${{ inputs.working-directory }}
runs-on: ubuntu-latest
strategy:
matrix:
python-version:
- "3.8"
- "3.9"
- "3.10"
- "3.11"
name: Python ${{ matrix.python-version }}
steps:
- uses: actions/checkout@v4
- name: Set up Python ${{ matrix.python-version }} + Poetry ${{ env.POETRY_VERSION }}
uses: "./.github/actions/poetry_setup"
with:
python-version: ${{ matrix.python-version }}
poetry-version: ${{ env.POETRY_VERSION }}
working-directory: ${{ inputs.working-directory }}
cache-key: compile-integration
- name: Install integration dependencies
shell: bash
run: poetry install --with=test_integration,test
- name: Check integration tests compile
shell: bash
run: poetry run pytest -m compile tests/integration_tests
- name: Ensure the tests did not create any additional files
shell: bash
run: |
set -eu
STATUS="$(git status)"
echo "$STATUS"
# grep will exit non-zero if the target message isn't found,
# and `set -e` above will cause the step to fail.
echo "$STATUS" | grep 'nothing to commit, working tree clean'

113
.github/workflows/_dependencies.yml vendored Normal file
View File

@@ -0,0 +1,113 @@
name: dependencies
on:
workflow_call:
inputs:
working-directory:
required: true
type: string
description: "From which folder this pipeline executes"
langchain-location:
required: false
type: string
description: "Relative path to the langchain library folder"
env:
POETRY_VERSION: "1.6.1"
jobs:
build:
defaults:
run:
working-directory: ${{ inputs.working-directory }}
runs-on: ubuntu-latest
strategy:
matrix:
python-version:
- "3.8"
- "3.9"
- "3.10"
- "3.11"
name: dependencies - Python ${{ matrix.python-version }}
steps:
- uses: actions/checkout@v4
- name: Set up Python ${{ matrix.python-version }} + Poetry ${{ env.POETRY_VERSION }}
uses: "./.github/actions/poetry_setup"
with:
python-version: ${{ matrix.python-version }}
poetry-version: ${{ env.POETRY_VERSION }}
working-directory: ${{ inputs.working-directory }}
cache-key: pydantic-cross-compat
- name: Install dependencies
shell: bash
run: poetry install
- name: Check imports with base dependencies
shell: bash
run: poetry run make check_imports
- name: Install test dependencies
shell: bash
run: poetry install --with test
- name: Install langchain editable
working-directory: ${{ inputs.working-directory }}
if: ${{ inputs.langchain-location }}
env:
LANGCHAIN_LOCATION: ${{ inputs.langchain-location }}
run: |
poetry run pip install -e "$LANGCHAIN_LOCATION"
- name: Install the opposite major version of pydantic
# If normal tests use pydantic v1, here we'll use v2, and vice versa.
shell: bash
run: |
# Determine the major part of pydantic version
REGULAR_VERSION=$(poetry run python -c "import pydantic; print(pydantic.__version__)" | cut -d. -f1)
if [[ "$REGULAR_VERSION" == "1" ]]; then
PYDANTIC_DEP=">=2.1,<3"
TEST_WITH_VERSION="2"
elif [[ "$REGULAR_VERSION" == "2" ]]; then
PYDANTIC_DEP="<2"
TEST_WITH_VERSION="1"
else
echo "Unexpected pydantic major version '$REGULAR_VERSION', cannot determine which version to use for cross-compatibility test."
exit 1
fi
# Install via `pip` instead of `poetry add` to avoid changing lockfile,
# which would prevent caching from working: the cache would get saved
# to a different key than where it gets loaded from.
poetry run pip install "pydantic${PYDANTIC_DEP}"
# Ensure that the correct pydantic is installed now.
echo "Checking pydantic version... Expecting ${TEST_WITH_VERSION}"
# Determine the major part of pydantic version
CURRENT_VERSION=$(poetry run python -c "import pydantic; print(pydantic.__version__)" | cut -d. -f1)
# Check that the major part of pydantic version is as expected, if not
# raise an error
if [[ "$CURRENT_VERSION" != "$TEST_WITH_VERSION" ]]; then
echo "Error: expected pydantic version ${CURRENT_VERSION} to have been installed, but found: ${TEST_WITH_VERSION}"
exit 1
fi
echo "Found pydantic version ${CURRENT_VERSION}, as expected"
- name: Run pydantic compatibility tests
shell: bash
run: make test
- name: Ensure the tests did not create any additional files
shell: bash
run: |
set -eu
STATUS="$(git status)"
echo "$STATUS"
# grep will exit non-zero if the target message isn't found,
# and `set -e` above will cause the step to fail.
echo "$STATUS" | grep 'nothing to commit, working tree clean'

View File

@@ -7,20 +7,21 @@ on:
required: true
type: string
description: "From which folder this pipeline executes"
langchain-location:
required: false
type: string
description: "Relative path to the langchain library folder"
env:
POETRY_VERSION: "1.6.1"
WORKDIR: ${{ inputs.working-directory == '' && '.' || inputs.working-directory }}
# This env var allows us to get inline annotations when ruff has complaints.
RUFF_OUTPUT_FORMAT: github
jobs:
build:
runs-on: ubuntu-latest
env:
# This number is set "by eye": we want it to be big enough
# so that it's bigger than the number of commits in any reasonable PR,
# and also as small as possible since increasing the number makes
# the initial `git fetch` slower.
FETCH_DEPTH: 50
strategy:
matrix:
# Only lint on the min and max supported Python versions.
@@ -34,52 +35,7 @@ jobs:
- "3.8"
- "3.11"
steps:
- uses: actions/checkout@v3
with:
# Fetch the last FETCH_DEPTH commits, so the mtime-changing script
# can accurately set the mtimes of files modified in the last FETCH_DEPTH commits.
fetch-depth: ${{ env.FETCH_DEPTH }}
- name: Restore workdir file mtimes to last-edited commit date
id: restore-mtimes
# This is needed to make black caching work.
# Black's cache uses file (mtime, size) to check whether a lookup is a cache hit.
# Without this command, files in the repo would have the current time as the modified time,
# since the previous action step just created them.
# This command resets the mtime to the last time the files were modified in git instead,
# which is a high-quality and stable representation of the last modification date.
run: |
# Important considerations:
# - These commands run at base of the repo, since we never `cd` to the `WORKDIR`.
# - We only want to alter mtimes for Python files, since that's all black checks.
# - We don't need to alter mtimes for directories, since black doesn't look at those.
# - We also only alter mtimes inside the `WORKDIR` since that's all we'll lint.
# - This should run before `poetry install`, because poetry's venv also contains
# Python files, and we don't want to alter their mtimes since they aren't linted.
# Ensure we fail on non-zero exits and on undefined variables.
# Also print executed commands, for easier debugging.
set -eux
# Restore the mtimes of Python files in the workdir based on git history.
.github/tools/git-restore-mtime --no-directories "$WORKDIR/**/*.py"
# Since CI only does a partial fetch (to `FETCH_DEPTH`) for efficiency,
# the local git repo doesn't have full history. There are probably files
# that were last modified in a commit *older than* the oldest fetched commit.
# After `git-restore-mtime`, such files have a mtime set to the oldest fetched commit.
#
# As new commits get added, that timestamp will keep moving forward.
# If left unchanged, this will make `black` think that the files were edited
# more recently than its cache suggests. Instead, we can set their mtime
# to a fixed date in the far past that won't change and won't cause cache misses in black.
#
# For all workdir Python files modified in or before the oldest few fetched commits,
# make their mtime be 2000-01-01 00:00:00.
OLDEST_COMMIT="$(git log --reverse '--pretty=format:%H' | head -1)"
OLDEST_COMMIT_TIME="$(git show -s '--format=%ai' "$OLDEST_COMMIT")"
find "$WORKDIR" -name '*.py' -type f -not -newermt "$OLDEST_COMMIT_TIME" -exec touch -c -m -t '200001010000' '{}' '+'
echo "oldest-commit=$OLDEST_COMMIT" >> "$GITHUB_OUTPUT"
- uses: actions/checkout@v4
- name: Set up Python ${{ matrix.python-version }} + Poetry ${{ env.POETRY_VERSION }}
uses: "./.github/actions/poetry_setup"
@@ -112,26 +68,15 @@ jobs:
# It doesn't matter how you change it, any change will cause a cache-bust.
working-directory: ${{ inputs.working-directory }}
run: |
poetry install --with dev,lint,test,typing
poetry install --with lint,typing
- name: Install langchain editable
working-directory: ${{ inputs.working-directory }}
if: ${{ inputs.working-directory != 'libs/langchain' }}
run: |
pip install -e ../langchain
- name: Restore black cache
uses: actions/cache@v3
if: ${{ inputs.langchain-location }}
env:
CACHE_BASE: black-${{ runner.os }}-${{ runner.arch }}-py${{ matrix.python-version }}-${{ inputs.working-directory }}-${{ hashFiles(format('{0}/poetry.lock', env.WORKDIR)) }}
SEGMENT_DOWNLOAD_TIMEOUT_MIN: "1"
with:
path: |
${{ env.WORKDIR }}/.black_cache
key: ${{ env.CACHE_BASE }}-${{ steps.restore-mtimes.outputs.oldest-commit }}
restore-keys:
# If we can't find an exact match for our cache key, accept any with this prefix.
${{ env.CACHE_BASE }}-
LANGCHAIN_LOCATION: ${{ inputs.langchain-location }}
run: |
poetry run pip install -e "$LANGCHAIN_LOCATION"
- name: Get .mypy_cache to speed up mypy
uses: actions/cache@v3
@@ -140,11 +85,37 @@ jobs:
with:
path: |
${{ env.WORKDIR }}/.mypy_cache
key: mypy-${{ runner.os }}-${{ runner.arch }}-py${{ matrix.python-version }}-${{ inputs.working-directory }}-${{ hashFiles(format('{0}/poetry.lock', env.WORKDIR)) }}
key: mypy-lint-${{ runner.os }}-${{ runner.arch }}-py${{ matrix.python-version }}-${{ inputs.working-directory }}-${{ hashFiles(format('{0}/poetry.lock', env.WORKDIR)) }}
- name: Analysing the code with our lint
working-directory: ${{ inputs.working-directory }}
env:
BLACK_CACHE_DIR: .black_cache
run: |
make lint
make lint_package
- name: Install test dependencies
# Also installs dev/lint/test/typing dependencies, to ensure we have
# type hints for as many of our libraries as possible.
# This helps catch errors that require dependencies to be spotted, for example:
# https://github.com/langchain-ai/langchain/pull/10249/files#diff-935185cd488d015f026dcd9e19616ff62863e8cde8c0bee70318d3ccbca98341
#
# If you change this configuration, make sure to change the `cache-key`
# in the `poetry_setup` action above to stop using the old cache.
# It doesn't matter how you change it, any change will cause a cache-bust.
working-directory: ${{ inputs.working-directory }}
run: |
poetry install --with test
- name: Get .mypy_cache_test to speed up mypy
uses: actions/cache@v3
env:
SEGMENT_DOWNLOAD_TIMEOUT_MIN: "2"
with:
path: |
${{ env.WORKDIR }}/.mypy_cache_test
key: mypy-test-${{ runner.os }}-${{ runner.arch }}-py${{ matrix.python-version }}-${{ inputs.working-directory }}-${{ hashFiles(format('{0}/poetry.lock', env.WORKDIR)) }}
- name: Analysing the code with our lint
working-directory: ${{ inputs.working-directory }}
run: |
make lint_tests

View File

@@ -1,93 +0,0 @@
name: pydantic v1/v2 compatibility
on:
workflow_call:
inputs:
working-directory:
required: true
type: string
description: "From which folder this pipeline executes"
env:
POETRY_VERSION: "1.6.1"
jobs:
build:
defaults:
run:
working-directory: ${{ inputs.working-directory }}
runs-on: ubuntu-latest
strategy:
matrix:
python-version:
- "3.8"
- "3.9"
- "3.10"
- "3.11"
name: Pydantic v1/v2 compatibility - Python ${{ matrix.python-version }}
steps:
- uses: actions/checkout@v3
- name: Set up Python ${{ matrix.python-version }} + Poetry ${{ env.POETRY_VERSION }}
uses: "./.github/actions/poetry_setup"
with:
python-version: ${{ matrix.python-version }}
poetry-version: ${{ env.POETRY_VERSION }}
working-directory: ${{ inputs.working-directory }}
cache-key: pydantic-cross-compat
- name: Install dependencies
shell: bash
run: poetry install
- name: Install the opposite major version of pydantic
# If normal tests use pydantic v1, here we'll use v2, and vice versa.
shell: bash
run: |
# Determine the major part of pydantic version
REGULAR_VERSION=$(poetry run python -c "import pydantic; print(pydantic.__version__)" | cut -d. -f1)
if [[ "$REGULAR_VERSION" == "1" ]]; then
PYDANTIC_DEP=">=2.1,<3"
TEST_WITH_VERSION="2"
elif [[ "$REGULAR_VERSION" == "2" ]]; then
PYDANTIC_DEP="<2"
TEST_WITH_VERSION="1"
else
echo "Unexpected pydantic major version '$REGULAR_VERSION', cannot determine which version to use for cross-compatibility test."
exit 1
fi
# Install via `pip` instead of `poetry add` to avoid changing lockfile,
# which would prevent caching from working: the cache would get saved
# to a different key than where it gets loaded from.
poetry run pip install "pydantic${PYDANTIC_DEP}"
# Ensure that the correct pydantic is installed now.
echo "Checking pydantic version... Expecting ${TEST_WITH_VERSION}"
# Determine the major part of pydantic version
CURRENT_VERSION=$(poetry run python -c "import pydantic; print(pydantic.__version__)" | cut -d. -f1)
# Check that the major part of pydantic version is as expected, if not
# raise an error
if [[ "$CURRENT_VERSION" != "$TEST_WITH_VERSION" ]]; then
echo "Error: expected pydantic version ${CURRENT_VERSION} to have been installed, but found: ${TEST_WITH_VERSION}"
exit 1
fi
echo "Found pydantic version ${CURRENT_VERSION}, as expected"
- name: Run pydantic compatibility tests
shell: bash
run: make test
- name: Ensure the tests did not create any additional files
shell: bash
run: |
set -eu
STATUS="$(git status)"
echo "$STATUS"
# grep will exit non-zero if the target message isn't found,
# and `set -e` above will cause the step to fail.
echo "$STATUS" | grep 'nothing to commit, working tree clean'

View File

@@ -7,15 +7,133 @@ on:
required: true
type: string
description: "From which folder this pipeline executes"
workflow_dispatch:
inputs:
working-directory:
required: true
type: choice
default: 'libs/langchain'
options:
- libs/langchain
- libs/core
- libs/experimental
- libs/community
env:
PYTHON_VERSION: "3.10"
POETRY_VERSION: "1.6.1"
jobs:
if_release:
# Disallow publishing from branches that aren't `master`.
build:
if: github.ref == 'refs/heads/master'
runs-on: ubuntu-latest
outputs:
pkg-name: ${{ steps.check-version.outputs.pkg-name }}
version: ${{ steps.check-version.outputs.version }}
steps:
- uses: actions/checkout@v4
- name: Set up Python + Poetry ${{ env.POETRY_VERSION }}
uses: "./.github/actions/poetry_setup"
with:
python-version: ${{ env.PYTHON_VERSION }}
poetry-version: ${{ env.POETRY_VERSION }}
working-directory: ${{ inputs.working-directory }}
cache-key: release
# We want to keep this build stage *separate* from the release stage,
# so that there's no sharing of permissions between them.
# The release stage has trusted publishing and GitHub repo contents write access,
# and we want to keep the scope of that access limited just to the release job.
# Otherwise, a malicious `build` step (e.g. via a compromised dependency)
# could get access to our GitHub or PyPI credentials.
#
# Per the trusted publishing GitHub Action:
# > It is strongly advised to separate jobs for building [...]
# > from the publish job.
# https://github.com/pypa/gh-action-pypi-publish#non-goals
- name: Build project for distribution
run: poetry build
working-directory: ${{ inputs.working-directory }}
- name: Upload build
uses: actions/upload-artifact@v3
with:
name: dist
path: ${{ inputs.working-directory }}/dist/
- name: Check Version
id: check-version
shell: bash
working-directory: ${{ inputs.working-directory }}
run: |
echo pkg-name="$(poetry version | cut -d ' ' -f 1)" >> $GITHUB_OUTPUT
echo version="$(poetry version --short)" >> $GITHUB_OUTPUT
test-pypi-publish:
needs:
- build
uses:
./.github/workflows/_test_release.yml
with:
working-directory: ${{ inputs.working-directory }}
secrets: inherit
pre-release-checks:
needs:
- build
- test-pypi-publish
runs-on: ubuntu-latest
steps:
# We explicitly *don't* set up caching here. This ensures our tests are
# maximally sensitive to catching breakage.
#
# For example, here's a way that caching can cause a falsely-passing test:
# - Make the langchain package manifest no longer list a dependency package
# as a requirement. This means it won't be installed by `pip install`,
# and attempting to use it would cause a crash.
# - That dependency used to be required, so it may have been cached.
# When restoring the venv packages from cache, that dependency gets included.
# - Tests pass, because the dependency is present even though it wasn't specified.
# - The package is published, and it breaks on the missing dependency when
# used in the real world.
- uses: actions/setup-python@v4
with:
python-version: ${{ env.PYTHON_VERSION }}
- name: Test published package
shell: bash
env:
PKG_NAME: ${{ needs.build.outputs.pkg-name }}
VERSION: ${{ needs.build.outputs.version }}
# Here we use:
# - The default regular PyPI index as the *primary* index, meaning
# that it takes priority (https://pypi.org/simple)
# - The test PyPI index as an extra index, so that any dependencies that
# are not found on test PyPI can be resolved and installed anyway.
# (https://test.pypi.org/simple). This will include the PKG_NAME==VERSION
# package because VERSION will not have been uploaded to regular PyPI yet.
#
# TODO: add more in-depth pre-publish tests after testing that importing works
run: |
pip install \
--extra-index-url https://test.pypi.org/simple/ \
"$PKG_NAME==$VERSION"
# Replace all dashes in the package name with underscores,
# since that's how Python imports packages with dashes in the name.
IMPORT_NAME="$(echo "$PKG_NAME" | sed s/-/_/g)"
python -c "import $IMPORT_NAME; print(dir($IMPORT_NAME))"
publish:
needs:
- build
- test-pypi-publish
- pre-release-checks
runs-on: ubuntu-latest
permissions:
# This permission is used for trusted publishing:
# https://blog.pypi.org/posts/2023-04-20-introducing-trusted-publishers/
@@ -24,28 +142,65 @@ jobs:
# https://docs.pypi.org/trusted-publishers/adding-a-publisher/
id-token: write
# This permission is needed by `ncipollo/release-action` to create the GitHub release.
contents: write
defaults:
run:
working-directory: ${{ inputs.working-directory }}
steps:
- uses: actions/checkout@v3
- uses: actions/checkout@v4
- name: Set up Python + Poetry ${{ env.POETRY_VERSION }}
uses: "./.github/actions/poetry_setup"
with:
python-version: "3.10"
python-version: ${{ env.PYTHON_VERSION }}
poetry-version: ${{ env.POETRY_VERSION }}
working-directory: ${{ inputs.working-directory }}
cache-key: release
- name: Build project for distribution
run: poetry build
- name: Check Version
id: check-version
run: |
echo version=$(poetry version --short) >> $GITHUB_OUTPUT
- uses: actions/download-artifact@v3
with:
name: dist
path: ${{ inputs.working-directory }}/dist/
- name: Publish package distributions to PyPI
uses: pypa/gh-action-pypi-publish@release/v1
with:
packages-dir: ${{ inputs.working-directory }}/dist/
verbose: true
print-hash: true
mark-release:
needs:
- build
- test-pypi-publish
- pre-release-checks
- publish
runs-on: ubuntu-latest
permissions:
# This permission is needed by `ncipollo/release-action` to
# create the GitHub release.
contents: write
defaults:
run:
working-directory: ${{ inputs.working-directory }}
steps:
- uses: actions/checkout@v4
- name: Set up Python + Poetry ${{ env.POETRY_VERSION }}
uses: "./.github/actions/poetry_setup"
with:
python-version: ${{ env.PYTHON_VERSION }}
poetry-version: ${{ env.POETRY_VERSION }}
working-directory: ${{ inputs.working-directory }}
cache-key: release
- uses: actions/download-artifact@v3
with:
name: dist
path: ${{ inputs.working-directory }}/dist/
- name: Create Release
uses: ncipollo/release-action@v1
if: ${{ inputs.working-directory == 'libs/langchain' }}
@@ -54,11 +209,5 @@ jobs:
token: ${{ secrets.GITHUB_TOKEN }}
draft: false
generateReleaseNotes: true
tag: v${{ steps.check-version.outputs.version }}
tag: v${{ needs.build.outputs.version }}
commit: master
- name: Publish package distributions to PyPI
uses: pypa/gh-action-pypi-publish@release/v1
with:
packages-dir: ${{ inputs.working-directory }}/dist/
verbose: true
print-hash: true

View File

@@ -7,6 +7,10 @@ on:
required: true
type: string
description: "From which folder this pipeline executes"
langchain-location:
required: false
type: string
description: "Relative path to the langchain library folder"
env:
POETRY_VERSION: "1.6.1"
@@ -26,7 +30,7 @@ jobs:
- "3.11"
name: Python ${{ matrix.python-version }}
steps:
- uses: actions/checkout@v3
- uses: actions/checkout@v4
- name: Set up Python ${{ matrix.python-version }} + Poetry ${{ env.POETRY_VERSION }}
uses: "./.github/actions/poetry_setup"
@@ -38,11 +42,20 @@ jobs:
- name: Install dependencies
shell: bash
run: poetry install
run: poetry install --with test
- name: Install langchain editable
working-directory: ${{ inputs.working-directory }}
if: ${{ inputs.langchain-location }}
env:
LANGCHAIN_LOCATION: ${{ inputs.langchain-location }}
run: |
poetry run pip install -e "$LANGCHAIN_LOCATION"
- name: Run core tests
shell: bash
run: make test
run: |
make test
- name: Ensure the tests did not create any additional files
shell: bash

95
.github/workflows/_test_release.yml vendored Normal file
View File

@@ -0,0 +1,95 @@
name: test-release
on:
workflow_call:
inputs:
working-directory:
required: true
type: string
description: "From which folder this pipeline executes"
env:
POETRY_VERSION: "1.6.1"
PYTHON_VERSION: "3.10"
jobs:
build:
if: github.ref == 'refs/heads/master'
runs-on: ubuntu-latest
outputs:
pkg-name: ${{ steps.check-version.outputs.pkg-name }}
version: ${{ steps.check-version.outputs.version }}
steps:
- uses: actions/checkout@v4
- name: Set up Python + Poetry ${{ env.POETRY_VERSION }}
uses: "./.github/actions/poetry_setup"
with:
python-version: ${{ env.PYTHON_VERSION }}
poetry-version: ${{ env.POETRY_VERSION }}
working-directory: ${{ inputs.working-directory }}
cache-key: release
# We want to keep this build stage *separate* from the release stage,
# so that there's no sharing of permissions between them.
# The release stage has trusted publishing and GitHub repo contents write access,
# and we want to keep the scope of that access limited just to the release job.
# Otherwise, a malicious `build` step (e.g. via a compromised dependency)
# could get access to our GitHub or PyPI credentials.
#
# Per the trusted publishing GitHub Action:
# > It is strongly advised to separate jobs for building [...]
# > from the publish job.
# https://github.com/pypa/gh-action-pypi-publish#non-goals
- name: Build project for distribution
run: poetry build
working-directory: ${{ inputs.working-directory }}
- name: Upload build
uses: actions/upload-artifact@v3
with:
name: test-dist
path: ${{ inputs.working-directory }}/dist/
- name: Check Version
id: check-version
shell: bash
working-directory: ${{ inputs.working-directory }}
run: |
echo pkg-name="$(poetry version | cut -d ' ' -f 1)" >> $GITHUB_OUTPUT
echo version="$(poetry version --short)" >> $GITHUB_OUTPUT
publish:
needs:
- build
runs-on: ubuntu-latest
permissions:
# This permission is used for trusted publishing:
# https://blog.pypi.org/posts/2023-04-20-introducing-trusted-publishers/
#
# Trusted publishing has to also be configured on PyPI for each package:
# https://docs.pypi.org/trusted-publishers/adding-a-publisher/
id-token: write
steps:
- uses: actions/checkout@v4
- uses: actions/download-artifact@v3
with:
name: test-dist
path: ${{ inputs.working-directory }}/dist/
- name: Publish to test PyPI
uses: pypa/gh-action-pypi-publish@release/v1
with:
packages-dir: ${{ inputs.working-directory }}/dist/
verbose: true
print-hash: true
repository-url: https://test.pypi.org/legacy/
# We overwrite any existing distributions with the same name and version.
# This is *only for CI use* and is *extremely dangerous* otherwise!
# https://github.com/pypa/gh-action-pypi-publish#tolerating-release-package-file-duplicates
skip-existing: true

47
.github/workflows/check_diffs.yml vendored Normal file
View File

@@ -0,0 +1,47 @@
---
name: Check library diffs
on:
push:
branches: [master]
pull_request:
paths:
- ".github/actions/**"
- ".github/tools/**"
- ".github/workflows/**"
- "libs/**"
# If another push to the same PR or branch happens while this workflow is still running,
# cancel the earlier run in favor of the next run.
#
# There's no point in testing an outdated version of the code. GitHub only allows
# a limited number of job runners to be active at the same time, so it's better to cancel
# pointless jobs early so that more useful jobs can run sooner.
concurrency:
group: ${{ github.workflow }}-${{ github.ref }}
cancel-in-progress: true
jobs:
build:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
- uses: actions/setup-python@v4
with:
python-version: '3.10'
- id: files
uses: Ana06/get-changed-files@v2.2.0
- id: set-matrix
run: echo "dirs-to-run=$(python .github/scripts/check_diff.py ${{ steps.files.outputs.all }})" >> $GITHUB_OUTPUT
outputs:
dirs-to-run: ${{ steps.set-matrix.outputs.dirs-to-run }}
ci:
needs: [ build ]
strategy:
matrix:
working-directory: ${{ fromJson(needs.build.outputs.dirs-to-run) }}
uses: ./.github/workflows/_all_ci.yml
with:
working-directory: ${{ matrix.working-directory }}

View File

@@ -17,7 +17,7 @@ jobs:
steps:
- name: Checkout
uses: actions/checkout@v3
uses: actions/checkout@v4
- name: Install Dependencies
run: |

View File

@@ -1,11 +1,17 @@
---
name: Documentation Lint
name: Docs, templates, cookbook lint
on:
push:
branches: [master]
branches: [ master ]
pull_request:
branches: [master]
paths:
- 'docs/**'
- 'templates/**'
- 'cookbook/**'
- '.github/workflows/_lint.yml'
- '.github/workflows/doc_lint.yml'
workflow_dispatch:
jobs:
check:
@@ -13,10 +19,17 @@ jobs:
steps:
- name: Checkout repository
uses: actions/checkout@v2
uses: actions/checkout@v4
- name: Run import check
run: |
# We should not encourage imports directly from main init file
# Expect for hub
git grep 'from langchain import' docs/{extras,docs_skeleton,snippets} | grep -vE 'from langchain import (hub)' && exit 1 || exit 0
git grep 'from langchain import' {docs/docs,templates,cookbook} | grep -vE 'from langchain import (hub)' && exit 1 || exit 0
lint:
uses:
./.github/workflows/_lint.yml
with:
working-directory: "."
secrets: inherit

View File

@@ -3,6 +3,8 @@ import toml
pyproject_toml = toml.load("pyproject.toml")
# Extract the ignore words list (adjust the key as per your TOML structure)
ignore_words_list = pyproject_toml.get("tool", {}).get("codespell", {}).get("ignore-words-list")
ignore_words_list = (
pyproject_toml.get("tool", {}).get("codespell", {}).get("ignore-words-list")
)
print(f"::set-output name=ignore_words_list::{ignore_words_list}")
print(f"::set-output name=ignore_words_list::{ignore_words_list}")

View File

@@ -1,97 +0,0 @@
---
name: libs/langchain CI
on:
push:
branches: [ master ]
pull_request:
paths:
- '.github/actions/poetry_setup/action.yml'
- '.github/tools/**'
- '.github/workflows/_lint.yml'
- '.github/workflows/_test.yml'
- '.github/workflows/_pydantic_compatibility.yml'
- '.github/workflows/langchain_ci.yml'
- 'libs/langchain/**'
workflow_dispatch: # Allows to trigger the workflow manually in GitHub UI
# If another push to the same PR or branch happens while this workflow is still running,
# cancel the earlier run in favor of the next run.
#
# There's no point in testing an outdated version of the code. GitHub only allows
# a limited number of job runners to be active at the same time, so it's better to cancel
# pointless jobs early so that more useful jobs can run sooner.
concurrency:
group: ${{ github.workflow }}-${{ github.ref }}
cancel-in-progress: true
env:
POETRY_VERSION: "1.6.1"
WORKDIR: "libs/langchain"
jobs:
lint:
uses:
./.github/workflows/_lint.yml
with:
working-directory: libs/langchain
secrets: inherit
test:
uses:
./.github/workflows/_test.yml
with:
working-directory: libs/langchain
secrets: inherit
pydantic-compatibility:
uses:
./.github/workflows/_pydantic_compatibility.yml
with:
working-directory: libs/langchain
secrets: inherit
extended-tests:
runs-on: ubuntu-latest
defaults:
run:
working-directory: ${{ env.WORKDIR }}
strategy:
matrix:
python-version:
- "3.8"
- "3.9"
- "3.10"
- "3.11"
name: Python ${{ matrix.python-version }} extended tests
steps:
- uses: actions/checkout@v3
- name: Set up Python ${{ matrix.python-version }} + Poetry ${{ env.POETRY_VERSION }}
uses: "./.github/actions/poetry_setup"
with:
python-version: ${{ matrix.python-version }}
poetry-version: ${{ env.POETRY_VERSION }}
working-directory: libs/langchain
cache-key: extended
- name: Install dependencies
shell: bash
run: |
echo "Running extended tests, installing dependencies with poetry..."
poetry install -E extended_testing
- name: Run extended tests
run: make extended_tests
- name: Ensure the tests did not create any additional files
shell: bash
run: |
set -eu
STATUS="$(git status)"
echo "$STATUS"
# grep will exit non-zero if the target message isn't found,
# and `set -e` above will cause the step to fail.
echo "$STATUS" | grep 'nothing to commit, working tree clean'

View File

@@ -0,0 +1,13 @@
---
name: libs/cli Release
on:
workflow_dispatch: # Allows to trigger the workflow manually in GitHub UI
jobs:
release:
uses:
./.github/workflows/_release.yml
with:
working-directory: libs/cli
secrets: inherit

View File

@@ -0,0 +1,13 @@
---
name: libs/community Release
on:
workflow_dispatch: # Allows to trigger the workflow manually in GitHub UI
jobs:
release:
uses:
./.github/workflows/_release.yml
with:
working-directory: libs/community
secrets: inherit

View File

@@ -0,0 +1,13 @@
---
name: libs/core Release
on:
workflow_dispatch: # Allows to trigger the workflow manually in GitHub UI
jobs:
release:
uses:
./.github/workflows/_release.yml
with:
working-directory: libs/core
secrets: inherit

View File

@@ -1,129 +0,0 @@
---
name: libs/experimental CI
on:
push:
branches: [ master ]
pull_request:
paths:
- '.github/actions/poetry_setup/action.yml'
- '.github/tools/**'
- '.github/workflows/_lint.yml'
- '.github/workflows/_test.yml'
- '.github/workflows/langchain_experimental_ci.yml'
- 'libs/langchain/**'
- 'libs/experimental/**'
workflow_dispatch: # Allows to trigger the workflow manually in GitHub UI
# If another push to the same PR or branch happens while this workflow is still running,
# cancel the earlier run in favor of the next run.
#
# There's no point in testing an outdated version of the code. GitHub only allows
# a limited number of job runners to be active at the same time, so it's better to cancel
# pointless jobs early so that more useful jobs can run sooner.
concurrency:
group: ${{ github.workflow }}-${{ github.ref }}
cancel-in-progress: true
env:
POETRY_VERSION: "1.6.1"
WORKDIR: "libs/experimental"
jobs:
lint:
uses:
./.github/workflows/_lint.yml
with:
working-directory: libs/experimental
secrets: inherit
test:
uses:
./.github/workflows/_test.yml
with:
working-directory: libs/experimental
secrets: inherit
# It's possible that langchain-experimental works fine with the latest *published* langchain,
# but is broken with the langchain on `master`.
#
# We want to catch situations like that *before* releasing a new langchain, hence this test.
test-with-latest-langchain:
runs-on: ubuntu-latest
defaults:
run:
working-directory: ${{ env.WORKDIR }}
strategy:
matrix:
python-version:
- "3.8"
- "3.9"
- "3.10"
- "3.11"
name: test with unpublished langchain - Python ${{ matrix.python-version }}
steps:
- uses: actions/checkout@v3
- name: Set up Python ${{ matrix.python-version }} + Poetry ${{ env.POETRY_VERSION }}
uses: "./.github/actions/poetry_setup"
with:
python-version: ${{ matrix.python-version }}
poetry-version: ${{ env.POETRY_VERSION }}
working-directory: ${{ env.WORKDIR }}
cache-key: unpublished-langchain
- name: Install dependencies
shell: bash
run: |
echo "Running tests with unpublished langchain, installing dependencies with poetry..."
poetry install
echo "Editably installing langchain outside of poetry, to avoid messing up lockfile..."
poetry run pip install -e ../langchain
- name: Run tests
run: make test
extended-tests:
runs-on: ubuntu-latest
defaults:
run:
working-directory: ${{ env.WORKDIR }}
strategy:
matrix:
python-version:
- "3.8"
- "3.9"
- "3.10"
- "3.11"
name: Python ${{ matrix.python-version }} extended tests
steps:
- uses: actions/checkout@v3
- name: Set up Python ${{ matrix.python-version }} + Poetry ${{ env.POETRY_VERSION }}
uses: "./.github/actions/poetry_setup"
with:
python-version: ${{ matrix.python-version }}
poetry-version: ${{ env.POETRY_VERSION }}
working-directory: libs/experimental
cache-key: extended
- name: Install dependencies
shell: bash
run: |
echo "Running extended tests, installing dependencies with poetry..."
poetry install -E extended_testing
- name: Run extended tests
run: make extended_tests
- name: Ensure the tests did not create any additional files
shell: bash
run: |
set -eu
STATUS="$(git status)"
echo "$STATUS"
# grep will exit non-zero if the target message isn't found,
# and `set -e` above will cause the step to fail.
echo "$STATUS" | grep 'nothing to commit, working tree clean'

View File

@@ -0,0 +1,13 @@
---
name: Experimental Test Release
on:
workflow_dispatch: # Allows to trigger the workflow manually in GitHub UI
jobs:
release:
uses:
./.github/workflows/_test_release.yml
with:
working-directory: libs/experimental
secrets: inherit

View File

@@ -0,0 +1,13 @@
---
name: libs/core Release
on:
workflow_dispatch: # Allows to trigger the workflow manually in GitHub UI
jobs:
release:
uses:
./.github/workflows/_release.yml
with:
working-directory: libs/core
secrets: inherit

View File

@@ -0,0 +1,13 @@
---
name: Test Release
on:
workflow_dispatch: # Allows to trigger the workflow manually in GitHub UI
jobs:
release:
uses:
./.github/workflows/_test_release.yml
with:
working-directory: libs/langchain
secrets: inherit

View File

@@ -24,7 +24,7 @@ jobs:
- "3.11"
name: Python ${{ matrix.python-version }}
steps:
- uses: actions/checkout@v3
- uses: actions/checkout@v4
- name: Set up Python ${{ matrix.python-version }}
uses: "./.github/actions/poetry_setup"
@@ -52,15 +52,20 @@ jobs:
shell: bash
run: |
echo "Running scheduled tests, installing dependencies with poetry..."
poetry install --with=test_integration
poetry run pip install google-cloud-aiplatform
poetry run pip install "boto3>=1.28.57"
poetry install --with=test_integration,test
- name: Run tests
shell: bash
env:
OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }}
AZURE_OPENAI_API_VERSION: ${{ secrets.AZURE_OPENAI_API_VERSION }}
AZURE_OPENAI_API_BASE: ${{ secrets.AZURE_OPENAI_API_BASE }}
AZURE_OPENAI_API_KEY: ${{ secrets.AZURE_OPENAI_API_KEY }}
AZURE_OPENAI_CHAT_DEPLOYMENT_NAME: ${{ secrets.AZURE_OPENAI_CHAT_DEPLOYMENT_NAME }}
AZURE_OPENAI_LLM_DEPLOYMENT_NAME: ${{ secrets.AZURE_OPENAI_LLM_DEPLOYMENT_NAME }}
AZURE_OPENAI_EMBEDDINGS_DEPLOYMENT_NAME: ${{ secrets.AZURE_OPENAI_EMBEDDINGS_DEPLOYMENT_NAME }}
FIREWORKS_API_KEY: ${{ secrets.FIREWORKS_API_KEY }}
run: |
make scheduled_tests

36
.github/workflows/templates_ci.yml vendored Normal file
View File

@@ -0,0 +1,36 @@
---
name: templates CI
on:
push:
branches: [ master ]
pull_request:
paths:
- '.github/actions/poetry_setup/action.yml'
- '.github/tools/**'
- '.github/workflows/_lint.yml'
- '.github/workflows/templates_ci.yml'
- 'templates/**'
workflow_dispatch: # Allows to trigger the workflow manually in GitHub UI
# If another push to the same PR or branch happens while this workflow is still running,
# cancel the earlier run in favor of the next run.
#
# There's no point in testing an outdated version of the code. GitHub only allows
# a limited number of job runners to be active at the same time, so it's better to cancel
# pointless jobs early so that more useful jobs can run sooner.
concurrency:
group: ${{ github.workflow }}-${{ github.ref }}
cancel-in-progress: true
env:
POETRY_VERSION: "1.6.1"
WORKDIR: "templates"
jobs:
lint:
uses:
./.github/workflows/_lint.yml
with:
working-directory: templates
secrets: inherit

11
.gitignore vendored
View File

@@ -167,13 +167,14 @@ docs/node_modules/
docs/.docusaurus/
docs/.cache-loader/
docs/_dist
docs/api_reference/api_reference.rst
docs/api_reference/experimental_api_reference.rst
docs/api_reference/*api_reference.rst
docs/api_reference/_build
docs/api_reference/*/
!docs/api_reference/_static/
!docs/api_reference/templates/
!docs/api_reference/themes/
docs/docs_skeleton/build
docs/docs_skeleton/node_modules
docs/docs_skeleton/yarn.lock
docs/docs/build
docs/docs/node_modules
docs/docs/yarn.lock
_dist
docs/docs/templates

4
.gitmodules vendored
View File

@@ -1,4 +0,0 @@
[submodule "docs/_docs_skeleton"]
path = docs/_docs_skeleton
url = https://github.com/langchain-ai/langchain-shared-docs
branch = main

View File

@@ -9,9 +9,14 @@ build:
os: ubuntu-22.04
tools:
python: "3.11"
jobs:
pre_build:
commands:
- python -mvirtualenv $READTHEDOCS_VIRTUALENV_PATH
- python -m pip install --upgrade --no-cache-dir pip setuptools
- python -m pip install --upgrade --no-cache-dir sphinx readthedocs-sphinx-ext
- python -m pip install --exists-action=w --no-cache-dir -r docs/api_reference/requirements.txt
- python docs/api_reference/create_api_rst.py
- cat docs/api_reference/conf.py
- python -m sphinx -T -E -b html -d _build/doctrees -c docs/api_reference docs/api_reference $READTHEDOCS_OUTPUT/html -j auto
# Build documentation in the docs/ directory with Sphinx
sphinx:

View File

@@ -0,0 +1,9 @@
"""Main entrypoint into package."""
from importlib import metadata
try:
__version__ = metadata.version(__package__)
except metadata.PackageNotFoundError:
# Case where package metadata is not available.
__version__ = ""
del metadata # optional, avoids polluting the results of dir(__package__)

View File

@@ -0,0 +1,79 @@
"""Agent toolkits contain integrations with various resources and services.
LangChain has a large ecosystem of integrations with various external resources
like local and remote file systems, APIs and databases.
These integrations allow developers to create versatile applications that combine the
power of LLMs with the ability to access, interact with and manipulate external
resources.
When developing an application, developers should inspect the capabilities and
permissions of the tools that underlie the given agent toolkit, and determine
whether permissions of the given toolkit are appropriate for the application.
See [Security](https://python.langchain.com/docs/security) for more information.
"""
from langchain_community.agent_toolkits.ainetwork.toolkit import AINetworkToolkit
from langchain_community.agent_toolkits.amadeus.toolkit import AmadeusToolkit
from langchain_community.agent_toolkits.azure_cognitive_services import (
AzureCognitiveServicesToolkit,
)
from langchain_community.agent_toolkits.conversational_retrieval.openai_functions import ( # noqa: E501
create_conversational_retrieval_agent,
)
from langchain_community.agent_toolkits.file_management.toolkit import (
FileManagementToolkit,
)
from langchain_community.agent_toolkits.gmail.toolkit import GmailToolkit
from langchain_community.agent_toolkits.jira.toolkit import JiraToolkit
from langchain_community.agent_toolkits.json.base import create_json_agent
from langchain_community.agent_toolkits.json.toolkit import JsonToolkit
from langchain_community.agent_toolkits.multion.toolkit import MultionToolkit
from langchain_community.agent_toolkits.nasa.toolkit import NasaToolkit
from langchain_community.agent_toolkits.nla.toolkit import NLAToolkit
from langchain_community.agent_toolkits.office365.toolkit import O365Toolkit
from langchain_community.agent_toolkits.openapi.base import create_openapi_agent
from langchain_community.agent_toolkits.openapi.toolkit import OpenAPIToolkit
from langchain_community.agent_toolkits.playwright.toolkit import (
PlayWrightBrowserToolkit,
)
from langchain_community.agent_toolkits.powerbi.base import create_pbi_agent
from langchain_community.agent_toolkits.powerbi.chat_base import create_pbi_chat_agent
from langchain_community.agent_toolkits.powerbi.toolkit import PowerBIToolkit
from langchain_community.agent_toolkits.slack.toolkit import SlackToolkit
from langchain_community.agent_toolkits.spark_sql.base import create_spark_sql_agent
from langchain_community.agent_toolkits.spark_sql.toolkit import SparkSQLToolkit
from langchain_community.agent_toolkits.sql.base import create_sql_agent
from langchain_community.agent_toolkits.sql.toolkit import SQLDatabaseToolkit
from langchain_community.agent_toolkits.steam.toolkit import SteamToolkit
from langchain_community.agent_toolkits.zapier.toolkit import ZapierToolkit
__all__ = [
"AINetworkToolkit",
"AmadeusToolkit",
"AzureCognitiveServicesToolkit",
"FileManagementToolkit",
"GmailToolkit",
"JiraToolkit",
"JsonToolkit",
"MultionToolkit",
"NasaToolkit",
"NLAToolkit",
"O365Toolkit",
"OpenAPIToolkit",
"PlayWrightBrowserToolkit",
"PowerBIToolkit",
"SlackToolkit",
"SteamToolkit",
"SQLDatabaseToolkit",
"SparkSQLToolkit",
"ZapierToolkit",
"create_json_agent",
"create_openapi_agent",
"create_pbi_agent",
"create_pbi_chat_agent",
"create_spark_sql_agent",
"create_sql_agent",
"create_conversational_retrieval_agent",
]

View File

@@ -0,0 +1,53 @@
"""Json agent."""
from __future__ import annotations
from typing import Any, Dict, List, Optional, TYPE_CHECKING
from langchain_core.callbacks import BaseCallbackManager
from langchain_core.language_models import BaseLanguageModel
from langchain_community.agent_toolkits.json.prompt import JSON_PREFIX, JSON_SUFFIX
from langchain_community.agent_toolkits.json.toolkit import JsonToolkit
if TYPE_CHECKING:
from langchain.agents.agent import AgentExecutor
def create_json_agent(
llm: BaseLanguageModel,
toolkit: JsonToolkit,
callback_manager: Optional[BaseCallbackManager] = None,
prefix: str = JSON_PREFIX,
suffix: str = JSON_SUFFIX,
format_instructions: Optional[str] = None,
input_variables: Optional[List[str]] = None,
verbose: bool = False,
agent_executor_kwargs: Optional[Dict[str, Any]] = None,
**kwargs: Any,
) -> AgentExecutor:
"""Construct a json agent from an LLM and tools."""
from langchain.agents.agent import AgentExecutor
from langchain.agents.mrkl.base import ZeroShotAgent
from langchain.chains.llm import LLMChain
tools = toolkit.get_tools()
prompt_params = {"format_instructions": format_instructions} if format_instructions is not None else {}
prompt = ZeroShotAgent.create_prompt(
tools,
prefix=prefix,
suffix=suffix,
input_variables=input_variables,
**prompt_params,
)
llm_chain = LLMChain(
llm=llm,
prompt=prompt,
callback_manager=callback_manager,
)
tool_names = [tool.name for tool in tools]
agent = ZeroShotAgent(llm_chain=llm_chain, allowed_tools=tool_names, **kwargs)
return AgentExecutor.from_agent_and_tools(
agent=agent,
tools=tools,
callback_manager=callback_manager,
verbose=verbose,
**(agent_executor_kwargs or {}),
)

View File

@@ -0,0 +1,57 @@
"""Tool for interacting with a single API with natural language definition."""
from __future__ import annotations
from typing import Any, Optional, TYPE_CHECKING
from langchain_core.language_models import BaseLanguageModel
from langchain_core.tools import Tool
from langchain_community.tools.openapi.utils.api_models import APIOperation
from langchain_community.tools.openapi.utils.openapi_utils import OpenAPISpec
from langchain_community.utilities.requests import Requests
if TYPE_CHECKING:
from langchain.chains.api.openapi.chain import OpenAPIEndpointChain
class NLATool(Tool):
"""Natural Language API Tool."""
@classmethod
def from_open_api_endpoint_chain(
cls, chain: OpenAPIEndpointChain, api_title: str
) -> "NLATool":
"""Convert an endpoint chain to an API endpoint tool."""
expanded_name = (
f'{api_title.replace(" ", "_")}.{chain.api_operation.operation_id}'
)
description = (
f"I'm an AI from {api_title}. Instruct what you want,"
" and I'll assist via an API with description:"
f" {chain.api_operation.description}"
)
return cls(name=expanded_name, func=chain.run, description=description)
@classmethod
def from_llm_and_method(
cls,
llm: BaseLanguageModel,
path: str,
method: str,
spec: OpenAPISpec,
requests: Optional[Requests] = None,
verbose: bool = False,
return_intermediate_steps: bool = False,
**kwargs: Any,
) -> "NLATool":
"""Instantiate the tool from the specified path and method."""
api_operation = APIOperation.from_openapi_spec(spec, path, method)
chain = OpenAPIEndpointChain.from_api_operation(
api_operation,
llm,
requests=requests,
verbose=verbose,
return_intermediate_steps=return_intermediate_steps,
**kwargs,
)
return cls.from_open_api_endpoint_chain(chain, spec.info.title)

View File

@@ -0,0 +1,77 @@
"""OpenAPI spec agent."""
from __future__ import annotations
from typing import Any, Dict, List, Optional, TYPE_CHECKING
from langchain_core.callbacks import BaseCallbackManager
from langchain_core.language_models import BaseLanguageModel
from langchain_community.agent_toolkits.openapi.prompt import (
OPENAPI_PREFIX,
OPENAPI_SUFFIX,
)
from langchain_community.agent_toolkits.openapi.toolkit import OpenAPIToolkit
if TYPE_CHECKING:
from langchain.agents.agent import AgentExecutor
def create_openapi_agent(
llm: BaseLanguageModel,
toolkit: OpenAPIToolkit,
callback_manager: Optional[BaseCallbackManager] = None,
prefix: str = OPENAPI_PREFIX,
suffix: str = OPENAPI_SUFFIX,
format_instructions: Optional[str] = None,
input_variables: Optional[List[str]] = None,
max_iterations: Optional[int] = 15,
max_execution_time: Optional[float] = None,
early_stopping_method: str = "force",
verbose: bool = False,
return_intermediate_steps: bool = False,
agent_executor_kwargs: Optional[Dict[str, Any]] = None,
**kwargs: Any,
) -> AgentExecutor:
"""Construct an OpenAPI agent from an LLM and tools.
*Security Note*: When creating an OpenAPI agent, check the permissions
and capabilities of the underlying toolkit.
For example, if the default implementation of OpenAPIToolkit
uses the RequestsToolkit which contains tools to make arbitrary
network requests against any URL (e.g., GET, POST, PATCH, PUT, DELETE),
Control access to who can submit issue requests using this toolkit and
what network access it has.
See https://python.langchain.com/docs/security for more information.
"""
from langchain.agents.agent import AgentExecutor
from langchain.agents.mrkl.base import ZeroShotAgent
from langchain.chains.llm import LLMChain
tools = toolkit.get_tools()
prompt_params = {"format_instructions": format_instructions} if format_instructions is not None else {}
prompt = ZeroShotAgent.create_prompt(
tools,
prefix=prefix,
suffix=suffix,
input_variables=input_variables,
**prompt_params
)
llm_chain = LLMChain(
llm=llm,
prompt=prompt,
callback_manager=callback_manager,
)
tool_names = [tool.name for tool in tools]
agent = ZeroShotAgent(llm_chain=llm_chain, allowed_tools=tool_names, **kwargs)
return AgentExecutor.from_agent_and_tools(
agent=agent,
tools=tools,
callback_manager=callback_manager,
verbose=verbose,
return_intermediate_steps=return_intermediate_steps,
max_iterations=max_iterations,
max_execution_time=max_execution_time,
early_stopping_method=early_stopping_method,
**(agent_executor_kwargs or {}),
)

View File

@@ -0,0 +1,370 @@
"""Agent that interacts with OpenAPI APIs via a hierarchical planning approach."""
import json
import re
from functools import partial
from typing import Any, Callable, Dict, List, Optional, TYPE_CHECKING
import yaml
from langchain_core.callbacks import BaseCallbackManager
from langchain_core.language_models import BaseLanguageModel
from langchain_core.prompts import BasePromptTemplate, PromptTemplate
from langchain_core.pydantic_v1 import Field
from langchain_core.tools import BaseTool, Tool
from langchain_community.llms import OpenAI
from langchain_community.agent_toolkits.openapi.planner_prompt import (
API_CONTROLLER_PROMPT,
API_CONTROLLER_TOOL_DESCRIPTION,
API_CONTROLLER_TOOL_NAME,
API_ORCHESTRATOR_PROMPT,
API_PLANNER_PROMPT,
API_PLANNER_TOOL_DESCRIPTION,
API_PLANNER_TOOL_NAME,
PARSING_DELETE_PROMPT,
PARSING_GET_PROMPT,
PARSING_PATCH_PROMPT,
PARSING_POST_PROMPT,
PARSING_PUT_PROMPT,
REQUESTS_DELETE_TOOL_DESCRIPTION,
REQUESTS_GET_TOOL_DESCRIPTION,
REQUESTS_PATCH_TOOL_DESCRIPTION,
REQUESTS_POST_TOOL_DESCRIPTION,
REQUESTS_PUT_TOOL_DESCRIPTION,
)
from langchain_community.agent_toolkits.openapi.spec import ReducedOpenAPISpec
from langchain_community.tools.requests.tool import BaseRequestsTool
from langchain_community.utilities.requests import RequestsWrapper
if TYPE_CHECKING:
from langchain.agents.agent import AgentExecutor
from langchain.chains.llm import LLMChain
from langchain.memory import ReadOnlySharedMemory
#
# Requests tools with LLM-instructed extraction of truncated responses.
#
# Of course, truncating so bluntly may lose a lot of valuable
# information in the response.
# However, the goal for now is to have only a single inference step.
MAX_RESPONSE_LENGTH = 5000
"""Maximum length of the response to be returned."""
def _get_default_llm_chain(prompt: BasePromptTemplate) -> LLMChain:
from langchain.chains.llm import LLMChain
return LLMChain(
llm=OpenAI(),
prompt=prompt,
)
def _get_default_llm_chain_factory(
prompt: BasePromptTemplate,
) -> Callable[[], LLMChain]:
"""Returns a default LLMChain factory."""
return partial(_get_default_llm_chain, prompt)
class RequestsGetToolWithParsing(BaseRequestsTool, BaseTool):
"""Requests GET tool with LLM-instructed extraction of truncated responses."""
name: str = "requests_get"
"""Tool name."""
description = REQUESTS_GET_TOOL_DESCRIPTION
"""Tool description."""
response_length: Optional[int] = MAX_RESPONSE_LENGTH
"""Maximum length of the response to be returned."""
llm_chain: Any = Field(
default_factory=_get_default_llm_chain_factory(PARSING_GET_PROMPT)
)
"""LLMChain used to extract the response."""
def _run(self, text: str) -> str:
from langchain.output_parsers.json import parse_json_markdown
try:
data = parse_json_markdown(text)
except json.JSONDecodeError as e:
raise e
data_params = data.get("params")
response = self.requests_wrapper.get(data["url"], params=data_params)
response = response[: self.response_length]
return self.llm_chain.predict(
response=response, instructions=data["output_instructions"]
).strip()
async def _arun(self, text: str) -> str:
raise NotImplementedError()
class RequestsPostToolWithParsing(BaseRequestsTool, BaseTool):
"""Requests POST tool with LLM-instructed extraction of truncated responses."""
name: str = "requests_post"
"""Tool name."""
description = REQUESTS_POST_TOOL_DESCRIPTION
"""Tool description."""
response_length: Optional[int] = MAX_RESPONSE_LENGTH
"""Maximum length of the response to be returned."""
llm_chain: Any = Field(
default_factory=_get_default_llm_chain_factory(PARSING_POST_PROMPT)
)
"""LLMChain used to extract the response."""
def _run(self, text: str) -> str:
from langchain.output_parsers.json import parse_json_markdown
try:
data = parse_json_markdown(text)
except json.JSONDecodeError as e:
raise e
response = self.requests_wrapper.post(data["url"], data["data"])
response = response[: self.response_length]
return self.llm_chain.predict(
response=response, instructions=data["output_instructions"]
).strip()
async def _arun(self, text: str) -> str:
raise NotImplementedError()
class RequestsPatchToolWithParsing(BaseRequestsTool, BaseTool):
"""Requests PATCH tool with LLM-instructed extraction of truncated responses."""
name: str = "requests_patch"
"""Tool name."""
description = REQUESTS_PATCH_TOOL_DESCRIPTION
"""Tool description."""
response_length: Optional[int] = MAX_RESPONSE_LENGTH
"""Maximum length of the response to be returned."""
llm_chain: Any = Field(
default_factory=_get_default_llm_chain_factory(PARSING_PATCH_PROMPT)
)
"""LLMChain used to extract the response."""
def _run(self, text: str) -> str:
from langchain.output_parsers.json import parse_json_markdown
try:
data = parse_json_markdown(text)
except json.JSONDecodeError as e:
raise e
response = self.requests_wrapper.patch(data["url"], data["data"])
response = response[: self.response_length]
return self.llm_chain.predict(
response=response, instructions=data["output_instructions"]
).strip()
async def _arun(self, text: str) -> str:
raise NotImplementedError()
class RequestsPutToolWithParsing(BaseRequestsTool, BaseTool):
"""Requests PUT tool with LLM-instructed extraction of truncated responses."""
name: str = "requests_put"
"""Tool name."""
description = REQUESTS_PUT_TOOL_DESCRIPTION
"""Tool description."""
response_length: Optional[int] = MAX_RESPONSE_LENGTH
"""Maximum length of the response to be returned."""
llm_chain: Any = Field(
default_factory=_get_default_llm_chain_factory(PARSING_PUT_PROMPT)
)
"""LLMChain used to extract the response."""
def _run(self, text: str) -> str:
from langchain.output_parsers.json import parse_json_markdown
try:
data = parse_json_markdown(text)
except json.JSONDecodeError as e:
raise e
response = self.requests_wrapper.put(data["url"], data["data"])
response = response[: self.response_length]
return self.llm_chain.predict(
response=response, instructions=data["output_instructions"]
).strip()
async def _arun(self, text: str) -> str:
raise NotImplementedError()
class RequestsDeleteToolWithParsing(BaseRequestsTool, BaseTool):
"""A tool that sends a DELETE request and parses the response."""
name: str = "requests_delete"
"""The name of the tool."""
description = REQUESTS_DELETE_TOOL_DESCRIPTION
"""The description of the tool."""
response_length: Optional[int] = MAX_RESPONSE_LENGTH
"""The maximum length of the response."""
llm_chain: Any = Field(
default_factory=_get_default_llm_chain_factory(PARSING_DELETE_PROMPT)
)
"""The LLM chain used to parse the response."""
def _run(self, text: str) -> str:
from langchain.output_parsers.json import parse_json_markdown
try:
data = parse_json_markdown(text)
except json.JSONDecodeError as e:
raise e
response = self.requests_wrapper.delete(data["url"])
response = response[: self.response_length]
return self.llm_chain.predict(
response=response, instructions=data["output_instructions"]
).strip()
async def _arun(self, text: str) -> str:
raise NotImplementedError()
#
# Orchestrator, planner, controller.
#
def _create_api_planner_tool(
api_spec: ReducedOpenAPISpec, llm: BaseLanguageModel
) -> Tool:
from langchain.chains.llm import LLMChain
endpoint_descriptions = [
f"{name} {description}" for name, description, _ in api_spec.endpoints
]
prompt = PromptTemplate(
template=API_PLANNER_PROMPT,
input_variables=["query"],
partial_variables={"endpoints": "- " + "- ".join(endpoint_descriptions)},
)
chain = LLMChain(llm=llm, prompt=prompt)
tool = Tool(
name=API_PLANNER_TOOL_NAME,
description=API_PLANNER_TOOL_DESCRIPTION,
func=chain.run,
)
return tool
def _create_api_controller_agent(
api_url: str,
api_docs: str,
requests_wrapper: RequestsWrapper,
llm: BaseLanguageModel,
) -> AgentExecutor:
from langchain.agents.mrkl.base import ZeroShotAgent
from langchain.agents.agent import AgentExecutor
from langchain.chains.llm import LLMChain
get_llm_chain = LLMChain(llm=llm, prompt=PARSING_GET_PROMPT)
post_llm_chain = LLMChain(llm=llm, prompt=PARSING_POST_PROMPT)
tools: List[BaseTool] = [
RequestsGetToolWithParsing(
requests_wrapper=requests_wrapper, llm_chain=get_llm_chain
),
RequestsPostToolWithParsing(
requests_wrapper=requests_wrapper, llm_chain=post_llm_chain
),
]
prompt = PromptTemplate(
template=API_CONTROLLER_PROMPT,
input_variables=["input", "agent_scratchpad"],
partial_variables={
"api_url": api_url,
"api_docs": api_docs,
"tool_names": ", ".join([tool.name for tool in tools]),
"tool_descriptions": "\n".join(
[f"{tool.name}: {tool.description}" for tool in tools]
),
},
)
agent = ZeroShotAgent(
llm_chain=LLMChain(llm=llm, prompt=prompt),
allowed_tools=[tool.name for tool in tools],
)
return AgentExecutor.from_agent_and_tools(agent=agent, tools=tools, verbose=True)
def _create_api_controller_tool(
api_spec: ReducedOpenAPISpec,
requests_wrapper: RequestsWrapper,
llm: BaseLanguageModel,
) -> Tool:
"""Expose controller as a tool.
The tool is invoked with a plan from the planner, and dynamically
creates a controller agent with relevant documentation only to
constrain the context.
"""
base_url = api_spec.servers[0]["url"] # TODO: do better.
def _create_and_run_api_controller_agent(plan_str: str) -> str:
pattern = r"\b(GET|POST|PATCH|DELETE)\s+(/\S+)*"
matches = re.findall(pattern, plan_str)
endpoint_names = [
"{method} {route}".format(method=method, route=route.split("?")[0])
for method, route in matches
]
docs_str = ""
for endpoint_name in endpoint_names:
found_match = False
for name, _, docs in api_spec.endpoints:
regex_name = re.compile(re.sub("\{.*?\}", ".*", name))
if regex_name.match(endpoint_name):
found_match = True
docs_str += f"== Docs for {endpoint_name} == \n{yaml.dump(docs)}\n"
if not found_match:
raise ValueError(f"{endpoint_name} endpoint does not exist.")
agent = _create_api_controller_agent(base_url, docs_str, requests_wrapper, llm)
return agent.run(plan_str)
return Tool(
name=API_CONTROLLER_TOOL_NAME,
func=_create_and_run_api_controller_agent,
description=API_CONTROLLER_TOOL_DESCRIPTION,
)
def create_openapi_agent(
api_spec: ReducedOpenAPISpec,
requests_wrapper: RequestsWrapper,
llm: BaseLanguageModel,
shared_memory: Optional[ReadOnlySharedMemory] = None,
callback_manager: Optional[BaseCallbackManager] = None,
verbose: bool = True,
agent_executor_kwargs: Optional[Dict[str, Any]] = None,
**kwargs: Any,
) -> AgentExecutor:
"""Instantiate OpenAI API planner and controller for a given spec.
Inject credentials via requests_wrapper.
We use a top-level "orchestrator" agent to invoke the planner and controller,
rather than a top-level planner
that invokes a controller with its plan. This is to keep the planner simple.
"""
from langchain.agents.mrkl.base import ZeroShotAgent
from langchain.agents.agent import AgentExecutor
from langchain.chains.llm import LLMChain
tools = [
_create_api_planner_tool(api_spec, llm),
_create_api_controller_tool(api_spec, requests_wrapper, llm),
]
prompt = PromptTemplate(
template=API_ORCHESTRATOR_PROMPT,
input_variables=["input", "agent_scratchpad"],
partial_variables={
"tool_names": ", ".join([tool.name for tool in tools]),
"tool_descriptions": "\n".join(
[f"{tool.name}: {tool.description}" for tool in tools]
),
},
)
agent = ZeroShotAgent(
llm_chain=LLMChain(llm=llm, prompt=prompt, memory=shared_memory),
allowed_tools=[tool.name for tool in tools],
**kwargs,
)
return AgentExecutor.from_agent_and_tools(
agent=agent,
tools=tools,
callback_manager=callback_manager,
verbose=verbose,
**(agent_executor_kwargs or {}),
)

View File

@@ -0,0 +1,90 @@
"""Requests toolkit."""
from __future__ import annotations
from typing import Any, List
from langchain_core.language_models import BaseLanguageModel
from langchain_core.tools import Tool
from langchain_community.agent_toolkits.base import BaseToolkit
from langchain_community.agent_toolkits.json.base import create_json_agent
from langchain_community.agent_toolkits.json.toolkit import JsonToolkit
from langchain_community.agent_toolkits.openapi.prompt import DESCRIPTION
from langchain_community.tools import BaseTool
from langchain_community.tools.json.tool import JsonSpec
from langchain_community.tools.requests.tool import (
RequestsDeleteTool,
RequestsGetTool,
RequestsPatchTool,
RequestsPostTool,
RequestsPutTool,
)
from langchain_community.utilities.requests import TextRequestsWrapper
class RequestsToolkit(BaseToolkit):
"""Toolkit for making REST requests.
*Security Note*: This toolkit contains tools to make GET, POST, PATCH, PUT,
and DELETE requests to an API.
Exercise care in who is allowed to use this toolkit. If exposing
to end users, consider that users will be able to make arbitrary
requests on behalf of the server hosting the code. For example,
users could ask the server to make a request to a private API
that is only accessible from the server.
Control access to who can submit issue requests using this toolkit and
what network access it has.
See https://python.langchain.com/docs/security for more information.
"""
requests_wrapper: TextRequestsWrapper
def get_tools(self) -> List[BaseTool]:
"""Return a list of tools."""
return [
RequestsGetTool(requests_wrapper=self.requests_wrapper),
RequestsPostTool(requests_wrapper=self.requests_wrapper),
RequestsPatchTool(requests_wrapper=self.requests_wrapper),
RequestsPutTool(requests_wrapper=self.requests_wrapper),
RequestsDeleteTool(requests_wrapper=self.requests_wrapper),
]
class OpenAPIToolkit(BaseToolkit):
"""Toolkit for interacting with an OpenAPI API.
*Security Note*: This toolkit contains tools that can read and modify
the state of a service; e.g., by creating, deleting, or updating,
reading underlying data.
For example, this toolkit can be used to delete data exposed via
an OpenAPI compliant API.
"""
json_agent: Any
requests_wrapper: TextRequestsWrapper
def get_tools(self) -> List[BaseTool]:
"""Get the tools in the toolkit."""
json_agent_tool = Tool(
name="json_explorer",
func=self.json_agent.run,
description=DESCRIPTION,
)
request_toolkit = RequestsToolkit(requests_wrapper=self.requests_wrapper)
return [*request_toolkit.get_tools(), json_agent_tool]
@classmethod
def from_llm(
cls,
llm: BaseLanguageModel,
json_spec: JsonSpec,
requests_wrapper: TextRequestsWrapper,
**kwargs: Any,
) -> OpenAPIToolkit:
"""Create json agent from llm, then initialize."""
json_agent = create_json_agent(llm, JsonToolkit(spec=json_spec), **kwargs)
return cls(json_agent=json_agent, requests_wrapper=requests_wrapper)

View File

@@ -0,0 +1,68 @@
"""Power BI agent."""
from __future__ import annotations
from typing import Any, Dict, List, Optional, TYPE_CHECKING
from langchain_core.callbacks import BaseCallbackManager
from langchain_core.language_models import BaseLanguageModel
from langchain_community.agent_toolkits.powerbi.prompt import (
POWERBI_PREFIX,
POWERBI_SUFFIX,
)
from langchain_community.agent_toolkits.powerbi.toolkit import PowerBIToolkit
from langchain_community.utilities.powerbi import PowerBIDataset
if TYPE_CHECKING:
from langchain.agents import AgentExecutor
def create_pbi_agent(
llm: BaseLanguageModel,
toolkit: Optional[PowerBIToolkit] = None,
powerbi: Optional[PowerBIDataset] = None,
callback_manager: Optional[BaseCallbackManager] = None,
prefix: str = POWERBI_PREFIX,
suffix: str = POWERBI_SUFFIX,
format_instructions: Optional[str] = None,
examples: Optional[str] = None,
input_variables: Optional[List[str]] = None,
top_k: int = 10,
verbose: bool = False,
agent_executor_kwargs: Optional[Dict[str, Any]] = None,
**kwargs: Any,
) -> AgentExecutor:
"""Construct a Power BI agent from an LLM and tools."""
from langchain.agents.mrkl.base import ZeroShotAgent
from langchain.agents import AgentExecutor
from langchain.chains.llm import LLMChain
if toolkit is None:
if powerbi is None:
raise ValueError("Must provide either a toolkit or powerbi dataset")
toolkit = PowerBIToolkit(powerbi=powerbi, llm=llm, examples=examples)
tools = toolkit.get_tools()
tables = powerbi.table_names if powerbi else toolkit.powerbi.table_names
prompt_params = {"format_instructions": format_instructions} if format_instructions is not None else {}
agent = ZeroShotAgent(
llm_chain=LLMChain(
llm=llm,
prompt=ZeroShotAgent.create_prompt(
tools,
prefix=prefix.format(top_k=top_k).format(tables=tables),
suffix=suffix,
input_variables=input_variables,
**prompt_params,
),
callback_manager=callback_manager, # type: ignore
verbose=verbose,
),
allowed_tools=[tool.name for tool in tools],
**kwargs,
)
return AgentExecutor.from_agent_and_tools(
agent=agent,
tools=tools,
callback_manager=callback_manager,
verbose=verbose,
**(agent_executor_kwargs or {}),
)

View File

@@ -0,0 +1,69 @@
"""Power BI agent."""
from __future__ import annotations
from typing import Any, Dict, List, Optional, TYPE_CHECKING
from langchain_core.callbacks import BaseCallbackManager
from langchain_core.language_models.chat_models import BaseChatModel
from langchain_community.agent_toolkits.powerbi.prompt import (
POWERBI_CHAT_PREFIX,
POWERBI_CHAT_SUFFIX,
)
from langchain_community.agent_toolkits.powerbi.toolkit import PowerBIToolkit
from langchain_community.utilities.powerbi import PowerBIDataset
if TYPE_CHECKING:
from langchain.agents import AgentExecutor
from langchain.agents.agent import AgentOutputParser
from langchain.memory.chat_memory import BaseChatMemory
def create_pbi_chat_agent(
llm: BaseChatModel,
toolkit: Optional[PowerBIToolkit] = None,
powerbi: Optional[PowerBIDataset] = None,
callback_manager: Optional[BaseCallbackManager] = None,
output_parser: Optional[AgentOutputParser] = None,
prefix: str = POWERBI_CHAT_PREFIX,
suffix: str = POWERBI_CHAT_SUFFIX,
examples: Optional[str] = None,
input_variables: Optional[List[str]] = None,
memory: Optional[BaseChatMemory] = None,
top_k: int = 10,
verbose: bool = False,
agent_executor_kwargs: Optional[Dict[str, Any]] = None,
**kwargs: Any,
) -> AgentExecutor:
"""Construct a Power BI agent from a Chat LLM and tools.
If you supply only a toolkit and no Power BI dataset, the same LLM is used for both.
"""
from langchain.agents import AgentExecutor
from langchain.agents.conversational_chat.base import ConversationalChatAgent
from langchain.memory import ConversationBufferMemory
if toolkit is None:
if powerbi is None:
raise ValueError("Must provide either a toolkit or powerbi dataset")
toolkit = PowerBIToolkit(powerbi=powerbi, llm=llm, examples=examples)
tools = toolkit.get_tools()
tables = powerbi.table_names if powerbi else toolkit.powerbi.table_names
agent = ConversationalChatAgent.from_llm_and_tools(
llm=llm,
tools=tools,
system_message=prefix.format(top_k=top_k).format(tables=tables),
human_message=suffix,
input_variables=input_variables,
callback_manager=callback_manager,
output_parser=output_parser,
verbose=verbose,
**kwargs,
)
return AgentExecutor.from_agent_and_tools(
agent=agent,
tools=tools,
callback_manager=callback_manager,
memory=memory
or ConversationBufferMemory(memory_key="chat_history", return_messages=True),
verbose=verbose,
**(agent_executor_kwargs or {}),
)

View File

@@ -0,0 +1,106 @@
"""Toolkit for interacting with a Power BI dataset."""
from __future__ import annotations
from typing import List, Optional, Union, TYPE_CHECKING
from langchain_core.callbacks import BaseCallbackManager
from langchain_core.language_models import BaseLanguageModel
from langchain_core.language_models.chat_models import BaseChatModel
from langchain_core.prompts import PromptTemplate
from langchain_core.prompts.chat import (
ChatPromptTemplate,
HumanMessagePromptTemplate,
SystemMessagePromptTemplate,
)
from langchain_core.pydantic_v1 import Field
from langchain_community.agent_toolkits.base import BaseToolkit
from langchain_community.tools import BaseTool
from langchain_community.tools.powerbi.prompt import (
QUESTION_TO_QUERY_BASE,
SINGLE_QUESTION_TO_QUERY,
USER_INPUT,
)
from langchain_community.tools.powerbi.tool import (
InfoPowerBITool,
ListPowerBITool,
QueryPowerBITool,
)
from langchain_community.utilities.powerbi import PowerBIDataset
if TYPE_CHECKING:
from langchain.chains.llm import LLMChain
class PowerBIToolkit(BaseToolkit):
"""Toolkit for interacting with Power BI dataset.
*Security Note*: This toolkit interacts with an external service.
Control access to who can use this toolkit.
Make sure that the capabilities given by this toolkit to the calling
code are appropriately scoped to the application.
See https://python.langchain.com/docs/security for more information.
"""
powerbi: PowerBIDataset = Field(exclude=True)
llm: Union[BaseLanguageModel, BaseChatModel] = Field(exclude=True)
examples: Optional[str] = None
max_iterations: int = 5
callback_manager: Optional[BaseCallbackManager] = None
output_token_limit: Optional[int] = None
tiktoken_model_name: Optional[str] = None
class Config:
"""Configuration for this pydantic object."""
arbitrary_types_allowed = True
def get_tools(self) -> List[BaseTool]:
"""Get the tools in the toolkit."""
return [
QueryPowerBITool(
llm_chain=self._get_chain(),
powerbi=self.powerbi,
examples=self.examples,
max_iterations=self.max_iterations,
output_token_limit=self.output_token_limit,
tiktoken_model_name=self.tiktoken_model_name,
),
InfoPowerBITool(powerbi=self.powerbi),
ListPowerBITool(powerbi=self.powerbi),
]
def _get_chain(self) -> LLMChain:
"""Construct the chain based on the callback manager and model type."""
from langchain.chains.llm import LLMChain
if isinstance(self.llm, BaseLanguageModel):
return LLMChain(
llm=self.llm,
callback_manager=self.callback_manager
if self.callback_manager
else None,
prompt=PromptTemplate(
template=SINGLE_QUESTION_TO_QUERY,
input_variables=["tool_input", "tables", "schemas", "examples"],
),
)
system_prompt = SystemMessagePromptTemplate(
prompt=PromptTemplate(
template=QUESTION_TO_QUERY_BASE,
input_variables=["tables", "schemas", "examples"],
)
)
human_prompt = HumanMessagePromptTemplate(
prompt=PromptTemplate(
template=USER_INPUT,
input_variables=["tool_input"],
)
)
return LLMChain(
llm=self.llm,
callback_manager=self.callback_manager if self.callback_manager else None,
prompt=ChatPromptTemplate.from_messages([system_prompt, human_prompt]),
)

View File

@@ -0,0 +1,64 @@
"""Spark SQL agent."""
from __future__ import annotations
from typing import Any, Dict, List, Optional, TYPE_CHECKING
from langchain_core.callbacks import BaseCallbackManager, Callbacks
from langchain_core.language_models import BaseLanguageModel
from langchain_community.agent_toolkits.spark_sql.prompt import SQL_PREFIX, SQL_SUFFIX
from langchain_community.agent_toolkits.spark_sql.toolkit import SparkSQLToolkit
if TYPE_CHECKING:
from langchain.agents.agent import AgentExecutor
def create_spark_sql_agent(
llm: BaseLanguageModel,
toolkit: SparkSQLToolkit,
callback_manager: Optional[BaseCallbackManager] = None,
callbacks: Callbacks = None,
prefix: str = SQL_PREFIX,
suffix: str = SQL_SUFFIX,
format_instructions: Optional[str] = None,
input_variables: Optional[List[str]] = None,
top_k: int = 10,
max_iterations: Optional[int] = 15,
max_execution_time: Optional[float] = None,
early_stopping_method: str = "force",
verbose: bool = False,
agent_executor_kwargs: Optional[Dict[str, Any]] = None,
**kwargs: Any,
) -> AgentExecutor:
"""Construct a Spark SQL agent from an LLM and tools."""
from langchain.agents.agent import AgentExecutor
from langchain.agents.mrkl.base import ZeroShotAgent
from langchain.chains.llm import LLMChain
tools = toolkit.get_tools()
prefix = prefix.format(top_k=top_k)
prompt_params = {"format_instructions": format_instructions} if format_instructions is not None else {}
prompt = ZeroShotAgent.create_prompt(
tools,
prefix=prefix,
suffix=suffix,
input_variables=input_variables,
**prompt_params,
)
llm_chain = LLMChain(
llm=llm,
prompt=prompt,
callback_manager=callback_manager,
callbacks=callbacks,
)
tool_names = [tool.name for tool in tools]
agent = ZeroShotAgent(llm_chain=llm_chain, allowed_tools=tool_names, **kwargs)
return AgentExecutor.from_agent_and_tools(
agent=agent,
tools=tools,
callback_manager=callback_manager,
callbacks=callbacks,
verbose=verbose,
max_iterations=max_iterations,
max_execution_time=max_execution_time,
early_stopping_method=early_stopping_method,
**(agent_executor_kwargs or {}),
)

View File

@@ -0,0 +1,102 @@
"""SQL agent."""
from __future__ import annotations
from typing import Any, Dict, List, Optional, Sequence, TYPE_CHECKING
from langchain_core.callbacks import BaseCallbackManager
from langchain_core.language_models import BaseLanguageModel
from langchain_core.messages import AIMessage, SystemMessage
from langchain_core.prompts.chat import (
ChatPromptTemplate,
HumanMessagePromptTemplate,
MessagesPlaceholder,
)
from langchain_community.agent_toolkits.sql.prompt import (
SQL_FUNCTIONS_SUFFIX,
SQL_PREFIX,
SQL_SUFFIX,
)
from langchain_community.agent_toolkits.sql.toolkit import SQLDatabaseToolkit
from langchain_community.tools import BaseTool
if TYPE_CHECKING:
from langchain.agents.agent import AgentExecutor
from langchain.agents.agent_types import AgentType
def create_sql_agent(
llm: BaseLanguageModel,
toolkit: SQLDatabaseToolkit,
agent_type: Optional[AgentType] = None,
callback_manager: Optional[BaseCallbackManager] = None,
prefix: str = SQL_PREFIX,
suffix: Optional[str] = None,
format_instructions: Optional[str] = None,
input_variables: Optional[List[str]] = None,
top_k: int = 10,
max_iterations: Optional[int] = 15,
max_execution_time: Optional[float] = None,
early_stopping_method: str = "force",
verbose: bool = False,
agent_executor_kwargs: Optional[Dict[str, Any]] = None,
extra_tools: Sequence[BaseTool] = (),
**kwargs: Any,
) -> AgentExecutor:
"""Construct an SQL agent from an LLM and tools."""
from langchain.agents.agent import AgentExecutor, BaseSingleActionAgent
from langchain.agents.agent_types import AgentType
from langchain.agents.mrkl.base import ZeroShotAgent
from langchain.agents.openai_functions_agent.base import OpenAIFunctionsAgent
from langchain.chains.llm import LLMChain
agent_type = agent_type or AgentType.ZERO_SHOT_REACT_DESCRIPTION
tools = toolkit.get_tools() + list(extra_tools)
prefix = prefix.format(dialect=toolkit.dialect, top_k=top_k)
agent: BaseSingleActionAgent
if agent_type == AgentType.ZERO_SHOT_REACT_DESCRIPTION:
prompt_params = {"format_instructions": format_instructions} if format_instructions is not None else {}
prompt = ZeroShotAgent.create_prompt(
tools,
prefix=prefix,
suffix=suffix or SQL_SUFFIX,
input_variables=input_variables,
**prompt_params,
)
llm_chain = LLMChain(
llm=llm,
prompt=prompt,
callback_manager=callback_manager,
)
tool_names = [tool.name for tool in tools]
agent = ZeroShotAgent(llm_chain=llm_chain, allowed_tools=tool_names, **kwargs)
elif agent_type == AgentType.OPENAI_FUNCTIONS:
messages = [
SystemMessage(content=prefix),
HumanMessagePromptTemplate.from_template("{input}"),
AIMessage(content=suffix or SQL_FUNCTIONS_SUFFIX),
MessagesPlaceholder(variable_name="agent_scratchpad"),
]
input_variables = ["input", "agent_scratchpad"]
_prompt = ChatPromptTemplate(input_variables=input_variables, messages=messages)
agent = OpenAIFunctionsAgent(
llm=llm,
prompt=_prompt,
tools=tools,
callback_manager=callback_manager,
**kwargs,
)
else:
raise ValueError(f"Agent type {agent_type} not supported at the moment.")
return AgentExecutor.from_agent_and_tools(
agent=agent,
tools=tools,
callback_manager=callback_manager,
verbose=verbose,
max_iterations=max_iterations,
max_execution_time=max_execution_time,
early_stopping_method=early_stopping_method,
**(agent_executor_kwargs or {}),
)

View File

@@ -0,0 +1,66 @@
"""**Callback handlers** allow listening to events in LangChain.
**Class hierarchy:**
.. code-block::
BaseCallbackHandler --> <name>CallbackHandler # Example: AimCallbackHandler
"""
from langchain_community.callbacks.aim_callback import AimCallbackHandler
from langchain_community.callbacks.argilla_callback import ArgillaCallbackHandler
from langchain_community.callbacks.arize_callback import ArizeCallbackHandler
from langchain_community.callbacks.arthur_callback import ArthurCallbackHandler
from langchain_community.callbacks.clearml_callback import ClearMLCallbackHandler
from langchain_community.callbacks.comet_ml_callback import CometCallbackHandler
from langchain_community.callbacks.context_callback import ContextCallbackHandler
from langchain_community.callbacks.flyte_callback import FlyteCallbackHandler
from langchain_community.callbacks.human import HumanApprovalCallbackHandler
from langchain_community.callbacks.infino_callback import InfinoCallbackHandler
from langchain_community.callbacks.labelstudio_callback import (
LabelStudioCallbackHandler,
)
from langchain_community.callbacks.llmonitor_callback import LLMonitorCallbackHandler
from langchain_community.callbacks.manager import (
get_openai_callback,
wandb_tracing_enabled,
)
from langchain_community.callbacks.mlflow_callback import MlflowCallbackHandler
from langchain_community.callbacks.openai_info import OpenAICallbackHandler
from langchain_community.callbacks.promptlayer_callback import (
PromptLayerCallbackHandler,
)
from langchain_community.callbacks.sagemaker_callback import SageMakerCallbackHandler
from langchain_community.callbacks.streamlit import (
LLMThoughtLabeler,
StreamlitCallbackHandler,
)
from langchain_community.callbacks.trubrics_callback import TrubricsCallbackHandler
from langchain_community.callbacks.wandb_callback import WandbCallbackHandler
from langchain_community.callbacks.whylabs_callback import WhyLabsCallbackHandler
__all__ = [
"AimCallbackHandler",
"ArgillaCallbackHandler",
"ArizeCallbackHandler",
"PromptLayerCallbackHandler",
"ArthurCallbackHandler",
"ClearMLCallbackHandler",
"CometCallbackHandler",
"ContextCallbackHandler",
"HumanApprovalCallbackHandler",
"InfinoCallbackHandler",
"MlflowCallbackHandler",
"LLMonitorCallbackHandler",
"OpenAICallbackHandler",
"LLMThoughtLabeler",
"StreamlitCallbackHandler",
"WandbCallbackHandler",
"WhyLabsCallbackHandler",
"get_openai_callback",
"wandb_tracing_enabled",
"FlyteCallbackHandler",
"SageMakerCallbackHandler",
"LabelStudioCallbackHandler",
"TrubricsCallbackHandler",
]

View File

@@ -0,0 +1,69 @@
from __future__ import annotations
import logging
from contextlib import contextmanager
from contextvars import ContextVar
from typing import (
Generator,
Optional,
)
from langchain_core.tracers.context import register_configure_hook
from langchain_community.callbacks.openai_info import OpenAICallbackHandler
from langchain_community.callbacks.tracers.wandb import WandbTracer
logger = logging.getLogger(__name__)
openai_callback_var: ContextVar[Optional[OpenAICallbackHandler]] = ContextVar(
"openai_callback", default=None
)
wandb_tracing_callback_var: ContextVar[Optional[WandbTracer]] = ContextVar( # noqa: E501
"tracing_wandb_callback", default=None
)
register_configure_hook(openai_callback_var, True)
register_configure_hook(
wandb_tracing_callback_var, True, WandbTracer, "LANGCHAIN_WANDB_TRACING"
)
@contextmanager
def get_openai_callback() -> Generator[OpenAICallbackHandler, None, None]:
"""Get the OpenAI callback handler in a context manager.
which conveniently exposes token and cost information.
Returns:
OpenAICallbackHandler: The OpenAI callback handler.
Example:
>>> with get_openai_callback() as cb:
... # Use the OpenAI callback handler
"""
cb = OpenAICallbackHandler()
openai_callback_var.set(cb)
yield cb
openai_callback_var.set(None)
@contextmanager
def wandb_tracing_enabled(
session_name: str = "default",
) -> Generator[None, None, None]:
"""Get the WandbTracer in a context manager.
Args:
session_name (str, optional): The name of the session.
Defaults to "default".
Returns:
None
Example:
>>> with wandb_tracing_enabled() as session:
... # Use the WandbTracer session
"""
cb = WandbTracer()
wandb_tracing_callback_var.set(cb)
yield None
wandb_tracing_callback_var.set(None)

View File

@@ -0,0 +1,18 @@
"""Tracers that record execution of LangChain runs."""
from langchain_core.tracers.langchain import LangChainTracer
from langchain_core.tracers.langchain_v1 import LangChainTracerV1
from langchain_core.tracers.stdout import (
ConsoleCallbackHandler,
FunctionCallbackHandler,
)
from langchain_community.callbacks.tracers.wandb import WandbTracer
__all__ = [
"ConsoleCallbackHandler",
"FunctionCallbackHandler",
"LangChainTracer",
"LangChainTracerV1",
"WandbTracer",
]

View File

@@ -0,0 +1,101 @@
"""Abstract interface for document loader implementations."""
from __future__ import annotations
from abc import ABC, abstractmethod
from typing import Iterator, List, Optional, TYPE_CHECKING
from langchain_core.documents import Document
from langchain_community.document_loaders.blob_loaders import Blob
if TYPE_CHECKING:
from langchain.text_splitter import TextSplitter
class BaseLoader(ABC):
"""Interface for Document Loader.
Implementations should implement the lazy-loading method using generators
to avoid loading all Documents into memory at once.
The `load` method will remain as is for backwards compatibility, but its
implementation should be just `list(self.lazy_load())`.
"""
# Sub-classes should implement this method
# as return list(self.lazy_load()).
# This method returns a List which is materialized in memory.
@abstractmethod
def load(self) -> List[Document]:
"""Load data into Document objects."""
def load_and_split(
self, text_splitter: Optional[TextSplitter] = None
) -> List[Document]:
"""Load Documents and split into chunks. Chunks are returned as Documents.
Args:
text_splitter: TextSplitter instance to use for splitting documents.
Defaults to RecursiveCharacterTextSplitter.
Returns:
List of Documents.
"""
from langchain.text_splitter import RecursiveCharacterTextSplitter
if text_splitter is None:
_text_splitter: TextSplitter = RecursiveCharacterTextSplitter()
else:
_text_splitter = text_splitter
docs = self.load()
return _text_splitter.split_documents(docs)
# Attention: This method will be upgraded into an abstractmethod once it's
# implemented in all the existing subclasses.
def lazy_load(
self,
) -> Iterator[Document]:
"""A lazy loader for Documents."""
raise NotImplementedError(
f"{self.__class__.__name__} does not implement lazy_load()"
)
class BaseBlobParser(ABC):
"""Abstract interface for blob parsers.
A blob parser provides a way to parse raw data stored in a blob into one
or more documents.
The parser can be composed with blob loaders, making it easy to reuse
a parser independent of how the blob was originally loaded.
"""
@abstractmethod
def lazy_parse(self, blob: Blob) -> Iterator[Document]:
"""Lazy parsing interface.
Subclasses are required to implement this method.
Args:
blob: Blob instance
Returns:
Generator of documents
"""
def parse(self, blob: Blob) -> List[Document]:
"""Eagerly parse the blob into a document or documents.
This is a convenience method for interactive development environment.
Production applications should favor the lazy_parse method instead.
Subclasses should generally not over-ride this parse method.
Args:
blob: Blob instance
Returns:
List of documents
"""
return list(self.lazy_parse(blob))

View File

@@ -0,0 +1,147 @@
"""Use to load blobs from the local file system."""
from pathlib import Path
from typing import Callable, Iterable, Iterator, Optional, Sequence, TypeVar, Union
from langchain_community.document_loaders.blob_loaders.schema import Blob, BlobLoader
T = TypeVar("T")
def _make_iterator(
length_func: Callable[[], int], show_progress: bool = False
) -> Callable[[Iterable[T]], Iterator[T]]:
"""Create a function that optionally wraps an iterable in tqdm."""
if show_progress:
try:
from tqdm.auto import tqdm
except ImportError:
raise ImportError(
"You must install tqdm to use show_progress=True."
"You can install tqdm with `pip install tqdm`."
)
# Make sure to provide `total` here so that tqdm can show
# a progress bar that takes into account the total number of files.
def _with_tqdm(iterable: Iterable[T]) -> Iterator[T]:
"""Wrap an iterable in a tqdm progress bar."""
return tqdm(iterable, total=length_func())
iterator = _with_tqdm
else:
iterator = iter # type: ignore
return iterator
# PUBLIC API
class FileSystemBlobLoader(BlobLoader):
"""Load blobs in the local file system.
Example:
.. code-block:: python
from langchain_community.document_loaders.blob_loaders import FileSystemBlobLoader
loader = FileSystemBlobLoader("/path/to/directory")
for blob in loader.yield_blobs():
print(blob)
""" # noqa: E501
def __init__(
self,
path: Union[str, Path],
*,
glob: str = "**/[!.]*",
exclude: Sequence[str] = (),
suffixes: Optional[Sequence[str]] = None,
show_progress: bool = False,
) -> None:
"""Initialize with a path to directory and how to glob over it.
Args:
path: Path to directory to load from or path to file to load.
If a path to a file is provided, glob/exclude/suffixes are ignored.
glob: Glob pattern relative to the specified path
by default set to pick up all non-hidden files
exclude: patterns to exclude from results, use glob syntax
suffixes: Provide to keep only files with these suffixes
Useful when wanting to keep files with different suffixes
Suffixes must include the dot, e.g. ".txt"
show_progress: If true, will show a progress bar as the files are loaded.
This forces an iteration through all matching files
to count them prior to loading them.
Examples:
.. code-block:: python
from langchain_community.document_loaders.blob_loaders import FileSystemBlobLoader
# Load a single file.
loader = FileSystemBlobLoader("/path/to/file.txt")
# Recursively load all text files in a directory.
loader = FileSystemBlobLoader("/path/to/directory", glob="**/*.txt")
# Recursively load all non-hidden files in a directory.
loader = FileSystemBlobLoader("/path/to/directory", glob="**/[!.]*")
# Load all files in a directory without recursion.
loader = FileSystemBlobLoader("/path/to/directory", glob="*")
# Recursively load all files in a directory, except for py or pyc files.
loader = FileSystemBlobLoader(
"/path/to/directory",
glob="**/*.txt",
exclude=["**/*.py", "**/*.pyc"]
)
""" # noqa: E501
if isinstance(path, Path):
_path = path
elif isinstance(path, str):
_path = Path(path)
else:
raise TypeError(f"Expected str or Path, got {type(path)}")
self.path = _path.expanduser() # Expand user to handle ~
self.glob = glob
self.suffixes = set(suffixes or [])
self.show_progress = show_progress
self.exclude = exclude
def yield_blobs(
self,
) -> Iterable[Blob]:
"""Yield blobs that match the requested pattern."""
iterator = _make_iterator(
length_func=self.count_matching_files, show_progress=self.show_progress
)
for path in iterator(self._yield_paths()):
yield Blob.from_path(path)
def _yield_paths(self) -> Iterable[Path]:
"""Yield paths that match the requested pattern."""
if self.path.is_file():
yield self.path
return
paths = self.path.glob(self.glob)
for path in paths:
if self.exclude:
if any(path.match(glob) for glob in self.exclude):
continue
if path.is_file():
if self.suffixes and path.suffix not in self.suffixes:
continue
yield path
def count_matching_files(self) -> int:
"""Count files that match the pattern without loading them."""
# Carry out a full iteration to count the files without
# materializing anything expensive in memory.
num = 0
for _ in self._yield_paths():
num += 1
return num

View File

@@ -0,0 +1,190 @@
from __future__ import annotations
from pathlib import Path
from typing import (
TYPE_CHECKING,
Any,
Iterator,
List,
Literal,
Optional,
Sequence,
Union,
)
from langchain_core.documents import Document
from langchain_community.document_loaders.base import BaseBlobParser, BaseLoader
from langchain_community.document_loaders.blob_loaders import (
BlobLoader,
FileSystemBlobLoader,
)
from langchain_community.document_loaders.parsers.registry import get_parser
if TYPE_CHECKING:
from langchain.text_splitter import TextSplitter
_PathLike = Union[str, Path]
DEFAULT = Literal["default"]
class GenericLoader(BaseLoader):
"""Generic Document Loader.
A generic document loader that allows combining an arbitrary blob loader with
a blob parser.
Examples:
Parse a specific PDF file:
.. code-block:: python
from langchain_community.document_loaders import GenericLoader
from langchain_community.document_loaders.parsers.pdf import PyPDFParser
# Recursively load all text files in a directory.
loader = GenericLoader.from_filesystem(
"my_lovely_pdf.pdf",
parser=PyPDFParser()
)
.. code-block:: python
from langchain_community.document_loaders import GenericLoader
from langchain_community.document_loaders.blob_loaders import FileSystemBlobLoader
loader = GenericLoader.from_filesystem(
path="path/to/directory",
glob="**/[!.]*",
suffixes=[".pdf"],
show_progress=True,
)
docs = loader.lazy_load()
next(docs)
Example instantiations to change which files are loaded:
.. code-block:: python
# Recursively load all text files in a directory.
loader = GenericLoader.from_filesystem("/path/to/dir", glob="**/*.txt")
# Recursively load all non-hidden files in a directory.
loader = GenericLoader.from_filesystem("/path/to/dir", glob="**/[!.]*")
# Load all files in a directory without recursion.
loader = GenericLoader.from_filesystem("/path/to/dir", glob="*")
Example instantiations to change which parser is used:
.. code-block:: python
from langchain_community.document_loaders.parsers.pdf import PyPDFParser
# Recursively load all text files in a directory.
loader = GenericLoader.from_filesystem(
"/path/to/dir",
glob="**/*.pdf",
parser=PyPDFParser()
)
""" # noqa: E501
def __init__(
self,
blob_loader: BlobLoader,
blob_parser: BaseBlobParser,
) -> None:
"""A generic document loader.
Args:
blob_loader: A blob loader which knows how to yield blobs
blob_parser: A blob parser which knows how to parse blobs into documents
"""
self.blob_loader = blob_loader
self.blob_parser = blob_parser
def lazy_load(
self,
) -> Iterator[Document]:
"""Load documents lazily. Use this when working at a large scale."""
for blob in self.blob_loader.yield_blobs():
yield from self.blob_parser.lazy_parse(blob)
def load(self) -> List[Document]:
"""Load all documents."""
return list(self.lazy_load())
def load_and_split(
self, text_splitter: Optional[TextSplitter] = None
) -> List[Document]:
"""Load all documents and split them into sentences."""
raise NotImplementedError(
"Loading and splitting is not yet implemented for generic loaders. "
"When they will be implemented they will be added via the initializer. "
"This method should not be used going forward."
)
@classmethod
def from_filesystem(
cls,
path: _PathLike,
*,
glob: str = "**/[!.]*",
exclude: Sequence[str] = (),
suffixes: Optional[Sequence[str]] = None,
show_progress: bool = False,
parser: Union[DEFAULT, BaseBlobParser] = "default",
parser_kwargs: Optional[dict] = None,
) -> GenericLoader:
"""Create a generic document loader using a filesystem blob loader.
Args:
path: The path to the directory to load documents from OR the path to a
single file to load. If this is a file, glob, exclude, suffixes
will be ignored.
glob: The glob pattern to use to find documents.
suffixes: The suffixes to use to filter documents. If None, all files
matching the glob will be loaded.
exclude: A list of patterns to exclude from the loader.
show_progress: Whether to show a progress bar or not (requires tqdm).
Proxies to the file system loader.
parser: A blob parser which knows how to parse blobs into documents,
will instantiate a default parser if not provided.
The default can be overridden by either passing a parser or
setting the class attribute `blob_parser` (the latter
should be used with inheritance).
parser_kwargs: Keyword arguments to pass to the parser.
Returns:
A generic document loader.
"""
blob_loader = FileSystemBlobLoader(
path,
glob=glob,
exclude=exclude,
suffixes=suffixes,
show_progress=show_progress,
)
if isinstance(parser, str):
if parser == "default":
try:
# If there is an implementation of get_parser on the class, use it.
blob_parser = cls.get_parser(**(parser_kwargs or {}))
except NotImplementedError:
# if not then use the global registry.
blob_parser = get_parser(parser)
else:
blob_parser = get_parser(parser)
else:
blob_parser = parser
return cls(blob_loader, blob_parser)
@staticmethod
def get_parser(**kwargs: Any) -> BaseBlobParser:
"""Override this method to associate a default parser with the class."""
raise NotImplementedError()

View File

@@ -0,0 +1,70 @@
"""Code for generic / auxiliary parsers.
This module contains some logic to help assemble more sophisticated parsers.
"""
from typing import Iterator, Mapping, Optional
from langchain_core.documents import Document
from langchain_community.document_loaders.base import BaseBlobParser
from langchain_community.document_loaders.blob_loaders.schema import Blob
class MimeTypeBasedParser(BaseBlobParser):
"""Parser that uses `mime`-types to parse a blob.
This parser is useful for simple pipelines where the mime-type is sufficient
to determine how to parse a blob.
To use, configure handlers based on mime-types and pass them to the initializer.
Example:
.. code-block:: python
from langchain_community.document_loaders.parsers.generic import MimeTypeBasedParser
parser = MimeTypeBasedParser(
handlers={
"application/pdf": ...,
},
fallback_parser=...,
)
""" # noqa: E501
def __init__(
self,
handlers: Mapping[str, BaseBlobParser],
*,
fallback_parser: Optional[BaseBlobParser] = None,
) -> None:
"""Define a parser that uses mime-types to determine how to parse a blob.
Args:
handlers: A mapping from mime-types to functions that take a blob, parse it
and return a document.
fallback_parser: A fallback_parser parser to use if the mime-type is not
found in the handlers. If provided, this parser will be
used to parse blobs with all mime-types not found in
the handlers.
If not provided, a ValueError will be raised if the
mime-type is not found in the handlers.
"""
self.handlers = handlers
self.fallback_parser = fallback_parser
def lazy_parse(self, blob: Blob) -> Iterator[Document]:
"""Load documents from a blob."""
mimetype = blob.mimetype
if mimetype is None:
raise ValueError(f"{blob} does not have a mimetype.")
if mimetype in self.handlers:
handler = self.handlers[mimetype]
yield from handler.lazy_parse(blob)
else:
if self.fallback_parser is not None:
yield from self.fallback_parser.lazy_parse(blob)
else:
raise ValueError(f"Unsupported mime type: {mimetype}")

View File

@@ -0,0 +1,157 @@
from __future__ import annotations
from typing import Any, Dict, Iterator, Optional, TYPE_CHECKING
from langchain_core.documents import Document
from langchain_community.document_loaders.base import BaseBlobParser
from langchain_community.document_loaders.blob_loaders import Blob
from langchain_community.document_loaders.parsers.language.cobol import CobolSegmenter
from langchain_community.document_loaders.parsers.language.javascript import (
JavaScriptSegmenter,
)
from langchain_community.document_loaders.parsers.language.python import PythonSegmenter
if TYPE_CHECKING:
from langchain.text_splitter import Language
try:
from langchain.text_splitter import Language
LANGUAGE_EXTENSIONS: Dict[str, str] = {
"py": Language.PYTHON,
"js": Language.JS,
"cobol": Language.COBOL,
}
LANGUAGE_SEGMENTERS: Dict[str, Any] = {
Language.PYTHON: PythonSegmenter,
Language.JS: JavaScriptSegmenter,
Language.COBOL: CobolSegmenter,
}
except ImportError:
LANGUAGE_EXTENSIONS = {}
LANGUAGE_SEGMENTERS = {}
class LanguageParser(BaseBlobParser):
"""Parse using the respective programming language syntax.
Each top-level function and class in the code is loaded into separate documents.
Furthermore, an extra document is generated, containing the remaining top-level code
that excludes the already segmented functions and classes.
This approach can potentially improve the accuracy of QA models over source code.
Currently, the supported languages for code parsing are Python and JavaScript.
The language used for parsing can be configured, along with the minimum number of
lines required to activate the splitting based on syntax.
Examples:
.. code-block:: python
from langchain.text_splitter.Language
from langchain_community.document_loaders.generic import GenericLoader
from langchain_community.document_loaders.parsers import LanguageParser
loader = GenericLoader.from_filesystem(
"./code",
glob="**/*",
suffixes=[".py", ".js"],
parser=LanguageParser()
)
docs = loader.load()
Example instantiations to manually select the language:
.. code-block:: python
from langchain.text_splitter import Language
loader = GenericLoader.from_filesystem(
"./code",
glob="**/*",
suffixes=[".py"],
parser=LanguageParser(language=Language.PYTHON)
)
Example instantiations to set number of lines threshold:
.. code-block:: python
loader = GenericLoader.from_filesystem(
"./code",
glob="**/*",
suffixes=[".py"],
parser=LanguageParser(parser_threshold=200)
)
"""
def __init__(self, language: Optional[Language] = None, parser_threshold: int = 0):
"""
Language parser that split code using the respective language syntax.
Args:
language: If None (default), it will try to infer language from source.
parser_threshold: Minimum lines needed to activate parsing (0 by default).
"""
self.language = language
self.parser_threshold = parser_threshold
def lazy_parse(self, blob: Blob) -> Iterator[Document]:
code = blob.as_string()
language = self.language or (
LANGUAGE_EXTENSIONS.get(blob.source.rsplit(".", 1)[-1])
if isinstance(blob.source, str)
else None
)
if language is None:
yield Document(
page_content=code,
metadata={
"source": blob.source,
},
)
return
if self.parser_threshold >= len(code.splitlines()):
yield Document(
page_content=code,
metadata={
"source": blob.source,
"language": language,
},
)
return
self.Segmenter = LANGUAGE_SEGMENTERS[language]
segmenter = self.Segmenter(blob.as_string())
if not segmenter.is_valid():
yield Document(
page_content=code,
metadata={
"source": blob.source,
},
)
return
for functions_classes in segmenter.extract_functions_classes():
yield Document(
page_content=functions_classes,
metadata={
"source": blob.source,
"content_type": "functions_classes",
"language": language,
},
)
yield Document(
page_content=segmenter.simplify_code(),
metadata={
"source": blob.source,
"content_type": "simplified_code",
"language": language,
},
)

View File

@@ -0,0 +1,262 @@
from __future__ import annotations
import asyncio
import json
from pathlib import Path
from typing import TYPE_CHECKING, Dict, List, Optional, Union
from langchain_core.documents import Document
from langchain_community.document_loaders.base import BaseLoader
if TYPE_CHECKING:
import pandas as pd
from telethon.hints import EntityLike
def concatenate_rows(row: dict) -> str:
"""Combine message information in a readable format ready to be used."""
date = row["date"]
sender = row["from"]
text = row["text"]
return f"{sender} on {date}: {text}\n\n"
class TelegramChatFileLoader(BaseLoader):
"""Load from `Telegram chat` dump."""
def __init__(self, path: str):
"""Initialize with a path."""
self.file_path = path
def load(self) -> List[Document]:
"""Load documents."""
p = Path(self.file_path)
with open(p, encoding="utf8") as f:
d = json.load(f)
text = "".join(
concatenate_rows(message)
for message in d["messages"]
if message["type"] == "message" and isinstance(message["text"], str)
)
metadata = {"source": str(p)}
return [Document(page_content=text, metadata=metadata)]
def text_to_docs(text: Union[str, List[str]]) -> List[Document]:
"""Convert a string or list of strings to a list of Documents with metadata."""
from langchain.text_splitter import RecursiveCharacterTextSplitter
if isinstance(text, str):
# Take a single string as one page
text = [text]
page_docs = [Document(page_content=page) for page in text]
# Add page numbers as metadata
for i, doc in enumerate(page_docs):
doc.metadata["page"] = i + 1
# Split pages into chunks
doc_chunks = []
for doc in page_docs:
text_splitter = RecursiveCharacterTextSplitter(
chunk_size=800,
separators=["\n\n", "\n", ".", "!", "?", ",", " ", ""],
chunk_overlap=20,
)
chunks = text_splitter.split_text(doc.page_content)
for i, chunk in enumerate(chunks):
doc = Document(
page_content=chunk, metadata={"page": doc.metadata["page"], "chunk": i}
)
# Add sources a metadata
doc.metadata["source"] = f"{doc.metadata['page']}-{doc.metadata['chunk']}"
doc_chunks.append(doc)
return doc_chunks
class TelegramChatApiLoader(BaseLoader):
"""Load `Telegram` chat json directory dump."""
def __init__(
self,
chat_entity: Optional[EntityLike] = None,
api_id: Optional[int] = None,
api_hash: Optional[str] = None,
username: Optional[str] = None,
file_path: str = "telegram_data.json",
):
"""Initialize with API parameters.
Args:
chat_entity: The chat entity to fetch data from.
api_id: The API ID.
api_hash: The API hash.
username: The username.
file_path: The file path to save the data to. Defaults to
"telegram_data.json".
"""
self.chat_entity = chat_entity
self.api_id = api_id
self.api_hash = api_hash
self.username = username
self.file_path = file_path
async def fetch_data_from_telegram(self) -> None:
"""Fetch data from Telegram API and save it as a JSON file."""
from telethon.sync import TelegramClient
data = []
async with TelegramClient(self.username, self.api_id, self.api_hash) as client:
async for message in client.iter_messages(self.chat_entity):
is_reply = message.reply_to is not None
reply_to_id = message.reply_to.reply_to_msg_id if is_reply else None
data.append(
{
"sender_id": message.sender_id,
"text": message.text,
"date": message.date.isoformat(),
"message.id": message.id,
"is_reply": is_reply,
"reply_to_id": reply_to_id,
}
)
with open(self.file_path, "w", encoding="utf-8") as f:
json.dump(data, f, ensure_ascii=False, indent=4)
def _get_message_threads(self, data: pd.DataFrame) -> dict:
"""Create a dictionary of message threads from the given data.
Args:
data (pd.DataFrame): A DataFrame containing the conversation \
data with columns:
- message.sender_id
- text
- date
- message.id
- is_reply
- reply_to_id
Returns:
dict: A dictionary where the key is the parent message ID and \
the value is a list of message IDs in ascending order.
"""
def find_replies(parent_id: int, reply_data: pd.DataFrame) -> List[int]:
"""
Recursively find all replies to a given parent message ID.
Args:
parent_id (int): The parent message ID.
reply_data (pd.DataFrame): A DataFrame containing reply messages.
Returns:
list: A list of message IDs that are replies to the parent message ID.
"""
# Find direct replies to the parent message ID
direct_replies = reply_data[reply_data["reply_to_id"] == parent_id][
"message.id"
].tolist()
# Recursively find replies to the direct replies
all_replies = []
for reply_id in direct_replies:
all_replies += [reply_id] + find_replies(reply_id, reply_data)
return all_replies
# Filter out parent messages
parent_messages = data[~data["is_reply"]]
# Filter out reply messages and drop rows with NaN in 'reply_to_id'
reply_messages = data[data["is_reply"]].dropna(subset=["reply_to_id"])
# Convert 'reply_to_id' to integer
reply_messages["reply_to_id"] = reply_messages["reply_to_id"].astype(int)
# Create a dictionary of message threads with parent message IDs as keys and \
# lists of reply message IDs as values
message_threads = {
parent_id: [parent_id] + find_replies(parent_id, reply_messages)
for parent_id in parent_messages["message.id"]
}
return message_threads
def _combine_message_texts(
self, message_threads: Dict[int, List[int]], data: pd.DataFrame
) -> str:
"""
Combine the message texts for each parent message ID based \
on the list of message threads.
Args:
message_threads (dict): A dictionary where the key is the parent message \
ID and the value is a list of message IDs in ascending order.
data (pd.DataFrame): A DataFrame containing the conversation data:
- message.sender_id
- text
- date
- message.id
- is_reply
- reply_to_id
Returns:
str: A combined string of message texts sorted by date.
"""
combined_text = ""
# Iterate through sorted parent message IDs
for parent_id, message_ids in message_threads.items():
# Get the message texts for the message IDs and sort them by date
message_texts = (
data[data["message.id"].isin(message_ids)]
.sort_values(by="date")["text"]
.tolist()
)
message_texts = [str(elem) for elem in message_texts]
# Combine the message texts
combined_text += " ".join(message_texts) + ".\n"
return combined_text.strip()
def load(self) -> List[Document]:
"""Load documents."""
if self.chat_entity is not None:
try:
import nest_asyncio
nest_asyncio.apply()
asyncio.run(self.fetch_data_from_telegram())
except ImportError:
raise ImportError(
"""`nest_asyncio` package not found.
please install with `pip install nest_asyncio`
"""
)
p = Path(self.file_path)
with open(p, encoding="utf8") as f:
d = json.load(f)
try:
import pandas as pd
except ImportError:
raise ImportError(
"""`pandas` package not found.
please install with `pip install pandas`
"""
)
normalized_messages = pd.json_normalize(d)
df = pd.DataFrame(normalized_messages)
message_threads = self._get_message_threads(df)
combined_texts = self._combine_message_texts(message_threads, df)
return text_to_docs(combined_texts)

View File

@@ -0,0 +1,149 @@
from typing import Any, Iterator, List, Sequence, cast
from langchain_core.documents import BaseDocumentTransformer, Document
class BeautifulSoupTransformer(BaseDocumentTransformer):
"""Transform HTML content by extracting specific tags and removing unwanted ones.
Example:
.. code-block:: python
from langchain_community.document_transformers import BeautifulSoupTransformer
bs4_transformer = BeautifulSoupTransformer()
docs_transformed = bs4_transformer.transform_documents(docs)
""" # noqa: E501
def __init__(self) -> None:
"""
Initialize the transformer.
This checks if the BeautifulSoup4 package is installed.
If not, it raises an ImportError.
"""
try:
import bs4 # noqa:F401
except ImportError:
raise ImportError(
"BeautifulSoup4 is required for BeautifulSoupTransformer. "
"Please install it with `pip install beautifulsoup4`."
)
def transform_documents(
self,
documents: Sequence[Document],
unwanted_tags: List[str] = ["script", "style"],
tags_to_extract: List[str] = ["p", "li", "div", "a"],
remove_lines: bool = True,
**kwargs: Any,
) -> Sequence[Document]:
"""
Transform a list of Document objects by cleaning their HTML content.
Args:
documents: A sequence of Document objects containing HTML content.
unwanted_tags: A list of tags to be removed from the HTML.
tags_to_extract: A list of tags whose content will be extracted.
remove_lines: If set to True, unnecessary lines will be
removed from the HTML content.
Returns:
A sequence of Document objects with transformed content.
"""
for doc in documents:
cleaned_content = doc.page_content
cleaned_content = self.remove_unwanted_tags(cleaned_content, unwanted_tags)
cleaned_content = self.extract_tags(cleaned_content, tags_to_extract)
if remove_lines:
cleaned_content = self.remove_unnecessary_lines(cleaned_content)
doc.page_content = cleaned_content
return documents
@staticmethod
def remove_unwanted_tags(html_content: str, unwanted_tags: List[str]) -> str:
"""
Remove unwanted tags from a given HTML content.
Args:
html_content: The original HTML content string.
unwanted_tags: A list of tags to be removed from the HTML.
Returns:
A cleaned HTML string with unwanted tags removed.
"""
from bs4 import BeautifulSoup
soup = BeautifulSoup(html_content, "html.parser")
for tag in unwanted_tags:
for element in soup.find_all(tag):
element.decompose()
return str(soup)
@staticmethod
def extract_tags(html_content: str, tags: List[str]) -> str:
"""
Extract specific tags from a given HTML content.
Args:
html_content: The original HTML content string.
tags: A list of tags to be extracted from the HTML.
Returns:
A string combining the content of the extracted tags.
"""
from bs4 import BeautifulSoup
soup = BeautifulSoup(html_content, "html.parser")
text_parts: List[str] = []
for element in soup.find_all():
if element.name in tags:
# Extract all navigable strings recursively from this element.
text_parts += get_navigable_strings(element)
# To avoid duplicate text, remove all descendants from the soup.
element.decompose()
return " ".join(text_parts)
@staticmethod
def remove_unnecessary_lines(content: str) -> str:
"""
Clean up the content by removing unnecessary lines.
Args:
content: A string, which may contain unnecessary lines or spaces.
Returns:
A cleaned string with unnecessary lines removed.
"""
lines = content.split("\n")
stripped_lines = [line.strip() for line in lines]
non_empty_lines = [line for line in stripped_lines if line]
cleaned_content = " ".join(non_empty_lines)
return cleaned_content
async def atransform_documents(
self,
documents: Sequence[Document],
**kwargs: Any,
) -> Sequence[Document]:
raise NotImplementedError
def get_navigable_strings(element: Any) -> Iterator[str]:
from bs4 import NavigableString, Tag
for child in cast(Tag, element).children:
if isinstance(child, Tag):
yield from get_navigable_strings(child)
elif isinstance(child, NavigableString):
if (element.name == "a") and (href := element.get("href")):
yield f"{child.strip()} ({href})"
else:
yield child.strip()

View File

@@ -0,0 +1,140 @@
"""Document transformers that use OpenAI Functions models"""
from typing import Any, Dict, Optional, Sequence, Type, Union
from langchain_core.documents import BaseDocumentTransformer, Document
from langchain_core.language_models import BaseLanguageModel
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.pydantic_v1 import BaseModel
class OpenAIMetadataTagger(BaseDocumentTransformer, BaseModel):
"""Extract metadata tags from document contents using OpenAI functions.
Example:
.. code-block:: python
from langchain_community.chat_models import ChatOpenAI
from langchain_community.document_transformers import OpenAIMetadataTagger
from langchain_core.documents import Document
schema = {
"properties": {
"movie_title": { "type": "string" },
"critic": { "type": "string" },
"tone": {
"type": "string",
"enum": ["positive", "negative"]
},
"rating": {
"type": "integer",
"description": "The number of stars the critic rated the movie"
}
},
"required": ["movie_title", "critic", "tone"]
}
# Must be an OpenAI model that supports functions
llm = ChatOpenAI(temperature=0, model="gpt-3.5-turbo-0613")
tagging_chain = create_tagging_chain(schema, llm)
document_transformer = OpenAIMetadataTagger(tagging_chain=tagging_chain)
original_documents = [
Document(page_content="Review of The Bee Movie\nBy Roger Ebert\n\nThis is the greatest movie ever made. 4 out of 5 stars."),
Document(page_content="Review of The Godfather\nBy Anonymous\n\nThis movie was super boring. 1 out of 5 stars.", metadata={"reliable": False}),
]
enhanced_documents = document_transformer.transform_documents(original_documents)
""" # noqa: E501
tagging_chain: Any
"""The chain used to extract metadata from each document."""
def transform_documents(
self, documents: Sequence[Document], **kwargs: Any
) -> Sequence[Document]:
"""Automatically extract and populate metadata
for each document according to the provided schema."""
new_documents = []
for document in documents:
extracted_metadata: Dict = self.tagging_chain.run(document.page_content) # type: ignore[assignment] # noqa: E501
new_document = Document(
page_content=document.page_content,
metadata={**extracted_metadata, **document.metadata},
)
new_documents.append(new_document)
return new_documents
async def atransform_documents(
self, documents: Sequence[Document], **kwargs: Any
) -> Sequence[Document]:
raise NotImplementedError
def create_metadata_tagger(
metadata_schema: Union[Dict[str, Any], Type[BaseModel]],
llm: BaseLanguageModel,
prompt: Optional[ChatPromptTemplate] = None,
*,
tagging_chain_kwargs: Optional[Dict] = None,
) -> OpenAIMetadataTagger:
"""Create a DocumentTransformer that uses an OpenAI function chain to automatically
tag documents with metadata based on their content and an input schema.
Args:
metadata_schema: Either a dictionary or pydantic.BaseModel class. If a dictionary
is passed in, it's assumed to already be a valid JsonSchema.
For best results, pydantic.BaseModels should have docstrings describing what
the schema represents and descriptions for the parameters.
llm: Language model to use, assumed to support the OpenAI function-calling API.
Defaults to use "gpt-3.5-turbo-0613"
prompt: BasePromptTemplate to pass to the model.
Returns:
An LLMChain that will pass the given function to the model.
Example:
.. code-block:: python
from langchain_community.chat_models import ChatOpenAI
from langchain_community.document_transformers import create_metadata_tagger
from langchain_core.documents import Document
schema = {
"properties": {
"movie_title": { "type": "string" },
"critic": { "type": "string" },
"tone": {
"type": "string",
"enum": ["positive", "negative"]
},
"rating": {
"type": "integer",
"description": "The number of stars the critic rated the movie"
}
},
"required": ["movie_title", "critic", "tone"]
}
# Must be an OpenAI model that supports functions
llm = ChatOpenAI(temperature=0, model="gpt-3.5-turbo-0613")
document_transformer = create_metadata_tagger(schema, llm)
original_documents = [
Document(page_content="Review of The Bee Movie\nBy Roger Ebert\n\nThis is the greatest movie ever made. 4 out of 5 stars."),
Document(page_content="Review of The Godfather\nBy Anonymous\n\nThis movie was super boring. 1 out of 5 stars.", metadata={"reliable": False}),
]
enhanced_documents = document_transformer.transform_documents(original_documents)
""" # noqa: E501
from langchain.chains.openai_functions import create_tagging_chain
metadata_schema = (
metadata_schema
if isinstance(metadata_schema, dict)
else metadata_schema.schema()
)
_tagging_chain_kwargs = tagging_chain_kwargs or {}
tagging_chain = create_tagging_chain(
metadata_schema, llm, prompt=prompt, **_tagging_chain_kwargs
)
return OpenAIMetadataTagger(tagging_chain=tagging_chain)

View File

@@ -0,0 +1,161 @@
"""**Embedding models** are wrappers around embedding models
from different APIs and services.
**Embedding models** can be LLMs or not.
**Class hierarchy:**
.. code-block::
Embeddings --> <name>Embeddings # Examples: OpenAIEmbeddings, HuggingFaceEmbeddings
"""
import logging
from typing import Any
from langchain_community.embeddings.aleph_alpha import (
AlephAlphaAsymmetricSemanticEmbedding,
AlephAlphaSymmetricSemanticEmbedding,
)
from langchain_community.embeddings.awa import AwaEmbeddings
from langchain_community.embeddings.azure_openai import AzureOpenAIEmbeddings
from langchain_community.embeddings.baidu_qianfan_endpoint import (
QianfanEmbeddingsEndpoint,
)
from langchain_community.embeddings.bedrock import BedrockEmbeddings
from langchain_community.embeddings.bookend import BookendEmbeddings
from langchain_community.embeddings.clarifai import ClarifaiEmbeddings
from langchain_community.embeddings.cohere import CohereEmbeddings
from langchain_community.embeddings.dashscope import DashScopeEmbeddings
from langchain_community.embeddings.databricks import DatabricksEmbeddings
from langchain_community.embeddings.deepinfra import DeepInfraEmbeddings
from langchain_community.embeddings.edenai import EdenAiEmbeddings
from langchain_community.embeddings.elasticsearch import ElasticsearchEmbeddings
from langchain_community.embeddings.embaas import EmbaasEmbeddings
from langchain_community.embeddings.ernie import ErnieEmbeddings
from langchain_community.embeddings.fake import (
DeterministicFakeEmbedding,
FakeEmbeddings,
)
from langchain_community.embeddings.fastembed import FastEmbedEmbeddings
from langchain_community.embeddings.google_palm import GooglePalmEmbeddings
from langchain_community.embeddings.gpt4all import GPT4AllEmbeddings
from langchain_community.embeddings.gradient_ai import GradientEmbeddings
from langchain_community.embeddings.huggingface import (
HuggingFaceBgeEmbeddings,
HuggingFaceEmbeddings,
HuggingFaceInferenceAPIEmbeddings,
HuggingFaceInstructEmbeddings,
)
from langchain_community.embeddings.huggingface_hub import HuggingFaceHubEmbeddings
from langchain_community.embeddings.infinity import InfinityEmbeddings
from langchain_community.embeddings.javelin_ai_gateway import JavelinAIGatewayEmbeddings
from langchain_community.embeddings.jina import JinaEmbeddings
from langchain_community.embeddings.johnsnowlabs import JohnSnowLabsEmbeddings
from langchain_community.embeddings.llamacpp import LlamaCppEmbeddings
from langchain_community.embeddings.localai import LocalAIEmbeddings
from langchain_community.embeddings.minimax import MiniMaxEmbeddings
from langchain_community.embeddings.mlflow import MlflowEmbeddings
from langchain_community.embeddings.mlflow_gateway import MlflowAIGatewayEmbeddings
from langchain_community.embeddings.modelscope_hub import ModelScopeEmbeddings
from langchain_community.embeddings.mosaicml import MosaicMLInstructorEmbeddings
from langchain_community.embeddings.nlpcloud import NLPCloudEmbeddings
from langchain_community.embeddings.octoai_embeddings import OctoAIEmbeddings
from langchain_community.embeddings.ollama import OllamaEmbeddings
from langchain_community.embeddings.openai import OpenAIEmbeddings
from langchain_community.embeddings.sagemaker_endpoint import (
SagemakerEndpointEmbeddings,
)
from langchain_community.embeddings.self_hosted import SelfHostedEmbeddings
from langchain_community.embeddings.self_hosted_hugging_face import (
SelfHostedHuggingFaceEmbeddings,
SelfHostedHuggingFaceInstructEmbeddings,
)
from langchain_community.embeddings.sentence_transformer import (
SentenceTransformerEmbeddings,
)
from langchain_community.embeddings.spacy_embeddings import SpacyEmbeddings
from langchain_community.embeddings.tensorflow_hub import TensorflowHubEmbeddings
from langchain_community.embeddings.vertexai import VertexAIEmbeddings
from langchain_community.embeddings.voyageai import VoyageEmbeddings
from langchain_community.embeddings.xinference import XinferenceEmbeddings
logger = logging.getLogger(__name__)
__all__ = [
"OpenAIEmbeddings",
"AzureOpenAIEmbeddings",
"ClarifaiEmbeddings",
"CohereEmbeddings",
"DatabricksEmbeddings",
"ElasticsearchEmbeddings",
"FastEmbedEmbeddings",
"HuggingFaceEmbeddings",
"HuggingFaceInferenceAPIEmbeddings",
"InfinityEmbeddings",
"GradientEmbeddings",
"JinaEmbeddings",
"LlamaCppEmbeddings",
"HuggingFaceHubEmbeddings",
"MlflowEmbeddings",
"MlflowAIGatewayEmbeddings",
"ModelScopeEmbeddings",
"TensorflowHubEmbeddings",
"SagemakerEndpointEmbeddings",
"HuggingFaceInstructEmbeddings",
"MosaicMLInstructorEmbeddings",
"SelfHostedEmbeddings",
"SelfHostedHuggingFaceEmbeddings",
"SelfHostedHuggingFaceInstructEmbeddings",
"FakeEmbeddings",
"DeterministicFakeEmbedding",
"AlephAlphaAsymmetricSemanticEmbedding",
"AlephAlphaSymmetricSemanticEmbedding",
"SentenceTransformerEmbeddings",
"GooglePalmEmbeddings",
"MiniMaxEmbeddings",
"VertexAIEmbeddings",
"BedrockEmbeddings",
"DeepInfraEmbeddings",
"EdenAiEmbeddings",
"DashScopeEmbeddings",
"EmbaasEmbeddings",
"OctoAIEmbeddings",
"SpacyEmbeddings",
"NLPCloudEmbeddings",
"GPT4AllEmbeddings",
"XinferenceEmbeddings",
"LocalAIEmbeddings",
"AwaEmbeddings",
"HuggingFaceBgeEmbeddings",
"ErnieEmbeddings",
"JavelinAIGatewayEmbeddings",
"OllamaEmbeddings",
"QianfanEmbeddingsEndpoint",
"JohnSnowLabsEmbeddings",
"VoyageEmbeddings",
"BookendEmbeddings",
]
# TODO: this is in here to maintain backwards compatibility
class HypotheticalDocumentEmbedder:
def __init__(self, *args: Any, **kwargs: Any):
logger.warning(
"Using a deprecated class. Please use "
"`from langchain.chains import HypotheticalDocumentEmbedder` instead"
)
from langchain.chains.hyde.base import HypotheticalDocumentEmbedder as H
return H(*args, **kwargs) # type: ignore
@classmethod
def from_llm(cls, *args: Any, **kwargs: Any) -> Any:
logger.warning(
"Using a deprecated class. Please use "
"`from langchain.chains import HypotheticalDocumentEmbedder` instead"
)
from langchain.chains.hyde.base import HypotheticalDocumentEmbedder as H
return H.from_llm(*args, **kwargs)

View File

@@ -0,0 +1,343 @@
from typing import Any, Dict, List, Optional
import requests
from langchain_core.embeddings import Embeddings
from langchain_core.pydantic_v1 import BaseModel, Extra, Field
DEFAULT_MODEL_NAME = "sentence-transformers/all-mpnet-base-v2"
DEFAULT_INSTRUCT_MODEL = "hkunlp/instructor-large"
DEFAULT_BGE_MODEL = "BAAI/bge-large-en"
DEFAULT_EMBED_INSTRUCTION = "Represent the document for retrieval: "
DEFAULT_QUERY_INSTRUCTION = (
"Represent the question for retrieving supporting documents: "
)
DEFAULT_QUERY_BGE_INSTRUCTION_EN = (
"Represent this question for searching relevant passages: "
)
DEFAULT_QUERY_BGE_INSTRUCTION_ZH = "为这个句子生成表示以用于检索相关文章:"
class HuggingFaceEmbeddings(BaseModel, Embeddings):
"""HuggingFace sentence_transformers embedding models.
To use, you should have the ``sentence_transformers`` python package installed.
Example:
.. code-block:: python
from langchain_community.embeddings import HuggingFaceEmbeddings
model_name = "sentence-transformers/all-mpnet-base-v2"
model_kwargs = {'device': 'cpu'}
encode_kwargs = {'normalize_embeddings': False}
hf = HuggingFaceEmbeddings(
model_name=model_name,
model_kwargs=model_kwargs,
encode_kwargs=encode_kwargs
)
"""
client: Any #: :meta private:
model_name: str = DEFAULT_MODEL_NAME
"""Model name to use."""
cache_folder: Optional[str] = None
"""Path to store models.
Can be also set by SENTENCE_TRANSFORMERS_HOME environment variable."""
model_kwargs: Dict[str, Any] = Field(default_factory=dict)
"""Keyword arguments to pass to the model."""
encode_kwargs: Dict[str, Any] = Field(default_factory=dict)
"""Keyword arguments to pass when calling the `encode` method of the model."""
multi_process: bool = False
"""Run encode() on multiple GPUs."""
def __init__(self, **kwargs: Any):
"""Initialize the sentence_transformer."""
super().__init__(**kwargs)
try:
import sentence_transformers
except ImportError as exc:
raise ImportError(
"Could not import sentence_transformers python package. "
"Please install it with `pip install sentence-transformers`."
) from exc
self.client = sentence_transformers.SentenceTransformer(
self.model_name, cache_folder=self.cache_folder, **self.model_kwargs
)
class Config:
"""Configuration for this pydantic object."""
extra = Extra.forbid
def embed_documents(self, texts: List[str]) -> List[List[float]]:
"""Compute doc embeddings using a HuggingFace transformer model.
Args:
texts: The list of texts to embed.
Returns:
List of embeddings, one for each text.
"""
import sentence_transformers
texts = list(map(lambda x: x.replace("\n", " "), texts))
if self.multi_process:
pool = self.client.start_multi_process_pool()
embeddings = self.client.encode_multi_process(texts, pool)
sentence_transformers.SentenceTransformer.stop_multi_process_pool(pool)
else:
embeddings = self.client.encode(texts, **self.encode_kwargs)
return embeddings.tolist()
def embed_query(self, text: str) -> List[float]:
"""Compute query embeddings using a HuggingFace transformer model.
Args:
text: The text to embed.
Returns:
Embeddings for the text.
"""
return self.embed_documents([text])[0]
class HuggingFaceInstructEmbeddings(BaseModel, Embeddings):
"""Wrapper around sentence_transformers embedding models.
To use, you should have the ``sentence_transformers``
and ``InstructorEmbedding`` python packages installed.
Example:
.. code-block:: python
from langchain_community.embeddings import HuggingFaceInstructEmbeddings
model_name = "hkunlp/instructor-large"
model_kwargs = {'device': 'cpu'}
encode_kwargs = {'normalize_embeddings': True}
hf = HuggingFaceInstructEmbeddings(
model_name=model_name,
model_kwargs=model_kwargs,
encode_kwargs=encode_kwargs
)
"""
client: Any #: :meta private:
model_name: str = DEFAULT_INSTRUCT_MODEL
"""Model name to use."""
cache_folder: Optional[str] = None
"""Path to store models.
Can be also set by SENTENCE_TRANSFORMERS_HOME environment variable."""
model_kwargs: Dict[str, Any] = Field(default_factory=dict)
"""Keyword arguments to pass to the model."""
encode_kwargs: Dict[str, Any] = Field(default_factory=dict)
"""Keyword arguments to pass when calling the `encode` method of the model."""
embed_instruction: str = DEFAULT_EMBED_INSTRUCTION
"""Instruction to use for embedding documents."""
query_instruction: str = DEFAULT_QUERY_INSTRUCTION
"""Instruction to use for embedding query."""
def __init__(self, **kwargs: Any):
"""Initialize the sentence_transformer."""
super().__init__(**kwargs)
try:
from InstructorEmbedding import INSTRUCTOR
self.client = INSTRUCTOR(
self.model_name, cache_folder=self.cache_folder, **self.model_kwargs
)
except ImportError as e:
raise ImportError("Dependencies for InstructorEmbedding not found.") from e
class Config:
"""Configuration for this pydantic object."""
extra = Extra.forbid
def embed_documents(self, texts: List[str]) -> List[List[float]]:
"""Compute doc embeddings using a HuggingFace instruct model.
Args:
texts: The list of texts to embed.
Returns:
List of embeddings, one for each text.
"""
instruction_pairs = [[self.embed_instruction, text] for text in texts]
embeddings = self.client.encode(instruction_pairs, **self.encode_kwargs)
return embeddings.tolist()
def embed_query(self, text: str) -> List[float]:
"""Compute query embeddings using a HuggingFace instruct model.
Args:
text: The text to embed.
Returns:
Embeddings for the text.
"""
instruction_pair = [self.query_instruction, text]
embedding = self.client.encode([instruction_pair], **self.encode_kwargs)[0]
return embedding.tolist()
class HuggingFaceBgeEmbeddings(BaseModel, Embeddings):
"""HuggingFace BGE sentence_transformers embedding models.
To use, you should have the ``sentence_transformers`` python package installed.
Example:
.. code-block:: python
from langchain_community.embeddings import HuggingFaceBgeEmbeddings
model_name = "BAAI/bge-large-en"
model_kwargs = {'device': 'cpu'}
encode_kwargs = {'normalize_embeddings': True}
hf = HuggingFaceBgeEmbeddings(
model_name=model_name,
model_kwargs=model_kwargs,
encode_kwargs=encode_kwargs
)
"""
client: Any #: :meta private:
model_name: str = DEFAULT_BGE_MODEL
"""Model name to use."""
cache_folder: Optional[str] = None
"""Path to store models.
Can be also set by SENTENCE_TRANSFORMERS_HOME environment variable."""
model_kwargs: Dict[str, Any] = Field(default_factory=dict)
"""Keyword arguments to pass to the model."""
encode_kwargs: Dict[str, Any] = Field(default_factory=dict)
"""Keyword arguments to pass when calling the `encode` method of the model."""
query_instruction: str = DEFAULT_QUERY_BGE_INSTRUCTION_EN
"""Instruction to use for embedding query."""
def __init__(self, **kwargs: Any):
"""Initialize the sentence_transformer."""
super().__init__(**kwargs)
try:
import sentence_transformers
except ImportError as exc:
raise ImportError(
"Could not import sentence_transformers python package. "
"Please install it with `pip install sentence_transformers`."
) from exc
self.client = sentence_transformers.SentenceTransformer(
self.model_name, cache_folder=self.cache_folder, **self.model_kwargs
)
if "-zh" in self.model_name:
self.query_instruction = DEFAULT_QUERY_BGE_INSTRUCTION_ZH
class Config:
"""Configuration for this pydantic object."""
extra = Extra.forbid
def embed_documents(self, texts: List[str]) -> List[List[float]]:
"""Compute doc embeddings using a HuggingFace transformer model.
Args:
texts: The list of texts to embed.
Returns:
List of embeddings, one for each text.
"""
texts = [t.replace("\n", " ") for t in texts]
embeddings = self.client.encode(texts, **self.encode_kwargs)
return embeddings.tolist()
def embed_query(self, text: str) -> List[float]:
"""Compute query embeddings using a HuggingFace transformer model.
Args:
text: The text to embed.
Returns:
Embeddings for the text.
"""
text = text.replace("\n", " ")
embedding = self.client.encode(
self.query_instruction + text, **self.encode_kwargs
)
return embedding.tolist()
class HuggingFaceInferenceAPIEmbeddings(BaseModel, Embeddings):
"""Embed texts using the HuggingFace API.
Requires a HuggingFace Inference API key and a model name.
"""
api_key: str
"""Your API key for the HuggingFace Inference API."""
model_name: str = "sentence-transformers/all-MiniLM-L6-v2"
"""The name of the model to use for text embeddings."""
api_url: Optional[str] = None
"""Custom inference endpoint url. None for using default public url."""
@property
def _api_url(self) -> str:
return self.api_url or self._default_api_url
@property
def _default_api_url(self) -> str:
return (
"https://api-inference.huggingface.co"
"/pipeline"
"/feature-extraction"
f"/{self.model_name}"
)
@property
def _headers(self) -> dict:
return {"Authorization": f"Bearer {self.api_key}"}
def embed_documents(self, texts: List[str]) -> List[List[float]]:
"""Get the embeddings for a list of texts.
Args:
texts (Documents): A list of texts to get embeddings for.
Returns:
Embedded texts as List[List[float]], where each inner List[float]
corresponds to a single input text.
Example:
.. code-block:: python
from langchain_community.embeddings import HuggingFaceInferenceAPIEmbeddings
hf_embeddings = HuggingFaceInferenceAPIEmbeddings(
api_key="your_api_key",
model_name="sentence-transformers/all-MiniLM-l6-v2"
)
texts = ["Hello, world!", "How are you?"]
hf_embeddings.embed_documents(texts)
""" # noqa: E501
response = requests.post(
self._api_url,
headers=self._headers,
json={
"inputs": texts,
"options": {"wait_for_model": True, "use_cache": True},
},
)
return response.json()
def embed_query(self, text: str) -> List[float]:
"""Compute query embeddings using a HuggingFace transformer model.
Args:
text: The text to embed.
Returns:
Embeddings for the text.
"""
return self.embed_documents([text])[0]

View File

@@ -0,0 +1,92 @@
import os
import sys
from typing import Any, List
from langchain_core.embeddings import Embeddings
from langchain_core.pydantic_v1 import BaseModel, Extra
class JohnSnowLabsEmbeddings(BaseModel, Embeddings):
"""JohnSnowLabs embedding models
To use, you should have the ``johnsnowlabs`` python package installed.
Example:
.. code-block:: python
from langchain_community.embeddings.johnsnowlabs import JohnSnowLabsEmbeddings
embedding = JohnSnowLabsEmbeddings(model='embed_sentence.bert')
output = embedding.embed_query("foo bar")
""" # noqa: E501
model: Any = "embed_sentence.bert"
def __init__(
self,
model: Any = "embed_sentence.bert",
hardware_target: str = "cpu",
**kwargs: Any,
):
"""Initialize the johnsnowlabs model."""
super().__init__(**kwargs)
# 1) Check imports
try:
from johnsnowlabs import nlp
from nlu.pipe.pipeline import NLUPipeline
except ImportError as exc:
raise ImportError(
"Could not import johnsnowlabs python package. "
"Please install it with `pip install johnsnowlabs`."
) from exc
# 2) Start a Spark Session
try:
os.environ["PYSPARK_PYTHON"] = sys.executable
os.environ["PYSPARK_DRIVER_PYTHON"] = sys.executable
nlp.start(hardware_target=hardware_target)
except Exception as exc:
raise Exception("Failure starting Spark Session") from exc
# 3) Load the model
try:
if isinstance(model, str):
self.model = nlp.load(model)
elif isinstance(model, NLUPipeline):
self.model = model
else:
self.model = nlp.to_nlu_pipe(model)
except Exception as exc:
raise Exception("Failure loading model") from exc
class Config:
"""Configuration for this pydantic object."""
extra = Extra.forbid
def embed_documents(self, texts: List[str]) -> List[List[float]]:
"""Compute doc embeddings using a JohnSnowLabs transformer model.
Args:
texts: The list of texts to embed.
Returns:
List of embeddings, one for each text.
"""
df = self.model.predict(texts, output_level="document")
emb_col = None
for c in df.columns:
if "embedding" in c:
emb_col = c
return [vec.tolist() for vec in df[emb_col].tolist()]
def embed_query(self, text: str) -> List[float]:
"""Compute query embeddings using a JohnSnowLabs transformer model.
Args:
text: The text to embed.
Returns:
Embeddings for the text.
"""
return self.embed_documents([text])[0]

View File

@@ -0,0 +1,168 @@
import importlib
import logging
from typing import Any, Callable, List, Optional
from langchain_community.embeddings.self_hosted import SelfHostedEmbeddings
DEFAULT_MODEL_NAME = "sentence-transformers/all-mpnet-base-v2"
DEFAULT_INSTRUCT_MODEL = "hkunlp/instructor-large"
DEFAULT_EMBED_INSTRUCTION = "Represent the document for retrieval: "
DEFAULT_QUERY_INSTRUCTION = (
"Represent the question for retrieving supporting documents: "
)
logger = logging.getLogger(__name__)
def _embed_documents(client: Any, *args: Any, **kwargs: Any) -> List[List[float]]:
"""Inference function to send to the remote hardware.
Accepts a sentence_transformer model_id and
returns a list of embeddings for each document in the batch.
"""
return client.encode(*args, **kwargs)
def load_embedding_model(model_id: str, instruct: bool = False, device: int = 0) -> Any:
"""Load the embedding model."""
if not instruct:
import sentence_transformers
client = sentence_transformers.SentenceTransformer(model_id)
else:
from InstructorEmbedding import INSTRUCTOR
client = INSTRUCTOR(model_id)
if importlib.util.find_spec("torch") is not None:
import torch
cuda_device_count = torch.cuda.device_count()
if device < -1 or (device >= cuda_device_count):
raise ValueError(
f"Got device=={device}, "
f"device is required to be within [-1, {cuda_device_count})"
)
if device < 0 and cuda_device_count > 0:
logger.warning(
"Device has %d GPUs available. "
"Provide device={deviceId} to `from_model_id` to use available"
"GPUs for execution. deviceId is -1 for CPU and "
"can be a positive integer associated with CUDA device id.",
cuda_device_count,
)
client = client.to(device)
return client
class SelfHostedHuggingFaceEmbeddings(SelfHostedEmbeddings):
"""HuggingFace embedding models on self-hosted remote hardware.
Supported hardware includes auto-launched instances on AWS, GCP, Azure,
and Lambda, as well as servers specified
by IP address and SSH credentials (such as on-prem, or another cloud
like Paperspace, Coreweave, etc.).
To use, you should have the ``runhouse`` python package installed.
Example:
.. code-block:: python
from langchain_community.embeddings import SelfHostedHuggingFaceEmbeddings
import runhouse as rh
model_name = "sentence-transformers/all-mpnet-base-v2"
gpu = rh.cluster(name="rh-a10x", instance_type="A100:1")
hf = SelfHostedHuggingFaceEmbeddings(model_name=model_name, hardware=gpu)
"""
client: Any #: :meta private:
model_id: str = DEFAULT_MODEL_NAME
"""Model name to use."""
model_reqs: List[str] = ["./", "sentence_transformers", "torch"]
"""Requirements to install on hardware to inference the model."""
hardware: Any
"""Remote hardware to send the inference function to."""
model_load_fn: Callable = load_embedding_model
"""Function to load the model remotely on the server."""
load_fn_kwargs: Optional[dict] = None
"""Keyword arguments to pass to the model load function."""
inference_fn: Callable = _embed_documents
"""Inference function to extract the embeddings."""
def __init__(self, **kwargs: Any):
"""Initialize the remote inference function."""
load_fn_kwargs = kwargs.pop("load_fn_kwargs", {})
load_fn_kwargs["model_id"] = load_fn_kwargs.get("model_id", DEFAULT_MODEL_NAME)
load_fn_kwargs["instruct"] = load_fn_kwargs.get("instruct", False)
load_fn_kwargs["device"] = load_fn_kwargs.get("device", 0)
super().__init__(load_fn_kwargs=load_fn_kwargs, **kwargs)
class SelfHostedHuggingFaceInstructEmbeddings(SelfHostedHuggingFaceEmbeddings):
"""HuggingFace InstructEmbedding models on self-hosted remote hardware.
Supported hardware includes auto-launched instances on AWS, GCP, Azure,
and Lambda, as well as servers specified
by IP address and SSH credentials (such as on-prem, or another
cloud like Paperspace, Coreweave, etc.).
To use, you should have the ``runhouse`` python package installed.
Example:
.. code-block:: python
from langchain_community.embeddings import SelfHostedHuggingFaceInstructEmbeddings
import runhouse as rh
model_name = "hkunlp/instructor-large"
gpu = rh.cluster(name='rh-a10x', instance_type='A100:1')
hf = SelfHostedHuggingFaceInstructEmbeddings(
model_name=model_name, hardware=gpu)
""" # noqa: E501
model_id: str = DEFAULT_INSTRUCT_MODEL
"""Model name to use."""
embed_instruction: str = DEFAULT_EMBED_INSTRUCTION
"""Instruction to use for embedding documents."""
query_instruction: str = DEFAULT_QUERY_INSTRUCTION
"""Instruction to use for embedding query."""
model_reqs: List[str] = ["./", "InstructorEmbedding", "torch"]
"""Requirements to install on hardware to inference the model."""
def __init__(self, **kwargs: Any):
"""Initialize the remote inference function."""
load_fn_kwargs = kwargs.pop("load_fn_kwargs", {})
load_fn_kwargs["model_id"] = load_fn_kwargs.get(
"model_id", DEFAULT_INSTRUCT_MODEL
)
load_fn_kwargs["instruct"] = load_fn_kwargs.get("instruct", True)
load_fn_kwargs["device"] = load_fn_kwargs.get("device", 0)
super().__init__(load_fn_kwargs=load_fn_kwargs, **kwargs)
def embed_documents(self, texts: List[str]) -> List[List[float]]:
"""Compute doc embeddings using a HuggingFace instruct model.
Args:
texts: The list of texts to embed.
Returns:
List of embeddings, one for each text.
"""
instruction_pairs = []
for text in texts:
instruction_pairs.append([self.embed_instruction, text])
embeddings = self.client(self.pipeline_ref, instruction_pairs)
return embeddings.tolist()
def embed_query(self, text: str) -> List[float]:
"""Compute query embeddings using a HuggingFace instruct model.
Args:
text: The text to embed.
Returns:
Embeddings for the text.
"""
instruction_pair = [self.query_instruction, text]
embedding = self.client(self.pipeline_ref, [instruction_pair])[0]
return embedding.tolist()

View File

@@ -0,0 +1,351 @@
import re
import warnings
from typing import (
Any,
AsyncIterator,
Callable,
Dict,
Iterator,
List,
Mapping,
Optional,
)
from langchain_core.callbacks import (
AsyncCallbackManagerForLLMRun,
CallbackManagerForLLMRun,
)
from langchain_core.language_models import BaseLanguageModel
from langchain_core.language_models.llms import LLM
from langchain_core.outputs import GenerationChunk
from langchain_core.prompt_values import PromptValue
from langchain_core.pydantic_v1 import Field, SecretStr, root_validator
from langchain_core.utils import (
check_package_version,
get_from_dict_or_env,
get_pydantic_field_names,
)
from langchain_core.utils.utils import build_extra_kwargs, convert_to_secret_str
class _AnthropicCommon(BaseLanguageModel):
client: Any = None #: :meta private:
async_client: Any = None #: :meta private:
model: str = Field(default="claude-2", alias="model_name")
"""Model name to use."""
max_tokens_to_sample: int = Field(default=256, alias="max_tokens")
"""Denotes the number of tokens to predict per generation."""
temperature: Optional[float] = None
"""A non-negative float that tunes the degree of randomness in generation."""
top_k: Optional[int] = None
"""Number of most likely tokens to consider at each step."""
top_p: Optional[float] = None
"""Total probability mass of tokens to consider at each step."""
streaming: bool = False
"""Whether to stream the results."""
default_request_timeout: Optional[float] = None
"""Timeout for requests to Anthropic Completion API. Default is 600 seconds."""
anthropic_api_url: Optional[str] = None
anthropic_api_key: Optional[SecretStr] = None
HUMAN_PROMPT: Optional[str] = None
AI_PROMPT: Optional[str] = None
count_tokens: Optional[Callable[[str], int]] = None
model_kwargs: Dict[str, Any] = Field(default_factory=dict)
@root_validator(pre=True)
def build_extra(cls, values: Dict) -> Dict:
extra = values.get("model_kwargs", {})
all_required_field_names = get_pydantic_field_names(cls)
values["model_kwargs"] = build_extra_kwargs(
extra, values, all_required_field_names
)
return values
@root_validator()
def validate_environment(cls, values: Dict) -> Dict:
"""Validate that api key and python package exists in environment."""
values["anthropic_api_key"] = convert_to_secret_str(
get_from_dict_or_env(values, "anthropic_api_key", "ANTHROPIC_API_KEY")
)
# Get custom api url from environment.
values["anthropic_api_url"] = get_from_dict_or_env(
values,
"anthropic_api_url",
"ANTHROPIC_API_URL",
default="https://api.anthropic.com",
)
try:
import anthropic
check_package_version("anthropic", gte_version="0.3")
values["client"] = anthropic.Anthropic(
base_url=values["anthropic_api_url"],
api_key=values["anthropic_api_key"].get_secret_value(),
timeout=values["default_request_timeout"],
)
values["async_client"] = anthropic.AsyncAnthropic(
base_url=values["anthropic_api_url"],
api_key=values["anthropic_api_key"].get_secret_value(),
timeout=values["default_request_timeout"],
)
values["HUMAN_PROMPT"] = anthropic.HUMAN_PROMPT
values["AI_PROMPT"] = anthropic.AI_PROMPT
values["count_tokens"] = values["client"].count_tokens
except ImportError:
raise ImportError(
"Could not import anthropic python package. "
"Please it install it with `pip install anthropic`."
)
return values
@property
def _default_params(self) -> Mapping[str, Any]:
"""Get the default parameters for calling Anthropic API."""
d = {
"max_tokens_to_sample": self.max_tokens_to_sample,
"model": self.model,
}
if self.temperature is not None:
d["temperature"] = self.temperature
if self.top_k is not None:
d["top_k"] = self.top_k
if self.top_p is not None:
d["top_p"] = self.top_p
return {**d, **self.model_kwargs}
@property
def _identifying_params(self) -> Mapping[str, Any]:
"""Get the identifying parameters."""
return {**{}, **self._default_params}
def _get_anthropic_stop(self, stop: Optional[List[str]] = None) -> List[str]:
if not self.HUMAN_PROMPT or not self.AI_PROMPT:
raise NameError("Please ensure the anthropic package is loaded")
if stop is None:
stop = []
# Never want model to invent new turns of Human / Assistant dialog.
stop.extend([self.HUMAN_PROMPT])
return stop
class Anthropic(LLM, _AnthropicCommon):
"""Anthropic large language models.
To use, you should have the ``anthropic`` python package installed, and the
environment variable ``ANTHROPIC_API_KEY`` set with your API key, or pass
it as a named parameter to the constructor.
Example:
.. code-block:: python
import anthropic
from langchain_community.llms import Anthropic
model = Anthropic(model="<model_name>", anthropic_api_key="my-api-key")
# Simplest invocation, automatically wrapped with HUMAN_PROMPT
# and AI_PROMPT.
response = model("What are the biggest risks facing humanity?")
# Or if you want to use the chat mode, build a few-shot-prompt, or
# put words in the Assistant's mouth, use HUMAN_PROMPT and AI_PROMPT:
raw_prompt = "What are the biggest risks facing humanity?"
prompt = f"{anthropic.HUMAN_PROMPT} {prompt}{anthropic.AI_PROMPT}"
response = model(prompt)
"""
class Config:
"""Configuration for this pydantic object."""
allow_population_by_field_name = True
arbitrary_types_allowed = True
@root_validator()
def raise_warning(cls, values: Dict) -> Dict:
"""Raise warning that this class is deprecated."""
warnings.warn(
"This Anthropic LLM is deprecated. "
"Please use `from langchain_community.chat_models import ChatAnthropic` "
"instead"
)
return values
@property
def _llm_type(self) -> str:
"""Return type of llm."""
return "anthropic-llm"
def _wrap_prompt(self, prompt: str) -> str:
if not self.HUMAN_PROMPT or not self.AI_PROMPT:
raise NameError("Please ensure the anthropic package is loaded")
if prompt.startswith(self.HUMAN_PROMPT):
return prompt # Already wrapped.
# Guard against common errors in specifying wrong number of newlines.
corrected_prompt, n_subs = re.subn(r"^\n*Human:", self.HUMAN_PROMPT, prompt)
if n_subs == 1:
return corrected_prompt
# As a last resort, wrap the prompt ourselves to emulate instruct-style.
return f"{self.HUMAN_PROMPT} {prompt}{self.AI_PROMPT} Sure, here you go:\n"
def _call(
self,
prompt: str,
stop: Optional[List[str]] = None,
run_manager: Optional[CallbackManagerForLLMRun] = None,
**kwargs: Any,
) -> str:
r"""Call out to Anthropic's completion endpoint.
Args:
prompt: The prompt to pass into the model.
stop: Optional list of stop words to use when generating.
Returns:
The string generated by the model.
Example:
.. code-block:: python
prompt = "What are the biggest risks facing humanity?"
prompt = f"\n\nHuman: {prompt}\n\nAssistant:"
response = model(prompt)
"""
if self.streaming:
completion = ""
for chunk in self._stream(
prompt=prompt, stop=stop, run_manager=run_manager, **kwargs
):
completion += chunk.text
return completion
stop = self._get_anthropic_stop(stop)
params = {**self._default_params, **kwargs}
response = self.client.completions.create(
prompt=self._wrap_prompt(prompt),
stop_sequences=stop,
**params,
)
return response.completion
def convert_prompt(self, prompt: PromptValue) -> str:
return self._wrap_prompt(prompt.to_string())
async def _acall(
self,
prompt: str,
stop: Optional[List[str]] = None,
run_manager: Optional[AsyncCallbackManagerForLLMRun] = None,
**kwargs: Any,
) -> str:
"""Call out to Anthropic's completion endpoint asynchronously."""
if self.streaming:
completion = ""
async for chunk in self._astream(
prompt=prompt, stop=stop, run_manager=run_manager, **kwargs
):
completion += chunk.text
return completion
stop = self._get_anthropic_stop(stop)
params = {**self._default_params, **kwargs}
response = await self.async_client.completions.create(
prompt=self._wrap_prompt(prompt),
stop_sequences=stop,
**params,
)
return response.completion
def _stream(
self,
prompt: str,
stop: Optional[List[str]] = None,
run_manager: Optional[CallbackManagerForLLMRun] = None,
**kwargs: Any,
) -> Iterator[GenerationChunk]:
r"""Call Anthropic completion_stream and return the resulting generator.
Args:
prompt: The prompt to pass into the model.
stop: Optional list of stop words to use when generating.
Returns:
A generator representing the stream of tokens from Anthropic.
Example:
.. code-block:: python
prompt = "Write a poem about a stream."
prompt = f"\n\nHuman: {prompt}\n\nAssistant:"
generator = anthropic.stream(prompt)
for token in generator:
yield token
"""
stop = self._get_anthropic_stop(stop)
params = {**self._default_params, **kwargs}
for token in self.client.completions.create(
prompt=self._wrap_prompt(prompt), stop_sequences=stop, stream=True, **params
):
chunk = GenerationChunk(text=token.completion)
yield chunk
if run_manager:
run_manager.on_llm_new_token(chunk.text, chunk=chunk)
async def _astream(
self,
prompt: str,
stop: Optional[List[str]] = None,
run_manager: Optional[AsyncCallbackManagerForLLMRun] = None,
**kwargs: Any,
) -> AsyncIterator[GenerationChunk]:
r"""Call Anthropic completion_stream and return the resulting generator.
Args:
prompt: The prompt to pass into the model.
stop: Optional list of stop words to use when generating.
Returns:
A generator representing the stream of tokens from Anthropic.
Example:
.. code-block:: python
prompt = "Write a poem about a stream."
prompt = f"\n\nHuman: {prompt}\n\nAssistant:"
generator = anthropic.stream(prompt)
for token in generator:
yield token
"""
stop = self._get_anthropic_stop(stop)
params = {**self._default_params, **kwargs}
async for token in await self.async_client.completions.create(
prompt=self._wrap_prompt(prompt),
stop_sequences=stop,
stream=True,
**params,
):
chunk = GenerationChunk(text=token.completion)
yield chunk
if run_manager:
await run_manager.on_llm_new_token(chunk.text, chunk=chunk)
def get_num_tokens(self, text: str) -> int:
"""Calculate number of tokens."""
if not self.count_tokens:
raise NameError("Please ensure the anthropic package is loaded")
return self.count_tokens(text)

View File

@@ -0,0 +1,126 @@
import json
import logging
from typing import Any, Dict, Iterator, List, Optional
import requests
from langchain_core.callbacks import CallbackManagerForLLMRun
from langchain_core.language_models.llms import LLM
from langchain_core.outputs import GenerationChunk
logger = logging.getLogger(__name__)
class CloudflareWorkersAI(LLM):
"""Langchain LLM class to help to access Cloudflare Workers AI service.
To use, you must provide an API token and
account ID to access Cloudflare Workers AI, and
pass it as a named parameter to the constructor.
Example:
.. code-block:: python
from langchain_community.llms.cloudflare_workersai import CloudflareWorkersAI
my_account_id = "my_account_id"
my_api_token = "my_secret_api_token"
llm_model = "@cf/meta/llama-2-7b-chat-int8"
cf_ai = CloudflareWorkersAI(
account_id=my_account_id,
api_token=my_api_token,
model=llm_model
)
""" # noqa: E501
account_id: str
api_token: str
model: str = "@cf/meta/llama-2-7b-chat-int8"
base_url: str = "https://api.cloudflare.com/client/v4/accounts"
streaming: bool = False
endpoint_url: str = ""
def __init__(self, **kwargs: Any) -> None:
"""Initialize the Cloudflare Workers AI class."""
super().__init__(**kwargs)
self.endpoint_url = f"{self.base_url}/{self.account_id}/ai/run/{self.model}"
@property
def _llm_type(self) -> str:
"""Return type of LLM."""
return "cloudflare"
@property
def _default_params(self) -> Dict[str, Any]:
"""Default parameters"""
return {}
@property
def _identifying_params(self) -> Dict[str, Any]:
"""Identifying parameters"""
return {
"account_id": self.account_id,
"api_token": self.api_token,
"model": self.model,
"base_url": self.base_url,
}
def _call_api(self, prompt: str, params: Dict[str, Any]) -> requests.Response:
"""Call Cloudflare Workers API"""
headers = {"Authorization": f"Bearer {self.api_token}"}
data = {"prompt": prompt, "stream": self.streaming, **params}
response = requests.post(self.endpoint_url, headers=headers, json=data)
return response
def _process_response(self, response: requests.Response) -> str:
"""Process API response"""
if response.ok:
data = response.json()
return data["result"]["response"]
else:
raise ValueError(f"Request failed with status {response.status_code}")
def _stream(
self,
prompt: str,
stop: Optional[List[str]] = None,
run_manager: Optional[CallbackManagerForLLMRun] = None,
**kwargs: Any,
) -> Iterator[GenerationChunk]:
"""Streaming prediction"""
original_steaming: bool = self.streaming
self.streaming = True
_response_prefix_count = len("data: ")
_response_stream_end = b"data: [DONE]"
for chunk in self._call_api(prompt, kwargs).iter_lines():
if chunk == _response_stream_end:
break
if len(chunk) > _response_prefix_count:
try:
data = json.loads(chunk[_response_prefix_count:])
except Exception as e:
logger.debug(chunk)
raise e
if data is not None and "response" in data:
yield GenerationChunk(text=data["response"])
if run_manager:
run_manager.on_llm_new_token(data["response"])
logger.debug("stream end")
self.streaming = original_steaming
def _call(
self,
prompt: str,
stop: Optional[List[str]] = None,
run_manager: Optional[CallbackManagerForLLMRun] = None,
**kwargs: Any,
) -> str:
"""Regular prediction"""
if self.streaming:
return "".join(
[c.text for c in self._stream(prompt, stop, run_manager, **kwargs)]
)
else:
response = self._call_api(prompt, kwargs)
return self._process_response(response)

View File

@@ -0,0 +1,106 @@
"""**Retriever** class returns Documents given a text **query**.
It is more general than a vector store. A retriever does not need to be able to
store documents, only to return (or retrieve) it. Vector stores can be used as
the backbone of a retriever, but there are other types of retrievers as well.
**Class hierarchy:**
.. code-block::
BaseRetriever --> <name>Retriever # Examples: ArxivRetriever, MergerRetriever
**Main helpers:**
.. code-block::
Document, Serializable, Callbacks,
CallbackManagerForRetrieverRun, AsyncCallbackManagerForRetrieverRun
"""
from langchain_community.retrievers.arcee import ArceeRetriever
from langchain_community.retrievers.arxiv import ArxivRetriever
from langchain_community.retrievers.azure_cognitive_search import (
AzureCognitiveSearchRetriever,
)
from langchain_community.retrievers.bedrock import AmazonKnowledgeBasesRetriever
from langchain_community.retrievers.bm25 import BM25Retriever
from langchain_community.retrievers.chaindesk import ChaindeskRetriever
from langchain_community.retrievers.chatgpt_plugin_retriever import (
ChatGPTPluginRetriever,
)
from langchain_community.retrievers.cohere_rag_retriever import CohereRagRetriever
from langchain_community.retrievers.docarray import DocArrayRetriever
from langchain_community.retrievers.elastic_search_bm25 import (
ElasticSearchBM25Retriever,
)
from langchain_community.retrievers.embedchain import EmbedchainRetriever
from langchain_community.retrievers.google_cloud_documentai_warehouse import (
GoogleDocumentAIWarehouseRetriever,
)
from langchain_community.retrievers.google_vertex_ai_search import (
GoogleCloudEnterpriseSearchRetriever,
GoogleVertexAIMultiTurnSearchRetriever,
GoogleVertexAISearchRetriever,
)
from langchain_community.retrievers.kay import KayAiRetriever
from langchain_community.retrievers.kendra import AmazonKendraRetriever
from langchain_community.retrievers.knn import KNNRetriever
from langchain_community.retrievers.llama_index import (
LlamaIndexGraphRetriever,
LlamaIndexRetriever,
)
from langchain_community.retrievers.metal import MetalRetriever
from langchain_community.retrievers.milvus import MilvusRetriever
from langchain_community.retrievers.outline import OutlineRetriever
from langchain_community.retrievers.pinecone_hybrid_search import (
PineconeHybridSearchRetriever,
)
from langchain_community.retrievers.pubmed import PubMedRetriever
from langchain_community.retrievers.remote_retriever import RemoteLangChainRetriever
from langchain_community.retrievers.svm import SVMRetriever
from langchain_community.retrievers.tavily_search_api import TavilySearchAPIRetriever
from langchain_community.retrievers.tfidf import TFIDFRetriever
from langchain_community.retrievers.weaviate_hybrid_search import (
WeaviateHybridSearchRetriever,
)
from langchain_community.retrievers.wikipedia import WikipediaRetriever
from langchain_community.retrievers.zep import ZepRetriever
from langchain_community.retrievers.zilliz import ZillizRetriever
__all__ = [
"AmazonKendraRetriever",
"AmazonKnowledgeBasesRetriever",
"ArceeRetriever",
"ArxivRetriever",
"AzureCognitiveSearchRetriever",
"ChatGPTPluginRetriever",
"ChaindeskRetriever",
"CohereRagRetriever",
"ElasticSearchBM25Retriever",
"EmbedchainRetriever",
"GoogleDocumentAIWarehouseRetriever",
"GoogleCloudEnterpriseSearchRetriever",
"GoogleVertexAIMultiTurnSearchRetriever",
"GoogleVertexAISearchRetriever",
"KayAiRetriever",
"KNNRetriever",
"LlamaIndexGraphRetriever",
"LlamaIndexRetriever",
"MetalRetriever",
"MilvusRetriever",
"OutlineRetriever",
"PineconeHybridSearchRetriever",
"PubMedRetriever",
"RemoteLangChainRetriever",
"SVMRetriever",
"TavilySearchAPIRetriever",
"TFIDFRetriever",
"BM25Retriever",
"VespaRetriever",
"WeaviateHybridSearchRetriever",
"WikipediaRetriever",
"ZepRetriever",
"ZillizRetriever",
"DocArrayRetriever",
]

View File

@@ -0,0 +1,19 @@
"""Implementations of key-value stores and storage helpers.
Module provides implementations of various key-value stores that conform
to a simple key-value interface.
The primary goal of these storages is to support implementation of caching.
"""
from langchain_community.storage.redis import RedisStore
from langchain_community.storage.upstash_redis import (
UpstashRedisByteStore,
UpstashRedisStore,
)
__all__ = [
"RedisStore",
"UpstashRedisByteStore",
"UpstashRedisStore",
]

View File

@@ -0,0 +1,50 @@
from typing import Optional, Type
from langchain_core.callbacks import CallbackManagerForToolRun
from langchain_core.pydantic_v1 import BaseModel, Field
from langchain_community.chat_models import ChatOpenAI
from langchain_community.tools.amadeus.base import AmadeusBaseTool
class ClosestAirportSchema(BaseModel):
"""Schema for the AmadeusClosestAirport tool."""
location: str = Field(
description=(
" The location for which you would like to find the nearest airport "
" along with optional details such as country, state, region, or "
" province, allowing for easy processing and identification of "
" the closest airport. Examples of the format are the following:\n"
" Cali, Colombia\n "
" Lincoln, Nebraska, United States\n"
" New York, United States\n"
" Sydney, New South Wales, Australia\n"
" Rome, Lazio, Italy\n"
" Toronto, Ontario, Canada\n"
)
)
class AmadeusClosestAirport(AmadeusBaseTool):
"""Tool for finding the closest airport to a particular location."""
name: str = "closest_airport"
description: str = (
"Use this tool to find the closest airport to a particular location."
)
args_schema: Type[ClosestAirportSchema] = ClosestAirportSchema
def _run(
self,
location: str,
run_manager: Optional[CallbackManagerForToolRun] = None,
) -> str:
content = (
f" What is the nearest airport to {location}? Please respond with the "
" airport's International Air Transport Association (IATA) Location "
' Identifier in the following JSON format. JSON: "iataCode": "IATA '
' Location Identifier" '
)
return ChatOpenAI(temperature=0).predict(content)

View File

@@ -0,0 +1,42 @@
"""
This tool allows agents to interact with the clickup library
and operate on a Clickup instance.
To use this tool, you must first set as environment variables:
client_secret
client_id
code
Below is a sample script that uses the Clickup tool:
```python
from langchain_community.agent_toolkits.clickup.toolkit import ClickupToolkit
from langchain_community.utilities.clickup import ClickupAPIWrapper
clickup = ClickupAPIWrapper()
toolkit = ClickupToolkit.from_clickup_api_wrapper(clickup)
```
"""
from typing import Optional
from langchain_core.callbacks import CallbackManagerForToolRun
from langchain_core.pydantic_v1 import Field
from langchain_core.tools import BaseTool
from langchain_community.utilities.clickup import ClickupAPIWrapper
class ClickupAction(BaseTool):
"""Tool that queries the Clickup API."""
api_wrapper: ClickupAPIWrapper = Field(default_factory=ClickupAPIWrapper)
mode: str
name: str = ""
description: str = ""
def _run(
self,
instructions: str,
run_manager: Optional[CallbackManagerForToolRun] = None,
) -> str:
"""Use the Clickup API to run an operation."""
return self.api_wrapper.run(self.mode, instructions)

View File

@@ -0,0 +1,44 @@
"""
This tool allows agents to interact with the atlassian-python-api library
and operate on a Jira instance. For more information on the
atlassian-python-api library, see https://atlassian-python-api.readthedocs.io/jira.html
To use this tool, you must first set as environment variables:
JIRA_API_TOKEN
JIRA_USERNAME
JIRA_INSTANCE_URL
Below is a sample script that uses the Jira tool:
```python
from langchain_community.agent_toolkits.jira.toolkit import JiraToolkit
from langchain_community.utilities.jira import JiraAPIWrapper
jira = JiraAPIWrapper()
toolkit = JiraToolkit.from_jira_api_wrapper(jira)
```
"""
from typing import Optional
from langchain_core.callbacks import CallbackManagerForToolRun
from langchain_core.pydantic_v1 import Field
from langchain_core.tools import BaseTool
from langchain_community.utilities.jira import JiraAPIWrapper
class JiraAction(BaseTool):
"""Tool that queries the Atlassian Jira API."""
api_wrapper: JiraAPIWrapper = Field(default_factory=JiraAPIWrapper)
mode: str
name: str = ""
description: str = ""
def _run(
self,
instructions: str,
run_manager: Optional[CallbackManagerForToolRun] = None,
) -> str:
"""Use the Atlassian Jira API to run an operation."""
return self.api_wrapper.run(self.mode, instructions)

View File

@@ -0,0 +1,276 @@
"""Tools for interacting with a Power BI dataset."""
import logging
from time import perf_counter
from typing import Any, Dict, Optional, Tuple
from langchain_core.callbacks import (
AsyncCallbackManagerForToolRun,
CallbackManagerForToolRun,
)
from langchain_core.pydantic_v1 import Field, validator
from langchain_core.tools import BaseTool
from langchain_community.chat_models.openai import _import_tiktoken
from langchain_community.tools.powerbi.prompt import (
BAD_REQUEST_RESPONSE,
DEFAULT_FEWSHOT_EXAMPLES,
RETRY_RESPONSE,
)
from langchain_community.utilities.powerbi import PowerBIDataset, json_to_md
logger = logging.getLogger(__name__)
class QueryPowerBITool(BaseTool):
"""Tool for querying a Power BI Dataset."""
name: str = "query_powerbi"
description: str = """
Input to this tool is a detailed question about the dataset, output is a result from the dataset. It will try to answer the question using the dataset, and if it cannot, it will ask for clarification.
Example Input: "How many rows are in table1?"
""" # noqa: E501
llm_chain: Any
powerbi: PowerBIDataset = Field(exclude=True)
examples: Optional[str] = DEFAULT_FEWSHOT_EXAMPLES
session_cache: Dict[str, Any] = Field(default_factory=dict, exclude=True)
max_iterations: int = 5
output_token_limit: int = 4000
tiktoken_model_name: Optional[str] = None # "cl100k_base"
class Config:
"""Configuration for this pydantic object."""
arbitrary_types_allowed = True
@validator("llm_chain")
def validate_llm_chain_input_variables( # pylint: disable=E0213
cls, llm_chain: Any
) -> Any:
"""Make sure the LLM chain has the correct input variables."""
for var in llm_chain.prompt.input_variables:
if var not in ["tool_input", "tables", "schemas", "examples"]:
raise ValueError(
"LLM chain for QueryPowerBITool must have input variables ['tool_input', 'tables', 'schemas', 'examples'], found %s", # noqa: C0301 E501 # pylint: disable=C0301
llm_chain.prompt.input_variables,
)
return llm_chain
def _check_cache(self, tool_input: str) -> Optional[str]:
"""Check if the input is present in the cache.
If the value is a bad request, overwrite with the escalated version,
if not present return None."""
if tool_input not in self.session_cache:
return None
return self.session_cache[tool_input]
def _run(
self,
tool_input: str,
run_manager: Optional[CallbackManagerForToolRun] = None,
**kwargs: Any,
) -> str:
"""Execute the query, return the results or an error message."""
if cache := self._check_cache(tool_input):
logger.debug("Found cached result for %s: %s", tool_input, cache)
return cache
try:
logger.info("Running PBI Query Tool with input: %s", tool_input)
query = self.llm_chain.predict(
tool_input=tool_input,
tables=self.powerbi.get_table_names(),
schemas=self.powerbi.get_schemas(),
examples=self.examples,
callbacks=run_manager.get_child() if run_manager else None,
)
except Exception as exc: # pylint: disable=broad-except
self.session_cache[tool_input] = f"Error on call to LLM: {exc}"
return self.session_cache[tool_input]
if query == "I cannot answer this":
self.session_cache[tool_input] = query
return self.session_cache[tool_input]
logger.info("PBI Query:\n%s", query)
start_time = perf_counter()
pbi_result = self.powerbi.run(command=query)
end_time = perf_counter()
logger.debug("PBI Result: %s", pbi_result)
logger.debug(f"PBI Query duration: {end_time - start_time:0.6f}")
result, error = self._parse_output(pbi_result)
if error is not None and "TokenExpired" in error:
self.session_cache[
tool_input
] = "Authentication token expired or invalid, please try reauthenticate."
return self.session_cache[tool_input]
iterations = kwargs.get("iterations", 0)
if error and iterations < self.max_iterations:
return self._run(
tool_input=RETRY_RESPONSE.format(
tool_input=tool_input, query=query, error=error
),
run_manager=run_manager,
iterations=iterations + 1,
)
self.session_cache[tool_input] = (
result if result else BAD_REQUEST_RESPONSE.format(error=error)
)
return self.session_cache[tool_input]
async def _arun(
self,
tool_input: str,
run_manager: Optional[AsyncCallbackManagerForToolRun] = None,
**kwargs: Any,
) -> str:
"""Execute the query, return the results or an error message."""
if cache := self._check_cache(tool_input):
logger.debug("Found cached result for %s: %s", tool_input, cache)
return f"{cache}, from cache, you have already asked this question."
try:
logger.info("Running PBI Query Tool with input: %s", tool_input)
query = await self.llm_chain.apredict(
tool_input=tool_input,
tables=self.powerbi.get_table_names(),
schemas=self.powerbi.get_schemas(),
examples=self.examples,
callbacks=run_manager.get_child() if run_manager else None,
)
except Exception as exc: # pylint: disable=broad-except
self.session_cache[tool_input] = f"Error on call to LLM: {exc}"
return self.session_cache[tool_input]
if query == "I cannot answer this":
self.session_cache[tool_input] = query
return self.session_cache[tool_input]
logger.info("PBI Query: %s", query)
start_time = perf_counter()
pbi_result = await self.powerbi.arun(command=query)
end_time = perf_counter()
logger.debug("PBI Result: %s", pbi_result)
logger.debug(f"PBI Query duration: {end_time - start_time:0.6f}")
result, error = self._parse_output(pbi_result)
if error is not None and ("TokenExpired" in error or "TokenError" in error):
self.session_cache[
tool_input
] = "Authentication token expired or invalid, please try to reauthenticate or check the scope of the credential." # noqa: E501
return self.session_cache[tool_input]
iterations = kwargs.get("iterations", 0)
if error and iterations < self.max_iterations:
return await self._arun(
tool_input=RETRY_RESPONSE.format(
tool_input=tool_input, query=query, error=error
),
run_manager=run_manager,
iterations=iterations + 1,
)
self.session_cache[tool_input] = (
result if result else BAD_REQUEST_RESPONSE.format(error=error)
)
return self.session_cache[tool_input]
def _parse_output(
self, pbi_result: Dict[str, Any]
) -> Tuple[Optional[str], Optional[Any]]:
"""Parse the output of the query to a markdown table."""
if "results" in pbi_result:
rows = pbi_result["results"][0]["tables"][0]["rows"]
if len(rows) == 0:
logger.info("0 records in result, query was valid.")
return (
None,
"0 rows returned, this might be correct, but please validate if all filter values were correct?", # noqa: E501
)
result = json_to_md(rows)
too_long, length = self._result_too_large(result)
if too_long:
return (
f"Result too large, please try to be more specific or use the `TOPN` function. The result is {length} tokens long, the limit is {self.output_token_limit} tokens.", # noqa: E501
None,
)
return result, None
if "error" in pbi_result:
if (
"pbi.error" in pbi_result["error"]
and "details" in pbi_result["error"]["pbi.error"]
):
return None, pbi_result["error"]["pbi.error"]["details"][0]["detail"]
return None, pbi_result["error"]
return None, pbi_result
def _result_too_large(self, result: str) -> Tuple[bool, int]:
"""Tokenize the output of the query."""
if self.tiktoken_model_name:
tiktoken_ = _import_tiktoken()
encoding = tiktoken_.encoding_for_model(self.tiktoken_model_name)
length = len(encoding.encode(result))
logger.info("Result length: %s", length)
return length > self.output_token_limit, length
return False, 0
class InfoPowerBITool(BaseTool):
"""Tool for getting metadata about a PowerBI Dataset."""
name: str = "schema_powerbi"
description: str = """
Input to this tool is a comma-separated list of tables, output is the schema and sample rows for those tables.
Be sure that the tables actually exist by calling list_tables_powerbi first!
Example Input: "table1, table2, table3"
""" # noqa: E501
powerbi: PowerBIDataset = Field(exclude=True)
class Config:
"""Configuration for this pydantic object."""
arbitrary_types_allowed = True
def _run(
self,
tool_input: str,
run_manager: Optional[CallbackManagerForToolRun] = None,
) -> str:
"""Get the schema for tables in a comma-separated list."""
return self.powerbi.get_table_info(tool_input.split(", "))
async def _arun(
self,
tool_input: str,
run_manager: Optional[AsyncCallbackManagerForToolRun] = None,
) -> str:
return await self.powerbi.aget_table_info(tool_input.split(", "))
class ListPowerBITool(BaseTool):
"""Tool for getting tables names."""
name: str = "list_tables_powerbi"
description: str = "Input is an empty string, output is a comma separated list of tables in the database." # noqa: E501 # pylint: disable=C0301
powerbi: PowerBIDataset = Field(exclude=True)
class Config:
"""Configuration for this pydantic object."""
arbitrary_types_allowed = True
def _run(
self,
tool_input: Optional[str] = None,
run_manager: Optional[CallbackManagerForToolRun] = None,
) -> str:
"""Get the names of the tables."""
return ", ".join(self.powerbi.get_table_names())
async def _arun(
self,
tool_input: Optional[str] = None,
run_manager: Optional[AsyncCallbackManagerForToolRun] = None,
) -> str:
"""Get the names of the tables."""
return ", ".join(self.powerbi.get_table_names())

View File

@@ -0,0 +1,130 @@
# flake8: noqa
"""Tools for interacting with Spark SQL."""
from typing import Any, Dict, Optional
from langchain_core.pydantic_v1 import BaseModel, Field, root_validator
from langchain_core.language_models import BaseLanguageModel
from langchain_core.callbacks import (
AsyncCallbackManagerForToolRun,
CallbackManagerForToolRun,
)
from langchain_core.prompts import PromptTemplate
from langchain_community.utilities.spark_sql import SparkSQL
from langchain_core.tools import BaseTool
from langchain_community.tools.spark_sql.prompt import QUERY_CHECKER
class BaseSparkSQLTool(BaseModel):
"""Base tool for interacting with Spark SQL."""
db: SparkSQL = Field(exclude=True)
class Config(BaseTool.Config):
pass
class QuerySparkSQLTool(BaseSparkSQLTool, BaseTool):
"""Tool for querying a Spark SQL."""
name: str = "query_sql_db"
description: str = """
Input to this tool is a detailed and correct SQL query, output is a result from the Spark SQL.
If the query is not correct, an error message will be returned.
If an error is returned, rewrite the query, check the query, and try again.
"""
def _run(
self,
query: str,
run_manager: Optional[CallbackManagerForToolRun] = None,
) -> str:
"""Execute the query, return the results or an error message."""
return self.db.run_no_throw(query)
class InfoSparkSQLTool(BaseSparkSQLTool, BaseTool):
"""Tool for getting metadata about a Spark SQL."""
name: str = "schema_sql_db"
description: str = """
Input to this tool is a comma-separated list of tables, output is the schema and sample rows for those tables.
Be sure that the tables actually exist by calling list_tables_sql_db first!
Example Input: "table1, table2, table3"
"""
def _run(
self,
table_names: str,
run_manager: Optional[CallbackManagerForToolRun] = None,
) -> str:
"""Get the schema for tables in a comma-separated list."""
return self.db.get_table_info_no_throw(table_names.split(", "))
class ListSparkSQLTool(BaseSparkSQLTool, BaseTool):
"""Tool for getting tables names."""
name: str = "list_tables_sql_db"
description: str = "Input is an empty string, output is a comma separated list of tables in the Spark SQL."
def _run(
self,
tool_input: str = "",
run_manager: Optional[CallbackManagerForToolRun] = None,
) -> str:
"""Get the schema for a specific table."""
return ", ".join(self.db.get_usable_table_names())
class QueryCheckerTool(BaseSparkSQLTool, BaseTool):
"""Use an LLM to check if a query is correct.
Adapted from https://www.patterns.app/blog/2023/01/18/crunchbot-sql-analyst-gpt/"""
template: str = QUERY_CHECKER
llm: BaseLanguageModel
llm_chain: Any = Field(init=False)
name: str = "query_checker_sql_db"
description: str = """
Use this tool to double check if your query is correct before executing it.
Always use this tool before executing a query with query_sql_db!
"""
@root_validator(pre=True)
def initialize_llm_chain(cls, values: Dict[str, Any]) -> Dict[str, Any]:
if "llm_chain" not in values:
from langchain.chains.llm import LLMChain
values["llm_chain"] = LLMChain(
llm=values.get("llm"),
prompt=PromptTemplate(
template=QUERY_CHECKER, input_variables=["query"]
),
)
if values["llm_chain"].prompt.input_variables != ["query"]:
raise ValueError(
"LLM chain for QueryCheckerTool need to use ['query'] as input_variables "
"for the embedded prompt"
)
return values
def _run(
self,
query: str,
run_manager: Optional[CallbackManagerForToolRun] = None,
) -> str:
"""Use the LLM to check the query."""
return self.llm_chain.predict(
query=query, callbacks=run_manager.get_child() if run_manager else None
)
async def _arun(
self,
query: str,
run_manager: Optional[AsyncCallbackManagerForToolRun] = None,
) -> str:
return await self.llm_chain.apredict(
query=query, callbacks=run_manager.get_child() if run_manager else None
)

View File

@@ -0,0 +1,134 @@
# flake8: noqa
"""Tools for interacting with a SQL database."""
from typing import Any, Dict, Optional
from langchain_core.pydantic_v1 import BaseModel, Extra, Field, root_validator
from langchain_core.language_models import BaseLanguageModel
from langchain_core.callbacks import (
AsyncCallbackManagerForToolRun,
CallbackManagerForToolRun,
)
from langchain_core.prompts import PromptTemplate
from langchain_community.utilities.sql_database import SQLDatabase
from langchain_core.tools import BaseTool
from langchain_community.tools.sql_database.prompt import QUERY_CHECKER
class BaseSQLDatabaseTool(BaseModel):
"""Base tool for interacting with a SQL database."""
db: SQLDatabase = Field(exclude=True)
class Config(BaseTool.Config):
pass
class QuerySQLDataBaseTool(BaseSQLDatabaseTool, BaseTool):
"""Tool for querying a SQL database."""
name: str = "sql_db_query"
description: str = """
Input to this tool is a detailed and correct SQL query, output is a result from the database.
If the query is not correct, an error message will be returned.
If an error is returned, rewrite the query, check the query, and try again.
"""
def _run(
self,
query: str,
run_manager: Optional[CallbackManagerForToolRun] = None,
) -> str:
"""Execute the query, return the results or an error message."""
return self.db.run_no_throw(query)
class InfoSQLDatabaseTool(BaseSQLDatabaseTool, BaseTool):
"""Tool for getting metadata about a SQL database."""
name: str = "sql_db_schema"
description: str = """
Input to this tool is a comma-separated list of tables, output is the schema and sample rows for those tables.
Example Input: "table1, table2, table3"
"""
def _run(
self,
table_names: str,
run_manager: Optional[CallbackManagerForToolRun] = None,
) -> str:
"""Get the schema for tables in a comma-separated list."""
return self.db.get_table_info_no_throw(
[t.strip() for t in table_names.split(",")]
)
class ListSQLDatabaseTool(BaseSQLDatabaseTool, BaseTool):
"""Tool for getting tables names."""
name: str = "sql_db_list_tables"
description: str = "Input is an empty string, output is a comma separated list of tables in the database."
def _run(
self,
tool_input: str = "",
run_manager: Optional[CallbackManagerForToolRun] = None,
) -> str:
"""Get the schema for a specific table."""
return ", ".join(self.db.get_usable_table_names())
class QuerySQLCheckerTool(BaseSQLDatabaseTool, BaseTool):
"""Use an LLM to check if a query is correct.
Adapted from https://www.patterns.app/blog/2023/01/18/crunchbot-sql-analyst-gpt/"""
template: str = QUERY_CHECKER
llm: BaseLanguageModel
llm_chain: Any = Field(init=False)
name: str = "sql_db_query_checker"
description: str = """
Use this tool to double check if your query is correct before executing it.
Always use this tool before executing a query with sql_db_query!
"""
@root_validator(pre=True)
def initialize_llm_chain(cls, values: Dict[str, Any]) -> Dict[str, Any]:
if "llm_chain" not in values:
from langchain.chains.llm import LLMChain
values["llm_chain"] = LLMChain(
llm=values.get("llm"),
prompt=PromptTemplate(
template=QUERY_CHECKER, input_variables=["dialect", "query"]
),
)
if values["llm_chain"].prompt.input_variables != ["dialect", "query"]:
raise ValueError(
"LLM chain for QueryCheckerTool must have input variables ['query', 'dialect']"
)
return values
def _run(
self,
query: str,
run_manager: Optional[CallbackManagerForToolRun] = None,
) -> str:
"""Use the LLM to check the query."""
return self.llm_chain.predict(
query=query,
dialect=self.db.dialect,
callbacks=run_manager.get_child() if run_manager else None,
)
async def _arun(
self,
query: str,
run_manager: Optional[AsyncCallbackManagerForToolRun] = None,
) -> str:
return await self.llm_chain.apredict(
query=query,
dialect=self.db.dialect,
callbacks=run_manager.get_child() if run_manager else None,
)

View File

@@ -0,0 +1,215 @@
"""[DEPRECATED]
## Zapier Natural Language Actions API
\
Full docs here: https://nla.zapier.com/start/
**Zapier Natural Language Actions** gives you access to the 5k+ apps, 20k+ actions
on Zapier's platform through a natural language API interface.
NLA supports apps like Gmail, Salesforce, Trello, Slack, Asana, HubSpot, Google Sheets,
Microsoft Teams, and thousands more apps: https://zapier.com/apps
Zapier NLA handles ALL the underlying API auth and translation from
natural language --> underlying API call --> return simplified output for LLMs
The key idea is you, or your users, expose a set of actions via an oauth-like setup
window, which you can then query and execute via a REST API.
NLA offers both API Key and OAuth for signing NLA API requests.
1. Server-side (API Key): for quickly getting started, testing, and production scenarios
where LangChain will only use actions exposed in the developer's Zapier account
(and will use the developer's connected accounts on Zapier.com)
2. User-facing (Oauth): for production scenarios where you are deploying an end-user
facing application and LangChain needs access to end-user's exposed actions and
connected accounts on Zapier.com
This quick start will focus on the server-side use case for brevity.
Review [full docs](https://nla.zapier.com/start/) for user-facing oauth developer
support.
Typically, you'd use SequentialChain, here's a basic example:
1. Use NLA to find an email in Gmail
2. Use LLMChain to generate a draft reply to (1)
3. Use NLA to send the draft reply (2) to someone in Slack via direct message
In code, below:
```python
import os
# get from https://platform.openai.com/
os.environ["OPENAI_API_KEY"] = os.environ.get("OPENAI_API_KEY", "")
# get from https://nla.zapier.com/docs/authentication/
os.environ["ZAPIER_NLA_API_KEY"] = os.environ.get("ZAPIER_NLA_API_KEY", "")
from langchain_community.agent_toolkits import ZapierToolkit
from langchain_community.utilities.zapier import ZapierNLAWrapper
## step 0. expose gmail 'find email' and slack 'send channel message' actions
# first go here, log in, expose (enable) the two actions:
# https://nla.zapier.com/demo/start
# -- for this example, can leave all fields "Have AI guess"
# in an oauth scenario, you'd get your own <provider> id (instead of 'demo')
# which you route your users through first
zapier = ZapierNLAWrapper()
## To leverage OAuth you may pass the value `nla_oauth_access_token` to
## the ZapierNLAWrapper. If you do this there is no need to initialize
## the ZAPIER_NLA_API_KEY env variable
# zapier = ZapierNLAWrapper(zapier_nla_oauth_access_token="TOKEN_HERE")
toolkit = ZapierToolkit.from_zapier_nla_wrapper(zapier)
```
"""
from typing import Any, Dict, Optional
from langchain_core._api import warn_deprecated
from langchain_core.callbacks import (
AsyncCallbackManagerForToolRun,
CallbackManagerForToolRun,
)
from langchain_core.pydantic_v1 import Field, root_validator
from langchain_core.tools import BaseTool
from langchain_community.tools.zapier.prompt import BASE_ZAPIER_TOOL_PROMPT
from langchain_community.utilities.zapier import ZapierNLAWrapper
class ZapierNLARunAction(BaseTool):
"""
Args:
action_id: a specific action ID (from list actions) of the action to execute
(the set api_key must be associated with the action owner)
instructions: a natural language instruction string for using the action
(eg. "get the latest email from Mike Knoop" for "Gmail: find email" action)
params: a dict, optional. Any params provided will *override* AI guesses
from `instructions` (see "understanding the AI guessing flow" here:
https://nla.zapier.com/docs/using-the-api#ai-guessing)
"""
api_wrapper: ZapierNLAWrapper = Field(default_factory=ZapierNLAWrapper)
action_id: str
params: Optional[dict] = None
base_prompt: str = BASE_ZAPIER_TOOL_PROMPT
zapier_description: str
params_schema: Dict[str, str] = Field(default_factory=dict)
name: str = ""
description: str = ""
@root_validator
def set_name_description(cls, values: Dict[str, Any]) -> Dict[str, Any]:
zapier_description = values["zapier_description"]
params_schema = values["params_schema"]
if "instructions" in params_schema:
del params_schema["instructions"]
# Ensure base prompt (if overridden) contains necessary input fields
necessary_fields = {"{zapier_description}", "{params}"}
if not all(field in values["base_prompt"] for field in necessary_fields):
raise ValueError(
"Your custom base Zapier prompt must contain input fields for "
"{zapier_description} and {params}."
)
values["name"] = zapier_description
values["description"] = values["base_prompt"].format(
zapier_description=zapier_description,
params=str(list(params_schema.keys())),
)
return values
def _run(
self, instructions: str, run_manager: Optional[CallbackManagerForToolRun] = None
) -> str:
"""Use the Zapier NLA tool to return a list of all exposed user actions."""
warn_deprecated(
since="0.0.319",
message=(
"This tool will be deprecated on 2023-11-17. See "
"https://nla.zapier.com/sunset/ for details"
),
)
return self.api_wrapper.run_as_str(self.action_id, instructions, self.params)
async def _arun(
self,
instructions: str,
run_manager: Optional[AsyncCallbackManagerForToolRun] = None,
) -> str:
"""Use the Zapier NLA tool to return a list of all exposed user actions."""
warn_deprecated(
since="0.0.319",
message=(
"This tool will be deprecated on 2023-11-17. See "
"https://nla.zapier.com/sunset/ for details"
),
)
return await self.api_wrapper.arun_as_str(
self.action_id,
instructions,
self.params,
)
ZapierNLARunAction.__doc__ = (
ZapierNLAWrapper.run.__doc__ + ZapierNLARunAction.__doc__ # type: ignore
)
# other useful actions
class ZapierNLAListActions(BaseTool):
"""
Args:
None
"""
name: str = "ZapierNLA_list_actions"
description: str = BASE_ZAPIER_TOOL_PROMPT + (
"This tool returns a list of the user's exposed actions."
)
api_wrapper: ZapierNLAWrapper = Field(default_factory=ZapierNLAWrapper)
def _run(
self,
_: str = "",
run_manager: Optional[CallbackManagerForToolRun] = None,
) -> str:
"""Use the Zapier NLA tool to return a list of all exposed user actions."""
warn_deprecated(
since="0.0.319",
message=(
"This tool will be deprecated on 2023-11-17. See "
"https://nla.zapier.com/sunset/ for details"
),
)
return self.api_wrapper.list_as_str()
async def _arun(
self,
_: str = "",
run_manager: Optional[AsyncCallbackManagerForToolRun] = None,
) -> str:
"""Use the Zapier NLA tool to return a list of all exposed user actions."""
warn_deprecated(
since="0.0.319",
message=(
"This tool will be deprecated on 2023-11-17. See "
"https://nla.zapier.com/sunset/ for details"
),
)
return await self.api_wrapper.alist_as_str()
ZapierNLAListActions.__doc__ = (
ZapierNLAWrapper.list.__doc__ + ZapierNLAListActions.__doc__ # type: ignore
)

View File

@@ -0,0 +1,283 @@
"""Integration tests for the langchain tracer module."""
import asyncio
import os
from aiohttp import ClientSession
from langchain_core.callbacks.manager import atrace_as_chain_group, trace_as_chain_group
from langchain_core.tracers.context import tracing_v2_enabled, tracing_enabled
from langchain_core.prompts import PromptTemplate
from langchain_community.chat_models import ChatOpenAI
from langchain_community.llms import OpenAI
questions = [
(
"Who won the US Open men's final in 2019? "
"What is his age raised to the 0.334 power?"
),
(
"Who is Olivia Wilde's boyfriend? "
"What is his current age raised to the 0.23 power?"
),
(
"Who won the most recent formula 1 grand prix? "
"What is their age raised to the 0.23 power?"
),
(
"Who won the US Open women's final in 2019? "
"What is her age raised to the 0.34 power?"
),
("Who is Beyonce's husband? " "What is his age raised to the 0.19 power?"),
]
def test_tracing_sequential() -> None:
from langchain.agents import AgentType, initialize_agent, load_tools
os.environ["LANGCHAIN_TRACING"] = "true"
for q in questions[:3]:
llm = OpenAI(temperature=0)
tools = load_tools(["llm-math", "serpapi"], llm=llm)
agent = initialize_agent(
tools, llm, agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION, verbose=True
)
agent.run(q)
def test_tracing_session_env_var() -> None:
from langchain.agents import AgentType, initialize_agent, load_tools
os.environ["LANGCHAIN_TRACING"] = "true"
os.environ["LANGCHAIN_SESSION"] = "my_session"
llm = OpenAI(temperature=0)
tools = load_tools(["llm-math", "serpapi"], llm=llm)
agent = initialize_agent(
tools, llm, agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION, verbose=True
)
agent.run(questions[0])
if "LANGCHAIN_SESSION" in os.environ:
del os.environ["LANGCHAIN_SESSION"]
async def test_tracing_concurrent() -> None:
from langchain.agents import AgentType, initialize_agent, load_tools
os.environ["LANGCHAIN_TRACING"] = "true"
aiosession = ClientSession()
llm = OpenAI(temperature=0)
async_tools = load_tools(["llm-math", "serpapi"], llm=llm, aiosession=aiosession)
agent = initialize_agent(
async_tools, llm, agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION, verbose=True
)
tasks = [agent.arun(q) for q in questions[:3]]
await asyncio.gather(*tasks)
await aiosession.close()
async def test_tracing_concurrent_bw_compat_environ() -> None:
from langchain.agents import AgentType, initialize_agent, load_tools
os.environ["LANGCHAIN_HANDLER"] = "langchain"
if "LANGCHAIN_TRACING" in os.environ:
del os.environ["LANGCHAIN_TRACING"]
aiosession = ClientSession()
llm = OpenAI(temperature=0)
async_tools = load_tools(["llm-math", "serpapi"], llm=llm, aiosession=aiosession)
agent = initialize_agent(
async_tools, llm, agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION, verbose=True
)
tasks = [agent.arun(q) for q in questions[:3]]
await asyncio.gather(*tasks)
await aiosession.close()
if "LANGCHAIN_HANDLER" in os.environ:
del os.environ["LANGCHAIN_HANDLER"]
def test_tracing_context_manager() -> None:
from langchain.agents import AgentType, initialize_agent, load_tools
llm = OpenAI(temperature=0)
tools = load_tools(["llm-math", "serpapi"], llm=llm)
agent = initialize_agent(
tools, llm, agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION, verbose=True
)
if "LANGCHAIN_TRACING" in os.environ:
del os.environ["LANGCHAIN_TRACING"]
with tracing_enabled() as session:
assert session
agent.run(questions[0]) # this should be traced
agent.run(questions[0]) # this should not be traced
async def test_tracing_context_manager_async() -> None:
from langchain.agents import AgentType, initialize_agent, load_tools
llm = OpenAI(temperature=0)
async_tools = load_tools(["llm-math", "serpapi"], llm=llm)
agent = initialize_agent(
async_tools, llm, agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION, verbose=True
)
if "LANGCHAIN_TRACING" in os.environ:
del os.environ["LANGCHAIN_TRACING"]
# start a background task
task = asyncio.create_task(agent.arun(questions[0])) # this should not be traced
with tracing_enabled() as session:
assert session
tasks = [agent.arun(q) for q in questions[1:4]] # these should be traced
await asyncio.gather(*tasks)
await task
async def test_tracing_v2_environment_variable() -> None:
from langchain.agents import AgentType, initialize_agent, load_tools
os.environ["LANGCHAIN_TRACING_V2"] = "true"
aiosession = ClientSession()
llm = OpenAI(temperature=0)
async_tools = load_tools(["llm-math", "serpapi"], llm=llm, aiosession=aiosession)
agent = initialize_agent(
async_tools, llm, agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION, verbose=True
)
tasks = [agent.arun(q) for q in questions[:3]]
await asyncio.gather(*tasks)
await aiosession.close()
def test_tracing_v2_context_manager() -> None:
from langchain.agents import AgentType, initialize_agent, load_tools
llm = ChatOpenAI(temperature=0)
tools = load_tools(["llm-math", "serpapi"], llm=llm)
agent = initialize_agent(
tools, llm, agent=AgentType.CHAT_ZERO_SHOT_REACT_DESCRIPTION, verbose=True
)
if "LANGCHAIN_TRACING_V2" in os.environ:
del os.environ["LANGCHAIN_TRACING_V2"]
with tracing_v2_enabled():
agent.run(questions[0]) # this should be traced
agent.run(questions[0]) # this should not be traced
def test_tracing_v2_chain_with_tags() -> None:
from langchain.chains.llm import LLMChain
from langchain.chains.constitutional_ai.base import ConstitutionalChain
from langchain.chains.constitutional_ai.models import ConstitutionalPrinciple
llm = OpenAI(temperature=0)
chain = ConstitutionalChain.from_llm(
llm,
chain=LLMChain.from_string(llm, "Q: {question} A:"),
tags=["only-root"],
constitutional_principles=[
ConstitutionalPrinciple(
critique_request="Tell if this answer is good.",
revision_request="Give a better answer.",
)
],
)
if "LANGCHAIN_TRACING_V2" in os.environ:
del os.environ["LANGCHAIN_TRACING_V2"]
with tracing_v2_enabled():
chain.run("what is the meaning of life", tags=["a-tag"])
def test_tracing_v2_agent_with_metadata() -> None:
from langchain.agents import AgentType, initialize_agent, load_tools
os.environ["LANGCHAIN_TRACING_V2"] = "true"
llm = OpenAI(temperature=0)
chat = ChatOpenAI(temperature=0)
tools = load_tools(["llm-math", "serpapi"], llm=llm)
agent = initialize_agent(
tools, llm, agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION, verbose=True
)
chat_agent = initialize_agent(
tools, chat, agent=AgentType.CHAT_ZERO_SHOT_REACT_DESCRIPTION, verbose=True
)
agent.run(questions[0], tags=["a-tag"], metadata={"a": "b", "c": "d"})
chat_agent.run(questions[0], tags=["a-tag"], metadata={"a": "b", "c": "d"})
async def test_tracing_v2_async_agent_with_metadata() -> None:
from langchain.agents import AgentType, initialize_agent, load_tools
os.environ["LANGCHAIN_TRACING_V2"] = "true"
llm = OpenAI(temperature=0, metadata={"f": "g", "h": "i"})
chat = ChatOpenAI(temperature=0, metadata={"f": "g", "h": "i"})
async_tools = load_tools(["llm-math", "serpapi"], llm=llm)
agent = initialize_agent(
async_tools, llm, agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION, verbose=True
)
chat_agent = initialize_agent(
async_tools,
chat,
agent=AgentType.CHAT_ZERO_SHOT_REACT_DESCRIPTION,
verbose=True,
)
await agent.arun(questions[0], tags=["a-tag"], metadata={"a": "b", "c": "d"})
await chat_agent.arun(questions[0], tags=["a-tag"], metadata={"a": "b", "c": "d"})
def test_trace_as_group() -> None:
from langchain.chains.llm import LLMChain
llm = OpenAI(temperature=0.9)
prompt = PromptTemplate(
input_variables=["product"],
template="What is a good name for a company that makes {product}?",
)
chain = LLMChain(llm=llm, prompt=prompt)
with trace_as_chain_group("my_group", inputs={"input": "cars"}) as group_manager:
chain.run(product="cars", callbacks=group_manager)
chain.run(product="computers", callbacks=group_manager)
final_res = chain.run(product="toys", callbacks=group_manager)
group_manager.on_chain_end({"output": final_res})
with trace_as_chain_group("my_group_2", inputs={"input": "toys"}) as group_manager:
final_res = chain.run(product="toys", callbacks=group_manager)
group_manager.on_chain_end({"output": final_res})
def test_trace_as_group_with_env_set() -> None:
from langchain.chains.llm import LLMChain
os.environ["LANGCHAIN_TRACING_V2"] = "true"
llm = OpenAI(temperature=0.9)
prompt = PromptTemplate(
input_variables=["product"],
template="What is a good name for a company that makes {product}?",
)
chain = LLMChain(llm=llm, prompt=prompt)
with trace_as_chain_group(
"my_group_env_set", inputs={"input": "cars"}
) as group_manager:
chain.run(product="cars", callbacks=group_manager)
chain.run(product="computers", callbacks=group_manager)
final_res = chain.run(product="toys", callbacks=group_manager)
group_manager.on_chain_end({"output": final_res})
with trace_as_chain_group(
"my_group_2_env_set", inputs={"input": "toys"}
) as group_manager:
final_res = chain.run(product="toys", callbacks=group_manager)
group_manager.on_chain_end({"output": final_res})
async def test_trace_as_group_async() -> None:
from langchain.chains.llm import LLMChain
llm = OpenAI(temperature=0.9)
prompt = PromptTemplate(
input_variables=["product"],
template="What is a good name for a company that makes {product}?",
)
chain = LLMChain(llm=llm, prompt=prompt)
async with atrace_as_chain_group("my_async_group") as group_manager:
await chain.arun(product="cars", callbacks=group_manager)
await chain.arun(product="computers", callbacks=group_manager)
await chain.arun(product="toys", callbacks=group_manager)
async with atrace_as_chain_group(
"my_async_group_2", inputs={"input": "toys"}
) as group_manager:
res = await asyncio.gather(
*[
chain.arun(product="toys", callbacks=group_manager),
chain.arun(product="computers", callbacks=group_manager),
chain.arun(product="cars", callbacks=group_manager),
]
)
await group_manager.on_chain_end({"output": res})

View File

@@ -0,0 +1,68 @@
"""Integration tests for the langchain tracer module."""
import asyncio
from langchain_community.callbacks import get_openai_callback
from langchain_community.llms import OpenAI
async def test_openai_callback() -> None:
llm = OpenAI(temperature=0)
with get_openai_callback() as cb:
llm("What is the square root of 4?")
total_tokens = cb.total_tokens
assert total_tokens > 0
with get_openai_callback() as cb:
llm("What is the square root of 4?")
llm("What is the square root of 4?")
assert cb.total_tokens == total_tokens * 2
with get_openai_callback() as cb:
await asyncio.gather(
*[llm.agenerate(["What is the square root of 4?"]) for _ in range(3)]
)
assert cb.total_tokens == total_tokens * 3
task = asyncio.create_task(llm.agenerate(["What is the square root of 4?"]))
with get_openai_callback() as cb:
await llm.agenerate(["What is the square root of 4?"])
await task
assert cb.total_tokens == total_tokens
def test_openai_callback_batch_llm() -> None:
llm = OpenAI(temperature=0)
with get_openai_callback() as cb:
llm.generate(["What is the square root of 4?", "What is the square root of 4?"])
assert cb.total_tokens > 0
total_tokens = cb.total_tokens
with get_openai_callback() as cb:
llm("What is the square root of 4?")
llm("What is the square root of 4?")
assert cb.total_tokens == total_tokens
def test_openai_callback_agent() -> None:
from langchain.agents import AgentType, initialize_agent, load_tools
llm = OpenAI(temperature=0)
tools = load_tools(["serpapi", "llm-math"], llm=llm)
agent = initialize_agent(
tools, llm, agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION, verbose=True
)
with get_openai_callback() as cb:
agent.run(
"Who is Olivia Wilde's boyfriend? "
"What is his current age raised to the 0.23 power?"
)
print(f"Total Tokens: {cb.total_tokens}")
print(f"Prompt Tokens: {cb.prompt_tokens}")
print(f"Completion Tokens: {cb.completion_tokens}")
print(f"Total Cost (USD): ${cb.total_cost}")

View File

@@ -0,0 +1,30 @@
"""Integration tests for the StreamlitCallbackHandler module."""
import pytest
# Import the internal StreamlitCallbackHandler from its module - and not from
# the `langchain_community.callbacks.streamlit` package - so that we don't end up using
# Streamlit's externally-provided callback handler.
from langchain_community.callbacks.streamlit.streamlit_callback_handler import (
StreamlitCallbackHandler,
)
from langchain_community.llms import OpenAI
@pytest.mark.requires("streamlit")
def test_streamlit_callback_agent() -> None:
from langchain.agents import AgentType, initialize_agent, load_tools
import streamlit as st
streamlit_callback = StreamlitCallbackHandler(st.container())
llm = OpenAI(temperature=0)
tools = load_tools(["serpapi", "llm-math"], llm=llm)
agent = initialize_agent(
tools, llm, agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION, verbose=True
)
agent.run(
"Who is Olivia Wilde's boyfriend? "
"What is his current age raised to the 0.23 power?",
callbacks=[streamlit_callback],
)

View File

@@ -0,0 +1,118 @@
"""Integration tests for the langchain tracer module."""
import asyncio
import os
from aiohttp import ClientSession
from langchain_community.callbacks import wandb_tracing_enabled
from langchain_community.llms import OpenAI
questions = [
(
"Who won the US Open men's final in 2019? "
"What is his age raised to the 0.334 power?"
),
(
"Who is Olivia Wilde's boyfriend? "
"What is his current age raised to the 0.23 power?"
),
(
"Who won the most recent formula 1 grand prix? "
"What is their age raised to the 0.23 power?"
),
(
"Who won the US Open women's final in 2019? "
"What is her age raised to the 0.34 power?"
),
("Who is Beyonce's husband? " "What is his age raised to the 0.19 power?"),
]
def test_tracing_sequential() -> None:
from langchain.agents import AgentType, initialize_agent, load_tools
os.environ["LANGCHAIN_WANDB_TRACING"] = "true"
os.environ["WANDB_PROJECT"] = "langchain-tracing"
for q in questions[:3]:
llm = OpenAI(temperature=0)
tools = load_tools(
["llm-math", "serpapi"],
llm=llm,
)
agent = initialize_agent(
tools, llm, agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION, verbose=True
)
agent.run(q)
def test_tracing_session_env_var() -> None:
from langchain.agents import AgentType, initialize_agent, load_tools
os.environ["LANGCHAIN_WANDB_TRACING"] = "true"
llm = OpenAI(temperature=0)
tools = load_tools(
["llm-math", "serpapi"],
llm=llm,
)
agent = initialize_agent(
tools, llm, agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION, verbose=True
)
agent.run(questions[0])
async def test_tracing_concurrent() -> None:
from langchain.agents import AgentType, initialize_agent, load_tools
os.environ["LANGCHAIN_WANDB_TRACING"] = "true"
aiosession = ClientSession()
llm = OpenAI(temperature=0)
async_tools = load_tools(
["llm-math", "serpapi"],
llm=llm,
aiosession=aiosession,
)
agent = initialize_agent(
async_tools, llm, agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION, verbose=True
)
tasks = [agent.arun(q) for q in questions[:3]]
await asyncio.gather(*tasks)
await aiosession.close()
def test_tracing_context_manager() -> None:
from langchain.agents import AgentType, initialize_agent, load_tools
llm = OpenAI(temperature=0)
tools = load_tools(
["llm-math", "serpapi"],
llm=llm,
)
agent = initialize_agent(
tools, llm, agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION, verbose=True
)
if "LANGCHAIN_WANDB_TRACING" in os.environ:
del os.environ["LANGCHAIN_WANDB_TRACING"]
with wandb_tracing_enabled():
agent.run(questions[0]) # this should be traced
agent.run(questions[0]) # this should not be traced
async def test_tracing_context_manager_async() -> None:
from langchain.agents import AgentType, initialize_agent, load_tools
llm = OpenAI(temperature=0)
async_tools = load_tools(
["llm-math", "serpapi"],
llm=llm,
)
agent = initialize_agent(
async_tools, llm, agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION, verbose=True
)
if "LANGCHAIN_WANDB_TRACING" in os.environ:
del os.environ["LANGCHAIN_TRACING"]
# start a background task
task = asyncio.create_task(agent.arun(questions[0])) # this should not be traced
with wandb_tracing_enabled():
tasks = [agent.arun(q) for q in questions[1:4]] # these should be traced
await asyncio.gather(*tasks)
await task

View File

@@ -0,0 +1,333 @@
"""Test ChatOpenAI wrapper."""
from typing import Any, Optional
import pytest
from langchain_core.callbacks import CallbackManager
from langchain_core.messages import AIMessage, BaseMessage, HumanMessage, SystemMessage
from langchain_core.outputs import (
ChatGeneration,
ChatResult,
LLMResult,
)
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.pydantic_v1 import BaseModel, Field
from langchain_community.chat_models.openai import ChatOpenAI
from tests.unit_tests.callbacks.fake_callback_handler import FakeCallbackHandler
@pytest.mark.scheduled
def test_chat_openai() -> None:
"""Test ChatOpenAI wrapper."""
chat = ChatOpenAI(
temperature=0.7,
base_url=None,
organization=None,
openai_proxy=None,
timeout=10.0,
max_retries=3,
http_client=None,
n=1,
max_tokens=10,
default_headers=None,
default_query=None,
)
message = HumanMessage(content="Hello")
response = chat([message])
assert isinstance(response, BaseMessage)
assert isinstance(response.content, str)
def test_chat_openai_model() -> None:
"""Test ChatOpenAI wrapper handles model_name."""
chat = ChatOpenAI(model="foo")
assert chat.model_name == "foo"
chat = ChatOpenAI(model_name="bar")
assert chat.model_name == "bar"
def test_chat_openai_system_message() -> None:
"""Test ChatOpenAI wrapper with system message."""
chat = ChatOpenAI(max_tokens=10)
system_message = SystemMessage(content="You are to chat with the user.")
human_message = HumanMessage(content="Hello")
response = chat([system_message, human_message])
assert isinstance(response, BaseMessage)
assert isinstance(response.content, str)
@pytest.mark.scheduled
def test_chat_openai_generate() -> None:
"""Test ChatOpenAI wrapper with generate."""
chat = ChatOpenAI(max_tokens=10, n=2)
message = HumanMessage(content="Hello")
response = chat.generate([[message], [message]])
assert isinstance(response, LLMResult)
assert len(response.generations) == 2
assert response.llm_output
for generations in response.generations:
assert len(generations) == 2
for generation in generations:
assert isinstance(generation, ChatGeneration)
assert isinstance(generation.text, str)
assert generation.text == generation.message.content
@pytest.mark.scheduled
def test_chat_openai_multiple_completions() -> None:
"""Test ChatOpenAI wrapper with multiple completions."""
chat = ChatOpenAI(max_tokens=10, n=5)
message = HumanMessage(content="Hello")
response = chat._generate([message])
assert isinstance(response, ChatResult)
assert len(response.generations) == 5
for generation in response.generations:
assert isinstance(generation.message, BaseMessage)
assert isinstance(generation.message.content, str)
@pytest.mark.scheduled
def test_chat_openai_streaming() -> None:
"""Test that streaming correctly invokes on_llm_new_token callback."""
callback_handler = FakeCallbackHandler()
callback_manager = CallbackManager([callback_handler])
chat = ChatOpenAI(
max_tokens=10,
streaming=True,
temperature=0,
callback_manager=callback_manager,
verbose=True,
)
message = HumanMessage(content="Hello")
response = chat([message])
assert callback_handler.llm_streams > 0
assert isinstance(response, BaseMessage)
@pytest.mark.scheduled
def test_chat_openai_streaming_generation_info() -> None:
"""Test that generation info is preserved when streaming."""
class _FakeCallback(FakeCallbackHandler):
saved_things: dict = {}
def on_llm_end(
self,
*args: Any,
**kwargs: Any,
) -> Any:
# Save the generation
self.saved_things["generation"] = args[0]
callback = _FakeCallback()
callback_manager = CallbackManager([callback])
chat = ChatOpenAI(
max_tokens=2,
temperature=0,
callback_manager=callback_manager,
)
list(chat.stream("hi"))
generation = callback.saved_things["generation"]
# `Hello!` is two tokens, assert that that is what is returned
assert generation.generations[0][0].text == "Hello!"
def test_chat_openai_llm_output_contains_model_name() -> None:
"""Test llm_output contains model_name."""
chat = ChatOpenAI(max_tokens=10)
message = HumanMessage(content="Hello")
llm_result = chat.generate([[message]])
assert llm_result.llm_output is not None
assert llm_result.llm_output["model_name"] == chat.model_name
def test_chat_openai_streaming_llm_output_contains_model_name() -> None:
"""Test llm_output contains model_name."""
chat = ChatOpenAI(max_tokens=10, streaming=True)
message = HumanMessage(content="Hello")
llm_result = chat.generate([[message]])
assert llm_result.llm_output is not None
assert llm_result.llm_output["model_name"] == chat.model_name
def test_chat_openai_invalid_streaming_params() -> None:
"""Test that streaming correctly invokes on_llm_new_token callback."""
with pytest.raises(ValueError):
ChatOpenAI(
max_tokens=10,
streaming=True,
temperature=0,
n=5,
)
@pytest.mark.scheduled
async def test_async_chat_openai() -> None:
"""Test async generation."""
chat = ChatOpenAI(max_tokens=10, n=2)
message = HumanMessage(content="Hello")
response = await chat.agenerate([[message], [message]])
assert isinstance(response, LLMResult)
assert len(response.generations) == 2
assert response.llm_output
for generations in response.generations:
assert len(generations) == 2
for generation in generations:
assert isinstance(generation, ChatGeneration)
assert isinstance(generation.text, str)
assert generation.text == generation.message.content
@pytest.mark.scheduled
async def test_async_chat_openai_streaming() -> None:
"""Test that streaming correctly invokes on_llm_new_token callback."""
callback_handler = FakeCallbackHandler()
callback_manager = CallbackManager([callback_handler])
chat = ChatOpenAI(
max_tokens=10,
streaming=True,
temperature=0,
callback_manager=callback_manager,
verbose=True,
)
message = HumanMessage(content="Hello")
response = await chat.agenerate([[message], [message]])
assert callback_handler.llm_streams > 0
assert isinstance(response, LLMResult)
assert len(response.generations) == 2
for generations in response.generations:
assert len(generations) == 1
for generation in generations:
assert isinstance(generation, ChatGeneration)
assert isinstance(generation.text, str)
assert generation.text == generation.message.content
@pytest.mark.scheduled
async def test_async_chat_openai_bind_functions() -> None:
"""Test ChatOpenAI wrapper with multiple completions."""
class Person(BaseModel):
"""Identifying information about a person."""
name: str = Field(..., title="Name", description="The person's name")
age: int = Field(..., title="Age", description="The person's age")
fav_food: Optional[str] = Field(
default=None, title="Fav Food", description="The person's favorite food"
)
chat = ChatOpenAI(
max_tokens=30,
n=1,
streaming=True,
).bind_functions(functions=[Person], function_call="Person")
prompt = ChatPromptTemplate.from_messages(
[
("system", "Use the provided Person function"),
("user", "{input}"),
]
)
chain = prompt | chat
message = HumanMessage(content="Sally is 13 years old")
response = await chain.abatch([{"input": message}])
assert isinstance(response, list)
assert len(response) == 1
for generation in response:
assert isinstance(generation, AIMessage)
def test_chat_openai_extra_kwargs() -> None:
"""Test extra kwargs to chat openai."""
# Check that foo is saved in extra_kwargs.
llm = ChatOpenAI(foo=3, max_tokens=10)
assert llm.max_tokens == 10
assert llm.model_kwargs == {"foo": 3}
# Test that if extra_kwargs are provided, they are added to it.
llm = ChatOpenAI(foo=3, model_kwargs={"bar": 2})
assert llm.model_kwargs == {"foo": 3, "bar": 2}
# Test that if provided twice it errors
with pytest.raises(ValueError):
ChatOpenAI(foo=3, model_kwargs={"foo": 2})
# Test that if explicit param is specified in kwargs it errors
with pytest.raises(ValueError):
ChatOpenAI(model_kwargs={"temperature": 0.2})
# Test that "model" cannot be specified in kwargs
with pytest.raises(ValueError):
ChatOpenAI(model_kwargs={"model": "text-davinci-003"})
@pytest.mark.scheduled
def test_openai_streaming() -> None:
"""Test streaming tokens from OpenAI."""
llm = ChatOpenAI(max_tokens=10)
for token in llm.stream("I'm Pickle Rick"):
assert isinstance(token.content, str)
@pytest.mark.scheduled
async def test_openai_astream() -> None:
"""Test streaming tokens from OpenAI."""
llm = ChatOpenAI(max_tokens=10)
async for token in llm.astream("I'm Pickle Rick"):
assert isinstance(token.content, str)
@pytest.mark.scheduled
async def test_openai_abatch() -> None:
"""Test streaming tokens from ChatOpenAI."""
llm = ChatOpenAI(max_tokens=10)
result = await llm.abatch(["I'm Pickle Rick", "I'm not Pickle Rick"])
for token in result:
assert isinstance(token.content, str)
@pytest.mark.scheduled
async def test_openai_abatch_tags() -> None:
"""Test batch tokens from ChatOpenAI."""
llm = ChatOpenAI(max_tokens=10)
result = await llm.abatch(
["I'm Pickle Rick", "I'm not Pickle Rick"], config={"tags": ["foo"]}
)
for token in result:
assert isinstance(token.content, str)
@pytest.mark.scheduled
def test_openai_batch() -> None:
"""Test batch tokens from ChatOpenAI."""
llm = ChatOpenAI(max_tokens=10)
result = llm.batch(["I'm Pickle Rick", "I'm not Pickle Rick"])
for token in result:
assert isinstance(token.content, str)
@pytest.mark.scheduled
async def test_openai_ainvoke() -> None:
"""Test invoke tokens from ChatOpenAI."""
llm = ChatOpenAI(max_tokens=10)
result = await llm.ainvoke("I'm Pickle Rick", config={"tags": ["foo"]})
assert isinstance(result.content, str)
@pytest.mark.scheduled
def test_openai_invoke() -> None:
"""Test invoke tokens from ChatOpenAI."""
llm = ChatOpenAI(max_tokens=10)
result = llm.invoke("I'm Pickle Rick", config=dict(tags=["foo"]))
assert isinstance(result.content, str)

View File

@@ -0,0 +1,219 @@
"""Test Baidu Qianfan Chat Endpoint."""
from typing import Any
from langchain_core.callbacks import CallbackManager
from langchain_core.messages import (
AIMessage,
BaseMessage,
FunctionMessage,
HumanMessage,
)
from langchain_core.outputs import ChatGeneration, LLMResult
from langchain_core.prompts import ChatPromptTemplate, HumanMessagePromptTemplate
from langchain_community.chat_models.baidu_qianfan_endpoint import QianfanChatEndpoint
from tests.unit_tests.callbacks.fake_callback_handler import FakeCallbackHandler
_FUNCTIONS: Any = [
{
"name": "format_person_info",
"description": (
"Output formatter. Should always be used to format your response to the"
" user."
),
"parameters": {
"title": "Person",
"description": "Identifying information about a person.",
"type": "object",
"properties": {
"name": {
"title": "Name",
"description": "The person's name",
"type": "string",
},
"age": {
"title": "Age",
"description": "The person's age",
"type": "integer",
},
"fav_food": {
"title": "Fav Food",
"description": "The person's favorite food",
"type": "string",
},
},
"required": ["name", "age"],
},
},
{
"name": "get_current_temperature",
"description": ("Used to get the location's temperature."),
"parameters": {
"type": "object",
"properties": {
"location": {
"type": "string",
"description": "city name",
},
"unit": {
"type": "string",
"enum": ["centigrade", "Fahrenheit"],
},
},
"required": ["location", "unit"],
},
"responses": {
"type": "object",
"properties": {
"temperature": {
"type": "integer",
"description": "city temperature",
},
"unit": {
"type": "string",
"enum": ["centigrade", "Fahrenheit"],
},
},
},
},
]
def test_default_call() -> None:
"""Test default model(`ERNIE-Bot`) call."""
chat = QianfanChatEndpoint()
response = chat(messages=[HumanMessage(content="Hello")])
assert isinstance(response, BaseMessage)
assert isinstance(response.content, str)
def test_model() -> None:
"""Test model kwarg works."""
chat = QianfanChatEndpoint(model="BLOOMZ-7B")
response = chat(messages=[HumanMessage(content="Hello")])
assert isinstance(response, BaseMessage)
assert isinstance(response.content, str)
def test_model_param() -> None:
"""Test model params works."""
chat = QianfanChatEndpoint()
response = chat(model="BLOOMZ-7B", messages=[HumanMessage(content="Hello")])
assert isinstance(response, BaseMessage)
assert isinstance(response.content, str)
def test_endpoint() -> None:
"""Test user custom model deployments like some open source models."""
chat = QianfanChatEndpoint(endpoint="qianfan_bloomz_7b_compressed")
response = chat(messages=[HumanMessage(content="Hello")])
assert isinstance(response, BaseMessage)
assert isinstance(response.content, str)
def test_endpoint_param() -> None:
"""Test user custom model deployments like some open source models."""
chat = QianfanChatEndpoint()
response = chat(
messages=[
HumanMessage(endpoint="qianfan_bloomz_7b_compressed", content="Hello")
]
)
assert isinstance(response, BaseMessage)
assert isinstance(response.content, str)
def test_multiple_history() -> None:
"""Tests multiple history works."""
chat = QianfanChatEndpoint()
response = chat(
messages=[
HumanMessage(content="Hello."),
AIMessage(content="Hello!"),
HumanMessage(content="How are you doing?"),
]
)
assert isinstance(response, BaseMessage)
assert isinstance(response.content, str)
def test_stream() -> None:
"""Test that stream works."""
chat = QianfanChatEndpoint(streaming=True)
callback_handler = FakeCallbackHandler()
callback_manager = CallbackManager([callback_handler])
response = chat(
messages=[
HumanMessage(content="Hello."),
AIMessage(content="Hello!"),
HumanMessage(content="Who are you?"),
],
stream=True,
callbacks=callback_manager,
)
assert callback_handler.llm_streams > 0
assert isinstance(response.content, str)
def test_multiple_messages() -> None:
"""Tests multiple messages works."""
chat = QianfanChatEndpoint()
message = HumanMessage(content="Hi, how are you.")
response = chat.generate([[message], [message]])
assert isinstance(response, LLMResult)
assert len(response.generations) == 2
for generations in response.generations:
assert len(generations) == 1
for generation in generations:
assert isinstance(generation, ChatGeneration)
assert isinstance(generation.text, str)
assert generation.text == generation.message.content
def test_functions_call_thoughts() -> None:
chat = QianfanChatEndpoint(model="ERNIE-Bot")
prompt_tmpl = "Use the given functions to answer following question: {input}"
prompt_msgs = [
HumanMessagePromptTemplate.from_template(prompt_tmpl),
]
prompt = ChatPromptTemplate(messages=prompt_msgs)
chain = prompt | chat.bind(functions=_FUNCTIONS)
message = HumanMessage(content="What's the temperature in Shanghai today?")
response = chain.batch([{"input": message}])
assert isinstance(response[0], AIMessage)
assert "function_call" in response[0].additional_kwargs
def test_functions_call() -> None:
chat = QianfanChatEndpoint(model="ERNIE-Bot")
prompt = ChatPromptTemplate(
messages=[
HumanMessage(content="What's the temperature in Shanghai today?"),
AIMessage(
content="",
additional_kwargs={
"function_call": {
"name": "get_current_temperature",
"thoughts": "i will use get_current_temperature "
"to resolve the questions",
"arguments": '{"location":"Shanghai","unit":"centigrade"}',
}
},
),
FunctionMessage(
name="get_current_weather",
content='{"temperature": "25", \
"unit": "摄氏度", "description": "晴朗"}',
),
]
)
chain = prompt | chat.bind(functions=_FUNCTIONS)
resp = chain.invoke({})
assert isinstance(resp, AIMessage)

View File

@@ -0,0 +1,182 @@
from pathlib import Path
import pytest
from langchain_community.document_loaders.concurrent import ConcurrentLoader
from langchain_community.document_loaders.generic import GenericLoader
from langchain_community.document_loaders.parsers import LanguageParser
def test_language_loader_for_python() -> None:
"""Test Python loader with parser enabled."""
file_path = Path(__file__).parent.parent.parent / "examples"
loader = GenericLoader.from_filesystem(
file_path, glob="hello_world.py", parser=LanguageParser(parser_threshold=5)
)
docs = loader.load()
assert len(docs) == 2
metadata = docs[0].metadata
assert metadata["source"] == str(file_path / "hello_world.py")
assert metadata["content_type"] == "functions_classes"
assert metadata["language"] == "python"
metadata = docs[1].metadata
assert metadata["source"] == str(file_path / "hello_world.py")
assert metadata["content_type"] == "simplified_code"
assert metadata["language"] == "python"
assert (
docs[0].page_content
== """def main():
print("Hello World!")
return 0"""
)
assert (
docs[1].page_content
== """#!/usr/bin/env python3
import sys
# Code for: def main():
if __name__ == "__main__":
sys.exit(main())"""
)
def test_language_loader_for_python_with_parser_threshold() -> None:
"""Test Python loader with parser enabled and below threshold."""
file_path = Path(__file__).parent.parent.parent / "examples"
loader = GenericLoader.from_filesystem(
file_path,
glob="hello_world.py",
parser=LanguageParser(language="python", parser_threshold=1000),
)
docs = loader.load()
assert len(docs) == 1
def esprima_installed() -> bool:
try:
import esprima # noqa: F401
return True
except Exception as e:
print(f"esprima not installed, skipping test {e}")
return False
@pytest.mark.skipif(not esprima_installed(), reason="requires esprima package")
def test_language_loader_for_javascript() -> None:
"""Test JavaScript loader with parser enabled."""
file_path = Path(__file__).parent.parent.parent / "examples"
loader = GenericLoader.from_filesystem(
file_path, glob="hello_world.js", parser=LanguageParser(parser_threshold=5)
)
docs = loader.load()
assert len(docs) == 3
metadata = docs[0].metadata
assert metadata["source"] == str(file_path / "hello_world.js")
assert metadata["content_type"] == "functions_classes"
assert metadata["language"] == "js"
metadata = docs[1].metadata
assert metadata["source"] == str(file_path / "hello_world.js")
assert metadata["content_type"] == "functions_classes"
assert metadata["language"] == "js"
metadata = docs[2].metadata
assert metadata["source"] == str(file_path / "hello_world.js")
assert metadata["content_type"] == "simplified_code"
assert metadata["language"] == "js"
assert (
docs[0].page_content
== """class HelloWorld {
sayHello() {
console.log("Hello World!");
}
}"""
)
assert (
docs[1].page_content
== """function main() {
const hello = new HelloWorld();
hello.sayHello();
}"""
)
assert (
docs[2].page_content
== """// Code for: class HelloWorld {
// Code for: function main() {
main();"""
)
def test_language_loader_for_javascript_with_parser_threshold() -> None:
"""Test JavaScript loader with parser enabled and below threshold."""
file_path = Path(__file__).parent.parent.parent / "examples"
loader = GenericLoader.from_filesystem(
file_path,
glob="hello_world.js",
parser=LanguageParser(language="js", parser_threshold=1000),
)
docs = loader.load()
assert len(docs) == 1
def test_concurrent_language_loader_for_javascript_with_parser_threshold() -> None:
"""Test JavaScript ConcurrentLoader with parser enabled and below threshold."""
file_path = Path(__file__).parent.parent.parent / "examples"
loader = ConcurrentLoader.from_filesystem(
file_path,
glob="hello_world.js",
parser=LanguageParser(language="js", parser_threshold=1000),
)
docs = loader.load()
assert len(docs) == 1
def test_concurrent_language_loader_for_python_with_parser_threshold() -> None:
"""Test Python ConcurrentLoader with parser enabled and below threshold."""
file_path = Path(__file__).parent.parent.parent / "examples"
loader = ConcurrentLoader.from_filesystem(
file_path,
glob="hello_world.py",
parser=LanguageParser(language="python", parser_threshold=1000),
)
docs = loader.load()
assert len(docs) == 1
@pytest.mark.skipif(not esprima_installed(), reason="requires esprima package")
def test_concurrent_language_loader_for_javascript() -> None:
"""Test JavaScript ConcurrentLoader with parser enabled."""
file_path = Path(__file__).parent.parent.parent / "examples"
loader = ConcurrentLoader.from_filesystem(
file_path, glob="hello_world.js", parser=LanguageParser(parser_threshold=5)
)
docs = loader.load()
assert len(docs) == 3
def test_concurrent_language_loader_for_python() -> None:
"""Test Python ConcurrentLoader with parser enabled."""
file_path = Path(__file__).parent.parent.parent / "examples"
loader = ConcurrentLoader.from_filesystem(
file_path, glob="hello_world.py", parser=LanguageParser(parser_threshold=5)
)
docs = loader.load()
assert len(docs) == 2

View File

@@ -0,0 +1,136 @@
"""Test Fireworks AI API Wrapper."""
from typing import Generator
import pytest
from langchain_core.outputs import LLMResult
from langchain_community.llms.fireworks import Fireworks
@pytest.fixture
def llm() -> Fireworks:
return Fireworks(model_kwargs={"temperature": 0, "max_tokens": 512})
@pytest.mark.scheduled
def test_fireworks_call(llm: Fireworks) -> None:
"""Test valid call to fireworks."""
output = llm("How is the weather in New York today?")
assert isinstance(output, str)
@pytest.mark.scheduled
def test_fireworks_model_param() -> None:
"""Tests model parameters for Fireworks"""
llm = Fireworks(model="foo")
assert llm.model == "foo"
@pytest.mark.scheduled
def test_fireworks_invoke(llm: Fireworks) -> None:
"""Tests completion with invoke"""
output = llm.invoke("How is the weather in New York today?", stop=[","])
assert isinstance(output, str)
assert output[-1] == ","
@pytest.mark.scheduled
async def test_fireworks_ainvoke(llm: Fireworks) -> None:
"""Tests completion with invoke"""
output = await llm.ainvoke("How is the weather in New York today?", stop=[","])
assert isinstance(output, str)
assert output[-1] == ","
@pytest.mark.scheduled
def test_fireworks_batch(llm: Fireworks) -> None:
"""Tests completion with invoke"""
llm = Fireworks()
output = llm.batch(
[
"How is the weather in New York today?",
"How is the weather in New York today?",
],
stop=[","],
)
for token in output:
assert isinstance(token, str)
assert token[-1] == ","
@pytest.mark.scheduled
async def test_fireworks_abatch(llm: Fireworks) -> None:
"""Tests completion with invoke"""
output = await llm.abatch(
[
"How is the weather in New York today?",
"How is the weather in New York today?",
],
stop=[","],
)
for token in output:
assert isinstance(token, str)
assert token[-1] == ","
@pytest.mark.scheduled
def test_fireworks_multiple_prompts(
llm: Fireworks,
) -> None:
"""Test completion with multiple prompts."""
output = llm.generate(["How is the weather in New York today?", "I'm pickle rick"])
assert isinstance(output, LLMResult)
assert isinstance(output.generations, list)
assert len(output.generations) == 2
@pytest.mark.scheduled
def test_fireworks_streaming(llm: Fireworks) -> None:
"""Test stream completion."""
generator = llm.stream("Who's the best quarterback in the NFL?")
assert isinstance(generator, Generator)
for token in generator:
assert isinstance(token, str)
@pytest.mark.scheduled
def test_fireworks_streaming_stop_words(llm: Fireworks) -> None:
"""Test stream completion with stop words."""
generator = llm.stream("Who's the best quarterback in the NFL?", stop=[","])
assert isinstance(generator, Generator)
last_token = ""
for token in generator:
last_token = token
assert isinstance(token, str)
assert last_token[-1] == ","
@pytest.mark.scheduled
async def test_fireworks_streaming_async(llm: Fireworks) -> None:
"""Test stream completion."""
last_token = ""
async for token in llm.astream(
"Who's the best quarterback in the NFL?", stop=[","]
):
last_token = token
assert isinstance(token, str)
assert last_token[-1] == ","
@pytest.mark.scheduled
async def test_fireworks_async_agenerate(llm: Fireworks) -> None:
"""Test async."""
output = await llm.agenerate(["What is the best city to live in California?"])
assert isinstance(output, LLMResult)
@pytest.mark.scheduled
async def test_fireworks_multiple_prompts_async_agenerate(llm: Fireworks) -> None:
output = await llm.agenerate(
["How is the weather in New York today?", "I'm pickle rick"]
)
assert isinstance(output, LLMResult)
assert isinstance(output.generations, list)
assert len(output.generations) == 2

View File

@@ -0,0 +1,77 @@
import langchain_community.utilities.opaqueprompts as op
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import PromptTemplate
from langchain_core.runnables import RunnableParallel
from langchain_community.llms import OpenAI
from langchain_community.llms.opaqueprompts import OpaquePrompts
prompt_template = """
As an AI assistant, you will answer questions according to given context.
Sensitive personal information in the question is masked for privacy.
For instance, if the original text says "Giana is good," it will be changed
to "PERSON_998 is good."
Here's how to handle these changes:
* Consider these masked phrases just as placeholders, but still refer to
them in a relevant way when answering.
* It's possible that different masked terms might mean the same thing.
Stick with the given term and don't modify it.
* All masked terms follow the "TYPE_ID" pattern.
* Please don't invent new masked terms. For instance, if you see "PERSON_998,"
don't come up with "PERSON_997" or "PERSON_999" unless they're already in the question.
Conversation History: ```{history}```
Context : ```During our recent meeting on February 23, 2023, at 10:30 AM,
John Doe provided me with his personal details. His email is johndoe@example.com
and his contact number is 650-456-7890. He lives in New York City, USA, and
belongs to the American nationality with Christian beliefs and a leaning towards
the Democratic party. He mentioned that he recently made a transaction using his
credit card 4111 1111 1111 1111 and transferred bitcoins to the wallet address
1A1zP1eP5QGefi2DMPTfTL5SLmv7DivfNa. While discussing his European travels, he
noted down his IBAN as GB29 NWBK 6016 1331 9268 19. Additionally, he provided
his website as https://johndoeportfolio.com. John also discussed
some of his US-specific details. He said his bank account number is
1234567890123456 and his drivers license is Y12345678. His ITIN is 987-65-4321,
and he recently renewed his passport,
the number for which is 123456789. He emphasized not to share his SSN, which is
669-45-6789. Furthermore, he mentioned that he accesses his work files remotely
through the IP 192.168.1.1 and has a medical license number MED-123456. ```
Question: ```{question}```
"""
def test_opaqueprompts() -> None:
chain = PromptTemplate.from_template(prompt_template) | OpaquePrompts(llm=OpenAI())
output = chain.invoke(
{
"question": "Write a text message to remind John to do password reset \
for his website through his email to stay secure."
}
)
assert isinstance(output, str)
def test_opaqueprompts_functions() -> None:
prompt = (PromptTemplate.from_template(prompt_template),)
llm = OpenAI()
pg_chain = (
op.sanitize
| RunnableParallel(
secure_context=lambda x: x["secure_context"], # type: ignore
response=(lambda x: x["sanitized_input"]) # type: ignore
| prompt
| llm
| StrOutputParser(),
)
| (lambda x: op.desanitize(x["response"], x["secure_context"]))
)
pg_chain.invoke(
{
"question": "Write a text message to remind John to do password reset\
for his website through his email to stay secure.",
"history": "",
}
)

View File

@@ -0,0 +1,42 @@
"""Test Nebula API wrapper."""
from langchain_community.llms.symblai_nebula import Nebula
def test_symblai_nebula_call() -> None:
"""Test valid call to Nebula."""
conversation = """Sam: Good morning, team! Let's keep this standup concise.
We'll go in the usual order: what you did yesterday,
what you plan to do today, and any blockers. Alex, kick us off.
Alex: Morning! Yesterday, I wrapped up the UI for the user dashboard.
The new charts and widgets are now responsive.
I also had a sync with the design team to ensure the final touchups are in
line with the brand guidelines. Today, I'll start integrating the frontend with
the new API endpoints Rhea was working on.
The only blocker is waiting for some final API documentation,
but I guess Rhea can update on that.
Rhea: Hey, all! Yep, about the API documentation - I completed the majority of
the backend work for user data retrieval yesterday.
The endpoints are mostly set up, but I need to do a bit more testing today.
I'll finalize the API documentation by noon, so that should unblock Alex.
After that, Ill be working on optimizing the database queries
for faster data fetching. No other blockers on my end.
Sam: Great, thanks Rhea. Do reach out if you need any testing assistance
or if there are any hitches with the database.
Now, my update: Yesterday, I coordinated with the client to get clarity
on some feature requirements. Today, I'll be updating our project roadmap
and timelines based on their feedback. Additionally, I'll be sitting with
the QA team in the afternoon for preliminary testing.
Blocker: I might need both of you to be available for a quick call
in case the client wants to discuss the changes live.
Alex: Sounds good, Sam. Just let us know a little in advance for the call.
Rhea: Agreed. We can make time for that.
Sam: Perfect! Let's keep the momentum going. Reach out if there are any
sudden issues or support needed. Have a productive day!
Alex: You too.
Rhea: Thanks, bye!"""
llm = Nebula(nebula_api_key="<your_api_key>")
instruction = """Identify the main objectives mentioned in this
conversation."""
output = llm.invoke(f"{instruction}\n{conversation}")
assert isinstance(output, str)

View File

@@ -0,0 +1,151 @@
"""Test Vertex AI API wrapper.
In order to run this test, you need to install VertexAI SDK:
pip install google-cloud-aiplatform>=1.36.0
Your end-user credentials would be used to make the calls (make sure you've run
`gcloud auth login` first).
"""
import os
from typing import Optional
import pytest
from langchain_core.outputs import LLMResult
from langchain_community.llms import VertexAI, VertexAIModelGarden
def test_vertex_initialization() -> None:
llm = VertexAI()
assert llm._llm_type == "vertexai"
assert llm.model_name == llm.client._model_id
def test_vertex_call() -> None:
llm = VertexAI(temperature=0)
output = llm("Say foo:")
assert isinstance(output, str)
@pytest.mark.scheduled
def test_vertex_generate() -> None:
llm = VertexAI(temperature=0.3, n=2, model_name="text-bison@001")
output = llm.generate(["Say foo:"])
assert isinstance(output, LLMResult)
assert len(output.generations) == 1
assert len(output.generations[0]) == 2
@pytest.mark.scheduled
def test_vertex_generate_code() -> None:
llm = VertexAI(temperature=0.3, n=2, model_name="code-bison@001")
output = llm.generate(["generate a python method that says foo:"])
assert isinstance(output, LLMResult)
assert len(output.generations) == 1
assert len(output.generations[0]) == 2
@pytest.mark.scheduled
async def test_vertex_agenerate() -> None:
llm = VertexAI(temperature=0)
output = await llm.agenerate(["Please say foo:"])
assert isinstance(output, LLMResult)
@pytest.mark.scheduled
def test_vertex_stream() -> None:
llm = VertexAI(temperature=0)
outputs = list(llm.stream("Please say foo:"))
assert isinstance(outputs[0], str)
async def test_vertex_consistency() -> None:
llm = VertexAI(temperature=0)
output = llm.generate(["Please say foo:"])
streaming_output = llm.generate(["Please say foo:"], stream=True)
async_output = await llm.agenerate(["Please say foo:"])
assert output.generations[0][0].text == streaming_output.generations[0][0].text
assert output.generations[0][0].text == async_output.generations[0][0].text
@pytest.mark.parametrize(
"endpoint_os_variable_name,result_arg",
[("FALCON_ENDPOINT_ID", "generated_text"), ("LLAMA_ENDPOINT_ID", None)],
)
def test_model_garden(
endpoint_os_variable_name: str, result_arg: Optional[str]
) -> None:
"""In order to run this test, you should provide endpoint names.
Example:
export FALCON_ENDPOINT_ID=...
export LLAMA_ENDPOINT_ID=...
export PROJECT=...
"""
endpoint_id = os.environ[endpoint_os_variable_name]
project = os.environ["PROJECT"]
location = "europe-west4"
llm = VertexAIModelGarden(
endpoint_id=endpoint_id,
project=project,
result_arg=result_arg,
location=location,
)
output = llm("What is the meaning of life?")
assert isinstance(output, str)
assert llm._llm_type == "vertexai_model_garden"
@pytest.mark.parametrize(
"endpoint_os_variable_name,result_arg",
[("FALCON_ENDPOINT_ID", "generated_text"), ("LLAMA_ENDPOINT_ID", None)],
)
def test_model_garden_generate(
endpoint_os_variable_name: str, result_arg: Optional[str]
) -> None:
"""In order to run this test, you should provide endpoint names.
Example:
export FALCON_ENDPOINT_ID=...
export LLAMA_ENDPOINT_ID=...
export PROJECT=...
"""
endpoint_id = os.environ[endpoint_os_variable_name]
project = os.environ["PROJECT"]
location = "europe-west4"
llm = VertexAIModelGarden(
endpoint_id=endpoint_id,
project=project,
result_arg=result_arg,
location=location,
)
output = llm.generate(["What is the meaning of life?", "How much is 2+2"])
assert isinstance(output, LLMResult)
assert len(output.generations) == 2
@pytest.mark.asyncio
@pytest.mark.parametrize(
"endpoint_os_variable_name,result_arg",
[("FALCON_ENDPOINT_ID", "generated_text"), ("LLAMA_ENDPOINT_ID", None)],
)
async def test_model_garden_agenerate(
endpoint_os_variable_name: str, result_arg: Optional[str]
) -> None:
endpoint_id = os.environ[endpoint_os_variable_name]
project = os.environ["PROJECT"]
location = "europe-west4"
llm = VertexAIModelGarden(
endpoint_id=endpoint_id,
project=project,
result_arg=result_arg,
location=location,
)
output = await llm.agenerate(["What is the meaning of life?", "How much is 2+2"])
assert isinstance(output, LLMResult)
assert len(output.generations) == 2
def test_vertex_call_count_tokens() -> None:
llm = VertexAI()
output = llm.get_num_tokens("How are you?")
assert output == 4

View File

@@ -0,0 +1,171 @@
"""Integration test for Arxiv API Wrapper."""
from typing import Any, List
import pytest
from langchain_core.documents import Document
from langchain_core.tools import BaseTool
from langchain_community.tools import ArxivQueryRun
from langchain_community.utilities import ArxivAPIWrapper
@pytest.fixture
def api_client() -> ArxivAPIWrapper:
return ArxivAPIWrapper()
def test_run_success_paper_name(api_client: ArxivAPIWrapper) -> None:
"""Test a query of paper name that returns the correct answer"""
output = api_client.run("Heat-bath random walks with Markov bases")
assert "Probability distributions for Markov chains based quantum walks" in output
assert (
"Transformations of random walks on groups via Markov stopping times" in output
)
assert (
"Recurrence of Multidimensional Persistent Random Walks. Fourier and Series "
"Criteria" in output
)
def test_run_success_arxiv_identifier(api_client: ArxivAPIWrapper) -> None:
"""Test a query of an arxiv identifier returns the correct answer"""
output = api_client.run("1605.08386v1")
assert "Heat-bath random walks with Markov bases" in output
def test_run_success_multiple_arxiv_identifiers(api_client: ArxivAPIWrapper) -> None:
"""Test a query of multiple arxiv identifiers that returns the correct answer"""
output = api_client.run("1605.08386v1 2212.00794v2 2308.07912")
assert "Heat-bath random walks with Markov bases" in output
assert "Scaling Language-Image Pre-training via Masking" in output
assert (
"Ultra-low mass PBHs in the early universe can explain the PTA signal" in output
)
def test_run_returns_several_docs(api_client: ArxivAPIWrapper) -> None:
"""Test that returns several docs"""
output = api_client.run("Caprice Stanley")
assert "On Mixing Behavior of a Family of Random Walks" in output
def test_run_returns_no_result(api_client: ArxivAPIWrapper) -> None:
"""Test that gives no result."""
output = api_client.run("1605.08386WWW")
assert "No good Arxiv Result was found" == output
def assert_docs(docs: List[Document]) -> None:
for doc in docs:
assert doc.page_content
assert doc.metadata
assert set(doc.metadata) == {"Published", "Title", "Authors", "Summary"}
def test_load_success_paper_name(api_client: ArxivAPIWrapper) -> None:
"""Test a query of paper name that returns one document"""
docs = api_client.load("Heat-bath random walks with Markov bases")
assert len(docs) == 3
assert_docs(docs)
def test_load_success_arxiv_identifier(api_client: ArxivAPIWrapper) -> None:
"""Test a query of an arxiv identifier that returns one document"""
docs = api_client.load("1605.08386v1")
assert len(docs) == 1
assert_docs(docs)
def test_load_success_multiple_arxiv_identifiers(api_client: ArxivAPIWrapper) -> None:
"""Test a query of arxiv identifiers that returns the correct answer"""
docs = api_client.load("1605.08386v1 2212.00794v2 2308.07912")
assert len(docs) == 3
assert_docs(docs)
def test_load_returns_no_result(api_client: ArxivAPIWrapper) -> None:
"""Test that returns no docs"""
docs = api_client.load("1605.08386WWW")
assert len(docs) == 0
def test_load_returns_limited_docs() -> None:
"""Test that returns several docs"""
expected_docs = 2
api_client = ArxivAPIWrapper(load_max_docs=expected_docs)
docs = api_client.load("ChatGPT")
assert len(docs) == expected_docs
assert_docs(docs)
def test_load_returns_limited_doc_content_chars() -> None:
"""Test that returns limited doc_content_chars_max"""
doc_content_chars_max = 100
api_client = ArxivAPIWrapper(doc_content_chars_max=doc_content_chars_max)
docs = api_client.load("1605.08386")
assert len(docs[0].page_content) == doc_content_chars_max
def test_load_returns_unlimited_doc_content_chars() -> None:
"""Test that returns unlimited doc_content_chars_max"""
doc_content_chars_max = None
api_client = ArxivAPIWrapper(doc_content_chars_max=doc_content_chars_max)
docs = api_client.load("1605.08386")
assert len(docs[0].page_content) == pytest.approx(54338, rel=1e-2)
def test_load_returns_full_set_of_metadata() -> None:
"""Test that returns several docs"""
api_client = ArxivAPIWrapper(load_max_docs=1, load_all_available_meta=True)
docs = api_client.load("ChatGPT")
assert len(docs) == 1
for doc in docs:
assert doc.page_content
assert doc.metadata
assert set(doc.metadata).issuperset(
{"Published", "Title", "Authors", "Summary"}
)
print(doc.metadata)
assert len(set(doc.metadata)) > 4
def _load_arxiv_from_universal_entry(**kwargs: Any) -> BaseTool:
from langchain.agents.load_tools import load_tools
tools = load_tools(["arxiv"], **kwargs)
assert len(tools) == 1, "loaded more than 1 tool"
return tools[0]
def test_load_arxiv_from_universal_entry() -> None:
arxiv_tool = _load_arxiv_from_universal_entry()
output = arxiv_tool("Caprice Stanley")
assert (
"On Mixing Behavior of a Family of Random Walks" in output
), "failed to fetch a valid result"
def test_load_arxiv_from_universal_entry_with_params() -> None:
params = {
"top_k_results": 1,
"load_max_docs": 10,
"load_all_available_meta": True,
}
arxiv_tool = _load_arxiv_from_universal_entry(**params)
assert isinstance(arxiv_tool, ArxivQueryRun)
wp = arxiv_tool.api_wrapper
assert wp.top_k_results == 1, "failed to assert top_k_results"
assert wp.load_max_docs == 10, "failed to assert load_max_docs"
assert (
wp.load_all_available_meta is True
), "failed to assert load_all_available_meta"

View File

@@ -0,0 +1,164 @@
"""Integration test for PubMed API Wrapper."""
from typing import Any, List
import pytest
from langchain_core.documents import Document
from langchain_core.tools import BaseTool
from langchain_community.tools import PubmedQueryRun
from langchain_community.utilities import PubMedAPIWrapper
xmltodict = pytest.importorskip("xmltodict")
@pytest.fixture
def api_client() -> PubMedAPIWrapper:
return PubMedAPIWrapper()
def test_run_success(api_client: PubMedAPIWrapper) -> None:
"""Test that returns the correct answer"""
search_string = (
"Examining the Validity of ChatGPT in Identifying "
"Relevant Nephrology Literature"
)
output = api_client.run(search_string)
test_string = (
"Examining the Validity of ChatGPT in Identifying "
"Relevant Nephrology Literature: Findings and Implications"
)
assert test_string in output
assert len(output) == api_client.doc_content_chars_max
def test_run_returns_no_result(api_client: PubMedAPIWrapper) -> None:
"""Test that gives no result."""
output = api_client.run("1605.08386WWW")
assert "No good PubMed Result was found" == output
def test_retrieve_article_returns_book_abstract(api_client: PubMedAPIWrapper) -> None:
"""Test that returns the excerpt of a book."""
output_nolabel = api_client.retrieve_article("25905357", "")
output_withlabel = api_client.retrieve_article("29262144", "")
test_string_nolabel = (
"Osteoporosis is a multifactorial disorder associated with low bone mass and "
"enhanced skeletal fragility. Although"
)
assert test_string_nolabel in output_nolabel["Summary"]
assert (
"Wallenberg syndrome was first described in 1808 by Gaspard Vieusseux. However,"
in output_withlabel["Summary"]
)
def test_retrieve_article_returns_article_abstract(
api_client: PubMedAPIWrapper,
) -> None:
"""Test that returns the abstract of an article."""
output_nolabel = api_client.retrieve_article("37666905", "")
output_withlabel = api_client.retrieve_article("37666551", "")
test_string_nolabel = (
"This work aims to: (1) Provide maximal hand force data on six different "
"grasp types for healthy subjects; (2) detect grasp types with maximal "
"force significantly affected by hand osteoarthritis (HOA) in women; (3) "
"look for predictors to detect HOA from the maximal forces using discriminant "
"analyses."
)
assert test_string_nolabel in output_nolabel["Summary"]
test_string_withlabel = (
"OBJECTIVES: To assess across seven hospitals from six different countries "
"the extent to which the COVID-19 pandemic affected the volumes of orthopaedic "
"hospital admissions and patient outcomes for non-COVID-19 patients admitted "
"for orthopaedic care."
)
assert test_string_withlabel in output_withlabel["Summary"]
def test_retrieve_article_no_abstract_available(api_client: PubMedAPIWrapper) -> None:
"""Test that returns 'No abstract available'."""
output = api_client.retrieve_article("10766884", "")
assert "No abstract available" == output["Summary"]
def assert_docs(docs: List[Document]) -> None:
for doc in docs:
assert doc.metadata
assert set(doc.metadata) == {
"Copyright Information",
"uid",
"Title",
"Published",
}
def test_load_success(api_client: PubMedAPIWrapper) -> None:
"""Test that returns one document"""
docs = api_client.load_docs("chatgpt")
assert len(docs) == api_client.top_k_results == 3
assert_docs(docs)
def test_load_returns_no_result(api_client: PubMedAPIWrapper) -> None:
"""Test that returns no docs"""
docs = api_client.load_docs("1605.08386WWW")
assert len(docs) == 0
def test_load_returns_limited_docs() -> None:
"""Test that returns several docs"""
expected_docs = 2
api_client = PubMedAPIWrapper(top_k_results=expected_docs)
docs = api_client.load_docs("ChatGPT")
assert len(docs) == expected_docs
assert_docs(docs)
def test_load_returns_full_set_of_metadata() -> None:
"""Test that returns several docs"""
api_client = PubMedAPIWrapper(load_max_docs=1, load_all_available_meta=True)
docs = api_client.load_docs("ChatGPT")
assert len(docs) == 3
for doc in docs:
assert doc.metadata
assert set(doc.metadata).issuperset(
{"Copyright Information", "Published", "Title", "uid"}
)
def _load_pubmed_from_universal_entry(**kwargs: Any) -> BaseTool:
from langchain.agents.load_tools import load_tools
tools = load_tools(["pubmed"], **kwargs)
assert len(tools) == 1, "loaded more than 1 tool"
return tools[0]
def test_load_pupmed_from_universal_entry() -> None:
pubmed_tool = _load_pubmed_from_universal_entry()
search_string = (
"Examining the Validity of ChatGPT in Identifying "
"Relevant Nephrology Literature"
)
output = pubmed_tool(search_string)
test_string = (
"Examining the Validity of ChatGPT in Identifying "
"Relevant Nephrology Literature: Findings and Implications"
)
assert test_string in output
def test_load_pupmed_from_universal_entry_with_params() -> None:
params = {
"top_k_results": 1,
}
pubmed_tool = _load_pubmed_from_universal_entry(**params)
assert isinstance(pubmed_tool, PubmedQueryRun)
wp = pubmed_tool.api_wrapper
assert wp.top_k_results == 1, "failed to assert top_k_results"

View File

@@ -0,0 +1,44 @@
import os
from typing import Union
import pytest
from vcr.request import Request
# Those environment variables turn on Deep Lake pytest mode.
# It significantly makes tests run much faster.
# Need to run before `import deeplake`
os.environ["BUGGER_OFF"] = "true"
os.environ["DEEPLAKE_DOWNLOAD_PATH"] = "./testing/local_storage"
os.environ["DEEPLAKE_PYTEST_ENABLED"] = "true"
# This fixture returns a dictionary containing filter_headers options
# for replacing certain headers with dummy values during cassette playback
# Specifically, it replaces the authorization header with a dummy value to
# prevent sensitive data from being recorded in the cassette.
# It also filters request to certain hosts (specified in the `ignored_hosts` list)
# to prevent data from being recorded in the cassette.
@pytest.fixture(scope="module")
def vcr_config() -> dict:
skipped_host = ["pinecone.io"]
def before_record_response(response: dict) -> Union[dict, None]:
return response
def before_record_request(request: Request) -> Union[Request, None]:
for host in skipped_host:
if request.host.startswith(host) or request.host.endswith(host):
return None
return request
return {
"before_record_request": before_record_request,
"before_record_response": before_record_response,
"filter_headers": [
("authorization", "authorization-DUMMY"),
("X-OpenAI-Client-User-Agent", "X-OpenAI-Client-User-Agent-DUMMY"),
("Api-Key", "Api-Key-DUMMY"),
("User-Agent", "User-Agent-DUMMY"),
],
"ignore_localhost": True,
}

View File

@@ -0,0 +1,85 @@
"""Test CallbackManager."""
from unittest.mock import patch
import pytest
from langchain_community.callbacks import get_openai_callback
from langchain_core.callbacks.manager import trace_as_chain_group, CallbackManager
from langchain_core.outputs import LLMResult
from langchain_core.tracers.langchain import LangChainTracer, wait_for_all_tracers
from langchain_community.llms.openai import BaseOpenAI
def test_callback_manager_configure_context_vars(
monkeypatch: pytest.MonkeyPatch,
) -> None:
"""Test callback manager configuration."""
monkeypatch.setenv("LANGCHAIN_TRACING_V2", "true")
monkeypatch.setenv("LANGCHAIN_TRACING", "false")
with patch.object(LangChainTracer, "_update_run_single"):
with patch.object(LangChainTracer, "_persist_run_single"):
with trace_as_chain_group("test") as group_manager:
assert len(group_manager.handlers) == 1
tracer = group_manager.handlers[0]
assert isinstance(tracer, LangChainTracer)
with get_openai_callback() as cb:
# This is a new empty callback handler
assert cb.successful_requests == 0
assert cb.total_tokens == 0
# configure adds this openai cb but doesn't modify the group manager
mngr = CallbackManager.configure(group_manager)
assert mngr.handlers == [tracer, cb]
assert group_manager.handlers == [tracer]
response = LLMResult(
generations=[],
llm_output={
"token_usage": {
"prompt_tokens": 2,
"completion_tokens": 1,
"total_tokens": 3,
},
"model_name": BaseOpenAI.__fields__["model_name"].default,
},
)
mngr.on_llm_start({}, ["prompt"])[0].on_llm_end(response)
# The callback handler has been updated
assert cb.successful_requests == 1
assert cb.total_tokens == 3
assert cb.prompt_tokens == 2
assert cb.completion_tokens == 1
assert cb.total_cost > 0
with get_openai_callback() as cb:
# This is a new empty callback handler
assert cb.successful_requests == 0
assert cb.total_tokens == 0
# configure adds this openai cb but doesn't modify the group manager
mngr = CallbackManager.configure(group_manager)
assert mngr.handlers == [tracer, cb]
assert group_manager.handlers == [tracer]
response = LLMResult(
generations=[],
llm_output={
"token_usage": {
"prompt_tokens": 2,
"completion_tokens": 1,
"total_tokens": 3,
},
"model_name": BaseOpenAI.__fields__["model_name"].default,
},
)
mngr.on_llm_start({}, ["prompt"])[0].on_llm_end(response)
# The callback handler has been updated
assert cb.successful_requests == 1
assert cb.total_tokens == 3
assert cb.prompt_tokens == 2
assert cb.completion_tokens == 1
assert cb.total_cost > 0
wait_for_all_tracers()
assert LangChainTracer._persist_run_single.call_count == 1 # type: ignore

View File

@@ -0,0 +1,31 @@
from langchain_community.callbacks import __all__
EXPECTED_ALL = [
"AimCallbackHandler",
"ArgillaCallbackHandler",
"ArizeCallbackHandler",
"PromptLayerCallbackHandler",
"ArthurCallbackHandler",
"ClearMLCallbackHandler",
"CometCallbackHandler",
"ContextCallbackHandler",
"HumanApprovalCallbackHandler",
"InfinoCallbackHandler",
"MlflowCallbackHandler",
"LLMonitorCallbackHandler",
"OpenAICallbackHandler",
"LLMThoughtLabeler",
"StreamlitCallbackHandler",
"WandbCallbackHandler",
"WhyLabsCallbackHandler",
"get_openai_callback",
"wandb_tracing_enabled",
"FlyteCallbackHandler",
"SageMakerCallbackHandler",
"LabelStudioCallbackHandler",
"TrubricsCallbackHandler",
]
def test_all_imports() -> None:
assert set(__all__) == set(EXPECTED_ALL)

View File

@@ -0,0 +1,23 @@
import pathlib
from langchain_community.chat_loaders import slack, utils
def test_slack_chat_loader() -> None:
chat_path = (
pathlib.Path(__file__).parents[2]
/ "examples"
/ "slack_export.zip"
)
loader = slack.SlackChatLoader(str(chat_path))
chat_sessions = list(
utils.map_ai_messages(loader.lazy_load(), sender="U0500003428")
)
assert chat_sessions, "Chat sessions should not be empty"
assert chat_sessions[1]["messages"], "Chat messages should not be empty"
assert (
"Example message" in chat_sessions[1]["messages"][0].content
), "Chat content mismatch"

View File

@@ -0,0 +1,54 @@
"""Test Anthropic Chat API wrapper."""
from typing import List
from unittest.mock import MagicMock
import pytest
from langchain_core.messages import (
AIMessage,
BaseMessage,
HumanMessage,
SystemMessage,
)
from langchain_community.chat_models import BedrockChat
from langchain_community.chat_models.meta import convert_messages_to_prompt_llama
@pytest.mark.parametrize(
("messages", "expected"),
[
([HumanMessage(content="Hello")], "[INST] Hello [/INST]"),
(
[HumanMessage(content="Hello"), AIMessage(content="Answer:")],
"[INST] Hello [/INST]\nAnswer:",
),
(
[
SystemMessage(content="You're an assistant"),
HumanMessage(content="Hello"),
AIMessage(content="Answer:"),
],
"<<SYS>> You're an assistant <</SYS>>\n[INST] Hello [/INST]\nAnswer:",
),
],
)
def test_formatting(messages: List[BaseMessage], expected: str) -> None:
result = convert_messages_to_prompt_llama(messages)
assert result == expected
def test_anthropic_bedrock() -> None:
client = MagicMock()
respbody = MagicMock(
read=MagicMock(
return_value=MagicMock(
decode=MagicMock(return_value=b'{"completion":"Hi back"}')
)
)
)
client.invoke_model.return_value = {"body": respbody}
model = BedrockChat(model_id="anthropic.claude-v2", client=client)
# should not throw an error
model.invoke("hello there")

View File

@@ -0,0 +1,96 @@
"""Tests for the various PDF parsers."""
from pathlib import Path
from typing import Iterator
import pytest
from langchain_community.document_loaders.base import BaseBlobParser
from langchain_community.document_loaders.blob_loaders import Blob
from langchain_community.document_loaders.parsers.pdf import (
PDFMinerParser,
PyMuPDFParser,
PyPDFium2Parser,
PyPDFParser,
)
_THIS_DIR = Path(__file__).parents[3]
_EXAMPLES_DIR = _THIS_DIR / "examples"
# Paths to test PDF files
HELLO_PDF = _EXAMPLES_DIR / "hello.pdf"
LAYOUT_PARSER_PAPER_PDF = _EXAMPLES_DIR / "layout-parser-paper.pdf"
def _assert_with_parser(parser: BaseBlobParser, splits_by_page: bool = True) -> None:
"""Standard tests to verify that the given parser works.
Args:
parser (BaseBlobParser): The parser to test.
splits_by_page (bool): Whether the parser splits by page or not by default.
"""
blob = Blob.from_path(HELLO_PDF)
doc_generator = parser.lazy_parse(blob)
assert isinstance(doc_generator, Iterator)
docs = list(doc_generator)
assert len(docs) == 1
page_content = docs[0].page_content
assert isinstance(page_content, str)
# The different parsers return different amount of whitespace, so using
# startswith instead of equals.
assert docs[0].page_content.startswith("Hello world!")
blob = Blob.from_path(LAYOUT_PARSER_PAPER_PDF)
doc_generator = parser.lazy_parse(blob)
assert isinstance(doc_generator, Iterator)
docs = list(doc_generator)
if splits_by_page:
assert len(docs) == 16
else:
assert len(docs) == 1
# Test is imprecise since the parsers yield different parse information depending
# on configuration. Each parser seems to yield a slightly different result
# for this page!
assert "LayoutParser" in docs[0].page_content
metadata = docs[0].metadata
assert metadata["source"] == str(LAYOUT_PARSER_PAPER_PDF)
if splits_by_page:
assert int(metadata["page"]) == 0
@pytest.mark.requires("pypdf")
def test_pypdf_parser() -> None:
"""Test PyPDF parser."""
_assert_with_parser(PyPDFParser())
@pytest.mark.requires("pdfminer")
def test_pdfminer_parser() -> None:
"""Test PDFMiner parser."""
# Does not follow defaults to split by page.
_assert_with_parser(PDFMinerParser(), splits_by_page=False)
@pytest.mark.requires("fitz") # package is PyMuPDF
def test_pymupdf_loader() -> None:
"""Test PyMuPDF loader."""
_assert_with_parser(PyMuPDFParser())
@pytest.mark.requires("pypdfium2")
def test_pypdfium2_parser() -> None:
"""Test PyPDFium2 parser."""
# Does not follow defaults to split by page.
_assert_with_parser(PyPDFium2Parser())
@pytest.mark.requires("rapidocr_onnxruntime")
def test_extract_images_text_from_pdf() -> None:
"""Test extract image from pdf and recognize text with rapid ocr"""
_assert_with_parser(PyPDFParser(extract_images=True))
_assert_with_parser(PDFMinerParser(extract_images=True))
_assert_with_parser(PyMuPDFParser(extract_images=True))
_assert_with_parser(PyPDFium2Parser(extract_images=True))

View File

@@ -0,0 +1,60 @@
from langchain_community.embeddings import __all__
EXPECTED_ALL = [
"OpenAIEmbeddings",
"AzureOpenAIEmbeddings",
"ClarifaiEmbeddings",
"CohereEmbeddings",
"DatabricksEmbeddings",
"ElasticsearchEmbeddings",
"FastEmbedEmbeddings",
"HuggingFaceEmbeddings",
"HuggingFaceInferenceAPIEmbeddings",
"InfinityEmbeddings",
"GradientEmbeddings",
"JinaEmbeddings",
"LlamaCppEmbeddings",
"HuggingFaceHubEmbeddings",
"MlflowAIGatewayEmbeddings",
"MlflowEmbeddings",
"ModelScopeEmbeddings",
"TensorflowHubEmbeddings",
"SagemakerEndpointEmbeddings",
"HuggingFaceInstructEmbeddings",
"MosaicMLInstructorEmbeddings",
"SelfHostedEmbeddings",
"SelfHostedHuggingFaceEmbeddings",
"SelfHostedHuggingFaceInstructEmbeddings",
"FakeEmbeddings",
"DeterministicFakeEmbedding",
"AlephAlphaAsymmetricSemanticEmbedding",
"AlephAlphaSymmetricSemanticEmbedding",
"SentenceTransformerEmbeddings",
"GooglePalmEmbeddings",
"MiniMaxEmbeddings",
"VertexAIEmbeddings",
"BedrockEmbeddings",
"DeepInfraEmbeddings",
"EdenAiEmbeddings",
"DashScopeEmbeddings",
"EmbaasEmbeddings",
"OctoAIEmbeddings",
"SpacyEmbeddings",
"NLPCloudEmbeddings",
"GPT4AllEmbeddings",
"XinferenceEmbeddings",
"LocalAIEmbeddings",
"AwaEmbeddings",
"HuggingFaceBgeEmbeddings",
"ErnieEmbeddings",
"JavelinAIGatewayEmbeddings",
"OllamaEmbeddings",
"QianfanEmbeddingsEndpoint",
"JohnSnowLabsEmbeddings",
"VoyageEmbeddings",
"BookendEmbeddings",
]
def test_all_imports() -> None:
assert set(__all__) == set(EXPECTED_ALL)

View File

@@ -0,0 +1,56 @@
import os
import pytest
from langchain_community.llms.openai import OpenAI
from langchain_community.utils.openai import is_openai_v1
os.environ["OPENAI_API_KEY"] = "foo"
def _openai_v1_installed() -> bool:
try:
return is_openai_v1()
except Exception as _:
return False
@pytest.mark.requires("openai")
def test_openai_model_param() -> None:
llm = OpenAI(model="foo")
assert llm.model_name == "foo"
llm = OpenAI(model_name="foo")
assert llm.model_name == "foo"
@pytest.mark.requires("openai")
def test_openai_model_kwargs() -> None:
llm = OpenAI(model_kwargs={"foo": "bar"})
assert llm.model_kwargs == {"foo": "bar"}
@pytest.mark.requires("openai")
def test_openai_invalid_model_kwargs() -> None:
with pytest.raises(ValueError):
OpenAI(model_kwargs={"model_name": "foo"})
@pytest.mark.requires("openai")
def test_openai_incorrect_field() -> None:
with pytest.warns(match="not default parameter"):
llm = OpenAI(foo="bar")
assert llm.model_kwargs == {"foo": "bar"}
@pytest.fixture
def mock_completion() -> dict:
return {
"id": "cmpl-3evkmQda5Hu7fcZavknQda3SQ",
"object": "text_completion",
"created": 1689989000,
"model": "text-davinci-003",
"choices": [
{"text": "Bar Baz", "index": 0, "logprobs": None, "finish_reason": "length"}
],
"usage": {"prompt_tokens": 1, "completion_tokens": 2, "total_tokens": 3},
}

View File

@@ -0,0 +1,42 @@
from langchain_community.retrievers import __all__
EXPECTED_ALL = [
"AmazonKendraRetriever",
"AmazonKnowledgeBasesRetriever",
"ArceeRetriever",
"ArxivRetriever",
"AzureCognitiveSearchRetriever",
"ChatGPTPluginRetriever",
"ChaindeskRetriever",
"CohereRagRetriever",
"ElasticSearchBM25Retriever",
"EmbedchainRetriever",
"GoogleDocumentAIWarehouseRetriever",
"GoogleCloudEnterpriseSearchRetriever",
"GoogleVertexAIMultiTurnSearchRetriever",
"GoogleVertexAISearchRetriever",
"KayAiRetriever",
"KNNRetriever",
"LlamaIndexGraphRetriever",
"LlamaIndexRetriever",
"MetalRetriever",
"MilvusRetriever",
"OutlineRetriever",
"PineconeHybridSearchRetriever",
"PubMedRetriever",
"RemoteLangChainRetriever",
"SVMRetriever",
"TavilySearchAPIRetriever",
"TFIDFRetriever",
"BM25Retriever",
"VespaRetriever",
"WeaviateHybridSearchRetriever",
"WikipediaRetriever",
"ZepRetriever",
"ZillizRetriever",
"DocArrayRetriever",
]
def test_all_imports() -> None:
assert set(__all__) == set(EXPECTED_ALL)

View File

@@ -0,0 +1,11 @@
from langchain_community.storage import __all__
EXPECTED_ALL = [
"RedisStore",
"UpstashRedisByteStore",
"UpstashRedisStore",
]
def test_all_imports() -> None:
assert set(__all__) == set(EXPECTED_ALL)

View File

@@ -0,0 +1,40 @@
from typing import List, Type
from langchain_core.tools import BaseTool, StructuredTool
import langchain_community.tools
from langchain_community.tools import _DEPRECATED_TOOLS
from langchain_community.tools import __all__ as tools_all
_EXCLUDE = {
BaseTool,
StructuredTool,
}
def _get_tool_classes(skip_tools_without_default_names: bool) -> List[Type[BaseTool]]:
results = []
for tool_class_name in tools_all:
if tool_class_name in _DEPRECATED_TOOLS:
continue
# Resolve the str to the class
tool_class = getattr(langchain_community.tools, tool_class_name)
if isinstance(tool_class, type) and issubclass(tool_class, BaseTool):
if tool_class in _EXCLUDE:
continue
if (
skip_tools_without_default_names
and tool_class.__fields__["name"].default # type: ignore
in [None, ""]
):
continue
results.append(tool_class)
return results
def test_tool_names_unique() -> None:
"""Test that the default names for our core tools are unique."""
tool_classes = _get_tool_classes(skip_tools_without_default_names=True)
names = sorted([tool_cls.__fields__["name"].default for tool_cls in tool_classes])
duplicated_names = [name for name in names if names.count(name) > 1]
assert not duplicated_names

View File

@@ -0,0 +1,728 @@
"""Test FAISS functionality."""
import datetime
import math
import tempfile
import pytest
from typing import Union
from langchain_core.documents import Document
from langchain_community.docstore.base import Docstore
from langchain_community.docstore.in_memory import InMemoryDocstore
from langchain_community.vectorstores.faiss import FAISS
from tests.integration_tests.vectorstores.fake_embeddings import FakeEmbeddings
_PAGE_CONTENT = """This is a page about LangChain.
It is a really cool framework.
What isn't there to love about langchain?
Made in 2022."""
class FakeDocstore(Docstore):
"""Fake docstore for testing purposes."""
def search(self, search: str) -> Union[str, Document]:
"""Return the fake document."""
document = Document(page_content=_PAGE_CONTENT)
return document
@pytest.mark.requires("faiss")
def test_faiss() -> None:
"""Test end to end construction and search."""
texts = ["foo", "bar", "baz"]
docsearch = FAISS.from_texts(texts, FakeEmbeddings())
index_to_id = docsearch.index_to_docstore_id
expected_docstore = InMemoryDocstore(
{
index_to_id[0]: Document(page_content="foo"),
index_to_id[1]: Document(page_content="bar"),
index_to_id[2]: Document(page_content="baz"),
}
)
assert docsearch.docstore.__dict__ == expected_docstore.__dict__
output = docsearch.similarity_search("foo", k=1)
assert output == [Document(page_content="foo")]
@pytest.mark.requires("faiss")
async def test_faiss_afrom_texts() -> None:
"""Test end to end construction and search."""
texts = ["foo", "bar", "baz"]
docsearch = await FAISS.afrom_texts(texts, FakeEmbeddings())
index_to_id = docsearch.index_to_docstore_id
expected_docstore = InMemoryDocstore(
{
index_to_id[0]: Document(page_content="foo"),
index_to_id[1]: Document(page_content="bar"),
index_to_id[2]: Document(page_content="baz"),
}
)
assert docsearch.docstore.__dict__ == expected_docstore.__dict__
output = await docsearch.asimilarity_search("foo", k=1)
assert output == [Document(page_content="foo")]
@pytest.mark.requires("faiss")
def test_faiss_vector_sim() -> None:
"""Test vector similarity."""
texts = ["foo", "bar", "baz"]
docsearch = FAISS.from_texts(texts, FakeEmbeddings())
index_to_id = docsearch.index_to_docstore_id
expected_docstore = InMemoryDocstore(
{
index_to_id[0]: Document(page_content="foo"),
index_to_id[1]: Document(page_content="bar"),
index_to_id[2]: Document(page_content="baz"),
}
)
assert docsearch.docstore.__dict__ == expected_docstore.__dict__
query_vec = FakeEmbeddings().embed_query(text="foo")
output = docsearch.similarity_search_by_vector(query_vec, k=1)
assert output == [Document(page_content="foo")]
@pytest.mark.requires("faiss")
async def test_faiss_async_vector_sim() -> None:
"""Test vector similarity."""
texts = ["foo", "bar", "baz"]
docsearch = await FAISS.afrom_texts(texts, FakeEmbeddings())
index_to_id = docsearch.index_to_docstore_id
expected_docstore = InMemoryDocstore(
{
index_to_id[0]: Document(page_content="foo"),
index_to_id[1]: Document(page_content="bar"),
index_to_id[2]: Document(page_content="baz"),
}
)
assert docsearch.docstore.__dict__ == expected_docstore.__dict__
query_vec = await FakeEmbeddings().aembed_query(text="foo")
output = await docsearch.asimilarity_search_by_vector(query_vec, k=1)
assert output == [Document(page_content="foo")]
@pytest.mark.requires("faiss")
def test_faiss_vector_sim_with_score_threshold() -> None:
"""Test vector similarity."""
texts = ["foo", "bar", "baz"]
docsearch = FAISS.from_texts(texts, FakeEmbeddings())
index_to_id = docsearch.index_to_docstore_id
expected_docstore = InMemoryDocstore(
{
index_to_id[0]: Document(page_content="foo"),
index_to_id[1]: Document(page_content="bar"),
index_to_id[2]: Document(page_content="baz"),
}
)
assert docsearch.docstore.__dict__ == expected_docstore.__dict__
query_vec = FakeEmbeddings().embed_query(text="foo")
output = docsearch.similarity_search_by_vector(query_vec, k=2, score_threshold=0.2)
assert output == [Document(page_content="foo")]
@pytest.mark.requires("faiss")
async def test_faiss_vector_async_sim_with_score_threshold() -> None:
"""Test vector similarity."""
texts = ["foo", "bar", "baz"]
docsearch = await FAISS.afrom_texts(texts, FakeEmbeddings())
index_to_id = docsearch.index_to_docstore_id
expected_docstore = InMemoryDocstore(
{
index_to_id[0]: Document(page_content="foo"),
index_to_id[1]: Document(page_content="bar"),
index_to_id[2]: Document(page_content="baz"),
}
)
assert docsearch.docstore.__dict__ == expected_docstore.__dict__
query_vec = await FakeEmbeddings().aembed_query(text="foo")
output = await docsearch.asimilarity_search_by_vector(
query_vec, k=2, score_threshold=0.2
)
assert output == [Document(page_content="foo")]
@pytest.mark.requires("faiss")
def test_similarity_search_with_score_by_vector() -> None:
"""Test vector similarity with score by vector."""
texts = ["foo", "bar", "baz"]
docsearch = FAISS.from_texts(texts, FakeEmbeddings())
index_to_id = docsearch.index_to_docstore_id
expected_docstore = InMemoryDocstore(
{
index_to_id[0]: Document(page_content="foo"),
index_to_id[1]: Document(page_content="bar"),
index_to_id[2]: Document(page_content="baz"),
}
)
assert docsearch.docstore.__dict__ == expected_docstore.__dict__
query_vec = FakeEmbeddings().embed_query(text="foo")
output = docsearch.similarity_search_with_score_by_vector(query_vec, k=1)
assert len(output) == 1
assert output[0][0] == Document(page_content="foo")
@pytest.mark.requires("faiss")
async def test_similarity_async_search_with_score_by_vector() -> None:
"""Test vector similarity with score by vector."""
texts = ["foo", "bar", "baz"]
docsearch = await FAISS.afrom_texts(texts, FakeEmbeddings())
index_to_id = docsearch.index_to_docstore_id
expected_docstore = InMemoryDocstore(
{
index_to_id[0]: Document(page_content="foo"),
index_to_id[1]: Document(page_content="bar"),
index_to_id[2]: Document(page_content="baz"),
}
)
assert docsearch.docstore.__dict__ == expected_docstore.__dict__
query_vec = await FakeEmbeddings().aembed_query(text="foo")
output = await docsearch.asimilarity_search_with_score_by_vector(query_vec, k=1)
assert len(output) == 1
assert output[0][0] == Document(page_content="foo")
@pytest.mark.requires("faiss")
def test_similarity_search_with_score_by_vector_with_score_threshold() -> None:
"""Test vector similarity with score by vector."""
texts = ["foo", "bar", "baz"]
docsearch = FAISS.from_texts(texts, FakeEmbeddings())
index_to_id = docsearch.index_to_docstore_id
expected_docstore = InMemoryDocstore(
{
index_to_id[0]: Document(page_content="foo"),
index_to_id[1]: Document(page_content="bar"),
index_to_id[2]: Document(page_content="baz"),
}
)
assert docsearch.docstore.__dict__ == expected_docstore.__dict__
query_vec = FakeEmbeddings().embed_query(text="foo")
output = docsearch.similarity_search_with_score_by_vector(
query_vec,
k=2,
score_threshold=0.2,
)
assert len(output) == 1
assert output[0][0] == Document(page_content="foo")
assert output[0][1] < 0.2
@pytest.mark.requires("faiss")
async def test_sim_asearch_with_score_by_vector_with_score_threshold() -> None:
"""Test vector similarity with score by vector."""
texts = ["foo", "bar", "baz"]
docsearch = await FAISS.afrom_texts(texts, FakeEmbeddings())
index_to_id = docsearch.index_to_docstore_id
expected_docstore = InMemoryDocstore(
{
index_to_id[0]: Document(page_content="foo"),
index_to_id[1]: Document(page_content="bar"),
index_to_id[2]: Document(page_content="baz"),
}
)
assert docsearch.docstore.__dict__ == expected_docstore.__dict__
query_vec = await FakeEmbeddings().aembed_query(text="foo")
output = await docsearch.asimilarity_search_with_score_by_vector(
query_vec,
k=2,
score_threshold=0.2,
)
assert len(output) == 1
assert output[0][0] == Document(page_content="foo")
assert output[0][1] < 0.2
@pytest.mark.requires("faiss")
def test_faiss_mmr() -> None:
texts = ["foo", "foo", "fou", "foy"]
docsearch = FAISS.from_texts(texts, FakeEmbeddings())
query_vec = FakeEmbeddings().embed_query(text="foo")
# make sure we can have k > docstore size
output = docsearch.max_marginal_relevance_search_with_score_by_vector(
query_vec, k=10, lambda_mult=0.1
)
assert len(output) == len(texts)
assert output[0][0] == Document(page_content="foo")
assert output[0][1] == 0.0
assert output[1][0] != Document(page_content="foo")
@pytest.mark.requires("faiss")
async def test_faiss_async_mmr() -> None:
texts = ["foo", "foo", "fou", "foy"]
docsearch = await FAISS.afrom_texts(texts, FakeEmbeddings())
query_vec = await FakeEmbeddings().aembed_query(text="foo")
# make sure we can have k > docstore size
output = await docsearch.amax_marginal_relevance_search_with_score_by_vector(
query_vec, k=10, lambda_mult=0.1
)
assert len(output) == len(texts)
assert output[0][0] == Document(page_content="foo")
assert output[0][1] == 0.0
assert output[1][0] != Document(page_content="foo")
@pytest.mark.requires("faiss")
def test_faiss_mmr_with_metadatas() -> None:
texts = ["foo", "foo", "fou", "foy"]
metadatas = [{"page": i} for i in range(len(texts))]
docsearch = FAISS.from_texts(texts, FakeEmbeddings(), metadatas=metadatas)
query_vec = FakeEmbeddings().embed_query(text="foo")
output = docsearch.max_marginal_relevance_search_with_score_by_vector(
query_vec, k=10, lambda_mult=0.1
)
assert len(output) == len(texts)
assert output[0][0] == Document(page_content="foo", metadata={"page": 0})
assert output[0][1] == 0.0
assert output[1][0] != Document(page_content="foo", metadata={"page": 0})
@pytest.mark.requires("faiss")
async def test_faiss_async_mmr_with_metadatas() -> None:
texts = ["foo", "foo", "fou", "foy"]
metadatas = [{"page": i} for i in range(len(texts))]
docsearch = await FAISS.afrom_texts(texts, FakeEmbeddings(), metadatas=metadatas)
query_vec = await FakeEmbeddings().aembed_query(text="foo")
output = await docsearch.amax_marginal_relevance_search_with_score_by_vector(
query_vec, k=10, lambda_mult=0.1
)
assert len(output) == len(texts)
assert output[0][0] == Document(page_content="foo", metadata={"page": 0})
assert output[0][1] == 0.0
assert output[1][0] != Document(page_content="foo", metadata={"page": 0})
@pytest.mark.requires("faiss")
def test_faiss_mmr_with_metadatas_and_filter() -> None:
texts = ["foo", "foo", "fou", "foy"]
metadatas = [{"page": i} for i in range(len(texts))]
docsearch = FAISS.from_texts(texts, FakeEmbeddings(), metadatas=metadatas)
query_vec = FakeEmbeddings().embed_query(text="foo")
output = docsearch.max_marginal_relevance_search_with_score_by_vector(
query_vec, k=10, lambda_mult=0.1, filter={"page": 1}
)
assert len(output) == 1
assert output[0][0] == Document(page_content="foo", metadata={"page": 1})
assert output[0][1] == 0.0
@pytest.mark.requires("faiss")
async def test_faiss_async_mmr_with_metadatas_and_filter() -> None:
texts = ["foo", "foo", "fou", "foy"]
metadatas = [{"page": i} for i in range(len(texts))]
docsearch = await FAISS.afrom_texts(texts, FakeEmbeddings(), metadatas=metadatas)
query_vec = await FakeEmbeddings().aembed_query(text="foo")
output = await docsearch.amax_marginal_relevance_search_with_score_by_vector(
query_vec, k=10, lambda_mult=0.1, filter={"page": 1}
)
assert len(output) == 1
assert output[0][0] == Document(page_content="foo", metadata={"page": 1})
assert output[0][1] == 0.0
@pytest.mark.requires("faiss")
def test_faiss_mmr_with_metadatas_and_list_filter() -> None:
texts = ["foo", "foo", "fou", "foy"]
metadatas = [{"page": i} if i <= 3 else {"page": 3} for i in range(len(texts))]
docsearch = FAISS.from_texts(texts, FakeEmbeddings(), metadatas=metadatas)
query_vec = FakeEmbeddings().embed_query(text="foo")
output = docsearch.max_marginal_relevance_search_with_score_by_vector(
query_vec, k=10, lambda_mult=0.1, filter={"page": [0, 1, 2]}
)
assert len(output) == 3
assert output[0][0] == Document(page_content="foo", metadata={"page": 0})
assert output[0][1] == 0.0
assert output[1][0] != Document(page_content="foo", metadata={"page": 0})
@pytest.mark.requires("faiss")
async def test_faiss_async_mmr_with_metadatas_and_list_filter() -> None:
texts = ["foo", "foo", "fou", "foy"]
metadatas = [{"page": i} if i <= 3 else {"page": 3} for i in range(len(texts))]
docsearch = await FAISS.afrom_texts(texts, FakeEmbeddings(), metadatas=metadatas)
query_vec = await FakeEmbeddings().aembed_query(text="foo")
output = await docsearch.amax_marginal_relevance_search_with_score_by_vector(
query_vec, k=10, lambda_mult=0.1, filter={"page": [0, 1, 2]}
)
assert len(output) == 3
assert output[0][0] == Document(page_content="foo", metadata={"page": 0})
assert output[0][1] == 0.0
assert output[1][0] != Document(page_content="foo", metadata={"page": 0})
@pytest.mark.requires("faiss")
def test_faiss_with_metadatas() -> None:
"""Test end to end construction and search."""
texts = ["foo", "bar", "baz"]
metadatas = [{"page": i} for i in range(len(texts))]
docsearch = FAISS.from_texts(texts, FakeEmbeddings(), metadatas=metadatas)
expected_docstore = InMemoryDocstore(
{
docsearch.index_to_docstore_id[0]: Document(
page_content="foo", metadata={"page": 0}
),
docsearch.index_to_docstore_id[1]: Document(
page_content="bar", metadata={"page": 1}
),
docsearch.index_to_docstore_id[2]: Document(
page_content="baz", metadata={"page": 2}
),
}
)
assert docsearch.docstore.__dict__ == expected_docstore.__dict__
output = docsearch.similarity_search("foo", k=1)
assert output == [Document(page_content="foo", metadata={"page": 0})]
@pytest.mark.requires("faiss")
async def test_faiss_async_with_metadatas() -> None:
"""Test end to end construction and search."""
texts = ["foo", "bar", "baz"]
metadatas = [{"page": i} for i in range(len(texts))]
docsearch = await FAISS.afrom_texts(texts, FakeEmbeddings(), metadatas=metadatas)
expected_docstore = InMemoryDocstore(
{
docsearch.index_to_docstore_id[0]: Document(
page_content="foo", metadata={"page": 0}
),
docsearch.index_to_docstore_id[1]: Document(
page_content="bar", metadata={"page": 1}
),
docsearch.index_to_docstore_id[2]: Document(
page_content="baz", metadata={"page": 2}
),
}
)
assert docsearch.docstore.__dict__ == expected_docstore.__dict__
output = await docsearch.asimilarity_search("foo", k=1)
assert output == [Document(page_content="foo", metadata={"page": 0})]
@pytest.mark.requires("faiss")
def test_faiss_with_metadatas_and_filter() -> None:
texts = ["foo", "bar", "baz"]
metadatas = [{"page": i} for i in range(len(texts))]
docsearch = FAISS.from_texts(texts, FakeEmbeddings(), metadatas=metadatas)
expected_docstore = InMemoryDocstore(
{
docsearch.index_to_docstore_id[0]: Document(
page_content="foo", metadata={"page": 0}
),
docsearch.index_to_docstore_id[1]: Document(
page_content="bar", metadata={"page": 1}
),
docsearch.index_to_docstore_id[2]: Document(
page_content="baz", metadata={"page": 2}
),
}
)
assert docsearch.docstore.__dict__ == expected_docstore.__dict__
output = docsearch.similarity_search("foo", k=1, filter={"page": 1})
assert output == [Document(page_content="bar", metadata={"page": 1})]
@pytest.mark.requires("faiss")
async def test_faiss_async_with_metadatas_and_filter() -> None:
texts = ["foo", "bar", "baz"]
metadatas = [{"page": i} for i in range(len(texts))]
docsearch = await FAISS.afrom_texts(texts, FakeEmbeddings(), metadatas=metadatas)
expected_docstore = InMemoryDocstore(
{
docsearch.index_to_docstore_id[0]: Document(
page_content="foo", metadata={"page": 0}
),
docsearch.index_to_docstore_id[1]: Document(
page_content="bar", metadata={"page": 1}
),
docsearch.index_to_docstore_id[2]: Document(
page_content="baz", metadata={"page": 2}
),
}
)
assert docsearch.docstore.__dict__ == expected_docstore.__dict__
output = await docsearch.asimilarity_search("foo", k=1, filter={"page": 1})
assert output == [Document(page_content="bar", metadata={"page": 1})]
@pytest.mark.requires("faiss")
def test_faiss_with_metadatas_and_list_filter() -> None:
texts = ["foo", "bar", "baz", "foo", "qux"]
metadatas = [{"page": i} if i <= 3 else {"page": 3} for i in range(len(texts))]
docsearch = FAISS.from_texts(texts, FakeEmbeddings(), metadatas=metadatas)
expected_docstore = InMemoryDocstore(
{
docsearch.index_to_docstore_id[0]: Document(
page_content="foo", metadata={"page": 0}
),
docsearch.index_to_docstore_id[1]: Document(
page_content="bar", metadata={"page": 1}
),
docsearch.index_to_docstore_id[2]: Document(
page_content="baz", metadata={"page": 2}
),
docsearch.index_to_docstore_id[3]: Document(
page_content="foo", metadata={"page": 3}
),
docsearch.index_to_docstore_id[4]: Document(
page_content="qux", metadata={"page": 3}
),
}
)
assert docsearch.docstore.__dict__ == expected_docstore.__dict__
output = docsearch.similarity_search("foor", k=1, filter={"page": [0, 1, 2]})
assert output == [Document(page_content="foo", metadata={"page": 0})]
@pytest.mark.requires("faiss")
async def test_faiss_async_with_metadatas_and_list_filter() -> None:
texts = ["foo", "bar", "baz", "foo", "qux"]
metadatas = [{"page": i} if i <= 3 else {"page": 3} for i in range(len(texts))]
docsearch = await FAISS.afrom_texts(texts, FakeEmbeddings(), metadatas=metadatas)
expected_docstore = InMemoryDocstore(
{
docsearch.index_to_docstore_id[0]: Document(
page_content="foo", metadata={"page": 0}
),
docsearch.index_to_docstore_id[1]: Document(
page_content="bar", metadata={"page": 1}
),
docsearch.index_to_docstore_id[2]: Document(
page_content="baz", metadata={"page": 2}
),
docsearch.index_to_docstore_id[3]: Document(
page_content="foo", metadata={"page": 3}
),
docsearch.index_to_docstore_id[4]: Document(
page_content="qux", metadata={"page": 3}
),
}
)
assert docsearch.docstore.__dict__ == expected_docstore.__dict__
output = await docsearch.asimilarity_search("foor", k=1, filter={"page": [0, 1, 2]})
assert output == [Document(page_content="foo", metadata={"page": 0})]
@pytest.mark.requires("faiss")
def test_faiss_search_not_found() -> None:
"""Test what happens when document is not found."""
texts = ["foo", "bar", "baz"]
docsearch = FAISS.from_texts(texts, FakeEmbeddings())
# Get rid of the docstore to purposefully induce errors.
docsearch.docstore = InMemoryDocstore({})
with pytest.raises(ValueError):
docsearch.similarity_search("foo")
@pytest.mark.requires("faiss")
async def test_faiss_async_search_not_found() -> None:
"""Test what happens when document is not found."""
texts = ["foo", "bar", "baz"]
docsearch = await FAISS.afrom_texts(texts, FakeEmbeddings())
# Get rid of the docstore to purposefully induce errors.
docsearch.docstore = InMemoryDocstore({})
with pytest.raises(ValueError):
await docsearch.asimilarity_search("foo")
@pytest.mark.requires("faiss")
def test_faiss_add_texts() -> None:
"""Test end to end adding of texts."""
# Create initial doc store.
texts = ["foo", "bar", "baz"]
docsearch = FAISS.from_texts(texts, FakeEmbeddings())
# Test adding a similar document as before.
docsearch.add_texts(["foo"])
output = docsearch.similarity_search("foo", k=2)
assert output == [Document(page_content="foo"), Document(page_content="foo")]
@pytest.mark.requires("faiss")
async def test_faiss_async_add_texts() -> None:
"""Test end to end adding of texts."""
# Create initial doc store.
texts = ["foo", "bar", "baz"]
docsearch = await FAISS.afrom_texts(texts, FakeEmbeddings())
# Test adding a similar document as before.
await docsearch.aadd_texts(["foo"])
output = await docsearch.asimilarity_search("foo", k=2)
assert output == [Document(page_content="foo"), Document(page_content="foo")]
@pytest.mark.requires("faiss")
def test_faiss_add_texts_not_supported() -> None:
"""Test adding of texts to a docstore that doesn't support it."""
docsearch = FAISS(FakeEmbeddings(), None, FakeDocstore(), {})
with pytest.raises(ValueError):
docsearch.add_texts(["foo"])
@pytest.mark.requires("faiss")
async def test_faiss_async_add_texts_not_supported() -> None:
"""Test adding of texts to a docstore that doesn't support it."""
docsearch = FAISS(FakeEmbeddings(), None, FakeDocstore(), {})
with pytest.raises(ValueError):
await docsearch.aadd_texts(["foo"])
@pytest.mark.requires("faiss")
def test_faiss_local_save_load() -> None:
"""Test end to end serialization."""
texts = ["foo", "bar", "baz"]
docsearch = FAISS.from_texts(texts, FakeEmbeddings())
temp_timestamp = datetime.datetime.utcnow().strftime("%Y%m%d-%H%M%S")
with tempfile.TemporaryDirectory(suffix="_" + temp_timestamp + "/") as temp_folder:
docsearch.save_local(temp_folder)
new_docsearch = FAISS.load_local(temp_folder, FakeEmbeddings())
assert new_docsearch.index is not None
@pytest.mark.requires("faiss")
async def test_faiss_async_local_save_load() -> None:
"""Test end to end serialization."""
texts = ["foo", "bar", "baz"]
docsearch = await FAISS.afrom_texts(texts, FakeEmbeddings())
temp_timestamp = datetime.datetime.utcnow().strftime("%Y%m%d-%H%M%S")
with tempfile.TemporaryDirectory(suffix="_" + temp_timestamp + "/") as temp_folder:
docsearch.save_local(temp_folder)
new_docsearch = FAISS.load_local(temp_folder, FakeEmbeddings())
assert new_docsearch.index is not None
@pytest.mark.requires("faiss")
def test_faiss_similarity_search_with_relevance_scores() -> None:
"""Test the similarity search with normalized similarities."""
texts = ["foo", "bar", "baz"]
docsearch = FAISS.from_texts(
texts,
FakeEmbeddings(),
relevance_score_fn=lambda score: 1.0 - score / math.sqrt(2),
)
outputs = docsearch.similarity_search_with_relevance_scores("foo", k=1)
output, score = outputs[0]
assert output == Document(page_content="foo")
assert score == 1.0
@pytest.mark.requires("faiss")
async def test_faiss_async_similarity_search_with_relevance_scores() -> None:
"""Test the similarity search with normalized similarities."""
texts = ["foo", "bar", "baz"]
docsearch = await FAISS.afrom_texts(
texts,
FakeEmbeddings(),
relevance_score_fn=lambda score: 1.0 - score / math.sqrt(2),
)
outputs = await docsearch.asimilarity_search_with_relevance_scores("foo", k=1)
output, score = outputs[0]
assert output == Document(page_content="foo")
assert score == 1.0
@pytest.mark.requires("faiss")
def test_faiss_similarity_search_with_relevance_scores_with_threshold() -> None:
"""Test the similarity search with normalized similarities with score threshold."""
texts = ["foo", "bar", "baz"]
docsearch = FAISS.from_texts(
texts,
FakeEmbeddings(),
relevance_score_fn=lambda score: 1.0 - score / math.sqrt(2),
)
outputs = docsearch.similarity_search_with_relevance_scores(
"foo", k=2, score_threshold=0.5
)
assert len(outputs) == 1
output, score = outputs[0]
assert output == Document(page_content="foo")
assert score == 1.0
@pytest.mark.requires("faiss")
async def test_faiss_asimilarity_search_with_relevance_scores_with_threshold() -> None:
"""Test the similarity search with normalized similarities with score threshold."""
texts = ["foo", "bar", "baz"]
docsearch = await FAISS.afrom_texts(
texts,
FakeEmbeddings(),
relevance_score_fn=lambda score: 1.0 - score / math.sqrt(2),
)
outputs = await docsearch.asimilarity_search_with_relevance_scores(
"foo", k=2, score_threshold=0.5
)
assert len(outputs) == 1
output, score = outputs[0]
assert output == Document(page_content="foo")
assert score == 1.0
@pytest.mark.requires("faiss")
def test_faiss_invalid_normalize_fn() -> None:
"""Test the similarity search with normalized similarities."""
texts = ["foo", "bar", "baz"]
docsearch = FAISS.from_texts(
texts, FakeEmbeddings(), relevance_score_fn=lambda _: 2.0
)
with pytest.warns(Warning, match="scores must be between"):
docsearch.similarity_search_with_relevance_scores("foo", k=1)
@pytest.mark.requires("faiss")
async def test_faiss_async_invalid_normalize_fn() -> None:
"""Test the similarity search with normalized similarities."""
texts = ["foo", "bar", "baz"]
docsearch = await FAISS.afrom_texts(
texts, FakeEmbeddings(), relevance_score_fn=lambda _: 2.0
)
with pytest.warns(Warning, match="scores must be between"):
await docsearch.asimilarity_search_with_relevance_scores("foo", k=1)
@pytest.mark.requires("faiss")
def test_missing_normalize_score_fn() -> None:
"""Test doesn't perform similarity search without a valid distance strategy."""
texts = ["foo", "bar", "baz"]
faiss_instance = FAISS.from_texts(texts, FakeEmbeddings(), distance_strategy="fake")
with pytest.raises(ValueError):
faiss_instance.similarity_search_with_relevance_scores("foo", k=2)
@pytest.mark.requires("faiss")
async def test_async_missing_normalize_score_fn() -> None:
"""Test doesn't perform similarity search without a valid distance strategy."""
texts = ["foo", "bar", "baz"]
faiss_instance = await FAISS.afrom_texts(
texts, FakeEmbeddings(), distance_strategy="fake"
)
with pytest.raises(ValueError):
await faiss_instance.asimilarity_search_with_relevance_scores("foo", k=2)
@pytest.mark.requires("faiss")
def test_delete() -> None:
"""Test the similarity search with normalized similarities."""
ids = ["a", "b", "c"]
docsearch = FAISS.from_texts(["foo", "bar", "baz"], FakeEmbeddings(), ids=ids)
docsearch.delete(ids[1:2])
result = docsearch.similarity_search("bar", k=2)
assert sorted([d.page_content for d in result]) == ["baz", "foo"]
assert docsearch.index_to_docstore_id == {0: ids[0], 1: ids[2]}
@pytest.mark.requires("faiss")
async def test_async_delete() -> None:
"""Test the similarity search with normalized similarities."""
ids = ["a", "b", "c"]
docsearch = await FAISS.afrom_texts(
["foo", "bar", "baz"], FakeEmbeddings(), ids=ids
)
docsearch.delete(ids[1:2])
result = await docsearch.asimilarity_search("bar", k=2)
assert sorted([d.page_content for d in result]) == ["baz", "foo"]
assert docsearch.index_to_docstore_id == {0: ids[0], 1: ids[2]}

View File

@@ -0,0 +1,13 @@
from langchain_community import vectorstores
from langchain_core.vectorstores import VectorStore
def test_all_imports() -> None:
"""Simple test to make sure all things can be imported."""
for cls in vectorstores.__all__:
if cls not in [
"AlibabaCloudOpenSearchSettings",
"ClickhouseSettings",
"MyScaleSettings",
]:
assert issubclass(getattr(vectorstores, cls), VectorStore)

View File

@@ -0,0 +1,144 @@
import importlib
import json
import os
from typing import Any, Dict, List, Optional
from langchain_core.load.mapping import SERIALIZABLE_MAPPING
from langchain_core.load.serializable import Serializable
DEFAULT_NAMESPACES = ["langchain", "langchain_core", "langchain_community"]
class Reviver:
"""Reviver for JSON objects."""
def __init__(
self,
secrets_map: Optional[Dict[str, str]] = None,
valid_namespaces: Optional[List[str]] = None,
) -> None:
self.secrets_map = secrets_map or dict()
# By default only support langchain, but user can pass in additional namespaces
self.valid_namespaces = (
[*DEFAULT_NAMESPACES, *valid_namespaces]
if valid_namespaces
else DEFAULT_NAMESPACES
)
def __call__(self, value: Dict[str, Any]) -> Any:
if (
value.get("lc", None) == 1
and value.get("type", None) == "secret"
and value.get("id", None) is not None
):
[key] = value["id"]
if key in self.secrets_map:
return self.secrets_map[key]
else:
if key in os.environ and os.environ[key]:
return os.environ[key]
raise KeyError(f'Missing key "{key}" in load(secrets_map)')
if (
value.get("lc", None) == 1
and value.get("type", None) == "not_implemented"
and value.get("id", None) is not None
):
raise NotImplementedError(
"Trying to load an object that doesn't implement "
f"serialization: {value}"
)
if (
value.get("lc", None) == 1
and value.get("type", None) == "constructor"
and value.get("id", None) is not None
):
[*namespace, name] = value["id"]
if namespace[0] not in self.valid_namespaces:
raise ValueError(f"Invalid namespace: {value}")
# The root namespace "langchain" is not a valid identifier.
if len(namespace) == 1 and namespace[0] == "langchain":
raise ValueError(f"Invalid namespace: {value}")
# Get the importable path
key = tuple(namespace + [name])
if key not in SERIALIZABLE_MAPPING:
raise ValueError(
"Trying to deserialize something that cannot "
"be deserialized in current version of langchain-core: "
f"{key}"
)
import_path = SERIALIZABLE_MAPPING[key]
# Split into module and name
import_dir, import_obj = import_path[:-1], import_path[-1]
# Import module
mod = importlib.import_module(".".join(import_dir))
# Import class
cls = getattr(mod, import_obj)
# The class must be a subclass of Serializable.
if not issubclass(cls, Serializable):
raise ValueError(f"Invalid namespace: {value}")
# We don't need to recurse on kwargs
# as json.loads will do that for us.
kwargs = value.get("kwargs", dict())
return cls(**kwargs)
return value
def loads(
text: str,
*,
secrets_map: Optional[Dict[str, str]] = None,
valid_namespaces: Optional[List[str]] = None,
) -> Any:
"""Revive a LangChain class from a JSON string.
Equivalent to `load(json.loads(text))`.
Args:
text: The string to load.
secrets_map: A map of secrets to load.
valid_namespaces: A list of additional namespaces (modules)
to allow to be deserialized.
Returns:
Revived LangChain objects.
"""
return json.loads(text, object_hook=Reviver(secrets_map, valid_namespaces))
def load(
obj: Any,
*,
secrets_map: Optional[Dict[str, str]] = None,
valid_namespaces: Optional[List[str]] = None,
) -> Any:
"""Revive a LangChain class from a JSON object. Use this if you already
have a parsed JSON object, eg. from `json.load` or `orjson.loads`.
Args:
obj: The object to load.
secrets_map: A map of secrets to load.
valid_namespaces: A list of additional namespaces (modules)
to allow to be deserialized.
Returns:
Revived LangChain objects.
"""
reviver = Reviver(secrets_map, valid_namespaces)
def _load(obj: Any) -> Any:
if isinstance(obj, dict):
# Need to revive leaf nodes before reviving this node
loaded_obj = {k: _load(v) for k, v in obj.items()}
return reviver(loaded_obj)
if isinstance(obj, list):
return [_load(o) for o in obj]
return obj
return _load(obj)

View File

@@ -0,0 +1,49 @@
"""
**Utility functions** for LangChain.
These functions do not depend on any other LangChain module.
"""
from langchain_core.utils.env import get_from_dict_or_env, get_from_env
from langchain_core.utils.formatting import StrictFormatter, formatter
from langchain_core.utils.input import (
get_bolded_text,
get_color_mapping,
get_colored_text,
print_text,
)
from langchain_core.utils.loading import try_load_from_hub
from langchain_core.utils.strings import comma_list, stringify_dict, stringify_value
from langchain_core.utils.utils import (
build_extra_kwargs,
check_package_version,
convert_to_secret_str,
get_pydantic_field_names,
guard_import,
mock_now,
raise_for_status_with_text,
xor_args,
)
__all__ = [
"StrictFormatter",
"check_package_version",
"convert_to_secret_str",
"formatter",
"get_bolded_text",
"get_color_mapping",
"get_colored_text",
"get_pydantic_field_names",
"guard_import",
"mock_now",
"print_text",
"raise_for_status_with_text",
"xor_args",
"try_load_from_hub",
"build_extra_kwargs",
"get_from_env",
"get_from_dict_or_env",
"stringify_dict",
"comma_list",
"stringify_value",
]

View File

@@ -0,0 +1,45 @@
from __future__ import annotations
import os
from typing import Any, Dict, Optional
def env_var_is_set(env_var: str) -> bool:
"""Check if an environment variable is set.
Args:
env_var (str): The name of the environment variable.
Returns:
bool: True if the environment variable is set, False otherwise.
"""
return env_var in os.environ and os.environ[env_var] not in (
"",
"0",
"false",
"False",
)
def get_from_dict_or_env(
data: Dict[str, Any], key: str, env_key: str, default: Optional[str] = None
) -> str:
"""Get a value from a dictionary or an environment variable."""
if key in data and data[key]:
return data[key]
else:
return get_from_env(key, env_key, default=default)
def get_from_env(key: str, env_key: str, default: Optional[str] = None) -> str:
"""Get a value from a dictionary or an environment variable."""
if env_key in os.environ and os.environ[env_key]:
return os.environ[env_key]
elif default is not None:
return default
else:
raise ValueError(
f"Did not find {key}, please add an environment variable"
f" `{env_key}` which contains it, or pass"
f" `{key}` as a named parameter."
)

View File

@@ -0,0 +1,28 @@
from langchain_core.utils import __all__
EXPECTED_ALL = [
"StrictFormatter",
"check_package_version",
"convert_to_secret_str",
"formatter",
"get_bolded_text",
"get_color_mapping",
"get_colored_text",
"get_pydantic_field_names",
"guard_import",
"mock_now",
"print_text",
"raise_for_status_with_text",
"xor_args",
"try_load_from_hub",
"build_extra_kwargs",
"get_from_dict_or_env",
"get_from_env",
"stringify_dict",
"comma_list",
"stringify_value"
]
def test_all_imports() -> None:
assert set(__all__) == set(EXPECTED_ALL)

View File

@@ -0,0 +1,83 @@
"""**Callback handlers** allow listening to events in LangChain.
**Class hierarchy:**
.. code-block::
BaseCallbackHandler --> <name>CallbackHandler # Example: AimCallbackHandler
"""
from langchain_core.callbacks import StdOutCallbackHandler, StreamingStdOutCallbackHandler
from langchain_core.tracers.langchain import LangChainTracer
from langchain_core.tracers.context import (
collect_runs,
tracing_enabled,
tracing_v2_enabled,
)
from langchain_community.callbacks.aim_callback import AimCallbackHandler
from langchain_community.callbacks.argilla_callback import ArgillaCallbackHandler
from langchain_community.callbacks.arize_callback import ArizeCallbackHandler
from langchain_community.callbacks.arthur_callback import ArthurCallbackHandler
from langchain_community.callbacks.clearml_callback import ClearMLCallbackHandler
from langchain_community.callbacks.comet_ml_callback import CometCallbackHandler
from langchain_community.callbacks.context_callback import ContextCallbackHandler
from langchain.callbacks.file import FileCallbackHandler
from langchain_community.callbacks.flyte_callback import FlyteCallbackHandler
from langchain_community.callbacks.human import HumanApprovalCallbackHandler
from langchain_community.callbacks.infino_callback import InfinoCallbackHandler
from langchain_community.callbacks.labelstudio_callback import LabelStudioCallbackHandler
from langchain_community.callbacks.llmonitor_callback import LLMonitorCallbackHandler
from langchain_community.callbacks.mlflow_callback import MlflowCallbackHandler
from langchain_community.callbacks.openai_info import OpenAICallbackHandler
from langchain_community.callbacks.promptlayer_callback import PromptLayerCallbackHandler
from langchain_community.callbacks.sagemaker_callback import SageMakerCallbackHandler
from langchain.callbacks.streaming_aiter import AsyncIteratorCallbackHandler
from langchain.callbacks.streaming_stdout_final_only import (
FinalStreamingStdOutCallbackHandler,
)
from langchain_community.callbacks.streamlit import LLMThoughtLabeler, StreamlitCallbackHandler
from langchain_community.callbacks.trubrics_callback import TrubricsCallbackHandler
from langchain_community.callbacks.wandb_callback import WandbCallbackHandler
from langchain_community.callbacks.whylabs_callback import WhyLabsCallbackHandler
from langchain_community.callbacks.manager import (
get_openai_callback,
wandb_tracing_enabled,
)
__all__ = [
"AimCallbackHandler",
"ArgillaCallbackHandler",
"ArizeCallbackHandler",
"PromptLayerCallbackHandler",
"ArthurCallbackHandler",
"ClearMLCallbackHandler",
"CometCallbackHandler",
"ContextCallbackHandler",
"FileCallbackHandler",
"HumanApprovalCallbackHandler",
"InfinoCallbackHandler",
"MlflowCallbackHandler",
"LLMonitorCallbackHandler",
"OpenAICallbackHandler",
"StdOutCallbackHandler",
"AsyncIteratorCallbackHandler",
"StreamingStdOutCallbackHandler",
"FinalStreamingStdOutCallbackHandler",
"LLMThoughtLabeler",
"LangChainTracer",
"StreamlitCallbackHandler",
"WandbCallbackHandler",
"WhyLabsCallbackHandler",
"get_openai_callback",
"tracing_enabled",
"tracing_v2_enabled",
"collect_runs",
"wandb_tracing_enabled",
"FlyteCallbackHandler",
"SageMakerCallbackHandler",
"LabelStudioCallbackHandler",
"TrubricsCallbackHandler",
]

View File

@@ -0,0 +1,68 @@
from __future__ import annotations
from langchain_core.callbacks.manager import (
AsyncCallbackManager,
AsyncCallbackManagerForChainGroup,
AsyncCallbackManagerForChainRun,
AsyncCallbackManagerForLLMRun,
AsyncCallbackManagerForRetrieverRun,
AsyncCallbackManagerForToolRun,
AsyncParentRunManager,
AsyncRunManager,
BaseRunManager,
CallbackManager,
CallbackManagerForChainGroup,
CallbackManagerForChainRun,
CallbackManagerForLLMRun,
CallbackManagerForRetrieverRun,
CallbackManagerForToolRun,
Callbacks,
ParentRunManager,
RunManager,
ahandle_event,
atrace_as_chain_group,
handle_event,
trace_as_chain_group,
)
from langchain_core.tracers.context import (
collect_runs,
tracing_enabled,
tracing_v2_enabled,
)
from langchain_core.utils.env import env_var_is_set
from langchain_community.callbacks.manager import (
get_openai_callback,
wandb_tracing_enabled,
)
__all__ = [
"BaseRunManager",
"RunManager",
"ParentRunManager",
"AsyncRunManager",
"AsyncParentRunManager",
"CallbackManagerForLLMRun",
"AsyncCallbackManagerForLLMRun",
"CallbackManagerForChainRun",
"AsyncCallbackManagerForChainRun",
"CallbackManagerForToolRun",
"AsyncCallbackManagerForToolRun",
"CallbackManagerForRetrieverRun",
"AsyncCallbackManagerForRetrieverRun",
"CallbackManager",
"CallbackManagerForChainGroup",
"AsyncCallbackManager",
"AsyncCallbackManagerForChainGroup",
"tracing_enabled",
"tracing_v2_enabled",
"collect_runs",
"atrace_as_chain_group",
"trace_as_chain_group",
"handle_event",
"ahandle_event",
"Callbacks",
"env_var_is_set",
"get_openai_callback",
"wandb_tracing_enabled",
]

View File

@@ -0,0 +1,36 @@
from langchain.callbacks.manager import __all__
EXPECTED_ALL = [
"BaseRunManager",
"RunManager",
"ParentRunManager",
"AsyncRunManager",
"AsyncParentRunManager",
"CallbackManagerForLLMRun",
"AsyncCallbackManagerForLLMRun",
"CallbackManagerForChainRun",
"AsyncCallbackManagerForChainRun",
"CallbackManagerForToolRun",
"AsyncCallbackManagerForToolRun",
"CallbackManagerForRetrieverRun",
"AsyncCallbackManagerForRetrieverRun",
"CallbackManager",
"CallbackManagerForChainGroup",
"AsyncCallbackManager",
"AsyncCallbackManagerForChainGroup",
"tracing_enabled",
"tracing_v2_enabled",
"collect_runs",
"atrace_as_chain_group",
"trace_as_chain_group",
"handle_event",
"ahandle_event",
"env_var_is_set",
"Callbacks",
"get_openai_callback",
"wandb_tracing_enabled",
]
def test_all_imports() -> None:
assert set(__all__) == set(EXPECTED_ALL)

Some files were not shown because too many files have changed in this diff Show More