Commit Graph

251 Commits

Author SHA1 Message Date
Eugene Yurtsev
25fbe356b4
community[patch]: upgrade to recent version of mypy (#21616)
This PR upgrades community to a recent version of mypy. It inserts
`# type: ignore` comments on all existing failures.
2024-05-13 14:55:07 -04:00
ccurme
3bb9bec314
bedrock: add unit test for retriever (#21485)
This was implemented in
https://github.com/langchain-ai/langchain/pull/21349 but dropped before
merge.
2024-05-09 11:37:03 -04:00
Yash
cb31c3611f
Ndb enterprise (#21233)
Description: Adds NeuralDBClientVectorStore, our enterprise client, to
LangChain.

---------

Co-authored-by: kartikTAI <129414343+kartikTAI@users.noreply.github.com>
Co-authored-by: Kartik Sarangmath <kartik@thirdai.com>
2024-05-08 16:30:58 -07:00
Sokolov Fedor
f4ddf64faa
community: Add MarkdownifyTransformer to langchain_community.document_transformers (#21247)
- Added a new document transformer, MarkdownifyTransformer, which uses the
`markdownify` package with customizable options to convert HTML to
Markdown. It is similar to Html2TextTransformer but has more flexible
options, and I have noticed that MarkdownifyTransformer sometimes
performs better than the html2text one, which is why I use markdownify in
my project.
- Added docs and tests

- Usage:
```python
from langchain_community.document_transformers import MarkdownifyTransformer

markdownify = MarkdownifyTransformer()
docs_transform = markdownify.transform_documents(docs)
```

- Example of better performance on simple task, that I've noticed:
```
<html>
<head><title>Reports on product movement</title></head>
<body>
<p data-block-key="2wst7">The reports on product movement will be useful for forming supplier orders and controlling outcomes.</p>
</body>
```
**Html2TextTransformer**: 
```python
[Document(page_content='The reports on product movement will be useful for forming supplier orders and\ncontrolling outcomes.\n\n')]
# Here we can see 'and\ncontrolling', which has extra '\n' in it
```
**MarkdownifyTransformer**:
```python
[Document(page_content='Reports on product movement\n\nThe reports on product movement will be useful for forming supplier orders and controlling outcomes.')]
```

---------

Co-authored-by: Sokolov Fedor <f.sokolov@sokolov-macbook.bbrouter>
Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
Co-authored-by: Sokolov Fedor <f.sokolov@sokolov-macbook.local>
Co-authored-by: Sokolov Fedor <f.sokolov@192.168.1.6>
2024-05-08 14:45:13 -07:00
Eugene Yurtsev
f92006de3c
multiple: langchain 0.2 in master (#21191)
0.2rc 

migrations

- [x] Move memory
- [x] Move remaining retrievers
- [x] graph_qa chains
- [x] some dependency from evaluation code potentially on math utils
- [x] Move openapi chain from `langchain.chains.api.openapi` to
`langchain_community.chains.openapi`
- [x] Migrate `langchain.chains.ernie_functions` to
`langchain_community.chains.ernie_functions`
- [x] migrate `langchain/chains/llm_requests.py` to
`langchain_community.chains.llm_requests`
- [x] Moving `langchain_community.cross_encoders.base:BaseCrossEncoder`
->
`langchain_community.retrievers.document_compressors.cross_encoder:BaseCrossEncoder`
(namespace not ideal, but it needs to be moved to `langchain` to avoid
circular deps)
- [x] unit tests langchain -- add pytest.mark.community to some unit
tests that will stay in langchain
- [x] unit tests community -- move unit tests that depend on community
to community
- [x] mv integration tests that depend on community to community
- [x] mypy checks

Other todo

- [x] Make deprecation warnings not noisy (need to use warn deprecated
and check that things are implemented properly)
- [x] Update deprecation messages with timeline for code removal (likely
we actually won't be removing things until 0.4 release) -- will give
people more time to transition their code.
- [ ] Add information to deprecation warning to show users how to
migrate their code base using langchain-cli
- [ ] Remove any unnecessary requirements in langchain (e.g., is
SQLAlchemy required?)

---------

Co-authored-by: Erick Friis <erick@langchain.dev>
2024-05-08 16:46:52 -04:00
Eugene Yurtsev
6a1d61dbf1
community[patch]: Fix in memory vectorstore to take into account ids when adding docs (#21384)
Should respect `ids` if passed
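
A minimal sketch of the behaviour described here, using a hypothetical `ToyInMemoryStore` rather than the actual vector store class: caller-supplied `ids` are used as keys, and random IDs are generated only when none are passed.

```python
import uuid
from typing import Dict, List, Optional


class ToyInMemoryStore:
    """Hypothetical stand-in for an in-memory store (not the LangChain class)."""

    def __init__(self) -> None:
        self.store: Dict[str, str] = {}

    def add_texts(self, texts: List[str], ids: Optional[List[str]] = None) -> List[str]:
        # Respect caller-supplied ids; generate fresh ones only when none are given.
        if ids is None:
            ids = [str(uuid.uuid4()) for _ in texts]
        if len(ids) != len(texts):
            raise ValueError("ids and texts must have the same length")
        for doc_id, text in zip(ids, texts):
            self.store[doc_id] = text  # re-using an id overwrites the old entry
        return ids


toy_store = ToyInMemoryStore()
returned_ids = toy_store.add_texts(["foo", "bar"], ids=["a", "b"])
```

Keying on the supplied IDs means that adding a document under an existing ID replaces the earlier entry instead of creating a duplicate.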
2024-05-07 15:05:16 -04:00
nrpd25
95cc8e3fc3
premai[patch]: Standardized model init args (#21308)
[Standardized model init args
#20085](https://github.com/langchain-ai/langchain/issues/20085)
- Enable premai chat model to be initialized with `model_name` as an
alias for `model`, `api_key` as an alias for `premai_api_key`.
- Add initialization test `test_premai_initialization`
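
The aliasing described above can be sketched in plain Python. This is an illustration only: the real chat model presumably relies on Pydantic field aliases, and `ToyChatModel` and `_ALIASES` are hypothetical names.

```python
from typing import Any, Dict

# Maps each accepted alias to its canonical attribute name (pairs from the commit message).
_ALIASES: Dict[str, str] = {"model_name": "model", "api_key": "premai_api_key"}


class ToyChatModel:
    """Hypothetical model that accepts canonical names or their aliases."""

    def __init__(self, **kwargs: Any) -> None:
        resolved: Dict[str, Any] = {}
        for key, value in kwargs.items():
            canonical = _ALIASES.get(key, key)
            if canonical in resolved:
                raise TypeError(f"{canonical!r} was given more than once")
            resolved[canonical] = value
        self.model: str = resolved["model"]
        self.premai_api_key: str = resolved["premai_api_key"]


by_alias = ToyChatModel(model_name="some-model", api_key="secret")
by_name = ToyChatModel(model="some-model", premai_api_key="secret")
```

Both spellings set the same underlying attributes, which is the property an initialization test would want to check.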

---------

Co-authored-by: Chester Curme <chester.curme@gmail.com>
2024-05-06 18:12:29 -04:00
Jorge Piedrahita Ortiz
e65652c3e8
community: add SambaNova embeddings integration (#21227)
- **Description:**  SambaNova hosted embeddings integration
2024-05-06 13:29:59 -07:00
Jorge Piedrahita Ortiz
df1c10260c
community: minor changes sambanova integration (#21231)
- **Description:** Fix variable names in the root validator that prevented
credentials from being passed as named parameters when instantiating the
LLMs. Also add SambaNova's Sambaverse and SambaStudio LLMs to `__init__.py`
so they can be imported from the module.
2024-05-06 13:28:35 -07:00
Mark Cusack
060987d755
community[minor]: Add indexing via locality sensitive hashing to the Yellowbrick vector store (#20856)
- **Description:** Add LSH-based indexing to the Yellowbrick vector
store module
- **Twitter handle:** @markcusack
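
For readers unfamiliar with the technique, here is a minimal sketch of generic random-hyperplane LSH. It is not necessarily the scheme the Yellowbrick module uses; all function and variable names are hypothetical.

```python
import random
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def make_hyperplanes(num_planes: int, dim: int, seed: int = 0) -> List[List[float]]:
    """Draw random hyperplane normals; the seed keeps the sketch reproducible."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_planes)]


def lsh_signature(vector: Sequence[float], planes: List[List[float]]) -> Tuple[int, ...]:
    # One bit per hyperplane: which side of the plane the vector falls on.
    return tuple(
        1 if sum(v * p for v, p in zip(vector, plane)) >= 0 else 0 for plane in planes
    )


planes = make_hyperplanes(num_planes=8, dim=3)
buckets: Dict[Tuple[int, ...], List[str]] = defaultdict(list)
vectors = {"a": [1.0, 0.0, 0.0], "a_scaled": [2.0, 0.0, 0.0], "c": [0.0, 1.0, 0.0]}
for name, vec in vectors.items():
    buckets[lsh_signature(vec, planes)].append(name)

# A query is compared only against vectors that share its bucket.
candidates = buckets[lsh_signature([3.0, 0.0, 0.0], planes)]
```

Vectors pointing in the same direction always share a signature, so the index narrows a search to one bucket instead of scanning every stored vector.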

---------

Co-authored-by: markcusack <markcusack@markcusacksmac.lan>
Co-authored-by: markcusack <markcusack@Mark-Cusack-sMac.local>
Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>
2024-05-06 20:18:02 +00:00
Param Singh
fee91d43b7
baichuan[patch]: standardize chat init args (#21298)
community:baichuan[patch]: standardize init args

Updated `baichuan_api_key` so that it is aliased to `api_key`. Added a test
that the alias continues to set the same underlying attribute; the test
checks for `SecretStr`.

Updated `temperature` to use a Pydantic `Field` and added a unit test.

Related to https://github.com/langchain-ai/langchain/issues/20085
2024-05-06 18:33:57 +00:00
Rohan Aggarwal
8021d2a2ab
community[minor]: Oraclevs integration (#21123)
- Oracle AI Vector Search
Oracle AI Vector Search is designed for Artificial Intelligence (AI)
workloads and allows you to query data based on semantics rather than
keywords. One of its biggest benefits is that
semantic search on unstructured data can be combined with relational
search on business data in a single system. This is not only powerful
but also significantly more effective, because you don't need to add a
specialized vector database, eliminating the pain of data fragmentation
between multiple systems.

This pull request adds the following functionality:
- Oracle AI Vector Search: Vector Store
- Oracle AI Vector Search: Document Loader
- Oracle AI Vector Search: Document Splitter
- Oracle AI Vector Search: Summary
- Oracle AI Vector Search: Oracle Embeddings


- We have added unit tests and have our own local unit test suite which
verifies all the code is correct. We have made sure to add guides for
each of the components and one end to end guide that shows how the
entire thing runs.


- We have made sure that make format and make lint run clean.

---------

Co-authored-by: skmishraoracle <shailendra.mishra@oracle.com>
Co-authored-by: hroyofc <harichandan.roy@oracle.com>
Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
Co-authored-by: Bagatur <baskaryan@gmail.com>
2024-05-04 03:15:35 +00:00
Eugene Yurtsev
c9119b0e75
langchain[patch],community[minor]: Move some unit tests from langchain to community, use core for fake models (#21190)
2024-05-02 09:57:52 -04:00
Eugene Yurtsev
bec3eee3fa
langchain[patch]: Migrate retrievers to use optional langchain community imports (#21155)
2024-05-01 14:44:44 -04:00
East Agile
2a6f78a53f
community[minor]: Rememberizer retriever (#20052)
**Description:**
This pull request adds an integration with the Rememberizer API through a
custom retriever.
It lets users of LangChain applications load and sync
their data from Dropbox, Google Drive, Slack, and their hard drive into a
vector database that LangChain can query. Queries involve sending text
chunks generated within LangChain and retrieving a collection of
semantically relevant user data for inclusion in LLM prompts.
User knowledge dramatically improves AI applications.
The Rememberizer integration also gives users access to general-purpose
vectorized data such as Reddit channel discussions and US
patents.

**Issue:**
N/A

**Dependencies:**
N/A

**Twitter handle:**
https://twitter.com/Rememberizer
2024-05-01 10:41:44 -04:00
MacanPN
0f7f448603
community[patch]: add delete() method to AzureSearch vector store (#21127)
**Issue:**
Currently the `AzureSearch` vector store does not implement a `delete` method.
This PR implements it, which also makes the store compatible with the
LangChain indexer.

**Dependencies:**
None

**Twitter handle:**
@martintriska1

---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
2024-04-30 23:46:18 +00:00
Cahid Arda Öz
cc6191cb90
community[minor]: Add support for Upstash Vector (#20824)
## Description

Adding `UpstashVectorStore` to utilize [Upstash
Vector](https://upstash.com/docs/vector/overall/getstarted)!

#17012 was opened to add Upstash Vector to langchain but was closed to
wait for filtering. Filtering has now been added to Upstash Vector, so we
are opening a new PR. Additionally, an [embedding
feature](https://upstash.com/docs/vector/features/embeddingmodels) was
added, and we support it in our vector store as well.

## Dependencies

[upstash-vector](https://pypi.org/project/upstash-vector/) should be
installed to use `UpstashVectorStore`. Didn't update dependencies
because of [this comment in the previous
PR](https://github.com/langchain-ai/langchain/pull/17012#pullrequestreview-1876522450).

## Tests

Tests are added and they pass. Tests are naturally network bound since
Upstash Vector is offered through an API.

There was [a discussion in the previous PR about mocking the
unittests](https://github.com/langchain-ai/langchain/pull/17012#pullrequestreview-1891820567).
We didn't make changes to this end yet. We can update the tests if you
can explain how the tests should be mocked.

---------

Co-authored-by: ytkimirti <yusuftaha9@gmail.com>
Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
Co-authored-by: Bagatur <baskaryan@gmail.com>
2024-04-29 17:25:01 -04:00
chyroc
3e241956d3
community[minor]: add coze chat model (#20770)
add coze chat model, to call coze.com apis
2024-04-29 12:26:16 -04:00
Patrick McFadin
3331865f6b
community[minor]: add Cassandra Database Toolkit (#20246)
**Description**: ToolKit and Tools for accessing data in a Cassandra
Database primarily for Agent integration. Initially, this includes the
following tools:
- `cassandra_db_schema` Gathers all schema information for the connected
database or a specific schema. Critical for the agent when determining
actions.
- `cassandra_db_select_table_data` Selects data from a specific keyspace
and table. The agent can pass parameters for a predicate and limits on
the number of returned records.
- `cassandra_db_query` Experimental alternative to
`cassandra_db_select_table_data` which takes a query string completely
formed by the agent instead of parameters. May be removed in future
versions.

Includes unit test and two notebooks to demonstrate usage. 

**Dependencies**: cassio
**Twitter handle**: @PatrickMcFadin

---------

Co-authored-by: Phil Miesle <phil.miesle@datastax.com>
Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
Co-authored-by: Bagatur <baskaryan@gmail.com>
2024-04-29 15:51:43 +00:00
Igor Brai
b3e74f2b98
community[minor]: add mojeek search util (#20922)
**Description:** This pull request introduces a new feature to community
tools, enhancing its search capabilities by integrating the Mojeek
search engine
**Dependencies:** None

---------

Co-authored-by: Igor Brai <igor@mojeek.com>
Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
Co-authored-by: ccurme <chester.curme@gmail.com>
2024-04-29 15:49:53 +00:00
Leonid Ganeline
dc7c06bc07
community[minor]: import fix (#20995)
Issue: When a third-party package is not installed and the user needs to
`pip install <package>`, an `ImportError` is usually raised.
Sometimes, however, a `ValueError` or `ModuleNotFoundError` is raised
instead, which is inconsistent.
Change: replaced `ValueError` and `ModuleNotFoundError` with
`ImportError` wherever we raise an error with the `pip install <package>`
message.
Note: Ideally, we would replace all `try: import... except... raise ...` blocks with
helper functions like `import_aim`, or just use the existing
[langchain_core.utils.utils.guard_import](https://api.python.langchain.com/en/latest/utils/langchain_core.utils.utils.guard_import.html#langchain_core.utils.utils.guard_import).
But that would be a much bigger refactoring. @baskaryan, please advise on
this.
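
A minimal sketch of the kind of helper the note has in mind. `import_or_raise` is a hypothetical function written for illustration; it is not the `guard_import` implementation.

```python
import importlib
from types import ModuleType
from typing import Optional


def import_or_raise(module_name: str, pip_name: Optional[str] = None) -> ModuleType:
    """Hypothetical helper: import a module or raise ImportError with an install hint."""
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:  # also catches ModuleNotFoundError, a subclass
        package = pip_name or module_name
        raise ImportError(
            f"Could not import {module_name}. "
            f"Please install it with `pip install {package}`."
        ) from exc


json_module = import_or_raise("json")  # stdlib module, always importable
```

Centralising the check in one helper keeps both the exception type and the install message consistent across integrations.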
2024-04-29 10:32:50 -04:00
WilliamEspegren
804390ba4b
community: Spider integration (#20937)
Added the [Spider.cloud](https://spider.cloud) document loader.
[Spider](https://github.com/spider-rs/spider) is the
[fastest](https://github.com/spider-rs/spider/blob/main/benches/BENCHMARKS.md)
and cheapest crawler that returns LLM-ready data.

```
- **Description:** Adds Spider data loader
- **Dependencies:** spider-client
- **Twitter handle:** @WilliamEspegren 
```

---------

Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
Co-authored-by: Bagatur <baskaryan@gmail.com>
Co-authored-by: = <=>
Co-authored-by: Chester Curme <chester.curme@gmail.com>
2024-04-27 21:45:03 +00:00
Chip Davis
e818c75f8a
infra: test directory loader multithreaded (#20281)
This is a unit test for #20230 which was a fix for using multithreaded
mode with directory loader @eyurtsev
2024-04-26 19:16:47 -07:00
Matt
28df4750ef
community[patch]: Add initial tests for AzureSearch vector store (#17663)
**Description:** AzureSearch vector store has no tests. This PR adds
initial tests to validate the code can be imported and used.
**Issue:** N/A
**Dependencies:** azure-search-documents and azure-identity are added as
optional dependencies for testing

---------

Co-authored-by: Matt Gotteiner <[email protected]>
Co-authored-by: Bagatur <baskaryan@gmail.com>
2024-04-25 20:42:01 +00:00
am-kinetica
b54b19ba1c
community[minor]: Implemented Kinetica Document Loader and added notebooks (#20002)
- **Kinetica Document Loader**: "community: a class to load
Documents from Kinetica"
- **Description:** implemented KineticaLoader in `kinetica_loader.py`
- **Dependencies:** install the Kinetica API using
`pip install gpudb==7.2.0.1`
2024-04-25 13:39:00 -07:00
Jingpan Xiong
1202017c56
community[minor]: Add relyt vector database (#20316)
Co-authored-by: kaka <kaka@zbyte-inc.cloud>
Co-authored-by: Bagatur <baskaryan@gmail.com>
Co-authored-by: jingsi <jingsi@leadincloud.com>
Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
2024-04-25 19:49:29 +00:00
ccurme
b8db73233c
core, community: deprecate tool.__call__ (#20900)
Does not update docs.
2024-04-25 14:50:39 -04:00
Joan Fontanals
baefbfb14e
community[minor]: add Jina Reranker in retrievers module (#19406)
- **Description:** Adapt JinaEmbeddings to run with the new Jina AI
Rerank API
- **Twitter handle:** https://twitter.com/JinaAI_


---------

Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
Co-authored-by: Bagatur <baskaryan@gmail.com>
2024-04-25 10:27:10 -07:00
Mish Ushakov
6ccecf2363
community[minor]: added Browserbase loader (#20478)
2024-04-25 01:11:03 +00:00
ccurme
481d3855dc
patch: remove usage of llm, chat model __call__ (#20788)
- `llm(prompt)` -> `llm.invoke(prompt)`
- `llm(prompt=prompt)` -> `llm.invoke(prompt)` (same with `messages=`)
- `llm(prompt, callbacks=callbacks)` -> `llm.invoke(prompt,
config={"callbacks": callbacks})`
- `llm(prompt, **kwargs)` -> `llm.invoke(prompt, **kwargs)`
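
The migration pattern above can be illustrated with a toy class. `ToyLLM` is hypothetical and stands in for any model whose `__call__` is deprecated in favour of `invoke`; it is not a LangChain class.

```python
import warnings
from typing import Any


class ToyLLM:
    """Hypothetical model used only to illustrate the migration."""

    def invoke(self, prompt: str, **kwargs: Any) -> str:
        return f"echo: {prompt}"

    def __call__(self, prompt: str, **kwargs: Any) -> str:
        # Legacy entry point: warn, then delegate to invoke().
        warnings.warn(
            "__call__ is deprecated; use invoke() instead",
            DeprecationWarning,
            stacklevel=2,
        )
        return self.invoke(prompt, **kwargs)


toy_llm = ToyLLM()
new_style = toy_llm.invoke("hi")
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    old_style = toy_llm("hi")  # still works, but records a DeprecationWarning
```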
2024-04-24 19:39:23 -04:00
Raghav Dixit
9b7fb381a4
community[patch]: LanceDB integration patch update (#20686)
Description:

- Added functionality: delete, index creation, using an existing
connection object, etc.
- Updated usage
- Added LanceDB cloud OSS support

`make lint_diff` and `make test` checks done
2024-04-24 16:27:43 -07:00
Alex Sherstinsky
12e5ec6de3
community: Support both Predibase SDK-v1 and SDK-v2 in Predibase-LangChain integration (#20859)
2024-04-24 13:31:01 -07:00
JeffKatzy
5ab3f9a995
community[patch]: standardize chat init args (#20844)
community:perplexity[patch]: standardize init args

Updated `pplx_api_key` and `request_timeout` so that they are aliased to
`api_key` and `timeout`, respectively. Added a test that both continue to
set the same underlying attributes.

Related to
[20085](https://github.com/langchain-ai/langchain/issues/20085)

---------

Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
2024-04-24 12:26:05 -07:00
Eugene Yurtsev
30e48c9878
core[patch],community[patch]: Move file chat history back to community (#20834)
Marking as patch since we haven't had releases in between. This just reverts part of a PR from yesterday.
2024-04-24 12:47:25 -04:00
Eugene Yurtsev
645b1e142e
core[minor],langchain[patch],community[patch]: Move InMemory and File implementations of Chat History to core (#20752)
This PR moves the implementations for chat history to core, so that it's
easier to determine which dependencies need to be broken and where to add
deprecation warnings.
2024-04-23 10:22:11 -04:00
ccurme
c010ec8b71
patch: deprecate (a)get_relevant_documents (#20477)
- `.get_relevant_documents(query)` -> `.invoke(query)`
- `.get_relevant_documents(query=query)` -> `.invoke(query)`
- `.get_relevant_documents(query, callbacks=callbacks)` ->
`.invoke(query, config={"callbacks": callbacks})`
- `.get_relevant_documents(query, **kwargs)` -> `.invoke(query,
**kwargs)`
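
A sketch of how the legacy keyword arguments map onto `invoke`. `ToyRetriever` is hypothetical, and plain callables stand in for callback handlers.

```python
from typing import Any, Callable, Dict, List, Optional


class ToyRetriever:
    """Hypothetical retriever; it only reports what it was asked for."""

    def invoke(
        self, query: str, config: Optional[Dict[str, Any]] = None, **kwargs: Any
    ) -> List[str]:
        for callback in (config or {}).get("callbacks", []):
            callback(query)
        return [f"doc about {query}"]

    def get_relevant_documents(
        self,
        query: str,
        *,
        callbacks: Optional[List[Callable[[str], Any]]] = None,
        **kwargs: Any,
    ) -> List[str]:
        # Legacy signature: callbacks move into config, other kwargs pass through.
        config = {"callbacks": callbacks} if callbacks else None
        return self.invoke(query, config=config, **kwargs)


seen: List[str] = []
retriever = ToyRetriever()
docs_new = retriever.invoke("cats", config={"callbacks": [seen.append]})
docs_old = retriever.get_relevant_documents("cats", callbacks=[seen.append])
```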

---------

Co-authored-by: Erick Friis <erick@langchain.dev>
2024-04-22 11:14:53 -04:00
shumway743
cb6e5e56c2
community[minor]: add graph store implementation for apache age (#20582)
**Description:** implemented GraphStore class for Apache Age graph db

**Dependencies:** depends on psycopg2

Unit and integration tests included. Formatting and linting have been
run.

---------

Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
Co-authored-by: Bagatur <baskaryan@gmail.com>
2024-04-20 14:31:04 -07:00
Lance Martin
d5c22b80a5
community[patch]: Fix Ollama for LLaMA3 (#20624)
We see verbose generations w/ LLaMA3 and Ollama - 

https://smith.langchain.com/public/88c4cd21-3d57-4229-96fe-53443398ca99/r

--- 

The fix addresses the case where `stop` was being set to an empty list:
the stream then had no conditions under which to stop, which could lead to
excessive or unintended output.

Test LLaMA2 - 

https://smith.langchain.com/public/57dfc64a-591b-46fa-a1cd-8783acaefea2/r

Test LLaMA3 - 

https://smith.langchain.com/public/76ff5f47-ac89-4772-a7d2-5caa907d3fd6/r

https://smith.langchain.com/public/a31d2fad-9094-4c93-949a-964b27630ccb/r

Test Mistral -

https://smith.langchain.com/public/a4fe7114-c308-4317-b9fd-6c86d31f1c5b/r

---------

Co-authored-by: Erick Friis <erick@langchain.dev>
2024-04-19 00:20:32 +00:00
Pengcheng Liu
ecd19a9e58
community[patch]: Add function call support in Tongyi chat model. (#20119)
- **Description:** This PR adds function calling support to the Tongyi chat
model.
- **Issue:** None
- **Dependencies:** None
- **Twitter handle:** None

Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
2024-04-17 20:42:23 +00:00
Sevin F. Varoglu
3f156e0ece
community[minor]: add ChatOctoAI (#20059)
This PR adds ChatOctoAI, a chat model integration for OctoAI.
2024-04-17 03:20:56 -07:00
pjb157
479be3cc91
community[minor]: Unify Titan Takeoff Integrations and Adding Embedding Support (#18775)
**Community: Unify Titan Takeoff Integrations and Adding Embedding
Support**

**Description:**
Neither of the integrations in the community folder reflects the current
Titan Takeoff. The two integrations (TitanTakeoffPro and
TitanTakeoff) were causing confusion with clients, so we have moved the code
into one place and created an alias for backwards compatibility. We added the
Takeoff Client Python package to do the bulk of the work with the
requests, because this package is actively updated with new
versions of Takeoff. As a result, this integration should be far more robust
and should not degrade as badly over time.

**Issue:**
Fixes bugs in the old Titan integrations and unifies the code, with added
unit test coverage to avoid future problems.

**Dependencies:**
Added the optional dependency takeoff-client. All imports still work without
the dependency, including the Titan Takeoff classes, but initialisation
fails if takeoff-client has not been pip installed.

**Twitter**
@MeryemArik9

Thanks all :)

---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
2024-04-17 01:43:35 +00:00
sdan
a7c5e41443
community[minor]: Added VLite as VectorStore (#20245)
Support [VLite](https://github.com/sdan/vlite) as a new VectorStore
type.

**Description**:
vlite is a simple and blazing fast vector database (vdb) made with numpy.
It abstracts a lot of the functionality around using a vdb in the
retrieval augmented generation (RAG) pipeline, such as embeddings
generation, chunking, and file processing, while still giving developers
the ability to change how they're made/stored.

**Before submitting**:
Added tests
[here](c09c2ebd5c/libs/community/tests/integration_tests/vectorstores/test_vlite.py)
Added ipython notebook
[here](c09c2ebd5c/docs/docs/integrations/vectorstores/vlite.ipynb)
Added simple docs on how to use
[here](c09c2ebd5c/docs/docs/integrations/providers/vlite.mdx)

**Profiles**

Maintainers: @sdan
Twitter handles: [@sdand](https://x.com/sdand)

---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
2024-04-17 01:24:38 +00:00
Benito Geordie
57b226532d
community[minor]: Added integrations for ThirdAI's NeuralDB as a Retriever (#17334)
**Description:** Adds ThirdAI NeuralDB retriever integration. NeuralDB
is a CPU-friendly and fine-tunable text retrieval engine. We previously
added a vector store integration but we think that it will be easier for
our customers if they can also find us under
langchain-community/retrievers.

---------

Co-authored-by: kartikTAI <129414343+kartikTAI@users.noreply.github.com>
Co-authored-by: Kartik Sarangmath <kartik@thirdai.com>
2024-04-16 16:36:55 -07:00
Dhruv Chawla
d6d559d50d
community[minor]: add UpTrainCallbackHandler (#19956)
- **Description:** 
This PR adds a callback handler for UpTrain. It performs evaluations in
the RAG pipeline to check the quality of retrieved documents, generated
queries and responses.

- **Dependencies:** 
    - The UpTrainCallbackHandler requires the uptrain package

---------

Co-authored-by: Eugene Yurtsev <eugene@langchain.dev>
2024-04-16 19:32:03 +00:00
Ravindu Somawansa
5acc7ba622
community[minor]: Add glue catalog loader (#20220)
Add Glue Catalog loader
2024-04-16 11:39:23 -04:00
Juan Carlos José Camacho
450c458f8f
community[minor]: Add Dataherald tool (#19680)
**Description:** Integrate the [dataherald](https://www.dataherald.com)
tool, a natural language-to-SQL tool.
**Dependencies:** Install the dataherald SDK to use it:
```
pip install dataherald
```

---------

Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
Co-authored-by: Bagatur <baskaryan@gmail.com>
Co-authored-by: Christophe Bornet <cbornet@hotmail.com>
2024-04-13 23:27:16 +00:00
Egor Krasheninnikov
c8391d4ff1
community[patch]: Fix YandexGPT embeddings (#19720)
Fix of YandexGPT embeddings. 

The current version uses a single `model_name` for queries and
documents, essentially making the `embed_documents` and `embed_query`
methods the same. Yandex has a different endpoint (`model_uri`) for
encoding documents, see
[this](https://yandex.cloud/en/docs/yandexgpt/concepts/embeddings). The
bug may impact retrievers built with `YandexGPTEmbeddings` (for instance
FAISS database as retriever) since they use both `embed_documents` and
`embed_query`.

A simple snippet to test the behaviour:
```python
from langchain_community.embeddings.yandex import YandexGPTEmbeddings
embeddings = YandexGPTEmbeddings()
q_emb = embeddings.embed_query('hello world')
doc_emb = embeddings.embed_documents(['hello world', 'hello world'])
q_emb == doc_emb[0]
```
The response is `True` with the current version and `False` with the
changes I made.


Twitter: @egor_krash

---------

Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
Co-authored-by: Bagatur <baskaryan@gmail.com>
2024-04-13 16:23:01 -07:00
ccurme
38faa74c23
community[patch]: update use of deprecated llm methods (#20393)
.predict and .predict_messages for BaseLanguageModel and BaseChatModel
2024-04-12 17:28:23 -04:00
Corey Zumar
3a068b26f3
community[patch]: Databricks - fix scope of dangerous deserialization error in Databricks LLM connector (#20368)
fix scope of dangerous deserialization error in Databricks LLM connector

---------

Signed-off-by: dbczumar <corey.zumar@databricks.com>
2024-04-12 17:27:26 -04:00
Nicolas
ad04585e30
community[minor]: Firecrawl.dev integration (#20364)
Added the [FireCrawl](https://firecrawl.dev) document loader. Firecrawl
crawls and converts any website into LLM-ready data. It crawls all
accessible subpages and gives you clean markdown for each.

    - **Description:** Adds FireCrawl data loader
    - **Dependencies:** firecrawl-py
    - **Twitter handle:** @mendableai 

ccing contributors: (@ericciarla @nickscamara)

---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
2024-04-12 19:13:48 +00:00