Commit Graph

1514 Commits

Author SHA1 Message Date
Eduard van Valkenburg
2966b50608 Azure CosmosDB memory (#3434)
Still needs docs, otherwise works.
2023-04-28 10:11:03 -07:00
Lucas Vieira
6bfd97b24d Support GCS Objects with / in GCS Loaders (#3356)
So, this is basically fixing the same things as #1517 but for GCS.

### Problem
When loading GCS Objects with `/` in the object key (e.g.
folder/some-document.txt) using `GCSFileLoader`, the objects are
downloaded into a temporary directory and saved as a file.

This errors out when the parent directory does not exist within the
temporary directory.

### What this PR does
Creates parent directories based on object key.

This also works with deeply nested keys:
folder/subfolder/some-document.txt
2023-04-28 10:11:03 -07:00
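The parent-directory fix described for `GCSFileLoader` above can be sketched roughly as follows. This is a minimal illustration, not the loader's actual code: the helper name `local_path_for_key` is hypothetical, and a plain file write stands in for the blob download.

```python
import os
import tempfile


def local_path_for_key(temp_dir: str, object_key: str) -> str:
    """Return a path under temp_dir for object_key, creating parent directories.

    Hypothetical helper for illustration only.
    """
    file_path = os.path.join(temp_dir, object_key)
    # Keys such as "folder/subfolder/some-document.txt" need their parent
    # directories to exist before the object can be written to disk.
    os.makedirs(os.path.dirname(file_path), exist_ok=True)
    return file_path


with tempfile.TemporaryDirectory() as temp_dir:
    path = local_path_for_key(temp_dir, "folder/subfolder/some-document.txt")
    with open(path, "w") as f:  # stands in for the blob download
        f.write("hello")
    wrote_file = os.path.isfile(path)
```

Keys without a `/` still work, because `os.makedirs(..., exist_ok=True)` is a no-op when the directory already exists.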
Mindaugas Sharskus
e69208f362 [Fix #3365]: Changed regex to cover new line before action serious (#3367)
Fix for: [Changed regex to cover new line before action
serious.](https://github.com/hwchase17/langchain/issues/3365)
---

This PR fixes the issue where `ValueError: Could not parse LLM output:`
was thrown on what seems to be valid input.

Changed regex to cover new lines before action serious (after the
keywords "Action:" and "Action Input:").

regex101: https://regex101.com/r/CXl1kB/1

---------

Co-authored-by: msarskus <msarskus@cisco.com>
2023-04-28 10:11:03 -07:00
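To illustrate the idea of tolerating new lines after the "Action:" and "Action Input:" keywords, here is a small sketch. The pattern below is an assumption written for this example, not the regex merged in the PR (see the regex101 link in the commit message for that), and `parse_action` is a hypothetical helper.

```python
import re
from typing import Tuple

# Illustrative pattern only. \s* after each keyword also matches newlines,
# so the value may start on the line after the keyword.
ACTION_PATTERN = re.compile(
    r"Action\s*:\s*(.*?)\s*Action\s*Input\s*:\s*(.*)", re.DOTALL
)


def parse_action(llm_output: str) -> Tuple[str, str]:
    """Extract (action, action_input) or raise ValueError if nothing matches."""
    match = ACTION_PATTERN.search(llm_output)
    if match is None:
        raise ValueError(f"Could not parse LLM output: `{llm_output}`")
    return match.group(1).strip(), match.group(2).strip()


same_line = parse_action("Action: Search\nAction Input: weather in Paris")
next_line = parse_action("Action:\nSearch\nAction Input:\nweather in Paris")
```

Both calls should yield the same pair, since the whitespace after each keyword is consumed whether it is a space or a line break.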
Maxwell Mullin
22ad33a8e1 GuessedAtParserWarning from RTD document loader documentation example (#3397)
Addresses #3396 by adding `features='html.parser'` in the example.
2023-04-28 10:11:03 -07:00
engkheng
e44317878b Improve llm_chain.ipynb and getting_started.ipynb for chains docs (#3380)
My attempt at improving the `Chain`'s `Getting Started` docs and
`LLMChain` docs. Might need some proof-reading as English is not my
first language.

In LLM examples, I replaced the example use case with a simpler one
(shorter LLM output) to reduce cognitive load.
2023-04-28 10:11:03 -07:00
Zander Chase
2402bb5b57 Add retry logic for ChromaDB (#3372)
Rewrite of #3368

Mainly an issue for when people are just getting started, but still nice
to not throw an error if the number of docs is < k.

Add a little decorator utility to block mutually exclusive keyword
arguments
2023-04-28 10:11:03 -07:00
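A decorator that blocks mutually exclusive keyword arguments, as mentioned in the ChromaDB commit above, might look something like this. It is a sketch under assumptions: the name `mutually_exclusive` and the `query` function with its parameters are invented for illustration and are not taken from the PR.

```python
import functools
from typing import Any, Callable


def mutually_exclusive(*arg_names: str) -> Callable:
    """Reject calls that pass more than one of arg_names as a non-None keyword.

    Hypothetical utility for illustration only.
    """

    def decorator(func: Callable) -> Callable:
        @functools.wraps(func)
        def wrapper(*args: Any, **kwargs: Any) -> Any:
            given = [name for name in arg_names if kwargs.get(name) is not None]
            if len(given) > 1:
                raise ValueError(
                    f"Arguments are mutually exclusive: {', '.join(given)}"
                )
            return func(*args, **kwargs)

        return wrapper

    return decorator


@mutually_exclusive("query_texts", "query_embeddings")
def query(query_texts=None, query_embeddings=None, n_results=4):
    # Return whichever input was supplied; a real implementation would search.
    return query_texts if query_texts is not None else query_embeddings
```

Note that this sketch only inspects keyword arguments, so positional use of the same parameters is not checked.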
tkarper
a61aa37010 Add Databutton to list of Deployment options (#3364) 2023-04-28 10:11:03 -07:00
jrhe
6cefec4c65 Adds progress bar using tqdm to directory_loader (#3349)
Approach copied from `WebBaseLoader`. Assumes the user doesn't have
`tqdm` installed.
2023-04-28 10:11:03 -07:00
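Since `tqdm` is treated as optional in the directory loader commit above, one way to handle it is a guarded import. The sketch below is an assumption about the general pattern, not the code from the PR; `with_progress` and `show_progress` are hypothetical names.

```python
from typing import Iterable, Iterator, TypeVar

T = TypeVar("T")


def with_progress(items: Iterable[T], show_progress: bool = False) -> Iterator[T]:
    """Yield items, wrapped in a tqdm progress bar when requested and available."""
    if show_progress:
        try:
            from tqdm import tqdm  # optional dependency
        except ImportError:
            # Fall back to plain iteration instead of failing the whole load.
            print("tqdm is not installed; continuing without a progress bar")
        else:
            yield from tqdm(items)
            return
    yield from items


loaded = list(with_progress(["a.txt", "b.txt", "c.txt"], show_progress=True))
```

The result is the same list either way; only the progress display differs depending on whether `tqdm` is importable.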
killpanda
5c8d6fa791 bug_fixes: use md5 instead of uuid id generation (#3442)
At present, the method of generating `point` in qdrant is to use random
`uuid`. The problem with this approach is that even documents with the
same content will be inserted repeatedly instead of updated. Using `md5`
as the `ID` of `point` to insert text can achieve true `update or
insert`.

Co-authored-by: mayue <mayue05@qiyi.com>
2023-04-28 10:11:03 -07:00
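The difference between random and content-derived IDs in the qdrant commit above can be shown with a short sketch. A plain dictionary stands in for the vector store here, and `content_id` is a hypothetical helper, so this only illustrates the ID scheme, not the qdrant client calls.

```python
import hashlib
import uuid


def content_id(text: str) -> str:
    """Derive a deterministic ID from the text so identical content maps to one ID."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()


texts = ["hello world", "hello world", "goodbye"]

# Random UUIDs: the duplicate text is stored twice.
random_store = {uuid.uuid4().hex: text for text in texts}

# MD5 of the content: the duplicate overwrites the first entry ("update or insert").
md5_store = {content_id(text): text for text in texts}
```

Here `random_store` ends up with three entries and `md5_store` with two, because the repeated text hashes to the same key.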
Jon Luo
4917c71695 Support SQLAlchemy 2.0 (#3310)
With https://github.com/executablebooks/jupyter-cache/pull/93 merged and
`MyST-NB` updated, we can now support SQLAlchemy 2. Closes #1766
2023-04-28 10:11:03 -07:00
engkheng
ca98b3e519 Update Getting Started page of Prompt Templates (#3298)
Updated `Getting Started` page of `Prompt Templates` to showcase more
features provided by the class. Might need some proofreading because
English is not my first language.
2023-04-28 10:11:03 -07:00
Hasan Patel
fae3eb7223 Updated Readme.md (#3477)
Corrected some minor grammar issues and changed "infra" to
"infrastructure" for clarity. Improved readability.
2023-04-28 10:11:03 -07:00
Davis Chase
6544d2bc6f fix #3884 (#3475)
fixes mar bug #3384
2023-04-28 10:11:03 -07:00
Prakhar Agarwal
75c097dbdf pass list of strings to embed method in tf_hub (#3284)
This fixes the below mentioned issue. Instead of simply passing the text
to `tensorflow_hub`, we convert it to a list and then pass it.
https://github.com/hwchase17/langchain/issues/3282

Co-authored-by: Prakhar Agarwal <i.prakhar-agarwal@devrev.ai>
2023-04-28 10:11:03 -07:00
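The list-wrapping change described in the `tf_hub` commit above follows a common pattern for batch-oriented encoders. The sketch below uses a stand-in function rather than `tensorflow_hub` itself, so `fake_hub_embed` and its toy output are assumptions made purely for illustration.

```python
from typing import List


def fake_hub_embed(texts: List[str]) -> List[List[float]]:
    """Stand-in for a batch text encoder that expects a list of strings."""
    if isinstance(texts, str):
        raise TypeError("expected a list of strings, got a single string")
    # Toy "embedding": text length and number of spaces.
    return [[float(len(t)), float(t.count(" "))] for t in texts]


def embed_query(text: str) -> List[float]:
    # Wrap the single query in a list, then unwrap the single result.
    return fake_hub_embed([text])[0]


vector = embed_query("hello world")
```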
Beau Horenberger
6214b15d53 add LoRA loading for the LlamaCpp LLM (#3363)
First PR, let me know if this needs anything like unit tests,
reformatting, etc. Seemed pretty straightforward to implement. Only
hitch was that mmap needs to be disabled when loading LoRAs or else you
segfault.
2023-04-28 10:11:03 -07:00
Ehsan M. Kermani
a21fc19d91 Use a consistent poetry version everywhere (#3250)
Fixes the discrepancy of poetry version in Dockerfile and the GAs
2023-04-28 10:11:03 -07:00
Felipe Lopes
385d9271eb feat: add private weaviate api_key support on from_texts (#3139)
This PR adds support for providing a Weaviate API Key to the VectorStore
methods `from_documents` and `from_texts`. With this addition, users can
authenticate to Weaviate and make requests to private Weaviate servers
when using these methods.

## Motivation
Currently, LangChain's VectorStore methods do not provide a way to
authenticate to Weaviate. This limits the functionality of the library
and makes it more difficult for users to take advantage of Weaviate's
features.

This PR addresses this issue by adding support for providing a Weaviate
API Key as an extra parameter used in the `from_texts` method.

## Contributing Guidelines
I have read the [contributing
guidelines](72b7d76d79/.github/CONTRIBUTING.md)
and the PR code passes the following tests:

- [x] make format
- [x] make lint
- [x] make coverage
- [x] make test
2023-04-28 10:11:03 -07:00
Zzz233
0ed4b1050f ES similarity_search_with_score() and metadata filter (#3046)
Add similarity_search_with_score() to ElasticVectorSearch, add metadata
filter to both similarity_search() and similarity_search_with_score()
2023-04-28 10:11:03 -07:00
Zander Chase
312cb3fd88 Vwp/alpaca streaming (#3468)
Co-authored-by: Luke Stanley <306671+lukestanley@users.noreply.github.com>
2023-04-28 10:11:03 -07:00
Cao Hoang
38a958eb30 remove default usage of openai model in SQLDatabaseToolkit (#2884)
#2866

This toolkit used an OpenAI LLM as the default, which could incur unwanted
cost.
2023-04-28 10:11:03 -07:00
Harrison Chase
0471854072 show how to use memory in convo chain (#3463) 2023-04-28 10:11:03 -07:00
leo-gan
1b3bf86486 added integration links to the ecosystem.rst (#3453)
It is currently hard to find the integration points between
data_loaders, retrievers, tools, etc.
I've placed links to all groups of providers and integrations on the
`ecosystem` page, so it is easy to navigate between all integrations
from a single location.
2023-04-28 10:11:03 -07:00
Davis Chase
6100ad65b1 Bugfix: Not all combine docs chains takes kwargs prompt (#3462)
Generalize ConversationalRetrievalChain.from_llm kwargs

---------

Co-authored-by: shubham.suneja <shubham.suneja>
2023-04-28 10:11:03 -07:00
cs0lar
ae6bda90fc fixes #1214 (#3003)
### Background

Continuing to implement all the interface methods defined by the
`VectorStore` class. This PR pertains to implementation of the
`max_marginal_relevance_search_by_vector` method.

### Changes

- a `max_marginal_relevance_search_by_vector` method implementation has
been added in `weaviate.py`
- tests have been added for the new method
- vcr cassettes have been added for the weaviate tests

### Test Plan

Added tests for the `max_marginal_relevance_search_by_vector`
implementation

### Change Safety

- [x] I have added tests to cover my changes
2023-04-28 10:11:03 -07:00
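For readers unfamiliar with maximal marginal relevance, the general technique behind a method like `max_marginal_relevance_search_by_vector` can be sketched in pure Python. This is not the code added to `weaviate.py`; the function names, the `lambda_mult` parameter, and the use of cosine similarity are assumptions for illustration.

```python
import math
from typing import List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def mmr_by_vector(
    query: Sequence[float],
    candidates: Sequence[Sequence[float]],
    k: int = 4,
    lambda_mult: float = 0.5,
) -> List[int]:
    """Return indices of up to k candidates, trading relevance against redundancy."""
    selected: List[int] = []
    remaining = list(range(len(candidates)))
    while remaining and len(selected) < k:

        def score(i: int) -> float:
            relevance = cosine(query, candidates[i])
            # Similarity to the closest already-selected candidate.
            redundancy = max(
                (cosine(candidates[i], candidates[j]) for j in selected),
                default=0.0,
            )
            return lambda_mult * relevance - (1 - lambda_mult) * redundancy

        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


vectors = [[1.0, 0.0], [0.99, 0.1], [0.0, 1.0]]
diverse = mmr_by_vector([1.0, 0.0], vectors, k=2, lambda_mult=0.3)
relevant = mmr_by_vector([1.0, 0.0], vectors, k=2, lambda_mult=1.0)
```

With `lambda_mult=1.0` the ranking is by relevance alone, while lower values penalize candidates that are near-duplicates of ones already chosen.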
Zander Chase
f44e275e1e LM Requests Wrapper (#3457)
Co-authored-by: jnmarti <88381891+jnmarti@users.noreply.github.com>
2023-04-28 10:11:03 -07:00
Harrison Chase
a83c4a7711 bump version to 148 (#3458) 2023-04-28 10:11:03 -07:00
Harrison Chase
17cbc6a5dd update notebook 2023-04-28 10:11:03 -07:00
mbchang
95fbd29353 add meta-prompt to autonomous agents use cases (#3254)
An implementation of
[meta-prompt](https://noahgoodman.substack.com/p/meta-prompt-a-simple-self-improving),
where the agent modifies its own instructions across episodes with a
user.

![figure](https://substackcdn.com/image/fetch/f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F468217b9-96d9-47c0-a08b-dbf6b21b9f49_492x384.png)
2023-04-28 10:11:03 -07:00
yunfeilu92
14a2599bd2 propogate kwargs to cls in OpenSearchVectorSearch (#3416)
kwargs should be passed into cls so that the OpenSearch client can be
properly initialized in `__init__()`. Otherwise logic like the example
below will not work, as auth will not be passed into `__init__()`.

```python
docsearch = OpenSearchVectorSearch.from_documents(docs, embeddings, opensearch_url="http://localhost:9200")

query = "What did the president say about Ketanji Brown Jackson"
docs = docsearch.similarity_search(query)
```

Co-authored-by: EC2 Default User <ec2-user@ip-172-31-28-97.ec2.internal>
2023-04-28 10:11:03 -07:00
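The kwargs propagation described in the OpenSearch commit above is a general classmethod pattern. The sketch below uses a made-up `VectorSearch` class rather than `OpenSearchVectorSearch`, and the `http_auth` keyword is only an example of an extra client option, so treat every name here as an assumption.

```python
from typing import Any, Dict, List


class VectorSearch:
    """Toy stand-in for a vector store whose client is configured in __init__."""

    def __init__(self, url: str, index_name: str, **kwargs: Any) -> None:
        self.url = url
        self.index_name = index_name
        # Extra options (such as auth) would be handed to the client here.
        self.client_kwargs: Dict[str, Any] = kwargs
        self.texts: List[str] = []

    @classmethod
    def from_texts(
        cls, texts: List[str], url: str, index_name: str = "demo", **kwargs: Any
    ) -> "VectorSearch":
        # Forward **kwargs to the constructor; dropping them here would
        # silently lose options such as authentication.
        store = cls(url, index_name, **kwargs)
        store.texts = list(texts)
        return store


store = VectorSearch.from_texts(
    ["some text"], "http://localhost:9200", http_auth=("user", "password")
)
```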
Eduard van Valkenburg
f17c7bbe83 small constructor change and updated notebook (#3426)
Small change in the pydantic definitions, same API.

Updated the notebook with the right constructor and added a few-shot example.
2023-04-28 10:11:03 -07:00
Zander Chase
b253b0b0d9 Structured Tool Bugfixes (#3324)
- Proactively raise error if a tool subclasses BaseTool, defines its
own schema, but fails to add the type-hints
- fix the auto-inferred schema of the decorator to strip the
unneeded virtual kwargs from the schema dict

Helps avoid silent instances of #3297
2023-04-28 10:11:03 -07:00
Bilal Mahmoud
a588e5a311 Do not await sync callback managers (#3440)
This fixes a bug in the math LLM, where even the sync manager was
awaited, creating a nasty `RuntimeError`
2023-04-28 10:11:03 -07:00
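One general way to avoid awaiting a synchronous callback is to await only when the call actually returns an awaitable. The sketch below illustrates that pattern under assumptions; `run_callback` and the handlers are hypothetical and do not reproduce the callback manager code changed in the PR.

```python
import asyncio
import inspect
from typing import Any, Callable, List

events: List[str] = []


def sync_handler(message: str) -> None:
    events.append(f"sync:{message}")


async def async_handler(message: str) -> None:
    events.append(f"async:{message}")


async def run_callback(callback: Callable[..., Any], *args: Any) -> None:
    """Call the callback and await the result only if it is awaitable."""
    result = callback(*args)
    if inspect.isawaitable(result):
        await result


async def main() -> None:
    await run_callback(sync_handler, "start")
    await run_callback(async_handler, "start")


asyncio.run(main())
```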
Dianliang233
b23e1de43b Fix NoneType has no len() in DDG tool (#3334)
Per
46ac914daa/duckduckgo_search/ddg.py (L109),
ddg function actually returns None when there is no result.
2023-04-28 10:11:03 -07:00
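Guarding against a `None` result before calling `len()` can be sketched as below. The function name, the result shape (a list of dicts with a `body` key), and the fallback message are assumptions for illustration rather than the tool's actual code.

```python
from typing import Dict, List, Optional


def format_results(results: Optional[List[Dict[str, str]]]) -> str:
    """Join result snippets, treating None the same as an empty list."""
    # Check for None first: len(None) raises TypeError.
    if results is None or len(results) == 0:
        return "No good search result found"
    return " ".join(result["body"] for result in results)


no_results = format_results(None)
some_results = format_results([{"body": "first"}, {"body": "second"}])
```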
Davit Buniatyan
b72d9c9d77 Deep Lake mini upgrades (#3375)
Improvements
* set default num_workers for ingestion to 0
* upgraded notebooks for avoiding dataset creation ambiguity
* added `force_delete_dataset_by_path`
* bumped deeplake to 3.3.0
* creds arg passing to deeplake object that would allow custom S3

Notes
* please double check if poetry is not messed up (thanks!)

Asks
* Would be great to create a shared slack channel for quick questions

---------

Co-authored-by: Davit Buniatyan <d@activeloop.ai>
2023-04-28 10:11:03 -07:00
Haste171
4a08ffc2e0 Update unstructured_file.ipynb (#3377)
Fix typo in docs
2023-04-28 10:11:03 -07:00
张城铭
31ee671894 Optimize code (#3412)
Co-authored-by: assert <zhangchengming@kkguan.com>
2023-04-28 10:11:03 -07:00
Zander Chase
e5f184c7ba Catch all exceptions in autogpt (#3413)
Ought to be more autonomous
2023-04-28 10:11:03 -07:00
Zander Chase
6d07bafda5 Move Generative Agent definition to Experimental (#3245)
Extending @BeautyyuYanli's #3220 to move from the notebook

---------

Co-authored-by: BeautyyuYanli <beautyyuyanli@gmail.com>
2023-04-28 10:11:03 -07:00
Zander Chase
eb47767e9e Add Sentence Transformers Embeddings (#3409)
Add embeddings based on the sentence transformers library.
Add a notebook and integration tests.

Co-authored-by: khimaros <me@khimaros.com>
2023-04-28 10:11:03 -07:00
Zander Chase
9f40c09c86 Update marathon notebook (#3408)
Fixes #3404
2023-04-28 10:11:03 -07:00
Luke Harris
b7dad1b6bf Several confluence loader improvements (#3300)
This PR addresses several improvements:

- Previously it was not possible to load spaces of more than 100 pages.
The `limit` was being used both as an overall page limit *and* as a per
request pagination limit. This, in combination with the fact that
Atlassian seems to use a server-side hard limit of 100 when page content
is expanded, meant it wasn't possible to download >100 pages. Now
`limit` is used *only* as a per-request pagination limit and `max_pages`
is introduced as the way to limit the total number of pages returned by
the paginator.
- Document metadata now includes `source` (the source url), making it
compatible with `RetrievalQAWithSourcesChain`.
- It is now possible to include inline and footer comments.
- It is now possible to pass `verify_ssl=False` and other parameters to
the confluence object for use cases that require it.
2023-04-28 10:11:03 -07:00
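The separation of `limit` (per-request page size) from `max_pages` (overall cap) described in the Confluence commit above can be sketched with a fake server. The `paginate` and `fake_fetch` functions are hypothetical, and the 100-item server cap is modeled only to mirror the behavior the commit message describes.

```python
from typing import Callable, List

ALL_PAGES = [f"page-{i}" for i in range(250)]


def fake_fetch(start: int, limit: int) -> List[str]:
    """Stand-in for the API: never returns more than 100 items per request."""
    capped = min(limit, 100)
    return ALL_PAGES[start : start + capped]


def paginate(
    fetch_page: Callable[[int, int], List[str]],
    limit: int = 50,
    max_pages: int = 1000,
) -> List[str]:
    """Fetch pages in batches of at most `limit` until `max_pages` is reached."""
    docs: List[str] = []
    while len(docs) < max_pages:
        # Never request more than what is still needed to reach max_pages.
        batch = fetch_page(len(docs), min(limit, max_pages - len(docs)))
        if not batch:
            break  # the server has no more pages
        docs.extend(batch)
    return docs[:max_pages]


everything = paginate(fake_fetch, limit=200, max_pages=1000)
capped = paginate(fake_fetch, limit=50, max_pages=120)
```

Because the next request always starts at `len(docs)`, the loop keeps working even when the server returns fewer items than requested.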
zz
e808444b79 Add support for wikipedia's lang parameter (#3383)
Allow changing the language of the Wikipedia API being requested.

Co-authored-by: zhuohui <zhuohui@datastory.com.cn>
2023-04-28 10:11:03 -07:00
Johann-Peter Hartmann
f400386865 Improve youtube loader (#3395)
Small improvements for the YouTube loader: 
a) use the YouTube API permission scope instead of Google Drive 
b) bugfix: allow transcript loading for single videos 
c) an additional parameter "continue_on_failure" for cases when videos
in a playlist do not have transcription enabled.
d) support automated translation for all languages, if available.

---------

Co-authored-by: Johann-Peter Hartmann <johann-peter.hartmann@mayflower.de>
2023-04-28 10:11:03 -07:00
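A flag like "continue_on_failure" typically wraps each item in a try/except so one bad video does not abort the whole playlist. The sketch below shows that pattern with a fake transcript fetcher; `load_transcripts` and `fake_fetch_transcript` are hypothetical and not the loader's real functions.

```python
from typing import Callable, List


def fake_fetch_transcript(video_id: str) -> str:
    """Stand-in fetcher: pretends some videos have transcripts disabled."""
    if video_id.startswith("no-transcript"):
        raise RuntimeError(f"transcripts are disabled for {video_id}")
    return f"transcript of {video_id}"


def load_transcripts(
    video_ids: List[str],
    fetch_transcript: Callable[[str], str],
    continue_on_failure: bool = False,
) -> List[str]:
    docs: List[str] = []
    for video_id in video_ids:
        try:
            docs.append(fetch_transcript(video_id))
        except Exception as exc:
            if not continue_on_failure:
                raise
            # Skip this video and keep going with the rest of the playlist.
            print(f"Skipping {video_id}: {exc}")
    return docs


playlist = ["abc", "no-transcript-1", "xyz"]
docs = load_transcripts(playlist, fake_fetch_transcript, continue_on_failure=True)
```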
Harrison Chase
f3ab7c2a9f Harrison/hf document loader (#3394)
Co-authored-by: Azam Iftikhar <azamiftikhar1000@gmail.com>
2023-04-28 10:11:03 -07:00
Hadi Curtay
4b071a69d1 Updated incorrect link to Weaviate notebook (#3362)
The detailed walkthrough of the Weaviate wrapper was pointing to the
getting-started notebook. Fixed it to point to the Weaviate notebook in
the examples folder.
2023-04-28 10:11:03 -07:00
Ismail Pelaseyed
0a7ca1014f Add example on deploying LangChain to Cloud Run (#3366)
## Summary

Adds a link to a minimal example of running LangChain on Google Cloud
Run.
2023-04-28 10:11:03 -07:00
Ivan Zatevakhin
326c2c2474 llamacpp wrong default value passed for f16_kv (#3320)
Fixes default f16_kv value in llamacpp; corrects incorrect parameter
passed.

See:
ba3959eafd/llama_cpp/llama.py (L33)

Fixes #3241
Fixes #3301
2023-04-28 10:11:03 -07:00
Harrison Chase
8feb416664 bump version to 147 (#3353) 2023-04-28 10:11:03 -07:00
Harrison Chase
4003a79b35 Harrison/myscale (#3352)
Co-authored-by: Fangrui Liu <fangruil@moqi.ai>
Co-authored-by: 刘 方瑞 <fangrui.liu@outlook.com>
Co-authored-by: Fangrui.Liu <fangrui.liu@ubc.ca>
2023-04-28 10:11:03 -07:00
Harrison Chase
994027771e Harrison/error hf (#3348)
Co-authored-by: Rui Melo <44201826+rufimelo99@users.noreply.github.com>
2023-04-28 10:11:03 -07:00