Commit Graph

290 Commits

Author SHA1 Message Date
Dev 2049
5101fa1f97 nit 2023-04-19 19:30:26 -07:00
Dev 2049
92ddfc4385 fix 2023-04-19 19:21:56 -07:00
Dev 2049
68dd33f849 cr 2023-04-19 19:09:40 -07:00
Dev 2049
40b192a571 rename 2023-04-19 18:40:02 -07:00
Dev 2049
5ef902a077 cr 2023-04-19 16:07:43 -07:00
Dev 2049
b86aedd07b rename 2023-04-19 12:23:12 -07:00
Dev 2049
61f1177721 Merge branch 'master' into dev2049/contextual_compression 2023-04-19 12:08:17 -07:00
Zander Chase
90ef705ced Update Tool Input (#3103)
- Remove dynamic model creation in the `args()` property. _Only infer
for the decorator (and add an argument to NOT infer if someone wishes to
only pass as a string)_
- Update the validation example to make it less likely to be
misinterpreted as a "safe" way to run a repl


There is one example of "Multi-argument tools" in the custom_tools.ipynb
from yesterday, but we could add more. The output parsing for the base
MRKL agent hasn't been adapted to handle structured args at this point
in time

---------

Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
2023-04-18 18:18:33 -07:00
Dev 2049
ae1409cac2 add example nb 2023-04-18 11:37:23 -07:00
Dev 2049
48fef55be4 unit test 2023-04-18 10:43:40 -07:00
Dev 2049
2bdb06051e cosine unit test 2023-04-18 10:17:16 -07:00
Dev 2049
4219c854f6 Merge branch 'master' into dev2049/contextual_compression 2023-04-18 09:47:20 -07:00
Harrison Chase
aad0a498ac Harrison/output error (#3094)
Co-authored-by: yummydum <sumita@nowcast.co.jp>
2023-04-18 08:59:56 -07:00
Dev 2049
9750409966 private 2023-04-17 22:21:51 -07:00
Dev 2049
3fb67b2707 cr 2023-04-17 22:13:47 -07:00
Harrison Chase
db968284f8 tools refactor (#2961)
Co-authored-by: vowelparrot <130414180+vowelparrot@users.noreply.github.com>
2023-04-17 21:35:29 -07:00
Dev 2049
00e9d2ff0d cr 2023-04-17 21:31:52 -07:00
engkheng
19febc77d6 Support inference of input_variables from jinja2 template (#3013)
`langchain.prompts.PromptTemplate` is unable to infer `input_variables`
from jinja2 template.

```python
# Using langchain v0.0.141
from langchain.prompts import PromptTemplate

template_string = """\
Hello world
Your variable: {{ var }}
{# This will not get rendered #}

{% if verbose %}
Congrats! You just turned on verbose mode and got extra messages!
{% endif %}
"""

template = PromptTemplate.from_template(template_string, template_format="jinja2")
print(template.input_variables) # Output ['# This will not get rendered #', '% endif %', '% if verbose %']
```

---------

Co-authored-by: engkheng <ongengkheng929@example.com>
2023-04-17 20:31:03 -07:00
Nuno Campos
dac32c59e5 Nc/combining output parser (#3014)
Co-authored-by: vowelparrot <130414180+vowelparrot@users.noreply.github.com>
2023-04-17 20:29:53 -07:00
Davis Chase
19c85aa990 Factor out doc formatting and add validation (#3026)
@cnhhoang850 slightly more generic fix for #2944; it works for whatever
the expected metadata keys are, not just `source`
2023-04-17 20:28:01 -07:00
Naveen Tatikonda
3453b7457c OpenSearch: Add Support for Boolean Filter with ANN search (#3038)
### Description
Add Support for Boolean Filter with ANN search
Documentation -
https://opensearch.org/docs/latest/search-plugins/knn/filter-search-knn/#boolean-filter-with-ann-search

### Issues Resolved
https://github.com/hwchase17/langchain/issues/2924

Signed-off-by: Naveen Tatikonda <navtat@amazon.com>
2023-04-17 20:26:26 -07:00
Harrison Chase
afd3e70ae5 Harrison/confluent loader (#2994)
Co-authored-by: Justin Flick <Justinjayflick@gmail.com>
2023-04-17 20:23:45 -07:00
vowelparrot
99c0382209 Generative Characters (#2859)
Add a time-weighted memory retriever and a notebook that approximates a
Generative Agent from https://arxiv.org/pdf/2304.03442.pdf


The "daily plan" components are removed for now since they are less
useful without a virtual world, but the memory is an interesting
component to build off.

---------

Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
2023-04-16 21:41:00 -07:00
Jan Backes
a9310a3e8b Add Annoy as VectorStore (#2939)
Adds Annoy (https://github.com/spotify/annoy) as a vector store.

RESOLVES hwchase17/langchain#2842

discord ref:
https://discord.com/channels/1038097195422978059/1051632794427723827/1096089994168377354

---------

Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
Co-authored-by: vowelparrot <130414180+vowelparrot@users.noreply.github.com>
2023-04-16 13:44:04 -07:00
Harrison Chase
e12e00df12 use output parsers in agents (#2987) 2023-04-16 13:15:21 -07:00
cs0lar
8b9e02da9d Fix/issue 1213 (#2932)
### Background

Continuing to implement all the interface methods defined by the
`VectorStore` class. This PR pertains to implementation of the
`max_marginal_relevance_search` method.
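
For readers unfamiliar with the technique, here is a minimal, standard-library sketch of maximal marginal relevance selection. It is not the code added to `weaviate.py`; the helper names `cosine` and `mmr_select` and the `lambda_mult` weighting parameter are assumptions made for illustration.

```python
import math
from typing import List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


def mmr_select(
    query: Sequence[float],
    candidates: List[Sequence[float]],
    k: int = 2,
    lambda_mult: float = 0.5,
) -> List[int]:
    """Return indices of up to k candidates, trading relevance against redundancy."""
    selected: List[int] = []
    remaining = list(range(len(candidates)))
    while remaining and len(selected) < k:

        def score(i: int) -> float:
            relevance = cosine(query, candidates[i])
            # Similarity to the closest already-selected candidate (0 if none yet).
            redundancy = max(
                (cosine(candidates[i], candidates[j]) for j in selected),
                default=0.0,
            )
            return lambda_mult * relevance - (1 - lambda_mult) * redundancy

        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


query = [1.0, 0.0]
docs = [[1.0, 0.0], [0.99, 0.1], [0.6, 0.8]]
# A low lambda_mult favours diversity, so the near-duplicate at index 1 is skipped.
picked = mmr_select(query, docs, k=2, lambda_mult=0.3)
```

With a higher `lambda_mult` the selection leans toward pure relevance ranking; with a lower one it leans toward diversity.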

### Changes

- a `max_marginal_relevance_search` method implementation has been added
in `weaviate.py`
- tests have been added for the new method
- vcr cassettes have been added for the weaviate tests

### Test Plan

Added tests for the `max_marginal_relevance_search` implementation

### Change Safety

- [x] I have added tests to cover my changes
2023-04-16 13:11:30 -07:00
vowelparrot
5ca7ce77cd Remove pythonrepl from LLM-MathChain (#2943)
Use numexpr evaluate instead of the python REPL to avoid malicious code
injection.

Tested against the (limited) math dataset and got the same score as
before.

For more permissive tools (like the REPL tool itself), other approaches
ought to be provided (some combination of Sanitizer + Restricted python
+ unprivileged-docker + ...), but for a calculator tool, only
mathematical expressions should be permitted.
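
The idea of permitting only mathematical expressions can be sketched without any third-party dependency. The commit itself uses numexpr; the example below is a stand-in that uses the standard library's `ast` module instead, and the function name `safe_eval` and the set of allowed operators are assumptions for illustration.

```python
import ast
import operator

# Whitelisted operators; anything else in the expression is rejected.
_BINARY = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
    ast.Pow: operator.pow,
}
_UNARY = {ast.USub: operator.neg, ast.UAdd: operator.pos}


def safe_eval(expression: str) -> float:
    """Evaluate a purely arithmetic expression, rejecting names, calls, etc."""

    def walk(node: ast.AST):
        if isinstance(node, ast.Expression):
            return walk(node.body)
        if (
            isinstance(node, ast.Constant)
            and isinstance(node.value, (int, float))
            and not isinstance(node.value, bool)
        ):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _BINARY:
            return _BINARY[type(node.op)](walk(node.left), walk(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in _UNARY:
            return _UNARY[type(node.op)](walk(node.operand))
        raise ValueError(f"unsupported expression element: {type(node).__name__}")

    return walk(ast.parse(expression, mode="eval"))


result = safe_eval("2 * (3 + 4) ** 2")
```

An input such as `__import__('os').system('ls')` parses to a call node and is rejected with `ValueError` rather than executed. A production version would also want to bound exponents and input length.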

See https://github.com/hwchase17/langchain/issues/814
2023-04-16 08:50:32 -07:00
vowelparrot
4ffc58e07b Add similarity_search_with_normalized_similarities (#2916)
Add a method that exposes a similarity search with corresponding
normalized similarity scores. Implement only for FAISS now.

### Motivation:

Some memory definitions combine `relevance` with other scores, like
recency, importance, etc.

While many (but not all) of the `VectorStore`s expose a
`similarity_search_with_score` method, they don't all interpret the
units of that score the same way (it depends on the distance metric and
whether or not the embeddings are normalized).

This PR proposes a `similarity_search_with_normalized_similarities`
method that lets consumers of the vector store not have to worry about
the metric and embedding scale.

*Most providers default to euclidean distance, with Pinecone being one
exception (defaults to cosine _similarity_).*
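
As a rough illustration of what "normalized" could mean here, the sketch below maps both cosine similarity and Euclidean distance onto the same [0, 1] scale. It assumes unit-length embeddings, and the function names are hypothetical; the mapping actually implemented for FAISS in this PR may differ.

```python
import math
from typing import Sequence


def l2_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Plain Euclidean distance between two vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def similarity_from_cosine(cos_sim: float) -> float:
    """Map cosine similarity from [-1, 1] onto [0, 1]."""
    return (1.0 + cos_sim) / 2.0


def similarity_from_l2(distance: float) -> float:
    """Map Euclidean distance between unit vectors onto [0, 1].

    For unit vectors, d^2 = 2 - 2*cos, so cos = 1 - d^2 / 2.
    """
    return similarity_from_cosine(1.0 - distance ** 2 / 2.0)


a, b, c = [1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]
same = similarity_from_l2(l2_distance(a, a))        # identical vectors
orthogonal = similarity_from_l2(l2_distance(a, b))  # unrelated vectors
opposite = similarity_from_l2(l2_distance(a, c))    # opposite vectors
```

Because both mappings go through cosine similarity, a consumer gets comparable scores regardless of which metric the underlying store reports.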

---------

Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
2023-04-15 21:06:08 -07:00
dev2049
36aa7f30e4 Move PythonRepl -> langchain.utilities (#2917) 2023-04-15 10:50:25 -07:00
Davit Buniatyan
b3a5b51728 [minor] Deep Lake auth improvements in docs, kwargs pass, faster tests (#2927)
Minor cosmetic changes 
- Activeloop environment cred authentication in notebooks with
`getpass.getpass` (instead of the CLI, which does not always work)
- much faster tests with Deep Lake pytest mode on 
- Deep Lake kwargs pass

Notes
- I put pytest environment creds inside `vectorstores/conftest.py`, but
feel free to suggest a better location. For context, if I put them in
`test_deeplake.py`, `ruff` doesn't let me set them before importing
`deeplake`

---------

Co-authored-by: Davit Buniatyan <d@activeloop.ai>
2023-04-15 10:49:16 -07:00
Ankush Gola
ec59e9d886 Fix ChatAnthropic stop_sequences error (#2919) (#2920)
Note to self: Always run integration tests, even on "that last minute
change you thought would be safe" :)

---------

Co-authored-by: Mike Lambert <mike.lambert@anthropic.com>
2023-04-14 17:22:01 -07:00
Mike Lambert
392f1b3218 Add Anthropic ChatModel to langchain (#2293)
* Adds an Anthropic ChatModel
* Factors out common code in our LLMModel and ChatModel
* Supports streaming llm-tokens to the callbacks on a delta basis (until
a future V2 API does that for us)
* Some fixes
2023-04-14 15:09:07 -07:00
Harrison Chase
1e9378d0a8 Harrison/weaviate fixes (#2872)
Co-authored-by: cs0lar <cristiano.solarino@gmail.com>
Co-authored-by: cs0lar <cristiano.solarino@brightminded.com>
2023-04-13 22:37:34 -07:00
Harrison Chase
705596b46a Harrison/fix create sql agent (#2870)
Co-authored-by: Timothé Pearce <timothe.pearce@gmail.com>
2023-04-13 22:07:58 -07:00
sergerdn
04c458a270 feat: improve pinecone tests (#2806)
Improve the integration tests for Pinecone by adding an `.env.example`
file for local testing. Additionally, add some dev dependencies
specifically for integration tests.

This change also helps me understand how Pinecone deals with certain
things, see related issues
https://github.com/hwchase17/langchain/issues/2484
https://github.com/hwchase17/langchain/issues/2816
2023-04-13 21:49:31 -07:00
vowelparrot
bf0887c486 Add Slack Directory Loader (#2841)
Fixes linting issue from #2835 

Adds a loader for Slack Exports which can be a very valuable source of
knowledge to use for internal QA bots and other use cases.

```py
# Export data from your Slack Workspace first.
from langchain.document_loaders import SlackDirectoryLoader

SLACK_WORKSPACE_URL = "https://awesome.slack.com"

# Point the loader at the exported data and the workspace URL.
loader = SlackDirectoryLoader("Slack_Exports", SLACK_WORKSPACE_URL)
docs = loader.load()
```
2023-04-13 21:31:59 -07:00
Tim Asp
be4fb24b32 OpenAI LLM: update modelname_to_contextsize with new models (#2843)
Token counts pulled from https://openai.com/pricing
2023-04-13 11:13:34 -07:00
KullTC
802363eb6a Remove print statement from test (#2809)
Remove unnecessary print statement.
2023-04-13 09:31:48 -07:00
KullTC
64596b23b9 Return output of PythonAstREPLTool when falling back to exec() (#2780)
When the code run by the PythonAstREPLTool contains multiple statements,
it falls back to exec() instead of using eval(). With this change, it
also returns the output of the code in the same way the PythonREPLTool
does.
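
The eval()/exec() split described above can be sketched as follows. This is a simplified illustration, not the tool's actual implementation; `run_snippet` is a hypothetical name, and capturing stdout is one assumed way of returning the output of executed statements.

```python
import ast
import contextlib
import io


def run_snippet(code: str, namespace: dict) -> str:
    """Evaluate a single expression, or exec several statements and capture stdout."""
    tree = ast.parse(code)
    # A lone expression can be evaluated directly and its value returned.
    if len(tree.body) == 1 and isinstance(tree.body[0], ast.Expr):
        compiled = compile(ast.Expression(tree.body[0].value), "<snippet>", "eval")
        return str(eval(compiled, namespace))
    # Otherwise fall back to exec() and return whatever the code printed.
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        exec(code, namespace)
    return buffer.getvalue()


single = run_snippet("1 + 2", {})
multi = run_snippet("x = 2\nprint(x * 5)", {})
```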
2023-04-12 21:22:46 -07:00
Harrison Chase
e49f1e628c Harrison/gpt cache (#2744)
Co-authored-by: SimFG <bang.fu@zilliz.com>
2023-04-12 14:16:58 -07:00
Joshua Snyder
59d054308c Add type inference for output parsers (#2769)
Currently, the output type of several output parsers' `parse` methods
is `Any` when it can in fact be inferred.

This PR makes BaseOutputParser use a generic type and fixes the output
types of the following parsers:
- `PydanticOutputParser`
- `OutputFixingParser`
- `RetryOutputParser`
- `RetryWithErrorOutputParser`

The output of the `StructuredOutputParser` is corrected from `BaseModel`
to `Any` since there are no type guarantees provided by the parser.
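
A minimal sketch of the generic-base-class pattern described above, so that a type checker can infer what `parse` returns. The class names here are hypothetical stand-ins, not the library's actual classes.

```python
from abc import ABC, abstractmethod
from typing import Generic, List, TypeVar

T = TypeVar("T")


class OutputParser(ABC, Generic[T]):
    """Hypothetical base class: T is the type produced by parse()."""

    @abstractmethod
    def parse(self, text: str) -> T:
        ...


class CommaListParser(OutputParser[List[str]]):
    """Parses 'a, b, c' into a list of strings."""

    def parse(self, text: str) -> List[str]:
        return [item.strip() for item in text.split(",") if item.strip()]


class IntParser(OutputParser[int]):
    """Parses a string holding a single integer."""

    def parse(self, text: str) -> int:
        return int(text.strip())


items = CommaListParser().parse("a, b ,c")  # a checker infers List[str]
number = IntParser().parse(" 42 ")          # a checker infers int
```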

Fixes issue #2715
2023-04-12 09:12:20 -07:00
Johnny Lee
0ab364404e add continue to fix 'continue_on_failure' parameter for URL doc loader (#2735)
Currently, the function still fails if `continue_on_failure` is set to
True, because `elements` is not set.
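
The failure mode is easy to reproduce in isolation. The sketch below is not the loader's code; `load_all` and the `fetch` callable are hypothetical. It shows why the `continue` is needed: without it, execution would reach the line that uses `elements` even though the assignment never happened.

```python
import logging
from typing import Callable, List

logger = logging.getLogger(__name__)


def load_all(
    urls: List[str],
    fetch: Callable[[str], List[str]],
    continue_on_failure: bool = True,
) -> List[str]:
    """Fetch each URL, optionally skipping the ones that raise."""
    docs: List[str] = []
    for url in urls:
        try:
            elements = fetch(url)
        except Exception as exc:
            if continue_on_failure:
                logger.error("Error fetching %s, skipping: %s", url, exc)
                continue  # skip to the next URL; `elements` is unset here
            raise
        docs.append("\n".join(elements))
    return docs


def fake_fetch(url: str) -> List[str]:
    """Stand-in fetcher that fails for one specific URL."""
    if url == "bad":
        raise ValueError("cannot fetch")
    return [f"{url}-1", f"{url}-2"]


loaded = load_all(["a", "bad", "b"], fake_fetch)
```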

---------

Co-authored-by: leecjohnny <johnny-lee1255@users.noreply.github.com>
2023-04-11 21:12:39 -07:00
sergerdn
4bdcedab54 fix: some imports for integration tests (#2612)
Add more missed imports for integration tests. Bump `pytest` to the
current latest version.
Fix `tests/integration_tests/vectorstores/test_elasticsearch.py` to
update its cassette (easy fix).

Related PR: https://github.com/hwchase17/langchain/pull/2560
2023-04-11 20:45:36 -07:00
Ankush Gola
c1521ddbdb Add workaround for not having async vector store methods (#2733)
This allows us to use the async API for the Retrieval chains, though it is not guaranteed to be thread safe.
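
One common way to build such a workaround is to run the blocking method in an executor. The sketch below assumes that approach and uses a hypothetical `InMemoryStore`; it is not necessarily how this commit implements it. As noted above, running a synchronous store on a worker thread is only safe if the store tolerates concurrent access.

```python
import asyncio
from functools import partial
from typing import List


class InMemoryStore:
    """Hypothetical synchronous store with a blocking search method."""

    def __init__(self, texts: List[str]) -> None:
        self.texts = texts

    def similarity_search(self, query: str, k: int = 2) -> List[str]:
        # Stand-in for a blocking call: plain substring match.
        return [t for t in self.texts if query in t][:k]


async def asimilarity_search(store: InMemoryStore, query: str, k: int = 2) -> List[str]:
    """Run the synchronous search on the default thread pool executor."""
    loop = asyncio.get_running_loop()
    func = partial(store.similarity_search, query, k=k)
    return await loop.run_in_executor(None, func)


store = InMemoryStore(["alpha beta", "beta gamma", "delta"])
results = asyncio.run(asimilarity_search(store, "beta"))
```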
2023-04-11 18:49:08 -07:00
vowelparrot
709f26b69e Added bilibili loader (#2673) (#2724)
I've added a bilibili loader. Bilibili is a very active video site in
China, and I think we need this loader.

Example:
```python
from langchain.document_loaders.bilibili import BiliBiliLoader

loader = BiliBiliLoader(
    [
        "https://www.bilibili.com/video/BV1xt411o7Xu/",
        "https://www.bilibili.com/video/av330407025/",
    ]
)
docs = loader.load()
```

Co-authored-by: 了空 <568250549@qq.com>
2023-04-11 10:40:32 -07:00
Abhik Singla
955bd2e1db Fixed Ast Python Repl for Chatgpt multiline commands (#2406)
Resolves issue https://github.com/hwchase17/langchain/issues/2252

---------

Co-authored-by: Abhik Singla <abhiksingla@microsoft.com>
2023-04-10 21:25:03 -07:00
Naveen Tatikonda
4364d3316e Add custom vector fields and text fields for OpenSearch (#2652)
**Description**
Add custom vector field name and text field name while indexing and
querying for OpenSearch

**Issues**
https://github.com/hwchase17/langchain/issues/2500

Signed-off-by: Naveen Tatikonda <navtat@amazon.com>
2023-04-10 21:02:02 -07:00
Ankush Gola
b82cbd1be0 Use run and arun in place of combine_docs and acombine_docs (#2635)
`combine_docs` does not go through the standard chain call path, so
chain callbacks aren't triggered and QA chains aren't traced properly.
This fixes that.

Also fix several errors in the chat_vector_db notebook
2023-04-09 18:47:59 -07:00
Chetanya Rastogi
50c511d75f Add new loader to load pdf as html content (#2607)
Adds a new pdf loader using the existing dependency on PDFMiner. 

The new loader can be helpful for chunking texts semantically into
sections as the output html content can be parsed via `BeautifulSoup` to
get more structured and rich information about font size, page numbers,
pdf headers/footers, etc. which may not be available otherwise with
other pdf loaders
2023-04-09 17:57:25 -07:00
sergerdn
cd9336469e fix: missed deps integrations tests (#2560)
Almost all integration tests have failed, but we haven't encountered any
import errors yet. Some tests failed due to lazy import issues. It
doesn't seem like a problem to resolve some of these errors in the next
PR.
I have a headache from resolving conflicts with `deeplake` and `boto3`,
so I will temporarily comment out `boto3`.


fix https://github.com/hwchase17/langchain/issues/2426
2023-04-07 20:43:53 -07:00