Compare commits

..

113 Commits

Author SHA1 Message Date
vowelparrot
9a77540fbe Merge branch 'kayvane1-align-search-tools' into vwp/align_search_tools 2023-04-26 15:04:23 -07:00
Kátia Nakamura
5763d26b9e Add docs for Fly.io deployment (#3584)
A minimal example of how to deploy LangChain to Fly.io using Flask.
2023-04-26 15:04:10 -07:00
Chirag Bhatia
54076f21b2 Fixed typo for HuggingFaceHub (#3612)
The current text has a typo. This PR contains the corrected spelling for
HuggingFaceHub.
2023-04-26 15:04:10 -07:00
Charlie Holtz
f3d727147a Fix Replicate llm response to handle iterator / multiple outputs (#3614)
One of our users noticed a bug when calling streaming models. This is
because those models return an iterator. So, I've updated the Replicate
`_call` code to join together the output. The other advantage of this
fix is that if you requested multiple outputs you would get them all –
previously I was just returning output[0].

I also adjusted the demo docs to use dolly, because we're featuring that
model right now and it's always hot, so people won't have to wait for
the model to boot up.

The error that this fixes:
```
> llm = Replicate(model="replicate/flan-t5-xl:eec2f71c986dfa3b7a5d842d22e1130550f015720966bec48beaae059b19ef4c")
> llm("hello")
> Traceback (most recent call last):
  File "/Users/charlieholtz/workspace/dev/python/main.py", line 15, in <module>
    print(llm(prompt))
  File "/opt/homebrew/lib/python3.10/site-packages/langchain/llms/base.py", line 246, in __call__
    return self.generate([prompt], stop=stop).generations[0][0].text
  File "/opt/homebrew/lib/python3.10/site-packages/langchain/llms/base.py", line 140, in generate
    raise e
  File "/opt/homebrew/lib/python3.10/site-packages/langchain/llms/base.py", line 137, in generate
    output = self._generate(prompts, stop=stop)
  File "/opt/homebrew/lib/python3.10/site-packages/langchain/llms/base.py", line 324, in _generate
    text = self._call(prompt, stop=stop)
  File "/opt/homebrew/lib/python3.10/site-packages/langchain/llms/replicate.py", line 108, in _call
    return outputs[0]
TypeError: 'generator' object is not subscriptable
```
2023-04-26 15:04:10 -07:00
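The join fix above can be sketched as a small standalone helper (hypothetical name `join_output`; the real change lives inside the Replicate `_call` method):

```python
def join_output(outputs):
    """Normalize model output to a single string.

    Streaming models return a generator of chunks, so indexing with
    outputs[0] raises TypeError: 'generator' object is not subscriptable.
    Joining handles generators and lists alike, and also concatenates
    multiple outputs instead of silently dropping all but the first.
    """
    return "".join(str(chunk) for chunk in outputs)
```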
Harrison Chase
c825bd45d8 bump ver 150 (#3599) 2023-04-26 15:04:10 -07:00
Chirag Bhatia
3b10dabe4d Fix broken Cerebrium link in documentation (#3554)
The current hyperlink has a typo. This PR contains the corrected
hyperlink to Cerebrium docs
2023-04-26 15:04:10 -07:00
Harrison Chase
21f0719c9e Harrison/plugnplai (#3573)
Co-authored-by: Eduardo Reis <edu.pontes@gmail.com>
2023-04-26 15:04:10 -07:00
Zander Chase
0094879504 Confluence beautifulsoup (#3576)
Co-authored-by: Theau Heral <theau.heral@ln.email.gs.com>
2023-04-26 15:04:10 -07:00
Mike Wang
396a4b0458 [simple] updated annotation in load_tools.py (#3544)
- added a few missing annotations for complex local variables.
- auto formatted.
- I also went through all other files in the agent directory and didn't
see any other missing pieces. (There are several prompt strings not
annotated, but I think that's trivial; adding annotations would also
make them harder to read in terms of indents.) Anyway, I think this is
the last PR in agent/annotation.
2023-04-26 15:04:10 -07:00
Zander Chase
b76f8cd252 Sentence Transformers Aliasing (#3541)
The sentence transformers embeddings class was a duplicate of the HF one.

This is a breaking change (model_name vs. model) for anyone using
`SentenceTransformerEmbeddings(model="some/nondefault/model")`, but
since it was landed only this week it seems better to do this now rather
than doing a wrapper.
2023-04-26 15:04:10 -07:00
Eric Peter
2b7d51706e Fix docs error for google drive loader (#3574) 2023-04-26 15:04:10 -07:00
CG80499
0e7e1e66f9 Add ReAct eval chain (#3161)
- Adds GPT-4 eval chain for arbitrary agents using any set of tools
- Adds notebook

---------

Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
2023-04-26 15:04:10 -07:00
mbchang
af302d99f0 example: multi player dnd (#3560)
This notebook shows how the DialogueAgent and DialogueSimulator classes
make it easy to extend the [Two-Player Dungeons & Dragons
example](https://python.langchain.com/en/latest/use_cases/agent_simulations/two_player_dnd.html)
to multiple players.

The main difference between simulating two players and multiple players
is in revising the schedule for when each agent speaks.

To this end, we augment DialogueSimulator to take in a custom function
that determines the schedule of which agent speaks. In the example
below, each character speaks in round-robin fashion, with the
storyteller interleaved between each player.
2023-04-26 15:04:10 -07:00
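Under stated assumptions (the storyteller is `agents[0]` and the players rotate round-robin), the custom schedule function might look like this sketch:

```python
def select_next_speaker(step: int, agents: list) -> int:
    """Hypothetical schedule: the storyteller (agents[0]) speaks on
    every even step; the players rotate round-robin on odd steps."""
    if step % 2 == 0:
        return 0  # storyteller's turn
    # map odd steps 1, 3, 5, ... onto players 1, 2, ..., len-1, cycling
    return (step // 2) % (len(agents) - 1) + 1
```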
James Brotchie
1c2e2e93c8 Strip surrounding quotes from requests tool URLs. (#3563)
Often an LLM will output a requests tool input argument surrounded by
single quotes. This triggers an exception in the requests library. Here,
we add a simple clean url function that strips any leading and trailing
single and double quotes before passing the URL to the underlying
requests library.

Co-authored-by: James Brotchie <brotchie@google.com>
2023-04-26 15:04:10 -07:00
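The clean-url function described above can be as small as this sketch (hypothetical name; assumes only surrounding quotes need stripping):

```python
def clean_url(url: str) -> str:
    # Strip any leading and trailing single or double quotes an LLM may
    # have wrapped around the URL before passing it to requests.
    return url.strip("\"'")
```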
Harrison Chase
0e06e6e34a add feast nb (#3565) 2023-04-26 15:04:10 -07:00
Harrison Chase
37b819cfa5 Harrison/streamlit handler (#3564)
Co-authored-by: kurupapi <37198601+kurupapi@users.noreply.github.com>
2023-04-26 15:04:10 -07:00
Filip Michalsky
ec00fc71a8 Notebook example: Context-Aware AI Sales Agent (#3547)
I would like to contribute with a jupyter notebook example
implementation of an AI Sales Agent using `langchain`.

The bot understands the conversation stage (you can define your own
stages fitting your needs)
using two chains:

1. StageAnalyzerChain - takes the context and the LLM decides which
stage of the sales conversation it is in
2. SalesConversationChain - generates the next message

Schema:

https://images-genai.s3.us-east-1.amazonaws.com/architecture2.png

my original repo: https://github.com/filip-michalsky/SalesGPT

This example creates a sales person named Ted Lasso who is trying to
sell you mattresses.

Happy to update based on your feedback.

Thanks, Filip
https://twitter.com/FilipMichalsky
2023-04-26 15:04:10 -07:00
Harrison Chase
ceec14f1bf anthropic docs: deprecated LLM, add chat model (#3549) 2023-04-26 15:04:10 -07:00
mbchang
4aa03b3e01 docs: simplification of two agent d&d simulation (#3550)
Simplifies the [Two Agent
D&D](https://python.langchain.com/en/latest/use_cases/agent_simulations/two_player_dnd.html)
example with a cleaner, simpler interface that is extensible for
multiple agents.

`DialogueAgent`:
- `send()`: applies the chatmodel to the message history and returns the
message string
- `receive(name, message)`: adds the `message` spoken by `name` to
message history

The `DialogueSimulator` class takes a list of agents. At each step, it
performs the following:
1. Selects the next speaker
2. Calls the next speaker to send a message 
3. Broadcasts the message to all other agents
4. Updates the step counter.
The selection of the next speaker can be implemented as any function,
but in this case we simply loop through the agents.
2023-04-26 15:04:10 -07:00
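The four steps above can be sketched as a minimal simulator loop (a sketch, assuming agents expose `send()` and `receive(name, message)` and carry a `name` attribute):

```python
class DialogueSimulator:
    def __init__(self, agents, selection_function):
        self.agents = agents
        self.select_next_speaker = selection_function
        self._step = 0

    def step(self):
        # 1. select the next speaker
        idx = self.select_next_speaker(self._step, self.agents)
        speaker = self.agents[idx]
        # 2. the speaker sends a message
        message = speaker.send()
        # 3. broadcast the message to all other agents
        for agent in self.agents:
            if agent is not speaker:
                agent.receive(speaker.name, message)
        # 4. update the step counter
        self._step += 1
        return speaker.name, message
```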
apurvsibal
7e6097964e Update Alchemy Key URL (#3559)
Update Alchemy Key URL in Blockchain Document Loader. I want to say
thank you for the incredible work the LangChain library creators have
done.

I am amazed at how seamlessly the Loader integrates with Ethereum
Mainnet, Ethereum Testnet, Polygon Mainnet, and Polygon Testnet, and I
am excited to see how this technology can be extended in the future.

@hwchase17 - Please let me know if I can improve or if I have missed any
community guidelines in making the edit. Thank you again for your hard
work and dedication to the open source community.
2023-04-26 15:04:10 -07:00
Tiago De Gaspari
5104f9b08c Fix agents' notebooks outputs (#3517)
Fix agents' notebooks to make the answer reflect what is being asked by
the user.
2023-04-26 15:04:10 -07:00
engkheng
871c295b4c Fix typo in Prompts Templates Getting Started page (#3514)
`from_templates` -> `from_template`
2023-04-26 15:04:10 -07:00
Vincent
07627b57ec adding add_documents and aadd_documents to class RedisVectorStoreRetriever (#3419)
Ran into this issue In vectorstores/redis.py when trying to use the
AutoGPT agent with redis vector store. The error I received was

`
langchain/experimental/autonomous_agents/autogpt/agent.py", line 134, in
run
    self.memory.add_documents([Document(page_content=memory_to_add)])
AttributeError: 'RedisVectorStoreRetriever' object has no attribute
'add_documents'
`

Added the needed function to the RedisVectorStoreRetriever class, which
did not have this functionality, unlike the base VectorStoreRetriever in
vectorstores/base.py (which, for example, vectorstores/faiss.py uses).
2023-04-26 15:04:10 -07:00
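A hedged sketch of what the added method delegates to, mirroring the base `VectorStoreRetriever` (the class body is simplified here; the real retriever carries search configuration too):

```python
class RedisVectorStoreRetriever:
    """Simplified stand-in for the retriever class."""

    def __init__(self, vectorstore):
        self.vectorstore = vectorstore

    def add_documents(self, documents, **kwargs):
        # delegate to the underlying vector store, as base.py does
        return self.vectorstore.add_documents(documents, **kwargs)
```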
Davis Chase
6f514361be Add Anthropic default request timeout (#3540)
thanks @hitflame!

---------

Co-authored-by: Wenqiang Zhao <hitzhaowenqiang@sina.com>
Co-authored-by: delta@com <delta@com>
2023-04-26 15:04:10 -07:00
Zander Chase
b7f4a410a3 Change Chain Docs (#3537)
Co-authored-by: engkheng <60956360+outday29@users.noreply.github.com>
2023-04-26 15:04:10 -07:00
Ikko Eltociear Ashimine
eb12242495 fix typo in comet_tracking.ipynb (#3505)
intializing -> initializing
2023-04-26 15:04:10 -07:00
Zander Chase
aa1c3df5cf Add DDG to load_tools (#3535)
Fix linting

---------

Co-authored-by: Mike Wang <62768671+skcoirz@users.noreply.github.com>
2023-04-26 15:04:10 -07:00
Roma
f7af565510 Add unit test for _merge_splits function (#3513)
This commit adds a new unit test for the _merge_splits function in the
text splitter. The new test verifies that the function merges text into
chunks of the correct size and overlap, using a specified separator. The
test passes on the current implementation of the function.
2023-04-26 15:04:10 -07:00
Sami Liedes
3ec77607dc Pandas agent: Pass forward callback manager (#3518)
The Pandas agent fails to pass callback_manager forward, making it
impossible to use custom callbacks with it. Fix that.

Co-authored-by: Sami Liedes <sami.liedes@rocket-science.ch>
2023-04-26 15:04:10 -07:00
mbchang
597e87abac Docs: fix naming typo (#3532) 2023-04-26 15:04:10 -07:00
Harrison Chase
dfdb8279a6 bump version to 149 (#3530) 2023-04-26 15:04:10 -07:00
mbchang
1999294349 docs: two_player_dnd docs (#3528) 2023-04-26 15:04:10 -07:00
yakigac
6732ef9d35 Add a test for cosmos db memory (#3525)
Test for #3434 @eavanvalkenburg 
Initially, I was unaware and had submitted a pull request #3450 for the
same purpose, but I have now repurposed the one I used for that. And it
worked.
2023-04-26 15:04:10 -07:00
leo-gan
2ba18a0096 improved arxiv (#3495)
Improved `arxiv/tool.py` by adding more specific information to the
`description`. It should help with selecting the `arxiv` tool among
other tools.
Improved `arxiv.ipynb` with more useful descriptions.
2023-04-26 15:04:10 -07:00
mbchang
48997b35c9 doc: add two player D&D game (#3476)
In this notebook, we show how we can use concepts from
[CAMEL](https://www.camel-ai.org/) to simulate a role-playing game with
a protagonist and a dungeon master. To simulate this game, we create a
`TwoAgentSimulator` class that coordinates the dialogue between the two
agents.
2023-04-26 15:04:10 -07:00
Harrison Chase
9d7cfbcfcc Harrison/blockchain docloader (#3491)
Co-authored-by: Jon Saginaw <saginawj@users.noreply.github.com>
2023-04-26 15:04:10 -07:00
Harrison Chase
8fb767b8c6 Updated missing refactor in docs "return_map_steps" (#2956) (#3469)
Minor rename in the documentation that was overlooked when refactoring.

---------

Co-authored-by: Ehmad Zubair <ehmad@cogentlabs.co>
2023-04-26 15:04:10 -07:00
Harrison Chase
69db22be32 Harrison/prediction guard (#3490)
Co-authored-by: Daniel Whitenack <whitenack.daniel@gmail.com>
2023-04-26 15:04:10 -07:00
Harrison Chase
5f0248f0fb Harrison/tfidf parameters (#3481)
Co-authored-by: pao <go5kuramubon@gmail.com>
Co-authored-by: KyoHattori <kyo.hattori@abejainc.com>
2023-04-26 15:04:10 -07:00
Harrison Chase
580f1b2a48 openai embeddings (#3488) 2023-04-26 15:04:10 -07:00
Harrison Chase
be794e0360 Harrison/chroma update (#3489)
Co-authored-by: vyeevani <30946190+vyeevani@users.noreply.github.com>
Co-authored-by: Vineeth Yeevani <vineeth.yeevani@gmail.com>
2023-04-26 15:04:10 -07:00
Sami Liedes
659e94fc9c langchain-server: Do not expose postgresql port to host (#3431)
Apart from being unnecessary, postgresql is run on its default port,
which means that the langchain-server will fail to start if there is
already a postgresql server running on the host. This is obviously less
than ideal.

(Yeah, I don't understand why "expose" is the syntax that does not
expose the ports to the host...)

Tested by running langchain-server and trying out debugging on a host
that already has postgresql bound to the port 5432.

Co-authored-by: Sami Liedes <sami.liedes@rocket-science.ch>
2023-04-26 15:04:10 -07:00
Harrison Chase
74a95629a3 Harrison/verbose conv ret (#3492)
Co-authored-by: makretch <max.kretchmer@gmail.com>
2023-04-26 15:04:09 -07:00
Harrison Chase
1781d611f8 Harrison/prompt prefix (#3496)
Co-authored-by: Ian <ArGregoryIan@gmail.com>
2023-04-26 15:04:09 -07:00
Harrison Chase
ca0dfd38f8 Harrison/weaviate (#3494)
Co-authored-by: Nick Rubell <nick@rubell.com>
2023-04-26 15:04:09 -07:00
Eduard van Valkenburg
4630916e8c Azure CosmosDB memory (#3434)
Still needs docs, otherwise works.
2023-04-26 15:04:09 -07:00
Lucas Vieira
57a6982007 Support GCS Objects with / in GCS Loaders (#3356)
So, this is basically fixing the same things as #1517 but for GCS.

### Problem
When loading GCS Objects with `/` in the object key (eg.
folder/some-document.txt) using `GCSFileLoader`, the objects are
downloaded into a temporary directory and saved as a file.

This errors out when the parent directory does not exist within the
temporary directory.

### What this pr does
Creates parent directories based on object key.

This also works with deeply nested keys:
folder/subfolder/some-document.txt
2023-04-26 15:04:09 -07:00
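The parent-directory fix can be sketched like this (hypothetical helper name; the real change sits inside `GCSFileLoader`'s download logic):

```python
import os


def local_path_for_blob(tmp_dir: str, blob_key: str) -> str:
    """Build the on-disk path for an object key such as
    'folder/subfolder/some-document.txt', creating any missing parent
    directories inside the temporary directory first."""
    file_path = os.path.join(tmp_dir, blob_key)
    os.makedirs(os.path.dirname(file_path), exist_ok=True)
    return file_path
```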
Mindaugas Sharskus
cdbc4cda37 [Fix #3365]: Changed regex to cover new line before action serious (#3367)
Fix for: [Changed regex to cover new line before action
serious.](https://github.com/hwchase17/langchain/issues/3365)
---

This PR fixes the issue where `ValueError: Could not parse LLM output:`
was thrown on seemingly valid input.

Changed the regex to cover new lines before the action (after the
keywords "Action:" and "Action Input:").

regex101: https://regex101.com/r/CXl1kB/1

---------

Co-authored-by: msarskus <msarskus@cisco.com>
2023-04-26 15:04:09 -07:00
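A sketch of the kind of pattern change involved (an illustrative regex, not the exact one from the PR): putting `\s*` after each colon lets the action and its input begin on a new line instead of the same line.

```python
import re

# Illustrative pattern: \s* after each colon tolerates a newline before
# the action name and before the action input.
PATTERN = r"Action\s*:\s*(.*?)\nAction\s*Input\s*:\s*(.*)"


def parse_action(text: str):
    match = re.search(PATTERN, text, re.DOTALL)
    if match is None:
        raise ValueError(f"Could not parse LLM output: {text}")
    return match.group(1).strip(), match.group(2).strip()
```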
Maxwell Mullin
4f501e59ec GuessedAtParserWarning from RTD document loader documentation example (#3397)
Addresses #3396 by adding 

`features='html.parser'` in example
2023-04-26 15:04:09 -07:00
engkheng
c850a4d406 Improve llm_chain.ipynb and getting_started.ipynb for chains docs (#3380)
My attempt at improving the `Chain`'s `Getting Started` docs and
`LLMChain` docs. Might need some proof-reading as English is not my
first language.

In the LLM examples, I replaced the example use case with a simpler one
(shorter LLM output) to reduce cognitive load.
2023-04-26 15:04:09 -07:00
Zander Chase
aa9cf24a54 Add retry logic for ChromaDB (#3372)
Rewrite of #3368

Mainly an issue for when people are just getting started, but still nice
to not throw an error if the number of docs is < k.

Add a little decorator utility to block mutually exclusive keyword
arguments
2023-04-26 15:04:09 -07:00
tkarper
ffac033150 Add Databutton to list of Deployment options (#3364) 2023-04-26 15:04:09 -07:00
jrhe
8fc1c43e5d Adds progress bar using tqdm to directory_loader (#3349)
Approach copied from `WebBaseLoader`. Assumes the user doesn't have
`tqdm` installed.
2023-04-26 15:04:09 -07:00
killpanda
1deacb4f0a bug_fixes: use md5 instead of uuid id generation (#3442)
At present, the method of generating a `point` in qdrant is to use a
random `uuid`. The problem with this approach is that even documents
with the same content will be inserted repeatedly instead of updated.
Using the `md5` of the text as the `ID` of the `point` achieves a true
`update or insert`.

Co-authored-by: mayue <mayue05@qiyi.com>
2023-04-26 15:04:09 -07:00
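The idea can be sketched as a deterministic ID function (a sketch; the actual PR computes the hash inside the Qdrant wrapper):

```python
import hashlib


def content_point_id(text: str) -> str:
    # Identical content always hashes to the same ID, so re-inserting a
    # document updates the existing point instead of adding a duplicate.
    return hashlib.md5(text.encode("utf-8")).hexdigest()
```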
Jon Luo
621ab11734 Support SQLAlchemy 2.0 (#3310)
With https://github.com/executablebooks/jupyter-cache/pull/93 merged and
`MyST-NB` updated, we can now support SQLAlchemy 2. Closes #1766
2023-04-26 15:04:09 -07:00
engkheng
4bb95ad529 Update Getting Started page of Prompt Templates (#3298)
Updated `Getting Started` page of `Prompt Templates` to showcase more
features provided by the class. Might need some proof reading because
apparently English is not my first language.
2023-04-26 15:04:09 -07:00
Hasan Patel
8f5996a31c Updated Readme.md (#3477)
Corrected some minor grammar issues and changed "infra" to
"infrastructure" for clarity. Improved readability.
2023-04-26 15:04:09 -07:00
Davis Chase
68c19e1452 fix #3884 (#3475)
fixes bug #3384
2023-04-26 15:04:09 -07:00
Prakhar Agarwal
df0e1f85da pass list of strings to embed method in tf_hub (#3284)
This fixes the below mentioned issue. Instead of simply passing the text
to `tensorflow_hub`, we convert it to a list and then pass it.
https://github.com/hwchase17/langchain/issues/3282

Co-authored-by: Prakhar Agarwal <i.prakhar-agarwal@devrev.ai>
2023-04-26 15:04:09 -07:00
Beau Horenberger
99f74ff7d9 add LoRA loading for the LlamaCpp LLM (#3363)
First PR, let me know if this needs anything like unit tests,
reformatting, etc. Seemed pretty straightforward to implement. Only
hitch was that mmap needs to be disabled when loading LoRAs or else you
segfault.
2023-04-26 15:04:09 -07:00
Ehsan M. Kermani
fe5db65628 Use a consistent poetry version everywhere (#3250)
Fixes the discrepancy of the poetry version between the Dockerfile and
the GitHub Actions.
2023-04-26 15:04:09 -07:00
Felipe Lopes
59a4a8b34b feat: add private weaviate api_key support on from_texts (#3139)
This PR adds support for providing a Weaviate API Key to the VectorStore
methods `from_documents` and `from_texts`. With this addition, users can
authenticate to Weaviate and make requests to private Weaviate servers
when using these methods.

## Motivation
Currently, LangChain's VectorStore methods do not provide a way to
authenticate to Weaviate. This limits the functionality of the library
and makes it more difficult for users to take advantage of Weaviate's
features.

This PR addresses this issue by adding support for providing a Weaviate
API Key as extra parameter used in the `from_texts` method.

## Contributing Guidelines
I have read the [contributing
guidelines](72b7d76d79/.github/CONTRIBUTING.md)
and the PR code passes the following tests:

- [x] make format
- [x] make lint
- [x] make coverage
- [x] make test
2023-04-26 15:04:09 -07:00
Zzz233
73aedeed07 ES similarity_search_with_score() and metadata filter (#3046)
Add similarity_search_with_score() to ElasticVectorSearch, add metadata
filter to both similarity_search() and similarity_search_with_score()
2023-04-26 15:04:09 -07:00
Zander Chase
e7d27d52f6 Vwp/alpaca streaming (#3468)
Co-authored-by: Luke Stanley <306671+lukestanley@users.noreply.github.com>
2023-04-26 15:04:09 -07:00
Cao Hoang
1c73dc6408 remove default usage of openai model in SQLDatabaseToolkit (#2884)
#2866

This toolkit used an OpenAI LLM as the default, which could incur
unwanted cost.
2023-04-26 15:04:09 -07:00
Harrison Chase
b9d0e88584 show how to use memory in convo chain (#3463) 2023-04-26 15:04:09 -07:00
leo-gan
7482cc218c added integration links to the ecosystem.rst (#3453)
Previously it was hard to find the integration points between
data_loaders, retrievers, tools, etc.
I've placed links to all groups of providers and integrations on the
`ecosystem` page.
So, it is easy to navigate between all integrations from a single
location.
2023-04-26 15:04:09 -07:00
Davis Chase
cc247960a4 Bugfix: Not all combine docs chains takes kwargs prompt (#3462)
Generalize ConversationalRetrievalChain.from_llm kwargs

---------

Co-authored-by: shubham.suneja <shubham.suneja>
2023-04-26 15:04:09 -07:00
cs0lar
2e2be677c9 fixes #1214 (#3003)
### Background

Continuing to implement all the interface methods defined by the
`VectorStore` class. This PR pertains to implementation of the
`max_marginal_relevance_search_by_vector` method.

### Changes

- a `max_marginal_relevance_search_by_vector` method implementation has
been added in `weaviate.py`
- tests have been added for the new method
- vcr cassettes have been added for the weaviate tests

### Test Plan

Added tests for the `max_marginal_relevance_search_by_vector`
implementation

### Change Safety

- [x] I have added tests to cover my changes
2023-04-26 15:04:09 -07:00
Zander Chase
dca5772ed9 LM Requests Wrapper (#3457)
Co-authored-by: jnmarti <88381891+jnmarti@users.noreply.github.com>
2023-04-26 15:04:09 -07:00
Harrison Chase
5adfda8507 bump version to 148 (#3458) 2023-04-26 15:04:09 -07:00
Harrison Chase
704e0b98d8 update notebook 2023-04-26 15:04:09 -07:00
mbchang
cc6902f817 add meta-prompt to autonomous agents use cases (#3254)
An implementation of
[meta-prompt](https://noahgoodman.substack.com/p/meta-prompt-a-simple-self-improving),
where the agent modifies its own instructions across episodes with a
user.

![figure](https://substackcdn.com/image/fetch/f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F468217b9-96d9-47c0-a08b-dbf6b21b9f49_492x384.png)
2023-04-26 15:04:09 -07:00
yunfeilu92
2f1ab146d5 propogate kwargs to cls in OpenSearchVectorSearch (#3416)
kwargs should be passed into cls so that the opensearch client can be
properly initialized in __init__(). Otherwise, logic like the snippet
below will not work, as auth will not be passed into __init__:

```python
docsearch = OpenSearchVectorSearch.from_documents(docs, embeddings, opensearch_url="http://localhost:9200")

query = "What did the president say about Ketanji Brown Jackson"
docs = docsearch.similarity_search(query)
```

Co-authored-by: EC2 Default User <ec2-user@ip-172-31-28-97.ec2.internal>
2023-04-26 15:04:09 -07:00
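The mechanics can be shown with a toy stand-in (not the real OpenSearch class): client options given to the classmethod must be forwarded to `cls` so that `__init__` sees them.

```python
class ToyVectorSearch:
    def __init__(self, opensearch_url=None, http_auth=None, **kwargs):
        # auth and other client options are consumed here, not in the
        # classmethod, so they must arrive via **kwargs
        self.opensearch_url = opensearch_url
        self.http_auth = http_auth

    @classmethod
    def from_documents(cls, docs, embeddings=None, **kwargs):
        # forwarding **kwargs to cls is the fix this commit makes
        return cls(**kwargs)
```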
Eduard van Valkenburg
9bcb2af86a small constructor change and updated notebook (#3426)
Small change in the pydantic definitions, same API.

Updated the notebook with the right constructor and added a few-shot
example.
2023-04-26 15:04:09 -07:00
Zander Chase
cdc9c6a2fd Structured Tool Bugfixes (#3324)
- Proactively raise an error if a tool subclasses BaseTool and defines
its own schema but fails to add the type hints
- Fix the auto-inferred schema of the decorator to strip the unneeded
virtual kwargs from the schema dict

Helps avoid silent instances of #3297
2023-04-26 15:04:09 -07:00
Bilal Mahmoud
d0fa3cf798 Do not await sync callback managers (#3440)
This fixes a bug in the math LLM, where even the sync manager was
awaited, creating a nasty `RuntimeError`
2023-04-26 15:04:09 -07:00
Dianliang233
d80017f51f Fix NoneType has no len() in DDG tool (#3334)
Per
46ac914daa/duckduckgo_search/ddg.py (L109),
ddg function actually returns None when there is no result.
2023-04-26 15:04:09 -07:00
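The guard is tiny; a sketch with a hypothetical wrapper name:

```python
def safe_ddg_results(results):
    # ddg() returns None, not an empty list, when there are no hits,
    # so normalize before calling len() or iterating.
    return results if results is not None else []
```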
Davit Buniatyan
bf0bbc8f2c Deep Lake mini upgrades (#3375)
Improvements
* set default num_workers for ingestion to 0
* upgraded notebooks for avoiding dataset creation ambiguity
* added `force_delete_dataset_by_path`
* bumped deeplake to 3.3.0
* creds arg passing to deeplake object that would allow custom S3

Notes
* please double check if poetry is not messed up (thanks!)

Asks
* Would be great to create a shared slack channel for quick questions

---------

Co-authored-by: Davit Buniatyan <d@activeloop.ai>
2023-04-26 15:04:09 -07:00
Haste171
27f1463f4a Update unstructured_file.ipynb (#3377)
Fix typo in docs
2023-04-26 15:04:09 -07:00
张城铭
f7b05e7348 Optimize code (#3412)
Co-authored-by: assert <zhangchengming@kkguan.com>
2023-04-26 15:04:09 -07:00
Zander Chase
bf795bffdb Catch all exceptions in autogpt (#3413)
Ought to be more autonomous
2023-04-26 15:04:09 -07:00
Zander Chase
906488f87e Move Generative Agent definition to Experimental (#3245)
Extending @BeautyyuYanli 's #3220 to move from the notebook

---------

Co-authored-by: BeautyyuYanli <beautyyuyanli@gmail.com>
2023-04-26 15:04:09 -07:00
Zander Chase
7a01742895 Add Sentence Transformers Embeddings (#3409)
Add embeddings based on the sentence transformers library.
Add a notebook and integration tests.

Co-authored-by: khimaros <me@khimaros.com>
2023-04-26 15:04:09 -07:00
Zander Chase
cef046ae18 Update marathon notebook (#3408)
Fixes #3404
2023-04-26 15:04:09 -07:00
Luke Harris
5e53336c7d Several confluence loader improvements (#3300)
This PR addresses several improvements:

- Previously it was not possible to load spaces of more than 100 pages.
The `limit` was being used both as an overall page limit *and* as a per
request pagination limit. This, in combination with the fact that
atlassian seem to use a server-side hard limit of 100 when page content
is expanded, meant it wasn't possible to download >100 pages. Now
`limit` is used *only* as a per-request pagination limit and `max_pages`
is introduced as the way to limit the total number of pages returned by
the paginator.
- Document metadata now includes `source` (the source url), making it
compatible with `RetrievalQAWithSourcesChain`.
 - It is now possible to include inline and footer comments.
- It is now possible to pass `verify_ssl=False` and other parameters to
the confluence object for use cases that require it.
2023-04-26 15:04:09 -07:00
zz
95ae3c5f4b Add support for wikipedia's lang parameter (#3383)
Allow changing the language of the Wikipedia API being requested.

Co-authored-by: zhuohui <zhuohui@datastory.com.cn>
2023-04-26 15:04:09 -07:00
Johann-Peter Hartmann
fa9c5ac78d Improve youtube loader (#3395)
Small improvements for the YouTube loader: 
a) use the YouTube API permission scope instead of Google Drive 
b) bugfix: allow transcript loading for single videos 
c) an additional parameter "continue_on_failure" for cases when videos
in a playlist do not have transcription enabled.
d) support automated translation for all languages, if available.

---------

Co-authored-by: Johann-Peter Hartmann <johann-peter.hartmann@mayflower.de>
2023-04-26 15:04:09 -07:00
Harrison Chase
d5ef266842 Harrison/hf document loader (#3394)
Co-authored-by: Azam Iftikhar <azamiftikhar1000@gmail.com>
2023-04-26 15:04:09 -07:00
Hadi Curtay
3fdfa5d576 Updated incorrect link to Weaviate notebook (#3362)
The detailed walkthrough of the Weaviate wrapper was pointing to the
getting-started notebook. Fixed it to point to the Weaviate notebook in
the examples folder.
2023-04-26 15:04:09 -07:00
Ismail Pelaseyed
e41a70eb59 Add example on deploying LangChain to Cloud Run (#3366)
## Summary

Adds a link to a minimal example of running LangChain on Google Cloud
Run.
2023-04-26 15:04:09 -07:00
Ivan Zatevakhin
71db9c97c6 llamacpp wrong default value passed for f16_kv (#3320)
Fixes default f16_kv value in llamacpp; corrects incorrect parameter
passed.

See:
ba3959eafd/llama_cpp/llama.py (L33)

Fixes #3241
Fixes #3301
2023-04-26 15:04:09 -07:00
Harrison Chase
042415eee4 bump version to 147 (#3353) 2023-04-26 15:04:09 -07:00
Harrison Chase
37cc3d2e63 Harrison/myscale (#3352)
Co-authored-by: Fangrui Liu <fangruil@moqi.ai>
Co-authored-by: 刘 方瑞 <fangrui.liu@outlook.com>
Co-authored-by: Fangrui.Liu <fangrui.liu@ubc.ca>
2023-04-26 15:04:09 -07:00
Harrison Chase
828c96072c Harrison/error hf (#3348)
Co-authored-by: Rui Melo <44201826+rufimelo99@users.noreply.github.com>
2023-04-26 15:04:09 -07:00
Honkware
edbd3c7964 Add ChatGPT Data Loader (#3336)
This pull request adds a ChatGPT document loader to the document loaders
module in `langchain/document_loaders/chatgpt.py`. Additionally, it
includes an example Jupyter notebook in
`docs/modules/indexes/document_loaders/examples/chatgpt_loader.ipynb`
which uses fake sample data based on the original structure of the
`conversations.json` file.

The following files were added/modified:
- `langchain/document_loaders/__init__.py`
- `langchain/document_loaders/chatgpt.py`
- `docs/modules/indexes/document_loaders/examples/chatgpt_loader.ipynb`
-
`docs/modules/indexes/document_loaders/examples/example_data/fake_conversations.json`

This pull request was made in response to the recent release of ChatGPT
data exports by email:
https://help.openai.com/en/articles/7260999-how-do-i-export-my-chatgpt-history
2023-04-26 15:04:09 -07:00
Zander Chase
f553d28a11 Fix Sagemaker Batch Endpoints (#3249)
Add different typing for @evandiewald's helpful PR

---------

Co-authored-by: Evan Diewald <evandiewald@gmail.com>
2023-04-26 15:04:09 -07:00
Johann-Peter Hartmann
e8e8ca163b Support recursive sitemaps in SitemapLoader (#3146)
A (very) simple addition to support multiple sitemap urls.

---------

Co-authored-by: Johann-Peter Hartmann <johann-peter.hartmann@mayflower.de>
2023-04-26 15:04:09 -07:00
Filip Haltmayer
c9d5525485 Refactor Milvus/Zilliz (#3047)
Refactoring milvus/zilliz to clean up and have a more consistent
experience.

Signed-off-by: Filip Haltmayer <filip.haltmayer@zilliz.com>
2023-04-26 15:04:09 -07:00
Harrison Chase
8f4f90cdae Harrison/voice assistant (#3347)
Co-authored-by: Jaden <jaden.lorenc@gmail.com>
2023-04-26 15:04:09 -07:00
Richy Wang
612f928323 Add a full PostgresSQL syntax database 'AnalyticDB' as vector store. (#3135)
Hi there!
I'm excited to open this PR to add support for using a fully Postgres
syntax compatible database 'AnalyticDB' as a vector.
As AnalyticDB has been proven to work with AutoGPT,
ChatGPT-Retrieval-Plugin, and LlamaIndex, I think it is a good fit here
as well.
AnalyticDB is a distributed Alibaba Cloud-native vector database. It
works better when data comes at large scale. The PR includes:

- [x]  A new memory: AnalyticDBVector
- [x]  A suite of integration tests verifies the AnalyticDB integration

I have read your [contributing
guidelines](72b7d76d79/.github/CONTRIBUTING.md).
And I have passed the tests below
- [x]  make format
- [x]  make lint
- [x]  make coverage
- [x]  make test
2023-04-26 15:04:09 -07:00
Harrison Chase
7c211d2438 Harrison/power bi (#3205)
Co-authored-by: Eduard van Valkenburg <eavanvalkenburg@users.noreply.github.com>
2023-04-26 15:04:09 -07:00
Daniel Chalef
6a0abccf4d args_schema type hint on subclassing (#3323)
per https://github.com/hwchase17/langchain/issues/3297

Co-authored-by: Daniel Chalef <daniel.chalef@private.org>
2023-04-26 15:04:09 -07:00
Zander Chase
beb0f6fd60 Fix linting on master (#3327) 2023-04-26 15:04:09 -07:00
Varun Srinivas
219b618a5b Change in method name for creating an issue on JIRA (#3307)
The awesome JIRA tool created by @zywilliamli calls the `create_issue()`
method to create issues; however, the actual method is `issue_create()`.

Details in the Documentation here:
https://atlassian-python-api.readthedocs.io/jira.html#manage-issues
2023-04-26 15:04:09 -07:00
Davis Chase
fcd174cf43 Update docs api references (#3315) 2023-04-26 15:04:09 -07:00
Paul Garner
74f46262d0 Add PythonLoader which auto-detects encoding of Python files (#3311)
This PR contributes a `PythonLoader`, which inherits from
`TextLoader` but detects and sets the encoding automatically.
2023-04-26 15:04:09 -07:00
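The detection itself can lean on the standard library, which already implements PEP 263 for Python source files; a sketch of what such a loader might call:

```python
import tokenize


def detect_python_encoding(path: str) -> str:
    # tokenize.detect_encoding applies PEP 263: it checks for a BOM or
    # a coding cookie in the first two lines and falls back to utf-8.
    with open(path, "rb") as handle:
        encoding, _ = tokenize.detect_encoding(handle.readline)
    return encoding
```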
Daniel Chalef
058273174a Fix example match_documents fn table name, grammar (#3294)
ref
https://github.com/hwchase17/langchain/pull/3100#issuecomment-1517086472

Co-authored-by: Daniel Chalef <daniel.chalef@private.org>
2023-04-26 15:04:09 -07:00
Davis Chase
1a4c4a24f2 Cleanup integration test dir (#3308) 2023-04-26 15:04:09 -07:00
kayvane1
f6c98a7c1e chore: backwards compatibility 2023-04-26 10:59:16 +01:00
kayvane1
e0cb4c3005 chore: docstring update 2023-04-24 15:22:58 +01:00
kayvane1
97cabb40ae tests: fix tests 2023-04-24 11:54:42 +01:00
kayvane1
37b68dc8f2 feat: aligning the tools available for agents to switch between Bing, DDG and Google. All three services now have the same tools and implementations 2023-04-24 11:14:57 +01:00
21 changed files with 76 additions and 597 deletions


@@ -1,26 +0,0 @@
# Metal
This page covers how to use [Metal](https://getmetal.io) within LangChain.
## What is Metal?
Metal is a managed retrieval & memory platform built for production. Easily index your data into `Metal` and run semantic search and retrieval on it.
![Metal](../_static/MetalDash.png)
## Quick start
Get started by [creating a Metal account](https://app.getmetal.io/signup).
Then, you can easily take advantage of the `MetalRetriever` class to start retrieving your data for semantic search, prompting context, etc. This class takes a `Metal` instance and a dictionary of parameters to pass to the Metal API.
```python
from langchain.retrievers import MetalRetriever
from metal_sdk.metal import Metal
metal = Metal("API_KEY", "CLIENT_ID", "INDEX_ID")
retriever = MetalRetriever(metal, params={"limit": 2})
docs = retriever.get_relevant_documents("search term")
```

View File

@@ -39,27 +39,11 @@
"name": "stdout",
"output_type": "stream",
"text": [
"apify.ipynb\n",
"arxiv.ipynb\n",
"bash.ipynb\n",
"bing_search.ipynb\n",
"chatgpt_plugins.ipynb\n",
"ddg.ipynb\n",
"google_places.ipynb\n",
"google_search.ipynb\n",
"google_serper.ipynb\n",
"gradio_tools.ipynb\n",
"human_tools.ipynb\n",
"ifttt.ipynb\n",
"openweathermap.ipynb\n",
"python.ipynb\n",
"requests.ipynb\n",
"search_tools.ipynb\n",
"searx_search.ipynb\n",
"serpapi.ipynb\n",
"wikipedia.ipynb\n",
"wolfram_alpha.ipynb\n",
"zapier.ipynb\n",
"\n"
]
}
@@ -68,95 +52,10 @@
"print(bash.run(\"ls\"))"
]
},
{
"cell_type": "code",
"execution_count": 4,
"id": "e7896f8e",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"apify.ipynb\n",
"arxiv.ipynb\n",
"bash.ipynb\n",
"bing_search.ipynb\n",
"chatgpt_plugins.ipynb\n",
"ddg.ipynb\n",
"google_places.ipynb\n",
"google_search.ipynb\n",
"google_serper.ipynb\n",
"gradio_tools.ipynb\n",
"human_tools.ipynb\n",
"ifttt.ipynb\n",
"openweathermap.ipynb\n",
"python.ipynb\n",
"requests.ipynb\n",
"search_tools.ipynb\n",
"searx_search.ipynb\n",
"serpapi.ipynb\n",
"wikipedia.ipynb\n",
"wolfram_alpha.ipynb\n",
"zapier.ipynb\n",
"\n"
]
}
],
"source": [
"bash.run(\"cd ..\")\n",
"# The commands are executed in a new subprocess each time, meaning that\n",
"# this call will return the same results as the last.\n",
"print(bash.run(\"ls\"))"
]
},
{
"attachments": {},
"cell_type": "markdown",
"id": "851fee9f",
"metadata": {},
"source": [
"## Terminal Persistence\n",
"\n",
"By default, the bash command will be executed in a new subprocess each time. To retain a persistent bash session, we can use the `persistent=True` arg."
]
},
{
"cell_type": "code",
"execution_count": 5,
"id": "4a93ea2c",
"metadata": {},
"outputs": [],
"source": [
"bash = BashProcess(persistent=True)"
]
},
{
"cell_type": "code",
"execution_count": 6,
"id": "a1e98b78",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"custom_tools.ipynb\t\tmulti_input_tool.ipynb\n",
"examples\t\t\ttool_input_validation.ipynb\n",
"getting_started.md\n"
]
}
],
"source": [
"bash.run(\"cd ..\")\n",
"# Note the list of files is different\n",
"print(bash.run(\"ls\"))"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "e13c1c9c",
"id": "851fee9f",
"metadata": {},
"outputs": [],
"source": []
@@ -178,7 +77,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.8.16"
"version": "3.10.9"
}
},
"nbformat": 4,

View File

@@ -27,7 +27,7 @@
"metadata": {},
"outputs": [],
"source": [
"from langchain.tools import DuckDuckGoSearchRun"
"from langchain.tools import DuckDuckGoSearchTool"
]
},
{
@@ -37,7 +37,7 @@
"metadata": {},
"outputs": [],
"source": [
"search = DuckDuckGoSearchRun()"
"search = DuckDuckGoSearchTool()"
]
},
{

View File

@@ -24,8 +24,8 @@
"\n",
"```bash\n",
"echo \"Hello World\"\n",
"```\u001b[0m\n",
"Code: \u001b[33;1m\u001b[1;3m['echo \"Hello World\"']\u001b[0m\n",
"```\u001b[0m['```bash', 'echo \"Hello World\"', '```']\n",
"\n",
"Answer: \u001b[33;1m\u001b[1;3mHello World\n",
"\u001b[0m\n",
"\u001b[1m> Finished chain.\u001b[0m\n"
@@ -65,7 +65,7 @@
},
{
"cell_type": "code",
"execution_count": 2,
"execution_count": 28,
"metadata": {},
"outputs": [],
"source": [
@@ -93,7 +93,7 @@
},
{
"cell_type": "code",
"execution_count": 3,
"execution_count": 29,
"metadata": {},
"outputs": [
{
@@ -107,8 +107,8 @@
"\n",
"```bash\n",
"printf \"Hello World\\n\"\n",
"```\u001b[0m\n",
"Code: \u001b[33;1m\u001b[1;3m['printf \"Hello World\\\\n\"']\u001b[0m\n",
"```\u001b[0m['```bash', 'printf \"Hello World\\\\n\"', '```']\n",
"\n",
"Answer: \u001b[33;1m\u001b[1;3mHello World\n",
"\u001b[0m\n",
"\u001b[1m> Finished chain.\u001b[0m\n"
@@ -120,7 +120,7 @@
"'Hello World\\n'"
]
},
"execution_count": 3,
"execution_count": 29,
"metadata": {},
"output_type": "execute_result"
}
@@ -132,114 +132,6 @@
"\n",
"bash_chain.run(text)"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"## Persistent Terminal\n",
"\n",
"By default, the chain will run in a separate subprocess each time it is called. This behavior can be changed by instantiating with a persistent bash process."
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"\n",
"\u001b[1m> Entering new LLMBashChain chain...\u001b[0m\n",
"List the current directory then move up a level.\u001b[32;1m\u001b[1;3m\n",
"\n",
"```bash\n",
"ls\n",
"cd ..\n",
"```\u001b[0m\n",
"Code: \u001b[33;1m\u001b[1;3m['ls', 'cd ..']\u001b[0m\n",
"Answer: \u001b[33;1m\u001b[1;3mapi.ipynb\t\t\tllm_summarization_checker.ipynb\n",
"constitutional_chain.ipynb\tmoderation.ipynb\n",
"llm_bash.ipynb\t\t\topenai_openapi.yaml\n",
"llm_checker.ipynb\t\topenapi.ipynb\n",
"llm_math.ipynb\t\t\tpal.ipynb\n",
"llm_requests.ipynb\t\tsqlite.ipynb\u001b[0m\n",
"\u001b[1m> Finished chain.\u001b[0m\n"
]
},
{
"data": {
"text/plain": [
"'api.ipynb\\t\\t\\tllm_summarization_checker.ipynb\\r\\nconstitutional_chain.ipynb\\tmoderation.ipynb\\r\\nllm_bash.ipynb\\t\\t\\topenai_openapi.yaml\\r\\nllm_checker.ipynb\\t\\topenapi.ipynb\\r\\nllm_math.ipynb\\t\\t\\tpal.ipynb\\r\\nllm_requests.ipynb\\t\\tsqlite.ipynb'"
]
},
"execution_count": 4,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"from langchain.utilities.bash import BashProcess\n",
"\n",
"\n",
"persistent_process = BashProcess(persistent=True)\n",
"bash_chain = LLMBashChain.from_bash_process(llm=llm, bash_process=persistent_process, verbose=True)\n",
"\n",
"text = \"List the current directory then move up a level.\"\n",
"\n",
"bash_chain.run(text)"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"\n",
"\u001b[1m> Entering new LLMBashChain chain...\u001b[0m\n",
"List the current directory then move up a level.\u001b[32;1m\u001b[1;3m\n",
"\n",
"```bash\n",
"ls\n",
"cd ..\n",
"```\u001b[0m\n",
"Code: \u001b[33;1m\u001b[1;3m['ls', 'cd ..']\u001b[0m\n",
"Answer: \u001b[33;1m\u001b[1;3mexamples\t\tgetting_started.ipynb\tindex_examples\n",
"generic\t\t\thow_to_guides.rst\u001b[0m\n",
"\u001b[1m> Finished chain.\u001b[0m\n"
]
},
{
"data": {
"text/plain": [
"'examples\\t\\tgetting_started.ipynb\\tindex_examples\\r\\ngeneric\\t\\t\\thow_to_guides.rst'"
]
},
"execution_count": 5,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Run the same command again and see that the state is maintained between calls\n",
"bash_chain.run(text)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
@@ -258,7 +150,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.8.16"
"version": "3.10.6"
}
},
"nbformat": 4,

View File

@@ -219,7 +219,7 @@
},
"outputs": [],
"source": [
"from langchain.tools import BaseTool, DuckDuckGoSearchRun\n",
"from langchain.tools import BaseTool, DuckDuckGoSearchTool\n",
"from langchain.text_splitter import RecursiveCharacterTextSplitter\n",
"\n",
"from pydantic import Field\n",
@@ -321,7 +321,7 @@
"outputs": [],
"source": [
"# !pip install duckduckgo_search\n",
"web_search = DuckDuckGoSearchRun()"
"web_search = DuckDuckGoSearchTool()"
]
},
{
@@ -618,7 +618,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.8.16"
"version": "3.11.2"
}
},
"nbformat": 4,

View File

@@ -35,7 +35,7 @@ def create_pandas_dataframe_agent(
prompt = ZeroShotAgent.create_prompt(
tools, prefix=prefix, suffix=suffix, input_variables=input_variables
)
partial_prompt = prompt.partial(df=str(df.head().to_markdown()))
partial_prompt = prompt.partial(df=str(df.head()))
llm_chain = LLMChain(
llm=llm,
prompt=partial_prompt,

View File

@@ -15,7 +15,7 @@ from langchain.requests import TextRequestsWrapper
from langchain.tools.arxiv.tool import ArxivQueryRun
from langchain.tools.base import BaseTool
from langchain.tools.bing_search.tool import BingSearchRun
from langchain.tools.ddg_search.tool import DuckDuckGoSearchRun
from langchain.tools.ddg_search.tool import DuckDuckGoSearchTool
from langchain.tools.google_search.tool import GoogleSearchResults, GoogleSearchRun
from langchain.tools.human.tool import HumanInputRun
from langchain.tools.python.tool import PythonREPLTool
@@ -219,7 +219,7 @@ def _get_bing_search(**kwargs: Any) -> BaseTool:
def _get_ddg_search(**kwargs: Any) -> BaseTool:
return DuckDuckGoSearchRun(api_wrapper=DuckDuckGoSearchAPIWrapper(**kwargs))
return DuckDuckGoSearchTool(api_wrapper=DuckDuckGoSearchAPIWrapper(**kwargs))
def _get_human_tool(**kwargs: Any) -> BaseTool:

View File

@@ -1,46 +1,15 @@
"""Chain that interprets a prompt and executes bash code to perform bash operations."""
import logging
import re
from typing import Any, Dict, List
from typing import Dict, List
from pydantic import Extra, Field
from pydantic import Extra
from langchain.chains.base import Chain
from langchain.chains.llm import LLMChain
from langchain.chains.llm_bash.prompt import PROMPT
from langchain.prompts.base import BasePromptTemplate
from langchain.schema import BaseLanguageModel, BaseOutputParser, OutputParserException
from langchain.schema import BaseLanguageModel
from langchain.utilities.bash import BashProcess
logger = logging.getLogger(__name__)
class BashOutputParser(BaseOutputParser):
"""Parser for bash output."""
def parse(self, text: str) -> List[str]:
if "```bash" in text:
return self.get_code_blocks(text)
else:
raise OutputParserException(
f"Failed to parse bash output. Got: {text}",
)
@staticmethod
def get_code_blocks(t: str) -> List[str]:
"""Get multiple code blocks from the LLM result."""
code_blocks: List[str] = []
# Bash markdown code blocks
pattern = re.compile(r"```bash(.*?)(?:\n\s*)```", re.DOTALL)
for match in pattern.finditer(t):
matched = match.group(1).strip()
if matched:
code_blocks.extend(
[line for line in matched.split("\n") if line.strip()]
)
return code_blocks
class LLMBashChain(Chain):
"""Chain that interprets a prompt and executes bash code to perform bash operations.
@@ -57,8 +26,6 @@ class LLMBashChain(Chain):
input_key: str = "question" #: :meta private:
output_key: str = "answer" #: :meta private:
prompt: BasePromptTemplate = PROMPT
output_parser: BaseOutputParser = Field(default_factory=BashOutputParser)
bash_process: BashProcess = Field(default_factory=BashProcess) #: :meta private:
class Config:
"""Configuration for this pydantic object."""
@@ -84,40 +51,29 @@ class LLMBashChain(Chain):
def _call(self, inputs: Dict[str, str]) -> Dict[str, str]:
llm_executor = LLMChain(prompt=self.prompt, llm=self.llm)
bash_executor = BashProcess()
self.callback_manager.on_text(inputs[self.input_key], verbose=self.verbose)
t = llm_executor.predict(question=inputs[self.input_key])
self.callback_manager.on_text(t, color="green", verbose=self.verbose)
t = t.strip()
try:
command_list = self.output_parser.parse(t)
except OutputParserException as e:
self.callback_manager.on_chain_error(e, verbose=self.verbose)
raise e
if t.startswith("```bash"):
# Split the string into a list of substrings
command_list = t.split("\n")
print(command_list)
if self.verbose:
self.callback_manager.on_text("\nCode: ", verbose=self.verbose)
self.callback_manager.on_text(
str(command_list), color="yellow", verbose=self.verbose
)
# Remove the first and last substrings
command_list = [s for s in command_list[1:-1]]
output = bash_executor.run(command_list)
output = self.bash_process.run(command_list)
self.callback_manager.on_text("\nAnswer: ", verbose=self.verbose)
self.callback_manager.on_text(output, color="yellow", verbose=self.verbose)
self.callback_manager.on_text("\nAnswer: ", verbose=self.verbose)
self.callback_manager.on_text(output, color="yellow", verbose=self.verbose)
else:
raise ValueError(f"unknown format from LLM: {t}")
return {self.output_key: output}
@property
def _chain_type(self) -> str:
return "llm_bash_chain"
@classmethod
def from_bash_process(
cls,
bash_process: BashProcess,
llm: BaseLanguageModel,
**kwargs: Any,
) -> "LLMBashChain":
"""Create an LLMBashChain from a BashProcess."""
return cls(llm=llm, bash_process=bash_process, **kwargs)

View File

@@ -26,4 +26,4 @@ services:
- POSTGRES_USER=postgres
- POSTGRES_DB=postgres
expose:
- 5432
- 5432:5432

View File

@@ -17,7 +17,6 @@ class BSHTMLLoader(BaseLoader):
file_path: str,
open_encoding: Union[str, None] = None,
bs_kwargs: Union[dict, None] = None,
get_text_separator: str = "",
) -> None:
"""Initialise with path, and optionally, file encoding to use, and any kwargs
to pass to the BeautifulSoup object."""
@@ -34,7 +33,6 @@ class BSHTMLLoader(BaseLoader):
if bs_kwargs is None:
bs_kwargs = {"features": "lxml"}
self.bs_kwargs = bs_kwargs
self.get_text_separator = get_text_separator
def load(self) -> List[Document]:
from bs4 import BeautifulSoup
@@ -43,7 +41,7 @@ class BSHTMLLoader(BaseLoader):
with open(self.file_path, "r", encoding=self.open_encoding) as f:
soup = BeautifulSoup(f, **self.bs_kwargs)
text = soup.get_text(self.get_text_separator)
text = soup.get_text()
if soup.title:
title = str(soup.title.string)

View File

@@ -11,17 +11,11 @@ from langchain.tools.openapi.utils.openapi_utils import OpenAPISpec
from langchain.tools.plugin import AIPluginTool
__all__ = [
"AIPluginTool",
"APIOperation",
"BingSearchResults",
"BingSearchRun",
"DuckDuckGoSearchResults",
"DuckDuckGoSearchRun",
"DuckDuckGoSearchRun",
"GooglePlacesTool",
"GoogleSearchResults",
"GoogleSearchRun",
"IFTTTWebhook",
"OpenAPISpec",
"BaseTool",
"IFTTTWebhook",
"AIPluginTool",
"OpenAPISpec",
"APIOperation",
"GooglePlacesTool",
"DuckDuckGoSearchTool",
]

View File

@@ -1,5 +1,5 @@
"""DuckDuckGo Search API toolkit."""
from langchain.tools.ddg_search.tool import DuckDuckGoSearchRun
from langchain.tools.ddg_search.tool import DuckDuckGoSearchTool
__all__ = ["DuckDuckGoSearchRun"]
__all__ = ["DuckDuckGoSearchTool"]

View File

@@ -1,13 +1,10 @@
"""Tool for the DuckDuckGo search API."""
import warnings
from typing import Any
from pydantic import Field
from langchain.tools.base import BaseTool
from langchain.utilities.duckduckgo_search import DuckDuckGoSearchAPIWrapper
import warnings
class DuckDuckGoSearchRun(BaseTool):
"""Tool that adds the capability to query the DuckDuckGo search API."""
@@ -32,7 +29,7 @@ class DuckDuckGoSearchRun(BaseTool):
class DuckDuckGoSearchResults(BaseTool):
"""Tool that queries the Duck Duck Go Search API and get back json."""
"""Tool that has capability to query the Duck Duck Go Search API and get back json."""
name = "DuckDuckGo Results JSON"
description = (
@@ -53,11 +50,10 @@ class DuckDuckGoSearchResults(BaseTool):
"""Use the tool asynchronously."""
raise NotImplementedError("DuckDuckGoSearchResults does not support async")
def DuckDuckGoSearchTool(*args: Any, **kwargs: Any) -> DuckDuckGoSearchRun:
def DuckDuckGoSearchTool(*args, **kwargs):
warnings.warn(
"DuckDuckGoSearchTool will be deprecated in the future. "
"Please use DuckDuckGoSearchRun instead.",
DeprecationWarning,
)
return DuckDuckGoSearchRun(*args, **kwargs)
return DuckDuckGoSearchRun(*args, **kwargs)
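The renaming above keeps a thin backwards-compatible alias that warns before delegating to the new class. The pattern can be sketched generically (the class names here are hypothetical, not LangChain APIs):

```python
import warnings

class NewTool:
    """The renamed implementation that callers should migrate to."""
    def run(self, query: str) -> str:
        return f"results for {query}"

def OldTool(*args, **kwargs):
    # Backwards-compatible factory: warn once per call, then return an
    # instance of the new class so old code keeps working.
    warnings.warn(
        "OldTool will be deprecated in the future. Please use NewTool instead.",
        DeprecationWarning,
    )
    return NewTool(*args, **kwargs)
```

Because the alias is a function rather than a subclass, `isinstance` checks against the new class still pass for objects created through the old name.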

View File

@@ -1,59 +1,24 @@
"""Wrapper around subprocess to run commands."""
import re
import subprocess
from typing import List, Union
from uuid import uuid4
import pexpect
class BashProcess:
"""Executes bash commands and returns the output."""
def __init__(
self,
strip_newlines: bool = False,
return_err_output: bool = False,
persistent: bool = False,
):
def __init__(self, strip_newlines: bool = False, return_err_output: bool = False):
"""Initialize with stripping newlines."""
self.strip_newlines = strip_newlines
self.return_err_output = return_err_output
self.prompt = ""
self.process = None
if persistent:
self.prompt = str(uuid4())
self.process = self._initialize_persistent_process(self.prompt)
@staticmethod
def _initialize_persistent_process(prompt: str) -> pexpect.spawn:
# Start bash in a clean environment
process = pexpect.spawn(
"env", ["-i", "bash", "--norc", "--noprofile"], encoding="utf-8"
)
# Set the custom prompt
process.sendline("PS1=" + prompt)
process.expect_exact(prompt, timeout=10)
return process
def run(self, commands: Union[str, List[str]]) -> str:
"""Run commands and return final output."""
if isinstance(commands, str):
commands = [commands]
commands = ";".join(commands)
if self.process is not None:
return self._run_persistent(
commands,
)
else:
return self._run(commands)
def _run(self, command: str) -> str:
"""Run commands and return final output."""
try:
output = subprocess.run(
command,
commands,
shell=True,
check=True,
stdout=subprocess.PIPE,
@@ -66,31 +31,3 @@ class BashProcess:
if self.strip_newlines:
output = output.strip()
return output
def process_output(self, output: str, command: str) -> str:
# Remove the command from the output using a regular expression
pattern = re.escape(command) + r"\s*\n"
output = re.sub(pattern, "", output, count=1)
return output.strip()
def _run_persistent(self, command: str) -> str:
"""Run commands and return final output."""
if self.process is None:
raise ValueError("Process not initialized")
self.process.sendline(command)
# Clear the output with an empty string
self.process.expect(self.prompt, timeout=10)
self.process.sendline("")
try:
self.process.expect([self.prompt, pexpect.EOF], timeout=10)
except pexpect.TIMEOUT:
return f"Timeout error while executing command {command}"
if self.process.after == pexpect.EOF:
return f"Exited with error status: {self.process.exitstatus}"
output = self.process.before
output = self.process_output(output, command)
if self.strip_newlines:
return output.strip()
return output
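The `process_output` helper in the persistent branch above strips the echoed command from a terminal transcript before returning it. A standalone sketch of that step, usable without pexpect:

```python
import re

def process_output(output: str, command: str) -> str:
    # An interactive shell echoes the command before printing its
    # output; remove that first echo, then trim surrounding whitespace.
    pattern = re.escape(command) + r"\s*\n"
    return re.sub(pattern, "", output, count=1).strip()
```

For example, `process_output("pwd\n/home/user\n", "pwd")` yields just the command's output, `"/home/user"`.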

View File

@@ -41,7 +41,7 @@ class DuckDuckGoSearchAPIWrapper(BaseModel):
def run(self, query: str) -> str:
from duckduckgo_search import ddg
"""Run query through DuckDuckGo and return concatenated results."""
"""Run query through DuckDuckGo and return results."""
results = ddg(
query,
region=self.region,
@@ -54,7 +54,7 @@ class DuckDuckGoSearchAPIWrapper(BaseModel):
snippets = [result["body"] for result in results]
return " ".join(snippets)
def results(self, query: str, num_results: int) -> List[Dict[str, str]]:
def results(self, query: str, num_results: int) -> List[Dict]:
"""Run query through DuckDuckGo and return metadata.
Args:
@@ -80,7 +80,7 @@ class DuckDuckGoSearchAPIWrapper(BaseModel):
if results is None or len(results) == 0:
return [{"Result": "No good DuckDuckGo Search Result was found"}]
def to_metadata(result: Dict) -> Dict[str, str]:
def to_metadata(result: Dict) -> Dict:
return {
"snippet": result["body"],
"title": result["title"],

View File

@@ -77,23 +77,7 @@ class SerpAPIWrapper(BaseModel):
return values
async def arun(self, query: str) -> str:
"""Run query through SerpAPI and parse result async."""
return self._process_response(await self.aresults(query))
def run(self, query: str) -> str:
"""Run query through SerpAPI and parse result."""
return self._process_response(self.results(query))
def results(self, query: str) -> dict:
"""Run query through SerpAPI and return the raw result."""
params = self.get_params(query)
with HiddenPrints():
search = self.search_engine(params)
res = search.get_dict()
return res
async def aresults(self, query: str) -> dict:
"""Use aiohttp to run query through SerpAPI and return the results async."""
"""Use aiohttp to run query through SerpAPI and parse result."""
def construct_url_and_params() -> Tuple[str, Dict[str, str]]:
params = self.get_params(query)
@@ -113,6 +97,18 @@ class SerpAPIWrapper(BaseModel):
async with self.aiosession.get(url, params=params) as response:
res = await response.json()
return self._process_response(res)
def run(self, query: str) -> str:
"""Run query through SerpAPI and parse result."""
return self._process_response(self.results(query))
def results(self, query: str) -> dict:
"""Run query through SerpAPI and return the raw result."""
params = self.get_params(query)
with HiddenPrints():
search = self.search_engine(params)
res = search.get_dict()
return res
def get_params(self, query: str) -> Dict[str, str]:
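The reordering above highlights the structure of `SerpAPIWrapper`: sync and async entry points each fetch a raw result and funnel it through one shared `_process_response` step. A minimal sketch of that shape, with stand-in bodies instead of real HTTP calls:

```python
import asyncio

class SearchWrapper:
    # Sketch of the sync/async split: run() and arun() differ only in
    # how they fetch, and share the same response-processing step.
    def _process_response(self, res: dict) -> str:
        return res.get("answer", "no result")

    def results(self, query: str) -> dict:
        # Stand-in for the blocking API call.
        return {"answer": f"sync:{query}"}

    async def aresults(self, query: str) -> dict:
        # Stand-in for an aiohttp request.
        await asyncio.sleep(0)
        return {"answer": f"async:{query}"}

    def run(self, query: str) -> str:
        return self._process_response(self.results(query))

    async def arun(self, query: str) -> str:
        return self._process_response(await self.aresults(query))
```

Keeping the parsing in one place means a fix to response handling automatically applies to both the sync and async paths.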

View File

@@ -9,17 +9,15 @@ from langchain.document_loaders.html_bs import BSHTMLLoader
def test_bs_html_loader() -> None:
"""Test unstructured loader."""
file_path = Path(__file__).parent.parent / "examples/example.html"
loader = BSHTMLLoader(str(file_path), get_text_separator="|")
loader = BSHTMLLoader(str(file_path))
docs = loader.load()
assert len(docs) == 1
metadata = docs[0].metadata
content = docs[0].page_content
assert metadata["title"] == "Chew dad's slippers"
assert metadata["source"] == str(file_path)
assert content[:2] == "\n|"
@pytest.mark.skipif(

View File

@@ -1,50 +0,0 @@
"""Test chat agents in various scenarios."""
from typing import Set
import pytest
from langchain.agents.agent_types import AgentType
from langchain.agents.initialize import initialize_agent
from langchain.agents.tools import Tool
from langchain.chains.llm_math.base import LLMMathChain
from langchain.chat_models.openai import ChatOpenAI
from langchain.tools.ddg_search.tool import DuckDuckGoSearchRun
from langchain.tools.plugin import AIPluginTool
TEST_CASES = [
(
"What's the current time in NYC?",
{"DuckDuckGo Search"},
),
("What is a shoe that's available on Klarna?", {"KlarnaProducts"}),
("What's 3*4.2*1.7", {"Calculator"}),
]
@pytest.mark.parametrize("query, used_tools", TEST_CASES)
def test_chat_agent(query: str, used_tools: Set[str]) -> None:
"""Test chat agent."""
llm = ChatOpenAI(temperature=0)
llm_math_chain = LLMMathChain(llm=llm)
tools = [
DuckDuckGoSearchRun(),
AIPluginTool.from_plugin_url(
"https://www.klarna.com/.well-known/ai-plugin.json"
),
Tool(
name="Calculator",
func=llm_math_chain.run,
description="useful for doing calculations",
),
]
agent_executor = initialize_agent(
tools,
llm,
AgentType.CHAT_ZERO_SHOT_REACT_DESCRIPTION,
return_intermediate_steps=True,
)
result = agent_executor({"input": query})
intermediate_steps = result["intermediate_steps"]
tool_sequences = [act.tool for act, _ in intermediate_steps]
assert set(tool_sequences) == used_tools

View File

@@ -3,107 +3,26 @@ import sys
import pytest
from langchain.chains.llm_bash.base import BashOutputParser, LLMBashChain
from langchain.chains.llm_bash.base import LLMBashChain
from langchain.chains.llm_bash.prompt import _PROMPT_TEMPLATE
from langchain.schema import OutputParserException
from tests.unit_tests.llms.fake_llm import FakeLLM
_SAMPLE_CODE = """
Unrelated text
```bash
echo hello
```
Unrelated text
"""
_SAMPLE_CODE_2_LINES = """
Unrelated text
```bash
echo hello
echo world
```
Unrelated text
"""
@pytest.fixture
def output_parser() -> BashOutputParser:
"""Output parser for testing."""
return BashOutputParser()
def fake_llm_bash_chain() -> LLMBashChain:
"""Fake LLM Bash chain for testing."""
question = "Please write a bash script that prints 'Hello World' to the console."
prompt = _PROMPT_TEMPLATE.format(question=question)
queries = {prompt: "```bash\nexpr 1 + 1\n```"}
fake_llm = FakeLLM(queries=queries)
return LLMBashChain(llm=fake_llm, input_key="q", output_key="a")
@pytest.mark.skipif(
sys.platform.startswith("win"), reason="Test not supported on Windows"
)
def test_simple_question() -> None:
def test_simple_question(fake_llm_bash_chain: LLMBashChain) -> None:
"""Test simple question that should not need python."""
question = "Please write a bash script that prints 'Hello World' to the console."
prompt = _PROMPT_TEMPLATE.format(question=question)
queries = {prompt: "```bash\nexpr 1 + 1\n```"}
fake_llm = FakeLLM(queries=queries)
fake_llm_bash_chain = LLMBashChain(llm=fake_llm, input_key="q", output_key="a")
output = fake_llm_bash_chain.run(question)
assert output == "2\n"
def test_get_code(output_parser: BashOutputParser) -> None:
"""Test the parser."""
code_lines = output_parser.parse(_SAMPLE_CODE)
code = [c for c in code_lines if c.strip()]
assert code == code_lines
assert code == ["echo hello"]
code_lines = output_parser.parse(_SAMPLE_CODE + _SAMPLE_CODE_2_LINES)
assert code_lines == ["echo hello", "echo hello", "echo world"]
def test_parsing_error() -> None:
"""Test that LLM output without a bash block raises an exception."""
question = "Please echo 'hello world' to the terminal."
prompt = _PROMPT_TEMPLATE.format(question=question)
queries = {
prompt: """
```text
echo 'hello world'
```
"""
}
fake_llm = FakeLLM(queries=queries)
fake_llm_bash_chain = LLMBashChain(llm=fake_llm, input_key="q", output_key="a")
with pytest.raises(OutputParserException):
fake_llm_bash_chain.run(question)
def test_get_code_lines_mixed_blocks(output_parser: BashOutputParser) -> None:
text = """
Unrelated text
```bash
echo hello
ls && pwd && ls
```
```python
print("hello")
```
```bash
echo goodbye
```
"""
code_lines = output_parser.parse(text)
assert code_lines == ["echo hello", "ls && pwd && ls", "echo goodbye"]
def test_get_code_lines_simple_nested_ticks(output_parser: BashOutputParser) -> None:
"""Test that backticks w/o a newline are ignored."""
text = """
Unrelated text
```bash
echo hello
echo "```bash is in this string```"
```
"""
code_lines = output_parser.parse(text)
assert code_lines == ["echo hello", 'echo "```bash is in this string```"']
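The parser behavior these tests exercise can be reproduced with the same regex in isolation, a lazy ```` ```bash … ``` ```` match that requires a newline before the closing fence, which is why inline backticks inside a string are ignored:

```python
import re
from typing import List

# Same pattern as BashOutputParser.get_code_blocks: lazy body match,
# closing fence must follow a newline (plus optional whitespace).
PATTERN = re.compile(r"```bash(.*?)(?:\n\s*)```", re.DOTALL)

def get_code_blocks(text: str) -> List[str]:
    blocks: List[str] = []
    for match in PATTERN.finditer(text):
        body = match.group(1).strip()
        if body:
            blocks.extend(line for line in body.split("\n") if line.strip())
    return blocks
```

Running it on a snippet with surrounding prose returns only the command lines, one list entry per line across all bash blocks.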

View File

@@ -21,23 +21,6 @@ def test_pwd_command() -> None:
assert output == subprocess.check_output("pwd", shell=True).decode()
@pytest.mark.skipif(
sys.platform.startswith("win"), reason="Test not supported on Windows"
)
def test_pwd_command_persistent() -> None:
"""Test correct functionality when the bash process is persistent."""
session = BashProcess(persistent=True, strip_newlines=True)
commands = ["pwd"]
output = session.run(commands)
assert subprocess.check_output("pwd", shell=True).decode().strip() in output
session.run(["cd .."])
new_output = session.run(["pwd"])
# Assert that the new_output is a parent of the old output
assert Path(output).parent == Path(new_output)
@pytest.mark.skipif(
sys.platform.startswith("win"), reason="Test not supported on Windows"
)
@@ -83,16 +66,3 @@ def test_create_directory_and_files(tmp_path: Path) -> None:
# check that the files were created in the temporary directory
output = session.run([f"ls {temp_dir}"])
assert output == "file1.txt\nfile2.txt"
@pytest.mark.skipif(
sys.platform.startswith("win"), reason="Test not supported on Windows"
)
def test_create_bash_persistent() -> None:
"""Test the pexpect persistent bash terminal"""
session = BashProcess(persistent=True)
response = session.run("echo hello")
response += session.run("echo world")
assert "hello" in response
assert "world" in response