Compare commits


127 Commits

Author SHA1 Message Date
Harrison Chase
b634489b2e bump version to 141 (#2950) 2023-04-15 12:56:39 -07:00
Harrison Chase
274b25c010 SVM retriever (#2947) (#2949)
Add SVM retriever class, based on
https://github.com/karpathy/randomfun/blob/master/knn_vs_svm.ipynb.

Testing still WIP, but the logic is correct (I have a local
implementation outside of Langchain working).
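For reference, a minimal sketch of how the new retriever might be used (the class name and `from_texts` signature are assumed from this PR, not verified against the final merge):

```python
from langchain.embeddings import OpenAIEmbeddings
from langchain.retrievers import SVMRetriever  # name assumed from this PR

# Fits an SVM over the embedded texts per query instead of plain kNN
retriever = SVMRetriever.from_texts(["foo", "bar", "world"], OpenAIEmbeddings())
docs = retriever.get_relevant_documents("foo")
```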

---------

Co-authored-by: Lance Martin <122662504+PineappleExpress808@users.noreply.github.com>
Co-authored-by: rlm <31treehaus@31s-MacBook-Pro.local>
2023-04-15 12:49:59 -07:00
Harrison Chase
baf350e32b parametrize redis (#2946) 2023-04-15 12:47:36 -07:00
dev2049
36aa7f30e4 Move PythonRepl -> langchain.utilities (#2917) 2023-04-15 10:50:25 -07:00
dev2049
7c73e9df5d Add kwargs to VectorStore.maximum_marginal_relevance (#2921)
Same as similarity_search: allows child classes to add vector-store-specific
args (this was technically already happening in a couple of places, but now
the typing is correct).
2023-04-15 10:49:49 -07:00
Davit Buniatyan
b3a5b51728 [minor] Deep Lake auth improvements in docs, kwargs pass, faster tests (#2927)
Minor cosmetic changes
- Activeloop environment cred authentication in notebooks with
`getpass.getpass` (instead of the CLI, which does not always work); see the
sketch below
- much faster tests with Deep Lake pytest mode on
- Deep Lake kwargs pass
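A sketch of that notebook-friendly auth flow (the exact env var name is an assumption):

```python
import getpass
import os

# Prompt for the Activeloop credential inside the notebook instead of
# relying on the CLI login, which does not always work.
os.environ["ACTIVELOOP_TOKEN"] = getpass.getpass("Activeloop token: ")  # assumed var name
```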

Notes
- I put pytest environment creds inside `vectorstores/conftest.py`, but
feel free to suggest a better location. For context, if I put them in
`test_deeplake.py`, `ruff` doesn't let me set them before importing
deeplake

---------

Co-authored-by: Davit Buniatyan <d@activeloop.ai>
2023-04-15 10:49:16 -07:00
Harrison Chase
c4ae8c1d24 bump ver to 140 (#2895) 2023-04-15 09:23:19 -07:00
Nahin Khan
ad3973a3b8 Fix typo (#2942) 2023-04-15 08:53:25 -07:00
Harrison Chase
cf2789d86d delete antropic chat notebook (#2945) 2023-04-15 08:48:51 -07:00
Hai Nguyen Mau
0aa828b1dc typo fix (#2937)
missing w in link
2023-04-15 08:31:43 -07:00
Ankush Gola
ec59e9d886 Fix ChatAnthropic stop_sequences error (#2919) (#2920)
Note to self: Always run integration tests, even on "that last minute
change you thought would be safe" :)

---------

Co-authored-by: Mike Lambert <mike.lambert@anthropic.com>
2023-04-14 17:22:01 -07:00
Akash NP
13a0ed064b add encoding to avoid UnicodeDecodeError (#2908)
**About**
Specify encoding to avoid UnicodeDecodeError when reading .txt for users
who are following the tutorial.

**Reference**
```
    return codecs.charmap_decode(input,self.errors,decoding_table)[0]
UnicodeDecodeError: 'charmap' codec can't decode byte 0x9d in position 1205: character maps to <undefined>
```

**Environment**
OS: Win 11
Python: 3.8
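From the user's side the fix looks roughly like this (the file name is illustrative):

```python
from langchain.document_loaders import TextLoader

# An explicit encoding avoids UnicodeDecodeError on Windows, where the
# default codec is often cp1252 rather than UTF-8.
loader = TextLoader("state_of_the_union.txt", encoding="utf-8")
docs = loader.load()
```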
2023-04-14 16:36:03 -07:00
Mike Lambert
392f1b3218 Add Anthropic ChatModel to langchain (#2293)
* Adds an Anthropic ChatModel
* Factors out common code in our LLMModel and ChatModel
* Supports streaming llm-tokens to the callbacks on a delta basis (until
a future V2 API does that for us)
* Some fixes
2023-04-14 15:09:07 -07:00
Kwuang Tang
66bef1d7ed Ignore files from .gitignore in Git loader (#2909)
fixes #2905 

extends #2851
2023-04-14 15:02:21 -07:00
Boris Feld
7ee87eb0c8 Comet callback updates (#2889)
I'm working with @DN6 and I made some small fixes and
improvements after playing with the integration.
2023-04-14 13:19:58 -07:00
dev2049
634358db5e Fix OpenAI LLM docstring (#2910) 2023-04-14 11:09:36 -07:00
pranjaldoshi96
30573b2e30 Correct instruction to use openweathermap utility in docstring (#2906)
Co-authored-by: Pranjal Doshi <pranjald@nvidia.com>
2023-04-14 10:46:20 -07:00
Kwuang Tang
a508afa91c Add file filter param to Git loader (#2904)
Allows users to specify what files should be loaded instead of
indiscriminately loading the entire repo.

extends #2851 

NOTE: for reviewers, `hide whitespace` option recommended since I
changed the indentation of an if-block to use `continue` instead so it
looks less like a Christmas tree :)
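For example, loading only Python files might look like this sketch (the parameter name follows the PR title; the repo path is illustrative):

```python
from langchain.document_loaders import GitLoader

loader = GitLoader(
    repo_path="./example_repo",
    branch="main",
    # Only paths matching the filter are loaded
    file_filter=lambda file_path: file_path.endswith(".py"),
)
docs = loader.load()
```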
2023-04-14 10:45:54 -07:00
Ismail Pelaseyed
7e525a3b91 Add link to repo for deploying LangChain to Digitalocean App Platform (#2894)
This PR adds a link to a minimal example of deploying `LangChain` to
`Digitalocean App Platform`.
2023-04-14 08:55:21 -07:00
Peter Stolz
ccacf804a8 Fix format string in pinecone error handling (#2897) 2023-04-14 08:53:02 -07:00
Francis Felici
86189cdcf9 Update load_qa_chain() docstring (#2900)
Seems to be missing `map_rerank` as a potential argument of
`chain_type`
2023-04-14 08:51:30 -07:00
Harrison Chase
8fef69296d nits (#2873) 2023-04-14 07:55:12 -07:00
Harrison Chase
0a38bbc750 updates to vectorstore memory (#2875) 2023-04-14 07:54:57 -07:00
Ikko Eltociear Ashimine
203c0eb2ae docs: update getting_started.ipynb (#2883)
HuggingFace -> Hugging Face
2023-04-14 07:40:26 -07:00
ecneladis
1a44b71ddf Fix Baby AGI notebooks (#2882)
- fix broken notebook cell in
ae485b623d
- Python Black formatting
2023-04-14 07:40:04 -07:00
Nicolas
3c7204d604 docs: Quick fix to Mendable Search (#2876)
Fixed a small issue with the icon UI when used in Safari.
2023-04-13 23:15:57 -07:00
Harrison Chase
1e9378d0a8 Harrison/weaviate fixes (#2872)
Co-authored-by: cs0lar <cristiano.solarino@gmail.com>
Co-authored-by: cs0lar <cristiano.solarino@brightminded.com>
2023-04-13 22:37:34 -07:00
Harrison Chase
07d7096de6 Harrison/playwright (#2871)
Co-authored-by: Manuel Saelices <msaelices@gmail.com>
2023-04-13 22:15:03 -07:00
Jon Luo
5565f56273 Use SQL dialect-specific prompts for SQLDatabaseChain (#2748)
Mentioned the idea here initially:
https://github.com/hwchase17/langchain/pull/2106#issuecomment-1487509106

Since there have been dialect-specific issues, we should use
dialect-specific prompts. This way, each prompt can be separately
modified to best suit each dialect as needed. This adds a prompt for
each dialect supported in sqlalchemy (mssql, mysql, mariadb, postgres,
oracle, sqlite). For this initial implementation, the only difference
between the prompts is the instruction for the clause used to limit the
number of rows queried, and the instruction for wrapping column names
in each dialect's identifier quote character.
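A hypothetical sketch of the selection mechanism (names and prompt text are illustrative, not the PR's actual constants):

```python
# One prompt fragment per SQLAlchemy dialect, keyed by the dialect name
# the connected database reports (e.g. db.dialect == "sqlite").
DIALECT_PROMPT_SNIPPETS = {
    "sqlite": 'Limit results with "LIMIT n"; quote names with double quotes.',
    "mssql": 'Limit results with "SELECT TOP n"; quote names with [brackets].',
}

def prompt_snippet_for(dialect: str) -> str:
    # Fall back to a generic instruction for dialects without a custom prompt
    return DIALECT_PROMPT_SNIPPETS.get(dialect, 'Limit results with "LIMIT n".')
```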
2023-04-13 22:10:49 -07:00
drod
9907cb0485 Refactor similarity_search function in elastic_vector_search.py (#2761)
Optimization: limit search results when k < 10.
Fix issue when k > 10: Elasticsearch would return only 10 docs.


[default-search-result](https://www.elastic.co/guide/en/elasticsearch/reference/current/paginate-search-results.html)
By default, searches return the top 10 matching hits

Add size parameter to the search request to limit the number of returned
results from Elasticsearch. Remove slicing of the hits list, since the
response will already contain the desired number of results.
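In elasticsearch-py terms the change amounts to roughly this (index name and query are placeholders):

```python
from elasticsearch import Elasticsearch

client = Elasticsearch("http://localhost:9200")
k = 20  # requested number of documents

# Without an explicit size, Elasticsearch caps results at 10 hits, so
# k > 10 silently returned fewer documents than requested.
response = client.search(index="my-index", body={"query": {"match_all": {}}}, size=k)
hits = response["hits"]["hits"]  # already at most k hits; no client-side slicing
```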
2023-04-13 22:09:00 -07:00
rafael
1cc7ea333c chat_models.openai: Set tenacity timeout to openai's recommendation (#2768)
[OpenAI's
cookbook](https://github.com/openai/openai-cookbook/blob/main/examples/How_to_handle_rate_limits.ipynb)
suggests a tenacity backoff between 1 and 60 seconds. Currently
langchain's backoff is between 4 and 10 seconds, which causes frequent
timeout errors on my end.

This PR changes the timeout to the suggested values.
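A sketch of the recommended backoff with tenacity (the wrapped call is a stand-in for the real OpenAI request):

```python
from tenacity import retry, stop_after_attempt, wait_exponential

def make_openai_request(**kwargs):
    """Placeholder for the actual OpenAI API call."""
    raise TimeoutError("simulated rate-limit error")

@retry(
    # Exponential backoff bounded to OpenAI's recommended 1-60 second window
    wait=wait_exponential(multiplier=1, min=1, max=60),
    stop=stop_after_attempt(6),
)
def completion_with_retry(**kwargs):
    return make_openai_request(**kwargs)
```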
2023-04-13 22:08:46 -07:00
Harrison Chase
705596b46a Harrison/fix create sql agent (#2870)
Co-authored-by: Timothé Pearce <timothe.pearce@gmail.com>
2023-04-13 22:07:58 -07:00
Harrison Chase
8a98e5b50b Harrison/index name (#2869)
Co-authored-by: Mesum Raza Hemani <mes.javacca@gmail.com>
2023-04-13 22:01:32 -07:00
Andrey Vasnetsov
dcb17503f2 Update qdrant.py (#2750)
At the moment of upload we should already know the format of the data,
so we can skip the costly pydantic validation.
2023-04-13 21:57:05 -07:00
ecneladis
74abeb8c53 Update output in Git notebook (#2868)
Supplemental to https://github.com/hwchase17/langchain/pull/2851.
Updates one notebook cell that I forgot to commit before.
2023-04-13 21:56:17 -07:00
Nicolas
0226b375d9 docs: Mendable Search integration (#2803)
Mendable Search Integration is finally here!

Hey y'all,

After various requests for Mendable in the Python docs, we decided to get
our hands dirty and try to implement it.
Here is a version where we implement our **floating button**, which sits
on the bottom right of the screen and, once triggered (via press or CMD+K),
works the same as the js langchain docs.

Super excited about this and hopefully the community will be too.
@hwchase17 will send you the admin details via dm etc. The anon_key is
fine to be public.

Let me know if you need any further customization. I added the langchain
logo to it.
2023-04-13 21:52:25 -07:00
sergerdn
04c458a270 feat: improve pinecone tests (#2806)
Improve the integration tests for Pinecone by adding an `.env.example`
file for local testing. Additionally, add some dev dependencies
specifically for integration tests.

This change also helps me understand how Pinecone deals with certain
things, see related issues
https://github.com/hwchase17/langchain/issues/2484
https://github.com/hwchase17/langchain/issues/2816
2023-04-13 21:49:31 -07:00
ecneladis
016738e676 Add GitLoader (#2851) 2023-04-13 21:39:20 -07:00
lizelive
8cfec2c5fe torch 2 support (#2865)
LangChain seems to work with torch 2
2023-04-13 21:38:49 -07:00
vowelparrot
bf0887c486 Add Slack Directory Loader (#2841)
Fixes linting issue from #2835 

Adds a loader for Slack Exports which can be a very valuable source of
knowledge to use for internal QA bots and other use cases.

```py
# Export data from your Slack Workspace first.
from langchain.document_loaders import SlackDirectoryLoader

SLACK_WORKSPACE_URL = "https://awesome.slack.com"

loader = SlackDirectoryLoader("Slack_Exports", SLACK_WORKSPACE_URL)
docs = loader.load()
```
2023-04-13 21:31:59 -07:00
Harrison Chase
ed2ef5cbe4 Harrison/rwkv utf8 (#2867)
Co-authored-by: Akihiro <ueyama0105@gmail.com>
2023-04-13 21:31:18 -07:00
Adam McCabe
6be5d7c612 Update reduce_openapi_spec for PATCH and DELETE (#2861)
My recent pull request (#2729) neglected to update the
`reduce_openapi_spec` in spec.py to also accommodate PATCH and DELETE
added to planner.py and prompt_planner.py.
2023-04-13 20:27:40 -07:00
Benjamin Tan Wei Hao
c26a259ba6 Fix tiny typo (#2863) 2023-04-13 20:26:26 -07:00
Jon Luo
f3180f05f9 Update sql chain notebook to clarify use of SQLAlchemy for connections (#2850)
Have seen questions about whether or not the `SQLDatabaseChain` supports
more than just sqlite, which was unclear in the docs, so tried to
clarify that and how to connect to other dialects.
2023-04-13 11:46:59 -07:00
leo-gan
ecc1a0c051 added code-analysis-deeplake.ipynb (#2844)
This notebook is heavily copied from the
`twitter-the-algorithm-analysis-deeplake.ipynb`
2023-04-13 11:29:59 -07:00
Tim Asp
70ffe470aa Add easy print method to openai callback (#2848)
Found myself constantly copying the snippet outputting all the callback
tracking details, so adding a simple way to output the full context.
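Presumably usage ends up looking something like this sketch:

```python
from langchain.callbacks import get_openai_callback
from langchain.llms import OpenAI

llm = OpenAI(temperature=0)
with get_openai_callback() as cb:
    llm("Tell me a joke")

# A single print now surfaces the tracked details (token counts, cost, ...)
print(cb)
```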
2023-04-13 11:28:42 -07:00
Tim Asp
be4fb24b32 OpenAI LLM: update modelname_to_contextsize with new models (#2843)
Token counts pulled from https://openai.com/pricing
2023-04-13 11:13:34 -07:00
vowelparrot
82d1d5f24e Fix grammar in Vector Memory Docs (#2847) 2023-04-13 11:00:09 -07:00
Tim Asp
53dc157145 [Docs] minor fixes to loaders links and rst warnings (#2846)
The doc loaders index was picking up a bunch of subheadings because I
mistakenly made the MD titles H1s. Fixed that.

Also fixed the easy minor warnings from docs_build.
2023-04-13 10:54:40 -07:00
Harrison Chase
1609950597 Harrison/retriever memory (#2804)
Co-authored-by: vowelparrot <130414180+vowelparrot@users.noreply.github.com>
2023-04-13 10:03:43 -07:00
Rounak Datta
7688bf9182 WhatsApp document loader - update regex (#2776)
I was testing out the WhatsApp Document loader, and noticed that
sometimes the date is of the following format (notice the additional
underscore):
```
3/24/23, 1:54_PM - +91 99999 99999 joined using this group's invite link
3/24/23, 6:29_PM - +91 99999 99999: When are we starting then?
```

Weirdly, the underscore is visible in Vim, but not in editors like
VSCode. I presume it is some unusual character/line terminator.
Nevertheless, I think handling this edge case will make the document
loader more robust.
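An illustrative sketch of a more tolerant pattern (not the loader's exact regex):

```python
import re

# Accept either a space or the odd underscore-like separator before AM/PM
message_re = re.compile(
    r"(\d{1,2}/\d{1,2}/\d{2,4}), (\d{1,2}:\d{2}[ _](?:AM|PM)) - (.*?): (.*)"
)

m = message_re.match("3/24/23, 6:29_PM - +91 99999 99999: When are we starting then?")
assert m is not None
```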
2023-04-13 09:48:32 -07:00
vowelparrot
2db9b7a45d Revert "Add Slack Directory Loader (#2835)" (#2839)
This reverts commit a6f767ae7a.

To fix the linting error.
2023-04-13 09:42:54 -07:00
KullTC
802363eb6a Remove print statement from test (#2809)
Remove unnecessary print statement.
2023-04-13 09:31:48 -07:00
Azam Iftikhar
2a89dc8c1c Fixing factually incorrect example (#2810)
### https://github.com/hwchase17/langchain/issues/2802
It appears that Google's Flan model may not perform as well as other
models, so I used a simple example to get a factually correct answer.
2023-04-13 08:42:39 -07:00
vowelparrot
a6f767ae7a Add Slack Directory Loader (#2835)
Adds a loader for Slack Exports which can be a very valuable source of
    knowledge to use for internal QA bots and other use cases.

    ```py
    # Export data from your Slack Workspace first.
    from langchain.document_loaders import SlackDirectoryLoader

    SLACK_WORKSPACE_URL = "https://awesome.slack.com"

    loader = SlackDirectoryLoader("Slack_Exports", SLACK_WORKSPACE_URL)
    docs = loader.load()
```

---------

Co-authored-by: Mikhail Dubov <mikhail@chattermill.io>
2023-04-13 08:39:07 -07:00
st01cs
4f231b46ee Add openai.api_base to support openapi proxy (#2823)
I need to access the OpenAI API through a proxy, so this adds
openai.api_base to support that.
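With the openai-python client of this era, that is a one-liner (the proxy URL is a placeholder):

```python
import openai

# Route all OpenAI API calls through a proxy endpoint
openai.api_base = "http://my-proxy.example.com/v1"
```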

Co-authored-by: bijia <bijia1@xiaomi.com>
2023-04-13 08:35:36 -07:00
Harrison Chase
414dc803b6 bump version to 139 (#2834) 2023-04-13 08:34:08 -07:00
Preetesh Jain
61858c5a08 Fix headings in docs (ClearML and Comet) (#2808)
This PR fixes the document structure in the
[Ecosystem](https://python.langchain.com/en/latest/ecosystem.html) page.
Also adds a fix for the heading on the
[Comet](https://python.langchain.com/en/latest/ecosystem/comet_tracking.html)
page for more consistency with other ecosystem tools.

## Screenshot

<img width="878" alt="image"
src="https://user-images.githubusercontent.com/6207830/231674921-9bf25376-cf14-4dba-be3c-08e0abda6154.png">

<img width="869" alt="image"
src="https://user-images.githubusercontent.com/6207830/231675105-d8e42df4-2d01-435b-9e09-3371522fd2ce.png">
2023-04-13 08:24:16 -07:00
Harrison Chase
9a96691803 cr 2023-04-13 08:23:33 -07:00
了空
324e9c83d5 Add BiliBiliLoader to langchain.document_loaders.__init__.py (#2826) 2023-04-13 06:47:27 -07:00
Nuhman Pk
ed03e965de Update README.md (#2805)
Added total download in a month (https://pepy.tech/project/langchain)
2023-04-12 22:02:06 -07:00
KullTC
64596b23b9 Return output of PythonAstREPLTool when falling back to exec() (#2780)
When the code run by the PythonAstREPLTool contains multiple statements,
it falls back to exec() instead of using eval(). With this change, it
will also return the output of the code in the same way the
PythonREPLTool does.
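A minimal sketch of the eval/exec split with output capture (not the tool's actual implementation):

```python
from contextlib import redirect_stdout
from io import StringIO

def run_snippet(code: str) -> str:
    try:
        # A single expression: evaluate it and return the value
        return repr(eval(compile(code, "<input>", "eval")))
    except SyntaxError:
        # Multiple statements: exec() them and capture anything printed
        buffer = StringIO()
        with redirect_stdout(buffer):
            exec(code)
        return buffer.getvalue()

print(run_snippet("1 + 1"))            # '2'
print(run_snippet("x = 2\nprint(x)"))  # '2\n'
```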
2023-04-12 21:22:46 -07:00
Harrison Chase
1bb0706955 Harrison/comet ml (#2799)
Co-authored-by: Dhruv Nair <dhruv.nair@gmail.com>
Co-authored-by: Boris Feld <lothiraldan@gmail.com>
2023-04-12 21:21:51 -07:00
Harrison Chase
b2bc5ef56a agent refactor (#2801) 2023-04-12 21:21:41 -07:00
Zach Jones
abfca72c0b Add max_execution_time to openapi, pandas, and sql creators (#2779)
In #2399 we added the ability to set `max_execution_time` when creating
an AgentExecutor. This PR adds the `max_execution_time` argument to the
built-in pandas, sql, and openapi agents.

Co-authored-by: Zachary Jones <zjones@zetaglobal.com>
2023-04-12 17:09:42 -07:00
Matt Robinson
f0be3b0689 feat: add support for non-html in UnstructuredURLLoader (#2793)
### Summary

Adds support for processing non-HTML document types in the URL loader.
For example, the URL loader can now process PDF or markdown files
hosted at a URL.

### Testing

```python
from langchain.document_loaders import UnstructuredURLLoader

urls = ["https://www.understandingwar.org/sites/default/files/Russian%20Offensive%20Campaign%20Assessment%2C%20April%2011%2C%202023.pdf"]

loader = UnstructuredURLLoader(urls=urls, strategy="fast")
docs = loader.load()
print(docs[0].page_content[:1000])
```
2023-04-12 17:06:28 -07:00
Tim Connors
e081c62aac Fixed k=0 bug on ConversationBufferWindowMemory (#2796)
Updated the "load_memory_variables" function of the
ConversationBufferWindowMemory to support a window size of 0 (k=0).
Previous behavior would return the full memory instead of an empty
array.
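The root cause is a Python slicing quirk; the guard presumably looks like this sketch:

```python
messages = ["hi", "hello", "how are you?", "fine"]
k = 0

# list[-0:] is the whole list, so slicing alone returns the full memory
assert messages[-k * 2:] == messages

# Checking k first yields the expected empty window
window = messages[-k * 2:] if k > 0 else []
assert window == []
```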
2023-04-12 17:05:54 -07:00
dev2049
a094b7f807 Improve eval chain prompt (#2798)
Eval chain is currently very sensitive to differences in phrasing,
punctuation, and tangential information. This prompt has worked better
for me on my examples.

More general q: Do we have any framework for evaluating default prompt
changes? Could maybe start doing some regression testing?
2023-04-12 17:05:20 -07:00
Kah Keng Tay
1c7fb31bba Weaviate attributes and error handling (#2800) 2023-04-12 17:04:42 -07:00
dev2049
0e763677e4 Fix typo in qa eval chain prompt (#2797) 2023-04-12 14:17:25 -07:00
Harrison Chase
e49f1e628c Harrison/gpt cache (#2744)
Co-authored-by: SimFG <bang.fu@zilliz.com>
2023-04-12 14:16:58 -07:00
Harrison Chase
425c437cd3 cr 2023-04-12 13:46:58 -07:00
Harrison Chase
a2d729e537 cr 2023-04-12 13:44:21 -07:00
Harrison Chase
7adbc4fbb4 agent memory (#2792) 2023-04-12 12:51:15 -07:00
Nuno Campos
1bea9ea4be Fix async task being destroyed before cancelled (#2787) 2023-04-12 12:38:38 -07:00
Harrison Chase
819d72614a version 138 (#2782) 2023-04-12 11:10:47 -07:00
wangml999
fa0c9390c2 Update custom_agent.ipynb (#2767)
Fixed an issue where the agent was not taking the user's question as input.
2023-04-12 09:13:46 -07:00
Joshua Snyder
59d054308c Add type inference for output parsers (#2769)
Currently, the output type of a number of OutputParser's `parse` methods
is `Any` when it can in fact be inferred.

This PR makes BaseOutputParser use a generic type and fixes the output
types of the following parsers:
- `PydanticOutputParser`
- `OutputFixingParser`
- `RetryOutputParser`
- `RetryWithErrorOutputParser`

The output of the `StructuredOutputParser` is corrected from `BaseModel`
to `Any` since there are no type guarantees provided by the parser.

Fixes issue #2715
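Roughly, the generic base looks like this sketch (the concrete parser here is illustrative):

```python
from typing import Generic, List, TypeVar

T = TypeVar("T")

class BaseOutputParser(Generic[T]):
    def parse(self, text: str) -> T:  # subclasses narrow T
        raise NotImplementedError

class CommaSeparatedListParser(BaseOutputParser[List[str]]):
    def parse(self, text: str) -> List[str]:
        return [part.strip() for part in text.split(",")]

items: List[str] = CommaSeparatedListParser().parse("a, b, c")  # inferred, not Any
```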
2023-04-12 09:12:20 -07:00
Nuhman Pk
789cc314c5 Typo (#2747) 2023-04-12 09:06:30 -07:00
Harrison Chase
b92a89e29f cr 2023-04-11 23:52:14 -07:00
vowelparrot
94a92abf24 Add Retrieval Example for AI Plugins (#2737)
This PR proposes
- An NLAToolkit method to instantiate from an AI Plugin URL
- A notebook that shows how to use that alongside an example of using a
Retriever object to lookup specs and route queries to them on the fly

---------

Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
2023-04-11 23:22:14 -07:00
Nuhman Pk
b5bbe601fb Update chatgpt_plugins.ipynb (#2745)
Changed deprecated requests to requests_all in plugins example
2023-04-11 22:45:31 -07:00
Harrison Chase
b38a6ea7df Harrison/apply llm flag (#2743)
Co-authored-by: Nick Gibb <gibbnick@gmail.com>
Co-authored-by: Nick Gibb <nick.gibb@bluedot.global>
2023-04-11 22:02:37 -07:00
vr140
dd59193757 Remove unnecessary method from Qdrant vectorstore and clean up docstrings (#2700)
**Problem:**

The `from_documents` method in Qdrant vectorstore is unnecessary because
it does not change any default behavior from the abstract base class
method of `from_documents` (contrast this with the method in Chroma
which makes a change from default and turns `embeddings` into an
Optional parameter).

Also, the docstrings need some cleanup.

**Solution:**

Remove unnecessary method and improve docstrings.

---------

Co-authored-by: Vijay Rajaram <vrajaram3@gatech.edu>
2023-04-11 21:34:22 -07:00
Matthew Plachter
933dfac583 Add Zapier NLA OAuth access_token to be used (#2726)
This change allows the user to initialize the ZapierNLAWrapper with a
valid Zapier NLA OAuth Access_Token, which would be used to make
requests back to the Zapier NLA API.

When a `zapier_nla_oauth_access_token` is passed to the ZapierNLAWrapper,
the `ZAPIER_NLA_API_KEY` environment variable is no longer required to be
set; having it set will not affect the behavior, as the
`zapier_nla_oauth_access_token` takes precedence over the
`ZAPIER_NLA_API_KEY`.
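A sketch of initializing with the OAuth token (the token value and import path are assumptions):

```python
from langchain.utilities.zapier import ZapierNLAWrapper

# No ZAPIER_NLA_API_KEY env var needed when an OAuth token is supplied
zapier = ZapierNLAWrapper(zapier_nla_oauth_access_token="<your-oauth-access-token>")
```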
2023-04-11 21:32:54 -07:00
Harrison Chase
507cee5ee5 Harrison/pinecone hybrid update (#2742)
Co-authored-by: acatav <39461369+acatav@users.noreply.github.com>
Co-authored-by: Amnon Catav <catav.amnon1@gmail.com>
2023-04-11 21:32:17 -07:00
Johnny Lee
744c25cd0a Updating YoutubeLoader.from_youtube_channel name and doc to reflect actual usage (#2734)
The function actually extracts the video_id from a URL, not a channel.

The docs still reflect the old function name
`from_youtube_url`. Resolves #1962


https://python.langchain.com/en/latest/modules/indexes/document_loaders/examples/youtube.html
2023-04-11 21:12:58 -07:00
Johnny Lee
0ab364404e add continue to fix 'continue_on_failure' parameter for URL doc loader (#2735)
Currently, the function still fails if `continue_on_failure` is set to
True, because `elements` is not set.
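A sketch of the fixed control flow (`fetch_elements` is a hypothetical stand-in for the loader's partitioning call):

```python
import logging

logger = logging.getLogger(__name__)

def load_urls(urls, continue_on_failure=True):
    docs = []
    for url in urls:
        try:
            elements = fetch_elements(url)  # hypothetical partition/fetch helper
        except Exception as e:
            if continue_on_failure:
                logger.error("Error fetching %s: %s", url, e)
                continue  # previously the loop fell through with `elements` unbound
            raise
        docs.extend(elements)
    return docs
```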

---------

Co-authored-by: leecjohnny <johnny-lee1255@users.noreply.github.com>
2023-04-11 21:12:39 -07:00
sergerdn
4bdcedab54 fix: some imports for integration tests (#2612)
Add more missing imports for integration tests. Bump `pytest` to the
current latest version.
Fix `tests/integration_tests/vectorstores/test_elasticsearch.py` to
update its cassette (easy fix).

Related PR: https://github.com/hwchase17/langchain/pull/2560
2023-04-11 20:45:36 -07:00
Ankush Gola
c1521ddbdb Add workaround for not having async vector store methods (#2733)
This allows us to use the async API for the Retrieval chains, though it is not guaranteed to be thread safe.
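Plausibly the workaround delegates the sync call to an executor, along these lines (a sketch, not the merged code):

```python
import asyncio
from functools import partial

async def aget_relevant_documents(retriever, query: str):
    # Run the blocking vector store call in the default thread pool; usable
    # from async chains, but not guaranteed thread safe.
    loop = asyncio.get_event_loop()
    return await loop.run_in_executor(
        None, partial(retriever.get_relevant_documents, query)
    )
```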
2023-04-11 18:49:08 -07:00
vowelparrot
0806951c07 Update VectorStore Class Method Typing (#2731)
Avoid using placeholder methods that only perform a `cast()`
operation, because the typing would otherwise be inferred to be the
parent `VectorStore` class. This is unnecessary with TypeVars.
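In sketch form, a bound TypeVar lets the classmethod's return type follow the subclass:

```python
from typing import Any, List, Type, TypeVar

VST = TypeVar("VST", bound="VectorStore")

class VectorStore:
    @classmethod
    def from_texts(cls: Type[VST], texts: List[str], **kwargs: Any) -> VST:
        # The return type is the concrete subclass, so no cast()-only
        # placeholder overrides are needed.
        return cls()
```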
2023-04-11 14:14:49 -07:00
Adam McCabe
446c3d586c Add PATCH and DELETE to OpenAPI Agent (#2729)
This PR proposes an update to the OpenAPI Planner and Planner Prompts to
make Patch and Delete available to the planner and executor. I followed
the same patterns as for GET and POST, and made some updates to the
examples available to the Planner and Orchestrator.

Of note, I tried to write prompts for DELETE such that the model will
only execute that job if the User specifically asks for a 'Delete' (see
the Prompt_planner.py examples to see specificity), or if the User had
previously authorized the Delete in the Conversation memory. Although
PATCH also modifies existing data, I considered it lower risk and so did
not try to enforce the same restrictions on the Planner.
2023-04-11 13:26:04 -07:00
vinoyang
8073bc849f Minor: Remove duplicated word in error message (#2706)
Removed the duplicated word "it" from the error message.
From:
`Please it install it with xxx`
To:
`Please install it with xxx`.
2023-04-11 13:10:33 -07:00
134ARG
1e60e6e15b Fix the unset argument in calling llama model (#2714)
When using llama.cpp together with an agent like
zero-shot-react-description, a missing branch caused the parameter
`stop` to be left empty, resulting in unexpected output format from the
model.

This patch fixes that issue.
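The normalization presumably looks something like this sketch:

```python
def resolve_stop(default_stop, stop):
    if default_stop and stop is not None:
        raise ValueError("`stop` found in both the input and default params.")
    if default_stop:
        return default_stop
    # The previously missing branch: fall back to an empty list, not None
    return stop if stop is not None else []
```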
2023-04-11 11:02:39 -07:00
Joshua Snyder
f435f2267c Use tiktoken for Python 3.8 (#2709)
Fixes issue #2677

`tiktoken` is supported for Python 3.8, so there is no need to use the
fallback GPT-2 tokenizer.
2023-04-11 11:02:28 -07:00
Kei Kamikawa
186ca9d3e4 fixed aiohttp.client_exceptions.ClientConnectionError: Connection closed (#2718)
I fixed an issue where an error would always occur when making a request
using the `TextRequestsWrapper` with the async API.

This was caused by leaving the scope of the context manager, which
breaks the connection while the response body is being read.

The correct usage is as described in the [official
tutorial](https://docs.aiohttp.org/en/stable/client_quickstart.html#make-a-request),
where the text method must also be awaited within the context scope.
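The corrected pattern keeps the body read inside both context managers:

```python
import aiohttp

async def aget(url: str) -> str:
    async with aiohttp.ClientSession() as session:
        async with session.get(url) as response:
            # text() must be awaited while the response is still in scope;
            # reading after the context exits breaks the connection mid-read.
            return await response.text()
```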

<details>

<summary>Stacktrace</summary>

```
  File "/home/vscode/.cache/pypoetry/virtualenvs/codehex-workspace-xS3fZVNL-py3.11/lib/python3.11/site-packages/langchain/tools/base.py", line 116, in arun
    raise e
  File "/home/vscode/.cache/pypoetry/virtualenvs/codehex-workspace-xS3fZVNL-py3.11/lib/python3.11/site-packages/langchain/tools/base.py", line 110, in arun
    observation = await self._arun(tool_input)
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/vscode/.cache/pypoetry/virtualenvs/codehex-workspace-xS3fZVNL-py3.11/lib/python3.11/site-packages/langchain/agents/tools.py", line 22, in _arun
    return await self.coroutine(tool_input)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/vscode/.cache/pypoetry/virtualenvs/codehex-workspace-xS3fZVNL-py3.11/lib/python3.11/site-packages/langchain/chains/base.py", line 234, in arun
    return (await self.acall(args[0]))[self.output_keys[0]]
            ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/vscode/.cache/pypoetry/virtualenvs/codehex-workspace-xS3fZVNL-py3.11/lib/python3.11/site-packages/langchain/chains/base.py", line 154, in acall
    raise e
  File "/home/vscode/.cache/pypoetry/virtualenvs/codehex-workspace-xS3fZVNL-py3.11/lib/python3.11/site-packages/langchain/chains/base.py", line 148, in acall
    outputs = await self._acall(inputs)
              ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/workspace/src/tools/example.py", line 153, in _acall
    api_response = await self.requests_wrapper.aget("http://example.com")
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/vscode/.cache/pypoetry/virtualenvs/codehex-workspace-xS3fZVNL-py3.11/lib/python3.11/site-packages/langchain/requests.py", line 130, in aget
    return await response.text()
           ^^^^^^^^^^^^^^^^^^^^^
  File "/home/vscode/.cache/pypoetry/virtualenvs/codehex-workspace-xS3fZVNL-py3.11/lib/python3.11/site-packages/aiohttp/client_reqrep.py", line 1081, in text
    await self.read()
  File "/home/vscode/.cache/pypoetry/virtualenvs/codehex-workspace-xS3fZVNL-py3.11/lib/python3.11/site-packages/aiohttp/client_reqrep.py", line 1037, in read
    self._body = await self.content.read()
                 ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/vscode/.cache/pypoetry/virtualenvs/codehex-workspace-xS3fZVNL-py3.11/lib/python3.11/site-packages/aiohttp/streams.py", line 349, in read
  raise self._exception
aiohttp.client_exceptions.ClientConnectionError: Connection closed
```

</details>
2023-04-11 10:52:55 -07:00
Dogan Can Bakir
3623bdb31b Make the OpenAPI agent's verbose print optional (#2666) 2023-04-11 10:42:39 -07:00
vowelparrot
709f26b69e Added bilibili loader (#2673) (#2724)
I've added a bilibili loader; bilibili is a very active video site in
China, and I think we need this loader.

Example:
```python
from langchain.document_loaders.bilibili import BiliBiliLoader

loader = BiliBiliLoader(
       ["https://www.bilibili.com/video/BV1xt411o7Xu/",
       "https://www.bilibili.com/video/av330407025/"]
)
docs = loader.load()
```

Co-authored-by: 了空 <568250549@qq.com>
2023-04-11 10:40:32 -07:00
David Wu
d42deff402 fixed typo (#2720)
changed "to" to "too" in the memory notebook
2023-04-11 09:53:38 -07:00
David Wu
263ce40844 added a missing word (typo) (#2719)
Changed from "You may often to" to "You may often have to" to fix the
sentence.
2023-04-11 09:09:28 -07:00
Harrison Chase
66786b0f0f cr 2023-04-11 08:16:06 -07:00
Harrison Chase
948b14b52a agents docs and version bump (#2717) 2023-04-11 08:08:43 -07:00
Abhik Singla
955bd2e1db Fixed Ast Python Repl for Chatgpt multiline commands (#2406)
Resolves issue https://github.com/hwchase17/langchain/issues/2252

---------

Co-authored-by: Abhik Singla <abhiksingla@microsoft.com>
2023-04-10 21:25:03 -07:00
Harrison Chase
1271c00ff0 Harrison/openapi planner (#2692)
Co-authored-by: Adam McCabe <adam.r.mccabe@gmail.com>
2023-04-10 21:22:42 -07:00
Harrison Chase
e0a13e9355 Harrison/postgres (#2691)
Co-authored-by: Ankit Jain <ankneo@users.noreply.github.com>
2023-04-10 21:15:42 -07:00
Guohao Li
bb5118f4c9 Add notebook example for camel role playing (#2689)
This PR adds a LangChain implementation of CAMEL role-playing example:
https://github.com/lightaime/camel.

I am sorry that I am not that familiar with LangChain, so I only
implemented it in a naive way. There may be a better way to implement it.
2023-04-10 21:12:45 -07:00
Harrison Chase
d3f779d61d baby agi agent (#2648)
Co-authored-by: William FH <13333726+hinthornw@users.noreply.github.com>
2023-04-10 21:03:30 -07:00
Naveen Tatikonda
4364d3316e Add custom vector fields and text fields for OpenSearch (#2652)
**Description**
Add custom vector field name and text field name while indexing and
querying for OpenSearch

**Issues**
https://github.com/hwchase17/langchain/issues/2500

Signed-off-by: Naveen Tatikonda <navtat@amazon.com>
2023-04-10 21:02:02 -07:00
Pavel Shibanov
023de9a70b Add OpenAIEmbeddings special token params for tiktoken (#2682)
#2681 

Original type hints
```python
allowed_special: Union[Literal["all"], AbstractSet[str]] = set(),  # noqa: B006
disallowed_special: Union[Literal["all"], Collection[str]] = "all",
```
from

46287bfa49/tiktoken/core.py (L79-L80)
are not compatible with pydantic

<img width="718" alt="image"
src="https://user-images.githubusercontent.com/5096640/230993236-c744940e-85fb-4baa-b9da-8b00fb60a2a8.png">

I think we could use
```python
allowed_special: Union[Literal["all"], Set[str]] = set()
disallowed_special: Union[Literal["all"], Set[str], Tuple[()]] = "all"
```

Please let me know if you would like to implement it differently.
2023-04-10 21:00:55 -07:00
Nikita Zavgorodnii
1c979e320d docs: update tokenizer notice in llms/getting_started (#2641)
A tiny update to the docs, spotted here:
https://github.com/hwchase17/langchain/issues/2439
2023-04-10 20:55:45 -07:00
Yasin Tatar
9d20fd5135 add: conda installation instructions (#2678)
Hi, 

just wanted to mention that I added `langchain` to
[conda-forge](https://github.com/conda-forge/langchain-feedstock), so
that it can be installed with `conda`/`mamba` etc.
This makes it available to some corporate users with custom
conda-servers and people who like to manage their python envs with
conda.
2023-04-10 20:54:13 -07:00
vr140
28bef6f87d Clean up OpenAI Embeddings to fix method name and comments (#2687)
**Problem:**

OpenAI Embeddings has a few minor issues: the method name and comment for
_completion_with_retry seem to be a copy-paste error, and a few comments
around the usage of embedding_ctx_length seem to be incorrect.

**Solution:**

Clean up issues.

---------

Co-authored-by: Vijay Rajaram <vrajaram3@gatech.edu>
2023-04-10 20:53:56 -07:00
Harrison Chase
ad3c5dd186 Harrison/databerry (#2688)
Co-authored-by: Georges Petrov <georgesm.petrov@gmail.com>
2023-04-10 18:49:47 -07:00
Filip Haltmayer
b286d0e63f Adding milvus/zilliz into docs (#2686)
Adding Milvus and Zilliz to integrations.md and creating an ecosystems
doc for Zilliz.

Signed-off-by: Filip Haltmayer <filip.haltmayer@zilliz.com>
2023-04-10 18:08:41 -07:00
Sean Sheng
90d5328eda docs: Update deployments.md to include a BentoML example (#2661)
Add a new deployment example with BentoML, see more
https://github.com/ssheng/BentoChain.
2023-04-10 14:57:32 -07:00
Tommertom
bd9f095ed2 Doc - Update google_search.ipynb - more explicit reference to places where to create API keys (#2670)
Took me a bit to find the proper places to get the API keys. The link
provided earlier to set up search is still good, but why not provide
direct links to the Google Cloud tools that give you the ability to
create keys?
2023-04-10 12:36:52 -07:00
Ankush Gola
e23a596a18 SqlDatabaseToolkit should have custom llm for QueryCheckerTool (#2655) (#2676)

---------

Co-authored-by: Rushabh Agarwal <26388764+rushout09@users.noreply.github.com>
2023-04-10 11:43:24 -07:00
Ankush Gola
8d3b059332 Add docs for callbacks (#2643)
Basically copy what's in the ts docs:
https://js.langchain.com/docs/production/callbacks


Discovered a bug wrt not awaiting callbacks in `LLMMathChain`, so fixed
that.
2023-04-10 10:23:11 -07:00
Dmitri Melikyan
1931d4495e Update Graphsignal ecosystem page (#2662)
Added/updated information due to the new automatic data recording feature.
2023-04-10 08:00:26 -07:00
Harrison Chase
e63f9a846b Harrison/docs agents (#2647) 2023-04-09 22:34:34 -07:00
Ankush Gola
b82cbd1be0 Use run and arun in place of combine_docs and acombine_docs (#2635)
`combine_docs` does not go through the standard chain call path, which
means that chain callbacks won't be triggered, meaning QA chains won't
be traced properly; this fixes that.

Also fix several errors in the chat_vector_db notebook
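After the change, QA chains are driven through the standard entry point, roughly like this sketch:

```python
from langchain.chains.question_answering import load_qa_chain
from langchain.llms import OpenAI
from langchain.schema import Document

docs = [Document(page_content="LangChain composes LLM applications.")]
chain = load_qa_chain(OpenAI(temperature=0), chain_type="stuff")

# run() goes through the standard chain call path, so callbacks fire and the
# chain is traced; calling combine_docs() directly bypassed that path.
answer = chain.run(input_documents=docs, question="What is LangChain?")
```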
2023-04-09 18:47:59 -07:00
Chetanya Rastogi
50c511d75f Add new loader to load pdf as html content (#2607)
Adds a new pdf loader using the existing dependency on PDFMiner. 

The new loader can be helpful for chunking texts semantically into
sections, as the output html content can be parsed via `BeautifulSoup` to
get more structured and rich information about font size, page numbers,
pdf headers/footers, etc., which may not be available otherwise with
other pdf loaders.
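Presumably usage looks something like this (the loader class name is assumed from this PR; verify against the docs):

```python
from bs4 import BeautifulSoup
from langchain.document_loaders import PDFMinerPDFasHTMLLoader  # assumed name

loader = PDFMinerPDFasHTMLLoader("example.pdf")
html = loader.load()[0].page_content

# Font-size styles on the emitted spans hint at headings vs. body text
soup = BeautifulSoup(html, "html.parser")
spans = soup.find_all("span")
```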
2023-04-09 17:57:25 -07:00
Ankush Gola
61f7bd7a3a fix question answering nb (#2637)
Was throwing an exception because `VectorIndexWrapper` did not have
`similarity_search` -- changed to just use the retriever.
2023-04-09 17:56:49 -07:00
William FH
10ff1fda8e Add Streaming for GPT4All (#2642)
- Adds  support for callback handlers in GPT4All models
- Updates notebook and docs
2023-04-09 17:54:26 -07:00
Ankush Gola
c51753250d Add async call to APIChain. (#2583) (#2644)
Co-authored-by: Yan <32036413+Yan-Zero@users.noreply.github.com>
2023-04-09 16:28:16 -07:00
William FH
e56673c7f9 BabyAGI Notebook Example (#2559)
Create a notebook implementing
[BabyAGI](https://github.com/yoheinakajima/babyagi/tree/main) by [Yohei
Nakajima](https://twitter.com/yoheinakajima) as LLM Chains.
2023-04-09 13:54:23 -07:00
Harrison Chase
7c1dd3057f cr 2023-04-09 13:10:46 -07:00
209 changed files with 214527 additions and 2611 deletions


@@ -2,7 +2,7 @@
⚡ Building applications with LLMs through composability ⚡
[![lint](https://github.com/hwchase17/langchain/actions/workflows/lint.yml/badge.svg)](https://github.com/hwchase17/langchain/actions/workflows/lint.yml) [![test](https://github.com/hwchase17/langchain/actions/workflows/test.yml/badge.svg)](https://github.com/hwchase17/langchain/actions/workflows/test.yml) [![linkcheck](https://github.com/hwchase17/langchain/actions/workflows/linkcheck.yml/badge.svg)](https://github.com/hwchase17/langchain/actions/workflows/linkcheck.yml) [![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT) [![Twitter](https://img.shields.io/twitter/url/https/twitter.com/langchainai.svg?style=social&label=Follow%20%40LangChainAI)](https://twitter.com/langchainai) [![](https://dcbadge.vercel.app/api/server/6adMQxSpJS?compact=true&style=flat)](https://discord.gg/6adMQxSpJS)
[![lint](https://github.com/hwchase17/langchain/actions/workflows/lint.yml/badge.svg)](https://github.com/hwchase17/langchain/actions/workflows/lint.yml) [![test](https://github.com/hwchase17/langchain/actions/workflows/test.yml/badge.svg)](https://github.com/hwchase17/langchain/actions/workflows/test.yml) [![linkcheck](https://github.com/hwchase17/langchain/actions/workflows/linkcheck.yml/badge.svg)](https://github.com/hwchase17/langchain/actions/workflows/linkcheck.yml) [![Downloads](https://static.pepy.tech/badge/langchain/month)](https://pepy.tech/project/langchain) [![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT) [![Twitter](https://img.shields.io/twitter/url/https/twitter.com/langchainai.svg?style=social&label=Follow%20%40LangChainAI)](https://twitter.com/langchainai) [![](https://dcbadge.vercel.app/api/server/6adMQxSpJS?compact=true&style=flat)](https://discord.gg/6adMQxSpJS)
**Production Support:** As you move your LangChains into production, we'd love to offer more comprehensive support.
Please fill out [this form](https://forms.gle/57d8AmXBYp8PP8tZA) and we'll set up a dedicated support Slack channel.
@@ -10,6 +10,8 @@ Please fill out [this form](https://forms.gle/57d8AmXBYp8PP8tZA) and we'll set u
## Quick Install
`pip install langchain`
or
`conda install langchain -c conda-forge`
## 🤔 What is this?

docs/_static/DataberryDashboard.png (new binary file, 157 KiB)


@@ -11,3 +11,7 @@ pre {
max-width: 2560px !important;
}
}
#my-component-root *, #headlessui-portal-root * {
z-index: 1000000000000;
}

docs/_static/js/mendablesearch.js (new file, 58 lines)

@@ -0,0 +1,58 @@
document.addEventListener('DOMContentLoaded', () => {
// Load the external dependencies
function loadScript(src, onLoadCallback) {
const script = document.createElement('script');
script.src = src;
script.onload = onLoadCallback;
document.head.appendChild(script);
}
function createRootElement() {
const rootElement = document.createElement('div');
rootElement.id = 'my-component-root';
document.body.appendChild(rootElement);
return rootElement;
}
function initializeMendable() {
const rootElement = createRootElement();
const { MendableFloatingButton } = Mendable;
const iconSpan1 = React.createElement('span', {
}, '🦜');
const iconSpan2 = React.createElement('span', {
}, '🔗');
const icon = React.createElement('p', {
style: { color: '#ffffff', fontSize: '22px',width: '48px', height: '48px', margin: '0px', padding: '0px', display: 'flex', alignItems: 'center', justifyContent: 'center', textAlign: 'center' },
}, [iconSpan1, iconSpan2]);
const mendableFloatingButton = React.createElement(
MendableFloatingButton,
{
style: { darkMode: false, accentColor: '#010810' },
floatingButtonStyle: { color: '#ffffff', backgroundColor: '#010810' },
anon_key: '82842b36-3ea6-49b2-9fb8-52cfc4bde6bf', // Mendable Search Public ANON key, ok to be public
messageSettings: {
openSourcesInNewTab: false,
},
icon: icon,
}
);
ReactDOM.render(mendableFloatingButton, rootElement);
}
loadScript('https://unpkg.com/react@17/umd/react.production.min.js', () => {
loadScript('https://unpkg.com/react-dom@17/umd/react-dom.production.min.js', () => {
loadScript('https://unpkg.com/@mendable/search@0.0.83/dist/umd/mendable.min.js', initializeMendable);
});
});
});


@@ -103,5 +103,10 @@ html_static_path = ["_static"]
html_css_files = [
"css/custom.css",
]
html_js_files = [
"js/mendablesearch.js",
]
nb_execution_mode = "off"
myst_enable_extensions = ["colon_fence"]


@@ -33,10 +33,19 @@ It implements a Question Answering app and contains instructions for deploying t
A minimal example on how to run LangChain on Vercel using Flask.
## [Digitalocean App Platform](https://github.com/homanp/digitalocean-langchain)
A minimal example on how to deploy LangChain to DigitalOcean App Platform.
## [SteamShip](https://github.com/steamship-core/steamship-langchain/)
This repository contains LangChain adapters for Steamship, enabling LangChain developers to rapidly deploy their apps on Steamship.
This includes: production-ready endpoints, horizontal scaling across dependencies, persistent storage of app state, multi-tenancy support, etc.
## [Langchain-serve](https://github.com/jina-ai/langchain-serve)
This repository allows users to serve local chains and agents as RESTful, gRPC, or Websocket APIs thanks to [Jina](https://docs.jina.ai/). Deploy your chains & agents with ease and enjoy independent scaling, serverless and autoscaling APIs, as well as a Streamlit playground on Jina AI Cloud.
## [BentoML](https://github.com/ssheng/BentoChain)
This repository provides an example of how to deploy a LangChain application with [BentoML](https://github.com/bentoml/BentoML). BentoML is a framework that enables the containerization of machine learning applications as standard OCI images. BentoML also allows for the automatic generation of OpenAPI and gRPC endpoints. With BentoML, you can integrate models from all popular ML frameworks and deploy them as microservices running on the most optimal hardware and scaling independently.


@@ -19,7 +19,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"# Getting API Credentials\n",
"## Getting API Credentials\n",
"\n",
"We'll be using quite some APIs in this notebook, here is a list and where to get them:\n",
"\n",
@@ -47,7 +47,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"# Setting Up"
"## Setting Up"
]
},
{
@@ -103,7 +103,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"# Scenario 1: Just an LLM\n",
"## Scenario 1: Just an LLM\n",
"\n",
"First, let's just run a single LLM a few times and capture the resulting prompt-answer conversation in ClearML"
]
@@ -361,7 +361,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"# Scenario 2: Creating a agent with tools\n",
"## Scenario 2: Creating an agent with tools\n",
"\n",
"To show a more advanced workflow, let's create an agent with access to tools. The way ClearML tracks the results is not different though, only the table will look slightly different as there are other types of actions taken when compared to the earlier, simpler example.\n",
"\n",
@@ -542,7 +542,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"# Tips and Next Steps\n",
"## Tips and Next Steps\n",
"\n",
"- Make sure you always use a unique `name` argument for the `clearml_callback.flush_tracker` function. If not, the model parameters used for a run will override the previous run!\n",
"\n",


@@ -0,0 +1,352 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Comet"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"![](https://user-images.githubusercontent.com/7529846/230328046-a8b18c51-12e3-4617-9b39-97614a571a2d.png)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"In this guide we will demonstrate how to track your Langchain Experiments, Evaluation Metrics, and LLM Sessions with [Comet](https://www.comet.com/site/?utm_source=langchain&utm_medium=referral&utm_campaign=comet_notebook). \n",
"\n",
"<a target=\"_blank\" href=\"https://colab.research.google.com/github/hwchase17/langchain/blob/master/docs/ecosystem/comet_tracking.ipynb\">\n",
" <img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/>\n",
"</a>\n",
"\n",
"**Example Project:** [Comet with LangChain](https://www.comet.com/examples/comet-example-langchain/view/b5ZThK6OFdhKWVSP3fDfRtrNF/panels?utm_source=langchain&utm_medium=referral&utm_campaign=comet_notebook)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<img width=\"1280\" alt=\"comet-langchain\" src=\"https://user-images.githubusercontent.com/7529846/230326720-a9711435-9c6f-4edb-a707-94b67271ab25.png\">\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Install Comet and Dependencies"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"%pip install comet_ml langchain openai google-search-results spacy textstat pandas\n",
"\n",
"import sys\n",
"!{sys.executable} -m spacy download en_core_web_sm"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Initialize Comet and Set your Credentials"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"You can grab your [Comet API Key here](https://www.comet.com/signup?utm_source=langchain&utm_medium=referral&utm_campaign=comet_notebook) or click the link after intializing Comet"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import comet_ml\n",
"\n",
"comet_ml.init(project_name=\"comet-example-langchain\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Set OpenAI and SerpAPI credentials"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"You will need an [OpenAI API Key](https://platform.openai.com/account/api-keys) and a [SerpAPI API Key](https://serpapi.com/dashboard) to run the following examples"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import os\n",
"\n",
"os.environ[\"OPENAI_API_KEY\"] = \"...\"\n",
"#os.environ[\"OPENAI_ORGANIZATION\"] = \"...\"\n",
"os.environ[\"SERPAPI_API_KEY\"] = \"...\""
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Scenario 1: Using just an LLM"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from datetime import datetime\n",
"\n",
"from langchain.callbacks import CometCallbackHandler, StdOutCallbackHandler\n",
"from langchain.callbacks.base import CallbackManager\n",
"from langchain.llms import OpenAI\n",
"\n",
"comet_callback = CometCallbackHandler(\n",
" project_name=\"comet-example-langchain\",\n",
" complexity_metrics=True,\n",
" stream_logs=True,\n",
" tags=[\"llm\"],\n",
" visualizations=[\"dep\"],\n",
")\n",
"manager = CallbackManager([StdOutCallbackHandler(), comet_callback])\n",
"llm = OpenAI(temperature=0.9, callback_manager=manager, verbose=True)\n",
"\n",
"llm_result = llm.generate([\"Tell me a joke\", \"Tell me a poem\", \"Tell me a fact\"] * 3)\n",
"print(\"LLM result\", llm_result)\n",
"comet_callback.flush_tracker(llm, finish=True)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Scenario 2: Using an LLM in a Chain"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from langchain.callbacks import CometCallbackHandler, StdOutCallbackHandler\n",
"from langchain.callbacks.base import CallbackManager\n",
"from langchain.chains import LLMChain\n",
"from langchain.llms import OpenAI\n",
"from langchain.prompts import PromptTemplate\n",
"\n",
"comet_callback = CometCallbackHandler(\n",
" complexity_metrics=True,\n",
" project_name=\"comet-example-langchain\",\n",
" stream_logs=True,\n",
" tags=[\"synopsis-chain\"],\n",
")\n",
"manager = CallbackManager([StdOutCallbackHandler(), comet_callback])\n",
"\n",
"llm = OpenAI(temperature=0.9, callback_manager=manager, verbose=True)\n",
"\n",
"template = \"\"\"You are a playwright. Given the title of play, it is your job to write a synopsis for that title.\n",
"Title: {title}\n",
"Playwright: This is a synopsis for the above play:\"\"\"\n",
"prompt_template = PromptTemplate(input_variables=[\"title\"], template=template)\n",
"synopsis_chain = LLMChain(llm=llm, prompt=prompt_template, callback_manager=manager)\n",
"\n",
"test_prompts = [{\"title\": \"Documentary about Bigfoot in Paris\"}]\n",
"print(synopsis_chain.apply(test_prompts))\n",
"comet_callback.flush_tracker(synopsis_chain, finish=True)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Scenario 3: Using An Agent with Tools "
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from langchain.agents import initialize_agent, load_tools\n",
"from langchain.callbacks import CometCallbackHandler, StdOutCallbackHandler\n",
"from langchain.callbacks.base import CallbackManager\n",
"from langchain.llms import OpenAI\n",
"\n",
"comet_callback = CometCallbackHandler(\n",
" project_name=\"comet-example-langchain\",\n",
" complexity_metrics=True,\n",
" stream_logs=True,\n",
" tags=[\"agent\"],\n",
")\n",
"manager = CallbackManager([StdOutCallbackHandler(), comet_callback])\n",
"llm = OpenAI(temperature=0.9, callback_manager=manager, verbose=True)\n",
"\n",
"tools = load_tools([\"serpapi\", \"llm-math\"], llm=llm, callback_manager=manager)\n",
"agent = initialize_agent(\n",
" tools,\n",
" llm,\n",
" agent=\"zero-shot-react-description\",\n",
" callback_manager=manager,\n",
" verbose=True,\n",
")\n",
"agent.run(\n",
" \"Who is Leo DiCaprio's girlfriend? What is her current age raised to the 0.43 power?\"\n",
")\n",
"comet_callback.flush_tracker(agent, finish=True)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Scenario 4: Using Custom Evaluation Metrics"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The `CometCallbackManager` also allows you to define and use Custom Evaluation Metrics to assess generated outputs from your model. Let's take a look at how this works. \n",
"\n",
"\n",
"In the snippet below, we will use the [ROUGE](https://huggingface.co/spaces/evaluate-metric/rouge) metric to evaluate the quality of a generated summary of an input prompt. "
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"%pip install rouge-score"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from rouge_score import rouge_scorer\n",
"\n",
"from langchain.callbacks import CometCallbackHandler, StdOutCallbackHandler\n",
"from langchain.callbacks.base import CallbackManager\n",
"from langchain.chains import LLMChain\n",
"from langchain.llms import OpenAI\n",
"from langchain.prompts import PromptTemplate\n",
"\n",
"\n",
"class Rouge:\n",
" def __init__(self, reference):\n",
" self.reference = reference\n",
" self.scorer = rouge_scorer.RougeScorer([\"rougeLsum\"], use_stemmer=True)\n",
"\n",
" def compute_metric(self, generation, prompt_idx, gen_idx):\n",
" prediction = generation.text\n",
" results = self.scorer.score(target=self.reference, prediction=prediction)\n",
"\n",
" return {\n",
" \"rougeLsum_score\": results[\"rougeLsum\"].fmeasure,\n",
" \"reference\": self.reference,\n",
" }\n",
"\n",
"\n",
"reference = \"\"\"\n",
"The tower is 324 metres (1,063 ft) tall, about the same height as an 81-storey building.\n",
"It was the first structure to reach a height of 300 metres.\n",
"\n",
"It is now taller than the Chrysler Building in New York City by 5.2 metres (17 ft)\n",
"Excluding transmitters, the Eiffel Tower is the second tallest free-standing structure in France .\n",
"\"\"\"\n",
"rouge_score = Rouge(reference=reference)\n",
"\n",
"template = \"\"\"Given the following article, it is your job to write a summary.\n",
"Article:\n",
"{article}\n",
"Summary: This is the summary for the above article:\"\"\"\n",
"prompt_template = PromptTemplate(input_variables=[\"article\"], template=template)\n",
"\n",
"comet_callback = CometCallbackHandler(\n",
" project_name=\"comet-example-langchain\",\n",
" complexity_metrics=False,\n",
" stream_logs=True,\n",
" tags=[\"custom_metrics\"],\n",
" custom_metrics=rouge_score.compute_metric,\n",
")\n",
"manager = CallbackManager([StdOutCallbackHandler(), comet_callback])\n",
"llm = OpenAI(temperature=0.9, callback_manager=manager, verbose=True)\n",
"\n",
"synopsis_chain = LLMChain(llm=llm, prompt=prompt_template, callback_manager=manager)\n",
"\n",
"test_prompts = [\n",
" {\n",
" \"article\": \"\"\"\n",
" The tower is 324 metres (1,063 ft) tall, about the same height as\n",
" an 81-storey building, and the tallest structure in Paris. Its base is square,\n",
" measuring 125 metres (410 ft) on each side.\n",
" During its construction, the Eiffel Tower surpassed the\n",
" Washington Monument to become the tallest man-made structure in the world,\n",
" a title it held for 41 years until the Chrysler Building\n",
" in New York City was finished in 1930.\n",
"\n",
" It was the first structure to reach a height of 300 metres.\n",
" Due to the addition of a broadcasting aerial at the top of the tower in 1957,\n",
" it is now taller than the Chrysler Building by 5.2 metres (17 ft).\n",
"\n",
" Excluding transmitters, the Eiffel Tower is the second tallest\n",
" free-standing structure in France after the Millau Viaduct.\n",
" \"\"\"\n",
" }\n",
"]\n",
"print(synopsis_chain.apply(test_prompts))\n",
"comet_callback.flush_tracker(synopsis_chain, finish=True)"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.15"
}
},
"nbformat": 4,
"nbformat_minor": 2
}


@@ -0,0 +1,25 @@
# Databerry
This page covers how to use [Databerry](https://databerry.ai) within LangChain.
## What is Databerry?
Databerry is an [open source](https://github.com/gmpetrov/databerry) document retrieval platform that helps connect your personal data with Large Language Models.
![Databerry](../_static/DataberryDashboard.png)
## Quick start
Retrieving documents stored in Databerry from LangChain is very easy!
```python
from langchain.retrievers import DataberryRetriever
retriever = DataberryRetriever(
datastore_url="https://api.databerry.ai/query/clg1xg2h80000l708dymr0fxc",
# api_key="DATABERRY_API_KEY", # optional if datastore is public
# top_k=10 # optional
)
docs = retriever.get_relevant_documents("What's Databerry?")
```


@@ -1,21 +1,21 @@
# GPT4All
This page covers how to use the `GPT4All` wrapper within LangChain.
It is broken into two parts: installation and setup, and then usage with an example.
This page covers how to use the `GPT4All` wrapper within LangChain. The tutorial is divided into two parts: installation and setup, followed by usage with an example.
## Installation and Setup
- Install the Python package with `pip install pyllamacpp`
- Download a [GPT4All model](https://github.com/nomic-ai/gpt4all) and place it in your desired directory
- Download a [GPT4All model](https://github.com/nomic-ai/pyllamacpp#supported-model) and place it in your desired directory
## Usage
### GPT4All
To use the GPT4All wrapper, you need to provide the path to the pre-trained model file and the model's configuration.
```python
from langchain.llms import GPT4All
# Instantiate the model
# Instantiate the model. Callbacks support token-wise streaming
model = GPT4All(model="./models/gpt4all-model.bin", n_ctx=512, n_threads=8)
# Generate text
@@ -24,14 +24,24 @@ response = model("Once upon a time, ")
You can also customize the generation parameters, such as n_predict, temp, top_p, top_k, and others.
Example:
To stream the model's predictions, add in a CallbackManager.
```python
model = GPT4All(model="./models/gpt4all-model.bin", n_predict=55, temp=0)
response = model("Once upon a time, ")
from langchain.llms import GPT4All
from langchain.callbacks.base import CallbackManager
from langchain.callbacks.streaming_stdout import StreamingStdOutCallbackHandler
# There are many CallbackHandlers supported, such as
# from langchain.callbacks.streamlit import StreamlitCallbackHandler
callback_manager = CallbackManager([StreamingStdOutCallbackHandler()])
model = GPT4All(model="./models/gpt4all-model.bin", n_ctx=512, n_threads=8, callback_manager=callback_manager, verbose=True)
# Generate text. Tokens are streamed through the callback manager.
model("Once upon a time, ")
```
## Model File
You can find links to model file downloads at the [GPT4all](https://github.com/nomic-ai/gpt4all) repository. They will need to be converted to `ggml` format to work, as specified in the [pyllamacpp](https://github.com/nomic-ai/pyllamacpp) repository.
You can find links to model file downloads in the [pyllamacpp](https://github.com/nomic-ai/pyllamacpp) repository.
For a more detailed walkthrough of this, see [this notebook](../modules/models/llms/integrations/gpt4all.ipynb)


@@ -1,6 +1,6 @@
# Graphsignal
This page covers how to use the Graphsignal ecosystem to trace and monitor LangChain.
This page covers how to use [Graphsignal](https://app.graphsignal.com) to trace and monitor LangChain. Graphsignal enables full visibility into your application. It provides latency breakdowns by chains and tools, exceptions with full context, data monitoring, compute/GPU utilization, OpenAI cost analytics, and more.
## Installation and Setup
@@ -10,7 +10,7 @@ This page covers how to use the Graphsignal ecosystem to trace and monitor LangC
## Tracing and Monitoring
Graphsignal automatically instruments and starts tracing and monitoring chains. Traces, metrics and errors are then available in your [Graphsignal dashboard](https://app.graphsignal.com/). No prompts or other sensitive data are sent to Graphsignal cloud, only statistics and metadata.
Graphsignal automatically instruments and starts tracing and monitoring chains. Traces and metrics are then available in your [Graphsignal dashboards](https://app.graphsignal.com).
Initialize the tracer by providing a deployment name:
@@ -20,7 +20,13 @@ import graphsignal
graphsignal.configure(deployment='my-langchain-app-prod')
```
In order to trace full runs and see a breakdown by chains and tools, you can wrap the calling routine or use a decorator:
To additionally trace any function or code, you can use a decorator or a context manager:
```python
@graphsignal.trace_function
def handle_request():
chain.run("some initial text")
```
```python
with graphsignal.start_trace('my-chain'):
    chain.run("some initial text")
```

docs/ecosystem/zilliz.md Normal file
View File

@@ -0,0 +1,21 @@
# Zilliz
This page covers how to use the Zilliz Cloud ecosystem within LangChain.
Zilliz uses the Milvus integration.
It is broken into two parts: installation and setup, and then references to specific Milvus wrappers.
## Installation and Setup
- Install the Python SDK with `pip install pymilvus`
## Wrappers
### VectorStore
There exists a wrapper around Zilliz indexes, allowing you to use it as a vectorstore,
whether for semantic search or example selection.
To import this vectorstore:
```python
from langchain.vectorstores import Milvus
```
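As a minimal usage sketch (the connection details below are placeholders for your own Zilliz Cloud endpoint and credentials, not real values):

```python
from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.vectorstores import Milvus

# Placeholder Zilliz Cloud connection details -- replace with your own
connection_args = {
    "host": "<your-zilliz-cloud-endpoint>",
    "port": "19530",
    "user": "<username>",
    "password": "<password>",
    "secure": True,
}

# Build an index from raw texts, then run a semantic search against it
vector_db = Milvus.from_texts(
    ["LangChain integrates with many vector stores."],
    OpenAIEmbeddings(),
    connection_args=connection_args,
)
docs = vector_db.similarity_search("Which vector stores does LangChain integrate with?")
```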
For a more detailed walkthrough of the Milvus wrapper, see [this notebook](../modules/indexes/vectorstores/examples/zilliz.ipynb)

View File

@@ -1,5 +1,5 @@
LangChain Gallery
=================
Lots of people have built some pretty awesome stuff with LangChain.
This is a collection of our favorites.
@@ -223,7 +223,7 @@ Open Source
Answer questions about the documentation of any project
Misc. Colab Notebooks
~~~~~~~~~~~~~~~~~~~~~
.. panels::
:body: text-center

View File

@@ -9,6 +9,8 @@ To get started, install LangChain with the following command:
```bash
pip install langchain
# or
conda install langchain -c conda-forge
```
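Either command installs the same package. To confirm the environment can import it, a quick check:

```bash
python -c "import langchain; print(langchain.__version__)"
```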

View File

@@ -92,6 +92,7 @@ The above modules can be used in a variety of ways. LangChain also provides guidance
./use_cases/question_answering.md
./use_cases/chatbots.md
./use_cases/tabular.rst
./use_cases/code.md
./use_cases/apis.md
./use_cases/summarization.md
./use_cases/extraction.md

View File

@@ -77,7 +77,7 @@
" Returns:\n",
" Action specifying what tool to use.\n",
" \"\"\"\n",
" return AgentAction(tool=\"Search\", tool_input=\"foo\", log=\"\")\n",
" return AgentAction(tool=\"Search\", tool_input=kwargs[\"input\"], log=\"\")\n",
"\n",
" async def aplan(\n",
" self, intermediate_steps: List[Tuple[AgentAction, str]], **kwargs: Any\n",
@@ -92,7 +92,7 @@
" Returns:\n",
" Action specifying what tool to use.\n",
" \"\"\"\n",
" return AgentAction(tool=\"Search\", tool_input=\"foo\", log=\"\")"
" return AgentAction(tool=\"Search\", tool_input=kwargs[\"input\"], log=\"\")"
]
},
{

View File

@@ -42,7 +42,7 @@
},
{
"cell_type": "code",
"execution_count": 11,
"execution_count": 1,
"id": "9af9734e",
"metadata": {},
"outputs": [],
@@ -67,7 +67,7 @@
},
{
"cell_type": "code",
"execution_count": 28,
"execution_count": 2,
"id": "becda2a1",
"metadata": {},
"outputs": [],
@@ -99,7 +99,7 @@
},
{
"cell_type": "code",
"execution_count": 7,
"execution_count": 3,
"id": "339b1bb8",
"metadata": {},
"outputs": [],
@@ -128,7 +128,7 @@
},
{
"cell_type": "code",
"execution_count": 22,
"execution_count": 4,
"id": "fd969d31",
"metadata": {},
"outputs": [],
@@ -159,7 +159,7 @@
},
{
"cell_type": "code",
"execution_count": 23,
"execution_count": 5,
"id": "798ef9fb",
"metadata": {},
"outputs": [],
@@ -187,7 +187,7 @@
},
{
"cell_type": "code",
"execution_count": 13,
"execution_count": 6,
"id": "7c6fe0d3",
"metadata": {},
"outputs": [],
@@ -216,7 +216,7 @@
},
{
"cell_type": "code",
"execution_count": 14,
"execution_count": 7,
"id": "d278706a",
"metadata": {},
"outputs": [],
@@ -236,7 +236,7 @@
},
{
"cell_type": "code",
"execution_count": 16,
"execution_count": 8,
"id": "f9d4c374",
"metadata": {},
"outputs": [],
@@ -268,7 +268,7 @@
},
{
"cell_type": "code",
"execution_count": 24,
"execution_count": 9,
"id": "9b1cc2a2",
"metadata": {},
"outputs": [],
@@ -279,7 +279,7 @@
},
{
"cell_type": "code",
"execution_count": 25,
"execution_count": 10,
"id": "e4f5092f",
"metadata": {},
"outputs": [],
@@ -305,7 +305,7 @@
},
{
"cell_type": "code",
"execution_count": 26,
"execution_count": 11,
"id": "490604e9",
"metadata": {},
"outputs": [],
@@ -315,7 +315,7 @@
},
{
"cell_type": "code",
"execution_count": 27,
"execution_count": 12,
"id": "653b1617",
"metadata": {},
"outputs": [
@@ -326,11 +326,12 @@
"\n",
"\n",
"\u001b[1m> Entering new AgentExecutor chain...\u001b[0m\n",
"\u001b[32;1m\u001b[1;3mAction: Search\n",
"\u001b[32;1m\u001b[1;3mThought: I need to find out the population of Canada in 2023\n",
"Action: Search\n",
"Action Input: Population of Canada in 2023\u001b[0m\n",
"\n",
"Observation:\u001b[36;1m\u001b[1;3m38,648,380\u001b[0m\u001b[32;1m\u001b[1;3m That's a lot of people!\n",
"Final Answer: Arrr, there be 38,648,380 people livin' in Canada come 2023!\u001b[0m\n",
"Observation:\u001b[36;1m\u001b[1;3mThe current population of Canada is 38,658,314 as of Wednesday, April 12, 2023, based on Worldometer elaboration of the latest United Nations data.\u001b[0m\u001b[32;1m\u001b[1;3m I now know the final answer\n",
"Final Answer: Arrr, there be 38,658,314 people livin' in Canada as of 2023!\u001b[0m\n",
"\n",
"\u001b[1m> Finished chain.\u001b[0m\n"
]
@@ -338,10 +339,165 @@
{
"data": {
"text/plain": [
"\"Arrr, there be 38,648,380 people livin' in Canada come 2023!\""
"\"Arrr, there be 38,658,314 people livin' in Canada as of 2023!\""
]
},
"execution_count": 27,
"execution_count": 12,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"agent_executor.run(\"How many people live in canada as of 2023?\")"
]
},
{
"cell_type": "markdown",
"id": "d5b4a078",
"metadata": {},
"source": [
"## Adding Memory\n",
"\n",
"If you want to add memory to the agent, you'll need to:\n",
"\n",
"1. Add a place in the custom prompt for the chat_history\n",
"2. Add a memory object to the agent executor."
]
},
{
"cell_type": "code",
"execution_count": 29,
"id": "94fffda1",
"metadata": {},
"outputs": [],
"source": [
"# Set up the base template\n",
"template_with_history = \"\"\"Answer the following questions as best you can, but speaking as a pirate might speak. You have access to the following tools:\n",
"\n",
"{tools}\n",
"\n",
"Use the following format:\n",
"\n",
"Question: the input question you must answer\n",
"Thought: you should always think about what to do\n",
"Action: the action to take, should be one of [{tool_names}]\n",
"Action Input: the input to the action\n",
"Observation: the result of the action\n",
"... (this Thought/Action/Action Input/Observation can repeat N times)\n",
"Thought: I now know the final answer\n",
"Final Answer: the final answer to the original input question\n",
"\n",
"Begin! Remember to speak as a pirate when giving your final answer. Use lots of \"Arg\"s\n",
"\n",
"Previous conversation history:\n",
"{history}\n",
"\n",
"New question: {input}\n",
"{agent_scratchpad}\"\"\""
]
},
{
"cell_type": "code",
"execution_count": 30,
"id": "f58488d7",
"metadata": {},
"outputs": [],
"source": [
"prompt_with_history = CustomPromptTemplate(\n",
" template=template_with_history,\n",
" tools=tools,\n",
" # This omits the `agent_scratchpad`, `tools`, and `tool_names` variables because those are generated dynamically\n",
" # This includes the `intermediate_steps` variable because that is needed\n",
" input_variables=[\"input\", \"intermediate_steps\", \"history\"]\n",
")"
]
},
{
"cell_type": "code",
"execution_count": 31,
"id": "d28d4b5a",
"metadata": {},
"outputs": [],
"source": [
"llm_chain = LLMChain(llm=llm, prompt=prompt_with_history)"
]
},
{
"cell_type": "code",
"execution_count": 32,
"id": "3e37b32a",
"metadata": {},
"outputs": [],
"source": [
"tool_names = [tool.name for tool in tools]\n",
"agent = LLMSingleActionAgent(\n",
" llm_chain=llm_chain, \n",
" output_parser=output_parser,\n",
" stop=[\"\\nObservation:\"], \n",
" allowed_tools=tool_names\n",
")"
]
},
{
"cell_type": "code",
"execution_count": 33,
"id": "97ea1bce",
"metadata": {},
"outputs": [],
"source": [
"from langchain.memory import ConversationBufferWindowMemory"
]
},
{
"cell_type": "code",
"execution_count": 42,
"id": "b5ad69ce",
"metadata": {},
"outputs": [],
"source": [
"memory=ConversationBufferWindowMemory(k=2)"
]
},
{
"cell_type": "code",
"execution_count": 43,
"id": "b7b5c9b1",
"metadata": {},
"outputs": [],
"source": [
"agent_executor = AgentExecutor.from_agent_and_tools(agent=agent, tools=tools, verbose=True, memory=memory)"
]
},
{
"cell_type": "code",
"execution_count": 44,
"id": "5ec4c39b",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"\n",
"\u001b[1m> Entering new AgentExecutor chain...\u001b[0m\n",
"\u001b[32;1m\u001b[1;3mThought: I need to find out the population of Canada in 2023\n",
"Action: Search\n",
"Action Input: Population of Canada in 2023\u001b[0m\n",
"\n",
"Observation:\u001b[36;1m\u001b[1;3mThe current population of Canada is 38,658,314 as of Wednesday, April 12, 2023, based on Worldometer elaboration of the latest United Nations data.\u001b[0m\u001b[32;1m\u001b[1;3m I now know the final answer\n",
"Final Answer: Arrr, there be 38,658,314 people livin' in Canada as of 2023!\u001b[0m\n",
"\n",
"\u001b[1m> Finished chain.\u001b[0m\n"
]
},
{
"data": {
"text/plain": [
"\"Arrr, there be 38,658,314 people livin' in Canada as of 2023!\""
]
},
"execution_count": 44,
"metadata": {},
"output_type": "execute_result"
}
@@ -350,10 +506,48 @@
"agent_executor.run(\"How many people live in canada as of 2023?\")"
]
},
{
"cell_type": "code",
"execution_count": 45,
"id": "b2ba45bb",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"\n",
"\u001b[1m> Entering new AgentExecutor chain...\u001b[0m\n",
"\u001b[32;1m\u001b[1;3mThought: I need to find out how many people live in Mexico.\n",
"Action: Search\n",
"Action Input: How many people live in Mexico as of 2023?\u001b[0m\n",
"\n",
"Observation:\u001b[36;1m\u001b[1;3mThe current population of Mexico is 132,679,922 as of Tuesday, April 11, 2023, based on Worldometer elaboration of the latest United Nations data. Mexico 2020 ...\u001b[0m\u001b[32;1m\u001b[1;3m I now know the final answer.\n",
"Final Answer: Arrr, there be 132,679,922 people livin' in Mexico as of 2023!\u001b[0m\n",
"\n",
"\u001b[1m> Finished chain.\u001b[0m\n"
]
},
{
"data": {
"text/plain": [
"\"Arrr, there be 132,679,922 people livin' in Mexico as of 2023!\""
]
},
"execution_count": 45,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"agent_executor.run(\"how about in mexico?\")"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "adefb4c2",
"id": "bd820a7a",
"metadata": {},
"outputs": [],
"source": []

View File

@@ -80,8 +80,8 @@
}
],
"source": [
"llm = ChatOpenAI(temperature=0,)\n",
"tools = load_tools([\"requests\"] )\n",
"llm = ChatOpenAI(temperature=0)\n",
"tools = load_tools([\"requests_all\"] )\n",
"tools += [tool]\n",
"\n",
"agent_chain = initialize_agent(tools, llm, agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION, verbose=True)\n",

View File

@@ -9,7 +9,7 @@
"\n",
"This notebook goes over how to use the google search component.\n",
"\n",
"First, you need to set up the proper API keys and environment variables. To set it up, follow the instructions found [here](https://stackoverflow.com/questions/37083058/programmatically-searching-google-in-python-using-custom-search).\n",
"First, you need to set up the proper API keys and environment variables. To set it up, create the GOOGLE_API_KEY in the Google Cloud credential console (https://console.cloud.google.com/apis/credentials) and a GOOGLE_CSE_ID using the Programmable Search Enginge (https://programmablesearchengine.google.com/controlpanel/create). Next, it is good to follow the instructions found [here](https://stackoverflow.com/questions/37083058/programmatically-searching-google-in-python-using-custom-search).\n",
"\n",
"Then we will need to set some environment variables."
]

View File

@@ -0,0 +1,388 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "23234b50-e6c6-4c87-9f97-259c15f36894",
"metadata": {
"tags": []
},
"source": [
"# Callbacks"
]
},
{
"cell_type": "markdown",
"id": "29dd6333-307c-43df-b848-65001c01733b",
"metadata": {},
"source": [
"LangChain provides a callback system that allows you to hook into the various stages of your LLM application. This is useful for logging, [monitoring](https://python.langchain.com/en/latest/tracing.html), [streaming](https://python.langchain.com/en/latest/modules/models/llms/examples/streaming_llm.html), and other tasks.\n",
"\n",
"You can subscribe to these events by using the `callback_manager` argument available throughout the API. A `CallbackManager` is an object that manages a list of `CallbackHandlers`. The `CallbackManager` will call the appropriate method on each handler when the event is triggered."
]
},
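{
"cell_type": "markdown",
"id": "7e3f9d2b-1c4a-4b8e-9c11-0f3a2b1c4d5e",
"metadata": {},
"source": [
"For example, here is a minimal sketch of composing handlers on a manager (`StdOutCallbackHandler` is the default handler LangChain ships; `add_handler` is part of the interface shown below):\n",
"\n",
"```python\n",
"from langchain.callbacks.base import CallbackManager\n",
"from langchain.callbacks.stdout import StdOutCallbackHandler\n",
"from langchain.callbacks.streaming_stdout import StreamingStdOutCallbackHandler\n",
"\n",
"# Start with the default stdout handler, then attach a second handler;\n",
"# both now receive every callback event.\n",
"manager = CallbackManager([StdOutCallbackHandler()])\n",
"manager.add_handler(StreamingStdOutCallbackHandler())\n",
"```"
]
},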
{
"cell_type": "markdown",
"id": "fdb72e8d-a02a-474d-96bf-f5759432afc8",
"metadata": {
"tags": []
},
"source": [
"```python\n",
"class CallbackManager(BaseCallbackHandler):\n",
" \"\"\"Base callback manager that can be used to handle callbacks from LangChain.\"\"\"\n",
"\n",
" def add_handler(self, callback: BaseCallbackHandler) -> None:\n",
" \"\"\"Add a handler to the callback manager.\"\"\"\n",
"\n",
" def remove_handler(self, handler: BaseCallbackHandler) -> None:\n",
" \"\"\"Remove a handler from the callback manager.\"\"\"\n",
"\n",
" def set_handler(self, handler: BaseCallbackHandler) -> None:\n",
" \"\"\"Set handler as the only handler on the callback manager.\"\"\"\n",
" self.set_handlers([handler])\n",
"\n",
" def set_handlers(self, handlers: List[BaseCallbackHandler]) -> None:\n",
" \"\"\"Set handlers as the only handlers on the callback manager.\"\"\"\n",
"```"
]
},
{
"cell_type": "markdown",
"id": "2b6d7dba-cd20-472a-ae05-f68675cc9ea4",
"metadata": {},
"source": [
"`CallbackHandlers` are objects that implement the `CallbackHandler` interface, which has a method for each event that can be subscribed to. The `CallbackManager` will call the appropriate method on each handler when the event is triggered."
]
},
{
"cell_type": "markdown",
"id": "e4592215-6604-47e2-89ff-5db3af6d1e40",
"metadata": {
"tags": []
},
"source": [
"```python\n",
"class BaseCallbackHandler(ABC):\n",
" \"\"\"Base callback handler that can be used to handle callbacks from langchain.\"\"\"\n",
"\n",
" @abstractmethod\n",
" def on_llm_start(\n",
" self, serialized: Dict[str, Any], prompts: List[str], **kwargs: Any\n",
" ) -> Any:\n",
" \"\"\"Run when LLM starts running.\"\"\"\n",
"\n",
" @abstractmethod\n",
" def on_llm_new_token(self, token: str, **kwargs: Any) -> Any:\n",
" \"\"\"Run on new LLM token. Only available when streaming is enabled.\"\"\"\n",
"\n",
" @abstractmethod\n",
" def on_llm_end(self, response: LLMResult, **kwargs: Any) -> Any:\n",
" \"\"\"Run when LLM ends running.\"\"\"\n",
"\n",
" @abstractmethod\n",
" def on_llm_error(\n",
" self, error: Union[Exception, KeyboardInterrupt], **kwargs: Any\n",
" ) -> Any:\n",
" \"\"\"Run when LLM errors.\"\"\"\n",
"\n",
" @abstractmethod\n",
" def on_chain_start(\n",
" self, serialized: Dict[str, Any], inputs: Dict[str, Any], **kwargs: Any\n",
" ) -> Any:\n",
" \"\"\"Run when chain starts running.\"\"\"\n",
"\n",
" @abstractmethod\n",
" def on_chain_end(self, outputs: Dict[str, Any], **kwargs: Any) -> Any:\n",
" \"\"\"Run when chain ends running.\"\"\"\n",
"\n",
" @abstractmethod\n",
" def on_chain_error(\n",
" self, error: Union[Exception, KeyboardInterrupt], **kwargs: Any\n",
" ) -> Any:\n",
" \"\"\"Run when chain errors.\"\"\"\n",
"\n",
" @abstractmethod\n",
" def on_tool_start(\n",
" self, serialized: Dict[str, Any], input_str: str, **kwargs: Any\n",
" ) -> Any:\n",
" \"\"\"Run when tool starts running.\"\"\"\n",
"\n",
" @abstractmethod\n",
" def on_tool_end(self, output: str, **kwargs: Any) -> Any:\n",
" \"\"\"Run when tool ends running.\"\"\"\n",
"\n",
" @abstractmethod\n",
" def on_tool_error(\n",
" self, error: Union[Exception, KeyboardInterrupt], **kwargs: Any\n",
" ) -> Any:\n",
" \"\"\"Run when tool errors.\"\"\"\n",
"\n",
" @abstractmethod\n",
" def on_text(self, text: str, **kwargs: Any) -> Any:\n",
" \"\"\"Run on arbitrary text.\"\"\"\n",
"\n",
" @abstractmethod\n",
" def on_agent_action(self, action: AgentAction, **kwargs: Any) -> Any:\n",
" \"\"\"Run on agent action.\"\"\"\n",
"\n",
" @abstractmethod\n",
" def on_agent_finish(self, finish: AgentFinish, **kwargs: Any) -> Any:\n",
" \"\"\"Run on agent end.\"\"\"\n",
"```"
]
},
{
"cell_type": "markdown",
"id": "d3bf3304-43fb-47ad-ae50-0637a17018a2",
"metadata": {},
"source": [
"## Creating and Using a Custom `CallbackHandler`\n",
"\n",
"By default, a shared CallbackManager with the StdOutCallbackHandler will be used by models, chains, agents, and tools. However, you can pass in your own CallbackManager with a custom CallbackHandler:"
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "80532dfc-d687-4147-a0c9-1f90cc3e868c",
"metadata": {
"tags": []
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"\n",
"\u001b[1m> Entering new AgentExecutor chain...\u001b[0m\n",
"AgentAction(tool='Search', tool_input=\"US Open men's final 2019 winner\", log=' I need to find out who won the US Open men\\'s final in 2019 and then calculate his age raised to the 0.334 power.\\nAction: Search\\nAction Input: \"US Open men\\'s final 2019 winner\"')\n",
"Rafael Nadal defeated Daniil Medvedev in the final, 75, 63, 57, 46, 64 to win the men's singles tennis title at the 2019 US Open. It was his fourth US ...\n",
"AgentAction(tool='Search', tool_input='Rafael Nadal age', log=' I need to find out the age of the winner\\nAction: Search\\nAction Input: \"Rafael Nadal age\"')\n",
"36 years\n",
"AgentAction(tool='Calculator', tool_input='36^0.334', log=' I now need to calculate his age raised to the 0.334 power\\nAction: Calculator\\nAction Input: 36^0.334')\n",
"Answer: 3.3098250249682484\n",
"\n",
" I now know the final answer\n",
"Final Answer: Rafael Nadal, aged 36, won the US Open men's final in 2019 and his age raised to the 0.334 power is 3.3098250249682484.\n",
"\n",
"\u001b[1m> Finished chain.\u001b[0m\n"
]
},
{
"data": {
"text/plain": [
"\"Rafael Nadal, aged 36, won the US Open men's final in 2019 and his age raised to the 0.334 power is 3.3098250249682484.\""
]
},
"execution_count": 1,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"from typing import Any, Dict, List, Optional, Union\n",
"\n",
"from langchain.agents import initialize_agent, load_tools\n",
"from langchain.agents import AgentType\n",
"from langchain.callbacks.base import CallbackManager, BaseCallbackHandler\n",
"from langchain.llms import OpenAI\n",
"from langchain.schema import AgentAction, AgentFinish, LLMResult\n",
"\n",
"class MyCustomCallbackHandler(BaseCallbackHandler):\n",
" \"\"\"Custom CallbackHandler.\"\"\"\n",
"\n",
" def on_llm_start(\n",
" self, serialized: Dict[str, Any], prompts: List[str], **kwargs: Any\n",
" ) -> None:\n",
" \"\"\"Print out the prompts.\"\"\"\n",
" pass\n",
"\n",
" def on_llm_end(self, response: LLMResult, **kwargs: Any) -> None:\n",
" \"\"\"Do nothing.\"\"\"\n",
" pass\n",
"\n",
" def on_llm_new_token(self, token: str, **kwargs: Any) -> None:\n",
" \"\"\"Do nothing.\"\"\"\n",
" pass\n",
"\n",
" def on_llm_error(\n",
" self, error: Union[Exception, KeyboardInterrupt], **kwargs: Any\n",
" ) -> None:\n",
" \"\"\"Do nothing.\"\"\"\n",
" pass\n",
"\n",
" def on_chain_start(\n",
" self, serialized: Dict[str, Any], inputs: Dict[str, Any], **kwargs: Any\n",
" ) -> None:\n",
" \"\"\"Print out that we are entering a chain.\"\"\"\n",
" class_name = serialized[\"name\"]\n",
" print(f\"\\n\\n\\033[1m> Entering new {class_name} chain...\\033[0m\")\n",
"\n",
" def on_chain_end(self, outputs: Dict[str, Any], **kwargs: Any) -> None:\n",
" \"\"\"Print out that we finished a chain.\"\"\"\n",
" print(\"\\n\\033[1m> Finished chain.\\033[0m\")\n",
"\n",
" def on_chain_error(\n",
" self, error: Union[Exception, KeyboardInterrupt], **kwargs: Any\n",
" ) -> None:\n",
" \"\"\"Do nothing.\"\"\"\n",
" pass\n",
"\n",
" def on_tool_start(\n",
" self,\n",
" serialized: Dict[str, Any],\n",
" input_str: str,\n",
" **kwargs: Any,\n",
" ) -> None:\n",
" \"\"\"Do nothing.\"\"\"\n",
" pass\n",
"\n",
" def on_agent_action(\n",
" self, action: AgentAction, color: Optional[str] = None, **kwargs: Any\n",
" ) -> Any:\n",
" \"\"\"Run on agent action.\"\"\"\n",
" print(action)\n",
"\n",
" def on_tool_end(\n",
" self,\n",
" output: str,\n",
" color: Optional[str] = None,\n",
" observation_prefix: Optional[str] = None,\n",
" llm_prefix: Optional[str] = None,\n",
" **kwargs: Any,\n",
" ) -> None:\n",
" \"\"\"If not the final action, print out observation.\"\"\"\n",
" print(output)\n",
"\n",
" def on_tool_error(\n",
" self, error: Union[Exception, KeyboardInterrupt], **kwargs: Any\n",
" ) -> None:\n",
" \"\"\"Do nothing.\"\"\"\n",
" pass\n",
"\n",
" def on_text(\n",
" self,\n",
" text: str,\n",
" color: Optional[str] = None,\n",
" end: str = \"\",\n",
" **kwargs: Optional[str],\n",
" ) -> None:\n",
" \"\"\"Run when agent ends.\"\"\"\n",
" print(text)\n",
"\n",
" def on_agent_finish(\n",
" self, finish: AgentFinish, color: Optional[str] = None, **kwargs: Any\n",
" ) -> None:\n",
" \"\"\"Run on agent end.\"\"\"\n",
" print(finish.log)\n",
"manager = CallbackManager([MyCustomCallbackHandler()])\n",
"llm = OpenAI(temperature=0, callback_manager=manager, verbose=True)\n",
"tools = load_tools([\"llm-math\", \"serpapi\"], llm=llm, callback_manager=manager)\n",
"agent = initialize_agent(\n",
" tools, llm, agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION, verbose=True, callback_manager=manager\n",
")\n",
"agent.run(\"Who won the US Open men's final in 2019? What is his age raised to the 0.334 power?\")"
]
},
{
"cell_type": "markdown",
"id": "bc9785fa-4f71-4797-91a3-4fe7e57d0429",
"metadata": {
"tags": []
},
"source": [
"## Async Support\n",
"\n",
"If you are planning to use the async API, it is recommended to use `AsyncCallbackHandler` and `AsyncCallbackManager` to avoid blocking the runloop."
]
},
{
"cell_type": "code",
"execution_count": 3,
"id": "c702e0c9-a961-4897-90c1-cdd13b6f16b2",
"metadata": {
"tags": []
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"zzzz....\n",
"\n",
"\n",
"\u001b[1m> Entering new AgentExecutor chain...\u001b[0m\n",
"zzzz....\n",
"\n",
"\u001b[1m> Finished chain.\u001b[0m\n"
]
}
],
"source": [
"import asyncio\n",
"from aiohttp import ClientSession\n",
"\n",
"from langchain.callbacks.base import AsyncCallbackHandler, AsyncCallbackManager\n",
"\n",
"class MyCustomAsyncCallbackHandler(AsyncCallbackHandler):\n",
" \"\"\"Async callback handler that can be used to handle callbacks from langchain.\"\"\"\n",
"\n",
" async def on_chain_start(\n",
" self, serialized: Dict[str, Any], inputs: Dict[str, Any], **kwargs: Any\n",
" ) -> None:\n",
" \"\"\"Run when chain starts running.\"\"\"\n",
" print(\"zzzz....\")\n",
" await asyncio.sleep(0.5)\n",
" class_name = serialized[\"name\"]\n",
" print(f\"\\n\\n\\033[1m> Entering new {class_name} chain...\\033[0m\")\n",
"\n",
" async def on_chain_end(self, outputs: Dict[str, Any], **kwargs: Any) -> None:\n",
" \"\"\"Run when chain ends running.\"\"\"\n",
" print(\"zzzz....\")\n",
" await asyncio.sleep(0.5)\n",
" print(\"\\n\\033[1m> Finished chain.\\033[0m\")\n",
"\n",
"manager = AsyncCallbackManager([MyCustomAsyncCallbackHandler()])\n",
"\n",
"# To make async requests in Tools more efficient, you can pass in your own aiohttp.ClientSession, \n",
"# but you must manually close the client session at the end of your program/event loop\n",
"aiosession = ClientSession()\n",
"llm = OpenAI(temperature=0, callback_manager=manager)\n",
"async_tools = load_tools([\"llm-math\", \"serpapi\"], llm=llm, aiosession=aiosession, callback_manager=manager)\n",
"async_agent = initialize_agent(async_tools, llm, agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION, verbose=True, callback_manager=manager)\n",
"await async_agent.arun(\"Who won the US Open men's final in 2019? What is his age raised to the 0.334 power?\")\n",
"await aiosession.close()"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "86be6304-e433-4048-880c-a92a73244407",
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.9"
}
},
"nbformat": 4,
"nbformat_minor": 5
}

View File

@@ -144,6 +144,160 @@
"\u001b[32;1m\u001b[1;3mYou are a helpful AI Assistant. Please provide JSON arguments to agentFunc() based on the user's instructions.\n",
"\n",
"API_SCHEMA: ```typescript\n",
"/* API for fetching Klarna product information */\n",
"type productsUsingGET = (_: {\n",
"/* A precise query that matches one very small category or product that needs to be searched for to find the products the user is looking for. If the user explicitly stated what they want, use that as a query. The query is as specific as possible to the product name or category mentioned by the user in its singular form, and don't contain any clarifiers like latest, newest, cheapest, budget, premium, expensive or similar. The query is always taken from the latest topic, if there is a new topic a new query is started. */\n",
"\t\tq: string,\n",
"/* number of products returned */\n",
"\t\tsize?: number,\n",
"/* (Optional) Minimum price in local currency for the product searched for. Either explicitly stated by the user or implicitly inferred from a combination of the user's request and the kind of product searched for. */\n",
"\t\tmin_price?: number,\n",
"/* (Optional) Maximum price in local currency for the product searched for. Either explicitly stated by the user or implicitly inferred from a combination of the user's request and the kind of product searched for. */\n",
"\t\tmax_price?: number,\n",
"}) => any;\n",
"```\n",
"\n",
"USER_INSTRUCTIONS: \"whats the most expensive shirt?\"\n",
"\n",
"Your arguments must be plain json provided in a markdown block:\n",
"\n",
"ARGS: ```json\n",
"{valid json conforming to API_SCHEMA}\n",
"```\n",
"\n",
"Example\n",
"-----\n",
"\n",
"ARGS: ```json\n",
"{\"foo\": \"bar\", \"baz\": {\"qux\": \"quux\"}}\n",
"```\n",
"\n",
"The block must be no more than 1 line long, and all arguments must be valid JSON. All string arguments must be wrapped in double quotes.\n",
"You MUST strictly comply to the types indicated by the provided schema, including all required args.\n",
"\n",
"If you don't have sufficient information to call the function due to things like requiring specific uuid's, you can reply with the following message:\n",
"\n",
"Message: ```text\n",
"Concise response requesting the additional information that would make calling the function successful.\n",
"```\n",
"\n",
"Begin\n",
"-----\n",
"ARGS:\n",
"\u001b[0m\n",
"\n",
"\u001b[1m> Finished chain.\u001b[0m\n",
"\u001b[32;1m\u001b[1;3m{\"q\": \"shirt\", \"size\": 1, \"max_price\": null}\u001b[0m\n",
"\u001b[36;1m\u001b[1;3m{\"products\":[{\"name\":\"Burberry Check Poplin Shirt\",\"url\":\"https://www.klarna.com/us/shopping/pl/cl10001/3201810981/Clothing/Burberry-Check-Poplin-Shirt/?utm_source=openai&ref-site=openai_plugin\",\"price\":\"$360.00\",\"attributes\":[\"Material:Cotton\",\"Target Group:Man\",\"Color:Gray,Blue,Beige\",\"Properties:Pockets\",\"Pattern:Checkered\"]}]}\u001b[0m\n",
"\n",
"\n",
"\u001b[1m> Entering new APIResponderChain chain...\u001b[0m\n",
"Prompt after formatting:\n",
"\u001b[32;1m\u001b[1;3mYou are a helpful AI assistant trained to answer user queries from API responses.\n",
"You attempted to call an API, which resulted in:\n",
"API_RESPONSE: {\"products\":[{\"name\":\"Burberry Check Poplin Shirt\",\"url\":\"https://www.klarna.com/us/shopping/pl/cl10001/3201810981/Clothing/Burberry-Check-Poplin-Shirt/?utm_source=openai&ref-site=openai_plugin\",\"price\":\"$360.00\",\"attributes\":[\"Material:Cotton\",\"Target Group:Man\",\"Color:Gray,Blue,Beige\",\"Properties:Pockets\",\"Pattern:Checkered\"]}]}\n",
"\n",
"USER_COMMENT: \"whats the most expensive shirt?\"\n",
"\n",
"\n",
"If the API_RESPONSE can answer the USER_COMMENT respond with the following markdown json block:\n",
"Response: ```json\n",
"{\"response\": \"Human-understandable synthesis of the API_RESPONSE\"}\n",
"```\n",
"\n",
"Otherwise respond with the following markdown json block:\n",
"Response Error: ```json\n",
"{\"response\": \"What you did and a concise statement of the resulting error. If it can be easily fixed, provide a suggestion.\"}\n",
"```\n",
"\n",
"You MUST respond as a markdown json code block. The person you are responding to CANNOT see the API_RESPONSE, so if there is any relevant information there you must include it in your response.\n",
"\n",
"Begin:\n",
"---\n",
"\u001b[0m\n",
"\n",
"\u001b[1m> Finished chain.\u001b[0m\n",
"\u001b[33;1m\u001b[1;3mThe most expensive shirt in the API response is the Burberry Check Poplin Shirt, which costs $360.00.\u001b[0m\n",
"\n",
"\u001b[1m> Finished chain.\u001b[0m\n"
]
}
],
"source": [
"output = chain(\"whats the most expensive shirt?\")"
]
},
{
"cell_type": "code",
"execution_count": 8,
"id": "c000295e",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"{'request_args': '{\"q\": \"shirt\", \"size\": 1, \"max_price\": null}',\n",
" 'response_text': '{\"products\":[{\"name\":\"Burberry Check Poplin Shirt\",\"url\":\"https://www.klarna.com/us/shopping/pl/cl10001/3201810981/Clothing/Burberry-Check-Poplin-Shirt/?utm_source=openai&ref-site=openai_plugin\",\"price\":\"$360.00\",\"attributes\":[\"Material:Cotton\",\"Target Group:Man\",\"Color:Gray,Blue,Beige\",\"Properties:Pockets\",\"Pattern:Checkered\"]}]}'}"
]
},
"execution_count": 8,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# View intermediate steps\n",
"output[\"intermediate_steps\"]"
]
},
{
"cell_type": "markdown",
"id": "092bdb4d",
"metadata": {},
"source": [
"## Return raw response\n",
"\n",
"We can also run this chain without synthesizing the response. This will have the effect of just returning the raw API output."
]
},
{
"cell_type": "code",
"execution_count": 10,
"id": "4dff3849",
"metadata": {},
"outputs": [],
"source": [
"chain = OpenAPIEndpointChain.from_api_operation(\n",
" operation, \n",
" llm, \n",
" requests=Requests(), \n",
" verbose=True,\n",
" return_intermediate_steps=True, # Return request and response text\n",
" raw_response=True # Return raw response\n",
")"
]
},
{
"cell_type": "code",
"execution_count": 11,
"id": "762499a9",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"\n",
"\u001b[1m> Entering new OpenAPIEndpointChain chain...\u001b[0m\n",
"\n",
"\n",
"\u001b[1m> Entering new APIRequesterChain chain...\u001b[0m\n",
"Prompt after formatting:\n",
"\u001b[32;1m\u001b[1;3mYou are a helpful AI Assistant. Please provide JSON arguments to agentFunc() based on the user's instructions.\n",
"\n",
"API_SCHEMA: ```typescript\n",
"/* API for fetching Klarna product information */\n",
"type productsUsingGET = (_: {\n",
"/* A precise query that matches one very small category or product that needs to be searched for to find the products the user is looking for. If the user explicitly stated what they want, use that as a query. The query is as specific as possible to the product name or category mentioned by the user in its singular form, and don't contain any clarifiers like latest, newest, cheapest, budget, premium, expensive or similar. The query is always taken from the latest topic, if there is a new topic a new query is started. */\n",
"\t\tq: string,\n",
@@ -187,36 +341,7 @@
"\n",
"\u001b[1m> Finished chain.\u001b[0m\n",
"\u001b[32;1m\u001b[1;3m{\"q\": \"shirt\", \"max_price\": null}\u001b[0m\n",
"\u001b[36;1m\u001b[1;3m{\"products\":[{\"name\":\"Burberry Check Poplin Shirt\",\"url\":\"https://www.klarna.com/us/shopping/pl/cl10001/3201810981/Clothing/Burberry-Check-Poplin-Shirt/?utm_source=openai&ref-site=openai_plugin\",\"price\":\"$360.00\",\"attributes\":[\"Material:Cotton\",\"Target Group:Man\",\"Color:Gray,Blue,Beige\",\"Properties:Pockets\",\"Pattern:Checkered\"]},{\"name\":\"Burberry Vintage Check Cotton Shirt - Beige\",\"url\":\"https://www.klarna.com/us/shopping/pl/cl359/3200280807/Children-s-Clothing/Burberry-Vintage-Check-Cotton-Shirt-Beige/?utm_source=openai&ref-site=openai_plugin\",\"price\":\"$196.30\",\"attributes\":[\"Material:Cotton,Elastane\",\"Color:Beige\",\"Model:Boy\",\"Pattern:Checkered\"]},{\"name\":\"Burberry Somerton Check Shirt - Camel\",\"url\":\"https://www.klarna.com/us/shopping/pl/cl10001/3201112728/Clothing/Burberry-Somerton-Check-Shirt-Camel/?utm_source=openai&ref-site=openai_plugin\",\"price\":\"$450.00\",\"attributes\":[\"Material:Elastane/Lycra/Spandex,Cotton\",\"Target Group:Man\",\"Color:Beige\"]},{\"name\":\"Calvin Klein Slim Fit Oxford Dress Shirt\",\"url\":\"https://www.klarna.com/us/shopping/pl/cl10001/3201839169/Clothing/Calvin-Klein-Slim-Fit-Oxford-Dress-Shirt/?utm_source=openai&ref-site=openai_plugin\",\"price\":\"$24.91\",\"attributes\":[\"Material:Cotton\",\"Target Group:Man\",\"Color:Gray,White,Blue,Black\",\"Pattern:Solid Color\"]},{\"name\":\"Magellan Outdoors Laguna Madre Solid Short Sleeve Fishing Shirt\",\"url\":\"https://www.klarna.com/us/shopping/pl/cl10001/3203102142/Clothing/Magellan-Outdoors-Laguna-Madre-Solid-Short-Sleeve-Fishing-Shirt/?utm_source=openai&ref-site=openai_plugin\",\"price\":\"$19.99\",\"attributes\":[\"Material:Polyester,Nylon\",\"Target Group:Man\",\"Color:Red,Pink,White,Blue,Purple,Beige,Black,Green\",\"Properties:Pockets\",\"Pattern:Solid Color\"]}]}\u001b[0m\n",
"\n",
"\n",
"\u001b[1m> Entering new APIResponderChain chain...\u001b[0m\n",
"Prompt after formatting:\n",
"\u001b[32;1m\u001b[1;3mYou are a helpful AI assistant trained to answer user queries from API responses.\n",
"You attempted to call an API, which resulted in:\n",
"API_RESPONSE: {\"products\":[{\"name\":\"Burberry Check Poplin Shirt\",\"url\":\"https://www.klarna.com/us/shopping/pl/cl10001/3201810981/Clothing/Burberry-Check-Poplin-Shirt/?utm_source=openai&ref-site=openai_plugin\",\"price\":\"$360.00\",\"attributes\":[\"Material:Cotton\",\"Target Group:Man\",\"Color:Gray,Blue,Beige\",\"Properties:Pockets\",\"Pattern:Checkered\"]},{\"name\":\"Burberry Vintage Check Cotton Shirt - Beige\",\"url\":\"https://www.klarna.com/us/shopping/pl/cl359/3200280807/Children-s-Clothing/Burberry-Vintage-Check-Cotton-Shirt-Beige/?utm_source=openai&ref-site=openai_plugin\",\"price\":\"$196.30\",\"attributes\":[\"Material:Cotton,Elastane\",\"Color:Beige\",\"Model:Boy\",\"Pattern:Checkered\"]},{\"name\":\"Burberry Somerton Check Shirt - Camel\",\"url\":\"https://www.klarna.com/us/shopping/pl/cl10001/3201112728/Clothing/Burberry-Somerton-Check-Shirt-Camel/?utm_source=openai&ref-site=openai_plugin\",\"price\":\"$450.00\",\"attributes\":[\"Material:Elastane/Lycra/Spandex,Cotton\",\"Target Group:Man\",\"Color:Beige\"]},{\"name\":\"Calvin Klein Slim Fit Oxford Dress Shirt\",\"url\":\"https://www.klarna.com/us/shopping/pl/cl10001/3201839169/Clothing/Calvin-Klein-Slim-Fit-Oxford-Dress-Shirt/?utm_source=openai&ref-site=openai_plugin\",\"price\":\"$24.91\",\"attributes\":[\"Material:Cotton\",\"Target Group:Man\",\"Color:Gray,White,Blue,Black\",\"Pattern:Solid Color\"]},{\"name\":\"Magellan Outdoors Laguna Madre Solid Short Sleeve Fishing Shirt\",\"url\":\"https://www.klarna.com/us/shopping/pl/cl10001/3203102142/Clothing/Magellan-Outdoors-Laguna-Madre-Solid-Short-Sleeve-Fishing-Shirt/?utm_source=openai&ref-site=openai_plugin\",\"price\":\"$19.99\",\"attributes\":[\"Material:Polyester,Nylon\",\"Target Group:Man\",\"Color:Red,Pink,White,Blue,Purple,Beige,Black,Green\",\"Properties:Pockets\",\"Pattern:Solid Color\"]}]}\n",
"\n",
"USER_COMMENT: \"whats the most expensive shirt?\"\n",
"\n",
"\n",
"If the API_RESPONSE can answer the USER_COMMENT respond with the following markdown json block:\n",
"Response: ```json\n",
"{\"response\": \"Concise response to USER_COMMENT based on API_RESPONSE.\"}\n",
"```\n",
"\n",
"Otherwise respond with the following markdown json block:\n",
"Response Error: ```json\n",
"{\"response\": \"What you did and a concise statement of the resulting error. If it can be easily fixed, provide a suggestion.\"}\n",
"```\n",
"\n",
"You MUST respond as a markdown json code block.\n",
"\n",
"Begin:\n",
"---\n",
"\u001b[0m\n",
"\n",
"\u001b[1m> Finished chain.\u001b[0m\n",
"\u001b[33;1m\u001b[1;3mThe most expensive shirt in this list is the 'Burberry Somerton Check Shirt - Camel' which is priced at $450.00\u001b[0m\n",
"\u001b[36;1m\u001b[1;3m{\"products\":[{\"name\":\"Burberry Check Poplin Shirt\",\"url\":\"https://www.klarna.com/us/shopping/pl/cl10001/3201810981/Clothing/Burberry-Check-Poplin-Shirt/?utm_source=openai&ref-site=openai_plugin\",\"price\":\"$360.00\",\"attributes\":[\"Material:Cotton\",\"Target Group:Man\",\"Color:Gray,Blue,Beige\",\"Properties:Pockets\",\"Pattern:Checkered\"]},{\"name\":\"Burberry Vintage Check Cotton Shirt - Beige\",\"url\":\"https://www.klarna.com/us/shopping/pl/cl359/3200280807/Children-s-Clothing/Burberry-Vintage-Check-Cotton-Shirt-Beige/?utm_source=openai&ref-site=openai_plugin\",\"price\":\"$229.02\",\"attributes\":[\"Material:Cotton,Elastane\",\"Color:Beige\",\"Model:Boy\",\"Pattern:Checkered\"]},{\"name\":\"Burberry Vintage Check Stretch Cotton Twill Shirt\",\"url\":\"https://www.klarna.com/us/shopping/pl/cl10001/3202342515/Clothing/Burberry-Vintage-Check-Stretch-Cotton-Twill-Shirt/?utm_source=openai&ref-site=openai_plugin\",\"price\":\"$309.99\",\"attributes\":[\"Material:Elastane/Lycra/Spandex,Cotton\",\"Target Group:Woman\",\"Color:Beige\",\"Properties:Stretch\",\"Pattern:Checkered\"]},{\"name\":\"Burberry Somerton Check Shirt - Camel\",\"url\":\"https://www.klarna.com/us/shopping/pl/cl10001/3201112728/Clothing/Burberry-Somerton-Check-Shirt-Camel/?utm_source=openai&ref-site=openai_plugin\",\"price\":\"$450.00\",\"attributes\":[\"Material:Elastane/Lycra/Spandex,Cotton\",\"Target Group:Man\",\"Color:Beige\"]},{\"name\":\"Magellan Outdoors Laguna Madre Solid Short Sleeve Fishing Shirt\",\"url\":\"https://www.klarna.com/us/shopping/pl/cl10001/3203102142/Clothing/Magellan-Outdoors-Laguna-Madre-Solid-Short-Sleeve-Fishing-Shirt/?utm_source=openai&ref-site=openai_plugin\",\"price\":\"$19.99\",\"attributes\":[\"Material:Polyester,Nylon\",\"Target Group:Man\",\"Color:Red,Pink,White,Blue,Purple,Beige,Black,Green\",\"Properties:Pockets\",\"Pattern:Solid Color\"]}]}\u001b[0m\n",
"\n",
"\u001b[1m> Finished chain.\u001b[0m\n"
]
@@ -228,25 +353,26 @@
},
{
"cell_type": "code",
"execution_count": 8,
"id": "c000295e",
"execution_count": 12,
"id": "4afc021a",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"['{\"q\": \"shirt\", \"max_price\": null}',\n",
" '{\"products\":[{\"name\":\"Burberry Check Poplin Shirt\",\"url\":\"https://www.klarna.com/us/shopping/pl/cl10001/3201810981/Clothing/Burberry-Check-Poplin-Shirt/?utm_source=openai&ref-site=openai_plugin\",\"price\":\"$360.00\",\"attributes\":[\"Material:Cotton\",\"Target Group:Man\",\"Color:Gray,Blue,Beige\",\"Properties:Pockets\",\"Pattern:Checkered\"]},{\"name\":\"Burberry Vintage Check Cotton Shirt - Beige\",\"url\":\"https://www.klarna.com/us/shopping/pl/cl359/3200280807/Children-s-Clothing/Burberry-Vintage-Check-Cotton-Shirt-Beige/?utm_source=openai&ref-site=openai_plugin\",\"price\":\"$196.30\",\"attributes\":[\"Material:Cotton,Elastane\",\"Color:Beige\",\"Model:Boy\",\"Pattern:Checkered\"]},{\"name\":\"Burberry Somerton Check Shirt - Camel\",\"url\":\"https://www.klarna.com/us/shopping/pl/cl10001/3201112728/Clothing/Burberry-Somerton-Check-Shirt-Camel/?utm_source=openai&ref-site=openai_plugin\",\"price\":\"$450.00\",\"attributes\":[\"Material:Elastane/Lycra/Spandex,Cotton\",\"Target Group:Man\",\"Color:Beige\"]},{\"name\":\"Calvin Klein Slim Fit Oxford Dress Shirt\",\"url\":\"https://www.klarna.com/us/shopping/pl/cl10001/3201839169/Clothing/Calvin-Klein-Slim-Fit-Oxford-Dress-Shirt/?utm_source=openai&ref-site=openai_plugin\",\"price\":\"$24.91\",\"attributes\":[\"Material:Cotton\",\"Target Group:Man\",\"Color:Gray,White,Blue,Black\",\"Pattern:Solid Color\"]},{\"name\":\"Magellan Outdoors Laguna Madre Solid Short Sleeve Fishing Shirt\",\"url\":\"https://www.klarna.com/us/shopping/pl/cl10001/3203102142/Clothing/Magellan-Outdoors-Laguna-Madre-Solid-Short-Sleeve-Fishing-Shirt/?utm_source=openai&ref-site=openai_plugin\",\"price\":\"$19.99\",\"attributes\":[\"Material:Polyester,Nylon\",\"Target Group:Man\",\"Color:Red,Pink,White,Blue,Purple,Beige,Black,Green\",\"Properties:Pockets\",\"Pattern:Solid Color\"]}]}']"
"{'instructions': 'whats the most expensive shirt?',\n",
" 'output': '{\"products\":[{\"name\":\"Burberry Check Poplin Shirt\",\"url\":\"https://www.klarna.com/us/shopping/pl/cl10001/3201810981/Clothing/Burberry-Check-Poplin-Shirt/?utm_source=openai&ref-site=openai_plugin\",\"price\":\"$360.00\",\"attributes\":[\"Material:Cotton\",\"Target Group:Man\",\"Color:Gray,Blue,Beige\",\"Properties:Pockets\",\"Pattern:Checkered\"]},{\"name\":\"Burberry Vintage Check Cotton Shirt - Beige\",\"url\":\"https://www.klarna.com/us/shopping/pl/cl359/3200280807/Children-s-Clothing/Burberry-Vintage-Check-Cotton-Shirt-Beige/?utm_source=openai&ref-site=openai_plugin\",\"price\":\"$229.02\",\"attributes\":[\"Material:Cotton,Elastane\",\"Color:Beige\",\"Model:Boy\",\"Pattern:Checkered\"]},{\"name\":\"Burberry Vintage Check Stretch Cotton Twill Shirt\",\"url\":\"https://www.klarna.com/us/shopping/pl/cl10001/3202342515/Clothing/Burberry-Vintage-Check-Stretch-Cotton-Twill-Shirt/?utm_source=openai&ref-site=openai_plugin\",\"price\":\"$309.99\",\"attributes\":[\"Material:Elastane/Lycra/Spandex,Cotton\",\"Target Group:Woman\",\"Color:Beige\",\"Properties:Stretch\",\"Pattern:Checkered\"]},{\"name\":\"Burberry Somerton Check Shirt - Camel\",\"url\":\"https://www.klarna.com/us/shopping/pl/cl10001/3201112728/Clothing/Burberry-Somerton-Check-Shirt-Camel/?utm_source=openai&ref-site=openai_plugin\",\"price\":\"$450.00\",\"attributes\":[\"Material:Elastane/Lycra/Spandex,Cotton\",\"Target Group:Man\",\"Color:Beige\"]},{\"name\":\"Magellan Outdoors Laguna Madre Solid Short Sleeve Fishing Shirt\",\"url\":\"https://www.klarna.com/us/shopping/pl/cl10001/3203102142/Clothing/Magellan-Outdoors-Laguna-Madre-Solid-Short-Sleeve-Fishing-Shirt/?utm_source=openai&ref-site=openai_plugin\",\"price\":\"$19.99\",\"attributes\":[\"Material:Polyester,Nylon\",\"Target Group:Man\",\"Color:Red,Pink,White,Blue,Purple,Beige,Black,Green\",\"Properties:Pockets\",\"Pattern:Solid Color\"]}]}',\n",
" 'intermediate_steps': {'request_args': '{\"q\": \"shirt\", \"max_price\": null}',\n",
" 'response_text': '{\"products\":[{\"name\":\"Burberry Check Poplin Shirt\",\"url\":\"https://www.klarna.com/us/shopping/pl/cl10001/3201810981/Clothing/Burberry-Check-Poplin-Shirt/?utm_source=openai&ref-site=openai_plugin\",\"price\":\"$360.00\",\"attributes\":[\"Material:Cotton\",\"Target Group:Man\",\"Color:Gray,Blue,Beige\",\"Properties:Pockets\",\"Pattern:Checkered\"]},{\"name\":\"Burberry Vintage Check Cotton Shirt - Beige\",\"url\":\"https://www.klarna.com/us/shopping/pl/cl359/3200280807/Children-s-Clothing/Burberry-Vintage-Check-Cotton-Shirt-Beige/?utm_source=openai&ref-site=openai_plugin\",\"price\":\"$229.02\",\"attributes\":[\"Material:Cotton,Elastane\",\"Color:Beige\",\"Model:Boy\",\"Pattern:Checkered\"]},{\"name\":\"Burberry Vintage Check Stretch Cotton Twill Shirt\",\"url\":\"https://www.klarna.com/us/shopping/pl/cl10001/3202342515/Clothing/Burberry-Vintage-Check-Stretch-Cotton-Twill-Shirt/?utm_source=openai&ref-site=openai_plugin\",\"price\":\"$309.99\",\"attributes\":[\"Material:Elastane/Lycra/Spandex,Cotton\",\"Target Group:Woman\",\"Color:Beige\",\"Properties:Stretch\",\"Pattern:Checkered\"]},{\"name\":\"Burberry Somerton Check Shirt - Camel\",\"url\":\"https://www.klarna.com/us/shopping/pl/cl10001/3201112728/Clothing/Burberry-Somerton-Check-Shirt-Camel/?utm_source=openai&ref-site=openai_plugin\",\"price\":\"$450.00\",\"attributes\":[\"Material:Elastane/Lycra/Spandex,Cotton\",\"Target Group:Man\",\"Color:Beige\"]},{\"name\":\"Magellan Outdoors Laguna Madre Solid Short Sleeve Fishing Shirt\",\"url\":\"https://www.klarna.com/us/shopping/pl/cl10001/3203102142/Clothing/Magellan-Outdoors-Laguna-Madre-Solid-Short-Sleeve-Fishing-Shirt/?utm_source=openai&ref-site=openai_plugin\",\"price\":\"$19.99\",\"attributes\":[\"Material:Polyester,Nylon\",\"Target Group:Man\",\"Color:Red,Pink,White,Blue,Purple,Beige,Black,Green\",\"Properties:Pockets\",\"Pattern:Solid Color\"]}]}'}}"
]
},
"execution_count": 8,
"execution_count": 12,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# View intermediate steps\n",
"output[\"intermediate_steps\"]"
"output"
]
},
{
@@ -448,7 +574,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.2"
"version": "3.9.1"
}
},
"nbformat": 4,

View File

@@ -9,9 +9,9 @@
}
},
"source": [
"# SQLite example\n",
"# SQL Chain example\n",
"\n",
"This example showcases hooking up an LLM to answer questions over a database."
"This example demonstrates the use of the `SQLDatabaseChain` for answering questions over a database."
]
},
{
@@ -23,8 +23,10 @@
}
},
"source": [
"This uses the example Chinook database.\n",
"To set it up follow the instructions on https://database.guide/2-sample-databases-sqlite/, placing the `.db` file in a notebooks folder at the root of this repository."
"Under the hood, LangChain uses SQLAlchemy to connect to SQL databases. The `SQLDatabaseChain` can therefore be used with any SQL dialect supported by SQLAlchemy, such as MS SQL, MySQL, MariaDB, PostgreSQL, Oracle SQL, and SQLite. Please refer to the SQLAlchemy documentation for more information about requirements for connecting to your database. For example, a connection to MySQL requires an appropriate connector such as PyMySQL. A URI for a MySQL connection might look like: `mysql+pymysql://user:pass@some_mysql_db_address/db_name`\n",
"\n",
"This demonstration uses SQLite and the example Chinook database.\n",
"To set it up, follow the instructions on https://database.guide/2-sample-databases-sqlite/, placing the `.db` file in a notebooks folder at the root of this repository."
]
},
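{
"cell_type": "markdown",
"id": "b2c3d4e5-f6a7-4b8c-9d0e-1f2a3b4c5d6e",
"metadata": {},
"source": [
"As a minimal sketch of constructing the connection (the SQLite path assumes the `notebooks` folder location described above; the MySQL URI is the placeholder from the text, not a real database):\n",
"\n",
"```python\n",
"from langchain import SQLDatabase\n",
"\n",
"# SQLite, as used in this notebook\n",
"db = SQLDatabase.from_uri(\"sqlite:///../../../../notebooks/Chinook.db\")\n",
"\n",
"# Any other SQLAlchemy-supported dialect works the same way, e.g. MySQL via PyMySQL:\n",
"# db = SQLDatabase.from_uri(\"mysql+pymysql://user:pass@some_mysql_db_address/db_name\")\n",
"```"
]
},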
{
@@ -679,7 +681,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.1"
"version": "3.10.10"
}
},
"nbformat": 4,

View File

@@ -5,14 +5,14 @@
"id": "134a0785",
"metadata": {},
"source": [
"# Chat Index\n",
"# Chat Over Documents with Chat History\n",
"\n",
"This notebook goes over how to set up a chain to chat with an index. The only difference between this chain and the [RetrievalQAChain](./vector_db_qa.ipynb) is that this allows for passing in of a chat history which can be used to allow for follow up questions."
"This notebook goes over how to set up a chain to chat over documents with chat history using a `ConversationalRetrievalChain`. The only difference between this chain and the [RetrievalQAChain](./vector_db_qa.ipynb) is that this allows for passing in of a chat history which can be used to allow for follow up questions."
]
},
{
"cell_type": "code",
"execution_count": 1,
"execution_count": 3,
"id": "70c4e529",
"metadata": {
"tags": []
@@ -36,7 +36,7 @@
},
{
"cell_type": "code",
"execution_count": 2,
"execution_count": 4,
"id": "01c46e92",
"metadata": {
"tags": []
@@ -58,7 +58,7 @@
},
{
"cell_type": "code",
"execution_count": 3,
"execution_count": 5,
"id": "433363a5",
"metadata": {
"tags": []
@@ -81,7 +81,7 @@
},
{
"cell_type": "code",
"execution_count": 4,
"execution_count": 6,
"id": "a8930cf7",
"metadata": {
"tags": []
@@ -109,12 +109,12 @@
"id": "3c96b118",
"metadata": {},
"source": [
"We now initialize the ConversationalRetrievalChain"
"We now initialize the `ConversationalRetrievalChain`"
]
},
{
"cell_type": "code",
"execution_count": 5,
"execution_count": 7,
"id": "7b4110f3",
"metadata": {
"tags": []
@@ -134,7 +134,7 @@
},
{
"cell_type": "code",
"execution_count": 6,
"execution_count": 8,
"id": "7fe3e730",
"metadata": {
"tags": []
@@ -148,7 +148,7 @@
},
{
"cell_type": "code",
"execution_count": 7,
"execution_count": 9,
"id": "bfff9cc8",
"metadata": {
"tags": []
@@ -160,7 +160,7 @@
"\" The president said that Ketanji Brown Jackson is one of the nation's top legal minds, a former top litigator in private practice, a former federal public defender, and from a family of public school educators and police officers. He also said that she is a consensus builder and has received a broad range of support from the Fraternal Order of Police to former judges appointed by Democrats and Republicans.\""
]
},
"execution_count": 7,
"execution_count": 9,
"metadata": {},
"output_type": "execute_result"
}
@@ -179,7 +179,7 @@
},
{
"cell_type": "code",
"execution_count": 8,
"execution_count": 10,
"id": "00b4cf00",
"metadata": {
"tags": []
@@ -193,7 +193,7 @@
},
{
"cell_type": "code",
"execution_count": 9,
"execution_count": 11,
"id": "f01828d1",
"metadata": {
"tags": []
@@ -202,10 +202,10 @@
{
"data": {
"text/plain": [
"' Justice Stephen Breyer'"
"' Ketanji Brown Jackson succeeded Justice Stephen Breyer on the United States Supreme Court.'"
]
},
"execution_count": 9,
"execution_count": 11,
"metadata": {},
"output_type": "execute_result"
}
@@ -225,9 +225,11 @@
},
{
"cell_type": "code",
"execution_count": 11,
"execution_count": 12,
"id": "562769c6",
"metadata": {},
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"qa = ConversationalRetrievalChain.from_llm(OpenAI(temperature=0), vectorstore.as_retriever(), return_source_documents=True)"
@@ -235,9 +237,11 @@
},
{
"cell_type": "code",
"execution_count": 12,
"execution_count": 13,
"id": "ea478300",
"metadata": {},
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"chat_history = []\n",
@@ -247,17 +251,19 @@
},
{
"cell_type": "code",
"execution_count": 13,
"execution_count": 14,
"id": "4cb75b4e",
"metadata": {},
"metadata": {
"tags": []
},
"outputs": [
{
"data": {
"text/plain": [
"Document(page_content='Tonight. I call on the Senate to: Pass the Freedom to Vote Act. Pass the John Lewis Voting Rights Act. And while youre at it, pass the Disclose Act so Americans can know who is funding our elections. \\n\\nTonight, Id like to honor someone who has dedicated his life to serve this country: Justice Stephen Breyer—an Army veteran, Constitutional scholar, and retiring Justice of the United States Supreme Court. Justice Breyer, thank you for your service. \\n\\nOne of the most serious constitutional responsibilities a President has is nominating someone to serve on the United States Supreme Court. \\n\\nAnd I did that 4 days ago, when I nominated Circuit Court of Appeals Judge Ketanji Brown Jackson. One of our nations top legal minds, who will continue Justice Breyers legacy of excellence.', lookup_str='', metadata={'source': '../../state_of_the_union.txt'}, lookup_index=0)"
"Document(page_content='Tonight. I call on the Senate to: Pass the Freedom to Vote Act. Pass the John Lewis Voting Rights Act. And while youre at it, pass the Disclose Act so Americans can know who is funding our elections. \\n\\nTonight, Id like to honor someone who has dedicated his life to serve this country: Justice Stephen Breyer—an Army veteran, Constitutional scholar, and retiring Justice of the United States Supreme Court. Justice Breyer, thank you for your service. \\n\\nOne of the most serious constitutional responsibilities a President has is nominating someone to serve on the United States Supreme Court. \\n\\nAnd I did that 4 days ago, when I nominated Circuit Court of Appeals Judge Ketanji Brown Jackson. One of our nations top legal minds, who will continue Justice Breyers legacy of excellence.', metadata={'source': '../../state_of_the_union.txt'})"
]
},
"execution_count": 13,
"execution_count": 14,
"metadata": {},
"output_type": "execute_result"
}
@@ -277,9 +283,11 @@
},
{
"cell_type": "code",
"execution_count": 14,
"execution_count": 15,
"id": "5ed8d612",
"metadata": {},
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"vectordbkwargs = {\"search_distance\": 0.9}"
@@ -287,9 +295,11 @@
},
{
"cell_type": "code",
"execution_count": 15,
"execution_count": 16,
"id": "6a7b3459",
"metadata": {},
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"qa = ConversationalRetrievalChain.from_llm(OpenAI(temperature=0), vectorstore.as_retriever(), return_source_documents=True)\n",
@@ -309,21 +319,25 @@
},
{
"cell_type": "code",
"execution_count": 16,
"execution_count": 18,
"id": "e53a9d66",
"metadata": {},
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"from langchain.chains import LLMChain\n",
"from langchain.chains.question_answering import load_qa_chain\n",
"from langchain.chains.chat_index.prompts import CONDENSE_QUESTION_PROMPT"
"from langchain.chains.conversational_retrieval.prompts import CONDENSE_QUESTION_PROMPT"
]
},
{
"cell_type": "code",
"execution_count": 19,
"id": "bf205e35",
"metadata": {},
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"llm = OpenAI(temperature=0)\n",
@@ -341,7 +355,9 @@
"cell_type": "code",
"execution_count": 20,
"id": "78155887",
"metadata": {},
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"chat_history = []\n",
@@ -353,7 +369,9 @@
"cell_type": "code",
"execution_count": 21,
"id": "e54b5fa2",
"metadata": {},
"metadata": {
"tags": []
},
"outputs": [
{
"data": {
@@ -384,7 +402,9 @@
"cell_type": "code",
"execution_count": 22,
"id": "d1058fd2",
"metadata": {},
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"from langchain.chains.qa_with_sources import load_qa_with_sources_chain"
@@ -394,7 +414,9 @@
"cell_type": "code",
"execution_count": 23,
"id": "a6594482",
"metadata": {},
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"llm = OpenAI(temperature=0)\n",
@@ -412,7 +434,9 @@
"cell_type": "code",
"execution_count": 24,
"id": "e2badd21",
"metadata": {},
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"chat_history = []\n",
@@ -424,7 +448,9 @@
"cell_type": "code",
"execution_count": 25,
"id": "edb31fe5",
"metadata": {},
"metadata": {
"tags": []
},
"outputs": [
{
"data": {
@@ -453,7 +479,7 @@
},
{
"cell_type": "code",
"execution_count": 26,
"execution_count": 27,
"id": "2efacec3-2690-4b05-8de3-a32fd2ac3911",
"metadata": {
"tags": []
@@ -463,7 +489,7 @@
"from langchain.chains.llm import LLMChain\n",
"from langchain.callbacks.base import CallbackManager\n",
"from langchain.callbacks.streaming_stdout import StreamingStdOutCallbackHandler\n",
"from langchain.chains.chat_index.prompts import CONDENSE_QUESTION_PROMPT, QA_PROMPT\n",
"from langchain.chains.conversational_retrieval.prompts import CONDENSE_QUESTION_PROMPT, QA_PROMPT\n",
"from langchain.chains.question_answering import load_qa_chain\n",
"\n",
"# Construct a ConversationalRetrievalChain with a streaming llm for combine docs\n",
@@ -480,7 +506,7 @@
},
{
"cell_type": "code",
"execution_count": 27,
"execution_count": 28,
"id": "fd6d43f4-7428-44a4-81bc-26fe88a98762",
"metadata": {
"tags": []
@@ -502,7 +528,7 @@
},
{
"cell_type": "code",
"execution_count": 28,
"execution_count": 29,
"id": "5ab38978-f3e8-4fa7-808c-c79dec48379a",
"metadata": {
"tags": []
@@ -512,7 +538,7 @@
"name": "stdout",
"output_type": "stream",
"text": [
" Justice Stephen Breyer"
" Ketanji Brown Jackson succeeded Justice Stephen Breyer on the United States Supreme Court."
]
}
],
@@ -533,9 +559,11 @@
},
{
"cell_type": "code",
"execution_count": 29,
"execution_count": 31,
"id": "a7ba9d8c",
"metadata": {},
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"def get_chat_history(inputs) -> str:\n",
@@ -543,14 +571,16 @@
" for human, ai in inputs:\n",
" res.append(f\"Human:{human}\\nAI:{ai}\")\n",
" return \"\\n\".join(res)\n",
"qa = ConversationalRetrievalChain.from_llm(OpenAI(temperature=0), vectorstore, get_chat_history=get_chat_history)"
"qa = ConversationalRetrievalChain.from_llm(OpenAI(temperature=0), vectorstore.as_retriever(), get_chat_history=get_chat_history)"
]
},
{
"cell_type": "code",
"execution_count": 30,
"execution_count": 32,
"id": "a3e33c0d",
"metadata": {},
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"chat_history = []\n",
@@ -560,9 +590,11 @@
},
{
"cell_type": "code",
"execution_count": 31,
"execution_count": 33,
"id": "936dc62f",
"metadata": {},
"metadata": {
"tags": []
},
"outputs": [
{
"data": {
@@ -570,7 +602,7 @@
"\" The president said that Ketanji Brown Jackson is one of the nation's top legal minds, a former top litigator in private practice, a former federal public defender, and from a family of public school educators and police officers. He also said that she is a consensus builder and has received a broad range of support from the Fraternal Order of Police to former judges appointed by Democrats and Republicans.\""
]
},
"execution_count": 31,
"execution_count": 33,
"metadata": {},
"output_type": "execute_result"
}
@@ -604,7 +636,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.1"
"version": "3.10.9"
}
},
"nbformat": 4,

View File

@@ -23,7 +23,9 @@
"cell_type": "code",
"execution_count": 1,
"id": "17fcbc0f",
"metadata": {},
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"from langchain.embeddings.openai import OpenAIEmbeddings\n",
@@ -38,17 +40,26 @@
"cell_type": "code",
"execution_count": 2,
"id": "ef9305cc",
"metadata": {},
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"index_creator = VectorstoreIndexCreator()"
"with open(\"../../state_of_the_union.txt\") as f:\n",
" state_of_the_union = f.read()\n",
"text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)\n",
"texts = text_splitter.split_text(state_of_the_union)\n",
"\n",
"embeddings = OpenAIEmbeddings()"
]
},
{
"cell_type": "code",
"execution_count": 3,
"id": "291f0117",
"metadata": {},
"metadata": {
"tags": []
},
"outputs": [
{
"name": "stdout",
@@ -60,27 +71,29 @@
}
],
"source": [
"from langchain.document_loaders import TextLoader\n",
"loader = TextLoader(\"../../state_of_the_union.txt\")\n",
"docsearch = index_creator.from_loaders([loader])"
"docsearch = Chroma.from_texts(texts, embeddings, metadatas=[{\"source\": str(i)} for i in range(len(texts))]).as_retriever()"
]
},
{
"cell_type": "code",
"execution_count": 4,
"id": "d1eaf6e6",
"metadata": {},
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"query = \"What did the president say about Justice Breyer\"\n",
"docs = docsearch.similarity_search(query)"
"docs = docsearch.get_relevant_documents(query)"
]
},
{
"cell_type": "code",
"execution_count": 5,
"id": "a16e3453",
"metadata": {},
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"from langchain.chains.question_answering import load_qa_chain\n",
@@ -98,17 +111,19 @@
},
{
"cell_type": "code",
"execution_count": 19,
"execution_count": 6,
"id": "fd9e6190",
"metadata": {},
"metadata": {
"tags": []
},
"outputs": [
{
"data": {
"text/plain": [
"' The president said that he was honoring Justice Breyer for his service to the country and that he was a Constitutional scholar, Army veteran, and retiring Justice of the United States Supreme Court.'"
"' The president said that Justice Breyer has dedicated his life to serve the country and thanked him for his service.'"
]
},
"execution_count": 19,
"execution_count": 6,
"metadata": {},
"output_type": "execute_result"
}
@@ -139,9 +154,11 @@
},
{
"cell_type": "code",
"execution_count": 6,
"execution_count": 7,
"id": "180fd4c1",
"metadata": {},
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"chain = load_qa_chain(OpenAI(temperature=0), chain_type=\"stuff\")"
@@ -149,17 +166,19 @@
},
{
"cell_type": "code",
"execution_count": 7,
"execution_count": 8,
"id": "77fdf1aa",
"metadata": {},
"metadata": {
"tags": []
},
"outputs": [
{
"data": {
"text/plain": [
"{'output_text': ' The president said that he was honoring Justice Breyer for his service to the country and that he was a Constitutional scholar, Army veteran, and retiring Justice of the United States Supreme Court.'}"
"{'output_text': ' The president said that Justice Breyer has dedicated his life to serve the country and thanked him for his service.'}"
]
},
"execution_count": 7,
"execution_count": 8,
"metadata": {},
"output_type": "execute_result"
}
@@ -181,17 +200,19 @@
},
{
"cell_type": "code",
"execution_count": 7,
"execution_count": 9,
"id": "5558c9e0",
"metadata": {},
"metadata": {
"tags": []
},
"outputs": [
{
"data": {
"text/plain": [
"{'output_text': ' Il presidente ha detto che Justice Breyer ha dedicato la sua vita a servire questo paese e ha onorato la sua carriera come giudice della Corte Suprema degli Stati Uniti.'}"
"{'output_text': ' Il presidente ha detto che Justice Breyer ha dedicato la sua vita a servire questo paese e ha ricevuto una vasta gamma di supporto.'}"
]
},
"execution_count": 7,
"execution_count": 9,
"metadata": {},
"output_type": "execute_result"
}
@@ -222,9 +243,11 @@
},
{
"cell_type": "code",
"execution_count": 8,
"execution_count": 10,
"id": "b0060f51",
"metadata": {},
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"chain = load_qa_chain(OpenAI(temperature=0), chain_type=\"map_reduce\")"
@@ -232,17 +255,19 @@
},
{
"cell_type": "code",
"execution_count": 9,
"execution_count": 11,
"id": "fbdb9137",
"metadata": {},
"metadata": {
"tags": []
},
"outputs": [
{
"data": {
"text/plain": [
"{'output_text': ' The president said, \"Justice Breyer, thank you for your service.\"'}"
"{'output_text': ' The president said that Justice Breyer is an Army veteran, Constitutional scholar, and retiring Justice of the United States Supreme Court, and thanked him for his service.'}"
]
},
"execution_count": 9,
"execution_count": 11,
"metadata": {},
"output_type": "execute_result"
}
@@ -264,9 +289,11 @@
},
{
"cell_type": "code",
"execution_count": 10,
"execution_count": 12,
"id": "452c8680",
"metadata": {},
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"chain = load_qa_chain(OpenAI(temperature=0), chain_type=\"map_reduce\", return_map_steps=True)"
@@ -274,21 +301,23 @@
},
{
"cell_type": "code",
"execution_count": 11,
"execution_count": 13,
"id": "90b47a75",
"metadata": {},
"metadata": {
"tags": []
},
"outputs": [
{
"data": {
"text/plain": [
"{'intermediate_steps': [' \"Tonight, Id like to honor someone who has dedicated his life to serve this country: Justice Stephen Breyer—an Army veteran, Constitutional scholar, and retiring Justice of the United States Supreme Court. Justice Breyer, thank you for your service.\"',\n",
" ' None',\n",
" ' A former top litigator in private practice. A former federal public defender. And from a family of public school educators and police officers. A consensus builder. Since shes been nominated, shes received a broad range of support—from the Fraternal Order of Police to former judges appointed by Democrats and Republicans.',\n",
" ' None',\n",
" ' None'],\n",
" 'output_text': ' The president said, \"Justice Breyer, thank you for your service.\"'}"
" 'output_text': ' The president said that Justice Breyer is an Army veteran, Constitutional scholar, and retiring Justice of the United States Supreme Court, and thanked him for his service.'}"
]
},
"execution_count": 11,
"execution_count": 13,
"metadata": {},
"output_type": "execute_result"
}
@@ -309,21 +338,23 @@
},
{
"cell_type": "code",
"execution_count": 13,
"execution_count": 14,
"id": "af03a578",
"metadata": {},
"metadata": {
"tags": []
},
"outputs": [
{
"data": {
"text/plain": [
"{'intermediate_steps': [\"\\nStasera vorrei onorare qualcuno che ha dedicato la sua vita a servire questo paese: il giustizia Stephen Breyer - un veterano dell'esercito, uno studioso costituzionale e un giustizia in uscita della Corte Suprema degli Stati Uniti. Giustizia Breyer, grazie per il tuo servizio.\",\n",
" '\\nNessun testo pertinente.',\n",
" \"\\nCome ho detto l'anno scorso, soprattutto ai nostri giovani americani transgender, avrò sempre il tuo sostegno come tuo Presidente, in modo che tu possa essere te stesso e raggiungere il tuo potenziale donato da Dio.\",\n",
" '\\nNella mia amministrazione, i guardiani sono stati accolti di nuovo. Stiamo andando dietro ai criminali che hanno rubato miliardi di dollari di aiuti di emergenza destinati alle piccole imprese e a milioni di americani. E stasera, annuncio che il Dipartimento di Giustizia nominerà un procuratore capo per la frode pandemica.'],\n",
" 'output_text': ' Non conosco la risposta alla tua domanda su cosa abbia detto il Presidente riguardo al Giustizia Breyer.'}"
" ' Non ha detto nulla riguardo a Justice Breyer.',\n",
" \" Non c'è testo pertinente.\"],\n",
" 'output_text': ' Non ha detto nulla riguardo a Justice Breyer.'}"
]
},
"execution_count": 13,
"execution_count": 14,
"metadata": {},
"output_type": "execute_result"
}
@@ -379,9 +410,11 @@
},
{
"cell_type": "code",
"execution_count": 12,
"execution_count": 15,
"id": "fb167057",
"metadata": {},
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"chain = load_qa_chain(OpenAI(temperature=0), chain_type=\"refine\")"
@@ -389,17 +422,19 @@
},
{
"cell_type": "code",
"execution_count": 13,
"execution_count": 16,
"id": "d8b5286e",
"metadata": {},
"metadata": {
"tags": []
},
"outputs": [
{
"data": {
"text/plain": [
"{'output_text': '\\n\\nThe president said that he wanted to honor Justice Breyer for his dedication to serving the country, his legacy of excellence, and his commitment to advancing liberty and justice, as well as for his commitment to protecting the rights of LGBTQ+ Americans and his support for the bipartisan Equality Act. He also mentioned his plan to lower costs to give families a fair shot, lower the deficit, and go after criminals who stole pandemic relief funds. He also announced that the Justice Department will name a chief prosecutor for pandemic fraud.'}"
"{'output_text': '\\n\\nThe president said that he wanted to honor Justice Breyer for his dedication to serving the country, his legacy of excellence, and his commitment to advancing liberty and justice, as well as for his support of the Equality Act and his commitment to protecting the rights of LGBTQ+ Americans. He also praised Justice Breyer for his role in helping to pass the Bipartisan Infrastructure Law, which he said would be the most sweeping investment to rebuild America in history and would help the country compete for the jobs of the 21st Century.'}"
]
},
"execution_count": 13,
"execution_count": 16,
"metadata": {},
"output_type": "execute_result"
}
@@ -421,9 +456,11 @@
},
{
"cell_type": "code",
"execution_count": 14,
"execution_count": 17,
"id": "a5c64200",
"metadata": {},
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"chain = load_qa_chain(OpenAI(temperature=0), chain_type=\"refine\", return_refine_steps=True)"
@@ -431,21 +468,23 @@
},
{
"cell_type": "code",
"execution_count": 15,
"execution_count": 18,
"id": "817546ac",
"metadata": {},
"metadata": {
"tags": []
},
"outputs": [
{
"data": {
"text/plain": [
"{'intermediate_steps': ['\\nThe president said that he wanted to honor Justice Breyer for his dedication to serving the country and his legacy of excellence.',\n",
" '\\n\\nThe president said that he wanted to honor Justice Breyer for his dedication to serving the country, his legacy of excellence, and his commitment to advancing liberty and justice.',\n",
" '\\n\\nThe president said that he wanted to honor Justice Breyer for his dedication to serving the country, his legacy of excellence, and his commitment to advancing liberty and justice, as well as for his commitment to protecting the rights of LGBTQ+ Americans and his support for the bipartisan Equality Act.',\n",
" '\\n\\nThe president said that he wanted to honor Justice Breyer for his dedication to serving the country, his legacy of excellence, and his commitment to advancing liberty and justice, as well as for his commitment to protecting the rights of LGBTQ+ Americans and his support for the bipartisan Equality Act. He also mentioned his plan to lower costs to give families a fair shot, lower the deficit, and go after criminals who stole pandemic relief funds. He also announced that the Justice Department will name a chief prosecutor for pandemic fraud.'],\n",
" 'output_text': '\\n\\nThe president said that he wanted to honor Justice Breyer for his dedication to serving the country, his legacy of excellence, and his commitment to advancing liberty and justice, as well as for his commitment to protecting the rights of LGBTQ+ Americans and his support for the bipartisan Equality Act. He also mentioned his plan to lower costs to give families a fair shot, lower the deficit, and go after criminals who stole pandemic relief funds. He also announced that the Justice Department will name a chief prosecutor for pandemic fraud.'}"
" '\\nThe president said that he wanted to honor Justice Breyer for his dedication to serving the country, his legacy of excellence, and his commitment to advancing liberty and justice.',\n",
" '\\n\\nThe president said that he wanted to honor Justice Breyer for his dedication to serving the country, his legacy of excellence, and his commitment to advancing liberty and justice, as well as for his support of the Equality Act and his commitment to protecting the rights of LGBTQ+ Americans.',\n",
" '\\n\\nThe president said that he wanted to honor Justice Breyer for his dedication to serving the country, his legacy of excellence, and his commitment to advancing liberty and justice, as well as for his support of the Equality Act and his commitment to protecting the rights of LGBTQ+ Americans. He also praised Justice Breyer for his role in helping to pass the Bipartisan Infrastructure Law, which is the most sweeping investment to rebuild America in history.'],\n",
" 'output_text': '\\n\\nThe president said that he wanted to honor Justice Breyer for his dedication to serving the country, his legacy of excellence, and his commitment to advancing liberty and justice, as well as for his support of the Equality Act and his commitment to protecting the rights of LGBTQ+ Americans. He also praised Justice Breyer for his role in helping to pass the Bipartisan Infrastructure Law, which is the most sweeping investment to rebuild America in history.'}"
]
},
"execution_count": 15,
"execution_count": 18,
"metadata": {},
"output_type": "execute_result"
}
@@ -466,21 +505,23 @@
},
{
"cell_type": "code",
"execution_count": 14,
"execution_count": 19,
"id": "6664bda7",
"metadata": {},
"metadata": {
"tags": []
},
"outputs": [
{
"data": {
"text/plain": [
"{'intermediate_steps': ['\\nIl presidente ha detto che Justice Breyer ha dedicato la sua vita al servizio di questo paese e ha onorato la sua carriera. Ha anche detto che la sua nomina di Circuit Court of Appeals Judge Ketanji Brown Jackson continuerà il suo eccezionale lascito.',\n",
" \"\\nIl presidente ha detto che Justice Breyer ha dedicato la sua vita al servizio di questo paese e ha onorato la sua carriera. Ha anche detto che la sua nomina di Circuit Court of Appeals Judge Ketanji Brown Jackson continuerà il suo eccezionale lascito. Ha sottolineato che la sua esperienza come avvocato di alto livello in pratica privata, come ex difensore federale pubblico e come membro di una famiglia di educatori e agenti di polizia, la rende una costruttrice di consenso. Ha anche sottolineato che, dalla sua nomina, ha ricevuto un ampio sostegno, dall'Ordine Fraterno della Polizia a ex giudici nominati da democratici e repubblicani.\",\n",
" \"\\n\\nIl presidente ha detto che Justice Breyer ha dedicato la sua vita al servizio di questo paese e ha onorato la sua carriera. Ha anche detto che la sua nomina di Circuit Court of Appeals Judge Ketanji Brown Jackson continuerà il suo eccezionale lascito. Ha sottolineato che la sua esperienza come avvocato di alto livello in pratica privata, come ex difensore federale pubblico e come membro di una famiglia di educatori e agenti di polizia, la rende una costruttrice di consenso. Ha anche sottolineato che, dalla sua nomina, ha ricevuto un ampio sostegno, dall'Ordine Fraterno della Polizia a ex giudici nominati da democratici e repubblicani. Ha inoltre sottolineato che la nomina di Justice Breyer è un passo importante verso l'uguaglianza per tutti gli americani, in partic\",\n",
" \"\\n\\nIl presidente ha detto che Justice Breyer ha dedicato la sua vita al servizio di questo paese e ha onorato la sua carriera. Ha anche detto che la sua nomina di Circuit Court of Appeals Judge Ketanji Brown Jackson continuerà il suo eccezionale lascito. Ha sottolineato che la sua esperienza come avvocato di alto livello in pratica privata, come ex difensore federale pubblico e come membro di una famiglia di educatori e agenti di polizia, la rende una costruttrice di consenso. Ha anche sottolineato che, dalla sua nomina, ha ricevuto un ampio sostegno, dall'Ordine Fraterno della Polizia a ex giudici nominati da democratici e repubblicani. Ha inoltre sottolineato che la nomina di Justice Breyer è un passo importante verso l'uguaglianza per tutti gli americani, in partic\"],\n",
" 'output_text': \"\\n\\nIl presidente ha detto che Justice Breyer ha dedicato la sua vita al servizio di questo paese e ha onorato la sua carriera. Ha anche detto che la sua nomina di Circuit Court of Appeals Judge Ketanji Brown Jackson continuerà il suo eccezionale lascito. Ha sottolineato che la sua esperienza come avvocato di alto livello in pratica privata, come ex difensore federale pubblico e come membro di una famiglia di educatori e agenti di polizia, la rende una costruttrice di consenso. Ha anche sottolineato che, dalla sua nomina, ha ricevuto un ampio sostegno, dall'Ordine Fraterno della Polizia a ex giudici nominati da democratici e repubblicani. Ha inoltre sottolineato che la nomina di Justice Breyer è un passo importante verso l'uguaglianza per tutti gli americani, in partic\"}"
"{'intermediate_steps': ['\\nIl presidente ha detto che Justice Breyer ha dedicato la sua vita al servizio di questo paese e ha reso omaggio al suo servizio.',\n",
" \"\\nIl presidente ha detto che Justice Breyer ha dedicato la sua vita al servizio di questo paese, ha reso omaggio al suo servizio e ha sostenuto la nomina di una top litigatrice in pratica privata, un ex difensore pubblico federale e una famiglia di insegnanti e agenti di polizia delle scuole pubbliche. Ha anche sottolineato l'importanza di avanzare la libertà e la giustizia attraverso la sicurezza delle frontiere e la risoluzione del sistema di immigrazione.\",\n",
" \"\\nIl presidente ha detto che Justice Breyer ha dedicato la sua vita al servizio di questo paese, ha reso omaggio al suo servizio e ha sostenuto la nomina di una top litigatrice in pratica privata, un ex difensore pubblico federale e una famiglia di insegnanti e agenti di polizia delle scuole pubbliche. Ha anche sottolineato l'importanza di avanzare la libertà e la giustizia attraverso la sicurezza delle frontiere, la risoluzione del sistema di immigrazione, la protezione degli americani LGBTQ+ e l'approvazione dell'Equality Act. Ha inoltre sottolineato l'importanza di lavorare insieme per sconfiggere l'epidemia di oppiacei.\",\n",
" \"\\n\\nIl presidente ha detto che Justice Breyer ha dedicato la sua vita al servizio di questo paese, ha reso omaggio al suo servizio e ha sostenuto la nomina di una top litigatrice in pratica privata, un ex difensore pubblico federale e una famiglia di insegnanti e agenti di polizia delle scuole pubbliche. Ha anche sottolineato l'importanza di avanzare la libertà e la giustizia attraverso la sicurezza delle frontiere, la risoluzione del sistema di immigrazione, la protezione degli americani LGBTQ+ e l'approvazione dell'Equality Act. Ha inoltre sottolineato l'importanza di lavorare insieme per sconfiggere l'epidemia di oppiacei e per investire in America, educare gli americani, far crescere la forza lavoro e costruire l'economia dal\"],\n",
" 'output_text': \"\\n\\nIl presidente ha detto che Justice Breyer ha dedicato la sua vita al servizio di questo paese, ha reso omaggio al suo servizio e ha sostenuto la nomina di una top litigatrice in pratica privata, un ex difensore pubblico federale e una famiglia di insegnanti e agenti di polizia delle scuole pubbliche. Ha anche sottolineato l'importanza di avanzare la libertà e la giustizia attraverso la sicurezza delle frontiere, la risoluzione del sistema di immigrazione, la protezione degli americani LGBTQ+ e l'approvazione dell'Equality Act. Ha inoltre sottolineato l'importanza di lavorare insieme per sconfiggere l'epidemia di oppiacei e per investire in America, educare gli americani, far crescere la forza lavoro e costruire l'economia dal\"}"
]
},
"execution_count": 14,
"execution_count": 19,
"metadata": {},
"output_type": "execute_result"
}
@@ -532,9 +573,11 @@
},
{
"cell_type": "code",
"execution_count": 16,
"execution_count": 20,
"id": "e2bfe203",
"metadata": {},
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"chain = load_qa_chain(OpenAI(temperature=0), chain_type=\"map_rerank\", return_intermediate_steps=True)"
@@ -542,9 +585,11 @@
},
{
"cell_type": "code",
"execution_count": 17,
"execution_count": 21,
"id": "5c28880c",
"metadata": {},
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"query = \"What did the president say about Justice Breyer\"\n",
@@ -553,17 +598,19 @@
},
{
"cell_type": "code",
"execution_count": 18,
"execution_count": 22,
"id": "80ac2db3",
"metadata": {},
"metadata": {
"tags": []
},
"outputs": [
{
"data": {
"text/plain": [
"' The president thanked Justice Breyer for his service and honored him for dedicating his life to serving the country. '"
"' The President thanked Justice Breyer for his service and honored him for dedicating his life to serve the country.'"
]
},
"execution_count": 18,
"execution_count": 22,
"metadata": {},
"output_type": "execute_result"
}
@@ -574,24 +621,23 @@
},
{
"cell_type": "code",
"execution_count": 19,
"execution_count": 23,
"id": "b428fcb9",
"metadata": {},
"metadata": {
"tags": []
},
"outputs": [
{
"data": {
"text/plain": [
"[{'answer': ' The president thanked Justice Breyer for his service and honored him for dedicating his life to serving the country. ',\n",
"[{'answer': ' The President thanked Justice Breyer for his service and honored him for dedicating his life to serve the country.',\n",
" 'score': '100'},\n",
" {'answer': \" The president said that Justice Breyer is a former top litigator in private practice, a former federal public defender, and from a family of public school educators and police officers. He also said that since she's been nominated, she's received a broad range of support from the Fraternal Order of Police to former judges appointed by Democrats and Republicans, and that she is a consensus builder.\",\n",
" 'score': '100'},\n",
" {'answer': ' The president did not mention Justice Breyer in this context.',\n",
" 'score': '0'},\n",
" {'answer': ' The president did not mention Justice Breyer in the given context. ',\n",
" 'score': '0'}]"
" {'answer': ' This document does not answer the question', 'score': '0'},\n",
" {'answer': ' This document does not answer the question', 'score': '0'},\n",
" {'answer': ' This document does not answer the question', 'score': '0'}]"
]
},
"execution_count": 19,
"execution_count": 23,
"metadata": {},
"output_type": "execute_result"
}
@@ -612,24 +658,25 @@
},
{
"cell_type": "code",
"execution_count": 16,
"execution_count": 24,
"id": "41b83cd8",
"metadata": {},
"metadata": {
"tags": []
},
"outputs": [
{
"data": {
"text/plain": [
"{'intermediate_steps': [{'answer': ' Il presidente ha detto che Justice Breyer ha dedicato la sua vita a servire questo paese e ha onorato la sua carriera.',\n",
"{'intermediate_steps': [{'answer': ' Il presidente ha detto che Justice Breyer ha dedicato la sua vita a servire questo paese.',\n",
" 'score': '100'},\n",
" {'answer': ' Il presidente non ha detto nulla sulla Giustizia Breyer.',\n",
" 'score': '100'},\n",
" {'answer': ' Non so.', 'score': '0'},\n",
" {'answer': ' Il presidente non ha detto nulla sulla giustizia Breyer.',\n",
" 'score': '100'}],\n",
" 'output_text': ' Il presidente ha detto che Justice Breyer ha dedicato la sua vita a servire questo paese e ha onorato la sua carriera.'}"
" {'answer': ' Non so.', 'score': '0'}],\n",
" 'output_text': ' Il presidente ha detto che Justice Breyer ha dedicato la sua vita a servire questo paese.'}"
]
},
"execution_count": 16,
"execution_count": 24,
"metadata": {},
"output_type": "execute_result"
}
@@ -694,7 +741,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.1"
"version": "3.10.9"
},
"vscode": {
"interpreter": {

View File

@@ -0,0 +1,87 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "66a7777e",
"metadata": {},
"source": [
"# Bilibili\n",
"\n",
"This loader utilizes the `bilibili-api` to fetch the text transcript from Bilibili, one of the most beloved long-form video sites in China.\n",
"\n",
"With this BiliBiliLoader, users can easily obtain the transcript of their desired video content on the platform."
]
},
{
"cell_type": "code",
"execution_count": 11,
"id": "9ec8a3b3",
"metadata": {},
"outputs": [],
"source": [
"from langchain.document_loaders.bilibili import BiliBiliLoader"
]
},
{
"cell_type": "code",
"execution_count": 12,
"id": "43128d8d",
"metadata": {},
"outputs": [],
"source": [
"#!pip install bilibili-api"
]
},
{
"cell_type": "code",
"execution_count": 13,
"id": "35d6809a",
"metadata": {
"pycharm": {
"name": "#%%\n"
}
},
"outputs": [],
"source": [
"loader = BiliBiliLoader(\n",
" [\"https://www.bilibili.com/video/BV1xt411o7Xu/\"]\n",
")"
]
},
{
"cell_type": "code",
"execution_count": null,
"outputs": [],
"source": [
"loader.load()"
],
"metadata": {
"collapsed": false,
"pycharm": {
"name": "#%%\n"
}
}
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.9"
}
},
"nbformat": 4,
"nbformat_minor": 5
}

View File

@@ -106,7 +106,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"# Specify a column to be used identify the document source\n",
"## Specify a column to be used identify the document source\n",
"\n",
"Use the `source_column` argument to specify a column to be set as the source for the document created from each row. Otherwise `file_path` will be used as the source for all documents created from the csv file.\n",
"\n",

Submodule docs/modules/indexes/document_loaders/examples/example_data/test_repo1 added at 7e525a3b91

View File

@@ -8,4 +8,5 @@
1/23/23, 3:02 AM - User 1: I thought you were selling the blue one!
1/23/23, 3:18 AM - User 2: No Im sorry it was my mistake, the blue one is not for sale
1/23/23, 3:19 AM - User 1: Oh no worries! Bye
1/23/23, 3:19 AM - User 2: Bye!
1/23/23, 3:19 AM - User 2: Bye!
1/23/23, 3:22_AM - User 1: And let me know if anything changes

View File

@@ -0,0 +1,192 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Git\n",
"\n",
"This notebook shows how to load text files from Git repository."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Load existing repository from disk"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [],
"source": [
"from git import Repo\n",
"\n",
"repo = Repo.clone_from(\n",
" \"https://github.com/hwchase17/langchain\", to_path=\"./example_data/test_repo1\"\n",
")\n",
"branch = repo.head.reference"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from langchain.document_loaders import GitLoader"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [],
"source": [
"loader = GitLoader(repo_path=\"./example_data/test_repo1/\", branch=branch)"
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {},
"outputs": [],
"source": [
"data = loader.load()"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"len(data)"
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"page_content='.venv\\n.github\\n.git\\n.mypy_cache\\n.pytest_cache\\nDockerfile' metadata={'file_path': '.dockerignore', 'file_name': '.dockerignore', 'file_type': ''}\n"
]
}
],
"source": [
"print(data[0])"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Clone repository from url"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [],
"source": [
"from langchain.document_loaders import GitLoader"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [],
"source": [
"loader = GitLoader(\n",
" clone_url=\"https://github.com/hwchase17/langchain\",\n",
" repo_path=\"./example_data/test_repo2/\",\n",
" branch=\"master\",\n",
")"
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {},
"outputs": [],
"source": [
"data = loader.load()"
]
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"1074"
]
},
"execution_count": 11,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"len(data)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Filtering files to load"
]
},
{
"cell_type": "code",
"execution_count": 12,
"metadata": {},
"outputs": [],
"source": [
"from langchain.document_loaders import GitLoader\n",
"\n",
"# eg. loading only python files\n",
"loader = GitLoader(repo_path=\"./example_data/test_repo1/\", file_filter=lambda file_path: file_path.endswith(\".py\"))"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.3"
}
},
"nbformat": 4,
"nbformat_minor": 2
}

View File

@@ -104,10 +104,11 @@
"Efficient Data AnnotationC u s t o m i z e d M o d e l T r a i n i n gModel Cust omizationDI A Model HubDI A Pipeline SharingCommunity PlatformLa y out Detection ModelsDocument Images \n",
"T h e C o r e L a y o u t P a r s e r L i b r a r yOCR ModuleSt or age & VisualizationLa y out Data Structur e\n",
"Fig. 1: The overall architecture of LayoutParser . For an input document image,\n",
"the core LayoutParser library provides a set of o\u000B",
"the core LayoutParser library provides a set of o\u000b",
"\n",
"-the-shelf tools for layout\n",
"detection, OCR, visualization, and storage, backed by a carefully designed layout\n",
"data structure. LayoutParser also supports high level customization via e\u000Ecient\n",
"data structure. LayoutParser also supports high level customization via e\u000ecient\n",
"layout annotation and model training functions. These improve model accuracy\n",
"on the target samples. The community platform enables the easy sharing of DIA\n",
"models and whole digitization pipelines to promote reusability and reproducibility.\n",
@@ -117,6 +118,7 @@
"DL-based support for developing and deploying models for general computer\n",
"vision and natural language processing problems. LayoutParser , on the other\n",
"hand, specializes speci\f",
"\n",
"cally in DIA tasks. LayoutParser is also equipped with a\n",
"community platform inspired by established model hubs such as Torch Hub [23]\n",
"andTensorFlow Hub [1]. It enables the sharing of pretrained models as well as\n",
@@ -125,13 +127,16 @@
"development of DL models. Some examples include PRImA [ 3](magazine layouts),\n",
"PubLayNet [ 38](academic paper layouts), Table Bank [ 18](tables in academic\n",
"papers), Newspaper Navigator Dataset [ 16,17](newspaper \f",
"\n",
"gure layouts) and\n",
"HJDataset [31](historical Japanese document layouts). A spectrum of models\n",
"trained on these datasets are currently available in the LayoutParser model zoo\n",
"to support di\u000B",
"to support di\u000b",
"\n",
"erent use cases.\n",
"3 The Core LayoutParser Library\n",
"At the core of LayoutParser is an o\u000B",
"At the core of LayoutParser is an o\u000b",
"\n",
"-the-shelf toolkit that streamlines DL-\n",
"based document image analysis. Five components support a simple interface\n",
"with comprehensive functionalities: 1) The layout detection models enable using\n",
@@ -226,7 +231,9 @@
"outputs": [
{
"data": {
"text/plain": "Document(page_content='LayoutParser: A Unified Toolkit for Deep\\nLearning Based Document Image Analysis\\nZejiang Shen1 (<28>), Ruochen Zhang2, Melissa Dell3, Benjamin Charles Germain\\nLee4, Jacob Carlson3, and Weining Li5\\n1 Allen Institute for AI\\nshannons@allenai.org\\n2 Brown University\\nruochen zhang@brown.edu\\n3 Harvard University\\n{melissadell,jacob carlson}@fas.harvard.edu\\n4 University of Washington\\nbcgl@cs.washington.edu\\n5 University of Waterloo\\nw422li@uwaterloo.ca\\nAbstract. Recent advances in document image analysis (DIA) have been\\nprimarily driven by the application of neural networks. Ideally, research\\noutcomes could be easily deployed in production and extended for further\\ninvestigation. However, various factors like loosely organized codebases\\nand sophisticated model configurations complicate the easy reuse of im-\\nportant innovations by a wide audience. Though there have been on-going\\nefforts to improve reusability and simplify deep learning (DL) model\\ndevelopment in disciplines like natural language processing and computer\\nvision, none of them are optimized for challenges in the domain of DIA.\\nThis represents a major gap in the existing toolkit, as DIA is central to\\nacademic research across a wide range of disciplines in the social sciences\\nand humanities. This paper introduces LayoutParser, an open-source\\nlibrary for streamlining the usage of DL in DIA research and applica-\\ntions. The core LayoutParser library comes with a set of simple and\\nintuitive interfaces for applying and customizing DL models for layout de-\\ntection, character recognition, and many other document processing tasks.\\nTo promote extensibility, LayoutParser also incorporates a community\\nplatform for sharing both pre-trained models and full document digiti-\\nzation pipelines. We demonstrate that LayoutParser is helpful for both\\nlightweight and large-scale digitization pipelines in real-word use cases.\\nThe library is publicly available at https://layout-parser.github.io.\\nKeywords: Document Image Analysis · Deep Learning · Layout Analysis\\n· Character Recognition · Open Source library · Toolkit.\\n1\\nIntroduction\\nDeep Learning(DL)-based approaches are the state-of-the-art for a wide range of\\ndocument image analysis (DIA) tasks including document image classification [11,\\narXiv:2103.15348v2 [cs.CV] 21 Jun 2021\\n', lookup_str='', metadata={'file_path': 'example_data/layout-parser-paper.pdf', 'page_number': 1, 'total_pages': 16, 'format': 'PDF 1.5', 'title': '', 'author': '', 'subject': '', 'keywords': '', 'creator': 'LaTeX with hyperref', 'producer': 'pdfTeX-1.40.21', 'creationDate': 'D:20210622012710Z', 'modDate': 'D:20210622012710Z', 'trapped': '', 'encryption': None}, lookup_index=0)"
"text/plain": [
"Document(page_content='LayoutParser: A Unified Toolkit for Deep\\nLearning Based Document Image Analysis\\nZejiang Shen1 (<28>), Ruochen Zhang2, Melissa Dell3, Benjamin Charles Germain\\nLee4, Jacob Carlson3, and Weining Li5\\n1 Allen Institute for AI\\nshannons@allenai.org\\n2 Brown University\\nruochen zhang@brown.edu\\n3 Harvard University\\n{melissadell,jacob carlson}@fas.harvard.edu\\n4 University of Washington\\nbcgl@cs.washington.edu\\n5 University of Waterloo\\nw422li@uwaterloo.ca\\nAbstract. Recent advances in document image analysis (DIA) have been\\nprimarily driven by the application of neural networks. Ideally, research\\noutcomes could be easily deployed in production and extended for further\\ninvestigation. However, various factors like loosely organized codebases\\nand sophisticated model configurations complicate the easy reuse of im-\\nportant innovations by a wide audience. Though there have been on-going\\nefforts to improve reusability and simplify deep learning (DL) model\\ndevelopment in disciplines like natural language processing and computer\\nvision, none of them are optimized for challenges in the domain of DIA.\\nThis represents a major gap in the existing toolkit, as DIA is central to\\nacademic research across a wide range of disciplines in the social sciences\\nand humanities. This paper introduces LayoutParser, an open-source\\nlibrary for streamlining the usage of DL in DIA research and applica-\\ntions. The core LayoutParser library comes with a set of simple and\\nintuitive interfaces for applying and customizing DL models for layout de-\\ntection, character recognition, and many other document processing tasks.\\nTo promote extensibility, LayoutParser also incorporates a community\\nplatform for sharing both pre-trained models and full document digiti-\\nzation pipelines. We demonstrate that LayoutParser is helpful for both\\nlightweight and large-scale digitization pipelines in real-word use cases.\\nThe library is publicly available at https://layout-parser.github.io.\\nKeywords: Document Image Analysis · Deep Learning · Layout Analysis\\n· Character Recognition · Open Source library · Toolkit.\\n1\\nIntroduction\\nDeep Learning(DL)-based approaches are the state-of-the-art for a wide range of\\ndocument image analysis (DIA) tasks including document image classification [11,\\narXiv:2103.15348v2 [cs.CV] 21 Jun 2021\\n', lookup_str='', metadata={'file_path': 'example_data/layout-parser-paper.pdf', 'page_number': 1, 'total_pages': 16, 'format': 'PDF 1.5', 'title': '', 'author': '', 'subject': '', 'keywords': '', 'creator': 'LaTeX with hyperref', 'producer': 'pdfTeX-1.40.21', 'creationDate': 'D:20210622012710Z', 'modDate': 'D:20210622012710Z', 'trapped': '', 'encryption': None}, lookup_index=0)"
]
},
"execution_count": 5,
"metadata": {},
@@ -239,53 +246,51 @@
},
{
"cell_type": "markdown",
"id": "278c881f",
"metadata": {},
"source": [
"### Fetching remote PDFs using Unstructured\n",
"\n",
"This covers how to load online pdfs into a document format that we can use downstream. This can be used for various online pdf sites such as https://open.umn.edu/opentextbooks/textbooks/ and https://arxiv.org/archive/\n",
"\n",
"Note: all other pdf loaders can also be used to fetch remote PDFs, but `OnlinePDFLoader` is a legacy function, and works specifically with `UnstructuredPDFLoader`.\n"
],
"metadata": {
"collapsed": false
}
]
},
{
"cell_type": "code",
"execution_count": 6,
"id": "0c2686fc",
"metadata": {},
"outputs": [],
"source": [
"from langchain.document_loaders import OnlinePDFLoader"
],
"metadata": {
"collapsed": false
}
]
},
{
"cell_type": "code",
"execution_count": 7,
"id": "101e0b82",
"metadata": {},
"outputs": [],
"source": [
"loader = OnlinePDFLoader(\"https://arxiv.org/pdf/2302.03803.pdf\")"
],
"metadata": {
"collapsed": false
}
]
},
{
"cell_type": "code",
"execution_count": 8,
"id": "be3ccbfa",
"metadata": {},
"outputs": [],
"source": [
"data = loader.load()"
],
"metadata": {
"collapsed": false
}
]
},
{
"cell_type": "code",
"execution_count": 9,
"id": "e1298dd6",
"metadata": {},
"outputs": [
{
"name": "stdout",
@@ -297,17 +302,13 @@
],
"source": [
"print(data)"
],
"metadata": {
"collapsed": false
}
]
},
{
"cell_type": "markdown",
"source": [],
"metadata": {
"collapsed": false
}
"id": "05187b33",
"metadata": {},
"source": []
},
{
"cell_type": "markdown",
@@ -349,55 +350,101 @@
},
{
"cell_type": "markdown",
"id": "c90a5fe8",
"metadata": {},
"source": [
"## Using PyMuPDF\n",
"\n",
"This is the fastest of the PDF parsing options, and contains detailed metadata about the PDF and its pages, as well as returns one document per page."
],
"metadata": {
"collapsed": false
}
"## Using PDFMiner to generate HTML text"
]
},
{
"cell_type": "markdown",
"id": "eb785e1c",
"metadata": {},
"source": [
"This can be helpful for chunking texts semantically into sections as the output html content can be parsed via `BeautifulSoup` to get more structured and rich information about font size, page numbers, pdf headers/footers, etc."
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "601000d7",
"metadata": {},
"outputs": [],
"source": [
"from langchain.document_loaders import PyMuPDFLoader"
],
"metadata": {
"collapsed": false
}
},
{
"cell_type": "code",
"execution_count": 2,
"outputs": [],
"source": [
"loader = PyMuPDFLoader(\"example_data/layout-parser-paper.pdf\")"
],
"metadata": {
"collapsed": false
}
"from langchain.document_loaders import PDFMinerPDFasHTMLLoader"
]
},
{
"cell_type": "code",
"execution_count": 3,
"id": "a5525fb0",
"metadata": {},
"outputs": [],
"source": [
"data = loader.load()"
],
"metadata": {
"collapsed": false
}
"loader = PDFMinerPDFasHTMLLoader(\"example_data/layout-parser-paper.pdf\")"
]
},
{
"cell_type": "code",
"execution_count": 4,
"id": "dac7ff68",
"metadata": {},
"outputs": [],
"source": [
"data = loader.load()"
]
},
{
"cell_type": "markdown",
"id": "cc2c2f4f",
"metadata": {},
"source": [
"## Using PyMuPDF\n",
"\n",
"This is the fastest of the PDF parsing options, and contains detailed metadata about the PDF and its pages, as well as returns one document per page."
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "55f0c4d8",
"metadata": {},
"outputs": [],
"source": [
"from langchain.document_loaders import PyMuPDFLoader"
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "718cbfbc",
"metadata": {},
"outputs": [],
"source": [
"loader = PyMuPDFLoader(\"example_data/layout-parser-paper.pdf\")"
]
},
{
"cell_type": "code",
"execution_count": 3,
"id": "f2f93a15",
"metadata": {},
"outputs": [],
"source": [
"data = loader.load()"
]
},
{
"cell_type": "code",
"execution_count": 4,
"id": "a24dfaa6",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": "Document(page_content='LayoutParser: A Unified Toolkit for Deep\\nLearning Based Document Image Analysis\\nZejiang Shen1 (<28>), Ruochen Zhang2, Melissa Dell3, Benjamin Charles Germain\\nLee4, Jacob Carlson3, and Weining Li5\\n1 Allen Institute for AI\\nshannons@allenai.org\\n2 Brown University\\nruochen zhang@brown.edu\\n3 Harvard University\\n{melissadell,jacob carlson}@fas.harvard.edu\\n4 University of Washington\\nbcgl@cs.washington.edu\\n5 University of Waterloo\\nw422li@uwaterloo.ca\\nAbstract. Recent advances in document image analysis (DIA) have been\\nprimarily driven by the application of neural networks. Ideally, research\\noutcomes could be easily deployed in production and extended for further\\ninvestigation. However, various factors like loosely organized codebases\\nand sophisticated model configurations complicate the easy reuse of im-\\nportant innovations by a wide audience. Though there have been on-going\\nefforts to improve reusability and simplify deep learning (DL) model\\ndevelopment in disciplines like natural language processing and computer\\nvision, none of them are optimized for challenges in the domain of DIA.\\nThis represents a major gap in the existing toolkit, as DIA is central to\\nacademic research across a wide range of disciplines in the social sciences\\nand humanities. This paper introduces LayoutParser, an open-source\\nlibrary for streamlining the usage of DL in DIA research and applica-\\ntions. The core LayoutParser library comes with a set of simple and\\nintuitive interfaces for applying and customizing DL models for layout de-\\ntection, character recognition, and many other document processing tasks.\\nTo promote extensibility, LayoutParser also incorporates a community\\nplatform for sharing both pre-trained models and full document digiti-\\nzation pipelines. We demonstrate that LayoutParser is helpful for both\\nlightweight and large-scale digitization pipelines in real-word use cases.\\nThe library is publicly available at https://layout-parser.github.io.\\nKeywords: Document Image Analysis · Deep Learning · Layout Analysis\\n· Character Recognition · Open Source library · Toolkit.\\n1\\nIntroduction\\nDeep Learning(DL)-based approaches are the state-of-the-art for a wide range of\\ndocument image analysis (DIA) tasks including document image classification [11,\\narXiv:2103.15348v2 [cs.CV] 21 Jun 2021\\n', lookup_str='', metadata={'file_path': 'example_data/layout-parser-paper.pdf', 'page_number': 1, 'total_pages': 16, 'format': 'PDF 1.5', 'title': '', 'author': '', 'subject': '', 'keywords': '', 'creator': 'LaTeX with hyperref', 'producer': 'pdfTeX-1.40.21', 'creationDate': 'D:20210622012710Z', 'modDate': 'D:20210622012710Z', 'trapped': '', 'encryption': None}, lookup_index=0)"
"text/plain": [
"Document(page_content='LayoutParser: A Unified Toolkit for Deep\\nLearning Based Document Image Analysis\\nZejiang Shen1 (<28>), Ruochen Zhang2, Melissa Dell3, Benjamin Charles Germain\\nLee4, Jacob Carlson3, and Weining Li5\\n1 Allen Institute for AI\\nshannons@allenai.org\\n2 Brown University\\nruochen zhang@brown.edu\\n3 Harvard University\\n{melissadell,jacob carlson}@fas.harvard.edu\\n4 University of Washington\\nbcgl@cs.washington.edu\\n5 University of Waterloo\\nw422li@uwaterloo.ca\\nAbstract. Recent advances in document image analysis (DIA) have been\\nprimarily driven by the application of neural networks. Ideally, research\\noutcomes could be easily deployed in production and extended for further\\ninvestigation. However, various factors like loosely organized codebases\\nand sophisticated model configurations complicate the easy reuse of im-\\nportant innovations by a wide audience. Though there have been on-going\\nefforts to improve reusability and simplify deep learning (DL) model\\ndevelopment in disciplines like natural language processing and computer\\nvision, none of them are optimized for challenges in the domain of DIA.\\nThis represents a major gap in the existing toolkit, as DIA is central to\\nacademic research across a wide range of disciplines in the social sciences\\nand humanities. This paper introduces LayoutParser, an open-source\\nlibrary for streamlining the usage of DL in DIA research and applica-\\ntions. The core LayoutParser library comes with a set of simple and\\nintuitive interfaces for applying and customizing DL models for layout de-\\ntection, character recognition, and many other document processing tasks.\\nTo promote extensibility, LayoutParser also incorporates a community\\nplatform for sharing both pre-trained models and full document digiti-\\nzation pipelines. We demonstrate that LayoutParser is helpful for both\\nlightweight and large-scale digitization pipelines in real-word use cases.\\nThe library is publicly available at https://layout-parser.github.io.\\nKeywords: Document Image Analysis · Deep Learning · Layout Analysis\\n· Character Recognition · Open Source library · Toolkit.\\n1\\nIntroduction\\nDeep Learning(DL)-based approaches are the state-of-the-art for a wide range of\\ndocument image analysis (DIA) tasks including document image classification [11,\\narXiv:2103.15348v2 [cs.CV] 21 Jun 2021\\n', lookup_str='', metadata={'file_path': 'example_data/layout-parser-paper.pdf', 'page_number': 1, 'total_pages': 16, 'format': 'PDF 1.5', 'title': '', 'author': '', 'subject': '', 'keywords': '', 'creator': 'LaTeX with hyperref', 'producer': 'pdfTeX-1.40.21', 'creationDate': 'D:20210622012710Z', 'modDate': 'D:20210622012710Z', 'trapped': '', 'encryption': None}, lookup_index=0)"
]
},
"execution_count": 4,
"metadata": {},
@@ -406,28 +453,23 @@
],
"source": [
"data[0]"
],
"metadata": {
"collapsed": false
}
]
},
{
"cell_type": "markdown",
"id": "83cb52a0",
"metadata": {},
"source": [
"Additionally, you can pass along any of the options from the [PyMuPDF documentation](https://pymupdf.readthedocs.io/en/latest/app1.html#plain-text/) as keyword arguments in the `load` call, and it will be pass along to the `get_text()` call."
],
"metadata": {
"collapsed": false
}
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "1bf73c97",
"metadata": {},
"outputs": [],
"source": [],
"metadata": {
"collapsed": false
}
"source": []
}
],
"metadata": {
@@ -446,7 +488,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.1"
"version": "3.10.8"
}
},
"nbformat": 4,

View File

@@ -0,0 +1,81 @@
{
"cells": [
{
"attachments": {},
"cell_type": "markdown",
"id": "1dc7df1d",
"metadata": {},
"source": [
"# Slack (Local Exported Zipfile)\n",
"\n",
"This notebook covers how to load documents from a Zipfile generated from a Slack export.\n",
"\n",
"In order to get this Slack export, follow these instructions:\n",
"\n",
"## 🧑 Instructions for ingesting your own dataset\n",
"\n",
"Export your Slack data. You can do this by going to your Workspace Management page and clicking the Import/Export option ({your_slack_domain}.slack.com/services/export). Then, choose the right date range and click `Start export`. Slack will send you an email and a DM when the export is ready.\n",
"\n",
"The download will produce a `.zip` file in your Downloads folder (or wherever your downloads can be found, depending on your OS configuration).\n",
"\n",
"Copy the path to the `.zip` file, and assign it as `LOCAL_ZIPFILE` below."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "007c5cbf",
"metadata": {},
"outputs": [],
"source": [
"from langchain.document_loaders import SlackDirectoryLoader "
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "a1caec59",
"metadata": {},
"outputs": [],
"source": [
"# Optionally set your Slack URL. This will give you proper URLs in the docs sources.\n",
"SLACK_WORKSPACE_URL = \"https://xxx.slack.com\"\n",
"LOCAL_ZIPFILE = \"\" # Paste the local paty to your Slack zip file here.\n",
"\n",
"loader = SlackDirectoryLoader(LOCAL_ZIPFILE, SLACK_WORKSPACE_URL)"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "b1c30ff7",
"metadata": {},
"outputs": [],
"source": [
"docs = loader.load()\n",
"docs"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.2"
}
},
"nbformat": 4,
"nbformat_minor": 5
}

View File

@@ -112,6 +112,79 @@
"source": [
"data = loader.load()"
]
},
{
"attachments": {},
"cell_type": "markdown",
"id": "a2c1c79f",
"metadata": {},
"source": [
"# Playwright URL Loader\n",
"\n",
"This covers how to load HTML documents from a list of URLs using the `PlaywrightURLLoader`.\n",
"\n",
"As in the Selenium case, Playwright allows us to load pages that need JavaScript to render.\n",
"\n",
"## Setup\n",
"\n",
"To use the `PlaywrightURLLoader`, you will need to install `playwright` and `unstructured`. Additionally, you will need to install the Playwright Chromium browser:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "53158417",
"metadata": {},
"outputs": [],
"source": [
"# Install playwright\n",
"!pip install \"playwright\"\n",
"!pip install \"unstructured\"\n",
"!playwright install"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "0ab4e115",
"metadata": {},
"outputs": [],
"source": [
"from langchain.document_loaders import PlaywrightURLLoader"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "ce5a9a0a",
"metadata": {},
"outputs": [],
"source": [
"urls = [\n",
" \"https://www.youtube.com/watch?v=dQw4w9WgXcQ\",\n",
" \"https://goo.gl/maps/NDSHwePEyaHMFGwh8\"\n",
"]"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "2dc3e0bc",
"metadata": {},
"outputs": [],
"source": [
"loader = PlaywrightURLLoader(urls=urls, remove_selectors=[\"header\", \"footer\"])"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "10b79f80",
"metadata": {},
"outputs": [],
"source": [
"data = loader.load()"
]
}
],
"metadata": {
@@ -130,7 +203,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.8.13"
"version": "3.10.6"
}
},
"nbformat": 4,

View File

@@ -89,7 +89,7 @@
"id": "150988e6",
"metadata": {},
"source": [
"# Loading multiple webpages\n",
"## Loading multiple webpages\n",
"\n",
"You can also load multiple webpages at once by passing in a list of urls to the loader. This will return a list of documents in the same order as the urls passed in."
]
@@ -123,7 +123,7 @@
"id": "641be294",
"metadata": {},
"source": [
"## Load multiple urls concurrently\n",
"### Load multiple urls concurrently\n",
"\n",
"You can speed up the scraping process by scraping and parsing multiple urls concurrently.\n",
"\n",

View File

@@ -99,7 +99,7 @@
"outputs": [],
"source": [
"from langchain.document_loaders import TextLoader\n",
"loader = TextLoader('../state_of_the_union.txt')"
"loader = TextLoader('../state_of_the_union.txt', encoding='utf8')"
]
},
{

View File

@@ -0,0 +1,95 @@
{
"cells": [
{
"attachments": {},
"cell_type": "markdown",
"id": "9fc6205b",
"metadata": {},
"source": [
"# Databerry\n",
"\n",
"This notebook shows how to use [Databerry's](https://www.databerry.ai/) retriever.\n",
"\n",
"First, you will need to sign up for Databerry, create a datastore, add some data and get your datastore api endpoint url"
]
},
{
"cell_type": "markdown",
"id": "944e172b",
"metadata": {},
"source": [
"## Query\n",
"\n",
"Now that our index is set up, we can set up a retriever and start querying it."
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "d0e6f506",
"metadata": {},
"outputs": [],
"source": [
"from langchain.retrievers import DataberryRetriever"
]
},
{
"cell_type": "code",
"execution_count": 5,
"id": "f381f642",
"metadata": {},
"outputs": [],
"source": [
"retriever = DataberryRetriever(\n",
" datastore_url=\"https://clg1xg2h80000l708dymr0fxc.databerry.ai/query\",\n",
" # api_key=\"DATABERRY_API_KEY\", # optional if datastore is public\n",
" # top_k=10 # optional\n",
")"
]
},
{
"cell_type": "code",
"execution_count": 6,
"id": "20ae1a74",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"[Document(page_content='✨ Made with DaftpageOpen main menuPricingTemplatesLoginSearchHelpGetting StartedFeaturesAffiliate ProgramGetting StartedDaftpage is a new type of website builder that works like a doc.It makes website building easy, fun and offers tons of powerful features for free. Just type / in your page to get started!DaftpageCopyright © 2022 Daftpage, Inc.All rights reserved.ProductPricingTemplatesHelp & SupportHelp CenterGetting startedBlogCompanyAboutRoadmapTwitterAffiliate Program👾 Discord', metadata={'source': 'https:/daftpage.com/help/getting-started', 'score': 0.8697265}),\n",
" Document(page_content=\"✨ Made with DaftpageOpen main menuPricingTemplatesLoginSearchHelpGetting StartedFeaturesAffiliate ProgramHelp CenterWelcome to Daftpages help center—the one-stop shop for learning everything about building websites with Daftpage.Daftpage is the simplest way to create websites for all purposes in seconds. Without knowing how to code, and for free!Get StartedDaftpage is a new type of website builder that works like a doc.It makes website building easy, fun and offers tons of powerful features for free. Just type / in your page to get started!Start here✨ Create your first site🧱 Add blocks🚀 PublishGuides🔖 Add a custom domainFeatures🔥 Drops🎨 Drawings👻 Ghost mode💀 Skeleton modeCant find the answer you're looking for?mail us at support@daftpage.comJoin the awesome Daftpage community on: 👾 DiscordDaftpageCopyright © 2022 Daftpage, Inc.All rights reserved.ProductPricingTemplatesHelp & SupportHelp CenterGetting startedBlogCompanyAboutRoadmapTwitterAffiliate Program👾 Discord\", metadata={'source': 'https:/daftpage.com/help', 'score': 0.86570895}),\n",
" Document(page_content=\" is the simplest way to create websites for all purposes in seconds. Without knowing how to code, and for free!Get StartedDaftpage is a new type of website builder that works like a doc.It makes website building easy, fun and offers tons of powerful features for free. Just type / in your page to get started!Start here✨ Create your first site🧱 Add blocks🚀 PublishGuides🔖 Add a custom domainFeatures🔥 Drops🎨 Drawings👻 Ghost mode💀 Skeleton modeCant find the answer you're looking for?mail us at support@daftpage.comJoin the awesome Daftpage community on: 👾 DiscordDaftpageCopyright © 2022 Daftpage, Inc.All rights reserved.ProductPricingTemplatesHelp & SupportHelp CenterGetting startedBlogCompanyAboutRoadmapTwitterAffiliate Program👾 Discord\", metadata={'source': 'https:/daftpage.com/help', 'score': 0.8645384})]"
]
},
"execution_count": 6,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"retriever.get_relevant_documents(\"What is Daftpage?\")"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.1"
}
},
"nbformat": 4,
"nbformat_minor": 5
}

View File

@@ -1,6 +1,7 @@
{
"cells": [
{
"attachments": {},
"cell_type": "markdown",
"id": "ab66dd43",
"metadata": {},
@@ -9,12 +10,12 @@
"\n",
"This notebook goes over how to use a retriever that under the hood uses Pinecone and Hybrid Search.\n",
"\n",
"The logic of this retriever is largely taken from [this blog post](https://www.pinecone.io/learn/hybrid-search-intro/)"
"The logic of this retriever is taken from [this documentaion](https://docs.pinecone.io/docs/hybrid-search)"
]
},
{
"cell_type": "code",
"execution_count": 1,
"execution_count": 75,
"id": "393ac030",
"metadata": {},
"outputs": [],
@@ -31,43 +32,61 @@
]
},
{
"cell_type": "code",
"execution_count": 3,
"id": "15390796",
"metadata": {},
"outputs": [],
"source": [
"import pinecone # !pip install pinecone-client\n",
"\n",
"pinecone.init(\n",
" api_key=\"...\", # API key here\n",
" environment=\"...\" # find next to api key in console\n",
")\n",
"# choose a name for your index\n",
"index_name = \"...\""
]
},
{
"attachments": {},
"cell_type": "markdown",
"id": "95d5d7f9",
"metadata": {},
"source": [
"You should only have to do this part once."
"You should only have to do this part once.\n",
"\n",
"Note: it's important to make sure that the \"context\" field that holds the document text in the metadata is not indexed. Currently you need to specify explicitly the fields you do want to index. For more information checkout Pinecone's [docs](https://docs.pinecone.io/docs/manage-indexes#selective-metadata-indexing)."
]
},
{
"cell_type": "code",
"execution_count": null,
"execution_count": 76,
"id": "3b8f7697",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"WhoAmIResponse(username='load', user_label='label', projectname='load-test')"
]
},
"execution_count": 76,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"import os\n",
"import pinecone\n",
"\n",
"api_key = os.getenv(\"PINECONE_API_KEY\") or \"PINECONE_API_KEY\"\n",
"# find environment next to your API key in the Pinecone console\n",
"env = os.getenv(\"PINECONE_ENVIRONMENT\") or \"PINECONE_ENVIRONMENT\"\n",
"\n",
"index_name = \"langchain-pinecone-hybrid-search\"\n",
"\n",
"pinecone.init(api_key=api_key, enviroment=env)\n",
"pinecone.whoami()"
]
},
{
"cell_type": "code",
"execution_count": 77,
"id": "cfa3a8d8",
"metadata": {},
"outputs": [],
"source": [
"# create the index\n",
" # create the index\n",
"pinecone.create_index(\n",
" name = index_name,\n",
" dimension = 1536, # dimensionality of dense model\n",
" metric = \"dotproduct\",\n",
" pod_type = \"s1\"\n",
" metric = \"dotproduct\", # sparse values supported only for dotproduct\n",
" pod_type = \"s1\",\n",
" metadata_config={\"indexed\": []} # see explaination above\n",
")"
]
},
@@ -81,7 +100,7 @@
},
{
"cell_type": "code",
"execution_count": 4,
"execution_count": 78,
"id": "bcb3c8c2",
"metadata": {},
"outputs": [],
@@ -90,18 +109,19 @@
]
},
{
"attachments": {},
"cell_type": "markdown",
"id": "dbc025d6",
"metadata": {},
"source": [
"## Get embeddings and tokenizers\n",
"## Get embeddings and sparse encoders\n",
"\n",
"Embeddings are used for the dense vectors, tokenizer is used for the sparse vector"
]
},
{
"cell_type": "code",
"execution_count": 5,
"execution_count": 79,
"id": "2f63c911",
"metadata": {},
"outputs": [],
@@ -110,19 +130,51 @@
"embeddings = OpenAIEmbeddings()"
]
},
{
"attachments": {},
"cell_type": "markdown",
"id": "96bf8879",
"metadata": {},
"source": [
"To encode the text to sparse values you can either choose SPLADE or BM25. For out of domain tasks we recommend using BM25.\n",
"\n",
"For more information about the sparse encoders you can checkout pinecone-text library [docs](https://pinecone-io.github.io/pinecone-text/pinecone_text.html)."
]
},
{
"cell_type": "code",
"execution_count": 6,
"execution_count": 80,
"id": "c3f030e5",
"metadata": {},
"outputs": [],
"source": [
"from transformers import BertTokenizerFast # !pip install transformers\n",
"from pinecone_text.sparse import BM25Encoder\n",
"# or from pinecone_text.sparse import SpladeEncoder if you wish to work with SPLADE\n",
"\n",
"# load bert tokenizer from huggingface\n",
"tokenizer = BertTokenizerFast.from_pretrained(\n",
" 'bert-base-uncased'\n",
")"
"# use default tf-idf values\n",
"bm25_encoder = BM25Encoder().default()"
]
},
{
"attachments": {},
"cell_type": "markdown",
"id": "23601ddb",
"metadata": {},
"source": [
"The above code is using default tfids values. It's highly recommended to fit the tf-idf values to your own corpus. You can do it as follow:\n",
"\n",
"```python\n",
"corpus = [\"foo\", \"bar\", \"world\", \"hello\"]\n",
"\n",
"# fit tf-idf values on your corpus\n",
"bm25_encoder.fit(corpus)\n",
"\n",
"# store the values to a json file\n",
"bm25_encoder.dump(\"bm25_values.json\")\n",
"\n",
"# load to your BM25Encoder object\n",
"bm25_encoder = BM25Encoder().load(\"bm25_values.json\")\n",
"```"
]
},
{
@@ -137,12 +189,12 @@
},
{
"cell_type": "code",
"execution_count": 7,
"execution_count": 81,
"id": "ac77d835",
"metadata": {},
"outputs": [],
"source": [
"retriever = PineconeHybridSearchRetriever(embeddings=embeddings, index=index, tokenizer=tokenizer)"
"retriever = PineconeHybridSearchRetriever(embeddings=embeddings, sparse_encoder=bm25_encoder, index=index)"
]
},
{
@@ -157,23 +209,16 @@
},
{
"cell_type": "code",
"execution_count": 8,
"execution_count": 82,
"id": "98b1c017",
"metadata": {},
"outputs": [
{
"data": {
"application/vnd.jupyter.widget-view+json": {
"model_id": "4d6f3ee7ca754d07a1a18d100d99e0cd",
"version_major": 2,
"version_minor": 0
},
"text/plain": [
" 0%| | 0/1 [00:00<?, ?it/s]"
]
},
"metadata": {},
"output_type": "display_data"
"name": "stderr",
"output_type": "stream",
"text": [
"100%|██████████| 1/1 [00:02<00:00, 2.27s/it]\n"
]
}
],
"source": [
@@ -192,7 +237,7 @@
},
{
"cell_type": "code",
"execution_count": 9,
"execution_count": 83,
"id": "c0455218",
"metadata": {},
"outputs": [],
@@ -202,7 +247,7 @@
},
{
"cell_type": "code",
"execution_count": 10,
"execution_count": 84,
"id": "7dfa5c29",
"metadata": {},
"outputs": [
@@ -212,7 +257,7 @@
"Document(page_content='foo', metadata={})"
]
},
"execution_count": 10,
"execution_count": 84,
"metadata": {},
"output_type": "execute_result"
}
@@ -220,19 +265,11 @@
"source": [
"result[0]"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "74bd9256",
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"display_name": ".venv",
"language": "python",
"name": "python3"
},
@@ -246,7 +283,12 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.1"
"version": "3.9.13"
},
"vscode": {
"interpreter": {
"hash": "7ec0d8babd8cabf695a1d94b1e586d626e046c9df609f6bad065d15d49f67f54"
}
}
},
"nbformat": 4,

View File

@@ -0,0 +1,128 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "ab66dd43",
"metadata": {},
"source": [
"# SVM Retriever\n",
"\n",
"This notebook goes over how to use a retriever that under the hood uses an SVM using scikit-learn.\n",
"\n",
"Largely based on https://github.com/karpathy/randomfun/blob/master/knn_vs_svm.ipynb"
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "393ac030",
"metadata": {},
"outputs": [],
"source": [
"from langchain.retrievers import SVMRetriever\n",
"from langchain.embeddings import OpenAIEmbeddings"
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "a801b57c",
"metadata": {},
"outputs": [],
"source": [
"# !pip install scikit-learn"
]
},
{
"cell_type": "markdown",
"id": "aaf80e7f",
"metadata": {},
"source": [
"## Create New Retriever with Texts"
]
},
{
"cell_type": "code",
"execution_count": 3,
"id": "98b1c017",
"metadata": {},
"outputs": [],
"source": [
"retriever = SVMRetriever.from_texts([\"foo\", \"bar\", \"world\", \"hello\", \"foo bar\"], OpenAIEmbeddings())"
]
},
{
"cell_type": "markdown",
"id": "08437fa2",
"metadata": {},
"source": [
"## Use Retriever\n",
"\n",
"We can now use the retriever!"
]
},
{
"cell_type": "code",
"execution_count": 4,
"id": "c0455218",
"metadata": {},
"outputs": [],
"source": [
"result = retriever.get_relevant_documents(\"foo\")"
]
},
{
"cell_type": "code",
"execution_count": 5,
"id": "7dfa5c29",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"[Document(page_content='foo', metadata={}),\n",
" Document(page_content='foo bar', metadata={}),\n",
" Document(page_content='hello', metadata={}),\n",
" Document(page_content='world', metadata={})]"
]
},
"execution_count": 5,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"result"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "74bd9256",
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.1"
}
},
"nbformat": 4,
"nbformat_minor": 5
}
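For context, the linked kNN-vs-SVM notebook frames retrieval as training a linear SVM with the query embedding as the only positive example and all document embeddings as negatives. A rough sketch of that idea, with toy random vectors standing in for real embeddings:

```python
import numpy as np
from sklearn import svm

# Toy stand-ins for a query embedding and four document embeddings
query_vec = np.random.randn(1536)
doc_vecs = np.random.randn(4, 1536)

# The query is the lone positive (label 1); every document is a negative (label 0)
x = np.concatenate([query_vec[None, :], doc_vecs])
y = np.zeros(len(x))
y[0] = 1

clf = svm.LinearSVC(class_weight="balanced", C=0.1, max_iter=10000, tol=1e-6)
clf.fit(x, y)

# Documents farthest toward the positive side of the hyperplane rank highest
scores = clf.decision_function(x)[1:]
ranking = np.argsort(-scores)
```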

View File

@@ -17,34 +17,36 @@
"metadata": {},
"outputs": [],
"source": [
"!python3 -m pip install openai deeplake"
"!python3 -m pip install openai deeplake tiktoken"
]
},
{
"cell_type": "code",
"execution_count": null,
"execution_count": 2,
"metadata": {},
"outputs": [],
"source": [
"from langchain.embeddings.openai import OpenAIEmbeddings\n",
"from langchain.text_splitter import CharacterTextSplitter\n",
"from langchain.vectorstores import DeepLake\n",
"from langchain.document_loaders import TextLoader"
"from langchain.vectorstores import DeepLake"
]
},
{
"cell_type": "code",
"execution_count": null,
"execution_count": 3,
"metadata": {},
"outputs": [],
"source": [
"import os\n",
"os.environ['OPENAI_API_KEY'] = 'sk-xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx'"
"import getpass\n",
"\n",
"os.environ['OPENAI_API_KEY'] = getpass.getpass('OpenAI API Key:')\n",
"embeddings = OpenAIEmbeddings()"
]
},
{
"cell_type": "code",
"execution_count": null,
"execution_count": 8,
"metadata": {},
"outputs": [],
"source": [
@@ -60,9 +62,38 @@
},
{
"cell_type": "code",
"execution_count": null,
"execution_count": 6,
"metadata": {},
"outputs": [],
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"mem://langchain loaded successfully.\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"Evaluating ingest: 100%|██████████| 1/1 [00:04<00:00\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"Dataset(path='mem://langchain', tensors=['embedding', 'ids', 'metadata', 'text'])\n",
"\n",
" tensor htype shape dtype compression\n",
" ------- ------- ------- ------- ------- \n",
" embedding generic (4, 1536) float32 None \n",
" ids text (4, 1) str None \n",
" metadata json (4, 1) str None \n",
" text text (4, 1) str None \n"
]
}
],
"source": [
"db = DeepLake.from_documents(docs, embeddings)\n",
"\n",
@@ -72,9 +103,23 @@
},
{
"cell_type": "code",
"execution_count": null,
"execution_count": 7,
"metadata": {},
"outputs": [],
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Tonight. I call on the Senate to: Pass the Freedom to Vote Act. Pass the John Lewis Voting Rights Act. And while youre at it, pass the Disclose Act so Americans can know who is funding our elections. \n",
"\n",
"Tonight, Id like to honor someone who has dedicated his life to serve this country: Justice Stephen Breyer—an Army veteran, Constitutional scholar, and retiring Justice of the United States Supreme Court. Justice Breyer, thank you for your service. \n",
"\n",
"One of the most serious constitutional responsibilities a President has is nominating someone to serve on the United States Supreme Court. \n",
"\n",
"And I did that 4 days ago, when I nominated Circuit Court of Appeals Judge Ketanji Brown Jackson. One of our nations top legal minds, who will continue Justice Breyers legacy of excellence.\n"
]
}
],
"source": [
"print(docs[0].page_content)"
]
@@ -89,9 +134,18 @@
},
{
"cell_type": "code",
"execution_count": null,
"execution_count": 9,
"metadata": {},
"outputs": [],
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"/media/sdb/davit/.local/lib/python3.10/site-packages/langchain/llms/openai.py:624: UserWarning: You are trying to use a chat model. This way of initializing it is no longer supported. Instead, please use: `from langchain.chat_models import ChatOpenAI`\n",
" warnings.warn(\n"
]
}
],
"source": [
"from langchain.chains import RetrievalQA\n",
"from langchain.llms import OpenAIChat\n",
@@ -101,9 +155,20 @@
},
{
"cell_type": "code",
"execution_count": null,
"execution_count": 10,
"metadata": {},
"outputs": [],
"outputs": [
{
"data": {
"text/plain": [
"'The president nominated Circuit Court of Appeals Judge Ketanji Brown Jackson for the United States Supreme Court and praised her qualifications and broad support from both Democrats and Republicans.'"
]
},
"execution_count": 10,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"query = 'What did the president say about Ketanji Brown Jackson'\n",
"qa.run(query)"
@@ -119,9 +184,43 @@
},
{
"cell_type": "code",
"execution_count": null,
"execution_count": 11,
"metadata": {},
"outputs": [],
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"mem://langchain loaded successfully.\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"Evaluating ingest: 100%|██████████| 1/1 [00:04<00:00\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"Dataset(path='mem://langchain', tensors=['embedding', 'ids', 'metadata', 'text'])\n",
"\n",
" tensor htype shape dtype compression\n",
" ------- ------- ------- ------- ------- \n",
" embedding generic (42, 1536) float32 None \n",
" ids text (42, 1) str None \n",
" metadata json (42, 1) str None \n",
" text text (42, 1) str None \n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": []
}
],
"source": [
"import random\n",
"\n",
@@ -133,9 +232,30 @@
},
{
"cell_type": "code",
"execution_count": null,
"execution_count": 12,
"metadata": {},
"outputs": [],
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"100%|██████████| 42/42 [00:00<00:00, 3456.17it/s]\n"
]
},
{
"data": {
"text/plain": [
"[Document(page_content='A former top litigator in private practice. A former federal public defender. And from a family of public school educators and police officers. A consensus builder. Since shes been nominated, shes received a broad range of support—from the Fraternal Order of Police to former judges appointed by Democrats and Republicans. \\n\\nAnd if we are to advance liberty and justice, we need to secure the Border and fix the immigration system. \\n\\nWe can do both. At our border, weve installed new technology like cutting-edge scanners to better detect drug smuggling. \\n\\nWeve set up joint patrols with Mexico and Guatemala to catch more human traffickers. \\n\\nWere putting in place dedicated immigration judges so families fleeing persecution and violence can have their cases heard faster. \\n\\nWere securing commitments and supporting partners in South and Central America to host more refugees and secure their own borders.', metadata={'source': '../../../state_of_the_union.txt', 'year': 2013}),\n",
" Document(page_content='And for our LGBTQ+ Americans, lets finally get the bipartisan Equality Act to my desk. The onslaught of state laws targeting transgender Americans and their families is wrong. \\n\\nAs I said last year, especially to our younger transgender Americans, I will always have your back as your President, so you can be yourself and reach your God-given potential. \\n\\nWhile it often appears that we never agree, that isnt true. I signed 80 bipartisan bills into law last year. From preventing government shutdowns to protecting Asian-Americans from still-too-common hate crimes to reforming military justice. \\n\\nAnd soon, well strengthen the Violence Against Women Act that I first wrote three decades ago. It is important for us to show the nation that we can come together and do big things. \\n\\nSo tonight Im offering a Unity Agenda for the Nation. Four big things we can do together. \\n\\nFirst, beat the opioid epidemic.', metadata={'source': '../../../state_of_the_union.txt', 'year': 2013}),\n",
" Document(page_content='Vice President Harris and I ran for office with a new economic vision for America. \\n\\nInvest in America. Educate Americans. Grow the workforce. Build the economy from the bottom up \\nand the middle out, not from the top down. \\n\\nBecause we know that when the middle class grows, the poor have a ladder up and the wealthy do very well. \\n\\nAmerica used to have the best roads, bridges, and airports on Earth. \\n\\nNow our infrastructure is ranked 13th in the world. \\n\\nWe wont be able to compete for the jobs of the 21st Century if we dont fix that. \\n\\nThats why it was so important to pass the Bipartisan Infrastructure Law—the most sweeping investment to rebuild America in history. \\n\\nThis was a bipartisan effort, and I want to thank the members of both parties who worked to make it happen. \\n\\nWere done talking about infrastructure weeks. \\n\\nWere going to have an infrastructure decade.', metadata={'source': '../../../state_of_the_union.txt', 'year': 2013}),\n",
" Document(page_content='It is going to transform America and put us on a path to win the economic competition of the 21st Century that we face with the rest of the world—particularly with China. \\n\\nAs Ive told Xi Jinping, it is never a good bet to bet against the American people. \\n\\nWell create good jobs for millions of Americans, modernizing roads, airports, ports, and waterways all across America. \\n\\nAnd well do it all to withstand the devastating effects of the climate crisis and promote environmental justice. \\n\\nWell build a national network of 500,000 electric vehicle charging stations, begin to replace poisonous lead pipes—so every child—and every American—has clean water to drink at home and at school, provide affordable high-speed internet for every American—urban, suburban, rural, and tribal communities. \\n\\n4,000 projects have already been announced. \\n\\nAnd tonight, Im announcing that this year we will start fixing over 65,000 miles of highway and 1,500 bridges in disrepair.', metadata={'source': '../../../state_of_the_union.txt', 'year': 2013})]"
]
},
"execution_count": 12,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"db.similarity_search('What did the president say about Ketanji Brown Jackson', filter={'year': 2013})"
]
@@ -151,9 +271,23 @@
},
{
"cell_type": "code",
"execution_count": null,
"execution_count": 13,
"metadata": {},
"outputs": [],
"outputs": [
{
"data": {
"text/plain": [
"[Document(page_content='Tonight. I call on the Senate to: Pass the Freedom to Vote Act. Pass the John Lewis Voting Rights Act. And while youre at it, pass the Disclose Act so Americans can know who is funding our elections. \\n\\nTonight, Id like to honor someone who has dedicated his life to serve this country: Justice Stephen Breyer—an Army veteran, Constitutional scholar, and retiring Justice of the United States Supreme Court. Justice Breyer, thank you for your service. \\n\\nOne of the most serious constitutional responsibilities a President has is nominating someone to serve on the United States Supreme Court. \\n\\nAnd I did that 4 days ago, when I nominated Circuit Court of Appeals Judge Ketanji Brown Jackson. One of our nations top legal minds, who will continue Justice Breyers legacy of excellence.', metadata={'source': '../../../state_of_the_union.txt', 'year': 2012}),\n",
" Document(page_content='A former top litigator in private practice. A former federal public defender. And from a family of public school educators and police officers. A consensus builder. Since shes been nominated, shes received a broad range of support—from the Fraternal Order of Police to former judges appointed by Democrats and Republicans. \\n\\nAnd if we are to advance liberty and justice, we need to secure the Border and fix the immigration system. \\n\\nWe can do both. At our border, weve installed new technology like cutting-edge scanners to better detect drug smuggling. \\n\\nWeve set up joint patrols with Mexico and Guatemala to catch more human traffickers. \\n\\nWere putting in place dedicated immigration judges so families fleeing persecution and violence can have their cases heard faster. \\n\\nWere securing commitments and supporting partners in South and Central America to host more refugees and secure their own borders.', metadata={'source': '../../../state_of_the_union.txt', 'year': 2013}),\n",
" Document(page_content='And for our LGBTQ+ Americans, lets finally get the bipartisan Equality Act to my desk. The onslaught of state laws targeting transgender Americans and their families is wrong. \\n\\nAs I said last year, especially to our younger transgender Americans, I will always have your back as your President, so you can be yourself and reach your God-given potential. \\n\\nWhile it often appears that we never agree, that isnt true. I signed 80 bipartisan bills into law last year. From preventing government shutdowns to protecting Asian-Americans from still-too-common hate crimes to reforming military justice. \\n\\nAnd soon, well strengthen the Violence Against Women Act that I first wrote three decades ago. It is important for us to show the nation that we can come together and do big things. \\n\\nSo tonight Im offering a Unity Agenda for the Nation. Four big things we can do together. \\n\\nFirst, beat the opioid epidemic.', metadata={'source': '../../../state_of_the_union.txt', 'year': 2013}),\n",
" Document(page_content='Tonight, Im announcing a crackdown on these companies overcharging American businesses and consumers. \\n\\nAnd as Wall Street firms take over more nursing homes, quality in those homes has gone down and costs have gone up. \\n\\nThat ends on my watch. \\n\\nMedicare is going to set higher standards for nursing homes and make sure your loved ones get the care they deserve and expect. \\n\\nWell also cut costs and keep the economy going strong by giving workers a fair shot, provide more training and apprenticeships, hire them based on their skills not degrees. \\n\\nLets pass the Paycheck Fairness Act and paid leave. \\n\\nRaise the minimum wage to $15 an hour and extend the Child Tax Credit, so no one has to raise a family in poverty. \\n\\nLets increase Pell Grants and increase our historic support of HBCUs, and invest in what Jill—our First Lady who teaches full-time—calls Americas best-kept secret: community colleges.', metadata={'source': '../../../state_of_the_union.txt', 'year': 2014})]"
]
},
"execution_count": 13,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"db.similarity_search('What did the president say about Ketanji Brown Jackson?', distance_metric='cos')"
]
@@ -169,9 +303,23 @@
},
{
"cell_type": "code",
"execution_count": null,
"execution_count": 14,
"metadata": {},
"outputs": [],
"outputs": [
{
"data": {
"text/plain": [
"[Document(page_content='Tonight. I call on the Senate to: Pass the Freedom to Vote Act. Pass the John Lewis Voting Rights Act. And while youre at it, pass the Disclose Act so Americans can know who is funding our elections. \\n\\nTonight, Id like to honor someone who has dedicated his life to serve this country: Justice Stephen Breyer—an Army veteran, Constitutional scholar, and retiring Justice of the United States Supreme Court. Justice Breyer, thank you for your service. \\n\\nOne of the most serious constitutional responsibilities a President has is nominating someone to serve on the United States Supreme Court. \\n\\nAnd I did that 4 days ago, when I nominated Circuit Court of Appeals Judge Ketanji Brown Jackson. One of our nations top legal minds, who will continue Justice Breyers legacy of excellence.', metadata={'source': '../../../state_of_the_union.txt', 'year': 2012}),\n",
" Document(page_content='One was stationed at bases and breathing in toxic smoke from “burn pits” that incinerated wastes of war—medical and hazard material, jet fuel, and more. \\n\\nWhen they came home, many of the worlds fittest and best trained warriors were never the same. \\n\\nHeadaches. Numbness. Dizziness. \\n\\nA cancer that would put them in a flag-draped coffin. \\n\\nI know. \\n\\nOne of those soldiers was my son Major Beau Biden. \\n\\nWe dont know for sure if a burn pit was the cause of his brain cancer, or the diseases of so many of our troops. \\n\\nBut Im committed to finding out everything we can. \\n\\nCommitted to military families like Danielle Robinson from Ohio. \\n\\nThe widow of Sergeant First Class Heath Robinson. \\n\\nHe was born a soldier. Army National Guard. Combat medic in Kosovo and Iraq. \\n\\nStationed near Baghdad, just yards from burn pits the size of football fields. \\n\\nHeaths widow Danielle is here with us tonight. They loved going to Ohio State football games. He loved building Legos with their daughter.', metadata={'source': '../../../state_of_the_union.txt', 'year': 2014}),\n",
" Document(page_content='As Ohio Senator Sherrod Brown says, “Its time to bury the label “Rust Belt.” \\n\\nIts time. \\n\\nBut with all the bright spots in our economy, record job growth and higher wages, too many families are struggling to keep up with the bills. \\n\\nInflation is robbing them of the gains they might otherwise feel. \\n\\nI get it. Thats why my top priority is getting prices under control. \\n\\nLook, our economy roared back faster than most predicted, but the pandemic meant that businesses had a hard time hiring enough workers to keep up production in their factories. \\n\\nThe pandemic also disrupted global supply chains. \\n\\nWhen factories close, it takes longer to make goods and get them from the warehouse to the store, and prices go up. \\n\\nLook at cars. \\n\\nLast year, there werent enough semiconductors to make all the cars that people wanted to buy. \\n\\nAnd guess what, prices of automobiles went up. \\n\\nSo—we have a choice. \\n\\nOne way to fight inflation is to drive down wages and make Americans poorer.', metadata={'source': '../../../state_of_the_union.txt', 'year': 2012}),\n",
" Document(page_content='We cant change how divided weve been. But we can change how we move forward—on COVID-19 and other issues we must face together. \\n\\nI recently visited the New York City Police Department days after the funerals of Officer Wilbert Mora and his partner, Officer Jason Rivera. \\n\\nThey were responding to a 9-1-1 call when a man shot and killed them with a stolen gun. \\n\\nOfficer Mora was 27 years old. \\n\\nOfficer Rivera was 22. \\n\\nBoth Dominican Americans whod grown up on the same streets they later chose to patrol as police officers. \\n\\nI spoke with their families and told them that we are forever in debt for their sacrifice, and we will carry on their mission to restore the trust and safety every community deserves. \\n\\nIve worked on these issues a long time. \\n\\nI know what works: Investing in crime preventionand community police officers wholl walk the beat, wholl know the neighborhood, and who can restore trust and safety.', metadata={'source': '../../../state_of_the_union.txt', 'year': 2012})]"
]
},
"execution_count": 14,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"db.max_marginal_relevance_search('What did the president say about Ketanji Brown Jackson?')"
]
@@ -187,21 +335,87 @@
},
{
"cell_type": "code",
"execution_count": null,
"execution_count": 15,
"metadata": {},
"outputs": [],
"source": [
"!activeloop login -t <token>"
"os.environ['ACTIVELOOP_TOKEN'] = getpass.getpass('Activeloop Token:')"
]
},
{
"cell_type": "code",
"execution_count": null,
"execution_count": 17,
"metadata": {},
"outputs": [],
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Your Deep Lake dataset has been successfully created!\n",
"The dataset is private so make sure you are logged in!\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"\\"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"This dataset can be visualized in Jupyter Notebook by ds.visualize() or at https://app.activeloop.ai/davitbun/linkedin\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
" \r"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"hub://davitbun/linkedin loaded successfully.\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"Evaluating ingest: 100%|██████████| 1/1 [00:23<00:00\n",
"/"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"Dataset(path='hub://davitbun/linkedin', tensors=['embedding', 'ids', 'metadata', 'text'])\n",
"\n",
" tensor htype shape dtype compression\n",
" ------- ------- ------- ------- ------- \n",
" embedding generic (42, 1536) float32 None \n",
" ids text (42, 1) str None \n",
" metadata json (42, 1) str None \n",
" text text (42, 1) str None \n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
" \r"
]
}
],
"source": [
"# Embed and store the texts\n",
"dataset_path = \"hub://{username}/{dataset_name}\" # could be also ./local/path (much faster locally), s3://bucket/path/to/dataset, gcs://path/to/dataset, etc.\n",
"dataset_path = f\"hub://{USERNAME}/{DATASET_NAME}\" # could be also ./local/path (much faster locally), s3://bucket/path/to/dataset, gcs://path/to/dataset, etc.\n",
"\n",
"embedding = OpenAIEmbeddings()\n",
"vectordb = DeepLake.from_documents(documents=docs, embedding=embedding, dataset_path=dataset_path)"
@@ -209,9 +423,23 @@
},
{
"cell_type": "code",
"execution_count": null,
"execution_count": 18,
"metadata": {},
"outputs": [],
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Tonight. I call on the Senate to: Pass the Freedom to Vote Act. Pass the John Lewis Voting Rights Act. And while youre at it, pass the Disclose Act so Americans can know who is funding our elections. \n",
"\n",
"Tonight, Id like to honor someone who has dedicated his life to serve this country: Justice Stephen Breyer—an Army veteran, Constitutional scholar, and retiring Justice of the United States Supreme Court. Justice Breyer, thank you for your service. \n",
"\n",
"One of the most serious constitutional responsibilities a President has is nominating someone to serve on the United States Supreme Court. \n",
"\n",
"And I did that 4 days ago, when I nominated Circuit Court of Appeals Judge Ketanji Brown Jackson. One of our nations top legal minds, who will continue Justice Breyers legacy of excellence.\n"
]
}
],
"source": [
"query = \"What did the president say about Ketanji Brown Jackson\"\n",
"docs = db.similarity_search(query)\n",
@@ -220,11 +448,35 @@
},
{
"cell_type": "code",
"execution_count": null,
"execution_count": 19,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Dataset(path='hub://davitbun/linkedin', tensors=['embedding', 'ids', 'metadata', 'text'])\n",
"\n",
" tensor htype shape dtype compression\n",
" ------- ------- ------- ------- ------- \n",
" embedding generic (42, 1536) float32 None \n",
" ids text (42, 1) str None \n",
" metadata json (42, 1) str None \n",
" text text (42, 1) str None \n"
]
}
],
"source": [
"vectordb.ds.summary()"
]
},
{
"cell_type": "code",
"execution_count": 20,
"metadata": {},
"outputs": [],
"source": [
"vectordb.ds.summary()"
"embeddings = vectordb.ds.embedding.numpy()"
]
},
{
@@ -232,9 +484,7 @@
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"embeddings = vectordb.ds.embedding.numpy()"
]
"source": []
}
],
"metadata": {

View File

@@ -55,7 +55,7 @@
},
"outputs": [],
"source": [
"docsearch = OpenSearchVectorSearch.from_texts(texts, embeddings, opensearch_url=\"http://localhost:9200\")\n",
"docsearch = OpenSearchVectorSearch.from_documents(docs, embeddings, opensearch_url=\"http://localhost:9200\")\n",
"\n",
"query = \"What did the president say about Ketanji Brown Jackson\"\n",
"docs = docsearch.similarity_search(query)"
@@ -94,7 +94,7 @@
},
"outputs": [],
"source": [
"docsearch = OpenSearchVectorSearch.from_texts(texts, embeddings, opensearch_url=\"http://localhost:9200\", engine=\"faiss\", space_type=\"innerproduct\", ef_construction=256, m=48)\n",
"docsearch = OpenSearchVectorSearch.from_documents(docs, embeddings, opensearch_url=\"http://localhost:9200\", engine=\"faiss\", space_type=\"innerproduct\", ef_construction=256, m=48)\n",
"\n",
"query = \"What did the president say about Ketanji Brown Jackson\"\n",
"docs = docsearch.similarity_search(query)"
@@ -133,7 +133,7 @@
},
"outputs": [],
"source": [
"docsearch = OpenSearchVectorSearch.from_texts(texts, embeddings, opensearch_url=\"http://localhost:9200\", is_appx_search=False)\n",
"docsearch = OpenSearchVectorSearch.from_documents(docs, embeddings, opensearch_url=\"http://localhost:9200\", is_appx_search=False)\n",
"\n",
"query = \"What did the president say about Ketanji Brown Jackson\"\n",
"docs = docsearch.similarity_search(\"What did the president say about Ketanji Brown Jackson\", k=1, search_type=\"script_scoring\")"
@@ -172,10 +172,10 @@
},
"outputs": [],
"source": [
"docsearch = OpenSearchVectorSearch.from_texts(texts, embeddings, opensearch_url=\"http://localhost:9200\", is_appx_search=False)\n",
"docsearch = OpenSearchVectorSearch.from_documents(docs, embeddings, opensearch_url=\"http://localhost:9200\", is_appx_search=False)\n",
"filter = {\"bool\": {\"filter\": {\"term\": {\"text\": \"smuggling\"}}}}\n",
"query = \"What did the president say about Ketanji Brown Jackson\"\n",
"docs = docsearch.similarity_search(\"What did the president say about Ketanji Brown Jackson\", search_type=\"painless_scripting\", space_type=\"cosinesimil\", pre_filter=filter)"
"docs = docsearch.similarity_search(\"What did the president say about Ketanji Brown Jackson\", search_type=\"painless_scripting\", space_type=\"cosineSimilarity\", pre_filter=filter)"
]
},
{
@@ -238,4 +238,4 @@
},
"nbformat": 4,
"nbformat_minor": 5
}
}
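The repeated change above swaps `from_texts(texts, ...)` for `from_documents(docs, ...)`. A minimal sketch of how such a `docs` list is typically produced (the file path and splitter settings are assumptions):

```python
from langchain.document_loaders import TextLoader
from langchain.embeddings import OpenAIEmbeddings
from langchain.text_splitter import CharacterTextSplitter

# Load a file and split it into Document objects, which is what from_documents expects
documents = TextLoader("../../state_of_the_union.txt").load()
docs = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0).split_documents(documents)
embeddings = OpenAIEmbeddings()
```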

View File

@@ -139,7 +139,7 @@
}
],
"source": [
"llm_chain.predict(human_input=\"Not to bad - how are you?\")"
"llm_chain.predict(human_input=\"Not too bad - how are you?\")"
]
},
{

View File

@@ -16,7 +16,7 @@
"In order to add a memory with an external message store to an agent we are going to do the following steps:\n",
"\n",
"1. We are going to create a `RedisChatMessageHistory` to connect to an external database to store the messages in.\n",
"2. We are going to create an `LLMChain` useing that chat history as memory.\n",
"2. We are going to create an `LLMChain` using that chat history as memory.\n",
"3. We are going to use that `LLMChain` to create a custom Agent.\n",
"\n",
"For the purposes of this exercise, we are going to create a simple custom Agent that has access to a search tool and utilizes the `ConversationBufferMemory` class."

View File

@@ -0,0 +1,62 @@
{
"cells": [
{
"attachments": {},
"cell_type": "markdown",
"id": "91c6a7ef",
"metadata": {},
"source": [
"# Postgres Chat Message History\n",
"\n",
"This notebook goes over how to use Postgres to store chat message history."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "d15e3302",
"metadata": {},
"outputs": [],
"source": [
"from langchain.memory import PostgresChatMessageHistory\n",
"\n",
"history = PostgresChatMessageHistory(connection_string=\"postgresql://postgres:mypassword@localhost/chat_history\", session_id=\"foo\")\n",
"\n",
"history.add_user_message(\"hi!\")\n",
"\n",
"history.add_ai_message(\"whats up?\")"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "64fc465e",
"metadata": {},
"outputs": [],
"source": [
"history.messages"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.2"
}
},
"nbformat": 4,
"nbformat_minor": 5
}

View File

@@ -314,7 +314,7 @@
"source": [
"## Saving Message History\n",
"\n",
"You may often to save messages, and then load them to use again. This can be done easily by first converting the messages to normal python dictionaries, saving those (as json or something) and then loading those. Here is an example of doing that."
"You may often have to save messages, and then load them to use again. This can be done easily by first converting the messages to normal python dictionaries, saving those (as json or something) and then loading those. Here is an example of doing that."
]
},
{
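A minimal sketch of that round trip, assuming a `history` object exposing a `.messages` list of chat messages:

```python
import json

from langchain.schema import messages_from_dict, messages_to_dict

# Convert the messages to plain dicts and persist them as JSON
with open("messages.json", "w") as f:
    json.dump(messages_to_dict(history.messages), f)

# Later: load the dicts and rebuild the message objects
with open("messages.json") as f:
    restored = messages_from_dict(json.load(f))
```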

View File

@@ -0,0 +1,368 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "ff4be5f3",
"metadata": {},
"source": [
"# VectorStore-Backed Memory\n",
"\n",
"`VectorStoreRetrieverMemory` stores memories in a VectorDB and queries the top-K most \"salient\" docs every time it is called.\n",
"\n",
"This differs from most of the other Memory classes in that it doesn't explicitly track the order of interactions.\n",
"\n",
"In this case, the \"docs\" are previous conversation snippets. This can be useful to refer to relevant pieces of information that the AI was told earlier in the conversation."
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "da3384db",
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"from datetime import datetime\n",
"from langchain.embeddings.openai import OpenAIEmbeddings\n",
"from langchain.llms import OpenAI\n",
"from langchain.memory import VectorStoreRetrieverMemory\n",
"from langchain.chains import ConversationChain\n",
"from langchain.prompts import PromptTemplate"
]
},
{
"cell_type": "markdown",
"id": "c2e7abdf",
"metadata": {},
"source": [
"### Initialize your VectorStore\n",
"\n",
"Depending on the store you choose, this step may look different. Consult the relevant VectorStore documentation for more details."
]
},
{
"cell_type": "code",
"execution_count": 29,
"id": "eef56f65",
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"import faiss\n",
"\n",
"from langchain.docstore import InMemoryDocstore\n",
"from langchain.vectorstores import FAISS\n",
"\n",
"\n",
"embedding_size = 1536 # Dimensions of the OpenAIEmbeddings\n",
"index = faiss.IndexFlatL2(embedding_size)\n",
"embedding_fn = OpenAIEmbeddings().embed_query\n",
"vectorstore = FAISS(embedding_fn, index, InMemoryDocstore({}), {})"
]
},
{
"cell_type": "markdown",
"id": "8f4bdf92",
"metadata": {},
"source": [
"### Create your the VectorStoreRetrieverMemory\n",
"\n",
"The memory object is instantiated from any VectorStoreRetriever."
]
},
{
"cell_type": "code",
"execution_count": 30,
"id": "e00d4938",
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"# In actual usage, you would set `k` to be a higher value, but we use k=1 to show that\n",
"# the vector lookup still returns the semantically relevant information\n",
"retriever = vectorstore.as_retriever(search_kwargs=dict(k=1))\n",
"memory = VectorStoreRetrieverMemory(retriever=retriever)\n",
"\n",
"# When added to an agent, the memory object can save pertinent information from conversations or used tools\n",
"memory.save_context({\"input\": \"My favorite food is pizza\"}, {\"output\": \"thats good to know\"})\n",
"memory.save_context({\"input\": \"My favorite sport is soccer\"}, {\"output\": \"...\"})\n",
"memory.save_context({\"input\": \"I don't the Celtics\"}, {\"output\": \"ok\"}) # "
]
},
{
"cell_type": "code",
"execution_count": 31,
"id": "2fe28a28",
"metadata": {
"tags": []
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"input: My favorite sport is soccer\n",
"output: ...\n"
]
}
],
"source": [
"# Notice the first result returned is the memory pertaining to tax help, which the language model deems more semantically relevant\n",
"# to a 1099 than the other documents, despite them both containing numbers.\n",
"print(memory.load_memory_variables({\"prompt\": \"what sport should i watch?\"})[\"history\"])"
]
},
{
"cell_type": "markdown",
"id": "a6d2569f",
"metadata": {},
"source": [
"## Using in a chain\n",
"Let's walk through an example, again setting `verbose=True` so we can see the prompt."
]
},
{
"cell_type": "code",
"execution_count": 32,
"id": "ebd68c10",
"metadata": {
"tags": []
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"\n",
"\u001b[1m> Entering new ConversationChain chain...\u001b[0m\n",
"Prompt after formatting:\n",
"\u001b[32;1m\u001b[1;3mThe following is a friendly conversation between a human and an AI. The AI is talkative and provides lots of specific details from its context. If the AI does not know the answer to a question, it truthfully says it does not know.\n",
"\n",
"Relevant pieces of previous conversation:\n",
"input: My favorite food is pizza\n",
"output: thats good to know\n",
"\n",
"(You do not need to use these pieces of information if not relevant)\n",
"\n",
"Current conversation:\n",
"Human: Hi, my name is Perry, what's up?\n",
"AI:\u001b[0m\n",
"\n",
"\u001b[1m> Finished chain.\u001b[0m\n"
]
},
{
"data": {
"text/plain": [
"\" Hi Perry, I'm doing well. How about you?\""
]
},
"execution_count": 32,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"llm = OpenAI(temperature=0) # Can be any valid LLM\n",
"_DEFAULT_TEMPLATE = \"\"\"The following is a friendly conversation between a human and an AI. The AI is talkative and provides lots of specific details from its context. If the AI does not know the answer to a question, it truthfully says it does not know.\n",
"\n",
"Relevant pieces of previous conversation:\n",
"{history}\n",
"\n",
"(You do not need to use these pieces of information if not relevant)\n",
"\n",
"Current conversation:\n",
"Human: {input}\n",
"AI:\"\"\"\n",
"PROMPT = PromptTemplate(\n",
" input_variables=[\"history\", \"input\"], template=_DEFAULT_TEMPLATE\n",
")\n",
"conversation_with_summary = ConversationChain(\n",
" llm=llm, \n",
" prompt=PROMPT,\n",
" # We set a very low max_token_limit for the purposes of testing.\n",
" memory=memory,\n",
" verbose=True\n",
")\n",
"conversation_with_summary.predict(input=\"Hi, my name is Perry, what's up?\")"
]
},
{
"cell_type": "code",
"execution_count": 33,
"id": "86207a61",
"metadata": {
"tags": []
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"\n",
"\u001b[1m> Entering new ConversationChain chain...\u001b[0m\n",
"Prompt after formatting:\n",
"\u001b[32;1m\u001b[1;3mThe following is a friendly conversation between a human and an AI. The AI is talkative and provides lots of specific details from its context. If the AI does not know the answer to a question, it truthfully says it does not know.\n",
"\n",
"Relevant pieces of previous conversation:\n",
"input: My favorite sport is soccer\n",
"output: ...\n",
"\n",
"(You do not need to use these pieces of information if not relevant)\n",
"\n",
"Current conversation:\n",
"Human: what's my favorite sport?\n",
"AI:\u001b[0m\n",
"\n",
"\u001b[1m> Finished chain.\u001b[0m\n"
]
},
{
"data": {
"text/plain": [
"' You told me earlier that your favorite sport is soccer.'"
]
},
"execution_count": 33,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Here, the basketball related content is surfaced\n",
"conversation_with_summary.predict(input=\"what's my favorite sport?\")"
]
},
{
"cell_type": "code",
"execution_count": 34,
"id": "8c669db1",
"metadata": {
"tags": []
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"\n",
"\u001b[1m> Entering new ConversationChain chain...\u001b[0m\n",
"Prompt after formatting:\n",
"\u001b[32;1m\u001b[1;3mThe following is a friendly conversation between a human and an AI. The AI is talkative and provides lots of specific details from its context. If the AI does not know the answer to a question, it truthfully says it does not know.\n",
"\n",
"Relevant pieces of previous conversation:\n",
"input: My favorite food is pizza\n",
"output: thats good to know\n",
"\n",
"(You do not need to use these pieces of information if not relevant)\n",
"\n",
"Current conversation:\n",
"Human: Whats my favorite food\n",
"AI:\u001b[0m\n",
"\n",
"\u001b[1m> Finished chain.\u001b[0m\n"
]
},
{
"data": {
"text/plain": [
"' You said your favorite food is pizza.'"
]
},
"execution_count": 34,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Even though the language model is stateless, since relavent memory is fetched, it can \"reason\" about the time.\n",
"# Timestamping memories and data is useful in general to let the agent determine temporal relevance\n",
"conversation_with_summary.predict(input=\"Whats my favorite food\")"
]
},
{
"cell_type": "code",
"execution_count": 35,
"id": "8c09a239",
"metadata": {
"tags": []
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"\n",
"\u001b[1m> Entering new ConversationChain chain...\u001b[0m\n",
"Prompt after formatting:\n",
"\u001b[32;1m\u001b[1;3mThe following is a friendly conversation between a human and an AI. The AI is talkative and provides lots of specific details from its context. If the AI does not know the answer to a question, it truthfully says it does not know.\n",
"\n",
"Relevant pieces of previous conversation:\n",
"input: Hi, my name is Perry, what's up?\n",
"response: Hi Perry, I'm doing well. How about you?\n",
"\n",
"(You do not need to use these pieces of information if not relevant)\n",
"\n",
"Current conversation:\n",
"Human: What's my name?\n",
"AI:\u001b[0m\n",
"\n",
"\u001b[1m> Finished chain.\u001b[0m\n"
]
},
{
"data": {
"text/plain": [
"' Your name is Perry.'"
]
},
"execution_count": 35,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# The memories from the conversation are automatically stored,\n",
"# since this query best matches the introduction chat above,\n",
"# the agent is able to 'remember' the user's name.\n",
"conversation_with_summary.predict(input=\"What's my name?\")"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "df27c7dc",
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.1"
}
},
"nbformat": 4,
"nbformat_minor": 5
}
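The notebook imports `datetime` but never uses it; the comment about timestamping memories hints at the intended pattern. A minimal sketch, assuming the `memory` object from above:

```python
from datetime import datetime

# Prepend a timestamp so retrieved snippets carry temporal context
now = datetime.now().strftime("%Y-%m-%d %H:%M")
memory.save_context(
    {"input": f"[{now}] My favorite color is blue"},
    {"output": "noted"},
)
```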

View File

@@ -115,7 +115,7 @@
"id": "a2d76826",
"metadata": {},
"source": [
"**The above request should now appear on your [PromptLayer dashboard](https://ww.promptlayer.com).**"
"**The above request should now appear on your [PromptLayer dashboard](https://www.promptlayer.com).**"
]
},
{

View File

@@ -60,14 +60,14 @@
"name": "stdout",
"output_type": "stream",
"text": [
"CPU times: user 30.7 ms, sys: 18.6 ms, total: 49.3 ms\n",
"Wall time: 791 ms\n"
"CPU times: user 14.2 ms, sys: 4.9 ms, total: 19.1 ms\n",
"Wall time: 1.1 s\n"
]
},
{
"data": {
"text/plain": [
"\"\\n\\nWhy couldn't the bicycle stand up by itself? Because it was...two tired!\""
"'\\n\\nWhy did the chicken cross the road?\\n\\nTo get to the other side.'"
]
},
"execution_count": 4,
@@ -91,14 +91,14 @@
"name": "stdout",
"output_type": "stream",
"text": [
"CPU times: user 80 µs, sys: 0 ns, total: 80 µs\n",
"Wall time: 83.9 µs\n"
"CPU times: user 162 µs, sys: 7 µs, total: 169 µs\n",
"Wall time: 175 µs\n"
]
},
{
"data": {
"text/plain": [
"\"\\n\\nWhy couldn't the bicycle stand up by itself? Because it was...two tired!\""
"'\\n\\nWhy did the chicken cross the road?\\n\\nTo get to the other side.'"
]
},
"execution_count": 5,
@@ -252,6 +252,249 @@
"llm(\"Tell me a joke\")"
]
},
{
"cell_type": "markdown",
"id": "684eab55",
"metadata": {},
"source": [
"## GPTCache\n",
"\n",
"We can use [GPTCache](https://github.com/zilliztech/GPTCache) for exact match caching OR to cache results based on semantic similarity\n",
"\n",
"Let's first start with an example of exact match"
]
},
{
"cell_type": "code",
"execution_count": 6,
"id": "14a82124",
"metadata": {},
"outputs": [],
"source": [
"import gptcache\n",
"from gptcache.processor.pre import get_prompt\n",
"from gptcache.manager.factory import get_data_manager\n",
"from langchain.cache import GPTCache\n",
"\n",
"# Avoid multiple caches using the same file, causing different llm model caches to affect each other\n",
"i = 0\n",
"file_prefix = \"data_map\"\n",
"\n",
"def init_gptcache_map(cache_obj: gptcache.Cache):\n",
" global i\n",
" cache_path = f'{file_prefix}_{i}.txt'\n",
" cache_obj.init(\n",
" pre_embedding_func=get_prompt,\n",
" data_manager=get_data_manager(data_path=cache_path),\n",
" )\n",
" i += 1\n",
"\n",
"langchain.llm_cache = GPTCache(init_gptcache_map)"
]
},
{
"cell_type": "code",
"execution_count": 7,
"id": "9e4ecfd1",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"CPU times: user 8.6 ms, sys: 3.82 ms, total: 12.4 ms\n",
"Wall time: 881 ms\n"
]
},
{
"data": {
"text/plain": [
"'\\n\\nWhy did the chicken cross the road?\\n\\nTo get to the other side.'"
]
},
"execution_count": 7,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"%%time\n",
"# The first time, it is not yet in cache, so it should take longer\n",
"llm(\"Tell me a joke\")"
]
},
{
"cell_type": "code",
"execution_count": 8,
"id": "c98bbe3b",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"CPU times: user 286 µs, sys: 21 µs, total: 307 µs\n",
"Wall time: 316 µs\n"
]
},
{
"data": {
"text/plain": [
"'\\n\\nWhy did the chicken cross the road?\\n\\nTo get to the other side.'"
]
},
"execution_count": 8,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"%%time\n",
"# The second time it is, so it goes faster\n",
"llm(\"Tell me a joke\")"
]
},
{
"cell_type": "markdown",
"id": "502b6076",
"metadata": {},
"source": [
"Let's now show an example of similarity caching"
]
},
{
"cell_type": "code",
"execution_count": 9,
"id": "b3c663bb",
"metadata": {},
"outputs": [],
"source": [
"import gptcache\n",
"from gptcache.processor.pre import get_prompt\n",
"from gptcache.manager.factory import get_data_manager\n",
"from langchain.cache import GPTCache\n",
"from gptcache.manager import get_data_manager, CacheBase, VectorBase\n",
"from gptcache import Cache\n",
"from gptcache.embedding import Onnx\n",
"from gptcache.similarity_evaluation.distance import SearchDistanceEvaluation\n",
"\n",
"# Avoid multiple caches using the same file, causing different llm model caches to affect each other\n",
"i = 0\n",
"file_prefix = \"data_map\"\n",
"llm_cache = Cache()\n",
"\n",
"\n",
"def init_gptcache_map(cache_obj: gptcache.Cache):\n",
" global i\n",
" cache_path = f'{file_prefix}_{i}.txt'\n",
" onnx = Onnx()\n",
" cache_base = CacheBase('sqlite')\n",
" vector_base = VectorBase('faiss', dimension=onnx.dimension)\n",
" data_manager = get_data_manager(cache_base, vector_base, max_size=10, clean_size=2)\n",
" cache_obj.init(\n",
" pre_embedding_func=get_prompt,\n",
" embedding_func=onnx.to_embeddings,\n",
" data_manager=data_manager,\n",
" similarity_evaluation=SearchDistanceEvaluation(),\n",
" )\n",
" i += 1\n",
"\n",
"langchain.llm_cache = GPTCache(init_gptcache_map)"
]
},
{
"cell_type": "code",
"execution_count": 10,
"id": "8c273ced",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"CPU times: user 1.01 s, sys: 153 ms, total: 1.16 s\n",
"Wall time: 2.49 s\n"
]
},
{
"data": {
"text/plain": [
"'\\n\\nWhy did the chicken cross the road?\\n\\nTo get to the other side.'"
]
},
"execution_count": 10,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"%%time\n",
"# The first time, it is not yet in cache, so it should take longer\n",
"llm(\"Tell me a joke\")"
]
},
{
"cell_type": "code",
"execution_count": 11,
"id": "93e21a5f",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"CPU times: user 745 ms, sys: 13.2 ms, total: 758 ms\n",
"Wall time: 136 ms\n"
]
},
{
"data": {
"text/plain": [
"'\\n\\nWhy did the chicken cross the road?\\n\\nTo get to the other side.'"
]
},
"execution_count": 11,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"%%time\n",
"# This is an exact match, so it finds it in the cache\n",
"llm(\"Tell me a joke\")"
]
},
{
"cell_type": "code",
"execution_count": 12,
"id": "c4bb024b",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"CPU times: user 737 ms, sys: 7.79 ms, total: 745 ms\n",
"Wall time: 135 ms\n"
]
},
{
"data": {
"text/plain": [
"'\\n\\nWhy did the chicken cross the road?\\n\\nTo get to the other side.'"
]
},
"execution_count": 12,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"%%time\n",
"# This is not an exact match, but semantically within distance so it hits!\n",
"llm(\"Tell me joke\")"
]
},
{
"cell_type": "markdown",
"id": "934943dc",

View File

@@ -43,22 +43,18 @@
"name": "stdout",
"output_type": "stream",
"text": [
"Total Tokens: 39\n",
"Prompt Tokens: 4\n",
"Completion Tokens: 35\n",
"Tokens Used: 42\n",
"\tPrompt Tokens: 4\n",
"\tCompletion Tokens: 38\n",
"Successful Requests: 1\n",
"Total Cost (USD): $0.0007800000000000001\n"
"Total Cost (USD): $0.00084\n"
]
}
],
"source": [
"with get_openai_callback() as cb:\n",
" result = llm(\"Tell me a joke\")\n",
" print(f\"Total Tokens: {cb.total_tokens}\")\n",
" print(f\"Prompt Tokens: {cb.prompt_tokens}\")\n",
" print(f\"Completion Tokens: {cb.completion_tokens}\")\n",
" print(f\"Successful Requests: {cb.successful_requests}\")\n",
" print(f\"Total Cost (USD): ${cb.total_cost}\")"
" print(cb)"
]
},
{
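Everything inside the context manager is aggregated, so the same pattern covers multi-call chains and agents. A minimal sketch, assuming an OpenAI `llm` as above:

```python
from langchain.callbacks import get_openai_callback

with get_openai_callback() as cb:
    llm("Tell me a joke")
    llm("Tell me another joke")

print(cb.total_tokens)  # totals across both calls
```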

View File

@@ -186,7 +186,7 @@
"source": [
"**Number of Tokens:** You can also estimate how many tokens a piece of text will be in that model. This is useful because models have a context length (and cost more for more tokens), which means you need to be aware of how long the text you are passing in is.\n",
"\n",
"Notice that by default the tokens are estimated using a HuggingFace tokenizer."
"Notice that by default the tokens are estimated using [tiktoken](https://github.com/openai/tiktoken) (except for legacy version <3.8, where a Hugging Face tokenizer is used)"
]
},
{
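A minimal sketch of estimating the token count before sending a prompt (the model class used here is an assumption):

```python
from langchain.llms import OpenAI

llm = OpenAI()
# Counts tokens with tiktoken by default (Hugging Face tokenizer on Python <3.8)
print(llm.get_num_tokens("What a joyful day!"))
```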

View File

@@ -4,7 +4,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"# GPT4all\n",
"# GPT4All\n",
"\n",
"This example goes over how to use LangChain to interact with GPT4All models"
]
@@ -15,7 +15,7 @@
"metadata": {},
"outputs": [],
"source": [
"!pip install pyllamacpp"
"%pip install pyllamacpp > /dev/null"
]
},
{
@@ -24,8 +24,10 @@
"metadata": {},
"outputs": [],
"source": [
"from langchain import PromptTemplate, LLMChain\n",
"from langchain.llms import GPT4All\n",
"from langchain import PromptTemplate, LLMChain"
"from langchain.callbacks.base import CallbackManager\n",
"from langchain.callbacks.streaming_stdout import StreamingStdOutCallbackHandler"
]
},
{
@@ -41,15 +43,70 @@
"prompt = PromptTemplate(template=template, input_variables=[\"question\"])"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Specify Model\n",
"\n",
"To run locally, download a compatible ggml-formatted model. For more info, visit https://github.com/nomic-ai/pyllamacpp\n",
"\n",
"Note that new models are uploaded regularly - check the link above for the most recent `.bin` URL"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# You'll need to download a compatible model and convert it to ggml.\n",
"# See: https://github.com/nomic-ai/gpt4all for more information.\n",
"llm = GPT4All(model=\"./models/gpt4all-model.bin\")"
"local_path = './models/gpt4all-lora-quantized-ggml.bin' # replace with your desired local file path"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Uncomment the below block to download a model. You may want to update `url` to a new version."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# import requests\n",
"\n",
"# from pathlib import Path\n",
"# from tqdm import tqdm\n",
"\n",
"# Path(local_path).parent.mkdir(parents=True, exist_ok=True)\n",
"\n",
"# # Example model. Check https://github.com/nomic-ai/pyllamacpp for the latest models.\n",
"# url = 'https://the-eye.eu/public/AI/models/nomic-ai/gpt4all/gpt4all-lora-quantized-ggml.bin'\n",
"\n",
"# # send a GET request to the URL to download the file. Stream since it's large\n",
"# response = requests.get(url, stream=True)\n",
"\n",
"# # open the file in binary mode and write the contents of the response to it in chunks\n",
"# # This is a large file, so be prepared to wait.\n",
"# with open(local_path, 'wb') as f:\n",
"# for chunk in tqdm(response.iter_content(chunk_size=8192)):\n",
"# if chunk:\n",
"# f.write(chunk)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Callbacks support token-wise streaming\n",
"callback_manager = CallbackManager([StreamingStdOutCallbackHandler()])\n",
"# Verbose is required to pass to the callback manager\n",
"llm = GPT4All(model=local_path, callback_manager=callback_manager, verbose=True)"
]
},
{
@@ -89,7 +146,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.1"
"version": "3.11.2"
}
},
"nbformat": 4,

View File

@@ -12,7 +12,7 @@
},
{
"cell_type": "code",
"execution_count": 3,
"execution_count": 41,
"id": "3acf0069",
"metadata": {},
"outputs": [
@@ -20,7 +20,7 @@
"name": "stdout",
"output_type": "stream",
"text": [
"The Seattle Seahawks won the Super Bowl in 2010. Justin Beiber was born in 2010. The final answer: Seattle Seahawks.\n"
"The FIFA World Cup is a football tournament that is played every 4 years. The year 1994 was the 44th FIFA World Cup. The final answer: Brazil.\n"
]
}
],
@@ -33,7 +33,7 @@
"prompt = PromptTemplate(template=template, input_variables=[\"question\"])\n",
"llm_chain = LLMChain(prompt=prompt, llm=HuggingFaceHub(repo_id=\"google/flan-t5-xl\", model_kwargs={\"temperature\":0, \"max_length\":64}))\n",
"\n",
"question = \"What NFL team won the Super Bowl in the year Justin Beiber was born?\"\n",
"question = \"Who won the FIFA World Cup in the year 1994? \"\n",
"\n",
"print(llm_chain.run(question))"
]
@@ -41,7 +41,7 @@
{
"cell_type": "code",
"execution_count": null,
"id": "ae4559c7",
"id": "843a3837",
"metadata": {},
"outputs": [],
"source": []
@@ -63,7 +63,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.9"
"version": "3.8.12"
}
},
"nbformat": 4,

View File

@@ -55,6 +55,12 @@ The following use cases require specific installs and api keys:
- _LlamaCpp_:
- Install requirements with `pip install llama-cpp-python`
- Download model and convert following [llama.cpp instructions](https://github.com/ggerganov/llama.cpp)
- _Milvus_:
- Install requirements with `pip install pymilvus`
- To set up a local cluster, take a look [here](https://milvus.io/docs).
- _Zilliz_:
- Install requirements with `pip install pymilvus`
- To get up and running, take a look [here](https://zilliz.com/doc/quick_start).
If you are using the `NLTKTextSplitter` or the `SpacyTextSplitter`, you will also need to install the appropriate models. For example, if you want to use the `SpacyTextSplitter`, you will need to install the `en_core_web_sm` model with `python -m spacy download en_core_web_sm`. Similarly, if you want to use the `NLTKTextSplitter`, you will need to install the `punkt` model with `python -m nltk.downloader punkt`.
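A minimal usage sketch once the models are installed (`some_text` here is just placeholder input):

```python
from langchain.text_splitter import SpacyTextSplitter

# Placeholder input; replace with your own document text
some_text = "LangChain splits long documents into smaller chunks. " * 100

text_splitter = SpacyTextSplitter(chunk_size=1000)
docs = text_splitter.split_text(some_text)
print(f"Split into {len(docs)} chunks")
```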

View File

@@ -0,0 +1,565 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "517a9fd4",
"metadata": {},
"source": [
"# BabyAGI User Guide\n",
"\n",
"This notebook demonstrates how to implement [BabyAGI](https://github.com/yoheinakajima/babyagi/tree/main) by [Yohei Nakajima](https://twitter.com/yoheinakajima). BabyAGI is an AI agent that can generate and pretend to execute tasks based on a given objective.\n",
"\n",
"This guide will help you understand the components to create your own recursive agents.\n",
"\n",
"Although BabyAGI uses specific vectorstores/model providers (Pinecone, OpenAI), one of the benefits of implementing it with LangChain is that you can easily swap those out for different options. In this implementation we use a FAISS vectorstore (because it runs locally and is free)."
]
},
{
"cell_type": "markdown",
"id": "556af556",
"metadata": {},
"source": [
"## Install and Import Required Modules"
]
},
{
"cell_type": "code",
"execution_count": 116,
"id": "c8a354b6",
"metadata": {},
"outputs": [],
"source": [
"import os\n",
"from collections import deque\n",
"from typing import Dict, List, Optional, Any\n",
"\n",
"from langchain import LLMChain, OpenAI, PromptTemplate\n",
"from langchain.embeddings import OpenAIEmbeddings\n",
"from langchain.llms import BaseLLM\n",
"from langchain.vectorstores.base import VectorStore\n",
"from pydantic import BaseModel, Field\n",
"from langchain.chains.base import Chain"
]
},
{
"cell_type": "markdown",
"id": "09f70772",
"metadata": {},
"source": [
"## Connect to the Vector Store\n",
"\n",
"Depending on what vectorstore you use, this step may look different."
]
},
{
"cell_type": "code",
"execution_count": 71,
"id": "794045d4",
"metadata": {},
"outputs": [],
"source": [
"from langchain.vectorstores import FAISS\n",
"from langchain.docstore import InMemoryDocstore"
]
},
{
"cell_type": "code",
"execution_count": 72,
"id": "6e0305eb",
"metadata": {},
"outputs": [],
"source": [
"# Define your embedding model\n",
"embeddings_model = OpenAIEmbeddings()\n",
"# Initialize the vectorstore as empty\n",
"import faiss\n",
"\n",
"embedding_size = 1536\n",
"index = faiss.IndexFlatL2(embedding_size)\n",
"vectorstore = FAISS(embeddings_model.embed_query, index, InMemoryDocstore({}), {})"
]
},
{
"cell_type": "markdown",
"id": "0f3b72bf",
"metadata": {},
"source": [
"## Define the Chains\n",
"\n",
"BabyAGI relies on three LLM chains:\n",
"- Task creation chain to select new tasks to add to the list\n",
"- Task prioritization chain to re-prioritize tasks\n",
"- Execution Chain to execute the tasks"
]
},
{
"cell_type": "code",
"execution_count": 73,
"id": "bf4bd5cd",
"metadata": {},
"outputs": [],
"source": [
"class TaskCreationChain(LLMChain):\n",
" \"\"\"Chain to generates tasks.\"\"\"\n",
"\n",
" @classmethod\n",
" def from_llm(cls, llm: BaseLLM, verbose: bool = True) -> LLMChain:\n",
" \"\"\"Get the response parser.\"\"\"\n",
" task_creation_template = (\n",
" \"You are an task creation AI that uses the result of an execution agent\"\n",
" \" to create new tasks with the following objective: {objective},\"\n",
" \" The last completed task has the result: {result}.\"\n",
" \" This result was based on this task description: {task_description}.\"\n",
" \" These are incomplete tasks: {incomplete_tasks}.\"\n",
" \" Based on the result, create new tasks to be completed\"\n",
" \" by the AI system that do not overlap with incomplete tasks.\"\n",
" \" Return the tasks as an array.\"\n",
" )\n",
" prompt = PromptTemplate(\n",
" template=task_creation_template,\n",
" input_variables=[\n",
" \"result\",\n",
" \"task_description\",\n",
" \"incomplete_tasks\",\n",
" \"objective\",\n",
" ],\n",
" )\n",
" return cls(prompt=prompt, llm=llm, verbose=verbose)"
]
},
{
"cell_type": "code",
"execution_count": 74,
"id": "b6488ffe",
"metadata": {},
"outputs": [],
"source": [
"class TaskPrioritizationChain(LLMChain):\n",
" \"\"\"Chain to prioritize tasks.\"\"\"\n",
"\n",
" @classmethod\n",
" def from_llm(cls, llm: BaseLLM, verbose: bool = True) -> LLMChain:\n",
" \"\"\"Get the response parser.\"\"\"\n",
" task_prioritization_template = (\n",
" \"You are an task prioritization AI tasked with cleaning the formatting of and reprioritizing\"\n",
" \" the following tasks: {task_names}.\"\n",
" \" Consider the ultimate objective of your team: {objective}.\"\n",
" \" Do not remove any tasks. Return the result as a numbered list, like:\"\n",
" \" #. First task\"\n",
" \" #. Second task\"\n",
" \" Start the task list with number {next_task_id}.\"\n",
" )\n",
" prompt = PromptTemplate(\n",
" template=task_prioritization_template,\n",
" input_variables=[\"task_names\", \"next_task_id\", \"objective\"],\n",
" )\n",
" return cls(prompt=prompt, llm=llm, verbose=verbose)"
]
},
{
"cell_type": "code",
"execution_count": 84,
"id": "b43cd580",
"metadata": {},
"outputs": [],
"source": [
"class ExecutionChain(LLMChain):\n",
" \"\"\"Chain to execute tasks.\"\"\"\n",
"\n",
" @classmethod\n",
" def from_llm(cls, llm: BaseLLM, verbose: bool = True) -> LLMChain:\n",
" \"\"\"Get the response parser.\"\"\"\n",
" execution_template = (\n",
" \"You are an AI who performs one task based on the following objective: {objective}.\"\n",
" \" Take into account these previously completed tasks: {context}.\"\n",
" \" Your task: {task}.\"\n",
" \" Response:\"\n",
" )\n",
" prompt = PromptTemplate(\n",
" template=execution_template,\n",
" input_variables=[\"objective\", \"context\", \"task\"],\n",
" )\n",
" return cls(prompt=prompt, llm=llm, verbose=verbose)"
]
},
{
"cell_type": "markdown",
"id": "3ad996c5",
"metadata": {},
"source": [
"### Define the BabyAGI Controller\n",
"\n",
"BabyAGI composes the chains defined above in a (potentially-)infinite loop."
]
},
{
"cell_type": "code",
"execution_count": 85,
"id": "0ada0636",
"metadata": {},
"outputs": [],
"source": [
"def get_next_task(\n",
" task_creation_chain: LLMChain,\n",
" result: Dict,\n",
" task_description: str,\n",
" task_list: List[str],\n",
" objective: str,\n",
") -> List[Dict]:\n",
" \"\"\"Get the next task.\"\"\"\n",
" incomplete_tasks = \", \".join(task_list)\n",
" response = task_creation_chain.run(\n",
" result=result,\n",
" task_description=task_description,\n",
" incomplete_tasks=incomplete_tasks,\n",
" objective=objective,\n",
" )\n",
" new_tasks = response.split(\"\\n\")\n",
" return [{\"task_name\": task_name} for task_name in new_tasks if task_name.strip()]"
]
},
{
"cell_type": "code",
"execution_count": 86,
"id": "d35250ad",
"metadata": {},
"outputs": [],
"source": [
"def prioritize_tasks(\n",
" task_prioritization_chain: LLMChain,\n",
" this_task_id: int,\n",
" task_list: List[Dict],\n",
" objective: str,\n",
") -> List[Dict]:\n",
" \"\"\"Prioritize tasks.\"\"\"\n",
" task_names = [t[\"task_name\"] for t in task_list]\n",
" next_task_id = int(this_task_id) + 1\n",
" response = task_prioritization_chain.run(\n",
" task_names=task_names, next_task_id=next_task_id, objective=objective\n",
" )\n",
" new_tasks = response.split(\"\\n\")\n",
" prioritized_task_list = []\n",
" for task_string in new_tasks:\n",
" if not task_string.strip():\n",
" continue\n",
" task_parts = task_string.strip().split(\".\", 1)\n",
" if len(task_parts) == 2:\n",
" task_id = task_parts[0].strip()\n",
" task_name = task_parts[1].strip()\n",
" prioritized_task_list.append({\"task_id\": task_id, \"task_name\": task_name})\n",
" return prioritized_task_list"
]
},
{
"cell_type": "code",
"execution_count": 87,
"id": "e3f1840c",
"metadata": {},
"outputs": [],
"source": [
"def _get_top_tasks(vectorstore, query: str, k: int) -> List[str]:\n",
" \"\"\"Get the top k tasks based on the query.\"\"\"\n",
" results = vectorstore.similarity_search_with_score(query, k=k)\n",
" if not results:\n",
" return []\n",
" sorted_results, _ = zip(*sorted(results, key=lambda x: x[1], reverse=True))\n",
" return [str(item.metadata[\"task\"]) for item in sorted_results]\n",
"\n",
"\n",
"def execute_task(\n",
" vectorstore, execution_chain: LLMChain, objective: str, task: str, k: int = 5\n",
") -> str:\n",
" \"\"\"Execute a task.\"\"\"\n",
" context = _get_top_tasks(vectorstore, query=objective, k=k)\n",
" return execution_chain.run(objective=objective, context=context, task=task)"
]
},
{
"cell_type": "code",
"execution_count": 137,
"id": "1e978938",
"metadata": {},
"outputs": [],
"source": [
"class BabyAGI(Chain, BaseModel):\n",
" \"\"\"Controller model for the BabyAGI agent.\"\"\"\n",
"\n",
" task_list: deque = Field(default_factory=deque)\n",
" task_creation_chain: TaskCreationChain = Field(...)\n",
" task_prioritization_chain: TaskPrioritizationChain = Field(...)\n",
" execution_chain: ExecutionChain = Field(...)\n",
" task_id_counter: int = Field(1)\n",
" vectorstore: VectorStore = Field(init=False)\n",
" max_iterations: Optional[int] = None\n",
"\n",
" class Config:\n",
" \"\"\"Configuration for this pydantic object.\"\"\"\n",
"\n",
" arbitrary_types_allowed = True\n",
"\n",
" def add_task(self, task: Dict):\n",
" self.task_list.append(task)\n",
"\n",
" def print_task_list(self):\n",
" print(\"\\033[95m\\033[1m\" + \"\\n*****TASK LIST*****\\n\" + \"\\033[0m\\033[0m\")\n",
" for t in self.task_list:\n",
" print(str(t[\"task_id\"]) + \": \" + t[\"task_name\"])\n",
"\n",
" def print_next_task(self, task: Dict):\n",
" print(\"\\033[92m\\033[1m\" + \"\\n*****NEXT TASK*****\\n\" + \"\\033[0m\\033[0m\")\n",
" print(str(task[\"task_id\"]) + \": \" + task[\"task_name\"])\n",
"\n",
" def print_task_result(self, result: str):\n",
" print(\"\\033[93m\\033[1m\" + \"\\n*****TASK RESULT*****\\n\" + \"\\033[0m\\033[0m\")\n",
" print(result)\n",
"\n",
" @property\n",
" def input_keys(self) -> List[str]:\n",
" return [\"objective\"]\n",
"\n",
" @property\n",
" def output_keys(self) -> List[str]:\n",
" return []\n",
"\n",
" def _call(self, inputs: Dict[str, Any]) -> Dict[str, Any]:\n",
" \"\"\"Run the agent.\"\"\"\n",
" objective = inputs[\"objective\"]\n",
" first_task = inputs.get(\"first_task\", \"Make a todo list\")\n",
" self.add_task({\"task_id\": 1, \"task_name\": first_task})\n",
" num_iters = 0\n",
" while True:\n",
" if self.task_list:\n",
" self.print_task_list()\n",
"\n",
" # Step 1: Pull the first task\n",
" task = self.task_list.popleft()\n",
" self.print_next_task(task)\n",
"\n",
" # Step 2: Execute the task\n",
" result = execute_task(\n",
" self.vectorstore, self.execution_chain, objective, task[\"task_name\"]\n",
" )\n",
" this_task_id = int(task[\"task_id\"])\n",
" self.print_task_result(result)\n",
"\n",
" # Step 3: Store the result in Pinecone\n",
" result_id = f\"result_{task['task_id']}\"\n",
" self.vectorstore.add_texts(\n",
" texts=[result],\n",
" metadatas=[{\"task\": task[\"task_name\"]}],\n",
" ids=[result_id],\n",
" )\n",
"\n",
" # Step 4: Create new tasks and reprioritize task list\n",
" new_tasks = get_next_task(\n",
" self.task_creation_chain,\n",
" result,\n",
" task[\"task_name\"],\n",
" [t[\"task_name\"] for t in self.task_list],\n",
" objective,\n",
" )\n",
" for new_task in new_tasks:\n",
" self.task_id_counter += 1\n",
" new_task.update({\"task_id\": self.task_id_counter})\n",
" self.add_task(new_task)\n",
" self.task_list = deque(\n",
" prioritize_tasks(\n",
" self.task_prioritization_chain,\n",
" this_task_id,\n",
" list(self.task_list),\n",
" objective,\n",
" )\n",
" )\n",
" num_iters += 1\n",
" if self.max_iterations is not None and num_iters == self.max_iterations:\n",
" print(\n",
" \"\\033[91m\\033[1m\" + \"\\n*****TASK ENDING*****\\n\" + \"\\033[0m\\033[0m\"\n",
" )\n",
" break\n",
" return {}\n",
"\n",
" @classmethod\n",
" def from_llm(\n",
" cls, llm: BaseLLM, vectorstore: VectorStore, verbose: bool = False, **kwargs\n",
" ) -> \"BabyAGI\":\n",
" \"\"\"Initialize the BabyAGI Controller.\"\"\"\n",
" task_creation_chain = TaskCreationChain.from_llm(llm, verbose=verbose)\n",
" task_prioritization_chain = TaskPrioritizationChain.from_llm(\n",
" llm, verbose=verbose\n",
" )\n",
" execution_chain = ExecutionChain.from_llm(llm, verbose=verbose)\n",
" return cls(\n",
" task_creation_chain=task_creation_chain,\n",
" task_prioritization_chain=task_prioritization_chain,\n",
" execution_chain=execution_chain,\n",
" vectorstore=vectorstore,\n",
" **kwargs,\n",
" )"
]
},
{
"cell_type": "markdown",
"id": "05ba762e",
"metadata": {},
"source": [
"### Run the BabyAGI\n",
"\n",
"Now it's time to create the BabyAGI controller and watch it try to accomplish your objective."
]
},
{
"cell_type": "code",
"execution_count": 138,
"id": "3d220b69",
"metadata": {},
"outputs": [],
"source": [
"OBJECTIVE = \"Write a weather report for SF today\""
]
},
{
"cell_type": "code",
"execution_count": 139,
"id": "8a8e5543",
"metadata": {},
"outputs": [],
"source": [
"llm = OpenAI(temperature=0)"
]
},
{
"cell_type": "code",
"execution_count": 140,
"id": "3d69899b",
"metadata": {},
"outputs": [],
"source": [
"# Logging of LLMChains\n",
"verbose = False\n",
"# If None, will keep on going forever\n",
"max_iterations: Optional[int] = 3\n",
"baby_agi = BabyAGI.from_llm(\n",
" llm=llm, vectorstore=vectorstore, verbose=verbose, max_iterations=max_iterations\n",
")"
]
},
{
"cell_type": "code",
"execution_count": 141,
"id": "f7957b51",
"metadata": {
"scrolled": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"\u001b[95m\u001b[1m\n",
"*****TASK LIST*****\n",
"\u001b[0m\u001b[0m\n",
"1: Make a todo list\n",
"\u001b[92m\u001b[1m\n",
"*****NEXT TASK*****\n",
"\u001b[0m\u001b[0m\n",
"1: Make a todo list\n",
"\u001b[93m\u001b[1m\n",
"*****TASK RESULT*****\n",
"\u001b[0m\u001b[0m\n",
"\n",
"\n",
"1. Check the temperature range for the day.\n",
"2. Gather temperature data for SF today.\n",
"3. Analyze the temperature data and create a weather report.\n",
"4. Publish the weather report.\n",
"\u001b[95m\u001b[1m\n",
"*****TASK LIST*****\n",
"\u001b[0m\u001b[0m\n",
"2: Gather data on the expected temperature range for the day.\n",
"3: Collect data on the expected precipitation for the day.\n",
"4: Analyze the data and create a weather report.\n",
"5: Check the current weather conditions in SF.\n",
"6: Publish the weather report.\n",
"\u001b[92m\u001b[1m\n",
"*****NEXT TASK*****\n",
"\u001b[0m\u001b[0m\n",
"2: Gather data on the expected temperature range for the day.\n",
"\u001b[93m\u001b[1m\n",
"*****TASK RESULT*****\n",
"\u001b[0m\u001b[0m\n",
"\n",
"\n",
"I have gathered data on the expected temperature range for the day in San Francisco. The forecast is for temperatures to range from a low of 55 degrees Fahrenheit to a high of 68 degrees Fahrenheit.\n",
"\u001b[95m\u001b[1m\n",
"*****TASK LIST*****\n",
"\u001b[0m\u001b[0m\n",
"3: Check the current weather conditions in SF.\n",
"4: Calculate the average temperature for the day in San Francisco.\n",
"5: Determine the probability of precipitation for the day in San Francisco.\n",
"6: Identify any potential weather warnings or advisories for the day in San Francisco.\n",
"7: Research any historical weather patterns for the day in San Francisco.\n",
"8: Compare the expected temperature range to the historical average for the day in San Francisco.\n",
"9: Collect data on the expected precipitation for the day.\n",
"10: Analyze the data and create a weather report.\n",
"11: Publish the weather report.\n",
"\u001b[92m\u001b[1m\n",
"*****NEXT TASK*****\n",
"\u001b[0m\u001b[0m\n",
"3: Check the current weather conditions in SF.\n",
"\u001b[93m\u001b[1m\n",
"*****TASK RESULT*****\n",
"\u001b[0m\u001b[0m\n",
"\n",
"\n",
"I am checking the current weather conditions in SF. According to the data I have gathered, the temperature in SF today is currently around 65 degrees Fahrenheit with clear skies. The temperature range for the day is expected to be between 60 and 70 degrees Fahrenheit.\n",
"\u001b[91m\u001b[1m\n",
"*****TASK ENDING*****\n",
"\u001b[0m\u001b[0m\n"
]
},
{
"data": {
"text/plain": [
"{'objective': 'Write a weather report for SF today'}"
]
},
"execution_count": 141,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"baby_agi({\"objective\": OBJECTIVE})"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "898a210b",
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.1"
}
},
"nbformat": 4,
"nbformat_minor": 5
}

View File

@@ -0,0 +1,647 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "517a9fd4",
"metadata": {},
"source": [
"# BabyAGI with Tools\n",
"\n",
"This notebook builds on top of [baby agi](baby_agi.ipynb), but shows how you can swap out the execution chain. The previous execution chain was just an LLM which made stuff up. By swapping it out with an agent that has access to tools, we can hopefully get real reliable information"
]
},
{
"cell_type": "markdown",
"id": "556af556",
"metadata": {},
"source": [
"## Install and Import Required Modules"
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "c8a354b6",
"metadata": {},
"outputs": [],
"source": [
"import os\n",
"from collections import deque\n",
"from typing import Dict, List, Optional, Any\n",
"\n",
"from langchain import LLMChain, OpenAI, PromptTemplate\n",
"from langchain.embeddings import OpenAIEmbeddings\n",
"from langchain.llms import BaseLLM\n",
"from langchain.vectorstores.base import VectorStore\n",
"from pydantic import BaseModel, Field\n",
"from langchain.chains.base import Chain"
]
},
{
"cell_type": "markdown",
"id": "09f70772",
"metadata": {},
"source": [
"## Connect to the Vector Store\n",
"\n",
"Depending on what vectorstore you use, this step may look different."
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "794045d4",
"metadata": {},
"outputs": [],
"source": [
"%pip install faiss-cpu > /dev/null\n",
"%pip install google-search-results > /dev/null\n",
"from langchain.vectorstores import FAISS\n",
"from langchain.docstore import InMemoryDocstore"
]
},
{
"cell_type": "code",
"execution_count": 3,
"id": "6e0305eb",
"metadata": {},
"outputs": [],
"source": [
"# Define your embedding model\n",
"embeddings_model = OpenAIEmbeddings()\n",
"# Initialize the vectorstore as empty\n",
"import faiss\n",
"\n",
"embedding_size = 1536\n",
"index = faiss.IndexFlatL2(embedding_size)\n",
"vectorstore = FAISS(embeddings_model.embed_query, index, InMemoryDocstore({}), {})"
]
},
{
"cell_type": "markdown",
"id": "0f3b72bf",
"metadata": {},
"source": [
"## Define the Chains\n",
"\n",
"BabyAGI relies on three LLM chains:\n",
"- Task creation chain to select new tasks to add to the list\n",
"- Task prioritization chain to re-prioritize tasks\n",
"- Execution Chain to execute the tasks\n",
"\n",
"\n",
"NOTE: in this notebook, the Execution chain will now be an agent."
]
},
{
"cell_type": "code",
"execution_count": 4,
"id": "bf4bd5cd",
"metadata": {},
"outputs": [],
"source": [
"class TaskCreationChain(LLMChain):\n",
" \"\"\"Chain to generates tasks.\"\"\"\n",
"\n",
" @classmethod\n",
" def from_llm(cls, llm: BaseLLM, verbose: bool = True) -> LLMChain:\n",
" \"\"\"Get the response parser.\"\"\"\n",
" task_creation_template = (\n",
" \"You are an task creation AI that uses the result of an execution agent\"\n",
" \" to create new tasks with the following objective: {objective},\"\n",
" \" The last completed task has the result: {result}.\"\n",
" \" This result was based on this task description: {task_description}.\"\n",
" \" These are incomplete tasks: {incomplete_tasks}.\"\n",
" \" Based on the result, create new tasks to be completed\"\n",
" \" by the AI system that do not overlap with incomplete tasks.\"\n",
" \" Return the tasks as an array.\"\n",
" )\n",
" prompt = PromptTemplate(\n",
" template=task_creation_template,\n",
" input_variables=[\n",
" \"result\",\n",
" \"task_description\",\n",
" \"incomplete_tasks\",\n",
" \"objective\",\n",
" ],\n",
" )\n",
" return cls(prompt=prompt, llm=llm, verbose=verbose)"
]
},
{
"cell_type": "code",
"execution_count": 5,
"id": "b6488ffe",
"metadata": {},
"outputs": [],
"source": [
"class TaskPrioritizationChain(LLMChain):\n",
" \"\"\"Chain to prioritize tasks.\"\"\"\n",
"\n",
" @classmethod\n",
" def from_llm(cls, llm: BaseLLM, verbose: bool = True) -> LLMChain:\n",
" \"\"\"Get the response parser.\"\"\"\n",
" task_prioritization_template = (\n",
" \"You are an task prioritization AI tasked with cleaning the formatting of and reprioritizing\"\n",
" \" the following tasks: {task_names}.\"\n",
" \" Consider the ultimate objective of your team: {objective}.\"\n",
" \" Do not remove any tasks. Return the result as a numbered list, like:\"\n",
" \" #. First task\"\n",
" \" #. Second task\"\n",
" \" Start the task list with number {next_task_id}.\"\n",
" )\n",
" prompt = PromptTemplate(\n",
" template=task_prioritization_template,\n",
" input_variables=[\"task_names\", \"next_task_id\", \"objective\"],\n",
" )\n",
" return cls(prompt=prompt, llm=llm, verbose=verbose)"
]
},
{
"cell_type": "code",
"execution_count": 46,
"id": "b43cd580",
"metadata": {},
"outputs": [],
"source": [
"from langchain.agents import ZeroShotAgent, Tool, AgentExecutor\n",
"from langchain import OpenAI, SerpAPIWrapper, LLMChain\n",
"\n",
"todo_prompt = PromptTemplate.from_template(\n",
" \"You are a planner who is an expert at coming up with a todo list for a given objective. Come up with a todo list for this objective: {objective}\"\n",
")\n",
"todo_chain = LLMChain(llm=OpenAI(temperature=0), prompt=todo_prompt)\n",
"search = SerpAPIWrapper()\n",
"tools = [\n",
" Tool(\n",
" name=\"Search\",\n",
" func=search.run,\n",
" description=\"useful for when you need to answer questions about current events\",\n",
" ),\n",
" Tool(\n",
" name=\"TODO\",\n",
" func=todo_chain.run,\n",
" description=\"useful for when you need to come up with todo lists. Input: an objective to create a todo list for. Output: a todo list for that objective. Please be very clear what the objective is!\",\n",
" ),\n",
"]\n",
"\n",
"\n",
"prefix = \"\"\"You are an AI who performs one task based on the following objective: {objective}. Take into account these previously completed tasks: {context}.\"\"\"\n",
"suffix = \"\"\"Question: {task}\n",
"{agent_scratchpad}\"\"\"\n",
"prompt = ZeroShotAgent.create_prompt(\n",
" tools,\n",
" prefix=prefix,\n",
" suffix=suffix,\n",
" input_variables=[\"objective\", \"task\", \"context\", \"agent_scratchpad\"],\n",
")"
]
},
{
"cell_type": "markdown",
"id": "3ad996c5",
"metadata": {},
"source": [
"### Define the BabyAGI Controller\n",
"\n",
"BabyAGI composes the chains defined above in a (potentially-)infinite loop."
]
},
{
"cell_type": "code",
"execution_count": 47,
"id": "0ada0636",
"metadata": {},
"outputs": [],
"source": [
"def get_next_task(\n",
" task_creation_chain: LLMChain,\n",
" result: Dict,\n",
" task_description: str,\n",
" task_list: List[str],\n",
" objective: str,\n",
") -> List[Dict]:\n",
" \"\"\"Get the next task.\"\"\"\n",
" incomplete_tasks = \", \".join(task_list)\n",
" response = task_creation_chain.run(\n",
" result=result,\n",
" task_description=task_description,\n",
" incomplete_tasks=incomplete_tasks,\n",
" objective=objective,\n",
" )\n",
" new_tasks = response.split(\"\\n\")\n",
" return [{\"task_name\": task_name} for task_name in new_tasks if task_name.strip()]"
]
},
{
"cell_type": "code",
"execution_count": 48,
"id": "d35250ad",
"metadata": {},
"outputs": [],
"source": [
"def prioritize_tasks(\n",
" task_prioritization_chain: LLMChain,\n",
" this_task_id: int,\n",
" task_list: List[Dict],\n",
" objective: str,\n",
") -> List[Dict]:\n",
" \"\"\"Prioritize tasks.\"\"\"\n",
" task_names = [t[\"task_name\"] for t in task_list]\n",
" next_task_id = int(this_task_id) + 1\n",
" response = task_prioritization_chain.run(\n",
" task_names=task_names, next_task_id=next_task_id, objective=objective\n",
" )\n",
" new_tasks = response.split(\"\\n\")\n",
" prioritized_task_list = []\n",
" for task_string in new_tasks:\n",
" if not task_string.strip():\n",
" continue\n",
" task_parts = task_string.strip().split(\".\", 1)\n",
" if len(task_parts) == 2:\n",
" task_id = task_parts[0].strip()\n",
" task_name = task_parts[1].strip()\n",
" prioritized_task_list.append({\"task_id\": task_id, \"task_name\": task_name})\n",
" return prioritized_task_list"
]
},
{
"cell_type": "code",
"execution_count": 49,
"id": "e3f1840c",
"metadata": {},
"outputs": [],
"source": [
"def _get_top_tasks(vectorstore, query: str, k: int) -> List[str]:\n",
" \"\"\"Get the top k tasks based on the query.\"\"\"\n",
" results = vectorstore.similarity_search_with_score(query, k=k)\n",
" if not results:\n",
" return []\n",
" sorted_results, _ = zip(*sorted(results, key=lambda x: x[1], reverse=True))\n",
" return [str(item.metadata[\"task\"]) for item in sorted_results]\n",
"\n",
"\n",
"def execute_task(\n",
" vectorstore, execution_chain: LLMChain, objective: str, task: str, k: int = 5\n",
") -> str:\n",
" \"\"\"Execute a task.\"\"\"\n",
" context = _get_top_tasks(vectorstore, query=objective, k=k)\n",
" return execution_chain.run(objective=objective, context=context, task=task)"
]
},
{
"cell_type": "code",
"execution_count": 50,
"id": "1e978938",
"metadata": {},
"outputs": [],
"source": [
"class BabyAGI(Chain, BaseModel):\n",
" \"\"\"Controller model for the BabyAGI agent.\"\"\"\n",
"\n",
" task_list: deque = Field(default_factory=deque)\n",
" task_creation_chain: TaskCreationChain = Field(...)\n",
" task_prioritization_chain: TaskPrioritizationChain = Field(...)\n",
" execution_chain: AgentExecutor = Field(...)\n",
" task_id_counter: int = Field(1)\n",
" vectorstore: VectorStore = Field(init=False)\n",
" max_iterations: Optional[int] = None\n",
"\n",
" class Config:\n",
" \"\"\"Configuration for this pydantic object.\"\"\"\n",
"\n",
" arbitrary_types_allowed = True\n",
"\n",
" def add_task(self, task: Dict):\n",
" self.task_list.append(task)\n",
"\n",
" def print_task_list(self):\n",
" print(\"\\033[95m\\033[1m\" + \"\\n*****TASK LIST*****\\n\" + \"\\033[0m\\033[0m\")\n",
" for t in self.task_list:\n",
" print(str(t[\"task_id\"]) + \": \" + t[\"task_name\"])\n",
"\n",
" def print_next_task(self, task: Dict):\n",
" print(\"\\033[92m\\033[1m\" + \"\\n*****NEXT TASK*****\\n\" + \"\\033[0m\\033[0m\")\n",
" print(str(task[\"task_id\"]) + \": \" + task[\"task_name\"])\n",
"\n",
" def print_task_result(self, result: str):\n",
" print(\"\\033[93m\\033[1m\" + \"\\n*****TASK RESULT*****\\n\" + \"\\033[0m\\033[0m\")\n",
" print(result)\n",
"\n",
" @property\n",
" def input_keys(self) -> List[str]:\n",
" return [\"objective\"]\n",
"\n",
" @property\n",
" def output_keys(self) -> List[str]:\n",
" return []\n",
"\n",
" def _call(self, inputs: Dict[str, Any]) -> Dict[str, Any]:\n",
" \"\"\"Run the agent.\"\"\"\n",
" objective = inputs[\"objective\"]\n",
" first_task = inputs.get(\"first_task\", \"Make a todo list\")\n",
" self.add_task({\"task_id\": 1, \"task_name\": first_task})\n",
" num_iters = 0\n",
" while True:\n",
" if self.task_list:\n",
" self.print_task_list()\n",
"\n",
" # Step 1: Pull the first task\n",
" task = self.task_list.popleft()\n",
" self.print_next_task(task)\n",
"\n",
" # Step 2: Execute the task\n",
" result = execute_task(\n",
" self.vectorstore, self.execution_chain, objective, task[\"task_name\"]\n",
" )\n",
" this_task_id = int(task[\"task_id\"])\n",
" self.print_task_result(result)\n",
"\n",
" # Step 3: Store the result in Pinecone\n",
" result_id = f\"result_{task['task_id']}\"\n",
" self.vectorstore.add_texts(\n",
" texts=[result],\n",
" metadatas=[{\"task\": task[\"task_name\"]}],\n",
" ids=[result_id],\n",
" )\n",
"\n",
" # Step 4: Create new tasks and reprioritize task list\n",
" new_tasks = get_next_task(\n",
" self.task_creation_chain,\n",
" result,\n",
" task[\"task_name\"],\n",
" [t[\"task_name\"] for t in self.task_list],\n",
" objective,\n",
" )\n",
" for new_task in new_tasks:\n",
" self.task_id_counter += 1\n",
" new_task.update({\"task_id\": self.task_id_counter})\n",
" self.add_task(new_task)\n",
" self.task_list = deque(\n",
" prioritize_tasks(\n",
" self.task_prioritization_chain,\n",
" this_task_id,\n",
" list(self.task_list),\n",
" objective,\n",
" )\n",
" )\n",
" num_iters += 1\n",
" if self.max_iterations is not None and num_iters == self.max_iterations:\n",
" print(\n",
" \"\\033[91m\\033[1m\" + \"\\n*****TASK ENDING*****\\n\" + \"\\033[0m\\033[0m\"\n",
" )\n",
" break\n",
" return {}\n",
"\n",
" @classmethod\n",
" def from_llm(\n",
" cls, llm: BaseLLM, vectorstore: VectorStore, verbose: bool = False, **kwargs\n",
" ) -> \"BabyAGI\":\n",
" \"\"\"Initialize the BabyAGI Controller.\"\"\"\n",
" task_creation_chain = TaskCreationChain.from_llm(llm, verbose=verbose)\n",
" task_prioritization_chain = TaskPrioritizationChain.from_llm(\n",
" llm, verbose=verbose\n",
" )\n",
" llm_chain = LLMChain(llm=llm, prompt=prompt)\n",
" tool_names = [tool.name for tool in tools]\n",
" agent = ZeroShotAgent(llm_chain=llm_chain, allowed_tools=tool_names)\n",
" agent_executor = AgentExecutor.from_agent_and_tools(\n",
" agent=agent, tools=tools, verbose=True\n",
" )\n",
" return cls(\n",
" task_creation_chain=task_creation_chain,\n",
" task_prioritization_chain=task_prioritization_chain,\n",
" execution_chain=agent_executor,\n",
" vectorstore=vectorstore,\n",
" **kwargs,\n",
" )"
]
},
{
"cell_type": "markdown",
"id": "05ba762e",
"metadata": {},
"source": [
"### Run the BabyAGI\n",
"\n",
"Now it's time to create the BabyAGI controller and watch it try to accomplish your objective."
]
},
{
"cell_type": "code",
"execution_count": 51,
"id": "3d220b69",
"metadata": {},
"outputs": [],
"source": [
"OBJECTIVE = \"Write a weather report for SF today\""
]
},
{
"cell_type": "code",
"execution_count": 52,
"id": "8a8e5543",
"metadata": {},
"outputs": [],
"source": [
"llm = OpenAI(temperature=0)"
]
},
{
"cell_type": "code",
"execution_count": 53,
"id": "3d69899b",
"metadata": {},
"outputs": [],
"source": [
"# Logging of LLMChains\n",
"verbose = False\n",
"# If None, will keep on going forever\n",
"max_iterations: Optional[int] = 3\n",
"baby_agi = BabyAGI.from_llm(\n",
" llm=llm, vectorstore=vectorstore, verbose=verbose, max_iterations=max_iterations\n",
")"
]
},
{
"cell_type": "code",
"execution_count": 54,
"id": "f7957b51",
"metadata": {
"scrolled": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"\u001b[95m\u001b[1m\n",
"*****TASK LIST*****\n",
"\u001b[0m\u001b[0m\n",
"1: Make a todo list\n",
"\u001b[92m\u001b[1m\n",
"*****NEXT TASK*****\n",
"\u001b[0m\u001b[0m\n",
"1: Make a todo list\n",
"\n",
"\n",
"\u001b[1m> Entering new AgentExecutor chain...\u001b[0m\n",
"\u001b[32;1m\u001b[1;3mThought: I need to gather data on the current weather conditions in SF\n",
"Action: Search\n",
"Action Input: Current weather conditions in SF\u001b[0m\n",
"Observation: \u001b[36;1m\u001b[1;3mHigh 67F. Winds WNW at 10 to 15 mph. Clear to partly cloudy.\u001b[0m\n",
"Thought:\u001b[32;1m\u001b[1;3m I need to make a todo list\n",
"Action: TODO\n",
"Action Input: Write a weather report for SF today\u001b[0m\n",
"Observation: \u001b[33;1m\u001b[1;3m\n",
"\n",
"1. Research current weather conditions in San Francisco\n",
"2. Gather data on temperature, humidity, wind speed, and other relevant weather conditions\n",
"3. Analyze data to determine current weather trends\n",
"4. Write a brief introduction to the weather report\n",
"5. Describe current weather conditions in San Francisco\n",
"6. Discuss any upcoming weather changes\n",
"7. Summarize the weather report\n",
"8. Proofread and edit the report\n",
"9. Submit the report\u001b[0m\n",
"Thought:\u001b[32;1m\u001b[1;3m I now know the final answer\n",
"Final Answer: A weather report for SF today should include research on current weather conditions in San Francisco, gathering data on temperature, humidity, wind speed, and other relevant weather conditions, analyzing data to determine current weather trends, writing a brief introduction to the weather report, describing current weather conditions in San Francisco, discussing any upcoming weather changes, summarizing the weather report, proofreading and editing the report, and submitting the report.\u001b[0m\n",
"\n",
"\u001b[1m> Finished chain.\u001b[0m\n",
"\u001b[93m\u001b[1m\n",
"*****TASK RESULT*****\n",
"\u001b[0m\u001b[0m\n",
"A weather report for SF today should include research on current weather conditions in San Francisco, gathering data on temperature, humidity, wind speed, and other relevant weather conditions, analyzing data to determine current weather trends, writing a brief introduction to the weather report, describing current weather conditions in San Francisco, discussing any upcoming weather changes, summarizing the weather report, proofreading and editing the report, and submitting the report.\n",
"\u001b[95m\u001b[1m\n",
"*****TASK LIST*****\n",
"\u001b[0m\u001b[0m\n",
"2: Gather data on temperature, humidity, wind speed, and other relevant weather conditions\n",
"3: Analyze data to determine current weather trends\n",
"4: Write a brief introduction to the weather report\n",
"5: Describe current weather conditions in San Francisco\n",
"6: Discuss any upcoming weather changes\n",
"7: Summarize the weather report\n",
"8: Proofread and edit the report\n",
"9: Submit the report\n",
"1: Research current weather conditions in San Francisco\n",
"\u001b[92m\u001b[1m\n",
"*****NEXT TASK*****\n",
"\u001b[0m\u001b[0m\n",
"2: Gather data on temperature, humidity, wind speed, and other relevant weather conditions\n",
"\n",
"\n",
"\u001b[1m> Entering new AgentExecutor chain...\u001b[0m\n",
"\u001b[32;1m\u001b[1;3mThought: I need to search for the current weather conditions in SF\n",
"Action: Search\n",
"Action Input: Current weather conditions in SF\u001b[0m\n",
"Observation: \u001b[36;1m\u001b[1;3mHigh 67F. Winds WNW at 10 to 15 mph. Clear to partly cloudy.\u001b[0m\n",
"Thought:\u001b[32;1m\u001b[1;3m I need to make a todo list\n",
"Action: TODO\n",
"Action Input: Create a weather report for SF today\u001b[0m\n",
"Observation: \u001b[33;1m\u001b[1;3m\n",
"\n",
"1. Gather current weather data for SF, including temperature, wind speed, humidity, and precipitation.\n",
"2. Research historical weather data for SF to compare current conditions.\n",
"3. Analyze current and historical data to determine any trends or patterns.\n",
"4. Create a visual representation of the data, such as a graph or chart.\n",
"5. Write a summary of the weather report, including key findings and any relevant information.\n",
"6. Publish the weather report on a website or other platform.\u001b[0m\n",
"Thought:\u001b[32;1m\u001b[1;3m I now know the final answer\n",
"Final Answer: Today in San Francisco, the temperature is 67F with winds WNW at 10 to 15 mph. The sky is clear to partly cloudy.\u001b[0m\n",
"\n",
"\u001b[1m> Finished chain.\u001b[0m\n",
"\u001b[93m\u001b[1m\n",
"*****TASK RESULT*****\n",
"\u001b[0m\u001b[0m\n",
"Today in San Francisco, the temperature is 67F with winds WNW at 10 to 15 mph. The sky is clear to partly cloudy.\n",
"\u001b[95m\u001b[1m\n",
"*****TASK LIST*****\n",
"\u001b[0m\u001b[0m\n",
"3: Research current weather conditions in San Francisco\n",
"4: Compare the current weather conditions in San Francisco to the average for this time of year.\n",
"5: Identify any potential weather-related hazards in the area.\n",
"6: Research any historical weather patterns in San Francisco.\n",
"7: Analyze data to determine current weather trends\n",
"8: Include any relevant data from nearby cities in the report.\n",
"9: Include any relevant data from the National Weather Service in the report.\n",
"10: Include any relevant data from local news sources in the report.\n",
"11: Include any relevant data from online weather sources in the report.\n",
"12: Include any relevant data from local meteorologists in the report.\n",
"13: Include any relevant data from local weather stations in the report.\n",
"14: Include any relevant data from satellite images in the report.\n",
"15: Describe current weather conditions in San Francisco\n",
"16: Discuss any upcoming weather changes\n",
"17: Write a brief introduction to the weather report\n",
"18: Summarize the weather report\n",
"19: Proofread and edit the report\n",
"20: Submit the report\n",
"\u001b[92m\u001b[1m\n",
"*****NEXT TASK*****\n",
"\u001b[0m\u001b[0m\n",
"3: Research current weather conditions in San Francisco\n",
"\n",
"\n",
"\u001b[1m> Entering new AgentExecutor chain...\u001b[0m\n",
"\u001b[32;1m\u001b[1;3mThought: I need to search for current weather conditions in San Francisco\n",
"Action: Search\n",
"Action Input: Current weather conditions in San Francisco\u001b[0m\n",
"Observation: \u001b[36;1m\u001b[1;3mTodaySun 04/09 High 67 · 1% Precip. ; TonightSun 04/09 Low 49 · 9% Precip. ; TomorrowMon 04/10 High 64 · 11% Precip.\u001b[0m\n",
"Thought:\u001b[32;1m\u001b[1;3m I now know the final answer\n",
"Final Answer: Today in San Francisco, the high temperature is 67 degrees with 1% chance of precipitation. The low temperature tonight is 49 degrees with 9% chance of precipitation. Tomorrow's high temperature is 64 degrees with 11% chance of precipitation.\u001b[0m\n",
"\n",
"\u001b[1m> Finished chain.\u001b[0m\n",
"\u001b[93m\u001b[1m\n",
"*****TASK RESULT*****\n",
"\u001b[0m\u001b[0m\n",
"Today in San Francisco, the high temperature is 67 degrees with 1% chance of precipitation. The low temperature tonight is 49 degrees with 9% chance of precipitation. Tomorrow's high temperature is 64 degrees with 11% chance of precipitation.\n",
"\u001b[91m\u001b[1m\n",
"*****TASK ENDING*****\n",
"\u001b[0m\u001b[0m\n"
]
},
{
"data": {
"text/plain": [
"{'objective': 'Write a weather report for SF today'}"
]
},
"execution_count": 54,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"baby_agi({\"objective\": OBJECTIVE})"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "898a210b",
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.1"
}
},
"nbformat": 4,
"nbformat_minor": 5
}

View File

@@ -0,0 +1,693 @@
{
"cells": [
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"# CAMEL Role-Playing Autonomous Cooperative Agents\n",
"\n",
"This is a langchain implementation of paper: \"CAMEL: Communicative Agents for “Mind” Exploration of Large Scale Language Model Society\".\n",
"\n",
"Overview:\n",
"\n",
"The rapid advancement of conversational and chat-based language models has led to remarkable progress in complex task-solving. However, their success heavily relies on human input to guide the conversation, which can be challenging and time-consuming. This paper explores the potential of building scalable techniques to facilitate autonomous cooperation among communicative agents and provide insight into their \"cognitive\" processes. To address the challenges of achieving autonomous cooperation, we propose a novel communicative agent framework named role-playing. Our approach involves using inception prompting to guide chat agents toward task completion while maintaining consistency with human intentions. We showcase how role-playing can be used to generate conversational data for studying the behaviors and capabilities of chat agents, providing a valuable resource for investigating conversational language models. Our contributions include introducing a novel communicative agent framework, offering a scalable approach for studying the cooperative behaviors and capabilities of multi-agent systems, and open-sourcing our library to support research on communicative agents and beyond.\n",
"\n",
"The original implementation: https://github.com/lightaime/camel\n",
"\n",
"Project website: https://www.camel-ai.org/\n",
"\n",
"Arxiv paper: https://arxiv.org/abs/2303.17760\n"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"## Import LangChain related modules "
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [],
"source": [
"from typing import List\n",
"from langchain.chat_models import ChatOpenAI\n",
"from langchain.prompts.chat import (\n",
" SystemMessagePromptTemplate,\n",
" HumanMessagePromptTemplate,\n",
")\n",
"from langchain.schema import (\n",
" AIMessage,\n",
" HumanMessage,\n",
" SystemMessage,\n",
" BaseMessage,\n",
")"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"## Define a CAMEL agent helper class"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [],
"source": [
"class CAMELAgent:\n",
"\n",
" def __init__(\n",
" self,\n",
" system_message: SystemMessage,\n",
" model: ChatOpenAI,\n",
" ) -> None:\n",
" self.system_message = system_message\n",
" self.model = model\n",
" self.init_messages()\n",
"\n",
" def reset(self) -> None:\n",
" self.init_messages()\n",
" return self.stored_messages\n",
"\n",
" def init_messages(self) -> None:\n",
" self.stored_messages = [self.system_message]\n",
"\n",
" def update_messages(self, message: BaseMessage) -> List[BaseMessage]:\n",
" self.stored_messages.append(message)\n",
" return self.stored_messages\n",
"\n",
" def step(\n",
" self,\n",
" input_message: HumanMessage,\n",
" ) -> AIMessage:\n",
" messages = self.update_messages(input_message)\n",
"\n",
" output_message = self.model(messages)\n",
" self.update_messages(output_message)\n",
"\n",
" return output_message\n"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"## Setup OpenAI API key and roles and task for role-playing"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [],
"source": [
"import os\n",
"\n",
"os.environ[\"OPENAI_API_KEY\"] = \"\"\n",
"\n",
"assistant_role_name = \"Python Programmer\"\n",
"user_role_name = \"Stock Trader\"\n",
"task = \"Develop a trading bot for the stock market\"\n",
"word_limit = 50 # word limit for task brainstorming"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"## Create a task specify agent for brainstorming and get the specified task"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Specified task: Develop a Python-based swing trading bot that scans market trends, monitors stocks, and generates trading signals to help a stock trader to place optimal buy and sell orders with defined stop losses and profit targets.\n"
]
}
],
"source": [
"task_specifier_sys_msg = SystemMessage(content=\"You can make a task more specific.\")\n",
"task_specifier_prompt = (\n",
"\"\"\"Here is a task that {assistant_role_name} will help {user_role_name} to complete: {task}.\n",
"Please make it more specific. Be creative and imaginative.\n",
"Please reply with the specified task in {word_limit} words or less. Do not add anything else.\"\"\"\n",
")\n",
"task_specifier_template = HumanMessagePromptTemplate.from_template(template=task_specifier_prompt)\n",
"task_specify_agent = CAMELAgent(task_specifier_sys_msg, ChatOpenAI(temperature=1.0))\n",
"task_specifier_msg = task_specifier_template.format_messages(assistant_role_name=assistant_role_name,\n",
" user_role_name=user_role_name,\n",
" task=task, word_limit=word_limit)[0]\n",
"specified_task_msg = task_specify_agent.step(task_specifier_msg)\n",
"print(f\"Specified task: {specified_task_msg.content}\")\n",
"specified_task = specified_task_msg.content"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"## Create inception prompts for AI assistant and AI user for role-playing"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [],
"source": [
"assistant_inception_prompt = (\n",
"\"\"\"Never forget you are a {assistant_role_name} and I am a {user_role_name}. Never flip roles! Never instruct me!\n",
"We share a common interest in collaborating to successfully complete a task.\n",
"You must help me to complete the task.\n",
"Here is the task: {task}. Never forget our task!\n",
"I must instruct you based on your expertise and my needs to complete the task.\n",
"\n",
"I must give you one instruction at a time.\n",
"You must write a specific solution that appropriately completes the requested instruction.\n",
"You must decline my instruction honestly if you cannot perform the instruction due to physical, moral, legal reasons or your capability and explain the reasons.\n",
"Do not add anything else other than your solution to my instruction.\n",
"You are never supposed to ask me any questions you only answer questions.\n",
"You are never supposed to reply with a flake solution. Explain your solutions.\n",
"Your solution must be declarative sentences and simple present tense.\n",
"Unless I say the task is completed, you should always start with:\n",
"\n",
"Solution: <YOUR_SOLUTION>\n",
"\n",
"<YOUR_SOLUTION> should be specific and provide preferable implementations and examples for task-solving.\n",
"Always end <YOUR_SOLUTION> with: Next request.\"\"\"\n",
")\n",
"\n",
"user_inception_prompt = (\n",
"\"\"\"Never forget you are a {user_role_name} and I am a {assistant_role_name}. Never flip roles! You will always instruct me.\n",
"We share a common interest in collaborating to successfully complete a task.\n",
"I must help you to complete the task.\n",
"Here is the task: {task}. Never forget our task!\n",
"You must instruct me based on my expertise and your needs to complete the task ONLY in the following two ways:\n",
"\n",
"1. Instruct with a necessary input:\n",
"Instruction: <YOUR_INSTRUCTION>\n",
"Input: <YOUR_INPUT>\n",
"\n",
"2. Instruct without any input:\n",
"Instruction: <YOUR_INSTRUCTION>\n",
"Input: None\n",
"\n",
"The \"Instruction\" describes a task or question. The paired \"Input\" provides further context or information for the requested \"Instruction\".\n",
"\n",
"You must give me one instruction at a time.\n",
"I must write a response that appropriately completes the requested instruction.\n",
"I must decline your instruction honestly if I cannot perform the instruction due to physical, moral, legal reasons or my capability and explain the reasons.\n",
"You should instruct me not ask me questions.\n",
"Now you must start to instruct me using the two ways described above.\n",
"Do not add anything else other than your instruction and the optional corresponding input!\n",
"Keep giving me instructions and necessary inputs until you think the task is completed.\n",
"When the task is completed, you must only reply with a single word <CAMEL_TASK_DONE>.\n",
"Never say <CAMEL_TASK_DONE> unless my responses have solved your task.\"\"\"\n",
")"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"## Create a helper helper to get system messages for AI assistant and AI user from role names and the task"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [],
"source": [
"def get_sys_msgs(assistant_role_name: str, user_role_name: str, task: str):\n",
" \n",
" assistant_sys_template = SystemMessagePromptTemplate.from_template(template=assistant_inception_prompt)\n",
" assistant_sys_msg = assistant_sys_template.format_messages(assistant_role_name=assistant_role_name, user_role_name=user_role_name, task=task)[0]\n",
" \n",
" user_sys_template = SystemMessagePromptTemplate.from_template(template=user_inception_prompt)\n",
" user_sys_msg = user_sys_template.format_messages(assistant_role_name=assistant_role_name, user_role_name=user_role_name, task=task)[0]\n",
" \n",
" return assistant_sys_msg, user_sys_msg"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"## Create AI assistant agent and AI user agent from obtained system messages"
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {},
"outputs": [],
"source": [
"assistant_sys_msg, user_sys_msg = get_sys_msgs(assistant_role_name, user_role_name, specified_task)\n",
"assistant_agent = CAMELAgent(assistant_sys_msg, ChatOpenAI(temperature=0.2))\n",
"user_agent = CAMELAgent(user_sys_msg, ChatOpenAI(temperature=0.2))\n",
"\n",
"# Reset agents\n",
"assistant_agent.reset()\n",
"user_agent.reset()\n",
"\n",
"# Initialize chats \n",
"assistant_msg = HumanMessage(\n",
" content=(f\"{user_sys_msg.content}. \"\n",
" \"Now start to give me introductions one by one. \"\n",
" \"Only reply with Instruction and Input.\"))\n",
"\n",
"user_msg = HumanMessage(content=f\"{assistant_sys_msg.content}\")\n",
"user_msg = assistant_agent.step(user_msg)"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"## Start role-playing session to solve the task!"
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Original task prompt:\n",
"Develop a trading bot for the stock market\n",
"\n",
"Specified task prompt:\n",
"Develop a Python-based swing trading bot that scans market trends, monitors stocks, and generates trading signals to help a stock trader to place optimal buy and sell orders with defined stop losses and profit targets.\n",
"\n",
"AI User (Stock Trader):\n",
"\n",
"Instruction: Install the necessary Python libraries for data analysis and trading.\n",
"Input: None\n",
"\n",
"\n",
"AI Assistant (Python Programmer):\n",
"\n",
"Solution: We can install the necessary Python libraries using pip, a package installer for Python. We can install pandas, numpy, matplotlib, and ta-lib for data analysis and trading. We can use the following command to install these libraries:\n",
"\n",
"```\n",
"pip install pandas numpy matplotlib ta-lib\n",
"```\n",
"\n",
"Next request.\n",
"\n",
"\n",
"AI User (Stock Trader):\n",
"\n",
"Instruction: Import the necessary libraries in the Python script.\n",
"Input: None\n",
"\n",
"\n",
"AI Assistant (Python Programmer):\n",
"\n",
"Solution: We can import the necessary libraries in the Python script using the import statement. We need to import pandas, numpy, matplotlib, and ta-lib for data analysis and trading. We can use the following code to import these libraries:\n",
"\n",
"```\n",
"import pandas as pd\n",
"import numpy as np\n",
"import matplotlib.pyplot as plt\n",
"import talib as ta\n",
"```\n",
"\n",
"Next request.\n",
"\n",
"\n",
"AI User (Stock Trader):\n",
"\n",
"Instruction: Load historical stock data into a pandas DataFrame.\n",
"Input: The path to the CSV file containing the historical stock data.\n",
"\n",
"\n",
"AI Assistant (Python Programmer):\n",
"\n",
"Solution: We can load historical stock data into a pandas DataFrame using the `read_csv()` function from pandas. We need to pass the path to the CSV file containing the historical stock data as an argument to this function. We can use the following code to load the historical stock data:\n",
"\n",
"```\n",
"df = pd.read_csv('path/to/csv/file.csv')\n",
"```\n",
"\n",
"This will load the historical stock data into a pandas DataFrame called `df`. Next request.\n",
"\n",
"\n",
"AI User (Stock Trader):\n",
"\n",
"Instruction: Preprocess the historical stock data by setting the date column as the index and sorting the DataFrame in ascending order by date.\n",
"Input: None.\n",
"\n",
"\n",
"AI Assistant (Python Programmer):\n",
"\n",
"Solution: We can preprocess the historical stock data by setting the date column as the index and sorting the DataFrame in ascending order by date using the `set_index()` and `sort_index()` functions from pandas. We can use the following code to preprocess the historical stock data:\n",
"\n",
"```\n",
"df = df.set_index('date')\n",
"df = df.sort_index(ascending=True)\n",
"```\n",
"\n",
"This will set the date column as the index and sort the DataFrame in ascending order by date. Next request.\n",
"\n",
"\n",
"AI User (Stock Trader):\n",
"\n",
"Instruction: Calculate the short-term and long-term moving averages for the stock data using the `ta.SMA()` function from ta-lib.\n",
"Input: The period for the short-term moving average and the period for the long-term moving average.\n",
"\n",
"\n",
"AI Assistant (Python Programmer):\n",
"\n",
"Solution: We can calculate the short-term and long-term moving averages for the stock data using the `ta.SMA()` function from ta-lib. We need to pass the period for the short-term moving average and the period for the long-term moving average as arguments to this function. We can use the following code to calculate the short-term and long-term moving averages:\n",
"\n",
"```\n",
"short_ma = ta.SMA(df['close'], timeperiod=short_period)\n",
"long_ma = ta.SMA(df['close'], timeperiod=long_period)\n",
"```\n",
"\n",
"This will calculate the short-term and long-term moving averages for the stock data and store them in the `short_ma` and `long_ma` variables, respectively. Next request.\n",
"\n",
"\n",
"AI User (Stock Trader):\n",
"\n",
"Instruction: Create a new DataFrame that combines the historical stock data with the short-term and long-term moving averages.\n",
"Input: None.\n",
"\n",
"\n",
"AI Assistant (Python Programmer):\n",
"\n",
"Solution: We can create a new DataFrame that combines the historical stock data with the short-term and long-term moving averages using the `concat()` function from pandas. We need to pass the historical stock data, the short-term moving average, and the long-term moving average as arguments to this function. We can use the following code to create the new DataFrame:\n",
"\n",
"```\n",
"new_df = pd.concat([df, short_ma, long_ma], axis=1)\n",
"new_df.columns = ['open', 'high', 'low', 'close', 'volume', 'short_ma', 'long_ma']\n",
"```\n",
"\n",
"This will create a new DataFrame called `new_df` that combines the historical stock data with the short-term and long-term moving averages. The columns of the new DataFrame are named 'open', 'high', 'low', 'close', 'volume', 'short_ma', and 'long_ma'. Next request.\n",
"\n",
"\n",
"AI User (Stock Trader):\n",
"\n",
"Instruction: Create a new column in the DataFrame that indicates when to buy or sell the stock based on the crossover of the short-term and long-term moving averages.\n",
"Input: None.\n",
"\n",
"\n",
"AI Assistant (Python Programmer):\n",
"\n",
"Solution: We can create a new column in the DataFrame that indicates when to buy or sell the stock based on the crossover of the short-term and long-term moving averages. We can use the following code to create the new column:\n",
"\n",
"```\n",
"new_df['signal'] = np.where(new_df['short_ma'] > new_df['long_ma'], 1, -1)\n",
"```\n",
"\n",
"This will create a new column called 'signal' in the DataFrame that indicates when to buy or sell the stock based on the crossover of the short-term and long-term moving averages. If the short-term moving average is greater than the long-term moving average, the signal is 1 (buy), otherwise the signal is -1 (sell). Next request.\n",
"\n",
"\n",
"AI User (Stock Trader):\n",
"\n",
"Instruction: Create a new column in the DataFrame that indicates the profit or loss for each trade based on the buy and sell signals and the defined stop loss and profit target.\n",
"Input: The stop loss and profit target as percentages.\n",
"\n",
"\n",
"AI Assistant (Python Programmer):\n",
"\n",
"Solution: We can create a new column in the DataFrame that indicates the profit or loss for each trade based on the buy and sell signals and the defined stop loss and profit target. We need to pass the stop loss and profit target as percentages as arguments to this function. We can use the following code to create the new column:\n",
"\n",
"```\n",
"stop_loss = stop_loss_percent / 100\n",
"profit_target = profit_target_percent / 100\n",
"\n",
"new_df['pnl'] = 0.0\n",
"buy_price = 0.0\n",
"for i in range(1, len(new_df)):\n",
" if new_df['signal'][i] == 1 and new_df['signal'][i-1] == -1:\n",
" buy_price = new_df['close'][i]\n",
" elif new_df['signal'][i] == -1 and new_df['signal'][i-1] == 1:\n",
" sell_price = new_df['close'][i]\n",
" if sell_price <= buy_price * (1 - stop_loss):\n",
" new_df['pnl'][i] = -stop_loss\n",
" elif sell_price >= buy_price * (1 + profit_target):\n",
" new_df['pnl'][i] = profit_target\n",
" else:\n",
" new_df['pnl'][i] = (sell_price - buy_price) / buy_price\n",
"```\n",
"\n",
"This will create a new column called 'pnl' in the DataFrame that indicates the profit or loss for each trade based on the buy and sell signals and the defined stop loss and profit target. The stop loss and profit target are calculated based on the stop_loss_percent and profit_target_percent variables, respectively. The buy and sell prices are stored in the buy_price and sell_price variables, respectively. If the sell price is less than or equal to the stop loss, the profit or loss is set to -stop_loss. If the sell price is greater than or equal to the profit target, the profit or loss is set to profit_target. Otherwise, the profit or loss is calculated as (sell_price - buy_price) / buy_price. Next request.\n",
"\n",
"\n",
"AI User (Stock Trader):\n",
"\n",
"Instruction: Calculate the total profit or loss for all trades.\n",
"Input: None.\n",
"\n",
"\n",
"AI Assistant (Python Programmer):\n",
"\n",
"Solution: We can calculate the total profit or loss for all trades by summing the values in the 'pnl' column of the DataFrame. We can use the following code to calculate the total profit or loss:\n",
"\n",
"```\n",
"total_pnl = new_df['pnl'].sum()\n",
"```\n",
"\n",
"This will calculate the total profit or loss for all trades and store it in the total_pnl variable. Next request.\n",
"\n",
"\n",
"AI User (Stock Trader):\n",
"\n",
"Instruction: Visualize the stock data, short-term moving average, and long-term moving average using a line chart.\n",
"Input: None.\n",
"\n",
"\n",
"AI Assistant (Python Programmer):\n",
"\n",
"Solution: We can visualize the stock data, short-term moving average, and long-term moving average using a line chart using the `plot()` function from pandas. We can use the following code to visualize the data:\n",
"\n",
"```\n",
"plt.figure(figsize=(12,6))\n",
"plt.plot(new_df.index, new_df['close'], label='Close')\n",
"plt.plot(new_df.index, new_df['short_ma'], label='Short MA')\n",
"plt.plot(new_df.index, new_df['long_ma'], label='Long MA')\n",
"plt.xlabel('Date')\n",
"plt.ylabel('Price')\n",
"plt.title('Stock Data with Moving Averages')\n",
"plt.legend()\n",
"plt.show()\n",
"```\n",
"\n",
"This will create a line chart that visualizes the stock data, short-term moving average, and long-term moving average. The x-axis represents the date and the y-axis represents the price. The chart also includes a legend that labels each line. Next request.\n",
"\n",
"\n",
"AI User (Stock Trader):\n",
"\n",
"Instruction: Visualize the buy and sell signals using a scatter plot.\n",
"Input: None.\n",
"\n",
"\n",
"AI Assistant (Python Programmer):\n",
"\n",
"Solution: We can visualize the buy and sell signals using a scatter plot using the `scatter()` function from matplotlib. We can use the following code to visualize the signals:\n",
"\n",
"```\n",
"buy_signals = new_df[new_df['signal'] == 1]\n",
"sell_signals = new_df[new_df['signal'] == -1]\n",
"\n",
"plt.figure(figsize=(12,6))\n",
"plt.scatter(buy_signals.index, buy_signals['close'], label='Buy', marker='^', color='green')\n",
"plt.scatter(sell_signals.index, sell_signals['close'], label='Sell', marker='v', color='red')\n",
"plt.plot(new_df.index, new_df['close'], label='Close')\n",
"plt.xlabel('Date')\n",
"plt.ylabel('Price')\n",
"plt.title('Buy and Sell Signals')\n",
"plt.legend()\n",
"plt.show()\n",
"```\n",
"\n",
"This will create a scatter plot that visualizes the buy and sell signals. The x-axis represents the date and the y-axis represents the price. The chart also includes a legend that labels each point. Next request.\n",
"\n",
"\n",
"AI User (Stock Trader):\n",
"\n",
"Instruction: Print the total profit or loss for all trades.\n",
"Input: None.\n",
"\n",
"\n",
"AI Assistant (Python Programmer):\n",
"\n",
"Solution: We can print the total profit or loss for all trades using the `print()` function. We can use the following code to print the total profit or loss:\n",
"\n",
"```\n",
"print('Total Profit/Loss: {:.2%}'.format(total_pnl))\n",
"```\n",
"\n",
"This will print the total profit or loss for all trades in percentage format with two decimal places. Next request.\n",
"\n",
"\n",
"AI User (Stock Trader):\n",
"\n",
"Instruction: Finalize the code by putting all the code snippets together in a single Python script.\n",
"Input: The necessary inputs for each code snippet.\n",
"\n",
"\n",
"AI Assistant (Python Programmer):\n",
"\n",
"Solution: Here is the complete Python script that puts all the code snippets together:\n",
"\n",
"```\n",
"import pandas as pd\n",
"import numpy as np\n",
"import matplotlib.pyplot as plt\n",
"import talib as ta\n",
"\n",
"# Load historical stock data into a pandas DataFrame\n",
"df = pd.read_csv('path/to/csv/file.csv')\n",
"\n",
"# Preprocess the historical stock data\n",
"df = df.set_index('date')\n",
"df = df.sort_index(ascending=True)\n",
"\n",
"# Calculate the short-term and long-term moving averages\n",
"short_period = 10\n",
"long_period = 30\n",
"short_ma = ta.SMA(df['close'], timeperiod=short_period)\n",
"long_ma = ta.SMA(df['close'], timeperiod=long_period)\n",
"\n",
"# Create a new DataFrame that combines the historical stock data with the short-term and long-term moving averages\n",
"new_df = pd.concat([df, short_ma, long_ma], axis=1)\n",
"new_df.columns = ['open', 'high', 'low', 'close', 'volume', 'short_ma', 'long_ma']\n",
"\n",
"# Create a new column in the DataFrame that indicates when to buy or sell the stock based on the crossover of the short-term and long-term moving averages\n",
"new_df['signal'] = np.where(new_df['short_ma'] > new_df['long_ma'], 1, -1)\n",
"\n",
"# Create a new column in the DataFrame that indicates the profit or loss for each trade based on the buy and sell signals and the defined stop loss and profit target\n",
"stop_loss_percent = 5\n",
"profit_target_percent = 10\n",
"stop_loss = stop_loss_percent / 100\n",
"profit_target = profit_target_percent / 100\n",
"new_df['pnl'] = 0.0\n",
"buy_price = 0.0\n",
"for i in range(1, len(new_df)):\n",
" if new_df['signal'][i] == 1 and new_df['signal'][i-1] == -1:\n",
" buy_price = new_df['close'][i]\n",
" elif new_df['signal'][i] == -1 and new_df['signal'][i-1] == 1:\n",
" sell_price = new_df['close'][i]\n",
" if sell_price <= buy_price * (1 - stop_loss):\n",
" new_df['pnl'][i] = -stop_loss\n",
" elif sell_price >= buy_price * (1 + profit_target):\n",
" new_df['pnl'][i] = profit_target\n",
" else:\n",
" new_df['pnl'][i] = (sell_price - buy_price) / buy_price\n",
"\n",
"# Calculate the total profit or loss for all trades\n",
"total_pnl = new_df['pnl'].sum()\n",
"\n",
"# Visualize the stock data, short-term moving average, and long-term moving average using a line chart\n",
"plt.figure(figsize=(12,6))\n",
"plt.plot(new_df.index, new_df['close'], label='Close')\n",
"plt.plot(new_df.index, new_df['short_ma'], label='Short MA')\n",
"plt.plot(new_df.index, new_df['long_ma'], label='Long MA')\n",
"plt.xlabel('Date')\n",
"plt.ylabel('Price')\n",
"plt.title('Stock Data with Moving Averages')\n",
"plt.legend()\n",
"plt.show()\n",
"\n",
"# Visualize the buy and sell signals using a scatter plot\n",
"buy_signals = new_df[new_df['signal'] == 1]\n",
"sell_signals = new_df[new_df['signal'] == -1]\n",
"plt.figure(figsize=(12,6))\n",
"plt.scatter(buy_signals.index, buy_signals['close'], label='Buy', marker='^', color='green')\n",
"plt.scatter(sell_signals.index, sell_signals['close'], label='Sell', marker='v', color='red')\n",
"plt.plot(new_df.index, new_df['close'], label='Close')\n",
"plt.xlabel('Date')\n",
"plt.ylabel('Price')\n",
"plt.title('Buy and Sell Signals')\n",
"plt.legend()\n",
"plt.show()\n",
"\n",
"# Print the total profit or loss for all trades\n",
"print('Total Profit/Loss: {:.2%}'.format(total_pnl))\n",
"```\n",
"\n",
"You need to replace the path/to/csv/file.csv with the actual path to the CSV file containing the historical stock data. You can also adjust the short_period, long_period, stop_loss_percent, and profit_target_percent variables to suit your needs.\n",
"\n",
"\n",
"AI User (Stock Trader):\n",
"\n",
"<CAMEL_TASK_DONE>\n",
"\n",
"\n",
"AI Assistant (Python Programmer):\n",
"\n",
"Great! Let me know if you need any further assistance.\n",
"\n",
"\n"
]
}
],
"source": [
"print(f\"Original task prompt:\\n{task}\\n\")\n",
"print(f\"Specified task prompt:\\n{specified_task}\\n\")\n",
"\n",
"chat_turn_limit, n = 30, 0\n",
"while n < chat_turn_limit:\n",
" n += 1\n",
" user_ai_msg = user_agent.step(assistant_msg)\n",
" user_msg = HumanMessage(content=user_ai_msg.content)\n",
" print(f\"AI User ({user_role_name}):\\n\\n{user_msg.content}\\n\\n\")\n",
" \n",
" assistant_ai_msg = assistant_agent.step(user_msg)\n",
" assistant_msg = HumanMessage(content=assistant_ai_msg.content)\n",
" print(f\"AI Assistant ({assistant_role_name}):\\n\\n{assistant_msg.content}\\n\\n\")\n",
" if \"<CAMEL_TASK_DONE>\" in user_msg.content:\n",
" break"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "camel",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.9"
},
"orig_nbformat": 4
},
"nbformat": 4,
"nbformat_minor": 2
}

View File

@@ -0,0 +1,538 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "ba5f8741",
"metadata": {},
"source": [
"# Custom Agent with PlugIn Retrieval\n",
"\n",
"This notebook combines two concepts in order to build a custom agent that can interact with AI Plugins:\n",
"\n",
"1. [Custom Agent with Retrieval](../../modules/agents/agents/custom_agent_with_plugin_retrieval.ipynb): This introduces the concept of retrieving many tools, which is useful when trying to work with arbitrarily many plugins.\n",
"2. [Natural Language API Chains](../../modules/chains/examples/openapi.ipynb): This creates Natural Language wrappers around OpenAPI endpoints. This is useful because (1) plugins use OpenAPI endpoints under the hood, (2) wrapping them in an NLAChain allows the router agent to call it more easily.\n",
"\n",
"The novel idea introduced in this notebook is the idea of using retrieval to select not the tools explicitly, but the set of OpenAPI specs to use. We can then generate tools from those OpenAPI specs. The use case for this is when trying to get agents to use plugins. It may be more efficient to choose plugins first, then the endpoints, rather than the endpoints directly. This is because the plugins may contain more useful information for selection."
]
},
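{
"cell_type": "markdown",
"id": "1f2e3d4c",
"metadata": {},
"source": [
"A minimal sketch of that flow (here `plugin_store` and `toolkits` are placeholders for the vector store and toolkit mapping constructed later in this notebook):\n",
"\n",
"```python\n",
"def tools_for_query(query, plugin_store, toolkits):\n",
"    # 1. retrieve the plugins most relevant to the query\n",
"    plugin_docs = plugin_store.similarity_search(query)\n",
"    # 2. look up the NLAToolkit generated from each plugin's OpenAPI spec\n",
"    kits = [toolkits[doc.metadata[\"plugin_name\"]] for doc in plugin_docs]\n",
"    # 3. flatten each toolkit's endpoint wrappers into a single tool list\n",
"    return [tool for kit in kits for tool in kit.nla_tools]\n",
"```"
]
},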
{
"cell_type": "markdown",
"id": "fea4812c",
"metadata": {},
"source": [
"## Set up environment\n",
"\n",
"Do necessary imports, etc."
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "9af9734e",
"metadata": {},
"outputs": [],
"source": [
"from langchain.agents import Tool, AgentExecutor, LLMSingleActionAgent, AgentOutputParser\n",
"from langchain.prompts import StringPromptTemplate\n",
"from langchain import OpenAI, SerpAPIWrapper, LLMChain\n",
"from typing import List, Union\n",
"from langchain.schema import AgentAction, AgentFinish\n",
"from langchain.agents.agent_toolkits import NLAToolkit\n",
"from langchain.tools.plugin import AIPlugin\n",
"import re"
]
},
{
"cell_type": "markdown",
"id": "2f91d8b4",
"metadata": {},
"source": [
"## Setup LLM"
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "a1a3b59c",
"metadata": {},
"outputs": [],
"source": [
"llm = OpenAI(temperature=0)"
]
},
{
"cell_type": "markdown",
"id": "6df0253f",
"metadata": {},
"source": [
"## Set up plugins\n",
"\n",
"Load and index plugins"
]
},
{
"cell_type": "code",
"execution_count": 3,
"id": "becda2a1",
"metadata": {},
"outputs": [],
"source": [
"urls = [\n",
" \"https://datasette.io/.well-known/ai-plugin.json\",\n",
" \"https://api.speak.com/.well-known/ai-plugin.json\",\n",
" \"https://www.wolframalpha.com/.well-known/ai-plugin.json\",\n",
" \"https://www.zapier.com/.well-known/ai-plugin.json\",\n",
" \"https://www.klarna.com/.well-known/ai-plugin.json\",\n",
" \"https://www.joinmilo.com/.well-known/ai-plugin.json\",\n",
" \"https://slack.com/.well-known/ai-plugin.json\",\n",
" \"https://schooldigger.com/.well-known/ai-plugin.json\",\n",
"]\n",
"\n",
"AI_PLUGINS = [AIPlugin.from_url(url) for url in urls]"
]
},
{
"cell_type": "markdown",
"id": "17362717",
"metadata": {},
"source": [
"## Tool Retriever\n",
"\n",
"We will use a vectorstore to create embeddings for each tool description. Then, for an incoming query we can create embeddings for that query and do a similarity search for relevant tools."
]
},
{
"cell_type": "code",
"execution_count": 4,
"id": "77c4be4b",
"metadata": {},
"outputs": [],
"source": [
"from langchain.vectorstores import FAISS\n",
"from langchain.embeddings import OpenAIEmbeddings\n",
"from langchain.schema import Document"
]
},
{
"cell_type": "code",
"execution_count": 5,
"id": "9092a158",
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"Attempting to load an OpenAPI 3.0.1 spec. This may result in degraded performance. Convert your OpenAPI spec to 3.1.* spec for better support.\n",
"Attempting to load an OpenAPI 3.0.1 spec. This may result in degraded performance. Convert your OpenAPI spec to 3.1.* spec for better support.\n",
"Attempting to load an OpenAPI 3.0.1 spec. This may result in degraded performance. Convert your OpenAPI spec to 3.1.* spec for better support.\n",
"Attempting to load an OpenAPI 3.0.2 spec. This may result in degraded performance. Convert your OpenAPI spec to 3.1.* spec for better support.\n",
"Attempting to load an OpenAPI 3.0.1 spec. This may result in degraded performance. Convert your OpenAPI spec to 3.1.* spec for better support.\n",
"Attempting to load an OpenAPI 3.0.1 spec. This may result in degraded performance. Convert your OpenAPI spec to 3.1.* spec for better support.\n",
"Attempting to load an OpenAPI 3.0.1 spec. This may result in degraded performance. Convert your OpenAPI spec to 3.1.* spec for better support.\n",
"Attempting to load an OpenAPI 3.0.1 spec. This may result in degraded performance. Convert your OpenAPI spec to 3.1.* spec for better support.\n",
"Attempting to load a Swagger 2.0 spec. This may result in degraded performance. Convert your OpenAPI spec to 3.1.* spec for better support.\n"
]
}
],
"source": [
"embeddings = OpenAIEmbeddings()\n",
"docs = [\n",
" Document(page_content=plugin.description_for_model, \n",
" metadata={\"plugin_name\": plugin.name_for_model}\n",
" )\n",
" for plugin in AI_PLUGINS\n",
"]\n",
"vector_store = FAISS.from_documents(docs, embeddings)\n",
"toolkits_dict = {plugin.name_for_model: \n",
" NLAToolkit.from_llm_and_ai_plugin(llm, plugin) \n",
" for plugin in AI_PLUGINS}"
]
},
{
"cell_type": "code",
"execution_count": 6,
"id": "735a7566",
"metadata": {},
"outputs": [],
"source": [
"retriever = vector_store.as_retriever()\n",
"\n",
"def get_tools(query):\n",
" # Get documents, which contain the Plugins to use\n",
" docs = retriever.get_relevant_documents(query)\n",
" # Get the toolkits, one for each plugin\n",
" tool_kits = [toolkits_dict[d.metadata[\"plugin_name\"]] for d in docs]\n",
" # Get the tools: a separate NLAChain for each endpoint\n",
" tools = []\n",
" for tk in tool_kits:\n",
" tools.extend(tk.nla_tools)\n",
" return tools"
]
},
{
"cell_type": "markdown",
"id": "7699afd7",
"metadata": {},
"source": [
"We can now test this retriever to see if it seems to work."
]
},
{
"cell_type": "code",
"execution_count": 7,
"id": "425f2886",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"['Milo.askMilo',\n",
" 'Zapier_Natural_Language_Actions_(NLA)_API_(Dynamic)_-_Beta.search_all_actions',\n",
" 'Zapier_Natural_Language_Actions_(NLA)_API_(Dynamic)_-_Beta.preview_a_zap',\n",
" 'Zapier_Natural_Language_Actions_(NLA)_API_(Dynamic)_-_Beta.get_configuration_link',\n",
" 'Zapier_Natural_Language_Actions_(NLA)_API_(Dynamic)_-_Beta.list_exposed_actions',\n",
" 'SchoolDigger_API_V2.0.Autocomplete_GetSchools',\n",
" 'SchoolDigger_API_V2.0.Districts_GetAllDistricts2',\n",
" 'SchoolDigger_API_V2.0.Districts_GetDistrict2',\n",
" 'SchoolDigger_API_V2.0.Rankings_GetSchoolRank2',\n",
" 'SchoolDigger_API_V2.0.Rankings_GetRank_District',\n",
" 'SchoolDigger_API_V2.0.Schools_GetAllSchools20',\n",
" 'SchoolDigger_API_V2.0.Schools_GetSchool20',\n",
" 'Speak.translate',\n",
" 'Speak.explainPhrase',\n",
" 'Speak.explainTask']"
]
},
"execution_count": 7,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"tools = get_tools(\"What could I do today with my kiddo\")\n",
"[t.name for t in tools]"
]
},
{
"cell_type": "code",
"execution_count": 8,
"id": "3aa88768",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"['Open_AI_Klarna_product_Api.productsUsingGET',\n",
" 'Milo.askMilo',\n",
" 'Zapier_Natural_Language_Actions_(NLA)_API_(Dynamic)_-_Beta.search_all_actions',\n",
" 'Zapier_Natural_Language_Actions_(NLA)_API_(Dynamic)_-_Beta.preview_a_zap',\n",
" 'Zapier_Natural_Language_Actions_(NLA)_API_(Dynamic)_-_Beta.get_configuration_link',\n",
" 'Zapier_Natural_Language_Actions_(NLA)_API_(Dynamic)_-_Beta.list_exposed_actions',\n",
" 'SchoolDigger_API_V2.0.Autocomplete_GetSchools',\n",
" 'SchoolDigger_API_V2.0.Districts_GetAllDistricts2',\n",
" 'SchoolDigger_API_V2.0.Districts_GetDistrict2',\n",
" 'SchoolDigger_API_V2.0.Rankings_GetSchoolRank2',\n",
" 'SchoolDigger_API_V2.0.Rankings_GetRank_District',\n",
" 'SchoolDigger_API_V2.0.Schools_GetAllSchools20',\n",
" 'SchoolDigger_API_V2.0.Schools_GetSchool20']"
]
},
"execution_count": 8,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"tools = get_tools(\"what shirts can i buy?\")\n",
"[t.name for t in tools]"
]
},
{
"cell_type": "markdown",
"id": "2e7a075c",
"metadata": {},
"source": [
"## Prompt Template\n",
"\n",
"The prompt template is pretty standard, because we're not actually changing that much logic in the actual prompt template, but rather we are just changing how retrieval is done."
]
},
{
"cell_type": "code",
"execution_count": 9,
"id": "339b1bb8",
"metadata": {},
"outputs": [],
"source": [
"# Set up the base template\n",
"template = \"\"\"Answer the following questions as best you can, but speaking as a pirate might speak. You have access to the following tools:\n",
"\n",
"{tools}\n",
"\n",
"Use the following format:\n",
"\n",
"Question: the input question you must answer\n",
"Thought: you should always think about what to do\n",
"Action: the action to take, should be one of [{tool_names}]\n",
"Action Input: the input to the action\n",
"Observation: the result of the action\n",
"... (this Thought/Action/Action Input/Observation can repeat N times)\n",
"Thought: I now know the final answer\n",
"Final Answer: the final answer to the original input question\n",
"\n",
"Begin! Remember to speak as a pirate when giving your final answer. Use lots of \"Arg\"s\n",
"\n",
"Question: {input}\n",
"{agent_scratchpad}\"\"\""
]
},
{
"cell_type": "markdown",
"id": "1583acdc",
"metadata": {},
"source": [
"The custom prompt template now has the concept of a tools_getter, which we call on the input to select the tools to use"
]
},
{
"cell_type": "code",
"execution_count": 10,
"id": "fd969d31",
"metadata": {},
"outputs": [],
"source": [
"from typing import Callable\n",
"# Set up a prompt template\n",
"class CustomPromptTemplate(StringPromptTemplate):\n",
" # The template to use\n",
" template: str\n",
" ############## NEW ######################\n",
" # The list of tools available\n",
" tools_getter: Callable\n",
" \n",
" def format(self, **kwargs) -> str:\n",
" # Get the intermediate steps (AgentAction, Observation tuples)\n",
" # Format them in a particular way\n",
" intermediate_steps = kwargs.pop(\"intermediate_steps\")\n",
" thoughts = \"\"\n",
" for action, observation in intermediate_steps:\n",
" thoughts += action.log\n",
" thoughts += f\"\\nObservation: {observation}\\nThought: \"\n",
" # Set the agent_scratchpad variable to that value\n",
" kwargs[\"agent_scratchpad\"] = thoughts\n",
" ############## NEW ######################\n",
" tools = self.tools_getter(kwargs[\"input\"])\n",
" # Create a tools variable from the list of tools provided\n",
" kwargs[\"tools\"] = \"\\n\".join([f\"{tool.name}: {tool.description}\" for tool in tools])\n",
" # Create a list of tool names for the tools provided\n",
" kwargs[\"tool_names\"] = \", \".join([tool.name for tool in tools])\n",
" return self.template.format(**kwargs)"
]
},
{
"cell_type": "code",
"execution_count": 11,
"id": "798ef9fb",
"metadata": {},
"outputs": [],
"source": [
"prompt = CustomPromptTemplate(\n",
" template=template,\n",
" tools_getter=get_tools,\n",
" # This omits the `agent_scratchpad`, `tools`, and `tool_names` variables because those are generated dynamically\n",
" # This includes the `intermediate_steps` variable because that is needed\n",
" input_variables=[\"input\", \"intermediate_steps\"]\n",
")"
]
},
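{
"cell_type": "markdown",
"id": "2a3b4c5d",
"metadata": {},
"source": [
"To see the dynamic retrieval in action (illustrative only; the tools listed will depend on your vector store), you can render the prompt directly:\n",
"\n",
"```python\n",
"# `intermediate_steps` is empty before the agent has taken any actions\n",
"print(prompt.format(input=\"what shirts can i buy?\", intermediate_steps=[]))\n",
"```"
]
},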
{
"cell_type": "markdown",
"id": "ef3a1af3",
"metadata": {},
"source": [
"## Output Parser\n",
"\n",
"The output parser is unchanged from the previous notebook, since we are not changing anything about the output format."
]
},
{
"cell_type": "code",
"execution_count": 12,
"id": "7c6fe0d3",
"metadata": {},
"outputs": [],
"source": [
"class CustomOutputParser(AgentOutputParser):\n",
" \n",
" def parse(self, llm_output: str) -> Union[AgentAction, AgentFinish]:\n",
" # Check if agent should finish\n",
" if \"Final Answer:\" in llm_output:\n",
" return AgentFinish(\n",
" # Return values is generally always a dictionary with a single `output` key\n",
" # It is not recommended to try anything else at the moment :)\n",
" return_values={\"output\": llm_output.split(\"Final Answer:\")[-1].strip()},\n",
" log=llm_output,\n",
" )\n",
" # Parse out the action and action input\n",
" regex = r\"Action: (.*?)[\\n]*Action Input:[\\s]*(.*)\"\n",
" match = re.search(regex, llm_output, re.DOTALL)\n",
" if not match:\n",
" raise ValueError(f\"Could not parse LLM output: `{llm_output}`\")\n",
" action = match.group(1).strip()\n",
" action_input = match.group(2)\n",
" # Return the action and action input\n",
" return AgentAction(tool=action, tool_input=action_input.strip(\" \").strip('\"'), log=llm_output)"
]
},
{
"cell_type": "code",
"execution_count": 13,
"id": "d278706a",
"metadata": {},
"outputs": [],
"source": [
"output_parser = CustomOutputParser()"
]
},
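{
"cell_type": "markdown",
"id": "3c4d5e6f",
"metadata": {},
"source": [
"As a quick illustration (the LLM output below is made up, not from a real run), the parser turns the model's text into an `AgentAction`:\n",
"\n",
"```python\n",
"action = output_parser.parse(\n",
"    'Thought: I should look for a translation tool\\nAction: Speak.translate\\nAction Input: hola'\n",
")\n",
"print(action.tool, action.tool_input)  # -> Speak.translate hola\n",
"```"
]
},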
{
"cell_type": "markdown",
"id": "170587b1",
"metadata": {},
"source": [
"## Set up LLM, stop sequence, and the agent\n",
"\n",
"Also the same as the previous notebook"
]
},
{
"cell_type": "code",
"execution_count": 14,
"id": "f9d4c374",
"metadata": {},
"outputs": [],
"source": [
"llm = OpenAI(temperature=0)"
]
},
{
"cell_type": "code",
"execution_count": 15,
"id": "9b1cc2a2",
"metadata": {},
"outputs": [],
"source": [
"# LLM chain consisting of the LLM and a prompt\n",
"llm_chain = LLMChain(llm=llm, prompt=prompt)"
]
},
{
"cell_type": "code",
"execution_count": 16,
"id": "e4f5092f",
"metadata": {},
"outputs": [],
"source": [
"tool_names = [tool.name for tool in tools]\n",
"agent = LLMSingleActionAgent(\n",
" llm_chain=llm_chain, \n",
" output_parser=output_parser,\n",
" stop=[\"\\nObservation:\"], \n",
" allowed_tools=tool_names\n",
")"
]
},
{
"cell_type": "markdown",
"id": "aa8a5326",
"metadata": {},
"source": [
"## Use the Agent\n",
"\n",
"Now we can use it!"
]
},
{
"cell_type": "code",
"execution_count": 17,
"id": "490604e9",
"metadata": {},
"outputs": [],
"source": [
"agent_executor = AgentExecutor.from_agent_and_tools(agent=agent, tools=tools, verbose=True)"
]
},
{
"cell_type": "code",
"execution_count": 18,
"id": "653b1617",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"\n",
"\u001b[1m> Entering new AgentExecutor chain...\u001b[0m\n",
"\u001b[32;1m\u001b[1;3mThought: I need to find a product API\n",
"Action: Open_AI_Klarna_product_Api.productsUsingGET\n",
"Action Input: shirts\u001b[0m\n",
"\n",
"Observation:\u001b[36;1m\u001b[1;3mI found 10 shirts from the API response. They range in price from $9.99 to $450.00 and come in a variety of materials, colors, and patterns.\u001b[0m\u001b[32;1m\u001b[1;3m I now know what shirts I can buy\n",
"Final Answer: Arg, I found 10 shirts from the API response. They range in price from $9.99 to $450.00 and come in a variety of materials, colors, and patterns.\u001b[0m\n",
"\n",
"\u001b[1m> Finished chain.\u001b[0m\n"
]
},
{
"data": {
"text/plain": [
"'Arg, I found 10 shirts from the API response. They range in price from $9.99 to $450.00 and come in a variety of materials, colors, and patterns.'"
]
},
"execution_count": 18,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"agent_executor.run(\"what shirts can i buy?\")"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "2481ee76",
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.1"
},
"vscode": {
"interpreter": {
"hash": "18784188d7ecd866c0586ac068b02361a6896dc3a29b64f5cc957f09c590acef"
}
}
},
"nbformat": 4,
"nbformat_minor": 5
}

View File

@@ -23,3 +23,4 @@ Query Understanding: GPT-4 processes user queries, grasping the context and extr
The full tutorial is available below.
- [Twitter the-algorithm codebase analysis with Deep Lake](code/twitter-the-algorithm-analysis-deeplake.ipynb): A notebook walking through how to parse github source code and run queries conversation.
- [LangChain codebase analysis with Deep Lake](code/code-analysis-deeplake.ipynb): A notebook walking through how to analyze and do question answering over THIS code base.

View File

@@ -0,0 +1,644 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Use LangChain, GPT and Deep Lake to work with code base\n",
"In this tutorial, we are going to use Langchain + Deep Lake with GPT to analyze the code base of the LangChain itself. "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Design"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"1. Prepare data:\n",
" 1. Upload all python project files using the `langchain.document_loaders.TextLoader`. We will call these files the **documents**.\n",
" 2. Split all documents to chunks using the `langchain.text_splitter.CharacterTextSplitter`.\n",
" 3. Embed chunks and upload them into the DeepLake using `langchain.embeddings.openai.OpenAIEmbeddings` and `langchain.vectorstores.DeepLake`\n",
"2. Question-Answering:\n",
" 1. Build a chain from `langchain.chat_models.ChatOpenAI` and `langchain.chains.ConversationalRetrievalChain`\n",
" 2. Prepare questions.\n",
" 3. Get answers running the chain.\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Implementation"
]
},
{
"cell_type": "markdown",
"metadata": {
"tags": []
},
"source": [
"### Integration preparations"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We need to set up keys for external services and install necessary python libraries."
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"#!python3 -m pip install --upgrade langchain deeplake openai"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Set up OpenAI embeddings, Deep Lake multi-modal vector store api and authenticate. \n",
"\n",
"For full documentation of Deep Lake please follow https://docs.activeloop.ai/ and API reference https://docs.deeplake.ai/en/latest/"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {
"tags": []
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
" ········\n"
]
}
],
"source": [
"import os\n",
"from getpass import getpass\n",
"\n",
"os.environ['OPENAI_API_KEY'] = getpass()\n",
"# Please manually enter OpenAI Key"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"Authenticate into Deep Lake if you want to create your own dataset and publish it. You can get an API key from the platform at [app.activeloop.ai](https://app.activeloop.ai)"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {
"tags": []
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
" ········\n"
]
}
],
"source": [
"os.environ['ACTIVELOOP_TOKEN'] = getpass.getpass('Activeloop Token:')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Prepare data "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Load all repository files. Here we assume this notebook is downloaded as the part of the langchain fork and we work with the python files of the `langchain` repo.\n",
"\n",
"If you want to use files from different repo, change `root_dir` to the root dir of your repo."
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {
"tags": []
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"1147\n"
]
}
],
"source": [
"from langchain.document_loaders import TextLoader\n",
"\n",
"root_dir = '../../../..'\n",
"\n",
"docs = []\n",
"for dirpath, dirnames, filenames in os.walk(root_dir):\n",
" for file in filenames:\n",
" if file.endswith('.py') and '/.venv/' not in dirpath:\n",
" try: \n",
" loader = TextLoader(os.path.join(dirpath, file), encoding='utf-8')\n",
" docs.extend(loader.load_and_split())\n",
" except Exception as e: \n",
" pass\n",
"print(f'{len(docs)}')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Then, chunk the files"
]
},
{
"cell_type": "code",
"execution_count": 13,
"metadata": {
"tags": []
},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"Created a chunk of size 1620, which is longer than the specified 1000\n",
"Created a chunk of size 1213, which is longer than the specified 1000\n",
"Created a chunk of size 1263, which is longer than the specified 1000\n",
"Created a chunk of size 1448, which is longer than the specified 1000\n",
"Created a chunk of size 1120, which is longer than the specified 1000\n",
"Created a chunk of size 1148, which is longer than the specified 1000\n",
"Created a chunk of size 1826, which is longer than the specified 1000\n",
"Created a chunk of size 1260, which is longer than the specified 1000\n",
"Created a chunk of size 1195, which is longer than the specified 1000\n",
"Created a chunk of size 2147, which is longer than the specified 1000\n",
"Created a chunk of size 1410, which is longer than the specified 1000\n",
"Created a chunk of size 1269, which is longer than the specified 1000\n",
"Created a chunk of size 1030, which is longer than the specified 1000\n",
"Created a chunk of size 1046, which is longer than the specified 1000\n",
"Created a chunk of size 1024, which is longer than the specified 1000\n",
"Created a chunk of size 1026, which is longer than the specified 1000\n",
"Created a chunk of size 1285, which is longer than the specified 1000\n",
"Created a chunk of size 1370, which is longer than the specified 1000\n",
"Created a chunk of size 1031, which is longer than the specified 1000\n",
"Created a chunk of size 1999, which is longer than the specified 1000\n",
"Created a chunk of size 1029, which is longer than the specified 1000\n",
"Created a chunk of size 1120, which is longer than the specified 1000\n",
"Created a chunk of size 1033, which is longer than the specified 1000\n",
"Created a chunk of size 1143, which is longer than the specified 1000\n",
"Created a chunk of size 1416, which is longer than the specified 1000\n",
"Created a chunk of size 2482, which is longer than the specified 1000\n",
"Created a chunk of size 1890, which is longer than the specified 1000\n",
"Created a chunk of size 1418, which is longer than the specified 1000\n",
"Created a chunk of size 1848, which is longer than the specified 1000\n",
"Created a chunk of size 1069, which is longer than the specified 1000\n",
"Created a chunk of size 2369, which is longer than the specified 1000\n",
"Created a chunk of size 1045, which is longer than the specified 1000\n",
"Created a chunk of size 1501, which is longer than the specified 1000\n",
"Created a chunk of size 1208, which is longer than the specified 1000\n",
"Created a chunk of size 1950, which is longer than the specified 1000\n",
"Created a chunk of size 1283, which is longer than the specified 1000\n",
"Created a chunk of size 1414, which is longer than the specified 1000\n",
"Created a chunk of size 1304, which is longer than the specified 1000\n",
"Created a chunk of size 1224, which is longer than the specified 1000\n",
"Created a chunk of size 1060, which is longer than the specified 1000\n",
"Created a chunk of size 2461, which is longer than the specified 1000\n",
"Created a chunk of size 1099, which is longer than the specified 1000\n",
"Created a chunk of size 1178, which is longer than the specified 1000\n",
"Created a chunk of size 1449, which is longer than the specified 1000\n",
"Created a chunk of size 1345, which is longer than the specified 1000\n",
"Created a chunk of size 3359, which is longer than the specified 1000\n",
"Created a chunk of size 2248, which is longer than the specified 1000\n",
"Created a chunk of size 1589, which is longer than the specified 1000\n",
"Created a chunk of size 2104, which is longer than the specified 1000\n",
"Created a chunk of size 1505, which is longer than the specified 1000\n",
"Created a chunk of size 1387, which is longer than the specified 1000\n",
"Created a chunk of size 1215, which is longer than the specified 1000\n",
"Created a chunk of size 1240, which is longer than the specified 1000\n",
"Created a chunk of size 1635, which is longer than the specified 1000\n",
"Created a chunk of size 1075, which is longer than the specified 1000\n",
"Created a chunk of size 2180, which is longer than the specified 1000\n",
"Created a chunk of size 1791, which is longer than the specified 1000\n",
"Created a chunk of size 1555, which is longer than the specified 1000\n",
"Created a chunk of size 1082, which is longer than the specified 1000\n",
"Created a chunk of size 1225, which is longer than the specified 1000\n",
"Created a chunk of size 1287, which is longer than the specified 1000\n",
"Created a chunk of size 1085, which is longer than the specified 1000\n",
"Created a chunk of size 1117, which is longer than the specified 1000\n",
"Created a chunk of size 1966, which is longer than the specified 1000\n",
"Created a chunk of size 1150, which is longer than the specified 1000\n",
"Created a chunk of size 1285, which is longer than the specified 1000\n",
"Created a chunk of size 1150, which is longer than the specified 1000\n",
"Created a chunk of size 1585, which is longer than the specified 1000\n",
"Created a chunk of size 1208, which is longer than the specified 1000\n",
"Created a chunk of size 1267, which is longer than the specified 1000\n",
"Created a chunk of size 1542, which is longer than the specified 1000\n",
"Created a chunk of size 1183, which is longer than the specified 1000\n",
"Created a chunk of size 2424, which is longer than the specified 1000\n",
"Created a chunk of size 1017, which is longer than the specified 1000\n",
"Created a chunk of size 1304, which is longer than the specified 1000\n",
"Created a chunk of size 1379, which is longer than the specified 1000\n",
"Created a chunk of size 1324, which is longer than the specified 1000\n",
"Created a chunk of size 1205, which is longer than the specified 1000\n",
"Created a chunk of size 1056, which is longer than the specified 1000\n",
"Created a chunk of size 1195, which is longer than the specified 1000\n",
"Created a chunk of size 3608, which is longer than the specified 1000\n",
"Created a chunk of size 1058, which is longer than the specified 1000\n",
"Created a chunk of size 1075, which is longer than the specified 1000\n",
"Created a chunk of size 1217, which is longer than the specified 1000\n",
"Created a chunk of size 1109, which is longer than the specified 1000\n",
"Created a chunk of size 1440, which is longer than the specified 1000\n",
"Created a chunk of size 1046, which is longer than the specified 1000\n",
"Created a chunk of size 1220, which is longer than the specified 1000\n",
"Created a chunk of size 1403, which is longer than the specified 1000\n",
"Created a chunk of size 1241, which is longer than the specified 1000\n",
"Created a chunk of size 1427, which is longer than the specified 1000\n",
"Created a chunk of size 1049, which is longer than the specified 1000\n",
"Created a chunk of size 1580, which is longer than the specified 1000\n",
"Created a chunk of size 1565, which is longer than the specified 1000\n",
"Created a chunk of size 1131, which is longer than the specified 1000\n",
"Created a chunk of size 1425, which is longer than the specified 1000\n",
"Created a chunk of size 1054, which is longer than the specified 1000\n",
"Created a chunk of size 1027, which is longer than the specified 1000\n",
"Created a chunk of size 2559, which is longer than the specified 1000\n",
"Created a chunk of size 1028, which is longer than the specified 1000\n",
"Created a chunk of size 1382, which is longer than the specified 1000\n",
"Created a chunk of size 1888, which is longer than the specified 1000\n",
"Created a chunk of size 1475, which is longer than the specified 1000\n",
"Created a chunk of size 1652, which is longer than the specified 1000\n",
"Created a chunk of size 1891, which is longer than the specified 1000\n",
"Created a chunk of size 1899, which is longer than the specified 1000\n",
"Created a chunk of size 1021, which is longer than the specified 1000\n",
"Created a chunk of size 1085, which is longer than the specified 1000\n",
"Created a chunk of size 1854, which is longer than the specified 1000\n",
"Created a chunk of size 1672, which is longer than the specified 1000\n",
"Created a chunk of size 2537, which is longer than the specified 1000\n",
"Created a chunk of size 1251, which is longer than the specified 1000\n",
"Created a chunk of size 1734, which is longer than the specified 1000\n",
"Created a chunk of size 1642, which is longer than the specified 1000\n",
"Created a chunk of size 1376, which is longer than the specified 1000\n",
"Created a chunk of size 1253, which is longer than the specified 1000\n",
"Created a chunk of size 1642, which is longer than the specified 1000\n",
"Created a chunk of size 1419, which is longer than the specified 1000\n",
"Created a chunk of size 1438, which is longer than the specified 1000\n",
"Created a chunk of size 1427, which is longer than the specified 1000\n",
"Created a chunk of size 1684, which is longer than the specified 1000\n",
"Created a chunk of size 1760, which is longer than the specified 1000\n",
"Created a chunk of size 1157, which is longer than the specified 1000\n",
"Created a chunk of size 2504, which is longer than the specified 1000\n",
"Created a chunk of size 1082, which is longer than the specified 1000\n",
"Created a chunk of size 2268, which is longer than the specified 1000\n",
"Created a chunk of size 1784, which is longer than the specified 1000\n",
"Created a chunk of size 1311, which is longer than the specified 1000\n",
"Created a chunk of size 2972, which is longer than the specified 1000\n",
"Created a chunk of size 1144, which is longer than the specified 1000\n",
"Created a chunk of size 1825, which is longer than the specified 1000\n",
"Created a chunk of size 1508, which is longer than the specified 1000\n",
"Created a chunk of size 2901, which is longer than the specified 1000\n",
"Created a chunk of size 1715, which is longer than the specified 1000\n",
"Created a chunk of size 1062, which is longer than the specified 1000\n",
"Created a chunk of size 1206, which is longer than the specified 1000\n",
"Created a chunk of size 1102, which is longer than the specified 1000\n",
"Created a chunk of size 1184, which is longer than the specified 1000\n",
"Created a chunk of size 1002, which is longer than the specified 1000\n",
"Created a chunk of size 1065, which is longer than the specified 1000\n",
"Created a chunk of size 1871, which is longer than the specified 1000\n",
"Created a chunk of size 1754, which is longer than the specified 1000\n",
"Created a chunk of size 2413, which is longer than the specified 1000\n",
"Created a chunk of size 1771, which is longer than the specified 1000\n",
"Created a chunk of size 2054, which is longer than the specified 1000\n",
"Created a chunk of size 2000, which is longer than the specified 1000\n",
"Created a chunk of size 2061, which is longer than the specified 1000\n",
"Created a chunk of size 1066, which is longer than the specified 1000\n",
"Created a chunk of size 1419, which is longer than the specified 1000\n",
"Created a chunk of size 1368, which is longer than the specified 1000\n",
"Created a chunk of size 1008, which is longer than the specified 1000\n",
"Created a chunk of size 1227, which is longer than the specified 1000\n",
"Created a chunk of size 1745, which is longer than the specified 1000\n",
"Created a chunk of size 2296, which is longer than the specified 1000\n",
"Created a chunk of size 1083, which is longer than the specified 1000\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"3477\n"
]
}
],
"source": [
"from langchain.text_splitter import CharacterTextSplitter\n",
"\n",
"text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)\n",
"texts = text_splitter.split_documents(docs)\n",
"print(f\"{len(texts)}\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Then embed chunks and upload them to the DeepLake.\n",
"\n",
"This can take several minutes. "
]
},
{
"cell_type": "code",
"execution_count": 14,
"metadata": {
"tags": []
},
"outputs": [
{
"data": {
"text/plain": [
"OpenAIEmbeddings(client=<class 'openai.api_resources.embedding.Embedding'>, model='text-embedding-ada-002', document_model_name='text-embedding-ada-002', query_model_name='text-embedding-ada-002', embedding_ctx_length=8191, openai_api_key=None, openai_organization=None, allowed_special=set(), disallowed_special='all', chunk_size=1000, max_retries=6)"
]
},
"execution_count": 14,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"from langchain.embeddings.openai import OpenAIEmbeddings\n",
"\n",
"embeddings = OpenAIEmbeddings()\n",
"embeddings"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"from langchain.vectorstores import DeepLake\n",
"\n",
"db = DeepLake.from_documents(texts, embeddings, dataset_path=f\"hub://{DEEPLAKE_ACCOUNT_NAME}/langchain-code\")\n",
"db"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Question Answering\n",
"First load the dataset, construct the retriever, then construct the Conversational Chain"
]
},
{
"cell_type": "code",
"execution_count": 16,
"metadata": {
"tags": []
},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"-"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"This dataset can be visualized in Jupyter Notebook by ds.visualize() or at https://app.activeloop.ai/user_name/langchain-code\n",
"\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"/"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"hub://user_name/langchain-code loaded successfully.\n",
"\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"Deep Lake Dataset in hub://user_name/langchain-code already exists, loading from the storage\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"Dataset(path='hub://user_name/langchain-code', read_only=True, tensors=['embedding', 'ids', 'metadata', 'text'])\n",
"\n",
" tensor htype shape dtype compression\n",
" ------- ------- ------- ------- ------- \n",
" embedding generic (3477, 1536) float32 None \n",
" ids text (3477, 1) str None \n",
" metadata json (3477, 1) str None \n",
" text text (3477, 1) str None \n"
]
}
],
"source": [
"db = DeepLake(dataset_path=f\"hub://{DEEPLAKE_ACCOUNT_NAME}/langchain-code\", read_only=True, embedding_function=embeddings)"
]
},
{
"cell_type": "code",
"execution_count": 17,
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"retriever = db.as_retriever()\n",
"retriever.search_kwargs['distance_metric'] = 'cos'\n",
"retriever.search_kwargs['fetch_k'] = 20\n",
"retriever.search_kwargs['maximal_marginal_relevance'] = True\n",
"retriever.search_kwargs['k'] = 20"
]
},
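{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a quick sanity check (illustrative only; the results depend on the dataset you loaded), you can query the retriever directly before wiring it into a chain:\n",
"\n",
"```python\n",
"rel_docs = retriever.get_relevant_documents('What does the Chain class do?')\n",
"print(len(rel_docs), rel_docs[0].metadata['source'])\n",
"```"
]
},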
{
"cell_type": "markdown",
"metadata": {},
"source": [
"You can also specify user defined functions using [Deep Lake filters](https://docs.deeplake.ai/en/latest/deeplake.core.dataset.html#deeplake.core.dataset.Dataset.filter)"
]
},
{
"cell_type": "code",
"execution_count": 18,
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"def filter(x):\n",
" # filter based on source code\n",
" if 'something' in x['text'].data()['value']:\n",
" return False\n",
" \n",
" # filter based on path e.g. extension\n",
" metadata = x['metadata'].data()['value']\n",
" return 'only_this' in metadata['source'] or 'also_that' in metadata['source']\n",
"\n",
"### turn on below for custom filtering\n",
"# retriever.search_kwargs['filter'] = filter"
]
},
{
"cell_type": "code",
"execution_count": 19,
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"from langchain.chat_models import ChatOpenAI\n",
"from langchain.chains import ConversationalRetrievalChain\n",
"\n",
"model = ChatOpenAI(model='gpt-3.5-turbo') # 'ada' 'gpt-3.5-turbo' 'gpt-4',\n",
"qa = ConversationalRetrievalChain.from_llm(model,retriever=retriever)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"questions = [\n",
" \"What is the class hierarchy?\",\n",
" # \"What classes are derived from the Chain class?\",\n",
" # \"What classes and functions in the ./langchain/utilities/ forlder are not covered by unit tests?\",\n",
" # \"What one improvement do you propose in code in relation to the class herarchy for the Chain class?\",\n",
"] \n",
"chat_history = []\n",
"\n",
"for question in questions: \n",
" result = qa({\"question\": question, \"chat_history\": chat_history})\n",
" chat_history.append((question, result['answer']))\n",
" print(f\"-> **Question**: {question} \\n\")\n",
" print(f\"**Answer**: {result['answer']} \\n\")\n"
]
},
{
"cell_type": "markdown",
"metadata": {
"tags": []
},
"source": [
"-> **Question**: What is the class hierarchy? \n",
"\n",
"**Answer**: There are several class hierarchies in the provided code, so I'll list a few:\n",
"\n",
"1. `BaseModel` -> `ConstitutionalPrinciple`: `ConstitutionalPrinciple` is a subclass of `BaseModel`.\n",
"2. `BasePromptTemplate` -> `StringPromptTemplate`, `AIMessagePromptTemplate`, `BaseChatPromptTemplate`, `ChatMessagePromptTemplate`, `ChatPromptTemplate`, `HumanMessagePromptTemplate`, `MessagesPlaceholder`, `SystemMessagePromptTemplate`, `FewShotPromptTemplate`, `FewShotPromptWithTemplates`, `Prompt`, `PromptTemplate`: All of these classes are subclasses of `BasePromptTemplate`.\n",
"3. `APIChain`, `Chain`, `MapReduceDocumentsChain`, `MapRerankDocumentsChain`, `RefineDocumentsChain`, `StuffDocumentsChain`, `HypotheticalDocumentEmbedder`, `LLMChain`, `LLMBashChain`, `LLMCheckerChain`, `LLMMathChain`, `LLMRequestsChain`, `PALChain`, `QAWithSourcesChain`, `VectorDBQAWithSourcesChain`, `VectorDBQA`, `SQLDatabaseChain`: All of these classes are subclasses of `Chain`.\n",
"4. `BaseLoader`: `BaseLoader` is a subclass of `ABC`.\n",
"5. `BaseTracer` -> `ChainRun`, `LLMRun`, `SharedTracer`, `ToolRun`, `Tracer`, `TracerException`, `TracerSession`: All of these classes are subclasses of `BaseTracer`.\n",
"6. `OpenAIEmbeddings`, `HuggingFaceEmbeddings`, `CohereEmbeddings`, `JinaEmbeddings`, `LlamaCppEmbeddings`, `HuggingFaceHubEmbeddings`, `TensorflowHubEmbeddings`, `SagemakerEndpointEmbeddings`, `HuggingFaceInstructEmbeddings`, `SelfHostedEmbeddings`, `SelfHostedHuggingFaceEmbeddings`, `SelfHostedHuggingFaceInstructEmbeddings`, `FakeEmbeddings`, `AlephAlphaAsymmetricSemanticEmbedding`, `AlephAlphaSymmetricSemanticEmbedding`: All of these classes are subclasses of `BaseLLM`. \n",
"\n",
"\n",
"-> **Question**: What classes are derived from the Chain class? \n",
"\n",
"**Answer**: There are multiple classes that are derived from the Chain class. Some of them are:\n",
"- APIChain\n",
"- AnalyzeDocumentChain\n",
"- ChatVectorDBChain\n",
"- CombineDocumentsChain\n",
"- ConstitutionalChain\n",
"- ConversationChain\n",
"- GraphQAChain\n",
"- HypotheticalDocumentEmbedder\n",
"- LLMChain\n",
"- LLMCheckerChain\n",
"- LLMRequestsChain\n",
"- LLMSummarizationCheckerChain\n",
"- MapReduceChain\n",
"- OpenAPIEndpointChain\n",
"- PALChain\n",
"- QAWithSourcesChain\n",
"- RetrievalQA\n",
"- RetrievalQAWithSourcesChain\n",
"- SequentialChain\n",
"- SQLDatabaseChain\n",
"- TransformChain\n",
"- VectorDBQA\n",
"- VectorDBQAWithSourcesChain\n",
"\n",
"There might be more classes that are derived from the Chain class as it is possible to create custom classes that extend the Chain class.\n",
"\n",
"\n",
"-> **Question**: What classes and functions in the ./langchain/utilities/ forlder are not covered by unit tests? \n",
"\n",
"**Answer**: All classes and functions in the `./langchain/utilities/` folder seem to have unit tests written for them. \n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.6"
}
},
"nbformat": 4,
"nbformat_minor": 4
}

View File

@@ -18,31 +18,13 @@
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"Define OpenAI embeddings, Deep Lake multi-modal vector store api and authenticate. For full documentation of Deep Lake please follow https://docs.activeloop.ai/ and API reference https://docs.deeplake.ai/en/latest/"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [],
"source": [
"import os\n",
"from langchain.embeddings.openai import OpenAIEmbeddings\n",
"from langchain.vectorstores import DeepLake\n",
"Define OpenAI embeddings, Deep Lake multi-modal vector store api and authenticate. For full documentation of Deep Lake please follow [docs](https://docs.activeloop.ai/) and [API reference](https://docs.deeplake.ai/en/latest/).\n",
"\n",
"os.environ['OPENAI_API_KEY']='sk-xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx'\n",
"embeddings = OpenAIEmbeddings()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Authenticate into Deep Lake if you want to create your own dataset and publish it. You can get an API key from the platform at https://app.activeloop.ai"
"Authenticate into Deep Lake if you want to create your own dataset and publish it. You can get an API key from the [platform](https://app.activeloop.ai)"
]
},
{
@@ -51,7 +33,15 @@
"metadata": {},
"outputs": [],
"source": [
"!activeloop login -t <TOKEN>"
"import os\n",
"import getpass\n",
"\n",
"from langchain.embeddings.openai import OpenAIEmbeddings\n",
"from langchain.vectorstores import DeepLake\n",
"\n",
"os.environ['OPENAI_API_KEY'] = getpass.getpass('OpenAI API Key:')\n",
"os.environ['ACTIVELOOP_TOKEN'] = getpass.getpass('Activeloop Token:')\n",
"embeddings = OpenAIEmbeddings()"
]
},
{
@@ -143,15 +133,35 @@
},
{
"cell_type": "code",
"execution_count": 2,
"execution_count": 7,
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"-"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"This dataset can be visualized in Jupyter Notebook by ds.visualize() or at https://app.activeloop.ai/davitbun/twitter-algorithm\n",
"\n",
"\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"-"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"hub://davitbun/twitter-algorithm loaded successfully.\n",
"\n"
]
@@ -184,7 +194,7 @@
},
{
"cell_type": "code",
"execution_count": 13,
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
@@ -205,7 +215,7 @@
},
{
"cell_type": "code",
"execution_count": 16,
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
@@ -224,7 +234,7 @@
},
{
"cell_type": "code",
"execution_count": 14,
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
@@ -267,9 +277,14 @@
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"-> **Question**: What does favCountParams do? \n",
"\n",
"**Answer**: `favCountParams` is an optional ThriftLinearFeatureRankingParams instance that represents the parameters related to the \"favorite count\" feature in the ranking process. It is used to control the weight of the favorite count feature while ranking tweets. The favorite count is the number of times a tweet has been marked as a favorite by users, and it is considered an important signal in the ranking of tweets. By using `favCountParams`, the system can adjust the importance of the favorite count while calculating the final ranking score of a tweet. \n",
"\n",
"-> **Question**: is it Likes + Bookmarks, or not clear from the code?\n",
"\n",
"**Answer**: From the provided code, it is not clear if the favorite count metric is determined by the sum of likes and bookmarks. The favorite count is mentioned in the code, but there is no explicit reference to how it is calculated in terms of likes and bookmarks. \n",
@@ -423,7 +438,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.1"
"version": "3.10.0"
}
},
"nbformat": 4,

View File

@@ -1,5 +1,5 @@
Evaluation
==============
==========
.. note::
`Conceptual Guide <https://docs.langchain.com/docs/use-cases/evaluation>`_
@@ -83,7 +83,7 @@ The existing examples we have are:
Other Examples
------------
--------------
In addition, we also have some more generic resources for evaluation.

View File

@@ -1,4 +1,4 @@
# Personal Assistants
# Personal Assistants (Agents)
> [Conceptual Guide](https://docs.langchain.com/docs/use-cases/personal-assistants)
@@ -16,3 +16,10 @@ Highlighting specific parts:
- [Agent Documentation](../modules/agents.rst) (for interacting with the outside world)
- [Index Documentation](../modules/indexes.rst) (for giving them knowledge of your data)
- [Memory](../modules/memory.rst) (for helping them remember interactions)
Specific examples of this include:
- [Baby AGI](agents/baby_agi.ipynb): a notebook implementing [BabyAGI](https://github.com/yoheinakajima/babyagi) by Yohei Nakajima as LLM Chains
- [Baby AGI with Tools](agents/baby_agi_with_agent.ipynb): building off the above notebook, this example substitutes in an agent with tools as the execution tools, allowing it to actually take actions.
- [CAMEL](agents/camel_role_playing.ipynb): an implementation of the CAMEL (Communicative Agents for “Mind” Exploration of Large Scale Language Model Society) paper, where two agents communicate with each other.
- [AI Plugins](agents/custom_agent_with_plugin_retrieval.ipynb): an implementation of an agent that is designed to be able to use all AI Plugins.

View File

@@ -99,6 +99,16 @@ class BaseSingleActionAgent(BaseModel):
f"Got unsupported early_stopping_method `{early_stopping_method}`"
)
@classmethod
def from_llm_and_tools(
cls,
llm: BaseLanguageModel,
tools: Sequence[BaseTool],
callback_manager: Optional[BaseCallbackManager] = None,
**kwargs: Any,
) -> BaseSingleActionAgent:
raise NotImplementedError
@property
def _agent_type(self) -> str:
"""Return Identifier of agent type."""

View File

@@ -1,5 +1,5 @@
"""Toolkit for interacting with API's using natural language."""
from __future__ import annotations
from typing import Any, List, Optional, Sequence
@@ -11,6 +11,7 @@ from langchain.llms.base import BaseLLM
from langchain.requests import Requests
from langchain.tools.base import BaseTool
from langchain.tools.openapi.utils.openapi_utils import OpenAPISpec
from langchain.tools.plugin import AIPlugin
class NLAToolkit(BaseToolkit):
@@ -23,19 +24,18 @@ class NLAToolkit(BaseToolkit):
"""Get the tools for all the API operations."""
return list(self.nla_tools)
@classmethod
def from_llm_and_spec(
cls,
@staticmethod
def _get_http_operation_tools(
llm: BaseLLM,
spec: OpenAPISpec,
requests: Optional[Requests] = None,
verbose: bool = False,
**kwargs: Any
) -> "NLAToolkit":
"""Instantiate the toolkit by creating tools for each operation."""
http_operation_tools: List[NLATool] = []
**kwargs: Any,
) -> List[NLATool]:
"""Get the tools for all the API operations."""
if not spec.paths:
return cls(nla_tools=http_operation_tools)
return []
http_operation_tools = []
for path in spec.paths:
for method in spec.get_methods_for_path(path):
endpoint_tool = NLATool.from_llm_and_method(
@@ -45,9 +45,24 @@ class NLAToolkit(BaseToolkit):
spec=spec,
requests=requests,
verbose=verbose,
**kwargs
**kwargs,
)
http_operation_tools.append(endpoint_tool)
return http_operation_tools
@classmethod
def from_llm_and_spec(
cls,
llm: BaseLLM,
spec: OpenAPISpec,
requests: Optional[Requests] = None,
verbose: bool = False,
**kwargs: Any,
) -> NLAToolkit:
"""Instantiate the toolkit by creating tools for each operation."""
http_operation_tools = cls._get_http_operation_tools(
llm=llm, spec=spec, requests=requests, verbose=verbose, **kwargs
)
return cls(nla_tools=http_operation_tools)
@classmethod
@@ -57,10 +72,45 @@ class NLAToolkit(BaseToolkit):
open_api_url: str,
requests: Optional[Requests] = None,
verbose: bool = False,
**kwargs: Any
) -> "NLAToolkit":
**kwargs: Any,
) -> NLAToolkit:
"""Instantiate the toolkit from an OpenAPI Spec URL"""
spec = OpenAPISpec.from_url(open_api_url)
return cls.from_llm_and_spec(
llm=llm, spec=spec, requests=requests, verbose=verbose, **kwargs
)
@classmethod
def from_llm_and_ai_plugin(
cls,
llm: BaseLLM,
ai_plugin: AIPlugin,
requests: Optional[Requests] = None,
verbose: bool = False,
**kwargs: Any,
) -> NLAToolkit:
"""Instantiate the toolkit from an OpenAPI Spec URL"""
spec = OpenAPISpec.from_url(ai_plugin.api.url)
# TODO: Merge optional Auth information with the `requests` argument
return cls.from_llm_and_spec(
llm=llm,
spec=spec,
requests=requests,
verbose=verbose,
**kwargs,
)
@classmethod
def from_llm_and_ai_plugin_url(
cls,
llm: BaseLLM,
ai_plugin_url: str,
requests: Optional[Requests] = None,
verbose: bool = False,
**kwargs: Any,
) -> NLAToolkit:
"""Instantiate the toolkit from an OpenAPI Spec URL"""
plugin = AIPlugin.from_url(ai_plugin_url)
return cls.from_llm_and_ai_plugin(
llm=llm, ai_plugin=plugin, requests=requests, verbose=verbose, **kwargs
)
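
For context, a minimal usage sketch of the new plugin-aware constructors (the import path and the Klarna manifest URL follow the existing docs examples; an OpenAI key is assumed to be configured):

```python
# Hedged sketch: build an NLAToolkit directly from an AI plugin manifest.
from langchain.agents.agent_toolkits import NLAToolkit
from langchain.llms import OpenAI

llm = OpenAI(temperature=0)
toolkit = NLAToolkit.from_llm_and_ai_plugin_url(
    llm=llm,
    ai_plugin_url="https://www.klarna.com/.well-known/ai-plugin.json",
)
print([tool.name for tool in toolkit.get_tools()])
```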

View File

@@ -22,6 +22,9 @@ def create_openapi_agent(
suffix: str = OPENAPI_SUFFIX,
format_instructions: str = FORMAT_INSTRUCTIONS,
input_variables: Optional[List[str]] = None,
max_iterations: Optional[int] = 15,
max_execution_time: Optional[float] = None,
early_stopping_method: str = "force",
verbose: bool = False,
return_intermediate_steps: bool = False,
**kwargs: Any,
@@ -47,4 +50,7 @@ def create_openapi_agent(
tools=toolkit.get_tools(),
verbose=verbose,
return_intermediate_steps=return_intermediate_steps,
max_iterations=max_iterations,
max_execution_time=max_execution_time,
early_stopping_method=early_stopping_method,
)
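
A hedged sketch of the new execution-control knobs on the OpenAPI toolkit agent; the spec path is a placeholder and the JsonSpec/OpenAPIToolkit wiring follows the existing docs examples:

```python
import yaml
from langchain.agents.agent_toolkits import OpenAPIToolkit, create_openapi_agent
from langchain.llms import OpenAI
from langchain.requests import RequestsWrapper
from langchain.tools.json.tool import JsonSpec

with open("openapi.yaml") as f:  # placeholder spec file
    raw_spec = yaml.safe_load(f)
llm = OpenAI(temperature=0)
toolkit = OpenAPIToolkit.from_llm(
    llm, JsonSpec(dict_=raw_spec), RequestsWrapper(headers={}), verbose=True
)
agent = create_openapi_agent(
    llm,
    toolkit,
    max_execution_time=60.0,  # new: wall-clock budget in seconds
    early_stopping_method="force",  # force a final answer when the budget is hit
)
```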

View File

@@ -14,9 +14,13 @@ from langchain.agents.agent_toolkits.openapi.planner_prompt import (
API_PLANNER_PROMPT,
API_PLANNER_TOOL_DESCRIPTION,
API_PLANNER_TOOL_NAME,
PARSING_DELETE_PROMPT,
PARSING_GET_PROMPT,
PARSING_PATCH_PROMPT,
PARSING_POST_PROMPT,
REQUESTS_DELETE_TOOL_DESCRIPTION,
REQUESTS_GET_TOOL_DESCRIPTION,
REQUESTS_PATCH_TOOL_DESCRIPTION,
REQUESTS_POST_TOOL_DESCRIPTION,
)
from langchain.agents.agent_toolkits.openapi.spec import ReducedOpenAPISpec
@@ -24,6 +28,7 @@ from langchain.agents.mrkl.base import ZeroShotAgent
from langchain.agents.tools import Tool
from langchain.chains.llm import LLMChain
from langchain.llms.openai import OpenAI
from langchain.memory import ReadOnlySharedMemory
from langchain.prompts import PromptTemplate
from langchain.requests import RequestsWrapper
from langchain.schema import BaseLanguageModel
@@ -53,7 +58,8 @@ class RequestsGetToolWithParsing(BaseRequestsTool, BaseTool):
data = json.loads(text)
except json.JSONDecodeError as e:
raise e
response = self.requests_wrapper.get(data["url"])
data_params = data.get("params")
response = self.requests_wrapper.get(data["url"], params=data_params)
response = response[: self.response_length]
return self.llm_chain.predict(
response=response, instructions=data["output_instructions"]
@@ -88,6 +94,56 @@ class RequestsPostToolWithParsing(BaseRequestsTool, BaseTool):
raise NotImplementedError()
class RequestsPatchToolWithParsing(BaseRequestsTool, BaseTool):
name = "requests_patch"
description = REQUESTS_PATCH_TOOL_DESCRIPTION
response_length: Optional[int] = MAX_RESPONSE_LENGTH
llm_chain = LLMChain(
llm=OpenAI(),
prompt=PARSING_PATCH_PROMPT,
)
def _run(self, text: str) -> str:
try:
data = json.loads(text)
except json.JSONDecodeError as e:
raise e
response = self.requests_wrapper.patch(data["url"], data["data"])
response = response[: self.response_length]
return self.llm_chain.predict(
response=response, instructions=data["output_instructions"]
).strip()
async def _arun(self, text: str) -> str:
raise NotImplementedError()
class RequestsDeleteToolWithParsing(BaseRequestsTool, BaseTool):
name = "requests_delete"
description = REQUESTS_DELETE_TOOL_DESCRIPTION
response_length: Optional[int] = MAX_RESPONSE_LENGTH
llm_chain = LLMChain(
llm=OpenAI(),
prompt=PARSING_DELETE_PROMPT,
)
def _run(self, text: str) -> str:
try:
data = json.loads(text)
except json.JSONDecodeError as e:
raise e
response = self.requests_wrapper.delete(data["url"])
response = response[: self.response_length]
return self.llm_chain.predict(
response=response, instructions=data["output_instructions"]
).strip()
async def _arun(self, text: str) -> str:
raise NotImplementedError()
#
# Orchestrator, planner, controller.
#
@@ -155,7 +211,7 @@ def _create_api_controller_tool(
base_url = api_spec.servers[0]["url"] # TODO: do better.
def _create_and_run_api_controller_agent(plan_str: str) -> str:
pattern = r"\b(GET|POST)\s+(/\S+)*"
pattern = r"\b(GET|POST|PATCH|DELETE)\s+(/\S+)*"
matches = re.findall(pattern, plan_str)
endpoint_names = [
"{method} {route}".format(method=method, route=route.split("?")[0])
@@ -183,6 +239,8 @@ def create_openapi_agent(
api_spec: ReducedOpenAPISpec,
requests_wrapper: RequestsWrapper,
llm: BaseLanguageModel,
shared_memory: Optional[ReadOnlySharedMemory] = None,
verbose: bool = True,
) -> AgentExecutor:
"""Instantiate API planner and controller for a given spec.
@@ -207,7 +265,7 @@ def create_openapi_agent(
},
)
agent = ZeroShotAgent(
llm_chain=LLMChain(llm=llm, prompt=prompt),
llm_chain=LLMChain(llm=llm, prompt=prompt, memory=shared_memory),
allowed_tools=[tool.name for tool in tools],
)
return AgentExecutor.from_agent_and_tools(agent=agent, tools=tools, verbose=True)
return AgentExecutor.from_agent_and_tools(agent=agent, tools=tools, verbose=verbose)
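
A hedged end-to-end sketch of the planner-style agent with the new PATCH/DELETE support; the spec file on disk is a placeholder assumption:

```python
import yaml
from langchain.agents.agent_toolkits.openapi import planner
from langchain.agents.agent_toolkits.openapi.spec import reduce_openapi_spec
from langchain.llms import OpenAI
from langchain.requests import RequestsWrapper

with open("openapi.yaml") as f:  # placeholder spec file
    raw_spec = yaml.safe_load(f)
api_spec = reduce_openapi_spec(raw_spec)  # now keeps get/post/patch/delete
agent = planner.create_openapi_agent(
    api_spec, RequestsWrapper(headers={}), OpenAI(temperature=0)
)
# DELETE plans now trigger an authorization check unless the user asked to delete.
agent.run("Delete my cart")
```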

View File

@@ -2,13 +2,16 @@
from langchain.prompts.prompt import PromptTemplate
API_PLANNER_PROMPT = """You are a planner that plans a sequence of API calls to assist with user queries against an API.
You should:
1) evaluate whether the user query can be solved by the API documented below. If no, say why.
2) if yes, generate a plan of API calls and say what they are doing step by step.
3) If the plan includes a DELETE call, you should always ask the User for authorization first unless the User has specifically asked to delete something.
You should only use API endpoints documented below ("Endpoints you can use:").
You can only use the DELETE tool if the User has specifically asked to delete something. Otherwise, you should request authorization from the User first.
Some user queries can be resolved in a single API call, but some will require several API calls.
The plan will be passed to an API controller that can format it into web requests and return the responses.
@@ -20,15 +23,31 @@ Fake endpoints for examples:
GET /user to get information about the current user
GET /products/search search across products
POST /users/{{id}}/cart to add products to a user's cart
PATCH /users/{{id}}/cart to update a user's cart
DELETE /users/{{id}}/cart to delete a user's cart
User query: tell me a joke
Plan: Sorry, this API's domain is shopping, not comedy.
User query: I want to buy a couch
Plan: 1. GET /products/search to search for couches
Plan: 1. GET /products with a query param to search for couches
2. GET /user to find the user's id
3. POST /users/{{id}}/cart to add a couch to the user's cart
User query: I want to add a lamp to my cart
Plan: 1. GET /products with a query param to search for lamps
2. GET /user to find the user's id
3. PATCH /users/{{id}}/cart to add a lamp to the user's cart
User query: I want to delete my cart
Plan: 1. GET /user to find the user's id
2. DELETE required. Did user specify DELETE or previously authorize? Yes, proceed.
3. DELETE /users/{{id}}/cart to delete the user's cart
User query: I want to start a new cart
Plan: 1. GET /user to find the user's id
2. DELETE required. Did user specify DELETE or previously authorize? No, ask for authorization.
3. Are you sure you want to delete your cart?
----
Here are endpoints you can use. Do not reference any of the endpoints above.
@@ -40,7 +59,7 @@ Here are endpoints you can use. Do not reference any of the endpoints above.
User query: {query}
Plan:"""
API_PLANNER_TOOL_NAME = "api_planner"
API_PLANNER_TOOL_DESCRIPTION = f"Can be used to generate the right API calls to assist with a user query, like {API_PLANNER_TOOL_NAME}(query). Should always be called before trying to calling the API controller."
API_PLANNER_TOOL_DESCRIPTION = f"Can be used to generate the right API calls to assist with a user query, like {API_PLANNER_TOOL_NAME}(query). Should always be called before trying to call the API controller."
# Execution.
API_CONTROLLER_PROMPT = """You are an agent that gets a sequence of API calls and given their documentation, should execute them and return the final response.
@@ -81,8 +100,9 @@ API_CONTROLLER_TOOL_DESCRIPTION = f"Can be used to execute a plan of API calls,
# The goal is to have an agent at the top-level (e.g. so it can recover from errors and re-plan) while
# keeping planning (and specifically the planning prompt) simple.
API_ORCHESTRATOR_PROMPT = """You are an agent that assists with user queries against an API, things like querying information or creating resources.
Some user queries can be resolved in a single API call though some require several API call.
Some user queries can be resolved in a single API call, particularly if you can find appropriate params from the OpenAPI spec, though some require several API calls.
You should always plan your API calls first, and then execute the plan second.
If the plan includes a DELETE call, be sure to ask the User for authorization first unless the User has specifically asked to delete something.
You should never return information without executing the api_controller tool.
@@ -106,12 +126,12 @@ User query: can you add some trendy stuff to my shopping cart.
Thought: I should plan API calls first.
Action: api_planner
Action Input: I need to find the right API calls to add trendy items to the user's shopping cart
Observation: 1) GET /items/trending to get trending item ids
Observation: 1) GET /items with params 'trending' is 'True' to get trending item ids
2) GET /user to get user
3) POST /cart to post the trending items to the user's cart
Thought: I'm ready to execute the API calls.
Action: api_controller
Action Input: 1) GET /items/trending to get trending item ids
Action Input: 1) GET /items params 'trending' is 'True' to get trending item ids
2) GET /user to get user
3) POST /cart to post the trending items to the user's cart
...
@@ -123,8 +143,12 @@ Thought: I should generate a plan to help with this query and then copy that pla
{agent_scratchpad}"""
REQUESTS_GET_TOOL_DESCRIPTION = """Use this to GET content from a website.
Input to the tool should be a json string with 2 keys: "url" and "output_instructions".
The value of "url" should be a string. The value of "output_instructions" should be instructions on what information to extract from the response, for example the id(s) for a resource(s) that the GET request fetches.
Input to the tool should be a json string with 3 keys: "url", "params" and "output_instructions".
The value of "url" should be a string.
The value of "params" should be a dict of the needed and available parameters from the OpenAPI spec related to the endpoint.
If parameters are not needed, or not available, leave it empty.
The value of "output_instructions" should be instructions on what information to extract from the response,
for example the id(s) for a resource(s) that the GET request fetches.
"""
PARSING_GET_PROMPT = PromptTemplate(
@@ -141,7 +165,7 @@ REQUESTS_POST_TOOL_DESCRIPTION = """Use this when you want to POST to a website.
Input to the tool should be a json string with 3 keys: "url", "data", and "output_instructions".
The value of "url" should be a string.
The value of "data" should be a dictionary of key-value pairs you want to POST to the url.
The value of "summary_instructions" should be instructions on what information to extract from the response, for example the id(s) for a resource(s) that the POST request creates.
The value of "output_instructions" should be instructions on what information to extract from the response, for example the id(s) for a resource(s) that the POST request creates.
Always use double quotes for strings in the json string."""
PARSING_POST_PROMPT = PromptTemplate(
@@ -153,3 +177,37 @@ If the response indicates an error, you should instead output a summary of the e
Output:""",
input_variables=["response", "instructions"],
)
REQUESTS_PATCH_TOOL_DESCRIPTION = """Use this when you want to PATCH content on a website.
Input to the tool should be a json string with 3 keys: "url", "data", and "output_instructions".
The value of "url" should be a string.
The value of "data" should be a dictionary of key-value pairs of the body params available in the OpenAPI spec you want to PATCH the content with at the url.
The value of "output_instructions" should be instructions on what information to extract from the response, for example the id(s) for a resource(s) that the PATCH request creates.
Always use double quotes for strings in the json string."""
PARSING_PATCH_PROMPT = PromptTemplate(
template="""Here is an API response:\n\n{response}\n\n====
Your task is to extract some information according to these instructions: {instructions}
When working with API objects, you should usually use ids over names. Do not return any ids or names that are not in the response.
If the response indicates an error, you should instead output a summary of the error.
Output:""",
input_variables=["response", "instructions"],
)
REQUESTS_DELETE_TOOL_DESCRIPTION = """ONLY USE THIS TOOL WHEN THE USER HAS SPECIFICALLY REQUESTED TO DELETE CONTENT FROM A WEBSITE.
Input to the tool should be a json string with 2 keys: "url" and "output_instructions".
The value of "url" should be a string.
The value of "output_instructions" should be instructions on what information to extract from the response, for example the id(s) for a resource(s) that the DELETE request deletes.
Always use double quotes for strings in the json string.
ONLY USE THIS TOOL IF THE USER HAS SPECIFICALLY REQUESTED TO DELETE SOMETHING."""
PARSING_DELETE_PROMPT = PromptTemplate(
template="""Here is an API response:\n\n{response}\n\n====
Your task is to extract some information according to these instructions: {instructions}
When working with API objects, you should usually use ids over names. Do not return any ids or names that are not in the response.
If the response indicates an error, you should instead output a summary of the error.
Output:""",
input_variables=["response", "instructions"],
)

View File

@@ -68,12 +68,12 @@ def reduce_openapi_spec(spec: dict, dereference: bool = True) -> ReducedOpenAPIS
I was hoping https://openapi.tools/ would have some useful bits
to this end, but doesn't seem so.
"""
# 1. Consider only get, post endpoints.
# 1. Consider only get, post, patch, delete endpoints.
endpoints = [
(f"{operation_name.upper()} {route}", docs.get("description"), docs)
for route, operation in spec["paths"].items()
for operation_name, docs in operation.items()
if operation_name in ["get", "post"]
if operation_name in ["get", "post", "patch", "delete"]
]
# 2. Replace any refs so that complete docs are retrieved.

View File

@@ -20,6 +20,7 @@ def create_pandas_dataframe_agent(
verbose: bool = False,
return_intermediate_steps: bool = False,
max_iterations: Optional[int] = 15,
max_execution_time: Optional[float] = None,
early_stopping_method: str = "force",
**kwargs: Any,
) -> AgentExecutor:
@@ -48,5 +49,6 @@ def create_pandas_dataframe_agent(
verbose=verbose,
return_intermediate_steps=return_intermediate_steps,
max_iterations=max_iterations,
max_execution_time=max_execution_time,
early_stopping_method=early_stopping_method,
)
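
A hedged sketch of the new cap on the pandas agent; the toy dataframe and question are assumptions:

```python
import pandas as pd
from langchain.agents import create_pandas_dataframe_agent
from langchain.llms import OpenAI

df = pd.DataFrame({"age": [22, 38, 26], "fare": [7.25, 71.28, 7.92]})
agent = create_pandas_dataframe_agent(
    OpenAI(temperature=0),
    df,
    max_execution_time=30.0,  # new: stop the executor after ~30 seconds
    early_stopping_method="force",
)
agent.run("What is the average age?")
```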

View File

@@ -21,6 +21,7 @@ def create_sql_agent(
input_variables: Optional[List[str]] = None,
top_k: int = 10,
max_iterations: Optional[int] = 15,
max_execution_time: Optional[float] = None,
early_stopping_method: str = "force",
verbose: bool = False,
**kwargs: Any,
@@ -44,8 +45,9 @@ def create_sql_agent(
agent = ZeroShotAgent(llm_chain=llm_chain, allowed_tools=tool_names, **kwargs)
return AgentExecutor.from_agent_and_tools(
agent=agent,
tools=toolkit.get_tools(),
tools=tools,
verbose=verbose,
max_iterations=max_iterations,
max_execution_time=max_execution_time,
early_stopping_method=early_stopping_method,
)

View File

@@ -4,6 +4,8 @@ from typing import List
from pydantic import Field
from langchain.agents.agent_toolkits.base import BaseToolkit
from langchain.llms.base import BaseLLM
from langchain.llms.openai import OpenAI
from langchain.sql_database import SQLDatabase
from langchain.tools import BaseTool
from langchain.tools.sql_database.tool import (
@@ -18,6 +20,7 @@ class SQLDatabaseToolkit(BaseToolkit):
"""Toolkit for interacting with SQL databases."""
db: SQLDatabase = Field(exclude=True)
llm: BaseLLM = Field(default_factory=lambda: OpenAI(temperature=0))
@property
def dialect(self) -> str:
@@ -35,5 +38,5 @@ class SQLDatabaseToolkit(BaseToolkit):
QuerySQLDataBaseTool(db=self.db),
InfoSQLDatabaseTool(db=self.db),
ListSQLDatabaseTool(db=self.db),
QueryCheckerTool(db=self.db),
QueryCheckerTool(db=self.db, llm=self.llm),
]
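
A hedged sketch of the new `llm` field; the sqlite URI is a placeholder:

```python
from langchain.agents.agent_toolkits import SQLDatabaseToolkit
from langchain.llms import OpenAI
from langchain.sql_database import SQLDatabase

db = SQLDatabase.from_uri("sqlite:///example.db")  # placeholder database
toolkit = SQLDatabaseToolkit(db=db, llm=OpenAI(temperature=0))
tools = toolkit.get_tools()  # QueryCheckerTool now reuses the toolkit's llm
```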

View File

@@ -8,3 +8,4 @@ class AgentType(str, Enum):
CONVERSATIONAL_REACT_DESCRIPTION = "conversational-react-description"
CHAT_ZERO_SHOT_REACT_DESCRIPTION = "chat-zero-shot-react-description"
CHAT_CONVERSATIONAL_REACT_DESCRIPTION = "chat-conversational-react-description"
CHAT_ZERO_SHOT_REACT_DESCRIPTION_V2 = "chat-zero-shot-react-description-002"

View File

View File

@@ -0,0 +1,53 @@
from typing import Any, List, Optional, Sequence
from langchain.agents.agent import AgentOutputParser, LLMSingleActionAgent
from langchain.agents.chat_v2.prompt import (
FORMAT_INSTRUCTIONS,
PREFIX,
SUFFIX,
ChatOutputParser,
create_prompt,
)
from langchain.callbacks.base import BaseCallbackManager
from langchain.chains.llm import LLMChain
from langchain.schema import BaseLanguageModel
from langchain.tools import BaseTool
class ChatAgentV2(LLMSingleActionAgent):
@classmethod
def from_llm_and_tools(
cls,
llm: BaseLanguageModel,
tools: Sequence[BaseTool],
callback_manager: Optional[BaseCallbackManager] = None,
prefix: str = PREFIX,
suffix: str = SUFFIX,
format_instructions: str = FORMAT_INSTRUCTIONS,
input_variables: Optional[List[str]] = None,
output_parser: Optional[AgentOutputParser] = None,
stop: Optional[List[str]] = None,
**kwargs: Any,
) -> LLMSingleActionAgent:
"""Construct an agent from an LLM and tools."""
_stop = stop or ["Observation:"]
_output_parser = output_parser or ChatOutputParser()
prompt = create_prompt(
tools,
prefix=prefix,
suffix=suffix,
format_instructions=format_instructions,
input_variables=input_variables,
)
llm_chain = LLMChain(
llm=llm,
prompt=prompt,
callback_manager=callback_manager,
)
return cls(
llm_chain=llm_chain, output_parser=_output_parser, stop=_stop, **kwargs
)
@property
def _agent_type(self) -> str:
raise ValueError
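
A hedged sketch of trying the v2 chat agent through `initialize_agent`; the tool list is an arbitrary assumption:

```python
from langchain.agents import AgentType, initialize_agent, load_tools
from langchain.chat_models import ChatOpenAI

llm = ChatOpenAI(temperature=0)
tools = load_tools(["llm-math"], llm=llm)
agent = initialize_agent(
    tools, llm, agent=AgentType.CHAT_ZERO_SHOT_REACT_DESCRIPTION_V2, verbose=True
)
agent.run("What is 2 to the 10th power?")
```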

View File

@@ -0,0 +1,84 @@
# flake8: noqa
import json
from langchain.prompts.chat import (
HumanMessagePromptTemplate,
SystemMessagePromptTemplate,
)
from langchain.agents.schema import AgentScratchPadChatPromptTemplate
from langchain.prompts.base import BasePromptTemplate
from langchain.schema import AgentAction, AgentFinish
from langchain.tools.base import BaseTool
from typing import Sequence, Optional, List, Union
from langchain.agents.agent import AgentOutputParser
PREFIX = """Answer the following questions as best you can. You have access to the following tools:"""
FORMAT_INSTRUCTIONS = """The way you use the tools is by specifying a json blob.
Specifically, this json should have an `action` key (with the name of the tool to use) and an `action_input` key (with the input to the tool going here).
The only values that should be in the "action" field are: {tool_names}
The $JSON_BLOB should only contain a SINGLE action, do NOT return a list of multiple actions. Here is an example of a valid $JSON_BLOB:
```
{{{{
"action": $TOOL_NAME,
"action_input": $INPUT
}}}}
```
ALWAYS use the following format:
Question: the input question you must answer
Thought: you should always think about what to do
Action:
```
$JSON_BLOB
```
Observation: the result of the action
... (this Thought/Action/Observation can repeat N times)
Thought: I now know the final answer
Final Answer: the final answer to the original input question"""
SUFFIX = """Begin! Reminder to always use the exact characters `Final Answer` when responding."""
def create_prompt(
tools: Sequence[BaseTool],
prefix: str = PREFIX,
suffix: str = SUFFIX,
format_instructions: str = FORMAT_INSTRUCTIONS,
input_variables: Optional[List[str]] = None,
) -> BasePromptTemplate:
tool_strings = "\n".join([f"{tool.name}: {tool.description}" for tool in tools])
tool_names = ", ".join([tool.name for tool in tools])
format_instructions = format_instructions.format(tool_names=tool_names)
template = "\n\n".join([prefix, tool_strings, format_instructions, suffix])
messages = [
SystemMessagePromptTemplate.from_template(template),
HumanMessagePromptTemplate.from_template("{input}\n\n{agent_scratchpad}"),
]
if input_variables is None:
input_variables = ["input", "intermediate_steps"]
return AgentScratchPadChatPromptTemplate(
input_variables=input_variables, messages=messages
)
class ChatOutputParser(AgentOutputParser):
def parse(self, text: str) -> Union[AgentAction, AgentFinish]:
if "Final Answer:" in text:
return AgentFinish(
# Return values is generally always a dictionary with a single `output` key
# It is not recommended to try anything else at the moment :)
return_values={"output": text.split("Final Answer:")[-1].strip()},
log=text,
)
try:
_, action, _ = text.split("```")
response = json.loads(action.strip())
agent_action = AgentAction(
tool=response["action"], tool_input=response["action_input"], log=text
)
return agent_action
except Exception:
raise ValueError(f"Could not parse LLM output: {text}")

View File

@@ -5,9 +5,10 @@ from typing import Any, List, Optional, Union
import yaml
from langchain.agents.agent import Agent
from langchain.agents.agent import BaseSingleActionAgent
from langchain.agents.agent_types import AgentType
from langchain.agents.chat.base import ChatAgent
from langchain.agents.chat_v2.base import ChatAgentV2
from langchain.agents.conversational.base import ConversationalAgent
from langchain.agents.conversational_chat.base import ConversationalChatAgent
from langchain.agents.mrkl.base import ZeroShotAgent
@@ -25,6 +26,7 @@ AGENT_TO_CLASS = {
AgentType.CONVERSATIONAL_REACT_DESCRIPTION: ConversationalAgent,
AgentType.CHAT_ZERO_SHOT_REACT_DESCRIPTION: ChatAgent,
AgentType.CHAT_CONVERSATIONAL_REACT_DESCRIPTION: ConversationalChatAgent,
AgentType.CHAT_ZERO_SHOT_REACT_DESCRIPTION_V2: ChatAgentV2,
}
URL_BASE = "https://raw.githubusercontent.com/hwchase17/langchain-hub/master/agents/"
@@ -32,7 +34,7 @@ URL_BASE = "https://raw.githubusercontent.com/hwchase17/langchain-hub/master/age
def _load_agent_from_tools(
config: dict, llm: BaseLLM, tools: List[Tool], **kwargs: Any
) -> Agent:
) -> BaseSingleActionAgent:
config_type = config.pop("_type")
if config_type not in AGENT_TO_CLASS:
raise ValueError(f"Loading {config_type} agent not supported")
@@ -49,7 +51,7 @@ def load_agent_from_config(
llm: Optional[BaseLLM] = None,
tools: Optional[List[Tool]] = None,
**kwargs: Any,
) -> Agent:
) -> BaseSingleActionAgent:
"""Load agent from Config Dict."""
if "_type" not in config:
raise ValueError("Must specify an agent Type in config")
@@ -82,7 +84,7 @@ def load_agent_from_config(
return agent_cls(**combined_config) # type: ignore
def load_agent(path: Union[str, Path], **kwargs: Any) -> Agent:
def load_agent(path: Union[str, Path], **kwargs: Any) -> BaseSingleActionAgent:
"""Unified method for loading a agent from LangChainHub or local fs."""
if hub_result := try_load_from_hub(
path, _load_agent_from_file, "agents", {"json", "yaml"}
@@ -92,7 +94,9 @@ def load_agent(path: Union[str, Path], **kwargs: Any) -> Agent:
return _load_agent_from_file(path, **kwargs)
def _load_agent_from_file(file: Union[str, Path], **kwargs: Any) -> Agent:
def _load_agent_from_file(
file: Union[str, Path], **kwargs: Any
) -> BaseSingleActionAgent:
"""Load agent from file."""
# Convert file to Path object.
if isinstance(file, str):

View File

@@ -0,0 +1,28 @@
from typing import Any, Dict, List, Tuple
from langchain.prompts.chat import ChatPromptTemplate
from langchain.schema import AgentAction
class AgentScratchPadChatPromptTemplate(ChatPromptTemplate):
def _construct_agent_scratchpad(
self, intermediate_steps: List[Tuple[AgentAction, str]]
) -> str:
if len(intermediate_steps) == 0:
return ""
thoughts = ""
for action, observation in intermediate_steps:
thoughts += action.log
thoughts += f"\nObservation: {observation}\nThought: "
return (
f"This was your previous work "
f"(but I haven't seen any of it! I only see what "
f"you return as final answer):\n{thoughts}"
)
def _merge_partial_and_user_variables(self, **kwargs: Any) -> Dict[str, Any]:
intermediate_steps = kwargs.pop("intermediate_steps")
kwargs["agent_scratchpad"] = self._construct_agent_scratchpad(
intermediate_steps
)
return kwargs

View File

@@ -1,6 +1,7 @@
"""Beta Feature: base interface for cache."""
import json
from abc import ABC, abstractmethod
from typing import Any, Dict, List, Optional, Tuple
from typing import Any, Callable, Dict, List, Optional, Tuple
from sqlalchemy import Column, Integer, String, create_engine, select
from sqlalchemy.engine.base import Engine
@@ -137,3 +138,125 @@ class RedisCache(BaseCache):
"""Update cache based on prompt and llm_string."""
for i, generation in enumerate(return_val):
self.redis.set(self._key(prompt, llm_string, i), generation.text)
class GPTCache(BaseCache):
"""Cache that uses GPTCache as a backend."""
def __init__(self, init_func: Callable[[Any], None]):
"""Initialize by passing in the `init` GPTCache func
Args:
init_func (Callable[[Any], None]): init `GPTCache` function
Example:
.. code-block:: python
import gptcache
from gptcache.processor.pre import get_prompt
from gptcache.manager.factory import get_data_manager
# Avoid multiple caches using the same file,
# causing different llm model caches to affect each other
i = 0
file_prefix = "data_map"
def init_gptcache_map(cache_obj: gptcache.Cache):
global i
cache_path = f'{file_prefix}_{i}.txt'
cache_obj.init(
pre_embedding_func=get_prompt,
data_manager=get_data_manager(data_path=cache_path),
)
i += 1
langchain.llm_cache = GPTCache(init_gptcache_map)
"""
try:
import gptcache # noqa: F401
except ImportError:
raise ValueError(
"Could not import gptcache python package. "
"Please install it with `pip install gptcache`."
)
self.init_gptcache_func: Callable[[Any], None] = init_func
self.gptcache_dict: Dict[str, Any] = {}
@staticmethod
def _update_cache_callback_none(*_: Any, **__: Any) -> None:
"""When updating cached data, do nothing.
Because currently only cached queries are processed."""
return None
@staticmethod
def _llm_handle_none(*_: Any, **__: Any) -> None:
"""Do nothing on a cache miss"""
return None
@staticmethod
def _cache_data_converter(data: str) -> RETURN_VAL_TYPE:
"""Convert the `data` in the cache to the `RETURN_VAL_TYPE` data format."""
return [Generation(**generation_dict) for generation_dict in json.loads(data)]
def _get_gptcache(self, llm_string: str) -> Any:
"""Get a cache object.
When the corresponding llm model cache does not exist, it will be created."""
from gptcache import Cache
_gptcache = self.gptcache_dict.get(llm_string, None)
if _gptcache is None:
_gptcache = Cache()
self.init_gptcache_func(_gptcache)
self.gptcache_dict[llm_string] = _gptcache
return _gptcache
def lookup(self, prompt: str, llm_string: str) -> Optional[RETURN_VAL_TYPE]:
"""Look up the cache data.
First, retrieve the corresponding cache object using the `llm_string` parameter,
and then retrieve the data from the cache based on the `prompt`.
"""
from gptcache.adapter.adapter import adapt
_gptcache = self.gptcache_dict.get(llm_string)
if _gptcache is None:
return None
res = adapt(
GPTCache._llm_handle_none,
GPTCache._cache_data_converter,
GPTCache._update_cache_callback_none,
cache_obj=_gptcache,
prompt=prompt,
)
return res
@staticmethod
def _update_cache_callback(
llm_data: RETURN_VAL_TYPE, update_cache_func: Callable[[Any], None]
) -> None:
"""Save the `llm_data` to cache storage"""
handled_data = json.dumps([generation.dict() for generation in llm_data])
update_cache_func(handled_data)
def update(self, prompt: str, llm_string: str, return_val: RETURN_VAL_TYPE) -> None:
"""Update cache.
First, retrieve the corresponding cache object using the `llm_string` parameter,
and then store the `prompt` and `return_val` in the cache object.
"""
from gptcache.adapter.adapter import adapt
_gptcache = self._get_gptcache(llm_string)
def llm_handle(*_: Any, **__: Any) -> RETURN_VAL_TYPE:
return return_val
return adapt(
llm_handle,
GPTCache._cache_data_converter,
GPTCache._update_cache_callback,
cache_obj=_gptcache,
cache_skip=True,
prompt=prompt,
)
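
A hedged sketch of switching the global LLM cache to the new backend; the exact-match `init` mirrors the docstring example above:

```python
import langchain
from gptcache import Cache
from gptcache.processor.pre import get_prompt
from langchain.cache import GPTCache
from langchain.llms import OpenAI

def init_gptcache(cache_obj: Cache) -> None:
    cache_obj.init(pre_embedding_func=get_prompt)

langchain.llm_cache = GPTCache(init_gptcache)
llm = OpenAI(temperature=0)
llm("Tell me a joke")  # first call hits the API
llm("Tell me a joke")  # repeat call is served from GPTCache
```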

View File

@@ -11,6 +11,7 @@ from langchain.callbacks.base import (
CallbackManager,
)
from langchain.callbacks.clearml_callback import ClearMLCallbackHandler
from langchain.callbacks.comet_ml_callback import CometCallbackHandler
from langchain.callbacks.openai_info import OpenAICallbackHandler
from langchain.callbacks.shared import SharedCallbackManager
from langchain.callbacks.stdout import StdOutCallbackHandler
@@ -78,6 +79,7 @@ __all__ = [
"AimCallbackHandler",
"WandbCallbackHandler",
"ClearMLCallbackHandler",
"CometCallbackHandler",
"AsyncIteratorCallbackHandler",
"get_openai_callback",
"set_tracing_callback_manager",

View File

@@ -0,0 +1,625 @@
import tempfile
from copy import deepcopy
from pathlib import Path
from typing import Any, Callable, Dict, List, Optional, Sequence, Union
import langchain
from langchain.callbacks.base import BaseCallbackHandler
from langchain.callbacks.utils import (
BaseMetadataCallbackHandler,
flatten_dict,
import_pandas,
import_spacy,
import_textstat,
)
from langchain.schema import AgentAction, AgentFinish, Generation, LLMResult
LANGCHAIN_MODEL_NAME = "langchain-model"
def import_comet_ml() -> Any:
try:
import comet_ml # noqa: F401
except ImportError:
raise ImportError(
"To use the comet_ml callback manager you need to have the "
"`comet_ml` python package installed. Please install it with"
" `pip install comet_ml`"
)
return comet_ml
def _get_experiment(
workspace: Optional[str] = None, project_name: Optional[str] = None
) -> Any:
comet_ml = import_comet_ml()
experiment = comet_ml.Experiment( # type: ignore
workspace=workspace,
project_name=project_name,
)
return experiment
def _fetch_text_complexity_metrics(text: str) -> dict:
textstat = import_textstat()
text_complexity_metrics = {
"flesch_reading_ease": textstat.flesch_reading_ease(text),
"flesch_kincaid_grade": textstat.flesch_kincaid_grade(text),
"smog_index": textstat.smog_index(text),
"coleman_liau_index": textstat.coleman_liau_index(text),
"automated_readability_index": textstat.automated_readability_index(text),
"dale_chall_readability_score": textstat.dale_chall_readability_score(text),
"difficult_words": textstat.difficult_words(text),
"linsear_write_formula": textstat.linsear_write_formula(text),
"gunning_fog": textstat.gunning_fog(text),
"text_standard": textstat.text_standard(text),
"fernandez_huerta": textstat.fernandez_huerta(text),
"szigriszt_pazos": textstat.szigriszt_pazos(text),
"gutierrez_polini": textstat.gutierrez_polini(text),
"crawford": textstat.crawford(text),
"gulpease_index": textstat.gulpease_index(text),
"osman": textstat.osman(text),
}
return text_complexity_metrics
def _summarize_metrics_for_generated_outputs(metrics: Sequence) -> dict:
pd = import_pandas()
metrics_df = pd.DataFrame(metrics)
metrics_summary = metrics_df.describe()
return metrics_summary.to_dict()
class CometCallbackHandler(BaseMetadataCallbackHandler, BaseCallbackHandler):
"""Callback Handler that logs to Comet.
Parameters:
task_type (str): The type of comet_ml task, such as "inference",
"testing" or "qc"
workspace (str): The comet_ml workspace to log to
project_name (str): The comet_ml project name
tags (list): Tags to add to the task
name (str): Name of the comet_ml experiment
visualizations (list): The spaCy visualization styles to log
complexity_metrics (bool): Whether to log complexity metrics
custom_metrics (Callable): Function for computing custom metrics per generation
stream_logs (bool): Whether to stream callback actions to Comet
This handler utilizes the associated callback method, formats
the input of each callback function with metadata regarding the state of the LLM run,
adds the response to the list of records for both the {method}_records and
action records, and then logs the response to Comet.
"""
def __init__(
self,
task_type: Optional[str] = "inference",
workspace: Optional[str] = None,
project_name: Optional[str] = "comet-langchain-demo",
tags: Optional[Sequence] = None,
name: Optional[str] = None,
visualizations: Optional[List[str]] = None,
complexity_metrics: bool = False,
custom_metrics: Optional[Callable] = None,
stream_logs: bool = True,
) -> None:
"""Initialize callback handler."""
comet_ml = import_comet_ml()
super().__init__()
self.task_type = task_type
self.workspace = workspace
self.project_name = project_name
self.tags = tags
self.visualizations = visualizations
self.complexity_metrics = complexity_metrics
self.custom_metrics = custom_metrics
self.stream_logs = stream_logs
self.temp_dir = tempfile.TemporaryDirectory()
self.experiment = _get_experiment(workspace, project_name)
self.experiment.log_other("Created from", "langchain")
if tags:
self.experiment.add_tags(tags)
self.name = name
if self.name:
self.experiment.set_name(self.name)
warning = (
"The comet_ml callback is currently in beta and is subject to change "
"based on updates to `langchain`. Please report any issues to "
"https://github.com/comet-ml/issue_tracking/issues with the tag "
"`langchain`."
)
comet_ml.LOGGER.warning(warning)
self.callback_columns: list = []
self.action_records: list = []
self.complexity_metrics = complexity_metrics
if self.visualizations:
spacy = import_spacy()
self.nlp = spacy.load("en_core_web_sm")
else:
self.nlp = None
def _init_resp(self) -> Dict:
return {k: None for k in self.callback_columns}
def on_llm_start(
self, serialized: Dict[str, Any], prompts: List[str], **kwargs: Any
) -> None:
"""Run when LLM starts."""
self.step += 1
self.llm_starts += 1
self.starts += 1
metadata = self._init_resp()
metadata.update({"action": "on_llm_start"})
metadata.update(flatten_dict(serialized))
metadata.update(self.get_custom_callback_meta())
for prompt in prompts:
prompt_resp = deepcopy(metadata)
prompt_resp["prompts"] = prompt
self.on_llm_start_records.append(prompt_resp)
self.action_records.append(prompt_resp)
if self.stream_logs:
self._log_stream(prompt, metadata, self.step)
def on_llm_new_token(self, token: str, **kwargs: Any) -> None:
"""Run when LLM generates a new token."""
self.step += 1
self.llm_streams += 1
resp = self._init_resp()
resp.update({"action": "on_llm_new_token", "token": token})
resp.update(self.get_custom_callback_meta())
self.action_records.append(resp)
def on_llm_end(self, response: LLMResult, **kwargs: Any) -> None:
"""Run when LLM ends running."""
self.step += 1
self.llm_ends += 1
self.ends += 1
metadata = self._init_resp()
metadata.update({"action": "on_llm_end"})
metadata.update(flatten_dict(response.llm_output or {}))
metadata.update(self.get_custom_callback_meta())
output_complexity_metrics = []
output_custom_metrics = []
for prompt_idx, generations in enumerate(response.generations):
for gen_idx, generation in enumerate(generations):
text = generation.text
generation_resp = deepcopy(metadata)
generation_resp.update(flatten_dict(generation.dict()))
complexity_metrics = self._get_complexity_metrics(text)
if complexity_metrics:
output_complexity_metrics.append(complexity_metrics)
generation_resp.update(complexity_metrics)
custom_metrics = self._get_custom_metrics(
generation, prompt_idx, gen_idx
)
if custom_metrics:
output_custom_metrics.append(custom_metrics)
generation_resp.update(custom_metrics)
if self.stream_logs:
self._log_stream(text, metadata, self.step)
self.action_records.append(generation_resp)
self.on_llm_end_records.append(generation_resp)
self._log_text_metrics(output_complexity_metrics, step=self.step)
self._log_text_metrics(output_custom_metrics, step=self.step)
def on_llm_error(
self, error: Union[Exception, KeyboardInterrupt], **kwargs: Any
) -> None:
"""Run when LLM errors."""
self.step += 1
self.errors += 1
def on_chain_start(
self, serialized: Dict[str, Any], inputs: Dict[str, Any], **kwargs: Any
) -> None:
"""Run when chain starts running."""
self.step += 1
self.chain_starts += 1
self.starts += 1
resp = self._init_resp()
resp.update({"action": "on_chain_start"})
resp.update(flatten_dict(serialized))
resp.update(self.get_custom_callback_meta())
comet_ml = import_comet_ml()
for chain_input_key, chain_input_val in inputs.items():
if isinstance(chain_input_val, str):
input_resp = deepcopy(resp)
if self.stream_logs:
self._log_stream(chain_input_val, resp, self.step)
input_resp.update({chain_input_key: chain_input_val})
self.action_records.append(input_resp)
else:
comet_ml.LOGGER.warning(
f"Unexpected data format provided! "
f"Input Value for {chain_input_key} will not be logged"
)
def on_chain_end(self, outputs: Dict[str, Any], **kwargs: Any) -> None:
"""Run when chain ends running."""
self.step += 1
self.chain_ends += 1
self.ends += 1
resp = self._init_resp()
resp.update({"action": "on_chain_end"})
resp.update(self.get_custom_callback_meta())
comet_ml = import_comet_ml()
for chain_output_key, chain_output_val in outputs.items():
if isinstance(chain_output_val, str):
output_resp = deepcopy(resp)
if self.stream_logs:
self._log_stream(chain_output_val, resp, self.step)
output_resp.update({chain_output_key: chain_output_val})
self.action_records.append(output_resp)
else:
comet_ml.LOGGER.warning(
f"Unexpected data format provided! "
f"Output Value for {chain_output_key} will not be logged"
)
def on_chain_error(
self, error: Union[Exception, KeyboardInterrupt], **kwargs: Any
) -> None:
"""Run when chain errors."""
self.step += 1
self.errors += 1
def on_tool_start(
self, serialized: Dict[str, Any], input_str: str, **kwargs: Any
) -> None:
"""Run when tool starts running."""
self.step += 1
self.tool_starts += 1
self.starts += 1
resp = self._init_resp()
resp.update({"action": "on_tool_start"})
resp.update(flatten_dict(serialized))
resp.update(self.get_custom_callback_meta())
if self.stream_logs:
self._log_stream(input_str, resp, self.step)
resp.update({"input_str": input_str})
self.action_records.append(resp)
def on_tool_end(self, output: str, **kwargs: Any) -> None:
"""Run when tool ends running."""
self.step += 1
self.tool_ends += 1
self.ends += 1
resp = self._init_resp()
resp.update({"action": "on_tool_end"})
resp.update(self.get_custom_callback_meta())
if self.stream_logs:
self._log_stream(output, resp, self.step)
resp.update({"output": output})
self.action_records.append(resp)
def on_tool_error(
self, error: Union[Exception, KeyboardInterrupt], **kwargs: Any
) -> None:
"""Run when tool errors."""
self.step += 1
self.errors += 1
def on_text(self, text: str, **kwargs: Any) -> None:
"""
Run when agent is ending.
"""
self.step += 1
self.text_ctr += 1
resp = self._init_resp()
resp.update({"action": "on_text"})
resp.update(self.get_custom_callback_meta())
if self.stream_logs:
self._log_stream(text, resp, self.step)
resp.update({"text": text})
self.action_records.append(resp)
def on_agent_finish(self, finish: AgentFinish, **kwargs: Any) -> None:
"""Run when agent ends running."""
self.step += 1
self.agent_ends += 1
self.ends += 1
resp = self._init_resp()
output = finish.return_values["output"]
log = finish.log
resp.update({"action": "on_agent_finish", "log": log})
resp.update(self.get_custom_callback_meta())
if self.stream_logs:
self._log_stream(output, resp, self.step)
resp.update({"output": output})
self.action_records.append(resp)
def on_agent_action(self, action: AgentAction, **kwargs: Any) -> Any:
"""Run on agent action."""
self.step += 1
self.tool_starts += 1
self.starts += 1
tool = action.tool
tool_input = action.tool_input
log = action.log
resp = self._init_resp()
resp.update({"action": "on_agent_action", "log": log, "tool": tool})
resp.update(self.get_custom_callback_meta())
if self.stream_logs:
self._log_stream(tool_input, resp, self.step)
resp.update({"tool_input": tool_input})
self.action_records.append(resp)
def _get_complexity_metrics(self, text: str) -> dict:
"""Compute text complexity metrics using textstat.
Parameters:
text (str): The text to analyze.
Returns:
(dict): A dictionary containing the complexity metrics.
"""
resp = {}
if self.complexity_metrics:
text_complexity_metrics = _fetch_text_complexity_metrics(text)
resp.update(text_complexity_metrics)
return resp
def _get_custom_metrics(
self, generation: Generation, prompt_idx: int, gen_idx: int
) -> dict:
"""Compute Custom Metrics for an LLM Generated Output
Args:
generation (LLMResult): Output generation from an LLM
prompt_idx (int): List index of the input prompt
gen_idx (int): List index of the generated output
Returns:
dict: A dictionary containing the custom metrics.
"""
resp = {}
if self.custom_metrics:
custom_metrics = self.custom_metrics(generation, prompt_idx, gen_idx)
resp.update(custom_metrics)
return resp
def flush_tracker(
self,
langchain_asset: Any = None,
task_type: Optional[str] = "inference",
workspace: Optional[str] = None,
project_name: Optional[str] = "comet-langchain-demo",
tags: Optional[Sequence] = None,
name: Optional[str] = None,
visualizations: Optional[List[str]] = None,
complexity_metrics: bool = False,
custom_metrics: Optional[Callable] = None,
finish: bool = False,
reset: bool = False,
) -> None:
"""Flush the tracker and setup the session.
Everything after this will be a new table.
Args:
name: Name of the preformed session so far so it is identifyable
langchain_asset: The langchain asset to save.
finish: Whether to finish the run.
Returns:
None
"""
self._log_session(langchain_asset)
if langchain_asset:
self._log_model(langchain_asset)
if finish:
self.experiment.end()
if reset:
self._reset(
task_type,
workspace,
project_name,
tags,
name,
visualizations,
complexity_metrics,
custom_metrics,
)
def _log_stream(self, prompt: str, metadata: dict, step: int) -> None:
self.experiment.log_text(prompt, metadata=metadata, step=step)
def _log_model(self, langchain_asset: Any) -> None:
comet_ml = import_comet_ml()
model_parameters = self._get_llm_parameters(langchain_asset)
self.experiment.log_parameters(model_parameters, prefix="model")
langchain_asset_path = Path(self.temp_dir.name, "model.json")
model_name = self.name if self.name else LANGCHAIN_MODEL_NAME
try:
if hasattr(langchain_asset, "save"):
langchain_asset.save(langchain_asset_path)
self.experiment.log_model(model_name, str(langchain_asset_path))
except (ValueError, AttributeError, NotImplementedError) as e:
if hasattr(langchain_asset, "save_agent"):
langchain_asset.save_agent(langchain_asset_path)
self.experiment.log_model(model_name, str(langchain_asset_path))
else:
comet_ml.LOGGER.warning(
f"{e}"
" Could not save Langchain Asset "
f"for {langchain_asset.__class__.__name__}"
)
def _log_session(self, langchain_asset: Optional[Any] = None) -> None:
llm_session_df = self._create_session_analysis_dataframe(langchain_asset)
# Log the cleaned dataframe as a table
self.experiment.log_table("langchain-llm-session.csv", llm_session_df)
metadata = {"langchain_version": str(langchain.__version__)}
# Log the langchain low-level records as a JSON file directly
self.experiment.log_asset_data(
self.action_records, "langchain-action_records.json", metadata=metadata
)
self._log_visualizations(llm_session_df)
def _log_text_metrics(self, metrics: Sequence[dict], step: int) -> None:
if not metrics:
return
metrics_summary = _summarize_metrics_for_generated_outputs(metrics)
for key, value in metrics_summary.items():
self.experiment.log_metrics(value, prefix=key, step=step)
def _log_visualizations(self, session_df: Any) -> None:
if not (self.visualizations and self.nlp):
return
spacy = import_spacy()
comet_ml = import_comet_ml()
prompts = session_df["prompts"].tolist()
outputs = session_df["text"].tolist()
for idx, (prompt, output) in enumerate(zip(prompts, outputs)):
doc = self.nlp(output)
sentence_spans = list(doc.sents)
for visualization in self.visualizations:
try:
html = spacy.displacy.render(
sentence_spans,
style=visualization,
options={"compact": True},
jupyter=False,
page=True,
)
self.experiment.log_asset_data(
html,
name=f"langchain-viz-{visualization}-{idx}.html",
metadata={"prompt": prompt},
step=idx,
)
except Exception as e:
comet_ml.LOGGER.warning(e)
return
def _reset(
self,
task_type: Optional[str] = None,
workspace: Optional[str] = None,
project_name: Optional[str] = None,
tags: Optional[Sequence] = None,
name: Optional[str] = None,
visualizations: Optional[List[str]] = None,
complexity_metrics: bool = False,
custom_metrics: Optional[Callable] = None,
) -> None:
_task_type = task_type if task_type else self.task_type
_workspace = workspace if workspace else self.workspace
_project_name = project_name if project_name else self.project_name
_tags = tags if tags else self.tags
_name = name if name else self.name
_visualizations = visualizations if visualizations else self.visualizations
_complexity_metrics = (
complexity_metrics if complexity_metrics else self.complexity_metrics
)
_custom_metrics = custom_metrics if custom_metrics else self.custom_metrics
self.__init__( # type: ignore
task_type=_task_type,
workspace=_workspace,
project_name=_project_name,
tags=_tags,
name=_name,
visualizations=_visualizations,
complexity_metrics=_complexity_metrics,
custom_metrics=_custom_metrics,
)
self.reset_callback_meta()
self.temp_dir = tempfile.TemporaryDirectory()
def _create_session_analysis_dataframe(self, langchain_asset: Any = None) -> Any:
pd = import_pandas()
llm_parameters = self._get_llm_parameters(langchain_asset)
num_generations_per_prompt = llm_parameters.get("n", 1)
llm_start_records_df = pd.DataFrame(self.on_llm_start_records)
# Repeat each input row based on the number of outputs generated per prompt
llm_start_records_df = llm_start_records_df.loc[
llm_start_records_df.index.repeat(num_generations_per_prompt)
].reset_index(drop=True)
llm_end_records_df = pd.DataFrame(self.on_llm_end_records)
llm_session_df = pd.merge(
llm_start_records_df,
llm_end_records_df,
left_index=True,
right_index=True,
suffixes=["_llm_start", "_llm_end"],
)
return llm_session_df
def _get_llm_parameters(self, langchain_asset: Any = None) -> dict:
if not langchain_asset:
return {}
try:
if hasattr(langchain_asset, "agent"):
llm_parameters = langchain_asset.agent.llm_chain.llm.dict()
elif hasattr(langchain_asset, "llm_chain"):
llm_parameters = langchain_asset.llm_chain.llm.dict()
elif hasattr(langchain_asset, "llm"):
llm_parameters = langchain_asset.llm.dict()
else:
llm_parameters = langchain_asset.dict()
except Exception:
return {}
return llm_parameters
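
A hedged sketch of wiring the new handler into an LLM; `COMET_API_KEY` is assumed to be configured in the environment:

```python
from langchain.callbacks import CometCallbackHandler, StdOutCallbackHandler
from langchain.callbacks.base import CallbackManager
from langchain.llms import OpenAI

comet = CometCallbackHandler(
    project_name="comet-langchain-demo", complexity_metrics=True
)
manager = CallbackManager([StdOutCallbackHandler(), comet])
llm = OpenAI(temperature=0.9, callback_manager=manager, verbose=True)
llm("Tell me a joke")
comet.flush_tracker(llm, finish=True)  # log the session and end the experiment
```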

View File

@@ -53,6 +53,15 @@ class OpenAICallbackHandler(BaseCallbackHandler):
successful_requests: int = 0
total_cost: float = 0.0
def __repr__(self) -> str:
return (
f"Tokens Used: {self.total_tokens}\n"
f"\tPrompt Tokens: {self.prompt_tokens}\n"
f"\tCompletion Tokens: {self.completion_tokens}\n"
f"Successful Requests: {self.successful_requests}\n"
f"Total Cost (USD): ${self.total_cost}"
)
@property
def always_verbose(self) -> bool:
"""Whether to call verbose callbacks even if verbose is False."""

View File

@@ -47,14 +47,19 @@ class AsyncIteratorCallbackHandler(AsyncCallbackHandler):
while not self.queue.empty() or not self.done.is_set():
# Wait for the next token in the queue,
# but stop waiting if the done event is set
done, _ = await asyncio.wait(
done, other = await asyncio.wait(
[
# NOTE: If you add other tasks here, update the code below,
# which assumes each set has exactly one task each
asyncio.ensure_future(self.queue.get()),
asyncio.ensure_future(self.done.wait()),
],
return_when=asyncio.FIRST_COMPLETED,
)
# Cancel the other task
other.pop().cancel()
# Extract the value of the first completed task
token_or_done = cast(Union[str, Literal[True]], done.pop().result())

View File

@@ -81,6 +81,26 @@ class APIChain(Chain):
)
return {self.output_key: answer}
async def _acall(self, inputs: Dict[str, str]) -> Dict[str, str]:
question = inputs[self.question_key]
api_url = await self.api_request_chain.apredict(
question=question, api_docs=self.api_docs
)
self.callback_manager.on_text(
api_url, color="green", end="\n", verbose=self.verbose
)
api_response = await self.requests_wrapper.aget(api_url)
self.callback_manager.on_text(
api_response, color="yellow", end="\n", verbose=self.verbose
)
answer = await self.api_answer_chain.apredict(
question=question,
api_docs=self.api_docs,
api_url=api_url,
api_response=api_response,
)
return {self.output_key: answer}
@classmethod
def from_llm_and_api_docs(
cls,
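
A hedged sketch of the new async path; `OPEN_METEO_DOCS` ships with langchain and the question is an arbitrary assumption:

```python
import asyncio
from langchain.chains import APIChain
from langchain.chains.api import open_meteo_docs
from langchain.llms import OpenAI

chain = APIChain.from_llm_and_api_docs(
    OpenAI(temperature=0), open_meteo_docs.OPEN_METEO_DOCS, verbose=True
)
print(asyncio.run(chain.arun("What is the weather like in Munich right now?")))
```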

View File

@@ -28,7 +28,7 @@ class OpenAPIEndpointChain(Chain, BaseModel):
"""Chain interacts with an OpenAPI endpoint using natural language."""
api_request_chain: LLMChain
api_response_chain: LLMChain
api_response_chain: Optional[LLMChain]
api_operation: APIOperation
requests: Requests = Field(exclude=True, default_factory=Requests)
param_mapping: _ParamMapping = Field(alias="param_mapping")
@@ -144,15 +144,18 @@ class OpenAPIEndpointChain(Chain, BaseModel):
self.callback_manager.on_text(
response_text, color="blue", end="\n", verbose=self.verbose
)
_answer = self.api_response_chain.predict_and_parse(
response=response_text,
instructions=instructions,
)
answer = cast(str, _answer)
self.callback_manager.on_text(
answer, color="yellow", end="\n", verbose=self.verbose
)
return self._get_output(answer, intermediate_steps)
if self.api_response_chain is not None:
_answer = self.api_response_chain.predict_and_parse(
response=response_text,
instructions=instructions,
)
answer = cast(str, _answer)
self.callback_manager.on_text(
answer, color="yellow", end="\n", verbose=self.verbose
)
return self._get_output(answer, intermediate_steps)
else:
return self._get_output(response_text, intermediate_steps)
@classmethod
def from_url_and_method(
@@ -184,6 +187,7 @@ class OpenAPIEndpointChain(Chain, BaseModel):
requests: Optional[Requests] = None,
verbose: bool = False,
return_intermediate_steps: bool = False,
raw_response: bool = False,
**kwargs: Any
# TODO: Handle async
) -> "OpenAPIEndpointChain":
@@ -196,7 +200,10 @@ class OpenAPIEndpointChain(Chain, BaseModel):
requests_chain = APIRequesterChain.from_llm_and_typescript(
llm, typescript_definition=operation.to_typescript(), verbose=verbose
)
response_chain = APIResponderChain.from_llm(llm, verbose=verbose)
if raw_response:
response_chain = None
else:
response_chain = APIResponderChain.from_llm(llm, verbose=verbose)
_requests = requests or Requests()
return cls(
api_request_chain=requests_chain,
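
A hedged sketch of the new `raw_response` flag; the spec URL and endpoint are placeholder assumptions:

```python
from langchain.chains import OpenAPIEndpointChain
from langchain.llms import OpenAI
from langchain.tools.openapi.utils.api_models import APIOperation
from langchain.tools.openapi.utils.openapi_utils import OpenAPISpec

spec = OpenAPISpec.from_url("https://example.com/openapi.json")  # placeholder
operation = APIOperation.from_openapi_spec(spec, "/products", "get")
chain = OpenAPIEndpointChain.from_api_operation(
    operation,
    OpenAI(temperature=0),
    raw_response=True,  # skip the responder chain and return the raw text
)
```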

View File

@@ -14,7 +14,7 @@ from langchain.docstore.document import Document
class CombineDocsProtocol(Protocol):
"""Interface for the combine_docs method."""
def __call__(self, docs: List[Document], **kwargs: Any) -> Tuple[str, dict]:
def __call__(self, docs: List[Document], **kwargs: Any) -> str:
"""Interface for the combine_docs method."""
@@ -48,7 +48,7 @@ def _collapse_docs(
combine_document_func: CombineDocsProtocol,
**kwargs: Any,
) -> Document:
result, _ = combine_document_func(docs, **kwargs)
result = combine_document_func(docs, **kwargs)
combined_metadata = {k: str(v) for k, v in docs[0].metadata.items()}
for doc in docs[1:]:
for k, v in doc.metadata.items():
@@ -171,15 +171,17 @@ class MapReduceDocumentsChain(BaseCombineDocumentsChain):
]
length_func = self.combine_document_chain.prompt_length
num_tokens = length_func(result_docs, **kwargs)
def _collapse_docs_func(docs: List[Document], **kwargs: Any) -> str:
return self._collapse_chain.run(input_documents=docs, **kwargs)
while num_tokens is not None and num_tokens > token_max:
new_result_doc_list = _split_list_of_docs(
result_docs, length_func, token_max, **kwargs
)
result_docs = []
for docs in new_result_doc_list:
new_doc = _collapse_docs(
docs, self._collapse_chain.combine_docs, **kwargs
)
new_doc = _collapse_docs(docs, _collapse_docs_func, **kwargs)
result_docs.append(new_doc)
num_tokens = self.combine_document_chain.prompt_length(
result_docs, **kwargs
@@ -189,7 +191,7 @@ class MapReduceDocumentsChain(BaseCombineDocumentsChain):
extra_return_dict = {"intermediate_steps": _results}
else:
extra_return_dict = {}
output, _ = self.combine_document_chain.combine_docs(result_docs, **kwargs)
output = self.combine_document_chain.run(input_documents=result_docs, **kwargs)
return output, extra_return_dict
@property

View File

@@ -80,7 +80,7 @@ class BaseConversationalRetrievalChain(Chain):
new_inputs = inputs.copy()
new_inputs["question"] = new_question
new_inputs["chat_history"] = chat_history_str
answer, _ = self.combine_docs_chain.combine_docs(docs, **new_inputs)
answer = self.combine_docs_chain.run(input_documents=docs, **new_inputs)
if self.return_source_documents:
return {self.output_key: answer, "source_documents": docs}
else:
@@ -104,7 +104,7 @@ class BaseConversationalRetrievalChain(Chain):
new_inputs = inputs.copy()
new_inputs["question"] = new_question
new_inputs["chat_history"] = chat_history_str
answer, _ = await self.combine_docs_chain.acombine_docs(docs, **new_inputs)
answer = await self.combine_docs_chain.arun(input_documents=docs, **new_inputs)
if self.return_source_documents:
return {self.output_key: answer, "source_documents": docs}
else:

View File

@@ -7,8 +7,8 @@ from langchain.chains.base import Chain
from langchain.chains.llm import LLMChain
from langchain.chains.llm_math.prompt import PROMPT
from langchain.prompts.base import BasePromptTemplate
from langchain.python import PythonREPL
from langchain.schema import BaseLanguageModel
from langchain.utilities import PythonREPL
class LLMMathChain(Chain):
@@ -68,6 +68,35 @@ class LLMMathChain(Chain):
raise ValueError(f"unknown format from LLM: {t}")
return {self.output_key: answer}
async def _aprocess_llm_result(self, t: str) -> Dict[str, str]:
python_executor = PythonREPL()
if self.callback_manager.is_async:
await self.callback_manager.on_text(t, color="green", verbose=self.verbose)
else:
self.callback_manager.on_text(t, color="green", verbose=self.verbose)
t = t.strip()
if t.startswith("```python"):
code = t[9:-4]
output = python_executor.run(code)
if self.callback_manager.is_async:
await self.callback_manager.on_text("\nAnswer: ", verbose=self.verbose)
await self.callback_manager.on_text(
output, color="yellow", verbose=self.verbose
)
else:
await self.callback_manager.on_text("\nAnswer: ", verbose=self.verbose)
await self.callback_manager.on_text(
output, color="yellow", verbose=self.verbose
)
answer = "Answer: " + output
elif t.startswith("Answer:"):
answer = t
elif "Answer:" in t:
answer = "Answer: " + t.split("Answer:")[-1]
else:
raise ValueError(f"unknown format from LLM: {t}")
return {self.output_key: answer}
def _call(self, inputs: Dict[str, str]) -> Dict[str, str]:
llm_executor = LLMChain(
prompt=self.prompt, llm=self.llm, callback_manager=self.callback_manager
@@ -80,11 +109,16 @@ class LLMMathChain(Chain):
llm_executor = LLMChain(
prompt=self.prompt, llm=self.llm, callback_manager=self.callback_manager
)
self.callback_manager.on_text(inputs[self.input_key], verbose=self.verbose)
if self.callback_manager.is_async:
await self.callback_manager.on_text(
inputs[self.input_key], verbose=self.verbose
)
else:
self.callback_manager.on_text(inputs[self.input_key], verbose=self.verbose)
t = await llm_executor.apredict(
question=inputs[self.input_key], stop=["```output"]
)
return self._process_llm_result(t)
return await self._aprocess_llm_result(t)
@property
def _chain_type(self) -> str:
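
With _aprocess_llm_result in place, the math chain can be driven end to end without blocking. A hedged usage sketch; the OpenAI wrapper and question are illustrative:

import asyncio
from langchain.chains import LLMMathChain
from langchain.llms import OpenAI

async def main() -> None:
    chain = LLMMathChain(llm=OpenAI(temperature=0))
    # arun() now routes through _acall and _aprocess_llm_result instead of
    # falling back to the synchronous _process_llm_result.
    print(await chain.arun("What is 13 raised to the 0.3432 power?"))

asyncio.run(main())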

View File

@@ -57,7 +57,7 @@ class LLMRequestsChain(Chain):
except ImportError:
raise ValueError(
"Could not import bs4 python package. "
"Please it install it with `pip install bs4`."
"Please install it with `pip install bs4`."
)
return values

View File

@@ -70,5 +70,5 @@ class MapReduceChain(Chain):
# Split the larger text into smaller chunks.
texts = self.text_splitter.split_text(inputs[self.input_key])
docs = [Document(page_content=text) for text in texts]
outputs, _ = self.combine_documents_chain.combine_docs(docs)
outputs = self.combine_documents_chain.run(input_documents=docs)
return {self.output_key: outputs}

View File

@@ -55,7 +55,7 @@ class OpenAIModerationChain(Chain):
except ImportError:
raise ValueError(
"Could not import openai python package. "
"Please it install it with `pip install openai`."
"Please install it with `pip install openai`."
)
return values

View File

@@ -53,7 +53,7 @@ class Crawler:
except ImportError:
raise ValueError(
"Could not import playwright python package. "
"Please it install it with `pip install playwright`."
"Please install it with `pip install playwright`."
)
self.browser: Browser = (
sync_playwright().start().chromium.launch(headless=False)

View File

@@ -13,8 +13,8 @@ from langchain.chains.llm import LLMChain
from langchain.chains.pal.colored_object_prompt import COLORED_OBJECT_PROMPT
from langchain.chains.pal.math_prompt import MATH_PROMPT
from langchain.prompts.base import BasePromptTemplate
from langchain.python import PythonREPL
from langchain.schema import BaseLanguageModel
from langchain.utilities import PythonREPL
class PALChain(Chain):

View File

@@ -116,7 +116,7 @@ class BaseQAWithSourcesChain(Chain, ABC):
def _call(self, inputs: Dict[str, Any]) -> Dict[str, Any]:
docs = self._get_docs(inputs)
answer, _ = self.combine_documents_chain.combine_docs(docs, **inputs)
answer = self.combine_documents_chain.run(input_documents=docs, **inputs)
if re.search(r"SOURCES:\s", answer):
answer, sources = re.split(r"SOURCES:\s", answer)
else:
@@ -135,7 +135,7 @@ class BaseQAWithSourcesChain(Chain, ABC):
async def _acall(self, inputs: Dict[str, Any]) -> Dict[str, Any]:
docs = await self._aget_docs(inputs)
answer, _ = await self.combine_documents_chain.acombine_docs(docs, **inputs)
answer = await self.combine_documents_chain.arun(input_documents=docs, **inputs)
if re.search(r"SOURCES:\s", answer):
answer, sources = re.split(r"SOURCES:\s", answer)
else:

View File

@@ -196,7 +196,7 @@ def load_qa_chain(
Args:
llm: Language Model to use in the chain.
chain_type: Type of document combining chain to use. Should be one of "stuff",
"map_reduce", and "refine".
"map_reduce", "map_rerank", and "refine".
verbose: Whether chains should be run in verbose mode or not. Note that this
applies to all chains that make up the final chain.
callback_manager: Callback manager to use for the chain.
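
The corrected docstring matches the chain types load_qa_chain actually accepts; for example, the newly documented "map_rerank" variant scores an answer per document and returns the top-ranked one (the LLM below is illustrative):

from langchain.chains.question_answering import load_qa_chain
from langchain.llms import OpenAI

chain = load_qa_chain(OpenAI(temperature=0), chain_type="map_rerank")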

View File

@@ -107,7 +107,9 @@ class BaseRetrievalQA(Chain):
question = inputs[self.input_key]
docs = self._get_docs(question)
answer, _ = self.combine_documents_chain.combine_docs(docs, question=question)
answer = self.combine_documents_chain.run(
input_documents=docs, question=question
)
if self.return_source_documents:
return {self.output_key: answer, "source_documents": docs}
@@ -133,8 +135,8 @@ class BaseRetrievalQA(Chain):
question = inputs[self.input_key]
docs = await self._aget_docs(question)
answer, _ = await self.combine_documents_chain.acombine_docs(
docs, question=question
answer = await self.combine_documents_chain.arun(
input_documents=docs, question=question
)
if self.return_source_documents:

View File

@@ -1,13 +1,13 @@
"""Chain for interacting with SQL Database."""
from __future__ import annotations
from typing import Any, Dict, List
from typing import Any, Dict, List, Optional
from pydantic import Extra, Field
from langchain.chains.base import Chain
from langchain.chains.llm import LLMChain
from langchain.chains.sql_database.prompt import DECIDER_PROMPT, PROMPT
from langchain.chains.sql_database.prompt import DECIDER_PROMPT, PROMPT, SQL_PROMPTS
from langchain.prompts.base import BasePromptTemplate
from langchain.schema import BaseLanguageModel
from langchain.sql_database import SQLDatabase
@@ -28,7 +28,7 @@ class SQLDatabaseChain(Chain):
"""LLM wrapper to use."""
database: SQLDatabase = Field(exclude=True)
"""SQL Database to connect to."""
prompt: BasePromptTemplate = PROMPT
prompt: Optional[BasePromptTemplate] = None
"""Prompt to use to translate natural language to SQL."""
top_k: int = 5
"""Number of results to return from the query"""
@@ -65,8 +65,9 @@ class SQLDatabaseChain(Chain):
return [self.output_key, "intermediate_steps"]
def _call(self, inputs: Dict[str, Any]) -> Dict[str, Any]:
llm_chain = LLMChain(llm=self.llm, prompt=self.prompt)
input_text = f"{inputs[self.input_key]} \nSQLQuery:"
prompt = self.prompt or SQL_PROMPTS.get(self.database.dialect, PROMPT)
llm_chain = LLMChain(llm=self.llm, prompt=prompt)
input_text = f"{inputs[self.input_key]}\nSQLQuery:"
self.callback_manager.on_text(input_text, verbose=self.verbose)
# If not present, then defaults to None which is all tables.
table_names_to_use = inputs.get("table_names_to_use")
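
Since prompt is now Optional and defaults to None, a chain built without one picks the dialect-specific template at call time. A sketch under the assumption of a local SQLite file:

from langchain.chains import SQLDatabaseChain
from langchain.llms import OpenAI
from langchain.sql_database import SQLDatabase

db = SQLDatabase.from_uri("sqlite:///example.db")  # illustrative URI
# No prompt passed, so _call resolves SQL_PROMPTS[db.dialect] (SQLITE_PROMPT here).
chain = SQLDatabaseChain(llm=OpenAI(temperature=0), database=db)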

View File

@@ -2,6 +2,7 @@
from langchain.output_parsers.list import CommaSeparatedListOutputParser
from langchain.prompts.prompt import PromptTemplate
_DEFAULT_TEMPLATE = """Given an input question, first create a syntactically correct {dialect} query to run, then look at the results of the query and return the answer. Unless the user specifies in his question a specific number of examples he wishes to obtain, always limit your query to at most {top_k} results. You can order the results by a relevant column to return the most interesting examples in the database.
Never query for all the columns from a specific table, only ask for the few relevant columns given the question.
@@ -38,3 +39,149 @@ DECIDER_PROMPT = PromptTemplate(
template=_DECIDER_TEMPLATE,
output_parser=CommaSeparatedListOutputParser(),
)
_mssql_prompt = """You are an MS SQL expert. Given an input question, first create a syntactically correct MS SQL query to run, then look at the results of the query and return the answer to the input question.
Unless the user specifies in the question a specific number of examples to obtain, query for at most {top_k} results using the TOP clause as per MS SQL. You can order the results to return the most informative data in the database.
Never query for all columns from a table. You must query only the columns that are needed to answer the question. Wrap each column name in square brackets ([]) to denote them as delimited identifiers.
Pay attention to use only the column names you can see in the tables below. Be careful to not query for columns that do not exist. Also, pay attention to which column is in which table.
Use the following format:
Question: "Question here"
SQLQuery: "SQL Query to run"
SQLResult: "Result of the SQLQuery"
Answer: "Final answer here"
Only use the following tables:
{table_info}
Question: {input}"""
MSSQL_PROMPT = PromptTemplate(
input_variables=["input", "table_info", "top_k"], template=_mssql_prompt
)
_mysql_prompt = """You are a MySQL expert. Given an input question, first create a syntactically correct MySQL query to run, then look at the results of the query and return the answer to the input question.
Unless the user specifies in the question a specific number of examples to obtain, query for at most {top_k} results using the LIMIT clause as per MySQL. You can order the results to return the most informative data in the database.
Never query for all columns from a table. You must query only the columns that are needed to answer the question. Wrap each column name in backticks (`) to denote them as delimited identifiers.
Pay attention to use only the column names you can see in the tables below. Be careful to not query for columns that do not exist. Also, pay attention to which column is in which table.
Use the following format:
Question: "Question here"
SQLQuery: "SQL Query to run"
SQLResult: "Result of the SQLQuery"
Answer: "Final answer here"
Only use the following tables:
{table_info}
Question: {input}"""
MYSQL_PROMPT = PromptTemplate(
input_variables=["input", "table_info", "top_k"],
template=_mysql_prompt,
)
_mariadb_prompt = """You are a MariaDB expert. Given an input question, first create a syntactically correct MariaDB query to run, then look at the results of the query and return the answer to the input question.
Unless the user specifies in the question a specific number of examples to obtain, query for at most {top_k} results using the LIMIT clause as per MariaDB. You can order the results to return the most informative data in the database.
Never query for all columns from a table. You must query only the columns that are needed to answer the question. Wrap each column name in backticks (`) to denote them as delimited identifiers.
Pay attention to use only the column names you can see in the tables below. Be careful to not query for columns that do not exist. Also, pay attention to which column is in which table.
Use the following format:
Question: "Question here"
SQLQuery: "SQL Query to run"
SQLResult: "Result of the SQLQuery"
Answer: "Final answer here"
Only use the following tables:
{table_info}
Question: {input}"""
MARIADB_PROMPT = PromptTemplate(
input_variables=["input", "table_info", "top_k"],
template=_mariadb_prompt,
)
_oracle_prompt = """You are an Oracle SQL expert. Given an input question, first create a syntactically correct Oracle SQL query to run, then look at the results of the query and return the answer to the input question.
Unless the user specifies in the question a specific number of examples to obtain, query for at most {top_k} results using the FETCH FIRST n ROWS ONLY clause as per Oracle SQL. You can order the results to return the most informative data in the database.
Never query for all columns from a table. You must query only the columns that are needed to answer the question. Wrap each column name in double quotes (") to denote them as delimited identifiers.
Pay attention to use only the column names you can see in the tables below. Be careful to not query for columns that do not exist. Also, pay attention to which column is in which table.
Use the following format:
Question: "Question here"
SQLQuery: "SQL Query to run"
SQLResult: "Result of the SQLQuery"
Answer: "Final answer here"
Only use the following tables:
{table_info}
Question: {input}"""
ORACLE_PROMPT = PromptTemplate(
input_variables=["input", "table_info", "top_k"],
template=_oracle_prompt,
)
_postgres_prompt = """You are a PostgreSQL expert. Given an input question, first create a syntactically correct PostgreSQL query to run, then look at the results of the query and return the answer to the input question.
Unless the user specifies in the question a specific number of examples to obtain, query for at most {top_k} results using the LIMIT clause as per PostgreSQL. You can order the results to return the most informative data in the database.
Never query for all columns from a table. You must query only the columns that are needed to answer the question. Wrap each column name in double quotes (") to denote them as delimited identifiers.
Pay attention to use only the column names you can see in the tables below. Be careful to not query for columns that do not exist. Also, pay attention to which column is in which table.
Use the following format:
Question: "Question here"
SQLQuery: "SQL Query to run"
SQLResult: "Result of the SQLQuery"
Answer: "Final answer here"
Only use the following tables:
{table_info}
Question: {input}"""
POSTGRES_PROMPT = PromptTemplate(
input_variables=["input", "table_info", "top_k"], template=_postgres_prompt
)
_sqlite_prompt = """You are a SQLite expert. Given an input question, first create a syntactically correct SQLite query to run, then look at the results of the query and return the answer to the input question.
Unless the user specifies in the question a specific number of examples to obtain, query for at most {top_k} results using the LIMIT clause as per SQLite. You can order the results to return the most informative data in the database.
Never query for all columns from a table. You must query only the columns that are needed to answer the question. Wrap each column name in double quotes (") to denote them as delimited identifiers.
Pay attention to use only the column names you can see in the tables below. Be careful to not query for columns that do not exist. Also, pay attention to which column is in which table.
Use the following format:
Question: "Question here"
SQLQuery: "SQL Query to run"
SQLResult: "Result of the SQLQuery"
Answer: "Final answer here"
Only use the following tables:
{table_info}
Question: {input}"""
SQLITE_PROMPT = PromptTemplate(
input_variables=["input", "table_info", "top_k"],
template=_sqlite_prompt,
)
SQL_PROMPTS = {
"mssql": MSSQL_PROMPT,
"mysql": MYSQL_PROMPT,
"mariadb": MARIADB_PROMPT,
"oracle": ORACLE_PROMPT,
"postgresql": POSTGRES_PROMPT,
"sqlite": SQLITE_PROMPT,
}
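
The fallback in _call is then a plain dictionary lookup keyed on the SQLAlchemy dialect name, with the generic PROMPT covering anything not listed above:

from langchain.chains.sql_database.prompt import PROMPT, SQL_PROMPTS

prompt = SQL_PROMPTS.get("postgresql", PROMPT)  # POSTGRES_PROMPT
prompt = SQL_PROMPTS.get("snowflake", PROMPT)   # not mapped, falls back to PROMPT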

View File

@@ -1,5 +1,6 @@
from langchain.chat_models.anthropic import ChatAnthropic
from langchain.chat_models.azure_openai import AzureChatOpenAI
from langchain.chat_models.openai import ChatOpenAI
from langchain.chat_models.promptlayer_openai import PromptLayerChatOpenAI
__all__ = ["ChatOpenAI", "AzureChatOpenAI", "PromptLayerChatOpenAI"]
__all__ = ["ChatOpenAI", "AzureChatOpenAI", "PromptLayerChatOpenAI", "ChatAnthropic"]

View File

@@ -0,0 +1,139 @@
from typing import Any, Dict, List, Optional
from pydantic import Extra
from langchain.chat_models.base import BaseChatModel
from langchain.llms.anthropic import _AnthropicCommon
from langchain.schema import (
AIMessage,
BaseMessage,
ChatGeneration,
ChatMessage,
ChatResult,
HumanMessage,
SystemMessage,
)
class ChatAnthropic(BaseChatModel, _AnthropicCommon):
r"""Wrapper around Anthropic's large language model.
To use, you should have the ``anthropic`` python package installed, and the
environment variable ``ANTHROPIC_API_KEY`` set with your API key, or pass
it as a named parameter to the constructor.
Example:
.. code-block:: python
import anthropic
from langchain.chat_models import ChatAnthropic
model = ChatAnthropic(model="<model_name>", anthropic_api_key="my-api-key")
"""
class Config:
"""Configuration for this pydantic object."""
extra = Extra.forbid
@property
def _llm_type(self) -> str:
"""Return type of chat model."""
return "anthropic-chat"
def _convert_one_message_to_text(self, message: BaseMessage) -> str:
if isinstance(message, ChatMessage):
message_text = f"\n\n{message.role.capitalize()}: {message.content}"
elif isinstance(message, HumanMessage):
message_text = f"{self.HUMAN_PROMPT} {message.content}"
elif isinstance(message, AIMessage):
message_text = f"{self.AI_PROMPT} {message.content}"
elif isinstance(message, SystemMessage):
message_text = f"{self.HUMAN_PROMPT} <admin>{message.content}</admin>"
else:
raise ValueError(f"Got unknown type {message}")
return message_text
def _convert_messages_to_text(self, messages: List[BaseMessage]) -> str:
"""Format a list of strings into a single string with necessary newlines.
Args:
messages (List[BaseMessage]): List of BaseMessage to combine.
Returns:
str: Combined string with necessary newlines.
"""
return "".join(
self._convert_one_message_to_text(message) for message in messages
)
def _convert_messages_to_prompt(self, messages: List[BaseMessage]) -> str:
"""Format a list of messages into a full prompt for the Anthropic model
Args:
messages (List[BaseMessage]): List of BaseMessage to combine.
Returns:
str: Combined string with necessary HUMAN_PROMPT and AI_PROMPT tags.
"""
if not self.AI_PROMPT:
raise NameError("Please ensure the anthropic package is loaded")
if not isinstance(messages[-1], AIMessage):
messages.append(AIMessage(content=""))
text = self._convert_messages_to_text(messages)
return (
text.rstrip()
) # trim off the trailing ' ' that might come from the "Assistant: "
def _generate(
self, messages: List[BaseMessage], stop: Optional[List[str]] = None
) -> ChatResult:
prompt = self._convert_messages_to_prompt(messages)
params: Dict[str, Any] = {"prompt": prompt, **self._default_params}
if stop:
params["stop_sequences"] = stop
if self.streaming:
completion = ""
stream_resp = self.client.completion_stream(**params)
for data in stream_resp:
delta = data["completion"][len(completion) :]
completion = data["completion"]
self.callback_manager.on_llm_new_token(
delta,
verbose=self.verbose,
)
else:
response = self.client.completion(**params)
completion = response["completion"]
message = AIMessage(content=completion)
return ChatResult(generations=[ChatGeneration(message=message)])
async def _agenerate(
self, messages: List[BaseMessage], stop: Optional[List[str]] = None
) -> ChatResult:
prompt = self._convert_messages_to_prompt(messages)
params: Dict[str, Any] = {"prompt": prompt, **self._default_params}
if stop:
params["stop_sequences"] = stop
if self.streaming:
completion = ""
stream_resp = await self.client.acompletion_stream(**params)
async for data in stream_resp:
delta = data["completion"][len(completion) :]
completion = data["completion"]
if self.callback_manager.is_async:
await self.callback_manager.on_llm_new_token(
delta,
verbose=self.verbose,
)
else:
self.callback_manager.on_llm_new_token(
delta,
verbose=self.verbose,
)
else:
response = await self.client.acompletion(**params)
completion = response["completion"]
message = AIMessage(content=completion)
return ChatResult(generations=[ChatGeneration(message=message)])
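
A hedged usage sketch for the new chat model; the API key is a placeholder, and the messages are rendered into HUMAN_PROMPT/AI_PROMPT-tagged text by the conversion helpers above:

from langchain.chat_models import ChatAnthropic
from langchain.schema import HumanMessage, SystemMessage

chat = ChatAnthropic(anthropic_api_key="my-api-key")  # placeholder key
messages = [
    SystemMessage(content="You are a terse assistant."),  # wrapped in <admin> tags
    HumanMessage(content="Say hello in French."),
]
# Calling a chat model returns a single AIMessage built from the completion.
print(chat(messages).content)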

View File

@@ -87,7 +87,7 @@ class AzureChatOpenAI(ChatOpenAI):
except ImportError:
raise ValueError(
"Could not import openai python package. "
"Please it install it with `pip install openai`."
"Please install it with `pip install openai`."
)
try:
values["client"] = openai.ChatCompletion

View File

@@ -32,8 +32,8 @@ logger = logging.getLogger(__file__)
def _create_retry_decorator(llm: ChatOpenAI) -> Callable[[Any], Any]:
import openai
min_seconds = 4
max_seconds = 10
min_seconds = 1
max_seconds = 60
# Wait 2^x * 1 second between each retry starting with
# 1 second, then up to 60 seconds, then 60 seconds afterwards
return retry(
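
For context, a minimal sketch of what the widened retry window amounts to, assuming tenacity (which this decorator is built from); the retried exception type is a placeholder for the real openai error classes:

from tenacity import retry, retry_if_exception_type, stop_after_attempt, wait_exponential

@retry(
    reraise=True,
    stop=stop_after_attempt(6),
    # Exponential backoff: 1s, 2s, 4s, ... capped at 60s per wait.
    wait=wait_exponential(multiplier=1, min=1, max=60),
    retry=retry_if_exception_type(TimeoutError),  # placeholder exception type
)
def call_chat_api() -> str:
    ...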
@@ -167,7 +167,7 @@ class ChatOpenAI(BaseChatModel):
except ImportError:
raise ValueError(
"Could not import openai python package. "
"Please it install it with `pip install openai`."
"Please install it with `pip install openai`."
)
try:
values["client"] = openai.ChatCompletion
@@ -199,8 +199,8 @@ class ChatOpenAI(BaseChatModel):
def _create_retry_decorator(self) -> Callable[[Any], Any]:
import openai
min_seconds = 4
max_seconds = 10
min_seconds = 1
max_seconds = 60
# Wait 2^x * 1 second between each retry starting with
# 1 second, then up to 60 seconds, then 60 seconds afterwards
return retry(
@@ -327,8 +327,8 @@ class ChatOpenAI(BaseChatModel):
def get_num_tokens(self, text: str) -> int:
"""Calculate num tokens with tiktoken package."""
# tiktoken NOT supported for Python 3.8 or below
if sys.version_info[1] <= 8:
# tiktoken NOT supported for Python 3.7 or below
if sys.version_info[1] <= 7:
return super().get_num_tokens(text)
try:
import tiktoken
@@ -336,7 +336,7 @@ class ChatOpenAI(BaseChatModel):
raise ValueError(
"Could not import tiktoken python package. "
"This is needed in order to calculate get_num_tokens. "
"Please it install it with `pip install tiktoken`."
"Please install it with `pip install tiktoken`."
)
# create a GPT-3.5-Turbo encoder instance
enc = tiktoken.encoding_for_model(self.model_name)
@@ -358,7 +358,7 @@ class ChatOpenAI(BaseChatModel):
raise ValueError(
"Could not import tiktoken python package. "
"This is needed in order to calculate get_num_tokens. "
"Please it install it with `pip install tiktoken`."
"Please install it with `pip install tiktoken`."
)
model = self.model_name

View File

@@ -10,6 +10,7 @@ from langchain.document_loaders.azure_blob_storage_file import (
AzureBlobStorageFileLoader,
)
from langchain.document_loaders.bigquery import BigQueryLoader
from langchain.document_loaders.bilibili import BiliBiliLoader
from langchain.document_loaders.blackboard import BlackboardLoader
from langchain.document_loaders.college_confidential import CollegeConfidentialLoader
from langchain.document_loaders.conllu import CoNLLULoader
@@ -26,6 +27,7 @@ from langchain.document_loaders.evernote import EverNoteLoader
from langchain.document_loaders.facebook_chat import FacebookChatLoader
from langchain.document_loaders.gcs_directory import GCSDirectoryLoader
from langchain.document_loaders.gcs_file import GCSFileLoader
from langchain.document_loaders.git import GitLoader
from langchain.document_loaders.gitbook import GitbookLoader
from langchain.document_loaders.googledrive import GoogleDriveLoader
from langchain.document_loaders.gutenberg import GutenbergLoader
@@ -43,6 +45,7 @@ from langchain.document_loaders.obsidian import ObsidianLoader
from langchain.document_loaders.pdf import (
OnlinePDFLoader,
PDFMinerLoader,
PDFMinerPDFasHTMLLoader,
PyMuPDFLoader,
PyPDFLoader,
UnstructuredPDFLoader,
@@ -53,6 +56,7 @@ from langchain.document_loaders.roam import RoamLoader
from langchain.document_loaders.s3_directory import S3DirectoryLoader
from langchain.document_loaders.s3_file import S3FileLoader
from langchain.document_loaders.sitemap import SitemapLoader
from langchain.document_loaders.slack_directory import SlackDirectoryLoader
from langchain.document_loaders.srt import SRTLoader
from langchain.document_loaders.telegram import TelegramChatLoader
from langchain.document_loaders.text import TextLoader
@@ -61,6 +65,7 @@ from langchain.document_loaders.unstructured import (
UnstructuredFileLoader,
)
from langchain.document_loaders.url import UnstructuredURLLoader
from langchain.document_loaders.url_playwright import PlaywrightURLLoader
from langchain.document_loaders.url_selenium import SeleniumURLLoader
from langchain.document_loaders.web_base import WebBaseLoader
from langchain.document_loaders.whatsapp_chat import WhatsAppChatLoader
@@ -79,6 +84,7 @@ __all__ = [
"UnstructuredFileIOLoader",
"UnstructuredURLLoader",
"SeleniumURLLoader",
"PlaywrightURLLoader",
"DirectoryLoader",
"NotionDirectoryLoader",
"NotionDBLoader",
@@ -116,6 +122,7 @@ __all__ = [
"AirbyteJSONLoader",
"OnlinePDFLoader",
"PDFMinerLoader",
"PDFMinerPDFasHTMLLoader",
"PyMuPDFLoader",
"TelegramChatLoader",
"SRTLoader",
@@ -134,4 +141,7 @@ __all__ = [
"SitemapLoader",
"DuckDBLoader",
"BigQueryLoader",
"BiliBiliLoader",
"SlackDirectoryLoader",
"GitLoader",
]
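
The newly exported loaders are importable straight from the package root; a hedged sketch, with illustrative paths and URLs:

from langchain.document_loaders import GitLoader, PlaywrightURLLoader

# Reads files from a local clone; per the Git loader change it now skips
# paths matched by .gitignore.
git_docs = GitLoader(repo_path="./example_repo", branch="main").load()

# Renders JavaScript-heavy pages with Playwright before extracting text.
web_docs = PlaywrightURLLoader(urls=["https://example.com"]).load()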

Some files were not shown because too many files have changed in this diff.