Compare commits

..

189 Commits

Author SHA1 Message Date
Harrison Chase
6e0d3880df bump version to 122 (#1970) 2023-03-24 08:24:44 -07:00
Harrison Chase
6ec5780547 add docs for openai retriever ingest (#1969) 2023-03-24 08:24:33 -07:00
Harrison Chase
47d37db2d2 WIP: Harrison/base retriever (#1765) 2023-03-24 07:46:49 -07:00
Enwei Jiao
4f364db9a9 Add milvus for ecosystem (#1951) 2023-03-23 22:01:28 -07:00
Tim Asp
030ce9f506 fix import error of bs4 (#1952)
Ran into a broken build if bs4 wasn't installed in the project.

Minor tweak to follow the other doc loaders optional package-loading
conventions.

Also updated html docs to include reference to this new html loader.

side note: Should there be 2 different html-to-text document loaders?
This new one only handles local files, while the existing unstructured
html loader handles HTML from local and remote. So it seems like the
improvement was adding the title to the metadata, which is useful but
could also be added to `html.py`
2023-03-23 21:56:13 -07:00
Harrison Chase
8990122d5d retrievers interface (#1948) 2023-03-23 19:00:38 -07:00
Harrison Chase
52d6bf04d0 tracing improvements to docs (#1947) 2023-03-23 19:00:18 -07:00
Harrison Chase
910da8518f hotfix (#1928) 2023-03-23 07:11:15 -07:00
Naoki Ainoya
2f27ef92fe Fix typo in VectorStoreIndexWrapper method (#1922)
Fixed a typo in the argument of the query method within the
VectorStoreIndexWrapper class. Specifically, the argument `retriver` has
been changed to `retriever`. With this correction, the correct argument
name is used, and potential bugs are avoided.
2023-03-23 07:08:04 -07:00
Harrison Chase
75149d6d38 bump version 120 (#1918) 2023-03-22 23:21:56 -07:00
Harrison Chase
fab7994b74 Harrison/retrieval code (#1916) 2023-03-22 23:15:04 -07:00
Harrison Chase
eb80d6e0e4 Harrison/from methods (#1912)
Co-authored-by: shibuiwilliam <shibuiyusuke@gmail.com>
2023-03-22 21:10:09 -07:00
Harrison Chase
b5667bed9e human input default (#1911) 2023-03-22 20:30:45 -07:00
Eric Zhu
b3be83c750 Add human as a tool (#1879)
Human can help AI.  #1871
2023-03-22 20:14:52 -07:00
Harrison Chase
50626a10ee Hx23840 feat/add redisearch vectorstore (#1909)
Co-authored-by: Peter <peter.shi@alephf.com>
Co-authored-by: Peter Shi <42536066+hx23840@users.noreply.github.com>
2023-03-22 19:57:56 -07:00
Harrison Chase
6e1b5b8f7e Harrison/figma doc loader (#1908)
Co-authored-by: Ismail Pelaseyed <homanp@gmail.com>
2023-03-22 19:57:46 -07:00
Harrison Chase
eec9b1b306 Harrison/opensearch vectorstore (#1907)
Co-authored-by: Mehmet Öner Yalçın <oneryalcin@gmail.com>
2023-03-22 19:57:38 -07:00
Xin Qiu
ea142f6a32 feat: add drop index in redis and fix prefix generate logic (#1857)
# Description

Add `drop_index` for redis

RediSearch: [RediSearch quick
start](https://redis.io/docs/stack/search/quick_start/)

# How to use

```
from langchain.vectorstores.redis import Redis

Redis.drop_index(index_name="doc",delete_documents=False)
```
2023-03-22 19:44:42 -07:00
Eli
12f868b292 Propagate "filter" arg in Chroma similarity_search (#1869)
Technically a duplicate fix to #1619 but with unit tests and a small
documentation update
- Propagate `filter` arg in Chroma `similarity_search` to delegated call
to `similarity_search_with_score`
- Add `filter` arg to `similarity_search_by_vector`
- Clarify doc strings on FakeEmbeddings
2023-03-22 19:40:10 -07:00
Memento Mori
31f9ecfc19 Fix tiktoken version (#1882)
Fix https://github.com/hwchase17/langchain/issues/1881
This issue occurs when using `'gpt-3.5-turbo'` with
`VectorDBQAWithSourcesChain`
2023-03-22 19:39:57 -07:00
Eric Zhu
273e9bf296 Simplify AzureChatOpenAI implementation. (#1902)
Change AzureChatOpenAI class implementation as Azure just added support
for chat completion API. See:
https://learn.microsoft.com/en-us/azure/cognitive-services/openai/how-to/chatgpt?pivots=programming-language-chat-completions.
This should make the code much simpler.
2023-03-22 19:36:51 -07:00
Maurício Maia
f155d9d3ec Add metadata filter to PGVector search (#1872)
Add ability to filter pgvector documents by metadata.
2023-03-22 15:21:40 -07:00
Klein Tahiraj
d3d4503ce2 Remove redundant .docx loader (closes #1716) + update how_to_guides.rst (#1891)
In https://github.com/hwchase17/langchain/issues/1716 , it was
identified that there were two .py files performing similar tasks. As a
resolution, one of the files has been removed, as its purpose had
already been fulfilled by the other file. Additionally, the init has
been updated accordingly.

Furthermore, the how_to_guides.rst file has been updated to include
links to documentation that was previously missing. This was deemed
necessary as the existing list on
https://langchain.readthedocs.io/en/latest/modules/document_loaders/how_to_guides.html
was incomplete, causing confusion for users who rely on the full list of
documentation on the left sidebar of the website.
2023-03-22 15:19:42 -07:00
Harrison Chase
1f93c5cf69 extraction docs (#1898) 2023-03-22 15:00:44 -07:00
Sean Zheng
15b5a08f4b Update how_to_guides.rst (#1893)
Adding OpenSearch examples
2023-03-22 14:30:43 -07:00
Kushal Chordiya
ff4a25b841 Fix minor bug in opensearch vector store add_texts function (#1878)
In the langchain.vectorstores.opensearch_vector_search.py, in the
add_texts function, around line 247, we have the following code

```python
embeddings = [
     self.embedding_function.embed_documents(list(text))[0] for text in texts
]
```

the goal of the `list(text)` part I believe is to pass a list to the
embed_documents list instead of a a str. However, `list(text)` is a
subtle bug

`list(text)` would convert the string text into an array, where each
element of the array is a character of the string

<img width="937" alt="Screenshot 2023-03-22 at 1 27 18 PM"
src="https://user-images.githubusercontent.com/88190553/226836470-384665a1-2f13-46bc-acfc-9a37417cd918.png">

The correct way should be to change the code to 

```python
embeddings = [
      self.embedding_function.embed_documents([text])[0] for text in texts
]
```
Which wraps the string inside a list.
2023-03-22 11:27:32 -07:00
Maurício Maia
2212520a6c Add PGVector collection metadata (#1887)
The `CollectionStore` for `PGVector` has a `cmetadata` field but it's
never used. This PR add the ability to save metadata information to the
collection.
2023-03-22 11:27:07 -07:00
Harrison Chase
d08f940336 principles list (#1888) 2023-03-22 10:48:38 -07:00
Harrison Chase
2280a2cb2f bump version to 119 (#1886) 2023-03-22 08:36:09 -07:00
Harrison Chase
ce5d97bcb3 Harrison/guarded output parser (#1804)
Co-authored-by: jerwelborn <jeremy.welborn@gmail.com>
2023-03-21 22:07:23 -07:00
DeadBranch
8fa1764c60 docs: update gpt index references to LlamaIndex (#1856)
The GPT Index project is transitioning to the new project name,
LlamaIndex.

I've updated a few files referencing the old project name and repository
URL to the current ones.

From the [LlamaIndex repo](https://github.com/jerryjliu/llama_index):
> NOTE: We are rebranding GPT Index as LlamaIndex! We will carry out
this transition gradually.
>
> 2/25/2023: By default, our docs/notebooks/instructions now reference
"LlamaIndex" instead of "GPT Index".
>
> 2/19/2023: By default, our docs/notebooks/instructions now use the
llama-index package. However the gpt-index package still exists as a
duplicate!
>
> 2/16/2023: We have a duplicate llama-index pip package. Simply replace
all imports of gpt_index with llama_index if you choose to pip install
llama-index.

I'm not associated with LlamaIndex in any way. I just noticed the
discrepancy when studying the lanchain documentation.
2023-03-21 22:01:05 -07:00
Harrison Chase
f299bd1416 clean up sagemaker nb (#1875) 2023-03-21 22:00:08 -07:00
Philipp Schmid
064be93edf [Embeddings] Add SageMaker Endpoint Embedding class (#1859)
# What does this PR do? 

This PR adds similar to `llms` a SageMaker-powered `embeddings` class.
This is helpful if you want to leverage Hugging Face models on SageMaker
for creating your indexes.

I added a example into the
[docs/modules/indexes/examples/embeddings.ipynb](https://github.com/hwchase17/langchain/compare/master...philschmid:add-sm-embeddings?expand=1#diff-e82629e2894974ec87856aedd769d4bdfe400314b03734f32bee5990bc7e8062)
document. The example currently includes some `_### TEMPORARY: Showing
how to deploy a SageMaker Endpoint from a Hugging Face model ###_ ` code
showing how you can deploy a sentence-transformers to SageMaker and then
run the methods of the embeddings class.

@hwchase17 please let me know if/when i should remove the `_###
TEMPORARY: Showing how to deploy a SageMaker Endpoint from a Hugging
Face model ###_` in the description i linked to a detail blog on how to
deploy a Sentence Transformers so i think we don't need to include those
steps here.

I also reused the `ContentHandlerBase` from
`langchain.llms.sagemaker_endpoint` and changed the output type to `any`
since it is depending on the implementation.
2023-03-21 21:51:48 -07:00
anupam-tiwari
86822d1cc2 Fixes the import typo in the vector db text generator notebook (#1874)
Fixes the import typo in the vector db text generator notebook for the
chroma library

Co-authored-by: Anupam <anupam@10-16-252-145.dynapool.wireless.nyu.edu>
2023-03-21 21:48:26 -07:00
Harrison Chase
a581bce379 remove key (#1863) 2023-03-21 12:43:41 -07:00
Harrison Chase
2ffc643086 add listen api docs (#1855) 2023-03-21 09:29:34 -07:00
Harrison Chase
2136dc94bb bump version to 118 (#1854) 2023-03-21 09:15:52 -07:00
Matt Tucker
a92344f476 Use regex match for bash process error output test assertion. (#1837)
I was getting the same issue reported in #1339 by
[MacYang555](https://github.com/MacYang555) when running the test suite
on my Mac. I implemented the fix they suggested to use a regex match in
the output assertion for the scenario under test.

Resolves #1339
2023-03-21 09:06:52 -07:00
Tomoko Uchida
b706966ebc Add setup instruction in Getting Started for Indexing (#1847)
`VectorstoreIndexCreator` [uses Chroma as the vectorstore by
default](1c22657256/langchain/indexes/vectorstore.py (L49)).
It may be helpful to add a short note for the setup.

You can see how the notebook looks here.

https://github.com/mocobeta/langchain/blob/feat/add-setup-instruction-to-index-getting-started/docs/modules/indexes/getting_started.ipynb
2023-03-21 09:06:35 -07:00
Harrison Chase
1c22657256 Harrison/faiss merge (#1843)
Co-authored-by: Ting Su <ting.su.1995@outlook.com>
2023-03-20 22:54:08 -07:00
Harrison Chase
6f02286805 Harrison/subtitles (#1842)
Co-authored-by: David Ruan <ruanwz@gmail.com>
Co-authored-by: David Ruan <david.ruan@analyticservice.net>
2023-03-20 22:53:52 -07:00
Simon Zhou
3674074eb0 Add Qdrant to ecosystem page (#1830)
Add [Qdrant](https://qdrant.tech/) to [LangChain
ecosystem](https://langchain.readthedocs.io/en/latest/ecosystem.html)
page.
2023-03-20 22:06:40 -07:00
Wenbin Fang
a7e09d46c5 Add podcast api tool to use NLP to search all podcasts or episodes. (#1833)
Use the following code to test:

```python
import os
from langchain.llms import OpenAI
from langchain.chains.api import podcast_docs
from langchain.chains import APIChain

# Get api key here: https://openai.com/pricing
os.environ["OPENAI_API_KEY"] = "sk-xxxxx"

# Get api key here: https://www.listennotes.com/api/pricing/
listen_api_key = 'xxx'

llm = OpenAI(temperature=0)
headers = {"X-ListenAPI-Key": listen_api_key}
chain = APIChain.from_llm_and_api_docs(llm, podcast_docs.PODCAST_DOCS, headers=headers, verbose=True)
chain.run("Search for 'silicon valley bank' podcast episodes, audio length is more than 30 minutes, return only 1 results")
```

Known issues: the api response data might be too big, and we'll get such
error:
`openai.error.InvalidRequestError: This model's maximum context length
is 4097 tokens, however you requested 6733 tokens (6477 in your prompt;
256 for the completion). Please reduce your prompt; or completion
length.`
2023-03-20 22:04:17 -07:00
Matt Tucker
fa2e546b76 Add workaround for debugpy install issue to contrib docs. (#1835)
When following the Quick Start instructions in the contributing docs, I
was getting a "WheelFileValidationError" on installation of debugpy
which was blocking the installation of a number of other deps. Google
turned up this [GitHub
issue](https://github.com/microsoft/debugpy/issues/1246) indicating a
regression in Poetry 1.4.1 and workarounds.

This PR updates the contrib docs noting the issue and the workarounds.
2023-03-20 22:03:19 -07:00
Daniel Dror (Dubovski)
c592b12043 Allow passing in encoding to csv_loader (#1836) 2023-03-20 22:03:00 -07:00
Ikko Eltociear Ashimine
9555bbd5bb Fix typo in sqlite.ipynb (#1828)
overriden -> overridden
2023-03-20 16:47:19 -07:00
Harrison Chase
0ca1641b14 release 0.0.117 (#1819) 2023-03-20 08:04:04 -07:00
Harrison Chase
d5b4393bb2 Harrison/llm math (#1808)
Co-authored-by: Vadym Barda <vadim.barda@gmail.com>
2023-03-20 07:53:26 -07:00
Bryan Helmig
7b6ff7fe00 Follow up to #1803 to remove dynamic docs route. (#1818)
The base docs are going to be more stable and familiar for folks.
Dynamic route is currently in flux.
2023-03-20 07:52:41 -07:00
Harrison Chase
76c7b1f677 Harrison/wandb (#1764)
Co-authored-by: Anish Shah <93145909+ash0ts@users.noreply.github.com>
2023-03-20 07:52:27 -07:00
Paul
5aa8ece211 Corrected small typo in error message. (#1791) 2023-03-20 07:51:35 -07:00
Harrison Chase
f6d24d5740 fix bug with openai token count (#1806) 2023-03-20 07:51:18 -07:00
Harrison Chase
b1c4480d7c fix typing (#1807) 2023-03-20 07:50:49 -07:00
Daniel Chalef
b6ba989f2f Add request timeout to ChatOpenAI (#1798)
Add request_timeout field to ChatOpenAI. Defaults to 60s.

---------

Co-authored-by: Daniel Chalef <daniel.chalef@private.org>
2023-03-19 20:19:42 -07:00
Ankush Gola
04acda55ec Don't use dynamic api endpoint for Zapier NLA (#1803)
From Robert "Right now the dynamic/ route for specifically the above
endpoints is acting on all providers a user has set up, not just the
provider for the supplied API key."
2023-03-19 20:12:33 -07:00
Harrison Chase
8e5c4ac867 bump version to 0.0.116 (#1788) 2023-03-19 11:01:16 -07:00
Aratako
df8702fead Small fix: Remove unused variable summary_message_role (#1789)
After the changes in #1783, `summary_message_role` is no longer used in
`ConversationSummaryBufferMemory`, so this PR removes it.
2023-03-19 11:01:03 -07:00
Harrison Chase
d5d50c39e6 Harrison/azure embeddings (#1787)
Co-authored-by: Hemant <4627288+ghaccount@users.noreply.github.com>
2023-03-19 10:42:33 -07:00
Harrison Chase
1f18698b2a Harrison/token buffer memory (#1786)
Co-authored-by: Aratako <127325395+Aratako@users.noreply.github.com>
2023-03-19 10:42:24 -07:00
Harrison Chase
ef4945af6b Harrison/chat token usage (#1785) 2023-03-19 10:32:31 -07:00
Harrison Chase
7de2ada3ea Harrison/add source column (#1784)
Co-authored-by: Brian Graham <46691715+briangrahamww@users.noreply.github.com>
Co-authored-by: briangrahamww <brian.graham@ww.com>
2023-03-19 10:32:13 -07:00
Bernat Felip i Díaz
262d4cb9a8 Use embedding instead of embedding function in ElasticVectorStore (#1692)
While it might be a bit more restrictive, I find that using the
Embedding interface as an input for the vector store creation is better
than an embedding function because we can use bulk requests and possibly
the retry logic if needed.

I have seen that some vector store implementations use Embedding while
others use embedding function so I don't know what is the criteria to
have one or the other, in my opinion they should all just be Embedding
or have a way more complex embedding function that accepts multiple
texts instead of one by one.

---------

Co-authored-by: Bernat Felip <bernat.felip@rea.ch>
2023-03-19 10:23:38 -07:00
Harrison Chase
951c158106 Harrison/summary message rol (#1783)
Co-authored-by: Aratako <127325395+Aratako@users.noreply.github.com>
2023-03-19 10:09:18 -07:00
Bao Nguyen
85e4dd7fc3 Fix wrong prompt in refine chain (#1770)
I got this during testing 

```
ValueError: Missing some input keys: {'existing_answer'}
```

Upon review, the initial prompt should be `QUESTION_PROMPT_SELECTOR`.

Co-authored-by: Bao Nguyen <bnguyen@roku.com>
2023-03-19 10:03:45 -07:00
Harrison Chase
b1b4a4065a change chat default (#1782)
Resolves https://github.com/hwchase17/langchain/issues/1532, resolves
https://github.com/hwchase17/langchain/issues/1652.
2023-03-19 10:01:59 -07:00
Huang Chongdi
08f23c95d9 add encoding parameter to ObsidianLoader (#1752) 2023-03-19 09:48:31 -07:00
hitoshi44
3cf493b089 Fix Document & Expose StringPromptTemplate as a custom-prompt-template. (#1753)
Regarding [this
issue](https://github.com/hwchase17/langchain/issues/1754), the code in
the document [Creating a custom prompt
template](https://langchain.readthedocs.io/en/latest/modules/prompts/examples/custom_prompt_template.html)
is no longer functional and outdated.

To address this, I have made the following changes:

1. Updated the guide in the document to use `StringPromptTemplate`
instead of `BasePromptTemplate`.
2. Exposed `StringPromptTemplate` in `prompts/__init__.py` for easier
importing.
2023-03-19 09:47:56 -07:00
hitoshi44
e635c86145 Slightly modified the docstring in BasePromptTemplate and StringPromptTemplate. (#1755)
Regarding [this
issue](https://github.com/hwchase17/langchain/issues/1754),
`BasePromptTample` class docstring is a little outdated, thus it
requires new method `format_prompt` for now.

As such, I have made some modifications to the docstring to bring it up
to date.

I tried to adhere to the established document style, and would
appreciate you for taking a look at this PR.
2023-03-19 09:47:37 -07:00
Harrison Chase
779790167e Harrison/add warning to openaichat (#1781) 2023-03-19 09:43:56 -07:00
Nils Durner
3161ced4bc GPT-4 support (#1778) 2023-03-19 09:29:44 -07:00
hung_ng__
3d6fcb85dc Add load json prompt example (#1776)
Hi, I just want to add a PR on the prompt serialization examples of
loading from JSON so that it can contain the same as loading from YAML.
2023-03-19 09:28:56 -07:00
LeoGrin
3701b2901e use namespace argument in Pinecone constructor (#1757)
Fix #1756

Use the `namespace` argument of `Pinecone.from_exisiting_index` to set
the default value of `namespace` for other methods. Leads to more
expected behavior and easier integration in chains.

For the test, I've added a line to delete and rebuild the
`langchain-demo` index at the beginning of the test. I'm not 100% sure
if it's a good idea but it makes the test reproducible.
2023-03-18 19:55:38 -07:00
Ben Gahtan
280cb4160d Update tool.py (#1760)
Fixed typo that said the Wikipedia tool was using Wolfram Alpha (instead
of Wikipedia)
2023-03-18 19:55:26 -07:00
Kevin
80d8db5f60 Add service account support to Google Drive (#1761)
Having service account support in the drive document loader would be
nice.

This is already present in the youtube loader. 

cb646082ba/langchain/document_loaders/youtube.py (L76-L78)
2023-03-18 19:55:17 -07:00
Piyush Jain
1a8790d808 Corrects copyright year (#1762)
Corrected copyright year.
2023-03-18 19:55:05 -07:00
Eric Zhu
34840f3aee AzureChatOpenAI for Azure Open AI's ChatGPT API (#1673)
Add support for Azure OpenAI's ChatGPT API, which uses ChatML markups to
format messages instead of objects.

Related issues: #1591, #1659
2023-03-18 19:54:20 -07:00
Harrison Chase
8685d53adc querying tabular data (#1758) 2023-03-18 11:12:18 -07:00
Harrison Chase
2f6833d433 hotfix (#1742) 2023-03-17 09:05:08 -07:00
Harrison Chase
dd90fd02d5 Harrison/move docs (#1741) 2023-03-17 08:49:10 -07:00
Harrison Chase
07766a69f3 move docs (#1740) 2023-03-17 08:42:28 -07:00
Harrison Chase
aa854988bf bump version to 114 (#1739) 2023-03-17 08:26:06 -07:00
Harrison Chase
96ebe98dc2 Harrison/latex splitter (#1738)
Co-authored-by: Aidan Holland <thehappydinoa@gmail.com>
Co-authored-by: Jan de Boer <44832123+Janldeboer@users.noreply.github.com>
2023-03-17 08:10:27 -07:00
Harrison Chase
45f05fc939 Harrison/blackboard loader (#1737)
Co-authored-by: Aidan Holland <thehappydinoa@gmail.com>
2023-03-17 08:02:44 -07:00
Vincent Liao
cf9c3f54f7 docs: add docs link to agent toolkits (#1735)
New to Langchain, was a bit confused where I should find the toolkits
section when I'm at `agent/key_concepts` docs. I added a short link that
points to the how to section.
2023-03-17 07:59:49 -07:00
Merbin J Anselm
fbc0c85b90 fix: agent json parser fails with text in suffix (#1734)
While testing out `VectorDBQA` as a `Tool` for one of the conversation,
I happened to get a response from LLM (OpenAI) like this

<code>
Could not parse LLM output: Here's a response using the Product Search
tool:

```json
{
    "action": "Product Search",
    "action_input": "pots for plants"
}
```

This will allow you to search for pots for your plants and find a
variety of options that are available for purchase. You can use this
information to choose the pots that best fit your needs and preferences.
</code>

i.e. The response had a text before & *after* the expected JSON, leading
to `JSONDecodeError`. It's fixed now, by removing text after '```' to
remove unwanted text.

The error I encountered in this Jupyter Notebook -
[link](https://github.com/anselm94/chatbot-llm-ecommerce/blob/main/chatcommerce.ipynb)

<details>
    <summary>Error encountered</summary>
    <code>
    

---------------------------------------------------------------------------
JSONDecodeError Traceback (most recent call last)
File
~/Git/chatbot-llm-ecommerce/.venv/lib/python3.11/site-packages/langchain/agents/conversational_chat/base.py:104,
in ConversationalChatAgent._extract_tool_and_input(self, llm_output)
        103 try:
    --> 104     response = self.output_parser.parse(llm_output)
        105     return response["action"], response["action_input"]

File
~/Git/chatbot-llm-ecommerce/.venv/lib/python3.11/site-packages/langchain/agents/conversational_chat/base.py:49,
in AgentOutputParser.parse(self, text)
        48 cleaned_output = cleaned_output.strip()
    ---> 49 response = json.loads(cleaned_output)
50 return {"action": response["action"], "action_input":
response["action_input"]}

File
/opt/homebrew/Cellar/python@3.11/3.11.2_1/Frameworks/Python.framework/Versions/3.11/lib/python3.11/json/__init__.py:346,
in loads(s, cls, object_hook, parse_float, parse_int, parse_constant,
object_pairs_hook, **kw)
        343 if (cls is None and object_hook is None and
        344         parse_int is None and parse_float is None and
345 parse_constant is None and object_pairs_hook is None and not kw):
    --> 346     return _default_decoder.decode(s)
        347 if cls is None:

File
/opt/homebrew/Cellar/python@3.11/3.11.2_1/Frameworks/Python.framework/Versions/3.11/lib/python3.11/json/decoder.py:340,
in JSONDecoder.decode(self, s, _w)
        339 if end != len(s):
    --> 340     raise JSONDecodeError("Extra data", s, end)
        341 return obj

    JSONDecodeError: Extra data: line 5 column 1 (char 74)

    During handling of the above exception, another exception occurred:

ValueError Traceback (most recent call last)
    Cell In[22], line 1
    ----> 1 ask_ai.run("Yes. I need pots for my plants")

File
~/Git/chatbot-llm-ecommerce/.venv/lib/python3.11/site-packages/langchain/chains/base.py:213,
in Chain.run(self, *args, **kwargs)
        211     if len(args) != 1:
212 raise ValueError("`run` supports only one positional argument.")
    --> 213     return self(args[0])[self.output_keys[0]]
        215 if kwargs and not args:
        216     return self(kwargs)[self.output_keys[0]]

File
~/Git/chatbot-llm-ecommerce/.venv/lib/python3.11/site-packages/langchain/chains/base.py:116,
in Chain.__call__(self, inputs, return_only_outputs)
        114 except (KeyboardInterrupt, Exception) as e:
115 self.callback_manager.on_chain_error(e, verbose=self.verbose)
    --> 116     raise e
117 self.callback_manager.on_chain_end(outputs, verbose=self.verbose)
118 return self.prep_outputs(inputs, outputs, return_only_outputs)

File
~/Git/chatbot-llm-ecommerce/.venv/lib/python3.11/site-packages/langchain/chains/base.py:113,
in Chain.__call__(self, inputs, return_only_outputs)
        107 self.callback_manager.on_chain_start(
        108     {"name": self.__class__.__name__},
        109     inputs,
        110     verbose=self.verbose,
        111 )
        112 try:
    --> 113     outputs = self._call(inputs)
        114 except (KeyboardInterrupt, Exception) as e:
115 self.callback_manager.on_chain_error(e, verbose=self.verbose)

File
~/Git/chatbot-llm-ecommerce/.venv/lib/python3.11/site-packages/langchain/agents/agent.py:499,
in AgentExecutor._call(self, inputs)
        497 # We now enter the agent loop (until it returns something).
        498 while self._should_continue(iterations):
    --> 499     next_step_output = self._take_next_step(
500 name_to_tool_map, color_mapping, inputs, intermediate_steps
        501     )
        502     if isinstance(next_step_output, AgentFinish):
503 return self._return(next_step_output, intermediate_steps)

File
~/Git/chatbot-llm-ecommerce/.venv/lib/python3.11/site-packages/langchain/agents/agent.py:409,
in AgentExecutor._take_next_step(self, name_to_tool_map, color_mapping,
inputs, intermediate_steps)
404 """Take a single step in the thought-action-observation loop.
        405
406 Override this to take control of how the agent makes and acts on
choices.
        407 """
        408 # Call the LLM to see what to do.
    --> 409 output = self.agent.plan(intermediate_steps, **inputs)
410 # If the tool chosen is the finishing tool, then we end and return.
        411 if isinstance(output, AgentFinish):

File
~/Git/chatbot-llm-ecommerce/.venv/lib/python3.11/site-packages/langchain/agents/agent.py:105,
in Agent.plan(self, intermediate_steps, **kwargs)
        94 """Given input, decided what to do.
        95
        96 Args:
    (...)
        102     Action specifying what tool to use.
        103 """
104 full_inputs = self.get_full_inputs(intermediate_steps, **kwargs)
    --> 105 action = self._get_next_action(full_inputs)
        106 if action.tool == self.finish_tool_name:
107 return AgentFinish({"output": action.tool_input}, action.log)

File
~/Git/chatbot-llm-ecommerce/.venv/lib/python3.11/site-packages/langchain/agents/agent.py:67,
in Agent._get_next_action(self, full_inputs)
65 def _get_next_action(self, full_inputs: Dict[str, str]) ->
AgentAction:
        66     full_output = self.llm_chain.predict(**full_inputs)
---> 67 parsed_output = self._extract_tool_and_input(full_output)
        68     while parsed_output is None:
        69         full_output = self._fix_text(full_output)

File
~/Git/chatbot-llm-ecommerce/.venv/lib/python3.11/site-packages/langchain/agents/conversational_chat/base.py:107,
in ConversationalChatAgent._extract_tool_and_input(self, llm_output)
        105     return response["action"], response["action_input"]
        106 except Exception:
--> 107 raise ValueError(f"Could not parse LLM output: {llm_output}")

ValueError: Could not parse LLM output: Here's a response using the
Product Search tool:

    ```json
    {
        "action": "Product Search",
        "action_input": "pots for plants"
    }
    ```

This will allow you to search for pots for your plants and find a
variety of options that are available for purchase. You can use this
information to choose the pots that best fit your needs and preferences.

</details>
2023-03-17 07:59:39 -07:00
Harrison Chase
276940fd9b Harrison/official method (#1728)
Co-authored-by: Aratako <127325395+Aratako@users.noreply.github.com>
2023-03-16 23:20:08 -07:00
Piyush Jain
cdff6c8181 Sagemaker Endpoint LLM (#1686)
Updates #965

---------

Co-authored-by: Nimisha Mehta <116048415+nimimeht@users.noreply.github.com>
Co-authored-by: Harrison Chase <harrisonchase@Harrisons-MBP.attlocal.net>
2023-03-16 21:58:06 -07:00
alekhyablue
cd45adbea2 adding new agent types in comments (#1711) 2023-03-16 21:56:08 -07:00
Mario Kostelac
aff44d0a98 (OpenAI) Add model_name to LLMResult.llm_output (#1713)
Given that different models have very different latencies and pricings,
it's benefitial to pass the information about the model that generated
the response. Such information allows implementing custom callback
managers and track usage and price per model.

Addresses https://github.com/hwchase17/langchain/issues/1557.
2023-03-16 21:55:55 -07:00
libra
8a95fdaee1 Fix all the bug in init Tool in docs (#1725)
Fix all the example in the docs when init `Tool`

Test by render with jupyter
2023-03-16 21:55:44 -07:00
Alexandros Mavrogiannis
5d8dc83ede Bump duckdb-engine to 0.7.0 (#1726)
Resolves https://github.com/hwchase17/langchain/issues/1272
Resolves https://github.com/hwchase17/langchain/issues/1578
2023-03-16 21:55:35 -07:00
Daniel Chalef
b157e0c1c3 Add HTML document_loader that includes page title metadata (#1720)
This `BSHTMLLoader` document_loader loads an HTML document, extracts
text and adds the page title to the returned Document's metadata. The
loader uses the already installed bs4 package to extract both text
content and the page title.

Included in this PR is an example HTML file and an integration test that
tests against this file.

---------

Co-authored-by: Daniel Chalef <daniel.chalef@private.org>
2023-03-16 21:47:17 -07:00
Harrison Chase
40e9488055 fix async in agent (#1723) 2023-03-16 21:43:22 -07:00
jerwelborn
55efbb8a7e pydantic/json parsing (#1722)
```
class Joke(BaseModel):
    setup: str = Field(description="question to set up a joke")
    punchline: str = Field(description="answer to resolve the joke")

joke_query = "Tell me a joke."

# Or, an example with compound type fields.
#class FloatArray(BaseModel):
#    values: List[float] = Field(description="list of floats")
#
#float_array_query = "Write out a few terms of fiboacci."

model = OpenAI(model_name='text-davinci-003', temperature=0.0)
parser = PydanticOutputParser(pydantic_object=Joke)
prompt = PromptTemplate(
    template="Answer the user query.\n{format_instructions}\n{query}\n",
    input_variables=["query"],
    partial_variables={"format_instructions": parser.get_format_instructions()}
)

_input = prompt.format_prompt(query=joke_query)
print("Prompt:\n", _input.to_string())
output = model(_input.to_string())
print("Completion:\n", output)
parsed_output = parser.parse(output)
print("Parsed completion:\n", parsed_output)
```

```
Prompt:
 Answer the user query.
The output should be formatted as a JSON instance that conforms to the JSON schema below.  For example, the object {"foo":  ["bar", "baz"]} conforms to the schema {"foo": {"description": "a list of strings field", "type": "string"}}.

Here is the output schema:
---
{"setup": {"description": "question to set up a joke", "type": "string"}, "punchline": {"description": "answer to resolve the joke", "type": "string"}}
---

Tell me a joke.

Completion:
 {"setup": "Why don't scientists trust atoms?", "punchline": "Because they make up everything!"}

Parsed completion:
 setup="Why don't scientists trust atoms?" punchline='Because they make up everything!'
```

Ofc, works only with LMs of sufficient capacity. DaVinci is reliable but
not always.

---------

Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
2023-03-16 21:43:11 -07:00
Alex Strick van Linschoten
d6bbf395af Loosen PyYAML dependency (#1698)
Hitting some dependency issues relating to this strict pinning. Unsure
of the knock-on effects, but wanted to propose this loosening down a
couple of versions.
2023-03-16 17:05:36 -07:00
Jonathan Pedoeem
606605925d Adding ability to return_pl_id to all PromptLayer Models in LangChain (#1699)
PromptLayer now has support for [several different tracking
features.](https://magniv.notion.site/Track-4deee1b1f7a34c1680d085f82567dab9)
In order to use any of these features you need to have a request id
associated with the request.

In this PR we add a boolean argument called `return_pl_id` which will
add `pl_request_id` to the `generation_info` dictionary associated with
a generation.

We also updated the relevant documentation.
2023-03-16 17:05:23 -07:00
Jeff Huber
f93c011456 fallback to {} for None metadata from Chroma (#1714)
The basic vector store example started breaking because `Document`
required `not None` for metadata, but Chroma stores metadata as `None`
if none is provided. This creates a fallback which fixes the basic
tutorial
https://langchain.readthedocs.io/en/latest/modules/indexes/examples/vectorstores.html

Here is the error that was generated

```
Running Chroma using direct local API.
Using DuckDB in-memory for database. Data will be transient.
Traceback (most recent call last):
  File "/Users/jeff/src/temp/langchainchroma/test.py", line 17, in <module>
    docs = docsearch.similarity_search(query)
  File "/Users/jeff/src/langchain/langchain/vectorstores/chroma.py", line 133, in similarity_search
    docs_and_scores = self.similarity_search_with_score(query, k)
  File "/Users/jeff/src/langchain/langchain/vectorstores/chroma.py", line 182, in similarity_search_with_score
    return _results_to_docs_and_scores(results)
  File "/Users/jeff/src/langchain/langchain/vectorstores/chroma.py", line 24, in _results_to_docs_and_scores
    return [
  File "/Users/jeff/src/langchain/langchain/vectorstores/chroma.py", line 27, in <listcomp>
    (Document(page_content=result[0], metadata=result[1]), result[2])
  File "pydantic/main.py", line 331, in pydantic.main.BaseModel.__init__
pydantic.error_wrappers.ValidationError: 1 validation error for Document
metadata
  none is not an allowed value (type=type_error.none.not_allowed)
Exiting: Cleaning up .chroma directory
```
2023-03-16 12:06:47 -07:00
Harrison Chase
3c24684522 harrison/bump-version-00113 (#1701) 2023-03-15 14:49:47 -07:00
Harrison Chase
b84d190fd0 Harrison/gr int (#1700)
Co-authored-by: Shreya Rajpal <ShreyaR@users.noreply.github.com>
2023-03-15 13:22:20 -07:00
Harrison Chase
aad4bff098 Harrison/headers (#1696)
Co-authored-by: Tim Asp <707699+timothyasp@users.noreply.github.com>
2023-03-15 13:13:21 -07:00
Harrison Chase
3ea6d9c4d2 add docs for save/load messages (#1697) 2023-03-15 13:13:08 -07:00
Pandazki
ced412e1c1 fix: correct a small mistake in SimpleChatModel. (#1685) 2023-03-15 08:00:26 -07:00
Piyush Jain
1279c8de39 Fixed typo, clarified language (#1682) 2023-03-15 08:00:11 -07:00
at-b612
c7779c800a Added Mynd URL to gallery (#1684) 2023-03-15 07:59:59 -07:00
Jithin James
6f4f771897 docs: add path to state_of_the_union.txt in indexes/getting_started page (#1691)
add the state_of_the_union.txt file so that its easier to follow through
with the example.

---------

Co-authored-by: Jithin James <jjmachan@pop-os.localdomain>
2023-03-15 07:59:47 -07:00
Kacper Łukawski
4a327dd1d6 Implement basic metadata filtering in Qdrant (#1689)
This PR implements a basic metadata filtering mechanism similar to the
ones in Chroma and Pinecone. It still cannot express complex conditions,
as there are no operators, but some users requested to have that feature
available.
2023-03-15 07:31:39 -07:00
Ankush Gola
d4edd3c312 Zapier Integration (#1654)
* Zapier Wrapper and Tools (implemented by Zapier Team)
* Zapier Toolkit, examples with mrkl agent

---------

Co-authored-by: Mike Knoop <mikeknoop@gmail.com>
Co-authored-by: Robert Lewis <robert.lewis@zapier.com>
2023-03-14 23:06:17 -07:00
Harrison Chase
e72074f78a Harrison/ifixit (#1680)
Co-authored-by: David Rans <david@ifixit.com>
2023-03-14 21:17:50 -07:00
Harrison Chase
0b29e68c17 Harrison/pgvector (#1679)
Co-authored-by: Aman Kumar <krsingh.aman@gmail.com>
2023-03-14 21:13:58 -07:00
Harrison Chase
4d7fdb8957 Harrison/gml save (#1676)
Co-authored-by: Satoru Sakamoto <51464932+satoru814@users.noreply.github.com>
2023-03-14 20:00:22 -07:00
Harrison Chase
656efe6ef3 Harrison/fix nb (#1678) 2023-03-14 19:34:23 -07:00
Harrison Chase
362586fe8b save messages (#1653)
@yakigac this is my alternative to
https://github.com/hwchase17/langchain/pull/1648 - thoughts?
2023-03-14 18:15:55 -07:00
Matt Robinson
63aa28e2a6 feat: allow the unstructured kwargs to be passed in to Unstructured document loaders (#1667)
### Summary

Allows users to pass in `**unstructured_kwargs` to Unstructured document
loaders. Implemented with the `strategy` kwargs in mind, but will pass
in other kwargs like `include_page_breaks` as well. The two currently
supported strategies are `"hi_res"`, which is more accurate but takes
longer, and `"fast"`, which processes faster but with lower accuracy.
The `"hi_res"` strategy is the default. For PDFs, if `detectron2` is not
available and the user selects `"hi_res"`, the loader will fallback to
using the `"fast"` strategy.


### Testing

#### Make sure the `strategy` kwarg works

Run the following in iPython to verify that the `"fast"` strategy is
indeed faster.

```python
from langchain.document_loaders import UnstructuredFileLoader

loader = UnstructuredFileLoader("layout-parser-paper-fast.pdf", strategy="fast", mode="elements")
%timeit loader.load()

loader = UnstructuredFileLoader("layout-parser-paper-fast.pdf", mode="elements")
%timeit loader.load()
```

On my system I get:

```python
In [3]: from langchain.document_loaders import UnstructuredFileLoader

In [4]: loader = UnstructuredFileLoader("layout-parser-paper-fast.pdf", strategy="fast", mode="elements")

In [5]: %timeit loader.load()
247 ms ± 369 µs per loop (mean ± std. dev. of 7 runs, 1 loop each)

In [6]: loader = UnstructuredFileLoader("layout-parser-paper-fast.pdf", mode="elements")

In [7]: %timeit loader.load()
2.45 s ± 31 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
```

#### Make sure older versions of `unstructured` still work

Run `pip install unstructured==0.5.3` and then verify the following runs
without error:

```python
from langchain.document_loaders import UnstructuredFileLoader

loader = UnstructuredFileLoader("layout-parser-paper-fast.pdf",  mode="elements")
loader.load()
```
2023-03-14 18:15:28 -07:00
Matthias Kern
c3dfbdf0da Remove outdated code from Chat VectorDB QA example (#1670) 2023-03-14 18:13:51 -07:00
Bilel MEDIMEGH
a2280f321f Docs: Fix typo in memory/key_concepts.md (#1671)
dialouge -> dialogue
2023-03-14 18:12:01 -07:00
Xin Qiu
4e13cef05a feat: add redisearch vectorstore (#1307)
# Description

Add `RediSearch` vectorstore for LangChain

RediSearch: [RediSearch quick
start](https://redis.io/docs/stack/search/quick_start/)

# How to use

```
from langchain.vectorstores.redisearch import RediSearch

rds = RediSearch.from_documents(docs, embeddings,redisearch_url="redis://localhost:6379")
```
2023-03-14 18:06:03 -07:00
Harrison Chase
e5c1659864 bump ver (#1668) 2023-03-14 13:05:17 -07:00
Harrison Chase
2d098e8869 Harrison/agent eval (#1620)
Co-authored-by: jerwelborn <jeremy.welborn@gmail.com>
2023-03-14 12:37:48 -07:00
Harrison Chase
8965a2f0af bump and hotfix (#1665) 2023-03-14 11:12:53 -07:00
Harrison Chase
e222ea4ee8 update rtd config (#1664) 2023-03-14 10:40:06 -07:00
Harrison Chase
e326939759 bump version 110 (#1662) 2023-03-14 10:21:35 -07:00
Harrison Chase
7cf46b3fee Harrison/convo agent (#1642) 2023-03-14 09:42:24 -07:00
Abhinav Upadhyay
84cd825a0e Add a batch_size param to the add_texts API of pinecone wrapper (#1658)
A safe default value of batch_size is required by the pinecone python
client otherwise if the user of add_texts passes too many documents in a
single call, they would get a 400 error from pinecone.
2023-03-14 09:40:22 -07:00
Jon Luo
0a1b1806e9 sql: do not hard code the LIMIT clause in the table_info section (#1563)
Seeing a lot of issues in Discord in which the LLM is not using the
correct LIMIT clause for different SQL dialects. ie, it's using `LIMIT`
for mssql instead of `TOP`, or instead of `ROWNUM` for Oracle, etc.
I think this could be due to us specifying the LIMIT statement in the
example rows portion of `table_info`. So the LLM is seeing the `LIMIT`
statement used in the prompt.
Since we can't specify each dialect's method here, I think it's fine to
just replace the `SELECT... LIMIT 3;` statement with `3 rows from
table_name table:`, and wrap everything in a block comment directly
following the `CREATE` statement. The Rajkumar et al paper wrapped the
example rows and `SELECT` statement in a block comment as well anyway.
Thoughts @fpingham?
2023-03-13 23:08:27 -07:00
Brian Thorne
9ee2713272 Bugfix - allow custom input variables in chat zero shot agent's prompt (#1624)
I was trying out the `chat-zero-shot-react-description` agent for
[qabot](dbbd31bb27/qabot/agents/data_query_chain.py (L35-L52))
but langchain 0.0.108 doesn't correctly use custom 'input_variables` in
the prompt template.
2023-03-13 23:07:35 -07:00
Tim Asp
b3234bf3b0 cleanup: unify 3 different pdf loaders, rename PagedPDFSplitter (#1615)
`OnlinePDFLoader` and `PagedPDFSplitter` lived separate from the rest of
the pdf loaders.

Because they're all similar, I propose moving all to `pdy.py` and the
same docs/examples page.

Additionally, `PagedPDFSplitter` naming doesn't match the pattern the
rest of the loaders follow, so I renamed to `PyPDFLoader` and had it
inherit from `BasePDFLoader` so it can now load from remote file
sources.
2023-03-13 23:06:50 -07:00
Luis
562d9891ea Add regex dict: (#1616)
This class enables us to send a dictionary containing an output key and
the expected format, which in turn allows us to retrieve the result of
the matching formats and extract specific information from it.

To exclude irrelevant information from our return dictionary, we can
prompt the LLM to use a specific command that notifies us when it
doesn't know the answer. We refer to this variable as the
"no_update_value".

Regarding the updated regular expression pattern
(r"{}:\s?([^.'\n']*).?"), it enables us to retrieve a format as 'Output
Key':'value'.

We have improved the regex by adding an optional space between ':' and
'value' with "s?", and by excluding points and line jumps from the
matches using "[^.'\n']*".
2023-03-13 23:05:39 -07:00
Harrison Chase
56aff797c0 docs req (#1647) 2023-03-13 16:03:32 -07:00
Harrison Chase
d53ff270e0 bump version to 109 (#1646) 2023-03-13 15:52:35 -07:00
Harrison Chase
df6c33d4b3 Harrison/new output parser (#1617) 2023-03-13 15:08:39 -07:00
Dennis Aumiller
039d05c808 Update types in cohere.py (#1635)
Adjust argument type and clarification on parameter limits for
attributes `frequency_penalty` and `presence_penalty`.
2023-03-13 09:08:32 -07:00
Harrison Chase
aed9f9febe Harrison/return intermediate (#1633)
Co-authored-by: Mario Kostelac <mario@intercom.io>
2023-03-13 07:54:29 -07:00
Harrison Chase
72b461e257 improve chat error (#1632) 2023-03-13 07:43:44 -07:00
Peng Qu
cb646082ba remove an extra whitespace (#1625) 2023-03-13 07:27:21 -07:00
Eugene Yurtsev
bd4a2a670b Add copy button to sphinx notebooks (#1622)
This adds a copy button at the top right corner of all notebook cells in
sphinx
notebooks.
2023-03-12 21:15:07 -07:00
Ikko Eltociear Ashimine
6e98ab01e1 Fix typo in vectorstore.ipynb (#1614)
Initalize -> Initialize
2023-03-12 14:12:47 -07:00
Harrison Chase
c0ad5d13b8 bump to version 108 (#1613) 2023-03-12 09:50:45 -07:00
yakigac
acd86d33bc Add read only shared memory (#1491)
Provide shared memory capability for the Agent.
Inspired by #1293 .

## Problem

If both Agent and Tools (i.e., LLMChain) use the same memory, both of
them will save the context. It can be annoying in some cases.


## Solution

Create a memory wrapper that ignores the save and clear, thereby
preventing updates from Agent or Tools.
2023-03-12 09:34:36 -07:00
Abhinav Upadhyay
9707eda83c Fix docstring of FAISS constructor (#1611) 2023-03-12 09:31:40 -07:00
Kayvane Shakerifar
7e550df6d4 feat: add lookup index to csv loader to make retrieving the original … (#1612)
feat: add lookup index to csv loader to make retrieving the original csv
information easier using theDocument properties
2023-03-12 09:29:27 -07:00
Harrison Chase
c9b5a30b37 move output parsing (#1605) 2023-03-11 16:41:03 -08:00
Harrison Chase
cb04ba0136 Add support for intermediate steps to SQLDatabaseSequentialChain (#1583) (#1601)
for https://github.com/hwchase17/langchain/issues/1582

I simply added the `return_intermediate_steps` and changed the
`output_keys` function.

I added 2 simple tests, 1 for SQLDatabaseSequentialChain without the
intermediate steps and 1 with

Co-authored-by: brad-nemetski <115185478+brad-nemetski@users.noreply.github.com>
2023-03-11 15:44:41 -08:00
Harrison Chase
5903a93f3d add convinence method to call chat model as an llm (#1604) 2023-03-11 15:04:57 -08:00
Harrison Chase
15de3e8137 Harrison/docs footer (#1600)
Co-authored-by: Albert Avetisian <albert.avetisian@gmail.com>
2023-03-11 09:18:35 -08:00
Harrison Chase
f95d551f7a Harrison/shallow metadata (#1599)
Co-authored-by: Jesse Zhang <jessetanzhang@gmail.com>
2023-03-11 09:18:25 -08:00
Harrison Chase
c6bfa00178 bump version to 107 (#1590) 2023-03-10 15:39:30 -08:00
Tim Asp
01a57198b8 [bugfix] Fix persisted chromadb vectorstore (#1444)
If a `persist_directory` param was set, chromadb would throw a warning
that ""No embedding_function provided, using default embedding function:
SentenceTransformerEmbeddingFunction". and would error with a `Illegal
instruction: 4` error.

This is on a MBP M1 13.2.1, python 3.9.

I'm not entirely sure why that error happened, but when using
`get_or_create_collection` instead of `list_collection` on our end, the
error and warning goes away and chroma works as expected.

Added bonus this is cleaner and likely more efficient.
`list_collections` builds a new `Collection` instance for each collect,
then `Chroma` would just use the `name` field to tell if the collection
existed.
2023-03-10 15:14:35 -08:00
Harrison Chase
8dba30f31e Harrison/kwargs loaders (#1588)
Co-authored-by: Tim Asp <707699+timothyasp@users.noreply.github.com>
2023-03-10 15:05:06 -08:00
Harrison Chase
9f78717b3c Harrison/callbacks (#1587) 2023-03-10 12:53:09 -08:00
Harrison Chase
90846dcc28 fix chat agent (#1586) 2023-03-10 12:40:37 -08:00
Claus Thomasen
6ed16e13b1 Readded similarity_search_by_vector (#1568)
I am redoing this PR, as I made a mistake by merging the latest changes
into my fork's branch, sorry. This added a bunch of commits to my
previous PR.

This fixes #1451.
2023-03-10 12:40:14 -08:00
Harrison Chase
c1dc784a3d buffer memory old version (#1581)
bring back an older version of memory since people seem to be using it
more widely
2023-03-10 11:27:15 -08:00
fabi.s
5b0e747f9a Fix description of UnstructuredURLLoader & UnstructuredHTMLLoader (#1570) 2023-03-10 07:08:58 -08:00
Zach Schillaci
624c72c266 Add wikipedia tool doc (#1579) 2023-03-10 07:07:27 -08:00
Ryan Dao
a950287206 Strip trailing whitespaces in agent's stop sequences (#1566)
Fixes #1489
2023-03-09 16:36:15 -08:00
Tim Asp
30383abb12 Add CSVLoader document loader (#1573)
Simple CSV document loader which wraps `csv` reader, and preps the file
with a single `Document` per row.

The column header is prepended to each value for context which is useful
for context with embedding and semantic search
2023-03-09 16:35:18 -08:00
Zach Schillaci
cdb97f3dfb Add Wikipedia search utility and tool (#1561)
The Python `wikipedia` package gives easy access for searching and
fetching pages from Wikipedia, see https://pypi.org/project/wikipedia/.
It can serve as an additional search and retrieval tool, like the
existing Google and SerpAPI helpers, for both chains and agents.
2023-03-09 16:34:39 -08:00
Felix Altenberger
b44c8bd969 Add optional base_url arg to GitbookLoader (#1552)
First of all, big kudos on what you guys are doing, langchain is
enabling some really amazing usecases and I'm having lot's of fun
playing around with it. It's really cool how many data sources it
supports out of the box.

However, I noticed some limitations of the current `GitbookLoader` which
this PR adresses:

The main change is that I added an optional `base_url` arg to
`GitbookLoader`. This enables use cases where one wants to crawl docs
from a start page other than the index page, e.g., the following call
would scrape all pages that are reachable via nav bar links from
"https://docs.zenml.io/v/0.35.0":

```python
GitbookLoader(
    web_page="https://docs.zenml.io/v/0.35.0", 
    load_all_paths=True,
    base_url="https://docs.zenml.io",
)
```

Previously, this would fail because relative links would be of the form
`/v/0.35.0/...` and the full link URLs would become
`docs.zenml.io/v/0.35.0/v/0.35.0/...`.

I also fixed another issue of the `GitbookLoader` where the link URLs
were constructed incorrectly as `website//relative_url` if the provided
`web_page` had a trailing slash.
2023-03-09 16:32:40 -08:00
Andriy Mulyar
c9189d354a AtlasDB vector store documentation updates. (#1572)
- Updated errors in the AtlasDB vector store documentation
- Removed extraneous output logs in example notebook.
2023-03-09 16:31:14 -08:00
blob42
622578a022 docs: fix typo in searx tool (#1569)
Co-authored-by: blob42 <spike@w530>
2023-03-09 15:58:33 -08:00
Matt Robinson
7018806a92 feat: document loader for markdown files (#1558)
### Summary

Adds a document loader for handling markdown files. This document loader
requires `unstructured>=0.4.16`.

### Testing

```python
from langchain.document_loaders import UnstructuredMarkdownLoader

loader = UnstructuredMarkdownLoader("README.md")
loader.load()
```
2023-03-09 10:55:07 -08:00
Harrison Chase
bd335ffd64 bump version to 106 (#1562) 2023-03-09 10:20:54 -08:00
Harrison Chase
a094c49153 add chat agent (#1509) 2023-03-09 09:12:08 -08:00
Brenton Wheeler
99fe023496 docs: fix typo in modules/indexes/chain_examples/question_answering (#1551)
docs: fix typo in modules/indexes/chain_examples/question_answering


![image](https://user-images.githubusercontent.com/11394076/224007874-3a52adf6-ff7a-4f22-9dbf-18c83d08167f.png)
2023-03-09 09:11:43 -08:00
Harrison Chase
3ee32a01ea Harrison/prompt layer (#1547)
Co-authored-by: Jonathan Pedoeem <jonathanped@gmail.com>
Co-authored-by: AbuBakar <abubakarsohail123@gmail.com>
2023-03-08 21:24:27 -08:00
Harrison Chase
c844d1fd46 Harrison/chunk size (#1549)
Co-authored-by: Florian Leuerer <31259070+floleuerer@users.noreply.github.com>
2023-03-08 21:24:18 -08:00
Harrison Chase
9405af6919 Harrison/hf inf error (#1543)
Co-authored-by: Konstantin Hebenstreit <57603012+KonstantinHebenstreit@users.noreply.github.com>
2023-03-08 20:53:46 -08:00
Harrison Chase
357d808484 Harrison/remote paths pdf (#1544)
Co-authored-by: Tim Asp <707699+timothyasp@users.noreply.github.com>
2023-03-08 20:53:37 -08:00
Harrison Chase
cc423f40f1 Harrison/youtube loader (#1545)
Co-authored-by: Julian Wustl <57504258+Julianwustl@users.noreply.github.com>
2023-03-08 20:53:27 -08:00
Harrison Chase
b053f831cd Harrison/contributing (#1542)
Co-authored-by: Saurav Maheshkar <sauravvmaheshkar@gmail.com>
2023-03-08 20:53:16 -08:00
Harrison Chase
523ad8d2e2 Harrison/chat history formatter1 (#1538)
Co-authored-by: Youssef A. Abukwaik <yousseb@users.noreply.github.com>
2023-03-08 20:46:37 -08:00
Graham Neubig
31303d0b11 Added other evaluation metrics for data-augmented QA (#1521)
This PR adds additional evaluation metrics for data-augmented QA,
resulting in a report like this at the end of the notebook:

![Screen Shot 2023-03-08 at 8 53 23
AM](https://user-images.githubusercontent.com/398875/223731199-8eb8e77f-5ff3-40a2-a23e-f3bede623344.png)

The score calculation is based on the
[Critique](https://docs.inspiredco.ai/critique/) toolkit, an API-based
toolkit (like OpenAI) that has minimal dependencies, so it should be
easy for people to run if they choose.

The code could further be simplified by actually adding a chain that
calls Critique directly, but that probably should be saved for another
PR if necessary. Any comments or change requests are welcome!
2023-03-08 20:41:03 -08:00
gidler
494c9d341a [DOCS] Assorted wording, punctuation, and consistency revisions (#1443)
Contributing some small fixes I noticed while reading through the
documentation.

Thank you for a creating and maintaining this project!
2023-03-08 20:16:09 -08:00
Harrison Chase
519f0187b6 Harrison/gdrive pdf (#1433)
Co-authored-by: LM <93918064+LuisMalhadas@users.noreply.github.com>
Co-authored-by: Luis Malhadas <luis@sia.so>
2023-03-08 20:15:36 -08:00
Florian Leuerer
64c6435545 Added client_settings support for chromadb vecstore (#1528)
# Problem

The ChromaDB vecstore only supported local connection. There was no way
to use a chromadb server.

# Fix
Added `client_settings` as Chroma attribute. 

# Usage

```
from chromadb.config import Settings
from langchain.vectorstores import Chroma

chroma_settings = Settings(chroma_api_impl="rest",
                            chroma_server_host="localhost",
                            chroma_server_http_port="80")

docsearch = Chroma.from_documents(chunks, embeddings, metadatas=metadatas, client_settings=chroma_settings, collection_name=COLLECTION_NAME)
```
2023-03-08 17:42:09 -08:00
Harrison Chase
7eba828e1b Harrison/update regex (#1534)
Co-authored-by: Luis <57528712+LuisLechugaRuiz@users.noreply.github.com>
2023-03-08 17:41:17 -08:00
Harrison Chase
2a7215bc3b Harrison/prompt issues (#1537) 2023-03-08 16:56:10 -08:00
Alpri Else
784d24a1d5 Support S3 Object keys with / in S3FileLoader (#1517)
Resolves https://github.com/hwchase17/langchain/issues/1510

### Problem
When loading S3 Objects with `/` in the object key (eg.
`folder/some-document.txt`) using `S3FileLoader`, the objects are
downloaded into a temporary directory and saved as a file.

This errors out when the parent directory does not exist within the
temporary directory.

See
https://github.com/hwchase17/langchain/issues/1510#issuecomment-1459583696
on how to reproduce this bug

### What this pr does
Creates parent directories based on object key. 

This also works with deeply nested keys:
`folder/subfolder/some-document.txt`
2023-03-08 16:17:26 -08:00
Harrison Chase
aba58e9e2e Harrison/bumpver104 (#1525) 2023-03-08 09:46:02 -08:00
Harrison Chase
c4a557bdd4 add concept of prompt collection (#1507) 2023-03-08 08:31:29 -08:00
Ivan
97e3666e0d changed requests.run to requests.get (#1485)
This pull request proposes an update to the Lightweight wrapper
library's documentation. The current documentation provides an example
of how to use the library's requests.run method, as follows:
requests.run("https://www.google.com"). However, this example does not
work for the 0.0.102 version of the library.

Testing:

The changes have been tested locally to ensure they are working as
intended.

Thank you for considering this pull request.
2023-03-07 21:10:23 -08:00
Harrison Chase
7ade419a0e allow passing of messages into prompt template (#1505) 2023-03-07 21:10:12 -08:00
Harrison Chase
a4a2d79087 Harrison/rtd loader (#1513)
Co-authored-by: Youssef A. Abukwaik <yousseb@users.noreply.github.com>
2023-03-07 21:09:54 -08:00
Harrison Chase
8f21605d71 add return source docs (#1515) 2023-03-07 21:09:36 -08:00
Harrison Chase
064741db58 Harrison/fix text splitter (#1511)
Co-authored-by: ajaysolanky <ajsolanky@gmail.com>
Co-authored-by: Ajay Solanky <ajaysolanky@saw-l14668307kd.myfiosgateway.com>
2023-03-07 15:42:28 -08:00
Tom Dyson
e3354404ad Fix link to Pinecone notebook (#1492) 2023-03-07 15:24:03 -08:00
Harrison Chase
3610ef2830 add fake embeddings class (#1503) 2023-03-07 15:23:46 -08:00
Ankush Gola
27104d4921 fix ChatOpenAI.agenerate (#1504) 2023-03-07 15:22:05 -08:00
Harrison Chase
4f41e20f09 memory docs (#1501) 2023-03-07 11:02:46 -08:00
284 changed files with 17735 additions and 2488 deletions

View File

@@ -73,6 +73,8 @@ poetry install -E all
This will install all requirements for running the package, examples, linting, formatting, tests, and coverage. Note the `-E all` flag will install all optional dependencies necessary for integration testing.
❗Note: If you're running Poetry 1.4.1 and receive a `WheelFileValidationError` for `debugpy` during installation, you can try either downgrading to Poetry 1.4.0 or disabling "modern installation" (`poetry config installer.modern-installation false`) and re-install requirements. See [this `debugpy` issue](https://github.com/microsoft/debugpy/issues/1246) for more details.
Now, you should be able to run the common tasks in the following section.
## ✅Common Tasks

6
.gitignore vendored
View File

@@ -135,3 +135,9 @@ dmypy.json
# macOS display setting files
.DS_Store
# Wandb directory
wandb/
# asdf tool versions
.tool-versions

View File

@@ -79,4 +79,4 @@ For more information on these concepts, please see our [full documentation](http
As an open source project in a rapidly developing field, we are extremely open to contributions, whether it be in the form of a new feature, improved infra, or better documentation.
For detailed information on how to contribute, see [here](CONTRIBUTING.md).
For detailed information on how to contribute, see [here](.github/CONTRIBUTING.md).

View File

@@ -23,13 +23,14 @@ with open("../pyproject.toml") as f:
# -- Project information -----------------------------------------------------
project = "🦜🔗 LangChain"
copyright = "2022, Harrison Chase"
copyright = "2023, Harrison Chase"
author = "Harrison Chase"
version = data["tool"]["poetry"]["version"]
release = version
html_title = project + " " + version
html_last_updated_fmt = "%b %d, %Y"
# -- General configuration ---------------------------------------------------
@@ -45,6 +46,7 @@ extensions = [
"sphinx.ext.viewcode",
"sphinxcontrib.autodoc_pydantic",
"myst_nb",
"sphinx_copybutton",
"sphinx_panels",
"IPython.sphinxext.ipython_console_highlighting",
]

View File

@@ -1,19 +1,21 @@
# AtlasDB
This page covers how to Nomic's Atlas ecosystem within LangChain.
This page covers how to use Nomic's Atlas ecosystem within LangChain.
It is broken into two parts: installation and setup, and then references to specific Atlas wrappers.
## Installation and Setup
- Install the Python package with `pip install nomic`
- Nomic is also included in langchains poetry extras `poetry install -E all`
-
## Wrappers
### VectorStore
There exists a wrapper around the Atlas neural database, allowing you to use it as a vectorstore.
This vectorstore also gives you full access to the underlying AtlasProject object, which will allow you to use the full range of Atlas map interactions, such as bulk tagging and automatic topic modeling.
Please see [the Nomic docs](https://docs.nomic.ai/atlas_api.html) for more detailed information.
Please see [the Atlas docs](https://docs.nomic.ai/atlas_api.html) for more detailed information.
@@ -22,4 +24,4 @@ To import this vectorstore:
from langchain.vectorstores import AtlasDB
```
For a more detailed walkthrough of the Chroma wrapper, see [this notebook](../modules/indexes/examples/vectorstores.ipynb)
For a more detailed walkthrough of the AtlasDB wrapper, see [this notebook](../modules/indexes/vectorstore_examples/atlas.ipynb)

View File

@@ -5,7 +5,7 @@ It is broken into two parts: installation and setup, and then references to spec
## Installation and Setup
- Install with `pip3 install banana-dev`
- Install with `pip install banana-dev`
- Get an Banana api key and set it as an environment variable (`BANANA_API_KEY`)
## Define your Banana Template

View File

@@ -34,7 +34,8 @@ search = GoogleSerperAPIWrapper()
tools = [
Tool(
name="Intermediate Answer",
func=search.run
func=search.run,
description="useful for when you need to ask with search"
)
]

View File

@@ -1,6 +1,6 @@
# Graphsignal
This page covers how to use the Graphsignal to trace and monitor LangChain.
This page covers how to use the Graphsignal ecosystem to trace and monitor LangChain.
## Installation and Setup

View File

@@ -1,6 +1,6 @@
# Helicone
This page covers how to use the [Helicone](https://helicone.ai) within LangChain.
This page covers how to use the [Helicone](https://helicone.ai) ecosystem within LangChain.
## What is Helicone?

20
docs/ecosystem/milvus.md Normal file
View File

@@ -0,0 +1,20 @@
# Milvus
This page covers how to use the Milvus ecosystem within LangChain.
It is broken into two parts: installation and setup, and then references to specific Milvus wrappers.
## Installation and Setup
- Install the Python SDK with `pip install pymilvus`
## Wrappers
### VectorStore
There exists a wrapper around Milvus indexes, allowing you to use it as a vectorstore,
whether for semantic search or example selection.
To import this vectorstore:
```python
from langchain.vectorstores import Milvus
```
For a more detailed walkthrough of the Miluvs wrapper, see [this notebook](../modules/indexes/vectorstore_examples/milvus.ipynb)

View File

@@ -0,0 +1,29 @@
# PGVector
This page covers how to use the Postgres [PGVector](https://github.com/pgvector/pgvector) ecosystem within LangChain
It is broken into two parts: installation and setup, and then references to specific PGVector wrappers.
## Installation
- Install the Python package with `pip install pgvector`
## Setup
1. The first step is to create a database with the `pgvector` extension installed.
Follow the steps at [PGVector Installation Steps](https://github.com/pgvector/pgvector#installation) to install the database and the extension. The docker image is the easiest way to get started.
## Wrappers
### VectorStore
There exists a wrapper around Postgres vector databases, allowing you to use it as a vectorstore,
whether for semantic search or example selection.
To import this vectorstore:
```python
from langchain.vectorstores.pgvector import PGVector
```
### Usage
For a more detailed walkthrough of the PGVector Wrapper, see [this notebook](../modules/indexes/vectorstore_examples/pgvector.ipynb)

View File

@@ -17,4 +17,4 @@ To import this vectorstore:
from langchain.vectorstores import Pinecone
```
For a more detailed walkthrough of the Pinecone wrapper, see [this notebook](../modules/indexes/examples/vectorstores.ipynb)
For a more detailed walkthrough of the Pinecone wrapper, see [this notebook](../modules/indexes/vectorstore_examples/pinecone.ipynb)

View File

@@ -25,7 +25,25 @@ from langchain.llms import PromptLayerOpenAI
llm = PromptLayerOpenAI(pl_tags=["langchain-requests", "chatbot"])
```
To get the PromptLayer request id, use the argument `return_pl_id` when instanializing the LLM
```python
from langchain.llms import PromptLayerOpenAI
llm = PromptLayerOpenAI(return_pl_id=True)
```
This will add the PromptLayer request ID in the `generation_info` field of the `Generation` returned when using `.generate` or `.agenerate`
For example:
```python
llm_results = llm.generate(["hello world"])
for res in llm_results.generations:
print("pl request id: ", res[0].generation_info["pl_request_id"])
```
You can use the PromptLayer request ID to add a prompt, score, or other metadata to your request. [Read more about it here](https://magniv.notion.site/Track-4deee1b1f7a34c1680d085f82567dab9).
This LLM is identical to the [OpenAI LLM](./openai), except that
- all your requests will be logged to your PromptLayer account
- you can add `pl_tags` when instantializing to tag your requests on PromptLayer
- you can add `return_pl_id` when instantializing to return a PromptLayer request id to use [while tracking requests](https://magniv.notion.site/Track-4deee1b1f7a34c1680d085f82567dab9).
PromptLayer also provides native wrappers for [`PromptLayerChatOpenAI`](../modules/chat/examples/promptlayer_chat_openai.ipynb) and `PromptLayerOpenAIChat`

20
docs/ecosystem/qdrant.md Normal file
View File

@@ -0,0 +1,20 @@
# Qdrant
This page covers how to use the Qdrant ecosystem within LangChain.
It is broken into two parts: installation and setup, and then references to specific Qdrant wrappers.
## Installation and Setup
- Install the Python SDK with `pip install qdrant-client`
## Wrappers
### VectorStore
There exists a wrapper around Qdrant indexes, allowing you to use it as a vectorstore,
whether for semantic search or example selection.
To import this vectorstore:
```python
from langchain.vectorstores import Qdrant
```
For a more detailed walkthrough of the Qdrant wrapper, see [this notebook](../modules/indexes/vectorstore_examples/qdrant.ipynb)

View File

@@ -17,9 +17,12 @@ This page is broken into two parts: installation and setup, and then references
- `poppler-utils`
- `tesseract-ocr`
- `libreoffice`
- If you are parsing PDFs, run the following to install the `detectron2` model, which
- If you are parsing PDFs using the `"hi_res"` strategy, run the following to install the `detectron2` model, which
`unstructured` uses for layout detection:
- `pip install "detectron2@git+https://github.com/facebookresearch/detectron2.git@v0.6#egg=detectron2"`
- If `detectron2` is not installed, `unstructured` will fallback to processing PDFs
using the `"fast"` strategy, which uses `pdfminer` directly and doesn't require
`detectron2`.
## Wrappers

View File

@@ -0,0 +1,625 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Weights & Biases\n",
"\n",
"This notebook goes over how to track your LangChain experiments into one centralized Weights and Biases dashboard. To learn more about prompt engineering and the callback please refer to this Report which explains both alongside the resultant dashboards you can expect to see.\n",
"\n",
"Run in Colab: https://colab.research.google.com/drive/1DXH4beT4HFaRKy_Vm4PoxhXVDRf7Ym8L?usp=sharing\n",
"\n",
"View Report: https://wandb.ai/a-sh0ts/langchain_callback_demo/reports/Prompt-Engineering-LLMs-with-LangChain-and-W-B--VmlldzozNjk1NTUw#👋-how-to-build-a-callback-in-langchain-for-better-prompt-engineering"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"!pip install wandb\n",
"!pip install pandas\n",
"!pip install textstat\n",
"!pip install spacy\n",
"!python -m spacy download en_core_web_sm"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {
"id": "T1bSmKd6V2If"
},
"outputs": [],
"source": [
"import os\n",
"os.environ[\"WANDB_API_KEY\"] = \"\"\n",
"# os.environ[\"OPENAI_API_KEY\"] = \"\"\n",
"# os.environ[\"SERPAPI_API_KEY\"] = \"\""
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {
"id": "8WAGnTWpUUnD"
},
"outputs": [],
"source": [
"from datetime import datetime\n",
"from langchain.callbacks import WandbCallbackHandler, StdOutCallbackHandler\n",
"from langchain.callbacks.base import CallbackManager\n",
"from langchain.llms import OpenAI"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"```\n",
"Callback Handler that logs to Weights and Biases.\n",
"\n",
"Parameters:\n",
" job_type (str): The type of job.\n",
" project (str): The project to log to.\n",
" entity (str): The entity to log to.\n",
" tags (list): The tags to log.\n",
" group (str): The group to log to.\n",
" name (str): The name of the run.\n",
" notes (str): The notes to log.\n",
" visualize (bool): Whether to visualize the run.\n",
" complexity_metrics (bool): Whether to log complexity metrics.\n",
" stream_logs (bool): Whether to stream callback actions to W&B\n",
"```"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "cxBFfZR8d9FC"
},
"source": [
"```\n",
"Default values for WandbCallbackHandler(...)\n",
"\n",
"visualize: bool = False,\n",
"complexity_metrics: bool = False,\n",
"stream_logs: bool = False,\n",
"```\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"NOTE: For beta workflows we have made the default analysis based on textstat and the visualizations based on spacy"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {
"id": "KAz8weWuUeXF"
},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"\u001b[34m\u001b[1mwandb\u001b[0m: Currently logged in as: \u001b[33mharrison-chase\u001b[0m. Use \u001b[1m`wandb login --relogin`\u001b[0m to force relogin\n"
]
},
{
"data": {
"text/html": [
"Tracking run with wandb version 0.14.0"
],
"text/plain": [
"<IPython.core.display.HTML object>"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"text/html": [
"Run data is saved locally in <code>/Users/harrisonchase/workplace/langchain/docs/ecosystem/wandb/run-20230318_150408-e47j1914</code>"
],
"text/plain": [
"<IPython.core.display.HTML object>"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"text/html": [
"Syncing run <strong><a href='https://wandb.ai/harrison-chase/langchain_callback_demo/runs/e47j1914' target=\"_blank\">llm</a></strong> to <a href='https://wandb.ai/harrison-chase/langchain_callback_demo' target=\"_blank\">Weights & Biases</a> (<a href='https://wandb.me/run' target=\"_blank\">docs</a>)<br/>"
],
"text/plain": [
"<IPython.core.display.HTML object>"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"text/html": [
" View project at <a href='https://wandb.ai/harrison-chase/langchain_callback_demo' target=\"_blank\">https://wandb.ai/harrison-chase/langchain_callback_demo</a>"
],
"text/plain": [
"<IPython.core.display.HTML object>"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"text/html": [
" View run at <a href='https://wandb.ai/harrison-chase/langchain_callback_demo/runs/e47j1914' target=\"_blank\">https://wandb.ai/harrison-chase/langchain_callback_demo/runs/e47j1914</a>"
],
"text/plain": [
"<IPython.core.display.HTML object>"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"\u001b[34m\u001b[1mwandb\u001b[0m: \u001b[33mWARNING\u001b[0m The wandb callback is currently in beta and is subject to change based on updates to `langchain`. Please report any issues to https://github.com/wandb/wandb/issues with the tag `langchain`.\n"
]
}
],
"source": [
"\"\"\"Main function.\n",
"\n",
"This function is used to try the callback handler.\n",
"Scenarios:\n",
"1. OpenAI LLM\n",
"2. Chain with multiple SubChains on multiple generations\n",
"3. Agent with Tools\n",
"\"\"\"\n",
"session_group = datetime.now().strftime(\"%m.%d.%Y_%H.%M.%S\")\n",
"wandb_callback = WandbCallbackHandler(\n",
" job_type=\"inference\",\n",
" project=\"langchain_callback_demo\",\n",
" group=f\"minimal_{session_group}\",\n",
" name=\"llm\",\n",
" tags=[\"test\"],\n",
")\n",
"manager = CallbackManager([StdOutCallbackHandler(), wandb_callback])\n",
"llm = OpenAI(temperature=0, callback_manager=manager, verbose=True)"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "Q-65jwrDeK6w"
},
"source": [
"\n",
"\n",
"```\n",
"# Defaults for WandbCallbackHandler.flush_tracker(...)\n",
"\n",
"reset: bool = True,\n",
"finish: bool = False,\n",
"```\n",
"\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The `flush_tracker` function is used to log LangChain sessions to Weights & Biases. It takes in the LangChain module or agent, and logs at minimum the prompts and generations alongside the serialized form of the LangChain module to the specified Weights & Biases project. By default we reset the session as opposed to concluding the session outright."
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {
"id": "o_VmneyIUyx8"
},
"outputs": [
{
"data": {
"text/html": [
"Waiting for W&B process to finish... <strong style=\"color:green\">(success).</strong>"
],
"text/plain": [
"<IPython.core.display.HTML object>"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"text/html": [
" View run <strong style=\"color:#cdcd00\">llm</strong> at: <a href='https://wandb.ai/harrison-chase/langchain_callback_demo/runs/e47j1914' target=\"_blank\">https://wandb.ai/harrison-chase/langchain_callback_demo/runs/e47j1914</a><br/>Synced 5 W&B file(s), 2 media file(s), 5 artifact file(s) and 0 other file(s)"
],
"text/plain": [
"<IPython.core.display.HTML object>"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"text/html": [
"Find logs at: <code>./wandb/run-20230318_150408-e47j1914/logs</code>"
],
"text/plain": [
"<IPython.core.display.HTML object>"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"application/vnd.jupyter.widget-view+json": {
"model_id": "0d7b4307ccdb450ea631497174fca2d1",
"version_major": 2,
"version_minor": 0
},
"text/plain": [
"VBox(children=(Label(value='Waiting for wandb.init()...\\r'), FloatProgress(value=0.016745895149999985, max=1.0…"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"text/html": [
"Tracking run with wandb version 0.14.0"
],
"text/plain": [
"<IPython.core.display.HTML object>"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"text/html": [
"Run data is saved locally in <code>/Users/harrisonchase/workplace/langchain/docs/ecosystem/wandb/run-20230318_150534-jyxma7hu</code>"
],
"text/plain": [
"<IPython.core.display.HTML object>"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"text/html": [
"Syncing run <strong><a href='https://wandb.ai/harrison-chase/langchain_callback_demo/runs/jyxma7hu' target=\"_blank\">simple_sequential</a></strong> to <a href='https://wandb.ai/harrison-chase/langchain_callback_demo' target=\"_blank\">Weights & Biases</a> (<a href='https://wandb.me/run' target=\"_blank\">docs</a>)<br/>"
],
"text/plain": [
"<IPython.core.display.HTML object>"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"text/html": [
" View project at <a href='https://wandb.ai/harrison-chase/langchain_callback_demo' target=\"_blank\">https://wandb.ai/harrison-chase/langchain_callback_demo</a>"
],
"text/plain": [
"<IPython.core.display.HTML object>"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"text/html": [
" View run at <a href='https://wandb.ai/harrison-chase/langchain_callback_demo/runs/jyxma7hu' target=\"_blank\">https://wandb.ai/harrison-chase/langchain_callback_demo/runs/jyxma7hu</a>"
],
"text/plain": [
"<IPython.core.display.HTML object>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"# SCENARIO 1 - LLM\n",
"llm_result = llm.generate([\"Tell me a joke\", \"Tell me a poem\"] * 3)\n",
"wandb_callback.flush_tracker(llm, name=\"simple_sequential\")"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {
"id": "trxslyb1U28Y"
},
"outputs": [],
"source": [
"from langchain.prompts import PromptTemplate\n",
"from langchain.chains import LLMChain"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {
"id": "uauQk10SUzF6"
},
"outputs": [
{
"data": {
"text/html": [
"Waiting for W&B process to finish... <strong style=\"color:green\">(success).</strong>"
],
"text/plain": [
"<IPython.core.display.HTML object>"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"text/html": [
" View run <strong style=\"color:#cdcd00\">simple_sequential</strong> at: <a href='https://wandb.ai/harrison-chase/langchain_callback_demo/runs/jyxma7hu' target=\"_blank\">https://wandb.ai/harrison-chase/langchain_callback_demo/runs/jyxma7hu</a><br/>Synced 4 W&B file(s), 2 media file(s), 6 artifact file(s) and 0 other file(s)"
],
"text/plain": [
"<IPython.core.display.HTML object>"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"text/html": [
"Find logs at: <code>./wandb/run-20230318_150534-jyxma7hu/logs</code>"
],
"text/plain": [
"<IPython.core.display.HTML object>"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"application/vnd.jupyter.widget-view+json": {
"model_id": "dbdbf28fb8ed40a3a60218d2e6d1a987",
"version_major": 2,
"version_minor": 0
},
"text/plain": [
"VBox(children=(Label(value='Waiting for wandb.init()...\\r'), FloatProgress(value=0.016736786816666675, max=1.0…"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"text/html": [
"Tracking run with wandb version 0.14.0"
],
"text/plain": [
"<IPython.core.display.HTML object>"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"text/html": [
"Run data is saved locally in <code>/Users/harrisonchase/workplace/langchain/docs/ecosystem/wandb/run-20230318_150550-wzy59zjq</code>"
],
"text/plain": [
"<IPython.core.display.HTML object>"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"text/html": [
"Syncing run <strong><a href='https://wandb.ai/harrison-chase/langchain_callback_demo/runs/wzy59zjq' target=\"_blank\">agent</a></strong> to <a href='https://wandb.ai/harrison-chase/langchain_callback_demo' target=\"_blank\">Weights & Biases</a> (<a href='https://wandb.me/run' target=\"_blank\">docs</a>)<br/>"
],
"text/plain": [
"<IPython.core.display.HTML object>"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"text/html": [
" View project at <a href='https://wandb.ai/harrison-chase/langchain_callback_demo' target=\"_blank\">https://wandb.ai/harrison-chase/langchain_callback_demo</a>"
],
"text/plain": [
"<IPython.core.display.HTML object>"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"text/html": [
" View run at <a href='https://wandb.ai/harrison-chase/langchain_callback_demo/runs/wzy59zjq' target=\"_blank\">https://wandb.ai/harrison-chase/langchain_callback_demo/runs/wzy59zjq</a>"
],
"text/plain": [
"<IPython.core.display.HTML object>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"# SCENARIO 2 - Chain\n",
"template = \"\"\"You are a playwright. Given the title of play, it is your job to write a synopsis for that title.\n",
"Title: {title}\n",
"Playwright: This is a synopsis for the above play:\"\"\"\n",
"prompt_template = PromptTemplate(input_variables=[\"title\"], template=template)\n",
"synopsis_chain = LLMChain(llm=llm, prompt=prompt_template, callback_manager=manager)\n",
"\n",
"test_prompts = [\n",
" {\n",
" \"title\": \"documentary about good video games that push the boundary of game design\"\n",
" },\n",
" {\"title\": \"cocaine bear vs heroin wolf\"},\n",
" {\"title\": \"the best in class mlops tooling\"},\n",
"]\n",
"synopsis_chain.apply(test_prompts)\n",
"wandb_callback.flush_tracker(synopsis_chain, name=\"agent\")"
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {
"id": "_jN73xcPVEpI"
},
"outputs": [],
"source": [
"from langchain.agents import initialize_agent, load_tools"
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {
"id": "Gpq4rk6VT9cu"
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"\n",
"\u001b[1m> Entering new AgentExecutor chain...\u001b[0m\n",
"\u001b[32;1m\u001b[1;3m I need to find out who Leo DiCaprio's girlfriend is and then calculate her age raised to the 0.43 power.\n",
"Action: Search\n",
"Action Input: \"Leo DiCaprio girlfriend\"\u001b[0m\n",
"Observation: \u001b[36;1m\u001b[1;3mDiCaprio had a steady girlfriend in Camila Morrone. He had been with the model turned actress for nearly five years, as they were first said to be dating at the end of 2017. And the now 26-year-old Morrone is no stranger to Hollywood.\u001b[0m\n",
"Thought:\u001b[32;1m\u001b[1;3m I need to calculate her age raised to the 0.43 power.\n",
"Action: Calculator\n",
"Action Input: 26^0.43\u001b[0m\n",
"Observation: \u001b[33;1m\u001b[1;3mAnswer: 4.059182145592686\n",
"\u001b[0m\n",
"Thought:\u001b[32;1m\u001b[1;3m I now know the final answer.\n",
"Final Answer: Leo DiCaprio's girlfriend is Camila Morrone and her current age raised to the 0.43 power is 4.059182145592686.\u001b[0m\n",
"\n",
"\u001b[1m> Finished chain.\u001b[0m\n"
]
},
{
"data": {
"text/html": [
"Waiting for W&B process to finish... <strong style=\"color:green\">(success).</strong>"
],
"text/plain": [
"<IPython.core.display.HTML object>"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"text/html": [
" View run <strong style=\"color:#cdcd00\">agent</strong> at: <a href='https://wandb.ai/harrison-chase/langchain_callback_demo/runs/wzy59zjq' target=\"_blank\">https://wandb.ai/harrison-chase/langchain_callback_demo/runs/wzy59zjq</a><br/>Synced 5 W&B file(s), 2 media file(s), 7 artifact file(s) and 0 other file(s)"
],
"text/plain": [
"<IPython.core.display.HTML object>"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"text/html": [
"Find logs at: <code>./wandb/run-20230318_150550-wzy59zjq/logs</code>"
],
"text/plain": [
"<IPython.core.display.HTML object>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"# SCENARIO 3 - Agent with Tools\n",
"tools = load_tools([\"serpapi\", \"llm-math\"], llm=llm, callback_manager=manager)\n",
"agent = initialize_agent(\n",
" tools,\n",
" llm,\n",
" agent=\"zero-shot-react-description\",\n",
" callback_manager=manager,\n",
" verbose=True,\n",
")\n",
"agent.run(\n",
" \"Who is Leo DiCaprio's girlfriend? What is her current age raised to the 0.43 power?\"\n",
")\n",
"wandb_callback.flush_tracker(agent, reset=False, finish=True)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"colab": {
"provenance": []
},
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.1"
}
},
"nbformat": 4,
"nbformat_minor": 1
}

View File

@@ -158,14 +158,14 @@ Open Source
---
.. link-button:: https://github.com/jerryjliu/gpt_index
.. link-button:: https://github.com/jerryjliu/llama_index
:type: url
:text: GPT Index
:text: LlamaIndex
:classes: stretched-link btn-lg
+++
GPT Index is a project consisting of a set of data structures that are created using GPT-3 and can be traversed using GPT-3 in order to answer queries.
LlamaIndex (formerly GPT Index) is a project consisting of a set of data structures that are created using GPT-3 and can be traversed using GPT-3 in order to answer queries.
---
@@ -322,5 +322,14 @@ Proprietary
By Zahid Khawaja, this demo utilizes question answering to answer questions about a given website. A followup added this for `YouTube videos <https://twitter.com/chillzaza_/status/1593739682013220865?s=20&t=EhU8jl0KyCPJ7vE9Rnz-cQ>`_, and then another followup added it for `Wikipedia <https://twitter.com/chillzaza_/status/1594847151238037505?s=20&t=EhU8jl0KyCPJ7vE9Rnz-cQ>`_.
---
.. link-button:: https://mynd.so
:type: url
:text: Mynd
:classes: stretched-link btn-lg
+++
A journaling app for self-care that uses AI to uncover insights and patterns over time.

View File

@@ -97,6 +97,8 @@ The above modules can be used in a variety of ways. LangChain also provides guid
- `Summarization <./use_cases/summarization.html>`_: Summarizing longer documents into shorter, more condensed chunks of information. A type of Data Augmented Generation.
- `Querying Tabular Data <./use_cases/tabular.html>`_: If you want to understand how to use LLMs to query data that is stored in a tabular format (csvs, SQL, dataframes, etc) you should read this page.
- `Evaluation <./use_cases/evaluation.html>`_: Generative models are notoriously hard to evaluate with traditional metrics. One new way of evaluating them is using language models themselves to do the evaluation. LangChain provides some prompts/chains for assisting in this.
- `Generate similar examples <./use_cases/generate_examples.html>`_: Generating similar examples to a given input. This is a common use case for many applications, and LangChain provides some prompts/chains for assisting in this.
@@ -117,6 +119,8 @@ The above modules can be used in a variety of ways. LangChain also provides guid
./use_cases/combine_docs.md
./use_cases/question_answering.md
./use_cases/summarization.md
./use_cases/tabular.rst
./use_cases/extraction.md
./use_cases/evaluation.rst
./use_cases/model_laboratory.ipynb

View File

@@ -92,7 +92,7 @@
"id": "f4814175-964d-42f1-aa9d-22801ce1e912",
"metadata": {},
"source": [
"## Initalize Toolkit and Agent\n",
"## Initialize Toolkit and Agent\n",
"\n",
"First, we'll create an agent with a single vectorstore."
]

View File

@@ -22,7 +22,7 @@
},
{
"cell_type": "code",
"execution_count": 20,
"execution_count": 1,
"id": "2e87c10a",
"metadata": {},
"outputs": [],
@@ -30,13 +30,14 @@
"from langchain.embeddings.openai import OpenAIEmbeddings\n",
"from langchain.vectorstores import Chroma\n",
"from langchain.text_splitter import CharacterTextSplitter\n",
"from langchain import OpenAI, VectorDBQA\n",
"from langchain.llms import OpenAI\n",
"from langchain.chains import RetrievalQA\n",
"llm = OpenAI(temperature=0)"
]
},
{
"cell_type": "code",
"execution_count": 37,
"execution_count": 2,
"id": "f2675861",
"metadata": {},
"outputs": [
@@ -62,17 +63,17 @@
},
{
"cell_type": "code",
"execution_count": 38,
"execution_count": 4,
"id": "bc5403d4",
"metadata": {},
"outputs": [],
"source": [
"state_of_union = VectorDBQA.from_chain_type(llm=llm, chain_type=\"stuff\", vectorstore=docsearch)"
"state_of_union = RetrievalQA.from_chain_type(llm=llm, chain_type=\"stuff\", retriever=docsearch.as_retriever())"
]
},
{
"cell_type": "code",
"execution_count": 39,
"execution_count": 5,
"id": "1431cded",
"metadata": {},
"outputs": [],
@@ -82,7 +83,7 @@
},
{
"cell_type": "code",
"execution_count": 40,
"execution_count": 6,
"id": "915d3ff3",
"metadata": {},
"outputs": [],
@@ -92,7 +93,7 @@
},
{
"cell_type": "code",
"execution_count": 41,
"execution_count": 7,
"id": "96a2edf8",
"metadata": {},
"outputs": [
@@ -109,7 +110,7 @@
"docs = loader.load()\n",
"ruff_texts = text_splitter.split_documents(docs)\n",
"ruff_db = Chroma.from_documents(ruff_texts, embeddings, collection_name=\"ruff\")\n",
"ruff = VectorDBQA.from_chain_type(llm=llm, chain_type=\"stuff\", vectorstore=ruff_db)"
"ruff = RetrievalQA.from_chain_type(llm=llm, chain_type=\"stuff\", retriever=ruff_db.as_retriever())"
]
},
{
@@ -264,9 +265,9 @@
"id": "9161ba91",
"metadata": {},
"source": [
"You can also set `return_direct=True` if you intend to use the agent as a router and just want to directly return the result of the VectorDBQaChain.\n",
"You can also set `return_direct=True` if you intend to use the agent as a router and just want to directly return the result of the RetrievalQAChain.\n",
"\n",
"Notice that in the above examples the agent did some extra work after querying the VectorDBQAChain. You can avoid that and just return the result directly."
"Notice that in the above examples the agent did some extra work after querying the RetrievalQAChain. You can avoid that and just return the result directly."
]
},
{

View File

@@ -0,0 +1,309 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "4658d71a",
"metadata": {},
"source": [
"# Conversation Agent (for Chat Models)\n",
"\n",
"This notebook walks through using an agent optimized for conversation, using ChatModels. Other agents are often optimized for using tools to figure out the best response, which is not ideal in a conversational setting where you may want the agent to be able to chat with the user as well.\n",
"\n",
"This is accomplished with a specific type of agent (`chat-conversational-react-description`) which expects to be used with a memory component."
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "f4f5d1a8",
"metadata": {},
"outputs": [],
"source": [
"import os\n",
"os.environ[\"LANGCHAIN_HANDLER\"] = \"langchain\""
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "f65308ab",
"metadata": {},
"outputs": [],
"source": [
"from langchain.agents import Tool\n",
"from langchain.memory import ConversationBufferMemory\n",
"from langchain.chat_models import ChatOpenAI\n",
"from langchain.utilities import SerpAPIWrapper\n",
"from langchain.agents import initialize_agent"
]
},
{
"cell_type": "code",
"execution_count": 3,
"id": "5fb14d6d",
"metadata": {},
"outputs": [],
"source": [
"search = SerpAPIWrapper()\n",
"tools = [\n",
" Tool(\n",
" name = \"Current Search\",\n",
" func=search.run,\n",
" description=\"useful for when you need to answer questions about current events or the current state of the world. the input to this should be a single search term.\"\n",
" ),\n",
"]"
]
},
{
"cell_type": "code",
"execution_count": 4,
"id": "dddc34c4",
"metadata": {},
"outputs": [],
"source": [
"memory = ConversationBufferMemory(memory_key=\"chat_history\", return_messages=True)"
]
},
{
"cell_type": "code",
"execution_count": 5,
"id": "cafe9bc1",
"metadata": {},
"outputs": [],
"source": [
"llm=ChatOpenAI(temperature=0)\n",
"agent_chain = initialize_agent(tools, llm, agent=\"chat-conversational-react-description\", verbose=True, memory=memory)"
]
},
{
"cell_type": "code",
"execution_count": 6,
"id": "dc70b454",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"\n",
"\u001b[1m> Entering new AgentExecutor chain...\u001b[0m\n",
"\u001b[32;1m\u001b[1;3m{\n",
" \"action\": \"Final Answer\",\n",
" \"action_input\": \"Hello Bob! How can I assist you today?\"\n",
"}\u001b[0m\n",
"\n",
"\u001b[1m> Finished chain.\u001b[0m\n"
]
},
{
"data": {
"text/plain": [
"'Hello Bob! How can I assist you today?'"
]
},
"execution_count": 6,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"agent_chain.run(input=\"hi, i am bob\")"
]
},
{
"cell_type": "code",
"execution_count": 7,
"id": "3dcf7953",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"\n",
"\u001b[1m> Entering new AgentExecutor chain...\u001b[0m\n",
"\u001b[32;1m\u001b[1;3m{\n",
" \"action\": \"Final Answer\",\n",
" \"action_input\": \"Your name is Bob.\"\n",
"}\u001b[0m\n",
"\n",
"\u001b[1m> Finished chain.\u001b[0m\n"
]
},
{
"data": {
"text/plain": [
"'Your name is Bob.'"
]
},
"execution_count": 7,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"agent_chain.run(input=\"what's my name?\")"
]
},
{
"cell_type": "code",
"execution_count": 8,
"id": "aa05f566",
"metadata": {
"scrolled": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"\n",
"\u001b[1m> Entering new AgentExecutor chain...\u001b[0m\n",
"\u001b[32;1m\u001b[1;3m{\n",
" \"action\": \"Current Search\",\n",
" \"action_input\": \"Thai food dinner recipes\"\n",
"}\u001b[0m\n",
"Observation: \u001b[36;1m\u001b[1;3m59 easy Thai recipes for any night of the week · Marion Grasby's Thai spicy chilli and basil fried rice · Thai curry noodle soup · Marion Grasby's ...\u001b[0m\n",
"Thought:\u001b[32;1m\u001b[1;3m{\n",
" \"action\": \"Final Answer\",\n",
" \"action_input\": \"Here are some Thai food dinner recipes you can make this week: Thai spicy chilli and basil fried rice, Thai curry noodle soup, and many more. You can find 59 easy Thai recipes for any night of the week on Marion Grasby's website.\"\n",
"}\u001b[0m\n",
"\n",
"\u001b[1m> Finished chain.\u001b[0m\n"
]
},
{
"data": {
"text/plain": [
"\"Here are some Thai food dinner recipes you can make this week: Thai spicy chilli and basil fried rice, Thai curry noodle soup, and many more. You can find 59 easy Thai recipes for any night of the week on Marion Grasby's website.\""
]
},
"execution_count": 8,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"agent_chain.run(\"what are some good dinners to make this week, if i like thai food?\")"
]
},
{
"cell_type": "code",
"execution_count": 9,
"id": "c5d8b7ea",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"\n",
"\u001b[1m> Entering new AgentExecutor chain...\u001b[0m\n",
"\u001b[32;1m\u001b[1;3m```json\n",
"{\n",
" \"action\": \"Current Search\",\n",
" \"action_input\": \"who won the world cup in 1978\"\n",
"}\n",
"```\u001b[0m\n",
"Observation: \u001b[36;1m\u001b[1;3mThe Argentina national football team represents Argentina in men's international football and is administered by the Argentine Football Association, the governing body for football in Argentina. Nicknamed La Albiceleste, they are the reigning world champions, having won the most recent World Cup in 2022.\u001b[0m\n",
"Thought:\u001b[32;1m\u001b[1;3m```json\n",
"{\n",
" \"action\": \"Final Answer\",\n",
" \"action_input\": \"The last letter in your name is 'b'. The Argentina national football team won the World Cup in 1978.\"\n",
"}\n",
"```\u001b[0m\n",
"\n",
"\u001b[1m> Finished chain.\u001b[0m\n"
]
},
{
"data": {
"text/plain": [
"\"The last letter in your name is 'b'. The Argentina national football team won the World Cup in 1978.\""
]
},
"execution_count": 9,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"agent_chain.run(input=\"tell me the last letter in my name, and also tell me who won the world cup in 1978?\")"
]
},
{
"cell_type": "code",
"execution_count": 10,
"id": "f608889b",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"\n",
"\u001b[1m> Entering new AgentExecutor chain...\u001b[0m\n",
"\u001b[32;1m\u001b[1;3m{\n",
" \"action\": \"Current Search\",\n",
" \"action_input\": \"weather in pomfret\"\n",
"}\u001b[0m\n",
"Observation: \u001b[36;1m\u001b[1;3mMostly cloudy with gusty winds developing during the afternoon. A few flurries or snow showers possible. High near 40F. Winds NNW at 20 to 30 mph.\u001b[0m\n",
"Thought:\u001b[32;1m\u001b[1;3m{\n",
" \"action\": \"Final Answer\",\n",
" \"action_input\": \"The weather in Pomfret is mostly cloudy with gusty winds developing during the afternoon. A few flurries or snow showers are possible. High near 40F. Winds NNW at 20 to 30 mph.\"\n",
"}\u001b[0m\n",
"\n",
"\u001b[1m> Finished chain.\u001b[0m\n"
]
},
{
"data": {
"text/plain": [
"'The weather in Pomfret is mostly cloudy with gusty winds developing during the afternoon. A few flurries or snow showers are possible. High near 40F. Winds NNW at 20 to 30 mph.'"
]
},
"execution_count": 10,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"agent_chain.run(input=\"whats the weather like in pomfret?\")"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "0084efd6",
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.1"
}
},
"nbformat": 4,
"nbformat_minor": 5
}

View File

@@ -0,0 +1,132 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Human as a tool\n",
"\n",
"Human are AGI so they can certainly be used as a tool to help out AI agent \n",
"when it is confused."
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [],
"source": [
"import sys\n",
"from langchain.chat_models import ChatOpenAI\n",
"from langchain.llms import OpenAI\n",
"from langchain.agents import load_tools, initialize_agent\n",
"\n",
"llm = ChatOpenAI(temperature=0.0)\n",
"math_llm = OpenAI(temperature=0.0)\n",
"tools = load_tools(\n",
" [\"human\", \"llm-math\"], \n",
" llm=math_llm,\n",
")\n",
"\n",
"agent_chain = initialize_agent(\n",
" tools,\n",
" llm,\n",
" agent=\"zero-shot-react-description\",\n",
" verbose=True,\n",
")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"In the above code you can see the tool takes input directly from command line.\n",
"You can customize `prompt_func` and `input_func` according to your need."
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"\n",
"\u001b[1m> Entering new AgentExecutor chain...\u001b[0m\n",
"\u001b[32;1m\u001b[1;3mI don't know Eric Zhu, so I should ask a human for guidance.\n",
"Action: Human\n",
"Action Input: \"Do you know when Eric Zhu's birthday is?\"\u001b[0m\n",
"\n",
"Do you know when Eric Zhu's birthday is?\n",
"last week\n",
"\n",
"Observation: \u001b[36;1m\u001b[1;3mlast week\u001b[0m\n",
"Thought:\u001b[32;1m\u001b[1;3mThat's not very helpful. I should ask for more information.\n",
"Action: Human\n",
"Action Input: \"Do you know the specific date of Eric Zhu's birthday?\"\u001b[0m\n",
"\n",
"Do you know the specific date of Eric Zhu's birthday?\n",
"august 1st\n",
"\n",
"Observation: \u001b[36;1m\u001b[1;3maugust 1st\u001b[0m\n",
"Thought:\u001b[32;1m\u001b[1;3mNow that I have the date, I can check if it's a leap year or not.\n",
"Action: Calculator\n",
"Action Input: \"Is 2021 a leap year?\"\u001b[0m\n",
"Observation: \u001b[33;1m\u001b[1;3mAnswer: False\n",
"\u001b[0m\n",
"Thought:\u001b[32;1m\u001b[1;3mI have all the information I need to answer the original question.\n",
"Final Answer: Eric Zhu's birthday is on August 1st and it is not a leap year in 2021.\u001b[0m\n",
"\n",
"\u001b[1m> Finished chain.\u001b[0m\n"
]
},
{
"data": {
"text/plain": [
"\"Eric Zhu's birthday is on August 1st and it is not a leap year in 2021.\""
]
},
"execution_count": 3,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"\n",
"agent_chain.run(\"What is Eric Zhu's birthday?\")\n",
"# Answer with \"last week\""
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.1"
}
},
"nbformat": 4,
"nbformat_minor": 2
}

View File

@@ -61,7 +61,8 @@
"tools = [\n",
" Tool(\n",
" name=\"Intermediate Answer\",\n",
" func=search.run\n",
" func=search.run,\n",
" description=\"useful for when you need to ask with search\"\n",
" )\n",
"]\n",
"\n",

View File

@@ -0,0 +1,552 @@
{
"cells": [
{
"attachments": {},
"cell_type": "markdown",
"id": "fa6802ac",
"metadata": {},
"source": [
"# Adding SharedMemory to an Agent and its Tools\n",
"\n",
"This notebook goes over adding memory to **both** of an Agent and its tools. Before going through this notebook, please walk through the following notebooks, as this will build on top of both of them:\n",
"\n",
"- [Adding memory to an LLM Chain](../../memory/examples/adding_memory.ipynb)\n",
"- [Custom Agents](custom_agent.ipynb)\n",
"\n",
"We are going to create a custom Agent. The agent has access to a conversation memory, search tool, and a summarization tool. And, the summarization tool also needs access to the conversation memory."
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "8db95912",
"metadata": {},
"outputs": [],
"source": [
"from langchain.agents import ZeroShotAgent, Tool, AgentExecutor\n",
"from langchain.memory import ConversationBufferMemory, ReadOnlySharedMemory\n",
"from langchain import OpenAI, LLMChain, PromptTemplate\n",
"from langchain.utilities import GoogleSearchAPIWrapper"
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "06b7187b",
"metadata": {},
"outputs": [],
"source": [
"template = \"\"\"This is a conversation between a human and a bot:\n",
"\n",
"{chat_history}\n",
"\n",
"Write a summary of the conversation for {input}:\n",
"\"\"\"\n",
"\n",
"prompt = PromptTemplate(\n",
" input_variables=[\"input\", \"chat_history\"], \n",
" template=template\n",
")\n",
"memory = ConversationBufferMemory(memory_key=\"chat_history\")\n",
"readonlymemory = ReadOnlySharedMemory(memory=memory)\n",
"summry_chain = LLMChain(\n",
" llm=OpenAI(), \n",
" prompt=prompt, \n",
" verbose=True, \n",
" memory=readonlymemory, # use the read-only memory to prevent the tool from modifying the memory\n",
")"
]
},
{
"cell_type": "code",
"execution_count": 3,
"id": "97ad8467",
"metadata": {},
"outputs": [],
"source": [
"search = GoogleSearchAPIWrapper()\n",
"tools = [\n",
" Tool(\n",
" name = \"Search\",\n",
" func=search.run,\n",
" description=\"useful for when you need to answer questions about current events\"\n",
" ),\n",
" Tool(\n",
" name = \"Summary\",\n",
" func=summry_chain.run,\n",
" description=\"useful for when you summarize a conversation. The input to this tool should be a string, representing who will read this summary.\"\n",
" )\n",
"]"
]
},
{
"cell_type": "code",
"execution_count": 4,
"id": "e3439cd6",
"metadata": {},
"outputs": [],
"source": [
"prefix = \"\"\"Have a conversation with a human, answering the following questions as best you can. You have access to the following tools:\"\"\"\n",
"suffix = \"\"\"Begin!\"\n",
"\n",
"{chat_history}\n",
"Question: {input}\n",
"{agent_scratchpad}\"\"\"\n",
"\n",
"prompt = ZeroShotAgent.create_prompt(\n",
" tools, \n",
" prefix=prefix, \n",
" suffix=suffix, \n",
" input_variables=[\"input\", \"chat_history\", \"agent_scratchpad\"]\n",
")"
]
},
{
"cell_type": "markdown",
"id": "0021675b",
"metadata": {},
"source": [
"We can now construct the LLMChain, with the Memory object, and then create the agent."
]
},
{
"cell_type": "code",
"execution_count": 5,
"id": "c56a0e73",
"metadata": {},
"outputs": [],
"source": [
"llm_chain = LLMChain(llm=OpenAI(temperature=0), prompt=prompt)\n",
"agent = ZeroShotAgent(llm_chain=llm_chain, tools=tools, verbose=True)\n",
"agent_chain = AgentExecutor.from_agent_and_tools(agent=agent, tools=tools, verbose=True, memory=memory)"
]
},
{
"cell_type": "code",
"execution_count": 6,
"id": "ca4bc1fb",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"\n",
"\u001b[1m> Entering new AgentExecutor chain...\u001b[0m\n",
"\u001b[32;1m\u001b[1;3mThought: I should research ChatGPT to answer this question.\n",
"Action: Search\n",
"Action Input: \"ChatGPT\"\u001b[0m\n",
"Observation: \u001b[36;1m\u001b[1;3mNov 30, 2022 ... We've trained a model called ChatGPT which interacts in a conversational way. The dialogue format makes it possible for ChatGPT to answer ... ChatGPT is an artificial intelligence chatbot developed by OpenAI and launched in November 2022. It is built on top of OpenAI's GPT-3 family of large ... ChatGPT. We've trained a model called ChatGPT which interacts in a conversational way. The dialogue format makes it possible for ChatGPT to answer ... Feb 2, 2023 ... ChatGPT, the popular chatbot from OpenAI, is estimated to have reached 100 million monthly active users in January, just two months after ... 2 days ago ... ChatGPT recently launched a new version of its own plagiarism detection tool, with hopes that it will squelch some of the criticism around how ... An API for accessing new AI models developed by OpenAI. Feb 19, 2023 ... ChatGPT is an AI chatbot system that OpenAI released in November to show off and test what a very large, powerful AI system can accomplish. You ... ChatGPT is fine-tuned from GPT-3.5, a language model trained to produce text. ChatGPT was optimized for dialogue by using Reinforcement Learning with Human ... 3 days ago ... Visual ChatGPT connects ChatGPT and a series of Visual Foundation Models to enable sending and receiving images during chatting. Dec 1, 2022 ... ChatGPT is a natural language processing tool driven by AI technology that allows you to have human-like conversations and much more with a ...\u001b[0m\n",
"Thought:\u001b[32;1m\u001b[1;3m I now know the final answer.\n",
"Final Answer: ChatGPT is an artificial intelligence chatbot developed by OpenAI and launched in November 2022. It is built on top of OpenAI's GPT-3 family of large language models and is optimized for dialogue by using Reinforcement Learning with Human-in-the-Loop. It is also capable of sending and receiving images during chatting.\u001b[0m\n",
"\n",
"\u001b[1m> Finished chain.\u001b[0m\n"
]
},
{
"data": {
"text/plain": [
"\"ChatGPT is an artificial intelligence chatbot developed by OpenAI and launched in November 2022. It is built on top of OpenAI's GPT-3 family of large language models and is optimized for dialogue by using Reinforcement Learning with Human-in-the-Loop. It is also capable of sending and receiving images during chatting.\""
]
},
"execution_count": 6,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"agent_chain.run(input=\"What is ChatGPT?\")"
]
},
{
"cell_type": "markdown",
"id": "45627664",
"metadata": {},
"source": [
"To test the memory of this agent, we can ask a followup question that relies on information in the previous exchange to be answered correctly."
]
},
{
"cell_type": "code",
"execution_count": 7,
"id": "eecc0462",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"\n",
"\u001b[1m> Entering new AgentExecutor chain...\u001b[0m\n",
"\u001b[32;1m\u001b[1;3mThought: I need to find out who developed ChatGPT\n",
"Action: Search\n",
"Action Input: Who developed ChatGPT\u001b[0m\n",
"Observation: \u001b[36;1m\u001b[1;3mChatGPT is an artificial intelligence chatbot developed by OpenAI and launched in November 2022. It is built on top of OpenAI's GPT-3 family of large ... Feb 15, 2023 ... Who owns Chat GPT? Chat GPT is owned and developed by AI research and deployment company, OpenAI. The organization is headquartered in San ... Feb 8, 2023 ... ChatGPT is an AI chatbot developed by San Francisco-based startup OpenAI. OpenAI was co-founded in 2015 by Elon Musk and Sam Altman and is ... Dec 7, 2022 ... ChatGPT is an AI chatbot designed and developed by OpenAI. The bot works by generating text responses based on human-user input, like questions ... Jan 12, 2023 ... In 2019, Microsoft invested $1 billion in OpenAI, the tiny San Francisco company that designed ChatGPT. And in the years since, it has quietly ... Jan 25, 2023 ... The inside story of ChatGPT: How OpenAI founder Sam Altman built the world's hottest technology with billions from Microsoft. Dec 3, 2022 ... ChatGPT went viral on social media for its ability to do anything from code to write essays. · The company that created the AI chatbot has a ... Jan 17, 2023 ... While many Americans were nursing hangovers on New Year's Day, 22-year-old Edward Tian was working feverishly on a new app to combat misuse ... ChatGPT is a language model created by OpenAI, an artificial intelligence research laboratory consisting of a team of researchers and engineers focused on ... 1 day ago ... Everyone is talking about ChatGPT, developed by OpenAI. This is such a great tool that has helped to make AI more accessible to a wider ...\u001b[0m\n",
"Thought:\u001b[32;1m\u001b[1;3m I now know the final answer\n",
"Final Answer: ChatGPT was developed by OpenAI.\u001b[0m\n",
"\n",
"\u001b[1m> Finished chain.\u001b[0m\n"
]
},
{
"data": {
"text/plain": [
"'ChatGPT was developed by OpenAI.'"
]
},
"execution_count": 7,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"agent_chain.run(input=\"Who developed it?\")"
]
},
{
"cell_type": "code",
"execution_count": 8,
"id": "c34424cf",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"\n",
"\u001b[1m> Entering new AgentExecutor chain...\u001b[0m\n",
"\u001b[32;1m\u001b[1;3mThought: I need to simplify the conversation for a 5 year old.\n",
"Action: Summary\n",
"Action Input: My daughter 5 years old\u001b[0m\n",
"\n",
"\u001b[1m> Entering new LLMChain chain...\u001b[0m\n",
"Prompt after formatting:\n",
"\u001b[32;1m\u001b[1;3mThis is a conversation between a human and a bot:\n",
"\n",
"Human: What is ChatGPT?\n",
"AI: ChatGPT is an artificial intelligence chatbot developed by OpenAI and launched in November 2022. It is built on top of OpenAI's GPT-3 family of large language models and is optimized for dialogue by using Reinforcement Learning with Human-in-the-Loop. It is also capable of sending and receiving images during chatting.\n",
"Human: Who developed it?\n",
"AI: ChatGPT was developed by OpenAI.\n",
"\n",
"Write a summary of the conversation for My daughter 5 years old:\n",
"\u001b[0m\n",
"\n",
"\u001b[1m> Finished chain.\u001b[0m\n",
"\n",
"Observation: \u001b[33;1m\u001b[1;3m\n",
"The conversation was about ChatGPT, an artificial intelligence chatbot. It was created by OpenAI and can send and receive images while chatting.\u001b[0m\n",
"Thought:\u001b[32;1m\u001b[1;3m I now know the final answer.\n",
"Final Answer: ChatGPT is an artificial intelligence chatbot created by OpenAI that can send and receive images while chatting.\u001b[0m\n",
"\n",
"\u001b[1m> Finished chain.\u001b[0m\n"
]
},
{
"data": {
"text/plain": [
"'ChatGPT is an artificial intelligence chatbot created by OpenAI that can send and receive images while chatting.'"
]
},
"execution_count": 8,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"agent_chain.run(input=\"Thanks. Summarize the conversation, for my daughter 5 years old.\")"
]
},
{
"attachments": {},
"cell_type": "markdown",
"id": "4ebd8326",
"metadata": {},
"source": [
"Confirm that the memory was correctly updated."
]
},
{
"cell_type": "code",
"execution_count": 9,
"id": "b91f8c85",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Human: What is ChatGPT?\n",
"AI: ChatGPT is an artificial intelligence chatbot developed by OpenAI and launched in November 2022. It is built on top of OpenAI's GPT-3 family of large language models and is optimized for dialogue by using Reinforcement Learning with Human-in-the-Loop. It is also capable of sending and receiving images during chatting.\n",
"Human: Who developed it?\n",
"AI: ChatGPT was developed by OpenAI.\n",
"Human: Thanks. Summarize the conversation, for my daughter 5 years old.\n",
"AI: ChatGPT is an artificial intelligence chatbot created by OpenAI that can send and receive images while chatting.\n"
]
}
],
"source": [
"print(agent_chain.memory.buffer)"
]
},
{
"attachments": {},
"cell_type": "markdown",
"id": "cc3d0aa4",
"metadata": {},
"source": [
"For comparison, below is a bad example that uses the same memory for both the Agent and the tool."
]
},
{
"cell_type": "code",
"execution_count": 10,
"id": "3359d043",
"metadata": {},
"outputs": [],
"source": [
"## This is a bad practice for using the memory.\n",
"## Use the ReadOnlySharedMemory class, as shown above.\n",
"\n",
"template = \"\"\"This is a conversation between a human and a bot:\n",
"\n",
"{chat_history}\n",
"\n",
"Write a summary of the conversation for {input}:\n",
"\"\"\"\n",
"\n",
"prompt = PromptTemplate(\n",
" input_variables=[\"input\", \"chat_history\"], \n",
" template=template\n",
")\n",
"memory = ConversationBufferMemory(memory_key=\"chat_history\")\n",
"summry_chain = LLMChain(\n",
" llm=OpenAI(), \n",
" prompt=prompt, \n",
" verbose=True, \n",
" memory=memory, # <--- this is the only change\n",
")\n",
"\n",
"search = GoogleSearchAPIWrapper()\n",
"tools = [\n",
" Tool(\n",
" name = \"Search\",\n",
" func=search.run,\n",
" description=\"useful for when you need to answer questions about current events\"\n",
" ),\n",
" Tool(\n",
" name = \"Summary\",\n",
" func=summry_chain.run,\n",
" description=\"useful for when you summarize a conversation. The input to this tool should be a string, representing who will read this summary.\"\n",
" )\n",
"]\n",
"\n",
"prefix = \"\"\"Have a conversation with a human, answering the following questions as best you can. You have access to the following tools:\"\"\"\n",
"suffix = \"\"\"Begin!\"\n",
"\n",
"{chat_history}\n",
"Question: {input}\n",
"{agent_scratchpad}\"\"\"\n",
"\n",
"prompt = ZeroShotAgent.create_prompt(\n",
" tools, \n",
" prefix=prefix, \n",
" suffix=suffix, \n",
" input_variables=[\"input\", \"chat_history\", \"agent_scratchpad\"]\n",
")\n",
"\n",
"llm_chain = LLMChain(llm=OpenAI(temperature=0), prompt=prompt)\n",
"agent = ZeroShotAgent(llm_chain=llm_chain, tools=tools, verbose=True)\n",
"agent_chain = AgentExecutor.from_agent_and_tools(agent=agent, tools=tools, verbose=True, memory=memory)"
]
},
{
"cell_type": "code",
"execution_count": 11,
"id": "970d23df",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"\n",
"\u001b[1m> Entering new AgentExecutor chain...\u001b[0m\n",
"\u001b[32;1m\u001b[1;3mThought: I should research ChatGPT to answer this question.\n",
"Action: Search\n",
"Action Input: \"ChatGPT\"\u001b[0m\n",
"Observation: \u001b[36;1m\u001b[1;3mNov 30, 2022 ... We've trained a model called ChatGPT which interacts in a conversational way. The dialogue format makes it possible for ChatGPT to answer ... ChatGPT is an artificial intelligence chatbot developed by OpenAI and launched in November 2022. It is built on top of OpenAI's GPT-3 family of large ... ChatGPT. We've trained a model called ChatGPT which interacts in a conversational way. The dialogue format makes it possible for ChatGPT to answer ... Feb 2, 2023 ... ChatGPT, the popular chatbot from OpenAI, is estimated to have reached 100 million monthly active users in January, just two months after ... 2 days ago ... ChatGPT recently launched a new version of its own plagiarism detection tool, with hopes that it will squelch some of the criticism around how ... An API for accessing new AI models developed by OpenAI. Feb 19, 2023 ... ChatGPT is an AI chatbot system that OpenAI released in November to show off and test what a very large, powerful AI system can accomplish. You ... ChatGPT is fine-tuned from GPT-3.5, a language model trained to produce text. ChatGPT was optimized for dialogue by using Reinforcement Learning with Human ... 3 days ago ... Visual ChatGPT connects ChatGPT and a series of Visual Foundation Models to enable sending and receiving images during chatting. Dec 1, 2022 ... ChatGPT is a natural language processing tool driven by AI technology that allows you to have human-like conversations and much more with a ...\u001b[0m\n",
"Thought:\u001b[32;1m\u001b[1;3m I now know the final answer.\n",
"Final Answer: ChatGPT is an artificial intelligence chatbot developed by OpenAI and launched in November 2022. It is built on top of OpenAI's GPT-3 family of large language models and is optimized for dialogue by using Reinforcement Learning with Human-in-the-Loop. It is also capable of sending and receiving images during chatting.\u001b[0m\n",
"\n",
"\u001b[1m> Finished chain.\u001b[0m\n"
]
},
{
"data": {
"text/plain": [
"\"ChatGPT is an artificial intelligence chatbot developed by OpenAI and launched in November 2022. It is built on top of OpenAI's GPT-3 family of large language models and is optimized for dialogue by using Reinforcement Learning with Human-in-the-Loop. It is also capable of sending and receiving images during chatting.\""
]
},
"execution_count": 11,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"agent_chain.run(input=\"What is ChatGPT?\")"
]
},
{
"cell_type": "code",
"execution_count": 12,
"id": "d9ea82f0",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"\n",
"\u001b[1m> Entering new AgentExecutor chain...\u001b[0m\n",
"\u001b[32;1m\u001b[1;3mThought: I need to find out who developed ChatGPT\n",
"Action: Search\n",
"Action Input: Who developed ChatGPT\u001b[0m\n",
"Observation: \u001b[36;1m\u001b[1;3mChatGPT is an artificial intelligence chatbot developed by OpenAI and launched in November 2022. It is built on top of OpenAI's GPT-3 family of large ... Feb 15, 2023 ... Who owns Chat GPT? Chat GPT is owned and developed by AI research and deployment company, OpenAI. The organization is headquartered in San ... Feb 8, 2023 ... ChatGPT is an AI chatbot developed by San Francisco-based startup OpenAI. OpenAI was co-founded in 2015 by Elon Musk and Sam Altman and is ... Dec 7, 2022 ... ChatGPT is an AI chatbot designed and developed by OpenAI. The bot works by generating text responses based on human-user input, like questions ... Jan 12, 2023 ... In 2019, Microsoft invested $1 billion in OpenAI, the tiny San Francisco company that designed ChatGPT. And in the years since, it has quietly ... Jan 25, 2023 ... The inside story of ChatGPT: How OpenAI founder Sam Altman built the world's hottest technology with billions from Microsoft. Dec 3, 2022 ... ChatGPT went viral on social media for its ability to do anything from code to write essays. · The company that created the AI chatbot has a ... Jan 17, 2023 ... While many Americans were nursing hangovers on New Year's Day, 22-year-old Edward Tian was working feverishly on a new app to combat misuse ... ChatGPT is a language model created by OpenAI, an artificial intelligence research laboratory consisting of a team of researchers and engineers focused on ... 1 day ago ... Everyone is talking about ChatGPT, developed by OpenAI. This is such a great tool that has helped to make AI more accessible to a wider ...\u001b[0m\n",
"Thought:\u001b[32;1m\u001b[1;3m I now know the final answer\n",
"Final Answer: ChatGPT was developed by OpenAI.\u001b[0m\n",
"\n",
"\u001b[1m> Finished chain.\u001b[0m\n"
]
},
{
"data": {
"text/plain": [
"'ChatGPT was developed by OpenAI.'"
]
},
"execution_count": 12,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"agent_chain.run(input=\"Who developed it?\")"
]
},
{
"cell_type": "code",
"execution_count": 13,
"id": "5b1f9223",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"\n",
"\u001b[1m> Entering new AgentExecutor chain...\u001b[0m\n",
"\u001b[32;1m\u001b[1;3mThought: I need to simplify the conversation for a 5 year old.\n",
"Action: Summary\n",
"Action Input: My daughter 5 years old\u001b[0m\n",
"\n",
"\u001b[1m> Entering new LLMChain chain...\u001b[0m\n",
"Prompt after formatting:\n",
"\u001b[32;1m\u001b[1;3mThis is a conversation between a human and a bot:\n",
"\n",
"Human: What is ChatGPT?\n",
"AI: ChatGPT is an artificial intelligence chatbot developed by OpenAI and launched in November 2022. It is built on top of OpenAI's GPT-3 family of large language models and is optimized for dialogue by using Reinforcement Learning with Human-in-the-Loop. It is also capable of sending and receiving images during chatting.\n",
"Human: Who developed it?\n",
"AI: ChatGPT was developed by OpenAI.\n",
"\n",
"Write a summary of the conversation for My daughter 5 years old:\n",
"\u001b[0m\n",
"\n",
"\u001b[1m> Finished chain.\u001b[0m\n",
"\n",
"Observation: \u001b[33;1m\u001b[1;3m\n",
"The conversation was about ChatGPT, an artificial intelligence chatbot developed by OpenAI. It is designed to have conversations with humans and can also send and receive images.\u001b[0m\n",
"Thought:\u001b[32;1m\u001b[1;3m I now know the final answer.\n",
"Final Answer: ChatGPT is an artificial intelligence chatbot developed by OpenAI that can have conversations with humans and send and receive images.\u001b[0m\n",
"\n",
"\u001b[1m> Finished chain.\u001b[0m\n"
]
},
{
"data": {
"text/plain": [
"'ChatGPT is an artificial intelligence chatbot developed by OpenAI that can have conversations with humans and send and receive images.'"
]
},
"execution_count": 13,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"agent_chain.run(input=\"Thanks. Summarize the conversation, for my daughter 5 years old.\")"
]
},
{
"attachments": {},
"cell_type": "markdown",
"id": "d07415da",
"metadata": {},
"source": [
"The final answer is not wrong, but we see the 3rd Human input is actually from the agent in the memory because the memory was modified by the summary tool."
]
},
{
"cell_type": "code",
"execution_count": 14,
"id": "32f97b21",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Human: What is ChatGPT?\n",
"AI: ChatGPT is an artificial intelligence chatbot developed by OpenAI and launched in November 2022. It is built on top of OpenAI's GPT-3 family of large language models and is optimized for dialogue by using Reinforcement Learning with Human-in-the-Loop. It is also capable of sending and receiving images during chatting.\n",
"Human: Who developed it?\n",
"AI: ChatGPT was developed by OpenAI.\n",
"Human: My daughter 5 years old\n",
"AI: \n",
"The conversation was about ChatGPT, an artificial intelligence chatbot developed by OpenAI. It is designed to have conversations with humans and can also send and receive images.\n",
"Human: Thanks. Summarize the conversation, for my daughter 5 years old.\n",
"AI: ChatGPT is an artificial intelligence chatbot developed by OpenAI that can have conversations with humans and send and receive images.\n"
]
}
],
"source": [
"print(agent_chain.memory.buffer)"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.9"
}
},
"nbformat": 4,
"nbformat_minor": 5
}

View File

@@ -0,0 +1,253 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "f1390152",
"metadata": {},
"source": [
"# MRKL Chat\n",
"\n",
"This notebook showcases using an agent to replicate the MRKL chain using an agent optimized for chat models."
]
},
{
"cell_type": "markdown",
"id": "39ea3638",
"metadata": {},
"source": [
"This uses the example Chinook database.\n",
"To set it up follow the instructions on https://database.guide/2-sample-databases-sqlite/, placing the `.db` file in a notebooks folder at the root of this repository."
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "ac561cc4",
"metadata": {},
"outputs": [],
"source": [
"from langchain import OpenAI, LLMMathChain, SerpAPIWrapper, SQLDatabase, SQLDatabaseChain\n",
"from langchain.agents import initialize_agent, Tool\n",
"from langchain.chat_models import ChatOpenAI"
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "07e96d99",
"metadata": {},
"outputs": [],
"source": [
"llm = ChatOpenAI(temperature=0)\n",
"llm1 = OpenAI(temperature=0)\n",
"search = SerpAPIWrapper()\n",
"llm_math_chain = LLMMathChain(llm=llm1, verbose=True)\n",
"db = SQLDatabase.from_uri(\"sqlite:///../../../../notebooks/Chinook.db\")\n",
"db_chain = SQLDatabaseChain(llm=llm1, database=db, verbose=True)\n",
"tools = [\n",
" Tool(\n",
" name = \"Search\",\n",
" func=search.run,\n",
" description=\"useful for when you need to answer questions about current events. You should ask targeted questions\"\n",
" ),\n",
" Tool(\n",
" name=\"Calculator\",\n",
" func=llm_math_chain.run,\n",
" description=\"useful for when you need to answer questions about math\"\n",
" ),\n",
" Tool(\n",
" name=\"FooBar DB\",\n",
" func=db_chain.run,\n",
" description=\"useful for when you need to answer questions about FooBar. Input should be in the form of a question containing full context\"\n",
" )\n",
"]"
]
},
{
"cell_type": "code",
"execution_count": 3,
"id": "a069c4b6",
"metadata": {},
"outputs": [],
"source": [
"mrkl = initialize_agent(tools, llm, agent=\"chat-zero-shot-react-description\", verbose=True)"
]
},
{
"cell_type": "code",
"execution_count": 4,
"id": "e603cd7d",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"\n",
"\u001b[1m> Entering new AgentExecutor chain...\u001b[0m\n",
"\u001b[32;1m\u001b[1;3mThought: The first question requires a search, while the second question requires a calculator.\n",
"Action:\n",
"```\n",
"{\n",
" \"action\": \"Search\",\n",
" \"action_input\": \"Who is Leo DiCaprio's girlfriend?\"\n",
"}\n",
"```\n",
"\u001b[0m\n",
"Observation: \u001b[36;1m\u001b[1;3mCamila Morrone\u001b[0m\n",
"Thought:\u001b[32;1m\u001b[1;3mFor the second question, I need to use the calculator tool to raise her current age to the 0.43 power.\n",
"Action:\n",
"```\n",
"{\n",
" \"action\": \"Calculator\",\n",
" \"action_input\": \"22.0^(0.43)\"\n",
"}\n",
"```\n",
"\n",
"\u001b[0m\n",
"\n",
"\u001b[1m> Entering new LLMMathChain chain...\u001b[0m\n",
"22.0^(0.43)\u001b[32;1m\u001b[1;3m\n",
"```python\n",
"import math\n",
"print(math.pow(22.0, 0.43))\n",
"```\n",
"\u001b[0m\n",
"Answer: \u001b[33;1m\u001b[1;3m3.777824273683966\n",
"\u001b[0m\n",
"\u001b[1m> Finished chain.\u001b[0m\n",
"\n",
"Observation: \u001b[33;1m\u001b[1;3mAnswer: 3.777824273683966\n",
"\u001b[0m\n",
"Thought:\u001b[32;1m\u001b[1;3mI now know the final answer.\n",
"Final Answer: Camila Morrone, 3.777824273683966.\u001b[0m\n",
"\n",
"\u001b[1m> Finished chain.\u001b[0m\n"
]
},
{
"data": {
"text/plain": [
"'Camila Morrone, 3.777824273683966.'"
]
},
"execution_count": 4,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"mrkl.run(\"Who is Leo DiCaprio's girlfriend? What is her current age raised to the 0.43 power?\")"
]
},
{
"cell_type": "code",
"execution_count": 5,
"id": "a5c07010",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"\n",
"\u001b[1m> Entering new AgentExecutor chain...\u001b[0m\n",
"\u001b[32;1m\u001b[1;3mQuestion: What is the full name of the artist who recently released an album called 'The Storm Before the Calm' and are they in the FooBar database? If so, what albums of theirs are in the FooBar database?\n",
"Thought: I should use the Search tool to find the answer to the first part of the question and then use the FooBar DB tool to find the answer to the second part of the question.\n",
"Action:\n",
"```\n",
"{\n",
" \"action\": \"Search\",\n",
" \"action_input\": \"Who recently released an album called 'The Storm Before the Calm'\"\n",
"}\n",
"```\n",
"\u001b[0m\n",
"Observation: \u001b[36;1m\u001b[1;3mAlanis Morissette\u001b[0m\n",
"Thought:\u001b[32;1m\u001b[1;3mNow that I have the name of the artist, I can use the FooBar DB tool to find their albums in the database.\n",
"Action:\n",
"```\n",
"{\n",
" \"action\": \"FooBar DB\",\n",
" \"action_input\": \"What albums does Alanis Morissette have in the database?\"\n",
"}\n",
"```\n",
"\n",
"\u001b[0m\n",
"\n",
"\u001b[1m> Entering new SQLDatabaseChain chain...\u001b[0m\n",
"What albums does Alanis Morissette have in the database? \n",
"SQLQuery:"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"/Users/harrisonchase/workplace/langchain/langchain/sql_database.py:141: SAWarning: Dialect sqlite+pysqlite does *not* support Decimal objects natively, and SQLAlchemy must convert from floating point - rounding errors and other issues may occur. Please consider storing Decimal numbers as strings or integers on this platform for lossless storage.\n",
" sample_rows = connection.execute(command)\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"\u001b[32;1m\u001b[1;3m SELECT Title FROM Album WHERE ArtistId IN (SELECT ArtistId FROM Artist WHERE Name = 'Alanis Morissette') LIMIT 5;\u001b[0m\n",
"SQLResult: \u001b[33;1m\u001b[1;3m[('Jagged Little Pill',)]\u001b[0m\n",
"Answer:\u001b[32;1m\u001b[1;3m Alanis Morissette has the album 'Jagged Little Pill' in the database.\u001b[0m\n",
"\u001b[1m> Finished chain.\u001b[0m\n",
"\n",
"Observation: \u001b[38;5;200m\u001b[1;3m Alanis Morissette has the album 'Jagged Little Pill' in the database.\u001b[0m\n",
"Thought:\u001b[32;1m\u001b[1;3mI have found the answer to both parts of the question.\n",
"Final Answer: The artist who recently released an album called 'The Storm Before the Calm' is Alanis Morissette. The album 'Jagged Little Pill' is in the FooBar database.\u001b[0m\n",
"\n",
"\u001b[1m> Finished chain.\u001b[0m\n"
]
},
{
"data": {
"text/plain": [
"\"The artist who recently released an album called 'The Storm Before the Calm' is Alanis Morissette. The album 'Jagged Little Pill' is in the FooBar database.\""
]
},
"execution_count": 5,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"mrkl.run(\"What is the full name of the artist who recently released an album called 'The Storm Before the Calm' and are they in the FooBar database? If so, what albums of theirs are in the FooBar database?\")"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "af016a70",
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.1"
}
},
"nbformat": 4,
"nbformat_minor": 5
}

View File

@@ -24,11 +24,13 @@
"tools = [\n",
" Tool(\n",
" name=\"Search\",\n",
" func=docstore.search\n",
" func=docstore.search,\n",
" description=\"useful for when you need to ask with search\"\n",
" ),\n",
" Tool(\n",
" name=\"Lookup\",\n",
" func=docstore.lookup\n",
" func=docstore.lookup,\n",
" description=\"useful for when you need to ask with lookup\"\n",
" )\n",
"]\n",
"\n",

View File

@@ -52,7 +52,8 @@
"tools = [\n",
" Tool(\n",
" name=\"Intermediate Answer\",\n",
" func=search.run\n",
" func=search.run,\n",
" description=\"useful for when you need to ask with search\"\n",
" )\n",
"]\n",
"\n",

View File

@@ -13,3 +13,4 @@ For more detailed information on tools, and different types of tools in LangChai
Toolkits are groups of tools that are best used together.
They allow you to logically group and initialize a set of tools that share a particular resource (such as a database connection or json object).
They can be used to construct an agent for a specific use-case.
For more detailed information on toolkits and their use cases, see [this documentation](how_to_guides.rst#agent-toolkits) (the "Agent Toolkits" section).

View File

@@ -136,3 +136,19 @@ Below is a list of all supported tools and relevant information:
- Requires LLM: No
- Extra Parameters: `serper_api_key`
- For more information on this, see [this page](../../ecosystem/google_serper.md)
**wikipedia**
- Tool Name: Wikipedia
- Tool Description: A wrapper around Wikipedia. Useful for when you need to answer general questions about people, places, companies, historical events, or other subjects. Input should be a search query.
- Notes: Uses the [wikipedia](https://pypi.org/project/wikipedia/) Python package to call the MediaWiki API and then parses results.
- Requires LLM: No
- Extra Parameters: `top_k_results`
**podcast-api**
- Tool Name: Podcast API
- Tool Description: Use the Listen Notes Podcast API to search all podcasts or episodes. The input should be a question in natural language that this API can answer.
- Notes: A natural language connection to the Listen Notes Podcast API (`https://www.PodcastAPI.com`), specifically the `/search/` endpoint.
- Requires LLM: Yes
- Extra Parameters: `listen_api_key` (your api key to access this endpoint)

View File

@@ -149,6 +149,33 @@
"chain.run(\"Search for 'Avatar'\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Listen API Example"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import os\n",
"from langchain.llms import OpenAI\n",
"from langchain.chains.api import podcast_docs\n",
"from langchain.chains import APIChain\n",
"\n",
"# Get api key here: https://www.listennotes.com/api/pricing/\n",
"listen_api_key = 'xxx'\n",
"\n",
"llm = OpenAI(temperature=0)\n",
"headers = {\"X-ListenAPI-Key\": listen_api_key}\n",
"chain = APIChain.from_llm_and_api_docs(llm, podcast_docs.PODCAST_DOCS, headers=headers, verbose=True)\n",
"chain.run(\"Search for 'silicon valley bank' podcast episodes, audio length is more than 30 minutes, return only 1 results\")\n"
]
},
{
"cell_type": "code",
"execution_count": null,
@@ -173,7 +200,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.9"
"version": "3.9.1"
}
},
"nbformat": 4,

View File

@@ -377,18 +377,19 @@
"\tFOREIGN KEY(\"GenreId\") REFERENCES \"Genre\" (\"GenreId\"), \n",
"\tFOREIGN KEY(\"AlbumId\") REFERENCES \"Album\" (\"AlbumId\")\n",
")\n",
"\n",
"SELECT * FROM 'Track' LIMIT 2;\n",
"/*\n",
"2 rows from Track table:\n",
"TrackId\tName\tAlbumId\tMediaTypeId\tGenreId\tComposer\tMilliseconds\tBytes\tUnitPrice\n",
"1\tFor Those About To Rock (We Salute You)\t1\t1\t1\tAngus Young, Malcolm Young, Brian Johnson\t343719\t11170334\t0.99\n",
"2\tBalls to the Wall\t2\t2\t1\tNone\t342562\t5510424\t0.99\n"
"2\tBalls to the Wall\t2\t2\t1\tNone\t342562\t5510424\t0.99\n",
"*/\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"/home/jon/projects/langchain/langchain/sql_database.py:121: SAWarning: Dialect sqlite+pysqlite does *not* support Decimal objects natively, and SQLAlchemy must convert from floating point - rounding errors and other issues may occur. Please consider storing Decimal numbers as strings or integers on this platform for lossless storage.\n",
"/home/jon/projects/langchain/langchain/sql_database.py:135: SAWarning: Dialect sqlite+pysqlite does *not* support Decimal objects natively, and SQLAlchemy must convert from floating point - rounding errors and other issues may occur. Please consider storing Decimal numbers as strings or integers on this platform for lossless storage.\n",
" sample_rows = connection.execute(command)\n"
]
}
@@ -467,12 +468,13 @@
"\t\"Composer\" NVARCHAR(220),\n",
"\tPRIMARY KEY (\"TrackId\")\n",
")\n",
"\n",
"SELECT * FROM 'Track' LIMIT 3;\n",
"/*\n",
"3 rows from Track table:\n",
"TrackId\tName\tComposer\n",
"1\tFor Those About To Rock (We Salute You)\tAngus Young, Malcolm Young, Brian Johnson\n",
"2\tBalls to the Wall\tNone\n",
"3\tMy favorite song ever\tThe coolest composer of all time\"\"\"\n",
"3\tMy favorite song ever\tThe coolest composer of all time\n",
"*/\"\"\"\n",
"}"
]
},
@@ -492,11 +494,12 @@
"\t\"Name\" NVARCHAR(120), \n",
"\tPRIMARY KEY (\"PlaylistId\")\n",
")\n",
"\n",
"SELECT * FROM 'Playlist' LIMIT 2;\n",
"/*\n",
"2 rows from Playlist table:\n",
"PlaylistId\tName\n",
"1\tMusic\n",
"2\tMovies\n",
"*/\n",
"\n",
"CREATE TABLE Track (\n",
"\t\"TrackId\" INTEGER NOT NULL, \n",
@@ -504,12 +507,13 @@
"\t\"Composer\" NVARCHAR(220),\n",
"\tPRIMARY KEY (\"TrackId\")\n",
")\n",
"\n",
"SELECT * FROM 'Track' LIMIT 3;\n",
"/*\n",
"3 rows from Track table:\n",
"TrackId\tName\tComposer\n",
"1\tFor Those About To Rock (We Salute You)\tAngus Young, Malcolm Young, Brian Johnson\n",
"2\tBalls to the Wall\tNone\n",
"3\tMy favorite song ever\tThe coolest composer of all time\n"
"3\tMy favorite song ever\tThe coolest composer of all time\n",
"*/\n"
]
}
],
@@ -528,7 +532,7 @@
"id": "5fc6f507",
"metadata": {},
"source": [
"Note how our custom table definition and sample rows for `Track` overrides the `sample_rows_in_table_info` parameter. Tables that are not overriden by `custom_table_info`, in this example `Playlist`, will have their table info gathered automatically as usual."
"Note how our custom table definition and sample rows for `Track` overrides the `sample_rows_in_table_info` parameter. Tables that are not overridden by `custom_table_info`, in this example `Playlist`, will have their table info gathered automatically as usual."
]
},
{
@@ -675,7 +679,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.9"
"version": "3.9.1"
}
},
"nbformat": 4,

View File

@@ -9,3 +9,12 @@ This is a specific type of chain where multiple other chains are run in sequence
to the next. A subtype of this type of chain is the [`SimpleSequentialChain`](./generic/sequential_chains.html#simplesequentialchain), where all subchains have only one input and one output,
and the output of one is therefore used as sole input to the next chain.
## Prompt Selectors
One thing that we've noticed is that the best prompt to use is really dependent on the model you use.
Some prompts work really good with some models, but not great with others.
One of our goals is provide good chains that "just work" out of the box.
A big part of chains like that is having prompts that "just work".
So rather than having a default prompt for chains, we are moving towards a paradigm where if a prompt is not explicitly
provided we select one with a PromptSelector. This class takes in the model passed in, and returns a default prompt.
The inner workings of the PromptSelector can look at any aspect of the model - LLM vs ChatModel, OpenAI vs Cohere, GPT3 vs GPT4, etc.
Due to this being a newer feature, this may not be implemented for all chains, but this is the direction we are moving.

View File

@@ -9,7 +9,7 @@
"\n",
"This notebook goes over how to set up a chat model to chat with a vector database.\n",
"\n",
"This notebook is very similar to the example of using an LLM in the ChatVectorDBChain. The only differences here are (1) using a ChatModel, and (2) passing in a ChatPromptTemplate (optimized for chat models)."
"This notebook is very similar to the example of using an LLM in the ConversationalRetrievalChain. The only differences here are (1) using a ChatModel, and (2) passing in a ChatPromptTemplate (optimized for chat models)."
]
},
{
@@ -24,7 +24,7 @@
"from langchain.embeddings.openai import OpenAIEmbeddings\n",
"from langchain.vectorstores import Chroma\n",
"from langchain.text_splitter import CharacterTextSplitter\n",
"from langchain.chains import ChatVectorDBChain"
"from langchain.chains import ConversationalRetrievalChain"
]
},
{
@@ -157,7 +157,7 @@
"id": "3c96b118",
"metadata": {},
"source": [
"We now initialize the ChatVectorDBChain"
"We now initialize the ConversationalRetrievalChain"
]
},
{
@@ -169,7 +169,7 @@
},
"outputs": [],
"source": [
"qa = ChatVectorDBChain.from_llm(ChatOpenAI(temperature=0), vectorstore,qa_prompt=prompt)"
"qa = ConversationalRetrievalChain.from_llm(ChatOpenAI(temperature=0), vectorstore,qa_prompt=prompt)"
]
},
{
@@ -205,7 +205,7 @@
{
"data": {
"text/plain": [
"\"The President nominated Circuit Court of Appeals Judge Ketanji Brown Jackson to serve on the United States Supreme Court. He described her as one of the nation's top legal minds, a former top litigator in private practice, a former federal public defender, and a consensus builder. He also mentioned that she has received a broad range of support from the Fraternal Order of Police to former judges appointed by Democrats and Republicans.\""
"\"The President nominated Circuit Court of Appeals Judge Ketanji Brown Jackson to serve on the United States Supreme Court. He described her as one of the nation's top legal minds, a former top litigator in private practice, a former federal public defender, and a consensus builder. She has received a broad range of support from the Fraternal Order of Police to former judges appointed by Democrats and Republicans.\""
]
},
"execution_count": 9,
@@ -227,7 +227,7 @@
},
{
"cell_type": "code",
"execution_count": 12,
"execution_count": 10,
"id": "00b4cf00",
"metadata": {
"tags": []
@@ -241,7 +241,7 @@
},
{
"cell_type": "code",
"execution_count": 13,
"execution_count": 11,
"id": "f01828d1",
"metadata": {
"tags": []
@@ -250,10 +250,10 @@
{
"data": {
"text/plain": [
"'The context does not provide information about the predecessor of Ketanji Brown Jackson.'"
"\"The President mentioned Circuit Court of Appeals Judge Ketanji Brown Jackson as the nominee for the United States Supreme Court. He described her as one of the nation's top legal minds who will continue Justice Breyer's legacy of excellence. The President did not mention any specific sources of support for Judge Jackson, but he did note that advancing immigration reform is supported by everyone from labor unions to religious leaders to the U.S. Chamber of Commerce.\""
]
},
"execution_count": 13,
"execution_count": 11,
"metadata": {},
"output_type": "execute_result"
}
@@ -267,14 +267,14 @@
"id": "2324cdc6-98bf-4708-b8cd-02a98b1e5b67",
"metadata": {},
"source": [
"## Chat Vector DB with streaming to `stdout`\n",
"## ConversationalRetrievalChain with streaming to `stdout`\n",
"\n",
"Output from the chain will be streamed to `stdout` token by token in this example."
]
},
{
"cell_type": "code",
"execution_count": 15,
"execution_count": 12,
"id": "2efacec3-2690-4b05-8de3-a32fd2ac3911",
"metadata": {
"tags": []
@@ -285,7 +285,7 @@
"from langchain.llms import OpenAI\n",
"from langchain.callbacks.base import CallbackManager\n",
"from langchain.callbacks.streaming_stdout import StreamingStdOutCallbackHandler\n",
"from langchain.chains.chat_vector_db.prompts import CONDENSE_QUESTION_PROMPT\n",
"from langchain.chains.chat_index.prompts import CONDENSE_QUESTION_PROMPT\n",
"from langchain.chains.question_answering import load_qa_chain\n",
"\n",
"# Construct a ChatVectorDBChain with a streaming llm for combine docs\n",
@@ -296,12 +296,12 @@
"question_generator = LLMChain(llm=llm, prompt=CONDENSE_QUESTION_PROMPT)\n",
"doc_chain = load_qa_chain(streaming_llm, chain_type=\"stuff\", prompt=prompt)\n",
"\n",
"qa = ChatVectorDBChain(vectorstore=vectorstore, combine_docs_chain=doc_chain, question_generator=question_generator)"
"qa = ConversationalRetrievalChain(retriever=vectorstore.as_retriever(), combine_docs_chain=doc_chain, question_generator=question_generator)\n"
]
},
{
"cell_type": "code",
"execution_count": 16,
"execution_count": 13,
"id": "fd6d43f4-7428-44a4-81bc-26fe88a98762",
"metadata": {
"tags": []
@@ -323,7 +323,7 @@
},
{
"cell_type": "code",
"execution_count": 17,
"execution_count": 14,
"id": "5ab38978-f3e8-4fa7-808c-c79dec48379a",
"metadata": {
"tags": []

View File

@@ -0,0 +1,188 @@
{
"cells": [
{
"attachments": {},
"cell_type": "markdown",
"id": "959300d4",
"metadata": {},
"source": [
"# PromptLayer ChatOpenAI\n",
"\n",
"This example showcases how to connect to [PromptLayer](https://www.promptlayer.com) to start recording your ChatOpenAI requests."
]
},
{
"attachments": {},
"cell_type": "markdown",
"id": "6a45943e",
"metadata": {},
"source": [
"## Install PromptLayer\n",
"The `promptlayer` package is required to use PromptLayer with OpenAI. Install `promptlayer` using pip."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "dbe09bd8",
"metadata": {
"vscode": {
"languageId": "powershell"
}
},
"outputs": [],
"source": [
"pip install promptlayer"
]
},
{
"cell_type": "markdown",
"id": "536c1dfa",
"metadata": {},
"source": [
"## Imports"
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "c16da3b5",
"metadata": {},
"outputs": [],
"source": [
"import os\n",
"from langchain.chat_models import PromptLayerChatOpenAI\n",
"from langchain.schema import HumanMessage"
]
},
{
"attachments": {},
"cell_type": "markdown",
"id": "8564ce7d",
"metadata": {},
"source": [
"## Set the Environment API Key\n",
"You can create a PromptLayer API Key at [wwww.promptlayer.com](https://ww.promptlayer.com) by clicking the settings cog in the navbar.\n",
"\n",
"Set it as an environment variable called `PROMPTLAYER_API_KEY`."
]
},
{
"cell_type": "code",
"execution_count": 5,
"id": "46ba25dc",
"metadata": {},
"outputs": [],
"source": [
"os.environ[\"PROMPTLAYER_API_KEY\"] = \"**********\""
]
},
{
"attachments": {},
"cell_type": "markdown",
"id": "bf0294de",
"metadata": {},
"source": [
"## Use the PromptLayerOpenAI LLM like normal\n",
"*You can optionally pass in `pl_tags` to track your requests with PromptLayer's tagging feature.*"
]
},
{
"cell_type": "code",
"execution_count": 4,
"id": "3acf0069",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"AIMessage(content='to take a nap in a cozy spot. I search around for a suitable place and finally settle on a soft cushion on the window sill. I curl up into a ball and close my eyes, relishing the warmth of the sun on my fur. As I drift off to sleep, I can hear the birds chirping outside and feel the gentle breeze blowing through the window. This is the life of a contented cat.', additional_kwargs={})"
]
},
"execution_count": 4,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"chat = PromptLayerChatOpenAI(pl_tags=[\"langchain\"])\n",
"chat([HumanMessage(content=\"I am a cat and I want\")])"
]
},
{
"attachments": {},
"cell_type": "markdown",
"id": "a2d76826",
"metadata": {},
"source": [
"**The above request should now appear on your [PromptLayer dashboard](https://ww.promptlayer.com).**"
]
},
{
"cell_type": "markdown",
"id": "05e9e2fe",
"metadata": {},
"source": []
},
{
"attachments": {},
"cell_type": "markdown",
"id": "c43803d1",
"metadata": {},
"source": [
"## Using PromptLayer Track\n",
"If you would like to use any of the [PromptLayer tracking features](https://magniv.notion.site/Track-4deee1b1f7a34c1680d085f82567dab9), you need to pass the argument `return_pl_id` when instantializing the PromptLayer LLM to get the request id. "
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "b7d4db01",
"metadata": {},
"outputs": [],
"source": [
"chat = PromptLayerChatOpenAI(return_pl_id=True)\n",
"chat_results = chat.generate([[HumanMessage(content=\"I am a cat and I want\")]])\n",
"\n",
"for res in chat_results.generations:\n",
" pl_request_id = res[0].generation_info[\"pl_request_id\"]\n",
" promptlayer.track.score(request_id=pl_request_id, score=100)"
]
},
{
"cell_type": "markdown",
"id": "13e56507",
"metadata": {},
"source": [
"Using this allows you to track the performance of your model in the PromptLayer dashboard. If you are using a prompt template, you can attach a template to a request as well.\n",
"Overall, this gives you the opportunity to track the performance of different templates and models in the PromptLayer dashboard."
]
}
],
"metadata": {
"kernelspec": {
"display_name": "base",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.8.8 (default, Apr 13 2021, 12:59:45) \n[Clang 10.0.0 ]"
},
"vscode": {
"interpreter": {
"hash": "8a5edab282632443219e051e4ade2d1d5bbc671c781051bf1437897cbdfea0f1"
}
}
},
"nbformat": 4,
"nbformat_minor": 5
}

View File

@@ -5,16 +5,16 @@
"id": "07c1e3b9",
"metadata": {},
"source": [
"# Vector DB Question/Answering\n",
"# Retrieval Question/Answering\n",
"\n",
"This example showcases using a chat model to do question answering over a vector database.\n",
"\n",
"This notebook is very similar to the example of using an LLM in the ChatVectorDBChain. The only differences here are (1) using a ChatModel, and (2) passing in a ChatPromptTemplate (optimized for chat models)."
"This notebook is very similar to the example of using an LLM in the RetrievalQA. The only differences here are (1) using a ChatModel, and (2) passing in a ChatPromptTemplate (optimized for chat models)."
]
},
{
"cell_type": "code",
"execution_count": 1,
"execution_count": 7,
"id": "82525493",
"metadata": {},
"outputs": [],
@@ -22,7 +22,7 @@
"from langchain.embeddings.openai import OpenAIEmbeddings\n",
"from langchain.vectorstores import Chroma\n",
"from langchain.text_splitter import CharacterTextSplitter\n",
"from langchain.chains import VectorDBQA"
"from langchain.chains import RetrievalQA"
]
},
{
@@ -100,28 +100,28 @@
},
{
"cell_type": "code",
"execution_count": 6,
"execution_count": 8,
"id": "3018f865",
"metadata": {},
"outputs": [],
"source": [
"chain_type_kwargs = {\"prompt\": prompt}\n",
"qa = VectorDBQA.from_chain_type(llm=ChatOpenAI(), chain_type=\"stuff\", vectorstore=docsearch, chain_type_kwargs=chain_type_kwargs)"
"qa = RetrievalQA.from_chain_type(llm=ChatOpenAI(), chain_type=\"stuff\", retriever=docsearch.as_retriever(), chain_type_kwargs=chain_type_kwargs)\n"
]
},
{
"cell_type": "code",
"execution_count": 7,
"execution_count": 9,
"id": "032a47f8",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"\"The President nominated Ketanji Brown Jackson as a Judge for the United States Supreme Court. He described her as one of the nation's top legal minds and a former top litigator in private practice, a former federal public defender, and a consensus builder.\""
"\"The president nominated Ketanji Brown Jackson to serve on the United States Supreme Court. He referred to her as one of our nation's top legal minds, a former federal public defender, a consensus builder, and from a family of public school educators and police officers. Since she's been nominated, she has received a broad range of support from the Fraternal Order of Police to former judges appointed by Democrats and Republicans.\""
]
},
"execution_count": 7,
"execution_count": 9,
"metadata": {},
"output_type": "execute_result"
}

View File

@@ -5,11 +5,11 @@
"id": "efc5be67",
"metadata": {},
"source": [
"# VectorDB Question Answering with Sources\n",
"# Retrieval Question Answering with Sources\n",
"\n",
"This notebook goes over how to do question-answering with sources with a chat model over a vector database. It does this by using the `VectorDBQAWithSourcesChain`, which does the lookup of the documents from a vector database. \n",
"This notebook goes over how to do question-answering with sources with a chat model over a vector database. It does this by using the `RetrievalQAWithSourcesChain`, which does the lookup of the documents from a vector database. \n",
"\n",
"This notebook is very similar to the example of using an LLM in the ChatVectorDBChain. The only differences here are (1) using a ChatModel, and (2) passing in a ChatPromptTemplate (optimized for chat models)."
"This notebook is very similar to the example of using an LLM in the RetrievalQAWithSources. The only differences here are (1) using a ChatModel, and (2) passing in a ChatPromptTemplate (optimized for chat models)."
]
},
{
@@ -51,6 +51,8 @@
"name": "stdout",
"output_type": "stream",
"text": [
"Running Chroma using direct local API.\n",
"Using DuckDB in-memory for database. Data will be transient.\n",
"Running Chroma using direct local API.\n",
"Using DuckDB in-memory for database. Data will be transient.\n"
]
@@ -62,12 +64,12 @@
},
{
"cell_type": "code",
"execution_count": 4,
"execution_count": 9,
"id": "8aa571ae",
"metadata": {},
"outputs": [],
"source": [
"from langchain.chains import VectorDBQAWithSourcesChain"
"from langchain.chains import RetrievalQAWithSourcesChain"
]
},
{
@@ -101,7 +103,7 @@
},
{
"cell_type": "code",
"execution_count": 17,
"execution_count": 6,
"id": "ed00e906",
"metadata": {},
"outputs": [],
@@ -130,23 +132,23 @@
},
{
"cell_type": "code",
"execution_count": 18,
"execution_count": 10,
"id": "aa859d4c",
"metadata": {},
"outputs": [],
"source": [
"chain_type_kwargs = {\"prompt\": prompt}\n",
"chain = VectorDBQAWithSourcesChain.from_chain_type(\n",
"chain = RetrievalQAWithSourcesChain.from_chain_type(\n",
" ChatOpenAI(temperature=0), \n",
" chain_type=\"stuff\", \n",
" vectorstore=docsearch,\n",
" retriever=docsearch.as_retriever(),\n",
" chain_type_kwargs=chain_type_kwargs\n",
")"
]
},
{
"cell_type": "code",
"execution_count": 19,
"execution_count": 11,
"id": "8ba36fa7",
"metadata": {},
"outputs": [
@@ -154,10 +156,10 @@
"data": {
"text/plain": [
"{'answer': 'The President honored Justice Stephen Breyer, an Army veteran, Constitutional scholar, and retiring Justice of the United States Supreme Court, for his dedicated service to the country. \\n',\n",
" 'sources': '30-pl'}"
" 'sources': '31-pl'}"
]
},
"execution_count": 19,
"execution_count": 11,
"metadata": {},
"output_type": "execute_result"
}
@@ -168,25 +170,11 @@
},
{
"cell_type": "code",
"execution_count": 11,
"id": "c91fdc8a",
"execution_count": null,
"id": "8308fbf7",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"{'answer': ' The president honored Justice Stephen Breyer for his service.\\n',\n",
" 'sources': '30-pl'}"
]
},
"execution_count": 11,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"qa({\"question\": \"What did the president say about Justice Breyer\"}, return_only_outputs=True)"
]
"outputs": [],
"source": []
}
],
"metadata": {

View File

@@ -12,7 +12,7 @@
},
{
"cell_type": "code",
"execution_count": 2,
"execution_count": 1,
"id": "522686de",
"metadata": {
"tags": []
@@ -36,7 +36,7 @@
},
{
"cell_type": "code",
"execution_count": 3,
"execution_count": 2,
"id": "62e0dbc3",
"metadata": {
"tags": []
@@ -56,7 +56,7 @@
},
{
"cell_type": "code",
"execution_count": 4,
"execution_count": 3,
"id": "76a6e7b0-e927-4bfb-a414-1332a4149106",
"metadata": {
"tags": []
@@ -68,7 +68,7 @@
"AIMessage(content=\"J'aime programmer.\", additional_kwargs={})"
]
},
"execution_count": 4,
"execution_count": 3,
"metadata": {},
"output_type": "execute_result"
}
@@ -87,7 +87,7 @@
},
{
"cell_type": "code",
"execution_count": 5,
"execution_count": 4,
"id": "ce16ad78-8e6f-48cd-954e-98be75eb5836",
"metadata": {
"tags": []
@@ -99,7 +99,7 @@
"AIMessage(content=\"J'aime programmer.\", additional_kwargs={})"
]
},
"execution_count": 5,
"execution_count": 4,
"metadata": {},
"output_type": "execute_result"
}
@@ -122,7 +122,7 @@
},
{
"cell_type": "code",
"execution_count": 6,
"execution_count": 5,
"id": "2b21fc52-74b6-4950-ab78-45d12c68fb4d",
"metadata": {
"tags": []
@@ -131,10 +131,10 @@
{
"data": {
"text/plain": [
"LLMResult(generations=[[ChatGeneration(text=\"J'aime programmer.\", generation_info=None, message=AIMessage(content=\"J'aime programmer.\", additional_kwargs={}))], [ChatGeneration(text=\"J'aime l'intelligence artificielle.\", generation_info=None, message=AIMessage(content=\"J'aime l'intelligence artificielle.\", additional_kwargs={}))]], llm_output=None)"
"LLMResult(generations=[[ChatGeneration(text=\"J'aime programmer.\", generation_info=None, message=AIMessage(content=\"J'aime programmer.\", additional_kwargs={}))], [ChatGeneration(text=\"J'aime l'intelligence artificielle.\", generation_info=None, message=AIMessage(content=\"J'aime l'intelligence artificielle.\", additional_kwargs={}))]], llm_output={'token_usage': {'prompt_tokens': 71, 'completion_tokens': 18, 'total_tokens': 89}})"
]
},
"execution_count": 6,
"execution_count": 5,
"metadata": {},
"output_type": "execute_result"
}
@@ -150,7 +150,39 @@
" HumanMessage(content=\"Translate this sentence from English to French. I love artificial intelligence.\")\n",
" ],\n",
"]\n",
"chat.generate(batch_messages)"
"result = chat.generate(batch_messages)\n",
"result"
]
},
{
"cell_type": "markdown",
"id": "2960f50f",
"metadata": {},
"source": [
"You can recover things like token usage from this LLMResult"
]
},
{
"cell_type": "code",
"execution_count": 6,
"id": "a6186bee",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"{'token_usage': {'prompt_tokens': 71,\n",
" 'completion_tokens': 18,\n",
" 'total_tokens': 89}}"
]
},
"execution_count": 6,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"result.llm_output"
]
},
{

View File

@@ -0,0 +1,38 @@
{
"cells": [
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"# Blackboard\n",
"\n",
"This covers how to load data from a Blackboard Learn instance."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from langchain.document_loaders import BlackboardLoader\n",
"\n",
"loader = BlackboardLoader(\n",
" blackboard_course_url=\"https://blackboard.example.com/webapps/blackboard/execute/announcement?method=search&context=course_entry&course_id=_123456_1\",\n",
" bbrouter=\"expires:12345...\",\n",
" load_all_recursively=True,\n",
")\n",
"documents = loader.load()"
]
}
],
"metadata": {
"language_info": {
"name": "python"
},
"orig_nbformat": 4
},
"nbformat": 4,
"nbformat_minor": 2
}

File diff suppressed because one or more lines are too long

View File

@@ -1,5 +1,8 @@
<!DOCTYPE html>
<html>
<head>
<title>Test Title</title>
</head>
<body>
<h1>My First Heading</h1>

View File

@@ -0,0 +1,32 @@
"Team", "Payroll (millions)", "Wins"
"Nationals", 81.34, 98
"Reds", 82.20, 97
"Yankees", 197.96, 95
"Giants", 117.62, 94
"Braves", 83.31, 94
"Athletics", 55.37, 94
"Rangers", 120.51, 93
"Orioles", 81.43, 93
"Rays", 64.17, 90
"Angels", 154.49, 89
"Tigers", 132.30, 88
"Cardinals", 110.30, 88
"Dodgers", 95.14, 86
"White Sox", 96.92, 85
"Brewers", 97.65, 83
"Phillies", 174.54, 81
"Diamondbacks", 74.28, 81
"Pirates", 63.43, 79
"Padres", 55.24, 76
"Mariners", 81.97, 75
"Mets", 93.35, 74
"Blue Jays", 75.48, 73
"Royals", 60.91, 72
"Marlins", 118.07, 69
"Red Sox", 173.18, 69
"Indians", 78.43, 68
"Twins", 94.08, 66
"Rockies", 78.06, 64
"Cubs", 88.19, 61
"Astros", 60.65, 55
1 Team Payroll (millions) Wins
2 Nationals 81.34 98
3 Reds 82.20 97
4 Yankees 197.96 95
5 Giants 117.62 94
6 Braves 83.31 94
7 Athletics 55.37 94
8 Rangers 120.51 93
9 Orioles 81.43 93
10 Rays 64.17 90
11 Angels 154.49 89
12 Tigers 132.30 88
13 Cardinals 110.30 88
14 Dodgers 95.14 86
15 White Sox 96.92 85
16 Brewers 97.65 83
17 Phillies 174.54 81
18 Diamondbacks 74.28 81
19 Pirates 63.43 79
20 Padres 55.24 76
21 Mariners 81.97 75
22 Mets 93.35 74
23 Blue Jays 75.48 73
24 Royals 60.91 72
25 Marlins 118.07 69
26 Red Sox 173.18 69
27 Indians 78.43 68
28 Twins 94.08 66
29 Rockies 78.06 64
30 Cubs 88.19 61
31 Astros 60.65 55

View File

@@ -0,0 +1,79 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "33205b12",
"metadata": {},
"source": [
"# Figma\n",
"\n",
"This notebook covers how to load data from the Figma REST API into a format that can be ingested into LangChain."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "90b69c94",
"metadata": {},
"outputs": [],
"source": [
"import os\n",
"\n",
"from langchain.document_loaders import FigmaFileLoader"
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "13deb0f5",
"metadata": {},
"outputs": [],
"source": [
"loader = FigmaFileLoader(\n",
" os.environ.get('ACCESS_TOKEN'),\n",
" os.environ.get('NODE_IDS'),\n",
" os.environ.get('FILE_KEY')\n",
")"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "9ccc1e2f",
"metadata": {},
"outputs": [],
"source": [
"loader.load()"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "3e64cac2",
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.1"
}
},
"nbformat": 4,
"nbformat_minor": 5
}

View File

@@ -48,9 +48,7 @@
"outputs": [
{
"data": {
"text/plain": [
"[Document(page_content='My First Heading\\n\\nMy first paragraph.', lookup_str='', metadata={'source': 'example_data/fake-content.html'}, lookup_index=0)]"
]
"text/plain": "[Document(page_content='My First Heading\\n\\nMy first paragraph.', lookup_str='', metadata={'source': 'example_data/fake-content.html'}, lookup_index=0)]"
},
"execution_count": 4,
"metadata": {},
@@ -61,13 +59,57 @@
"data"
]
},
{
"cell_type": "markdown",
"source": [
"## Loading HTML with BeautifulSoup4\n",
"\n",
"We can also use BeautifulSoup4 to load HTML documents using the `BSHTMLLoader`. This will extract the text from the html into `page_content`, and the page title as `title` into `metadata`."
],
"metadata": {
"collapsed": false
}
},
{
"cell_type": "code",
"execution_count": null,
"execution_count": 16,
"id": "79b1bce4",
"metadata": {},
"outputs": [],
"source": []
"source": [
"from langchain.document_loaders import BSHTMLLoader"
]
},
{
"cell_type": "code",
"execution_count": 17,
"outputs": [
{
"data": {
"text/plain": "[Document(page_content='\\n\\nTest Title\\n\\n\\nMy First Heading\\nMy first paragraph.\\n\\n\\n', lookup_str='', metadata={'source': 'example_data/fake-content.html', 'title': 'Test Title'}, lookup_index=0)]"
},
"execution_count": 17,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"loader = BSHTMLLoader(\"example_data/fake-content.html\")\n",
"data = loader.load()\n",
"data"
],
"metadata": {
"collapsed": false
}
},
{
"cell_type": "code",
"execution_count": null,
"outputs": [],
"source": [],
"metadata": {
"collapsed": false
}
}
],
"metadata": {

View File

@@ -0,0 +1,145 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "39af9ecd",
"metadata": {},
"source": [
"# Markdown\n",
"\n",
"This covers how to load markdown documents into a document format that we can use downstream."
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "721c48aa",
"metadata": {},
"outputs": [],
"source": [
"from langchain.document_loaders import UnstructuredMarkdownLoader"
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "9d3d0e35",
"metadata": {},
"outputs": [],
"source": [
"loader = UnstructuredMarkdownLoader(\"../../../../README.md\")"
]
},
{
"cell_type": "code",
"execution_count": 3,
"id": "06073f91",
"metadata": {},
"outputs": [],
"source": [
"data = loader.load()"
]
},
{
"cell_type": "code",
"execution_count": 4,
"id": "c9adc5cb",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"[Document(page_content=\"ð\\x9f¦\\x9cï¸\\x8fð\\x9f”\\x97 LangChain\\n\\nâ\\x9a¡ Building applications with LLMs through composability â\\x9a¡\\n\\nProduction Support: As you move your LangChains into production, we'd love to offer more comprehensive support.\\nPlease fill out this form and we'll set up a dedicated support Slack channel.\\n\\nQuick Install\\n\\npip install langchain\\n\\nð\\x9f¤” What is this?\\n\\nLarge language models (LLMs) are emerging as a transformative technology, enabling\\ndevelopers to build applications that they previously could not.\\nBut using these LLMs in isolation is often not enough to\\ncreate a truly powerful app - the real power comes when you can combine them with other sources of computation or knowledge.\\n\\nThis library is aimed at assisting in the development of those types of applications. Common examples of these types of applications include:\\n\\nâ\\x9d“ Question Answering over specific documents\\n\\nDocumentation\\n\\nEnd-to-end Example: Question Answering over Notion Database\\n\\nð\\x9f¬ Chatbots\\n\\nDocumentation\\n\\nEnd-to-end Example: Chat-LangChain\\n\\nð\\x9f¤\\x96 Agents\\n\\nDocumentation\\n\\nEnd-to-end Example: GPT+WolframAlpha\\n\\nð\\x9f“\\x96 Documentation\\n\\nPlease see here for full documentation on:\\n\\nGetting started (installation, setting up the environment, simple examples)\\n\\nHow-To examples (demos, integrations, helper functions)\\n\\nReference (full API docs)\\n Resources (high-level explanation of core concepts)\\n\\nð\\x9f\\x9a\\x80 What can this help with?\\n\\nThere are six main areas that LangChain is designed to help with.\\nThese are, in increasing order of complexity:\\n\\nð\\x9f“\\x83 LLMs and Prompts:\\n\\nThis includes prompt management, prompt optimization, generic interface for all LLMs, and common utilities for working with LLMs.\\n\\nð\\x9f”\\x97 Chains:\\n\\nChains go beyond just a single LLM call, and are sequences of calls (whether to an LLM or a different utility). LangChain provides a standard interface for chains, lots of integrations with other tools, and end-to-end chains for common applications.\\n\\nð\\x9f“\\x9a Data Augmented Generation:\\n\\nData Augmented Generation involves specific types of chains that first interact with an external datasource to fetch data to use in the generation step. Examples of this include summarization of long pieces of text and question/answering over specific data sources.\\n\\nð\\x9f¤\\x96 Agents:\\n\\nAgents involve an LLM making decisions about which Actions to take, taking that Action, seeing an Observation, and repeating that until done. LangChain provides a standard interface for agents, a selection of agents to choose from, and examples of end to end agents.\\n\\nð\\x9f§\\xa0 Memory:\\n\\nMemory is the concept of persisting state between calls of a chain/agent. LangChain provides a standard interface for memory, a collection of memory implementations, and examples of chains/agents that use memory.\\n\\nð\\x9f§\\x90 Evaluation:\\n\\n[BETA] Generative models are notoriously hard to evaluate with traditional metrics. One new way of evaluating them is using language models themselves to do the evaluation. LangChain provides some prompts/chains for assisting in this.\\n\\nFor more information on these concepts, please see our full documentation.\\n\\nð\\x9f\\x81 Contributing\\n\\nAs an open source project in a rapidly developing field, we are extremely open to contributions, whether it be in the form of a new feature, improved infra, or better documentation.\\n\\nFor detailed information on how to contribute, see here.\", lookup_str='', metadata={'source': '../../../../README.md'}, lookup_index=0)]"
]
},
"execution_count": 4,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"data"
]
},
{
"cell_type": "markdown",
"id": "525d6b67",
"metadata": {},
"source": [
"## Retain Elements\n",
"\n",
"Under the hood, Unstructured creates different \"elements\" for different chunks of text. By default we combine those together, but you can easily keep that separation by specifying `mode=\"elements\"`."
]
},
{
"cell_type": "code",
"execution_count": 5,
"id": "064f9162",
"metadata": {},
"outputs": [],
"source": [
"loader = UnstructuredMarkdownLoader(\"../../../../README.md\", mode=\"elements\")"
]
},
{
"cell_type": "code",
"execution_count": 6,
"id": "abefbbdb",
"metadata": {},
"outputs": [],
"source": [
"data = loader.load()"
]
},
{
"cell_type": "code",
"execution_count": 7,
"id": "a547c534",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"Document(page_content='ð\\x9f¦\\x9cï¸\\x8fð\\x9f”\\x97 LangChain', lookup_str='', metadata={'source': '../../../../README.md', 'page_number': 1, 'category': 'UncategorizedText'}, lookup_index=0)"
]
},
"execution_count": 7,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"data[0]"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "381d4139",
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.8.13"
}
},
"nbformat": 4,
"nbformat_minor": 5
}

View File

@@ -1,145 +0,0 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "34c90eed",
"metadata": {},
"source": [
"# Microsoft Word\n",
"\n",
"This notebook shows how to load text from Microsoft word documents."
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "28ded768",
"metadata": {},
"outputs": [],
"source": [
"from langchain.document_loaders import UnstructuredDocxLoader"
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "f1f26035",
"metadata": {},
"outputs": [],
"source": [
"loader = UnstructuredDocxLoader('example_data/fake.docx')"
]
},
{
"cell_type": "code",
"execution_count": 3,
"id": "2c87dde9",
"metadata": {},
"outputs": [],
"source": [
"data = loader.load()"
]
},
{
"cell_type": "code",
"execution_count": 4,
"id": "0e4a884c",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"[Document(page_content='Lorem ipsum dolor sit amet.', lookup_str='', metadata={'source': 'example_data/fake.docx'}, lookup_index=0)]"
]
},
"execution_count": 4,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"data"
]
},
{
"cell_type": "markdown",
"id": "5d1472e9",
"metadata": {},
"source": [
"## Retain Elements\n",
"\n",
"Under the hood, Unstructured creates different \"elements\" for different chunks of text. By default we combine those together, but you can easily keep that separation by specifying `mode=\"elements\"`."
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "93abf60b",
"metadata": {},
"outputs": [],
"source": [
"loader = UnstructuredDocxLoader('example_data/fake.docx', mode=\"elements\")"
]
},
{
"cell_type": "code",
"execution_count": 3,
"id": "c35cdbcc",
"metadata": {},
"outputs": [],
"source": [
"data = loader.load()"
]
},
{
"cell_type": "code",
"execution_count": 4,
"id": "fae2d730",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"[Document(page_content='Lorem ipsum dolor sit amet.', lookup_str='', metadata={'source': 'example_data/fake.docx'}, lookup_index=0)]"
]
},
"execution_count": 4,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"data"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "961a7b1d",
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.1"
}
},
"nbformat": 4,
"nbformat_minor": 5
}

File diff suppressed because one or more lines are too long

File diff suppressed because one or more lines are too long

View File

@@ -158,7 +158,72 @@
},
{
"cell_type": "markdown",
"id": "7874d01d",
"id": "672733fd",
"metadata": {},
"source": [
"## Define a Partitioning Strategy\n",
"\n",
"Unstructured document loader allow users to pass in a `strategy` parameter that lets `unstructured` know how to partitioning the document. Currently supported strategies are `\"hi_res\"` (the default) and `\"fast\"`. Hi res partitioning strategies are more accurate, but take longer to process. Fast strategies partition the document more quickly, but trade-off accuracy. Not all document types have separate hi res and fast partitioning strategies. For those document types, the `strategy` kwarg is ignored. In some cases, the high res strategy will fallback to fast if there is a dependency missing (i.e. a model for document partitioning). You can see how to apply a strategy to an `UnstructuredFileLoader` below."
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "767238a4",
"metadata": {},
"outputs": [],
"source": [
"from langchain.document_loaders import UnstructuredFileLoader"
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "9518b425",
"metadata": {},
"outputs": [],
"source": [
"loader = UnstructuredFileLoader(\"layout-parser-paper-fast.pdf\", strategy=\"fast\", mode=\"elements\")"
]
},
{
"cell_type": "code",
"execution_count": 3,
"id": "645f29e9",
"metadata": {},
"outputs": [],
"source": [
"docs = loader.load()"
]
},
{
"cell_type": "code",
"execution_count": 4,
"id": "60685353",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"[Document(page_content='1', lookup_str='', metadata={'source': 'layout-parser-paper-fast.pdf', 'filename': 'layout-parser-paper-fast.pdf', 'page_number': 1, 'category': 'UncategorizedText'}, lookup_index=0),\n",
" Document(page_content='2', lookup_str='', metadata={'source': 'layout-parser-paper-fast.pdf', 'filename': 'layout-parser-paper-fast.pdf', 'page_number': 1, 'category': 'UncategorizedText'}, lookup_index=0),\n",
" Document(page_content='0', lookup_str='', metadata={'source': 'layout-parser-paper-fast.pdf', 'filename': 'layout-parser-paper-fast.pdf', 'page_number': 1, 'category': 'UncategorizedText'}, lookup_index=0),\n",
" Document(page_content='2', lookup_str='', metadata={'source': 'layout-parser-paper-fast.pdf', 'filename': 'layout-parser-paper-fast.pdf', 'page_number': 1, 'category': 'UncategorizedText'}, lookup_index=0),\n",
" Document(page_content='n', lookup_str='', metadata={'source': 'layout-parser-paper-fast.pdf', 'filename': 'layout-parser-paper-fast.pdf', 'page_number': 1, 'category': 'Title'}, lookup_index=0)]"
]
},
"execution_count": 4,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"docs[:5]"
]
},
{
"cell_type": "markdown",
"id": "8de9ef16",
"metadata": {},
"source": [
"## PDF Example\n",
@@ -166,7 +231,6 @@
"Processing PDF documents works exactly the same way. Unstructured detects the file type and extracts the same types of `elements`. "
]
},
{
"cell_type": "code",
"execution_count": 1,
@@ -225,7 +289,7 @@
{
"cell_type": "code",
"execution_count": null,
"id": "8ca8a648",
"id": "f52b04cb",
"metadata": {},
"outputs": [],
"source": []
@@ -247,7 +311,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.1"
"version": "3.8.13"
}
},
"nbformat": 4,

View File

@@ -27,7 +27,7 @@
"metadata": {},
"outputs": [],
"source": [
"loader = UnstructuredWordDocumentLoader(\"fake.docx\")"
"loader = UnstructuredWordDocumentLoader(\"example_data/fake.docx\")"
]
},
{
@@ -78,7 +78,7 @@
"metadata": {},
"outputs": [],
"source": [
"loader = UnstructuredWordDocumentLoader(\"fake.docx\", mode=\"elements\")"
"loader = UnstructuredWordDocumentLoader(\"example_data/fake.docx\", mode=\"elements\")"
]
},
{

View File

@@ -7,22 +7,23 @@
"source": [
"# YouTube\n",
"\n",
"How to load documents from YouTube transcripts."
"How to load documents from YouTube transcripts.\n",
"\n"
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "da4a867f",
"execution_count": null,
"id": "427d5745",
"metadata": {},
"outputs": [],
"source": [
"from langchain.document_loaders import YoutubeLoader"
"from langchain.document_loaders import YoutubeLoader\n"
]
},
{
"cell_type": "code",
"execution_count": 2,
"execution_count": null,
"id": "34a25b57",
"metadata": {
"scrolled": true
@@ -34,7 +35,7 @@
},
{
"cell_type": "code",
"execution_count": 3,
"execution_count": null,
"id": "bc8b308a",
"metadata": {},
"outputs": [],
@@ -44,21 +45,10 @@
},
{
"cell_type": "code",
"execution_count": 4,
"execution_count": null,
"id": "d073dd36",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"[Document(page_content='LADIES AND GENTLEMEN, PEDRO PASCAL! [ CHEERS AND APPLAUSE ] >> THANK YOU, THANK YOU. THANK YOU VERY MUCH. I\\'M SO EXCITED TO BE HERE. THANK YOU. I SPENT THE LAST YEAR SHOOTING A SHOW CALLED \"THE LAST OF US\" ON HBO. FOR SOME HBO SHOES, YOU GET TO SHOOT IN A FIVE STAR ITALIAN RESORT SURROUNDED BY BEAUTIFUL PEOPLE, BUT I SAID, NO, THAT\\'S TOO EASY. I WANT TO SHOOT IN A FREEZING CANADIAN FOREST WHILE BEING CHASED AROUND BY A GUY WHOSE HEAD LOOKS LIKE A GENITAL WART. IT IS AN HONOR BEING A PART OF THESE HUGE FRANCHISEs LIKE \"GAME OF THRONES\" AND \"STAR WARS,\" BUT I\\'M STILL GETTING USED TO PEOPLE RECOGNIZING ME. THE OTHER DAY, A GUY STOPPED ME ON THE STREET AND SAYS, MY SON LOVES \"THE MANDALORIAN\" AND THE NEXT THING I KNOW, I\\'M FACE TIMING WITH A 6-YEAR-OLD WHO HAS NO IDEA WHO I AM BECAUSE MY CHARACTER WEARS A MASK THE ENTIRE SHOW. THE GUY IS LIKE, DO THE MANDO VOICE, BUT IT\\'S LIKE A BEDROOM VOICE. WITHOUT THE MASK, IT JUST SOUNDS PORNY. PEOPLE WALKING BY ON THE STREET SEE ME WHISPERING TO A 6-YEAR-OLD KID. I CAN BRING YOU IN WARM, OR I CAN BRING YOU IN COLD. EVEN THOUGH I CAME TO THE U.S. WHEN I WAS LITTLE, I WAS BORN IN CHILE, AND I HAVE 34 FIRST COUSINS WHO ARE STILL THERE. THEY\\'RE VERY PROUD OF ME. I KNOW THEY\\'RE PROUD BECAUSE THEY GIVE MY PHONE NUMBER TO EVERY PERSON THEY MEET, WHICH MEANS EVERY DAY, SOMEONE IN SANTIAGO WILL TEXT ME STUFF LIKE, CAN YOU COME TO MY WEDDING, OR CAN YOU SING MY PRIEST HAPPY BIRTHDAY, OR IS BABY YODA MEAN IN REAL LIFE. SO I HAVE TO BE LIKE NO, NO, AND HIS NAME IS GROGU. BUT MY COUSINS WEREN\\'T ALWAYS SO PROUD. EARLY IN MY CAREER, I PLAYED SMALL PARTS IN EVERY CRIME SHOW. I EVEN PLAYED TWO DIFFERENT CHARACTERS ON \"LAW AND ORDER.\" TITO CABASSA WHO LOOKED LIKE THIS. AND ONE YEAR LATER, I PLAYED REGGIE LUCKMAN WHO LOOKS LIKE THIS. AND THAT, MY FRIENDS, IS CALLED RANGE. BUT IT IS AMAZING TO BE HERE, LIKE I SAID. I WAS BORN IN CHILE, AND NINE MONTHS LATER, MY PARENTS FLED AND BROUGHT ME AND MY SISTER TO THE U.S. THEY WERE SO BRAVE, AND WITHOUT THEM, I WOULDN\\'T BE HERE IN THIS WONDERFUL COUNTRY, AND I CERTAINLY WOULDN\\'T BE STANDING HERE WITH YOU ALL TONIGHT. SO TO ALL MY FAMILY WATCHING IN CHILE, I WANT TO SAY [ SPEAKING NON-ENGLISH ] WHICH MEANS, I LOVE YOU, I MISS YOU, AND STOP GIVING OUT MY PHONE NUMBER. WE\\'VE GOT AN AMAZING SHOW FOR YOU TONIGHT. COLDPLAY IS HERE, SO STICK', lookup_str='', metadata={'source': 'QsYGlZkevEg', 'title': 'Pedro Pascal Monologue - SNL', 'description': 'First-time host Pedro Pascal talks about filming The Last of Us and being recognized by fans.\\n\\nSaturday Night Live. Stream now on Peacock: https://pck.tv/3uQxh4q\\n\\nSubscribe to SNL: https://goo.gl/tUsXwM\\nStream Current Full Episodes: http://www.nbc.com/saturday-night-live\\n\\nWATCH PAST SNL SEASONS\\nGoogle Play - http://bit.ly/SNLGooglePlay\\niTunes - http://bit.ly/SNLiTunes\\n\\nSNL ON SOCIAL\\nSNL Instagram: http://instagram.com/nbcsnl\\nSNL Facebook: https://www.facebook.com/snl\\nSNL Twitter: https://twitter.com/nbcsnl\\nSNL TikTok: https://www.tiktok.com/@nbcsnl\\n\\nGET MORE NBC\\nLike NBC: http://Facebook.com/NBC\\nFollow NBC: http://Twitter.com/NBC\\nNBC Tumblr: http://NBCtv.tumblr.com/\\nYouTube: http://www.youtube.com/nbc\\nNBC Instagram: http://instagram.com/nbc\\n\\n#SNL #PedroPascal #SNL48 #Coldplay', 'view_count': 1175057, 'thumbnail_url': 'https://i.ytimg.com/vi/QsYGlZkevEg/sddefault.jpg', 'publish_date': datetime.datetime(2023, 2, 4, 0, 0), 'length': 224, 'author': 'Saturday Night Live'}, lookup_index=0)]"
]
},
"execution_count": 4,
"metadata": {},
"output_type": "execute_result"
}
],
"outputs": [],
"source": [
"loader.load()"
]
@@ -73,7 +63,7 @@
},
{
"cell_type": "code",
"execution_count": 5,
"execution_count": null,
"id": "ba28af69",
"metadata": {},
"outputs": [],
@@ -83,7 +73,7 @@
},
{
"cell_type": "code",
"execution_count": 6,
"execution_count": null,
"id": "9b8ea390",
"metadata": {},
"outputs": [],
@@ -93,24 +83,61 @@
},
{
"cell_type": "code",
"execution_count": 7,
"execution_count": null,
"id": "97b98e92",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"[Document(page_content='LADIES AND GENTLEMEN, PEDRO PASCAL! [ CHEERS AND APPLAUSE ] >> THANK YOU, THANK YOU. THANK YOU VERY MUCH. I\\'M SO EXCITED TO BE HERE. THANK YOU. I SPENT THE LAST YEAR SHOOTING A SHOW CALLED \"THE LAST OF US\" ON HBO. FOR SOME HBO SHOES, YOU GET TO SHOOT IN A FIVE STAR ITALIAN RESORT SURROUNDED BY BEAUTIFUL PEOPLE, BUT I SAID, NO, THAT\\'S TOO EASY. I WANT TO SHOOT IN A FREEZING CANADIAN FOREST WHILE BEING CHASED AROUND BY A GUY WHOSE HEAD LOOKS LIKE A GENITAL WART. IT IS AN HONOR BEING A PART OF THESE HUGE FRANCHISEs LIKE \"GAME OF THRONES\" AND \"STAR WARS,\" BUT I\\'M STILL GETTING USED TO PEOPLE RECOGNIZING ME. THE OTHER DAY, A GUY STOPPED ME ON THE STREET AND SAYS, MY SON LOVES \"THE MANDALORIAN\" AND THE NEXT THING I KNOW, I\\'M FACE TIMING WITH A 6-YEAR-OLD WHO HAS NO IDEA WHO I AM BECAUSE MY CHARACTER WEARS A MASK THE ENTIRE SHOW. THE GUY IS LIKE, DO THE MANDO VOICE, BUT IT\\'S LIKE A BEDROOM VOICE. WITHOUT THE MASK, IT JUST SOUNDS PORNY. PEOPLE WALKING BY ON THE STREET SEE ME WHISPERING TO A 6-YEAR-OLD KID. I CAN BRING YOU IN WARM, OR I CAN BRING YOU IN COLD. EVEN THOUGH I CAME TO THE U.S. WHEN I WAS LITTLE, I WAS BORN IN CHILE, AND I HAVE 34 FIRST COUSINS WHO ARE STILL THERE. THEY\\'RE VERY PROUD OF ME. I KNOW THEY\\'RE PROUD BECAUSE THEY GIVE MY PHONE NUMBER TO EVERY PERSON THEY MEET, WHICH MEANS EVERY DAY, SOMEONE IN SANTIAGO WILL TEXT ME STUFF LIKE, CAN YOU COME TO MY WEDDING, OR CAN YOU SING MY PRIEST HAPPY BIRTHDAY, OR IS BABY YODA MEAN IN REAL LIFE. SO I HAVE TO BE LIKE NO, NO, AND HIS NAME IS GROGU. BUT MY COUSINS WEREN\\'T ALWAYS SO PROUD. EARLY IN MY CAREER, I PLAYED SMALL PARTS IN EVERY CRIME SHOW. I EVEN PLAYED TWO DIFFERENT CHARACTERS ON \"LAW AND ORDER.\" TITO CABASSA WHO LOOKED LIKE THIS. AND ONE YEAR LATER, I PLAYED REGGIE LUCKMAN WHO LOOKS LIKE THIS. AND THAT, MY FRIENDS, IS CALLED RANGE. BUT IT IS AMAZING TO BE HERE, LIKE I SAID. I WAS BORN IN CHILE, AND NINE MONTHS LATER, MY PARENTS FLED AND BROUGHT ME AND MY SISTER TO THE U.S. THEY WERE SO BRAVE, AND WITHOUT THEM, I WOULDN\\'T BE HERE IN THIS WONDERFUL COUNTRY, AND I CERTAINLY WOULDN\\'T BE STANDING HERE WITH YOU ALL TONIGHT. SO TO ALL MY FAMILY WATCHING IN CHILE, I WANT TO SAY [ SPEAKING NON-ENGLISH ] WHICH MEANS, I LOVE YOU, I MISS YOU, AND STOP GIVING OUT MY PHONE NUMBER. WE\\'VE GOT AN AMAZING SHOW FOR YOU TONIGHT. COLDPLAY IS HERE, SO STICK', lookup_str='', metadata={'source': 'QsYGlZkevEg', 'title': 'Pedro Pascal Monologue - SNL', 'description': 'First-time host Pedro Pascal talks about filming The Last of Us and being recognized by fans.\\n\\nSaturday Night Live. Stream now on Peacock: https://pck.tv/3uQxh4q\\n\\nSubscribe to SNL: https://goo.gl/tUsXwM\\nStream Current Full Episodes: http://www.nbc.com/saturday-night-live\\n\\nWATCH PAST SNL SEASONS\\nGoogle Play - http://bit.ly/SNLGooglePlay\\niTunes - http://bit.ly/SNLiTunes\\n\\nSNL ON SOCIAL\\nSNL Instagram: http://instagram.com/nbcsnl\\nSNL Facebook: https://www.facebook.com/snl\\nSNL Twitter: https://twitter.com/nbcsnl\\nSNL TikTok: https://www.tiktok.com/@nbcsnl\\n\\nGET MORE NBC\\nLike NBC: http://Facebook.com/NBC\\nFollow NBC: http://Twitter.com/NBC\\nNBC Tumblr: http://NBCtv.tumblr.com/\\nYouTube: http://www.youtube.com/nbc\\nNBC Instagram: http://instagram.com/nbc\\n\\n#SNL #PedroPascal #SNL48 #Coldplay', 'view_count': 1175057, 'thumbnail_url': 'https://i.ytimg.com/vi/QsYGlZkevEg/sddefault.jpg', 'publish_date': datetime.datetime(2023, 2, 4, 0, 0), 'length': 224, 'author': 'Saturday Night Live'}, lookup_index=0)]"
]
},
"execution_count": 7,
"metadata": {},
"output_type": "execute_result"
}
],
"outputs": [],
"source": [
"loader.load()"
]
},
{
"cell_type": "markdown",
"id": "65796cc5",
"metadata": {},
"source": [
"## YouTube loader from Google Cloud\n",
"\n",
"### Prerequisites\n",
"\n",
"1. Create a Google Cloud project or use an existing project\n",
"1. Enable the [Youtube Api](https://console.cloud.google.com/apis/enableflow?apiid=youtube.googleapis.com&project=sixth-grammar-344520)\n",
"1. [Authorize credentials for desktop app](https://developers.google.com/drive/api/quickstart/python#authorize_credentials_for_a_desktop_application)\n",
"1. `pip install --upgrade google-api-python-client google-auth-httplib2 google-auth-oauthlib youtube-transcript-api`\n",
"\n",
"### 🧑 Instructions for ingesting your Google Docs data\n",
"By default, the `GoogleDriveLoader` expects the `credentials.json` file to be `~/.credentials/credentials.json`, but this is configurable using the `credentials_file` keyword argument. Same thing with `token.json`. Note that `token.json` will be created automatically the first time you use the loader.\n",
"\n",
"`GoogleApiYoutubeLoader` can load from a list of Google Docs document ids or a folder id. You can obtain your folder and document id from the URL:\n",
"Note depending on your set up, the `service_account_path` needs to be set up. See [here](https://developers.google.com/drive/api/v3/quickstart/python) for more details."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "c345bc43",
"metadata": {},
"outputs": [],
"source": [
"from langchain.document_loaders import GoogleApiClient, GoogleApiYoutubeLoader\n",
"\n",
"# Init the GoogleApiClient \n",
"from pathlib import Path\n",
"\n",
"\n",
"google_api_client = GoogleApiClient(credentials_path=Path(\"your_path_creds.json\"))\n",
"\n",
"\n",
"# Use a Channel\n",
"youtube_loader_channel = GoogleApiYoutubeLoader(google_api_client=google_api_client, channel_name=\"Reducible\",captions_language=\"en\")\n",
"\n",
"# Use Youtube Ids\n",
"\n",
"youtube_loader_ids = GoogleApiYoutubeLoader(google_api_client=google_api_client, video_ids=[\"TrdevFK_am4\"], add_video_info=True)\n",
"\n",
"# returns a list of Documents\n",
"youtube_loader_channel.load()"
]
}
],
"metadata": {
@@ -130,6 +157,11 @@
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.1"
},
"vscode": {
"interpreter": {
"hash": "604c1013f65d31a2eb1fca07aae054bedd5a5a0d272dbb31e502c81f0b254b99"
}
}
},
"nbformat": 4,

View File

@@ -21,8 +21,6 @@ There are a lot of different document loaders that LangChain supports. Below are
`GoogleDrive <./examples/googledrive.html>`_: A walkthrough of how to load data from Google drive.
`Microsoft Word <./examples/microsoft_word.html>`_: A walkthrough of how to load data from Microsoft Word files.
`Obsidian <./examples/obsidian.html>`_: A walkthrough of how to load data from an Obsidian file dump.
`Roam <./examples/roam.html>`_: A walkthrough of how to load data from a Roam file export.
@@ -55,12 +53,32 @@ There are a lot of different document loaders that LangChain supports. Below are
`Airbyte Json <./examples/airbyte_json.html>`_: A walkthrough of how to load data from a local Airbyte JSON file.
`Online PDF <./examples/online_pdf.html>`_: A walkthrough of how to load data from an online PDF.
`CoNLL-U <./examples/CoNLL-U.html>`_: A walkthrough of how to load data from a ConLL-U file.
`iFixit <./examples/ifixit.html>`_: A walkthrough of how to search and load data like guides, technical Q&A's, and device wikis from iFixit.com
`Notebook <./examples/notebook.html>`_: A walkthrough of how to load data from .ipynb notebook.
`Copypaste <./examples/copypaste.html>`_: A walkthrough of how to load a document object from something you just want to copy and paste.
`CSV <./examples/csv.html>`_: A walkthrough of how to load data from a .csv file.
`Facebook Chat <./examples/facebook_chat.html>`_: A walkthrough of how to load data from a Facebook Chat json file.
`Image <./examples/image.html>`_: A walkthrough of how to load images such as JPGs PNGs into a document format that can be used downstream.
`Markdown <./examples/markdown.html>`_: A walkthrough of how to load data from a markdown file.
`SRT <./examples/srt.html>`_: A walkthrough of how to load data from a subtitle (`.srt`) file.
`Telegram <./examples/telegram.html>`_: A walkthrough of how to load data from a Telegram Chat json file.
`URL <./examples/url.html>`_: A walkthrough of how to load HTML documents from a list of URLs into a document format that we can use downstream.
`Word Document <./examples/word_document.html>`_: A walkthrough of how to load data from Microsoft Word files.
`Blackboard <./examples/blackboard.html>`_: A walkthrough of how to load data from a Blackboard course.
.. toctree::
:maxdepth: 1
:glob:

View File

@@ -3,16 +3,24 @@ Indexes
Indexes refer to ways to structure documents so that LLMs can best interact with them.
This module contains utility functions for working with documents, different types of indexes, and then examples for using those indexes in chains.
LangChain provides common indices for working with data (most prominently support for vector databases).
For more complicated index structures, it is worth checking out `GPTIndex <https://gpt-index.readthedocs.io/en/latest/index.html>`_.
The most common way that indexes are used in chains is in a "retrieval" step.
This step refers to taking a user's query and returning the most relevant documents.
We draw this distinction because (1) an index can be used for other things besides retrieval, and (2) retrieval can use other logic besides an index to find relevant documents.
We therefor have a concept of a "Retriever" interface - this is the interface that most chains work with.
Most of the time when we talk about indexes and retrieval we are talking about indexing and retrieving unstructured data (like text documents).
For interacting with structured data (SQL tables, etc) or APIs, please see the corresponding use case sections for links to relevant functionality.
The primary index and retrieval types supported by LangChain are currently centered around vector databases, and therefore
a lot of the functionality we dive deep on those topics.
The following sections of documentation are provided:
- `Getting Started <./indexes/getting_started.html>`_: An overview of all the functionality LangChain provides for working with indexes.
- `Getting Started <./indexes/getting_started.html>`_: An overview of the base "Retriever" interface, and then all the functionality LangChain provides for working with indexes.
- `Key Concepts <./indexes/key_concepts.html>`_: A conceptual guide going over the various concepts related to indexes and the tools needed to create them.
- `How-To Guides <./indexes/how_to_guides.html>`_: A collection of how-to guides. These highlight how to use all the relevant tools, the different types of vector databases, and how to use indexes in chains.
- `How-To Guides <./indexes/how_to_guides.html>`_: A collection of how-to guides. These highlight how to use all the relevant tools, the different types of vector databases, different types of retrievers, and how to use retrievers and indexes in chains.
.. toctree::

View File

@@ -5,9 +5,9 @@
"id": "134a0785",
"metadata": {},
"source": [
"# Chat Vector DB\n",
"# Chat Index\n",
"\n",
"This notebook goes over how to set up a chain to chat with a vector database. The only difference between this chain and the [VectorDBQAChain](./vector_db_qa.ipynb) is that this allows for passing in of a chat history which can be used to allow for follow up questions."
"This notebook goes over how to set up a chain to chat with an index. The only difference between this chain and the [RetrievalQAChain](./vector_db_qa.ipynb) is that this allows for passing in of a chat history which can be used to allow for follow up questions."
]
},
{
@@ -23,7 +23,7 @@
"from langchain.vectorstores import Chroma\n",
"from langchain.text_splitter import CharacterTextSplitter\n",
"from langchain.llms import OpenAI\n",
"from langchain.chains import ChatVectorDBChain"
"from langchain.chains import ConversationalRetrievalChain"
]
},
{
@@ -109,7 +109,7 @@
"id": "3c96b118",
"metadata": {},
"source": [
"We now initialize the ChatVectorDBChain"
"We now initialize the ConversationalRetrievalChain"
]
},
{
@@ -121,7 +121,7 @@
},
"outputs": [],
"source": [
"qa = ChatVectorDBChain.from_llm(OpenAI(temperature=0), vectorstore)"
"qa = ConversationalRetrievalChain.from_llm(OpenAI(temperature=0), vectorstore)"
]
},
{
@@ -220,22 +220,22 @@
"metadata": {},
"source": [
"## Return Source Documents\n",
"You can also easily return source documents from the ChatVectorDBChain. This is useful for when you want to inspect what documents were returned."
]
},
{
"cell_type": "code",
"execution_count": 5,
"id": "562769c6",
"metadata": {},
"outputs": [],
"source": [
"qa = ChatVectorDBChain.from_llm(OpenAI(temperature=0), vectorstore, return_source_documents=True)"
"You can also easily return source documents from the ConversationalRetrievalChain. This is useful for when you want to inspect what documents were returned."
]
},
{
"cell_type": "code",
"execution_count": 11,
"id": "562769c6",
"metadata": {},
"outputs": [],
"source": [
"qa = ConversationalRetrievalChain.from_llm(OpenAI(temperature=0), vectorstore, return_source_documents=True)"
]
},
{
"cell_type": "code",
"execution_count": 12,
"id": "ea478300",
"metadata": {},
"outputs": [],
@@ -247,17 +247,17 @@
},
{
"cell_type": "code",
"execution_count": 15,
"execution_count": 13,
"id": "4cb75b4e",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"Document(page_content='In state after state, new laws have been passed, not only to suppress the vote, but to subvert entire elections. \\n\\nWe cannot let this happen. \\n\\nTonight. I call on the Senate to: Pass the Freedom to Vote Act. Pass the John Lewis Voting Rights Act. And while youre at it, pass the Disclose Act so Americans can know who is funding our elections. \\n\\nTonight, Id like to honor someone who has dedicated his life to serve this country: Justice Stephen Breyer—an Army veteran, Constitutional scholar, and retiring Justice of the United States Supreme Court. Justice Breyer, thank you for your service. \\n\\nOne of the most serious constitutional responsibilities a President has is nominating someone to serve on the United States Supreme Court. \\n\\nAnd I did that 4 days ago, when I nominated Circuit Court of Appeals Judge Ketanji Brown Jackson. One of our nations top legal minds, who will continue Justice Breyers legacy of excellence.', lookup_str='', metadata={'source': '../../state_of_the_union.txt'}, lookup_index=0)"
"Document(page_content='Tonight. I call on the Senate to: Pass the Freedom to Vote Act. Pass the John Lewis Voting Rights Act. And while youre at it, pass the Disclose Act so Americans can know who is funding our elections. \\n\\nTonight, Id like to honor someone who has dedicated his life to serve this country: Justice Stephen Breyer—an Army veteran, Constitutional scholar, and retiring Justice of the United States Supreme Court. Justice Breyer, thank you for your service. \\n\\nOne of the most serious constitutional responsibilities a President has is nominating someone to serve on the United States Supreme Court. \\n\\nAnd I did that 4 days ago, when I nominated Circuit Court of Appeals Judge Ketanji Brown Jackson. One of our nations top legal minds, who will continue Justice Breyers legacy of excellence.', lookup_str='', metadata={'source': '../../state_of_the_union.txt'}, lookup_index=0)"
]
},
"execution_count": 15,
"execution_count": 13,
"metadata": {},
"output_type": "execute_result"
}
@@ -268,64 +268,60 @@
},
{
"cell_type": "markdown",
"id": "4f49beab",
"metadata": {},
"source": [
"## Chat Vector DB with `search_distance`\n",
"## ConversationalRetrievalChain with `search_distance`\n",
"If you are using a vector store that supports filtering by search distance, you can add a threshold value parameter."
],
"metadata": {
"collapsed": false
}
]
},
{
"cell_type": "code",
"execution_count": null,
"execution_count": 14,
"id": "5ed8d612",
"metadata": {},
"outputs": [],
"source": [
"vectordbkwargs = {\"search_distance\": 0.9}"
],
"metadata": {
"collapsed": false
}
]
},
{
"cell_type": "code",
"execution_count": null,
"execution_count": 15,
"id": "6a7b3459",
"metadata": {},
"outputs": [],
"source": [
"qa = ChatVectorDBChain.from_llm(OpenAI(temperature=0), vectorstore, return_source_documents=True)\n",
"qa = ConversationalRetrievalChain.from_llm(OpenAI(temperature=0), vectorstore, return_source_documents=True)\n",
"chat_history = []\n",
"query = \"What did the president say about Ketanji Brown Jackson\"\n",
"result = qa({\"question\": query, \"chat_history\": chat_history, \"vectordbkwargs\": vectordbkwargs})"
],
"metadata": {
"collapsed": false
}
]
},
{
"cell_type": "markdown",
"id": "99b96dae",
"metadata": {},
"source": [
"## Chat Vector DB with `map_reduce`\n",
"We can also use different types of combine document chains with the Chat Vector DB chain."
],
"metadata": {
"collapsed": false
}
"## ConversationalRetrievalChain with `map_reduce`\n",
"We can also use different types of combine document chains with the ConversationalRetrievalChain chain."
]
},
{
"cell_type": "code",
"execution_count": null,
"execution_count": 16,
"id": "e53a9d66",
"metadata": {},
"outputs": [],
"source": [
"from langchain.chains import LLMChain\n",
"from langchain.chains.question_answering import load_qa_chain\n",
"from langchain.chains.chat_vector_db.prompts import CONDENSE_QUESTION_PROMPT"
"from langchain.chains.chat_index.prompts import CONDENSE_QUESTION_PROMPT"
]
},
{
"cell_type": "code",
"execution_count": 9,
"execution_count": 19,
"id": "bf205e35",
"metadata": {},
"outputs": [],
@@ -334,8 +330,8 @@
"question_generator = LLMChain(llm=llm, prompt=CONDENSE_QUESTION_PROMPT)\n",
"doc_chain = load_qa_chain(llm, chain_type=\"map_reduce\")\n",
"\n",
"chain = ChatVectorDBChain(\n",
" vectorstore=vectorstore,\n",
"chain = ConversationalRetrievalChain(\n",
" retriever=vectorstore.as_retriever(),\n",
" question_generator=question_generator,\n",
" combine_docs_chain=doc_chain,\n",
")"
@@ -343,7 +339,7 @@
},
{
"cell_type": "code",
"execution_count": 10,
"execution_count": 20,
"id": "78155887",
"metadata": {},
"outputs": [],
@@ -355,7 +351,7 @@
},
{
"cell_type": "code",
"execution_count": 11,
"execution_count": 21,
"id": "e54b5fa2",
"metadata": {},
"outputs": [
@@ -365,7 +361,7 @@
"\" The president said that Ketanji Brown Jackson is one of the nation's top legal minds, a former top litigator in private practice, a former federal public defender, from a family of public school educators and police officers, a consensus builder, and has received a broad range of support from the Fraternal Order of Police to former judges appointed by Democrats and Republicans.\""
]
},
"execution_count": 11,
"execution_count": 21,
"metadata": {},
"output_type": "execute_result"
}
@@ -379,14 +375,14 @@
"id": "a2fe6b14",
"metadata": {},
"source": [
"## Chat Vector DB with Question Answering with sources\n",
"## ConversationalRetrievalChain with Question Answering with sources\n",
"\n",
"You can also use this chain with the question answering with sources chain."
]
},
{
"cell_type": "code",
"execution_count": 12,
"execution_count": 22,
"id": "d1058fd2",
"metadata": {},
"outputs": [],
@@ -396,7 +392,7 @@
},
{
"cell_type": "code",
"execution_count": 13,
"execution_count": 23,
"id": "a6594482",
"metadata": {},
"outputs": [],
@@ -405,8 +401,8 @@
"question_generator = LLMChain(llm=llm, prompt=CONDENSE_QUESTION_PROMPT)\n",
"doc_chain = load_qa_with_sources_chain(llm, chain_type=\"map_reduce\")\n",
"\n",
"chain = ChatVectorDBChain(\n",
" vectorstore=vectorstore,\n",
"chain = ConversationalRetrievalChain(\n",
" retriever=vectorstore.as_retriever(),\n",
" question_generator=question_generator,\n",
" combine_docs_chain=doc_chain,\n",
")"
@@ -414,7 +410,7 @@
},
{
"cell_type": "code",
"execution_count": 14,
"execution_count": 24,
"id": "e2badd21",
"metadata": {},
"outputs": [],
@@ -426,7 +422,7 @@
},
{
"cell_type": "code",
"execution_count": 16,
"execution_count": 25,
"id": "edb31fe5",
"metadata": {},
"outputs": [
@@ -436,7 +432,7 @@
"\" The president said that Ketanji Brown Jackson is one of the nation's top legal minds, a former top litigator in private practice, a former federal public defender, from a family of public school educators and police officers, a consensus builder, and has received a broad range of support from the Fraternal Order of Police to former judges appointed by Democrats and Republicans. \\nSOURCES: ../../state_of_the_union.txt\""
]
},
"execution_count": 16,
"execution_count": 25,
"metadata": {},
"output_type": "execute_result"
}
@@ -450,14 +446,14 @@
"id": "2324cdc6-98bf-4708-b8cd-02a98b1e5b67",
"metadata": {},
"source": [
"## Chat Vector DB with streaming to `stdout`\n",
"## ConversationalRetrievalChain with streaming to `stdout`\n",
"\n",
"Output from the chain will be streamed to `stdout` token by token in this example."
]
},
{
"cell_type": "code",
"execution_count": 10,
"execution_count": 26,
"id": "2efacec3-2690-4b05-8de3-a32fd2ac3911",
"metadata": {
"tags": []
@@ -467,7 +463,7 @@
"from langchain.chains.llm import LLMChain\n",
"from langchain.callbacks.base import CallbackManager\n",
"from langchain.callbacks.streaming_stdout import StreamingStdOutCallbackHandler\n",
"from langchain.chains.chat_vector_db.prompts import CONDENSE_QUESTION_PROMPT, QA_PROMPT\n",
"from langchain.chains.chat_index.prompts import CONDENSE_QUESTION_PROMPT, QA_PROMPT\n",
"from langchain.chains.question_answering import load_qa_chain\n",
"\n",
"# Construct a ChatVectorDBChain with a streaming llm for combine docs\n",
@@ -478,12 +474,13 @@
"question_generator = LLMChain(llm=llm, prompt=CONDENSE_QUESTION_PROMPT)\n",
"doc_chain = load_qa_chain(streaming_llm, chain_type=\"stuff\", prompt=QA_PROMPT)\n",
"\n",
"qa = ChatVectorDBChain(vectorstore=vectorstore, combine_docs_chain=doc_chain, question_generator=question_generator)"
"qa = ConversationalRetrievalChain(\n",
" retriever=vectorstore.as_retriever(), combine_docs_chain=doc_chain, question_generator=question_generator)"
]
},
{
"cell_type": "code",
"execution_count": 11,
"execution_count": 27,
"id": "fd6d43f4-7428-44a4-81bc-26fe88a98762",
"metadata": {
"tags": []
@@ -505,7 +502,7 @@
},
{
"cell_type": "code",
"execution_count": 12,
"execution_count": 28,
"id": "5ab38978-f3e8-4fa7-808c-c79dec48379a",
"metadata": {
"tags": []
@@ -524,6 +521,71 @@
"query = \"Did he mention who she suceeded\"\n",
"result = qa({\"question\": query, \"chat_history\": chat_history})\n"
]
},
{
"cell_type": "markdown",
"id": "f793d56b",
"metadata": {},
"source": [
"## get_chat_history Function\n",
"You can also specify a `get_chat_history` function, which can be used to format the chat_history string."
]
},
{
"cell_type": "code",
"execution_count": 29,
"id": "a7ba9d8c",
"metadata": {},
"outputs": [],
"source": [
"def get_chat_history(inputs) -> str:\n",
" res = []\n",
" for human, ai in inputs:\n",
" res.append(f\"Human:{human}\\nAI:{ai}\")\n",
" return \"\\n\".join(res)\n",
"qa = ConversationalRetrievalChain.from_llm(OpenAI(temperature=0), vectorstore, get_chat_history=get_chat_history)"
]
},
{
"cell_type": "code",
"execution_count": 30,
"id": "a3e33c0d",
"metadata": {},
"outputs": [],
"source": [
"chat_history = []\n",
"query = \"What did the president say about Ketanji Brown Jackson\"\n",
"result = qa({\"question\": query, \"chat_history\": chat_history})"
]
},
{
"cell_type": "code",
"execution_count": 31,
"id": "936dc62f",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"\" The president said that Ketanji Brown Jackson is one of the nation's top legal minds, a former top litigator in private practice, a former federal public defender, and from a family of public school educators and police officers. He also said that she is a consensus builder and has received a broad range of support from the Fraternal Order of Police to former judges appointed by Democrats and Republicans.\""
]
},
"execution_count": 31,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"result['answer']"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "b8c26901",
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {

View File

@@ -178,16 +178,16 @@
"text": [
"\n",
"\n",
"\u001B[1m> Entering new GraphQAChain chain...\u001B[0m\n",
"\u001b[1m> Entering new GraphQAChain chain...\u001b[0m\n",
"Entities Extracted:\n",
"\u001B[32;1m\u001B[1;3m Intel\u001B[0m\n",
"\u001b[32;1m\u001b[1;3m Intel\u001b[0m\n",
"Full Context:\n",
"\u001B[32;1m\u001B[1;3mIntel is going to build $20 billion semiconductor \"mega site\"\n",
"\u001b[32;1m\u001b[1;3mIntel is going to build $20 billion semiconductor \"mega site\"\n",
"Intel is building state-of-the-art factories\n",
"Intel is creating 10,000 new good-paying jobs\n",
"Intel is helping build Silicon Valley\u001B[0m\n",
"Intel is helping build Silicon Valley\u001b[0m\n",
"\n",
"\u001B[1m> Finished chain.\u001B[0m\n"
"\u001b[1m> Finished chain.\u001b[0m\n"
]
},
{
@@ -205,10 +205,76 @@
"chain.run(\"what is Intel going to build?\")"
]
},
{
"cell_type": "markdown",
"id": "410aafa0",
"metadata": {},
"source": [
"## Save the graph\n",
"We can also save and load the graph."
]
},
{
"cell_type": "code",
"execution_count": 7,
"id": "bc72cca0",
"metadata": {},
"outputs": [],
"source": [
"graph.write_to_gml(\"graph.gml\")"
]
},
{
"cell_type": "code",
"execution_count": 8,
"id": "652760ad",
"metadata": {},
"outputs": [],
"source": [
"from langchain.indexes.graph import NetworkxEntityGraph"
]
},
{
"cell_type": "code",
"execution_count": 9,
"id": "eae591fe",
"metadata": {},
"outputs": [],
"source": [
"loaded_graph = NetworkxEntityGraph.from_gml(\"graph.gml\")"
]
},
{
"cell_type": "code",
"execution_count": 10,
"id": "9439d419",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"[('Intel', '$20 billion semiconductor \"mega site\"', 'is going to build'),\n",
" ('Intel', 'state-of-the-art factories', 'is building'),\n",
" ('Intel', '10,000 new good-paying jobs', 'is creating'),\n",
" ('Intel', 'Silicon Valley', 'is helping build'),\n",
" ('Field of dreams',\n",
" \"America's future will be built\",\n",
" 'is the ground on which')]"
]
},
"execution_count": 10,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"loaded_graph.get_triples()"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "f70b9ada",
"id": "045796cf",
"metadata": {},
"outputs": [],
"source": []

View File

@@ -635,7 +635,7 @@
"metadata": {},
"outputs": [],
"source": [
"from langchain.prompts.base import RegexParser\n",
"from langchain.output_parsers import RegexParser\n",
"\n",
"output_parser = RegexParser(\n",
" regex=r\"(.*?)\\nScore: (.*)\",\n",
@@ -732,4 +732,4 @@
},
"nbformat": 4,
"nbformat_minor": 5
}
}

View File

@@ -7,7 +7,7 @@
"source": [
"# Question Answering\n",
"\n",
"This notebook walks through how to use LangChain for question answering over a list of documents. It covers four different types of chains: `stuff`, `map_reduce`, `refine`, `map-rerank`. For a more in depth explanation of what these chain types are, see [here](../combine_docs.md)."
"This notebook walks through how to use LangChain for question answering over a list of documents. It covers four different types of chains: `stuff`, `map_reduce`, `refine`, `map_rerank`. For a more in depth explanation of what these chain types are, see [here](../combine_docs.md)."
]
},
{
@@ -635,7 +635,7 @@
}
],
"source": [
"from langchain.prompts.base import RegexParser\n",
"from langchain.output_parsers import RegexParser\n",
"\n",
"output_parser = RegexParser(\n",
" regex=r\"(.*?)\\nScore: (.*)\",\n",

View File

@@ -5,9 +5,9 @@
"id": "07c1e3b9",
"metadata": {},
"source": [
"# Vector DB Question/Answering\n",
"# Retrieval Question/Answering\n",
"\n",
"This example showcases question answering over a vector database."
"This example showcases question answering over an index."
]
},
{
@@ -20,7 +20,8 @@
"from langchain.embeddings.openai import OpenAIEmbeddings\n",
"from langchain.vectorstores import Chroma\n",
"from langchain.text_splitter import CharacterTextSplitter\n",
"from langchain import OpenAI, VectorDBQA"
"from langchain.llms import OpenAI\n",
"from langchain.chains import RetrievalQA"
]
},
{
@@ -56,7 +57,7 @@
"metadata": {},
"outputs": [],
"source": [
"qa = VectorDBQA.from_chain_type(llm=OpenAI(), chain_type=\"stuff\", vectorstore=docsearch)"
"qa = RetrievalQA.from_chain_type(llm=OpenAI(), chain_type=\"stuff\", retriever=docsearch.as_retriever())"
]
},
{
@@ -68,7 +69,7 @@
{
"data": {
"text/plain": [
"\" The president said that Ketanji Brown Jackson is one of the nation's top legal minds, a former top litigator in private practice and federal public defender, from a family of public school educators and police officers, a consensus builder, and has received a broad range of support from the Fraternal Order of Police to former judges appointed by Democrats and Republicans.\""
"\" The president said that she is one of the nation's top legal minds, a former top litigator in private practice, a former federal public defender, and from a family of public school educators and police officers. He also said that she is a consensus builder and has received a broad range of support, from the Fraternal Order of Police to former judges appointed by Democrats and Republicans.\""
]
},
"execution_count": 4,
@@ -87,7 +88,7 @@
"metadata": {},
"source": [
"## Chain Type\n",
"You can easily specify different chain types to load and use in the VectorDBQA chain. For a more detailed walkthrough of these types, please see [this notebook](question_answering.ipynb).\n",
"You can easily specify different chain types to load and use in the RetrievalQA chain. For a more detailed walkthrough of these types, please see [this notebook](question_answering.ipynb).\n",
"\n",
"There are two ways to load different chain types. First, you can specify the chain type argument in the `from_chain_type` method. This allows you to pass in the name of the chain type you want to use. For example, in the below we change the chain type to `map_reduce`."
]
@@ -99,7 +100,7 @@
"metadata": {},
"outputs": [],
"source": [
"qa = VectorDBQA.from_chain_type(llm=OpenAI(), chain_type=\"map_reduce\", vectorstore=docsearch)"
"qa = RetrievalQA.from_chain_type(llm=OpenAI(), chain_type=\"map_reduce\", retriever=docsearch.as_retriever())"
]
},
{
@@ -111,7 +112,7 @@
{
"data": {
"text/plain": [
"\" The president said that Ketanji Brown Jackson is one of the nation's top legal minds, a former top litigator in private practice, a former federal public defender, from a family of public school educators and police officers, a consensus builder, and has received a broad range of support from the Fraternal Order of Police to former judges appointed by Democrats and Republicans.\""
"\" The president said that Judge Ketanji Brown Jackson is one of our nation's top legal minds, a former top litigator in private practice and a former federal public defender, from a family of public school educators and police officers, a consensus builder and has received a broad range of support from the Fraternal Order of Police to former judges appointed by Democrats and Republicans.\""
]
},
"execution_count": 6,
@@ -129,24 +130,24 @@
"id": "60368f38",
"metadata": {},
"source": [
"The above way allows you to really simply change the chain_type, but it does provide a ton of flexibility over parameters to that chain type. If you want to control those parameters, you can load the chain directly (as you did in [this notebook](question_answering.ipynb)) and then pass that directly to the the VectorDBQA chain with the `combine_documents_chain` parameter. For example:"
"The above way allows you to really simply change the chain_type, but it does provide a ton of flexibility over parameters to that chain type. If you want to control those parameters, you can load the chain directly (as you did in [this notebook](question_answering.ipynb)) and then pass that directly to the the RetrievalQA chain with the `combine_documents_chain` parameter. For example:"
]
},
{
"cell_type": "code",
"execution_count": 18,
"execution_count": 9,
"id": "7b403f0d",
"metadata": {},
"outputs": [],
"source": [
"from langchain.chains.question_answering import load_qa_chain\n",
"qa_chain = load_qa_chain(OpenAI(temperature=0), chain_type=\"stuff\")\n",
"qa = VectorDBQA(combine_documents_chain=qa_chain, vectorstore=docsearch)"
"qa = RetrievalQA(combine_documents_chain=qa_chain, retriever=docsearch.as_retriever())"
]
},
{
"cell_type": "code",
"execution_count": 19,
"execution_count": 10,
"id": "9e04a9ac",
"metadata": {},
"outputs": [
@@ -156,7 +157,7 @@
"\" The president said that Ketanji Brown Jackson is one of the nation's top legal minds, a former top litigator in private practice, a former federal public defender, and from a family of public school educators and police officers. He also said that she is a consensus builder and has received a broad range of support from the Fraternal Order of Police to former judges appointed by Democrats and Republicans.\""
]
},
"execution_count": 19,
"execution_count": 10,
"metadata": {},
"output_type": "execute_result"
}
@@ -177,7 +178,7 @@
},
{
"cell_type": "code",
"execution_count": 6,
"execution_count": 11,
"id": "a45232a2",
"metadata": {},
"outputs": [],
@@ -196,28 +197,28 @@
},
{
"cell_type": "code",
"execution_count": 7,
"execution_count": 13,
"id": "9b5c8d1d",
"metadata": {},
"outputs": [],
"source": [
"chain_type_kwargs = {\"prompt\": PROMPT}\n",
"qa = VectorDBQA.from_chain_type(llm=OpenAI(), chain_type=\"stuff\", vectorstore=docsearch, chain_type_kwargs=chain_type_kwargs)"
"qa = RetrievalQA.from_chain_type(llm=OpenAI(), chain_type=\"stuff\", retriever=docsearch.as_retriever(), chain_type_kwargs=chain_type_kwargs)"
]
},
{
"cell_type": "code",
"execution_count": 8,
"execution_count": 14,
"id": "26ee7671",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"\" Il Presidente ha detto che Ketanji Brown Jackson è uno dei pensatori legali più importanti del nostro Paese, che continuerà l'eccellente eredità di giustizia Breyer. È un ex principale litigante in pratica privata, un ex difensore federale pubblico e appartiene a una famiglia di insegnanti e poliziotti delle scuole pubbliche. È un costruttore di consenso che ha ricevuto un ampio supporto da parte di Fraternal Order of Police e giudici designati da democratici e repubblicani.\""
"\" Il presidente ha detto che Ketanji Brown Jackson è una delle menti legali più importanti del paese, che continuerà l'eccellenza di Justice Breyer e che ha ricevuto un ampio sostegno, da Fraternal Order of Police a ex giudici nominati da democratici e repubblicani.\""
]
},
"execution_count": 8,
"execution_count": 14,
"metadata": {},
"output_type": "execute_result"
}
@@ -238,17 +239,17 @@
},
{
"cell_type": "code",
"execution_count": 5,
"execution_count": 15,
"id": "af093aba",
"metadata": {},
"outputs": [],
"source": [
"qa = VectorDBQA.from_chain_type(llm=OpenAI(), chain_type=\"stuff\", vectorstore=docsearch, return_source_documents=True)"
"qa = RetrievalQA.from_chain_type(llm=OpenAI(), chain_type=\"stuff\", retriever=docsearch.as_retriever(), return_source_documents=True)"
]
},
{
"cell_type": "code",
"execution_count": 8,
"execution_count": 16,
"id": "eac11321",
"metadata": {},
"outputs": [],
@@ -259,17 +260,17 @@
},
{
"cell_type": "code",
"execution_count": 10,
"execution_count": 17,
"id": "7d75945a",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"\" The president said that Ketanji Brown Jackson is one of our nation's top legal minds and that she will continue Justice Breyer's legacy of excellence.\""
"\" The president said that Ketanji Brown Jackson is one of the nation's top legal minds, a former top litigator in private practice and a former federal public defender from a family of public school educators and police officers, and that she has received a broad range of support from the Fraternal Order of Police to former judges appointed by Democrats and Republicans.\""
]
},
"execution_count": 10,
"execution_count": 17,
"metadata": {},
"output_type": "execute_result"
}
@@ -280,20 +281,20 @@
},
{
"cell_type": "code",
"execution_count": 11,
"execution_count": 18,
"id": "35b4f31f",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"[Document(page_content='In state after state, new laws have been passed, not only to suppress the vote, but to subvert entire elections. \\n\\nWe cannot let this happen. \\n\\nTonight. I call on the Senate to: Pass the Freedom to Vote Act. Pass the John Lewis Voting Rights Act. And while youre at it, pass the Disclose Act so Americans can know who is funding our elections. \\n\\nTonight, Id like to honor someone who has dedicated his life to serve this country: Justice Stephen Breyer—an Army veteran, Constitutional scholar, and retiring Justice of the United States Supreme Court. Justice Breyer, thank you for your service. \\n\\nOne of the most serious constitutional responsibilities a President has is nominating someone to serve on the United States Supreme Court. \\n\\nAnd I did that 4 days ago, when I nominated Circuit Court of Appeals Judge Ketanji Brown Jackson. One of our nations top legal minds, who will continue Justice Breyers legacy of excellence.', lookup_str='', metadata={}, lookup_index=0),\n",
" Document(page_content='A former top litigator in private practice. A former federal public defender. And from a family of public school educators and police officers. A consensus builder. Since shes been nominated, shes received a broad range of support—from the Fraternal Order of Police to former judges appointed by Democrats and Republicans. \\n\\nAnd if we are to advance liberty and justice, we need to secure the Border and fix the immigration system. \\n\\nWe can do both. At our border, weve installed new technology like cutting-edge scanners to better detect drug smuggling. \\n\\nWeve set up joint patrols with Mexico and Guatemala to catch more human traffickers. \\n\\nWere putting in place dedicated immigration judges so families fleeing persecution and violence can have their cases heard faster. \\n\\nWere securing commitments and supporting partners in South and Central America to host more refugees and secure their own borders.', lookup_str='', metadata={}, lookup_index=0),\n",
" Document(page_content='And for our LGBTQ+ Americans, lets finally get the bipartisan Equality Act to my desk. The onslaught of state laws targeting transgender Americans and their families is wrong. \\n\\nAs I said last year, especially to our younger transgender Americans, I will always have your back as your President, so you can be yourself and reach your God-given potential. \\n\\nWhile it often appears that we never agree, that isnt true. I signed 80 bipartisan bills into law last year. From preventing government shutdowns to protecting Asian-Americans from still-too-common hate crimes to reforming military justice. \\n\\nAnd soon, well strengthen the Violence Against Women Act that I first wrote three decades ago. It is important for us to show the nation that we can come together and do big things. \\n\\nSo tonight Im offering a Unity Agenda for the Nation. Four big things we can do together. \\n\\nFirst, beat the opioid epidemic.', lookup_str='', metadata={}, lookup_index=0),\n",
" Document(page_content='As Ive told Xi Jinping, it is never a good bet to bet against the American people. \\n\\nWell create good jobs for millions of Americans, modernizing roads, airports, ports, and waterways all across America. \\n\\nAnd well do it all to withstand the devastating effects of the climate crisis and promote environmental justice. \\n\\nWell build a national network of 500,000 electric vehicle charging stations, begin to replace poisonous lead pipes—so every child—and every American—has clean water to drink at home and at school, provide affordable high-speed internet for every American—urban, suburban, rural, and tribal communities. \\n\\n4,000 projects have already been announced. \\n\\nAnd tonight, Im announcing that this year we will start fixing over 65,000 miles of highway and 1,500 bridges in disrepair. \\n\\nWhen we use taxpayer dollars to rebuild America we are going to Buy American: buy American products to support American jobs.', lookup_str='', metadata={}, lookup_index=0)]"
"[Document(page_content='Tonight. I call on the Senate to: Pass the Freedom to Vote Act. Pass the John Lewis Voting Rights Act. And while youre at it, pass the Disclose Act so Americans can know who is funding our elections. \\n\\nTonight, Id like to honor someone who has dedicated his life to serve this country: Justice Stephen Breyer—an Army veteran, Constitutional scholar, and retiring Justice of the United States Supreme Court. Justice Breyer, thank you for your service. \\n\\nOne of the most serious constitutional responsibilities a President has is nominating someone to serve on the United States Supreme Court. \\n\\nAnd I did that 4 days ago, when I nominated Circuit Court of Appeals Judge Ketanji Brown Jackson. One of our nations top legal minds, who will continue Justice Breyers legacy of excellence.', lookup_str='', metadata={'source': '../../state_of_the_union.txt'}, lookup_index=0),\n",
" Document(page_content='A former top litigator in private practice. A former federal public defender. And from a family of public school educators and police officers. A consensus builder. Since shes been nominated, shes received a broad range of support—from the Fraternal Order of Police to former judges appointed by Democrats and Republicans. \\n\\nAnd if we are to advance liberty and justice, we need to secure the Border and fix the immigration system. \\n\\nWe can do both. At our border, weve installed new technology like cutting-edge scanners to better detect drug smuggling. \\n\\nWeve set up joint patrols with Mexico and Guatemala to catch more human traffickers. \\n\\nWere putting in place dedicated immigration judges so families fleeing persecution and violence can have their cases heard faster. \\n\\nWere securing commitments and supporting partners in South and Central America to host more refugees and secure their own borders.', lookup_str='', metadata={'source': '../../state_of_the_union.txt'}, lookup_index=0),\n",
" Document(page_content='And for our LGBTQ+ Americans, lets finally get the bipartisan Equality Act to my desk. The onslaught of state laws targeting transgender Americans and their families is wrong. \\n\\nAs I said last year, especially to our younger transgender Americans, I will always have your back as your President, so you can be yourself and reach your God-given potential. \\n\\nWhile it often appears that we never agree, that isnt true. I signed 80 bipartisan bills into law last year. From preventing government shutdowns to protecting Asian-Americans from still-too-common hate crimes to reforming military justice. \\n\\nAnd soon, well strengthen the Violence Against Women Act that I first wrote three decades ago. It is important for us to show the nation that we can come together and do big things. \\n\\nSo tonight Im offering a Unity Agenda for the Nation. Four big things we can do together. \\n\\nFirst, beat the opioid epidemic.', lookup_str='', metadata={'source': '../../state_of_the_union.txt'}, lookup_index=0),\n",
" Document(page_content='Tonight, Im announcing a crackdown on these companies overcharging American businesses and consumers. \\n\\nAnd as Wall Street firms take over more nursing homes, quality in those homes has gone down and costs have gone up. \\n\\nThat ends on my watch. \\n\\nMedicare is going to set higher standards for nursing homes and make sure your loved ones get the care they deserve and expect. \\n\\nWell also cut costs and keep the economy going strong by giving workers a fair shot, provide more training and apprenticeships, hire them based on their skills not degrees. \\n\\nLets pass the Paycheck Fairness Act and paid leave. \\n\\nRaise the minimum wage to $15 an hour and extend the Child Tax Credit, so no one has to raise a family in poverty. \\n\\nLets increase Pell Grants and increase our historic support of HBCUs, and invest in what Jill—our First Lady who teaches full-time—calls Americas best-kept secret: community colleges.', lookup_str='', metadata={'source': '../../state_of_the_union.txt'}, lookup_index=0)]"
]
},
"execution_count": 11,
"execution_count": 18,
"metadata": {},
"output_type": "execute_result"
}

View File

@@ -5,9 +5,9 @@
"id": "efc5be67",
"metadata": {},
"source": [
"# VectorDB Question Answering with Sources\n",
"# Retrieval Question Answering with Sources\n",
"\n",
"This notebook goes over how to do question-answering with sources over a vector database. It does this by using the `VectorDBQAWithSourcesChain`, which does the lookup of the documents from a vector database. "
"This notebook goes over how to do question-answering with sources over an Index. It does this by using the `RetrievalQAWithSourcesChain`, which does the lookup of the documents from an Index. "
]
},
{
@@ -41,7 +41,7 @@
},
{
"cell_type": "code",
"execution_count": 5,
"execution_count": 3,
"id": "0e745d99",
"metadata": {},
"outputs": [
@@ -50,8 +50,7 @@
"output_type": "stream",
"text": [
"Running Chroma using direct local API.\n",
"Using DuckDB in-memory for database. Data will be transient.\n",
"Exiting: Cleaning up .chroma directory\n"
"Using DuckDB in-memory for database. Data will be transient.\n"
]
}
],
@@ -61,40 +60,40 @@
},
{
"cell_type": "code",
"execution_count": 6,
"execution_count": 4,
"id": "8aa571ae",
"metadata": {},
"outputs": [],
"source": [
"from langchain.chains import VectorDBQAWithSourcesChain"
"from langchain.chains import RetrievalQAWithSourcesChain"
]
},
{
"cell_type": "code",
"execution_count": 7,
"execution_count": 5,
"id": "aa859d4c",
"metadata": {},
"outputs": [],
"source": [
"from langchain import OpenAI\n",
"\n",
"chain = VectorDBQAWithSourcesChain.from_chain_type(OpenAI(temperature=0), chain_type=\"stuff\", vectorstore=docsearch)"
"chain = RetrievalQAWithSourcesChain.from_chain_type(OpenAI(temperature=0), chain_type=\"stuff\", retriever=docsearch.as_retriever())"
]
},
{
"cell_type": "code",
"execution_count": 8,
"execution_count": 6,
"id": "8ba36fa7",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"{'answer': ' The president thanked Justice Breyer for his service and mentioned his legacy of excellence.\\n',\n",
" 'sources': '30-pl'}"
"{'answer': ' The president honored Justice Breyer for his service and mentioned his legacy of excellence.\\n',\n",
" 'sources': '31-pl'}"
]
},
"execution_count": 8,
"execution_count": 6,
"metadata": {},
"output_type": "execute_result"
}
@@ -109,35 +108,35 @@
"metadata": {},
"source": [
"## Chain Type\n",
"You can easily specify different chain types to load and use in the VectorDBQAWithSourcesChain chain. For a more detailed walkthrough of these types, please see [this notebook](qa_with_sources.ipynb).\n",
"You can easily specify different chain types to load and use in the RetrievalQAWithSourcesChain chain. For a more detailed walkthrough of these types, please see [this notebook](qa_with_sources.ipynb).\n",
"\n",
"There are two ways to load different chain types. First, you can specify the chain type argument in the `from_chain_type` method. This allows you to pass in the name of the chain type you want to use. For example, in the below we change the chain type to `map_reduce`."
]
},
{
"cell_type": "code",
"execution_count": 8,
"execution_count": 7,
"id": "8b35b30a",
"metadata": {},
"outputs": [],
"source": [
"chain = VectorDBQAWithSourcesChain.from_chain_type(OpenAI(temperature=0), chain_type=\"map_reduce\", vectorstore=docsearch)"
"chain = RetrievalQAWithSourcesChain.from_chain_type(OpenAI(temperature=0), chain_type=\"map_reduce\", retriever=docsearch.as_retriever())"
]
},
{
"cell_type": "code",
"execution_count": 9,
"execution_count": 8,
"id": "58bd424f",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"{'answer': ' The president honored Justice Stephen Breyer for his service.\\n',\n",
" 'sources': '30-pl'}"
"{'answer': ' The president said \"Justice Breyer—an Army veteran, Constitutional scholar, and retiring Justice of the United States Supreme Court. Justice Breyer, thank you for your service.\"\\n',\n",
" 'sources': '31-pl'}"
]
},
"execution_count": 9,
"execution_count": 8,
"metadata": {},
"output_type": "execute_result"
}
@@ -151,19 +150,19 @@
"id": "21e14eed",
"metadata": {},
"source": [
"The above way allows you to really simply change the chain_type, but it does provide a ton of flexibility over parameters to that chain type. If you want to control those parameters, you can load the chain directly (as you did in [this notebook](qa_with_sources.ipynb)) and then pass that directly to the the VectorDBQA chain with the `combine_documents_chain` parameter. For example:"
"The above way allows you to really simply change the chain_type, but it does provide a ton of flexibility over parameters to that chain type. If you want to control those parameters, you can load the chain directly (as you did in [this notebook](qa_with_sources.ipynb)) and then pass that directly to the the RetrievalQAWithSourcesChain chain with the `combine_documents_chain` parameter. For example:"
]
},
{
"cell_type": "code",
"execution_count": 12,
"execution_count": 10,
"id": "af35f0c6",
"metadata": {},
"outputs": [],
"source": [
"from langchain.chains.qa_with_sources import load_qa_with_sources_chain\n",
"qa_chain = load_qa_with_sources_chain(OpenAI(temperature=0), chain_type=\"stuff\")\n",
"qa = VectorDBQAWithSourcesChain(combine_documents_chain=qa_chain, vectorstore=docsearch)"
"qa = RetrievalQAWithSourcesChain(combine_documents_chain=qa_chain, retriever=docsearch.as_retriever())"
]
},
{
@@ -175,8 +174,8 @@
{
"data": {
"text/plain": [
"{'answer': ' The president honored Justice Stephen Breyer for his service.\\n',\n",
" 'sources': '30-pl'}"
"{'answer': ' The president honored Justice Breyer for his service and mentioned his legacy of excellence.\\n',\n",
" 'sources': '31-pl'}"
]
},
"execution_count": 11,
@@ -187,6 +186,14 @@
"source": [
"qa({\"question\": \"What did the president say about Justice Breyer\"}, return_only_outputs=True)"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "3c594296",
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {

View File

@@ -28,7 +28,7 @@
"from langchain.docstore.document import Document\n",
"import requests\n",
"from langchain.embeddings.openai import OpenAIEmbeddings\n",
"from langchain.vectorstores import Chromama\n",
"from langchain.vectorstores import Chroma\n",
"from langchain.text_splitter import CharacterTextSplitter\n",
"from langchain.prompts import PromptTemplate\n",
"import pathlib\n",

View File

@@ -19,20 +19,20 @@ to pass to the language model. This is implemented in LangChain as the `StuffDoc
**Cons:** Most LLMs have a context length, and for large documents (or many documents) this will not work as it will result in a prompt larger than the context length.
The main downside of this method is that it only works one smaller pieces of data. Once you are working
The main downside of this method is that it only works on smaller pieces of data. Once you are working
with many pieces of data, this approach is no longer feasible. The next two approaches are designed to help deal with that.
## Map Reduce
This method involves an initial prompt on each chunk of data (for summarization tasks, this
This method involves running an initial prompt on each chunk of data (for summarization tasks, this
could be a summary of that chunk; for question-answering tasks, it could be an answer based solely on that chunk).
Then a different prompt is run to combine all the initial outputs. This is implemented in the LangChain as the `MapReduceDocumentsChain`.
**Pros:** Can scale to larger documents (and more documents) than `StuffDocumentsChain`. The calls to the LLM on individual documents are independent and can therefore be parallelized.
**Cons:** Requires many more calls to the LLM than `StuffDocumentsChain`. Loses some information during the final combining call.
**Cons:** Requires many more calls to the LLM than `StuffDocumentsChain`. Loses some information during the final combined call.
## Refine
This method involves an initial prompt on the first chunk of data, generating some output.
This method involves running an initial prompt on the first chunk of data, generating some output.
For the remaining documents, that output is passed in, along with the next document,
asking the LLM to refine the output based on the new document.
@@ -46,6 +46,6 @@ This method involves running an initial prompt on each chunk of data, that not o
task but also gives a score for how certain it is in its answer. The responses are then
ranked according to this score, and the highest score is returned.
**Pros:** Similar pros as `MapReduceDocumentsChain`. Compared to `MapReduceDocumentsChain`, it requires fewer calls.
**Pros:** Similar pros as `MapReduceDocumentsChain`. Requires fewer calls, compared to `MapReduceDocumentsChain`.
**Cons:** Cannot combine information between documents. This means it is most useful when you expect there to be a single simple answer in a single document.

View File

@@ -76,6 +76,129 @@
"doc_result = embeddings.embed_documents([text])"
]
},
{
"cell_type": "markdown",
"id": "bb61bbeb",
"metadata": {},
"source": [
"Let's load the OpenAI Embedding class with first generation models (e.g. text-search-ada-doc-001/text-search-ada-query-001). Note: These are not recommended models - see [here](https://platform.openai.com/docs/guides/embeddings/what-are-embeddings)"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "c0b072cc",
"metadata": {},
"outputs": [],
"source": [
"from langchain.embeddings.openai import OpenAIEmbeddings"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "a56b70f5",
"metadata": {},
"outputs": [],
"source": [
"embeddings = OpenAIEmbeddings(model_name=\"ada\")"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "14aefb64",
"metadata": {},
"outputs": [],
"source": [
"text = \"This is a test document.\""
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "3c39ed33",
"metadata": {},
"outputs": [],
"source": [
"query_result = embeddings.embed_query(text)"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "e3221db6",
"metadata": {},
"outputs": [],
"source": [
"doc_result = embeddings.embed_documents([text])"
]
},
{
"cell_type": "markdown",
"id": "c3852491",
"metadata": {},
"source": [
"## AzureOpenAI\n",
"\n",
"Let's load the OpenAI Embedding class with environment variables set to indicate to use Azure endpoints."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "1b40f827",
"metadata": {},
"outputs": [],
"source": [
"# set the environment variables needed for openai package to know to reach out to azure\n",
"import os\n",
"\n",
"os.environ[\"OPENAI_API_TYPE\"] = \"azure\"\n",
"os.environ[\"OPENAI_API_BASE\"] = \"https://<your-endpoint.openai.azure.com/\"\n",
"os.environ[\"OPENAI_API_KEY\"] = \"your AzureOpenAI key\""
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "bb36d16c",
"metadata": {},
"outputs": [],
"source": [
"embeddings = OpenAIEmbeddings(model=\"your-embeddings-deployment-name\")"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "228abcbb",
"metadata": {},
"outputs": [],
"source": [
"text = \"This is a test document.\""
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "60dd7fad",
"metadata": {},
"outputs": [],
"source": [
"query_result = embeddings.embed_query(text)"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "83bc1a72",
"metadata": {},
"outputs": [],
"source": [
"doc_result = embeddings.embed_documents([text])"
]
},
{
"cell_type": "markdown",
"id": "42f76e43",
@@ -86,6 +209,12 @@
"Let's load the Cohere Embedding class."
]
},
{
"cell_type": "markdown",
"id": "ca9e2b3a",
"metadata": {},
"source": []
},
{
"cell_type": "code",
"execution_count": 1,
@@ -103,7 +232,7 @@
"metadata": {},
"outputs": [],
"source": [
"embeddings = CohereEmbeddings(cohere_api_key= cohere_api_key)"
"embeddings = CohereEmbeddings(cohere_api_key=cohere_api_key)"
]
},
{
@@ -290,7 +419,9 @@
}
],
"source": [
"embeddings = HuggingFaceInstructEmbeddings(query_instruction=\"Represent the query for retrieval: \")"
"embeddings = HuggingFaceInstructEmbeddings(\n",
" query_instruction=\"Represent the query for retrieval: \"\n",
")"
]
},
{
@@ -332,9 +463,9 @@
"outputs": [],
"source": [
"from langchain.embeddings import (\n",
" SelfHostedEmbeddings, \n",
" SelfHostedHuggingFaceEmbeddings, \n",
" SelfHostedHuggingFaceInstructEmbeddings\n",
" SelfHostedEmbeddings,\n",
" SelfHostedHuggingFaceEmbeddings,\n",
" SelfHostedHuggingFaceInstructEmbeddings,\n",
")\n",
"import runhouse as rh"
]
@@ -353,7 +484,7 @@
"# gpu = rh.cluster(name='rh-a10x', instance_type='g5.2xlarge', provider='aws')\n",
"\n",
"# For an existing cluster\n",
"# gpu = rh.cluster(ips=['<ip of the cluster>'], \n",
"# gpu = rh.cluster(ips=['<ip of the cluster>'],\n",
"# ssh_creds={'ssh_user': '...', 'ssh_private_key':'<path_to_key>'},\n",
"# name='my-cluster')"
]
@@ -424,16 +555,22 @@
"outputs": [],
"source": [
"def get_pipeline():\n",
" from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline # Must be inside the function in notebooks\n",
" from transformers import (\n",
" AutoModelForCausalLM,\n",
" AutoTokenizer,\n",
" pipeline,\n",
" ) # Must be inside the function in notebooks\n",
"\n",
" model_id = \"facebook/bart-base\"\n",
" tokenizer = AutoTokenizer.from_pretrained(model_id)\n",
" model = AutoModelForCausalLM.from_pretrained(model_id)\n",
" return pipeline(\"feature-extraction\", model=model, tokenizer=tokenizer)\n",
"\n",
"\n",
"def inference_fn(pipeline, prompt):\n",
" # Return last hidden state of the model\n",
" if isinstance(prompt, list):\n",
" return [emb[0][-1] for emb in pipeline(prompt)] \n",
" return [emb[0][-1] for emb in pipeline(prompt)]\n",
" return pipeline(prompt)[0][-1]"
]
},
@@ -445,10 +582,10 @@
"outputs": [],
"source": [
"embeddings = SelfHostedEmbeddings(\n",
" model_load_fn=get_pipeline, \n",
" model_load_fn=get_pipeline,\n",
" hardware=gpu,\n",
" model_reqs=[\"./\", \"torch\", \"transformers\"],\n",
" inference_fn=inference_fn\n",
" inference_fn=inference_fn,\n",
")"
]
},
@@ -463,6 +600,153 @@
"source": [
"query_result = embeddings.embed_query(text)"
]
},
{
"cell_type": "markdown",
"id": "f9c02c78",
"metadata": {},
"source": [
"## Fake Embeddings\n",
"\n",
"LangChain also provides a fake embedding class. You can use this to test your pipelines."
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "2ffc2e4b",
"metadata": {},
"outputs": [],
"source": [
"from langchain.embeddings import FakeEmbeddings"
]
},
{
"cell_type": "code",
"execution_count": 3,
"id": "80777571",
"metadata": {},
"outputs": [],
"source": [
"embeddings = FakeEmbeddings(size=1352)"
]
},
{
"cell_type": "code",
"execution_count": 5,
"id": "3ec9d8f0",
"metadata": {},
"outputs": [],
"source": [
"query_result = embeddings.embed_query(\"foo\")"
]
},
{
"cell_type": "code",
"execution_count": 6,
"id": "3b9ae9e1",
"metadata": {},
"outputs": [],
"source": [
"doc_results = embeddings.embed_documents([\"foo\"])"
]
},
{
"cell_type": "markdown",
"id": "1f83f273",
"metadata": {},
"source": [
"## SageMaker Endpoint Embeddings\n",
"\n",
"Let's load the SageMaker Endpoints Embeddings class. The class can be used if you host, e.g. your own Hugging Face model on SageMaker.\n",
"\n",
"For instrucstions on how to do this, please see [here](https://www.philschmid.de/custom-inference-huggingface-sagemaker)"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "88d366bd",
"metadata": {},
"outputs": [],
"source": [
"!pip3 install langchain boto3"
]
},
{
"cell_type": "code",
"execution_count": 3,
"id": "1e9b926a",
"metadata": {},
"outputs": [],
"source": [
"from typing import Dict\n",
"from langchain.embeddings import SagemakerEndpointEmbeddings\n",
"from langchain.llms.sagemaker_endpoint import ContentHandlerBase\n",
"import json\n",
"\n",
"\n",
"class ContentHandler(ContentHandlerBase):\n",
" content_type = \"application/json\"\n",
" accepts = \"application/json\"\n",
"\n",
" def transform_input(self, prompt: str, model_kwargs: Dict) -> bytes:\n",
" input_str = json.dumps({\"inputs\": prompt, **model_kwargs})\n",
" return input_str.encode('utf-8')\n",
" \n",
" def transform_output(self, output: bytes) -> str:\n",
" response_json = json.loads(output.read().decode(\"utf-8\"))\n",
" return response_json[\"embeddings\"]\n",
"\n",
"content_handler = ContentHandler()\n",
"\n",
"\n",
"embeddings = SagemakerEndpointEmbeddings(\n",
" # endpoint_name=\"endpoint-name\", \n",
" # credentials_profile_name=\"credentials-profile-name\", \n",
" endpoint_name=\"huggingface-pytorch-inference-2023-03-21-16-14-03-834\", \n",
" region_name=\"us-east-1\", \n",
" content_handler=content_handler\n",
")"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "fe9797b8",
"metadata": {},
"outputs": [],
"source": [
"query_result = embeddings.embed_query(\"foo\")"
]
},
{
"cell_type": "code",
"execution_count": 6,
"id": "76f1b752",
"metadata": {},
"outputs": [],
"source": [
"doc_results = embeddings.embed_documents([\"foo\"])"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "fff99b21",
"metadata": {},
"outputs": [],
"source": [
"doc_results"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "aaad49f8",
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
@@ -481,11 +765,11 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.9"
"version": "3.9.1"
},
"vscode": {
"interpreter": {
"hash": "ce6f9b0d7cdac41515b0e0c38d0e6e153a2edce81d579281cb1ab99da6e8ea6d"
"hash": "7377c2ccc78bc62c2683122d48c8cd1fb85a53850a1b1fc29736ed39852c9885"
}
}
},

View File

@@ -176,6 +176,77 @@
"docs"
]
},
{
"cell_type": "markdown",
"id": "3a2f572e",
"metadata": {},
"source": [
"## Latex Text Splitter\n",
"\n",
"LatexTextSplitter splits text along Latex headings, headlines, enumerations and more. It's implemented as a simple subclass of RecursiveCharacterSplitter with Latex-specific separators. See the source code to see the Latex syntax expected by default.\n",
"\n",
"1. How the text is split: by list of latex specific tags\n",
"2. How the chunk size is measured: by length function passed in (defaults to number of characters)"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "c2503917",
"metadata": {},
"outputs": [],
"source": [
"from langchain.text_splitter import LatexTextSplitter"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "e46b753b",
"metadata": {},
"outputs": [],
"source": [
"latex_text = \"\"\"\n",
"\\documentclass{article}\n",
"\n",
"\\begin{document}\n",
"\n",
"\\maketitle\n",
"\n",
"\\section{Introduction}\n",
"Large language models (LLMs) are a type of machine learning model that can be trained on vast amounts of text data to generate human-like language. In recent years, LLMs have made significant advances in a variety of natural language processing tasks, including language translation, text generation, and sentiment analysis.\n",
"\n",
"\\subsection{History of LLMs}\n",
"The earliest LLMs were developed in the 1980s and 1990s, but they were limited by the amount of data that could be processed and the computational power available at the time. In the past decade, however, advances in hardware and software have made it possible to train LLMs on massive datasets, leading to significant improvements in performance.\n",
"\n",
"\\subsection{Applications of LLMs}\n",
"LLMs have many applications in industry, including chatbots, content creation, and virtual assistants. They can also be used in academia for research in linguistics, psychology, and computational linguistics.\n",
"\n",
"\\end{document}\n",
"\"\"\"\n",
"latex_splitter = LatexTextSplitter(chunk_size=400, chunk_overlap=0)"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "73b5bd33",
"metadata": {},
"outputs": [],
"source": [
"docs = latex_splitter.create_documents([latex_text])"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "e1c7fbd5",
"metadata": {},
"outputs": [],
"source": [
"docs"
]
},
{
"cell_type": "markdown",
"id": "c350765d",

View File

@@ -1,20 +1,71 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "fcc8bb1c",
"metadata": {},
"source": [
"# Getting Started\n",
"\n",
"LangChain primary focuses on constructing indexes with the goal of using them as a Retriever. In order to best understand what this means, it's worth highlighting what the base Retriever interface is. The `BaseRetriever` class in LangChain is as follows:"
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "b09ac324",
"metadata": {},
"outputs": [],
"source": [
"from abc import ABC, abstractmethod\n",
"from typing import List\n",
"from langchain.schema import Document\n",
"\n",
"class BaseRetriever(ABC):\n",
" @abstractmethod\n",
" def get_relevant_documents(self, query: str) -> List[Document]:\n",
" \"\"\"Get texts relevant for a query.\n",
"\n",
" Args:\n",
" query: string to find relevant tests for\n",
"\n",
" Returns:\n",
" List of relevant documents\n",
" \"\"\""
]
},
{
"cell_type": "markdown",
"id": "e19d4adb",
"metadata": {},
"source": [
"It's that simple! The `get_relevant_documents` method can be implemented however you see fit.\n",
"\n",
"Of course, we also help construct what we think useful Retrievers are. The main type of Retriever that we focus on is a Vectorstore retriever. We will focus on that for the rest of this guide.\n",
"\n",
"In order to understand what a vectorstore retriever is, it's important to understand what a Vectorstore is. So let's look at that."
]
},
{
"cell_type": "markdown",
"id": "2244801b",
"metadata": {},
"source": [
"# Getting Started\n",
"By default, LangChain uses [Chroma](../../ecosystem/chroma.md) as the vectorstore to index and search embeddings. To walk through this tutorial, we'll first need to install `chromadb`.\n",
"\n",
"```\n",
"pip install chromadb\n",
"```\n",
"\n",
"This example showcases question answering over documents.\n",
"We have chosen this as the example for getting started because it nicely combines a lot of different elements (Text splitters, embeddings, vectorstores) and then also shows how to use them in a chain.\n",
"\n",
"Question answering over documents consists of three steps:\n",
"Question answering over documents consists of four steps:\n",
"\n",
"1. Create an index\n",
"2. Create a question answering chain\n",
"3. Ask questions!\n",
"2. Create a Retriever from that index\n",
"3. Create a question answering chain\n",
"4. Ask questions!\n",
"\n",
"Each of the steps has multiple sub steps and potential configurations. In this notebook we will primarily focus on (1). We will start by showing the one-liner for doing so, but then break down what is actually going on.\n",
"\n",
@@ -23,12 +74,12 @@
},
{
"cell_type": "code",
"execution_count": 1,
"execution_count": 3,
"id": "8d369452",
"metadata": {},
"outputs": [],
"source": [
"from langchain.chains import VectorDBQA\n",
"from langchain.chains import RetrievalQA\n",
"from langchain.llms import OpenAI"
]
},
@@ -37,12 +88,12 @@
"id": "07c1e3b9",
"metadata": {},
"source": [
"Next in the generic setup, let's specify the document loader we want to use."
"Next in the generic setup, let's specify the document loader we want to use. You can download the `state_of_the_union.txt` file [here](https://github.com/hwchase17/langchain/blob/master/docs/modules/state_of_the_union.txt)"
]
},
{
"cell_type": "code",
"execution_count": 2,
"execution_count": 4,
"id": "33958a86",
"metadata": {},
"outputs": [],
@@ -63,7 +114,7 @@
},
{
"cell_type": "code",
"execution_count": 3,
"execution_count": 5,
"id": "403fc231",
"metadata": {},
"outputs": [],
@@ -73,7 +124,7 @@
},
{
"cell_type": "code",
"execution_count": 4,
"execution_count": 6,
"id": "57a8a199",
"metadata": {},
"outputs": [
@@ -100,7 +151,7 @@
},
{
"cell_type": "code",
"execution_count": 5,
"execution_count": 7,
"id": "23d0d234",
"metadata": {},
"outputs": [
@@ -110,7 +161,7 @@
"\" The president said that Ketanji Brown Jackson is one of the nation's top legal minds, a former top litigator in private practice, a former federal public defender, and from a family of public school educators and police officers. He also said that she is a consensus builder and has received a broad range of support from the Fraternal Order of Police to former judges appointed by Democrats and Republicans.\""
]
},
"execution_count": 5,
"execution_count": 7,
"metadata": {},
"output_type": "execute_result"
}
@@ -122,7 +173,7 @@
},
{
"cell_type": "code",
"execution_count": 6,
"execution_count": 8,
"id": "ae46b239",
"metadata": {},
"outputs": [
@@ -134,7 +185,7 @@
" 'sources': '../state_of_the_union.txt'}"
]
},
"execution_count": 6,
"execution_count": 8,
"metadata": {},
"output_type": "execute_result"
}
@@ -154,17 +205,17 @@
},
{
"cell_type": "code",
"execution_count": 7,
"execution_count": 9,
"id": "b04f3c10",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"<langchain.vectorstores.chroma.Chroma at 0x113a3a700>"
"<langchain.vectorstores.chroma.Chroma at 0x119aa5940>"
]
},
"execution_count": 7,
"execution_count": 9,
"metadata": {},
"output_type": "execute_result"
}
@@ -173,6 +224,35 @@
"index.vectorstore"
]
},
{
"cell_type": "markdown",
"id": "297ccfa4",
"metadata": {},
"source": [
"If we then want to access the VectorstoreRetriever, we can do that with:"
]
},
{
"cell_type": "code",
"execution_count": 10,
"id": "b8fef77d",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"VectorStoreRetriever(vectorstore=<langchain.vectorstores.chroma.Chroma object at 0x119aa5940>, search_kwargs={})"
]
},
"execution_count": 10,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"index.vectorstore.as_retriever()"
]
},
{
"cell_type": "markdown",
"id": "2cb6d2eb",
@@ -195,7 +275,7 @@
},
{
"cell_type": "code",
"execution_count": 6,
"execution_count": 11,
"id": "54270abc",
"metadata": {},
"outputs": [],
@@ -213,7 +293,7 @@
},
{
"cell_type": "code",
"execution_count": 8,
"execution_count": 12,
"id": "afecb8cf",
"metadata": {},
"outputs": [],
@@ -233,7 +313,7 @@
},
{
"cell_type": "code",
"execution_count": 10,
"execution_count": 13,
"id": "9eaaa735",
"metadata": {},
"outputs": [],
@@ -252,7 +332,7 @@
},
{
"cell_type": "code",
"execution_count": 11,
"execution_count": 14,
"id": "5c7049db",
"metadata": {},
"outputs": [
@@ -270,38 +350,55 @@
"db = Chroma.from_documents(texts, embeddings)"
]
},
{
"cell_type": "markdown",
"id": "f0ef85a6",
"metadata": {},
"source": [
"So that's creating the index. Then, we expose this index in a retriever interface."
]
},
{
"cell_type": "code",
"execution_count": 15,
"id": "13495c77",
"metadata": {},
"outputs": [],
"source": [
"retriever = db.as_retriever()"
]
},
{
"cell_type": "markdown",
"id": "30c4e5c6",
"metadata": {},
"source": [
"So that's creating the index.\n",
"Then, as before, we create a chain and use it to answer questions!"
]
},
{
"cell_type": "code",
"execution_count": 12,
"execution_count": 16,
"id": "3018f865",
"metadata": {},
"outputs": [],
"source": [
"qa = VectorDBQA.from_chain_type(llm=OpenAI(), chain_type=\"stuff\", vectorstore=db)"
"qa = RetrievalQA.from_chain_type(llm=OpenAI(), chain_type=\"stuff\", retriever=retriever)"
]
},
{
"cell_type": "code",
"execution_count": 13,
"execution_count": 17,
"id": "032a47f8",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"\" The President said that Ketanji Brown Jackson is one of the nation's top legal minds and a consensus builder, with a broad range of support from the Fraternal Order of Police to former judges appointed by Democrats and Republicans. She is a former top litigator in private practice, a former federal public defender, and from a family of public school educators and police officers.\""
"\" The President said that Judge Ketanji Brown Jackson is one of the nation's top legal minds, a former top litigator in private practice, a former federal public defender, and from a family of public school educators and police officers. He said she is a consensus builder and has received a broad range of support from organizations such as the Fraternal Order of Police and former judges appointed by Democrats and Republicans.\""
]
},
"execution_count": 13,
"execution_count": 17,
"metadata": {},
"output_type": "execute_result"
}

View File

@@ -36,6 +36,8 @@ In the below guides, we cover different types of vectorstores and how to use the
`Chroma <./vectorstore_examples/chroma.html>`_: A walkthrough of how to use the Chroma vectorstore wrapper.
`AtlasDB <./vectorstore_examples/atlas.html>`_: A walkthrough of how to use the AtlasDB vectorstore and visualizer wrapper.
`DeepLake <./vectorstore_examples/deeplake.html>`_: A walkthrough of how to use the Deep Lake, data lake, wrapper.
`FAISS <./vectorstore_examples/faiss.html>`_: A walkthrough of how to use the FAISS vectorstore wrapper.
@@ -44,12 +46,16 @@ In the below guides, we cover different types of vectorstores and how to use the
`Milvus <./vectorstore_examples/milvus.html>`_: A walkthrough of how to use the Milvus vectorstore wrapper.
`Open Search <./vectorstore_examples/opensearch.html>`_: A walkthrough of how to use the OpenSearch wrapper.
`Pinecone <./vectorstore_examples/pinecone.html>`_: A walkthrough of how to use the Pinecone vectorstore wrapper.
`Qdrant <./vectorstore_examples/qdrant.html>`_: A walkthrough of how to use the Qdrant vectorstore wrapper.
`Weaviate <./vectorstore_examples/weaviate.html>`_: A walkthrough of how to use the Weaviate vectorstore wrapper.
`PGVector <./vectorstore_examples/pgvector.html>`_: A walkthrough of how to use the PGVector (Postgres Vector DB) vectorstore wrapper.
.. toctree::
:maxdepth: 1
@@ -61,6 +67,29 @@ In the below guides, we cover different types of vectorstores and how to use the
vectorstore_examples/*
Retrievers
------------
The retriever interface is a generic interface that makes it easy to combine documents with
language models. This interface exposes a `get_relevant_documents` method which takes in a query
(a string) and returns a list of documents.
`Vectorstore Retriever <./retriever_examples/vectorstore-retriever.html>`_: A walkthrough of how to use a VectorStore as a Retriever.
`ChatGPT Plugin Retriever <./retriever_examples/chatgpt-plugin-retriever.html>`_: A walkthrough of how to use the ChatGPT Plugin Retriever within the LangChain framework.
.. toctree::
:maxdepth: 1
:glob:
:caption: Retrievers
:name: retrievers
:hidden:
retriever_examples/*
Chains
------
@@ -92,4 +121,4 @@ The examples here are all end-to-end chains that use indexes or utils covered ab
:name: chains
:hidden:
./chain_examples/*
./chain_examples/*

View File

@@ -0,0 +1,148 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "1edb9e6b",
"metadata": {},
"source": [
"# ChatGPT Plugin Retriever\n",
"\n",
"This notebook shows how to use the ChatGPT Retriever Plugin within LangChain."
]
},
{
"cell_type": "markdown",
"id": "074b0004",
"metadata": {},
"source": [
"## Create\n",
"\n",
"First, let's go over how to create the ChatGPT Retriever Plugin.\n",
"\n",
"To set up the ChatGPT Retriever Plugin, please follow instructions [here](https://github.com/openai/chatgpt-retrieval-plugin).\n",
"\n",
"You can also create the ChatGPT Retriever Plugin from LangChain document loaders. The below code walks through how to do that."
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "bbe89ca0",
"metadata": {},
"outputs": [],
"source": [
"# STEP 1: Load\n",
"\n",
"# Load documents using LangChain's DocumentLoaders\n",
"# This is from https://langchain.readthedocs.io/en/latest/modules/document_loaders/examples/csv.html\n",
"\n",
"from langchain.document_loaders.csv_loader import CSVLoader\n",
"loader = CSVLoader(file_path='../../document_loaders/examples/example_data/mlb_teams_2012.csv')\n",
"data = loader.load()\n",
"\n",
"\n",
"# STEP 2: Convert\n",
"\n",
"# Convert Document to format expected by https://github.com/openai/chatgpt-retrieval-plugin\n",
"from typing import List\n",
"from langchain.docstore.document import Document\n",
"import json\n",
"\n",
"def write_json(path: str, documents: List[Document])-> None:\n",
" results = [{\"text\": doc.page_content} for doc in documents]\n",
" with open(path, \"w\") as f:\n",
" json.dump(results, f, indent=2)\n",
"\n",
"write_json(\"foo.json\", data)\n",
"\n",
"# STEP 3: Use\n",
"\n",
"# Ingest this as you would any other json file in https://github.com/openai/chatgpt-retrieval-plugin/tree/main/scripts/process_json\n"
]
},
{
"cell_type": "markdown",
"id": "0474661d",
"metadata": {},
"source": [
"## Using the ChatGPT Retriever Plugin\n",
"\n",
"Okay, so we've created the ChatGPT Retriever Plugin, but how do we actually use it?\n",
"\n",
"The below code walks through how to do that."
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "39d6074e",
"metadata": {},
"outputs": [],
"source": [
"from langchain.retrievers import ChatGPTPluginRetriever"
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "33fd23d1",
"metadata": {},
"outputs": [],
"source": [
"retriever = ChatGPTPluginRetriever(url=\"http://0.0.0.0:8000\", bearer_token=\"foo\")"
]
},
{
"cell_type": "code",
"execution_count": 3,
"id": "16250bdf",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"[Document(page_content=\"This is Alice's phone number: 123-456-7890\", lookup_str='', metadata={'id': '456_0', 'metadata': {'source': 'email', 'source_id': '567', 'url': None, 'created_at': '1609592400.0', 'author': 'Alice', 'document_id': '456'}, 'embedding': None, 'score': 0.925571561}, lookup_index=0),\n",
" Document(page_content='This is a document about something', lookup_str='', metadata={'id': '123_0', 'metadata': {'source': 'file', 'source_id': 'https://example.com/doc1', 'url': 'https://example.com/doc1', 'created_at': '1609502400.0', 'author': 'Alice', 'document_id': '123'}, 'embedding': None, 'score': 0.6987589}, lookup_index=0),\n",
" Document(page_content='Team: Angels \"Payroll (millions)\": 154.49 \"Wins\": 89', lookup_str='', metadata={'id': '59c2c0c1-ae3f-4272-a1da-f44a723ea631_0', 'metadata': {'source': None, 'source_id': None, 'url': None, 'created_at': None, 'author': None, 'document_id': '59c2c0c1-ae3f-4272-a1da-f44a723ea631'}, 'embedding': None, 'score': 0.697888613}, lookup_index=0)]"
]
},
"execution_count": 3,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"retriever.get_relevant_documents(\"alice's phone number\")"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "c8b5794b",
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.1"
}
},
"nbformat": 4,
"nbformat_minor": 5
}

View File

@@ -0,0 +1,93 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "fc0db1bc",
"metadata": {},
"source": [
"# VectorStore Retriever\n",
"\n",
"The index - and therefor the retriever - that LangChain has the most support for is a VectorStoreRetriever. As the name suggests, this retriever is backed heavily by a VectorStore.\n",
"\n",
"Once you construct a VectorStore, its very easy to construct a retriever. Let's walk through an example."
]
},
{
"cell_type": "code",
"execution_count": 3,
"id": "5831703b",
"metadata": {},
"outputs": [],
"source": [
"from langchain.document_loaders import TextLoader\n",
"loader = TextLoader('../../state_of_the_union.txt')"
]
},
{
"cell_type": "code",
"execution_count": 7,
"id": "9fbcc58f",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Running Chroma using direct local API.\n",
"Using DuckDB in-memory for database. Data will be transient.\n"
]
}
],
"source": [
"from langchain.text_splitter import CharacterTextSplitter\n",
"from langchain.vectorstores import Chroma\n",
"from langchain.embeddings import OpenAIEmbeddings\n",
"\n",
"documents = loader.load()\n",
"text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)\n",
"texts = text_splitter.split_documents(documents)\n",
"embeddings = OpenAIEmbeddings()\n",
"db = Chroma.from_documents(texts, embeddings)"
]
},
{
"cell_type": "code",
"execution_count": 8,
"id": "0cbfb1af",
"metadata": {},
"outputs": [],
"source": [
"retriever = db.as_retriever()"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "fc12700b",
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.1"
}
},
"nbformat": 4,
"nbformat_minor": 5
}

View File

@@ -2,11 +2,7 @@
"cells": [
{
"cell_type": "markdown",
"metadata": {
"pycharm": {
"name": "#%% md\n"
}
},
"metadata": {},
"source": [
"# AtlasDB\n",
"\n",
@@ -15,10 +11,10 @@
},
{
"cell_type": "code",
"execution_count": 1,
"execution_count": null,
"metadata": {
"pycharm": {
"name": "#%%\n"
"is_executing": true
}
},
"outputs": [],
@@ -32,56 +28,14 @@
},
{
"cell_type": "code",
"execution_count": 2,
"execution_count": null,
"metadata": {
"scrolled": true
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Collecting en-core-web-sm==3.5.0\n",
" Downloading https://github.com/explosion/spacy-models/releases/download/en_core_web_sm-3.5.0/en_core_web_sm-3.5.0-py3-none-any.whl (12.8 MB)\n",
"\u001B[2K \u001B[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001B[0m \u001B[32m12.8/12.8 MB\u001B[0m \u001B[31m90.8 MB/s\u001B[0m eta \u001B[36m0:00:00\u001B[0m00:01\u001B[0m00:01\u001B[0m\n",
"\u001B[?25hRequirement already satisfied: spacy<3.6.0,>=3.5.0 in /home/ubuntu/langchain/.venv/lib/python3.9/site-packages (from en-core-web-sm==3.5.0) (3.5.0)\n",
"Requirement already satisfied: packaging>=20.0 in /home/ubuntu/langchain/.venv/lib/python3.9/site-packages (from spacy<3.6.0,>=3.5.0->en-core-web-sm==3.5.0) (23.0)\n",
"Requirement already satisfied: wasabi<1.2.0,>=0.9.1 in /home/ubuntu/langchain/.venv/lib/python3.9/site-packages (from spacy<3.6.0,>=3.5.0->en-core-web-sm==3.5.0) (1.1.1)\n",
"Requirement already satisfied: langcodes<4.0.0,>=3.2.0 in /home/ubuntu/langchain/.venv/lib/python3.9/site-packages (from spacy<3.6.0,>=3.5.0->en-core-web-sm==3.5.0) (3.3.0)\n",
"Requirement already satisfied: srsly<3.0.0,>=2.4.3 in /home/ubuntu/langchain/.venv/lib/python3.9/site-packages (from spacy<3.6.0,>=3.5.0->en-core-web-sm==3.5.0) (2.4.5)\n",
"Requirement already satisfied: pathy>=0.10.0 in /home/ubuntu/langchain/.venv/lib/python3.9/site-packages (from spacy<3.6.0,>=3.5.0->en-core-web-sm==3.5.0) (0.10.1)\n",
"Requirement already satisfied: setuptools in /home/ubuntu/langchain/.venv/lib/python3.9/site-packages (from spacy<3.6.0,>=3.5.0->en-core-web-sm==3.5.0) (67.4.0)\n",
"Requirement already satisfied: tqdm<5.0.0,>=4.38.0 in /home/ubuntu/langchain/.venv/lib/python3.9/site-packages (from spacy<3.6.0,>=3.5.0->en-core-web-sm==3.5.0) (4.64.1)\n",
"Requirement already satisfied: spacy-loggers<2.0.0,>=1.0.0 in /home/ubuntu/langchain/.venv/lib/python3.9/site-packages (from spacy<3.6.0,>=3.5.0->en-core-web-sm==3.5.0) (1.0.4)\n",
"Requirement already satisfied: smart-open<7.0.0,>=5.2.1 in /home/ubuntu/langchain/.venv/lib/python3.9/site-packages (from spacy<3.6.0,>=3.5.0->en-core-web-sm==3.5.0) (6.3.0)\n",
"Requirement already satisfied: thinc<8.2.0,>=8.1.0 in /home/ubuntu/langchain/.venv/lib/python3.9/site-packages (from spacy<3.6.0,>=3.5.0->en-core-web-sm==3.5.0) (8.1.7)\n",
"Requirement already satisfied: cymem<2.1.0,>=2.0.2 in /home/ubuntu/langchain/.venv/lib/python3.9/site-packages (from spacy<3.6.0,>=3.5.0->en-core-web-sm==3.5.0) (2.0.7)\n",
"Requirement already satisfied: typer<0.8.0,>=0.3.0 in /home/ubuntu/langchain/.venv/lib/python3.9/site-packages (from spacy<3.6.0,>=3.5.0->en-core-web-sm==3.5.0) (0.7.0)\n",
"Requirement already satisfied: requests<3.0.0,>=2.13.0 in /home/ubuntu/langchain/.venv/lib/python3.9/site-packages (from spacy<3.6.0,>=3.5.0->en-core-web-sm==3.5.0) (2.28.2)\n",
"Requirement already satisfied: jinja2 in /home/ubuntu/langchain/.venv/lib/python3.9/site-packages (from spacy<3.6.0,>=3.5.0->en-core-web-sm==3.5.0) (3.1.2)\n",
"Requirement already satisfied: pydantic!=1.8,!=1.8.1,<1.11.0,>=1.7.4 in /home/ubuntu/langchain/.venv/lib/python3.9/site-packages (from spacy<3.6.0,>=3.5.0->en-core-web-sm==3.5.0) (1.10.5)\n",
"Requirement already satisfied: catalogue<2.1.0,>=2.0.6 in /home/ubuntu/langchain/.venv/lib/python3.9/site-packages (from spacy<3.6.0,>=3.5.0->en-core-web-sm==3.5.0) (2.0.8)\n",
"Requirement already satisfied: spacy-legacy<3.1.0,>=3.0.11 in /home/ubuntu/langchain/.venv/lib/python3.9/site-packages (from spacy<3.6.0,>=3.5.0->en-core-web-sm==3.5.0) (3.0.12)\n",
"Requirement already satisfied: numpy>=1.15.0 in /home/ubuntu/langchain/.venv/lib/python3.9/site-packages (from spacy<3.6.0,>=3.5.0->en-core-web-sm==3.5.0) (1.24.2)\n",
"Requirement already satisfied: murmurhash<1.1.0,>=0.28.0 in /home/ubuntu/langchain/.venv/lib/python3.9/site-packages (from spacy<3.6.0,>=3.5.0->en-core-web-sm==3.5.0) (1.0.9)\n",
"Requirement already satisfied: preshed<3.1.0,>=3.0.2 in /home/ubuntu/langchain/.venv/lib/python3.9/site-packages (from spacy<3.6.0,>=3.5.0->en-core-web-sm==3.5.0) (3.0.8)\n",
"Requirement already satisfied: typing-extensions>=4.2.0 in /home/ubuntu/langchain/.venv/lib/python3.9/site-packages (from pydantic!=1.8,!=1.8.1,<1.11.0,>=1.7.4->spacy<3.6.0,>=3.5.0->en-core-web-sm==3.5.0) (4.5.0)\n",
"Requirement already satisfied: charset-normalizer<4,>=2 in /home/ubuntu/langchain/.venv/lib/python3.9/site-packages (from requests<3.0.0,>=2.13.0->spacy<3.6.0,>=3.5.0->en-core-web-sm==3.5.0) (3.0.1)\n",
"Requirement already satisfied: idna<4,>=2.5 in /home/ubuntu/langchain/.venv/lib/python3.9/site-packages (from requests<3.0.0,>=2.13.0->spacy<3.6.0,>=3.5.0->en-core-web-sm==3.5.0) (3.4)\n",
"Requirement already satisfied: certifi>=2017.4.17 in /home/ubuntu/langchain/.venv/lib/python3.9/site-packages (from requests<3.0.0,>=2.13.0->spacy<3.6.0,>=3.5.0->en-core-web-sm==3.5.0) (2022.12.7)\n",
"Requirement already satisfied: urllib3<1.27,>=1.21.1 in /home/ubuntu/langchain/.venv/lib/python3.9/site-packages (from requests<3.0.0,>=2.13.0->spacy<3.6.0,>=3.5.0->en-core-web-sm==3.5.0) (1.26.14)\n",
"Requirement already satisfied: blis<0.8.0,>=0.7.8 in /home/ubuntu/langchain/.venv/lib/python3.9/site-packages (from thinc<8.2.0,>=8.1.0->spacy<3.6.0,>=3.5.0->en-core-web-sm==3.5.0) (0.7.9)\n",
"Requirement already satisfied: confection<1.0.0,>=0.0.1 in /home/ubuntu/langchain/.venv/lib/python3.9/site-packages (from thinc<8.2.0,>=8.1.0->spacy<3.6.0,>=3.5.0->en-core-web-sm==3.5.0) (0.0.4)\n",
"Requirement already satisfied: click<9.0.0,>=7.1.1 in /home/ubuntu/langchain/.venv/lib/python3.9/site-packages (from typer<0.8.0,>=0.3.0->spacy<3.6.0,>=3.5.0->en-core-web-sm==3.5.0) (8.1.3)\n",
"Requirement already satisfied: MarkupSafe>=2.0 in /home/ubuntu/langchain/.venv/lib/python3.9/site-packages (from jinja2->spacy<3.6.0,>=3.5.0->en-core-web-sm==3.5.0) (2.1.2)\n",
"\n",
"\u001B[1m[\u001B[0m\u001B[34;49mnotice\u001B[0m\u001B[1;39;49m]\u001B[0m\u001B[39;49m A new release of pip is available: \u001B[0m\u001B[31;49m23.0\u001B[0m\u001B[39;49m -> \u001B[0m\u001B[32;49m23.0.1\u001B[0m\n",
"\u001B[1m[\u001B[0m\u001B[34;49mnotice\u001B[0m\u001B[1;39;49m]\u001B[0m\u001B[39;49m To update, run: \u001B[0m\u001B[32;49mpip install --upgrade pip\u001B[0m\n",
"\u001B[38;5;2m✔ Download and installation successful\u001B[0m\n",
"You can now load the package via spacy.load('en_core_web_sm')\n"
]
"scrolled": true,
"pycharm": {
"is_executing": true
}
],
},
"outputs": [],
"source": [
"!python -m spacy download en_core_web_sm"
]
@@ -113,51 +67,31 @@
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"2023-02-24 16:13:49.696 | INFO | nomic.project:_create_project:884 - Creating project `test_index_1677255228.136989` in organization `Atlas Demo`\n",
"2023-02-24 16:13:51.087 | INFO | nomic.project:wait_for_project_lock:993 - test_index_1677255228.136989: Project lock is released.\n",
"2023-02-24 16:13:51.225 | INFO | nomic.project:wait_for_project_lock:993 - test_index_1677255228.136989: Project lock is released.\n",
"2023-02-24 16:13:51.481 | INFO | nomic.project:add_text:1351 - Uploading text to Atlas.\n",
"1it [00:00, 1.20it/s]\n",
"2023-02-24 16:13:52.318 | INFO | nomic.project:add_text:1422 - Text upload succeeded.\n",
"2023-02-24 16:13:52.628 | INFO | nomic.project:wait_for_project_lock:993 - test_index_1677255228.136989: Project lock is released.\n",
"2023-02-24 16:13:53.380 | INFO | nomic.project:create_index:1192 - Created map `test_index_1677255228.136989_index` in project `test_index_1677255228.136989`: https://atlas.nomic.ai/map/ee2354a3-7f9a-4c6b-af43-b0cda09d7198/db996d77-8981-48a0-897a-ff2c22bbf541\n"
]
"execution_count": null,
"metadata": {
"pycharm": {
"is_executing": true
}
],
},
"outputs": [],
"source": [
"db = AtlasDB.from_texts(texts=texts,\n",
" name='test_index_'+str(time.time()),\n",
" description='test_index',\n",
" name='test_index_'+str(time.time()), # unique name for your vector store\n",
" description='test_index', #a description for your vector store\n",
" api_key=ATLAS_TEST_API_KEY,\n",
" index_kwargs={'build_topic_model': True})"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {
"scrolled": false
},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"2023-02-24 16:14:09.106 | INFO | nomic.project:wait_for_project_lock:993 - test_index_1677255228.136989: Project lock is released.\n"
]
}
],
"execution_count": null,
"outputs": [],
"source": [
"with db.project.wait_for_project_lock():\n",
" time.sleep(1)"
]
"db.project.wait_for_project_lock()"
],
"metadata": {
"collapsed": false
}
},
{
"cell_type": "code",
@@ -263,4 +197,4 @@
},
"nbformat": 4,
"nbformat_minor": 1
}
}

View File

@@ -62,10 +62,6 @@
"name": "stdout",
"output_type": "stream",
"text": [
"In state after state, new laws have been passed, not only to suppress the vote, but to subvert entire elections. \n",
"\n",
"We cannot let this happen. \n",
"\n",
"Tonight. I call on the Senate to: Pass the Freedom to Vote Act. Pass the John Lewis Voting Rights Act. And while youre at it, pass the Disclose Act so Americans can know who is funding our elections. \n",
"\n",
"Tonight, Id like to honor someone who has dedicated his life to serve this country: Justice Stephen Breyer—an Army veteran, Constitutional scholar, and retiring Justice of the United States Supreme Court. Justice Breyer, thank you for your service. \n",
@@ -200,10 +196,104 @@
"docs[0]"
]
},
{
"cell_type": "markdown",
"id": "57da60d4",
"metadata": {},
"source": [
"## Merging\n",
"You can also merge two FAISS vectorstores"
]
},
{
"cell_type": "code",
"execution_count": 6,
"id": "6dfd2b78",
"metadata": {},
"outputs": [],
"source": [
"db1 = FAISS.from_texts([\"foo\"], embeddings)\n",
"db2 = FAISS.from_texts([\"bar\"], embeddings)"
]
},
{
"cell_type": "code",
"execution_count": 8,
"id": "29960da7",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"{'e0b74348-6c93-4893-8764-943139ec1d17': Document(page_content='foo', lookup_str='', metadata={}, lookup_index=0)}"
]
},
"execution_count": 8,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"db1.docstore._dict"
]
},
{
"cell_type": "code",
"execution_count": 9,
"id": "83392605",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"{'bdc50ae3-a1bb-4678-9260-1b0979578f40': Document(page_content='bar', lookup_str='', metadata={}, lookup_index=0)}"
]
},
"execution_count": 9,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"db2.docstore._dict"
]
},
{
"cell_type": "code",
"execution_count": 10,
"id": "a3fcc1c7",
"metadata": {},
"outputs": [],
"source": [
"db1.merge_from(db2)"
]
},
{
"cell_type": "code",
"execution_count": 11,
"id": "41c51f89",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"{'e0b74348-6c93-4893-8764-943139ec1d17': Document(page_content='foo', lookup_str='', metadata={}, lookup_index=0),\n",
" 'd5211050-c777-493d-8825-4800e74cfdb6': Document(page_content='bar', lookup_str='', metadata={}, lookup_index=0)}"
]
},
"execution_count": 11,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"db1.docstore._dict"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "bc8b71f7",
"id": "f80b60de",
"metadata": {},
"outputs": [],
"source": []

View File

@@ -0,0 +1,194 @@
{
"cells": [
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"# PGVector\n",
"\n",
"This notebook shows how to use functionality related to the Postgres vector database (PGVector)."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"## Loading Environment Variables\n",
"from typing import List, Tuple\n",
"from dotenv import load_dotenv\n",
"load_dotenv()"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [],
"source": [
"from langchain.embeddings.openai import OpenAIEmbeddings\n",
"from langchain.text_splitter import CharacterTextSplitter\n",
"from langchain.vectorstores.pgvector import PGVector\n",
"from langchain.document_loaders import TextLoader\n",
"from langchain.docstore.document import Document"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [],
"source": [
"loader = TextLoader('../../state_of_the_union.txt')\n",
"documents = loader.load()\n",
"text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)\n",
"docs = text_splitter.split_documents(documents)\n",
"\n",
"embeddings = OpenAIEmbeddings()"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [],
"source": [
"## PGVector needs the connection string to the database.\n",
"## We will load it from the environment variables.\n",
"import os\n",
"CONNECTION_STRING = PGVector.connection_string_from_db_params(\n",
" driver=os.environ.get(\"PGVECTOR_DRIVER\", \"psycopg2\"),\n",
" host=os.environ.get(\"PGVECTOR_HOST\", \"localhost\"),\n",
" port=int(os.environ.get(\"PGVECTOR_PORT\", \"5432\")),\n",
" database=os.environ.get(\"PGVECTOR_DATABASE\", \"postgres\"),\n",
" user=os.environ.get(\"PGVECTOR_USER\", \"postgres\"),\n",
" password=os.environ.get(\"PGVECTOR_PASSWORD\", \"postgres\"),\n",
")\n",
"\n",
"\n",
"## Example\n",
"# postgresql+psycopg2://username:password@localhost:5432/database_name"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"## Similarity search with score"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"### Similarity Search with Euclidean Distance (Default)"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [],
"source": [
"# The PGVector Module will try to create a table with the name of the collection. So, make sure that the collection name is unique and the user has the \n",
"# permission to create a table.\n",
"\n",
"db = PGVector.from_documents(\n",
" embedding=embeddings,\n",
" documents=docs,\n",
" collection_name=\"state_of_the_union\",\n",
" connection_string=CONNECTION_STRING,\n",
")\n",
"\n",
"query = \"What did the president say about Ketanji Brown Jackson\"\n",
"docs_with_score: List[Tuple[Document, float]] = db.similarity_search_with_score(query)"
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"--------------------------------------------------------------------------------\n",
"Score: 0.6076628081132506\n",
"Tonight. I call on the Senate to: Pass the Freedom to Vote Act. Pass the John Lewis Voting Rights Act. And while youre at it, pass the Disclose Act so Americans can know who is funding our elections. \n",
"\n",
"Tonight, Id like to honor someone who has dedicated his life to serve this country: Justice Stephen Breyer—an Army veteran, Constitutional scholar, and retiring Justice of the United States Supreme Court. Justice Breyer, thank you for your service. \n",
"\n",
"One of the most serious constitutional responsibilities a President has is nominating someone to serve on the United States Supreme Court. \n",
"\n",
"And I did that 4 days ago, when I nominated Circuit Court of Appeals Judge Ketanji Brown Jackson. One of our nations top legal minds, who will continue Justice Breyers legacy of excellence.\n",
"--------------------------------------------------------------------------------\n",
"--------------------------------------------------------------------------------\n",
"Score: 0.6076628081132506\n",
"Tonight. I call on the Senate to: Pass the Freedom to Vote Act. Pass the John Lewis Voting Rights Act. And while youre at it, pass the Disclose Act so Americans can know who is funding our elections. \n",
"\n",
"Tonight, Id like to honor someone who has dedicated his life to serve this country: Justice Stephen Breyer—an Army veteran, Constitutional scholar, and retiring Justice of the United States Supreme Court. Justice Breyer, thank you for your service. \n",
"\n",
"One of the most serious constitutional responsibilities a President has is nominating someone to serve on the United States Supreme Court. \n",
"\n",
"And I did that 4 days ago, when I nominated Circuit Court of Appeals Judge Ketanji Brown Jackson. One of our nations top legal minds, who will continue Justice Breyers legacy of excellence.\n",
"--------------------------------------------------------------------------------\n",
"--------------------------------------------------------------------------------\n",
"Score: 0.6076804780049968\n",
"Tonight. I call on the Senate to: Pass the Freedom to Vote Act. Pass the John Lewis Voting Rights Act. And while youre at it, pass the Disclose Act so Americans can know who is funding our elections. \n",
"\n",
"Tonight, Id like to honor someone who has dedicated his life to serve this country: Justice Stephen Breyer—an Army veteran, Constitutional scholar, and retiring Justice of the United States Supreme Court. Justice Breyer, thank you for your service. \n",
"\n",
"One of the most serious constitutional responsibilities a President has is nominating someone to serve on the United States Supreme Court. \n",
"\n",
"And I did that 4 days ago, when I nominated Circuit Court of Appeals Judge Ketanji Brown Jackson. One of our nations top legal minds, who will continue Justice Breyers legacy of excellence.\n",
"--------------------------------------------------------------------------------\n",
"--------------------------------------------------------------------------------\n",
"Score: 0.6076804780049968\n",
"Tonight. I call on the Senate to: Pass the Freedom to Vote Act. Pass the John Lewis Voting Rights Act. And while youre at it, pass the Disclose Act so Americans can know who is funding our elections. \n",
"\n",
"Tonight, Id like to honor someone who has dedicated his life to serve this country: Justice Stephen Breyer—an Army veteran, Constitutional scholar, and retiring Justice of the United States Supreme Court. Justice Breyer, thank you for your service. \n",
"\n",
"One of the most serious constitutional responsibilities a President has is nominating someone to serve on the United States Supreme Court. \n",
"\n",
"And I did that 4 days ago, when I nominated Circuit Court of Appeals Judge Ketanji Brown Jackson. One of our nations top legal minds, who will continue Justice Breyers legacy of excellence.\n",
"--------------------------------------------------------------------------------\n"
]
}
],
"source": [
"for doc, score in docs_with_score:\n",
" print(\"-\" * 80)\n",
" print(\"Score: \", score)\n",
" print(doc.page_content)\n",
" print(\"-\" * 80)\n"
]
}
],
"metadata": {
"kernelspec": {
"display_name": ".venv",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.10"
},
"orig_nbformat": 4
},
"nbformat": 4,
"nbformat_minor": 2
}

View File

@@ -0,0 +1,191 @@
{
"cells": [
{
"cell_type": "markdown",
"source": [
"# Redis\n",
"\n",
"This notebook shows how to use functionality related to the Redis database."
],
"metadata": {
"collapsed": false
}
},
{
"cell_type": "code",
"execution_count": 1,
"outputs": [],
"source": [
"from langchain.embeddings.openai import OpenAIEmbeddings\n",
"from langchain.text_splitter import CharacterTextSplitter\n",
"from langchain.vectorstores.redis import Redis"
],
"metadata": {
"collapsed": false
}
},
{
"cell_type": "code",
"execution_count": 3,
"outputs": [],
"source": [
"from langchain.document_loaders import TextLoader\n",
"loader = TextLoader('../../state_of_the_union.txt')\n",
"documents = loader.load()\n",
"text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)\n",
"docs = text_splitter.split_documents(documents)\n",
"\n",
"embeddings = OpenAIEmbeddings()"
],
"metadata": {
"collapsed": false
}
},
{
"cell_type": "code",
"execution_count": 4,
"outputs": [],
"source": [
"rds = Redis.from_documents(docs, embeddings, redis_url=\"redis://localhost:6379\", index_name='link')"
],
"metadata": {
"collapsed": false
}
},
{
"cell_type": "code",
"execution_count": 5,
"outputs": [
{
"data": {
"text/plain": "'b564189668a343648996bd5a1d353d4e'"
},
"execution_count": 5,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"rds.index_name"
],
"metadata": {
"collapsed": false
}
},
{
"cell_type": "markdown",
"source": [],
"metadata": {
"collapsed": false
}
},
{
"cell_type": "code",
"execution_count": 6,
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"In state after state, new laws have been passed, not only to suppress the vote, but to subvert entire elections. \n",
"\n",
"We cannot let this happen. \n",
"\n",
"Tonight. I call on the Senate to: Pass the Freedom to Vote Act. Pass the John Lewis Voting Rights Act. And while youre at it, pass the Disclose Act so Americans can know who is funding our elections. \n",
"\n",
"Tonight, Id like to honor someone who has dedicated his life to serve this country: Justice Stephen Breyer—an Army veteran, Constitutional scholar, and retiring Justice of the United States Supreme Court. Justice Breyer, thank you for your service. \n",
"\n",
"One of the most serious constitutional responsibilities a President has is nominating someone to serve on the United States Supreme Court. \n",
"\n",
"And I did that 4 days ago, when I nominated Circuit Court of Appeals Judge Ketanji Brown Jackson. One of our nations top legal minds, who will continue Justice Breyers legacy of excellence.\n"
]
}
],
"source": [
"query = \"What did the president say about Ketanji Brown Jackson\"\n",
"results = rds.similarity_search(query)\n",
"print(results[0].page_content)"
],
"metadata": {
"collapsed": false
}
},
{
"cell_type": "code",
"execution_count": 7,
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"['doc:333eadf75bd74be393acafa8bca48669']\n"
]
}
],
"source": [
"print(rds.add_texts([\"Ankush went to Princeton\"]))"
],
"metadata": {
"collapsed": false
}
},
{
"cell_type": "code",
"execution_count": 8,
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Ankush went to Princeton\n"
]
}
],
"source": [
"query = \"Princeton\"\n",
"results = rds.similarity_search(query)\n",
"print(results[0].page_content)"
],
"metadata": {
"collapsed": false
}
},
{
"cell_type": "code",
"execution_count": null,
"outputs": [],
"source": [
"#Query\n",
"rds = Redis.from_existing_index(embeddings, redis_url=\"redis://localhost:6379\", index_name='link')\n",
"\n",
"query = \"What did the president say about Ketanji Brown Jackson\"\n",
"results = rds.similarity_search(query)\n",
"print(results[0].page_content)"
],
"metadata": {
"collapsed": false
}
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 2
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython2",
"version": "2.7.6"
}
},
"nbformat": 4,
"nbformat_minor": 0
}

View File

@@ -119,10 +119,39 @@
]
},
{
"attachments": {},
"cell_type": "markdown",
"id": "05e9e2fe",
"metadata": {},
"source": []
"source": [
"## Using PromptLayer Track\n",
"If you would like to use any of the [PromptLayer tracking features](https://magniv.notion.site/Track-4deee1b1f7a34c1680d085f82567dab9), you need to pass the argument `return_pl_id` when instantializing the PromptLayer LLM to get the request id. "
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "1a7315b9",
"metadata": {},
"outputs": [],
"source": [
"llm = PromptLayerOpenAI(return_pl_id=True)\n",
"llm_results = llm.generate([\"Tell me a joke\"])\n",
"\n",
"for res in llm_results.generations:\n",
" pl_request_id = res[0].generation_info[\"pl_request_id\"]\n",
" promptlayer.track.score(request_id=pl_request_id, score=100)"
]
},
{
"attachments": {},
"cell_type": "markdown",
"id": "7eb19139",
"metadata": {},
"source": [
"Using this allows you to track the performance of your model in the PromptLayer dashboard. If you are using a prompt template, you can attach a template to a request as well.\n",
"Overall, this gives you the opportunity to track the performance of different templates and models in the PromptLayer dashboard."
]
}
],
"metadata": {
@@ -145,7 +174,7 @@
},
"vscode": {
"interpreter": {
"hash": "c4fe2cd85a8d9e8baaec5340ce66faff1c77581a9f43e6c45e85e09b6fced008"
"hash": "8a5edab282632443219e051e4ade2d1d5bbc671c781051bf1437897cbdfea0f1"
}
}
},

View File

@@ -0,0 +1,131 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# SageMakerEndpoint\n",
"\n",
"This notebooks goes over how to use an LLM hosted on a SageMaker endpoint."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"!pip3 install langchain boto3"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from langchain.docstore.document import Document"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"example_doc_1 = \"\"\"\n",
"Peter and Elizabeth took a taxi to attend the night party in the city. While in the party, Elizabeth collapsed and was rushed to the hospital.\n",
"Since she was diagnosed with a brain injury, the doctor told Peter to stay besides her until she gets well.\n",
"Therefore, Peter stayed with her at the hospital for 3 days without leaving.\n",
"\"\"\"\n",
"\n",
"docs = [\n",
" Document(\n",
" page_content=example_doc_1,\n",
" )\n",
"]"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from typing import Dict\n",
"\n",
"from langchain import PromptTemplate, SagemakerEndpoint\n",
"from langchain.llms.sagemaker_endpoint import ContentHandlerBase\n",
"from langchain.chains.question_answering import load_qa_chain\n",
"import json\n",
"\n",
"query = \"\"\"How long was Elizabeth hospitalized?\n",
"\"\"\"\n",
"\n",
"prompt_template = \"\"\"Use the following pieces of context to answer the question at the end.\n",
"\n",
"{context}\n",
"\n",
"Question: {question}\n",
"Answer:\"\"\"\n",
"PROMPT = PromptTemplate(\n",
" template=prompt_template, input_variables=[\"context\", \"question\"]\n",
")\n",
"\n",
"class ContentHandler(ContentHandlerBase):\n",
" content_type = \"application/json\"\n",
" accepts = \"application/json\"\n",
"\n",
" def transform_input(self, prompt: str, model_kwargs: Dict) -> bytes:\n",
" input_str = json.dumps({prompt: prompt, **model_kwargs})\n",
" return input_str.encode('utf-8')\n",
" \n",
" def transform_output(self, output: bytes) -> str:\n",
" response_json = json.loads(output.read().decode(\"utf-8\"))\n",
" return response_json[0][\"generated_text\"]\n",
"\n",
"content_handler = ContentHandler()\n",
"\n",
"chain = load_qa_chain(\n",
" llm=SagemakerEndpoint(\n",
" endpoint_name=\"endpoint-name\", \n",
" credentials_profile_name=\"credentials-profile-name\", \n",
" region_name=\"us-west-2\", \n",
" model_kwargs={\"temperature\":1e-10},\n",
" content_handler=content_handler\n",
" ),\n",
" prompt=PROMPT\n",
")\n",
"\n",
"chain({\"input_documents\": docs, \"question\": query}, return_only_outputs=True)\n",
"\n"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.1"
},
"vscode": {
"interpreter": {
"hash": "31f2aee4e71d21fbe5cf8b01ff0e069b9275f58929596ceb00d14d90e3e16cd6"
}
}
},
"nbformat": 4,
"nbformat_minor": 2
}

View File

@@ -30,36 +30,12 @@
"metadata": {},
"outputs": [],
"source": [
"from langchain.memory import ChatMessageHistory"
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "4404d509",
"metadata": {},
"outputs": [],
"source": [
"history = ChatMessageHistory()"
]
},
{
"cell_type": "code",
"execution_count": 3,
"id": "78c1a67b",
"metadata": {},
"outputs": [],
"source": [
"history.add_user_message(\"hi!\")"
]
},
{
"cell_type": "code",
"execution_count": 4,
"id": "525ce606",
"metadata": {},
"outputs": [],
"source": [
"from langchain.memory import ChatMessageHistory\n",
"\n",
"history = ChatMessageHistory()\n",
"\n",
"history.add_user_message(\"hi!\")\n",
"\n",
"history.add_ai_message(\"whats up?\")"
]
},
@@ -331,6 +307,99 @@
"conversation.predict(input=\"Tell me about yourself.\")"
]
},
{
"cell_type": "markdown",
"id": "fb68bb9e",
"metadata": {},
"source": [
"## Saving Message History\n",
"\n",
"You may often to save messages, and then load them to use again. This can be done easily by first converting the messages to normal python dictionaries, saving those (as json or something) and then loading those. Here is an example of doing that."
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "b5acbc4b",
"metadata": {},
"outputs": [],
"source": [
"import json\n",
"\n",
"from langchain.memory import ChatMessageHistory\n",
"from langchain.schema import messages_from_dict, messages_to_dict\n",
"\n",
"history = ChatMessageHistory()\n",
"\n",
"history.add_user_message(\"hi!\")\n",
"\n",
"history.add_ai_message(\"whats up?\")"
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "7812ee21",
"metadata": {},
"outputs": [],
"source": [
"dicts = messages_to_dict(history.messages)"
]
},
{
"cell_type": "code",
"execution_count": 3,
"id": "3ed6e6a0",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"[{'type': 'human', 'data': {'content': 'hi!', 'additional_kwargs': {}}},\n",
" {'type': 'ai', 'data': {'content': 'whats up?', 'additional_kwargs': {}}}]"
]
},
"execution_count": 3,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"dicts"
]
},
{
"cell_type": "code",
"execution_count": 4,
"id": "cdf4ebd2",
"metadata": {},
"outputs": [],
"source": [
"new_messages = messages_from_dict(dicts)"
]
},
{
"cell_type": "code",
"execution_count": 5,
"id": "9724e24b",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"[HumanMessage(content='hi!', additional_kwargs={}),\n",
" AIMessage(content='whats up?', additional_kwargs={})]"
]
},
"execution_count": 5,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"new_messages"
]
},
{
"cell_type": "markdown",
"id": "7826c210",

View File

@@ -1,10 +1,23 @@
How-To Guides
=============
Types
-----
The first set of examples all highlight different types of memory.
`Buffer <./types/buffer.html>`_: How to use a type of memory that just keeps previous messages in a buffer.
`Buffer Window <./types/buffer_window.html>`_: How to use a type of memory that keeps previous messages in a buffer but only uses the previous `k` of them.
`Summary <./types/summary.html>`_: How to use a type of memory that summarizes previous messages.
`Summary Buffer <./types/summary_buffer.html>`_: How to use a type of memory that keeps a buffer of messages up to a point, and then summarizes them.
`Entity Memory <./types/entity_summary_memory.html>`_: How to use a type of memory that organizes information by entity.
`Knowledge Graph Memory <./types/kg.html>`_: How to use a type of memory that extracts and organizes information in a knowledge graph
.. toctree::
:maxdepth: 1
@@ -13,6 +26,10 @@ The first set of examples all highlight different types of memory.
./types/*
Usage
-----
The examples here all highlight how to use memory in different ways.
`Adding Memory <./examples/adding_memory.html>`_: How to add a memory component to any single input chain.

View File

@@ -9,8 +9,8 @@ both at a short term but also at a long term level. The concept of "Memory" exis
One of the simpler forms of memory occurs in chatbots, where they remember previous conversations.
There are a few different ways to accomplish this:
- Buffer: This is just passing in the past `N` interactions in as context. `N` can be chosen based on a fixed number, the length of the interactions, or other!
- Summary: This involves summarizing previous conversations and passing that summary in, instead of the raw dialouge itself. Compared to `Buffer`, this compresses information: meaning it is more lossy, but also less likely to run into context length limits.
- Combination: A combination of the above two approaches, where you compute a summary but also pass in some previous interfactions directly!
- Summary: This involves summarizing previous conversations and passing that summary in, instead of the raw dialogue itself. Compared to `Buffer`, this compresses information: meaning it is more lossy, but also less likely to run into context length limits.
- Combination: A combination of the above two approaches, where you compute a summary but also pass in some previous interactions directly!
## Entity Memory
A more complex form of memory is remembering information about specific entities in the conversation.

View File

@@ -0,0 +1,288 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "ff4be5f3",
"metadata": {},
"source": [
"## ConversationTokenBufferMemory\n",
"\n",
"`ConversationTokenBufferMemory` keeps a buffer of recent interactions in memory, and uses token length rather than number of interactions to determine when to flush interactions.\n",
"\n",
"Let's first walk through how to use the utilities"
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "da3384db",
"metadata": {},
"outputs": [],
"source": [
"from langchain.memory import ConversationTokenBufferMemory\n",
"from langchain.llms import OpenAI\n",
"llm = OpenAI()"
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "e00d4938",
"metadata": {},
"outputs": [],
"source": [
"memory = ConversationTokenBufferMemory(llm=llm, max_token_limit=10)\n",
"memory.save_context({\"input\": \"hi\"}, {\"ouput\": \"whats up\"})\n",
"memory.save_context({\"input\": \"not much you\"}, {\"ouput\": \"not much\"})"
]
},
{
"cell_type": "code",
"execution_count": 3,
"id": "2fe28a28",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"{'history': 'Human: not much you\\nAI: not much'}"
]
},
"execution_count": 3,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"memory.load_memory_variables({})"
]
},
{
"cell_type": "markdown",
"id": "cf57b97a",
"metadata": {},
"source": [
"We can also get the history as a list of messages (this is useful if you are using this with a chat model)."
]
},
{
"cell_type": "code",
"execution_count": 4,
"id": "3422a3a8",
"metadata": {},
"outputs": [],
"source": [
"memory = ConversationTokenBufferMemory(llm=llm, max_token_limit=10, return_messages=True)\n",
"memory.save_context({\"input\": \"hi\"}, {\"ouput\": \"whats up\"})\n",
"memory.save_context({\"input\": \"not much you\"}, {\"ouput\": \"not much\"})"
]
},
{
"cell_type": "markdown",
"id": "a6d2569f",
"metadata": {},
"source": [
"## Using in a chain\n",
"Let's walk through an example, again setting `verbose=True` so we can see the prompt."
]
},
{
"cell_type": "code",
"execution_count": 9,
"id": "ebd68c10",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"\n",
"\u001b[1m> Entering new ConversationChain chain...\u001b[0m\n",
"Prompt after formatting:\n",
"\u001b[32;1m\u001b[1;3mThe following is a friendly conversation between a human and an AI. The AI is talkative and provides lots of specific details from its context. If the AI does not know the answer to a question, it truthfully says it does not know.\n",
"\n",
"Current conversation:\n",
"\n",
"Human: Hi, what's up?\n",
"AI:\u001b[0m\n",
"\n",
"\u001b[1m> Finished chain.\u001b[0m\n"
]
},
{
"data": {
"text/plain": [
"\" Hi there! I'm doing great, just enjoying the day. How about you?\""
]
},
"execution_count": 9,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"from langchain.chains import ConversationChain\n",
"conversation_with_summary = ConversationChain(\n",
" llm=llm, \n",
" # We set a very low max_token_limit for the purposes of testing.\n",
" memory=ConversationTokenBufferMemory(llm=OpenAI(), max_token_limit=60),\n",
" verbose=True\n",
")\n",
"conversation_with_summary.predict(input=\"Hi, what's up?\")"
]
},
{
"cell_type": "code",
"execution_count": 10,
"id": "86207a61",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"\n",
"\u001b[1m> Entering new ConversationChain chain...\u001b[0m\n",
"Prompt after formatting:\n",
"\u001b[32;1m\u001b[1;3mThe following is a friendly conversation between a human and an AI. The AI is talkative and provides lots of specific details from its context. If the AI does not know the answer to a question, it truthfully says it does not know.\n",
"\n",
"Current conversation:\n",
"Human: Hi, what's up?\n",
"AI: Hi there! I'm doing great, just enjoying the day. How about you?\n",
"Human: Just working on writing some documentation!\n",
"AI:\u001b[0m\n",
"\n",
"\u001b[1m> Finished chain.\u001b[0m\n"
]
},
{
"data": {
"text/plain": [
"' Sounds like a productive day! What kind of documentation are you writing?'"
]
},
"execution_count": 10,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"conversation_with_summary.predict(input=\"Just working on writing some documentation!\")"
]
},
{
"cell_type": "code",
"execution_count": 11,
"id": "76a0ab39",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"\n",
"\u001b[1m> Entering new ConversationChain chain...\u001b[0m\n",
"Prompt after formatting:\n",
"\u001b[32;1m\u001b[1;3mThe following is a friendly conversation between a human and an AI. The AI is talkative and provides lots of specific details from its context. If the AI does not know the answer to a question, it truthfully says it does not know.\n",
"\n",
"Current conversation:\n",
"Human: Hi, what's up?\n",
"AI: Hi there! I'm doing great, just enjoying the day. How about you?\n",
"Human: Just working on writing some documentation!\n",
"AI: Sounds like a productive day! What kind of documentation are you writing?\n",
"Human: For LangChain! Have you heard of it?\n",
"AI:\u001b[0m\n",
"\n",
"\u001b[1m> Finished chain.\u001b[0m\n"
]
},
{
"data": {
"text/plain": [
"\" Yes, I have heard of LangChain! It is a decentralized language-learning platform that connects native speakers and learners in real time. Is that the documentation you're writing about?\""
]
},
"execution_count": 11,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"conversation_with_summary.predict(input=\"For LangChain! Have you heard of it?\")"
]
},
{
"cell_type": "code",
"execution_count": 12,
"id": "8c669db1",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"\n",
"\u001b[1m> Entering new ConversationChain chain...\u001b[0m\n",
"Prompt after formatting:\n",
"\u001b[32;1m\u001b[1;3mThe following is a friendly conversation between a human and an AI. The AI is talkative and provides lots of specific details from its context. If the AI does not know the answer to a question, it truthfully says it does not know.\n",
"\n",
"Current conversation:\n",
"Human: For LangChain! Have you heard of it?\n",
"AI: Yes, I have heard of LangChain! It is a decentralized language-learning platform that connects native speakers and learners in real time. Is that the documentation you're writing about?\n",
"Human: Haha nope, although a lot of people confuse it for that\n",
"AI:\u001b[0m\n",
"\n",
"\u001b[1m> Finished chain.\u001b[0m\n"
]
},
{
"data": {
"text/plain": [
"\" Oh, I see. Is there another language learning platform you're referring to?\""
]
},
"execution_count": 12,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# We can see here that the buffer is updated\n",
"conversation_with_summary.predict(input=\"Haha nope, although a lot of people confuse it for that\")"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "8c09a239",
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.1"
}
},
"nbformat": 4,
"nbformat_minor": 5
}

View File

@@ -21,16 +21,17 @@
"id": "5d56ce86",
"metadata": {},
"source": [
"## Create a custom prompt template\n",
"## Creating a Custom Prompt Template\n",
"\n",
"The only two requirements for all prompt templates are:\n",
"There are essentially two distinct prompt templates available - string prompt templates and chat prompt templates. String prompt templates provides a simple prompt in string format, while chat prompt templates produces a more structured prompt to be used with a chat API.\n",
"\n",
"1. They have a input_variables attribute that exposes what input variables this prompt template expects.\n",
"2. They expose a format method which takes in keyword arguments corresponding to the expected input_variables and returns the formatted prompt.\n",
"In this guide, we will create a custom prompt using a string prompt template. \n",
"\n",
"Let's create a custom prompt template that takes in the function name as input, and formats the prompt template to provide the source code of the function.\n",
"To create a custom string prompt template, there are two requirements:\n",
"1. It has an input_variables attribute that exposes what input variables the prompt template expects.\n",
"2. It exposes a format method that takes in keyword arguments corresponding to the expected input_variables and returns the formatted prompt.\n",
"\n",
"First, let's create a function that will return the source code of a function given its name."
"We will create a custom prompt template that takes in the function name as input and formats the prompt to provide the source code of the function. To achieve this, let's first create a function that will return the source code of a function given its name."
]
},
{
@@ -62,11 +63,11 @@
"metadata": {},
"outputs": [],
"source": [
"from langchain.prompts import BasePromptTemplate\n",
"from langchain.prompts import StringPromptTemplate\n",
"from pydantic import BaseModel, validator\n",
"\n",
"\n",
"class FunctionExplainerPromptTemplate(BasePromptTemplate, BaseModel):\n",
"class FunctionExplainerPromptTemplate(StringPromptTemplate, BaseModel):\n",
" \"\"\" A custom prompt template that takes in the function name as input, and formats the prompt template to provide the source code of the function. \"\"\"\n",
"\n",
" @validator(\"input_variables\")\n",

View File

@@ -0,0 +1,764 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "084ee2f0",
"metadata": {},
"source": [
"# Output Parsers\n",
"\n",
"Language models output text. But many times you may want to get more structured information than just text back. This is where output parsers come in.\n",
"\n",
"Output parsers are classes that help structure language model responses. There are two main methods an output parser must implement:\n",
"\n",
"- `get_format_instructions() -> str`: A method which returns a string containing instructions for how the output of a language model should be formatted.\n",
"- `parse(str) -> Any`: A method which takes in a string (assumed to be the response from a language model) and parses it into some structure.\n",
"\n",
"And then one optional one:\n",
"\n",
"- `parse_with_prompt(str) -> Any`: A method which takes in a string (assumed to be the response from a language model) and a prompt (assumed to the prompt that generated such a response) and parses it into some structure. The prompt is largely provided in the event the OutputParser wants to retry or fix the output in some way, and needs information from the prompt to do so.\n",
"\n",
"\n",
"Below we go over some examples of output parsers."
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "5f0c8a33",
"metadata": {},
"outputs": [],
"source": [
"from langchain.prompts import PromptTemplate, ChatPromptTemplate, HumanMessagePromptTemplate\n",
"from langchain.llms import OpenAI\n",
"from langchain.chat_models import ChatOpenAI"
]
},
{
"cell_type": "markdown",
"id": "a1ae632a",
"metadata": {},
"source": [
"## PydanticOutputParser\n",
"This output parser allows users to specify an arbitrary JSON schema and query LLMs for JSON outputs that conform to that schema.\n",
"\n",
"Keep in mind that large language models are leaky abstractions! You'll have to use an LLM with sufficient capacity to generate well-formed JSON. In the OpenAI family, DaVinci can do reliably but Curie's ability already drops off dramatically. \n",
"\n",
"Use Pydantic to declare your data model. Pydantic's BaseModel like a Python dataclass, but with actual type checking + coercion."
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "cba6d8e3",
"metadata": {},
"outputs": [],
"source": [
"from langchain.output_parsers import PydanticOutputParser\n",
"from pydantic import BaseModel, Field, validator\n",
"from typing import List"
]
},
{
"cell_type": "code",
"execution_count": 3,
"id": "0a203100",
"metadata": {},
"outputs": [],
"source": [
"model_name = 'text-davinci-003'\n",
"temperature = 0.0\n",
"model = OpenAI(model_name=model_name, temperature=temperature)"
]
},
{
"cell_type": "code",
"execution_count": 4,
"id": "b3f16168",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"Joke(setup='Why did the chicken cross the road?', punchline='To get to the other side!')"
]
},
"execution_count": 4,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Define your desired data structure.\n",
"class Joke(BaseModel):\n",
" setup: str = Field(description=\"question to set up a joke\")\n",
" punchline: str = Field(description=\"answer to resolve the joke\")\n",
" \n",
" # You can add custom validation logic easily with Pydantic.\n",
" @validator('setup')\n",
" def question_ends_with_question_mark(cls, field):\n",
" if field[-1] != '?':\n",
" raise ValueError(\"Badly formed question!\")\n",
" return field\n",
"\n",
"# And a query intented to prompt a language model to populate the data structure.\n",
"joke_query = \"Tell me a joke.\"\n",
"\n",
"# Set up a parser + inject instructions into the prompt template.\n",
"parser = PydanticOutputParser(pydantic_object=Joke)\n",
"\n",
"prompt = PromptTemplate(\n",
" template=\"Answer the user query.\\n{format_instructions}\\n{query}\\n\",\n",
" input_variables=[\"query\"],\n",
" partial_variables={\"format_instructions\": parser.get_format_instructions()}\n",
")\n",
"\n",
"_input = prompt.format_prompt(query=joke_query)\n",
"\n",
"output = model(_input.to_string())\n",
"\n",
"parser.parse(output)"
]
},
{
"cell_type": "code",
"execution_count": 5,
"id": "03049f88",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"Actor(name='Tom Hanks', film_names=['Forrest Gump', 'Saving Private Ryan', 'The Green Mile', 'Cast Away', 'Toy Story'])"
]
},
"execution_count": 5,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Here's another example, but with a compound typed field.\n",
"class Actor(BaseModel):\n",
" name: str = Field(description=\"name of an actor\")\n",
" film_names: List[str] = Field(description=\"list of names of films they starred in\")\n",
" \n",
"actor_query = \"Generate the filmography for a random actor.\"\n",
"\n",
"parser = PydanticOutputParser(pydantic_object=Actor)\n",
"\n",
"prompt = PromptTemplate(\n",
" template=\"Answer the user query.\\n{format_instructions}\\n{query}\\n\",\n",
" input_variables=[\"query\"],\n",
" partial_variables={\"format_instructions\": parser.get_format_instructions()}\n",
")\n",
"\n",
"_input = prompt.format_prompt(query=actor_query)\n",
"\n",
"output = model(_input.to_string())\n",
"\n",
"parser.parse(output)"
]
},
{
"cell_type": "markdown",
"id": "4d6c0c86",
"metadata": {},
"source": [
"## Fixing Output Parsing Mistakes\n",
"\n",
"The above guardrail simply tries to parse the LLM response. If it does not parse correctly, then it errors.\n",
"\n",
"But we can do other things besides throw errors. Specifically, we can pass the misformatted output, along with the formatted instructions, to the model and ask it to fix it.\n",
"\n",
"For this example, we'll use the above OutputParser. Here's what happens if we pass it a result that does not comply with the schema:"
]
},
{
"cell_type": "code",
"execution_count": 6,
"id": "73beb20d",
"metadata": {},
"outputs": [],
"source": [
"misformatted = \"{'name': 'Tom Hanks', 'film_names': ['Forrest Gump']}\""
]
},
{
"cell_type": "code",
"execution_count": 7,
"id": "f0e5ba80",
"metadata": {},
"outputs": [
{
"ename": "OutputParserException",
"evalue": "Failed to parse Actor from completion {'name': 'Tom Hanks', 'film_names': ['Forrest Gump']}. Got: Expecting property name enclosed in double quotes: line 1 column 2 (char 1)",
"output_type": "error",
"traceback": [
"\u001b[0;31m---------------------------------------------------------------------------\u001b[0m",
"\u001b[0;31mJSONDecodeError\u001b[0m Traceback (most recent call last)",
"File \u001b[0;32m~/workplace/langchain/langchain/output_parsers/pydantic.py:23\u001b[0m, in \u001b[0;36mPydanticOutputParser.parse\u001b[0;34m(self, text)\u001b[0m\n\u001b[1;32m 22\u001b[0m json_str \u001b[38;5;241m=\u001b[39m match\u001b[38;5;241m.\u001b[39mgroup()\n\u001b[0;32m---> 23\u001b[0m json_object \u001b[38;5;241m=\u001b[39m \u001b[43mjson\u001b[49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43mloads\u001b[49m\u001b[43m(\u001b[49m\u001b[43mjson_str\u001b[49m\u001b[43m)\u001b[49m\n\u001b[1;32m 24\u001b[0m \u001b[38;5;28;01mreturn\u001b[39;00m \u001b[38;5;28mself\u001b[39m\u001b[38;5;241m.\u001b[39mpydantic_object\u001b[38;5;241m.\u001b[39mparse_obj(json_object)\n",
"File \u001b[0;32m~/.pyenv/versions/3.9.1/lib/python3.9/json/__init__.py:346\u001b[0m, in \u001b[0;36mloads\u001b[0;34m(s, cls, object_hook, parse_float, parse_int, parse_constant, object_pairs_hook, **kw)\u001b[0m\n\u001b[1;32m 343\u001b[0m \u001b[38;5;28;01mif\u001b[39;00m (\u001b[38;5;28mcls\u001b[39m \u001b[38;5;129;01mis\u001b[39;00m \u001b[38;5;28;01mNone\u001b[39;00m \u001b[38;5;129;01mand\u001b[39;00m object_hook \u001b[38;5;129;01mis\u001b[39;00m \u001b[38;5;28;01mNone\u001b[39;00m \u001b[38;5;129;01mand\u001b[39;00m\n\u001b[1;32m 344\u001b[0m parse_int \u001b[38;5;129;01mis\u001b[39;00m \u001b[38;5;28;01mNone\u001b[39;00m \u001b[38;5;129;01mand\u001b[39;00m parse_float \u001b[38;5;129;01mis\u001b[39;00m \u001b[38;5;28;01mNone\u001b[39;00m \u001b[38;5;129;01mand\u001b[39;00m\n\u001b[1;32m 345\u001b[0m parse_constant \u001b[38;5;129;01mis\u001b[39;00m \u001b[38;5;28;01mNone\u001b[39;00m \u001b[38;5;129;01mand\u001b[39;00m object_pairs_hook \u001b[38;5;129;01mis\u001b[39;00m \u001b[38;5;28;01mNone\u001b[39;00m \u001b[38;5;129;01mand\u001b[39;00m \u001b[38;5;129;01mnot\u001b[39;00m kw):\n\u001b[0;32m--> 346\u001b[0m \u001b[38;5;28;01mreturn\u001b[39;00m \u001b[43m_default_decoder\u001b[49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43mdecode\u001b[49m\u001b[43m(\u001b[49m\u001b[43ms\u001b[49m\u001b[43m)\u001b[49m\n\u001b[1;32m 347\u001b[0m \u001b[38;5;28;01mif\u001b[39;00m \u001b[38;5;28mcls\u001b[39m \u001b[38;5;129;01mis\u001b[39;00m \u001b[38;5;28;01mNone\u001b[39;00m:\n",
"File \u001b[0;32m~/.pyenv/versions/3.9.1/lib/python3.9/json/decoder.py:337\u001b[0m, in \u001b[0;36mJSONDecoder.decode\u001b[0;34m(self, s, _w)\u001b[0m\n\u001b[1;32m 333\u001b[0m \u001b[38;5;250m\u001b[39m\u001b[38;5;124;03m\"\"\"Return the Python representation of ``s`` (a ``str`` instance\u001b[39;00m\n\u001b[1;32m 334\u001b[0m \u001b[38;5;124;03mcontaining a JSON document).\u001b[39;00m\n\u001b[1;32m 335\u001b[0m \n\u001b[1;32m 336\u001b[0m \u001b[38;5;124;03m\"\"\"\u001b[39;00m\n\u001b[0;32m--> 337\u001b[0m obj, end \u001b[38;5;241m=\u001b[39m \u001b[38;5;28;43mself\u001b[39;49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43mraw_decode\u001b[49m\u001b[43m(\u001b[49m\u001b[43ms\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[43midx\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[43m_w\u001b[49m\u001b[43m(\u001b[49m\u001b[43ms\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[38;5;241;43m0\u001b[39;49m\u001b[43m)\u001b[49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43mend\u001b[49m\u001b[43m(\u001b[49m\u001b[43m)\u001b[49m\u001b[43m)\u001b[49m\n\u001b[1;32m 338\u001b[0m end \u001b[38;5;241m=\u001b[39m _w(s, end)\u001b[38;5;241m.\u001b[39mend()\n",
"File \u001b[0;32m~/.pyenv/versions/3.9.1/lib/python3.9/json/decoder.py:353\u001b[0m, in \u001b[0;36mJSONDecoder.raw_decode\u001b[0;34m(self, s, idx)\u001b[0m\n\u001b[1;32m 352\u001b[0m \u001b[38;5;28;01mtry\u001b[39;00m:\n\u001b[0;32m--> 353\u001b[0m obj, end \u001b[38;5;241m=\u001b[39m \u001b[38;5;28;43mself\u001b[39;49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43mscan_once\u001b[49m\u001b[43m(\u001b[49m\u001b[43ms\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[43midx\u001b[49m\u001b[43m)\u001b[49m\n\u001b[1;32m 354\u001b[0m \u001b[38;5;28;01mexcept\u001b[39;00m \u001b[38;5;167;01mStopIteration\u001b[39;00m \u001b[38;5;28;01mas\u001b[39;00m err:\n",
"\u001b[0;31mJSONDecodeError\u001b[0m: Expecting property name enclosed in double quotes: line 1 column 2 (char 1)",
"\nDuring handling of the above exception, another exception occurred:\n",
"\u001b[0;31mOutputParserException\u001b[0m Traceback (most recent call last)",
"Cell \u001b[0;32mIn[7], line 1\u001b[0m\n\u001b[0;32m----> 1\u001b[0m \u001b[43mparser\u001b[49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43mparse\u001b[49m\u001b[43m(\u001b[49m\u001b[43mmisformatted\u001b[49m\u001b[43m)\u001b[49m\n",
"File \u001b[0;32m~/workplace/langchain/langchain/output_parsers/pydantic.py:29\u001b[0m, in \u001b[0;36mPydanticOutputParser.parse\u001b[0;34m(self, text)\u001b[0m\n\u001b[1;32m 27\u001b[0m name \u001b[38;5;241m=\u001b[39m \u001b[38;5;28mself\u001b[39m\u001b[38;5;241m.\u001b[39mpydantic_object\u001b[38;5;241m.\u001b[39m\u001b[38;5;18m__name__\u001b[39m\n\u001b[1;32m 28\u001b[0m msg \u001b[38;5;241m=\u001b[39m \u001b[38;5;124mf\u001b[39m\u001b[38;5;124m\"\u001b[39m\u001b[38;5;124mFailed to parse \u001b[39m\u001b[38;5;132;01m{\u001b[39;00mname\u001b[38;5;132;01m}\u001b[39;00m\u001b[38;5;124m from completion \u001b[39m\u001b[38;5;132;01m{\u001b[39;00mtext\u001b[38;5;132;01m}\u001b[39;00m\u001b[38;5;124m. Got: \u001b[39m\u001b[38;5;132;01m{\u001b[39;00me\u001b[38;5;132;01m}\u001b[39;00m\u001b[38;5;124m\"\u001b[39m\n\u001b[0;32m---> 29\u001b[0m \u001b[38;5;28;01mraise\u001b[39;00m OutputParserException(msg)\n",
"\u001b[0;31mOutputParserException\u001b[0m: Failed to parse Actor from completion {'name': 'Tom Hanks', 'film_names': ['Forrest Gump']}. Got: Expecting property name enclosed in double quotes: line 1 column 2 (char 1)"
]
}
],
"source": [
"parser.parse(misformatted)"
]
},
{
"cell_type": "markdown",
"id": "6c7c82b6",
"metadata": {},
"source": [
"Now we can construct and use a `OutputFixingParser`. This output parser takes as an argument another output parser but also an LLM with which to try to correct any formatting mistakes."
]
},
{
"cell_type": "code",
"execution_count": 9,
"id": "39b1a5ce",
"metadata": {},
"outputs": [],
"source": [
"from langchain.output_parsers import OutputFixingParser\n",
"\n",
"new_parser = OutputFixingParser.from_llm(parser=parser, llm=ChatOpenAI())"
]
},
{
"cell_type": "code",
"execution_count": 10,
"id": "0fd96d68",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"Actor(name='Tom Hanks', film_names=['Forrest Gump'])"
]
},
"execution_count": 10,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"new_parser.parse(misformatted)"
]
},
{
"cell_type": "markdown",
"id": "ea34eeaa",
"metadata": {},
"source": [
"## Fixing Output Parsing Mistakes with the original prompt\n",
"\n",
"While in some cases it is possible to fix any parsing mistakes by only looking at the output, in other cases it can't. An example of this is when the output is not just in the incorrect format, but is partially complete. Consider the below example."
]
},
{
"cell_type": "code",
"execution_count": 11,
"id": "67c5e1ac",
"metadata": {},
"outputs": [],
"source": [
"template = \"\"\"Based on the user question, provide an Action and Action Input for what step should be taken.\n",
"{format_instructions}\n",
"Question: {query}\n",
"Response:\"\"\"\n",
"class Action(BaseModel):\n",
" action: str = Field(description=\"action to take\")\n",
" action_input: str = Field(description=\"input to the action\")\n",
" \n",
"parser = PydanticOutputParser(pydantic_object=Action)"
]
},
{
"cell_type": "code",
"execution_count": 12,
"id": "007aa87f",
"metadata": {},
"outputs": [],
"source": [
"prompt = PromptTemplate(\n",
" template=\"Answer the user query.\\n{format_instructions}\\n{query}\\n\",\n",
" input_variables=[\"query\"],\n",
" partial_variables={\"format_instructions\": parser.get_format_instructions()}\n",
")"
]
},
{
"cell_type": "code",
"execution_count": 13,
"id": "10d207ff",
"metadata": {},
"outputs": [],
"source": [
"prompt_value = prompt.format_prompt(query=\"who is leo di caprios gf?\")"
]
},
{
"cell_type": "code",
"execution_count": 14,
"id": "68622837",
"metadata": {},
"outputs": [],
"source": [
"bad_response = '{\"action\": \"search\"}'"
]
},
{
"cell_type": "markdown",
"id": "25631465",
"metadata": {},
"source": [
"If we try to parse this response as is, we will get an error"
]
},
{
"cell_type": "code",
"execution_count": 15,
"id": "894967c1",
"metadata": {},
"outputs": [
{
"ename": "OutputParserException",
"evalue": "Failed to parse Action from completion {\"action\": \"search\"}. Got: 1 validation error for Action\naction_input\n field required (type=value_error.missing)",
"output_type": "error",
"traceback": [
"\u001b[0;31m---------------------------------------------------------------------------\u001b[0m",
"\u001b[0;31mValidationError\u001b[0m Traceback (most recent call last)",
"File \u001b[0;32m~/workplace/langchain/langchain/output_parsers/pydantic.py:24\u001b[0m, in \u001b[0;36mPydanticOutputParser.parse\u001b[0;34m(self, text)\u001b[0m\n\u001b[1;32m 23\u001b[0m json_object \u001b[38;5;241m=\u001b[39m json\u001b[38;5;241m.\u001b[39mloads(json_str)\n\u001b[0;32m---> 24\u001b[0m \u001b[38;5;28;01mreturn\u001b[39;00m \u001b[38;5;28;43mself\u001b[39;49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43mpydantic_object\u001b[49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43mparse_obj\u001b[49m\u001b[43m(\u001b[49m\u001b[43mjson_object\u001b[49m\u001b[43m)\u001b[49m\n\u001b[1;32m 26\u001b[0m \u001b[38;5;28;01mexcept\u001b[39;00m (json\u001b[38;5;241m.\u001b[39mJSONDecodeError, ValidationError) \u001b[38;5;28;01mas\u001b[39;00m e:\n",
"File \u001b[0;32m~/.pyenv/versions/3.9.1/envs/langchain/lib/python3.9/site-packages/pydantic/main.py:527\u001b[0m, in \u001b[0;36mpydantic.main.BaseModel.parse_obj\u001b[0;34m()\u001b[0m\n",
"File \u001b[0;32m~/.pyenv/versions/3.9.1/envs/langchain/lib/python3.9/site-packages/pydantic/main.py:342\u001b[0m, in \u001b[0;36mpydantic.main.BaseModel.__init__\u001b[0;34m()\u001b[0m\n",
"\u001b[0;31mValidationError\u001b[0m: 1 validation error for Action\naction_input\n field required (type=value_error.missing)",
"\nDuring handling of the above exception, another exception occurred:\n",
"\u001b[0;31mOutputParserException\u001b[0m Traceback (most recent call last)",
"Cell \u001b[0;32mIn[15], line 1\u001b[0m\n\u001b[0;32m----> 1\u001b[0m \u001b[43mparser\u001b[49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43mparse\u001b[49m\u001b[43m(\u001b[49m\u001b[43mbad_response\u001b[49m\u001b[43m)\u001b[49m\n",
"File \u001b[0;32m~/workplace/langchain/langchain/output_parsers/pydantic.py:29\u001b[0m, in \u001b[0;36mPydanticOutputParser.parse\u001b[0;34m(self, text)\u001b[0m\n\u001b[1;32m 27\u001b[0m name \u001b[38;5;241m=\u001b[39m \u001b[38;5;28mself\u001b[39m\u001b[38;5;241m.\u001b[39mpydantic_object\u001b[38;5;241m.\u001b[39m\u001b[38;5;18m__name__\u001b[39m\n\u001b[1;32m 28\u001b[0m msg \u001b[38;5;241m=\u001b[39m \u001b[38;5;124mf\u001b[39m\u001b[38;5;124m\"\u001b[39m\u001b[38;5;124mFailed to parse \u001b[39m\u001b[38;5;132;01m{\u001b[39;00mname\u001b[38;5;132;01m}\u001b[39;00m\u001b[38;5;124m from completion \u001b[39m\u001b[38;5;132;01m{\u001b[39;00mtext\u001b[38;5;132;01m}\u001b[39;00m\u001b[38;5;124m. Got: \u001b[39m\u001b[38;5;132;01m{\u001b[39;00me\u001b[38;5;132;01m}\u001b[39;00m\u001b[38;5;124m\"\u001b[39m\n\u001b[0;32m---> 29\u001b[0m \u001b[38;5;28;01mraise\u001b[39;00m OutputParserException(msg)\n",
"\u001b[0;31mOutputParserException\u001b[0m: Failed to parse Action from completion {\"action\": \"search\"}. Got: 1 validation error for Action\naction_input\n field required (type=value_error.missing)"
]
}
],
"source": [
"parser.parse(bad_response)"
]
},
{
"cell_type": "markdown",
"id": "f6b64696",
"metadata": {},
"source": [
"If we try to use the `OutputFixingParser` to fix this error, it will be confused - namely, it doesn't know what to actually put for action input."
]
},
{
"cell_type": "code",
"execution_count": 16,
"id": "78b2b40d",
"metadata": {},
"outputs": [],
"source": [
"fix_parser = OutputFixingParser.from_llm(parser=parser, llm=ChatOpenAI())"
]
},
{
"cell_type": "code",
"execution_count": 17,
"id": "4fe1301d",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"Action(action='search', action_input='keyword')"
]
},
"execution_count": 17,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"fix_parser.parse(bad_response)"
]
},
{
"cell_type": "markdown",
"id": "9bd9ea7d",
"metadata": {},
"source": [
"Instead, we can use the RetryOutputParser, which passes in the prompt (as well as the original output) to try again to get a better response."
]
},
{
"cell_type": "code",
"execution_count": 19,
"id": "7e8a8a28",
"metadata": {},
"outputs": [],
"source": [
"from langchain.output_parsers import RetryWithErrorOutputParser"
]
},
{
"cell_type": "code",
"execution_count": 20,
"id": "5c86e141",
"metadata": {},
"outputs": [],
"source": [
"retry_parser = RetryWithErrorOutputParser.from_llm(parser=parser, llm=ChatOpenAI())"
]
},
{
"cell_type": "code",
"execution_count": 21,
"id": "9c04f731",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"Action(action='search', action_input='leo di caprios girlfriend')"
]
},
"execution_count": 21,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"retry_parser.parse_with_prompt(bad_response, prompt_value)"
]
},
{
"cell_type": "markdown",
"id": "61f67890",
"metadata": {},
"source": [
"<br>\n",
"<br>\n",
"<br>\n",
"<br>\n",
"<br>\n",
"<br>\n",
"<br>\n",
"<br>\n",
"\n",
"---"
]
},
{
"cell_type": "markdown",
"id": "64bf525a",
"metadata": {},
"source": [
"# Older, less powerful parsers"
]
},
{
"cell_type": "markdown",
"id": "91871002",
"metadata": {},
"source": [
"## Structured Output Parser\n",
"\n",
"While the Pydantic/JSON parser is more powerful, we initially experimented data structures having text fields only."
]
},
{
"cell_type": "code",
"execution_count": 16,
"id": "b492997a",
"metadata": {},
"outputs": [],
"source": [
"from langchain.output_parsers import StructuredOutputParser, ResponseSchema"
]
},
{
"cell_type": "markdown",
"id": "09473dce",
"metadata": {},
"source": [
"Here we define the response schema we want to receive."
]
},
{
"cell_type": "code",
"execution_count": 17,
"id": "432ac44a",
"metadata": {},
"outputs": [],
"source": [
"response_schemas = [\n",
" ResponseSchema(name=\"answer\", description=\"answer to the user's question\"),\n",
" ResponseSchema(name=\"source\", description=\"source used to answer the user's question, should be a website.\")\n",
"]\n",
"output_parser = StructuredOutputParser.from_response_schemas(response_schemas)"
]
},
{
"cell_type": "markdown",
"id": "7b92ce96",
"metadata": {},
"source": [
"We now get a string that contains instructions for how the response should be formatted, and we then insert that into our prompt."
]
},
{
"cell_type": "code",
"execution_count": 18,
"id": "593cfc25",
"metadata": {},
"outputs": [],
"source": [
"format_instructions = output_parser.get_format_instructions()\n",
"prompt = PromptTemplate(\n",
" template=\"answer the users question as best as possible.\\n{format_instructions}\\n{question}\",\n",
" input_variables=[\"question\"],\n",
" partial_variables={\"format_instructions\": format_instructions}\n",
")"
]
},
{
"cell_type": "markdown",
"id": "0943e783",
"metadata": {},
"source": [
"We can now use this to format a prompt to send to the language model, and then parse the returned result."
]
},
{
"cell_type": "code",
"execution_count": 19,
"id": "106f1ba6",
"metadata": {},
"outputs": [],
"source": [
"model = OpenAI(temperature=0)"
]
},
{
"cell_type": "code",
"execution_count": 20,
"id": "86d9d24f",
"metadata": {},
"outputs": [],
"source": [
"_input = prompt.format_prompt(question=\"what's the capital of france\")\n",
"output = model(_input.to_string())"
]
},
{
"cell_type": "code",
"execution_count": 21,
"id": "956bdc99",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"{'answer': 'Paris', 'source': 'https://en.wikipedia.org/wiki/Paris'}"
]
},
"execution_count": 21,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"output_parser.parse(output)"
]
},
{
"cell_type": "markdown",
"id": "da639285",
"metadata": {},
"source": [
"And here's an example of using this in a chat model"
]
},
{
"cell_type": "code",
"execution_count": 22,
"id": "8f483d7d",
"metadata": {},
"outputs": [],
"source": [
"chat_model = ChatOpenAI(temperature=0)"
]
},
{
"cell_type": "code",
"execution_count": 23,
"id": "f761cbf1",
"metadata": {},
"outputs": [],
"source": [
"prompt = ChatPromptTemplate(\n",
" messages=[\n",
" HumanMessagePromptTemplate.from_template(\"answer the users question as best as possible.\\n{format_instructions}\\n{question}\") \n",
" ],\n",
" input_variables=[\"question\"],\n",
" partial_variables={\"format_instructions\": format_instructions}\n",
")"
]
},
{
"cell_type": "code",
"execution_count": 24,
"id": "edd73ae3",
"metadata": {},
"outputs": [],
"source": [
"_input = prompt.format_prompt(question=\"what's the capital of france\")\n",
"output = chat_model(_input.to_messages())"
]
},
{
"cell_type": "code",
"execution_count": 25,
"id": "a3c8b91e",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"{'answer': 'Paris', 'source': 'https://en.wikipedia.org/wiki/Paris'}"
]
},
"execution_count": 25,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"output_parser.parse(output.content)"
]
},
{
"cell_type": "markdown",
"id": "9936fa27",
"metadata": {},
"source": [
"## CommaSeparatedListOutputParser\n",
"\n",
"Here's another parser strictly less powerful than Pydantic/JSON parsing."
]
},
{
"cell_type": "code",
"execution_count": 26,
"id": "872246d7",
"metadata": {},
"outputs": [],
"source": [
"from langchain.output_parsers import CommaSeparatedListOutputParser"
]
},
{
"cell_type": "code",
"execution_count": 27,
"id": "c3f9aee6",
"metadata": {},
"outputs": [],
"source": [
"output_parser = CommaSeparatedListOutputParser()"
]
},
{
"cell_type": "code",
"execution_count": 28,
"id": "e77871b7",
"metadata": {},
"outputs": [],
"source": [
"format_instructions = output_parser.get_format_instructions()\n",
"prompt = PromptTemplate(\n",
" template=\"List five {subject}.\\n{format_instructions}\",\n",
" input_variables=[\"subject\"],\n",
" partial_variables={\"format_instructions\": format_instructions}\n",
")"
]
},
{
"cell_type": "code",
"execution_count": 29,
"id": "a71cb5d3",
"metadata": {},
"outputs": [],
"source": [
"model = OpenAI(temperature=0)"
]
},
{
"cell_type": "code",
"execution_count": 30,
"id": "783d7d98",
"metadata": {},
"outputs": [],
"source": [
"_input = prompt.format(subject=\"ice cream flavors\")\n",
"output = model(_input)"
]
},
{
"cell_type": "code",
"execution_count": 31,
"id": "fcb81344",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"['Vanilla',\n",
" 'Chocolate',\n",
" 'Strawberry',\n",
" 'Mint Chocolate Chip',\n",
" 'Cookies and Cream']"
]
},
"execution_count": 31,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"output_parser.parse(output)"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.1"
}
},
"nbformat": 4,
"nbformat_minor": 5
}

View File

@@ -120,6 +120,25 @@
"!cat simple_prompt.json"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "de75e959",
"metadata": {},
"outputs": [],
"source": [
"prompt = load_prompt(\"simple_prompt.json\")\n",
"print(prompt.format(adjective=\"funny\", content=\"chickens\"))"
]
},
{
"cell_type": "markdown",
"id": "d1d788f9",
"metadata": {},
"source": [
"Tell me a funny joke about chickens."
]
},
{
"cell_type": "markdown",
"id": "d788a83c",

View File

@@ -32,3 +32,4 @@ The user guide here shows more advanced workflows and how to use the library in
./examples/prompt_serialization.ipynb
./examples/few_shot_examples_data.ipynb
./examples/example_selectors.ipynb
./examples/output_parsers.ipynb

View File

@@ -121,7 +121,8 @@
"tools = [\n",
" Tool(\n",
" name=\"Intermediate Answer\",\n",
" func=search.run\n",
" func=search.run,\n",
" description=\"useful for when you need to ask with search\"\n",
" )\n",
"]\n",
"\n",

File diff suppressed because one or more lines are too long

View File

@@ -0,0 +1,326 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "16763ed3",
"metadata": {},
"source": [
"## Zapier Natural Language Actions API\n",
"\\\n",
"Full docs here: https://nla.zapier.com/api/v1/docs\n",
"\n",
"**Zapier Natural Language Actions** gives you access to the 5k+ apps, 20k+ actions on Zapier's platform through a natural language API interface.\n",
"\n",
"NLA supports apps like Gmail, Salesforce, Trello, Slack, Asana, HubSpot, Google Sheets, Microsoft Teams, and thousands more apps: https://zapier.com/apps\n",
"\n",
"Zapier NLA handles ALL the underlying API auth and translation from natural language --> underlying API call --> return simplified output for LLMs. The key idea is you, or your users, expose a set of actions via an oauth-like setup window, which you can then query and execute via a REST API.\n",
"\n",
"NLA offers both API Key and OAuth for signing NLA API requests.\n",
"\n",
"1. Server-side (API Key): for quickly getting started, testing, and production scenarios where LangChain will only use actions exposed in the developer's Zapier account (and will use the developer's connected accounts on Zapier.com)\n",
"\n",
"2. User-facing (Oauth): for production scenarios where you are deploying an end-user facing application and LangChain needs access to end-user's exposed actions and connected accounts on Zapier.com\n",
"\n",
"This quick start will focus on the server-side use case for brevity. Review [full docs](https://nla.zapier.com/api/v1/docs) or reach out to nla@zapier.com for user-facing oauth developer support.\n",
"\n",
"This example goes over how to use the Zapier integration with a `SimpleSequentialChain`, then an `Agent`.\n",
"In code, below:"
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "a363309c",
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"%load_ext autoreload\n",
"%autoreload 2"
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "5cf33377",
"metadata": {},
"outputs": [],
"source": [
"import os\n",
"\n",
"# get from https://platform.openai.com/\n",
"os.environ[\"OPENAI_API_KEY\"] = os.environ.get(\"OPENAI_API_KEY\", \"\")\n",
"\n",
"# get from https://nla.zapier.com/demo/provider/debug (under User Information, after logging in): \n",
"os.environ[\"ZAPIER_NLA_API_KEY\"] = os.environ.get(\"ZAPIER_NLA_API_KEY\", \"\")"
]
},
{
"cell_type": "markdown",
"id": "4881b484-1b97-478f-b206-aec407ceff66",
"metadata": {},
"source": [
"## Example with Agent\n",
"Zapier tools can be used with an agent. See the example below."
]
},
{
"cell_type": "code",
"execution_count": 3,
"id": "b2044b17-c941-4ffb-8a03-027a35e2df81",
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"from langchain.llms import OpenAI\n",
"from langchain.agents import initialize_agent\n",
"from langchain.agents.agent_toolkits import ZapierToolkit\n",
"from langchain.utilities.zapier import ZapierNLAWrapper"
]
},
{
"cell_type": "code",
"execution_count": 4,
"id": "7b505eeb",
"metadata": {},
"outputs": [],
"source": [
"## step 0. expose gmail 'find email' and slack 'send channel message' actions\n",
"\n",
"# first go here, log in, expose (enable) the two actions: https://nla.zapier.com/demo/start -- for this example, can leave all fields \"Have AI guess\"\n",
"# in an oauth scenario, you'd get your own <provider> id (instead of 'demo') which you route your users through first"
]
},
{
"cell_type": "code",
"execution_count": 5,
"id": "cab18227-c232-4214-9256-bb8dd352266c",
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"llm = OpenAI(temperature=0)\n",
"zapier = ZapierNLAWrapper()\n",
"toolkit = ZapierToolkit.from_zapier_nla_wrapper(zapier)\n",
"agent = initialize_agent(toolkit.get_tools(), llm, agent=\"zero-shot-react-description\", verbose=True)"
]
},
{
"cell_type": "code",
"execution_count": 6,
"id": "f94713de-b64d-465f-a087-00288b5f80ec",
"metadata": {
"tags": []
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"\n",
"\u001b[1m> Entering new AgentExecutor chain...\u001b[0m\n",
"\u001b[32;1m\u001b[1;3m I need to find the email and summarize it.\n",
"Action: Gmail: Find Email\n",
"Action Input: Find the latest email from Silicon Valley Bank\u001b[0m\n",
"Observation: \u001b[31;1m\u001b[1;3m{\"from__name\": \"Silicon Valley Bridge Bank, N.A.\", \"from__email\": \"sreply@svb.com\", \"body_plain\": \"Dear Clients, After chaotic, tumultuous & stressful days, we have clarity on path for SVB, FDIC is fully insuring all deposits & have an ask for clients & partners as we rebuild. Tim Mayopoulos <https://eml.svb.com/NjEwLUtBSy0yNjYAAAGKgoxUeBCLAyF_NxON97X4rKEaNBLG\", \"reply_to__email\": \"sreply@svb.com\", \"subject\": \"Meet the new CEO Tim Mayopoulos\", \"date\": \"Tue, 14 Mar 2023 23:42:29 -0500 (CDT)\", \"message_url\": \"https://mail.google.com/mail/u/0/#inbox/186e393b13cfdf0a\", \"attachment_count\": \"0\", \"to__emails\": \"ankush@langchain.dev\", \"message_id\": \"186e393b13cfdf0a\", \"labels\": \"IMPORTANT, CATEGORY_UPDATES, INBOX\"}\u001b[0m\n",
"Thought:\u001b[32;1m\u001b[1;3m I need to summarize the email and send it to the #test-zapier channel in Slack.\n",
"Action: Slack: Send Channel Message\n",
"Action Input: Send a slack message to the #test-zapier channel with the text \"Silicon Valley Bank has announced that Tim Mayopoulos is the new CEO. FDIC is fully insuring all deposits and they have an ask for clients and partners as they rebuild.\"\u001b[0m\n",
"Observation: \u001b[36;1m\u001b[1;3m{\"message__text\": \"Silicon Valley Bank has announced that Tim Mayopoulos is the new CEO. FDIC is fully insuring all deposits and they have an ask for clients and partners as they rebuild.\", \"message__permalink\": \"https://langchain.slack.com/archives/C04TSGU0RA7/p1678859932375259\", \"channel\": \"C04TSGU0RA7\", \"message__bot_profile__name\": \"Zapier\", \"message__team\": \"T04F8K3FZB5\", \"message__bot_id\": \"B04TRV4R74K\", \"message__bot_profile__deleted\": \"false\", \"message__bot_profile__app_id\": \"A024R9PQM\", \"ts_time\": \"2023-03-15T05:58:52Z\", \"message__bot_profile__icons__image_36\": \"https://avatars.slack-edge.com/2022-08-02/3888649620612_f864dc1bb794cf7d82b0_36.png\", \"message__blocks[]block_id\": \"kdZZ\", \"message__blocks[]elements[]type\": \"['rich_text_section']\"}\u001b[0m\n",
"Thought:\u001b[32;1m\u001b[1;3m I now know the final answer.\n",
"Final Answer: I have sent a summary of the last email from Silicon Valley Bank to the #test-zapier channel in Slack.\u001b[0m\n",
"\n",
"\u001b[1m> Finished chain.\u001b[0m\n"
]
},
{
"data": {
"text/plain": [
"'I have sent a summary of the last email from Silicon Valley Bank to the #test-zapier channel in Slack.'"
]
},
"execution_count": 6,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"agent.run(\"Summarize the last email I received regarding Silicon Valley Bank. Send the summary to the #test-zapier channel in slack.\")"
]
},
{
"cell_type": "markdown",
"id": "bcdea831",
"metadata": {},
"source": [
"# Example with SimpleSequentialChain\n",
"If you need more explicit control, use a chain, like below."
]
},
{
"cell_type": "code",
"execution_count": 7,
"id": "10a46e7e",
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"from langchain.llms import OpenAI\n",
"from langchain.chains import LLMChain, TransformChain, SimpleSequentialChain\n",
"from langchain.prompts import PromptTemplate\n",
"from langchain.tools.zapier.tool import ZapierNLARunAction\n",
"from langchain.utilities.zapier import ZapierNLAWrapper"
]
},
{
"cell_type": "code",
"execution_count": 8,
"id": "b9358048",
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"## step 0. expose gmail 'find email' and slack 'send direct message' actions\n",
"\n",
"# first go here, log in, expose (enable) the two actions: https://nla.zapier.com/demo/start -- for this example, can leave all fields \"Have AI guess\"\n",
"# in an oauth scenario, you'd get your own <provider> id (instead of 'demo') which you route your users through first\n",
"\n",
"actions = ZapierNLAWrapper().list()"
]
},
{
"cell_type": "code",
"execution_count": 9,
"id": "4e80f461",
"metadata": {},
"outputs": [],
"source": [
"## step 1. gmail find email\n",
"\n",
"GMAIL_SEARCH_INSTRUCTIONS = \"Grab the latest email from Silicon Valley Bank\"\n",
"\n",
"def nla_gmail(inputs):\n",
" action = next((a for a in actions if a[\"description\"].startswith(\"Gmail: Find Email\")), None)\n",
" return {\"email_data\": ZapierNLARunAction(action_id=action[\"id\"], zapier_description=action[\"description\"], params_schema=action[\"params\"]).run(inputs[\"instructions\"])}\n",
"gmail_chain = TransformChain(input_variables=[\"instructions\"], output_variables=[\"email_data\"], transform=nla_gmail)"
]
},
{
"cell_type": "code",
"execution_count": 10,
"id": "46893233",
"metadata": {},
"outputs": [],
"source": [
"## step 2. generate draft reply\n",
"\n",
"template = \"\"\"You are an assisstant who drafts replies to an incoming email. Output draft reply in plain text (not JSON).\n",
"\n",
"Incoming email:\n",
"{email_data}\n",
"\n",
"Draft email reply:\"\"\"\n",
"\n",
"prompt_template = PromptTemplate(input_variables=[\"email_data\"], template=template)\n",
"reply_chain = LLMChain(llm=OpenAI(temperature=.7), prompt=prompt_template)"
]
},
{
"cell_type": "code",
"execution_count": 11,
"id": "cd85c4f8",
"metadata": {},
"outputs": [],
"source": [
"## step 3. send draft reply via a slack direct message\n",
"\n",
"SLACK_HANDLE = \"@Ankush Gola\"\n",
"\n",
"def nla_slack(inputs):\n",
" action = next((a for a in actions if a[\"description\"].startswith(\"Slack: Send Direct Message\")), None)\n",
" instructions = f'Send this to {SLACK_HANDLE} in Slack: {inputs[\"draft_reply\"]}'\n",
" return {\"slack_data\": ZapierNLARunAction(action_id=action[\"id\"], zapier_description=action[\"description\"], params_schema=action[\"params\"]).run(instructions)}\n",
"slack_chain = TransformChain(input_variables=[\"draft_reply\"], output_variables=[\"slack_data\"], transform=nla_slack)"
]
},
{
"cell_type": "code",
"execution_count": 12,
"id": "4829cab4",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"\n",
"\u001b[1m> Entering new SimpleSequentialChain chain...\u001b[0m\n",
"\u001b[36;1m\u001b[1;3m{\"from__name\": \"Silicon Valley Bridge Bank, N.A.\", \"from__email\": \"sreply@svb.com\", \"body_plain\": \"Dear Clients, After chaotic, tumultuous & stressful days, we have clarity on path for SVB, FDIC is fully insuring all deposits & have an ask for clients & partners as we rebuild. Tim Mayopoulos <https://eml.svb.com/NjEwLUtBSy0yNjYAAAGKgoxUeBCLAyF_NxON97X4rKEaNBLG\", \"reply_to__email\": \"sreply@svb.com\", \"subject\": \"Meet the new CEO Tim Mayopoulos\", \"date\": \"Tue, 14 Mar 2023 23:42:29 -0500 (CDT)\", \"message_url\": \"https://mail.google.com/mail/u/0/#inbox/186e393b13cfdf0a\", \"attachment_count\": \"0\", \"to__emails\": \"ankush@langchain.dev\", \"message_id\": \"186e393b13cfdf0a\", \"labels\": \"IMPORTANT, CATEGORY_UPDATES, INBOX\"}\u001b[0m\n",
"\u001b[33;1m\u001b[1;3m\n",
"Dear Silicon Valley Bridge Bank, \n",
"\n",
"Thank you for your email and the update regarding your new CEO Tim Mayopoulos. We appreciate your dedication to keeping your clients and partners informed and we look forward to continuing our relationship with you. \n",
"\n",
"Best regards, \n",
"[Your Name]\u001b[0m\n",
"\u001b[38;5;200m\u001b[1;3m{\"message__text\": \"Dear Silicon Valley Bridge Bank, \\n\\nThank you for your email and the update regarding your new CEO Tim Mayopoulos. We appreciate your dedication to keeping your clients and partners informed and we look forward to continuing our relationship with you. \\n\\nBest regards, \\n[Your Name]\", \"message__permalink\": \"https://langchain.slack.com/archives/D04TKF5BBHU/p1678859968241629\", \"channel\": \"D04TKF5BBHU\", \"message__bot_profile__name\": \"Zapier\", \"message__team\": \"T04F8K3FZB5\", \"message__bot_id\": \"B04TRV4R74K\", \"message__bot_profile__deleted\": \"false\", \"message__bot_profile__app_id\": \"A024R9PQM\", \"ts_time\": \"2023-03-15T05:59:28Z\", \"message__blocks[]block_id\": \"p7i\", \"message__blocks[]elements[]elements[]type\": \"[['text']]\", \"message__blocks[]elements[]type\": \"['rich_text_section']\"}\u001b[0m\n",
"\n",
"\u001b[1m> Finished chain.\u001b[0m\n"
]
},
{
"data": {
"text/plain": [
"'{\"message__text\": \"Dear Silicon Valley Bridge Bank, \\\\n\\\\nThank you for your email and the update regarding your new CEO Tim Mayopoulos. We appreciate your dedication to keeping your clients and partners informed and we look forward to continuing our relationship with you. \\\\n\\\\nBest regards, \\\\n[Your Name]\", \"message__permalink\": \"https://langchain.slack.com/archives/D04TKF5BBHU/p1678859968241629\", \"channel\": \"D04TKF5BBHU\", \"message__bot_profile__name\": \"Zapier\", \"message__team\": \"T04F8K3FZB5\", \"message__bot_id\": \"B04TRV4R74K\", \"message__bot_profile__deleted\": \"false\", \"message__bot_profile__app_id\": \"A024R9PQM\", \"ts_time\": \"2023-03-15T05:59:28Z\", \"message__blocks[]block_id\": \"p7i\", \"message__blocks[]elements[]elements[]type\": \"[[\\'text\\']]\", \"message__blocks[]elements[]type\": \"[\\'rich_text_section\\']\"}'"
]
},
"execution_count": 12,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"## finally, execute\n",
"\n",
"overall_chain = SimpleSequentialChain(chains=[gmail_chain, reply_chain, slack_chain], verbose=True)\n",
"overall_chain.run(GMAIL_SEARCH_INSTRUCTIONS)"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "09ff954e-45f2-4595-92ea-91627abde4a0",
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.9"
}
},
"nbformat": 4,
"nbformat_minor": 5
}

View File

@@ -9,3 +9,4 @@ sphinx-typlog-theme==0.8.0
sphinx-panels
toml
myst_nb
sphinx_copybutton

View File

@@ -35,12 +35,28 @@
"\n",
"import langchain\n",
"from langchain.agents import Tool, initialize_agent, load_tools\n",
"from langchain.chat_models import ChatOpenAI\n",
"from langchain.llms import OpenAI"
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "1b62cd48",
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"# Agent run with tracing. Ensure that OPENAI_API_KEY is set appropriately to run this example.\n",
"\n",
"llm = OpenAI(temperature=0)\n",
"tools = load_tools([\"llm-math\"], llm=llm)"
]
},
{
"cell_type": "code",
"execution_count": 3,
"id": "bfa16b79-aa4b-4d41-a067-70d1f593f667",
"metadata": {
"tags": []
@@ -70,16 +86,12 @@
"'1.0891804557407723'"
]
},
"execution_count": 2,
"execution_count": 3,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Agent run with tracing. Ensure that OPENAI_API_KEY is set appropriately to run this example.\n",
"\n",
"llm = OpenAI(temperature=0)\n",
"tools = load_tools([\"llm-math\"], llm=llm)\n",
"agent = initialize_agent(\n",
" tools, llm, agent=\"zero-shot-react-description\", verbose=True\n",
")\n",
@@ -87,10 +99,94 @@
"agent.run(\"What is 2 raised to .123243 power?\")"
]
},
{
"cell_type": "code",
"execution_count": 4,
"id": "4829eb1d",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"\n",
"\u001b[1m> Entering new AgentExecutor chain...\u001b[0m\n",
"\u001b[32;1m\u001b[1;3mQuestion: What is 2 raised to .123243 power?\n",
"Thought: I need a calculator to solve this problem.\n",
"Action:\n",
"```\n",
"{\n",
" \"action\": \"calculator\",\n",
" \"action_input\": \"2^0.123243\"\n",
"}\n",
"```\n",
"\u001b[0m\n",
"Observation: calculator is not a valid tool, try another one.\n",
"\u001b[32;1m\u001b[1;3mI made a mistake, I need to use the correct tool for this question.\n",
"Action:\n",
"```\n",
"{\n",
" \"action\": \"calculator\",\n",
" \"action_input\": \"2^0.123243\"\n",
"}\n",
"```\n",
"\n",
"\u001b[0m\n",
"Observation: calculator is not a valid tool, try another one.\n",
"\u001b[32;1m\u001b[1;3mI made a mistake, the tool name is actually \"calc\" instead of \"calculator\".\n",
"Action:\n",
"```\n",
"{\n",
" \"action\": \"calc\",\n",
" \"action_input\": \"2^0.123243\"\n",
"}\n",
"```\n",
"\n",
"\u001b[0m\n",
"Observation: calc is not a valid tool, try another one.\n",
"\u001b[32;1m\u001b[1;3mI made another mistake, the tool name is actually \"Calculator\" instead of \"calc\".\n",
"Action:\n",
"```\n",
"{\n",
" \"action\": \"Calculator\",\n",
" \"action_input\": \"2^0.123243\"\n",
"}\n",
"```\n",
"\n",
"\u001b[0m\n",
"Observation: \u001b[36;1m\u001b[1;3mAnswer: 1.0891804557407723\n",
"\u001b[0m\n",
"Thought:\u001b[32;1m\u001b[1;3mThe final answer is 1.0891804557407723.\n",
"Final Answer: 1.0891804557407723\u001b[0m\n",
"\n",
"\u001b[1m> Finished chain.\u001b[0m\n"
]
},
{
"data": {
"text/plain": [
"'1.0891804557407723'"
]
},
"execution_count": 4,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Agent run with tracing using a chat model\n",
"agent = initialize_agent(\n",
" tools, ChatOpenAI(temperature=0), agent=\"chat-zero-shot-react-description\", verbose=True\n",
")\n",
"\n",
"agent.run(\"What is 2 raised to .123243 power?\")"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "25addd7f",
"id": "76abfd82",
"metadata": {},
"outputs": [],
"source": []
@@ -112,7 +208,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.9"
"version": "3.9.1"
}
},
"nbformat": 4,

View File

@@ -7,8 +7,8 @@ This page contains instructions for installing and then setting up the environme
1. Ensure you have Docker installed (see [Get Docker](https://docs.docker.com/get-docker/)) and that its running.
2. Install the latest version of `langchain`: `pip install langchain` or `pip install langchain -U` to upgrade your
existing version.
3. Run `langchain-server`
1. This will spin up the server in the terminal.
3. Run `langchain-server`. This command was installed automatically when you ran the above command (`pip install langchain`).
1. This will spin up the server in the terminal, hosted on port `4137` by default.
2. Once you see the terminal
output `langchain-langchain-frontend-1 | ➜ Local: [http://localhost:4173/](http://localhost:4173/)`, navigate
to [http://localhost:4173/](http://localhost:4173/)

View File

@@ -76,7 +76,7 @@ Examples of vector database companies include [Pinecone](https://www.pinecone.io
Although this is perhaps the most common way of document retrieval, people are starting to think about alternative
data structures and indexing techniques specifically for working with language models. For a leading example of this,
check out [GPT Index](https://github.com/jerryjliu/gpt_index) - a collection of data structures created by and optimized
check out [LlamaIndex](https://github.com/jerryjliu/llama_index) - a collection of data structures created by and optimized
for language models.
## Augmenting

View File

@@ -1,13 +1,89 @@
Evaluation
==============
Generative models are notoriously hard to evaluate with traditional metrics. One new way of evaluating them is using language models themselves to do the evaluation. LangChain provides some prompts/chains for assisting in this.
This section of documentation covers how we approach and think about evaluation in LangChain.
Both evaluation of internal chains/agents, but also how we would recommend people building on top of LangChain approach evaluation.
The examples here all highlight how to use language models to assist in evaluation of themselves.
The Problem
-----------
It can be really hard to evaluate LangChain chains and agents.
There are two main reasons for this:
**# 1: Lack of data**
You generally don't have a ton of data to evaluate your chains/agents over before starting a project.
This is usually because Large Language Models (the core of most chains/agents) are terrific few-shot and zero shot learners,
meaning you are almost always able to get started on a particular task (text-to-SQL, question answering, etc) without
a large dataset of examples.
This is in stark contrast to traditional machine learning where you had to first collect a bunch of datapoints
before even getting started using a model.
**# 2: Lack of metrics**
Most chains/agents are performing tasks for which there are not very good metrics to evaluate performance.
For example, one of the most common use cases is generating text of some form.
Evaluating generated text is much more complicated than evaluating a classification prediction, or a numeric prediction.
The Solution
------------
LangChain attempts to tackle both of those issues.
What we have so far are initial passes at solutions - we do not think we have a perfect solution.
So we very much welcome feedback, contributions, integrations, and thoughts on this.
Here is what we have for each problem so far:
**# 1: Lack of data**
We have started `LangChainDatasets <https://huggingface.co/LangChainDatasets>`_ a Community space on Hugging Face.
We intend this to be a collection of open source datasets for evaluating common chains and agents.
We have contributed five datasets of our own to start, but we highly intend this to be a community effort.
In order to contribute a dataset, you simply need to join the community and then you will be able to upload datasets.
We're also aiming to make it as easy as possible for people to create their own datasets.
As a first pass at this, we've added a QAGenerationChain, which given a document comes up
with question-answer pairs that can be used to evaluate question-answering tasks over that document down the line.
See `this notebook <./evaluation/qa_generation.html>`_ for an example of how to use this chain.
**# 2: Lack of metrics**
We have two solutions to the lack of metrics.
The first solution is to use no metrics, and rather just rely on looking at results by eye to get a sense for how the chain/agent is performing.
To assist in this, we have developed (and will continue to develop) `tracing <../tracing.html>`_, a UI-based visualizer of your chain and agent runs.
The second solution we recommend is to use Language Models themselves to evaluate outputs.
For this we have a few different chains and prompts aimed at tackling this issue.
The Examples
------------
We have created a bunch of examples combining the above two solutions to show how we internally evaluate chains and agents when we are developing.
In addition to the examples we've curated, we also highly welcome contributions here.
To facilitate that, we've included a `template notebook <./evaluation/benchmarking_template.html>`_ for community members to use to build their own examples.
The existing examples we have are:
`Question Answering (State of Union) <./evaluation/qa_benchmarking_sota.html>`_: An notebook showing evaluation of a question-answering task over a State-of-the-Union address.
`Question Answering (Paul Graham Essay) <./evaluation/qa_benchmarking_pg.html>`_: An notebook showing evaluation of a question-answering task over a Paul Graham essay.
`SQL Question Answering (Chinook) <./evaluation/sql_qa_benchmarking_chinook.html>`_: An notebook showing evaluation of a question-answering task over a SQL database (the Chinook database).
`Agent Vectorstore <./evaluation/agent_vectordb_sota_pg.html>`_: An notebook showing evaluation of an agent doing question answering while routing between two different vector databases.
`Agent Search + Calculator <./evaluation/agent_benchmarking.html>`_: An notebook showing evaluation of an agent doing question answering using a Search engine and a Calculator as tools.
Other Examples
------------
In addition, we also have some more generic resources for evaluation.
`Question Answering <./evaluation/question_answering.html>`_: An overview of LLMs aimed at evaluating question answering systems in general.
`Data Augmented Question Answering <./evaluation/data_augmented_question_answering.html>`_: An end-to-end example of evaluating a question answering system focused on a specific document (a VectorDBQAChain to be precise). This example highlights how to use LLMs to come up with question/answer examples to evaluate over, and then highlights how to use LLMs to evaluate performance on those generated examples.
`Data Augmented Question Answering <./evaluation/data_augmented_question_answering.html>`_: An end-to-end example of evaluating a question answering system focused on a specific document (a RetrievalQAChain to be precise). This example highlights how to use LLMs to come up with question/answer examples to evaluate over, and then highlights how to use LLMs to evaluate performance on those generated examples.
`Hugging Face Datasets <./evaluation/huggingface_datasets.html>`_: Covers an example of loading and using a dataset from Hugging Face for evaluation.

View File

@@ -0,0 +1,343 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "984169ca",
"metadata": {},
"source": [
"# Agent Benchmarking: Search + Calculator\n",
"\n",
"Here we go over how to benchmark performance of an agent on tasks where it has access to a calculator and a search tool.\n",
"\n",
"It is highly reccomended that you do any evaluation/benchmarking with tracing enabled. See [here](https://langchain.readthedocs.io/en/latest/tracing.html) for an explanation of what tracing is and how to set it up."
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "46bf9205",
"metadata": {},
"outputs": [],
"source": [
"# Comment this out if you are NOT using tracing\n",
"import os\n",
"os.environ[\"LANGCHAIN_HANDLER\"] = \"langchain\""
]
},
{
"cell_type": "markdown",
"id": "8a16b75d",
"metadata": {},
"source": [
"## Loading the data\n",
"First, let's load the data."
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "5b2d5e98",
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"Found cached dataset json (/Users/harrisonchase/.cache/huggingface/datasets/LangChainDatasets___json/LangChainDatasets--agent-search-calculator-8a025c0ce5fb99d2/0.0.0/0f7e3662623656454fcd2b650f34e886a7db4b9104504885bd462096cc7a9f51)\n"
]
},
{
"data": {
"application/vnd.jupyter.widget-view+json": {
"model_id": "3a275586643f4ccfba1a8d54be28c351",
"version_major": 2,
"version_minor": 0
},
"text/plain": [
" 0%| | 0/1 [00:00<?, ?it/s]"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"from langchain.evaluation.loading import load_dataset\n",
"dataset = load_dataset(\"agent-search-calculator\")"
]
},
{
"cell_type": "markdown",
"id": "4ab6a716",
"metadata": {},
"source": [
"## Setting up a chain\n",
"Now we need to load an agent capable of answering these questions."
]
},
{
"cell_type": "code",
"execution_count": 6,
"id": "c18680b5",
"metadata": {},
"outputs": [],
"source": [
"from langchain.llms import OpenAI\n",
"from langchain.chains import LLMMathChain\n",
"from langchain.agents import initialize_agent, Tool, load_tools\n",
"\n",
"tools = load_tools(['serpapi', 'llm-math'], llm=OpenAI(temperature=0))\n",
"agent = initialize_agent(tools, OpenAI(temperature=0), agent=\"zero-shot-react-description\")\n"
]
},
{
"cell_type": "markdown",
"id": "68504a8f",
"metadata": {},
"source": [
"## Make a prediction\n",
"\n",
"First, we can make predictions one datapoint at a time. Doing it at this level of granularity allows use to explore the outputs in detail, and also is a lot cheaper than running over multiple datapoints"
]
},
{
"cell_type": "code",
"execution_count": 7,
"id": "cbcafc92",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"'38,630,316 people live in Canada as of 2023.'"
]
},
"execution_count": 7,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"agent.run(dataset[0]['question'])"
]
},
{
"cell_type": "markdown",
"id": "d0c16cd7",
"metadata": {},
"source": [
"## Make many predictions\n",
"Now we can make predictions"
]
},
{
"cell_type": "code",
"execution_count": 8,
"id": "24b4c66e",
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"Retrying langchain.llms.openai.completion_with_retry.<locals>._completion_with_retry in 4.0 seconds as it raised APIConnectionError: Error communicating with OpenAI: ('Connection aborted.', ConnectionResetError(54, 'Connection reset by peer')).\n"
]
}
],
"source": [
"predictions = []\n",
"predicted_dataset = []\n",
"error_dataset = []\n",
"for data in dataset:\n",
" new_data = {\"input\": data[\"question\"], \"answer\": data[\"answer\"]}\n",
" try:\n",
" predictions.append(agent(new_data))\n",
" predicted_dataset.append(new_data)\n",
" except Exception:\n",
" error_dataset.append(new_data)"
]
},
{
"cell_type": "markdown",
"id": "49d969fb",
"metadata": {},
"source": [
"## Evaluate performance\n",
"Now we can evaluate the predictions. The first thing we can do is look at them by eye."
]
},
{
"cell_type": "code",
"execution_count": 9,
"id": "1d583f03",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"{'input': 'How many people live in canada as of 2023?',\n",
" 'answer': 'approximately 38,625,801',\n",
" 'output': '38,630,316 people live in Canada as of 2023.',\n",
" 'intermediate_steps': [(AgentAction(tool='Search', tool_input='Population of Canada 2023', log=' I need to find population data\\nAction: Search\\nAction Input: Population of Canada 2023'),\n",
" '38,630,316')]}"
]
},
"execution_count": 9,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"predictions[0]"
]
},
{
"cell_type": "markdown",
"id": "4783344b",
"metadata": {},
"source": [
"Next, we can use a language model to score them programatically"
]
},
{
"cell_type": "code",
"execution_count": 10,
"id": "d0a9341d",
"metadata": {},
"outputs": [],
"source": [
"from langchain.evaluation.qa import QAEvalChain"
]
},
{
"cell_type": "code",
"execution_count": 14,
"id": "1612dec1",
"metadata": {},
"outputs": [],
"source": [
"llm = OpenAI(temperature=0)\n",
"eval_chain = QAEvalChain.from_llm(llm)\n",
"graded_outputs = eval_chain.evaluate(dataset, predictions, question_key=\"question\", prediction_key=\"output\")"
]
},
{
"cell_type": "markdown",
"id": "79587806",
"metadata": {},
"source": [
"We can add in the graded output to the `predictions` dict and then get a count of the grades."
]
},
{
"cell_type": "code",
"execution_count": 15,
"id": "2a689df5",
"metadata": {},
"outputs": [],
"source": [
"for i, prediction in enumerate(predictions):\n",
" prediction['grade'] = graded_outputs[i]['text']"
]
},
{
"cell_type": "code",
"execution_count": 16,
"id": "27b61215",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"Counter({' CORRECT': 4, ' INCORRECT': 6})"
]
},
"execution_count": 16,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"from collections import Counter\n",
"Counter([pred['grade'] for pred in predictions])"
]
},
{
"cell_type": "markdown",
"id": "12fe30f4",
"metadata": {},
"source": [
"We can also filter the datapoints to the incorrect examples and look at them."
]
},
{
"cell_type": "code",
"execution_count": 17,
"id": "47c692a1",
"metadata": {},
"outputs": [],
"source": [
"incorrect = [pred for pred in predictions if pred['grade'] == \" INCORRECT\"]"
]
},
{
"cell_type": "code",
"execution_count": 18,
"id": "0ef976c1",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"{'input': \"who is dua lipa's boyfriend? what is his age raised to the .43 power?\",\n",
" 'answer': 'her boyfriend is Romain Gravas. his age raised to the .43 power is approximately 4.9373857399466665',\n",
" 'output': \"Isaac Carew, Dua Lipa's boyfriend, is 36 years old and his age raised to the .43 power is 4.6688516567750975.\",\n",
" 'intermediate_steps': [(AgentAction(tool='Search', tool_input=\"Dua Lipa's boyfriend\", log=' I need to find out who Dua Lipa\\'s boyfriend is and then calculate his age raised to the .43 power\\nAction: Search\\nAction Input: \"Dua Lipa\\'s boyfriend\"'),\n",
" 'Dua and Isaac, a model and a chef, dated on and off from 2013 to 2019. The two first split in early 2017, which is when Dua went on to date LANY ...'),\n",
" (AgentAction(tool='Search', tool_input='Isaac Carew age', log=' I need to find out Isaac\\'s age\\nAction: Search\\nAction Input: \"Isaac Carew age\"'),\n",
" '36 years'),\n",
" (AgentAction(tool='Calculator', tool_input='36^.43', log=' I need to calculate 36 raised to the .43 power\\nAction: Calculator\\nAction Input: 36^.43'),\n",
" 'Answer: 4.6688516567750975\\n')],\n",
" 'grade': ' INCORRECT'}"
]
},
"execution_count": 18,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"incorrect[0]"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "7710401a",
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.1"
}
},
"nbformat": 4,
"nbformat_minor": 5
}

View File

@@ -0,0 +1,503 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "984169ca",
"metadata": {},
"source": [
"# Agent VectorDB Question Answering Benchmarking\n",
"\n",
"Here we go over how to benchmark performance on a question answering task using an agent to route between multiple vectordatabases.\n",
"\n",
"It is highly reccomended that you do any evaluation/benchmarking with tracing enabled. See [here](https://langchain.readthedocs.io/en/latest/tracing.html) for an explanation of what tracing is and how to set it up."
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "7b57a50f",
"metadata": {},
"outputs": [],
"source": [
"# Comment this out if you are NOT using tracing\n",
"import os\n",
"os.environ[\"LANGCHAIN_HANDLER\"] = \"langchain\""
]
},
{
"cell_type": "markdown",
"id": "8a16b75d",
"metadata": {},
"source": [
"## Loading the data\n",
"First, let's load the data."
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "5b2d5e98",
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"Found cached dataset json (/Users/harrisonchase/.cache/huggingface/datasets/LangChainDatasets___json/LangChainDatasets--agent-vectordb-qa-sota-pg-d3ae24016b514f92/0.0.0/0f7e3662623656454fcd2b650f34e886a7db4b9104504885bd462096cc7a9f51)\n"
]
},
{
"data": {
"application/vnd.jupyter.widget-view+json": {
"model_id": "4c389519842e4b65afc33006a531dcbc",
"version_major": 2,
"version_minor": 0
},
"text/plain": [
" 0%| | 0/1 [00:00<?, ?it/s]"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"from langchain.evaluation.loading import load_dataset\n",
"dataset = load_dataset(\"agent-vectordb-qa-sota-pg\")"
]
},
{
"cell_type": "code",
"execution_count": 3,
"id": "61375342",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"{'question': 'What is the purpose of the NATO Alliance?',\n",
" 'answer': 'The purpose of the NATO Alliance is to secure peace and stability in Europe after World War 2.',\n",
" 'steps': [{'tool': 'State of Union QA System', 'tool_input': None},\n",
" {'tool': None, 'tool_input': 'What is the purpose of the NATO Alliance?'}]}"
]
},
"execution_count": 3,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"dataset[0]"
]
},
{
"cell_type": "code",
"execution_count": 4,
"id": "02500304",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"{'question': 'What is the purpose of YC?',\n",
" 'answer': 'The purpose of YC is to cause startups to be founded that would not otherwise have existed.',\n",
" 'steps': [{'tool': 'Paul Graham QA System', 'tool_input': None},\n",
" {'tool': None, 'tool_input': 'What is the purpose of YC?'}]}"
]
},
"execution_count": 4,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"dataset[-1]"
]
},
{
"cell_type": "markdown",
"id": "4ab6a716",
"metadata": {},
"source": [
"## Setting up a chain\n",
"Now we need to create some pipelines for doing question answering. Step one in that is creating indexes over the data in question."
]
},
{
"cell_type": "code",
"execution_count": 5,
"id": "c18680b5",
"metadata": {},
"outputs": [],
"source": [
"from langchain.document_loaders import TextLoader\n",
"loader = TextLoader(\"../../modules/state_of_the_union.txt\")"
]
},
{
"cell_type": "code",
"execution_count": 6,
"id": "7f0de2b3",
"metadata": {},
"outputs": [],
"source": [
"from langchain.indexes import VectorstoreIndexCreator"
]
},
{
"cell_type": "code",
"execution_count": 7,
"id": "ef84ff99",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Running Chroma using direct local API.\n",
"Using DuckDB in-memory for database. Data will be transient.\n"
]
}
],
"source": [
"vectorstore_sota = VectorstoreIndexCreator(vectorstore_kwargs={\"collection_name\":\"sota\"}).from_loaders([loader]).vectorstore"
]
},
{
"cell_type": "markdown",
"id": "f0b5d8f6",
"metadata": {},
"source": [
"Now we can create a question answering chain."
]
},
{
"cell_type": "code",
"execution_count": 8,
"id": "8843cb0c",
"metadata": {},
"outputs": [],
"source": [
"from langchain.chains import RetrievalQA\n",
"from langchain.llms import OpenAI"
]
},
{
"cell_type": "code",
"execution_count": 9,
"id": "573719a0",
"metadata": {},
"outputs": [],
"source": [
"chain_sota = RetrievalQA.from_chain_type(llm=OpenAI(temperature=0), chain_type=\"stuff\", retriever=vectorstore_sota, input_key=\"question\")\n"
]
},
{
"cell_type": "markdown",
"id": "e48b03d8",
"metadata": {},
"source": [
"Now we do the same for the Paul Graham data."
]
},
{
"cell_type": "code",
"execution_count": 10,
"id": "c2dbb014",
"metadata": {},
"outputs": [],
"source": [
"loader = TextLoader(\"../../modules/paul_graham_essay.txt\")"
]
},
{
"cell_type": "code",
"execution_count": 11,
"id": "98d16f08",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Running Chroma using direct local API.\n",
"Using DuckDB in-memory for database. Data will be transient.\n"
]
}
],
"source": [
"vectorstore_pg = VectorstoreIndexCreator(vectorstore_kwargs={\"collection_name\":\"paul_graham\"}).from_loaders([loader]).vectorstore"
]
},
{
"cell_type": "code",
"execution_count": 12,
"id": "ec0aab02",
"metadata": {},
"outputs": [],
"source": [
"chain_pg = RetrievalQA.from_chain_type(llm=OpenAI(temperature=0), chain_type=\"stuff\", retriever=vectorstore_pg, input_key=\"question\")\n"
]
},
{
"cell_type": "markdown",
"id": "76b5f8fb",
"metadata": {},
"source": [
"We can now set up an agent to route between them."
]
},
{
"cell_type": "code",
"execution_count": 13,
"id": "ade1aafa",
"metadata": {},
"outputs": [],
"source": [
"from langchain.agents import initialize_agent, Tool\n",
"tools = [\n",
" Tool(\n",
" name = \"State of Union QA System\",\n",
" func=chain_sota.run,\n",
" description=\"useful for when you need to answer questions about the most recent state of the union address. Input should be a fully formed question.\"\n",
" ),\n",
" Tool(\n",
" name = \"Paul Graham System\",\n",
" func=chain_pg.run,\n",
" description=\"useful for when you need to answer questions about Paul Graham. Input should be a fully formed question.\"\n",
" ),\n",
"]"
]
},
{
"cell_type": "code",
"execution_count": 14,
"id": "104853f8",
"metadata": {},
"outputs": [],
"source": [
"agent = initialize_agent(tools, OpenAI(temperature=0), agent=\"zero-shot-react-description\", max_iterations=3)"
]
},
{
"cell_type": "markdown",
"id": "7f036641",
"metadata": {},
"source": [
"## Make a prediction\n",
"\n",
"First, we can make predictions one datapoint at a time. Doing it at this level of granularity allows use to explore the outputs in detail, and also is a lot cheaper than running over multiple datapoints"
]
},
{
"cell_type": "code",
"execution_count": 15,
"id": "4664e79f",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"'The purpose of the NATO Alliance is to promote peace and security in the North Atlantic region by providing a collective defense against potential threats.'"
]
},
"execution_count": 15,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"agent.run(dataset[0]['question'])"
]
},
{
"cell_type": "markdown",
"id": "d0c16cd7",
"metadata": {},
"source": [
"## Make many predictions\n",
"Now we can make predictions"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "799f6c17",
"metadata": {},
"outputs": [],
"source": [
"predictions = []\n",
"predicted_dataset = []\n",
"error_dataset = []\n",
"for data in dataset:\n",
" new_data = {\"input\": data[\"question\"], \"answer\": data[\"answer\"]}\n",
" try:\n",
" predictions.append(agent(new_data))\n",
" predicted_dataset.append(new_data)\n",
" except Exception:\n",
" error_dataset.append(new_data)"
]
},
{
"cell_type": "markdown",
"id": "49d969fb",
"metadata": {},
"source": [
"## Evaluate performance\n",
"Now we can evaluate the predictions. The first thing we can do is look at them by eye."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "1d583f03",
"metadata": {},
"outputs": [],
"source": [
"predictions[0]"
]
},
{
"cell_type": "markdown",
"id": "4783344b",
"metadata": {},
"source": [
"Next, we can use a language model to score them programatically"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "d0a9341d",
"metadata": {},
"outputs": [],
"source": [
"from langchain.evaluation.qa import QAEvalChain"
]
},
{
"cell_type": "code",
"execution_count": 40,
"id": "1612dec1",
"metadata": {},
"outputs": [],
"source": [
"llm = OpenAI(temperature=0)\n",
"eval_chain = QAEvalChain.from_llm(llm)\n",
"graded_outputs = eval_chain.evaluate(predicted_dataset, predictions, question_key=\"input\", prediction_key=\"output\")"
]
},
{
"cell_type": "markdown",
"id": "79587806",
"metadata": {},
"source": [
"We can add in the graded output to the `predictions` dict and then get a count of the grades."
]
},
{
"cell_type": "code",
"execution_count": 41,
"id": "2a689df5",
"metadata": {},
"outputs": [],
"source": [
"for i, prediction in enumerate(predictions):\n",
" prediction['grade'] = graded_outputs[i]['text']"
]
},
{
"cell_type": "code",
"execution_count": 42,
"id": "27b61215",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"Counter({' CORRECT': 19, ' INCORRECT': 14})"
]
},
"execution_count": 42,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"from collections import Counter\n",
"Counter([pred['grade'] for pred in predictions])"
]
},
{
"cell_type": "markdown",
"id": "12fe30f4",
"metadata": {},
"source": [
"We can also filter the datapoints to the incorrect examples and look at them."
]
},
{
"cell_type": "code",
"execution_count": 43,
"id": "47c692a1",
"metadata": {},
"outputs": [],
"source": [
"incorrect = [pred for pred in predictions if pred['grade'] == \" INCORRECT\"]"
]
},
{
"cell_type": "code",
"execution_count": 46,
"id": "0ef976c1",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"{'input': 'What is the purpose of the Bipartisan Innovation Act mentioned in the text?',\n",
" 'answer': 'The Bipartisan Innovation Act will make record investments in emerging technologies and American manufacturing to level the playing field with China and other competitors.',\n",
" 'output': 'The purpose of the Bipartisan Innovation Act is to promote innovation and entrepreneurship in the United States by providing tax incentives and other support for startups and small businesses.',\n",
" 'grade': ' INCORRECT'}"
]
},
"execution_count": 46,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"incorrect[0]"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "7710401a",
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.1"
}
},
"nbformat": 4,
"nbformat_minor": 5
}

View File

@@ -0,0 +1,160 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "a175c650",
"metadata": {},
"source": [
"# Benchmarking Template\n",
"\n",
"This is an example notebook that can be used to create a benchmarking notebook for a task of your choice. Evaluation is really hard, and so we greatly welcome any contributions that can make it easier for people to experiment"
]
},
{
"cell_type": "markdown",
"id": "984169ca",
"metadata": {},
"source": [
"It is highly reccomended that you do any evaluation/benchmarking with tracing enabled. See [here](https://langchain.readthedocs.io/en/latest/tracing.html) for an explanation of what tracing is and how to set it up."
]
},
{
"cell_type": "code",
"execution_count": 28,
"id": "9fe4d1b4",
"metadata": {},
"outputs": [],
"source": [
"# Comment this out if you are NOT using tracing\n",
"import os\n",
"os.environ[\"LANGCHAIN_HANDLER\"] = \"langchain\""
]
},
{
"cell_type": "markdown",
"id": "0f66405e",
"metadata": {},
"source": [
"## Loading the data\n",
"\n",
"First, let's load the data."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "79402a8f",
"metadata": {},
"outputs": [],
"source": [
"# This notebook should so how to load the dataset from LangChainDatasets on Hugging Face\n",
"\n",
"# Please upload your dataset to https://huggingface.co/LangChainDatasets\n",
"\n",
"# The value passed into `load_dataset` should NOT have the `LangChainDatasets/` prefix\n",
"from langchain.evaluation.loading import load_dataset\n",
"dataset = load_dataset(\"TODO\")"
]
},
{
"cell_type": "markdown",
"id": "8a16b75d",
"metadata": {},
"source": [
"## Setting up a chain\n",
"\n",
"This next section should have an example of setting up a chain that can be run on this dataset."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "a2661ce0",
"metadata": {},
"outputs": [],
"source": []
},
{
"cell_type": "markdown",
"id": "6c0062e7",
"metadata": {},
"source": [
"## Make a prediction\n",
"\n",
"First, we can make predictions one datapoint at a time. Doing it at this level of granularity allows use to explore the outputs in detail, and also is a lot cheaper than running over multiple datapoints"
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "d28c5e7d",
"metadata": {},
"outputs": [],
"source": [
"# Example of running the chain on a single datapoint (`dataset[0]`) goes here"
]
},
{
"cell_type": "markdown",
"id": "d0c16cd7",
"metadata": {},
"source": [
"## Make many predictions\n",
"Now we can make predictions."
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "24b4c66e",
"metadata": {},
"outputs": [],
"source": [
"# Example of running the chain on many predictions goes here\n",
"\n",
"# Sometimes its as simple as `chain.apply(dataset)`\n",
"\n",
"# Othertimes you may want to write a for loop to catch errors"
]
},
{
"cell_type": "markdown",
"id": "4783344b",
"metadata": {},
"source": [
"## Evaluate performance\n",
"\n",
"Any guide to evaluating performance in a more systematic manner goes here."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "7710401a",
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.1"
}
},
"nbformat": 4,
"nbformat_minor": 5
}

View File

@@ -23,12 +23,13 @@
"from langchain.embeddings.openai import OpenAIEmbeddings\n",
"from langchain.vectorstores import Chroma\n",
"from langchain.text_splitter import CharacterTextSplitter\n",
"from langchain import OpenAI, VectorDBQA"
"from langchain.llms import OpenAI\n",
"from langchain.chains import RetrievalQA"
]
},
{
"cell_type": "code",
"execution_count": 3,
"execution_count": 2,
"id": "4fdc211d",
"metadata": {},
"outputs": [
@@ -50,7 +51,7 @@
"\n",
"embeddings = OpenAIEmbeddings()\n",
"docsearch = Chroma.from_documents(texts, embeddings)\n",
"qa = VectorDBQA.from_llm(llm=OpenAI(), vectorstore=docsearch)"
"qa = RetrievalQA.from_llm(llm=OpenAI(), retriever=docsearch.as_retriever())"
]
},
{
@@ -67,7 +68,7 @@
},
{
"cell_type": "code",
"execution_count": 4,
"execution_count": 3,
"id": "3459b001",
"metadata": {},
"outputs": [],
@@ -87,7 +88,7 @@
},
{
"cell_type": "code",
"execution_count": 5,
"execution_count": 4,
"id": "b9c3fa75",
"metadata": {},
"outputs": [],
@@ -99,7 +100,7 @@
},
{
"cell_type": "code",
"execution_count": 6,
"execution_count": 5,
"id": "c24543a9",
"metadata": {},
"outputs": [],
@@ -116,16 +117,16 @@
{
"data": {
"text/plain": [
"[{'query': 'What did Vladimir Putin miscalculate when he sought to shake the foundations of the free world? ',\n",
" 'answer': 'He miscalculated that the world would roll over and that he could roll into Ukraine without facing resistance.'},\n",
" {'query': 'What is the purpose of NATO?',\n",
" 'answer': 'The purpose of NATO is to secure peace and stability in Europe after World War 2.'},\n",
" {'query': \"What did the author do to prepare for Putin's attack on Ukraine?\",\n",
" 'answer': \"The author spent months building a coalition of freedom-loving nations from Europe and the Americas to Asia and Africa to confront Putin, shared with the world in advance what they knew Putin was planning, and countered Russia's lies with truth.\"},\n",
" {'query': 'What are the US and its allies doing to isolate Russia from the world?',\n",
" 'answer': \"Enforcing powerful economic sanctions, cutting off Russia's largest banks from the international financial system, preventing Russia's central bank from defending the Russian Ruble, choking off Russia's access to technology, and joining with European allies to find and seize assets of Russian oligarchs.\"},\n",
" {'query': 'How much direct assistance is the U.S. providing to Ukraine?',\n",
" 'answer': 'The U.S. is providing more than $1 Billion in direct assistance to Ukraine.'}]"
"[{'query': 'According to the document, what did Vladimir Putin miscalculate?',\n",
" 'answer': 'He miscalculated that he could roll into Ukraine and the world would roll over.'},\n",
" {'query': 'Who is the Ukrainian Ambassador to the United States?',\n",
" 'answer': 'The Ukrainian Ambassador to the United States is here tonight.'},\n",
" {'query': 'How many countries were part of the coalition formed to confront Putin?',\n",
" 'answer': '27 members of the European Union, France, Germany, Italy, the United Kingdom, Canada, Japan, Korea, Australia, New Zealand, and many others, even Switzerland.'},\n",
" {'query': 'What action is the U.S. Department of Justice taking to target Russian oligarchs?',\n",
" 'answer': 'The U.S. Department of Justice is assembling a dedicated task force to go after the crimes of Russian oligarchs and joining with European allies to find and seize their yachts, luxury apartments, and private jets.'},\n",
" {'query': 'How much direct assistance is the United States providing to Ukraine?',\n",
" 'answer': 'The United States is providing more than $1 Billion in direct assistance to Ukraine.'}]"
]
},
"execution_count": 6,
@@ -211,44 +212,43 @@
"Example 0:\n",
"Question: What did the president say about Ketanji Brown Jackson\n",
"Real Answer: He praised her legal ability and said he nominated her for the supreme court.\n",
"Predicted Answer: The president said that Ketanji Brown Jackson is one of the nation's top legal minds and that she will continue Justice Breyer's legacy of excellence.\n",
"Predicted Answer: The president said that she is one of the nation's top legal minds, a former top litigator in private practice, a former federal public defender, and from a family of public school educators and police officers. He also said that she is a consensus builder and that she has received a broad range of support from the Fraternal Order of Police to former judges appointed by both Democrats and Republicans.\n",
"Predicted Grade: CORRECT\n",
"\n",
"Example 1:\n",
"Question: What did the president say about Michael Jackson\n",
"Real Answer: Nothing\n",
"Predicted Answer: \n",
"The president did not mention Michael Jackson in this context.\n",
"Predicted Answer: The president did not mention Michael Jackson in this speech.\n",
"Predicted Grade: CORRECT\n",
"\n",
"Example 2:\n",
"Question: What did Vladimir Putin miscalculate when he sought to shake the foundations of the free world? \n",
"Real Answer: He miscalculated that the world would roll over and that he could roll into Ukraine without facing resistance.\n",
"Predicted Answer: Putin miscalculated that the West and NATO wouldn't respond to his attack on Ukraine and that he could divide the US and its allies.\n",
"Question: According to the document, what did Vladimir Putin miscalculate?\n",
"Real Answer: He miscalculated that he could roll into Ukraine and the world would roll over.\n",
"Predicted Answer: Putin miscalculated that the world would roll over when he rolled into Ukraine.\n",
"Predicted Grade: CORRECT\n",
"\n",
"Example 3:\n",
"Question: What is the purpose of NATO?\n",
"Real Answer: The purpose of NATO is to secure peace and stability in Europe after World War 2.\n",
"Predicted Answer: The purpose of NATO is to secure peace and stability in Europe after World War 2.\n",
"Predicted Grade: CORRECT\n",
"Question: Who is the Ukrainian Ambassador to the United States?\n",
"Real Answer: The Ukrainian Ambassador to the United States is here tonight.\n",
"Predicted Answer: I don't know.\n",
"Predicted Grade: INCORRECT\n",
"\n",
"Example 4:\n",
"Question: What did the author do to prepare for Putin's attack on Ukraine?\n",
"Real Answer: The author spent months building a coalition of freedom-loving nations from Europe and the Americas to Asia and Africa to confront Putin, shared with the world in advance what they knew Putin was planning, and countered Russia's lies with truth.\n",
"Predicted Answer: The author prepared extensively and carefully. They spent months building a coalition of other freedom-loving nations from Europe and the Americas to Asia and Africa to confront Putin, and they spent countless hours unifying their European allies. They also shared with the world in advance what they knew Putin was planning and precisely how he would try to falsely justify his aggression. They countered Russias lies with truth.\n",
"Predicted Grade: CORRECT\n",
"Question: How many countries were part of the coalition formed to confront Putin?\n",
"Real Answer: 27 members of the European Union, France, Germany, Italy, the United Kingdom, Canada, Japan, Korea, Australia, New Zealand, and many others, even Switzerland.\n",
"Predicted Answer: The coalition included freedom-loving nations from Europe and the Americas to Asia and Africa, 27 members of the European Union including France, Germany, Italy, the United Kingdom, Canada, Japan, Korea, Australia, New Zealand, and many others, even Switzerland.\n",
"Predicted Grade: INCORRECT\n",
"\n",
"Example 5:\n",
"Question: What are the US and its allies doing to isolate Russia from the world?\n",
"Real Answer: Enforcing powerful economic sanctions, cutting off Russia's largest banks from the international financial system, preventing Russia's central bank from defending the Russian Ruble, choking off Russia's access to technology, and joining with European allies to find and seize assets of Russian oligarchs.\n",
"Predicted Answer: The US and its allies are enforcing economic sanctions on Russia, cutting off its largest banks from the international financial system, preventing its central bank from defending the Russian Ruble, choking off Russia's access to technology, closing American airspace to all Russian flights, and providing support to Ukraine.\n",
"Predicted Grade: CORRECT\n",
"Question: What action is the U.S. Department of Justice taking to target Russian oligarchs?\n",
"Real Answer: The U.S. Department of Justice is assembling a dedicated task force to go after the crimes of Russian oligarchs and joining with European allies to find and seize their yachts, luxury apartments, and private jets.\n",
"Predicted Answer: The U.S. Department of Justice is assembling a dedicated task force to go after the crimes of Russian oligarchs and to find and seize their yachts, luxury apartments, and private jets.\n",
"Predicted Grade: INCORRECT\n",
"\n",
"Example 6:\n",
"Question: How much direct assistance is the U.S. providing to Ukraine?\n",
"Real Answer: The U.S. is providing more than $1 Billion in direct assistance to Ukraine.\n",
"Predicted Answer: The U.S. is providing more than $1 Billion in direct assistance to Ukraine.\n",
"Question: How much direct assistance is the United States providing to Ukraine?\n",
"Real Answer: The United States is providing more than $1 Billion in direct assistance to Ukraine.\n",
"Predicted Answer: The United States is providing more than $1 billion in direct assistance to Ukraine.\n",
"Predicted Grade: CORRECT\n",
"\n"
]
@@ -264,13 +264,159 @@
" print()"
]
},
{
"cell_type": "markdown",
"id": "50a9e845",
"metadata": {},
"source": [
"## Evaluate with Other Metrics\n",
"\n",
"In addition to predicting whether the answer is correct or incorrect using a language model, we can also use other metrics to get a more nuanced view on the quality of the answers. To do so, we can use the [Critique](https://docs.inspiredco.ai/critique/) library, which allows for simple calculation of various metrics over generated text.\n",
"\n",
"First you can get an API key from the [Inspired Cognition Dashboard](https://dashboard.inspiredco.ai) and do some setup:\n",
"\n",
"```bash\n",
"export INSPIREDCO_API_KEY=\"...\"\n",
"pip install inspiredco\n",
"```"
]
},
{
"cell_type": "code",
"execution_count": null,
"execution_count": 13,
"id": "bd0b01dc",
"metadata": {},
"outputs": [],
"source": []
"source": [
"import inspiredco.critique\n",
"import os\n",
"critique = inspiredco.critique.Critique(api_key=os.environ['INSPIREDCO_API_KEY'])"
]
},
{
"cell_type": "markdown",
"id": "4f52629e",
"metadata": {},
"source": [
"Then run the following code to set up the configuration and calculate the [ROUGE](https://docs.inspiredco.ai/critique/metric_rouge.html), [chrf](https://docs.inspiredco.ai/critique/metric_chrf.html), [BERTScore](https://docs.inspiredco.ai/critique/metric_bert_score.html), and [UniEval](https://docs.inspiredco.ai/critique/metric_uni_eval.html) (you can choose [other metrics](https://docs.inspiredco.ai/critique/metrics.html) too):"
]
},
{
"cell_type": "code",
"execution_count": 14,
"id": "84a0ba21",
"metadata": {},
"outputs": [],
"source": [
"metrics = {\n",
" \"rouge\": {\n",
" \"metric\": \"rouge\",\n",
" \"config\": {\"variety\": \"rouge_l\"},\n",
" },\n",
" \"chrf\": {\n",
" \"metric\": \"chrf\",\n",
" \"config\": {},\n",
" },\n",
" \"bert_score\": {\n",
" \"metric\": \"bert_score\",\n",
" \"config\": {\"model\": \"bert-base-uncased\"},\n",
" },\n",
" \"uni_eval\": {\n",
" \"metric\": \"uni_eval\",\n",
" \"config\": {\"task\": \"summarization\", \"evaluation_aspect\": \"relevance\"},\n",
" },\n",
"}"
]
},
{
"cell_type": "code",
"execution_count": 15,
"id": "3b9a4056",
"metadata": {},
"outputs": [],
"source": [
"critique_data = [\n",
" {\"target\": pred['result'], \"references\": [pred['answer']]} for pred in predictions\n",
"]\n",
"eval_results = {\n",
" k: critique.evaluate(dataset=critique_data, metric=v[\"metric\"], config=v[\"config\"])\n",
" for k, v in metrics.items()\n",
"}"
]
},
{
"cell_type": "markdown",
"id": "6f0ae799",
"metadata": {},
"source": [
"Finally, we can print out the results. We can see that overall the scores are higher when the output is semantically correct, and also when the output closely matches with the gold-standard answer."
]
},
{
"cell_type": "code",
"execution_count": 16,
"id": "b51edcf4",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Example 0:\n",
"Question: What did the president say about Ketanji Brown Jackson\n",
"Real Answer: He praised her legal ability and said he nominated her for the supreme court.\n",
"Predicted Answer: The president said that she is one of the nation's top legal minds, a former top litigator in private practice, a former federal public defender, and from a family of public school educators and police officers. He also said that she is a consensus builder and that she has received a broad range of support from the Fraternal Order of Police to former judges appointed by both Democrats and Republicans.\n",
"Predicted Scores: rouge=0.0941, chrf=0.2001, bert_score=0.5219, uni_eval=0.9043\n",
"\n",
"Example 1:\n",
"Question: What did the president say about Michael Jackson\n",
"Real Answer: Nothing\n",
"Predicted Answer: The president did not mention Michael Jackson in this speech.\n",
"Predicted Scores: rouge=0.0000, chrf=0.1087, bert_score=0.3486, uni_eval=0.7802\n",
"\n",
"Example 2:\n",
"Question: According to the document, what did Vladimir Putin miscalculate?\n",
"Real Answer: He miscalculated that he could roll into Ukraine and the world would roll over.\n",
"Predicted Answer: Putin miscalculated that the world would roll over when he rolled into Ukraine.\n",
"Predicted Scores: rouge=0.5185, chrf=0.6955, bert_score=0.8421, uni_eval=0.9578\n",
"\n",
"Example 3:\n",
"Question: Who is the Ukrainian Ambassador to the United States?\n",
"Real Answer: The Ukrainian Ambassador to the United States is here tonight.\n",
"Predicted Answer: I don't know.\n",
"Predicted Scores: rouge=0.0000, chrf=0.0375, bert_score=0.3159, uni_eval=0.7493\n",
"\n",
"Example 4:\n",
"Question: How many countries were part of the coalition formed to confront Putin?\n",
"Real Answer: 27 members of the European Union, France, Germany, Italy, the United Kingdom, Canada, Japan, Korea, Australia, New Zealand, and many others, even Switzerland.\n",
"Predicted Answer: The coalition included freedom-loving nations from Europe and the Americas to Asia and Africa, 27 members of the European Union including France, Germany, Italy, the United Kingdom, Canada, Japan, Korea, Australia, New Zealand, and many others, even Switzerland.\n",
"Predicted Scores: rouge=0.7419, chrf=0.8602, bert_score=0.8388, uni_eval=0.0669\n",
"\n",
"Example 5:\n",
"Question: What action is the U.S. Department of Justice taking to target Russian oligarchs?\n",
"Real Answer: The U.S. Department of Justice is assembling a dedicated task force to go after the crimes of Russian oligarchs and joining with European allies to find and seize their yachts, luxury apartments, and private jets.\n",
"Predicted Answer: The U.S. Department of Justice is assembling a dedicated task force to go after the crimes of Russian oligarchs and to find and seize their yachts, luxury apartments, and private jets.\n",
"Predicted Scores: rouge=0.9412, chrf=0.8687, bert_score=0.9607, uni_eval=0.9718\n",
"\n",
"Example 6:\n",
"Question: How much direct assistance is the United States providing to Ukraine?\n",
"Real Answer: The United States is providing more than $1 Billion in direct assistance to Ukraine.\n",
"Predicted Answer: The United States is providing more than $1 billion in direct assistance to Ukraine.\n",
"Predicted Scores: rouge=1.0000, chrf=0.9483, bert_score=1.0000, uni_eval=0.9734\n",
"\n"
]
}
],
"source": [
"for i, eg in enumerate(examples):\n",
" score_string = \", \".join([f\"{k}={v['examples'][i]['value']:.4f}\" for k, v in eval_results.items()])\n",
" print(f\"Example {i}:\")\n",
" print(\"Question: \" + predictions[i]['query'])\n",
" print(\"Real Answer: \" + predictions[i]['answer'])\n",
" print(\"Predicted Answer: \" + predictions[i]['result'])\n",
" print(\"Predicted Scores: \" + score_string)\n",
" print()"
]
}
],
"metadata": {

View File

@@ -0,0 +1,306 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "a4734146",
"metadata": {},
"source": [
"# LLM Math\n",
"\n",
"Evaluating chains that know how to do math."
]
},
{
"cell_type": "code",
"execution_count": 6,
"id": "fdd7afae",
"metadata": {},
"outputs": [],
"source": [
"# Comment this out if you are NOT using tracing\n",
"import os\n",
"os.environ[\"LANGCHAIN_HANDLER\"] = \"langchain\""
]
},
{
"cell_type": "code",
"execution_count": 7,
"id": "ce05ffea",
"metadata": {},
"outputs": [
{
"data": {
"application/vnd.jupyter.widget-view+json": {
"model_id": "d028a511cede4de2b845b9a9954d6bea",
"version_major": 2,
"version_minor": 0
},
"text/plain": [
"Downloading readme: 0%| | 0.00/21.0 [00:00<?, ?B/s]"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"Downloading and preparing dataset json/LangChainDatasets--llm-math to /Users/harrisonchase/.cache/huggingface/datasets/LangChainDatasets___json/LangChainDatasets--llm-math-509b11d101165afa/0.0.0/0f7e3662623656454fcd2b650f34e886a7db4b9104504885bd462096cc7a9f51...\n"
]
},
{
"data": {
"application/vnd.jupyter.widget-view+json": {
"model_id": "a71c8e5a21dd4da5a20a354b544f7a58",
"version_major": 2,
"version_minor": 0
},
"text/plain": [
"Downloading data files: 0%| | 0/1 [00:00<?, ?it/s]"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"application/vnd.jupyter.widget-view+json": {
"model_id": "ae530ca624154a1a934075c47d1093a6",
"version_major": 2,
"version_minor": 0
},
"text/plain": [
"Downloading data: 0%| | 0.00/631 [00:00<?, ?B/s]"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"application/vnd.jupyter.widget-view+json": {
"model_id": "7a4968df05d84bc483aa2c5039aecafe",
"version_major": 2,
"version_minor": 0
},
"text/plain": [
"Extracting data files: 0%| | 0/1 [00:00<?, ?it/s]"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"application/vnd.jupyter.widget-view+json": {
"model_id": "",
"version_major": 2,
"version_minor": 0
},
"text/plain": [
"Generating train split: 0 examples [00:00, ? examples/s]"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"Dataset json downloaded and prepared to /Users/harrisonchase/.cache/huggingface/datasets/LangChainDatasets___json/LangChainDatasets--llm-math-509b11d101165afa/0.0.0/0f7e3662623656454fcd2b650f34e886a7db4b9104504885bd462096cc7a9f51. Subsequent calls will reuse this data.\n"
]
},
{
"data": {
"application/vnd.jupyter.widget-view+json": {
"model_id": "9a2caed96225410fb1cc0f8f155eb766",
"version_major": 2,
"version_minor": 0
},
"text/plain": [
" 0%| | 0/1 [00:00<?, ?it/s]"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"from langchain.evaluation.loading import load_dataset\n",
"dataset = load_dataset(\"llm-math\")"
]
},
{
"cell_type": "markdown",
"id": "8a998d6f",
"metadata": {},
"source": [
"## Setting up a chain\n",
"Now we need to create some pipelines for doing math."
]
},
{
"cell_type": "code",
"execution_count": 10,
"id": "7078f7f8",
"metadata": {},
"outputs": [],
"source": [
"from langchain.llms import OpenAI\n",
"from langchain.chains import LLMMathChain"
]
},
{
"cell_type": "code",
"execution_count": 9,
"id": "2bd70c46",
"metadata": {},
"outputs": [],
"source": [
"llm = OpenAI()"
]
},
{
"cell_type": "code",
"execution_count": 11,
"id": "954c3270",
"metadata": {},
"outputs": [],
"source": [
"chain = LLMMathChain(llm=llm)"
]
},
{
"cell_type": "code",
"execution_count": 13,
"id": "f252027e",
"metadata": {},
"outputs": [],
"source": [
"predictions = chain.apply(dataset)"
]
},
{
"cell_type": "code",
"execution_count": 22,
"id": "c8af7041",
"metadata": {},
"outputs": [],
"source": [
"numeric_output = [float(p['answer'].strip().strip(\"Answer: \")) for p in predictions]"
]
},
{
"cell_type": "code",
"execution_count": 23,
"id": "cc09ffe4",
"metadata": {},
"outputs": [],
"source": [
"correct = [example['answer'] == numeric_output[i] for i, example in enumerate(dataset)]"
]
},
{
"cell_type": "code",
"execution_count": 24,
"id": "585244e4",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"1.0"
]
},
"execution_count": 24,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"sum(correct) / len(correct)"
]
},
{
"cell_type": "code",
"execution_count": 25,
"id": "0d14ac78",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"input: 5\n",
"expected output : 5.0\n",
"prediction: 5.0\n",
"input: 5 + 3\n",
"expected output : 8.0\n",
"prediction: 8.0\n",
"input: 2^3.171\n",
"expected output : 9.006708689094099\n",
"prediction: 9.006708689094099\n",
"input: 2 ^3.171 \n",
"expected output : 9.006708689094099\n",
"prediction: 9.006708689094099\n",
"input: two to the power of three point one hundred seventy one\n",
"expected output : 9.006708689094099\n",
"prediction: 9.006708689094099\n",
"input: five + three squared minus 1\n",
"expected output : 13.0\n",
"prediction: 13.0\n",
"input: 2097 times 27.31\n",
"expected output : 57269.07\n",
"prediction: 57269.07\n",
"input: two thousand ninety seven times twenty seven point thirty one\n",
"expected output : 57269.07\n",
"prediction: 57269.07\n",
"input: 209758 / 2714\n",
"expected output : 77.28739867354459\n",
"prediction: 77.28739867354459\n",
"input: 209758.857 divided by 2714.31\n",
"expected output : 77.27888745205964\n",
"prediction: 77.27888745205964\n"
]
}
],
"source": [
"for i, example in enumerate(dataset):\n",
" print(\"input: \", example[\"question\"])\n",
" print(\"expected output :\", example[\"answer\"])\n",
" print(\"prediction: \", numeric_output[i])"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "b9021ffd",
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.1"
}
},
"nbformat": 4,
"nbformat_minor": 5
}

View File

@@ -0,0 +1,374 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "984169ca",
"metadata": {},
"source": [
"# Question Answering Benchmarking: Paul Graham Essay\n",
"\n",
"Here we go over how to benchmark performance on a question answering task over a Paul Graham essay.\n",
"\n",
"It is highly reccomended that you do any evaluation/benchmarking with tracing enabled. See [here](https://langchain.readthedocs.io/en/latest/tracing.html) for an explanation of what tracing is and how to set it up."
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "3bd13ab7",
"metadata": {},
"outputs": [],
"source": [
"# Comment this out if you are NOT using tracing\n",
"import os\n",
"os.environ[\"LANGCHAIN_HANDLER\"] = \"langchain\""
]
},
{
"cell_type": "markdown",
"id": "8a16b75d",
"metadata": {},
"source": [
"## Loading the data\n",
"First, let's load the data."
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "5b2d5e98",
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"Found cached dataset json (/Users/harrisonchase/.cache/huggingface/datasets/LangChainDatasets___json/LangChainDatasets--question-answering-paul-graham-76e8f711e038d742/0.0.0/0f7e3662623656454fcd2b650f34e886a7db4b9104504885bd462096cc7a9f51)\n"
]
},
{
"data": {
"application/vnd.jupyter.widget-view+json": {
"model_id": "9264acfe710b4faabf060f0fcf4f7308",
"version_major": 2,
"version_minor": 0
},
"text/plain": [
" 0%| | 0/1 [00:00<?, ?it/s]"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"from langchain.evaluation.loading import load_dataset\n",
"dataset = load_dataset(\"question-answering-paul-graham\")"
]
},
{
"cell_type": "markdown",
"id": "4ab6a716",
"metadata": {},
"source": [
"## Setting up a chain\n",
"Now we need to create some pipelines for doing question answering. Step one in that is creating an index over the data in question."
]
},
{
"cell_type": "code",
"execution_count": 3,
"id": "c18680b5",
"metadata": {},
"outputs": [],
"source": [
"from langchain.document_loaders import TextLoader\n",
"loader = TextLoader(\"../../modules/paul_graham_essay.txt\")"
]
},
{
"cell_type": "code",
"execution_count": 4,
"id": "7f0de2b3",
"metadata": {},
"outputs": [],
"source": [
"from langchain.indexes import VectorstoreIndexCreator"
]
},
{
"cell_type": "code",
"execution_count": 5,
"id": "ef84ff99",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Running Chroma using direct local API.\n",
"Using DuckDB in-memory for database. Data will be transient.\n"
]
}
],
"source": [
"vectorstore = VectorstoreIndexCreator().from_loaders([loader]).vectorstore"
]
},
{
"cell_type": "markdown",
"id": "f0b5d8f6",
"metadata": {},
"source": [
"Now we can create a question answering chain."
]
},
{
"cell_type": "code",
"execution_count": 6,
"id": "8843cb0c",
"metadata": {},
"outputs": [],
"source": [
"from langchain.chains import RetrievalQA\n",
"from langchain.llms import OpenAI"
]
},
{
"cell_type": "code",
"execution_count": 7,
"id": "573719a0",
"metadata": {},
"outputs": [],
"source": [
"chain = RetrievalQA.from_chain_type(llm=OpenAI(), chain_type=\"stuff\", retriever=vectorstore.as_retriever(), input_key=\"question\")"
]
},
{
"cell_type": "markdown",
"id": "53b5aa23",
"metadata": {},
"source": [
"## Make a prediction\n",
"\n",
"First, we can make predictions one datapoint at a time. Doing it at this level of granularity allows use to explore the outputs in detail, and also is a lot cheaper than running over multiple datapoints"
]
},
{
"cell_type": "code",
"execution_count": 18,
"id": "3f81d951",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"{'question': 'What were the two main things the author worked on before college?',\n",
" 'answer': 'The two main things the author worked on before college were writing and programming.',\n",
" 'result': ' Writing and programming.'}"
]
},
"execution_count": 18,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"chain(dataset[0])"
]
},
{
"cell_type": "markdown",
"id": "d0c16cd7",
"metadata": {},
"source": [
"## Make many predictions\n",
"Now we can make predictions"
]
},
{
"cell_type": "code",
"execution_count": 9,
"id": "24b4c66e",
"metadata": {},
"outputs": [],
"source": [
"predictions = chain.apply(dataset)"
]
},
{
"cell_type": "markdown",
"id": "49d969fb",
"metadata": {},
"source": [
"## Evaluate performance\n",
"Now we can evaluate the predictions. The first thing we can do is look at them by eye."
]
},
{
"cell_type": "code",
"execution_count": 10,
"id": "1d583f03",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"{'question': 'What were the two main things the author worked on before college?',\n",
" 'answer': 'The two main things the author worked on before college were writing and programming.',\n",
" 'result': ' Writing and programming.'}"
]
},
"execution_count": 10,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"predictions[0]"
]
},
{
"cell_type": "markdown",
"id": "4783344b",
"metadata": {},
"source": [
"Next, we can use a language model to score them programatically"
]
},
{
"cell_type": "code",
"execution_count": 11,
"id": "d0a9341d",
"metadata": {},
"outputs": [],
"source": [
"from langchain.evaluation.qa import QAEvalChain"
]
},
{
"cell_type": "code",
"execution_count": 12,
"id": "1612dec1",
"metadata": {},
"outputs": [],
"source": [
"llm = OpenAI(temperature=0)\n",
"eval_chain = QAEvalChain.from_llm(llm)\n",
"graded_outputs = eval_chain.evaluate(dataset, predictions, question_key=\"question\", prediction_key=\"result\")"
]
},
{
"cell_type": "markdown",
"id": "79587806",
"metadata": {},
"source": [
"We can add in the graded output to the `predictions` dict and then get a count of the grades."
]
},
{
"cell_type": "code",
"execution_count": 13,
"id": "2a689df5",
"metadata": {},
"outputs": [],
"source": [
"for i, prediction in enumerate(predictions):\n",
" prediction['grade'] = graded_outputs[i]['text']"
]
},
{
"cell_type": "code",
"execution_count": 14,
"id": "27b61215",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"Counter({' CORRECT': 12, ' INCORRECT': 10})"
]
},
"execution_count": 14,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"from collections import Counter\n",
"Counter([pred['grade'] for pred in predictions])"
]
},
{
"cell_type": "markdown",
"id": "12fe30f4",
"metadata": {},
"source": [
"We can also filter the datapoints to the incorrect examples and look at them."
]
},
{
"cell_type": "code",
"execution_count": 15,
"id": "47c692a1",
"metadata": {},
"outputs": [],
"source": [
"incorrect = [pred for pred in predictions if pred['grade'] == \" INCORRECT\"]"
]
},
{
"cell_type": "code",
"execution_count": 16,
"id": "0ef976c1",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"{'question': 'What did the author write their dissertation on?',\n",
" 'answer': 'The author wrote their dissertation on applications of continuations.',\n",
" 'result': ' The author does not mention what their dissertation was on, so it is not known.',\n",
" 'grade': ' INCORRECT'}"
]
},
"execution_count": 16,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"incorrect[0]"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "7710401a",
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.1"
}
},
"nbformat": 4,
"nbformat_minor": 5
}

View File

@@ -0,0 +1,374 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "984169ca",
"metadata": {},
"source": [
"# Question Answering Benchmarking: State of the Union Address\n",
"\n",
"Here we go over how to benchmark performance on a question answering task over a state of the union address.\n",
"\n",
"It is highly reccomended that you do any evaluation/benchmarking with tracing enabled. See [here](https://langchain.readthedocs.io/en/latest/tracing.html) for an explanation of what tracing is and how to set it up."
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "f127fb04",
"metadata": {},
"outputs": [],
"source": [
"# Comment this out if you are NOT using tracing\n",
"import os\n",
"os.environ[\"LANGCHAIN_HANDLER\"] = \"langchain\""
]
},
{
"cell_type": "markdown",
"id": "8a16b75d",
"metadata": {},
"source": [
"## Loading the data\n",
"First, let's load the data."
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "5b2d5e98",
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"Found cached dataset json (/Users/harrisonchase/.cache/huggingface/datasets/LangChainDatasets___json/LangChainDatasets--question-answering-state-of-the-union-a7e5a3b2db4f440d/0.0.0/0f7e3662623656454fcd2b650f34e886a7db4b9104504885bd462096cc7a9f51)\n"
]
},
{
"data": {
"application/vnd.jupyter.widget-view+json": {
"model_id": "",
"version_major": 2,
"version_minor": 0
},
"text/plain": [
" 0%| | 0/1 [00:00<?, ?it/s]"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"from langchain.evaluation.loading import load_dataset\n",
"dataset = load_dataset(\"question-answering-state-of-the-union\")"
]
},
{
"cell_type": "markdown",
"id": "4ab6a716",
"metadata": {},
"source": [
"## Setting up a chain\n",
"Now we need to create some pipelines for doing question answering. Step one in that is creating an index over the data in question."
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "c18680b5",
"metadata": {},
"outputs": [],
"source": [
"from langchain.document_loaders import TextLoader\n",
"loader = TextLoader(\"../../modules/state_of_the_union.txt\")"
]
},
{
"cell_type": "code",
"execution_count": 3,
"id": "7f0de2b3",
"metadata": {},
"outputs": [],
"source": [
"from langchain.indexes import VectorstoreIndexCreator"
]
},
{
"cell_type": "code",
"execution_count": 4,
"id": "ef84ff99",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Running Chroma using direct local API.\n",
"Using DuckDB in-memory for database. Data will be transient.\n"
]
}
],
"source": [
"vectorstore = VectorstoreIndexCreator().from_loaders([loader]).vectorstore"
]
},
{
"cell_type": "markdown",
"id": "f0b5d8f6",
"metadata": {},
"source": [
"Now we can create a question answering chain."
]
},
{
"cell_type": "code",
"execution_count": 5,
"id": "8843cb0c",
"metadata": {},
"outputs": [],
"source": [
"from langchain.chains import RetrievalQA\n",
"from langchain.llms import OpenAI"
]
},
{
"cell_type": "code",
"execution_count": 6,
"id": "573719a0",
"metadata": {},
"outputs": [],
"source": [
"chain = RetrievalQA.from_chain_type(llm=OpenAI(), chain_type=\"stuff\", retriever=vectorstore.as_retriever(), input_key=\"question\")"
]
},
{
"cell_type": "markdown",
"id": "37d669e9",
"metadata": {},
"source": [
"## Make a prediction\n",
"\n",
"First, we can make predictions one datapoint at a time. Doing it at this level of granularity allows use to explore the outputs in detail, and also is a lot cheaper than running over multiple datapoints"
]
},
{
"cell_type": "code",
"execution_count": 17,
"id": "3089e409",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"{'question': 'What is the purpose of the NATO Alliance?',\n",
" 'answer': 'The purpose of the NATO Alliance is to secure peace and stability in Europe after World War 2.',\n",
" 'result': ' The NATO Alliance was created to secure peace and stability in Europe after World War 2.'}"
]
},
"execution_count": 17,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"chain(dataset[0])"
]
},
{
"cell_type": "markdown",
"id": "d0c16cd7",
"metadata": {},
"source": [
"## Make many predictions\n",
"Now we can make predictions"
]
},
{
"cell_type": "code",
"execution_count": 7,
"id": "24b4c66e",
"metadata": {},
"outputs": [],
"source": [
"predictions = chain.apply(dataset)"
]
},
{
"cell_type": "markdown",
"id": "49d969fb",
"metadata": {},
"source": [
"## Evaluate performance\n",
"Now we can evaluate the predictions. The first thing we can do is look at them by eye."
]
},
{
"cell_type": "code",
"execution_count": 8,
"id": "1d583f03",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"{'question': 'What is the purpose of the NATO Alliance?',\n",
" 'answer': 'The purpose of the NATO Alliance is to secure peace and stability in Europe after World War 2.',\n",
" 'result': ' The purpose of the NATO Alliance is to secure peace and stability in Europe after World War 2.'}"
]
},
"execution_count": 8,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"predictions[0]"
]
},
{
"cell_type": "markdown",
"id": "4783344b",
"metadata": {},
"source": [
"Next, we can use a language model to score them programatically"
]
},
{
"cell_type": "code",
"execution_count": 9,
"id": "d0a9341d",
"metadata": {},
"outputs": [],
"source": [
"from langchain.evaluation.qa import QAEvalChain"
]
},
{
"cell_type": "code",
"execution_count": 10,
"id": "1612dec1",
"metadata": {},
"outputs": [],
"source": [
"llm = OpenAI(temperature=0)\n",
"eval_chain = QAEvalChain.from_llm(llm)\n",
"graded_outputs = eval_chain.evaluate(dataset, predictions, question_key=\"question\", prediction_key=\"result\")"
]
},
{
"cell_type": "markdown",
"id": "79587806",
"metadata": {},
"source": [
"We can add in the graded output to the `predictions` dict and then get a count of the grades."
]
},
{
"cell_type": "code",
"execution_count": 11,
"id": "2a689df5",
"metadata": {},
"outputs": [],
"source": [
"for i, prediction in enumerate(predictions):\n",
" prediction['grade'] = graded_outputs[i]['text']"
]
},
{
"cell_type": "code",
"execution_count": 12,
"id": "27b61215",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"Counter({' CORRECT': 7, ' INCORRECT': 4})"
]
},
"execution_count": 12,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"from collections import Counter\n",
"Counter([pred['grade'] for pred in predictions])"
]
},
{
"cell_type": "markdown",
"id": "12fe30f4",
"metadata": {},
"source": [
"We can also filter the datapoints to the incorrect examples and look at them."
]
},
{
"cell_type": "code",
"execution_count": 13,
"id": "47c692a1",
"metadata": {},
"outputs": [],
"source": [
"incorrect = [pred for pred in predictions if pred['grade'] == \" INCORRECT\"]"
]
},
{
"cell_type": "code",
"execution_count": 14,
"id": "0ef976c1",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"{'question': 'What is the U.S. Department of Justice doing to combat the crimes of Russian oligarchs?',\n",
" 'answer': 'The U.S. Department of Justice is assembling a dedicated task force to go after the crimes of Russian oligarchs.',\n",
" 'result': ' The U.S. Department of Justice is assembling a dedicated task force to go after the crimes of Russian oligarchs and is naming a chief prosecutor for pandemic fraud.',\n",
" 'grade': ' INCORRECT'}"
]
},
"execution_count": 14,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"incorrect[0]"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "7710401a",
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.1"
}
},
"nbformat": 4,
"nbformat_minor": 5
}

View File

@@ -0,0 +1,117 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "ee2a3a21",
"metadata": {},
"source": [
"# QA Generation\n",
"This notebook shows how to use the `QAGenerationChain` to come up with question-answer pairs over a specific document.\n",
"This is important because often times you may not have data to evaluate your question-answer system over, so this is a cheap and lightweight way to generate it!"
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "33d3f0b4",
"metadata": {},
"outputs": [],
"source": [
"from langchain.document_loaders import TextLoader"
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "2029a29c",
"metadata": {},
"outputs": [],
"source": [
"loader = TextLoader(\"../../modules/state_of_the_union.txt\")"
]
},
{
"cell_type": "code",
"execution_count": 3,
"id": "87edb84c",
"metadata": {},
"outputs": [],
"source": [
"doc = loader.load()[0]"
]
},
{
"cell_type": "code",
"execution_count": 4,
"id": "04125b6d",
"metadata": {},
"outputs": [],
"source": [
"from langchain.chat_models import ChatOpenAI\n",
"from langchain.chains import QAGenerationChain\n",
"chain = QAGenerationChain.from_llm(ChatOpenAI(temperature = 0))"
]
},
{
"cell_type": "code",
"execution_count": 8,
"id": "4f1593e4",
"metadata": {},
"outputs": [],
"source": [
"qa = chain.run(doc.page_content)"
]
},
{
"cell_type": "code",
"execution_count": 10,
"id": "ee831f92",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"{'question': 'What is the U.S. Department of Justice doing to combat the crimes of Russian oligarchs?',\n",
" 'answer': 'The U.S. Department of Justice is assembling a dedicated task force to go after the crimes of Russian oligarchs.'}"
]
},
"execution_count": 10,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"qa[1]"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "7028754e",
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.1"
}
},
"nbformat": 4,
"nbformat_minor": 5
}

View File

@@ -191,7 +191,6 @@
]
},
{
"attachments": {},
"cell_type": "markdown",
"id": "782ae8c8",
"metadata": {},
@@ -316,7 +315,7 @@
],
"metadata": {
"kernelspec": {
"display_name": ".venv",
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
@@ -330,7 +329,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.7 (default, Sep 16 2021, 08:50:36) \n[Clang 10.0.0 ]"
"version": "3.9.1"
},
"vscode": {
"interpreter": {

View File

@@ -0,0 +1,423 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "984169ca",
"metadata": {},
"source": [
"# SQL Question Answering Benchmarking: Chinook\n",
"\n",
"Here we go over how to benchmark performance on a question answering task over a SQL database.\n",
"\n",
"It is highly reccomended that you do any evaluation/benchmarking with tracing enabled. See [here](https://langchain.readthedocs.io/en/latest/tracing.html) for an explanation of what tracing is and how to set it up."
]
},
{
"cell_type": "code",
"execution_count": 28,
"id": "44874486",
"metadata": {},
"outputs": [],
"source": [
"# Comment this out if you are NOT using tracing\n",
"import os\n",
"os.environ[\"LANGCHAIN_HANDLER\"] = \"langchain\""
]
},
{
"cell_type": "markdown",
"id": "0f66405e",
"metadata": {},
"source": [
"## Loading the data\n",
"\n",
"First, let's load the data."
]
},
{
"cell_type": "code",
"execution_count": 7,
"id": "0df1393f",
"metadata": {},
"outputs": [
{
"data": {
"application/vnd.jupyter.widget-view+json": {
"model_id": "b220d07ee5d14909bc842b4545cdc0de",
"version_major": 2,
"version_minor": 0
},
"text/plain": [
"Downloading readme: 0%| | 0.00/21.0 [00:00<?, ?B/s]"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"Downloading and preparing dataset json/LangChainDatasets--sql-qa-chinook to /Users/harrisonchase/.cache/huggingface/datasets/LangChainDatasets___json/LangChainDatasets--sql-qa-chinook-7528565d2d992b47/0.0.0/0f7e3662623656454fcd2b650f34e886a7db4b9104504885bd462096cc7a9f51...\n"
]
},
{
"data": {
"application/vnd.jupyter.widget-view+json": {
"model_id": "e89e3c8ef76f49889c4b39c624828c71",
"version_major": 2,
"version_minor": 0
},
"text/plain": [
"Downloading data files: 0%| | 0/1 [00:00<?, ?it/s]"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"application/vnd.jupyter.widget-view+json": {
"model_id": "a8421df6c26045e8978c7086cb418222",
"version_major": 2,
"version_minor": 0
},
"text/plain": [
"Downloading data: 0%| | 0.00/1.44k [00:00<?, ?B/s]"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"application/vnd.jupyter.widget-view+json": {
"model_id": "d1fb6becc3324a85bf039a53caf30924",
"version_major": 2,
"version_minor": 0
},
"text/plain": [
"Extracting data files: 0%| | 0/1 [00:00<?, ?it/s]"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"application/vnd.jupyter.widget-view+json": {
"model_id": "",
"version_major": 2,
"version_minor": 0
},
"text/plain": [
"Generating train split: 0 examples [00:00, ? examples/s]"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"Dataset json downloaded and prepared to /Users/harrisonchase/.cache/huggingface/datasets/LangChainDatasets___json/LangChainDatasets--sql-qa-chinook-7528565d2d992b47/0.0.0/0f7e3662623656454fcd2b650f34e886a7db4b9104504885bd462096cc7a9f51. Subsequent calls will reuse this data.\n"
]
},
{
"data": {
"application/vnd.jupyter.widget-view+json": {
"model_id": "9d68ad1b3e4a4bd79f92597aac4d3cc9",
"version_major": 2,
"version_minor": 0
},
"text/plain": [
" 0%| | 0/1 [00:00<?, ?it/s]"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"from langchain.evaluation.loading import load_dataset\n",
"dataset = load_dataset(\"sql-qa-chinook\")"
]
},
{
"cell_type": "code",
"execution_count": 8,
"id": "ab44d504",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"{'question': 'How many employees are there?', 'answer': '8'}"
]
},
"execution_count": 8,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"dataset[0]"
]
},
{
"cell_type": "markdown",
"id": "8a16b75d",
"metadata": {},
"source": [
"## Setting up a chain\n",
"This uses the example Chinook database.\n",
"To set it up follow the instructions on https://database.guide/2-sample-databases-sqlite/, placing the `.db` file in a notebooks folder at the root of this repository.\n",
"\n",
"Note that here we load a simple chain. If you want to experiment with more complex chains, or an agent, just create the `chain` object in a different way."
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "5b2d5e98",
"metadata": {},
"outputs": [],
"source": [
"from langchain import OpenAI, SQLDatabase, SQLDatabaseChain"
]
},
{
"cell_type": "code",
"execution_count": 3,
"id": "33cdcbfc",
"metadata": {},
"outputs": [],
"source": [
"db = SQLDatabase.from_uri(\"sqlite:///../../../notebooks/Chinook.db\")\n",
"llm = OpenAI(temperature=0)"
]
},
{
"cell_type": "markdown",
"id": "f0b5d8f6",
"metadata": {},
"source": [
"Now we can create a SQL database chain."
]
},
{
"cell_type": "code",
"execution_count": 14,
"id": "8843cb0c",
"metadata": {},
"outputs": [],
"source": [
"chain = SQLDatabaseChain(llm=llm, database=db, input_key=\"question\")"
]
},
{
"cell_type": "markdown",
"id": "6c0062e7",
"metadata": {},
"source": [
"## Make a prediction\n",
"\n",
"First, we can make predictions one datapoint at a time. Doing it at this level of granularity allows use to explore the outputs in detail, and also is a lot cheaper than running over multiple datapoints"
]
},
{
"cell_type": "code",
"execution_count": 27,
"id": "d28c5e7d",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"{'question': 'How many employees are there?',\n",
" 'answer': '8',\n",
" 'result': ' There are 8 employees.'}"
]
},
"execution_count": 27,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"chain(dataset[0])"
]
},
{
"cell_type": "markdown",
"id": "d0c16cd7",
"metadata": {},
"source": [
"## Make many predictions\n",
"Now we can make predictions. Note that we add a try-except because this chain can sometimes error (if SQL is written incorrectly, etc)"
]
},
{
"cell_type": "code",
"execution_count": 19,
"id": "24b4c66e",
"metadata": {},
"outputs": [],
"source": [
"predictions = []\n",
"predicted_dataset = []\n",
"error_dataset = []\n",
"for data in dataset:\n",
" try:\n",
" predictions.append(chain(data))\n",
" predicted_dataset.append(data)\n",
" except:\n",
" error_dataset.append(data)"
]
},
{
"cell_type": "markdown",
"id": "4783344b",
"metadata": {},
"source": [
"## Evaluate performance\n",
"Now we can evaluate the predictions. We can use a language model to score them programatically"
]
},
{
"cell_type": "code",
"execution_count": 21,
"id": "d0a9341d",
"metadata": {},
"outputs": [],
"source": [
"from langchain.evaluation.qa import QAEvalChain"
]
},
{
"cell_type": "code",
"execution_count": 22,
"id": "1612dec1",
"metadata": {},
"outputs": [],
"source": [
"llm = OpenAI(temperature=0)\n",
"eval_chain = QAEvalChain.from_llm(llm)\n",
"graded_outputs = eval_chain.evaluate(predicted_dataset, predictions, question_key=\"question\", prediction_key=\"result\")"
]
},
{
"cell_type": "markdown",
"id": "79587806",
"metadata": {},
"source": [
"We can add in the graded output to the `predictions` dict and then get a count of the grades."
]
},
{
"cell_type": "code",
"execution_count": 23,
"id": "2a689df5",
"metadata": {},
"outputs": [],
"source": [
"for i, prediction in enumerate(predictions):\n",
" prediction['grade'] = graded_outputs[i]['text']"
]
},
{
"cell_type": "code",
"execution_count": 24,
"id": "27b61215",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"Counter({' CORRECT': 3, ' INCORRECT': 4})"
]
},
"execution_count": 24,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"from collections import Counter\n",
"Counter([pred['grade'] for pred in predictions])"
]
},
{
"cell_type": "markdown",
"id": "12fe30f4",
"metadata": {},
"source": [
"We can also filter the datapoints to the incorrect examples and look at them."
]
},
{
"cell_type": "code",
"execution_count": 25,
"id": "47c692a1",
"metadata": {},
"outputs": [],
"source": [
"incorrect = [pred for pred in predictions if pred['grade'] == \" INCORRECT\"]"
]
},
{
"cell_type": "code",
"execution_count": 26,
"id": "0ef976c1",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"{'question': 'How many employees are also customers?',\n",
" 'answer': 'None',\n",
" 'result': ' 59 employees are also customers.',\n",
" 'grade': ' INCORRECT'}"
]
},
"execution_count": 26,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"incorrect[0]"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "7710401a",
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.1"
}
},
"nbformat": 4,
"nbformat_minor": 5
}

View File

@@ -0,0 +1,20 @@
# Extraction
Most APIs and databases still deal with structured information.
Therefore, in order to better work with those, it can be useful to extract structured information from text.
Examples of this include:
- Extracting a structured row to insert into a database from a sentence
- Extracting multiple rows to insert into a database from a long document
- Extracting the correct API parameters from a user query
This work is extremely related to [output parsing](../modules/prompts/examples/output_parsers.ipynb).
Output parsers are responsible for instructing the LLM to respond in a specific format.
In this case, the output parsers specify the format of the data you would like to extract from the document.
Then, in addition to the output format instructions, the prompt should also contain the data you would like to extract information from.
While normal output parsers are good enough for basic structuring of response data,
when doing extraction you often want to extract more complicated or nested structures.
For a deep dive on extraction, we recommend checking out [`kor`](https://eyurtsev.github.io/kor/),
a library that uses the existing LangChain chain and OutputParser abstractions
but deep dives on allowing extraction of more complicated schemas.

31
docs/use_cases/tabular.md Normal file
View File

@@ -0,0 +1,31 @@
# Querying Tabular Data
Lots of data and information is stored in tabular data, whether it be csvs, excel sheets, or SQL tables.
This page covers all resources available in LangChain for working with data in this format.
## Document Loading
If you have text data stored in a tabular format, you may want to load the data into a Document and then index it as you would
other text/unstructured data. For this, you should use a document loader like the [CSVLoader](../modules/document_loaders/examples/csv.ipynb)
and then you should [create an index](../modules/indexes.rst) over that data, and [query it that way](../modules/indexes/chain_examples/vector_db_qa.ipynb).
## Querying
If you have more numeric tabular data, or have a large amount of data and don't want to index it, you should get started
by looking at various chains and agents we have for dealing with this data.
### Chains
If you are just getting started, and you have relatively small/simple tabular data, you should get started with chains.
Chains are a sequence of predetermined steps, so they are good to get started with as they give you more control and let you
understand what is happening better.
- [SQL Database Chain](../modules/chains/examples/sqlite.ipynb)
### Agents
Agents are more complex, and involve multiple queries to the LLM to understand what to do.
The downside of agents are that you have less control. The upside is that they are more powerful,
which allows you to use them on larger databases and more complex schemas.
- [SQL Agent](../modules/agents/agent_toolkits/sql_database.ipynb)
- [Pandas Agent](../modules/agents/agent_toolkits/pandas.ipynb)
- [CSV Agent](../modules/agents/agent_toolkits/csv.ipynb)

Some files were not shown because too many files have changed in this diff Show More