Mirror of https://github.com/hwchase17/langchain.git (synced 2026-02-11 19:49:54 +00:00)

Comparing vwp/simila...vwp/make_n (25 commits):

cddfe05073, e5611565b7, 9d1bd18596, a435a436c1, d6cd0deaef, 1db266b20d, 3f9900a864,
3ca1a387c2, f92ccf70fd, f3d178f600, dd2a151543, d6664af0ee, efe0d39c6a, b4c196f785,
f1070de038, ef72a7cf26, a980095efc, 74848aafea, b24472eae3, e53995836a, e494b0a09f,
da462d9dd4, 24e4ae95ba, 8392ca602c, fcb3a64799
@@ -23,11 +23,15 @@ its dependencies running locally.

If you want to get up and running with less set up, you can
simply run `pip install unstructured` and use `UnstructuredAPIFileLoader` or
`UnstructuredAPIFileIOLoader`. That will process your document using the hosted Unstructured API.
Note that currently (as of 1 May 2023) the Unstructured API is open, but it will soon require
an API key. The [Unstructured documentation page](https://unstructured-io.github.io/) will have
instructions on how to generate an API key once they're available. Check out the instructions
[here](https://github.com/Unstructured-IO/unstructured-api#dizzy-instructions-for-using-the-docker-image)
if you'd like to self-host the Unstructured API or run it locally.

The Unstructured API requires API keys to make requests.
You can generate a free API key [here](https://www.unstructured.io/api-key) and start using it today!
Check out the README [here](https://github.com/Unstructured-IO/unstructured-api) to get started making API calls.
We'd love to hear your feedback, let us know how it goes in our [community slack](https://join.slack.com/t/unstructuredw-kbe4326/shared_invite/zt-1x7cgo0pg-PTptXWylzPQF9xZolzCnwQ).
And stay tuned for improvements to both quality and performance!
Check out the instructions
[here](https://github.com/Unstructured-IO/unstructured-api#dizzy-instructions-for-using-the-docker-image) if you'd like to self-host the Unstructured API or run it locally.

## Wrappers
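For reference, a minimal sketch of calling the hosted API through the loader (the file name and the `api_key` keyword are assumptions; confirm against the loader's signature in your installed version):

```python
import os

from langchain.document_loaders import UnstructuredAPIFileLoader

# "example.pdf" is a placeholder; UNSTRUCTURED_API_KEY holds a key from unstructured.io/api-key.
loader = UnstructuredAPIFileLoader(
    "example.pdf",
    api_key=os.environ.get("UNSTRUCTURED_API_KEY", ""),
    mode="elements",  # return one Document per detected element
)
docs = loader.load()
print(docs[0].metadata)
```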
@@ -7,7 +7,7 @@
"source": [
"# Zapier Natural Language Actions API\n",
"\\\n",
"Full docs here: https://nla.zapier.com/api/v1/docs\n",
"Full docs here: https://nla.zapier.com/start/\n",
"\n",
"**Zapier Natural Language Actions** gives you access to the 5k+ apps, 20k+ actions on Zapier's platform through a natural language API interface.\n",
"\n",
@@ -21,7 +21,7 @@
"\n",
"2. User-facing (Oauth): for production scenarios where you are deploying an end-user facing application and LangChain needs access to end-user's exposed actions and connected accounts on Zapier.com\n",
"\n",
"This quick start will focus on the server-side use case for brevity. Review [full docs](https://nla.zapier.com/api/v1/docs) or reach out to nla@zapier.com for user-facing oauth developer support.\n",
"This quick start will focus on the server-side use case for brevity. Review [full docs](https://nla.zapier.com/start/) for user-facing oauth developer support.\n",
"\n",
"This example goes over how to use the Zapier integration with a `SimpleSequentialChain`, then an `Agent`.\n",
"In code, below:"
@@ -39,7 +39,7 @@
"# get from https://platform.openai.com/\n",
"os.environ[\"OPENAI_API_KEY\"] = os.environ.get(\"OPENAI_API_KEY\", \"\")\n",
"\n",
"# get from https://nla.zapier.com/demo/provider/debug (under User Information, after logging in):\n",
"# get from https://nla.zapier.com/docs/authentication/ after logging in):\n",
"os.environ[\"ZAPIER_NLA_API_KEY\"] = os.environ.get(\"ZAPIER_NLA_API_KEY\", \"\")"
]
},
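For orientation, a minimal sketch of the server-side setup this notebook builds up to (the instruction string is made up, and the actions available depend on what you have enabled in Zapier NLA):

```python
from langchain.agents import AgentType, initialize_agent
from langchain.agents.agent_toolkits import ZapierToolkit
from langchain.llms import OpenAI
from langchain.utilities.zapier import ZapierNLAWrapper

llm = OpenAI(temperature=0)
zapier = ZapierNLAWrapper()  # reads ZAPIER_NLA_API_KEY from the environment
toolkit = ZapierToolkit.from_zapier_nla_wrapper(zapier)
agent = initialize_agent(
    toolkit.get_tools(), llm, agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION, verbose=True
)
# Example instruction (made up); it only works if matching actions are exposed in Zapier NLA.
agent.run("Summarize the last email I received and send the summary to Slack.")
```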
docs/extras/modules/callbacks/integrations/streamlit.md (new file, 73 lines)
@@ -0,0 +1,73 @@

# Streamlit

> **[Streamlit](https://streamlit.io/) is a faster way to build and share data apps.**
> Streamlit turns data scripts into shareable web apps in minutes. All in pure Python. No front‑end experience required.
> See more examples at [streamlit.io/generative-ai](https://streamlit.io/generative-ai).

[](https://codespaces.new/langchain-ai/streamlit-agent?quickstart=1)

In this guide we will demonstrate how to use `StreamlitCallbackHandler` to display the thoughts and actions of an agent in an
interactive Streamlit app. Try it out with the running app below using the [MRKL agent](/docs/modules/agents/how_to/mrkl/):

<iframe loading="lazy" src="https://mrkl-minimal.streamlit.app/?embed=true&embed_options=light_theme"
    style={{ width: 100 + '%', border: 'none', marginBottom: 1 + 'rem', height: 600 }}
    allow="camera;clipboard-read;clipboard-write;"
></iframe>

## Installation and Setup

```bash
pip install langchain streamlit
```

You can run `streamlit hello` to load a sample app and validate your install succeeded. See full instructions in Streamlit's
[Getting started documentation](https://docs.streamlit.io/library/get-started).

## Display thoughts and actions

To create a `StreamlitCallbackHandler`, you just need to provide a parent container to render the output.

```python
from langchain.callbacks import StreamlitCallbackHandler
import streamlit as st

st_callback = StreamlitCallbackHandler(st.container())
```

Additional keyword arguments to customize the display behavior are described in the
[API reference](https://api.python.langchain.com/en/latest/modules/callbacks.html#langchain.callbacks.StreamlitCallbackHandler).
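As a hedged illustration of those keyword arguments (the parameter names below are assumptions based on the handler at the time of writing; confirm them against the API reference for your version):

```python
import streamlit as st
from langchain.callbacks import StreamlitCallbackHandler

# Assumed keyword arguments; verify the exact names in the API reference.
st_callback = StreamlitCallbackHandler(
    st.container(),
    max_thought_containers=4,          # how many thought containers to show at once
    expand_new_thoughts=True,          # expand each new thought while it is streaming
    collapse_completed_thoughts=True,  # collapse a thought once it finishes
)
```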
### Scenario 1: Using an Agent with Tools

The primary supported use case today is visualizing the actions of an Agent with Tools (or Agent Executor). You can create an
agent in your Streamlit app and simply pass the `StreamlitCallbackHandler` to `agent.run()` in order to visualize the
thoughts and actions live in your app.

```python
from langchain.llms import OpenAI
from langchain.agents import AgentType, initialize_agent, load_tools
from langchain.callbacks import StreamlitCallbackHandler
import streamlit as st

llm = OpenAI(temperature=0, streaming=True)
tools = load_tools(["ddg-search"])
agent = initialize_agent(
    tools, llm, agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION, verbose=True
)

if prompt := st.chat_input():
    st.chat_message("user").write(prompt)
    with st.chat_message("assistant"):
        st_callback = StreamlitCallbackHandler(st.container())
        response = agent.run(prompt, callbacks=[st_callback])
        st.write(response)
```

**Note:** You will need to set `OPENAI_API_KEY` for the above app code to run successfully.
The easiest way to do this is via [Streamlit secrets.toml](https://docs.streamlit.io/library/advanced-features/secrets-management),
or any other local ENV management tool.

### Additional scenarios

Currently `StreamlitCallbackHandler` is geared towards use with a LangChain Agent Executor. Support for additional agent types,
use directly with Chains, etc. will be added in the future.
@@ -0,0 +1,27 @@

* Example Docs

The sample docs directory contains the following files:

- ~example-10k.html~ - A 10-K SEC filing in HTML format
- ~layout-parser-paper.pdf~ - A PDF copy of the layout parser paper
- ~factbook.xml~ / ~factbook.xsl~ - Example XML/XSL files that you
  can use to test stylesheets

These documents can be used to test out the parsers in the library. In
addition, here are instructions for pulling in some sample docs that are
too big to store in the repo.

** XBRL 10-K

You can get an example 10-K in inline XBRL format using the following
~curl~. Note, you need to have the user agent set in the header or the
SEC site will reject your request.

#+BEGIN_SRC bash

  curl -O \
    -A '${organization} ${email}' \
    https://www.sec.gov/Archives/edgar/data/311094/000117184321001344/0001171843-21-001344.txt
#+END_SRC

You can parse this document using the HTML parser.
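As a sketch of that last step (the function comes from the ~unstructured~ library; the local filename is an assumption based on what ~curl -O~ saves for the URL above):

```python
# Parse the downloaded SEC submission with unstructured's HTML partitioner.
from unstructured.partition.html import partition_html

elements = partition_html(filename="0001171843-21-001344.txt")
print(elements[:5])
```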
@@ -0,0 +1,17 @@
class MyClass {
  constructor(name) {
    this.name = name;
  }

  greet() {
    console.log(`Hello, ${this.name}!`);
  }
}

function main() {
  const name = prompt("Enter your name:");
  const obj = new MyClass(name);
  obj.greet();
}

main();
@@ -0,0 +1,16 @@
class MyClass:
    def __init__(self, name):
        self.name = name

    def greet(self):
        print(f"Hello, {self.name}!")


def main():
    name = input("Enter your name: ")
    obj = MyClass(name)
    obj.greet()


if __name__ == "__main__":
    main()
@@ -0,0 +1,103 @@
|
||||
{
|
||||
"cells": [
|
||||
{
|
||||
"attachments": {},
|
||||
"cell_type": "markdown",
|
||||
"id": "33205b12",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# LarkSuite (FeiShu)\n",
|
||||
"\n",
|
||||
">[LarkSuite](https://www.larksuite.com/) is an enterprise collaboration platform developed by ByteDance.\n",
|
||||
"\n",
|
||||
"This notebook covers how to load data from the `LarkSuite` REST API into a format that can be ingested into LangChain, along with example usage for text summarization.\n",
|
||||
"\n",
|
||||
"The LarkSuite API requires an access token (tenant_access_token or user_access_token), checkout [LarkSuite open platform document](https://open.larksuite.com/document) for API details."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 1,
|
||||
"id": "90b69c94",
|
||||
"metadata": {
|
||||
"ExecuteTime": {
|
||||
"end_time": "2023-06-19T10:05:03.645161Z",
|
||||
"start_time": "2023-06-19T10:04:49.541968Z"
|
||||
},
|
||||
"tags": []
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from getpass import getpass\n",
|
||||
"from langchain.document_loaders.larksuite import LarkSuiteDocLoader\n",
|
||||
"\n",
|
||||
"DOMAIN = input(\"larksuite domain\")\n",
|
||||
"ACCESS_TOKEN = getpass(\"larksuite tenant_access_token or user_access_token\")\n",
|
||||
"DOCUMENT_ID = input(\"larksuite document id\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 3,
|
||||
"id": "13deb0f5",
|
||||
"metadata": {
|
||||
"ExecuteTime": {
|
||||
"end_time": "2023-06-19T10:05:36.016495Z",
|
||||
"start_time": "2023-06-19T10:05:35.360884Z"
|
||||
}
|
||||
},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"[Document(page_content='Test Doc\\nThis is a Test Doc\\n\\n1\\n2\\n3\\n\\n', metadata={'document_id': 'V76kdbd2HoBbYJxdiNNccajunPf', 'revision_id': 11, 'title': 'Test Doc'})]\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"from pprint import pprint\n",
|
||||
"\n",
|
||||
"larksuite_loader = LarkSuiteDocLoader(DOMAIN, ACCESS_TOKEN, DOCUMENT_ID)\n",
|
||||
"docs = larksuite_loader.load()\n",
|
||||
"\n",
|
||||
"pprint(docs)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "9ccc1e2f",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# see https://python.langchain.com/docs/use_cases/summarization for more details\n",
|
||||
"from langchain.chains.summarize import load_summarize_chain\n",
|
||||
"\n",
|
||||
"chain = load_summarize_chain(llm, chain_type=\"map_reduce\")\n",
|
||||
"chain.run(docs)"
|
||||
]
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
"kernelspec": {
|
||||
"display_name": "Python 3 (ipykernel)",
|
||||
"language": "python",
|
||||
"name": "python3"
|
||||
},
|
||||
"language_info": {
|
||||
"codemirror_mode": {
|
||||
"name": "ipython",
|
||||
"version": 3
|
||||
},
|
||||
"file_extension": ".py",
|
||||
"mimetype": "text/x-python",
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.11.3"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
"nbformat_minor": 5
|
||||
}
|
||||
@@ -0,0 +1,88 @@
|
||||
{
|
||||
"cells": [
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# Org-mode\n",
|
||||
"\n",
|
||||
">A [Org Mode document](https://en.wikipedia.org/wiki/Org-mode) is a document editing, formatting, and organizing mode, designed for notes, planning, and authoring within the free software text editor Emacs."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## `UnstructuredOrgModeLoader`\n",
|
||||
"\n",
|
||||
"You can load data from Org-mode files with `UnstructuredOrgModeLoader` using the following workflow."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 1,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langchain.document_loaders import UnstructuredOrgModeLoader"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 2,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"loader = UnstructuredOrgModeLoader(\n",
|
||||
" file_path=\"example_data/README.org\", mode=\"elements\"\n",
|
||||
")\n",
|
||||
"docs = loader.load()"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 3,
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"page_content='Example Docs' metadata={'source': 'example_data/README.org', 'filename': 'README.org', 'file_directory': 'example_data', 'filetype': 'text/org', 'page_number': 1, 'category': 'Title'}\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"print(docs[0])"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": []
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
"kernelspec": {
|
||||
"display_name": "Python 3 (ipykernel)",
|
||||
"language": "python",
|
||||
"name": "python3"
|
||||
},
|
||||
"language_info": {
|
||||
"codemirror_mode": {
|
||||
"name": "ipython",
|
||||
"version": 3
|
||||
},
|
||||
"file_extension": ".py",
|
||||
"mimetype": "text/x-python",
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.8.13"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
"nbformat_minor": 4
|
||||
}
|
||||
@@ -0,0 +1,419 @@
|
||||
{
|
||||
"cells": [
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "213a38a2",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# Source Code\n",
|
||||
"\n",
|
||||
"This notebook covers how to load source code files using a special approach with language parsing: each top-level function and class in the code is loaded into separate documents. Any remaining code top-level code outside the already loaded functions and classes will be loaded into a seperate document.\n",
|
||||
"\n",
|
||||
"This approach can potentially improve the accuracy of QA models over source code. Currently, the supported languages for code parsing are Python and JavaScript. The language used for parsing can be configured, along with the minimum number of lines required to activate the splitting based on syntax."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "7fa47b2e",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"! pip install esprima"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 1,
|
||||
"id": "beb55c2f",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"import warnings\n",
|
||||
"warnings.filterwarnings('ignore')\n",
|
||||
"from pprint import pprint\n",
|
||||
"from langchain.text_splitter import Language\n",
|
||||
"from langchain.document_loaders.generic import GenericLoader\n",
|
||||
"from langchain.document_loaders.parsers import LanguageParser"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 2,
|
||||
"id": "64056e07",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"loader = GenericLoader.from_filesystem(\n",
|
||||
" \"./example_data/source_code\",\n",
|
||||
" glob=\"*\",\n",
|
||||
" suffixes=[\".py\", \".js\"],\n",
|
||||
" parser=LanguageParser()\n",
|
||||
")\n",
|
||||
"docs = loader.load()"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 3,
|
||||
"id": "8af79bd7",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"6"
|
||||
]
|
||||
},
|
||||
"execution_count": 3,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"len(docs)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 4,
|
||||
"id": "85edf3fc",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"{'content_type': 'functions_classes',\n",
|
||||
" 'language': <Language.PYTHON: 'python'>,\n",
|
||||
" 'source': 'example_data/source_code/example.py'}\n",
|
||||
"{'content_type': 'functions_classes',\n",
|
||||
" 'language': <Language.PYTHON: 'python'>,\n",
|
||||
" 'source': 'example_data/source_code/example.py'}\n",
|
||||
"{'content_type': 'simplified_code',\n",
|
||||
" 'language': <Language.PYTHON: 'python'>,\n",
|
||||
" 'source': 'example_data/source_code/example.py'}\n",
|
||||
"{'content_type': 'functions_classes',\n",
|
||||
" 'language': <Language.JS: 'js'>,\n",
|
||||
" 'source': 'example_data/source_code/example.js'}\n",
|
||||
"{'content_type': 'functions_classes',\n",
|
||||
" 'language': <Language.JS: 'js'>,\n",
|
||||
" 'source': 'example_data/source_code/example.js'}\n",
|
||||
"{'content_type': 'simplified_code',\n",
|
||||
" 'language': <Language.JS: 'js'>,\n",
|
||||
" 'source': 'example_data/source_code/example.js'}\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"for document in docs:\n",
|
||||
" pprint(document.metadata)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 7,
|
||||
"id": "f44e3e37",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"class MyClass:\n",
|
||||
" def __init__(self, name):\n",
|
||||
" self.name = name\n",
|
||||
"\n",
|
||||
" def greet(self):\n",
|
||||
" print(f\"Hello, {self.name}!\")\n",
|
||||
"\n",
|
||||
"--8<--\n",
|
||||
"\n",
|
||||
"def main():\n",
|
||||
" name = input(\"Enter your name: \")\n",
|
||||
" obj = MyClass(name)\n",
|
||||
" obj.greet()\n",
|
||||
"\n",
|
||||
"--8<--\n",
|
||||
"\n",
|
||||
"# Code for: class MyClass:\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"# Code for: def main():\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"if __name__ == \"__main__\":\n",
|
||||
" main()\n",
|
||||
"\n",
|
||||
"--8<--\n",
|
||||
"\n",
|
||||
"class MyClass {\n",
|
||||
" constructor(name) {\n",
|
||||
" this.name = name;\n",
|
||||
" }\n",
|
||||
"\n",
|
||||
" greet() {\n",
|
||||
" console.log(`Hello, ${this.name}!`);\n",
|
||||
" }\n",
|
||||
"}\n",
|
||||
"\n",
|
||||
"--8<--\n",
|
||||
"\n",
|
||||
"function main() {\n",
|
||||
" const name = prompt(\"Enter your name:\");\n",
|
||||
" const obj = new MyClass(name);\n",
|
||||
" obj.greet();\n",
|
||||
"}\n",
|
||||
"\n",
|
||||
"--8<--\n",
|
||||
"\n",
|
||||
"// Code for: class MyClass {\n",
|
||||
"\n",
|
||||
"// Code for: function main() {\n",
|
||||
"\n",
|
||||
"main();\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"print(\"\\n\\n--8<--\\n\\n\".join([document.page_content for document in docs]))"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "69aad0ed",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"The parser can be disabled for small files. \n",
|
||||
"\n",
|
||||
"The parameter `parser_threshold` indicates the minimum number of lines that the source code file must have to be segmented using the parser."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 8,
|
||||
"id": "ae024794",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"loader = GenericLoader.from_filesystem(\n",
|
||||
" \"./example_data/source_code\",\n",
|
||||
" glob=\"*\",\n",
|
||||
" suffixes=[\".py\"],\n",
|
||||
" parser=LanguageParser(language=Language.PYTHON, parser_threshold=1000)\n",
|
||||
")\n",
|
||||
"docs = loader.load()"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 9,
|
||||
"id": "5d3b372a",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"1"
|
||||
]
|
||||
},
|
||||
"execution_count": 9,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"len(docs)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 10,
|
||||
"id": "89e546ad",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"class MyClass:\n",
|
||||
" def __init__(self, name):\n",
|
||||
" self.name = name\n",
|
||||
"\n",
|
||||
" def greet(self):\n",
|
||||
" print(f\"Hello, {self.name}!\")\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"def main():\n",
|
||||
" name = input(\"Enter your name: \")\n",
|
||||
" obj = MyClass(name)\n",
|
||||
" obj.greet()\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"if __name__ == \"__main__\":\n",
|
||||
" main()\n",
|
||||
"\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"print(docs[0].page_content)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "c9c71e61",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Splitting\n",
|
||||
"\n",
|
||||
"Additional splitting could be needed for those functions, classes, or scripts that are too big."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 11,
|
||||
"id": "adbaa79f",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"loader = GenericLoader.from_filesystem(\n",
|
||||
" \"./example_data/source_code\",\n",
|
||||
" glob=\"*\",\n",
|
||||
" suffixes=[\".js\"],\n",
|
||||
" parser=LanguageParser(language=Language.JS)\n",
|
||||
")\n",
|
||||
"docs = loader.load()"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 12,
|
||||
"id": "c44c0d3f",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langchain.text_splitter import (\n",
|
||||
" RecursiveCharacterTextSplitter,\n",
|
||||
" Language,\n",
|
||||
")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 13,
|
||||
"id": "b1e0053d",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"js_splitter = RecursiveCharacterTextSplitter.from_language(\n",
|
||||
" language=Language.JS, chunk_size=60, chunk_overlap=0\n",
|
||||
")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 14,
|
||||
"id": "7dbe6188",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"result = js_splitter.split_documents(docs)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 15,
|
||||
"id": "8a80d089",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"7"
|
||||
]
|
||||
},
|
||||
"execution_count": 15,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"len(result)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 16,
|
||||
"id": "000a6011",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"class MyClass {\n",
|
||||
" constructor(name) {\n",
|
||||
" this.name = name;\n",
|
||||
"\n",
|
||||
"--8<--\n",
|
||||
"\n",
|
||||
"}\n",
|
||||
"\n",
|
||||
"--8<--\n",
|
||||
"\n",
|
||||
"greet() {\n",
|
||||
" console.log(`Hello, ${this.name}!`);\n",
|
||||
" }\n",
|
||||
"}\n",
|
||||
"\n",
|
||||
"--8<--\n",
|
||||
"\n",
|
||||
"function main() {\n",
|
||||
" const name = prompt(\"Enter your name:\");\n",
|
||||
"\n",
|
||||
"--8<--\n",
|
||||
"\n",
|
||||
"const obj = new MyClass(name);\n",
|
||||
" obj.greet();\n",
|
||||
"}\n",
|
||||
"\n",
|
||||
"--8<--\n",
|
||||
"\n",
|
||||
"// Code for: class MyClass {\n",
|
||||
"\n",
|
||||
"// Code for: function main() {\n",
|
||||
"\n",
|
||||
"--8<--\n",
|
||||
"\n",
|
||||
"main();\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"print(\"\\n\\n--8<--\\n\\n\".join([document.page_content for document in result]))"
|
||||
]
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
"kernelspec": {
|
||||
"display_name": "Python 3 (ipykernel)",
|
||||
"language": "python",
|
||||
"name": "python3"
|
||||
},
|
||||
"language_info": {
|
||||
"codemirror_mode": {
|
||||
"name": "ipython",
|
||||
"version": 3
|
||||
},
|
||||
"file_extension": ".py",
|
||||
"mimetype": "text/x-python",
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.9.16"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
"nbformat_minor": 5
|
||||
}
|
||||
@@ -0,0 +1,116 @@
|
||||
{
|
||||
"cells": [
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "a634365e",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# Tencent COS Directory\n",
|
||||
"\n",
|
||||
"This covers how to load document objects from a `Tencent COS Directory`."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "85e97267",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"#! pip install cos-python-sdk-v5"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 2,
|
||||
"id": "2f0cd6a5",
|
||||
"metadata": {
|
||||
"tags": []
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langchain.document_loaders import TencentCOSDirectoryLoader\n",
|
||||
"from qcloud_cos import CosConfig"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 3,
|
||||
"id": "321cc7f1",
|
||||
"metadata": {
|
||||
"tags": []
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"conf = CosConfig(\n",
|
||||
" Region=\"your cos region\",\n",
|
||||
" SecretId=\"your cos secret_id\",\n",
|
||||
" SecretKey=\"your cos secret_key\",\n",
|
||||
" )\n",
|
||||
"loader = TencentCOSDirectoryLoader(conf=conf, bucket=\"you_cos_bucket\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "4c50d2c7",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"loader.load()"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "0690c40a",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Specifying a prefix\n",
|
||||
"You can also specify a prefix for more finegrained control over what files to load."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 5,
|
||||
"id": "72d44781",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"loader = TencentCOSDirectoryLoader(conf=conf, bucket=\"you_cos_bucket\", prefix=\"fake\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "2d3c32db",
|
||||
"metadata": {
|
||||
"scrolled": true
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"loader.load()"
|
||||
]
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
"kernelspec": {
|
||||
"display_name": "Python 3 (ipykernel)",
|
||||
"language": "python",
|
||||
"name": "python3"
|
||||
},
|
||||
"language_info": {
|
||||
"codemirror_mode": {
|
||||
"name": "ipython",
|
||||
"version": 3
|
||||
},
|
||||
"file_extension": ".py",
|
||||
"mimetype": "text/x-python",
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.9.6"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
"nbformat_minor": 5
|
||||
}
|
||||
@@ -0,0 +1,91 @@
|
||||
{
|
||||
"cells": [
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "a634365e",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# Tencent COS File\n",
|
||||
"\n",
|
||||
"This covers how to load document object from a `Tencent COS File`."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "85e97267",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"#! pip install cos-python-sdk-v5"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 2,
|
||||
"id": "2f0cd6a5",
|
||||
"metadata": {
|
||||
"tags": []
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langchain.document_loaders import TencentCOSFileLoader\n",
|
||||
"from qcloud_cos import CosConfig"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 3,
|
||||
"id": "321cc7f1",
|
||||
"metadata": {
|
||||
"tags": []
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"conf = CosConfig(\n",
|
||||
" Region=\"your cos region\",\n",
|
||||
" SecretId=\"your cos secret_id\",\n",
|
||||
" SecretKey=\"your cos secret_key\",\n",
|
||||
" )\n",
|
||||
"loader = TencentCOSFileLoader(conf=conf, bucket=\"you_cos_bucket\", key=\"fake.docx\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "4c50d2c7",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"loader.load()"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "0690c40a",
|
||||
"metadata": {},
|
||||
"source": []
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
"kernelspec": {
|
||||
"display_name": "Python 3 (ipykernel)",
|
||||
"language": "python",
|
||||
"name": "python3"
|
||||
},
|
||||
"language_info": {
|
||||
"codemirror_mode": {
|
||||
"name": "ipython",
|
||||
"version": 3
|
||||
},
|
||||
"file_extension": ".py",
|
||||
"mimetype": "text/x-python",
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.9.6"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
"nbformat_minor": 5
|
||||
}
|
||||
@@ -226,7 +226,6 @@
|
||||
]
|
||||
},
|
||||
{
|
||||
"attachments": {},
|
||||
"cell_type": "markdown",
|
||||
"id": "8de9ef16",
|
||||
"metadata": {},
|
||||
@@ -303,7 +302,7 @@
|
||||
"source": [
|
||||
"## Unstructured API\n",
|
||||
"\n",
|
||||
"If you want to get up and running with less set up, you can simply run `pip install unstructured` and use `UnstructuredAPIFileLoader` or `UnstructuredAPIFileIOLoader`. That will process your document using the hosted Unstructured API. Note that currently (as of 11 May 2023) the Unstructured API is open, but it will soon require an API. The [Unstructured documentation](https://unstructured-io.github.io/) page will have instructions on how to generate an API key once they’re available. Check out the instructions [here](https://github.com/Unstructured-IO/unstructured-api#dizzy-instructions-for-using-the-docker-image) if you’d like to self-host the Unstructured API or run it locally."
|
||||
"If you want to get up and running with less set up, you can simply run `pip install unstructured` and use `UnstructuredAPIFileLoader` or `UnstructuredAPIFileIOLoader`. That will process your document using the hosted Unstructured API. You can generate a free Unstructured API key [here](https://www.unstructured.io/api-key/). The [Unstructured documentation](https://unstructured-io.github.io/) page will have instructions on how to generate an API key once they’re available. Check out the instructions [here](https://github.com/Unstructured-IO/unstructured-api#dizzy-instructions-for-using-the-docker-image) if you’d like to self-host the Unstructured API or run it locally."
|
||||
]
|
||||
},
|
||||
{
|
||||
|
||||
@@ -224,13 +224,33 @@
|
||||
"docs"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"source": [
|
||||
"## Using proxies\n",
|
||||
"\n",
|
||||
"Sometimes you might need to use proxies to get around IP blocks. You can pass in a dictionary of proxies to the loader (and `requests` underneath) to use them."
|
||||
],
|
||||
"metadata": {
|
||||
"collapsed": false
|
||||
}
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "1dd8ab23",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": []
|
||||
"source": [
|
||||
"loader = WebBaseLoader(\n",
|
||||
" \"https://www.walmart.com/search?q=parrots\", proxies={\n",
|
||||
" \"http\": \"http://{username}:{password}:@proxy.service.com:6666/\",\n",
|
||||
" \"https\": \"https://{username}:{password}:@proxy.service.com:6666/\"\n",
|
||||
" }\n",
|
||||
")\n",
|
||||
"docs = loader.load()\n"
|
||||
],
|
||||
"metadata": {
|
||||
"collapsed": false
|
||||
}
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
|
||||
@@ -0,0 +1,214 @@
|
||||
{
|
||||
"cells": [
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "8cc82b48",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# MultiQueryRetriever\n",
|
||||
"\n",
|
||||
"Distance-based vector database retrieval embeds (represents) queries in high-dimensional space and finds similar embedded documents based on \"distance\". But, retrieval may produce difference results with subtle changes in query wording or if the embeddings do not capture the semantics of the data well. Prompt engineering / tuning is sometimes done to manually address these problems, but can be tedious.\n",
|
||||
"\n",
|
||||
"The `MultiQueryRetriever` automates the process of prompt tuning by using an LLM to generate multiple queries from different perspectives for a given user input query. For each query, it retrieves a set of relevant documents and takes the unique union across all queries to get a larger set of potentially relevant documents. By generating multiple perspectives on the same question, the `MultiQueryRetriever` might be able to overcome some of the limitations of the distance-based retrieval and get a richer set of results."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 1,
|
||||
"id": "c2f3f5f2",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# Build a sample vectorDB\n",
|
||||
"from langchain.vectorstores import Chroma\n",
|
||||
"from langchain.document_loaders import PyPDFLoader\n",
|
||||
"from langchain.embeddings.openai import OpenAIEmbeddings\n",
|
||||
"from langchain.text_splitter import RecursiveCharacterTextSplitter\n",
|
||||
"\n",
|
||||
"# Load PDF\n",
|
||||
"path=\"path-to-files\"\n",
|
||||
"loaders = [\n",
|
||||
" PyPDFLoader(path+\"docs/cs229_lectures/MachineLearning-Lecture01.pdf\"),\n",
|
||||
" PyPDFLoader(path+\"docs/cs229_lectures/MachineLearning-Lecture02.pdf\"),\n",
|
||||
" PyPDFLoader(path+\"docs/cs229_lectures/MachineLearning-Lecture03.pdf\")\n",
|
||||
"]\n",
|
||||
"docs = []\n",
|
||||
"for loader in loaders:\n",
|
||||
" docs.extend(loader.load())\n",
|
||||
" \n",
|
||||
"# Split\n",
|
||||
"text_splitter = RecursiveCharacterTextSplitter(chunk_size = 1500,chunk_overlap = 150)\n",
|
||||
"splits = text_splitter.split_documents(docs)\n",
|
||||
"\n",
|
||||
"# VectorDB\n",
|
||||
"embedding = OpenAIEmbeddings()\n",
|
||||
"vectordb = Chroma.from_documents(documents=splits,embedding=embedding)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "cca8f56c",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"`Simple usage`\n",
|
||||
"\n",
|
||||
"Specify the LLM to use for query generation, and the retriver will do the rest."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 3,
|
||||
"id": "edbca101",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langchain.chat_models import ChatOpenAI\n",
|
||||
"from langchain.retrievers.multi_query import MultiQueryRetriever\n",
|
||||
"question=\"What does the course say about regression?\"\n",
|
||||
"num_queries=3\n",
|
||||
"llm = ChatOpenAI(temperature=0)\n",
|
||||
"retriever_from_llm = MultiQueryRetriever.from_llm(retriever=vectordb.as_retriever(),llm=llm)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 4,
|
||||
"id": "e5203612",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stderr",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"INFO:root:Generated queries: [\"1. What is the course's perspective on regression?\", '2. How does the course discuss regression?', '3. What information does the course provide about regression?']\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"6"
|
||||
]
|
||||
},
|
||||
"execution_count": 4,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"unique_docs = retriever_from_llm.get_relevant_documents(question=\"What does the course say about regression?\")\n",
|
||||
"len(unique_docs)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "c54a282f",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"`Supplying your own prompt`\n",
|
||||
"\n",
|
||||
"You can also supply a prompt along with an output parser to split the results into a list of queries."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 5,
|
||||
"id": "d9afb0ca",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from typing import List\n",
|
||||
"from langchain import LLMChain\n",
|
||||
"from pydantic import BaseModel, Field\n",
|
||||
"from langchain.prompts import PromptTemplate\n",
|
||||
"from langchain.output_parsers import PydanticOutputParser\n",
|
||||
"\n",
|
||||
"# Output parser will split the LLM result into a list of queries\n",
|
||||
"class LineList(BaseModel):\n",
|
||||
" # \"lines\" is the key (attribute name) of the parsed output\n",
|
||||
" lines: List[str] = Field(description=\"Lines of text\")\n",
|
||||
"\n",
|
||||
"class LineListOutputParser(PydanticOutputParser):\n",
|
||||
" def __init__(self) -> None:\n",
|
||||
" super().__init__(pydantic_object=LineList)\n",
|
||||
" def parse(self, text: str) -> LineList:\n",
|
||||
" lines = text.strip().split(\"\\n\")\n",
|
||||
" return LineList(lines=lines)\n",
|
||||
"\n",
|
||||
"output_parser = LineListOutputParser()\n",
|
||||
" \n",
|
||||
"QUERY_PROMPT = PromptTemplate(\n",
|
||||
" input_variables=[\"question\"],\n",
|
||||
" template=\"\"\"You are an AI language model assistant. Your task is to generate five \n",
|
||||
" different versions of the given user question to retrieve relevant documents from a vector \n",
|
||||
" database. By generating multiple perspectives on the user question, your goal is to help\n",
|
||||
" the user overcome some of the limitations of the distance-based similarity search. \n",
|
||||
" Provide these alternative questions seperated by newlines.\n",
|
||||
" Original question: {question}\"\"\",\n",
|
||||
")\n",
|
||||
"llm = ChatOpenAI(temperature=0)\n",
|
||||
"\n",
|
||||
"# Chain\n",
|
||||
"llm_chain = LLMChain(llm=llm,prompt=QUERY_PROMPT,output_parser=output_parser)\n",
|
||||
" \n",
|
||||
"# Other inputs\n",
|
||||
"question=\"What does the course say about regression?\""
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 6,
|
||||
"id": "6660d7ee",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stderr",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"INFO:root:Generated queries: [\"1. What is the course's perspective on regression?\", '2. Can you provide information on regression as discussed in the course?', '3. How does the course cover the topic of regression?', \"4. What are the course's teachings on regression?\", '5. In relation to the course, what is mentioned about regression?']\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"8"
|
||||
]
|
||||
},
|
||||
"execution_count": 6,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"# Run\n",
|
||||
"retriever = MultiQueryRetriever(retriever=vectordb.as_retriever(), \n",
|
||||
" llm_chain=llm_chain,\n",
|
||||
" parser_key=\"lines\") # \"lines\" is the key (attribute name) of the parsed output\n",
|
||||
"\n",
|
||||
"# Results\n",
|
||||
"unique_docs = retriever.get_relevant_documents(question=\"What does the course say about regression?\")\n",
|
||||
"len(unique_docs)"
|
||||
]
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
"kernelspec": {
|
||||
"display_name": "Python 3 (ipykernel)",
|
||||
"language": "python",
|
||||
"name": "python3"
|
||||
},
|
||||
"language_info": {
|
||||
"codemirror_mode": {
|
||||
"name": "ipython",
|
||||
"version": 3
|
||||
},
|
||||
"file_extension": ".py",
|
||||
"mimetype": "text/x-python",
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.9.16"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
"nbformat_minor": 5
|
||||
}
|
||||
@@ -9,7 +9,7 @@ If you are just getting started, and you have relatively simple apis, you should
Chains are a sequence of predetermined steps, so they are good to get started with as they give you more control and let you
understand what is happening better.

- [API Chain](/docs/modules/chains/how_to/api.html)
- [API Chain](/docs/modules/chains/popular/api.html)

## Agents
@@ -29,6 +29,23 @@ class ZapierToolkit(BaseToolkit):
            ]
        return cls(tools=tools)

    @classmethod
    async def async_from_zapier_nla_wrapper(
        cls, zapier_nla_wrapper: ZapierNLAWrapper
    ) -> "ZapierToolkit":
        """Create a toolkit from a ZapierNLAWrapper."""
        actions = await zapier_nla_wrapper.alist()
        tools = [
            ZapierNLARunAction(
                action_id=action["id"],
                zapier_description=action["description"],
                params_schema=action["params"],
                api_wrapper=zapier_nla_wrapper,
            )
            for action in actions
        ]
        return cls(tools=tools)

    def get_tools(self) -> List[BaseTool]:
        """Get the tools in the toolkit."""
        return self.tools
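A brief sketch of how the new asynchronous constructor might be used (the surrounding setup mirrors the synchronous path and is an assumption, not part of this change):

```python
import asyncio

from langchain.agents.agent_toolkits import ZapierToolkit
from langchain.utilities.zapier import ZapierNLAWrapper


async def build_toolkit() -> ZapierToolkit:
    wrapper = ZapierNLAWrapper()  # reads ZAPIER_NLA_API_KEY from the environment
    # List the exposed actions and build the tools without blocking the event loop.
    return await ZapierToolkit.async_from_zapier_nla_wrapper(wrapper)


toolkit = asyncio.run(build_toolkit())
print([tool.name for tool in toolkit.get_tools()])
```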
@@ -96,7 +96,7 @@ def get_openai_token_cost_for_model(
            f"Unknown model: {model_name}. Please provide a valid OpenAI model name."
            "Known models are: " + ", ".join(MODEL_COST_PER_1K_TOKENS.keys())
        )
    return MODEL_COST_PER_1K_TOKENS[model_name] * num_tokens / 1000
    return MODEL_COST_PER_1K_TOKENS[model_name] * (num_tokens / 1000)


class OpenAICallbackHandler(BaseCallbackHandler):
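The two return expressions are numerically equivalent; the parentheses only make the per-1K-token pricing explicit. A quick sanity check with an assumed price entry:

```python
# Assumed price of $0.03 per 1K tokens, for illustration only.
MODEL_COST_PER_1K_TOKENS = {"gpt-4": 0.03}

def cost(model_name: str, num_tokens: int) -> float:
    return MODEL_COST_PER_1K_TOKENS[model_name] * (num_tokens / 1000)

print(cost("gpt-4", 500))   # 0.015 -> 500 tokens at $0.03 per 1K tokens
print(cost("gpt-4", 2000))  # 0.06
```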
langchain/callbacks/streaming_aiter_final_only.py (new file, 88 lines)
@@ -0,0 +1,88 @@
|
||||
from __future__ import annotations
|
||||
|
||||
from typing import Any, Dict, List, Optional
|
||||
|
||||
from langchain.callbacks.streaming_aiter import AsyncIteratorCallbackHandler
|
||||
from langchain.schema import LLMResult
|
||||
|
||||
DEFAULT_ANSWER_PREFIX_TOKENS = ["Final", "Answer", ":"]
|
||||
|
||||
|
||||
class AsyncFinalIteratorCallbackHandler(AsyncIteratorCallbackHandler):
|
||||
"""Callback handler that returns an async iterator.
|
||||
Only the final output of the agent will be iterated.
|
||||
"""
|
||||
|
||||
def append_to_last_tokens(self, token: str) -> None:
|
||||
self.last_tokens.append(token)
|
||||
self.last_tokens_stripped.append(token.strip())
|
||||
if len(self.last_tokens) > len(self.answer_prefix_tokens):
|
||||
self.last_tokens.pop(0)
|
||||
self.last_tokens_stripped.pop(0)
|
||||
|
||||
def check_if_answer_reached(self) -> bool:
|
||||
if self.strip_tokens:
|
||||
return self.last_tokens_stripped == self.answer_prefix_tokens_stripped
|
||||
else:
|
||||
return self.last_tokens == self.answer_prefix_tokens
|
||||
|
||||
def __init__(
|
||||
self,
|
||||
*,
|
||||
answer_prefix_tokens: Optional[List[str]] = None,
|
||||
strip_tokens: bool = True,
|
||||
stream_prefix: bool = False,
|
||||
) -> None:
|
||||
"""Instantiate AsyncFinalIteratorCallbackHandler.
|
||||
|
||||
Args:
|
||||
answer_prefix_tokens: Token sequence that prefixes the answer.
|
||||
Default is ["Final", "Answer", ":"]
|
||||
strip_tokens: Ignore white spaces and new lines when comparing
|
||||
answer_prefix_tokens to last tokens? (to determine if answer has been
|
||||
reached)
|
||||
stream_prefix: Should answer prefix itself also be streamed?
|
||||
"""
|
||||
super().__init__()
|
||||
if answer_prefix_tokens is None:
|
||||
self.answer_prefix_tokens = DEFAULT_ANSWER_PREFIX_TOKENS
|
||||
else:
|
||||
self.answer_prefix_tokens = answer_prefix_tokens
|
||||
if strip_tokens:
|
||||
self.answer_prefix_tokens_stripped = [
|
||||
token.strip() for token in self.answer_prefix_tokens
|
||||
]
|
||||
else:
|
||||
self.answer_prefix_tokens_stripped = self.answer_prefix_tokens
|
||||
self.last_tokens = [""] * len(self.answer_prefix_tokens)
|
||||
self.last_tokens_stripped = [""] * len(self.answer_prefix_tokens)
|
||||
self.strip_tokens = strip_tokens
|
||||
self.stream_prefix = stream_prefix
|
||||
self.answer_reached = False
|
||||
|
||||
async def on_llm_start(
|
||||
self, serialized: Dict[str, Any], prompts: List[str], **kwargs: Any
|
||||
) -> None:
|
||||
# If two calls are made in a row, this resets the state
|
||||
self.done.clear()
|
||||
self.answer_reached = False
|
||||
|
||||
async def on_llm_end(self, response: LLMResult, **kwargs: Any) -> None:
|
||||
if self.answer_reached:
|
||||
self.done.set()
|
||||
|
||||
async def on_llm_new_token(self, token: str, **kwargs: Any) -> None:
|
||||
# Remember the last n tokens, where n = len(answer_prefix_tokens)
|
||||
self.append_to_last_tokens(token)
|
||||
|
||||
# Check if the last n tokens match the answer_prefix_tokens list ...
|
||||
if self.check_if_answer_reached():
|
||||
self.answer_reached = True
|
||||
if self.stream_prefix:
|
||||
for t in self.last_tokens:
|
||||
self.queue.put_nowait(t)
|
||||
return
|
||||
|
||||
# If yes, then put tokens from now on
|
||||
if self.answer_reached:
|
||||
self.queue.put_nowait(token)
|
||||
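For context, a sketch of how a final-answer-only async iterator handler like this is typically wired up (the agent and LLM setup here are assumptions for illustration, not part of the diff):

```python
import asyncio

from langchain.agents import AgentType, initialize_agent, load_tools
from langchain.callbacks.streaming_aiter_final_only import (
    AsyncFinalIteratorCallbackHandler,
)
from langchain.llms import OpenAI


async def main() -> None:
    handler = AsyncFinalIteratorCallbackHandler()
    llm = OpenAI(temperature=0, streaming=True, callbacks=[handler])
    agent = initialize_agent(
        load_tools(["llm-math"], llm=llm),
        llm,
        agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION,
    )
    task = asyncio.create_task(agent.arun("What is 2 raised to the 10th power?"))
    # Tokens are only yielded once the "Final Answer:" prefix has been seen.
    async for token in handler.aiter():
        print(token, end="", flush=True)
    await task


asyncio.run(main())
```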
@@ -5,6 +5,7 @@ from uuid import UUID
|
||||
|
||||
from langchainplus_sdk import LangChainPlusClient, RunEvaluator
|
||||
|
||||
from langchain.callbacks.manager import tracing_v2_enabled
|
||||
from langchain.callbacks.tracers.base import BaseTracer
|
||||
from langchain.callbacks.tracers.schemas import Run
|
||||
|
||||
@@ -47,6 +48,7 @@ class EvaluatorCallbackHandler(BaseTracer):
|
||||
max_workers: Optional[int] = None,
|
||||
client: Optional[LangChainPlusClient] = None,
|
||||
example_id: Optional[Union[UUID, str]] = None,
|
||||
project_name: Optional[str] = None,
|
||||
**kwargs: Any
|
||||
) -> None:
|
||||
super().__init__(**kwargs)
|
||||
@@ -59,6 +61,23 @@ class EvaluatorCallbackHandler(BaseTracer):
|
||||
max_workers=max(max_workers or len(evaluators), 1)
|
||||
)
|
||||
self.futures: Set[Future] = set()
|
||||
self.project_name = project_name
|
||||
|
||||
def _evaluate_in_project(self, run: Run, evaluator: RunEvaluator) -> None:
|
||||
"""Evaluate the run in the project.
|
||||
|
||||
Parameters
|
||||
----------
|
||||
run : Run
|
||||
The run to be evaluated.
|
||||
evaluator : RunEvaluator
|
||||
The evaluator to use for evaluating the run.
|
||||
|
||||
"""
|
||||
if self.project_name is None:
|
||||
return self.client.evaluate_run(run, evaluator)
|
||||
with tracing_v2_enabled(project_name=self.project_name):
|
||||
return self.client.evaluate_run(run, evaluator)
|
||||
|
||||
def _persist_run(self, run: Run) -> None:
|
||||
"""Run the evaluator on the run.
|
||||
@@ -73,7 +92,7 @@ class EvaluatorCallbackHandler(BaseTracer):
|
||||
run_.reference_example_id = self.example_id
|
||||
for evaluator in self.evaluators:
|
||||
self.futures.add(
|
||||
self.executor.submit(self.client.evaluate_run, run_, evaluator)
|
||||
self.executor.submit(self._evaluate_in_project, run_, evaluator)
|
||||
)
|
||||
|
||||
def wait_for_futures(self) -> None:
|
||||
|
||||
@@ -157,7 +157,13 @@ def openapi_spec_to_openai_fn(
|
||||
"url": api_op.base_url + api_op.path,
|
||||
}
|
||||
|
||||
def default_call_api(name: str, fn_args: dict, **kwargs: Any) -> Any:
|
||||
def default_call_api(
|
||||
name: str,
|
||||
fn_args: dict,
|
||||
headers: Optional[dict] = None,
|
||||
params: Optional[dict] = None,
|
||||
**kwargs: Any,
|
||||
) -> Any:
|
||||
method = _name_to_call_map[name]["method"]
|
||||
url = _name_to_call_map[name]["url"]
|
||||
path_params = fn_args.pop("path_params", {})
|
||||
@@ -165,6 +171,16 @@ def openapi_spec_to_openai_fn(
|
||||
if "data" in fn_args and isinstance(fn_args["data"], dict):
|
||||
fn_args["data"] = json.dumps(fn_args["data"])
|
||||
_kwargs = {**fn_args, **kwargs}
|
||||
if headers is not None:
|
||||
if "headers" in _kwargs:
|
||||
_kwargs["headers"].update(headers)
|
||||
else:
|
||||
_kwargs["headers"] = headers
|
||||
if params is not None:
|
||||
if "params" in _kwargs:
|
||||
_kwargs["params"].update(params)
|
||||
else:
|
||||
_kwargs["params"] = params
|
||||
return requests.request(method, url, **_kwargs)
|
||||
|
||||
return functions, default_call_api
|
||||
@@ -218,6 +234,8 @@ def get_openapi_chain(
|
||||
request_chain: Optional[Chain] = None,
|
||||
llm_kwargs: Optional[Dict] = None,
|
||||
verbose: bool = False,
|
||||
headers: Optional[Dict] = None,
|
||||
params: Optional[Dict] = None,
|
||||
**kwargs: Any,
|
||||
) -> SequentialChain:
|
||||
"""Create a chain for querying an API from a OpenAPI spec.
|
||||
@@ -259,7 +277,10 @@ def get_openapi_chain(
|
||||
**(llm_kwargs or {}),
|
||||
)
|
||||
request_chain = request_chain or SimpleRequestChain(
|
||||
request_method=call_api_fn, verbose=verbose
|
||||
request_method=lambda name, args: call_api_fn(
|
||||
name, args, headers=headers, params=params
|
||||
),
|
||||
verbose=verbose,
|
||||
)
|
||||
return SequentialChain(
|
||||
chains=[llm_chain, request_chain],
|
||||
|
||||
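A minimal sketch of how the new `headers` and `params` arguments might be passed through (the spec URL, header value, and question are placeholders; an OpenAI functions-capable model and `OPENAI_API_KEY` are assumed):

```python
from langchain.chains.openai_functions.openapi import get_openapi_chain

# Placeholder spec URL and credentials; the chain forwards headers/params to every API call.
chain = get_openapi_chain(
    "https://example.com/openapi.yaml",
    headers={"Authorization": "Bearer <token>"},   # sent with each request
    params={"api-version": "2023-06-01"},          # merged into each request's query string
)
chain.run("List the first five widgets.")
```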
@@ -296,12 +296,14 @@ async def _callbacks_initializer(
|
||||
project_name: Optional[str],
|
||||
client: LangChainPlusClient,
|
||||
run_evaluators: Sequence[RunEvaluator],
|
||||
evaluation_handler_collector: List[EvaluatorCallbackHandler],
|
||||
) -> List[BaseTracer]:
|
||||
"""
|
||||
Initialize a tracer to share across tasks.
|
||||
|
||||
Args:
|
||||
project_name: The project name for the tracer.
|
||||
client: The client to use for the tracer.
|
||||
|
||||
Returns:
|
||||
A LangChainTracer instance with an active project.
|
||||
@@ -309,15 +311,17 @@ async def _callbacks_initializer(
|
||||
callbacks: List[BaseTracer] = []
|
||||
if project_name:
|
||||
callbacks.append(LangChainTracer(project_name=project_name))
|
||||
evaluator_project_name = f"{project_name}-evaluators" if project_name else None
|
||||
if run_evaluators:
|
||||
callbacks.append(
|
||||
EvaluatorCallbackHandler(
|
||||
client=client,
|
||||
evaluators=run_evaluators,
|
||||
# We already have concurrency, don't want to overload the machine
|
||||
max_workers=1,
|
||||
)
|
||||
callback = EvaluatorCallbackHandler(
|
||||
client=client,
|
||||
evaluators=run_evaluators,
|
||||
# We already have concurrency, don't want to overload the machine
|
||||
max_workers=1,
|
||||
project_name=evaluator_project_name,
|
||||
)
|
||||
callbacks.append(callback)
|
||||
evaluation_handler_collector.append(callback)
|
||||
return callbacks
|
||||
|
||||
|
||||
@@ -362,9 +366,6 @@ async def arun_on_examples(
|
||||
client_.create_project(project_name, mode="eval")
|
||||
|
||||
results: Dict[str, List[Any]] = {}
|
||||
evaluation_handler = EvaluatorCallbackHandler(
|
||||
evaluators=run_evaluators or [], client=client_
|
||||
)
|
||||
|
||||
async def process_example(
|
||||
example: Example, callbacks: List[BaseCallbackHandler], job_state: dict
|
||||
@@ -386,17 +387,20 @@ async def arun_on_examples(
|
||||
flush=True,
|
||||
)
|
||||
|
||||
evaluation_handlers: List[EvaluatorCallbackHandler] = []
|
||||
await _gather_with_concurrency(
|
||||
concurrency_level,
|
||||
functools.partial(
|
||||
_callbacks_initializer,
|
||||
project_name=project_name,
|
||||
client=client_,
|
||||
evaluation_handler_collector=evaluation_handlers,
|
||||
run_evaluators=run_evaluators or [],
|
||||
),
|
||||
*(functools.partial(process_example, e) for e in examples),
|
||||
)
|
||||
evaluation_handler.wait_for_futures()
|
||||
for handler in evaluation_handlers:
|
||||
handler.wait_for_futures()
|
||||
return results
|
||||
|
||||
|
||||
@@ -537,8 +541,11 @@ def run_on_examples(
|
||||
client_ = client or LangChainPlusClient()
|
||||
client_.create_project(project_name, mode="eval")
|
||||
tracer = LangChainTracer(project_name=project_name)
|
||||
evaluator_project_name = f"{project_name}-evaluators"
|
||||
evalution_handler = EvaluatorCallbackHandler(
|
||||
evaluators=run_evaluators or [], client=client_
|
||||
evaluators=run_evaluators or [],
|
||||
client=client_,
|
||||
project_name=evaluator_project_name,
|
||||
)
|
||||
callbacks: List[BaseCallbackHandler] = [tracer, evalution_handler]
|
||||
for i, example in enumerate(examples):
|
||||
|
||||
@@ -63,6 +63,7 @@ from langchain.document_loaders.imsdb import IMSDbLoader
|
||||
from langchain.document_loaders.iugu import IuguLoader
|
||||
from langchain.document_loaders.joplin import JoplinLoader
|
||||
from langchain.document_loaders.json_loader import JSONLoader
|
||||
from langchain.document_loaders.larksuite import LarkSuiteDocLoader
|
||||
from langchain.document_loaders.markdown import UnstructuredMarkdownLoader
|
||||
from langchain.document_loaders.mastodon import MastodonTootsLoader
|
||||
from langchain.document_loaders.max_compute import MaxComputeLoader
|
||||
@@ -78,6 +79,7 @@ from langchain.document_loaders.odt import UnstructuredODTLoader
|
||||
from langchain.document_loaders.onedrive import OneDriveLoader
|
||||
from langchain.document_loaders.onedrive_file import OneDriveFileLoader
|
||||
from langchain.document_loaders.open_city_data import OpenCityDataLoader
|
||||
from langchain.document_loaders.org_mode import UnstructuredOrgModeLoader
|
||||
from langchain.document_loaders.pdf import (
|
||||
MathpixPDFLoader,
|
||||
OnlinePDFLoader,
|
||||
@@ -112,6 +114,8 @@ from langchain.document_loaders.telegram import (
|
||||
TelegramChatApiLoader,
|
||||
TelegramChatFileLoader,
|
||||
)
|
||||
from langchain.document_loaders.tencent_cos_directory import TencentCOSDirectoryLoader
|
||||
from langchain.document_loaders.tencent_cos_file import TencentCOSFileLoader
|
||||
from langchain.document_loaders.text import TextLoader
|
||||
from langchain.document_loaders.tomarkdown import ToMarkdownLoader
|
||||
from langchain.document_loaders.toml import TomlLoader
|
||||
@@ -201,6 +205,7 @@ __all__ = [
|
||||
"IuguLoader",
|
||||
"JSONLoader",
|
||||
"JoplinLoader",
|
||||
"LarkSuiteDocLoader",
|
||||
"MWDumpLoader",
|
||||
"MastodonTootsLoader",
|
||||
"MathpixPDFLoader",
|
||||
@@ -242,6 +247,8 @@ __all__ = [
|
||||
"SnowflakeLoader",
|
||||
"SpreedlyLoader",
|
||||
"StripeLoader",
|
||||
"TencentCOSDirectoryLoader",
|
||||
"TencentCOSFileLoader",
|
||||
"TelegramChatApiLoader",
|
||||
"TelegramChatFileLoader",
|
||||
"TelegramChatLoader",
|
||||
@@ -262,6 +269,7 @@ __all__ = [
|
||||
"UnstructuredImageLoader",
|
||||
"UnstructuredMarkdownLoader",
|
||||
"UnstructuredODTLoader",
|
||||
"UnstructuredOrgModeLoader",
|
||||
"UnstructuredPDFLoader",
|
||||
"UnstructuredPowerPointLoader",
|
||||
"UnstructuredRSTLoader",
|
||||
|
||||
langchain/document_loaders/larksuite.py (new file, 46 lines)
@@ -0,0 +1,46 @@
|
||||
"""Loader that loads LarkSuite (FeiShu) document json dump."""
|
||||
import json
|
||||
import urllib.request
|
||||
from typing import Any, Iterator, List
|
||||
|
||||
from langchain.docstore.document import Document
|
||||
from langchain.document_loaders.base import BaseLoader
|
||||
|
||||
|
||||
class LarkSuiteDocLoader(BaseLoader):
|
||||
"""Loader that loads LarkSuite (FeiShu) document."""
|
||||
|
||||
def __init__(self, domain: str, access_token: str, document_id: str):
|
||||
"""Initialize with domain, access_token (tenant / user), and document_id."""
|
||||
self.domain = domain
|
||||
self.access_token = access_token
|
||||
self.document_id = document_id
|
||||
|
||||
def _get_larksuite_api_json_data(self, api_url: str) -> Any:
|
||||
"""Get LarkSuite (FeiShu) API response json data."""
|
||||
headers = {"Authorization": f"Bearer {self.access_token}"}
|
||||
request = urllib.request.Request(api_url, headers=headers)
|
||||
with urllib.request.urlopen(request) as response:
|
||||
json_data = json.loads(response.read().decode())
|
||||
return json_data
|
||||
|
||||
def lazy_load(self) -> Iterator[Document]:
|
||||
"""Lazy load LarkSuite (FeiShu) document."""
|
||||
api_url_prefix = f"{self.domain}/open-apis/docx/v1/documents"
|
||||
metadata_json = self._get_larksuite_api_json_data(
|
||||
f"{api_url_prefix}/{self.document_id}"
|
||||
)
|
||||
raw_content_json = self._get_larksuite_api_json_data(
|
||||
f"{api_url_prefix}/{self.document_id}/raw_content"
|
||||
)
|
||||
text = raw_content_json["data"]["content"]
|
||||
metadata = {
|
||||
"document_id": self.document_id,
|
||||
"revision_id": metadata_json["data"]["document"]["revision_id"],
|
||||
"title": metadata_json["data"]["document"]["title"],
|
||||
}
|
||||
yield Document(page_content=text, metadata=metadata)
|
||||
|
||||
def load(self) -> List[Document]:
|
||||
"""Load LarkSuite (FeiShu) document."""
|
||||
return list(self.lazy_load())
|
||||
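For reference, a minimal usage sketch of the LarkSuiteDocLoader added above; the domain, access token, and document id are placeholders, not values taken from this change:

from langchain.document_loaders import LarkSuiteDocLoader

# Placeholder values; supply your own tenant domain, token, and document id.
loader = LarkSuiteDocLoader(
    domain="https://open.larksuite.com",
    access_token="<tenant-or-user-access-token>",
    document_id="<document-id>",
)
docs = loader.load()  # one Document; metadata carries document_id, revision_id, title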
22
langchain/document_loaders/org_mode.py
Normal file
@@ -0,0 +1,22 @@
|
||||
"""Loader that loads Org-Mode files."""
|
||||
from typing import Any, List
|
||||
|
||||
from langchain.document_loaders.unstructured import (
|
||||
UnstructuredFileLoader,
|
||||
validate_unstructured_version,
|
||||
)
|
||||
|
||||
|
||||
class UnstructuredOrgModeLoader(UnstructuredFileLoader):
|
||||
"""Loader that uses unstructured to load Org-Mode files."""
|
||||
|
||||
def __init__(
|
||||
self, file_path: str, mode: str = "single", **unstructured_kwargs: Any
|
||||
):
|
||||
validate_unstructured_version(min_unstructured_version="0.7.9")
|
||||
super().__init__(file_path=file_path, mode=mode, **unstructured_kwargs)
|
||||
|
||||
def _get_elements(self) -> List:
|
||||
from unstructured.partition.org import partition_org
|
||||
|
||||
return partition_org(filename=self.file_path, **self.unstructured_kwargs)
|
||||
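A minimal usage sketch for the new Org-Mode loader, assuming an example file named README.org and unstructured>=0.7.9 installed:

from langchain.document_loaders import UnstructuredOrgModeLoader

loader = UnstructuredOrgModeLoader("README.org", mode="elements")  # path is illustrative
docs = loader.load()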
@@ -1,5 +1,6 @@
|
||||
from langchain.document_loaders.parsers.audio import OpenAIWhisperParser
|
||||
from langchain.document_loaders.parsers.html import BS4HTMLParser
|
||||
from langchain.document_loaders.parsers.language import LanguageParser
|
||||
from langchain.document_loaders.parsers.pdf import (
|
||||
PDFMinerParser,
|
||||
PDFPlumberParser,
|
||||
@@ -10,6 +11,7 @@ from langchain.document_loaders.parsers.pdf import (
|
||||
|
||||
__all__ = [
|
||||
"BS4HTMLParser",
|
||||
"LanguageParser",
|
||||
"OpenAIWhisperParser",
|
||||
"PDFMinerParser",
|
||||
"PDFPlumberParser",
|
||||
|
||||
3
langchain/document_loaders/parsers/language/__init__.py
Normal file
@@ -0,0 +1,3 @@
|
||||
from langchain.document_loaders.parsers.language.language_parser import LanguageParser
|
||||
|
||||
__all__ = ["LanguageParser"]
|
||||
@@ -0,0 +1,18 @@
|
||||
from abc import ABC, abstractmethod
|
||||
from typing import List
|
||||
|
||||
|
||||
class CodeSegmenter(ABC):
|
||||
def __init__(self, code: str):
|
||||
self.code = code
|
||||
|
||||
def is_valid(self) -> bool:
|
||||
return True
|
||||
|
||||
@abstractmethod
|
||||
def simplify_code(self) -> str:
|
||||
raise NotImplementedError # pragma: no cover
|
||||
|
||||
@abstractmethod
|
||||
def extract_functions_classes(self) -> List[str]:
|
||||
raise NotImplementedError # pragma: no cover
|
||||
65
langchain/document_loaders/parsers/language/javascript.py
Normal file
@@ -0,0 +1,65 @@
|
||||
from typing import Any, List
|
||||
|
||||
from langchain.document_loaders.parsers.language.code_segmenter import CodeSegmenter
|
||||
|
||||
|
||||
class JavaScriptSegmenter(CodeSegmenter):
|
||||
def __init__(self, code: str):
|
||||
super().__init__(code)
|
||||
self.source_lines = self.code.splitlines()
|
||||
|
||||
try:
|
||||
import esprima # noqa: F401
|
||||
except ImportError:
|
||||
raise ImportError(
|
||||
"Could not import esprima Python package. "
|
||||
"Please install it with `pip install esprima`."
|
||||
)
|
||||
|
||||
def is_valid(self) -> bool:
|
||||
import esprima
|
||||
|
||||
try:
|
||||
esprima.parseScript(self.code)
|
||||
return True
|
||||
except esprima.Error:
|
||||
return False
|
||||
|
||||
def _extract_code(self, node: Any) -> str:
|
||||
start = node.loc.start.line - 1
|
||||
end = node.loc.end.line
|
||||
return "\n".join(self.source_lines[start:end])
|
||||
|
||||
def extract_functions_classes(self) -> List[str]:
|
||||
import esprima
|
||||
|
||||
tree = esprima.parseScript(self.code, loc=True)
|
||||
functions_classes = []
|
||||
|
||||
for node in tree.body:
|
||||
if isinstance(
|
||||
node,
|
||||
(esprima.nodes.FunctionDeclaration, esprima.nodes.ClassDeclaration),
|
||||
):
|
||||
functions_classes.append(self._extract_code(node))
|
||||
|
||||
return functions_classes
|
||||
|
||||
def simplify_code(self) -> str:
|
||||
import esprima
|
||||
|
||||
tree = esprima.parseScript(self.code, loc=True)
|
||||
simplified_lines = self.source_lines[:]
|
||||
|
||||
for node in tree.body:
|
||||
if isinstance(
|
||||
node,
|
||||
(esprima.nodes.FunctionDeclaration, esprima.nodes.ClassDeclaration),
|
||||
):
|
||||
start = node.loc.start.line - 1
|
||||
simplified_lines[start] = f"// Code for: {simplified_lines[start]}"
|
||||
|
||||
for line_num in range(start + 1, node.loc.end.line):
|
||||
simplified_lines[line_num] = None # type: ignore
|
||||
|
||||
return "\n".join(line for line in simplified_lines if line is not None)
|
||||
143
langchain/document_loaders/parsers/language/language_parser.py
Normal file
@@ -0,0 +1,143 @@
|
||||
from typing import Any, Dict, Iterator, Optional
|
||||
|
||||
from langchain.docstore.document import Document
|
||||
from langchain.document_loaders.base import BaseBlobParser
|
||||
from langchain.document_loaders.blob_loaders import Blob
|
||||
from langchain.document_loaders.parsers.language.javascript import JavaScriptSegmenter
|
||||
from langchain.document_loaders.parsers.language.python import PythonSegmenter
|
||||
from langchain.text_splitter import Language
|
||||
|
||||
LANGUAGE_EXTENSIONS: Dict[str, str] = {
|
||||
"py": Language.PYTHON,
|
||||
"js": Language.JS,
|
||||
}
|
||||
|
||||
LANGUAGE_SEGMENTERS: Dict[str, Any] = {
|
||||
Language.PYTHON: PythonSegmenter,
|
||||
Language.JS: JavaScriptSegmenter,
|
||||
}
|
||||
|
||||
|
||||
class LanguageParser(BaseBlobParser):
|
||||
"""
|
||||
Language parser that splits code using the respective language syntax.
|
||||
|
||||
Each top-level function and class in the code is loaded into separate documents.
|
||||
Furthermore, an extra document is generated, containing the remaining top-level code
|
||||
that excludes the already segmented functions and classes.
|
||||
|
||||
This approach can potentially improve the accuracy of QA models over source code.
|
||||
|
||||
Currently, the supported languages for code parsing are Python and JavaScript.
|
||||
|
||||
The language used for parsing can be configured, along with the minimum number of
|
||||
lines required to activate the splitting based on syntax.
|
||||
|
||||
Examples:
|
||||
|
||||
.. code-block:: python
|
||||
|
||||
from langchain.text_splitter import Language
|
||||
from langchain.document_loaders.generic import GenericLoader
|
||||
from langchain.document_loaders.parsers import LanguageParser
|
||||
|
||||
loader = GenericLoader.from_filesystem(
|
||||
"./code",
|
||||
glob="**/*",
|
||||
suffixes=[".py", ".js"],
|
||||
parser=LanguageParser()
|
||||
)
|
||||
docs = loader.load()
|
||||
|
||||
Example instantiations to manually select the language:
|
||||
|
||||
.. code-block:: python
|
||||
|
||||
from langchain.text_splitter import Language
|
||||
|
||||
loader = GenericLoader.from_filesystem(
|
||||
"./code",
|
||||
glob="**/*",
|
||||
suffixes=[".py"],
|
||||
parser=LanguageParser(language=Language.PYTHON)
|
||||
)
|
||||
|
||||
Example instantiations to set number of lines threshold:
|
||||
|
||||
.. code-block:: python
|
||||
|
||||
loader = GenericLoader.from_filesystem(
|
||||
"./code",
|
||||
glob="**/*",
|
||||
suffixes=[".py"],
|
||||
parser=LanguageParser(parser_threshold=200)
|
||||
)
|
||||
"""
|
||||
|
||||
def __init__(self, language: Optional[Language] = None, parser_threshold: int = 0):
|
||||
"""
|
||||
Language parser that splits code using the respective language syntax.
|
||||
|
||||
Args:
|
||||
language: If None (default), it will try to infer language from source.
|
||||
parser_threshold: Minimum lines needed to activate parsing (0 by default).
|
||||
"""
|
||||
self.language = language
|
||||
self.parser_threshold = parser_threshold
|
||||
|
||||
def lazy_parse(self, blob: Blob) -> Iterator[Document]:
|
||||
code = blob.as_string()
|
||||
|
||||
language = self.language or (
|
||||
LANGUAGE_EXTENSIONS.get(blob.source.rsplit(".", 1)[-1])
|
||||
if isinstance(blob.source, str)
|
||||
else None
|
||||
)
|
||||
|
||||
if language is None:
|
||||
yield Document(
|
||||
page_content=code,
|
||||
metadata={
|
||||
"source": blob.source,
|
||||
},
|
||||
)
|
||||
return
|
||||
|
||||
if self.parser_threshold >= len(code.splitlines()):
|
||||
yield Document(
|
||||
page_content=code,
|
||||
metadata={
|
||||
"source": blob.source,
|
||||
"language": language,
|
||||
},
|
||||
)
|
||||
return
|
||||
|
||||
self.Segmenter = LANGUAGE_SEGMENTERS[language]
|
||||
segmenter = self.Segmenter(blob.as_string())
|
||||
if not segmenter.is_valid():
|
||||
yield Document(
|
||||
page_content=code,
|
||||
metadata={
|
||||
"source": blob.source,
|
||||
},
|
||||
)
|
||||
return
|
||||
|
||||
for functions_classes in segmenter.extract_functions_classes():
|
||||
yield Document(
|
||||
page_content=functions_classes,
|
||||
metadata={
|
||||
"source": blob.source,
|
||||
"content_type": "functions_classes",
|
||||
"language": language,
|
||||
},
|
||||
)
|
||||
yield Document(
|
||||
page_content=segmenter.simplify_code(),
|
||||
metadata={
|
||||
"source": blob.source,
|
||||
"content_type": "simplified_code",
|
||||
"language": language,
|
||||
},
|
||||
)
|
||||
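In addition to the instantiation examples in the docstring above, here is a sketch of what the parser yields; the ./code directory is assumed to exist:

from langchain.document_loaders.generic import GenericLoader
from langchain.document_loaders.parsers import LanguageParser

loader = GenericLoader.from_filesystem(
    "./code", glob="**/*", suffixes=[".py"], parser=LanguageParser()
)
for doc in loader.load():
    # one "functions_classes" document per top-level def/class, plus a single
    # "simplified_code" document holding the remaining module-level code
    print(doc.metadata.get("content_type"), doc.metadata["source"])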
47
langchain/document_loaders/parsers/language/python.py
Normal file
@@ -0,0 +1,47 @@
|
||||
import ast
|
||||
from typing import Any, List
|
||||
|
||||
from langchain.document_loaders.parsers.language.code_segmenter import CodeSegmenter
|
||||
|
||||
|
||||
class PythonSegmenter(CodeSegmenter):
|
||||
def __init__(self, code: str):
|
||||
super().__init__(code)
|
||||
self.source_lines = self.code.splitlines()
|
||||
|
||||
def is_valid(self) -> bool:
|
||||
try:
|
||||
ast.parse(self.code)
|
||||
return True
|
||||
except SyntaxError:
|
||||
return False
|
||||
|
||||
def _extract_code(self, node: Any) -> str:
|
||||
start = node.lineno - 1
|
||||
end = node.end_lineno
|
||||
return "\n".join(self.source_lines[start:end])
|
||||
|
||||
def extract_functions_classes(self) -> List[str]:
|
||||
tree = ast.parse(self.code)
|
||||
functions_classes = []
|
||||
|
||||
for node in ast.iter_child_nodes(tree):
|
||||
if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
|
||||
functions_classes.append(self._extract_code(node))
|
||||
|
||||
return functions_classes
|
||||
|
||||
def simplify_code(self) -> str:
|
||||
tree = ast.parse(self.code)
|
||||
simplified_lines = self.source_lines[:]
|
||||
|
||||
for node in ast.iter_child_nodes(tree):
|
||||
if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
|
||||
start = node.lineno - 1
|
||||
simplified_lines[start] = f"# Code for: {simplified_lines[start]}"
|
||||
|
||||
assert isinstance(node.end_lineno, int)
|
||||
for line_num in range(start + 1, node.end_lineno):
|
||||
simplified_lines[line_num] = None # type: ignore
|
||||
|
||||
return "\n".join(line for line in simplified_lines if line is not None)
|
||||
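A small sketch showing what the segmenter above produces for a trivial module:

from langchain.document_loaders.parsers.language.python import PythonSegmenter

code = "def greet(name):\n    return 'Hello ' + name\n\nprint(greet('World'))\n"
seg = PythonSegmenter(code)
seg.extract_functions_classes()  # ["def greet(name):\n    return 'Hello ' + name"]
seg.simplify_code()              # "# Code for: def greet(name):\n\nprint(greet('World'))"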
@@ -1,5 +1,5 @@
|
||||
"""Loader that loads documents from Psychic.dev."""
|
||||
from typing import List
|
||||
from typing import List, Optional
|
||||
|
||||
from langchain.docstore.document import Document
|
||||
from langchain.document_loaders.base import BaseLoader
|
||||
@@ -8,8 +8,10 @@ from langchain.document_loaders.base import BaseLoader
|
||||
class PsychicLoader(BaseLoader):
|
||||
"""Loader that loads documents from Psychic.dev."""
|
||||
|
||||
def __init__(self, api_key: str, connector_id: str, connection_id: str):
|
||||
"""Initialize with API key, connector id, and connection id."""
|
||||
def __init__(
|
||||
self, api_key: str, account_id: str, connector_id: Optional[str] = None
|
||||
):
|
||||
"""Initialize with API key, connector id, and account id."""
|
||||
|
||||
try:
|
||||
from psychicapi import ConnectorId, Psychic # noqa: F401
|
||||
@@ -19,16 +21,18 @@ class PsychicLoader(BaseLoader):
|
||||
)
|
||||
self.psychic = Psychic(secret_key=api_key)
|
||||
self.connector_id = ConnectorId(connector_id)
|
||||
self.connection_id = connection_id
|
||||
self.account_id = account_id
|
||||
|
||||
def load(self) -> List[Document]:
|
||||
"""Load documents."""
|
||||
|
||||
psychic_docs = self.psychic.get_documents(self.connector_id, self.connection_id)
|
||||
psychic_docs = self.psychic.get_documents(
|
||||
connector_id=self.connector_id, account_id=self.account_id
|
||||
)
|
||||
return [
|
||||
Document(
|
||||
page_content=doc["content"],
|
||||
metadata={"title": doc["title"], "source": doc["uri"]},
|
||||
)
|
||||
for doc in psychic_docs
|
||||
for doc in psychic_docs.documents
|
||||
]
|
||||
|
||||
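With the signature change above, usage looks roughly like the sketch below; account_id replaces the old connection_id argument, and the connector value is illustrative:

from langchain.document_loaders import PsychicLoader

loader = PsychicLoader(
    api_key="<psychic-secret-key>",
    account_id="<account-id>",
    connector_id="notion",  # illustrative; use whichever connector your account exposes
)
docs = loader.load()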
50
langchain/document_loaders/tencent_cos_directory.py
Normal file
@@ -0,0 +1,50 @@
|
||||
"""Loading logic for loading documents from Tencent Cloud COS directory."""
|
||||
from typing import Any, Iterator, List
|
||||
|
||||
from langchain.docstore.document import Document
|
||||
from langchain.document_loaders.base import BaseLoader
|
||||
from langchain.document_loaders.tencent_cos_file import TencentCOSFileLoader
|
||||
|
||||
|
||||
class TencentCOSDirectoryLoader(BaseLoader):
|
||||
"""Loading logic for loading documents from Tencent Cloud COS."""
|
||||
|
||||
def __init__(self, conf: Any, bucket: str, prefix: str = ""):
|
||||
"""Initialize with COS config, bucket and prefix.
|
||||
:param conf(CosConfig): COS config.
|
||||
:param bucket(str): COS bucket.
|
||||
:param prefix(str): prefix.
|
||||
"""
|
||||
self.conf = conf
|
||||
self.bucket = bucket
|
||||
self.prefix = prefix
|
||||
|
||||
def load(self) -> List[Document]:
|
||||
return list(self.lazy_load())
|
||||
|
||||
def lazy_load(self) -> Iterator[Document]:
|
||||
"""Load documents."""
|
||||
try:
|
||||
from qcloud_cos import CosS3Client
|
||||
except ImportError:
|
||||
raise ValueError(
|
||||
"Could not import cos-python-sdk-v5 python package. "
|
||||
"Please install it with `pip install cos-python-sdk-v5`."
|
||||
)
|
||||
client = CosS3Client(self.conf)
|
||||
contents = []
|
||||
marker = ""
|
||||
while True:
|
||||
response = client.list_objects(
|
||||
Bucket=self.bucket, Prefix=self.prefix, Marker=marker, MaxKeys=1000
|
||||
)
|
||||
if "Contents" in response:
|
||||
contents.extend(response["Contents"])
|
||||
if response["IsTruncated"] == "false":
|
||||
break
|
||||
marker = response["NextMarker"]
|
||||
for content in contents:
|
||||
if content["Key"].endswith("/"):
|
||||
continue
|
||||
loader = TencentCOSFileLoader(self.conf, self.bucket, content["Key"])
|
||||
yield loader.load()[0]
|
||||
48
langchain/document_loaders/tencent_cos_file.py
Normal file
@@ -0,0 +1,48 @@
|
||||
"""Loading logic for loading documents from Tencent Cloud COS file."""
|
||||
import os
|
||||
import tempfile
|
||||
from typing import Any, Iterator, List
|
||||
|
||||
from langchain.docstore.document import Document
|
||||
from langchain.document_loaders.base import BaseLoader
|
||||
from langchain.document_loaders.unstructured import UnstructuredFileLoader
|
||||
|
||||
|
||||
class TencentCOSFileLoader(BaseLoader):
|
||||
"""Loading logic for loading documents from Tencent Cloud COS."""
|
||||
|
||||
def __init__(self, conf: Any, bucket: str, key: str):
|
||||
"""Initialize with COS config, bucket and key name.
|
||||
:param conf(CosConfig): COS config.
|
||||
:param bucket(str): COS bucket.
|
||||
:param key(str): COS file key.
|
||||
"""
|
||||
self.conf = conf
|
||||
self.bucket = bucket
|
||||
self.key = key
|
||||
|
||||
def load(self) -> List[Document]:
|
||||
return list(self.lazy_load())
|
||||
|
||||
def lazy_load(self) -> Iterator[Document]:
|
||||
"""Load documents."""
|
||||
try:
|
||||
from qcloud_cos import CosS3Client
|
||||
except ImportError:
|
||||
raise ValueError(
|
||||
"Could not import cos-python-sdk-v5 python package. "
|
||||
"Please install it with `pip install cos-python-sdk-v5`."
|
||||
)
|
||||
|
||||
# Initialise a client
|
||||
client = CosS3Client(self.conf)
|
||||
with tempfile.TemporaryDirectory() as temp_dir:
|
||||
file_path = f"{temp_dir}/{self.bucket}/{self.key}"
|
||||
os.makedirs(os.path.dirname(file_path), exist_ok=True)
|
||||
# Download the file to a destination
|
||||
client.download_file(
|
||||
Bucket=self.bucket, Key=self.key, DestFilePath=file_path
|
||||
)
|
||||
loader = UnstructuredFileLoader(file_path)
|
||||
# UnstructuredFileLoader not implement lazy_load yet
|
||||
return iter(loader.load())
|
||||
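A hedged usage sketch for the two new COS loaders; the CosConfig arguments follow the cos-python-sdk-v5 convention, and the region, bucket, and credentials are placeholders:

from qcloud_cos import CosConfig
from langchain.document_loaders import TencentCOSDirectoryLoader

conf = CosConfig(
    Region="ap-guangzhou",  # placeholder region
    SecretId="<secret-id>",
    SecretKey="<secret-key>",
)
loader = TencentCOSDirectoryLoader(conf=conf, bucket="<bucket-appid>", prefix="docs/")
docs = loader.load()  # each non-directory key is parsed via UnstructuredFileLoader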
@@ -50,6 +50,9 @@ class WebBaseLoader(BaseLoader):
|
||||
requests_kwargs: Dict[str, Any] = {}
|
||||
"""kwargs for requests"""
|
||||
|
||||
raise_for_status: bool = False
|
||||
"""Raise an exception if http status code denotes an error."""
|
||||
|
||||
bs_get_text_kwargs: Dict[str, Any] = {}
|
||||
"""kwargs for beatifulsoup4 get_text"""
|
||||
|
||||
@@ -58,6 +61,7 @@ class WebBaseLoader(BaseLoader):
|
||||
web_path: Union[str, List[str]],
|
||||
header_template: Optional[dict] = None,
|
||||
verify: Optional[bool] = True,
|
||||
proxies: Optional[dict] = None,
|
||||
):
|
||||
"""Initialize with webpage path."""
|
||||
|
||||
@@ -94,6 +98,9 @@ class WebBaseLoader(BaseLoader):
|
||||
)
|
||||
self.session.headers = dict(headers)
|
||||
|
||||
if proxies:
|
||||
self.session.proxies.update(proxies)
|
||||
|
||||
@property
|
||||
def web_path(self) -> str:
|
||||
if len(self.web_paths) > 1:
|
||||
@@ -189,6 +196,8 @@ class WebBaseLoader(BaseLoader):
|
||||
self._check_parser(parser)
|
||||
|
||||
html_doc = self.session.get(url, verify=self.verify, **self.requests_kwargs)
|
||||
if self.raise_for_status:
|
||||
html_doc.raise_for_status()
|
||||
html_doc.encoding = html_doc.apparent_encoding
|
||||
return BeautifulSoup(html_doc.text, parser)
|
||||
|
||||
|
||||
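A short sketch of the two options added to WebBaseLoader above; the proxy URL is a placeholder:

from langchain.document_loaders import WebBaseLoader

loader = WebBaseLoader(
    "https://example.com",
    proxies={"https": "http://127.0.0.1:8888"},  # optional, new constructor argument
)
loader.raise_for_status = True  # new flag: fail loudly on 4xx/5xx instead of parsing an error page
docs = loader.load()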
@@ -49,13 +49,15 @@ class WhatsAppChatLoader(BaseLoader):
|
||||
\s
|
||||
(.+)
|
||||
"""
|
||||
ignore_lines = ["This message was deleted", "<Media omitted>"]
|
||||
for line in lines:
|
||||
result = re.match(
|
||||
message_line_regex, line.strip(), flags=re.VERBOSE | re.IGNORECASE
|
||||
)
|
||||
if result:
|
||||
date, sender, text = result.groups()
|
||||
text_content += concatenate_rows(date, sender, text)
|
||||
if text not in ignore_lines:
|
||||
text_content += concatenate_rows(date, sender, text)
|
||||
|
||||
metadata = {"source": str(p)}
|
||||
|
||||
|
||||
@@ -61,6 +61,29 @@ class GuardrailsOutputParser(BaseOutputParser):
|
||||
kwargs=kwargs,
|
||||
)
|
||||
|
||||
@classmethod
|
||||
def from_pydantic(
|
||||
cls,
|
||||
output_class: Any,
|
||||
num_reasks: int = 1,
|
||||
api: Optional[Callable] = None,
|
||||
*args: Any,
|
||||
**kwargs: Any,
|
||||
) -> GuardrailsOutputParser:
|
||||
try:
|
||||
from guardrails import Guard
|
||||
except ImportError:
|
||||
raise ValueError(
|
||||
"guardrails-ai package not installed. "
|
||||
"Install it by running `pip install guardrails-ai`."
|
||||
)
|
||||
return cls(
|
||||
guard=Guard.from_pydantic(output_class, "", num_reasks=num_reasks),
|
||||
api=api,
|
||||
args=args,
|
||||
kwargs=kwargs,
|
||||
)
|
||||
|
||||
def get_format_instructions(self) -> str:
|
||||
return self.guard.raw_prompt.format_instructions
|
||||
|
||||
|
||||
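A sketch of the new from_pydantic constructor; the Person model is illustrative, and the import path is assumed to be the parser's existing module:

from pydantic import BaseModel
from langchain.output_parsers.rail_parser import GuardrailsOutputParser  # assumed module path

class Person(BaseModel):  # illustrative schema
    name: str
    age: int

parser = GuardrailsOutputParser.from_pydantic(Person, num_reasks=1)
print(parser.get_format_instructions())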
@@ -14,6 +14,7 @@ from langchain.retrievers.llama_index import (
|
||||
from langchain.retrievers.merger_retriever import MergerRetriever
|
||||
from langchain.retrievers.metal import MetalRetriever
|
||||
from langchain.retrievers.milvus import MilvusRetriever
|
||||
from langchain.retrievers.multi_query import MultiQueryRetriever
|
||||
from langchain.retrievers.pinecone_hybrid_search import PineconeHybridSearchRetriever
|
||||
from langchain.retrievers.pupmed import PubMedRetriever
|
||||
from langchain.retrievers.remote_retriever import RemoteLangChainRetriever
|
||||
@@ -43,6 +44,7 @@ __all__ = [
|
||||
"MergerRetriever",
|
||||
"MetalRetriever",
|
||||
"MilvusRetriever",
|
||||
"MultiQueryRetriever",
|
||||
"PineconeHybridSearchRetriever",
|
||||
"PubMedRetriever",
|
||||
"RemoteLangChainRetriever",
|
||||
|
||||
@@ -32,16 +32,17 @@ class TextWithHighLights(BaseModel, extra=Extra.allow):
|
||||
Highlights: Optional[Any]
|
||||
|
||||
|
||||
class AdditionalResultAttributeValue(BaseModel, extra=Extra.allow):
|
||||
TextWithHighlightsValue: TextWithHighLights
|
||||
|
||||
|
||||
class AdditionalResultAttribute(BaseModel, extra=Extra.allow):
|
||||
Key: str
|
||||
ValueType: Literal["TEXT_WITH_HIGHLIGHTS_VALUE"]
|
||||
Value: Optional[TextWithHighLights]
|
||||
Value: AdditionalResultAttributeValue
|
||||
|
||||
def get_value_text(self) -> str:
|
||||
if not self.Value:
|
||||
return ""
|
||||
else:
|
||||
return self.Value.Text
|
||||
return self.Value.TextWithHighlightsValue.Text
|
||||
|
||||
|
||||
class QueryResultItem(BaseModel, extra=Extra.allow):
|
||||
|
||||
158
langchain/retrievers/multi_query.py
Normal file
@@ -0,0 +1,158 @@
|
||||
import logging
|
||||
from typing import List
|
||||
|
||||
from pydantic import BaseModel, Field
|
||||
|
||||
from langchain.chains.llm import LLMChain
|
||||
from langchain.llms.base import BaseLLM
|
||||
from langchain.output_parsers.pydantic import PydanticOutputParser
|
||||
from langchain.prompts.prompt import PromptTemplate
|
||||
from langchain.schema import BaseRetriever, Document
|
||||
|
||||
logging.basicConfig(level=logging.INFO)
|
||||
|
||||
|
||||
class LineList(BaseModel):
|
||||
lines: List[str] = Field(description="Lines of text")
|
||||
|
||||
|
||||
class LineListOutputParser(PydanticOutputParser):
|
||||
def __init__(self) -> None:
|
||||
super().__init__(pydantic_object=LineList)
|
||||
|
||||
def parse(self, text: str) -> LineList:
|
||||
lines = text.strip().split("\n")
|
||||
return LineList(lines=lines)
|
||||
|
||||
|
||||
# Default prompt
|
||||
DEFAULT_QUERY_PROMPT = PromptTemplate(
|
||||
input_variables=["question"],
|
||||
template="""You are an AI language model assistant. Your task is
|
||||
to generate 3 different versions of the given user
|
||||
question to retrieve relevant documents from a vector database.
|
||||
By generating multiple perspectives on the user question,
|
||||
your goal is to help the user overcome some of the limitations
|
||||
of distance-based similarity search. Provide these alternative
|
||||
questions separated by newlines. Original question: {question}""",
|
||||
)
|
||||
|
||||
|
||||
class MultiQueryRetriever(BaseRetriever):
|
||||
|
||||
"""Given a user query, use an LLM to write a set of queries.
|
||||
Retrieve docs for each query. Take the unique union of all retrieved docs."""
|
||||
|
||||
def __init__(
|
||||
self,
|
||||
retriever: BaseRetriever,
|
||||
llm_chain: LLMChain,
|
||||
verbose: bool = True,
|
||||
parser_key: str = "lines",
|
||||
) -> None:
|
||||
"""Initialize MultiQueryRetriever.
|
||||
|
||||
Args:
|
||||
retriever: retriever to query documents from
|
||||
llm_chain: llm_chain for query generation
|
||||
verbose: show the queries that we generated to the user
|
||||
parser_key: attribute name for the parsed output
|
||||
|
||||
Returns:
|
||||
MultiQueryRetriever
|
||||
"""
|
||||
self.retriever = retriever
|
||||
self.llm_chain = llm_chain
|
||||
self.verbose = verbose
|
||||
self.parser_key = parser_key
|
||||
|
||||
@classmethod
|
||||
def from_llm(
|
||||
cls,
|
||||
retriever: BaseRetriever,
|
||||
llm: BaseLLM,
|
||||
prompt: PromptTemplate = DEFAULT_QUERY_PROMPT,
|
||||
parser_key: str = "lines",
|
||||
) -> "MultiQueryRetriever":
|
||||
"""Initialize from llm using default template.
|
||||
|
||||
Args:
|
||||
retriever: retriever to query documents from
|
||||
llm: llm for query generation using DEFAULT_QUERY_PROMPT
|
||||
|
||||
Returns:
|
||||
MultiQueryRetriever
|
||||
"""
|
||||
output_parser = LineListOutputParser()
|
||||
llm_chain = LLMChain(llm=llm, prompt=prompt, output_parser=output_parser)
|
||||
return cls(
|
||||
retriever=retriever,
|
||||
llm_chain=llm_chain,
|
||||
parser_key=parser_key,
|
||||
)
|
||||
|
||||
def get_relevant_documents(self, question: str) -> List[Document]:
|
||||
"""Get relevated documents given a user query.
|
||||
|
||||
Args:
|
||||
question: user query
|
||||
|
||||
Returns:
|
||||
Unique union of relevant documents from all generated queries
|
||||
"""
|
||||
queries = self.generate_queries(question)
|
||||
documents = self.retrieve_documents(queries)
|
||||
unique_documents = self.unique_union(documents)
|
||||
return unique_documents
|
||||
|
||||
async def aget_relevant_documents(self, query: str) -> List[Document]:
|
||||
raise NotImplementedError
|
||||
|
||||
def generate_queries(self, question: str) -> List[str]:
|
||||
"""Generate queries based upon user input.
|
||||
|
||||
Args:
|
||||
question: user query
|
||||
|
||||
Returns:
|
||||
List of LLM generated queries that are similar to the user input
|
||||
"""
|
||||
response = self.llm_chain({"question": question})
|
||||
lines = getattr(response["text"], self.parser_key, [])
|
||||
if self.verbose:
|
||||
logging.info(f"Generated queries: {lines}")
|
||||
return lines
|
||||
|
||||
def retrieve_documents(self, queries: List[str]) -> List[Document]:
|
||||
"""Run all LLM generated queries.
|
||||
|
||||
Args:
|
||||
queries: query list
|
||||
|
||||
Returns:
|
||||
List of retrieved Documents
|
||||
"""
|
||||
documents = []
|
||||
for query in queries:
|
||||
docs = self.retriever.get_relevant_documents(query)
|
||||
documents.extend(docs)
|
||||
return documents
|
||||
|
||||
def unique_union(self, documents: List[Document]) -> List[Document]:
|
||||
"""Get uniqe Documents.
|
||||
|
||||
Args:
|
||||
documents: List of retrieved Documents
|
||||
|
||||
Returns:
|
||||
List of unique retrieved Documents
|
||||
"""
|
||||
# Create a dictionary with page_content as keys to remove duplicates
|
||||
# TODO: Add Document ID property (e.g., UUID)
|
||||
unique_documents_dict = {
|
||||
(doc.page_content, tuple(sorted(doc.metadata.items()))): doc
|
||||
for doc in documents
|
||||
}
|
||||
|
||||
unique_documents = list(unique_documents_dict.values())
|
||||
return unique_documents
|
||||
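A usage sketch for the new retriever; `vectorstore` is assumed to be any existing VectorStore instance:

from langchain.llms import OpenAI
from langchain.retrievers.multi_query import MultiQueryRetriever

retriever = MultiQueryRetriever.from_llm(
    retriever=vectorstore.as_retriever(),  # `vectorstore` assumed to exist (Chroma, FAISS, ...)
    llm=OpenAI(temperature=0),
)
docs = retriever.get_relevant_documents("How do agents decide which tool to call?")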
@@ -1,6 +1,6 @@
|
||||
"""## Zapier Natural Language Actions API
|
||||
\
|
||||
Full docs here: https://nla.zapier.com/api/v1/docs
|
||||
Full docs here: https://nla.zapier.com/start/
|
||||
|
||||
**Zapier Natural Language Actions** gives you access to the 5k+ apps, 20k+ actions
|
||||
on Zapier's platform through a natural language API interface.
|
||||
@@ -24,8 +24,8 @@ NLA offers both API Key and OAuth for signing NLA API requests.
|
||||
connected accounts on Zapier.com
|
||||
|
||||
This quick start will focus on the server-side use case for brevity.
|
||||
Review [full docs](https://nla.zapier.com/api/v1/docs) or reach out to
|
||||
nla@zapier.com for user-facing oauth developer support.
|
||||
Review [full docs](https://nla.zapier.com/start/) for user-facing oauth developer
|
||||
support.
|
||||
|
||||
Typically, you'd use SequentialChain, here's a basic example:
|
||||
|
||||
@@ -42,8 +42,7 @@ import os
|
||||
# get from https://platform.openai.com/
|
||||
os.environ["OPENAI_API_KEY"] = os.environ.get("OPENAI_API_KEY", "")
|
||||
|
||||
# get from https://nla.zapier.com/demo/provider/debug
|
||||
# (under User Information, after logging in):
|
||||
# get from https://nla.zapier.com/docs/authentication/
|
||||
os.environ["ZAPIER_NLA_API_KEY"] = os.environ.get("ZAPIER_NLA_API_KEY", "")
|
||||
|
||||
from langchain.llms import OpenAI
|
||||
@@ -61,8 +60,9 @@ from langchain.utilities.zapier import ZapierNLAWrapper
|
||||
|
||||
llm = OpenAI(temperature=0)
|
||||
zapier = ZapierNLAWrapper()
|
||||
## To leverage a nla_oauth_access_token you may pass the value to the ZapierNLAWrapper
|
||||
## If you do this there is no need to initialize the ZAPIER_NLA_API_KEY env variable
|
||||
## To leverage OAuth you may pass the value `nla_oauth_access_token` to
|
||||
## the ZapierNLAWrapper. If you do this there is no need to initialize
|
||||
## the ZAPIER_NLA_API_KEY env variable
|
||||
# zapier = ZapierNLAWrapper(zapier_nla_oauth_access_token="TOKEN_HERE")
|
||||
toolkit = ZapierToolkit.from_zapier_nla_wrapper(zapier)
|
||||
agent = initialize_agent(
|
||||
@@ -99,7 +99,7 @@ class ZapierNLARunAction(BaseTool):
|
||||
(eg. "get the latest email from Mike Knoop" for "Gmail: find email" action)
|
||||
params: a dict, optional. Any params provided will *override* AI guesses
|
||||
from `instructions` (see "understanding the AI guessing flow" here:
|
||||
https://nla.zapier.com/api/v1/docs)
|
||||
https://nla.zapier.com/docs/using-the-api#ai-guessing)
|
||||
|
||||
"""
|
||||
|
||||
@@ -142,11 +142,15 @@ class ZapierNLARunAction(BaseTool):
|
||||
|
||||
async def _arun(
|
||||
self,
|
||||
_: str,
|
||||
instructions: str,
|
||||
run_manager: Optional[AsyncCallbackManagerForToolRun] = None,
|
||||
) -> str:
|
||||
"""Use the Zapier NLA tool to return a list of all exposed user actions."""
|
||||
raise NotImplementedError("ZapierNLAListActions does not support async")
|
||||
return await self.api_wrapper.arun_as_str(
|
||||
self.action_id,
|
||||
instructions,
|
||||
self.params,
|
||||
)
|
||||
|
||||
|
||||
ZapierNLARunAction.__doc__ = (
|
||||
@@ -184,7 +188,7 @@ class ZapierNLAListActions(BaseTool):
|
||||
run_manager: Optional[AsyncCallbackManagerForToolRun] = None,
|
||||
) -> str:
|
||||
"""Use the Zapier NLA tool to return a list of all exposed user actions."""
|
||||
raise NotImplementedError("ZapierNLAListActions does not support async")
|
||||
return await self.api_wrapper.alist_as_str()
|
||||
|
||||
|
||||
ZapierNLAListActions.__doc__ = (
|
||||
|
||||
@@ -322,7 +322,7 @@ class SearxSearchWrapper(BaseModel):
|
||||
str: The result of the query.
|
||||
|
||||
Raises:
|
||||
ValueError: If an error occured with the query.
|
||||
ValueError: If an error occurred with the query.
|
||||
|
||||
|
||||
Example:
|
||||
|
||||
@@ -36,7 +36,7 @@ class SerpAPIWrapper(BaseModel):
|
||||
Example:
|
||||
.. code-block:: python
|
||||
|
||||
from langchain import SerpAPIWrapper
|
||||
from langchain.utilities import SerpAPIWrapper
|
||||
serpapi = SerpAPIWrapper()
|
||||
"""
|
||||
|
||||
|
||||
@@ -1,6 +1,6 @@
|
||||
"""Util that can interact with Zapier NLA.
|
||||
|
||||
Full docs here: https://nla.zapier.com/api/v1/docs
|
||||
Full docs here: https://nla.zapier.com/start/
|
||||
|
||||
Note: this wrapper currently only implemented the `api_key` auth method for testing
|
||||
and server-side production use cases (using the developer's connected accounts on
|
||||
@@ -12,8 +12,9 @@ to use oauth. Review the full docs above and reach out to nla@zapier.com for
|
||||
developer support.
|
||||
"""
|
||||
import json
|
||||
from typing import Dict, List, Optional
|
||||
from typing import Any, Dict, List, Optional
|
||||
|
||||
import aiohttp
|
||||
import requests
|
||||
from pydantic import BaseModel, Extra, root_validator
|
||||
from requests import Request, Session
|
||||
@@ -24,16 +25,20 @@ from langchain.utils import get_from_dict_or_env
|
||||
class ZapierNLAWrapper(BaseModel):
|
||||
"""Wrapper for Zapier NLA.
|
||||
|
||||
Full docs here: https://nla.zapier.com/api/v1/docs
|
||||
Full docs here: https://nla.zapier.com/start/
|
||||
|
||||
Note: this wrapper currently only implemented the `api_key` auth method for
|
||||
testingand server-side production use cases (using the developer's connected
|
||||
accounts on Zapier.com)
|
||||
This wrapper supports both API Key and OAuth Credential auth methods. API Key
|
||||
is the fastest way to get started using this wrapper.
|
||||
|
||||
Call this wrapper with either `zapier_nla_api_key` or
|
||||
`zapier_nla_oauth_access_token` arguments, or set the `ZAPIER_NLA_API_KEY`
|
||||
environment variable. If both arguments are set, the Access Token will take
|
||||
precedence.
|
||||
|
||||
For use-cases where LangChain + Zapier NLA is powering a user-facing application,
|
||||
and LangChain needs access to the end-user's connected accounts on Zapier.com,
|
||||
you'll need to use oauth. Review the full docs above and reach out to
|
||||
nla@zapier.com for developer support.
|
||||
you'll need to use OAuth. Review the full docs above to learn how to create
|
||||
your own provider and generate credentials.
|
||||
"""
|
||||
|
||||
zapier_nla_api_key: str
|
||||
@@ -45,36 +50,63 @@ class ZapierNLAWrapper(BaseModel):
|
||||
|
||||
extra = Extra.forbid
|
||||
|
||||
def _get_session(self) -> Session:
|
||||
session = requests.Session()
|
||||
session.headers.update(
|
||||
{
|
||||
"Accept": "application/json",
|
||||
"Content-Type": "application/json",
|
||||
}
|
||||
)
|
||||
def _format_headers(self) -> Dict[str, str]:
|
||||
"""Format headers for requests."""
|
||||
headers = {
|
||||
"Accept": "application/json",
|
||||
"Content-Type": "application/json",
|
||||
}
|
||||
|
||||
if self.zapier_nla_oauth_access_token:
|
||||
session.headers.update(
|
||||
headers.update(
|
||||
{"Authorization": f"Bearer {self.zapier_nla_oauth_access_token}"}
|
||||
)
|
||||
else:
|
||||
session.params = {"api_key": self.zapier_nla_api_key}
|
||||
headers.update({"X-API-Key": self.zapier_nla_api_key})
|
||||
|
||||
return headers
|
||||
|
||||
def _get_session(self) -> Session:
|
||||
session = requests.Session()
|
||||
session.headers.update(self._format_headers())
|
||||
return session
|
||||
|
||||
def _get_action_request(
|
||||
self, action_id: str, instructions: str, params: Optional[Dict] = None
|
||||
) -> Request:
|
||||
async def _arequest(self, method: str, url: str, **kwargs: Any) -> Dict[str, Any]:
|
||||
"""Make an async request."""
|
||||
async with aiohttp.ClientSession(headers=self._format_headers()) as session:
|
||||
async with session.request(method, url, **kwargs) as response:
|
||||
response.raise_for_status()
|
||||
return await response.json()
|
||||
|
||||
def _create_action_payload( # type: ignore[no-untyped-def]
|
||||
self, instructions: str, params: Optional[Dict] = None, preview_only=False
|
||||
) -> Dict:
|
||||
"""Create a payload for an action."""
|
||||
data = params if params else {}
|
||||
data.update(
|
||||
{
|
||||
"instructions": instructions,
|
||||
}
|
||||
)
|
||||
if preview_only:
|
||||
data.update({"preview_only": True})
|
||||
return data
|
||||
|
||||
def _create_action_url(self, action_id: str) -> str:
|
||||
"""Create a url for an action."""
|
||||
return self.zapier_nla_api_base + f"exposed/{action_id}/execute/"
|
||||
|
||||
def _create_action_request( # type: ignore[no-untyped-def]
|
||||
self,
|
||||
action_id: str,
|
||||
instructions: str,
|
||||
params: Optional[Dict] = None,
|
||||
preview_only=False,
|
||||
) -> Request:
|
||||
data = self._create_action_payload(instructions, params, preview_only)
|
||||
return Request(
|
||||
"POST",
|
||||
self.zapier_nla_api_base + f"exposed/{action_id}/execute/",
|
||||
self._create_action_url(action_id),
|
||||
json=data,
|
||||
)
|
||||
|
||||
@@ -103,7 +135,7 @@ class ZapierNLAWrapper(BaseModel):
|
||||
|
||||
return values
|
||||
|
||||
def list(self) -> List[Dict]:
|
||||
async def alist(self) -> List[Dict]:
|
||||
"""Returns a list of all exposed (enabled) actions associated with
|
||||
current user (associated with the set api_key). Change your exposed
|
||||
actions here: https://nla.zapier.com/demo/start/
|
||||
@@ -122,9 +154,45 @@ class ZapierNLAWrapper(BaseModel):
|
||||
(see "understanding the AI guessing flow" here:
|
||||
https://nla.zapier.com/docs/using-the-api#ai-guessing)
|
||||
"""
|
||||
response = await self._arequest("GET", self.zapier_nla_api_base + "exposed/")
|
||||
return response["results"]
|
||||
|
||||
def list(self) -> List[Dict]:
|
||||
"""Returns a list of all exposed (enabled) actions associated with
|
||||
current user (associated with the set api_key). Change your exposed
|
||||
actions here: https://nla.zapier.com/demo/start/
|
||||
|
||||
The return list can be empty if no actions exposed. Else will contain
|
||||
a list of action objects:
|
||||
|
||||
[{
|
||||
"id": str,
|
||||
"description": str,
|
||||
"params": Dict[str, str]
|
||||
}]
|
||||
|
||||
`params` will always contain an `instructions` key, the only required
|
||||
param. All others optional and if provided will override any AI guesses
|
||||
(see "understanding the AI guessing flow" here:
|
||||
https://nla.zapier.com/docs/using-the-api#ai-guessing)
|
||||
"""
|
||||
session = self._get_session()
|
||||
response = session.get(self.zapier_nla_api_base + "exposed/")
|
||||
response.raise_for_status()
|
||||
try:
|
||||
response = session.get(self.zapier_nla_api_base + "exposed/")
|
||||
response.raise_for_status()
|
||||
except requests.HTTPError as http_err:
|
||||
if response.status_code == 401:
|
||||
if self.zapier_nla_oauth_access_token:
|
||||
raise requests.HTTPError(
|
||||
f"An unauthorized response occurred. Check that your "
|
||||
f"access token is correct and doesn't need to be "
|
||||
f"refreshed. Err: {http_err}"
|
||||
)
|
||||
raise requests.HTTPError(
|
||||
f"An unauthorized response occurred. Check that your api "
|
||||
f"key is correct. Err: {http_err}"
|
||||
)
|
||||
raise http_err
|
||||
return response.json()["results"]
|
||||
|
||||
def run(
|
||||
@@ -139,11 +207,29 @@ class ZapierNLAWrapper(BaseModel):
|
||||
call.
|
||||
"""
|
||||
session = self._get_session()
|
||||
request = self._get_action_request(action_id, instructions, params)
|
||||
request = self._create_action_request(action_id, instructions, params)
|
||||
response = session.send(session.prepare_request(request))
|
||||
response.raise_for_status()
|
||||
return response.json()["result"]
|
||||
|
||||
async def arun(
|
||||
self, action_id: str, instructions: str, params: Optional[Dict] = None
|
||||
) -> Dict:
|
||||
"""Executes an action that is identified by action_id, must be exposed
|
||||
(enabled) by the current user (associated with the set api_key). Change
|
||||
your exposed actions here: https://nla.zapier.com/demo/start/
|
||||
|
||||
The return JSON is guaranteed to be less than ~500 words (350
|
||||
tokens) making it safe to inject into the prompt of another LLM
|
||||
call.
|
||||
"""
|
||||
response = await self._arequest(
|
||||
"POST",
|
||||
self._create_action_url(action_id),
|
||||
json=self._create_action_payload(instructions, params),
|
||||
)
|
||||
return response["result"]
|
||||
|
||||
def preview(
|
||||
self, action_id: str, instructions: str, params: Optional[Dict] = None
|
||||
) -> Dict:
|
||||
@@ -153,25 +239,58 @@ class ZapierNLAWrapper(BaseModel):
|
||||
session = self._get_session()
|
||||
params = params if params else {}
|
||||
params.update({"preview_only": True})
|
||||
request = self._get_action_request(action_id, instructions, params)
|
||||
request = self._create_action_request(action_id, instructions, params, True)
|
||||
response = session.send(session.prepare_request(request))
|
||||
response.raise_for_status()
|
||||
return response.json()["input_params"]
|
||||
|
||||
async def apreview(
|
||||
self, action_id: str, instructions: str, params: Optional[Dict] = None
|
||||
) -> Dict:
|
||||
"""Same as run, but instead of actually executing the action, will
|
||||
instead return a preview of params that have been guessed by the AI in
|
||||
case you need to explicitly review before executing."""
|
||||
response = await self._arequest(
|
||||
"POST",
|
||||
self._create_action_url(action_id),
|
||||
json=self._create_action_payload(instructions, params, preview_only=True),
|
||||
)
|
||||
return response["result"]
|
||||
|
||||
def run_as_str(self, *args, **kwargs) -> str: # type: ignore[no-untyped-def]
|
||||
"""Same as run, but returns a stringified version of the JSON for
|
||||
inserting back into an LLM."""
|
||||
data = self.run(*args, **kwargs)
|
||||
return json.dumps(data)
|
||||
|
||||
async def arun_as_str(self, *args, **kwargs) -> str: # type: ignore[no-untyped-def]
|
||||
"""Same as run, but returns a stringified version of the JSON for
|
||||
inserting back into an LLM."""
|
||||
data = await self.arun(*args, **kwargs)
|
||||
return json.dumps(data)
|
||||
|
||||
def preview_as_str(self, *args, **kwargs) -> str: # type: ignore[no-untyped-def]
|
||||
"""Same as preview, but returns a stringified version of the JSON for
|
||||
inserting back into an LLM."""
|
||||
data = self.preview(*args, **kwargs)
|
||||
return json.dumps(data)
|
||||
|
||||
async def apreview_as_str( # type: ignore[no-untyped-def]
|
||||
self, *args, **kwargs
|
||||
) -> str:
|
||||
"""Same as preview, but returns a stringified version of the JSON for
|
||||
inserting back into an LLM."""
|
||||
data = await self.apreview(*args, **kwargs)
|
||||
return json.dumps(data)
|
||||
|
||||
def list_as_str(self) -> str: # type: ignore[no-untyped-def]
|
||||
"""Same as list, but returns a stringified version of the JSON for
|
||||
inserting back into an LLM."""
|
||||
actions = self.list()
|
||||
return json.dumps(actions)
|
||||
|
||||
async def alist_as_str(self) -> str: # type: ignore[no-untyped-def]
|
||||
"""Same as list, but returns a stringified version of the JSON for
|
||||
inserting back into an LLM."""
|
||||
actions = await self.alist()
|
||||
return json.dumps(actions)
|
||||
|
||||
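The async variants added above can be used as in this minimal sketch; ZAPIER_NLA_API_KEY is read from the environment:

import asyncio
from langchain.utilities.zapier import ZapierNLAWrapper

async def main() -> None:
    zapier = ZapierNLAWrapper()      # picks up ZAPIER_NLA_API_KEY from the environment
    actions = await zapier.alist()   # async counterpart of .list()
    if actions:
        result = await zapier.arun_as_str(actions[0]["id"], "say hello")
        print(result)

asyncio.run(main())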
@@ -354,15 +354,16 @@ class Pinecone(VectorStore):
|
||||
pinecone.Index(index_name), embedding.embed_query, text_key, namespace
|
||||
)
|
||||
|
||||
def delete(self, ids: List[str]) -> None:
|
||||
def delete(self, ids: List[str], namespace: Optional[str] = None) -> None:
|
||||
"""Delete by vector IDs.
|
||||
|
||||
Args:
|
||||
ids: List of ids to delete.
|
||||
"""
|
||||
|
||||
# This is the maximum number of IDs that can be deleted
|
||||
if namespace is None:
|
||||
namespace = self._namespace
|
||||
chunk_size = 1000
|
||||
for i in range(0, len(ids), chunk_size):
|
||||
chunk = ids[i : i + chunk_size]
|
||||
self._index.delete(ids=chunk)
|
||||
self._index.delete(ids=chunk, namespace=namespace)
|
||||
|
||||
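With the change above, namespace-scoped deletes look like the sketch below; the index name, ids, and namespaces are placeholders, and pinecone.init is assumed to have been called:

from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.vectorstores import Pinecone

docsearch = Pinecone.from_existing_index("my-index", OpenAIEmbeddings(), namespace="ns-a")
docsearch.delete(ids=["vec-1", "vec-2"])           # deletes within the store's own namespace ("ns-a")
docsearch.delete(ids=["vec-3"], namespace="ns-b")  # new: override the namespace per call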
566
poetry.lock
generated
File diff suppressed because it is too large
@@ -1,6 +1,6 @@
|
||||
[tool.poetry]
|
||||
name = "langchain"
|
||||
version = "0.0.216"
|
||||
version = "0.0.218"
|
||||
description = "Building applications with LLMs through composability"
|
||||
authors = []
|
||||
license = "MIT"
|
||||
@@ -88,7 +88,6 @@ gql = {version = "^3.4.1", optional = true}
|
||||
pandas = {version = "^2.0.1", optional = true}
|
||||
telethon = {version = "^1.28.5", optional = true}
|
||||
neo4j = {version = "^5.8.1", optional = true}
|
||||
psychicapi = {version = "^0.5", optional = true}
|
||||
zep-python = {version=">=0.31", optional=true}
|
||||
langkit = {version = ">=0.0.1.dev3, <0.1.0", optional = true}
|
||||
chardet = {version="^5.1.0", optional=true}
|
||||
@@ -109,8 +108,10 @@ nebula3-python = {version = "^3.4.0", optional = true}
|
||||
langchainplus-sdk = ">=0.0.17"
|
||||
awadb = {version = "^0.3.3", optional = true}
|
||||
azure-search-documents = {version = "11.4.0a20230509004", source = "azure-sdk-dev", optional = true}
|
||||
esprima = {version = "^4.0.1", optional = true}
|
||||
openllm = {version = ">=0.1.6", optional = true}
|
||||
streamlit = {version = "^1.18.0", optional = true, python = ">=3.8.1,<3.9.7 || >3.9.7,<4.0"}
|
||||
psychicapi = {version = "^0.8.0", optional = true}
|
||||
|
||||
[tool.poetry.group.docs.dependencies]
|
||||
autodoc_pydantic = "^1.8.0"
|
||||
@@ -222,6 +223,7 @@ clarifai = ["clarifai"]
|
||||
cohere = ["cohere"]
|
||||
docarray = ["docarray"]
|
||||
embeddings = ["sentence-transformers"]
|
||||
javascript = ["esprima"]
|
||||
azure = [
|
||||
"azure-identity",
|
||||
"azure-cosmos",
|
||||
@@ -303,6 +305,7 @@ all = [
|
||||
"tigrisdb",
|
||||
"nebula3-python",
|
||||
"awadb",
|
||||
"esprima",
|
||||
]
|
||||
|
||||
# An extra used to be able to add extended testing.
|
||||
@@ -312,6 +315,7 @@ extended_testing = [
|
||||
"beautifulsoup4",
|
||||
"bibtexparser",
|
||||
"chardet",
|
||||
"esprima",
|
||||
"jq",
|
||||
"pdfminer.six",
|
||||
"pgvector",
|
||||
@@ -354,7 +358,7 @@ exclude = [
|
||||
[tool.mypy]
|
||||
ignore_missing_imports = "True"
|
||||
disallow_untyped_defs = "True"
|
||||
exclude = ["notebooks"]
|
||||
exclude = ["notebooks", "examples", "example_data"]
|
||||
|
||||
[tool.coverage.run]
|
||||
omit = [
|
||||
|
||||
@@ -0,0 +1,25 @@
|
||||
import os
|
||||
from pathlib import Path
|
||||
|
||||
from langchain.chains.openai_functions.openapi import get_openapi_chain
|
||||
|
||||
|
||||
def test_openai_opeanapi() -> None:
|
||||
chain = get_openapi_chain(
|
||||
"https://www.klarna.com/us/shopping/public/openai/v0/api-docs/"
|
||||
)
|
||||
output = chain.run("What are some options for a men's large blue button down shirt")
|
||||
|
||||
assert isinstance(output, dict)
|
||||
|
||||
|
||||
def test_openai_opeanapi_headers() -> None:
|
||||
BRANDFETCH_API_KEY = os.environ.get("BRANDFETCH_API_KEY")
|
||||
headers = {"Authorization": f"Bearer {BRANDFETCH_API_KEY}"}
|
||||
file_path = str(
|
||||
Path(__file__).parents[2] / "examples/brandfetch-brandfetch-2.0.0-resolved.json"
|
||||
)
|
||||
chain = get_openapi_chain(file_path, headers=headers)
|
||||
output = chain.run("I want to know about nike.comgg")
|
||||
|
||||
assert isinstance(output, str)
|
||||
@@ -0,0 +1,133 @@
|
||||
from pathlib import Path
|
||||
|
||||
import pytest
|
||||
|
||||
from langchain.document_loaders.generic import GenericLoader
|
||||
from langchain.document_loaders.parsers import LanguageParser
|
||||
from langchain.text_splitter import Language
|
||||
|
||||
|
||||
def test_language_loader_for_python() -> None:
|
||||
"""Test Python loader with parser enabled."""
|
||||
file_path = Path(__file__).parent.parent.parent / "examples"
|
||||
loader = GenericLoader.from_filesystem(
|
||||
file_path, glob="hello_world.py", parser=LanguageParser(parser_threshold=5)
|
||||
)
|
||||
docs = loader.load()
|
||||
|
||||
assert len(docs) == 2
|
||||
|
||||
metadata = docs[0].metadata
|
||||
assert metadata["source"] == str(file_path / "hello_world.py")
|
||||
assert metadata["content_type"] == "functions_classes"
|
||||
assert metadata["language"] == "python"
|
||||
metadata = docs[1].metadata
|
||||
assert metadata["source"] == str(file_path / "hello_world.py")
|
||||
assert metadata["content_type"] == "simplified_code"
|
||||
assert metadata["language"] == "python"
|
||||
|
||||
assert (
|
||||
docs[0].page_content
|
||||
== """def main():
|
||||
print("Hello World!")
|
||||
|
||||
return 0"""
|
||||
)
|
||||
assert (
|
||||
docs[1].page_content
|
||||
== """#!/usr/bin/env python3
|
||||
|
||||
import sys
|
||||
|
||||
|
||||
# Code for: def main():
|
||||
|
||||
|
||||
if __name__ == "__main__":
|
||||
sys.exit(main())"""
|
||||
)
|
||||
|
||||
|
||||
def test_language_loader_for_python_with_parser_threshold() -> None:
|
||||
"""Test Python loader with parser enabled and below threshold."""
|
||||
file_path = Path(__file__).parent.parent.parent / "examples"
|
||||
loader = GenericLoader.from_filesystem(
|
||||
file_path,
|
||||
glob="hello_world.py",
|
||||
parser=LanguageParser(language=Language.PYTHON, parser_threshold=1000),
|
||||
)
|
||||
docs = loader.load()
|
||||
|
||||
assert len(docs) == 1
|
||||
|
||||
|
||||
def esprima_installed() -> bool:
|
||||
try:
|
||||
import esprima # noqa: F401
|
||||
|
||||
return True
|
||||
except Exception as e:
|
||||
print(f"esprima not installed, skipping test {e}")
|
||||
return False
|
||||
|
||||
|
||||
@pytest.mark.skipif(not esprima_installed(), reason="requires esprima package")
|
||||
def test_language_loader_for_javascript() -> None:
|
||||
"""Test JavaScript loader with parser enabled."""
|
||||
file_path = Path(__file__).parent.parent.parent / "examples"
|
||||
loader = GenericLoader.from_filesystem(
|
||||
file_path, glob="hello_world.js", parser=LanguageParser(parser_threshold=5)
|
||||
)
|
||||
docs = loader.load()
|
||||
|
||||
assert len(docs) == 3
|
||||
|
||||
metadata = docs[0].metadata
|
||||
assert metadata["source"] == str(file_path / "hello_world.js")
|
||||
assert metadata["content_type"] == "functions_classes"
|
||||
assert metadata["language"] == "js"
|
||||
metadata = docs[1].metadata
|
||||
assert metadata["source"] == str(file_path / "hello_world.js")
|
||||
assert metadata["content_type"] == "functions_classes"
|
||||
assert metadata["language"] == "js"
|
||||
metadata = docs[2].metadata
|
||||
assert metadata["source"] == str(file_path / "hello_world.js")
|
||||
assert metadata["content_type"] == "simplified_code"
|
||||
assert metadata["language"] == "js"
|
||||
|
||||
assert (
|
||||
docs[0].page_content
|
||||
== """class HelloWorld {
|
||||
sayHello() {
|
||||
console.log("Hello World!");
|
||||
}
|
||||
}"""
|
||||
)
|
||||
assert (
|
||||
docs[1].page_content
|
||||
== """function main() {
|
||||
const hello = new HelloWorld();
|
||||
hello.sayHello();
|
||||
}"""
|
||||
)
|
||||
assert (
|
||||
docs[2].page_content
|
||||
== """// Code for: class HelloWorld {
|
||||
|
||||
// Code for: function main() {
|
||||
|
||||
main();"""
|
||||
)
|
||||
|
||||
|
||||
def test_language_loader_for_javascript_with_parser_threshold() -> None:
|
||||
"""Test JavaScript loader with parser enabled and below threshold."""
|
||||
file_path = Path(__file__).parent.parent.parent / "examples"
|
||||
loader = GenericLoader.from_filesystem(
|
||||
file_path,
|
||||
glob="hello_world.js",
|
||||
parser=LanguageParser(language=Language.JS, parser_threshold=1000),
|
||||
)
|
||||
docs = loader.load()
|
||||
|
||||
assert len(docs) == 1
|
||||
14
tests/integration_tests/document_loaders/test_larksuite.py
Normal file
@@ -0,0 +1,14 @@
|
||||
from langchain.document_loaders.larksuite import LarkSuiteDocLoader
|
||||
|
||||
DOMAIN = ""
|
||||
ACCESS_TOKEN = ""
|
||||
DOCUMENT_ID = ""
|
||||
|
||||
|
||||
def test_larksuite_doc_loader() -> None:
|
||||
"""Test LarkSuite (FeiShu) document loader."""
|
||||
loader = LarkSuiteDocLoader(DOMAIN, ACCESS_TOKEN, DOCUMENT_ID)
|
||||
docs = loader.load()
|
||||
|
||||
assert len(docs) == 1
|
||||
assert docs[0].page_content is not None
|
||||
15
tests/integration_tests/document_loaders/test_org_mode.py
Normal file
@@ -0,0 +1,15 @@
|
||||
import os
|
||||
from pathlib import Path
|
||||
|
||||
from langchain.document_loaders import UnstructuredOrgModeLoader
|
||||
|
||||
EXAMPLE_DIRECTORY = file_path = Path(__file__).parent.parent / "examples"
|
||||
|
||||
|
||||
def test_unstructured_org_mode_loader() -> None:
|
||||
"""Test unstructured loader."""
|
||||
file_path = os.path.join(EXAMPLE_DIRECTORY, "README.org")
|
||||
loader = UnstructuredOrgModeLoader(str(file_path))
|
||||
docs = loader.load()
|
||||
|
||||
assert len(docs) == 1
|
||||
27
tests/integration_tests/examples/README.org
Normal file
@@ -0,0 +1,27 @@
|
||||
* Example Docs
|
||||
|
||||
The sample docs directory contains the following files:
|
||||
|
||||
- ~example-10k.html~ - A 10-K SEC filing in HTML format
|
||||
- ~layout-parser-paper.pdf~ - A PDF copy of the layout parser paper
|
||||
- ~factbook.xml~ / ~factbook.xsl~ - Example XML/XSL files that you
|
||||
can use to test stylesheets
|
||||
|
||||
These documents can be used to test out the parsers in the library. In
|
||||
addition, here are instructions for pulling in some sample docs that are
|
||||
too big to store in the repo.
|
||||
|
||||
** XBRL 10-K
|
||||
|
||||
You can get an example 10-K in inline XBRL format using the following
|
||||
~curl~. Note, you need to have the user agent set in the header or the
|
||||
SEC site will reject your request.
|
||||
|
||||
#+BEGIN_SRC bash
|
||||
|
||||
curl -O \
|
||||
-A '${organization} ${email}'
|
||||
https://www.sec.gov/Archives/edgar/data/311094/000117184321001344/0001171843-21-001344.txt
|
||||
#+END_SRC
|
||||
|
||||
You can parse this document using the HTML parser.
|
||||
@@ -0,0 +1,282 @@
|
||||
{
|
||||
"openapi": "3.0.1",
|
||||
"info": {
|
||||
"title": "Brandfetch API",
|
||||
"description": "Brandfetch API (v2) for retrieving brand information.\n\nSee our [documentation](https://docs.brandfetch.com/) for further details. ",
|
||||
"termsOfService": "https://brandfetch.com/terms",
|
||||
"contact": {
|
||||
"url": "https://brandfetch.com/developers"
|
||||
},
|
||||
"version": "2.0.0"
|
||||
},
|
||||
"externalDocs": {
|
||||
"description": "Documentation",
|
||||
"url": "https://docs.brandfetch.com/"
|
||||
},
|
||||
"servers": [
|
||||
{
|
||||
"url": "https://api.brandfetch.io/v2"
|
||||
}
|
||||
],
|
||||
"paths": {
|
||||
"/brands/{domainOrId}": {
|
||||
"get": {
|
||||
"summary": "Retrieve a brand",
|
||||
"description": "Fetch brand information by domain or ID\n\nFurther details here: https://docs.brandfetch.com/reference/retrieve-brand\n",
|
||||
"parameters": [
|
||||
{
|
||||
"name": "domainOrId",
|
||||
"in": "path",
|
||||
"description": "Domain or ID of the brand",
|
||||
"required": true,
|
||||
"style": "simple",
|
||||
"explode": false,
|
||||
"schema": {
|
||||
"type": "string"
|
||||
}
|
||||
}
|
||||
],
|
||||
"responses": {
|
||||
"200": {
|
||||
"description": "Brand data",
|
||||
"content": {
|
||||
"application/json": {
|
||||
"schema": {
|
||||
"$ref": "#/components/schemas/Brand"
|
||||
},
|
||||
"examples": {
|
||||
"brandfetch.com": {
|
||||
"value": "{\"name\":\"Brandfetch\",\"domain\":\"brandfetch.com\",\"claimed\":true,\"description\":\"All brands. In one place\",\"links\":[{\"name\":\"twitter\",\"url\":\"https://twitter.com/brandfetch\"},{\"name\":\"linkedin\",\"url\":\"https://linkedin.com/company/brandfetch\"}],\"logos\":[{\"type\":\"logo\",\"theme\":\"light\",\"formats\":[{\"src\":\"https://asset.brandfetch.io/idL0iThUh6/id9WE9j86h.svg\",\"background\":\"transparent\",\"format\":\"svg\",\"size\":15555}]},{\"type\":\"logo\",\"theme\":\"dark\",\"formats\":[{\"src\":\"https://asset.brandfetch.io/idL0iThUh6/idWbsK1VCy.png\",\"background\":\"transparent\",\"format\":\"png\",\"height\":215,\"width\":800,\"size\":33937},{\"src\":\"https://asset.brandfetch.io/idL0iThUh6/idtCMfbWO0.svg\",\"background\":\"transparent\",\"format\":\"svg\",\"height\":null,\"width\":null,\"size\":15567}]},{\"type\":\"symbol\",\"theme\":\"light\",\"formats\":[{\"src\":\"https://asset.brandfetch.io/idL0iThUh6/idXGq6SIu2.svg\",\"background\":\"transparent\",\"format\":\"svg\",\"size\":2215}]},{\"type\":\"symbol\",\"theme\":\"dark\",\"formats\":[{\"src\":\"https://asset.brandfetch.io/idL0iThUh6/iddCQ52AR5.svg\",\"background\":\"transparent\",\"format\":\"svg\",\"size\":2215}]},{\"type\":\"icon\",\"theme\":\"dark\",\"formats\":[{\"src\":\"https://asset.brandfetch.io/idL0iThUh6/idls3LaPPQ.png\",\"background\":null,\"format\":\"png\",\"height\":400,\"width\":400,\"size\":2565}]}],\"colors\":[{\"hex\":\"#0084ff\",\"type\":\"accent\",\"brightness\":113},{\"hex\":\"#00193E\",\"type\":\"brand\",\"brightness\":22},{\"hex\":\"#F03063\",\"type\":\"brand\",\"brightness\":93},{\"hex\":\"#7B0095\",\"type\":\"brand\",\"brightness\":37},{\"hex\":\"#76CC4B\",\"type\":\"brand\",\"brightness\":176},{\"hex\":\"#FFDA00\",\"type\":\"brand\",\"brightness\":210},{\"hex\":\"#000000\",\"type\":\"dark\",\"brightness\":0},{\"hex\":\"#ffffff\",\"type\":\"light\",\"brightness\":255}],\"fonts\":[{\"name\":\"Poppins\",\"type\":\"title\",\"origin\":\"google\",\"originId\":\"Poppins\",\"weights\":[]},{\"name\":\"Inter\",\"type\":\"body\",\"origin\":\"google\",\"originId\":\"Inter\",\"weights\":[]}],\"images\":[{\"type\":\"banner\",\"formats\":[{\"src\":\"https://asset.brandfetch.io/idL0iThUh6/idUuia5imo.png\",\"background\":\"transparent\",\"format\":\"png\",\"height\":500,\"width\":1500,\"size\":5539}]}]}"
                  }
                }
              }
            }
          },
          "400": {
            "description": "Invalid domain or ID supplied"
          },
          "404": {
            "description": "The brand does not exist or the domain can't be resolved."
          }
        },
        "security": [
          { "bearerAuth": [] }
        ]
      }
    }
  },
  "components": {
    "schemas": {
      "Brand": {
        "required": ["claimed", "colors", "description", "domain", "fonts", "images", "links", "logos", "name"],
        "type": "object",
        "properties": {
          "images": {
            "type": "array",
            "items": { "$ref": "#/components/schemas/ImageAsset" }
          },
          "fonts": {
            "type": "array",
            "items": { "$ref": "#/components/schemas/FontAsset" }
          },
          "domain": { "type": "string" },
          "claimed": { "type": "boolean" },
          "name": { "type": "string" },
          "description": { "type": "string" },
          "links": {
            "type": "array",
            "items": { "$ref": "#/components/schemas/Brand_links" }
          },
          "logos": {
            "type": "array",
            "items": { "$ref": "#/components/schemas/ImageAsset" }
          },
          "colors": {
            "type": "array",
            "items": { "$ref": "#/components/schemas/ColorAsset" }
          }
        },
        "description": "Object representing a brand"
      },
      "ColorAsset": {
        "required": ["brightness", "hex", "type"],
        "type": "object",
        "properties": {
          "brightness": { "type": "integer" },
          "hex": { "type": "string" },
          "type": {
            "type": "string",
            "enum": ["accent", "brand", "customizable", "dark", "light", "vibrant"]
          }
        },
        "description": "Brand color asset"
      },
      "FontAsset": {
        "type": "object",
        "properties": {
          "originId": { "type": "string" },
          "origin": {
            "type": "string",
            "enum": ["adobe", "custom", "google", "system"]
          },
          "name": { "type": "string" },
          "type": { "type": "string" },
          "weights": {
            "type": "array",
            "items": { "type": "number" }
          },
          "items": { "type": "string" }
        },
        "description": "Brand font asset"
      },
      "ImageAsset": {
        "required": ["formats", "theme", "type"],
        "type": "object",
        "properties": {
          "formats": {
            "type": "array",
            "items": { "$ref": "#/components/schemas/ImageFormat" }
          },
          "theme": {
            "type": "string",
            "enum": ["light", "dark"]
          },
          "type": {
            "type": "string",
            "enum": ["logo", "icon", "symbol", "banner"]
          }
        },
        "description": "Brand image asset"
      },
      "ImageFormat": {
        "required": ["background", "format", "size", "src"],
        "type": "object",
        "properties": {
          "size": { "type": "integer" },
          "src": { "type": "string" },
          "background": {
            "type": "string",
            "enum": ["transparent"]
          },
          "format": { "type": "string" },
          "width": { "type": "integer" },
          "height": { "type": "integer" }
        },
        "description": "Brand image asset image format"
      },
      "Brand_links": {
        "required": ["name", "url"],
        "type": "object",
        "properties": {
          "name": { "type": "string" },
          "url": { "type": "string" }
        }
      }
    },
    "securitySchemes": {
      "bearerAuth": {
        "type": "http",
        "scheme": "bearer",
        "bearerFormat": "API Key"
      }
    }
  }
}
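The spec above is small enough to exercise by hand. Below is a minimal sketch of the request it describes: the server URL, the `/brands/{domainOrId}` path, and the bearer scheme come straight from the spec, while the API key and the target domain are placeholders you would substitute yourself.

```python
import os

import requests

# Placeholders: supply your own Brandfetch API key and a domain to look up.
API_KEY = os.environ["BRANDFETCH_API_KEY"]
DOMAIN = "brandfetch.com"

# GET /brands/{domainOrId} on the v2 server, authenticated via the
# bearerAuth security scheme declared in the spec.
response = requests.get(
    f"https://api.brandfetch.io/v2/brands/{DOMAIN}",
    headers={"Authorization": f"Bearer {API_KEY}", "Accept": "application/json"},
    timeout=10,
)
response.raise_for_status()

brand = response.json()
# The Brand schema requires name, domain, logos, colors, fonts, links, and images.
print(brand["name"], brand["domain"], len(brand["logos"]))
```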
tests/integration_tests/examples/hello_world.js (new file, 12 lines)
@@ -0,0 +1,12 @@
class HelloWorld {
  sayHello() {
    console.log("Hello World!");
  }
}

function main() {
  const hello = new HelloWorld();
  hello.sayHello();
}

main();
tests/integration_tests/examples/hello_world.py (new file, 13 lines)
@@ -0,0 +1,13 @@
#!/usr/bin/env python3

import sys


def main():
    print("Hello World!")

    return 0


if __name__ == "__main__":
    sys.exit(main())
@@ -6,3 +6,5 @@
[2023/5/4, 16:13:23] ~ User 2: See you!
7/19/22, 11:32 PM - User 1: Hello
7/20/22, 11:32 am - User 2: Goodbye
4/20/23, 9:42 am - User 3: <Media omitted>
6/29/23, 12:16 am - User 4: This message was deleted
@@ -0,0 +1,46 @@
import unittest

import pytest

from langchain.document_loaders.parsers.language.javascript import JavaScriptSegmenter


@pytest.mark.requires("esprima")
class TestJavaScriptSegmenter(unittest.TestCase):
    def setUp(self) -> None:
        self.example_code = """const os = require('os');

function hello(text) {
  console.log(text);
}

class Simple {
  constructor() {
    this.a = 1;
  }
}

hello("Hello!");"""

        self.expected_simplified_code = """const os = require('os');

// Code for: function hello(text) {

// Code for: class Simple {

hello("Hello!");"""

        self.expected_extracted_code = [
            "function hello(text) {\n  console.log(text);\n}",
            "class Simple {\n  constructor() {\n    this.a = 1;\n  }\n}",
        ]

    def test_extract_functions_classes(self) -> None:
        segmenter = JavaScriptSegmenter(self.example_code)
        extracted_code = segmenter.extract_functions_classes()
        self.assertEqual(extracted_code, self.expected_extracted_code)

    def test_simplify_code(self) -> None:
        segmenter = JavaScriptSegmenter(self.example_code)
        simplified_code = segmenter.simplify_code()
        self.assertEqual(simplified_code, self.expected_simplified_code)
@@ -0,0 +1,40 @@
import unittest

from langchain.document_loaders.parsers.language.python import PythonSegmenter


class TestPythonSegmenter(unittest.TestCase):
    def setUp(self) -> None:
        self.example_code = """import os

def hello(text):
    print(text)

class Simple:
    def __init__(self):
        self.a = 1

hello("Hello!")"""

        self.expected_simplified_code = """import os

# Code for: def hello(text):

# Code for: class Simple:

hello("Hello!")"""

        self.expected_extracted_code = [
            "def hello(text):\n" "    print(text)",
            "class Simple:\n" "    def __init__(self):\n" "        self.a = 1",
        ]

    def test_extract_functions_classes(self) -> None:
        segmenter = PythonSegmenter(self.example_code)
        extracted_code = segmenter.extract_functions_classes()
        self.assertEqual(extracted_code, self.expected_extracted_code)

    def test_simplify_code(self) -> None:
        segmenter = PythonSegmenter(self.example_code)
        simplified_code = segmenter.simplify_code()
        self.assertEqual(simplified_code, self.expected_simplified_code)
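Taken together, the two segmenter test files above pin down the interface these parsers rely on: the constructor takes a source string, `extract_functions_classes()` returns each top-level definition as its own snippet, and `simplify_code()` collapses those definitions into `# Code for:` stubs. A minimal usage sketch (the input file name is hypothetical):

```python
from langchain.document_loaders.parsers.language.python import PythonSegmenter

# Hypothetical source file; any string of Python code works here.
with open("my_module.py") as f:
    source = f.read()

segmenter = PythonSegmenter(source)

# Each top-level function or class as a standalone snippet.
for snippet in segmenter.extract_functions_classes():
    print(snippet)
    print("---")

# The module with those definitions replaced by "# Code for: ..." comments.
print(segmenter.simplify_code())
```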
@@ -5,6 +5,7 @@ def test_parsers_public_api_correct() -> None:
    """Test public API of parsers for breaking changes."""
    assert set(__all__) == {
        "BS4HTMLParser",
        "LanguageParser",
        "OpenAIWhisperParser",
        "PyPDFParser",
        "PDFMinerParser",
@@ -23,7 +23,7 @@ def mock_connector_id(): # type: ignore
class TestPsychicLoader:
    MOCK_API_KEY = "api_key"
    MOCK_CONNECTOR_ID = "notion"
    MOCK_CONNECTION_ID = "connection_id"
    MOCK_ACCOUNT_ID = "account_id"

    def test_psychic_loader_initialization(
        self, mock_psychic: MagicMock, mock_connector_id: MagicMock
@@ -31,17 +31,21 @@ class TestPsychicLoader:
        PsychicLoader(
            api_key=self.MOCK_API_KEY,
            connector_id=self.MOCK_CONNECTOR_ID,
            connection_id=self.MOCK_CONNECTION_ID,
            account_id=self.MOCK_ACCOUNT_ID,
        )

        mock_psychic.assert_called_once_with(secret_key=self.MOCK_API_KEY)
        mock_connector_id.assert_called_once_with(self.MOCK_CONNECTOR_ID)

    def test_psychic_loader_load_data(self, mock_psychic: MagicMock) -> None:
        mock_psychic.get_documents.return_value = [
        mock_get_documents_response = MagicMock()
        mock_get_documents_response.documents = [
            self._get_mock_document("123"),
            self._get_mock_document("456"),
        ]
        mock_get_documents_response.next_page_cursor = None

        mock_psychic.get_documents.return_value = mock_get_documents_response

        psychic_loader = self._get_mock_psychic_loader(mock_psychic)
@@ -57,7 +61,7 @@ class TestPsychicLoader:
        psychic_loader = PsychicLoader(
            api_key=self.MOCK_API_KEY,
            connector_id=self.MOCK_CONNECTOR_ID,
            connection_id=self.MOCK_CONNECTION_ID,
            account_id=self.MOCK_ACCOUNT_ID,
        )
        psychic_loader.psychic = mock_psychic
        return psychic_loader
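For reference, the constructor arguments exercised above map directly onto real usage. A rough sketch with placeholder credentials; the import path and the `load()` call are assumed here (they are the usual LangChain loader conventions and are not shown in this diff):

```python
from langchain.document_loaders.psychic import PsychicLoader

# Placeholder credentials and connector values.
loader = PsychicLoader(
    api_key="my-psychic-secret-key",
    connector_id="notion",
    connection_id="my-connection-id",
    account_id="my-account-id",
)

# Assumed: documents come back via the standard loader interface.
docs = loader.load()
print(len(docs))
```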
@@ -1,5 +1,8 @@
"""Test building the Zapier tool, not running it."""
from unittest.mock import MagicMock, patch

import pytest
import requests

from langchain.tools.zapier.prompt import BASE_ZAPIER_TOOL_PROMPT
from langchain.tools.zapier.tool import ZapierNLARunAction
@@ -50,3 +53,234 @@ def test_custom_base_prompt_fail() -> None:
            base_prompt=base_prompt,
            api_wrapper=ZapierNLAWrapper(zapier_nla_api_key="test"),
        )


def test_format_headers_api_key() -> None:
    """Test that the action headers is being created correctly."""
    tool = ZapierNLARunAction(
        action_id="test",
        zapier_description="test",
        params_schema={"test": "test"},
        api_wrapper=ZapierNLAWrapper(zapier_nla_api_key="test"),
    )
    headers = tool.api_wrapper._format_headers()
    assert headers["Content-Type"] == "application/json"
    assert headers["Accept"] == "application/json"
    assert headers["X-API-Key"] == "test"


def test_format_headers_access_token() -> None:
    """Test that the action headers is being created correctly."""
    tool = ZapierNLARunAction(
        action_id="test",
        zapier_description="test",
        params_schema={"test": "test"},
        api_wrapper=ZapierNLAWrapper(zapier_nla_oauth_access_token="test"),
    )
    headers = tool.api_wrapper._format_headers()
    assert headers["Content-Type"] == "application/json"
    assert headers["Accept"] == "application/json"
    assert headers["Authorization"] == "Bearer test"


def test_create_action_payload() -> None:
    """Test that the action payload is being created correctly."""
    tool = ZapierNLARunAction(
        action_id="test",
        zapier_description="test",
        params_schema={"test": "test"},
        api_wrapper=ZapierNLAWrapper(zapier_nla_api_key="test"),
    )

    payload = tool.api_wrapper._create_action_payload("some instructions")
    assert payload["instructions"] == "some instructions"
    assert payload.get("preview_only") is None


def test_create_action_payload_preview() -> None:
    """Test that the action payload with preview is being created correctly."""
    tool = ZapierNLARunAction(
        action_id="test",
        zapier_description="test",
        params_schema={"test": "test"},
        api_wrapper=ZapierNLAWrapper(zapier_nla_api_key="test"),
    )

    payload = tool.api_wrapper._create_action_payload(
        "some instructions",
        preview_only=True,
    )
    assert payload["instructions"] == "some instructions"
    assert payload["preview_only"] is True


def test_create_action_payload_with_params() -> None:
    """Test that the action payload with params is being created correctly."""
    tool = ZapierNLARunAction(
        action_id="test",
        zapier_description="test",
        params_schema={"test": "test"},
        api_wrapper=ZapierNLAWrapper(zapier_nla_api_key="test"),
    )

    payload = tool.api_wrapper._create_action_payload(
        "some instructions",
        {"test": "test"},
        preview_only=True,
    )
    assert payload["instructions"] == "some instructions"
    assert payload["preview_only"] is True
    assert payload["test"] == "test"


@pytest.mark.asyncio
async def test_apreview(mocker) -> None:  # type: ignore[no-untyped-def]
    """Test that the action payload with params is being created correctly."""
    tool = ZapierNLARunAction(
        action_id="test",
        zapier_description="test",
        params_schema={"test": "test"},
        api_wrapper=ZapierNLAWrapper(
            zapier_nla_api_key="test",
            zapier_nla_api_base="http://localhost:8080/v1/",
        ),
    )
    mockObj = mocker.patch.object(ZapierNLAWrapper, "_arequest")
    await tool.api_wrapper.apreview(
        "random_action_id",
        "some instructions",
        {"test": "test"},
    )
    mockObj.assert_called_once_with(
        "POST",
        "http://localhost:8080/v1/exposed/random_action_id/execute/",
        json={
            "instructions": "some instructions",
            "preview_only": True,
            "test": "test",
        },
    )


@pytest.mark.asyncio
async def test_arun(mocker) -> None:  # type: ignore[no-untyped-def]
    """Test that the action payload with params is being created correctly."""
    tool = ZapierNLARunAction(
        action_id="test",
        zapier_description="test",
        params_schema={"test": "test"},
        api_wrapper=ZapierNLAWrapper(
            zapier_nla_api_key="test",
            zapier_nla_api_base="http://localhost:8080/v1/",
        ),
    )
    mockObj = mocker.patch.object(ZapierNLAWrapper, "_arequest")
    await tool.api_wrapper.arun(
        "random_action_id",
        "some instructions",
        {"test": "test"},
    )
    mockObj.assert_called_once_with(
        "POST",
        "http://localhost:8080/v1/exposed/random_action_id/execute/",
        json={"instructions": "some instructions", "test": "test"},
    )


@pytest.mark.asyncio
async def test_alist(mocker) -> None:  # type: ignore[no-untyped-def]
    """Test that the action payload with params is being created correctly."""
    tool = ZapierNLARunAction(
        action_id="test",
        zapier_description="test",
        params_schema={"test": "test"},
        api_wrapper=ZapierNLAWrapper(
            zapier_nla_api_key="test",
            zapier_nla_api_base="http://localhost:8080/v1/",
        ),
    )
    mockObj = mocker.patch.object(ZapierNLAWrapper, "_arequest")
    await tool.api_wrapper.alist()
    mockObj.assert_called_once_with(
        "GET",
        "http://localhost:8080/v1/exposed/",
    )


def test_wrapper_fails_no_api_key_or_access_token_initialization() -> None:
    """Test Wrapper requires either an API Key or OAuth Access Token."""
    with pytest.raises(ValueError):
        ZapierNLAWrapper()


def test_wrapper_api_key_initialization() -> None:
    """Test Wrapper initializes with an API Key."""
    ZapierNLAWrapper(zapier_nla_api_key="test")


def test_wrapper_access_token_initialization() -> None:
    """Test Wrapper initializes with an API Key."""
    ZapierNLAWrapper(zapier_nla_oauth_access_token="test")


def test_list_raises_401_invalid_api_key() -> None:
    """Test that a valid error is raised when the API Key is invalid."""
    mock_response = MagicMock()
    mock_response.status_code = 401
    mock_response.raise_for_status.side_effect = requests.HTTPError(
        "401 Client Error: Unauthorized for url: https://nla.zapier.com/api/v1/exposed/"
    )
    mock_session = MagicMock()
    mock_session.get.return_value = mock_response

    with patch("requests.Session", return_value=mock_session):
        wrapper = ZapierNLAWrapper(zapier_nla_api_key="test")

        with pytest.raises(requests.HTTPError) as err:
            wrapper.list()

        assert str(err.value).startswith(
            "An unauthorized response occurred. Check that your api key is correct. "
            "Err:"
        )


def test_list_raises_401_invalid_access_token() -> None:
    """Test that a valid error is raised when the API Key is invalid."""
    mock_response = MagicMock()
    mock_response.status_code = 401
    mock_response.raise_for_status.side_effect = requests.HTTPError(
        "401 Client Error: Unauthorized for url: https://nla.zapier.com/api/v1/exposed/"
    )
    mock_session = MagicMock()
    mock_session.get.return_value = mock_response

    with patch("requests.Session", return_value=mock_session):
        wrapper = ZapierNLAWrapper(zapier_nla_oauth_access_token="test")

        with pytest.raises(requests.HTTPError) as err:
            wrapper.list()

        assert str(err.value).startswith(
            "An unauthorized response occurred. Check that your access token is "
            "correct and doesn't need to be refreshed. Err:"
        )


def test_list_raises_other_error() -> None:
    """Test that a valid error is raised when an unknown HTTP Error occurs."""
    mock_response = MagicMock()
    mock_response.status_code = 404
    mock_response.raise_for_status.side_effect = requests.HTTPError(
        "404 Client Error: Not found for url"
    )
    mock_session = MagicMock()
    mock_session.get.return_value = mock_response

    with patch("requests.Session", return_value=mock_session):
        wrapper = ZapierNLAWrapper(zapier_nla_oauth_access_token="test")

        with pytest.raises(requests.HTTPError) as err:
            wrapper.list()

        assert str(err.value) == "404 Client Error: Not found for url"
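The new tests above exercise `ZapierNLAWrapper` directly, so the same surface area in ordinary use looks roughly like the sketch below. The method names and signatures mirror the tests; the import path is the usual `langchain.utilities.zapier` location, and the API key, action id, instructions, and params are placeholders.

```python
import asyncio

from langchain.utilities.zapier import ZapierNLAWrapper

# Placeholder key; use your own Zapier NLA API key (or an OAuth access token).
wrapper = ZapierNLAWrapper(zapier_nla_api_key="my-nla-api-key")

# List the actions exposed for this credential (the call the 401 tests mock out).
for action in wrapper.list():
    print(action)


# Async preview of an action, mirroring test_apreview: same arguments, preview only.
async def preview_example() -> None:
    result = await wrapper.apreview(
        "my-action-id",           # placeholder action id
        "some instructions",      # natural-language instructions
        {"test": "test"},         # placeholder params, merged into the payload
    )
    print(result)


asyncio.run(preview_example())
```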