Compare commits

...

25 Commits

Author SHA1 Message Date
vowelparrot
cddfe05073 Send evaluator logs to new session 2023-06-28 15:45:45 -07:00
Harrison Chase
e5611565b7 bump version to 218 (#6857) 2023-06-27 23:36:37 -07:00
Yaohui Wang
9d1bd18596 feat (documents): add LarkSuite document loader (#6420)

### Summary

This PR adds a LarkSuite (FeiShu) document loader. 
> [LarkSuite](https://www.larksuite.com/) is an enterprise collaboration
platform developed by ByteDance.
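
A minimal usage sketch (condensed from the example notebook below; the
domain, token, and document ID are placeholders):

```python
from langchain.document_loaders.larksuite import LarkSuiteDocLoader

# Placeholder values -- see the example notebook below for an interactive version.
DOMAIN = "https://open.larksuite.com"
ACCESS_TOKEN = "<tenant_access_token or user_access_token>"
DOCUMENT_ID = "<larksuite document id>"

larksuite_loader = LarkSuiteDocLoader(DOMAIN, ACCESS_TOKEN, DOCUMENT_ID)
docs = larksuite_loader.load()
```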

### Tests

- an integration test case is added
- an example notebook showing usage is added. [Notebook
preview](https://github.com/yaohui-wyh/langchain/blob/master/docs/extras/modules/data_connection/document_loaders/integrations/larksuite.ipynb)

### Who can review?

- PTAL @eyurtsev @hwchase17

---------

Co-authored-by: Yaohui Wang <wangyaohui.01@bytedance.com>
2023-06-27 23:08:05 -07:00
Jingsong Gao
a435a436c1 feat(document_loaders): add tencent cos directory and file loader (#6401)

- add Tencent COS directory and file support for the document loader (usage sketch below)
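
A minimal sketch of the two loaders, condensed from the example notebooks
below (region, keys, bucket, and object key are placeholders):

```python
from qcloud_cos import CosConfig

from langchain.document_loaders import (
    TencentCOSDirectoryLoader,
    TencentCOSFileLoader,
)

conf = CosConfig(
    Region="your cos region",
    SecretId="your cos secret_id",
    SecretKey="your cos secret_key",
)

# Load every object under a bucket, optionally narrowed by a prefix ...
dir_loader = TencentCOSDirectoryLoader(conf=conf, bucket="your_cos_bucket", prefix="fake")
# ... or load a single object by key.
file_loader = TencentCOSFileLoader(conf=conf, bucket="your_cos_bucket", key="fake.docx")

docs = dir_loader.load()
```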

#### Who can review?

@eyurtsev
2023-06-27 23:07:20 -07:00
Ninely
d6cd0deaef feat: Add streaming only final aiter of agent (#6274)

#### Add streaming only final async iterator of agent
This callback returns an async iterator and only streams the final
output of an agent.
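
A minimal consumption sketch; the module path is assumed to mirror the
existing `AsyncIteratorCallbackHandler`, and the tool choice is illustrative:

```python
import asyncio

from langchain.agents import AgentType, initialize_agent, load_tools
from langchain.callbacks.streaming_aiter_final_only import (
    AsyncFinalIteratorCallbackHandler,  # assumed module path
)
from langchain.chat_models import ChatOpenAI

async def main() -> None:
    handler = AsyncFinalIteratorCallbackHandler()
    llm = ChatOpenAI(streaming=True, callbacks=[handler], temperature=0)
    tools = load_tools(["llm-math"], llm=llm)
    agent = initialize_agent(
        tools, llm, agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION
    )
    task = asyncio.create_task(agent.arun("What is 2 ** 10?"))
    async for token in handler.aiter():  # yields only final-answer tokens
        print(token, end="", flush=True)
    await task

asyncio.run(main())
```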

#### Who can review?

Tag maintainers/contributors who might be interested: @agola11

2023-06-27 23:06:25 -07:00
Shashank Deshpande
1db266b20d Update link in apis.mdx (#6812)
2023-06-27 23:00:26 -07:00
Lance Martin
3f9900a864 Create MultiQueryRetriever (#6833)
Distance-based vector database retrieval embeds (represents) queries in
high-dimensional space and finds similar embedded documents based on
"distance". But retrieval may produce different results with subtle
changes in query wording, or if the embeddings do not capture the
semantics of the data well. Prompt engineering / tuning is sometimes
done to manually address these problems, but it can be tedious.

The `MultiQueryRetriever` automates the process of prompt tuning by
using an LLM to generate multiple queries from different perspectives
for a given user input query. For each query, it retrieves a set of
relevant documents and takes the unique union across all queries to get
a larger set of potentially relevant documents. By generating multiple
perspectives on the same question, the `MultiQueryRetriever` might be
able to overcome some of the limitations of the distance-based retrieval
and get a richer set of results.
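
In code (condensed from the example notebook added in this PR):

```python
from langchain.chat_models import ChatOpenAI
from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.retrievers.multi_query import MultiQueryRetriever
from langchain.vectorstores import Chroma

# Any populated vector store works; the notebook below builds a Chroma DB.
vectordb = Chroma(embedding_function=OpenAIEmbeddings())
llm = ChatOpenAI(temperature=0)

retriever = MultiQueryRetriever.from_llm(retriever=vectordb.as_retriever(), llm=llm)
unique_docs = retriever.get_relevant_documents(
    question="What does the course say about regression?"
)
```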

---------

Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
2023-06-27 22:59:40 -07:00
Tim Asp
3ca1a387c2 Web Loader: Add proxy support (#6792)
Proxies are helpful, especially when you start querying websites with
more aggressive anti-bot protections.

[Proxy
services](https://developers.oxylabs.io/advanced-proxy-solutions/web-unblocker/making-requests)
(of which there are many) and `requests` make it easy to rotate IPs to
prevent banning by just passing along a simple dict to `requests`.
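
For example (taken from the updated notebook below; the credentials and
proxy host are placeholders):

```python
from langchain.document_loaders import WebBaseLoader

loader = WebBaseLoader(
    "https://www.walmart.com/search?q=parrots",
    proxies={
        "http": "http://{username}:{password}@proxy.service.com:6666/",
        "https": "https://{username}:{password}@proxy.service.com:6666/",
    },
)
docs = loader.load()
```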

CC @rlancemartin, @eyurtsev
2023-06-27 22:27:49 -07:00
Ayan Bandyopadhyay
f92ccf70fd Update to the latest Psychic python library version (#6804)
Update the Psychic document loader to use the latest `psychicapi` python
library version: `0.8.0`
2023-06-27 22:26:38 -07:00
Hun-soo Jung
f3d178f600 Specify utilities package in SerpAPIWrapper docstring (#6821)
- Description: Specify utilities package in SerpAPIWrapper docstring
  - Issue: Not an issue
  - Dependencies: (n/a)
  - Tag maintainer: @dev2049 
  - Twitter handle: (n/a)
2023-06-27 22:26:20 -07:00
Matt Robinson
dd2a151543 Docs/unstructured api key (#6781)
### Summary

The Unstructured API will soon begin requiring API keys. This PR updates
the Unstructured integrations docs with instructions on how to generate
Unstructured API keys.

### Reviewers

@rlancemartin
@eyurtsev
@hwchase17
2023-06-27 16:54:15 -07:00
Matthew Plachter
d6664af0ee add async to zapier nla tools (#6791)
- Description: Add async functionality to Zapier NLA Tools (a usage
sketch follows below)
  - Issue: n/a
  - Dependencies: n/a
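
A minimal sketch of the new async construction path (see the
`ZapierToolkit` diff below; `ZapierNLAWrapper` reads `ZAPIER_NLA_API_KEY`
from the environment):

```python
import asyncio

from langchain.agents.agent_toolkits import ZapierToolkit
from langchain.utilities.zapier import ZapierNLAWrapper

async def build_toolkit() -> ZapierToolkit:
    wrapper = ZapierNLAWrapper()
    # Added in this PR: fetches the Zapier action list with the async client.
    return await ZapierToolkit.async_from_zapier_nla_wrapper(wrapper)

toolkit = asyncio.run(build_toolkit())
```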

2023-06-27 16:53:35 -07:00
Neil Neuwirth
efe0d39c6a Adjusted OpenAI cost calculation (#6798)
Added parentheses to ensure the division is performed before the
multiplication. The cost is now calculated by dividing the number of
tokens by 1000 first (to express them in thousands of tokens), and then
multiplying by the model's cost per 1k tokens. @agola11
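
As a sketch of the corrected expression (the price table below is
illustrative only, not real rates):

```python
MODEL_COST_PER_1K_TOKENS = {"gpt-3.5-turbo": 0.002}  # illustrative rate

def get_openai_token_cost(model_name: str, num_tokens: int) -> float:
    # Convert the token count to thousands first, then scale by the
    # model's cost per 1k tokens.
    return MODEL_COST_PER_1K_TOKENS[model_name] * (num_tokens / 1000)

print(get_openai_token_cost("gpt-3.5-turbo", 1500))  # 0.003
```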
2023-06-27 16:53:06 -07:00
Ian
b4c196f785 fix pinecone delete bug (#6816)
The implementation of delete in the Pinecone vectorstore omits the
namespace, which causes deletes to fail.
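
A hypothetical sketch of the fix (attribute names are illustrative; the
point is that the namespace must be forwarded to the client):

```python
from typing import List, Optional

# Hypothetical sketch -- not the actual Pinecone vectorstore source.
class PineconeVectorStore:
    def __init__(self, index, namespace: Optional[str] = None):
        self._index = index
        self._namespace = namespace

    def delete(self, ids: List[str]) -> None:
        # Before the fix, the namespace was dropped here, so deletes
        # silently missed vectors stored under a non-default namespace.
        self._index.delete(ids=ids, namespace=self._namespace)
```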
2023-06-27 16:50:17 -07:00
Janos Tolgyesi
f1070de038 WebBaseLoader: optionally raise exception in the case of http error (#6823)
- **Description**: this PR adds the option to raise an exception when the
HTTP request does not return a 2xx status code. This is particularly
useful when the URL points to a non-existent web page: the server returns
an HTTP 404 NOT FOUND status, but WebBaseLoader parses and returns the
HTTP body of the error message anyway. (A usage sketch follows this
list.)
  - **Dependencies**: none,
  - **Tag maintainer**: @rlancemartin, @eyurtsev,
  - **Twitter handle**: jtolgyesi
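
A usage sketch, assuming the new behavior is toggled via a
`raise_for_status` attribute on the loader:

```python
from langchain.document_loaders import WebBaseLoader

loader = WebBaseLoader("https://example.com/does-not-exist")
loader.raise_for_status = True  # assumed attribute name; off by default

try:
    docs = loader.load()
except Exception as err:  # the non-2xx status surfaces instead of a parsed 404 body
    print(f"Request failed: {err}")
```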
2023-06-27 16:43:59 -07:00
rafael
ef72a7cf26 rail_parser: Allow creation from pydantic (#6832)

Adds a way to create the guardrails output parser from a pydantic model.
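
A hypothetical sketch, assuming the new constructor mirrors guardrails'
own `Guard.from_pydantic` (the model and prompt here are illustrative):

```python
from pydantic import BaseModel, Field

from langchain.output_parsers import GuardrailsOutputParser

class Movie(BaseModel):
    title: str = Field(description="The movie title")
    year: int = Field(description="Release year")

# Assumed call shape; see the PR diff for the exact signature.
parser = GuardrailsOutputParser.from_pydantic(Movie, prompt="Suggest a movie.")
```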
2023-06-27 16:40:52 -07:00
Augustine Theodore
a980095efc Enhancement : Ignore deleted messages and media in WhatsAppChatLoader (#6839)
- Description: Ignore deleted messages and media
  - Issue: #6838 
  - Dependencies: No new dependencies
  - Tag maintainer: @rlancemartin, @eyurtsev
2023-06-27 16:36:55 -07:00
Robert Lewis
74848aafea Zapier - Add better error messaging for 401 responses (#6840)
Description: When a 401 response is given back by Zapier, hint to the
end user why that may have occurred

- If an API key was initialized with the wrapper, ask them to check
their API key value
- If an access token was initialized with the wrapper, ask them to check
their access token or verify that it doesn't need to be refreshed.

Tag maintainer: @dev2049
2023-06-27 16:35:42 -07:00
Matt Robinson
b24472eae3 feat: Add UnstructuredOrgModeLoader (#6842)
### Summary

Adds `UnstructuredOrgModeLoader` for processing
[Org-mode](https://en.wikipedia.org/wiki/Org-mode) documents.

### Testing

```python
from langchain.document_loaders import UnstructuredOrgModeLoader

loader = UnstructuredOrgModeLoader(
    file_path="example_data/README.org", mode="elements"
)
docs = loader.load()
print(docs[0])
```

### Reviewers

- @rlancemartin
- @eyurtsev
- @hwchase17
2023-06-27 16:34:17 -07:00
Piyush Jain
e53995836a Added missing attribute value object (#6849)
## Description
Adds a missing type class for
[AdditionalResultAttributeValue](https://docs.aws.amazon.com/kendra/latest/APIReference/API_AdditionalResultAttributeValue.html).
Fixes validation failures for query API responses that include
`AdditionalAttributes`.
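
A sketch of the missing type, with field shapes taken from the linked
AWS API reference (assumed, not copied from the PR):

```python
from typing import List, Optional

from pydantic import BaseModel

class Highlight(BaseModel):
    BeginOffset: int
    EndOffset: int

class TextWithHighlights(BaseModel):
    Text: Optional[str]
    Highlights: Optional[List[Highlight]]

class AdditionalResultAttributeValue(BaseModel):
    TextWithHighlightsValue: Optional[TextWithHighlights]
```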

cc @dev2049 
cc @zhichenggeng
2023-06-27 16:30:11 -07:00
Cristóbal Carnero Liñán
e494b0a09f feat (documents): add a source code loader based on AST manipulation (#6486)
#### Summary

A new approach to loading source code is implemented:

Each top-level function and class in the code is loaded into a separate
document. Then, an additional document is created with the top-level
code, but without the already loaded functions and classes.

This could improve the accuracy of QA chains over source code.

For instance, having this script:

```python
class MyClass:
    def __init__(self, name):
        self.name = name

    def greet(self):
        print(f"Hello, {self.name}!")

def main():
    name = input("Enter your name: ")
    obj = MyClass(name)
    obj.greet()

if __name__ == '__main__':
    main()
```

The loader will create three documents with this content:

First document:
```python
class MyClass:
    def __init__(self, name):
        self.name = name

    def greet(self):
        print(f"Hello, {self.name}!")
```

Second document:
```python
def main():
    name = input("Enter your name: ")
    obj = MyClass(name)
    obj.greet()
```

Third document:
```python
# Code for: class MyClass:

# Code for: def main():

if __name__ == '__main__':
    main()
```

A threshold parameter is added to control whether small scripts are
split in this way or not.

At this moment, only Python and JavaScript are supported. The
appropriate parser is determined by examining the file extension.
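
Condensed from the example notebook added in this PR (see below):

```python
from langchain.document_loaders.generic import GenericLoader
from langchain.document_loaders.parsers import LanguageParser

loader = GenericLoader.from_filesystem(
    "./example_data/source_code",
    glob="*",
    suffixes=[".py", ".js"],
    # The language is inferred from the file extension; pass
    # parser_threshold=<n> to skip splitting for files under n lines.
    parser=LanguageParser(),
)
docs = loader.load()
```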

#### Tests

This PR adds:

- Unit tests
- Integration tests

#### Dependencies

Only one dependency was added as optional (needed for the JavaScript
parser).

#### Documentation

A notebook is added showing how the loader can be used.

#### Who can review?

@eyurtsev @hwchase17

---------

Co-authored-by: rlm <pexpresss31@gmail.com>
2023-06-27 15:58:47 -07:00
Robert Lewis
da462d9dd4 Zapier update oauth support (#6780)
Description: Update documentation to

1) point to updated documentation links at Zapier.com (we've revamped
our help docs and paths), and
2) provide clarity on how to use the wrapper with an access token for
OAuth support

Demo:

Initializing the Zapier Wrapper with an OAuth Access Token

`ZapierNLAWrapper(zapier_nla_oauth_access_token="<redacted>")`

Using LangChain to resolve the current weather in Vancouver, BC,
leveraging Zapier NLA to look up the weather by coordinates.

```
> Entering new  chain...
 I need to use a tool to get the current weather.
Action: The Weather: Get Current Weather
Action Input: Get the current weather for Vancouver BC
Observation: {"coord__lon": -123.1207, "coord__lat": 49.2827, "weather": [{"id": 802, "main": "Clouds", "description": "scattered clouds", "icon": "03d", "icon_url": "http://openweathermap.org/img/wn/03d@2x.png"}], "weather[]icon_url": ["http://openweathermap.org/img/wn/03d@2x.png"], "weather[]icon": ["03d"], "weather[]id": [802], "weather[]description": ["scattered clouds"], "weather[]main": ["Clouds"], "base": "stations", "main__temp": 71.69, "main__feels_like": 71.56, "main__temp_min": 67.64, "main__temp_max": 76.39, "main__pressure": 1015, "main__humidity": 64, "visibility": 10000, "wind__speed": 3, "wind__deg": 155, "wind__gust": 11.01, "clouds__all": 41, "dt": 1687806607, "sys__type": 2, "sys__id": 2011597, "sys__country": "CA", "sys__sunrise": 1687781297, "sys__sunset": 1687839730, "timezone": -25200, "id": 6173331, "name": "Vancouver", "cod": 200, "summary": "scattered clouds", "_zap_search_was_found_status": true}
Thought: I now know the current weather in Vancouver BC.
Final Answer: The current weather in Vancouver BC is scattered clouds with a temperature of 71.69 and wind speed of 3
```
2023-06-27 11:46:32 -07:00
Joshua Carroll
24e4ae95ba Initial Streamlit callback integration doc (md) (#6788)
**Description:** Add a documentation page for the Streamlit Callback
Handler integration (#6315)

Notes:
- Implemented as a markdown file instead of a notebook since example
code runs in a Streamlit app (happy to discuss / consider alternatives
now or later)
- Contains an embedded Streamlit app ->
https://mrkl-minimal.streamlit.app/ Currently this app is hosted out of
a Streamlit repo, but we're working to migrate the code to a
LangChain-owned repo


![streamlit_docs](https://github.com/hwchase17/langchain/assets/116604821/0b7a6239-361f-470c-8539-f22c40098d1a)

cc @dev2049 @tconkling
2023-06-27 11:43:49 -07:00
Harrison Chase
8392ca602c bump version to 217 (#6831) 2023-06-27 09:39:56 -07:00
Ismail Pelaseyed
fcb3a64799 Add support for passing headers and search params to openai openapi chain (#6782)
- Description: add support for passing headers and search params to
OpenAI OpenAPI chains (a usage sketch follows below).
  - Issue: n/a
  - Dependencies: n/a
  - Tag maintainer: @hwchase17
  - Twitter handle: @pelaseyed
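
A minimal sketch of the new parameters (the spec URL and header values
are placeholders), matching the `get_openapi_chain` diff below:

```python
from langchain.chains.openai_functions.openapi import get_openapi_chain

chain = get_openapi_chain(
    "https://example.com/openapi.yaml",           # placeholder spec location
    headers={"Authorization": "Bearer <token>"},  # merged into every request
    params={"api_key": "<key>"},                  # merged into the query string
)
chain.run("What data can I fetch?")
```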

---------

Co-authored-by: Dev 2049 <dev.dev2049@gmail.com>
2023-06-27 09:09:03 -07:00
63 changed files with 3601 additions and 115 deletions

View File

@@ -23,11 +23,15 @@ its dependencies running locally.
If you want to get up and running with less set up, you can
simply run `pip install unstructured` and use `UnstructuredAPIFileLoader` or
`UnstructuredAPIFileIOLoader`. That will process your document using the hosted Unstructured API.
Note that currently (as of 1 May 2023) the Unstructured API is open, but it will soon require
an API key. The [Unstructured documentation page](https://unstructured-io.github.io/) will have
instructions on how to generate an API key once they're available. Check out the instructions
[here](https://github.com/Unstructured-IO/unstructured-api#dizzy-instructions-for-using-the-docker-image)
if you'd like to self-host the Unstructured API or run it locally.
The Unstructured API requires API keys to make requests.
You can generate a free API key [here](https://www.unstructured.io/api-key) and start using it today!
Check out the README [here](https://github.com/Unstructured-IO/unstructured-api) to get started making API calls.
We'd love to hear your feedback; let us know how it goes in our [community slack](https://join.slack.com/t/unstructuredw-kbe4326/shared_invite/zt-1x7cgo0pg-PTptXWylzPQF9xZolzCnwQ).
And stay tuned for improvements to both quality and performance!
Check out the instructions
[here](https://github.com/Unstructured-IO/unstructured-api#dizzy-instructions-for-using-the-docker-image) if you'd like to self-host the Unstructured API or run it locally.
## Wrappers

View File

@@ -7,7 +7,7 @@
"source": [
"# Zapier Natural Language Actions API\n",
"\\\n",
"Full docs here: https://nla.zapier.com/api/v1/docs\n",
"Full docs here: https://nla.zapier.com/start/\n",
"\n",
"**Zapier Natural Language Actions** gives you access to the 5k+ apps, 20k+ actions on Zapier's platform through a natural language API interface.\n",
"\n",
@@ -21,7 +21,7 @@
"\n",
"2. User-facing (Oauth): for production scenarios where you are deploying an end-user facing application and LangChain needs access to end-user's exposed actions and connected accounts on Zapier.com\n",
"\n",
"This quick start will focus on the server-side use case for brevity. Review [full docs](https://nla.zapier.com/api/v1/docs) or reach out to nla@zapier.com for user-facing oauth developer support.\n",
"This quick start will focus on the server-side use case for brevity. Review [full docs](https://nla.zapier.com/start/) for user-facing oauth developer support.\n",
"\n",
"This example goes over how to use the Zapier integration with a `SimpleSequentialChain`, then an `Agent`.\n",
"In code, below:"
@@ -39,7 +39,7 @@
"# get from https://platform.openai.com/\n",
"os.environ[\"OPENAI_API_KEY\"] = os.environ.get(\"OPENAI_API_KEY\", \"\")\n",
"\n",
"# get from https://nla.zapier.com/demo/provider/debug (under User Information, after logging in):\n",
"# get from https://nla.zapier.com/docs/authentication/ after logging in):\n",
"os.environ[\"ZAPIER_NLA_API_KEY\"] = os.environ.get(\"ZAPIER_NLA_API_KEY\", \"\")"
]
},

View File

@@ -0,0 +1,73 @@
# Streamlit
> **[Streamlit](https://streamlit.io/) is a faster way to build and share data apps.**
> Streamlit turns data scripts into shareable web apps in minutes. All in pure Python. No frontend experience required.
> See more examples at [streamlit.io/generative-ai](https://streamlit.io/generative-ai).
[![Open in GitHub Codespaces](https://github.com/codespaces/badge.svg)](https://codespaces.new/langchain-ai/streamlit-agent?quickstart=1)
In this guide we will demonstrate how to use `StreamlitCallbackHandler` to display the thoughts and actions of an agent in an
interactive Streamlit app. Try it out with the running app below using the [MRKL agent](/docs/modules/agents/how_to/mrkl/):
<iframe loading="lazy" src="https://mrkl-minimal.streamlit.app/?embed=true&embed_options=light_theme"
style={{ width: 100 + '%', border: 'none', marginBottom: 1 + 'rem', height: 600 }}
allow="camera;clipboard-read;clipboard-write;"
></iframe>
## Installation and Setup
```bash
pip install langchain streamlit
```
You can run `streamlit hello` to load a sample app and validate your install succeeded. See full instructions in Streamlit's
[Getting started documentation](https://docs.streamlit.io/library/get-started).
## Display thoughts and actions
To create a `StreamlitCallbackHandler`, you just need to provide a parent container to render the output.
```python
from langchain.callbacks import StreamlitCallbackHandler
import streamlit as st
st_callback = StreamlitCallbackHandler(st.container())
```
Additional keyword arguments to customize the display behavior are described in the
[API reference](https://api.python.langchain.com/en/latest/modules/callbacks.html#langchain.callbacks.StreamlitCallbackHandler).
### Scenario 1: Using an Agent with Tools
The primary supported use case today is visualizing the actions of an Agent with Tools (or Agent Executor). You can create an
agent in your Streamlit app and simply pass the `StreamlitCallbackHandler` to `agent.run()` in order to visualize the
thoughts and actions live in your app.
```python
from langchain.llms import OpenAI
from langchain.agents import AgentType, initialize_agent, load_tools
from langchain.callbacks import StreamlitCallbackHandler
import streamlit as st
llm = OpenAI(temperature=0, streaming=True)
tools = load_tools(["ddg-search"])
agent = initialize_agent(
    tools, llm, agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION, verbose=True
)

if prompt := st.chat_input():
    st.chat_message("user").write(prompt)
    with st.chat_message("assistant"):
        st_callback = StreamlitCallbackHandler(st.container())
        response = agent.run(prompt, callbacks=[st_callback])
        st.write(response)
```
**Note:** You will need to set `OPENAI_API_KEY` for the above app code to run successfully.
The easiest way to do this is via [Streamlit secrets.toml](https://docs.streamlit.io/library/advanced-features/secrets-management),
or any other local ENV management tool.
### Additional scenarios
Currently `StreamlitCallbackHandler` is geared towards use with a LangChain Agent Executor. Support for additional agent types,
direct use with Chains, etc. will be added in the future.

View File

@@ -0,0 +1,27 @@
* Example Docs
The sample docs directory contains the following files:
- ~example-10k.html~ - A 10-K SEC filing in HTML format
- ~layout-parser-paper.pdf~ - A PDF copy of the layout parser paper
- ~factbook.xml~ / ~factbook.xsl~ - Example XML/XSL files that you
can use to test stylesheets
These documents can be used to test out the parsers in the library. In
addition, here are instructions for pulling in some sample docs that are
too big to store in the repo.
** XBRL 10-K
You can get an example 10-K in inline XBRL format using the following
~curl~. Note, you need to have the user agent set in the header or the
SEC site will reject your request.
#+BEGIN_SRC bash
curl -O \
  -A '${organization} ${email}' \
  https://www.sec.gov/Archives/edgar/data/311094/000117184321001344/0001171843-21-001344.txt
#+END_SRC
You can parse this document using the HTML parser.

View File

@@ -0,0 +1,17 @@
class MyClass {
  constructor(name) {
    this.name = name;
  }

  greet() {
    console.log(`Hello, ${this.name}!`);
  }
}

function main() {
  const name = prompt("Enter your name:");
  const obj = new MyClass(name);
  obj.greet();
}

main();

View File

@@ -0,0 +1,16 @@
class MyClass:
    def __init__(self, name):
        self.name = name

    def greet(self):
        print(f"Hello, {self.name}!")


def main():
    name = input("Enter your name: ")
    obj = MyClass(name)
    obj.greet()


if __name__ == "__main__":
    main()

View File

@@ -0,0 +1,103 @@
{
"cells": [
{
"attachments": {},
"cell_type": "markdown",
"id": "33205b12",
"metadata": {},
"source": [
"# LarkSuite (FeiShu)\n",
"\n",
">[LarkSuite](https://www.larksuite.com/) is an enterprise collaboration platform developed by ByteDance.\n",
"\n",
"This notebook covers how to load data from the `LarkSuite` REST API into a format that can be ingested into LangChain, along with example usage for text summarization.\n",
"\n",
"The LarkSuite API requires an access token (tenant_access_token or user_access_token), checkout [LarkSuite open platform document](https://open.larksuite.com/document) for API details."
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "90b69c94",
"metadata": {
"ExecuteTime": {
"end_time": "2023-06-19T10:05:03.645161Z",
"start_time": "2023-06-19T10:04:49.541968Z"
},
"tags": []
},
"outputs": [],
"source": [
"from getpass import getpass\n",
"from langchain.document_loaders.larksuite import LarkSuiteDocLoader\n",
"\n",
"DOMAIN = input(\"larksuite domain\")\n",
"ACCESS_TOKEN = getpass(\"larksuite tenant_access_token or user_access_token\")\n",
"DOCUMENT_ID = input(\"larksuite document id\")"
]
},
{
"cell_type": "code",
"execution_count": 3,
"id": "13deb0f5",
"metadata": {
"ExecuteTime": {
"end_time": "2023-06-19T10:05:36.016495Z",
"start_time": "2023-06-19T10:05:35.360884Z"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"[Document(page_content='Test Doc\\nThis is a Test Doc\\n\\n1\\n2\\n3\\n\\n', metadata={'document_id': 'V76kdbd2HoBbYJxdiNNccajunPf', 'revision_id': 11, 'title': 'Test Doc'})]\n"
]
}
],
"source": [
"from pprint import pprint\n",
"\n",
"larksuite_loader = LarkSuiteDocLoader(DOMAIN, ACCESS_TOKEN, DOCUMENT_ID)\n",
"docs = larksuite_loader.load()\n",
"\n",
"pprint(docs)"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "9ccc1e2f",
"metadata": {},
"outputs": [],
"source": [
"# see https://python.langchain.com/docs/use_cases/summarization for more details\n",
"from langchain.chains.summarize import load_summarize_chain\n",
"\n",
"chain = load_summarize_chain(llm, chain_type=\"map_reduce\")\n",
"chain.run(docs)"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.3"
}
},
"nbformat": 4,
"nbformat_minor": 5
}

View File

@@ -0,0 +1,88 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Org-mode\n",
"\n",
">A [Org Mode document](https://en.wikipedia.org/wiki/Org-mode) is a document editing, formatting, and organizing mode, designed for notes, planning, and authoring within the free software text editor Emacs."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## `UnstructuredOrgModeLoader`\n",
"\n",
"You can load data from Org-mode files with `UnstructuredOrgModeLoader` using the following workflow."
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [],
"source": [
"from langchain.document_loaders import UnstructuredOrgModeLoader"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [],
"source": [
"loader = UnstructuredOrgModeLoader(\n",
" file_path=\"example_data/README.org\", mode=\"elements\"\n",
")\n",
"docs = loader.load()"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"page_content='Example Docs' metadata={'source': 'example_data/README.org', 'filename': 'README.org', 'file_directory': 'example_data', 'filetype': 'text/org', 'page_number': 1, 'category': 'Title'}\n"
]
}
],
"source": [
"print(docs[0])"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.8.13"
}
},
"nbformat": 4,
"nbformat_minor": 4
}

View File

@@ -0,0 +1,419 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "213a38a2",
"metadata": {},
"source": [
"# Source Code\n",
"\n",
"This notebook covers how to load source code files using a special approach with language parsing: each top-level function and class in the code is loaded into separate documents. Any remaining code top-level code outside the already loaded functions and classes will be loaded into a seperate document.\n",
"\n",
"This approach can potentially improve the accuracy of QA models over source code. Currently, the supported languages for code parsing are Python and JavaScript. The language used for parsing can be configured, along with the minimum number of lines required to activate the splitting based on syntax."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "7fa47b2e",
"metadata": {},
"outputs": [],
"source": [
"! pip install esprima"
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "beb55c2f",
"metadata": {},
"outputs": [],
"source": [
"import warnings\n",
"warnings.filterwarnings('ignore')\n",
"from pprint import pprint\n",
"from langchain.text_splitter import Language\n",
"from langchain.document_loaders.generic import GenericLoader\n",
"from langchain.document_loaders.parsers import LanguageParser"
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "64056e07",
"metadata": {},
"outputs": [],
"source": [
"loader = GenericLoader.from_filesystem(\n",
" \"./example_data/source_code\",\n",
" glob=\"*\",\n",
" suffixes=[\".py\", \".js\"],\n",
" parser=LanguageParser()\n",
")\n",
"docs = loader.load()"
]
},
{
"cell_type": "code",
"execution_count": 3,
"id": "8af79bd7",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"6"
]
},
"execution_count": 3,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"len(docs)"
]
},
{
"cell_type": "code",
"execution_count": 4,
"id": "85edf3fc",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"{'content_type': 'functions_classes',\n",
" 'language': <Language.PYTHON: 'python'>,\n",
" 'source': 'example_data/source_code/example.py'}\n",
"{'content_type': 'functions_classes',\n",
" 'language': <Language.PYTHON: 'python'>,\n",
" 'source': 'example_data/source_code/example.py'}\n",
"{'content_type': 'simplified_code',\n",
" 'language': <Language.PYTHON: 'python'>,\n",
" 'source': 'example_data/source_code/example.py'}\n",
"{'content_type': 'functions_classes',\n",
" 'language': <Language.JS: 'js'>,\n",
" 'source': 'example_data/source_code/example.js'}\n",
"{'content_type': 'functions_classes',\n",
" 'language': <Language.JS: 'js'>,\n",
" 'source': 'example_data/source_code/example.js'}\n",
"{'content_type': 'simplified_code',\n",
" 'language': <Language.JS: 'js'>,\n",
" 'source': 'example_data/source_code/example.js'}\n"
]
}
],
"source": [
"for document in docs:\n",
" pprint(document.metadata)"
]
},
{
"cell_type": "code",
"execution_count": 7,
"id": "f44e3e37",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"class MyClass:\n",
" def __init__(self, name):\n",
" self.name = name\n",
"\n",
" def greet(self):\n",
" print(f\"Hello, {self.name}!\")\n",
"\n",
"--8<--\n",
"\n",
"def main():\n",
" name = input(\"Enter your name: \")\n",
" obj = MyClass(name)\n",
" obj.greet()\n",
"\n",
"--8<--\n",
"\n",
"# Code for: class MyClass:\n",
"\n",
"\n",
"# Code for: def main():\n",
"\n",
"\n",
"if __name__ == \"__main__\":\n",
" main()\n",
"\n",
"--8<--\n",
"\n",
"class MyClass {\n",
" constructor(name) {\n",
" this.name = name;\n",
" }\n",
"\n",
" greet() {\n",
" console.log(`Hello, ${this.name}!`);\n",
" }\n",
"}\n",
"\n",
"--8<--\n",
"\n",
"function main() {\n",
" const name = prompt(\"Enter your name:\");\n",
" const obj = new MyClass(name);\n",
" obj.greet();\n",
"}\n",
"\n",
"--8<--\n",
"\n",
"// Code for: class MyClass {\n",
"\n",
"// Code for: function main() {\n",
"\n",
"main();\n"
]
}
],
"source": [
"print(\"\\n\\n--8<--\\n\\n\".join([document.page_content for document in docs]))"
]
},
{
"cell_type": "markdown",
"id": "69aad0ed",
"metadata": {},
"source": [
"The parser can be disabled for small files. \n",
"\n",
"The parameter `parser_threshold` indicates the minimum number of lines that the source code file must have to be segmented using the parser."
]
},
{
"cell_type": "code",
"execution_count": 8,
"id": "ae024794",
"metadata": {},
"outputs": [],
"source": [
"loader = GenericLoader.from_filesystem(\n",
" \"./example_data/source_code\",\n",
" glob=\"*\",\n",
" suffixes=[\".py\"],\n",
" parser=LanguageParser(language=Language.PYTHON, parser_threshold=1000)\n",
")\n",
"docs = loader.load()"
]
},
{
"cell_type": "code",
"execution_count": 9,
"id": "5d3b372a",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"1"
]
},
"execution_count": 9,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"len(docs)"
]
},
{
"cell_type": "code",
"execution_count": 10,
"id": "89e546ad",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"class MyClass:\n",
" def __init__(self, name):\n",
" self.name = name\n",
"\n",
" def greet(self):\n",
" print(f\"Hello, {self.name}!\")\n",
"\n",
"\n",
"def main():\n",
" name = input(\"Enter your name: \")\n",
" obj = MyClass(name)\n",
" obj.greet()\n",
"\n",
"\n",
"if __name__ == \"__main__\":\n",
" main()\n",
"\n"
]
}
],
"source": [
"print(docs[0].page_content)"
]
},
{
"cell_type": "markdown",
"id": "c9c71e61",
"metadata": {},
"source": [
"## Splitting\n",
"\n",
"Additional splitting could be needed for those functions, classes, or scripts that are too big."
]
},
{
"cell_type": "code",
"execution_count": 11,
"id": "adbaa79f",
"metadata": {},
"outputs": [],
"source": [
"loader = GenericLoader.from_filesystem(\n",
" \"./example_data/source_code\",\n",
" glob=\"*\",\n",
" suffixes=[\".js\"],\n",
" parser=LanguageParser(language=Language.JS)\n",
")\n",
"docs = loader.load()"
]
},
{
"cell_type": "code",
"execution_count": 12,
"id": "c44c0d3f",
"metadata": {},
"outputs": [],
"source": [
"from langchain.text_splitter import (\n",
" RecursiveCharacterTextSplitter,\n",
" Language,\n",
")"
]
},
{
"cell_type": "code",
"execution_count": 13,
"id": "b1e0053d",
"metadata": {},
"outputs": [],
"source": [
"js_splitter = RecursiveCharacterTextSplitter.from_language(\n",
" language=Language.JS, chunk_size=60, chunk_overlap=0\n",
")"
]
},
{
"cell_type": "code",
"execution_count": 14,
"id": "7dbe6188",
"metadata": {},
"outputs": [],
"source": [
"result = js_splitter.split_documents(docs)"
]
},
{
"cell_type": "code",
"execution_count": 15,
"id": "8a80d089",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"7"
]
},
"execution_count": 15,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"len(result)"
]
},
{
"cell_type": "code",
"execution_count": 16,
"id": "000a6011",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"class MyClass {\n",
" constructor(name) {\n",
" this.name = name;\n",
"\n",
"--8<--\n",
"\n",
"}\n",
"\n",
"--8<--\n",
"\n",
"greet() {\n",
" console.log(`Hello, ${this.name}!`);\n",
" }\n",
"}\n",
"\n",
"--8<--\n",
"\n",
"function main() {\n",
" const name = prompt(\"Enter your name:\");\n",
"\n",
"--8<--\n",
"\n",
"const obj = new MyClass(name);\n",
" obj.greet();\n",
"}\n",
"\n",
"--8<--\n",
"\n",
"// Code for: class MyClass {\n",
"\n",
"// Code for: function main() {\n",
"\n",
"--8<--\n",
"\n",
"main();\n"
]
}
],
"source": [
"print(\"\\n\\n--8<--\\n\\n\".join([document.page_content for document in result]))"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.16"
}
},
"nbformat": 4,
"nbformat_minor": 5
}

View File

@@ -0,0 +1,116 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "a634365e",
"metadata": {},
"source": [
"# Tencent COS Directory\n",
"\n",
"This covers how to load document objects from a `Tencent COS Directory`."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "85e97267",
"metadata": {},
"outputs": [],
"source": [
"#! pip install cos-python-sdk-v5"
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "2f0cd6a5",
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"from langchain.document_loaders import TencentCOSDirectoryLoader\n",
"from qcloud_cos import CosConfig"
]
},
{
"cell_type": "code",
"execution_count": 3,
"id": "321cc7f1",
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"conf = CosConfig(\n",
" Region=\"your cos region\",\n",
" SecretId=\"your cos secret_id\",\n",
" SecretKey=\"your cos secret_key\",\n",
" )\n",
"loader = TencentCOSDirectoryLoader(conf=conf, bucket=\"you_cos_bucket\")"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "4c50d2c7",
"metadata": {},
"outputs": [],
"source": [
"loader.load()"
]
},
{
"cell_type": "markdown",
"id": "0690c40a",
"metadata": {},
"source": [
"## Specifying a prefix\n",
"You can also specify a prefix for more finegrained control over what files to load."
]
},
{
"cell_type": "code",
"execution_count": 5,
"id": "72d44781",
"metadata": {},
"outputs": [],
"source": [
"loader = TencentCOSDirectoryLoader(conf=conf, bucket=\"you_cos_bucket\", prefix=\"fake\")"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "2d3c32db",
"metadata": {
"scrolled": true
},
"outputs": [],
"source": [
"loader.load()"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.6"
}
},
"nbformat": 4,
"nbformat_minor": 5
}

View File

@@ -0,0 +1,91 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "a634365e",
"metadata": {},
"source": [
"# Tencent COS File\n",
"\n",
"This covers how to load document object from a `Tencent COS File`."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "85e97267",
"metadata": {},
"outputs": [],
"source": [
"#! pip install cos-python-sdk-v5"
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "2f0cd6a5",
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"from langchain.document_loaders import TencentCOSFileLoader\n",
"from qcloud_cos import CosConfig"
]
},
{
"cell_type": "code",
"execution_count": 3,
"id": "321cc7f1",
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"conf = CosConfig(\n",
" Region=\"your cos region\",\n",
" SecretId=\"your cos secret_id\",\n",
" SecretKey=\"your cos secret_key\",\n",
" )\n",
"loader = TencentCOSFileLoader(conf=conf, bucket=\"you_cos_bucket\", key=\"fake.docx\")"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "4c50d2c7",
"metadata": {},
"outputs": [],
"source": [
"loader.load()"
]
},
{
"cell_type": "markdown",
"id": "0690c40a",
"metadata": {},
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.6"
}
},
"nbformat": 4,
"nbformat_minor": 5
}

View File

@@ -226,7 +226,6 @@
]
},
{
"attachments": {},
"cell_type": "markdown",
"id": "8de9ef16",
"metadata": {},
@@ -303,7 +302,7 @@
"source": [
"## Unstructured API\n",
"\n",
"If you want to get up and running with less set up, you can simply run `pip install unstructured` and use `UnstructuredAPIFileLoader` or `UnstructuredAPIFileIOLoader`. That will process your document using the hosted Unstructured API. Note that currently (as of 11 May 2023) the Unstructured API is open, but it will soon require an API. The [Unstructured documentation](https://unstructured-io.github.io/) page will have instructions on how to generate an API key once theyre available. Check out the instructions [here](https://github.com/Unstructured-IO/unstructured-api#dizzy-instructions-for-using-the-docker-image) if youd like to self-host the Unstructured API or run it locally."
"If you want to get up and running with less set up, you can simply run `pip install unstructured` and use `UnstructuredAPIFileLoader` or `UnstructuredAPIFileIOLoader`. That will process your document using the hosted Unstructured API. You can generate a free Unstructured API key [here](https://www.unstructured.io/api-key/). The [Unstructured documentation](https://unstructured-io.github.io/) page will have instructions on how to generate an API key once theyre available. Check out the instructions [here](https://github.com/Unstructured-IO/unstructured-api#dizzy-instructions-for-using-the-docker-image) if youd like to self-host the Unstructured API or run it locally."
]
},
{

View File

@@ -224,13 +224,33 @@
"docs"
]
},
{
"cell_type": "markdown",
"source": [
"## Using proxies\n",
"\n",
"Sometimes you might need to use proxies to get around IP blocks. You can pass in a dictionary of proxies to the loader (and `requests` underneath) to use them."
],
"metadata": {
"collapsed": false
}
},
{
"cell_type": "code",
"execution_count": null,
"id": "1dd8ab23",
"metadata": {},
"outputs": [],
"source": []
"source": [
"loader = WebBaseLoader(\n",
" \"https://www.walmart.com/search?q=parrots\", proxies={\n",
" \"http\": \"http://{username}:{password}:@proxy.service.com:6666/\",\n",
" \"https\": \"https://{username}:{password}:@proxy.service.com:6666/\"\n",
" }\n",
")\n",
"docs = loader.load()\n"
],
"metadata": {
"collapsed": false
}
}
],
"metadata": {

View File

@@ -0,0 +1,214 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "8cc82b48",
"metadata": {},
"source": [
"# MultiQueryRetriever\n",
"\n",
"Distance-based vector database retrieval embeds (represents) queries in high-dimensional space and finds similar embedded documents based on \"distance\". But, retrieval may produce difference results with subtle changes in query wording or if the embeddings do not capture the semantics of the data well. Prompt engineering / tuning is sometimes done to manually address these problems, but can be tedious.\n",
"\n",
"The `MultiQueryRetriever` automates the process of prompt tuning by using an LLM to generate multiple queries from different perspectives for a given user input query. For each query, it retrieves a set of relevant documents and takes the unique union across all queries to get a larger set of potentially relevant documents. By generating multiple perspectives on the same question, the `MultiQueryRetriever` might be able to overcome some of the limitations of the distance-based retrieval and get a richer set of results."
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "c2f3f5f2",
"metadata": {},
"outputs": [],
"source": [
"# Build a sample vectorDB\n",
"from langchain.vectorstores import Chroma\n",
"from langchain.document_loaders import PyPDFLoader\n",
"from langchain.embeddings.openai import OpenAIEmbeddings\n",
"from langchain.text_splitter import RecursiveCharacterTextSplitter\n",
"\n",
"# Load PDF\n",
"path=\"path-to-files\"\n",
"loaders = [\n",
" PyPDFLoader(path+\"docs/cs229_lectures/MachineLearning-Lecture01.pdf\"),\n",
" PyPDFLoader(path+\"docs/cs229_lectures/MachineLearning-Lecture02.pdf\"),\n",
" PyPDFLoader(path+\"docs/cs229_lectures/MachineLearning-Lecture03.pdf\")\n",
"]\n",
"docs = []\n",
"for loader in loaders:\n",
" docs.extend(loader.load())\n",
" \n",
"# Split\n",
"text_splitter = RecursiveCharacterTextSplitter(chunk_size = 1500,chunk_overlap = 150)\n",
"splits = text_splitter.split_documents(docs)\n",
"\n",
"# VectorDB\n",
"embedding = OpenAIEmbeddings()\n",
"vectordb = Chroma.from_documents(documents=splits,embedding=embedding)"
]
},
{
"cell_type": "markdown",
"id": "cca8f56c",
"metadata": {},
"source": [
"`Simple usage`\n",
"\n",
"Specify the LLM to use for query generation, and the retriver will do the rest."
]
},
{
"cell_type": "code",
"execution_count": 3,
"id": "edbca101",
"metadata": {},
"outputs": [],
"source": [
"from langchain.chat_models import ChatOpenAI\n",
"from langchain.retrievers.multi_query import MultiQueryRetriever\n",
"question=\"What does the course say about regression?\"\n",
"num_queries=3\n",
"llm = ChatOpenAI(temperature=0)\n",
"retriever_from_llm = MultiQueryRetriever.from_llm(retriever=vectordb.as_retriever(),llm=llm)"
]
},
{
"cell_type": "code",
"execution_count": 4,
"id": "e5203612",
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"INFO:root:Generated queries: [\"1. What is the course's perspective on regression?\", '2. How does the course discuss regression?', '3. What information does the course provide about regression?']\n"
]
},
{
"data": {
"text/plain": [
"6"
]
},
"execution_count": 4,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"unique_docs = retriever_from_llm.get_relevant_documents(question=\"What does the course say about regression?\")\n",
"len(unique_docs)"
]
},
{
"cell_type": "markdown",
"id": "c54a282f",
"metadata": {},
"source": [
"`Supplying your own prompt`\n",
"\n",
"You can also supply a prompt along with an output parser to split the results into a list of queries."
]
},
{
"cell_type": "code",
"execution_count": 5,
"id": "d9afb0ca",
"metadata": {},
"outputs": [],
"source": [
"from typing import List\n",
"from langchain import LLMChain\n",
"from pydantic import BaseModel, Field\n",
"from langchain.prompts import PromptTemplate\n",
"from langchain.output_parsers import PydanticOutputParser\n",
"\n",
"# Output parser will split the LLM result into a list of queries\n",
"class LineList(BaseModel):\n",
" # \"lines\" is the key (attribute name) of the parsed output\n",
" lines: List[str] = Field(description=\"Lines of text\")\n",
"\n",
"class LineListOutputParser(PydanticOutputParser):\n",
" def __init__(self) -> None:\n",
" super().__init__(pydantic_object=LineList)\n",
" def parse(self, text: str) -> LineList:\n",
" lines = text.strip().split(\"\\n\")\n",
" return LineList(lines=lines)\n",
"\n",
"output_parser = LineListOutputParser()\n",
" \n",
"QUERY_PROMPT = PromptTemplate(\n",
" input_variables=[\"question\"],\n",
" template=\"\"\"You are an AI language model assistant. Your task is to generate five \n",
" different versions of the given user question to retrieve relevant documents from a vector \n",
" database. By generating multiple perspectives on the user question, your goal is to help\n",
" the user overcome some of the limitations of the distance-based similarity search. \n",
" Provide these alternative questions seperated by newlines.\n",
" Original question: {question}\"\"\",\n",
")\n",
"llm = ChatOpenAI(temperature=0)\n",
"\n",
"# Chain\n",
"llm_chain = LLMChain(llm=llm,prompt=QUERY_PROMPT,output_parser=output_parser)\n",
" \n",
"# Other inputs\n",
"question=\"What does the course say about regression?\""
]
},
{
"cell_type": "code",
"execution_count": 6,
"id": "6660d7ee",
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"INFO:root:Generated queries: [\"1. What is the course's perspective on regression?\", '2. Can you provide information on regression as discussed in the course?', '3. How does the course cover the topic of regression?', \"4. What are the course's teachings on regression?\", '5. In relation to the course, what is mentioned about regression?']\n"
]
},
{
"data": {
"text/plain": [
"8"
]
},
"execution_count": 6,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Run\n",
"retriever = MultiQueryRetriever(retriever=vectordb.as_retriever(), \n",
" llm_chain=llm_chain,\n",
" parser_key=\"lines\") # \"lines\" is the key (attribute name) of the parsed output\n",
"\n",
"# Results\n",
"unique_docs = retriever.get_relevant_documents(question=\"What does the course say about regression?\")\n",
"len(unique_docs)"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.16"
}
},
"nbformat": 4,
"nbformat_minor": 5
}

View File

@@ -9,7 +9,7 @@ If you are just getting started, and you have relatively simple apis, you should
Chains are a sequence of predetermined steps, so they are good to get started with as they give you more control and let you
understand what is happening better.
- [API Chain](/docs/modules/chains/how_to/api.html)
- [API Chain](/docs/modules/chains/popular/api.html)
## Agents

View File

@@ -29,6 +29,23 @@ class ZapierToolkit(BaseToolkit):
        ]
        return cls(tools=tools)

    @classmethod
    async def async_from_zapier_nla_wrapper(
        cls, zapier_nla_wrapper: ZapierNLAWrapper
    ) -> "ZapierToolkit":
        """Create a toolkit from a ZapierNLAWrapper."""
        actions = await zapier_nla_wrapper.alist()
        tools = [
            ZapierNLARunAction(
                action_id=action["id"],
                zapier_description=action["description"],
                params_schema=action["params"],
                api_wrapper=zapier_nla_wrapper,
            )
            for action in actions
        ]
        return cls(tools=tools)

    def get_tools(self) -> List[BaseTool]:
        """Get the tools in the toolkit."""
        return self.tools

View File

@@ -96,7 +96,7 @@ def get_openai_token_cost_for_model(
f"Unknown model: {model_name}. Please provide a valid OpenAI model name."
"Known models are: " + ", ".join(MODEL_COST_PER_1K_TOKENS.keys())
)
return MODEL_COST_PER_1K_TOKENS[model_name] * num_tokens / 1000
return MODEL_COST_PER_1K_TOKENS[model_name] * (num_tokens / 1000)
class OpenAICallbackHandler(BaseCallbackHandler):

View File

@@ -0,0 +1,88 @@
from __future__ import annotations

from typing import Any, Dict, List, Optional

from langchain.callbacks.streaming_aiter import AsyncIteratorCallbackHandler
from langchain.schema import LLMResult

DEFAULT_ANSWER_PREFIX_TOKENS = ["Final", "Answer", ":"]


class AsyncFinalIteratorCallbackHandler(AsyncIteratorCallbackHandler):
    """Callback handler that returns an async iterator.
    Only the final output of the agent will be iterated.
    """

    def append_to_last_tokens(self, token: str) -> None:
        self.last_tokens.append(token)
        self.last_tokens_stripped.append(token.strip())
        if len(self.last_tokens) > len(self.answer_prefix_tokens):
            self.last_tokens.pop(0)
            self.last_tokens_stripped.pop(0)

    def check_if_answer_reached(self) -> bool:
        if self.strip_tokens:
            return self.last_tokens_stripped == self.answer_prefix_tokens_stripped
        else:
            return self.last_tokens == self.answer_prefix_tokens

    def __init__(
        self,
        *,
        answer_prefix_tokens: Optional[List[str]] = None,
        strip_tokens: bool = True,
        stream_prefix: bool = False,
    ) -> None:
        """Instantiate AsyncFinalIteratorCallbackHandler.

        Args:
            answer_prefix_tokens: Token sequence that prefixes the answer.
                Default is ["Final", "Answer", ":"]
            strip_tokens: Ignore white spaces and new lines when comparing
                answer_prefix_tokens to last tokens? (to determine if answer has been
                reached)
            stream_prefix: Should answer prefix itself also be streamed?
        """
        super().__init__()
        if answer_prefix_tokens is None:
            self.answer_prefix_tokens = DEFAULT_ANSWER_PREFIX_TOKENS
        else:
            self.answer_prefix_tokens = answer_prefix_tokens
        if strip_tokens:
            self.answer_prefix_tokens_stripped = [
                token.strip() for token in self.answer_prefix_tokens
            ]
        else:
            self.answer_prefix_tokens_stripped = self.answer_prefix_tokens
        self.last_tokens = [""] * len(self.answer_prefix_tokens)
        self.last_tokens_stripped = [""] * len(self.answer_prefix_tokens)
        self.strip_tokens = strip_tokens
        self.stream_prefix = stream_prefix
        self.answer_reached = False

    async def on_llm_start(
        self, serialized: Dict[str, Any], prompts: List[str], **kwargs: Any
    ) -> None:
        # If two calls are made in a row, this resets the state
        self.done.clear()
        self.answer_reached = False

    async def on_llm_end(self, response: LLMResult, **kwargs: Any) -> None:
        if self.answer_reached:
            self.done.set()

    async def on_llm_new_token(self, token: str, **kwargs: Any) -> None:
        # Remember the last n tokens, where n = len(answer_prefix_tokens)
        self.append_to_last_tokens(token)

        # Check if the last n tokens match the answer_prefix_tokens list ...
        if self.check_if_answer_reached():
            self.answer_reached = True
            if self.stream_prefix:
                for t in self.last_tokens:
                    self.queue.put_nowait(t)
            return

        # If yes, then put tokens from now on
        if self.answer_reached:
            self.queue.put_nowait(token)

View File

@@ -5,6 +5,7 @@ from uuid import UUID
from langchainplus_sdk import LangChainPlusClient, RunEvaluator
from langchain.callbacks.manager import tracing_v2_enabled
from langchain.callbacks.tracers.base import BaseTracer
from langchain.callbacks.tracers.schemas import Run
@@ -47,6 +48,7 @@ class EvaluatorCallbackHandler(BaseTracer):
        max_workers: Optional[int] = None,
        client: Optional[LangChainPlusClient] = None,
        example_id: Optional[Union[UUID, str]] = None,
        project_name: Optional[str] = None,
        **kwargs: Any
    ) -> None:
        super().__init__(**kwargs)
@@ -59,6 +61,23 @@ class EvaluatorCallbackHandler(BaseTracer):
            max_workers=max(max_workers or len(evaluators), 1)
        )
        self.futures: Set[Future] = set()
        self.project_name = project_name

    def _evaluate_in_project(self, run: Run, evaluator: RunEvaluator) -> None:
        """Evaluate the run in the project.

        Parameters
        ----------
        run : Run
            The run to be evaluated.
        evaluator : RunEvaluator
            The evaluator to use for evaluating the run.
        """
        if self.project_name is None:
            return self.client.evaluate_run(run, evaluator)
        with tracing_v2_enabled(project_name=self.project_name):
            return self.client.evaluate_run(run, evaluator)

    def _persist_run(self, run: Run) -> None:
        """Run the evaluator on the run.
@@ -73,7 +92,7 @@ class EvaluatorCallbackHandler(BaseTracer):
        run_.reference_example_id = self.example_id
        for evaluator in self.evaluators:
            self.futures.add(
                self.executor.submit(self.client.evaluate_run, run_, evaluator)
                self.executor.submit(self._evaluate_in_project, run_, evaluator)
            )

    def wait_for_futures(self) -> None:

View File

@@ -157,7 +157,13 @@ def openapi_spec_to_openai_fn(
"url": api_op.base_url + api_op.path,
}
def default_call_api(name: str, fn_args: dict, **kwargs: Any) -> Any:
def default_call_api(
name: str,
fn_args: dict,
headers: Optional[dict] = None,
params: Optional[dict] = None,
**kwargs: Any,
) -> Any:
method = _name_to_call_map[name]["method"]
url = _name_to_call_map[name]["url"]
path_params = fn_args.pop("path_params", {})
@@ -165,6 +171,16 @@ def openapi_spec_to_openai_fn(
if "data" in fn_args and isinstance(fn_args["data"], dict):
fn_args["data"] = json.dumps(fn_args["data"])
_kwargs = {**fn_args, **kwargs}
if headers is not None:
if "headers" in _kwargs:
_kwargs["headers"].update(headers)
else:
_kwargs["headers"] = headers
if params is not None:
if "params" in _kwargs:
_kwargs["params"].update(params)
else:
_kwargs["params"] = params
return requests.request(method, url, **_kwargs)
return functions, default_call_api
@@ -218,6 +234,8 @@ def get_openapi_chain(
request_chain: Optional[Chain] = None,
llm_kwargs: Optional[Dict] = None,
verbose: bool = False,
headers: Optional[Dict] = None,
params: Optional[Dict] = None,
**kwargs: Any,
) -> SequentialChain:
"""Create a chain for querying an API from a OpenAPI spec.
@@ -259,7 +277,10 @@ def get_openapi_chain(
**(llm_kwargs or {}),
)
request_chain = request_chain or SimpleRequestChain(
request_method=call_api_fn, verbose=verbose
request_method=lambda name, args: call_api_fn(
name, args, headers=headers, params=params
),
verbose=verbose,
)
return SequentialChain(
chains=[llm_chain, request_chain],
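
A hedged sketch of the new headers/params plumbing above; the spec URL and token are placeholders, not values from this changeset:

from langchain.chains.openai_functions.openapi import get_openapi_chain

chain = get_openapi_chain(
    "https://example.com/openapi.json",               # placeholder spec URL
    headers={"Authorization": "Bearer <API_TOKEN>"},  # merged into every request
    params={"locale": "en-US"},                       # likewise merged into request kwargs
)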

View File

@@ -296,12 +296,14 @@ async def _callbacks_initializer(
project_name: Optional[str],
client: LangChainPlusClient,
run_evaluators: Sequence[RunEvaluator],
evaluation_handler_collector: List[EvaluatorCallbackHandler],
) -> List[BaseTracer]:
"""
Initialize a tracer to share across tasks.
Args:
project_name: The project name for the tracer.
client: The client to use for the tracer.
Returns:
A LangChainTracer instance with an active project.
@@ -309,15 +311,17 @@ async def _callbacks_initializer(
callbacks: List[BaseTracer] = []
if project_name:
callbacks.append(LangChainTracer(project_name=project_name))
evaluator_project_name = f"{project_name}-evaluators" if project_name else None
if run_evaluators:
callbacks.append(
EvaluatorCallbackHandler(
client=client,
evaluators=run_evaluators,
# We already have concurrency, don't want to overload the machine
max_workers=1,
)
callback = EvaluatorCallbackHandler(
client=client,
evaluators=run_evaluators,
# We already have concurrency, don't want to overload the machine
max_workers=1,
project_name=evaluator_project_name,
)
callbacks.append(callback)
evaluation_handler_collector.append(callback)
return callbacks
@@ -362,9 +366,6 @@ async def arun_on_examples(
client_.create_project(project_name, mode="eval")
results: Dict[str, List[Any]] = {}
evaluation_handler = EvaluatorCallbackHandler(
evaluators=run_evaluators or [], client=client_
)
async def process_example(
example: Example, callbacks: List[BaseCallbackHandler], job_state: dict
@@ -386,17 +387,20 @@ async def arun_on_examples(
flush=True,
)
evaluation_handlers: List[EvaluatorCallbackHandler] = []
await _gather_with_concurrency(
concurrency_level,
functools.partial(
_callbacks_initializer,
project_name=project_name,
client=client_,
evaluation_handler_collector=evaluation_handlers,
run_evaluators=run_evaluators or [],
),
*(functools.partial(process_example, e) for e in examples),
)
evaluation_handler.wait_for_futures()
for handler in evaluation_handlers:
handler.wait_for_futures()
return results
@@ -537,8 +541,11 @@ def run_on_examples(
client_ = client or LangChainPlusClient()
client_.create_project(project_name, mode="eval")
tracer = LangChainTracer(project_name=project_name)
evaluator_project_name = f"{project_name}-evaluators"
evalution_handler = EvaluatorCallbackHandler(
evaluators=run_evaluators or [], client=client_
evaluators=run_evaluators or [],
client=client_,
project_name=evaluator_project_name,
)
callbacks: List[BaseCallbackHandler] = [tracer, evalution_handler]
for i, example in enumerate(examples):

View File

@@ -63,6 +63,7 @@ from langchain.document_loaders.imsdb import IMSDbLoader
from langchain.document_loaders.iugu import IuguLoader
from langchain.document_loaders.joplin import JoplinLoader
from langchain.document_loaders.json_loader import JSONLoader
from langchain.document_loaders.larksuite import LarkSuiteDocLoader
from langchain.document_loaders.markdown import UnstructuredMarkdownLoader
from langchain.document_loaders.mastodon import MastodonTootsLoader
from langchain.document_loaders.max_compute import MaxComputeLoader
@@ -78,6 +79,7 @@ from langchain.document_loaders.odt import UnstructuredODTLoader
from langchain.document_loaders.onedrive import OneDriveLoader
from langchain.document_loaders.onedrive_file import OneDriveFileLoader
from langchain.document_loaders.open_city_data import OpenCityDataLoader
from langchain.document_loaders.org_mode import UnstructuredOrgModeLoader
from langchain.document_loaders.pdf import (
MathpixPDFLoader,
OnlinePDFLoader,
@@ -112,6 +114,8 @@ from langchain.document_loaders.telegram import (
TelegramChatApiLoader,
TelegramChatFileLoader,
)
from langchain.document_loaders.tencent_cos_directory import TencentCOSDirectoryLoader
from langchain.document_loaders.tencent_cos_file import TencentCOSFileLoader
from langchain.document_loaders.text import TextLoader
from langchain.document_loaders.tomarkdown import ToMarkdownLoader
from langchain.document_loaders.toml import TomlLoader
@@ -201,6 +205,7 @@ __all__ = [
"IuguLoader",
"JSONLoader",
"JoplinLoader",
"LarkSuiteDocLoader",
"MWDumpLoader",
"MastodonTootsLoader",
"MathpixPDFLoader",
@@ -242,6 +247,8 @@ __all__ = [
"SnowflakeLoader",
"SpreedlyLoader",
"StripeLoader",
"TencentCOSDirectoryLoader",
"TencentCOSFileLoader",
"TelegramChatApiLoader",
"TelegramChatFileLoader",
"TelegramChatLoader",
@@ -262,6 +269,7 @@ __all__ = [
"UnstructuredImageLoader",
"UnstructuredMarkdownLoader",
"UnstructuredODTLoader",
"UnstructuredOrgModeLoader",
"UnstructuredPDFLoader",
"UnstructuredPowerPointLoader",
"UnstructuredRSTLoader",

View File

@@ -0,0 +1,46 @@
"""Loader that loads LarkSuite (FeiShu) document json dump."""
import json
import urllib.request
from typing import Any, Iterator, List
from langchain.docstore.document import Document
from langchain.document_loaders.base import BaseLoader
class LarkSuiteDocLoader(BaseLoader):
"""Loader that loads LarkSuite (FeiShu) document."""
def __init__(self, domain: str, access_token: str, document_id: str):
"""Initialize with domain, access_token (tenant / user), and document_id."""
self.domain = domain
self.access_token = access_token
self.document_id = document_id
def _get_larksuite_api_json_data(self, api_url: str) -> Any:
"""Get LarkSuite (FeiShu) API response json data."""
headers = {"Authorization": f"Bearer {self.access_token}"}
request = urllib.request.Request(api_url, headers=headers)
with urllib.request.urlopen(request) as response:
json_data = json.loads(response.read().decode())
return json_data
def lazy_load(self) -> Iterator[Document]:
"""Lazy load LarkSuite (FeiShu) document."""
api_url_prefix = f"{self.domain}/open-apis/docx/v1/documents"
metadata_json = self._get_larksuite_api_json_data(
f"{api_url_prefix}/{self.document_id}"
)
raw_content_json = self._get_larksuite_api_json_data(
f"{api_url_prefix}/{self.document_id}/raw_content"
)
text = raw_content_json["data"]["content"]
metadata = {
"document_id": self.document_id,
"revision_id": metadata_json["data"]["document"]["revision_id"],
"title": metadata_json["data"]["document"]["title"],
}
yield Document(page_content=text, metadata=metadata)
def load(self) -> List[Document]:
"""Load LarkSuite (FeiShu) document."""
return list(self.lazy_load())
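
Usage sketch for the loader above (not part of the diff); domain, token, and document id are placeholders:

from langchain.document_loaders.larksuite import LarkSuiteDocLoader

loader = LarkSuiteDocLoader(
    domain="https://open.larksuite.com",    # your LarkSuite API domain
    access_token="<tenant-or-user-token>",
    document_id="<document-id>",
)
docs = loader.load()  # a single Document with title and revision_id metadata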

View File

@@ -0,0 +1,22 @@
"""Loader that loads Org-Mode files."""
from typing import Any, List
from langchain.document_loaders.unstructured import (
UnstructuredFileLoader,
validate_unstructured_version,
)
class UnstructuredOrgModeLoader(UnstructuredFileLoader):
"""Loader that uses unstructured to load Org-Mode files."""
def __init__(
self, file_path: str, mode: str = "single", **unstructured_kwargs: Any
):
validate_unstructured_version(min_unstructured_version="0.7.9")
super().__init__(file_path=file_path, mode=mode, **unstructured_kwargs)
def _get_elements(self) -> List:
from unstructured.partition.org import partition_org
return partition_org(filename=self.file_path, **self.unstructured_kwargs)
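
Usage sketch (not part of the diff), assuming unstructured>=0.7.9 is installed and an Org-Mode file exists at the given path:

from langchain.document_loaders import UnstructuredOrgModeLoader

loader = UnstructuredOrgModeLoader("README.org", mode="elements")
docs = loader.load()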

View File

@@ -1,5 +1,6 @@
from langchain.document_loaders.parsers.audio import OpenAIWhisperParser
from langchain.document_loaders.parsers.html import BS4HTMLParser
from langchain.document_loaders.parsers.language import LanguageParser
from langchain.document_loaders.parsers.pdf import (
PDFMinerParser,
PDFPlumberParser,
@@ -10,6 +11,7 @@ from langchain.document_loaders.parsers.pdf import (
__all__ = [
"BS4HTMLParser",
"LanguageParser",
"OpenAIWhisperParser",
"PDFMinerParser",
"PDFPlumberParser",

View File

@@ -0,0 +1,3 @@
from langchain.document_loaders.parsers.language.language_parser import LanguageParser
__all__ = ["LanguageParser"]

View File

@@ -0,0 +1,18 @@
from abc import ABC, abstractmethod
from typing import List
class CodeSegmenter(ABC):
def __init__(self, code: str):
self.code = code
def is_valid(self) -> bool:
return True
@abstractmethod
def simplify_code(self) -> str:
raise NotImplementedError # pragma: no cover
@abstractmethod
def extract_functions_classes(self) -> List[str]:
raise NotImplementedError # pragma: no cover

View File

@@ -0,0 +1,65 @@
from typing import Any, List
from langchain.document_loaders.parsers.language.code_segmenter import CodeSegmenter
class JavaScriptSegmenter(CodeSegmenter):
def __init__(self, code: str):
super().__init__(code)
self.source_lines = self.code.splitlines()
try:
import esprima # noqa: F401
except ImportError:
raise ImportError(
"Could not import esprima Python package. "
"Please install it with `pip install esprima`."
)
def is_valid(self) -> bool:
import esprima
try:
esprima.parseScript(self.code)
return True
except esprima.Error:
return False
def _extract_code(self, node: Any) -> str:
start = node.loc.start.line - 1
end = node.loc.end.line
return "\n".join(self.source_lines[start:end])
def extract_functions_classes(self) -> List[str]:
import esprima
tree = esprima.parseScript(self.code, loc=True)
functions_classes = []
for node in tree.body:
if isinstance(
node,
(esprima.nodes.FunctionDeclaration, esprima.nodes.ClassDeclaration),
):
functions_classes.append(self._extract_code(node))
return functions_classes
def simplify_code(self) -> str:
import esprima
tree = esprima.parseScript(self.code, loc=True)
simplified_lines = self.source_lines[:]
for node in tree.body:
if isinstance(
node,
(esprima.nodes.FunctionDeclaration, esprima.nodes.ClassDeclaration),
):
start = node.loc.start.line - 1
simplified_lines[start] = f"// Code for: {simplified_lines[start]}"
for line_num in range(start + 1, node.loc.end.line):
simplified_lines[line_num] = None # type: ignore
return "\n".join(line for line in simplified_lines if line is not None)

View File

@@ -0,0 +1,143 @@
from typing import Any, Dict, Iterator, Optional
from langchain.docstore.document import Document
from langchain.document_loaders.base import BaseBlobParser
from langchain.document_loaders.blob_loaders import Blob
from langchain.document_loaders.parsers.language.javascript import JavaScriptSegmenter
from langchain.document_loaders.parsers.language.python import PythonSegmenter
from langchain.text_splitter import Language
LANGUAGE_EXTENSIONS: Dict[str, str] = {
"py": Language.PYTHON,
"js": Language.JS,
}
LANGUAGE_SEGMENTERS: Dict[str, Any] = {
Language.PYTHON: PythonSegmenter,
Language.JS: JavaScriptSegmenter,
}
class LanguageParser(BaseBlobParser):
"""
Language parser that splits code using the respective language syntax.
Each top-level function and class in the code is loaded into separate documents.
Furthermore, an extra document is generated, containing the remaining top-level code
that excludes the already segmented functions and classes.
This approach can potentially improve the accuracy of QA models over source code.
Currently, the supported languages for code parsing are Python and JavaScript.
The language used for parsing can be configured, along with the minimum number of
lines required to activate the splitting based on syntax.
Examples:
.. code-block:: python
from langchain.text_splitter import Language
from langchain.document_loaders.generic import GenericLoader
from langchain.document_loaders.parsers import LanguageParser
loader = GenericLoader.from_filesystem(
"./code",
glob="**/*",
suffixes=[".py", ".js"],
parser=LanguageParser()
)
docs = loader.load()
Example instantiations to manually select the language:
.. code-block:: python
from langchain.text_splitter import Language
loader = GenericLoader.from_filesystem(
"./code",
glob="**/*",
suffixes=[".py"],
parser=LanguageParser(language=Language.PYTHON)
)
Example instantiations to set number of lines threshold:
.. code-block:: python
loader = GenericLoader.from_filesystem(
"./code",
glob="**/*",
suffixes=[".py"],
parser=LanguageParser(parser_threshold=200)
)
"""
def __init__(self, language: Optional[Language] = None, parser_threshold: int = 0):
"""
Language parser that splits code using the respective language syntax.
Args:
language: If None (default), it will try to infer language from source.
parser_threshold: Minimum lines needed to activate parsing (0 by default).
"""
self.language = language
self.parser_threshold = parser_threshold
def lazy_parse(self, blob: Blob) -> Iterator[Document]:
code = blob.as_string()
language = self.language or (
LANGUAGE_EXTENSIONS.get(blob.source.rsplit(".", 1)[-1])
if isinstance(blob.source, str)
else None
)
if language is None:
yield Document(
page_content=code,
metadata={
"source": blob.source,
},
)
return
if self.parser_threshold >= len(code.splitlines()):
yield Document(
page_content=code,
metadata={
"source": blob.source,
"language": language,
},
)
return
self.Segmenter = LANGUAGE_SEGMENTERS[language]
segmenter = self.Segmenter(blob.as_string())
if not segmenter.is_valid():
yield Document(
page_content=code,
metadata={
"source": blob.source,
},
)
return
for functions_classes in segmenter.extract_functions_classes():
yield Document(
page_content=functions_classes,
metadata={
"source": blob.source,
"content_type": "functions_classes",
"language": language,
},
)
yield Document(
page_content=segmenter.simplify_code(),
metadata={
"source": blob.source,
"content_type": "simplified_code",
"language": language,
},
)

View File

@@ -0,0 +1,47 @@
import ast
from typing import Any, List
from langchain.document_loaders.parsers.language.code_segmenter import CodeSegmenter
class PythonSegmenter(CodeSegmenter):
def __init__(self, code: str):
super().__init__(code)
self.source_lines = self.code.splitlines()
def is_valid(self) -> bool:
try:
ast.parse(self.code)
return True
except SyntaxError:
return False
def _extract_code(self, node: Any) -> str:
start = node.lineno - 1
end = node.end_lineno
return "\n".join(self.source_lines[start:end])
def extract_functions_classes(self) -> List[str]:
tree = ast.parse(self.code)
functions_classes = []
for node in ast.iter_child_nodes(tree):
if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
functions_classes.append(self._extract_code(node))
return functions_classes
def simplify_code(self) -> str:
tree = ast.parse(self.code)
simplified_lines = self.source_lines[:]
for node in ast.iter_child_nodes(tree):
if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
start = node.lineno - 1
simplified_lines[start] = f"# Code for: {simplified_lines[start]}"
assert isinstance(node.end_lineno, int)
for line_num in range(start + 1, node.end_lineno):
simplified_lines[line_num] = None # type: ignore
return "\n".join(line for line in simplified_lines if line is not None)

View File

@@ -1,5 +1,5 @@
"""Loader that loads documents from Psychic.dev."""
from typing import List
from typing import List, Optional
from langchain.docstore.document import Document
from langchain.document_loaders.base import BaseLoader
@@ -8,8 +8,10 @@ from langchain.document_loaders.base import BaseLoader
class PsychicLoader(BaseLoader):
"""Loader that loads documents from Psychic.dev."""
def __init__(self, api_key: str, connector_id: str, connection_id: str):
"""Initialize with API key, connector id, and connection id."""
def __init__(
self, api_key: str, account_id: str, connector_id: Optional[str] = None
):
"""Initialize with API key, connector id, and account id."""
try:
from psychicapi import ConnectorId, Psychic # noqa: F401
@@ -19,16 +21,18 @@ class PsychicLoader(BaseLoader):
)
self.psychic = Psychic(secret_key=api_key)
self.connector_id = ConnectorId(connector_id)
self.connection_id = connection_id
self.account_id = account_id
def load(self) -> List[Document]:
"""Load documents."""
psychic_docs = self.psychic.get_documents(self.connector_id, self.connection_id)
psychic_docs = self.psychic.get_documents(
connector_id=self.connector_id, account_id=self.account_id
)
return [
Document(
page_content=doc["content"],
metadata={"title": doc["title"], "source": doc["uri"]},
)
for doc in psychic_docs
for doc in psychic_docs.documents
]

View File

@@ -0,0 +1,50 @@
"""Loading logic for loading documents from Tencent Cloud COS directory."""
from typing import Any, Iterator, List
from langchain.docstore.document import Document
from langchain.document_loaders.base import BaseLoader
from langchain.document_loaders.tencent_cos_file import TencentCOSFileLoader
class TencentCOSDirectoryLoader(BaseLoader):
"""Loading logic for loading documents from Tencent Cloud COS."""
def __init__(self, conf: Any, bucket: str, prefix: str = ""):
"""Initialize with COS config, bucket and prefix.
:param conf(CosConfig): COS config.
:param bucket(str): COS bucket.
:param prefix(str): key prefix used to filter objects in the bucket.
"""
self.conf = conf
self.bucket = bucket
self.prefix = prefix
def load(self) -> List[Document]:
return list(self.lazy_load())
def lazy_load(self) -> Iterator[Document]:
"""Load documents."""
try:
from qcloud_cos import CosS3Client
except ImportError:
raise ValueError(
"Could not import cos-python-sdk-v5 python package. "
"Please install it with `pip install cos-python-sdk-v5`."
)
client = CosS3Client(self.conf)
contents = []
marker = ""
while True:
response = client.list_objects(
Bucket=self.bucket, Prefix=self.prefix, Marker=marker, MaxKeys=1000
)
if "Contents" in response:
contents.extend(response["Contents"])
if response["IsTruncated"] == "false":
break
marker = response["NextMarker"]
for content in contents:
if content["Key"].endswith("/"):
continue
loader = TencentCOSFileLoader(self.conf, self.bucket, content["Key"])
yield loader.load()[0]
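
Usage sketch (not part of the diff); region, credentials, and bucket are placeholders:

from qcloud_cos import CosConfig
from langchain.document_loaders import TencentCOSDirectoryLoader

conf = CosConfig(Region="ap-guangzhou", SecretId="<secret-id>", SecretKey="<secret-key>")
loader = TencentCOSDirectoryLoader(conf=conf, bucket="examplebucket-1250000000", prefix="docs/")
docs = loader.load()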

View File

@@ -0,0 +1,48 @@
"""Loading logic for loading documents from Tencent Cloud COS file."""
import os
import tempfile
from typing import Any, Iterator, List
from langchain.docstore.document import Document
from langchain.document_loaders.base import BaseLoader
from langchain.document_loaders.unstructured import UnstructuredFileLoader
class TencentCOSFileLoader(BaseLoader):
"""Loading logic for loading documents from Tencent Cloud COS."""
def __init__(self, conf: Any, bucket: str, key: str):
"""Initialize with COS config, bucket and key name.
:param conf(CosConfig): COS config.
:param bucket(str): COS bucket.
:param key(str): COS file key.
"""
self.conf = conf
self.bucket = bucket
self.key = key
def load(self) -> List[Document]:
return list(self.lazy_load())
def lazy_load(self) -> Iterator[Document]:
"""Load documents."""
try:
from qcloud_cos import CosS3Client
except ImportError:
raise ValueError(
"Could not import cos-python-sdk-v5 python package. "
"Please install it with `pip install cos-python-sdk-v5`."
)
# Initialise a client
client = CosS3Client(self.conf)
with tempfile.TemporaryDirectory() as temp_dir:
file_path = f"{temp_dir}/{self.bucket}/{self.key}"
os.makedirs(os.path.dirname(file_path), exist_ok=True)
# Download the file to a destination
client.download_file(
Bucket=self.bucket, Key=self.key, DestFilePath=file_path
)
loader = UnstructuredFileLoader(file_path)
# UnstructuredFileLoader does not implement lazy_load yet
return iter(loader.load())
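
And the single-file variant, under the same placeholder credentials (not part of the diff):

from qcloud_cos import CosConfig
from langchain.document_loaders import TencentCOSFileLoader

conf = CosConfig(Region="ap-guangzhou", SecretId="<secret-id>", SecretKey="<secret-key>")
loader = TencentCOSFileLoader(conf=conf, bucket="examplebucket-1250000000", key="docs/report.pdf")
docs = loader.load()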

View File

@@ -50,6 +50,9 @@ class WebBaseLoader(BaseLoader):
requests_kwargs: Dict[str, Any] = {}
"""kwargs for requests"""
raise_for_status: bool = False
"""Raise an exception if http status code denotes an error."""
bs_get_text_kwargs: Dict[str, Any] = {}
"""kwargs for beatifulsoup4 get_text"""
@@ -58,6 +61,7 @@ class WebBaseLoader(BaseLoader):
web_path: Union[str, List[str]],
header_template: Optional[dict] = None,
verify: Optional[bool] = True,
proxies: Optional[dict] = None,
):
"""Initialize with webpage path."""
@@ -94,6 +98,9 @@ class WebBaseLoader(BaseLoader):
)
self.session.headers = dict(headers)
if proxies:
self.session.proxies.update(proxies)
@property
def web_path(self) -> str:
if len(self.web_paths) > 1:
@@ -189,6 +196,8 @@ class WebBaseLoader(BaseLoader):
self._check_parser(parser)
html_doc = self.session.get(url, verify=self.verify, **self.requests_kwargs)
if self.raise_for_status:
html_doc.raise_for_status()
html_doc.encoding = html_doc.apparent_encoding
return BeautifulSoup(html_doc.text, parser)
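
A hedged sketch of the new proxies and raise_for_status options above; the proxy URL is a placeholder:

from langchain.document_loaders import WebBaseLoader

loader = WebBaseLoader(
    "https://example.com",
    proxies={"https": "http://user:pass@proxy.example.com:8080"},  # placeholder proxy
)
loader.raise_for_status = True  # surface HTTP errors instead of parsing error pages
docs = loader.load()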

View File

@@ -49,13 +49,15 @@ class WhatsAppChatLoader(BaseLoader):
\s
(.+)
"""
ignore_lines = ["This message was deleted", "<Media omitted>"]
for line in lines:
result = re.match(
message_line_regex, line.strip(), flags=re.VERBOSE | re.IGNORECASE
)
if result:
date, sender, text = result.groups()
text_content += concatenate_rows(date, sender, text)
if text not in ignore_lines:
text_content += concatenate_rows(date, sender, text)
metadata = {"source": str(p)}

View File

@@ -61,6 +61,29 @@ class GuardrailsOutputParser(BaseOutputParser):
kwargs=kwargs,
)
@classmethod
def from_pydantic(
cls,
output_class: Any,
num_reasks: int = 1,
api: Optional[Callable] = None,
*args: Any,
**kwargs: Any,
) -> GuardrailsOutputParser:
try:
from guardrails import Guard
except ImportError:
raise ValueError(
"guardrails-ai package not installed. "
"Install it by running `pip install guardrails-ai`."
)
return cls(
guard=Guard.from_pydantic(output_class, "", num_reasks=num_reasks),
api=api,
args=args,
kwargs=kwargs,
)
def get_format_instructions(self) -> str:
return self.guard.raw_prompt.format_instructions
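
A hedged sketch of the new from_pydantic constructor; the import path and the Pydantic model are assumptions for illustration:

from pydantic import BaseModel
from langchain.output_parsers.rail_parser import GuardrailsOutputParser  # assumed path

class Person(BaseModel):
    name: str
    age: int

parser = GuardrailsOutputParser.from_pydantic(Person, num_reasks=2)
print(parser.get_format_instructions())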

View File

@@ -14,6 +14,7 @@ from langchain.retrievers.llama_index import (
from langchain.retrievers.merger_retriever import MergerRetriever
from langchain.retrievers.metal import MetalRetriever
from langchain.retrievers.milvus import MilvusRetriever
from langchain.retrievers.multi_query import MultiQueryRetriever
from langchain.retrievers.pinecone_hybrid_search import PineconeHybridSearchRetriever
from langchain.retrievers.pupmed import PubMedRetriever
from langchain.retrievers.remote_retriever import RemoteLangChainRetriever
@@ -43,6 +44,7 @@ __all__ = [
"MergerRetriever",
"MetalRetriever",
"MilvusRetriever",
"MultiQueryRetriever",
"PineconeHybridSearchRetriever",
"PubMedRetriever",
"RemoteLangChainRetriever",

View File

@@ -32,16 +32,17 @@ class TextWithHighLights(BaseModel, extra=Extra.allow):
Highlights: Optional[Any]
class AdditionalResultAttributeValue(BaseModel, extra=Extra.allow):
TextWithHighlightsValue: TextWithHighLights
class AdditionalResultAttribute(BaseModel, extra=Extra.allow):
Key: str
ValueType: Literal["TEXT_WITH_HIGHLIGHTS_VALUE"]
Value: Optional[TextWithHighLights]
Value: AdditionalResultAttributeValue
def get_value_text(self) -> str:
if not self.Value:
return ""
else:
return self.Value.Text
return self.Value.TextWithHighlightsValue.Text
class QueryResultItem(BaseModel, extra=Extra.allow):

View File

@@ -0,0 +1,158 @@
import logging
from typing import List
from pydantic import BaseModel, Field
from langchain.chains.llm import LLMChain
from langchain.llms.base import BaseLLM
from langchain.output_parsers.pydantic import PydanticOutputParser
from langchain.prompts.prompt import PromptTemplate
from langchain.schema import BaseRetriever, Document
logging.basicConfig(level=logging.INFO)
class LineList(BaseModel):
lines: List[str] = Field(description="Lines of text")
class LineListOutputParser(PydanticOutputParser):
def __init__(self) -> None:
super().__init__(pydantic_object=LineList)
def parse(self, text: str) -> LineList:
lines = text.strip().split("\n")
return LineList(lines=lines)
# Default prompt
DEFAULT_QUERY_PROMPT = PromptTemplate(
input_variables=["question"],
template="""You are an AI language model assistant. Your task is
to generate 3 different versions of the given user
question to retrieve relevant documents from a vector database.
By generating multiple perspectives on the user question,
your goal is to help the user overcome some of the limitations
of distance-based similarity search. Provide these alternative
questions separated by newlines. Original question: {question}""",
)
class MultiQueryRetriever(BaseRetriever):
"""Given a user query, use an LLM to write a set of queries.
Retrieve docs for each query. Rake the unique union of all retrieved docs."""
def __init__(
self,
retriever: BaseRetriever,
llm_chain: LLMChain,
verbose: bool = True,
parser_key: str = "lines",
) -> None:
"""Initialize MultiQueryRetriever.
Args:
retriever: retriever to query documents from
llm_chain: llm_chain for query generation
verbose: show the queries that we generated to the user
parser_key: attribute name for the parsed output
Returns:
MultiQueryRetriever
"""
self.retriever = retriever
self.llm_chain = llm_chain
self.verbose = verbose
self.parser_key = parser_key
@classmethod
def from_llm(
cls,
retriever: BaseRetriever,
llm: BaseLLM,
prompt: PromptTemplate = DEFAULT_QUERY_PROMPT,
parser_key: str = "lines",
) -> "MultiQueryRetriever":
"""Initialize from llm using default template.
Args:
retriever: retriever to query documents from
llm: llm for query generation using DEFAULT_QUERY_PROMPT
Returns:
MultiQueryRetriever
"""
output_parser = LineListOutputParser()
llm_chain = LLMChain(llm=llm, prompt=prompt, output_parser=output_parser)
return cls(
retriever=retriever,
llm_chain=llm_chain,
parser_key=parser_key,
)
def get_relevant_documents(self, question: str) -> List[Document]:
"""Get relevated documents given a user query.
Args:
question: user query
Returns:
Unique union of relevant documents from all generated queries
"""
queries = self.generate_queries(question)
documents = self.retrieve_documents(queries)
unique_documents = self.unique_union(documents)
return unique_documents
async def aget_relevant_documents(self, query: str) -> List[Document]:
raise NotImplementedError
def generate_queries(self, question: str) -> List[str]:
"""Generate queries based upon user input.
Args:
question: user query
Returns:
List of LLM generated queries that are similar to the user input
"""
response = self.llm_chain({"question": question})
lines = getattr(response["text"], self.parser_key, [])
if self.verbose:
logging.info(f"Generated queries: {lines}")
return lines
def retrieve_documents(self, queries: List[str]) -> List[Document]:
"""Run all LLM generated queries.
Args:
queries: query list
Returns:
List of retrieved Documents
"""
documents = []
for query in queries:
docs = self.retriever.get_relevant_documents(query)
documents.extend(docs)
return documents
def unique_union(self, documents: List[Document]) -> List[Document]:
"""Get uniqe Documents.
Args:
documents: List of retrived Documents
Returns:
List of unique retrived Documents
"""
# Key on (page_content, sorted metadata items) to remove duplicates
# TODO: Add Document ID property (e.g., UUID)
unique_documents_dict = {
(doc.page_content, tuple(sorted(doc.metadata.items()))): doc
for doc in documents
}
unique_documents = list(unique_documents_dict.values())
return unique_documents
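
Usage sketch for the retriever above (not part of the diff); `vectordb` is assumed to be an existing VectorStore:

from langchain.llms import OpenAI
from langchain.retrievers.multi_query import MultiQueryRetriever

retriever = MultiQueryRetriever.from_llm(
    retriever=vectordb.as_retriever(),  # hypothetical existing vector store
    llm=OpenAI(temperature=0),
)
docs = retriever.get_relevant_documents("How do regional sales compare?")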

View File

@@ -1,6 +1,6 @@
"""## Zapier Natural Language Actions API
\
Full docs here: https://nla.zapier.com/api/v1/docs
Full docs here: https://nla.zapier.com/start/
**Zapier Natural Language Actions** gives you access to the 5k+ apps, 20k+ actions
on Zapier's platform through a natural language API interface.
@@ -24,8 +24,8 @@ NLA offers both API Key and OAuth for signing NLA API requests.
connected accounts on Zapier.com
This quick start will focus on the server-side use case for brevity.
Review [full docs](https://nla.zapier.com/api/v1/docs) or reach out to
nla@zapier.com for user-facing oauth developer support.
Review [full docs](https://nla.zapier.com/start/) for user-facing oauth developer
support.
Typically, you'd use SequentialChain, here's a basic example:
@@ -42,8 +42,7 @@ import os
# get from https://platform.openai.com/
os.environ["OPENAI_API_KEY"] = os.environ.get("OPENAI_API_KEY", "")
# get from https://nla.zapier.com/demo/provider/debug
# (under User Information, after logging in):
# get from https://nla.zapier.com/docs/authentication/
os.environ["ZAPIER_NLA_API_KEY"] = os.environ.get("ZAPIER_NLA_API_KEY", "")
from langchain.llms import OpenAI
@@ -61,8 +60,9 @@ from langchain.utilities.zapier import ZapierNLAWrapper
llm = OpenAI(temperature=0)
zapier = ZapierNLAWrapper()
## To leverage a nla_oauth_access_token you may pass the value to the ZapierNLAWrapper
## If you do this there is no need to initialize the ZAPIER_NLA_API_KEY env variable
## To leverage OAuth you may pass the value `nla_oauth_access_token` to
## the ZapierNLAWrapper. If you do this there is no need to initialize
## the ZAPIER_NLA_API_KEY env variable
# zapier = ZapierNLAWrapper(zapier_nla_oauth_access_token="TOKEN_HERE")
toolkit = ZapierToolkit.from_zapier_nla_wrapper(zapier)
agent = initialize_agent(
@@ -99,7 +99,7 @@ class ZapierNLARunAction(BaseTool):
(eg. "get the latest email from Mike Knoop" for "Gmail: find email" action)
params: a dict, optional. Any params provided will *override* AI guesses
from `instructions` (see "understanding the AI guessing flow" here:
https://nla.zapier.com/api/v1/docs)
https://nla.zapier.com/docs/using-the-api#ai-guessing)
"""
@@ -142,11 +142,15 @@ class ZapierNLARunAction(BaseTool):
async def _arun(
self,
_: str,
instructions: str,
run_manager: Optional[AsyncCallbackManagerForToolRun] = None,
) -> str:
"""Use the Zapier NLA tool to return a list of all exposed user actions."""
raise NotImplementedError("ZapierNLAListActions does not support async")
return await self.api_wrapper.arun_as_str(
self.action_id,
instructions,
self.params,
)
ZapierNLARunAction.__doc__ = (
@@ -184,7 +188,7 @@ class ZapierNLAListActions(BaseTool):
run_manager: Optional[AsyncCallbackManagerForToolRun] = None,
) -> str:
"""Use the Zapier NLA tool to return a list of all exposed user actions."""
raise NotImplementedError("ZapierNLAListActions does not support async")
return await self.api_wrapper.alist_as_str()
ZapierNLAListActions.__doc__ = (

View File

@@ -322,7 +322,7 @@ class SearxSearchWrapper(BaseModel):
str: The result of the query.
Raises:
ValueError: If an error occured with the query.
ValueError: If an error occurred with the query.
Example:

View File

@@ -36,7 +36,7 @@ class SerpAPIWrapper(BaseModel):
Example:
.. code-block:: python
from langchain import SerpAPIWrapper
from langchain.utilities import SerpAPIWrapper
serpapi = SerpAPIWrapper()
"""

View File

@@ -1,6 +1,6 @@
"""Util that can interact with Zapier NLA.
Full docs here: https://nla.zapier.com/api/v1/docs
Full docs here: https://nla.zapier.com/start/
Note: this wrapper currently only implements the `api_key` auth method for testing
and server-side production use cases (using the developer's connected accounts on
@@ -12,8 +12,9 @@ to use oauth. Review the full docs above and reach out to nla@zapier.com for
developer support.
"""
import json
from typing import Dict, List, Optional
from typing import Any, Dict, List, Optional
import aiohttp
import requests
from pydantic import BaseModel, Extra, root_validator
from requests import Request, Session
@@ -24,16 +25,20 @@ from langchain.utils import get_from_dict_or_env
class ZapierNLAWrapper(BaseModel):
"""Wrapper for Zapier NLA.
Full docs here: https://nla.zapier.com/api/v1/docs
Full docs here: https://nla.zapier.com/start/
Note: this wrapper currently only implements the `api_key` auth method for
testing and server-side production use cases (using the developer's connected
accounts on Zapier.com)
This wrapper supports both API Key and OAuth Credential auth methods. API Key
is the fastest way to get started using this wrapper.
Call this wrapper with either `zapier_nla_api_key` or
`zapier_nla_oauth_access_token` arguments, or set the `ZAPIER_NLA_API_KEY`
environment variable. If both arguments are set, the Access Token will take
precedence.
For use-cases where LangChain + Zapier NLA is powering a user-facing application,
and LangChain needs access to the end-user's connected accounts on Zapier.com,
you'll need to use oauth. Review the full docs above and reach out to
nla@zapier.com for developer support.
you'll need to use OAuth. Review the full docs above to learn how to create
your own provider and generate credentials.
"""
zapier_nla_api_key: str
@@ -45,36 +50,63 @@ class ZapierNLAWrapper(BaseModel):
extra = Extra.forbid
def _get_session(self) -> Session:
session = requests.Session()
session.headers.update(
{
"Accept": "application/json",
"Content-Type": "application/json",
}
)
def _format_headers(self) -> Dict[str, str]:
"""Format headers for requests."""
headers = {
"Accept": "application/json",
"Content-Type": "application/json",
}
if self.zapier_nla_oauth_access_token:
session.headers.update(
headers.update(
{"Authorization": f"Bearer {self.zapier_nla_oauth_access_token}"}
)
else:
session.params = {"api_key": self.zapier_nla_api_key}
headers.update({"X-API-Key": self.zapier_nla_api_key})
return headers
def _get_session(self) -> Session:
session = requests.Session()
session.headers.update(self._format_headers())
return session
def _get_action_request(
self, action_id: str, instructions: str, params: Optional[Dict] = None
) -> Request:
async def _arequest(self, method: str, url: str, **kwargs: Any) -> Dict[str, Any]:
"""Make an async request."""
async with aiohttp.ClientSession(headers=self._format_headers()) as session:
async with session.request(method, url, **kwargs) as response:
response.raise_for_status()
return await response.json()
def _create_action_payload( # type: ignore[no-untyped-def]
self, instructions: str, params: Optional[Dict] = None, preview_only=False
) -> Dict:
"""Create a payload for an action."""
data = params if params else {}
data.update(
{
"instructions": instructions,
}
)
if preview_only:
data.update({"preview_only": True})
return data
def _create_action_url(self, action_id: str) -> str:
"""Create a url for an action."""
return self.zapier_nla_api_base + f"exposed/{action_id}/execute/"
def _create_action_request( # type: ignore[no-untyped-def]
self,
action_id: str,
instructions: str,
params: Optional[Dict] = None,
preview_only=False,
) -> Request:
data = self._create_action_payload(instructions, params, preview_only)
return Request(
"POST",
self.zapier_nla_api_base + f"exposed/{action_id}/execute/",
self._create_action_url(action_id),
json=data,
)
@@ -103,7 +135,7 @@ class ZapierNLAWrapper(BaseModel):
return values
def list(self) -> List[Dict]:
async def alist(self) -> List[Dict]:
"""Returns a list of all exposed (enabled) actions associated with
current user (associated with the set api_key). Change your exposed
actions here: https://nla.zapier.com/demo/start/
@@ -122,9 +154,45 @@ class ZapierNLAWrapper(BaseModel):
(see "understanding the AI guessing flow" here:
https://nla.zapier.com/api/v1/docs)
"""
response = await self._arequest("GET", self.zapier_nla_api_base + "exposed/")
return response["results"]
def list(self) -> List[Dict]:
"""Returns a list of all exposed (enabled) actions associated with
current user (associated with the set api_key). Change your exposed
actions here: https://nla.zapier.com/demo/start/
The returned list can be empty if no actions are exposed. Otherwise it will
contain a list of action objects:
[{
"id": str,
"description": str,
"params": Dict[str, str]
}]
`params` will always contain an `instructions` key, the only required
param. All others are optional and, if provided, will override any AI guesses
(see "understanding the AI guessing flow" here:
https://nla.zapier.com/docs/using-the-api#ai-guessing)
"""
session = self._get_session()
response = session.get(self.zapier_nla_api_base + "exposed/")
response.raise_for_status()
try:
response = session.get(self.zapier_nla_api_base + "exposed/")
response.raise_for_status()
except requests.HTTPError as http_err:
if response.status_code == 401:
if self.zapier_nla_oauth_access_token:
raise requests.HTTPError(
f"An unauthorized response occurred. Check that your "
f"access token is correct and doesn't need to be "
f"refreshed. Err: {http_err}"
)
raise requests.HTTPError(
f"An unauthorized response occurred. Check that your api "
f"key is correct. Err: {http_err}"
)
raise http_err
return response.json()["results"]
def run(
@@ -139,11 +207,29 @@ class ZapierNLAWrapper(BaseModel):
call.
"""
session = self._get_session()
request = self._get_action_request(action_id, instructions, params)
request = self._create_action_request(action_id, instructions, params)
response = session.send(session.prepare_request(request))
response.raise_for_status()
return response.json()["result"]
async def arun(
self, action_id: str, instructions: str, params: Optional[Dict] = None
) -> Dict:
"""Executes an action that is identified by action_id, must be exposed
(enabled) by the current user (associated with the set api_key). Change
your exposed actions here: https://nla.zapier.com/demo/start/
The return JSON is guaranteed to be less than ~500 words (350
tokens) making it safe to inject into the prompt of another LLM
call.
"""
response = await self._arequest(
"POST",
self._create_action_url(action_id),
json=self._create_action_payload(instructions, params),
)
return response["result"]
def preview(
self, action_id: str, instructions: str, params: Optional[Dict] = None
) -> Dict:
@@ -153,25 +239,58 @@ class ZapierNLAWrapper(BaseModel):
session = self._get_session()
params = params if params else {}
params.update({"preview_only": True})
request = self._get_action_request(action_id, instructions, params)
request = self._create_action_request(action_id, instructions, params, True)
response = session.send(session.prepare_request(request))
response.raise_for_status()
return response.json()["input_params"]
async def apreview(
self, action_id: str, instructions: str, params: Optional[Dict] = None
) -> Dict:
"""Same as run, but instead of actually executing the action, will
instead return a preview of params that have been guessed by the AI in
case you need to explicitly review before executing."""
response = await self._arequest(
"POST",
self._create_action_url(action_id),
json=self._create_action_payload(instructions, params, preview_only=True),
)
return response["result"]
def run_as_str(self, *args, **kwargs) -> str: # type: ignore[no-untyped-def]
"""Same as run, but returns a stringified version of the JSON for
inserting back into an LLM."""
data = self.run(*args, **kwargs)
return json.dumps(data)
async def arun_as_str(self, *args, **kwargs) -> str: # type: ignore[no-untyped-def]
"""Same as run, but returns a stringified version of the JSON for
inserting back into an LLM."""
data = await self.arun(*args, **kwargs)
return json.dumps(data)
def preview_as_str(self, *args, **kwargs) -> str: # type: ignore[no-untyped-def]
"""Same as preview, but returns a stringified version of the JSON for
inserting back into an LLM."""
data = self.preview(*args, **kwargs)
return json.dumps(data)
async def apreview_as_str( # type: ignore[no-untyped-def]
self, *args, **kwargs
) -> str:
"""Same as preview, but returns a stringified version of the JSON for
inserting back into an LLM."""
data = await self.apreview(*args, **kwargs)
return json.dumps(data)
def list_as_str(self) -> str: # type: ignore[no-untyped-def]
"""Same as list, but returns a stringified version of the JSON for
inserting back into an LLM."""
actions = self.list()
return json.dumps(actions)
async def alist_as_str(self) -> str: # type: ignore[no-untyped-def]
"""Same as list, but returns a stringified version of the JSON for
inserting back into an LLM."""
actions = await self.alist()
return json.dumps(actions)
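
A hedged sketch of the new async surface; the API key and the sample instruction are placeholders:

import asyncio

from langchain.utilities.zapier import ZapierNLAWrapper

async def demo() -> None:
    zapier = ZapierNLAWrapper(zapier_nla_api_key="<key>")
    actions = await zapier.alist()
    result = await zapier.arun_as_str(actions[0]["id"], "send a test message")
    print(result)

asyncio.run(demo())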

View File

@@ -354,15 +354,16 @@ class Pinecone(VectorStore):
pinecone.Index(index_name), embedding.embed_query, text_key, namespace
)
def delete(self, ids: List[str]) -> None:
def delete(self, ids: List[str], namespace: Optional[str] = None) -> None:
"""Delete by vector IDs.
Args:
ids: List of ids to delete.
"""
# This is the maximum number of IDs that can be deleted
if namespace is None:
namespace = self._namespace
chunk_size = 1000
for i in range(0, len(ids), chunk_size):
chunk = ids[i : i + chunk_size]
self._index.delete(ids=chunk)
self._index.delete(ids=chunk, namespace=namespace)
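
A hedged sketch of the namespace-aware delete; `vectorstore` is assumed to be an existing Pinecone instance and the ids are placeholders:

vectorstore.delete(ids=["doc-1", "doc-2"], namespace="staging")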

566
poetry.lock generated

File diff suppressed because it is too large

View File

@@ -1,6 +1,6 @@
[tool.poetry]
name = "langchain"
version = "0.0.216"
version = "0.0.218"
description = "Building applications with LLMs through composability"
authors = []
license = "MIT"
@@ -88,7 +88,6 @@ gql = {version = "^3.4.1", optional = true}
pandas = {version = "^2.0.1", optional = true}
telethon = {version = "^1.28.5", optional = true}
neo4j = {version = "^5.8.1", optional = true}
psychicapi = {version = "^0.5", optional = true}
zep-python = {version=">=0.31", optional=true}
langkit = {version = ">=0.0.1.dev3, <0.1.0", optional = true}
chardet = {version="^5.1.0", optional=true}
@@ -109,8 +108,10 @@ nebula3-python = {version = "^3.4.0", optional = true}
langchainplus-sdk = ">=0.0.17"
awadb = {version = "^0.3.3", optional = true}
azure-search-documents = {version = "11.4.0a20230509004", source = "azure-sdk-dev", optional = true}
esprima = {version = "^4.0.1", optional = true}
openllm = {version = ">=0.1.6", optional = true}
streamlit = {version = "^1.18.0", optional = true, python = ">=3.8.1,<3.9.7 || >3.9.7,<4.0"}
psychicapi = {version = "^0.8.0", optional = true}
[tool.poetry.group.docs.dependencies]
autodoc_pydantic = "^1.8.0"
@@ -222,6 +223,7 @@ clarifai = ["clarifai"]
cohere = ["cohere"]
docarray = ["docarray"]
embeddings = ["sentence-transformers"]
javascript = ["esprima"]
azure = [
"azure-identity",
"azure-cosmos",
@@ -303,6 +305,7 @@ all = [
"tigrisdb",
"nebula3-python",
"awadb",
"esprima",
]
# An extra used to be able to add extended testing.
@@ -312,6 +315,7 @@ extended_testing = [
"beautifulsoup4",
"bibtexparser",
"chardet",
"esprima",
"jq",
"pdfminer.six",
"pgvector",
@@ -354,7 +358,7 @@ exclude = [
[tool.mypy]
ignore_missing_imports = "True"
disallow_untyped_defs = "True"
exclude = ["notebooks"]
exclude = ["notebooks", "examples", "example_data"]
[tool.coverage.run]
omit = [

View File

@@ -0,0 +1,25 @@
import os
from pathlib import Path
from langchain.chains.openai_functions.openapi import get_openapi_chain
def test_openai_opeanapi() -> None:
chain = get_openapi_chain(
"https://www.klarna.com/us/shopping/public/openai/v0/api-docs/"
)
output = chain.run("What are some options for a men's large blue button down shirt")
assert isinstance(output, dict)
def test_openai_opeanapi_headers() -> None:
BRANDFETCH_API_KEY = os.environ.get("BRANDFETCH_API_KEY")
headers = {"Authorization": f"Bearer {BRANDFETCH_API_KEY}"}
file_path = str(
Path(__file__).parents[2] / "examples/brandfetch-brandfetch-2.0.0-resolved.json"
)
chain = get_openapi_chain(file_path, headers=headers)
output = chain.run("I want to know about nike.comgg")
assert isinstance(output, str)

View File

@@ -0,0 +1,133 @@
from pathlib import Path
import pytest
from langchain.document_loaders.generic import GenericLoader
from langchain.document_loaders.parsers import LanguageParser
from langchain.text_splitter import Language
def test_language_loader_for_python() -> None:
"""Test Python loader with parser enabled."""
file_path = Path(__file__).parent.parent.parent / "examples"
loader = GenericLoader.from_filesystem(
file_path, glob="hello_world.py", parser=LanguageParser(parser_threshold=5)
)
docs = loader.load()
assert len(docs) == 2
metadata = docs[0].metadata
assert metadata["source"] == str(file_path / "hello_world.py")
assert metadata["content_type"] == "functions_classes"
assert metadata["language"] == "python"
metadata = docs[1].metadata
assert metadata["source"] == str(file_path / "hello_world.py")
assert metadata["content_type"] == "simplified_code"
assert metadata["language"] == "python"
assert (
docs[0].page_content
== """def main():
print("Hello World!")
return 0"""
)
assert (
docs[1].page_content
== """#!/usr/bin/env python3
import sys
# Code for: def main():
if __name__ == "__main__":
sys.exit(main())"""
)
def test_language_loader_for_python_with_parser_threshold() -> None:
"""Test Python loader with parser enabled and below threshold."""
file_path = Path(__file__).parent.parent.parent / "examples"
loader = GenericLoader.from_filesystem(
file_path,
glob="hello_world.py",
parser=LanguageParser(language=Language.PYTHON, parser_threshold=1000),
)
docs = loader.load()
assert len(docs) == 1
def esprima_installed() -> bool:
try:
import esprima # noqa: F401
return True
except Exception as e:
print(f"esprima not installed, skipping test {e}")
return False
@pytest.mark.skipif(not esprima_installed(), reason="requires esprima package")
def test_language_loader_for_javascript() -> None:
"""Test JavaScript loader with parser enabled."""
file_path = Path(__file__).parent.parent.parent / "examples"
loader = GenericLoader.from_filesystem(
file_path, glob="hello_world.js", parser=LanguageParser(parser_threshold=5)
)
docs = loader.load()
assert len(docs) == 3
metadata = docs[0].metadata
assert metadata["source"] == str(file_path / "hello_world.js")
assert metadata["content_type"] == "functions_classes"
assert metadata["language"] == "js"
metadata = docs[1].metadata
assert metadata["source"] == str(file_path / "hello_world.js")
assert metadata["content_type"] == "functions_classes"
assert metadata["language"] == "js"
metadata = docs[2].metadata
assert metadata["source"] == str(file_path / "hello_world.js")
assert metadata["content_type"] == "simplified_code"
assert metadata["language"] == "js"
assert (
docs[0].page_content
== """class HelloWorld {
sayHello() {
console.log("Hello World!");
}
}"""
)
assert (
docs[1].page_content
== """function main() {
const hello = new HelloWorld();
hello.sayHello();
}"""
)
assert (
docs[2].page_content
== """// Code for: class HelloWorld {
// Code for: function main() {
main();"""
)
def test_language_loader_for_javascript_with_parser_threshold() -> None:
"""Test JavaScript loader with parser enabled and below threshold."""
file_path = Path(__file__).parent.parent.parent / "examples"
loader = GenericLoader.from_filesystem(
file_path,
glob="hello_world.js",
parser=LanguageParser(language=Language.JS, parser_threshold=1000),
)
docs = loader.load()
assert len(docs) == 1

View File

@@ -0,0 +1,14 @@
from langchain.document_loaders.larksuite import LarkSuiteDocLoader
DOMAIN = ""
ACCESS_TOKEN = ""
DOCUMENT_ID = ""
def test_larksuite_doc_loader() -> None:
"""Test LarkSuite (FeiShu) document loader."""
loader = LarkSuiteDocLoader(DOMAIN, ACCESS_TOKEN, DOCUMENT_ID)
docs = loader.load()
assert len(docs) == 1
assert docs[0].page_content is not None

View File

@@ -0,0 +1,15 @@
import os
from pathlib import Path
from langchain.document_loaders import UnstructuredOrgModeLoader
EXAMPLE_DIRECTORY = file_path = Path(__file__).parent.parent / "examples"
def test_unstructured_org_mode_loader() -> None:
"""Test unstructured loader."""
file_path = os.path.join(EXAMPLE_DIRECTORY, "README.org")
loader = UnstructuredOrgModeLoader(str(file_path))
docs = loader.load()
assert len(docs) == 1

View File

@@ -0,0 +1,27 @@
* Example Docs
The sample docs directory contains the following files:
- ~example-10k.html~ - A 10-K SEC filing in HTML format
- ~layout-parser-paper.pdf~ - A PDF copy of the layout parser paper
- ~factbook.xml~ / ~factbook.xsl~ - Example XML/XSL files that you
can use to test stylesheets
These documents can be used to test out the parsers in the library. In
addition, here are instructions for pulling in some sample docs that are
too big to store in the repo.
** XBRL 10-K
You can get an example 10-K in inline XBRL format using the following
~curl~. Note, you need to have the user agent set in the header or the
SEC site will reject your request.
#+BEGIN_SRC bash
curl -O \
-A '${organization} ${email}' \
https://www.sec.gov/Archives/edgar/data/311094/000117184321001344/0001171843-21-001344.txt
#+END_SRC
You can parse this document using the HTML parser.

View File

@@ -0,0 +1,282 @@
{
"openapi": "3.0.1",
"info": {
"title": "Brandfetch API",
"description": "Brandfetch API (v2) for retrieving brand information.\n\nSee our [documentation](https://docs.brandfetch.com/) for further details. ",
"termsOfService": "https://brandfetch.com/terms",
"contact": {
"url": "https://brandfetch.com/developers"
},
"version": "2.0.0"
},
"externalDocs": {
"description": "Documentation",
"url": "https://docs.brandfetch.com/"
},
"servers": [
{
"url": "https://api.brandfetch.io/v2"
}
],
"paths": {
"/brands/{domainOrId}": {
"get": {
"summary": "Retrieve a brand",
"description": "Fetch brand information by domain or ID\n\nFurther details here: https://docs.brandfetch.com/reference/retrieve-brand\n",
"parameters": [
{
"name": "domainOrId",
"in": "path",
"description": "Domain or ID of the brand",
"required": true,
"style": "simple",
"explode": false,
"schema": {
"type": "string"
}
}
],
"responses": {
"200": {
"description": "Brand data",
"content": {
"application/json": {
"schema": {
"$ref": "#/components/schemas/Brand"
},
"examples": {
"brandfetch.com": {
"value": "{\"name\":\"Brandfetch\",\"domain\":\"brandfetch.com\",\"claimed\":true,\"description\":\"All brands. In one place\",\"links\":[{\"name\":\"twitter\",\"url\":\"https://twitter.com/brandfetch\"},{\"name\":\"linkedin\",\"url\":\"https://linkedin.com/company/brandfetch\"}],\"logos\":[{\"type\":\"logo\",\"theme\":\"light\",\"formats\":[{\"src\":\"https://asset.brandfetch.io/idL0iThUh6/id9WE9j86h.svg\",\"background\":\"transparent\",\"format\":\"svg\",\"size\":15555}]},{\"type\":\"logo\",\"theme\":\"dark\",\"formats\":[{\"src\":\"https://asset.brandfetch.io/idL0iThUh6/idWbsK1VCy.png\",\"background\":\"transparent\",\"format\":\"png\",\"height\":215,\"width\":800,\"size\":33937},{\"src\":\"https://asset.brandfetch.io/idL0iThUh6/idtCMfbWO0.svg\",\"background\":\"transparent\",\"format\":\"svg\",\"height\":null,\"width\":null,\"size\":15567}]},{\"type\":\"symbol\",\"theme\":\"light\",\"formats\":[{\"src\":\"https://asset.brandfetch.io/idL0iThUh6/idXGq6SIu2.svg\",\"background\":\"transparent\",\"format\":\"svg\",\"size\":2215}]},{\"type\":\"symbol\",\"theme\":\"dark\",\"formats\":[{\"src\":\"https://asset.brandfetch.io/idL0iThUh6/iddCQ52AR5.svg\",\"background\":\"transparent\",\"format\":\"svg\",\"size\":2215}]},{\"type\":\"icon\",\"theme\":\"dark\",\"formats\":[{\"src\":\"https://asset.brandfetch.io/idL0iThUh6/idls3LaPPQ.png\",\"background\":null,\"format\":\"png\",\"height\":400,\"width\":400,\"size\":2565}]}],\"colors\":[{\"hex\":\"#0084ff\",\"type\":\"accent\",\"brightness\":113},{\"hex\":\"#00193E\",\"type\":\"brand\",\"brightness\":22},{\"hex\":\"#F03063\",\"type\":\"brand\",\"brightness\":93},{\"hex\":\"#7B0095\",\"type\":\"brand\",\"brightness\":37},{\"hex\":\"#76CC4B\",\"type\":\"brand\",\"brightness\":176},{\"hex\":\"#FFDA00\",\"type\":\"brand\",\"brightness\":210},{\"hex\":\"#000000\",\"type\":\"dark\",\"brightness\":0},{\"hex\":\"#ffffff\",\"type\":\"light\",\"brightness\":255}],\"fonts\":[{\"name\":\"Poppins\",\"type\":\"title\",\"origin\":\"google\",\"originId\":\"Poppins\",\"weights\":[]},{\"name\":\"Inter\",\"type\":\"body\",\"origin\":\"google\",\"originId\":\"Inter\",\"weights\":[]}],\"images\":[{\"type\":\"banner\",\"formats\":[{\"src\":\"https://asset.brandfetch.io/idL0iThUh6/idUuia5imo.png\",\"background\":\"transparent\",\"format\":\"png\",\"height\":500,\"width\":1500,\"size\":5539}]}]}"
}
}
}
}
},
"400": {
"description": "Invalid domain or ID supplied"
},
"404": {
"description": "The brand does not exist or the domain can't be resolved."
}
},
"security": [
{
"bearerAuth": []
}
]
}
}
},
"components": {
"schemas": {
"Brand": {
"required": [
"claimed",
"colors",
"description",
"domain",
"fonts",
"images",
"links",
"logos",
"name"
],
"type": "object",
"properties": {
"images": {
"type": "array",
"items": {
"$ref": "#/components/schemas/ImageAsset"
}
},
"fonts": {
"type": "array",
"items": {
"$ref": "#/components/schemas/FontAsset"
}
},
"domain": {
"type": "string"
},
"claimed": {
"type": "boolean"
},
"name": {
"type": "string"
},
"description": {
"type": "string"
},
"links": {
"type": "array",
"items": {
"$ref": "#/components/schemas/Brand_links"
}
},
"logos": {
"type": "array",
"items": {
"$ref": "#/components/schemas/ImageAsset"
}
},
"colors": {
"type": "array",
"items": {
"$ref": "#/components/schemas/ColorAsset"
}
}
},
"description": "Object representing a brand"
},
"ColorAsset": {
"required": [
"brightness",
"hex",
"type"
],
"type": "object",
"properties": {
"brightness": {
"type": "integer"
},
"hex": {
"type": "string"
},
"type": {
"type": "string",
"enum": [
"accent",
"brand",
"customizable",
"dark",
"light",
"vibrant"
]
}
},
"description": "Brand color asset"
},
"FontAsset": {
"type": "object",
"properties": {
"originId": {
"type": "string"
},
"origin": {
"type": "string",
"enum": [
"adobe",
"custom",
"google",
"system"
]
},
"name": {
"type": "string"
},
"type": {
"type": "string"
},
"weights": {
"type": "array",
"items": {
"type": "number"
}
},
"items": {
"type": "string"
}
},
"description": "Brand font asset"
},
"ImageAsset": {
"required": [
"formats",
"theme",
"type"
],
"type": "object",
"properties": {
"formats": {
"type": "array",
"items": {
"$ref": "#/components/schemas/ImageFormat"
}
},
"theme": {
"type": "string",
"enum": [
"light",
"dark"
]
},
"type": {
"type": "string",
"enum": [
"logo",
"icon",
"symbol",
"banner"
]
}
},
"description": "Brand image asset"
},
"ImageFormat": {
"required": [
"background",
"format",
"size",
"src"
],
"type": "object",
"properties": {
"size": {
"type": "integer"
},
"src": {
"type": "string"
},
"background": {
"type": "string",
"enum": [
"transparent"
]
},
"format": {
"type": "string"
},
"width": {
"type": "integer"
},
"height": {
"type": "integer"
}
},
"description": "Brand image asset image format"
},
"Brand_links": {
"required": [
"name",
"url"
],
"type": "object",
"properties": {
"name": {
"type": "string"
},
"url": {
"type": "string"
}
}
}
},
"securitySchemes": {
"bearerAuth": {
"type": "http",
"scheme": "bearer",
"bearerFormat": "API Key"
}
}
}
}

View File

@@ -0,0 +1,12 @@
class HelloWorld {
sayHello() {
console.log("Hello World!");
}
}
function main() {
const hello = new HelloWorld();
hello.sayHello();
}
main();

View File

@@ -0,0 +1,13 @@
#!/usr/bin/env python3
import sys
def main():
print("Hello World!")
return 0
if __name__ == "__main__":
sys.exit(main())

View File

@@ -6,3 +6,5 @@
[2023/5/4, 16:13:23] ~ User 2: See you!
7/19/22, 11:32PM - User 1: Hello
7/20/22, 11:32am - User 2: Goodbye
4/20/23, 9:42am - User 3: <Media omitted>
6/29/23, 12:16am - User 4: This message was deleted

View File

@@ -0,0 +1,46 @@
import unittest
import pytest
from langchain.document_loaders.parsers.language.javascript import JavaScriptSegmenter
@pytest.mark.requires("esprima")
class TestJavaScriptSegmenter(unittest.TestCase):
def setUp(self) -> None:
self.example_code = """const os = require('os');
function hello(text) {
console.log(text);
}
class Simple {
constructor() {
this.a = 1;
}
}
hello("Hello!");"""
self.expected_simplified_code = """const os = require('os');
// Code for: function hello(text) {
// Code for: class Simple {
hello("Hello!");"""
self.expected_extracted_code = [
"function hello(text) {\n console.log(text);\n}",
"class Simple {\n constructor() {\n this.a = 1;\n }\n}",
]
def test_extract_functions_classes(self) -> None:
segmenter = JavaScriptSegmenter(self.example_code)
extracted_code = segmenter.extract_functions_classes()
self.assertEqual(extracted_code, self.expected_extracted_code)
def test_simplify_code(self) -> None:
segmenter = JavaScriptSegmenter(self.example_code)
simplified_code = segmenter.simplify_code()
self.assertEqual(simplified_code, self.expected_simplified_code)

View File

@@ -0,0 +1,40 @@
import unittest

from langchain.document_loaders.parsers.language.python import PythonSegmenter


class TestPythonSegmenter(unittest.TestCase):
    def setUp(self) -> None:
        self.example_code = """import os

def hello(text):
    print(text)

class Simple:
    def __init__(self):
        self.a = 1

hello("Hello!")"""

        self.expected_simplified_code = """import os

# Code for: def hello(text):

# Code for: class Simple:

hello("Hello!")"""

        self.expected_extracted_code = [
            "def hello(text):\n" "    print(text)",
            "class Simple:\n" "    def __init__(self):\n" "        self.a = 1",
        ]

    def test_extract_functions_classes(self) -> None:
        segmenter = PythonSegmenter(self.example_code)
        extracted_code = segmenter.extract_functions_classes()
        self.assertEqual(extracted_code, self.expected_extracted_code)

    def test_simplify_code(self) -> None:
        segmenter = PythonSegmenter(self.example_code)
        simplified_code = segmenter.simplify_code()
        self.assertEqual(simplified_code, self.expected_simplified_code)
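
Both segmenter tests above exercise the same two-method surface: extract_functions_classes() returns the source of each top-level function and class, and simplify_code() replaces each with a placeholder comment. A minimal usage sketch based on those tests (the JavaScript segmenter additionally requires the esprima package, per the pytest marker; exact whitespace in the outputs may differ from this sketch):

from langchain.document_loaders.parsers.language.python import PythonSegmenter

code = "def hello(text):\n    print(text)\n\nhello('Hi')"
segmenter = PythonSegmenter(code)

# Per the test expectations: ["def hello(text):\n    print(text)"]
print(segmenter.extract_functions_classes())

# Per the test expectations: the function body is collapsed to
# "# Code for: def hello(text):" while top-level calls are kept.
print(segmenter.simplify_code())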

View File

@@ -5,6 +5,7 @@ def test_parsers_public_api_correct() -> None:
"""Test public API of parsers for breaking changes."""
assert set(__all__) == {
"BS4HTMLParser",
"LanguageParser",
"OpenAIWhisperParser",
"PyPDFParser",
"PDFMinerParser",

View File

@@ -23,7 +23,7 @@ def mock_connector_id(): # type: ignore
class TestPsychicLoader:
    MOCK_API_KEY = "api_key"
    MOCK_CONNECTOR_ID = "notion"
    MOCK_CONNECTION_ID = "connection_id"
    MOCK_ACCOUNT_ID = "account_id"

    def test_psychic_loader_initialization(
        self, mock_psychic: MagicMock, mock_connector_id: MagicMock
@@ -31,17 +31,21 @@ class TestPsychicLoader:
        PsychicLoader(
            api_key=self.MOCK_API_KEY,
            connector_id=self.MOCK_CONNECTOR_ID,
            connection_id=self.MOCK_CONNECTION_ID,
            account_id=self.MOCK_ACCOUNT_ID,
        )

        mock_psychic.assert_called_once_with(secret_key=self.MOCK_API_KEY)
        mock_connector_id.assert_called_once_with(self.MOCK_CONNECTOR_ID)

    def test_psychic_loader_load_data(self, mock_psychic: MagicMock) -> None:
        mock_psychic.get_documents.return_value = [
        mock_get_documents_response = MagicMock()
        mock_get_documents_response.documents = [
            self._get_mock_document("123"),
            self._get_mock_document("456"),
        ]
        mock_get_documents_response.next_page_cursor = None
        mock_psychic.get_documents.return_value = mock_get_documents_response

        psychic_loader = self._get_mock_psychic_loader(mock_psychic)
@@ -57,7 +61,7 @@ class TestPsychicLoader:
        psychic_loader = PsychicLoader(
            api_key=self.MOCK_API_KEY,
            connector_id=self.MOCK_CONNECTOR_ID,
            connection_id=self.MOCK_CONNECTION_ID,
            account_id=self.MOCK_ACCOUNT_ID,
        )
        psychic_loader.psychic = mock_psychic

        return psychic_loader
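
This diff reflects the loader's updated constructor: account_id replaces the old connection_id argument, and get_documents now returns a paginated response object. A minimal usage sketch based on the arguments the test passes (credential values are placeholders):

from langchain.document_loaders import PsychicLoader

loader = PsychicLoader(
    api_key="PSYCHIC_SECRET_KEY",  # placeholder credential
    connector_id="notion",         # connector type, as in the test fixture
    account_id="ACCOUNT_ID",       # replaces the old connection_id argument
)
documents = loader.load()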

View File

@@ -1,5 +1,8 @@
"""Test building the Zapier tool, not running it."""
from unittest.mock import MagicMock, patch
import pytest
import requests
from langchain.tools.zapier.prompt import BASE_ZAPIER_TOOL_PROMPT
from langchain.tools.zapier.tool import ZapierNLARunAction
@@ -50,3 +53,234 @@ def test_custom_base_prompt_fail() -> None:
            base_prompt=base_prompt,
            api_wrapper=ZapierNLAWrapper(zapier_nla_api_key="test"),
        )


def test_format_headers_api_key() -> None:
    """Test that the action headers are being created correctly."""
    tool = ZapierNLARunAction(
        action_id="test",
        zapier_description="test",
        params_schema={"test": "test"},
        api_wrapper=ZapierNLAWrapper(zapier_nla_api_key="test"),
    )
    headers = tool.api_wrapper._format_headers()
    assert headers["Content-Type"] == "application/json"
    assert headers["Accept"] == "application/json"
    assert headers["X-API-Key"] == "test"


def test_format_headers_access_token() -> None:
    """Test that the action headers are being created correctly."""
    tool = ZapierNLARunAction(
        action_id="test",
        zapier_description="test",
        params_schema={"test": "test"},
        api_wrapper=ZapierNLAWrapper(zapier_nla_oauth_access_token="test"),
    )
    headers = tool.api_wrapper._format_headers()
    assert headers["Content-Type"] == "application/json"
    assert headers["Accept"] == "application/json"
    assert headers["Authorization"] == "Bearer test"


def test_create_action_payload() -> None:
    """Test that the action payload is being created correctly."""
    tool = ZapierNLARunAction(
        action_id="test",
        zapier_description="test",
        params_schema={"test": "test"},
        api_wrapper=ZapierNLAWrapper(zapier_nla_api_key="test"),
    )
    payload = tool.api_wrapper._create_action_payload("some instructions")
    assert payload["instructions"] == "some instructions"
    assert payload.get("preview_only") is None


def test_create_action_payload_preview() -> None:
    """Test that the action payload with preview is being created correctly."""
    tool = ZapierNLARunAction(
        action_id="test",
        zapier_description="test",
        params_schema={"test": "test"},
        api_wrapper=ZapierNLAWrapper(zapier_nla_api_key="test"),
    )
    payload = tool.api_wrapper._create_action_payload(
        "some instructions",
        preview_only=True,
    )
    assert payload["instructions"] == "some instructions"
    assert payload["preview_only"] is True


def test_create_action_payload_with_params() -> None:
    """Test that the action payload with params is being created correctly."""
    tool = ZapierNLARunAction(
        action_id="test",
        zapier_description="test",
        params_schema={"test": "test"},
        api_wrapper=ZapierNLAWrapper(zapier_nla_api_key="test"),
    )
    payload = tool.api_wrapper._create_action_payload(
        "some instructions",
        {"test": "test"},
        preview_only=True,
    )
    assert payload["instructions"] == "some instructions"
    assert payload["preview_only"] is True
    assert payload["test"] == "test"


@pytest.mark.asyncio
async def test_apreview(mocker) -> None:  # type: ignore[no-untyped-def]
    """Test that apreview sends the preview_only payload to the execute endpoint."""
    tool = ZapierNLARunAction(
        action_id="test",
        zapier_description="test",
        params_schema={"test": "test"},
        api_wrapper=ZapierNLAWrapper(
            zapier_nla_api_key="test",
            zapier_nla_api_base="http://localhost:8080/v1/",
        ),
    )
    mockObj = mocker.patch.object(ZapierNLAWrapper, "_arequest")
    await tool.api_wrapper.apreview(
        "random_action_id",
        "some instructions",
        {"test": "test"},
    )
    mockObj.assert_called_once_with(
        "POST",
        "http://localhost:8080/v1/exposed/random_action_id/execute/",
        json={
            "instructions": "some instructions",
            "preview_only": True,
            "test": "test",
        },
    )


@pytest.mark.asyncio
async def test_arun(mocker) -> None:  # type: ignore[no-untyped-def]
    """Test that arun executes the action with the expected payload."""
    tool = ZapierNLARunAction(
        action_id="test",
        zapier_description="test",
        params_schema={"test": "test"},
        api_wrapper=ZapierNLAWrapper(
            zapier_nla_api_key="test",
            zapier_nla_api_base="http://localhost:8080/v1/",
        ),
    )
    mockObj = mocker.patch.object(ZapierNLAWrapper, "_arequest")
    await tool.api_wrapper.arun(
        "random_action_id",
        "some instructions",
        {"test": "test"},
    )
    mockObj.assert_called_once_with(
        "POST",
        "http://localhost:8080/v1/exposed/random_action_id/execute/",
        json={"instructions": "some instructions", "test": "test"},
    )


@pytest.mark.asyncio
async def test_alist(mocker) -> None:  # type: ignore[no-untyped-def]
    """Test that alist requests the list of exposed actions."""
    tool = ZapierNLARunAction(
        action_id="test",
        zapier_description="test",
        params_schema={"test": "test"},
        api_wrapper=ZapierNLAWrapper(
            zapier_nla_api_key="test",
            zapier_nla_api_base="http://localhost:8080/v1/",
        ),
    )
    mockObj = mocker.patch.object(ZapierNLAWrapper, "_arequest")
    await tool.api_wrapper.alist()
    mockObj.assert_called_once_with(
        "GET",
        "http://localhost:8080/v1/exposed/",
    )


def test_wrapper_fails_no_api_key_or_access_token_initialization() -> None:
    """Test Wrapper requires either an API Key or OAuth Access Token."""
    with pytest.raises(ValueError):
        ZapierNLAWrapper()


def test_wrapper_api_key_initialization() -> None:
    """Test Wrapper initializes with an API Key."""
    ZapierNLAWrapper(zapier_nla_api_key="test")


def test_wrapper_access_token_initialization() -> None:
    """Test Wrapper initializes with an OAuth Access Token."""
    ZapierNLAWrapper(zapier_nla_oauth_access_token="test")


def test_list_raises_401_invalid_api_key() -> None:
    """Test that a helpful error is raised when the API Key is invalid."""
    mock_response = MagicMock()
    mock_response.status_code = 401
    mock_response.raise_for_status.side_effect = requests.HTTPError(
        "401 Client Error: Unauthorized for url: https://nla.zapier.com/api/v1/exposed/"
    )
    mock_session = MagicMock()
    mock_session.get.return_value = mock_response

    with patch("requests.Session", return_value=mock_session):
        wrapper = ZapierNLAWrapper(zapier_nla_api_key="test")

        with pytest.raises(requests.HTTPError) as err:
            wrapper.list()

        assert str(err.value).startswith(
            "An unauthorized response occurred. Check that your api key is correct. "
            "Err:"
        )


def test_list_raises_401_invalid_access_token() -> None:
    """Test that a helpful error is raised when the OAuth Access Token is invalid."""
    mock_response = MagicMock()
    mock_response.status_code = 401
    mock_response.raise_for_status.side_effect = requests.HTTPError(
        "401 Client Error: Unauthorized for url: https://nla.zapier.com/api/v1/exposed/"
    )
    mock_session = MagicMock()
    mock_session.get.return_value = mock_response

    with patch("requests.Session", return_value=mock_session):
        wrapper = ZapierNLAWrapper(zapier_nla_oauth_access_token="test")

        with pytest.raises(requests.HTTPError) as err:
            wrapper.list()

        assert str(err.value).startswith(
            "An unauthorized response occurred. Check that your access token is "
            "correct and doesn't need to be refreshed. Err:"
        )


def test_list_raises_other_error() -> None:
    """Test that other HTTP errors are re-raised unchanged."""
    mock_response = MagicMock()
    mock_response.status_code = 404
    mock_response.raise_for_status.side_effect = requests.HTTPError(
        "404 Client Error: Not found for url"
    )
    mock_session = MagicMock()
    mock_session.get.return_value = mock_response

    with patch("requests.Session", return_value=mock_session):
        wrapper = ZapierNLAWrapper(zapier_nla_oauth_access_token="test")

        with pytest.raises(requests.HTTPError) as err:
            wrapper.list()

        assert str(err.value) == "404 Client Error: Not found for url"