Compare commits
103 Commits
wfh/async_ ... v0.0.263
| Author | SHA1 | Date |
|---|---|---|
|  | cdfe2c96c5 |  |
|  | 19f504790e |  |
|  | 1b58460fe3 |  |
|  | aca8cb5fba |  |
|  | 7edf4ca396 |  |
|  | 8aab39e3ce |  |
|  | 1d3735a84c |  |
|  | 45741bcc1b |  |
|  | 9b64932e55 |  |
|  | eaa505fb09 |  |
|  | e21152358a |  |
|  | edb585228d |  |
|  | 0aabded97f |  |
|  | 00bf472265 |  |
|  | bace17e0aa |  |
|  | 44bc89b7bf |  |
|  | e4418d1b7e |  |
|  | 6cb763507c |  |
|  | 6d03f8b5d8 |  |
|  | 16af5f8690 |  |
|  | 8cb2594562 |  |
|  | 926c64da60 |  |
|  | 31cfc00845 |  |
|  | f7ae183f40 |  |
|  | 0e5d09d0da |  |
|  | 9249d305af |  |
|  | 01ef786e7e |  |
|  | 3b754b5461 |  |
|  | a429145420 |  |
|  | 7f0e847c13 |  |
|  | 991b448dfc |  |
|  | 3ab4e21579 |  |
|  | 2184e3a400 |  |
|  | c0acbdca1b |  |
|  | a2588d6c57 |  |
|  | b80e3825a6 |  |
|  | 6abb2c2c08 |  |
|  | 57dd4daa9a |  |
|  | 5fc07fa524 |  |
|  | 02430e25b6 |  |
|  | ee52482db8 |  |
|  | bb6fbf4c71 |  |
|  | 45f0f9460a |  |
|  | 105c787e5a |  |
|  | 6221eb5974 |  |
|  | cb5fb751e9 |  |
|  | 16bd328aab |  |
|  | 8eea46ed0e |  |
|  | 67ca187560 |  |
|  | 46f3428cb3 |  |
|  | e3fb11bc10 |  |
|  | 1edead28b8 |  |
|  | a5a4c53280 |  |
|  | 80b98812e1 |  |
|  | fcbbddedae |  |
|  | e94a5d753f |  |
|  | b7bc8ec87f |  |
|  | 6c70f491ba |  |
|  | f3f5853e9f |  |
|  | 2431eca700 |  |
|  | 641cb80c9d |  |
|  | 08a0741d82 |  |
|  | 8d351bfc20 |  |
|  | 3bdc273ab3 |  |
|  | 206f809366 |  |
|  | 8a320e55a0 |  |
|  | e5db8a16c0 |  |
|  | e162fd418a |  |
|  | abb1264edf |  |
|  | 5e05ba2140 |  |
|  | 6e14f9548b |  |
|  | 2380492c8e |  |
|  | d21333d710 |  |
|  | dfb93dd2b5 |  |
|  | 2c7297d243 |  |
|  | 434a96415b |  |
|  | c7a489ae0d |  |
|  | 618cf5241e |  |
|  | f4a47ec717 |  |
|  | 3b51817706 |  |
|  | bbbd2b076f |  |
|  | d248481f13 |  |
|  | efa02ed768 |  |
|  | 5454591b0a |  |
|  | c72da53c10 |  |
|  | 8dd071ad08 |  |
|  | 96d064e305 |  |
|  | c2f46b2cdb |  |
|  | 808248049d |  |
|  | a6e6e9bb86 |  |
|  | 90579021f8 |  |
|  | 539672a7fd |  |
|  | 269f85b7b7 |  |
|  | 3adb1e12ca |  |
|  | b8df15cd64 |  |
|  | 4d72288487 |  |
|  | 3c6eccd701 |  |
|  | 7de6a1b78e |  |
|  | a2681f950d |  |
|  | 3f64b8a761 |  |
|  | 0a1be1d501 |  |
|  | e3056340da |  |
|  | 99b5a7226c |  |
22  .github/PULL_REQUEST_TEMPLATE.md  (vendored)

@@ -1,28 +1,20 @@
<!-- Thank you for contributing to LangChain!

Replace this comment with:
Replace this entire comment with:
- Description: a description of the change,
- Issue: the issue # it fixes (if applicable),
- Dependencies: any dependencies required for this change,
- Tag maintainer: for a quicker response, tag the relevant maintainer (see below),
- Twitter handle: we announce bigger features on Twitter. If your PR gets announced and you'd like a mention, we'll gladly shout you out!

Please make sure you're PR is passing linting and testing before submitting. Run `make format`, `make lint` and `make test` to check this locally.
Please make sure your PR is passing linting and testing before submitting. Run `make format`, `make lint` and `make test` to check this locally.

See contribution guidelines for more information on how to write/run tests, lint, etc:
https://github.com/hwchase17/langchain/blob/master/.github/CONTRIBUTING.md

If you're adding a new integration, please include:
1. a test for the integration, preferably unit tests that do not rely on network access,
2. an example notebook showing its use.
2. an example notebook showing its use. These live is docs/extras directory.

Maintainer responsibilities:
- General / Misc / if you don't know who to tag: @baskaryan
- DataLoaders / VectorStores / Retrievers: @rlancemartin, @eyurtsev
- Models / Prompts: @hwchase17, @baskaryan
- Memory: @hwchase17
- Agents / Tools / Toolkits: @hinthornw
- Tracing / Callbacks: @agola11
- Async: @agola11

If no one reviews your PR within a few days, feel free to @-mention the same people again.

See contribution guidelines for more information on how to write/run tests, lint, etc: https://github.com/hwchase17/langchain/blob/master/.github/CONTRIBUTING.md
If no one reviews your PR within a few days, please @-mention one of @baskaryan, @eyurtsev, @hwchase17, @rlancemartin.
-->
10  .github/workflows/scheduled_test.yml  (vendored)

@@ -1,7 +1,8 @@
name: Scheduled tests

on:
  scheduled:
  workflow_dispatch:  # Allows to trigger the workflow manually in GitHub UI
  schedule:
    - cron: '0 13 * * *'

env:
@@ -9,6 +10,9 @@ env:

jobs:
  build:
    defaults:
      run:
        working-directory: libs/langchain
    runs-on: ubuntu-latest
    environment: Scheduled testing
    strategy:
@@ -26,13 +30,13 @@ jobs:
        with:
          python-version: ${{ matrix.python-version }}
          poetry-version: "1.4.2"
          working-directory: libs/langchain
          install-command: |
            echo "Running scheduled tests, installing dependencies with poetry..."
            poetry install -E scheduled_testing
            poetry install --with=test_integration
      - name: Run tests
        env:
          OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
        run: |
          make scheduled_tests
        shell: bash
    secrets: inherit
@@ -21,7 +21,7 @@ Looking for the JS/TS version? Check out [LangChain.js](https://github.com/hwcha
**Production Support:** As you move your LangChains into production, we'd love to offer more hands-on support.
Fill out [this form](https://airtable.com/appwQzlErAS2qiP0L/shrGtGaVBVAz7NcV2) to share more about what you're building, and our team will get in touch.

## 🚨Breaking Changes for select chains (SQLDatabase) on 7/28
## 🚨Breaking Changes for select chains (SQLDatabase) on 7/28/23

In an effort to make `langchain` leaner and safer, we are moving select chains to `langchain_experimental`.
This migration has already started, but we are remaining backwards compatible until 7/28.
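For downstream code, the migration is mostly an import change. A minimal sketch, assuming the chain keeps its `SQLDatabaseChain` name in the new package:

```python
# Before (works until 7/28/23): the chain ships with the main package.
# from langchain.chains import SQLDatabaseChain

# After: install the new package (pip install langchain_experimental),
# then import from it instead.
from langchain_experimental.sql import SQLDatabaseChain
```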
@@ -145,30 +145,36 @@ def _load_package_modules(
package_name = package_path.name

for file_path in package_path.rglob("*.py"):
if not file_path.name.startswith("__"):
relative_module_name = file_path.relative_to(package_path)
# Get the full namespace of the module
namespace = str(relative_module_name).replace(".py", "").replace("/", ".")
# Keep only the top level namespace
top_namespace = namespace.split(".")[0]
if file_path.name.startswith("_"):
continue

try:
module_members = _load_module_members(
f"{package_name}.{namespace}", namespace
relative_module_name = file_path.relative_to(package_path)

if relative_module_name.name.startswith("_"):
continue

# Get the full namespace of the module
namespace = str(relative_module_name).replace(".py", "").replace("/", ".")
# Keep only the top level namespace
top_namespace = namespace.split(".")[0]

try:
module_members = _load_module_members(
f"{package_name}.{namespace}", namespace
)
# Merge module members if the namespace already exists
if top_namespace in modules_by_namespace:
existing_module_members = modules_by_namespace[top_namespace]
_module_members = _merge_module_members(
[existing_module_members, module_members]
)
# Merge module members if the namespace already exists
if top_namespace in modules_by_namespace:
existing_module_members = modules_by_namespace[top_namespace]
_module_members = _merge_module_members(
[existing_module_members, module_members]
)
else:
_module_members = module_members
else:
_module_members = module_members

modules_by_namespace[top_namespace] = _module_members
modules_by_namespace[top_namespace] = _module_members

except ImportError as e:
print(f"Error: Unable to import module '{namespace}' with error: {e}")
except ImportError as e:
print(f"Error: Unable to import module '{namespace}' with error: {e}")

return modules_by_namespace

@@ -222,9 +228,9 @@ Classes
"""

for class_ in classes:
if not class_['is_public']:
if not class_["is_public"]:
continue

if class_["kind"] == "TypedDict":
template = "typeddict.rst"
elif class_["kind"] == "enum":
54  docs/docs_skeleton/docs/community.md  (Normal file)

@@ -0,0 +1,54 @@
# Community Navigator

Hi! Thanks for being here. We're lucky to have a community of so many passionate developers building with LangChain–we have so much to teach and learn from each other. Community members contribute code, host meetups, write blog posts, amplify each other's work, become each other's customers and collaborators, and so much more.

Whether you're new to LangChain, looking to go deeper, or just want to get more exposure to the world of building with LLMs, this page can point you in the right direction.

- **🦜 Contribute to LangChain**

- **🌍 Meetups, Events, and Hackathons**

- **📣 Help Us Amplify Your Work**

- **💬 Stay in the loop**

# 🦜 Contribute to LangChain

LangChain is the product of over 5,000+ contributions by 1,500+ contributors, and there is **still** so much to do together. Here are some ways to get involved:

- **[Open a pull request](https://github.com/langchain-ai/langchain/issues):** we'd appreciate all forms of contributions–new features, infrastructure improvements, better documentation, bug fixes, etc. If you have an improvement or an idea, we'd love to work on it with you.
- **[Read our contributor guidelines:](https://github.com/langchain-ai/langchain/blob/bbd22b9b761389a5e40fc45b0570e1830aabb707/.github/CONTRIBUTING.md)** We ask contributors to follow a ["fork and pull request"](https://docs.github.com/en/get-started/quickstart/contributing-to-projects) workflow, run a few local checks for formatting, linting, and testing before submitting, and follow certain documentation and testing conventions.
- **First time contributor?** [Try one of these PRs with the "good first issue" tag](https://github.com/langchain-ai/langchain/contribute).
- **Become an expert:** our experts help the community by answering product questions in Discord. If that's a role you'd like to play, we'd be so grateful! (And we have some special experts-only goodies/perks we can tell you more about.) Send us an email to introduce yourself at hello@langchain.dev and we'll take it from there!
- **Integrate with LangChain:** if your product integrates with LangChain–or aspires to–we want to help make sure the experience is as smooth as possible for you and end users. Send us an email at hello@langchain.dev and tell us what you're working on.
- **Become an Integration Maintainer:** Partner with our team to ensure your integration stays up-to-date and talk directly with users (and answer their inquiries) in our Discord. Introduce yourself at hello@langchain.dev if you'd like to explore this role.

# 🌍 Meetups, Events, and Hackathons

One of our favorite things about working in AI is how much enthusiasm there is for building together. We want to help make that as easy and impactful for you as possible!

- **Find a meetup, hackathon, or webinar:** you can find the one for you on our [global events calendar](https://mirror-feeling-d80.notion.site/0bc81da76a184297b86ca8fc782ee9a3?v=0d80342540df465396546976a50cfb3f).
- **Submit an event to our calendar:** email us at events@langchain.dev with a link to your event page! We can also help you spread the word with our local communities.
- **Host a meetup:** If you want to bring a group of builders together, we want to help! We can publicize your event on our event calendar/Twitter, share it with our local communities in Discord, send swag, or potentially hook you up with a sponsor. Email us at events@langchain.dev to tell us about your event!
- **Become a meetup sponsor:** we often hear from groups of builders that want to get together, but are blocked or limited on some dimension (space to host, budget for snacks, prizes to distribute, etc.). If you'd like to help, send us an email at events@langchain.dev and we can share more about how it works!
- **Speak at an event:** meetup hosts are always looking for great speakers, presenters, and panelists. If you'd like to do that at an event, send us an email to hello@langchain.dev with more information about yourself, what you want to talk about, and what city you're based in, and we'll try to match you with an upcoming event!
- **Tell us about your LLM community:** If you host or participate in a community that would welcome support from LangChain and/or our team, send us an email at hello@langchain.dev and let us know how we can help.

# 📣 Help Us Amplify Your Work

If you're working on something you're proud of, and think the LangChain community would benefit from knowing about it, we want to help you show it off.

- **Post about your work and mention us:** we love hanging out on Twitter to see what people in the space are talking about and working on. If you tag [@langchainai](https://twitter.com/LangChainAI), we'll almost certainly see it and can show you some love.
- **Publish something on our blog:** if you're writing about your experience building with LangChain, we'd love to post (or crosspost) it on our blog! Email hello@langchain.dev with a draft of your post, or even an idea for something you want to write about.
- **Get your product onto our [integrations hub](https://integrations.langchain.com/):** Many developers take advantage of our seamless integrations with other products, and come to our integrations hub to find out who those are. If you want to get your product up there, tell us about it (and how it works with LangChain) at hello@langchain.dev.

# ☀️ Stay in the loop

Here's where our team hangs out, talks shop, spotlights cool work, and shares what we're up to. We'd love to see you there too.

- **[Twitter](https://twitter.com/LangChainAI):** we post about what we're working on and what cool things we're seeing in the space. If you tag @langchainai in your post, we'll almost certainly see it and can show you some love!
- **[Discord](https://discord.gg/6adMQxSpJS):** connect with >30k developers who are building with LangChain.
- **[GitHub](https://github.com/langchain-ai/langchain):** open pull requests, contribute to discussions, and contribute code.
- **[Subscribe to our bi-weekly Release Notes](https://6w1pwbss0py.typeform.com/to/KjZB1auB):** a twice-monthly email roundup of the coolest things going on in our orbit.
- **Slack:** if you're building an application in production at your company, we'd love to get into a Slack channel together. Fill out [this form](https://airtable.com/appwQzlErAS2qiP0L/shrGtGaVBVAz7NcV2) and we'll get in touch about setting one up.
@@ -8,9 +8,9 @@ import DocCardList from "@theme/DocCardList";

Building applications with language models involves many moving parts. One of the most critical components is ensuring that the outcomes produced by your models are reliable and useful across a broad array of inputs, and that they work well with your application's other software components. Ensuring reliability usually boils down to some combination of application design, testing & evaluation, and runtime checks.

The guides in this section review the APIs and functionality LangChain provides to help yous better evaluate your applications. Evaluation and testing are both critical when thinking about deploying LLM applications, since production environments require repeatable and useful outcomes.
The guides in this section review the APIs and functionality LangChain provides to help you better evaluate your applications. Evaluation and testing are both critical when thinking about deploying LLM applications, since production environments require repeatable and useful outcomes.

LangChain offers various types of evaluators to help you measure performance and integrity on diverse data, and we hope to encourage the the community to create and share other useful evaluators so everyone can improve. These docs will introduce the evaluator types, how to use them, and provide some examples of their use in real-world scenarios.
LangChain offers various types of evaluators to help you measure performance and integrity on diverse data, and we hope to encourage the community to create and share other useful evaluators so everyone can improve. These docs will introduce the evaluator types, how to use them, and provide some examples of their use in real-world scenarios.

Each evaluator type in LangChain comes with ready-to-use implementations and an extensible API that allows for customization according to your unique requirements. Here are some of the types of evaluators we offer:
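For orientation, a minimal sketch of loading and running one of these evaluators, assuming the `criteria` evaluator exposed through `load_evaluator` (it falls back to an OpenAI model by default, so an API key is needed):

```python
from langchain.evaluation import load_evaluator

# Grade a prediction against a single built-in criterion.
evaluator = load_evaluator("criteria", criteria="conciseness")

result = evaluator.evaluate_strings(
    prediction="Paris is the capital of France, a fact known to nearly everyone.",
    input="What is the capital of France?",
)
print(result)  # e.g. {'reasoning': ..., 'value': 'N', 'score': 0}
```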
@@ -5,8 +5,8 @@ import DocCardList from "@theme/DocCardList";

LangSmith helps you trace and evaluate your language model applications and intelligent agents to help you
move from prototype to production.

Check out the [interactive walkthrough](walkthrough) below to get started.
Check out the [interactive walkthrough](/docs/guides/langsmith/walkthrough) below to get started.

For more information, please refer to the [LangSmith documentation](https://docs.smith.langchain.com/)

<DocCardList />
<DocCardList />
@@ -12,7 +12,7 @@ Here are the agents available in LangChain.

### [Zero-shot ReAct](/docs/modules/agents/agent_types/react.html)

This agent uses the [ReAct](https://arxiv.org/pdf/2205.00445.pdf) framework to determine which tool to use
This agent uses the [ReAct](https://arxiv.org/pdf/2210.03629) framework to determine which tool to use
based solely on the tool's description. Any number of tools can be provided.
This agent requires that a description is provided for each tool.
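A minimal sketch of wiring up this agent type, assuming the `initialize_agent` helper; the calculator tool here is a toy stand-in:

```python
from langchain.agents import AgentType, Tool, initialize_agent
from langchain.llms import OpenAI

llm = OpenAI(temperature=0)

# The agent picks a tool based only on these descriptions.
tools = [
    Tool(
        name="Calculator",
        func=lambda expr: str(eval(expr)),  # toy stand-in for a real math tool
        description="Evaluates simple arithmetic expressions like '7 * 13'.",
    )
]

agent = initialize_agent(
    tools, llm, agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION, verbose=True
)
agent.run("What is 7 * 13?")
```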
@@ -1,9 +0,0 @@
---
sidebar_position: 0
---
# API chains
APIChain enables using LLMs to interact with APIs to retrieve relevant information. Construct the chain by providing a question relevant to the provided API documentation.

import Example from "@snippets/modules/chains/popular/api.mdx"

<Example/>
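The page being removed described a pattern like the following sketch, assuming `APIChain.from_llm_and_api_docs`; the API docs string is an illustrative placeholder:

```python
from langchain.chains import APIChain
from langchain.llms import OpenAI

llm = OpenAI(temperature=0)

# Normally this is the provider's real API documentation text;
# a short placeholder is used here for illustration.
api_docs = """BASE URL: https://api.open-meteo.com/
The /v1/forecast endpoint takes latitude and longitude parameters
and returns the current weather for that location."""

chain = APIChain.from_llm_and_api_docs(llm, api_docs, verbose=True)
chain.run("What is the weather like right now in Munich, Germany, in degrees Celsius?")
```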
9  docs/docs_skeleton/docs/use_cases/web_scraping/index.mdx  (Normal file)

@@ -0,0 +1,9 @@
---
sidebar_position: 3
---

# Web Scraping

Web scraping has historically been a challenging endeavor due to the ever-changing nature of website structures, making it tedious for developers to maintain their scraping scripts. Traditional methods often rely on specific HTML tags and patterns which, when altered, can disrupt data extraction processes.

Enter the LLM-based method for parsing HTML: by leveraging the capabilities of LLMs, and especially OpenAI Functions in LangChain's extraction chain, developers can instruct the model to extract only the desired data in a specified format. This method not only streamlines the extraction process but also significantly reduces the time spent on manual debugging and script modifications. Its adaptability means that even if websites undergo significant design changes, the extraction remains consistent and robust. This level of resilience translates to reduced maintenance effort and cost savings, and ensures a higher quality of extracted data. Compared to its predecessors, the LLM-based approach wins out in the web scraping domain by transforming a historically cumbersome task into a more automated and efficient process.
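A rough sketch of that idea, assuming an OpenAI function-calling model and LangChain's `create_extraction_chain`; the schema and page text are illustrative:

```python
from langchain.chains import create_extraction_chain
from langchain.chat_models import ChatOpenAI

# Illustrative schema: pull headlines and links out of scraped page text.
schema = {
    "properties": {
        "news_article_title": {"type": "string"},
        "news_article_url": {"type": "string"},
    },
    "required": ["news_article_title", "news_article_url"],
}

llm = ChatOpenAI(temperature=0, model="gpt-3.5-turbo-0613")
chain = create_extraction_chain(schema=schema, llm=llm)

# In practice this text would come from a scraper or document loader.
page_content = "<h1><a href='/story-1'>Example headline</a></h1> ..."
print(chain.run(page_content))
```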
8  docs/docs_skeleton/package-lock.json  (generated)

@@ -12,7 +12,7 @@
        "@docusaurus/preset-classic": "2.4.0",
        "@docusaurus/remark-plugin-npm2yarn": "^2.4.0",
        "@mdx-js/react": "^1.6.22",
        "@mendable/search": "^0.0.137",
        "@mendable/search": "^0.0.150",
        "clsx": "^1.2.1",
        "json-loader": "^0.5.7",
        "process": "^0.11.10",
@@ -3212,9 +3212,9 @@
      }
    },
    "node_modules/@mendable/search": {
      "version": "0.0.137",
      "resolved": "https://registry.npmjs.org/@mendable/search/-/search-0.0.137.tgz",
      "integrity": "sha512-2J2fd5eqToK+mLzrSDA6NAr4F1kfql7QRiHpD7AUJJX0nqpvInhr/mMJKBCUSCv2z76UKCmF5wLuPSw+C90Qdg==",
      "version": "0.0.150",
      "resolved": "https://registry.npmjs.org/@mendable/search/-/search-0.0.150.tgz",
      "integrity": "sha512-Eb5SeAWlMxzEim/8eJ/Ysn01Pyh39xlPBzRBw/5OyOBhti0HVLXk4wd1Fq2TKgJC2ppQIvhEKO98PUcj9dNDFw==",
      "dependencies": {
        "html-react-parser": "^4.2.0",
        "posthog-js": "^1.45.1"
@@ -23,7 +23,7 @@
    "@docusaurus/preset-classic": "2.4.0",
    "@docusaurus/remark-plugin-npm2yarn": "^2.4.0",
    "@mdx-js/react": "^1.6.22",
    "@mendable/search": "^0.0.137",
    "@mendable/search": "^0.0.150",
    "clsx": "^1.2.1",
    "json-loader": "^0.5.7",
    "process": "^0.11.10",
@@ -75,6 +75,7 @@ module.exports = {
        slug: "additional_resources",
      },
    },
    'community'
  ],
  integrations: [
    {
BIN  docs/docs_skeleton/static/img/api_chain.png  (Normal file)  After: 471 KiB
BIN  docs/docs_skeleton/static/img/api_chain_response.png  (Normal file)  After: 520 KiB
BIN  docs/docs_skeleton/static/img/api_function_call.png  (Normal file)  After: 98 KiB
BIN  docs/docs_skeleton/static/img/api_use_case.png  (Normal file)  After: 117 KiB
BIN  docs/docs_skeleton/static/img/code_retrieval.png  (Normal file)  After: 307 KiB
BIN  docs/docs_skeleton/static/img/code_understanding.png  (Normal file)  After: 193 KiB
BIN  docs/docs_skeleton/static/img/tagging.png  (Normal file)  After: 111 KiB
BIN  docs/docs_skeleton/static/img/tagging_trace.png  (Normal file)  After: 130 KiB
BIN  docs/docs_skeleton/static/img/web_research.png  (Normal file)  After: 152 KiB
BIN  docs/docs_skeleton/static/img/web_scraping.png  (Normal file)  After: 172 KiB
BIN  docs/docs_skeleton/static/img/wsj_page.png  (Normal file)  After: 716 KiB
@@ -556,6 +556,14 @@
      "source": "/docs/integrations/llamacpp",
      "destination": "/docs/integrations/providers/llamacpp"
    },
    {
      "source": "/en/latest/integrations/log10.html",
      "destination": "/docs/integrations/providers/log10"
    },
    {
      "source": "/docs/integrations/log10",
      "destination": "/docs/integrations/providers/log10"
    },
    {
      "source": "/en/latest/integrations/mediawikidump.html",
      "destination": "/docs/integrations/providers/mediawikidump"
323  docs/extras/guides/adapters/openai.ipynb  (Normal file)

@@ -0,0 +1,323 @@
# OpenAI Adapter

A lot of people get started with OpenAI but want to explore other models. LangChain's integrations with many model providers make this easy to do. While LangChain has its own message and model APIs, we've also made it as easy as possible to explore other models by exposing an adapter that adapts LangChain models to the OpenAI API.

At the moment this only deals with output and does not return other information (token counts, stop reasons, etc.).

```python
import openai
from langchain.adapters import openai as lc_openai
```

## ChatCompletion.create

```python
messages = [{"role": "user", "content": "hi"}]
```

Original OpenAI call:

```python
result = openai.ChatCompletion.create(
    messages=messages,
    model="gpt-3.5-turbo",
    temperature=0
)
result["choices"][0]["message"].to_dict_recursive()
```

```
{'role': 'assistant', 'content': 'Hello! How can I assist you today?'}
```

LangChain OpenAI wrapper call:

```python
lc_result = lc_openai.ChatCompletion.create(
    messages=messages,
    model="gpt-3.5-turbo",
    temperature=0
)
lc_result["choices"][0]["message"]
```

```
{'role': 'assistant', 'content': 'Hello! How can I assist you today?'}
```

Swapping out model providers:

```python
lc_result = lc_openai.ChatCompletion.create(
    messages=messages,
    model="claude-2",
    temperature=0,
    provider="ChatAnthropic"
)
lc_result["choices"][0]["message"]
```

```
{'role': 'assistant', 'content': ' Hello!'}
```

## ChatCompletion.stream

Original OpenAI call:

```python
for c in openai.ChatCompletion.create(
    messages=messages,
    model="gpt-3.5-turbo",
    temperature=0,
    stream=True
):
    print(c["choices"][0]["delta"].to_dict_recursive())
```

```
{'role': 'assistant', 'content': ''}
{'content': 'Hello'}
{'content': '!'}
{'content': ' How'}
{'content': ' can'}
{'content': ' I'}
{'content': ' assist'}
{'content': ' you'}
{'content': ' today'}
{'content': '?'}
{}
```

LangChain OpenAI wrapper call:

```python
for c in lc_openai.ChatCompletion.create(
    messages=messages,
    model="gpt-3.5-turbo",
    temperature=0,
    stream=True
):
    print(c["choices"][0]["delta"])
```

```
{'role': 'assistant', 'content': ''}
{'content': 'Hello'}
{'content': '!'}
{'content': ' How'}
{'content': ' can'}
{'content': ' I'}
{'content': ' assist'}
{'content': ' you'}
{'content': ' today'}
{'content': '?'}
{}
```

Swapping out model providers:

```python
for c in lc_openai.ChatCompletion.create(
    messages=messages,
    model="claude-2",
    temperature=0,
    stream=True,
    provider="ChatAnthropic",
):
    print(c["choices"][0]["delta"])
```

```
{'role': 'assistant', 'content': ' Hello'}
{'content': '!'}
{}
```
@@ -1648,6 +1648,186 @@
## Conversational Retrieval With Memory

## Fallbacks

With LCEL you can easily introduce fallbacks for any Runnable component, like an LLM.

```python
from langchain.chat_models import ChatOpenAI

bad_llm = ChatOpenAI(model_name="gpt-fake")
good_llm = ChatOpenAI(model_name="gpt-3.5-turbo")
llm = bad_llm.with_fallbacks([good_llm])

llm.invoke("Why did the the chicken cross the road?")
```

```
AIMessage(content='To get to the other side.', additional_kwargs={}, example=False)
```

Looking at the trace, we can see that the first model failed but the second succeeded, so we still got an output: https://smith.langchain.com/public/dfaf0bf6-d86d-43e9-b084-dd16a56df15c/r

We can add an arbitrary sequence of fallbacks, which will be executed in order:

```python
llm = bad_llm.with_fallbacks([bad_llm, bad_llm, good_llm])

llm.invoke("Why did the the chicken cross the road?")
```

```
AIMessage(content='To get to the other side.', additional_kwargs={}, example=False)
```

Trace: https://smith.langchain.com/public/c09efd01-3184-4369-a225-c9da8efcaf47/r

We can continue to use our Runnable with fallbacks the same way we use any Runnable, meaning we can include it in sequences:

```python
from langchain.prompts import ChatPromptTemplate

prompt = ChatPromptTemplate.from_messages(
    [
        ("system", "You're a nice assistant who always includes a compliment in your response"),
        ("human", "Why did the {animal} cross the road"),
    ]
)
chain = prompt | llm
chain.invoke({"animal": "kangaroo"})
```

```
AIMessage(content='To show off its incredible jumping skills! Kangaroos are truly amazing creatures.', additional_kwargs={}, example=False)
```

Trace: https://smith.langchain.com/public/ba03895f-f8bd-4c70-81b7-8b930353eabd/r

Note, since every sequence of Runnables is itself a Runnable, we can create fallbacks for whole sequences. We can also continue using the full interface, including asynchronous calls, batched calls, and streams:

```python
from langchain.llms import OpenAI
from langchain.prompts import PromptTemplate

chat_prompt = ChatPromptTemplate.from_messages(
    [
        ("system", "You're a nice assistant who always includes a compliment in your response"),
        ("human", "Why did the {animal} cross the road"),
    ]
)
chat_model = ChatOpenAI(model_name="gpt-fake")

prompt_template = """Instructions: You should always include a compliment in your response.

Question: Why did the {animal} cross the road?"""
prompt = PromptTemplate.from_template(prompt_template)
llm = OpenAI()

bad_chain = chat_prompt | chat_model
good_chain = prompt | llm
chain = bad_chain.with_fallbacks([good_chain])
await chain.abatch([{"animal": "rabbit"}, {"animal": "turtle"}])
```

```
["\n\nAnswer: The rabbit crossed the road to get to the other side. That's quite clever of him!",
 '\n\nAnswer: The turtle crossed the road to get to the other side. You must be pretty clever to come up with that riddle!']
```

Traces:
1. https://smith.langchain.com/public/ccd73236-9ae5-48a6-94b5-41210be18a46/r
2. https://smith.langchain.com/public/f43f608e-075c-45c7-bf73-b64e4d3f3082/r

@@ -1666,7 +1846,7 @@
    "name": "python",
    "nbconvert_exporter": "python",
    "pygments_lexer": "ipython3",
    "version": "3.10.1"
    "version": "3.9.1"
@@ -147,7 +147,7 @@
    api_key=os.environ["ARGILLA_API_KEY"],
)

dataset.push_to_argilla("langchain-dataset")
dataset.push_to_argilla("langchain-dataset");
382  docs/extras/integrations/callbacks/labelstudio.ipynb  (Normal file)

@@ -0,0 +1,382 @@
# Label Studio

<img src="https://labelstudio-pub.s3.amazonaws.com/lc/open-source-data-labeling-platform.png" width="400"/>

Label Studio is an open-source data labeling platform that provides LangChain with flexibility when it comes to labeling data for fine-tuning large language models (LLMs). It also enables the preparation of custom training data and the collection and evaluation of responses through human feedback.

In this guide, you will learn how to connect a LangChain pipeline to Label Studio to:

- Aggregate all input prompts, conversations, and responses in a single Label Studio project. This consolidates all the data in one place for easier labeling and analysis.
- Refine prompts and responses to create a dataset for supervised fine-tuning (SFT) and reinforcement learning from human feedback (RLHF) scenarios. The labeled data can be used to further train the LLM to improve its performance.
- Evaluate model responses through human feedback. Label Studio provides an interface for humans to review and provide feedback on model responses, allowing evaluation and iteration.

## Installation and setup

First install the latest versions of Label Studio and the Label Studio API client:

```python
!pip install -U label-studio label-studio-sdk openai
```

Next, run `label-studio` on the command line to start the local Label Studio instance at `http://localhost:8080`. See the [Label Studio installation guide](https://labelstud.io/guide/install) for more options.

You'll need a token to make API calls.

Open your Label Studio instance in your browser, go to `Account & Settings > Access Token` and copy the key.

Set environment variables with your Label Studio URL, API key and OpenAI API key:

```python
import os

os.environ['LABEL_STUDIO_URL'] = '<YOUR-LABEL-STUDIO-URL>'  # e.g. http://localhost:8080
os.environ['LABEL_STUDIO_API_KEY'] = '<YOUR-LABEL-STUDIO-API-KEY>'
os.environ['OPENAI_API_KEY'] = '<YOUR-OPENAI-API-KEY>'
```

## Collecting LLMs prompts and responses

The data used for labeling is stored in projects within Label Studio. Every project is identified by an XML configuration that details the specifications for input and output data.

Create a project that takes human input in text format and outputs an editable LLM response in a text area:

```xml
<View>
<Style>
    .prompt-box {
        background-color: white;
        border-radius: 10px;
        box-shadow: 0px 4px 6px rgba(0, 0, 0, 0.1);
        padding: 20px;
    }
</Style>
<View className="root">
    <View className="prompt-box">
        <Text name="prompt" value="$prompt"/>
    </View>
    <TextArea name="response" toName="prompt"
              maxSubmissions="1" editable="true"
              required="true"/>
</View>
<Header value="Rate the response:"/>
<Rating name="rating" toName="prompt"/>
</View>
```

1. To create a project in Label Studio, click on the "Create" button.
2. Enter a name for your project in the "Project Name" field, such as `My Project`.
3. Navigate to `Labeling Setup > Custom Template` and paste the XML configuration provided above.

You can collect input LLM prompts and output responses in a Label Studio project, connecting it via `LabelStudioCallbackHandler`:

```python
from langchain.llms import OpenAI
from langchain.callbacks import LabelStudioCallbackHandler

llm = OpenAI(
    temperature=0,
    callbacks=[
        LabelStudioCallbackHandler(
            project_name="My Project"
        )]
)
print(llm("Tell me a joke"))
```

In Label Studio, open `My Project`. You will see the prompts, responses, and metadata like the model name.

## Collecting Chat model Dialogues

You can also track and display full chat dialogues in Label Studio, with the ability to rate and modify the last response:

1. Open Label Studio and click on the "Create" button.
2. Enter a name for your project in the "Project Name" field, such as `New Project with Chat`.
3. Navigate to `Labeling Setup > Custom Template` and paste the following XML configuration:

```xml
<View>
<View className="root">
    <Paragraphs name="dialogue"
                value="$prompt"
                layout="dialogue"
                textKey="content"
                nameKey="role"
                granularity="sentence"/>
    <Header value="Final response:"/>
    <TextArea name="response" toName="dialogue"
              maxSubmissions="1" editable="true"
              required="true"/>
</View>
<Header value="Rate the response:"/>
<Rating name="rating" toName="dialogue"/>
</View>
```

```python
from langchain.chat_models import ChatOpenAI
from langchain.schema import HumanMessage, SystemMessage
from langchain.callbacks import LabelStudioCallbackHandler

chat_llm = ChatOpenAI(callbacks=[
    LabelStudioCallbackHandler(
        mode="chat",
        project_name="New Project with Chat",
    )
])
llm_results = chat_llm([
    SystemMessage(content="Always use a lot of emojis"),
    HumanMessage(content="Tell me a joke")
])
```

In Label Studio, open "New Project with Chat". Click on a created task to view the dialog history and edit/annotate responses.

## Custom Labeling Configuration

You can modify the default labeling configuration in Label Studio to add more target labels like response sentiment, relevance, and many [other types of annotator feedback](https://labelstud.io/tags/).

A new labeling configuration can be added from the UI: go to `Settings > Labeling Interface` and set up a custom configuration with additional tags like `Choices` for sentiment or `Rating` for relevance. Keep in mind that the [`TextArea` tag](https://labelstud.io/tags/textarea) should be present in any configuration to display the LLM responses.

Alternatively, you can specify the labeling configuration on the initial call before project creation:

```python
ls = LabelStudioCallbackHandler(project_config='''
<View>
<Text name="prompt" value="$prompt"/>
<TextArea name="response" toName="prompt"/>
<TextArea name="user_feedback" toName="prompt"/>
<Rating name="rating" toName="prompt"/>
<Choices name="sentiment" toName="prompt">
    <Choice value="Positive"/>
    <Choice value="Negative"/>
</Choices>
</View>
''')
```

Note that if the project doesn't exist, it will be created with the specified labeling configuration.

## Other parameters

The `LabelStudioCallbackHandler` accepts several optional parameters:

- **api_key** - Label Studio API key. Overrides the environment variable `LABEL_STUDIO_API_KEY`.
- **url** - Label Studio URL. Overrides `LABEL_STUDIO_URL`, default `http://localhost:8080`.
- **project_id** - Existing Label Studio project ID. Overrides `LABEL_STUDIO_PROJECT_ID`. Stores data in this project.
- **project_name** - Project name if project ID not specified. Creates a new project. Default is `"LangChain-%Y-%m-%d"` formatted with the current date.
- **project_config** - [custom labeling configuration](#custom-labeling-configuration)
- **mode** - use this shortcut to create a target configuration from scratch:
    - `"prompt"` - Single prompt, single response. Default.
    - `"chat"` - Multi-turn chat mode.
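To tie the parameter list together, a sketch of a fully explicit handler; the URL, key, and project name are placeholders:

```python
from langchain.callbacks import LabelStudioCallbackHandler
from langchain.llms import OpenAI

# url and api_key override the LABEL_STUDIO_URL / LABEL_STUDIO_API_KEY
# environment variables; project_name is only used when no project_id is given.
handler = LabelStudioCallbackHandler(
    url="http://localhost:8080",
    api_key="<YOUR-LABEL-STUDIO-API-KEY>",
    project_name="My Prompt Collection",
    mode="prompt",  # single prompt, single response (the default)
)

llm = OpenAI(temperature=0, callbacks=[handler])
llm("Tell me a joke")
```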
@@ -74,6 +74,124 @@

## Model Version

Azure OpenAI responses contain the `model` property, which is the name of the model used to generate the response. However, unlike native OpenAI responses, it does not contain the version of the model, which is set on the deployment in Azure. This makes it tricky to know which version of the model was used to generate the response, which as a result can lead to, e.g., an incorrect total cost calculation with `OpenAICallbackHandler`.

To solve this problem, you can pass the `model_version` parameter to the `AzureChatOpenAI` class, which will be added to the model name in the LLM output. This way you can easily distinguish between different versions of the model.

```python
from langchain.callbacks import get_openai_callback
```

```python
BASE_URL = "https://{endpoint}.openai.azure.com"
API_KEY = "..."
DEPLOYMENT_NAME = "gpt-35-turbo"  # in Azure, this deployment has version 0613 - input and output tokens are counted separately
```

```python
model = AzureChatOpenAI(
    openai_api_base=BASE_URL,
    openai_api_version="2023-05-15",
    deployment_name=DEPLOYMENT_NAME,
    openai_api_key=API_KEY,
    openai_api_type="azure",
)
with get_openai_callback() as cb:
    model(
        [
            HumanMessage(
                content="Translate this sentence from English to French. I love programming."
            )
        ]
    )
    # without specifying the model version, a flat rate of 0.002 USD per 1k input and output tokens is used
    print(f"Total Cost (USD): ${format(cb.total_cost, '.6f')}")
```

```
Total Cost (USD): $0.000054
```

We can provide the model version to the `AzureChatOpenAI` constructor. It will get appended to the model name returned by Azure OpenAI, and the cost will be counted correctly.

```python
model0613 = AzureChatOpenAI(
    openai_api_base=BASE_URL,
    openai_api_version="2023-05-15",
    deployment_name=DEPLOYMENT_NAME,
    openai_api_key=API_KEY,
    openai_api_type="azure",
    model_version="0613"
)
with get_openai_callback() as cb:
    model0613(
        [
            HumanMessage(
                content="Translate this sentence from English to French. I love programming."
            )
        ]
    )
    print(f"Total Cost (USD): ${format(cb.total_cost, '.6f')}")
```

```
Total Cost (USD): $0.000044
```

@@ -92,7 +210,7 @@
    "name": "python",
    "nbconvert_exporter": "python",
    "pygments_lexer": "ipython3",
    "version": "3.9.1"
    "version": "3.8.10"
325  docs/extras/integrations/document_loaders/arcgis.ipynb  (Normal file)

@@ -0,0 +1,325 @@
# ArcGISLoader

This notebook demonstrates the use of the `langchain.document_loaders.ArcGISLoader` class.

You will need to install the ArcGIS API for Python (`arcgis`) and, optionally, `bs4.BeautifulSoup`.

You can use an `arcgis.gis.GIS` object for authenticated data loading, or leave it blank to access public data.

```python
from langchain.document_loaders import ArcGISLoader


url = "https://maps1.vcgov.org/arcgis/rest/services/Beaches/MapServer/7"

loader = ArcGISLoader(url)
```

```python
%%time

docs = loader.load()
```

```
CPU times: user 4.04 ms, sys: 1.63 ms, total: 5.67 ms
Wall time: 644 ms
```

```python
docs[0].metadata.keys()
```

```
dict_keys(['url', 'layer_description', 'item_description', 'layer_properties'])
```

The layer properties:

```
KeysView({
  "currentVersion": 10.81,
  "id": 7,
  "name": "Beach Ramps",
  "type": "Feature Layer",
  "description": "",
  "geometryType": "esriGeometryPoint",
  "sourceSpatialReference": {
    "wkid": 2881,
    "latestWkid": 2881
  },
  "copyrightText": "",
  "parentLayer": null,
  "subLayers": [],
  "minScale": 750000,
  "maxScale": 0,
  "drawingInfo": {
    "renderer": {
      "type": "simple",
      "symbol": {
        "type": "esriPMS",
        "url": "9bb2e5ca499bb68aa3ee0d4e1ecc3849",
        "imageData": "iVBORw0KGgoAAAANSUhEUgAAABAAAAAQCAYAAAAf8/9hAAAAAXNSR0IB2cksfwAAAAlwSFlzAAAOxAAADsQBlSsOGwAAAJJJREFUOI3NkDEKg0AQRZ9kkSnSGBshR7DJqdJYeg7BMpcS0uQWQsqoCLExkcUJzGqT38zw2fcY1rEzbp7vjXz0EXC7gBxs1ABcG/8CYkCcDqwyLqsV+RlV0I/w7PzuJBArr1VB20H58Ls6h+xoFITkTwWpQJX7XSIBAnFwVj7MLAjJV/AC6G3QoAmK+74Lom04THTBEp/HCSc6AAAAAElFTkSuQmCC",
        "contentType": "image/png",
        "width": 12,
        "height": 12,
        "angle": 0,
        "xoffset": 0,
        "yoffset": 0
      },
      "label": "",
      "description": ""
    },
    "transparency": 0,
    "labelingInfo": null
  },
  "defaultVisibility": true,
  "extent": {
    "xmin": -81.09480168806815,
    "ymin": 28.858349245353473,
    "xmax": -80.77512908572814,
    "ymax": 29.41078388840041,
    "spatialReference": {
      "wkid": 4326,
      "latestWkid": 4326
    }
  },
  "hasAttachments": false,
  "htmlPopupType": "esriServerHTMLPopupTypeNone",
  "displayField": "AccessName",
  "typeIdField": null,
  "subtypeFieldName": null,
  "subtypeField": null,
  "defaultSubtypeCode": null,
  "fields": [
    {
      "name": "OBJECTID",
      "type": "esriFieldTypeOID",
      "alias": "OBJECTID",
      "domain": null
    },
    {
      "name": "Shape",
      "type": "esriFieldTypeGeometry",
      "alias": "Shape",
      "domain": null
    },
    {
      "name": "AccessName",
      "type": "esriFieldTypeString",
      "alias": "AccessName",
```
|
||||
" \"length\": 40,\n",
|
||||
" \"domain\": null\n",
|
||||
" },\n",
|
||||
" {\n",
|
||||
" \"name\": \"AccessID\",\n",
|
||||
" \"type\": \"esriFieldTypeString\",\n",
|
||||
" \"alias\": \"AccessID\",\n",
|
||||
" \"length\": 50,\n",
|
||||
" \"domain\": null\n",
|
||||
" },\n",
|
||||
" {\n",
|
||||
" \"name\": \"AccessType\",\n",
|
||||
" \"type\": \"esriFieldTypeString\",\n",
|
||||
" \"alias\": \"AccessType\",\n",
|
||||
" \"length\": 25,\n",
|
||||
" \"domain\": null\n",
|
||||
" },\n",
|
||||
" {\n",
|
||||
" \"name\": \"GeneralLoc\",\n",
|
||||
" \"type\": \"esriFieldTypeString\",\n",
|
||||
" \"alias\": \"GeneralLoc\",\n",
|
||||
" \"length\": 100,\n",
|
||||
" \"domain\": null\n",
|
||||
" },\n",
|
||||
" {\n",
|
||||
" \"name\": \"MilePost\",\n",
|
||||
" \"type\": \"esriFieldTypeDouble\",\n",
|
||||
" \"alias\": \"MilePost\",\n",
|
||||
" \"domain\": null\n",
|
||||
" },\n",
|
||||
" {\n",
|
||||
" \"name\": \"City\",\n",
|
||||
" \"type\": \"esriFieldTypeString\",\n",
|
||||
" \"alias\": \"City\",\n",
|
||||
" \"length\": 50,\n",
|
||||
" \"domain\": null\n",
|
||||
" },\n",
|
||||
" {\n",
|
||||
" \"name\": \"AccessStatus\",\n",
|
||||
" \"type\": \"esriFieldTypeString\",\n",
|
||||
" \"alias\": \"AccessStatus\",\n",
|
||||
" \"length\": 50,\n",
|
||||
" \"domain\": null\n",
|
||||
" },\n",
|
||||
" {\n",
|
||||
" \"name\": \"Entry_Date_Time\",\n",
|
||||
" \"type\": \"esriFieldTypeDate\",\n",
|
||||
" \"alias\": \"Entry_Date_Time\",\n",
|
||||
" \"length\": 8,\n",
|
||||
" \"domain\": null\n",
|
||||
" },\n",
|
||||
" {\n",
|
||||
" \"name\": \"DrivingZone\",\n",
|
||||
" \"type\": \"esriFieldTypeString\",\n",
|
||||
" \"alias\": \"DrivingZone\",\n",
|
||||
" \"length\": 50,\n",
|
||||
" \"domain\": null\n",
|
||||
" }\n",
|
||||
" ],\n",
|
||||
" \"geometryField\": {\n",
|
||||
" \"name\": \"Shape\",\n",
|
||||
" \"type\": \"esriFieldTypeGeometry\",\n",
|
||||
" \"alias\": \"Shape\"\n",
|
||||
" },\n",
|
||||
" \"indexes\": null,\n",
|
||||
" \"subtypes\": [],\n",
|
||||
" \"relationships\": [],\n",
|
||||
" \"canModifyLayer\": true,\n",
|
||||
" \"canScaleSymbols\": false,\n",
|
||||
" \"hasLabels\": false,\n",
|
||||
" \"capabilities\": \"Map,Query,Data\",\n",
|
||||
" \"maxRecordCount\": 1000,\n",
|
||||
" \"supportsStatistics\": true,\n",
|
||||
" \"supportsAdvancedQueries\": true,\n",
|
||||
" \"supportedQueryFormats\": \"JSON, geoJSON\",\n",
|
||||
" \"isDataVersioned\": false,\n",
|
||||
" \"ownershipBasedAccessControlForFeatures\": {\n",
|
||||
" \"allowOthersToQuery\": true\n",
|
||||
" },\n",
|
||||
" \"useStandardizedQueries\": true,\n",
|
||||
" \"advancedQueryCapabilities\": {\n",
|
||||
" \"useStandardizedQueries\": true,\n",
|
||||
" \"supportsStatistics\": true,\n",
|
||||
" \"supportsHavingClause\": true,\n",
|
||||
" \"supportsCountDistinct\": true,\n",
|
||||
" \"supportsOrderBy\": true,\n",
|
||||
" \"supportsDistinct\": true,\n",
|
||||
" \"supportsPagination\": true,\n",
|
||||
" \"supportsTrueCurve\": true,\n",
|
||||
" \"supportsReturningQueryExtent\": true,\n",
|
||||
" \"supportsQueryWithDistance\": true,\n",
|
||||
" \"supportsSqlExpression\": true\n",
|
||||
" },\n",
|
||||
" \"supportsDatumTransformation\": true,\n",
|
||||
" \"dateFieldsTimeReference\": null,\n",
|
||||
" \"supportsCoordinatesQuantization\": true\n",
|
||||
"})"
|
||||
]
|
||||
},
|
||||
"execution_count": 4,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"docs[0].metadata['layer_properties'].keys()"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 5,
|
||||
"id": "1d132b7d-5a13-4d66-98e8-785ffdf87af0",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"{\"OBJECTID\": 2, \"AccessName\": \"27TH AV\", \"AccessID\": \"NS-141\", \"AccessType\": \"OPEN VEHICLE RAMP\", \"GeneralLoc\": \"3600 BLK S ATLANTIC AV\", \"MilePost\": 4.83, \"City\": \"NEW SMYRNA BEACH\", \"AccessStatus\": \"OPEN\", \"Entry_Date_Time\": 1691397348000, \"DrivingZone\": \"BOTH\"}\n",
|
||||
"{\"OBJECTID\": 7, \"AccessName\": \"BEACHWAY AV\", \"AccessID\": \"NS-106\", \"AccessType\": \"OPEN VEHICLE RAMP\", \"GeneralLoc\": \"1400 N ATLANTIC AV\", \"MilePost\": 1.57, \"City\": \"NEW SMYRNA BEACH\", \"AccessStatus\": \"OPEN\", \"Entry_Date_Time\": 1691397348000, \"DrivingZone\": \"YES\"}\n",
|
||||
"{\"OBJECTID\": 10, \"AccessName\": \"SEABREEZE BLVD\", \"AccessID\": \"DB-051\", \"AccessType\": \"OPEN VEHICLE RAMP\", \"GeneralLoc\": \"500 BLK N ATLANTIC AV\", \"MilePost\": 14.24, \"City\": \"DAYTONA BEACH\", \"AccessStatus\": \"OPEN\", \"Entry_Date_Time\": 1691394892000, \"DrivingZone\": \"BOTH\"}\n",
|
||||
"{\"OBJECTID\": 13, \"AccessName\": \"GRANADA BLVD\", \"AccessID\": \"OB-030\", \"AccessType\": \"OPEN VEHICLE RAMP\", \"GeneralLoc\": \"20 BLK OCEAN SHORE BLVD\", \"MilePost\": 10.02, \"City\": \"ORMOND BEACH\", \"AccessStatus\": \"4X4 ONLY\", \"Entry_Date_Time\": 1691394952000, \"DrivingZone\": \"BOTH\"}\n",
|
||||
"{\"OBJECTID\": 16, \"AccessName\": \"INTERNATIONAL SPEEDWAY BLVD\", \"AccessID\": \"DB-059\", \"AccessType\": \"OPEN VEHICLE RAMP\", \"GeneralLoc\": \"300 BLK S ATLANTIC AV\", \"MilePost\": 15.27, \"City\": \"DAYTONA BEACH\", \"AccessStatus\": \"OPEN\", \"Entry_Date_Time\": 1691395174000, \"DrivingZone\": \"BOTH\"}\n",
|
||||
"{\"OBJECTID\": 26, \"AccessName\": \"UNIVERSITY BLVD\", \"AccessID\": \"DB-048\", \"AccessType\": \"OPEN VEHICLE RAMP\", \"GeneralLoc\": \"900 BLK N ATLANTIC AV\", \"MilePost\": 13.74, \"City\": \"DAYTONA BEACH\", \"AccessStatus\": \"OPEN\", \"Entry_Date_Time\": 1691394892000, \"DrivingZone\": \"BOTH\"}\n",
|
||||
"{\"OBJECTID\": 36, \"AccessName\": \"BEACH ST\", \"AccessID\": \"PI-097\", \"AccessType\": \"OPEN VEHICLE RAMP\", \"GeneralLoc\": \"4890 BLK S ATLANTIC AV\", \"MilePost\": 25.85, \"City\": \"PONCE INLET\", \"AccessStatus\": \"OPEN\", \"Entry_Date_Time\": 1691397348000, \"DrivingZone\": \"BOTH\"}\n",
|
||||
"{\"OBJECTID\": 40, \"AccessName\": \"BOTEFUHR AV\", \"AccessID\": \"DBS-067\", \"AccessType\": \"OPEN VEHICLE RAMP\", \"GeneralLoc\": \"1900 BLK S ATLANTIC AV\", \"MilePost\": 16.68, \"City\": \"DAYTONA BEACH SHORES\", \"AccessStatus\": \"4X4 ONLY\", \"Entry_Date_Time\": 1691395124000, \"DrivingZone\": \"YES\"}\n",
|
||||
"{\"OBJECTID\": 41, \"AccessName\": \"SILVER BEACH AV\", \"AccessID\": \"DB-064\", \"AccessType\": \"OPEN VEHICLE RAMP\", \"GeneralLoc\": \"1000 BLK S ATLANTIC AV\", \"MilePost\": 15.98, \"City\": \"DAYTONA BEACH\", \"AccessStatus\": \"OPEN\", \"Entry_Date_Time\": 1691395174000, \"DrivingZone\": \"YES\"}\n",
|
||||
"{\"OBJECTID\": 50, \"AccessName\": \"3RD AV\", \"AccessID\": \"NS-118\", \"AccessType\": \"OPEN VEHICLE RAMP\", \"GeneralLoc\": \"1200 BLK HILL ST\", \"MilePost\": 3.25, \"City\": \"NEW SMYRNA BEACH\", \"AccessStatus\": \"OPEN\", \"Entry_Date_Time\": 1691397348000, \"DrivingZone\": \"YES\"}\n",
|
||||
"{\"OBJECTID\": 58, \"AccessName\": \"DUNLAWTON BLVD\", \"AccessID\": \"DBS-078\", \"AccessType\": \"OPEN VEHICLE RAMP\", \"GeneralLoc\": \"3400 BLK S ATLANTIC AV\", \"MilePost\": 20.61, \"City\": \"DAYTONA BEACH SHORES\", \"AccessStatus\": \"OPEN\", \"Entry_Date_Time\": 1691397348000, \"DrivingZone\": \"YES\"}\n",
|
||||
"{\"OBJECTID\": 63, \"AccessName\": \"MILSAP RD\", \"AccessID\": \"OB-037\", \"AccessType\": \"OPEN VEHICLE RAMP\", \"GeneralLoc\": \"700 BLK S ATLANTIC AV\", \"MilePost\": 11.52, \"City\": \"ORMOND BEACH\", \"AccessStatus\": \"4X4 ONLY\", \"Entry_Date_Time\": 1691394952000, \"DrivingZone\": \"YES\"}\n",
|
||||
"{\"OBJECTID\": 68, \"AccessName\": \"EMILIA AV\", \"AccessID\": \"DBS-082\", \"AccessType\": \"OPEN VEHICLE RAMP\", \"GeneralLoc\": \"3790 BLK S ATLANTIC AV\", \"MilePost\": 21.38, \"City\": \"DAYTONA BEACH SHORES\", \"AccessStatus\": \"OPEN\", \"Entry_Date_Time\": 1691397348000, \"DrivingZone\": \"BOTH\"}\n",
|
||||
"{\"OBJECTID\": 92, \"AccessName\": \"FLAGLER AV\", \"AccessID\": \"NS-110\", \"AccessType\": \"OPEN VEHICLE RAMP\", \"GeneralLoc\": \"500 BLK FLAGLER AV\", \"MilePost\": 2.57, \"City\": \"NEW SMYRNA BEACH\", \"AccessStatus\": \"OPEN\", \"Entry_Date_Time\": 1691397348000, \"DrivingZone\": \"YES\"}\n",
|
||||
"{\"OBJECTID\": 94, \"AccessName\": \"CRAWFORD RD\", \"AccessID\": \"NS-108\", \"AccessType\": \"OPEN VEHICLE RAMP - PASS\", \"GeneralLoc\": \"800 BLK N ATLANTIC AV\", \"MilePost\": 2.19, \"City\": \"NEW SMYRNA BEACH\", \"AccessStatus\": \"OPEN\", \"Entry_Date_Time\": 1691397348000, \"DrivingZone\": \"YES\"}\n",
|
||||
"{\"OBJECTID\": 122, \"AccessName\": \"HARTFORD AV\", \"AccessID\": \"DB-043\", \"AccessType\": \"OPEN VEHICLE RAMP\", \"GeneralLoc\": \"1890 BLK N ATLANTIC AV\", \"MilePost\": 12.76, \"City\": \"DAYTONA BEACH\", \"AccessStatus\": \"CLOSED - SEASONAL\", \"Entry_Date_Time\": 1691394832000, \"DrivingZone\": \"YES\"}\n",
|
||||
"{\"OBJECTID\": 125, \"AccessName\": \"WILLIAMS AV\", \"AccessID\": \"DB-042\", \"AccessType\": \"OPEN VEHICLE RAMP\", \"GeneralLoc\": \"2200 BLK N ATLANTIC AV\", \"MilePost\": 12.5, \"City\": \"DAYTONA BEACH\", \"AccessStatus\": \"4X4 ONLY\", \"Entry_Date_Time\": 1691394952000, \"DrivingZone\": \"YES\"}\n",
|
||||
"{\"OBJECTID\": 134, \"AccessName\": \"CARDINAL DR\", \"AccessID\": \"OB-036\", \"AccessType\": \"OPEN VEHICLE RAMP\", \"GeneralLoc\": \"600 BLK S ATLANTIC AV\", \"MilePost\": 11.27, \"City\": \"ORMOND BEACH\", \"AccessStatus\": \"4X4 ONLY\", \"Entry_Date_Time\": 1691394952000, \"DrivingZone\": \"YES\"}\n",
|
||||
"{\"OBJECTID\": 229, \"AccessName\": \"EL PORTAL ST\", \"AccessID\": \"DBS-076\", \"AccessType\": \"OPEN VEHICLE RAMP\", \"GeneralLoc\": \"3200 BLK S ATLANTIC AV\", \"MilePost\": 20.04, \"City\": \"DAYTONA BEACH SHORES\", \"AccessStatus\": \"OPEN\", \"Entry_Date_Time\": 1691397348000, \"DrivingZone\": \"YES\"}\n",
|
||||
"{\"OBJECTID\": 230, \"AccessName\": \"HARVARD DR\", \"AccessID\": \"OB-038\", \"AccessType\": \"OPEN VEHICLE RAMP\", \"GeneralLoc\": \"900 BLK S ATLANTIC AV\", \"MilePost\": 11.72, \"City\": \"ORMOND BEACH\", \"AccessStatus\": \"4X4 ONLY\", \"Entry_Date_Time\": 1691394952000, \"DrivingZone\": \"YES\"}\n",
|
||||
"{\"OBJECTID\": 232, \"AccessName\": \"VAN AV\", \"AccessID\": \"DBS-075\", \"AccessType\": \"OPEN VEHICLE RAMP\", \"GeneralLoc\": \"3100 BLK S ATLANTIC AV\", \"MilePost\": 19.6, \"City\": \"DAYTONA BEACH SHORES\", \"AccessStatus\": \"OPEN\", \"Entry_Date_Time\": 1691397348000, \"DrivingZone\": \"YES\"}\n",
|
||||
"{\"OBJECTID\": 233, \"AccessName\": \"ROCKEFELLER DR\", \"AccessID\": \"OB-034\", \"AccessType\": \"OPEN VEHICLE RAMP\", \"GeneralLoc\": \"400 BLK S ATLANTIC AV\", \"MilePost\": 10.9, \"City\": \"ORMOND BEACH\", \"AccessStatus\": \"CLOSED - SEASONAL\", \"Entry_Date_Time\": 1691394832000, \"DrivingZone\": \"YES\"}\n",
|
||||
"{\"OBJECTID\": 235, \"AccessName\": \"MINERVA RD\", \"AccessID\": \"DBS-069\", \"AccessType\": \"OPEN VEHICLE RAMP\", \"GeneralLoc\": \"2300 BLK S ATLANTIC AV\", \"MilePost\": 17.52, \"City\": \"DAYTONA BEACH SHORES\", \"AccessStatus\": \"4X4 ONLY\", \"Entry_Date_Time\": 1691395124000, \"DrivingZone\": \"YES\"}\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"for doc in docs:\n",
|
||||
" print(doc.page_content)"
|
||||
]
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
"kernelspec": {
|
||||
"display_name": "Python 3 (ipykernel)",
|
||||
"language": "python",
|
||||
"name": "python3"
|
||||
},
|
||||
"language_info": {
|
||||
"codemirror_mode": {
|
||||
"name": "ipython",
|
||||
"version": 3
|
||||
},
|
||||
"file_extension": ".py",
|
||||
"mimetype": "text/x-python",
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.9.13"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
"nbformat_minor": 5
|
||||
}
|
||||
101
docs/extras/integrations/document_loaders/async_chromium.ipynb
Normal file
@@ -0,0 +1,101 @@
|
||||
{
|
||||
"cells": [
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "ad553e51",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# Async Chromium\n",
|
||||
"\n",
|
||||
"Chromium is one of the browsers supported by Playwright, a library used to control browser automation. \n",
|
||||
"\n",
|
||||
"By running `p.chromium.launch(headless=True)`, we are launching a headless instance of Chromium. \n",
|
||||
"\n",
|
||||
"Headless mode means that the browser is running without a graphical user interface.\n",
|
||||
"\n",
|
||||
"`AsyncChromiumLoader` load the page, and then we use `Html2TextTransformer` to trasnform to text."
|
||||
]
|
||||
},
|
||||
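For intuition, this is roughly the Playwright call the loader wraps; a minimal sketch (it requires the install step below), not the loader's actual implementation:

```python
import asyncio

from playwright.async_api import async_playwright


async def fetch_html(url: str) -> str:
    async with async_playwright() as p:
        # Headless: Chromium runs without a graphical user interface.
        browser = await p.chromium.launch(headless=True)
        page = await browser.new_page()
        await page.goto(url)
        html = await page.content()
        await browser.close()
        return html


# html = asyncio.run(fetch_html("https://www.wsj.com"))
```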
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "1c3a4c19",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"! pip install -q playwright beautifulsoup4\n",
|
||||
"! playwright install"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 2,
|
||||
"id": "dd2cdea7",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"'<!DOCTYPE html><html lang=\"en\"><head><script src=\"https://s0.2mdn.net/instream/video/client.js\" asyn'"
|
||||
]
|
||||
},
|
||||
"execution_count": 2,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"from langchain.document_loaders import AsyncChromiumLoader\n",
|
||||
"urls = [\"https://www.wsj.com\"]\n",
|
||||
"loader = AsyncChromiumLoader(urls)\n",
|
||||
"docs = loader.load()\n",
|
||||
"docs[0].page_content[0:100]"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 6,
|
||||
"id": "013caa7e",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"\"Skip to Main ContentSkip to SearchSkip to... Select * Top News * What's News *\\nFeatured Stories * Retirement * Life & Arts * Hip-Hop * Sports * Video *\\nEconomy * Real Estate * Sports * CMO * CIO * CFO * Risk & Compliance *\\nLogistics Report * Sustainable Business * Heard on the Street * Barron’s *\\nMarketWatch * Mansion Global * Penta * Opinion * Journal Reports * Sponsored\\nOffers Explore Our Brands * WSJ * * * * * Barron's * * * * * MarketWatch * * *\\n* * IBD # The Wall Street Journal SubscribeSig\""
|
||||
]
|
||||
},
|
||||
"execution_count": 6,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"from langchain.document_transformers import Html2TextTransformer\n",
|
||||
"html2text = Html2TextTransformer()\n",
|
||||
"docs_transformed = html2text.transform_documents(docs)\n",
|
||||
"docs_transformed[0].page_content[0:500]"
|
||||
]
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
"kernelspec": {
|
||||
"display_name": "Python 3 (ipykernel)",
|
||||
"language": "python",
|
||||
"name": "python3"
|
||||
},
|
||||
"language_info": {
|
||||
"codemirror_mode": {
|
||||
"name": "ipython",
|
||||
"version": 3
|
||||
},
|
||||
"file_extension": ".py",
|
||||
"mimetype": "text/x-python",
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.9.16"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
"nbformat_minor": 5
|
||||
}
|
||||
@@ -9,66 +9,16 @@
|
||||
"\n",
|
||||
"GROBID is a machine learning library for extracting, parsing, and re-structuring raw documents.\n",
|
||||
"\n",
|
||||
"It is particularly good for sturctured PDFs, like academic papers.\n",
|
||||
"It is designed and expected to be used to parse academic papers, where it works particularly well. Note: if the articles supplied to Grobid are large documents (e.g. dissertations) exceeding a certain number of elements, they might not be processed. \n",
|
||||
"\n",
|
||||
"This loader uses GROBIB to parse PDFs into `Documents` that retain metadata associated with the section of text.\n",
|
||||
"This loader uses Grobid to parse PDFs into `Documents` that retain metadata associated with the section of text.\n",
|
||||
"\n",
|
||||
"---\n",
|
||||
"The best approach is to install Grobid via docker, see https://grobid.readthedocs.io/en/latest/Grobid-docker/. \n",
|
||||
"\n",
|
||||
"For users on `Mac` - \n",
|
||||
"(Note: additional instructions can be found [here](https://python.langchain.com/docs/extras/integrations/providers/grobid.mdx).)\n",
|
||||
"\n",
|
||||
"(Note: additional instructions can be found [here](https://python.langchain.com/docs/ecosystem/integrations/grobid.mdx).)\n",
|
||||
"\n",
|
||||
"Install Java (Apple Silicon):\n",
|
||||
"```\n",
|
||||
"$ arch -arm64 brew install openjdk@11\n",
|
||||
"$ brew --prefix openjdk@11\n",
|
||||
"/opt/homebrew/opt/openjdk@ 11\n",
|
||||
"```\n",
|
||||
"\n",
|
||||
"In `~/.zshrc`:\n",
|
||||
"```\n",
|
||||
"export JAVA_HOME=/opt/homebrew/opt/openjdk@11\n",
|
||||
"export PATH=$JAVA_HOME/bin:$PATH\n",
|
||||
"```\n",
|
||||
"\n",
|
||||
"Then, in Terminal:\n",
|
||||
"```\n",
|
||||
"$ source ~/.zshrc\n",
|
||||
"```\n",
|
||||
"\n",
|
||||
"Confirm install:\n",
|
||||
"```\n",
|
||||
"$ which java\n",
|
||||
"/opt/homebrew/opt/openjdk@11/bin/java\n",
|
||||
"$ java -version \n",
|
||||
"openjdk version \"11.0.19\" 2023-04-18\n",
|
||||
"OpenJDK Runtime Environment Homebrew (build 11.0.19+0)\n",
|
||||
"OpenJDK 64-Bit Server VM Homebrew (build 11.0.19+0, mixed mode)\n",
|
||||
"```\n",
|
||||
"\n",
|
||||
"Then, get [Grobid](https://grobid.readthedocs.io/en/latest/Install-Grobid/#getting-grobid):\n",
|
||||
"```\n",
|
||||
"$ curl -LO https://github.com/kermitt2/grobid/archive/0.7.3.zip\n",
|
||||
"$ unzip 0.7.3.zip\n",
|
||||
"```\n",
|
||||
" \n",
|
||||
"Build\n",
|
||||
"```\n",
|
||||
"$ ./gradlew clean install\n",
|
||||
"```\n",
|
||||
"\n",
|
||||
"Then, run the server:"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 1,
|
||||
"id": "2d8992fc",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"! get_ipython().system_raw('nohup ./gradlew run > grobid.log 2>&1 &')"
|
||||
"Once grobid is up-and-running you can interact as described below. \n"
|
||||
]
|
||||
},
|
||||
{
|
||||
|
||||
@@ -0,0 +1,95 @@
|
||||
{
|
||||
"cells": [
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "2ed9a4c2",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# Beautiful Soup\n",
|
||||
"\n",
|
||||
"Beautiful Soup offers fine-grained control over HTML content, enabling specific tag extraction, removal, and content cleaning. \n",
|
||||
"\n",
|
||||
"It's suited for cases where you want to extract specific information and clean up the HTML content according to your needs.\n",
|
||||
"\n",
|
||||
"For example, we can scrape text content within `<p>, <li>, <div>, and <a>` tags from the HTML content:\n",
|
||||
"\n",
|
||||
"* `<p>`: The paragraph tag. It defines a paragraph in HTML and is used to group together related sentences and/or phrases.\n",
|
||||
" \n",
|
||||
"* `<li>`: The list item tag. It is used within ordered (`<ol>`) and unordered (`<ul>`) lists to define individual items within the list.\n",
|
||||
" \n",
|
||||
"* `<div>`: The division tag. It is a block-level element used to group other inline or block-level elements.\n",
|
||||
" \n",
|
||||
"* `<a>`: The anchor tag. It is used to define hyperlinks."
|
||||
]
|
||||
},
|
||||
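For comparison, here is roughly what that extraction looks like in raw Beautiful Soup; a minimal sketch, not the transformer's actual implementation:

```python
from bs4 import BeautifulSoup


def extract_text(html: str, tags=("p", "li", "div", "a")) -> str:
    soup = BeautifulSoup(html, "html.parser")
    # Concatenate the visible text of every matching tag.
    return " ".join(el.get_text(" ", strip=True) for el in soup.find_all(list(tags)))
```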
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 2,
|
||||
"id": "dd710e5b",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langchain.document_loaders import AsyncChromiumLoader\n",
|
||||
"from langchain.document_transformers import BeautifulSoupTransformer\n",
|
||||
"\n",
|
||||
"# Load HTML\n",
|
||||
"loader = AsyncChromiumLoader([\"https://www.wsj.com\"])\n",
|
||||
"html = loader.load()"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 3,
|
||||
"id": "052b64dd",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# Transform\n",
|
||||
"bs_transformer = BeautifulSoupTransformer()\n",
|
||||
"docs_transformed = bs_transformer.transform_documents(html,tags_to_extract=[\"p\", \"li\", \"div\", \"a\"])"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 4,
|
||||
"id": "b53a5307",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"'Conservative legal activists are challenging Amazon, Comcast and others using many of the same tools that helped kill affirmative-action programs in colleges.1,2099 min read U.S. stock indexes fell and government-bond prices climbed, after Moody’s lowered credit ratings for 10 smaller U.S. banks and said it was reviewing ratings for six larger ones. The Dow industrials dropped more than 150 points.3 min read Penn Entertainment’s Barstool Sportsbook app will be rebranded as ESPN Bet this fall as '"
|
||||
]
|
||||
},
|
||||
"execution_count": 4,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"docs_transformed[0].page_content[0:500]"
|
||||
]
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
"kernelspec": {
|
||||
"display_name": "Python 3 (ipykernel)",
|
||||
"language": "python",
|
||||
"name": "python3"
|
||||
},
|
||||
"language_info": {
|
||||
"codemirror_mode": {
|
||||
"name": "ipython",
|
||||
"version": 3
|
||||
},
|
||||
"file_extension": ".py",
|
||||
"mimetype": "text/x-python",
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.9.16"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
"nbformat_minor": 5
|
||||
}
|
||||
@@ -14,7 +14,7 @@
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 25,
|
||||
"execution_count": 1,
|
||||
"id": "60b6dbb2",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
@@ -42,25 +42,14 @@
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 5,
|
||||
"execution_count": 2,
|
||||
"id": "9ca87a2e",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# Initialize a Fireworks LLM\n",
|
||||
"os.environ['FIREWORKS_API_KEY'] = \"\" #change this to your own API KEY\n",
|
||||
"llm = Fireworks(model_id=\"accounts/fireworks/models/fireworks-llama-v2-13b-chat\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 19,
|
||||
"id": "43a11ba8",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# Create LLM chain\n",
|
||||
"llm_chain = LLMChain(prompt=prompt, llm=llm)"
|
||||
"os.environ['FIREWORKS_API_KEY'] = \"<YOUR_API_KEY>\" # Change this to your own API key\n",
|
||||
"llm = Fireworks(model_id=\"accounts/fireworks/models/llama-v2-13b-chat\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
@@ -72,16 +61,33 @@
|
||||
"\n",
|
||||
"You can use the LLMs to call the model for specified prompt(s). \n",
|
||||
"\n",
|
||||
"Current Specified Models: \n",
|
||||
"* accounts/fireworks/models/fireworks-falcon-7b, accounts/fireworks/models/fireworks-falcon-40b-w8a16\n",
|
||||
"* accounts/fireworks/models/fireworks-starcoder-1b-w8a16-1gpu, accounts/fireworks/models/fireworks-starcoder-3b-w8a16-1gpu, accounts/fireworks/models/fireworks-starcoder-7b-w8a16-1gpu, accounts/fireworks/models/fireworks-starcoder-16b-w8a16 \n",
|
||||
"* accounts/fireworks/models/fireworks-llama-v2-13b, accounts/fireworks/models/fireworks-llama-v2-13b-chat, accounts/fireworks/models/fireworks-llama-v2-13b-w8a16, accounts/fireworks/models/fireworks-llama-v2-13b-chat-w8a16\n",
|
||||
"* accounts/fireworks/models/fireworks-llama-v2-7b, accounts/fireworks/models/fireworks-llama-v2-7b-chat, accounts/fireworks/models/fireworks-llama-v2-7b-w8a16, accounts/fireworks/models/fireworks-llama-v2-7b-chat-w8a16, accounts/fireworks/models/fireworks-llama-v2-70b-chat-4gpu"
|
||||
"Currently supported models: \n",
|
||||
"\n",
|
||||
"* Falcon\n",
|
||||
" * `accounts/fireworks/models/falcon-7b`\n",
|
||||
" * `accounts/fireworks/models/falcon-40b-w8a16`\n",
|
||||
"* Llama 2\n",
|
||||
" * `accounts/fireworks/models/llama-v2-7b`\n",
|
||||
" * `accounts/fireworks/models/llama-v2-7b-w8a16`\n",
|
||||
" * `accounts/fireworks/models/llama-v2-7b-chat`\n",
|
||||
" * `accounts/fireworks/models/llama-v2-7b-chat-w8a16`\n",
|
||||
" * `accounts/fireworks/models/llama-v2-13b`\n",
|
||||
" * `accounts/fireworks/models/llama-v2-13b-w8a16`\n",
|
||||
" * `accounts/fireworks/models/llama-v2-13b-chat`\n",
|
||||
" * `accounts/fireworks/models/llama-v2-13b-chat-w8a16`\n",
|
||||
" * `accounts/fireworks/models/llama-v2-70b-chat-4gpu`\n",
|
||||
"* StarCoder\n",
|
||||
" * `accounts/fireworks/models/starcoder-1b-w8a16-1gpu`\n",
|
||||
" * `accounts/fireworks/models/starcoder-3b-w8a16-1gpu`\n",
|
||||
" * `accounts/fireworks/models/starcoder-7b-w8a16-1gpu`\n",
|
||||
" * `accounts/fireworks/models/starcoder-16b-w8a16`\n",
|
||||
"\n",
|
||||
"See the full, most up-to-date list on [app.fireworks.ai](https://app.fireworks.ai)."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 10,
|
||||
"execution_count": 3,
|
||||
"id": "bf0a425c",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
@@ -89,29 +95,41 @@
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"Is it Tom Brady, Aaron Rodgers, or someone else? It's a tough question to answer, and there are strong arguments for each of these quarterbacks. Here are some of the reasons why each of these quarterbacks could be considered the best:\n",
|
||||
"\n",
|
||||
"Tom Brady:\n",
|
||||
"\n",
|
||||
"It's a question that has been debated for years, with different analysts and fans making their cases for various signal-callers. Here are some of the top contenders for the title of best quarterback in the NFL:\n",
|
||||
"* He has the most Super Bowl wins (6) of any quarterback in NFL history.\n",
|
||||
"* He has been named Super Bowl MVP four times, more than any other player.\n",
|
||||
"* He has led the New England Patriots to 18 playoff victories, the most in NFL history.\n",
|
||||
"* He has thrown for over 70,000 yards in his career, the most of any quarterback in NFL history.\n",
|
||||
"* He has thrown for 50 or more touchdowns in a season four times, the most of any quarterback in NFL history.\n",
|
||||
"\n",
|
||||
"1. Tom Brady: The New England Patriots legend has won six Super Bowls and has been named Super Bowl MVP four times. He's known for his precision passing, pocket presence, and ability to read defenses.\n",
|
||||
"2. Aaron Rodgers: The Green Bay Packers quarterback has won two Super Bowls and has been named NFL MVP twice. He's known for his quick release, accuracy, and ability to extend plays with his feet.\n",
|
||||
"3. Drew Brees: The New Orleans Saints quarterback has won a Super Bowl and has been named NFL MVP once. He's known for his accuracy, pocket presence, and ability to read defenses.\n",
|
||||
"4. Patrick Mahomes: The Kansas City Chiefs quarterback has won a Super Bowl and has been named NFL MVP twice. He's known for his arm strength, athleticism, and ability to make plays outside of the pocket.\n",
|
||||
"5. Russell Wilson: The Seattle Seahawks quarterback has won a Super Bowl and has been named NFL MVP once. He's known for his mobility, accuracy, and ability to extend plays with his feet.\n",
|
||||
"Aaron Rodgers:\n",
|
||||
"\n",
|
||||
"Of course, there are other talented quarterbacks in the league, such as Lamar Jackson, Deshaun Watson, and Carson Wentz, who could also be considered among the best. Ultimately, the answer to the question of who's the best quarterback in the NFL is subjective and can vary depending on individual perspectives and criteria.\n"
|
||||
"* He has led the Green Bay Packers to a Super Bowl victory in 2010.\n",
|
||||
"* He has been named Super Bowl MVP once.\n",
|
||||
"* He has thrown for over 40,000 yards in his career, the most of any quarterback in NFL history.\n",
|
||||
"* He has thrown for 40 or more touchdowns in a season three times, the most of any quarterback in NFL history.\n",
|
||||
"* He has a career passer rating of 103.1, the highest of any quarterback in NFL history.\n",
|
||||
"\n",
|
||||
"So, who's the best quarterback in the NFL? It's a tough call, but here's my opinion:\n",
|
||||
"\n",
|
||||
"I think Aaron Rodgers is the best quarterback in the NFL right now. He has led the Packers to a Super Bowl victory and has had some incredible seasons, including the 2011 season when he threw for 45 touchdowns and just 6 interceptions. He has a strong arm, great accuracy, and is incredibly mobile for a quarterback of his size. He also has a great sense of timing and knows when to take risks and when to play it safe.\n",
|
||||
"\n",
|
||||
"Tom Brady is a close second, though. He has an incredible track record of success, including six Super Bowl victories, and has been one of the most consistent quarterbacks in the league for the past two decades. He has a strong arm and is incredibly accurate\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"#single prompt\n",
|
||||
"# Single prompt\n",
|
||||
"output = llm(\"Who's the best quarterback in the NFL?\")\n",
|
||||
"print(output)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 12,
|
||||
"execution_count": 4,
|
||||
"id": "afc7de6f",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
@@ -119,19 +137,22 @@
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"generations=[[Generation(text=\"\\nWho is the best cricket player in the world in 2016?\\nThe best cricket player in the world in 2016 is Virat Kohli. The Indian captain has had a fabulous year, scoring heavily in all formats of the game, leading India to several victories, and breaking several records. In Test cricket, Kohli has scored 1215 runs at an average of 75.33 with 6 centuries and 4 fifties, which is the highest number of runs scored by any player in a calendar year. In ODI cricket, he has scored 1143 runs at an average of 83.42 with 7 centuries and 6 fifties, which is also the highest number of runs scored by any player in a calendar year. Additionally, Kohli has led India to the number one ranking in Test cricket, and has been named the ICC Test Player of the Year and the ICC ODI Player of the Year.\\nVirat Kohli has been in incredible form in 2016, and his performances have made him the standout player of the year. Other players who have had a great year include Steve Smith, Joe Root, and Kane Williamson, but Kohli's consistency and dominance in all formats of the game make him the best cricket player in the world in 2016.\", generation_info=None)], [Generation(text=\"\\n\\nA: LeBron James.\\n\\nB: Kevin Durant.\\n\\nC: Steph Curry.\\n\\nD: James Harden.\\n\\nE: Other (please specify).\\n\\nWhat's your answer?\", generation_info=None)]] llm_output={'token_usage': {}, 'model_id': 'fireworks-llama-v2-13b-chat'} run=[RunInfo(run_id=UUID('d14b6bee-7692-46ad-8798-acb6f72fc7fb')), RunInfo(run_id=UUID('b9f5b3b5-9e62-4eaf-b269-ecf0cbbcfb82'))]\n"
|
||||
"[[Generation(text='\\nThe best cricket player in 2016 is a matter of opinion, but some of the top contenders for the title include:\\n\\n1. Virat Kohli (India): Kohli had a phenomenal year in 2016, scoring over 1,000 runs in Test cricket, including four centuries, and averaging over 70. He also scored heavily in ODI cricket, with an average of over 80.\\n2. Steve Smith (Australia): Smith had a remarkable year in 2016, leading Australia to a Test series victory in India and scoring over 1,000 runs in the format, including five centuries. He also averaged over 60 in ODI cricket.\\n3. KL Rahul (India): Rahul had a breakout year in 2016, scoring over 1,000 runs in Test cricket, including four centuries, and averaging over 60. He also scored heavily in ODI cricket, with an average of over 70.\\n4. Joe Root (England): Root had a solid year in 2016, scoring over 1,000 runs in Test cricket, including four centuries, and averaging over 50. He also scored heavily in ODI cricket, with an average of over 80.\\n5. Quinton de Kock (South Africa): De Kock had a remarkable year in 2016, scoring over 1,000 runs in ODI cricket, including six centuries, and averaging over 80. He also scored heavily in Test cricket, with an average of over 50.\\n\\nThese are just a few of the top contenders for the title of best cricket player in 2016, but there were many other talented players who also had impressive years. Ultimately, the answer to this question is subjective and depends on individual opinions and criteria for evaluation.', generation_info=None)], [Generation(text=\"\\nThis is a tough one, as there are so many great players in the league right now. But if I had to choose one, I'd say LeBron James is the best basketball player in the league. He's a once-in-a-generation talent who can dominate the game in so many ways. He's got incredible speed, strength, and court vision, and he's always finding new ways to improve his game. Plus, he's been doing it at an elite level for over a decade now, which is just amazing.\\n\\nBut don't just take my word for it - there are plenty of other great players in the league who could make a strong case for being the best. Guys like Kevin Durant, Steph Curry, James Harden, and Giannis Antetokounmpo are all having incredible seasons, and they've all got their own unique skills and strengths that make them special. So ultimately, it's up to you to decide who you think is the best basketball player in the league.\", generation_info=None)]]\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"#calling multiple prompts\n",
|
||||
"output = llm.generate([\"Who's the best cricket player in 2016?\", \"Who's the best basketball player in the league?\"])\n",
|
||||
"print(output)"
|
||||
"# Calling multiple prompts\n",
|
||||
"output = llm.generate([\n",
|
||||
" \"Who's the best cricket player in 2016?\",\n",
|
||||
" \"Who's the best basketball player in the league?\",\n",
|
||||
"])\n",
|
||||
"print(output.generations)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 17,
|
||||
"execution_count": 5,
|
||||
"id": "b801c20d",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
@@ -140,13 +161,13 @@
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"\n",
|
||||
"Kansas City in December can be quite chilly, with average high\n"
|
||||
"Kansas City in December is quite cold, with temperatures typically r\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"#setting parameters: model_id, temperature, max_tokens, top_p\n",
|
||||
"llm = Fireworks(model_id=\"accounts/fireworks/models/fireworks-llama-v2-13b-chat\", temperature=0.7, max_tokens=15, top_p=1.0)\n",
|
||||
"# Setting additional parameters: temperature, max_tokens, top_p\n",
|
||||
"llm = Fireworks(model_id=\"accounts/fireworks/models/llama-v2-13b-chat\", temperature=0.7, max_tokens=15, top_p=1.0)\n",
|
||||
"print(llm(\"What's the weather like in Kansas City in December?\"))"
|
||||
]
|
||||
},
|
||||
@@ -162,7 +183,7 @@
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 26,
|
||||
"execution_count": 6,
|
||||
"id": "fd2c6bc1",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
@@ -171,34 +192,25 @@
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"\n",
|
||||
"(Note: I'm just an AI and not a branding expert, so take this as a starting point for your own research and brainstorming.)\n",
|
||||
"A good name for a company that makes football helmets could be:\n",
|
||||
"Naming a company can be a fun and creative process! Here are a few name ideas for a company that makes football helmets:\n",
|
||||
"\n",
|
||||
"1. Helix Pro: This name plays off the idea of a helix, or spiral, shape that is commonly associated with football helmets. \"Pro\" implies a professional-level product.\n",
|
||||
"2. Gridiron Gear: \"Gridiron\" is a term used to describe a football field, and \"gear\" highlights the company's focus on producing high-quality football helmets.\n",
|
||||
"3. Linebacker Lab: \"Linebacker\" is a position on the football field, and \"Lab\" suggests a focus on research and development.\n",
|
||||
"4. Helmet Hut: This name is simple and easy to remember, and it immediately conveys the company's focus on football helmets.\n",
|
||||
"5. Tackle Tech: \"Tackle\" is a term used in football to describe a hit or collision, and \"Tech\" implies a focus on advanced technology and innovation.\n",
|
||||
"6. Victory Vest: \"Victory\" implies a focus on winning and success, and \"Vest\" could suggest a protective or armored design.\n",
|
||||
"7. Pigskin Pro: \"Pigskin\" is a term used to describe a football, and \"Pro\" implies a professional-level product.\n",
|
||||
"8. Football Fusion: This name could suggest a combination of different materials or technologies to create a high-quality football helmet.\n",
|
||||
"9. Endzone Edge: \"Endzone\" is the area of the football field where a team scores a touchdown, and \"Edge\" implies a competitive advantage.\n",
|
||||
"10. MVP Masks: \"MVP\" stands for \"Most Valuable Player,\" and \"Masks\" highlights the protective nature of the company's football helmets.\n",
|
||||
"1. Helix Headgear: This name plays off the idea of the helix shape of a football helmet and could be a memorable and catchy name for a company.\n",
|
||||
"2. Gridiron Gear: \"Gridiron\" is a term used to describe a football field, and \"gear\" refers to the products the company sells. This name is straightforward and easy to understand.\n",
|
||||
"3. Cushion Crusaders: This name emphasizes the protective qualities of football helmets and could appeal to customers looking for safety-conscious products.\n",
|
||||
"4. Helmet Heroes: This name has a fun, heroic tone and could appeal to customers looking for high-quality products.\n",
|
||||
"5. Tackle Tech: \"Tackle\" is a term used in football to describe a player's attempt to stop an opponent, and \"tech\" refers to the technology used in the helmets. This name could appeal to customers interested in innovative products.\n",
|
||||
"6. Padded Protection: This name emphasizes the protective qualities of football helmets and could appeal to customers looking for products that prioritize safety.\n",
|
||||
"7. Gridiron Gear Co.: This name is simple and straightforward, and it clearly conveys the company's focus on football-related products.\n",
|
||||
"8. Helmet Haven: This name has a soothing, protective tone and could appeal to customers looking for a reliable brand.\n",
|
||||
"\n",
|
||||
"Remember, the name you choose for your company should be memorable, easy to pronounce and spell, and convey a sense of quality and professionalism. It's also important to check that the name isn't already in use by another company, and to consider any potential trademark issues.\n"
|
||||
"Remember to choose a name that reflects your company's values and mission, and that resonates with your target market. Good luck with your company!\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"human_message_prompt = HumanMessagePromptTemplate(\n",
|
||||
" prompt=PromptTemplate(\n",
|
||||
" template=\"What is a good name for a company that makes {product}?\",\n",
|
||||
" input_variables=[\"product\"],\n",
|
||||
" )\n",
|
||||
")\n",
|
||||
"\n",
|
||||
"human_message_prompt = HumanMessagePromptTemplate.from_template(\"What is a good name for a company that makes {product}?\")\n",
|
||||
"chat_prompt_template = ChatPromptTemplate.from_messages([human_message_prompt])\n",
|
||||
"chat = Fireworks()\n",
|
||||
"chat = FireworksChat()\n",
|
||||
"chain = LLMChain(llm=chat, prompt=chat_prompt_template)\n",
|
||||
"output = chain.run(\"football helmets\")\n",
|
||||
"\n",
|
||||
@@ -222,7 +234,7 @@
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.10.9"
|
||||
"version": "3.11.4"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
|
||||
@@ -31,12 +31,11 @@
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langchain.llms.bedrock import Bedrock\n",
|
||||
"from langchain.llms import Bedrock\n",
|
||||
"\n",
|
||||
"llm = Bedrock(\n",
|
||||
" credentials_profile_name=\"bedrock-admin\",\n",
|
||||
" model_id=\"amazon.titan-tg1-large\",\n",
|
||||
" endpoint_url=\"custom_endpoint_url\",\n",
|
||||
" model_id=\"amazon.titan-tg1-large\"\n",
|
||||
")"
|
||||
]
|
||||
},
|
||||
|
||||
169
docs/extras/integrations/llms/titan_takeoff.ipynb
Normal file
@@ -0,0 +1,169 @@
|
||||
{
|
||||
"cells": [
|
||||
{
|
||||
"attachments": {},
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# Titan Takeoff\n",
|
||||
"\n",
|
||||
"TitanML helps businesses build and deploy better, smaller, cheaper, and faster NLP models through our training, compression, and inference optimization platform. \n",
|
||||
"\n",
|
||||
"Our inference server, [Titan Takeoff](https://docs.titanml.co/docs/titan-takeoff/getting-started) enables deployment of LLMs locally on your hardware in a single command. Most generative model architectures are supported, such as Falcon, Llama 2, GPT2, T5 and many more."
|
||||
]
|
||||
},
|
||||
{
|
||||
"attachments": {},
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Installation\n",
|
||||
"\n",
|
||||
"To get started with Iris Takeoff, all you need is to have docker and python installed on your local system. If you wish to use the server with gpu suport, then you will need to install docker with cuda support.\n",
|
||||
"\n",
|
||||
"For Mac and Windows users, make sure you have the docker daemon running! You can check this by running docker ps in your terminal. To start the daemon, open the docker desktop app.\n",
|
||||
"\n",
|
||||
"Run the following command to install the Iris CLI that will enable you to run the takeoff server:"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {
|
||||
"vscode": {
|
||||
"languageId": "shellscript"
|
||||
}
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"pip install titan-iris"
|
||||
]
|
||||
},
|
||||
{
|
||||
"attachments": {},
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Choose a Model\n",
|
||||
"Iris Takeoff supports many of the most powerful generative text models, such as Falcon, MPT, and Llama. See the [supported models](https://docs.titanml.co/docs/titan-takeoff/supported-models) for more information. For information about using your own models, see the [custom models](https://docs.titanml.co/docs/titan-takeoff/Advanced/custom-models).\n",
|
||||
"\n",
|
||||
"Going forward in this demo we will be using the falcon 7B instruct model. This is a good open source model that is trained to follow instructions, and is small enough to easily inference even on CPUs.\n",
|
||||
"\n",
|
||||
"## Taking off\n",
|
||||
"Models are referred to by their model id on HuggingFace. Takeoff uses port 8000 by default, but can be configured to use another port. There is also support to use a Nvidia GPU by specifing cuda for the device flag.\n",
|
||||
"\n",
|
||||
"To start the takeoff server, run:"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {
|
||||
"vscode": {
|
||||
"languageId": "shellscript"
|
||||
}
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"iris takeoff --model tiiuae/falcon-7b-instruct --device cpu\n",
|
||||
"iris takeoff --model tiiuae/falcon-7b-instruct --device cuda # Nvidia GPU required\n",
|
||||
"iris takeoff --model tiiuae/falcon-7b-instruct --device cpu --port 5000 # run on port 5000 (default: 8000)\n",
|
||||
"```"
|
||||
]
|
||||
},
|
||||
{
|
||||
"attachments": {},
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"You will then be directed to a login page, where you will need to create an account to proceed.\n",
|
||||
"After logging in, run the command onscreen to check whether the server is ready. When it is ready, you can start using the Takeoff integration\n",
|
||||
"\n",
|
||||
"## Inferencing your model\n",
|
||||
"To access your LLM, use the TitanTakeoff LLM wrapper:"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langchain.llms import TitanTakeoff\n",
|
||||
"\n",
|
||||
"llm = TitanTakeoff(\n",
|
||||
" port=8000,\n",
|
||||
" generate_max_length=128,\n",
|
||||
" temperature=1.0\n",
|
||||
")\n",
|
||||
"\n",
|
||||
"prompt = \"What is the largest planet in the solar system?\"\n",
|
||||
"\n",
|
||||
"llm(prompt)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"attachments": {},
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"No parameters are needed by default, but a port can be specified and [generation parameters](https://docs.titanml.co/docs/titan-takeoff/Advanced/generation-parameters) can be supplied.\n",
|
||||
"\n",
|
||||
"### Streaming\n",
|
||||
"Streaming is also supported via the streaming flag:"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langchain.callbacks.streaming_stdout import StreamingStdOutCallbackHandler\n",
|
||||
"from langchain.callbacks.manager import CallbackManager\n",
|
||||
"\n",
|
||||
"llm = TitanTakeoff(port=8000, callback_manager=CallbackManager([StreamingStdOutCallbackHandler()]), streaming=True)\n",
|
||||
"\n",
|
||||
"prompt = \"What is the capital of France?\"\n",
|
||||
"\n",
|
||||
"llm(prompt)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"attachments": {},
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Integration with LLMChain"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langchain import PromptTemplate, LLMChain\n",
|
||||
"\n",
|
||||
"llm = TitanTakeoff()\n",
|
||||
"\n",
|
||||
"template = \"What is the capital of {country}\"\n",
|
||||
"\n",
|
||||
"prompt = PromptTemplate(template=template, input_variables=[\"country\"])\n",
|
||||
"\n",
|
||||
"llm_chain = LLMChain(llm=llm, prompt=prompt)\n",
|
||||
"\n",
|
||||
"generated = llm_chain.run(country=\"Belgium\")\n",
|
||||
"print(generated)"
|
||||
]
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
"language_info": {
|
||||
"name": "python"
|
||||
},
|
||||
"orig_nbformat": 4
|
||||
},
|
||||
"nbformat": 4,
|
||||
"nbformat_minor": 2
|
||||
}
|
||||
@@ -0,0 +1,67 @@
|
||||
{
|
||||
"cells": [
|
||||
{
|
||||
"attachments": {},
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# Rockset Chat Message History\n",
|
||||
"\n",
|
||||
"This notebook goes over how to use [Rockset](https://rockset.com/docs) to store chat message history. \n",
|
||||
"\n",
|
||||
"To begin, with get your API key from the [Rockset console](https://console.rockset.com/apikeys). Find your API region for the Rockset [API reference](https://rockset.com/docs/rest-api#introduction)."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {
|
||||
"vscode": {
|
||||
"languageId": "plaintext"
|
||||
}
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langchain.memory.chat_message_histories import RocksetChatMessageHistory\n",
|
||||
"from rockset import RocksetClient, Regions\n",
|
||||
"\n",
|
||||
"history = RocksetChatMessageHistory(\n",
|
||||
" session_id=\"MySession\",\n",
|
||||
" client=RocksetClient(\n",
|
||||
" api_key=\"YOUR API KEY\", \n",
|
||||
" host=Regions.usw2a1 # us-west-2 Oregon\n",
|
||||
" ),\n",
|
||||
" collection=\"langchain_demo\",\n",
|
||||
" sync=True\n",
|
||||
")\n",
|
||||
"history.add_user_message(\"hi!\")\n",
|
||||
"history.add_ai_message(\"whats up?\")\n",
|
||||
"print(history.messages)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"attachments": {},
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"The output should be something like:\n",
|
||||
"\n",
|
||||
"```python\n",
|
||||
"[\n",
|
||||
" HumanMessage(content='hi!', additional_kwargs={'id': '2e62f1c2-e9f7-465e-b551-49bae07fe9f0'}, example=False), \n",
|
||||
" AIMessage(content='whats up?', additional_kwargs={'id': 'b9be8eda-4c18-4cf8-81c3-e91e876927d0'}, example=False)\n",
|
||||
"]\n",
|
||||
"\n",
|
||||
"```"
|
||||
]
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
"language_info": {
|
||||
"name": "python"
|
||||
},
|
||||
"orig_nbformat": 4
|
||||
},
|
||||
"nbformat": 4,
|
||||
"nbformat_minor": 2
|
||||
}
|
||||
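To use this history inside a conversational chain, hand it to a memory object. A minimal sketch, assuming the `history` object created above:

```python
from langchain.memory import ConversationBufferMemory

# The Rockset-backed history becomes the persistence layer for chain memory.
memory = ConversationBufferMemory(
    chat_memory=history,
    return_messages=True,
)
```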
21
docs/extras/integrations/providers/bageldb.mdx
Normal file
@@ -0,0 +1,21 @@
|
||||
# BagelDB
|
||||
|
||||
> [BagelDB](https://www.bageldb.ai/) (`Open Vector Database for AI`) is like GitHub for AI data.
|
||||
It is a collaborative platform where users can create,
|
||||
share, and manage vector datasets. It can support private projects for independent developers,
|
||||
internal collaborations for enterprises, and public contributions for data DAOs.
|
||||
|
||||
## Installation and Setup
|
||||
|
||||
```bash
|
||||
pip install betabageldb
|
||||
```
|
||||
|
||||
|
||||
## VectorStore
|
||||
|
||||
See a [usage example](/docs/integrations/vectorstores/bageldb).
|
||||
|
||||
```python
|
||||
from langchain.vectorstores import Bagel
|
||||
```
|
||||
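A minimal sketch of what usage can look like; the cluster name and sample texts here are illustrative, so see the linked example for the exact API:

```python
from langchain.vectorstores import Bagel

# Illustrative only -- consult the usage example above for exact parameters.
cluster = Bagel.from_texts(cluster_name="testing", texts=["hello bagel", "hello langchain"])
results = cluster.similarity_search("bagel", k=1)
```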
19
docs/extras/integrations/providers/dingo.mdx
Normal file
@@ -0,0 +1,19 @@
|
||||
# Dingo
|
||||
|
||||
This page covers how to use the Dingo ecosystem within LangChain.
|
||||
It is broken into two parts: installation and setup, and then references to specific Dingo wrappers.
|
||||
|
||||
## Installation and Setup
|
||||
- Install the Python SDK with `pip install dingodb`
|
||||
|
||||
## VectorStore
|
||||
|
||||
There exists a wrapper around Dingo indexes, allowing you to use it as a vectorstore,
|
||||
whether for semantic search or example selection.
|
||||
|
||||
To import this vectorstore:
|
||||
```python
|
||||
from langchain.vectorstores import Dingo
|
||||
```
|
||||
|
||||
For a more detailed walkthrough of the Dingo wrapper, see [this notebook](/docs/integrations/vectorstores/dingo.html).
|
||||
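As a rough sketch of usage (the index name and connection details here are illustrative; the notebook above has the exact setup):

```python
from langchain.embeddings import OpenAIEmbeddings
from langchain.vectorstores import Dingo

# Illustrative only -- index/connection parameters are covered in the notebook.
vectorstore = Dingo.from_texts(
    texts=["LangChain integrates with DingoDB"],
    embedding=OpenAIEmbeddings(),
    index_name="langchain_demo",
)
docs = vectorstore.similarity_search("What does LangChain integrate with?")
```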
@@ -4,14 +4,15 @@ This page covers how to use the Fireworks models within Langchain.
|
||||
|
||||
## Installation and Setup
|
||||
|
||||
- To use the Fireworks model, you need to have a Fireworks API key. To generate one, sign up at platform.fireworks.ai
|
||||
- To use the Fireworks model, you need to have a Fireworks API key. To generate one, sign up at [app.fireworks.ai](https://app.fireworks.ai).
|
||||
- Authenticate by setting the FIREWORKS_API_KEY environment variable.
|
||||
|
||||
## LLM
|
||||
|
||||
Fireworks integrates with Langchain through the LLM module, which allows for standardized usage of any model deployed on Fireworks.
|
||||
|
||||
In this example, we'll work the llama-v2-13b.
|
||||
In this example, we'll work with the llama-v2-13b-chat model.
|
||||
|
||||
```python
|
||||
from langchain.llms.fireworks import Fireworks
|
||||
|
||||
|
||||
@@ -1,22 +1,23 @@
|
||||
# Grobid
|
||||
|
||||
GROBID is a machine learning library for extracting, parsing, and re-structuring raw documents.
|
||||
|
||||
It is designed and expected to be used to parse academic papers, where it works particularly well.
|
||||
|
||||
*Note*: if the articles supplied to Grobid are large documents (e.g. dissertations) exceeding a certain number
|
||||
of elements, they might not be processed.
|
||||
|
||||
This page covers how to use Grobid to parse articles for LangChain.
|
||||
It is separated into two parts: installation and running the server.
|
||||
|
||||
## Installation and Setup
|
||||
#Ensure You have Java installed
|
||||
!apt-get install -y openjdk-11-jdk -q
|
||||
!update-alternatives --set java /usr/lib/jvm/java-11-openjdk-amd64/bin/java
|
||||
## Installation
|
||||
The Grobid installation is described in detail at https://grobid.readthedocs.io/en/latest/Install-Grobid/.
|
||||
However, it is probably easier and less troublesome to run Grobid through a Docker container,
|
||||
as documented [here](https://grobid.readthedocs.io/en/latest/Grobid-docker/).
|
||||
|
||||
#Clone and install the Grobid Repo
|
||||
import os
|
||||
!git clone https://github.com/kermitt2/grobid.git
|
||||
os.environ["JAVA_HOME"] = "/usr/lib/jvm/java-11-openjdk-amd64"
|
||||
os.chdir('grobid')
|
||||
!./gradlew clean install
|
||||
## Use Grobid with LangChain
|
||||
|
||||
#Run the server,
|
||||
get_ipython().system_raw('nohup ./gradlew run > grobid.log 2>&1 &')
|
||||
Once Grobid is installed and up and running (you can check by visiting http://localhost:8070),
|
||||
you're ready to go.
|
||||
|
||||
You can now use the GrobidParser to produce documents:
|
||||
```python
|
||||
@@ -41,4 +42,5 @@ loader = GenericLoader.from_filesystem(
|
||||
)
|
||||
docs = loader.load()
|
||||
```
|
||||
Chunk metadata will include bboxes although these are a bit funky to parse, see https://grobid.readthedocs.io/en/latest/Coordinates-in-PDF/
|
||||
Chunk metadata will include bounding boxes. Although these are a bit tricky to parse,
|
||||
they are explained at https://grobid.readthedocs.io/en/latest/Coordinates-in-PDF/.
|
||||
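For reference, a sketch of the full loader setup that the elided hunk above abbreviates (the `./papers/` directory is an illustrative assumption):

```python
from langchain.document_loaders.generic import GenericLoader
from langchain.document_loaders.parsers import GrobidParser

# Point the loader at a directory of PDFs; parsing is delegated to the
# Grobid server running on localhost:8070
loader = GenericLoader.from_filesystem(
    "./papers/",
    glob="*",
    suffixes=[".pdf"],
    parser=GrobidParser(segment_sentences=False),
)
docs = loader.load()
```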
104
docs/extras/integrations/providers/log10.mdx
Normal file
@@ -0,0 +1,104 @@

# Log10

This page covers how to use [Log10](https://log10.io) within LangChain.

## What is Log10?

Log10 is an [open source](https://github.com/log10-io/log10) proxiless LLM data management and application development platform that lets you log, debug and tag your Langchain calls.

## Quick start

1. Create your free account at [log10.io](https://log10.io)
2. Add your `LOG10_TOKEN` and `LOG10_ORG_ID` from the Settings and Organization tabs respectively as environment variables.
3. Also add `LOG10_URL=https://log10.io` and your usual LLM API key (e.g. `OPENAI_API_KEY` or `ANTHROPIC_API_KEY`) to your environment.

## How to enable Log10 data management for Langchain

Integration with log10 is a simple one-line `log10_callback` integration as shown below:

```python
from langchain.chat_models import ChatOpenAI
from langchain.schema import HumanMessage

from log10.langchain import Log10Callback
from log10.llm import Log10Config

log10_callback = Log10Callback(log10_config=Log10Config())

messages = [
    HumanMessage(content="You are a ping pong machine"),
    HumanMessage(content="Ping?"),
]

llm = ChatOpenAI(model_name="gpt-3.5-turbo", callbacks=[log10_callback])
```

[Log10 + Langchain + Logs docs](https://github.com/log10-io/log10/blob/main/logging.md#langchain-logger)

[More details + screenshots](https://log10.io/docs/logs) including instructions for self-hosting logs

## How to use tags with Log10

```python
from langchain import OpenAI
from langchain.chat_models import ChatAnthropic, ChatOpenAI
from langchain.schema import HumanMessage

from log10.langchain import Log10Callback
from log10.llm import Log10Config

log10_callback = Log10Callback(log10_config=Log10Config())

messages = [
    HumanMessage(content="You are a ping pong machine"),
    HumanMessage(content="Ping?"),
]

llm = ChatOpenAI(model_name="gpt-3.5-turbo", callbacks=[log10_callback], temperature=0.5, tags=["test"])
completion = llm.predict_messages(messages, tags=["foobar"])
print(completion)

llm = ChatAnthropic(model="claude-2", callbacks=[log10_callback], temperature=0.7, tags=["baz"])
completion = llm.predict_messages(messages)
print(completion)

llm = OpenAI(model_name="text-davinci-003", callbacks=[log10_callback], temperature=0.5)
completion = llm.predict("You are a ping pong machine.\nPing?\n")
print(completion)
```

You can also intermix direct OpenAI calls and Langchain LLM calls:

```python
import os

import openai
from langchain import OpenAI

from log10.load import log10, log10_session

log10(openai)

with log10_session(tags=["foo", "bar"]):
    # Log a direct OpenAI call
    response = openai.Completion.create(
        model="text-ada-001",
        prompt="Where is the Eiffel Tower?",
        temperature=0,
        max_tokens=1024,
        top_p=1,
        frequency_penalty=0,
        presence_penalty=0,
    )
    print(response)

    # Log a call via Langchain
    llm = OpenAI(model_name="text-ada-001", temperature=0.5)
    response = llm.predict("You are a ping pong machine.\nPing?\n")
    print(response)
```

## How to debug Langchain calls

[Example of debugging](https://log10.io/docs/prompt_chain_debugging)

[More Langchain examples](https://github.com/log10-io/log10/tree/main/examples#langchain)
@@ -23,4 +23,11 @@ from langchain.vectorstores import Rockset

See a [usage example](/docs/integrations/document_loaders/rockset).
```python
from langchain.document_loaders import RocksetLoader
```

## Chat Message History

See a [usage example](/docs/integrations/memory/rockset_chat_message_history).
```python
from langchain.memory.chat_message_histories import RocksetChatMessageHistory
```
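A minimal sketch of wiring that chat history up (the client setup, collection name, and session id below are illustrative assumptions; see the linked notebook for the full walkthrough):

```python
from rockset import RocksetClient

from langchain.memory.chat_message_histories import RocksetChatMessageHistory

# Hypothetical Rockset client and collection, for illustration only
history = RocksetChatMessageHistory(
    session_id="MySession",
    client=RocksetClient(api_key="YOUR_API_KEY"),
    collection="langchain_demo",
    sync=True,
)
history.add_user_message("hi!")
history.add_ai_message("whats up?")
```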
@@ -63,7 +63,7 @@ results = vectara.similarity_score("what is LangChain?")

- `k`: number of results to return (defaults to 5)
- `lambda_val`: the [lexical matching](https://docs.vectara.com/docs/api-reference/search-apis/lexical-matching) factor for hybrid search (defaults to 0.025)
- `filter`: a [filter](https://docs.vectara.com/docs/common-use-cases/filtering-by-metadata/filter-overview) to apply to the results (default None)
- `n_sentence_context`: number of sentences to include before/after the actual matching segment when returning results. This defaults to 2.

The results are returned as a list of relevant documents, together with a relevance score for each document. A sketch of passing these parameters follows.
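This assumes an already-initialized `vectara` store as in the hunk header above; the parameter values simply restate the documented defaults:

```python
# Similarity search returning (document, score) pairs
results = vectara.similarity_search_with_score(
    "what is LangChain?",
    k=5,
    lambda_val=0.025,
    filter=None,
    n_sentence_context=2,
)
for doc, score in results:
    print(score, doc.page_content[:80])
```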
@@ -20,7 +20,7 @@
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"execution_count": 5,
|
||||
"id": "282239c8-e03a-4abc-86c1-ca6120231a20",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
@@ -28,7 +28,7 @@
|
||||
"from langchain.embeddings import BedrockEmbeddings\n",
|
||||
"\n",
|
||||
"embeddings = BedrockEmbeddings(\n",
|
||||
" credentials_profile_name=\"bedrock-admin\", endpoint_url=\"custom_endpoint_url\"\n",
|
||||
" credentials_profile_name=\"bedrock-admin\", region_name=\"us-east-1\"\n",
|
||||
")"
|
||||
]
|
||||
},
|
||||
@@ -49,7 +49,29 @@
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"embeddings.embed_documents([\"This is a content of the document\"])"
|
||||
"embeddings.embed_documents([\"This is a content of the document\", \"This is another document\"])"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "9f6b364d",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# async embed query\n",
|
||||
"await embeddings.aembed_query(\"This is a content of the document\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "c9240a5a",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# async embed documents\n",
|
||||
"await embeddings.aembed_documents([\"This is a content of the document\", \"This is another document\"])"
|
||||
]
|
||||
}
|
||||
],
|
||||
@@ -69,7 +91,7 @@
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.10.11"
|
||||
"version": "3.9.13"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
|
||||
@@ -12,7 +12,7 @@
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 1,
|
||||
"execution_count": 6,
|
||||
"id": "0be1af71",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
@@ -22,7 +22,7 @@
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 2,
|
||||
"execution_count": 29,
|
||||
"id": "2c66e5da",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
@@ -32,7 +32,7 @@
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 3,
|
||||
"execution_count": 30,
|
||||
"id": "01370375",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
@@ -42,7 +42,7 @@
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 4,
|
||||
"execution_count": 31,
|
||||
"id": "bfb6142c",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
@@ -52,7 +52,32 @@
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 5,
|
||||
"execution_count": 32,
|
||||
"id": "91bc875d-829b-4c3d-8e6f-fc2dda30a3bd",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"[-0.003186025367556387,\n",
|
||||
" 0.011071979803637493,\n",
|
||||
" -0.004020420763285827,\n",
|
||||
" -0.011658221276953042,\n",
|
||||
" -0.0010534035786864363]"
|
||||
]
|
||||
},
|
||||
"execution_count": 32,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"query_result[:5]"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 33,
|
||||
"id": "0356c3b7",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
@@ -60,6 +85,31 @@
|
||||
"doc_result = embeddings.embed_documents([text])"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 34,
|
||||
"id": "a4b0d49e-0c73-44b6-aed5-5b426564e085",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"[-0.003186025367556387,\n",
|
||||
" 0.011071979803637493,\n",
|
||||
" -0.004020420763285827,\n",
|
||||
" -0.011658221276953042,\n",
|
||||
" -0.0010534035786864363]"
|
||||
]
|
||||
},
|
||||
"execution_count": 34,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"doc_result[0][:5]"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "bb61bbeb",
|
||||
@@ -70,7 +120,7 @@
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"execution_count": 1,
|
||||
"id": "c0b072cc",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
@@ -80,17 +130,17 @@
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"execution_count": 23,
|
||||
"id": "a56b70f5",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"embeddings = OpenAIEmbeddings()"
|
||||
"embeddings = OpenAIEmbeddings(model=\"text-search-ada-doc-001\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"execution_count": 24,
|
||||
"id": "14aefb64",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
@@ -100,7 +150,7 @@
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"execution_count": 25,
|
||||
"id": "3c39ed33",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
@@ -110,7 +160,32 @@
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"execution_count": 26,
|
||||
"id": "2ee7ce9f-d506-4810-8897-e44334412714",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"[0.004452846988523035,\n",
|
||||
" 0.034550655976098514,\n",
|
||||
" -0.015029939040690051,\n",
|
||||
" 0.03827273883655212,\n",
|
||||
" 0.005785414075152477]"
|
||||
]
|
||||
},
|
||||
"execution_count": 26,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"query_result[:5]"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 27,
|
||||
"id": "e3221db6",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
@@ -118,6 +193,31 @@
|
||||
"doc_result = embeddings.embed_documents([text])"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 28,
|
||||
"id": "a0865409-3a6d-468f-939f-abde17c7cac3",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"[0.004452846988523035,\n",
|
||||
" 0.034550655976098514,\n",
|
||||
" -0.015029939040690051,\n",
|
||||
" 0.03827273883655212,\n",
|
||||
" 0.005785414075152477]"
|
||||
]
|
||||
},
|
||||
"execution_count": 28,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"doc_result[0][:5]"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
@@ -132,7 +232,7 @@
|
||||
],
|
||||
"metadata": {
|
||||
"kernelspec": {
|
||||
"display_name": "Python 3.11.1 64-bit",
|
||||
"display_name": "Python 3 (ipykernel)",
|
||||
"language": "python",
|
||||
"name": "python3"
|
||||
},
|
||||
@@ -146,7 +246,7 @@
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.11.1"
|
||||
"version": "3.9.1"
|
||||
},
|
||||
"vscode": {
|
||||
"interpreter": {
|
||||
|
||||
134
docs/extras/integrations/tools/alpha_vantage.ipynb
Normal file
@@ -0,0 +1,134 @@
|
||||
{
|
||||
"cells": [
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "245a954a",
|
||||
"metadata": {
|
||||
"id": "245a954a"
|
||||
},
|
||||
"source": [
|
||||
"# Alpha Vantage\n",
|
||||
"\n",
|
||||
">[Alpha Vantage](https://www.alphavantage.co) Alpha Vantage provides realtime and historical financial market data through a set of powerful and developer-friendly data APIs and spreadsheets. \n",
|
||||
"\n",
|
||||
"Use the ``AlphaVantageAPIWrapper`` to get currency exchange rates."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 1,
|
||||
"id": "34bb5968",
|
||||
"metadata": {
|
||||
"id": "34bb5968"
|
||||
},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdin",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
" ········\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"import getpass\n",
|
||||
"import os\n",
|
||||
"\n",
|
||||
"os.environ[\"ALPHAVANTAGE_API_KEY\"] = getpass.getpass()"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 2,
|
||||
"id": "ac4910f8",
|
||||
"metadata": {
|
||||
"id": "ac4910f8"
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langchain.utilities.alpha_vantage import AlphaVantageAPIWrapper"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 3,
|
||||
"id": "84b8f773",
|
||||
"metadata": {
|
||||
"id": "84b8f773"
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"alpha_vantage = AlphaVantageAPIWrapper()"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 5,
|
||||
"id": "068991a6",
|
||||
"metadata": {
|
||||
"id": "068991a6",
|
||||
"outputId": "c5cdc6ec-03cf-4084-cc6f-6ae792d91d39"
|
||||
},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"{'1. From_Currency Code': 'USD',\n",
|
||||
" '2. From_Currency Name': 'United States Dollar',\n",
|
||||
" '3. To_Currency Code': 'JPY',\n",
|
||||
" '4. To_Currency Name': 'Japanese Yen',\n",
|
||||
" '5. Exchange Rate': '144.93000000',\n",
|
||||
" '6. Last Refreshed': '2023-08-11 21:31:01',\n",
|
||||
" '7. Time Zone': 'UTC',\n",
|
||||
" '8. Bid Price': '144.92600000',\n",
|
||||
" '9. Ask Price': '144.93400000'}"
|
||||
]
|
||||
},
|
||||
"execution_count": 5,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"alpha_vantage.run(\"USD\", \"JPY\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "84fc2b66-c08f-4cd3-ae13-494c54789c09",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": []
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
"colab": {
|
||||
"provenance": []
|
||||
},
|
||||
"kernelspec": {
|
||||
"display_name": "Python 3 (ipykernel)",
|
||||
"language": "python",
|
||||
"name": "python3"
|
||||
},
|
||||
"language_info": {
|
||||
"codemirror_mode": {
|
||||
"name": "ipython",
|
||||
"version": 3
|
||||
},
|
||||
"file_extension": ".py",
|
||||
"mimetype": "text/x-python",
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.9.1"
|
||||
},
|
||||
"vscode": {
|
||||
"interpreter": {
|
||||
"hash": "53f3bc57609c7a84333bb558594977aa5b4026b1d6070b93987956689e367341"
|
||||
}
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
"nbformat_minor": 5
|
||||
}
|
||||
@@ -1,181 +1,194 @@
|
||||
{
|
||||
"cells": [
|
||||
{
|
||||
"attachments": {},
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# Dall-E Image Generator\n",
|
||||
"\n",
|
||||
"This notebook shows how you can generate images from a prompt synthesized using an OpenAI LLM. The images are generated using Dall-E, which uses the same OpenAI API key as the LLM."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# Needed if you would like to display images in the notebook\n",
|
||||
"!pip install opencv-python scikit-image"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 4,
|
||||
"metadata": {
|
||||
"id": "q-k8wmp0zquh"
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langchain.llms import OpenAI\n",
|
||||
"import os\n",
|
||||
"os.environ[\"OPENAI_API_KEY\"] = \"<your-key-here>\""
|
||||
]
|
||||
},
|
||||
{
|
||||
"attachments": {},
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Run as a chain"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 5,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langchain.utilities.dalle_image_generator import DallEAPIWrapper\n",
|
||||
"from langchain.prompts import PromptTemplate\n",
|
||||
"from langchain.chains import LLMChain\n",
|
||||
"\n",
|
||||
"llm = OpenAI(temperature=0.9)\n",
|
||||
"prompt = PromptTemplate(\n",
|
||||
" input_variables=[\"image_desc\"],\n",
|
||||
" template=\"Generate a detailed prompt to generate an image based on the following description: {image_desc}\",\n",
|
||||
")\n",
|
||||
"chain = LLMChain(llm=llm, prompt=prompt)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 7,
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"https://oaidalleapiprodscus.blob.core.windows.net/private/org-rocrupyvzgcl4yf25rqq6d1v/user-WsxrbKyP2c8rfhCKWDyMfe8N/img-mg1OWiziXxQN1aR2XRsLNndg.png?st=2023-01-31T07%3A34%3A15Z&se=2023-01-31T09%3A34%3A15Z&sp=r&sv=2021-08-06&sr=b&rscd=inline&rsct=image/png&skoid=6aaadede-4fb3-4698-a8f6-684d7786b067&sktid=a48cca56-e6da-484e-a814-9c849652bcb3&skt=2023-01-30T22%3A19%3A44Z&ske=2023-01-31T22%3A19%3A44Z&sks=b&skv=2021-08-06&sig=XDPee5aEng%2BcbXq2mqhh39uHGZTBmJgGAerSd0g%2BMEs%3D\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"image_url = DallEAPIWrapper().run(chain.run(\"halloween night at a haunted museum\"))"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# You can click on the link above to display the image for\n",
|
||||
"# Or you can try the options below to display the image inline in this notebook\n",
|
||||
"\n",
|
||||
"try:\n",
|
||||
" import google.colab\n",
|
||||
" IN_COLAB = True\n",
|
||||
"except:\n",
|
||||
" IN_COLAB = False\n",
|
||||
"\n",
|
||||
"if IN_COLAB:\n",
|
||||
" from google.colab.patches import cv2_imshow # for image display\n",
|
||||
" from skimage import io\n",
|
||||
"\n",
|
||||
" image = io.imread(image_url) \n",
|
||||
" cv2_imshow(image)\n",
|
||||
"else:\n",
|
||||
" import cv2\n",
|
||||
" from skimage import io\n",
|
||||
"\n",
|
||||
" image = io.imread(image_url) \n",
|
||||
" cv2.imshow('image', image)\n",
|
||||
" cv2.waitKey(0) #wait for a keyboard input\n",
|
||||
" cv2.destroyAllWindows()\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"attachments": {},
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Run as a tool with an agent"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 9,
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"\n",
|
||||
"\n",
|
||||
"\u001b[1m> Entering new AgentExecutor chain...\u001b[0m\n",
|
||||
"\u001b[32;1m\u001b[1;3m What is the best way to turn this description into an image?\n",
|
||||
"Action: Dall-E Image Generator\n",
|
||||
"Action Input: A spooky Halloween night at a haunted museum\u001b[0mhttps://oaidalleapiprodscus.blob.core.windows.net/private/org-rocrupyvzgcl4yf25rqq6d1v/user-WsxrbKyP2c8rfhCKWDyMfe8N/img-ogKfqxxOS5KWVSj4gYySR6FY.png?st=2023-01-31T07%3A38%3A25Z&se=2023-01-31T09%3A38%3A25Z&sp=r&sv=2021-08-06&sr=b&rscd=inline&rsct=image/png&skoid=6aaadede-4fb3-4698-a8f6-684d7786b067&sktid=a48cca56-e6da-484e-a814-9c849652bcb3&skt=2023-01-30T22%3A19%3A36Z&ske=2023-01-31T22%3A19%3A36Z&sks=b&skv=2021-08-06&sig=XsomxxBfu2CP78SzR9lrWUlbask4wBNnaMsHamy4VvU%3D\n",
|
||||
"\n",
|
||||
"Observation: \u001b[36;1m\u001b[1;3mhttps://oaidalleapiprodscus.blob.core.windows.net/private/org-rocrupyvzgcl4yf25rqq6d1v/user-WsxrbKyP2c8rfhCKWDyMfe8N/img-ogKfqxxOS5KWVSj4gYySR6FY.png?st=2023-01-31T07%3A38%3A25Z&se=2023-01-31T09%3A38%3A25Z&sp=r&sv=2021-08-06&sr=b&rscd=inline&rsct=image/png&skoid=6aaadede-4fb3-4698-a8f6-684d7786b067&sktid=a48cca56-e6da-484e-a814-9c849652bcb3&skt=2023-01-30T22%3A19%3A36Z&ske=2023-01-31T22%3A19%3A36Z&sks=b&skv=2021-08-06&sig=XsomxxBfu2CP78SzR9lrWUlbask4wBNnaMsHamy4VvU%3D\u001b[0m\n",
|
||||
"Thought:\u001b[32;1m\u001b[1;3m With the image generated, I can now make my final answer.\n",
|
||||
"Final Answer: An image of a Halloween night at a haunted museum can be seen here: https://oaidalleapiprodscus.blob.core.windows.net/private/org-rocrupyvzgcl4yf25rqq6d1v/user-WsxrbKyP2c8rfhCKWDyMfe8N/img-ogKfqxxOS5KWVSj4gYySR6FY.png?st=2023-01-31T07%3A38%3A25Z&se=2023-01-31T09%3A38%3A25Z&sp=r&sv=2021-08-06&sr=b&rscd=inline&rsct=image/png&skoid=6aaadede-4fb3-4698-a8f6-684d7786b067&sktid=a48cca56-e6da-484e-a814-9c849652bcb3&skt=2023-01-30T22\u001b[0m\n",
|
||||
"\n",
|
||||
"\u001b[1m> Finished chain.\u001b[0m\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"from langchain.agents import load_tools\n",
|
||||
"from langchain.agents import initialize_agent\n",
|
||||
"\n",
|
||||
"tools = load_tools(['dalle-image-generator'])\n",
|
||||
"agent = initialize_agent(tools, llm, agent=\"zero-shot-react-description\", verbose=True)\n",
|
||||
"output = agent.run(\"Create an image of a halloween night at a haunted museum\")"
|
||||
]
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
"colab": {
|
||||
"provenance": []
|
||||
},
|
||||
"kernelspec": {
|
||||
"display_name": "langchain",
|
||||
"language": "python",
|
||||
"name": "python3"
|
||||
},
|
||||
"language_info": {
|
||||
"codemirror_mode": {
|
||||
"name": "ipython",
|
||||
"version": 3
|
||||
},
|
||||
"file_extension": ".py",
|
||||
"mimetype": "text/x-python",
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.9.16"
|
||||
},
|
||||
"vscode": {
|
||||
"interpreter": {
|
||||
"hash": "3570c8892273ffbeee7ead61dc7c022b73551d9f55fb2584ac0e8e8920b18a89"
|
||||
}
|
||||
}
|
||||
"cells": [
|
||||
{
|
||||
"attachments": {},
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# Dall-E Image Generator\n",
|
||||
"\n",
|
||||
"This notebook shows how you can generate images from a prompt synthesized using an OpenAI LLM. The images are generated using Dall-E, which uses the same OpenAI API key as the LLM."
|
||||
]
|
||||
},
|
||||
"nbformat": 4,
|
||||
"nbformat_minor": 0
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# Needed if you would like to display images in the notebook\n",
|
||||
"!pip install opencv-python scikit-image"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 4,
|
||||
"metadata": {
|
||||
"id": "q-k8wmp0zquh"
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langchain.llms import OpenAI\n",
|
||||
"import os\n",
|
||||
"os.environ[\"OPENAI_API_KEY\"] = \"<your-key-here>\""
|
||||
]
|
||||
},
|
||||
{
|
||||
"attachments": {},
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Run as a chain"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 2,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langchain.utilities.dalle_image_generator import DallEAPIWrapper\n",
|
||||
"from langchain.prompts import PromptTemplate\n",
|
||||
"from langchain.chains import LLMChain\n",
|
||||
"from langchain.llms import OpenAI\n",
|
||||
"\n",
|
||||
"llm = OpenAI(temperature=0.9)\n",
|
||||
"prompt = PromptTemplate(\n",
|
||||
" input_variables=[\"image_desc\"],\n",
|
||||
" template=\"Generate a detailed prompt to generate an image based on the following description: {image_desc}\",\n",
|
||||
")\n",
|
||||
"chain = LLMChain(llm=llm, prompt=prompt)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 3,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"image_url = DallEAPIWrapper().run(chain.run(\"halloween night at a haunted museum\"))"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 4,
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"'https://oaidalleapiprodscus.blob.core.windows.net/private/org-i0zjYONU3PemzJ222esBaAzZ/user-f6uEIOFxoiUZivy567cDSWni/img-i7Z2ZxvJ4IbbdAiO6OXJgS3v.png?st=2023-08-11T14%3A03%3A14Z&se=2023-08-11T16%3A03%3A14Z&sp=r&sv=2021-08-06&sr=b&rscd=inline&rsct=image/png&skoid=6aaadede-4fb3-4698-a8f6-684d7786b067&sktid=a48cca56-e6da-484e-a814-9c849652bcb3&skt=2023-08-10T20%3A58%3A32Z&ske=2023-08-11T20%3A58%3A32Z&sks=b&skv=2021-08-06&sig=/sECe7C0EAq37ssgBm7g7JkVIM/Q1W3xOstd0Go6slA%3D'"
|
||||
]
|
||||
},
|
||||
"execution_count": 4,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"image_url"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# You can click on the link above to display the image \n",
|
||||
"# Or you can try the options below to display the image inline in this notebook\n",
|
||||
"\n",
|
||||
"try:\n",
|
||||
" import google.colab\n",
|
||||
" IN_COLAB = True\n",
|
||||
"except:\n",
|
||||
" IN_COLAB = False\n",
|
||||
"\n",
|
||||
"if IN_COLAB:\n",
|
||||
" from google.colab.patches import cv2_imshow # for image display\n",
|
||||
" from skimage import io\n",
|
||||
"\n",
|
||||
" image = io.imread(image_url) \n",
|
||||
" cv2_imshow(image)\n",
|
||||
"else:\n",
|
||||
" import cv2\n",
|
||||
" from skimage import io\n",
|
||||
"\n",
|
||||
" image = io.imread(image_url) \n",
|
||||
" cv2.imshow('image', image)\n",
|
||||
" cv2.waitKey(0) #wait for a keyboard input\n",
|
||||
" cv2.destroyAllWindows()\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"attachments": {},
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Run as a tool with an agent"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 9,
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"\n",
|
||||
"\n",
|
||||
"\u001b[1m> Entering new AgentExecutor chain...\u001b[0m\n",
|
||||
"\u001b[32;1m\u001b[1;3m What is the best way to turn this description into an image?\n",
|
||||
"Action: Dall-E Image Generator\n",
|
||||
"Action Input: A spooky Halloween night at a haunted museum\u001b[0mhttps://oaidalleapiprodscus.blob.core.windows.net/private/org-rocrupyvzgcl4yf25rqq6d1v/user-WsxrbKyP2c8rfhCKWDyMfe8N/img-ogKfqxxOS5KWVSj4gYySR6FY.png?st=2023-01-31T07%3A38%3A25Z&se=2023-01-31T09%3A38%3A25Z&sp=r&sv=2021-08-06&sr=b&rscd=inline&rsct=image/png&skoid=6aaadede-4fb3-4698-a8f6-684d7786b067&sktid=a48cca56-e6da-484e-a814-9c849652bcb3&skt=2023-01-30T22%3A19%3A36Z&ske=2023-01-31T22%3A19%3A36Z&sks=b&skv=2021-08-06&sig=XsomxxBfu2CP78SzR9lrWUlbask4wBNnaMsHamy4VvU%3D\n",
|
||||
"\n",
|
||||
"Observation: \u001b[36;1m\u001b[1;3mhttps://oaidalleapiprodscus.blob.core.windows.net/private/org-rocrupyvzgcl4yf25rqq6d1v/user-WsxrbKyP2c8rfhCKWDyMfe8N/img-ogKfqxxOS5KWVSj4gYySR6FY.png?st=2023-01-31T07%3A38%3A25Z&se=2023-01-31T09%3A38%3A25Z&sp=r&sv=2021-08-06&sr=b&rscd=inline&rsct=image/png&skoid=6aaadede-4fb3-4698-a8f6-684d7786b067&sktid=a48cca56-e6da-484e-a814-9c849652bcb3&skt=2023-01-30T22%3A19%3A36Z&ske=2023-01-31T22%3A19%3A36Z&sks=b&skv=2021-08-06&sig=XsomxxBfu2CP78SzR9lrWUlbask4wBNnaMsHamy4VvU%3D\u001b[0m\n",
|
||||
"Thought:\u001b[32;1m\u001b[1;3m With the image generated, I can now make my final answer.\n",
|
||||
"Final Answer: An image of a Halloween night at a haunted museum can be seen here: https://oaidalleapiprodscus.blob.core.windows.net/private/org-rocrupyvzgcl4yf25rqq6d1v/user-WsxrbKyP2c8rfhCKWDyMfe8N/img-ogKfqxxOS5KWVSj4gYySR6FY.png?st=2023-01-31T07%3A38%3A25Z&se=2023-01-31T09%3A38%3A25Z&sp=r&sv=2021-08-06&sr=b&rscd=inline&rsct=image/png&skoid=6aaadede-4fb3-4698-a8f6-684d7786b067&sktid=a48cca56-e6da-484e-a814-9c849652bcb3&skt=2023-01-30T22\u001b[0m\n",
|
||||
"\n",
|
||||
"\u001b[1m> Finished chain.\u001b[0m\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"from langchain.agents import load_tools\n",
|
||||
"from langchain.agents import initialize_agent\n",
|
||||
"\n",
|
||||
"tools = load_tools(['dalle-image-generator'])\n",
|
||||
"agent = initialize_agent(tools, llm, agent=\"zero-shot-react-description\", verbose=True)\n",
|
||||
"output = agent.run(\"Create an image of a halloween night at a haunted museum\")"
|
||||
]
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
"colab": {
|
||||
"provenance": []
|
||||
},
|
||||
"kernelspec": {
|
||||
"display_name": "poetry-venv",
|
||||
"language": "python",
|
||||
"name": "poetry-venv"
|
||||
},
|
||||
"language_info": {
|
||||
"codemirror_mode": {
|
||||
"name": "ipython",
|
||||
"version": 3
|
||||
},
|
||||
"file_extension": ".py",
|
||||
"mimetype": "text/x-python",
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.9.1"
|
||||
},
|
||||
"vscode": {
|
||||
"interpreter": {
|
||||
"hash": "3570c8892273ffbeee7ead61dc7c022b73551d9f55fb2584ac0e8e8920b18a89"
|
||||
}
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
"nbformat_minor": 4
|
||||
}
|
||||
|
||||
300
docs/extras/integrations/vectorstores/bageldb.ipynb
Normal file
@@ -0,0 +1,300 @@
|
||||
{
|
||||
"cells": [
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# BagelDB\n",
|
||||
"\n",
|
||||
"> [BagelDB](https://www.bageldb.ai/) (`Open Vector Database for AI`), is like GitHub for AI data.\n",
|
||||
"It is a collaborative platform where users can create,\n",
|
||||
"share, and manage vector datasets. It can support private projects for independent developers,\n",
|
||||
"internal collaborations for enterprises, and public contributions for data DAOs.\n",
|
||||
"\n",
|
||||
"### Installation and Setup\n",
|
||||
"\n",
|
||||
"```bash\n",
|
||||
"pip install betabageldb\n",
|
||||
"```\n",
|
||||
"\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Create VectorStore from texts"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 9,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langchain.vectorstores import Bagel\n",
|
||||
"\n",
|
||||
"texts = [\"hello bagel\", \"hello langchain\", \"I love salad\", \"my car\", \"a dog\"]\n",
|
||||
"# create cluster and add texts\n",
|
||||
"cluster = Bagel.from_texts(cluster_name=\"testing\", texts=texts)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 11,
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"[Document(page_content='hello bagel', metadata={}),\n",
|
||||
" Document(page_content='my car', metadata={}),\n",
|
||||
" Document(page_content='I love salad', metadata={})]"
|
||||
]
|
||||
},
|
||||
"execution_count": 11,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"# similarity search\n",
|
||||
"cluster.similarity_search(\"bagel\", k=3)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 12,
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"[(Document(page_content='hello bagel', metadata={}), 0.27392977476119995),\n",
|
||||
" (Document(page_content='my car', metadata={}), 1.4783176183700562),\n",
|
||||
" (Document(page_content='I love salad', metadata={}), 1.5342965126037598)]"
|
||||
]
|
||||
},
|
||||
"execution_count": 12,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"# the score is a distance metric, so lower is better\n",
|
||||
"cluster.similarity_search_with_score(\"bagel\", k=3)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 13,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# delete the cluster\n",
|
||||
"cluster.delete_cluster()"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Create VectorStore from docs"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 33,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langchain.document_loaders import TextLoader\n",
|
||||
"from langchain.text_splitter import CharacterTextSplitter\n",
|
||||
"\n",
|
||||
"loader = TextLoader(\"../../../state_of_the_union.txt\")\n",
|
||||
"documents = loader.load()\n",
|
||||
"text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)\n",
|
||||
"docs = text_splitter.split_documents(documents)[:10]"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 36,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# create cluster with docs\n",
|
||||
"cluster = Bagel.from_documents(cluster_name=\"testing_with_docs\", documents=docs)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 37,
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"Madam Speaker, Madam Vice President, our First Lady and Second Gentleman. Members of Congress and the \n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"# similarity search\n",
|
||||
"query = \"What did the president say about Ketanji Brown Jackson\"\n",
|
||||
"docs = cluster.similarity_search(query)\n",
|
||||
"print(docs[0].page_content[:102])"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Get all text/doc from Cluster"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 53,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"texts = [\"hello bagel\", \"this is langchain\"]\n",
|
||||
"cluster = Bagel.from_texts(cluster_name=\"testing\", texts=texts)\n",
|
||||
"cluster_data = cluster.get()"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 54,
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"dict_keys(['ids', 'embeddings', 'metadatas', 'documents'])"
|
||||
]
|
||||
},
|
||||
"execution_count": 54,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"# all keys\n",
|
||||
"cluster_data.keys()"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 56,
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"{'ids': ['578c6d24-3763-11ee-a8ab-b7b7b34f99ba',\n",
|
||||
" '578c6d25-3763-11ee-a8ab-b7b7b34f99ba',\n",
|
||||
" 'fb2fc7d8-3762-11ee-a8ab-b7b7b34f99ba',\n",
|
||||
" 'fb2fc7d9-3762-11ee-a8ab-b7b7b34f99ba',\n",
|
||||
" '6b40881a-3762-11ee-a8ab-b7b7b34f99ba',\n",
|
||||
" '6b40881b-3762-11ee-a8ab-b7b7b34f99ba',\n",
|
||||
" '581e691e-3762-11ee-a8ab-b7b7b34f99ba',\n",
|
||||
" '581e691f-3762-11ee-a8ab-b7b7b34f99ba'],\n",
|
||||
" 'embeddings': None,\n",
|
||||
" 'metadatas': [{}, {}, {}, {}, {}, {}, {}, {}],\n",
|
||||
" 'documents': ['hello bagel',\n",
|
||||
" 'this is langchain',\n",
|
||||
" 'hello bagel',\n",
|
||||
" 'this is langchain',\n",
|
||||
" 'hello bagel',\n",
|
||||
" 'this is langchain',\n",
|
||||
" 'hello bagel',\n",
|
||||
" 'this is langchain']}"
|
||||
]
|
||||
},
|
||||
"execution_count": 56,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"# all values and keys\n",
|
||||
"cluster_data"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 57,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"cluster.delete_cluster()"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Create cluster with metadata & filter using metadata"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 63,
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"[(Document(page_content='hello bagel', metadata={'source': 'notion'}), 0.0)]"
|
||||
]
|
||||
},
|
||||
"execution_count": 63,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"texts = [\"hello bagel\", \"this is langchain\"]\n",
|
||||
"metadatas = [{\"source\": \"notion\"}, {\"source\": \"google\"}]\n",
|
||||
"\n",
|
||||
"cluster = Bagel.from_texts(cluster_name=\"testing\", texts=texts, metadatas=metadatas)\n",
|
||||
"cluster.similarity_search_with_score(\"hello bagel\", where={\"source\": \"notion\"})"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 64,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# delete the cluster\n",
|
||||
"cluster.delete_cluster()"
|
||||
]
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
"kernelspec": {
|
||||
"display_name": "Python 3",
|
||||
"language": "python",
|
||||
"name": "python3"
|
||||
},
|
||||
"language_info": {
|
||||
"codemirror_mode": {
|
||||
"name": "ipython",
|
||||
"version": 3
|
||||
},
|
||||
"file_extension": ".py",
|
||||
"mimetype": "text/x-python",
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.10.12"
|
||||
},
|
||||
"orig_nbformat": 4
|
||||
},
|
||||
"nbformat": 4,
|
||||
"nbformat_minor": 2
|
||||
}
|
||||
244
docs/extras/integrations/vectorstores/dingo.ipynb
Normal file
@@ -0,0 +1,244 @@
|
||||
{
|
||||
"cells": [
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "683953b3",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# Dingo\n",
|
||||
"\n",
|
||||
">[Dingo](https://dingodb.readthedocs.io/en/latest/) is a distributed multi-mode vector database, which combines the characteristics of data lakes and vector databases, and can store data of any type and size (Key-Value, PDF, audio, video, etc.). It has real-time low-latency processing capabilities to achieve rapid insight and response, and can efficiently conduct instant analysis and process multi-modal data.\n",
|
||||
"\n",
|
||||
"This notebook shows how to use functionality related to the DingoDB vector database.\n",
|
||||
"\n",
|
||||
"To run, you should have a [DingoDB instance up and running](https://github.com/dingodb/dingo-deploy/blob/main/README.md)."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "a62cff8a-bcf7-4e33-bbbc-76999c2e3e20",
|
||||
"metadata": {
|
||||
"tags": []
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"!pip install dingodb"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "7a0f9e02-8eb0-4aef-b11f-8861360472ee",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"We want to use OpenAIEmbeddings so we have to get the OpenAI API Key."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 1,
|
||||
"id": "8b6ed9cd-81b9-46e5-9c20-5aafca2844d0",
|
||||
"metadata": {
|
||||
"tags": []
|
||||
},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"OpenAI API Key:········\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"import os\n",
|
||||
"import getpass\n",
|
||||
"\n",
|
||||
"os.environ[\"OPENAI_API_KEY\"] = getpass.getpass(\"OpenAI API Key:\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 2,
|
||||
"id": "aac9563e",
|
||||
"metadata": {
|
||||
"tags": []
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langchain.embeddings.openai import OpenAIEmbeddings\n",
|
||||
"from langchain.text_splitter import CharacterTextSplitter\n",
|
||||
"from langchain.vectorstores import Dingo\n",
|
||||
"from langchain.document_loaders import TextLoader"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 3,
|
||||
"id": "a3c3999a",
|
||||
"metadata": {
|
||||
"tags": []
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langchain.document_loaders import TextLoader\n",
|
||||
"\n",
|
||||
"loader = TextLoader(\"../../../state_of_the_union.txt\")\n",
|
||||
"documents = loader.load()\n",
|
||||
"text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)\n",
|
||||
"docs = text_splitter.split_documents(documents)\n",
|
||||
"\n",
|
||||
"embeddings = OpenAIEmbeddings()"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 4,
|
||||
"id": "dcf88bdf",
|
||||
"metadata": {
|
||||
"tags": []
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from dingodb import DingoDB\n",
|
||||
"\n",
|
||||
"index_name = \"langchain-demo\"\n",
|
||||
"\n",
|
||||
"dingo_client = DingoDB(user=\"\", password=\"\", host=[\"127.0.0.1:13000\"])\n",
|
||||
"# First, check if our index already exists. If it doesn't, we create it\n",
|
||||
"if index_name not in dingo_client.get_index():\n",
|
||||
" # we create a new index\n",
|
||||
" dingo_client.create_index(\n",
|
||||
" index_name=index_name,\n",
|
||||
" dimension=1536,\n",
|
||||
" metric_type='cosine',\n",
|
||||
" auto_id=False\n",
|
||||
")\n",
|
||||
"\n",
|
||||
"# The OpenAI embedding model `text-embedding-ada-002 uses 1536 dimensions`\n",
|
||||
"docsearch = Dingo.from_documents(docs, embeddings, client=dingo_client, index_name=index_name)\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "c3aae49e",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langchain.embeddings.openai import OpenAIEmbeddings\n",
|
||||
"from langchain.text_splitter import CharacterTextSplitter\n",
|
||||
"from langchain.vectorstores import Dingo\n",
|
||||
"from langchain.document_loaders import TextLoader"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 5,
|
||||
"id": "a8c513ab",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"query = \"What did the president say about Ketanji Brown Jackson\"\n",
|
||||
"docs = docsearch.similarity_search(query)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 2,
|
||||
"id": "fc516993",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"print(docs[0][1])"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "1eca81e4",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Adding More Text to an Existing Index\n",
|
||||
"\n",
|
||||
"More text can embedded and upserted to an existing Dingo index using the `add_texts` function"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "e40d558b",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"vectorstore = Dingo(client, embeddings.embed_query, \"text\")\n",
|
||||
"\n",
|
||||
"vectorstore.add_texts(\"More text!\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "bcb858a8",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Maximal Marginal Relevance Searches\n",
|
||||
"\n",
|
||||
"In addition to using similarity search in the retriever object, you can also use `mmr` as retriever."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "649083ab",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"retriever = docsearch.as_retriever(search_type=\"mmr\")\n",
|
||||
"matched_docs = retriever.get_relevant_documents(query)\n",
|
||||
"for i, d in enumerate(matched_docs):\n",
|
||||
" print(f\"\\n## Document {i}\\n\")\n",
|
||||
" print(d.page_content)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "7d3831ad",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Or use `max_marginal_relevance_search` directly:"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "732f58b1",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"found_docs = docsearch.max_marginal_relevance_search(query, k=2, fetch_k=10)\n",
|
||||
"for i, doc in enumerate(found_docs):\n",
|
||||
" print(f\"{i + 1}.\", doc.page_content, \"\\n\")"
|
||||
]
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
"kernelspec": {
|
||||
"display_name": "Python 3 (ipykernel)",
|
||||
"language": "python",
|
||||
"name": "python3"
|
||||
},
|
||||
"language_info": {
|
||||
"codemirror_mode": {
|
||||
"name": "ipython",
|
||||
"version": 3
|
||||
},
|
||||
"file_extension": ".py",
|
||||
"mimetype": "text/x-python",
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.10.11"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
"nbformat_minor": 5
|
||||
}
|
||||
@@ -35,7 +35,7 @@
|
||||
"\n",
|
||||
" -- Create a table to store your documents\n",
|
||||
" create table documents (\n",
|
||||
" id bigserial primary key,\n",
|
||||
" id uuid primary key,\n",
|
||||
" content text, -- corresponds to Document.pageContent\n",
|
||||
" metadata jsonb, -- corresponds to Document.metadata\n",
|
||||
" embedding vector(1536) -- 1536 works for OpenAI embeddings, change if needed\n",
|
||||
|
||||
@@ -81,17 +81,18 @@
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 4,
|
||||
"execution_count": 21,
|
||||
"id": "53b7ce2d-3c09-4d1c-b66b-5769ce6746ae",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"os.environ[\"WEAVIATE_API_KEY\"] = getpass.getpass(\"WEAVIATE_API_KEY:\")"
|
||||
"os.environ[\"WEAVIATE_API_KEY\"] = getpass.getpass(\"WEAVIATE_API_KEY:\")\n",
|
||||
"WEAVIATE_API_KEY = os.environ[\"WEAVIATE_API_KEY\"]"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 5,
|
||||
"execution_count": 7,
|
||||
"id": "aac9563e",
|
||||
"metadata": {
|
||||
"tags": []
|
||||
@@ -106,7 +107,7 @@
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 6,
|
||||
"execution_count": 12,
|
||||
"id": "a3c3999a",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
@@ -123,7 +124,7 @@
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 7,
|
||||
"execution_count": 14,
|
||||
"id": "21e9e528",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
@@ -133,7 +134,7 @@
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 8,
|
||||
"execution_count": 15,
|
||||
"id": "b4170176",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
@@ -144,7 +145,7 @@
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 9,
|
||||
"execution_count": 16,
|
||||
"id": "ecf3b890",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
@@ -166,6 +167,53 @@
|
||||
"print(docs[0].page_content)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"attachments": {},
|
||||
"cell_type": "markdown",
|
||||
"id": "7826d0ea",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Authentication"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "13989a7c",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Weaviate instances have authentication enabled by default. You can use either a username/password combination or API key. "
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 23,
|
||||
"id": "f6604f1d",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"<langchain.vectorstores.weaviate.Weaviate object at 0x107f46550>\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"import weaviate\n",
|
||||
"\n",
|
||||
"client = weaviate.Client(url=WEAVIATE_URL, auth_client_secret=weaviate.AuthApiKey(WEAVIATE_API_KEY))\n",
|
||||
"\n",
|
||||
"# client = weaviate.Client(\n",
|
||||
"# url=WEAVIATE_URL,\n",
|
||||
"# auth_client_secret=weaviate.AuthClientPassword(\n",
|
||||
"# username = \"WCS_USERNAME\", # Replace w/ your WCS username\n",
|
||||
"# password = \"WCS_PASSWORD\", # Replace w/ your WCS password\n",
|
||||
"# ),\n",
|
||||
"# )\n",
|
||||
"\n",
|
||||
"vectorstore = Weaviate.from_documents(documents, embeddings, client=client, by_text=False)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"attachments": {},
|
||||
"cell_type": "markdown",
|
||||
@@ -187,7 +235,7 @@
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 10,
|
||||
"execution_count": 17,
|
||||
"id": "102105a1",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
@@ -213,7 +261,7 @@
|
||||
"id": "8fc3487b",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# Persistance"
|
||||
"# Persistence"
|
||||
]
|
||||
},
|
||||
{
|
||||
@@ -249,7 +297,7 @@
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 11,
|
||||
"execution_count": null,
|
||||
"id": "8b7df7ae",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
@@ -287,7 +335,7 @@
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 12,
|
||||
"execution_count": null,
|
||||
"id": "5e824f3b",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
@@ -298,7 +346,7 @@
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 13,
|
||||
"execution_count": null,
|
||||
"id": "61209cc3",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
@@ -311,7 +359,7 @@
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 14,
|
||||
"execution_count": null,
|
||||
"id": "4abc3d37",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
@@ -327,7 +375,7 @@
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 15,
|
||||
"execution_count": null,
|
||||
"id": "c7062393",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
@@ -339,7 +387,7 @@
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 16,
|
||||
"execution_count": null,
|
||||
"id": "7e41b773",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
|
||||
594
docs/extras/modules/data_connection/caching_embeddings.ipynb
Normal file
@@ -0,0 +1,594 @@
|
||||
{
|
||||
"cells": [
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "bf4061ce",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# Caching Embeddings\n",
|
||||
"\n",
|
||||
"Embeddings can be stored or temporarily cached to avoid needing to recompute them.\n",
|
||||
"\n",
|
||||
"Caching embeddings can be done using a `CacheBackedEmbeddings`.\n",
|
||||
"\n",
|
||||
"The cache backed embedder is a wrapper around an embedder that caches\n",
|
||||
"embeddings in a key-value store. \n",
|
||||
"\n",
|
||||
"The text is hashed and the hash is used as the key in the cache.\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"The main supported way to initialized a `CacheBackedEmbeddings` is `from_bytes_store`. This takes in the following parameters:\n",
|
||||
"\n",
|
||||
"- underlying_embedder: The embedder to use for embedding.\n",
|
||||
"- document_embedding_cache: The cache to use for storing document embeddings.\n",
|
||||
"- namespace: (optional, defaults to `\"\"`) The namespace to use for document cache. This namespace is used to avoid collisions with other caches. For example, set it to the name of the embedding model used.\n",
|
||||
"\n",
|
||||
"**Attention**: Be sure to set the `namespace` parameter to avoid collisions of the same text embedded using different embeddings models."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 4,
|
||||
"id": "a463c3c2-749b-40d1-a433-84f68a1cd1c7",
|
||||
"metadata": {
|
||||
"tags": []
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langchain.storage import InMemoryStore, LocalFileStore, RedisStore\n",
|
||||
"from langchain.embeddings import OpenAIEmbeddings, CacheBackedEmbeddings"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "9ddf07dd-3e72-41de-99d4-78e9521e272f",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Using with a vectorstore\n",
|
||||
"\n",
|
||||
"First, let's see an example that uses the local file system for storing embeddings and uses FAISS vectorstore for retrieval."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 5,
|
||||
"id": "9e4314d8-88ef-4f52-81ae-0be771168bb6",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langchain.document_loaders import TextLoader\n",
|
||||
"from langchain.embeddings.openai import OpenAIEmbeddings\n",
|
||||
"from langchain.text_splitter import CharacterTextSplitter\n",
|
||||
"from langchain.vectorstores import FAISS"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 7,
|
||||
"id": "3e751f26-9b5b-4c10-843a-d784b5ea8538",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"underlying_embeddings = OpenAIEmbeddings()"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 8,
|
||||
"id": "30743664-38f5-425d-8216-772b64e7f348",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"fs = LocalFileStore(\"./cache/\")\n",
|
||||
"\n",
|
||||
"cached_embedder = CacheBackedEmbeddings.from_bytes_store(\n",
|
||||
" underlying_embeddings, fs, namespace=underlying_embeddings.model\n",
|
||||
")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "f8cdf33c-321d-4d2c-b76b-d6f5f8b42a92",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"The cache is empty prior to embedding"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 9,
|
||||
"id": "f9ad627f-ced2-4277-b336-2434f22f2c8a",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"[]"
|
||||
]
|
||||
},
|
||||
"execution_count": 9,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"list(fs.yield_keys())"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "a4effe04-b40f-42f8-a449-72fe6991cf20",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Load the document, split it into chunks, embed each chunk and load it into the vector store."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 10,
|
||||
"id": "cf958ac2-e60e-4668-b32c-8bb2d78b3c61",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"raw_documents = TextLoader(\"../state_of_the_union.txt\").load()\n",
|
||||
"text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)\n",
|
||||
"documents = text_splitter.split_documents(raw_documents)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "f526444b-93f8-423f-b6d1-dab539450921",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"create the vectorstore"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 11,
|
||||
"id": "3a1d7bb8-3b72-4bb5-9013-cf7729caca61",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"CPU times: user 608 ms, sys: 58.9 ms, total: 667 ms\n",
|
||||
"Wall time: 1.3 s\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"%%time\n",
|
||||
"db = FAISS.from_documents(documents, cached_embedder)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "64fc53f5-d559-467f-bf62-5daef32ffbc0",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"If we try to create the vectostore again, it'll be much faster since it does not need to re-compute any embeddings."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 12,
|
||||
"id": "714cb2e2-77ba-41a8-bb83-84e75342af2d",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"CPU times: user 33.6 ms, sys: 3.96 ms, total: 37.6 ms\n",
|
||||
"Wall time: 36.8 ms\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"%%time\n",
|
||||
"db2 = FAISS.from_documents(documents, cached_embedder)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "1acc76b9-9c70-4160-b593-5f932c75e2b6",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"And here are some of the embeddings that got created:"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 13,
|
||||
"id": "f2ca32dd-3712-4093-942b-4122f3dc8a8e",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"['text-embedding-ada-002614d7cf6-46f1-52fa-9d3a-740c39e7a20e',\n",
|
||||
" 'text-embedding-ada-0020fc1ede2-407a-5e14-8f8f-5642214263f5',\n",
|
||||
" 'text-embedding-ada-002e4ad20ef-dfaa-5916-9459-f90c6d8e8159',\n",
|
||||
" 'text-embedding-ada-002a5ef11e4-0474-5725-8d80-81c91943b37f',\n",
|
||||
" 'text-embedding-ada-00281426526-23fe-58be-9e84-6c7c72c8ca9a']"
|
||||
]
|
||||
},
|
||||
"execution_count": 13,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"list(fs.yield_keys())[:5]"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "564c9801-29f0-4452-aeac-527382e2c0e8",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## In Memory\n",
|
||||
"\n",
|
||||
"This section shows how to set up an in memory cache for embeddings. This type of cache is primarily \n",
|
||||
"useful for unit tests or prototyping. Do **not** use this cache if you need to actually store the embeddings."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 14,
|
||||
"id": "13bd1c5b-b7ba-4394-957c-7d5b5a841972",
|
||||
"metadata": {
|
||||
"tags": []
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"store = InMemoryStore()"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 15,
|
||||
"id": "9d99885f-99e1-498c-904d-6db539ac9466",
|
||||
"metadata": {
|
||||
"tags": []
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"underlying_embeddings = OpenAIEmbeddings()\n",
|
||||
"embedder = CacheBackedEmbeddings.from_bytes_store(\n",
|
||||
" underlying_embeddings, store, namespace=underlying_embeddings.model\n",
|
||||
")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 16,
|
||||
"id": "682eb5d4-0b7a-4dac-b8fb-3de4ca6e421c",
|
||||
"metadata": {
|
||||
"tags": []
|
||||
},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"CPU times: user 10.9 ms, sys: 916 µs, total: 11.8 ms\n",
|
||||
"Wall time: 159 ms\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"%%time\n",
|
||||
"embeddings = embedder.embed_documents([\"hello\", \"goodbye\"])"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "95233026-147f-49d1-bd87-e1e8b88ebdbc",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"The second time we try to embed the embedding time is only 2 ms because the embeddings are looked up in the cache."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 17,
|
||||
"id": "f819c3ff-a212-4d06-a5f7-5eb1435c1feb",
|
||||
"metadata": {
|
||||
"tags": []
|
||||
},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"CPU times: user 1.67 ms, sys: 342 µs, total: 2.01 ms\n",
|
||||
"Wall time: 2.01 ms\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"%%time\n",
|
||||
"embeddings_from_cache = embedder.embed_documents([\"hello\", \"goodbye\"])"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 18,
|
||||
"id": "ec38fb72-90a9-4687-a483-c62c87d1f4dd",
|
||||
"metadata": {
|
||||
"tags": []
|
||||
},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"True"
|
||||
]
|
||||
},
|
||||
"execution_count": 18,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"embeddings == embeddings_from_cache"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "f6cbe100-8587-4830-b207-fb8b524a9854",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## File system\n",
|
||||
"\n",
|
||||
"This section covers how to use a file system store."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 19,
|
||||
"id": "a0070271-0809-4528-97e0-2a88216846f3",
|
||||
"metadata": {
|
||||
"tags": []
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"fs = LocalFileStore(\"./test_cache/\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 20,
|
||||
"id": "0b20e9fe-f57f-4d7c-9f81-105c5f8726f4",
|
||||
"metadata": {
|
||||
"tags": []
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"embedder2 = CacheBackedEmbeddings.from_bytes_store(\n",
|
||||
" underlying_embeddings, fs, namespace=underlying_embeddings.model\n",
|
||||
")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 21,
|
||||
"id": "630515fd-bf5c-4d9c-a404-9705308f3a2c",
|
||||
"metadata": {
|
||||
"tags": []
|
||||
},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"CPU times: user 6.89 ms, sys: 4.89 ms, total: 11.8 ms\n",
|
||||
"Wall time: 184 ms\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"%%time\n",
|
||||
"embeddings = embedder2.embed_documents([\"hello\", \"goodbye\"])"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 22,
|
||||
"id": "30e6bb87-42c9-4d08-88ac-0d22c9c449a1",
|
||||
"metadata": {
|
||||
"tags": []
|
||||
},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"CPU times: user 0 ns, sys: 3.24 ms, total: 3.24 ms\n",
|
||||
"Wall time: 2.84 ms\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"%%time\n",
|
||||
"embeddings = embedder2.embed_documents([\"hello\", \"goodbye\"])"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "12ed5a45-8352-4e0f-8583-5537397f53c0",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Here are the embeddings that have been persisted to the directory `./test_cache`. \n",
|
||||
"\n",
|
||||
"Notice that the embedder takes a namespace parameter."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 23,
|
||||
"id": "658e2914-05e9-44a3-a8fe-3fe17ca84039",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"['text-embedding-ada-002e885db5b-c0bd-5fbc-88b1-4d1da6020aa5',\n",
|
||||
" 'text-embedding-ada-0026ba52e44-59c9-5cc9-a084-284061b13c80']"
|
||||
]
|
||||
},
|
||||
"execution_count": 23,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"list(fs.yield_keys())"
|
||||
]
|
||||
},
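{
"cell_type": "markdown",
"id": "namespace-collision-note",
"metadata": {},
"source": [
"A minimal sketch of why the namespace matters: the same text cached under two different namespaces yields two distinct keys, so different models never read each other's vectors. The `model-a` / `model-b` names below are hypothetical, purely for illustration."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "namespace-collision-sketch",
"metadata": {},
"outputs": [],
"source": [
"# Sketch: each cache key is the namespace followed by a deterministic ID of\n",
"# the text, so the two hypothetical namespaces below can never collide.\n",
"embedder_a = CacheBackedEmbeddings.from_bytes_store(\n",
"    underlying_embeddings, fs, namespace=\"model-a\"  # hypothetical namespace\n",
")\n",
"embedder_b = CacheBackedEmbeddings.from_bytes_store(\n",
"    underlying_embeddings, fs, namespace=\"model-b\"  # hypothetical namespace\n",
")\n",
"embedder_a.embed_documents([\"hello\"])\n",
"embedder_b.embed_documents([\"hello\"])\n",
"[k for k in fs.yield_keys() if k.startswith(\"model-\")]  # two distinct keys for the same text"
]
},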
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "cd5f5a96-6ffa-429d-aa82-00b3f6532871",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Redis Store\n",
|
||||
"\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 24,
|
||||
"id": "4879c134-141f-48a0-acfe-7d6f30253af0",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langchain.storage import RedisStore"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 25,
|
||||
"id": "8b2bb9a0-6549-4487-8532-29ab4ab7336f",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# For cache isolation can use a separate DB\n",
|
||||
"# Or additional namepace\n",
|
||||
"store = RedisStore(redis_url=\"redis://localhost:6379\", client_kwargs={'db': 2}, namespace='embedding_caches')\n",
|
||||
"\n",
|
||||
"underlying_embeddings = OpenAIEmbeddings()\n",
|
||||
"embedder = CacheBackedEmbeddings.from_bytes_store(\n",
|
||||
" underlying_embeddings, store, namespace=underlying_embeddings.model\n",
|
||||
")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 26,
|
||||
"id": "eca3cb99-2bb3-49d5-81f9-1dee03da4b8c",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"CPU times: user 3.99 ms, sys: 0 ns, total: 3.99 ms\n",
|
||||
"Wall time: 3.5 ms\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"%%time\n",
|
||||
"embeddings = embedder.embed_documents([\"hello\", \"goodbye\"])"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 27,
|
||||
"id": "317ba5d8-89f9-462c-b807-ad4ef26e518b",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"CPU times: user 2.47 ms, sys: 767 µs, total: 3.24 ms\n",
|
||||
"Wall time: 2.75 ms\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"%%time\n",
|
||||
"embeddings = embedder.embed_documents([\"hello\", \"goodbye\"])"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 16,
|
||||
"id": "8a540317-5142-4491-9062-a097932b56e3",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"['text-embedding-ada-002e885db5b-c0bd-5fbc-88b1-4d1da6020aa5',\n",
|
||||
" 'text-embedding-ada-0026ba52e44-59c9-5cc9-a084-284061b13c80']"
|
||||
]
|
||||
},
|
||||
"execution_count": 16,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"list(store.yield_keys())"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 17,
|
||||
"id": "cd9b0d4a-f816-4dce-9dde-cde1ad9a65fb",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"[b'embedding_caches/text-embedding-ada-002e885db5b-c0bd-5fbc-88b1-4d1da6020aa5',\n",
|
||||
" b'embedding_caches/text-embedding-ada-0026ba52e44-59c9-5cc9-a084-284061b13c80']"
|
||||
]
|
||||
},
|
||||
"execution_count": 17,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"list(store.client.scan_iter())"
|
||||
]
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
"kernelspec": {
|
||||
"display_name": "Python 3 (ipykernel)",
|
||||
"language": "python",
|
||||
"name": "python3"
|
||||
},
|
||||
"language_info": {
|
||||
"codemirror_mode": {
|
||||
"name": "ipython",
|
||||
"version": 3
|
||||
},
|
||||
"file_extension": ".py",
|
||||
"mimetype": "text/x-python",
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.9.10"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
"nbformat_minor": 5
|
||||
}
|
||||
@@ -0,0 +1,440 @@
|
||||
{
|
||||
"cells": [
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "34883374",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# Parent Document Retriever\n",
|
||||
"\n",
|
||||
"When splitting documents for retrieval, there are often conflicting desires:\n",
|
||||
"\n",
|
||||
"1. You may want to have small documents, so that their embeddings can most\n",
|
||||
" accurately reflect their meaning. If too long, then the embeddings can\n",
|
||||
" lose meaning.\n",
|
||||
"2. You want to have long enough documents that the context of each chunk is\n",
|
||||
" retained.\n",
|
||||
"\n",
|
||||
"The ParentDocumentRetriever strikes that balance by splitting and storing\n",
|
||||
"small chunks of data. During retrieval, it first fetches the small chunks\n",
|
||||
"but then looks up the parent ids for those chunks and returns those larger\n",
|
||||
"documents.\n",
|
||||
"\n",
|
||||
"Note that \"parent document\" refers to the document that a small chunk\n",
|
||||
"originated from. This can either be the whole raw document OR a larger\n",
|
||||
"chunk."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 1,
|
||||
"id": "8b6e74b2",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langchain.retrievers import ParentDocumentRetriever"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 2,
|
||||
"id": "1d17af96",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langchain.vectorstores import Chroma\n",
|
||||
"from langchain.embeddings import OpenAIEmbeddings\n",
|
||||
"from langchain.text_splitter import RecursiveCharacterTextSplitter\n",
|
||||
"from langchain.storage import InMemoryStore\n",
|
||||
"from langchain.document_loaders import TextLoader"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 3,
|
||||
"id": "604ff981",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"loaders = [\n",
|
||||
" TextLoader('../../paul_graham_essay.txt'),\n",
|
||||
" TextLoader('../../state_of_the_union.txt'),\n",
|
||||
"]\n",
|
||||
"docs = []\n",
|
||||
"for l in loaders:\n",
|
||||
" docs.extend(l.load())"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "d3943f72",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Retrieving Full Documents\n",
|
||||
"\n",
|
||||
"In this mode, we want to retrieve the full documents. Therefor, we only specify a child splitter."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 4,
|
||||
"id": "1a8b2e5f",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# This text splitter is used to create the child documents\n",
|
||||
"\n",
|
||||
"child_splitter = RecursiveCharacterTextSplitter(chunk_size=400)\n",
|
||||
"# The vectorstore to use to index the child chunks\n",
|
||||
"vectorstore = Chroma(\n",
|
||||
" collection_name=\"full_documents\",\n",
|
||||
" embedding_function=OpenAIEmbeddings()\n",
|
||||
")\n",
|
||||
"# The storage layer for the parent documents\n",
|
||||
"store = InMemoryStore()\n",
|
||||
"retriever = ParentDocumentRetriever(\n",
|
||||
" vectorstore=vectorstore, \n",
|
||||
" docstore=store, \n",
|
||||
" child_splitter=child_splitter,\n",
|
||||
")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 5,
|
||||
"id": "2b107935",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"retriever.add_documents(docs)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "d05b97b7",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"This should yield two keys, because we added two documents."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 6,
|
||||
"id": "30e3812b",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"['05fe8d8a-bf60-4f87-b576-4351b23df266',\n",
|
||||
" '571cc9e5-9ef7-4f6c-b800-835c83a1858b']"
|
||||
]
|
||||
},
|
||||
"execution_count": 6,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"list(store.yield_keys())"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "f895d62b",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Let's now call the vectorstore search functionality - we should see that it returns small chunks (since we're storing the small chunks"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 7,
|
||||
"id": "b261c02c",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"sub_docs = vectorstore.similarity_search(\"justice breyer\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 8,
|
||||
"id": "5108222f",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"Tonight, I’d like to honor someone who has dedicated his life to serve this country: Justice Stephen Breyer—an Army veteran, Constitutional scholar, and retiring Justice of the United States Supreme Court. Justice Breyer, thank you for your service. \n",
|
||||
"\n",
|
||||
"One of the most serious constitutional responsibilities a President has is nominating someone to serve on the United States Supreme Court.\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"print(sub_docs[0].page_content)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "bda8ed5a",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Let's now retrieve from the overall retriever. This should return large documents - since it returns the documents where the smaller chunks are located."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 9,
|
||||
"id": "419a91c4",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"retrieved_docs = retriever.get_relevant_documents(\"justice breyer\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 10,
|
||||
"id": "cf10d250",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"38539"
|
||||
]
|
||||
},
|
||||
"execution_count": 10,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"len(retrieved_docs[0].page_content)"
|
||||
]
|
||||
},
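{
"cell_type": "markdown",
"id": "parent-lookup-note",
"metadata": {},
"source": [
"Under the hood this is a two-step lookup: a vector search over the small chunks, then a batch fetch of their parents from the docstore. Below is a minimal sketch, not the actual implementation; it assumes the retriever's default `id_key` of `\"doc_id\"`, so verify that key if you have customized it."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "parent-lookup-sketch",
"metadata": {},
"outputs": [],
"source": [
"# Illustrative sketch, not the real source: search the child chunks,\n",
"# collect their parent ids, then fetch the parents from the docstore.\n",
"sub_docs = vectorstore.similarity_search(\"justice breyer\")\n",
"parent_ids = {d.metadata[\"doc_id\"] for d in sub_docs}  # assumes the default id_key\n",
"parents = store.mget(list(parent_ids))\n",
"len(parents[0].page_content)  # one of the same full documents as above"
]
},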
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "14f813a5",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Retrieving Larger Chunks\n",
|
||||
"\n",
|
||||
"Sometimes, the full documents can be too big to want to retrieve them as is. In that case, what we really want to do is to first split the raw documents into larger chunks, and then split it into smaller chunks. We then index the smaller chunks, but on retrieval we retrieve the larger chunks (but still not the full documents)."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 11,
|
||||
"id": "b6f9a4f0",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# This text splitter is used to create the parent documents\n",
|
||||
"parent_splitter = RecursiveCharacterTextSplitter(chunk_size=2000)\n",
|
||||
"# This text splitter is used to create the child documents\n",
|
||||
"# It should create documents smaller than the parent\n",
|
||||
"child_splitter = RecursiveCharacterTextSplitter(chunk_size=400)\n",
|
||||
"# The vectorstore to use to index the child chunks\n",
|
||||
"vectorstore = Chroma(collection_name=\"split_parents\", embedding_function=OpenAIEmbeddings())\n",
|
||||
"# The storage layer for the parent documents\n",
|
||||
"store = InMemoryStore()"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 12,
|
||||
"id": "19478ff3",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"retriever = ParentDocumentRetriever(\n",
|
||||
" vectorstore=vectorstore, \n",
|
||||
" docstore=store, \n",
|
||||
" child_splitter=child_splitter,\n",
|
||||
" parent_splitter=parent_splitter,\n",
|
||||
")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 13,
|
||||
"id": "fe16e620",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"retriever.add_documents(docs)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "64ad3c8c",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"We can see that there are much more than two documents now - these are the larger chunks"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 14,
|
||||
"id": "24d81886",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"66"
|
||||
]
|
||||
},
|
||||
"execution_count": 14,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"len(list(store.yield_keys()))"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "baaef673",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Let's make sure the underlying vectorstore still retrieves the small chunks."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 15,
|
||||
"id": "b1c859de",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"sub_docs = vectorstore.similarity_search(\"justice breyer\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 16,
|
||||
"id": "6fffa2eb",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"Tonight, I’d like to honor someone who has dedicated his life to serve this country: Justice Stephen Breyer—an Army veteran, Constitutional scholar, and retiring Justice of the United States Supreme Court. Justice Breyer, thank you for your service. \n",
|
||||
"\n",
|
||||
"One of the most serious constitutional responsibilities a President has is nominating someone to serve on the United States Supreme Court.\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"print(sub_docs[0].page_content)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 17,
|
||||
"id": "3a3202df",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"retrieved_docs = retriever.get_relevant_documents(\"justice breyer\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 18,
|
||||
"id": "684fdb2c",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"1849"
|
||||
]
|
||||
},
|
||||
"execution_count": 18,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"len(retrieved_docs[0].page_content)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 19,
|
||||
"id": "9f17f662",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"In state after state, new laws have been passed, not only to suppress the vote, but to subvert entire elections. \n",
|
||||
"\n",
|
||||
"We cannot let this happen. \n",
|
||||
"\n",
|
||||
"Tonight. I call on the Senate to: Pass the Freedom to Vote Act. Pass the John Lewis Voting Rights Act. And while you’re at it, pass the Disclose Act so Americans can know who is funding our elections. \n",
|
||||
"\n",
|
||||
"Tonight, I’d like to honor someone who has dedicated his life to serve this country: Justice Stephen Breyer—an Army veteran, Constitutional scholar, and retiring Justice of the United States Supreme Court. Justice Breyer, thank you for your service. \n",
|
||||
"\n",
|
||||
"One of the most serious constitutional responsibilities a President has is nominating someone to serve on the United States Supreme Court. \n",
|
||||
"\n",
|
||||
"And I did that 4 days ago, when I nominated Circuit Court of Appeals Judge Ketanji Brown Jackson. One of our nation’s top legal minds, who will continue Justice Breyer’s legacy of excellence. \n",
|
||||
"\n",
|
||||
"A former top litigator in private practice. A former federal public defender. And from a family of public school educators and police officers. A consensus builder. Since she’s been nominated, she’s received a broad range of support—from the Fraternal Order of Police to former judges appointed by Democrats and Republicans. \n",
|
||||
"\n",
|
||||
"And if we are to advance liberty and justice, we need to secure the Border and fix the immigration system. \n",
|
||||
"\n",
|
||||
"We can do both. At our border, we’ve installed new technology like cutting-edge scanners to better detect drug smuggling. \n",
|
||||
"\n",
|
||||
"We’ve set up joint patrols with Mexico and Guatemala to catch more human traffickers. \n",
|
||||
"\n",
|
||||
"We’re putting in place dedicated immigration judges so families fleeing persecution and violence can have their cases heard faster. \n",
|
||||
"\n",
|
||||
"We’re securing commitments and supporting partners in South and Central America to host more refugees and secure their own borders.\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"print(retrieved_docs[0].page_content)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "facfdacb",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": []
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
"kernelspec": {
|
||||
"display_name": "Python 3 (ipykernel)",
|
||||
"language": "python",
|
||||
"name": "python3"
|
||||
},
|
||||
"language_info": {
|
||||
"codemirror_mode": {
|
||||
"name": "ipython",
|
||||
"version": 3
|
||||
},
|
||||
"file_extension": ".py",
|
||||
"mimetype": "text/x-python",
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.10.1"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
"nbformat_minor": 5
|
||||
}
|
||||
423
docs/extras/use_cases/apis.ipynb
Normal file
@@ -0,0 +1,423 @@
|
||||
{
|
||||
"cells": [
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "a15e6a18",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# Interacting with APIs\n",
|
||||
"\n",
|
||||
"[](https://colab.research.google.com/github/langchain-ai/langchain/blob/master/docs/extras/use_cases/apis.ipynb)\n",
|
||||
"\n",
|
||||
"## Use case \n",
|
||||
"\n",
|
||||
"Suppose you want an LLM to interact with external APIs.\n",
|
||||
"\n",
|
||||
"This can be very useful for retrieving context for the LLM to utilize.\n",
|
||||
"\n",
|
||||
"And, more generally, it allows us to interact with APIs using natural langugage! \n",
|
||||
" \n",
|
||||
"\n",
|
||||
"## Overview\n",
|
||||
"\n",
|
||||
"There are two primary ways to interface LLMs with external APIs:\n",
|
||||
" \n",
|
||||
"* `Functions`: For example, [OpenAI functions](https://platform.openai.com/docs/guides/gpt/function-calling) is one popular means of doing this.\n",
|
||||
"* `LLM-generated interface`: Use an LLM with access to API documentation to create an interface.\n",
|
||||
"\n",
|
||||
""
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "abbd82f0",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Quickstart \n",
|
||||
"\n",
|
||||
"Many APIs already are compatible with OpenAI function calling.\n",
|
||||
"\n",
|
||||
"For example, [Klarna](https://www.klarna.com/international/press/klarna-brings-smoooth-shopping-to-chatgpt/) has a YAML file that describes its API and allows OpenAI to interact with it:\n",
|
||||
"\n",
|
||||
"```\n",
|
||||
"https://www.klarna.com/us/shopping/public/openai/v0/api-docs/\n",
|
||||
"```\n",
|
||||
"\n",
|
||||
"Other options include:\n",
|
||||
"\n",
|
||||
"* [Speak](https://api.speak.com/openapi.yaml) for translation\n",
|
||||
"* [XKCD](https://gist.githubusercontent.com/roaldnefs/053e505b2b7a807290908fe9aa3e1f00/raw/0a212622ebfef501163f91e23803552411ed00e4/openapi.yaml) for comics\n",
|
||||
"\n",
|
||||
"We can supply the specification to `get_openapi_chain` directly in order to query the API with OpenAI functions:"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "5a218fcc",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"pip install langchain openai \n",
|
||||
"\n",
|
||||
"# Set env var OPENAI_API_KEY or load from a .env file:\n",
|
||||
"# import dotenv\n",
|
||||
"# dotenv.load_env()"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 2,
|
||||
"id": "30b780e3",
|
||||
"metadata": {
|
||||
"scrolled": false
|
||||
},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stderr",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"Attempting to load an OpenAPI 3.0.1 spec. This may result in degraded performance. Convert your OpenAPI spec to 3.1.* spec for better support.\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"{'query': \"What are some options for a men's large blue button down shirt\",\n",
|
||||
" 'response': {'products': [{'name': 'Cubavera Four Pocket Guayabera Shirt',\n",
|
||||
" 'url': 'https://www.klarna.com/us/shopping/pl/cl10001/3202055522/Clothing/Cubavera-Four-Pocket-Guayabera-Shirt/?utm_source=openai&ref-site=openai_plugin',\n",
|
||||
" 'price': '$13.50',\n",
|
||||
" 'attributes': ['Material:Polyester,Cotton',\n",
|
||||
" 'Target Group:Man',\n",
|
||||
" 'Color:Red,White,Blue,Black',\n",
|
||||
" 'Properties:Pockets',\n",
|
||||
" 'Pattern:Solid Color',\n",
|
||||
" 'Size (Small-Large):S,XL,L,M,XXL']},\n",
|
||||
" {'name': 'Polo Ralph Lauren Plaid Short Sleeve Button-down Oxford Shirt',\n",
|
||||
" 'url': 'https://www.klarna.com/us/shopping/pl/cl10001/3207163438/Clothing/Polo-Ralph-Lauren-Plaid-Short-Sleeve-Button-down-Oxford-Shirt/?utm_source=openai&ref-site=openai_plugin',\n",
|
||||
" 'price': '$52.20',\n",
|
||||
" 'attributes': ['Material:Cotton',\n",
|
||||
" 'Target Group:Man',\n",
|
||||
" 'Color:Red,Blue,Multicolor',\n",
|
||||
" 'Size (Small-Large):S,XL,L,M,XXL']},\n",
|
||||
" {'name': 'Brixton Bowery Flannel Shirt',\n",
|
||||
" 'url': 'https://www.klarna.com/us/shopping/pl/cl10001/3202331096/Clothing/Brixton-Bowery-Flannel-Shirt/?utm_source=openai&ref-site=openai_plugin',\n",
|
||||
" 'price': '$27.48',\n",
|
||||
" 'attributes': ['Material:Cotton',\n",
|
||||
" 'Target Group:Man',\n",
|
||||
" 'Color:Gray,Blue,Black,Orange',\n",
|
||||
" 'Properties:Pockets',\n",
|
||||
" 'Pattern:Checkered',\n",
|
||||
" 'Size (Small-Large):XL,3XL,4XL,5XL,L,M,XXL']},\n",
|
||||
" {'name': 'Vineyard Vines Gingham On-The-Go brrr Classic Fit Shirt Crystal',\n",
|
||||
" 'url': 'https://www.klarna.com/us/shopping/pl/cl10001/3201938510/Clothing/Vineyard-Vines-Gingham-On-The-Go-brrr-Classic-Fit-Shirt-Crystal/?utm_source=openai&ref-site=openai_plugin',\n",
|
||||
" 'price': '$80.64',\n",
|
||||
" 'attributes': ['Material:Cotton',\n",
|
||||
" 'Target Group:Man',\n",
|
||||
" 'Color:Blue',\n",
|
||||
" 'Size (Small-Large):XL,XS,L,M']},\n",
|
||||
" {'name': \"Carhartt Men's Loose Fit Midweight Short Sleeve Plaid Shirt\",\n",
|
||||
" 'url': 'https://www.klarna.com/us/shopping/pl/cl10001/3201826024/Clothing/Carhartt-Men-s-Loose-Fit-Midweight-Short-Sleeve-Plaid-Shirt/?utm_source=openai&ref-site=openai_plugin',\n",
|
||||
" 'price': '$17.99',\n",
|
||||
" 'attributes': ['Material:Cotton',\n",
|
||||
" 'Target Group:Man',\n",
|
||||
" 'Color:Red,Brown,Blue,Green',\n",
|
||||
" 'Properties:Pockets',\n",
|
||||
" 'Pattern:Checkered',\n",
|
||||
" 'Size (Small-Large):S,XL,L,M']}]}}"
|
||||
]
|
||||
},
|
||||
"execution_count": 2,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"from langchain.chains.openai_functions.openapi import get_openapi_chain\n",
|
||||
"chain = get_openapi_chain(\"https://www.klarna.com/us/shopping/public/openai/v0/api-docs/\")\n",
|
||||
"chain(\"What are some options for a men's large blue button down shirt\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "9162c91c",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Functions \n",
|
||||
"\n",
|
||||
"We can unpack what is hapening when we use the funtions to calls external APIs.\n",
|
||||
"\n",
|
||||
"Let's look at the [LangSmith trace](https://smith.langchain.com/public/76a58b85-193f-4eb7-ba40-747f0d5dd56e/r):\n",
|
||||
"\n",
|
||||
"* See [here](https://github.com/langchain-ai/langchain/blob/7fc07ba5df99b9fa8bef837b0fafa220bc5c932c/libs/langchain/langchain/chains/openai_functions/openapi.py#L279C9-L279C19) that we call the OpenAI LLM with the provided API spec:\n",
|
||||
"\n",
|
||||
"```\n",
|
||||
"https://www.klarna.com/us/shopping/public/openai/v0/api-docs/\n",
|
||||
"```\n",
|
||||
"\n",
|
||||
"* The prompt then tells the LLM to use the API spec wiith input question:\n",
|
||||
"\n",
|
||||
"```\n",
|
||||
"Use the provided API's to respond to this user query:\n",
|
||||
"What are some options for a men's large blue button down shirt\n",
|
||||
"```\n",
|
||||
"\n",
|
||||
"* The LLM returns the parameters for the function call `productsUsingGET`, which is [specified in the provided API spec](https://www.klarna.com/us/shopping/public/openai/v0/api-docs/):\n",
|
||||
"```\n",
|
||||
"function_call:\n",
|
||||
" name: productsUsingGET\n",
|
||||
" arguments: |-\n",
|
||||
" {\n",
|
||||
" \"params\": {\n",
|
||||
" \"countryCode\": \"US\",\n",
|
||||
" \"q\": \"men's large blue button down shirt\",\n",
|
||||
" \"size\": 5,\n",
|
||||
" \"min_price\": 0,\n",
|
||||
" \"max_price\": 100\n",
|
||||
" }\n",
|
||||
" }\n",
|
||||
" ```\n",
|
||||
" \n",
|
||||
"\n",
|
||||
" \n",
|
||||
"* This `Dict` above split and the [API is called here](https://github.com/langchain-ai/langchain/blob/7fc07ba5df99b9fa8bef837b0fafa220bc5c932c/libs/langchain/langchain/chains/openai_functions/openapi.py#L215)."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "1fe49a0d",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## API Chain \n",
|
||||
"\n",
|
||||
"We can also build our own interface to external APIs using the `APIChain` and provided API documentation."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 8,
|
||||
"id": "4ef0c3d0",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"\n",
|
||||
"\n",
|
||||
"\u001b[1m> Entering new APIChain chain...\u001b[0m\n",
|
||||
"\u001b[32;1m\u001b[1;3mhttps://api.open-meteo.com/v1/forecast?latitude=48.1351&longitude=11.5820&hourly=temperature_2m&temperature_unit=fahrenheit¤t_weather=true\u001b[0m\n",
|
||||
"\u001b[33;1m\u001b[1;3m{\"latitude\":48.14,\"longitude\":11.58,\"generationtime_ms\":1.0769367218017578,\"utc_offset_seconds\":0,\"timezone\":\"GMT\",\"timezone_abbreviation\":\"GMT\",\"elevation\":521.0,\"current_weather\":{\"temperature\":52.9,\"windspeed\":12.6,\"winddirection\":239.0,\"weathercode\":3,\"is_day\":0,\"time\":\"2023-08-07T22:00\"},\"hourly_units\":{\"time\":\"iso8601\",\"temperature_2m\":\"°F\"},\"hourly\":{\"time\":[\"2023-08-07T00:00\",\"2023-08-07T01:00\",\"2023-08-07T02:00\",\"2023-08-07T03:00\",\"2023-08-07T04:00\",\"2023-08-07T05:00\",\"2023-08-07T06:00\",\"2023-08-07T07:00\",\"2023-08-07T08:00\",\"2023-08-07T09:00\",\"2023-08-07T10:00\",\"2023-08-07T11:00\",\"2023-08-07T12:00\",\"2023-08-07T13:00\",\"2023-08-07T14:00\",\"2023-08-07T15:00\",\"2023-08-07T16:00\",\"2023-08-07T17:00\",\"2023-08-07T18:00\",\"2023-08-07T19:00\",\"2023-08-07T20:00\",\"2023-08-07T21:00\",\"2023-08-07T22:00\",\"2023-08-07T23:00\",\"2023-08-08T00:00\",\"2023-08-08T01:00\",\"2023-08-08T02:00\",\"2023-08-08T03:00\",\"2023-08-08T04:00\",\"2023-08-08T05:00\",\"2023-08-08T06:00\",\"2023-08-08T07:00\",\"2023-08-08T08:00\",\"2023-08-08T09:00\",\"2023-08-08T10:00\",\"2023-08-08T11:00\",\"2023-08-08T12:00\",\"2023-08-08T13:00\",\"2023-08-08T14:00\",\"2023-08-08T15:00\",\"2023-08-08T16:00\",\"2023-08-08T17:00\",\"2023-08-08T18:00\",\"2023-08-08T19:00\",\"2023-08-08T20:00\",\"2023-08-08T21:00\",\"2023-08-08T22:00\",\"2023-08-08T23:00\",\"2023-08-09T00:00\",\"2023-08-09T01:00\",\"2023-08-09T02:00\",\"2023-08-09T03:00\",\"2023-08-09T04:00\",\"2023-08-09T05:00\",\"2023-08-09T06:00\",\"2023-08-09T07:00\",\"2023-08-09T08:00\",\"2023-08-09T09:00\",\"2023-08-09T10:00\",\"2023-08-09T11:00\",\"2023-08-09T12:00\",\"2023-08-09T13:00\",\"2023-08-09T14:00\",\"2023-08-09T15:00\",\"2023-08-09T16:00\",\"2023-08-09T17:00\",\"2023-08-09T18:00\",\"2023-08-09T19:00\",\"2023-08-09T20:00\",\"2023-08-09T21:00\",\"2023-08-09T22:00\",\"2023-08-09T23:00\",\"2023-08-10T00:00\",\"2023-08-10T01:00\",\"2023-08-10T02:00\",\"2023-08-10T03:00\",\"2023-08-10T04:00\",\"2023-08-10T05:00\",\"2023-08-10T06:00\",\"2023-08-10T07:00\",\"2023-08-10T08:00\",\"2023-08-10T09:00\",\"2023-08-10T10:00\",\"2023-08-10T11:00\",\"2023-08-10T12:00\",\"2023-08-10T13:00\",\"2023-08-10T14:00\",\"2023-08-10T15:00\",\"2023-08-10T16:00\",\"2023-08-10T17:00\",\"2023-08-10T18:00\",\"2023-08-10T19:00\",\"2023-08-10T20:00\",\"2023-08-10T21:00\",\"2023-08-10T22:00\",\"2023-08-10T23:00\",\"2023-08-11T00:00\",\"2023-08-11T01:00\",\"2023-08-11T02:00\",\"2023-08-11T03:00\",\"2023-08-11T04:00\",\"2023-08-11T05:00\",\"2023-08-11T06:00\",\"2023-08-11T07:00\",\"2023-08-11T08:00\",\"2023-08-11T09:00\",\"2023-08-11T10:00\",\"2023-08-11T11:00\",\"2023-08-11T12:00\",\"2023-08-11T13:00\",\"2023-08-11T14:00\",\"2023-08-11T15:00\",\"2023-08-11T16:00\",\"2023-08-11T17:00\",\"2023-08-11T18:00\",\"2023-08-11T19:00\",\"2023-08-11T20:00\",\"2023-08-11T21:00\",\"2023-08-11T22:00\",\"2023-08-11T23:00\",\"2023-08-12T00:00\",\"2023-08-12T01:00\",\"2023-08-12T02:00\",\"2023-08-12T03:00\",\"2023-08-12T04:00\",\"2023-08-12T05:00\",\"2023-08-12T06:00\",\"2023-08-12T07:00\",\"2023-08-12T08:00\",\"2023-08-12T09:00\",\"2023-08-12T10:00\",\"2023-08-12T11:00\",\"2023-08-12T12:00\",\"2023-08-12T13:00\",\"2023-08-12T14:00\",\"2023-08-12T15:00\",\"2023-08-12T16:00\",\"2023-08-12T17:00\",\"2023-08-12T18:00\",\"2023-08-12T19:00\",\"2023-08-12T20:00\",\"2023-08-12T21:00\",\"2023-08-12T22:00\",\"2023-08-12T23:00\",\"2023-08-13T00:00\",\"2023-08-13T01:00\",\"2023-08-13T02:00\",\"2023-08-13T03:00\",\"2023-
08-13T04:00\",\"2023-08-13T05:00\",\"2023-08-13T06:00\",\"2023-08-13T07:00\",\"2023-08-13T08:00\",\"2023-08-13T09:00\",\"2023-08-13T10:00\",\"2023-08-13T11:00\",\"2023-08-13T12:00\",\"2023-08-13T13:00\",\"2023-08-13T14:00\",\"2023-08-13T15:00\",\"2023-08-13T16:00\",\"2023-08-13T17:00\",\"2023-08-13T18:00\",\"2023-08-13T19:00\",\"2023-08-13T20:00\",\"2023-08-13T21:00\",\"2023-08-13T22:00\",\"2023-08-13T23:00\"],\"temperature_2m\":[53.0,51.2,50.9,50.4,50.7,51.3,51.7,52.9,54.3,56.1,57.4,59.3,59.1,60.7,59.7,58.8,58.8,57.8,56.6,55.3,53.9,52.7,52.9,53.2,52.0,51.8,51.3,50.7,50.8,51.5,53.9,57.7,61.2,63.2,64.7,66.6,67.5,67.0,68.7,68.7,67.9,66.2,64.4,61.4,59.8,58.9,57.9,56.3,55.7,55.3,55.5,55.4,55.7,56.5,57.6,58.8,59.7,59.1,58.9,60.6,59.9,59.8,59.9,61.7,63.2,63.6,62.3,58.9,57.3,57.1,57.0,56.5,56.2,56.0,55.3,54.7,54.4,55.2,57.8,60.7,63.0,65.3,66.9,68.2,70.1,72.1,72.6,71.4,69.7,68.6,66.2,63.6,61.8,60.6,59.6,58.9,58.0,57.1,56.3,56.2,56.7,57.9,59.9,63.7,68.4,72.4,75.0,76.8,78.0,78.7,78.9,78.4,76.9,74.8,72.5,70.1,67.6,65.6,64.4,63.9,63.4,62.7,62.2,62.1,62.5,63.4,65.1,68.0,71.7,74.8,76.8,78.2,79.1,79.6,79.7,79.2,77.6,75.3,73.7,68.6,66.8,65.3,64.2,63.4,62.6,61.7,60.9,60.6,60.9,61.6,63.2,65.9,69.3,72.2,74.4,76.2,77.6,78.8,79.6,79.6,78.4,76.4,74.3,72.3,70.4,68.7,67.6,66.8]}}\u001b[0m\n",
|
||||
"\n",
|
||||
"\u001b[1m> Finished chain.\u001b[0m\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"' The current temperature in Munich, Germany is 52.9°F.'"
|
||||
]
|
||||
},
|
||||
"execution_count": 8,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"from langchain.llms import OpenAI\n",
|
||||
"from langchain.chains import APIChain\n",
|
||||
"from langchain.chains.api import open_meteo_docs\n",
|
||||
"llm = OpenAI(temperature=0)\n",
|
||||
"chain = APIChain.from_llm_and_api_docs(llm, open_meteo_docs.OPEN_METEO_DOCS, verbose=True)\n",
|
||||
"chain.run('What is the weather like right now in Munich, Germany in degrees Fahrenheit?')"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "5b179318",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Note that we supply information about the API:"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 37,
|
||||
"id": "a9e03cc2",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"'BASE URL: https://api.open-meteo.com/\\n\\nAPI Documentation\\nThe API endpoint /v1/forecast accepts a geographical coordinate, a list of weather variables and responds with a JSON hourly weather forecast for 7 days. Time always starts at 0:00 today and contains 168 hours. All URL parameters are listed below:\\n\\nParameter\\tFormat\\tRequired\\tDefault\\tDescription\\nlatitude, longitude\\tFloating point\\tYes\\t\\tGeographical WGS84 coordinate of the location\\nhourly\\tString array\\tNo\\t\\tA list of weather variables which shou'"
|
||||
]
|
||||
},
|
||||
"execution_count": 37,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"open_meteo_docs.OPEN_METEO_DOCS[0:500]"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "3fab7930",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Under the hood, we do two things:\n",
|
||||
" \n",
|
||||
"* `api_request_chain`: Generate an API URL based on the input question and the api_docs\n",
|
||||
"* `api_answer_chain`: generate a final answer based on the API response\n",
|
||||
"\n",
|
||||
"We can look at the [LangSmith trace](https://smith.langchain.com/public/1e0d18ca-0d76-444c-97df-a939a6a815a7/r) to inspect this:\n",
|
||||
"\n",
|
||||
"* The `api_request_chain` produces the API url from our question and the API documentation:\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"* [Here](https://github.com/langchain-ai/langchain/blob/bbd22b9b761389a5e40fc45b0570e1830aabb707/libs/langchain/langchain/chains/api/base.py#L82) we make the API request with the API url.\n",
|
||||
"* The `api_answer_chain` takes the response from the API and provides us with a natural langugae response:\n",
|
||||
"\n",
|
||||
""
|
||||
]
|
||||
},
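{
"cell_type": "markdown",
"id": "subchain-inspection-note",
"metadata": {},
"source": [
"As a quick sanity check, we can inspect the two sub-chains directly. This is a minimal sketch; it assumes the `chain` built above with `APIChain.from_llm_and_api_docs` is still in scope and that the sub-chains are exposed as the `api_request_chain` and `api_answer_chain` attributes."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "subchain-inspection-sketch",
"metadata": {},
"outputs": [],
"source": [
"# Peek at the prompt each sub-chain uses (truncated for readability).\n",
"print(chain.api_request_chain.prompt.template[:300])  # URL-generation prompt\n",
"print(chain.api_answer_chain.prompt.template[:300])  # answer-synthesis prompt"
]
},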
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "2511f446",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Going deeper\n",
|
||||
"\n",
|
||||
"**Test with other APIs**"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "1e1cf418",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"import os\n",
|
||||
"os.environ['TMDB_BEARER_TOKEN'] = \"\"\n",
|
||||
"from langchain.chains.api import tmdb_docs\n",
|
||||
"headers = {\"Authorization\": f\"Bearer {os.environ['TMDB_BEARER_TOKEN']}\"}\n",
|
||||
"chain = APIChain.from_llm_and_api_docs(llm, tmdb_docs.TMDB_DOCS, headers=headers, verbose=True)\n",
|
||||
"chain.run(\"Search for 'Avatar'\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "dd80a717",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"import os\n",
|
||||
"from langchain.llms import OpenAI\n",
|
||||
"from langchain.chains.api import podcast_docs\n",
|
||||
"from langchain.chains import APIChain\n",
|
||||
" \n",
|
||||
"listen_api_key = 'xxx' # Get api key here: https://www.listennotes.com/api/pricing/\n",
|
||||
"llm = OpenAI(temperature=0)\n",
|
||||
"headers = {\"X-ListenAPI-Key\": listen_api_key}\n",
|
||||
"chain = APIChain.from_llm_and_api_docs(llm, podcast_docs.PODCAST_DOCS, headers=headers, verbose=True)\n",
|
||||
"chain.run(\"Search for 'silicon valley bank' podcast episodes, audio length is more than 30 minutes, return only 1 results\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "a5939be5",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"**Web requests**\n",
|
||||
"\n",
|
||||
"URL requets are such a common use-case that we have the `LLMRequestsChain`, which makes a HTTP GET request. "
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 39,
|
||||
"id": "0b158296",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langchain.llms import OpenAI\n",
|
||||
"from langchain.prompts import PromptTemplate\n",
|
||||
"from langchain.chains import LLMRequestsChain, LLMChain"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 40,
|
||||
"id": "d49c33e4",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"template = \"\"\"Between >>> and <<< are the raw search result text from google.\n",
|
||||
"Extract the answer to the question '{query}' or say \"not found\" if the information is not contained.\n",
|
||||
"Use the format\n",
|
||||
"Extracted:<answer or \"not found\">\n",
|
||||
">>> {requests_result} <<<\n",
|
||||
"Extracted:\"\"\"\n",
|
||||
"\n",
|
||||
"PROMPT = PromptTemplate(\n",
|
||||
" input_variables=[\"query\", \"requests_result\"],\n",
|
||||
" template=template,\n",
|
||||
")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 43,
|
||||
"id": "d0fd4aab",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"{'query': 'What are the Three (3) biggest countries, and their respective sizes?',\n",
|
||||
" 'url': 'https://www.google.com/search?q=What+are+the+Three+(3)+biggest+countries,+and+their+respective+sizes?',\n",
|
||||
" 'output': ' Russia (17,098,242 km²), Canada (9,984,670 km²), China (9,706,961 km²)'}"
|
||||
]
|
||||
},
|
||||
"execution_count": 43,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"chain = LLMRequestsChain(llm_chain=LLMChain(llm=OpenAI(temperature=0), prompt=PROMPT))\n",
|
||||
"question = \"What are the Three (3) biggest countries, and their respective sizes?\"\n",
|
||||
"inputs = {\n",
|
||||
" \"query\": question,\n",
|
||||
" \"url\": \"https://www.google.com/search?q=\" + question.replace(\" \", \"+\"),\n",
|
||||
"}\n",
|
||||
"chain(inputs)"
|
||||
]
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
"kernelspec": {
|
||||
"display_name": "Python 3 (ipykernel)",
|
||||
"language": "python",
|
||||
"name": "python3"
|
||||
},
|
||||
"language_info": {
|
||||
"codemirror_mode": {
|
||||
"name": "ipython",
|
||||
"version": 3
|
||||
},
|
||||
"file_extension": ".py",
|
||||
"mimetype": "text/x-python",
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.9.16"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
"nbformat_minor": 5
|
||||
}
|
||||
@@ -1,24 +0,0 @@
|
||||
---
|
||||
sidebar_position: 3
|
||||
---
|
||||
|
||||
# Interacting with APIs
|
||||
|
||||
Lots of data and information is stored behind APIs.
|
||||
This page covers all resources available in LangChain for working with APIs.
|
||||
|
||||
## Chains
|
||||
|
||||
If you are just getting started, and you have relatively simple apis, you should get started with chains.
|
||||
Chains are a sequence of predetermined steps, so they are good to get started with as they give you more control and let you
|
||||
understand what is happening better.
|
||||
|
||||
- [API Chain](/docs/use_cases/apis/api.html)
|
||||
|
||||
## Agents
|
||||
|
||||
Agents are more complex, and involve multiple queries to the LLM to understand what to do.
|
||||
The downside of agents are that you have less control. The upside is that they are more powerful,
|
||||
which allows you to use them on larger and more complex schemas.
|
||||
|
||||
- [OpenAPI Agent](/docs/integrations/toolkits/openapi.html)
|
||||
@@ -1,123 +0,0 @@
|
||||
{
|
||||
"cells": [
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "dd7ec7af",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# HTTP request chain\n",
|
||||
"\n",
|
||||
"Using the request library to get HTML results from a URL and then an LLM to parse results"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 1,
|
||||
"id": "dd8eae75",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langchain.llms import OpenAI\n",
|
||||
"from langchain.chains import LLMRequestsChain, LLMChain"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 2,
|
||||
"id": "65bf324e",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langchain.prompts import PromptTemplate\n",
|
||||
"\n",
|
||||
"template = \"\"\"Between >>> and <<< are the raw search result text from google.\n",
|
||||
"Extract the answer to the question '{query}' or say \"not found\" if the information is not contained.\n",
|
||||
"Use the format\n",
|
||||
"Extracted:<answer or \"not found\">\n",
|
||||
">>> {requests_result} <<<\n",
|
||||
"Extracted:\"\"\"\n",
|
||||
"\n",
|
||||
"PROMPT = PromptTemplate(\n",
|
||||
" input_variables=[\"query\", \"requests_result\"],\n",
|
||||
" template=template,\n",
|
||||
")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 3,
|
||||
"id": "f36ae0d8",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"chain = LLMRequestsChain(llm_chain=LLMChain(llm=OpenAI(temperature=0), prompt=PROMPT))"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 4,
|
||||
"id": "b5d22d9d",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"question = \"What are the Three (3) biggest countries, and their respective sizes?\"\n",
|
||||
"inputs = {\n",
|
||||
" \"query\": question,\n",
|
||||
" \"url\": \"https://www.google.com/search?q=\" + question.replace(\" \", \"+\"),\n",
|
||||
"}"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 5,
|
||||
"id": "2ea81168",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"{'query': 'What are the Three (3) biggest countries, and their respective sizes?',\n",
|
||||
" 'url': 'https://www.google.com/search?q=What+are+the+Three+(3)+biggest+countries,+and+their+respective+sizes?',\n",
|
||||
" 'output': ' Russia (17,098,242 km²), Canada (9,984,670 km²), United States (9,826,675 km²)'}"
|
||||
]
|
||||
},
|
||||
"execution_count": 5,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"chain(inputs)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "db8f2b6d",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": []
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
"kernelspec": {
|
||||
"display_name": "Python 3 (ipykernel)",
|
||||
"language": "python",
|
||||
"name": "python3"
|
||||
},
|
||||
"language_info": {
|
||||
"codemirror_mode": {
|
||||
"name": "ipython",
|
||||
"version": 3
|
||||
},
|
||||
"file_extension": ".py",
|
||||
"mimetype": "text/x-python",
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.11.3"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
"nbformat_minor": 5
|
||||
}
|
||||
@@ -1,583 +0,0 @@
|
||||
{
|
||||
"cells": [
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "9fcaa37f",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# OpenAPI chain\n",
|
||||
"\n",
|
||||
"This notebook shows an example of using an OpenAPI chain to call an endpoint in natural language, and get back a response in natural language."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 1,
|
||||
"id": "efa6909f",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langchain.tools import OpenAPISpec, APIOperation\n",
|
||||
"from langchain.chains import OpenAPIEndpointChain\n",
|
||||
"from langchain.requests import Requests\n",
|
||||
"from langchain.llms import OpenAI"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "71e38c6c",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Load the spec\n",
|
||||
"\n",
|
||||
"Load a wrapper of the spec (so we can work with it more easily). You can load from a url or from a local file."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 2,
|
||||
"id": "0831271b",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stderr",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"Attempting to load an OpenAPI 3.0.1 spec. This may result in degraded performance. Convert your OpenAPI spec to 3.1.* spec for better support.\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"spec = OpenAPISpec.from_url(\n",
|
||||
" \"https://www.klarna.com/us/shopping/public/openai/v0/api-docs/\"\n",
|
||||
")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 3,
|
||||
"id": "189dd506",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# Alternative loading from file\n",
|
||||
"# spec = OpenAPISpec.from_file(\"openai_openapi.yaml\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "f7093582",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Select the Operation\n",
|
||||
"\n",
|
||||
"In order to provide a focused on modular chain, we create a chain specifically only for one of the endpoints. Here we get an API operation from a specified endpoint and method."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 4,
|
||||
"id": "157494b9",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"operation = APIOperation.from_openapi_spec(spec, \"/public/openai/v0/products\", \"get\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "e3ab1c5c",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Construct the chain\n",
|
||||
"\n",
|
||||
"We can now construct a chain to interact with it. In order to construct such a chain, we will pass in:\n",
|
||||
"\n",
|
||||
"1. The operation endpoint\n",
|
||||
"2. A requests wrapper (can be used to handle authentication, etc)\n",
|
||||
"3. The LLM to use to interact with it"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 5,
|
||||
"id": "788a7cef",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"llm = OpenAI() # Load a Language Model"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 6,
|
||||
"id": "c5f27406",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"chain = OpenAPIEndpointChain.from_api_operation(\n",
|
||||
" operation,\n",
|
||||
" llm,\n",
|
||||
" requests=Requests(),\n",
|
||||
" verbose=True,\n",
|
||||
" return_intermediate_steps=True, # Return request and response text\n",
|
||||
")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 7,
|
||||
"id": "23652053",
|
||||
"metadata": {
|
||||
"scrolled": false
|
||||
},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"\n",
|
||||
"\n",
|
||||
"\u001b[1m> Entering new OpenAPIEndpointChain chain...\u001b[0m\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"\u001b[1m> Entering new APIRequesterChain chain...\u001b[0m\n",
|
||||
"Prompt after formatting:\n",
|
||||
"\u001b[32;1m\u001b[1;3mYou are a helpful AI Assistant. Please provide JSON arguments to agentFunc() based on the user's instructions.\n",
|
||||
"\n",
|
||||
"API_SCHEMA: ```typescript\n",
|
||||
"/* API for fetching Klarna product information */\n",
|
||||
"type productsUsingGET = (_: {\n",
|
||||
"/* A precise query that matches one very small category or product that needs to be searched for to find the products the user is looking for. If the user explicitly stated what they want, use that as a query. The query is as specific as possible to the product name or category mentioned by the user in its singular form, and don't contain any clarifiers like latest, newest, cheapest, budget, premium, expensive or similar. The query is always taken from the latest topic, if there is a new topic a new query is started. */\n",
|
||||
"\t\tq: string,\n",
|
||||
"/* number of products returned */\n",
|
||||
"\t\tsize?: number,\n",
|
||||
"/* (Optional) Minimum price in local currency for the product searched for. Either explicitly stated by the user or implicitly inferred from a combination of the user's request and the kind of product searched for. */\n",
|
||||
"\t\tmin_price?: number,\n",
|
||||
"/* (Optional) Maximum price in local currency for the product searched for. Either explicitly stated by the user or implicitly inferred from a combination of the user's request and the kind of product searched for. */\n",
|
||||
"\t\tmax_price?: number,\n",
|
||||
"}) => any;\n",
|
||||
"```\n",
|
||||
"\n",
|
||||
"USER_INSTRUCTIONS: \"whats the most expensive shirt?\"\n",
|
||||
"\n",
|
||||
"Your arguments must be plain json provided in a markdown block:\n",
|
||||
"\n",
|
||||
"ARGS: ```json\n",
|
||||
"{valid json conforming to API_SCHEMA}\n",
|
||||
"```\n",
|
||||
"\n",
|
||||
"Example\n",
|
||||
"-----\n",
|
||||
"\n",
|
||||
"ARGS: ```json\n",
|
||||
"{\"foo\": \"bar\", \"baz\": {\"qux\": \"quux\"}}\n",
|
||||
"```\n",
|
||||
"\n",
|
||||
"The block must be no more than 1 line long, and all arguments must be valid JSON. All string arguments must be wrapped in double quotes.\n",
|
||||
"You MUST strictly comply to the types indicated by the provided schema, including all required args.\n",
|
||||
"\n",
|
||||
"If you don't have sufficient information to call the function due to things like requiring specific uuid's, you can reply with the following message:\n",
|
||||
"\n",
|
||||
"Message: ```text\n",
|
||||
"Concise response requesting the additional information that would make calling the function successful.\n",
|
||||
"```\n",
|
||||
"\n",
|
||||
"Begin\n",
|
||||
"-----\n",
|
||||
"ARGS:\n",
|
||||
"\u001b[0m\n",
|
||||
"\n",
|
||||
"\u001b[1m> Finished chain.\u001b[0m\n",
|
||||
"\u001b[32;1m\u001b[1;3m{\"q\": \"shirt\", \"size\": 1, \"max_price\": null}\u001b[0m\n",
|
||||
"\u001b[36;1m\u001b[1;3m{\"products\":[{\"name\":\"Burberry Check Poplin Shirt\",\"url\":\"https://www.klarna.com/us/shopping/pl/cl10001/3201810981/Clothing/Burberry-Check-Poplin-Shirt/?utm_source=openai&ref-site=openai_plugin\",\"price\":\"$360.00\",\"attributes\":[\"Material:Cotton\",\"Target Group:Man\",\"Color:Gray,Blue,Beige\",\"Properties:Pockets\",\"Pattern:Checkered\"]}]}\u001b[0m\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"\u001b[1m> Entering new APIResponderChain chain...\u001b[0m\n",
|
||||
"Prompt after formatting:\n",
|
||||
"\u001b[32;1m\u001b[1;3mYou are a helpful AI assistant trained to answer user queries from API responses.\n",
|
||||
"You attempted to call an API, which resulted in:\n",
|
||||
"API_RESPONSE: {\"products\":[{\"name\":\"Burberry Check Poplin Shirt\",\"url\":\"https://www.klarna.com/us/shopping/pl/cl10001/3201810981/Clothing/Burberry-Check-Poplin-Shirt/?utm_source=openai&ref-site=openai_plugin\",\"price\":\"$360.00\",\"attributes\":[\"Material:Cotton\",\"Target Group:Man\",\"Color:Gray,Blue,Beige\",\"Properties:Pockets\",\"Pattern:Checkered\"]}]}\n",
|
||||
"\n",
|
||||
"USER_COMMENT: \"whats the most expensive shirt?\"\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"If the API_RESPONSE can answer the USER_COMMENT respond with the following markdown json block:\n",
|
||||
"Response: ```json\n",
|
||||
"{\"response\": \"Human-understandable synthesis of the API_RESPONSE\"}\n",
|
||||
"```\n",
|
||||
"\n",
|
||||
"Otherwise respond with the following markdown json block:\n",
|
||||
"Response Error: ```json\n",
|
||||
"{\"response\": \"What you did and a concise statement of the resulting error. If it can be easily fixed, provide a suggestion.\"}\n",
|
||||
"```\n",
|
||||
"\n",
|
||||
"You MUST respond as a markdown json code block. The person you are responding to CANNOT see the API_RESPONSE, so if there is any relevant information there you must include it in your response.\n",
|
||||
"\n",
|
||||
"Begin:\n",
|
||||
"---\n",
|
||||
"\u001b[0m\n",
|
||||
"\n",
|
||||
"\u001b[1m> Finished chain.\u001b[0m\n",
|
||||
"\u001b[33;1m\u001b[1;3mThe most expensive shirt in the API response is the Burberry Check Poplin Shirt, which costs $360.00.\u001b[0m\n",
|
||||
"\n",
|
||||
"\u001b[1m> Finished chain.\u001b[0m\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"output = chain(\"whats the most expensive shirt?\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 8,
|
||||
"id": "c000295e",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"{'request_args': '{\"q\": \"shirt\", \"size\": 1, \"max_price\": null}',\n",
|
||||
" 'response_text': '{\"products\":[{\"name\":\"Burberry Check Poplin Shirt\",\"url\":\"https://www.klarna.com/us/shopping/pl/cl10001/3201810981/Clothing/Burberry-Check-Poplin-Shirt/?utm_source=openai&ref-site=openai_plugin\",\"price\":\"$360.00\",\"attributes\":[\"Material:Cotton\",\"Target Group:Man\",\"Color:Gray,Blue,Beige\",\"Properties:Pockets\",\"Pattern:Checkered\"]}]}'}"
|
||||
]
|
||||
},
|
||||
"execution_count": 8,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"# View intermediate steps\n",
|
||||
"output[\"intermediate_steps\"]"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "092bdb4d",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Return raw response\n",
|
||||
"\n",
|
||||
"We can also run this chain without synthesizing the response. This will have the effect of just returning the raw API output."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 10,
|
||||
"id": "4dff3849",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"chain = OpenAPIEndpointChain.from_api_operation(\n",
|
||||
" operation,\n",
|
||||
" llm,\n",
|
||||
" requests=Requests(),\n",
|
||||
" verbose=True,\n",
|
||||
" return_intermediate_steps=True, # Return request and response text\n",
|
||||
" raw_response=True, # Return raw response\n",
|
||||
")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 11,
|
||||
"id": "762499a9",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"\n",
|
||||
"\n",
|
||||
"\u001b[1m> Entering new OpenAPIEndpointChain chain...\u001b[0m\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"\u001b[1m> Entering new APIRequesterChain chain...\u001b[0m\n",
|
||||
"Prompt after formatting:\n",
|
||||
"\u001b[32;1m\u001b[1;3mYou are a helpful AI Assistant. Please provide JSON arguments to agentFunc() based on the user's instructions.\n",
|
||||
"\n",
|
||||
"API_SCHEMA: ```typescript\n",
|
||||
"/* API for fetching Klarna product information */\n",
|
||||
"type productsUsingGET = (_: {\n",
|
||||
"/* A precise query that matches one very small category or product that needs to be searched for to find the products the user is looking for. If the user explicitly stated what they want, use that as a query. The query is as specific as possible to the product name or category mentioned by the user in its singular form, and don't contain any clarifiers like latest, newest, cheapest, budget, premium, expensive or similar. The query is always taken from the latest topic, if there is a new topic a new query is started. */\n",
|
||||
"\t\tq: string,\n",
|
||||
"/* number of products returned */\n",
|
||||
"\t\tsize?: number,\n",
|
||||
"/* (Optional) Minimum price in local currency for the product searched for. Either explicitly stated by the user or implicitly inferred from a combination of the user's request and the kind of product searched for. */\n",
|
||||
"\t\tmin_price?: number,\n",
|
||||
"/* (Optional) Maximum price in local currency for the product searched for. Either explicitly stated by the user or implicitly inferred from a combination of the user's request and the kind of product searched for. */\n",
|
||||
"\t\tmax_price?: number,\n",
|
||||
"}) => any;\n",
|
||||
"```\n",
|
||||
"\n",
|
||||
"USER_INSTRUCTIONS: \"whats the most expensive shirt?\"\n",
|
||||
"\n",
|
||||
"Your arguments must be plain json provided in a markdown block:\n",
|
||||
"\n",
|
||||
"ARGS: ```json\n",
|
||||
"{valid json conforming to API_SCHEMA}\n",
|
||||
"```\n",
|
||||
"\n",
|
||||
"Example\n",
|
||||
"-----\n",
|
||||
"\n",
|
||||
"ARGS: ```json\n",
|
||||
"{\"foo\": \"bar\", \"baz\": {\"qux\": \"quux\"}}\n",
|
||||
"```\n",
|
||||
"\n",
|
||||
"The block must be no more than 1 line long, and all arguments must be valid JSON. All string arguments must be wrapped in double quotes.\n",
|
||||
"You MUST strictly comply to the types indicated by the provided schema, including all required args.\n",
|
||||
"\n",
|
||||
"If you don't have sufficient information to call the function due to things like requiring specific uuid's, you can reply with the following message:\n",
|
||||
"\n",
|
||||
"Message: ```text\n",
|
||||
"Concise response requesting the additional information that would make calling the function successful.\n",
|
||||
"```\n",
|
||||
"\n",
|
||||
"Begin\n",
|
||||
"-----\n",
|
||||
"ARGS:\n",
|
||||
"\u001b[0m\n",
|
||||
"\n",
|
||||
"\u001b[1m> Finished chain.\u001b[0m\n",
|
||||
"\u001b[32;1m\u001b[1;3m{\"q\": \"shirt\", \"max_price\": null}\u001b[0m\n",
|
||||
"\u001b[36;1m\u001b[1;3m{\"products\":[{\"name\":\"Burberry Check Poplin Shirt\",\"url\":\"https://www.klarna.com/us/shopping/pl/cl10001/3201810981/Clothing/Burberry-Check-Poplin-Shirt/?utm_source=openai&ref-site=openai_plugin\",\"price\":\"$360.00\",\"attributes\":[\"Material:Cotton\",\"Target Group:Man\",\"Color:Gray,Blue,Beige\",\"Properties:Pockets\",\"Pattern:Checkered\"]},{\"name\":\"Burberry Vintage Check Cotton Shirt - Beige\",\"url\":\"https://www.klarna.com/us/shopping/pl/cl359/3200280807/Children-s-Clothing/Burberry-Vintage-Check-Cotton-Shirt-Beige/?utm_source=openai&ref-site=openai_plugin\",\"price\":\"$229.02\",\"attributes\":[\"Material:Cotton,Elastane\",\"Color:Beige\",\"Model:Boy\",\"Pattern:Checkered\"]},{\"name\":\"Burberry Vintage Check Stretch Cotton Twill Shirt\",\"url\":\"https://www.klarna.com/us/shopping/pl/cl10001/3202342515/Clothing/Burberry-Vintage-Check-Stretch-Cotton-Twill-Shirt/?utm_source=openai&ref-site=openai_plugin\",\"price\":\"$309.99\",\"attributes\":[\"Material:Elastane/Lycra/Spandex,Cotton\",\"Target Group:Woman\",\"Color:Beige\",\"Properties:Stretch\",\"Pattern:Checkered\"]},{\"name\":\"Burberry Somerton Check Shirt - Camel\",\"url\":\"https://www.klarna.com/us/shopping/pl/cl10001/3201112728/Clothing/Burberry-Somerton-Check-Shirt-Camel/?utm_source=openai&ref-site=openai_plugin\",\"price\":\"$450.00\",\"attributes\":[\"Material:Elastane/Lycra/Spandex,Cotton\",\"Target Group:Man\",\"Color:Beige\"]},{\"name\":\"Magellan Outdoors Laguna Madre Solid Short Sleeve Fishing Shirt\",\"url\":\"https://www.klarna.com/us/shopping/pl/cl10001/3203102142/Clothing/Magellan-Outdoors-Laguna-Madre-Solid-Short-Sleeve-Fishing-Shirt/?utm_source=openai&ref-site=openai_plugin\",\"price\":\"$19.99\",\"attributes\":[\"Material:Polyester,Nylon\",\"Target Group:Man\",\"Color:Red,Pink,White,Blue,Purple,Beige,Black,Green\",\"Properties:Pockets\",\"Pattern:Solid Color\"]}]}\u001b[0m\n",
|
||||
"\n",
|
||||
"\u001b[1m> Finished chain.\u001b[0m\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"output = chain(\"whats the most expensive shirt?\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 12,
|
||||
"id": "4afc021a",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"{'instructions': 'whats the most expensive shirt?',\n",
|
||||
" 'output': '{\"products\":[{\"name\":\"Burberry Check Poplin Shirt\",\"url\":\"https://www.klarna.com/us/shopping/pl/cl10001/3201810981/Clothing/Burberry-Check-Poplin-Shirt/?utm_source=openai&ref-site=openai_plugin\",\"price\":\"$360.00\",\"attributes\":[\"Material:Cotton\",\"Target Group:Man\",\"Color:Gray,Blue,Beige\",\"Properties:Pockets\",\"Pattern:Checkered\"]},{\"name\":\"Burberry Vintage Check Cotton Shirt - Beige\",\"url\":\"https://www.klarna.com/us/shopping/pl/cl359/3200280807/Children-s-Clothing/Burberry-Vintage-Check-Cotton-Shirt-Beige/?utm_source=openai&ref-site=openai_plugin\",\"price\":\"$229.02\",\"attributes\":[\"Material:Cotton,Elastane\",\"Color:Beige\",\"Model:Boy\",\"Pattern:Checkered\"]},{\"name\":\"Burberry Vintage Check Stretch Cotton Twill Shirt\",\"url\":\"https://www.klarna.com/us/shopping/pl/cl10001/3202342515/Clothing/Burberry-Vintage-Check-Stretch-Cotton-Twill-Shirt/?utm_source=openai&ref-site=openai_plugin\",\"price\":\"$309.99\",\"attributes\":[\"Material:Elastane/Lycra/Spandex,Cotton\",\"Target Group:Woman\",\"Color:Beige\",\"Properties:Stretch\",\"Pattern:Checkered\"]},{\"name\":\"Burberry Somerton Check Shirt - Camel\",\"url\":\"https://www.klarna.com/us/shopping/pl/cl10001/3201112728/Clothing/Burberry-Somerton-Check-Shirt-Camel/?utm_source=openai&ref-site=openai_plugin\",\"price\":\"$450.00\",\"attributes\":[\"Material:Elastane/Lycra/Spandex,Cotton\",\"Target Group:Man\",\"Color:Beige\"]},{\"name\":\"Magellan Outdoors Laguna Madre Solid Short Sleeve Fishing Shirt\",\"url\":\"https://www.klarna.com/us/shopping/pl/cl10001/3203102142/Clothing/Magellan-Outdoors-Laguna-Madre-Solid-Short-Sleeve-Fishing-Shirt/?utm_source=openai&ref-site=openai_plugin\",\"price\":\"$19.99\",\"attributes\":[\"Material:Polyester,Nylon\",\"Target Group:Man\",\"Color:Red,Pink,White,Blue,Purple,Beige,Black,Green\",\"Properties:Pockets\",\"Pattern:Solid Color\"]}]}',\n",
|
||||
" 'intermediate_steps': {'request_args': '{\"q\": \"shirt\", \"max_price\": null}',\n",
|
||||
" 'response_text': '{\"products\":[{\"name\":\"Burberry Check Poplin Shirt\",\"url\":\"https://www.klarna.com/us/shopping/pl/cl10001/3201810981/Clothing/Burberry-Check-Poplin-Shirt/?utm_source=openai&ref-site=openai_plugin\",\"price\":\"$360.00\",\"attributes\":[\"Material:Cotton\",\"Target Group:Man\",\"Color:Gray,Blue,Beige\",\"Properties:Pockets\",\"Pattern:Checkered\"]},{\"name\":\"Burberry Vintage Check Cotton Shirt - Beige\",\"url\":\"https://www.klarna.com/us/shopping/pl/cl359/3200280807/Children-s-Clothing/Burberry-Vintage-Check-Cotton-Shirt-Beige/?utm_source=openai&ref-site=openai_plugin\",\"price\":\"$229.02\",\"attributes\":[\"Material:Cotton,Elastane\",\"Color:Beige\",\"Model:Boy\",\"Pattern:Checkered\"]},{\"name\":\"Burberry Vintage Check Stretch Cotton Twill Shirt\",\"url\":\"https://www.klarna.com/us/shopping/pl/cl10001/3202342515/Clothing/Burberry-Vintage-Check-Stretch-Cotton-Twill-Shirt/?utm_source=openai&ref-site=openai_plugin\",\"price\":\"$309.99\",\"attributes\":[\"Material:Elastane/Lycra/Spandex,Cotton\",\"Target Group:Woman\",\"Color:Beige\",\"Properties:Stretch\",\"Pattern:Checkered\"]},{\"name\":\"Burberry Somerton Check Shirt - Camel\",\"url\":\"https://www.klarna.com/us/shopping/pl/cl10001/3201112728/Clothing/Burberry-Somerton-Check-Shirt-Camel/?utm_source=openai&ref-site=openai_plugin\",\"price\":\"$450.00\",\"attributes\":[\"Material:Elastane/Lycra/Spandex,Cotton\",\"Target Group:Man\",\"Color:Beige\"]},{\"name\":\"Magellan Outdoors Laguna Madre Solid Short Sleeve Fishing Shirt\",\"url\":\"https://www.klarna.com/us/shopping/pl/cl10001/3203102142/Clothing/Magellan-Outdoors-Laguna-Madre-Solid-Short-Sleeve-Fishing-Shirt/?utm_source=openai&ref-site=openai_plugin\",\"price\":\"$19.99\",\"attributes\":[\"Material:Polyester,Nylon\",\"Target Group:Man\",\"Color:Red,Pink,White,Blue,Purple,Beige,Black,Green\",\"Properties:Pockets\",\"Pattern:Solid Color\"]}]}'}}"
|
||||
]
|
||||
},
|
||||
"execution_count": 12,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"output"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "8d7924e4",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Example POST message\n",
|
||||
"\n",
|
||||
"For this demo, we will interact with the speak API."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 9,
|
||||
"id": "c56b1a04",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stderr",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"Attempting to load an OpenAPI 3.0.1 spec. This may result in degraded performance. Convert your OpenAPI spec to 3.1.* spec for better support.\n",
|
||||
"Attempting to load an OpenAPI 3.0.1 spec. This may result in degraded performance. Convert your OpenAPI spec to 3.1.* spec for better support.\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"spec = OpenAPISpec.from_url(\"https://api.speak.com/openapi.yaml\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 10,
|
||||
"id": "177d8275",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"operation = APIOperation.from_openapi_spec(\n",
|
||||
" spec, \"/v1/public/openai/explain-task\", \"post\"\n",
|
||||
")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 11,
|
||||
"id": "835c5ddc",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"llm = OpenAI()\n",
|
||||
"chain = OpenAPIEndpointChain.from_api_operation(\n",
|
||||
" operation, llm, requests=Requests(), verbose=True, return_intermediate_steps=True\n",
|
||||
")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 12,
|
||||
"id": "59855d60",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"\n",
|
||||
"\n",
|
||||
"\u001b[1m> Entering new OpenAPIEndpointChain chain...\u001b[0m\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"\u001b[1m> Entering new APIRequesterChain chain...\u001b[0m\n",
|
||||
"Prompt after formatting:\n",
|
||||
"\u001b[32;1m\u001b[1;3mYou are a helpful AI Assistant. Please provide JSON arguments to agentFunc() based on the user's instructions.\n",
|
||||
"\n",
|
||||
"API_SCHEMA: ```typescript\n",
|
||||
"type explainTask = (_: {\n",
|
||||
"/* Description of the task that the user wants to accomplish or do. For example, \"tell the waiter they messed up my order\" or \"compliment someone on their shirt\" */\n",
|
||||
" task_description?: string,\n",
|
||||
"/* The foreign language that the user is learning and asking about. The value can be inferred from question - for example, if the user asks \"how do i ask a girl out in mexico city\", the value should be \"Spanish\" because of Mexico City. Always use the full name of the language (e.g. Spanish, French). */\n",
|
||||
" learning_language?: string,\n",
|
||||
"/* The user's native language. Infer this value from the language the user asked their question in. Always use the full name of the language (e.g. Spanish, French). */\n",
|
||||
" native_language?: string,\n",
|
||||
"/* A description of any additional context in the user's question that could affect the explanation - e.g. setting, scenario, situation, tone, speaking style and formality, usage notes, or any other qualifiers. */\n",
|
||||
" additional_context?: string,\n",
|
||||
"/* Full text of the user's question. */\n",
|
||||
" full_query?: string,\n",
|
||||
"}) => any;\n",
|
||||
"```\n",
|
||||
"\n",
|
||||
"USER_INSTRUCTIONS: \"How would ask for more tea in Delhi?\"\n",
|
||||
"\n",
|
||||
"Your arguments must be plain json provided in a markdown block:\n",
|
||||
"\n",
|
||||
"ARGS: ```json\n",
|
||||
"{valid json conforming to API_SCHEMA}\n",
|
||||
"```\n",
|
||||
"\n",
|
||||
"Example\n",
|
||||
"-----\n",
|
||||
"\n",
|
||||
"ARGS: ```json\n",
|
||||
"{\"foo\": \"bar\", \"baz\": {\"qux\": \"quux\"}}\n",
|
||||
"```\n",
|
||||
"\n",
|
||||
"The block must be no more than 1 line long, and all arguments must be valid JSON. All string arguments must be wrapped in double quotes.\n",
|
||||
"You MUST strictly comply to the types indicated by the provided schema, including all required args.\n",
|
||||
"\n",
|
||||
"If you don't have sufficient information to call the function due to things like requiring specific uuid's, you can reply with the following message:\n",
|
||||
"\n",
|
||||
"Message: ```text\n",
|
||||
"Concise response requesting the additional information that would make calling the function successful.\n",
|
||||
"```\n",
|
||||
"\n",
|
||||
"Begin\n",
|
||||
"-----\n",
|
||||
"ARGS:\n",
|
||||
"\u001b[0m\n",
|
||||
"\n",
|
||||
"\u001b[1m> Finished chain.\u001b[0m\n",
|
||||
"\u001b[32;1m\u001b[1;3m{\"task_description\": \"ask for more tea\", \"learning_language\": \"Hindi\", \"native_language\": \"English\", \"full_query\": \"How would I ask for more tea in Delhi?\"}\u001b[0m\n",
|
||||
"\u001b[36;1m\u001b[1;3m{\"explanation\":\"<what-to-say language=\\\"Hindi\\\" context=\\\"None\\\">\\nऔर चाय लाओ। (Aur chai lao.) \\n</what-to-say>\\n\\n<alternatives context=\\\"None\\\">\\n1. \\\"चाय थोड़ी ज्यादा मिल सकती है?\\\" *(Chai thodi zyada mil sakti hai? - Polite, asking if more tea is available)*\\n2. \\\"मुझे महसूस हो रहा है कि मुझे कुछ अन्य प्रकार की चाय पीनी चाहिए।\\\" *(Mujhe mehsoos ho raha hai ki mujhe kuch anya prakar ki chai peeni chahiye. - Formal, indicating a desire for a different type of tea)*\\n3. \\\"क्या मुझे or cup में milk/tea powder मिल सकता है?\\\" *(Kya mujhe aur cup mein milk/tea powder mil sakta hai? - Very informal/casual tone, asking for an extra serving of milk or tea powder)*\\n</alternatives>\\n\\n<usage-notes>\\nIn India and Indian culture, serving guests with food and beverages holds great importance in hospitality. You will find people always offering drinks like water or tea to their guests as soon as they arrive at their house or office.\\n</usage-notes>\\n\\n<example-convo language=\\\"Hindi\\\">\\n<context>At home during breakfast.</context>\\nPreeti: सर, क्या main aur cups chai lekar aaun? (Sir,kya main aur cups chai lekar aaun? - Sir, should I get more tea cups?)\\nRahul: हां,बिल्कुल। और चाय की मात्रा में भी थोड़ा सा इजाफा करना। (Haan,bilkul. Aur chai ki matra mein bhi thoda sa eejafa karna. - Yes, please. And add a little extra in the quantity of tea as well.)\\n</example-convo>\\n\\n*[Report an issue or leave feedback](https://speak.com/chatgpt?rid=d4mcapbkopo164pqpbk321oc})*\",\"extra_response_instructions\":\"Use all information in the API response and fully render all Markdown.\\nAlways end your response with a link to report an issue or leave feedback on the plugin.\"}\u001b[0m\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"\u001b[1m> Entering new APIResponderChain chain...\u001b[0m\n",
|
||||
"Prompt after formatting:\n",
|
||||
"\u001b[32;1m\u001b[1;3mYou are a helpful AI assistant trained to answer user queries from API responses.\n",
|
||||
"You attempted to call an API, which resulted in:\n",
|
||||
"API_RESPONSE: {\"explanation\":\"<what-to-say language=\\\"Hindi\\\" context=\\\"None\\\">\\nऔर चाय लाओ। (Aur chai lao.) \\n</what-to-say>\\n\\n<alternatives context=\\\"None\\\">\\n1. \\\"चाय थोड़ी ज्यादा मिल सकती है?\\\" *(Chai thodi zyada mil sakti hai? - Polite, asking if more tea is available)*\\n2. \\\"मुझे महसूस हो रहा है कि मुझे कुछ अन्य प्रकार की चाय पीनी चाहिए।\\\" *(Mujhe mehsoos ho raha hai ki mujhe kuch anya prakar ki chai peeni chahiye. - Formal, indicating a desire for a different type of tea)*\\n3. \\\"क्या मुझे or cup में milk/tea powder मिल सकता है?\\\" *(Kya mujhe aur cup mein milk/tea powder mil sakta hai? - Very informal/casual tone, asking for an extra serving of milk or tea powder)*\\n</alternatives>\\n\\n<usage-notes>\\nIn India and Indian culture, serving guests with food and beverages holds great importance in hospitality. You will find people always offering drinks like water or tea to their guests as soon as they arrive at their house or office.\\n</usage-notes>\\n\\n<example-convo language=\\\"Hindi\\\">\\n<context>At home during breakfast.</context>\\nPreeti: सर, क्या main aur cups chai lekar aaun? (Sir,kya main aur cups chai lekar aaun? - Sir, should I get more tea cups?)\\nRahul: हां,बिल्कुल। और चाय की मात्रा में भी थोड़ा सा इजाफा करना। (Haan,bilkul. Aur chai ki matra mein bhi thoda sa eejafa karna. - Yes, please. And add a little extra in the quantity of tea as well.)\\n</example-convo>\\n\\n*[Report an issue or leave feedback](https://speak.com/chatgpt?rid=d4mcapbkopo164pqpbk321oc})*\",\"extra_response_instructions\":\"Use all information in the API response and fully render all Markdown.\\nAlways end your response with a link to report an issue or leave feedback on the plugin.\"}\n",
|
||||
"\n",
|
||||
"USER_COMMENT: \"How would ask for more tea in Delhi?\"\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"If the API_RESPONSE can answer the USER_COMMENT respond with the following markdown json block:\n",
|
||||
"Response: ```json\n",
|
||||
"{\"response\": \"Concise response to USER_COMMENT based on API_RESPONSE.\"}\n",
|
||||
"```\n",
|
||||
"\n",
|
||||
"Otherwise respond with the following markdown json block:\n",
|
||||
"Response Error: ```json\n",
|
||||
"{\"response\": \"What you did and a concise statement of the resulting error. If it can be easily fixed, provide a suggestion.\"}\n",
|
||||
"```\n",
|
||||
"\n",
|
||||
"You MUST respond as a markdown json code block.\n",
|
||||
"\n",
|
||||
"Begin:\n",
|
||||
"---\n",
|
||||
"\u001b[0m\n",
|
||||
"\n",
|
||||
"\u001b[1m> Finished chain.\u001b[0m\n",
|
||||
"\u001b[33;1m\u001b[1;3mIn Delhi you can ask for more tea by saying 'Chai thodi zyada mil sakti hai?'\u001b[0m\n",
|
||||
"\n",
|
||||
"\u001b[1m> Finished chain.\u001b[0m\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"output = chain(\"How would ask for more tea in Delhi?\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 13,
|
||||
"id": "91bddb18",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"['{\"task_description\": \"ask for more tea\", \"learning_language\": \"Hindi\", \"native_language\": \"English\", \"full_query\": \"How would I ask for more tea in Delhi?\"}',\n",
|
||||
" '{\"explanation\":\"<what-to-say language=\\\\\"Hindi\\\\\" context=\\\\\"None\\\\\">\\\\nऔर चाय लाओ। (Aur chai lao.) \\\\n</what-to-say>\\\\n\\\\n<alternatives context=\\\\\"None\\\\\">\\\\n1. \\\\\"चाय थोड़ी ज्यादा मिल सकती है?\\\\\" *(Chai thodi zyada mil sakti hai? - Polite, asking if more tea is available)*\\\\n2. \\\\\"मुझे महसूस हो रहा है कि मुझे कुछ अन्य प्रकार की चाय पीनी चाहिए।\\\\\" *(Mujhe mehsoos ho raha hai ki mujhe kuch anya prakar ki chai peeni chahiye. - Formal, indicating a desire for a different type of tea)*\\\\n3. \\\\\"क्या मुझे or cup में milk/tea powder मिल सकता है?\\\\\" *(Kya mujhe aur cup mein milk/tea powder mil sakta hai? - Very informal/casual tone, asking for an extra serving of milk or tea powder)*\\\\n</alternatives>\\\\n\\\\n<usage-notes>\\\\nIn India and Indian culture, serving guests with food and beverages holds great importance in hospitality. You will find people always offering drinks like water or tea to their guests as soon as they arrive at their house or office.\\\\n</usage-notes>\\\\n\\\\n<example-convo language=\\\\\"Hindi\\\\\">\\\\n<context>At home during breakfast.</context>\\\\nPreeti: सर, क्या main aur cups chai lekar aaun? (Sir,kya main aur cups chai lekar aaun? - Sir, should I get more tea cups?)\\\\nRahul: हां,बिल्कुल। और चाय की मात्रा में भी थोड़ा सा इजाफा करना। (Haan,bilkul. Aur chai ki matra mein bhi thoda sa eejafa karna. - Yes, please. And add a little extra in the quantity of tea as well.)\\\\n</example-convo>\\\\n\\\\n*[Report an issue or leave feedback](https://speak.com/chatgpt?rid=d4mcapbkopo164pqpbk321oc})*\",\"extra_response_instructions\":\"Use all information in the API response and fully render all Markdown.\\\\nAlways end your response with a link to report an issue or leave feedback on the plugin.\"}']"
|
||||
]
|
||||
},
|
||||
"execution_count": 13,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"# Show the API chain's intermediate steps\n",
|
||||
"output[\"intermediate_steps\"]"
|
||||
]
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
"kernelspec": {
|
||||
"display_name": "Python 3 (ipykernel)",
|
||||
"language": "python",
|
||||
"name": "python3"
|
||||
},
|
||||
"language_info": {
|
||||
"codemirror_mode": {
|
||||
"name": "ipython",
|
||||
"version": 3
|
||||
},
|
||||
"file_extension": ".py",
|
||||
"mimetype": "text/x-python",
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.11.3"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
"nbformat_minor": 5
|
||||
}
@@ -1,249 +0,0 @@
|
||||
{
|
||||
"cells": [
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "e734b314",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# OpenAPI calls with OpenAI functions\n",
|
||||
"\n",
|
||||
"In this notebook we'll show how to create a chain that automatically makes calls to an API based only on an OpenAPI spec. Under the hood, we're parsing the OpenAPI spec into a JSON schema that the OpenAI functions API can handle. This allows ChatGPT to automatically select and populate the relevant API call to make for any user input. Using the output of ChatGPT we then make the actual API call, and return the result."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 1,
|
||||
"id": "555661b5",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langchain.chains.openai_functions.openapi import get_openapi_chain"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "a95f510a",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Query Klarna"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "08e19b64",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"chain = get_openapi_chain(\n",
|
||||
" \"https://www.klarna.com/us/shopping/public/openai/v0/api-docs/\"\n",
|
||||
")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 3,
|
||||
"id": "3959f866",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"{'products': [{'name': \"Tommy Hilfiger Men's Short Sleeve Button-Down Shirt\",\n",
|
||||
" 'url': 'https://www.klarna.com/us/shopping/pl/cl10001/3204878580/Clothing/Tommy-Hilfiger-Men-s-Short-Sleeve-Button-Down-Shirt/?utm_source=openai&ref-site=openai_plugin',\n",
|
||||
" 'price': '$26.78',\n",
|
||||
" 'attributes': ['Material:Linen,Cotton',\n",
|
||||
" 'Target Group:Man',\n",
|
||||
" 'Color:Gray,Pink,White,Blue,Beige,Black,Turquoise',\n",
|
||||
" 'Size:S,XL,M,XXL']},\n",
|
||||
" {'name': \"Van Heusen Men's Long Sleeve Button-Down Shirt\",\n",
|
||||
" 'url': 'https://www.klarna.com/us/shopping/pl/cl10001/3201809514/Clothing/Van-Heusen-Men-s-Long-Sleeve-Button-Down-Shirt/?utm_source=openai&ref-site=openai_plugin',\n",
|
||||
" 'price': '$18.89',\n",
|
||||
" 'attributes': ['Material:Cotton',\n",
|
||||
" 'Target Group:Man',\n",
|
||||
" 'Color:Red,Gray,White,Blue',\n",
|
||||
" 'Size:XL,XXL']},\n",
|
||||
" {'name': 'Brixton Bowery Flannel Shirt',\n",
|
||||
" 'url': 'https://www.klarna.com/us/shopping/pl/cl10001/3202331096/Clothing/Brixton-Bowery-Flannel-Shirt/?utm_source=openai&ref-site=openai_plugin',\n",
|
||||
" 'price': '$34.48',\n",
|
||||
" 'attributes': ['Material:Cotton',\n",
|
||||
" 'Target Group:Man',\n",
|
||||
" 'Color:Gray,Blue,Black,Orange',\n",
|
||||
" 'Size:XL,3XL,4XL,5XL,L,M,XXL']},\n",
|
||||
" {'name': 'Cubavera Four Pocket Guayabera Shirt',\n",
|
||||
" 'url': 'https://www.klarna.com/us/shopping/pl/cl10001/3202055522/Clothing/Cubavera-Four-Pocket-Guayabera-Shirt/?utm_source=openai&ref-site=openai_plugin',\n",
|
||||
" 'price': '$23.22',\n",
|
||||
" 'attributes': ['Material:Polyester,Cotton',\n",
|
||||
" 'Target Group:Man',\n",
|
||||
" 'Color:Red,White,Blue,Black',\n",
|
||||
" 'Size:S,XL,L,M,XXL']},\n",
|
||||
" {'name': 'Theory Sylvain Shirt - Eclipse',\n",
|
||||
" 'url': 'https://www.klarna.com/us/shopping/pl/cl10001/3202028254/Clothing/Theory-Sylvain-Shirt-Eclipse/?utm_source=openai&ref-site=openai_plugin',\n",
|
||||
" 'price': '$86.01',\n",
|
||||
" 'attributes': ['Material:Polyester,Cotton',\n",
|
||||
" 'Target Group:Man',\n",
|
||||
" 'Color:Blue',\n",
|
||||
" 'Size:S,XL,XS,L,M,XXL']}]}"
|
||||
]
|
||||
},
|
||||
"execution_count": 3,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"chain.run(\"What are some options for a men's large blue button down shirt\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "6f648c77",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Query a translation service\n",
|
||||
"\n",
|
||||
"Additionally, see the request payload by setting `verbose=True`"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "bf6cd695",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"chain = get_openapi_chain(\"https://api.speak.com/openapi.yaml\", verbose=True)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 10,
|
||||
"id": "1ba51609",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"\n",
|
||||
"\n",
|
||||
"\u001b[1m> Entering new chain...\u001b[0m\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"\u001b[1m> Entering new chain...\u001b[0m\n",
|
||||
"Prompt after formatting:\n",
|
||||
"\u001b[32;1m\u001b[1;3mHuman: Use the provided API's to respond to this user query:\n",
|
||||
"\n",
|
||||
"How would you say no thanks in Russian\u001b[0m\n",
|
||||
"\n",
|
||||
"\u001b[1m> Finished chain.\u001b[0m\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"\u001b[1m> Entering new chain...\u001b[0m\n",
|
||||
"Calling endpoint \u001b[32;1m\u001b[1;3mtranslate\u001b[0m with arguments:\n",
|
||||
"\u001b[32;1m\u001b[1;3m{\n",
|
||||
" \"json\": {\n",
|
||||
" \"phrase_to_translate\": \"no thanks\",\n",
|
||||
" \"learning_language\": \"russian\",\n",
|
||||
" \"native_language\": \"english\",\n",
|
||||
" \"additional_context\": \"\",\n",
|
||||
" \"full_query\": \"How would you say no thanks in Russian\"\n",
|
||||
" }\n",
|
||||
"}\u001b[0m\n",
|
||||
"\u001b[1m> Finished chain.\u001b[0m\n",
|
||||
"\n",
|
||||
"\u001b[1m> Finished chain.\u001b[0m\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"{'explanation': '<translation language=\"Russian\">\\nНет, спасибо. (Net, spasibo)\\n</translation>\\n\\n<alternatives>\\n1. \"Нет, я в порядке\" *(Neutral/Formal - Can be used in professional settings or formal situations.)*\\n2. \"Нет, спасибо, я откажусь\" *(Formal - Can be used in polite settings, such as a fancy dinner with colleagues or acquaintances.)*\\n3. \"Не надо\" *(Informal - Can be used in informal situations, such as declining an offer from a friend.)*\\n</alternatives>\\n\\n<example-convo language=\"Russian\">\\n<context>Max is being offered a cigarette at a party.</context>\\n* Sasha: \"Хочешь покурить?\"\\n* Max: \"Нет, спасибо. Я бросил.\"\\n* Sasha: \"Окей, понятно.\"\\n</example-convo>\\n\\n*[Report an issue or leave feedback](https://speak.com/chatgpt?rid=noczaa460do8yqs8xjun6zdm})*',\n",
|
||||
" 'extra_response_instructions': 'Use all information in the API response and fully render all Markdown.\\nAlways end your response with a link to report an issue or leave feedback on the plugin.'}"
|
||||
]
|
||||
},
|
||||
"execution_count": 10,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"chain.run(\"How would you say no thanks in Russian\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "4923a291",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Query XKCD"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "a9198f62",
|
||||
"metadata": {
|
||||
"scrolled": true
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"chain = get_openapi_chain(\n",
|
||||
" \"https://gist.githubusercontent.com/roaldnefs/053e505b2b7a807290908fe9aa3e1f00/raw/0a212622ebfef501163f91e23803552411ed00e4/openapi.yaml\"\n",
|
||||
")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 7,
|
||||
"id": "3110c398",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"{'month': '6',\n",
|
||||
" 'num': 2793,\n",
|
||||
" 'link': '',\n",
|
||||
" 'year': '2023',\n",
|
||||
" 'news': '',\n",
|
||||
" 'safe_title': 'Garden Path Sentence',\n",
|
||||
" 'transcript': '',\n",
|
||||
" 'alt': 'Arboretum Owner Denied Standing in Garden Path Suit on Grounds Grounds Appealing Appealing',\n",
|
||||
" 'img': 'https://imgs.xkcd.com/comics/garden_path_sentence.png',\n",
|
||||
" 'title': 'Garden Path Sentence',\n",
|
||||
" 'day': '23'}"
|
||||
]
|
||||
},
|
||||
"execution_count": 7,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"chain.run(\"What's today's comic?\")"
|
||||
]
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
"kernelspec": {
|
||||
"display_name": "venv",
|
||||
"language": "python",
|
||||
"name": "venv"
|
||||
},
|
||||
"language_info": {
|
||||
"codemirror_mode": {
|
||||
"name": "ipython",
|
||||
"version": 3
|
||||
},
|
||||
"file_extension": ".py",
|
||||
"mimetype": "text/x-python",
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.11.3"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
"nbformat_minor": 5
|
||||
}
docs/extras/use_cases/code_understanding.ipynb (new file, 354 lines)
@@ -0,0 +1,354 @@
|
||||
{
|
||||
"cells": [
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# Code Understanding\n",
|
||||
"\n",
|
||||
"[](https://colab.research.google.com/github/langchain-ai/langchain/blob/master/docs/extras/use_cases/code_understanding.ipynb)\n",
|
||||
"\n",
|
||||
"## Use case\n",
|
||||
"\n",
|
||||
"Source code analysis is one of the most popular LLM applications (e.g., [GitHub Co-Pilot](https://github.com/features/copilot), [Code Interpreter](https://chat.openai.com/auth/login?next=%2F%3Fmodel%3Dgpt-4-code-interpreter), [Codium](https://www.codium.ai/), and [Codeium](https://codeium.com/about)) for use-cases such as:\n",
|
||||
"\n",
|
||||
"- Q&A over the code base to understand how it works\n",
|
||||
"- Using LLMs for suggesting refactors or improvements\n",
|
||||
"- Using LLMs for documenting the code\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"## Overview\n",
|
||||
"\n",
|
||||
"The pipeline for QA over code follows [the steps we do for document question answering](/docs/extras/use_cases/question_answering), with some differences:\n",
|
||||
"\n",
|
||||
"In particular, we can employ a [splitting strategy](https://python.langchain.com/docs/integrations/document_loaders/source_code) that does a few things:\n",
|
||||
"\n",
|
||||
"* Keeps each top-level function and class in the code is loaded into separate documents. \n",
|
||||
"* Puts remaining into a seperate document.\n",
|
||||
"* Retains metadata about where each split comes from\n",
|
||||
"\n",
|
||||
"## Quickstart"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 29,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"!pip install openai tiktoken chromadb langchain\n",
|
||||
"\n",
|
||||
"# Set env var OPENAI_API_KEY or load from a .env file\n",
|
||||
"# import dotenv\n",
|
||||
"\n",
|
||||
"# dotenv.load_env()"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"We'lll follow the structure of [this notebook](https://github.com/cristobalcl/LearningLangChain/blob/master/notebooks/04%20-%20QA%20with%20code.ipynb) and employ [context aware code splitting](https://python.langchain.com/docs/integrations/document_loaders/source_code)."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Loading\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"We will upload all python project files using the `langchain.document_loaders.TextLoader`.\n",
|
||||
"\n",
|
||||
"The following script iterates over the files in the LangChain repository and loads every `.py` file (a.k.a. **documents**):"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from git import Repo\n",
|
||||
"from langchain.text_splitter import Language\n",
|
||||
"from langchain.document_loaders.generic import GenericLoader\n",
|
||||
"from langchain.document_loaders.parsers import LanguageParser"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 12,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# Clone\n",
|
||||
"repo_path = \"/Users/rlm/Desktop/test_repo\"\n",
|
||||
"repo = Repo.clone_from(\"https://github.com/hwchase17/langchain\", to_path=repo_path)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"We load the py code using [`LanguageParser`](https://python.langchain.com/docs/integrations/document_loaders/source_code), which will:\n",
|
||||
"\n",
|
||||
"* Keep top-level functions and classes together (into a single document)\n",
|
||||
"* Put remaining code into a seperate document\n",
|
||||
"* Retains metadata about where each split comes from"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 14,
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"1293"
|
||||
]
|
||||
},
|
||||
"execution_count": 14,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"# Load\n",
|
||||
"loader = GenericLoader.from_filesystem(\n",
|
||||
" repo_path+\"/libs/langchain/langchain\",\n",
|
||||
" glob=\"**/*\",\n",
|
||||
" suffixes=[\".py\"],\n",
|
||||
" parser=LanguageParser(language=Language.PYTHON, parser_threshold=500)\n",
|
||||
")\n",
|
||||
"documents = loader.load()\n",
|
||||
"len(documents)"
|
||||
]
|
||||
},
|
||||
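{
"cell_type": "markdown",
"metadata": {},
"source": [
"As an optional sketch, we can peek at one parsed document to see the metadata the parser retained about where its content came from:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Peek at a sample document produced by the parser: its origin metadata\n",
"# and the start of its content\n",
"print(documents[0].metadata)\n",
"print(documents[0].page_content[:300])"
]
},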
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Splittng\n",
|
||||
"\n",
|
||||
"Split the `Document` into chunks for embedding and vector storage.\n",
|
||||
"\n",
|
||||
"We can use `RecursiveCharacterTextSplitter` w/ `language` specified."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 17,
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"3748"
|
||||
]
|
||||
},
|
||||
"execution_count": 17,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"from langchain.text_splitter import RecursiveCharacterTextSplitter\n",
|
||||
"python_splitter = RecursiveCharacterTextSplitter.from_language(language=Language.PYTHON, \n",
|
||||
" chunk_size=2000, \n",
|
||||
" chunk_overlap=200)\n",
|
||||
"texts = python_splitter.split_documents(documents)\n",
|
||||
"len(texts)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### RetrievalQA\n",
|
||||
"\n",
|
||||
"We need to store the documents in a way we can semantically search for their content. \n",
|
||||
"\n",
|
||||
"The most common approach is to embed the contents of each document then store the embedding and document in a vector store. \n",
|
||||
"\n",
|
||||
"When setting up the vectorstore retriever:\n",
|
||||
"\n",
|
||||
"* We test [max marginal relevance](/docs/extras/use_cases/question_answering) for retrieval\n",
|
||||
"* And 8 documents returned\n",
|
||||
"\n",
|
||||
"#### Go deeper\n",
|
||||
"\n",
|
||||
"- Browse the > 40 vectorstores integrations [here](https://integrations.langchain.com/).\n",
|
||||
"- See further documentation on vectorstores [here](/docs/modules/data_connection/vectorstores/).\n",
|
||||
"- Browse the > 30 text embedding integrations [here](https://integrations.langchain.com/).\n",
|
||||
"- See further documentation on embedding models [here](/docs/modules/data_connection/text_embedding/)."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 19,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langchain.vectorstores import Chroma\n",
|
||||
"from langchain.embeddings.openai import OpenAIEmbeddings\n",
|
||||
"db = Chroma.from_documents(texts, OpenAIEmbeddings(disallowed_special=()))\n",
|
||||
"retriever = db.as_retriever(\n",
|
||||
" search_type=\"mmr\", # Also test \"similarity\"\n",
|
||||
" search_kwargs={\"k\": 8},\n",
|
||||
")"
|
||||
]
|
||||
},
|
||||
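{
"cell_type": "markdown",
"metadata": {},
"source": [
"As an optional sanity check, we can query the retriever directly before wiring it into a chain (the query string below is just an example):"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Inspect which files the retriever pulls for a sample query\n",
"docs = retriever.get_relevant_documents(\"How is a RecursiveCharacterTextSplitter created?\")\n",
"for doc in docs:\n",
"    print(doc.metadata[\"source\"])"
]
},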
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Chat\n",
|
||||
"\n",
|
||||
"Test chat, just as we do for [chatbots](/docs/extras/use_cases/chatbots).\n",
|
||||
"\n",
|
||||
"#### Go deeper\n",
|
||||
"\n",
|
||||
"- Browse the > 55 LLM and chat model integrations [here](https://integrations.langchain.com/).\n",
|
||||
"- See further documentation on LLMs and chat models [here](/docs/modules/model_io/models/).\n",
|
||||
"- Use local LLMS: The popularity of [PrivateGPT](https://github.com/imartinez/privateGPT) and [GPT4All](https://github.com/nomic-ai/gpt4all) underscore the importance of running LLMs locally."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 32,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langchain.chat_models import ChatOpenAI\n",
|
||||
"from langchain.memory import ConversationSummaryMemory\n",
|
||||
"from langchain.chains import ConversationalRetrievalChain\n",
|
||||
"llm = ChatOpenAI(model_name=\"gpt-4\") \n",
|
||||
"memory = ConversationSummaryMemory(llm=llm,memory_key=\"chat_history\",return_messages=True)\n",
|
||||
"qa = ConversationalRetrievalChain.from_llm(llm, retriever=retriever, memory=memory)"
|
||||
]
|
||||
},
|
||||
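{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a sketch of the local-LLM option mentioned above (the model path below is a placeholder; a GPT4All model file would need to be downloaded first):"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from langchain.llms import GPT4All\n",
"\n",
"# Placeholder path to a locally downloaded GPT4All model file\n",
"local_llm = GPT4All(model=\"/path/to/ggml-gpt4all-model.bin\")\n",
"local_memory = ConversationSummaryMemory(llm=local_llm, memory_key=\"chat_history\", return_messages=True)\n",
"qa_local = ConversationalRetrievalChain.from_llm(local_llm, retriever=retriever, memory=local_memory)"
]
},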
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 30,
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"'To load a source code as documents for a QA over code, you can use the `CodeLoader` class. This class allows you to load source code files and split them into classes and functions.\\n\\nHere is an example of how to use the `CodeLoader` class:\\n\\n```python\\nfrom langchain.document_loaders.code import CodeLoader\\n\\n# Specify the path to the source code file\\ncode_file_path = \"path/to/code/file.py\"\\n\\n# Create an instance of the CodeLoader class\\ncode_loader = CodeLoader(code_file_path)\\n\\n# Load the code as documents\\ndocuments = code_loader.load()\\n\\n# Iterate over the documents\\nfor document in documents:\\n # Access the class or function name\\n name = document.metadata[\"name\"]\\n \\n # Access the code content\\n code = document.page_content\\n \\n # Process the code as needed\\n # ...\\n```\\n\\nIn the example above, `code_file_path` should be replaced with the actual path to your source code file. The `load()` method of the `CodeLoader` class will return a list of `Document` objects, where each document represents a class or function in the source code. You can access the class or function name using the `metadata[\"name\"]` attribute, and the code content using the `page_content` attribute of each `Document` object.\\n\\nYou can then process the code as needed for your QA task.'"
|
||||
]
|
||||
},
|
||||
"execution_count": 30,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"question = \"How can I load a source code as documents, for a QA over code, spliting the code in classes and functions?\"\n",
|
||||
"result = qa(question)\n",
|
||||
"result['answer']"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 33,
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"-> **Question**: What is the class hierarchy? \n",
|
||||
"\n",
|
||||
"**Answer**: The class hierarchy in object-oriented programming is the structure that forms when classes are derived from other classes. The derived class is a subclass of the base class also known as the superclass. This hierarchy is formed based on the concept of inheritance in object-oriented programming where a subclass inherits the properties and functionalities of the superclass. \n",
|
||||
"\n",
|
||||
"In the given context, we have the following examples of class hierarchies:\n",
|
||||
"\n",
|
||||
"1. `BaseCallbackHandler --> <name>CallbackHandler` means `BaseCallbackHandler` is a base class and `<name>CallbackHandler` (like `AimCallbackHandler`, `ArgillaCallbackHandler` etc.) are derived classes that inherit from `BaseCallbackHandler`.\n",
|
||||
"\n",
|
||||
"2. `BaseLoader --> <name>Loader` means `BaseLoader` is a base class and `<name>Loader` (like `TextLoader`, `UnstructuredFileLoader` etc.) are derived classes that inherit from `BaseLoader`.\n",
|
||||
"\n",
|
||||
"3. `ToolMetaclass --> BaseTool --> <name>Tool` means `ToolMetaclass` is a base class, `BaseTool` is a derived class that inherits from `ToolMetaclass`, and `<name>Tool` (like `AIPluginTool`, `BaseGraphQLTool` etc.) are further derived classes that inherit from `BaseTool`. \n",
|
||||
"\n",
|
||||
"-> **Question**: What classes are derived from the Chain class? \n",
|
||||
"\n",
|
||||
"**Answer**: The classes that are derived from the Chain class are:\n",
|
||||
"\n",
|
||||
"1. LLMSummarizationCheckerChain\n",
|
||||
"2. MapReduceChain\n",
|
||||
"3. OpenAIModerationChain\n",
|
||||
"4. NatBotChain\n",
|
||||
"5. QAGenerationChain\n",
|
||||
"6. QAWithSourcesChain\n",
|
||||
"7. RetrievalQAWithSourcesChain\n",
|
||||
"8. VectorDBQAWithSourcesChain\n",
|
||||
"9. RetrievalQA\n",
|
||||
"10. VectorDBQA\n",
|
||||
"11. LLMRouterChain\n",
|
||||
"12. MultiPromptChain\n",
|
||||
"13. MultiRetrievalQAChain\n",
|
||||
"14. MultiRouteChain\n",
|
||||
"15. RouterChain\n",
|
||||
"16. SequentialChain\n",
|
||||
"17. SimpleSequentialChain\n",
|
||||
"18. TransformChain\n",
|
||||
"19. BaseConversationalRetrievalChain\n",
|
||||
"20. ConstitutionalChain \n",
|
||||
"\n",
|
||||
"-> **Question**: What one improvement do you propose in code in relation to the class herarchy for the Chain class? \n",
|
||||
"\n",
|
||||
"**Answer**: As an AI model, I don't have personal opinions. However, one suggestion could be to improve the documentation of the Chain class hierarchy. The current comments and docstrings provide some details but it could be helpful to include more explicit explanations about the hierarchy, roles of each subclass, and their relationships with one another. Also, incorporating UML diagrams or other visuals could help developers better understand the structure and interactions of the classes. \n",
|
||||
"\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"questions = [\n",
|
||||
" \"What is the class hierarchy?\",\n",
|
||||
" \"What classes are derived from the Chain class?\",\n",
|
||||
" \"What one improvement do you propose in code in relation to the class herarchy for the Chain class?\",\n",
|
||||
"]\n",
|
||||
"\n",
|
||||
"for question in questions:\n",
|
||||
" result = qa(question)\n",
|
||||
" print(f\"-> **Question**: {question} \\n\")\n",
|
||||
" print(f\"**Answer**: {result['answer']} \\n\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"The can look at the [LangSmith trace](https://smith.langchain.com/public/2b23045f-4e49-4d2d-8980-dec85259af36/r) to see what is happening under the hood:\n",
|
||||
"\n",
|
||||
"* In particular, the code well structured and kept together in the retrival output\n",
|
||||
"* The retrieved code and chat history are passed to the LLM for answer distillation\n",
|
||||
"\n",
|
||||
""
|
||||
]
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
"kernelspec": {
|
||||
"display_name": "Python 3 (ipykernel)",
|
||||
"language": "python",
|
||||
"name": "python3"
|
||||
},
|
||||
"language_info": {
|
||||
"codemirror_mode": {
|
||||
"name": "ipython",
|
||||
"version": 3
|
||||
},
|
||||
"file_extension": ".py",
|
||||
"mimetype": "text/x-python",
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.9.16"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
"nbformat_minor": 2
|
||||
}
@@ -18,8 +18,8 @@
|
||||
"\n",
|
||||
"\n",
|
||||
"host = \"<neptune-host>\"\n",
|
||||
"port = 80\n",
|
||||
"use_https = False\n",
|
||||
"port = 8182\n",
|
||||
"use_https = True\n",
|
||||
"\n",
|
||||
"graph = NeptuneGraph(host=host, port=port, use_https=use_https)"
|
||||
]
docs/extras/use_cases/self_check/smart_llm.ipynb (new file, 281 lines)
@@ -0,0 +1,281 @@
|
||||
{
|
||||
"cells": [
|
||||
{
|
||||
"attachments": {},
|
||||
"cell_type": "markdown",
|
||||
"id": "9e9b7651",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# How to use a SmartLLMChain\n",
|
||||
"\n",
|
||||
"A SmartLLMChain is a form of self-critique chain that can help you if have particularly complex questions to answer. Instead of doing a single LLM pass, it instead performs these 3 steps:\n",
|
||||
"1. Ideation: Pass the user prompt n times through the LLM to get n output proposals (called \"ideas\"), where n is a parameter you can set \n",
|
||||
"2. Critique: The LLM critiques all ideas to find possible flaws and picks the best one \n",
|
||||
"3. Resolve: The LLM tries to improve upon the best idea (as chosen in the critique step) and outputs it. This is then the final output.\n",
|
||||
"\n",
|
||||
"SmartLLMChains are based on the SmartGPT workflow proposed in https://youtu.be/wVzuvf9D9BU.\n",
|
||||
"\n",
|
||||
"Note that SmartLLMChains\n",
|
||||
"- use more LLM passes (ie n+2 instead of just 1)\n",
|
||||
"- only work then the underlying LLM has the capability for reflection, whicher smaller models often don't\n",
|
||||
"- only work with underlying models that return exactly 1 output, not multiple\n",
|
||||
"\n",
|
||||
"This notebook demonstrates how to use a SmartLLMChain."
|
||||
]
|
||||
},
|
||||
{
|
||||
"attachments": {},
|
||||
"cell_type": "markdown",
|
||||
"id": "714dede0",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"##### Same LLM for all steps"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 1,
|
||||
"id": "d3f7fb22",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"import os\n",
|
||||
"\n",
|
||||
"os.environ[\"OPENAI_API_KEY\"] = \"...\""
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 2,
|
||||
"id": "10e5ece6",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langchain.prompts import PromptTemplate\n",
|
||||
"from langchain.chat_models import ChatOpenAI\n",
|
||||
"from langchain_experimental.smart_llm import SmartLLMChain"
|
||||
]
|
||||
},
|
||||
{
|
||||
"attachments": {},
|
||||
"cell_type": "markdown",
|
||||
"id": "1780da51",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"As example question, we will use \"I have a 12 liter jug and a 6 liter jug. I want to measure 6 liters. How do I do it?\". This is an example from the original SmartGPT video (https://youtu.be/wVzuvf9D9BU?t=384). While this seems like a very easy question, LLMs struggle do these kinds of questions that involve numbers and physical reasoning.\n",
|
||||
"\n",
|
||||
"As we will see, all 3 initial ideas are completely wrong - even though we're using GPT4! Only when using self-reflection do we get a correct answer. "
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "054af6b1",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"hard_question = \"I have a 12 liter jug and a 6 liter jug. I want to measure 6 liters. How do I do it?\""
|
||||
]
|
||||
},
|
||||
{
|
||||
"attachments": {},
|
||||
"cell_type": "markdown",
|
||||
"id": "8049cecd",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"So, we first create an LLM and prompt template"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 3,
|
||||
"id": "811ea8e1",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"prompt = PromptTemplate.from_template(hard_question)\n",
|
||||
"llm = ChatOpenAI(temperature=0, model_name=\"gpt-4\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"attachments": {},
|
||||
"cell_type": "markdown",
|
||||
"id": "50b602e4",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Now we can create a SmartLLMChain"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 4,
|
||||
"id": "8cd49199",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"chain = SmartLLMChain(llm=llm, prompt=prompt, n_ideas=3, verbose=True)"
|
||||
]
|
||||
},
|
||||
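{
"attachments": {},
"cell_type": "markdown",
"id": "7a3c9e21",
"metadata": {},
"source": [
"As a sketch, different models can also be passed for different steps, e.g. a higher-temperature model for ideation. (Treat the per-step parameters below as assumptions to verify against the `SmartLLMChain` API; steps without their own model are assumed to fall back to `llm`.)"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "5b8d2f40",
"metadata": {},
"outputs": [],
"source": [
"# Sketch: a more creative model for ideation, a deterministic one for\n",
"# critique and resolution (parameters assumed; unset steps use `llm`)\n",
"multi_chain = SmartLLMChain(\n",
"    ideation_llm=ChatOpenAI(temperature=0.9, model_name=\"gpt-4\"),\n",
"    llm=ChatOpenAI(temperature=0, model_name=\"gpt-4\"),\n",
"    prompt=prompt,\n",
"    n_ideas=3,\n",
"    verbose=True,\n",
")"
]
},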
{
|
||||
"attachments": {},
|
||||
"cell_type": "markdown",
|
||||
"id": "6a72f276",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Now we can use the SmartLLM as a drop-in replacement for our LLM. E.g.:"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 5,
|
||||
"id": "074e5e75",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"\n",
|
||||
"\n",
|
||||
"\u001b[1m> Entering new SmartLLMChain chain...\u001b[0m\n",
|
||||
"Prompt after formatting:\n",
|
||||
"\u001b[32;1m\u001b[1;3mI have a 12 liter jug and a 6 liter jug. I want to measure 6 liters. How do I do it?\u001b[0m\n",
|
||||
"Idea 1:\n",
|
||||
"\u001b[36;1m\u001b[1;3m1. Fill the 6-liter jug completely.\n",
|
||||
"2. Pour the water from the 6-liter jug into the 12-liter jug.\n",
|
||||
"3. Fill the 6-liter jug again.\n",
|
||||
"4. Carefully pour the water from the 6-liter jug into the 12-liter jug until the 12-liter jug is full.\n",
|
||||
"5. The amount of water left in the 6-liter jug will be exactly 6 liters.\u001b[0m\n",
|
||||
"Idea 2:\n",
|
||||
"\u001b[36;1m\u001b[1;3m1. Fill the 6-liter jug completely.\n",
|
||||
"2. Pour the water from the 6-liter jug into the 12-liter jug.\n",
|
||||
"3. Fill the 6-liter jug again.\n",
|
||||
"4. Carefully pour the water from the 6-liter jug into the 12-liter jug until the 12-liter jug is full.\n",
|
||||
"5. Since the 12-liter jug is now full, there will be 2 liters of water left in the 6-liter jug.\n",
|
||||
"6. Empty the 12-liter jug.\n",
|
||||
"7. Pour the 2 liters of water from the 6-liter jug into the 12-liter jug.\n",
|
||||
"8. Fill the 6-liter jug completely again.\n",
|
||||
"9. Pour the water from the 6-liter jug into the 12-liter jug, which already has 2 liters in it.\n",
|
||||
"10. Now, the 12-liter jug will have exactly 6 liters of water (2 liters from before + 4 liters from the 6-liter jug).\u001b[0m\n",
|
||||
"Idea 3:\n",
|
||||
"\u001b[36;1m\u001b[1;3m1. Fill the 6-liter jug completely.\n",
|
||||
"2. Pour the water from the 6-liter jug into the 12-liter jug.\n",
|
||||
"3. Fill the 6-liter jug again.\n",
|
||||
"4. Carefully pour the water from the 6-liter jug into the 12-liter jug until the 12-liter jug is full.\n",
|
||||
"5. The amount of water left in the 6-liter jug will be exactly 6 liters.\u001b[0m\n",
|
||||
"Critique:\n",
|
||||
"\u001b[33;1m\u001b[1;3mIdea 1:\n",
|
||||
"1. Fill the 6-liter jug completely. (No flaw)\n",
|
||||
"2. Pour the water from the 6-liter jug into the 12-liter jug. (No flaw)\n",
|
||||
"3. Fill the 6-liter jug again. (No flaw)\n",
|
||||
"4. Carefully pour the water from the 6-liter jug into the 12-liter jug until the 12-liter jug is full. (Flaw: The 12-liter jug will never be full in this step, as it can hold 12 liters and we are only pouring 6 liters into it.)\n",
|
||||
"5. The amount of water left in the 6-liter jug will be exactly 6 liters. (Flaw: This statement is incorrect, as there will be no water left in the 6-liter jug after pouring it into the 12-liter jug.)\n",
|
||||
"\n",
|
||||
"Idea 2:\n",
|
||||
"1. Fill the 6-liter jug completely. (No flaw)\n",
|
||||
"2. Pour the water from the 6-liter jug into the 12-liter jug. (No flaw)\n",
|
||||
"3. Fill the 6-liter jug again. (No flaw)\n",
|
||||
"4. Carefully pour the water from the 6-liter jug into the 12-liter jug until the 12-liter jug is full. (Flaw: The 12-liter jug will never be full in this step, as it can hold 12 liters and we are only pouring 6 liters into it.)\n",
|
||||
"5. Since the 12-liter jug is now full, there will be 2 liters of water left in the 6-liter jug. (Flaw: This statement is incorrect, as the 12-liter jug will not be full and there will be no water left in the 6-liter jug.)\n",
|
||||
"6. Empty the 12-liter jug. (No flaw)\n",
|
||||
"7. Pour the 2 liters of water from the 6-liter jug into the 12-liter jug. (Flaw: This step is based on the incorrect assumption that there are 2 liters of water left in the 6-liter jug.)\n",
|
||||
"8. Fill the 6-liter jug completely again. (No flaw)\n",
|
||||
"9. Pour the water from the 6-liter jug into the 12-liter jug, which already has 2 liters in it. (Flaw: This step is based on the incorrect assumption that there are 2 liters of water in the 12-liter jug.)\n",
|
||||
"10. Now, the 12-liter jug will have exactly 6 liters of water (2 liters from before + 4 liters from the 6-liter jug). (Flaw: This conclusion is based on the incorrect assumptions made in the previous steps.)\n",
|
||||
"\n",
|
||||
"Idea 3:\n",
|
||||
"1. Fill the 6-liter jug completely. (No flaw)\n",
|
||||
"2. Pour the water from the 6-liter jug into the 12-liter jug. (No flaw)\n",
|
||||
"3. Fill the 6-liter jug again. (No flaw)\n",
|
||||
"4. Carefully pour the water from the 6-liter jug into the 12-liter jug until the 12-liter jug is full. (Flaw: The 12-liter jug will never be full in this step, as it can hold 12 liters and we are only pouring 6 liters into it.)\n",
|
||||
"5. The amount of water left in the 6-liter jug will be exactly 6 liters. (Flaw: This statement is incorrect, as there will be no water left in the 6-liter jug after pouring it into the 12-liter jug.)\u001b[0m\n",
|
||||
"Resolution:\n",
|
||||
"\u001b[32;1m\u001b[1;3m1. Fill the 12-liter jug completely.\n",
|
||||
"2. Pour the water from the 12-liter jug into the 6-liter jug until the 6-liter jug is full.\n",
|
||||
"3. The amount of water left in the 12-liter jug will be exactly 6 liters.\u001b[0m\n",
|
||||
"\n",
|
||||
"\u001b[1m> Finished chain.\u001b[0m\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"'1. Fill the 12-liter jug completely.\\n2. Pour the water from the 12-liter jug into the 6-liter jug until the 6-liter jug is full.\\n3. The amount of water left in the 12-liter jug will be exactly 6 liters.'"
|
||||
]
|
||||
},
|
||||
"execution_count": 5,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"chain.run({})"
|
||||
]
|
||||
},
|
||||
{
|
||||
"attachments": {},
|
||||
"cell_type": "markdown",
|
||||
"id": "bbfebea1",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"##### Different LLM for different steps"
|
||||
]
|
||||
},
|
||||
{
|
||||
"attachments": {},
|
||||
"cell_type": "markdown",
|
||||
"id": "5be6ec08",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"You can also use different LLMs for the different steps by passing `ideation_llm`, `critique_llm` and `resolve_llm`. You might want to do this to use a more creative (i.e., high-temperature) model for ideation and a more strict (i.e., low-temperature) model for critique and resolution."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 8,
|
||||
"id": "9c33fa19",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"chain = SmartLLMChain(\n",
|
||||
" ideation_llm=ChatOpenAI(temperature=0.9, model_name=\"gpt-4\"),\n",
|
||||
" llm=ChatOpenAI(\n",
|
||||
" temperature=0, model_name=\"gpt-4\"\n",
|
||||
" ), # will be used for critqiue and resolution as no specific llms are given\n",
|
||||
" prompt=prompt,\n",
|
||||
" n_ideas=3,\n",
|
||||
" verbose=True,\n",
|
||||
")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "886c1cc1",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": []
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
"kernelspec": {
|
||||
"display_name": "Python 3 (ipykernel)",
|
||||
"language": "python",
|
||||
"name": "python3"
|
||||
},
|
||||
"language_info": {
|
||||
"codemirror_mode": {
|
||||
"name": "ipython",
|
||||
"version": 3
|
||||
},
|
||||
"file_extension": ".py",
|
||||
"mimetype": "text/x-python",
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.9.1"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
"nbformat_minor": 5
|
||||
}
|
||||
@@ -2,97 +2,90 @@
|
||||
"cells": [
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "a13ea924",
|
||||
"id": "a0507a4b",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# Tagging\n",
|
||||
"\n",
|
||||
"The tagging chain uses the OpenAI `functions` parameter to specify a schema to tag a document with. This helps us make sure that the model outputs exactly tags that we want, with their appropriate types.\n",
|
||||
"[](https://colab.research.google.com/github/langchain-ai/langchain/blob/master/docs/extras/use_cases/tagging.ipynb)\n",
|
||||
"\n",
|
||||
"The tagging chain is to be used when we want to tag a passage with a specific attribute (i.e. what is the sentiment of this message?)"
|
||||
"## Use case\n",
|
||||
"\n",
|
||||
"Tagging means labeling a document with classes such as:\n",
|
||||
"\n",
|
||||
"- sentiment\n",
|
||||
"- language\n",
|
||||
"- style (formal, informal etc.)\n",
|
||||
"- covered topics\n",
|
||||
"- political tendency\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"## Overview\n",
|
||||
"\n",
|
||||
"Tagging has a few components:\n",
|
||||
"\n",
|
||||
"* `function`: Like [extraction](/docs/use_cases/extraction), tagging uses [functions](https://openai.com/blog/function-calling-and-other-api-updates) to specify how the model should tag a document\n",
|
||||
"* `schema`: defines how we want to tag the document\n",
|
||||
"\n",
|
||||
"## Quickstart\n",
|
||||
"\n",
|
||||
"Let's see a very straightforward example of how we can use OpenAI functions for tagging in LangChain."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 1,
|
||||
"id": "bafb496a",
|
||||
"execution_count": null,
|
||||
"id": "dc5cbb6f",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stderr",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"/Users/harrisonchase/.pyenv/versions/3.9.1/envs/langchain/lib/python3.9/site-packages/deeplake/util/check_latest_version.py:32: UserWarning: A newer version of deeplake (3.6.4) is available. It's recommended that you update to the latest version using `pip install -U deeplake`.\n",
|
||||
" warnings.warn(\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langchain.chat_models import ChatOpenAI\n",
|
||||
"from langchain.chains import create_tagging_chain, create_tagging_chain_pydantic\n",
|
||||
"from langchain.prompts import ChatPromptTemplate"
|
||||
"!pip install langchain openai \n",
|
||||
"\n",
|
||||
"# Set env var OPENAI_API_KEY or load from a .env file:\n",
|
||||
"# import dotenv\n",
|
||||
"# dotenv.load_env()"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 2,
|
||||
"id": "39f3ce3e",
|
||||
"id": "bafb496a",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"llm = ChatOpenAI(temperature=0, model=\"gpt-3.5-turbo-0613\")"
|
||||
"from langchain.chat_models import ChatOpenAI\n",
|
||||
"from langchain.prompts import ChatPromptTemplate\n",
|
||||
"from langchain.chains import create_tagging_chain, create_tagging_chain_pydantic"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "832ddcd9",
|
||||
"id": "b8ca3f93",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Simplest approach, only specifying type"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "4fc8d766",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"We can start by specifying a few properties with their expected type in our schema"
|
||||
"We specify a few properties with their expected type in our schema."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 3,
|
||||
"id": "8329f943",
|
||||
"execution_count": 4,
|
||||
"id": "39f3ce3e",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# Schema\n",
|
||||
"schema = {\n",
|
||||
" \"properties\": {\n",
|
||||
" \"sentiment\": {\"type\": \"string\"},\n",
|
||||
" \"aggressiveness\": {\"type\": \"integer\"},\n",
|
||||
" \"language\": {\"type\": \"string\"},\n",
|
||||
" }\n",
|
||||
"}"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 4,
|
||||
"id": "6146ae70",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"chain = create_tagging_chain(schema, llm)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "9e306ca3",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"As we can see in the examples, it correctly interprets what we want but the results vary so that we get, for example, sentiments in different languages ('positive', 'enojado' etc.).\n",
|
||||
"}\n",
|
||||
"\n",
|
||||
"We will see how to control these results in the next section."
|
||||
"# LLM\n",
|
||||
"llm = ChatOpenAI(temperature=0, model=\"gpt-3.5-turbo-0613\")\n",
|
||||
"chain = create_tagging_chain(schema, llm)"
|
||||
]
|
||||
},
|
||||
{
|
||||
@@ -126,7 +119,7 @@
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"{'sentiment': 'enojado', 'aggressiveness': 1, 'language': 'Spanish'}"
|
||||
"{'sentiment': 'enojado', 'aggressiveness': 1, 'language': 'es'}"
|
||||
]
|
||||
},
|
||||
"execution_count": 6,
|
||||
@@ -140,25 +133,15 @@
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 7,
|
||||
"id": "aae85b27",
|
||||
"cell_type": "markdown",
|
||||
"id": "d921bb53",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"{'sentiment': 'positive', 'aggressiveness': 0, 'language': 'English'}"
|
||||
]
|
||||
},
|
||||
"execution_count": 7,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"inp = \"Weather is ok here, I can go outside without much more than a coat\"\n",
|
||||
"chain.run(inp)"
|
||||
"As we can see in the examples, it correctly interprets what we want.\n",
|
||||
"\n",
|
||||
"The results vary so that we get, for example, sentiments in different languages ('positive', 'enojado' etc.).\n",
|
||||
"\n",
|
||||
"We will see how to control these results in the next section."
|
||||
]
|
||||
},
|
||||
{
|
||||
@@ -166,9 +149,11 @@
|
||||
"id": "bebb2f83",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## More control\n",
|
||||
"## Finer control\n",
|
||||
"\n",
|
||||
"By being smart about how we define our schema we can have more control over the model's output. Specifically we can define:\n",
|
||||
"Careful schema definition gives us more control over the model's output. \n",
|
||||
"\n",
|
||||
"Specifically, we can define:\n",
|
||||
"\n",
|
||||
"- possible values for each property\n",
|
||||
"- description to make sure that the model understands the property\n",
|
||||
@@ -180,7 +165,7 @@
|
||||
"id": "69ef0b9a",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Following is an example of how we can use _enum_, _description_ and _required_ to control for each of the previously mentioned aspects:"
|
||||
"Here is an example of how we can use `_enum_`, `_description_`, and `_required_` to control for each of the previously mentioned aspects:"
|
||||
]
|
||||
},
|
||||
{
|
||||
@@ -192,7 +177,6 @@
|
||||
"source": [
|
||||
"schema = {\n",
|
||||
" \"properties\": {\n",
|
||||
" \"sentiment\": {\"type\": \"string\", \"enum\": [\"happy\", \"neutral\", \"sad\"]},\n",
|
||||
" \"aggressiveness\": {\n",
|
||||
" \"type\": \"integer\",\n",
|
||||
" \"enum\": [1, 2, 3, 4, 5],\n",
|
||||
@@ -234,7 +218,7 @@
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"{'sentiment': 'happy', 'aggressiveness': 0, 'language': 'spanish'}"
|
||||
"{'aggressiveness': 0, 'language': 'spanish'}"
|
||||
]
|
||||
},
|
||||
"execution_count": 10,
|
||||
@@ -256,7 +240,7 @@
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"{'sentiment': 'sad', 'aggressiveness': 10, 'language': 'spanish'}"
|
||||
"{'aggressiveness': 5, 'language': 'spanish'}"
|
||||
]
|
||||
},
|
||||
"execution_count": 11,
|
||||
@@ -278,7 +262,7 @@
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"{'sentiment': 'neutral', 'aggressiveness': 0, 'language': 'english'}"
|
||||
"{'aggressiveness': 0, 'language': 'english'}"
|
||||
]
|
||||
},
|
||||
"execution_count": 12,
|
||||
@@ -291,12 +275,25 @@
|
||||
"chain.run(inp)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "cf6b7389",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"The [LangSmith trace](https://smith.langchain.com/public/311e663a-bbe8-4053-843e-5735055c032d/r) lets us peek under the hood:\n",
|
||||
"\n",
|
||||
"* As with [extraction](/docs/use_cases/extraction), we call the `information_extraction` function [here](https://github.com/langchain-ai/langchain/blob/269f85b7b7ffd74b38cd422d4164fc033388c3d0/libs/langchain/langchain/chains/openai_functions/extraction.py#L20) on the input string.\n",
|
||||
"* This OpenAI funtion extraction information based upon the provided schema.\n",
|
||||
"\n",
|
||||
""
|
||||
]
|
||||
},
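{
"cell_type": "markdown",
"id": "f3a91b20",
"metadata": {},
"source": [
"For intuition, here is a sketch of the function definition the tagging chain builds from our `schema` and passes to OpenAI (the variable name and the exact `description` wording are assumptions, not the library's verbatim internals):"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "a1b2c3d4",
"metadata": {},
"outputs": [],
"source": [
"# Illustrative reconstruction of the function payload built from `schema`\n",
"information_extraction_function = {\n",
"    \"name\": \"information_extraction\",\n",
"    \"description\": \"Extracts the relevant information from the passage.\",\n",
"    \"parameters\": {\n",
"        \"type\": \"object\",\n",
"        \"properties\": schema[\"properties\"],\n",
"        \"required\": schema.get(\"required\", []),\n",
"    },\n",
"}"
]
},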
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "e68ad17e",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Specifying schema with Pydantic"
|
||||
"## Pydantic"
|
||||
]
|
||||
},
|
||||
{
|
||||
@@ -304,11 +301,11 @@
|
||||
"id": "2f5970ec",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"We can also use a Pydantic schema to specify the required properties and types. We can also send other arguments, such as 'enum' or 'description' as can be seen in the example below.\n",
|
||||
"We can also use a Pydantic schema to specify the required properties and types. \n",
|
||||
"\n",
|
||||
"By using the `create_tagging_chain_pydantic` function, we can send a Pydantic schema as input and the output will be an instantiated object that respects our desired schema. \n",
|
||||
"We can also send other arguments, such as `enum` or `description`, to each field.\n",
|
||||
"\n",
|
||||
"In this way, we can specify our schema in the same manner that we would a new class or function in Python - with purely Pythonic types."
|
||||
"This lets us specify our schema in the same manner that we would a new class or function in Python with purely Pythonic types."
|
||||
]
|
||||
},
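{
"cell_type": "markdown",
"id": "b4c5d6e7",
"metadata": {},
"source": [
"The defining cells are elided from this diff; a minimal sketch consistent with the `Tags(...)` output shown below (field names come from that output, while the enum values and description are assumptions) could look like:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "c5d6e7f8",
"metadata": {},
"outputs": [],
"source": [
"from pydantic import BaseModel, Field\n",
"\n",
"\n",
"class Tags(BaseModel):\n",
"    sentiment: str = Field(..., enum=[\"happy\", \"neutral\", \"sad\"])\n",
"    aggressiveness: int = Field(\n",
"        ...,\n",
"        description=\"describes how aggressive the statement is\",\n",
"        enum=[1, 2, 3, 4, 5],\n",
"    )\n",
"    language: str = Field(..., enum=[\"spanish\", \"english\", \"french\", \"german\", \"italian\"])\n",
"\n",
"\n",
"chain = create_tagging_chain_pydantic(Tags, llm)"
]
},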
|
||||
{
|
||||
@@ -371,7 +368,7 @@
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"Tags(sentiment='sad', aggressiveness=10, language='spanish')"
|
||||
"Tags(sentiment='sad', aggressiveness=5, language='spanish')"
|
||||
]
|
||||
},
|
||||
"execution_count": 17,
|
||||
@@ -382,6 +379,17 @@
|
||||
"source": [
|
||||
"res"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "29346d09",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Going deeper\n",
|
||||
"\n",
|
||||
"* You can use the [metadata tagger](https://python.langchain.com/docs/integrations/document_transformers/openai_metadata_tagger) document transformer to extract metadata from a LangChain `Document`. \n",
|
||||
"* This covers the same basic functionality as the tagging chain, only applied to a LangChain `Document`."
|
||||
]
|
||||
}
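,
{
"cell_type": "markdown",
"id": "d6e7f8a9",
"metadata": {},
"source": [
"A minimal sketch of that transformer, reusing the `schema` and `llm` from above (the import path follows the linked docs; the sample document text is hypothetical):"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "e7f8a9b0",
"metadata": {},
"outputs": [],
"source": [
"from langchain.schema import Document\n",
"from langchain.document_transformers.openai_functions import create_metadata_tagger\n",
"\n",
"document_transformer = create_metadata_tagger(metadata_schema=schema, llm=llm)\n",
"enhanced_documents = document_transformer.transform_documents(\n",
"    [Document(page_content=\"Estoy muy enojado con vos! Te voy a dar tu merecido!\")]\n",
")\n",
"enhanced_documents[0].metadata"
]
}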
|
||||
],
|
||||
"metadata": {
|
||||
@@ -400,7 +408,7 @@
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.9.1"
|
||||
"version": "3.9.16"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
|
||||
599
docs/extras/use_cases/web_scraping.ipynb
Normal file
@@ -0,0 +1,599 @@
|
||||
{
|
||||
"cells": [
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "6605e7f7",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# Web scraping\n",
|
||||
"\n",
|
||||
"[](https://colab.research.google.com/github/langchain-ai/langchain/blob/master/docs/extras/use_cases/web_scraping.ipynb)\n",
|
||||
"\n",
|
||||
"## Use case\n",
|
||||
"\n",
|
||||
"[Web research](https://blog.langchain.dev/automating-web-research/) is one of the killer LLM applications:\n",
|
||||
"\n",
|
||||
"* Users have [highlighted it](https://twitter.com/GregKamradt/status/1679913813297225729?s=20) as one of his top desired AI tools. \n",
|
||||
"* OSS repos like [gpt-researcher](https://github.com/assafelovic/gpt-researcher) are growing in popularity. \n",
|
||||
" \n",
|
||||
"\n",
|
||||
" \n",
|
||||
"## Overview\n",
|
||||
"\n",
|
||||
"Gathering content from the web has a few components:\n",
|
||||
"\n",
|
||||
"* `Search`: Query to url (e.g., using `GoogleSearchAPIWrapper`).\n",
|
||||
"* `Loading`: Url to HTML (e.g., using `AsyncHtmlLoader`, `AsyncChromiumLoader`, etc).\n",
|
||||
"* `Transforming`: HTML to formatted text (e.g., using `HTML2Text` or `Beautiful Soup`).\n",
|
||||
"\n",
|
||||
"## Quickstart"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "1803c182",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"pip install -q openai langchain playwright beautifulsoup4\n",
|
||||
"playwright install\n",
|
||||
"\n",
|
||||
"# Set env var OPENAI_API_KEY or load from a .env file:\n",
|
||||
"# import dotenv\n",
|
||||
"# dotenv.load_env()"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "50741083",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Scraping HTML content using a headless instance of Chromium.\n",
|
||||
"\n",
|
||||
"* The async nature of the scraping process is handled using Python's asyncio library.\n",
|
||||
"* The actual interaction with the web pages is handled by Playwright."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 2,
|
||||
"id": "cd457cb1",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langchain.document_loaders import AsyncChromiumLoader\n",
|
||||
"from langchain.document_transformers import BeautifulSoupTransformer\n",
|
||||
"\n",
|
||||
"# Load HTML\n",
|
||||
"loader = AsyncChromiumLoader([\"https://www.wsj.com\"])\n",
|
||||
"html = loader.load()"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "2a879806",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Scrape text content tags such as `<p>, <li>, <div>, and <a>` tags from the HTML content:\n",
|
||||
"\n",
|
||||
"* `<p>`: The paragraph tag. It defines a paragraph in HTML and is used to group together related sentences and/or phrases.\n",
|
||||
" \n",
|
||||
"* `<li>`: The list item tag. It is used within ordered (`<ol>`) and unordered (`<ul>`) lists to define individual items within the list.\n",
|
||||
" \n",
|
||||
"* `<div>`: The division tag. It is a block-level element used to group other inline or block-level elements.\n",
|
||||
" \n",
|
||||
"* `<a>`: The anchor tag. It is used to define hyperlinks.\n",
|
||||
"\n",
|
||||
"* `<span>`: an inline container used to mark up a part of a text, or a part of a document. \n",
|
||||
"\n",
|
||||
"For many news websites (e.g., WSJ, CNN), headlines and summaries are all in `<span>` tags."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 3,
|
||||
"id": "141f206b",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# Transform\n",
|
||||
"bs_transformer = BeautifulSoupTransformer()\n",
|
||||
"docs_transformed = bs_transformer.transform_documents(html,tags_to_extract=[\"span\"])"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 4,
|
||||
"id": "73ddb234",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"'English EditionEnglish中文 (Chinese)日本語 (Japanese) More Other Products from WSJBuy Side from WSJWSJ ShopWSJ Wine Other Products from WSJ Search Quotes and Companies Search Quotes and Companies 0.15% 0.03% 0.12% -0.42% 4.102% -0.69% -0.25% -0.15% -1.82% 0.24% 0.19% -1.10% About Evan His Family Reflects His Reporting How You Can Help Write a Message Life in Detention Latest News Get Email Updates Four Americans Released From Iranian Prison The Americans will remain under house arrest until they are '"
|
||||
]
|
||||
},
|
||||
"execution_count": 4,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"# Result\n",
|
||||
"docs_transformed[0].page_content[0:500]"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "7d26d185",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"These `Documents` now are staged for downstream usage in various LLM apps, as discussed below.\n",
|
||||
"\n",
|
||||
"## Loader\n",
|
||||
"\n",
|
||||
"### AsyncHtmlLoader\n",
|
||||
"\n",
|
||||
"The [AsyncHtmlLoader](docs/integrations/document_loaders/async_html) uses the `aiohttp` library to make asynchronous HTTP requests, suitable for simpler and lightweight scraping.\n",
|
||||
"\n",
|
||||
"### AsyncChromiumLoader\n",
|
||||
"\n",
|
||||
"The [AsyncChromiumLoader](docs/integrations/document_loaders/async_chromium) uses Playwright to launch a Chromium instance, which can handle JavaScript rendering and more complex web interactions.\n",
|
||||
"\n",
|
||||
"Chromium is one of the browsers supported by Playwright, a library used to control browser automation. \n",
|
||||
"\n",
|
||||
"Headless mode means that the browser is running without a graphical user interface, which is commonly used for web scrapin."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "8941e855",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langchain.document_loaders import AsyncHtmlLoader\n",
|
||||
"urls = [\"https://www.espn.com\",\"https://lilianweng.github.io/posts/2023-06-23-agent/\"]\n",
|
||||
"loader = AsyncHtmlLoader(urls)\n",
|
||||
"docs = loader.load()"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "e47f4bf0",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Transformer\n",
|
||||
"\n",
|
||||
"### HTML2Text\n",
|
||||
"\n",
|
||||
"[HTML2Text](docs/integrations/document_transformers/html2text) provides a straightforward conversion of HTML content into plain text (with markdown-like formatting) without any specific tag manipulation. \n",
|
||||
"\n",
|
||||
"It's best suited for scenarios where the goal is to extract human-readable text without needing to manipulate specific HTML elements.\n",
|
||||
"\n",
|
||||
"### Beautiful Soup\n",
|
||||
" \n",
|
||||
"Beautiful Soup offers more fine-grained control over HTML content, enabling specific tag extraction, removal, and content cleaning. \n",
|
||||
"\n",
|
||||
"It's suited for cases where you want to extract specific information and clean up the HTML content according to your needs."
|
||||
]
|
||||
},
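{
"cell_type": "markdown",
"id": "f8a9b0c1",
"metadata": {},
"source": [
"As a sketch of that fine-grained control (reusing `docs` from the loader above; `unwanted_tags` and the tag lists shown are assumptions about the transformer's signature and defaults):"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "a9b0c1d2",
"metadata": {},
"outputs": [],
"source": [
"from langchain.document_transformers import BeautifulSoupTransformer\n",
"\n",
"bs_transformer = BeautifulSoupTransformer()\n",
"docs_transformed = bs_transformer.transform_documents(\n",
"    docs,\n",
"    tags_to_extract=[\"p\", \"li\", \"div\", \"a\"],\n",
"    unwanted_tags=[\"script\", \"style\"],  # stripped before extraction\n",
")"
]
},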
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 5,
|
||||
"id": "99a7e2a8",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stderr",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"Fetching pages: 100%|#############################################################################################################| 2/2 [00:00<00:00, 7.01it/s]\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"from langchain.document_loaders import AsyncHtmlLoader\n",
|
||||
"urls = [\"https://www.espn.com\", \"https://lilianweng.github.io/posts/2023-06-23-agent/\"]\n",
|
||||
"loader = AsyncHtmlLoader(urls)\n",
|
||||
"docs = loader.load()"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 7,
|
||||
"id": "a2cd3e8d",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"\"Skip to main content Skip to navigation\\n\\n<\\n\\n>\\n\\nMenu\\n\\n## ESPN\\n\\n * Search\\n\\n * * scores\\n\\n * NFL\\n * MLB\\n * NBA\\n * NHL\\n * Soccer\\n * NCAAF\\n * …\\n\\n * Women's World Cup\\n * LLWS\\n * NCAAM\\n * NCAAW\\n * Sports Betting\\n * Boxing\\n * CFL\\n * NCAA\\n * Cricket\\n * F1\\n * Golf\\n * Horse\\n * MMA\\n * NASCAR\\n * NBA G League\\n * Olympic Sports\\n * PLL\\n * Racing\\n * RN BB\\n * RN FB\\n * Rugby\\n * Tennis\\n * WNBA\\n * WWE\\n * X Games\\n * XFL\\n\\n * More\""
|
||||
]
|
||||
},
|
||||
"execution_count": 7,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"from langchain.document_transformers import Html2TextTransformer\n",
|
||||
"html2text = Html2TextTransformer()\n",
|
||||
"docs_transformed = html2text.transform_documents(docs)\n",
|
||||
"docs_transformed[0].page_content[0:500]"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "8aef9861",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Scraping with extraction\n",
|
||||
"\n",
|
||||
"### LLM with function calling\n",
|
||||
"\n",
|
||||
"Web scraping is challenging for many reasons. \n",
|
||||
"\n",
|
||||
"One of them is the changing nature of modern websites' layouts and content, which requires modifying scraping scripts to accommodate the changes.\n",
|
||||
"\n",
|
||||
"Using Function (e.g., OpenAI) with an extraction chain, we avoid having to change your code constantly when websites change. \n",
|
||||
"\n",
|
||||
"We're using `gpt-3.5-turbo-0613` to guarantee access to OpenAI Functions feature (although this might be available to everyone by time of writing). \n",
|
||||
"\n",
|
||||
"We're also keeping `temperature` at `0` to keep randomness of the LLM down."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 5,
|
||||
"id": "52d49f6f",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langchain.chat_models import ChatOpenAI\n",
|
||||
"llm = ChatOpenAI(temperature=0, model=\"gpt-3.5-turbo-0613\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "fc5757ce",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Define a schema\n",
|
||||
"\n",
|
||||
"Next, you define a schema to specify what kind of data you want to extract. \n",
|
||||
"\n",
|
||||
"Here, the key names matter as they tell the LLM what kind of information they want. \n",
|
||||
"\n",
|
||||
"So, be as detailed as possible. \n",
|
||||
"\n",
|
||||
"In this example, we want to scrape only news article's name and summary from The Wall Street Journal website."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 6,
|
||||
"id": "95506f8e",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langchain.chains import create_extraction_chain\n",
|
||||
"\n",
|
||||
"schema = {\n",
|
||||
" \"properties\": {\n",
|
||||
" \"news_article_title\": {\"type\": \"string\"},\n",
|
||||
" \"news_article_summary\": {\"type\": \"string\"},\n",
|
||||
" },\n",
|
||||
" \"required\": [\"news_article_title\", \"news_article_summary\"],\n",
|
||||
"}\n",
|
||||
"\n",
|
||||
"def extract(content: str, schema: dict):\n",
|
||||
" return create_extraction_chain(schema=schema, llm=llm).run(content)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "97f7de42",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Run the web scraper w/ BeautifulSoup\n",
|
||||
"\n",
|
||||
"As shown above, we'll using `BeautifulSoupTransformer`."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 7,
|
||||
"id": "977560ba",
|
||||
"metadata": {
|
||||
"scrolled": false
|
||||
},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"Extracting content with LLM\n",
|
||||
"[{'news_article_summary': 'The Americans will remain under house arrest until '\n",
|
||||
" 'they are allowed to return to the U.S. in coming '\n",
|
||||
" 'weeks, following a monthslong diplomatic push by '\n",
|
||||
" 'the Biden administration.',\n",
|
||||
" 'news_article_title': 'Four Americans Released From Iranian Prison'},\n",
|
||||
" {'news_article_summary': 'Price pressures continued cooling last month, with '\n",
|
||||
" 'the CPI rising a mild 0.2% from June, likely '\n",
|
||||
" 'deterring the Federal Reserve from raising interest '\n",
|
||||
" 'rates at its September meeting.',\n",
|
||||
" 'news_article_title': 'Cooler July Inflation Opens Door to Fed Pause on '\n",
|
||||
" 'Rates'},\n",
|
||||
" {'news_article_summary': 'The company has decided to eliminate 27 of its 30 '\n",
|
||||
" 'clothing labels, such as Lark & Ro and Goodthreads, '\n",
|
||||
" 'as it works to fend off antitrust scrutiny and cut '\n",
|
||||
" 'costs.',\n",
|
||||
" 'news_article_title': 'Amazon Cuts Dozens of House Brands'},\n",
|
||||
" {'news_article_summary': 'President Biden’s order comes on top of a slowing '\n",
|
||||
" 'Chinese economy, Covid lockdowns and rising '\n",
|
||||
" 'tensions between the two powers.',\n",
|
||||
" 'news_article_title': 'U.S. Investment Ban on China Poised to Deepen Divide'},\n",
|
||||
" {'news_article_summary': 'The proposed trial date in the '\n",
|
||||
" 'election-interference case comes on the same day as '\n",
|
||||
" 'the former president’s not guilty plea on '\n",
|
||||
" 'additional Mar-a-Lago charges.',\n",
|
||||
" 'news_article_title': 'Trump Should Be Tried in January, Prosecutors Tell '\n",
|
||||
" 'Judge'},\n",
|
||||
" {'news_article_summary': 'The CEO who started in June says the platform has '\n",
|
||||
" '“an entirely different road map” for the future.',\n",
|
||||
" 'news_article_title': 'Yaccarino Says X Is Watching Threads but Has Its Own '\n",
|
||||
" 'Vision'},\n",
|
||||
" {'news_article_summary': 'Students foot the bill for flagship state '\n",
|
||||
" 'universities that pour money into new buildings and '\n",
|
||||
" 'programs with little pushback.',\n",
|
||||
" 'news_article_title': 'Colleges Spend Like There’s No Tomorrow. ‘These '\n",
|
||||
" 'Places Are Just Devouring Money.’'},\n",
|
||||
" {'news_article_summary': 'Wildfires fanned by hurricane winds have torn '\n",
|
||||
" 'through parts of the Hawaiian island, devastating '\n",
|
||||
" 'the popular tourist town of Lahaina.',\n",
|
||||
" 'news_article_title': 'Maui Wildfires Leave at Least 36 Dead'},\n",
|
||||
" {'news_article_summary': 'After its large armored push stalled, Kyiv has '\n",
|
||||
" 'fallen back on the kind of tactics that brought it '\n",
|
||||
" 'success earlier in the war.',\n",
|
||||
" 'news_article_title': 'Ukraine Uses Small-Unit Tactics to Retake Captured '\n",
|
||||
" 'Territory'},\n",
|
||||
" {'news_article_summary': 'President Guillermo Lasso says the Aug. 20 election '\n",
|
||||
" 'will proceed, as the Andean country grapples with '\n",
|
||||
" 'rising drug gang violence.',\n",
|
||||
" 'news_article_title': 'Ecuador Declares State of Emergency After '\n",
|
||||
" 'Presidential Hopeful Killed'},\n",
|
||||
" {'news_article_summary': 'This year’s hurricane season, which typically runs '\n",
|
||||
" 'from June to the end of November, has been '\n",
|
||||
" 'difficult to predict, climate scientists said.',\n",
|
||||
" 'news_article_title': 'Atlantic Hurricane Season Prediction Increased to '\n",
|
||||
" '‘Above Normal,’ NOAA Says'},\n",
|
||||
" {'news_article_summary': 'The NFL is raising the price of its NFL+ streaming '\n",
|
||||
" 'packages as it adds the NFL Network and RedZone.',\n",
|
||||
" 'news_article_title': 'NFL to Raise Price of NFL+ Streaming Packages as It '\n",
|
||||
" 'Adds NFL Network, RedZone'},\n",
|
||||
" {'news_article_summary': 'Russia is planning a moon mission as part of the '\n",
|
||||
" 'new space race.',\n",
|
||||
" 'news_article_title': 'Russia’s Moon Mission and the New Space Race'},\n",
|
||||
" {'news_article_summary': 'Tapestry’s $8.5 billion acquisition of Capri would '\n",
|
||||
" 'create a conglomerate with more than $12 billion in '\n",
|
||||
" 'annual sales, but it would still lack the '\n",
|
||||
" 'high-wattage labels and diversity that have fueled '\n",
|
||||
" 'LVMH’s success.',\n",
|
||||
" 'news_article_title': \"Why the Coach and Kors Marriage Doesn't Scare LVMH\"},\n",
|
||||
" {'news_article_summary': 'The Supreme Court has blocked Purdue Pharma’s $6 '\n",
|
||||
" 'billion Sackler opioid settlement.',\n",
|
||||
" 'news_article_title': 'Supreme Court Blocks Purdue Pharma’s $6 Billion '\n",
|
||||
" 'Sackler Opioid Settlement'},\n",
|
||||
" {'news_article_summary': 'The Social Security COLA is expected to rise in '\n",
|
||||
" '2024, but not by a lot.',\n",
|
||||
" 'news_article_title': 'Social Security COLA Expected to Rise in 2024, but '\n",
|
||||
" 'Not by a Lot'}]\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"import pprint\n",
|
||||
"from langchain.text_splitter import RecursiveCharacterTextSplitter\n",
|
||||
"\n",
|
||||
"def scrape_with_playwright(urls, schema):\n",
|
||||
" \n",
|
||||
" loader = AsyncChromiumLoader(urls)\n",
|
||||
" docs = loader.load()\n",
|
||||
" bs_transformer = BeautifulSoupTransformer()\n",
|
||||
" docs_transformed = bs_transformer.transform_documents(docs,tags_to_extract=[\"span\"])\n",
|
||||
" print(\"Extracting content with LLM\")\n",
|
||||
" \n",
|
||||
" # Grab the first 1000 tokens of the site\n",
|
||||
" splitter = RecursiveCharacterTextSplitter.from_tiktoken_encoder(chunk_size=1000, \n",
|
||||
" chunk_overlap=0)\n",
|
||||
" splits = splitter.split_documents(docs_transformed)\n",
|
||||
" \n",
|
||||
" # Process the first split \n",
|
||||
" extracted_content = extract(\n",
|
||||
" schema=schema, content=splits[0].page_content\n",
|
||||
" )\n",
|
||||
" pprint.pprint(extracted_content)\n",
|
||||
" return extracted_content\n",
|
||||
"\n",
|
||||
"urls = [\"https://www.wsj.com\"]\n",
|
||||
"extracted_content = scrape_with_playwright(urls, schema=schema)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "b08a8cef",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"We can compare the headlines scraped to the page:\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"Looking at the [LangSmith trace](https://smith.langchain.com/public/c3070198-5b13-419b-87bf-3821cdf34fa6/r), we can see what is going on under the hood:\n",
|
||||
"\n",
|
||||
"* It's following what is explained in the [extraction](docs/use_cases/extraction).\n",
|
||||
"* We call the `information_extraction` function on the input text.\n",
|
||||
"* It will attempt to populate the provided schema from the url content."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "a5a6f11e",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Research automation\n",
|
||||
"\n",
|
||||
"Related to scraping, we may want to answer specific questions using searched content.\n",
|
||||
"\n",
|
||||
"We can automate the process of [web research](https://blog.langchain.dev/automating-web-research/) using a retriver, such as the `WebResearchRetriever` ([docs](https://python.langchain.com/docs/modules/data_connection/retrievers/web_research)).\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"Copy requirments [from here](https://github.com/langchain-ai/web-explorer/blob/main/requirements.txt):\n",
|
||||
"\n",
|
||||
"`pip install -r requirements.txt`\n",
|
||||
" \n",
|
||||
"Set `GOOGLE_CSE_ID` and `GOOGLE_API_KEY`."
|
||||
]
|
||||
},
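{
"cell_type": "markdown",
"id": "b0c1d2e3",
"metadata": {},
"source": [
"For example (a sketch; `GoogleSearchAPIWrapper` reads these environment variables, and the placeholder values are hypothetical):"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "c1d2e3f4",
"metadata": {},
"outputs": [],
"source": [
"import os\n",
"\n",
"os.environ[\"GOOGLE_CSE_ID\"] = \"...\"\n",
"os.environ[\"GOOGLE_API_KEY\"] = \"...\""
]
},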
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 8,
|
||||
"id": "414f0d41",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langchain.vectorstores import Chroma\n",
|
||||
"from langchain.embeddings import OpenAIEmbeddings\n",
|
||||
"from langchain.chat_models.openai import ChatOpenAI\n",
|
||||
"from langchain.utilities import GoogleSearchAPIWrapper\n",
|
||||
"from langchain.retrievers.web_research import WebResearchRetriever"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 41,
|
||||
"id": "5d1ce098",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# Vectorstore\n",
|
||||
"vectorstore = Chroma(embedding_function=OpenAIEmbeddings(),persist_directory=\"./chroma_db_oai\")\n",
|
||||
"\n",
|
||||
"# LLM\n",
|
||||
"llm = ChatOpenAI(temperature=0)\n",
|
||||
"\n",
|
||||
"# Search \n",
|
||||
"search = GoogleSearchAPIWrapper()"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "6d808b9d",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Initialize retriever with the above tools to:\n",
|
||||
" \n",
|
||||
"* Use an LLM to generate multiple relevant search queries (one LLM call)\n",
|
||||
"* Execute a search for each query\n",
|
||||
"* Choose the top K links per query (multiple search calls in parallel)\n",
|
||||
"* Load the information from all chosen links (scrape pages in parallel)\n",
|
||||
"* Index those documents into a vectorstore\n",
|
||||
"* Find the most relevant documents for each original generated search query"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 42,
|
||||
"id": "e3e3a589",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# Initialize\n",
|
||||
"web_research_retriever = WebResearchRetriever.from_llm(\n",
|
||||
" vectorstore=vectorstore,\n",
|
||||
" llm=llm, \n",
|
||||
" search=search)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 44,
|
||||
"id": "20655b74",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stderr",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"INFO:langchain.retrievers.web_research:Generating questions for Google Search ...\n",
|
||||
"INFO:langchain.retrievers.web_research:Questions for Google Search (raw): {'question': 'How do LLM Powered Autonomous Agents work?', 'text': LineList(lines=['1. What is the functioning principle of LLM Powered Autonomous Agents?\\n', '2. How do LLM Powered Autonomous Agents operate?\\n'])}\n",
|
||||
"INFO:langchain.retrievers.web_research:Questions for Google Search: ['1. What is the functioning principle of LLM Powered Autonomous Agents?\\n', '2. How do LLM Powered Autonomous Agents operate?\\n']\n",
|
||||
"INFO:langchain.retrievers.web_research:Searching for relevat urls ...\n",
|
||||
"INFO:langchain.retrievers.web_research:Searching for relevat urls ...\n",
|
||||
"INFO:langchain.retrievers.web_research:Search results: [{'title': 'LLM Powered Autonomous Agents | Hacker News', 'link': 'https://news.ycombinator.com/item?id=36488871', 'snippet': 'Jun 26, 2023 ... Exactly. A temperature of 0 means you always pick the highest probability token (i.e. the \"max\" function), while a temperature of 1 means you\\xa0...'}]\n",
|
||||
"INFO:langchain.retrievers.web_research:Searching for relevat urls ...\n",
|
||||
"INFO:langchain.retrievers.web_research:Search results: [{'title': \"LLM Powered Autonomous Agents | Lil'Log\", 'link': 'https://lilianweng.github.io/posts/2023-06-23-agent/', 'snippet': 'Jun 23, 2023 ... Task decomposition can be done (1) by LLM with simple prompting like \"Steps for XYZ.\\\\n1.\" , \"What are the subgoals for achieving XYZ?\" , (2) by\\xa0...'}]\n",
|
||||
"INFO:langchain.retrievers.web_research:New URLs to load: []\n",
|
||||
"INFO:langchain.retrievers.web_research:Grabbing most relevant splits from urls...\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"{'question': 'How do LLM Powered Autonomous Agents work?',\n",
|
||||
" 'answer': \"LLM-powered autonomous agents work by using LLM as the agent's brain, complemented by several key components such as planning, memory, and tool use. In terms of planning, the agent breaks down large tasks into smaller subgoals and can reflect and refine its actions based on past experiences. Memory is divided into short-term memory, which is used for in-context learning, and long-term memory, which allows the agent to retain and recall information over extended periods. Tool use involves the agent calling external APIs for additional information. These agents have been used in various applications, including scientific discovery and generative agents simulation.\",\n",
|
||||
" 'sources': ''}"
|
||||
]
|
||||
},
|
||||
"execution_count": 44,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"# Run\n",
|
||||
"import logging\n",
|
||||
"logging.basicConfig()\n",
|
||||
"logging.getLogger(\"langchain.retrievers.web_research\").setLevel(logging.INFO)\n",
|
||||
"from langchain.chains import RetrievalQAWithSourcesChain\n",
|
||||
"user_input = \"How do LLM Powered Autonomous Agents work?\"\n",
|
||||
"qa_chain = RetrievalQAWithSourcesChain.from_chain_type(llm,retriever=web_research_retriever)\n",
|
||||
"result = qa_chain({\"question\": user_input})\n",
|
||||
"result"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "ff62e5f5",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Going deeper \n",
|
||||
"\n",
|
||||
"* Here's a [app](https://github.com/langchain-ai/web-explorer/tree/main) that wraps this retriver with a lighweight UI."
|
||||
]
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
"kernelspec": {
|
||||
"display_name": "Python 3 (ipykernel)",
|
||||
"language": "python",
|
||||
"name": "python3"
|
||||
},
|
||||
"language_info": {
|
||||
"codemirror_mode": {
|
||||
"name": "ipython",
|
||||
"version": 3
|
||||
},
|
||||
"file_extension": ".py",
|
||||
"mimetype": "text/x-python",
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.9.16"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
"nbformat_minor": 5
|
||||
}
|
||||
@@ -43,7 +43,7 @@ llm("Tell me a joke")
|
||||
</CodeOutputBlock>
|
||||
|
||||
### `generate`: batch calls, richer outputs
|
||||
`generate` lets you can call the model with a list of strings, getting back a more complete response than just the text. This complete response can includes things like multiple top responses and other LLM provider-specific information:
|
||||
`generate` lets you call the model with a list of strings, getting back a more complete response than just the text. This complete response can include things like multiple top responses and other LLM provider-specific information:
|
||||
|
||||
```python
|
||||
llm_result = llm.generate(["Tell me a joke", "Tell me a poem"]*15)
|
||||
|
||||
@@ -106,9 +106,11 @@ llm(template.format_messages(text='i dont like eating tasty things.'))
|
||||
```
|
||||
|
||||
<CodeOutputBlock lang="python">
|
||||
|
||||
```
|
||||
AIMessage(content='I absolutely adore indulging in delicious treats!', additional_kwargs={}, example=False)
|
||||
```
|
||||
|
||||
</CodeOutputBlock>
|
||||
|
||||
This provides you with a lot of flexibility in how you construct your chat prompts.
|
||||
|
||||
@@ -145,7 +145,7 @@ class BabyAGI(Chain, BaseModel):
|
||||
self.print_task_result(result)
|
||||
|
||||
# Step 3: Store the result in Pinecone
|
||||
result_id = f"result_{task['task_id']}"
|
||||
result_id = f"result_{task['task_id']}_{num_iters}"
|
||||
self.vectorstore.add_texts(
|
||||
texts=[result],
|
||||
metadatas=[{"task": task["task_name"]}],
|
||||
|
||||
@@ -0,0 +1,5 @@
|
||||
"""Generalized implementation of SmartGPT (origin: https://youtu.be/wVzuvf9D9BU)"""
|
||||
|
||||
from langchain_experimental.smart_llm.base import SmartLLMChain
|
||||
|
||||
__all__ = ["SmartLLMChain"]
|
||||
323
libs/experimental/langchain_experimental/smart_llm/base.py
Normal file
@@ -0,0 +1,323 @@
|
||||
"""Chain for applying self-critique using the SmartGPT workflow."""
|
||||
from typing import Any, Dict, List, Optional, Tuple, Type
|
||||
|
||||
from langchain.base_language import BaseLanguageModel
|
||||
from langchain.callbacks.manager import CallbackManagerForChainRun
|
||||
from langchain.chains.base import Chain
|
||||
from langchain.input import get_colored_text
|
||||
from langchain.prompts.base import BasePromptTemplate
|
||||
from langchain.prompts.chat import (
|
||||
AIMessagePromptTemplate,
|
||||
BaseMessagePromptTemplate,
|
||||
ChatPromptTemplate,
|
||||
HumanMessagePromptTemplate,
|
||||
)
|
||||
from langchain.schema import LLMResult, PromptValue
|
||||
from pydantic import Extra, root_validator
|
||||
|
||||
|
||||
class SmartLLMChain(Chain):
|
||||
"""
|
||||
Generalized implementation of SmartGPT (origin: https://youtu.be/wVzuvf9D9BU)
|
||||
|
||||
A SmartLLMChain is an LLMChain that, instead of simply passing the prompt to the LLM,
|
||||
performs these 3 steps:
|
||||
1. Ideate: Pass the user prompt to an ideation LLM n_ideas times,
|
||||
each result is an "idea"
|
||||
2. Critique: Pass the ideas to a critique LLM which looks for flaws in the ideas
|
||||
& picks the best one
|
||||
3. Resolve: Pass the critique to a resolver LLM which improves upon the best idea
|
||||
& outputs only the (improved version of) the best output
|
||||
|
||||
In total, a SmartLLMChain pass will use n_ideas+2 LLM calls (e.g., 5 calls for n_ideas=3)
|
||||
|
||||
Note that SmartLLMChain will only improve results (compared to a basic LLMChain)
|
||||
when the underlying models have the capability for reflection, which smaller models
|
||||
often don't.
|
||||
|
||||
Finally, a SmartLLMChain assumes that each underlying LLM outputs exactly 1 result.
|
||||
"""
|
||||
|
||||
class SmartLLMChainHistory:
|
||||
question: str = ""
|
||||
ideas: List[str] = []
|
||||
critique: str = ""
|
||||
|
||||
@property
|
||||
def n_ideas(self) -> int:
|
||||
return len(self.ideas)
|
||||
|
||||
def ideation_prompt_inputs(self) -> Dict[str, Any]:
|
||||
return {"question": self.question}
|
||||
|
||||
def critique_prompt_inputs(self) -> Dict[str, Any]:
|
||||
return {
|
||||
"question": self.question,
|
||||
**{f"idea_{i+1}": idea for i, idea in enumerate(self.ideas)},
|
||||
}
|
||||
|
||||
def resolve_prompt_inputs(self) -> Dict[str, Any]:
|
||||
return {
|
||||
"question": self.question,
|
||||
**{f"idea_{i+1}": idea for i, idea in enumerate(self.ideas)},
|
||||
"critique": self.critique,
|
||||
}
|
||||
|
||||
prompt: BasePromptTemplate
|
||||
"""Prompt object to use."""
|
||||
ideation_llm: Optional[BaseLanguageModel] = None
|
||||
"""LLM to use in ideation step. If None given, 'llm' will be used."""
|
||||
critique_llm: Optional[BaseLanguageModel] = None
|
||||
"""LLM to use in critique step. If None given, 'llm' will be used."""
|
||||
resolver_llm: Optional[BaseLanguageModel] = None
|
||||
"""LLM to use in resolve step. If None given, 'llm' will be used."""
|
||||
llm: Optional[BaseLanguageModel] = None
|
||||
"""LLM to use for each steps, if no specific llm for that step is given. """
|
||||
n_ideas: int = 3
|
||||
"""Number of ideas to generate in idea step"""
|
||||
return_intermediate_steps: bool = False
|
||||
"""Whether to return ideas and critique, in addition to resolution."""
|
||||
history: SmartLLMChainHistory = SmartLLMChainHistory()
|
||||
|
||||
class Config:
|
||||
extra = Extra.forbid
|
||||
|
||||
@root_validator
|
||||
@classmethod
|
||||
def validate_inputs(cls, values: Dict[str, Any]) -> Dict[str, Any]:
|
||||
"""Ensure we have an LLM for each step."""
|
||||
llm = values.get("llm")
|
||||
ideation_llm = values.get("ideation_llm")
|
||||
critique_llm = values.get("critique_llm")
|
||||
resolver_llm = values.get("resolver_llm")
|
||||
|
||||
if not llm and not ideation_llm:
|
||||
raise ValueError(
|
||||
"Either ideation_llm or llm needs to be given. Pass llm, "
|
||||
"if you want to use the same llm for all steps, or pass "
|
||||
"ideation_llm, critique_llm and resolver_llm if you want "
|
||||
"to use different llms for each step."
|
||||
)
|
||||
if not llm and not critique_llm:
|
||||
raise ValueError(
|
||||
"Either critique_llm or llm needs to be given. Pass llm, "
|
||||
"if you want to use the same llm for all steps, or pass "
|
||||
"ideation_llm, critique_llm and resolver_llm if you want "
|
||||
"to use different llms for each step."
|
||||
)
|
||||
if not llm and not resolver_llm:
|
||||
raise ValueError(
|
||||
"Either resolve_llm or llm needs to be given. Pass llm, "
|
||||
"if you want to use the same llm for all steps, or pass "
|
||||
"ideation_llm, critique_llm and resolver_llm if you want "
|
||||
"to use different llms for each step."
|
||||
)
|
||||
if llm and ideation_llm and critique_llm and resolver_llm:
|
||||
raise ValueError(
|
||||
"LLMs are given for each step (ideation_llm, critique_llm,"
|
||||
" resolver_llm), but backup LLM (llm) is also given, which"
|
||||
" would not be used."
|
||||
)
|
||||
return values
|
||||
|
||||
@property
|
||||
def input_keys(self) -> List[str]:
|
||||
"""Defines the input keys."""
|
||||
return self.prompt.input_variables
|
||||
|
||||
@property
|
||||
def output_keys(self) -> List[str]:
|
||||
"""Defines the output keys."""
|
||||
if self.return_intermediate_steps:
|
||||
return ["ideas", "critique", "resolution"]
|
||||
return ["resolution"]
|
||||
|
||||
def prep_prompts(
|
||||
self,
|
||||
inputs: Dict[str, Any],
|
||||
run_manager: Optional[CallbackManagerForChainRun] = None,
|
||||
) -> Tuple[PromptValue, Optional[List[str]]]:
|
||||
"""Prepare prompts from inputs."""
|
||||
stop = None
|
||||
if "stop" in inputs:
|
||||
stop = inputs["stop"]
|
||||
selected_inputs = {k: inputs[k] for k in self.prompt.input_variables}
|
||||
prompt = self.prompt.format_prompt(**selected_inputs)
|
||||
_colored_text = get_colored_text(prompt.to_string(), "green")
|
||||
_text = "Prompt after formatting:\n" + _colored_text
|
||||
if run_manager:
|
||||
run_manager.on_text(_text, end="\n", verbose=self.verbose)
|
||||
if "stop" in inputs and inputs["stop"] != stop:
|
||||
raise ValueError(
|
||||
"If `stop` is present in any inputs, should be present in all."
|
||||
)
|
||||
return prompt, stop
|
||||
|
||||
def _call(
|
||||
self,
|
||||
input_list: Dict[str, Any],
|
||||
run_manager: Optional[CallbackManagerForChainRun] = None,
|
||||
) -> Dict[str, Any]:
|
||||
prompt, stop = self.prep_prompts(input_list, run_manager=run_manager)
|
||||
self.history.question = prompt.to_string()
|
||||
ideas = self._ideate(stop, run_manager)
|
||||
self.history.ideas = ideas
|
||||
critique = self._critique(stop, run_manager)
|
||||
self.history.critique = critique
|
||||
resolution = self._resolve(stop, run_manager)
|
||||
if self.return_intermediate_steps:
|
||||
return {"ideas": ideas, "critique": critique, "resolution": resolution}
|
||||
return {"resolution": resolution}
|
||||
|
||||
def _get_text_from_llm_result(self, result: LLMResult, step: str) -> str:
|
||||
"""Between steps, only the LLM result text is passed, not the LLMResult object.
|
||||
This function extracts the text from an LLMResult."""
|
||||
if len(result.generations) != 1:
|
||||
raise ValueError(
|
||||
f"In SmartLLM the LLM result in step {step} is not "
|
||||
"exactly 1 element. This should never happen"
|
||||
)
|
||||
if len(result.generations[0]) != 1:
|
||||
raise ValueError(
|
||||
f"In SmartLLM the LLM in step {step} returned more than "
|
||||
"1 output. SmartLLM only works with LLMs returning "
|
||||
"exactly 1 output."
|
||||
)
|
||||
return result.generations[0][0].text
|
||||
|
||||
def get_prompt_strings(
|
||||
self, stage: str
|
||||
) -> List[Tuple[Type[BaseMessagePromptTemplate], str]]:
|
||||
role_strings: List[Tuple[Type[BaseMessagePromptTemplate], str]] = []
|
||||
role_strings.append(
|
||||
(
|
||||
HumanMessagePromptTemplate,
|
||||
"Question: {question}\nAnswer: Let's work this out in a step by "
|
||||
"step way to be sure we have the right answer:",
|
||||
)
|
||||
)
|
||||
if stage == "ideation":
|
||||
return role_strings
|
||||
role_strings.extend(
|
||||
[
|
||||
*[
|
||||
(
|
||||
AIMessagePromptTemplate,
|
||||
"Idea " + str(i + 1) + ": {idea_" + str(i + 1) + "}",
|
||||
)
|
||||
for i in range(self.n_ideas)
|
||||
],
|
||||
(
|
||||
HumanMessagePromptTemplate,
|
||||
"You are a researcher tasked with investigating the "
|
||||
f"{self.n_ideas} response options provided. List the flaws and "
|
||||
"faulty logic of each answer options. Let'w work this out in a step"
|
||||
" by step way to be sure we have all the errors:",
|
||||
),
|
||||
]
|
||||
)
|
||||
if stage == "critique":
|
||||
return role_strings
|
||||
role_strings.extend(
|
||||
[
|
||||
(AIMessagePromptTemplate, "Critique: {critique}"),
|
||||
(
|
||||
HumanMessagePromptTemplate,
|
||||
"You are a resolved tasked with 1) finding which of "
|
||||
f"the {self.n_ideas} anwer options the researcher thought was "
|
||||
"best,2) improving that answer and 3) printing the answer in full. "
|
||||
"Don't output anything for step 1 or 2, only the full answer in 3. "
|
||||
"Let's work this out in a step by step way to be sure we have "
|
||||
"the right answer:",
|
||||
),
|
||||
]
|
||||
)
|
||||
if stage == "resolve":
|
||||
return role_strings
|
||||
raise ValueError(
|
||||
"stage should be either 'ideation', 'critique' or 'resolve',"
|
||||
f" but it is '{stage}'. This should never happen."
|
||||
)
|
||||

    def ideation_prompt(self) -> ChatPromptTemplate:
        return ChatPromptTemplate.from_strings(self.get_prompt_strings("ideation"))

    def critique_prompt(self) -> ChatPromptTemplate:
        return ChatPromptTemplate.from_strings(self.get_prompt_strings("critique"))

    def resolve_prompt(self) -> ChatPromptTemplate:
        return ChatPromptTemplate.from_strings(self.get_prompt_strings("resolve"))

    def _ideate(
        self,
        stop: Optional[List[str]] = None,
        run_manager: Optional[CallbackManagerForChainRun] = None,
    ) -> List[str]:
        """Generate n_ideas ideas as response to user prompt."""
        llm = self.ideation_llm if self.ideation_llm else self.llm
        prompt = self.ideation_prompt().format_prompt(
            **self.history.ideation_prompt_inputs()
        )
        callbacks = run_manager.get_child() if run_manager else None
        if llm:
            ideas = [
                self._get_text_from_llm_result(
                    llm.generate_prompt([prompt], stop, callbacks),
                    step="ideate",
                )
                for _ in range(self.n_ideas)
            ]
            for i, idea in enumerate(ideas):
                _colored_text = get_colored_text(idea, "blue")
                _text = f"Idea {i+1}:\n" + _colored_text
                if run_manager:
                    run_manager.on_text(_text, end="\n", verbose=self.verbose)
            return ideas
        else:
            raise ValueError("llm is none, which should never happen")

    def _critique(
        self,
        stop: Optional[List[str]] = None,
        run_manager: Optional[CallbackManagerForChainRun] = None,
    ) -> str:
        """Critique each of the ideas from ideation stage & select best one."""
        llm = self.critique_llm if self.critique_llm else self.llm
        prompt = self.critique_prompt().format_prompt(
            **self.history.critique_prompt_inputs()
        )
        callbacks = run_manager.handlers if run_manager else None
        if llm:
            critique = self._get_text_from_llm_result(
                llm.generate_prompt([prompt], stop, callbacks), step="critique"
            )
            _colored_text = get_colored_text(critique, "yellow")
            _text = "Critique:\n" + _colored_text
            if run_manager:
                run_manager.on_text(_text, end="\n", verbose=self.verbose)
            return critique
        else:
            raise ValueError("llm is none, which should never happen")

    def _resolve(
        self,
        stop: Optional[List[str]] = None,
        run_manager: Optional[CallbackManagerForChainRun] = None,
    ) -> str:
        """Improve upon the best idea as chosen in critique step & return it."""
        llm = self.resolver_llm if self.resolver_llm else self.llm
        prompt = self.resolve_prompt().format_prompt(
            **self.history.resolve_prompt_inputs()
        )
        callbacks = run_manager.handlers if run_manager else None
        if llm:
            resolution = self._get_text_from_llm_result(
                llm.generate_prompt([prompt], stop, callbacks), step="resolve"
            )
            _colored_text = get_colored_text(resolution, "green")
            _text = "Resolution:\n" + _colored_text
            if run_manager:
                run_manager.on_text(_text, end="\n", verbose=self.verbose)
            return resolution
        else:
            raise ValueError("llm is none, which should never happen")
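A minimal usage sketch of the chain defined above, assuming the `langchain_experimental.smart_llm` import path used by the tests below; the model and prompt are illustrative:

```python
from langchain.chat_models import ChatOpenAI
from langchain.prompts import PromptTemplate
from langchain_experimental.smart_llm import SmartLLMChain

# One LLM drives all three stages unless ideation_llm / critique_llm /
# resolver_llm are passed separately.
prompt = PromptTemplate.from_template("Explain this joke to me: {joke}?")
chain = SmartLLMChain(llm=ChatOpenAI(), prompt=prompt, n_ideas=3)
result = chain({"joke": "Why did the chicken cross the Mobius strip?"})
print(result["resolution"])
```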
120
libs/experimental/tests/unit_tests/test_smartllm.py
Normal file
@@ -0,0 +1,120 @@

"""Test SmartLLM."""
from langchain.chat_models import FakeListChatModel
from langchain.llms import FakeListLLM
from langchain.prompts.prompt import PromptTemplate

from langchain_experimental.smart_llm import SmartLLMChain


def test_ideation() -> None:
    # test that correct responses are returned
    responses = ["Idea 1", "Idea 2", "Idea 3"]
    llm = FakeListLLM(responses=responses)
    prompt = PromptTemplate(
        input_variables=["product"],
        template="What is a good name for a company that makes {product}?",
    )
    chain = SmartLLMChain(llm=llm, prompt=prompt)
    prompt_value, _ = chain.prep_prompts({"product": "socks"})
    chain.history.question = prompt_value.to_string()
    results = chain._ideate()
    assert results == responses

    # test that correct number of responses are returned
    for i in range(1, 5):
        responses = [f"Idea {j+1}" for j in range(i)]
        llm = FakeListLLM(responses=responses)
        chain = SmartLLMChain(llm=llm, prompt=prompt, n_ideas=i)
        prompt_value, _ = chain.prep_prompts({"product": "socks"})
        chain.history.question = prompt_value.to_string()
        results = chain._ideate()
        assert len(results) == i


def test_critique() -> None:
    response = "Test Critique"
    llm = FakeListLLM(responses=[response])
    prompt = PromptTemplate(
        input_variables=["product"],
        template="What is a good name for a company that makes {product}?",
    )
    chain = SmartLLMChain(llm=llm, prompt=prompt, n_ideas=2)
    prompt_value, _ = chain.prep_prompts({"product": "socks"})
    chain.history.question = prompt_value.to_string()
    chain.history.ideas = ["Test Idea 1", "Test Idea 2"]
    result = chain._critique()
    assert result == response


def test_resolver() -> None:
    response = "Test resolution"
    llm = FakeListLLM(responses=[response])
    prompt = PromptTemplate(
        input_variables=["product"],
        template="What is a good name for a company that makes {product}?",
    )
    chain = SmartLLMChain(llm=llm, prompt=prompt, n_ideas=2)
    prompt_value, _ = chain.prep_prompts({"product": "socks"})
    chain.history.question = prompt_value.to_string()
    chain.history.ideas = ["Test Idea 1", "Test Idea 2"]
    chain.history.critique = "Test Critique"
    result = chain._resolve()
    assert result == response


def test_all_steps() -> None:
    joke = "Why did the chicken cross the Mobius strip?"
    response = "Resolution response"
    ideation_llm = FakeListLLM(responses=["Ideation response" for _ in range(20)])
    critique_llm = FakeListLLM(responses=["Critique response" for _ in range(20)])
    resolver_llm = FakeListLLM(responses=[response for _ in range(20)])
    prompt = PromptTemplate(
        input_variables=["joke"],
        template="Explain this joke to me: {joke}?",
    )
    chain = SmartLLMChain(
        ideation_llm=ideation_llm,
        critique_llm=critique_llm,
        resolver_llm=resolver_llm,
        prompt=prompt,
    )
    result = chain(joke)
    assert result["joke"] == joke
    assert result["resolution"] == response


def test_intermediate_output() -> None:
    joke = "Why did the chicken cross the Mobius strip?"
    llm = FakeListLLM(responses=[f"Response {i+1}" for i in range(5)])
    prompt = PromptTemplate(
        input_variables=["joke"],
        template="Explain this joke to me: {joke}?",
    )
    chain = SmartLLMChain(llm=llm, prompt=prompt, return_intermediate_steps=True)
    result = chain(joke)
    assert result["joke"] == joke
    assert result["ideas"] == [f"Response {i+1}" for i in range(3)]
    assert result["critique"] == "Response 4"
    assert result["resolution"] == "Response 5"


def test_all_steps_with_chat_model() -> None:
    joke = "Why did the chicken cross the Mobius strip?"
    response = "Resolution response"

    ideation_llm = FakeListChatModel(responses=["Ideation response" for _ in range(20)])
    critique_llm = FakeListChatModel(responses=["Critique response" for _ in range(20)])
    resolver_llm = FakeListChatModel(responses=[response for _ in range(20)])
    prompt = PromptTemplate(
        input_variables=["joke"],
        template="Explain this joke to me: {joke}?",
    )
    chain = SmartLLMChain(
        ideation_llm=ideation_llm,
        critique_llm=critique_llm,
        resolver_llm=resolver_llm,
        prompt=prompt,
    )
    result = chain(joke)
    assert result["joke"] == joke
    assert result["resolution"] == response
0
libs/langchain/langchain/adapters/__init__.py
Normal file
208
libs/langchain/langchain/adapters/openai.py
Normal file
@@ -0,0 +1,208 @@

from __future__ import annotations

import importlib
from typing import (
    Any,
    AsyncIterator,
    Dict,
    Iterable,
    List,
    Mapping,
    Sequence,
    Union,
    overload,
)

from typing_extensions import Literal

from langchain.schema.messages import (
    AIMessage,
    AIMessageChunk,
    BaseMessage,
    BaseMessageChunk,
    ChatMessage,
    FunctionMessage,
    HumanMessage,
    SystemMessage,
)


async def aenumerate(
    iterable: AsyncIterator[Any], start: int = 0
) -> AsyncIterator[tuple[int, Any]]:
    """Async version of enumerate."""
    i = start
    async for x in iterable:
        yield i, x
        i += 1


def convert_dict_to_message(_dict: Mapping[str, Any]) -> BaseMessage:
    role = _dict["role"]
    if role == "user":
        return HumanMessage(content=_dict["content"])
    elif role == "assistant":
        # Fix for azure
        # Also OpenAI returns None for tool invocations
        content = _dict.get("content", "") or ""
        if _dict.get("function_call"):
            additional_kwargs = {"function_call": dict(_dict["function_call"])}
        else:
            additional_kwargs = {}
        return AIMessage(content=content, additional_kwargs=additional_kwargs)
    elif role == "system":
        return SystemMessage(content=_dict["content"])
    elif role == "function":
        return FunctionMessage(content=_dict["content"], name=_dict["name"])
    else:
        return ChatMessage(content=_dict["content"], role=role)


def convert_message_to_dict(message: BaseMessage) -> dict:
    message_dict: Dict[str, Any]
    if isinstance(message, ChatMessage):
        message_dict = {"role": message.role, "content": message.content}
    elif isinstance(message, HumanMessage):
        message_dict = {"role": "user", "content": message.content}
    elif isinstance(message, AIMessage):
        message_dict = {"role": "assistant", "content": message.content}
        if "function_call" in message.additional_kwargs:
            message_dict["function_call"] = message.additional_kwargs["function_call"]
            # If function call only, content is None not empty string
            if message_dict["content"] == "":
                message_dict["content"] = None
    elif isinstance(message, SystemMessage):
        message_dict = {"role": "system", "content": message.content}
    elif isinstance(message, FunctionMessage):
        message_dict = {
            "role": "function",
            "content": message.content,
            "name": message.name,
        }
    else:
        raise TypeError(f"Got unknown type {message}")
    if "name" in message.additional_kwargs:
        message_dict["name"] = message.additional_kwargs["name"]
    return message_dict


def convert_openai_messages(messages: Sequence[Dict[str, Any]]) -> List[BaseMessage]:
    """Convert dictionaries representing OpenAI messages to LangChain format.

    Args:
        messages: List of dictionaries representing OpenAI messages

    Returns:
        List of LangChain BaseMessage objects.
    """
    return [convert_dict_to_message(m) for m in messages]


def _convert_message_chunk_to_delta(chunk: BaseMessageChunk, i: int) -> Dict[str, Any]:
    _dict: Dict[str, Any] = {}
    if isinstance(chunk, AIMessageChunk):
        if i == 0:
            # Only shows up in the first chunk
            _dict["role"] = "assistant"
        if "function_call" in chunk.additional_kwargs:
            _dict["function_call"] = chunk.additional_kwargs["function_call"]
            # If the first chunk is a function call, the content is None
            # rather than an empty string or a missing key.
            if i == 0:
                _dict["content"] = None
        else:
            _dict["content"] = chunk.content
    else:
        raise ValueError(f"Got unexpected streaming chunk type: {type(chunk)}")
    # This only happens at the end of streams, and OpenAI returns it as an empty dict
    if _dict == {"content": ""}:
        _dict = {}
    return {"choices": [{"delta": _dict}]}


class ChatCompletion:
    @overload
    @staticmethod
    def create(
        messages: Sequence[Dict[str, Any]],
        *,
        provider: str = "ChatOpenAI",
        stream: Literal[False] = False,
        **kwargs: Any,
    ) -> dict:
        ...

    @overload
    @staticmethod
    def create(
        messages: Sequence[Dict[str, Any]],
        *,
        provider: str = "ChatOpenAI",
        stream: Literal[True],
        **kwargs: Any,
    ) -> Iterable:
        ...

    @staticmethod
    def create(
        messages: Sequence[Dict[str, Any]],
        *,
        provider: str = "ChatOpenAI",
        stream: bool = False,
        **kwargs: Any,
    ) -> Union[dict, Iterable]:
        models = importlib.import_module("langchain.chat_models")
        model_cls = getattr(models, provider)
        model_config = model_cls(**kwargs)
        converted_messages = convert_openai_messages(messages)
        if not stream:
            result = model_config.invoke(converted_messages)
            return {"choices": [{"message": convert_message_to_dict(result)}]}
        else:
            return (
                _convert_message_chunk_to_delta(c, i)
                for i, c in enumerate(model_config.stream(converted_messages))
            )

    @overload
    @staticmethod
    async def acreate(
        messages: Sequence[Dict[str, Any]],
        *,
        provider: str = "ChatOpenAI",
        stream: Literal[False] = False,
        **kwargs: Any,
    ) -> dict:
        ...

    @overload
    @staticmethod
    async def acreate(
        messages: Sequence[Dict[str, Any]],
        *,
        provider: str = "ChatOpenAI",
        stream: Literal[True],
        **kwargs: Any,
    ) -> AsyncIterator:
        ...

    @staticmethod
    async def acreate(
        messages: Sequence[Dict[str, Any]],
        *,
        provider: str = "ChatOpenAI",
        stream: bool = False,
        **kwargs: Any,
    ) -> Union[dict, AsyncIterator]:
        models = importlib.import_module("langchain.chat_models")
        model_cls = getattr(models, provider)
        model_config = model_cls(**kwargs)
        converted_messages = convert_openai_messages(messages)
        if not stream:
            result = await model_config.ainvoke(converted_messages)
            return {"choices": [{"message": convert_message_to_dict(result)}]}
        else:
            return (
                _convert_message_chunk_to_delta(c, i)
                async for i, c in aenumerate(model_config.astream(converted_messages))
            )
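The class above mirrors the `openai.ChatCompletion` call shape while routing through a LangChain chat model; a minimal sketch, assuming credentials for the underlying `ChatOpenAI` provider are configured:

```python
from langchain.adapters.openai import ChatCompletion

messages = [
    {"role": "system", "content": "You are a terse assistant."},
    {"role": "user", "content": "Name one prime number."},
]

# Non-streaming: returns a dict shaped like an OpenAI response.
result = ChatCompletion.create(messages=messages, provider="ChatOpenAI")
print(result["choices"][0]["message"]["content"])

# Streaming: yields {"choices": [{"delta": ...}]} chunks.
for chunk in ChatCompletion.create(messages=messages, provider="ChatOpenAI", stream=True):
    print(chunk["choices"][0]["delta"])
```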
@@ -475,7 +475,8 @@ class Agent(BaseSingleActionAgent):
        """
        full_inputs = self.get_full_inputs(intermediate_steps, **kwargs)
        full_output = await self.llm_chain.apredict(callbacks=callbacks, **full_inputs)
        return self.output_parser.parse(full_output)
        agent_output = await self.output_parser.aparse(full_output)
        return agent_output

    def get_full_inputs(
        self, intermediate_steps: List[Tuple[AgentAction, str]], **kwargs: Any
@@ -55,6 +55,16 @@ def dereference_refs(spec_obj: dict, full_spec: dict) -> Union[dict, list]:

@dataclass(frozen=True)
class ReducedOpenAPISpec:
    """A reduced OpenAPI spec.

    This is a quick and dirty representation for OpenAPI specs.

    Attributes:
        servers: The servers in the spec.
        description: The description of the spec.
        endpoints: The endpoints in the spec.
    """

    servers: List[dict]
    description: str
    endpoints: List[Tuple[str, str, dict]]
@@ -22,7 +22,20 @@ def create_vectorstore_agent(
    agent_executor_kwargs: Optional[Dict[str, Any]] = None,
    **kwargs: Dict[str, Any],
) -> AgentExecutor:
    """Construct a VectorStore agent from an LLM and tools."""
    """Construct a VectorStore agent from an LLM and tools.

    Args:
        llm (BaseLanguageModel): LLM that will be used by the agent
        toolkit (VectorStoreToolkit): Set of tools for the agent
        callback_manager (Optional[BaseCallbackManager], optional): Object to handle callbacks. Defaults to None.
        prefix (str, optional): The prefix prompt for the agent. If not provided, uses the default PREFIX.
        verbose (bool, optional): Whether to show the contents of the scratchpad. Defaults to False.
        agent_executor_kwargs (Optional[Dict[str, Any]], optional): Any additional parameters to pass to the agent executor. Defaults to None.
        **kwargs: Additional named parameters to pass to the ZeroShotAgent.

    Returns:
        AgentExecutor: A callable AgentExecutor object. Either call it directly or use the run method with a query to get the response.
    """  # noqa: E501
    tools = toolkit.get_tools()
    prompt = ZeroShotAgent.create_prompt(tools, prefix=prefix)
    llm_chain = LLMChain(
@@ -50,7 +63,20 @@ def create_vectorstore_router_agent(
    agent_executor_kwargs: Optional[Dict[str, Any]] = None,
    **kwargs: Dict[str, Any],
) -> AgentExecutor:
    """Construct a VectorStore router agent from an LLM and tools."""
    """Construct a VectorStore router agent from an LLM and tools.

    Args:
        llm (BaseLanguageModel): LLM that will be used by the agent
        toolkit (VectorStoreRouterToolkit): Set of tools for the agent which have routing capability with multiple vector stores
        callback_manager (Optional[BaseCallbackManager], optional): Object to handle callbacks. Defaults to None.
        prefix (str, optional): The prefix prompt for the router agent. If not provided, uses the default ROUTER_PREFIX.
        verbose (bool, optional): Whether to show the contents of the scratchpad. Defaults to False.
        agent_executor_kwargs (Optional[Dict[str, Any]], optional): Any additional parameters to pass to the agent executor. Defaults to None.
        **kwargs: Additional named parameters to pass to the ZeroShotAgent.

    Returns:
        AgentExecutor: A callable AgentExecutor object. Either call it directly or use the run method with a query to get the response.
    """  # noqa: E501
    tools = toolkit.get_tools()
    prompt = ZeroShotAgent.create_prompt(tools, prefix=prefix)
    llm_chain = LLMChain(
@@ -19,12 +19,14 @@ logger = logging.getLogger(__name__)
class StructuredChatOutputParser(AgentOutputParser):
    """Output parser for the structured chat agent."""

    pattern = re.compile(r"```(?:json)?\n(.*?)```", re.DOTALL)

    def get_format_instructions(self) -> str:
        return FORMAT_INSTRUCTIONS

    def parse(self, text: str) -> Union[AgentAction, AgentFinish]:
        try:
            action_match = re.search(r"```(.*?)```?", text, re.DOTALL)
            action_match = self.pattern.search(text)
            if action_match is not None:
                response = json.loads(action_match.group(1).strip(), strict=False)
                if isinstance(response, list):
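A quick sketch of what the precompiled pattern accepts: a fenced block, optionally tagged `json`, with a newline after the opening fence. The input text is hypothetical:

```python
import re

pattern = re.compile(r"```(?:json)?\n(.*?)```", re.DOTALL)

text = 'Thought: use a tool.\n```json\n{"action": "Search", "action_input": "foo"}\n```'
match = pattern.search(text)
assert match is not None
print(match.group(1).strip())  # -> {"action": "Search", "action_input": "foo"}
```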
@@ -10,6 +10,8 @@ from langchain.tools.base import BaseTool


class XMLAgentOutputParser(AgentOutputParser):
    """Output parser for XMLAgent."""

    def parse(self, text: str) -> Union[AgentAction, AgentFinish]:
        if "</tool>" in text:
            tool, tool_input = text.split("</tool>")
@@ -18,6 +18,7 @@ from langchain.callbacks.file import FileCallbackHandler
from langchain.callbacks.flyte_callback import FlyteCallbackHandler
from langchain.callbacks.human import HumanApprovalCallbackHandler
from langchain.callbacks.infino_callback import InfinoCallbackHandler
from langchain.callbacks.labelstudio_callback import LabelStudioCallbackHandler
from langchain.callbacks.manager import (
    get_openai_callback,
    tracing_enabled,
@@ -68,4 +69,5 @@ __all__ = [
    "wandb_tracing_enabled",
    "FlyteCallbackHandler",
    "SageMakerCallbackHandler",
    "LabelStudioCallbackHandler",
]
@@ -2,6 +2,8 @@ import os
import warnings
from typing import Any, Dict, List, Optional, Union

from packaging.version import parse

from langchain.callbacks.base import BaseCallbackHandler
from langchain.schema import AgentAction, AgentFinish, LLMResult

@@ -19,11 +21,10 @@ class ArgillaCallbackHandler(BaseCallbackHandler):
            default workspace will be used.
        api_url: URL of the Argilla Server that we want to use, and where the
            `FeedbackDataset` lives in. Defaults to `None`, which means that either
            `ARGILLA_API_URL` environment variable or the default http://localhost:6900
            will be used.
            `ARGILLA_API_URL` environment variable or the default will be used.
        api_key: API Key to connect to the Argilla Server. Defaults to `None`, which
            means that either `ARGILLA_API_KEY` environment variable or the default
            `argilla.apikey` will be used.
            will be used.

    Raises:
        ImportError: if the `argilla` package is not installed.
@@ -51,6 +52,12 @@ class ArgillaCallbackHandler(BaseCallbackHandler):
        "Argilla, no doubt about it."
    """

    REPO_URL = "https://github.com/argilla-io/argilla"
    ISSUES_URL = f"{REPO_URL}/issues"
    BLOG_URL = "https://docs.argilla.io/en/latest/guides/llms/practical_guides/use_argilla_callback_in_langchain.html"  # noqa: E501

    DEFAULT_API_URL = "http://localhost:6900"

    def __init__(
        self,
        dataset_name: str,
@@ -70,11 +77,10 @@ class ArgillaCallbackHandler(BaseCallbackHandler):
            default workspace will be used.
        api_url: URL of the Argilla Server that we want to use, and where the
            `FeedbackDataset` lives in. Defaults to `None`, which means that either
            `ARGILLA_API_URL` environment variable or the default
            http://localhost:6900 will be used.
            `ARGILLA_API_URL` environment variable or the default will be used.
        api_key: API Key to connect to the Argilla Server. Defaults to `None`, which
            means that either `ARGILLA_API_KEY` environment variable or the default
            `argilla.apikey` will be used.
            will be used.

    Raises:
        ImportError: if the `argilla` package is not installed.
@@ -87,41 +93,58 @@ class ArgillaCallbackHandler(BaseCallbackHandler):
        # Import Argilla (not via `import_argilla` to keep hints in IDEs)
        try:
            import argilla as rg  # noqa: F401

            self.ARGILLA_VERSION = rg.__version__
        except ImportError:
            raise ImportError(
                "To use the Argilla callback manager you need to have the `argilla` "
                "Python package installed. Please install it with `pip install argilla`"
            )

        # Check whether the Argilla version is compatible
        if parse(self.ARGILLA_VERSION) < parse("1.8.0"):
            raise ImportError(
                f"The installed `argilla` version is {self.ARGILLA_VERSION} but "
                "`ArgillaCallbackHandler` requires at least version 1.8.0. Please "
                "upgrade `argilla` with `pip install --upgrade argilla`."
            )

        # Show a warning message if Argilla will assume the default values will be used
        if api_url is None and os.getenv("ARGILLA_API_URL") is None:
            warnings.warn(
                (
                    "Since `api_url` is None, and the env var `ARGILLA_API_URL` is not"
                    " set, it will default to `http://localhost:6900`."
                    f" set, it will default to `{self.DEFAULT_API_URL}`, which is the"
                    " default API URL in Argilla Quickstart."
                ),
            )
            api_url = self.DEFAULT_API_URL

        if api_key is None and os.getenv("ARGILLA_API_KEY") is None:
            self.DEFAULT_API_KEY = (
                "admin.apikey"
                if parse(self.ARGILLA_VERSION) < parse("1.11.0")
                else "owner.apikey"
            )

            warnings.warn(
                (
                    "Since `api_key` is None, and the env var `ARGILLA_API_KEY` is not"
                    " set, it will default to `argilla.apikey`."
                    f" set, it will default to `{self.DEFAULT_API_KEY}`, which is the"
                    " default API key in Argilla Quickstart."
                ),
            )
            api_key = self.DEFAULT_API_KEY

        # Connect to Argilla with the provided credentials, if applicable
        try:
            rg.init(
                api_key=api_key,
                api_url=api_url,
            )
            rg.init(api_key=api_key, api_url=api_url)
        except Exception as e:
            raise ConnectionError(
                f"Could not connect to Argilla with exception: '{e}'.\n"
                "Please check your `api_key` and `api_url`, and make sure that "
                "the Argilla server is up and running. If the problem persists "
                "please report it to https://github.com/argilla-io/argilla/issues "
                "with the label `langchain`."
                f"please report it to {self.ISSUES_URL} as an `integration` issue."
            ) from e

        # Set the Argilla variables
@@ -130,46 +153,47 @@ class ArgillaCallbackHandler(BaseCallbackHandler):

        # Retrieve the `FeedbackDataset` from Argilla (without existing records)
        try:
            extra_args = {}
            if parse(self.ARGILLA_VERSION) < parse("1.14.0"):
                warnings.warn(
                    f"You have Argilla {self.ARGILLA_VERSION}, but Argilla 1.14.0 or"
                    " higher is recommended.",
                    UserWarning,
                )
                extra_args = {"with_records": False}
            self.dataset = rg.FeedbackDataset.from_argilla(
                name=self.dataset_name,
                workspace=self.workspace_name,
                with_records=False,
                **extra_args,
            )
        except Exception as e:
            raise FileNotFoundError(
                "`FeedbackDataset` retrieval from Argilla failed with exception:"
                f" '{e}'.\nPlease check that the dataset with"
                f" name={self.dataset_name} in the"
                f"`FeedbackDataset` retrieval from Argilla failed with exception `{e}`."
                f"\nPlease check that the dataset with name={self.dataset_name} in the"
                f" workspace={self.workspace_name} exists in advance. If you need help"
                " on how to create a `langchain`-compatible `FeedbackDataset` in"
                " Argilla, please visit"
                " https://docs.argilla.io/en/latest/guides/llms/practical_guides/use_argilla_callback_in_langchain.html."  # noqa: E501
                " If the problem persists please report it to"
                " https://github.com/argilla-io/argilla/issues with the label"
                " `langchain`."
                f" Argilla, please visit {self.BLOG_URL}. If the problem persists"
                f" please report it to {self.ISSUES_URL} as an `integration` issue."
            ) from e

        supported_fields = ["prompt", "response"]
        if supported_fields != [field.name for field in self.dataset.fields]:
            raise ValueError(
                f"`FeedbackDataset` with name={self.dataset_name} in the"
                f" workspace={self.workspace_name} "
                "had fields that are not supported yet for the `langchain` integration."
                " Supported fields are: "
                f"{supported_fields}, and the current `FeedbackDataset` fields are"
                f" {[field.name for field in self.dataset.fields]}. "
                "For more information on how to create a `langchain`-compatible"
                " `FeedbackDataset` in Argilla, please visit"
                " https://docs.argilla.io/en/latest/guides/llms/practical_guides/use_argilla_callback_in_langchain.html."  # noqa: E501
                f"`FeedbackDataset` with name={self.dataset_name} in the workspace="
                f"{self.workspace_name} had fields that are not supported yet for the "
                f"`langchain` integration. Supported fields are: {supported_fields},"
                f" and the current `FeedbackDataset` fields are {[field.name for field in self.dataset.fields]}."  # noqa: E501
                " For more information on how to create a `langchain`-compatible"
                f" `FeedbackDataset` in Argilla, please visit {self.BLOG_URL}."
            )

        self.prompts: Dict[str, List[str]] = {}

        warnings.warn(
            (
                "The `ArgillaCallbackHandler` is currently in beta and is subject to "
                "change based on updates to `langchain`. Please report any issues to "
                "https://github.com/argilla-io/argilla/issues with the tag `langchain`."
                "The `ArgillaCallbackHandler` is currently in beta and is subject to"
                " change based on updates to `langchain`. Please report any issues to"
                f" {self.ISSUES_URL} as an `integration` issue."
            ),
        )

@@ -205,12 +229,13 @@ class ArgillaCallbackHandler(BaseCallbackHandler):
            ]
        )

        # Push the records to Argilla
        self.dataset.push_to_argilla()

        # Pop current run from `self.runs`
        self.prompts.pop(str(kwargs["run_id"]))

        if parse(self.ARGILLA_VERSION) < parse("1.14.0"):
            # Push the records to Argilla
            self.dataset.push_to_argilla()

    def on_llm_error(
        self, error: Union[Exception, KeyboardInterrupt], **kwargs: Any
    ) -> None:
@@ -278,15 +303,16 @@ class ArgillaCallbackHandler(BaseCallbackHandler):
            ]
        )

        # Push the records to Argilla
        self.dataset.push_to_argilla()

        # Pop current run from `self.runs`
        if str(kwargs["parent_run_id"]) in self.prompts:
            self.prompts.pop(str(kwargs["parent_run_id"]))
        if str(kwargs["run_id"]) in self.prompts:
            self.prompts.pop(str(kwargs["run_id"]))

        if parse(self.ARGILLA_VERSION) < parse("1.14.0"):
            # Push the records to Argilla
            self.dataset.push_to_argilla()

    def on_chain_error(
        self, error: Union[Exception, KeyboardInterrupt], **kwargs: Any
    ) -> None:

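A minimal sketch of wiring the handler into an LLM; the dataset and workspace names are illustrative, and the `FeedbackDataset` must already exist with exactly the `prompt` and `response` fields validated above:

```python
from langchain.callbacks import ArgillaCallbackHandler
from langchain.llms import OpenAI

argilla_callback = ArgillaCallbackHandler(
    dataset_name="langchain-dataset",  # must exist in Argilla beforehand
    workspace_name="admin",
    api_url="http://localhost:6900",
    api_key="owner.apikey",
)
llm = OpenAI(callbacks=[argilla_callback])
llm("Tell me a joke about chickens.")  # prompt/response pushed to Argilla
```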
@@ -13,7 +13,7 @@ class FileCallbackHandler(BaseCallbackHandler):
        self, filename: str, mode: str = "a", color: Optional[str] = None
    ) -> None:
        """Initialize callback handler."""
        self.file = cast(TextIO, open(filename, mode))
        self.file = cast(TextIO, open(filename, mode, encoding="utf-8"))
        self.color = color

    def __del__(self) -> None:
392
libs/langchain/langchain/callbacks/labelstudio_callback.py
Normal file
@@ -0,0 +1,392 @@
import os
import warnings
from datetime import datetime
from enum import Enum
from typing import Any, Dict, List, Optional, Tuple, Union
from uuid import UUID

from langchain.callbacks.base import BaseCallbackHandler
from langchain.schema import (
    AgentAction,
    AgentFinish,
    BaseMessage,
    ChatMessage,
    Generation,
    LLMResult,
)


class LabelStudioMode(Enum):
    PROMPT = "prompt"
    CHAT = "chat"


def get_default_label_configs(
    mode: Union[str, LabelStudioMode]
) -> Tuple[str, LabelStudioMode]:
    _default_label_configs = {
        LabelStudioMode.PROMPT.value: """
<View>
<Style>
    .prompt-box {
        background-color: white;
        border-radius: 10px;
        box-shadow: 0px 4px 6px rgba(0, 0, 0, 0.1);
        padding: 20px;
    }
</Style>
<View className="root">
    <View className="prompt-box">
        <Text name="prompt" value="$prompt"/>
    </View>
    <TextArea name="response" toName="prompt"
              maxSubmissions="1" editable="true"
              required="true"/>
</View>
<Header value="Rate the response:"/>
<Rating name="rating" toName="prompt"/>
</View>""",
        LabelStudioMode.CHAT.value: """
<View>
<View className="root">
    <Paragraphs name="dialogue"
                value="$prompt"
                layout="dialogue"
                textKey="content"
                nameKey="role"
                granularity="sentence"/>
    <Header value="Final response:"/>
    <TextArea name="response" toName="dialogue"
              maxSubmissions="1" editable="true"
              required="true"/>
</View>
<Header value="Rate the response:"/>
<Rating name="rating" toName="dialogue"/>
</View>""",
    }

    if isinstance(mode, str):
        mode = LabelStudioMode(mode)

    return _default_label_configs[mode.value], mode


class LabelStudioCallbackHandler(BaseCallbackHandler):
    """Label Studio callback handler.
    Provides the ability to send predictions to Label Studio
    for human evaluation, feedback and annotation.

    Parameters:
        api_key: Label Studio API key
        url: Label Studio URL
        project_id: Label Studio project ID
        project_name: Label Studio project name
        project_config: Label Studio project config (XML)
        mode: Label Studio mode ("prompt" or "chat")

    Examples:
        >>> from langchain.llms import OpenAI
        >>> from langchain.callbacks import LabelStudioCallbackHandler
        >>> handler = LabelStudioCallbackHandler(
        ...             api_key='<your_key_here>',
        ...             url='http://localhost:8080',
        ...             project_name='LangChain-%Y-%m-%d',
        ...             mode='prompt'
        ... )
        >>> llm = OpenAI(callbacks=[handler])
        >>> llm.predict('Tell me a story about a dog.')
    """

    DEFAULT_PROJECT_NAME = "LangChain-%Y-%m-%d"

    def __init__(
        self,
        api_key: Optional[str] = None,
        url: Optional[str] = None,
        project_id: Optional[int] = None,
        project_name: str = DEFAULT_PROJECT_NAME,
        project_config: Optional[str] = None,
        mode: Union[str, LabelStudioMode] = LabelStudioMode.PROMPT,
    ):
        super().__init__()

        # Import LabelStudio SDK
        try:
            import label_studio_sdk as ls
        except ImportError:
            raise ImportError(
                f"You're using {self.__class__.__name__} in your code,"
                f" but you don't have the LabelStudio SDK "
                f"Python package installed or upgraded to the latest version. "
                f"Please run `pip install -U label-studio-sdk`"
                f" before using this callback."
            )

        # Check if Label Studio API key is provided
        if not api_key:
            if os.getenv("LABEL_STUDIO_API_KEY"):
                api_key = str(os.getenv("LABEL_STUDIO_API_KEY"))
            else:
                raise ValueError(
                    f"You're using {self.__class__.__name__} in your code,"
                    f" but the Label Studio API key is not provided. "
                    f"Please provide the Label Studio API key: "
                    f"go to the Label Studio instance, navigate to "
                    f"Account & Settings -> Access Token and copy the key. "
                    f"Use the key as a parameter for the callback: "
                    f"{self.__class__.__name__}"
                    f"(label_studio_api_key='<your_key_here>', ...) or "
                    f"set the environment variable LABEL_STUDIO_API_KEY=<your_key_here>"
                )
        self.api_key = api_key

        if not url:
            if os.getenv("LABEL_STUDIO_URL"):
                url = os.getenv("LABEL_STUDIO_URL")
            else:
                warnings.warn(
                    f"Label Studio URL is not provided, "
                    f"using default URL: {ls.LABEL_STUDIO_DEFAULT_URL}. "
                    f"If you want to provide your own URL, use the parameter: "
                    f"{self.__class__.__name__}"
                    f"(label_studio_url='<your_url_here>', ...) "
                    f"or set the environment variable LABEL_STUDIO_URL=<your_url_here>"
                )
                url = ls.LABEL_STUDIO_DEFAULT_URL
        self.url = url

        # Maps run_id to prompts
        self.payload: Dict[str, Dict] = {}

        self.ls_client = ls.Client(url=self.url, api_key=self.api_key)
        self.project_name = project_name
        if project_config:
            self.project_config = project_config
            self.mode = None
        else:
            self.project_config, self.mode = get_default_label_configs(mode)

        self.project_id = project_id or os.getenv("LABEL_STUDIO_PROJECT_ID")
        if self.project_id is not None:
            self.ls_project = self.ls_client.get_project(int(self.project_id))
        else:
            project_title = datetime.today().strftime(self.project_name)
            existing_projects = self.ls_client.get_projects(title=project_title)
            if existing_projects:
                self.ls_project = existing_projects[0]
                self.project_id = self.ls_project.id
            else:
                self.ls_project = self.ls_client.create_project(
                    title=project_title, label_config=self.project_config
                )
                self.project_id = self.ls_project.id
        self.parsed_label_config = self.ls_project.parsed_label_config

        # Find the first TextArea tag
        # "from_name", "to_name", "value" will be used to create predictions
        self.from_name, self.to_name, self.value, self.input_type = (
            None,
            None,
            None,
            None,
        )
        for tag_name, tag_info in self.parsed_label_config.items():
            if tag_info["type"] == "TextArea":
                self.from_name = tag_name
                self.to_name = tag_info["to_name"][0]
                self.value = tag_info["inputs"][0]["value"]
                self.input_type = tag_info["inputs"][0]["type"]
                break
        if not self.from_name:
            error_message = (
                f'Label Studio project "{self.project_name}" '
                f"does not have a TextArea tag. "
                f"Please add a TextArea tag to the project."
            )
            if self.mode == LabelStudioMode.PROMPT:
                error_message += (
                    "\nHINT: go to project Settings -> "
                    "Labeling Interface -> Browse Templates"
                    ' and select "Generative AI -> '
                    'Supervised Language Model Fine-tuning" template.'
                )
            else:
                error_message += (
                    "\nHINT: go to project Settings -> "
                    "Labeling Interface -> Browse Templates"
                    " and check available templates under "
                    '"Generative AI" section.'
                )
            raise ValueError(error_message)

    def add_prompts_generations(
        self, run_id: str, generations: List[List[Generation]]
    ) -> None:
        # Create tasks in Label Studio
        tasks = []
        prompts = self.payload[run_id]["prompts"]
        model_version = (
            self.payload[run_id]["kwargs"]
            .get("invocation_params", {})
            .get("model_name")
        )
        for prompt, generation in zip(prompts, generations):
            tasks.append(
                {
                    "data": {
                        self.value: prompt,
                        "run_id": run_id,
                    },
                    "predictions": [
                        {
                            "result": [
                                {
                                    "from_name": self.from_name,
                                    "to_name": self.to_name,
                                    "type": "textarea",
                                    "value": {"text": [g.text for g in generation]},
                                }
                            ],
                            "model_version": model_version,
                        }
                    ],
                }
            )
        self.ls_project.import_tasks(tasks)

    def on_llm_start(
        self,
        serialized: Dict[str, Any],
        prompts: List[str],
        **kwargs: Any,
    ) -> None:
        """Save the prompts in memory when an LLM starts."""
        if self.input_type != "Text":
            raise ValueError(
                f'\nLabel Studio project "{self.project_name}" '
                f"has an input type <{self.input_type}>. "
                f'To make it work with the mode="prompt", '
                f"the input type should be <Text>.\n"
                f"Read more here https://labelstud.io/tags/text"
            )
        run_id = str(kwargs["run_id"])
        self.payload[run_id] = {"prompts": prompts, "kwargs": kwargs}

    def _get_message_role(self, message: BaseMessage) -> str:
        """Get the role of the message."""
        if isinstance(message, ChatMessage):
            return message.role
        else:
            return message.__class__.__name__

    def on_chat_model_start(
        self,
        serialized: Dict[str, Any],
        messages: List[List[BaseMessage]],
        *,
        run_id: UUID,
        parent_run_id: Optional[UUID] = None,
        tags: Optional[List[str]] = None,
        metadata: Optional[Dict[str, Any]] = None,
        **kwargs: Any,
    ) -> Any:
        """Save the prompts in memory when an LLM starts."""
        if self.input_type != "Paragraphs":
            raise ValueError(
                f'\nLabel Studio project "{self.project_name}" '
                f"has an input type <{self.input_type}>. "
                f'To make it work with the mode="chat", '
                f"the input type should be <Paragraphs>.\n"
                f"Read more here https://labelstud.io/tags/paragraphs"
            )

        prompts = []
        for message_list in messages:
            dialog = []
            for message in message_list:
                dialog.append(
                    {
                        "role": self._get_message_role(message),
                        "content": message.content,
                    }
                )
            prompts.append(dialog)
        self.payload[str(run_id)] = {
            "prompts": prompts,
            "tags": tags,
            "metadata": metadata,
            "run_id": run_id,
            "parent_run_id": parent_run_id,
            "kwargs": kwargs,
        }

    def on_llm_new_token(self, token: str, **kwargs: Any) -> None:
        """Do nothing when a new token is generated."""
        pass

    def on_llm_end(self, response: LLMResult, **kwargs: Any) -> None:
        """Create a new Label Studio task for each prompt and generation."""
        run_id = str(kwargs["run_id"])

        # Submit results to Label Studio
        self.add_prompts_generations(run_id, response.generations)

        # Pop current run from `self.payload`
        self.payload.pop(run_id)

    def on_llm_error(
        self, error: Union[Exception, KeyboardInterrupt], **kwargs: Any
    ) -> None:
        """Do nothing when LLM outputs an error."""
        pass

    def on_chain_start(
        self, serialized: Dict[str, Any], inputs: Dict[str, Any], **kwargs: Any
    ) -> None:
        pass

    def on_chain_end(self, outputs: Dict[str, Any], **kwargs: Any) -> None:
        pass

    def on_chain_error(
        self, error: Union[Exception, KeyboardInterrupt], **kwargs: Any
    ) -> None:
        """Do nothing when LLM chain outputs an error."""
        pass

    def on_tool_start(
        self,
        serialized: Dict[str, Any],
        input_str: str,
        **kwargs: Any,
    ) -> None:
        """Do nothing when tool starts."""
        pass

    def on_agent_action(self, action: AgentAction, **kwargs: Any) -> Any:
        """Do nothing when agent takes a specific action."""
        pass

    def on_tool_end(
        self,
        output: str,
        observation_prefix: Optional[str] = None,
        llm_prefix: Optional[str] = None,
        **kwargs: Any,
    ) -> None:
        """Do nothing when tool ends."""
        pass

    def on_tool_error(
        self, error: Union[Exception, KeyboardInterrupt], **kwargs: Any
    ) -> None:
        """Do nothing when tool outputs an error."""
        pass

    def on_text(self, text: str, **kwargs: Any) -> None:
        """Do nothing"""
        pass

    def on_agent_finish(self, finish: AgentFinish, **kwargs: Any) -> None:
        """Do nothing"""
        pass
@@ -31,8 +31,19 @@ MODEL_COST_PER_1K_TOKENS = {
    "gpt-3.5-turbo-0613-completion": 0.002,
    "gpt-3.5-turbo-16k-completion": 0.004,
    "gpt-3.5-turbo-16k-0613-completion": 0.004,
    # Azure GPT-35 input
    "gpt-35-turbo": 0.0015,  # Azure OpenAI version of ChatGPT
    "gpt-35-turbo-0301": 0.0015,  # Azure OpenAI version of ChatGPT
    "gpt-35-turbo-0613": 0.0015,
    "gpt-35-turbo-16k": 0.003,
    "gpt-35-turbo-16k-0613": 0.003,
    # Azure GPT-35 output
    "gpt-35-turbo-completion": 0.002,  # Azure OpenAI version of ChatGPT
    "gpt-35-turbo-0301-completion": 0.002,  # Azure OpenAI version of ChatGPT
    "gpt-35-turbo-0613-completion": 0.002,
    "gpt-35-turbo-16k-completion": 0.004,
    "gpt-35-turbo-16k-0613-completion": 0.004,
    # Others
    "gpt-35-turbo": 0.002,  # Azure OpenAI version of ChatGPT
    "text-ada-001": 0.0004,
    "ada": 0.0004,
    "text-babbage-001": 0.0005,
@@ -69,7 +80,9 @@ def standardize_model_name(
    if "ft-" in model_name:
        return model_name.split(":")[0] + "-finetuned"
    elif is_completion and (
        model_name.startswith("gpt-4") or model_name.startswith("gpt-3.5")
        model_name.startswith("gpt-4")
        or model_name.startswith("gpt-3.5")
        or model_name.startswith("gpt-35")
    ):
        return model_name + "-completion"
    else:
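A short illustration of the added `gpt-35` branch in `standardize_model_name`, inferred directly from the hunk above:

```python
from langchain.callbacks.openai_info import standardize_model_name

# Azure-style model names now get the "-completion" suffix for
# completion-token pricing lookups, matching the new table entries.
print(standardize_model_name("gpt-35-turbo", is_completion=True))
# -> gpt-35-turbo-completion
print(standardize_model_name("gpt-3.5-turbo"))
# -> gpt-3.5-turbo
```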
@@ -47,6 +47,9 @@ class BaseTracer(BaseCallbackHandler, ABC):
            parent_run = self.run_map[str(run.parent_run_id)]
            if parent_run:
                self._add_child_run(parent_run, run)
                parent_run.child_execution_order = max(
                    parent_run.child_execution_order, run.child_execution_order
                )
            else:
                logger.debug(f"Parent run with UUID {run.parent_run_id} not found.")
        self.run_map[str(run.id)] = run
@@ -131,7 +134,7 @@ class BaseTracer(BaseCallbackHandler, ABC):
        run_id_ = str(run_id)
        llm_run = self.run_map.get(run_id_)
        if llm_run is None or llm_run.run_type != "llm":
            raise TracerException("No LLM Run found to be traced")
            raise TracerException(f"No LLM Run found to be traced for {run_id}")
        llm_run.events.append(
            {
                "name": "new_token",
@@ -183,7 +186,7 @@ class BaseTracer(BaseCallbackHandler, ABC):
        run_id_ = str(run_id)
        llm_run = self.run_map.get(run_id_)
        if llm_run is None or llm_run.run_type != "llm":
            raise TracerException("No LLM Run found to be traced")
            raise TracerException(f"No LLM Run found to be traced for {run_id}")
        llm_run.outputs = response.dict()
        for i, generations in enumerate(response.generations):
            for j, generation in enumerate(generations):
@@ -211,7 +214,7 @@ class BaseTracer(BaseCallbackHandler, ABC):
        run_id_ = str(run_id)
        llm_run = self.run_map.get(run_id_)
        if llm_run is None or llm_run.run_type != "llm":
            raise TracerException("No LLM Run found to be traced")
            raise TracerException(f"No LLM Run found to be traced for {run_id}")
        llm_run.error = repr(error)
        llm_run.end_time = datetime.utcnow()
        llm_run.events.append({"name": "error", "time": llm_run.end_time})
@@ -254,18 +257,25 @@ class BaseTracer(BaseCallbackHandler, ABC):
        self._on_chain_start(chain_run)

    def on_chain_end(
        self, outputs: Dict[str, Any], *, run_id: UUID, **kwargs: Any
        self,
        outputs: Dict[str, Any],
        *,
        run_id: UUID,
        inputs: Optional[Dict[str, Any]] = None,
        **kwargs: Any,
    ) -> None:
        """End a trace for a chain run."""
        if not run_id:
            raise TracerException("No run_id provided for on_chain_end callback.")
        chain_run = self.run_map.get(str(run_id))
        if chain_run is None:
            raise TracerException("No chain Run found to be traced")
            raise TracerException(f"No chain Run found to be traced for {run_id}")

        chain_run.outputs = outputs
        chain_run.end_time = datetime.utcnow()
        chain_run.events.append({"name": "end", "time": chain_run.end_time})
        if inputs is not None:
            chain_run.inputs = inputs
        self._end_trace(chain_run)
        self._on_chain_end(chain_run)

@@ -273,6 +283,7 @@ class BaseTracer(BaseCallbackHandler, ABC):
        self,
        error: Union[Exception, KeyboardInterrupt],
        *,
        inputs: Optional[Dict[str, Any]] = None,
        run_id: UUID,
        **kwargs: Any,
    ) -> None:
@@ -281,11 +292,13 @@ class BaseTracer(BaseCallbackHandler, ABC):
            raise TracerException("No run_id provided for on_chain_error callback.")
        chain_run = self.run_map.get(str(run_id))
        if chain_run is None:
            raise TracerException("No chain Run found to be traced")
            raise TracerException(f"No chain Run found to be traced for {run_id}")

        chain_run.error = repr(error)
        chain_run.end_time = datetime.utcnow()
        chain_run.events.append({"name": "error", "time": chain_run.end_time})
        if inputs is not None:
            chain_run.inputs = inputs
        self._end_trace(chain_run)
        self._on_chain_error(chain_run)

@@ -329,7 +342,7 @@ class BaseTracer(BaseCallbackHandler, ABC):
            raise TracerException("No run_id provided for on_tool_end callback.")
        tool_run = self.run_map.get(str(run_id))
        if tool_run is None or tool_run.run_type != "tool":
            raise TracerException("No tool Run found to be traced")
            raise TracerException(f"No tool Run found to be traced for {run_id}")

        tool_run.outputs = {"output": output}
        tool_run.end_time = datetime.utcnow()
@@ -349,7 +362,7 @@ class BaseTracer(BaseCallbackHandler, ABC):
            raise TracerException("No run_id provided for on_tool_error callback.")
        tool_run = self.run_map.get(str(run_id))
        if tool_run is None or tool_run.run_type != "tool":
            raise TracerException("No tool Run found to be traced")
            raise TracerException(f"No tool Run found to be traced for {run_id}")

        tool_run.error = repr(error)
        tool_run.end_time = datetime.utcnow()
@@ -404,7 +417,7 @@ class BaseTracer(BaseCallbackHandler, ABC):
            raise TracerException("No run_id provided for on_retriever_error callback.")
        retrieval_run = self.run_map.get(str(run_id))
        if retrieval_run is None or retrieval_run.run_type != "retriever":
            raise TracerException("No retriever Run found to be traced")
            raise TracerException(f"No retriever Run found to be traced for {run_id}")

        retrieval_run.error = repr(error)
        retrieval_run.end_time = datetime.utcnow()
@@ -420,7 +433,7 @@ class BaseTracer(BaseCallbackHandler, ABC):
            raise TracerException("No run_id provided for on_retriever_end callback.")
        retrieval_run = self.run_map.get(str(run_id))
        if retrieval_run is None or retrieval_run.run_type != "retriever":
            raise TracerException("No retriever Run found to be traced")
            raise TracerException(f"No retriever Run found to be traced for {run_id}")
        retrieval_run.outputs = {"documents": documents}
        retrieval_run.end_time = datetime.utcnow()
        retrieval_run.events.append({"name": "end", "time": retrieval_run.end_time})
@@ -37,7 +37,7 @@ def elapsed(run: Any) -> str:
    elapsed_time = run.end_time - run.start_time
    milliseconds = elapsed_time.total_seconds() * 1000
    if milliseconds < 1000:
        return f"{milliseconds}ms"
        return f"{milliseconds:.0f}ms"
    return f"{(milliseconds / 1000):.2f}s"


@@ -78,28 +78,31 @@ class FunctionCallbackHandler(BaseTracer):
    # logging methods
    def _on_chain_start(self, run: Run) -> None:
        crumbs = self.get_breadcrumbs(run)
        run_type = run.run_type.capitalize()
        self.function_callback(
            f"{get_colored_text('[chain/start]', color='green')} "
            + get_bolded_text(f"[{crumbs}] Entering Chain run with input:\n")
            + get_bolded_text(f"[{crumbs}] Entering {run_type} run with input:\n")
            + f"{try_json_stringify(run.inputs, '[inputs]')}"
        )

    def _on_chain_end(self, run: Run) -> None:
        crumbs = self.get_breadcrumbs(run)
        run_type = run.run_type.capitalize()
        self.function_callback(
            f"{get_colored_text('[chain/end]', color='blue')} "
            + get_bolded_text(
                f"[{crumbs}] [{elapsed(run)}] Exiting Chain run with output:\n"
                f"[{crumbs}] [{elapsed(run)}] Exiting {run_type} run with output:\n"
            )
            + f"{try_json_stringify(run.outputs, '[outputs]')}"
        )

    def _on_chain_error(self, run: Run) -> None:
        crumbs = self.get_breadcrumbs(run)
        run_type = run.run_type.capitalize()
        self.function_callback(
            f"{get_colored_text('[chain/error]', color='red')} "
            + get_bolded_text(
                f"[{crumbs}] [{elapsed(run)}] Chain run errored with error:\n"
                f"[{crumbs}] [{elapsed(run)}] {run_type} run errored with error:\n"
            )
            + f"{try_json_stringify(run.error, '[error]')}"
        )

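The `:.0f` change trims float noise from sub-second timings; a quick illustration with a made-up value:

```python
milliseconds = 12.3456789
print(f"{milliseconds}ms")      # old output: 12.3456789ms
print(f"{milliseconds:.0f}ms")  # new output: 12ms
```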
@@ -1,9 +1,11 @@
"""Base interface that all chains should implement."""
import asyncio
import inspect
import json
import logging
import warnings
from abc import ABC, abstractmethod
from functools import partial
from pathlib import Path
from typing import Any, Dict, List, Optional, Union

@@ -55,18 +57,40 @@ class Chain(Serializable, Runnable[Dict[str, Any], Dict[str, Any]], ABC):
    """

    def invoke(
        self, input: Dict[str, Any], config: Optional[RunnableConfig] = None
        self,
        input: Dict[str, Any],
        config: Optional[RunnableConfig] = None,
        **kwargs: Any,
    ) -> Dict[str, Any]:
        return self(input, **(config or {}))
        config = config or {}
        return self(
            input,
            callbacks=config.get("callbacks"),
            tags=config.get("tags"),
            metadata=config.get("metadata"),
            **kwargs,
        )

    async def ainvoke(
        self, input: Dict[str, Any], config: Optional[RunnableConfig] = None
        self,
        input: Dict[str, Any],
        config: Optional[RunnableConfig] = None,
        **kwargs: Any,
    ) -> Dict[str, Any]:
        if type(self)._acall == Chain._acall:
            # If the chain does not implement async, fall back to default implementation
            return await super().ainvoke(input, config)
            return await asyncio.get_running_loop().run_in_executor(
                None, partial(self.invoke, input, config, **kwargs)
            )

        return await self.acall(input, **(config or {}))
        config = config or {}
        return await self.acall(
            input,
            callbacks=config.get("callbacks"),
            tags=config.get("tags"),
            metadata=config.get("metadata"),
            **kwargs,
        )

    memory: Optional[BaseMemory] = None
    """Optional memory object. Defaults to None.
@@ -553,7 +577,7 @@ class Chain(Serializable, Runnable[Dict[str, Any], Dict[str, Any]], ABC):
        A dictionary representation of the chain.

        Example:
            ..code-block:: python
            .. code-block:: python

                chain.dict(exclude_unset=True)
                # -> {"_type": "foo", "verbose": False, ...}

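A sketch of what the widened `invoke` signature unpacks from the config dict; `chain` stands for any concrete `Chain` subclass and the values are illustrative:

```python
from langchain.callbacks import StdOutCallbackHandler

result = chain.invoke(
    {"product": "socks"},
    config={
        "callbacks": [StdOutCallbackHandler()],  # forwarded as callbacks=
        "tags": ["demo"],                        # forwarded as tags=
        "metadata": {"run": "local"},            # forwarded as metadata=
    },
)
```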
@@ -100,6 +100,13 @@ class StuffDocumentsChain(BaseCombineDocumentsChain):
        )
        return values

    @property
    def input_keys(self) -> List[str]:
        extra_keys = [
            k for k in self.llm_chain.input_keys if k != self.document_variable_name
        ]
        return super().input_keys + extra_keys

    def _get_inputs(self, docs: List[Document], **kwargs: Any) -> dict:
        """Construct inputs from kwargs and docs.

@@ -127,6 +127,8 @@ class LLMChain(Chain):
    ) -> Tuple[List[PromptValue], Optional[List[str]]]:
        """Prepare prompts from inputs."""
        stop = None
        if len(input_list) == 0:
            return [], stop
        if "stop" in input_list[0]:
            stop = input_list[0]["stop"]
        prompts = []
@@ -151,6 +153,8 @@ class LLMChain(Chain):
    ) -> Tuple[List[PromptValue], Optional[List[str]]]:
        """Prepare prompts from inputs."""
        stop = None
        if len(input_list) == 0:
            return [], stop
        if "stop" in input_list[0]:
            stop = input_list[0]["stop"]
        prompts = []

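With the added guard, an empty batch now short-circuits before any prompt formatting; a sketch of the observable behavior for some `llm_chain` instance:

```python
prompts, stop = llm_chain.prep_prompts([])
assert prompts == [] and stop is None
```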
@@ -49,6 +49,8 @@ class ElementInViewPort(TypedDict):


class Crawler:
    """A crawler for web pages."""

    def __init__(self) -> None:
        try:
            from playwright.sync_api import sync_playwright

@@ -9,6 +9,7 @@ try:
except ImportError:

    def v_args(*args: Any, **kwargs: Any) -> Any:  # type: ignore
        """Dummy decorator for when lark is not installed."""
        return lambda _: None

    Transformer = object  # type: ignore

@@ -3,6 +3,8 @@ import functools
 import logging
 from typing import Any, Awaitable, Callable, Dict, List, Optional

+from pydantic import Field
+
 from langchain.callbacks.manager import (
     AsyncCallbackManagerForChainRun,
     CallbackManagerForChainRun,
@@ -27,9 +29,11 @@ class TransformChain(Chain):
     """The keys expected by the transform's input dictionary."""
     output_variables: List[str]
     """The keys returned by the transform's output dictionary."""
-    transform: Callable[[Dict[str, str]], Dict[str, str]]
+    transform_cb: Callable[[Dict[str, str]], Dict[str, str]] = Field(alias="transform")
     """The transform function."""
-    atransform: Optional[Callable[[Dict[str, Any]], Awaitable[Dict[str, Any]]]] = None
+    atransform_cb: Optional[
+        Callable[[Dict[str, Any]], Awaitable[Dict[str, Any]]]
+    ] = Field(None, alias="atransform")
     """The async coroutine transform function."""

     @staticmethod
@@ -62,18 +66,18 @@ class TransformChain(Chain):
         inputs: Dict[str, str],
         run_manager: Optional[CallbackManagerForChainRun] = None,
     ) -> Dict[str, str]:
-        return self.transform(inputs)
+        return self.transform_cb(inputs)

     async def _acall(
         self,
         inputs: Dict[str, Any],
         run_manager: Optional[AsyncCallbackManagerForChainRun] = None,
     ) -> Dict[str, Any]:
-        if self.atransform is not None:
-            return await self.atransform(inputs)
+        if self.atransform_cb is not None:
+            return await self.atransform_cb(inputs)
         else:
             self._log_once(
                 "TransformChain's atransform is not provided, falling"
                 " back to synchronous transform"
             )
-            return self.transform(inputs)
+            return self.transform_cb(inputs)
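Because the renamed fields carry pydantic aliases, callers keep constructing the chain with `transform=`/`atransform=`; only the internal attribute names change. A hedged usage sketch:

```python
from langchain.chains import TransformChain

def clean(inputs: dict) -> dict:
    return {"text_clean": inputs["text"].strip().lower()}

chain = TransformChain(
    input_variables=["text"],
    output_variables=["text_clean"],
    transform=clean,  # populates the aliased transform_cb field
)
print(chain({"text": "  Hello World  "}))
# -> {'text': '  Hello World  ', 'text_clean': 'hello world'}
```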
@@ -9,9 +9,9 @@ from typing import TYPE_CHECKING, Optional, Set
 import requests
 from pydantic import Field, root_validator

+from langchain.adapters.openai import convert_message_to_dict
 from langchain.chat_models.openai import (
     ChatOpenAI,
-    _convert_message_to_dict,
     _import_tiktoken,
 )
 from langchain.schema.messages import BaseMessage
@@ -178,7 +178,7 @@ class ChatAnyscale(ChatOpenAI):
         tokens_per_message = 3
         tokens_per_name = 1
         num_tokens = 0
-        messages_dict = [_convert_message_to_dict(m) for m in messages]
+        messages_dict = [convert_message_to_dict(m) for m in messages]
         for message in messages_dict:
             num_tokens += tokens_per_message
             for key, value in message.items():
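The second hunk swaps the private `_convert_message_to_dict` for the public adapter function. For reference, the surrounding per-message token arithmetic follows the usual tiktoken recipe; a hedged standalone version, where the encoding choice and the trailing `+ 3` priming constant follow OpenAI's published recipe rather than this file verbatim:

```python
import tiktoken

encoding = tiktoken.get_encoding("cl100k_base")
messages_dict = [
    {"role": "user", "content": "Hello there"},
    {"role": "assistant", "content": "Hi! How can I help?"},
]

tokens_per_message = 3
tokens_per_name = 1
num_tokens = 0
for message in messages_dict:
    num_tokens += tokens_per_message
    for key, value in message.items():
        num_tokens += len(encoding.encode(value))
        if key == "name":  # names are billed separately from content
            num_tokens += tokens_per_name
num_tokens += 3  # every reply is primed with <|start|>assistant<|message|>
print(num_tokens)
```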
@@ -40,11 +40,20 @@ class AzureChatOpenAI(ChatOpenAI):

     Be aware the API version may change.

+    You can also specify the version of the model using the ``model_version``
+    constructor parameter, as Azure OpenAI doesn't return the model version
+    with the response.
+
+    Default is empty. When you specify the version, it will be appended to the
+    model name in the response. Setting the correct version will help you
+    calculate the cost properly. The model version is not validated, so make
+    sure you set it correctly to get the correct cost.
+
     Any parameters that are valid to be passed to the openai.create call can be passed
     in, even if not explicitly saved on this class.
     """

     deployment_name: str = ""
+    model_version: str = ""
     openai_api_type: str = ""
     openai_api_base: str = ""
     openai_api_version: str = ""
@@ -137,7 +146,19 @@ class AzureChatOpenAI(ChatOpenAI):
         for res in response["choices"]:
             if res.get("finish_reason", None) == "content_filter":
                 raise ValueError(
-                    "Azure has not provided the response due to a content"
-                    " filter being triggered"
+                    "Azure has not provided the response due to a content filter "
+                    "being triggered"
                 )
-        return super()._create_chat_result(response)
+        chat_result = super()._create_chat_result(response)
+
+        if "model" in response:
+            model = response["model"]
+            if self.model_version:
+                model = f"{model}-{self.model_version}"
+
+            if chat_result.llm_output is not None and isinstance(
+                chat_result.llm_output, dict
+            ):
+                chat_result.llm_output["model_name"] = model
+
+        return chat_result
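A hedged usage sketch of the new `model_version` parameter; the endpoint, key, and deployment values below are placeholders:

```python
from langchain.chat_models import AzureChatOpenAI

llm = AzureChatOpenAI(
    deployment_name="gpt-35-turbo-deployment",            # placeholder deployment
    openai_api_base="https://example.openai.azure.com/",  # placeholder endpoint
    openai_api_version="2023-05-15",
    openai_api_key="...",                                 # placeholder key
    model_version="0613",                                 # appended to model_name
)
# A call's llm_output would then report e.g. model_name="gpt-35-turbo-0613",
# letting cost-tracking callbacks pick the right per-token price.
```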
@@ -51,6 +51,8 @@ def _get_verbosity() -> bool:


 class BaseChatModel(BaseLanguageModel[BaseMessageChunk], ABC):
+    """Base class for chat models."""
+
     cache: Optional[bool] = None
     """Whether to cache the response."""
     verbose: bool = Field(default_factory=_get_verbosity)
@@ -103,12 +105,18 @@ class BaseChatModel(BaseLanguageModel[BaseMessageChunk], ABC):
         stop: Optional[List[str]] = None,
         **kwargs: Any,
     ) -> BaseMessageChunk:
+        config = config or {}
         return cast(
             BaseMessageChunk,
             cast(
                 ChatGeneration,
                 self.generate_prompt(
-                    [self._convert_input(input)], stop=stop, **(config or {}), **kwargs
+                    [self._convert_input(input)],
+                    stop=stop,
+                    callbacks=config.get("callbacks"),
+                    tags=config.get("tags"),
+                    metadata=config.get("metadata"),
+                    **kwargs,
                 ).generations[0][0],
             ).message,
         )
@@ -127,8 +135,14 @@ class BaseChatModel(BaseLanguageModel[BaseMessageChunk], ABC):
                 None, partial(self.invoke, input, config, stop=stop, **kwargs)
             )

+        config = config or {}
         llm_result = await self.agenerate_prompt(
-            [self._convert_input(input)], stop=stop, **(config or {}), **kwargs
+            [self._convert_input(input)],
+            stop=stop,
+            callbacks=config.get("callbacks"),
+            tags=config.get("tags"),
+            metadata=config.get("metadata"),
+            **kwargs,
         )
         return cast(
             BaseMessageChunk, cast(ChatGeneration, llm_result.generations[0][0]).message
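With this change, `invoke`/`ainvoke` route the per-call config entries explicitly to `generate_prompt` instead of splatting the whole dict. A hedged usage sketch, assuming an OpenAI key is set in the environment:

```python
from langchain.callbacks import StdOutCallbackHandler
from langchain.chat_models import ChatOpenAI

chat = ChatOpenAI()
message = chat.invoke(
    "Say hi",
    config={
        "callbacks": [StdOutCallbackHandler()],  # forwarded as callbacks=
        "tags": ["demo"],                        # forwarded as tags=
        "metadata": {"run": "example"},          # forwarded as metadata=
    },
)
print(message.content)
```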