Compare commits
181 Commits
| Author | SHA1 | Date | |
|---|---|---|---|
|
|
5935767056 | ||
|
|
b5a57acf6c | ||
|
|
49f1d8477c | ||
|
|
f11e5442d6 | ||
|
|
72f9150a50 | ||
|
|
c172f972ea | ||
|
|
8189dea0d8 | ||
|
|
d95eeaedbe | ||
|
|
621da3c164 | ||
|
|
0fa69d8988 | ||
|
|
9b24f0b067 | ||
|
|
8d69dacdf3 | ||
|
|
cdfe2c96c5 | ||
|
|
19f504790e | ||
|
|
1b58460fe3 | ||
|
|
aca8cb5fba | ||
|
|
7edf4ca396 | ||
|
|
8aab39e3ce | ||
|
|
1d3735a84c | ||
|
|
45741bcc1b | ||
|
|
9b64932e55 | ||
|
|
eaa505fb09 | ||
|
|
e21152358a | ||
|
|
edb585228d | ||
|
|
0aabded97f | ||
|
|
00bf472265 | ||
|
|
bace17e0aa | ||
|
|
44bc89b7bf | ||
|
|
e4418d1b7e | ||
|
|
6cb763507c | ||
|
|
6d03f8b5d8 | ||
|
|
16af5f8690 | ||
|
|
8cb2594562 | ||
|
|
926c64da60 | ||
|
|
31cfc00845 | ||
|
|
f7ae183f40 | ||
|
|
0e5d09d0da | ||
|
|
9249d305af | ||
|
|
01ef786e7e | ||
|
|
3b754b5461 | ||
|
|
a429145420 | ||
|
|
7f0e847c13 | ||
|
|
991b448dfc | ||
|
|
3ab4e21579 | ||
|
|
2184e3a400 | ||
|
|
c0acbdca1b | ||
|
|
a2588d6c57 | ||
|
|
b80e3825a6 | ||
|
|
6abb2c2c08 | ||
|
|
57dd4daa9a | ||
|
|
5fc07fa524 | ||
|
|
02430e25b6 | ||
|
|
ee52482db8 | ||
|
|
bb6fbf4c71 | ||
|
|
45f0f9460a | ||
|
|
105c787e5a | ||
|
|
6221eb5974 | ||
|
|
cb5fb751e9 | ||
|
|
16bd328aab | ||
|
|
8eea46ed0e | ||
|
|
67ca187560 | ||
|
|
46f3428cb3 | ||
|
|
e3fb11bc10 | ||
|
|
1edead28b8 | ||
|
|
a5a4c53280 | ||
|
|
80b98812e1 | ||
|
|
fcbbddedae | ||
|
|
e94a5d753f | ||
|
|
b7bc8ec87f | ||
|
|
6c70f491ba | ||
|
|
f3f5853e9f | ||
|
|
2431eca700 | ||
|
|
641cb80c9d | ||
|
|
08a0741d82 | ||
|
|
8d351bfc20 | ||
|
|
3bdc273ab3 | ||
|
|
206f809366 | ||
|
|
8a320e55a0 | ||
|
|
e5db8a16c0 | ||
|
|
e162fd418a | ||
|
|
abb1264edf | ||
|
|
5e05ba2140 | ||
|
|
6e14f9548b | ||
|
|
2380492c8e | ||
|
|
d21333d710 | ||
|
|
dfb93dd2b5 | ||
|
|
2c7297d243 | ||
|
|
434a96415b | ||
|
|
c7a489ae0d | ||
|
|
618cf5241e | ||
|
|
f4a47ec717 | ||
|
|
3b51817706 | ||
|
|
bbbd2b076f | ||
|
|
d248481f13 | ||
|
|
efa02ed768 | ||
|
|
5454591b0a | ||
|
|
c72da53c10 | ||
|
|
8dd071ad08 | ||
|
|
96d064e305 | ||
|
|
c2f46b2cdb | ||
|
|
808248049d | ||
|
|
a6e6e9bb86 | ||
|
|
90579021f8 | ||
|
|
539672a7fd | ||
|
|
269f85b7b7 | ||
|
|
3adb1e12ca | ||
|
|
b8df15cd64 | ||
|
|
4d72288487 | ||
|
|
3c6eccd701 | ||
|
|
7de6a1b78e | ||
|
|
a2681f950d | ||
|
|
3f64b8a761 | ||
|
|
0a1be1d501 | ||
|
|
e3056340da | ||
|
|
99b5a7226c | ||
|
|
95cf7de112 | ||
|
|
8f0cd91d57 | ||
|
|
15f650ae8c | ||
|
|
7543a3d70e | ||
|
|
ab193338aa | ||
|
|
bb12184551 | ||
|
|
33a2f58fbf | ||
|
|
fad26e79a3 | ||
|
|
b2eb4ff0fc | ||
|
|
2d078c7767 | ||
|
|
a7824f16f2 | ||
|
|
642b57c7ff | ||
|
|
4a07fba9f0 | ||
|
|
c5c0735fc4 | ||
|
|
6327eecdaf | ||
|
|
beab637f04 | ||
|
|
4a63533216 | ||
|
|
bf4a112aa6 | ||
|
|
d1e305028f | ||
|
|
6b9f266837 | ||
|
|
6116cbf0de | ||
|
|
67718c1d6b | ||
|
|
61c2d918c6 | ||
|
|
52d6b91c18 | ||
|
|
e74a605379 | ||
|
|
022ef170f8 | ||
|
|
fa30a57034 | ||
|
|
1f9124ceaa | ||
|
|
b52a3785c9 | ||
|
|
ff44fe4e16 | ||
|
|
d56eff042a | ||
|
|
ce3666c28b | ||
|
|
cff52638b2 | ||
|
|
bbd22b9b76 | ||
|
|
33cdb06b5c | ||
|
|
cc908d49a3 | ||
|
|
7fc07ba5df | ||
|
|
fe78aff1f2 | ||
|
|
40079d4936 | ||
|
|
84c1ad7eaa | ||
|
|
9892e95d03 | ||
|
|
f616aee35a | ||
|
|
ab47557db3 | ||
|
|
40096c73cd | ||
|
|
fbc83dfdbb | ||
|
|
91be7eee66 | ||
|
|
fc2f450f2d | ||
|
|
aeaef8f3a3 | ||
|
|
472f00ada7 | ||
|
|
6e3fa59073 | ||
|
|
a616e19975 | ||
|
|
100d9ce4c7 | ||
|
|
c9da300e4d | ||
|
|
5a9765b1b5 | ||
|
|
454998c1fb | ||
|
|
0adc282d70 | ||
|
|
bd4865b6fe | ||
|
|
485d716c21 | ||
|
|
b57fa1a39c | ||
|
|
6b93670410 | ||
|
|
2bb1d256f3 | ||
|
|
4a7ebb7184 | ||
|
|
797c9e92c8 | ||
|
|
5f1aab5487 | ||
|
|
983678dedc | ||
|
|
f76d50d8dc |
22
.github/PULL_REQUEST_TEMPLATE.md
vendored
@@ -1,28 +1,20 @@
|
||||
<!-- Thank you for contributing to LangChain!
|
||||
|
||||
Replace this comment with:
|
||||
Replace this entire comment with:
|
||||
- Description: a description of the change,
|
||||
- Issue: the issue # it fixes (if applicable),
|
||||
- Dependencies: any dependencies required for this change,
|
||||
- Tag maintainer: for a quicker response, tag the relevant maintainer (see below),
|
||||
- Twitter handle: we announce bigger features on Twitter. If your PR gets announced and you'd like a mention, we'll gladly shout you out!
|
||||
|
||||
Please make sure you're PR is passing linting and testing before submitting. Run `make format`, `make lint` and `make test` to check this locally.
|
||||
Please make sure your PR is passing linting and testing before submitting. Run `make format`, `make lint` and `make test` to check this locally.
|
||||
|
||||
See contribution guidelines for more information on how to write/run tests, lint, etc:
|
||||
https://github.com/hwchase17/langchain/blob/master/.github/CONTRIBUTING.md
|
||||
|
||||
If you're adding a new integration, please include:
|
||||
1. a test for the integration, preferably unit tests that do not rely on network access,
|
||||
2. an example notebook showing its use.
|
||||
2. an example notebook showing its use. These live is docs/extras directory.
|
||||
|
||||
Maintainer responsibilities:
|
||||
- General / Misc / if you don't know who to tag: @baskaryan
|
||||
- DataLoaders / VectorStores / Retrievers: @rlancemartin, @eyurtsev
|
||||
- Models / Prompts: @hwchase17, @baskaryan
|
||||
- Memory: @hwchase17
|
||||
- Agents / Tools / Toolkits: @hinthornw
|
||||
- Tracing / Callbacks: @agola11
|
||||
- Async: @agola11
|
||||
|
||||
If no one reviews your PR within a few days, feel free to @-mention the same people again.
|
||||
|
||||
See contribution guidelines for more information on how to write/run tests, lint, etc: https://github.com/hwchase17/langchain/blob/master/.github/CONTRIBUTING.md
|
||||
If no one reviews your PR within a few days, please @-mention one of @baskaryan, @eyurtsev, @hwchase17, @rlancemartin.
|
||||
-->
|
||||
|
||||
@@ -1,5 +1,5 @@
|
||||
---
|
||||
name: libs/langchain-experimental CI
|
||||
name: libs/experimental CI
|
||||
|
||||
on:
|
||||
push:
|
||||
|
||||
@@ -1,5 +1,5 @@
|
||||
---
|
||||
name: libs/langchain-experimental Release
|
||||
name: libs/experimental Release
|
||||
|
||||
on:
|
||||
pull_request:
|
||||
|
||||
42
.github/workflows/scheduled_test.yml
vendored
Normal file
@@ -0,0 +1,42 @@
|
||||
name: Scheduled tests
|
||||
|
||||
on:
|
||||
workflow_dispatch: # Allows to trigger the workflow manually in GitHub UI
|
||||
schedule:
|
||||
- cron: '0 13 * * *'
|
||||
|
||||
env:
|
||||
POETRY_VERSION: "1.4.2"
|
||||
|
||||
jobs:
|
||||
build:
|
||||
defaults:
|
||||
run:
|
||||
working-directory: libs/langchain
|
||||
runs-on: ubuntu-latest
|
||||
environment: Scheduled testing
|
||||
strategy:
|
||||
matrix:
|
||||
python-version:
|
||||
- "3.8"
|
||||
- "3.9"
|
||||
- "3.10"
|
||||
- "3.11"
|
||||
name: Python ${{ matrix.python-version }}
|
||||
steps:
|
||||
- uses: actions/checkout@v3
|
||||
- name: Set up Python ${{ matrix.python-version }}
|
||||
uses: "./.github/actions/poetry_setup"
|
||||
with:
|
||||
python-version: ${{ matrix.python-version }}
|
||||
poetry-version: "1.4.2"
|
||||
working-directory: libs/langchain
|
||||
install-command: |
|
||||
echo "Running scheduled tests, installing dependencies with poetry..."
|
||||
poetry install --with=test_integration
|
||||
- name: Run tests
|
||||
env:
|
||||
OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
|
||||
run: |
|
||||
make scheduled_tests
|
||||
shell: bash
|
||||
@@ -21,7 +21,7 @@ Looking for the JS/TS version? Check out [LangChain.js](https://github.com/hwcha
|
||||
**Production Support:** As you move your LangChains into production, we'd love to offer more hands-on support.
|
||||
Fill out [this form](https://airtable.com/appwQzlErAS2qiP0L/shrGtGaVBVAz7NcV2) to share more about what you're building, and our team will get in touch.
|
||||
|
||||
## 🚨Breaking Changes for select chains (SQLDatabase) on 7/28
|
||||
## 🚨Breaking Changes for select chains (SQLDatabase) on 7/28/23
|
||||
|
||||
In an effort to make `langchain` leaner and safer, we are moving select chains to `langchain_experimental`.
|
||||
This migration has already started, but we are remaining backwards compatible until 7/28.
|
||||
|
||||
@@ -145,30 +145,36 @@ def _load_package_modules(
|
||||
package_name = package_path.name
|
||||
|
||||
for file_path in package_path.rglob("*.py"):
|
||||
if not file_path.name.startswith("__"):
|
||||
relative_module_name = file_path.relative_to(package_path)
|
||||
# Get the full namespace of the module
|
||||
namespace = str(relative_module_name).replace(".py", "").replace("/", ".")
|
||||
# Keep only the top level namespace
|
||||
top_namespace = namespace.split(".")[0]
|
||||
if file_path.name.startswith("_"):
|
||||
continue
|
||||
|
||||
try:
|
||||
module_members = _load_module_members(
|
||||
f"{package_name}.{namespace}", namespace
|
||||
relative_module_name = file_path.relative_to(package_path)
|
||||
|
||||
if relative_module_name.name.startswith("_"):
|
||||
continue
|
||||
|
||||
# Get the full namespace of the module
|
||||
namespace = str(relative_module_name).replace(".py", "").replace("/", ".")
|
||||
# Keep only the top level namespace
|
||||
top_namespace = namespace.split(".")[0]
|
||||
|
||||
try:
|
||||
module_members = _load_module_members(
|
||||
f"{package_name}.{namespace}", namespace
|
||||
)
|
||||
# Merge module members if the namespace already exists
|
||||
if top_namespace in modules_by_namespace:
|
||||
existing_module_members = modules_by_namespace[top_namespace]
|
||||
_module_members = _merge_module_members(
|
||||
[existing_module_members, module_members]
|
||||
)
|
||||
# Merge module members if the namespace already exists
|
||||
if top_namespace in modules_by_namespace:
|
||||
existing_module_members = modules_by_namespace[top_namespace]
|
||||
_module_members = _merge_module_members(
|
||||
[existing_module_members, module_members]
|
||||
)
|
||||
else:
|
||||
_module_members = module_members
|
||||
else:
|
||||
_module_members = module_members
|
||||
|
||||
modules_by_namespace[top_namespace] = _module_members
|
||||
modules_by_namespace[top_namespace] = _module_members
|
||||
|
||||
except ImportError as e:
|
||||
print(f"Error: Unable to import module '{namespace}' with error: {e}")
|
||||
except ImportError as e:
|
||||
print(f"Error: Unable to import module '{namespace}' with error: {e}")
|
||||
|
||||
return modules_by_namespace
|
||||
|
||||
@@ -222,9 +228,9 @@ Classes
|
||||
"""
|
||||
|
||||
for class_ in classes:
|
||||
if not class_['is_public']:
|
||||
if not class_["is_public"]:
|
||||
continue
|
||||
|
||||
|
||||
if class_["kind"] == "TypedDict":
|
||||
template = "typeddict.rst"
|
||||
elif class_["kind"] == "enum":
|
||||
|
||||
54
docs/docs_skeleton/docs/community.md
Normal file
@@ -0,0 +1,54 @@
|
||||
# Community Navigator
|
||||
|
||||
Hi! Thanks for being here. We’re lucky to have a community of so many passionate developers building with LangChain–we have so much to teach and learn from each other. Community members contribute code, host meetups, write blog posts, amplify each other’s work, become each other's customers and collaborators, and so much more.
|
||||
|
||||
Whether you’re new to LangChain, looking to go deeper, or just want to get more exposure to the world of building with LLMs, this page can point you in the right direction.
|
||||
|
||||
- **🦜 Contribute to LangChain**
|
||||
|
||||
- **🌍 Meetups, Events, and Hackathons**
|
||||
|
||||
- **📣 Help Us Amplify Your Work**
|
||||
|
||||
- **💬 Stay in the loop**
|
||||
|
||||
|
||||
# 🦜 Contribute to LangChain
|
||||
|
||||
LangChain is the product of over 5,000+ contributions by 1,500+ contributors, and there is ******still****** so much to do together. Here are some ways to get involved:
|
||||
|
||||
- **[Open a pull request](https://github.com/langchain-ai/langchain/issues):** we’d appreciate all forms of contributions–new features, infrastructure improvements, better documentation, bug fixes, etc. If you have an improvement or an idea, we’d love to work on it with you.
|
||||
- **[Read our contributor guidelines:](https://github.com/langchain-ai/langchain/blob/bbd22b9b761389a5e40fc45b0570e1830aabb707/.github/CONTRIBUTING.md)** We ask contributors to follow a ["fork and pull request"](https://docs.github.com/en/get-started/quickstart/contributing-to-projects) workflow, run a few local checks for formatting, linting, and testing before submitting, and follow certain documentation and testing conventions.
|
||||
- **First time contributor?** [Try one of these PRs with the “good first issue” tag](https://github.com/langchain-ai/langchain/contribute).
|
||||
- **Become an expert:** our experts help the community by answering product questions in Discord. If that’s a role you’d like to play, we’d be so grateful! (And we have some special experts-only goodies/perks we can tell you more about). Send us an email to introduce yourself at hello@langchain.dev and we’ll take it from there!
|
||||
- **Integrate with LangChain:** if your product integrates with LangChain–or aspires to–we want to help make sure the experience is as smooth as possible for you and end users. Send us an email at hello@langchain.dev and tell us what you’re working on.
|
||||
- **Become an Integration Maintainer:** Partner with our team to ensure your integration stays up-to-date and talk directly with users (and answer their inquiries) in our Discord. Introduce yourself at hello@langchain.dev if you’d like to explore this role.
|
||||
|
||||
|
||||
# 🌍 Meetups, Events, and Hackathons
|
||||
|
||||
One of our favorite things about working in AI is how much enthusiasm there is for building together. We want to help make that as easy and impactful for you as possible!
|
||||
- **Find a meetup, hackathon, or webinar:** you can find the one for you on on our [global events calendar](https://mirror-feeling-d80.notion.site/0bc81da76a184297b86ca8fc782ee9a3?v=0d80342540df465396546976a50cfb3f).
|
||||
- **Submit an event to our calendar:** email us at events@langchain.dev with a link to your event page! We can also help you spread the word with our local communities.
|
||||
- **Host a meetup:** If you want to bring a group of builders together, we want to help! We can publicize your event on our event calendar/Twitter, share with our local communities in Discord, send swag, or potentially hook you up with a sponsor. Email us at events@langchain.dev to tell us about your event!
|
||||
- **Become a meetup sponsor:** we often hear from groups of builders that want to get together, but are blocked or limited on some dimension (space to host, budget for snacks, prizes to distribute, etc.). If you’d like to help, send us an email to events@langchain.dev we can share more about how it works!
|
||||
- **Speak at an event:** meetup hosts are always looking for great speakers, presenters, and panelists. If you’d like to do that at an event, send us an email to hello@langchain.dev with more information about yourself, what you want to talk about, and what city you’re based in and we’ll try to match you with an upcoming event!
|
||||
- **Tell us about your LLM community:** If you host or participate in a community that would welcome support from LangChain and/or our team, send us an email at hello@langchain.dev and let us know how we can help.
|
||||
|
||||
# 📣 Help Us Amplify Your Work
|
||||
|
||||
If you’re working on something you’re proud of, and think the LangChain community would benefit from knowing about it, we want to help you show it off.
|
||||
|
||||
- **Post about your work and mention us:** we love hanging out on Twitter to see what people in the space are talking about and working on. If you tag [@langchainai](https://twitter.com/LangChainAI), we’ll almost certainly see it and can show you some love.
|
||||
- **Publish something on our blog:** if you’re writing about your experience building with LangChain, we’d love to post (or crosspost) it on our blog! E-mail hello@langchain.dev with a draft of your post! Or even an idea for something you want to write about.
|
||||
- **Get your product onto our [integrations hub](https://integrations.langchain.com/):** Many developers take advantage of our seamless integrations with other products, and come to our integrations hub to find out who those are. If you want to get your product up there, tell us about it (and how it works with LangChain) at hello@langchain.dev.
|
||||
|
||||
# ☀️ Stay in the loop
|
||||
|
||||
Here’s where our team hangs out, talks shop, spotlights cool work, and shares what we’re up to. We’d love to see you there too.
|
||||
|
||||
- **[Twitter](https://twitter.com/LangChainAI):** we post about what we’re working on and what cool things we’re seeing in the space. If you tag @langchainai in your post, we’ll almost certainly see it, and can snow you some love!
|
||||
- **[Discord](https://discord.gg/6adMQxSpJS):** connect with with >30k developers who are building with LangChain
|
||||
- **[GitHub](https://github.com/langchain-ai/langchain):** open pull requests, contribute to a discussion, and/or contribute
|
||||
- **[Subscribe to our bi-weekly Release Notes](https://6w1pwbss0py.typeform.com/to/KjZB1auB):** a twice/month email roundup of the coolest things going on in our orbit
|
||||
- **Slack:** if you’re building an application in production at your company, we’d love to get into a Slack channel together. Fill out [this form](https://airtable.com/appwQzlErAS2qiP0L/shrGtGaVBVAz7NcV2) and we’ll get in touch about setting one up.
|
||||
@@ -8,9 +8,9 @@ import DocCardList from "@theme/DocCardList";
|
||||
|
||||
Building applications with language models involves many moving parts. One of the most critical components is ensuring that the outcomes produced by your models are reliable and useful across a broad array of inputs, and that they work well with your application's other software components. Ensuring reliability usually boils down to some combination of application design, testing & evaluation, and runtime checks.
|
||||
|
||||
The guides in this section review the APIs and functionality LangChain provides to help yous better evaluate your applications. Evaluation and testing are both critical when thinking about deploying LLM applications, since production environments require repeatable and useful outcomes.
|
||||
The guides in this section review the APIs and functionality LangChain provides to help you better evaluate your applications. Evaluation and testing are both critical when thinking about deploying LLM applications, since production environments require repeatable and useful outcomes.
|
||||
|
||||
LangChain offers various types of evaluators to help you measure performance and integrity on diverse data, and we hope to encourage the the community to create and share other useful evaluators so everyone can improve. These docs will introduce the evaluator types, how to use them, and provide some examples of their use in real-world scenarios.
|
||||
LangChain offers various types of evaluators to help you measure performance and integrity on diverse data, and we hope to encourage the community to create and share other useful evaluators so everyone can improve. These docs will introduce the evaluator types, how to use them, and provide some examples of their use in real-world scenarios.
|
||||
|
||||
Each evaluator type in LangChain comes with ready-to-use implementations and an extensible API that allows for customization according to your unique requirements. Here are some of the types of evaluators we offer:
|
||||
|
||||
|
||||
@@ -5,8 +5,8 @@ import DocCardList from "@theme/DocCardList";
|
||||
LangSmith helps you trace and evaluate your language model applications and intelligent agents to help you
|
||||
move from prototype to production.
|
||||
|
||||
Check out the [interactive walkthrough](walkthrough) below to get started.
|
||||
Check out the [interactive walkthrough](/docs/guides/langsmith/walkthrough) below to get started.
|
||||
|
||||
For more information, please refer to the [LangSmith documentation](https://docs.smith.langchain.com/)
|
||||
|
||||
<DocCardList />
|
||||
<DocCardList />
|
||||
|
||||
@@ -12,7 +12,7 @@ Here are the agents available in LangChain.
|
||||
|
||||
### [Zero-shot ReAct](/docs/modules/agents/agent_types/react.html)
|
||||
|
||||
This agent uses the [ReAct](https://arxiv.org/pdf/2205.00445.pdf) framework to determine which tool to use
|
||||
This agent uses the [ReAct](https://arxiv.org/pdf/2210.03629) framework to determine which tool to use
|
||||
based solely on the tool's description. Any number of tools can be provided.
|
||||
This agent requires that a description is provided for each tool.
|
||||
|
||||
|
||||
@@ -18,5 +18,3 @@ Let chains choose which tools to use given high-level directives
|
||||
Persist application state between runs of a chain
|
||||
#### [Callbacks](/docs/modules/callbacks/)
|
||||
Log and stream intermediate steps of any chain
|
||||
#### [Evaluation](/docs/modules/evaluation/)
|
||||
Evaluate the performance of a chain.
|
||||
@@ -1,9 +0,0 @@
|
||||
---
|
||||
sidebar_position: 0
|
||||
---
|
||||
# API chains
|
||||
APIChain enables using LLMs to interact with APIs to retrieve relevant information. Construct the chain by providing a question relevant to the provided API documentation.
|
||||
|
||||
import Example from "@snippets/modules/chains/popular/api.mdx"
|
||||
|
||||
<Example/>
|
||||
9
docs/docs_skeleton/docs/use_cases/web_scraping/index.mdx
Normal file
@@ -0,0 +1,9 @@
|
||||
---
|
||||
sidebar_position: 3
|
||||
---
|
||||
|
||||
# Web Scraping
|
||||
|
||||
Web scraping has historically been a challenging endeavor due to the ever-changing nature of website structures, making it tedious for developers to maintain their scraping scripts. Traditional methods often rely on specific HTML tags and patterns which, when altered, can disrupt data extraction processes.
|
||||
|
||||
Enter the LLM-based method for parsing HTML: By leveraging the capabilities of LLMs, and especially OpenAI Functions in LangChain's extraction chain, developers can instruct the model to extract only the desired data in a specified format. This method not only streamlines the extraction process but also significantly reduces the time spent on manual debugging and script modifications. Its adaptability means that even if websites undergo significant design changes, the extraction remains consistent and robust. This level of resilience translates to reduced maintenance efforts, cost savings, and ensures a higher quality of extracted data. Compared to its predecessors, LLM-based approach wins out the web scraping domain by transforming a historically cumbersome task into a more automated and efficient process.
|
||||
@@ -128,6 +128,10 @@ const config = {
|
||||
hideable: true,
|
||||
},
|
||||
},
|
||||
colorMode: {
|
||||
disableSwitch: false,
|
||||
respectPrefersColorScheme: true,
|
||||
},
|
||||
prism: {
|
||||
theme: {
|
||||
...baseLightCodeBlockTheme,
|
||||
|
||||
71
docs/docs_skeleton/package-lock.json
generated
@@ -12,7 +12,7 @@
|
||||
"@docusaurus/preset-classic": "2.4.0",
|
||||
"@docusaurus/remark-plugin-npm2yarn": "^2.4.0",
|
||||
"@mdx-js/react": "^1.6.22",
|
||||
"@mendable/search": "^0.0.125",
|
||||
"@mendable/search": "^0.0.150",
|
||||
"clsx": "^1.2.1",
|
||||
"json-loader": "^0.5.7",
|
||||
"process": "^0.11.10",
|
||||
@@ -3212,10 +3212,11 @@
|
||||
}
|
||||
},
|
||||
"node_modules/@mendable/search": {
|
||||
"version": "0.0.125",
|
||||
"resolved": "https://registry.npmjs.org/@mendable/search/-/search-0.0.125.tgz",
|
||||
"integrity": "sha512-Mb1J3zDhOyBZV9cXqJocSOBNYGpe8+LQDqd9n9laPWxosSJcSTUewqtlIbMerrYsScBsxskoSiWgRsc7xF5z0Q==",
|
||||
"version": "0.0.150",
|
||||
"resolved": "https://registry.npmjs.org/@mendable/search/-/search-0.0.150.tgz",
|
||||
"integrity": "sha512-Eb5SeAWlMxzEim/8eJ/Ysn01Pyh39xlPBzRBw/5OyOBhti0HVLXk4wd1Fq2TKgJC2ppQIvhEKO98PUcj9dNDFw==",
|
||||
"dependencies": {
|
||||
"html-react-parser": "^4.2.0",
|
||||
"posthog-js": "^1.45.1"
|
||||
},
|
||||
"peerDependencies": {
|
||||
@@ -8332,6 +8333,33 @@
|
||||
"safe-buffer": "~5.1.0"
|
||||
}
|
||||
},
|
||||
"node_modules/html-dom-parser": {
|
||||
"version": "4.0.0",
|
||||
"resolved": "https://registry.npmjs.org/html-dom-parser/-/html-dom-parser-4.0.0.tgz",
|
||||
"integrity": "sha512-TUa3wIwi80f5NF8CVWzkopBVqVAtlawUzJoLwVLHns0XSJGynss4jiY0mTWpiDOsuyw+afP+ujjMgRh9CoZcXw==",
|
||||
"dependencies": {
|
||||
"domhandler": "5.0.3",
|
||||
"htmlparser2": "9.0.0"
|
||||
}
|
||||
},
|
||||
"node_modules/html-dom-parser/node_modules/htmlparser2": {
|
||||
"version": "9.0.0",
|
||||
"resolved": "https://registry.npmjs.org/htmlparser2/-/htmlparser2-9.0.0.tgz",
|
||||
"integrity": "sha512-uxbSI98wmFT/G4P2zXx4OVx04qWUmyFPrD2/CNepa2Zo3GPNaCaaxElDgwUrwYWkK1nr9fft0Ya8dws8coDLLQ==",
|
||||
"funding": [
|
||||
"https://github.com/fb55/htmlparser2?sponsor=1",
|
||||
{
|
||||
"type": "github",
|
||||
"url": "https://github.com/sponsors/fb55"
|
||||
}
|
||||
],
|
||||
"dependencies": {
|
||||
"domelementtype": "^2.3.0",
|
||||
"domhandler": "^5.0.3",
|
||||
"domutils": "^3.1.0",
|
||||
"entities": "^4.5.0"
|
||||
}
|
||||
},
|
||||
"node_modules/html-entities": {
|
||||
"version": "2.4.0",
|
||||
"resolved": "https://registry.npmjs.org/html-entities/-/html-entities-2.4.0.tgz",
|
||||
@@ -8375,6 +8403,20 @@
|
||||
"node": ">= 12"
|
||||
}
|
||||
},
|
||||
"node_modules/html-react-parser": {
|
||||
"version": "4.2.0",
|
||||
"resolved": "https://registry.npmjs.org/html-react-parser/-/html-react-parser-4.2.0.tgz",
|
||||
"integrity": "sha512-gzU55AS+FI6qD7XaKe5BLuLFM2Xw0/LodfMWZlxV9uOHe7LCD5Lukx/EgYuBI3c0kLu0XlgFXnSzO0qUUn3Vrg==",
|
||||
"dependencies": {
|
||||
"domhandler": "5.0.3",
|
||||
"html-dom-parser": "4.0.0",
|
||||
"react-property": "2.0.0",
|
||||
"style-to-js": "1.1.3"
|
||||
},
|
||||
"peerDependencies": {
|
||||
"react": "0.14 || 15 || 16 || 17 || 18"
|
||||
}
|
||||
},
|
||||
"node_modules/html-tags": {
|
||||
"version": "3.3.1",
|
||||
"resolved": "https://registry.npmjs.org/html-tags/-/html-tags-3.3.1.tgz",
|
||||
@@ -11762,6 +11804,11 @@
|
||||
"webpack": ">=4.41.1 || 5.x"
|
||||
}
|
||||
},
|
||||
"node_modules/react-property": {
|
||||
"version": "2.0.0",
|
||||
"resolved": "https://registry.npmjs.org/react-property/-/react-property-2.0.0.tgz",
|
||||
"integrity": "sha512-kzmNjIgU32mO4mmH5+iUyrqlpFQhF8K2k7eZ4fdLSOPFrD1XgEuSBv9LDEgxRXTMBqMd8ppT0x6TIzqE5pdGdw=="
|
||||
},
|
||||
"node_modules/react-router": {
|
||||
"version": "5.3.4",
|
||||
"resolved": "https://registry.npmjs.org/react-router/-/react-router-5.3.4.tgz",
|
||||
@@ -13127,6 +13174,22 @@
|
||||
"url": "https://github.com/sponsors/sindresorhus"
|
||||
}
|
||||
},
|
||||
"node_modules/style-to-js": {
|
||||
"version": "1.1.3",
|
||||
"resolved": "https://registry.npmjs.org/style-to-js/-/style-to-js-1.1.3.tgz",
|
||||
"integrity": "sha512-zKI5gN/zb7LS/Vm0eUwjmjrXWw8IMtyA8aPBJZdYiQTXj4+wQ3IucOLIOnF7zCHxvW8UhIGh/uZh/t9zEHXNTQ==",
|
||||
"dependencies": {
|
||||
"style-to-object": "0.4.1"
|
||||
}
|
||||
},
|
||||
"node_modules/style-to-js/node_modules/style-to-object": {
|
||||
"version": "0.4.1",
|
||||
"resolved": "https://registry.npmjs.org/style-to-object/-/style-to-object-0.4.1.tgz",
|
||||
"integrity": "sha512-HFpbb5gr2ypci7Qw+IOhnP2zOU7e77b+rzM+wTzXzfi1PrtBCX0E7Pk4wL4iTLnhzZ+JgEGAhX81ebTg/aYjQw==",
|
||||
"dependencies": {
|
||||
"inline-style-parser": "0.1.1"
|
||||
}
|
||||
},
|
||||
"node_modules/style-to-object": {
|
||||
"version": "0.3.0",
|
||||
"resolved": "https://registry.npmjs.org/style-to-object/-/style-to-object-0.3.0.tgz",
|
||||
|
||||
@@ -23,7 +23,7 @@
|
||||
"@docusaurus/preset-classic": "2.4.0",
|
||||
"@docusaurus/remark-plugin-npm2yarn": "^2.4.0",
|
||||
"@mdx-js/react": "^1.6.22",
|
||||
"@mendable/search": "^0.0.125",
|
||||
"@mendable/search": "^0.0.150",
|
||||
"clsx": "^1.2.1",
|
||||
"json-loader": "^0.5.7",
|
||||
"process": "^0.11.10",
|
||||
|
||||
@@ -75,6 +75,7 @@ module.exports = {
|
||||
slug: "additional_resources",
|
||||
},
|
||||
},
|
||||
'community'
|
||||
],
|
||||
integrations: [
|
||||
{
|
||||
|
||||
BIN
docs/docs_skeleton/static/img/SQLDatabaseToolkit.png
Normal file
|
After Width: | Height: | Size: 405 KiB |
BIN
docs/docs_skeleton/static/img/api_chain.png
Normal file
|
After Width: | Height: | Size: 471 KiB |
BIN
docs/docs_skeleton/static/img/api_chain_response.png
Normal file
|
After Width: | Height: | Size: 520 KiB |
BIN
docs/docs_skeleton/static/img/api_function_call.png
Normal file
|
After Width: | Height: | Size: 98 KiB |
BIN
docs/docs_skeleton/static/img/api_use_case.png
Normal file
|
After Width: | Height: | Size: 117 KiB |
BIN
docs/docs_skeleton/static/img/code_retrieval.png
Normal file
|
After Width: | Height: | Size: 307 KiB |
BIN
docs/docs_skeleton/static/img/code_understanding.png
Normal file
|
After Width: | Height: | Size: 193 KiB |
BIN
docs/docs_skeleton/static/img/create_sql_query_chain.png
Normal file
|
After Width: | Height: | Size: 190 KiB |
BIN
docs/docs_skeleton/static/img/sql_usecase.png
Normal file
|
After Width: | Height: | Size: 119 KiB |
BIN
docs/docs_skeleton/static/img/sqldbchain_trace.png
Normal file
|
After Width: | Height: | Size: 266 KiB |
BIN
docs/docs_skeleton/static/img/tagging.png
Normal file
|
After Width: | Height: | Size: 111 KiB |
BIN
docs/docs_skeleton/static/img/tagging_trace.png
Normal file
|
After Width: | Height: | Size: 130 KiB |
BIN
docs/docs_skeleton/static/img/web_research.png
Normal file
|
After Width: | Height: | Size: 152 KiB |
BIN
docs/docs_skeleton/static/img/web_scraping.png
Normal file
|
After Width: | Height: | Size: 172 KiB |
BIN
docs/docs_skeleton/static/img/wsj_page.png
Normal file
|
After Width: | Height: | Size: 716 KiB |
@@ -556,6 +556,14 @@
|
||||
"source": "/docs/integrations/llamacpp",
|
||||
"destination": "/docs/integrations/providers/llamacpp"
|
||||
},
|
||||
{
|
||||
"source": "/en/latest/integrations/log10.html",
|
||||
"destination": "/docs/integrations/providers/log10"
|
||||
},
|
||||
{
|
||||
"source": "/docs/integrations/log10",
|
||||
"destination": "/docs/integrations/providers/log10"
|
||||
},
|
||||
{
|
||||
"source": "/en/latest/integrations/mediawikidump.html",
|
||||
"destination": "/docs/integrations/providers/mediawikidump"
|
||||
|
||||
323
docs/extras/guides/adapters/openai.ipynb
Normal file
@@ -0,0 +1,323 @@
|
||||
{
|
||||
"cells": [
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "700a516b",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# OpenAI Adapter\n",
|
||||
"\n",
|
||||
"A lot of people get started with OpenAI but want to explore other models. LangChain's integrations with many model providers make this easy to do so. While LangChain has it's own message and model APIs, we've also made it as easy as possible to explore other models by exposing an adapter to adapt LangChain models to the OpenAI api.\n",
|
||||
"\n",
|
||||
"At the moment this only deals with output and does not return other information (token counts, stop reasons, etc)."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 1,
|
||||
"id": "6017f26a",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"import openai\n",
|
||||
"from langchain.adapters import openai as lc_openai"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "b522ceda",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## ChatCompletion.create"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 29,
|
||||
"id": "1d22eb61",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"messages = [{\"role\": \"user\", \"content\": \"hi\"}]"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "d550d3ad",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Original OpenAI call"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 14,
|
||||
"id": "e1d27dfa",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"result = openai.ChatCompletion.create(\n",
|
||||
" messages=messages, \n",
|
||||
" model=\"gpt-3.5-turbo\", \n",
|
||||
" temperature=0\n",
|
||||
")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 15,
|
||||
"id": "012d81ae",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"{'role': 'assistant', 'content': 'Hello! How can I assist you today?'}"
|
||||
]
|
||||
},
|
||||
"execution_count": 15,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"result[\"choices\"][0]['message'].to_dict_recursive()"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "db5b5500",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"LangChain OpenAI wrapper call"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 16,
|
||||
"id": "87c2d515",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"lc_result = lc_openai.ChatCompletion.create(\n",
|
||||
" messages=messages, \n",
|
||||
" model=\"gpt-3.5-turbo\", \n",
|
||||
" temperature=0\n",
|
||||
")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 17,
|
||||
"id": "c67a5ac8",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"{'role': 'assistant', 'content': 'Hello! How can I assist you today?'}"
|
||||
]
|
||||
},
|
||||
"execution_count": 17,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"lc_result[\"choices\"][0]['message']"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "034ba845",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Swapping out model providers"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 18,
|
||||
"id": "7a2c011c",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"lc_result = lc_openai.ChatCompletion.create(\n",
|
||||
" messages=messages, \n",
|
||||
" model=\"claude-2\", \n",
|
||||
" temperature=0, \n",
|
||||
" provider=\"ChatAnthropic\"\n",
|
||||
")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 19,
|
||||
"id": "f7c94827",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"{'role': 'assistant', 'content': ' Hello!'}"
|
||||
]
|
||||
},
|
||||
"execution_count": 19,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"lc_result[\"choices\"][0]['message']"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "cb3f181d",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## ChatCompletion.stream"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "f7b8cd18",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Original OpenAI call"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 24,
|
||||
"id": "fd8cb1ea",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"{'role': 'assistant', 'content': ''}\n",
|
||||
"{'content': 'Hello'}\n",
|
||||
"{'content': '!'}\n",
|
||||
"{'content': ' How'}\n",
|
||||
"{'content': ' can'}\n",
|
||||
"{'content': ' I'}\n",
|
||||
"{'content': ' assist'}\n",
|
||||
"{'content': ' you'}\n",
|
||||
"{'content': ' today'}\n",
|
||||
"{'content': '?'}\n",
|
||||
"{}\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"for c in openai.ChatCompletion.create(\n",
|
||||
" messages = messages,\n",
|
||||
" model=\"gpt-3.5-turbo\", \n",
|
||||
" temperature=0,\n",
|
||||
" stream=True\n",
|
||||
"):\n",
|
||||
" print(c[\"choices\"][0]['delta'].to_dict_recursive())"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "0b2a076b",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"LangChain OpenAI wrapper call"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 30,
|
||||
"id": "9521218c",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"{'role': 'assistant', 'content': ''}\n",
|
||||
"{'content': 'Hello'}\n",
|
||||
"{'content': '!'}\n",
|
||||
"{'content': ' How'}\n",
|
||||
"{'content': ' can'}\n",
|
||||
"{'content': ' I'}\n",
|
||||
"{'content': ' assist'}\n",
|
||||
"{'content': ' you'}\n",
|
||||
"{'content': ' today'}\n",
|
||||
"{'content': '?'}\n",
|
||||
"{}\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"for c in lc_openai.ChatCompletion.create(\n",
|
||||
" messages = messages,\n",
|
||||
" model=\"gpt-3.5-turbo\", \n",
|
||||
" temperature=0,\n",
|
||||
" stream=True\n",
|
||||
"):\n",
|
||||
" print(c[\"choices\"][0]['delta'])"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "0fc39750",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Swapping out model providers"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 31,
|
||||
"id": "68f0214e",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"{'role': 'assistant', 'content': ' Hello'}\n",
|
||||
"{'content': '!'}\n",
|
||||
"{}\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"for c in lc_openai.ChatCompletion.create(\n",
|
||||
" messages = messages,\n",
|
||||
" model=\"claude-2\", \n",
|
||||
" temperature=0,\n",
|
||||
" stream=True,\n",
|
||||
" provider=\"ChatAnthropic\",\n",
|
||||
"):\n",
|
||||
" print(c[\"choices\"][0]['delta'])"
|
||||
]
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
"kernelspec": {
|
||||
"display_name": "Python 3 (ipykernel)",
|
||||
"language": "python",
|
||||
"name": "python3"
|
||||
},
|
||||
"language_info": {
|
||||
"codemirror_mode": {
|
||||
"name": "ipython",
|
||||
"version": 3
|
||||
},
|
||||
"file_extension": ".py",
|
||||
"mimetype": "text/x-python",
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.10.1"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
"nbformat_minor": 5
|
||||
}
|
||||
@@ -22,7 +22,7 @@
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 10,
|
||||
"execution_count": 1,
|
||||
"id": "466b65b3",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
@@ -171,9 +171,7 @@
|
||||
"cell_type": "code",
|
||||
"execution_count": 9,
|
||||
"id": "decf7710",
|
||||
"metadata": {
|
||||
"scrolled": false
|
||||
},
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
@@ -202,7 +200,7 @@
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 12,
|
||||
"execution_count": 10,
|
||||
"id": "f799664d",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
@@ -347,7 +345,7 @@
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 13,
|
||||
"execution_count": 12,
|
||||
"id": "5d3d8ffe",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
@@ -368,7 +366,7 @@
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 14,
|
||||
"execution_count": 2,
|
||||
"id": "33be32af",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
@@ -380,7 +378,7 @@
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 15,
|
||||
"execution_count": 3,
|
||||
"id": "df3f3fa2",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
@@ -424,9 +422,7 @@
|
||||
"cell_type": "code",
|
||||
"execution_count": 18,
|
||||
"id": "f3040b0c",
|
||||
"metadata": {
|
||||
"scrolled": false
|
||||
},
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stderr",
|
||||
@@ -477,9 +473,7 @@
|
||||
"cell_type": "code",
|
||||
"execution_count": 20,
|
||||
"id": "7ee8b2d4",
|
||||
"metadata": {
|
||||
"scrolled": false
|
||||
},
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stderr",
|
||||
@@ -515,7 +509,7 @@
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 66,
|
||||
"execution_count": 4,
|
||||
"id": "3f30c348",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
@@ -526,7 +520,7 @@
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 21,
|
||||
"execution_count": 5,
|
||||
"id": "64ab1dbf",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
@@ -544,7 +538,7 @@
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 22,
|
||||
"execution_count": 6,
|
||||
"id": "7d628c97",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
@@ -559,7 +553,7 @@
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 68,
|
||||
"execution_count": 7,
|
||||
"id": "f60a5d0f",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
@@ -572,7 +566,7 @@
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 69,
|
||||
"execution_count": 8,
|
||||
"id": "7d007db6",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
@@ -589,25 +583,29 @@
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 70,
|
||||
"execution_count": 16,
|
||||
"id": "5c32cc89",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"conversational_qa_chain = RunnableMap({\n",
|
||||
" \"standalone_question\": {\n",
|
||||
" \"question\": lambda x: x[\"question\"],\n",
|
||||
" \"chat_history\": lambda x: _format_chat_history(x['chat_history'])\n",
|
||||
" } | CONDENSE_QUESTION_PROMPT | ChatOpenAI(temperature=0) | StrOutputParser(),\n",
|
||||
"}) | {\n",
|
||||
"_inputs = RunnableMap(\n",
|
||||
" {\n",
|
||||
" \"standalone_question\": {\n",
|
||||
" \"question\": lambda x: x[\"question\"],\n",
|
||||
" \"chat_history\": lambda x: _format_chat_history(x['chat_history'])\n",
|
||||
" } | CONDENSE_QUESTION_PROMPT | ChatOpenAI(temperature=0) | StrOutputParser(),\n",
|
||||
" }\n",
|
||||
")\n",
|
||||
"_context = {\n",
|
||||
" \"context\": itemgetter(\"standalone_question\") | retriever | _combine_documents,\n",
|
||||
" \"question\": lambda x: x[\"standalone_question\"]\n",
|
||||
"} | ANSWER_PROMPT | ChatOpenAI()"
|
||||
"}\n",
|
||||
"conversational_qa_chain = _inputs | _context | ANSWER_PROMPT | ChatOpenAI()"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 71,
|
||||
"execution_count": 17,
|
||||
"id": "135c8205",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
@@ -624,7 +622,7 @@
|
||||
"AIMessage(content='Harrison was employed at Kensho.', additional_kwargs={}, example=False)"
|
||||
]
|
||||
},
|
||||
"execution_count": 71,
|
||||
"execution_count": 17,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
@@ -638,7 +636,7 @@
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 62,
|
||||
"execution_count": 15,
|
||||
"id": "424e7e7a",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
@@ -655,7 +653,7 @@
|
||||
"AIMessage(content='Harrison worked at Kensho.', additional_kwargs={}, example=False)"
|
||||
]
|
||||
},
|
||||
"execution_count": 62,
|
||||
"execution_count": 15,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
@@ -667,6 +665,149 @@
|
||||
"})"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "c5543183",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### With Memory and returning source documents\n",
|
||||
"\n",
|
||||
"This shows how to use memory with the above. For memory, we need to manage that outside at the memory. For returning the retrieved documents, we just need to pass them through all the way."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 18,
|
||||
"id": "e31dd17c",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langchain.memory import ConversationBufferMemory"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 44,
|
||||
"id": "d4bffe94",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"memory = ConversationBufferMemory(return_messages=True, output_key=\"answer\", input_key=\"question\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 45,
|
||||
"id": "733be985",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# First we add a step to load memory\n",
|
||||
"# This needs to be a RunnableMap because its the first input\n",
|
||||
"loaded_memory = RunnableMap(\n",
|
||||
" {\n",
|
||||
" \"question\": itemgetter(\"question\"),\n",
|
||||
" \"memory\": memory.load_memory_variables,\n",
|
||||
" }\n",
|
||||
")\n",
|
||||
"# Next we add a step to expand memory into the variables\n",
|
||||
"expanded_memory = {\n",
|
||||
" \"question\": itemgetter(\"question\"),\n",
|
||||
" \"chat_history\": lambda x: x[\"memory\"][\"history\"]\n",
|
||||
"}\n",
|
||||
"\n",
|
||||
"# Now we calculate the standalone question\n",
|
||||
"standalone_question = {\n",
|
||||
" \"standalone_question\": {\n",
|
||||
" \"question\": lambda x: x[\"question\"],\n",
|
||||
" \"chat_history\": lambda x: _format_chat_history(x['chat_history'])\n",
|
||||
" } | CONDENSE_QUESTION_PROMPT | ChatOpenAI(temperature=0) | StrOutputParser(),\n",
|
||||
"}\n",
|
||||
"# Now we retrieve the documents\n",
|
||||
"retrieved_documents = {\n",
|
||||
" \"docs\": itemgetter(\"standalone_question\") | retriever,\n",
|
||||
" \"question\": lambda x: x[\"standalone_question\"]\n",
|
||||
"}\n",
|
||||
"# Now we construct the inputs for the final prompt\n",
|
||||
"final_inputs = {\n",
|
||||
" \"context\": lambda x: _combine_documents(x[\"docs\"]),\n",
|
||||
" \"question\": itemgetter(\"question\")\n",
|
||||
"}\n",
|
||||
"# And finally, we do the part that returns the answers\n",
|
||||
"answer = {\n",
|
||||
" \"answer\": final_inputs | ANSWER_PROMPT | ChatOpenAI(),\n",
|
||||
" \"docs\": itemgetter(\"docs\"),\n",
|
||||
"}\n",
|
||||
"# And now we put it all together!\n",
|
||||
"final_chain = loaded_memory | expanded_memory | standalone_question | retrieved_documents | answer"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 46,
|
||||
"id": "806e390c",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stderr",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"Number of requested results 4 is greater than number of elements in index 1, updating n_results = 1\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"{'answer': AIMessage(content='Harrison was employed at Kensho.', additional_kwargs={}, example=False),\n",
|
||||
" 'docs': [Document(page_content='harrison worked at kensho', metadata={})]}"
|
||||
]
|
||||
},
|
||||
"execution_count": 46,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"inputs = {\"question\": \"where did harrison work?\"}\n",
|
||||
"result = final_chain.invoke(inputs)\n",
|
||||
"result"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 47,
|
||||
"id": "977399fd",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# Note that the memory does not save automatically\n",
|
||||
"# This will be improved in the future\n",
|
||||
"# For now you need to save it yourself\n",
|
||||
"memory.save_context(inputs, {\"answer\": result[\"answer\"].content})"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 48,
|
||||
"id": "f94f7de4",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"{'history': [HumanMessage(content='where did harrison work?', additional_kwargs={}, example=False),\n",
|
||||
" AIMessage(content='Harrison was employed at Kensho.', additional_kwargs={}, example=False)]}"
|
||||
]
|
||||
},
|
||||
"execution_count": 48,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"memory.load_memory_variables({})"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "0f2bf8d3",
|
||||
@@ -1391,10 +1532,299 @@
|
||||
"response"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "4927a727-b4c8-453c-8c83-bd87b4fcac14",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Moderation\n",
|
||||
"\n",
|
||||
"This shows how to add in moderation (or other safeguards) around your LLM application."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 26,
|
||||
"id": "4f5f6449-940a-4f5c-97c0-39b71c3e2a68",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langchain.chains import OpenAIModerationChain\n",
|
||||
"from langchain.llms import OpenAI"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 35,
|
||||
"id": "fcb8312b-7e7a-424f-a3ec-76738c9a9d21",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"moderate = OpenAIModerationChain()"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 32,
|
||||
"id": "b24b9148-f6b0-4091-8ea8-d3fb281bd950",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"model = OpenAI()\n",
|
||||
"prompt = ChatPromptTemplate.from_messages([\n",
|
||||
" (\"system\", \"repeat after me: {input}\")\n",
|
||||
"])"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 33,
|
||||
"id": "1c8ed87c-9ca6-4559-bf60-d40e94a0af08",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"chain = prompt | model"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 34,
|
||||
"id": "5256b9bd-381a-42b0-bfa8-7e6d18f853cb",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"'\\n\\nYou are stupid.'"
|
||||
]
|
||||
},
|
||||
"execution_count": 34,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"chain.invoke({\"input\": \"you are stupid\"})"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 36,
|
||||
"id": "fe6e3b33-dc9a-49d5-b194-ba750c58a628",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"moderated_chain = chain | moderate"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 37,
|
||||
"id": "d8ba0cbd-c739-4d23-be9f-6ae092bd5ffb",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"{'input': '\\n\\nYou are stupid.',\n",
|
||||
" 'output': \"Text was found that violates OpenAI's content policy.\"}"
|
||||
]
|
||||
},
|
||||
"execution_count": 37,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"moderated_chain.invoke({\"input\": \"you are stupid\"})"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "179d3c03",
|
||||
"id": "a0a85ba4-f782-47b8-b16f-8b7a61d6dab7",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"## Conversational Retrieval With Memory"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "92c87dd8-bb6f-4f32-a30d-8f5459ce6265",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Fallbacks\n",
|
||||
"\n",
|
||||
"With LCEL you can easily introduce fallbacks for any Runnable component, like an LLM."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 2,
|
||||
"id": "1b1cb744-31fc-4261-ab25-65fe1fcad559",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"AIMessage(content='To get to the other side.', additional_kwargs={}, example=False)"
|
||||
]
|
||||
},
|
||||
"execution_count": 2,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"from langchain.chat_models import ChatOpenAI\n",
|
||||
"\n",
|
||||
"bad_llm = ChatOpenAI(model_name=\"gpt-fake\")\n",
|
||||
"good_llm = ChatOpenAI(model_name=\"gpt-3.5-turbo\")\n",
|
||||
"llm = bad_llm.with_fallbacks([good_llm])\n",
|
||||
"\n",
|
||||
"llm.invoke(\"Why did the the chicken cross the road?\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "b8cf3982-03f6-49b3-8ff5-7cd12444f19c",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Looking at the trace, we can see that the first model failed but the second succeeded, so we still got an output: https://smith.langchain.com/public/dfaf0bf6-d86d-43e9-b084-dd16a56df15c/r\n",
|
||||
"\n",
|
||||
"We can add an arbitrary sequence of fallbacks, which will be executed in order:"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 3,
|
||||
"id": "31819be0-7f40-4e67-b5ab-61340027b948",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"AIMessage(content='To get to the other side.', additional_kwargs={}, example=False)"
|
||||
]
|
||||
},
|
||||
"execution_count": 3,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"llm = bad_llm.with_fallbacks([bad_llm, bad_llm, good_llm])\n",
|
||||
"\n",
|
||||
"llm.invoke(\"Why did the the chicken cross the road?\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "acad6e88-8046-450e-b005-db7e50f33b80",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Trace: https://smith.langchain.com/public/c09efd01-3184-4369-a225-c9da8efcaf47/r\n",
|
||||
"\n",
|
||||
"We can continue to use our Runnable with fallbacks the same way we use any Runnable, mean we can include it in sequences:"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 4,
|
||||
"id": "bab114a1-bb93-4b7e-a639-e7e00f21aebc",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"AIMessage(content='To show off its incredible jumping skills! Kangaroos are truly amazing creatures.', additional_kwargs={}, example=False)"
|
||||
]
|
||||
},
|
||||
"execution_count": 4,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"from langchain.prompts import ChatPromptTemplate\n",
|
||||
"\n",
|
||||
"prompt = ChatPromptTemplate.from_messages(\n",
|
||||
" [\n",
|
||||
" (\"system\", \"You're a nice assistant who always includes a compliment in your response\"),\n",
|
||||
" (\"human\", \"Why did the {animal} cross the road\"),\n",
|
||||
" ]\n",
|
||||
")\n",
|
||||
"chain = prompt | llm\n",
|
||||
"chain.invoke({\"animal\": \"kangaroo\"})"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "58340afa-8187-4ffe-9bd2-7912fb733a15",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Trace: https://smith.langchain.com/public/ba03895f-f8bd-4c70-81b7-8b930353eabd/r\n",
|
||||
"\n",
|
||||
"Note, since every sequence of Runnables is itself a Runnable, we can create fallbacks for whole Sequences. We can also continue using the full interface, including asynchronous calls, batched calls, and streams:"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 5,
|
||||
"id": "45aa3170-b2e6-430d-887b-bd879048060a",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"[\"\\n\\nAnswer: The rabbit crossed the road to get to the other side. That's quite clever of him!\",\n",
|
||||
" '\\n\\nAnswer: The turtle crossed the road to get to the other side. You must be pretty clever to come up with that riddle!']"
|
||||
]
|
||||
},
|
||||
"execution_count": 5,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"from langchain.llms import OpenAI\n",
|
||||
"from langchain.prompts import PromptTemplate\n",
|
||||
"\n",
|
||||
"chat_prompt = ChatPromptTemplate.from_messages(\n",
|
||||
" [\n",
|
||||
" (\"system\", \"You're a nice assistant who always includes a compliment in your response\"),\n",
|
||||
" (\"human\", \"Why did the {animal} cross the road\"),\n",
|
||||
" ]\n",
|
||||
")\n",
|
||||
"chat_model = ChatOpenAI(model_name=\"gpt-fake\")\n",
|
||||
"\n",
|
||||
"prompt_template = \"\"\"Instructions: You should always include a compliment in your response.\n",
|
||||
"\n",
|
||||
"Question: Why did the {animal} cross the road?\"\"\"\n",
|
||||
"prompt = PromptTemplate.from_template(prompt_template)\n",
|
||||
"llm = OpenAI()\n",
|
||||
"\n",
|
||||
"bad_chain = chat_prompt | chat_model\n",
|
||||
"good_chain = prompt | llm\n",
|
||||
"chain = bad_chain.with_fallbacks([good_chain])\n",
|
||||
"await chain.abatch([{\"animal\": \"rabbit\"}, {\"animal\": \"turtle\"}])"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "af6731c6-0c73-4b1d-a433-6e8f6ecce2bb",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Traces: \n",
|
||||
"1. https://smith.langchain.com/public/ccd73236-9ae5-48a6-94b5-41210be18a46/r\n",
|
||||
"2. https://smith.langchain.com/public/f43f608e-075c-45c7-bf73-b64e4d3f3082/r"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "3d2fe1fe-506b-4ee5-8056-8b9df801765f",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": []
|
||||
|
||||
@@ -41,7 +41,7 @@
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 20,
|
||||
"execution_count": 1,
|
||||
"id": "466b65b3",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
@@ -108,7 +108,7 @@
|
||||
],
|
||||
"source": [
|
||||
"for s in chain.stream({\"topic\": \"bears\"}):\n",
|
||||
" print(s.content, end=\"\")"
|
||||
" print(s.content, end=\"\", flush=True)"
|
||||
]
|
||||
},
|
||||
{
|
||||
@@ -196,7 +196,7 @@
|
||||
],
|
||||
"source": [
|
||||
"async for s in chain.astream({\"topic\": \"bears\"}):\n",
|
||||
" print(s.content, end=\"\")"
|
||||
" print(s.content, end=\"\", flush=True)"
|
||||
]
|
||||
},
|
||||
{
|
||||
@@ -256,6 +256,131 @@
|
||||
"source": [
|
||||
"await chain.abatch([{\"topic\": \"bears\"}])"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "0a1c409d",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Parallelism\n",
|
||||
"\n",
|
||||
"Let's take a look at how LangChain Expression Language support parralel requests as much as possible. For example, when using a RunnableMapping (often written as a dictionary) it executes each element in parralel."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 7,
|
||||
"id": "e3014c7a",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langchain.schema.runnable import RunnableMap\n",
|
||||
"chain1 = ChatPromptTemplate.from_template(\"tell me a joke about {topic}\") | model\n",
|
||||
"chain2 = ChatPromptTemplate.from_template(\"write a short (2 line) poem about {topic}\") | model\n",
|
||||
"combined = RunnableMap({\n",
|
||||
" \"joke\": chain1,\n",
|
||||
" \"poem\": chain2,\n",
|
||||
"})"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 11,
|
||||
"id": "08044c0a",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"CPU times: user 31.7 ms, sys: 8.59 ms, total: 40.3 ms\n",
|
||||
"Wall time: 1.05 s\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"AIMessage(content=\"Why don't bears like fast food?\\n\\nBecause they can't catch it!\", additional_kwargs={}, example=False)"
|
||||
]
|
||||
},
|
||||
"execution_count": 11,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"%%time\n",
|
||||
"chain1.invoke({\"topic\": \"bears\"})"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 12,
|
||||
"id": "22c56804",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"CPU times: user 42.9 ms, sys: 10.2 ms, total: 53 ms\n",
|
||||
"Wall time: 1.93 s\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"AIMessage(content=\"In forest's embrace, bears roam free,\\nSilent strength, nature's majesty.\", additional_kwargs={}, example=False)"
|
||||
]
|
||||
},
|
||||
"execution_count": 12,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"%%time\n",
|
||||
"chain2.invoke({\"topic\": \"bears\"})"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 13,
|
||||
"id": "4fff4cbb",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"CPU times: user 96.3 ms, sys: 20.4 ms, total: 117 ms\n",
|
||||
"Wall time: 1.1 s\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"{'joke': AIMessage(content=\"Why don't bears wear socks?\\n\\nBecause they have bear feet!\", additional_kwargs={}, example=False),\n",
|
||||
" 'poem': AIMessage(content=\"In forest's embrace,\\nMajestic bears leave their trace.\", additional_kwargs={}, example=False)}"
|
||||
]
|
||||
},
|
||||
"execution_count": 13,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"%%time\n",
|
||||
"combined.invoke({\"topic\": \"bears\"})"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "fab75d1d",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": []
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
@@ -274,7 +399,7 @@
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.9.1"
|
||||
"version": "3.10.1"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
|
||||
@@ -147,7 +147,7 @@
|
||||
" api_key=os.environ[\"ARGILLA_API_KEY\"],\n",
|
||||
")\n",
|
||||
"\n",
|
||||
"dataset.push_to_argilla(\"langchain-dataset\")"
|
||||
"dataset.push_to_argilla(\"langchain-dataset\");"
|
||||
]
|
||||
},
|
||||
{
|
||||
|
||||
382
docs/extras/integrations/callbacks/labelstudio.ipynb
Normal file
@@ -0,0 +1,382 @@
|
||||
{
|
||||
"cells": [
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"collapsed": true,
|
||||
"pycharm": {
|
||||
"name": "#%% md\n"
|
||||
}
|
||||
},
|
||||
"source": [
|
||||
"# Label Studio\n",
|
||||
"\n",
|
||||
"<div>\n",
|
||||
"<img src=\"https://labelstudio-pub.s3.amazonaws.com/lc/open-source-data-labeling-platform.png\" width=\"400\"/>\n",
|
||||
"</div>\n",
|
||||
"\n",
|
||||
"Label Studio is an open-source data labeling platform that provides LangChain with flexibility when it comes to labeling data for fine-tuning large language models (LLMs). It also enables the preparation of custom training data and the collection and evaluation of responses through human feedback.\n",
|
||||
"\n",
|
||||
"In this guide, you will learn how to connect a LangChain pipeline to Label Studio to:\n",
|
||||
"\n",
|
||||
"- Aggregate all input prompts, conversations, and responses in a single LabelStudio project. This consolidates all the data in one place for easier labeling and analysis.\n",
|
||||
"- Refine prompts and responses to create a dataset for supervised fine-tuning (SFT) and reinforcement learning with human feedback (RLHF) scenarios. The labeled data can be used to further train the LLM to improve its performance.\n",
|
||||
"- Evaluate model responses through human feedback. LabelStudio provides an interface for humans to review and provide feedback on model responses, allowing evaluation and iteration."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"pycharm": {
|
||||
"name": "#%% md\n"
|
||||
}
|
||||
},
|
||||
"source": [
|
||||
"## Installation and setup"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"pycharm": {
|
||||
"name": "#%% md\n"
|
||||
}
|
||||
},
|
||||
"source": [
|
||||
"First install latest versions of Label Studio and Label Studio API client:"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {
|
||||
"pycharm": {
|
||||
"name": "#%%\n"
|
||||
}
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"!pip install -U label-studio label-studio-sdk openai"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"pycharm": {
|
||||
"name": "#%% md\n"
|
||||
}
|
||||
},
|
||||
"source": [
|
||||
"Next, run `label-studio` on the command line to start the local LabelStudio instance at `http://localhost:8080`. See the [Label Studio installation guide](https://labelstud.io/guide/install) for more options."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"pycharm": {
|
||||
"name": "#%% md\n"
|
||||
}
|
||||
},
|
||||
"source": [
|
||||
"You'll need a token to make API calls.\n",
|
||||
"\n",
|
||||
"Open your LabelStudio instance in your browser, go to `Account & Settings > Access Token` and copy the key.\n",
|
||||
"\n",
|
||||
"Set environment variables with your LabelStudio URL, API key and OpenAI API key:"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {
|
||||
"pycharm": {
|
||||
"name": "#%%\n"
|
||||
}
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"import os\n",
|
||||
"\n",
|
||||
"os.environ['LABEL_STUDIO_URL'] = '<YOUR-LABEL-STUDIO-URL>' # e.g. http://localhost:8080\n",
|
||||
"os.environ['LABEL_STUDIO_API_KEY'] = '<YOUR-LABEL-STUDIO-API-KEY>'\n",
|
||||
"os.environ['OPENAI_API_KEY'] = '<YOUR-OPENAI-API-KEY>'"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"pycharm": {
|
||||
"name": "#%% md\n"
|
||||
}
|
||||
},
|
||||
"source": [
|
||||
"## Collecting LLMs prompts and responses"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"The data used for labeling is stored in projects within Label Studio. Every project is identified by an XML configuration that details the specifications for input and output data. \n",
|
||||
"\n",
|
||||
"Create a project that takes human input in text format and outputs an editable LLM response in a text area:\n",
|
||||
"\n",
|
||||
"```xml\n",
|
||||
"<View>\n",
|
||||
"<Style>\n",
|
||||
" .prompt-box {\n",
|
||||
" background-color: white;\n",
|
||||
" border-radius: 10px;\n",
|
||||
" box-shadow: 0px 4px 6px rgba(0, 0, 0, 0.1);\n",
|
||||
" padding: 20px;\n",
|
||||
" }\n",
|
||||
"</Style>\n",
|
||||
"<View className=\"root\">\n",
|
||||
" <View className=\"prompt-box\">\n",
|
||||
" <Text name=\"prompt\" value=\"$prompt\"/>\n",
|
||||
" </View>\n",
|
||||
" <TextArea name=\"response\" toName=\"prompt\"\n",
|
||||
" maxSubmissions=\"1\" editable=\"true\"\n",
|
||||
" required=\"true\"/>\n",
|
||||
"</View>\n",
|
||||
"<Header value=\"Rate the response:\"/>\n",
|
||||
"<Rating name=\"rating\" toName=\"prompt\"/>\n",
|
||||
"</View>\n",
|
||||
"```\n",
|
||||
"\n",
|
||||
"1. To create a project in Label Studio, click on the \"Create\" button. \n",
|
||||
"2. Enter a name for your project in the \"Project Name\" field, such as `My Project`.\n",
|
||||
"3. Navigate to `Labeling Setup > Custom Template` and paste the XML configuration provided above."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"pycharm": {
|
||||
"name": "#%% md\n"
|
||||
}
|
||||
},
|
||||
"source": [
|
||||
"You can collect input LLM prompts and output responses in a LabelStudio project, connecting it via `LabelStudioCallbackHandler`:"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {
|
||||
"pycharm": {
|
||||
"name": "#%%\n"
|
||||
}
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langchain.llms import OpenAI\n",
|
||||
"from langchain.callbacks import LabelStudioCallbackHandler\n",
|
||||
"\n",
|
||||
"llm = OpenAI(\n",
|
||||
" temperature=0,\n",
|
||||
" callbacks=[\n",
|
||||
" LabelStudioCallbackHandler(\n",
|
||||
" project_name=\"My Project\"\n",
|
||||
" )]\n",
|
||||
")\n",
|
||||
"print(llm(\"Tell me a joke\"))"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"pycharm": {
|
||||
"name": "#%% md\n"
|
||||
}
|
||||
},
|
||||
"source": [
|
||||
"In the Label Studio, open `My Project`. You will see the prompts, responses, and metadata like the model name. "
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"pycharm": {
|
||||
"name": "#%% md\n"
|
||||
}
|
||||
},
|
||||
"source": [
|
||||
"## Collecting Chat model Dialogues"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"You can also track and display full chat dialogues in LabelStudio, with the ability to rate and modify the last response:\n",
|
||||
"\n",
|
||||
"1. Open Label Studio and click on the \"Create\" button.\n",
|
||||
"2. Enter a name for your project in the \"Project Name\" field, such as `New Project with Chat`.\n",
|
||||
"3. Navigate to Labeling Setup > Custom Template and paste the following XML configuration:\n",
|
||||
"\n",
|
||||
"```xml\n",
|
||||
"<View>\n",
|
||||
"<View className=\"root\">\n",
|
||||
" <Paragraphs name=\"dialogue\"\n",
|
||||
" value=\"$prompt\"\n",
|
||||
" layout=\"dialogue\"\n",
|
||||
" textKey=\"content\"\n",
|
||||
" nameKey=\"role\"\n",
|
||||
" granularity=\"sentence\"/>\n",
|
||||
" <Header value=\"Final response:\"/>\n",
|
||||
" <TextArea name=\"response\" toName=\"dialogue\"\n",
|
||||
" maxSubmissions=\"1\" editable=\"true\"\n",
|
||||
" required=\"true\"/>\n",
|
||||
"</View>\n",
|
||||
"<Header value=\"Rate the response:\"/>\n",
|
||||
"<Rating name=\"rating\" toName=\"dialogue\"/>\n",
|
||||
"</View>\n",
|
||||
"```"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {
|
||||
"pycharm": {
|
||||
"name": "#%%\n"
|
||||
}
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langchain.chat_models import ChatOpenAI\n",
|
||||
"from langchain.schema import HumanMessage, SystemMessage\n",
|
||||
"from langchain.callbacks import LabelStudioCallbackHandler\n",
|
||||
"\n",
|
||||
"chat_llm = ChatOpenAI(callbacks=[\n",
|
||||
" LabelStudioCallbackHandler(\n",
|
||||
" mode=\"chat\",\n",
|
||||
" project_name=\"New Project with Chat\",\n",
|
||||
" )\n",
|
||||
"])\n",
|
||||
"llm_results = chat_llm([\n",
|
||||
" SystemMessage(content=\"Always use a lot of emojis\"),\n",
|
||||
" HumanMessage(content=\"Tell me a joke\")\n",
|
||||
"])"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"In Label Studio, open \"New Project with Chat\". Click on a created task to view dialog history and edit/annotate responses."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"pycharm": {
|
||||
"name": "#%% md\n"
|
||||
}
|
||||
},
|
||||
"source": [
|
||||
"## Custom Labeling Configuration"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"pycharm": {
|
||||
"name": "#%% md\n"
|
||||
}
|
||||
},
|
||||
"source": [
|
||||
"You can modify the default labeling configuration in LabelStudio to add more target labels like response sentiment, relevance, and many [other types annotator's feedback](https://labelstud.io/tags/).\n",
|
||||
"\n",
|
||||
"New labeling configuration can be added from UI: go to `Settings > Labeling Interface` and set up a custom configuration with additional tags like `Choices` for sentiment or `Rating` for relevance. Keep in mind that [`TextArea` tag](https://labelstud.io/tags/textarea) should be presented in any configuration to display the LLM responses.\n",
|
||||
"\n",
|
||||
"Alternatively, you can specify the labeling configuration on the initial call before project creation:"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {
|
||||
"pycharm": {
|
||||
"name": "#%%\n"
|
||||
}
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"ls = LabelStudioCallbackHandler(project_config='''\n",
|
||||
"<View>\n",
|
||||
"<Text name=\"prompt\" value=\"$prompt\"/>\n",
|
||||
"<TextArea name=\"response\" toName=\"prompt\"/>\n",
|
||||
"<TextArea name=\"user_feedback\" toName=\"prompt\"/>\n",
|
||||
"<Rating name=\"rating\" toName=\"prompt\"/>\n",
|
||||
"<Choices name=\"sentiment\" toName=\"prompt\">\n",
|
||||
" <Choice value=\"Positive\"/>\n",
|
||||
" <Choice value=\"Negative\"/>\n",
|
||||
"</Choices>\n",
|
||||
"</View>\n",
|
||||
"''')"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Note that if the project doesn't exist, it will be created with the specified labeling configuration."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"pycharm": {
|
||||
"name": "#%% md\n"
|
||||
}
|
||||
},
|
||||
"source": [
|
||||
"## Other parameters"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"pycharm": {
|
||||
"name": "#%% md\n"
|
||||
}
|
||||
},
|
||||
"source": [
|
||||
"The `LabelStudioCallbackHandler` accepts several optional parameters:\n",
|
||||
"\n",
|
||||
"- **api_key** - Label Studio API key. Overrides environmental variable `LABEL_STUDIO_API_KEY`.\n",
|
||||
"- **url** - Label Studio URL. Overrides `LABEL_STUDIO_URL`, default `http://localhost:8080`.\n",
|
||||
"- **project_id** - Existing Label Studio project ID. Overrides `LABEL_STUDIO_PROJECT_ID`. Stores data in this project.\n",
|
||||
"- **project_name** - Project name if project ID not specified. Creates a new project. Default is `\"LangChain-%Y-%m-%d\"` formatted with the current date.\n",
|
||||
"- **project_config** - [custom labeling configuration](#custom-labeling-configuration)\n",
|
||||
"- **mode**: use this shortcut to create target configuration from scratch:\n",
|
||||
" - `\"prompt\"` - Single prompt, single response. Default.\n",
|
||||
" - `\"chat\"` - Multi-turn chat mode.\n",
|
||||
"\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
"kernelspec": {
|
||||
"display_name": "labelops",
|
||||
"language": "python",
|
||||
"name": "labelops"
|
||||
},
|
||||
"language_info": {
|
||||
"codemirror_mode": {
|
||||
"name": "ipython",
|
||||
"version": 3
|
||||
},
|
||||
"file_extension": ".py",
|
||||
"mimetype": "text/x-python",
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.9.16"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
"nbformat_minor": 1
|
||||
}
|
||||
225
docs/extras/integrations/chat/anyscale.ipynb
Normal file
@@ -0,0 +1,225 @@
|
||||
{
|
||||
"cells": [
|
||||
{
|
||||
"attachments": {},
|
||||
"cell_type": "markdown",
|
||||
"id": "642fd21c-600a-47a1-be96-6e1438b421a9",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# Anyscale\n",
|
||||
"\n",
|
||||
"This notebook demonstrates the use of `langchain.chat_models.ChatAnyscale` for [Anyscale Endpoints](https://endpoints.anyscale.com/).\n",
|
||||
"\n",
|
||||
"* Set `ANYSCALE_API_KEY` environment variable\n",
|
||||
"* or use the `anyscale_api_key` keyword argument"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# !pip install openai"
|
||||
],
|
||||
"metadata": {
|
||||
"collapsed": false
|
||||
},
|
||||
"id": "d00d850917865298"
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 1,
|
||||
"id": "72340871-ae2f-415f-b399-0777d32dc379",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdin",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
" ········\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"import os\n",
|
||||
"from getpass import getpass\n",
|
||||
"\n",
|
||||
"os.environ[\"ANYSCALE_API_KEY\"] = getpass()"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "5d7fc704-3ea0-4c35-96e7-89fcae6c73fa",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# Let's try out each model offered on Anyscale Endpoints"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 2,
|
||||
"id": "0dc9428d-4217-47d2-97de-f784b1764186",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"dict_keys(['meta-llama/Llama-2-70b-chat-hf', 'meta-llama/Llama-2-7b-chat-hf', 'meta-llama/Llama-2-13b-chat-hf'])\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"from langchain.chat_models import ChatAnyscale\n",
|
||||
"\n",
|
||||
"chats = {\n",
|
||||
" model: ChatAnyscale(model_name=model, temperature=1.0)\n",
|
||||
" for model in ChatAnyscale.get_available_models()\n",
|
||||
"}\n",
|
||||
"\n",
|
||||
"print(chats.keys())"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "7c4f124a-eaf7-4d78-a2c0-b0aa23fb25c4",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# We can use async methods and other stuff supported by ChatOpenAI\n",
|
||||
"\n",
|
||||
"This way, the three requests will only take as long as the longest individual request."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 3,
|
||||
"id": "1f94f5d2-569e-4a2c-965e-de53c2845fbb",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"import asyncio\n",
|
||||
"\n",
|
||||
"from langchain.schema import SystemMessage, HumanMessage\n",
|
||||
"\n",
|
||||
"messages = [\n",
|
||||
" SystemMessage(\n",
|
||||
" content=\"You are a helpful AI that shares everything you know.\"\n",
|
||||
" ),\n",
|
||||
" HumanMessage(\n",
|
||||
" content=\"Tell me technical facts about yourself. Are you a transformer model? How many billions of parameters do you have?\"\n",
|
||||
" ),\n",
|
||||
"]\n",
|
||||
"\n",
|
||||
"async def get_msgs():\n",
|
||||
" tasks = [\n",
|
||||
" chat.apredict_messages(messages)\n",
|
||||
" for chat in chats.values()\n",
|
||||
" ]\n",
|
||||
" responses = await asyncio.gather(*tasks)\n",
|
||||
" return dict(zip(chats.keys(), responses))"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 4,
|
||||
"id": "b2ced871-869a-4ca6-a2ec-6bfececdf7da",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"import nest_asyncio\n",
|
||||
"\n",
|
||||
"nest_asyncio.apply()"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 5,
|
||||
"id": "bc605fa5-9501-470d-a6c9-cd868d2145ef",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"\tmeta-llama/Llama-2-70b-chat-hf\n",
|
||||
"\n",
|
||||
"Greetings! I'm just an AI, I don't have a personal identity like humans do, but I'm here to help you with any questions you have.\n",
|
||||
"\n",
|
||||
"I'm a large language model, which means I'm trained on a large corpus of text data to generate language outputs that are coherent and natural-sounding. My architecture is based on a transformer model, which is a type of neural network that's particularly well-suited for natural language processing tasks.\n",
|
||||
"\n",
|
||||
"As for my parameters, I have a few billion parameters, but I don't have access to the exact number as it's not relevant to my functioning. My training data includes a vast amount of text from various sources, including books, articles, and websites, which I use to learn patterns and relationships in language.\n",
|
||||
"\n",
|
||||
"I'm designed to be a helpful tool for a variety of tasks, such as answering questions, providing information, and generating text. I'm constantly learning and improving my abilities through machine learning algorithms and feedback from users like you.\n",
|
||||
"\n",
|
||||
"I hope this helps! Is there anything else you'd like to know about me or my capabilities?\n",
|
||||
"\n",
|
||||
"---\n",
|
||||
"\n",
|
||||
"\tmeta-llama/Llama-2-7b-chat-hf\n",
|
||||
"\n",
|
||||
"Ah, a fellow tech enthusiast! *adjusts glasses* I'm glad to share some technical details about myself. 🤓\n",
|
||||
"Indeed, I'm a transformer model, specifically a BERT-like language model trained on a large corpus of text data. My architecture is based on the transformer framework, which is a type of neural network designed for natural language processing tasks. 🏠\n",
|
||||
"As for the number of parameters, I have approximately 340 million. *winks* That's a pretty hefty number, if I do say so myself! These parameters allow me to learn and represent complex patterns in language, such as syntax, semantics, and more. 🤔\n",
|
||||
"But don't ask me to do math in my head – I'm a language model, not a calculating machine! 😅 My strengths lie in understanding and generating human-like text, so feel free to chat with me anytime you'd like. 💬\n",
|
||||
"Now, do you have any more technical questions for me? Or would you like to engage in a nice chat? 😊\n",
|
||||
"\n",
|
||||
"---\n",
|
||||
"\n",
|
||||
"\tmeta-llama/Llama-2-13b-chat-hf\n",
|
||||
"\n",
|
||||
"Hello! As a friendly and helpful AI, I'd be happy to share some technical facts about myself.\n",
|
||||
"\n",
|
||||
"I am a transformer-based language model, specifically a variant of the BERT (Bidirectional Encoder Representations from Transformers) architecture. BERT was developed by Google in 2018 and has since become one of the most popular and widely-used AI language models.\n",
|
||||
"\n",
|
||||
"Here are some technical details about my capabilities:\n",
|
||||
"\n",
|
||||
"1. Parameters: I have approximately 340 million parameters, which are the numbers that I use to learn and represent language. This is a relatively large number of parameters compared to some other languages models, but it allows me to learn and understand complex language patterns and relationships.\n",
|
||||
"2. Training: I was trained on a large corpus of text data, including books, articles, and other sources of written content. This training allows me to learn about the structure and conventions of language, as well as the relationships between words and phrases.\n",
|
||||
"3. Architectures: My architecture is based on the transformer model, which is a type of neural network that is particularly well-suited for natural language processing tasks. The transformer model uses self-attention mechanisms to allow the model to \"attend\" to different parts of the input text, allowing it to capture long-range dependencies and contextual relationships.\n",
|
||||
"4. Precision: I am capable of generating text with high precision and accuracy, meaning that I can produce text that is close to human-level quality in terms of grammar, syntax, and coherence.\n",
|
||||
"5. Generative capabilities: In addition to being able to generate text based on prompts and questions, I am also capable of generating text based on a given topic or theme. This allows me to create longer, more coherent pieces of text that are organized around a specific idea or concept.\n",
|
||||
"\n",
|
||||
"Overall, I am a powerful and versatile language model that is capable of a wide range of natural language processing tasks. I am constantly learning and improving, and I am here to help answer any questions you may have!\n",
|
||||
"\n",
|
||||
"---\n",
|
||||
"\n",
|
||||
"CPU times: user 371 ms, sys: 15.5 ms, total: 387 ms\n",
|
||||
"Wall time: 12 s\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"%%time\n",
|
||||
"\n",
|
||||
"response_dict = asyncio.run(get_msgs())\n",
|
||||
"\n",
|
||||
"for model_name, response in response_dict.items():\n",
|
||||
" print(f'\\t{model_name}')\n",
|
||||
" print()\n",
|
||||
" print(response.content)\n",
|
||||
" print('\\n---\\n')"
|
||||
]
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
"kernelspec": {
|
||||
"display_name": "Python 3 (ipykernel)",
|
||||
"language": "python",
|
||||
"name": "python3"
|
||||
},
|
||||
"language_info": {
|
||||
"codemirror_mode": {
|
||||
"name": "ipython",
|
||||
"version": 3
|
||||
},
|
||||
"file_extension": ".py",
|
||||
"mimetype": "text/x-python",
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.11.4"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
"nbformat_minor": 5
|
||||
}
|
||||
@@ -74,6 +74,124 @@
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": []
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "f27fa24d",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Model Version\n",
|
||||
"Azure OpenAI responses contain `model` property, which is name of the model used to generate the response. However unlike native OpenAI responses, it does not contain the version of the model, which is set on the deplyoment in Azure. This makes it tricky to know which version of the model was used to generate the response, which as result can lead to e.g. wrong total cost calculation with `OpenAICallbackHandler`.\n",
|
||||
"\n",
|
||||
"To solve this problem, you can pass `model_version` parameter to `AzureChatOpenAI` class, which will be added to the model name in the llm output. This way you can easily distinguish between different versions of the model."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "0531798a",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langchain.callbacks import get_openai_callback"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 14,
|
||||
"id": "3fd97dfc",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"BASE_URL = \"https://{endpoint}.openai.azure.com\"\n",
|
||||
"API_KEY = \"...\"\n",
|
||||
"DEPLOYMENT_NAME = \"gpt-35-turbo\" # in Azure, this deployment has version 0613 - input and output tokens are counted separately"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 15,
|
||||
"id": "aceddb72",
|
||||
"metadata": {
|
||||
"scrolled": true
|
||||
},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"Total Cost (USD): $0.000054\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"model = AzureChatOpenAI(\n",
|
||||
" openai_api_base=BASE_URL,\n",
|
||||
" openai_api_version=\"2023-05-15\",\n",
|
||||
" deployment_name=DEPLOYMENT_NAME,\n",
|
||||
" openai_api_key=API_KEY,\n",
|
||||
" openai_api_type=\"azure\",\n",
|
||||
")\n",
|
||||
"with get_openai_callback() as cb:\n",
|
||||
" model(\n",
|
||||
" [\n",
|
||||
" HumanMessage(\n",
|
||||
" content=\"Translate this sentence from English to French. I love programming.\"\n",
|
||||
" )\n",
|
||||
" ]\n",
|
||||
" )\n",
|
||||
" print(f\"Total Cost (USD): ${format(cb.total_cost, '.6f')}\") # without specifying the model version, flat-rate 0.002 USD per 1k input and output tokens is used\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "2e61eefd",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"We can provide the model version to `AzureChatOpenAI` constructor. It will get appended to the model name returned by Azure OpenAI and cost will be counted correctly."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 17,
|
||||
"id": "8d5e54e9",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"Total Cost (USD): $0.000044\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"model0613 = AzureChatOpenAI(\n",
|
||||
" openai_api_base=BASE_URL,\n",
|
||||
" openai_api_version=\"2023-05-15\",\n",
|
||||
" deployment_name=DEPLOYMENT_NAME,\n",
|
||||
" openai_api_key=API_KEY,\n",
|
||||
" openai_api_type=\"azure\",\n",
|
||||
" model_version=\"0613\"\n",
|
||||
")\n",
|
||||
"with get_openai_callback() as cb:\n",
|
||||
" model0613(\n",
|
||||
" [\n",
|
||||
" HumanMessage(\n",
|
||||
" content=\"Translate this sentence from English to French. I love programming.\"\n",
|
||||
" )\n",
|
||||
" ]\n",
|
||||
" )\n",
|
||||
" print(f\"Total Cost (USD): ${format(cb.total_cost, '.6f')}\")\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "99682534",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": []
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
@@ -92,7 +210,7 @@
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.9.1"
|
||||
"version": "3.8.10"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
|
||||
185
docs/extras/integrations/chat/litellm.ipynb
Normal file
@@ -0,0 +1,185 @@
|
||||
{
|
||||
"cells": [
|
||||
{
|
||||
"attachments": {},
|
||||
"cell_type": "markdown",
|
||||
"id": "bf733a38-db84-4363-89e2-de6735c37230",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# 🚅 LiteLLM\n",
|
||||
"\n",
|
||||
"[LiteLLM](https://github.com/BerriAI/litellm) is a library that simplifies calling Anthropic, Azure, Huggingface, Replicate, etc. \n",
|
||||
"\n",
|
||||
"This notebook covers how to get started with using Langchain + the LiteLLM I/O library. "
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 1,
|
||||
"id": "d4a7c55d-b235-4ca4-a579-c90cc9570da9",
|
||||
"metadata": {
|
||||
"tags": []
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langchain.chat_models import ChatLiteLLM\n",
|
||||
"from langchain.prompts.chat import (\n",
|
||||
" ChatPromptTemplate,\n",
|
||||
" SystemMessagePromptTemplate,\n",
|
||||
" AIMessagePromptTemplate,\n",
|
||||
" HumanMessagePromptTemplate,\n",
|
||||
")\n",
|
||||
"from langchain.schema import AIMessage, HumanMessage, SystemMessage"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 2,
|
||||
"id": "70cf04e8-423a-4ff6-8b09-f11fb711c817",
|
||||
"metadata": {
|
||||
"tags": []
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"chat = ChatLiteLLM(model=\"gpt-3.5-turbo\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 3,
|
||||
"id": "8199ef8f-eb8b-4253-9ea0-6c24a013ca4c",
|
||||
"metadata": {
|
||||
"tags": []
|
||||
},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"AIMessage(content=\" J'aime la programmation.\", additional_kwargs={}, example=False)"
|
||||
]
|
||||
},
|
||||
"execution_count": 3,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"messages = [\n",
|
||||
" HumanMessage(\n",
|
||||
" content=\"Translate this sentence from English to French. I love programming.\"\n",
|
||||
" )\n",
|
||||
"]\n",
|
||||
"chat(messages)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"attachments": {},
|
||||
"cell_type": "markdown",
|
||||
"id": "c361ab1e-8c0c-4206-9e3c-9d1424a12b9c",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## `ChatLiteLLM` also supports async and streaming functionality:"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 4,
|
||||
"id": "93a21c5c-6ef9-4688-be60-b2e1f94842fb",
|
||||
"metadata": {
|
||||
"tags": []
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langchain.callbacks.manager import CallbackManager\n",
|
||||
"from langchain.callbacks.streaming_stdout import StreamingStdOutCallbackHandler"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 5,
|
||||
"id": "c5fac0e9-05a4-4fc1-a3b3-e5bbb24b971b",
|
||||
"metadata": {
|
||||
"tags": []
|
||||
},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"LLMResult(generations=[[ChatGeneration(text=\" J'aime programmer.\", generation_info=None, message=AIMessage(content=\" J'aime programmer.\", additional_kwargs={}, example=False))]], llm_output={}, run=[RunInfo(run_id=UUID('8cc8fb68-1c35-439c-96a0-695036a93652'))])"
|
||||
]
|
||||
},
|
||||
"execution_count": 5,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"await chat.agenerate([messages])"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 6,
|
||||
"id": "025be980-e50d-4a68-93dc-c9c7b500ce34",
|
||||
"metadata": {
|
||||
"tags": []
|
||||
},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
" J'aime la programmation."
|
||||
]
|
||||
},
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"AIMessage(content=\" J'aime la programmation.\", additional_kwargs={}, example=False)"
|
||||
]
|
||||
},
|
||||
"execution_count": 6,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"chat = ChatLiteLLM(\n",
|
||||
" streaming=True,\n",
|
||||
" verbose=True,\n",
|
||||
" callback_manager=CallbackManager([StreamingStdOutCallbackHandler()]),\n",
|
||||
")\n",
|
||||
"chat(messages)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "c253883f",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": []
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
"kernelspec": {
|
||||
"display_name": "Python 3 (ipykernel)",
|
||||
"language": "python",
|
||||
"name": "python3"
|
||||
},
|
||||
"language_info": {
|
||||
"codemirror_mode": {
|
||||
"name": "ipython",
|
||||
"version": 3
|
||||
},
|
||||
"file_extension": ".py",
|
||||
"mimetype": "text/x-python",
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.9.1"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
"nbformat_minor": 5
|
||||
}
|
||||
226
docs/extras/integrations/document_loaders/airbyte_cdk.ipynb
Normal file
@@ -0,0 +1,226 @@
|
||||
{
|
||||
"cells": [
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "1f3a5ebf",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# Airbyte CDK"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "35ac77b1-449b-44f7-b8f3-3494d55c286e",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
">[Airbyte](https://github.com/airbytehq/airbyte) is a data integration platform for ELT pipelines from APIs, databases & files to warehouses & lakes. It has the largest catalog of ELT connectors to data warehouses and databases.\n",
|
||||
"\n",
|
||||
"A lot of source connectors are implemented using the [Airbyte CDK](https://docs.airbyte.com/connector-development/cdk-python/). This loader allows to run any of these connectors and return the data as documents."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "3b06fbde",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Installation"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "e3e9dc79",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"First, you need to install the `airbyte-cdk` python package."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "4d35e4e0",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"#!pip install airbyte-cdk"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "085aa658",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Then, either install an existing connector from the [Airbyte Github repository](https://github.com/airbytehq/airbyte/tree/master/airbyte-integrations/connectors) or create your own connector using the [Airbyte CDK](https://docs.airbyte.io/connector-development/connector-development).\n",
|
||||
"\n",
|
||||
"For example, to install the Github connector, run"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "f6d04ef4",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"#!pip install \"source_github@git+https://github.com/airbytehq/airbyte.git@master#subdirectory=airbyte-integrations/connectors/source-github\""
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "36069b74",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Some sources are also published as regular packages on PyPI"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "ae855210",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Example"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "02208f52",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Now you can create an `AirbyteCDKLoader` based on the imported source. It takes a `config` object that's passed to the connector. You also have to pick the stream you want to retrieve records from by name (`stream_name`). Check the connectors documentation page and spec definition for more information on the config object and available streams. For the Github connectors these are:\n",
|
||||
"* [https://github.com/airbytehq/airbyte/blob/master/airbyte-integrations/connectors/source-github/source_github/spec.json](https://github.com/airbytehq/airbyte/blob/master/airbyte-integrations/connectors/source-github/source_github/spec.json).\n",
|
||||
"* [https://docs.airbyte.com/integrations/sources/github/](https://docs.airbyte.com/integrations/sources/github/)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "89a99e58",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"\n",
|
||||
"from langchain.document_loaders.airbyte import AirbyteCDKLoader\n",
|
||||
"from source_github.source import SourceGithub # plug in your own source here\n",
|
||||
"\n",
|
||||
"config = {\n",
|
||||
" # your github configuration\n",
|
||||
" \"credentials\": {\n",
|
||||
" \"api_url\": \"api.github.com\",\n",
|
||||
" \"personal_access_token\": \"<token>\"\n",
|
||||
" },\n",
|
||||
" \"repository\": \"<repo>\",\n",
|
||||
" \"start_date\": \"<date from which to start retrieving records from in ISO format, e.g. 2020-10-20T00:00:00Z>\"\n",
|
||||
"}\n",
|
||||
"\n",
|
||||
"issues_loader = AirbyteCDKLoader(source_class=SourceGithub, config=config, stream_name=\"issues\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "2cea23fc",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Now you can load documents the usual way"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "dae75cdb",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"docs = issues_loader.load()"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "4a93dc2a",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"As `load` returns a list, it will block until all documents are loaded. To have better control over this process, you can also you the `lazy_load` method which returns an iterator instead:"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "1782db09",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"docs_iterator = issues_loader.lazy_load()"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "3a124086",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Keep in mind that by default the page content is empty and the metadata object contains all the information from the record. To create documents in a different, pass in a record_handler function when creating the loader:"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "5671395d",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langchain.docstore.document import Document\n",
|
||||
"\n",
|
||||
"def handle_record(record, id):\n",
|
||||
" return Document(page_content=record.data[\"title\"] + \"\\n\" + (record.data[\"body\"] or \"\"), metadata=record.data)\n",
|
||||
"\n",
|
||||
"issues_loader = AirbyteCDKLoader(source_class=SourceGithub, config=config, stream_name=\"issues\", record_handler=handle_record)\n",
|
||||
"\n",
|
||||
"docs = issues_loader.load()"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "223eb8bc",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Incremental loads\n",
|
||||
"\n",
|
||||
"Some streams allow incremental loading, this means the source keeps track of synced records and won't load them again. This is useful for sources that have a high volume of data and are updated frequently.\n",
|
||||
"\n",
|
||||
"To take advantage of this, store the `last_state` property of the loader and pass it in when creating the loader again. This will ensure that only new records are loaded."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "7061e735",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"last_state = issues_loader.last_state # store safely\n",
|
||||
"\n",
|
||||
"incremental_issue_loader = AirbyteCDKLoader(source_class=SourceGithub, config=config, stream_name=\"issues\", state=last_state)\n",
|
||||
"\n",
|
||||
"new_docs = incremental_issue_loader.load()"
|
||||
]
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
"kernelspec": {
|
||||
"display_name": "Python 3 (ipykernel)",
|
||||
"language": "python",
|
||||
"name": "python3"
|
||||
},
|
||||
"language_info": {
|
||||
"codemirror_mode": {
|
||||
"name": "ipython",
|
||||
"version": 3
|
||||
},
|
||||
"file_extension": ".py",
|
||||
"mimetype": "text/x-python",
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.10.6"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
"nbformat_minor": 5
|
||||
}
|
||||
206
docs/extras/integrations/document_loaders/airbyte_gong.ipynb
Normal file
@@ -0,0 +1,206 @@
|
||||
{
|
||||
"cells": [
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "1f3a5ebf",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# Airbyte Gong"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "35ac77b1-449b-44f7-b8f3-3494d55c286e",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
">[Airbyte](https://github.com/airbytehq/airbyte) is a data integration platform for ELT pipelines from APIs, databases & files to warehouses & lakes. It has the largest catalog of ELT connectors to data warehouses and databases.\n",
|
||||
"\n",
|
||||
"This loader exposes the Gong connector as a document loader, allowing you to load various Gong objects as documents."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "6847a40c",
|
||||
"metadata": {},
|
||||
"source": []
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "3b06fbde",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Installation"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "e3e9dc79",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"First, you need to install the `airbyte-source-gong` python package."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "4d35e4e0",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"#!pip install airbyte-source-gong"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "ae855210",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Example"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "02208f52",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Check out the [Airbyte documentation page](https://docs.airbyte.com/integrations/sources/gong/) for details about how to configure the reader.\n",
|
||||
"The JSON schema the config object should adhere to can be found on Github: [https://github.com/airbytehq/airbyte/blob/master/airbyte-integrations/connectors/source-gong/source_gong/spec.yaml](https://github.com/airbytehq/airbyte/blob/master/airbyte-integrations/connectors/source-gong/source_gong/spec.yaml).\n",
|
||||
"\n",
|
||||
"The general shape looks like this:\n",
|
||||
"```python\n",
|
||||
"{\n",
|
||||
" \"access_key\": \"<access key name>\",\n",
|
||||
" \"access_key_secret\": \"<access key secret>\",\n",
|
||||
" \"start_date\": \"<date from which to start retrieving records from in ISO format, e.g. 2020-10-20T00:00:00Z>\",\n",
|
||||
"}\n",
|
||||
"```\n",
|
||||
"\n",
|
||||
"By default all fields are stored as metadata in the documents and the text is set to an empty string. Construct the text of the document by transforming the documents returned by the reader."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "89a99e58",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"\n",
|
||||
"from langchain.document_loaders.airbyte import AirbyteGongLoader\n",
|
||||
"\n",
|
||||
"config = {\n",
|
||||
" # your gong configuration\n",
|
||||
"}\n",
|
||||
"\n",
|
||||
"loader = AirbyteGongLoader(config=config, stream_name=\"calls\") # check the documentation linked above for a list of all streams"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "2cea23fc",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Now you can load documents the usual way"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "dae75cdb",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"docs = loader.load()"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "4a93dc2a",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"As `load` returns a list, it will block until all documents are loaded. To have better control over this process, you can also you the `lazy_load` method which returns an iterator instead:"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "1782db09",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"docs_iterator = loader.lazy_load()"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "3a124086",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Keep in mind that by default the page content is empty and the metadata object contains all the information from the record. To process documents, create a class inheriting from the base loader and implement the `_handle_records` method yourself:"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "5671395d",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langchain.docstore.document import Document\n",
|
||||
"\n",
|
||||
"def handle_record(record, id):\n",
|
||||
" return Document(page_content=record.data[\"title\"], metadata=record.data)\n",
|
||||
"\n",
|
||||
"loader = AirbyteGongLoader(config=config, record_handler=handle_record, stream_name=\"calls\")\n",
|
||||
"docs = loader.load()"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "223eb8bc",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Incremental loads\n",
|
||||
"\n",
|
||||
"Some streams allow incremental loading, this means the source keeps track of synced records and won't load them again. This is useful for sources that have a high volume of data and are updated frequently.\n",
|
||||
"\n",
|
||||
"To take advantage of this, store the `last_state` property of the loader and pass it in when creating the loader again. This will ensure that only new records are loaded."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "7061e735",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"last_state = loader.last_state # store safely\n",
|
||||
"\n",
|
||||
"incremental_loader = AirbyteGongLoader(config=config, stream_name=\"calls\", state=last_state)\n",
|
||||
"\n",
|
||||
"new_docs = incremental_loader.load()"
|
||||
]
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
"kernelspec": {
|
||||
"display_name": "Python 3 (ipykernel)",
|
||||
"language": "python",
|
||||
"name": "python3"
|
||||
},
|
||||
"language_info": {
|
||||
"codemirror_mode": {
|
||||
"name": "ipython",
|
||||
"version": 3
|
||||
},
|
||||
"file_extension": ".py",
|
||||
"mimetype": "text/x-python",
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.10.6"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
"nbformat_minor": 5
|
||||
}
|
||||
208
docs/extras/integrations/document_loaders/airbyte_hubspot.ipynb
Normal file
@@ -0,0 +1,208 @@
|
||||
{
|
||||
"cells": [
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "1f3a5ebf",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# Airbyte Hubspot"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "35ac77b1-449b-44f7-b8f3-3494d55c286e",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
">[Airbyte](https://github.com/airbytehq/airbyte) is a data integration platform for ELT pipelines from APIs, databases & files to warehouses & lakes. It has the largest catalog of ELT connectors to data warehouses and databases.\n",
|
||||
"\n",
|
||||
"This loader exposes the Hubspot connector as a document loader, allowing you to load various Hubspot objects as documents."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "6847a40c",
|
||||
"metadata": {},
|
||||
"source": []
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "3b06fbde",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Installation"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "e3e9dc79",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"First, you need to install the `airbyte-source-hubspot` python package."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "4d35e4e0",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"#!pip install airbyte-source-hubspot"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "ae855210",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Example"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "02208f52",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Check out the [Airbyte documentation page](https://docs.airbyte.com/integrations/sources/hubspot/) for details about how to configure the reader.\n",
|
||||
"The JSON schema the config object should adhere to can be found on Github: [https://github.com/airbytehq/airbyte/blob/master/airbyte-integrations/connectors/source-hubspot/source_hubspot/spec.yaml](https://github.com/airbytehq/airbyte/blob/master/airbyte-integrations/connectors/source-hubspot/source_hubspot/spec.yaml).\n",
|
||||
"\n",
|
||||
"The general shape looks like this:\n",
|
||||
"```python\n",
|
||||
"{\n",
|
||||
" \"start_date\": \"<date from which to start retrieving records from in ISO format, e.g. 2020-10-20T00:00:00Z>\",\n",
|
||||
" \"credentials\": {\n",
|
||||
" \"credentials_title\": \"Private App Credentials\",\n",
|
||||
" \"access_token\": \"<access token of your private app>\"\n",
|
||||
" }\n",
|
||||
"}\n",
|
||||
"```\n",
|
||||
"\n",
|
||||
"By default all fields are stored as metadata in the documents and the text is set to an empty string. Construct the text of the document by transforming the documents returned by the reader."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "89a99e58",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"\n",
|
||||
"from langchain.document_loaders.airbyte import AirbyteHubspotLoader\n",
|
||||
"\n",
|
||||
"config = {\n",
|
||||
" # your hubspot configuration\n",
|
||||
"}\n",
|
||||
"\n",
|
||||
"loader = AirbyteHubspotLoader(config=config, stream_name=\"products\") # check the documentation linked above for a list of all streams"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "2cea23fc",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Now you can load documents the usual way"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "dae75cdb",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"docs = loader.load()"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "4a93dc2a",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"As `load` returns a list, it will block until all documents are loaded. To have better control over this process, you can also you the `lazy_load` method which returns an iterator instead:"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "1782db09",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"docs_iterator = loader.lazy_load()"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "3a124086",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Keep in mind that by default the page content is empty and the metadata object contains all the information from the record. To process documents, create a class inheriting from the base loader and implement the `_handle_records` method yourself:"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "5671395d",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langchain.docstore.document import Document\n",
|
||||
"\n",
|
||||
"def handle_record(record, id):\n",
|
||||
" return Document(page_content=record.data[\"title\"], metadata=record.data)\n",
|
||||
"\n",
|
||||
"loader = AirbyteHubspotLoader(config=config, record_handler=handle_record, stream_name=\"products\")\n",
|
||||
"docs = loader.load()"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "223eb8bc",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Incremental loads\n",
|
||||
"\n",
|
||||
"Some streams allow incremental loading, this means the source keeps track of synced records and won't load them again. This is useful for sources that have a high volume of data and are updated frequently.\n",
|
||||
"\n",
|
||||
"To take advantage of this, store the `last_state` property of the loader and pass it in when creating the loader again. This will ensure that only new records are loaded."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "7061e735",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"last_state = loader.last_state # store safely\n",
|
||||
"\n",
|
||||
"incremental_loader = AirbyteHubspotLoader(config=config, stream_name=\"products\", state=last_state)\n",
|
||||
"\n",
|
||||
"new_docs = incremental_loader.load()"
|
||||
]
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
"kernelspec": {
|
||||
"display_name": "Python 3 (ipykernel)",
|
||||
"language": "python",
|
||||
"name": "python3"
|
||||
},
|
||||
"language_info": {
|
||||
"codemirror_mode": {
|
||||
"name": "ipython",
|
||||
"version": 3
|
||||
},
|
||||
"file_extension": ".py",
|
||||
"mimetype": "text/x-python",
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.10.6"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
"nbformat_minor": 5
|
||||
}
|
||||
@@ -0,0 +1,213 @@
|
||||
{
|
||||
"cells": [
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "1f3a5ebf",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# Airbyte Salesforce"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "35ac77b1-449b-44f7-b8f3-3494d55c286e",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
">[Airbyte](https://github.com/airbytehq/airbyte) is a data integration platform for ELT pipelines from APIs, databases & files to warehouses & lakes. It has the largest catalog of ELT connectors to data warehouses and databases.\n",
|
||||
"\n",
|
||||
"This loader exposes the Salesforce connector as a document loader, allowing you to load various Salesforce objects as documents."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "6847a40c",
|
||||
"metadata": {},
|
||||
"source": []
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "3b06fbde",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Installation"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "e3e9dc79",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"First, you need to install the `airbyte-source-salesforce` python package."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "4d35e4e0",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"#!pip install airbyte-source-salesforce"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "ae855210",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Example"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "02208f52",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Check out the [Airbyte documentation page](https://docs.airbyte.com/integrations/sources/salesforce/) for details about how to configure the reader.\n",
|
||||
"The JSON schema the config object should adhere to can be found on Github: [https://github.com/airbytehq/airbyte/blob/master/airbyte-integrations/connectors/source-salesforce/source_salesforce/spec.yaml](https://github.com/airbytehq/airbyte/blob/master/airbyte-integrations/connectors/source-salesforce/source_salesforce/spec.yaml).\n",
|
||||
"\n",
|
||||
"The general shape looks like this:\n",
|
||||
"```python\n",
|
||||
"{\n",
|
||||
" \"client_id\": \"<oauth client id>\",\n",
|
||||
" \"client_secret\": \"<oauth client secret>\",\n",
|
||||
" \"refresh_token\": \"<oauth refresh token>\",\n",
|
||||
" \"start_date\": \"<date from which to start retrieving records from in ISO format, e.g. 2020-10-20T00:00:00Z>\",\n",
|
||||
" \"is_sandbox\": False, # set to True if you're using a sandbox environment\n",
|
||||
" \"streams_criteria\": [ # Array of filters for salesforce objects that should be loadable\n",
|
||||
" {\"criteria\": \"exacts\", \"value\": \"Account\"}, # Exact name of salesforce object\n",
|
||||
" {\"criteria\": \"starts with\", \"value\": \"Asset\"}, # Prefix of the name\n",
|
||||
" # Other allowed criteria: ends with, contains, starts not with, ends not with, not contains, not exacts\n",
|
||||
" ],\n",
|
||||
"}\n",
|
||||
"```\n",
|
||||
"\n",
|
||||
"By default all fields are stored as metadata in the documents and the text is set to an empty string. Construct the text of the document by transforming the documents returned by the reader."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "89a99e58",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"\n",
|
||||
"from langchain.document_loaders.airbyte import AirbyteSalesforceLoader\n",
|
||||
"\n",
|
||||
"config = {\n",
|
||||
" # your salesforce configuration\n",
|
||||
"}\n",
|
||||
"\n",
|
||||
"loader = AirbyteSalesforceLoader(config=config, stream_name=\"asset\") # check the documentation linked above for a list of all streams"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "2cea23fc",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Now you can load documents the usual way"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "dae75cdb",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"docs = loader.load()"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "4a93dc2a",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"As `load` returns a list, it will block until all documents are loaded. To have better control over this process, you can also you the `lazy_load` method which returns an iterator instead:"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "1782db09",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"docs_iterator = loader.lazy_load()"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "3a124086",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Keep in mind that by default the page content is empty and the metadata object contains all the information from the record. To create documents in a different, pass in a record_handler function when creating the loader:"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "5671395d",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langchain.docstore.document import Document\n",
|
||||
"\n",
|
||||
"def handle_record(record, id):\n",
|
||||
" return Document(page_content=record.data[\"title\"], metadata=record.data)\n",
|
||||
"\n",
|
||||
"loader = AirbyteSalesforceLoader(config=config, record_handler=handle_record, stream_name=\"asset\")\n",
|
||||
"docs = loader.load()"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "223eb8bc",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Incremental loads\n",
|
||||
"\n",
|
||||
"Some streams allow incremental loading, this means the source keeps track of synced records and won't load them again. This is useful for sources that have a high volume of data and are updated frequently.\n",
|
||||
"\n",
|
||||
"To take advantage of this, store the `last_state` property of the loader and pass it in when creating the loader again. This will ensure that only new records are loaded."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "7061e735",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"last_state = loader.last_state # store safely\n",
|
||||
"\n",
|
||||
"incremental_loader = AirbyteSalesforceLoader(config=config, stream_name=\"asset\", state=last_state)\n",
|
||||
"\n",
|
||||
"new_docs = incremental_loader.load()"
|
||||
]
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
"kernelspec": {
|
||||
"display_name": "Python 3 (ipykernel)",
|
||||
"language": "python",
|
||||
"name": "python3"
|
||||
},
|
||||
"language_info": {
|
||||
"codemirror_mode": {
|
||||
"name": "ipython",
|
||||
"version": 3
|
||||
},
|
||||
"file_extension": ".py",
|
||||
"mimetype": "text/x-python",
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.10.6"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
"nbformat_minor": 5
|
||||
}
|
||||
209
docs/extras/integrations/document_loaders/airbyte_shopify.ipynb
Normal file
@@ -0,0 +1,209 @@
|
||||
{
|
||||
"cells": [
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "1f3a5ebf",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# Airbyte Shopify"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "35ac77b1-449b-44f7-b8f3-3494d55c286e",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
">[Airbyte](https://github.com/airbytehq/airbyte) is a data integration platform for ELT pipelines from APIs, databases & files to warehouses & lakes. It has the largest catalog of ELT connectors to data warehouses and databases.\n",
|
||||
"\n",
|
||||
"This loader exposes the Shopify connector as a document loader, allowing you to load various Shopify objects as documents."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "6847a40c",
|
||||
"metadata": {},
|
||||
"source": []
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "3b06fbde",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Installation"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "e3e9dc79",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"First, you need to install the `airbyte-source-shopify` python package."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "4d35e4e0",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"#!pip install airbyte-source-shopify"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "ae855210",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Example"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "02208f52",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Check out the [Airbyte documentation page](https://docs.airbyte.com/integrations/sources/shopify/) for details about how to configure the reader.\n",
|
||||
"The JSON schema the config object should adhere to can be found on Github: [https://github.com/airbytehq/airbyte/blob/master/airbyte-integrations/connectors/source-shopify/source_shopify/spec.json](https://github.com/airbytehq/airbyte/blob/master/airbyte-integrations/connectors/source-shopify/source_shopify/spec.json).\n",
|
||||
"\n",
|
||||
"The general shape looks like this:\n",
|
||||
"```python\n",
|
||||
"{\n",
|
||||
" \"start_date\": \"<date from which to start retrieving records from in ISO format, e.g. 2020-10-20T00:00:00Z>\",\n",
|
||||
" \"shop\": \"<name of the shop you want to retrieve documents from>\",\n",
|
||||
" \"credentials\": {\n",
|
||||
" \"auth_method\": \"api_password\",\n",
|
||||
" \"api_password\": \"<your api password>\"\n",
|
||||
" }\n",
|
||||
"}\n",
|
||||
"```\n",
|
||||
"\n",
|
||||
"By default all fields are stored as metadata in the documents and the text is set to an empty string. Construct the text of the document by transforming the documents returned by the reader."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "89a99e58",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"\n",
|
||||
"from langchain.document_loaders.airbyte import AirbyteShopifyLoader\n",
|
||||
"\n",
|
||||
"config = {\n",
|
||||
" # your shopify configuration\n",
|
||||
"}\n",
|
||||
"\n",
|
||||
"loader = AirbyteShopifyLoader(config=config, stream_name=\"orders\") # check the documentation linked above for a list of all streams"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "2cea23fc",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Now you can load documents the usual way"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "dae75cdb",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"docs = loader.load()"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "4a93dc2a",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"As `load` returns a list, it will block until all documents are loaded. To have better control over this process, you can also you the `lazy_load` method which returns an iterator instead:"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "1782db09",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"docs_iterator = loader.lazy_load()"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "3a124086",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Keep in mind that by default the page content is empty and the metadata object contains all the information from the record. To create documents in a different, pass in a record_handler function when creating the loader:"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "5671395d",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langchain.docstore.document import Document\n",
|
||||
"\n",
|
||||
"def handle_record(record, id):\n",
|
||||
" return Document(page_content=record.data[\"title\"], metadata=record.data)\n",
|
||||
"\n",
|
||||
"loader = AirbyteShopifyLoader(config=config, record_handler=handle_record, stream_name=\"orders\")\n",
|
||||
"docs = loader.load()"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "223eb8bc",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Incremental loads\n",
|
||||
"\n",
|
||||
"Some streams allow incremental loading, this means the source keeps track of synced records and won't load them again. This is useful for sources that have a high volume of data and are updated frequently.\n",
|
||||
"\n",
|
||||
"To take advantage of this, store the `last_state` property of the loader and pass it in when creating the loader again. This will ensure that only new records are loaded."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "7061e735",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"last_state = loader.last_state # store safely\n",
|
||||
"\n",
|
||||
"incremental_loader = AirbyteShopifyLoader(config=config, stream_name=\"orders\", state=last_state)\n",
|
||||
"\n",
|
||||
"new_docs = incremental_loader.load()"
|
||||
]
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
"kernelspec": {
|
||||
"display_name": "Python 3 (ipykernel)",
|
||||
"language": "python",
|
||||
"name": "python3"
|
||||
},
|
||||
"language_info": {
|
||||
"codemirror_mode": {
|
||||
"name": "ipython",
|
||||
"version": 3
|
||||
},
|
||||
"file_extension": ".py",
|
||||
"mimetype": "text/x-python",
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.10.6"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
"nbformat_minor": 5
|
||||
}
|
||||
206
docs/extras/integrations/document_loaders/airbyte_stripe.ipynb
Normal file
@@ -0,0 +1,206 @@
|
||||
{
|
||||
"cells": [
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "1f3a5ebf",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# Airbyte Stripe"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "35ac77b1-449b-44f7-b8f3-3494d55c286e",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
">[Airbyte](https://github.com/airbytehq/airbyte) is a data integration platform for ELT pipelines from APIs, databases & files to warehouses & lakes. It has the largest catalog of ELT connectors to data warehouses and databases.\n",
|
||||
"\n",
|
||||
"This loader exposes the Stripe connector as a document loader, allowing you to load various Stripe objects as documents."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "6847a40c",
|
||||
"metadata": {},
|
||||
"source": []
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "3b06fbde",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Installation"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "e3e9dc79",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"First, you need to install the `airbyte-source-stripe` python package."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "4d35e4e0",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"#!pip install airbyte-source-stripe"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "ae855210",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Example"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "02208f52",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Check out the [Airbyte documentation page](https://docs.airbyte.com/integrations/sources/stripe/) for details about how to configure the reader.\n",
|
||||
"The JSON schema the config object should adhere to can be found on Github: [https://github.com/airbytehq/airbyte/blob/master/airbyte-integrations/connectors/source-stripe/source_stripe/spec.yaml](https://github.com/airbytehq/airbyte/blob/master/airbyte-integrations/connectors/source-stripe/source_stripe/spec.yaml).\n",
|
||||
"\n",
|
||||
"The general shape looks like this:\n",
|
||||
"```python\n",
|
||||
"{\n",
|
||||
" \"client_secret\": \"<secret key>\",\n",
|
||||
" \"account_id\": \"<account id>\",\n",
|
||||
" \"start_date\": \"<date from which to start retrieving records from in ISO format, e.g. 2020-10-20T00:00:00Z>\",\n",
|
||||
"}\n",
|
||||
"```\n",
|
||||
"\n",
|
||||
"By default all fields are stored as metadata in the documents and the text is set to an empty string. Construct the text of the document by transforming the documents returned by the reader."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "89a99e58",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"\n",
|
||||
"from langchain.document_loaders.airbyte import AirbyteStripeLoader\n",
|
||||
"\n",
|
||||
"config = {\n",
|
||||
" # your stripe configuration\n",
|
||||
"}\n",
|
||||
"\n",
|
||||
"loader = AirbyteStripeLoader(config=config, stream_name=\"invoices\") # check the documentation linked above for a list of all streams"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "2cea23fc",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Now you can load documents the usual way"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "dae75cdb",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"docs = loader.load()"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "4a93dc2a",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"As `load` returns a list, it will block until all documents are loaded. To have better control over this process, you can also you the `lazy_load` method which returns an iterator instead:"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "1782db09",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"docs_iterator = loader.lazy_load()"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "3a124086",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Keep in mind that by default the page content is empty and the metadata object contains all the information from the record. To create documents in a different, pass in a record_handler function when creating the loader:"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "5671395d",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langchain.docstore.document import Document\n",
|
||||
"\n",
|
||||
"def handle_record(record, id):\n",
|
||||
" return Document(page_content=record.data[\"title\"], metadata=record.data)\n",
|
||||
"\n",
|
||||
"loader = AirbyteStripeLoader(config=config, record_handler=handle_record, stream_name=\"invoices\")\n",
|
||||
"docs = loader.load()"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "223eb8bc",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Incremental loads\n",
|
||||
"\n",
|
||||
"Some streams allow incremental loading, this means the source keeps track of synced records and won't load them again. This is useful for sources that have a high volume of data and are updated frequently.\n",
|
||||
"\n",
|
||||
"To take advantage of this, store the `last_state` property of the loader and pass it in when creating the loader again. This will ensure that only new records are loaded."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "7061e735",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"last_state = loader.last_state # store safely\n",
|
||||
"\n",
|
||||
"incremental_loader = AirbyteStripeLoader(config=config, record_handler=handle_record, stream_name=\"invoices\", state=last_state)\n",
|
||||
"\n",
|
||||
"new_docs = incremental_loader.load()"
|
||||
]
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
"kernelspec": {
|
||||
"display_name": "Python 3 (ipykernel)",
|
||||
"language": "python",
|
||||
"name": "python3"
|
||||
},
|
||||
"language_info": {
|
||||
"codemirror_mode": {
|
||||
"name": "ipython",
|
||||
"version": 3
|
||||
},
|
||||
"file_extension": ".py",
|
||||
"mimetype": "text/x-python",
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.10.6"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
"nbformat_minor": 5
|
||||
}
|
||||
209
docs/extras/integrations/document_loaders/airbyte_typeform.ipynb
Normal file
@@ -0,0 +1,209 @@
|
||||
{
|
||||
"cells": [
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "1f3a5ebf",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# Airbyte Typeform"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "35ac77b1-449b-44f7-b8f3-3494d55c286e",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
">[Airbyte](https://github.com/airbytehq/airbyte) is a data integration platform for ELT pipelines from APIs, databases & files to warehouses & lakes. It has the largest catalog of ELT connectors to data warehouses and databases.\n",
|
||||
"\n",
|
||||
"This loader exposes the Typeform connector as a document loader, allowing you to load various Typeform objects as documents."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "6847a40c",
|
||||
"metadata": {},
|
||||
"source": []
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "3b06fbde",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Installation"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "e3e9dc79",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"First, you need to install the `airbyte-source-typeform` python package."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "4d35e4e0",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"#!pip install airbyte-source-typeform"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "ae855210",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Example"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "02208f52",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Check out the [Airbyte documentation page](https://docs.airbyte.com/integrations/sources/typeform/) for details about how to configure the reader.\n",
|
||||
"The JSON schema the config object should adhere to can be found on Github: [https://github.com/airbytehq/airbyte/blob/master/airbyte-integrations/connectors/source-typeform/source_typeform/spec.json](https://github.com/airbytehq/airbyte/blob/master/airbyte-integrations/connectors/source-typeform/source_typeform/spec.json).\n",
|
||||
"\n",
|
||||
"The general shape looks like this:\n",
|
||||
"```python\n",
|
||||
"{\n",
|
||||
" \"credentials\": {\n",
|
||||
" \"auth_type\": \"Private Token\",\n",
|
||||
" \"access_token\": \"<your auth token>\"\n",
|
||||
" },\n",
|
||||
" \"start_date\": \"<date from which to start retrieving records from in ISO format, e.g. 2020-10-20T00:00:00Z>\",\n",
|
||||
" \"form_ids\": [\"<id of form to load records for>\"] # if omitted, records from all forms will be loaded\n",
|
||||
"}\n",
|
||||
"```\n",
|
||||
"\n",
|
||||
"By default all fields are stored as metadata in the documents and the text is set to an empty string. Construct the text of the document by transforming the documents returned by the reader."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "89a99e58",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"\n",
|
||||
"from langchain.document_loaders.airbyte import AirbyteTypeformLoader\n",
|
||||
"\n",
|
||||
"config = {\n",
|
||||
" # your typeform configuration\n",
|
||||
"}\n",
|
||||
"\n",
|
||||
"loader = AirbyteTypeformLoader(config=config, stream_name=\"forms\") # check the documentation linked above for a list of all streams"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "2cea23fc",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Now you can load documents the usual way"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "dae75cdb",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"docs = loader.load()"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "4a93dc2a",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"As `load` returns a list, it will block until all documents are loaded. To have better control over this process, you can also you the `lazy_load` method which returns an iterator instead:"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "1782db09",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"docs_iterator = loader.lazy_load()"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "3a124086",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Keep in mind that by default the page content is empty and the metadata object contains all the information from the record. To create documents in a different, pass in a record_handler function when creating the loader:"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "5671395d",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langchain.docstore.document import Document\n",
|
||||
"\n",
|
||||
"def handle_record(record, id):\n",
|
||||
" return Document(page_content=record.data[\"title\"], metadata=record.data)\n",
|
||||
"\n",
|
||||
"loader = AirbyteTypeformLoader(config=config, record_handler=handle_record, stream_name=\"forms\")\n",
|
||||
"docs = loader.load()"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "223eb8bc",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Incremental loads\n",
|
||||
"\n",
|
||||
"Some streams allow incremental loading, this means the source keeps track of synced records and won't load them again. This is useful for sources that have a high volume of data and are updated frequently.\n",
|
||||
"\n",
|
||||
"To take advantage of this, store the `last_state` property of the loader and pass it in when creating the loader again. This will ensure that only new records are loaded."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "7061e735",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"last_state = loader.last_state # store safely\n",
|
||||
"\n",
|
||||
"incremental_loader = AirbyteTypeformLoader(config=config, record_handler=handle_record, stream_name=\"forms\", state=last_state)\n",
|
||||
"\n",
|
||||
"new_docs = incremental_loader.load()"
|
||||
]
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
"kernelspec": {
|
||||
"display_name": "Python 3 (ipykernel)",
|
||||
"language": "python",
|
||||
"name": "python3"
|
||||
},
|
||||
"language_info": {
|
||||
"codemirror_mode": {
|
||||
"name": "ipython",
|
||||
"version": 3
|
||||
},
|
||||
"file_extension": ".py",
|
||||
"mimetype": "text/x-python",
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.10.6"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
"nbformat_minor": 5
|
||||
}
|
||||
@@ -0,0 +1,210 @@
|
||||
{
|
||||
"cells": [
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "1f3a5ebf",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# Airbyte Zendesk Support"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "35ac77b1-449b-44f7-b8f3-3494d55c286e",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
">[Airbyte](https://github.com/airbytehq/airbyte) is a data integration platform for ELT pipelines from APIs, databases & files to warehouses & lakes. It has the largest catalog of ELT connectors to data warehouses and databases.\n",
|
||||
"\n",
|
||||
"This loader exposes the Zendesk Support connector as a document loader, allowing you to load various objects as documents."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "6847a40c",
|
||||
"metadata": {},
|
||||
"source": []
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "3b06fbde",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Installation"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "e3e9dc79",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"First, you need to install the `airbyte-source-zendesk-support` python package."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "4d35e4e0",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"#!pip install airbyte-source-zendesk-support"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "ae855210",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Example"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "02208f52",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Check out the [Airbyte documentation page](https://docs.airbyte.com/integrations/sources/zendesk-support/) for details about how to configure the reader.\n",
|
||||
"The JSON schema the config object should adhere to can be found on Github: [https://github.com/airbytehq/airbyte/blob/master/airbyte-integrations/connectors/source-zendesk-support/source_zendesk_support/spec.json](https://github.com/airbytehq/airbyte/blob/master/airbyte-integrations/connectors/source-zendesk-support/source_zendesk_support/spec.json).\n",
|
||||
"\n",
|
||||
"The general shape looks like this:\n",
|
||||
"```python\n",
|
||||
"{\n",
|
||||
" \"subdomain\": \"<your zendesk subdomain>\",\n",
|
||||
" \"start_date\": \"<date from which to start retrieving records from in ISO format, e.g. 2020-10-20T00:00:00Z>\",\n",
|
||||
" \"credentials\": {\n",
|
||||
" \"credentials\": \"api_token\",\n",
|
||||
" \"email\": \"<your email>\",\n",
|
||||
" \"api_token\": \"<your api token>\"\n",
|
||||
" }\n",
|
||||
"}\n",
|
||||
"```\n",
|
||||
"\n",
|
||||
"By default all fields are stored as metadata in the documents and the text is set to an empty string. Construct the text of the document by transforming the documents returned by the reader."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "89a99e58",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"\n",
|
||||
"from langchain.document_loaders.airbyte import AirbyteZendeskSupportLoader\n",
|
||||
"\n",
|
||||
"config = {\n",
|
||||
" # your zendesk-support configuration\n",
|
||||
"}\n",
|
||||
"\n",
|
||||
"loader = AirbyteZendeskSupportLoader(config=config, stream_name=\"tickets\") # check the documentation linked above for a list of all streams"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "2cea23fc",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Now you can load documents the usual way"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "dae75cdb",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"docs = loader.load()"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "4a93dc2a",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"As `load` returns a list, it will block until all documents are loaded. To have better control over this process, you can also you the `lazy_load` method which returns an iterator instead:"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "1782db09",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"docs_iterator = loader.lazy_load()"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "3a124086",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Keep in mind that by default the page content is empty and the metadata object contains all the information from the record. To create documents in a different, pass in a record_handler function when creating the loader:"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "5671395d",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langchain.docstore.document import Document\n",
|
||||
"\n",
|
||||
"def handle_record(record, id):\n",
|
||||
" return Document(page_content=record.data[\"title\"], metadata=record.data)\n",
|
||||
"\n",
|
||||
"loader = AirbyteZendeskSupportLoader(config=config, record_handler=handle_record, stream_name=\"tickets\")\n",
|
||||
"docs = loader.load()"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "223eb8bc",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Incremental loads\n",
|
||||
"\n",
|
||||
"Some streams allow incremental loading, this means the source keeps track of synced records and won't load them again. This is useful for sources that have a high volume of data and are updated frequently.\n",
|
||||
"\n",
|
||||
"To take advantage of this, store the `last_state` property of the loader and pass it in when creating the loader again. This will ensure that only new records are loaded."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "7061e735",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"last_state = loader.last_state # store safely\n",
|
||||
"\n",
|
||||
"incremental_loader = AirbyteZendeskSupportLoader(config=config, stream_name=\"tickets\", state=last_state)\n",
|
||||
"\n",
|
||||
"new_docs = incremental_loader.load()"
|
||||
]
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
"kernelspec": {
|
||||
"display_name": "Python 3 (ipykernel)",
|
||||
"language": "python",
|
||||
"name": "python3"
|
||||
},
|
||||
"language_info": {
|
||||
"codemirror_mode": {
|
||||
"name": "ipython",
|
||||
"version": 3
|
||||
},
|
||||
"file_extension": ".py",
|
||||
"mimetype": "text/x-python",
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.10.6"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
"nbformat_minor": 5
|
||||
}
|
||||
325
docs/extras/integrations/document_loaders/arcgis.ipynb
Normal file
@@ -0,0 +1,325 @@
|
||||
{
|
||||
"cells": [
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "62359e08-cf80-4210-a30c-f450000e65b9",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# ArcGISLoader\n",
|
||||
"\n",
|
||||
"This notebook demonstrates the use of the `langchain.document_loaders.ArcGISLoader` class.\n",
|
||||
"\n",
|
||||
"You will need to install the ArcGIS API for Python `arcgis` and, optionally, `bs4.BeautifulSoup`.\n",
|
||||
"\n",
|
||||
"You can use an `arcgis.gis.GIS` object for authenticated data loading, or leave it blank to access public data."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 1,
|
||||
"id": "b782cab5-0584-4e2a-9073-009fb8dc93a3",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langchain.document_loaders import ArcGISLoader\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"url = \"https://maps1.vcgov.org/arcgis/rest/services/Beaches/MapServer/7\"\n",
|
||||
"\n",
|
||||
"loader = ArcGISLoader(url)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 2,
|
||||
"id": "aa3053cf-4127-43ea-bf56-e378b348091f",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"CPU times: user 4.04 ms, sys: 1.63 ms, total: 5.67 ms\n",
|
||||
"Wall time: 644 ms\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"%%time\n",
|
||||
"\n",
|
||||
"docs = loader.load()"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 3,
|
||||
"id": "a2444519-9117-4feb-8bb9-8931ce286fa5",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"dict_keys(['url', 'layer_description', 'item_description', 'layer_properties'])"
|
||||
]
|
||||
},
|
||||
"execution_count": 3,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"docs[0].metadata.keys()"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 4,
|
||||
"id": "6b6e9107-6a80-4ef7-8149-3013faa2de76",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"KeysView({\n",
|
||||
" \"currentVersion\": 10.81,\n",
|
||||
" \"id\": 7,\n",
|
||||
" \"name\": \"Beach Ramps\",\n",
|
||||
" \"type\": \"Feature Layer\",\n",
|
||||
" \"description\": \"\",\n",
|
||||
" \"geometryType\": \"esriGeometryPoint\",\n",
|
||||
" \"sourceSpatialReference\": {\n",
|
||||
" \"wkid\": 2881,\n",
|
||||
" \"latestWkid\": 2881\n",
|
||||
" },\n",
|
||||
" \"copyrightText\": \"\",\n",
|
||||
" \"parentLayer\": null,\n",
|
||||
" \"subLayers\": [],\n",
|
||||
" \"minScale\": 750000,\n",
|
||||
" \"maxScale\": 0,\n",
|
||||
" \"drawingInfo\": {\n",
|
||||
" \"renderer\": {\n",
|
||||
" \"type\": \"simple\",\n",
|
||||
" \"symbol\": {\n",
|
||||
" \"type\": \"esriPMS\",\n",
|
||||
" \"url\": \"9bb2e5ca499bb68aa3ee0d4e1ecc3849\",\n",
|
||||
" \"imageData\": \"iVBORw0KGgoAAAANSUhEUgAAABAAAAAQCAYAAAAf8/9hAAAAAXNSR0IB2cksfwAAAAlwSFlzAAAOxAAADsQBlSsOGwAAAJJJREFUOI3NkDEKg0AQRZ9kkSnSGBshR7DJqdJYeg7BMpcS0uQWQsqoCLExkcUJzGqT38zw2fcY1rEzbp7vjXz0EXC7gBxs1ABcG/8CYkCcDqwyLqsV+RlV0I/w7PzuJBArr1VB20H58Ls6h+xoFITkTwWpQJX7XSIBAnFwVj7MLAjJV/AC6G3QoAmK+74Lom04THTBEp/HCSc6AAAAAElFTkSuQmCC\",\n",
|
||||
" \"contentType\": \"image/png\",\n",
|
||||
" \"width\": 12,\n",
|
||||
" \"height\": 12,\n",
|
||||
" \"angle\": 0,\n",
|
||||
" \"xoffset\": 0,\n",
|
||||
" \"yoffset\": 0\n",
|
||||
" },\n",
|
||||
" \"label\": \"\",\n",
|
||||
" \"description\": \"\"\n",
|
||||
" },\n",
|
||||
" \"transparency\": 0,\n",
|
||||
" \"labelingInfo\": null\n",
|
||||
" },\n",
|
||||
" \"defaultVisibility\": true,\n",
|
||||
" \"extent\": {\n",
|
||||
" \"xmin\": -81.09480168806815,\n",
|
||||
" \"ymin\": 28.858349245353473,\n",
|
||||
" \"xmax\": -80.77512908572814,\n",
|
||||
" \"ymax\": 29.41078388840041,\n",
|
||||
" \"spatialReference\": {\n",
|
||||
" \"wkid\": 4326,\n",
|
||||
" \"latestWkid\": 4326\n",
|
||||
" }\n",
|
||||
" },\n",
|
||||
" \"hasAttachments\": false,\n",
|
||||
" \"htmlPopupType\": \"esriServerHTMLPopupTypeNone\",\n",
|
||||
" \"displayField\": \"AccessName\",\n",
|
||||
" \"typeIdField\": null,\n",
|
||||
" \"subtypeFieldName\": null,\n",
|
||||
" \"subtypeField\": null,\n",
|
||||
" \"defaultSubtypeCode\": null,\n",
|
||||
" \"fields\": [\n",
|
||||
" {\n",
|
||||
" \"name\": \"OBJECTID\",\n",
|
||||
" \"type\": \"esriFieldTypeOID\",\n",
|
||||
" \"alias\": \"OBJECTID\",\n",
|
||||
" \"domain\": null\n",
|
||||
" },\n",
|
||||
" {\n",
|
||||
" \"name\": \"Shape\",\n",
|
||||
" \"type\": \"esriFieldTypeGeometry\",\n",
|
||||
" \"alias\": \"Shape\",\n",
|
||||
" \"domain\": null\n",
|
||||
" },\n",
|
||||
" {\n",
|
||||
" \"name\": \"AccessName\",\n",
|
||||
" \"type\": \"esriFieldTypeString\",\n",
|
||||
" \"alias\": \"AccessName\",\n",
|
||||
" \"length\": 40,\n",
|
||||
" \"domain\": null\n",
|
||||
" },\n",
|
||||
" {\n",
|
||||
" \"name\": \"AccessID\",\n",
|
||||
" \"type\": \"esriFieldTypeString\",\n",
|
||||
" \"alias\": \"AccessID\",\n",
|
||||
" \"length\": 50,\n",
|
||||
" \"domain\": null\n",
|
||||
" },\n",
|
||||
" {\n",
|
||||
" \"name\": \"AccessType\",\n",
|
||||
" \"type\": \"esriFieldTypeString\",\n",
|
||||
" \"alias\": \"AccessType\",\n",
|
||||
" \"length\": 25,\n",
|
||||
" \"domain\": null\n",
|
||||
" },\n",
|
||||
" {\n",
|
||||
" \"name\": \"GeneralLoc\",\n",
|
||||
" \"type\": \"esriFieldTypeString\",\n",
|
||||
" \"alias\": \"GeneralLoc\",\n",
|
||||
" \"length\": 100,\n",
|
||||
" \"domain\": null\n",
|
||||
" },\n",
|
||||
" {\n",
|
||||
" \"name\": \"MilePost\",\n",
|
||||
" \"type\": \"esriFieldTypeDouble\",\n",
|
||||
" \"alias\": \"MilePost\",\n",
|
||||
" \"domain\": null\n",
|
||||
" },\n",
|
||||
" {\n",
|
||||
" \"name\": \"City\",\n",
|
||||
" \"type\": \"esriFieldTypeString\",\n",
|
||||
" \"alias\": \"City\",\n",
|
||||
" \"length\": 50,\n",
|
||||
" \"domain\": null\n",
|
||||
" },\n",
|
||||
" {\n",
|
||||
" \"name\": \"AccessStatus\",\n",
|
||||
" \"type\": \"esriFieldTypeString\",\n",
|
||||
" \"alias\": \"AccessStatus\",\n",
|
||||
" \"length\": 50,\n",
|
||||
" \"domain\": null\n",
|
||||
" },\n",
|
||||
" {\n",
|
||||
" \"name\": \"Entry_Date_Time\",\n",
|
||||
" \"type\": \"esriFieldTypeDate\",\n",
|
||||
" \"alias\": \"Entry_Date_Time\",\n",
|
||||
" \"length\": 8,\n",
|
||||
" \"domain\": null\n",
|
||||
" },\n",
|
||||
" {\n",
|
||||
" \"name\": \"DrivingZone\",\n",
|
||||
" \"type\": \"esriFieldTypeString\",\n",
|
||||
" \"alias\": \"DrivingZone\",\n",
|
||||
" \"length\": 50,\n",
|
||||
" \"domain\": null\n",
|
||||
" }\n",
|
||||
" ],\n",
|
||||
" \"geometryField\": {\n",
|
||||
" \"name\": \"Shape\",\n",
|
||||
" \"type\": \"esriFieldTypeGeometry\",\n",
|
||||
" \"alias\": \"Shape\"\n",
|
||||
" },\n",
|
||||
" \"indexes\": null,\n",
|
||||
" \"subtypes\": [],\n",
|
||||
" \"relationships\": [],\n",
|
||||
" \"canModifyLayer\": true,\n",
|
||||
" \"canScaleSymbols\": false,\n",
|
||||
" \"hasLabels\": false,\n",
|
||||
" \"capabilities\": \"Map,Query,Data\",\n",
|
||||
" \"maxRecordCount\": 1000,\n",
|
||||
" \"supportsStatistics\": true,\n",
|
||||
" \"supportsAdvancedQueries\": true,\n",
|
||||
" \"supportedQueryFormats\": \"JSON, geoJSON\",\n",
|
||||
" \"isDataVersioned\": false,\n",
|
||||
" \"ownershipBasedAccessControlForFeatures\": {\n",
|
||||
" \"allowOthersToQuery\": true\n",
|
||||
" },\n",
|
||||
" \"useStandardizedQueries\": true,\n",
|
||||
" \"advancedQueryCapabilities\": {\n",
|
||||
" \"useStandardizedQueries\": true,\n",
|
||||
" \"supportsStatistics\": true,\n",
|
||||
" \"supportsHavingClause\": true,\n",
|
||||
" \"supportsCountDistinct\": true,\n",
|
||||
" \"supportsOrderBy\": true,\n",
|
||||
" \"supportsDistinct\": true,\n",
|
||||
" \"supportsPagination\": true,\n",
|
||||
" \"supportsTrueCurve\": true,\n",
|
||||
" \"supportsReturningQueryExtent\": true,\n",
|
||||
" \"supportsQueryWithDistance\": true,\n",
|
||||
" \"supportsSqlExpression\": true\n",
|
||||
" },\n",
|
||||
" \"supportsDatumTransformation\": true,\n",
|
||||
" \"dateFieldsTimeReference\": null,\n",
|
||||
" \"supportsCoordinatesQuantization\": true\n",
|
||||
"})"
|
||||
]
|
||||
},
|
||||
"execution_count": 4,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"docs[0].metadata['layer_properties'].keys()"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 5,
|
||||
"id": "1d132b7d-5a13-4d66-98e8-785ffdf87af0",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"{\"OBJECTID\": 2, \"AccessName\": \"27TH AV\", \"AccessID\": \"NS-141\", \"AccessType\": \"OPEN VEHICLE RAMP\", \"GeneralLoc\": \"3600 BLK S ATLANTIC AV\", \"MilePost\": 4.83, \"City\": \"NEW SMYRNA BEACH\", \"AccessStatus\": \"OPEN\", \"Entry_Date_Time\": 1691397348000, \"DrivingZone\": \"BOTH\"}\n",
|
||||
"{\"OBJECTID\": 7, \"AccessName\": \"BEACHWAY AV\", \"AccessID\": \"NS-106\", \"AccessType\": \"OPEN VEHICLE RAMP\", \"GeneralLoc\": \"1400 N ATLANTIC AV\", \"MilePost\": 1.57, \"City\": \"NEW SMYRNA BEACH\", \"AccessStatus\": \"OPEN\", \"Entry_Date_Time\": 1691397348000, \"DrivingZone\": \"YES\"}\n",
|
||||
"{\"OBJECTID\": 10, \"AccessName\": \"SEABREEZE BLVD\", \"AccessID\": \"DB-051\", \"AccessType\": \"OPEN VEHICLE RAMP\", \"GeneralLoc\": \"500 BLK N ATLANTIC AV\", \"MilePost\": 14.24, \"City\": \"DAYTONA BEACH\", \"AccessStatus\": \"OPEN\", \"Entry_Date_Time\": 1691394892000, \"DrivingZone\": \"BOTH\"}\n",
|
||||
"{\"OBJECTID\": 13, \"AccessName\": \"GRANADA BLVD\", \"AccessID\": \"OB-030\", \"AccessType\": \"OPEN VEHICLE RAMP\", \"GeneralLoc\": \"20 BLK OCEAN SHORE BLVD\", \"MilePost\": 10.02, \"City\": \"ORMOND BEACH\", \"AccessStatus\": \"4X4 ONLY\", \"Entry_Date_Time\": 1691394952000, \"DrivingZone\": \"BOTH\"}\n",
|
||||
"{\"OBJECTID\": 16, \"AccessName\": \"INTERNATIONAL SPEEDWAY BLVD\", \"AccessID\": \"DB-059\", \"AccessType\": \"OPEN VEHICLE RAMP\", \"GeneralLoc\": \"300 BLK S ATLANTIC AV\", \"MilePost\": 15.27, \"City\": \"DAYTONA BEACH\", \"AccessStatus\": \"OPEN\", \"Entry_Date_Time\": 1691395174000, \"DrivingZone\": \"BOTH\"}\n",
|
||||
"{\"OBJECTID\": 26, \"AccessName\": \"UNIVERSITY BLVD\", \"AccessID\": \"DB-048\", \"AccessType\": \"OPEN VEHICLE RAMP\", \"GeneralLoc\": \"900 BLK N ATLANTIC AV\", \"MilePost\": 13.74, \"City\": \"DAYTONA BEACH\", \"AccessStatus\": \"OPEN\", \"Entry_Date_Time\": 1691394892000, \"DrivingZone\": \"BOTH\"}\n",
|
||||
"{\"OBJECTID\": 36, \"AccessName\": \"BEACH ST\", \"AccessID\": \"PI-097\", \"AccessType\": \"OPEN VEHICLE RAMP\", \"GeneralLoc\": \"4890 BLK S ATLANTIC AV\", \"MilePost\": 25.85, \"City\": \"PONCE INLET\", \"AccessStatus\": \"OPEN\", \"Entry_Date_Time\": 1691397348000, \"DrivingZone\": \"BOTH\"}\n",
|
||||
"{\"OBJECTID\": 40, \"AccessName\": \"BOTEFUHR AV\", \"AccessID\": \"DBS-067\", \"AccessType\": \"OPEN VEHICLE RAMP\", \"GeneralLoc\": \"1900 BLK S ATLANTIC AV\", \"MilePost\": 16.68, \"City\": \"DAYTONA BEACH SHORES\", \"AccessStatus\": \"4X4 ONLY\", \"Entry_Date_Time\": 1691395124000, \"DrivingZone\": \"YES\"}\n",
|
||||
"{\"OBJECTID\": 41, \"AccessName\": \"SILVER BEACH AV\", \"AccessID\": \"DB-064\", \"AccessType\": \"OPEN VEHICLE RAMP\", \"GeneralLoc\": \"1000 BLK S ATLANTIC AV\", \"MilePost\": 15.98, \"City\": \"DAYTONA BEACH\", \"AccessStatus\": \"OPEN\", \"Entry_Date_Time\": 1691395174000, \"DrivingZone\": \"YES\"}\n",
|
||||
"{\"OBJECTID\": 50, \"AccessName\": \"3RD AV\", \"AccessID\": \"NS-118\", \"AccessType\": \"OPEN VEHICLE RAMP\", \"GeneralLoc\": \"1200 BLK HILL ST\", \"MilePost\": 3.25, \"City\": \"NEW SMYRNA BEACH\", \"AccessStatus\": \"OPEN\", \"Entry_Date_Time\": 1691397348000, \"DrivingZone\": \"YES\"}\n",
|
||||
"{\"OBJECTID\": 58, \"AccessName\": \"DUNLAWTON BLVD\", \"AccessID\": \"DBS-078\", \"AccessType\": \"OPEN VEHICLE RAMP\", \"GeneralLoc\": \"3400 BLK S ATLANTIC AV\", \"MilePost\": 20.61, \"City\": \"DAYTONA BEACH SHORES\", \"AccessStatus\": \"OPEN\", \"Entry_Date_Time\": 1691397348000, \"DrivingZone\": \"YES\"}\n",
|
||||
"{\"OBJECTID\": 63, \"AccessName\": \"MILSAP RD\", \"AccessID\": \"OB-037\", \"AccessType\": \"OPEN VEHICLE RAMP\", \"GeneralLoc\": \"700 BLK S ATLANTIC AV\", \"MilePost\": 11.52, \"City\": \"ORMOND BEACH\", \"AccessStatus\": \"4X4 ONLY\", \"Entry_Date_Time\": 1691394952000, \"DrivingZone\": \"YES\"}\n",
|
||||
"{\"OBJECTID\": 68, \"AccessName\": \"EMILIA AV\", \"AccessID\": \"DBS-082\", \"AccessType\": \"OPEN VEHICLE RAMP\", \"GeneralLoc\": \"3790 BLK S ATLANTIC AV\", \"MilePost\": 21.38, \"City\": \"DAYTONA BEACH SHORES\", \"AccessStatus\": \"OPEN\", \"Entry_Date_Time\": 1691397348000, \"DrivingZone\": \"BOTH\"}\n",
|
||||
"{\"OBJECTID\": 92, \"AccessName\": \"FLAGLER AV\", \"AccessID\": \"NS-110\", \"AccessType\": \"OPEN VEHICLE RAMP\", \"GeneralLoc\": \"500 BLK FLAGLER AV\", \"MilePost\": 2.57, \"City\": \"NEW SMYRNA BEACH\", \"AccessStatus\": \"OPEN\", \"Entry_Date_Time\": 1691397348000, \"DrivingZone\": \"YES\"}\n",
|
||||
"{\"OBJECTID\": 94, \"AccessName\": \"CRAWFORD RD\", \"AccessID\": \"NS-108\", \"AccessType\": \"OPEN VEHICLE RAMP - PASS\", \"GeneralLoc\": \"800 BLK N ATLANTIC AV\", \"MilePost\": 2.19, \"City\": \"NEW SMYRNA BEACH\", \"AccessStatus\": \"OPEN\", \"Entry_Date_Time\": 1691397348000, \"DrivingZone\": \"YES\"}\n",
|
||||
"{\"OBJECTID\": 122, \"AccessName\": \"HARTFORD AV\", \"AccessID\": \"DB-043\", \"AccessType\": \"OPEN VEHICLE RAMP\", \"GeneralLoc\": \"1890 BLK N ATLANTIC AV\", \"MilePost\": 12.76, \"City\": \"DAYTONA BEACH\", \"AccessStatus\": \"CLOSED - SEASONAL\", \"Entry_Date_Time\": 1691394832000, \"DrivingZone\": \"YES\"}\n",
|
||||
"{\"OBJECTID\": 125, \"AccessName\": \"WILLIAMS AV\", \"AccessID\": \"DB-042\", \"AccessType\": \"OPEN VEHICLE RAMP\", \"GeneralLoc\": \"2200 BLK N ATLANTIC AV\", \"MilePost\": 12.5, \"City\": \"DAYTONA BEACH\", \"AccessStatus\": \"4X4 ONLY\", \"Entry_Date_Time\": 1691394952000, \"DrivingZone\": \"YES\"}\n",
|
||||
"{\"OBJECTID\": 134, \"AccessName\": \"CARDINAL DR\", \"AccessID\": \"OB-036\", \"AccessType\": \"OPEN VEHICLE RAMP\", \"GeneralLoc\": \"600 BLK S ATLANTIC AV\", \"MilePost\": 11.27, \"City\": \"ORMOND BEACH\", \"AccessStatus\": \"4X4 ONLY\", \"Entry_Date_Time\": 1691394952000, \"DrivingZone\": \"YES\"}\n",
|
||||
"{\"OBJECTID\": 229, \"AccessName\": \"EL PORTAL ST\", \"AccessID\": \"DBS-076\", \"AccessType\": \"OPEN VEHICLE RAMP\", \"GeneralLoc\": \"3200 BLK S ATLANTIC AV\", \"MilePost\": 20.04, \"City\": \"DAYTONA BEACH SHORES\", \"AccessStatus\": \"OPEN\", \"Entry_Date_Time\": 1691397348000, \"DrivingZone\": \"YES\"}\n",
|
||||
"{\"OBJECTID\": 230, \"AccessName\": \"HARVARD DR\", \"AccessID\": \"OB-038\", \"AccessType\": \"OPEN VEHICLE RAMP\", \"GeneralLoc\": \"900 BLK S ATLANTIC AV\", \"MilePost\": 11.72, \"City\": \"ORMOND BEACH\", \"AccessStatus\": \"4X4 ONLY\", \"Entry_Date_Time\": 1691394952000, \"DrivingZone\": \"YES\"}\n",
|
||||
"{\"OBJECTID\": 232, \"AccessName\": \"VAN AV\", \"AccessID\": \"DBS-075\", \"AccessType\": \"OPEN VEHICLE RAMP\", \"GeneralLoc\": \"3100 BLK S ATLANTIC AV\", \"MilePost\": 19.6, \"City\": \"DAYTONA BEACH SHORES\", \"AccessStatus\": \"OPEN\", \"Entry_Date_Time\": 1691397348000, \"DrivingZone\": \"YES\"}\n",
|
||||
"{\"OBJECTID\": 233, \"AccessName\": \"ROCKEFELLER DR\", \"AccessID\": \"OB-034\", \"AccessType\": \"OPEN VEHICLE RAMP\", \"GeneralLoc\": \"400 BLK S ATLANTIC AV\", \"MilePost\": 10.9, \"City\": \"ORMOND BEACH\", \"AccessStatus\": \"CLOSED - SEASONAL\", \"Entry_Date_Time\": 1691394832000, \"DrivingZone\": \"YES\"}\n",
|
||||
"{\"OBJECTID\": 235, \"AccessName\": \"MINERVA RD\", \"AccessID\": \"DBS-069\", \"AccessType\": \"OPEN VEHICLE RAMP\", \"GeneralLoc\": \"2300 BLK S ATLANTIC AV\", \"MilePost\": 17.52, \"City\": \"DAYTONA BEACH SHORES\", \"AccessStatus\": \"4X4 ONLY\", \"Entry_Date_Time\": 1691395124000, \"DrivingZone\": \"YES\"}\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"for doc in docs:\n",
|
||||
" print(doc.page_content)"
|
||||
]
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
"kernelspec": {
|
||||
"display_name": "Python 3 (ipykernel)",
|
||||
"language": "python",
|
||||
"name": "python3"
|
||||
},
|
||||
"language_info": {
|
||||
"codemirror_mode": {
|
||||
"name": "ipython",
|
||||
"version": 3
|
||||
},
|
||||
"file_extension": ".py",
|
||||
"mimetype": "text/x-python",
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.9.13"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
"nbformat_minor": 5
|
||||
}
|
||||
101
docs/extras/integrations/document_loaders/async_chromium.ipynb
Normal file
@@ -0,0 +1,101 @@
|
||||
{
|
||||
"cells": [
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "ad553e51",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# Async Chromium\n",
|
||||
"\n",
|
||||
"Chromium is one of the browsers supported by Playwright, a library used to control browser automation. \n",
|
||||
"\n",
|
||||
"By running `p.chromium.launch(headless=True)`, we are launching a headless instance of Chromium. \n",
|
||||
"\n",
|
||||
"Headless mode means that the browser is running without a graphical user interface.\n",
|
||||
"\n",
|
||||
"`AsyncChromiumLoader` load the page, and then we use `Html2TextTransformer` to trasnform to text."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "1c3a4c19",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"! pip install -q playwright beautifulsoup4\n",
|
||||
"! playwright install"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 2,
|
||||
"id": "dd2cdea7",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"'<!DOCTYPE html><html lang=\"en\"><head><script src=\"https://s0.2mdn.net/instream/video/client.js\" asyn'"
|
||||
]
|
||||
},
|
||||
"execution_count": 2,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"from langchain.document_loaders import AsyncChromiumLoader\n",
|
||||
"urls = [\"https://www.wsj.com\"]\n",
|
||||
"loader = AsyncChromiumLoader(urls)\n",
|
||||
"docs = loader.load()\n",
|
||||
"docs[0].page_content[0:100]"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 6,
|
||||
"id": "013caa7e",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"\"Skip to Main ContentSkip to SearchSkip to... Select * Top News * What's News *\\nFeatured Stories * Retirement * Life & Arts * Hip-Hop * Sports * Video *\\nEconomy * Real Estate * Sports * CMO * CIO * CFO * Risk & Compliance *\\nLogistics Report * Sustainable Business * Heard on the Street * Barron’s *\\nMarketWatch * Mansion Global * Penta * Opinion * Journal Reports * Sponsored\\nOffers Explore Our Brands * WSJ * * * * * Barron's * * * * * MarketWatch * * *\\n* * IBD # The Wall Street Journal SubscribeSig\""
|
||||
]
|
||||
},
|
||||
"execution_count": 6,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"from langchain.document_transformers import Html2TextTransformer\n",
|
||||
"html2text = Html2TextTransformer()\n",
|
||||
"docs_transformed = html2text.transform_documents(docs)\n",
|
||||
"docs_transformed[0].page_content[0:500]"
|
||||
]
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
"kernelspec": {
|
||||
"display_name": "Python 3 (ipykernel)",
|
||||
"language": "python",
|
||||
"name": "python3"
|
||||
},
|
||||
"language_info": {
|
||||
"codemirror_mode": {
|
||||
"name": "ipython",
|
||||
"version": 3
|
||||
},
|
||||
"file_extension": ".py",
|
||||
"mimetype": "text/x-python",
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.9.16"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
"nbformat_minor": 5
|
||||
}
|
||||
@@ -73,13 +73,27 @@
|
||||
"loader.load()"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "41c8a46f",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"If you want to use an alternative loader, you can provide a custom function, for example:"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "eba3002d",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": []
|
||||
"source": [
|
||||
"from langchain.document_loaders import PyPDFLoader\n",
|
||||
"def load_pdf(file_path):\n",
|
||||
" return PyPDFLoader(file_path)\n",
|
||||
"\n",
|
||||
"loader = GCSFileLoader(project_name=\"aist\", bucket=\"testing-hwc\", blob=\"fake.pdf\", loader_func=load_pdf)"
|
||||
]
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
|
||||
@@ -9,66 +9,16 @@
|
||||
"\n",
|
||||
"GROBID is a machine learning library for extracting, parsing, and re-structuring raw documents.\n",
|
||||
"\n",
|
||||
"It is particularly good for sturctured PDFs, like academic papers.\n",
|
||||
"It is designed and expected to be used to parse academic papers, where it works particularly well. Note: if the articles supplied to Grobid are large documents (e.g. dissertations) exceeding a certain number of elements, they might not be processed. \n",
|
||||
"\n",
|
||||
"This loader uses GROBIB to parse PDFs into `Documents` that retain metadata associated with the section of text.\n",
|
||||
"This loader uses Grobid to parse PDFs into `Documents` that retain metadata associated with the section of text.\n",
|
||||
"\n",
|
||||
"---\n",
|
||||
"The best approach is to install Grobid via docker, see https://grobid.readthedocs.io/en/latest/Grobid-docker/. \n",
|
||||
"\n",
|
||||
"For users on `Mac` - \n",
|
||||
"(Note: additional instructions can be found [here](https://python.langchain.com/docs/extras/integrations/providers/grobid.mdx).)\n",
|
||||
"\n",
|
||||
"(Note: additional instructions can be found [here](https://python.langchain.com/docs/ecosystem/integrations/grobid.mdx).)\n",
|
||||
"\n",
|
||||
"Install Java (Apple Silicon):\n",
|
||||
"```\n",
|
||||
"$ arch -arm64 brew install openjdk@11\n",
|
||||
"$ brew --prefix openjdk@11\n",
|
||||
"/opt/homebrew/opt/openjdk@ 11\n",
|
||||
"```\n",
|
||||
"\n",
|
||||
"In `~/.zshrc`:\n",
|
||||
"```\n",
|
||||
"export JAVA_HOME=/opt/homebrew/opt/openjdk@11\n",
|
||||
"export PATH=$JAVA_HOME/bin:$PATH\n",
|
||||
"```\n",
|
||||
"\n",
|
||||
"Then, in Terminal:\n",
|
||||
"```\n",
|
||||
"$ source ~/.zshrc\n",
|
||||
"```\n",
|
||||
"\n",
|
||||
"Confirm install:\n",
|
||||
"```\n",
|
||||
"$ which java\n",
|
||||
"/opt/homebrew/opt/openjdk@11/bin/java\n",
|
||||
"$ java -version \n",
|
||||
"openjdk version \"11.0.19\" 2023-04-18\n",
|
||||
"OpenJDK Runtime Environment Homebrew (build 11.0.19+0)\n",
|
||||
"OpenJDK 64-Bit Server VM Homebrew (build 11.0.19+0, mixed mode)\n",
|
||||
"```\n",
|
||||
"\n",
|
||||
"Then, get [Grobid](https://grobid.readthedocs.io/en/latest/Install-Grobid/#getting-grobid):\n",
|
||||
"```\n",
|
||||
"$ curl -LO https://github.com/kermitt2/grobid/archive/0.7.3.zip\n",
|
||||
"$ unzip 0.7.3.zip\n",
|
||||
"```\n",
|
||||
" \n",
|
||||
"Build\n",
|
||||
"```\n",
|
||||
"$ ./gradlew clean install\n",
|
||||
"```\n",
|
||||
"\n",
|
||||
"Then, run the server:"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 1,
|
||||
"id": "2d8992fc",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"! get_ipython().system_raw('nohup ./gradlew run > grobid.log 2>&1 &')"
|
||||
"Once grobid is up-and-running you can interact as described below. \n"
|
||||
]
|
||||
},
|
||||
{
|
||||
|
||||
139
docs/extras/integrations/document_loaders/pubmed.ipynb
Normal file
@@ -0,0 +1,139 @@
|
||||
{
|
||||
"cells": [
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "3df0dcf8",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# PubMed\n",
|
||||
"\n",
|
||||
">[PubMed®](https://pubmed.ncbi.nlm.nih.gov/) by `The National Center for Biotechnology Information, National Library of Medicine` comprises more than 35 million citations for biomedical literature from `MEDLINE`, life science journals, and online books. Citations may include links to full text content from `PubMed Central` and publisher web sites."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 1,
|
||||
"id": "aecaff63",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langchain.document_loaders import PubMedLoader"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 4,
|
||||
"id": "f2f7e8d3",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"loader = PubMedLoader(\"chatgpt\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 5,
|
||||
"id": "ed115aa1",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"docs = loader.load()"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 6,
|
||||
"id": "b68d3264-b893-45e4-8ab0-077b25a586dc",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"3"
|
||||
]
|
||||
},
|
||||
"execution_count": 6,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"len(docs)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 8,
|
||||
"id": "9f4626d2-068d-4aed-9ffe-ad754ad4b4cd",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"{'uid': '37548997',\n",
|
||||
" 'Title': 'Performance of ChatGPT on the Situational Judgement Test-A Professional Dilemmas-Based Examination for Doctors in the United Kingdom.',\n",
|
||||
" 'Published': '2023-08-07',\n",
|
||||
" 'Copyright Information': '©Robin J Borchert, Charlotte R Hickman, Jack Pepys, Timothy J Sadler. Originally published in JMIR Medical Education (https://mededu.jmir.org), 07.08.2023.'}"
|
||||
]
|
||||
},
|
||||
"execution_count": 8,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"docs[1].metadata"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 9,
|
||||
"id": "8000f687-b500-4cce-841b-70d6151304da",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"\"BACKGROUND: ChatGPT is a large language model that has performed well on professional examinations in the fields of medicine, law, and business. However, it is unclear how ChatGPT would perform on an examination assessing professionalism and situational judgement for doctors.\\nOBJECTIVE: We evaluated the performance of ChatGPT on the Situational Judgement Test (SJT): a national examination taken by all final-year medical students in the United Kingdom. This examination is designed to assess attributes such as communication, teamwork, patient safety, prioritization skills, professionalism, and ethics.\\nMETHODS: All questions from the UK Foundation Programme Office's (UKFPO's) 2023 SJT practice examination were inputted into ChatGPT. For each question, ChatGPT's answers and rationales were recorded and assessed on the basis of the official UK Foundation Programme Office scoring template. Questions were categorized into domains of Good Medical Practice on the basis of the domains referenced in the rationales provided in the scoring sheet. Questions without clear domain links were screened by reviewers and assigned one or multiple domains. ChatGPT's overall performance, as well as its performance across the domains of Good Medical Practice, was evaluated.\\nRESULTS: Overall, ChatGPT performed well, scoring 76% on the SJT but scoring full marks on only a few questions (9%), which may reflect possible flaws in ChatGPT's situational judgement or inconsistencies in the reasoning across questions (or both) in the examination itself. ChatGPT demonstrated consistent performance across the 4 outlined domains in Good Medical Practice for doctors.\\nCONCLUSIONS: Further research is needed to understand the potential applications of large language models, such as ChatGPT, in medical education for standardizing questions and providing consistent rationales for examinations assessing professionalism and ethics.\""
|
||||
]
|
||||
},
|
||||
"execution_count": 9,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"docs[1].page_content"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "1070e571-697d-4c33-9a4f-0b2dd6909629",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": []
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
"kernelspec": {
|
||||
"display_name": "Python 3 (ipykernel)",
|
||||
"language": "python",
|
||||
"name": "python3"
|
||||
},
|
||||
"language_info": {
|
||||
"codemirror_mode": {
|
||||
"name": "ipython",
|
||||
"version": 3
|
||||
},
|
||||
"file_extension": ".py",
|
||||
"mimetype": "text/x-python",
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.10.12"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
"nbformat_minor": 5
|
||||
}
|
||||
@@ -9,7 +9,7 @@
|
||||
"\n",
|
||||
"We may want to process load all URLs under a root directory.\n",
|
||||
"\n",
|
||||
"For example, let's look at the [LangChain JS documentation](https://js.langchain.com/docs/).\n",
|
||||
"For example, let's look at the [Python 3.9 Document](https://docs.python.org/3.9/).\n",
|
||||
"\n",
|
||||
"This has many interesting child pages that we may want to read in bulk.\n",
|
||||
"\n",
|
||||
@@ -19,13 +19,28 @@
|
||||
" \n",
|
||||
"We do this using the `RecursiveUrlLoader`.\n",
|
||||
"\n",
|
||||
"This also gives us the flexibility to exclude some children (e.g., the `api` directory with > 800 child pages)."
|
||||
"This also gives us the flexibility to exclude some children, customize the extractor, and more."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "1be8094f",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# Parameters\n",
|
||||
"- url: str, the target url to crawl.\n",
|
||||
"- exclude_dirs: Optional[str], webpage directories to exclude.\n",
|
||||
"- use_async: Optional[bool], wether to use async requests, using async requests is usually faster in large tasks. However, async will disable the lazy loading feature(the function still works, but it is not lazy). By default, it is set to False.\n",
|
||||
"- extractor: Optional[Callable[[str], str]], a function to extract the text of the document from the webpage, by default it returns the page as it is. It is recommended to use tools like goose3 and beautifulsoup to extract the text. By default, it just returns the page as it is.\n",
|
||||
"- max_depth: Optional[int] = None, the maximum depth to crawl. By default, it is set to 2. If you need to crawl the whole website, set it to a number that is large enough would simply do the job.\n",
|
||||
"- timeout: Optional[int] = None, the timeout for each request, in the unit of seconds. By default, it is set to 10.\n",
|
||||
"- prevent_outside: Optional[bool] = None, whether to prevent crawling outside the root url. By default, it is set to True."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 1,
|
||||
"id": "2e3532b2",
|
||||
"execution_count": null,
|
||||
"id": "23c18539",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
@@ -42,13 +57,15 @@
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 2,
|
||||
"id": "d69e5620",
|
||||
"execution_count": null,
|
||||
"id": "55394afe",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"url = \"https://js.langchain.com/docs/modules/memory/examples/\"\n",
|
||||
"loader = RecursiveUrlLoader(url=url)\n",
|
||||
"from bs4 import BeautifulSoup as Soup\n",
|
||||
"\n",
|
||||
"url = \"https://docs.python.org/3.9/\"\n",
|
||||
"loader = RecursiveUrlLoader(url=url, max_depth=2, extractor=lambda x: Soup(x, \"html.parser\").text)\n",
|
||||
"docs = loader.load()"
|
||||
]
|
||||
},
|
||||
@@ -61,7 +78,7 @@
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"12"
|
||||
"'\\n\\n\\n\\n\\nPython Frequently Asked Questions — Python 3.'"
|
||||
]
|
||||
},
|
||||
"execution_count": 3,
|
||||
@@ -70,19 +87,21 @@
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"len(docs)"
|
||||
"docs[0].page_content[:50]"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 4,
|
||||
"id": "89355b7c",
|
||||
"id": "13bd7e16",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"'\\n\\n\\n\\n\\nBuffer Window Memory | 🦜️🔗 Langchain\\n\\n\\n\\n\\n\\nSki'"
|
||||
"{'source': 'https://docs.python.org/3.9/library/index.html',\n",
|
||||
" 'title': 'The Python Standard Library — Python 3.9.17 documentation',\n",
|
||||
" 'language': None}"
|
||||
]
|
||||
},
|
||||
"execution_count": 4,
|
||||
@@ -91,137 +110,48 @@
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"docs[0].page_content[:50]"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 5,
|
||||
"id": "13bd7e16",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"{'source': 'https://js.langchain.com/docs/modules/memory/examples/buffer_window_memory',\n",
|
||||
" 'title': 'Buffer Window Memory | 🦜️🔗 Langchain',\n",
|
||||
" 'description': 'BufferWindowMemory keeps track of the back-and-forths in conversation, and then uses a window of size k to surface the last k back-and-forths to use as memory.',\n",
|
||||
" 'language': 'en'}"
|
||||
]
|
||||
},
|
||||
"execution_count": 5,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"docs[0].metadata"
|
||||
"docs[-1].metadata"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "40fc13ef",
|
||||
"id": "5866e5a6",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Now, let's try a more extensive example, the `docs` root dir.\n",
|
||||
"\n",
|
||||
"We will skip everything under `api`.\n",
|
||||
"\n",
|
||||
"For this, we can `lazy_load` each page as we crawl the tree, using `WebBaseLoader` to load each as we go."
|
||||
"However, since it's hard to perform a perfect filter, you may still see some irrelevant results in the results. You can perform a filter on the returned documents by yourself, if it's needed. Most of the time, the returned results are good enough."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "4ec8ecef",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Testing on LangChain docs."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "5c938b9f",
|
||||
"execution_count": 2,
|
||||
"id": "349b5598",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"url = \"https://js.langchain.com/docs/\"\n",
|
||||
"exclude_dirs = [\"https://js.langchain.com/docs/api/\"]\n",
|
||||
"loader = RecursiveUrlLoader(url=url, exclude_dirs=exclude_dirs)\n",
|
||||
"# Lazy load each\n",
|
||||
"docs = [print(doc) or doc for doc in loader.lazy_load()]"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 7,
|
||||
"id": "30ff61d3",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# Load all pages\n",
|
||||
"docs = loader.load()"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 8,
|
||||
"id": "457e30f3",
|
||||
"metadata": {
|
||||
"scrolled": true
|
||||
},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"188"
|
||||
"8"
|
||||
]
|
||||
},
|
||||
"execution_count": 8,
|
||||
"execution_count": 2,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"url = \"https://js.langchain.com/docs/modules/memory/integrations/\"\n",
|
||||
"loader = RecursiveUrlLoader(url=url)\n",
|
||||
"docs = loader.load()\n",
|
||||
"len(docs)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 9,
|
||||
"id": "bca80b4a",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"'\\n\\n\\n\\n\\nAgent Simulations | 🦜️🔗 Langchain\\n\\n\\n\\n\\n\\nSkip t'"
|
||||
]
|
||||
},
|
||||
"execution_count": 9,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"docs[0].page_content[:50]"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 10,
|
||||
"id": "df97cf22",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"{'source': 'https://js.langchain.com/docs/use_cases/agent_simulations/',\n",
|
||||
" 'title': 'Agent Simulations | 🦜️🔗 Langchain',\n",
|
||||
" 'description': 'Agent simulations involve taking multiple agents and having them interact with each other.',\n",
|
||||
" 'language': 'en'}"
|
||||
]
|
||||
},
|
||||
"execution_count": 10,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"docs[0].metadata"
|
||||
]
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
|
||||
@@ -0,0 +1,320 @@
|
||||
{
|
||||
"cells": [
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "bda1f3f5",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# TensorFlow Datasets\n",
|
||||
"\n",
|
||||
">[TensorFlow Datasets](https://www.tensorflow.org/datasets) is a collection of datasets ready to use, with TensorFlow or other Python ML frameworks, such as Jax. All datasets are exposed as [tf.data.Datasets](https://www.tensorflow.org/api_docs/python/tf/data/Dataset), enabling easy-to-use and high-performance input pipelines. To get started see the [guide](https://www.tensorflow.org/datasets/overview) and the [list of datasets](https://www.tensorflow.org/datasets/catalog/overview#all_datasets).\n",
|
||||
"\n",
|
||||
"This notebook shows how to load `TensorFlow Datasets` into a Document format that we can use downstream."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "1b7a1eef-7bf7-4e7d-8bfc-c4e27c9488cb",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Installation"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "2abd5578-aa3d-46b9-99af-8b262f0b3df8",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"You need to install `tensorflow` and `tensorflow-datasets` python packages."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "2e589036-351e-4c63-b734-c9a05fadb880",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"!pip install tensorflow"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "b674aaea-ed3a-4541-8414-260a8f67f623",
|
||||
"metadata": {
|
||||
"tags": []
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"!pip install tensorflow-datasets"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "95f05e1c-195e-4e2b-ae8e-8d6637f15be6",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Example"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "e66e211e-9419-4dbb-b3cd-afc3cf984305",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"As an example, we use the [`mlqa/en` dataset](https://www.tensorflow.org/datasets/catalog/mlqa#mlqaen).\n",
|
||||
"\n",
|
||||
">`MLQA` (`Multilingual Question Answering Dataset`) is a benchmark dataset for evaluating multilingual question answering performance. The dataset consists of 7 languages: Arabic, German, Spanish, English, Hindi, Vietnamese, Chinese.\n",
|
||||
">\n",
|
||||
">- Homepage: https://github.com/facebookresearch/MLQA\n",
|
||||
">- Source code: `tfds.datasets.mlqa.Builder`\n",
|
||||
">- Download size: 72.21 MiB\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "8968d645-c81c-4e3b-82bc-a3cbb5ddd93a",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# Feature structure of `mlqa/en` dataset:\n",
|
||||
"\n",
|
||||
"FeaturesDict({\n",
|
||||
" 'answers': Sequence({\n",
|
||||
" 'answer_start': int32,\n",
|
||||
" 'text': Text(shape=(), dtype=string),\n",
|
||||
" }),\n",
|
||||
" 'context': Text(shape=(), dtype=string),\n",
|
||||
" 'id': string,\n",
|
||||
" 'question': Text(shape=(), dtype=string),\n",
|
||||
" 'title': Text(shape=(), dtype=string),\n",
|
||||
"})"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 18,
|
||||
"id": "30fcaba5-cc9b-4a0e-a8f4-c047018451c2",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"import tensorflow as tf\n",
|
||||
"import tensorflow_datasets as tfds"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 78,
|
||||
"id": "e307dd67-029e-4ee3-a65f-e085c09b0b8b",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"<_TakeDataset element_spec={'answers': {'answer_start': TensorSpec(shape=(None,), dtype=tf.int32, name=None), 'text': TensorSpec(shape=(None,), dtype=tf.string, name=None)}, 'context': TensorSpec(shape=(), dtype=tf.string, name=None), 'id': TensorSpec(shape=(), dtype=tf.string, name=None), 'question': TensorSpec(shape=(), dtype=tf.string, name=None), 'title': TensorSpec(shape=(), dtype=tf.string, name=None)}>"
|
||||
]
|
||||
},
|
||||
"execution_count": 78,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"# try directly access this dataset:\n",
|
||||
"ds = tfds.load('mlqa/en', split='test')\n",
|
||||
"ds = ds.take(1) # Only take a single example\n",
|
||||
"ds"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "5c9c4b08-d94f-4b53-add0-93769811644e",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Now we have to create a custom function to convert dataset sample into a Document.\n",
|
||||
"\n",
|
||||
"This is a requirement. There is no standard format for the TF datasets that's why we need to make a custom transformation function.\n",
|
||||
"\n",
|
||||
"Let's use `context` field as the `Document.page_content` and place other fields in the `Document.metadata`.\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 72,
|
||||
"id": "78844113-f8d8-48a8-8105-685280b6cfa5",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"page_content='After completing the journey around South America, on 23 February 2006, Queen Mary 2 met her namesake, the original RMS Queen Mary, which is permanently docked at Long Beach, California. Escorted by a flotilla of smaller ships, the two Queens exchanged a \"whistle salute\" which was heard throughout the city of Long Beach. Queen Mary 2 met the other serving Cunard liners Queen Victoria and Queen Elizabeth 2 on 13 January 2008 near the Statue of Liberty in New York City harbour, with a celebratory fireworks display; Queen Elizabeth 2 and Queen Victoria made a tandem crossing of the Atlantic for the meeting. This marked the first time three Cunard Queens have been present in the same location. Cunard stated this would be the last time these three ships would ever meet, due to Queen Elizabeth 2\\'s impending retirement from service in late 2008. However this would prove not to be the case, as the three Queens met in Southampton on 22 April 2008. Queen Mary 2 rendezvoused with Queen Elizabeth 2 in Dubai on Saturday 21 March 2009, after the latter ship\\'s retirement, while both ships were berthed at Port Rashid. With the withdrawal of Queen Elizabeth 2 from Cunard\\'s fleet and its docking in Dubai, Queen Mary 2 became the only ocean liner left in active passenger service.' metadata={'id': '5116f7cccdbf614d60bcd23498274ffd7b1e4ec7', 'title': 'RMS Queen Mary 2', 'question': 'What year did Queen Mary 2 complete her journey around South America?', 'answer': '2006'}\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"name": "stderr",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"2023-08-03 14:27:08.482983: W tensorflow/core/kernels/data/cache_dataset_ops.cc:854] The calling iterator did not fully read the dataset being cached. In order to avoid unexpected truncation of the dataset, the partially cached contents of the dataset will be discarded. This can happen if you have an input pipeline similar to `dataset.cache().take(k).repeat()`. You should use `dataset.take(k).cache().repeat()` instead.\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"def decode_to_str(item: tf.Tensor) -> str:\n",
|
||||
" return item.numpy().decode('utf-8')\n",
|
||||
"\n",
|
||||
"def mlqaen_example_to_document(example: dict) -> Document:\n",
|
||||
" return Document(\n",
|
||||
" page_content=decode_to_str(example[\"context\"]),\n",
|
||||
" metadata={\n",
|
||||
" \"id\": decode_to_str(example[\"id\"]),\n",
|
||||
" \"title\": decode_to_str(example[\"title\"]),\n",
|
||||
" \"question\": decode_to_str(example[\"question\"]),\n",
|
||||
" \"answer\": decode_to_str(example[\"answers\"][\"text\"][0]),\n",
|
||||
" },\n",
|
||||
" )\n",
|
||||
" \n",
|
||||
" \n",
|
||||
"for example in ds: \n",
|
||||
" doc = mlqaen_example_to_document(example)\n",
|
||||
" print(doc)\n",
|
||||
" break"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 73,
|
||||
"id": "2d43c834-5145-4793-9558-8e301ccaf3b4",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langchain.schema import Document\n",
|
||||
"from langchain.document_loaders import TensorflowDatasetLoader\n",
|
||||
"\n",
|
||||
"loader = TensorflowDatasetLoader(\n",
|
||||
" dataset_name=\"mlqa/en\",\n",
|
||||
" split_name=\"test\",\n",
|
||||
" load_max_docs=3,\n",
|
||||
" sample_to_document_function=mlqaen_example_to_document,\n",
|
||||
" )"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "e29b954c-1407-4797-ae21-6ba8937156be",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"`TensorflowDatasetLoader` has these parameters:\n",
|
||||
"- `dataset_name`: the name of the dataset to load\n",
|
||||
"- `split_name`: the name of the split to load. Defaults to \"train\".\n",
|
||||
"- `load_max_docs`: a limit to the number of loaded documents. Defaults to 100.\n",
|
||||
"- `sample_to_document_function`: a function that converts a dataset sample to a Document\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 74,
|
||||
"id": "700e4ef2",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stderr",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"2023-08-03 14:27:22.998964: W tensorflow/core/kernels/data/cache_dataset_ops.cc:854] The calling iterator did not fully read the dataset being cached. In order to avoid unexpected truncation of the dataset, the partially cached contents of the dataset will be discarded. This can happen if you have an input pipeline similar to `dataset.cache().take(k).repeat()`. You should use `dataset.take(k).cache().repeat()` instead.\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"3"
|
||||
]
|
||||
},
|
||||
"execution_count": 74,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"docs = loader.load()\n",
|
||||
"len(docs)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 76,
|
||||
"id": "9138940a-e9fe-4145-83e8-77589b5272c9",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"'After completing the journey around South America, on 23 February 2006, Queen Mary 2 met her namesake, the original RMS Queen Mary, which is permanently docked at Long Beach, California. Escorted by a flotilla of smaller ships, the two Queens exchanged a \"whistle salute\" which was heard throughout the city of Long Beach. Queen Mary 2 met the other serving Cunard liners Queen Victoria and Queen Elizabeth 2 on 13 January 2008 near the Statue of Liberty in New York City harbour, with a celebratory fireworks display; Queen Elizabeth 2 and Queen Victoria made a tandem crossing of the Atlantic for the meeting. This marked the first time three Cunard Queens have been present in the same location. Cunard stated this would be the last time these three ships would ever meet, due to Queen Elizabeth 2\\'s impending retirement from service in late 2008. However this would prove not to be the case, as the three Queens met in Southampton on 22 April 2008. Queen Mary 2 rendezvoused with Queen Elizabeth 2 in Dubai on Saturday 21 March 2009, after the latter ship\\'s retirement, while both ships were berthed at Port Rashid. With the withdrawal of Queen Elizabeth 2 from Cunard\\'s fleet and its docking in Dubai, Queen Mary 2 became the only ocean liner left in active passenger service.'"
|
||||
]
|
||||
},
|
||||
"execution_count": 76,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"docs[0].page_content"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 77,
|
||||
"id": "2f7f7832-fe4d-4a58-892d-bb987cdbed0b",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"{'id': '5116f7cccdbf614d60bcd23498274ffd7b1e4ec7',\n",
|
||||
" 'title': 'RMS Queen Mary 2',\n",
|
||||
" 'question': 'What year did Queen Mary 2 complete her journey around South America?',\n",
|
||||
" 'answer': '2006'}"
|
||||
]
|
||||
},
|
||||
"execution_count": 77,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"docs[0].metadata"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "125d073c-4f4f-4ae6-a0c7-9e9db3cc8d69",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": []
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
"kernelspec": {
|
||||
"display_name": "Python 3 (ipykernel)",
|
||||
"language": "python",
|
||||
"name": "python3"
|
||||
},
|
||||
"language_info": {
|
||||
"codemirror_mode": {
|
||||
"name": "ipython",
|
||||
"version": 3
|
||||
},
|
||||
"file_extension": ".py",
|
||||
"mimetype": "text/x-python",
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.10.12"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
"nbformat_minor": 5
|
||||
}
|
||||
@@ -0,0 +1,95 @@
|
||||
{
|
||||
"cells": [
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "2ed9a4c2",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# Beautiful Soup\n",
|
||||
"\n",
|
||||
"Beautiful Soup offers fine-grained control over HTML content, enabling specific tag extraction, removal, and content cleaning. \n",
|
||||
"\n",
|
||||
"It's suited for cases where you want to extract specific information and clean up the HTML content according to your needs.\n",
|
||||
"\n",
|
||||
"For example, we can scrape text content within `<p>, <li>, <div>, and <a>` tags from the HTML content:\n",
|
||||
"\n",
|
||||
"* `<p>`: The paragraph tag. It defines a paragraph in HTML and is used to group together related sentences and/or phrases.\n",
|
||||
" \n",
|
||||
"* `<li>`: The list item tag. It is used within ordered (`<ol>`) and unordered (`<ul>`) lists to define individual items within the list.\n",
|
||||
" \n",
|
||||
"* `<div>`: The division tag. It is a block-level element used to group other inline or block-level elements.\n",
|
||||
" \n",
|
||||
"* `<a>`: The anchor tag. It is used to define hyperlinks."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 2,
|
||||
"id": "dd710e5b",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langchain.document_loaders import AsyncChromiumLoader\n",
|
||||
"from langchain.document_transformers import BeautifulSoupTransformer\n",
|
||||
"\n",
|
||||
"# Load HTML\n",
|
||||
"loader = AsyncChromiumLoader([\"https://www.wsj.com\"])\n",
|
||||
"html = loader.load()"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 3,
|
||||
"id": "052b64dd",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# Transform\n",
|
||||
"bs_transformer = BeautifulSoupTransformer()\n",
|
||||
"docs_transformed = bs_transformer.transform_documents(html,tags_to_extract=[\"p\", \"li\", \"div\", \"a\"])"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 4,
|
||||
"id": "b53a5307",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"'Conservative legal activists are challenging Amazon, Comcast and others using many of the same tools that helped kill affirmative-action programs in colleges.1,2099 min read U.S. stock indexes fell and government-bond prices climbed, after Moody’s lowered credit ratings for 10 smaller U.S. banks and said it was reviewing ratings for six larger ones. The Dow industrials dropped more than 150 points.3 min read Penn Entertainment’s Barstool Sportsbook app will be rebranded as ESPN Bet this fall as '"
|
||||
]
|
||||
},
|
||||
"execution_count": 4,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"docs_transformed[0].page_content[0:500]"
|
||||
]
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
"kernelspec": {
|
||||
"display_name": "Python 3 (ipykernel)",
|
||||
"language": "python",
|
||||
"name": "python3"
|
||||
},
|
||||
"language_info": {
|
||||
"codemirror_mode": {
|
||||
"name": "ipython",
|
||||
"version": 3
|
||||
},
|
||||
"file_extension": ".py",
|
||||
"mimetype": "text/x-python",
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.9.16"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
"nbformat_minor": 5
|
||||
}
|
||||
@@ -14,7 +14,7 @@
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 25,
|
||||
"execution_count": 1,
|
||||
"id": "60b6dbb2",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
@@ -42,25 +42,14 @@
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 5,
|
||||
"execution_count": 2,
|
||||
"id": "9ca87a2e",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# Initialize a Fireworks LLM\n",
|
||||
"os.environ['FIREWORKS_API_KEY'] = \"\" #change this to your own API KEY\n",
|
||||
"llm = Fireworks(model_id=\"accounts/fireworks/models/fireworks-llama-v2-13b-chat\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 19,
|
||||
"id": "43a11ba8",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# Create LLM chain\n",
|
||||
"llm_chain = LLMChain(prompt=prompt, llm=llm)"
|
||||
"os.environ['FIREWORKS_API_KEY'] = \"<YOUR_API_KEY>\" # Change this to your own API key\n",
|
||||
"llm = Fireworks(model_id=\"accounts/fireworks/models/llama-v2-13b-chat\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
@@ -72,16 +61,33 @@
|
||||
"\n",
|
||||
"You can use the LLMs to call the model for specified prompt(s). \n",
|
||||
"\n",
|
||||
"Current Specified Models: \n",
|
||||
"* accounts/fireworks/models/fireworks-falcon-7b, accounts/fireworks/models/fireworks-falcon-40b-w8a16\n",
|
||||
"* accounts/fireworks/models/fireworks-starcoder-1b-w8a16-1gpu, accounts/fireworks/models/fireworks-starcoder-3b-w8a16-1gpu, accounts/fireworks/models/fireworks-starcoder-7b-w8a16-1gpu, accounts/fireworks/models/fireworks-starcoder-16b-w8a16 \n",
|
||||
"* accounts/fireworks/models/fireworks-llama-v2-13b, accounts/fireworks/models/fireworks-llama-v2-13b-chat, accounts/fireworks/models/fireworks-llama-v2-13b-w8a16, accounts/fireworks/models/fireworks-llama-v2-13b-chat-w8a16\n",
|
||||
"* accounts/fireworks/models/fireworks-llama-v2-7b, accounts/fireworks/models/fireworks-llama-v2-7b-chat, accounts/fireworks/models/fireworks-llama-v2-7b-w8a16, accounts/fireworks/models/fireworks-llama-v2-7b-chat-w8a16, accounts/fireworks/models/fireworks-llama-v2-70b-chat-4gpu"
|
||||
"Currently supported models: \n",
|
||||
"\n",
|
||||
"* Falcon\n",
|
||||
" * `accounts/fireworks/models/falcon-7b`\n",
|
||||
" * `accounts/fireworks/models/falcon-40b-w8a16`\n",
|
||||
"* Llama 2\n",
|
||||
" * `accounts/fireworks/models/llama-v2-7b`\n",
|
||||
" * `accounts/fireworks/models/llama-v2-7b-w8a16`\n",
|
||||
" * `accounts/fireworks/models/llama-v2-7b-chat`\n",
|
||||
" * `accounts/fireworks/models/llama-v2-7b-chat-w8a16`\n",
|
||||
" * `accounts/fireworks/models/llama-v2-13b`\n",
|
||||
" * `accounts/fireworks/models/llama-v2-13b-w8a16`\n",
|
||||
" * `accounts/fireworks/models/llama-v2-13b-chat`\n",
|
||||
" * `accounts/fireworks/models/llama-v2-13b-chat-w8a16`\n",
|
||||
" * `accounts/fireworks/models/llama-v2-70b-chat-4gpu`\n",
|
||||
"* StarCoder\n",
|
||||
" * `accounts/fireworks/models/starcoder-1b-w8a16-1gpu`\n",
|
||||
" * `accounts/fireworks/models/starcoder-3b-w8a16-1gpu`\n",
|
||||
" * `accounts/fireworks/models/starcoder-7b-w8a16-1gpu`\n",
|
||||
" * `accounts/fireworks/models/starcoder-16b-w8a16`\n",
|
||||
"\n",
|
||||
"See the full, most up-to-date list on [app.fireworks.ai](https://app.fireworks.ai)."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 10,
|
||||
"execution_count": 3,
|
||||
"id": "bf0a425c",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
@@ -89,29 +95,41 @@
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"Is it Tom Brady, Aaron Rodgers, or someone else? It's a tough question to answer, and there are strong arguments for each of these quarterbacks. Here are some of the reasons why each of these quarterbacks could be considered the best:\n",
|
||||
"\n",
|
||||
"Tom Brady:\n",
|
||||
"\n",
|
||||
"It's a question that has been debated for years, with different analysts and fans making their cases for various signal-callers. Here are some of the top contenders for the title of best quarterback in the NFL:\n",
|
||||
"* He has the most Super Bowl wins (6) of any quarterback in NFL history.\n",
|
||||
"* He has been named Super Bowl MVP four times, more than any other player.\n",
|
||||
"* He has led the New England Patriots to 18 playoff victories, the most in NFL history.\n",
|
||||
"* He has thrown for over 70,000 yards in his career, the most of any quarterback in NFL history.\n",
|
||||
"* He has thrown for 50 or more touchdowns in a season four times, the most of any quarterback in NFL history.\n",
|
||||
"\n",
|
||||
"1. Tom Brady: The New England Patriots legend has won six Super Bowls and has been named Super Bowl MVP four times. He's known for his precision passing, pocket presence, and ability to read defenses.\n",
|
||||
"2. Aaron Rodgers: The Green Bay Packers quarterback has won two Super Bowls and has been named NFL MVP twice. He's known for his quick release, accuracy, and ability to extend plays with his feet.\n",
|
||||
"3. Drew Brees: The New Orleans Saints quarterback has won a Super Bowl and has been named NFL MVP once. He's known for his accuracy, pocket presence, and ability to read defenses.\n",
|
||||
"4. Patrick Mahomes: The Kansas City Chiefs quarterback has won a Super Bowl and has been named NFL MVP twice. He's known for his arm strength, athleticism, and ability to make plays outside of the pocket.\n",
|
||||
"5. Russell Wilson: The Seattle Seahawks quarterback has won a Super Bowl and has been named NFL MVP once. He's known for his mobility, accuracy, and ability to extend plays with his feet.\n",
|
||||
"Aaron Rodgers:\n",
|
||||
"\n",
|
||||
"Of course, there are other talented quarterbacks in the league, such as Lamar Jackson, Deshaun Watson, and Carson Wentz, who could also be considered among the best. Ultimately, the answer to the question of who's the best quarterback in the NFL is subjective and can vary depending on individual perspectives and criteria.\n"
|
||||
"* He has led the Green Bay Packers to a Super Bowl victory in 2010.\n",
|
||||
"* He has been named Super Bowl MVP once.\n",
|
||||
"* He has thrown for over 40,000 yards in his career, the most of any quarterback in NFL history.\n",
|
||||
"* He has thrown for 40 or more touchdowns in a season three times, the most of any quarterback in NFL history.\n",
|
||||
"* He has a career passer rating of 103.1, the highest of any quarterback in NFL history.\n",
|
||||
"\n",
|
||||
"So, who's the best quarterback in the NFL? It's a tough call, but here's my opinion:\n",
|
||||
"\n",
|
||||
"I think Aaron Rodgers is the best quarterback in the NFL right now. He has led the Packers to a Super Bowl victory and has had some incredible seasons, including the 2011 season when he threw for 45 touchdowns and just 6 interceptions. He has a strong arm, great accuracy, and is incredibly mobile for a quarterback of his size. He also has a great sense of timing and knows when to take risks and when to play it safe.\n",
|
||||
"\n",
|
||||
"Tom Brady is a close second, though. He has an incredible track record of success, including six Super Bowl victories, and has been one of the most consistent quarterbacks in the league for the past two decades. He has a strong arm and is incredibly accurate\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"#single prompt\n",
|
||||
"# Single prompt\n",
|
||||
"output = llm(\"Who's the best quarterback in the NFL?\")\n",
|
||||
"print(output)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 12,
|
||||
"execution_count": 4,
|
||||
"id": "afc7de6f",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
@@ -119,19 +137,22 @@
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"generations=[[Generation(text=\"\\nWho is the best cricket player in the world in 2016?\\nThe best cricket player in the world in 2016 is Virat Kohli. The Indian captain has had a fabulous year, scoring heavily in all formats of the game, leading India to several victories, and breaking several records. In Test cricket, Kohli has scored 1215 runs at an average of 75.33 with 6 centuries and 4 fifties, which is the highest number of runs scored by any player in a calendar year. In ODI cricket, he has scored 1143 runs at an average of 83.42 with 7 centuries and 6 fifties, which is also the highest number of runs scored by any player in a calendar year. Additionally, Kohli has led India to the number one ranking in Test cricket, and has been named the ICC Test Player of the Year and the ICC ODI Player of the Year.\\nVirat Kohli has been in incredible form in 2016, and his performances have made him the standout player of the year. Other players who have had a great year include Steve Smith, Joe Root, and Kane Williamson, but Kohli's consistency and dominance in all formats of the game make him the best cricket player in the world in 2016.\", generation_info=None)], [Generation(text=\"\\n\\nA: LeBron James.\\n\\nB: Kevin Durant.\\n\\nC: Steph Curry.\\n\\nD: James Harden.\\n\\nE: Other (please specify).\\n\\nWhat's your answer?\", generation_info=None)]] llm_output={'token_usage': {}, 'model_id': 'fireworks-llama-v2-13b-chat'} run=[RunInfo(run_id=UUID('d14b6bee-7692-46ad-8798-acb6f72fc7fb')), RunInfo(run_id=UUID('b9f5b3b5-9e62-4eaf-b269-ecf0cbbcfb82'))]\n"
|
||||
"[[Generation(text='\\nThe best cricket player in 2016 is a matter of opinion, but some of the top contenders for the title include:\\n\\n1. Virat Kohli (India): Kohli had a phenomenal year in 2016, scoring over 1,000 runs in Test cricket, including four centuries, and averaging over 70. He also scored heavily in ODI cricket, with an average of over 80.\\n2. Steve Smith (Australia): Smith had a remarkable year in 2016, leading Australia to a Test series victory in India and scoring over 1,000 runs in the format, including five centuries. He also averaged over 60 in ODI cricket.\\n3. KL Rahul (India): Rahul had a breakout year in 2016, scoring over 1,000 runs in Test cricket, including four centuries, and averaging over 60. He also scored heavily in ODI cricket, with an average of over 70.\\n4. Joe Root (England): Root had a solid year in 2016, scoring over 1,000 runs in Test cricket, including four centuries, and averaging over 50. He also scored heavily in ODI cricket, with an average of over 80.\\n5. Quinton de Kock (South Africa): De Kock had a remarkable year in 2016, scoring over 1,000 runs in ODI cricket, including six centuries, and averaging over 80. He also scored heavily in Test cricket, with an average of over 50.\\n\\nThese are just a few of the top contenders for the title of best cricket player in 2016, but there were many other talented players who also had impressive years. Ultimately, the answer to this question is subjective and depends on individual opinions and criteria for evaluation.', generation_info=None)], [Generation(text=\"\\nThis is a tough one, as there are so many great players in the league right now. But if I had to choose one, I'd say LeBron James is the best basketball player in the league. He's a once-in-a-generation talent who can dominate the game in so many ways. He's got incredible speed, strength, and court vision, and he's always finding new ways to improve his game. Plus, he's been doing it at an elite level for over a decade now, which is just amazing.\\n\\nBut don't just take my word for it - there are plenty of other great players in the league who could make a strong case for being the best. Guys like Kevin Durant, Steph Curry, James Harden, and Giannis Antetokounmpo are all having incredible seasons, and they've all got their own unique skills and strengths that make them special. So ultimately, it's up to you to decide who you think is the best basketball player in the league.\", generation_info=None)]]\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"#calling multiple prompts\n",
|
||||
"output = llm.generate([\"Who's the best cricket player in 2016?\", \"Who's the best basketball player in the league?\"])\n",
|
||||
"print(output)"
|
||||
"# Calling multiple prompts\n",
|
||||
"output = llm.generate([\n",
|
||||
" \"Who's the best cricket player in 2016?\",\n",
|
||||
" \"Who's the best basketball player in the league?\",\n",
|
||||
"])\n",
|
||||
"print(output.generations)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 17,
|
||||
"execution_count": 5,
|
||||
"id": "b801c20d",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
@@ -140,13 +161,13 @@
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"\n",
|
||||
"Kansas City in December can be quite chilly, with average high\n"
|
||||
"Kansas City in December is quite cold, with temperatures typically r\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"#setting parameters: model_id, temperature, max_tokens, top_p\n",
|
||||
"llm = Fireworks(model_id=\"accounts/fireworks/models/fireworks-llama-v2-13b-chat\", temperature=0.7, max_tokens=15, top_p=1.0)\n",
|
||||
"# Setting additional parameters: temperature, max_tokens, top_p\n",
|
||||
"llm = Fireworks(model_id=\"accounts/fireworks/models/llama-v2-13b-chat\", temperature=0.7, max_tokens=15, top_p=1.0)\n",
|
||||
"print(llm(\"What's the weather like in Kansas City in December?\"))"
|
||||
]
|
||||
},
|
||||
@@ -162,7 +183,7 @@
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 26,
|
||||
"execution_count": 6,
|
||||
"id": "fd2c6bc1",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
@@ -171,34 +192,25 @@
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"\n",
|
||||
"(Note: I'm just an AI and not a branding expert, so take this as a starting point for your own research and brainstorming.)\n",
|
||||
"A good name for a company that makes football helmets could be:\n",
|
||||
"Naming a company can be a fun and creative process! Here are a few name ideas for a company that makes football helmets:\n",
|
||||
"\n",
|
||||
"1. Helix Pro: This name plays off the idea of a helix, or spiral, shape that is commonly associated with football helmets. \"Pro\" implies a professional-level product.\n",
|
||||
"2. Gridiron Gear: \"Gridiron\" is a term used to describe a football field, and \"gear\" highlights the company's focus on producing high-quality football helmets.\n",
|
||||
"3. Linebacker Lab: \"Linebacker\" is a position on the football field, and \"Lab\" suggests a focus on research and development.\n",
|
||||
"4. Helmet Hut: This name is simple and easy to remember, and it immediately conveys the company's focus on football helmets.\n",
|
||||
"5. Tackle Tech: \"Tackle\" is a term used in football to describe a hit or collision, and \"Tech\" implies a focus on advanced technology and innovation.\n",
|
||||
"6. Victory Vest: \"Victory\" implies a focus on winning and success, and \"Vest\" could suggest a protective or armored design.\n",
|
||||
"7. Pigskin Pro: \"Pigskin\" is a term used to describe a football, and \"Pro\" implies a professional-level product.\n",
|
||||
"8. Football Fusion: This name could suggest a combination of different materials or technologies to create a high-quality football helmet.\n",
|
||||
"9. Endzone Edge: \"Endzone\" is the area of the football field where a team scores a touchdown, and \"Edge\" implies a competitive advantage.\n",
|
||||
"10. MVP Masks: \"MVP\" stands for \"Most Valuable Player,\" and \"Masks\" highlights the protective nature of the company's football helmets.\n",
|
||||
"1. Helix Headgear: This name plays off the idea of the helix shape of a football helmet and could be a memorable and catchy name for a company.\n",
|
||||
"2. Gridiron Gear: \"Gridiron\" is a term used to describe a football field, and \"gear\" refers to the products the company sells. This name is straightforward and easy to understand.\n",
|
||||
"3. Cushion Crusaders: This name emphasizes the protective qualities of football helmets and could appeal to customers looking for safety-conscious products.\n",
|
||||
"4. Helmet Heroes: This name has a fun, heroic tone and could appeal to customers looking for high-quality products.\n",
|
||||
"5. Tackle Tech: \"Tackle\" is a term used in football to describe a player's attempt to stop an opponent, and \"tech\" refers to the technology used in the helmets. This name could appeal to customers interested in innovative products.\n",
|
||||
"6. Padded Protection: This name emphasizes the protective qualities of football helmets and could appeal to customers looking for products that prioritize safety.\n",
|
||||
"7. Gridiron Gear Co.: This name is simple and straightforward, and it clearly conveys the company's focus on football-related products.\n",
|
||||
"8. Helmet Haven: This name has a soothing, protective tone and could appeal to customers looking for a reliable brand.\n",
|
||||
"\n",
|
||||
"Remember, the name you choose for your company should be memorable, easy to pronounce and spell, and convey a sense of quality and professionalism. It's also important to check that the name isn't already in use by another company, and to consider any potential trademark issues.\n"
|
||||
"Remember to choose a name that reflects your company's values and mission, and that resonates with your target market. Good luck with your company!\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"human_message_prompt = HumanMessagePromptTemplate(\n",
|
||||
" prompt=PromptTemplate(\n",
|
||||
" template=\"What is a good name for a company that makes {product}?\",\n",
|
||||
" input_variables=[\"product\"],\n",
|
||||
" )\n",
|
||||
")\n",
|
||||
"\n",
|
||||
"human_message_prompt = HumanMessagePromptTemplate.from_template(\"What is a good name for a company that makes {product}?\")\n",
|
||||
"chat_prompt_template = ChatPromptTemplate.from_messages([human_message_prompt])\n",
|
||||
"chat = Fireworks()\n",
|
||||
"chat = FireworksChat()\n",
|
||||
"chain = LLMChain(llm=chat, prompt=chat_prompt_template)\n",
|
||||
"output = chain.run(\"football helmets\")\n",
|
||||
"\n",
|
||||
@@ -222,7 +234,7 @@
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.10.9"
|
||||
"version": "3.11.4"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
|
||||
@@ -31,12 +31,11 @@
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langchain.llms.bedrock import Bedrock\n",
|
||||
"from langchain.llms import Bedrock\n",
|
||||
"\n",
|
||||
"llm = Bedrock(\n",
|
||||
" credentials_profile_name=\"bedrock-admin\",\n",
|
||||
" model_id=\"amazon.titan-tg1-large\",\n",
|
||||
" endpoint_url=\"custom_endpoint_url\",\n",
|
||||
" model_id=\"amazon.titan-tg1-large\"\n",
|
||||
")"
|
||||
]
|
||||
},
|
||||
|
||||
78
docs/extras/integrations/llms/deepsparse.ipynb
Normal file
@@ -0,0 +1,78 @@
|
||||
{
|
||||
"cells": [
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "15d7ce70-8879-42a0-86d9-a3d604a3ec83",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# DeepSparse\n",
|
||||
"\n",
|
||||
"This page covers how to use the [DeepSparse](https://github.com/neuralmagic/deepsparse) inference runtime within LangChain.\n",
|
||||
"It is broken into two parts: installation and setup, and then examples of DeepSparse usage.\n",
|
||||
"\n",
|
||||
"## Installation and Setup\n",
|
||||
"\n",
|
||||
"- Install the Python package with `pip install deepsparse`\n",
|
||||
"- Choose a [SparseZoo model](https://sparsezoo.neuralmagic.com/?useCase=text_generation) or export a support model to ONNX [using Optimum](https://github.com/neuralmagic/notebooks/blob/main/notebooks/opt-text-generation-deepsparse-quickstart/OPT_Text_Generation_DeepSparse_Quickstart.ipynb)\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"There exists a DeepSparse LLM wrapper, that provides a unified interface for all models:"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "79d24d37-737a-428c-b6c5-84c1633070d7",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langchain.llms import DeepSparse\n",
|
||||
"\n",
|
||||
"llm = DeepSparse(model='zoo:nlg/text_generation/codegen_mono-350m/pytorch/huggingface/bigpython_bigquery_thepile/base-none')\n",
|
||||
"\n",
|
||||
"print(llm('def fib():'))"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "ea7ea674-d6b0-49d9-9c2b-014032973be6",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Additional parameters can be passed using the `config` parameter:"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "ff61b845-41e6-4457-8625-6e21a11bfe7c",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"config = {'max_generated_tokens': 256}\n",
|
||||
"\n",
|
||||
"llm = DeepSparse(model='zoo:nlg/text_generation/codegen_mono-350m/pytorch/huggingface/bigpython_bigquery_thepile/base-none', config=config)"
|
||||
]
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
"kernelspec": {
|
||||
"display_name": "Python 3 (ipykernel)",
|
||||
"language": "python",
|
||||
"name": "python3"
|
||||
},
|
||||
"language_info": {
|
||||
"codemirror_mode": {
|
||||
"name": "ipython",
|
||||
"version": 3
|
||||
},
|
||||
"file_extension": ".py",
|
||||
"mimetype": "text/x-python",
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.9.1"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
"nbformat_minor": 5
|
||||
}
|
||||
@@ -241,7 +241,9 @@
|
||||
"# Make sure the model path is correct for your system!\n",
|
||||
"llm = LlamaCpp(\n",
|
||||
" model_path=\"/Users/rlm/Desktop/Code/llama/llama-2-7b-ggml/llama-2-7b-chat.ggmlv3.q4_0.bin\",\n",
|
||||
" input={\"temperature\": 0.75, \"max_length\": 2000, \"top_p\": 1},\n",
|
||||
" temperature=0.75,\n",
|
||||
" max_length=2000,\n",
|
||||
" top_p=1,\n",
|
||||
" callback_manager=callback_manager,\n",
|
||||
" verbose=True,\n",
|
||||
")"
|
||||
|
||||
340
docs/extras/integrations/llms/ollama.ipynb
Normal file
@@ -0,0 +1,340 @@
|
||||
{
|
||||
"cells": [
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# Ollama\n",
|
||||
"\n",
|
||||
"[Ollama](https://ollama.ai/) allows you to run open-source large language models, such as Llama 2, locally.\n",
|
||||
"\n",
|
||||
"Ollama bundles model weights, configuration, and data into a single package, defined by a Modelfile. \n",
|
||||
"\n",
|
||||
"It optimizes setup and configuration details, including GPU usage.\n",
|
||||
"\n",
|
||||
"For a complete list of supported models and model variants, see the [Ollama model library](https://github.com/jmorganca/ollama#model-library).\n",
|
||||
"\n",
|
||||
"## Setup\n",
|
||||
"\n",
|
||||
"First, follow [these instructions](https://github.com/jmorganca/ollama) to set up and run a local Ollama instance:\n",
|
||||
"\n",
|
||||
"* [Download](https://ollama.ai/download)\n",
|
||||
"* Fetch a model, e.g., `Llama-7b`: `ollama pull llama2`\n",
|
||||
"* Run `ollama run llama2`\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"## Usage\n",
|
||||
"\n",
|
||||
"You can see a full list of supported parameters on the [API reference page](https://api.python.langchain.com/en/latest/llms/langchain.llms.ollama.Ollama.html)."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 38,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langchain.llms import Ollama\n",
|
||||
"from langchain.callbacks.manager import CallbackManager\n",
|
||||
"from langchain.callbacks.streaming_stdout import StreamingStdOutCallbackHandler \n",
|
||||
"llm = Ollama(base_url=\"http://localhost:11434\", \n",
|
||||
" model=\"llama2\", \n",
|
||||
" callback_manager = CallbackManager([StreamingStdOutCallbackHandler()]))"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"With `StreamingStdOutCallbackHandler`, you will see tokens streamed."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 40,
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"\n",
|
||||
"Great! The history of Artificial Intelligence (AI) is a fascinating and complex topic that spans several decades. Here's a brief overview:\n",
|
||||
"\n",
|
||||
"1. Early Years (1950s-1960s): The term \"Artificial Intelligence\" was coined in 1956 by computer scientist John McCarthy. However, the concept of AI dates back to ancient Greece, where mythical creatures like Talos and Hephaestus were created to perform tasks without any human intervention. In the 1950s and 1960s, researchers began exploring ways to replicate human intelligence using computers, leading to the development of simple AI programs like ELIZA (1966) and PARRY (1972).\n",
|
||||
"2. Rule-Based Systems (1970s-1980s): As computing power increased, researchers developed rule-based systems, such as Mycin (1976), which could diagnose medical conditions based on a set of rules. This period also saw the rise of expert systems, like EDICT (1985), which mimicked human experts in specific domains.\n",
|
||||
"3. Machine Learning (1990s-2000s): With the advent of big data and machine learning algorithms, AI evolved to include neural networks, decision trees, and other techniques for training models on large datasets. This led to the development of applications like speech recognition (e.g., Siri, Alexa), image recognition (e.g., Google Image Search), and natural language processing (e.g., chatbots).\n",
|
||||
"4. Deep Learning (2010s-present): The rise of deep learning techniques, such as convolutional neural networks (CNNs) and recurrent neural networks (RNNs), has enabled AI to perform complex tasks like image and speech recognition, natural language processing, and even autonomous driving. Companies like Google, Facebook, and Baidu have invested heavily in deep learning research, leading to breakthroughs in areas like facial recognition, object detection, and machine translation.\n",
|
||||
"5. Current Trends (present-future): AI is currently being applied to various industries, including healthcare, finance, education, and entertainment. With the growth of cloud computing, edge AI, and autonomous systems, we can expect to see more sophisticated AI applications in the near future. However, there are also concerns about the ethical implications of AI, such as data privacy, algorithmic bias, and job displacement.\n",
|
||||
"\n",
|
||||
"Remember, AI has a long history, and its development is an ongoing process. As technology advances, we can expect to see even more innovative applications of AI in various fields."
|
||||
]
|
||||
},
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"'\\nGreat! The history of Artificial Intelligence (AI) is a fascinating and complex topic that spans several decades. Here\\'s a brief overview:\\n\\n1. Early Years (1950s-1960s): The term \"Artificial Intelligence\" was coined in 1956 by computer scientist John McCarthy. However, the concept of AI dates back to ancient Greece, where mythical creatures like Talos and Hephaestus were created to perform tasks without any human intervention. In the 1950s and 1960s, researchers began exploring ways to replicate human intelligence using computers, leading to the development of simple AI programs like ELIZA (1966) and PARRY (1972).\\n2. Rule-Based Systems (1970s-1980s): As computing power increased, researchers developed rule-based systems, such as Mycin (1976), which could diagnose medical conditions based on a set of rules. This period also saw the rise of expert systems, like EDICT (1985), which mimicked human experts in specific domains.\\n3. Machine Learning (1990s-2000s): With the advent of big data and machine learning algorithms, AI evolved to include neural networks, decision trees, and other techniques for training models on large datasets. This led to the development of applications like speech recognition (e.g., Siri, Alexa), image recognition (e.g., Google Image Search), and natural language processing (e.g., chatbots).\\n4. Deep Learning (2010s-present): The rise of deep learning techniques, such as convolutional neural networks (CNNs) and recurrent neural networks (RNNs), has enabled AI to perform complex tasks like image and speech recognition, natural language processing, and even autonomous driving. Companies like Google, Facebook, and Baidu have invested heavily in deep learning research, leading to breakthroughs in areas like facial recognition, object detection, and machine translation.\\n5. Current Trends (present-future): AI is currently being applied to various industries, including healthcare, finance, education, and entertainment. With the growth of cloud computing, edge AI, and autonomous systems, we can expect to see more sophisticated AI applications in the near future. However, there are also concerns about the ethical implications of AI, such as data privacy, algorithmic bias, and job displacement.\\n\\nRemember, AI has a long history, and its development is an ongoing process. As technology advances, we can expect to see even more innovative applications of AI in various fields.'"
|
||||
]
|
||||
},
|
||||
"execution_count": 40,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"llm(\"Tell me about the history of AI\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## RAG\n",
|
||||
"\n",
|
||||
"We can use Olama with RAG, [just as shown here](https://python.langchain.com/docs/use_cases/question_answering/how_to/local_retrieval_qa).\n",
|
||||
"\n",
|
||||
"Let's use the 13b model:\n",
|
||||
"\n",
|
||||
"```\n",
|
||||
"ollama pull llama2:13b\n",
|
||||
"ollama run llama2:13b \n",
|
||||
"```\n",
|
||||
"\n",
|
||||
"Let's also use local embeddings from `GPT4AllEmbeddings` and `Chroma`."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"! pip install gpt4all chromadb"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 60,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langchain.document_loaders import WebBaseLoader\n",
|
||||
"loader = WebBaseLoader(\"https://lilianweng.github.io/posts/2023-06-23-agent/\")\n",
|
||||
"data = loader.load()\n",
|
||||
"\n",
|
||||
"from langchain.text_splitter import RecursiveCharacterTextSplitter\n",
|
||||
"text_splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=0)\n",
|
||||
"all_splits = text_splitter.split_documents(data)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 61,
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"Found model file at /Users/rlm/.cache/gpt4all/ggml-all-MiniLM-L6-v2-f16.bin\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"from langchain.vectorstores import Chroma\n",
|
||||
"from langchain.embeddings import GPT4AllEmbeddings\n",
|
||||
"\n",
|
||||
"vectorstore = Chroma.from_documents(documents=all_splits, embedding=GPT4AllEmbeddings())"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 62,
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"4"
|
||||
]
|
||||
},
|
||||
"execution_count": 62,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"question = \"What are the approaches to Task Decomposition?\"\n",
|
||||
"docs = vectorstore.similarity_search(question)\n",
|
||||
"len(docs)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langchain import PromptTemplate\n",
|
||||
"\n",
|
||||
"# Prompt\n",
|
||||
"template = \"\"\"Use the following pieces of context to answer the question at the end. \n",
|
||||
"If you don't know the answer, just say that you don't know, don't try to make up an answer. \n",
|
||||
"Use three sentences maximum and keep the answer as concise as possible. \n",
|
||||
"{context}\n",
|
||||
"Question: {question}\n",
|
||||
"Helpful Answer:\"\"\"\n",
|
||||
"QA_CHAIN_PROMPT = PromptTemplate(\n",
|
||||
" input_variables=[\"context\", \"question\"],\n",
|
||||
" template=template,\n",
|
||||
")\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 69,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# LLM\n",
|
||||
"from langchain.llms import Ollama\n",
|
||||
"from langchain.callbacks.manager import CallbackManager\n",
|
||||
"from langchain.callbacks.streaming_stdout import StreamingStdOutCallbackHandler\n",
|
||||
"llm = Ollama(base_url=\"http://localhost:11434\",\n",
|
||||
" model=\"llama2\",\n",
|
||||
" verbose=True,\n",
|
||||
" callback_manager=CallbackManager([StreamingStdOutCallbackHandler()]))"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 66,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# QA chain\n",
|
||||
"from langchain.chains import RetrievalQA\n",
|
||||
"qa_chain = RetrievalQA.from_chain_type(\n",
|
||||
" llm,\n",
|
||||
" retriever=vectorstore.as_retriever(),\n",
|
||||
" chain_type_kwargs={\"prompt\": QA_CHAIN_PROMPT},\n",
|
||||
")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 70,
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"Task decomposition can be approached in different ways for AI agents, including:\n",
|
||||
"\n",
|
||||
"1. Using simple prompts like \"Steps for XYZ.\" or \"What are the subgoals for achieving XYZ?\" to guide the LLM.\n",
|
||||
"2. Providing task-specific instructions, such as \"Write a story outline\" for writing a novel.\n",
|
||||
"3. Utilizing human inputs to help the AI agent understand the task and break it down into smaller steps."
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"question = \"What are the various approaches to Task Decomposition for AI Agents?\"\n",
|
||||
"result = qa_chain({\"query\": question})"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"You can also get logging for tokens."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 56,
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"Task decomposition can be approached in three ways: (1) using simple prompting like \"Steps for XYZ.\\n1.\", \"What are the subgoals for achieving XYZ?\", (2) by using task-specific instructions, or (3) with human inputs.{'model': 'llama2', 'created_at': '2023-08-08T04:01:09.005367Z', 'done': True, 'context': [1, 29871, 1, 13, 9314, 14816, 29903, 6778, 13, 13, 3492, 526, 263, 8444, 29892, 3390, 1319, 322, 15993, 20255, 29889, 29849, 1234, 408, 1371, 3730, 408, 1950, 29892, 1550, 1641, 9109, 29889, 3575, 6089, 881, 451, 3160, 738, 10311, 1319, 29892, 443, 621, 936, 29892, 11021, 391, 29892, 7916, 391, 29892, 304, 27375, 29892, 18215, 29892, 470, 27302, 2793, 29889, 3529, 9801, 393, 596, 20890, 526, 5374, 635, 443, 5365, 1463, 322, 6374, 297, 5469, 29889, 13, 13, 3644, 263, 1139, 947, 451, 1207, 738, 4060, 29892, 470, 338, 451, 2114, 1474, 16165, 261, 296, 29892, 5649, 2020, 2012, 310, 22862, 1554, 451, 1959, 29889, 960, 366, 1016, 29915, 29873, 1073, 278, 1234, 304, 263, 1139, 29892, 3113, 1016, 29915, 29873, 6232, 2089, 2472, 29889, 13, 13, 29966, 829, 14816, 29903, 6778, 13, 13, 29961, 25580, 29962, 4803, 278, 1494, 12785, 310, 3030, 304, 1234, 278, 1139, 472, 278, 1095, 29889, 29871, 13, 3644, 366, 1016, 29915, 29873, 1073, 278, 1234, 29892, 925, 1827, 393, 366, 1016, 29915, 29873, 1073, 29892, 1016, 29915, 29873, 1018, 304, 1207, 701, 385, 1234, 29889, 29871, 13, 11403, 2211, 25260, 7472, 322, 3013, 278, 1234, 408, 3022, 895, 408, 1950, 29889, 29871, 13, 5398, 26227, 508, 367, 2309, 313, 29896, 29897, 491, 365, 26369, 411, 2560, 9508, 292, 763, 376, 7789, 567, 363, 1060, 29979, 29999, 7790, 29876, 29896, 19602, 376, 5618, 526, 278, 1014, 1484, 1338, 363, 3657, 15387, 1060, 29979, 29999, 29973, 613, 313, 29906, 29897, 491, 773, 3414, 29899, 14940, 11994, 29936, 321, 29889, 29887, 29889, 376, 6113, 263, 5828, 27887, 1213, 363, 5007, 263, 9554, 29892, 470, 313, 29941, 29897, 411, 5199, 10970, 29889, 13, 13, 5398, 26227, 508, 367, 2309, 313, 29896, 29897, 491, 365, 26369, 411, 2560, 9508, 292, 763, 376, 7789, 567, 363, 1060, 29979, 29999, 7790, 29876, 29896, 19602, 376, 5618, 526, 278, 1014, 1484, 1338, 363, 3657, 15387, 1060, 29979, 29999, 29973, 613, 313, 29906, 29897, 491, 773, 3414, 29899, 14940, 11994, 29936, 321, 29889, 29887, 29889, 376, 6113, 263, 5828, 27887, 1213, 363, 5007, 263, 9554, 29892, 470, 313, 29941, 29897, 411, 5199, 10970, 29889, 13, 13, 5398, 26227, 508, 367, 2309, 313, 29896, 29897, 491, 365, 26369, 411, 2560, 9508, 292, 763, 376, 7789, 567, 363, 1060, 29979, 29999, 7790, 29876, 29896, 19602, 376, 5618, 526, 278, 1014, 1484, 1338, 363, 3657, 15387, 1060, 29979, 29999, 29973, 613, 313, 29906, 29897, 491, 773, 3414, 29899, 14940, 11994, 29936, 321, 29889, 29887, 29889, 376, 6113, 263, 5828, 27887, 1213, 363, 5007, 263, 9554, 29892, 470, 313, 29941, 29897, 411, 5199, 10970, 29889, 13, 13, 1451, 16047, 267, 297, 1472, 29899, 8489, 18987, 322, 3414, 26227, 29901, 1858, 9450, 975, 263, 3309, 29891, 4955, 322, 17583, 3902, 8253, 278, 1650, 2913, 3933, 18066, 292, 29889, 365, 26369, 29879, 21117, 304, 10365, 13900, 746, 20050, 411, 15668, 4436, 29892, 3907, 963, 3109, 16424, 9401, 304, 25618, 1058, 5110, 515, 14260, 322, 1059, 29889, 13, 16492, 29901, 1724, 526, 278, 13501, 304, 9330, 897, 510, 3283, 29973, 13, 29648, 1319, 673, 29901, 518, 29914, 25580, 29962, 13, 5398, 26227, 508, 367, 26733, 297, 2211, 5837, 29901, 313, 29896, 29897, 773, 2560, 9508, 292, 763, 376, 7789, 567, 363, 1060, 29979, 29999, 7790, 29876, 29896, 19602, 376, 5618, 526, 278, 1014, 1484, 1338, 363, 3657, 15387, 1060, 29979, 29999, 29973, 613, 313, 29906, 29897, 491, 773, 3414, 29899, 14940, 11994, 29892, 470, 313, 29941, 29897, 411, 5199, 10970, 29889, 2], 'total_duration': 1364428708, 'load_duration': 1246375, 'sample_count': 62, 'sample_duration': 44859000, 'prompt_eval_count': 1, 'eval_count': 62, 'eval_duration': 1313002000}\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"from langchain.schema import LLMResult\n",
|
||||
"from langchain.callbacks.base import BaseCallbackHandler\n",
|
||||
"\n",
|
||||
"class GenerationStatisticsCallback(BaseCallbackHandler):\n",
|
||||
" def on_llm_end(self, response: LLMResult, **kwargs) -> None:\n",
|
||||
" print(response.generations[0][0].generation_info)\n",
|
||||
" \n",
|
||||
"callback_manager = CallbackManager([StreamingStdOutCallbackHandler(), GenerationStatisticsCallback()])\n",
|
||||
"\n",
|
||||
"llm = Ollama(base_url=\"http://localhost:11434\",\n",
|
||||
" model=\"llama2\",\n",
|
||||
" verbose=True,\n",
|
||||
" callback_manager=callback_manager)\n",
|
||||
"\n",
|
||||
"qa_chain = RetrievalQA.from_chain_type(\n",
|
||||
" llm,\n",
|
||||
" retriever=vectorstore.as_retriever(),\n",
|
||||
" chain_type_kwargs={\"prompt\": QA_CHAIN_PROMPT},\n",
|
||||
")\n",
|
||||
"\n",
|
||||
"question = \"What are the approaches to Task Decomposition?\"\n",
|
||||
"result = qa_chain({\"query\": question})"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"`eval_count` / (`eval_duration`/10e9) gets `tok / s`"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 57,
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"47.22003469910937"
|
||||
]
|
||||
},
|
||||
"execution_count": 57,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"62 / (1313002000/1000/1000/1000)"
|
||||
]
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
"kernelspec": {
|
||||
"display_name": "Python 3 (ipykernel)",
|
||||
"language": "python",
|
||||
"name": "python3"
|
||||
},
|
||||
"language_info": {
|
||||
"codemirror_mode": {
|
||||
"name": "ipython",
|
||||
"version": 3
|
||||
},
|
||||
"file_extension": ".py",
|
||||
"mimetype": "text/x-python",
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.9.16"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
"nbformat_minor": 2
|
||||
}
|
||||
106
docs/extras/integrations/llms/symblai_nebula.ipynb
Normal file
@@ -0,0 +1,106 @@
|
||||
{
|
||||
"cells": [
|
||||
{
|
||||
"attachments": {},
|
||||
"cell_type": "markdown",
|
||||
"id": "9597802c",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# Nebula\n",
|
||||
"\n",
|
||||
"[Nebula](https://symbl.ai/nebula/) is a fully-managed Conversation platform, on which you can build, deploy, and manage scalable AI applications.\n",
|
||||
"\n",
|
||||
"This example goes over how to use LangChain to interact with the [Nebula platform](https://docs.symbl.ai/docs/nebula-llm-overview). \n",
|
||||
"\n",
|
||||
"It will send the requests to Nebula Service endpoint, which concatenates `SYMBLAI_NEBULA_SERVICE_URL` and `SYMBLAI_NEBULA_SERVICE_PATH`, with a token defined in `SYMBLAI_NEBULA_SERVICE_TOKEN`"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "f15ebe0d",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Integrate with a LLMChain"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "5472a7cd-af26-48ca-ae9b-5f6ae73c74d2",
|
||||
"metadata": {
|
||||
"tags": []
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"import os\n",
|
||||
"\n",
|
||||
"os.environ[\"NEBULA_SERVICE_URL\"] = NEBULA_SERVICE_URL\n",
|
||||
"os.environ[\"NEBULA_SERVICE_PATH\"] = NEBULA_SERVICE_PATH\n",
|
||||
"os.environ[\"NEBULA_SERVICE_API_KEY\"] = NEBULA_SERVICE_API_KEY"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "6fb585dd",
|
||||
"metadata": {
|
||||
"tags": []
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langchain.llms import OpenLLM\n",
|
||||
"\n",
|
||||
"llm = OpenLLM(\n",
|
||||
" conversation=\"<Drop your text conversation that you want to ask Nebula to analyze here>\",\n",
|
||||
")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "035dea0f",
|
||||
"metadata": {
|
||||
"tags": []
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langchain import PromptTemplate, LLMChain\n",
|
||||
"\n",
|
||||
"template = \"Identify the {count} main objectives or goals mentioned in this context concisely in less points. Emphasize on key intents.\"\n",
|
||||
"\n",
|
||||
"prompt = PromptTemplate(template=template, input_variables=[\"count\"])\n",
|
||||
"\n",
|
||||
"llm_chain = LLMChain(prompt=prompt, llm=llm)\n",
|
||||
"\n",
|
||||
"generated = llm_chain.run(count=\"five\")\n",
|
||||
"print(generated)"
|
||||
]
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
"kernelspec": {
|
||||
"display_name": "Python 3 (ipykernel)",
|
||||
"language": "python",
|
||||
"name": "python3"
|
||||
},
|
||||
"language_info": {
|
||||
"codemirror_mode": {
|
||||
"name": "ipython",
|
||||
"version": 3
|
||||
},
|
||||
"file_extension": ".py",
|
||||
"mimetype": "text/x-python",
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.10.8"
|
||||
},
|
||||
"vscode": {
|
||||
"interpreter": {
|
||||
"hash": "a0a0263b650d907a3bfe41c0f8d6a63a071b884df3cfdc1579f00cdc1aed6b03"
|
||||
}
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
"nbformat_minor": 5
|
||||
}
|
||||
169
docs/extras/integrations/llms/titan_takeoff.ipynb
Normal file
@@ -0,0 +1,169 @@
|
||||
{
|
||||
"cells": [
|
||||
{
|
||||
"attachments": {},
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# Titan Takeoff\n",
|
||||
"\n",
|
||||
"TitanML helps businesses build and deploy better, smaller, cheaper, and faster NLP models through our training, compression, and inference optimization platform. \n",
|
||||
"\n",
|
||||
"Our inference server, [Titan Takeoff](https://docs.titanml.co/docs/titan-takeoff/getting-started) enables deployment of LLMs locally on your hardware in a single command. Most generative model architectures are supported, such as Falcon, Llama 2, GPT2, T5 and many more."
|
||||
]
|
||||
},
|
||||
{
|
||||
"attachments": {},
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Installation\n",
|
||||
"\n",
|
||||
"To get started with Iris Takeoff, all you need is to have docker and python installed on your local system. If you wish to use the server with gpu suport, then you will need to install docker with cuda support.\n",
|
||||
"\n",
|
||||
"For Mac and Windows users, make sure you have the docker daemon running! You can check this by running docker ps in your terminal. To start the daemon, open the docker desktop app.\n",
|
||||
"\n",
|
||||
"Run the following command to install the Iris CLI that will enable you to run the takeoff server:"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {
|
||||
"vscode": {
|
||||
"languageId": "shellscript"
|
||||
}
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"pip install titan-iris"
|
||||
]
|
||||
},
|
||||
{
|
||||
"attachments": {},
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Choose a Model\n",
|
||||
"Iris Takeoff supports many of the most powerful generative text models, such as Falcon, MPT, and Llama. See the [supported models](https://docs.titanml.co/docs/titan-takeoff/supported-models) for more information. For information about using your own models, see the [custom models](https://docs.titanml.co/docs/titan-takeoff/Advanced/custom-models).\n",
|
||||
"\n",
|
||||
"Going forward in this demo we will be using the falcon 7B instruct model. This is a good open source model that is trained to follow instructions, and is small enough to easily inference even on CPUs.\n",
|
||||
"\n",
|
||||
"## Taking off\n",
|
||||
"Models are referred to by their model id on HuggingFace. Takeoff uses port 8000 by default, but can be configured to use another port. There is also support to use a Nvidia GPU by specifing cuda for the device flag.\n",
|
||||
"\n",
|
||||
"To start the takeoff server, run:"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {
|
||||
"vscode": {
|
||||
"languageId": "shellscript"
|
||||
}
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"iris takeoff --model tiiuae/falcon-7b-instruct --device cpu\n",
|
||||
"iris takeoff --model tiiuae/falcon-7b-instruct --device cuda # Nvidia GPU required\n",
|
||||
"iris takeoff --model tiiuae/falcon-7b-instruct --device cpu --port 5000 # run on port 5000 (default: 8000)\n",
|
||||
"```"
|
||||
]
|
||||
},
|
||||
{
|
||||
"attachments": {},
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"You will then be directed to a login page, where you will need to create an account to proceed.\n",
|
||||
"After logging in, run the command onscreen to check whether the server is ready. When it is ready, you can start using the Takeoff integration\n",
|
||||
"\n",
|
||||
"## Inferencing your model\n",
|
||||
"To access your LLM, use the TitanTakeoff LLM wrapper:"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langchain.llms import TitanTakeoff\n",
|
||||
"\n",
|
||||
"llm = TitanTakeoff(\n",
|
||||
" port=8000,\n",
|
||||
" generate_max_length=128,\n",
|
||||
" temperature=1.0\n",
|
||||
")\n",
|
||||
"\n",
|
||||
"prompt = \"What is the largest planet in the solar system?\"\n",
|
||||
"\n",
|
||||
"llm(prompt)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"attachments": {},
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"No parameters are needed by default, but a port can be specified and [generation parameters](https://docs.titanml.co/docs/titan-takeoff/Advanced/generation-parameters) can be supplied.\n",
|
||||
"\n",
|
||||
"### Streaming\n",
|
||||
"Streaming is also supported via the streaming flag:"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langchain.callbacks.streaming_stdout import StreamingStdOutCallbackHandler\n",
|
||||
"from langchain.callbacks.manager import CallbackManager\n",
|
||||
"\n",
|
||||
"llm = TitanTakeoff(port=8000, callback_manager=CallbackManager([StreamingStdOutCallbackHandler()]), streaming=True)\n",
|
||||
"\n",
|
||||
"prompt = \"What is the capital of France?\"\n",
|
||||
"\n",
|
||||
"llm(prompt)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"attachments": {},
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Integration with LLMChain"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langchain import PromptTemplate, LLMChain\n",
|
||||
"\n",
|
||||
"llm = TitanTakeoff()\n",
|
||||
"\n",
|
||||
"template = \"What is the capital of {country}\"\n",
|
||||
"\n",
|
||||
"prompt = PromptTemplate(template=template, input_variables=[\"country\"])\n",
|
||||
"\n",
|
||||
"llm_chain = LLMChain(llm=llm, prompt=prompt)\n",
|
||||
"\n",
|
||||
"generated = llm_chain.run(country=\"Belgium\")\n",
|
||||
"print(generated)"
|
||||
]
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
"language_info": {
|
||||
"name": "python"
|
||||
},
|
||||
"orig_nbformat": 4
|
||||
},
|
||||
"nbformat": 4,
|
||||
"nbformat_minor": 2
|
||||
}
|
||||
241
docs/extras/integrations/llms/vllm.ipynb
Normal file
@@ -0,0 +1,241 @@
|
||||
{
|
||||
"cells": [
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "499c3142-2033-437d-a60a-731988ac6074",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# vLLM\n",
|
||||
"\n",
|
||||
"[vLLM](https://vllm.readthedocs.io/en/latest/index.html) is a fast and easy-to-use library for LLM inference and serving, offering:\n",
|
||||
"* State-of-the-art serving throughput \n",
|
||||
"* Efficient management of attention key and value memory with PagedAttention\n",
|
||||
"* Continuous batching of incoming requests\n",
|
||||
"* Optimized CUDA kernels\n",
|
||||
"\n",
|
||||
"This notebooks goes over how to use a LLM with langchain and vLLM.\n",
|
||||
"\n",
|
||||
"To use, you should have the `vllm` python package installed."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 1,
|
||||
"id": "8a3f2666-5c75-4797-967a-7915a247bf33",
|
||||
"metadata": {
|
||||
"tags": []
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"#!pip install vllm -q"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 1,
|
||||
"id": "84e350f7-21f6-455b-b1f0-8b0116a2fd49",
|
||||
"metadata": {
|
||||
"tags": []
|
||||
},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"INFO 08-06 11:37:33 llm_engine.py:70] Initializing an LLM engine with config: model='mosaicml/mpt-7b', tokenizer='mosaicml/mpt-7b', tokenizer_mode=auto, trust_remote_code=True, dtype=torch.bfloat16, use_dummy_weights=False, download_dir=None, use_np_weights=False, tensor_parallel_size=1, seed=0)\n",
|
||||
"INFO 08-06 11:37:41 llm_engine.py:196] # GPU blocks: 861, # CPU blocks: 512\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"name": "stderr",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"Processed prompts: 100%|██████████| 1/1 [00:00<00:00, 2.00it/s]"
|
||||
]
|
||||
},
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"\n",
|
||||
"What is the capital of France ? The capital of France is Paris.\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"name": "stderr",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"from langchain.llms import VLLM\n",
|
||||
"\n",
|
||||
"llm = VLLM(model=\"mosaicml/mpt-7b\",\n",
|
||||
" trust_remote_code=True, # mandatory for hf models\n",
|
||||
" max_new_tokens=128,\n",
|
||||
" top_k=10,\n",
|
||||
" top_p=0.95,\n",
|
||||
" temperature=0.8,\n",
|
||||
")\n",
|
||||
"\n",
|
||||
"print(llm(\"What is the capital of France ?\"))"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "94a3b41d-8329-4f8f-94f9-453d7f132214",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Integrate the model in an LLMChain"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 3,
|
||||
"id": "5605b7a1-fa63-49c1-934d-8b4ef8d71dd5",
|
||||
"metadata": {
|
||||
"tags": []
|
||||
},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stderr",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"Processed prompts: 100%|██████████| 1/1 [00:01<00:00, 1.34s/it]"
|
||||
]
|
||||
},
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"\n",
|
||||
"\n",
|
||||
"1. The first Pokemon game was released in 1996.\n",
|
||||
"2. The president was Bill Clinton.\n",
|
||||
"3. Clinton was president from 1993 to 2001.\n",
|
||||
"4. The answer is Clinton.\n",
|
||||
"\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"name": "stderr",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"from langchain import PromptTemplate, LLMChain\n",
|
||||
"\n",
|
||||
"template = \"\"\"Question: {question}\n",
|
||||
"\n",
|
||||
"Answer: Let's think step by step.\"\"\"\n",
|
||||
"prompt = PromptTemplate(template=template, input_variables=[\"question\"])\n",
|
||||
"\n",
|
||||
"llm_chain = LLMChain(prompt=prompt, llm=llm)\n",
|
||||
"\n",
|
||||
"question = \"Who was the US president in the year the first Pokemon game was released?\"\n",
|
||||
"\n",
|
||||
"print(llm_chain.run(question))"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "56826aba-d08b-4838-8bfa-ca96e463b25d",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Distributed Inference\n",
|
||||
"\n",
|
||||
"vLLM supports distributed tensor-parallel inference and serving. \n",
|
||||
"\n",
|
||||
"To run multi-GPU inference with the LLM class, set the `tensor_parallel_size` argument to the number of GPUs you want to use. For example, to run inference on 4 GPUs"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "f8c25c35-47b5-459d-9985-3cf546e9ac16",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langchain.llms import VLLM\n",
|
||||
"\n",
|
||||
"llm = VLLM(model=\"mosaicml/mpt-30b\",\n",
|
||||
" tensor_parallel_size=4,\n",
|
||||
" trust_remote_code=True, # mandatory for hf models\n",
|
||||
")\n",
|
||||
"\n",
|
||||
"llm(\"What is the future of AI?\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "64e89be0-6ad7-43a8-9dac-1324dcd4e851",
|
||||
"metadata": {
|
||||
"tags": []
|
||||
},
|
||||
"source": [
|
||||
"## OpenAI-Compatible Server\n",
|
||||
"\n",
|
||||
"vLLM can be deployed as a server that mimics the OpenAI API protocol. This allows vLLM to be used as a drop-in replacement for applications using OpenAI API.\n",
|
||||
"\n",
|
||||
"This server can be queried in the same format as OpenAI API.\n",
|
||||
"\n",
|
||||
"### OpenAI-Compatible Completion"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 3,
|
||||
"id": "c3cbc428-0bb8-422a-913e-1c6fef8b89d4",
|
||||
"metadata": {
|
||||
"tags": []
|
||||
},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
" a city that is filled with history, ancient buildings, and art around every corner\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"from langchain.llms import VLLMOpenAI\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"llm = VLLMOpenAI(\n",
|
||||
" openai_api_key=\"EMPTY\",\n",
|
||||
" openai_api_base=\"http://localhost:8000/v1\",\n",
|
||||
" model_name=\"tiiuae/falcon-7b\",\n",
|
||||
" model_kwargs={\"stop\": [\".\"]}\n",
|
||||
")\n",
|
||||
"print(llm(\"Rome is\"))"
|
||||
]
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
"kernelspec": {
|
||||
"display_name": "conda_pytorch_p310",
|
||||
"language": "python",
|
||||
"name": "conda_pytorch_p310"
|
||||
},
|
||||
"language_info": {
|
||||
"codemirror_mode": {
|
||||
"name": "ipython",
|
||||
"version": 3
|
||||
},
|
||||
"file_extension": ".py",
|
||||
"mimetype": "text/x-python",
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.10.10"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
"nbformat_minor": 5
|
||||
}
|
||||
@@ -0,0 +1,67 @@
|
||||
{
|
||||
"cells": [
|
||||
{
|
||||
"attachments": {},
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# Rockset Chat Message History\n",
|
||||
"\n",
|
||||
"This notebook goes over how to use [Rockset](https://rockset.com/docs) to store chat message history. \n",
|
||||
"\n",
|
||||
"To begin, with get your API key from the [Rockset console](https://console.rockset.com/apikeys). Find your API region for the Rockset [API reference](https://rockset.com/docs/rest-api#introduction)."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {
|
||||
"vscode": {
|
||||
"languageId": "plaintext"
|
||||
}
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langchain.memory.chat_message_histories import RocksetChatMessageHistory\n",
|
||||
"from rockset import RocksetClient, Regions\n",
|
||||
"\n",
|
||||
"history = RocksetChatMessageHistory(\n",
|
||||
" session_id=\"MySession\",\n",
|
||||
" client=RocksetClient(\n",
|
||||
" api_key=\"YOUR API KEY\", \n",
|
||||
" host=Regions.usw2a1 # us-west-2 Oregon\n",
|
||||
" ),\n",
|
||||
" collection=\"langchain_demo\",\n",
|
||||
" sync=True\n",
|
||||
")\n",
|
||||
"history.add_user_message(\"hi!\")\n",
|
||||
"history.add_ai_message(\"whats up?\")\n",
|
||||
"print(history.messages)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"attachments": {},
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"The output should be something like:\n",
|
||||
"\n",
|
||||
"```python\n",
|
||||
"[\n",
|
||||
" HumanMessage(content='hi!', additional_kwargs={'id': '2e62f1c2-e9f7-465e-b551-49bae07fe9f0'}, example=False), \n",
|
||||
" AIMessage(content='whats up?', additional_kwargs={'id': 'b9be8eda-4c18-4cf8-81c3-e91e876927d0'}, example=False)\n",
|
||||
"]\n",
|
||||
"\n",
|
||||
"```"
|
||||
]
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
"language_info": {
|
||||
"name": "python"
|
||||
},
|
||||
"orig_nbformat": 4
|
||||
},
|
||||
"nbformat": 4,
|
||||
"nbformat_minor": 2
|
||||
}
|
||||
21
docs/extras/integrations/providers/bageldb.mdx
Normal file
@@ -0,0 +1,21 @@
|
||||
# BagelDB
|
||||
|
||||
> [BagelDB](https://www.bageldb.ai/) (`Open Vector Database for AI`), is like GitHub for AI data.
|
||||
It is a collaborative platform where users can create,
|
||||
share, and manage vector datasets. It can support private projects for independent developers,
|
||||
internal collaborations for enterprises, and public contributions for data DAOs.
|
||||
|
||||
## Installation and Setup
|
||||
|
||||
```bash
|
||||
pip install betabageldb
|
||||
```
|
||||
|
||||
|
||||
## VectorStore
|
||||
|
||||
See a [usage example](/docs/integrations/vectorstores/bageldb).
|
||||
|
||||
```python
|
||||
from langchain.vectorstores import Bagel
|
||||
```
|
||||
@@ -13,7 +13,7 @@ pip install boto3
|
||||
See a [usage example](/docs/integrations/llms/bedrock).
|
||||
|
||||
```python
|
||||
from langchain import Bedrock
|
||||
from langchain.llms.bedrock import Bedrock
|
||||
```
|
||||
|
||||
## Text Embedding Models
|
||||
|
||||
35
docs/extras/integrations/providers/deepsparse.mdx
Normal file
@@ -0,0 +1,35 @@
|
||||
# DeepSparse
|
||||
|
||||
This page covers how to use the [DeepSparse](https://github.com/neuralmagic/deepsparse) inference runtime within LangChain.
|
||||
It is broken into two parts: installation and setup, and then examples of DeepSparse usage.
|
||||
|
||||
## Installation and Setup
|
||||
|
||||
- Install the Python package with `pip install deepsparse`
|
||||
- Choose a [SparseZoo model](https://sparsezoo.neuralmagic.com/?useCase=text_generation) or export a support model to ONNX [using Optimum](https://github.com/neuralmagic/notebooks/blob/main/notebooks/opt-text-generation-deepsparse-quickstart/OPT_Text_Generation_DeepSparse_Quickstart.ipynb)
|
||||
|
||||
## Wrappers
|
||||
|
||||
### LLM
|
||||
|
||||
There exists a DeepSparse LLM wrapper, which you can access with:
|
||||
|
||||
```python
|
||||
from langchain.llms import DeepSparse
|
||||
```
|
||||
|
||||
It provides a unified interface for all models:
|
||||
|
||||
```python
|
||||
llm = DeepSparse(model='zoo:nlg/text_generation/codegen_mono-350m/pytorch/huggingface/bigpython_bigquery_thepile/base-none')
|
||||
|
||||
print(llm('def fib():'))
|
||||
```
|
||||
|
||||
Additional parameters can be passed using the `config` parameter:
|
||||
|
||||
```python
|
||||
config = {'max_generated_tokens': 256}
|
||||
|
||||
llm = DeepSparse(model='zoo:nlg/text_generation/codegen_mono-350m/pytorch/huggingface/bigpython_bigquery_thepile/base-none', config=config)
|
||||
```
|
||||
19
docs/extras/integrations/providers/dingo.mdx
Normal file
@@ -0,0 +1,19 @@
|
||||
# Dingo
|
||||
|
||||
This page covers how to use the Dingo ecosystem within LangChain.
|
||||
It is broken into two parts: installation and setup, and then references to specific Dingo wrappers.
|
||||
|
||||
## Installation and Setup
|
||||
- Install the Python SDK with `pip install dingodb`
|
||||
|
||||
## VectorStore
|
||||
|
||||
There exists a wrapper around Dingo indexes, allowing you to use it as a vectorstore,
|
||||
whether for semantic search or example selection.
|
||||
|
||||
To import this vectorstore:
|
||||
```python
|
||||
from langchain.vectorstores import Dingo
|
||||
```
|
||||
|
||||
For a more detailed walkthrough of the Dingo wrapper, see [this notebook](/docs/integrations/vectorstores/dingo.html)
|
||||
@@ -4,14 +4,15 @@ This page covers how to use the Fireworks models within Langchain.
|
||||
|
||||
## Installation and Setup
|
||||
|
||||
- To use the Fireworks model, you need to have a Fireworks API key. To generate one, sign up at platform.fireworks.ai
|
||||
- To use the Fireworks model, you need to have a Fireworks API key. To generate one, sign up at [app.fireworks.ai](https://app.fireworks.ai).
|
||||
- Authenticate by setting the FIREWORKS_API_KEY environment variable.
|
||||
|
||||
## LLM
|
||||
|
||||
Fireworks integrates with Langchain through the LLM module, which allows for standardized usage of any models deployed on the Fireworks models.
|
||||
|
||||
In this example, we'll work the llama-v2-13b.
|
||||
In this example, we'll work the llama-v2-13b-chat model.
|
||||
|
||||
```python
|
||||
from langchain.llms.fireworks import Fireworks
|
||||
|
||||
|
||||
@@ -1,22 +1,23 @@
|
||||
# Grobid
|
||||
|
||||
GROBID is a machine learning library for extracting, parsing, and re-structuring raw documents.
|
||||
|
||||
It is designed and expected to be used to parse academic papers, where it works particularly well.
|
||||
|
||||
*Note*: if the articles supplied to Grobid are large documents (e.g. dissertations) exceeding a certain number
|
||||
of elements, they might not be processed.
|
||||
|
||||
This page covers how to use the Grobid to parse articles for LangChain.
|
||||
It is separated into two parts: installation and running the server
|
||||
|
||||
## Installation and Setup
|
||||
#Ensure You have Java installed
|
||||
!apt-get install -y openjdk-11-jdk -q
|
||||
!update-alternatives --set java /usr/lib/jvm/java-11-openjdk-amd64/bin/java
|
||||
## Installation
|
||||
The grobid installation is described in details in https://grobid.readthedocs.io/en/latest/Install-Grobid/.
|
||||
However, it is probably easier and less troublesome to run grobid through a docker container,
|
||||
as documented [here](https://grobid.readthedocs.io/en/latest/Grobid-docker/).
|
||||
|
||||
#Clone and install the Grobid Repo
|
||||
import os
|
||||
!git clone https://github.com/kermitt2/grobid.git
|
||||
os.environ["JAVA_HOME"] = "/usr/lib/jvm/java-11-openjdk-amd64"
|
||||
os.chdir('grobid')
|
||||
!./gradlew clean install
|
||||
## Use Grobid with LangChain
|
||||
|
||||
#Run the server,
|
||||
get_ipython().system_raw('nohup ./gradlew run > grobid.log 2>&1 &')
|
||||
Once grobid is installed and up and running (you can check by accessing it http://localhost:8070),
|
||||
you're ready to go.
|
||||
|
||||
You can now use the GrobidParser to produce documents
|
||||
```python
|
||||
@@ -41,4 +42,5 @@ loader = GenericLoader.from_filesystem(
|
||||
)
|
||||
docs = loader.load()
|
||||
```
|
||||
Chunk metadata will include bboxes although these are a bit funky to parse, see https://grobid.readthedocs.io/en/latest/Coordinates-in-PDF/
|
||||
Chunk metadata will include Bounding Boxes. Although these are a bit funky to parse,
|
||||
they are explained in https://grobid.readthedocs.io/en/latest/Coordinates-in-PDF/
|
||||
104
docs/extras/integrations/providers/log10.mdx
Normal file
@@ -0,0 +1,104 @@
|
||||
# Log10
|
||||
|
||||
This page covers how to use the [Log10](https://log10.io) within LangChain.
|
||||
|
||||
## What is Log10?
|
||||
|
||||
Log10 is an [open source](https://github.com/log10-io/log10) proxiless LLM data management and application development platform that lets you log, debug and tag your Langchain calls.
|
||||
|
||||
## Quick start
|
||||
|
||||
1. Create your free account at [log10.io](https://log10.io)
|
||||
2. Add your `LOG10_TOKEN` and `LOG10_ORG_ID` from the Settings and Organization tabs respectively as environment variables.
|
||||
3. Also add `LOG10_URL=https://log10.io` and your usual LLM API key: for e.g. `OPENAI_API_KEY` or `ANTHROPIC_API_KEY` to your environment
|
||||
|
||||
## How to enable Log10 data management for Langchain
|
||||
|
||||
Integration with log10 is a simple one-line `log10_callback` integration as shown below:
|
||||
|
||||
```python
|
||||
from langchain.chat_models import ChatOpenAI
|
||||
from langchain.schema import HumanMessage
|
||||
|
||||
from log10.langchain import Log10Callback
|
||||
from log10.llm import Log10Config
|
||||
|
||||
log10_callback = Log10Callback(log10_config=Log10Config())
|
||||
|
||||
messages = [
|
||||
HumanMessage(content="You are a ping pong machine"),
|
||||
HumanMessage(content="Ping?"),
|
||||
]
|
||||
|
||||
llm = ChatOpenAI(model_name="gpt-3.5-turbo", callbacks=[log10_callback])
|
||||
```
|
||||
|
||||
[Log10 + Langchain + Logs docs](https://github.com/log10-io/log10/blob/main/logging.md#langchain-logger)
|
||||
|
||||
[More details + screenshots](https://log10.io/docs/logs) including instructions for self-hosting logs
|
||||
|
||||
## How to use tags with Log10
|
||||
|
||||
```python
|
||||
from langchain import OpenAI
|
||||
from langchain.chat_models import ChatAnthropic
|
||||
from langchain.chat_models import ChatOpenAI
|
||||
from langchain.schema import HumanMessage
|
||||
|
||||
from log10.langchain import Log10Callback
|
||||
from log10.llm import Log10Config
|
||||
|
||||
log10_callback = Log10Callback(log10_config=Log10Config())
|
||||
|
||||
messages = [
|
||||
HumanMessage(content="You are a ping pong machine"),
|
||||
HumanMessage(content="Ping?"),
|
||||
]
|
||||
|
||||
llm = ChatOpenAI(model_name="gpt-3.5-turbo", callbacks=[log10_callback], temperature=0.5, tags=["test"])
|
||||
completion = llm.predict_messages(messages, tags=["foobar"])
|
||||
print(completion)
|
||||
|
||||
llm = ChatAnthropic(model="claude-2", callbacks=[log10_callback], temperature=0.7, tags=["baz"])
|
||||
llm.predict_messages(messages)
|
||||
print(completion)
|
||||
|
||||
llm = OpenAI(model_name="text-davinci-003", callbacks=[log10_callback], temperature=0.5)
|
||||
completion = llm.predict("You are a ping pong machine.\nPing?\n")
|
||||
print(completion)
|
||||
```
|
||||
|
||||
You can also intermix direct OpenAI calls and Langchain LLM calls:
|
||||
|
||||
```python
|
||||
import os
|
||||
from log10.load import log10, log10_session
|
||||
import openai
|
||||
from langchain import OpenAI
|
||||
|
||||
log10(openai)
|
||||
|
||||
with log10_session(tags=["foo", "bar"]):
|
||||
# Log a direct OpenAI call
|
||||
response = openai.Completion.create(
|
||||
model="text-ada-001",
|
||||
prompt="Where is the Eiffel Tower?",
|
||||
temperature=0,
|
||||
max_tokens=1024,
|
||||
top_p=1,
|
||||
frequency_penalty=0,
|
||||
presence_penalty=0,
|
||||
)
|
||||
print(response)
|
||||
|
||||
# Log a call via Langchain
|
||||
llm = OpenAI(model_name="text-ada-001", temperature=0.5)
|
||||
response = llm.predict("You are a ping pong machine.\nPing?\n")
|
||||
print(response)
|
||||
```
|
||||
|
||||
## How to debug Langchain calls
|
||||
|
||||
[Example of debugging](https://log10.io/docs/prompt_chain_debugging)
|
||||
|
||||
[More Langchain examples](https://github.com/log10-io/log10/tree/main/examples#langchain)
|
||||
30
docs/extras/integrations/providers/pubmed.md
Normal file
@@ -0,0 +1,30 @@
|
||||
# PubMed
|
||||
|
||||
# PubMed
|
||||
|
||||
>[PubMed®](https://pubmed.ncbi.nlm.nih.gov/) by `The National Center for Biotechnology Information, National Library of Medicine`
|
||||
> comprises more than 35 million citations for biomedical literature from `MEDLINE`, life science journals, and online books.
|
||||
> Citations may include links to full text content from `PubMed Central` and publisher web sites.
|
||||
|
||||
## Setup
|
||||
You need to install a python package.
|
||||
|
||||
```bash
|
||||
pip install xmltodict
|
||||
```
|
||||
|
||||
### Retriever
|
||||
|
||||
See a [usage example](/docs/integrations/retrievers/pubmed).
|
||||
|
||||
```python
|
||||
from langchain.retrievers import PubMedRetriever
|
||||
```
|
||||
|
||||
### Document Loader
|
||||
|
||||
See a [usage example](/docs/integrations/document_loaders/pubmed).
|
||||
|
||||
```python
|
||||
from langchain.document_loaders import PubMedLoader
|
||||
```
|
||||
@@ -23,4 +23,11 @@ from langchain.vectorstores import Rockset
|
||||
See a [usage example](/docs/integrations/document_loaders/rockset).
|
||||
```python
|
||||
from langchain.document_loaders import RocksetLoader
|
||||
```
|
||||
|
||||
## Chat Message History
|
||||
|
||||
See a [usage example](/docs/integrations/memory/rockset_chat_message_history).
|
||||
```python
|
||||
from langchain.memory.chat_message_histories import RocksetChatMessageHistory
|
||||
```
|
||||
20
docs/extras/integrations/providers/symblai_nebula.mdx
Normal file
@@ -0,0 +1,20 @@
|
||||
# Nebula
|
||||
|
||||
This page covers how to use [Nebula](https://symbl.ai/nebula), [Symbl.ai](https://symbl.ai/)'s LLM, ecosystem within LangChain.
|
||||
It is broken into two parts: installation and setup, and then references to specific Nebula wrappers.
|
||||
|
||||
## Installation and Setup
|
||||
|
||||
- Get an Nebula API Key and set as environment variables (`SYMBLAI_NEBULA_SERVICE_URL`, `SYMBLAI_NEBULA_SERVICE_PATH`, `SYMBLAI_NEBULA_SERVICE_TOKEN`)
|
||||
- Sign up for a FREE Symbl.ai/Nebula Account: [https://nebula.symbl.ai/playground/](https://nebula.symbl.ai/playground/)
|
||||
- Please see the [Nebula documentation](https://docs.symbl.ai/docs/nebula-llm-overview) for more details.
|
||||
- No time? Visit the [Nebula Quickstart Guide](https://docs.symbl.ai/docs/nebula-quickstart).
|
||||
|
||||
## Wrappers
|
||||
|
||||
### LLM
|
||||
|
||||
There exists an Nebula LLM wrapper, which you can access with
|
||||
```python
|
||||
from langchain.llms import Nebula
|
||||
```
|
||||
31
docs/extras/integrations/providers/tensorflow_datasets.mdx
Normal file
@@ -0,0 +1,31 @@
|
||||
# TensorFlow Datasets
|
||||
|
||||
>[TensorFlow Datasets](https://www.tensorflow.org/datasets) is a collection of datasets ready to use,
|
||||
> with TensorFlow or other Python ML frameworks, such as Jax. All datasets are exposed
|
||||
> as [tf.data.Datasets](https://www.tensorflow.org/api_docs/python/tf/data/Dataset),
|
||||
> enabling easy-to-use and high-performance input pipelines. To get started see
|
||||
> the [guide](https://www.tensorflow.org/datasets/overview) and
|
||||
> the [list of datasets](https://www.tensorflow.org/datasets/catalog/overview#all_datasets).
|
||||
|
||||
|
||||
|
||||
## Installation and Setup
|
||||
|
||||
You need to install `tensorflow` and `tensorflow-datasets` python packages.
|
||||
|
||||
```bash
|
||||
pip install tensorflow
|
||||
```
|
||||
|
||||
```bash
|
||||
pip install tensorflow-dataset
|
||||
```
|
||||
|
||||
|
||||
## Document Loader
|
||||
|
||||
See a [usage example](/docs/integrations/document_loaders/tensorflow_datasets).
|
||||
|
||||
```python
|
||||
from langchain.document_loaders import TensorflowDatasetLoader
|
||||
```
|
||||
@@ -63,7 +63,7 @@ results = vectara.similarity_score("what is LangChain?")
|
||||
- `k`: number of results to return (defaults to 5)
|
||||
- `lambda_val`: the [lexical matching](https://docs.vectara.com/docs/api-reference/search-apis/lexical-matching) factor for hybrid search (defaults to 0.025)
|
||||
- `filter`: a [filter](https://docs.vectara.com/docs/common-use-cases/filtering-by-metadata/filter-overview) to apply to the results (default None)
|
||||
- `n_sentence_context`: number of sentences to include before/after the actual matching segment when returning results. This defaults to 0 so as to return the exact text segment that matches, but can be used with other values e.g. 2 or 3 to return adjacent text segments.
|
||||
- `n_sentence_context`: number of sentences to include before/after the actual matching segment when returning results. This defaults to 2.
|
||||
|
||||
The results are returned as a list of relevant documents, and a relevance score of each document.
|
||||
|
||||
|
||||
@@ -7,14 +7,15 @@
|
||||
"source": [
|
||||
"# PubMed\n",
|
||||
"\n",
|
||||
"This notebook goes over how to use `PubMed` as a retriever\n",
|
||||
"\n",
|
||||
"`PubMed®` comprises more than 35 million citations for biomedical literature from `MEDLINE`, life science journals, and online books. Citations may include links to full text content from `PubMed Central` and publisher web sites."
|
||||
">[PubMed®](https://pubmed.ncbi.nlm.nih.gov/) by `The National Center for Biotechnology Information, National Library of Medicine` comprises more than 35 million citations for biomedical literature from `MEDLINE`, life science journals, and online books. Citations may include links to full text content from `PubMed Central` and publisher web sites.\n",
|
||||
"\n",
|
||||
"This notebook goes over how to use `PubMed` as a retriever"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 1,
|
||||
"execution_count": 12,
|
||||
"id": "aecaff63",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
@@ -24,7 +25,7 @@
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 2,
|
||||
"execution_count": 34,
|
||||
"id": "f2f7e8d3",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
@@ -34,19 +35,19 @@
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 9,
|
||||
"execution_count": 35,
|
||||
"id": "ed115aa1",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"[Document(page_content='', metadata={'uid': '37268021', 'title': 'Dermatology in the wake of an AI revolution: who gets a say?', 'pub_date': '<Year>2023</Year><Month>May</Month><Day>31</Day>'}),\n",
|
||||
" Document(page_content='', metadata={'uid': '37267643', 'title': 'What is ChatGPT and what do we do with it? Implications of the age of AI for nursing and midwifery practice and education: An editorial.', 'pub_date': '<Year>2023</Year><Month>May</Month><Day>30</Day>'}),\n",
|
||||
" Document(page_content='The nursing field has undergone notable changes over time and is projected to undergo further modifications in the future, owing to the advent of sophisticated technologies and growing healthcare needs. The advent of ChatGPT, an AI-powered language model, is expected to exert a significant influence on the nursing profession, specifically in the domains of patient care and instruction. The present article delves into the ramifications of ChatGPT within the nursing domain and accentuates its capacity and constraints to transform the discipline.', metadata={'uid': '37266721', 'title': 'The Impact of ChatGPT on the Nursing Profession: Revolutionizing Patient Care and Education.', 'pub_date': '<Year>2023</Year><Month>Jun</Month><Day>02</Day>'})]"
|
||||
"[Document(page_content='', metadata={'uid': '37549050', 'Title': 'ChatGPT: \"To Be or Not to Be\" in Bikini Bottom.', 'Published': '--', 'Copyright Information': ''}),\n",
|
||||
" Document(page_content=\"BACKGROUND: ChatGPT is a large language model that has performed well on professional examinations in the fields of medicine, law, and business. However, it is unclear how ChatGPT would perform on an examination assessing professionalism and situational judgement for doctors.\\nOBJECTIVE: We evaluated the performance of ChatGPT on the Situational Judgement Test (SJT): a national examination taken by all final-year medical students in the United Kingdom. This examination is designed to assess attributes such as communication, teamwork, patient safety, prioritization skills, professionalism, and ethics.\\nMETHODS: All questions from the UK Foundation Programme Office's (UKFPO's) 2023 SJT practice examination were inputted into ChatGPT. For each question, ChatGPT's answers and rationales were recorded and assessed on the basis of the official UK Foundation Programme Office scoring template. Questions were categorized into domains of Good Medical Practice on the basis of the domains referenced in the rationales provided in the scoring sheet. Questions without clear domain links were screened by reviewers and assigned one or multiple domains. ChatGPT's overall performance, as well as its performance across the domains of Good Medical Practice, was evaluated.\\nRESULTS: Overall, ChatGPT performed well, scoring 76% on the SJT but scoring full marks on only a few questions (9%), which may reflect possible flaws in ChatGPT's situational judgement or inconsistencies in the reasoning across questions (or both) in the examination itself. ChatGPT demonstrated consistent performance across the 4 outlined domains in Good Medical Practice for doctors.\\nCONCLUSIONS: Further research is needed to understand the potential applications of large language models, such as ChatGPT, in medical education for standardizing questions and providing consistent rationales for examinations assessing professionalism and ethics.\", metadata={'uid': '37548997', 'Title': 'Performance of ChatGPT on the Situational Judgement Test-A Professional Dilemmas-Based Examination for Doctors in the United Kingdom.', 'Published': '2023-08-07', 'Copyright Information': '©Robin J Borchert, Charlotte R Hickman, Jack Pepys, Timothy J Sadler. Originally published in JMIR Medical Education (https://mededu.jmir.org), 07.08.2023.'}),\n",
|
||||
" Document(page_content='', metadata={'uid': '37548971', 'Title': \"Large Language Models Answer Medical Questions Accurately, but Can't Match Clinicians' Knowledge.\", 'Published': '2023-08-07', 'Copyright Information': ''})]"
|
||||
]
|
||||
},
|
||||
"execution_count": 9,
|
||||
"execution_count": 35,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
@@ -54,6 +55,14 @@
|
||||
"source": [
|
||||
"retriever.get_relevant_documents(\"chatgpt\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "a9ff7a25-bb4b-4cd5-896d-72f70f4af49b",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": []
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
@@ -72,7 +81,7 @@
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.10.6"
|
||||
"version": "3.10.12"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
|
||||
@@ -44,21 +44,20 @@
|
||||
"cell_type": "code",
|
||||
"execution_count": 1,
|
||||
"metadata": {
|
||||
"ExecuteTime": {
|
||||
"end_time": "2023-05-25T15:03:27.863217Z",
|
||||
"start_time": "2023-05-25T15:03:25.690273Z"
|
||||
},
|
||||
"collapsed": false,
|
||||
"jupyter": {
|
||||
"outputs_hidden": false
|
||||
"ExecuteTime": {
|
||||
"end_time": "2023-08-11T20:31:12.231459Z",
|
||||
"start_time": "2023-08-11T20:31:11.211176Z"
|
||||
}
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langchain.memory.chat_message_histories import ZepChatMessageHistory\n",
|
||||
"from langchain.schema import HumanMessage, AIMessage\n",
|
||||
"from uuid import uuid4\n",
|
||||
"import getpass\n",
|
||||
"import time\n",
|
||||
"from uuid import uuid4\n",
|
||||
"\n",
|
||||
"from langchain.memory import ZepMemory, CombinedMemory, VectorStoreRetrieverMemory\n",
|
||||
"from langchain.schema import HumanMessage, AIMessage\n",
|
||||
"\n",
|
||||
"# Set this to your Zep server URL\n",
|
||||
"ZEP_API_URL = \"http://localhost:8000\""
|
||||
@@ -76,56 +75,50 @@
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 2,
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdin",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
" ········\n"
|
||||
]
|
||||
"metadata": {
|
||||
"ExecuteTime": {
|
||||
"end_time": "2023-08-11T20:31:12.237545Z",
|
||||
"start_time": "2023-08-11T20:31:12.232416Z"
|
||||
}
|
||||
],
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# Provide your Zep API key. Note that this is optional. See https://docs.getzep.com/deployment/auth\n",
|
||||
"AUTHENTICATE = False\n",
|
||||
"\n",
|
||||
"zep_api_key = getpass.getpass()"
|
||||
"zep_api_key = None\n",
|
||||
"if AUTHENTICATE:\n",
|
||||
" zep_api_key = getpass.getpass()"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 3,
|
||||
"metadata": {
|
||||
"ExecuteTime": {
|
||||
"end_time": "2023-05-25T15:03:29.118416Z",
|
||||
"start_time": "2023-05-25T15:03:29.022464Z"
|
||||
},
|
||||
"collapsed": false,
|
||||
"jupyter": {
|
||||
"outputs_hidden": false
|
||||
"ExecuteTime": {
|
||||
"end_time": "2023-08-11T20:31:12.342790Z",
|
||||
"start_time": "2023-08-11T20:31:12.235291Z"
|
||||
}
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"session_id = str(uuid4()) # This is a unique identifier for the user/session\n",
|
||||
"\n",
|
||||
"# Set up Zep Chat History. We'll use this to add chat histories to the memory store\n",
|
||||
"zep_chat_history = ZepChatMessageHistory(\n",
|
||||
"# Initialize the Zep Memory Class\n",
|
||||
"zep_memory = ZepMemory(\n",
|
||||
" session_id=session_id, url=ZEP_API_URL, api_key=zep_api_key\n",
|
||||
")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 5,
|
||||
"execution_count": 4,
|
||||
"metadata": {
|
||||
"ExecuteTime": {
|
||||
"end_time": "2023-05-25T15:03:30.271181Z",
|
||||
"start_time": "2023-05-25T15:03:30.180442Z"
|
||||
},
|
||||
"collapsed": false,
|
||||
"jupyter": {
|
||||
"outputs_hidden": false
|
||||
"ExecuteTime": {
|
||||
"end_time": "2023-08-11T20:31:14.455269Z",
|
||||
"start_time": "2023-08-11T20:31:12.345635Z"
|
||||
}
|
||||
},
|
||||
"outputs": [],
|
||||
@@ -191,11 +184,13 @@
|
||||
"]\n",
|
||||
"\n",
|
||||
"for msg in test_history:\n",
|
||||
" zep_chat_history.add_message(\n",
|
||||
" zep_memory.chat_memory.add_message(\n",
|
||||
" HumanMessage(content=msg[\"content\"])\n",
|
||||
" if msg[\"role\"] == \"human\"\n",
|
||||
" else AIMessage(content=msg[\"content\"])\n",
|
||||
" )"
|
||||
" )\n",
|
||||
" \n",
|
||||
"time.sleep(2) # Wait for the messages to be embedded"
|
||||
]
|
||||
},
|
||||
{
|
||||
@@ -211,29 +206,20 @@
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 6,
|
||||
"execution_count": 5,
|
||||
"metadata": {
|
||||
"ExecuteTime": {
|
||||
"end_time": "2023-05-25T15:03:32.979155Z",
|
||||
"start_time": "2023-05-25T15:03:32.590310Z"
|
||||
},
|
||||
"collapsed": false,
|
||||
"jupyter": {
|
||||
"outputs_hidden": false
|
||||
"ExecuteTime": {
|
||||
"end_time": "2023-08-11T20:31:14.758738Z",
|
||||
"start_time": "2023-08-11T20:31:14.458850Z"
|
||||
}
|
||||
},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"[Document(page_content='Parable of the Sower is a science fiction novel by Octavia Butler, published in 1993. It follows the story of Lauren Olamina, a young woman living in a dystopian future where society has collapsed due to environmental disasters, poverty, and violence.', metadata={'score': 0.8897116216176073, 'uuid': 'db60ff57-f259-4ec4-8a81-178ed4c6e54f', 'created_at': '2023-06-26T23:40:22.816214Z', 'role': 'ai', 'metadata': {'system': {'entities': [{'Label': 'GPE', 'Matches': [{'End': 20, 'Start': 15, 'Text': 'Sower'}], 'Name': 'Sower'}, {'Label': 'PERSON', 'Matches': [{'End': 65, 'Start': 51, 'Text': 'Octavia Butler'}], 'Name': 'Octavia Butler'}, {'Label': 'DATE', 'Matches': [{'End': 84, 'Start': 80, 'Text': '1993'}], 'Name': '1993'}, {'Label': 'PERSON', 'Matches': [{'End': 124, 'Start': 110, 'Text': 'Lauren Olamina'}], 'Name': 'Lauren Olamina'}]}}, 'token_count': 56}),\n",
|
||||
" Document(page_content=\"Write a short synopsis of Butler's book, Parable of the Sower. What is it about?\", metadata={'score': 0.8856661080361157, 'uuid': 'f1a5981a-8f6d-4168-a548-6e9c32f35fa1', 'created_at': '2023-06-26T23:40:22.809621Z', 'role': 'human', 'metadata': {'system': {'entities': [{'Label': 'ORG', 'Matches': [{'End': 32, 'Start': 26, 'Text': 'Butler'}], 'Name': 'Butler'}, {'Label': 'WORK_OF_ART', 'Matches': [{'End': 61, 'Start': 41, 'Text': 'Parable of the Sower'}], 'Name': 'Parable of the Sower'}]}}, 'token_count': 23}),\n",
|
||||
" Document(page_content='Who was Octavia Butler?', metadata={'score': 0.7757595298492976, 'uuid': '361d0043-1009-4e13-a7f0-8aea8b1ee869', 'created_at': '2023-06-26T23:40:22.709886Z', 'role': 'human', 'metadata': {'system': {'entities': [{'Label': 'PERSON', 'Matches': [{'End': 22, 'Start': 8, 'Text': 'Octavia Butler'}], 'Name': 'Octavia Butler'}], 'intent': 'The subject wants to know about the identity or background of an individual named Octavia Butler.'}}, 'token_count': 8}),\n",
|
||||
" Document(page_content=\"Octavia Butler's contemporaries included Ursula K. Le Guin, Samuel R. Delany, and Joanna Russ.\", metadata={'score': 0.7601242516059306, 'uuid': '56c45e8a-0f65-45f0-bc46-d9e65164b563', 'created_at': '2023-06-26T23:40:22.778836Z', 'role': 'ai', 'metadata': {'system': {'entities': [{'Label': 'PERSON', 'Matches': [{'End': 16, 'Start': 0, 'Text': \"Octavia Butler's\"}], 'Name': \"Octavia Butler's\"}, {'Label': 'ORG', 'Matches': [{'End': 58, 'Start': 41, 'Text': 'Ursula K. Le Guin'}], 'Name': 'Ursula K. Le Guin'}, {'Label': 'PERSON', 'Matches': [{'End': 76, 'Start': 60, 'Text': 'Samuel R. Delany'}], 'Name': 'Samuel R. Delany'}, {'Label': 'PERSON', 'Matches': [{'End': 93, 'Start': 82, 'Text': 'Joanna Russ'}], 'Name': 'Joanna Russ'}], 'intent': \"The subject is providing information about Octavia Butler's contemporaries.\"}}, 'token_count': 27}),\n",
|
||||
" Document(page_content='You might want to read Ursula K. Le Guin or Joanna Russ.', metadata={'score': 0.7594731095320668, 'uuid': '6951f2fd-dfa4-4e05-9380-f322ef8f72f8', 'created_at': '2023-06-26T23:40:22.80464Z', 'role': 'ai', 'metadata': {'system': {'entities': [{'Label': 'ORG', 'Matches': [{'End': 40, 'Start': 23, 'Text': 'Ursula K. Le Guin'}], 'Name': 'Ursula K. Le Guin'}, {'Label': 'PERSON', 'Matches': [{'End': 55, 'Start': 44, 'Text': 'Joanna Russ'}], 'Name': 'Joanna Russ'}]}}, 'token_count': 18})]"
|
||||
]
|
||||
"text/plain": "[Document(page_content='Who was Octavia Butler?', metadata={'score': 0.7758688965570713, 'uuid': 'b3322d28-f589-48c7-9daf-5eb092d65976', 'created_at': '2023-08-11T20:31:12.3856Z', 'role': 'human', 'metadata': {'system': {'entities': [{'Label': 'PERSON', 'Matches': [{'End': 22, 'Start': 8, 'Text': 'Octavia Butler'}], 'Name': 'Octavia Butler'}]}}, 'token_count': 8}),\n Document(page_content=\"Octavia Butler's contemporaries included Ursula K. Le Guin, Samuel R. Delany, and Joanna Russ.\", metadata={'score': 0.7602672137411663, 'uuid': '756b7136-0b4c-4664-ad33-c4431670356c', 'created_at': '2023-08-11T20:31:12.420717Z', 'role': 'ai', 'metadata': {'system': {'entities': [{'Label': 'PERSON', 'Matches': [{'End': 16, 'Start': 0, 'Text': \"Octavia Butler's\"}], 'Name': \"Octavia Butler's\"}, {'Label': 'ORG', 'Matches': [{'End': 58, 'Start': 41, 'Text': 'Ursula K. Le Guin'}], 'Name': 'Ursula K. Le Guin'}, {'Label': 'PERSON', 'Matches': [{'End': 76, 'Start': 60, 'Text': 'Samuel R. Delany'}], 'Name': 'Samuel R. Delany'}, {'Label': 'PERSON', 'Matches': [{'End': 93, 'Start': 82, 'Text': 'Joanna Russ'}], 'Name': 'Joanna Russ'}]}}, 'token_count': 27}),\n Document(page_content='You might want to read Ursula K. Le Guin or Joanna Russ.', metadata={'score': 0.7596040989115522, 'uuid': '166d9556-2d48-4237-8a84-5d8a1024d5f4', 'created_at': '2023-08-11T20:31:12.434522Z', 'role': 'ai', 'metadata': {'system': {'entities': [{'Label': 'ORG', 'Matches': [{'End': 40, 'Start': 23, 'Text': 'Ursula K. Le Guin'}], 'Name': 'Ursula K. Le Guin'}, {'Label': 'PERSON', 'Matches': [{'End': 55, 'Start': 44, 'Text': 'Joanna Russ'}], 'Name': 'Joanna Russ'}]}}, 'token_count': 18}),\n Document(page_content='Who were her contemporaries?', metadata={'score': 0.7575531381951208, 'uuid': 'c6a16691-4012-439f-b223-84fd4e79c4cf', 'created_at': '2023-08-11T20:31:12.410336Z', 'role': 'human', 'token_count': 8}),\n Document(page_content='Octavia Estelle Butler (June 22, 1947 – February 24, 2006) was an American science fiction author.', metadata={'score': 0.7546476914454683, 'uuid': '7c093a2a-0099-415a-95c5-615a8026a894', 'created_at': '2023-08-11T20:31:12.399979Z', 'role': 'ai', 'metadata': {'system': {'entities': [{'Label': 'PERSON', 'Matches': [{'End': 22, 'Start': 0, 'Text': 'Octavia Estelle Butler'}], 'Name': 'Octavia Estelle Butler'}, {'Label': 'DATE', 'Matches': [{'End': 37, 'Start': 24, 'Text': 'June 22, 1947'}], 'Name': 'June 22, 1947'}, {'Label': 'DATE', 'Matches': [{'End': 57, 'Start': 40, 'Text': 'February 24, 2006'}], 'Name': 'February 24, 2006'}, {'Label': 'NORP', 'Matches': [{'End': 74, 'Start': 66, 'Text': 'American'}], 'Name': 'American'}]}}, 'token_count': 31})]"
|
||||
},
|
||||
"execution_count": 6,
|
||||
"execution_count": 5,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
@@ -260,29 +246,20 @@
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 7,
|
||||
"execution_count": 6,
|
||||
"metadata": {
|
||||
"ExecuteTime": {
|
||||
"end_time": "2023-05-25T15:03:34.713354Z",
|
||||
"start_time": "2023-05-25T15:03:34.577974Z"
|
||||
},
|
||||
"collapsed": false,
|
||||
"jupyter": {
|
||||
"outputs_hidden": false
|
||||
"ExecuteTime": {
|
||||
"end_time": "2023-08-11T20:31:14.922838Z",
|
||||
"start_time": "2023-08-11T20:31:14.751737Z"
|
||||
}
|
||||
},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"[Document(page_content='Parable of the Sower is a science fiction novel by Octavia Butler, published in 1993. It follows the story of Lauren Olamina, a young woman living in a dystopian future where society has collapsed due to environmental disasters, poverty, and violence.', metadata={'score': 0.889661105796371, 'uuid': 'db60ff57-f259-4ec4-8a81-178ed4c6e54f', 'created_at': '2023-06-26T23:40:22.816214Z', 'role': 'ai', 'metadata': {'system': {'entities': [{'Label': 'GPE', 'Matches': [{'End': 20, 'Start': 15, 'Text': 'Sower'}], 'Name': 'Sower'}, {'Label': 'PERSON', 'Matches': [{'End': 65, 'Start': 51, 'Text': 'Octavia Butler'}], 'Name': 'Octavia Butler'}, {'Label': 'DATE', 'Matches': [{'End': 84, 'Start': 80, 'Text': '1993'}], 'Name': '1993'}, {'Label': 'PERSON', 'Matches': [{'End': 124, 'Start': 110, 'Text': 'Lauren Olamina'}], 'Name': 'Lauren Olamina'}]}}, 'token_count': 56}),\n",
|
||||
" Document(page_content=\"Write a short synopsis of Butler's book, Parable of the Sower. What is it about?\", metadata={'score': 0.885754241595424, 'uuid': 'f1a5981a-8f6d-4168-a548-6e9c32f35fa1', 'created_at': '2023-06-26T23:40:22.809621Z', 'role': 'human', 'metadata': {'system': {'entities': [{'Label': 'ORG', 'Matches': [{'End': 32, 'Start': 26, 'Text': 'Butler'}], 'Name': 'Butler'}, {'Label': 'WORK_OF_ART', 'Matches': [{'End': 61, 'Start': 41, 'Text': 'Parable of the Sower'}], 'Name': 'Parable of the Sower'}]}}, 'token_count': 23}),\n",
|
||||
" Document(page_content='Who was Octavia Butler?', metadata={'score': 0.7758688965570713, 'uuid': '361d0043-1009-4e13-a7f0-8aea8b1ee869', 'created_at': '2023-06-26T23:40:22.709886Z', 'role': 'human', 'metadata': {'system': {'entities': [{'Label': 'PERSON', 'Matches': [{'End': 22, 'Start': 8, 'Text': 'Octavia Butler'}], 'Name': 'Octavia Butler'}], 'intent': 'The subject wants to know about the identity or background of an individual named Octavia Butler.'}}, 'token_count': 8}),\n",
|
||||
" Document(page_content=\"Octavia Butler's contemporaries included Ursula K. Le Guin, Samuel R. Delany, and Joanna Russ.\", metadata={'score': 0.7602672137411663, 'uuid': '56c45e8a-0f65-45f0-bc46-d9e65164b563', 'created_at': '2023-06-26T23:40:22.778836Z', 'role': 'ai', 'metadata': {'system': {'entities': [{'Label': 'PERSON', 'Matches': [{'End': 16, 'Start': 0, 'Text': \"Octavia Butler's\"}], 'Name': \"Octavia Butler's\"}, {'Label': 'ORG', 'Matches': [{'End': 58, 'Start': 41, 'Text': 'Ursula K. Le Guin'}], 'Name': 'Ursula K. Le Guin'}, {'Label': 'PERSON', 'Matches': [{'End': 76, 'Start': 60, 'Text': 'Samuel R. Delany'}], 'Name': 'Samuel R. Delany'}, {'Label': 'PERSON', 'Matches': [{'End': 93, 'Start': 82, 'Text': 'Joanna Russ'}], 'Name': 'Joanna Russ'}], 'intent': \"The subject is providing information about Octavia Butler's contemporaries.\"}}, 'token_count': 27}),\n",
|
||||
" Document(page_content='You might want to read Ursula K. Le Guin or Joanna Russ.', metadata={'score': 0.7596040989115522, 'uuid': '6951f2fd-dfa4-4e05-9380-f322ef8f72f8', 'created_at': '2023-06-26T23:40:22.80464Z', 'role': 'ai', 'metadata': {'system': {'entities': [{'Label': 'ORG', 'Matches': [{'End': 40, 'Start': 23, 'Text': 'Ursula K. Le Guin'}], 'Name': 'Ursula K. Le Guin'}, {'Label': 'PERSON', 'Matches': [{'End': 55, 'Start': 44, 'Text': 'Joanna Russ'}], 'Name': 'Joanna Russ'}]}}, 'token_count': 18})]"
|
||||
]
|
||||
"text/plain": "[Document(page_content=\"Write a short synopsis of Butler's book, Parable of the Sower. What is it about?\", metadata={'score': 0.8857504413268114, 'uuid': '82f07ab5-9d4b-4db6-aaae-6028e6fd836b', 'created_at': '2023-08-11T20:31:12.437365Z', 'role': 'human', 'metadata': {'system': {'entities': [{'Label': 'ORG', 'Matches': [{'End': 32, 'Start': 26, 'Text': 'Butler'}], 'Name': 'Butler'}, {'Label': 'WORK_OF_ART', 'Matches': [{'End': 61, 'Start': 41, 'Text': 'Parable of the Sower'}], 'Name': 'Parable of the Sower'}]}}, 'token_count': 23}),\n Document(page_content='Who was Octavia Butler?', metadata={'score': 0.7758688965570713, 'uuid': 'b3322d28-f589-48c7-9daf-5eb092d65976', 'created_at': '2023-08-11T20:31:12.3856Z', 'role': 'human', 'metadata': {'system': {'entities': [{'Label': 'PERSON', 'Matches': [{'End': 22, 'Start': 8, 'Text': 'Octavia Butler'}], 'Name': 'Octavia Butler'}]}}, 'token_count': 8}),\n Document(page_content=\"Octavia Butler's contemporaries included Ursula K. Le Guin, Samuel R. Delany, and Joanna Russ.\", metadata={'score': 0.7602672137411663, 'uuid': '756b7136-0b4c-4664-ad33-c4431670356c', 'created_at': '2023-08-11T20:31:12.420717Z', 'role': 'ai', 'metadata': {'system': {'entities': [{'Label': 'PERSON', 'Matches': [{'End': 16, 'Start': 0, 'Text': \"Octavia Butler's\"}], 'Name': \"Octavia Butler's\"}, {'Label': 'ORG', 'Matches': [{'End': 58, 'Start': 41, 'Text': 'Ursula K. Le Guin'}], 'Name': 'Ursula K. Le Guin'}, {'Label': 'PERSON', 'Matches': [{'End': 76, 'Start': 60, 'Text': 'Samuel R. Delany'}], 'Name': 'Samuel R. Delany'}, {'Label': 'PERSON', 'Matches': [{'End': 93, 'Start': 82, 'Text': 'Joanna Russ'}], 'Name': 'Joanna Russ'}]}}, 'token_count': 27}),\n Document(page_content='You might want to read Ursula K. Le Guin or Joanna Russ.', metadata={'score': 0.7596040989115522, 'uuid': '166d9556-2d48-4237-8a84-5d8a1024d5f4', 'created_at': '2023-08-11T20:31:12.434522Z', 'role': 'ai', 'metadata': {'system': {'entities': [{'Label': 'ORG', 'Matches': [{'End': 40, 'Start': 23, 'Text': 'Ursula K. Le Guin'}], 'Name': 'Ursula K. Le Guin'}, {'Label': 'PERSON', 'Matches': [{'End': 55, 'Start': 44, 'Text': 'Joanna Russ'}], 'Name': 'Joanna Russ'}]}}, 'token_count': 18}),\n Document(page_content='Who were her contemporaries?', metadata={'score': 0.7575531381951208, 'uuid': 'c6a16691-4012-439f-b223-84fd4e79c4cf', 'created_at': '2023-08-11T20:31:12.410336Z', 'role': 'human', 'token_count': 8})]"
|
||||
},
|
||||
"execution_count": 7,
|
||||
"execution_count": 6,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
@@ -293,19 +270,16 @@
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 5,
|
||||
"metadata": {
|
||||
"ExecuteTime": {
|
||||
"end_time": "2023-05-18T20:09:21.298710Z",
|
||||
"start_time": "2023-05-18T20:09:21.297169Z"
|
||||
},
|
||||
"collapsed": false,
|
||||
"jupyter": {
|
||||
"outputs_hidden": false
|
||||
}
|
||||
},
|
||||
"execution_count": 6,
|
||||
"outputs": [],
|
||||
"source": []
|
||||
"source": [],
|
||||
"metadata": {
|
||||
"collapsed": false,
|
||||
"ExecuteTime": {
|
||||
"end_time": "2023-08-11T20:31:14.923032Z",
|
||||
"start_time": "2023-08-11T20:31:14.918181Z"
|
||||
}
|
||||
}
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
|
||||
@@ -20,7 +20,7 @@
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"execution_count": 5,
|
||||
"id": "282239c8-e03a-4abc-86c1-ca6120231a20",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
@@ -28,7 +28,7 @@
|
||||
"from langchain.embeddings import BedrockEmbeddings\n",
|
||||
"\n",
|
||||
"embeddings = BedrockEmbeddings(\n",
|
||||
" credentials_profile_name=\"bedrock-admin\", endpoint_url=\"custom_endpoint_url\"\n",
|
||||
" credentials_profile_name=\"bedrock-admin\", region_name=\"us-east-1\"\n",
|
||||
")"
|
||||
]
|
||||
},
|
||||
@@ -49,7 +49,29 @@
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"embeddings.embed_documents([\"This is a content of the document\"])"
|
||||
"embeddings.embed_documents([\"This is a content of the document\", \"This is another document\"])"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "9f6b364d",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# async embed query\n",
|
||||
"await embeddings.aembed_query(\"This is a content of the document\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "c9240a5a",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# async embed documents\n",
|
||||
"await embeddings.aembed_documents([\"This is a content of the document\", \"This is another document\"])"
|
||||
]
|
||||
}
|
||||
],
|
||||
@@ -69,7 +91,7 @@
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.10.11"
|
||||
"version": "3.9.13"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
|
||||
@@ -0,0 +1,84 @@
|
||||
{
|
||||
"cells": [
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "719619d3",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# BGE Hugging Face Embeddings\n",
|
||||
"\n",
|
||||
"This notebook shows how to use BGE Embeddings through Hugging Face"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 8,
|
||||
"id": "f7a54279",
|
||||
"metadata": {
|
||||
"scrolled": true
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# !pip install sentence_transformers"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 5,
|
||||
"id": "9e1d5b6b",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langchain.embeddings import HuggingFaceBgeEmbeddings\n",
|
||||
"\n",
|
||||
"model_name = \"BAAI/bge-small-en\"\n",
|
||||
"model_kwargs = {'device': 'cpu'}\n",
|
||||
"encode_kwargs = {'normalize_embeddings': False}\n",
|
||||
"hf = HuggingFaceBgeEmbeddings(\n",
|
||||
" model_name=model_name,\n",
|
||||
" model_kwargs=model_kwargs,\n",
|
||||
" encode_kwargs=encode_kwargs\n",
|
||||
")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 7,
|
||||
"id": "e59d1a89",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"embedding = hf.embed_query(\"hi this is harrison\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "e596315f",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": []
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
"kernelspec": {
|
||||
"display_name": "Python 3 (ipykernel)",
|
||||
"language": "python",
|
||||
"name": "python3"
|
||||
},
|
||||
"language_info": {
|
||||
"codemirror_mode": {
|
||||
"name": "ipython",
|
||||
"version": 3
|
||||
},
|
||||
"file_extension": ".py",
|
||||
"mimetype": "text/x-python",
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.10.1"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
"nbformat_minor": 5
|
||||
}
|
||||
@@ -12,7 +12,7 @@
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 1,
|
||||
"execution_count": 6,
|
||||
"id": "0be1af71",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
@@ -22,7 +22,7 @@
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 2,
|
||||
"execution_count": 29,
|
||||
"id": "2c66e5da",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
@@ -32,7 +32,7 @@
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 3,
|
||||
"execution_count": 30,
|
||||
"id": "01370375",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
@@ -42,7 +42,7 @@
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 4,
|
||||
"execution_count": 31,
|
||||
"id": "bfb6142c",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
@@ -52,7 +52,32 @@
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 5,
|
||||
"execution_count": 32,
|
||||
"id": "91bc875d-829b-4c3d-8e6f-fc2dda30a3bd",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"[-0.003186025367556387,\n",
|
||||
" 0.011071979803637493,\n",
|
||||
" -0.004020420763285827,\n",
|
||||
" -0.011658221276953042,\n",
|
||||
" -0.0010534035786864363]"
|
||||
]
|
||||
},
|
||||
"execution_count": 32,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"query_result[:5]"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 33,
|
||||
"id": "0356c3b7",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
@@ -60,6 +85,31 @@
|
||||
"doc_result = embeddings.embed_documents([text])"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 34,
|
||||
"id": "a4b0d49e-0c73-44b6-aed5-5b426564e085",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"[-0.003186025367556387,\n",
|
||||
" 0.011071979803637493,\n",
|
||||
" -0.004020420763285827,\n",
|
||||
" -0.011658221276953042,\n",
|
||||
" -0.0010534035786864363]"
|
||||
]
|
||||
},
|
||||
"execution_count": 34,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"doc_result[0][:5]"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "bb61bbeb",
|
||||
@@ -70,7 +120,7 @@
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"execution_count": 1,
|
||||
"id": "c0b072cc",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
@@ -80,17 +130,17 @@
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"execution_count": 23,
|
||||
"id": "a56b70f5",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"embeddings = OpenAIEmbeddings()"
|
||||
"embeddings = OpenAIEmbeddings(model=\"text-search-ada-doc-001\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"execution_count": 24,
|
||||
"id": "14aefb64",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
@@ -100,7 +150,7 @@
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"execution_count": 25,
|
||||
"id": "3c39ed33",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
@@ -110,7 +160,32 @@
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"execution_count": 26,
|
||||
"id": "2ee7ce9f-d506-4810-8897-e44334412714",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"[0.004452846988523035,\n",
|
||||
" 0.034550655976098514,\n",
|
||||
" -0.015029939040690051,\n",
|
||||
" 0.03827273883655212,\n",
|
||||
" 0.005785414075152477]"
|
||||
]
|
||||
},
|
||||
"execution_count": 26,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"query_result[:5]"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 27,
|
||||
"id": "e3221db6",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
@@ -118,6 +193,31 @@
|
||||
"doc_result = embeddings.embed_documents([text])"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 28,
|
||||
"id": "a0865409-3a6d-468f-939f-abde17c7cac3",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"[0.004452846988523035,\n",
|
||||
" 0.034550655976098514,\n",
|
||||
" -0.015029939040690051,\n",
|
||||
" 0.03827273883655212,\n",
|
||||
" 0.005785414075152477]"
|
||||
]
|
||||
},
|
||||
"execution_count": 28,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"doc_result[0][:5]"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
@@ -132,7 +232,7 @@
|
||||
],
|
||||
"metadata": {
|
||||
"kernelspec": {
|
||||
"display_name": "Python 3.11.1 64-bit",
|
||||
"display_name": "Python 3 (ipykernel)",
|
||||
"language": "python",
|
||||
"name": "python3"
|
||||
},
|
||||
@@ -146,7 +246,7 @@
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.11.1"
|
||||
"version": "3.9.1"
|
||||
},
|
||||
"vscode": {
|
||||
"interpreter": {
|
||||
|
||||
@@ -5,7 +5,7 @@
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# Multion Toolkit\n",
|
||||
"# MultiOn Toolkit\n",
|
||||
"\n",
|
||||
"This notebook walks you through connecting LangChain to the MultiOn Client in your browser\n",
|
||||
"\n",
|
||||
@@ -18,7 +18,32 @@
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"!pip install --upgrade multion > /dev/null"
|
||||
"!pip install --upgrade multion langchain -q"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langchain.agents.agent_toolkits import MultionToolkit\n",
|
||||
"import os\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"toolkit = MultionToolkit()\n",
|
||||
"\n",
|
||||
"toolkit"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"tools = toolkit.get_tools()\n",
|
||||
"tools"
|
||||
]
|
||||
},
|
||||
{
|
||||
@@ -38,8 +63,9 @@
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# Authorize connection to your Browser extention\n",
|
||||
"import multion \n",
|
||||
"multion.login()\n"
|
||||
"import multion\n",
|
||||
"multion.login()\n",
|
||||
"\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
@@ -57,38 +83,18 @@
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langchain.agents.agent_toolkits import create_multion_agent\n",
|
||||
"from langchain.tools.multion.tool import MultionClientTool\n",
|
||||
"from langchain.agents.agent_types import AgentType\n",
|
||||
"from langchain.chat_models import ChatOpenAI"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {
|
||||
"tags": []
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"\n",
|
||||
"agent_executor = create_multion_agent(\n",
|
||||
" llm=ChatOpenAI(temperature=0),\n",
|
||||
" tool=MultionClientTool(),\n",
|
||||
" agent_type=AgentType.OPENAI_FUNCTIONS,\n",
|
||||
" verbose=True\n",
|
||||
")\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {
|
||||
"tags": []
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"agent.run(\"show me the weather today\")"
|
||||
"from langchain import OpenAI\n",
|
||||
"from langchain.agents import initialize_agent, AgentType\n",
|
||||
"llm = OpenAI(temperature=0)\n",
|
||||
"from langchain.agents.agent_toolkits import MultionToolkit\n",
|
||||
"toolkit = MultionToolkit()\n",
|
||||
"tools=toolkit.get_tools()\n",
|
||||
"agent = initialize_agent(\n",
|
||||
" tools=toolkit.get_tools(),\n",
|
||||
" llm=llm,\n",
|
||||
" agent=AgentType.STRUCTURED_CHAT_ZERO_SHOT_REACT_DESCRIPTION,\n",
|
||||
" verbose = True\n",
|
||||
")"
|
||||
]
|
||||
},
|
||||
{
|
||||
@@ -100,7 +106,7 @@
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"agent.run(\n",
|
||||
" \"Tweet about Elon Musk\"\n",
|
||||
" \"Tweet 'Hi from MultiOn'\"\n",
|
||||
")"
|
||||
]
|
||||
}
|
||||
|
||||
134
docs/extras/integrations/tools/alpha_vantage.ipynb
Normal file
@@ -0,0 +1,134 @@
|
||||
{
|
||||
"cells": [
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "245a954a",
|
||||
"metadata": {
|
||||
"id": "245a954a"
|
||||
},
|
||||
"source": [
|
||||
"# Alpha Vantage\n",
|
||||
"\n",
|
||||
">[Alpha Vantage](https://www.alphavantage.co) Alpha Vantage provides realtime and historical financial market data through a set of powerful and developer-friendly data APIs and spreadsheets. \n",
|
||||
"\n",
|
||||
"Use the ``AlphaVantageAPIWrapper`` to get currency exchange rates."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 1,
|
||||
"id": "34bb5968",
|
||||
"metadata": {
|
||||
"id": "34bb5968"
|
||||
},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdin",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
" ········\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"import getpass\n",
|
||||
"import os\n",
|
||||
"\n",
|
||||
"os.environ[\"ALPHAVANTAGE_API_KEY\"] = getpass.getpass()"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 2,
|
||||
"id": "ac4910f8",
|
||||
"metadata": {
|
||||
"id": "ac4910f8"
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langchain.utilities.alpha_vantage import AlphaVantageAPIWrapper"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 3,
|
||||
"id": "84b8f773",
|
||||
"metadata": {
|
||||
"id": "84b8f773"
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"alpha_vantage = AlphaVantageAPIWrapper()"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 5,
|
||||
"id": "068991a6",
|
||||
"metadata": {
|
||||
"id": "068991a6",
|
||||
"outputId": "c5cdc6ec-03cf-4084-cc6f-6ae792d91d39"
|
||||
},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"{'1. From_Currency Code': 'USD',\n",
|
||||
" '2. From_Currency Name': 'United States Dollar',\n",
|
||||
" '3. To_Currency Code': 'JPY',\n",
|
||||
" '4. To_Currency Name': 'Japanese Yen',\n",
|
||||
" '5. Exchange Rate': '144.93000000',\n",
|
||||
" '6. Last Refreshed': '2023-08-11 21:31:01',\n",
|
||||
" '7. Time Zone': 'UTC',\n",
|
||||
" '8. Bid Price': '144.92600000',\n",
|
||||
" '9. Ask Price': '144.93400000'}"
|
||||
]
|
||||
},
|
||||
"execution_count": 5,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"alpha_vantage.run(\"USD\", \"JPY\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "84fc2b66-c08f-4cd3-ae13-494c54789c09",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": []
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
"colab": {
|
||||
"provenance": []
|
||||
},
|
||||
"kernelspec": {
|
||||
"display_name": "Python 3 (ipykernel)",
|
||||
"language": "python",
|
||||
"name": "python3"
|
||||
},
|
||||
"language_info": {
|
||||
"codemirror_mode": {
|
||||
"name": "ipython",
|
||||
"version": 3
|
||||
},
|
||||
"file_extension": ".py",
|
||||
"mimetype": "text/x-python",
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.9.1"
|
||||
},
|
||||
"vscode": {
|
||||
"interpreter": {
|
||||
"hash": "53f3bc57609c7a84333bb558594977aa5b4026b1d6070b93987956689e367341"
|
||||
}
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
"nbformat_minor": 5
|
||||
}
|
||||
@@ -66,7 +66,7 @@
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langchain import OpenAI\n",
|
||||
"from langchain.agents import load_tools, AgentType\n",
|
||||
"from langchain.agents import load_tools, initialize_agent, AgentType\n",
|
||||
"\n",
|
||||
"llm = OpenAI(temperature=0)\n",
|
||||
"\n",
|
||||
|
||||
194
docs/extras/integrations/tools/dalle_image_generator.ipynb
Normal file
@@ -0,0 +1,194 @@
|
||||
{
|
||||
"cells": [
|
||||
{
|
||||
"attachments": {},
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# Dall-E Image Generator\n",
|
||||
"\n",
|
||||
"This notebook shows how you can generate images from a prompt synthesized using an OpenAI LLM. The images are generated using Dall-E, which uses the same OpenAI API key as the LLM."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# Needed if you would like to display images in the notebook\n",
|
||||
"!pip install opencv-python scikit-image"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 4,
|
||||
"metadata": {
|
||||
"id": "q-k8wmp0zquh"
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langchain.llms import OpenAI\n",
|
||||
"import os\n",
|
||||
"os.environ[\"OPENAI_API_KEY\"] = \"<your-key-here>\""
|
||||
]
|
||||
},
|
||||
{
|
||||
"attachments": {},
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Run as a chain"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 2,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langchain.utilities.dalle_image_generator import DallEAPIWrapper\n",
|
||||
"from langchain.prompts import PromptTemplate\n",
|
||||
"from langchain.chains import LLMChain\n",
|
||||
"from langchain.llms import OpenAI\n",
|
||||
"\n",
|
||||
"llm = OpenAI(temperature=0.9)\n",
|
||||
"prompt = PromptTemplate(\n",
|
||||
" input_variables=[\"image_desc\"],\n",
|
||||
" template=\"Generate a detailed prompt to generate an image based on the following description: {image_desc}\",\n",
|
||||
")\n",
|
||||
"chain = LLMChain(llm=llm, prompt=prompt)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 3,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"image_url = DallEAPIWrapper().run(chain.run(\"halloween night at a haunted museum\"))"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 4,
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"'https://oaidalleapiprodscus.blob.core.windows.net/private/org-i0zjYONU3PemzJ222esBaAzZ/user-f6uEIOFxoiUZivy567cDSWni/img-i7Z2ZxvJ4IbbdAiO6OXJgS3v.png?st=2023-08-11T14%3A03%3A14Z&se=2023-08-11T16%3A03%3A14Z&sp=r&sv=2021-08-06&sr=b&rscd=inline&rsct=image/png&skoid=6aaadede-4fb3-4698-a8f6-684d7786b067&sktid=a48cca56-e6da-484e-a814-9c849652bcb3&skt=2023-08-10T20%3A58%3A32Z&ske=2023-08-11T20%3A58%3A32Z&sks=b&skv=2021-08-06&sig=/sECe7C0EAq37ssgBm7g7JkVIM/Q1W3xOstd0Go6slA%3D'"
|
||||
]
|
||||
},
|
||||
"execution_count": 4,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"image_url"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# You can click on the link above to display the image \n",
|
||||
"# Or you can try the options below to display the image inline in this notebook\n",
|
||||
"\n",
|
||||
"try:\n",
|
||||
" import google.colab\n",
|
||||
" IN_COLAB = True\n",
|
||||
"except:\n",
|
||||
" IN_COLAB = False\n",
|
||||
"\n",
|
||||
"if IN_COLAB:\n",
|
||||
" from google.colab.patches import cv2_imshow # for image display\n",
|
||||
" from skimage import io\n",
|
||||
"\n",
|
||||
" image = io.imread(image_url) \n",
|
||||
" cv2_imshow(image)\n",
|
||||
"else:\n",
|
||||
" import cv2\n",
|
||||
" from skimage import io\n",
|
||||
"\n",
|
||||
" image = io.imread(image_url) \n",
|
||||
" cv2.imshow('image', image)\n",
|
||||
" cv2.waitKey(0) #wait for a keyboard input\n",
|
||||
" cv2.destroyAllWindows()\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"attachments": {},
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Run as a tool with an agent"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 9,
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"\n",
|
||||
"\n",
|
||||
"\u001b[1m> Entering new AgentExecutor chain...\u001b[0m\n",
|
||||
"\u001b[32;1m\u001b[1;3m What is the best way to turn this description into an image?\n",
|
||||
"Action: Dall-E Image Generator\n",
|
||||
"Action Input: A spooky Halloween night at a haunted museum\u001b[0mhttps://oaidalleapiprodscus.blob.core.windows.net/private/org-rocrupyvzgcl4yf25rqq6d1v/user-WsxrbKyP2c8rfhCKWDyMfe8N/img-ogKfqxxOS5KWVSj4gYySR6FY.png?st=2023-01-31T07%3A38%3A25Z&se=2023-01-31T09%3A38%3A25Z&sp=r&sv=2021-08-06&sr=b&rscd=inline&rsct=image/png&skoid=6aaadede-4fb3-4698-a8f6-684d7786b067&sktid=a48cca56-e6da-484e-a814-9c849652bcb3&skt=2023-01-30T22%3A19%3A36Z&ske=2023-01-31T22%3A19%3A36Z&sks=b&skv=2021-08-06&sig=XsomxxBfu2CP78SzR9lrWUlbask4wBNnaMsHamy4VvU%3D\n",
|
||||
"\n",
|
||||
"Observation: \u001b[36;1m\u001b[1;3mhttps://oaidalleapiprodscus.blob.core.windows.net/private/org-rocrupyvzgcl4yf25rqq6d1v/user-WsxrbKyP2c8rfhCKWDyMfe8N/img-ogKfqxxOS5KWVSj4gYySR6FY.png?st=2023-01-31T07%3A38%3A25Z&se=2023-01-31T09%3A38%3A25Z&sp=r&sv=2021-08-06&sr=b&rscd=inline&rsct=image/png&skoid=6aaadede-4fb3-4698-a8f6-684d7786b067&sktid=a48cca56-e6da-484e-a814-9c849652bcb3&skt=2023-01-30T22%3A19%3A36Z&ske=2023-01-31T22%3A19%3A36Z&sks=b&skv=2021-08-06&sig=XsomxxBfu2CP78SzR9lrWUlbask4wBNnaMsHamy4VvU%3D\u001b[0m\n",
|
||||
"Thought:\u001b[32;1m\u001b[1;3m With the image generated, I can now make my final answer.\n",
|
||||
"Final Answer: An image of a Halloween night at a haunted museum can be seen here: https://oaidalleapiprodscus.blob.core.windows.net/private/org-rocrupyvzgcl4yf25rqq6d1v/user-WsxrbKyP2c8rfhCKWDyMfe8N/img-ogKfqxxOS5KWVSj4gYySR6FY.png?st=2023-01-31T07%3A38%3A25Z&se=2023-01-31T09%3A38%3A25Z&sp=r&sv=2021-08-06&sr=b&rscd=inline&rsct=image/png&skoid=6aaadede-4fb3-4698-a8f6-684d7786b067&sktid=a48cca56-e6da-484e-a814-9c849652bcb3&skt=2023-01-30T22\u001b[0m\n",
|
||||
"\n",
|
||||
"\u001b[1m> Finished chain.\u001b[0m\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"from langchain.agents import load_tools\n",
|
||||
"from langchain.agents import initialize_agent\n",
|
||||
"\n",
|
||||
"tools = load_tools(['dalle-image-generator'])\n",
|
||||
"agent = initialize_agent(tools, llm, agent=\"zero-shot-react-description\", verbose=True)\n",
|
||||
"output = agent.run(\"Create an image of a halloween night at a haunted museum\")"
|
||||
]
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
"colab": {
|
||||
"provenance": []
|
||||
},
|
||||
"kernelspec": {
|
||||
"display_name": "poetry-venv",
|
||||
"language": "python",
|
||||
"name": "poetry-venv"
|
||||
},
|
||||
"language_info": {
|
||||
"codemirror_mode": {
|
||||
"name": "ipython",
|
||||
"version": 3
|
||||
},
|
||||
"file_extension": ".py",
|
||||
"mimetype": "text/x-python",
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.9.1"
|
||||
},
|
||||
"vscode": {
|
||||
"interpreter": {
|
||||
"hash": "3570c8892273ffbeee7ead61dc7c022b73551d9f55fb2584ac0e8e8920b18a89"
|
||||
}
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
"nbformat_minor": 4
|
||||
}
|
||||
@@ -9,7 +9,7 @@
|
||||
"\n",
|
||||
"This notebook goes over how to use the google search component.\n",
|
||||
"\n",
|
||||
"First, you need to set up the proper API keys and environment variables. To set it up, create the GOOGLE_API_KEY in the Google Cloud credential console (https://console.cloud.google.com/apis/credentials) and a GOOGLE_CSE_ID using the Programmable Search Enginge (https://programmablesearchengine.google.com/controlpanel/create). Next, it is good to follow the instructions found [here](https://stackoverflow.com/questions/37083058/programmatically-searching-google-in-python-using-custom-search).\n",
|
||||
"First, you need to set up the proper API keys and environment variables. To set it up, create the GOOGLE_API_KEY in the Google Cloud credential console (https://console.cloud.google.com/apis/credentials) and a GOOGLE_CSE_ID using the Programmable Search Engine (https://programmablesearchengine.google.com/controlpanel/create). Next, it is good to follow the instructions found [here](https://stackoverflow.com/questions/37083058/programmatically-searching-google-in-python-using-custom-search).\n",
|
||||
"\n",
|
||||
"Then we will need to set some environment variables."
|
||||
]
|
||||
|
||||
@@ -1,7 +1,7 @@
|
||||
# SQL
|
||||
# SQL Database Chain
|
||||
|
||||
This example demonstrates the use of the `SQLDatabaseChain` for answering questions over a SQL database.
|
||||
|
||||
import Example from "@snippets/modules/chains/popular/sqlite.mdx"
|
||||
|
||||
<Example/>
|
||||
<Example/>
|
||||
300
docs/extras/integrations/vectorstores/bageldb.ipynb
Normal file
@@ -0,0 +1,300 @@
|
||||
{
|
||||
"cells": [
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# BagelDB\n",
|
||||
"\n",
|
||||
"> [BagelDB](https://www.bageldb.ai/) (`Open Vector Database for AI`), is like GitHub for AI data.\n",
|
||||
"It is a collaborative platform where users can create,\n",
|
||||
"share, and manage vector datasets. It can support private projects for independent developers,\n",
|
||||
"internal collaborations for enterprises, and public contributions for data DAOs.\n",
|
||||
"\n",
|
||||
"### Installation and Setup\n",
|
||||
"\n",
|
||||
"```bash\n",
|
||||
"pip install betabageldb\n",
|
||||
"```\n",
|
||||
"\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Create VectorStore from texts"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 9,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langchain.vectorstores import Bagel\n",
|
||||
"\n",
|
||||
"texts = [\"hello bagel\", \"hello langchain\", \"I love salad\", \"my car\", \"a dog\"]\n",
|
||||
"# create cluster and add texts\n",
|
||||
"cluster = Bagel.from_texts(cluster_name=\"testing\", texts=texts)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 11,
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"[Document(page_content='hello bagel', metadata={}),\n",
|
||||
" Document(page_content='my car', metadata={}),\n",
|
||||
" Document(page_content='I love salad', metadata={})]"
|
||||
]
|
||||
},
|
||||
"execution_count": 11,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"# similarity search\n",
|
||||
"cluster.similarity_search(\"bagel\", k=3)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 12,
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"[(Document(page_content='hello bagel', metadata={}), 0.27392977476119995),\n",
|
||||
" (Document(page_content='my car', metadata={}), 1.4783176183700562),\n",
|
||||
" (Document(page_content='I love salad', metadata={}), 1.5342965126037598)]"
|
||||
]
|
||||
},
|
||||
"execution_count": 12,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"# the score is a distance metric, so lower is better\n",
|
||||
"cluster.similarity_search_with_score(\"bagel\", k=3)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 13,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# delete the cluster\n",
|
||||
"cluster.delete_cluster()"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Create VectorStore from docs"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 33,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langchain.document_loaders import TextLoader\n",
|
||||
"from langchain.text_splitter import CharacterTextSplitter\n",
|
||||
"\n",
|
||||
"loader = TextLoader(\"../../../state_of_the_union.txt\")\n",
|
||||
"documents = loader.load()\n",
|
||||
"text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)\n",
|
||||
"docs = text_splitter.split_documents(documents)[:10]"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 36,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# create cluster with docs\n",
|
||||
"cluster = Bagel.from_documents(cluster_name=\"testing_with_docs\", documents=docs)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 37,
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"Madam Speaker, Madam Vice President, our First Lady and Second Gentleman. Members of Congress and the \n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"# similarity search\n",
|
||||
"query = \"What did the president say about Ketanji Brown Jackson\"\n",
|
||||
"docs = cluster.similarity_search(query)\n",
|
||||
"print(docs[0].page_content[:102])"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Get all text/doc from Cluster"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 53,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"texts = [\"hello bagel\", \"this is langchain\"]\n",
|
||||
"cluster = Bagel.from_texts(cluster_name=\"testing\", texts=texts)\n",
|
||||
"cluster_data = cluster.get()"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 54,
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"dict_keys(['ids', 'embeddings', 'metadatas', 'documents'])"
|
||||
]
|
||||
},
|
||||
"execution_count": 54,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"# all keys\n",
|
||||
"cluster_data.keys()"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 56,
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"{'ids': ['578c6d24-3763-11ee-a8ab-b7b7b34f99ba',\n",
|
||||
" '578c6d25-3763-11ee-a8ab-b7b7b34f99ba',\n",
|
||||
" 'fb2fc7d8-3762-11ee-a8ab-b7b7b34f99ba',\n",
|
||||
" 'fb2fc7d9-3762-11ee-a8ab-b7b7b34f99ba',\n",
|
||||
" '6b40881a-3762-11ee-a8ab-b7b7b34f99ba',\n",
|
||||
" '6b40881b-3762-11ee-a8ab-b7b7b34f99ba',\n",
|
||||
" '581e691e-3762-11ee-a8ab-b7b7b34f99ba',\n",
|
||||
" '581e691f-3762-11ee-a8ab-b7b7b34f99ba'],\n",
|
||||
" 'embeddings': None,\n",
|
||||
" 'metadatas': [{}, {}, {}, {}, {}, {}, {}, {}],\n",
|
||||
" 'documents': ['hello bagel',\n",
|
||||
" 'this is langchain',\n",
|
||||
" 'hello bagel',\n",
|
||||
" 'this is langchain',\n",
|
||||
" 'hello bagel',\n",
|
||||
" 'this is langchain',\n",
|
||||
" 'hello bagel',\n",
|
||||
" 'this is langchain']}"
|
||||
]
|
||||
},
|
||||
"execution_count": 56,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"# all values and keys\n",
|
||||
"cluster_data"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 57,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"cluster.delete_cluster()"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Create cluster with metadata & filter using metadata"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 63,
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"[(Document(page_content='hello bagel', metadata={'source': 'notion'}), 0.0)]"
|
||||
]
|
||||
},
|
||||
"execution_count": 63,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"texts = [\"hello bagel\", \"this is langchain\"]\n",
|
||||
"metadatas = [{\"source\": \"notion\"}, {\"source\": \"google\"}]\n",
|
||||
"\n",
|
||||
"cluster = Bagel.from_texts(cluster_name=\"testing\", texts=texts, metadatas=metadatas)\n",
|
||||
"cluster.similarity_search_with_score(\"hello bagel\", where={\"source\": \"notion\"})"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 64,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# delete the cluster\n",
|
||||
"cluster.delete_cluster()"
|
||||
]
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
"kernelspec": {
|
||||
"display_name": "Python 3",
|
||||
"language": "python",
|
||||
"name": "python3"
|
||||
},
|
||||
"language_info": {
|
||||
"codemirror_mode": {
|
||||
"name": "ipython",
|
||||
"version": 3
|
||||
},
|
||||
"file_extension": ".py",
|
||||
"mimetype": "text/x-python",
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.10.12"
|
||||
},
|
||||
"orig_nbformat": 4
|
||||
},
|
||||
"nbformat": 4,
|
||||
"nbformat_minor": 2
|
||||
}
|
||||
244
docs/extras/integrations/vectorstores/dingo.ipynb
Normal file
@@ -0,0 +1,244 @@
|
||||
{
|
||||
"cells": [
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "683953b3",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# Dingo\n",
|
||||
"\n",
|
||||
">[Dingo](https://dingodb.readthedocs.io/en/latest/) is a distributed multi-mode vector database, which combines the characteristics of data lakes and vector databases, and can store data of any type and size (Key-Value, PDF, audio, video, etc.). It has real-time low-latency processing capabilities to achieve rapid insight and response, and can efficiently conduct instant analysis and process multi-modal data.\n",
|
||||
"\n",
|
||||
"This notebook shows how to use functionality related to the DingoDB vector database.\n",
|
||||
"\n",
|
||||
"To run, you should have a [DingoDB instance up and running](https://github.com/dingodb/dingo-deploy/blob/main/README.md)."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "a62cff8a-bcf7-4e33-bbbc-76999c2e3e20",
|
||||
"metadata": {
|
||||
"tags": []
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"!pip install dingodb"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "7a0f9e02-8eb0-4aef-b11f-8861360472ee",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"We want to use OpenAIEmbeddings so we have to get the OpenAI API Key."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 1,
|
||||
"id": "8b6ed9cd-81b9-46e5-9c20-5aafca2844d0",
|
||||
"metadata": {
|
||||
"tags": []
|
||||
},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"OpenAI API Key:········\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"import os\n",
|
||||
"import getpass\n",
|
||||
"\n",
|
||||
"os.environ[\"OPENAI_API_KEY\"] = getpass.getpass(\"OpenAI API Key:\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 2,
|
||||
"id": "aac9563e",
|
||||
"metadata": {
|
||||
"tags": []
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langchain.embeddings.openai import OpenAIEmbeddings\n",
|
||||
"from langchain.text_splitter import CharacterTextSplitter\n",
|
||||
"from langchain.vectorstores import Dingo\n",
|
||||
"from langchain.document_loaders import TextLoader"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 3,
|
||||
"id": "a3c3999a",
|
||||
"metadata": {
|
||||
"tags": []
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langchain.document_loaders import TextLoader\n",
|
||||
"\n",
|
||||
"loader = TextLoader(\"../../../state_of_the_union.txt\")\n",
|
||||
"documents = loader.load()\n",
|
||||
"text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)\n",
|
||||
"docs = text_splitter.split_documents(documents)\n",
|
||||
"\n",
|
||||
"embeddings = OpenAIEmbeddings()"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 4,
|
||||
"id": "dcf88bdf",
|
||||
"metadata": {
|
||||
"tags": []
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from dingodb import DingoDB\n",
|
||||
"\n",
|
||||
"index_name = \"langchain-demo\"\n",
|
||||
"\n",
|
||||
"dingo_client = DingoDB(user=\"\", password=\"\", host=[\"127.0.0.1:13000\"])\n",
|
||||
"# First, check if our index already exists. If it doesn't, we create it\n",
|
||||
"if index_name not in dingo_client.get_index():\n",
|
||||
" # we create a new index\n",
|
||||
" dingo_client.create_index(\n",
|
||||
" index_name=index_name,\n",
|
||||
" dimension=1536,\n",
|
||||
" metric_type='cosine',\n",
|
||||
" auto_id=False\n",
|
||||
")\n",
|
||||
"\n",
|
||||
"# The OpenAI embedding model `text-embedding-ada-002 uses 1536 dimensions`\n",
|
||||
"docsearch = Dingo.from_documents(docs, embeddings, client=dingo_client, index_name=index_name)\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "c3aae49e",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langchain.embeddings.openai import OpenAIEmbeddings\n",
|
||||
"from langchain.text_splitter import CharacterTextSplitter\n",
|
||||
"from langchain.vectorstores import Dingo\n",
|
||||
"from langchain.document_loaders import TextLoader"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 5,
|
||||
"id": "a8c513ab",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"query = \"What did the president say about Ketanji Brown Jackson\"\n",
|
||||
"docs = docsearch.similarity_search(query)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 2,
|
||||
"id": "fc516993",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"print(docs[0][1])"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "1eca81e4",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Adding More Text to an Existing Index\n",
|
||||
"\n",
|
||||
"More text can embedded and upserted to an existing Dingo index using the `add_texts` function"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "e40d558b",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"vectorstore = Dingo(client, embeddings.embed_query, \"text\")\n",
|
||||
"\n",
|
||||
"vectorstore.add_texts(\"More text!\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "bcb858a8",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Maximal Marginal Relevance Searches\n",
|
||||
"\n",
|
||||
"In addition to using similarity search in the retriever object, you can also use `mmr` as retriever."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "649083ab",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"retriever = docsearch.as_retriever(search_type=\"mmr\")\n",
|
||||
"matched_docs = retriever.get_relevant_documents(query)\n",
|
||||
"for i, d in enumerate(matched_docs):\n",
|
||||
" print(f\"\\n## Document {i}\\n\")\n",
|
||||
" print(d.page_content)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "7d3831ad",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Or use `max_marginal_relevance_search` directly:"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "732f58b1",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"found_docs = docsearch.max_marginal_relevance_search(query, k=2, fetch_k=10)\n",
|
||||
"for i, doc in enumerate(found_docs):\n",
|
||||
" print(f\"{i + 1}.\", doc.page_content, \"\\n\")"
|
||||
]
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
"kernelspec": {
|
||||
"display_name": "Python 3 (ipykernel)",
|
||||
"language": "python",
|
||||
"name": "python3"
|
||||
},
|
||||
"language_info": {
|
||||
"codemirror_mode": {
|
||||
"name": "ipython",
|
||||
"version": 3
|
||||
},
|
||||
"file_extension": ".py",
|
||||
"mimetype": "text/x-python",
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.10.11"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
"nbformat_minor": 5
|
||||
}
|
||||
@@ -80,7 +80,7 @@
|
||||
"source": [
|
||||
"from langchain.document_loaders import TextLoader\n",
|
||||
"\n",
|
||||
"loader = TextLoader(\"../../../state_of_the_union.txt\")\n",
|
||||
"loader = TextLoader(\"../../../extras/modules/state_of_the_union.txt\")\n",
|
||||
"documents = loader.load()\n",
|
||||
"text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)\n",
|
||||
"docs = text_splitter.split_documents(documents)\n",
|
||||
@@ -90,7 +90,7 @@
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 11,
|
||||
"execution_count": 3,
|
||||
"id": "5eabdb75",
|
||||
"metadata": {
|
||||
"tags": []
|
||||
@@ -517,6 +517,67 @@
|
||||
"for doc in results:\n",
|
||||
" print(f\"Content: {doc.page_content}, Metadata: {doc.metadata}\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "1becca53",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Delete\n",
|
||||
"\n",
|
||||
"You can also delete ids. Note that the ids to delete should be the ids in the docstore."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 4,
|
||||
"id": "1408b870",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"True"
|
||||
]
|
||||
},
|
||||
"execution_count": 4,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"db.delete([db.index_to_docstore_id[0]])"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 6,
|
||||
"id": "d13daf33",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"False"
|
||||
]
|
||||
},
|
||||
"execution_count": 6,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"# Is now missing\n",
|
||||
"0 in db.index_to_docstore_id"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "30ace43e",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": []
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
|
||||
@@ -21,7 +21,10 @@
|
||||
"\n",
|
||||
"1. Leverage the `Rockset` console to create a [collection](https://rockset.com/docs/collections/) with the Write API as your source. In this walkthrough, we create a collection named `langchain_demo`. \n",
|
||||
" \n",
|
||||
" Configure the following [ingest transformation](https://rockset.com/docs/ingest-transformation/) to mark your embeddings field and take advantage of performance and storage optimizations:"
|
||||
" Configure the following [ingest transformation](https://rockset.com/docs/ingest-transformation/) to mark your embeddings field and take advantage of performance and storage optimizations:\n",
|
||||
"\n",
|
||||
"\n",
|
||||
" (We used OpenAI `text-embedding-ada-002` for this examples, where #length_of_vector_embedding = 1536)"
|
||||
]
|
||||
},
|
||||
{
|
||||
@@ -75,23 +78,10 @@
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 5,
|
||||
"execution_count": null,
|
||||
"id": "29505c1e",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"ename": "InitializationException",
|
||||
"evalue": "The rockset client was initialized incorrectly: An api key must be provided as a parameter to the RocksetClient or the Configuration object.",
|
||||
"output_type": "error",
|
||||
"traceback": [
|
||||
"\u001b[0;31m---------------------------------------------------------------------------\u001b[0m",
|
||||
"\u001b[0;31mInitializationException\u001b[0m Traceback (most recent call last)",
|
||||
"Cell \u001b[0;32mIn[5], line 6\u001b[0m\n\u001b[1;32m 4\u001b[0m ROCKSET_API_KEY \u001b[39m=\u001b[39m os\u001b[39m.\u001b[39menviron\u001b[39m.\u001b[39mget(\u001b[39m\"\u001b[39m\u001b[39mROCKSET_API_KEY\u001b[39m\u001b[39m\"\u001b[39m) \u001b[39m# Verify ROCKSET_API_KEY environment variable\u001b[39;00m\n\u001b[1;32m 5\u001b[0m ROCKSET_API_SERVER \u001b[39m=\u001b[39m rockset\u001b[39m.\u001b[39mRegions\u001b[39m.\u001b[39musw2a1 \u001b[39m# Verify Rockset region\u001b[39;00m\n\u001b[0;32m----> 6\u001b[0m rockset_client \u001b[39m=\u001b[39m rockset\u001b[39m.\u001b[39;49mRocksetClient(ROCKSET_API_SERVER, ROCKSET_API_KEY)\n\u001b[1;32m 8\u001b[0m COLLECTION_NAME\u001b[39m=\u001b[39m\u001b[39m'\u001b[39m\u001b[39mlangchain_demo\u001b[39m\u001b[39m'\u001b[39m\n\u001b[1;32m 9\u001b[0m TEXT_KEY\u001b[39m=\u001b[39m\u001b[39m'\u001b[39m\u001b[39mdescription\u001b[39m\u001b[39m'\u001b[39m\n",
|
||||
"File \u001b[0;32m~/Library/Python/3.9/lib/python/site-packages/rockset/rockset_client.py:242\u001b[0m, in \u001b[0;36mRocksetClient.__init__\u001b[0;34m(self, host, api_key, max_workers, config)\u001b[0m\n\u001b[1;32m 239\u001b[0m config\u001b[39m.\u001b[39mhost \u001b[39m=\u001b[39m host\n\u001b[1;32m 241\u001b[0m \u001b[39mif\u001b[39;00m \u001b[39mnot\u001b[39;00m config\u001b[39m.\u001b[39mapi_key:\n\u001b[0;32m--> 242\u001b[0m \u001b[39mraise\u001b[39;00m InitializationException(\n\u001b[1;32m 243\u001b[0m \u001b[39m\"\u001b[39m\u001b[39mAn api key must be provided as a parameter to the RocksetClient or the Configuration object.\u001b[39m\u001b[39m\"\u001b[39m\n\u001b[1;32m 244\u001b[0m )\n\u001b[1;32m 246\u001b[0m \u001b[39mself\u001b[39m\u001b[39m.\u001b[39mapi_client \u001b[39m=\u001b[39m ApiClient(config, max_workers\u001b[39m=\u001b[39mmax_workers)\n\u001b[1;32m 248\u001b[0m \u001b[39mself\u001b[39m\u001b[39m.\u001b[39mAliases \u001b[39m=\u001b[39m AliasesApiWrapper(\u001b[39mself\u001b[39m\u001b[39m.\u001b[39mapi_client)\n",
|
||||
"\u001b[0;31mInitializationException\u001b[0m: The rockset client was initialized incorrectly: An api key must be provided as a parameter to the RocksetClient or the Configuration object."
|
||||
]
|
||||
}
|
||||
],
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"import os\n",
|
||||
"import rockset\n",
|
||||
@@ -118,18 +108,7 @@
|
||||
"execution_count": null,
|
||||
"id": "9740d8c4",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"ename": "",
|
||||
"evalue": "",
|
||||
"output_type": "error",
|
||||
"traceback": [
|
||||
"\u001b[1;31mRunning cells with '/opt/local/bin/python3.11' requires the ipykernel package.\n",
|
||||
"\u001b[1;31mRun the following command to install 'ipykernel' into the Python environment. \n",
|
||||
"\u001b[1;31mCommand: '/opt/local/bin/python3.11 -m pip install ipykernel -U --user --force-reinstall'"
|
||||
]
|
||||
}
|
||||
],
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langchain.embeddings.openai import OpenAIEmbeddings\n",
|
||||
"from langchain.text_splitter import CharacterTextSplitter\n",
|
||||
@@ -155,20 +134,9 @@
|
||||
"execution_count": null,
|
||||
"id": "85b6a6c5",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"ename": "",
|
||||
"evalue": "",
|
||||
"output_type": "error",
|
||||
"traceback": [
|
||||
"\u001b[1;31mRunning cells with '/opt/local/bin/python3.11' requires the ipykernel package.\n",
|
||||
"\u001b[1;31mRun the following command to install 'ipykernel' into the Python environment. \n",
|
||||
"\u001b[1;31mCommand: '/opt/local/bin/python3.11 -m pip install ipykernel -U --user --force-reinstall'"
|
||||
]
|
||||
}
|
||||
],
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"embeddings = OpenAIEmbeddings() # Verify OPENAI_KEY environment variable\n",
|
||||
"embeddings = OpenAIEmbeddings() # Verify OPENAI_API_KEY environment variable\n",
|
||||
"\n",
|
||||
"docsearch = Rockset(\n",
|
||||
" client=rockset_client,\n",
|
||||
@@ -194,22 +162,10 @@
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 1,
|
||||
"execution_count": null,
|
||||
"id": "0bbf3df0",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"ename": "NameError",
|
||||
"evalue": "name 'docsearch' is not defined",
|
||||
"output_type": "error",
|
||||
"traceback": [
|
||||
"\u001b[0;31m---------------------------------------------------------------------------\u001b[0m",
|
||||
"\u001b[0;31mNameError\u001b[0m Traceback (most recent call last)",
|
||||
"Cell \u001b[0;32mIn[1], line 2\u001b[0m\n\u001b[1;32m 1\u001b[0m query \u001b[39m=\u001b[39m \u001b[39m\"\u001b[39m\u001b[39mWhat did the president say about Ketanji Brown Jackson?\u001b[39m\u001b[39m\"\u001b[39m\n\u001b[0;32m----> 2\u001b[0m output \u001b[39m=\u001b[39m docsearch\u001b[39m.\u001b[39msimilarity_search_with_relevance_scores(query, \u001b[39m4\u001b[39m, Rockset\u001b[39m.\u001b[39mDistanceFunction\u001b[39m.\u001b[39mCOSINE_SIM)\n\u001b[1;32m 4\u001b[0m \u001b[39mprint\u001b[39m(\u001b[39m\"\u001b[39m\u001b[39moutput length:\u001b[39m\u001b[39m\"\u001b[39m, \u001b[39mlen\u001b[39m(output))\n\u001b[1;32m 5\u001b[0m \u001b[39mfor\u001b[39;00m d, dist \u001b[39min\u001b[39;00m output:\n",
|
||||
"\u001b[0;31mNameError\u001b[0m: name 'docsearch' is not defined"
|
||||
]
|
||||
}
|
||||
],
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"query = \"What did the president say about Ketanji Brown Jackson\"\n",
|
||||
"output = docsearch.similarity_search_with_relevance_scores(\n",
|
||||
@@ -313,7 +269,7 @@
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.9.6"
|
||||
"version": "3.10.12"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
|
||||
@@ -35,7 +35,7 @@
|
||||
"\n",
|
||||
" -- Create a table to store your documents\n",
|
||||
" create table documents (\n",
|
||||
" id bigserial primary key,\n",
|
||||
" id uuid primary key,\n",
|
||||
" content text, -- corresponds to Document.pageContent\n",
|
||||
" metadata jsonb, -- corresponds to Document.metadata\n",
|
||||
" embedding vector(1536) -- 1536 works for OpenAI embeddings, change if needed\n",
|
||||
|
||||
195
docs/extras/integrations/vectorstores/usearch.ipynb
Normal file
@@ -0,0 +1,195 @@
|
||||
{
|
||||
"cells": [
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "bb384510-d9b4-4fa1-84c2-f181eb28487d",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# USearch\n",
|
||||
">[USearch](https://unum-cloud.github.io/usearch/) is a Smaller & Faster Single-File Vector Search Engine\n",
|
||||
"\n",
|
||||
"USearch's base functionality is identical to FAISS, and the interface should look familiar if you have ever investigated Approximate Nearest Neigbors search. FAISS is a widely recognized standard for high-performance vector search engines. USearch and FAISS both employ the same HNSW algorithm, but they differ significantly in their design principles. USearch is compact and broadly compatible without sacrificing performance, with a primary focus on user-defined metrics and fewer dependencies."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "497fcd89-e832-46a7-a74a-c71199666206",
|
||||
"metadata": {
|
||||
"tags": []
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"!pip install usearch"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "38237514-b3fa-44a4-9cff-30cd6bf50073",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"We want to use OpenAIEmbeddings so we have to get the OpenAI API Key. "
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 2,
|
||||
"id": "47f9b495-88f1-4286-8d5d-1416103931a7",
|
||||
"metadata": {
|
||||
"tags": []
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"import os\n",
|
||||
"import getpass\n",
|
||||
"\n",
|
||||
"os.environ[\"OPENAI_API_KEY\"] = getpass.getpass(\"OpenAI API Key:\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 2,
|
||||
"id": "aac9563e",
|
||||
"metadata": {
|
||||
"tags": []
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langchain.embeddings.openai import OpenAIEmbeddings\n",
|
||||
"from langchain.text_splitter import CharacterTextSplitter\n",
|
||||
"from langchain.vectorstores import USearch\n",
|
||||
"from langchain.document_loaders import TextLoader"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 3,
|
||||
"id": "a3c3999a",
|
||||
"metadata": {
|
||||
"tags": []
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langchain.document_loaders import TextLoader\n",
|
||||
"\n",
|
||||
"loader = TextLoader(\"../../../extras/modules/state_of_the_union.txt\")\n",
|
||||
"documents = loader.load()\n",
|
||||
"text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)\n",
|
||||
"docs = text_splitter.split_documents(documents)\n",
|
||||
"\n",
|
||||
"embeddings = OpenAIEmbeddings()"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 4,
|
||||
"id": "5eabdb75",
|
||||
"metadata": {
|
||||
"tags": []
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"db = USearch.from_documents(docs, embeddings)\n",
|
||||
"\n",
|
||||
"query = \"What did the president say about Ketanji Brown Jackson\"\n",
|
||||
"docs = db.similarity_search(query)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 5,
|
||||
"id": "4b172de8",
|
||||
"metadata": {
|
||||
"tags": []
|
||||
},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"Tonight. I call on the Senate to: Pass the Freedom to Vote Act. Pass the John Lewis Voting Rights Act. And while you’re at it, pass the Disclose Act so Americans can know who is funding our elections. \n",
|
||||
"\n",
|
||||
"Tonight, I’d like to honor someone who has dedicated his life to serve this country: Justice Stephen Breyer—an Army veteran, Constitutional scholar, and retiring Justice of the United States Supreme Court. Justice Breyer, thank you for your service. \n",
|
||||
"\n",
|
||||
"One of the most serious constitutional responsibilities a President has is nominating someone to serve on the United States Supreme Court. \n",
|
||||
"\n",
|
||||
"And I did that 4 days ago, when I nominated Circuit Court of Appeals Judge Ketanji Brown Jackson. One of our nation’s top legal minds, who will continue Justice Breyer’s legacy of excellence.\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"print(docs[0].page_content)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "f13473b5",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Similarity Search with score\n",
|
||||
"The `similarity_search_with_score` method allows you to return not only the documents but also the distance score of the query to them. The returned distance score is L2 distance. Therefore, a lower score is better."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 6,
|
||||
"id": "186ee1d8",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"docs_and_scores = db.similarity_search_with_score(query)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 7,
|
||||
"id": "284e04b5",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"(Document(page_content='Tonight. I call on the Senate to: Pass the Freedom to Vote Act. Pass the John Lewis Voting Rights Act. And while you’re at it, pass the Disclose Act so Americans can know who is funding our elections. \\n\\nTonight, I’d like to honor someone who has dedicated his life to serve this country: Justice Stephen Breyer—an Army veteran, Constitutional scholar, and retiring Justice of the United States Supreme Court. Justice Breyer, thank you for your service. \\n\\nOne of the most serious constitutional responsibilities a President has is nominating someone to serve on the United States Supreme Court. \\n\\nAnd I did that 4 days ago, when I nominated Circuit Court of Appeals Judge Ketanji Brown Jackson. One of our nation’s top legal minds, who will continue Justice Breyer’s legacy of excellence.', metadata={'source': '../../../extras/modules/state_of_the_union.txt'}),\n",
|
||||
" 0.1845687)"
|
||||
]
|
||||
},
|
||||
"execution_count": 7,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"docs_and_scores[0]"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "483f6013-fb32-4756-a9e2-3d529fb81f68",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": []
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
"kernelspec": {
|
||||
"display_name": "Python 3 (ipykernel)",
|
||||
"language": "python",
|
||||
"name": "python3"
|
||||
},
|
||||
"language_info": {
|
||||
"codemirror_mode": {
|
||||
"name": "ipython",
|
||||
"version": 3
|
||||
},
|
||||
"file_extension": ".py",
|
||||
"mimetype": "text/x-python",
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.9.1"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
"nbformat_minor": 5
|
||||
}
|
||||
@@ -8,7 +8,7 @@
|
||||
"source": [
|
||||
"# Vectara\n",
|
||||
"\n",
|
||||
">[Vectara](https://vectara.com/) is a API platform for building GenAI applications. It provides an easy-to-use API for document indexing and query that is managed by Vectara and is optimized for performance and accuracy. \n",
|
||||
">[Vectara](https://vectara.com/) is a API platform for building GenAI applications. It provides an easy-to-use API for document indexing and querying that is managed by Vectara and is optimized for performance and accuracy. \n",
|
||||
"See the [Vectara API documentation ](https://docs.vectara.com/docs/) for more information on how to use the API.\n",
|
||||
"\n",
|
||||
"This notebook shows how to use functionality related to the `Vectara`'s integration with langchain.\n",
|
||||
|
||||
@@ -81,17 +81,18 @@
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 4,
|
||||
"execution_count": 21,
|
||||
"id": "53b7ce2d-3c09-4d1c-b66b-5769ce6746ae",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"os.environ[\"WEAVIATE_API_KEY\"] = getpass.getpass(\"WEAVIATE_API_KEY:\")"
|
||||
"os.environ[\"WEAVIATE_API_KEY\"] = getpass.getpass(\"WEAVIATE_API_KEY:\")\n",
|
||||
"WEAVIATE_API_KEY = os.environ[\"WEAVIATE_API_KEY\"]"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 5,
|
||||
"execution_count": 7,
|
||||
"id": "aac9563e",
|
||||
"metadata": {
|
||||
"tags": []
|
||||
@@ -106,7 +107,7 @@
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 6,
|
||||
"execution_count": 12,
|
||||
"id": "a3c3999a",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
@@ -123,7 +124,7 @@
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 7,
|
||||
"execution_count": 14,
|
||||
"id": "21e9e528",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
@@ -133,7 +134,7 @@
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 8,
|
||||
"execution_count": 15,
|
||||
"id": "b4170176",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
@@ -144,7 +145,7 @@
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 9,
|
||||
"execution_count": 16,
|
||||
"id": "ecf3b890",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
@@ -166,6 +167,53 @@
|
||||
"print(docs[0].page_content)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"attachments": {},
|
||||
"cell_type": "markdown",
|
||||
"id": "7826d0ea",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Authentication"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "13989a7c",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Weaviate instances have authentication enabled by default. You can use either a username/password combination or API key. "
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 23,
|
||||
"id": "f6604f1d",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"<langchain.vectorstores.weaviate.Weaviate object at 0x107f46550>\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"import weaviate\n",
|
||||
"\n",
|
||||
"client = weaviate.Client(url=WEAVIATE_URL, auth_client_secret=weaviate.AuthApiKey(WEAVIATE_API_KEY))\n",
|
||||
"\n",
|
||||
"# client = weaviate.Client(\n",
|
||||
"# url=WEAVIATE_URL,\n",
|
||||
"# auth_client_secret=weaviate.AuthClientPassword(\n",
|
||||
"# username = \"WCS_USERNAME\", # Replace w/ your WCS username\n",
|
||||
"# password = \"WCS_PASSWORD\", # Replace w/ your WCS password\n",
|
||||
"# ),\n",
|
||||
"# )\n",
|
||||
"\n",
|
||||
"vectorstore = Weaviate.from_documents(documents, embeddings, client=client, by_text=False)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"attachments": {},
|
||||
"cell_type": "markdown",
|
||||
@@ -187,7 +235,7 @@
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 10,
|
||||
"execution_count": 17,
|
||||
"id": "102105a1",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
@@ -213,7 +261,7 @@
|
||||
"id": "8fc3487b",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# Persistance"
|
||||
"# Persistence"
|
||||
]
|
||||
},
|
||||
{
|
||||
@@ -249,7 +297,7 @@
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 11,
|
||||
"execution_count": null,
|
||||
"id": "8b7df7ae",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
@@ -287,7 +335,7 @@
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 12,
|
||||
"execution_count": null,
|
||||
"id": "5e824f3b",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
@@ -298,7 +346,7 @@
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 13,
|
||||
"execution_count": null,
|
||||
"id": "61209cc3",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
@@ -311,7 +359,7 @@
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 14,
|
||||
"execution_count": null,
|
||||
"id": "4abc3d37",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
@@ -327,7 +375,7 @@
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 15,
|
||||
"execution_count": null,
|
||||
"id": "c7062393",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
@@ -339,7 +387,7 @@
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 16,
|
||||
"execution_count": null,
|
||||
"id": "7e41b773",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
|
||||
240
docs/extras/integrations/vectorstores/xata.ipynb
Normal file
@@ -0,0 +1,240 @@
|
||||
{
|
||||
"cells": [
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# Xata\n",
|
||||
"\n",
|
||||
"> [Xata](https://xata.io) is a serverless data platform, based on PostgreSQL. It provides a Python SDK for interacting with your database, and a UI for managing your data.\n",
|
||||
"> Xata has a native vector type, which can be added to any table, and supports similarity search. LangChain inserts vectors directly to Xata, and queries it for the nearest neighbors of a given vector, so that you can use all the LangChain Embeddings integrations with Xata."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"This notebook guides you how to use Xata as a VectorStore."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Setup\n",
|
||||
"\n",
|
||||
"### Create a database to use as a vector store\n",
|
||||
"\n",
|
||||
"In the [Xata UI](https://app.xata.io) create a new database. You can name it whatever you want, in this notepad we'll use `langchain`.\n",
|
||||
"Create a table, again you can name it anything, but we will use `vectors`. Add the following columns via the UI:\n",
|
||||
"\n",
|
||||
"* `content` of type \"Text\". This is used to store the `Document.pageContent` values.\n",
|
||||
"* `embedding` of type \"Vector\". Use the dimension used by the model you plan to use. In this notebook we use OpenAI embeddings, which have 1536 dimensions.\n",
|
||||
"* `search` of type \"Text\". This is used as a metadata column by this example.\n",
|
||||
"* any other columns you want to use as metadata. They are populated from the `Document.metadata` object. For example, if in the `Document.metadata` object you have a `title` property, you can create a `title` column in the table and it will be populated.\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Let's first install our dependencies:"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {
|
||||
"collapsed": false,
|
||||
"jupyter": {
|
||||
"outputs_hidden": false
|
||||
}
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"!pip install xata==1.0.0a7 openai tiktoken langchain"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Let's load the OpenAI key to the environemnt. If you don't have one you can create an OpenAI account and create a key on this [page](https://platform.openai.com/account/api-keys)."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {
|
||||
"collapsed": false,
|
||||
"jupyter": {
|
||||
"outputs_hidden": false
|
||||
}
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"import os\n",
|
||||
"import getpass\n",
|
||||
"\n",
|
||||
"os.environ[\"OPENAI_API_KEY\"] = getpass.getpass(\"OpenAI API Key:\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Similarly, we need to get the environment variables for Xata. You can create a new API key by visiting your [account settings](https://app.xata.io/settings). To find the database URL, go to the Settings page of the database that you have created. The database URL should look something like this: `https://demo-uni3q8.eu-west-1.xata.sh/db/langchain`."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"api_key = getpass.getpass(\"Xata API key: \")\n",
|
||||
"db_url = input(\"Xata database URL (copy it from your DB settings):\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {
|
||||
"collapsed": false,
|
||||
"jupyter": {
|
||||
"outputs_hidden": false
|
||||
}
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langchain.embeddings.openai import OpenAIEmbeddings\n",
|
||||
"from langchain.text_splitter import CharacterTextSplitter\n",
|
||||
"from langchain.document_loaders import TextLoader\n",
|
||||
"from langchain.vectorstores.xata import XataVectorStore\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Create the Xata vector store\n",
|
||||
"Let's import our test dataset:"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {
|
||||
"collapsed": false,
|
||||
"jupyter": {
|
||||
"outputs_hidden": false
|
||||
}
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"loader = TextLoader(\"../../../state_of_the_union.txt\")\n",
|
||||
"documents = loader.load()\n",
|
||||
"text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)\n",
|
||||
"docs = text_splitter.split_documents(documents)\n",
|
||||
"\n",
|
||||
"embeddings = OpenAIEmbeddings()"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Now create the actual vector store, backed by the Xata table."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {
|
||||
"collapsed": false,
|
||||
"jupyter": {
|
||||
"outputs_hidden": false
|
||||
}
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"vector_store = XataVectorStore.from_documents(docs, embeddings, api_key=api_key, db_url=db_url, table_name=\"vectors\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"After running the above command, if you go to the Xata UI, you should see the documents loaded together with their embeddings."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Similarity Search"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {
|
||||
"collapsed": false,
|
||||
"jupyter": {
|
||||
"outputs_hidden": false
|
||||
}
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"query = \"What did the president say about Ketanji Brown Jackson\"\n",
|
||||
"found_docs = vector_store.similarity_search(query)\n",
|
||||
"print(found_docs)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Similarity Search with score (vector distance)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {
|
||||
"collapsed": false,
|
||||
"jupyter": {
|
||||
"outputs_hidden": false
|
||||
}
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"query = \"What did the president say about Ketanji Brown Jackson\"\n",
|
||||
"result = vector_store.similarity_search_with_score(query)\n",
|
||||
"for doc, score in result:\n",
|
||||
" print(f\"document={doc}, score={score}\")"
|
||||
]
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
"kernelspec": {
|
||||
"display_name": "Python 3 (ipykernel)",
|
||||
"language": "python",
|
||||
"name": "python3"
|
||||
},
|
||||
"language_info": {
|
||||
"codemirror_mode": {
|
||||
"name": "ipython",
|
||||
"version": 3
|
||||
},
|
||||
"file_extension": ".py",
|
||||
"mimetype": "text/x-python",
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.10.9"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
"nbformat_minor": 4
|
||||
}
|
||||