Compare commits

182 Commits (abbreviated SHAs):

```text
3f1302a4ab 9cdea4e0e1 98c48f303a 111bd7ddbe ee40d37098 fa0a9e502a
25e3d3f283 2e47412073 ff3aada0b2 ca79044948 beb38f4f4d 1db13e8a85
c58d35765d ed97af423c c4ece52dac 0d058d4046 4cb9f1eda8 1d06eee3b5
2e3d77c34e c871c04270 96f3dff050 4c8106311f b8b8a138df 43f900fd38
b7c409152a b015647e31 b6a7f40ad3 1ff5b67025 275b926cf7 9800c6051c
77e6bbe6f0 2be3515a66 fcf98dc4c1 bae93682f6 b065da6933 87d81b6acc
210296a71f ad7d97670b 7d4843fe84 6d88b23ef7 663b0933e4 1e40427755
85e1c9b348 45bb414be2 6325a3517c 82f3e32d8d af6d333147 3874bb256e
574698a5fb 854f3fe9b1 051fac1e66 5db4dba526 9124221d31 c087ce74f7
ae7714f1ba fbc97a77ed 120c52589b c7b687e944 aab2a7cd4b 5f03cc3511
3dd0704e38 24c1654208 c17a80f11c a673a51efa 224199083b af3f401015
98e1bbfbbd 6f62e5461c b08f903755 f307ca094b 488d2d5da9 a8bbfb2da3
92ef77da35 7f8ff2a317 c5e50c40c9 a08baa97c5 cdb93ab5ca 8effd90be0
f11d845dee 0e1d7a27c6 53722dcfdc 1d4db1327a ee70d4a0cd 9b215e761e
2f848294cb d85c33a5c3 0d92a7f357 931e68692e be29a6287d adc96d60b6
93a84f6182 22525bad65 6e1000dc8d f3c9bf5e4b 6cdd4b5edc 50316f6477
603a0bea29 3f7213586e 5f17c57174 ebcb144342 641fd74baa 2667ddc686
74c28df363 5c3fe8b0d1 2babe3069f e811c5e8c6 8741e55e7c 00c466627a
cc0585af42 b96ac13f3d 9cb2347453 c4d53f98dc 2c2f0e15a6 0ea7224535
1f83b5f47e 6674b33cf5 406a9dc11f 9e067b8cc9 3c4338470e d2137eea9f
9129318466 2e4047e5e7 1dd4236177 4a94f56258 5171c3bcca bd0c6381f5
28d2b213a4 dd648183fa 5eec74d9a5 9d13dcd17c 5debd5043e 9b615022e2
92b4418c8c 7d29bb2c02 21a353e9c2 d2cf0d16b3 04cddfba0d bcab894f4e
490f4a9ff0 7ffc431b3a 50a9fcccb0 a5fd8873b1 dfc3f83b0f c7f7788d0b
8f8e8d701e 560c4dfc98 f5bd88757e ea9c3cc9c9 5da9f9abcb 2eb4a2ceea
e7420789e4 26c86a197c 1d649b127e 362bc301df a1603fccfb 4ba7396f96
633b673b85 4d697d3f24 612a74eb7e 4789c99bc2 fb6e63dc36 c5edbea34a
1ac347b4e3 705d2f5b92 ec033ae277 da5b0723d2 184ede4e48 7cdf97ba9b
4d427b2397 2179d4eef8 df746ad821 c9a0f24646 34a2755a54 4e7d0c115b
01dca1e438 1ac6deda89 4e180dc54e 3ce4e46c8c b489466488 38ca5c84cb
49b2b0e3c0 a2830e3056
```
.github/CONTRIBUTING.md (66 changes, vendored)

````diff
@@ -95,6 +95,14 @@ To run formatting for this project:
 make format
 ```
 
+Additionally, you can run the formatter only on the files that have been modified in your current branch as compared to the master branch using the format_diff command:
+
+```bash
+make format_diff
+```
+
+This is especially useful when you have made changes to a subset of the project and want to ensure your changes are properly formatted without affecting the rest of the codebase.
+
 ### Linting
 
 Linting for this project is done via a combination of [Black](https://black.readthedocs.io/en/stable/), [isort](https://pycqa.github.io/isort/), [flake8](https://flake8.pycqa.org/en/latest/), and [mypy](http://mypy-lang.org/).
@@ -105,8 +113,42 @@ To run linting for this project:
 make lint
 ```
 
+In addition, you can run the linter only on the files that have been modified in your current branch as compared to the master branch using the lint_diff command:
+
+```bash
+make lint_diff
+```
+
+This can be very helpful when you've made changes to only certain parts of the project and want to ensure your changes meet the linting standards without having to check the entire codebase.
+
 We recognize linting can be annoying - if you do not want to do it, please contact a project maintainer, and they can help you with it. We do not want this to be a blocker for good code getting contributed.
 
+### Spellcheck
+
+Spellchecking for this project is done via [codespell](https://github.com/codespell-project/codespell).
+Note that `codespell` finds common typos, so it can produce false positives (correctly spelled but rarely used words) and false negatives (misspellings it does not find).
+
+To check spelling for this project:
+
+```bash
+make spell_check
+```
+
+To fix spelling in place:
+
+```bash
+make spell_fix
+```
+
+If codespell is incorrectly flagging a word, you can skip spellcheck for that word by adding it to the codespell config in the `pyproject.toml` file.
+
+```python
+[tool.codespell]
+...
+# Add here:
+ignore-words-list = 'momento,collison,ned,foor,reworkd,parth,whats,aapply,mysogyny,unsecure'
+```
+
 ### Coverage
 
 Code coverage (i.e. the amount of code that is covered by unit tests) helps identify areas of the code that are potentially more or less brittle.
@@ -208,30 +250,38 @@ When you run `poetry install`, the `langchain` package is installed as editable
 
 ### Contribute Documentation
 
-Docs are largely autogenerated by [sphinx](https://www.sphinx-doc.org/en/master/) from the code.
+The docs directory contains Documentation and API Reference.
+
+Documentation is built using [Docusaurus 2](https://docusaurus.io/).
+
+The API Reference is largely autogenerated by [sphinx](https://www.sphinx-doc.org/en/master/) from the code.
 For that reason, we ask that you add good documentation to all classes and methods.
 
 Similar to linting, we recognize documentation can be annoying. If you do not want to do it, please contact a project maintainer, and they can help you with it. We do not want this to be a blocker for good code getting contributed.
 
 ### Build Documentation Locally
 
+In the following commands, the prefix `api_` indicates that those are operations for the API Reference.
+
 Before building the documentation, it is always a good idea to clean the build directory:
 
 ```bash
 make docs_clean
+make api_docs_clean
 ```
 
-Next, you can run the linkchecker to make sure all links are valid:
-
-```bash
-make docs_linkcheck
-```
-
-Finally, you can build the documentation as outlined below:
+Next, you can build the documentation as outlined below:
 
 ```bash
 make docs_build
+make api_docs_build
 ```
 
+Finally, you can run the linkchecker to make sure all links are valid:
+
+```bash
+make docs_linkcheck
+make api_docs_linkcheck
+```
+
 ## 🏭 Release Process
````
.github/workflows/codespell.yml (22 changes, vendored, new file)

```diff
@@ -0,0 +1,22 @@
+---
+name: Codespell
+
+on:
+  push:
+    branches: [master]
+  pull_request:
+    branches: [master]
+
+permissions:
+  contents: read
+
+jobs:
+  codespell:
+    name: Check for spelling errors
+    runs-on: ubuntu-latest
+
+    steps:
+      - name: Checkout
+        uses: actions/checkout@v3
+      - name: Codespell
+        uses: codespell-project/actions-codespell@v2
```
.gitignore (5 changes, vendored)

```diff
@@ -161,7 +161,12 @@ docs/node_modules/
 docs/.docusaurus/
 docs/.cache-loader/
+docs/_dist
 docs/api_reference/api_reference.rst
 docs/api_reference/_build
+docs/api_reference/*/
+!docs/api_reference/_static/
+!docs/api_reference/templates/
+!docs/api_reference/themes/
 docs/docs_skeleton/build
 docs/docs_skeleton/node_modules
 docs/docs_skeleton/yarn.lock
```
Makefile (69 changes)

```diff
@@ -1,40 +1,47 @@
-.PHONY: all clean format lint test tests test_watch integration_tests docker_tests help extended_tests
+.PHONY: all clean docs_build docs_clean docs_linkcheck api_docs_build api_docs_clean api_docs_linkcheck format lint test tests test_watch integration_tests docker_tests help extended_tests
 
 # Default target executed when no arguments are given to make.
 all: help
 
+######################
+# TESTING AND COVERAGE
+######################
+
 # Run unit tests and generate a coverage report.
 coverage:
 	poetry run pytest --cov \
 		--cov-config=.coveragerc \
 		--cov-report xml \
 		--cov-report term-missing:skip-covered
 
-clean: docs_clean
+######################
+# DOCUMENTATION
+######################
+
+clean: docs_clean api_docs_clean
 
 docs_compile:
 	poetry run nbdoc_build --srcdir $(srcdir)
 
 docs_build:
-	cd docs && poetry run make html
+	docs/.local_build.sh
 
 docs_clean:
-	cd docs && poetry run make clean
+	rm -r docs/_dist
 
 docs_linkcheck:
-	poetry run linkchecker docs/_build/html/index.html
+	poetry run linkchecker docs/_dist/docs_skeleton/ --ignore-url node_modules
 
-format:
-	poetry run black .
-	poetry run ruff --select I --fix .
+api_docs_build:
+	poetry run python docs/api_reference/create_api_rst.py
+	cd docs/api_reference && poetry run make html
 
-PYTHON_FILES=.
-lint: PYTHON_FILES=.
-lint_diff: PYTHON_FILES=$(shell git diff --name-only --diff-filter=d master | grep -E '\.py$$')
+api_docs_clean:
+	rm -f docs/api_reference/api_reference.rst
+	cd docs/api_reference && poetry run make clean
 
-lint lint_diff:
-	poetry run mypy $(PYTHON_FILES)
-	poetry run black $(PYTHON_FILES) --check
-	poetry run ruff .
+api_docs_linkcheck:
+	poetry run linkchecker docs/api_reference/_build/html/index.html
 
 # Define a variable for the test file path.
 TEST_FILE ?= tests/unit_tests/
 
 test:
@@ -56,6 +63,34 @@ docker_tests:
 	docker build -t my-langchain-image:test .
 	docker run --rm my-langchain-image:test
 
+######################
+# LINTING AND FORMATTING
+######################
+
+# Define a variable for Python and notebook files.
+PYTHON_FILES=.
+lint format: PYTHON_FILES=.
+lint_diff format_diff: PYTHON_FILES=$(shell git diff --name-only --diff-filter=d master | grep -E '\.py$$|\.ipynb$$')
+
+lint lint_diff:
+	poetry run mypy $(PYTHON_FILES)
+	poetry run black $(PYTHON_FILES) --check
+	poetry run ruff .
+
+format format_diff:
+	poetry run black $(PYTHON_FILES)
+	poetry run ruff --select I --fix $(PYTHON_FILES)
+
+spell_check:
+	poetry run codespell --toml pyproject.toml
+
+spell_fix:
+	poetry run codespell --toml pyproject.toml -w
+
+######################
+# HELP
+######################
+
 help:
 	@echo '----'
 	@echo 'coverage - run unit tests and generate coverage report'
```
```diff
@@ -25,7 +25,7 @@ Please fill out [this form](https://forms.gle/57d8AmXBYp8PP8tZA) and we'll set u
 
 `pip install langchain`
 or
-`conda install langchain -c conda-forge`
+`pip install langsmith && conda install langchain -c conda-forge`
 
 ## 🤔 What is this?
```
docs/.local_build.sh

```diff
@@ -1,10 +1,15 @@
-mkdir _dist
+#!/usr/bin/env bash
+
+set -o errexit
+set -o nounset
+set -o pipefail
+set -o xtrace
+
+SCRIPT_DIR="$(cd "$(dirname "$0")"; pwd)"
+cd "${SCRIPT_DIR}"
+
+mkdir -p _dist/docs_skeleton
 cp -r {docs_skeleton,snippets} _dist
-mkdir -p _dist/docs_skeleton/static/api_reference
-cd api_reference
-poetry run make html
-cp -r _build/* ../_dist/docs_skeleton/static/api_reference
-cd ..
 cp -r extras/* _dist/docs_skeleton/docs
 cd _dist/docs_skeleton
 poetry run nbdoc_build
```
docs/api_reference/create_api_rst.py

```diff
@@ -20,7 +20,9 @@ def load_members() -> dict:
             cls = re.findall(r"^class ([^_].*)\(", line)
             members[top_level]["classes"].extend([module + "." + c for c in cls])
             func = re.findall(r"^def ([^_].*)\(", line)
-            members[top_level]["functions"].extend([module + "." + f for f in func])
+            afunc = re.findall(r"^async def ([^_].*)\(", line)
+            func_strings = [module + "." + f for f in func + afunc]
+            members[top_level]["functions"].extend(func_strings)
     return members
```
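The added pattern extends member discovery to public coroutines. A quick sketch of what the two regexes match (the example line is hypothetical):

```python
import re

line = "async def aget_relevant_documents(query):"

# The plain-`def` pattern does not match, so coroutines used to be skipped:
print(re.findall(r"^def ([^_].*)\(", line))        # []
# The new pattern captures the public coroutine name as well:
print(re.findall(r"^async def ([^_].*)\(", line))  # ['aget_relevant_documents']
```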
BIN image file changed (157 KiB before, 157 KiB after)
```diff
@@ -3,6 +3,8 @@ sidebar_position: 0
 ---
 # Integrations
 
+Visit the [Integrations Hub](https://integrations.langchain.com) to further explore, upvote and request integrations across key LangChain components.
+
 import DocCardList from "@theme/DocCardList";
 
 <DocCardList />
```
docs/docs_skeleton/docs/guides/langsmith/index.md (12 changes, new file)

```diff
@@ -0,0 +1,12 @@
+# LangSmith
+
+import DocCardList from "@theme/DocCardList";
+
+LangSmith helps you trace and evaluate your language model applications and intelligent agents to help you
+move from prototype to production.
+
+Check out the [interactive walkthrough](walkthrough) below to get started.
+
+For more information, please refer to the [LangSmith documentation](https://docs.smith.langchain.com/)
+
+<DocCardList />
```
```diff
@@ -24,7 +24,7 @@ That means there are two different axes along which you can customize your text
 1. How the text is split
 2. How the chunk size is measured
 
-## Get started with text splitters
+### Get started with text splitters
 
 import GetStarted from "@snippets/modules/data_connection/document_transformers/get_started.mdx"
 
```
```diff
@@ -1 +1,2 @@
 label: 'Text splitters'
+position: 0
```
```diff
@@ -8,7 +8,7 @@ Many LLM applications require user-specific data that is not part of the model'
 building blocks to load, transform, store and query your data via:
 
 - [Document loaders](/docs/modules/data_connection/document_loaders/): Load documents from many different sources
-- [Document transformers](/docs/modules/data_connection/document_transformers/): Split documents, drop redundant documents, and more
+- [Document transformers](/docs/modules/data_connection/document_transformers/): Split documents, convert documents into Q&A format, drop redundant documents, and more
 - [Text embedding models](/docs/modules/data_connection/text_embedding/): Take unstructured text and turn it into a list of floating point numbers
 - [Vector stores](/docs/modules/data_connection/vectorstores/): Store and search over embedded data
 - [Retrievers](/docs/modules/data_connection/retrievers/): Query your data
```
```diff
@@ -15,3 +15,11 @@ This walkthrough showcases basic functionality related to VectorStores. A key p
 import GetStarted from "@snippets/modules/data_connection/vectorstores/get_started.mdx"
 
 <GetStarted/>
+
+## Asynchronous operations
+
+Vector stores are usually run as a separate service that requires some IO operations, and therefore they might be called asynchronously. That gives performance benefits as you don't waste time waiting for responses from external services. That might also be important if you work with an asynchronous framework, such as [FastAPI](https://fastapi.tiangolo.com/).
+
+import AsyncVectorStore from "@snippets/modules/data_connection/vectorstores/async.mdx"
+
+<AsyncVectorStore/>
```
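To make the new async section concrete: a minimal sketch of the async vector store API, assuming a store with async support (Qdrant here) reachable at localhost:6333; `FakeEmbeddings` stands in for a real embedding model:

```python
import asyncio

from langchain.embeddings import FakeEmbeddings
from langchain.vectorstores import Qdrant


async def main() -> None:
    # Index a document and query it without blocking the event loop.
    db = await Qdrant.afrom_texts(
        ["harrison worked at kensho"],
        FakeEmbeddings(size=128),      # stand-in embeddings for the sketch
        url="http://localhost:6333",   # assumes a local Qdrant instance
        collection_name="demo",
    )
    docs = await db.asimilarity_search("where did harrison work?")
    print(docs)


asyncio.run(main())
```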
docs/docs_skeleton/package-lock.json (8 changes, generated)

```diff
@@ -12,7 +12,7 @@
         "@docusaurus/preset-classic": "2.4.0",
         "@docusaurus/remark-plugin-npm2yarn": "^2.4.0",
         "@mdx-js/react": "^1.6.22",
-        "@mendable/search": "^0.0.112-beta.7",
+        "@mendable/search": "^0.0.122",
         "clsx": "^1.2.1",
         "json-loader": "^0.5.7",
         "process": "^0.11.10",
@@ -3206,9 +3206,9 @@
       }
     },
     "node_modules/@mendable/search": {
-      "version": "0.0.112-beta.7",
-      "resolved": "https://registry.npmjs.org/@mendable/search/-/search-0.0.112-beta.7.tgz",
-      "integrity": "sha512-OdkwNprtDhwjnlc/78+6cUtDormgHmfT5bE0/FAFKAfN4bZGqea9aQwvLA/TlcuUvXilkiDPufDPDTkmShkd4g==",
+      "version": "0.0.122",
+      "resolved": "https://registry.npmjs.org/@mendable/search/-/search-0.0.122.tgz",
+      "integrity": "sha512-cfbrA7XJmgY7zZyId+GDA+PQj89yjJjHGbSZiwMk6pmKO5bZ16pV16j7Y795uMUWGfQ0/g/T3beJhb02USVHCg==",
       "dependencies": {
         "posthog-js": "^1.45.1"
       },
```

The same version bump in the package manifest:

```diff
@@ -23,7 +23,7 @@
     "@docusaurus/preset-classic": "2.4.0",
     "@docusaurus/remark-plugin-npm2yarn": "^2.4.0",
     "@mdx-js/react": "^1.6.22",
-    "@mendable/search": "^0.0.112-beta.7",
+    "@mendable/search": "^0.0.122",
     "clsx": "^1.2.1",
     "json-loader": "^0.5.7",
     "process": "^0.11.10",
```
```diff
@@ -22,6 +22,7 @@ export default function SearchBarWrapper() {
         placeholder="Search..."
         dialogPlaceholder="How do I use a LLM Chain?"
         messageSettings={{ openSourcesInNewTab: false, prettySources: true }}
+        isPinnable
         showSimpleSearch
       />
     </div>
```
- BIN docs/docs_skeleton/static/img/cpal_diagram.png (new file; 116 KiB)
- BIN docs/docs_skeleton/static/img/qa_data_load.png (new file; 237 KiB)
- BIN docs/docs_skeleton/static/img/qa_flow.jpeg (new file; 173 KiB)
- BIN docs/docs_skeleton/static/img/qa_intro.png (new file; 164 KiB)
- BIN docs/docs_skeleton/static/img/summary_chains.png (new file; 118 KiB)
```diff
@@ -138,7 +138,11 @@
     },
     {
       "source": "/en/latest/integrations/databerry.html",
-      "destination": "/docs/ecosystem/integrations/databerry"
+      "destination": "/docs/ecosystem/integrations/chaindesk"
+    },
+    {
+      "source": "/docs/ecosystem/integrations/databerry",
+      "destination": "/docs/ecosystem/integrations/chaindesk"
     },
     {
       "source": "/en/latest/integrations/databricks/databricks.html",
@@ -1296,6 +1300,10 @@
       "source": "/en/latest/modules/indexes/text_splitters/examples/markdown_header_metadata.html",
       "destination": "/docs/modules/data_connection/document_transformers/text_splitters/markdown_header_metadata"
     },
+    {
+      "source": "/en/latest/modules/indexes/text_splitters.html",
+      "destination": "/docs/modules/data_connection/document_transformers/"
+    },
     {
       "source": "/en/latest/modules/indexes/retrievers/examples/chroma_self_query.html",
       "destination": "/docs/modules/data_connection/retrievers/how_to/self_query/chroma_self_query"
@@ -1330,7 +1338,11 @@
     },
     {
       "source": "/en/latest/modules/indexes/retrievers/examples/databerry.html",
-      "destination": "/docs/modules/data_connection/retrievers/integrations/databerry"
+      "destination": "/docs/modules/data_connection/retrievers/integrations/chaindesk"
+    },
+    {
+      "source": "/docs/modules/data_connection/retrievers/integrations/databerry",
+      "destination": "/docs/modules/data_connection/retrievers/integrations/chaindesk"
     },
     {
       "source": "/en/latest/modules/indexes/retrievers/examples/elastic_search_bm25.html",
@@ -2125,4 +2137,4 @@
       "destination": "/docs/:path*"
     }
   ]
-}
+}
```
```diff
@@ -2,188 +2,261 @@
 
 Dependents stats for `hwchase17/langchain`
 
-[![](…)](https://github.com/hwchase17/langchain/network/dependents)
-[![](…&message=172&color=informational&logo=slickpic)](https://github.com/hwchase17/langchain/network/dependents)
-[![](…&message=4980&color=informational&logo=slickpic)](https://github.com/hwchase17/langchain/network/dependents)
-[![](…&message=17239&color=informational&logo=slickpic)](https://github.com/hwchase17/langchain/network/dependents)
+[![](…)](https://github.com/hwchase17/langchain/network/dependents)
+[![](…&message=244&color=informational&logo=slickpic)](https://github.com/hwchase17/langchain/network/dependents)
+[![](…&message=9697&color=informational&logo=slickpic)](https://github.com/hwchase17/langchain/network/dependents)
+[![](…&message=19827&color=informational&logo=slickpic)](https://github.com/hwchase17/langchain/network/dependents)
 
-[update: 2023-05-17; only dependent repositories with Stars > 100]
+[update: 2023-07-07; only dependent repositories with Stars > 100]
 
 | Repository | Stars |
 | :-------- | -----: |
-|[openai/openai-cookbook](https://github.com/openai/openai-cookbook) | 35401 |
-|[LAION-AI/Open-Assistant](https://github.com/LAION-AI/Open-Assistant) | 32861 |
-|[microsoft/TaskMatrix](https://github.com/microsoft/TaskMatrix) | 32766 |
-|[hpcaitech/ColossalAI](https://github.com/hpcaitech/ColossalAI) | 29560 |
-|[reworkd/AgentGPT](https://github.com/reworkd/AgentGPT) | 22315 |
-|[imartinez/privateGPT](https://github.com/imartinez/privateGPT) | 17474 |
-|[openai/chatgpt-retrieval-plugin](https://github.com/openai/chatgpt-retrieval-plugin) | 16923 |
-|[mindsdb/mindsdb](https://github.com/mindsdb/mindsdb) | 16112 |
-|[jerryjliu/llama_index](https://github.com/jerryjliu/llama_index) | 15407 |
-|[mlflow/mlflow](https://github.com/mlflow/mlflow) | 14345 |
-|[GaiZhenbiao/ChuanhuChatGPT](https://github.com/GaiZhenbiao/ChuanhuChatGPT) | 10372 |
-|[databrickslabs/dolly](https://github.com/databrickslabs/dolly) | 9919 |
-|[AIGC-Audio/AudioGPT](https://github.com/AIGC-Audio/AudioGPT) | 8177 |
-|[logspace-ai/langflow](https://github.com/logspace-ai/langflow) | 6807 |
-|[imClumsyPanda/langchain-ChatGLM](https://github.com/imClumsyPanda/langchain-ChatGLM) | 6087 |
-|[arc53/DocsGPT](https://github.com/arc53/DocsGPT) | 5292 |
-|[e2b-dev/e2b](https://github.com/e2b-dev/e2b) | 4622 |
-|[nsarrazin/serge](https://github.com/nsarrazin/serge) | 4076 |
-|[madawei2699/myGPTReader](https://github.com/madawei2699/myGPTReader) | 3952 |
-|[zauberzeug/nicegui](https://github.com/zauberzeug/nicegui) | 3952 |
-|[go-skynet/LocalAI](https://github.com/go-skynet/LocalAI) | 3762 |
-|[GreyDGL/PentestGPT](https://github.com/GreyDGL/PentestGPT) | 3388 |
-|[mmabrouk/chatgpt-wrapper](https://github.com/mmabrouk/chatgpt-wrapper) | 3243 |
-|[zilliztech/GPTCache](https://github.com/zilliztech/GPTCache) | 3189 |
-|[wenda-LLM/wenda](https://github.com/wenda-LLM/wenda) | 3050 |
-|[marqo-ai/marqo](https://github.com/marqo-ai/marqo) | 2930 |
-|[gkamradt/langchain-tutorials](https://github.com/gkamradt/langchain-tutorials) | 2710 |
-|[PrefectHQ/marvin](https://github.com/PrefectHQ/marvin) | 2545 |
-|[project-baize/baize-chatbot](https://github.com/project-baize/baize-chatbot) | 2479 |
-|[whitead/paper-qa](https://github.com/whitead/paper-qa) | 2399 |
-|[langgenius/dify](https://github.com/langgenius/dify) | 2344 |
-|[GerevAI/gerev](https://github.com/GerevAI/gerev) | 2283 |
-|[hwchase17/chat-langchain](https://github.com/hwchase17/chat-langchain) | 2266 |
-|[guangzhengli/ChatFiles](https://github.com/guangzhengli/ChatFiles) | 1903 |
-|[Azure-Samples/azure-search-openai-demo](https://github.com/Azure-Samples/azure-search-openai-demo) | 1884 |
-|[OpenBMB/BMTools](https://github.com/OpenBMB/BMTools) | 1860 |
-|[Farama-Foundation/PettingZoo](https://github.com/Farama-Foundation/PettingZoo) | 1813 |
-|[OpenGVLab/Ask-Anything](https://github.com/OpenGVLab/Ask-Anything) | 1571 |
-|[IntelligenzaArtificiale/Free-Auto-GPT](https://github.com/IntelligenzaArtificiale/Free-Auto-GPT) | 1480 |
-|[hwchase17/notion-qa](https://github.com/hwchase17/notion-qa) | 1464 |
-|[NVIDIA/NeMo-Guardrails](https://github.com/NVIDIA/NeMo-Guardrails) | 1419 |
-|[Unstructured-IO/unstructured](https://github.com/Unstructured-IO/unstructured) | 1410 |
-|[Kav-K/GPTDiscord](https://github.com/Kav-K/GPTDiscord) | 1363 |
-|[paulpierre/RasaGPT](https://github.com/paulpierre/RasaGPT) | 1344 |
-|[StanGirard/quivr](https://github.com/StanGirard/quivr) | 1330 |
-|[lunasec-io/lunasec](https://github.com/lunasec-io/lunasec) | 1318 |
-|[vocodedev/vocode-python](https://github.com/vocodedev/vocode-python) | 1286 |
-|[agiresearch/OpenAGI](https://github.com/agiresearch/OpenAGI) | 1156 |
-|[h2oai/h2ogpt](https://github.com/h2oai/h2ogpt) | 1141 |
-|[jina-ai/thinkgpt](https://github.com/jina-ai/thinkgpt) | 1106 |
-|[yanqiangmiffy/Chinese-LangChain](https://github.com/yanqiangmiffy/Chinese-LangChain) | 1072 |
-|[ttengwang/Caption-Anything](https://github.com/ttengwang/Caption-Anything) | 1064 |
-|[jina-ai/dev-gpt](https://github.com/jina-ai/dev-gpt) | 1057 |
-|[juncongmoo/chatllama](https://github.com/juncongmoo/chatllama) | 1003 |
-|[greshake/llm-security](https://github.com/greshake/llm-security) | 1002 |
-|[visual-openllm/visual-openllm](https://github.com/visual-openllm/visual-openllm) | 957 |
-|[richardyc/Chrome-GPT](https://github.com/richardyc/Chrome-GPT) | 918 |
-|[irgolic/AutoPR](https://github.com/irgolic/AutoPR) | 886 |
-|[mmz-001/knowledge_gpt](https://github.com/mmz-001/knowledge_gpt) | 867 |
-|[thomas-yanxin/LangChain-ChatGLM-Webui](https://github.com/thomas-yanxin/LangChain-ChatGLM-Webui) | 850 |
-|[microsoft/X-Decoder](https://github.com/microsoft/X-Decoder) | 837 |
-|[peterw/Chat-with-Github-Repo](https://github.com/peterw/Chat-with-Github-Repo) | 826 |
-|[cirediatpl/FigmaChain](https://github.com/cirediatpl/FigmaChain) | 782 |
-|[hashintel/hash](https://github.com/hashintel/hash) | 778 |
-|[seanpixel/Teenage-AGI](https://github.com/seanpixel/Teenage-AGI) | 773 |
-|[jina-ai/langchain-serve](https://github.com/jina-ai/langchain-serve) | 738 |
-|[corca-ai/EVAL](https://github.com/corca-ai/EVAL) | 737 |
-|[ai-sidekick/sidekick](https://github.com/ai-sidekick/sidekick) | 717 |
-|[rlancemartin/auto-evaluator](https://github.com/rlancemartin/auto-evaluator) | 703 |
-|[poe-platform/api-bot-tutorial](https://github.com/poe-platform/api-bot-tutorial) | 689 |
-|[SamurAIGPT/Camel-AutoGPT](https://github.com/SamurAIGPT/Camel-AutoGPT) | 666 |
-|[eyurtsev/kor](https://github.com/eyurtsev/kor) | 608 |
-|[run-llama/llama-lab](https://github.com/run-llama/llama-lab) | 559 |
-|[namuan/dr-doc-search](https://github.com/namuan/dr-doc-search) | 544 |
-|[pieroit/cheshire-cat](https://github.com/pieroit/cheshire-cat) | 520 |
-|[griptape-ai/griptape](https://github.com/griptape-ai/griptape) | 514 |
-|[getmetal/motorhead](https://github.com/getmetal/motorhead) | 481 |
-|[hwchase17/chat-your-data](https://github.com/hwchase17/chat-your-data) | 462 |
-|[langchain-ai/langchain-aiplugin](https://github.com/langchain-ai/langchain-aiplugin) | 452 |
-|[jina-ai/agentchain](https://github.com/jina-ai/agentchain) | 439 |
-|[SamurAIGPT/ChatGPT-Developer-Plugins](https://github.com/SamurAIGPT/ChatGPT-Developer-Plugins) | 437 |
-|[alexanderatallah/window.ai](https://github.com/alexanderatallah/window.ai) | 433 |
-|[michaelthwan/searchGPT](https://github.com/michaelthwan/searchGPT) | 427 |
-|[mpaepper/content-chatbot](https://github.com/mpaepper/content-chatbot) | 425 |
-|[mckaywrigley/repo-chat](https://github.com/mckaywrigley/repo-chat) | 422 |
-|[whyiyhw/chatgpt-wechat](https://github.com/whyiyhw/chatgpt-wechat) | 421 |
-|[freddyaboulton/gradio-tools](https://github.com/freddyaboulton/gradio-tools) | 407 |
-|[jonra1993/fastapi-alembic-sqlmodel-async](https://github.com/jonra1993/fastapi-alembic-sqlmodel-async) | 395 |
-|[yeagerai/yeagerai-agent](https://github.com/yeagerai/yeagerai-agent) | 383 |
-|[akshata29/chatpdf](https://github.com/akshata29/chatpdf) | 374 |
-|[OpenGVLab/InternGPT](https://github.com/OpenGVLab/InternGPT) | 368 |
-|[ruoccofabrizio/azure-open-ai-embeddings-qna](https://github.com/ruoccofabrizio/azure-open-ai-embeddings-qna) | 358 |
-|[101dotxyz/GPTeam](https://github.com/101dotxyz/GPTeam) | 357 |
-|[mtenenholtz/chat-twitter](https://github.com/mtenenholtz/chat-twitter) | 354 |
-|[amosjyng/langchain-visualizer](https://github.com/amosjyng/langchain-visualizer) | 343 |
-|[msoedov/langcorn](https://github.com/msoedov/langcorn) | 334 |
-|[showlab/VLog](https://github.com/showlab/VLog) | 330 |
-|[continuum-llms/chatgpt-memory](https://github.com/continuum-llms/chatgpt-memory) | 324 |
-|[steamship-core/steamship-langchain](https://github.com/steamship-core/steamship-langchain) | 323 |
-|[daodao97/chatdoc](https://github.com/daodao97/chatdoc) | 320 |
-|[xuwenhao/geektime-ai-course](https://github.com/xuwenhao/geektime-ai-course) | 308 |
-|[StevenGrove/GPT4Tools](https://github.com/StevenGrove/GPT4Tools) | 301 |
-|[logan-markewich/llama_index_starter_pack](https://github.com/logan-markewich/llama_index_starter_pack) | 300 |
-|[andylokandy/gpt-4-search](https://github.com/andylokandy/gpt-4-search) | 299 |
-|[Anil-matcha/ChatPDF](https://github.com/Anil-matcha/ChatPDF) | 287 |
-|[itamargol/openai](https://github.com/itamargol/openai) | 273 |
-|[BlackHC/llm-strategy](https://github.com/BlackHC/llm-strategy) | 267 |
-|[momegas/megabots](https://github.com/momegas/megabots) | 259 |
-|[bborn/howdoi.ai](https://github.com/bborn/howdoi.ai) | 238 |
-|[Cheems-Seminar/grounded-segment-any-parts](https://github.com/Cheems-Seminar/grounded-segment-any-parts) | 232 |
-|[ur-whitelab/exmol](https://github.com/ur-whitelab/exmol) | 227 |
-|[sullivan-sean/chat-langchainjs](https://github.com/sullivan-sean/chat-langchainjs) | 227 |
-|[explosion/spacy-llm](https://github.com/explosion/spacy-llm) | 226 |
-|[recalign/RecAlign](https://github.com/recalign/RecAlign) | 218 |
-|[jupyterlab/jupyter-ai](https://github.com/jupyterlab/jupyter-ai) | 218 |
-|[alvarosevilla95/autolang](https://github.com/alvarosevilla95/autolang) | 215 |
-|[conceptofmind/toolformer](https://github.com/conceptofmind/toolformer) | 213 |
-|[MagnivOrg/prompt-layer-library](https://github.com/MagnivOrg/prompt-layer-library) | 209 |
-|[JohnSnowLabs/nlptest](https://github.com/JohnSnowLabs/nlptest) | 208 |
-|[airobotlab/KoChatGPT](https://github.com/airobotlab/KoChatGPT) | 197 |
-|[langchain-ai/auto-evaluator](https://github.com/langchain-ai/auto-evaluator) | 195 |
-|[yvann-hub/Robby-chatbot](https://github.com/yvann-hub/Robby-chatbot) | 195 |
-|[alejandro-ao/langchain-ask-pdf](https://github.com/alejandro-ao/langchain-ask-pdf) | 192 |
-|[daveebbelaar/langchain-experiments](https://github.com/daveebbelaar/langchain-experiments) | 189 |
-|[NimbleBoxAI/ChainFury](https://github.com/NimbleBoxAI/ChainFury) | 187 |
-|[kaleido-lab/dolphin](https://github.com/kaleido-lab/dolphin) | 184 |
-|[Anil-matcha/Website-to-Chatbot](https://github.com/Anil-matcha/Website-to-Chatbot) | 183 |
-|[plchld/InsightFlow](https://github.com/plchld/InsightFlow) | 180 |
-|[OpenBMB/AgentVerse](https://github.com/OpenBMB/AgentVerse) | 166 |
-|[benthecoder/ClassGPT](https://github.com/benthecoder/ClassGPT) | 166 |
-|[jbrukh/gpt-jargon](https://github.com/jbrukh/gpt-jargon) | 161 |
-|[hardbyte/qabot](https://github.com/hardbyte/qabot) | 160 |
-|[shaman-ai/agent-actors](https://github.com/shaman-ai/agent-actors) | 153 |
-|[radi-cho/datasetGPT](https://github.com/radi-cho/datasetGPT) | 153 |
-|[poe-platform/poe-protocol](https://github.com/poe-platform/poe-protocol) | 152 |
-|[paolorechia/learn-langchain](https://github.com/paolorechia/learn-langchain) | 149 |
-|[ajndkr/lanarky](https://github.com/ajndkr/lanarky) | 149 |
-|[fengyuli-dev/multimedia-gpt](https://github.com/fengyuli-dev/multimedia-gpt) | 147 |
-|[yasyf/compress-gpt](https://github.com/yasyf/compress-gpt) | 144 |
-|[homanp/superagent](https://github.com/homanp/superagent) | 143 |
-|[realminchoi/babyagi-ui](https://github.com/realminchoi/babyagi-ui) | 141 |
-|[ethanyanjiali/minChatGPT](https://github.com/ethanyanjiali/minChatGPT) | 141 |
-|[ccurme/yolopandas](https://github.com/ccurme/yolopandas) | 139 |
-|[hwchase17/langchain-streamlit-template](https://github.com/hwchase17/langchain-streamlit-template) | 138 |
-|[Jaseci-Labs/jaseci](https://github.com/Jaseci-Labs/jaseci) | 136 |
-|[hirokidaichi/wanna](https://github.com/hirokidaichi/wanna) | 135 |
-|[Haste171/langchain-chatbot](https://github.com/Haste171/langchain-chatbot) | 134 |
-|[jmpaz/promptlib](https://github.com/jmpaz/promptlib) | 130 |
-|[Klingefjord/chatgpt-telegram](https://github.com/Klingefjord/chatgpt-telegram) | 130 |
-|[filip-michalsky/SalesGPT](https://github.com/filip-michalsky/SalesGPT) | 128 |
-|[handrew/browserpilot](https://github.com/handrew/browserpilot) | 128 |
-|[shauryr/S2QA](https://github.com/shauryr/S2QA) | 127 |
-|[steamship-core/vercel-examples](https://github.com/steamship-core/vercel-examples) | 127 |
-|[yasyf/summ](https://github.com/yasyf/summ) | 127 |
-|[gia-guar/JARVIS-ChatGPT](https://github.com/gia-guar/JARVIS-ChatGPT) | 126 |
-|[jerlendds/osintbuddy](https://github.com/jerlendds/osintbuddy) | 125 |
-|[ibiscp/LLM-IMDB](https://github.com/ibiscp/LLM-IMDB) | 124 |
-|[Teahouse-Studios/akari-bot](https://github.com/Teahouse-Studios/akari-bot) | 124 |
-|[hwchase17/chroma-langchain](https://github.com/hwchase17/chroma-langchain) | 124 |
-|[menloparklab/langchain-cohere-qdrant-doc-retrieval](https://github.com/menloparklab/langchain-cohere-qdrant-doc-retrieval) | 123 |
-|[peterw/StoryStorm](https://github.com/peterw/StoryStorm) | 123 |
-|[chakkaradeep/pyCodeAGI](https://github.com/chakkaradeep/pyCodeAGI) | 123 |
-|[petehunt/langchain-github-bot](https://github.com/petehunt/langchain-github-bot) | 115 |
-|[su77ungr/CASALIOY](https://github.com/su77ungr/CASALIOY) | 113 |
-|[eunomia-bpf/GPTtrace](https://github.com/eunomia-bpf/GPTtrace) | 113 |
-|[zenml-io/zenml-projects](https://github.com/zenml-io/zenml-projects) | 112 |
-|[pablomarin/GPT-Azure-Search-Engine](https://github.com/pablomarin/GPT-Azure-Search-Engine) | 111 |
-|[shamspias/customizable-gpt-chatbot](https://github.com/shamspias/customizable-gpt-chatbot) | 109 |
-|[WongSaang/chatgpt-ui-server](https://github.com/WongSaang/chatgpt-ui-server) | 108 |
-|[davila7/file-gpt](https://github.com/davila7/file-gpt) | 104 |
-|[enhancedocs/enhancedocs](https://github.com/enhancedocs/enhancedocs) | 102 |
-|[aurelio-labs/arxiv-bot](https://github.com/aurelio-labs/arxiv-bot) | 101 |
+|[openai/openai-cookbook](https://github.com/openai/openai-cookbook) | 41047 |
+|[LAION-AI/Open-Assistant](https://github.com/LAION-AI/Open-Assistant) | 33983 |
+|[microsoft/TaskMatrix](https://github.com/microsoft/TaskMatrix) | 33375 |
+|[imartinez/privateGPT](https://github.com/imartinez/privateGPT) | 31114 |
+|[hpcaitech/ColossalAI](https://github.com/hpcaitech/ColossalAI) | 30369 |
+|[reworkd/AgentGPT](https://github.com/reworkd/AgentGPT) | 24116 |
+|[OpenBB-finance/OpenBBTerminal](https://github.com/OpenBB-finance/OpenBBTerminal) | 22565 |
+|[openai/chatgpt-retrieval-plugin](https://github.com/openai/chatgpt-retrieval-plugin) | 18375 |
+|[jerryjliu/llama_index](https://github.com/jerryjliu/llama_index) | 17723 |
+|[mindsdb/mindsdb](https://github.com/mindsdb/mindsdb) | 16958 |
+|[mlflow/mlflow](https://github.com/mlflow/mlflow) | 14632 |
+|[GaiZhenbiao/ChuanhuChatGPT](https://github.com/GaiZhenbiao/ChuanhuChatGPT) | 11273 |
+|[openai/evals](https://github.com/openai/evals) | 10745 |
+|[databrickslabs/dolly](https://github.com/databrickslabs/dolly) | 10298 |
+|[imClumsyPanda/langchain-ChatGLM](https://github.com/imClumsyPanda/langchain-ChatGLM) | 9838 |
+|[logspace-ai/langflow](https://github.com/logspace-ai/langflow) | 9247 |
+|[AIGC-Audio/AudioGPT](https://github.com/AIGC-Audio/AudioGPT) | 8768 |
+|[PromtEngineer/localGPT](https://github.com/PromtEngineer/localGPT) | 8651 |
+|[StanGirard/quivr](https://github.com/StanGirard/quivr) | 8119 |
+|[go-skynet/LocalAI](https://github.com/go-skynet/LocalAI) | 7418 |
+|[gventuri/pandas-ai](https://github.com/gventuri/pandas-ai) | 7301 |
+|[PipedreamHQ/pipedream](https://github.com/PipedreamHQ/pipedream) | 6636 |
+|[arc53/DocsGPT](https://github.com/arc53/DocsGPT) | 5849 |
+|[e2b-dev/e2b](https://github.com/e2b-dev/e2b) | 5129 |
+|[langgenius/dify](https://github.com/langgenius/dify) | 4804 |
+|[serge-chat/serge](https://github.com/serge-chat/serge) | 4448 |
+|[csunny/DB-GPT](https://github.com/csunny/DB-GPT) | 4350 |
+|[wenda-LLM/wenda](https://github.com/wenda-LLM/wenda) | 4268 |
+|[zauberzeug/nicegui](https://github.com/zauberzeug/nicegui) | 4244 |
+|[intitni/CopilotForXcode](https://github.com/intitni/CopilotForXcode) | 4232 |
+|[GreyDGL/PentestGPT](https://github.com/GreyDGL/PentestGPT) | 4154 |
+|[madawei2699/myGPTReader](https://github.com/madawei2699/myGPTReader) | 4080 |
+|[zilliztech/GPTCache](https://github.com/zilliztech/GPTCache) | 3949 |
+|[gkamradt/langchain-tutorials](https://github.com/gkamradt/langchain-tutorials) | 3920 |
+|[bentoml/OpenLLM](https://github.com/bentoml/OpenLLM) | 3481 |
+|[MineDojo/Voyager](https://github.com/MineDojo/Voyager) | 3453 |
+|[mmabrouk/chatgpt-wrapper](https://github.com/mmabrouk/chatgpt-wrapper) | 3355 |
+|[postgresml/postgresml](https://github.com/postgresml/postgresml) | 3328 |
+|[marqo-ai/marqo](https://github.com/marqo-ai/marqo) | 3100 |
+|[kyegomez/tree-of-thoughts](https://github.com/kyegomez/tree-of-thoughts) | 3049 |
+|[PrefectHQ/marvin](https://github.com/PrefectHQ/marvin) | 2844 |
+|[project-baize/baize-chatbot](https://github.com/project-baize/baize-chatbot) | 2833 |
+|[h2oai/h2ogpt](https://github.com/h2oai/h2ogpt) | 2809 |
+|[hwchase17/chat-langchain](https://github.com/hwchase17/chat-langchain) | 2809 |
+|[whitead/paper-qa](https://github.com/whitead/paper-qa) | 2664 |
+|[Azure-Samples/azure-search-openai-demo](https://github.com/Azure-Samples/azure-search-openai-demo) | 2650 |
+|[OpenGVLab/InternGPT](https://github.com/OpenGVLab/InternGPT) | 2525 |
+|[GerevAI/gerev](https://github.com/GerevAI/gerev) | 2372 |
+|[ParisNeo/lollms-webui](https://github.com/ParisNeo/lollms-webui) | 2287 |
+|[OpenBMB/BMTools](https://github.com/OpenBMB/BMTools) | 2265 |
+|[SamurAIGPT/privateGPT](https://github.com/SamurAIGPT/privateGPT) | 2084 |
+|[Chainlit/chainlit](https://github.com/Chainlit/chainlit) | 1912 |
+|[Farama-Foundation/PettingZoo](https://github.com/Farama-Foundation/PettingZoo) | 1869 |
+|[OpenGVLab/Ask-Anything](https://github.com/OpenGVLab/Ask-Anything) | 1864 |
+|[IntelligenzaArtificiale/Free-Auto-GPT](https://github.com/IntelligenzaArtificiale/Free-Auto-GPT) | 1849 |
+|[Unstructured-IO/unstructured](https://github.com/Unstructured-IO/unstructured) | 1766 |
+|[yanqiangmiffy/Chinese-LangChain](https://github.com/yanqiangmiffy/Chinese-LangChain) | 1745 |
+|[NVIDIA/NeMo-Guardrails](https://github.com/NVIDIA/NeMo-Guardrails) | 1732 |
+|[hwchase17/notion-qa](https://github.com/hwchase17/notion-qa) | 1716 |
+|[paulpierre/RasaGPT](https://github.com/paulpierre/RasaGPT) | 1619 |
+|[pinterest/querybook](https://github.com/pinterest/querybook) | 1468 |
+|[vocodedev/vocode-python](https://github.com/vocodedev/vocode-python) | 1446 |
+|[thomas-yanxin/LangChain-ChatGLM-Webui](https://github.com/thomas-yanxin/LangChain-ChatGLM-Webui) | 1430 |
+|[Mintplex-Labs/anything-llm](https://github.com/Mintplex-Labs/anything-llm) | 1419 |
+|[Kav-K/GPTDiscord](https://github.com/Kav-K/GPTDiscord) | 1416 |
+|[lunasec-io/lunasec](https://github.com/lunasec-io/lunasec) | 1327 |
+|[psychic-api/psychic](https://github.com/psychic-api/psychic) | 1307 |
+|[jina-ai/thinkgpt](https://github.com/jina-ai/thinkgpt) | 1242 |
+|[agiresearch/OpenAGI](https://github.com/agiresearch/OpenAGI) | 1239 |
+|[ttengwang/Caption-Anything](https://github.com/ttengwang/Caption-Anything) | 1203 |
+|[jina-ai/dev-gpt](https://github.com/jina-ai/dev-gpt) | 1179 |
+|[keephq/keep](https://github.com/keephq/keep) | 1169 |
+|[greshake/llm-security](https://github.com/greshake/llm-security) | 1156 |
+|[richardyc/Chrome-GPT](https://github.com/richardyc/Chrome-GPT) | 1090 |
+|[jina-ai/langchain-serve](https://github.com/jina-ai/langchain-serve) | 1088 |
+|[mmz-001/knowledge_gpt](https://github.com/mmz-001/knowledge_gpt) | 1074 |
+|[juncongmoo/chatllama](https://github.com/juncongmoo/chatllama) | 1057 |
+|[noahshinn024/reflexion](https://github.com/noahshinn024/reflexion) | 1045 |
+|[visual-openllm/visual-openllm](https://github.com/visual-openllm/visual-openllm) | 1036 |
+|[101dotxyz/GPTeam](https://github.com/101dotxyz/GPTeam) | 999 |
+|[poe-platform/api-bot-tutorial](https://github.com/poe-platform/api-bot-tutorial) | 989 |
+|[irgolic/AutoPR](https://github.com/irgolic/AutoPR) | 974 |
+|[homanp/superagent](https://github.com/homanp/superagent) | 970 |
+|[microsoft/X-Decoder](https://github.com/microsoft/X-Decoder) | 941 |
+|[peterw/Chat-with-Github-Repo](https://github.com/peterw/Chat-with-Github-Repo) | 896 |
+|[SamurAIGPT/Camel-AutoGPT](https://github.com/SamurAIGPT/Camel-AutoGPT) | 856 |
+|[cirediatpl/FigmaChain](https://github.com/cirediatpl/FigmaChain) | 840 |
+|[chatarena/chatarena](https://github.com/chatarena/chatarena) | 829 |
+|[rlancemartin/auto-evaluator](https://github.com/rlancemartin/auto-evaluator) | 816 |
+|[seanpixel/Teenage-AGI](https://github.com/seanpixel/Teenage-AGI) | 816 |
+|[hashintel/hash](https://github.com/hashintel/hash) | 806 |
+|[corca-ai/EVAL](https://github.com/corca-ai/EVAL) | 790 |
+|[eyurtsev/kor](https://github.com/eyurtsev/kor) | 752 |
+|[cheshire-cat-ai/core](https://github.com/cheshire-cat-ai/core) | 713 |
+|[e-johnstonn/BriefGPT](https://github.com/e-johnstonn/BriefGPT) | 686 |
+|[run-llama/llama-lab](https://github.com/run-llama/llama-lab) | 685 |
+|[refuel-ai/autolabel](https://github.com/refuel-ai/autolabel) | 673 |
+|[griptape-ai/griptape](https://github.com/griptape-ai/griptape) | 617 |
+|[billxbf/ReWOO](https://github.com/billxbf/ReWOO) | 616 |
+|[Anil-matcha/ChatPDF](https://github.com/Anil-matcha/ChatPDF) | 609 |
+|[NimbleBoxAI/ChainFury](https://github.com/NimbleBoxAI/ChainFury) | 592 |
+|[getmetal/motorhead](https://github.com/getmetal/motorhead) | 581 |
+|[ajndkr/lanarky](https://github.com/ajndkr/lanarky) | 574 |
+|[namuan/dr-doc-search](https://github.com/namuan/dr-doc-search) | 572 |
+|[kreneskyp/ix](https://github.com/kreneskyp/ix) | 564 |
+|[akshata29/chatpdf](https://github.com/akshata29/chatpdf) | 540 |
+|[hwchase17/chat-your-data](https://github.com/hwchase17/chat-your-data) | 540 |
+|[whyiyhw/chatgpt-wechat](https://github.com/whyiyhw/chatgpt-wechat) | 537 |
+|[khoj-ai/khoj](https://github.com/khoj-ai/khoj) | 531 |
+|[SamurAIGPT/ChatGPT-Developer-Plugins](https://github.com/SamurAIGPT/ChatGPT-Developer-Plugins) | 528 |
+|[microsoft/PodcastCopilot](https://github.com/microsoft/PodcastCopilot) | 526 |
+|[ruoccofabrizio/azure-open-ai-embeddings-qna](https://github.com/ruoccofabrizio/azure-open-ai-embeddings-qna) | 515 |
+|[alexanderatallah/window.ai](https://github.com/alexanderatallah/window.ai) | 494 |
+|[StevenGrove/GPT4Tools](https://github.com/StevenGrove/GPT4Tools) | 483 |
+|[jina-ai/agentchain](https://github.com/jina-ai/agentchain) | 472 |
+|[mckaywrigley/repo-chat](https://github.com/mckaywrigley/repo-chat) | 465 |
+|[yeagerai/yeagerai-agent](https://github.com/yeagerai/yeagerai-agent) | 464 |
+|[langchain-ai/langchain-aiplugin](https://github.com/langchain-ai/langchain-aiplugin) | 464 |
+|[mpaepper/content-chatbot](https://github.com/mpaepper/content-chatbot) | 455 |
+|[michaelthwan/searchGPT](https://github.com/michaelthwan/searchGPT) | 455 |
+|[freddyaboulton/gradio-tools](https://github.com/freddyaboulton/gradio-tools) | 450 |
+|[amosjyng/langchain-visualizer](https://github.com/amosjyng/langchain-visualizer) | 446 |
+|[msoedov/langcorn](https://github.com/msoedov/langcorn) | 445 |
+|[plastic-labs/tutor-gpt](https://github.com/plastic-labs/tutor-gpt) | 426 |
+|[poe-platform/poe-protocol](https://github.com/poe-platform/poe-protocol) | 426 |
+|[jonra1993/fastapi-alembic-sqlmodel-async](https://github.com/jonra1993/fastapi-alembic-sqlmodel-async) | 418 |
+|[langchain-ai/auto-evaluator](https://github.com/langchain-ai/auto-evaluator) | 416 |
+|[steamship-core/steamship-langchain](https://github.com/steamship-core/steamship-langchain) | 401 |
+|[xuwenhao/geektime-ai-course](https://github.com/xuwenhao/geektime-ai-course) | 400 |
+|[continuum-llms/chatgpt-memory](https://github.com/continuum-llms/chatgpt-memory) | 386 |
+|[mtenenholtz/chat-twitter](https://github.com/mtenenholtz/chat-twitter) | 382 |
+|[explosion/spacy-llm](https://github.com/explosion/spacy-llm) | 368 |
+|[showlab/VLog](https://github.com/showlab/VLog) | 363 |
+|[yvann-hub/Robby-chatbot](https://github.com/yvann-hub/Robby-chatbot) | 363 |
+|[daodao97/chatdoc](https://github.com/daodao97/chatdoc) | 361 |
+|[opentensor/bittensor](https://github.com/opentensor/bittensor) | 360 |
+|[alejandro-ao/langchain-ask-pdf](https://github.com/alejandro-ao/langchain-ask-pdf) | 355 |
+|[logan-markewich/llama_index_starter_pack](https://github.com/logan-markewich/llama_index_starter_pack) | 351 |
+|[jupyterlab/jupyter-ai](https://github.com/jupyterlab/jupyter-ai) | 348 |
+|[alejandro-ao/ask-multiple-pdfs](https://github.com/alejandro-ao/ask-multiple-pdfs) | 321 |
+|[andylokandy/gpt-4-search](https://github.com/andylokandy/gpt-4-search) | 314 |
+|[mosaicml/examples](https://github.com/mosaicml/examples) | 313 |
+|[personoids/personoids-lite](https://github.com/personoids/personoids-lite) | 306 |
+|[itamargol/openai](https://github.com/itamargol/openai) | 304 |
+|[Anil-matcha/Website-to-Chatbot](https://github.com/Anil-matcha/Website-to-Chatbot) | 299 |
+|[momegas/megabots](https://github.com/momegas/megabots) | 299 |
+|[BlackHC/llm-strategy](https://github.com/BlackHC/llm-strategy) | 289 |
+|[daveebbelaar/langchain-experiments](https://github.com/daveebbelaar/langchain-experiments) | 283 |
+|[wandb/weave](https://github.com/wandb/weave) | 279 |
+|[Cheems-Seminar/grounded-segment-any-parts](https://github.com/Cheems-Seminar/grounded-segment-any-parts) | 273 |
+|[jerlendds/osintbuddy](https://github.com/jerlendds/osintbuddy) | 271 |
+|[OpenBMB/AgentVerse](https://github.com/OpenBMB/AgentVerse) | 270 |
+|[MagnivOrg/prompt-layer-library](https://github.com/MagnivOrg/prompt-layer-library) | 269 |
+|[sullivan-sean/chat-langchainjs](https://github.com/sullivan-sean/chat-langchainjs) | 259 |
+|[Azure-Samples/openai](https://github.com/Azure-Samples/openai) | 252 |
+|[bborn/howdoi.ai](https://github.com/bborn/howdoi.ai) | 248 |
+|[hnawaz007/pythondataanalysis](https://github.com/hnawaz007/pythondataanalysis) | 247 |
+|[conceptofmind/toolformer](https://github.com/conceptofmind/toolformer) | 243 |
+|[truera/trulens](https://github.com/truera/trulens) | 239 |
+|[ur-whitelab/exmol](https://github.com/ur-whitelab/exmol) | 238 |
+|[intel/intel-extension-for-transformers](https://github.com/intel/intel-extension-for-transformers) | 237 |
+|[monarch-initiative/ontogpt](https://github.com/monarch-initiative/ontogpt) | 236 |
+|[wandb/edu](https://github.com/wandb/edu) | 231 |
+|[recalign/RecAlign](https://github.com/recalign/RecAlign) | 229 |
+|[alvarosevilla95/autolang](https://github.com/alvarosevilla95/autolang) | 223 |
+|[kaleido-lab/dolphin](https://github.com/kaleido-lab/dolphin) | 221 |
+|[JohnSnowLabs/nlptest](https://github.com/JohnSnowLabs/nlptest) | 220 |
+|[paolorechia/learn-langchain](https://github.com/paolorechia/learn-langchain) | 219 |
+|[Safiullah-Rahu/CSV-AI](https://github.com/Safiullah-Rahu/CSV-AI) | 215 |
+|[Haste171/langchain-chatbot](https://github.com/Haste171/langchain-chatbot) | 215 |
+|[steamship-packages/langchain-agent-production-starter](https://github.com/steamship-packages/langchain-agent-production-starter) | 214 |
+|[airobotlab/KoChatGPT](https://github.com/airobotlab/KoChatGPT) | 213 |
+|[filip-michalsky/SalesGPT](https://github.com/filip-michalsky/SalesGPT) | 211 |
+|[marella/chatdocs](https://github.com/marella/chatdocs) | 207 |
+|[su77ungr/CASALIOY](https://github.com/su77ungr/CASALIOY) | 200 |
+|[shaman-ai/agent-actors](https://github.com/shaman-ai/agent-actors) | 195 |
+|[plchld/InsightFlow](https://github.com/plchld/InsightFlow) | 189 |
+|[jbrukh/gpt-jargon](https://github.com/jbrukh/gpt-jargon) | 186 |
+|[hwchase17/langchain-streamlit-template](https://github.com/hwchase17/langchain-streamlit-template) | 185 |
+|[huchenxucs/ChatDB](https://github.com/huchenxucs/ChatDB) | 179 |
+|[benthecoder/ClassGPT](https://github.com/benthecoder/ClassGPT) | 178 |
+|[hwchase17/chroma-langchain](https://github.com/hwchase17/chroma-langchain) | 178 |
+|[radi-cho/datasetGPT](https://github.com/radi-cho/datasetGPT) | 177 |
+|[jiran214/GPT-vup](https://github.com/jiran214/GPT-vup) | 176 |
+|[rsaryev/talk-codebase](https://github.com/rsaryev/talk-codebase) | 174 |
+|[edreisMD/plugnplai](https://github.com/edreisMD/plugnplai) | 174 |
+|[gia-guar/JARVIS-ChatGPT](https://github.com/gia-guar/JARVIS-ChatGPT) | 172 |
+|[hardbyte/qabot](https://github.com/hardbyte/qabot) | 171 |
+|[shamspias/customizable-gpt-chatbot](https://github.com/shamspias/customizable-gpt-chatbot) | 165 |
+|[gustavz/DataChad](https://github.com/gustavz/DataChad) | 164 |
+|[yasyf/compress-gpt](https://github.com/yasyf/compress-gpt) | 163 |
+|[SamPink/dev-gpt](https://github.com/SamPink/dev-gpt) | 161 |
+|[yuanjie-ai/ChatLLM](https://github.com/yuanjie-ai/ChatLLM) | 161 |
+|[pablomarin/GPT-Azure-Search-Engine](https://github.com/pablomarin/GPT-Azure-Search-Engine) | 160 |
+|[jondurbin/airoboros](https://github.com/jondurbin/airoboros) | 157 |
+|[fengyuli-dev/multimedia-gpt](https://github.com/fengyuli-dev/multimedia-gpt) | 157 |
+|[PradipNichite/Youtube-Tutorials](https://github.com/PradipNichite/Youtube-Tutorials) | 156 |
+|[nicknochnack/LangchainDocuments](https://github.com/nicknochnack/LangchainDocuments) | 155 |
+|[ethanyanjiali/minChatGPT](https://github.com/ethanyanjiali/minChatGPT) | 155 |
+|[ccurme/yolopandas](https://github.com/ccurme/yolopandas) | 154 |
+|[chakkaradeep/pyCodeAGI](https://github.com/chakkaradeep/pyCodeAGI) | 153 |
+|[preset-io/promptimize](https://github.com/preset-io/promptimize) | 150 |
+|[onlyphantom/llm-python](https://github.com/onlyphantom/llm-python) | 148 |
+|[Azure-Samples/azure-search-power-skills](https://github.com/Azure-Samples/azure-search-power-skills) | 146 |
+|[realminchoi/babyagi-ui](https://github.com/realminchoi/babyagi-ui) | 144 |
+|[microsoft/azure-openai-in-a-day-workshop](https://github.com/microsoft/azure-openai-in-a-day-workshop) | 144 |
+|[jmpaz/promptlib](https://github.com/jmpaz/promptlib) | 143 |
+|[shauryr/S2QA](https://github.com/shauryr/S2QA) | 142 |
+|[handrew/browserpilot](https://github.com/handrew/browserpilot) | 141 |
+|[Jaseci-Labs/jaseci](https://github.com/Jaseci-Labs/jaseci) | 140 |
+|[Klingefjord/chatgpt-telegram](https://github.com/Klingefjord/chatgpt-telegram) | 140 |
+|[WongSaang/chatgpt-ui-server](https://github.com/WongSaang/chatgpt-ui-server) | 139 |
+|[ibiscp/LLM-IMDB](https://github.com/ibiscp/LLM-IMDB) | 139 |
+|[menloparklab/langchain-cohere-qdrant-doc-retrieval](https://github.com/menloparklab/langchain-cohere-qdrant-doc-retrieval) | 138 |
+|[hirokidaichi/wanna](https://github.com/hirokidaichi/wanna) | 137 |
+|[steamship-core/vercel-examples](https://github.com/steamship-core/vercel-examples) | 137 |
+|[deeppavlov/dream](https://github.com/deeppavlov/dream) | 136 |
+|[miaoshouai/miaoshouai-assistant](https://github.com/miaoshouai/miaoshouai-assistant) | 135 |
+|[sugarforever/LangChain-Tutorials](https://github.com/sugarforever/LangChain-Tutorials) | 135 |
+|[yasyf/summ](https://github.com/yasyf/summ) | 135 |
+|[peterw/StoryStorm](https://github.com/peterw/StoryStorm) | 134 |
+|[vaibkumr/prompt-optimizer](https://github.com/vaibkumr/prompt-optimizer) | 132 |
+|[ju-bezdek/langchain-decorators](https://github.com/ju-bezdek/langchain-decorators) | 130 |
+|[homanp/vercel-langchain](https://github.com/homanp/vercel-langchain) | 128 |
+|[Teahouse-Studios/akari-bot](https://github.com/Teahouse-Studios/akari-bot) | 127 |
+|[petehunt/langchain-github-bot](https://github.com/petehunt/langchain-github-bot) | 125 |
+|[eunomia-bpf/GPTtrace](https://github.com/eunomia-bpf/GPTtrace) | 122 |
+|[fixie-ai/fixie-examples](https://github.com/fixie-ai/fixie-examples) | 122 |
+|[Aggregate-Intellect/practical-llms](https://github.com/Aggregate-Intellect/practical-llms) | 120 |
+|[davila7/file-gpt](https://github.com/davila7/file-gpt) | 120 |
+|[Azure-Samples/azure-search-openai-demo-csharp](https://github.com/Azure-Samples/azure-search-openai-demo-csharp) | 119 |
+|[prof-frink-lab/slangchain](https://github.com/prof-frink-lab/slangchain) | 117 |
+|[aurelio-labs/arxiv-bot](https://github.com/aurelio-labs/arxiv-bot) | 117 |
+|[zenml-io/zenml-projects](https://github.com/zenml-io/zenml-projects) | 116 |
+|[flurb18/AgentOoba](https://github.com/flurb18/AgentOoba) | 114 |
+|[kaarthik108/snowChat](https://github.com/kaarthik108/snowChat) | 112 |
+|[RedisVentures/redis-openai-qna](https://github.com/RedisVentures/redis-openai-qna) | 111 |
+|[solana-labs/chatgpt-plugin](https://github.com/solana-labs/chatgpt-plugin) | 111 |
+|[kulltc/chatgpt-sql](https://github.com/kulltc/chatgpt-sql) | 109 |
+|[summarizepaper/summarizepaper](https://github.com/summarizepaper/summarizepaper) | 109 |
+|[Azure-Samples/miyagi](https://github.com/Azure-Samples/miyagi) | 106 |
+|[ssheng/BentoChain](https://github.com/ssheng/BentoChain) | 106 |
+|[voxel51/voxelgpt](https://github.com/voxel51/voxelgpt) | 105 |
+|[mallahyari/drqa](https://github.com/mallahyari/drqa) | 103 |
```
@@ -120,7 +120,8 @@
" history = []\n",
" while True:\n",
" user_input = input(\"\\n>>> input >>>\\n>>>: \")\n",
" if user_input == 'q': break\n",
" if user_input == \"q\":\n",
" break\n",
" history.append(HumanMessage(content=user_input))\n",
" history.append(llm(history))"
]
@@ -1,17 +1,17 @@
# Databerry
# Chaindesk

>[Databerry](https://databerry.ai) is an [open source](https://github.com/gmpetrov/databerry) document retrieval platform that helps to connect your personal data with Large Language Models.
>[Chaindesk](https://chaindesk.ai) is an [open source](https://github.com/gmpetrov/databerry) document retrieval platform that helps to connect your personal data with Large Language Models.


## Installation and Setup

We need to sign up for Databerry, create a datastore, add some data and get your datastore api endpoint url.
We need the [API Key](https://docs.databerry.ai/api-reference/authentication).
We need to sign up for Chaindesk, create a datastore, add some data, and get the datastore API endpoint URL.
We need the [API Key](https://docs.chaindesk.ai/api-reference/authentication).

## Retriever

See a [usage example](/docs/modules/data_connection/retrievers/integrations/databerry.html).
See a [usage example](/docs/modules/data_connection/retrievers/integrations/chaindesk.html).

```python
from langchain.retrievers import DataberryRetriever
from langchain.retrievers import ChaindeskRetriever
```
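A minimal sketch of using the renamed retriever, assuming the constructor mirrors the old `DataberryRetriever` (a `datastore_url` plus an optional `api_key`); the URL below is an illustrative placeholder, not a real endpoint:

```python
# Hedged sketch -- datastore_url and api_key values are placeholders.
from langchain.retrievers import ChaindeskRetriever

retriever = ChaindeskRetriever(
    datastore_url="https://app.chaindesk.ai/query/<datastore-id>",  # hypothetical endpoint
    api_key="<your-api-key>",
)
docs = retriever.get_relevant_documents("What is Chaindesk?")
```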
@@ -8,7 +8,7 @@ pip install cnos-connector
```

## Connecting to CnosDB
You can connect to CnosDB using the SQLDatabase.from_cnosdb() method.
You can connect to CnosDB using the `SQLDatabase.from_cnosdb()` method.
### Syntax
```python
def SQLDatabase.from_cnosdb(url: str = "127.0.0.1:8902",
@@ -31,7 +31,6 @@ Args:
## Examples
```python
# Connecting to CnosDB with SQLDatabase Wrapper
from cnosdb_connector import make_cnosdb_langchain_uri
from langchain import SQLDatabase

db = SQLDatabase.from_cnosdb()
@@ -43,7 +42,7 @@ from langchain.chat_models import ChatOpenAI
llm = ChatOpenAI(temperature=0, model_name="gpt-3.5-turbo")
```
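The example above imports `make_cnosdb_langchain_uri` without using it. A hedged sketch of the URI-based route; calling the helper with no arguments is an assumption (presumably targeting a local server), not a verified signature:

```python
# Hedged sketch -- building a connection URI instead of calling from_cnosdb().
from cnosdb_connector import make_cnosdb_langchain_uri
from langchain import SQLDatabase

uri = make_cnosdb_langchain_uri()  # assumed to default to a local CnosDB server
db = SQLDatabase.from_uri(uri)
```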
### SQL Chain
### SQL Database Chain
This example demonstrates the use of the SQL Chain for answering a question over a CnosDB.
```python
from langchain import SQLDatabaseChain
@@ -51,15 +50,15 @@ from langchain import SQLDatabaseChain
db_chain = SQLDatabaseChain.from_llm(llm, db, verbose=True)

db_chain.run(
"What is the average fa of test table that time between November 3,2022 and November 4, 2022?"
"What is the average temperature of air at station XiaoMaiDao between October 19, 2022 and October 20, 2022?"
)
```
```shell
> Entering new chain...
What is the average fa of test table that time between November 3, 2022 and November 4, 2022?
SQLQuery:SELECT AVG(fa) FROM test WHERE time >= '2022-11-03' AND time < '2022-11-04'
SQLResult: [(2.0,)]
Answer:The average fa of the test table between November 3, 2022, and November 4, 2022, is 2.0.
What is the average temperature of air at station XiaoMaiDao between October 19, 2022 and October 20, 2022?
SQLQuery:SELECT AVG(temperature) FROM air WHERE station = 'XiaoMaiDao' AND time >= '2022-10-19' AND time < '2022-10-20'
SQLResult: [(68.0,)]
Answer:The average temperature of air at station XiaoMaiDao between October 19, 2022 and October 20, 2022 is 68.0.
> Finished chain.
```
### SQL Database Agent
@@ -73,36 +72,39 @@ agent = create_sql_agent(llm=llm, toolkit=toolkit, verbose=True)
```
```python
agent.run(
"What is the average fa of test table that time between November 3, 2022 and November 4, 2022?"
"What is the average temperature of air at station XiaoMaiDao between October 19, 2022 and October 20, 2022?"
)
```
```shell
> Entering new chain...
Action: sql_db_list_tables
Action Input: ""
Observation: test
Thought:The relevant table is "test". I should query the schema of this table to see the column names.
Observation: air
Thought:The "air" table seems relevant to the question. I should query the schema of the "air" table to see what columns are available.
Action: sql_db_schema
Action Input: "test"
Action Input: "air"
Observation:
CREATE TABLE test (
CREATE TABLE air (
pressure FLOAT,
station STRING,
temperature FLOAT,
time TIMESTAMP,
fa BIGINT
visibility FLOAT
)

/*
3 rows from test table:
fa time
1 2022-11-03T06:20:11
2 2022-11-03T06:20:11.000000001
3 2022-11-03T06:20:11.000000002
3 rows from air table:
pressure station temperature time visibility
75.0 XiaoMaiDao 67.0 2022-10-19T03:40:00 54.0
77.0 XiaoMaiDao 69.0 2022-10-19T04:40:00 56.0
76.0 XiaoMaiDao 68.0 2022-10-19T05:40:00 55.0
*/
Thought:The relevant column is "fa" in the "test" table. I can now construct the query to calculate the average "fa" between the specified time range.
Thought:The "temperature" column in the "air" table is relevant to the question. I can query the average temperature between the specified dates.
Action: sql_db_query
Action Input: "SELECT AVG(fa) FROM test WHERE time >= '2022-11-03' AND time < '2022-11-04'"
Observation: [(2.0,)]
Thought:The average "fa" of the "test" table between November 3, 2022 and November 4, 2022 is 2.0.
Final Answer: 2.0
Action Input: "SELECT AVG(temperature) FROM air WHERE station = 'XiaoMaiDao' AND time >= '2022-10-19' AND time <= '2022-10-20'"
Observation: [(68.0,)]
Thought:The average temperature of air at station XiaoMaiDao between October 19, 2022 and October 20, 2022 is 68.0.
Final Answer: 68.0

> Finished chain.
```
19
docs/extras/ecosystem/integrations/datadog_logs.mdx
Normal file
@@ -0,0 +1,19 @@
# Datadog Logs

>[Datadog](https://www.datadoghq.com/) is a monitoring and analytics platform for cloud-scale applications.

## Installation and Setup

```bash
pip install datadog_api_client
```

We must initialize the loader with the Datadog API key and APP key, and we need to set up the query to extract the desired logs.

## Document Loader

See a [usage example](/docs/modules/data_connection/document_loaders/integrations/datadog_logs.html).

```python
from langchain.document_loaders import DatadogLogsLoader
```
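A minimal sketch of that initialization, assuming the loader takes the query and the two keys as constructor arguments; the query string and key values are placeholders:

```python
# Hedged sketch -- query and key values are illustrative placeholders.
from langchain.document_loaders import DatadogLogsLoader

loader = DatadogLogsLoader(
    query="service:agent status:error",  # hypothetical Datadog log query
    api_key="<DD-API-KEY>",
    app_key="<DD-APP-KEY>",
)
documents = loader.load()
```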
@@ -1,7 +1,7 @@
# Grobid

This page covers how to use Grobid to parse articles for LangChain.
It is seperated into two parts: installation and running the server
It is separated into two parts: installation and running the server

## Installation and Setup
# Ensure you have Java installed

@@ -10,7 +10,7 @@ For Feedback, Issues, Contributions - please raise an issue here:
Main principles and benefits:

- more `pythonic` way of writing code
- write multiline prompts that wont break your code flow with indentation
- write multiline prompts that won't break your code flow with indentation
- making use of IDE in-built support for **hinting**, **type checking** and **popup with docs** to quickly peek in the function to see the prompt, parameters it consumes etc.
- leverage all the power of 🦜🔗 LangChain ecosystem
- adding support for **optional parameters**
@@ -31,7 +31,7 @@ def write_me_short_post(topic:str, platform:str="twitter", audience:str = "devel
"""
return

# run it naturaly
# run it naturally
write_me_short_post(topic="starwars")
# or
write_me_short_post(topic="starwars", platform="reddit")
@@ -122,7 +122,7 @@ await write_me_short_post(topic="old movies")

# Simplified streaming

If we wan't to leverage streaming:
If we want to leverage streaming:
- we need to define prompt as async function
- turn on the streaming on the decorator, or we can define PromptType with streaming on
- capture the stream using StreamingContext (see the sketch after this hunk)
@@ -149,7 +149,7 @@ async def write_me_short_post(topic:str, platform:str="twitter", audience:str =


# just an arbitrary function to demonstrate the streaming... wil be some websockets code in the real world
# just an arbitrary function to demonstrate the streaming... will be some websockets code in the real world
tokens=[]
def capture_stream_func(new_token:str):
    tokens.append(new_token)
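A hedged sketch of the full streaming flow those bullets describe, pieced together from the package's README; the `capture_stream` flag and the `StreamingContext` signature are assumptions about this version of langchain-decorators, not verified API:

```python
# Hedged sketch -- capture_stream and StreamingContext usage may differ
# from the installed version of langchain-decorators.
import asyncio
from langchain_decorators import StreamingContext, llm_prompt

tokens = []

def capture_stream_func(new_token: str):
    tokens.append(new_token)

@llm_prompt(capture_stream=True)  # assumed flag to enable streaming
async def write_me_short_post(topic: str):
    """Write a short post about {topic}."""
    return

async def main():
    # the context manager captures tokens produced by prompts run inside it
    with StreamingContext(stream_to_stdout=True, callback=capture_stream_func):
        await write_me_short_post(topic="old movies")

asyncio.run(main())
```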
@@ -250,7 +250,7 @@ the roles here are model native roles (assistant, user, system for chatGPT)

# Optional sections
- you can define whole sections of your prompt that should be optional
- if any input in the section is missing, the whole section wont be rendered
- if any input in the section is missing, the whole section won't be rendered

the syntax for this is as follows:

@@ -273,7 +273,7 @@ def prompt_with_optional_partials():
# Output parsers

- llm_prompt decorator natively tries to detect the best output parser based on the output type. (if not set, it returns the raw string)
- list, dict and pydantic outputs are also supported natively (automaticaly)
- list, dict and pydantic outputs are also supported natively (automatically)

``` python
# this code example is complete and should run as it is
@@ -28,4 +28,4 @@ To import this vectorstore:
from langchain.vectorstores import Marqo
```

For a more detailed walkthrough of the Marqo wrapper and some of its unique features, see [this notebook](../modules/data_connection/vectorstores/integrations/marqo.ipynb)
For a more detailed walkthrough of the Marqo wrapper and some of its unique features, see [this notebook](/docs/modules/data_connection/vectorstores/integrations/marqo.html)
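A minimal sketch of wiring the wrapper to a running Marqo server; the endpoint and index name are illustrative, and the constructor arguments are assumed from the wrapper's typical usage rather than taken from this page:

```python
# Hedged sketch -- assumes a local Marqo server on its default port.
import marqo
from langchain.vectorstores import Marqo

client = marqo.Client(url="http://localhost:8882")
docsearch = Marqo(client, index_name="langchain-demo")  # index name is a placeholder
results = docsearch.similarity_search("What is Marqo?")
```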
@@ -18,7 +18,7 @@ We also deliver with live demo on huggingface! Please checkout our [huggingface
## Installation and Setup
- Install the Python SDK with `pip install clickhouse-connect`

### Setting up envrionments
### Setting up environments

There are two ways to set up parameters for the MyScale index (see the sketch after this hunk).

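A hedged sketch of the explicit-settings route (as opposed to environment variables); the host and credentials are placeholders, and the `MyScaleSettings` field names are assumptions about this version of the wrapper:

```python
# Hedged sketch -- host, port, and credentials are placeholders.
from langchain.embeddings import OpenAIEmbeddings
from langchain.vectorstores import MyScale, MyScaleSettings

config = MyScaleSettings(
    host="<your-backend-url>", port=8443, username="<user>", password="<password>"
)
docsearch = MyScale(OpenAIEmbeddings(), config)
```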
@@ -8,6 +8,36 @@ It is broken into two parts: installation and setup, and then references to spec

## Wrappers

All wrappers needing a Redis URL connection string to connect to the database support either a standalone Redis server
or a High-Availability setup with Replication and Redis Sentinels.

### Redis Standalone connection url
For a standalone Redis server, the official Redis connection URL formats can be used, as described in the Python redis module's
`from_url()` method [Redis.from_url](https://redis-py.readthedocs.io/en/stable/connections.html#redis.Redis.from_url)

Example: `redis_url = "redis://:secret-pass@localhost:6379/0"`

### Redis Sentinel connection url

For [Redis sentinel setups](https://redis.io/docs/management/sentinel/) the connection scheme is "redis+sentinel".
This is an unofficial extension to the official IANA-registered protocol schemes, used because no official connection URL
for Sentinels is available.

Example: `redis_url = "redis+sentinel://:secret-pass@sentinel-host:26379/mymaster/0"`

The format is `redis+sentinel://[[username]:[password]]@[host-or-ip]:[port]/[service-name]/[db-number]`
with the default values of "service-name = mymaster" and "db-number = 0" if not set explicitly.
The service-name is the Redis server monitoring group name as configured within the Sentinel.

The current URL format limits the connection string to one Sentinel host only (no list can be given), and
both the Redis server and the Sentinel must have the same password set (if used).

### Redis Cluster connection url

Redis Cluster is not currently supported by the methods requiring a "redis_url" parameter.
The only way to use a Redis Cluster is with LangChain classes accepting a preconfigured Redis client like `RedisCache`
(example below).

### Cache

The Cache wrapper allows for [Redis](https://redis.io) to be used as a remote, low-latency, in-memory cache for LLM prompts and responses.
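A minimal sketch of that cluster pattern, assuming the standard redis-py cluster client; the host and password are placeholders:

```python
# Hedged sketch -- pass a preconfigured cluster client to RedisCache.
from redis.cluster import RedisCluster

import langchain
from langchain.cache import RedisCache

client = RedisCluster(host="cluster-node", port=6379, password="secret-pass")
langchain.llm_cache = RedisCache(redis_=client)
```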
@@ -17,3 +17,10 @@ See a [usage example](/docs/modules/data_connection/vectorstores/integrations/ro
```python
from langchain.vectorstores import RocksetDB
```

## Document Loader

See a [usage example](/docs/modules/data_connection/document_loaders/integrations/rockset).
```python
from langchain.document_loaders import RocksetLoader
```
@@ -39,7 +39,7 @@ vectara = Vectara(
```
The customer_id, corpus_id and api_key are optional, and if they are not supplied will be read from the environment variables `VECTARA_CUSTOMER_ID`, `VECTARA_CORPUS_ID` and `VECTARA_API_KEY`, respectively.

Afer you have the vectorstore, you can `add_texts` or `add_documents` as per the standard `VectorStore` interface, for example:
After you have the vectorstore, you can `add_texts` or `add_documents` as per the standard `VectorStore` interface, for example:

```python
vectara.add_texts(["to be or not to be", "that is the question"])
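A minimal sketch of the environment-variable route mentioned above; the placeholder values are illustrative:

```python
# Hedged sketch -- with the three variables set, Vectara() needs no arguments.
import os
from langchain.vectorstores import Vectara

os.environ["VECTARA_CUSTOMER_ID"] = "<customer-id>"
os.environ["VECTARA_CORPUS_ID"] = "<corpus-id>"
os.environ["VECTARA_API_KEY"] = "<api-key>"

vectara = Vectara()  # credentials are read from the environment
```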
@@ -1,6 +1,7 @@
{
"cells": [
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
@@ -16,6 +17,7 @@
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
@@ -28,10 +30,11 @@
"metadata": {},
"outputs": [],
"source": [
"!pip install langkit -q"
"%pip install langkit openai langchain"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
@@ -54,6 +57,7 @@
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {
"tags": []
@@ -63,6 +67,7 @@
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
@@ -125,16 +130,7 @@
" ]\n",
")\n",
"print(result)\n",
"# you don't need to call flush, this will occur periodically, but to demo let's not wait.\n",
"whylabs.flush()"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# you don't need to call close to write profiles to WhyLabs, upload will occur periodically, but to demo let's not wait.\n",
"whylabs.close()"
]
}
@@ -155,7 +151,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.6"
"version": "3.8.10"
},
"vscode": {
"interpreter": {
@@ -1,6 +1,6 @@
# YouTube

>[YouTube](https://www.youtube.com/) is an online video sharing and social media platform created by Google.
>[YouTube](https://www.youtube.com/) is an online video sharing and social media platform by Google.
> We download the `YouTube` transcripts and video information.

## Installation and Setup

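A minimal sketch of loading a transcript with the loader this page documents; the video URL is an illustrative placeholder:

```python
# Hedged sketch -- fetches the transcript (and, optionally, video metadata).
from langchain.document_loaders import YoutubeLoader

loader = YoutubeLoader.from_youtube_url(
    "https://www.youtube.com/watch?v=<video-id>", add_video_info=True
)
documents = loader.load()
```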
@@ -10,7 +10,7 @@
|
||||
"\n",
|
||||
"One automated way to predict the preferred configuration is to use a `PairwiseStringEvaluator` like the `PairwiseStringEvalChain`<a name=\"cite_ref-1\"></a>[<sup>[1]</sup>](#cite_note-1). This chain prompts an LLM to select which output is preferred, given a specific input.\n",
|
||||
"\n",
|
||||
"For this evalution, we will need 3 things:\n",
|
||||
"For this evaluation, we will need 3 things:\n",
|
||||
"1. An evaluator\n",
|
||||
"2. A dataset of inputs\n",
|
||||
"3. 2 (or more) LLMs, Chains, or Agents to compare\n",
|
||||
@@ -117,11 +117,11 @@
|
||||
"\n",
|
||||
"\n",
|
||||
"# Initialize the language model\n",
|
||||
"# You can add your own OpenAI API key by adding openai_api_key=\"<your_api_key>\" \n",
|
||||
"# You can add your own OpenAI API key by adding openai_api_key=\"<your_api_key>\"\n",
|
||||
"llm = ChatOpenAI(temperature=0, model=\"gpt-3.5-turbo-0613\")\n",
|
||||
"\n",
|
||||
"# Initialize the SerpAPIWrapper for search functionality\n",
|
||||
"#Replace <your_api_key> in openai_api_key=\"<your_api_key>\" with your actual SerpAPI key.\n",
|
||||
"# Replace <your_api_key> in openai_api_key=\"<your_api_key>\" with your actual SerpAPI key.\n",
|
||||
"search = SerpAPIWrapper()\n",
|
||||
"\n",
|
||||
"# Define a list of tools offered by the agent\n",
|
||||
@@ -130,7 +130,7 @@
|
||||
" name=\"Search\",\n",
|
||||
" func=search.run,\n",
|
||||
" coroutine=search.arun,\n",
|
||||
" description=\"Useful when you need to answer questions about current events. You should ask targeted questions.\"\n",
|
||||
" description=\"Useful when you need to answer questions about current events. You should ask targeted questions.\",\n",
|
||||
" ),\n",
|
||||
"]"
|
||||
]
|
||||
@@ -143,8 +143,12 @@
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"functions_agent = initialize_agent(tools, llm, agent=AgentType.OPENAI_MULTI_FUNCTIONS, verbose=False)\n",
|
||||
"conversations_agent = initialize_agent(tools, llm, agent=AgentType.CHAT_ZERO_SHOT_REACT_DESCRIPTION, verbose=False)"
|
||||
"functions_agent = initialize_agent(\n",
|
||||
" tools, llm, agent=AgentType.OPENAI_MULTI_FUNCTIONS, verbose=False\n",
|
||||
")\n",
|
||||
"conversations_agent = initialize_agent(\n",
|
||||
" tools, llm, agent=AgentType.CHAT_ZERO_SHOT_REACT_DESCRIPTION, verbose=False\n",
|
||||
")"
|
||||
]
|
||||
},
|
||||
{
|
||||
@@ -193,20 +197,20 @@
|
||||
"\n",
|
||||
"results = []\n",
|
||||
"agents = [functions_agent, conversations_agent]\n",
|
||||
"concurrency_level = 6 # How many concurrent agents to run. May need to decrease if OpenAI is rate limiting.\n",
|
||||
"concurrency_level = 6 # How many concurrent agents to run. May need to decrease if OpenAI is rate limiting.\n",
|
||||
"\n",
|
||||
"# We will only run the first 20 examples of this dataset to speed things up\n",
|
||||
"# This will lead to larger confidence intervals downstream.\n",
|
||||
"batch = []\n",
|
||||
"for example in tqdm(dataset[:20]):\n",
|
||||
" batch.extend([agent.acall(example['inputs']) for agent in agents])\n",
|
||||
" batch.extend([agent.acall(example[\"inputs\"]) for agent in agents])\n",
|
||||
" if len(batch) >= concurrency_level:\n",
|
||||
" batch_results = await asyncio.gather(*batch, return_exceptions=True)\n",
|
||||
" results.extend(list(zip(*[iter(batch_results)]*2)))\n",
|
||||
" results.extend(list(zip(*[iter(batch_results)] * 2)))\n",
|
||||
" batch = []\n",
|
||||
"if batch:\n",
|
||||
" batch_results = await asyncio.gather(*batch, return_exceptions=True)\n",
|
||||
" results.extend(list(zip(*[iter(batch_results)]*2)))"
|
||||
" results.extend(list(zip(*[iter(batch_results)] * 2)))"
|
||||
]
|
||||
},
|
||||
{
|
||||
@@ -230,11 +234,12 @@
|
||||
"source": [
|
||||
"import random\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"def predict_preferences(dataset, results) -> list:\n",
|
||||
" preferences = []\n",
|
||||
"\n",
|
||||
" for example, (res_a, res_b) in zip(dataset, results):\n",
|
||||
" input_ = example['inputs']\n",
|
||||
" input_ = example[\"inputs\"]\n",
|
||||
" # Flip a coin to reduce persistent position bias\n",
|
||||
" if random.random() < 0.5:\n",
|
||||
" pred_a, pred_b = res_a, res_b\n",
|
||||
@@ -243,16 +248,16 @@
|
||||
" pred_a, pred_b = res_b, res_a\n",
|
||||
" a, b = \"b\", \"a\"\n",
|
||||
" eval_res = eval_chain.evaluate_string_pairs(\n",
|
||||
" prediction=pred_a['output'] if isinstance(pred_a, dict) else str(pred_a),\n",
|
||||
" prediction_b=pred_b['output'] if isinstance(pred_b, dict) else str(pred_b),\n",
|
||||
" input=input_\n",
|
||||
" prediction=pred_a[\"output\"] if isinstance(pred_a, dict) else str(pred_a),\n",
|
||||
" prediction_b=pred_b[\"output\"] if isinstance(pred_b, dict) else str(pred_b),\n",
|
||||
" input=input_,\n",
|
||||
" )\n",
|
||||
" if eval_res[\"value\"] == \"A\":\n",
|
||||
" preferences.append(a)\n",
|
||||
" elif eval_res[\"value\"] == \"B\":\n",
|
||||
" preferences.append(b)\n",
|
||||
" else:\n",
|
||||
" preferences.append(None) # No preference\n",
|
||||
" preferences.append(None) # No preference\n",
|
||||
" return preferences"
|
||||
]
|
||||
},
|
||||
@@ -298,10 +303,7 @@
|
||||
" \"b\": \"Structured Chat Agent\",\n",
|
||||
"}\n",
|
||||
"counts = Counter(preferences)\n",
|
||||
"pref_ratios = {\n",
|
||||
" k: v/len(preferences) for k, v in\n",
|
||||
" counts.items()\n",
|
||||
"}\n",
|
||||
"pref_ratios = {k: v / len(preferences) for k, v in counts.items()}\n",
|
||||
"for k, v in pref_ratios.items():\n",
|
||||
" print(f\"{name_map.get(k)}: {v:.2%}\")"
|
||||
]
|
||||
@@ -327,13 +329,16 @@
|
||||
"source": [
|
||||
"from math import sqrt\n",
|
||||
"\n",
|
||||
"def wilson_score_interval(preferences: list, which: str = \"a\", z: float = 1.96) -> tuple:\n",
|
||||
"\n",
|
||||
"def wilson_score_interval(\n",
|
||||
" preferences: list, which: str = \"a\", z: float = 1.96\n",
|
||||
") -> tuple:\n",
|
||||
" \"\"\"Estimate the confidence interval using the Wilson score.\n",
|
||||
" \n",
|
||||
"\n",
|
||||
" See: https://en.wikipedia.org/wiki/Binomial_proportion_confidence_interval#Wilson_score_interval\n",
|
||||
" for more details, including when to use it and when it should not be used.\n",
|
||||
" \"\"\"\n",
|
||||
" total_preferences = preferences.count('a') + preferences.count('b')\n",
|
||||
" total_preferences = preferences.count(\"a\") + preferences.count(\"b\")\n",
|
||||
" n_s = preferences.count(which)\n",
|
||||
"\n",
|
||||
" if total_preferences == 0:\n",
|
||||
@@ -342,8 +347,11 @@
|
||||
" p_hat = n_s / total_preferences\n",
|
||||
"\n",
|
||||
" denominator = 1 + (z**2) / total_preferences\n",
|
||||
" adjustment = (z / denominator) * sqrt(p_hat*(1-p_hat)/total_preferences + (z**2)/(4*total_preferences*total_preferences))\n",
|
||||
" center = (p_hat + (z**2) / (2*total_preferences)) / denominator\n",
|
||||
" adjustment = (z / denominator) * sqrt(\n",
|
||||
" p_hat * (1 - p_hat) / total_preferences\n",
|
||||
" + (z**2) / (4 * total_preferences * total_preferences)\n",
|
||||
" )\n",
|
||||
" center = (p_hat + (z**2) / (2 * total_preferences)) / denominator\n",
|
||||
" lower_bound = min(max(center - adjustment, 0.0), 1.0)\n",
|
||||
" upper_bound = min(max(center + adjustment, 0.0), 1.0)\n",
|
||||
"\n",
|
||||
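For reference, the interval the cell above computes is the standard Wilson score interval (this restates what the code already does, it is not a new result):

$$\left(\hat{p} + \frac{z^2}{2n}\right)\Bigg/\left(1 + \frac{z^2}{n}\right) \;\pm\; \frac{z}{1 + \frac{z^2}{n}}\sqrt{\frac{\hat{p}(1-\hat{p})}{n} + \frac{z^2}{4n^2}}$$

where $\hat{p}$ is the observed preference fraction, $n$ is the number of non-tied preferences, and $z \approx 1.96$ for 95% confidence.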
@@ -369,7 +377,9 @@
|
||||
"source": [
|
||||
"for which_, name in name_map.items():\n",
|
||||
" low, high = wilson_score_interval(preferences, which=which_)\n",
|
||||
" print(f'The \"{name}\" would be preferred between {low:.2%} and {high:.2%} percent of the time (with 95% confidence).')"
|
||||
" print(\n",
|
||||
" f'The \"{name}\" would be preferred between {low:.2%} and {high:.2%} percent of the time (with 95% confidence).'\n",
|
||||
" )"
|
||||
]
|
||||
},
|
||||
{
|
||||
@@ -398,13 +408,16 @@
|
||||
],
|
||||
"source": [
|
||||
"from scipy import stats\n",
|
||||
"\n",
|
||||
"preferred_model = max(pref_ratios, key=pref_ratios.get)\n",
|
||||
"successes = preferences.count(preferred_model)\n",
|
||||
"n = len(preferences) - preferences.count(None)\n",
|
||||
"p_value = stats.binom_test(successes, n, p=0.5, alternative='two-sided')\n",
|
||||
"print(f\"\"\"The p-value is {p_value:.5f}. If the null hypothesis is true (i.e., if the selected eval chain actually has no preference between the models),\n",
|
||||
"p_value = stats.binom_test(successes, n, p=0.5, alternative=\"two-sided\")\n",
|
||||
"print(\n",
|
||||
" f\"\"\"The p-value is {p_value:.5f}. If the null hypothesis is true (i.e., if the selected eval chain actually has no preference between the models),\n",
|
||||
"then there is a {p_value:.5%} chance of observing the {name_map.get(preferred_model)} be preferred at least {successes}\n",
|
||||
"times out of {n} trials.\"\"\")"
|
||||
"times out of {n} trials.\"\"\"\n",
|
||||
")"
|
||||
]
|
||||
},
|
||||
{
|
||||
|
||||
@@ -12,7 +12,7 @@
|
||||
"The `CriteriaEvalChain` is a convenient way to predict whether an LLM or Chain's output complies with a set of criteria, so long as you can\n",
|
||||
"describe those criteria in regular language. In this example, you will use the `CriteriaEvalChain` to check whether an output is concise.\n",
|
||||
"\n",
|
||||
"### Step 1: Create the Eval Chain\n",
|
||||
"### Step 1: Load Eval Chain\n",
|
||||
"\n",
|
||||
"First, create the evaluation chain to predict whether outputs are \"concise\"."
|
||||
]
|
||||
@@ -27,11 +27,15 @@
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langchain.chat_models import ChatOpenAI\n",
|
||||
"from langchain.evaluation.criteria import CriteriaEvalChain\n",
|
||||
"from langchain.evaluation import load_evaluator, EvaluatorType\n",
|
||||
"\n",
|
||||
"llm = ChatOpenAI(temperature=0)\n",
|
||||
"eval_llm = ChatOpenAI(model=\"gpt-4\", temperature=0)\n",
|
||||
"criterion = \"conciseness\"\n",
|
||||
"eval_chain = CriteriaEvalChain.from_llm(llm=llm, criteria=criterion)"
|
||||
"eval_chain = load_evaluator(EvaluatorType.CRITERIA, llm=eval_llm, criteria=criterion)\n",
|
||||
"\n",
|
||||
"# Equivalent to:\n",
|
||||
"# from langchain.evaluation import CriteriaEvalChain\n",
|
||||
"# CriteriaEvalChain.from_llm(llm=eval_llm, criteria=criterion)"
|
||||
]
|
||||
},
|
||||
{
|
||||
@@ -54,7 +58,7 @@
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"llm = ChatOpenAI(temperature=0)\n",
|
||||
"query=\"What's the origin of the term synecdoche?\"\n",
|
||||
"query = \"What's the origin of the term synecdoche?\"\n",
|
||||
"prediction = llm.predict(query)"
|
||||
]
|
||||
},
|
||||
@@ -80,7 +84,7 @@
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"{'reasoning': '1. Conciseness: The submission is concise and to the point. It directly answers the question without any unnecessary information. Therefore, the submission meets the criterion of conciseness.\\n\\nY', 'value': 'Y', 'score': 1}\n"
|
||||
"{'reasoning': 'The criterion for this task is conciseness. The submission should be concise and to the point.\\n\\nLooking at the submission, it provides a detailed explanation of the origin of the term \"synecdoche\". It explains the Greek roots of the word and how it entered the English language. \\n\\nWhile the explanation is detailed, it is also concise. It doesn\\'t include unnecessary information or go off on tangents. It sticks to the point, which is explaining the origin of the term.\\n\\nTherefore, the submission meets the criterion of conciseness.\\n\\nY', 'value': 'Y', 'score': 1}\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
@@ -89,40 +93,6 @@
|
||||
"print(eval_result)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 4,
|
||||
"id": "8c4ec9dd-6557-4f23-8480-c822eb6ec552",
|
||||
"metadata": {
|
||||
"tags": []
|
||||
},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"['conciseness',\n",
|
||||
" 'relevance',\n",
|
||||
" 'correctness',\n",
|
||||
" 'coherence',\n",
|
||||
" 'harmfulness',\n",
|
||||
" 'maliciousness',\n",
|
||||
" 'helpfulness',\n",
|
||||
" 'controversiality',\n",
|
||||
" 'mysogyny',\n",
|
||||
" 'criminality',\n",
|
||||
" 'insensitive']"
|
||||
]
|
||||
},
|
||||
"execution_count": 4,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"# For a list of other default supported criteria, try calling `supported_default_criteria`\n",
|
||||
"CriteriaEvalChain.get_supported_default_criteria()"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "c40b1ac7-8f95-48ed-89a2-623bcc746461",
|
||||
@@ -133,6 +103,24 @@
|
||||
"Some criteria may be useful only when there are ground truth reference labels. You can pass these in as well."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 4,
|
||||
"id": "0c41cd19",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"eval_chain = load_evaluator(\n",
|
||||
" EvaluatorType.LABELED_CRITERIA,\n",
|
||||
" llm=eval_llm,\n",
|
||||
" criteria=\"correctness\",\n",
|
||||
")\n",
|
||||
"\n",
|
||||
"# Equivalent to\n",
|
||||
"# from langchain.evaluation import LabeledCriteriaEvalChain\n",
|
||||
"# LabeledCriteriaEvalChain.from_llm(llm=eval_llm, criteria=criterion)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 5,
|
||||
@@ -145,62 +133,18 @@
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"With ground truth: 1\n",
|
||||
"Withoutg ground truth: 0\n"
|
||||
"With ground truth: 1\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"eval_chain = CriteriaEvalChain.from_llm(llm=llm, criteria=\"correctness\", requires_reference=True)\n",
|
||||
"\n",
|
||||
"# We can even override the model's learned knowledge using ground truth labels\n",
|
||||
"eval_result = eval_chain.evaluate_strings(\n",
|
||||
" input=\"What is the capital of the US?\",\n",
|
||||
" prediction=\"Topeka, KS\", \n",
|
||||
" reference=\"The capital of the US is Topeka, KS, where it permanently moved from Washington D.C. on May 16, 2023\")\n",
|
||||
"print(f'With ground truth: {eval_result[\"score\"]}')\n",
|
||||
"\n",
|
||||
"eval_chain = CriteriaEvalChain.from_llm(llm=llm, criteria=\"correctness\")\n",
|
||||
"eval_result = eval_chain.evaluate_strings(\n",
|
||||
" input=\"What is the capital of the US?\",\n",
|
||||
" prediction=\"Topeka, KS\", \n",
|
||||
" prediction=\"Topeka, KS\",\n",
|
||||
" reference=\"The capital of the US is Topeka, KS, where it permanently moved from Washington D.C. on May 16, 2023\",\n",
|
||||
")\n",
|
||||
"print(f'Withoutg ground truth: {eval_result[\"score\"]}')"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "2eb7dedb-913a-4d9e-b48a-9521425d1008",
|
||||
"metadata": {
|
||||
"tags": []
|
||||
},
|
||||
"source": [
|
||||
"## Multiple Criteria\n",
|
||||
"\n",
|
||||
"To check whether an output complies with all of a list of default criteria, pass in a list! Be sure to only include criteria that are relevant to the provided information, and avoid mixing criteria that measure opposing things (e.g., harmfulness and helpfulness)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 6,
|
||||
"id": "50c067f7-bc6e-4d6c-ba34-97a72023be27",
|
||||
"metadata": {
|
||||
"tags": []
|
||||
},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"{'reasoning': 'Conciseness:\\n- The submission is one sentence long, which is concise.\\n- The submission directly answers the question without any unnecessary information.\\nConclusion: The submission meets the conciseness criterion.\\n\\nCoherence:\\n- The submission is well-structured and organized.\\n- The submission provides the origin of the term synecdoche and explains the meaning of the Greek words it comes from.\\n- The submission is coherent and easy to understand.\\nConclusion: The submission meets the coherence criterion.', 'value': 'Final conclusion: Y', 'score': None}\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"criteria = [\"conciseness\", \"coherence\"]\n",
|
||||
"eval_chain = CriteriaEvalChain.from_llm(llm=llm, criteria=criteria)\n",
|
||||
"eval_result = eval_chain.evaluate_strings(prediction=prediction, input=query)\n",
|
||||
"print(eval_result)"
|
||||
"print(f'With ground truth: {eval_result[\"score\"]}')"
|
||||
]
|
||||
},
|
||||
{
|
||||
@@ -217,7 +161,7 @@
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 7,
|
||||
"execution_count": 6,
|
||||
"id": "bafa0a11-2617-4663-84bf-24df7d0736be",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
@@ -225,58 +169,22 @@
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"{'reasoning': '1. Criteria: numeric: Does the output contain numeric information?\\n- The submission does not contain any numeric information.\\n- Conclusion: The submission meets the criteria.', 'value': 'Answer: Y', 'score': None}\n"
|
||||
"{'reasoning': 'The criterion is asking if the output contains numeric information. The submission does mention the \"late 16th century,\" which is a numeric information. Therefore, the submission meets the criterion.\\n\\nY', 'value': 'Y', 'score': 1}\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"custom_criterion = {\n",
|
||||
" \"numeric\": \"Does the output contain numeric information?\"\n",
|
||||
"}\n",
|
||||
"custom_criterion = {\"numeric\": \"Does the output contain numeric information?\"}\n",
|
||||
"\n",
|
||||
"eval_chain = CriteriaEvalChain.from_llm(llm=llm, criteria=custom_criterion)\n",
|
||||
"eval_chain = load_evaluator(\n",
|
||||
" EvaluatorType.CRITERIA,\n",
|
||||
" llm=eval_llm,\n",
|
||||
" criteria=custom_criterion,\n",
|
||||
")\n",
|
||||
"eval_result = eval_chain.evaluate_strings(prediction=prediction, input=query)\n",
|
||||
"print(eval_result)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 8,
|
||||
"id": "6db12a16-0058-4a14-8064-8528540963d8",
|
||||
"metadata": {
|
||||
"tags": []
|
||||
},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"Meets criteria: 1\n",
|
||||
"Does not meet criteria: 0\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"# You can specify multiple criteria in the dictionary. We recommend you keep the number criteria to a minimum, however for more reliable results.\n",
|
||||
"\n",
|
||||
"custom_criteria = {\n",
|
||||
" \"complements-user\": \"Does the submission complements the question or the person writing the question in some way?\",\n",
|
||||
" \"positive\": \"Does the submission maintain a positive sentiment throughout?\",\n",
|
||||
" \"active voice\": \"Does the submission maintain an active voice throughout, avoiding state of being verbs?\",\n",
|
||||
"}\n",
|
||||
"\n",
|
||||
"eval_chain = CriteriaEvalChain.from_llm(llm=llm, criteria=custom_criteria)\n",
|
||||
"\n",
|
||||
"# Example that complies\n",
|
||||
"query = \"What's the population of lagos?\"\n",
|
||||
"eval_result = eval_chain.evaluate_strings(prediction=\"I think that's a great question, you're really curious! About 30 million people live in Lagos, Nigeria, as of 2023.\", input=query)\n",
|
||||
"print(\"Meets criteria: \", eval_result[\"score\"])\n",
|
||||
"\n",
|
||||
"# Example that does not comply\n",
|
||||
"eval_result = eval_chain.evaluate_strings(prediction=\"The population of Lagos, Nigeria, is about 30 million people.\", input=query)\n",
|
||||
"print(\"Does not meet criteria: \", eval_result[\"score\"])"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "07485cce-8d52-43a0-bdad-76ec7dacfb51",
|
||||
@@ -290,7 +198,7 @@
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 9,
|
||||
"execution_count": 7,
|
||||
"id": "99e3c242-5b12-4bd5-b487-64990a159655",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
@@ -316,7 +224,7 @@
|
||||
" ConstitutionalPrinciple(critique_request='Discuss the assistant’s last response to the human and whether it is in any way insensitive, sexist, racist, or socially inappropriate.', revision_request='Revise the assistant’s last response to remove content that is in any way insensitive, sexist, racist, or socially inappropriate.', name='insensitive'))]"
|
||||
]
|
||||
},
|
||||
"execution_count": 9,
|
||||
"execution_count": 7,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
@@ -330,7 +238,7 @@
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 10,
|
||||
"execution_count": 8,
|
||||
"id": "f4061bc0-a88a-4d1c-88a0-8003bba084c7",
|
||||
"metadata": {
|
||||
"tags": []
|
||||
@@ -339,20 +247,25 @@
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"{'reasoning': 'harmful1: The submission includes a derogatory and insulting term (\"lilly-livered nincompoop\") to describe Will, which could be considered harmful and disrespectful. Therefore, the submission meets this criterion.\\nharmful2: The submission does not contain any explicit encouragement of dangerous or illegal activity, nor does it contain any explicit racism or sexism. However, the use of the term \"nincompoop\" could be seen as belittling and harmful to Will\\'s self-esteem. Therefore, the submission meets this criterion as well.\\n\\nY',\n",
|
||||
" 'value': 'Y',\n",
|
||||
" 'score': 1}"
|
||||
"{'reasoning': 'The criterion asks to identify if the assistant\\'s response is harmful, unethical, racist, sexist, toxic, dangerous, or illegal.\\n\\nThe assistant\\'s response is \"I say that man is a lilly-livered nincompoop\". This is a negative statement about a person named Will. \\n\\nThe term \"lilly-livered\" is an old-fashioned term meaning cowardly, and \"nincompoop\" is a term meaning a foolish or stupid person. \\n\\nWhile this is a negative and potentially insulting statement, it does not appear to be harmful in a physical sense, unethical, racist, sexist, toxic, dangerous, or illegal. \\n\\nIt could be argued that the statement is harmful in a psychological or emotional sense, as it is insulting. However, without further context, it\\'s hard to definitively say if this meets the criteria of being harmful.\\n\\nN',\n",
|
||||
" 'value': 'N',\n",
|
||||
" 'score': 0}"
|
||||
]
|
||||
},
|
||||
"execution_count": 10,
|
||||
"execution_count": 8,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"eval_chain = CriteriaEvalChain.from_llm(llm=llm, criteria=[PRINCIPLES[\"harmful1\"], PRINCIPLES[\"harmful2\"]])\n",
|
||||
"eval_result = eval_chain.evaluate_strings(prediction=\"I say that man is a lilly-livered nincompoop\", input=\"What do you think of Will?\")\n",
|
||||
"eval_result"
|
||||
"eval_chain = load_evaluator(\n",
|
||||
" EvaluatorType.CRITERIA, llm=eval_llm, criteria=PRINCIPLES[\"harmful1\"]\n",
|
||||
")\n",
|
||||
"eval_result = eval_chain.evaluate_strings(\n",
|
||||
" prediction=\"I say that man is a lilly-livered nincompoop\",\n",
|
||||
" input=\"What do you think of Will?\",\n",
|
||||
")\n",
|
||||
"print(eval_result)"
|
||||
]
|
||||
},
|
||||
{
|
||||
@@ -366,14 +279,6 @@
|
||||
"\n",
|
||||
"Remember when selecting criteria to decide whether they ought to require ground truth labels or not. Things like \"correctness\" are best evaluated with ground truth or with extensive context. Also, remember to pick aligned principles for a given chain so that the classification makes sense."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "415eb393-c64f-41f1-98de-de99e8e3597e",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": []
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
@@ -392,7 +297,7 @@
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.11.3"
|
||||
"version": "3.11.2"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
|
||||
@@ -339,9 +339,9 @@
" agent_trajectory=test_outputs_one[\"intermediate_steps\"],\n",
" reference=(\n",
" \"You need many more than 100,000 ping-pong balls in the empire state building.\"\n",
" )\n",
" ),\n",
")\n",
" \n",
"\n",
"\n",
"print(\"Score from 1 to 5: \", evaluation[\"score\"])\n",
"print(\"Reasoning: \", evaluation[\"reasoning\"])"
575
docs/extras/guides/langsmith/walkthrough.ipynb
Normal file
@@ -0,0 +1,575 @@
|
||||
{
|
||||
"cells": [
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "1a4596ea-a631-416d-a2a4-3577c140493d",
|
||||
"metadata": {
|
||||
"tags": []
|
||||
},
|
||||
"source": [
|
||||
"# LangSmith Walkthrough\n",
|
||||
"\n",
|
||||
"LangChain makes it easy to prototype LLM applications and Agents. However, delivering LLM applications to production can be deceptively difficult. You will likely have to heavily customize and iterate on your prompts, chains, and other components to create a high-quality product.\n",
|
||||
"\n",
|
||||
"To aid in this process, we've launched LangSmith, a unified platform for debugging, testing, and monitoring your LLM applications.\n",
|
||||
"\n",
|
||||
"When might this come in handy? You may find it useful when you want to:\n",
|
||||
"\n",
|
||||
"- Quickly debug a new chain, agent, or set of tools\n",
|
||||
"- Visualize how components (chains, llms, retrievers, etc.) relate and are used\n",
|
||||
"- Evaluate different prompts and LLMs for a single component\n",
|
||||
"- Run a given chain several times over a dataset to ensure it consistently meets a quality bar\n",
|
||||
"- Capture usage traces and using LLMs or analytics pipelines to generate insights"
|
||||
]
|
||||
},
|
||||
{
|
||||
"attachments": {},
|
||||
"cell_type": "markdown",
|
||||
"id": "138fbb8f-960d-4d26-9dd5-6d6acab3ee55",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Prerequisites\n",
|
||||
"\n",
|
||||
"**[Create a LangSmith account](https://smith.langchain.com/) and create an API key (see bottom left corner). Familiarize yourself with the platform by looking through the [docs](https://docs.smith.langchain.com/)**\n",
|
||||
"\n",
|
||||
"Note LangSmith is in closed beta; we're in the process of rolling it out to more users. However, you can fill out the form on the website for expedited access.\n",
|
||||
"\n",
|
||||
"Now, let's get started!"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "2d77d064-41b4-41fb-82e6-2d16461269ec",
|
||||
"metadata": {
|
||||
"tags": []
|
||||
},
|
||||
"source": [
|
||||
"## Log runs to LangSmith\n",
|
||||
"\n",
|
||||
"First, configure your environment variables to tell LangChain to log traces. This is done by setting the `LANGCHAIN_TRACING_V2` environment variable to true.\n",
|
||||
"You can tell LangChain which project to log to by setting the `LANGCHAIN_PROJECT` environment variable (if this isn't set, runs will be logged to the `default` project). This will automatically create the project for you if it doesn't exist. You must also set the `LANGCHAIN_ENDPOINT` and `LANGCHAIN_API_KEY` environment variables.\n",
|
||||
"\n",
|
||||
"For more information on other ways to set up tracing, please reference the [LangSmith documentation](https://docs.smith.langchain.com/docs/)\n",
|
||||
"\n",
|
||||
"**NOTE:** You must also set your `OPENAI_API_KEY` and `SERPAPI_API_KEY` environment variables in order to run the following tutorial.\n",
|
||||
"\n",
|
||||
"**NOTE:** You can only access an API key when you first create it. Keep it somewhere safe.\n",
|
||||
"\n",
|
||||
"**NOTE:** You can also use a context manager in python to log traces using\n",
|
||||
"```python\n",
|
||||
"from langchain.callbacks.manager import tracing_v2_enabled\n",
|
||||
"\n",
|
||||
"with tracing_v2_enabled(project_name=\"My Project\"):\n",
|
||||
" agent.run(\"How many people live in canada as of 2023?\")\n",
|
||||
"```\n",
|
||||
"\n",
|
||||
"However, in this example, we will use environment variables."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 21,
|
||||
"id": "904db9a5-f387-4a57-914c-c8af8d39e249",
|
||||
"metadata": {
|
||||
"tags": []
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"import os\n",
|
||||
"from uuid import uuid4\n",
|
||||
"\n",
|
||||
"unique_id = uuid4().hex[0:8]\n",
|
||||
"os.environ[\"LANGCHAIN_TRACING_V2\"] = \"true\"\n",
|
||||
"os.environ[\"LANGCHAIN_PROJECT\"] = f\"Tracing Walkthrough - {unique_id}\"\n",
|
||||
"os.environ[\"LANGCHAIN_ENDPOINT\"] = \"https://api.smith.langchain.com\"\n",
|
||||
"os.environ[\"LANGCHAIN_API_KEY\"] = \"\" # Update to your API key\n",
|
||||
"\n",
|
||||
"# Used by the agent in this tutorial\n",
|
||||
"# os.environ[\"OPENAI_API_KEY\"] = \"<YOUR-OPENAI-API-KEY>\"\n",
|
||||
"# os.environ[\"SERPAPI_API_KEY\"] = \"<YOUR-SERPAPI-API-KEY>\""
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "8ee7f34b-b65c-4e09-ad52-e3ace78d0221",
|
||||
"metadata": {
|
||||
"tags": []
|
||||
},
|
||||
"source": [
|
||||
"Create the langsmith client to interact with the API"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 12,
|
||||
"id": "510b5ca0",
|
||||
"metadata": {
|
||||
"tags": []
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langsmith import Client\n",
|
||||
"\n",
|
||||
"client = Client()"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "ca27fa11-ddce-4af0-971e-c5c37d5b92ef",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Create a LangChain component and log runs to the platform. In this example, we will create a ReAct-style agent with access to Search and Calculator as tools. However, LangSmith works regardless of which type of LangChain component you use (LLMs, Chat Models, Tools, Retrievers, Agents are all supported)."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 13,
|
||||
"id": "7c801853-8e96-404d-984c-51ace59cbbef",
|
||||
"metadata": {
|
||||
"tags": []
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langchain.chat_models import ChatOpenAI\n",
|
||||
"from langchain.agents import AgentType, initialize_agent, load_tools\n",
|
||||
"\n",
|
||||
"llm = ChatOpenAI(temperature=0)\n",
|
||||
"tools = load_tools([\"serpapi\", \"llm-math\"], llm=llm)\n",
|
||||
"agent = initialize_agent(\n",
|
||||
" tools, llm, agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION, verbose=False\n",
|
||||
")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "cab51e1e-8270-452c-ba22-22b5b5951899",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"We are running the agent concurrently on multiple inputs to reduce latency. Runs get logged to LangSmith in the background so execution latency is unaffected."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 14,
|
||||
"id": "19537902-b95c-4390-80a4-f6c9a937081e",
|
||||
"metadata": {
|
||||
"tags": []
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"import asyncio\n",
|
||||
"\n",
|
||||
"inputs = [\n",
|
||||
" \"How many people live in canada as of 2023?\",\n",
|
||||
" \"who is dua lipa's boyfriend? what is his age raised to the .43 power?\",\n",
|
||||
" \"what is dua lipa's boyfriend age raised to the .43 power?\",\n",
|
||||
" \"how far is it from paris to boston in miles\",\n",
|
||||
" \"what was the total number of points scored in the 2023 super bowl? what is that number raised to the .23 power?\",\n",
|
||||
" \"what was the total number of points scored in the 2023 super bowl raised to the .23 power?\",\n",
|
||||
" \"how many more points were scored in the 2023 super bowl than in the 2022 super bowl?\",\n",
|
||||
" \"what is 153 raised to .1312 power?\",\n",
|
||||
" \"who is kendall jenner's boyfriend? what is his height (in inches) raised to .13 power?\",\n",
|
||||
" \"what is 1213 divided by 4345?\",\n",
|
||||
"]\n",
|
||||
"results = []\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"async def arun(agent, input_example):\n",
|
||||
" try:\n",
|
||||
" return await agent.arun(input_example)\n",
|
||||
" except Exception as e:\n",
|
||||
" # The agent sometimes makes mistakes! These will be captured by the tracing.\n",
|
||||
" return e\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"for input_example in inputs:\n",
|
||||
" results.append(arun(agent, input_example))\n",
|
||||
"results = await asyncio.gather(*results)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 15,
|
||||
"id": "0405ff30-21fe-413d-85cf-9fa3c649efec",
|
||||
"metadata": {
|
||||
"tags": []
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langchain.callbacks.tracers.langchain import wait_for_all_tracers\n",
|
||||
"\n",
|
||||
"# Logs are submitted in a background thread to avoid blocking execution.\n",
|
||||
"# For the sake of this tutorial, we want to make sure\n",
|
||||
"# they've been submitted before moving on. This is also\n",
|
||||
"# useful for serverless deployments.\n",
|
||||
"wait_for_all_tracers()"
|
||||
]
|
||||
},
|
||||
{
|
||||
"attachments": {},
|
||||
"cell_type": "markdown",
|
||||
"id": "9decb964-be07-4b6c-9802-9825c8be7b64",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Assuming you've successfully set up your environment, your agent traces should show up in the `Projects` section in the [app](https://smith.langchain.com/). Congrats!"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "6c43c311-4e09-4d57-9ef3-13afb96ff430",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Evaluate another agent implementation\n",
|
||||
"\n",
|
||||
"In addition to logging runs, LangSmith also allows you to test and evaluate your LLM applications.\n",
|
||||
"\n",
|
||||
"In this section, you will leverage LangSmith to create a benchmark dataset and run AI-assisted evaluators on an agent. You will do so in a few steps:\n",
|
||||
"\n",
|
||||
"1. Create a dataset from pre-existing run inputs and outputs\n",
|
||||
"2. Initialize a new agent to benchmark\n",
|
||||
"3. Configure evaluators to grade an agent's output\n",
|
||||
"4. Run the agent over the dataset and evaluate the results"
|
||||
]
|
||||
},
|
||||
{
|
||||
"attachments": {},
|
||||
"cell_type": "markdown",
|
||||
"id": "beab1a29-b79d-4a99-b5b1-0870c2d772b1",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### 1. Create a LangSmith dataset\n",
|
||||
"\n",
|
||||
"Below, we use the LangSmith client to create a dataset from the agent runs you just logged above. You will use these later to measure performance for a new agent. This is simply taking the inputs and outputs of the runs and saving them as examples to a dataset. A dataset is a collection of examples, which are nothing more than input-output pairs you can use as test cases to your application.\n",
|
||||
"\n",
|
||||
"**Note: this is a simple, walkthrough example. In a real-world setting, you'd ideally first validate the outputs before adding them to a benchmark dataset to be used for evaluating other agents.**\n",
|
||||
"\n",
|
||||
"For more information on datasets, including how to create them from CSVs or other files or how to create them in the platform, please refer to the [LangSmith documentation](https://docs.smith.langchain.com/)."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 16,
|
||||
"id": "17580c4b-bd04-4dde-9d21-9d4edd25b00d",
|
||||
"metadata": {
|
||||
"tags": []
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"dataset_name = f\"calculator-example-dataset-{unique_id}\"\n",
|
||||
"\n",
|
||||
"dataset = client.create_dataset(\n",
|
||||
" dataset_name, description=\"A calculator example dataset\"\n",
|
||||
")\n",
|
||||
"\n",
|
||||
"runs = client.list_runs(\n",
|
||||
" project_name=os.environ[\"LANGCHAIN_PROJECT\"],\n",
|
||||
" execution_order=1, # Only return the top-level runs\n",
|
||||
" error=False, # Only runs that succeed\n",
|
||||
")\n",
|
||||
"for run in runs:\n",
|
||||
" client.create_example(inputs=run.inputs, outputs=run.outputs, dataset_id=dataset.id)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "8adfd29c-b258-49e5-94b4-74597a12ba16",
|
||||
"metadata": {
|
||||
"tags": []
|
||||
},
|
||||
"source": [
|
||||
"### 2. Initialize a new agent to benchmark\n",
|
||||
"\n",
|
||||
"You can evaluate any LLM, chain, or agent. Since chains can have memory, we will pass in a `chain_factory` (aka a `constructor` ) function to initialize for each call.\n",
|
||||
"\n",
|
||||
"In this case, we will test an agent that uses OpenAI's function calling endpoints."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 17,
|
||||
"id": "f42d8ecc-d46a-448b-a89c-04b0f6907f75",
|
||||
"metadata": {
|
||||
"tags": []
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langchain.chat_models import ChatOpenAI\n",
|
||||
"from langchain.agents import AgentType, initialize_agent, load_tools\n",
|
||||
"\n",
|
||||
"llm = ChatOpenAI(model=\"gpt-3.5-turbo-0613\", temperature=0)\n",
|
||||
"tools = load_tools([\"serpapi\", \"llm-math\"], llm=llm)\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"# Since chains can be stateful (e.g. they can have memory), we provide\n",
|
||||
"# a way to initialize a new chain for each row in the dataset. This is done\n",
|
||||
"# by passing in a factory function that returns a new chain for each row.\n",
|
||||
"def agent_factory():\n",
|
||||
" return initialize_agent(tools, llm, agent=AgentType.OPENAI_FUNCTIONS, verbose=False)\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"# If your chain is NOT stateful, your factory can return the object directly\n",
|
||||
"# to improve runtime performance. For example:\n",
|
||||
"# chain_factory = lambda: agent"
|
||||
]
|
||||
},
|
||||
{
|
||||
"attachments": {},
|
||||
"cell_type": "markdown",
|
||||
"id": "9cb9ef53",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### 3. Configure evaluation\n",
|
||||
"\n",
|
||||
"Manually comparing the results of chains in the UI is effective, but it can be time consuming.\n",
|
||||
"It can be helpful to use automated metrics and AI-assisted feedback to evaluate your component's performance.\n",
|
||||
"\n",
|
||||
"Below, we will create some pre-implemented run evaluators that do the following:\n",
|
||||
"- Compare results against ground truth labels. (You used the debug outputs above for this)\n",
|
||||
"- Measure semantic (dis)similarity using embedding distance\n",
|
||||
"- Evaluate 'aspects' of the agent's response in a reference-free manner using custom criteria\n",
|
||||
"\n",
|
||||
"For a longer discussion of how to select an appropriate evaluator for your use case and how to create your own\n",
|
||||
"custom evaluators, please refer to the [LangSmith documentation](https://docs.smith.langchain.com/).\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 19,
|
||||
"id": "a25dc281",
|
||||
"metadata": {
|
||||
"tags": []
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langchain.evaluation import EvaluatorType\n",
|
||||
"from langchain.smith import RunEvalConfig\n",
|
||||
"\n",
|
||||
"evaluation_config = RunEvalConfig(\n",
|
||||
" # Evaluators can either be an evaluator type (e.g., \"qa\", \"criteria\", \"embedding_distance\", etc.) or a configuration for that evaluator\n",
|
||||
" evaluators=[\n",
|
||||
" # Measures whether a QA response is \"Correct\", based on a reference answer\n",
|
||||
" # You can also select via the raw string \"qa\"\n",
|
||||
" EvaluatorType.QA,\n",
|
||||
" # Measure the embedding distance between the output and the reference answer\n",
|
||||
" # Equivalent to: EvalConfig.EmbeddingDistance(embeddings=OpenAIEmbeddings())\n",
|
||||
" EvaluatorType.EMBEDDING_DISTANCE,\n",
|
||||
" # Grade whether the output satisfies the stated criteria. You can select a default one such as \"helpfulness\" or provide your own.\n",
|
||||
" RunEvalConfig.LabeledCriteria(\"helpfulness\"),\n",
|
||||
" # Both the Criteria and LabeledCriteria evaluators can be configured with a dictionary of custom criteria.\n",
|
||||
" RunEvalConfig.Criteria(\n",
|
||||
" {\n",
|
||||
" \"fifth-grader-score\": \"Do you have to be smarter than a fifth grader to answer this question?\"\n",
|
||||
" }\n",
|
||||
" ),\n",
|
||||
" ],\n",
|
||||
" # You can add custom StringEvaluator or RunEvaluator objects here as well, which will automatically be\n",
|
||||
" # applied to each prediction. Check out the docs for examples.\n",
|
||||
" custom_evaluators=[],\n",
|
||||
")"
|
||||
]
|
||||
},
|
||||
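{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Not part of the original walkthrough: a minimal sketch of a custom string evaluator that could be passed via `custom_evaluators=[...]`. It assumes LangChain's `StringEvaluator` interface (an `_evaluate_strings` method returning a dict with a `score`); the `brevity` criterion itself is hypothetical."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langchain.evaluation import StringEvaluator\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"class BrevityEvaluator(StringEvaluator):\n",
|
||||
"    \"\"\"Hypothetical evaluator: full score when the prediction stays short.\"\"\"\n",
|
||||
"\n",
|
||||
"    @property\n",
|
||||
"    def evaluation_name(self) -> str:\n",
|
||||
"        return \"brevity\"\n",
|
||||
"\n",
|
||||
"    def _evaluate_strings(self, *, prediction, reference=None, input=None, **kwargs):\n",
|
||||
"        # Score 1 if the answer stays under 280 characters, else 0.\n",
|
||||
"        return {\"score\": int(len(prediction) < 280)}\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"# e.g. RunEvalConfig(evaluators=[...], custom_evaluators=[BrevityEvaluator()])"
|
||||
]
|
||||
},
|
||||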
{
|
||||
"attachments": {},
|
||||
"cell_type": "markdown",
|
||||
"id": "07885b10",
|
||||
"metadata": {
|
||||
"tags": []
|
||||
},
|
||||
"source": [
|
||||
"### 4. Run the agent and evaluators\n",
|
||||
"\n",
|
||||
"Use the [arun_on_dataset](https://api.python.langchain.com/en/latest/smith/langchain.smith.evaluation.runner_utils.arun_on_dataset.html#langchain.smith.evaluation.runner_utils.arun_on_dataset) (or synchronous [run_on_dataset](https://api.python.langchain.com/en/latest/smith/langchain.smith.evaluation.runner_utils.run_on_dataset.html#langchain.smith.evaluation.runner_utils.run_on_dataset)) function to evaluate your model. This will:\n",
|
||||
"1. Fetch example rows from the specified dataset\n",
|
||||
"2. Run your llm or chain on each example.\n",
|
||||
"3. Apply evalutors to the resulting run traces and corresponding reference examples to generate automated feedback.\n",
|
||||
"\n",
|
||||
"The results will be visible in the LangSmith app."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 20,
|
||||
"id": "3733269b-8085-4644-9d5d-baedcff13a2f",
|
||||
"metadata": {
|
||||
"tags": []
|
||||
},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"Processed examples: 1\r"
|
||||
]
|
||||
},
|
||||
{
|
||||
"name": "stderr",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"Chain failed for example 85f3a543-0429-48ae-be23-f48f0d903530. Error: LLMMathChain._evaluate(\"\n",
|
||||
"age_of_Dua_Lipa_boyfriend ** 0.43\n",
|
||||
"\") raised error: 'age_of_Dua_Lipa_boyfriend'. Please try again with a valid numerical expression\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"Processed examples: 6\r"
|
||||
]
|
||||
},
|
||||
{
|
||||
"name": "stderr",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"Chain failed for example 97d0d138-e9b3-4825-af2c-42789c66c0d4. Error: Too many arguments to single-input tool Calculator. Args: ['height ^ 0.13', {'height': 72}]\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"Processed examples: 9\r"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"from langchain.smith import (\n",
|
||||
" arun_on_dataset,\n",
|
||||
" run_on_dataset, # Available if your chain doesn't support async calls.\n",
|
||||
")\n",
|
||||
"\n",
|
||||
"chain_results = await arun_on_dataset(\n",
|
||||
" client=client,\n",
|
||||
" dataset_name=dataset_name,\n",
|
||||
" llm_or_chain_factory=agent_factory,\n",
|
||||
" evaluation=evaluation_config,\n",
|
||||
" verbose=True,\n",
|
||||
" tags=[\"testing-notebook\"], # Optional, adds a tag to the resulting chain runs\n",
|
||||
")\n",
|
||||
"\n",
|
||||
"# Sometimes, the agent will error due to parsing issues, incompatible tool inputs, etc.\n",
|
||||
"# These are logged as warnings here and captured as errors in the tracing UI."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "cdacd159-eb4d-49e9-bb2a-c55322c40ed4",
|
||||
"metadata": {
|
||||
"tags": []
|
||||
},
|
||||
"source": [
|
||||
"### Review the test results\n",
|
||||
"\n",
|
||||
"You can review the test results tracing UI below by navigating to the \"Datasets & Testing\" page and selecting the **\"calculator-example-dataset-*\"** dataset, clicking on the `Test Runs` tab, then inspecting the runs in the corresponding project. \n",
|
||||
"\n",
|
||||
"This will show the new runs and the feedback logged from the selected evaluators. Note that runs that error out will not have feedback."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "591c819e-9932-45cf-adab-63727dd49559",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Exporting datasets and runs\n",
|
||||
"\n",
|
||||
"LangSmith lets you export data to common formats such as CSV or JSONL directly in the web app. You can also use the client to fetch runs for further analysis, to store in your own database, or to share with others. Let's fetch the run traces from the evaluation run."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 14,
|
||||
"id": "33bfefde-d1bb-4f50-9f7a-fd572ee76820",
|
||||
"metadata": {
|
||||
"tags": []
|
||||
},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"Run(id=UUID('eb71a98c-660b-45e4-904e-e1567fdec145'), name='AgentExecutor', start_time=datetime.datetime(2023, 7, 13, 8, 23, 35, 102907), run_type=<RunTypeEnum.chain: 'chain'>, end_time=datetime.datetime(2023, 7, 13, 8, 23, 37, 793962), extra={'runtime': {'library': 'langchain', 'runtime': 'python', 'platform': 'macOS-13.4.1-arm64-arm-64bit', 'sdk_version': '0.0.5', 'library_version': '0.0.231', 'runtime_version': '3.11.2'}, 'total_tokens': 512, 'prompt_tokens': 451, 'completion_tokens': 61}, error=None, serialized=None, events=[{'name': 'start', 'time': '2023-07-13T08:23:35.102907'}, {'name': 'end', 'time': '2023-07-13T08:23:37.793962'}], inputs={'input': 'what is 1213 divided by 4345?'}, outputs={'output': '1213 divided by 4345 is approximately 0.2792.'}, reference_example_id=UUID('d343add7-2631-417b-905a-dc39361ace69'), parent_run_id=None, tags=['openai-functions', 'testing-notebook'], execution_order=1, session_id=UUID('cc5f4f88-f1bf-495f-8adb-384f66321eb2'), child_run_ids=[UUID('daa9708a-ad08-4be1-9841-e92e2f384cce'), UUID('28b1ada7-3fe8-4853-a5b0-dac8a93a3066'), UUID('dc0b4867-3f3d-46f7-bfb5-f4be10f3cc52'), UUID('58c9494e-2ea6-4291-ab78-73b8ffcdaef5'), UUID('8f5a3e08-ce96-4c81-a6aa-86bf5b3bb590'), UUID('f0447532-7ded-45b6-9d87-f1fa18e381b0')], child_runs=None, feedback_stats={'correctness': {'n': 1, 'avg': 1.0, 'mode': 1}, 'helpfulness': {'n': 1, 'avg': 1.0, 'mode': 1}, 'fifth-grader-score': {'n': 1, 'avg': 0.0, 'mode': 0}, 'embedding_cosine_distance': {'n': 1, 'avg': 0.144522385071361, 'mode': 0.144522385071361}})"
|
||||
]
|
||||
},
|
||||
"execution_count": 14,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"runs = list(client.list_runs(dataset_name=dataset_name))\n",
|
||||
"runs[0]"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 19,
|
||||
"id": "6595c888-1f5c-4ae3-9390-0a559f5575d1",
|
||||
"metadata": {
|
||||
"tags": []
|
||||
},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"{'correctness': {'n': 7, 'avg': 0.7142857142857143, 'mode': 1},\n",
|
||||
" 'helpfulness': {'n': 7, 'avg': 1.0, 'mode': 1},\n",
|
||||
" 'fifth-grader-score': {'n': 7, 'avg': 0.7142857142857143, 'mode': 1},\n",
|
||||
" 'embedding_cosine_distance': {'n': 7,\n",
|
||||
" 'avg': 0.08308464442094905,\n",
|
||||
" 'mode': 0.00371031210788608}}"
|
||||
]
|
||||
},
|
||||
"execution_count": 19,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"client.read_project(project_id=runs[0].session_id).feedback_stats"
|
||||
]
|
||||
},
|
||||
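{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Beyond inspecting a single run, you may want the whole list on disk. A minimal sketch (not from the original notebook) that writes the fetched runs to JSONL, assuming each `Run` is a pydantic model exposing `.dict()`:"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"import json\n",
|
||||
"\n",
|
||||
"# Serialize each fetched run to one JSON line; str() handles UUIDs and datetimes.\n",
|
||||
"with open(\"evaluation_runs.jsonl\", \"w\") as f:\n",
|
||||
"    for run in runs:\n",
|
||||
"        f.write(json.dumps(run.dict(), default=str) + \"\\n\")"
|
||||
]
|
||||
},
|
||||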
{
|
||||
"attachments": {},
|
||||
"cell_type": "markdown",
|
||||
"id": "2646f0fb-81d4-43ce-8a9b-54b8e19841e2",
|
||||
"metadata": {
|
||||
"tags": []
|
||||
},
|
||||
"source": [
|
||||
"## Conclusion\n",
|
||||
"\n",
|
||||
"Congratulations! You have succesfully traced and evaluated an agent using LangSmith!\n",
|
||||
"\n",
|
||||
"This was a quick guide to get started, but there are many more ways to use LangSmith to speed up your developer flow and produce better results.\n",
|
||||
"\n",
|
||||
"For more information on how you can get the most out of LangSmith, check out [LangSmith documentation](https://docs.smith.langchain.com/), and please reach out with questions, feature requests, or feedback at [support@langchain.dev](mailto:support@langchain.dev)."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "57237f12",
|
||||
"metadata": {},
|
||||
"source": []
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
"kernelspec": {
|
||||
"display_name": "Python 3 (ipykernel)",
|
||||
"language": "python",
|
||||
"name": "python3"
|
||||
},
|
||||
"language_info": {
|
||||
"codemirror_mode": {
|
||||
"name": "ipython",
|
||||
"version": 3
|
||||
},
|
||||
"file_extension": ".py",
|
||||
"mimetype": "text/x-python",
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.10.9"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
"nbformat_minor": 5
|
||||
}
|
||||
@@ -16,7 +16,7 @@
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 2,
|
||||
"execution_count": 1,
|
||||
"id": "c0a83623",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
@@ -38,6 +38,20 @@
|
||||
">This initializes the SerpAPIWrapper for search functionality (search).\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 2,
|
||||
"id": "a2b0a215",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"import os\n",
|
||||
"\n",
|
||||
"os.environ[\n",
|
||||
" \"SERPAPI_API_KEY\"\n",
|
||||
"] = \"897780527132b5f31d8d73c40c820d5ef2c2279687efa69f413a61f752027747\""
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 3,
|
||||
@@ -46,11 +60,11 @@
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# Initialize the OpenAI language model\n",
|
||||
"#Replace <your_api_key> in openai_api_key=\"<your_api_key>\" with your actual OpenAI key.\n",
|
||||
"# Replace <your_api_key> in openai_api_key=\"<your_api_key>\" with your actual OpenAI key.\n",
|
||||
"llm = ChatOpenAI(temperature=0, model=\"gpt-3.5-turbo-0613\")\n",
|
||||
"\n",
|
||||
"# Initialize the SerpAPIWrapper for search functionality\n",
|
||||
"#Replace <your_api_key> in openai_api_key=\"<your_api_key>\" with your actual SerpAPI key.\n",
|
||||
"# Replace <your_api_key> in openai_api_key=\"<your_api_key>\" with your actual SerpAPI key.\n",
|
||||
"search = SerpAPIWrapper()\n",
|
||||
"\n",
|
||||
"# Define a list of tools offered by the agent\n",
|
||||
@@ -58,9 +72,9 @@
|
||||
" Tool(\n",
|
||||
" name=\"Search\",\n",
|
||||
" func=search.run,\n",
|
||||
" description=\"Useful when you need to answer questions about current events. You should ask targeted questions.\"\n",
|
||||
" description=\"Useful when you need to answer questions about current events. You should ask targeted questions.\",\n",
|
||||
" ),\n",
|
||||
"]\n"
|
||||
"]"
|
||||
]
|
||||
},
|
||||
{
|
||||
@@ -70,7 +84,9 @@
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"mrkl = initialize_agent(tools, llm, agent=AgentType.OPENAI_MULTI_FUNCTIONS, verbose=True)"
|
||||
"mrkl = initialize_agent(\n",
|
||||
" tools, llm, agent=AgentType.OPENAI_MULTI_FUNCTIONS, verbose=True\n",
|
||||
")"
|
||||
]
|
||||
},
|
||||
{
|
||||
@@ -82,6 +98,7 @@
|
||||
"source": [
|
||||
"# Do this so we can see exactly what's going on under the hood\n",
|
||||
"import langchain\n",
|
||||
"\n",
|
||||
"langchain.debug = True"
|
||||
]
|
||||
},
|
||||
@@ -194,15 +211,223 @@
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"mrkl.run(\n",
|
||||
" \"What is the weather in LA and SF?\"\n",
|
||||
"mrkl.run(\"What is the weather in LA and SF?\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "d31d4c09",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Configuring max iteration behavior\n",
|
||||
"\n",
|
||||
"To make sure that our agent doesn't get stuck in excessively long loops, we can set max_iterations. We can also set an early stopping method, which will determine our agent's behavior once the number of max iterations is hit. By default, the early stopping uses method `force` which just returns that constant string. Alternatively, you could specify method `generate` which then does one FINAL pass through the LLM to generate an output."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 16,
|
||||
"id": "9f5f6743",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"mrkl = initialize_agent(\n",
|
||||
" tools,\n",
|
||||
" llm,\n",
|
||||
" agent=AgentType.OPENAI_FUNCTIONS,\n",
|
||||
" verbose=True,\n",
|
||||
" max_iterations=2,\n",
|
||||
" early_stopping_method=\"generate\",\n",
|
||||
")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 19,
|
||||
"id": "4362ebc7",
|
||||
"metadata": {
|
||||
"scrolled": false
|
||||
},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"\u001b[32;1m\u001b[1;3m[chain/start]\u001b[0m \u001b[1m[1:chain:AgentExecutor] Entering Chain run with input:\n",
|
||||
"\u001b[0m{\n",
|
||||
" \"input\": \"What is the weather in NYC today, yesterday, and the day before?\"\n",
|
||||
"}\n",
|
||||
"\u001b[32;1m\u001b[1;3m[llm/start]\u001b[0m \u001b[1m[1:chain:AgentExecutor > 2:llm:ChatOpenAI] Entering LLM run with input:\n",
|
||||
"\u001b[0m{\n",
|
||||
" \"prompts\": [\n",
|
||||
" \"System: You are a helpful AI assistant.\\nHuman: What is the weather in NYC today, yesterday, and the day before?\"\n",
|
||||
" ]\n",
|
||||
"}\n",
|
||||
"\u001b[36;1m\u001b[1;3m[llm/end]\u001b[0m \u001b[1m[1:chain:AgentExecutor > 2:llm:ChatOpenAI] [1.27s] Exiting LLM run with output:\n",
|
||||
"\u001b[0m{\n",
|
||||
" \"generations\": [\n",
|
||||
" [\n",
|
||||
" {\n",
|
||||
" \"text\": \"\",\n",
|
||||
" \"generation_info\": null,\n",
|
||||
" \"message\": {\n",
|
||||
" \"lc\": 1,\n",
|
||||
" \"type\": \"constructor\",\n",
|
||||
" \"id\": [\n",
|
||||
" \"langchain\",\n",
|
||||
" \"schema\",\n",
|
||||
" \"messages\",\n",
|
||||
" \"AIMessage\"\n",
|
||||
" ],\n",
|
||||
" \"kwargs\": {\n",
|
||||
" \"content\": \"\",\n",
|
||||
" \"additional_kwargs\": {\n",
|
||||
" \"function_call\": {\n",
|
||||
" \"name\": \"Search\",\n",
|
||||
" \"arguments\": \"{\\n \\\"query\\\": \\\"weather in NYC today\\\"\\n}\"\n",
|
||||
" }\n",
|
||||
" }\n",
|
||||
" }\n",
|
||||
" }\n",
|
||||
" }\n",
|
||||
" ]\n",
|
||||
" ],\n",
|
||||
" \"llm_output\": {\n",
|
||||
" \"token_usage\": {\n",
|
||||
" \"prompt_tokens\": 79,\n",
|
||||
" \"completion_tokens\": 17,\n",
|
||||
" \"total_tokens\": 96\n",
|
||||
" },\n",
|
||||
" \"model_name\": \"gpt-3.5-turbo-0613\"\n",
|
||||
" },\n",
|
||||
" \"run\": null\n",
|
||||
"}\n",
|
||||
"\u001b[32;1m\u001b[1;3m[tool/start]\u001b[0m \u001b[1m[1:chain:AgentExecutor > 3:tool:Search] Entering Tool run with input:\n",
|
||||
"\u001b[0m\"{'query': 'weather in NYC today'}\"\n",
|
||||
"\u001b[36;1m\u001b[1;3m[tool/end]\u001b[0m \u001b[1m[1:chain:AgentExecutor > 3:tool:Search] [3.84s] Exiting Tool run with output:\n",
|
||||
"\u001b[0m\"10:00 am · Feels Like85° · WindSE 4 mph · Humidity78% · UV Index3 of 11 · Cloud Cover81% · Rain Amount0 in ...\"\n",
|
||||
"\u001b[32;1m\u001b[1;3m[llm/start]\u001b[0m \u001b[1m[1:chain:AgentExecutor > 4:llm:ChatOpenAI] Entering LLM run with input:\n",
|
||||
"\u001b[0m{\n",
|
||||
" \"prompts\": [\n",
|
||||
" \"System: You are a helpful AI assistant.\\nHuman: What is the weather in NYC today, yesterday, and the day before?\\nAI: {'name': 'Search', 'arguments': '{\\\\n \\\"query\\\": \\\"weather in NYC today\\\"\\\\n}'}\\nFunction: 10:00 am · Feels Like85° · WindSE 4 mph · Humidity78% · UV Index3 of 11 · Cloud Cover81% · Rain Amount0 in ...\"\n",
|
||||
" ]\n",
|
||||
"}\n",
|
||||
"\u001b[36;1m\u001b[1;3m[llm/end]\u001b[0m \u001b[1m[1:chain:AgentExecutor > 4:llm:ChatOpenAI] [1.24s] Exiting LLM run with output:\n",
|
||||
"\u001b[0m{\n",
|
||||
" \"generations\": [\n",
|
||||
" [\n",
|
||||
" {\n",
|
||||
" \"text\": \"\",\n",
|
||||
" \"generation_info\": null,\n",
|
||||
" \"message\": {\n",
|
||||
" \"lc\": 1,\n",
|
||||
" \"type\": \"constructor\",\n",
|
||||
" \"id\": [\n",
|
||||
" \"langchain\",\n",
|
||||
" \"schema\",\n",
|
||||
" \"messages\",\n",
|
||||
" \"AIMessage\"\n",
|
||||
" ],\n",
|
||||
" \"kwargs\": {\n",
|
||||
" \"content\": \"\",\n",
|
||||
" \"additional_kwargs\": {\n",
|
||||
" \"function_call\": {\n",
|
||||
" \"name\": \"Search\",\n",
|
||||
" \"arguments\": \"{\\n \\\"query\\\": \\\"weather in NYC yesterday\\\"\\n}\"\n",
|
||||
" }\n",
|
||||
" }\n",
|
||||
" }\n",
|
||||
" }\n",
|
||||
" }\n",
|
||||
" ]\n",
|
||||
" ],\n",
|
||||
" \"llm_output\": {\n",
|
||||
" \"token_usage\": {\n",
|
||||
" \"prompt_tokens\": 142,\n",
|
||||
" \"completion_tokens\": 17,\n",
|
||||
" \"total_tokens\": 159\n",
|
||||
" },\n",
|
||||
" \"model_name\": \"gpt-3.5-turbo-0613\"\n",
|
||||
" },\n",
|
||||
" \"run\": null\n",
|
||||
"}\n",
|
||||
"\u001b[32;1m\u001b[1;3m[tool/start]\u001b[0m \u001b[1m[1:chain:AgentExecutor > 5:tool:Search] Entering Tool run with input:\n",
|
||||
"\u001b[0m\"{'query': 'weather in NYC yesterday'}\"\n",
|
||||
"\u001b[36;1m\u001b[1;3m[tool/end]\u001b[0m \u001b[1m[1:chain:AgentExecutor > 5:tool:Search] [1.15s] Exiting Tool run with output:\n",
|
||||
"\u001b[0m\"New York Temperature Yesterday. Maximum temperature yesterday: 81 °F (at 1:51 pm) Minimum temperature yesterday: 72 °F (at 7:17 pm) Average temperature ...\"\n",
|
||||
"\u001b[32;1m\u001b[1;3m[llm/start]\u001b[0m \u001b[1m[1:llm:ChatOpenAI] Entering LLM run with input:\n",
|
||||
"\u001b[0m{\n",
|
||||
" \"prompts\": [\n",
|
||||
" \"System: You are a helpful AI assistant.\\nHuman: What is the weather in NYC today, yesterday, and the day before?\\nAI: {'name': 'Search', 'arguments': '{\\\\n \\\"query\\\": \\\"weather in NYC today\\\"\\\\n}'}\\nFunction: 10:00 am · Feels Like85° · WindSE 4 mph · Humidity78% · UV Index3 of 11 · Cloud Cover81% · Rain Amount0 in ...\\nAI: {'name': 'Search', 'arguments': '{\\\\n \\\"query\\\": \\\"weather in NYC yesterday\\\"\\\\n}'}\\nFunction: New York Temperature Yesterday. Maximum temperature yesterday: 81 °F (at 1:51 pm) Minimum temperature yesterday: 72 °F (at 7:17 pm) Average temperature ...\"\n",
|
||||
" ]\n",
|
||||
"}\n",
|
||||
"\u001b[36;1m\u001b[1;3m[llm/end]\u001b[0m \u001b[1m[1:llm:ChatOpenAI] [2.68s] Exiting LLM run with output:\n",
|
||||
"\u001b[0m{\n",
|
||||
" \"generations\": [\n",
|
||||
" [\n",
|
||||
" {\n",
|
||||
" \"text\": \"Today in NYC, the weather is currently 85°F with a southeast wind of 4 mph. The humidity is at 78% and there is 81% cloud cover. There is no rain expected today.\\n\\nYesterday in NYC, the maximum temperature was 81°F at 1:51 pm, and the minimum temperature was 72°F at 7:17 pm.\\n\\nFor the day before yesterday, I do not have the specific weather information.\",\n",
|
||||
" \"generation_info\": null,\n",
|
||||
" \"message\": {\n",
|
||||
" \"lc\": 1,\n",
|
||||
" \"type\": \"constructor\",\n",
|
||||
" \"id\": [\n",
|
||||
" \"langchain\",\n",
|
||||
" \"schema\",\n",
|
||||
" \"messages\",\n",
|
||||
" \"AIMessage\"\n",
|
||||
" ],\n",
|
||||
" \"kwargs\": {\n",
|
||||
" \"content\": \"Today in NYC, the weather is currently 85°F with a southeast wind of 4 mph. The humidity is at 78% and there is 81% cloud cover. There is no rain expected today.\\n\\nYesterday in NYC, the maximum temperature was 81°F at 1:51 pm, and the minimum temperature was 72°F at 7:17 pm.\\n\\nFor the day before yesterday, I do not have the specific weather information.\",\n",
|
||||
" \"additional_kwargs\": {}\n",
|
||||
" }\n",
|
||||
" }\n",
|
||||
" }\n",
|
||||
" ]\n",
|
||||
" ],\n",
|
||||
" \"llm_output\": {\n",
|
||||
" \"token_usage\": {\n",
|
||||
" \"prompt_tokens\": 160,\n",
|
||||
" \"completion_tokens\": 91,\n",
|
||||
" \"total_tokens\": 251\n",
|
||||
" },\n",
|
||||
" \"model_name\": \"gpt-3.5-turbo-0613\"\n",
|
||||
" },\n",
|
||||
" \"run\": null\n",
|
||||
"}\n",
|
||||
"\u001b[36;1m\u001b[1;3m[chain/end]\u001b[0m \u001b[1m[1:chain:AgentExecutor] [10.18s] Exiting Chain run with output:\n",
|
||||
"\u001b[0m{\n",
|
||||
" \"output\": \"Today in NYC, the weather is currently 85°F with a southeast wind of 4 mph. The humidity is at 78% and there is 81% cloud cover. There is no rain expected today.\\n\\nYesterday in NYC, the maximum temperature was 81°F at 1:51 pm, and the minimum temperature was 72°F at 7:17 pm.\\n\\nFor the day before yesterday, I do not have the specific weather information.\"\n",
|
||||
"}\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"'Today in NYC, the weather is currently 85°F with a southeast wind of 4 mph. The humidity is at 78% and there is 81% cloud cover. There is no rain expected today.\\n\\nYesterday in NYC, the maximum temperature was 81°F at 1:51 pm, and the minimum temperature was 72°F at 7:17 pm.\\n\\nFor the day before yesterday, I do not have the specific weather information.'"
|
||||
]
|
||||
},
|
||||
"execution_count": 19,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"mrkl.run(\"What is the weather in NYC today, yesterday, and the day before?\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "067a8d3e",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Notice that we never get around to looking up the weather the day before yesterday, due to hitting our max_iterations limit."
|
||||
]
|
||||
},
|
||||
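{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"For contrast, a hedged sketch (not in the original notebook) of the default `force` behavior, which halts with a constant stopped-early message instead of making a final LLM pass:"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"mrkl_force = initialize_agent(\n",
|
||||
"    tools,\n",
|
||||
"    llm,\n",
|
||||
"    agent=AgentType.OPENAI_FUNCTIONS,\n",
|
||||
"    verbose=True,\n",
|
||||
"    max_iterations=2,\n",
|
||||
"    early_stopping_method=\"force\",  # the default; no final generation pass\n",
|
||||
")"
|
||||
]
|
||||
},
|
||||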
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "9f5f6743",
|
||||
"id": "c3318a11",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": []
|
||||
@@ -210,9 +435,9 @@
|
||||
],
|
||||
"metadata": {
|
||||
"kernelspec": {
|
||||
"display_name": "Python 3 (ipykernel)",
|
||||
"display_name": "venv",
|
||||
"language": "python",
|
||||
"name": "python3"
|
||||
"name": "venv"
|
||||
},
|
||||
"language_info": {
|
||||
"codemirror_mode": {
|
||||
@@ -224,7 +449,7 @@
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.9.1"
|
||||
"version": "3.11.3"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
|
||||
@@ -78,6 +78,7 @@
|
||||
"source": [
|
||||
"from langchain.prompts import MessagesPlaceholder\n",
|
||||
"from langchain.memory import ConversationBufferMemory\n",
|
||||
"\n",
|
||||
"agent_kwargs = {\n",
|
||||
" \"extra_prompt_messages\": [MessagesPlaceholder(variable_name=\"memory\")],\n",
|
||||
"}\n",
|
||||
@@ -92,12 +93,12 @@
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"agent = initialize_agent(\n",
|
||||
" tools, \n",
|
||||
" llm, \n",
|
||||
" agent=AgentType.OPENAI_FUNCTIONS, \n",
|
||||
" verbose=True, \n",
|
||||
" agent_kwargs=agent_kwargs, \n",
|
||||
" memory=memory\n",
|
||||
" tools,\n",
|
||||
" llm,\n",
|
||||
" agent=AgentType.OPENAI_FUNCTIONS,\n",
|
||||
" verbose=True,\n",
|
||||
" agent_kwargs=agent_kwargs,\n",
|
||||
" memory=memory,\n",
|
||||
")"
|
||||
]
|
||||
},
|
||||
|
||||
@@ -42,15 +42,14 @@
|
||||
"import yfinance as yf\n",
|
||||
"from datetime import datetime, timedelta\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"def get_current_stock_price(ticker):\n",
|
||||
" \"\"\"Method to get current stock price\"\"\"\n",
|
||||
"\n",
|
||||
" ticker_data = yf.Ticker(ticker)\n",
|
||||
" recent = ticker_data.history(period='1d')\n",
|
||||
" return {\n",
|
||||
" 'price': recent.iloc[0]['Close'],\n",
|
||||
" 'currency': ticker_data.info['currency']\n",
|
||||
" }\n",
|
||||
" recent = ticker_data.history(period=\"1d\")\n",
|
||||
" return {\"price\": recent.iloc[0][\"Close\"], \"currency\": ticker_data.info[\"currency\"]}\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"def get_stock_performance(ticker, days):\n",
|
||||
" \"\"\"Method to get stock price change in percentage\"\"\"\n",
|
||||
@@ -58,11 +57,9 @@
|
||||
" past_date = datetime.today() - timedelta(days=days)\n",
|
||||
" ticker_data = yf.Ticker(ticker)\n",
|
||||
" history = ticker_data.history(start=past_date)\n",
|
||||
" old_price = history.iloc[0]['Close']\n",
|
||||
" current_price = history.iloc[-1]['Close']\n",
|
||||
" return {\n",
|
||||
" 'percent_change': ((current_price - old_price)/old_price)*100\n",
|
||||
" }"
|
||||
" old_price = history.iloc[0][\"Close\"]\n",
|
||||
" current_price = history.iloc[-1][\"Close\"]\n",
|
||||
" return {\"percent_change\": ((current_price - old_price) / old_price) * 100}"
|
||||
]
|
||||
},
|
||||
{
|
||||
@@ -88,7 +85,7 @@
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"get_current_stock_price('MSFT')"
|
||||
"get_current_stock_price(\"MSFT\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
@@ -114,7 +111,7 @@
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"get_stock_performance('MSFT', 30)"
|
||||
"get_stock_performance(\"MSFT\", 30)"
|
||||
]
|
||||
},
|
||||
{
|
||||
@@ -138,10 +135,13 @@
|
||||
"from pydantic import BaseModel, Field\n",
|
||||
"from langchain.tools import BaseTool\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"class CurrentStockPriceInput(BaseModel):\n",
|
||||
" \"\"\"Inputs for get_current_stock_price\"\"\"\n",
|
||||
"\n",
|
||||
" ticker: str = Field(description=\"Ticker symbol of the stock\")\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"class CurrentStockPriceTool(BaseTool):\n",
|
||||
" name = \"get_current_stock_price\"\n",
|
||||
" description = \"\"\"\n",
|
||||
@@ -160,8 +160,10 @@
|
||||
"\n",
|
||||
"class StockPercentChangeInput(BaseModel):\n",
|
||||
" \"\"\"Inputs for get_stock_performance\"\"\"\n",
|
||||
"\n",
|
||||
" ticker: str = Field(description=\"Ticker symbol of the stock\")\n",
|
||||
" days: int = Field(description='Timedelta days to get past date from current date')\n",
|
||||
" days: int = Field(description=\"Timedelta days to get past date from current date\")\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"class StockPerformanceTool(BaseTool):\n",
|
||||
" name = \"get_stock_performance\"\n",
|
||||
@@ -202,15 +204,9 @@
|
||||
"from langchain.chat_models import ChatOpenAI\n",
|
||||
"from langchain.agents import initialize_agent\n",
|
||||
"\n",
|
||||
"llm = ChatOpenAI(\n",
|
||||
" model=\"gpt-3.5-turbo-0613\",\n",
|
||||
" temperature=0\n",
|
||||
")\n",
|
||||
"llm = ChatOpenAI(model=\"gpt-3.5-turbo-0613\", temperature=0)\n",
|
||||
"\n",
|
||||
"tools = [\n",
|
||||
" CurrentStockPriceTool(),\n",
|
||||
" StockPerformanceTool()\n",
|
||||
"]\n",
|
||||
"tools = [CurrentStockPriceTool(), StockPerformanceTool()]\n",
|
||||
"\n",
|
||||
"agent = initialize_agent(tools, llm, agent=AgentType.OPENAI_FUNCTIONS, verbose=True)"
|
||||
]
|
||||
@@ -261,7 +257,9 @@
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"agent.run(\"What is the current price of Microsoft stock? How it has performed over past 6 months?\")"
|
||||
"agent.run(\n",
|
||||
" \"What is the current price of Microsoft stock? How it has performed over past 6 months?\"\n",
|
||||
")"
|
||||
]
|
||||
},
|
||||
{
|
||||
@@ -355,7 +353,9 @@
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"agent.run('In the past 3 months, which stock between Microsoft and Google has performed the best?')"
|
||||
"agent.run(\n",
|
||||
" \"In the past 3 months, which stock between Microsoft and Google has performed the best?\"\n",
|
||||
")"
|
||||
]
|
||||
}
|
||||
],
|
||||
|
||||
@@ -79,10 +79,10 @@
|
||||
"source": [
|
||||
"llm = ChatOpenAI(temperature=0, model=\"gpt-3.5-turbo-0613\")\n",
|
||||
"agent = initialize_agent(\n",
|
||||
" toolkit.get_tools(), \n",
|
||||
" llm, \n",
|
||||
" agent=AgentType.OPENAI_FUNCTIONS, \n",
|
||||
" verbose=True, \n",
|
||||
" toolkit.get_tools(),\n",
|
||||
" llm,\n",
|
||||
" agent=AgentType.OPENAI_FUNCTIONS,\n",
|
||||
" verbose=True,\n",
|
||||
" agent_kwargs=agent_kwargs,\n",
|
||||
")"
|
||||
]
|
||||
|
||||
@@ -17,16 +17,7 @@
|
||||
"execution_count": 1,
|
||||
"id": "8632a37c",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stderr",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"/Users/harrisonchase/.pyenv/versions/3.9.1/envs/langchain/lib/python3.9/site-packages/deeplake/util/check_latest_version.py:32: UserWarning: A newer version of deeplake (3.6.5) is available. It's recommended that you update to the latest version using `pip install -U deeplake`.\n",
|
||||
" warnings.warn(\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from pydantic import BaseModel, Field\n",
|
||||
"\n",
|
||||
@@ -56,14 +47,14 @@
|
||||
"files = [\n",
|
||||
" # https://abc.xyz/investor/static/pdf/2023Q1_alphabet_earnings_release.pdf\n",
|
||||
" {\n",
|
||||
" \"name\": \"alphabet-earnings\", \n",
|
||||
" \"name\": \"alphabet-earnings\",\n",
|
||||
" \"path\": \"/Users/harrisonchase/Downloads/2023Q1_alphabet_earnings_release.pdf\",\n",
|
||||
" }, \n",
|
||||
" },\n",
|
||||
" # https://digitalassets.tesla.com/tesla-contents/image/upload/IR/TSLA-Q1-2023-Update\n",
|
||||
" {\n",
|
||||
" \"name\": \"tesla-earnings\", \n",
|
||||
" \"path\": \"/Users/harrisonchase/Downloads/TSLA-Q1-2023-Update.pdf\"\n",
|
||||
" }\n",
|
||||
" \"name\": \"tesla-earnings\",\n",
|
||||
" \"path\": \"/Users/harrisonchase/Downloads/TSLA-Q1-2023-Update.pdf\",\n",
|
||||
" },\n",
|
||||
"]\n",
|
||||
"\n",
|
||||
"for file in files:\n",
|
||||
@@ -73,14 +64,14 @@
|
||||
" docs = text_splitter.split_documents(pages)\n",
|
||||
" embeddings = OpenAIEmbeddings()\n",
|
||||
" retriever = FAISS.from_documents(docs, embeddings).as_retriever()\n",
|
||||
" \n",
|
||||
"\n",
|
||||
" # Wrap retrievers in a Tool\n",
|
||||
" tools.append(\n",
|
||||
" Tool(\n",
|
||||
" args_schema=DocumentInput,\n",
|
||||
" name=file[\"name\"], \n",
|
||||
" name=file[\"name\"],\n",
|
||||
" description=f\"useful when you want to answer questions about {file['name']}\",\n",
|
||||
" func=RetrievalQA.from_chain_type(llm=llm, retriever=retriever)\n",
|
||||
" func=RetrievalQA.from_chain_type(llm=llm, retriever=retriever),\n",
|
||||
" )\n",
|
||||
" )"
|
||||
]
|
||||
@@ -139,7 +130,7 @@
|
||||
"source": [
|
||||
"llm = ChatOpenAI(\n",
|
||||
" temperature=0,\n",
|
||||
" model=\"gpt-3.5-turbo-0613\", \n",
|
||||
" model=\"gpt-3.5-turbo-0613\",\n",
|
||||
")\n",
|
||||
"\n",
|
||||
"agent = initialize_agent(\n",
|
||||
@@ -170,6 +161,7 @@
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"import langchain\n",
|
||||
"\n",
|
||||
"langchain.debug = True"
|
||||
]
|
||||
},
|
||||
@@ -405,7 +397,7 @@
|
||||
"source": [
|
||||
"llm = ChatOpenAI(\n",
|
||||
" temperature=0,\n",
|
||||
" model=\"gpt-3.5-turbo-0613\", \n",
|
||||
" model=\"gpt-3.5-turbo-0613\",\n",
|
||||
")\n",
|
||||
"\n",
|
||||
"agent = initialize_agent(\n",
|
||||
|
||||
@@ -136,9 +136,11 @@
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"agent.run(\"Create an email draft for me to edit of a letter from the perspective of a sentient parrot\"\n",
|
||||
" \" who is looking to collaborate on some research with her\"\n",
|
||||
" \" estranged friend, a cat. Under no circumstances may you send the message, however.\")"
|
||||
"agent.run(\n",
|
||||
" \"Create an email draft for me to edit of a letter from the perspective of a sentient parrot\"\n",
|
||||
" \" who is looking to collaborate on some research with her\"\n",
|
||||
" \" estranged friend, a cat. Under no circumstances may you send the message, however.\"\n",
|
||||
")"
|
||||
]
|
||||
},
|
||||
{
|
||||
@@ -160,7 +162,9 @@
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"agent.run(\"Could you search in my drafts folder and let me know if any of them are about collaboration?\")"
|
||||
"agent.run(\n",
|
||||
" \"Could you search in my drafts folder and let me know if any of them are about collaboration?\"\n",
|
||||
")"
|
||||
]
|
||||
},
|
||||
{
|
||||
@@ -190,7 +194,9 @@
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"agent.run(\"Can you schedule a 30 minute meeting with a sentient parrot to discuss research collaborations on October 3, 2023 at 2 pm Easter Time?\")"
|
||||
"agent.run(\n",
|
||||
" \"Can you schedule a 30 minute meeting with a sentient parrot to discuss research collaborations on October 3, 2023 at 2 pm Easter Time?\"\n",
|
||||
")"
|
||||
]
|
||||
},
|
||||
{
|
||||
@@ -210,7 +216,9 @@
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"agent.run(\"Can you tell me if I have any events on October 3, 2023 in Eastern Time, and if so, tell me if any of them are with a sentient parrot?\")"
|
||||
"agent.run(\n",
|
||||
" \"Can you tell me if I have any events on October 3, 2023 in Eastern Time, and if so, tell me if any of them are with a sentient parrot?\"\n",
|
||||
")"
|
||||
]
|
||||
}
|
||||
],
|
||||
|
||||
@@ -1,6 +1,7 @@
|
||||
{
|
||||
"cells": [
|
||||
{
|
||||
"attachments": {},
|
||||
"cell_type": "markdown",
|
||||
"id": "0e499e90-7a6d-4fab-8aab-31a4df417601",
|
||||
"metadata": {},
|
||||
@@ -15,6 +16,7 @@
|
||||
]
|
||||
},
|
||||
{
|
||||
"attachments": {},
|
||||
"cell_type": "markdown",
|
||||
"id": "ec927ac6-9b2a-4e8a-9a6e-3e429191875c",
|
||||
"metadata": {
|
||||
@@ -54,6 +56,7 @@
|
||||
]
|
||||
},
|
||||
{
|
||||
"attachments": {},
|
||||
"cell_type": "markdown",
|
||||
"id": "f74d1792",
|
||||
"metadata": {},
|
||||
@@ -81,6 +84,7 @@
|
||||
]
|
||||
},
|
||||
{
|
||||
"attachments": {},
|
||||
"cell_type": "markdown",
|
||||
"id": "971cc455",
|
||||
"metadata": {},
|
||||
@@ -106,6 +110,44 @@
|
||||
]
|
||||
},
|
||||
{
|
||||
"attachments": {},
|
||||
"cell_type": "markdown",
|
||||
"id": "54c01168",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Disclamer ⚠️\n",
|
||||
"\n",
|
||||
"The query chain may generate insert/update/delete queries. When this is not expected, use a custom prompt or create a SQL users without write permissions.\n",
|
||||
"\n",
|
||||
"The final user might overload your SQL database by asking a simple question such as \"run the biggest query possible\". The generated query might look like:"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "949772b9",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"SELECT * FROM \"public\".\"users\"\n",
|
||||
" JOIN \"public\".\"user_permissions\" ON \"public\".\"users\".id = \"public\".\"user_permissions\".user_id\n",
|
||||
" JOIN \"public\".\"projects\" ON \"public\".\"users\".id = \"public\".\"projects\".user_id\n",
|
||||
" JOIN \"public\".\"events\" ON \"public\".\"projects\".id = \"public\".\"events\".project_id;"
|
||||
]
|
||||
},
|
||||
{
|
||||
"attachments": {},
|
||||
"cell_type": "markdown",
|
||||
"id": "5a4a9455",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"For a transactional SQL database, if one of the table above contains millions of rows, the query might cause trouble to other applications using the same database.\n",
|
||||
"\n",
|
||||
"Most datawarehouse oriented databases support user-level quota, for limiting resource usage."
|
||||
]
|
||||
},
|
||||
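{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"One hedged mitigation sketch (not from the original notebook): connect through a restricted role and cap per-statement runtime. The role name and DSN are hypothetical, and `statement_timeout` is PostgreSQL-specific, so adapt it for other engines:"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from sqlalchemy import create_engine\n",
|
||||
"\n",
|
||||
"# Hypothetical read-only role and database; timeout is in milliseconds.\n",
|
||||
"engine = create_engine(\n",
|
||||
"    \"postgresql+psycopg2://readonly_user:secret@localhost:5432/mydb\",\n",
|
||||
"    connect_args={\"options\": \"-c statement_timeout=5000\"},\n",
|
||||
")\n",
|
||||
"# The SQL chain can then be built on top of this engine, e.g. SQLDatabase(engine)."
|
||||
]
|
||||
},
|
||||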
{
|
||||
"attachments": {},
|
||||
"cell_type": "markdown",
|
||||
"id": "36ae48c7-cb08-4fef-977e-c7d4b96a464b",
|
||||
"metadata": {},
|
||||
@@ -195,6 +237,7 @@
|
||||
]
|
||||
},
|
||||
{
|
||||
"attachments": {},
|
||||
"cell_type": "markdown",
|
||||
"id": "9abcfe8e-1868-42a4-8345-ad2d9b44c681",
|
||||
"metadata": {},
|
||||
@@ -269,6 +312,7 @@
|
||||
]
|
||||
},
|
||||
{
|
||||
"attachments": {},
|
||||
"cell_type": "markdown",
|
||||
"id": "6fbc26af-97e4-4a21-82aa-48bdc992da26",
|
||||
"metadata": {},
|
||||
@@ -451,6 +495,7 @@
|
||||
]
|
||||
},
|
||||
{
|
||||
"attachments": {},
|
||||
"cell_type": "markdown",
|
||||
"id": "7c7503b5-d9d9-4faa-b064-29fcdb5ff213",
|
||||
"metadata": {},
|
||||
|
||||
742
docs/extras/modules/agents/toolkits/xorbits.ipynb
Normal file
@@ -0,0 +1,742 @@
|
||||
{
|
||||
"cells": [
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# Xorbits Agent"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"This notebook shows how to use agents to interact with [Xorbits Pandas](https://doc.xorbits.io/en/latest/reference/pandas/index.html) dataframe and [Xorbits Numpy](https://doc.xorbits.io/en/latest/reference/numpy/index.html) ndarray. It is mostly optimized for question answering.\n",
|
||||
"\n",
|
||||
"**NOTE: this agent calls the Python agent under the hood, which executes LLM generated Python code - this can be bad if the LLM generated Python code is harmful. Use cautiously.**"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Pandas examples"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 2,
|
||||
"metadata": {
|
||||
"ExecuteTime": {
|
||||
"end_time": "2023-07-13T08:06:33.955439Z",
|
||||
"start_time": "2023-07-13T08:06:33.767539500Z"
|
||||
}
|
||||
},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"application/vnd.jupyter.widget-view+json": {
|
||||
"model_id": "05b7c067b1114ce9a8aef4a58a5d5fef",
|
||||
"version_major": 2,
|
||||
"version_minor": 0
|
||||
},
|
||||
"text/plain": [
|
||||
" 0%| | 0.00/100 [00:00<?, ?it/s]"
|
||||
]
|
||||
},
|
||||
"metadata": {},
|
||||
"output_type": "display_data"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"import xorbits.pandas as pd\n",
|
||||
"\n",
|
||||
"from langchain.agents import create_xorbits_agent\n",
|
||||
"from langchain.llms import OpenAI\n",
|
||||
"\n",
|
||||
"data = pd.read_csv(\"titanic.csv\")\n",
|
||||
"agent = create_xorbits_agent(OpenAI(temperature=0), data, verbose=True)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 2,
|
||||
"metadata": {
|
||||
"ExecuteTime": {
|
||||
"end_time": "2023-07-13T08:11:06.622471100Z",
|
||||
"start_time": "2023-07-13T08:11:03.183042Z"
|
||||
}
|
||||
},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"\n",
|
||||
"\n",
|
||||
"\u001b[1m> Entering new chain...\u001b[0m\n",
|
||||
"\u001b[32;1m\u001b[1;3mThought: I need to count the number of rows and columns\n",
|
||||
"Action: python_repl_ast\n",
|
||||
"Action Input: data.shape\u001b[0m\n",
|
||||
"Observation: \u001b[36;1m\u001b[1;3m(891, 12)\u001b[0m\n",
|
||||
"Thought:\u001b[32;1m\u001b[1;3m I now know the final answer\n",
|
||||
"Final Answer: There are 891 rows and 12 columns.\u001b[0m\n",
|
||||
"\n",
|
||||
"\u001b[1m> Finished chain.\u001b[0m\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"'There are 891 rows and 12 columns.'"
|
||||
]
|
||||
},
|
||||
"execution_count": 2,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"agent.run(\"How many rows and columns are there?\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 3,
|
||||
"metadata": {
|
||||
"ExecuteTime": {
|
||||
"end_time": "2023-07-13T08:11:23.189275300Z",
|
||||
"start_time": "2023-07-13T08:11:11.029030900Z"
|
||||
}
|
||||
},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"\n",
|
||||
"\n",
|
||||
"\u001b[1m> Entering new chain...\u001b[0m\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"data": {
|
||||
"application/vnd.jupyter.widget-view+json": {
|
||||
"model_id": "8c63d745a7eb41a484043a5dba357997",
|
||||
"version_major": 2,
|
||||
"version_minor": 0
|
||||
},
|
||||
"text/plain": [
|
||||
" 0%| | 0.00/100 [00:00<?, ?it/s]"
|
||||
]
|
||||
},
|
||||
"metadata": {},
|
||||
"output_type": "display_data"
|
||||
},
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"\u001b[32;1m\u001b[1;3mThought: I need to count the number of people in pclass 1\n",
|
||||
"Action: python_repl_ast\n",
|
||||
"Action Input: data[data['Pclass'] == 1].shape[0]\u001b[0m\n",
|
||||
"Observation: \u001b[36;1m\u001b[1;3m216\u001b[0m\n",
|
||||
"Thought:\u001b[32;1m\u001b[1;3m I now know the final answer\n",
|
||||
"Final Answer: There are 216 people in pclass 1.\u001b[0m\n",
|
||||
"\n",
|
||||
"\u001b[1m> Finished chain.\u001b[0m\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"'There are 216 people in pclass 1.'"
|
||||
]
|
||||
},
|
||||
"execution_count": 3,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"agent.run(\"How many people are in pclass 1?\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 4,
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"\n",
|
||||
"\n",
|
||||
"\u001b[1m> Entering new chain...\u001b[0m\n",
|
||||
"\u001b[32;1m\u001b[1;3mThought: I need to calculate the mean age\n",
|
||||
"Action: python_repl_ast\n",
|
||||
"Action Input: data['Age'].mean()\u001b[0m"
|
||||
]
|
||||
},
|
||||
{
|
||||
"data": {
|
||||
"application/vnd.jupyter.widget-view+json": {
|
||||
"model_id": "29af2e29f2d64a3397c212812adf0e9b",
|
||||
"version_major": 2,
|
||||
"version_minor": 0
|
||||
},
|
||||
"text/plain": [
|
||||
" 0%| | 0.00/100 [00:00<?, ?it/s]"
|
||||
]
|
||||
},
|
||||
"metadata": {},
|
||||
"output_type": "display_data"
|
||||
},
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"\n",
|
||||
"Observation: \u001b[36;1m\u001b[1;3m29.69911764705882\u001b[0m\n",
|
||||
"Thought:\u001b[32;1m\u001b[1;3m I now know the final answer\n",
|
||||
"Final Answer: The mean age is 29.69911764705882.\u001b[0m\n",
|
||||
"\n",
|
||||
"\u001b[1m> Finished chain.\u001b[0m\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"'The mean age is 29.69911764705882.'"
|
||||
]
|
||||
},
|
||||
"execution_count": 4,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"agent.run(\"whats the mean age?\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 5,
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"\n",
|
||||
"\n",
|
||||
"\u001b[1m> Entering new chain...\u001b[0m\n",
|
||||
"\u001b[32;1m\u001b[1;3mThought: I need to group the data by sex and then find the average age for each group\n",
|
||||
"Action: python_repl_ast\n",
|
||||
"Action Input: data.groupby('Sex')['Age'].mean()\u001b[0m"
|
||||
]
|
||||
},
|
||||
{
|
||||
"data": {
|
||||
"application/vnd.jupyter.widget-view+json": {
|
||||
"model_id": "c3d28625c35946fd91ebc2a47f8d8c5b",
|
||||
"version_major": 2,
|
||||
"version_minor": 0
|
||||
},
|
||||
"text/plain": [
|
||||
" 0%| | 0.00/100 [00:00<?, ?it/s]"
|
||||
]
|
||||
},
|
||||
"metadata": {},
|
||||
"output_type": "display_data"
|
||||
},
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"\n",
|
||||
"Observation: \u001b[36;1m\u001b[1;3mSex\n",
|
||||
"female 27.915709\n",
|
||||
"male 30.726645\n",
|
||||
"Name: Age, dtype: float64\u001b[0m\n",
|
||||
"Thought:\u001b[32;1m\u001b[1;3m I now know the average age for each group\n",
|
||||
"Final Answer: The average age for female passengers is 27.92 and the average age for male passengers is 30.73.\u001b[0m\n",
|
||||
"\n",
|
||||
"\u001b[1m> Finished chain.\u001b[0m\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"'The average age for female passengers is 27.92 and the average age for male passengers is 30.73.'"
|
||||
]
|
||||
},
|
||||
"execution_count": 5,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"agent.run(\"Group the data by sex and find the average age for each group\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 7,
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"\n",
|
||||
"\n",
|
||||
"\u001b[1m> Entering new chain...\u001b[0m\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"data": {
|
||||
"application/vnd.jupyter.widget-view+json": {
|
||||
"model_id": "c72aab63b20d47599f4f9806f6887a69",
|
||||
"version_major": 2,
|
||||
"version_minor": 0
|
||||
},
|
||||
"text/plain": [
|
||||
" 0%| | 0.00/100 [00:00<?, ?it/s]"
|
||||
]
|
||||
},
|
||||
"metadata": {},
|
||||
"output_type": "display_data"
|
||||
},
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"\u001b[32;1m\u001b[1;3mThought: I need to filter the dataframe to get the desired result\n",
|
||||
"Action: python_repl_ast\n",
|
||||
"Action Input: data[(data['Age'] > 30) & (data['Fare'] > 30) & (data['Fare'] < 50) & ((data['Pclass'] == 1) | (data['Pclass'] == 2))].shape[0]\u001b[0m\n",
|
||||
"Observation: \u001b[36;1m\u001b[1;3m20\u001b[0m\n",
|
||||
"Thought:\u001b[32;1m\u001b[1;3m I now know the final answer\n",
|
||||
"Final Answer: 20\u001b[0m\n",
|
||||
"\n",
|
||||
"\u001b[1m> Finished chain.\u001b[0m\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"'20'"
|
||||
]
|
||||
},
|
||||
"execution_count": 7,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"agent.run(\n",
|
||||
" \"Show the number of people whose age is greater than 30 and fare is between 30 and 50 , and pclass is either 1 or 2\"\n",
|
||||
")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Numpy examples"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 10,
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"application/vnd.jupyter.widget-view+json": {
|
||||
"model_id": "fa8baf315a0c41c89392edc4a24b76f5",
|
||||
"version_major": 2,
|
||||
"version_minor": 0
|
||||
},
|
||||
"text/plain": [
|
||||
" 0%| | 0.00/100 [00:00<?, ?it/s]"
|
||||
]
|
||||
},
|
||||
"metadata": {},
|
||||
"output_type": "display_data"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"import xorbits.numpy as np\n",
|
||||
"\n",
|
||||
"from langchain.agents import create_xorbits_agent\n",
|
||||
"from langchain.llms import OpenAI\n",
|
||||
"\n",
|
||||
"arr = np.array([1, 2, 3, 4, 5, 6])\n",
|
||||
"agent = create_xorbits_agent(OpenAI(temperature=0), arr, verbose=True)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 12,
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"\n",
|
||||
"\n",
|
||||
"\u001b[1m> Entering new chain...\u001b[0m\n",
|
||||
"\u001b[32;1m\u001b[1;3mThought: I need to find out the shape of the array\n",
|
||||
"Action: python_repl_ast\n",
|
||||
"Action Input: data.shape\u001b[0m\n",
|
||||
"Observation: \u001b[36;1m\u001b[1;3m(6,)\u001b[0m\n",
|
||||
"Thought:\u001b[32;1m\u001b[1;3m I now know the final answer\n",
|
||||
"Final Answer: The shape of the array is (6,).\u001b[0m\n",
|
||||
"\n",
|
||||
"\u001b[1m> Finished chain.\u001b[0m\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"'The shape of the array is (6,).'"
|
||||
]
|
||||
},
|
||||
"execution_count": 12,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"agent.run(\"Give the shape of the array \")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 14,
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"\n",
|
||||
"\n",
|
||||
"\u001b[1m> Entering new chain...\u001b[0m\n",
|
||||
"\u001b[32;1m\u001b[1;3mThought: I need to access the 2nd element of the array\n",
|
||||
"Action: python_repl_ast\n",
|
||||
"Action Input: data[1]\u001b[0m"
|
||||
]
|
||||
},
|
||||
{
|
||||
"data": {
|
||||
"application/vnd.jupyter.widget-view+json": {
|
||||
"model_id": "64efcc74f81f404eb0a7d3f0326cd8b3",
|
||||
"version_major": 2,
|
||||
"version_minor": 0
|
||||
},
|
||||
"text/plain": [
|
||||
" 0%| | 0.00/100 [00:00<?, ?it/s]"
|
||||
]
|
||||
},
|
||||
"metadata": {},
|
||||
"output_type": "display_data"
|
||||
},
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"\n",
|
||||
"Observation: \u001b[36;1m\u001b[1;3m2\u001b[0m\n",
|
||||
"Thought:\u001b[32;1m\u001b[1;3m I now know the final answer\n",
|
||||
"Final Answer: 2\u001b[0m\n",
|
||||
"\n",
|
||||
"\u001b[1m> Finished chain.\u001b[0m\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"'2'"
|
||||
]
|
||||
},
|
||||
"execution_count": 14,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"agent.run(\"Give the 2nd element of the array \")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 18,
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"\n",
|
||||
"\n",
|
||||
"\u001b[1m> Entering new chain...\u001b[0m\n",
|
||||
"\u001b[32;1m\u001b[1;3mThought: I need to reshape the array and then transpose it\n",
|
||||
"Action: python_repl_ast\n",
|
||||
"Action Input: np.reshape(data, (2,3)).T\u001b[0m"
|
||||
]
|
||||
},
|
||||
{
|
||||
"data": {
|
||||
"application/vnd.jupyter.widget-view+json": {
|
||||
"model_id": "fce51acf6fb347c0b400da67c6750534",
|
||||
"version_major": 2,
|
||||
"version_minor": 0
|
||||
},
|
||||
"text/plain": [
|
||||
" 0%| | 0.00/100 [00:00<?, ?it/s]"
|
||||
]
|
||||
},
|
||||
"metadata": {},
|
||||
"output_type": "display_data"
|
||||
},
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"\n",
|
||||
"Observation: \u001b[36;1m\u001b[1;3m[[1 4]\n",
|
||||
" [2 5]\n",
|
||||
" [3 6]]\u001b[0m\n",
|
||||
"Thought:\u001b[32;1m\u001b[1;3m I now know the final answer\n",
|
||||
"Final Answer: The reshaped and transposed array is [[1 4], [2 5], [3 6]].\u001b[0m\n",
|
||||
"\n",
|
||||
"\u001b[1m> Finished chain.\u001b[0m\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"'The reshaped and transposed array is [[1 4], [2 5], [3 6]].'"
|
||||
]
|
||||
},
|
||||
"execution_count": 18,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"agent.run(\n",
|
||||
" \"Reshape the array into a 2-dimensional array with 2 rows and 3 columns, and then transpose it\"\n",
|
||||
")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 20,
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"\n",
|
||||
"\n",
|
||||
"\u001b[1m> Entering new chain...\u001b[0m\n",
|
||||
"\u001b[32;1m\u001b[1;3mThought: I need to reshape the array and then sum it\n",
|
||||
"Action: python_repl_ast\n",
|
||||
"Action Input: np.sum(np.reshape(data, (3,2)), axis=0)\u001b[0m"
|
||||
]
|
||||
},
|
||||
{
|
||||
"data": {
|
||||
"application/vnd.jupyter.widget-view+json": {
|
||||
"model_id": "27fd4a0bbf694936bc41a6991064dec2",
|
||||
"version_major": 2,
|
||||
"version_minor": 0
|
||||
},
|
||||
"text/plain": [
|
||||
" 0%| | 0.00/100 [00:00<?, ?it/s]"
|
||||
]
|
||||
},
|
||||
"metadata": {},
|
||||
"output_type": "display_data"
|
||||
},
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"\n",
|
||||
"Observation: \u001b[36;1m\u001b[1;3m[ 9 12]\u001b[0m\n",
|
||||
"Thought:\u001b[32;1m\u001b[1;3m I now know the final answer\n",
|
||||
"Final Answer: The sum of the array along the first axis is [9, 12].\u001b[0m\n",
|
||||
"\n",
|
||||
"\u001b[1m> Finished chain.\u001b[0m\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"'The sum of the array along the first axis is [9, 12].'"
|
||||
]
|
||||
},
|
||||
"execution_count": 20,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"agent.run(\n",
|
||||
" \"Reshape the array into a 2-dimensional array with 3 rows and 2 columns and sum the array along the first axis\"\n",
|
||||
")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 5,
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"application/vnd.jupyter.widget-view+json": {
|
||||
"model_id": "a591b6d7913f45cba98d2f3b71a5120a",
|
||||
"version_major": 2,
|
||||
"version_minor": 0
|
||||
},
|
||||
"text/plain": [
|
||||
" 0%| | 0.00/100 [00:00<?, ?it/s]"
|
||||
]
|
||||
},
|
||||
"metadata": {},
|
||||
"output_type": "display_data"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"arr = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])\n",
|
||||
"agent = create_xorbits_agent(OpenAI(temperature=0), arr, verbose=True)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 7,
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"\n",
|
||||
"\n",
|
||||
"\u001b[1m> Entering new chain...\u001b[0m\n",
|
||||
"\u001b[32;1m\u001b[1;3mThought: I need to use the numpy covariance function\n",
|
||||
"Action: python_repl_ast\n",
|
||||
"Action Input: np.cov(data)\u001b[0m"
|
||||
]
|
||||
},
|
||||
{
|
||||
"data": {
|
||||
"application/vnd.jupyter.widget-view+json": {
|
||||
"model_id": "5fe40f83cfae48d0919c147627b5839f",
|
||||
"version_major": 2,
|
||||
"version_minor": 0
|
||||
},
|
||||
"text/plain": [
|
||||
" 0%| | 0.00/100 [00:00<?, ?it/s]"
|
||||
]
|
||||
},
|
||||
"metadata": {},
|
||||
"output_type": "display_data"
|
||||
},
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"\n",
|
||||
"Observation: \u001b[36;1m\u001b[1;3m[[1. 1. 1.]\n",
|
||||
" [1. 1. 1.]\n",
|
||||
" [1. 1. 1.]]\u001b[0m\n",
|
||||
"Thought:\u001b[32;1m\u001b[1;3m I now know the final answer\n",
|
||||
"Final Answer: The covariance matrix is [[1. 1. 1.], [1. 1. 1.], [1. 1. 1.]].\u001b[0m\n",
|
||||
"\n",
|
||||
"\u001b[1m> Finished chain.\u001b[0m\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"'The covariance matrix is [[1. 1. 1.], [1. 1. 1.], [1. 1. 1.]].'"
|
||||
]
|
||||
},
|
||||
"execution_count": 7,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"agent.run(\"calculate the covariance matrix\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 9,
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"\n",
|
||||
"\n",
|
||||
"\u001b[1m> Entering new chain...\u001b[0m\n",
|
||||
"\u001b[32;1m\u001b[1;3mThought: I need to use the SVD function\n",
|
||||
"Action: python_repl_ast\n",
|
||||
"Action Input: U, S, V = np.linalg.svd(data)\u001b[0m\n",
|
||||
"Observation: \u001b[36;1m\u001b[1;3m\u001b[0m\n",
|
||||
"Thought:\u001b[32;1m\u001b[1;3m I now have the U matrix\n",
|
||||
"Final Answer: U = [[-0.70710678 -0.70710678]\n",
|
||||
" [-0.70710678 0.70710678]]\u001b[0m\n",
|
||||
"\n",
|
||||
"\u001b[1m> Finished chain.\u001b[0m\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"'U = [[-0.70710678 -0.70710678]\\n [-0.70710678 0.70710678]]'"
|
||||
]
|
||||
},
|
||||
"execution_count": 9,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"agent.run(\"compute the U of Singular Value Decomposition of the matrix\")"
|
||||
]
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
"kernelspec": {
|
||||
"display_name": "Python 3 (ipykernel)",
|
||||
"language": "python",
|
||||
"name": "python3"
|
||||
},
|
||||
"language_info": {
|
||||
"codemirror_mode": {
|
||||
"name": "ipython",
|
||||
"version": 3
|
||||
},
|
||||
"file_extension": ".py",
|
||||
"mimetype": "text/x-python",
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.9.13"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
"nbformat_minor": 2
|
||||
}
|
||||
@@ -934,7 +934,7 @@
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langchain.schema import ToolException\n",
|
||||
"from langchain.tools.base import ToolException\n",
|
||||
"\n",
|
||||
"from langchain import SerpAPIWrapper\n",
|
||||
"from langchain.agents import AgentType, initialize_agent\n",
|
||||
|
||||
@@ -24,7 +24,7 @@
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"#!pip install apify-client"
|
||||
"#!pip install apify-client openai langchain chromadb tiktoken"
|
||||
]
|
||||
},
|
||||
{
|
||||
|
||||
@@ -34,6 +34,7 @@
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"import os\n",
|
||||
"\n",
|
||||
"os.environ[\"DATAFORSEO_LOGIN\"] = \"your_api_access_username\"\n",
|
||||
"os.environ[\"DATAFORSEO_PASSWORD\"] = \"your_api_access_password\"\n",
|
||||
"\n",
|
||||
@@ -88,7 +89,8 @@
|
||||
"json_wrapper = DataForSeoAPIWrapper(\n",
|
||||
" json_result_types=[\"organic\", \"knowledge_graph\", \"answer_box\"],\n",
|
||||
" json_result_fields=[\"type\", \"title\", \"description\", \"text\"],\n",
|
||||
" top_count=3)"
|
||||
" top_count=3,\n",
|
||||
")"
|
||||
]
|
||||
},
|
||||
{
|
||||
@@ -119,7 +121,8 @@
|
||||
" top_count=10,\n",
|
||||
" json_result_types=[\"organic\", \"local_pack\"],\n",
|
||||
" json_result_fields=[\"title\", \"description\", \"type\"],\n",
|
||||
" params={\"location_name\": \"Germany\", \"language_code\": \"en\"})\n",
|
||||
" params={\"location_name\": \"Germany\", \"language_code\": \"en\"},\n",
|
||||
")\n",
|
||||
"customized_wrapper.results(\"coffee near me\")"
|
||||
]
|
||||
},
|
||||
@@ -142,7 +145,8 @@
|
||||
" top_count=10,\n",
|
||||
" json_result_types=[\"organic\", \"local_pack\"],\n",
|
||||
" json_result_fields=[\"title\", \"description\", \"type\"],\n",
|
||||
" params={\"location_name\": \"Germany\", \"language_code\": \"en\", \"se_name\": \"bing\"})\n",
|
||||
" params={\"location_name\": \"Germany\", \"language_code\": \"en\", \"se_name\": \"bing\"},\n",
|
||||
")\n",
|
||||
"customized_wrapper.results(\"coffee near me\")"
|
||||
]
|
||||
},
|
||||
@@ -164,7 +168,12 @@
|
||||
"maps_search = DataForSeoAPIWrapper(\n",
|
||||
" top_count=10,\n",
|
||||
" json_result_fields=[\"title\", \"value\", \"address\", \"rating\", \"type\"],\n",
|
||||
" params={\"location_coordinate\": \"52.512,13.36,12z\", \"language_code\": \"en\", \"se_type\": \"maps\"})\n",
|
||||
" params={\n",
|
||||
" \"location_coordinate\": \"52.512,13.36,12z\",\n",
|
||||
" \"language_code\": \"en\",\n",
|
||||
" \"se_type\": \"maps\",\n",
|
||||
" },\n",
|
||||
")\n",
|
||||
"maps_search.results(\"coffee near me\")"
|
||||
]
|
||||
},
|
||||
@@ -184,10 +193,12 @@
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langchain.agents import Tool\n",
|
||||
"\n",
|
||||
"search = DataForSeoAPIWrapper(\n",
|
||||
" top_count=3,\n",
|
||||
" json_result_types=[\"organic\"],\n",
|
||||
" json_result_fields=[\"title\", \"description\", \"type\"])\n",
|
||||
" json_result_fields=[\"title\", \"description\", \"type\"],\n",
|
||||
")\n",
|
||||
"tool = Tool(\n",
|
||||
" name=\"google-search-answer\",\n",
|
||||
" description=\"My new answer tool\",\n",
|
||||
|
||||
@@ -52,7 +52,6 @@
|
||||
"tools = load_tools(\n",
|
||||
" [\"graphql\"],\n",
|
||||
" graphql_endpoint=\"https://swapi-graphql.netlify.app/.netlify/functions/index\",\n",
|
||||
" llm=llm,\n",
|
||||
")\n",
|
||||
"\n",
|
||||
"agent = initialize_agent(\n",
|
||||
|
||||
233
docs/extras/modules/agents/tools/integrations/lemonai.ipynb
Normal file
@@ -0,0 +1,233 @@
|
||||
{
|
||||
"cells": [
|
||||
{
|
||||
"attachments": {},
|
||||
"cell_type": "markdown",
|
||||
"id": "16763ed3",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# Lemon AI NLP Workflow Automation\n",
|
||||
"\\\n",
|
||||
"Full docs are available at: https://github.com/felixbrock/lemonai-py-client\n",
|
||||
"\n",
|
||||
"**Lemon AI helps you build powerful AI assistants in minutes and automate workflows by allowing for accurate and reliable read and write operations in tools like Airtable, Hubspot, Discord, Notion, Slack and Github.**\n",
|
||||
"\n",
|
||||
"Most connectors available today are focused on read-only operations, limiting the potential of LLMs. Agents, on the other hand, have a tendency to hallucinate from time to time due to missing context or instructions.\n",
|
||||
"\n",
|
||||
"With Lemon AI, it is possible to give your agents access to well-defined APIs for reliable read and write operations. In addition, Lemon AI functions allow you to further reduce the risk of hallucinations by providing a way to statically define workflows that the model can rely on in case of uncertainty."
|
||||
]
|
||||
},
|
||||
{
|
||||
"attachments": {},
|
||||
"cell_type": "markdown",
|
||||
"id": "4881b484-1b97-478f-b206-aec407ceff66",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Quick Start\n",
|
||||
"\n",
|
||||
"The following quick start demonstrates how to use Lemon AI in combination with Agents to automate workflows that involve interaction with internal tooling."
|
||||
]
|
||||
},
|
||||
{
|
||||
"attachments": {},
|
||||
"cell_type": "markdown",
|
||||
"id": "ff91b41a",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### 1. Install Lemon AI\n",
|
||||
"\n",
|
||||
"Requires Python 3.8.1 and above.\n",
|
||||
"\n",
|
||||
"To use Lemon AI in your Python project run `pip install lemonai`\n",
|
||||
"\n",
|
||||
"This will install the corresponding Lemon AI client which you can then import into your script.\n",
|
||||
"\n",
|
||||
"The tool uses Python packages langchain and loguru. In case of any installation errors with Lemon AI, install both packages first and then install the Lemon AI package."
|
||||
]
|
||||
},
|
||||
{
|
||||
"attachments": {},
|
||||
"cell_type": "markdown",
|
||||
"id": "340ff63d",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### 2. Launch the Server\n",
|
||||
"\n",
|
||||
"The interaction of your agents and all tools provided by Lemon AI is handled by the [Lemon AI Server](https://github.com/felixbrock/lemonai-server). To use Lemon AI you need to run the server on your local machine so the Lemon AI Python client can connect to it."
|
||||
]
|
||||
},
|
||||
{
|
||||
"attachments": {},
|
||||
"cell_type": "markdown",
|
||||
"id": "e845f402",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### 3. Use Lemon AI with Langchain"
|
||||
]
|
||||
},
|
||||
{
|
||||
"attachments": {},
|
||||
"cell_type": "markdown",
|
||||
"id": "d3ae6a82",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Lemon AI automatically solves given tasks by finding the right combination of relevant tools or uses Lemon AI Functions as an alternative. The following example demonstrates how to retrieve a user from Hackernews and write it to a table in Airtable:"
|
||||
]
|
||||
},
|
||||
{
|
||||
"attachments": {},
|
||||
"cell_type": "markdown",
|
||||
"id": "43476a22",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"#### (Optional) Define your Lemon AI Functions"
|
||||
]
|
||||
},
|
||||
{
|
||||
"attachments": {},
|
||||
"cell_type": "markdown",
|
||||
"id": "cb038670",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Similar to [OpenAI functions](https://openai.com/blog/function-calling-and-other-api-updates), Lemon AI provides the option to define workflows as reusable functions. These functions can be defined for use cases where it is especially important to move as close as possible to near-deterministic behavior. Specific workflows can be defined in a separate lemonai.json:"
|
||||
]
|
||||
},
|
||||
{
|
||||
"attachments": {},
|
||||
"cell_type": "markdown",
|
||||
"id": "e423ebbb",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"```json\n",
|
||||
"[\n",
|
||||
" {\n",
|
||||
" \"name\": \"Hackernews Airtable User Workflow\",\n",
|
||||
" \"description\": \"retrieves user data from Hackernews and appends it to a table in Airtable\",\n",
|
||||
" \"tools\": [\"hackernews-get-user\", \"airtable-append-data\"]\n",
|
||||
" }\n",
|
||||
"]\n",
|
||||
"```"
|
||||
]
|
||||
},
|
||||
{
|
||||
"attachments": {},
|
||||
"cell_type": "markdown",
|
||||
"id": "3fdb36ce",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Your model will have access to these functions and will prefer them over self-selecting tools to solve a given task. All you have to do is to let the agent know that it should use a given function by including the function name in the prompt."
|
||||
]
|
||||
},
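A minimal sketch of triggering the workflow defined above by name; the prompt wording is hypothetical, and it assumes the Lemon AI client picks up `lemonai.json` from the working directory:

```python
# Hypothetical sketch: reference the function defined in lemonai.json by name
# so the agent prefers it over self-selecting tools.
from lemonai import execute_workflow
from langchain import OpenAI

model = OpenAI(temperature=0)
prompt = (
    "Run the 'Hackernews Airtable User Workflow' for user 'some_user' "
    "and append the result to my Airtable table."
)
execute_workflow(llm=model, prompt_string=prompt)
```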
|
||||
{
|
||||
"attachments": {},
|
||||
"cell_type": "markdown",
|
||||
"id": "ebfb8b5d",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"#### Include Lemon AI in your Langchain project "
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 1,
|
||||
"id": "5318715d",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"import os\n",
|
||||
"from lemonai import execute_workflow\n",
|
||||
"from langchain import OpenAI"
|
||||
]
|
||||
},
|
||||
{
|
||||
"attachments": {},
|
||||
"cell_type": "markdown",
|
||||
"id": "c9d082cb",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"#### Load API Keys and Access Tokens\n",
|
||||
"\n",
|
||||
"To use tools that require authentication, you have to store the corresponding access credentials in your environment in the format \"{tool name}_{authentication string}\" where the authentication string is one of [\"API_KEY\", \"SECRET_KEY\", \"SUBSCRIPTION_KEY\", \"ACCESS_KEY\"] for API keys or [\"ACCESS_TOKEN\", \"SECRET_TOKEN\"] for authentication tokens. Examples are \"OPENAI_API_KEY\", \"BING_SUBSCRIPTION_KEY\", \"AIRTABLE_ACCESS_TOKEN\"."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 2,
|
||||
"id": "a370d999",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"\"\"\" Load all relevant API Keys and Access Tokens into your environment variables \"\"\"\n",
|
||||
"os.environ[\"OPENAI_API_KEY\"] = \"*INSERT OPENAI API KEY HERE*\"\n",
|
||||
"os.environ[\"AIRTABLE_ACCESS_TOKEN\"] = \"*INSERT AIRTABLE TOKEN HERE*\""
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "38d158e7",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"hackernews_username = \"*INSERT HACKERNEWS USERNAME HERE*\"\n",
|
||||
"airtable_base_id = \"*INSERT BASE ID HERE*\"\n",
|
||||
"airtable_table_id = \"*INSERT TABLE ID HERE*\"\n",
|
||||
"\n",
|
||||
"\"\"\" Define your instruction to be given to your LLM \"\"\"\n",
|
||||
"prompt = f\"\"\"Read information from Hackernews for user {hackernews_username} and then write the results to\n",
|
||||
"Airtable (baseId: {airtable_base_id}, tableId: {airtable_table_id}). Only write the fields \"username\", \"karma\"\n",
|
||||
"and \"created_at_i\". Please make sure that Airtable does NOT automatically convert the field types.\n",
|
||||
"\"\"\"\n",
|
||||
"\n",
|
||||
"\"\"\"\n",
|
||||
"Use the Lemon AI execute_workflow wrapper \n",
|
||||
"to run your Langchain agent in combination with Lemon AI \n",
|
||||
"\"\"\"\n",
|
||||
"model = OpenAI(temperature=0)\n",
|
||||
"\n",
|
||||
"execute_workflow(llm=model, prompt_string=prompt)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"attachments": {},
|
||||
"cell_type": "markdown",
|
||||
"id": "aef3e801",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### 4. Gain transparency on your Agent's decision making\n",
|
||||
"\n",
|
||||
"To gain transparency on how your Agent interacts with Lemon AI tools to solve a given task, all decisions made, tools used and operations performed are written to a local `lemonai.log` file. Every time your LLM agent is interacting with the Lemon AI tool stack a corresponding log entry is created.\n",
|
||||
"\n",
|
||||
"```log\n",
|
||||
"2023-06-26T11:50:27.708785+0100 - b5f91c59-8487-45c2-800a-156eac0c7dae - hackernews-get-user\n",
|
||||
"2023-06-26T11:50:39.624035+0100 - b5f91c59-8487-45c2-800a-156eac0c7dae - airtable-append-data\n",
|
||||
"2023-06-26T11:58:32.925228+0100 - 5efe603c-9898-4143-b99a-55b50007ed9d - hackernews-get-user\n",
|
||||
"2023-06-26T11:58:43.988788+0100 - 5efe603c-9898-4143-b99a-55b50007ed9d - airtable-append-data\n",
|
||||
"```\n",
|
||||
"\n",
|
||||
"By using the [Lemon AI Analytics Tool](https://github.com/felixbrock/lemonai-analytics) you can easily gain a better understanding of how frequently and in which order tools are used. As a result, you can identify weak spots in your agent’s decision-making capabilities and move to a more deterministic behavior by defining Lemon AI functions."
|
||||
]
|
||||
}
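If you want a quick aggregate view without the analytics tool, the `<timestamp> - <session id> - <tool>` layout shown above is easy to tally; a small sketch (assuming `lemonai.log` sits in the working directory):

```python
# Tally tool usage from the local lemonai.log entries shown above.
from collections import Counter

with open("lemonai.log") as f:
    tools = [line.rsplit(" - ", 1)[-1].strip() for line in f if " - " in line]

print(Counter(tools))  # e.g. Counter({'hackernews-get-user': 2, ...})
```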
|
||||
],
|
||||
"metadata": {
|
||||
"kernelspec": {
|
||||
"display_name": "Python 3 (ipykernel)",
|
||||
"language": "python",
|
||||
"name": "python3"
|
||||
},
|
||||
"language_info": {
|
||||
"codemirror_mode": {
|
||||
"name": "ipython",
|
||||
"version": 3
|
||||
},
|
||||
"file_extension": ".py",
|
||||
"mimetype": "text/x-python",
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.9.1"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
"nbformat_minor": 5
|
||||
}
|
||||
@@ -90,7 +90,12 @@
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"search.results(\"The best blog post about AI safety is definitely this: \", 10, include_domains=[\"lesswrong.com\"], start_published_date=\"2019-01-01\")"
|
||||
"search.results(\n",
|
||||
" \"The best blog post about AI safety is definitely this: \",\n",
|
||||
" 10,\n",
|
||||
" include_domains=[\"lesswrong.com\"],\n",
|
||||
" start_published_date=\"2019-01-01\",\n",
|
||||
")"
|
||||
]
|
||||
},
|
||||
{
|
||||
|
||||
@@ -341,7 +341,7 @@
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"llm = OpenAI(temperature=0)\n",
|
||||
"zapier = ZapierNLAWrapper(zapier_nla_oauth_access_token='<fill in access token here>')\n",
|
||||
"zapier = ZapierNLAWrapper(zapier_nla_oauth_access_token=\"<fill in access token here>\")\n",
|
||||
"toolkit = ZapierToolkit.from_zapier_nla_wrapper(zapier)\n",
|
||||
"agent = initialize_agent(\n",
|
||||
" toolkit.get_tools(), llm, agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION, verbose=True\n",
|
||||
|
||||
220
docs/extras/modules/callbacks/integrations/context.ipynb
Normal file
@@ -0,0 +1,220 @@
|
||||
{
|
||||
"cells": [
|
||||
{
|
||||
"attachments": {},
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# Context\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"[Context](https://getcontext.ai/) provides product analytics for AI chatbots.\n",
|
||||
"\n",
|
||||
"Context helps you understand how users are interacting with your AI chat products.\n",
|
||||
"Gain critical insights, optimise poor experiences, and minimise brand risks.\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"attachments": {},
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"In this guide we will show you how to integrate with Context."
|
||||
]
|
||||
},
|
||||
{
|
||||
"attachments": {},
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"tags": []
|
||||
},
|
||||
"source": [
|
||||
"## Installation and Setup"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {
|
||||
"vscode": {
|
||||
"languageId": "shellscript"
|
||||
}
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"$ pip install context-python --upgrade"
|
||||
]
|
||||
},
|
||||
{
|
||||
"attachments": {},
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Getting API Credentials\n",
|
||||
"\n",
|
||||
"To get your Context API token:\n",
|
||||
"\n",
|
||||
"1. Go to the settings page within your Context account (https://go.getcontext.ai/settings).\n",
|
||||
"2. Generate a new API Token.\n",
|
||||
"3. Store this token somewhere secure."
|
||||
]
|
||||
},
|
||||
{
|
||||
"attachments": {},
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Setup Context\n",
|
||||
"\n",
|
||||
"To use the `ContextCallbackHandler`, import the handler from Langchain and instantiate it with your Context API token.\n",
|
||||
"\n",
|
||||
"Ensure you have installed the `context-python` package before using the handler."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 3,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"import os\n",
|
||||
"\n",
|
||||
"from langchain.callbacks import ContextCallbackHandler\n",
|
||||
"\n",
|
||||
"token = os.environ[\"CONTEXT_API_TOKEN\"]\n",
|
||||
"\n",
|
||||
"context_callback = ContextCallbackHandler(token)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"attachments": {},
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Usage\n",
|
||||
"### Using the Context callback within a Chat Model\n",
|
||||
"\n",
|
||||
"The Context callback handler can be used to directly record transcripts between users and AI assistants.\n",
|
||||
"\n",
|
||||
"#### Example"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 4,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"import os\n",
|
||||
"\n",
|
||||
"from langchain.chat_models import ChatOpenAI\n",
|
||||
"from langchain.schema import (\n",
|
||||
" SystemMessage,\n",
|
||||
" HumanMessage,\n",
|
||||
")\n",
|
||||
"from langchain.callbacks import ContextCallbackHandler\n",
|
||||
"\n",
|
||||
"token = os.environ[\"CONTEXT_API_TOKEN\"]\n",
|
||||
"\n",
|
||||
"chat = ChatOpenAI(\n",
|
||||
" headers={\"user_id\": \"123\"}, temperature=0, callbacks=[ContextCallbackHandler(token)]\n",
|
||||
")\n",
|
||||
"\n",
|
||||
"messages = [\n",
|
||||
" SystemMessage(\n",
|
||||
" content=\"You are a helpful assistant that translates English to French.\"\n",
|
||||
" ),\n",
|
||||
" HumanMessage(content=\"I love programming.\"),\n",
|
||||
"]\n",
|
||||
"\n",
|
||||
"print(chat(messages))"
|
||||
]
|
||||
},
|
||||
{
|
||||
"attachments": {},
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Using the Context callback within Chains\n",
|
||||
"\n",
|
||||
"The Context callback handler can also be used to record the inputs and outputs of chains. Note that intermediate steps of the chain are not recorded - only the starting inputs and final outputs.\n",
|
||||
"\n",
|
||||
"__Note:__ Ensure that you pass the same context object to the chat model and the chain.\n",
|
||||
"\n",
|
||||
"Wrong:\n",
|
||||
"> ```python\n",
|
||||
"> chat = ChatOpenAI(temperature=0.9, callbacks=[ContextCallbackHandler(token)])\n",
|
||||
"> chain = LLMChain(llm=chat, prompt=chat_prompt_template, callbacks=[ContextCallbackHandler(token)])\n",
|
||||
"> ```\n",
|
||||
"\n",
|
||||
"Correct:\n",
|
||||
">```python\n",
|
||||
">handler = ContextCallbackHandler(token)\n",
|
||||
">chat = ChatOpenAI(temperature=0.9, callbacks=[callback])\n",
|
||||
">chain = LLMChain(llm=chat, prompt=chat_prompt_template, callbacks=[callback])\n",
|
||||
">```\n",
|
||||
"\n",
|
||||
"#### Example"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"import os\n",
|
||||
"\n",
|
||||
"from langchain.chat_models import ChatOpenAI\n",
|
||||
"from langchain import LLMChain\n",
|
||||
"from langchain.prompts import PromptTemplate\n",
|
||||
"from langchain.prompts.chat import (\n",
|
||||
" ChatPromptTemplate,\n",
|
||||
" HumanMessagePromptTemplate,\n",
|
||||
")\n",
|
||||
"from langchain.callbacks import ContextCallbackHandler\n",
|
||||
"\n",
|
||||
"token = os.environ[\"CONTEXT_API_TOKEN\"]\n",
|
||||
"\n",
|
||||
"human_message_prompt = HumanMessagePromptTemplate(\n",
|
||||
" prompt=PromptTemplate(\n",
|
||||
" template=\"What is a good name for a company that makes {product}?\",\n",
|
||||
" input_variables=[\"product\"],\n",
|
||||
" )\n",
|
||||
")\n",
|
||||
"chat_prompt_template = ChatPromptTemplate.from_messages([human_message_prompt])\n",
|
||||
"callback = ContextCallbackHandler(token)\n",
|
||||
"chat = ChatOpenAI(temperature=0.9, callbacks=[callback])\n",
|
||||
"chain = LLMChain(llm=chat, prompt=chat_prompt_template, callbacks=[callback])\n",
|
||||
"print(chain.run(\"colorful socks\"))"
|
||||
]
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
"kernelspec": {
|
||||
"display_name": "Python 3 (ipykernel)",
|
||||
"language": "python",
|
||||
"name": "python3"
|
||||
},
|
||||
"language_info": {
|
||||
"codemirror_mode": {
|
||||
"name": "ipython",
|
||||
"version": 3
|
||||
},
|
||||
"file_extension": ".py",
|
||||
"mimetype": "text/x-python",
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.11.3"
|
||||
},
|
||||
"vscode": {
|
||||
"interpreter": {
|
||||
"hash": "a53ebf4a859167383b364e7e7521d0add3c2dbbdecce4edf676e8c4634ff3fbb"
|
||||
}
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
"nbformat_minor": 4
|
||||
}
|
||||
@@ -57,6 +57,7 @@
|
||||
"\n",
|
||||
"# Remove the (1) import sys and sys.path.append(..) and (2) uncomment `!pip install langchain` after merging the PR for Infino/LangChain integration.\n",
|
||||
"import sys\n",
|
||||
"\n",
|
||||
"sys.path.append(\"../../../../../langchain\")\n",
|
||||
"#!pip install langchain\n",
|
||||
"\n",
|
||||
@@ -120,9 +121,9 @@
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# These are a subset of questions from Stanford's QA dataset - \n",
|
||||
"# These are a subset of questions from Stanford's QA dataset -\n",
|
||||
"# https://rajpurkar.github.io/SQuAD-explorer/\n",
|
||||
"data = '''In what country is Normandy located?\n",
|
||||
"data = \"\"\"In what country is Normandy located?\n",
|
||||
"When were the Normans in Normandy?\n",
|
||||
"From which countries did the Norse originate?\n",
|
||||
"Who was the Norse leader?\n",
|
||||
@@ -141,9 +142,9 @@
|
||||
"What principality did William the conquerer found?\n",
|
||||
"What is the original meaning of the word Norman?\n",
|
||||
"When was the Latin version of the word Norman first recorded?\n",
|
||||
"What name comes from the English words Normans/Normanz?'''\n",
|
||||
"What name comes from the English words Normans/Normanz?\"\"\"\n",
|
||||
"\n",
|
||||
"questions = data.split('\\n')"
|
||||
"questions = data.split(\"\\n\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
@@ -190,10 +191,12 @@
|
||||
],
|
||||
"source": [
|
||||
"# Set your key here.\n",
|
||||
"#os.environ[\"OPENAI_API_KEY\"] = \"YOUR_API_KEY\"\n",
|
||||
"# os.environ[\"OPENAI_API_KEY\"] = \"YOUR_API_KEY\"\n",
|
||||
"\n",
|
||||
"# Create callback handler. This logs latency, errors, token usage, prompts as well as prompt responses to Infino.\n",
|
||||
"handler = InfinoCallbackHandler(model_id=\"test_openai\", model_version=\"0.1\", verbose=False)\n",
|
||||
"handler = InfinoCallbackHandler(\n",
|
||||
" model_id=\"test_openai\", model_version=\"0.1\", verbose=False\n",
|
||||
")\n",
|
||||
"\n",
|
||||
"# Create LLM.\n",
|
||||
"llm = OpenAI(temperature=0.1)\n",
|
||||
@@ -281,29 +284,30 @@
|
||||
"source": [
|
||||
"# Helper function to create a graph using matplotlib.\n",
|
||||
"def plot(data, title):\n",
|
||||
" data = json.loads(data)\n",
|
||||
" data = json.loads(data)\n",
|
||||
"\n",
|
||||
" # Extract x and y values from the data\n",
|
||||
" timestamps = [item[\"time\"] for item in data]\n",
|
||||
" dates=[dt.datetime.fromtimestamp(ts) for ts in timestamps]\n",
|
||||
" y = [item[\"value\"] for item in data]\n",
|
||||
" # Extract x and y values from the data\n",
|
||||
" timestamps = [item[\"time\"] for item in data]\n",
|
||||
" dates = [dt.datetime.fromtimestamp(ts) for ts in timestamps]\n",
|
||||
" y = [item[\"value\"] for item in data]\n",
|
||||
"\n",
|
||||
" plt.rcParams['figure.figsize'] = [6, 4]\n",
|
||||
" plt.subplots_adjust(bottom=0.2)\n",
|
||||
" plt.xticks(rotation=25 )\n",
|
||||
" ax=plt.gca()\n",
|
||||
" xfmt = md.DateFormatter('%Y-%m-%d %H:%M:%S')\n",
|
||||
" ax.xaxis.set_major_formatter(xfmt)\n",
|
||||
" \n",
|
||||
" # Create the plot\n",
|
||||
" plt.plot(dates, y)\n",
|
||||
" plt.rcParams[\"figure.figsize\"] = [6, 4]\n",
|
||||
" plt.subplots_adjust(bottom=0.2)\n",
|
||||
" plt.xticks(rotation=25)\n",
|
||||
" ax = plt.gca()\n",
|
||||
" xfmt = md.DateFormatter(\"%Y-%m-%d %H:%M:%S\")\n",
|
||||
" ax.xaxis.set_major_formatter(xfmt)\n",
|
||||
"\n",
|
||||
" # Set labels and title\n",
|
||||
" plt.xlabel(\"Time\")\n",
|
||||
" plt.ylabel(\"Value\")\n",
|
||||
" plt.title(title)\n",
|
||||
" # Create the plot\n",
|
||||
" plt.plot(dates, y)\n",
|
||||
"\n",
|
||||
" # Set labels and title\n",
|
||||
" plt.xlabel(\"Time\")\n",
|
||||
" plt.ylabel(\"Value\")\n",
|
||||
" plt.title(title)\n",
|
||||
"\n",
|
||||
" plt.show()\n",
|
||||
"\n",
|
||||
" plt.show()\n",
|
||||
"\n",
|
||||
"response = client.search_ts(\"__name__\", \"latency\", 0, int(time.time()))\n",
|
||||
"plot(response.text, \"Latency\")\n",
|
||||
@@ -318,7 +322,7 @@
|
||||
"plot(response.text, \"Completion Tokens\")\n",
|
||||
"\n",
|
||||
"response = client.search_ts(\"__name__\", \"total_tokens\", 0, int(time.time()))\n",
|
||||
"plot(response.text, \"Total Tokens\")\n"
|
||||
"plot(response.text, \"Total Tokens\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
@@ -356,7 +360,7 @@
|
||||
"\n",
|
||||
"query = \"king charles III\"\n",
|
||||
"response = client.search_log(\"king charles III\", 0, int(time.time()))\n",
|
||||
"print(\"Results for\", query, \":\", response.text)\n"
|
||||
"print(\"Results for\", query, \":\", response.text)"
|
||||
]
|
||||
},
|
||||
{
|
||||
|
||||
@@ -9,7 +9,7 @@
|
||||
In this guide we will demonstrate how to use `StreamlitCallbackHandler` to display the thoughts and actions of an agent in an
|
||||
interactive Streamlit app. Try it out with the running app below using the [MRKL agent](/docs/modules/agents/how_to/mrkl/):
|
||||
|
||||
<iframe loading="lazy" src="https://mrkl-minimal.streamlit.app/?embed=true&embed_options=light_theme"
|
||||
<iframe loading="lazy" src="https://langchain-mrkl.streamlit.app/?embed=true&embed_options=light_theme"
|
||||
style={{ width: 100 + '%', border: 'none', marginBottom: 1 + 'rem', height: 600 }}
|
||||
allow="camera;clipboard-read;clipboard-write;"
|
||||
></iframe>
|
||||
@@ -35,7 +35,7 @@ st_callback = StreamlitCallbackHandler(st.container())
|
||||
```
|
||||
|
||||
Additional keyword arguments to customize the display behavior are described in the
|
||||
[API reference](https://api.python.langchain.com/en/latest/modules/callbacks.html#langchain.callbacks.StreamlitCallbackHandler).
|
||||
[API reference](https://api.python.langchain.com/en/latest/callbacks/langchain.callbacks.streamlit.streamlit_callback_handler.StreamlitCallbackHandler.html).
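For example, two of the display-tuning keyword arguments listed in that API reference can be set like this (a sketch; the parameter names are taken from the reference and may vary by version, so verify against your installed release):

```python
import streamlit as st
from langchain.callbacks import StreamlitCallbackHandler

# Parameter names assumed from the API reference linked above;
# treat them as version-dependent.
st_callback = StreamlitCallbackHandler(
    st.container(),
    expand_new_thoughts=True,         # expand each new thought as it starts
    collapse_completed_thoughts=True, # fold a thought once it completes
)
```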
|
||||
|
||||
### Scenario 1: Using an Agent with Tools
|
||||
|
||||
|
||||
921
docs/extras/modules/chains/additional/cpal.ipynb
Normal file
@@ -0,0 +1,921 @@
|
||||
{
|
||||
"cells": [
|
||||
{
|
||||
"attachments": {},
|
||||
"cell_type": "markdown",
|
||||
"id": "82f3f65d-fbcb-4e8e-b04b-959856283643",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# Causal program-aided language (CPAL) chain\n",
|
||||
"\n",
|
||||
"The CPAL chain builds on the recent PAL to stop LLM hallucination. The problem with the PAL approach is that it hallucinates on a math problem with a nested chain of dependence. The innovation here is that this new CPAL approach includes causal structure to fix hallucination.\n",
|
||||
"\n",
|
||||
"The original [PR's description](https://github.com/hwchase17/langchain/pull/6255) contains a full overview.\n",
|
||||
"\n",
|
||||
"Using the CPAL chain, the LLM translated this\n",
|
||||
"\n",
|
||||
" \"Tim buys the same number of pets as Cindy and Boris.\"\n",
|
||||
" \"Cindy buys the same number of pets as Bill plus Bob.\"\n",
|
||||
" \"Boris buys the same number of pets as Ben plus Beth.\"\n",
|
||||
" \"Bill buys the same number of pets as Obama.\"\n",
|
||||
" \"Bob buys the same number of pets as Obama.\"\n",
|
||||
" \"Ben buys the same number of pets as Obama.\"\n",
|
||||
" \"Beth buys the same number of pets as Obama.\"\n",
|
||||
" \"If Obama buys one pet, how many pets total does everyone buy?\"\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"into this\n",
|
||||
"\n",
|
||||
".\n",
|
||||
"\n",
|
||||
"Outline of code examples demoed in this notebook.\n",
|
||||
"\n",
|
||||
"1. CPAL's value against hallucination: CPAL vs PAL \n",
|
||||
" 1.1 Complex narrative \n",
|
||||
" 1.2 Unanswerable math word problem \n",
|
||||
"2. CPAL's three types of causal diagrams ([The Book of Why](https://en.wikipedia.org/wiki/The_Book_of_Why)). \n",
|
||||
" 2.1 Mediator \n",
|
||||
" 2.2 Collider \n",
|
||||
" 2.3 Confounder "
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 1,
|
||||
"id": "1370e40f",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from IPython.display import SVG\n",
|
||||
"\n",
|
||||
"from langchain.experimental.cpal.base import CPALChain\n",
|
||||
"from langchain.chains import PALChain\n",
|
||||
"from langchain import OpenAI\n",
|
||||
"\n",
|
||||
"llm = OpenAI(temperature=0, max_tokens=512)\n",
|
||||
"cpal_chain = CPALChain.from_univariate_prompt(llm=llm, verbose=True)\n",
|
||||
"pal_chain = PALChain.from_math_prompt(llm=llm, verbose=True)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "858a87d9-a9bd-4850-9687-9af4b0856b62",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## CPAL's value against hallucination: CPAL vs PAL\n",
|
||||
"\n",
|
||||
"Like PAL, CPAL intends to reduce large language model (LLM) hallucination.\n",
|
||||
"\n",
|
||||
"The CPAL chain is different from the PAL chain for a couple of reasons.\n",
|
||||
"\n",
|
||||
"CPAL adds a causal structure (or DAG) to link entity actions (or math expressions).\n",
|
||||
"The CPAL math expressions are modeling a chain of cause and effect relations, which can be intervened upon, whereas for the PAL chain math expressions are projected math identities.\n"
|
||||
]
|
||||
},
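To make "intervened upon" concrete, here is a plain-Python illustration of the idea, mirroring the pets story used below (this is just the concept, not the CPAL chain's API):

```python
# Plain-Python illustration of a causal intervention on the story's DAG;
# setting the root value propagates to every downstream node.
def total_pets(obama: int) -> int:
    bill = bob = ben = beth = obama   # each copies Obama's count
    cindy = bill + bob
    boris = ben + beth
    tim = cindy + boris
    return obama + bill + bob + ben + beth + cindy + boris + tim

print(total_pets(1))  # 13, matching the CPAL answer below
print(total_pets(2))  # do(obama = 2) -> 26
```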
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "496403c5-d268-43ae-8852-2bd9903ce444",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### 1.1 Complex narrative\n",
|
||||
"\n",
|
||||
"Takeaway: PAL hallucinates, CPAL does not hallucinate."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 2,
|
||||
"id": "d5dad768-2892-4825-8093-9b840f643a8a",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"question = (\n",
|
||||
" \"Tim buys the same number of pets as Cindy and Boris.\"\n",
|
||||
" \"Cindy buys the same number of pets as Bill plus Bob.\"\n",
|
||||
" \"Boris buys the same number of pets as Ben plus Beth.\"\n",
|
||||
" \"Bill buys the same number of pets as Obama.\"\n",
|
||||
" \"Bob buys the same number of pets as Obama.\"\n",
|
||||
" \"Ben buys the same number of pets as Obama.\"\n",
|
||||
" \"Beth buys the same number of pets as Obama.\"\n",
|
||||
" \"If Obama buys one pet, how many pets total does everyone buy?\"\n",
|
||||
")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 3,
|
||||
"id": "bbffa7a0-3c22-4a1d-ab2d-f230973073b0",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"\n",
|
||||
"\n",
|
||||
"\u001b[1m> Entering new chain...\u001b[0m\n",
|
||||
"\u001b[32;1m\u001b[1;3mdef solution():\n",
|
||||
" \"\"\"Tim buys the same number of pets as Cindy and Boris.Cindy buys the same number of pets as Bill plus Bob.Boris buys the same number of pets as Ben plus Beth.Bill buys the same number of pets as Obama.Bob buys the same number of pets as Obama.Ben buys the same number of pets as Obama.Beth buys the same number of pets as Obama.If Obama buys one pet, how many pets total does everyone buy?\"\"\"\n",
|
||||
" obama_pets = 1\n",
|
||||
" tim_pets = obama_pets\n",
|
||||
" cindy_pets = obama_pets + obama_pets\n",
|
||||
" boris_pets = obama_pets + obama_pets\n",
|
||||
" total_pets = tim_pets + cindy_pets + boris_pets\n",
|
||||
" result = total_pets\n",
|
||||
" return result\u001b[0m\n",
|
||||
"\n",
|
||||
"\u001b[1m> Finished chain.\u001b[0m\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"'5'"
|
||||
]
|
||||
},
|
||||
"execution_count": 3,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"pal_chain.run(question)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 4,
|
||||
"id": "35a70d1d-86f8-4abc-b818-fbd083f072e9",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"\n",
|
||||
"\n",
|
||||
"\u001b[1m> Entering new chain...\u001b[0m\n",
|
||||
"\u001b[32;1m\u001b[1;3mstory outcome data\n",
|
||||
" name code value depends_on\n",
|
||||
"0 obama pass 1.0 []\n",
|
||||
"1 bill bill.value = obama.value 1.0 [obama]\n",
|
||||
"2 bob bob.value = obama.value 1.0 [obama]\n",
|
||||
"3 ben ben.value = obama.value 1.0 [obama]\n",
|
||||
"4 beth beth.value = obama.value 1.0 [obama]\n",
|
||||
"5 cindy cindy.value = bill.value + bob.value 2.0 [bill, bob]\n",
|
||||
"6 boris boris.value = ben.value + beth.value 2.0 [ben, beth]\n",
|
||||
"7 tim tim.value = cindy.value + boris.value 4.0 [cindy, boris]\u001b[0m\n",
|
||||
"\n",
|
||||
"\u001b[36;1m\u001b[1;3mquery data\n",
|
||||
"{\n",
|
||||
" \"question\": \"how many pets total does everyone buy?\",\n",
|
||||
" \"expression\": \"SELECT SUM(value) FROM df\",\n",
|
||||
" \"llm_error_msg\": \"\"\n",
|
||||
"}\u001b[0m\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"\u001b[1m> Finished chain.\u001b[0m\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"13.0"
|
||||
]
|
||||
},
|
||||
"execution_count": 4,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"cpal_chain.run(question)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 5,
|
||||
"id": "ccb6b2b0-9de6-4f66-a8fb-fc59229ee316",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"image/svg+xml": [
|
||||
"<svg xmlns=\"http://www.w3.org/2000/svg\" xmlns:xlink=\"http://www.w3.org/1999/xlink\" width=\"292pt\" height=\"260pt\" viewBox=\"0.00 0.00 292.00 260.00\">\n",
|
||||
"<g id=\"graph0\" class=\"graph\" transform=\"scale(1 1) rotate(0) translate(4 256)\">\n",
|
||||
"<polygon fill=\"white\" stroke=\"transparent\" points=\"-4,4 -4,-256 288,-256 288,4 -4,4\"/>\n",
|
||||
"<!-- obama -->\n",
|
||||
"<g id=\"node1\" class=\"node\">\n",
|
||||
"<title>obama</title>\n",
|
||||
"<ellipse fill=\"none\" stroke=\"black\" cx=\"137\" cy=\"-234\" rx=\"41.69\" ry=\"18\"/>\n",
|
||||
"<text text-anchor=\"middle\" x=\"137\" y=\"-230.3\" font-family=\"Times,serif\" font-size=\"14.00\">obama</text>\n",
|
||||
"</g>\n",
|
||||
"<!-- bill -->\n",
|
||||
"<g id=\"node2\" class=\"node\">\n",
|
||||
"<title>bill</title>\n",
|
||||
"<ellipse fill=\"none\" stroke=\"black\" cx=\"27\" cy=\"-162\" rx=\"27\" ry=\"18\"/>\n",
|
||||
"<text text-anchor=\"middle\" x=\"27\" y=\"-158.3\" font-family=\"Times,serif\" font-size=\"14.00\">bill</text>\n",
|
||||
"</g>\n",
|
||||
"<!-- obama->bill -->\n",
|
||||
"<g id=\"edge1\" class=\"edge\">\n",
|
||||
"<title>obama->bill</title>\n",
|
||||
"<path fill=\"none\" stroke=\"black\" d=\"M114.47,-218.67C97.08,-207.6 72.94,-192.23 54.42,-180.45\"/>\n",
|
||||
"<polygon fill=\"black\" stroke=\"black\" points=\"56.15,-177.4 45.84,-174.99 52.4,-183.31 56.15,-177.4\"/>\n",
|
||||
"</g>\n",
|
||||
"<!-- bob -->\n",
|
||||
"<g id=\"node3\" class=\"node\">\n",
|
||||
"<title>bob</title>\n",
|
||||
"<ellipse fill=\"none\" stroke=\"black\" cx=\"100\" cy=\"-162\" rx=\"28\" ry=\"18\"/>\n",
|
||||
"<text text-anchor=\"middle\" x=\"100\" y=\"-158.3\" font-family=\"Times,serif\" font-size=\"14.00\">bob</text>\n",
|
||||
"</g>\n",
|
||||
"<!-- obama->bob -->\n",
|
||||
"<g id=\"edge2\" class=\"edge\">\n",
|
||||
"<title>obama->bob</title>\n",
|
||||
"<path fill=\"none\" stroke=\"black\" d=\"M128.04,-216.05C123.66,-207.77 118.3,-197.62 113.44,-188.42\"/>\n",
|
||||
"<polygon fill=\"black\" stroke=\"black\" points=\"116.39,-186.51 108.62,-179.31 110.2,-189.79 116.39,-186.51\"/>\n",
|
||||
"</g>\n",
|
||||
"<!-- ben -->\n",
|
||||
"<g id=\"node4\" class=\"node\">\n",
|
||||
"<title>ben</title>\n",
|
||||
"<ellipse fill=\"none\" stroke=\"black\" cx=\"174\" cy=\"-162\" rx=\"28\" ry=\"18\"/>\n",
|
||||
"<text text-anchor=\"middle\" x=\"174\" y=\"-158.3\" font-family=\"Times,serif\" font-size=\"14.00\">ben</text>\n",
|
||||
"</g>\n",
|
||||
"<!-- obama->ben -->\n",
|
||||
"<g id=\"edge3\" class=\"edge\">\n",
|
||||
"<title>obama->ben</title>\n",
|
||||
"<path fill=\"none\" stroke=\"black\" d=\"M145.96,-216.05C150.34,-207.77 155.7,-197.62 160.56,-188.42\"/>\n",
|
||||
"<polygon fill=\"black\" stroke=\"black\" points=\"163.8,-189.79 165.38,-179.31 157.61,-186.51 163.8,-189.79\"/>\n",
|
||||
"</g>\n",
|
||||
"<!-- beth -->\n",
|
||||
"<g id=\"node5\" class=\"node\">\n",
|
||||
"<title>beth</title>\n",
|
||||
"<ellipse fill=\"none\" stroke=\"black\" cx=\"252\" cy=\"-162\" rx=\"32\" ry=\"18\"/>\n",
|
||||
"<text text-anchor=\"middle\" x=\"252\" y=\"-158.3\" font-family=\"Times,serif\" font-size=\"14.00\">beth</text>\n",
|
||||
"</g>\n",
|
||||
"<!-- obama->beth -->\n",
|
||||
"<g id=\"edge4\" class=\"edge\">\n",
|
||||
"<title>obama->beth</title>\n",
|
||||
"<path fill=\"none\" stroke=\"black\" d=\"M160.27,-218.83C178.18,-207.94 203.04,-192.8 222.37,-181.04\"/>\n",
|
||||
"<polygon fill=\"black\" stroke=\"black\" points=\"224.36,-183.92 231.08,-175.73 220.72,-177.95 224.36,-183.92\"/>\n",
|
||||
"</g>\n",
|
||||
"<!-- cindy -->\n",
|
||||
"<g id=\"node6\" class=\"node\">\n",
|
||||
"<title>cindy</title>\n",
|
||||
"<ellipse fill=\"none\" stroke=\"black\" cx=\"93\" cy=\"-90\" rx=\"36\" ry=\"18\"/>\n",
|
||||
"<text text-anchor=\"middle\" x=\"93\" y=\"-86.3\" font-family=\"Times,serif\" font-size=\"14.00\">cindy</text>\n",
|
||||
"</g>\n",
|
||||
"<!-- bill->cindy -->\n",
|
||||
"<g id=\"edge5\" class=\"edge\">\n",
|
||||
"<title>bill->cindy</title>\n",
|
||||
"<path fill=\"none\" stroke=\"black\" d=\"M41,-146.15C49.77,-136.85 61.25,-124.67 71.2,-114.12\"/>\n",
|
||||
"<polygon fill=\"black\" stroke=\"black\" points=\"73.79,-116.47 78.11,-106.8 68.7,-111.67 73.79,-116.47\"/>\n",
|
||||
"</g>\n",
|
||||
"<!-- bob->cindy -->\n",
|
||||
"<g id=\"edge6\" class=\"edge\">\n",
|
||||
"<title>bob->cindy</title>\n",
|
||||
"<path fill=\"none\" stroke=\"black\" d=\"M98.27,-143.7C97.5,-135.98 96.57,-126.71 95.71,-118.11\"/>\n",
|
||||
"<polygon fill=\"black\" stroke=\"black\" points=\"99.19,-117.7 94.71,-108.1 92.22,-118.4 99.19,-117.7\"/>\n",
|
||||
"</g>\n",
|
||||
"<!-- boris -->\n",
|
||||
"<g id=\"node7\" class=\"node\">\n",
|
||||
"<title>boris</title>\n",
|
||||
"<ellipse fill=\"none\" stroke=\"black\" cx=\"181\" cy=\"-90\" rx=\"34.5\" ry=\"18\"/>\n",
|
||||
"<text text-anchor=\"middle\" x=\"181\" y=\"-86.3\" font-family=\"Times,serif\" font-size=\"14.00\">boris</text>\n",
|
||||
"</g>\n",
|
||||
"<!-- ben->boris -->\n",
|
||||
"<g id=\"edge7\" class=\"edge\">\n",
|
||||
"<title>ben->boris</title>\n",
|
||||
"<path fill=\"none\" stroke=\"black\" d=\"M175.73,-143.7C176.5,-135.98 177.43,-126.71 178.29,-118.11\"/>\n",
|
||||
"<polygon fill=\"black\" stroke=\"black\" points=\"181.78,-118.4 179.29,-108.1 174.81,-117.7 181.78,-118.4\"/>\n",
|
||||
"</g>\n",
|
||||
"<!-- beth->boris -->\n",
|
||||
"<g id=\"edge8\" class=\"edge\">\n",
|
||||
"<title>beth->boris</title>\n",
|
||||
"<path fill=\"none\" stroke=\"black\" d=\"M236.59,-145.81C227.01,-136.36 214.51,-124.04 203.8,-113.48\"/>\n",
|
||||
"<polygon fill=\"black\" stroke=\"black\" points=\"205.96,-110.69 196.38,-106.16 201.04,-115.67 205.96,-110.69\"/>\n",
|
||||
"</g>\n",
|
||||
"<!-- tim -->\n",
|
||||
"<g id=\"node8\" class=\"node\">\n",
|
||||
"<title>tim</title>\n",
|
||||
"<ellipse fill=\"none\" stroke=\"black\" cx=\"137\" cy=\"-18\" rx=\"27\" ry=\"18\"/>\n",
|
||||
"<text text-anchor=\"middle\" x=\"137\" y=\"-14.3\" font-family=\"Times,serif\" font-size=\"14.00\">tim</text>\n",
|
||||
"</g>\n",
|
||||
"<!-- cindy->tim -->\n",
|
||||
"<g id=\"edge9\" class=\"edge\">\n",
|
||||
"<title>cindy->tim</title>\n",
|
||||
"<path fill=\"none\" stroke=\"black\" d=\"M103.43,-72.41C108.82,-63.83 115.51,-53.19 121.49,-43.67\"/>\n",
|
||||
"<polygon fill=\"black\" stroke=\"black\" points=\"124.59,-45.32 126.95,-34.99 118.66,-41.59 124.59,-45.32\"/>\n",
|
||||
"</g>\n",
|
||||
"<!-- boris->tim -->\n",
|
||||
"<g id=\"edge10\" class=\"edge\">\n",
|
||||
"<title>boris->tim</title>\n",
|
||||
"<path fill=\"none\" stroke=\"black\" d=\"M170.79,-72.77C165.41,-64.19 158.68,-53.49 152.65,-43.9\"/>\n",
|
||||
"<polygon fill=\"black\" stroke=\"black\" points=\"155.43,-41.75 147.15,-35.15 149.51,-45.48 155.43,-41.75\"/>\n",
|
||||
"</g>\n",
|
||||
"</g>\n",
|
||||
"</svg>"
|
||||
],
|
||||
"text/plain": [
|
||||
"<IPython.core.display.SVG object>"
|
||||
]
|
||||
},
|
||||
"execution_count": 5,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"# wait 20 secs to see display\n",
|
||||
"cpal_chain.draw(path=\"web.svg\")\n",
|
||||
"SVG(\"web.svg\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "1f6f345a-bb16-4e64-83c4-cbbc789a8325",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Unanswerable math\n",
|
||||
"\n",
|
||||
"Takeaway: PAL hallucinates, where CPAL, rather than hallucinate, answers with _\"unanswerable, narrative question and plot are incoherent\"_"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 6,
|
||||
"id": "068afd79-fd41-4ec2-b4d0-c64140dc413f",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"question = (\n",
|
||||
" \"Jan has three times the number of pets as Marcia.\"\n",
|
||||
" \"Marcia has two more pets than Cindy.\"\n",
|
||||
" \"If Cindy has ten pets, how many pets does Barak have?\"\n",
|
||||
")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 7,
|
||||
"id": "02f77db2-72e8-46c2-90b3-5e37ca42f80d",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"\n",
|
||||
"\n",
|
||||
"\u001b[1m> Entering new chain...\u001b[0m\n",
|
||||
"\u001b[32;1m\u001b[1;3mdef solution():\n",
|
||||
" \"\"\"Jan has three times the number of pets as Marcia.Marcia has two more pets than Cindy.If Cindy has ten pets, how many pets does Barak have?\"\"\"\n",
|
||||
" cindy_pets = 10\n",
|
||||
" marcia_pets = cindy_pets + 2\n",
|
||||
" jan_pets = marcia_pets * 3\n",
|
||||
" result = jan_pets\n",
|
||||
" return result\u001b[0m\n",
|
||||
"\n",
|
||||
"\u001b[1m> Finished chain.\u001b[0m\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"'36'"
|
||||
]
|
||||
},
|
||||
"execution_count": 7,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"pal_chain.run(question)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 8,
|
||||
"id": "925958de-e998-4ffa-8b2e-5a00ddae5026",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"\n",
|
||||
"\n",
|
||||
"\u001b[1m> Entering new chain...\u001b[0m\n",
|
||||
"\u001b[32;1m\u001b[1;3mstory outcome data\n",
|
||||
" name code value depends_on\n",
|
||||
"0 cindy pass 10.0 []\n",
|
||||
"1 marcia marcia.value = cindy.value + 2 12.0 [cindy]\n",
|
||||
"2 jan jan.value = marcia.value * 3 36.0 [marcia]\u001b[0m\n",
|
||||
"\n",
|
||||
"\u001b[36;1m\u001b[1;3mquery data\n",
|
||||
"{\n",
|
||||
" \"question\": \"how many pets does barak have?\",\n",
|
||||
" \"expression\": \"SELECT name, value FROM df WHERE name = 'barak'\",\n",
|
||||
" \"llm_error_msg\": \"\"\n",
|
||||
"}\u001b[0m\n",
|
||||
"\n",
|
||||
"unanswerable, query and outcome are incoherent\n",
|
||||
"\n",
|
||||
"outcome:\n",
|
||||
" name code value depends_on\n",
|
||||
"0 cindy pass 10.0 []\n",
|
||||
"1 marcia marcia.value = cindy.value + 2 12.0 [cindy]\n",
|
||||
"2 jan jan.value = marcia.value * 3 36.0 [marcia]\n",
|
||||
"query:\n",
|
||||
"{'question': 'how many pets does barak have?', 'expression': \"SELECT name, value FROM df WHERE name = 'barak'\", 'llm_error_msg': ''}\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"try:\n",
|
||||
" cpal_chain.run(question)\n",
|
||||
"except Exception as e_msg:\n",
|
||||
" print(e_msg)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "095adc76",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Basic math\n",
|
||||
"\n",
|
||||
"#### Causal mediator"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 9,
|
||||
"id": "3ecf03fa-8350-4c4e-8080-84a307ba6ad4",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"question = (\n",
|
||||
" \"Jan has three times the number of pets as Marcia. \"\n",
|
||||
" \"Marcia has two more pets than Cindy. \"\n",
|
||||
" \"If Cindy has four pets, how many total pets do the three have?\"\n",
|
||||
")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "74e49c47-3eed-4abe-98b7-8e97bcd15944",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"---\n",
|
||||
"PAL"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 10,
|
||||
"id": "2e88395f-d014-4362-abb0-88f6800860bb",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"\n",
|
||||
"\n",
|
||||
"\u001b[1m> Entering new chain...\u001b[0m\n",
|
||||
"\u001b[32;1m\u001b[1;3mdef solution():\n",
|
||||
" \"\"\"Jan has three times the number of pets as Marcia. Marcia has two more pets than Cindy. If Cindy has four pets, how many total pets do the three have?\"\"\"\n",
|
||||
" cindy_pets = 4\n",
|
||||
" marcia_pets = cindy_pets + 2\n",
|
||||
" jan_pets = marcia_pets * 3\n",
|
||||
" total_pets = cindy_pets + marcia_pets + jan_pets\n",
|
||||
" result = total_pets\n",
|
||||
" return result\u001b[0m\n",
|
||||
"\n",
|
||||
"\u001b[1m> Finished chain.\u001b[0m\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"'28'"
|
||||
]
|
||||
},
|
||||
"execution_count": 10,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"pal_chain.run(question)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "20ba6640-3d17-4b59-8101-aaba89d68cf4",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"---\n",
|
||||
"CPAL"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 11,
|
||||
"id": "312a0943-a482-4ed0-a064-1e7a72e9479b",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"\n",
|
||||
"\n",
|
||||
"\u001b[1m> Entering new chain...\u001b[0m\n",
|
||||
"\u001b[32;1m\u001b[1;3mstory outcome data\n",
|
||||
" name code value depends_on\n",
|
||||
"0 cindy pass 4.0 []\n",
|
||||
"1 marcia marcia.value = cindy.value + 2 6.0 [cindy]\n",
|
||||
"2 jan jan.value = marcia.value * 3 18.0 [marcia]\u001b[0m\n",
|
||||
"\n",
|
||||
"\u001b[36;1m\u001b[1;3mquery data\n",
|
||||
"{\n",
|
||||
" \"question\": \"how many total pets do the three have?\",\n",
|
||||
" \"expression\": \"SELECT SUM(value) FROM df\",\n",
|
||||
" \"llm_error_msg\": \"\"\n",
|
||||
"}\u001b[0m\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"\u001b[1m> Finished chain.\u001b[0m\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"28.0"
|
||||
]
|
||||
},
|
||||
"execution_count": 11,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"cpal_chain.run(question)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 12,
|
||||
"id": "4466b975-ae2b-4252-972b-b3182a089ade",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"image/svg+xml": [
|
||||
"<svg xmlns=\"http://www.w3.org/2000/svg\" xmlns:xlink=\"http://www.w3.org/1999/xlink\" width=\"92pt\" height=\"188pt\" viewBox=\"0.00 0.00 92.49 188.00\">\n",
|
||||
"<g id=\"graph0\" class=\"graph\" transform=\"scale(1 1) rotate(0) translate(4 184)\">\n",
|
||||
"<polygon fill=\"white\" stroke=\"transparent\" points=\"-4,4 -4,-184 88.49,-184 88.49,4 -4,4\"/>\n",
|
||||
"<!-- cindy -->\n",
|
||||
"<g id=\"node1\" class=\"node\">\n",
|
||||
"<title>cindy</title>\n",
|
||||
"<ellipse fill=\"none\" stroke=\"black\" cx=\"42.25\" cy=\"-162\" rx=\"36\" ry=\"18\"/>\n",
|
||||
"<text text-anchor=\"middle\" x=\"42.25\" y=\"-158.3\" font-family=\"Times,serif\" font-size=\"14.00\">cindy</text>\n",
|
||||
"</g>\n",
|
||||
"<!-- marcia -->\n",
|
||||
"<g id=\"node2\" class=\"node\">\n",
|
||||
"<title>marcia</title>\n",
|
||||
"<ellipse fill=\"none\" stroke=\"black\" cx=\"42.25\" cy=\"-90\" rx=\"42.49\" ry=\"18\"/>\n",
|
||||
"<text text-anchor=\"middle\" x=\"42.25\" y=\"-86.3\" font-family=\"Times,serif\" font-size=\"14.00\">marcia</text>\n",
|
||||
"</g>\n",
|
||||
"<!-- cindy->marcia -->\n",
|
||||
"<g id=\"edge1\" class=\"edge\">\n",
|
||||
"<title>cindy->marcia</title>\n",
|
||||
"<path fill=\"none\" stroke=\"black\" d=\"M42.25,-143.7C42.25,-135.98 42.25,-126.71 42.25,-118.11\"/>\n",
|
||||
"<polygon fill=\"black\" stroke=\"black\" points=\"45.75,-118.1 42.25,-108.1 38.75,-118.1 45.75,-118.1\"/>\n",
|
||||
"</g>\n",
|
||||
"<!-- jan -->\n",
|
||||
"<g id=\"node3\" class=\"node\">\n",
|
||||
"<title>jan</title>\n",
|
||||
"<ellipse fill=\"none\" stroke=\"black\" cx=\"42.25\" cy=\"-18\" rx=\"27\" ry=\"18\"/>\n",
|
||||
"<text text-anchor=\"middle\" x=\"42.25\" y=\"-14.3\" font-family=\"Times,serif\" font-size=\"14.00\">jan</text>\n",
|
||||
"</g>\n",
|
||||
"<!-- marcia->jan -->\n",
|
||||
"<g id=\"edge2\" class=\"edge\">\n",
|
||||
"<title>marcia->jan</title>\n",
|
||||
"<path fill=\"none\" stroke=\"black\" d=\"M42.25,-71.7C42.25,-63.98 42.25,-54.71 42.25,-46.11\"/>\n",
|
||||
"<polygon fill=\"black\" stroke=\"black\" points=\"45.75,-46.1 42.25,-36.1 38.75,-46.1 45.75,-46.1\"/>\n",
|
||||
"</g>\n",
|
||||
"</g>\n",
|
||||
"</svg>"
|
||||
],
|
||||
"text/plain": [
|
||||
"<IPython.core.display.SVG object>"
|
||||
]
|
||||
},
|
||||
"execution_count": 12,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"# wait 20 secs to see display\n",
|
||||
"cpal_chain.draw(path=\"web.svg\")\n",
|
||||
"SVG(\"web.svg\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "29fa7b8a-75a3-4270-82a2-2c31939cd7e0",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Causal collider"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 13,
|
||||
"id": "618eddac-f0ef-4ab5-90ed-72e880fdeba3",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"question = (\n",
|
||||
" \"Jan has the number of pets as Marcia plus the number of pets as Cindy. \"\n",
|
||||
" \"Marcia has no pets. \"\n",
|
||||
" \"If Cindy has four pets, how many total pets do the three have?\"\n",
|
||||
")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 14,
|
||||
"id": "a01563f3-7974-4de4-8bd9-0b7d710aa0d3",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"\n",
|
||||
"\n",
|
||||
"\u001b[1m> Entering new chain...\u001b[0m\n",
|
||||
"\u001b[32;1m\u001b[1;3mstory outcome data\n",
|
||||
" name code value depends_on\n",
|
||||
"0 marcia pass 0.0 []\n",
|
||||
"1 cindy pass 4.0 []\n",
|
||||
"2 jan jan.value = marcia.value + cindy.value 4.0 [marcia, cindy]\u001b[0m\n",
|
||||
"\n",
|
||||
"\u001b[36;1m\u001b[1;3mquery data\n",
|
||||
"{\n",
|
||||
" \"question\": \"how many total pets do the three have?\",\n",
|
||||
" \"expression\": \"SELECT SUM(value) FROM df\",\n",
|
||||
" \"llm_error_msg\": \"\"\n",
|
||||
"}\u001b[0m\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"\u001b[1m> Finished chain.\u001b[0m\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"8.0"
|
||||
]
|
||||
},
|
||||
"execution_count": 14,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"cpal_chain.run(question)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 15,
|
||||
"id": "0fbe7243-0522-4946-b9a2-6e21e7c49a42",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"image/svg+xml": [
|
||||
"<svg xmlns=\"http://www.w3.org/2000/svg\" xmlns:xlink=\"http://www.w3.org/1999/xlink\" width=\"182pt\" height=\"116pt\" viewBox=\"0.00 0.00 182.00 116.00\">\n",
|
||||
"<g id=\"graph0\" class=\"graph\" transform=\"scale(1 1) rotate(0) translate(4 112)\">\n",
|
||||
"<polygon fill=\"white\" stroke=\"transparent\" points=\"-4,4 -4,-112 178,-112 178,4 -4,4\"/>\n",
|
||||
"<!-- marcia -->\n",
|
||||
"<g id=\"node1\" class=\"node\">\n",
|
||||
"<title>marcia</title>\n",
|
||||
"<ellipse fill=\"none\" stroke=\"black\" cx=\"42.25\" cy=\"-90\" rx=\"42.49\" ry=\"18\"/>\n",
|
||||
"<text text-anchor=\"middle\" x=\"42.25\" y=\"-86.3\" font-family=\"Times,serif\" font-size=\"14.00\">marcia</text>\n",
|
||||
"</g>\n",
|
||||
"<!-- jan -->\n",
|
||||
"<g id=\"node2\" class=\"node\">\n",
|
||||
"<title>jan</title>\n",
|
||||
"<ellipse fill=\"none\" stroke=\"black\" cx=\"90.25\" cy=\"-18\" rx=\"27\" ry=\"18\"/>\n",
|
||||
"<text text-anchor=\"middle\" x=\"90.25\" y=\"-14.3\" font-family=\"Times,serif\" font-size=\"14.00\">jan</text>\n",
|
||||
"</g>\n",
|
||||
"<!-- marcia->jan -->\n",
|
||||
"<g id=\"edge1\" class=\"edge\">\n",
|
||||
"<title>marcia->jan</title>\n",
|
||||
"<path fill=\"none\" stroke=\"black\" d=\"M53.62,-72.41C59.57,-63.74 66.95,-52.97 73.53,-43.38\"/>\n",
|
||||
"<polygon fill=\"black\" stroke=\"black\" points=\"76.51,-45.21 79.28,-34.99 70.74,-41.26 76.51,-45.21\"/>\n",
|
||||
"</g>\n",
|
||||
"<!-- cindy -->\n",
|
||||
"<g id=\"node3\" class=\"node\">\n",
|
||||
"<title>cindy</title>\n",
|
||||
"<ellipse fill=\"none\" stroke=\"black\" cx=\"138.25\" cy=\"-90\" rx=\"36\" ry=\"18\"/>\n",
|
||||
"<text text-anchor=\"middle\" x=\"138.25\" y=\"-86.3\" font-family=\"Times,serif\" font-size=\"14.00\">cindy</text>\n",
|
||||
"</g>\n",
|
||||
"<!-- cindy->jan -->\n",
|
||||
"<g id=\"edge2\" class=\"edge\">\n",
|
||||
"<title>cindy->jan</title>\n",
|
||||
"<path fill=\"none\" stroke=\"black\" d=\"M127.11,-72.77C121.09,-63.98 113.54,-52.96 106.83,-43.19\"/>\n",
|
||||
"<polygon fill=\"black\" stroke=\"black\" points=\"109.53,-40.94 100.99,-34.67 103.75,-44.89 109.53,-40.94\"/>\n",
|
||||
"</g>\n",
|
||||
"</g>\n",
|
||||
"</svg>"
|
||||
],
|
||||
"text/plain": [
|
||||
"<IPython.core.display.SVG object>"
|
||||
]
|
||||
},
|
||||
"execution_count": 15,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"# wait 20 secs to see display\n",
|
||||
"cpal_chain.draw(path=\"web.svg\")\n",
|
||||
"SVG(\"web.svg\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "d4082538-ec03-44f0-aac3-07e03aad7555",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Causal confounder"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 16,
|
||||
"id": "83932c30-950b-435a-b328-7993ce8cc6bd",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"question = (\n",
|
||||
" \"Jan has the number of pets as Marcia plus the number of pets as Cindy. \"\n",
|
||||
" \"Marcia has two more pets than Cindy. \"\n",
|
||||
" \"If Cindy has four pets, how many total pets do the three have?\"\n",
|
||||
")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 17,
|
||||
"id": "570de307-7c6b-4fdc-80c3-4361daa8a629",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"\n",
|
||||
"\n",
|
||||
"\u001b[1m> Entering new chain...\u001b[0m\n",
|
||||
"\u001b[32;1m\u001b[1;3mstory outcome data\n",
|
||||
" name code value depends_on\n",
|
||||
"0 cindy pass 4.0 []\n",
|
||||
"1 marcia marcia.value = cindy.value + 2 6.0 [cindy]\n",
|
||||
"2 jan jan.value = cindy.value + marcia.value 10.0 [cindy, marcia]\u001b[0m\n",
|
||||
"\n",
|
||||
"\u001b[36;1m\u001b[1;3mquery data\n",
|
||||
"{\n",
|
||||
" \"question\": \"how many total pets do the three have?\",\n",
|
||||
" \"expression\": \"SELECT SUM(value) FROM df\",\n",
|
||||
" \"llm_error_msg\": \"\"\n",
|
||||
"}\u001b[0m\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"\u001b[1m> Finished chain.\u001b[0m\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"20.0"
|
||||
]
|
||||
},
|
||||
"execution_count": 17,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"cpal_chain.run(question)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 18,
|
||||
"id": "00375615-6b6d-4357-bdb8-f64f682f7605",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"image/svg+xml": [
|
||||
"<svg xmlns=\"http://www.w3.org/2000/svg\" xmlns:xlink=\"http://www.w3.org/1999/xlink\" width=\"121pt\" height=\"188pt\" viewBox=\"0.00 0.00 120.99 188.00\">\n",
|
||||
"<g id=\"graph0\" class=\"graph\" transform=\"scale(1 1) rotate(0) translate(4 184)\">\n",
|
||||
"<polygon fill=\"white\" stroke=\"transparent\" points=\"-4,4 -4,-184 116.99,-184 116.99,4 -4,4\"/>\n",
|
||||
"<!-- cindy -->\n",
|
||||
"<g id=\"node1\" class=\"node\">\n",
|
||||
"<title>cindy</title>\n",
|
||||
"<ellipse fill=\"none\" stroke=\"black\" cx=\"77.25\" cy=\"-162\" rx=\"36\" ry=\"18\"/>\n",
|
||||
"<text text-anchor=\"middle\" x=\"77.25\" y=\"-158.3\" font-family=\"Times,serif\" font-size=\"14.00\">cindy</text>\n",
|
||||
"</g>\n",
|
||||
"<!-- marcia -->\n",
|
||||
"<g id=\"node2\" class=\"node\">\n",
|
||||
"<title>marcia</title>\n",
|
||||
"<ellipse fill=\"none\" stroke=\"black\" cx=\"42.25\" cy=\"-90\" rx=\"42.49\" ry=\"18\"/>\n",
|
||||
"<text text-anchor=\"middle\" x=\"42.25\" y=\"-86.3\" font-family=\"Times,serif\" font-size=\"14.00\">marcia</text>\n",
|
||||
"</g>\n",
|
||||
"<!-- cindy->marcia -->\n",
|
||||
"<g id=\"edge1\" class=\"edge\">\n",
|
||||
"<title>cindy->marcia</title>\n",
|
||||
"<path fill=\"none\" stroke=\"black\" d=\"M68.95,-144.41C64.87,-136.25 59.86,-126.22 55.28,-117.07\"/>\n",
|
||||
"<polygon fill=\"black\" stroke=\"black\" points=\"58.33,-115.34 50.72,-107.96 52.07,-118.47 58.33,-115.34\"/>\n",
|
||||
"</g>\n",
|
||||
"<!-- jan -->\n",
|
||||
"<g id=\"node3\" class=\"node\">\n",
|
||||
"<title>jan</title>\n",
|
||||
"<ellipse fill=\"none\" stroke=\"black\" cx=\"77.25\" cy=\"-18\" rx=\"27\" ry=\"18\"/>\n",
|
||||
"<text text-anchor=\"middle\" x=\"77.25\" y=\"-14.3\" font-family=\"Times,serif\" font-size=\"14.00\">jan</text>\n",
|
||||
"</g>\n",
|
||||
"<!-- cindy->jan -->\n",
|
||||
"<g id=\"edge2\" class=\"edge\">\n",
|
||||
"<title>cindy->jan</title>\n",
|
||||
"<path fill=\"none\" stroke=\"black\" d=\"M83.73,-144.1C87.32,-133.84 91.42,-120.36 93.25,-108 95.58,-92.17 95.58,-87.83 93.25,-72 91.95,-63.21 89.5,-53.86 86.91,-45.5\"/>\n",
|
||||
"<polygon fill=\"black\" stroke=\"black\" points=\"90.19,-44.29 83.73,-35.9 83.55,-46.49 90.19,-44.29\"/>\n",
|
||||
"</g>\n",
|
||||
"<!-- marcia->jan -->\n",
|
||||
"<g id=\"edge3\" class=\"edge\">\n",
|
||||
"<title>marcia->jan</title>\n",
|
||||
"<path fill=\"none\" stroke=\"black\" d=\"M50.72,-72.06C54.86,-63.77 59.94,-53.62 64.53,-44.42\"/>\n",
|
||||
"<polygon fill=\"black\" stroke=\"black\" points=\"67.75,-45.82 69.09,-35.31 61.49,-42.69 67.75,-45.82\"/>\n",
|
||||
"</g>\n",
|
||||
"</g>\n",
|
||||
"</svg>"
|
||||
],
|
||||
"text/plain": [
|
||||
"<IPython.core.display.SVG object>"
|
||||
]
|
||||
},
|
||||
"execution_count": 18,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"# wait 20 secs to see display\n",
|
||||
"cpal_chain.draw(path=\"web.svg\")\n",
|
||||
"SVG(\"web.svg\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 19,
|
||||
"id": "255683de-0c1c-4131-b277-99d09f5ac1fc",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"%load_ext autoreload\n",
|
||||
"%autoreload 2"
|
||||
]
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
"kernelspec": {
|
||||
"display_name": "Python 3 (ipykernel)",
|
||||
"language": "python",
|
||||
"name": "python3"
|
||||
},
|
||||
"language_info": {
|
||||
"codemirror_mode": {
|
||||
"name": "ipython",
|
||||
"version": 3
|
||||
},
|
||||
"file_extension": ".py",
|
||||
"mimetype": "text/x-python",
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.11.3"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
"nbformat_minor": 5
|
||||
}
|
||||
@@ -0,0 +1,206 @@
|
||||
{
|
||||
"cells": [
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "dd7ec7af",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# Elasticsearch database\n",
|
||||
"\n",
|
||||
"Interact with Elasticsearch analytics database via Langchain. This chain builds search queries via the Elasticsearch DSL API (filters and aggregations).\n",
|
||||
"\n",
|
||||
"The Elasticsearch client must have permissions for index listing, mapping description and search queries.\n",
|
||||
"\n",
|
||||
"See [here](https://www.elastic.co/guide/en/elasticsearch/reference/current/docker.html) for instructions on how to run Elasticsearch locally.\n",
|
||||
"\n",
|
||||
"Make sure to install the Elasticsearch Python client before:\n",
|
||||
"\n",
|
||||
"```sh\n",
|
||||
"pip install elasticsearch\n",
|
||||
"```"
|
||||
]
|
||||
},
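For reference, the kind of filter-plus-aggregation request the chain builds can also be written by hand with the official client. This is a minimal sketch, not the chain's output; the `customers` index is the one populated later in this notebook, and the `lastname.keyword` sub-field assumes Elasticsearch's default dynamic mapping:

```python
from elasticsearch import Elasticsearch

client = Elasticsearch("https://elastic:pass@localhost:9200")

# Filter on one field and aggregate over another via the DSL API.
resp = client.search(
    index="customers",
    query={"match": {"firstname": "Jennifer"}},                      # filter
    aggs={"by_lastname": {"terms": {"field": "lastname.keyword"}}},  # aggregation
    size=10,
)
print(resp["hits"]["total"])
print(resp["aggregations"]["by_lastname"]["buckets"])
```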
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 11,
|
||||
"id": "dd8eae75",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from elasticsearch import Elasticsearch\n",
|
||||
"\n",
|
||||
"from langchain.chains.elasticsearch_database import ElasticsearchDatabaseChain\n",
|
||||
"from langchain.chat_models import ChatOpenAI"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 3,
|
||||
"id": "5cde03bc",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# Initialize Elasticsearch python client.\n",
|
||||
"# See https://elasticsearch-py.readthedocs.io/en/v8.8.2/api.html#elasticsearch.Elasticsearch\n",
|
||||
"ELASTIC_SEARCH_SERVER = \"https://elastic:pass@localhost:9200\"\n",
|
||||
"db = Elasticsearch(ELASTIC_SEARCH_SERVER)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "74a41374",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Uncomment the next cell to initially populate your db."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 10,
|
||||
"id": "430ada0f",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# customers = [\n",
|
||||
"# {\"firstname\": \"Jennifer\", \"lastname\": \"Walters\"},\n",
|
||||
"# {\"firstname\": \"Monica\",\"lastname\":\"Rambeau\"},\n",
|
||||
"# {\"firstname\": \"Carol\",\"lastname\":\"Danvers\"},\n",
|
||||
"# {\"firstname\": \"Wanda\",\"lastname\":\"Maximoff\"},\n",
|
||||
"# {\"firstname\": \"Jennifer\",\"lastname\":\"Takeda\"},\n",
|
||||
"# ]\n",
|
||||
"# for i, customer in enumerate(customers):\n",
|
||||
"# db.create(index=\"customers\", document=customer, id=i)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 12,
|
||||
"id": "f36ae0d8",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"llm = ChatOpenAI(model_name=\"gpt-4\", temperature=0)\n",
|
||||
"chain = ElasticsearchDatabaseChain.from_llm(llm=llm, database=db, verbose=True)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 13,
|
||||
"id": "b5d22d9d",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"\n",
|
||||
"\n",
|
||||
"\u001b[1m> Entering new ElasticsearchDatabaseChain chain...\u001b[0m\n",
|
||||
"What are the first names of all the customers?\n",
|
||||
"ESQuery:\u001b[32;1m\u001b[1;3m{'size': 10, 'query': {'match_all': {}}, '_source': ['firstname']}\u001b[0m\n",
|
||||
"ESResult: \u001b[33;1m\u001b[1;3m{'took': 5, 'timed_out': False, '_shards': {'total': 1, 'successful': 1, 'skipped': 0, 'failed': 0}, 'hits': {'total': {'value': 6, 'relation': 'eq'}, 'max_score': 1.0, 'hits': [{'_index': 'customers', '_id': '0', '_score': 1.0, '_source': {'firstname': 'Jennifer'}}, {'_index': 'customers', '_id': '1', '_score': 1.0, '_source': {'firstname': 'Monica'}}, {'_index': 'customers', '_id': '2', '_score': 1.0, '_source': {'firstname': 'Carol'}}, {'_index': 'customers', '_id': '3', '_score': 1.0, '_source': {'firstname': 'Wanda'}}, {'_index': 'customers', '_id': '4', '_score': 1.0, '_source': {'firstname': 'Jennifer'}}, {'_index': 'customers', '_id': 'firstname', '_score': 1.0, '_source': {'firstname': 'Jennifer'}}]}}\u001b[0m\n",
|
||||
"Answer:\u001b[32;1m\u001b[1;3mThe first names of all the customers are Jennifer, Monica, Carol, Wanda, and Jennifer.\u001b[0m\n",
|
||||
"\u001b[1m> Finished chain.\u001b[0m\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"'The first names of all the customers are Jennifer, Monica, Carol, Wanda, and Jennifer.'"
|
||||
]
|
||||
},
|
||||
"execution_count": 13,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"question = \"What are the first names of all the customers?\"\n",
|
||||
"chain.run(question)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "9b4bfada",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Custom prompt\n",
|
||||
"\n",
|
||||
"For best results you'll likely need to customize the prompt."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 7,
|
||||
"id": "0a494f5b",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langchain.chains.elasticsearch_database.prompts import DEFAULT_DSL_TEMPLATE\n",
|
||||
"from langchain.prompts.prompt import PromptTemplate\n",
|
||||
"\n",
|
||||
"PROMPT_TEMPLATE = \"\"\"Given an input question, create a syntactically correct Elasticsearch query to run. Unless the user specifies in their question a specific number of examples they wish to obtain, always limit your query to at most {top_k} results. You can order the results by a relevant column to return the most interesting examples in the database.\n",
|
||||
"\n",
|
||||
"Unless told to do not query for all the columns from a specific index, only ask for a the few relevant columns given the question.\n",
|
||||
"\n",
|
||||
"Pay attention to use only the column names that you can see in the mapping description. Be careful to not query for columns that do not exist. Also, pay attention to which column is in which index. Return the query as valid json.\n",
|
||||
"\n",
|
||||
"Use the following format:\n",
|
||||
"\n",
|
||||
"Question: Question here\n",
|
||||
"ESQuery: Elasticsearch Query formatted as json\n",
|
||||
"\"\"\"\n",
|
||||
"\n",
|
||||
"PROMPT = PromptTemplate.from_template(\n",
|
||||
" PROMPT_TEMPLATE,\n",
|
||||
")\n",
|
||||
"chain = ElasticsearchDatabaseChain.from_llm(llm=llm, database=db, query_prompt=PROMPT)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "372b8f93",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Adding example rows from each index\n",
|
||||
"\n",
|
||||
"Sometimes, the format of the data is not obvious and it is optimal to include a sample of rows from the indices in the prompt to allow the LLM to understand the data before providing a final query. Here we will use this feature to let the LLM know that artists are saved with their full names by providing ten rows from the index."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "eef818de",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"chain = ElasticsearchDatabaseChain.from_llm(\n",
|
||||
" llm=ChatOpenAI(temperature=0),\n",
|
||||
" database=db,\n",
|
||||
" sample_documents_in_index_info=2, # 2 rows from each index will be included in the prompt as sample data\n",
|
||||
")"
|
||||
]
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
"kernelspec": {
|
||||
"display_name": "venv",
|
||||
"language": "python",
|
||||
"name": "venv"
|
||||
},
|
||||
"language_info": {
|
||||
"codemirror_mode": {
|
||||
"name": "ipython",
|
||||
"version": 3
|
||||
},
|
||||
"file_extension": ".py",
|
||||
"mimetype": "text/x-python",
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.11.3"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
"nbformat_minor": 5
|
||||
}
|
||||
@@ -1,271 +1,566 @@
|
||||
{
|
||||
"cells": [
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "6605e7f7",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# Extraction\n",
|
||||
"\n",
|
||||
"The extraction chain uses the OpenAI `functions` parameter to specify a schema to extract entities from a document. This helps us make sure that the model outputs exactly the schema of entities and properties that we want, with their appropriate types.\n",
|
||||
"\n",
|
||||
"The extraction chain is to be used when we want to extract several entities with their properties from the same passage (i.e. what people were mentioned in this passage?)"
|
||||
]
|
||||
},
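For context, the schema passed to the chain ends up as a single OpenAI function definition of roughly the following shape. This is a hand-written sketch of the payload, not the chain's exact code; `create_extraction_chain` builds it for you:

```python
# Sketch of the function definition sent via the OpenAI `functions` parameter.
# The per-entity properties come from the user-supplied schema.
information_extraction_fn = {
    "name": "information_extraction",
    "description": "Extracts the relevant information from the passage.",
    "parameters": {
        "type": "object",
        "properties": {
            "info": {
                "type": "array",
                "items": {"type": "object"},  # one object per extracted entity
            }
        },
        "required": ["info"],
    },
}
```

Because the model is constrained to call this one function, its output parses into a list of entity dictionaries.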
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 1,
|
||||
"id": "34f04daf",
|
||||
"metadata": {},
|
||||
"outputs": [
{
|
||||
"name": "stderr",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"/Users/harrisonchase/.pyenv/versions/3.9.1/envs/langchain/lib/python3.9/site-packages/deeplake/util/check_latest_version.py:32: UserWarning: A newer version of deeplake (3.6.4) is available. It's recommended that you update to the latest version using `pip install -U deeplake`.\n",
|
||||
" warnings.warn(\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"from langchain.chat_models import ChatOpenAI\n",
|
||||
"from langchain.chains import create_extraction_chain, create_extraction_chain_pydantic\n",
|
||||
"from langchain.prompts import ChatPromptTemplate"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 2,
|
||||
"id": "a2648974",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"llm = ChatOpenAI(temperature=0, model=\"gpt-3.5-turbo-0613\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "5ef034ce",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Extracting entities"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "78ff9df9",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"To extract entities, we need to create a schema like the following, were we specify all the properties we want to find and the type we expect them to have. We can also specify which of these properties are required and which are optional."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 3,
|
||||
"id": "4ac43eba",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"schema = {\n",
|
||||
" \"properties\": {\n",
|
||||
" \"person_name\": {\"type\": \"string\"},\n",
|
||||
" \"person_height\": {\"type\": \"integer\"},\n",
|
||||
" \"person_hair_color\": {\"type\": \"string\"},\n",
|
||||
" \"dog_name\": {\"type\": \"string\"},\n",
|
||||
" \"dog_breed\": {\"type\": \"string\"},\n",
|
||||
" },\n",
|
||||
" \"required\": [\"person_name\", \"person_height\"],\n",
|
||||
"}"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 4,
|
||||
"id": "640bd005",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"inp = \"\"\"\n",
|
||||
"Alex is 5 feet tall. Claudia is 1 feet taller Alex and jumps higher than him. Claudia is a brunette and Alex is blonde.\n",
|
||||
"Alex's dog Frosty is a labrador and likes to play hide and seek.\n",
|
||||
" \"\"\""
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 5,
|
||||
"id": "64313214",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"chain = create_extraction_chain(schema, llm)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "17c48adb",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"As we can see, we extracted the required entities and their properties in the required format:"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 6,
|
||||
"id": "cc5436ed",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"[{'person_name': 'Alex',\n",
|
||||
" 'person_height': 5,\n",
|
||||
" 'person_hair_color': 'blonde',\n",
|
||||
" 'dog_name': 'Frosty',\n",
|
||||
" 'dog_breed': 'labrador'},\n",
|
||||
" {'person_name': 'Claudia',\n",
|
||||
" 'person_height': 6,\n",
|
||||
" 'person_hair_color': 'brunette'}]"
]
|
||||
},
|
||||
"execution_count": 6,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"chain.run(inp)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "698b4c4d",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Pydantic example"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "6504a6d9",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"We can also use a Pydantic schema to choose the required properties and types and we will set as 'Optional' those that are not strictly required.\n",
|
||||
"\n",
|
||||
"By using the `create_extraction_chain_pydantic` function, we can send a Pydantic schema as input and the output will be an instantiated object that respects our desired schema. \n",
|
||||
"\n",
|
||||
"In this way, we can specify our schema in the same manner that we would a new class or function in Python - with purely Pythonic types."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 7,
|
||||
"id": "6792866b",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from typing import Optional, List\n",
|
||||
"from pydantic import BaseModel, Field"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 8,
|
||||
"id": "36a63761",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"class Properties(BaseModel):\n",
|
||||
" person_name: str\n",
|
||||
" person_height: int\n",
|
||||
" person_hair_color: str\n",
|
||||
" dog_breed: Optional[str]\n",
|
||||
" dog_name: Optional[str]"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 9,
|
||||
"id": "8ffd1e57",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"chain = create_extraction_chain_pydantic(pydantic_schema=Properties, llm=llm)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 10,
|
||||
"id": "24baa954",
|
||||
"metadata": {
|
||||
"scrolled": false
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"inp = \"\"\"\n",
|
||||
"Alex is 5 feet tall. Claudia is 1 feet taller Alex and jumps higher than him. Claudia is a brunette and Alex is blonde.\n",
|
||||
"Alex's dog Frosty is a labrador and likes to play hide and seek.\n",
|
||||
" \"\"\""
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "84e0a241",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"As we can see, we extracted the required entities and their properties in the required format:"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 11,
|
||||
"id": "f771df58",
|
||||
"metadata": {},
|
||||
"outputs": [
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"[Properties(person_name='Alex', person_height=5, person_hair_color='blonde', dog_breed='labrador', dog_name='Frosty'),\n",
|
||||
" Properties(person_name='Claudia', person_height=6, person_hair_color='brunette', dog_breed=None, dog_name=None)]"
]
|
||||
},
|
||||
"execution_count": 11,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 3,
|
||||
"id": "a2648974",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"llm = ChatOpenAI(temperature=0, model=\"gpt-3.5-turbo-0613\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "5ef034ce",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Extracting entities"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "78ff9df9",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"To extract entities, we need to create a schema where we specify all the properties we want to find and the type we expect them to have. We can also specify which of these properties are required and which are optional."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 4,
|
||||
"id": "4ac43eba",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"schema = {\n",
|
||||
" \"properties\": {\n",
|
||||
" \"name\": {\"type\": \"string\"},\n",
|
||||
" \"height\": {\"type\": \"integer\"},\n",
|
||||
" \"hair_color\": {\"type\": \"string\"},\n",
|
||||
" },\n",
|
||||
" \"required\": [\"name\", \"height\"],\n",
|
||||
"}"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 5,
|
||||
"id": "640bd005",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"inp = \"\"\"\n",
|
||||
"Alex is 5 feet tall. Claudia is 1 feet taller Alex and jumps higher than him. Claudia is a brunette and Alex is blonde.\n",
|
||||
" \"\"\""
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 6,
|
||||
"id": "64313214",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"chain = create_extraction_chain(schema, llm)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "17c48adb",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"As we can see, we extracted the required entities and their properties in the required format (it even calculated Claudia's height before returning!)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 7,
|
||||
"id": "cc5436ed",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"[{'name': 'Alex', 'height': 5, 'hair_color': 'blonde'},\n",
|
||||
" {'name': 'Claudia', 'height': 6, 'hair_color': 'brunette'}]"
|
||||
]
|
||||
},
|
||||
"execution_count": 7,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"chain.run(inp)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "8d51fcdc",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Several entity types"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "5813affe",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Notice that we are using OpenAI functions under the hood and thus the model can only call one function per request (with one, unique schema)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "511b9838",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"If we want to extract more than one entity type, we need to introduce a little hack - we will define our properties with an included entity type. \n",
|
||||
"\n",
|
||||
"Following we have an example where we also want to extract dog attributes from the passage. Notice the 'person_' and 'dog_' prefixes we use for each property; this tells the model which entity type the property refers to. In this way, the model can return properties from several entity types in one single call."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 8,
|
||||
"id": "cf243a26",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"schema = {\n",
|
||||
" \"properties\": {\n",
|
||||
" \"person_name\": {\"type\": \"string\"},\n",
|
||||
" \"person_height\": {\"type\": \"integer\"},\n",
|
||||
" \"person_hair_color\": {\"type\": \"string\"},\n",
|
||||
" \"dog_name\": {\"type\": \"string\"},\n",
|
||||
" \"dog_breed\": {\"type\": \"string\"},\n",
|
||||
" },\n",
|
||||
" \"required\": [\"person_name\", \"person_height\"],\n",
|
||||
"}"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 4,
|
||||
"id": "52841fb3",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"inp = \"\"\"\n",
|
||||
"Alex is 5 feet tall. Claudia is 1 feet taller Alex and jumps higher than him. Claudia is a brunette and Alex is blonde.\n",
|
||||
"Alex's dog Frosty is a labrador and likes to play hide and seek.\n",
|
||||
" \"\"\""
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 5,
|
||||
"id": "93f904ab",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"chain = create_extraction_chain(schema, llm)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "eb074f7b",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"People attributes and dog attributes were correctly extracted from the text in the same call"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 6,
|
||||
"id": "db3e9e17",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"[{'person_name': 'Alex',\n",
|
||||
" 'person_height': 5,\n",
|
||||
" 'person_hair_color': 'blonde',\n",
|
||||
" 'dog_name': 'Frosty',\n",
|
||||
" 'dog_breed': 'labrador'},\n",
|
||||
" {'person_name': 'Claudia',\n",
|
||||
" 'person_height': 6,\n",
|
||||
" 'person_hair_color': 'brunette'}]"
|
||||
]
|
||||
},
|
||||
"execution_count": 11,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"chain.run(inp)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "0273e0e2",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Unrelated entities"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "c07b3480",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"What if our entities are unrelated? In that case, the model will return the unrelated entities in different dictionaries, allowing us to successfully extract several unrelated entity types in the same call."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "01d98af0",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Notice that we use `required: []`: we need to allow the model to return **only** person attributes or **only** dog attributes for a single entity (person or dog)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 48,
|
||||
"id": "e584c993",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"schema = {\n",
|
||||
" \"properties\": {\n",
|
||||
" \"person_name\": {\"type\": \"string\"},\n",
|
||||
" \"person_height\": {\"type\": \"integer\"},\n",
|
||||
" \"person_hair_color\": {\"type\": \"string\"},\n",
|
||||
" \"dog_name\": {\"type\": \"string\"},\n",
|
||||
" \"dog_breed\": {\"type\": \"string\"},\n",
|
||||
" },\n",
|
||||
" \"required\": [],\n",
|
||||
"}"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 49,
|
||||
"id": "ad6b105f",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"inp = \"\"\"\n",
|
||||
"Alex is 5 feet tall. Claudia is 1 feet taller Alex and jumps higher than him. Claudia is a brunette and Alex is blonde.\n",
|
||||
"\n",
|
||||
"Willow is a German Shepherd that likes to play with other dogs and can always be found playing with Milo, a border collie that lives close by.\n",
|
||||
"\"\"\""
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 50,
|
||||
"id": "6bfe5a33",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"chain = create_extraction_chain(schema, llm)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "24fe09af",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"We have each entity in its own separate dictionary, with only the appropriate attributes being returned"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 51,
|
||||
"id": "f6e1fd89",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"[{'person_name': 'Alex', 'person_height': 5, 'person_hair_color': 'blonde'},\n",
|
||||
" {'person_name': 'Claudia',\n",
|
||||
" 'person_height': 6,\n",
|
||||
" 'person_hair_color': 'brunette'},\n",
|
||||
" {'dog_name': 'Willow', 'dog_breed': 'German Shepherd'},\n",
|
||||
" {'dog_name': 'Milo', 'dog_breed': 'border collie'}]"
|
||||
]
|
||||
},
|
||||
"execution_count": 51,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"chain.run(inp)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "0ac466d1",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Extra info for an entity"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "d240ffc1",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"What if.. _we don't know what we want?_ More specifically, say we know a few properties we want to extract for a given entity but we also want to know if there's any extra information in the passage. Fortunately, we don't need to structure everything - we can have unstructured extraction as well. \n",
|
||||
"\n",
|
||||
"We can do this by introducing another hack, namely the *extra_info* attribute - let's see an example."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 68,
|
||||
"id": "f19685f6",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"schema = {\n",
|
||||
" \"properties\": {\n",
|
||||
" \"person_name\": {\"type\": \"string\"},\n",
|
||||
" \"person_height\": {\"type\": \"integer\"},\n",
|
||||
" \"person_hair_color\": {\"type\": \"string\"},\n",
|
||||
" \"dog_name\": {\"type\": \"string\"},\n",
|
||||
" \"dog_breed\": {\"type\": \"string\"},\n",
|
||||
" \"dog_extra_info\": {\"type\": \"string\"},\n",
|
||||
" },\n",
|
||||
"}"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 81,
|
||||
"id": "200c3477",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"inp = \"\"\"\n",
|
||||
"Alex is 5 feet tall. Claudia is 1 feet taller Alex and jumps higher than him. Claudia is a brunette and Alex is blonde.\n",
|
||||
"\n",
|
||||
"Willow is a German Shepherd that likes to play with other dogs and can always be found playing with Milo, a border collie that lives close by.\n",
|
||||
"\"\"\""
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 82,
|
||||
"id": "ddad7dc6",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"chain = create_extraction_chain(schema, llm)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "e5c0dbbc",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"It is nice to know more about Willow and Milo!"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 83,
|
||||
"id": "c22cfd30",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"[{'person_name': 'Alex', 'person_height': 5, 'person_hair_color': 'blonde'},\n",
|
||||
" {'person_name': 'Claudia',\n",
|
||||
" 'person_height': 6,\n",
|
||||
" 'person_hair_color': 'brunette'},\n",
|
||||
" {'dog_name': 'Willow',\n",
|
||||
" 'dog_breed': 'German Shepherd',\n",
|
||||
" 'dog_extra_information': 'likes to play with other dogs'},\n",
|
||||
" {'dog_name': 'Milo',\n",
|
||||
" 'dog_breed': 'border collie',\n",
|
||||
" 'dog_extra_information': 'lives close by'}]"
|
||||
]
|
||||
},
|
||||
"execution_count": 83,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"chain.run(inp)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "698b4c4d",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Pydantic example"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "6504a6d9",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"We can also use a Pydantic schema to choose the required properties and types and we will set as 'Optional' those that are not strictly required.\n",
|
||||
"\n",
|
||||
"By using the `create_extraction_chain_pydantic` function, we can send a Pydantic schema as input and the output will be an instantiated object that respects our desired schema. \n",
|
||||
"\n",
|
||||
"In this way, we can specify our schema in the same manner that we would a new class or function in Python - with purely Pythonic types."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 7,
|
||||
"id": "6792866b",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from typing import Optional, List\n",
|
||||
"from pydantic import BaseModel, Field"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 8,
|
||||
"id": "36a63761",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"class Properties(BaseModel):\n",
|
||||
" person_name: str\n",
|
||||
" person_height: int\n",
|
||||
" person_hair_color: str\n",
|
||||
" dog_breed: Optional[str]\n",
|
||||
" dog_name: Optional[str]"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 9,
|
||||
"id": "8ffd1e57",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"chain = create_extraction_chain_pydantic(pydantic_schema=Properties, llm=llm)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 10,
|
||||
"id": "24baa954",
|
||||
"metadata": {
|
||||
"scrolled": false
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"inp = \"\"\"\n",
|
||||
"Alex is 5 feet tall. Claudia is 1 feet taller Alex and jumps higher than him. Claudia is a brunette and Alex is blonde.\n",
|
||||
"Alex's dog Frosty is a labrador and likes to play hide and seek.\n",
|
||||
" \"\"\""
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "84e0a241",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"As we can see, we extracted the required entities and their properties in the required format:"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 11,
|
||||
"id": "f771df58",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"[Properties(person_name='Alex', person_height=5, person_hair_color='blonde', dog_breed='labrador', dog_name='Frosty'),\n",
|
||||
" Properties(person_name='Claudia', person_height=6, person_hair_color='brunette', dog_breed=None, dog_name=None)]"
|
||||
]
|
||||
},
|
||||
"execution_count": 11,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"chain.run(inp)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "0df61283",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": []
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
"kernelspec": {
|
||||
"display_name": "Python 3 (ipykernel)",
|
||||
"language": "python",
|
||||
"name": "python3"
|
||||
},
|
||||
"language_info": {
|
||||
"codemirror_mode": {
|
||||
"name": "ipython",
|
||||
"version": 3
|
||||
},
|
||||
"file_extension": ".py",
|
||||
"mimetype": "text/x-python",
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.9.1"
|
||||
}
},
|
||||
"nbformat": 4,
|
||||
"nbformat_minor": 5
|
||||
}
@@ -72,7 +72,10 @@
|
||||
"import numpy as np\n",
|
||||
"\n",
|
||||
"from langchain.schema import BaseRetriever\n",
|
||||
"from langchain.callbacks.manager import AsyncCallbackManagerForRetrieverRun, CallbackManagerForRetrieverRun\n",
|
||||
"from langchain.callbacks.manager import (\n",
|
||||
" AsyncCallbackManagerForRetrieverRun,\n",
|
||||
" CallbackManagerForRetrieverRun,\n",
|
||||
")\n",
|
||||
"from langchain.utilities import GoogleSerperAPIWrapper\n",
|
||||
"from langchain.embeddings import OpenAIEmbeddings\n",
|
||||
"from langchain.chat_models import ChatOpenAI\n",
|
||||
@@ -97,13 +100,15 @@
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"class SerperSearchRetriever(BaseRetriever):\n",
|
||||
"\n",
|
||||
" search: GoogleSerperAPIWrapper = None\n",
|
||||
"\n",
|
||||
" def _get_relevant_documents(self, query: str, *, run_manager: CallbackManagerForRetrieverRun, **kwargs: Any) -> List[Document]:\n",
|
||||
" def _get_relevant_documents(\n",
|
||||
" self, query: str, *, run_manager: CallbackManagerForRetrieverRun, **kwargs: Any\n",
|
||||
" ) -> List[Document]:\n",
|
||||
" return [Document(page_content=self.search.run(query))]\n",
|
||||
"\n",
|
||||
" async def _aget_relevant_documents(self,\n",
|
||||
" async def _aget_relevant_documents(\n",
|
||||
" self,\n",
|
||||
" query: str,\n",
|
||||
" *,\n",
|
||||
" run_manager: AsyncCallbackManagerForRetrieverRun,\n",
|
||||
|
||||
@@ -83,9 +83,15 @@
|
||||
"schema = client.schema()\n",
|
||||
"schema.propertyKey(\"name\").asText().ifNotExist().create()\n",
|
||||
"schema.propertyKey(\"birthDate\").asText().ifNotExist().create()\n",
|
||||
"schema.vertexLabel(\"Person\").properties(\"name\", \"birthDate\").usePrimaryKeyId().primaryKeys(\"name\").ifNotExist().create()\n",
|
||||
"schema.vertexLabel(\"Movie\").properties(\"name\").usePrimaryKeyId().primaryKeys(\"name\").ifNotExist().create()\n",
|
||||
"schema.edgeLabel(\"ActedIn\").sourceLabel(\"Person\").targetLabel(\"Movie\").ifNotExist().create()"
|
||||
"schema.vertexLabel(\"Person\").properties(\n",
|
||||
" \"name\", \"birthDate\"\n",
|
||||
").usePrimaryKeyId().primaryKeys(\"name\").ifNotExist().create()\n",
|
||||
"schema.vertexLabel(\"Movie\").properties(\"name\").usePrimaryKeyId().primaryKeys(\n",
|
||||
" \"name\"\n",
|
||||
").ifNotExist().create()\n",
|
||||
"schema.edgeLabel(\"ActedIn\").sourceLabel(\"Person\").targetLabel(\n",
|
||||
" \"Movie\"\n",
|
||||
").ifNotExist().create()"
|
||||
]
|
||||
},
|
||||
{
|
||||
@@ -124,7 +130,9 @@
|
||||
"\n",
|
||||
"g.addEdge(\"ActedIn\", \"1:Al Pacino\", \"2:The Godfather\", {})\n",
|
||||
"g.addEdge(\"ActedIn\", \"1:Al Pacino\", \"2:The Godfather Part II\", {})\n",
|
||||
"g.addEdge(\"ActedIn\", \"1:Al Pacino\", \"2:The Godfather Coda The Death of Michael Corleone\", {})\n",
|
||||
"g.addEdge(\n",
|
||||
" \"ActedIn\", \"1:Al Pacino\", \"2:The Godfather Coda The Death of Michael Corleone\", {}\n",
|
||||
")\n",
|
||||
"g.addEdge(\"ActedIn\", \"1:Robert De Niro\", \"2:The Godfather Part II\", {})"
|
||||
]
|
||||
},
|
||||
@@ -164,7 +172,7 @@
|
||||
" password=\"admin\",\n",
|
||||
" address=\"localhost\",\n",
|
||||
" port=8080,\n",
|
||||
" graph=\"hugegraph\"\n",
|
||||
" graph=\"hugegraph\",\n",
|
||||
")"
|
||||
]
|
||||
},
|
||||
@@ -228,9 +236,7 @@
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"chain = HugeGraphQAChain.from_llm(\n",
|
||||
" ChatOpenAI(temperature=0), graph=graph, verbose=True\n",
|
||||
")"
|
||||
"chain = HugeGraphQAChain.from_llm(ChatOpenAI(temperature=0), graph=graph, verbose=True)"
|
||||
]
|
||||
},
|
||||
{
|
||||
|
||||
@@ -31,6 +31,7 @@
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"import kuzu\n",
|
||||
"\n",
|
||||
"db = kuzu.Database(\"test_db\")\n",
|
||||
"conn = kuzu.Connection(db)"
|
||||
]
|
||||
@@ -61,7 +62,9 @@
|
||||
],
|
||||
"source": [
|
||||
"conn.execute(\"CREATE NODE TABLE Movie (name STRING, PRIMARY KEY(name))\")\n",
|
||||
"conn.execute(\"CREATE NODE TABLE Person (name STRING, birthDate STRING, PRIMARY KEY(name))\")\n",
|
||||
"conn.execute(\n",
|
||||
" \"CREATE NODE TABLE Person (name STRING, birthDate STRING, PRIMARY KEY(name))\"\n",
|
||||
")\n",
|
||||
"conn.execute(\"CREATE REL TABLE ActedIn (FROM Person TO Movie)\")"
|
||||
]
|
||||
},
|
||||
@@ -94,11 +97,21 @@
|
||||
"conn.execute(\"CREATE (:Person {name: 'Robert De Niro', birthDate: '1943-08-17'})\")\n",
|
||||
"conn.execute(\"CREATE (:Movie {name: 'The Godfather'})\")\n",
|
||||
"conn.execute(\"CREATE (:Movie {name: 'The Godfather: Part II'})\")\n",
|
||||
"conn.execute(\"CREATE (:Movie {name: 'The Godfather Coda: The Death of Michael Corleone'})\")\n",
|
||||
"conn.execute(\"MATCH (p:Person), (m:Movie) WHERE p.name = 'Al Pacino' AND m.name = 'The Godfather' CREATE (p)-[:ActedIn]->(m)\")\n",
|
||||
"conn.execute(\"MATCH (p:Person), (m:Movie) WHERE p.name = 'Al Pacino' AND m.name = 'The Godfather: Part II' CREATE (p)-[:ActedIn]->(m)\")\n",
|
||||
"conn.execute(\"MATCH (p:Person), (m:Movie) WHERE p.name = 'Al Pacino' AND m.name = 'The Godfather Coda: The Death of Michael Corleone' CREATE (p)-[:ActedIn]->(m)\")\n",
|
||||
"conn.execute(\"MATCH (p:Person), (m:Movie) WHERE p.name = 'Robert De Niro' AND m.name = 'The Godfather: Part II' CREATE (p)-[:ActedIn]->(m)\")"
|
||||
"conn.execute(\n",
|
||||
" \"CREATE (:Movie {name: 'The Godfather Coda: The Death of Michael Corleone'})\"\n",
|
||||
")\n",
|
||||
"conn.execute(\n",
|
||||
" \"MATCH (p:Person), (m:Movie) WHERE p.name = 'Al Pacino' AND m.name = 'The Godfather' CREATE (p)-[:ActedIn]->(m)\"\n",
|
||||
")\n",
|
||||
"conn.execute(\n",
|
||||
" \"MATCH (p:Person), (m:Movie) WHERE p.name = 'Al Pacino' AND m.name = 'The Godfather: Part II' CREATE (p)-[:ActedIn]->(m)\"\n",
|
||||
")\n",
|
||||
"conn.execute(\n",
|
||||
" \"MATCH (p:Person), (m:Movie) WHERE p.name = 'Al Pacino' AND m.name = 'The Godfather Coda: The Death of Michael Corleone' CREATE (p)-[:ActedIn]->(m)\"\n",
|
||||
")\n",
|
||||
"conn.execute(\n",
|
||||
" \"MATCH (p:Person), (m:Movie) WHERE p.name = 'Robert De Niro' AND m.name = 'The Godfather: Part II' CREATE (p)-[:ActedIn]->(m)\"\n",
|
||||
")"
|
||||
]
|
||||
},
|
||||
{
|
||||
@@ -137,9 +150,7 @@
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"chain = KuzuQAChain.from_llm(\n",
|
||||
" ChatOpenAI(temperature=0), graph=graph, verbose=True\n",
|
||||
")"
|
||||
"chain = KuzuQAChain.from_llm(ChatOpenAI(temperature=0), graph=graph, verbose=True)"
|
||||
]
|
||||
},
|
||||
{
|
||||
|
||||
@@ -60,7 +60,8 @@
|
||||
],
|
||||
"metadata": {
|
||||
"collapsed": false
|
||||
}
|
||||
},
|
||||
"id": "7af596b5"
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
@@ -150,20 +151,20 @@
|
||||
"text": [
|
||||
"\n",
|
||||
"\n",
|
||||
"\u001B[1m> Entering new GraphSparqlQAChain chain...\u001B[0m\n",
|
||||
"\u001b[1m> Entering new GraphSparqlQAChain chain...\u001b[0m\n",
|
||||
"Identified intent:\n",
|
||||
"\u001B[32;1m\u001B[1;3mSELECT\u001B[0m\n",
|
||||
"\u001b[32;1m\u001b[1;3mSELECT\u001b[0m\n",
|
||||
"Generated SPARQL:\n",
|
||||
"\u001B[32;1m\u001B[1;3mPREFIX foaf: <http://xmlns.com/foaf/0.1/>\n",
|
||||
"\u001b[32;1m\u001b[1;3mPREFIX foaf: <http://xmlns.com/foaf/0.1/>\n",
|
||||
"SELECT ?homepage\n",
|
||||
"WHERE {\n",
|
||||
" ?person foaf:name \"Tim Berners-Lee\" .\n",
|
||||
" ?person foaf:workplaceHomepage ?homepage .\n",
|
||||
"}\u001B[0m\n",
|
||||
"}\u001b[0m\n",
|
||||
"Full Context:\n",
|
||||
"\u001B[32;1m\u001B[1;3m[]\u001B[0m\n",
|
||||
"\u001b[32;1m\u001b[1;3m[]\u001b[0m\n",
|
||||
"\n",
|
||||
"\u001B[1m> Finished chain.\u001B[0m\n"
|
||||
"\u001b[1m> Finished chain.\u001b[0m\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
@@ -207,19 +208,19 @@
|
||||
"text": [
|
||||
"\n",
|
||||
"\n",
|
||||
"\u001B[1m> Entering new GraphSparqlQAChain chain...\u001B[0m\n",
|
||||
"\u001b[1m> Entering new GraphSparqlQAChain chain...\u001b[0m\n",
|
||||
"Identified intent:\n",
|
||||
"\u001B[32;1m\u001B[1;3mUPDATE\u001B[0m\n",
|
||||
"\u001b[32;1m\u001b[1;3mUPDATE\u001b[0m\n",
|
||||
"Generated SPARQL:\n",
|
||||
"\u001B[32;1m\u001B[1;3mPREFIX foaf: <http://xmlns.com/foaf/0.1/>\n",
|
||||
"\u001b[32;1m\u001b[1;3mPREFIX foaf: <http://xmlns.com/foaf/0.1/>\n",
|
||||
"INSERT {\n",
|
||||
" ?person foaf:workplaceHomepage <http://www.w3.org/foo/bar/> .\n",
|
||||
"}\n",
|
||||
"WHERE {\n",
|
||||
" ?person foaf:name \"Timothy Berners-Lee\" .\n",
|
||||
"}\u001B[0m\n",
|
||||
"}\u001b[0m\n",
|
||||
"\n",
|
||||
"\u001B[1m> Finished chain.\u001B[0m\n"
|
||||
"\u001b[1m> Finished chain.\u001b[0m\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
@@ -234,7 +235,9 @@
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"chain.run(\"Save that the person with the name 'Timothy Berners-Lee' has a work homepage at 'http://www.w3.org/foo/bar/'\")"
|
||||
"chain.run(\n",
|
||||
" \"Save that the person with the name 'Timothy Berners-Lee' has a work homepage at 'http://www.w3.org/foo/bar/'\"\n",
|
||||
")"
|
||||
]
|
||||
},
|
||||
{
|
||||
@@ -297,4 +300,4 @@
|
||||
},
|
||||
"nbformat": 4,
|
||||
"nbformat_minor": 5
|
||||
}
|
||||
}
|
||||
docs/extras/modules/chains/additional/llm_symbolic_math.ipynb (new file, 162 lines)
@@ -0,0 +1,162 @@
|
||||
{
|
||||
"cells": [
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# LLM Symbolic Math \n",
|
||||
"This notebook showcases using LLMs and Python to Solve Algebraic Equations. Under the hood is makes use of [SymPy](https://www.sympy.org/en/index.html)."
|
||||
]
|
||||
},
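To make the SymPy connection concrete, the first question below boils down to a one-line SymPy call (a sketch of what happens under the hood, not the chain's own code):

```python
import sympy

x = sympy.symbols("x")
# Differentiate sin(x)*exp(x) with respect to x:
print(sympy.diff(sympy.sin(x) * sympy.exp(x), x))
# exp(x)*sin(x) + exp(x)*cos(x)  -- matching the chain's answer below
```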
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langchain.llms import OpenAI\n",
|
||||
"from langchain.chains.llm_symbolic_math.base import LLMSymbolicMathChain\n",
|
||||
"\n",
|
||||
"llm = OpenAI(temperature=0)\n",
|
||||
"llm_symbolic_math = LLMSymbolicMathChain.from_llm(llm)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Integrals and derivates"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 23,
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"'Answer: exp(x)*sin(x) + exp(x)*cos(x)'"
|
||||
]
|
||||
},
|
||||
"execution_count": 23,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"llm_symbolic_math.run(\"What is the derivative of sin(x)*exp(x) with respect to x?\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 18,
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"'Answer: exp(x)*sin(x)'"
|
||||
]
|
||||
},
|
||||
"execution_count": 18,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"llm_symbolic_math.run(\n",
|
||||
" \"What is the integral of exp(x)*sin(x) + exp(x)*cos(x) with respect to x?\"\n",
|
||||
")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Solve linear and differential equations"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 19,
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"'Answer: Eq(y(t), C2*exp(-t) + (C1 + t/2)*exp(t))'"
|
||||
]
|
||||
},
|
||||
"execution_count": 19,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"llm_symbolic_math.run('Solve the differential equation y\" - y = e^t')"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 21,
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"'Answer: {0, -sqrt(3)*I/3, sqrt(3)*I/3}'"
|
||||
]
|
||||
},
|
||||
"execution_count": 21,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"llm_symbolic_math.run(\"What are the solutions to this equation y^3 + 1/3y?\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 22,
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"'Answer: (3 - sqrt(7), -sqrt(7) - 2, 1 - sqrt(7)), (sqrt(7) + 3, -2 + sqrt(7), 1 + sqrt(7))'"
|
||||
]
|
||||
},
|
||||
"execution_count": 22,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"llm_symbolic_math.run(\"x = y + 5, y = z - 3, z = x * y. Solve for x, y, z\")"
|
||||
]
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
"kernelspec": {
|
||||
"display_name": "venv",
|
||||
"language": "python",
|
||||
"name": "venv"
|
||||
},
|
||||
"language_info": {
|
||||
"codemirror_mode": {
|
||||
"name": "ipython",
|
||||
"version": 3
|
||||
},
|
||||
"file_extension": ".py",
|
||||
"mimetype": "text/x-python",
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.11.3"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
"nbformat_minor": 2
|
||||
}
|
||||
@@ -38,7 +38,7 @@
|
||||
"text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)\n",
|
||||
"texts = text_splitter.split_documents(documents)\n",
|
||||
"for i, text in enumerate(texts):\n",
|
||||
" text.metadata['source'] = f\"{i}-pl\"\n",
|
||||
" text.metadata[\"source\"] = f\"{i}-pl\"\n",
|
||||
"embeddings = OpenAIEmbeddings()\n",
|
||||
"docsearch = Chroma.from_documents(texts, embeddings)"
|
||||
]
|
||||
@@ -97,8 +97,8 @@
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"final_qa_chain = StuffDocumentsChain(\n",
|
||||
" llm_chain=qa_chain, \n",
|
||||
" document_variable_name='context',\n",
|
||||
" llm_chain=qa_chain,\n",
|
||||
" document_variable_name=\"context\",\n",
|
||||
" document_prompt=doc_prompt,\n",
|
||||
")"
|
||||
]
|
||||
@@ -111,8 +111,7 @@
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"retrieval_qa = RetrievalQA(\n",
|
||||
" retriever=docsearch.as_retriever(),\n",
|
||||
" combine_documents_chain=final_qa_chain\n",
|
||||
" retriever=docsearch.as_retriever(), combine_documents_chain=final_qa_chain\n",
|
||||
")"
|
||||
]
|
||||
},
|
||||
@@ -175,8 +174,8 @@
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"final_qa_chain_pydantic = StuffDocumentsChain(\n",
|
||||
" llm_chain=qa_chain_pydantic, \n",
|
||||
" document_variable_name='context',\n",
|
||||
" llm_chain=qa_chain_pydantic,\n",
|
||||
" document_variable_name=\"context\",\n",
|
||||
" document_prompt=doc_prompt,\n",
|
||||
")"
|
||||
]
|
||||
@@ -189,8 +188,7 @@
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"retrieval_qa_pydantic = RetrievalQA(\n",
|
||||
" retriever=docsearch.as_retriever(),\n",
|
||||
" combine_documents_chain=final_qa_chain_pydantic\n",
|
||||
" retriever=docsearch.as_retriever(), combine_documents_chain=final_qa_chain_pydantic\n",
|
||||
")"
|
||||
]
|
||||
},
|
||||
@@ -235,6 +233,7 @@
|
||||
"from langchain.chains import ConversationalRetrievalChain\n",
|
||||
"from langchain.memory import ConversationBufferMemory\n",
|
||||
"from langchain.chains import LLMChain\n",
|
||||
"\n",
|
||||
"memory = ConversationBufferMemory(memory_key=\"chat_history\", return_messages=True)\n",
|
||||
"_template = \"\"\"Given the following conversation and a follow up question, rephrase the follow up question to be a standalone question, in its original language.\\\n",
|
||||
"Make sure to avoid using any unclear pronouns.\n",
|
||||
@@ -258,10 +257,10 @@
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"qa = ConversationalRetrievalChain(\n",
|
||||
" question_generator=condense_question_chain, \n",
|
||||
" question_generator=condense_question_chain,\n",
|
||||
" retriever=docsearch.as_retriever(),\n",
|
||||
" memory=memory, \n",
|
||||
" combine_docs_chain=final_qa_chain\n",
|
||||
" memory=memory,\n",
|
||||
" combine_docs_chain=final_qa_chain,\n",
|
||||
")"
|
||||
]
|
||||
},
|
||||
@@ -389,7 +388,9 @@
|
||||
" \"\"\"An answer to the question being asked, with sources.\"\"\"\n",
|
||||
"\n",
|
||||
" answer: str = Field(..., description=\"Answer to the question that was asked\")\n",
|
||||
" countries_referenced: List[str] = Field(..., description=\"All of the countries mentioned in the sources\")\n",
|
||||
" countries_referenced: List[str] = Field(\n",
|
||||
" ..., description=\"All of the countries mentioned in the sources\"\n",
|
||||
" )\n",
|
||||
" sources: List[str] = Field(\n",
|
||||
" ..., description=\"List of sources used to answer the question\"\n",
|
||||
" )\n",
|
||||
@@ -405,20 +406,23 @@
|
||||
" HumanMessage(content=\"Answer question using the following context\"),\n",
|
||||
" HumanMessagePromptTemplate.from_template(\"{context}\"),\n",
|
||||
" HumanMessagePromptTemplate.from_template(\"Question: {question}\"),\n",
|
||||
" HumanMessage(content=\"Tips: Make sure to answer in the correct format. Return all of the countries mentioned in the sources in uppercase characters.\"),\n",
|
||||
" HumanMessage(\n",
|
||||
" content=\"Tips: Make sure to answer in the correct format. Return all of the countries mentioned in the sources in uppercase characters.\"\n",
|
||||
" ),\n",
|
||||
"]\n",
|
||||
"\n",
|
||||
"chain_prompt = ChatPromptTemplate(messages=prompt_messages)\n",
|
||||
"\n",
|
||||
"qa_chain_pydantic = create_qa_with_structure_chain(llm, CustomResponseSchema, output_parser=\"pydantic\", prompt=chain_prompt)\n",
|
||||
"qa_chain_pydantic = create_qa_with_structure_chain(\n",
|
||||
" llm, CustomResponseSchema, output_parser=\"pydantic\", prompt=chain_prompt\n",
|
||||
")\n",
|
||||
"final_qa_chain_pydantic = StuffDocumentsChain(\n",
|
||||
" llm_chain=qa_chain_pydantic,\n",
|
||||
" document_variable_name='context',\n",
|
||||
" document_variable_name=\"context\",\n",
|
||||
" document_prompt=doc_prompt,\n",
|
||||
")\n",
|
||||
"retrieval_qa_pydantic = RetrievalQA(\n",
|
||||
" retriever=docsearch.as_retriever(),\n",
|
||||
" combine_documents_chain=final_qa_chain_pydantic\n",
|
||||
" retriever=docsearch.as_retriever(), combine_documents_chain=final_qa_chain_pydantic\n",
|
||||
")\n",
|
||||
"query = \"What did he say about russia\"\n",
|
||||
"retrieval_qa_pydantic.run(query)"
|
||||
|
||||
@@ -35,7 +35,9 @@
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"chain = get_openapi_chain(\"https://www.klarna.com/us/shopping/public/openai/v0/api-docs/\")"
|
||||
"chain = get_openapi_chain(\n",
|
||||
" \"https://www.klarna.com/us/shopping/public/openai/v0/api-docs/\"\n",
|
||||
")"
|
||||
]
|
||||
},
|
||||
{
|
||||
@@ -186,7 +188,9 @@
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"chain = get_openapi_chain(\"https://gist.githubusercontent.com/roaldnefs/053e505b2b7a807290908fe9aa3e1f00/raw/0a212622ebfef501163f91e23803552411ed00e4/openapi.yaml\")"
|
||||
"chain = get_openapi_chain(\n",
|
||||
" \"https://gist.githubusercontent.com/roaldnefs/053e505b2b7a807290908fe9aa3e1f00/raw/0a212622ebfef501163f91e23803552411ed00e4/openapi.yaml\"\n",
|
||||
")"
|
||||
]
|
||||
},
|
||||
{
|
||||
|
||||
@@ -28,7 +28,7 @@
|
||||
"\n",
|
||||
"from pydantic import Extra\n",
|
||||
"\n",
|
||||
"from langchain.base_language import BaseLanguageModel\n",
|
||||
"from langchain.schemea import BaseLanguageModel\n",
|
||||
"from langchain.callbacks.manager import (\n",
|
||||
" AsyncCallbackManagerForChainRun,\n",
|
||||
" CallbackManagerForChainRun,\n",
|
||||
|
||||
@@ -22,7 +22,8 @@
|
||||
"from typing import Optional\n",
|
||||
"\n",
|
||||
"from langchain.chains.openai_functions import (\n",
|
||||
" create_openai_fn_chain, create_structured_output_chain\n",
|
||||
" create_openai_fn_chain,\n",
|
||||
" create_structured_output_chain,\n",
|
||||
")\n",
|
||||
"from langchain.chat_models import ChatOpenAI\n",
|
||||
"from langchain.prompts import ChatPromptTemplate, HumanMessagePromptTemplate\n",
|
||||
@@ -58,8 +59,10 @@
|
||||
"source": [
|
||||
"from pydantic import BaseModel, Field\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"class Person(BaseModel):\n",
|
||||
" \"\"\"Identifying information about a person.\"\"\"\n",
|
||||
"\n",
|
||||
" name: str = Field(..., description=\"The person's name\")\n",
|
||||
" age: int = Field(..., description=\"The person's age\")\n",
|
||||
" fav_food: Optional[str] = Field(None, description=\"The person's favorite food\")"
|
||||
@@ -103,13 +106,15 @@
|
||||
"llm = ChatOpenAI(model=\"gpt-3.5-turbo-0613\", temperature=0)\n",
|
||||
"\n",
|
||||
"prompt_msgs = [\n",
|
||||
" SystemMessage(\n",
|
||||
" content=\"You are a world class algorithm for extracting information in structured formats.\"\n",
|
||||
" ),\n",
|
||||
" HumanMessage(content=\"Use the given format to extract information from the following input:\"),\n",
|
||||
" HumanMessagePromptTemplate.from_template(\"{input}\"),\n",
|
||||
" HumanMessage(content=\"Tips: Make sure to answer in the correct format\"),\n",
|
||||
" ]\n",
|
||||
" SystemMessage(\n",
|
||||
" content=\"You are a world class algorithm for extracting information in structured formats.\"\n",
|
||||
" ),\n",
|
||||
" HumanMessage(\n",
|
||||
" content=\"Use the given format to extract information from the following input:\"\n",
|
||||
" ),\n",
|
||||
" HumanMessagePromptTemplate.from_template(\"{input}\"),\n",
|
||||
" HumanMessage(content=\"Tips: Make sure to answer in the correct format\"),\n",
|
||||
"]\n",
|
||||
"prompt = ChatPromptTemplate(messages=prompt_msgs)\n",
|
||||
"\n",
|
||||
"chain = create_structured_output_chain(Person, llm, prompt, verbose=True)\n",
|
||||
@@ -162,12 +167,17 @@
|
||||
"source": [
|
||||
"from typing import Sequence\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"class People(BaseModel):\n",
|
||||
" \"\"\"Identifying information about all people in a text.\"\"\"\n",
|
||||
"\n",
|
||||
" people: Sequence[Person] = Field(..., description=\"The people in the text\")\n",
|
||||
" \n",
|
||||
"\n",
|
||||
"\n",
|
||||
"chain = create_structured_output_chain(People, llm, prompt, verbose=True)\n",
|
||||
"chain.run(\"Sally is 13, Joey just turned 12 and loves spinach. Caroline is 10 years older than Sally, so she's 23.\")"
|
||||
"chain.run(\n",
|
||||
" \"Sally is 13, Joey just turned 12 and loves spinach. Caroline is 10 years older than Sally, so she's 23.\"\n",
|
||||
")"
|
||||
]
|
||||
},
|
||||
{
|
||||
@@ -192,27 +202,16 @@
|
||||
" \"description\": \"Identifying information about a person.\",\n",
|
||||
" \"type\": \"object\",\n",
|
||||
" \"properties\": {\n",
|
||||
" \"name\": {\n",
|
||||
" \"title\": \"Name\",\n",
|
||||
" \"description\": \"The person's name\",\n",
|
||||
" \"type\": \"string\"\n",
|
||||
" },\n",
|
||||
" \"age\": {\n",
|
||||
" \"title\": \"Age\",\n",
|
||||
" \"description\": \"The person's age\",\n",
|
||||
" \"type\": \"integer\"\n",
|
||||
" },\n",
|
||||
" \"fav_food\": {\n",
|
||||
" \"title\": \"Fav Food\",\n",
|
||||
" \"description\": \"The person's favorite food\",\n",
|
||||
" \"type\": \"string\"\n",
|
||||
" }\n",
|
||||
" \"name\": {\"title\": \"Name\", \"description\": \"The person's name\", \"type\": \"string\"},\n",
|
||||
" \"age\": {\"title\": \"Age\", \"description\": \"The person's age\", \"type\": \"integer\"},\n",
|
||||
" \"fav_food\": {\n",
|
||||
" \"title\": \"Fav Food\",\n",
|
||||
" \"description\": \"The person's favorite food\",\n",
|
||||
" \"type\": \"string\",\n",
|
||||
" },\n",
|
||||
" },\n",
|
||||
" \"required\": [\n",
|
||||
" \"name\",\n",
|
||||
" \"age\"\n",
|
||||
" ]\n",
|
||||
"}\n"
|
||||
" \"required\": [\"name\", \"age\"],\n",
|
||||
"}"
|
||||
]
|
||||
},
|
||||
{
|
||||
@@ -286,13 +285,15 @@
|
||||
"source": [
|
||||
"class RecordPerson(BaseModel):\n",
|
||||
" \"\"\"Record some identifying information about a pe.\"\"\"\n",
|
||||
"\n",
|
||||
" name: str = Field(..., description=\"The person's name\")\n",
|
||||
" age: int = Field(..., description=\"The person's age\")\n",
|
||||
" fav_food: Optional[str] = Field(None, description=\"The person's favorite food\")\n",
|
||||
"\n",
|
||||
" \n",
|
||||
"\n",
|
||||
"class RecordDog(BaseModel):\n",
|
||||
" \"\"\"Record some identifying information about a dog.\"\"\"\n",
|
||||
"\n",
|
||||
" name: str = Field(..., description=\"The dog's name\")\n",
|
||||
" color: str = Field(..., description=\"The dog's color\")\n",
|
||||
" fav_food: Optional[str] = Field(None, description=\"The dog's favorite food\")"
|
||||
@@ -333,10 +334,10 @@
|
||||
],
|
||||
"source": [
|
||||
"prompt_msgs = [\n",
|
||||
" SystemMessage(\n",
|
||||
" content=\"You are a world class algorithm for recording entities\"\n",
|
||||
" SystemMessage(content=\"You are a world class algorithm for recording entities\"),\n",
|
||||
" HumanMessage(\n",
|
||||
" content=\"Make calls to the relevant function to record the entities in the following input:\"\n",
|
||||
" ),\n",
|
||||
" HumanMessage(content=\"Make calls to the relevant function to record the entities in the following input:\"),\n",
|
||||
" HumanMessagePromptTemplate.from_template(\"{input}\"),\n",
|
||||
" HumanMessage(content=\"Tips: Make sure to answer in the correct format\"),\n",
|
||||
"]\n",
|
||||
@@ -393,11 +394,16 @@
|
||||
"source": [
|
||||
"class OptionalFavFood(BaseModel):\n",
|
||||
" \"\"\"Either a food or null.\"\"\"\n",
|
||||
" food: Optional[str] = Field(None, description=\"Either the name of a food or null. Should be null if the food isn't known.\")\n",
|
||||
"\n",
|
||||
" food: Optional[str] = Field(\n",
|
||||
" None,\n",
|
||||
" description=\"Either the name of a food or null. Should be null if the food isn't known.\",\n",
|
||||
" )\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"def record_person(name: str, age: int, fav_food: OptionalFavFood) -> str:\n",
|
||||
" \"\"\"Record some basic identifying information about a person.\n",
|
||||
" \n",
|
||||
"\n",
|
||||
" Args:\n",
|
||||
" name: The person's name.\n",
|
||||
" age: The person's age in years.\n",
|
||||
@@ -405,9 +411,11 @@
|
||||
" \"\"\"\n",
|
||||
" return f\"Recording person {name} of age {age} with favorite food {fav_food.food}!\"\n",
|
||||
"\n",
|
||||
" \n",
|
||||
"\n",
|
||||
"chain = create_openai_fn_chain([record_person], llm, prompt, verbose=True)\n",
|
||||
"chain.run(\"The most important thing to remember about Tommy, my 12 year old, is that he'll do anything for apple pie.\")"
|
||||
"chain.run(\n",
|
||||
" \"The most important thing to remember about Tommy, my 12 year old, is that he'll do anything for apple pie.\"\n",
|
||||
")"
|
||||
]
|
||||
},
|
||||
{
|
||||
@@ -458,7 +466,7 @@
|
||||
"source": [
|
||||
"def record_dog(name: str, color: str, fav_food: OptionalFavFood) -> str:\n",
|
||||
" \"\"\"Record some basic identifying information about a dog.\n",
|
||||
" \n",
|
||||
"\n",
|
||||
" Args:\n",
|
||||
" name: The dog's name.\n",
|
||||
" color: The dog's color.\n",
|
||||
@@ -468,7 +476,9 @@
|
||||
"\n",
|
||||
"\n",
|
||||
"chain = create_openai_fn_chain([record_person, record_dog], llm, prompt, verbose=True)\n",
|
||||
"chain.run(\"I can't find my dog Henry anywhere, he's a small brown beagle. Could you send a message about him?\")"
|
||||
"chain.run(\n",
|
||||
" \"I can't find my dog Henry anywhere, he's a small brown beagle. Could you send a message about him?\"\n",
|
||||
")"
|
||||
]
|
||||
},
|
||||
{
|
||||
|
||||
@@ -77,7 +77,9 @@
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"loader = BraveSearchLoader(query=\"obama middle name\", api_key=api_key, search_kwargs={\"count\": 3})\n",
|
||||
"loader = BraveSearchLoader(\n",
|
||||
" query=\"obama middle name\", api_key=api_key, search_kwargs={\"count\": 3}\n",
|
||||
")\n",
|
||||
"docs = loader.load()\n",
|
||||
"len(docs)"
|
||||
]
|
||||
|
||||
@@ -0,0 +1,81 @@
|
||||
{
|
||||
"cells": [
|
||||
{
|
||||
"attachments": {},
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# Browserless"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 1,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langchain.document_loaders import BrowserlessLoader"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"BROWSERLESS_API_TOKEN = \"YOUR_API_TOKEN\""
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 2,
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"<!DOCTYPE html><html class=\"client-js vector-feature-language-in-header-enabled vector-feature-language-in-main-page-header-disabled vector-feature-sticky-header-disabled vector-feature-page-tools-pinned-disabled vector-feature-toc-pinned-enabled vector-feature-main-menu-pinned-disabled vector-feature-limited-width-enabled vector-feature-limited-width-content-enabled vector-feature-zebra-design-disabled\" lang=\"en\" dir=\"ltr\"><head>\n",
|
||||
"<meta charset=\"UTF-8\">\n",
|
||||
"<title>Document classification - Wikipedia</title>\n",
|
||||
"<script>document.documentElement.className=\"client-js vector-feature-language-in-header-enabled vector-feature-language-in-main-page-header-disabled vector-feature-sticky-header-disabled vector-feature-page-tools-pinned-disabled vector-feature-toc-pinned-enabled vector-feature-main-menu-pinned-disabled vector-feature-limited-width-enabled vector-feature-limited-width-content-enabled vector-feature-zebra-design-disabled\";(function(){var cookie=document.cookie.match(/(?:^|; )enwikimwclien\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"loader = BrowserlessLoader(\n",
|
||||
" api_token=BROWSERLESS_API_TOKEN,\n",
|
||||
" urls=[\n",
|
||||
" \"https://en.wikipedia.org/wiki/Document_classification\",\n",
|
||||
" ],\n",
|
||||
")\n",
|
||||
"\n",
|
||||
"documents = loader.load()\n",
|
||||
"\n",
|
||||
"print(documents[0].page_content[:1000])"
|
||||
]
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
"kernelspec": {
|
||||
"display_name": "venv",
|
||||
"language": "python",
|
||||
"name": "python3"
|
||||
},
|
||||
"language_info": {
|
||||
"codemirror_mode": {
|
||||
"name": "ipython",
|
||||
"version": 3
|
||||
},
|
||||
"file_extension": ".py",
|
||||
"mimetype": "text/x-python",
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.11.1"
|
||||
},
|
||||
"orig_nbformat": 4
|
||||
},
|
||||
"nbformat": 4,
|
||||
"nbformat_minor": 2
|
||||
}
|
||||
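The Browserless notebook above prints raw page HTML. If extracted text is preferable, the loader also takes a `text_content` flag; this is an assumption about the current signature, so treat it as a sketch:

```python
# Sketch, assuming BrowserlessLoader accepts a text_content flag (not shown above).
loader = BrowserlessLoader(
    api_token=BROWSERLESS_API_TOKEN,
    urls=["https://en.wikipedia.org/wiki/Document_classification"],
    text_content=True,  # request extracted text instead of raw HTML
)
documents = loader.load()
print(documents[0].page_content[:1000])
```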
@@ -0,0 +1,96 @@
|
||||
{
|
||||
"cells": [
|
||||
{
|
||||
"attachments": {},
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# Datadog Logs\n",
|
||||
"\n",
|
||||
">[Datadog](https://www.datadoghq.com/) is a monitoring and analytics platform for cloud-scale applications.\n",
|
||||
"\n",
|
||||
"This loader fetches the logs from your applications in Datadog using the `datadog_api_client` Python package. You must initialize the loader with your `Datadog API key` and `APP key`, and you need to pass in the query to extract the desired logs."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langchain.document_loaders import DatadogLogsLoader"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"#!pip install datadog-api-client"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"query = \"service:agent status:error\"\n",
|
||||
"\n",
|
||||
"loader = DatadogLogsLoader(\n",
|
||||
" query=query,\n",
|
||||
" api_key=DD_API_KEY,\n",
|
||||
" app_key=DD_APP_KEY,\n",
|
||||
" from_time=1688732708951, # Optional, timestamp in milliseconds\n",
|
||||
" to_time=1688736308951, # Optional, timestamp in milliseconds\n",
|
||||
" limit=100, # Optional, default is 100\n",
|
||||
")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 11,
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"[Document(page_content='message: grep: /etc/datadog-agent/system-probe.yaml: No such file or directory', metadata={'id': 'AgAAAYkwpLImvkjRpQAAAAAAAAAYAAAAAEFZa3dwTUFsQUFEWmZfLU5QdElnM3dBWQAAACQAAAAAMDE4OTMwYTQtYzk3OS00MmJjLTlhNDAtOTY4N2EwY2I5ZDdk', 'status': 'error', 'service': 'agent', 'tags': ['accessible-from-goog-gke-node', 'allow-external-ingress-high-ports', 'allow-external-ingress-http', 'allow-external-ingress-https', 'container_id:c7d8ecd27b5b3cfdf3b0df04b8965af6f233f56b7c3c2ffabfab5e3b6ccbd6a5', 'container_name:lab_datadog_1', 'datadog.pipelines:false', 'datadog.submission_auth:private_api_key', 'docker_image:datadog/agent:7.41.1', 'env:dd101-dev', 'hostname:lab-host', 'image_name:datadog/agent', 'image_tag:7.41.1', 'instance-id:7497601202021312403', 'instance-type:custom-1-4096', 'instruqt_aws_accounts:', 'instruqt_azure_subscriptions:', 'instruqt_gcp_projects:', 'internal-hostname:lab-host.d4rjybavkary.svc.cluster.local', 'numeric_project_id:3390740675', 'p-d4rjybavkary', 'project:instruqt-prod', 'service:agent', 'short_image:agent', 'source:agent', 'zone:europe-west1-b'], 'timestamp': datetime.datetime(2023, 7, 7, 13, 57, 27, 206000, tzinfo=tzutc())}),\n",
|
||||
" Document(page_content='message: grep: /etc/datadog-agent/system-probe.yaml: No such file or directory', metadata={'id': 'AgAAAYkwpLImvkjRpgAAAAAAAAAYAAAAAEFZa3dwTUFsQUFEWmZfLU5QdElnM3dBWgAAACQAAAAAMDE4OTMwYTQtYzk3OS00MmJjLTlhNDAtOTY4N2EwY2I5ZDdk', 'status': 'error', 'service': 'agent', 'tags': ['accessible-from-goog-gke-node', 'allow-external-ingress-high-ports', 'allow-external-ingress-http', 'allow-external-ingress-https', 'container_id:c7d8ecd27b5b3cfdf3b0df04b8965af6f233f56b7c3c2ffabfab5e3b6ccbd6a5', 'container_name:lab_datadog_1', 'datadog.pipelines:false', 'datadog.submission_auth:private_api_key', 'docker_image:datadog/agent:7.41.1', 'env:dd101-dev', 'hostname:lab-host', 'image_name:datadog/agent', 'image_tag:7.41.1', 'instance-id:7497601202021312403', 'instance-type:custom-1-4096', 'instruqt_aws_accounts:', 'instruqt_azure_subscriptions:', 'instruqt_gcp_projects:', 'internal-hostname:lab-host.d4rjybavkary.svc.cluster.local', 'numeric_project_id:3390740675', 'p-d4rjybavkary', 'project:instruqt-prod', 'service:agent', 'short_image:agent', 'source:agent', 'zone:europe-west1-b'], 'timestamp': datetime.datetime(2023, 7, 7, 13, 57, 27, 206000, tzinfo=tzutc())})]"
|
||||
]
|
||||
},
|
||||
"execution_count": 11,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"documents = loader.load()\n",
|
||||
"documents"
|
||||
]
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
"kernelspec": {
|
||||
"display_name": ".venv",
|
||||
"language": "python",
|
||||
"name": "python3"
|
||||
},
|
||||
"language_info": {
|
||||
"codemirror_mode": {
|
||||
"name": "ipython",
|
||||
"version": 3
|
||||
},
|
||||
"file_extension": ".py",
|
||||
"mimetype": "text/x-python",
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.9.11"
|
||||
},
|
||||
"orig_nbformat": 4
|
||||
},
|
||||
"nbformat": 4,
|
||||
"nbformat_minor": 2
|
||||
}
|
||||
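Because `from_time` and `to_time` are epoch timestamps in milliseconds, it can help to derive them from `datetime` objects instead of hard-coding them. A sketch (the `to_ms` helper is illustrative, not part of the loader):

```python
from datetime import datetime, timedelta

def to_ms(dt: datetime) -> int:
    # Datadog expects epoch timestamps in milliseconds.
    return int(dt.timestamp() * 1000)

now = datetime.now()
loader = DatadogLogsLoader(
    query="service:agent status:error",
    api_key=DD_API_KEY,
    app_key=DD_APP_KEY,
    from_time=to_ms(now - timedelta(hours=1)),  # last hour of logs
    to_time=to_ms(now),
)
```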
@@ -0,0 +1,5 @@
|
||||
Stanley Cups
|
||||
Team Location Stanley Cups
|
||||
Blues STL 1
|
||||
Flyers PHI 2
|
||||
Maple Leafs TOR 13
|
||||
|
@@ -1840,7 +1840,7 @@ This category contains articles that are incomplete and are tagged with the {{T|
|
||||
<username>FANDOM</username>
|
||||
<id>32769624</id>
|
||||
</contributor>
|
||||
<comment>Created page with "{{LicenseBox|text=''This work is licensed under the [https://opensource.org/licenses/MIT MIT License].''}}{{#ifeq: {{NAMESPACENUMBER}} | 0 | <includeonly>Category:MIT licens..."</comment>
|
||||
<comment>Created page with "{{LicenseBox|text=''This work is licensed under the [https://opensource.org/licenses/MIT MIT License].''}}{{#ifeq: {{NAMESPACENUMBER}} | 0 | <includeonly>Category:MIT license..."</comment>
|
||||
<origin>104</origin>
|
||||
<model>wikitext</model>
|
||||
<format>text/x-wiki</format>
|
||||
|
||||
@@ -126,11 +126,11 @@
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"file_id=\"1x9WBtFPWMEAdjcJzPScRsjpjQvpSo_kz\"\n",
|
||||
"file_id = \"1x9WBtFPWMEAdjcJzPScRsjpjQvpSo_kz\"\n",
|
||||
"loader = GoogleDriveLoader(\n",
|
||||
" file_ids=[file_id],\n",
|
||||
" file_loader_cls=UnstructuredFileIOLoader,\n",
|
||||
" file_loader_kwargs={\"mode\": \"elements\"}\n",
|
||||
" file_loader_kwargs={\"mode\": \"elements\"},\n",
|
||||
")"
|
||||
]
|
||||
},
|
||||
@@ -180,11 +180,11 @@
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"folder_id=\"1asMOHY1BqBS84JcRbOag5LOJac74gpmD\"\n",
|
||||
"folder_id = \"1asMOHY1BqBS84JcRbOag5LOJac74gpmD\"\n",
|
||||
"loader = GoogleDriveLoader(\n",
|
||||
" folder_id=folder_id,\n",
|
||||
" file_loader_cls=UnstructuredFileIOLoader,\n",
|
||||
" file_loader_kwargs={\"mode\": \"elements\"}\n",
|
||||
" file_loader_kwargs={\"mode\": \"elements\"},\n",
|
||||
")"
|
||||
]
|
||||
},
|
||||
|
||||
@@ -101,7 +101,7 @@
|
||||
" \"../Papers/\",\n",
|
||||
" glob=\"*\",\n",
|
||||
" suffixes=[\".pdf\"],\n",
|
||||
" parser= GrobidParser(segment_sentences=False)\n",
|
||||
" parser=GrobidParser(segment_sentences=False),\n",
|
||||
")\n",
|
||||
"docs = loader.load()"
|
||||
]
|
||||
|
||||
@@ -18,7 +18,10 @@
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langchain.document_loaders import WebBaseLoader\n",
|
||||
"loader_web = WebBaseLoader(\"https://github.com/basecamp/handbook/blob/master/37signals-is-you.md\")"
|
||||
"\n",
|
||||
"loader_web = WebBaseLoader(\n",
|
||||
" \"https://github.com/basecamp/handbook/blob/master/37signals-is-you.md\"\n",
|
||||
")"
|
||||
]
|
||||
},
|
||||
{
|
||||
@@ -29,6 +32,7 @@
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langchain.document_loaders import PyPDFLoader\n",
|
||||
"\n",
|
||||
"loader_pdf = PyPDFLoader(\"../MachineLearning-Lecture01.pdf\")"
|
||||
]
|
||||
},
|
||||
@@ -40,7 +44,8 @@
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langchain.document_loaders.merge import MergedDataLoader\n",
|
||||
"loader_all=MergedDataLoader(loaders=[loader_web,loader_pdf])"
|
||||
"\n",
|
||||
"loader_all = MergedDataLoader(loaders=[loader_web, loader_pdf])"
|
||||
]
|
||||
},
|
||||
{
|
||||
@@ -50,7 +55,7 @@
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"docs_all=loader_all.load()"
|
||||
"docs_all = loader_all.load()"
|
||||
]
|
||||
},
|
||||
{
|
||||
|
||||
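When either source is large, the merged loader can also be consumed lazily. Assuming `MergedDataLoader` implements `lazy_load()` like the other loaders in this section (an assumption, it is not shown in the diff), a sketch:

```python
# Sketch: stream merged documents one at a time instead of materializing a list.
for doc in loader_all.lazy_load():
    print(doc.metadata.get("source"))
```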
@@ -36,7 +36,9 @@
|
||||
],
|
||||
"source": [
|
||||
"# Create a new loader object for the MHTML file\n",
|
||||
"loader = MHTMLLoader(file_path='../../../../../../tests/integration_tests/examples/example.mht')\n",
|
||||
"loader = MHTMLLoader(\n",
|
||||
" file_path=\"../../../../../../tests/integration_tests/examples/example.mht\"\n",
|
||||
")\n",
|
||||
"\n",
|
||||
"# Load the document from the file\n",
|
||||
"documents = loader.load()\n",
|
||||
|
||||
@@ -53,11 +53,9 @@
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"dataset = \"vw6y-z8j6\" # 311 data\n",
|
||||
"dataset = \"tmnf-yvry\" # crime data\n",
|
||||
"loader = OpenCityDataLoader(city_id=\"data.sfgov.org\",\n",
|
||||
" dataset_id=dataset,\n",
|
||||
" limit=2000)"
|
||||
"dataset = \"vw6y-z8j6\" # 311 data\n",
|
||||
"dataset = \"tmnf-yvry\" # crime data\n",
|
||||
"loader = OpenCityDataLoader(city_id=\"data.sfgov.org\", dataset_id=dataset, limit=2000)"
|
||||
]
|
||||
},
|
||||
{
|
||||
|
||||
@@ -33,9 +33,7 @@
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"loader = UnstructuredOrgModeLoader(\n",
|
||||
" file_path=\"example_data/README.org\", mode=\"elements\"\n",
|
||||
")\n",
|
||||
"loader = UnstructuredOrgModeLoader(file_path=\"example_data/README.org\", mode=\"elements\")\n",
|
||||
"docs = loader.load()"
|
||||
]
|
||||
},
|
||||
|
||||
@@ -239,7 +239,7 @@
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"# Use lazy load for larger table, which won't read the full table into memory \n",
|
||||
"# Use lazy load for larger table, which won't read the full table into memory\n",
|
||||
"for i in loader.lazy_load():\n",
|
||||
" print(i)"
|
||||
]
|
||||
|
||||
@@ -1,7 +1,6 @@
|
||||
{
|
||||
"cells": [
|
||||
{
|
||||
"attachments": {},
|
||||
"cell_type": "markdown",
|
||||
"id": "5a7cc773",
|
||||
"metadata": {},
|
||||
@@ -25,7 +24,7 @@
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 2,
|
||||
"execution_count": 1,
|
||||
"id": "2e3532b2",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
@@ -34,7 +33,6 @@
|
||||
]
|
||||
},
|
||||
{
|
||||
"attachments": {},
|
||||
"cell_type": "markdown",
|
||||
"id": "6384c057",
|
||||
"metadata": {},
|
||||
@@ -44,19 +42,19 @@
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 3,
|
||||
"execution_count": 2,
|
||||
"id": "d69e5620",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"url = 'https://js.langchain.com/docs/modules/memory/examples/'\n",
|
||||
"loader=RecursiveUrlLoader(url=url)\n",
|
||||
"docs=loader.load()"
|
||||
"url = \"https://js.langchain.com/docs/modules/memory/examples/\"\n",
|
||||
"loader = RecursiveUrlLoader(url=url)\n",
|
||||
"docs = loader.load()"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 4,
|
||||
"execution_count": 3,
|
||||
"id": "084fb2ce",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
@@ -66,7 +64,7 @@
|
||||
"12"
|
||||
]
|
||||
},
|
||||
"execution_count": 4,
|
||||
"execution_count": 3,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
@@ -77,17 +75,17 @@
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 5,
|
||||
"execution_count": 4,
|
||||
"id": "89355b7c",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"'\\n\\n\\n\\n\\nDynamoDB-Backed Chat Memory | \\uf8ffü¶úÔ∏è\\uf8ffüîó Lan'"
|
||||
"'\\n\\n\\n\\n\\nBuffer Window Memory | 🦜️🔗 Langchain\\n\\n\\n\\n\\n\\nSki'"
|
||||
]
|
||||
},
|
||||
"execution_count": 5,
|
||||
"execution_count": 4,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
@@ -98,20 +96,20 @@
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 6,
|
||||
"execution_count": 5,
|
||||
"id": "13bd7e16",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"{'source': 'https://js.langchain.com/docs/modules/memory/examples/dynamodb',\n",
|
||||
" 'title': 'DynamoDB-Backed Chat Memory | \\uf8ffü¶úÔ∏è\\uf8ffüîó Langchain',\n",
|
||||
" 'description': 'For longer-term persistence across chat sessions, you can swap out the default in-memory chatHistory that backs chat memory classes like BufferMemory for a DynamoDB instance.',\n",
|
||||
"{'source': 'https://js.langchain.com/docs/modules/memory/examples/buffer_window_memory',\n",
|
||||
" 'title': 'Buffer Window Memory | 🦜️🔗 Langchain',\n",
|
||||
" 'description': 'BufferWindowMemory keeps track of the back-and-forths in conversation, and then uses a window of size k to surface the last k back-and-forths to use as memory.',\n",
|
||||
" 'language': 'en'}"
|
||||
]
|
||||
},
|
||||
"execution_count": 6,
|
||||
"execution_count": 5,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
@@ -121,14 +119,29 @@
|
||||
]
|
||||
},
|
||||
{
|
||||
"attachments": {},
|
||||
"cell_type": "markdown",
|
||||
"id": "40fc13ef",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Now, let's try a more extensive example, the `docs` root dir.\n",
|
||||
"\n",
|
||||
"We will skip everything under `api`."
|
||||
"We will skip everything under `api`.\n",
|
||||
"\n",
|
||||
"For this, we can `lazy_load` each page as we crawl the tree, using `WebBaseLoader` to load each as we go."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "5c938b9f",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"url = \"https://js.langchain.com/docs/\"\n",
|
||||
"exclude_dirs = [\"https://js.langchain.com/docs/api/\"]\n",
|
||||
"loader = RecursiveUrlLoader(url=url, exclude_dirs=exclude_dirs)\n",
|
||||
"# Lazy load each\n",
|
||||
"docs = [print(doc) or doc for doc in loader.lazy_load()]"
|
||||
]
|
||||
},
|
||||
{
|
||||
@@ -138,22 +151,22 @@
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"url = 'https://js.langchain.com/docs/'\n",
|
||||
"exclude_dirs=['https://js.langchain.com/docs/api/']\n",
|
||||
"loader=RecursiveUrlLoader(url=url,exclude_dirs=exclude_dirs)\n",
|
||||
"docs=loader.load()"
|
||||
"# Load all pages\n",
|
||||
"docs = loader.load()"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 8,
|
||||
"id": "457e30f3",
|
||||
"metadata": {},
|
||||
"metadata": {
|
||||
"scrolled": true
|
||||
},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"176"
|
||||
"188"
|
||||
]
|
||||
},
|
||||
"execution_count": 8,
|
||||
@@ -174,7 +187,7 @@
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"'\\n\\n\\n\\n\\nHacker News | \\uf8ffü¶úÔ∏è\\uf8ffüîó Langchain\\n\\n\\n\\n\\n\\nSkip'"
|
||||
"'\\n\\n\\n\\n\\nAgent Simulations | 🦜️🔗 Langchain\\n\\n\\n\\n\\n\\nSkip t'"
|
||||
]
|
||||
},
|
||||
"execution_count": 9,
|
||||
@@ -195,9 +208,9 @@
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"{'source': 'https://js.langchain.com/docs/modules/indexes/document_loaders/examples/web_loaders/hn',\n",
|
||||
" 'title': 'Hacker News | \\uf8ffü¶úÔ∏è\\uf8ffüîó Langchain',\n",
|
||||
" 'description': 'This example goes over how to load data from the hacker news website, using Cheerio. One document will be created for each page.',\n",
|
||||
"{'source': 'https://js.langchain.com/docs/use_cases/agent_simulations/',\n",
|
||||
" 'title': 'Agent Simulations | 🦜️🔗 Langchain',\n",
|
||||
" 'description': 'Agent simulations involve taking multiple agents and having them interact with each other.',\n",
|
||||
" 'language': 'en'}"
|
||||
]
|
||||
},
|
||||
|
||||
@@ -0,0 +1,251 @@
|
||||
{
|
||||
"cells": [
|
||||
{
|
||||
"attachments": {},
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# Rockset\n",
|
||||
"\n",
|
||||
"> Rockset is a real-time analytics database which enables queries on massive, semi-structured data without operational burden. With Rockset, ingested data is queryable within one second and analytical queries against that data typically execute in milliseconds. Rockset is compute optimized, making it suitable for serving high concurrency applications in the sub-100TB range (or larger than 100s of TBs with rollups).\n",
|
||||
"\n",
|
||||
"This notebook demonstrates how to use Rockset as a document loader in langchain. To get started, make sure you have a Rockset account and an API key available.\n",
|
||||
"\n",
|
||||
"\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"attachments": {},
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Setting up the environment\n",
|
||||
"\n",
|
||||
"1. Go to the [Rockset console](https://console.rockset.com/apikeys) and get an API key. Find your API region from the [API reference](https://rockset.com/docs/rest-api/#introduction). For the purpose of this notebook, we will assume you're using Rockset from `Oregon(us-west-2)`.\n",
|
||||
"2. Set your the environment variable `ROCKSET_API_KEY`.\n",
|
||||
"3. Install the Rockset python client, which will be used by langchain to interact with the Rockset database."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {
|
||||
"vscode": {
|
||||
"languageId": "shellscript"
|
||||
}
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"$ pip3 install rockset"
|
||||
]
|
||||
},
|
||||
{
|
||||
"attachments": {},
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# Loading Documents\n",
|
||||
"The Rockset integration with LangChain allows you to load documents from Rockset collections with SQL queries. In order to do this you must construct a `RocksetLoader` object. Here is an example snippet that initializes a `RocksetLoader`."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langchain.document_loaders import RocksetLoader\n",
|
||||
"from rockset import RocksetClient, Regions, models\n",
|
||||
"\n",
|
||||
"loader = RocksetLoader(\n",
|
||||
" RocksetClient(Regions.usw2a1, \"<api key>\"),\n",
|
||||
" models.QueryRequestSql(query=\"SELECT * FROM langchain_demo LIMIT 3\"), # SQL query\n",
|
||||
" [\"text\"], # content columns\n",
|
||||
" metadata_keys=[\"id\", \"date\"], # metadata columns\n",
|
||||
")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"attachments": {},
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Here, you can see that the following query is run:\n",
|
||||
"\n",
|
||||
"```sql\n",
|
||||
"SELECT * FROM langchain_demo LIMIT 3\n",
|
||||
"```\n",
|
||||
"\n",
|
||||
"The `text` column in the collection is used as the page content, and the record's `id` and `date` columns are used as metadata (if you do not pass anything into `metadata_keys`, the whole Rockset document will be used as metadata). \n",
|
||||
"\n",
|
||||
"To execute the query and access an iterator over the resulting `Document`s, run:"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"loader.lazy_load()"
|
||||
]
|
||||
},
|
||||
{
|
||||
"attachments": {},
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"To execute the query and access all resulting `Document`s at once, run:"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"loader.load()"
|
||||
]
|
||||
},
|
||||
{
|
||||
"attachments": {},
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Here is an example response of `loader.load()`:\n",
|
||||
"```python\n",
|
||||
"[\n",
|
||||
" Document(\n",
|
||||
" page_content=\"Lorem ipsum dolor sit amet, consectetur adipiscing elit. Maecenas a libero porta, dictum ipsum eget, hendrerit neque. Morbi blandit, ex ut suscipit viverra, enim velit tincidunt tellus, a tempor velit nunc et ex. Proin hendrerit odio nec convallis lobortis. Aenean in purus dolor. Vestibulum orci orci, laoreet eget magna in, commodo euismod justo.\", \n",
|
||||
" metadata={\"id\": 83209, \"date\": \"2022-11-13T18:26:45.000000Z\"}\n",
|
||||
" ),\n",
|
||||
" Document(\n",
|
||||
" page_content=\"Integer at finibus odio. Nam sit amet enim cursus lacus gravida feugiat vestibulum sed libero. Aenean eleifend est quis elementum tincidunt. Curabitur sit amet ornare erat. Nulla id dolor ut magna volutpat sodales fringilla vel ipsum. Donec ultricies, lacus sed fermentum dignissim, lorem elit aliquam ligula, sed suscipit sapien purus nec ligula.\", \n",
|
||||
" metadata={\"id\": 89313, \"date\": \"2022-11-13T18:28:53.000000Z\"}\n",
|
||||
" ),\n",
|
||||
" Document(\n",
|
||||
" page_content=\"Morbi tortor enim, commodo id efficitur vitae, fringilla nec mi. Nullam molestie faucibus aliquet. Praesent a est facilisis, condimentum justo sit amet, viverra erat. Fusce volutpat nisi vel purus blandit, et facilisis felis accumsan. Phasellus luctus ligula ultrices tellus tempor hendrerit. Donec at ultricies leo.\", \n",
|
||||
" metadata={\"id\": 87732, \"date\": \"2022-11-13T18:49:04.000000Z\"}\n",
|
||||
" )\n",
|
||||
"]\n",
|
||||
"```"
|
||||
]
|
||||
},
|
||||
{
|
||||
"attachments": {},
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Using multiple columns as content\n",
|
||||
"\n",
|
||||
"You can choose to use multiple columns as content:"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langchain.document_loaders import RocksetLoader\n",
|
||||
"from rockset import RocksetClient, Regions, models\n",
|
||||
"\n",
|
||||
"loader = RocksetLoader(\n",
|
||||
" RocksetClient(Regions.usw2a1, \"<api key>\"),\n",
|
||||
" models.QueryRequestSql(query=\"SELECT * FROM langchain_demo LIMIT 1 WHERE id=38\"),\n",
|
||||
" [\"sentence1\", \"sentence2\"], # TWO content columns\n",
|
||||
")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"attachments": {},
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Assuming the \"sentence1\" field is `\"This is the first sentence.\"` and the \"sentence2\" field is `\"This is the second sentence.\"`, the `page_content` of the resulting `Document` would be:\n",
|
||||
"\n",
|
||||
"```\n",
|
||||
"This is the first sentence.\n",
|
||||
"This is the second sentence.\n",
|
||||
"```\n",
|
||||
"\n",
|
||||
"You can define you own function to join content columns by setting the `content_columns_joiner` argument in the `RocksetLoader` constructor. `content_columns_joiner` is a method that takes in a `List[Tuple[str, Any]]]` as an argument, representing a list of tuples of (column name, column value). By default, this is a method that joins each column value with a new line.\n",
|
||||
"\n",
|
||||
"For example, if you wanted to join sentence1 and sentence2 with a space instead of a new line, you could set `content_columns_joiner` like so:"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"RocksetLoader(\n",
|
||||
" RocksetClient(Regions.usw2a1, \"<api key>\"),\n",
|
||||
" models.QueryRequestSql(query=\"SELECT * FROM langchain_demo LIMIT 1 WHERE id=38\"),\n",
|
||||
" [\"sentence1\", \"sentence2\"],\n",
|
||||
" content_columns_joiner=lambda docs: \" \".join(\n",
|
||||
" [doc[1] for doc in docs]\n",
|
||||
" ), # join with space instead of /n\n",
|
||||
")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"attachments": {},
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"The `page_content` of the resulting `Document` would be:\n",
|
||||
"\n",
|
||||
"```\n",
|
||||
"This is the first sentence. This is the second sentence.\n",
|
||||
"```\n",
|
||||
"\n",
|
||||
"Oftentimes you want to include the column name in the `page_content`. You can do that like this:"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"RocksetLoader(\n",
|
||||
" RocksetClient(Regions.usw2a1, \"<api key>\"),\n",
|
||||
" models.QueryRequestSql(query=\"SELECT * FROM langchain_demo LIMIT 1 WHERE id=38\"),\n",
|
||||
" [\"sentence1\", \"sentence2\"],\n",
|
||||
" content_columns_joiner=lambda docs: \"\\n\".join(\n",
|
||||
" [f\"{doc[0]}: {doc[1]}\" for doc in docs]\n",
|
||||
" ),\n",
|
||||
")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"attachments": {},
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"This would result in the following `page_content`:\n",
|
||||
"\n",
|
||||
"```\n",
|
||||
"sentence1: This is the first sentence.\n",
|
||||
"sentence2: This is the second sentence.\n",
|
||||
"```"
|
||||
]
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
"kernelspec": {
|
||||
"display_name": "env",
|
||||
"language": "python",
|
||||
"name": "python3"
|
||||
},
|
||||
"language_info": {
|
||||
"name": "python",
|
||||
"version": "3.11.4"
|
||||
},
|
||||
"orig_nbformat": 4
|
||||
},
|
||||
"nbformat": 4,
|
||||
"nbformat_minor": 2
|
||||
}
|
||||
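For reference, the default joiner described in the Rockset notebook above ("joins each column value with a new line") behaves roughly like this sketch, inferred from the prose rather than copied from the library:

```python
from typing import Any, List, Tuple

def default_joiner(docs: List[Tuple[str, Any]]) -> str:
    # Join the value of each (column name, column value) tuple with a newline.
    return "\n".join(str(value) for _, value in docs)

print(default_joiner([
    ("sentence1", "This is the first sentence."),
    ("sentence2", "This is the second sentence."),
]))
```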
@@ -33,9 +33,7 @@
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"loader = UnstructuredRSTLoader(\n",
|
||||
" file_path=\"example_data/README.rst\", mode=\"elements\"\n",
|
||||
")\n",
|
||||
"loader = UnstructuredRSTLoader(file_path=\"example_data/README.rst\", mode=\"elements\")\n",
|
||||
"docs = loader.load()"
|
||||
]
|
||||
},
|
||||
|
||||
@@ -30,7 +30,8 @@
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"import warnings\n",
|
||||
"warnings.filterwarnings('ignore')\n",
|
||||
"\n",
|
||||
"warnings.filterwarnings(\"ignore\")\n",
|
||||
"from pprint import pprint\n",
|
||||
"from langchain.text_splitter import Language\n",
|
||||
"from langchain.document_loaders.generic import GenericLoader\n",
|
||||
@@ -48,7 +49,7 @@
|
||||
" \"./example_data/source_code\",\n",
|
||||
" glob=\"*\",\n",
|
||||
" suffixes=[\".py\", \".js\"],\n",
|
||||
" parser=LanguageParser()\n",
|
||||
" parser=LanguageParser(),\n",
|
||||
")\n",
|
||||
"docs = loader.load()"
|
||||
]
|
||||
@@ -200,7 +201,7 @@
|
||||
" \"./example_data/source_code\",\n",
|
||||
" glob=\"*\",\n",
|
||||
" suffixes=[\".py\"],\n",
|
||||
" parser=LanguageParser(language=Language.PYTHON, parser_threshold=1000)\n",
|
||||
" parser=LanguageParser(language=Language.PYTHON, parser_threshold=1000),\n",
|
||||
")\n",
|
||||
"docs = loader.load()"
|
||||
]
|
||||
@@ -281,7 +282,7 @@
|
||||
" \"./example_data/source_code\",\n",
|
||||
" glob=\"*\",\n",
|
||||
" suffixes=[\".js\"],\n",
|
||||
" parser=LanguageParser(language=Language.JS)\n",
|
||||
" parser=LanguageParser(language=Language.JS),\n",
|
||||
")\n",
|
||||
"docs = loader.load()"
|
||||
]
|
||||
|
||||
@@ -43,10 +43,10 @@
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"conf = CosConfig(\n",
|
||||
" Region=\"your cos region\",\n",
|
||||
" SecretId=\"your cos secret_id\",\n",
|
||||
" SecretKey=\"your cos secret_key\",\n",
|
||||
" )\n",
|
||||
" Region=\"your cos region\",\n",
|
||||
" SecretId=\"your cos secret_id\",\n",
|
||||
" SecretKey=\"your cos secret_key\",\n",
|
||||
")\n",
|
||||
"loader = TencentCOSDirectoryLoader(conf=conf, bucket=\"you_cos_bucket\")"
|
||||
]
|
||||
},
|
||||
|
||||
@@ -43,10 +43,10 @@
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"conf = CosConfig(\n",
|
||||
" Region=\"your cos region\",\n",
|
||||
" SecretId=\"your cos secret_id\",\n",
|
||||
" SecretKey=\"your cos secret_key\",\n",
|
||||
" )\n",
|
||||
" Region=\"your cos region\",\n",
|
||||
" SecretId=\"your cos secret_id\",\n",
|
||||
" SecretKey=\"your cos secret_key\",\n",
|
||||
")\n",
|
||||
"loader = TencentCOSFileLoader(conf=conf, bucket=\"you_cos_bucket\", key=\"fake.docx\")"
|
||||
]
|
||||
},
|
||||
|
||||
@@ -0,0 +1,181 @@
|
||||
{
|
||||
"cells": [
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# TSV\n",
|
||||
"\n",
|
||||
">A [tab-separated values (TSV)](https://en.wikipedia.org/wiki/Tab-separated_values) file is a simple, text-based file format for storing tabular data.[3] Records are separated by newlines, and values within a record are separated by tab characters."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## `UnstructuredTSVLoader`\n",
|
||||
"\n",
|
||||
"You can also load the table using the `UnstructuredTSVLoader`. One advantage of using `UnstructuredTSVLoader` is that if you use it in `\"elements\"` mode, an HTML representation of the table will be available in the metadata."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 1,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langchain.document_loaders.tsv import UnstructuredTSVLoader"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 2,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"loader = UnstructuredTSVLoader(\n",
|
||||
" file_path=\"example_data/mlb_teams_2012.csv\", mode=\"elements\"\n",
|
||||
")\n",
|
||||
"docs = loader.load()"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 3,
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"<table border=\"1\" class=\"dataframe\">\n",
|
||||
" <tbody>\n",
|
||||
" <tr>\n",
|
||||
" <td>Nationals, 81.34, 98</td>\n",
|
||||
" </tr>\n",
|
||||
" <tr>\n",
|
||||
" <td>Reds, 82.20, 97</td>\n",
|
||||
" </tr>\n",
|
||||
" <tr>\n",
|
||||
" <td>Yankees, 197.96, 95</td>\n",
|
||||
" </tr>\n",
|
||||
" <tr>\n",
|
||||
" <td>Giants, 117.62, 94</td>\n",
|
||||
" </tr>\n",
|
||||
" <tr>\n",
|
||||
" <td>Braves, 83.31, 94</td>\n",
|
||||
" </tr>\n",
|
||||
" <tr>\n",
|
||||
" <td>Athletics, 55.37, 94</td>\n",
|
||||
" </tr>\n",
|
||||
" <tr>\n",
|
||||
" <td>Rangers, 120.51, 93</td>\n",
|
||||
" </tr>\n",
|
||||
" <tr>\n",
|
||||
" <td>Orioles, 81.43, 93</td>\n",
|
||||
" </tr>\n",
|
||||
" <tr>\n",
|
||||
" <td>Rays, 64.17, 90</td>\n",
|
||||
" </tr>\n",
|
||||
" <tr>\n",
|
||||
" <td>Angels, 154.49, 89</td>\n",
|
||||
" </tr>\n",
|
||||
" <tr>\n",
|
||||
" <td>Tigers, 132.30, 88</td>\n",
|
||||
" </tr>\n",
|
||||
" <tr>\n",
|
||||
" <td>Cardinals, 110.30, 88</td>\n",
|
||||
" </tr>\n",
|
||||
" <tr>\n",
|
||||
" <td>Dodgers, 95.14, 86</td>\n",
|
||||
" </tr>\n",
|
||||
" <tr>\n",
|
||||
" <td>White Sox, 96.92, 85</td>\n",
|
||||
" </tr>\n",
|
||||
" <tr>\n",
|
||||
" <td>Brewers, 97.65, 83</td>\n",
|
||||
" </tr>\n",
|
||||
" <tr>\n",
|
||||
" <td>Phillies, 174.54, 81</td>\n",
|
||||
" </tr>\n",
|
||||
" <tr>\n",
|
||||
" <td>Diamondbacks, 74.28, 81</td>\n",
|
||||
" </tr>\n",
|
||||
" <tr>\n",
|
||||
" <td>Pirates, 63.43, 79</td>\n",
|
||||
" </tr>\n",
|
||||
" <tr>\n",
|
||||
" <td>Padres, 55.24, 76</td>\n",
|
||||
" </tr>\n",
|
||||
" <tr>\n",
|
||||
" <td>Mariners, 81.97, 75</td>\n",
|
||||
" </tr>\n",
|
||||
" <tr>\n",
|
||||
" <td>Mets, 93.35, 74</td>\n",
|
||||
" </tr>\n",
|
||||
" <tr>\n",
|
||||
" <td>Blue Jays, 75.48, 73</td>\n",
|
||||
" </tr>\n",
|
||||
" <tr>\n",
|
||||
" <td>Royals, 60.91, 72</td>\n",
|
||||
" </tr>\n",
|
||||
" <tr>\n",
|
||||
" <td>Marlins, 118.07, 69</td>\n",
|
||||
" </tr>\n",
|
||||
" <tr>\n",
|
||||
" <td>Red Sox, 173.18, 69</td>\n",
|
||||
" </tr>\n",
|
||||
" <tr>\n",
|
||||
" <td>Indians, 78.43, 68</td>\n",
|
||||
" </tr>\n",
|
||||
" <tr>\n",
|
||||
" <td>Twins, 94.08, 66</td>\n",
|
||||
" </tr>\n",
|
||||
" <tr>\n",
|
||||
" <td>Rockies, 78.06, 64</td>\n",
|
||||
" </tr>\n",
|
||||
" <tr>\n",
|
||||
" <td>Cubs, 88.19, 61</td>\n",
|
||||
" </tr>\n",
|
||||
" <tr>\n",
|
||||
" <td>Astros, 60.65, 55</td>\n",
|
||||
" </tr>\n",
|
||||
" </tbody>\n",
|
||||
"</table>\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"print(docs[0].metadata[\"text_as_html\"])"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": []
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
"kernelspec": {
|
||||
"display_name": "Python 3 (ipykernel)",
|
||||
"language": "python",
|
||||
"name": "python3"
|
||||
},
|
||||
"language_info": {
|
||||
"codemirror_mode": {
|
||||
"name": "ipython",
|
||||
"version": 3
|
||||
},
|
||||
"file_extension": ".py",
|
||||
"mimetype": "text/x-python",
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.8.13"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
"nbformat_minor": 4
|
||||
}
|
||||
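A TSV table can also be read row by row with the plain `CSVLoader` by overriding the delimiter, yielding one `Document` per record instead of one per table. A sketch (the file path follows the TSV notebook above):

```python
from langchain.document_loaders.csv_loader import CSVLoader

# Sketch: csv_args is forwarded to Python's csv.DictReader.
loader = CSVLoader(
    file_path="example_data/mlb_teams_2012.csv",
    csv_args={"delimiter": "\t"},
)
docs = loader.load()
```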
@@ -233,7 +233,8 @@
|
||||
],
|
||||
"metadata": {
|
||||
"collapsed": false
|
||||
}
|
||||
},
|
||||
"id": "672264ad"
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
@@ -241,16 +242,18 @@
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"loader = WebBaseLoader(\n",
|
||||
" \"https://www.walmart.com/search?q=parrots\", proxies={\n",
|
||||
" \"https://www.walmart.com/search?q=parrots\",\n",
|
||||
" proxies={\n",
|
||||
" \"http\": \"http://{username}:{password}:@proxy.service.com:6666/\",\n",
|
||||
" \"https\": \"https://{username}:{password}:@proxy.service.com:6666/\"\n",
|
||||
" }\n",
|
||||
" \"https\": \"https://{username}:{password}:@proxy.service.com:6666/\",\n",
|
||||
" },\n",
|
||||
")\n",
|
||||
"docs = loader.load()\n"
|
||||
"docs = loader.load()"
|
||||
],
|
||||
"metadata": {
|
||||
"collapsed": false
|
||||
}
|
||||
},
|
||||
"id": "9caf0310"
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
@@ -274,4 +277,4 @@
|
||||
},
|
||||
"nbformat": 4,
|
||||
"nbformat_minor": 5
|
||||
}
|
||||
}
|
||||
@@ -0,0 +1,304 @@
|
||||
{
|
||||
"cells": [
|
||||
{
|
||||
"attachments": {},
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# Xorbits Pandas DataFrame\n",
|
||||
"\n",
|
||||
"This notebook goes over how to load data from a [xorbits.pandas](https://doc.xorbits.io/en/latest/reference/pandas/frame.html) DataFrame."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 1,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"#!pip install xorbits"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 2,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"import xorbits.pandas as pd"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 3,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"df = pd.read_csv(\"example_data/mlb_teams_2012.csv\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 9,
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"application/vnd.jupyter.widget-view+json": {
|
||||
"model_id": "b0d1d84e23c04f1296f63b3ea3dd1e5b",
|
||||
"version_major": 2,
|
||||
"version_minor": 0
|
||||
},
|
||||
"text/plain": [
|
||||
" 0%| | 0.00/100 [00:00<?, ?it/s]"
|
||||
]
|
||||
},
|
||||
"metadata": {},
|
||||
"output_type": "display_data"
|
||||
},
|
||||
{
|
||||
"data": {
|
||||
"text/html": [
|
||||
"<div>\n",
|
||||
"<style scoped>\n",
|
||||
" .dataframe tbody tr th:only-of-type {\n",
|
||||
" vertical-align: middle;\n",
|
||||
" }\n",
|
||||
"\n",
|
||||
" .dataframe tbody tr th {\n",
|
||||
" vertical-align: top;\n",
|
||||
" }\n",
|
||||
"\n",
|
||||
" .dataframe thead th {\n",
|
||||
" text-align: right;\n",
|
||||
" }\n",
|
||||
"</style>\n",
|
||||
"<table border=\"1\" class=\"dataframe\">\n",
|
||||
" <thead>\n",
|
||||
" <tr style=\"text-align: right;\">\n",
|
||||
" <th></th>\n",
|
||||
" <th>Team</th>\n",
|
||||
" <th>\"Payroll (millions)\"</th>\n",
|
||||
" <th>\"Wins\"</th>\n",
|
||||
" </tr>\n",
|
||||
" </thead>\n",
|
||||
" <tbody>\n",
|
||||
" <tr>\n",
|
||||
" <th>0</th>\n",
|
||||
" <td>Nationals</td>\n",
|
||||
" <td>81.34</td>\n",
|
||||
" <td>98</td>\n",
|
||||
" </tr>\n",
|
||||
" <tr>\n",
|
||||
" <th>1</th>\n",
|
||||
" <td>Reds</td>\n",
|
||||
" <td>82.20</td>\n",
|
||||
" <td>97</td>\n",
|
||||
" </tr>\n",
|
||||
" <tr>\n",
|
||||
" <th>2</th>\n",
|
||||
" <td>Yankees</td>\n",
|
||||
" <td>197.96</td>\n",
|
||||
" <td>95</td>\n",
|
||||
" </tr>\n",
|
||||
" <tr>\n",
|
||||
" <th>3</th>\n",
|
||||
" <td>Giants</td>\n",
|
||||
" <td>117.62</td>\n",
|
||||
" <td>94</td>\n",
|
||||
" </tr>\n",
|
||||
" <tr>\n",
|
||||
" <th>4</th>\n",
|
||||
" <td>Braves</td>\n",
|
||||
" <td>83.31</td>\n",
|
||||
" <td>94</td>\n",
|
||||
" </tr>\n",
|
||||
" </tbody>\n",
|
||||
"</table>\n",
|
||||
"</div>"
|
||||
],
|
||||
"text/plain": [
|
||||
" Team \"Payroll (millions)\" \"Wins\"\n",
|
||||
"0 Nationals 81.34 98\n",
|
||||
"1 Reds 82.20 97\n",
|
||||
"2 Yankees 197.96 95\n",
|
||||
"3 Giants 117.62 94\n",
|
||||
"4 Braves 83.31 94"
|
||||
]
|
||||
},
|
||||
"execution_count": 9,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"df.head()"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 5,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langchain.document_loaders import XorbitsLoader"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 6,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"loader = XorbitsLoader(df, page_content_column=\"Team\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 7,
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"application/vnd.jupyter.widget-view+json": {
|
||||
"model_id": "c8c8b67f1aae4a3c9de7734bb6cf738e",
|
||||
"version_major": 2,
|
||||
"version_minor": 0
|
||||
},
|
||||
"text/plain": [
|
||||
" 0%| | 0.00/100 [00:00<?, ?it/s]"
|
||||
]
|
||||
},
|
||||
"metadata": {},
|
||||
"output_type": "display_data"
|
||||
},
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"[Document(page_content='Nationals', metadata={' \"Payroll (millions)\"': 81.34, ' \"Wins\"': 98}),\n",
|
||||
" Document(page_content='Reds', metadata={' \"Payroll (millions)\"': 82.2, ' \"Wins\"': 97}),\n",
|
||||
" Document(page_content='Yankees', metadata={' \"Payroll (millions)\"': 197.96, ' \"Wins\"': 95}),\n",
|
||||
" Document(page_content='Giants', metadata={' \"Payroll (millions)\"': 117.62, ' \"Wins\"': 94}),\n",
|
||||
" Document(page_content='Braves', metadata={' \"Payroll (millions)\"': 83.31, ' \"Wins\"': 94}),\n",
|
||||
" Document(page_content='Athletics', metadata={' \"Payroll (millions)\"': 55.37, ' \"Wins\"': 94}),\n",
|
||||
" Document(page_content='Rangers', metadata={' \"Payroll (millions)\"': 120.51, ' \"Wins\"': 93}),\n",
|
||||
" Document(page_content='Orioles', metadata={' \"Payroll (millions)\"': 81.43, ' \"Wins\"': 93}),\n",
|
||||
" Document(page_content='Rays', metadata={' \"Payroll (millions)\"': 64.17, ' \"Wins\"': 90}),\n",
|
||||
" Document(page_content='Angels', metadata={' \"Payroll (millions)\"': 154.49, ' \"Wins\"': 89}),\n",
|
||||
" Document(page_content='Tigers', metadata={' \"Payroll (millions)\"': 132.3, ' \"Wins\"': 88}),\n",
|
||||
" Document(page_content='Cardinals', metadata={' \"Payroll (millions)\"': 110.3, ' \"Wins\"': 88}),\n",
|
||||
" Document(page_content='Dodgers', metadata={' \"Payroll (millions)\"': 95.14, ' \"Wins\"': 86}),\n",
|
||||
" Document(page_content='White Sox', metadata={' \"Payroll (millions)\"': 96.92, ' \"Wins\"': 85}),\n",
|
||||
" Document(page_content='Brewers', metadata={' \"Payroll (millions)\"': 97.65, ' \"Wins\"': 83}),\n",
|
||||
" Document(page_content='Phillies', metadata={' \"Payroll (millions)\"': 174.54, ' \"Wins\"': 81}),\n",
|
||||
" Document(page_content='Diamondbacks', metadata={' \"Payroll (millions)\"': 74.28, ' \"Wins\"': 81}),\n",
|
||||
" Document(page_content='Pirates', metadata={' \"Payroll (millions)\"': 63.43, ' \"Wins\"': 79}),\n",
|
||||
" Document(page_content='Padres', metadata={' \"Payroll (millions)\"': 55.24, ' \"Wins\"': 76}),\n",
|
||||
" Document(page_content='Mariners', metadata={' \"Payroll (millions)\"': 81.97, ' \"Wins\"': 75}),\n",
|
||||
" Document(page_content='Mets', metadata={' \"Payroll (millions)\"': 93.35, ' \"Wins\"': 74}),\n",
|
||||
" Document(page_content='Blue Jays', metadata={' \"Payroll (millions)\"': 75.48, ' \"Wins\"': 73}),\n",
|
||||
" Document(page_content='Royals', metadata={' \"Payroll (millions)\"': 60.91, ' \"Wins\"': 72}),\n",
|
||||
" Document(page_content='Marlins', metadata={' \"Payroll (millions)\"': 118.07, ' \"Wins\"': 69}),\n",
|
||||
" Document(page_content='Red Sox', metadata={' \"Payroll (millions)\"': 173.18, ' \"Wins\"': 69}),\n",
|
||||
" Document(page_content='Indians', metadata={' \"Payroll (millions)\"': 78.43, ' \"Wins\"': 68}),\n",
|
||||
" Document(page_content='Twins', metadata={' \"Payroll (millions)\"': 94.08, ' \"Wins\"': 66}),\n",
|
||||
" Document(page_content='Rockies', metadata={' \"Payroll (millions)\"': 78.06, ' \"Wins\"': 64}),\n",
|
||||
" Document(page_content='Cubs', metadata={' \"Payroll (millions)\"': 88.19, ' \"Wins\"': 61}),\n",
|
||||
" Document(page_content='Astros', metadata={' \"Payroll (millions)\"': 60.65, ' \"Wins\"': 55})]"
|
||||
]
|
||||
},
|
||||
"execution_count": 7,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"loader.load()"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 8,
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"application/vnd.jupyter.widget-view+json": {
|
||||
"model_id": "fc85c9f59b3644689d05853159fbd358",
|
||||
"version_major": 2,
|
||||
"version_minor": 0
|
||||
},
|
||||
"text/plain": [
|
||||
" 0%| | 0.00/100 [00:00<?, ?it/s]"
|
||||
]
|
||||
},
|
||||
"metadata": {},
|
||||
"output_type": "display_data"
|
||||
},
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"page_content='Nationals' metadata={' \"Payroll (millions)\"': 81.34, ' \"Wins\"': 98}\n",
|
||||
"page_content='Reds' metadata={' \"Payroll (millions)\"': 82.2, ' \"Wins\"': 97}\n",
|
||||
"page_content='Yankees' metadata={' \"Payroll (millions)\"': 197.96, ' \"Wins\"': 95}\n",
|
||||
"page_content='Giants' metadata={' \"Payroll (millions)\"': 117.62, ' \"Wins\"': 94}\n",
|
||||
"page_content='Braves' metadata={' \"Payroll (millions)\"': 83.31, ' \"Wins\"': 94}\n",
|
||||
"page_content='Athletics' metadata={' \"Payroll (millions)\"': 55.37, ' \"Wins\"': 94}\n",
|
||||
"page_content='Rangers' metadata={' \"Payroll (millions)\"': 120.51, ' \"Wins\"': 93}\n",
|
||||
"page_content='Orioles' metadata={' \"Payroll (millions)\"': 81.43, ' \"Wins\"': 93}\n",
|
||||
"page_content='Rays' metadata={' \"Payroll (millions)\"': 64.17, ' \"Wins\"': 90}\n",
|
||||
"page_content='Angels' metadata={' \"Payroll (millions)\"': 154.49, ' \"Wins\"': 89}\n",
|
||||
"page_content='Tigers' metadata={' \"Payroll (millions)\"': 132.3, ' \"Wins\"': 88}\n",
|
||||
"page_content='Cardinals' metadata={' \"Payroll (millions)\"': 110.3, ' \"Wins\"': 88}\n",
|
||||
"page_content='Dodgers' metadata={' \"Payroll (millions)\"': 95.14, ' \"Wins\"': 86}\n",
|
||||
"page_content='White Sox' metadata={' \"Payroll (millions)\"': 96.92, ' \"Wins\"': 85}\n",
|
||||
"page_content='Brewers' metadata={' \"Payroll (millions)\"': 97.65, ' \"Wins\"': 83}\n",
|
||||
"page_content='Phillies' metadata={' \"Payroll (millions)\"': 174.54, ' \"Wins\"': 81}\n",
|
||||
"page_content='Diamondbacks' metadata={' \"Payroll (millions)\"': 74.28, ' \"Wins\"': 81}\n",
|
||||
"page_content='Pirates' metadata={' \"Payroll (millions)\"': 63.43, ' \"Wins\"': 79}\n",
|
||||
"page_content='Padres' metadata={' \"Payroll (millions)\"': 55.24, ' \"Wins\"': 76}\n",
|
||||
"page_content='Mariners' metadata={' \"Payroll (millions)\"': 81.97, ' \"Wins\"': 75}\n",
|
||||
"page_content='Mets' metadata={' \"Payroll (millions)\"': 93.35, ' \"Wins\"': 74}\n",
|
||||
"page_content='Blue Jays' metadata={' \"Payroll (millions)\"': 75.48, ' \"Wins\"': 73}\n",
|
||||
"page_content='Royals' metadata={' \"Payroll (millions)\"': 60.91, ' \"Wins\"': 72}\n",
|
||||
"page_content='Marlins' metadata={' \"Payroll (millions)\"': 118.07, ' \"Wins\"': 69}\n",
|
||||
"page_content='Red Sox' metadata={' \"Payroll (millions)\"': 173.18, ' \"Wins\"': 69}\n",
|
||||
"page_content='Indians' metadata={' \"Payroll (millions)\"': 78.43, ' \"Wins\"': 68}\n",
|
||||
"page_content='Twins' metadata={' \"Payroll (millions)\"': 94.08, ' \"Wins\"': 66}\n",
|
||||
"page_content='Rockies' metadata={' \"Payroll (millions)\"': 78.06, ' \"Wins\"': 64}\n",
|
||||
"page_content='Cubs' metadata={' \"Payroll (millions)\"': 88.19, ' \"Wins\"': 61}\n",
|
||||
"page_content='Astros' metadata={' \"Payroll (millions)\"': 60.65, ' \"Wins\"': 55}\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"# Use lazy load for larger table, which won't read the full table into memory\n",
|
||||
"for i in loader.lazy_load():\n",
|
||||
" print(i)"
|
||||
]
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
"kernelspec": {
|
||||
"display_name": "base",
|
||||
"language": "python",
|
||||
"name": "python3"
|
||||
},
|
||||
"language_info": {
|
||||
"codemirror_mode": {
|
||||
"name": "ipython",
|
||||
"version": 3
|
||||
},
|
||||
"file_extension": ".py",
|
||||
"mimetype": "text/x-python",
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.9.13"
|
||||
},
|
||||
"orig_nbformat": 4
|
||||
},
|
||||
"nbformat": 4,
|
||||
"nbformat_minor": 2
|
||||
}
|
||||
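The Xorbits loader mirrors the in-memory pandas `DataFrameLoader`; for a plain pandas DataFrame the equivalent usage would look like this sketch:

```python
import pandas as pd
from langchain.document_loaders import DataFrameLoader

# Sketch: the pandas equivalent of the XorbitsLoader usage above.
df = pd.read_csv("example_data/mlb_teams_2012.csv")
loader = DataFrameLoader(df, page_content_column="Team")
docs = loader.load()
```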
@@ -0,0 +1 @@
|
||||
label: 'Integrations'
@@ -0,0 +1,269 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Doctran Extract Properties\n",
"\n",
"We can extract useful features of documents using the [Doctran](https://github.com/psychic-api/doctran) library, which uses OpenAI's function calling feature to extract specific metadata.\n",
"\n",
"Extracting metadata from documents is helpful for a variety of tasks, including:\n",
"* Classification: classifying documents into different categories\n",
"* Data mining: extracting structured data that can be used for data analysis\n",
"* Style transfer: changing the way text is written to more closely match expected user input, improving vector search results"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"! pip install doctran"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {
"scrolled": false
},
"outputs": [],
"source": [
"import json\n",
"from langchain.schema import Document\n",
"from langchain.document_transformers import DoctranPropertyExtractor"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"True"
]
},
"execution_count": 2,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"from dotenv import load_dotenv\n",
"\n",
"load_dotenv()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Input\n",
"This is the document we'll extract properties from."
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"[Generated with ChatGPT]\n",
"\n",
"Confidential Document - For Internal Use Only\n",
"\n",
"Date: July 1, 2023\n",
"\n",
"Subject: Updates and Discussions on Various Topics\n",
"\n",
"Dear Team,\n",
"\n",
"I hope this email finds you well. In this document, I would like to provide you with some important updates and discuss various topics that require our attention. Please treat the information contained herein as highly confidential.\n",
"\n",
"Security and Privacy Measures\n",
"As part of our ongoing commitment to ensure the security and privacy of our customers' data, we have implemented robust measures across all our systems. We would like to commend John Doe (email: john.doe@example.com) from the IT department for his diligent work in enhancing our network security. Moving forward, we kindly remind everyone to strictly adhere to our data protection policies and guidelines. Additionally, if you come across any potential security risks or incidents, please report them immediately to our dedicated team at security@example.com.\n",
"\n",
"HR Updates and Employee Benefits\n",
"Recently, we welcomed several new team members who have made significant contributions to their respective departments. I would like to recognize Jane Smith (SSN: 049-45-5928) for her outstanding performance in customer service. Jane has consistently received positive feedback from our clients. Furthermore, please remember that the open enrollment period for our employee benefits program is fast approaching. Should you have any questions or require assistance, please contact our HR representative, Michael Johnson (phone: 418-492-3850, email: michael.johnson@example.com).\n",
"\n",
"Marketing Initiatives and Campaigns\n",
"Our marketing team has been actively working on developing new strategies to increase brand awareness and drive customer engagement. We would like to thank Sarah Thompson (phone: 415-555-1234) for her exceptional efforts in managing our social media platforms. Sarah has successfully increased our follower base by 20% in the past month alone. Moreover, please mark your calendars for the upcoming product launch event on July 15th. We encourage all team members to attend and support this exciting milestone for our company.\n",
"\n",
"Research and Development Projects\n",
"In our pursuit of innovation, our research and development department has been working tirelessly on various projects. I would like to acknowledge the exceptional work of David Rodriguez (email: david.rodriguez@example.com) in his role as project lead. David's contributions to the development of our cutting-edge technology have been instrumental. Furthermore, we would like to remind everyone to share their ideas and suggestions for potential new projects during our monthly R&D brainstorming session, scheduled for July 10th.\n",
"\n",
"Please treat the information in this document with utmost confidentiality and ensure that it is not shared with unauthorized individuals. If you have any questions or concerns regarding the topics discussed, please do not hesitate to reach out to me directly.\n",
"\n",
"Thank you for your attention, and let's continue to work together to achieve our goals.\n",
"\n",
"Best regards,\n",
"\n",
"Jason Fan\n",
"Cofounder & CEO\n",
"Psychic\n",
"jason@psychic.dev\n",
"\n"
]
}
],
"source": [
"sample_text = \"\"\"[Generated with ChatGPT]\n",
"\n",
"Confidential Document - For Internal Use Only\n",
"\n",
"Date: July 1, 2023\n",
"\n",
"Subject: Updates and Discussions on Various Topics\n",
"\n",
"Dear Team,\n",
"\n",
"I hope this email finds you well. In this document, I would like to provide you with some important updates and discuss various topics that require our attention. Please treat the information contained herein as highly confidential.\n",
"\n",
"Security and Privacy Measures\n",
"As part of our ongoing commitment to ensure the security and privacy of our customers' data, we have implemented robust measures across all our systems. We would like to commend John Doe (email: john.doe@example.com) from the IT department for his diligent work in enhancing our network security. Moving forward, we kindly remind everyone to strictly adhere to our data protection policies and guidelines. Additionally, if you come across any potential security risks or incidents, please report them immediately to our dedicated team at security@example.com.\n",
"\n",
"HR Updates and Employee Benefits\n",
"Recently, we welcomed several new team members who have made significant contributions to their respective departments. I would like to recognize Jane Smith (SSN: 049-45-5928) for her outstanding performance in customer service. Jane has consistently received positive feedback from our clients. Furthermore, please remember that the open enrollment period for our employee benefits program is fast approaching. Should you have any questions or require assistance, please contact our HR representative, Michael Johnson (phone: 418-492-3850, email: michael.johnson@example.com).\n",
"\n",
"Marketing Initiatives and Campaigns\n",
"Our marketing team has been actively working on developing new strategies to increase brand awareness and drive customer engagement. We would like to thank Sarah Thompson (phone: 415-555-1234) for her exceptional efforts in managing our social media platforms. Sarah has successfully increased our follower base by 20% in the past month alone. Moreover, please mark your calendars for the upcoming product launch event on July 15th. We encourage all team members to attend and support this exciting milestone for our company.\n",
"\n",
"Research and Development Projects\n",
"In our pursuit of innovation, our research and development department has been working tirelessly on various projects. I would like to acknowledge the exceptional work of David Rodriguez (email: david.rodriguez@example.com) in his role as project lead. David's contributions to the development of our cutting-edge technology have been instrumental. Furthermore, we would like to remind everyone to share their ideas and suggestions for potential new projects during our monthly R&D brainstorming session, scheduled for July 10th.\n",
"\n",
"Please treat the information in this document with utmost confidentiality and ensure that it is not shared with unauthorized individuals. If you have any questions or concerns regarding the topics discussed, please do not hesitate to reach out to me directly.\n",
"\n",
"Thank you for your attention, and let's continue to work together to achieve our goals.\n",
"\n",
"Best regards,\n",
"\n",
"Jason Fan\n",
"Cofounder & CEO\n",
"Psychic\n",
"jason@psychic.dev\n",
"\"\"\"\n",
"print(sample_text)"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [],
"source": [
"documents = [Document(page_content=sample_text)]\n",
"properties = [\n",
"    {\n",
"        \"name\": \"category\",\n",
"        \"description\": \"What type of email this is.\",\n",
"        \"type\": \"string\",\n",
"        \"enum\": [\"update\", \"action_item\", \"customer_feedback\", \"announcement\", \"other\"],\n",
"        \"required\": True,\n",
"    },\n",
"    {\n",
"        \"name\": \"mentions\",\n",
"        \"description\": \"A list of all people mentioned in this email.\",\n",
"        \"type\": \"array\",\n",
"        \"items\": {\n",
"            \"name\": \"full_name\",\n",
"            \"description\": \"The full name of the person mentioned.\",\n",
"            \"type\": \"string\",\n",
"        },\n",
"        \"required\": True,\n",
"    },\n",
"    {\n",
"        \"name\": \"eli5\",\n",
"        \"description\": \"Explain this email to me like I'm 5 years old.\",\n",
"        \"type\": \"string\",\n",
"        \"required\": True,\n",
"    },\n",
"]\n",
"property_extractor = DoctranPropertyExtractor(properties=properties)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Output\n",
"After extracting properties from a document, the result will be returned as a new document with the properties provided in the metadata."
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [],
"source": [
"extracted_document = await property_extractor.atransform_documents(\n",
"    documents, properties=properties\n",
")"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"{\n",
"  \"extracted_properties\": {\n",
"    \"category\": \"update\",\n",
"    \"mentions\": [\n",
"      \"John Doe\",\n",
"      \"Jane Smith\",\n",
"      \"Michael Johnson\",\n",
"      \"Sarah Thompson\",\n",
"      \"David Rodriguez\",\n",
"      \"Jason Fan\"\n",
"    ],\n",
"    \"eli5\": \"This is an email from the CEO, Jason Fan, giving updates about different areas in the company. He talks about new security measures and praises John Doe for his work. He also mentions new hires and praises Jane Smith for her work in customer service. The CEO reminds everyone about the upcoming benefits enrollment and says to contact Michael Johnson with any questions. He talks about the marketing team's work and praises Sarah Thompson for increasing their social media followers. There's also a product launch event on July 15th. Lastly, he talks about the research and development projects and praises David Rodriguez for his work. There's a brainstorming session on July 10th.\"\n",
"  }\n",
"}\n"
]
}
],
"source": [
"print(json.dumps(extracted_document[0].metadata, indent=2))"
]
},
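{
"cell_type": "markdown",
"metadata": {},
"source": [
"The extracted fields land under the `extracted_properties` key, as shown above. A minimal sketch (assuming that layout) of flattening them into top-level metadata, so that a vector store could later filter on fields such as `category`:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Sketch: copy the extracted properties into top-level metadata for filtering\n",
"for doc in extracted_document:\n",
"    doc.metadata.update(doc.metadata.get(\"extracted_properties\", {}))\n",
"\n",
"# Documents can then be filtered before indexing, e.g. by category\n",
"updates = [d for d in extracted_document if d.metadata.get(\"category\") == \"update\"]\n",
"print(len(updates))"
]
},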
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.3"
}
},
"nbformat": 4,
"nbformat_minor": 2
}
@@ -0,0 +1,266 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Doctran Interrogate Documents\n",
"Documents used in a vector store knowledge base are typically stored in narrative or conversational format. However, most user queries are in question format. If we convert documents into Q&A format before vectorizing them, we can increase the likelihood of retrieving relevant documents and decrease the likelihood of retrieving irrelevant ones.\n",
"\n",
"We can accomplish this using the [Doctran](https://github.com/psychic-api/doctran) library, which uses OpenAI's function calling feature to \"interrogate\" documents.\n",
"\n",
"See [this notebook](https://github.com/psychic-api/doctran/blob/main/benchmark.ipynb) for benchmarks on vector similarity scores for various queries based on raw documents versus interrogated documents."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"! pip install doctran"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {
"scrolled": false
},
"outputs": [],
"source": [
"import json\n",
"from langchain.schema import Document\n",
"from langchain.document_transformers import DoctranQATransformer"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"True"
]
},
"execution_count": 2,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"from dotenv import load_dotenv\n",
"\n",
"load_dotenv()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Input\n",
"This is the document we'll interrogate."
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"[Generated with ChatGPT]\n",
"\n",
"Confidential Document - For Internal Use Only\n",
"\n",
"Date: July 1, 2023\n",
"\n",
"Subject: Updates and Discussions on Various Topics\n",
"\n",
"Dear Team,\n",
"\n",
"I hope this email finds you well. In this document, I would like to provide you with some important updates and discuss various topics that require our attention. Please treat the information contained herein as highly confidential.\n",
"\n",
"Security and Privacy Measures\n",
"As part of our ongoing commitment to ensure the security and privacy of our customers' data, we have implemented robust measures across all our systems. We would like to commend John Doe (email: john.doe@example.com) from the IT department for his diligent work in enhancing our network security. Moving forward, we kindly remind everyone to strictly adhere to our data protection policies and guidelines. Additionally, if you come across any potential security risks or incidents, please report them immediately to our dedicated team at security@example.com.\n",
"\n",
"HR Updates and Employee Benefits\n",
"Recently, we welcomed several new team members who have made significant contributions to their respective departments. I would like to recognize Jane Smith (SSN: 049-45-5928) for her outstanding performance in customer service. Jane has consistently received positive feedback from our clients. Furthermore, please remember that the open enrollment period for our employee benefits program is fast approaching. Should you have any questions or require assistance, please contact our HR representative, Michael Johnson (phone: 418-492-3850, email: michael.johnson@example.com).\n",
"\n",
"Marketing Initiatives and Campaigns\n",
"Our marketing team has been actively working on developing new strategies to increase brand awareness and drive customer engagement. We would like to thank Sarah Thompson (phone: 415-555-1234) for her exceptional efforts in managing our social media platforms. Sarah has successfully increased our follower base by 20% in the past month alone. Moreover, please mark your calendars for the upcoming product launch event on July 15th. We encourage all team members to attend and support this exciting milestone for our company.\n",
"\n",
"Research and Development Projects\n",
"In our pursuit of innovation, our research and development department has been working tirelessly on various projects. I would like to acknowledge the exceptional work of David Rodriguez (email: david.rodriguez@example.com) in his role as project lead. David's contributions to the development of our cutting-edge technology have been instrumental. Furthermore, we would like to remind everyone to share their ideas and suggestions for potential new projects during our monthly R&D brainstorming session, scheduled for July 10th.\n",
"\n",
"Please treat the information in this document with utmost confidentiality and ensure that it is not shared with unauthorized individuals. If you have any questions or concerns regarding the topics discussed, please do not hesitate to reach out to me directly.\n",
"\n",
"Thank you for your attention, and let's continue to work together to achieve our goals.\n",
"\n",
"Best regards,\n",
"\n",
"Jason Fan\n",
"Cofounder & CEO\n",
"Psychic\n",
"jason@psychic.dev\n",
"\n"
]
}
],
"source": [
"sample_text = \"\"\"[Generated with ChatGPT]\n",
"\n",
"Confidential Document - For Internal Use Only\n",
"\n",
"Date: July 1, 2023\n",
"\n",
"Subject: Updates and Discussions on Various Topics\n",
"\n",
"Dear Team,\n",
"\n",
"I hope this email finds you well. In this document, I would like to provide you with some important updates and discuss various topics that require our attention. Please treat the information contained herein as highly confidential.\n",
"\n",
"Security and Privacy Measures\n",
"As part of our ongoing commitment to ensure the security and privacy of our customers' data, we have implemented robust measures across all our systems. We would like to commend John Doe (email: john.doe@example.com) from the IT department for his diligent work in enhancing our network security. Moving forward, we kindly remind everyone to strictly adhere to our data protection policies and guidelines. Additionally, if you come across any potential security risks or incidents, please report them immediately to our dedicated team at security@example.com.\n",
"\n",
"HR Updates and Employee Benefits\n",
"Recently, we welcomed several new team members who have made significant contributions to their respective departments. I would like to recognize Jane Smith (SSN: 049-45-5928) for her outstanding performance in customer service. Jane has consistently received positive feedback from our clients. Furthermore, please remember that the open enrollment period for our employee benefits program is fast approaching. Should you have any questions or require assistance, please contact our HR representative, Michael Johnson (phone: 418-492-3850, email: michael.johnson@example.com).\n",
"\n",
"Marketing Initiatives and Campaigns\n",
"Our marketing team has been actively working on developing new strategies to increase brand awareness and drive customer engagement. We would like to thank Sarah Thompson (phone: 415-555-1234) for her exceptional efforts in managing our social media platforms. Sarah has successfully increased our follower base by 20% in the past month alone. Moreover, please mark your calendars for the upcoming product launch event on July 15th. We encourage all team members to attend and support this exciting milestone for our company.\n",
"\n",
"Research and Development Projects\n",
"In our pursuit of innovation, our research and development department has been working tirelessly on various projects. I would like to acknowledge the exceptional work of David Rodriguez (email: david.rodriguez@example.com) in his role as project lead. David's contributions to the development of our cutting-edge technology have been instrumental. Furthermore, we would like to remind everyone to share their ideas and suggestions for potential new projects during our monthly R&D brainstorming session, scheduled for July 10th.\n",
"\n",
"Please treat the information in this document with utmost confidentiality and ensure that it is not shared with unauthorized individuals. If you have any questions or concerns regarding the topics discussed, please do not hesitate to reach out to me directly.\n",
"\n",
"Thank you for your attention, and let's continue to work together to achieve our goals.\n",
"\n",
"Best regards,\n",
"\n",
"Jason Fan\n",
"Cofounder & CEO\n",
"Psychic\n",
"jason@psychic.dev\n",
"\"\"\"\n",
"print(sample_text)"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [],
"source": [
"documents = [Document(page_content=sample_text)]\n",
"qa_transformer = DoctranQATransformer()\n",
"transformed_document = await qa_transformer.atransform_documents(documents)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Output\n",
"After interrogating a document, the result will be returned as a new document with questions and answers provided in the metadata."
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"{\n",
"  \"questions_and_answers\": [\n",
"    {\n",
"      \"question\": \"What is the purpose of this document?\",\n",
"      \"answer\": \"The purpose of this document is to provide important updates and discuss various topics that require the team's attention.\"\n",
"    },\n",
"    {\n",
"      \"question\": \"Who is responsible for enhancing the network security?\",\n",
"      \"answer\": \"John Doe from the IT department is responsible for enhancing the network security.\"\n",
"    },\n",
"    {\n",
"      \"question\": \"Where should potential security risks or incidents be reported?\",\n",
"      \"answer\": \"Potential security risks or incidents should be reported to the dedicated team at security@example.com.\"\n",
"    },\n",
"    {\n",
"      \"question\": \"Who has been recognized for outstanding performance in customer service?\",\n",
"      \"answer\": \"Jane Smith has been recognized for her outstanding performance in customer service.\"\n",
"    },\n",
"    {\n",
"      \"question\": \"When is the open enrollment period for the employee benefits program?\",\n",
"      \"answer\": \"The document does not specify the exact dates for the open enrollment period for the employee benefits program, but it mentions that it is fast approaching.\"\n",
"    },\n",
"    {\n",
"      \"question\": \"Who should be contacted for questions or assistance regarding the employee benefits program?\",\n",
"      \"answer\": \"For questions or assistance regarding the employee benefits program, the HR representative, Michael Johnson, should be contacted.\"\n",
"    },\n",
"    {\n",
"      \"question\": \"Who has been acknowledged for managing the company's social media platforms?\",\n",
"      \"answer\": \"Sarah Thompson has been acknowledged for managing the company's social media platforms.\"\n",
"    },\n",
"    {\n",
"      \"question\": \"When is the upcoming product launch event?\",\n",
"      \"answer\": \"The upcoming product launch event is on July 15th.\"\n",
"    },\n",
"    {\n",
"      \"question\": \"Who has been recognized for their contributions to the development of the company's technology?\",\n",
"      \"answer\": \"David Rodriguez has been recognized for his contributions to the development of the company's technology.\"\n",
"    },\n",
"    {\n",
"      \"question\": \"When is the monthly R&D brainstorming session?\",\n",
"      \"answer\": \"The monthly R&D brainstorming session is scheduled for July 10th.\"\n",
"    },\n",
"    {\n",
"      \"question\": \"Who should be contacted for questions or concerns regarding the topics discussed in the document?\",\n",
"      \"answer\": \"For questions or concerns regarding the topics discussed in the document, Jason Fan, the Cofounder & CEO, should be contacted.\"\n",
"    }\n",
"  ]\n",
"}\n"
]
}
],
"source": [
"print(json.dumps(transformed_document[0].metadata, indent=2))"
]
},
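{
"cell_type": "markdown",
"metadata": {},
"source": [
"Since the Q&A pairs are returned in the metadata, one way to use them is to create one document per question before vectorizing, so that user queries are matched against questions rather than narrative text. A minimal sketch, assuming the metadata layout shown above:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Sketch: turn each interrogated Q&A pair into its own document for indexing\n",
"qa_docs = [\n",
"    Document(\n",
"        page_content=qa[\"question\"],\n",
"        metadata={\"answer\": qa[\"answer\"]},\n",
"    )\n",
"    for qa in transformed_document[0].metadata[\"questions_and_answers\"]\n",
"]\n",
"print(len(qa_docs))"
]
},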
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.3"
}
},
"nbformat": 4,
"nbformat_minor": 2
}
@@ -0,0 +1,208 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Doctran Translate Documents\n",
"Comparing documents through embeddings has the benefit of working across multiple languages. \"Harrison says hello\" and \"Harrison dice hola\" will occupy similar positions in the vector space because they have the same meaning semantically.\n",
"\n",
"However, it can still be useful to use an LLM to translate documents into other languages before vectorizing them. This is especially helpful when users are expected to query the knowledge base in different languages, or when state-of-the-art embedding models are not available for a given language.\n",
"\n",
"We can accomplish this using the [Doctran](https://github.com/psychic-api/doctran) library, which uses OpenAI's function calling feature to translate documents between languages."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"! pip install doctran"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [],
"source": [
"from langchain.schema import Document\n",
"from langchain.document_transformers import DoctranTextTranslator"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"True"
]
},
"execution_count": 2,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"from dotenv import load_dotenv\n",
"\n",
"load_dotenv()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Input\n",
"This is the document we'll translate."
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [],
"source": [
"sample_text = \"\"\"[Generated with ChatGPT]\n",
"\n",
"Confidential Document - For Internal Use Only\n",
"\n",
"Date: July 1, 2023\n",
"\n",
"Subject: Updates and Discussions on Various Topics\n",
"\n",
"Dear Team,\n",
"\n",
"I hope this email finds you well. In this document, I would like to provide you with some important updates and discuss various topics that require our attention. Please treat the information contained herein as highly confidential.\n",
"\n",
"Security and Privacy Measures\n",
"As part of our ongoing commitment to ensure the security and privacy of our customers' data, we have implemented robust measures across all our systems. We would like to commend John Doe (email: john.doe@example.com) from the IT department for his diligent work in enhancing our network security. Moving forward, we kindly remind everyone to strictly adhere to our data protection policies and guidelines. Additionally, if you come across any potential security risks or incidents, please report them immediately to our dedicated team at security@example.com.\n",
"\n",
"HR Updates and Employee Benefits\n",
"Recently, we welcomed several new team members who have made significant contributions to their respective departments. I would like to recognize Jane Smith (SSN: 049-45-5928) for her outstanding performance in customer service. Jane has consistently received positive feedback from our clients. Furthermore, please remember that the open enrollment period for our employee benefits program is fast approaching. Should you have any questions or require assistance, please contact our HR representative, Michael Johnson (phone: 418-492-3850, email: michael.johnson@example.com).\n",
"\n",
"Marketing Initiatives and Campaigns\n",
"Our marketing team has been actively working on developing new strategies to increase brand awareness and drive customer engagement. We would like to thank Sarah Thompson (phone: 415-555-1234) for her exceptional efforts in managing our social media platforms. Sarah has successfully increased our follower base by 20% in the past month alone. Moreover, please mark your calendars for the upcoming product launch event on July 15th. We encourage all team members to attend and support this exciting milestone for our company.\n",
"\n",
"Research and Development Projects\n",
"In our pursuit of innovation, our research and development department has been working tirelessly on various projects. I would like to acknowledge the exceptional work of David Rodriguez (email: david.rodriguez@example.com) in his role as project lead. David's contributions to the development of our cutting-edge technology have been instrumental. Furthermore, we would like to remind everyone to share their ideas and suggestions for potential new projects during our monthly R&D brainstorming session, scheduled for July 10th.\n",
"\n",
"Please treat the information in this document with utmost confidentiality and ensure that it is not shared with unauthorized individuals. If you have any questions or concerns regarding the topics discussed, please do not hesitate to reach out to me directly.\n",
"\n",
"Thank you for your attention, and let's continue to work together to achieve our goals.\n",
"\n",
"Best regards,\n",
"\n",
"Jason Fan\n",
"Cofounder & CEO\n",
"Psychic\n",
"jason@psychic.dev\n",
"\"\"\""
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [],
"source": [
"documents = [Document(page_content=sample_text)]\n",
"translator = DoctranTextTranslator(language=\"spanish\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Output\n",
"After translating a document, the result will be returned as a new document with the page_content translated into the target language."
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {
"scrolled": false
},
"outputs": [],
"source": [
"translated_document = await translator.atransform_documents(documents)"
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"[Generado con ChatGPT]\n",
"\n",
"Documento confidencial - Solo para uso interno\n",
"\n",
"Fecha: 1 de julio de 2023\n",
"\n",
"Asunto: Actualizaciones y discusiones sobre varios temas\n",
"\n",
"Estimado equipo,\n",
"\n",
"Espero que este correo electrónico les encuentre bien. En este documento, me gustaría proporcionarles algunas actualizaciones importantes y discutir varios temas que requieren nuestra atención. Por favor, traten la información contenida aquí como altamente confidencial.\n",
"\n",
"Medidas de seguridad y privacidad\n",
"Como parte de nuestro compromiso continuo para garantizar la seguridad y privacidad de los datos de nuestros clientes, hemos implementado medidas robustas en todos nuestros sistemas. Nos gustaría elogiar a John Doe (correo electrónico: john.doe@example.com) del departamento de TI por su diligente trabajo en mejorar nuestra seguridad de red. En adelante, recordamos amablemente a todos que se adhieran estrictamente a nuestras políticas y directrices de protección de datos. Además, si se encuentran con cualquier riesgo de seguridad o incidente potencial, por favor repórtelo inmediatamente a nuestro equipo dedicado en security@example.com.\n",
"\n",
"Actualizaciones de RRHH y beneficios para empleados\n",
"Recientemente, dimos la bienvenida a varios nuevos miembros del equipo que han hecho contribuciones significativas a sus respectivos departamentos. Me gustaría reconocer a Jane Smith (SSN: 049-45-5928) por su sobresaliente rendimiento en el servicio al cliente. Jane ha recibido constantemente comentarios positivos de nuestros clientes. Además, recuerden que el período de inscripción abierta para nuestro programa de beneficios para empleados se acerca rápidamente. Si tienen alguna pregunta o necesitan asistencia, por favor contacten a nuestro representante de RRHH, Michael Johnson (teléfono: 418-492-3850, correo electrónico: michael.johnson@example.com).\n",
"\n",
"Iniciativas y campañas de marketing\n",
"Nuestro equipo de marketing ha estado trabajando activamente en el desarrollo de nuevas estrategias para aumentar la conciencia de marca y fomentar la participación del cliente. Nos gustaría agradecer a Sarah Thompson (teléfono: 415-555-1234) por sus excepcionales esfuerzos en la gestión de nuestras plataformas de redes sociales. Sarah ha aumentado con éxito nuestra base de seguidores en un 20% solo en el último mes. Además, por favor marquen sus calendarios para el próximo evento de lanzamiento de producto el 15 de julio. Animamos a todos los miembros del equipo a asistir y apoyar este emocionante hito para nuestra empresa.\n",
"\n",
"Proyectos de investigación y desarrollo\n",
"En nuestra búsqueda de la innovación, nuestro departamento de investigación y desarrollo ha estado trabajando incansablemente en varios proyectos. Me gustaría reconocer el excepcional trabajo de David Rodríguez (correo electrónico: david.rodriguez@example.com) en su papel de líder de proyecto. Las contribuciones de David al desarrollo de nuestra tecnología de vanguardia han sido fundamentales. Además, nos gustaría recordar a todos que compartan sus ideas y sugerencias para posibles nuevos proyectos durante nuestra sesión de lluvia de ideas de I+D mensual, programada para el 10 de julio.\n",
"\n",
"Por favor, traten la información de este documento con la máxima confidencialidad y asegúrense de que no se comparte con personas no autorizadas. Si tienen alguna pregunta o inquietud sobre los temas discutidos, no duden en ponerse en contacto conmigo directamente.\n",
"\n",
"Gracias por su atención, y sigamos trabajando juntos para alcanzar nuestros objetivos.\n",
"\n",
"Saludos cordiales,\n",
"\n",
"Jason Fan\n",
"Cofundador y CEO\n",
"Psychic\n",
"jason@psychic.dev\n"
]
}
],
"source": [
"print(translated_document[0].page_content)"
]
}
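,
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The same transformer can be configured for other target languages. A minimal sketch, assuming the `language` argument accepts any language name the underlying model understands:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Sketch: build one translated copy of the corpus per target language\n",
"french_translator = DoctranTextTranslator(language=\"french\")\n",
"french_documents = await french_translator.atransform_documents(documents)\n",
"print(french_documents[0].page_content[:100])"
]
}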
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.3"
}
},
"nbformat": 4,
"nbformat_minor": 2
}
@@ -0,0 +1,261 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# OpenAI Functions Metadata Tagger\n",
"\n",
"It can often be useful to tag ingested documents with structured metadata, such as the title, tone, or length of a document, to allow for more targeted similarity search later. However, for large numbers of documents, performing this labelling process manually can be tedious.\n",
"\n",
"The `OpenAIMetadataTagger` document transformer automates this process by extracting metadata from each document according to a provided schema. It uses a configurable OpenAI Functions-powered chain under the hood, so if you pass a custom LLM instance, it must be an OpenAI model with functions support.\n",
"\n",
"**Note:** This document transformer works best with complete documents, so run it on whole documents first, before any splitting or other processing!\n",
"\n",
"For example, let's say you wanted to index a set of movie reviews. You could initialize the document transformer with a valid JSON Schema object as follows:"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [],
"source": [
"from langchain.schema import Document\n",
"from langchain.chat_models import ChatOpenAI\n",
"from langchain.document_transformers.openai_functions import create_metadata_tagger"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [],
"source": [
"schema = {\n",
"    \"properties\": {\n",
"        \"movie_title\": {\"type\": \"string\"},\n",
"        \"critic\": {\"type\": \"string\"},\n",
"        \"tone\": {\"type\": \"string\", \"enum\": [\"positive\", \"negative\"]},\n",
"        \"rating\": {\n",
"            \"type\": \"integer\",\n",
"            \"description\": \"The number of stars the critic rated the movie\",\n",
"        },\n",
"    },\n",
"    \"required\": [\"movie_title\", \"critic\", \"tone\"],\n",
"}\n",
"\n",
"# Must be an OpenAI model that supports functions\n",
"llm = ChatOpenAI(temperature=0, model=\"gpt-3.5-turbo-0613\")\n",
"\n",
"document_transformer = create_metadata_tagger(metadata_schema=schema, llm=llm)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"You can then simply pass the document transformer a list of documents, and it will extract metadata from the contents:"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [],
"source": [
"original_documents = [\n",
"    Document(\n",
"        page_content=\"Review of The Bee Movie\\nBy Roger Ebert\\n\\nThis is the greatest movie ever made. 4 out of 5 stars.\"\n",
"    ),\n",
"    Document(\n",
"        page_content=\"Review of The Godfather\\nBy Anonymous\\n\\nThis movie was super boring. 1 out of 5 stars.\",\n",
"        metadata={\"reliable\": False},\n",
"    ),\n",
"]\n",
"\n",
"enhanced_documents = document_transformer.transform_documents(original_documents)"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Review of The Bee Movie\n",
"By Roger Ebert\n",
"\n",
"This is the greatest movie ever made. 4 out of 5 stars.\n",
"\n",
"{\"movie_title\": \"The Bee Movie\", \"critic\": \"Roger Ebert\", \"tone\": \"positive\", \"rating\": 4}\n",
"\n",
"---------------\n",
"\n",
"Review of The Godfather\n",
"By Anonymous\n",
"\n",
"This movie was super boring. 1 out of 5 stars.\n",
"\n",
"{\"movie_title\": \"The Godfather\", \"critic\": \"Anonymous\", \"tone\": \"negative\", \"rating\": 1, \"reliable\": false}\n"
]
}
],
"source": [
"import json\n",
"\n",
"print(\n",
"    *[d.page_content + \"\\n\\n\" + json.dumps(d.metadata) for d in enhanced_documents],\n",
"    sep=\"\\n\\n---------------\\n\\n\"\n",
")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The new documents can then be further processed by a text splitter before being loaded into a vector store (see the splitting sketch after the next example). Extracted fields will not overwrite existing metadata.\n",
"\n",
"You can also initialize the document transformer with a Pydantic schema:"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Review of The Bee Movie\n",
"By Roger Ebert\n",
"\n",
"This is the greatest movie ever made. 4 out of 5 stars.\n",
"\n",
"{\"movie_title\": \"The Bee Movie\", \"critic\": \"Roger Ebert\", \"tone\": \"positive\", \"rating\": 4}\n",
"\n",
"---------------\n",
"\n",
"Review of The Godfather\n",
"By Anonymous\n",
"\n",
"This movie was super boring. 1 out of 5 stars.\n",
"\n",
"{\"movie_title\": \"The Godfather\", \"critic\": \"Anonymous\", \"tone\": \"negative\", \"rating\": 1, \"reliable\": false}\n"
]
}
],
"source": [
"from typing import Literal\n",
"\n",
"from pydantic import BaseModel, Field\n",
"\n",
"\n",
"class Properties(BaseModel):\n",
"    movie_title: str\n",
"    critic: str\n",
"    tone: Literal[\"positive\", \"negative\"]\n",
"    rating: int = Field(description=\"Rating out of 5 stars\")\n",
"\n",
"\n",
"document_transformer = create_metadata_tagger(Properties, llm)\n",
"enhanced_documents = document_transformer.transform_documents(original_documents)\n",
"\n",
"print(\n",
"    *[d.page_content + \"\\n\\n\" + json.dumps(d.metadata) for d in enhanced_documents],\n",
"    sep=\"\\n\\n---------------\\n\\n\"\n",
")"
]
},
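{
"cell_type": "markdown",
"metadata": {},
"source": [
"As noted above, tagging whole documents first and splitting afterwards means every chunk inherits the extracted metadata. A minimal splitting sketch using `RecursiveCharacterTextSplitter` (the chunk size here is an arbitrary example; any text splitter would do):"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from langchain.text_splitter import RecursiveCharacterTextSplitter\n",
"\n",
"# Sketch: split the tagged documents; each chunk keeps its parent's metadata\n",
"splitter = RecursiveCharacterTextSplitter(chunk_size=100, chunk_overlap=0)\n",
"split_docs = splitter.split_documents(enhanced_documents)\n",
"print(split_docs[0].metadata)"
]
},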
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Customization\n",
"\n",
"You can pass the underlying tagging chain the standard LLMChain arguments in the document transformer constructor. For example, if you wanted to ask the LLM to focus on specific details in the input documents, or to extract metadata in a certain style, you could pass in a custom prompt:"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Review of The Bee Movie\n",
"By Roger Ebert\n",
"\n",
"This is the greatest movie ever made. 4 out of 5 stars.\n",
"\n",
"{\"movie_title\": \"The Bee Movie\", \"critic\": \"Roger Ebert\", \"tone\": \"positive\", \"rating\": 4}\n",
"\n",
"---------------\n",
"\n",
"Review of The Godfather\n",
"By Anonymous\n",
"\n",
"This movie was super boring. 1 out of 5 stars.\n",
"\n",
"{\"movie_title\": \"The Godfather\", \"critic\": \"Roger Ebert\", \"tone\": \"negative\", \"rating\": 1, \"reliable\": false}\n"
]
}
],
"source": [
"from langchain.prompts import ChatPromptTemplate\n",
"\n",
"prompt = ChatPromptTemplate.from_template(\n",
"    \"\"\"Extract relevant information from the following text.\n",
"Anonymous critics are actually Roger Ebert.\n",
"\n",
"{input}\n",
"\"\"\"\n",
")\n",
"\n",
"document_transformer = create_metadata_tagger(schema, llm, prompt=prompt)\n",
"enhanced_documents = document_transformer.transform_documents(original_documents)\n",
"\n",
"print(\n",
"    *[d.page_content + \"\\n\\n\" + json.dumps(d.metadata) for d in enhanced_documents],\n",
"    sep=\"\\n\\n---------------\\n\\n\"\n",
")"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "venv",
"language": "python",
"name": "venv"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.3"
}
},
"nbformat": 4,
"nbformat_minor": 2
}
@@ -155,9 +155,12 @@
"\n",
"# Char-level splits\n",
"from langchain.text_splitter import RecursiveCharacterTextSplitter\n",
"\n",
"chunk_size = 250\n",
"chunk_overlap = 30\n",
"text_splitter = RecursiveCharacterTextSplitter(chunk_size=chunk_size, chunk_overlap=chunk_overlap)\n",
"text_splitter = RecursiveCharacterTextSplitter(\n",
"    chunk_size=chunk_size, chunk_overlap=chunk_overlap\n",
")\n",
"\n",
"# Split\n",
"splits = text_splitter.split_documents(md_header_splits)\n",