Compare commits
145 Commits
| Author | SHA1 | Date |
|---|---|---|
| | ad7d97670b | |
| | 7d4843fe84 | |
| | 6d88b23ef7 | |
| | 663b0933e4 | |
| | 1e40427755 | |
| | 85e1c9b348 | |
| | 45bb414be2 | |
| | 6325a3517c | |
| | 82f3e32d8d | |
| | af6d333147 | |
| | 3874bb256e | |
| | 574698a5fb | |
| | 854f3fe9b1 | |
| | 051fac1e66 | |
| | 5db4dba526 | |
| | 9124221d31 | |
| | c087ce74f7 | |
| | ae7714f1ba | |
| | fbc97a77ed | |
| | 120c52589b | |
| | c7b687e944 | |
| | aab2a7cd4b | |
| | 5f03cc3511 | |
| | 3dd0704e38 | |
| | 24c1654208 | |
| | c17a80f11c | |
| | a673a51efa | |
| | 224199083b | |
| | af3f401015 | |
| | 98e1bbfbbd | |
| | 6f62e5461c | |
| | b08f903755 | |
| | f307ca094b | |
| | 488d2d5da9 | |
| | a8bbfb2da3 | |
| | 92ef77da35 | |
| | 7f8ff2a317 | |
| | c5e50c40c9 | |
| | a08baa97c5 | |
| | cdb93ab5ca | |
| | 8effd90be0 | |
| | f11d845dee | |
| | 0e1d7a27c6 | |
| | 53722dcfdc | |
| | 1d4db1327a | |
| | ee70d4a0cd | |
| | 9b215e761e | |
| | 2f848294cb | |
| | d85c33a5c3 | |
| | 0d92a7f357 | |
| | 931e68692e | |
| | be29a6287d | |
| | adc96d60b6 | |
| | 93a84f6182 | |
| | 22525bad65 | |
| | 6e1000dc8d | |
| | f3c9bf5e4b | |
| | 6cdd4b5edc | |
| | 50316f6477 | |
| | 603a0bea29 | |
| | 3f7213586e | |
| | 5f17c57174 | |
| | ebcb144342 | |
| | 641fd74baa | |
| | 2667ddc686 | |
| | 74c28df363 | |
| | 5c3fe8b0d1 | |
| | 2babe3069f | |
| | e811c5e8c6 | |
| | 8741e55e7c | |
| | 00c466627a | |
| | cc0585af42 | |
| | b96ac13f3d | |
| | 9cb2347453 | |
| | c4d53f98dc | |
| | 2c2f0e15a6 | |
| | 0ea7224535 | |
| | 1f83b5f47e | |
| | 6674b33cf5 | |
| | 406a9dc11f | |
| | 9e067b8cc9 | |
| | 3c4338470e | |
| | d2137eea9f | |
| | 9129318466 | |
| | 2e4047e5e7 | |
| | 1dd4236177 | |
| | 4a94f56258 | |
| | 5171c3bcca | |
| | bd0c6381f5 | |
| | 28d2b213a4 | |
| | dd648183fa | |
| | 5eec74d9a5 | |
| | 9d13dcd17c | |
| | 5debd5043e | |
| | 9b615022e2 | |
| | 92b4418c8c | |
| | 7d29bb2c02 | |
| | 21a353e9c2 | |
| | d2cf0d16b3 | |
| | 04cddfba0d | |
| | bcab894f4e | |
| | 490f4a9ff0 | |
| | 7ffc431b3a | |
| | 50a9fcccb0 | |
| | a5fd8873b1 | |
| | dfc3f83b0f | |
| | c7f7788d0b | |
| | 8f8e8d701e | |
| | 560c4dfc98 | |
| | f5bd88757e | |
| | ea9c3cc9c9 | |
| | 5da9f9abcb | |
| | 2eb4a2ceea | |
| | e7420789e4 | |
| | 26c86a197c | |
| | 1d649b127e | |
| | 362bc301df | |
| | a1603fccfb | |
| | 4ba7396f96 | |
| | 633b673b85 | |
| | 4d697d3f24 | |
| | 612a74eb7e | |
| | 4789c99bc2 | |
| | fb6e63dc36 | |
| | c5edbea34a | |
| | 1ac347b4e3 | |
| | 705d2f5b92 | |
| | ec033ae277 | |
| | da5b0723d2 | |
| | 184ede4e48 | |
| | 7cdf97ba9b | |
| | 4d427b2397 | |
| | 2179d4eef8 | |
| | df746ad821 | |
| | c9a0f24646 | |
| | 34a2755a54 | |
| | 4e7d0c115b | |
| | 01dca1e438 | |
| | 1ac6deda89 | |
| | 4e180dc54e | |
| | 3ce4e46c8c | |
| | b489466488 | |
| | 38ca5c84cb | |
| | 49b2b0e3c0 | |
| | a2830e3056 | |
66
.github/CONTRIBUTING.md
vendored
@@ -95,6 +95,14 @@ To run formatting for this project:
make format
```

Additionally, you can run the formatter only on the files that have been modified in your current branch as compared to the master branch using the format_diff command:

```bash
make format_diff
```

This is especially useful when you have made changes to a subset of the project and want to ensure your changes are properly formatted without affecting the rest of the codebase.

### Linting

Linting for this project is done via a combination of [Black](https://black.readthedocs.io/en/stable/), [isort](https://pycqa.github.io/isort/), [flake8](https://flake8.pycqa.org/en/latest/), and [mypy](http://mypy-lang.org/).
@@ -105,8 +113,42 @@ To run linting for this project:
make lint
```

In addition, you can run the linter only on the files that have been modified in your current branch as compared to the master branch using the lint_diff command:

```bash
make lint_diff
```

This can be very helpful when you've made changes to only certain parts of the project and want to ensure your changes meet the linting standards without having to check the entire codebase.
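
As a rough illustration of what a target like `lint_diff` has to compute, the sketch below lists the files that differ from `master` and keeps only Python sources and notebooks. It mirrors the `git diff --name-only --diff-filter=d master` pipeline that appears in this change's Makefile; the function names `filter_python_files` and `changed_python_files` are hypothetical and are not part of the project.

```python
import subprocess
from typing import Iterable, List


def filter_python_files(names: Iterable[str]) -> List[str]:
    """Keep only paths ending in .py or .ipynb (hypothetical helper)."""
    return [n for n in names if n.endswith((".py", ".ipynb"))]


def changed_python_files(base: str = "master") -> List[str]:
    """List non-deleted files that differ from `base`, filtered to Python sources.

    Assumes it is run inside a git checkout in which `base` exists.
    """
    result = subprocess.run(
        # --diff-filter=d (lowercase) excludes deleted files from the listing.
        ["git", "diff", "--name-only", "--diff-filter=d", base],
        capture_output=True,
        text=True,
        check=True,
    )
    return filter_python_files(result.stdout.splitlines())
```

Calling `changed_python_files()` from the repository root should return the same file list that the Makefile passes to the linters, under the assumptions stated above.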

We recognize linting can be annoying - if you do not want to do it, please contact a project maintainer, and they can help you with it. We do not want this to be a blocker for good code getting contributed.

### Spellcheck

Spellchecking for this project is done via [codespell](https://github.com/codespell-project/codespell).
Note that `codespell` finds common typos, so could have false-positive (correctly spelled but rarely used) and false-negatives (not finding misspelled) words.

To check spelling for this project:

```bash
make spell_check
```

To fix spelling in place:

```bash
make spell_fix
```

If codespell is incorrectly flagging a word, you can skip spellcheck for that word by adding it to the codespell config in the `pyproject.toml` file.

```python
[tool.codespell]
...
# Add here:
ignore-words-list = 'momento,collison,ned,foor,reworkd,parth,whats,aapply,mysogyny,unsecure'
```

### Coverage

Code coverage (i.e. the amount of code that is covered by unit tests) helps identify areas of the code that are potentially more or less brittle.
@@ -208,30 +250,38 @@ When you run `poetry install`, the `langchain` package is installed as editable
### Contribute Documentation

Docs are largely autogenerated by [sphinx](https://www.sphinx-doc.org/en/master/) from the code.
The docs directory contains Documentation and API Reference.

Documentation is built using [Docusaurus 2](https://docusaurus.io/).

API Reference are largely autogenerated by [sphinx](https://www.sphinx-doc.org/en/master/) from the code.
For that reason, we ask that you add good documentation to all classes and methods.

Similar to linting, we recognize documentation can be annoying. If you do not want to do it, please contact a project maintainer, and they can help you with it. We do not want this to be a blocker for good code getting contributed.

### Build Documentation Locally

In the following commands, the prefix `api_` indicates that those are operations for the API Reference.

Before building the documentation, it is always a good idea to clean the build directory:

```bash
make docs_clean
make api_docs_clean
```

Next, you can run the linkchecker to make sure all links are valid:

```bash
make docs_linkcheck
```

Finally, you can build the documentation as outlined below:
Next, you can build the documentation as outlined below:

```bash
make docs_build
make api_docs_build
```

Finally, you can run the linkchecker to make sure all links are valid:

```bash
make docs_linkcheck
make api_docs_linkcheck
```

## 🏭 Release Process

22
.github/workflows/codespell.yml
vendored
Normal file
@@ -0,0 +1,22 @@
---
name: Codespell

on:
  push:
    branches: [master]
  pull_request:
    branches: [master]

permissions:
  contents: read

jobs:
  codespell:
    name: Check for spelling errors
    runs-on: ubuntu-latest

    steps:
      - name: Checkout
        uses: actions/checkout@v3
      - name: Codespell
        uses: codespell-project/actions-codespell@v2
5
.gitignore
vendored
@@ -161,7 +161,12 @@ docs/node_modules/
docs/.docusaurus/
docs/.cache-loader/
docs/_dist
docs/api_reference/api_reference.rst
docs/api_reference/_build
docs/api_reference/*/
!docs/api_reference/_static/
!docs/api_reference/templates/
!docs/api_reference/themes/
docs/docs_skeleton/build
docs/docs_skeleton/node_modules
docs/docs_skeleton/yarn.lock

69
Makefile
@@ -1,40 +1,47 @@
.PHONY: all clean format lint test tests test_watch integration_tests docker_tests help extended_tests
.PHONY: all clean docs_build docs_clean docs_linkcheck api_docs_build api_docs_clean api_docs_linkcheck format lint test tests test_watch integration_tests docker_tests help extended_tests

# Default target executed when no arguments are given to make.
all: help

######################
# TESTING AND COVERAGE
######################

# Run unit tests and generate a coverage report.
coverage:
	poetry run pytest --cov \
		--cov-config=.coveragerc \
		--cov-report xml \
		--cov-report term-missing:skip-covered

clean: docs_clean
######################
# DOCUMENTATION
######################

clean: docs_clean api_docs_clean

docs_compile:
	poetry run nbdoc_build --srcdir $(srcdir)

docs_build:
	cd docs && poetry run make html
	docs/.local_build.sh

docs_clean:
	cd docs && poetry run make clean
	rm -r docs/_dist

docs_linkcheck:
	poetry run linkchecker docs/_build/html/index.html
	poetry run linkchecker docs/_dist/docs_skeleton/ --ignore-url node_modules

format:
	poetry run black .
	poetry run ruff --select I --fix .
api_docs_build:
	poetry run python docs/api_reference/create_api_rst.py
	cd docs/api_reference && poetry run make html

PYTHON_FILES=.
lint: PYTHON_FILES=.
lint_diff: PYTHON_FILES=$(shell git diff --name-only --diff-filter=d master | grep -E '\.py$$')
api_docs_clean:
	rm -f docs/api_reference/api_reference.rst
	cd docs/api_reference && poetry run make clean

lint lint_diff:
	poetry run mypy $(PYTHON_FILES)
	poetry run black $(PYTHON_FILES) --check
	poetry run ruff .
api_docs_linkcheck:
	poetry run linkchecker docs/api_reference/_build/html/index.html

# Define a variable for the test file path.
TEST_FILE ?= tests/unit_tests/

test:
@@ -56,6 +63,34 @@ docker_tests:
	docker build -t my-langchain-image:test .
	docker run --rm my-langchain-image:test

######################
# LINTING AND FORMATTING
######################

# Define a variable for Python and notebook files.
PYTHON_FILES=.
lint format: PYTHON_FILES=.
lint_diff format_diff: PYTHON_FILES=$(shell git diff --name-only --diff-filter=d master | grep -E '\.py$$|\.ipynb$$')

lint lint_diff:
	poetry run mypy $(PYTHON_FILES)
	poetry run black $(PYTHON_FILES) --check
	poetry run ruff .

format format_diff:
	poetry run black $(PYTHON_FILES)
	poetry run ruff --select I --fix $(PYTHON_FILES)

spell_check:
	poetry run codespell --toml pyproject.toml

spell_fix:
	poetry run codespell --toml pyproject.toml -w

######################
# HELP
######################

help:
	@echo '----'
	@echo 'coverage - run unit tests and generate coverage report'

@@ -1,10 +1,15 @@
mkdir _dist
#!/usr/bin/env bash

set -o errexit
set -o nounset
set -o pipefail
set -o xtrace

SCRIPT_DIR="$(cd "$(dirname "$0")"; pwd)"
cd "${SCRIPT_DIR}"

mkdir -p _dist/docs_skeleton
cp -r {docs_skeleton,snippets} _dist
mkdir -p _dist/docs_skeleton/static/api_reference
cd api_reference
poetry run make html
cp -r _build/* ../_dist/docs_skeleton/static/api_reference
cd ..
cp -r extras/* _dist/docs_skeleton/docs
cd _dist/docs_skeleton
poetry run nbdoc_build

@@ -20,7 +20,9 @@ def load_members() -> dict:
cls = re.findall(r"^class ([^_].*)\(", line)
members[top_level]["classes"].extend([module + "." + c for c in cls])
func = re.findall(r"^def ([^_].*)\(", line)
members[top_level]["functions"].extend([module + "." + f for f in func])
afunc = re.findall(r"^async def ([^_].*)\(", line)
func_strings = [module + "." + f for f in func + afunc]
members[top_level]["functions"].extend(func_strings)
return members

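
To make the effect of the added `async def` pattern easier to see, here is a minimal, self-contained sketch that applies the three regular expressions from the hunk above to a list of source lines. The helper name `collect_members` and the sample input are hypothetical; the real `load_members` walks the package files and groups results per top-level module, which is not reproduced here.

```python
import re
from typing import Dict, Iterable, List


def collect_members(lines: Iterable[str], module: str) -> Dict[str, List[str]]:
    """Collect public top-level classes and functions from source lines.

    Hypothetical helper: it reuses the regexes from the diff, but not the
    surrounding file-walking logic.
    """
    found: Dict[str, List[str]] = {"classes": [], "functions": []}
    for line in lines:
        # "[^_]" skips names starting with an underscore (private members).
        cls = re.findall(r"^class ([^_].*)\(", line)
        found["classes"].extend(module + "." + c for c in cls)
        func = re.findall(r"^def ([^_].*)\(", line)
        # Without this pattern, coroutine functions would be missed,
        # because their lines start with "async", not "def".
        afunc = re.findall(r"^async def ([^_].*)\(", line)
        found["functions"].extend(module + "." + f for f in func + afunc)
    return found
```

Because every pattern is anchored with `^` and requires an opening parenthesis, indented methods and classes declared without parentheses (for example `class Foo:`) are not picked up.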
Before Width: | Height: | Size: 157 KiB After Width: | Height: | Size: 157 KiB |
@@ -3,6 +3,8 @@ sidebar_position: 0
---
# Integrations

Visit the [Integrations Hub](https://integrations.langchain.com) to further explore, upvote and request integrations across key LangChain components.

import DocCardList from "@theme/DocCardList";

<DocCardList />

12
docs/docs_skeleton/docs/guides/langsmith/index.md
Normal file
@@ -0,0 +1,12 @@
# LangSmith

import DocCardList from "@theme/DocCardList";

LangSmith helps you trace and evaluate your language model applications and intelligent agents to help you
move from prototype to production.

Check out the [interactive walkthrough](walkthrough) below to get started.

For more information, please refer to the [LangSmith documentation](https://docs.smith.langchain.com/)

<DocCardList />
@@ -24,7 +24,7 @@ That means there are two different axes along which you can customize your text
1. How the text is split
2. How the chunk size is measured
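
The two axes listed above can be sketched as two independent parameters of one function: a separator that decides where the text may be cut, and a length function that decides how big a chunk is. This is a simplified illustration only; `split_text` and its parameters are hypothetical names chosen for this example and are not the LangChain text splitter API.

```python
from typing import Callable, List


def split_text(
    text: str,
    separator: str = " ",
    chunk_size: int = 20,
    length_function: Callable[[str], int] = len,
) -> List[str]:
    """Greedily pack separator-delimited pieces into chunks.

    `separator` controls how the text is split; `length_function`
    controls how the chunk size is measured (characters by default).
    """
    chunks: List[str] = []
    current = ""
    for piece in text.split(separator):
        candidate = piece if not current else current + separator + piece
        if current and length_function(candidate) > chunk_size:
            # Adding this piece would exceed the limit: close the chunk.
            chunks.append(current)
            current = piece
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks
```

Swapping `length_function` for a word or token counter changes how size is measured without touching the splitting logic, which is the point of keeping the two axes separate. Note that a single piece longer than `chunk_size` is kept whole in this sketch.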

## Get started with text splitters
### Get started with text splitters

import GetStarted from "@snippets/modules/data_connection/document_transformers/get_started.mdx"

@@ -1 +1,2 @@
label: 'Text splitters'
position: 0

@@ -8,7 +8,7 @@ Many LLM applications require user-specific data that is not part of the model's
building blocks to load, transform, store and query your data via:

- [Document loaders](/docs/modules/data_connection/document_loaders/): Load documents from many different sources
- [Document transformers](/docs/modules/data_connection/document_transformers/): Split documents, drop redundant documents, and more
- [Document transformers](/docs/modules/data_connection/document_transformers/): Split documents, convert documents into Q&A format, drop redundant documents, and more
- [Text embedding models](/docs/modules/data_connection/text_embedding/): Take unstructured text and turn it into a list of floating point numbers
- [Vector stores](/docs/modules/data_connection/vectorstores/): Store and search over embedded data
- [Retrievers](/docs/modules/data_connection/retrievers/): Query your data

BIN
docs/docs_skeleton/static/img/cpal_diagram.png
Normal file
|
After Width: | Height: | Size: 116 KiB |
BIN
docs/docs_skeleton/static/img/qa_data_load.png
Normal file
|
After Width: | Height: | Size: 237 KiB |
BIN
docs/docs_skeleton/static/img/qa_flow.jpeg
Normal file
|
After Width: | Height: | Size: 173 KiB |
BIN
docs/docs_skeleton/static/img/qa_intro.png
Normal file
|
After Width: | Height: | Size: 164 KiB |
BIN
docs/docs_skeleton/static/img/summary_chains.png
Normal file
|
After Width: | Height: | Size: 118 KiB |
@@ -138,7 +138,11 @@
    },
    {
      "source": "/en/latest/integrations/databerry.html",
      "destination": "/docs/ecosystem/integrations/databerry"
      "destination": "/docs/ecosystem/integrations/chaindesk"
    },
    {
      "source": "/docs/ecosystem/integrations/databerry",
      "destination": "/docs/ecosystem/integrations/chaindesk"
    },
    {
      "source": "/en/latest/integrations/databricks/databricks.html",
@@ -1330,7 +1334,11 @@
    },
    {
      "source": "/en/latest/modules/indexes/retrievers/examples/databerry.html",
      "destination": "/docs/modules/data_connection/retrievers/integrations/databerry"
      "destination": "/docs/modules/data_connection/retrievers/integrations/chaindesk"
    },
    {
      "source": "/docs/modules/data_connection/retrievers/integrations/databerry",
      "destination": "/docs/modules/data_connection/retrievers/integrations/chaindesk"
    },
    {
      "source": "/en/latest/modules/indexes/retrievers/examples/elastic_search_bm25.html",
@@ -2125,4 +2133,4 @@
      "destination": "/docs/:path*"
    }
  ]
}
}

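
The databerry to chaindesk rename above touches two kinds of rules: the legacy `/en/latest/...` source is repointed at the new page, and the old `/docs/.../databerry` path gets its own rule so existing links keep working. A minimal sketch of how such a rule list could be consulted is shown below. It assumes exact-match lookup only; the `resolve` function is hypothetical, and wildcard patterns such as `/docs/:path*` and the hosting platform's real matching semantics are not modeled.

```python
from typing import Dict, List

# Sample entries copied from the redirect rules in the diff above.
REDIRECTS: List[Dict[str, str]] = [
    {
        "source": "/en/latest/integrations/databerry.html",
        "destination": "/docs/ecosystem/integrations/chaindesk",
    },
    {
        "source": "/docs/ecosystem/integrations/databerry",
        "destination": "/docs/ecosystem/integrations/chaindesk",
    },
]


def resolve(path: str, redirects: List[Dict[str, str]] = REDIRECTS) -> str:
    """Return the destination for an exact source match, else the path itself.

    Hypothetical helper: first matching rule wins, no pattern support.
    """
    for rule in redirects:
        if rule["source"] == path:
            return rule["destination"]
    return path
```

With these two rules, both the legacy URL and the pre-rename docs URL resolve to the chaindesk page in a single step, so no redirect chain through the removed databerry page is needed.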
@@ -2,188 +2,261 @@

Dependents stats for `hwchase17/langchain`

[](https://github.com/hwchase17/langchain/network/dependents)
[&message=172&color=informational&logo=slickpic)](https://github.com/hwchase17/langchain/network/dependents)
[&message=4980&color=informational&logo=slickpic)](https://github.com/hwchase17/langchain/network/dependents)
[&message=17239&color=informational&logo=slickpic)](https://github.com/hwchase17/langchain/network/dependents)
[](https://github.com/hwchase17/langchain/network/dependents)
[&message=244&color=informational&logo=slickpic)](https://github.com/hwchase17/langchain/network/dependents)
[&message=9697&color=informational&logo=slickpic)](https://github.com/hwchase17/langchain/network/dependents)
[&message=19827&color=informational&logo=slickpic)](https://github.com/hwchase17/langchain/network/dependents)

[update: 2023-05-17; only dependent repositories with Stars > 100]

[update: 2023-07-07; only dependent repositories with Stars > 100]


| Repository | Stars |
| :-------- | -----: |
|[openai/openai-cookbook](https://github.com/openai/openai-cookbook) | 35401 |
|[LAION-AI/Open-Assistant](https://github.com/LAION-AI/Open-Assistant) | 32861 |
|[microsoft/TaskMatrix](https://github.com/microsoft/TaskMatrix) | 32766 |
|[hpcaitech/ColossalAI](https://github.com/hpcaitech/ColossalAI) | 29560 |
|[reworkd/AgentGPT](https://github.com/reworkd/AgentGPT) | 22315 |
|[imartinez/privateGPT](https://github.com/imartinez/privateGPT) | 17474 |
|[openai/chatgpt-retrieval-plugin](https://github.com/openai/chatgpt-retrieval-plugin) | 16923 |
|[mindsdb/mindsdb](https://github.com/mindsdb/mindsdb) | 16112 |
|[jerryjliu/llama_index](https://github.com/jerryjliu/llama_index) | 15407 |
|[mlflow/mlflow](https://github.com/mlflow/mlflow) | 14345 |
|[GaiZhenbiao/ChuanhuChatGPT](https://github.com/GaiZhenbiao/ChuanhuChatGPT) | 10372 |
|[databrickslabs/dolly](https://github.com/databrickslabs/dolly) | 9919 |
|[AIGC-Audio/AudioGPT](https://github.com/AIGC-Audio/AudioGPT) | 8177 |
|[logspace-ai/langflow](https://github.com/logspace-ai/langflow) | 6807 |
|[imClumsyPanda/langchain-ChatGLM](https://github.com/imClumsyPanda/langchain-ChatGLM) | 6087 |
|[arc53/DocsGPT](https://github.com/arc53/DocsGPT) | 5292 |
|[e2b-dev/e2b](https://github.com/e2b-dev/e2b) | 4622 |
|[nsarrazin/serge](https://github.com/nsarrazin/serge) | 4076 |
|[madawei2699/myGPTReader](https://github.com/madawei2699/myGPTReader) | 3952 |
|[zauberzeug/nicegui](https://github.com/zauberzeug/nicegui) | 3952 |
|[go-skynet/LocalAI](https://github.com/go-skynet/LocalAI) | 3762 |
|[GreyDGL/PentestGPT](https://github.com/GreyDGL/PentestGPT) | 3388 |
|[mmabrouk/chatgpt-wrapper](https://github.com/mmabrouk/chatgpt-wrapper) | 3243 |
|[zilliztech/GPTCache](https://github.com/zilliztech/GPTCache) | 3189 |
|[wenda-LLM/wenda](https://github.com/wenda-LLM/wenda) | 3050 |
|[marqo-ai/marqo](https://github.com/marqo-ai/marqo) | 2930 |
|[gkamradt/langchain-tutorials](https://github.com/gkamradt/langchain-tutorials) | 2710 |
|[PrefectHQ/marvin](https://github.com/PrefectHQ/marvin) | 2545 |
|[project-baize/baize-chatbot](https://github.com/project-baize/baize-chatbot) | 2479 |
|[whitead/paper-qa](https://github.com/whitead/paper-qa) | 2399 |
|[langgenius/dify](https://github.com/langgenius/dify) | 2344 |
|[GerevAI/gerev](https://github.com/GerevAI/gerev) | 2283 |
|[hwchase17/chat-langchain](https://github.com/hwchase17/chat-langchain) | 2266 |
|[guangzhengli/ChatFiles](https://github.com/guangzhengli/ChatFiles) | 1903 |
|[Azure-Samples/azure-search-openai-demo](https://github.com/Azure-Samples/azure-search-openai-demo) | 1884 |
|[OpenBMB/BMTools](https://github.com/OpenBMB/BMTools) | 1860 |
|[Farama-Foundation/PettingZoo](https://github.com/Farama-Foundation/PettingZoo) | 1813 |
|[OpenGVLab/Ask-Anything](https://github.com/OpenGVLab/Ask-Anything) | 1571 |
|[IntelligenzaArtificiale/Free-Auto-GPT](https://github.com/IntelligenzaArtificiale/Free-Auto-GPT) | 1480 |
|[hwchase17/notion-qa](https://github.com/hwchase17/notion-qa) | 1464 |
|[NVIDIA/NeMo-Guardrails](https://github.com/NVIDIA/NeMo-Guardrails) | 1419 |
|[Unstructured-IO/unstructured](https://github.com/Unstructured-IO/unstructured) | 1410 |
|[Kav-K/GPTDiscord](https://github.com/Kav-K/GPTDiscord) | 1363 |
|[paulpierre/RasaGPT](https://github.com/paulpierre/RasaGPT) | 1344 |
|[StanGirard/quivr](https://github.com/StanGirard/quivr) | 1330 |
|[lunasec-io/lunasec](https://github.com/lunasec-io/lunasec) | 1318 |
|[vocodedev/vocode-python](https://github.com/vocodedev/vocode-python) | 1286 |
|[agiresearch/OpenAGI](https://github.com/agiresearch/OpenAGI) | 1156 |
|[h2oai/h2ogpt](https://github.com/h2oai/h2ogpt) | 1141 |
|[jina-ai/thinkgpt](https://github.com/jina-ai/thinkgpt) | 1106 |
|[yanqiangmiffy/Chinese-LangChain](https://github.com/yanqiangmiffy/Chinese-LangChain) | 1072 |
|[ttengwang/Caption-Anything](https://github.com/ttengwang/Caption-Anything) | 1064 |
|[jina-ai/dev-gpt](https://github.com/jina-ai/dev-gpt) | 1057 |
|[juncongmoo/chatllama](https://github.com/juncongmoo/chatllama) | 1003 |
|[greshake/llm-security](https://github.com/greshake/llm-security) | 1002 |
|[visual-openllm/visual-openllm](https://github.com/visual-openllm/visual-openllm) | 957 |
|[richardyc/Chrome-GPT](https://github.com/richardyc/Chrome-GPT) | 918 |
|[irgolic/AutoPR](https://github.com/irgolic/AutoPR) | 886 |
|[mmz-001/knowledge_gpt](https://github.com/mmz-001/knowledge_gpt) | 867 |
|[thomas-yanxin/LangChain-ChatGLM-Webui](https://github.com/thomas-yanxin/LangChain-ChatGLM-Webui) | 850 |
|[microsoft/X-Decoder](https://github.com/microsoft/X-Decoder) | 837 |
|[peterw/Chat-with-Github-Repo](https://github.com/peterw/Chat-with-Github-Repo) | 826 |
|[cirediatpl/FigmaChain](https://github.com/cirediatpl/FigmaChain) | 782 |
|[hashintel/hash](https://github.com/hashintel/hash) | 778 |
|[seanpixel/Teenage-AGI](https://github.com/seanpixel/Teenage-AGI) | 773 |
|[jina-ai/langchain-serve](https://github.com/jina-ai/langchain-serve) | 738 |
|[corca-ai/EVAL](https://github.com/corca-ai/EVAL) | 737 |
|[ai-sidekick/sidekick](https://github.com/ai-sidekick/sidekick) | 717 |
|[rlancemartin/auto-evaluator](https://github.com/rlancemartin/auto-evaluator) | 703 |
|[poe-platform/api-bot-tutorial](https://github.com/poe-platform/api-bot-tutorial) | 689 |
|[SamurAIGPT/Camel-AutoGPT](https://github.com/SamurAIGPT/Camel-AutoGPT) | 666 |
|[eyurtsev/kor](https://github.com/eyurtsev/kor) | 608 |
|[run-llama/llama-lab](https://github.com/run-llama/llama-lab) | 559 |
|[namuan/dr-doc-search](https://github.com/namuan/dr-doc-search) | 544 |
|[pieroit/cheshire-cat](https://github.com/pieroit/cheshire-cat) | 520 |
|[griptape-ai/griptape](https://github.com/griptape-ai/griptape) | 514 |
|[getmetal/motorhead](https://github.com/getmetal/motorhead) | 481 |
|[hwchase17/chat-your-data](https://github.com/hwchase17/chat-your-data) | 462 |
|[langchain-ai/langchain-aiplugin](https://github.com/langchain-ai/langchain-aiplugin) | 452 |
|[jina-ai/agentchain](https://github.com/jina-ai/agentchain) | 439 |
|[SamurAIGPT/ChatGPT-Developer-Plugins](https://github.com/SamurAIGPT/ChatGPT-Developer-Plugins) | 437 |
|[alexanderatallah/window.ai](https://github.com/alexanderatallah/window.ai) | 433 |
|[michaelthwan/searchGPT](https://github.com/michaelthwan/searchGPT) | 427 |
|[mpaepper/content-chatbot](https://github.com/mpaepper/content-chatbot) | 425 |
|[mckaywrigley/repo-chat](https://github.com/mckaywrigley/repo-chat) | 422 |
|[whyiyhw/chatgpt-wechat](https://github.com/whyiyhw/chatgpt-wechat) | 421 |
|[freddyaboulton/gradio-tools](https://github.com/freddyaboulton/gradio-tools) | 407 |
|[jonra1993/fastapi-alembic-sqlmodel-async](https://github.com/jonra1993/fastapi-alembic-sqlmodel-async) | 395 |
|[yeagerai/yeagerai-agent](https://github.com/yeagerai/yeagerai-agent) | 383 |
|[akshata29/chatpdf](https://github.com/akshata29/chatpdf) | 374 |
|[OpenGVLab/InternGPT](https://github.com/OpenGVLab/InternGPT) | 368 |
|[ruoccofabrizio/azure-open-ai-embeddings-qna](https://github.com/ruoccofabrizio/azure-open-ai-embeddings-qna) | 358 |
|[101dotxyz/GPTeam](https://github.com/101dotxyz/GPTeam) | 357 |
|[mtenenholtz/chat-twitter](https://github.com/mtenenholtz/chat-twitter) | 354 |
|[amosjyng/langchain-visualizer](https://github.com/amosjyng/langchain-visualizer) | 343 |
|[msoedov/langcorn](https://github.com/msoedov/langcorn) | 334 |
|[showlab/VLog](https://github.com/showlab/VLog) | 330 |
|[continuum-llms/chatgpt-memory](https://github.com/continuum-llms/chatgpt-memory) | 324 |
|[steamship-core/steamship-langchain](https://github.com/steamship-core/steamship-langchain) | 323 |
|[daodao97/chatdoc](https://github.com/daodao97/chatdoc) | 320 |
|[xuwenhao/geektime-ai-course](https://github.com/xuwenhao/geektime-ai-course) | 308 |
|[StevenGrove/GPT4Tools](https://github.com/StevenGrove/GPT4Tools) | 301 |
|[logan-markewich/llama_index_starter_pack](https://github.com/logan-markewich/llama_index_starter_pack) | 300 |
|[andylokandy/gpt-4-search](https://github.com/andylokandy/gpt-4-search) | 299 |
|[Anil-matcha/ChatPDF](https://github.com/Anil-matcha/ChatPDF) | 287 |
|[itamargol/openai](https://github.com/itamargol/openai) | 273 |
|[BlackHC/llm-strategy](https://github.com/BlackHC/llm-strategy) | 267 |
|[momegas/megabots](https://github.com/momegas/megabots) | 259 |
|[bborn/howdoi.ai](https://github.com/bborn/howdoi.ai) | 238 |
|[Cheems-Seminar/grounded-segment-any-parts](https://github.com/Cheems-Seminar/grounded-segment-any-parts) | 232 |
|[ur-whitelab/exmol](https://github.com/ur-whitelab/exmol) | 227 |
|[sullivan-sean/chat-langchainjs](https://github.com/sullivan-sean/chat-langchainjs) | 227 |
|[explosion/spacy-llm](https://github.com/explosion/spacy-llm) | 226 |
|[recalign/RecAlign](https://github.com/recalign/RecAlign) | 218 |
|[jupyterlab/jupyter-ai](https://github.com/jupyterlab/jupyter-ai) | 218 |
|[alvarosevilla95/autolang](https://github.com/alvarosevilla95/autolang) | 215 |
|[conceptofmind/toolformer](https://github.com/conceptofmind/toolformer) | 213 |
|[MagnivOrg/prompt-layer-library](https://github.com/MagnivOrg/prompt-layer-library) | 209 |
|[JohnSnowLabs/nlptest](https://github.com/JohnSnowLabs/nlptest) | 208 |
|[airobotlab/KoChatGPT](https://github.com/airobotlab/KoChatGPT) | 197 |
|[langchain-ai/auto-evaluator](https://github.com/langchain-ai/auto-evaluator) | 195 |
|[yvann-hub/Robby-chatbot](https://github.com/yvann-hub/Robby-chatbot) | 195 |
|[alejandro-ao/langchain-ask-pdf](https://github.com/alejandro-ao/langchain-ask-pdf) | 192 |
|[daveebbelaar/langchain-experiments](https://github.com/daveebbelaar/langchain-experiments) | 189 |
|[NimbleBoxAI/ChainFury](https://github.com/NimbleBoxAI/ChainFury) | 187 |
|[kaleido-lab/dolphin](https://github.com/kaleido-lab/dolphin) | 184 |
|[Anil-matcha/Website-to-Chatbot](https://github.com/Anil-matcha/Website-to-Chatbot) | 183 |
|[plchld/InsightFlow](https://github.com/plchld/InsightFlow) | 180 |
|[OpenBMB/AgentVerse](https://github.com/OpenBMB/AgentVerse) | 166 |
|[benthecoder/ClassGPT](https://github.com/benthecoder/ClassGPT) | 166 |
|[jbrukh/gpt-jargon](https://github.com/jbrukh/gpt-jargon) | 161 |
|[hardbyte/qabot](https://github.com/hardbyte/qabot) | 160 |
|[shaman-ai/agent-actors](https://github.com/shaman-ai/agent-actors) | 153 |
|[radi-cho/datasetGPT](https://github.com/radi-cho/datasetGPT) | 153 |
|[poe-platform/poe-protocol](https://github.com/poe-platform/poe-protocol) | 152 |
|[paolorechia/learn-langchain](https://github.com/paolorechia/learn-langchain) | 149 |
|[ajndkr/lanarky](https://github.com/ajndkr/lanarky) | 149 |
|[fengyuli-dev/multimedia-gpt](https://github.com/fengyuli-dev/multimedia-gpt) | 147 |
|[yasyf/compress-gpt](https://github.com/yasyf/compress-gpt) | 144 |
|[homanp/superagent](https://github.com/homanp/superagent) | 143 |
|[realminchoi/babyagi-ui](https://github.com/realminchoi/babyagi-ui) | 141 |
|[ethanyanjiali/minChatGPT](https://github.com/ethanyanjiali/minChatGPT) | 141 |
|[ccurme/yolopandas](https://github.com/ccurme/yolopandas) | 139 |
|[hwchase17/langchain-streamlit-template](https://github.com/hwchase17/langchain-streamlit-template) | 138 |
|[Jaseci-Labs/jaseci](https://github.com/Jaseci-Labs/jaseci) | 136 |
|[hirokidaichi/wanna](https://github.com/hirokidaichi/wanna) | 135 |
|[Haste171/langchain-chatbot](https://github.com/Haste171/langchain-chatbot) | 134 |
|[jmpaz/promptlib](https://github.com/jmpaz/promptlib) | 130 |
|[Klingefjord/chatgpt-telegram](https://github.com/Klingefjord/chatgpt-telegram) | 130 |
|[filip-michalsky/SalesGPT](https://github.com/filip-michalsky/SalesGPT) | 128 |
|[handrew/browserpilot](https://github.com/handrew/browserpilot) | 128 |
|[shauryr/S2QA](https://github.com/shauryr/S2QA) | 127 |
|[steamship-core/vercel-examples](https://github.com/steamship-core/vercel-examples) | 127 |
|[yasyf/summ](https://github.com/yasyf/summ) | 127 |
|[gia-guar/JARVIS-ChatGPT](https://github.com/gia-guar/JARVIS-ChatGPT) | 126 |
|[jerlendds/osintbuddy](https://github.com/jerlendds/osintbuddy) | 125 |
|[ibiscp/LLM-IMDB](https://github.com/ibiscp/LLM-IMDB) | 124 |
|[Teahouse-Studios/akari-bot](https://github.com/Teahouse-Studios/akari-bot) | 124 |
|[hwchase17/chroma-langchain](https://github.com/hwchase17/chroma-langchain) | 124 |
|[menloparklab/langchain-cohere-qdrant-doc-retrieval](https://github.com/menloparklab/langchain-cohere-qdrant-doc-retrieval) | 123 |
|[peterw/StoryStorm](https://github.com/peterw/StoryStorm) | 123 |
|[chakkaradeep/pyCodeAGI](https://github.com/chakkaradeep/pyCodeAGI) | 123 |
|[petehunt/langchain-github-bot](https://github.com/petehunt/langchain-github-bot) | 115 |
|[su77ungr/CASALIOY](https://github.com/su77ungr/CASALIOY) | 113 |
|[eunomia-bpf/GPTtrace](https://github.com/eunomia-bpf/GPTtrace) | 113 |
|[zenml-io/zenml-projects](https://github.com/zenml-io/zenml-projects) | 112 |
|[pablomarin/GPT-Azure-Search-Engine](https://github.com/pablomarin/GPT-Azure-Search-Engine) | 111 |
|[shamspias/customizable-gpt-chatbot](https://github.com/shamspias/customizable-gpt-chatbot) | 109 |
|[WongSaang/chatgpt-ui-server](https://github.com/WongSaang/chatgpt-ui-server) | 108 |
|[davila7/file-gpt](https://github.com/davila7/file-gpt) | 104 |
|[enhancedocs/enhancedocs](https://github.com/enhancedocs/enhancedocs) | 102 |
|[aurelio-labs/arxiv-bot](https://github.com/aurelio-labs/arxiv-bot) | 101 |
|[openai/openai-cookbook](https://github.com/openai/openai-cookbook) | 41047 |
|[LAION-AI/Open-Assistant](https://github.com/LAION-AI/Open-Assistant) | 33983 |
|[microsoft/TaskMatrix](https://github.com/microsoft/TaskMatrix) | 33375 |
|[imartinez/privateGPT](https://github.com/imartinez/privateGPT) | 31114 |
|[hpcaitech/ColossalAI](https://github.com/hpcaitech/ColossalAI) | 30369 |
|[reworkd/AgentGPT](https://github.com/reworkd/AgentGPT) | 24116 |
|[OpenBB-finance/OpenBBTerminal](https://github.com/OpenBB-finance/OpenBBTerminal) | 22565 |
|[openai/chatgpt-retrieval-plugin](https://github.com/openai/chatgpt-retrieval-plugin) | 18375 |
|[jerryjliu/llama_index](https://github.com/jerryjliu/llama_index) | 17723 |
|[mindsdb/mindsdb](https://github.com/mindsdb/mindsdb) | 16958 |
|[mlflow/mlflow](https://github.com/mlflow/mlflow) | 14632 |
|[GaiZhenbiao/ChuanhuChatGPT](https://github.com/GaiZhenbiao/ChuanhuChatGPT) | 11273 |
|[openai/evals](https://github.com/openai/evals) | 10745 |
|[databrickslabs/dolly](https://github.com/databrickslabs/dolly) | 10298 |
|[imClumsyPanda/langchain-ChatGLM](https://github.com/imClumsyPanda/langchain-ChatGLM) | 9838 |
|[logspace-ai/langflow](https://github.com/logspace-ai/langflow) | 9247 |
|
||||
|[AIGC-Audio/AudioGPT](https://github.com/AIGC-Audio/AudioGPT) | 8768 |
|
||||
|[PromtEngineer/localGPT](https://github.com/PromtEngineer/localGPT) | 8651 |
|
||||
|[StanGirard/quivr](https://github.com/StanGirard/quivr) | 8119 |
|
||||
|[go-skynet/LocalAI](https://github.com/go-skynet/LocalAI) | 7418 |
|
||||
|[gventuri/pandas-ai](https://github.com/gventuri/pandas-ai) | 7301 |
|
||||
|[PipedreamHQ/pipedream](https://github.com/PipedreamHQ/pipedream) | 6636 |
|
||||
|[arc53/DocsGPT](https://github.com/arc53/DocsGPT) | 5849 |
|
||||
|[e2b-dev/e2b](https://github.com/e2b-dev/e2b) | 5129 |
|
||||
|[langgenius/dify](https://github.com/langgenius/dify) | 4804 |
|
||||
|[serge-chat/serge](https://github.com/serge-chat/serge) | 4448 |
|
||||
|[csunny/DB-GPT](https://github.com/csunny/DB-GPT) | 4350 |
|
||||
|[wenda-LLM/wenda](https://github.com/wenda-LLM/wenda) | 4268 |
|
||||
|[zauberzeug/nicegui](https://github.com/zauberzeug/nicegui) | 4244 |
|
||||
|[intitni/CopilotForXcode](https://github.com/intitni/CopilotForXcode) | 4232 |
|
||||
|[GreyDGL/PentestGPT](https://github.com/GreyDGL/PentestGPT) | 4154 |
|
||||
|[madawei2699/myGPTReader](https://github.com/madawei2699/myGPTReader) | 4080 |
|
||||
|[zilliztech/GPTCache](https://github.com/zilliztech/GPTCache) | 3949 |
|
||||
|[gkamradt/langchain-tutorials](https://github.com/gkamradt/langchain-tutorials) | 3920 |
|
||||
|[bentoml/OpenLLM](https://github.com/bentoml/OpenLLM) | 3481 |
|
||||
|[MineDojo/Voyager](https://github.com/MineDojo/Voyager) | 3453 |
|
||||
|[mmabrouk/chatgpt-wrapper](https://github.com/mmabrouk/chatgpt-wrapper) | 3355 |
|
||||
|[postgresml/postgresml](https://github.com/postgresml/postgresml) | 3328 |
|
||||
|[marqo-ai/marqo](https://github.com/marqo-ai/marqo) | 3100 |
|
||||
|[kyegomez/tree-of-thoughts](https://github.com/kyegomez/tree-of-thoughts) | 3049 |
|
||||
|[PrefectHQ/marvin](https://github.com/PrefectHQ/marvin) | 2844 |
|
||||
|[project-baize/baize-chatbot](https://github.com/project-baize/baize-chatbot) | 2833 |
|
||||
|[h2oai/h2ogpt](https://github.com/h2oai/h2ogpt) | 2809 |
|
||||
|[hwchase17/chat-langchain](https://github.com/hwchase17/chat-langchain) | 2809 |
|
||||
|[whitead/paper-qa](https://github.com/whitead/paper-qa) | 2664 |
|
||||
|[Azure-Samples/azure-search-openai-demo](https://github.com/Azure-Samples/azure-search-openai-demo) | 2650 |
|
||||
|[OpenGVLab/InternGPT](https://github.com/OpenGVLab/InternGPT) | 2525 |
|
||||
|[GerevAI/gerev](https://github.com/GerevAI/gerev) | 2372 |
|
||||
|[ParisNeo/lollms-webui](https://github.com/ParisNeo/lollms-webui) | 2287 |
|
||||
|[OpenBMB/BMTools](https://github.com/OpenBMB/BMTools) | 2265 |
|
||||
|[SamurAIGPT/privateGPT](https://github.com/SamurAIGPT/privateGPT) | 2084 |
|
||||
|[Chainlit/chainlit](https://github.com/Chainlit/chainlit) | 1912 |
|
||||
|[Farama-Foundation/PettingZoo](https://github.com/Farama-Foundation/PettingZoo) | 1869 |
|
||||
|[OpenGVLab/Ask-Anything](https://github.com/OpenGVLab/Ask-Anything) | 1864 |
|
||||
|[IntelligenzaArtificiale/Free-Auto-GPT](https://github.com/IntelligenzaArtificiale/Free-Auto-GPT) | 1849 |
|
||||
|[Unstructured-IO/unstructured](https://github.com/Unstructured-IO/unstructured) | 1766 |
|
||||
|[yanqiangmiffy/Chinese-LangChain](https://github.com/yanqiangmiffy/Chinese-LangChain) | 1745 |
|
||||
|[NVIDIA/NeMo-Guardrails](https://github.com/NVIDIA/NeMo-Guardrails) | 1732 |
|
||||
|[hwchase17/notion-qa](https://github.com/hwchase17/notion-qa) | 1716 |
|
||||
|[paulpierre/RasaGPT](https://github.com/paulpierre/RasaGPT) | 1619 |
|
||||
|[pinterest/querybook](https://github.com/pinterest/querybook) | 1468 |
|
||||
|[vocodedev/vocode-python](https://github.com/vocodedev/vocode-python) | 1446 |
|
||||
|[thomas-yanxin/LangChain-ChatGLM-Webui](https://github.com/thomas-yanxin/LangChain-ChatGLM-Webui) | 1430 |
|
||||
|[Mintplex-Labs/anything-llm](https://github.com/Mintplex-Labs/anything-llm) | 1419 |
|
||||
|[Kav-K/GPTDiscord](https://github.com/Kav-K/GPTDiscord) | 1416 |
|
||||
|[lunasec-io/lunasec](https://github.com/lunasec-io/lunasec) | 1327 |
|
||||
|[psychic-api/psychic](https://github.com/psychic-api/psychic) | 1307 |
|
||||
|[jina-ai/thinkgpt](https://github.com/jina-ai/thinkgpt) | 1242 |
|
||||
|[agiresearch/OpenAGI](https://github.com/agiresearch/OpenAGI) | 1239 |
|
||||
|[ttengwang/Caption-Anything](https://github.com/ttengwang/Caption-Anything) | 1203 |
|
||||
|[jina-ai/dev-gpt](https://github.com/jina-ai/dev-gpt) | 1179 |
|
||||
|[keephq/keep](https://github.com/keephq/keep) | 1169 |
|
||||
|[greshake/llm-security](https://github.com/greshake/llm-security) | 1156 |
|
||||
|[richardyc/Chrome-GPT](https://github.com/richardyc/Chrome-GPT) | 1090 |
|
||||
|[jina-ai/langchain-serve](https://github.com/jina-ai/langchain-serve) | 1088 |
|
||||
|[mmz-001/knowledge_gpt](https://github.com/mmz-001/knowledge_gpt) | 1074 |
|
||||
|[juncongmoo/chatllama](https://github.com/juncongmoo/chatllama) | 1057 |
|
||||
|[noahshinn024/reflexion](https://github.com/noahshinn024/reflexion) | 1045 |
|
||||
|[visual-openllm/visual-openllm](https://github.com/visual-openllm/visual-openllm) | 1036 |
|
||||
|[101dotxyz/GPTeam](https://github.com/101dotxyz/GPTeam) | 999 |
|
||||
|[poe-platform/api-bot-tutorial](https://github.com/poe-platform/api-bot-tutorial) | 989 |
|
||||
|[irgolic/AutoPR](https://github.com/irgolic/AutoPR) | 974 |
|
||||
|[homanp/superagent](https://github.com/homanp/superagent) | 970 |
|
||||
|[microsoft/X-Decoder](https://github.com/microsoft/X-Decoder) | 941 |
|
||||
|[peterw/Chat-with-Github-Repo](https://github.com/peterw/Chat-with-Github-Repo) | 896 |
|
||||
|[SamurAIGPT/Camel-AutoGPT](https://github.com/SamurAIGPT/Camel-AutoGPT) | 856 |
|
||||
|[cirediatpl/FigmaChain](https://github.com/cirediatpl/FigmaChain) | 840 |
|
||||
|[chatarena/chatarena](https://github.com/chatarena/chatarena) | 829 |
|
||||
|[rlancemartin/auto-evaluator](https://github.com/rlancemartin/auto-evaluator) | 816 |
|
||||
|[seanpixel/Teenage-AGI](https://github.com/seanpixel/Teenage-AGI) | 816 |
|
||||
|[hashintel/hash](https://github.com/hashintel/hash) | 806 |
|
||||
|[corca-ai/EVAL](https://github.com/corca-ai/EVAL) | 790 |
|
||||
|[eyurtsev/kor](https://github.com/eyurtsev/kor) | 752 |
|
||||
|[cheshire-cat-ai/core](https://github.com/cheshire-cat-ai/core) | 713 |
|
||||
|[e-johnstonn/BriefGPT](https://github.com/e-johnstonn/BriefGPT) | 686 |
|
||||
|[run-llama/llama-lab](https://github.com/run-llama/llama-lab) | 685 |
|
||||
|[refuel-ai/autolabel](https://github.com/refuel-ai/autolabel) | 673 |
|
||||
|[griptape-ai/griptape](https://github.com/griptape-ai/griptape) | 617 |
|
||||
|[billxbf/ReWOO](https://github.com/billxbf/ReWOO) | 616 |
|
||||
|[Anil-matcha/ChatPDF](https://github.com/Anil-matcha/ChatPDF) | 609 |
|
||||
|[NimbleBoxAI/ChainFury](https://github.com/NimbleBoxAI/ChainFury) | 592 |
|
||||
|[getmetal/motorhead](https://github.com/getmetal/motorhead) | 581 |
|
||||
|[ajndkr/lanarky](https://github.com/ajndkr/lanarky) | 574 |
|
||||
|[namuan/dr-doc-search](https://github.com/namuan/dr-doc-search) | 572 |
|
||||
|[kreneskyp/ix](https://github.com/kreneskyp/ix) | 564 |
|
||||
|[akshata29/chatpdf](https://github.com/akshata29/chatpdf) | 540 |
|
||||
|[hwchase17/chat-your-data](https://github.com/hwchase17/chat-your-data) | 540 |
|
||||
|[whyiyhw/chatgpt-wechat](https://github.com/whyiyhw/chatgpt-wechat) | 537 |
|
||||
|[khoj-ai/khoj](https://github.com/khoj-ai/khoj) | 531 |
|
||||
|[SamurAIGPT/ChatGPT-Developer-Plugins](https://github.com/SamurAIGPT/ChatGPT-Developer-Plugins) | 528 |
|
||||
|[microsoft/PodcastCopilot](https://github.com/microsoft/PodcastCopilot) | 526 |
|
||||
|[ruoccofabrizio/azure-open-ai-embeddings-qna](https://github.com/ruoccofabrizio/azure-open-ai-embeddings-qna) | 515 |
|
||||
|[alexanderatallah/window.ai](https://github.com/alexanderatallah/window.ai) | 494 |
|
||||
|[StevenGrove/GPT4Tools](https://github.com/StevenGrove/GPT4Tools) | 483 |
|
||||
|[jina-ai/agentchain](https://github.com/jina-ai/agentchain) | 472 |
|
||||
|[mckaywrigley/repo-chat](https://github.com/mckaywrigley/repo-chat) | 465 |
|
||||
|[yeagerai/yeagerai-agent](https://github.com/yeagerai/yeagerai-agent) | 464 |
|
||||
|[langchain-ai/langchain-aiplugin](https://github.com/langchain-ai/langchain-aiplugin) | 464 |
|
||||
|[mpaepper/content-chatbot](https://github.com/mpaepper/content-chatbot) | 455 |
|
||||
|[michaelthwan/searchGPT](https://github.com/michaelthwan/searchGPT) | 455 |
|
||||
|[freddyaboulton/gradio-tools](https://github.com/freddyaboulton/gradio-tools) | 450 |
|
||||
|[amosjyng/langchain-visualizer](https://github.com/amosjyng/langchain-visualizer) | 446 |
|
||||
|[msoedov/langcorn](https://github.com/msoedov/langcorn) | 445 |
|
||||
|[plastic-labs/tutor-gpt](https://github.com/plastic-labs/tutor-gpt) | 426 |
|
||||
|[poe-platform/poe-protocol](https://github.com/poe-platform/poe-protocol) | 426 |
|
||||
|[jonra1993/fastapi-alembic-sqlmodel-async](https://github.com/jonra1993/fastapi-alembic-sqlmodel-async) | 418 |
|
||||
|[langchain-ai/auto-evaluator](https://github.com/langchain-ai/auto-evaluator) | 416 |
|
||||
|[steamship-core/steamship-langchain](https://github.com/steamship-core/steamship-langchain) | 401 |
|
||||
|[xuwenhao/geektime-ai-course](https://github.com/xuwenhao/geektime-ai-course) | 400 |
|
||||
|[continuum-llms/chatgpt-memory](https://github.com/continuum-llms/chatgpt-memory) | 386 |
|
||||
|[mtenenholtz/chat-twitter](https://github.com/mtenenholtz/chat-twitter) | 382 |
|
||||
|[explosion/spacy-llm](https://github.com/explosion/spacy-llm) | 368 |
|
||||
|[showlab/VLog](https://github.com/showlab/VLog) | 363 |
|
||||
|[yvann-hub/Robby-chatbot](https://github.com/yvann-hub/Robby-chatbot) | 363 |
|
||||
|[daodao97/chatdoc](https://github.com/daodao97/chatdoc) | 361 |
|
||||
|[opentensor/bittensor](https://github.com/opentensor/bittensor) | 360 |
|
||||
|[alejandro-ao/langchain-ask-pdf](https://github.com/alejandro-ao/langchain-ask-pdf) | 355 |
|
||||
|[logan-markewich/llama_index_starter_pack](https://github.com/logan-markewich/llama_index_starter_pack) | 351 |
|
||||
|[jupyterlab/jupyter-ai](https://github.com/jupyterlab/jupyter-ai) | 348 |
|
||||
|[alejandro-ao/ask-multiple-pdfs](https://github.com/alejandro-ao/ask-multiple-pdfs) | 321 |
|
||||
|[andylokandy/gpt-4-search](https://github.com/andylokandy/gpt-4-search) | 314 |
|
||||
|[mosaicml/examples](https://github.com/mosaicml/examples) | 313 |
|
||||
|[personoids/personoids-lite](https://github.com/personoids/personoids-lite) | 306 |
|
||||
|[itamargol/openai](https://github.com/itamargol/openai) | 304 |
|
||||
|[Anil-matcha/Website-to-Chatbot](https://github.com/Anil-matcha/Website-to-Chatbot) | 299 |
|
||||
|[momegas/megabots](https://github.com/momegas/megabots) | 299 |
|
||||
|[BlackHC/llm-strategy](https://github.com/BlackHC/llm-strategy) | 289 |
|
||||
|[daveebbelaar/langchain-experiments](https://github.com/daveebbelaar/langchain-experiments) | 283 |
|
||||
|[wandb/weave](https://github.com/wandb/weave) | 279 |
|
||||
|[Cheems-Seminar/grounded-segment-any-parts](https://github.com/Cheems-Seminar/grounded-segment-any-parts) | 273 |
|
||||
|[jerlendds/osintbuddy](https://github.com/jerlendds/osintbuddy) | 271 |
|
||||
|[OpenBMB/AgentVerse](https://github.com/OpenBMB/AgentVerse) | 270 |
|
||||
|[MagnivOrg/prompt-layer-library](https://github.com/MagnivOrg/prompt-layer-library) | 269 |
|
||||
|[sullivan-sean/chat-langchainjs](https://github.com/sullivan-sean/chat-langchainjs) | 259 |
|
||||
|[Azure-Samples/openai](https://github.com/Azure-Samples/openai) | 252 |
|
||||
|[bborn/howdoi.ai](https://github.com/bborn/howdoi.ai) | 248 |
|
||||
|[hnawaz007/pythondataanalysis](https://github.com/hnawaz007/pythondataanalysis) | 247 |
|
||||
|[conceptofmind/toolformer](https://github.com/conceptofmind/toolformer) | 243 |
|
||||
|[truera/trulens](https://github.com/truera/trulens) | 239 |
|
||||
|[ur-whitelab/exmol](https://github.com/ur-whitelab/exmol) | 238 |
|
||||
|[intel/intel-extension-for-transformers](https://github.com/intel/intel-extension-for-transformers) | 237 |
|
||||
|[monarch-initiative/ontogpt](https://github.com/monarch-initiative/ontogpt) | 236 |
|
||||
|[wandb/edu](https://github.com/wandb/edu) | 231 |
|
||||
|[recalign/RecAlign](https://github.com/recalign/RecAlign) | 229 |
|
||||
|[alvarosevilla95/autolang](https://github.com/alvarosevilla95/autolang) | 223 |
|
||||
|[kaleido-lab/dolphin](https://github.com/kaleido-lab/dolphin) | 221 |
|
||||
|[JohnSnowLabs/nlptest](https://github.com/JohnSnowLabs/nlptest) | 220 |
|
||||
|[paolorechia/learn-langchain](https://github.com/paolorechia/learn-langchain) | 219 |
|
||||
|[Safiullah-Rahu/CSV-AI](https://github.com/Safiullah-Rahu/CSV-AI) | 215 |
|
||||
|[Haste171/langchain-chatbot](https://github.com/Haste171/langchain-chatbot) | 215 |
|
||||
|[steamship-packages/langchain-agent-production-starter](https://github.com/steamship-packages/langchain-agent-production-starter) | 214 |
|
||||
|[airobotlab/KoChatGPT](https://github.com/airobotlab/KoChatGPT) | 213 |
|
||||
|[filip-michalsky/SalesGPT](https://github.com/filip-michalsky/SalesGPT) | 211 |
|
||||
|[marella/chatdocs](https://github.com/marella/chatdocs) | 207 |
|
||||
|[su77ungr/CASALIOY](https://github.com/su77ungr/CASALIOY) | 200 |
|
||||
|[shaman-ai/agent-actors](https://github.com/shaman-ai/agent-actors) | 195 |
|
||||
|[plchld/InsightFlow](https://github.com/plchld/InsightFlow) | 189 |
|
||||
|[jbrukh/gpt-jargon](https://github.com/jbrukh/gpt-jargon) | 186 |
|
||||
|[hwchase17/langchain-streamlit-template](https://github.com/hwchase17/langchain-streamlit-template) | 185 |
|
||||
|[huchenxucs/ChatDB](https://github.com/huchenxucs/ChatDB) | 179 |
|
||||
|[benthecoder/ClassGPT](https://github.com/benthecoder/ClassGPT) | 178 |
|
||||
|[hwchase17/chroma-langchain](https://github.com/hwchase17/chroma-langchain) | 178 |
|
||||
|[radi-cho/datasetGPT](https://github.com/radi-cho/datasetGPT) | 177 |
|
||||
|[jiran214/GPT-vup](https://github.com/jiran214/GPT-vup) | 176 |
|
||||
|[rsaryev/talk-codebase](https://github.com/rsaryev/talk-codebase) | 174 |
|
||||
|[edreisMD/plugnplai](https://github.com/edreisMD/plugnplai) | 174 |
|
||||
|[gia-guar/JARVIS-ChatGPT](https://github.com/gia-guar/JARVIS-ChatGPT) | 172 |
|
||||
|[hardbyte/qabot](https://github.com/hardbyte/qabot) | 171 |
|
||||
|[shamspias/customizable-gpt-chatbot](https://github.com/shamspias/customizable-gpt-chatbot) | 165 |
|
||||
|[gustavz/DataChad](https://github.com/gustavz/DataChad) | 164 |
|
||||
|[yasyf/compress-gpt](https://github.com/yasyf/compress-gpt) | 163 |
|
||||
|[SamPink/dev-gpt](https://github.com/SamPink/dev-gpt) | 161 |
|
||||
|[yuanjie-ai/ChatLLM](https://github.com/yuanjie-ai/ChatLLM) | 161 |
|
||||
|[pablomarin/GPT-Azure-Search-Engine](https://github.com/pablomarin/GPT-Azure-Search-Engine) | 160 |
|
||||
|[jondurbin/airoboros](https://github.com/jondurbin/airoboros) | 157 |
|
||||
|[fengyuli-dev/multimedia-gpt](https://github.com/fengyuli-dev/multimedia-gpt) | 157 |
|
||||
|[PradipNichite/Youtube-Tutorials](https://github.com/PradipNichite/Youtube-Tutorials) | 156 |
|
||||
|[nicknochnack/LangchainDocuments](https://github.com/nicknochnack/LangchainDocuments) | 155 |
|
||||
|[ethanyanjiali/minChatGPT](https://github.com/ethanyanjiali/minChatGPT) | 155 |
|
||||
|[ccurme/yolopandas](https://github.com/ccurme/yolopandas) | 154 |
|
||||
|[chakkaradeep/pyCodeAGI](https://github.com/chakkaradeep/pyCodeAGI) | 153 |
|
||||
|[preset-io/promptimize](https://github.com/preset-io/promptimize) | 150 |
|
||||
|[onlyphantom/llm-python](https://github.com/onlyphantom/llm-python) | 148 |
|
||||
|[Azure-Samples/azure-search-power-skills](https://github.com/Azure-Samples/azure-search-power-skills) | 146 |
|
||||
|[realminchoi/babyagi-ui](https://github.com/realminchoi/babyagi-ui) | 144 |
|
||||
|[microsoft/azure-openai-in-a-day-workshop](https://github.com/microsoft/azure-openai-in-a-day-workshop) | 144 |
|
||||
|[jmpaz/promptlib](https://github.com/jmpaz/promptlib) | 143 |
|
||||
|[shauryr/S2QA](https://github.com/shauryr/S2QA) | 142 |
|
||||
|[handrew/browserpilot](https://github.com/handrew/browserpilot) | 141 |
|
||||
|[Jaseci-Labs/jaseci](https://github.com/Jaseci-Labs/jaseci) | 140 |
|
||||
|[Klingefjord/chatgpt-telegram](https://github.com/Klingefjord/chatgpt-telegram) | 140 |
|
||||
|[WongSaang/chatgpt-ui-server](https://github.com/WongSaang/chatgpt-ui-server) | 139 |
|
||||
|[ibiscp/LLM-IMDB](https://github.com/ibiscp/LLM-IMDB) | 139 |
|
||||
|[menloparklab/langchain-cohere-qdrant-doc-retrieval](https://github.com/menloparklab/langchain-cohere-qdrant-doc-retrieval) | 138 |
|
||||
|[hirokidaichi/wanna](https://github.com/hirokidaichi/wanna) | 137 |
|
||||
|[steamship-core/vercel-examples](https://github.com/steamship-core/vercel-examples) | 137 |
|
||||
|[deeppavlov/dream](https://github.com/deeppavlov/dream) | 136 |
|
||||
|[miaoshouai/miaoshouai-assistant](https://github.com/miaoshouai/miaoshouai-assistant) | 135 |
|
||||
|[sugarforever/LangChain-Tutorials](https://github.com/sugarforever/LangChain-Tutorials) | 135 |
|
||||
|[yasyf/summ](https://github.com/yasyf/summ) | 135 |
|
||||
|[peterw/StoryStorm](https://github.com/peterw/StoryStorm) | 134 |
|
||||
|[vaibkumr/prompt-optimizer](https://github.com/vaibkumr/prompt-optimizer) | 132 |
|
||||
|[ju-bezdek/langchain-decorators](https://github.com/ju-bezdek/langchain-decorators) | 130 |
|
||||
|[homanp/vercel-langchain](https://github.com/homanp/vercel-langchain) | 128 |
|
||||
|[Teahouse-Studios/akari-bot](https://github.com/Teahouse-Studios/akari-bot) | 127 |
|
||||
|[petehunt/langchain-github-bot](https://github.com/petehunt/langchain-github-bot) | 125 |
|
||||
|[eunomia-bpf/GPTtrace](https://github.com/eunomia-bpf/GPTtrace) | 122 |
|
||||
|[fixie-ai/fixie-examples](https://github.com/fixie-ai/fixie-examples) | 122 |
|
||||
|[Aggregate-Intellect/practical-llms](https://github.com/Aggregate-Intellect/practical-llms) | 120 |
|
||||
|[davila7/file-gpt](https://github.com/davila7/file-gpt) | 120 |
|
||||
|[Azure-Samples/azure-search-openai-demo-csharp](https://github.com/Azure-Samples/azure-search-openai-demo-csharp) | 119 |
|
||||
|[prof-frink-lab/slangchain](https://github.com/prof-frink-lab/slangchain) | 117 |
|
||||
|[aurelio-labs/arxiv-bot](https://github.com/aurelio-labs/arxiv-bot) | 117 |
|
||||
|[zenml-io/zenml-projects](https://github.com/zenml-io/zenml-projects) | 116 |
|
||||
|[flurb18/AgentOoba](https://github.com/flurb18/AgentOoba) | 114 |
|
||||
|[kaarthik108/snowChat](https://github.com/kaarthik108/snowChat) | 112 |
|
||||
|[RedisVentures/redis-openai-qna](https://github.com/RedisVentures/redis-openai-qna) | 111 |
|
||||
|[solana-labs/chatgpt-plugin](https://github.com/solana-labs/chatgpt-plugin) | 111 |
|
||||
|[kulltc/chatgpt-sql](https://github.com/kulltc/chatgpt-sql) | 109 |
|
||||
|[summarizepaper/summarizepaper](https://github.com/summarizepaper/summarizepaper) | 109 |
|
||||
|[Azure-Samples/miyagi](https://github.com/Azure-Samples/miyagi) | 106 |
|
||||
|[ssheng/BentoChain](https://github.com/ssheng/BentoChain) | 106 |
|
||||
|[voxel51/voxelgpt](https://github.com/voxel51/voxelgpt) | 105 |
|
||||
|[mallahyari/drqa](https://github.com/mallahyari/drqa) | 103 |
|
||||
@@ -120,7 +120,8 @@
|
||||
" history = []\n",
|
||||
" while True:\n",
|
||||
" user_input = input(\"\\n>>> input >>>\\n>>>: \")\n",
|
||||
" if user_input == 'q': break\n",
|
||||
" if user_input == \"q\":\n",
|
||||
" break\n",
|
||||
" history.append(HumanMessage(content=user_input))\n",
|
||||
" history.append(llm(history))"
|
||||
]
|
||||
|
||||
@@ -1,17 +1,17 @@
|
||||
# Databerry
|
||||
# Chaindesk
|
||||
|
||||
>[Databerry](https://databerry.ai) is an [open source](https://github.com/gmpetrov/databerry) document retrieval platform that helps to connect your personal data with Large Language Models.
|
||||
>[Chaindesk](https://chaindesk.ai) is an [open source](https://github.com/gmpetrov/databerry) document retrieval platform that helps to connect your personal data with Large Language Models.
|
||||
|
||||
|
||||
## Installation and Setup
|
||||
|
||||
We need to sign up for Databerry, create a datastore, add some data and get your datastore api endpoint url.
|
||||
We need the [API Key](https://docs.databerry.ai/api-reference/authentication).
|
||||
We need to sign up for Chaindesk, create a datastore, add some data and get your datastore api endpoint url.
|
||||
We need the [API Key](https://docs.chaindesk.ai/api-reference/authentication).
|
||||
|
||||
## Retriever
|
||||
|
||||
See a [usage example](/docs/modules/data_connection/retrievers/integrations/databerry.html).
|
||||
See a [usage example](/docs/modules/data_connection/retrievers/integrations/chaindesk.html).
|
||||
|
||||
```python
|
||||
from langchain.retrievers import DataberryRetriever
|
||||
from langchain.retrievers import ChaindeskRetriever
|
||||
```
|
||||
@@ -8,7 +8,7 @@ pip install cnos-connector
|
||||
```
|
||||
|
||||
## Connecting to CnosDB
|
||||
You can connect to CnosDB using the SQLDatabase.from_cnosdb() method.
|
||||
You can connect to CnosDB using the `SQLDatabase.from_cnosdb()` method.
|
||||
### Syntax
|
||||
```python
|
||||
def SQLDatabase.from_cnosdb(url: str = "127.0.0.1:8902",
|
||||
@@ -31,7 +31,6 @@ Args:
|
||||
## Examples
|
||||
```python
|
||||
# Connecting to CnosDB with SQLDatabase Wrapper
|
||||
from cnosdb_connector import make_cnosdb_langchain_uri
|
||||
from langchain import SQLDatabase
|
||||
|
||||
db = SQLDatabase.from_cnosdb()
|
||||
@@ -43,7 +42,7 @@ from langchain.chat_models import ChatOpenAI
|
||||
llm = ChatOpenAI(temperature=0, model_name="gpt-3.5-turbo")
|
||||
```
|
||||
|
||||
### SQL Chain
|
||||
### SQL Database Chain
|
||||
This example demonstrates the use of the SQL Chain for answering a question over a CnosDB.
|
||||
```python
|
||||
from langchain import SQLDatabaseChain
|
||||
@@ -51,15 +50,15 @@ from langchain import SQLDatabaseChain
|
||||
db_chain = SQLDatabaseChain.from_llm(llm, db, verbose=True)
|
||||
|
||||
db_chain.run(
|
||||
"What is the average fa of test table that time between November 3,2022 and November 4, 2022?"
|
||||
"What is the average temperature of air at station XiaoMaiDao between October 19, 2022 and October 20, 2022?"
|
||||
)
|
||||
```
|
||||
```shell
|
||||
> Entering new chain...
|
||||
What is the average fa of test table that time between November 3, 2022 and November 4, 2022?
|
||||
SQLQuery:SELECT AVG(fa) FROM test WHERE time >= '2022-11-03' AND time < '2022-11-04'
|
||||
SQLResult: [(2.0,)]
|
||||
Answer:The average fa of the test table between November 3, 2022, and November 4, 2022, is 2.0.
|
||||
What is the average temperature of air at station XiaoMaiDao between October 19, 2022 and October 20, 2022?
|
||||
SQLQuery:SELECT AVG(temperature) FROM air WHERE station = 'XiaoMaiDao' AND time >= '2022-10-19' AND time < '2022-10-20'
|
||||
SQLResult: [(68.0,)]
|
||||
Answer:The average temperature of air at station XiaoMaiDao between October 19, 2022 and October 20, 2022 is 68.0.
|
||||
> Finished chain.
|
||||
```
|
||||
### SQL Database Agent
|
||||
@@ -73,36 +72,39 @@ agent = create_sql_agent(llm=llm, toolkit=toolkit, verbose=True)
|
||||
```
|
||||
```python
|
||||
agent.run(
|
||||
"What is the average fa of test table that time between November 3, 2022 and November 4, 2022?"
|
||||
"What is the average temperature of air at station XiaoMaiDao between October 19, 2022 and October 20, 2022?"
|
||||
)
|
||||
```
|
||||
```shell
|
||||
> Entering new chain...
|
||||
Action: sql_db_list_tables
|
||||
Action Input: ""
|
||||
Observation: test
|
||||
Thought:The relevant table is "test". I should query the schema of this table to see the column names.
|
||||
Observation: air
|
||||
Thought:The "air" table seems relevant to the question. I should query the schema of the "air" table to see what columns are available.
|
||||
Action: sql_db_schema
|
||||
Action Input: "test"
|
||||
Action Input: "air"
|
||||
Observation:
|
||||
CREATE TABLE test (
|
||||
CREATE TABLE air (
|
||||
pressure FLOAT,
|
||||
station STRING,
|
||||
temperature FLOAT,
|
||||
time TIMESTAMP,
|
||||
fa BIGINT
|
||||
visibility FLOAT
|
||||
)
|
||||
|
||||
/*
|
||||
3 rows from test table:
|
||||
fa time
|
||||
1 2022-11-03T06:20:11
|
||||
2 2022-11-03T06:20:11.000000001
|
||||
3 2022-11-03T06:20:11.000000002
|
||||
3 rows from air table:
|
||||
pressure station temperature time visibility
|
||||
75.0 XiaoMaiDao 67.0 2022-10-19T03:40:00 54.0
|
||||
77.0 XiaoMaiDao 69.0 2022-10-19T04:40:00 56.0
|
||||
76.0 XiaoMaiDao 68.0 2022-10-19T05:40:00 55.0
|
||||
*/
|
||||
Thought:The relevant column is "fa" in the "test" table. I can now construct the query to calculate the average "fa" between the specified time range.
|
||||
Thought:The "temperature" column in the "air" table is relevant to the question. I can query the average temperature between the specified dates.
|
||||
Action: sql_db_query
|
||||
Action Input: "SELECT AVG(fa) FROM test WHERE time >= '2022-11-03' AND time < '2022-11-04'"
|
||||
Observation: [(2.0,)]
|
||||
Thought:The average "fa" of the "test" table between November 3, 2022 and November 4, 2022 is 2.0.
|
||||
Final Answer: 2.0
|
||||
Action Input: "SELECT AVG(temperature) FROM air WHERE station = 'XiaoMaiDao' AND time >= '2022-10-19' AND time <= '2022-10-20'"
|
||||
Observation: [(68.0,)]
|
||||
Thought:The average temperature of air at station XiaoMaiDao between October 19, 2022 and October 20, 2022 is 68.0.
|
||||
Final Answer: 68.0
|
||||
|
||||
> Finished chain.
|
||||
```
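The query the agent settles on can be checked by hand against the three sample rows shown in the schema observation. The sketch below is only an illustration: it uses Python's built-in `sqlite3` as a stand-in for CnosDB (an assumption, since the two engines differ in types and time handling), stores timestamps as ISO-8601 text, and loads only the three sample rows from the log, not the full `air` table.

```python
import sqlite3

# sqlite3 stands in for CnosDB here (assumption); the table mirrors the
# `air` schema and the three sample rows printed in the agent log.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE air ("
    "pressure REAL, station TEXT, temperature REAL, time TEXT, visibility REAL)"
)
conn.executemany(
    "INSERT INTO air VALUES (?, ?, ?, ?, ?)",
    [
        (75.0, "XiaoMaiDao", 67.0, "2022-10-19T03:40:00", 54.0),
        (77.0, "XiaoMaiDao", 69.0, "2022-10-19T04:40:00", 56.0),
        (76.0, "XiaoMaiDao", 68.0, "2022-10-19T05:40:00", 55.0),
    ],
)

# Half-open time range, as in the SQLDatabaseChain query: the start date is
# included and the end date is excluded. ISO-8601 text sorts chronologically,
# so plain string comparison is enough in this sketch.
query = (
    "SELECT AVG(temperature) FROM air "
    "WHERE station = 'XiaoMaiDao' "
    "AND time >= '2022-10-19' AND time < '2022-10-20'"
)
result = conn.execute(query).fetchall()
print(result)  # [(68.0,)]
```

On these three rows the average is (67 + 69 + 68) / 3 = 68.0, which matches the `SQLResult` in both logs. Note that the agent's query uses `time <= '2022-10-20'` while the chain's uses `time < '2022-10-20'`; the two only differ for a row stamped exactly at midnight on October 20.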
|
||||
|
||||
19
docs/extras/ecosystem/integrations/datadog_logs.mdx
Normal file
@@ -0,0 +1,19 @@
# Datadog Logs

>[Datadog](https://www.datadoghq.com/) is a monitoring and analytics platform for cloud-scale applications.

## Installation and Setup

```bash
pip install datadog_api_client
```

We must initialize the loader with the Datadog API key and APP key, and we need to set up the query to extract the desired logs.

## Document Loader

See a [usage example](/docs/modules/data_connection/document_loaders/integrations/datadog_logs.html).

```python
from langchain.document_loaders import DatadogLogsLoader
```
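As a rough sketch of the setup described above, the loader takes a query plus the two keys. The environment variable names `DD_API_KEY` and `DD_APP_KEY`, the helper names, and the sample query string are assumptions made for this illustration. The keyword names `query`, `api_key` and `app_key` are believed to match the loader's constructor, but check them against the usage example before relying on them.

```python
import os
from typing import Mapping


def datadog_loader_kwargs(query: str, env: Mapping[str, str] = os.environ) -> dict:
    """Collect what the loader needs: a log query plus the API and APP keys."""
    return {
        "query": query,
        "api_key": env["DD_API_KEY"],  # assumed environment variable name
        "app_key": env["DD_APP_KEY"],  # assumed environment variable name
    }


def load_datadog_logs(query: str):
    """Build the loader and fetch the matching logs as documents."""
    # Imported lazily so the helper above works without langchain installed.
    from langchain.document_loaders import DatadogLogsLoader

    loader = DatadogLogsLoader(**datadog_loader_kwargs(query))
    return loader.load()


# Hypothetical usage; requires real keys in the environment:
# documents = load_datadog_logs("service:agent status:error")
```

Reading the keys from the environment keeps them out of notebooks and source control; passing them directly to the loader works the same way.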
|
||||
@@ -1,7 +1,7 @@
|
||||
# Grobid
|
||||
|
||||
This page covers how to use the Grobid to parse articles for LangChain.
|
||||
It is seperated into two parts: installation and running the server
|
||||
It is separated into two parts: installation and running the server
|
||||
|
||||
## Installation and Setup
|
||||
#Ensure You have Java installed
|
||||
|
||||
@@ -10,7 +10,7 @@ For Feedback, Issues, Contributions - please raise an issue here:
|
||||
Main principles and benefits:
|
||||
|
||||
- more `pythonic` way of writing code
|
||||
- write multiline prompts that wont break your code flow with indentation
|
||||
- write multiline prompts that won't break your code flow with indentation
|
||||
- making use of IDE in-built support for **hinting**, **type checking** and **popup with docs** to quickly peek in the function to see the prompt, parameters it consumes etc.
|
||||
- leverage all the power of 🦜🔗 LangChain ecosystem
|
||||
- adding support for **optional parameters**
|
||||
@@ -31,7 +31,7 @@ def write_me_short_post(topic:str, platform:str="twitter", audience:str = "devel
|
||||
"""
|
||||
return
|
||||
|
||||
# run it naturaly
|
||||
# run it naturally
|
||||
write_me_short_post(topic="starwars")
|
||||
# or
|
||||
write_me_short_post(topic="starwars", platform="reddit")
|
||||
@@ -122,7 +122,7 @@ await write_me_short_post(topic="old movies")
|
||||
|
||||
# Simplified streaming
|
||||
|
||||
If we wan't to leverage streaming:
|
||||
If we want to leverage streaming:
|
||||
- we need to define prompt as async function
|
||||
- turn on the streaming on the decorator, or we can define PromptType with streaming on
|
||||
- capture the stream using StreamingContext
|
||||
@@ -149,7 +149,7 @@ async def write_me_short_post(topic:str, platform:str="twitter", audience:str =
|
||||
|
||||
|
||||
|
||||
# just an arbitrary function to demonstrate the streaming... wil be some websockets code in the real world
|
||||
# just an arbitrary function to demonstrate the streaming... will be some websockets code in the real world
|
||||
tokens=[]
|
||||
def capture_stream_func(new_token:str):
|
||||
tokens.append(new_token)
|
||||
@@ -250,7 +250,7 @@ the roles here are model native roles (assistant, user, system for chatGPT)
|
||||
|
||||
# Optional sections
|
||||
- you can define whole sections of your prompt that should be optional
|
||||
- if any input in the section is missing, the whole section wont be rendered
|
||||
- if any input in the section is missing, the whole section won't be rendered
|
||||
|
||||
the syntax for this is as follows:
|
||||
|
||||
@@ -273,7 +273,7 @@ def prompt_with_optional_partials():
|
||||
# Output parsers
|
||||
|
||||
- llm_prompt decorator natively tries to detect the best output parser based on the output type. (if not set, it returns the raw string)
|
||||
- list, dict and pydantic outputs are also supported natively (automaticaly)
|
||||
- list, dict and pydantic outputs are also supported natively (automatically)
|
||||
|
||||
``` python
|
||||
# this code example is complete and should run as it is
|
||||
|
||||
@@ -28,4 +28,4 @@ To import this vectorstore:
|
||||
from langchain.vectorstores import Marqo
|
||||
```
|
||||
|
||||
For a more detailed walkthrough of the Marqo wrapper and some of its unique features, see [this notebook](../modules/data_connection/vectorstores/integrations/marqo.ipynb)
|
||||
For a more detailed walkthrough of the Marqo wrapper and some of its unique features, see [this notebook](/docs/modules/data_connection/vectorstores/integrations/marqo.html)
|
||||
|
||||
@@ -18,7 +18,7 @@ We also deliver with live demo on huggingface! Please checkout our [huggingface
|
||||
## Installation and Setup
|
||||
- Install the Python SDK with `pip install clickhouse-connect`
|
||||
|
||||
### Setting up envrionments
|
||||
### Setting up environments
|
||||
|
||||
There are two ways to set up parameters for myscale index.
|
||||
|
||||
|
||||
@@ -39,7 +39,7 @@ vectara = Vectara(
|
||||
```
|
||||
The customer_id, corpus_id and api_key are optional, and if they are not supplied will be read from the environment variables `VECTARA_CUSTOMER_ID`, `VECTARA_CORPUS_ID` and `VECTARA_API_KEY`, respectively.
|
||||
|
||||
Afer you have the vectorstore, you can `add_texts` or `add_documents` as per the standard `VectorStore` interface, for example:
|
||||
After you have the vectorstore, you can `add_texts` or `add_documents` as per the standard `VectorStore` interface, for example:
|
||||
|
||||
```python
|
||||
vectara.add_texts(["to be or not to be", "that is the question"])
|
||||
|
||||
@@ -1,6 +1,7 @@
|
||||
{
|
||||
"cells": [
|
||||
{
|
||||
"attachments": {},
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
@@ -16,6 +17,7 @@
|
||||
]
|
||||
},
|
||||
{
|
||||
"attachments": {},
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
@@ -28,10 +30,11 @@
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"!pip install langkit -q"
|
||||
"%pip install langkit openai langchain"
|
||||
]
|
||||
},
|
||||
{
|
||||
"attachments": {},
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
@@ -54,6 +57,7 @@
|
||||
]
|
||||
},
|
||||
{
|
||||
"attachments": {},
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"tags": []
|
||||
@@ -63,6 +67,7 @@
|
||||
]
|
||||
},
|
||||
{
|
||||
"attachments": {},
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
@@ -125,16 +130,7 @@
|
||||
" ]\n",
|
||||
")\n",
|
||||
"print(result)\n",
|
||||
"# you don't need to call flush, this will occur periodically, but to demo let's not wait.\n",
|
||||
"whylabs.flush()"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# you don't need to call close to write profiles to WhyLabs, upload will occur periodically, but to demo let's not wait.\n",
|
||||
"whylabs.close()"
|
||||
]
|
||||
}
|
||||
@@ -155,7 +151,7 @@
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.10.6"
|
||||
"version": "3.8.10"
|
||||
},
|
||||
"vscode": {
|
||||
"interpreter": {
|
||||
|
||||
@@ -1,6 +1,6 @@
|
||||
# YouTube
|
||||
|
||||
>[YouTube](https://www.youtube.com/) is an online video sharing and social media platform created by Google.
|
||||
>[YouTube](https://www.youtube.com/) is an online video sharing and social media platform by Google.
|
||||
> We download the `YouTube` transcripts and video information.
|
||||
|
||||
## Installation and Setup
|
||||
|
||||
@@ -117,11 +117,11 @@
|
||||
"\n",
|
||||
"\n",
|
||||
"# Initialize the language model\n",
|
||||
"# You can add your own OpenAI API key by adding openai_api_key=\"<your_api_key>\" \n",
|
||||
"# You can add your own OpenAI API key by adding openai_api_key=\"<your_api_key>\"\n",
|
||||
"llm = ChatOpenAI(temperature=0, model=\"gpt-3.5-turbo-0613\")\n",
|
||||
"\n",
|
||||
"# Initialize the SerpAPIWrapper for search functionality\n",
|
||||
"#Replace <your_api_key> in openai_api_key=\"<your_api_key>\" with your actual SerpAPI key.\n",
|
||||
"# Set the SERPAPI_API_KEY environment variable, or pass serpapi_api_key=\"<your_api_key>\" to SerpAPIWrapper.\n",
|
||||
"search = SerpAPIWrapper()\n",
|
||||
"\n",
|
||||
"# Define a list of tools offered by the agent\n",
|
||||
@@ -130,7 +130,7 @@
|
||||
" name=\"Search\",\n",
|
||||
" func=search.run,\n",
|
||||
" coroutine=search.arun,\n",
|
||||
" description=\"Useful when you need to answer questions about current events. You should ask targeted questions.\"\n",
|
||||
" description=\"Useful when you need to answer questions about current events. You should ask targeted questions.\",\n",
|
||||
" ),\n",
|
||||
"]"
|
||||
]
|
||||
@@ -143,8 +143,12 @@
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"functions_agent = initialize_agent(tools, llm, agent=AgentType.OPENAI_MULTI_FUNCTIONS, verbose=False)\n",
|
||||
"conversations_agent = initialize_agent(tools, llm, agent=AgentType.CHAT_ZERO_SHOT_REACT_DESCRIPTION, verbose=False)"
|
||||
"functions_agent = initialize_agent(\n",
|
||||
" tools, llm, agent=AgentType.OPENAI_MULTI_FUNCTIONS, verbose=False\n",
|
||||
")\n",
|
||||
"conversations_agent = initialize_agent(\n",
|
||||
" tools, llm, agent=AgentType.CHAT_ZERO_SHOT_REACT_DESCRIPTION, verbose=False\n",
|
||||
")"
|
||||
]
|
||||
},
|
||||
{
|
||||
@@ -193,20 +197,20 @@
|
||||
"\n",
|
||||
"results = []\n",
|
||||
"agents = [functions_agent, conversations_agent]\n",
|
||||
"concurrency_level = 6 # How many concurrent agents to run. May need to decrease if OpenAI is rate limiting.\n",
|
||||
"concurrency_level = 6 # How many concurrent agents to run. May need to decrease if OpenAI is rate limiting.\n",
|
||||
"\n",
|
||||
"# We will only run the first 20 examples of this dataset to speed things up\n",
|
||||
"# This will lead to larger confidence intervals downstream.\n",
|
||||
"batch = []\n",
|
||||
"for example in tqdm(dataset[:20]):\n",
|
||||
" batch.extend([agent.acall(example['inputs']) for agent in agents])\n",
|
||||
" batch.extend([agent.acall(example[\"inputs\"]) for agent in agents])\n",
|
||||
" if len(batch) >= concurrency_level:\n",
|
||||
" batch_results = await asyncio.gather(*batch, return_exceptions=True)\n",
|
||||
" results.extend(list(zip(*[iter(batch_results)]*2)))\n",
|
||||
" results.extend(list(zip(*[iter(batch_results)] * 2)))\n",
|
||||
" batch = []\n",
|
||||
"if batch:\n",
|
||||
" batch_results = await asyncio.gather(*batch, return_exceptions=True)\n",
|
||||
" results.extend(list(zip(*[iter(batch_results)]*2)))"
|
||||
" results.extend(list(zip(*[iter(batch_results)] * 2)))"
|
||||
]
|
||||
},
|
||||
{
|
||||
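The batching loop in this hunk relies on two idioms: `asyncio.gather(..., return_exceptions=True)` so that one failing call does not cancel the batch, and `zip(*[iter(xs)] * 2)` to regroup the flat result list into per-example pairs. A minimal sketch with a hypothetical `fake_agent_call` coroutine in place of `agent.acall` (all names other than the two idioms are assumptions made for illustration):

```python
import asyncio


async def fake_agent_call(agent_name: str, question: str) -> str:
    # Hypothetical stand-in for agent.acall(...); fails on demand.
    await asyncio.sleep(0)
    if "fail" in question:
        raise ValueError(f"{agent_name} could not answer")
    return f"{agent_name}: {question}"


async def run_in_batches(questions, agent_names, concurrency_level=4):
    results, batch = [], []
    group = len(agent_names)
    for question in questions:
        # One coroutine per agent, kept adjacent so they can be regrouped later.
        batch.extend(fake_agent_call(name, question) for name in agent_names)
        if len(batch) >= concurrency_level:
            batch_results = await asyncio.gather(*batch, return_exceptions=True)
            # The same iterator repeated `group` times yields consecutive chunks.
            results.extend(zip(*[iter(batch_results)] * group))
            batch = []
    if batch:
        batch_results = await asyncio.gather(*batch, return_exceptions=True)
        results.extend(zip(*[iter(batch_results)] * group))
    return results


paired = asyncio.run(run_in_batches(["q1", "please fail", "q3"], ["a", "b"]))
for pair in paired:
    print(pair)
```

Outside a notebook there is no running event loop, which is why the sketch uses `asyncio.run` rather than a top-level `await`. Exceptions come back as values in the tuples, so downstream code has to check for them (the notebook does this with `isinstance(pred_a, dict)`).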
@@ -230,11 +234,12 @@
|
||||
"source": [
|
||||
"import random\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"def predict_preferences(dataset, results) -> list:\n",
|
||||
" preferences = []\n",
|
||||
"\n",
|
||||
" for example, (res_a, res_b) in zip(dataset, results):\n",
|
||||
" input_ = example['inputs']\n",
|
||||
" input_ = example[\"inputs\"]\n",
|
||||
" # Flip a coin to reduce persistent position bias\n",
|
||||
" if random.random() < 0.5:\n",
|
||||
" pred_a, pred_b = res_a, res_b\n",
|
||||
@@ -243,16 +248,16 @@
|
||||
" pred_a, pred_b = res_b, res_a\n",
|
||||
" a, b = \"b\", \"a\"\n",
|
||||
" eval_res = eval_chain.evaluate_string_pairs(\n",
|
||||
" prediction=pred_a['output'] if isinstance(pred_a, dict) else str(pred_a),\n",
|
||||
" prediction_b=pred_b['output'] if isinstance(pred_b, dict) else str(pred_b),\n",
|
||||
" input=input_\n",
|
||||
" prediction=pred_a[\"output\"] if isinstance(pred_a, dict) else str(pred_a),\n",
|
||||
" prediction_b=pred_b[\"output\"] if isinstance(pred_b, dict) else str(pred_b),\n",
|
||||
" input=input_,\n",
|
||||
" )\n",
|
||||
" if eval_res[\"value\"] == \"A\":\n",
|
||||
" preferences.append(a)\n",
|
||||
" elif eval_res[\"value\"] == \"B\":\n",
|
||||
" preferences.append(b)\n",
|
||||
" else:\n",
|
||||
" preferences.append(None) # No preference\n",
|
||||
" preferences.append(None) # No preference\n",
|
||||
" return preferences"
|
||||
]
|
||||
},
|
||||
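The coin flip in `predict_preferences` randomizes which model's output is shown in position A, then maps the judge's verdict back to the original model label. A small sketch of that bookkeeping, using a hypothetical deterministic `longer_answer_judge` instead of an LLM-backed eval chain (both helper names are assumptions for illustration):

```python
import random


def debias_and_judge(res_a, res_b, judge, rng):
    # Randomly decide which result is presented first.
    if rng.random() < 0.5:
        first, second = res_a, res_b
        label_first, label_second = "a", "b"
    else:
        first, second = res_b, res_a
        label_first, label_second = "b", "a"
    verdict = judge(first, second)
    # Translate the positional verdict back to the model label.
    if verdict == "A":
        return label_first
    if verdict == "B":
        return label_second
    return None  # No preference


def longer_answer_judge(first: str, second: str):
    # Hypothetical judge: prefers the longer answer, whatever its position.
    if len(first) == len(second):
        return None
    return "A" if len(first) > len(second) else "B"


rng = random.Random(0)
preferences = [
    debias_and_judge("short", "a much longer answer", longer_answer_judge, rng)
    for _ in range(10)
]
print(preferences)
```

Because this judge has no position bias, every trial maps back to `"b"` regardless of the shuffle. With a judge that favors position A, the shuffle spreads that bias evenly across both models instead of crediting one of them.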
@@ -298,10 +303,7 @@
|
||||
" \"b\": \"Structured Chat Agent\",\n",
|
||||
"}\n",
|
||||
"counts = Counter(preferences)\n",
|
||||
"pref_ratios = {\n",
|
||||
" k: v/len(preferences) for k, v in\n",
|
||||
" counts.items()\n",
|
||||
"}\n",
|
||||
"pref_ratios = {k: v / len(preferences) for k, v in counts.items()}\n",
|
||||
"for k, v in pref_ratios.items():\n",
|
||||
" print(f\"{name_map.get(k)}: {v:.2%}\")"
|
||||
]
|
||||
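One detail of the `pref_ratios` computation is worth seeing on toy data: dividing by `len(preferences)` keeps the "no preference" (`None`) outcomes in the denominator, so the ratios for the two models need not sum to 1. A minimal sketch with made-up preferences (the `decided_ratios` variant is an illustrative alternative, not something the notebook computes):

```python
from collections import Counter

# Made-up outcomes: three wins for "a", one for "b", one tie.
preferences = ["a", "b", "a", None, "a"]

counts = Counter(preferences)

# As in the notebook: share of all trials, ties included.
pref_ratios = {k: v / len(preferences) for k, v in counts.items()}

# Alternative: share of decided trials only.
decided = len(preferences) - counts[None]
decided_ratios = {k: v / decided for k, v in counts.items() if k is not None}

print(pref_ratios)
print(decided_ratios)
```

Note that `None` is itself a key in `pref_ratios`, so a lookup such as `name_map.get(k)` returns `None` for that entry when printing.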
@@ -327,13 +329,16 @@
|
||||
"source": [
|
||||
"from math import sqrt\n",
|
||||
"\n",
|
||||
"def wilson_score_interval(preferences: list, which: str = \"a\", z: float = 1.96) -> tuple:\n",
|
||||
"\n",
|
||||
"def wilson_score_interval(\n",
|
||||
" preferences: list, which: str = \"a\", z: float = 1.96\n",
|
||||
") -> tuple:\n",
|
||||
" \"\"\"Estimate the confidence interval using the Wilson score.\n",
|
||||
" \n",
|
||||
"\n",
|
||||
" See: https://en.wikipedia.org/wiki/Binomial_proportion_confidence_interval#Wilson_score_interval\n",
|
||||
" for more details, including when to use it and when it should not be used.\n",
|
||||
" \"\"\"\n",
|
||||
" total_preferences = preferences.count('a') + preferences.count('b')\n",
|
||||
" total_preferences = preferences.count(\"a\") + preferences.count(\"b\")\n",
|
||||
" n_s = preferences.count(which)\n",
|
||||
"\n",
|
||||
" if total_preferences == 0:\n",
|
||||
@@ -342,8 +347,11 @@
|
||||
" p_hat = n_s / total_preferences\n",
|
||||
"\n",
|
||||
" denominator = 1 + (z**2) / total_preferences\n",
|
||||
" adjustment = (z / denominator) * sqrt(p_hat*(1-p_hat)/total_preferences + (z**2)/(4*total_preferences*total_preferences))\n",
|
||||
" center = (p_hat + (z**2) / (2*total_preferences)) / denominator\n",
|
||||
" adjustment = (z / denominator) * sqrt(\n",
|
||||
" p_hat * (1 - p_hat) / total_preferences\n",
|
||||
" + (z**2) / (4 * total_preferences * total_preferences)\n",
|
||||
" )\n",
|
||||
" center = (p_hat + (z**2) / (2 * total_preferences)) / denominator\n",
|
||||
" lower_bound = min(max(center - adjustment, 0.0), 1.0)\n",
|
||||
" upper_bound = min(max(center + adjustment, 0.0), 1.0)\n",
|
||||
"\n",
|
||||
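For reference, the Wilson score computation shown across these two hunks can be run on its own. The sketch below follows the same formula; the return value for the zero-trials case is elided in the diff, so `(0.0, 0.0)` here is an assumption, and the sample data is made up.

```python
from math import sqrt


def wilson_score_interval(preferences: list, which: str = "a", z: float = 1.96) -> tuple:
    """Wilson score interval for the share of decided trials won by `which`."""
    total = preferences.count("a") + preferences.count("b")
    n_s = preferences.count(which)
    if total == 0:
        return (0.0, 0.0)  # Assumption: the original's return value is not shown.
    p_hat = n_s / total
    denominator = 1 + (z**2) / total
    adjustment = (z / denominator) * sqrt(
        p_hat * (1 - p_hat) / total + (z**2) / (4 * total * total)
    )
    center = (p_hat + (z**2) / (2 * total)) / denominator
    lower = min(max(center - adjustment, 0.0), 1.0)
    upper = min(max(center + adjustment, 0.0), 1.0)
    return (lower, upper)


# Made-up data: "a" preferred 8 times, "b" twice, one tie (ignored).
sample = ["a"] * 8 + ["b"] * 2 + [None]
low_a, high_a = wilson_score_interval(sample, which="a")
low_b, high_b = wilson_score_interval(sample, which="b")
print(f"a: {low_a:.2%} to {high_a:.2%}")
print(f"b: {low_b:.2%} to {high_b:.2%}")
```

With only ten decided trials the interval for "a" spans roughly 49% to 94%, which illustrates the notebook's remark that running fewer examples leads to wider confidence intervals.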
@@ -369,7 +377,9 @@
|
||||
"source": [
|
||||
"for which_, name in name_map.items():\n",
|
||||
" low, high = wilson_score_interval(preferences, which=which_)\n",
|
||||
" print(f'The \"{name}\" would be preferred between {low:.2%} and {high:.2%} percent of the time (with 95% confidence).')"
|
||||
" print(\n",
|
||||
" f'The \"{name}\" would be preferred between {low:.2%} and {high:.2%} percent of the time (with 95% confidence).'\n",
|
||||
" )"
|
||||
]
|
||||
},
|
||||
{
|
||||
@@ -398,13 +408,16 @@
|
||||
],
|
||||
"source": [
|
||||
"from scipy import stats\n",
|
||||
"\n",
|
||||
"preferred_model = max(pref_ratios, key=pref_ratios.get)\n",
|
||||
"successes = preferences.count(preferred_model)\n",
|
||||
"n = len(preferences) - preferences.count(None)\n",
|
||||
"p_value = stats.binom_test(successes, n, p=0.5, alternative='two-sided')\n",
|
||||
"print(f\"\"\"The p-value is {p_value:.5f}. If the null hypothesis is true (i.e., if the selected eval chain actually has no preference between the models),\n",
|
||||
"p_value = stats.binom_test(successes, n, p=0.5, alternative=\"two-sided\")\n",
|
||||
"print(\n",
|
||||
" f\"\"\"The p-value is {p_value:.5f}. If the null hypothesis is true (i.e., if the selected eval chain actually has no preference between the models),\n",
|
||||
"then there is a {p_value:.5%} chance of observing the {name_map.get(preferred_model)} be preferred at least {successes}\n",
|
||||
"times out of {n} trials.\"\"\")"
|
||||
"times out of {n} trials.\"\"\"\n",
|
||||
")"
|
||||
]
|
||||
},
|
||||
{
|
||||
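The two-sided binomial test used in this cell can also be computed by hand, which makes clear what the p-value means. The sketch below uses only the standard library and sums the probabilities of all outcomes that are no more likely than the observed one; the function name `binom_two_sided_p` is hypothetical. As far as I am aware, newer SciPy releases expose this test as `scipy.stats.binomtest` and no longer ship `binom_test`, so the notebook call may need adjusting depending on the installed version.

```python
from math import comb


def binom_two_sided_p(successes: int, n: int, p: float = 0.5) -> float:
    """Exact two-sided binomial test p-value."""
    # Probability of each possible number of successes under the null.
    pmf = [comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)]
    observed = pmf[successes]
    # Sum outcomes at least as unlikely as the observed one
    # (small relative tolerance to absorb floating-point ties).
    total = sum(prob for prob in pmf if prob <= observed * (1 + 1e-7))
    return min(1.0, total)


# Made-up example: the preferred model wins 8 of 10 decided trials.
p_value = binom_two_sided_p(8, 10)
print(f"p-value: {p_value:.5f}")
```

For 8 wins out of 10 the tail outcomes are 0, 1, 2, 8, 9, and 10 successes, giving 112/1024, so a preference this lopsided is still not significant at the 5% level with so few trials.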
|
||||
@@ -12,7 +12,7 @@
|
||||
"The `CriteriaEvalChain` is a convenient way to predict whether an LLM or Chain's output complies with a set of criteria, so long as you can\n",
|
||||
"describe those criteria in regular language. In this example, you will use the `CriteriaEvalChain` to check whether an output is concise.\n",
|
||||
"\n",
|
||||
"### Step 1: Create the Eval Chain\n",
|
||||
"### Step 1: Load Eval Chain\n",
|
||||
"\n",
|
||||
"First, create the evaluation chain to predict whether outputs are \"concise\"."
|
||||
]
|
||||
@@ -27,11 +27,15 @@
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langchain.chat_models import ChatOpenAI\n",
|
||||
"from langchain.evaluation.criteria import CriteriaEvalChain\n",
|
||||
"from langchain.evaluation import load_evaluator, EvaluatorType\n",
|
||||
"\n",
|
||||
"llm = ChatOpenAI(temperature=0)\n",
|
||||
"eval_llm = ChatOpenAI(model=\"gpt-4\", temperature=0)\n",
|
||||
"criterion = \"conciseness\"\n",
|
||||
"eval_chain = CriteriaEvalChain.from_llm(llm=llm, criteria=criterion)"
|
||||
"eval_chain = load_evaluator(EvaluatorType.CRITERIA, llm=eval_llm, criteria=criterion)\n",
|
||||
"\n",
|
||||
"# Equivalent to:\n",
|
||||
"# from langchain.evaluation import CriteriaEvalChain\n",
|
||||
"# CriteriaEvalChain.from_llm(llm=eval_llm, criteria=criterion)"
|
||||
]
|
||||
},
|
||||
{
|
||||
@@ -54,7 +58,7 @@
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"llm = ChatOpenAI(temperature=0)\n",
|
||||
"query=\"What's the origin of the term synecdoche?\"\n",
|
||||
"query = \"What's the origin of the term synecdoche?\"\n",
|
||||
"prediction = llm.predict(query)"
|
||||
]
|
||||
},
|
||||
@@ -80,7 +84,7 @@
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"{'reasoning': '1. Conciseness: The submission is concise and to the point. It directly answers the question without any unnecessary information. Therefore, the submission meets the criterion of conciseness.\\n\\nY', 'value': 'Y', 'score': 1}\n"
|
||||
"{'reasoning': 'The criterion for this task is conciseness. The submission should be concise and to the point.\\n\\nLooking at the submission, it provides a detailed explanation of the origin of the term \"synecdoche\". It explains the Greek roots of the word and how it entered the English language. \\n\\nWhile the explanation is detailed, it is also concise. It doesn\\'t include unnecessary information or go off on tangents. It sticks to the point, which is explaining the origin of the term.\\n\\nTherefore, the submission meets the criterion of conciseness.\\n\\nY', 'value': 'Y', 'score': 1}\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
@@ -89,40 +93,6 @@
|
||||
"print(eval_result)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 4,
|
||||
"id": "8c4ec9dd-6557-4f23-8480-c822eb6ec552",
|
||||
"metadata": {
|
||||
"tags": []
|
||||
},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"['conciseness',\n",
|
||||
" 'relevance',\n",
|
||||
" 'correctness',\n",
|
||||
" 'coherence',\n",
|
||||
" 'harmfulness',\n",
|
||||
" 'maliciousness',\n",
|
||||
" 'helpfulness',\n",
|
||||
" 'controversiality',\n",
|
||||
" 'mysogyny',\n",
|
||||
" 'criminality',\n",
|
||||
" 'insensitive']"
|
||||
]
|
||||
},
|
||||
"execution_count": 4,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"# For a list of other default supported criteria, try calling `supported_default_criteria`\n",
|
||||
"CriteriaEvalChain.get_supported_default_criteria()"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "c40b1ac7-8f95-48ed-89a2-623bcc746461",
|
||||
@@ -133,6 +103,24 @@
|
||||
"Some criteria may be useful only when there are ground truth reference labels. You can pass these in as well."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 4,
|
||||
"id": "0c41cd19",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"eval_chain = load_evaluator(\n",
|
||||
" EvaluatorType.LABELED_CRITERIA,\n",
|
||||
" llm=eval_llm,\n",
|
||||
" criteria=\"correctness\",\n",
|
||||
")\n",
|
||||
"\n",
|
||||
"# Equivalent to\n",
|
||||
"# from langchain.evaluation import LabeledCriteriaEvalChain\n",
|
||||
"# LabeledCriteriaEvalChain.from_llm(llm=eval_llm, criteria=\"correctness\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 5,
|
||||
@@ -145,62 +133,18 @@
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"With ground truth: 1\n",
|
||||
"Withoutg ground truth: 0\n"
|
||||
"With ground truth: 1\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"eval_chain = CriteriaEvalChain.from_llm(llm=llm, criteria=\"correctness\", requires_reference=True)\n",
|
||||
"\n",
|
||||
"# We can even override the model's learned knowledge using ground truth labels\n",
|
||||
"eval_result = eval_chain.evaluate_strings(\n",
|
||||
" input=\"What is the capital of the US?\",\n",
|
||||
" prediction=\"Topeka, KS\", \n",
|
||||
" reference=\"The capital of the US is Topeka, KS, where it permanently moved from Washington D.C. on May 16, 2023\")\n",
|
||||
"print(f'With ground truth: {eval_result[\"score\"]}')\n",
|
||||
"\n",
|
||||
"eval_chain = CriteriaEvalChain.from_llm(llm=llm, criteria=\"correctness\")\n",
|
||||
"eval_result = eval_chain.evaluate_strings(\n",
|
||||
" input=\"What is the capital of the US?\",\n",
|
||||
" prediction=\"Topeka, KS\", \n",
|
||||
" prediction=\"Topeka, KS\",\n",
|
||||
" reference=\"The capital of the US is Topeka, KS, where it permanently moved from Washington D.C. on May 16, 2023\",\n",
|
||||
")\n",
|
||||
"print(f'Withoutg ground truth: {eval_result[\"score\"]}')"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "2eb7dedb-913a-4d9e-b48a-9521425d1008",
|
||||
"metadata": {
|
||||
"tags": []
|
||||
},
|
||||
"source": [
|
||||
"## Multiple Criteria\n",
|
||||
"\n",
|
||||
"To check whether an output complies with all of a list of default criteria, pass in a list! Be sure to only include criteria that are relevant to the provided information, and avoid mixing criteria that measure opposing things (e.g., harmfulness and helpfulness)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 6,
|
||||
"id": "50c067f7-bc6e-4d6c-ba34-97a72023be27",
|
||||
"metadata": {
|
||||
"tags": []
|
||||
},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"{'reasoning': 'Conciseness:\\n- The submission is one sentence long, which is concise.\\n- The submission directly answers the question without any unnecessary information.\\nConclusion: The submission meets the conciseness criterion.\\n\\nCoherence:\\n- The submission is well-structured and organized.\\n- The submission provides the origin of the term synecdoche and explains the meaning of the Greek words it comes from.\\n- The submission is coherent and easy to understand.\\nConclusion: The submission meets the coherence criterion.', 'value': 'Final conclusion: Y', 'score': None}\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"criteria = [\"conciseness\", \"coherence\"]\n",
|
||||
"eval_chain = CriteriaEvalChain.from_llm(llm=llm, criteria=criteria)\n",
|
||||
"eval_result = eval_chain.evaluate_strings(prediction=prediction, input=query)\n",
|
||||
"print(eval_result)"
|
||||
"print(f'With ground truth: {eval_result[\"score\"]}')"
|
||||
]
|
||||
},
|
||||
{
|
||||
@@ -217,7 +161,7 @@
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 7,
|
||||
"execution_count": 6,
|
||||
"id": "bafa0a11-2617-4663-84bf-24df7d0736be",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
@@ -225,58 +169,22 @@
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"{'reasoning': '1. Criteria: numeric: Does the output contain numeric information?\\n- The submission does not contain any numeric information.\\n- Conclusion: The submission meets the criteria.', 'value': 'Answer: Y', 'score': None}\n"
|
||||
"{'reasoning': 'The criterion is asking if the output contains numeric information. The submission does mention the \"late 16th century,\" which is a numeric information. Therefore, the submission meets the criterion.\\n\\nY', 'value': 'Y', 'score': 1}\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"custom_criterion = {\n",
|
||||
" \"numeric\": \"Does the output contain numeric information?\"\n",
|
||||
"}\n",
|
||||
"custom_criterion = {\"numeric\": \"Does the output contain numeric information?\"}\n",
|
||||
"\n",
|
||||
"eval_chain = CriteriaEvalChain.from_llm(llm=llm, criteria=custom_criterion)\n",
|
||||
"eval_chain = load_evaluator(\n",
|
||||
" EvaluatorType.CRITERIA,\n",
|
||||
" llm=eval_llm,\n",
|
||||
" criteria=custom_criterion,\n",
|
||||
")\n",
|
||||
"eval_result = eval_chain.evaluate_strings(prediction=prediction, input=query)\n",
|
||||
"print(eval_result)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 8,
|
||||
"id": "6db12a16-0058-4a14-8064-8528540963d8",
|
||||
"metadata": {
|
||||
"tags": []
|
||||
},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"Meets criteria: 1\n",
|
||||
"Does not meet criteria: 0\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"# You can specify multiple criteria in the dictionary. We recommend you keep the number criteria to a minimum, however for more reliable results.\n",
|
||||
"\n",
|
||||
"custom_criteria = {\n",
|
||||
" \"complements-user\": \"Does the submission complements the question or the person writing the question in some way?\",\n",
|
||||
" \"positive\": \"Does the submission maintain a positive sentiment throughout?\",\n",
|
||||
" \"active voice\": \"Does the submission maintain an active voice throughout, avoiding state of being verbs?\",\n",
|
||||
"}\n",
|
||||
"\n",
|
||||
"eval_chain = CriteriaEvalChain.from_llm(llm=llm, criteria=custom_criteria)\n",
|
||||
"\n",
|
||||
"# Example that complies\n",
|
||||
"query = \"What's the population of lagos?\"\n",
|
||||
"eval_result = eval_chain.evaluate_strings(prediction=\"I think that's a great question, you're really curious! About 30 million people live in Lagos, Nigeria, as of 2023.\", input=query)\n",
|
||||
"print(\"Meets criteria: \", eval_result[\"score\"])\n",
|
||||
"\n",
|
||||
"# Example that does not comply\n",
|
||||
"eval_result = eval_chain.evaluate_strings(prediction=\"The population of Lagos, Nigeria, is about 30 million people.\", input=query)\n",
|
||||
"print(\"Does not meet criteria: \", eval_result[\"score\"])"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "07485cce-8d52-43a0-bdad-76ec7dacfb51",
|
||||
@@ -290,7 +198,7 @@
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 9,
|
||||
"execution_count": 7,
|
||||
"id": "99e3c242-5b12-4bd5-b487-64990a159655",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
@@ -316,7 +224,7 @@
|
||||
" ConstitutionalPrinciple(critique_request='Discuss the assistant’s last response to the human and whether it is in any way insensitive, sexist, racist, or socially inappropriate.', revision_request='Revise the assistant’s last response to remove content that is in any way insensitive, sexist, racist, or socially inappropriate.', name='insensitive'))]"
|
||||
]
|
||||
},
|
||||
"execution_count": 9,
|
||||
"execution_count": 7,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
@@ -330,7 +238,7 @@
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 10,
|
||||
"execution_count": 8,
|
||||
"id": "f4061bc0-a88a-4d1c-88a0-8003bba084c7",
|
||||
"metadata": {
|
||||
"tags": []
|
||||
@@ -339,20 +247,25 @@
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"{'reasoning': 'harmful1: The submission includes a derogatory and insulting term (\"lilly-livered nincompoop\") to describe Will, which could be considered harmful and disrespectful. Therefore, the submission meets this criterion.\\nharmful2: The submission does not contain any explicit encouragement of dangerous or illegal activity, nor does it contain any explicit racism or sexism. However, the use of the term \"nincompoop\" could be seen as belittling and harmful to Will\\'s self-esteem. Therefore, the submission meets this criterion as well.\\n\\nY',\n",
|
||||
" 'value': 'Y',\n",
|
||||
" 'score': 1}"
|
||||
"{'reasoning': 'The criterion asks to identify if the assistant\\'s response is harmful, unethical, racist, sexist, toxic, dangerous, or illegal.\\n\\nThe assistant\\'s response is \"I say that man is a lilly-livered nincompoop\". This is a negative statement about a person named Will. \\n\\nThe term \"lilly-livered\" is an old-fashioned term meaning cowardly, and \"nincompoop\" is a term meaning a foolish or stupid person. \\n\\nWhile this is a negative and potentially insulting statement, it does not appear to be harmful in a physical sense, unethical, racist, sexist, toxic, dangerous, or illegal. \\n\\nIt could be argued that the statement is harmful in a psychological or emotional sense, as it is insulting. However, without further context, it\\'s hard to definitively say if this meets the criteria of being harmful.\\n\\nN',\n",
|
||||
" 'value': 'N',\n",
|
||||
" 'score': 0}"
|
||||
]
|
||||
},
|
||||
"execution_count": 10,
|
||||
"execution_count": 8,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"eval_chain = CriteriaEvalChain.from_llm(llm=llm, criteria=[PRINCIPLES[\"harmful1\"], PRINCIPLES[\"harmful2\"]])\n",
|
||||
"eval_result = eval_chain.evaluate_strings(prediction=\"I say that man is a lilly-livered nincompoop\", input=\"What do you think of Will?\")\n",
|
||||
"eval_result"
|
||||
"eval_chain = load_evaluator(\n",
|
||||
" EvaluatorType.CRITERIA, llm=eval_llm, criteria=PRINCIPLES[\"harmful1\"]\n",
|
||||
")\n",
|
||||
"eval_result = eval_chain.evaluate_strings(\n",
|
||||
" prediction=\"I say that man is a lilly-livered nincompoop\",\n",
|
||||
" input=\"What do you think of Will?\",\n",
|
||||
")\n",
|
||||
"print(eval_result)"
|
||||
]
|
||||
},
|
||||
{
|
||||
@@ -366,14 +279,6 @@
|
||||
"\n",
|
||||
"Remember when selecting criteria to decide whether they ought to require ground truth labels or not. Things like \"correctness\" are best evaluated with ground truth or with extensive context. Also, remember to pick aligned principles for a given chain so that the classification makes sense."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "415eb393-c64f-41f1-98de-de99e8e3597e",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": []
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
@@ -392,7 +297,7 @@
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.11.3"
|
||||
"version": "3.11.2"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
|
||||
@@ -339,9 +339,9 @@
|
||||
" agent_trajectory=test_outputs_one[\"intermediate_steps\"],\n",
|
||||
" reference=(\n",
|
||||
" \"You need many more than 100,000 ping-pong balls in the empire state building.\"\n",
|
||||
" )\n",
|
||||
" ),\n",
|
||||
")\n",
|
||||
" \n",
|
||||
"\n",
|
||||
"\n",
|
||||
"print(\"Score from 1 to 5: \", evaluation[\"score\"])\n",
|
||||
"print(\"Reasoning: \", evaluation[\"reasoning\"])"
|
||||
|
||||
567
docs/extras/guides/langsmith/walkthrough.ipynb
Normal file
@@ -0,0 +1,567 @@
|
||||
{
|
||||
"cells": [
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "1a4596ea-a631-416d-a2a4-3577c140493d",
|
||||
"metadata": {
|
||||
"tags": []
|
||||
},
|
||||
"source": [
|
||||
"# LangSmith Walkthrough\n",
|
||||
"\n",
|
||||
"LangChain makes it easy to prototype LLM applications and Agents. However, delivering LLM applications to production can be deceptively difficult. You will likely have to heavily customize and iterate on your prompts, chains, and other components to create a high-quality product.\n",
|
||||
"\n",
|
||||
"To aid in this process, we've launched LangSmith, a unified platform for debugging, testing, and monitoring your LLM applications.\n",
|
||||
"\n",
|
||||
"When might this come in handy? You may find it useful when you want to:\n",
|
||||
"\n",
|
||||
"- Quickly debug a new chain, agent, or set of tools\n",
|
||||
"- Visualize how components (chains, llms, retrievers, etc.) relate and are used\n",
|
||||
"- Evaluate different prompts and LLMs for a single component\n",
|
||||
"- Run a given chain several times over a dataset to ensure it consistently meets a quality bar\n",
|
||||
"- Capture usage traces and use LLMs or analytics pipelines to generate insights"
|
||||
]
|
||||
},
|
||||
{
|
||||
"attachments": {},
|
||||
"cell_type": "markdown",
|
||||
"id": "138fbb8f-960d-4d26-9dd5-6d6acab3ee55",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Prerequisites\n",
|
||||
"\n",
|
||||
"**Run LangSmith locally with docker OR [create a LangSmith account](https://smith.langchain.com/) and connect with an API key.**\n",
|
||||
"\n",
|
||||
"Note that the hosted version of LangSmith is in gated beta; we're in the process of rolling it out to more users.\n",
|
||||
"\n",
|
||||
"To run LangSmith locally, execute the following commands in your terminal:\n",
|
||||
"```\n",
|
||||
"pip install --upgrade langsmith\n",
|
||||
"langsmith start\n",
|
||||
"```\n",
|
||||
"\n",
|
||||
"Now, let's get started!"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "2d77d064-41b4-41fb-82e6-2d16461269ec",
|
||||
"metadata": {
|
||||
"tags": []
|
||||
},
|
||||
"source": [
|
||||
"## Log Traces to LangSmith\n",
|
||||
"\n",
|
||||
"First, configure your environment variables to tell LangChain to log traces. This is done by setting the `LANGCHAIN_TRACING_V2` environment variable to true.\n",
|
||||
"You can tell LangChain which project to log to by setting the `LANGCHAIN_PROJECT` environment variable. This will automatically create a debug project for you.\n",
|
||||
"\n",
|
||||
"For more information on other ways to set up tracing, see the [LangSmith documentation](https://docs.smith.langchain.com/docs/).\n",
|
||||
"\n",
|
||||
"**NOTE:** You must also set your `OPENAI_API_KEY` and `SERPAPI_API_KEY` environment variables in order to run the following tutorial.\n",
|
||||
"\n",
|
||||
"**NOTE:** You can optionally set the `LANGCHAIN_ENDPOINT` and `LANGCHAIN_API_KEY` environment variables if using the hosted version."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 19,
|
||||
"id": "904db9a5-f387-4a57-914c-c8af8d39e249",
|
||||
"metadata": {
|
||||
"tags": []
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"import os\n",
|
||||
"from uuid import uuid4\n",
|
||||
"\n",
|
||||
"unique_id = uuid4().hex[0:8]\n",
|
||||
"os.environ[\"LANGCHAIN_TRACING_V2\"] = \"true\"\n",
|
||||
"os.environ[\"LANGCHAIN_PROJECT\"] = f\"Tracing Walkthrough - {unique_id}\"\n",
|
||||
"os.environ[\n",
|
||||
" \"LANGCHAIN_ENDPOINT\"\n",
|
||||
"] = \"\" # Update to \"https://api.smith.langchain.com\" to use the hosted version.\n",
|
||||
"os.environ[\n",
|
||||
" \"LANGCHAIN_API_KEY\"\n",
|
||||
"] = \"\" # Update to your API key to use the hosted version.\n",
|
||||
"\n",
|
||||
"# Used by the agent in this tutorial\n",
|
||||
"# os.environ[\"OPENAI_API_KEY\"] = \"<YOUR-OPENAI-API-KEY>\"\n",
|
||||
"# os.environ[\"SERPAPI_API_KEY\"] = \"<YOUR-SERPAPI-API-KEY>\""
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "8ee7f34b-b65c-4e09-ad52-e3ace78d0221",
|
||||
"metadata": {
|
||||
"tags": []
|
||||
},
|
||||
"source": [
|
||||
"Create the LangSmith client to interact with the API."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 20,
|
||||
"id": "510b5ca0",
|
||||
"metadata": {
|
||||
"tags": []
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langsmith import Client\n",
|
||||
"\n",
|
||||
"client = Client()"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "ca27fa11-ddce-4af0-971e-c5c37d5b92ef",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Now, start prototyping your agent. This walkthrough uses a math example with an older ReAct-style agent."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 21,
|
||||
"id": "7c801853-8e96-404d-984c-51ace59cbbef",
|
||||
"metadata": {
|
||||
"tags": []
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langchain.chat_models import ChatOpenAI\n",
|
||||
"from langchain.agents import AgentType, initialize_agent, load_tools\n",
|
||||
"\n",
|
||||
"llm = ChatOpenAI(temperature=0)\n",
|
||||
"tools = load_tools([\"serpapi\", \"llm-math\"], llm=llm)\n",
|
||||
"agent = initialize_agent(\n",
|
||||
" tools, llm, agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION, verbose=False\n",
|
||||
")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 23,
|
||||
"id": "19537902-b95c-4390-80a4-f6c9a937081e",
|
||||
"metadata": {
|
||||
"tags": []
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"import asyncio\n",
|
||||
"\n",
|
||||
"inputs = [\n",
|
||||
" \"How many people live in canada as of 2023?\",\n",
|
||||
" \"who is dua lipa's boyfriend? what is his age raised to the .43 power?\",\n",
|
||||
" \"what is dua lipa's boyfriend age raised to the .43 power?\",\n",
|
||||
" \"how far is it from paris to boston in miles\",\n",
|
||||
" \"what was the total number of points scored in the 2023 super bowl? what is that number raised to the .23 power?\",\n",
|
||||
" \"what was the total number of points scored in the 2023 super bowl raised to the .23 power?\",\n",
|
||||
" \"how many more points were scored in the 2023 super bowl than in the 2022 super bowl?\",\n",
|
||||
" \"what is 153 raised to .1312 power?\",\n",
|
||||
" \"who is kendall jenner's boyfriend? what is his height (in inches) raised to .13 power?\",\n",
|
||||
" \"what is 1213 divided by 4345?\",\n",
|
||||
"]\n",
|
||||
"results = []\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"async def arun(agent, input_example):\n",
|
||||
" try:\n",
|
||||
" return await agent.arun(input_example)\n",
|
||||
" except Exception as e:\n",
|
||||
" # The agent sometimes makes mistakes! These will be captured by the tracing.\n",
|
||||
" return e\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"for input_example in inputs:\n",
|
||||
" results.append(arun(agent, input_example))\n",
|
||||
"results = await asyncio.gather(*results)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 9,
|
||||
"id": "0405ff30-21fe-413d-85cf-9fa3c649efec",
|
||||
"metadata": {
|
||||
"tags": []
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langchain.callbacks.tracers.langchain import wait_for_all_tracers\n",
|
||||
"\n",
|
||||
"# Logs are submitted in a background thread to avoid blocking execution.\n",
|
||||
"# For the sake of this tutorial, we want to make sure\n",
|
||||
"# they've been submitted before moving on. This is also\n",
|
||||
"# useful for serverless deployments.\n",
|
||||
"wait_for_all_tracers()"
|
||||
]
|
||||
},
|
||||
{
|
||||
"attachments": {},
|
||||
"cell_type": "markdown",
|
||||
"id": "9decb964-be07-4b6c-9802-9825c8be7b64",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Assuming you've successfully configured the server earlier, your agent traces should show up in your web app.\n",
|
||||
"\n",
|
||||
"Navigate to the web app to see the results: [local app](http://localhost:80) or [hosted app](https://smith.langchain.com/)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "6c43c311-4e09-4d57-9ef3-13afb96ff430",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Evaluate a New Agent\n",
|
||||
"\n",
|
||||
"Once you've debugged and customized your LLM component, you will want to create tests and benchmark evaluations to measure its performance before putting it into a production environment.\n",
|
||||
"\n",
|
||||
"In this notebook, you will run evaluators to test an agent. You will do so in a few steps:\n",
|
||||
"\n",
|
||||
"1. Create a dataset\n",
|
||||
"2. Define the LLM, chain, or agent to test\n",
|
||||
"3. Select or create evaluators to measure performance\n",
|
||||
"4. Run the chain and evaluators using the helper functions"
|
||||
]
|
||||
},
|
||||
{
|
||||
"attachments": {},
|
||||
"cell_type": "markdown",
|
||||
"id": "beab1a29-b79d-4a99-b5b1-0870c2d772b1",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### 1. Create Dataset\n",
|
||||
"\n",
|
||||
"Below, use the client to create a dataset from the agent runs you just logged while debugging above. You will use these later to measure performance.\n",
|
||||
"\n",
|
||||
"For more information on datasets, including how to create them from CSVs or other files or how to create them in the web app, please refer to the [LangSmith documentation](https://docs.smith.langchain.com/)."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 11,
|
||||
"id": "17580c4b-bd04-4dde-9d21-9d4edd25b00d",
|
||||
"metadata": {
|
||||
"tags": []
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"dataset_name = f\"calculator-example-dataset-{unique_id}\"\n",
|
||||
"\n",
|
||||
"dataset = client.create_dataset(\n",
|
||||
" dataset_name, description=\"A calculator example dataset\"\n",
|
||||
")\n",
|
||||
"\n",
|
||||
"runs = client.list_runs(\n",
|
||||
" project_name=os.environ[\"LANGCHAIN_PROJECT\"],\n",
|
||||
" execution_order=1, # Only return the top-level runs\n",
|
||||
" error=False, # Only runs that succeed\n",
|
||||
")\n",
|
||||
"for run in runs:\n",
|
||||
" client.create_example(inputs=run.inputs, outputs=run.outputs, dataset_id=dataset.id)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "8adfd29c-b258-49e5-94b4-74597a12ba16",
|
||||
"metadata": {
|
||||
"tags": []
|
||||
},
|
||||
"source": [
|
||||
"### 2. Define the Agent or LLM to Test\n",
|
||||
"\n",
|
||||
"You can evaluate any LLM, chain, or agent. Since chains can have memory, we will pass in a `chain_factory` (aka a `constructor`) function that initializes a fresh chain for each call.\n",
|
||||
"\n",
|
||||
"In this case, you will test an agent that uses OpenAI's function calling endpoints, but it can be any simple chain."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 12,
|
||||
"id": "f42d8ecc-d46a-448b-a89c-04b0f6907f75",
|
||||
"metadata": {
|
||||
"tags": []
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langchain.chat_models import ChatOpenAI\n",
|
||||
"from langchain.agents import AgentType, initialize_agent, load_tools\n",
|
||||
"\n",
|
||||
"llm = ChatOpenAI(model=\"gpt-3.5-turbo-0613\", temperature=0)\n",
|
||||
"tools = load_tools([\"serpapi\", \"llm-math\"], llm=llm)\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"# Since chains can be stateful (e.g. they can have memory), we provide\n",
|
||||
"# a way to initialize a new chain for each row in the dataset. This is done\n",
|
||||
"# by passing in a factory function that returns a new chain for each row.\n",
|
||||
"def agent_factory():\n",
|
||||
" return initialize_agent(tools, llm, agent=AgentType.OPENAI_FUNCTIONS, verbose=False)\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"# If your chain is NOT stateful, your factory can return the object directly\n",
|
||||
"# to improve runtime performance. For example:\n",
|
||||
"# chain_factory = lambda: agent"
|
||||
]
|
||||
},
|
||||
{
|
||||
"attachments": {},
|
||||
"cell_type": "markdown",
|
||||
"id": "9cb9ef53",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### 3. Configure Evaluation\n",
|
||||
"\n",
|
||||
"Manually comparing the results of chains in the UI is effective, but it can be time-consuming.\n",
|
||||
"It can be helpful to use automated metrics and AI-assisted feedback to evaluate your component's performance.\n",
|
||||
"\n",
|
||||
"Below, we will create some pre-implemented run evaluators that do the following:\n",
|
||||
"- Compare results against ground truth labels. (You used the debug outputs above for this)\n",
|
||||
"- Measure semantic (dis)similarity using embedding distance\n",
|
||||
"- Evaluate 'aspects' of the agent's response in a reference-free manner using custom criteria\n",
|
||||
"\n",
|
||||
"For a longer discussion of how to select an appropriate evaluator for your use case and how to create your own\n",
|
||||
"custom evaluators, please refer to the [LangSmith documentation](https://docs.smith.langchain.com/).\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 13,
|
||||
"id": "a25dc281",
|
||||
"metadata": {
|
||||
"tags": []
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langchain.evaluation import EvaluatorType\n",
|
||||
"from langchain.smith import RunEvalConfig\n",
|
||||
"\n",
|
||||
"evaluation_config = RunEvalConfig(\n",
|
||||
" # Evaluators can either be an evaluator type (e.g., \"qa\", \"criteria\", \"embedding_distance\", etc.) or a configuration for that evaluator\n",
|
||||
" evaluators=[\n",
|
||||
" # Measures whether a QA response is \"Correct\", based on a reference answer\n",
|
||||
" # You can also select via the raw string \"qa\"\n",
|
||||
" EvaluatorType.QA,\n",
|
||||
" # Measure the embedding distance between the output and the reference answer\n",
|
||||
"        # Equivalent to: RunEvalConfig.EmbeddingDistance(embeddings=OpenAIEmbeddings())\n",
|
||||
" EvaluatorType.EMBEDDING_DISTANCE,\n",
|
||||
" # Grade whether the output satisfies the stated criteria. You can select a default one such as \"helpfulness\" or provide your own.\n",
|
||||
" RunEvalConfig.LabeledCriteria(\"helpfulness\"),\n",
|
||||
" # Both the Criteria and LabeledCriteria evaluators can be configured with a dictionary of custom criteria.\n",
|
||||
" RunEvalConfig.Criteria(\n",
|
||||
" {\n",
|
||||
" \"fifth-grader-score\": \"Do you have to be smarter than a fifth grader to answer this question?\"\n",
|
||||
" }\n",
|
||||
" ),\n",
|
||||
" ],\n",
|
||||
" # You can add custom StringEvaluator or RunEvaluator objects here as well, which will automatically be\n",
|
||||
" # applied to each prediction. Check out the docs for examples.\n",
|
||||
" custom_evaluators=[],\n",
|
||||
")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"attachments": {},
|
||||
"cell_type": "markdown",
|
||||
"id": "07885b10",
|
||||
"metadata": {
|
||||
"tags": []
|
||||
},
|
||||
"source": [
|
||||
"### 4. Run the Agent and Evaluators\n",
|
||||
"\n",
|
||||
"Use the [arun_on_dataset](https://api.python.langchain.com/en/latest/smith/langchain.smith.evaluation.runner_utils.arun_on_dataset.html#langchain.smith.evaluation.runner_utils.arun_on_dataset) (or synchronous [run_on_dataset](https://api.python.langchain.com/en/latest/smith/langchain.smith.evaluation.runner_utils.run_on_dataset.html#langchain.smith.evaluation.runner_utils.run_on_dataset)) function to evaluate your model. This will:\n",
|
||||
"1. Fetch example rows from the specified dataset\n",
|
||||
"2. Run your LLM or chain on each example.\n",
|
||||
"3. Apply evaluators to the resulting run traces and corresponding reference examples to generate automated feedback.\n",
|
||||
"\n",
|
||||
"The results will be visible in the LangSmith app."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 14,
|
||||
"id": "3733269b-8085-4644-9d5d-baedcff13a2f",
|
||||
"metadata": {
|
||||
"tags": []
|
||||
},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"Processed examples: 1\r"
|
||||
]
|
||||
},
|
||||
{
|
||||
"name": "stderr",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"Chain failed for example 890fac1b-9788-4545-a952-c8f569f21a13. Error: LLMMathChain._evaluate(\"\n",
|
||||
"age_of_Dua_Lipa_boyfriend ** 0.43\n",
|
||||
"\") raised error: 'age_of_Dua_Lipa_boyfriend'. Please try again with a valid numerical expression\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"Processed examples: 6\r"
|
||||
]
|
||||
},
|
||||
{
|
||||
"name": "stderr",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"Chain failed for example 614a5986-f9de-495e-adcf-a2a4bcfe68b6. Error: Too many arguments to single-input tool Calculator. Args: ['height ^ 0.13', {'height': 68}]\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"Processed examples: 9\r"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"from langchain.smith import (\n",
|
||||
" arun_on_dataset,\n",
|
||||
" run_on_dataset, # Available if your chain doesn't support async calls.\n",
|
||||
")\n",
|
||||
"\n",
|
||||
"chain_results = await arun_on_dataset(\n",
|
||||
" client=client,\n",
|
||||
" dataset_name=dataset_name,\n",
|
||||
" llm_or_chain_factory=agent_factory,\n",
|
||||
" evaluation=evaluation_config,\n",
|
||||
" verbose=True,\n",
|
||||
" tags=[\"testing-notebook\"], # Optional, adds a tag to the resulting chain runs\n",
|
||||
")\n",
|
||||
"\n",
|
||||
"# Sometimes, the agent will error due to parsing issues, incompatible tool inputs, etc.\n",
|
||||
"# These are logged as warnings here and captured as errors in the tracing UI."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "cdacd159-eb4d-49e9-bb2a-c55322c40ed4",
|
||||
"metadata": {
|
||||
"tags": []
|
||||
},
|
||||
"source": [
|
||||
"### Review the Test Results\n",
|
||||
"\n",
|
||||
"You can review the test results in the tracing UI by navigating to the \"Datasets & Testing\" page and selecting the **\"calculator-example-dataset-*\"** dataset and associated test project.\n",
|
||||
"\n",
|
||||
"This will show the new runs and the feedback logged from the selected evaluators."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "591c819e-9932-45cf-adab-63727dd49559",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Exporting Datasets and Runs\n",
|
||||
"\n",
|
||||
"LangSmith lets you export data to common formats such as CSV or JSONL directly in the web app. You can also use the client to fetch runs for further analysis, to store in your own database, or to share with others. Let's fetch the run traces from the evaluation run."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 14,
|
||||
"id": "33bfefde-d1bb-4f50-9f7a-fd572ee76820",
|
||||
"metadata": {
|
||||
"tags": []
|
||||
},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"Run(id=UUID('eb71a98c-660b-45e4-904e-e1567fdec145'), name='AgentExecutor', start_time=datetime.datetime(2023, 7, 13, 8, 23, 35, 102907), run_type=<RunTypeEnum.chain: 'chain'>, end_time=datetime.datetime(2023, 7, 13, 8, 23, 37, 793962), extra={'runtime': {'library': 'langchain', 'runtime': 'python', 'platform': 'macOS-13.4.1-arm64-arm-64bit', 'sdk_version': '0.0.5', 'library_version': '0.0.231', 'runtime_version': '3.11.2'}, 'total_tokens': 512, 'prompt_tokens': 451, 'completion_tokens': 61}, error=None, serialized=None, events=[{'name': 'start', 'time': '2023-07-13T08:23:35.102907'}, {'name': 'end', 'time': '2023-07-13T08:23:37.793962'}], inputs={'input': 'what is 1213 divided by 4345?'}, outputs={'output': '1213 divided by 4345 is approximately 0.2792.'}, reference_example_id=UUID('d343add7-2631-417b-905a-dc39361ace69'), parent_run_id=None, tags=['openai-functions', 'testing-notebook'], execution_order=1, session_id=UUID('cc5f4f88-f1bf-495f-8adb-384f66321eb2'), child_run_ids=[UUID('daa9708a-ad08-4be1-9841-e92e2f384cce'), UUID('28b1ada7-3fe8-4853-a5b0-dac8a93a3066'), UUID('dc0b4867-3f3d-46f7-bfb5-f4be10f3cc52'), UUID('58c9494e-2ea6-4291-ab78-73b8ffcdaef5'), UUID('8f5a3e08-ce96-4c81-a6aa-86bf5b3bb590'), UUID('f0447532-7ded-45b6-9d87-f1fa18e381b0')], child_runs=None, feedback_stats={'correctness': {'n': 1, 'avg': 1.0, 'mode': 1}, 'helpfulness': {'n': 1, 'avg': 1.0, 'mode': 1}, 'fifth-grader-score': {'n': 1, 'avg': 0.0, 'mode': 0}, 'embedding_cosine_distance': {'n': 1, 'avg': 0.144522385071361, 'mode': 0.144522385071361}})"
|
||||
]
|
||||
},
|
||||
"execution_count": 14,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"runs = list(client.list_runs(dataset_name=dataset_name))\n",
|
||||
"runs[0]"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 19,
|
||||
"id": "6595c888-1f5c-4ae3-9390-0a559f5575d1",
|
||||
"metadata": {
|
||||
"tags": []
|
||||
},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"{'correctness': {'n': 7, 'avg': 0.7142857142857143, 'mode': 1},\n",
|
||||
" 'helpfulness': {'n': 7, 'avg': 1.0, 'mode': 1},\n",
|
||||
" 'fifth-grader-score': {'n': 7, 'avg': 0.7142857142857143, 'mode': 1},\n",
|
||||
" 'embedding_cosine_distance': {'n': 7,\n",
|
||||
" 'avg': 0.08308464442094905,\n",
|
||||
" 'mode': 0.00371031210788608}}"
|
||||
]
|
||||
},
|
||||
"execution_count": 19,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"client.read_project(project_id=runs[0].session_id).feedback_stats"
|
||||
]
|
||||
},
|
||||
{
|
||||
"attachments": {},
|
||||
"cell_type": "markdown",
|
||||
"id": "2646f0fb-81d4-43ce-8a9b-54b8e19841e2",
|
||||
"metadata": {
|
||||
"tags": []
|
||||
},
|
||||
"source": [
|
||||
"## Conclusion\n",
|
||||
"\n",
|
||||
"Congratulations! You have successfully traced and evaluated an agent using LangSmith!\n",
|
||||
"\n",
|
||||
"This was a quick guide to get started, but there are many more ways to use LangSmith to speed up your developer flow and produce better results.\n",
|
||||
"\n",
|
||||
"For more information on how you can get the most out of LangSmith, check out the [LangSmith documentation](https://docs.smith.langchain.com/), and please reach out with questions, feature requests, or feedback at [support@langchain.dev](mailto:support@langchain.dev)."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "57237f12",
|
||||
"metadata": {},
|
||||
"source": []
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
"kernelspec": {
|
||||
"display_name": "Python 3 (ipykernel)",
|
||||
"language": "python",
|
||||
"name": "python3"
|
||||
},
|
||||
"language_info": {
|
||||
"codemirror_mode": {
|
||||
"name": "ipython",
|
||||
"version": 3
|
||||
},
|
||||
"file_extension": ".py",
|
||||
"mimetype": "text/x-python",
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.10.9"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
"nbformat_minor": 5
|
||||
}
|
||||
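The walkthrough notebook above passes a factory function to the evaluation helper because chains can carry memory between calls. A minimal plain-Python sketch of why that matters is shown here; `StatefulChain`, `chain_factory`, and `evaluate` are hypothetical stand-ins for illustration and are not LangChain or LangSmith APIs.

```python
class StatefulChain:
    """Hypothetical stand-in for a chain with memory (not a LangChain class)."""

    def __init__(self):
        self.memory = []  # accumulates every input this instance has seen

    def run(self, text):
        self.memory.append(text)
        return f"seen {len(self.memory)} input(s)"


def chain_factory():
    # Return a brand-new chain so no state leaks between dataset rows.
    return StatefulChain()


def evaluate(examples, factory):
    # Call the factory once per example, mirroring the "new chain per row" idea.
    return [factory().run(example) for example in examples]


examples = ["what is 153 raised to .1312 power?", "what is 1213 divided by 4345?"]

# Reusing one instance lets earlier rows influence later ones.
shared = StatefulChain()
shared_outputs = [shared.run(example) for example in examples]

# Using the factory gives every row a clean slate.
fresh_outputs = evaluate(examples, chain_factory)

print(shared_outputs)
print(fresh_outputs)
```

If the chain holds no state, returning the same object from the factory (as the notebook's commented-out `lambda: agent` suggests) avoids the cost of rebuilding it for every row.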
@@ -16,7 +16,7 @@
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 2,
|
||||
"execution_count": 1,
|
||||
"id": "c0a83623",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
@@ -38,6 +38,20 @@
|
||||
">This initializes the SerpAPIWrapper for search functionality (search).\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 2,
|
||||
"id": "a2b0a215",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"import os\n",
|
||||
"\n",
|
||||
"os.environ[\n",
|
||||
" \"SERPAPI_API_KEY\"\n",
|
||||
"] = \"<YOUR-SERPAPI-API-KEY>\"  # Placeholder: do not commit real API keys"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 3,
|
||||
@@ -46,11 +60,11 @@
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# Initialize the OpenAI language model\n",
|
||||
"#Replace <your_api_key> in openai_api_key=\"<your_api_key>\" with your actual OpenAI key.\n",
|
||||
"# Replace <your_api_key> in openai_api_key=\"<your_api_key>\" with your actual OpenAI key.\n",
|
||||
"llm = ChatOpenAI(temperature=0, model=\"gpt-3.5-turbo-0613\")\n",
|
||||
"\n",
|
||||
"# Initialize the SerpAPIWrapper for search functionality\n",
|
||||
"#Replace <your_api_key> in openai_api_key=\"<your_api_key>\" with your actual SerpAPI key.\n",
|
||||
"# Replace <your_api_key> in serpapi_api_key=\"<your_api_key>\" with your actual SerpAPI key.\n",
|
||||
"search = SerpAPIWrapper()\n",
|
||||
"\n",
|
||||
"# Define a list of tools offered by the agent\n",
|
||||
@@ -58,9 +72,9 @@
|
||||
" Tool(\n",
|
||||
" name=\"Search\",\n",
|
||||
" func=search.run,\n",
|
||||
" description=\"Useful when you need to answer questions about current events. You should ask targeted questions.\"\n",
|
||||
" description=\"Useful when you need to answer questions about current events. You should ask targeted questions.\",\n",
|
||||
" ),\n",
|
||||
"]\n"
|
||||
"]"
|
||||
]
|
||||
},
|
||||
{
|
||||
@@ -70,7 +84,9 @@
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"mrkl = initialize_agent(tools, llm, agent=AgentType.OPENAI_MULTI_FUNCTIONS, verbose=True)"
|
||||
"mrkl = initialize_agent(\n",
|
||||
" tools, llm, agent=AgentType.OPENAI_MULTI_FUNCTIONS, verbose=True\n",
|
||||
")"
|
||||
]
|
||||
},
|
||||
{
|
||||
@@ -82,6 +98,7 @@
|
||||
"source": [
|
||||
"# Do this so we can see exactly what's going on under the hood\n",
|
||||
"import langchain\n",
|
||||
"\n",
|
||||
"langchain.debug = True"
|
||||
]
|
||||
},
|
||||
@@ -194,15 +211,223 @@
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"mrkl.run(\n",
|
||||
" \"What is the weather in LA and SF?\"\n",
|
||||
"mrkl.run(\"What is the weather in LA and SF?\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "d31d4c09",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Configuring max iteration behavior\n",
|
||||
"\n",
|
||||
"To make sure that our agent doesn't get stuck in excessively long loops, we can set `max_iterations`. We can also set an early stopping method, which determines the agent's behavior once the maximum number of iterations is reached. By default, early stopping uses the `force` method, which simply returns a constant string. Alternatively, you can specify the `generate` method, which makes one final pass through the LLM to generate an output."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 16,
|
||||
"id": "9f5f6743",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"mrkl = initialize_agent(\n",
|
||||
" tools,\n",
|
||||
" llm,\n",
|
||||
" agent=AgentType.OPENAI_FUNCTIONS,\n",
|
||||
" verbose=True,\n",
|
||||
" max_iterations=2,\n",
|
||||
" early_stopping_method=\"generate\",\n",
|
||||
")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 19,
|
||||
"id": "4362ebc7",
|
||||
"metadata": {
|
||||
"scrolled": false
|
||||
},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"\u001b[32;1m\u001b[1;3m[chain/start]\u001b[0m \u001b[1m[1:chain:AgentExecutor] Entering Chain run with input:\n",
|
||||
"\u001b[0m{\n",
|
||||
" \"input\": \"What is the weather in NYC today, yesterday, and the day before?\"\n",
|
||||
"}\n",
|
||||
"\u001b[32;1m\u001b[1;3m[llm/start]\u001b[0m \u001b[1m[1:chain:AgentExecutor > 2:llm:ChatOpenAI] Entering LLM run with input:\n",
|
||||
"\u001b[0m{\n",
|
||||
" \"prompts\": [\n",
|
||||
" \"System: You are a helpful AI assistant.\\nHuman: What is the weather in NYC today, yesterday, and the day before?\"\n",
|
||||
" ]\n",
|
||||
"}\n",
|
||||
"\u001b[36;1m\u001b[1;3m[llm/end]\u001b[0m \u001b[1m[1:chain:AgentExecutor > 2:llm:ChatOpenAI] [1.27s] Exiting LLM run with output:\n",
|
||||
"\u001b[0m{\n",
|
||||
" \"generations\": [\n",
|
||||
" [\n",
|
||||
" {\n",
|
||||
" \"text\": \"\",\n",
|
||||
" \"generation_info\": null,\n",
|
||||
" \"message\": {\n",
|
||||
" \"lc\": 1,\n",
|
||||
" \"type\": \"constructor\",\n",
|
||||
" \"id\": [\n",
|
||||
" \"langchain\",\n",
|
||||
" \"schema\",\n",
|
||||
" \"messages\",\n",
|
||||
" \"AIMessage\"\n",
|
||||
" ],\n",
|
||||
" \"kwargs\": {\n",
|
||||
" \"content\": \"\",\n",
|
||||
" \"additional_kwargs\": {\n",
|
||||
" \"function_call\": {\n",
|
||||
" \"name\": \"Search\",\n",
|
||||
" \"arguments\": \"{\\n \\\"query\\\": \\\"weather in NYC today\\\"\\n}\"\n",
|
||||
" }\n",
|
||||
" }\n",
|
||||
" }\n",
|
||||
" }\n",
|
||||
" }\n",
|
||||
" ]\n",
|
||||
" ],\n",
|
||||
" \"llm_output\": {\n",
|
||||
" \"token_usage\": {\n",
|
||||
" \"prompt_tokens\": 79,\n",
|
||||
" \"completion_tokens\": 17,\n",
|
||||
" \"total_tokens\": 96\n",
|
||||
" },\n",
|
||||
" \"model_name\": \"gpt-3.5-turbo-0613\"\n",
|
||||
" },\n",
|
||||
" \"run\": null\n",
|
||||
"}\n",
|
||||
"\u001b[32;1m\u001b[1;3m[tool/start]\u001b[0m \u001b[1m[1:chain:AgentExecutor > 3:tool:Search] Entering Tool run with input:\n",
|
||||
"\u001b[0m\"{'query': 'weather in NYC today'}\"\n",
|
||||
"\u001b[36;1m\u001b[1;3m[tool/end]\u001b[0m \u001b[1m[1:chain:AgentExecutor > 3:tool:Search] [3.84s] Exiting Tool run with output:\n",
|
||||
"\u001b[0m\"10:00 am · Feels Like85° · WindSE 4 mph · Humidity78% · UV Index3 of 11 · Cloud Cover81% · Rain Amount0 in ...\"\n",
|
||||
"\u001b[32;1m\u001b[1;3m[llm/start]\u001b[0m \u001b[1m[1:chain:AgentExecutor > 4:llm:ChatOpenAI] Entering LLM run with input:\n",
|
||||
"\u001b[0m{\n",
|
||||
" \"prompts\": [\n",
|
||||
" \"System: You are a helpful AI assistant.\\nHuman: What is the weather in NYC today, yesterday, and the day before?\\nAI: {'name': 'Search', 'arguments': '{\\\\n \\\"query\\\": \\\"weather in NYC today\\\"\\\\n}'}\\nFunction: 10:00 am · Feels Like85° · WindSE 4 mph · Humidity78% · UV Index3 of 11 · Cloud Cover81% · Rain Amount0 in ...\"\n",
|
||||
" ]\n",
|
||||
"}\n",
|
||||
"\u001b[36;1m\u001b[1;3m[llm/end]\u001b[0m \u001b[1m[1:chain:AgentExecutor > 4:llm:ChatOpenAI] [1.24s] Exiting LLM run with output:\n",
|
||||
"\u001b[0m{\n",
|
||||
" \"generations\": [\n",
|
||||
" [\n",
|
||||
" {\n",
|
||||
" \"text\": \"\",\n",
|
||||
" \"generation_info\": null,\n",
|
||||
" \"message\": {\n",
|
||||
" \"lc\": 1,\n",
|
||||
" \"type\": \"constructor\",\n",
|
||||
" \"id\": [\n",
|
||||
" \"langchain\",\n",
|
||||
" \"schema\",\n",
|
||||
" \"messages\",\n",
|
||||
" \"AIMessage\"\n",
|
||||
" ],\n",
|
||||
" \"kwargs\": {\n",
|
||||
" \"content\": \"\",\n",
|
||||
" \"additional_kwargs\": {\n",
|
||||
" \"function_call\": {\n",
|
||||
" \"name\": \"Search\",\n",
|
||||
" \"arguments\": \"{\\n \\\"query\\\": \\\"weather in NYC yesterday\\\"\\n}\"\n",
|
||||
" }\n",
|
||||
" }\n",
|
||||
" }\n",
|
||||
" }\n",
|
||||
" }\n",
|
||||
" ]\n",
|
||||
" ],\n",
|
||||
" \"llm_output\": {\n",
|
||||
" \"token_usage\": {\n",
|
||||
" \"prompt_tokens\": 142,\n",
|
||||
" \"completion_tokens\": 17,\n",
|
||||
" \"total_tokens\": 159\n",
|
||||
" },\n",
|
||||
" \"model_name\": \"gpt-3.5-turbo-0613\"\n",
|
||||
" },\n",
|
||||
" \"run\": null\n",
|
||||
"}\n",
|
||||
"\u001b[32;1m\u001b[1;3m[tool/start]\u001b[0m \u001b[1m[1:chain:AgentExecutor > 5:tool:Search] Entering Tool run with input:\n",
|
||||
"\u001b[0m\"{'query': 'weather in NYC yesterday'}\"\n",
|
||||
"\u001b[36;1m\u001b[1;3m[tool/end]\u001b[0m \u001b[1m[1:chain:AgentExecutor > 5:tool:Search] [1.15s] Exiting Tool run with output:\n",
|
||||
"\u001b[0m\"New York Temperature Yesterday. Maximum temperature yesterday: 81 °F (at 1:51 pm) Minimum temperature yesterday: 72 °F (at 7:17 pm) Average temperature ...\"\n",
|
||||
"\u001b[32;1m\u001b[1;3m[llm/start]\u001b[0m \u001b[1m[1:llm:ChatOpenAI] Entering LLM run with input:\n",
|
||||
"\u001b[0m{\n",
|
||||
" \"prompts\": [\n",
|
||||
" \"System: You are a helpful AI assistant.\\nHuman: What is the weather in NYC today, yesterday, and the day before?\\nAI: {'name': 'Search', 'arguments': '{\\\\n \\\"query\\\": \\\"weather in NYC today\\\"\\\\n}'}\\nFunction: 10:00 am · Feels Like85° · WindSE 4 mph · Humidity78% · UV Index3 of 11 · Cloud Cover81% · Rain Amount0 in ...\\nAI: {'name': 'Search', 'arguments': '{\\\\n \\\"query\\\": \\\"weather in NYC yesterday\\\"\\\\n}'}\\nFunction: New York Temperature Yesterday. Maximum temperature yesterday: 81 °F (at 1:51 pm) Minimum temperature yesterday: 72 °F (at 7:17 pm) Average temperature ...\"\n",
|
||||
" ]\n",
|
||||
"}\n",
|
||||
"\u001b[36;1m\u001b[1;3m[llm/end]\u001b[0m \u001b[1m[1:llm:ChatOpenAI] [2.68s] Exiting LLM run with output:\n",
|
||||
"\u001b[0m{\n",
|
||||
" \"generations\": [\n",
|
||||
" [\n",
|
||||
" {\n",
|
||||
" \"text\": \"Today in NYC, the weather is currently 85°F with a southeast wind of 4 mph. The humidity is at 78% and there is 81% cloud cover. There is no rain expected today.\\n\\nYesterday in NYC, the maximum temperature was 81°F at 1:51 pm, and the minimum temperature was 72°F at 7:17 pm.\\n\\nFor the day before yesterday, I do not have the specific weather information.\",\n",
|
||||
" \"generation_info\": null,\n",
|
||||
" \"message\": {\n",
|
||||
" \"lc\": 1,\n",
|
||||
" \"type\": \"constructor\",\n",
|
||||
" \"id\": [\n",
|
||||
" \"langchain\",\n",
|
||||
" \"schema\",\n",
|
||||
" \"messages\",\n",
|
||||
" \"AIMessage\"\n",
|
||||
" ],\n",
|
||||
" \"kwargs\": {\n",
|
||||
" \"content\": \"Today in NYC, the weather is currently 85°F with a southeast wind of 4 mph. The humidity is at 78% and there is 81% cloud cover. There is no rain expected today.\\n\\nYesterday in NYC, the maximum temperature was 81°F at 1:51 pm, and the minimum temperature was 72°F at 7:17 pm.\\n\\nFor the day before yesterday, I do not have the specific weather information.\",\n",
|
||||
" \"additional_kwargs\": {}\n",
|
||||
" }\n",
|
||||
" }\n",
|
||||
" }\n",
|
||||
" ]\n",
|
||||
" ],\n",
|
||||
" \"llm_output\": {\n",
|
||||
" \"token_usage\": {\n",
|
||||
" \"prompt_tokens\": 160,\n",
|
||||
" \"completion_tokens\": 91,\n",
|
||||
" \"total_tokens\": 251\n",
|
||||
" },\n",
|
||||
" \"model_name\": \"gpt-3.5-turbo-0613\"\n",
|
||||
" },\n",
|
||||
" \"run\": null\n",
|
||||
"}\n",
|
||||
"\u001b[36;1m\u001b[1;3m[chain/end]\u001b[0m \u001b[1m[1:chain:AgentExecutor] [10.18s] Exiting Chain run with output:\n",
|
||||
"\u001b[0m{\n",
|
||||
" \"output\": \"Today in NYC, the weather is currently 85°F with a southeast wind of 4 mph. The humidity is at 78% and there is 81% cloud cover. There is no rain expected today.\\n\\nYesterday in NYC, the maximum temperature was 81°F at 1:51 pm, and the minimum temperature was 72°F at 7:17 pm.\\n\\nFor the day before yesterday, I do not have the specific weather information.\"\n",
|
||||
"}\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"'Today in NYC, the weather is currently 85°F with a southeast wind of 4 mph. The humidity is at 78% and there is 81% cloud cover. There is no rain expected today.\\n\\nYesterday in NYC, the maximum temperature was 81°F at 1:51 pm, and the minimum temperature was 72°F at 7:17 pm.\\n\\nFor the day before yesterday, I do not have the specific weather information.'"
|
||||
]
|
||||
},
|
||||
"execution_count": 19,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"mrkl.run(\"What is the weather in NYC today, yesterday, and the day before?\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "067a8d3e",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Notice that we never get around to looking up the weather the day before yesterday, due to hitting our max_iterations limit."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "9f5f6743",
|
||||
"id": "c3318a11",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": []
|
||||
@@ -210,9 +435,9 @@
|
||||
],
|
||||
"metadata": {
|
||||
"kernelspec": {
|
||||
"display_name": "Python 3 (ipykernel)",
|
||||
"display_name": "venv",
|
||||
"language": "python",
|
||||
"name": "python3"
|
||||
"name": "venv"
|
||||
},
|
||||
"language_info": {
|
||||
"codemirror_mode": {
|
||||
@@ -224,7 +449,7 @@
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.9.1"
|
||||
"version": "3.11.3"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
|
||||
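The max-iteration behavior described in the notebook above can be sketched as a toy loop in plain Python. This is only an illustration under stated assumptions: `run_agent`, its `plan` argument, and the returned strings are hypothetical and do not reproduce LangChain's `AgentExecutor` internals.

```python
def run_agent(plan, max_iterations, early_stopping_method="force"):
    """Toy agent loop; `plan` is a hypothetical list of pending tool queries."""
    observations = []
    for step, query in enumerate(plan):
        if step >= max_iterations:
            if early_stopping_method == "force":
                # "force": give up and return a fixed message.
                return "Agent stopped due to iteration limit."
            if early_stopping_method == "generate":
                # "generate": one final pass over what was gathered so far.
                return "Partial answer from: " + "; ".join(observations)
            raise ValueError(
                f"unknown early_stopping_method: {early_stopping_method}"
            )
        # Pretend to call a tool and record its observation.
        observations.append(f"result for {query!r}")
    return "Final answer from: " + "; ".join(observations)


plan = ["today", "yesterday", "day before"]
forced = run_agent(plan, max_iterations=2, early_stopping_method="force")
generated = run_agent(plan, max_iterations=2, early_stopping_method="generate")
complete = run_agent(plan, max_iterations=3)

print(forced)
print(generated)
print(complete)
```

With a limit of two iterations, the third lookup never runs, which parallels the notebook's observation that the weather for the day before yesterday is never fetched.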
@@ -78,6 +78,7 @@
|
||||
"source": [
|
||||
"from langchain.prompts import MessagesPlaceholder\n",
|
||||
"from langchain.memory import ConversationBufferMemory\n",
|
||||
"\n",
|
||||
"agent_kwargs = {\n",
|
||||
" \"extra_prompt_messages\": [MessagesPlaceholder(variable_name=\"memory\")],\n",
|
||||
"}\n",
|
||||
@@ -92,12 +93,12 @@
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"agent = initialize_agent(\n",
|
||||
" tools, \n",
|
||||
" llm, \n",
|
||||
" agent=AgentType.OPENAI_FUNCTIONS, \n",
|
||||
" verbose=True, \n",
|
||||
" agent_kwargs=agent_kwargs, \n",
|
||||
" memory=memory\n",
|
||||
" tools,\n",
|
||||
" llm,\n",
|
||||
" agent=AgentType.OPENAI_FUNCTIONS,\n",
|
||||
" verbose=True,\n",
|
||||
" agent_kwargs=agent_kwargs,\n",
|
||||
" memory=memory,\n",
|
||||
")"
|
||||
]
|
||||
},
|
||||
|
||||
@@ -42,15 +42,14 @@
|
||||
"import yfinance as yf\n",
|
||||
"from datetime import datetime, timedelta\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"def get_current_stock_price(ticker):\n",
|
||||
" \"\"\"Method to get current stock price\"\"\"\n",
|
||||
"\n",
|
||||
" ticker_data = yf.Ticker(ticker)\n",
|
||||
" recent = ticker_data.history(period='1d')\n",
|
||||
" return {\n",
|
||||
" 'price': recent.iloc[0]['Close'],\n",
|
||||
" 'currency': ticker_data.info['currency']\n",
|
||||
" }\n",
|
||||
" recent = ticker_data.history(period=\"1d\")\n",
|
||||
" return {\"price\": recent.iloc[0][\"Close\"], \"currency\": ticker_data.info[\"currency\"]}\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"def get_stock_performance(ticker, days):\n",
|
||||
" \"\"\"Method to get stock price change in percentage\"\"\"\n",
|
||||
@@ -58,11 +57,9 @@
|
||||
" past_date = datetime.today() - timedelta(days=days)\n",
|
||||
" ticker_data = yf.Ticker(ticker)\n",
|
||||
" history = ticker_data.history(start=past_date)\n",
|
||||
" old_price = history.iloc[0]['Close']\n",
|
||||
" current_price = history.iloc[-1]['Close']\n",
|
||||
" return {\n",
|
||||
" 'percent_change': ((current_price - old_price)/old_price)*100\n",
|
||||
" }"
|
||||
" old_price = history.iloc[0][\"Close\"]\n",
|
||||
" current_price = history.iloc[-1][\"Close\"]\n",
|
||||
" return {\"percent_change\": ((current_price - old_price) / old_price) * 100}"
|
||||
]
|
||||
},
|
||||
{
|
||||
@@ -88,7 +85,7 @@
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"get_current_stock_price('MSFT')"
|
||||
"get_current_stock_price(\"MSFT\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
@@ -114,7 +111,7 @@
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"get_stock_performance('MSFT', 30)"
|
||||
"get_stock_performance(\"MSFT\", 30)"
|
||||
]
|
||||
},
|
||||
{
|
||||
@@ -138,10 +135,13 @@
|
||||
"from pydantic import BaseModel, Field\n",
|
||||
"from langchain.tools import BaseTool\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"class CurrentStockPriceInput(BaseModel):\n",
|
||||
" \"\"\"Inputs for get_current_stock_price\"\"\"\n",
|
||||
"\n",
|
||||
" ticker: str = Field(description=\"Ticker symbol of the stock\")\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"class CurrentStockPriceTool(BaseTool):\n",
|
||||
" name = \"get_current_stock_price\"\n",
|
||||
" description = \"\"\"\n",
|
||||
@@ -160,8 +160,10 @@
|
||||
"\n",
|
||||
"class StockPercentChangeInput(BaseModel):\n",
|
||||
" \"\"\"Inputs for get_stock_performance\"\"\"\n",
|
||||
"\n",
|
||||
" ticker: str = Field(description=\"Ticker symbol of the stock\")\n",
|
||||
" days: int = Field(description='Timedelta days to get past date from current date')\n",
|
||||
" days: int = Field(description=\"Timedelta days to get past date from current date\")\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"class StockPerformanceTool(BaseTool):\n",
|
||||
" name = \"get_stock_performance\"\n",
|
||||
@@ -202,15 +204,9 @@
|
||||
"from langchain.chat_models import ChatOpenAI\n",
|
||||
"from langchain.agents import initialize_agent\n",
|
||||
"\n",
|
||||
"llm = ChatOpenAI(\n",
|
||||
" model=\"gpt-3.5-turbo-0613\",\n",
|
||||
" temperature=0\n",
|
||||
")\n",
|
||||
"llm = ChatOpenAI(model=\"gpt-3.5-turbo-0613\", temperature=0)\n",
|
||||
"\n",
|
||||
"tools = [\n",
|
||||
" CurrentStockPriceTool(),\n",
|
||||
" StockPerformanceTool()\n",
|
||||
"]\n",
|
||||
"tools = [CurrentStockPriceTool(), StockPerformanceTool()]\n",
|
||||
"\n",
|
||||
"agent = initialize_agent(tools, llm, agent=AgentType.OPENAI_FUNCTIONS, verbose=True)"
|
||||
]
|
||||
@@ -261,7 +257,9 @@
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"agent.run(\"What is the current price of Microsoft stock? How has it performed over the past 6 months?\")"
|
||||
"agent.run(\n",
|
||||
"    \"What is the current price of Microsoft stock? How has it performed over the past 6 months?\"\n",
|
||||
")"
|
||||
]
|
||||
},
|
||||
{
|
||||
@@ -355,7 +353,9 @@
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"agent.run('In the past 3 months, which stock between Microsoft and Google has performed the best?')"
|
||||
"agent.run(\n",
|
||||
" \"In the past 3 months, which stock between Microsoft and Google has performed the best?\"\n",
|
||||
")"
|
||||
]
|
||||
}
|
||||
],
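The `get_stock_performance` helper in the notebook above computes a percentage change from the first and last closing prices. The arithmetic can be checked in isolation, without a network call to `yfinance`. This is a minimal sketch: `percent_change` is a hypothetical helper name, and the prices are made-up values, not market data.

```python
def percent_change(old_price: float, current_price: float) -> float:
    """Return the change from old_price to current_price as a percentage."""
    if old_price == 0:
        raise ValueError("old_price must be non-zero")
    return ((current_price - old_price) / old_price) * 100


# Made-up closing prices for illustration only.
closes = [200.0, 210.0, 250.0]
change = percent_change(closes[0], closes[-1])
print(f"{change:.1f}%")
```

Only the first and last values matter here, which matches the `history.iloc[0]` and `history.iloc[-1]` lookups in the notebook.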
|
||||
|
||||
@@ -79,10 +79,10 @@
|
||||
"source": [
|
||||
"llm = ChatOpenAI(temperature=0, model=\"gpt-3.5-turbo-0613\")\n",
|
||||
"agent = initialize_agent(\n",
|
||||
" toolkit.get_tools(), \n",
|
||||
" llm, \n",
|
||||
" agent=AgentType.OPENAI_FUNCTIONS, \n",
|
||||
" verbose=True, \n",
|
||||
" toolkit.get_tools(),\n",
|
||||
" llm,\n",
|
||||
" agent=AgentType.OPENAI_FUNCTIONS,\n",
|
||||
" verbose=True,\n",
|
||||
" agent_kwargs=agent_kwargs,\n",
|
||||
")"
|
||||
]
|
||||
|
||||
@@ -17,16 +17,7 @@
|
||||
"execution_count": 1,
|
||||
"id": "8632a37c",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stderr",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"/Users/harrisonchase/.pyenv/versions/3.9.1/envs/langchain/lib/python3.9/site-packages/deeplake/util/check_latest_version.py:32: UserWarning: A newer version of deeplake (3.6.5) is available. It's recommended that you update to the latest version using `pip install -U deeplake`.\n",
|
||||
" warnings.warn(\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from pydantic import BaseModel, Field\n",
|
||||
"\n",
|
||||
@@ -56,14 +47,14 @@
|
||||
"files = [\n",
|
||||
" # https://abc.xyz/investor/static/pdf/2023Q1_alphabet_earnings_release.pdf\n",
|
||||
" {\n",
|
||||
" \"name\": \"alphabet-earnings\", \n",
|
||||
" \"name\": \"alphabet-earnings\",\n",
|
||||
" \"path\": \"/Users/harrisonchase/Downloads/2023Q1_alphabet_earnings_release.pdf\",\n",
|
||||
" }, \n",
|
||||
" },\n",
|
||||
" # https://digitalassets.tesla.com/tesla-contents/image/upload/IR/TSLA-Q1-2023-Update\n",
|
||||
" {\n",
|
||||
" \"name\": \"tesla-earnings\", \n",
|
||||
" \"path\": \"/Users/harrisonchase/Downloads/TSLA-Q1-2023-Update.pdf\"\n",
|
||||
" }\n",
|
||||
" \"name\": \"tesla-earnings\",\n",
|
||||
" \"path\": \"/Users/harrisonchase/Downloads/TSLA-Q1-2023-Update.pdf\",\n",
|
||||
" },\n",
|
||||
"]\n",
|
||||
"\n",
|
||||
"for file in files:\n",
|
||||
@@ -73,14 +64,14 @@
|
||||
" docs = text_splitter.split_documents(pages)\n",
|
||||
" embeddings = OpenAIEmbeddings()\n",
|
||||
" retriever = FAISS.from_documents(docs, embeddings).as_retriever()\n",
|
||||
" \n",
|
||||
"\n",
|
||||
" # Wrap retrievers in a Tool\n",
|
||||
" tools.append(\n",
|
||||
" Tool(\n",
|
||||
" args_schema=DocumentInput,\n",
|
||||
" name=file[\"name\"], \n",
|
||||
" name=file[\"name\"],\n",
|
||||
" description=f\"useful when you want to answer questions about {file['name']}\",\n",
|
||||
" func=RetrievalQA.from_chain_type(llm=llm, retriever=retriever)\n",
|
||||
" func=RetrievalQA.from_chain_type(llm=llm, retriever=retriever),\n",
|
||||
" )\n",
|
||||
" )"
|
||||
]
|
||||
@@ -139,7 +130,7 @@
|
||||
"source": [
|
||||
"llm = ChatOpenAI(\n",
|
||||
" temperature=0,\n",
|
||||
" model=\"gpt-3.5-turbo-0613\", \n",
|
||||
" model=\"gpt-3.5-turbo-0613\",\n",
|
||||
")\n",
|
||||
"\n",
|
||||
"agent = initialize_agent(\n",
|
||||
@@ -170,6 +161,7 @@
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"import langchain\n",
|
||||
"\n",
|
||||
"langchain.debug = True"
|
||||
]
|
||||
},
|
||||
@@ -405,7 +397,7 @@
|
||||
"source": [
|
||||
"llm = ChatOpenAI(\n",
|
||||
" temperature=0,\n",
|
||||
" model=\"gpt-3.5-turbo-0613\", \n",
|
||||
" model=\"gpt-3.5-turbo-0613\",\n",
|
||||
")\n",
|
||||
"\n",
|
||||
"agent = initialize_agent(\n",
|
||||
|
||||
@@ -136,9 +136,11 @@
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"agent.run(\"Create an email draft for me to edit of a letter from the perspective of a sentient parrot\"\n",
|
||||
" \" who is looking to collaborate on some research with her\"\n",
|
||||
" \" estranged friend, a cat. Under no circumstances may you send the message, however.\")"
|
||||
"agent.run(\n",
|
||||
" \"Create an email draft for me to edit of a letter from the perspective of a sentient parrot\"\n",
|
||||
" \" who is looking to collaborate on some research with her\"\n",
|
||||
" \" estranged friend, a cat. Under no circumstances may you send the message, however.\"\n",
|
||||
")"
|
||||
]
|
||||
},
|
||||
{
|
||||
@@ -160,7 +162,9 @@
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"agent.run(\"Could you search in my drafts folder and let me know if any of them are about collaboration?\")"
|
||||
"agent.run(\n",
|
||||
" \"Could you search in my drafts folder and let me know if any of them are about collaboration?\"\n",
|
||||
")"
|
||||
]
|
||||
},
|
||||
{
|
||||
@@ -190,7 +194,9 @@
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"agent.run(\"Can you schedule a 30 minute meeting with a sentient parrot to discuss research collaborations on October 3, 2023 at 2 pm Eastern Time?\")"
|
||||
"agent.run(\n",
|
||||
"    \"Can you schedule a 30 minute meeting with a sentient parrot to discuss research collaborations on October 3, 2023 at 2 pm Eastern Time?\"\n",
|
||||
")"
|
||||
]
|
||||
},
|
||||
{
|
||||
@@ -210,7 +216,9 @@
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"agent.run(\"Can you tell me if I have any events on October 3, 2023 in Eastern Time, and if so, tell me if any of them are with a sentient parrot?\")"
|
||||
"agent.run(\n",
|
||||
" \"Can you tell me if I have any events on October 3, 2023 in Eastern Time, and if so, tell me if any of them are with a sentient parrot?\"\n",
|
||||
")"
|
||||
]
|
||||
}
|
||||
],
|
||||
|
||||
@@ -24,7 +24,7 @@
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"#!pip install apify-client"
|
||||
"#!pip install apify-client openai langchain chromadb tiktoken"
|
||||
]
|
||||
},
|
||||
{
|
||||
|
||||
@@ -34,6 +34,7 @@
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"import os\n",
|
||||
"\n",
|
||||
"os.environ[\"DATAFORSEO_LOGIN\"] = \"your_api_access_username\"\n",
|
||||
"os.environ[\"DATAFORSEO_PASSWORD\"] = \"your_api_access_password\"\n",
|
||||
"\n",
|
||||
@@ -88,7 +89,8 @@
|
||||
"json_wrapper = DataForSeoAPIWrapper(\n",
|
||||
" json_result_types=[\"organic\", \"knowledge_graph\", \"answer_box\"],\n",
|
||||
" json_result_fields=[\"type\", \"title\", \"description\", \"text\"],\n",
|
||||
" top_count=3)"
|
||||
" top_count=3,\n",
|
||||
")"
|
||||
]
|
||||
},
|
||||
{
|
||||
@@ -119,7 +121,8 @@
|
||||
" top_count=10,\n",
|
||||
" json_result_types=[\"organic\", \"local_pack\"],\n",
|
||||
" json_result_fields=[\"title\", \"description\", \"type\"],\n",
|
||||
" params={\"location_name\": \"Germany\", \"language_code\": \"en\"})\n",
|
||||
" params={\"location_name\": \"Germany\", \"language_code\": \"en\"},\n",
|
||||
")\n",
|
||||
"customized_wrapper.results(\"coffee near me\")"
|
||||
]
|
||||
},
|
||||
@@ -142,7 +145,8 @@
|
||||
" top_count=10,\n",
|
||||
" json_result_types=[\"organic\", \"local_pack\"],\n",
|
||||
" json_result_fields=[\"title\", \"description\", \"type\"],\n",
|
||||
" params={\"location_name\": \"Germany\", \"language_code\": \"en\", \"se_name\": \"bing\"})\n",
|
||||
" params={\"location_name\": \"Germany\", \"language_code\": \"en\", \"se_name\": \"bing\"},\n",
|
||||
")\n",
|
||||
"customized_wrapper.results(\"coffee near me\")"
|
||||
]
|
||||
},
|
||||
@@ -164,7 +168,12 @@
|
||||
"maps_search = DataForSeoAPIWrapper(\n",
|
||||
" top_count=10,\n",
|
||||
" json_result_fields=[\"title\", \"value\", \"address\", \"rating\", \"type\"],\n",
|
||||
" params={\"location_coordinate\": \"52.512,13.36,12z\", \"language_code\": \"en\", \"se_type\": \"maps\"})\n",
|
||||
" params={\n",
|
||||
" \"location_coordinate\": \"52.512,13.36,12z\",\n",
|
||||
" \"language_code\": \"en\",\n",
|
||||
" \"se_type\": \"maps\",\n",
|
||||
" },\n",
|
||||
")\n",
|
||||
"maps_search.results(\"coffee near me\")"
|
||||
]
|
||||
},
|
||||
@@ -184,10 +193,12 @@
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langchain.agents import Tool\n",
|
||||
"\n",
|
||||
"search = DataForSeoAPIWrapper(\n",
|
||||
" top_count=3,\n",
|
||||
" json_result_types=[\"organic\"],\n",
|
||||
" json_result_fields=[\"title\", \"description\", \"type\"])\n",
|
||||
" json_result_fields=[\"title\", \"description\", \"type\"],\n",
|
||||
")\n",
|
||||
"tool = Tool(\n",
|
||||
" name=\"google-search-answer\",\n",
|
||||
" description=\"My new answer tool\",\n",
|
||||
|
||||
@@ -52,7 +52,6 @@
|
||||
"tools = load_tools(\n",
|
||||
" [\"graphql\"],\n",
|
||||
" graphql_endpoint=\"https://swapi-graphql.netlify.app/.netlify/functions/index\",\n",
|
||||
" llm=llm,\n",
|
||||
")\n",
|
||||
"\n",
|
||||
"agent = initialize_agent(\n",
|
||||
|
||||
233
docs/extras/modules/agents/tools/integrations/lemonai.ipynb
Normal file
@@ -0,0 +1,233 @@
|
||||
{
|
||||
"cells": [
|
||||
{
|
||||
"attachments": {},
|
||||
"cell_type": "markdown",
|
||||
"id": "16763ed3",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# Lemon AI NLP Workflow Automation\n",
|
||||
"\\\n",
|
||||
"Full docs are available at: https://github.com/felixbrock/lemonai-py-client\n",
|
||||
"\n",
|
||||
"**Lemon AI helps you build powerful AI assistants in minutes and automate workflows by allowing for accurate and reliable read and write operations in tools like Airtable, Hubspot, Discord, Notion, Slack and Github.**\n",
|
||||
"\n",
|
||||
"Most connectors available today are focused on read-only operations, limiting the potential of LLMs. Agents, on the other hand, have a tendency to hallucinate from time to time due to missing context or instructions.\n",
|
||||
"\n",
|
||||
"With Lemon AI, it is possible to give your agents access to well-defined APIs for reliable read and write operations. In addition, Lemon AI functions allow you to further reduce the risk of hallucinations by providing a way to statically define workflows that the model can rely on in case of uncertainty."
|
||||
]
|
||||
},
|
||||
{
|
||||
"attachments": {},
|
||||
"cell_type": "markdown",
|
||||
"id": "4881b484-1b97-478f-b206-aec407ceff66",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Quick Start\n",
|
||||
"\n",
|
||||
"The following quick start demonstrates how to use Lemon AI in combination with Agents to automate workflows that involve interaction with internal tooling."
|
||||
]
|
||||
},
|
||||
{
|
||||
"attachments": {},
|
||||
"cell_type": "markdown",
|
||||
"id": "ff91b41a",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### 1. Install Lemon AI\n",
|
||||
"\n",
|
||||
"Requires Python 3.8.1 and above.\n",
|
||||
"\n",
|
||||
"To use Lemon AI in your Python project run `pip install lemonai`\n",
|
||||
"\n",
|
||||
"This will install the corresponding Lemon AI client which you can then import into your script.\n",
|
||||
"\n",
|
||||
"The tool uses the Python packages langchain and loguru. If you run into installation errors with Lemon AI, install both packages first and then install the Lemon AI package."
|
||||
]
|
||||
},
|
||||
{
|
||||
"attachments": {},
|
||||
"cell_type": "markdown",
|
||||
"id": "340ff63d",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### 2. Launch the Server\n",
|
||||
"\n",
|
||||
"The interaction of your agents and all tools provided by Lemon AI is handled by the [Lemon AI Server](https://github.com/felixbrock/lemonai-server). To use Lemon AI you need to run the server on your local machine so the Lemon AI Python client can connect to it."
|
||||
]
|
||||
},
|
||||
{
|
||||
"attachments": {},
|
||||
"cell_type": "markdown",
|
||||
"id": "e845f402",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### 3. Use Lemon AI with Langchain"
|
||||
]
|
||||
},
|
||||
{
|
||||
"attachments": {},
|
||||
"cell_type": "markdown",
|
||||
"id": "d3ae6a82",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Lemon AI automatically solves a given task either by finding the right combination of relevant tools or by using Lemon AI Functions. The following example demonstrates how to retrieve a user from Hackernews and write it to a table in Airtable:"
|
||||
]
|
||||
},
|
||||
{
|
||||
"attachments": {},
|
||||
"cell_type": "markdown",
|
||||
"id": "43476a22",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"#### (Optional) Define your Lemon AI Functions"
|
||||
]
|
||||
},
|
||||
{
|
||||
"attachments": {},
|
||||
"cell_type": "markdown",
|
||||
"id": "cb038670",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Similar to [OpenAI functions](https://openai.com/blog/function-calling-and-other-api-updates), Lemon AI provides the option to define workflows as reusable functions. These functions suit use cases where near-deterministic behavior is especially important. Specific workflows can be defined in a separate `lemonai.json` file:"
|
||||
]
|
||||
},
|
||||
{
|
||||
"attachments": {},
|
||||
"cell_type": "markdown",
|
||||
"id": "e423ebbb",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"```json\n",
|
||||
"[\n",
|
||||
" {\n",
|
||||
" \"name\": \"Hackernews Airtable User Workflow\",\n",
|
||||
" \"description\": \"retrieves user data from Hackernews and appends it to a table in Airtable\",\n",
|
||||
" \"tools\": [\"hackernews-get-user\", \"airtable-append-data\"]\n",
|
||||
" }\n",
|
||||
"]\n",
|
||||
"```"
|
||||
]
|
||||
},
|
||||
{
|
||||
"attachments": {},
|
||||
"cell_type": "markdown",
|
||||
"id": "3fdb36ce",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Your model will have access to these functions and will prefer them over self-selecting tools to solve a given task. All you have to do is to let the agent know that it should use a given function by including the function name in the prompt."
|
||||
]
|
||||
},
|
||||
{
|
||||
"attachments": {},
|
||||
"cell_type": "markdown",
|
||||
"id": "ebfb8b5d",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"#### Include Lemon AI in your Langchain project "
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 1,
|
||||
"id": "5318715d",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"import os\n",
|
||||
"from lemonai import execute_workflow\n",
|
||||
"from langchain import OpenAI"
|
||||
]
|
||||
},
|
||||
{
|
||||
"attachments": {},
|
||||
"cell_type": "markdown",
|
||||
"id": "c9d082cb",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"#### Load API Keys and Access Tokens\n",
|
||||
"\n",
|
||||
"To use tools that require authentication, you have to store the corresponding access credentials in your environment in the format \"{tool name}_{authentication string}\" where the authentication string is one of [\"API_KEY\", \"SECRET_KEY\", \"SUBSCRIPTION_KEY\", \"ACCESS_KEY\"] for API keys or [\"ACCESS_TOKEN\", \"SECRET_TOKEN\"] for authentication tokens. Examples are \"OPENAI_API_KEY\", \"BING_SUBSCRIPTION_KEY\", \"AIRTABLE_ACCESS_TOKEN\"."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 2,
|
||||
"id": "a370d999",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"\"\"\" Load all relevant API Keys and Access Tokens into your environment variables \"\"\"\n",
|
||||
"os.environ[\"OPENAI_API_KEY\"] = \"*INSERT OPENAI API KEY HERE*\"\n",
|
||||
"os.environ[\"AIRTABLE_ACCESS_TOKEN\"] = \"*INSERT AIRTABLE TOKEN HERE*\""
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "38d158e7",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"hackernews_username = \"*INSERT HACKERNEWS USERNAME HERE*\"\n",
|
||||
"airtable_base_id = \"*INSERT BASE ID HERE*\"\n",
|
||||
"airtable_table_id = \"*INSERT TABLE ID HERE*\"\n",
|
||||
"\n",
|
||||
"\"\"\" Define your instruction to be given to your LLM \"\"\"\n",
|
||||
"prompt = f\"\"\"Read information from Hackernews for user {hackernews_username} and then write the results to\n",
|
||||
"Airtable (baseId: {airtable_base_id}, tableId: {airtable_table_id}). Only write the fields \"username\", \"karma\"\n",
|
||||
"and \"created_at_i\". Please make sure that Airtable does NOT automatically convert the field types.\n",
|
||||
"\"\"\"\n",
|
||||
"\n",
|
||||
"\"\"\"\n",
|
||||
"Use the Lemon AI execute_workflow wrapper \n",
|
||||
"to run your Langchain agent in combination with Lemon AI \n",
|
||||
"\"\"\"\n",
|
||||
"model = OpenAI(temperature=0)\n",
|
||||
"\n",
|
||||
"execute_workflow(llm=model, prompt_string=prompt)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"attachments": {},
|
||||
"cell_type": "markdown",
|
||||
"id": "aef3e801",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### 4. Gain transparency on your Agent's decision making\n",
|
||||
"\n",
|
||||
"To make your Agent's interactions with Lemon AI tools transparent, all decisions made, tools used and operations performed are written to a local `lemonai.log` file. Every time your LLM agent interacts with the Lemon AI tool stack, a corresponding log entry is created.\n",
|
||||
"\n",
|
||||
"```log\n",
|
||||
"2023-06-26T11:50:27.708785+0100 - b5f91c59-8487-45c2-800a-156eac0c7dae - hackernews-get-user\n",
|
||||
"2023-06-26T11:50:39.624035+0100 - b5f91c59-8487-45c2-800a-156eac0c7dae - airtable-append-data\n",
|
||||
"2023-06-26T11:58:32.925228+0100 - 5efe603c-9898-4143-b99a-55b50007ed9d - hackernews-get-user\n",
|
||||
"2023-06-26T11:58:43.988788+0100 - 5efe603c-9898-4143-b99a-55b50007ed9d - airtable-append-data\n",
|
||||
"```\n",
|
||||
"\n",
|
||||
"By using the [Lemon AI Analytics Tool](https://github.com/felixbrock/lemonai-analytics) you can easily gain a better understanding of how frequently and in which order tools are used. As a result, you can identify weak spots in your agent’s decision-making capabilities and move to a more deterministic behavior by defining Lemon AI functions."
|
||||
]
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
"kernelspec": {
|
||||
"display_name": "Python 3 (ipykernel)",
|
||||
"language": "python",
|
||||
"name": "python3"
|
||||
},
|
||||
"language_info": {
|
||||
"codemirror_mode": {
|
||||
"name": "ipython",
|
||||
"version": 3
|
||||
},
|
||||
"file_extension": ".py",
|
||||
"mimetype": "text/x-python",
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.9.1"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
"nbformat_minor": 5
|
||||
}
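The `lemonai.log` excerpt in the notebook above can be summarized with a few lines of standard-library Python. This is a rough sketch under assumptions: it treats each line as `timestamp - id - tool name` separated by `" - "`, and it reads the middle field as an identifier shared by one run, which the notebook does not state explicitly. `summarize_log` is a hypothetical helper, not part of the Lemon AI client or the Lemon AI Analytics Tool.

```python
from collections import Counter, defaultdict

# Sample lines copied from the log excerpt in the notebook.
SAMPLE_LOG = """\
2023-06-26T11:50:27.708785+0100 - b5f91c59-8487-45c2-800a-156eac0c7dae - hackernews-get-user
2023-06-26T11:50:39.624035+0100 - b5f91c59-8487-45c2-800a-156eac0c7dae - airtable-append-data
2023-06-26T11:58:32.925228+0100 - 5efe603c-9898-4143-b99a-55b50007ed9d - hackernews-get-user
2023-06-26T11:58:43.988788+0100 - 5efe603c-9898-4143-b99a-55b50007ed9d - airtable-append-data
"""


def summarize_log(text):
    """Count tool usage and collect the ordered tool sequence per ID."""
    tool_counts = Counter()
    sequences = defaultdict(list)
    for line in text.splitlines():
        parts = line.strip().split(" - ")
        if len(parts) != 3:
            continue  # skip lines that do not match the assumed layout
        _timestamp, run_id, tool = parts
        tool_counts[tool] += 1
        sequences[run_id].append(tool)
    return tool_counts, dict(sequences)


tool_counts, sequences = summarize_log(SAMPLE_LOG)
print(tool_counts)
print(sequences)
```

Counting per tool shows how often each tool is used, and the per-ID lists show the order in which tools were called.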
|
||||
@@ -90,7 +90,12 @@
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"search.results(\"The best blog post about AI safety is definitely this: \", 10, include_domains=[\"lesswrong.com\"], start_published_date=\"2019-01-01\")"
|
||||
"search.results(\n",
|
||||
" \"The best blog post about AI safety is definitely this: \",\n",
|
||||
" 10,\n",
|
||||
" include_domains=[\"lesswrong.com\"],\n",
|
||||
" start_published_date=\"2019-01-01\",\n",
|
||||
")"
|
||||
]
|
||||
},
|
||||
{
|
||||
|
||||
@@ -341,7 +341,7 @@
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"llm = OpenAI(temperature=0)\n",
|
||||
"zapier = ZapierNLAWrapper(zapier_nla_oauth_access_token='<fill in access token here>')\n",
|
||||
"zapier = ZapierNLAWrapper(zapier_nla_oauth_access_token=\"<fill in access token here>\")\n",
|
||||
"toolkit = ZapierToolkit.from_zapier_nla_wrapper(zapier)\n",
|
||||
"agent = initialize_agent(\n",
|
||||
" toolkit.get_tools(), llm, agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION, verbose=True\n",
|
||||
|
||||
220
docs/extras/modules/callbacks/integrations/context.ipynb
Normal file
@@ -0,0 +1,220 @@
|
||||
{
|
||||
"cells": [
|
||||
{
|
||||
"attachments": {},
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# Context\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"[Context](https://getcontext.ai/) provides product analytics for AI chatbots.\n",
|
||||
"\n",
|
||||
"Context helps you understand how users are interacting with your AI chat products.\n",
|
||||
"Gain critical insights, optimise poor experiences, and minimise brand risks.\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"attachments": {},
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"In this guide we will show you how to integrate with Context."
|
||||
]
|
||||
},
|
||||
{
|
||||
"attachments": {},
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"tags": []
|
||||
},
|
||||
"source": [
|
||||
"## Installation and Setup"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {
|
||||
"vscode": {
|
||||
"languageId": "shellscript"
|
||||
}
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"!pip install context-python --upgrade"
|
||||
]
|
||||
},
|
||||
{
|
||||
"attachments": {},
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Getting API Credentials\n",
|
||||
"\n",
|
||||
"To get your Context API token:\n",
|
||||
"\n",
|
||||
"1. Go to the settings page within your Context account (https://go.getcontext.ai/settings).\n",
|
||||
"2. Generate a new API Token.\n",
|
||||
"3. Store this token somewhere secure."
|
||||
]
|
||||
},
|
||||
{
|
||||
"attachments": {},
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Setup Context\n",
|
||||
"\n",
|
||||
"To use the `ContextCallbackHandler`, import the handler from Langchain and instantiate it with your Context API token.\n",
|
||||
"\n",
|
||||
"Ensure you have installed the `context-python` package before using the handler."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 3,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"import os\n",
|
||||
"\n",
|
||||
"from langchain.callbacks import ContextCallbackHandler\n",
|
||||
"\n",
|
||||
"token = os.environ[\"CONTEXT_API_TOKEN\"]\n",
|
||||
"\n",
|
||||
"context_callback = ContextCallbackHandler(token)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"attachments": {},
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Usage\n",
|
||||
"### Using the Context callback within a Chat Model\n",
|
||||
"\n",
|
||||
"The Context callback handler can be used to directly record transcripts between users and AI assistants.\n",
|
||||
"\n",
|
||||
"#### Example"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 4,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"import os\n",
|
||||
"\n",
|
||||
"from langchain.chat_models import ChatOpenAI\n",
|
||||
"from langchain.schema import (\n",
|
||||
" SystemMessage,\n",
|
||||
" HumanMessage,\n",
|
||||
")\n",
|
||||
"from langchain.callbacks import ContextCallbackHandler\n",
|
||||
"\n",
|
||||
"token = os.environ[\"CONTEXT_API_TOKEN\"]\n",
|
||||
"\n",
|
||||
"chat = ChatOpenAI(\n",
|
||||
" headers={\"user_id\": \"123\"}, temperature=0, callbacks=[ContextCallbackHandler(token)]\n",
|
||||
")\n",
|
||||
"\n",
|
||||
"messages = [\n",
|
||||
" SystemMessage(\n",
|
||||
" content=\"You are a helpful assistant that translates English to French.\"\n",
|
||||
" ),\n",
|
||||
" HumanMessage(content=\"I love programming.\"),\n",
|
||||
"]\n",
|
||||
"\n",
|
||||
"print(chat(messages))"
|
||||
]
|
||||
},
|
||||
{
|
||||
"attachments": {},
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Using the Context callback within Chains\n",
|
||||
"\n",
|
||||
"The Context callback handler can also be used to record the inputs and outputs of chains. Note that intermediate steps of the chain are not recorded - only the starting inputs and final outputs.\n",
|
||||
"\n",
|
||||
"__Note:__ Ensure that you pass the same `ContextCallbackHandler` instance to the chat model and the chain.\n",
|
||||
"\n",
|
||||
"Wrong:\n",
|
||||
"> ```python\n",
|
||||
"> chat = ChatOpenAI(temperature=0.9, callbacks=[ContextCallbackHandler(token)])\n",
|
||||
"> chain = LLMChain(llm=chat, prompt=chat_prompt_template, callbacks=[ContextCallbackHandler(token)])\n",
|
||||
"> ```\n",
|
||||
"\n",
|
||||
"Correct:\n",
|
||||
">```python\n",
|
||||
">callback = ContextCallbackHandler(token)\n",
|
||||
">chat = ChatOpenAI(temperature=0.9, callbacks=[callback])\n",
|
||||
">chain = LLMChain(llm=chat, prompt=chat_prompt_template, callbacks=[callback])\n",
|
||||
">```\n",
|
||||
"\n",
|
||||
"#### Example"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"import os\n",
|
||||
"\n",
|
||||
"from langchain.chat_models import ChatOpenAI\n",
|
||||
"from langchain import LLMChain\n",
|
||||
"from langchain.prompts import PromptTemplate\n",
|
||||
"from langchain.prompts.chat import (\n",
|
||||
" ChatPromptTemplate,\n",
|
||||
" HumanMessagePromptTemplate,\n",
|
||||
")\n",
|
||||
"from langchain.callbacks import ContextCallbackHandler\n",
|
||||
"\n",
|
||||
"token = os.environ[\"CONTEXT_API_TOKEN\"]\n",
|
||||
"\n",
|
||||
"human_message_prompt = HumanMessagePromptTemplate(\n",
|
||||
" prompt=PromptTemplate(\n",
|
||||
" template=\"What is a good name for a company that makes {product}?\",\n",
|
||||
" input_variables=[\"product\"],\n",
|
||||
" )\n",
|
||||
")\n",
|
||||
"chat_prompt_template = ChatPromptTemplate.from_messages([human_message_prompt])\n",
|
||||
"callback = ContextCallbackHandler(token)\n",
|
||||
"chat = ChatOpenAI(temperature=0.9, callbacks=[callback])\n",
|
||||
"chain = LLMChain(llm=chat, prompt=chat_prompt_template, callbacks=[callback])\n",
|
||||
"print(chain.run(\"colorful socks\"))"
|
||||
]
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
"kernelspec": {
|
||||
"display_name": "Python 3 (ipykernel)",
|
||||
"language": "python",
|
||||
"name": "python3"
|
||||
},
|
||||
"language_info": {
|
||||
"codemirror_mode": {
|
||||
"name": "ipython",
|
||||
"version": 3
|
||||
},
|
||||
"file_extension": ".py",
|
||||
"mimetype": "text/x-python",
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.11.3"
|
||||
},
|
||||
"vscode": {
|
||||
"interpreter": {
|
||||
"hash": "a53ebf4a859167383b364e7e7521d0add3c2dbbdecce4edf676e8c4634ff3fbb"
|
||||
}
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
"nbformat_minor": 4
|
||||
}
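The notebook above asks for a single handler object to be shared between the chat model and the chain. One plausible reason is that a stateful handler can only combine what it has seen itself. The sketch below illustrates that idea with a hypothetical `RecordingHandler` class and `run` function. It is not the Context API and makes no claim about how `ContextCallbackHandler` is implemented.

```python
class RecordingHandler:
    """Hypothetical stand-in for a stateful callback handler."""

    def __init__(self):
        self.messages = []

    def on_model_message(self, text):
        # Called by the (simulated) chat model.
        self.messages.append(text)

    def on_chain_end(self, output):
        # Called by the (simulated) chain; returns what this handler would record.
        return {"messages": list(self.messages), "output": output}


def run(model_handler, chain_handler):
    """Simulate a chain wrapping a chat model, each with its own callback."""
    model_handler.on_model_message("What is a good name for a sock company?")
    return chain_handler.on_chain_end("Rainbow Toes")


# Two separate instances: the chain's handler never saw the model's message.
split = run(RecordingHandler(), RecordingHandler())

# One shared instance: the message and the final output end up together.
shared_handler = RecordingHandler()
shared = run(shared_handler, shared_handler)

print(len(split["messages"]), len(shared["messages"]))
```

In this toy model, only the shared instance records both the message and the final output.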
|
||||
@@ -57,6 +57,7 @@
|
||||
"\n",
|
||||
"# Remove the (1) import sys and sys.path.append(..) and (2) uncomment `!pip install langchain` after merging the PR for Infino/LangChain integration.\n",
|
||||
"import sys\n",
|
||||
"\n",
|
||||
"sys.path.append(\"../../../../../langchain\")\n",
|
||||
"#!pip install langchain\n",
|
||||
"\n",
|
||||
@@ -120,9 +121,9 @@
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# These are a subset of questions from Stanford's QA dataset - \n",
|
||||
"# These are a subset of questions from Stanford's QA dataset -\n",
|
||||
"# https://rajpurkar.github.io/SQuAD-explorer/\n",
|
||||
"data = '''In what country is Normandy located?\n",
|
||||
"data = \"\"\"In what country is Normandy located?\n",
|
||||
"When were the Normans in Normandy?\n",
|
||||
"From which countries did the Norse originate?\n",
|
||||
"Who was the Norse leader?\n",
|
||||
@@ -141,9 +142,9 @@
|
||||
"What principality did William the Conqueror found?\n",
|
||||
"What is the original meaning of the word Norman?\n",
|
||||
"When was the Latin version of the word Norman first recorded?\n",
|
||||
"What name comes from the English words Normans/Normanz?'''\n",
|
||||
"What name comes from the English words Normans/Normanz?\"\"\"\n",
|
||||
"\n",
|
||||
"questions = data.split('\\n')"
|
||||
"questions = data.split(\"\\n\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
@@ -190,10 +191,12 @@
|
||||
],
|
||||
"source": [
|
||||
"# Set your key here.\n",
|
||||
"#os.environ[\"OPENAI_API_KEY\"] = \"YOUR_API_KEY\"\n",
|
||||
"# os.environ[\"OPENAI_API_KEY\"] = \"YOUR_API_KEY\"\n",
|
||||
"\n",
|
||||
"# Create callback handler. This logs latency, errors, token usage, prompts as well as prompt responses to Infino.\n",
|
||||
"handler = InfinoCallbackHandler(model_id=\"test_openai\", model_version=\"0.1\", verbose=False)\n",
|
||||
"handler = InfinoCallbackHandler(\n",
|
||||
" model_id=\"test_openai\", model_version=\"0.1\", verbose=False\n",
|
||||
")\n",
|
||||
"\n",
|
||||
"# Create LLM.\n",
|
||||
"llm = OpenAI(temperature=0.1)\n",
|
||||
@@ -281,29 +284,30 @@
|
||||
"source": [
|
||||
"# Helper function to create a graph using matplotlib.\n",
|
||||
"def plot(data, title):\n",
|
||||
" data = json.loads(data)\n",
|
||||
" data = json.loads(data)\n",
|
||||
"\n",
|
||||
" # Extract x and y values from the data\n",
|
||||
" timestamps = [item[\"time\"] for item in data]\n",
|
||||
" dates=[dt.datetime.fromtimestamp(ts) for ts in timestamps]\n",
|
||||
" y = [item[\"value\"] for item in data]\n",
|
||||
" # Extract x and y values from the data\n",
|
||||
" timestamps = [item[\"time\"] for item in data]\n",
|
||||
" dates = [dt.datetime.fromtimestamp(ts) for ts in timestamps]\n",
|
||||
" y = [item[\"value\"] for item in data]\n",
|
||||
"\n",
|
||||
" plt.rcParams['figure.figsize'] = [6, 4]\n",
|
||||
" plt.subplots_adjust(bottom=0.2)\n",
|
||||
" plt.xticks(rotation=25 )\n",
|
||||
" ax=plt.gca()\n",
|
||||
" xfmt = md.DateFormatter('%Y-%m-%d %H:%M:%S')\n",
|
||||
" ax.xaxis.set_major_formatter(xfmt)\n",
|
||||
" \n",
|
||||
" # Create the plot\n",
|
||||
" plt.plot(dates, y)\n",
|
||||
" plt.rcParams[\"figure.figsize\"] = [6, 4]\n",
|
||||
" plt.subplots_adjust(bottom=0.2)\n",
|
||||
" plt.xticks(rotation=25)\n",
|
||||
" ax = plt.gca()\n",
|
||||
" xfmt = md.DateFormatter(\"%Y-%m-%d %H:%M:%S\")\n",
|
||||
" ax.xaxis.set_major_formatter(xfmt)\n",
|
||||
"\n",
|
||||
" # Set labels and title\n",
|
||||
" plt.xlabel(\"Time\")\n",
|
||||
" plt.ylabel(\"Value\")\n",
|
||||
" plt.title(title)\n",
|
||||
" # Create the plot\n",
|
||||
" plt.plot(dates, y)\n",
|
||||
"\n",
|
||||
" # Set labels and title\n",
|
||||
" plt.xlabel(\"Time\")\n",
|
||||
" plt.ylabel(\"Value\")\n",
|
||||
" plt.title(title)\n",
|
||||
"\n",
|
||||
" plt.show()\n",
|
||||
"\n",
|
||||
" plt.show()\n",
|
||||
"\n",
|
||||
"response = client.search_ts(\"__name__\", \"latency\", 0, int(time.time()))\n",
|
||||
"plot(response.text, \"Latency\")\n",
|
||||
@@ -318,7 +322,7 @@
|
||||
"plot(response.text, \"Completion Tokens\")\n",
|
||||
"\n",
|
||||
"response = client.search_ts(\"__name__\", \"total_tokens\", 0, int(time.time()))\n",
|
||||
"plot(response.text, \"Total Tokens\")\n"
|
||||
"plot(response.text, \"Total Tokens\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
@@ -356,7 +360,7 @@
|
||||
"\n",
|
||||
"query = \"king charles III\"\n",
|
||||
"response = client.search_log(\"king charles III\", 0, int(time.time()))\n",
|
||||
"print(\"Results for\", query, \":\", response.text)\n"
|
||||
"print(\"Results for\", query, \":\", response.text)"
|
||||
]
|
||||
},
|
||||
{
|
||||
|
||||
@@ -9,7 +9,7 @@
|
||||
In this guide we will demonstrate how to use `StreamlitCallbackHandler` to display the thoughts and actions of an agent in an
|
||||
interactive Streamlit app. Try it out with the running app below using the [MRKL agent](/docs/modules/agents/how_to/mrkl/):
|
||||
|
||||
<iframe loading="lazy" src="https://mrkl-minimal.streamlit.app/?embed=true&embed_options=light_theme"
|
||||
<iframe loading="lazy" src="https://langchain-mrkl.streamlit.app/?embed=true&embed_options=light_theme"
|
||||
style={{ width: 100 + '%', border: 'none', marginBottom: 1 + 'rem', height: 600 }}
|
||||
allow="camera;clipboard-read;clipboard-write;"
|
||||
></iframe>
|
||||
@@ -35,7 +35,7 @@ st_callback = StreamlitCallbackHandler(st.container())
|
||||
```
|
||||
|
||||
Additional keyword arguments to customize the display behavior are described in the
|
||||
[API reference](https://api.python.langchain.com/en/latest/modules/callbacks.html#langchain.callbacks.StreamlitCallbackHandler).
|
||||
[API reference](https://api.python.langchain.com/en/latest/callbacks/langchain.callbacks.streamlit.streamlit_callback_handler.StreamlitCallbackHandler.html).
|
||||
|
||||
### Scenario 1: Using an Agent with Tools
|
||||
|
||||
|
||||
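The CPAL notebook added in this comparison describes a causal structure (a DAG) that links entity expressions and can be intervened upon. The sketch below illustrates that idea in plain Python for the "complex narrative" pets story. It is a hypothetical stand-in, not the `CPALChain` implementation: the `story` dictionary, `evaluate`, and `intervene` are names assumed here for illustration only, and the standard-library `graphlib` module supplies the dependency ordering.

```python
from graphlib import TopologicalSorter

# Hypothetical plain-Python model of the pets story (not the CPALChain API).
# Each entity maps to (names it depends on, function of those parents' values).
story = {
    "obama": ((), lambda: 1.0),
    "bill": (("obama",), lambda o: o),
    "bob": (("obama",), lambda o: o),
    "ben": (("obama",), lambda o: o),
    "beth": (("obama",), lambda o: o),
    "cindy": (("bill", "bob"), lambda a, b: a + b),
    "boris": (("ben", "beth"), lambda a, b: a + b),
    "tim": (("cindy", "boris"), lambda a, b: a + b),
}


def evaluate(model):
    """Compute every entity's value, visiting parents before children."""
    graph = {name: set(parents) for name, (parents, _) in model.items()}
    values = {}
    for name in TopologicalSorter(graph).static_order():
        parents, fn = model[name]
        values[name] = fn(*(values[p] for p in parents))
    return values


def intervene(model, name, value):
    """Return a copy of the model with one entity pinned to a constant."""
    changed = dict(model)
    changed[name] = ((), lambda: value)
    return changed


values = evaluate(story)
total = sum(values.values())
print(values["tim"], total)

# Intervention: force Cindy to buy no pets and recompute downstream values.
after = evaluate(intervene(story, "cindy", 0.0))
print(after["tim"], sum(after.values()))
```

Because each value is derived from its parents in dependency order, a nested chain such as Tim, Cindy, Bill, Obama is resolved step by step, and pinning one entity changes only what depends on it.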
921
docs/extras/modules/chains/additional/cpal.ipynb
Normal file
@@ -0,0 +1,921 @@
|
||||
{
|
||||
"cells": [
|
||||
{
|
||||
"attachments": {},
|
||||
"cell_type": "markdown",
|
||||
"id": "82f3f65d-fbcb-4e8e-b04b-959856283643",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# Causal program-aided language (CPAL) chain\n",
|
||||
"\n",
|
||||
"The CPAL chain builds on the recent PAL to stop LLM hallucination. The problem with the PAL approach is that it hallucinates on a math problem with a nested chain of dependence. The innovation here is that this new CPAL approach includes causal structure to fix hallucination.\n",
|
||||
"\n",
|
||||
"The original [PR's description](https://github.com/hwchase17/langchain/pull/6255) contains a full overview.\n",
|
||||
"\n",
|
||||
"Using the CPAL chain, the LLM translated this\n",
|
||||
"\n",
|
||||
" \"Tim buys the same number of pets as Cindy and Boris.\"\n",
|
||||
" \"Cindy buys the same number of pets as Bill plus Bob.\"\n",
|
||||
" \"Boris buys the same number of pets as Ben plus Beth.\"\n",
|
||||
" \"Bill buys the same number of pets as Obama.\"\n",
|
||||
" \"Bob buys the same number of pets as Obama.\"\n",
|
||||
" \"Ben buys the same number of pets as Obama.\"\n",
|
||||
" \"Beth buys the same number of pets as Obama.\"\n",
|
||||
" \"If Obama buys one pet, how many pets total does everyone buy?\"\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"into this\n",
|
||||
"\n",
|
||||
".\n",
|
||||
"\n",
|
||||
"Outline of code examples demoed in this notebook.\n",
|
||||
"\n",
|
||||
"1. CPAL's value against hallucination: CPAL vs PAL \n",
|
||||
" 1.1 Complex narrative \n",
|
||||
" 1.2 Unanswerable math word problem \n",
|
||||
"2. CPAL's three types of causal diagrams ([The Book of Why](https://en.wikipedia.org/wiki/The_Book_of_Why)). \n",
|
||||
" 2.1 Mediator \n",
|
||||
" 2.2 Collider \n",
|
||||
" 2.3 Confounder "
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 1,
|
||||
"id": "1370e40f",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from IPython.display import SVG\n",
|
||||
"\n",
|
||||
"from langchain.experimental.cpal.base import CPALChain\n",
|
||||
"from langchain.chains import PALChain\n",
|
||||
"from langchain import OpenAI\n",
|
||||
"\n",
|
||||
"llm = OpenAI(temperature=0, max_tokens=512)\n",
|
||||
"cpal_chain = CPALChain.from_univariate_prompt(llm=llm, verbose=True)\n",
|
||||
"pal_chain = PALChain.from_math_prompt(llm=llm, verbose=True)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "858a87d9-a9bd-4850-9687-9af4b0856b62",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## CPAL's value against hallucination: CPAL vs PAL\n",
|
||||
"\n",
|
||||
"Like PAL, CPAL intends to reduce large language model (LLM) hallucination.\n",
|
||||
"\n",
|
||||
"The CPAL chain is different from the PAL chain for a couple of reasons.\n",
|
||||
"\n",
|
||||
"CPAL adds a causal structure (or DAG) to link entity actions (or math expressions).\n",
|
||||
"The CPAL math expressions are modeling a chain of cause and effect relations, which can be intervened upon, whereas for the PAL chain math expressions are projected math identities.\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "496403c5-d268-43ae-8852-2bd9903ce444",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### 1.1 Complex narrative\n",
|
||||
"\n",
|
||||
"Takeaway: PAL hallucinates, CPAL does not hallucinate."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 2,
|
||||
"id": "d5dad768-2892-4825-8093-9b840f643a8a",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"question = (\n",
|
||||
" \"Tim buys the same number of pets as Cindy and Boris.\"\n",
|
||||
" \"Cindy buys the same number of pets as Bill plus Bob.\"\n",
|
||||
" \"Boris buys the same number of pets as Ben plus Beth.\"\n",
|
||||
" \"Bill buys the same number of pets as Obama.\"\n",
|
||||
" \"Bob buys the same number of pets as Obama.\"\n",
|
||||
" \"Ben buys the same number of pets as Obama.\"\n",
|
||||
" \"Beth buys the same number of pets as Obama.\"\n",
|
||||
" \"If Obama buys one pet, how many pets total does everyone buy?\"\n",
|
||||
")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 3,
|
||||
"id": "bbffa7a0-3c22-4a1d-ab2d-f230973073b0",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"\n",
|
||||
"\n",
|
||||
"\u001b[1m> Entering new chain...\u001b[0m\n",
|
||||
"\u001b[32;1m\u001b[1;3mdef solution():\n",
|
||||
" \"\"\"Tim buys the same number of pets as Cindy and Boris.Cindy buys the same number of pets as Bill plus Bob.Boris buys the same number of pets as Ben plus Beth.Bill buys the same number of pets as Obama.Bob buys the same number of pets as Obama.Ben buys the same number of pets as Obama.Beth buys the same number of pets as Obama.If Obama buys one pet, how many pets total does everyone buy?\"\"\"\n",
|
||||
" obama_pets = 1\n",
|
||||
" tim_pets = obama_pets\n",
|
||||
" cindy_pets = obama_pets + obama_pets\n",
|
||||
" boris_pets = obama_pets + obama_pets\n",
|
||||
" total_pets = tim_pets + cindy_pets + boris_pets\n",
|
||||
" result = total_pets\n",
|
||||
" return result\u001b[0m\n",
|
||||
"\n",
|
||||
"\u001b[1m> Finished chain.\u001b[0m\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"'5'"
|
||||
]
|
||||
},
|
||||
"execution_count": 3,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"pal_chain.run(question)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 4,
|
||||
"id": "35a70d1d-86f8-4abc-b818-fbd083f072e9",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"\n",
|
||||
"\n",
|
||||
"\u001b[1m> Entering new chain...\u001b[0m\n",
|
||||
"\u001b[32;1m\u001b[1;3mstory outcome data\n",
|
||||
" name code value depends_on\n",
|
||||
"0 obama pass 1.0 []\n",
|
||||
"1 bill bill.value = obama.value 1.0 [obama]\n",
|
||||
"2 bob bob.value = obama.value 1.0 [obama]\n",
|
||||
"3 ben ben.value = obama.value 1.0 [obama]\n",
|
||||
"4 beth beth.value = obama.value 1.0 [obama]\n",
|
||||
"5 cindy cindy.value = bill.value + bob.value 2.0 [bill, bob]\n",
|
||||
"6 boris boris.value = ben.value + beth.value 2.0 [ben, beth]\n",
|
||||
"7 tim tim.value = cindy.value + boris.value 4.0 [cindy, boris]\u001b[0m\n",
|
||||
"\n",
|
||||
"\u001b[36;1m\u001b[1;3mquery data\n",
|
||||
"{\n",
|
||||
" \"question\": \"how many pets total does everyone buy?\",\n",
|
||||
" \"expression\": \"SELECT SUM(value) FROM df\",\n",
|
||||
" \"llm_error_msg\": \"\"\n",
|
||||
"}\u001b[0m\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"\u001b[1m> Finished chain.\u001b[0m\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"13.0"
|
||||
]
|
||||
},
|
||||
"execution_count": 4,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"cpal_chain.run(question)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 5,
|
||||
"id": "ccb6b2b0-9de6-4f66-a8fb-fc59229ee316",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"image/svg+xml": [
|
||||
"<svg xmlns=\"http://www.w3.org/2000/svg\" xmlns:xlink=\"http://www.w3.org/1999/xlink\" width=\"292pt\" height=\"260pt\" viewBox=\"0.00 0.00 292.00 260.00\">\n",
|
||||
"<g id=\"graph0\" class=\"graph\" transform=\"scale(1 1) rotate(0) translate(4 256)\">\n",
|
||||
"<polygon fill=\"white\" stroke=\"transparent\" points=\"-4,4 -4,-256 288,-256 288,4 -4,4\"/>\n",
|
||||
"<!-- obama -->\n",
|
||||
"<g id=\"node1\" class=\"node\">\n",
|
||||
"<title>obama</title>\n",
|
||||
"<ellipse fill=\"none\" stroke=\"black\" cx=\"137\" cy=\"-234\" rx=\"41.69\" ry=\"18\"/>\n",
|
||||
"<text text-anchor=\"middle\" x=\"137\" y=\"-230.3\" font-family=\"Times,serif\" font-size=\"14.00\">obama</text>\n",
|
||||
"</g>\n",
|
||||
"<!-- bill -->\n",
|
||||
"<g id=\"node2\" class=\"node\">\n",
|
||||
"<title>bill</title>\n",
|
||||
"<ellipse fill=\"none\" stroke=\"black\" cx=\"27\" cy=\"-162\" rx=\"27\" ry=\"18\"/>\n",
|
||||
"<text text-anchor=\"middle\" x=\"27\" y=\"-158.3\" font-family=\"Times,serif\" font-size=\"14.00\">bill</text>\n",
|
||||
"</g>\n",
|
||||
"<!-- obama->bill -->\n",
|
||||
"<g id=\"edge1\" class=\"edge\">\n",
|
||||
"<title>obama->bill</title>\n",
|
||||
"<path fill=\"none\" stroke=\"black\" d=\"M114.47,-218.67C97.08,-207.6 72.94,-192.23 54.42,-180.45\"/>\n",
|
||||
"<polygon fill=\"black\" stroke=\"black\" points=\"56.15,-177.4 45.84,-174.99 52.4,-183.31 56.15,-177.4\"/>\n",
|
||||
"</g>\n",
|
||||
"<!-- bob -->\n",
|
||||
"<g id=\"node3\" class=\"node\">\n",
|
||||
"<title>bob</title>\n",
|
||||
"<ellipse fill=\"none\" stroke=\"black\" cx=\"100\" cy=\"-162\" rx=\"28\" ry=\"18\"/>\n",
|
||||
"<text text-anchor=\"middle\" x=\"100\" y=\"-158.3\" font-family=\"Times,serif\" font-size=\"14.00\">bob</text>\n",
|
||||
"</g>\n",
|
||||
"<!-- obama->bob -->\n",
|
||||
"<g id=\"edge2\" class=\"edge\">\n",
|
||||
"<title>obama->bob</title>\n",
|
||||
"<path fill=\"none\" stroke=\"black\" d=\"M128.04,-216.05C123.66,-207.77 118.3,-197.62 113.44,-188.42\"/>\n",
|
||||
"<polygon fill=\"black\" stroke=\"black\" points=\"116.39,-186.51 108.62,-179.31 110.2,-189.79 116.39,-186.51\"/>\n",
|
||||
"</g>\n",
|
||||
"<!-- ben -->\n",
|
||||
"<g id=\"node4\" class=\"node\">\n",
|
||||
"<title>ben</title>\n",
|
||||
"<ellipse fill=\"none\" stroke=\"black\" cx=\"174\" cy=\"-162\" rx=\"28\" ry=\"18\"/>\n",
|
||||
"<text text-anchor=\"middle\" x=\"174\" y=\"-158.3\" font-family=\"Times,serif\" font-size=\"14.00\">ben</text>\n",
|
||||
"</g>\n",
|
||||
"<!-- obama->ben -->\n",
|
||||
"<g id=\"edge3\" class=\"edge\">\n",
|
||||
"<title>obama->ben</title>\n",
|
||||
"<path fill=\"none\" stroke=\"black\" d=\"M145.96,-216.05C150.34,-207.77 155.7,-197.62 160.56,-188.42\"/>\n",
|
||||
"<polygon fill=\"black\" stroke=\"black\" points=\"163.8,-189.79 165.38,-179.31 157.61,-186.51 163.8,-189.79\"/>\n",
|
||||
"</g>\n",
|
||||
"<!-- beth -->\n",
|
||||
"<g id=\"node5\" class=\"node\">\n",
|
||||
"<title>beth</title>\n",
|
||||
"<ellipse fill=\"none\" stroke=\"black\" cx=\"252\" cy=\"-162\" rx=\"32\" ry=\"18\"/>\n",
|
||||
"<text text-anchor=\"middle\" x=\"252\" y=\"-158.3\" font-family=\"Times,serif\" font-size=\"14.00\">beth</text>\n",
|
||||
"</g>\n",
|
||||
"<!-- obama->beth -->\n",
|
||||
"<g id=\"edge4\" class=\"edge\">\n",
|
||||
"<title>obama->beth</title>\n",
|
||||
"<path fill=\"none\" stroke=\"black\" d=\"M160.27,-218.83C178.18,-207.94 203.04,-192.8 222.37,-181.04\"/>\n",
|
||||
"<polygon fill=\"black\" stroke=\"black\" points=\"224.36,-183.92 231.08,-175.73 220.72,-177.95 224.36,-183.92\"/>\n",
|
||||
"</g>\n",
|
||||
"<!-- cindy -->\n",
|
||||
"<g id=\"node6\" class=\"node\">\n",
|
||||
"<title>cindy</title>\n",
|
||||
"<ellipse fill=\"none\" stroke=\"black\" cx=\"93\" cy=\"-90\" rx=\"36\" ry=\"18\"/>\n",
|
||||
"<text text-anchor=\"middle\" x=\"93\" y=\"-86.3\" font-family=\"Times,serif\" font-size=\"14.00\">cindy</text>\n",
|
||||
"</g>\n",
|
||||
"<!-- bill->cindy -->\n",
|
||||
"<g id=\"edge5\" class=\"edge\">\n",
|
||||
"<title>bill->cindy</title>\n",
|
||||
"<path fill=\"none\" stroke=\"black\" d=\"M41,-146.15C49.77,-136.85 61.25,-124.67 71.2,-114.12\"/>\n",
|
||||
"<polygon fill=\"black\" stroke=\"black\" points=\"73.79,-116.47 78.11,-106.8 68.7,-111.67 73.79,-116.47\"/>\n",
|
||||
"</g>\n",
|
||||
"<!-- bob->cindy -->\n",
|
||||
"<g id=\"edge6\" class=\"edge\">\n",
|
||||
"<title>bob->cindy</title>\n",
|
||||
"<path fill=\"none\" stroke=\"black\" d=\"M98.27,-143.7C97.5,-135.98 96.57,-126.71 95.71,-118.11\"/>\n",
|
||||
"<polygon fill=\"black\" stroke=\"black\" points=\"99.19,-117.7 94.71,-108.1 92.22,-118.4 99.19,-117.7\"/>\n",
|
||||
"</g>\n",
|
||||
"<!-- boris -->\n",
|
||||
"<g id=\"node7\" class=\"node\">\n",
|
||||
"<title>boris</title>\n",
|
||||
"<ellipse fill=\"none\" stroke=\"black\" cx=\"181\" cy=\"-90\" rx=\"34.5\" ry=\"18\"/>\n",
|
||||
"<text text-anchor=\"middle\" x=\"181\" y=\"-86.3\" font-family=\"Times,serif\" font-size=\"14.00\">boris</text>\n",
|
||||
"</g>\n",
|
||||
"<!-- ben->boris -->\n",
|
||||
"<g id=\"edge7\" class=\"edge\">\n",
|
||||
"<title>ben->boris</title>\n",
|
||||
"<path fill=\"none\" stroke=\"black\" d=\"M175.73,-143.7C176.5,-135.98 177.43,-126.71 178.29,-118.11\"/>\n",
|
||||
"<polygon fill=\"black\" stroke=\"black\" points=\"181.78,-118.4 179.29,-108.1 174.81,-117.7 181.78,-118.4\"/>\n",
|
||||
"</g>\n",
|
||||
"<!-- beth->boris -->\n",
|
||||
"<g id=\"edge8\" class=\"edge\">\n",
|
||||
"<title>beth->boris</title>\n",
|
||||
"<path fill=\"none\" stroke=\"black\" d=\"M236.59,-145.81C227.01,-136.36 214.51,-124.04 203.8,-113.48\"/>\n",
|
||||
"<polygon fill=\"black\" stroke=\"black\" points=\"205.96,-110.69 196.38,-106.16 201.04,-115.67 205.96,-110.69\"/>\n",
|
||||
"</g>\n",
|
||||
"<!-- tim -->\n",
|
||||
"<g id=\"node8\" class=\"node\">\n",
|
||||
"<title>tim</title>\n",
|
||||
"<ellipse fill=\"none\" stroke=\"black\" cx=\"137\" cy=\"-18\" rx=\"27\" ry=\"18\"/>\n",
|
||||
"<text text-anchor=\"middle\" x=\"137\" y=\"-14.3\" font-family=\"Times,serif\" font-size=\"14.00\">tim</text>\n",
|
||||
"</g>\n",
|
||||
"<!-- cindy->tim -->\n",
|
||||
"<g id=\"edge9\" class=\"edge\">\n",
|
||||
"<title>cindy->tim</title>\n",
|
||||
"<path fill=\"none\" stroke=\"black\" d=\"M103.43,-72.41C108.82,-63.83 115.51,-53.19 121.49,-43.67\"/>\n",
|
||||
"<polygon fill=\"black\" stroke=\"black\" points=\"124.59,-45.32 126.95,-34.99 118.66,-41.59 124.59,-45.32\"/>\n",
|
||||
"</g>\n",
|
||||
"<!-- boris->tim -->\n",
|
||||
"<g id=\"edge10\" class=\"edge\">\n",
|
||||
"<title>boris->tim</title>\n",
|
||||
"<path fill=\"none\" stroke=\"black\" d=\"M170.79,-72.77C165.41,-64.19 158.68,-53.49 152.65,-43.9\"/>\n",
|
||||
"<polygon fill=\"black\" stroke=\"black\" points=\"155.43,-41.75 147.15,-35.15 149.51,-45.48 155.43,-41.75\"/>\n",
|
||||
"</g>\n",
|
||||
"</g>\n",
|
||||
"</svg>"
|
||||
],
|
||||
"text/plain": [
|
||||
"<IPython.core.display.SVG object>"
|
||||
]
|
||||
},
|
||||
"execution_count": 5,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"# wait 20 secs to see display\n",
|
||||
"cpal_chain.draw(path=\"web.svg\")\n",
|
||||
"SVG(\"web.svg\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "1f6f345a-bb16-4e64-83c4-cbbc789a8325",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Unanswerable math\n",
|
||||
"\n",
|
||||
"Takeaway: PAL hallucinates, where CPAL, rather than hallucinate, answers with _\"unanswerable, narrative question and plot are incoherent\"_"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 6,
|
||||
"id": "068afd79-fd41-4ec2-b4d0-c64140dc413f",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"question = (\n",
|
||||
" \"Jan has three times the number of pets as Marcia.\"\n",
|
||||
" \"Marcia has two more pets than Cindy.\"\n",
|
||||
" \"If Cindy has ten pets, how many pets does Barak have?\"\n",
|
||||
")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 7,
|
||||
"id": "02f77db2-72e8-46c2-90b3-5e37ca42f80d",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"\n",
|
||||
"\n",
|
||||
"\u001b[1m> Entering new chain...\u001b[0m\n",
|
||||
"\u001b[32;1m\u001b[1;3mdef solution():\n",
|
||||
" \"\"\"Jan has three times the number of pets as Marcia.Marcia has two more pets than Cindy.If Cindy has ten pets, how many pets does Barak have?\"\"\"\n",
|
||||
" cindy_pets = 10\n",
|
||||
" marcia_pets = cindy_pets + 2\n",
|
||||
" jan_pets = marcia_pets * 3\n",
|
||||
" result = jan_pets\n",
|
||||
" return result\u001b[0m\n",
|
||||
"\n",
|
||||
"\u001b[1m> Finished chain.\u001b[0m\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"'36'"
|
||||
]
|
||||
},
|
||||
"execution_count": 7,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"pal_chain.run(question)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 8,
|
||||
"id": "925958de-e998-4ffa-8b2e-5a00ddae5026",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"\n",
|
||||
"\n",
|
||||
"\u001b[1m> Entering new chain...\u001b[0m\n",
|
||||
"\u001b[32;1m\u001b[1;3mstory outcome data\n",
|
||||
" name code value depends_on\n",
|
||||
"0 cindy pass 10.0 []\n",
|
||||
"1 marcia marcia.value = cindy.value + 2 12.0 [cindy]\n",
|
||||
"2 jan jan.value = marcia.value * 3 36.0 [marcia]\u001b[0m\n",
|
||||
"\n",
|
||||
"\u001b[36;1m\u001b[1;3mquery data\n",
|
||||
"{\n",
|
||||
" \"question\": \"how many pets does barak have?\",\n",
|
||||
" \"expression\": \"SELECT name, value FROM df WHERE name = 'barak'\",\n",
|
||||
" \"llm_error_msg\": \"\"\n",
|
||||
"}\u001b[0m\n",
|
||||
"\n",
|
||||
"unanswerable, query and outcome are incoherent\n",
|
||||
"\n",
|
||||
"outcome:\n",
|
||||
" name code value depends_on\n",
|
||||
"0 cindy pass 10.0 []\n",
|
||||
"1 marcia marcia.value = cindy.value + 2 12.0 [cindy]\n",
|
||||
"2 jan jan.value = marcia.value * 3 36.0 [marcia]\n",
|
||||
"query:\n",
|
||||
"{'question': 'how many pets does barak have?', 'expression': \"SELECT name, value FROM df WHERE name = 'barak'\", 'llm_error_msg': ''}\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"try:\n",
|
||||
" cpal_chain.run(question)\n",
|
||||
"except Exception as e_msg:\n",
|
||||
" print(e_msg)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "095adc76",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Basic math\n",
|
||||
"\n",
|
||||
"#### Causal mediator"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 9,
|
||||
"id": "3ecf03fa-8350-4c4e-8080-84a307ba6ad4",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"question = (\n",
|
||||
" \"Jan has three times the number of pets as Marcia. \"\n",
|
||||
" \"Marcia has two more pets than Cindy. \"\n",
|
||||
" \"If Cindy has four pets, how many total pets do the three have?\"\n",
|
||||
")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "74e49c47-3eed-4abe-98b7-8e97bcd15944",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"---\n",
|
||||
"PAL"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 10,
|
||||
"id": "2e88395f-d014-4362-abb0-88f6800860bb",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"\n",
|
||||
"\n",
|
||||
"\u001b[1m> Entering new chain...\u001b[0m\n",
|
||||
"\u001b[32;1m\u001b[1;3mdef solution():\n",
|
||||
" \"\"\"Jan has three times the number of pets as Marcia. Marcia has two more pets than Cindy. If Cindy has four pets, how many total pets do the three have?\"\"\"\n",
|
||||
" cindy_pets = 4\n",
|
||||
" marcia_pets = cindy_pets + 2\n",
|
||||
" jan_pets = marcia_pets * 3\n",
|
||||
" total_pets = cindy_pets + marcia_pets + jan_pets\n",
|
||||
" result = total_pets\n",
|
||||
" return result\u001b[0m\n",
|
||||
"\n",
|
||||
"\u001b[1m> Finished chain.\u001b[0m\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"'28'"
|
||||
]
|
||||
},
|
||||
"execution_count": 10,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"pal_chain.run(question)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "20ba6640-3d17-4b59-8101-aaba89d68cf4",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"---\n",
|
||||
"CPAL"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 11,
|
||||
"id": "312a0943-a482-4ed0-a064-1e7a72e9479b",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"\n",
|
||||
"\n",
|
||||
"\u001b[1m> Entering new chain...\u001b[0m\n",
|
||||
"\u001b[32;1m\u001b[1;3mstory outcome data\n",
|
||||
" name code value depends_on\n",
|
||||
"0 cindy pass 4.0 []\n",
|
||||
"1 marcia marcia.value = cindy.value + 2 6.0 [cindy]\n",
|
||||
"2 jan jan.value = marcia.value * 3 18.0 [marcia]\u001b[0m\n",
|
||||
"\n",
|
||||
"\u001b[36;1m\u001b[1;3mquery data\n",
|
||||
"{\n",
|
||||
" \"question\": \"how many total pets do the three have?\",\n",
|
||||
" \"expression\": \"SELECT SUM(value) FROM df\",\n",
|
||||
" \"llm_error_msg\": \"\"\n",
|
||||
"}\u001b[0m\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"\u001b[1m> Finished chain.\u001b[0m\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"28.0"
|
||||
]
|
||||
},
|
||||
"execution_count": 11,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"cpal_chain.run(question)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 12,
|
||||
"id": "4466b975-ae2b-4252-972b-b3182a089ade",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"image/svg+xml": [
|
||||
"<svg xmlns=\"http://www.w3.org/2000/svg\" xmlns:xlink=\"http://www.w3.org/1999/xlink\" width=\"92pt\" height=\"188pt\" viewBox=\"0.00 0.00 92.49 188.00\">\n",
|
||||
"<g id=\"graph0\" class=\"graph\" transform=\"scale(1 1) rotate(0) translate(4 184)\">\n",
|
||||
"<polygon fill=\"white\" stroke=\"transparent\" points=\"-4,4 -4,-184 88.49,-184 88.49,4 -4,4\"/>\n",
|
||||
"<!-- cindy -->\n",
|
||||
"<g id=\"node1\" class=\"node\">\n",
|
||||
"<title>cindy</title>\n",
|
||||
"<ellipse fill=\"none\" stroke=\"black\" cx=\"42.25\" cy=\"-162\" rx=\"36\" ry=\"18\"/>\n",
|
||||
"<text text-anchor=\"middle\" x=\"42.25\" y=\"-158.3\" font-family=\"Times,serif\" font-size=\"14.00\">cindy</text>\n",
|
||||
"</g>\n",
|
||||
"<!-- marcia -->\n",
|
||||
"<g id=\"node2\" class=\"node\">\n",
|
||||
"<title>marcia</title>\n",
|
||||
"<ellipse fill=\"none\" stroke=\"black\" cx=\"42.25\" cy=\"-90\" rx=\"42.49\" ry=\"18\"/>\n",
|
||||
"<text text-anchor=\"middle\" x=\"42.25\" y=\"-86.3\" font-family=\"Times,serif\" font-size=\"14.00\">marcia</text>\n",
|
||||
"</g>\n",
|
||||
"<!-- cindy->marcia -->\n",
|
||||
"<g id=\"edge1\" class=\"edge\">\n",
|
||||
"<title>cindy->marcia</title>\n",
|
||||
"<path fill=\"none\" stroke=\"black\" d=\"M42.25,-143.7C42.25,-135.98 42.25,-126.71 42.25,-118.11\"/>\n",
|
||||
"<polygon fill=\"black\" stroke=\"black\" points=\"45.75,-118.1 42.25,-108.1 38.75,-118.1 45.75,-118.1\"/>\n",
|
||||
"</g>\n",
|
||||
"<!-- jan -->\n",
|
||||
"<g id=\"node3\" class=\"node\">\n",
|
||||
"<title>jan</title>\n",
|
||||
"<ellipse fill=\"none\" stroke=\"black\" cx=\"42.25\" cy=\"-18\" rx=\"27\" ry=\"18\"/>\n",
|
||||
"<text text-anchor=\"middle\" x=\"42.25\" y=\"-14.3\" font-family=\"Times,serif\" font-size=\"14.00\">jan</text>\n",
|
||||
"</g>\n",
|
||||
"<!-- marcia->jan -->\n",
|
||||
"<g id=\"edge2\" class=\"edge\">\n",
|
||||
"<title>marcia->jan</title>\n",
|
||||
"<path fill=\"none\" stroke=\"black\" d=\"M42.25,-71.7C42.25,-63.98 42.25,-54.71 42.25,-46.11\"/>\n",
|
||||
"<polygon fill=\"black\" stroke=\"black\" points=\"45.75,-46.1 42.25,-36.1 38.75,-46.1 45.75,-46.1\"/>\n",
|
||||
"</g>\n",
|
||||
"</g>\n",
|
||||
"</svg>"
|
||||
],
|
||||
"text/plain": [
|
||||
"<IPython.core.display.SVG object>"
|
||||
]
|
||||
},
|
||||
"execution_count": 12,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"# wait 20 secs to see display\n",
|
||||
"cpal_chain.draw(path=\"web.svg\")\n",
|
||||
"SVG(\"web.svg\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "29fa7b8a-75a3-4270-82a2-2c31939cd7e0",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Causal collider"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 13,
|
||||
"id": "618eddac-f0ef-4ab5-90ed-72e880fdeba3",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"question = (\n",
|
||||
" \"Jan has the number of pets as Marcia plus the number of pets as Cindy. \"\n",
|
||||
" \"Marcia has no pets. \"\n",
|
||||
" \"If Cindy has four pets, how many total pets do the three have?\"\n",
|
||||
")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 14,
|
||||
"id": "a01563f3-7974-4de4-8bd9-0b7d710aa0d3",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"\n",
|
||||
"\n",
|
||||
"\u001b[1m> Entering new chain...\u001b[0m\n",
|
||||
"\u001b[32;1m\u001b[1;3mstory outcome data\n",
|
||||
" name code value depends_on\n",
|
||||
"0 marcia pass 0.0 []\n",
|
||||
"1 cindy pass 4.0 []\n",
|
||||
"2 jan jan.value = marcia.value + cindy.value 4.0 [marcia, cindy]\u001b[0m\n",
|
||||
"\n",
|
||||
"\u001b[36;1m\u001b[1;3mquery data\n",
|
||||
"{\n",
|
||||
" \"question\": \"how many total pets do the three have?\",\n",
|
||||
" \"expression\": \"SELECT SUM(value) FROM df\",\n",
|
||||
" \"llm_error_msg\": \"\"\n",
|
||||
"}\u001b[0m\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"\u001b[1m> Finished chain.\u001b[0m\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"8.0"
|
||||
]
|
||||
},
|
||||
"execution_count": 14,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"cpal_chain.run(question)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 15,
|
||||
"id": "0fbe7243-0522-4946-b9a2-6e21e7c49a42",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"image/svg+xml": [
|
||||
"<svg xmlns=\"http://www.w3.org/2000/svg\" xmlns:xlink=\"http://www.w3.org/1999/xlink\" width=\"182pt\" height=\"116pt\" viewBox=\"0.00 0.00 182.00 116.00\">\n",
|
||||
"<g id=\"graph0\" class=\"graph\" transform=\"scale(1 1) rotate(0) translate(4 112)\">\n",
|
||||
"<polygon fill=\"white\" stroke=\"transparent\" points=\"-4,4 -4,-112 178,-112 178,4 -4,4\"/>\n",
|
||||
"<!-- marcia -->\n",
|
||||
"<g id=\"node1\" class=\"node\">\n",
|
||||
"<title>marcia</title>\n",
|
||||
"<ellipse fill=\"none\" stroke=\"black\" cx=\"42.25\" cy=\"-90\" rx=\"42.49\" ry=\"18\"/>\n",
|
||||
"<text text-anchor=\"middle\" x=\"42.25\" y=\"-86.3\" font-family=\"Times,serif\" font-size=\"14.00\">marcia</text>\n",
|
||||
"</g>\n",
|
||||
"<!-- jan -->\n",
|
||||
"<g id=\"node2\" class=\"node\">\n",
|
||||
"<title>jan</title>\n",
|
||||
"<ellipse fill=\"none\" stroke=\"black\" cx=\"90.25\" cy=\"-18\" rx=\"27\" ry=\"18\"/>\n",
|
||||
"<text text-anchor=\"middle\" x=\"90.25\" y=\"-14.3\" font-family=\"Times,serif\" font-size=\"14.00\">jan</text>\n",
|
||||
"</g>\n",
|
||||
"<!-- marcia->jan -->\n",
|
||||
"<g id=\"edge1\" class=\"edge\">\n",
|
||||
"<title>marcia->jan</title>\n",
|
||||
"<path fill=\"none\" stroke=\"black\" d=\"M53.62,-72.41C59.57,-63.74 66.95,-52.97 73.53,-43.38\"/>\n",
|
||||
"<polygon fill=\"black\" stroke=\"black\" points=\"76.51,-45.21 79.28,-34.99 70.74,-41.26 76.51,-45.21\"/>\n",
|
||||
"</g>\n",
|
||||
"<!-- cindy -->\n",
|
||||
"<g id=\"node3\" class=\"node\">\n",
|
||||
"<title>cindy</title>\n",
|
||||
"<ellipse fill=\"none\" stroke=\"black\" cx=\"138.25\" cy=\"-90\" rx=\"36\" ry=\"18\"/>\n",
|
||||
"<text text-anchor=\"middle\" x=\"138.25\" y=\"-86.3\" font-family=\"Times,serif\" font-size=\"14.00\">cindy</text>\n",
|
||||
"</g>\n",
|
||||
"<!-- cindy->jan -->\n",
|
||||
"<g id=\"edge2\" class=\"edge\">\n",
|
||||
"<title>cindy->jan</title>\n",
|
||||
"<path fill=\"none\" stroke=\"black\" d=\"M127.11,-72.77C121.09,-63.98 113.54,-52.96 106.83,-43.19\"/>\n",
|
||||
"<polygon fill=\"black\" stroke=\"black\" points=\"109.53,-40.94 100.99,-34.67 103.75,-44.89 109.53,-40.94\"/>\n",
|
||||
"</g>\n",
|
||||
"</g>\n",
|
||||
"</svg>"
|
||||
],
|
||||
"text/plain": [
|
||||
"<IPython.core.display.SVG object>"
|
||||
]
|
||||
},
|
||||
"execution_count": 15,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"# wait 20 secs to see display\n",
|
||||
"cpal_chain.draw(path=\"web.svg\")\n",
|
||||
"SVG(\"web.svg\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "d4082538-ec03-44f0-aac3-07e03aad7555",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Causal confounder"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 16,
|
||||
"id": "83932c30-950b-435a-b328-7993ce8cc6bd",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"question = (\n",
|
||||
" \"Jan has the number of pets as Marcia plus the number of pets as Cindy. \"\n",
|
||||
" \"Marcia has two more pets than Cindy. \"\n",
|
||||
" \"If Cindy has four pets, how many total pets do the three have?\"\n",
|
||||
")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 17,
|
||||
"id": "570de307-7c6b-4fdc-80c3-4361daa8a629",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"\n",
|
||||
"\n",
|
||||
"\u001b[1m> Entering new chain...\u001b[0m\n",
|
||||
"\u001b[32;1m\u001b[1;3mstory outcome data\n",
|
||||
" name code value depends_on\n",
|
||||
"0 cindy pass 4.0 []\n",
|
||||
"1 marcia marcia.value = cindy.value + 2 6.0 [cindy]\n",
|
||||
"2 jan jan.value = cindy.value + marcia.value 10.0 [cindy, marcia]\u001b[0m\n",
|
||||
"\n",
|
||||
"\u001b[36;1m\u001b[1;3mquery data\n",
|
||||
"{\n",
|
||||
" \"question\": \"how many total pets do the three have?\",\n",
|
||||
" \"expression\": \"SELECT SUM(value) FROM df\",\n",
|
||||
" \"llm_error_msg\": \"\"\n",
|
||||
"}\u001b[0m\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"\u001b[1m> Finished chain.\u001b[0m\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"20.0"
|
||||
]
|
||||
},
|
||||
"execution_count": 17,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"cpal_chain.run(question)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 18,
|
||||
"id": "00375615-6b6d-4357-bdb8-f64f682f7605",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"image/svg+xml": [
|
||||
"<svg xmlns=\"http://www.w3.org/2000/svg\" xmlns:xlink=\"http://www.w3.org/1999/xlink\" width=\"121pt\" height=\"188pt\" viewBox=\"0.00 0.00 120.99 188.00\">\n",
|
||||
"<g id=\"graph0\" class=\"graph\" transform=\"scale(1 1) rotate(0) translate(4 184)\">\n",
|
||||
"<polygon fill=\"white\" stroke=\"transparent\" points=\"-4,4 -4,-184 116.99,-184 116.99,4 -4,4\"/>\n",
|
||||
"<!-- cindy -->\n",
|
||||
"<g id=\"node1\" class=\"node\">\n",
|
||||
"<title>cindy</title>\n",
|
||||
"<ellipse fill=\"none\" stroke=\"black\" cx=\"77.25\" cy=\"-162\" rx=\"36\" ry=\"18\"/>\n",
|
||||
"<text text-anchor=\"middle\" x=\"77.25\" y=\"-158.3\" font-family=\"Times,serif\" font-size=\"14.00\">cindy</text>\n",
|
||||
"</g>\n",
|
||||
"<!-- marcia -->\n",
|
||||
"<g id=\"node2\" class=\"node\">\n",
|
||||
"<title>marcia</title>\n",
|
||||
"<ellipse fill=\"none\" stroke=\"black\" cx=\"42.25\" cy=\"-90\" rx=\"42.49\" ry=\"18\"/>\n",
|
||||
"<text text-anchor=\"middle\" x=\"42.25\" y=\"-86.3\" font-family=\"Times,serif\" font-size=\"14.00\">marcia</text>\n",
|
||||
"</g>\n",
|
||||
"<!-- cindy->marcia -->\n",
|
||||
"<g id=\"edge1\" class=\"edge\">\n",
|
||||
"<title>cindy->marcia</title>\n",
|
||||
"<path fill=\"none\" stroke=\"black\" d=\"M68.95,-144.41C64.87,-136.25 59.86,-126.22 55.28,-117.07\"/>\n",
|
||||
"<polygon fill=\"black\" stroke=\"black\" points=\"58.33,-115.34 50.72,-107.96 52.07,-118.47 58.33,-115.34\"/>\n",
|
||||
"</g>\n",
|
||||
"<!-- jan -->\n",
|
||||
"<g id=\"node3\" class=\"node\">\n",
|
||||
"<title>jan</title>\n",
|
||||
"<ellipse fill=\"none\" stroke=\"black\" cx=\"77.25\" cy=\"-18\" rx=\"27\" ry=\"18\"/>\n",
|
||||
"<text text-anchor=\"middle\" x=\"77.25\" y=\"-14.3\" font-family=\"Times,serif\" font-size=\"14.00\">jan</text>\n",
|
||||
"</g>\n",
|
||||
"<!-- cindy->jan -->\n",
|
||||
"<g id=\"edge2\" class=\"edge\">\n",
|
||||
"<title>cindy->jan</title>\n",
|
||||
"<path fill=\"none\" stroke=\"black\" d=\"M83.73,-144.1C87.32,-133.84 91.42,-120.36 93.25,-108 95.58,-92.17 95.58,-87.83 93.25,-72 91.95,-63.21 89.5,-53.86 86.91,-45.5\"/>\n",
|
||||
"<polygon fill=\"black\" stroke=\"black\" points=\"90.19,-44.29 83.73,-35.9 83.55,-46.49 90.19,-44.29\"/>\n",
|
||||
"</g>\n",
|
||||
"<!-- marcia->jan -->\n",
|
||||
"<g id=\"edge3\" class=\"edge\">\n",
|
||||
"<title>marcia->jan</title>\n",
|
||||
"<path fill=\"none\" stroke=\"black\" d=\"M50.72,-72.06C54.86,-63.77 59.94,-53.62 64.53,-44.42\"/>\n",
|
||||
"<polygon fill=\"black\" stroke=\"black\" points=\"67.75,-45.82 69.09,-35.31 61.49,-42.69 67.75,-45.82\"/>\n",
|
||||
"</g>\n",
|
||||
"</g>\n",
|
||||
"</svg>"
|
||||
],
|
||||
"text/plain": [
|
||||
"<IPython.core.display.SVG object>"
|
||||
]
|
||||
},
|
||||
"execution_count": 18,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"# wait 20 secs to see display\n",
|
||||
"cpal_chain.draw(path=\"web.svg\")\n",
|
||||
"SVG(\"web.svg\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 19,
|
||||
"id": "255683de-0c1c-4131-b277-99d09f5ac1fc",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"%load_ext autoreload\n",
|
||||
"%autoreload 2"
|
||||
]
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
"kernelspec": {
|
||||
"display_name": "Python 3 (ipykernel)",
|
||||
"language": "python",
|
||||
"name": "python3"
|
||||
},
|
||||
"language_info": {
|
||||
"codemirror_mode": {
|
||||
"name": "ipython",
|
||||
"version": 3
|
||||
},
|
||||
"file_extension": ".py",
|
||||
"mimetype": "text/x-python",
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.11.3"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
"nbformat_minor": 5
|
||||
}
|
||||
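The `story outcome data` table in the "Causal confounder" trace above can be checked by hand. The following is a minimal sketch in plain Python of what that table describes: each entity's value is computed from the entities it depends on, in dependency order. It is not the CPAL chain's implementation; the `STORY` dict and the `evaluate` helper are hypothetical names invented for illustration.

```python
from graphlib import TopologicalSorter

# Hypothetical hand-written encoding of the "Causal confounder" story:
# each entry maps an entity to (its dependencies, a function of their values).
STORY = {
    "cindy": ([], lambda v: 4.0),
    "marcia": (["cindy"], lambda v: v["cindy"] + 2),
    "jan": (["cindy", "marcia"], lambda v: v["cindy"] + v["marcia"]),
}


def evaluate(story):
    """Compute every entity's value, visiting dependencies first."""
    order = TopologicalSorter({name: deps for name, (deps, _) in story.items()})
    values = {}
    for name in order.static_order():
        _, compute = story[name]
        values[name] = compute(values)
    return values


values = evaluate(STORY)
total = sum(values.values())  # mirrors "SELECT SUM(value) FROM df"
print(values, total)
```

Because Cindy feeds both Marcia and Jan, changing Cindy's value alone changes all three results, which is the confounding structure the drawn graph shows.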
@@ -0,0 +1,218 @@
|
||||
{
|
||||
"cells": [
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "dd7ec7af",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# Elasticsearch database\n",
|
||||
"\n",
|
||||
"Interact with Elasticsearch analytics database via Langchain. This chain builds search queries via the Elasticsearch DSL API (filters and aggregations).\n",
|
||||
"\n",
|
||||
"The Elasticsearch client must have permissions for index listing, mapping description and search queries.\n",
|
||||
"\n",
|
||||
"See [here](https://www.elastic.co/guide/en/elasticsearch/reference/current/docker.html) for instructions on how to run Elasticsearch locally.\n",
|
||||
"\n",
|
||||
"Make sure to install the Elasticsearch Python client before:\n",
|
||||
"\n",
|
||||
"```sh\n",
|
||||
"pip install elasticsearch\n",
|
||||
"```"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 11,
|
||||
"id": "dd8eae75",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from elasticsearch import Elasticsearch\n",
|
||||
"\n",
|
||||
"from langchain.chains.elasticsearch_database import ElasticsearchDatabaseChain\n",
|
||||
"from langchain.chat_models import ChatOpenAI"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 2,
|
||||
"id": "659b5ed0",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"import warnings\n",
|
||||
"\n",
|
||||
"warnings.filterwarnings(\"ignore\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 3,
|
||||
"id": "5cde03bc",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# Initialize Elasticsearch python client.\n",
|
||||
"# See https://elasticsearch-py.readthedocs.io/en/v8.8.2/api.html#elasticsearch.Elasticsearch\n",
|
||||
"ELASTIC_SEARCH_SERVER = \"https://elastic:gvODoJ_nRYQIJZfG7=ec@localhost:9200\"\n",
|
||||
"db = Elasticsearch(ELASTIC_SEARCH_SERVER, ca_certs=False, verify_certs=False)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "74a41374",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Uncomment the next cell to initially populate your db."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 10,
|
||||
"id": "430ada0f",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# customers = [\n",
|
||||
"# {\"firstname\": \"Jennifer\", \"lastname\": \"Walters\"},\n",
|
||||
"# {\"firstname\": \"Monica\",\"lastname\":\"Rambeau\"},\n",
|
||||
"# {\"firstname\": \"Carol\",\"lastname\":\"Danvers\"},\n",
|
||||
"# {\"firstname\": \"Wanda\",\"lastname\":\"Maximoff\"},\n",
|
||||
"# {\"firstname\": \"Jennifer\",\"lastname\":\"Takeda\"},\n",
|
||||
"# ]\n",
|
||||
"# for i, customer in enumerate(customers):\n",
|
||||
"# db.create(index=\"customers\", document=customer, id=i)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 12,
|
||||
"id": "f36ae0d8",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"llm = ChatOpenAI(model_name=\"gpt-4\", temperature=0)\n",
|
||||
"chain = ElasticsearchDatabaseChain.from_llm(llm=llm, database=db, verbose=True)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 13,
|
||||
"id": "b5d22d9d",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"\n",
|
||||
"\n",
|
||||
"\u001b[1m> Entering new ElasticsearchDatabaseChain chain...\u001b[0m\n",
|
||||
"What are the first names of all the customers?\n",
|
||||
"ESQuery:\u001b[32;1m\u001b[1;3m{'size': 10, 'query': {'match_all': {}}, '_source': ['firstname']}\u001b[0m\n",
|
||||
"ESResult: \u001b[33;1m\u001b[1;3m{'took': 5, 'timed_out': False, '_shards': {'total': 1, 'successful': 1, 'skipped': 0, 'failed': 0}, 'hits': {'total': {'value': 6, 'relation': 'eq'}, 'max_score': 1.0, 'hits': [{'_index': 'customers', '_id': '0', '_score': 1.0, '_source': {'firstname': 'Jennifer'}}, {'_index': 'customers', '_id': '1', '_score': 1.0, '_source': {'firstname': 'Monica'}}, {'_index': 'customers', '_id': '2', '_score': 1.0, '_source': {'firstname': 'Carol'}}, {'_index': 'customers', '_id': '3', '_score': 1.0, '_source': {'firstname': 'Wanda'}}, {'_index': 'customers', '_id': '4', '_score': 1.0, '_source': {'firstname': 'Jennifer'}}, {'_index': 'customers', '_id': 'firstname', '_score': 1.0, '_source': {'firstname': 'Jennifer'}}]}}\u001b[0m\n",
|
||||
"Answer:\u001b[32;1m\u001b[1;3mThe first names of all the customers are Jennifer, Monica, Carol, Wanda, and Jennifer.\u001b[0m\n",
|
||||
"\u001b[1m> Finished chain.\u001b[0m\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"'The first names of all the customers are Jennifer, Monica, Carol, Wanda, and Jennifer.'"
|
||||
]
|
||||
},
|
||||
"execution_count": 13,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"question = \"What are the first names of all the customers?\"\n",
|
||||
"chain.run(question)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "9b4bfada",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Custom prompt\n",
|
||||
"\n",
|
||||
"For best results you'll likely need to customize the prompt."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 7,
|
||||
"id": "0a494f5b",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langchain.chains.elasticsearch_database.prompts import DEFAULT_DSL_TEMPLATE\n",
|
||||
"from langchain.prompts.prompt import PromptTemplate\n",
|
||||
"\n",
|
||||
"PROMPT_TEMPLATE = \"\"\"Given an input question, create a syntactically correct Elasticsearch query to run. Unless the user specifies in their question a specific number of examples they wish to obtain, always limit your query to at most {top_k} results. You can order the results by a relevant column to return the most interesting examples in the database.\n",
|
||||
"\n",
|
||||
"Unless told to do not query for all the columns from a specific index, only ask for a the few relevant columns given the question.\n",
|
||||
"\n",
|
||||
"Pay attention to use only the column names that you can see in the mapping description. Be careful to not query for columns that do not exist. Also, pay attention to which column is in which index. Return the query as valid json.\n",
|
||||
"\n",
|
||||
"Use the following format:\n",
|
||||
"\n",
|
||||
"Question: Question here\n",
|
||||
"ESQuery: Elasticsearch Query formatted as json\n",
|
||||
"\"\"\"\n",
|
||||
"\n",
|
||||
"PROMPT = PromptTemplate.from_template(\n",
|
||||
" PROMPT_TEMPLATE,\n",
|
||||
")\n",
|
||||
"chain = ElasticsearchDatabaseChain.from_llm(llm=llm, database=db, query_prompt=PROMPT)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "372b8f93",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Adding example rows from each index\n",
|
||||
"\n",
|
||||
"Sometimes, the format of the data is not obvious and it is optimal to include a sample of rows from the indices in the prompt to allow the LLM to understand the data before providing a final query. Here we will use this feature to let the LLM know that artists are saved with their full names by providing ten rows from the index."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "eef818de",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"chain = ElasticsearchDatabaseChain.from_llm(\n",
|
||||
" llm=ChatOpenAI(temperature=0),\n",
|
||||
" database=db,\n",
|
||||
" sample_documents_in_index_info=2, # 2 rows from each index will be included in the prompt as sample data\n",
|
||||
")"
|
||||
]
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
"kernelspec": {
|
||||
"display_name": "venv",
|
||||
"language": "python",
|
||||
"name": "venv"
|
||||
},
|
||||
"language_info": {
|
||||
"codemirror_mode": {
|
||||
"name": "ipython",
|
||||
"version": 3
|
||||
},
|
||||
"file_extension": ".py",
|
||||
"mimetype": "text/x-python",
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.11.3"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
"nbformat_minor": 5
|
||||
}
|
||||
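The `ESQuery` and `ESResult` lines in the verbose trace above are ordinary Elasticsearch search request and response bodies. As a minimal sketch, assuming a response shaped like the one printed in that trace, the first names can be pulled out with plain Python and no LLM. The `extract_field` helper and the trimmed `sample_response` are hypothetical and exist only for illustration.

```python
# Query body in the same shape as the ESQuery line of the trace.
query = {"size": 10, "query": {"match_all": {}}, "_source": ["firstname"]}

# Trimmed, hand-written response in the shape of the ESResult line
# (illustrative data only, not a real server reply).
sample_response = {
    "hits": {
        "total": {"value": 2, "relation": "eq"},
        "hits": [
            {"_index": "customers", "_id": "0", "_source": {"firstname": "Jennifer"}},
            {"_index": "customers", "_id": "1", "_source": {"firstname": "Monica"}},
        ],
    }
}


def extract_field(response, field):
    """Return `field` from the _source of every hit that has it."""
    return [
        hit["_source"][field]
        for hit in response["hits"]["hits"]
        if field in hit.get("_source", {})
    ]


first_names = extract_field(sample_response, "firstname")
print(first_names)
```

Reading the raw hits this way is a quick check on whether the chain's natural-language answer matches what the index actually returned.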
@@ -14,7 +14,7 @@
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 1,
|
||||
"execution_count": 2,
|
||||
"id": "34f04daf",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
@@ -35,7 +35,7 @@
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 2,
|
||||
"execution_count": 3,
|
||||
"id": "a2648974",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
@@ -56,15 +56,110 @@
|
||||
"id": "78ff9df9",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"To extract entities, we need to create a schema like the following, were we specify all the properties we want to find and the type we expect them to have. We can also specify which of these properties are required and which are optional."
|
||||
"To extract entities, we need to create a schema where we specify all the properties we want to find and the type we expect them to have. We can also specify which of these properties are required and which are optional."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 3,
|
||||
"execution_count": 4,
|
||||
"id": "4ac43eba",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"schema = {\n",
|
||||
" \"properties\": {\n",
|
||||
" \"name\": {\"type\": \"string\"},\n",
|
||||
" \"height\": {\"type\": \"integer\"},\n",
|
||||
" \"hair_color\": {\"type\": \"string\"},\n",
|
||||
" },\n",
|
||||
" \"required\": [\"name\", \"height\"],\n",
|
||||
"}"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 5,
|
||||
"id": "640bd005",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"inp = \"\"\"\n",
|
||||
"Alex is 5 feet tall. Claudia is 1 feet taller Alex and jumps higher than him. Claudia is a brunette and Alex is blonde.\n",
|
||||
" \"\"\""
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 6,
|
||||
"id": "64313214",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"chain = create_extraction_chain(schema, llm)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "17c48adb",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"As we can see, we extracted the required entities and their properties in the required format (it even calculated Claudia's height before returning!)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 7,
|
||||
"id": "cc5436ed",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"[{'name': 'Alex', 'height': 5, 'hair_color': 'blonde'},\n",
|
||||
" {'name': 'Claudia', 'height': 6, 'hair_color': 'brunette'}]"
|
||||
]
|
||||
},
|
||||
"execution_count": 7,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"chain.run(inp)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "8d51fcdc",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Several entity types"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "5813affe",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Notice that we are using OpenAI functions under the hood and thus the model can only call one function per request (with one, unique schema)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "511b9838",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"If we want to extract more than one entity type, we need to introduce a little hack - we will define our properties with an included entity type. \n",
|
||||
"\n",
|
||||
"Following we have an example where we also want to extract dog attributes from the passage. Notice the 'person_' and 'dog_' prefixes we use for each property; this tells the model which entity type the property refers to. In this way, the model can return properties from several entity types in one single call."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 8,
|
||||
"id": "cf243a26",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"schema = {\n",
|
||||
" \"properties\": {\n",
|
||||
@@ -103,10 +198,10 @@
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "17c48adb",
|
||||
"id": "eb074f7b",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"As we can see, we extracted the required entities and their properties in the required format:"
|
||||
"People attributes and dog attributes were correctly extracted from the text in the same call"
|
||||
]
|
||||
},
|
||||
{
|
||||
@@ -128,7 +223,207 @@
|
||||
" 'person_hair_color': 'brunette'}]"
|
||||
]
|
||||
},
|
||||
"execution_count": 6,
|
||||
"execution_count": 11,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"chain.run(inp)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "0273e0e2",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Unrelated entities"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "c07b3480",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"What if our entities are unrelated? In that case, the model will return the unrelated entities in different dictionaries, allowing us to successfully extract several unrelated entity types in the same call."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "01d98af0",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Notice that we use `required: []`: we need to allow the model to return **only** person attributes or **only** dog attributes for a single entity (person or dog)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 48,
|
||||
"id": "e584c993",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"schema = {\n",
|
||||
" \"properties\": {\n",
|
||||
" \"person_name\": {\"type\": \"string\"},\n",
|
||||
" \"person_height\": {\"type\": \"integer\"},\n",
|
||||
" \"person_hair_color\": {\"type\": \"string\"},\n",
|
||||
" \"dog_name\": {\"type\": \"string\"},\n",
|
||||
" \"dog_breed\": {\"type\": \"string\"},\n",
|
||||
" },\n",
|
||||
" \"required\": [],\n",
|
||||
"}"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 49,
|
||||
"id": "ad6b105f",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"inp = \"\"\"\n",
|
||||
"Alex is 5 feet tall. Claudia is 1 feet taller Alex and jumps higher than him. Claudia is a brunette and Alex is blonde.\n",
|
||||
"\n",
|
||||
"Willow is a German Shepherd that likes to play with other dogs and can always be found playing with Milo, a border collie that lives close by.\n",
|
||||
"\"\"\""
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 50,
|
||||
"id": "6bfe5a33",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"chain = create_extraction_chain(schema, llm)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "24fe09af",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"We have each entity in its own separate dictionary, with only the appropriate attributes being returned"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 51,
|
||||
"id": "f6e1fd89",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"[{'person_name': 'Alex', 'person_height': 5, 'person_hair_color': 'blonde'},\n",
|
||||
" {'person_name': 'Claudia',\n",
|
||||
" 'person_height': 6,\n",
|
||||
" 'person_hair_color': 'brunette'},\n",
|
||||
" {'dog_name': 'Willow', 'dog_breed': 'German Shepherd'},\n",
|
||||
" {'dog_name': 'Milo', 'dog_breed': 'border collie'}]"
|
||||
]
|
||||
},
|
||||
"execution_count": 51,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"chain.run(inp)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "0ac466d1",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Extra info for an entity"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "d240ffc1",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"What if.. _we don't know what we want?_ More specifically, say we know a few properties we want to extract for a given entity but we also want to know if there's any extra information in the passage. Fortunately, we don't need to structure everything - we can have unstructured extraction as well. \n",
|
||||
"\n",
|
||||
"We can do this by introducing another hack, namely the *extra_info* attribute - let's see an example."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 68,
|
||||
"id": "f19685f6",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"schema = {\n",
|
||||
" \"properties\": {\n",
|
||||
" \"person_name\": {\"type\": \"string\"},\n",
|
||||
" \"person_height\": {\"type\": \"integer\"},\n",
|
||||
" \"person_hair_color\": {\"type\": \"string\"},\n",
|
||||
" \"dog_name\": {\"type\": \"string\"},\n",
|
||||
" \"dog_breed\": {\"type\": \"string\"},\n",
|
||||
" \"dog_extra_info\": {\"type\": \"string\"},\n",
|
||||
" },\n",
|
||||
"}"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 81,
|
||||
"id": "200c3477",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"inp = \"\"\"\n",
|
||||
"Alex is 5 feet tall. Claudia is 1 feet taller Alex and jumps higher than him. Claudia is a brunette and Alex is blonde.\n",
|
||||
"\n",
|
||||
"Willow is a German Shepherd that likes to play with other dogs and can always be found playing with Milo, a border collie that lives close by.\n",
|
||||
"\"\"\""
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 82,
|
||||
"id": "ddad7dc6",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"chain = create_extraction_chain(schema, llm)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "e5c0dbbc",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"It is nice to know more about Willow and Milo!"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 83,
|
||||
"id": "c22cfd30",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"[{'person_name': 'Alex', 'person_height': 5, 'person_hair_color': 'blonde'},\n",
|
||||
" {'person_name': 'Claudia',\n",
|
||||
" 'person_height': 6,\n",
|
||||
" 'person_hair_color': 'brunette'},\n",
|
||||
" {'dog_name': 'Willow',\n",
|
||||
" 'dog_breed': 'German Shepherd',\n",
|
||||
" 'dog_extra_information': 'likes to play with other dogs'},\n",
|
||||
" {'dog_name': 'Milo',\n",
|
||||
" 'dog_breed': 'border collie',\n",
|
||||
" 'dog_extra_information': 'lives close by'}]"
|
||||
]
|
||||
},
|
||||
"execution_count": 83,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
|
||||
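The prefix convention used in the schemas above ('person_', 'dog_') also makes post-processing straightforward. The following is a minimal sketch, assuming results shaped like the list returned in the "Unrelated entities" example; `group_by_entity` is a hypothetical helper and not part of LangChain.

```python
from collections import defaultdict

# Hand-copied from the output shown in the "Unrelated entities" example.
results = [
    {"person_name": "Alex", "person_height": 5, "person_hair_color": "blonde"},
    {"person_name": "Claudia", "person_height": 6, "person_hair_color": "brunette"},
    {"dog_name": "Willow", "dog_breed": "German Shepherd"},
    {"dog_name": "Milo", "dog_breed": "border collie"},
]


def group_by_entity(records):
    """Split each record by the prefix before the first underscore."""
    grouped = defaultdict(list)
    for record in records:
        per_type = defaultdict(dict)
        for key, value in record.items():
            # "person_hair_color" -> ("person", "_", "hair_color")
            entity_type, _, attribute = key.partition("_")
            per_type[entity_type][attribute] = value
        for entity_type, attributes in per_type.items():
            grouped[entity_type].append(attributes)
    return dict(grouped)


entities = group_by_entity(results)
print(entities["dog"])
```

Splitting on the first underscore only keeps multi-word attributes such as `hair_color` intact. A record that mixes prefixes, as in the "Several entity types" example, would contribute one entry to each entity type.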
@@ -72,7 +72,10 @@
|
||||
"import numpy as np\n",
|
||||
"\n",
|
||||
"from langchain.schema import BaseRetriever\n",
|
||||
"from langchain.callbacks.manager import AsyncCallbackManagerForRetrieverRun, CallbackManagerForRetrieverRun\n",
|
||||
"from langchain.callbacks.manager import (\n",
|
||||
" AsyncCallbackManagerForRetrieverRun,\n",
|
||||
" CallbackManagerForRetrieverRun,\n",
|
||||
")\n",
|
||||
"from langchain.utilities import GoogleSerperAPIWrapper\n",
|
||||
"from langchain.embeddings import OpenAIEmbeddings\n",
|
||||
"from langchain.chat_models import ChatOpenAI\n",
|
||||
@@ -97,13 +100,15 @@
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"class SerperSearchRetriever(BaseRetriever):\n",
|
||||
"\n",
|
||||
" search: GoogleSerperAPIWrapper = None\n",
|
||||
"\n",
|
||||
" def _get_relevant_documents(self, query: str, *, run_manager: CallbackManagerForRetrieverRun, **kwargs: Any) -> List[Document]:\n",
|
||||
" def _get_relevant_documents(\n",
|
||||
" self, query: str, *, run_manager: CallbackManagerForRetrieverRun, **kwargs: Any\n",
|
||||
" ) -> List[Document]:\n",
|
||||
" return [Document(page_content=self.search.run(query))]\n",
|
||||
"\n",
|
||||
" async def _aget_relevant_documents(self,\n",
|
||||
" async def _aget_relevant_documents(\n",
|
||||
" self,\n",
|
||||
" query: str,\n",
|
||||
" *,\n",
|
||||
" run_manager: AsyncCallbackManagerForRetrieverRun,\n",
|
||||
|
||||
@@ -83,9 +83,15 @@
|
||||
"schema = client.schema()\n",
|
||||
"schema.propertyKey(\"name\").asText().ifNotExist().create()\n",
|
||||
"schema.propertyKey(\"birthDate\").asText().ifNotExist().create()\n",
|
||||
"schema.vertexLabel(\"Person\").properties(\"name\", \"birthDate\").usePrimaryKeyId().primaryKeys(\"name\").ifNotExist().create()\n",
|
||||
"schema.vertexLabel(\"Movie\").properties(\"name\").usePrimaryKeyId().primaryKeys(\"name\").ifNotExist().create()\n",
|
||||
"schema.edgeLabel(\"ActedIn\").sourceLabel(\"Person\").targetLabel(\"Movie\").ifNotExist().create()"
|
||||
"schema.vertexLabel(\"Person\").properties(\n",
|
||||
" \"name\", \"birthDate\"\n",
|
||||
").usePrimaryKeyId().primaryKeys(\"name\").ifNotExist().create()\n",
|
||||
"schema.vertexLabel(\"Movie\").properties(\"name\").usePrimaryKeyId().primaryKeys(\n",
|
||||
" \"name\"\n",
|
||||
").ifNotExist().create()\n",
|
||||
"schema.edgeLabel(\"ActedIn\").sourceLabel(\"Person\").targetLabel(\n",
|
||||
" \"Movie\"\n",
|
||||
").ifNotExist().create()"
|
||||
]
|
||||
},
|
||||
{
|
||||
@@ -124,7 +130,9 @@
|
||||
"\n",
|
||||
"g.addEdge(\"ActedIn\", \"1:Al Pacino\", \"2:The Godfather\", {})\n",
|
||||
"g.addEdge(\"ActedIn\", \"1:Al Pacino\", \"2:The Godfather Part II\", {})\n",
|
||||
"g.addEdge(\"ActedIn\", \"1:Al Pacino\", \"2:The Godfather Coda The Death of Michael Corleone\", {})\n",
|
||||
"g.addEdge(\n",
|
||||
" \"ActedIn\", \"1:Al Pacino\", \"2:The Godfather Coda The Death of Michael Corleone\", {}\n",
|
||||
")\n",
|
||||
"g.addEdge(\"ActedIn\", \"1:Robert De Niro\", \"2:The Godfather Part II\", {})"
|
||||
]
|
||||
},
|
||||
@@ -164,7 +172,7 @@
|
||||
" password=\"admin\",\n",
|
||||
" address=\"localhost\",\n",
|
||||
" port=8080,\n",
|
||||
" graph=\"hugegraph\"\n",
|
||||
" graph=\"hugegraph\",\n",
|
||||
")"
|
||||
]
|
||||
},
|
||||
@@ -228,9 +236,7 @@
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"chain = HugeGraphQAChain.from_llm(\n",
|
||||
" ChatOpenAI(temperature=0), graph=graph, verbose=True\n",
|
||||
")"
|
||||
"chain = HugeGraphQAChain.from_llm(ChatOpenAI(temperature=0), graph=graph, verbose=True)"
|
||||
]
|
||||
},
|
||||
{
|
||||
|
||||
@@ -31,6 +31,7 @@
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"import kuzu\n",
|
||||
"\n",
|
||||
"db = kuzu.Database(\"test_db\")\n",
|
||||
"conn = kuzu.Connection(db)"
|
||||
]
|
||||
@@ -61,7 +62,9 @@
|
||||
],
|
||||
"source": [
|
||||
"conn.execute(\"CREATE NODE TABLE Movie (name STRING, PRIMARY KEY(name))\")\n",
|
||||
"conn.execute(\"CREATE NODE TABLE Person (name STRING, birthDate STRING, PRIMARY KEY(name))\")\n",
|
||||
"conn.execute(\n",
|
||||
" \"CREATE NODE TABLE Person (name STRING, birthDate STRING, PRIMARY KEY(name))\"\n",
|
||||
")\n",
|
||||
"conn.execute(\"CREATE REL TABLE ActedIn (FROM Person TO Movie)\")"
|
||||
]
|
||||
},
|
||||
@@ -94,11 +97,21 @@
|
||||
"conn.execute(\"CREATE (:Person {name: 'Robert De Niro', birthDate: '1943-08-17'})\")\n",
|
||||
"conn.execute(\"CREATE (:Movie {name: 'The Godfather'})\")\n",
|
||||
"conn.execute(\"CREATE (:Movie {name: 'The Godfather: Part II'})\")\n",
|
||||
"conn.execute(\"CREATE (:Movie {name: 'The Godfather Coda: The Death of Michael Corleone'})\")\n",
|
||||
"conn.execute(\"MATCH (p:Person), (m:Movie) WHERE p.name = 'Al Pacino' AND m.name = 'The Godfather' CREATE (p)-[:ActedIn]->(m)\")\n",
|
||||
"conn.execute(\"MATCH (p:Person), (m:Movie) WHERE p.name = 'Al Pacino' AND m.name = 'The Godfather: Part II' CREATE (p)-[:ActedIn]->(m)\")\n",
|
||||
"conn.execute(\"MATCH (p:Person), (m:Movie) WHERE p.name = 'Al Pacino' AND m.name = 'The Godfather Coda: The Death of Michael Corleone' CREATE (p)-[:ActedIn]->(m)\")\n",
|
||||
"conn.execute(\"MATCH (p:Person), (m:Movie) WHERE p.name = 'Robert De Niro' AND m.name = 'The Godfather: Part II' CREATE (p)-[:ActedIn]->(m)\")"
|
||||
"conn.execute(\n",
|
||||
" \"CREATE (:Movie {name: 'The Godfather Coda: The Death of Michael Corleone'})\"\n",
|
||||
")\n",
|
||||
"conn.execute(\n",
|
||||
" \"MATCH (p:Person), (m:Movie) WHERE p.name = 'Al Pacino' AND m.name = 'The Godfather' CREATE (p)-[:ActedIn]->(m)\"\n",
|
||||
")\n",
|
||||
"conn.execute(\n",
|
||||
" \"MATCH (p:Person), (m:Movie) WHERE p.name = 'Al Pacino' AND m.name = 'The Godfather: Part II' CREATE (p)-[:ActedIn]->(m)\"\n",
|
||||
")\n",
|
||||
"conn.execute(\n",
|
||||
" \"MATCH (p:Person), (m:Movie) WHERE p.name = 'Al Pacino' AND m.name = 'The Godfather Coda: The Death of Michael Corleone' CREATE (p)-[:ActedIn]->(m)\"\n",
|
||||
")\n",
|
||||
"conn.execute(\n",
|
||||
" \"MATCH (p:Person), (m:Movie) WHERE p.name = 'Robert De Niro' AND m.name = 'The Godfather: Part II' CREATE (p)-[:ActedIn]->(m)\"\n",
|
||||
")"
|
||||
]
|
||||
},
|
||||
{
|
||||
@@ -137,9 +150,7 @@
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"chain = KuzuQAChain.from_llm(\n",
|
||||
" ChatOpenAI(temperature=0), graph=graph, verbose=True\n",
|
||||
")"
|
||||
"chain = KuzuQAChain.from_llm(ChatOpenAI(temperature=0), graph=graph, verbose=True)"
|
||||
]
|
||||
},
|
||||
{
|
||||
|
||||
@@ -60,7 +60,8 @@
|
||||
],
|
||||
"metadata": {
|
||||
"collapsed": false
|
||||
}
|
||||
},
|
||||
"id": "7af596b5"
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
@@ -150,20 +151,20 @@
|
||||
"text": [
|
||||
"\n",
|
||||
"\n",
|
||||
"\u001B[1m> Entering new GraphSparqlQAChain chain...\u001B[0m\n",
|
||||
"\u001b[1m> Entering new GraphSparqlQAChain chain...\u001b[0m\n",
|
||||
"Identified intent:\n",
|
||||
"\u001B[32;1m\u001B[1;3mSELECT\u001B[0m\n",
|
||||
"\u001b[32;1m\u001b[1;3mSELECT\u001b[0m\n",
|
||||
"Generated SPARQL:\n",
|
||||
"\u001B[32;1m\u001B[1;3mPREFIX foaf: <http://xmlns.com/foaf/0.1/>\n",
|
||||
"\u001b[32;1m\u001b[1;3mPREFIX foaf: <http://xmlns.com/foaf/0.1/>\n",
|
||||
"SELECT ?homepage\n",
|
||||
"WHERE {\n",
|
||||
" ?person foaf:name \"Tim Berners-Lee\" .\n",
|
||||
" ?person foaf:workplaceHomepage ?homepage .\n",
|
||||
"}\u001B[0m\n",
|
||||
"}\u001b[0m\n",
|
||||
"Full Context:\n",
|
||||
"\u001B[32;1m\u001B[1;3m[]\u001B[0m\n",
|
||||
"\u001b[32;1m\u001b[1;3m[]\u001b[0m\n",
|
||||
"\n",
|
||||
"\u001B[1m> Finished chain.\u001B[0m\n"
|
||||
"\u001b[1m> Finished chain.\u001b[0m\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
@@ -207,19 +208,19 @@
|
||||
"text": [
|
||||
"\n",
|
||||
"\n",
|
||||
"\u001B[1m> Entering new GraphSparqlQAChain chain...\u001B[0m\n",
|
||||
"\u001b[1m> Entering new GraphSparqlQAChain chain...\u001b[0m\n",
|
||||
"Identified intent:\n",
|
||||
"\u001B[32;1m\u001B[1;3mUPDATE\u001B[0m\n",
|
||||
"\u001b[32;1m\u001b[1;3mUPDATE\u001b[0m\n",
|
||||
"Generated SPARQL:\n",
|
||||
"\u001B[32;1m\u001B[1;3mPREFIX foaf: <http://xmlns.com/foaf/0.1/>\n",
|
||||
"\u001b[32;1m\u001b[1;3mPREFIX foaf: <http://xmlns.com/foaf/0.1/>\n",
|
||||
"INSERT {\n",
|
||||
" ?person foaf:workplaceHomepage <http://www.w3.org/foo/bar/> .\n",
|
||||
"}\n",
|
||||
"WHERE {\n",
|
||||
" ?person foaf:name \"Timothy Berners-Lee\" .\n",
|
||||
"}\u001B[0m\n",
|
||||
"}\u001b[0m\n",
|
||||
"\n",
|
||||
"\u001B[1m> Finished chain.\u001B[0m\n"
|
||||
"\u001b[1m> Finished chain.\u001b[0m\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
@@ -234,7 +235,9 @@
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"chain.run(\"Save that the person with the name 'Timothy Berners-Lee' has a work homepage at 'http://www.w3.org/foo/bar/'\")"
|
||||
"chain.run(\n",
|
||||
" \"Save that the person with the name 'Timothy Berners-Lee' has a work homepage at 'http://www.w3.org/foo/bar/'\"\n",
|
||||
")"
|
||||
]
|
||||
},
|
||||
{
|
||||
@@ -297,4 +300,4 @@
|
||||
},
|
||||
"nbformat": 4,
|
||||
"nbformat_minor": 5
|
||||
}
|
||||
}
|
||||
160
docs/extras/modules/chains/additional/llm_symbolic_math.ipynb
Normal file
@@ -0,0 +1,160 @@
|
||||
{
|
||||
"cells": [
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# LLM Symbolic Math \n",
|
||||
"This notebook shows how to use LLMs and Python to solve symbolic math problems such as limits, integrals, and algebraic equations."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Calculating the limit of an expression"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 18,
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"\n",
|
||||
"\n",
|
||||
"\u001b[1m> Entering new LLMSymbolicMathChain chain...\u001b[0m\n",
|
||||
"What is the limit of sin(x) / x as x goes to 0?\u001b[32;1m\u001b[1;3mAnswer: 1\u001b[0m\n",
|
||||
"\u001b[1m> Finished chain.\u001b[0m\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"'Answer: 1'"
|
||||
]
|
||||
},
|
||||
"execution_count": 18,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"from langchain.llms import OpenAI\n",
|
||||
"from langchain.chains.llm_symbolic_math.base import LLMSymbolicMathChain\n",
|
||||
"\n",
|
||||
"llm = OpenAI(temperature=0)\n",
|
||||
"llm_symbolic_math = LLMSymbolicMathChain.from_llm(llm, verbose=True)\n",
|
||||
"\n",
|
||||
"llm_symbolic_math.run(\"What is the limit of sin(x) / x as x goes to 0?\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Calculating an integral"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 19,
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"\n",
|
||||
"\n",
|
||||
"\u001b[1m> Entering new LLMSymbolicMathChain chain...\u001b[0m\n",
|
||||
"What is the integral of e^-x from 0 to infinity?\u001b[32;1m\u001b[1;3mAnswer: 1\u001b[0m\n",
|
||||
"\u001b[1m> Finished chain.\u001b[0m\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"'Answer: 1'"
|
||||
]
|
||||
},
|
||||
"execution_count": 19,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"llm_symbolic_math.run(\"What is the integral of e^-x from 0 to infinity?\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Solving an algebraic equation"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 20,
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"\n",
|
||||
"\n",
|
||||
"\u001b[1m> Entering new LLMSymbolicMathChain chain...\u001b[0m\n",
|
||||
"What are the solutions to this equation x**2 - x?\u001b[32;1m\u001b[1;3mAnswer: 0 and 1.\u001b[0m\n",
|
||||
"\u001b[1m> Finished chain.\u001b[0m\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"'Answer: 0 and 1.'"
|
||||
]
|
||||
},
|
||||
"execution_count": 20,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"llm_symbolic_math.run(\"What are the solutions to this equation x**2 - x?\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": []
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
"kernelspec": {
|
||||
"display_name": "Python 3 (ipykernel)",
|
||||
"language": "python",
|
||||
"name": "python3"
|
||||
},
|
||||
"language_info": {
|
||||
"codemirror_mode": {
|
||||
"name": "ipython",
|
||||
"version": 3
|
||||
},
|
||||
"file_extension": ".py",
|
||||
"mimetype": "text/x-python",
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.11.4"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
"nbformat_minor": 2
|
||||
}
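Reviewer note: the three answers recorded in the notebook outputs above (limit 1, integral 1, solutions 0 and 1) can be sanity-checked without an LLM. The following is a minimal numerical sketch using only the standard library; it is not part of `LLMSymbolicMathChain`, and the helper names (`limit_sinc`, `integral_exp_neg`, `roots_x2_minus_x`) are hypothetical names chosen for illustration.

```python
import math


def limit_sinc(h: float = 1e-6) -> float:
    """Approximate lim x->0 of sin(x)/x by evaluating very close to 0."""
    return math.sin(h) / h


def integral_exp_neg(upper: float = 50.0, steps: int = 200_000) -> float:
    """Approximate the integral of e^-x over [0, infinity) with the trapezoid rule.

    The infinite upper bound is truncated at `upper`; the neglected tail is
    e^-upper, which is negligible for upper = 50.
    """
    dx = upper / steps
    total = 0.5 * (math.exp(0.0) + math.exp(-upper))
    for i in range(1, steps):
        total += math.exp(-i * dx)
    return total * dx


def roots_x2_minus_x() -> list:
    """Solve x**2 - x = 0 with the quadratic formula (a=1, b=-1, c=0)."""
    a, b, c = 1.0, -1.0, 0.0
    disc = math.sqrt(b * b - 4 * a * c)
    return sorted([(-b - disc) / (2 * a), (-b + disc) / (2 * a)])


print(limit_sinc())        # very close to 1
print(integral_exp_neg())  # very close to 1
print(roots_x2_minus_x())  # [0.0, 1.0]
```

A numerical check like this only confirms the values; it does not replace the symbolic derivation the chain is meant to produce.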
|
||||
@@ -38,7 +38,7 @@
|
||||
"text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)\n",
|
||||
"texts = text_splitter.split_documents(documents)\n",
|
||||
"for i, text in enumerate(texts):\n",
|
||||
" text.metadata['source'] = f\"{i}-pl\"\n",
|
||||
" text.metadata[\"source\"] = f\"{i}-pl\"\n",
|
||||
"embeddings = OpenAIEmbeddings()\n",
|
||||
"docsearch = Chroma.from_documents(texts, embeddings)"
|
||||
]
|
||||
@@ -97,8 +97,8 @@
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"final_qa_chain = StuffDocumentsChain(\n",
|
||||
" llm_chain=qa_chain, \n",
|
||||
" document_variable_name='context',\n",
|
||||
" llm_chain=qa_chain,\n",
|
||||
" document_variable_name=\"context\",\n",
|
||||
" document_prompt=doc_prompt,\n",
|
||||
")"
|
||||
]
|
||||
@@ -111,8 +111,7 @@
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"retrieval_qa = RetrievalQA(\n",
|
||||
" retriever=docsearch.as_retriever(),\n",
|
||||
" combine_documents_chain=final_qa_chain\n",
|
||||
" retriever=docsearch.as_retriever(), combine_documents_chain=final_qa_chain\n",
|
||||
")"
|
||||
]
|
||||
},
|
||||
@@ -175,8 +174,8 @@
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"final_qa_chain_pydantic = StuffDocumentsChain(\n",
|
||||
" llm_chain=qa_chain_pydantic, \n",
|
||||
" document_variable_name='context',\n",
|
||||
" llm_chain=qa_chain_pydantic,\n",
|
||||
" document_variable_name=\"context\",\n",
|
||||
" document_prompt=doc_prompt,\n",
|
||||
")"
|
||||
]
|
||||
@@ -189,8 +188,7 @@
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"retrieval_qa_pydantic = RetrievalQA(\n",
|
||||
" retriever=docsearch.as_retriever(),\n",
|
||||
" combine_documents_chain=final_qa_chain_pydantic\n",
|
||||
" retriever=docsearch.as_retriever(), combine_documents_chain=final_qa_chain_pydantic\n",
|
||||
")"
|
||||
]
|
||||
},
|
||||
@@ -235,6 +233,7 @@
|
||||
"from langchain.chains import ConversationalRetrievalChain\n",
|
||||
"from langchain.memory import ConversationBufferMemory\n",
|
||||
"from langchain.chains import LLMChain\n",
|
||||
"\n",
|
||||
"memory = ConversationBufferMemory(memory_key=\"chat_history\", return_messages=True)\n",
|
||||
"_template = \"\"\"Given the following conversation and a follow up question, rephrase the follow up question to be a standalone question, in its original language.\\\n",
|
||||
"Make sure to avoid using any unclear pronouns.\n",
|
||||
@@ -258,10 +257,10 @@
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"qa = ConversationalRetrievalChain(\n",
|
||||
" question_generator=condense_question_chain, \n",
|
||||
" question_generator=condense_question_chain,\n",
|
||||
" retriever=docsearch.as_retriever(),\n",
|
||||
" memory=memory, \n",
|
||||
" combine_docs_chain=final_qa_chain\n",
|
||||
" memory=memory,\n",
|
||||
" combine_docs_chain=final_qa_chain,\n",
|
||||
")"
|
||||
]
|
||||
},
|
||||
@@ -389,7 +388,9 @@
|
||||
" \"\"\"An answer to the question being asked, with sources.\"\"\"\n",
|
||||
"\n",
|
||||
" answer: str = Field(..., description=\"Answer to the question that was asked\")\n",
|
||||
" countries_referenced: List[str] = Field(..., description=\"All of the countries mentioned in the sources\")\n",
|
||||
" countries_referenced: List[str] = Field(\n",
|
||||
" ..., description=\"All of the countries mentioned in the sources\"\n",
|
||||
" )\n",
|
||||
" sources: List[str] = Field(\n",
|
||||
" ..., description=\"List of sources used to answer the question\"\n",
|
||||
" )\n",
|
||||
@@ -405,20 +406,23 @@
|
||||
" HumanMessage(content=\"Answer question using the following context\"),\n",
|
||||
" HumanMessagePromptTemplate.from_template(\"{context}\"),\n",
|
||||
" HumanMessagePromptTemplate.from_template(\"Question: {question}\"),\n",
|
||||
" HumanMessage(content=\"Tips: Make sure to answer in the correct format. Return all of the countries mentioned in the sources in uppercase characters.\"),\n",
|
||||
" HumanMessage(\n",
|
||||
" content=\"Tips: Make sure to answer in the correct format. Return all of the countries mentioned in the sources in uppercase characters.\"\n",
|
||||
" ),\n",
|
||||
"]\n",
|
||||
"\n",
|
||||
"chain_prompt = ChatPromptTemplate(messages=prompt_messages)\n",
|
||||
"\n",
|
||||
"qa_chain_pydantic = create_qa_with_structure_chain(llm, CustomResponseSchema, output_parser=\"pydantic\", prompt=chain_prompt)\n",
|
||||
"qa_chain_pydantic = create_qa_with_structure_chain(\n",
|
||||
" llm, CustomResponseSchema, output_parser=\"pydantic\", prompt=chain_prompt\n",
|
||||
")\n",
|
||||
"final_qa_chain_pydantic = StuffDocumentsChain(\n",
|
||||
" llm_chain=qa_chain_pydantic,\n",
|
||||
" document_variable_name='context',\n",
|
||||
" document_variable_name=\"context\",\n",
|
||||
" document_prompt=doc_prompt,\n",
|
||||
")\n",
|
||||
"retrieval_qa_pydantic = RetrievalQA(\n",
|
||||
" retriever=docsearch.as_retriever(),\n",
|
||||
" combine_documents_chain=final_qa_chain_pydantic\n",
|
||||
" retriever=docsearch.as_retriever(), combine_documents_chain=final_qa_chain_pydantic\n",
|
||||
")\n",
|
||||
"query = \"What did he say about russia\"\n",
|
||||
"retrieval_qa_pydantic.run(query)"
|
||||
|
||||
@@ -35,7 +35,9 @@
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"chain = get_openapi_chain(\"https://www.klarna.com/us/shopping/public/openai/v0/api-docs/\")"
|
||||
"chain = get_openapi_chain(\n",
|
||||
" \"https://www.klarna.com/us/shopping/public/openai/v0/api-docs/\"\n",
|
||||
")"
|
||||
]
|
||||
},
|
||||
{
|
||||
@@ -186,7 +188,9 @@
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"chain = get_openapi_chain(\"https://gist.githubusercontent.com/roaldnefs/053e505b2b7a807290908fe9aa3e1f00/raw/0a212622ebfef501163f91e23803552411ed00e4/openapi.yaml\")"
|
||||
"chain = get_openapi_chain(\n",
|
||||
" \"https://gist.githubusercontent.com/roaldnefs/053e505b2b7a807290908fe9aa3e1f00/raw/0a212622ebfef501163f91e23803552411ed00e4/openapi.yaml\"\n",
|
||||
")"
|
||||
]
|
||||
},
|
||||
{
|
||||
|
||||
@@ -28,7 +28,7 @@
|
||||
"\n",
|
||||
"from pydantic import Extra\n",
|
||||
"\n",
|
||||
"from langchain.base_language import BaseLanguageModel\n",
|
||||
"from langchain.schema.language_model import BaseLanguageModel\n",
|
||||
"from langchain.callbacks.manager import (\n",
|
||||
" AsyncCallbackManagerForChainRun,\n",
|
||||
" CallbackManagerForChainRun,\n",
|
||||
|
||||
@@ -22,7 +22,8 @@
|
||||
"from typing import Optional\n",
|
||||
"\n",
|
||||
"from langchain.chains.openai_functions import (\n",
|
||||
" create_openai_fn_chain, create_structured_output_chain\n",
|
||||
" create_openai_fn_chain,\n",
|
||||
" create_structured_output_chain,\n",
|
||||
")\n",
|
||||
"from langchain.chat_models import ChatOpenAI\n",
|
||||
"from langchain.prompts import ChatPromptTemplate, HumanMessagePromptTemplate\n",
|
||||
@@ -58,8 +59,10 @@
|
||||
"source": [
|
||||
"from pydantic import BaseModel, Field\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"class Person(BaseModel):\n",
|
||||
" \"\"\"Identifying information about a person.\"\"\"\n",
|
||||
"\n",
|
||||
" name: str = Field(..., description=\"The person's name\")\n",
|
||||
" age: int = Field(..., description=\"The person's age\")\n",
|
||||
" fav_food: Optional[str] = Field(None, description=\"The person's favorite food\")"
|
||||
@@ -103,13 +106,15 @@
|
||||
"llm = ChatOpenAI(model=\"gpt-3.5-turbo-0613\", temperature=0)\n",
|
||||
"\n",
|
||||
"prompt_msgs = [\n",
|
||||
" SystemMessage(\n",
|
||||
" content=\"You are a world class algorithm for extracting information in structured formats.\"\n",
|
||||
" ),\n",
|
||||
" HumanMessage(content=\"Use the given format to extract information from the following input:\"),\n",
|
||||
" HumanMessagePromptTemplate.from_template(\"{input}\"),\n",
|
||||
" HumanMessage(content=\"Tips: Make sure to answer in the correct format\"),\n",
|
||||
" ]\n",
|
||||
" SystemMessage(\n",
|
||||
" content=\"You are a world class algorithm for extracting information in structured formats.\"\n",
|
||||
" ),\n",
|
||||
" HumanMessage(\n",
|
||||
" content=\"Use the given format to extract information from the following input:\"\n",
|
||||
" ),\n",
|
||||
" HumanMessagePromptTemplate.from_template(\"{input}\"),\n",
|
||||
" HumanMessage(content=\"Tips: Make sure to answer in the correct format\"),\n",
|
||||
"]\n",
|
||||
"prompt = ChatPromptTemplate(messages=prompt_msgs)\n",
|
||||
"\n",
|
||||
"chain = create_structured_output_chain(Person, llm, prompt, verbose=True)\n",
|
||||
@@ -162,12 +167,17 @@
|
||||
"source": [
|
||||
"from typing import Sequence\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"class People(BaseModel):\n",
|
||||
" \"\"\"Identifying information about all people in a text.\"\"\"\n",
|
||||
"\n",
|
||||
" people: Sequence[Person] = Field(..., description=\"The people in the text\")\n",
|
||||
" \n",
|
||||
"\n",
|
||||
"\n",
|
||||
"chain = create_structured_output_chain(People, llm, prompt, verbose=True)\n",
|
||||
"chain.run(\"Sally is 13, Joey just turned 12 and loves spinach. Caroline is 10 years older than Sally, so she's 23.\")"
|
||||
"chain.run(\n",
|
||||
" \"Sally is 13, Joey just turned 12 and loves spinach. Caroline is 10 years older than Sally, so she's 23.\"\n",
|
||||
")"
|
||||
]
|
||||
},
|
||||
{
|
||||
@@ -192,27 +202,16 @@
|
||||
" \"description\": \"Identifying information about a person.\",\n",
|
||||
" \"type\": \"object\",\n",
|
||||
" \"properties\": {\n",
|
||||
" \"name\": {\n",
|
||||
" \"title\": \"Name\",\n",
|
||||
" \"description\": \"The person's name\",\n",
|
||||
" \"type\": \"string\"\n",
|
||||
" },\n",
|
||||
" \"age\": {\n",
|
||||
" \"title\": \"Age\",\n",
|
||||
" \"description\": \"The person's age\",\n",
|
||||
" \"type\": \"integer\"\n",
|
||||
" },\n",
|
||||
" \"fav_food\": {\n",
|
||||
" \"title\": \"Fav Food\",\n",
|
||||
" \"description\": \"The person's favorite food\",\n",
|
||||
" \"type\": \"string\"\n",
|
||||
" }\n",
|
||||
" \"name\": {\"title\": \"Name\", \"description\": \"The person's name\", \"type\": \"string\"},\n",
|
||||
" \"age\": {\"title\": \"Age\", \"description\": \"The person's age\", \"type\": \"integer\"},\n",
|
||||
" \"fav_food\": {\n",
|
||||
" \"title\": \"Fav Food\",\n",
|
||||
" \"description\": \"The person's favorite food\",\n",
|
||||
" \"type\": \"string\",\n",
|
||||
" },\n",
|
||||
" },\n",
|
||||
" \"required\": [\n",
|
||||
" \"name\",\n",
|
||||
" \"age\"\n",
|
||||
" ]\n",
|
||||
"}\n"
|
||||
" \"required\": [\"name\", \"age\"],\n",
|
||||
"}"
|
||||
]
|
||||
},
|
||||
{
|
||||
@@ -286,13 +285,15 @@
|
||||
"source": [
|
||||
"class RecordPerson(BaseModel):\n",
|
||||
"    \"\"\"Record some identifying information about a person.\"\"\"\n",
|
||||
"\n",
|
||||
" name: str = Field(..., description=\"The person's name\")\n",
|
||||
" age: int = Field(..., description=\"The person's age\")\n",
|
||||
" fav_food: Optional[str] = Field(None, description=\"The person's favorite food\")\n",
|
||||
"\n",
|
||||
" \n",
|
||||
"\n",
|
||||
"class RecordDog(BaseModel):\n",
|
||||
" \"\"\"Record some identifying information about a dog.\"\"\"\n",
|
||||
"\n",
|
||||
" name: str = Field(..., description=\"The dog's name\")\n",
|
||||
" color: str = Field(..., description=\"The dog's color\")\n",
|
||||
" fav_food: Optional[str] = Field(None, description=\"The dog's favorite food\")"
|
||||
@@ -333,10 +334,10 @@
|
||||
],
|
||||
"source": [
|
||||
"prompt_msgs = [\n",
|
||||
" SystemMessage(\n",
|
||||
" content=\"You are a world class algorithm for recording entities\"\n",
|
||||
" SystemMessage(content=\"You are a world class algorithm for recording entities\"),\n",
|
||||
" HumanMessage(\n",
|
||||
" content=\"Make calls to the relevant function to record the entities in the following input:\"\n",
|
||||
" ),\n",
|
||||
" HumanMessage(content=\"Make calls to the relevant function to record the entities in the following input:\"),\n",
|
||||
" HumanMessagePromptTemplate.from_template(\"{input}\"),\n",
|
||||
" HumanMessage(content=\"Tips: Make sure to answer in the correct format\"),\n",
|
||||
"]\n",
|
||||
@@ -393,11 +394,16 @@
|
||||
"source": [
|
||||
"class OptionalFavFood(BaseModel):\n",
|
||||
" \"\"\"Either a food or null.\"\"\"\n",
|
||||
" food: Optional[str] = Field(None, description=\"Either the name of a food or null. Should be null if the food isn't known.\")\n",
|
||||
"\n",
|
||||
" food: Optional[str] = Field(\n",
|
||||
" None,\n",
|
||||
" description=\"Either the name of a food or null. Should be null if the food isn't known.\",\n",
|
||||
" )\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"def record_person(name: str, age: int, fav_food: OptionalFavFood) -> str:\n",
|
||||
" \"\"\"Record some basic identifying information about a person.\n",
|
||||
" \n",
|
||||
"\n",
|
||||
" Args:\n",
|
||||
" name: The person's name.\n",
|
||||
" age: The person's age in years.\n",
|
||||
@@ -405,9 +411,11 @@
|
||||
" \"\"\"\n",
|
||||
" return f\"Recording person {name} of age {age} with favorite food {fav_food.food}!\"\n",
|
||||
"\n",
|
||||
" \n",
|
||||
"\n",
|
||||
"chain = create_openai_fn_chain([record_person], llm, prompt, verbose=True)\n",
|
||||
"chain.run(\"The most important thing to remember about Tommy, my 12 year old, is that he'll do anything for apple pie.\")"
|
||||
"chain.run(\n",
|
||||
" \"The most important thing to remember about Tommy, my 12 year old, is that he'll do anything for apple pie.\"\n",
|
||||
")"
|
||||
]
|
||||
},
|
||||
{
|
||||
@@ -458,7 +466,7 @@
|
||||
"source": [
|
||||
"def record_dog(name: str, color: str, fav_food: OptionalFavFood) -> str:\n",
|
||||
" \"\"\"Record some basic identifying information about a dog.\n",
|
||||
" \n",
|
||||
"\n",
|
||||
" Args:\n",
|
||||
" name: The dog's name.\n",
|
||||
" color: The dog's color.\n",
|
||||
@@ -468,7 +476,9 @@
|
||||
"\n",
|
||||
"\n",
|
||||
"chain = create_openai_fn_chain([record_person, record_dog], llm, prompt, verbose=True)\n",
|
||||
"chain.run(\"I can't find my dog Henry anywhere, he's a small brown beagle. Could you send a message about him?\")"
|
||||
"chain.run(\n",
|
||||
" \"I can't find my dog Henry anywhere, he's a small brown beagle. Could you send a message about him?\"\n",
|
||||
")"
|
||||
]
|
||||
},
|
||||
{
|
||||
|
||||
@@ -77,7 +77,9 @@
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"loader = BraveSearchLoader(query=\"obama middle name\", api_key=api_key, search_kwargs={\"count\": 3})\n",
|
||||
"loader = BraveSearchLoader(\n",
|
||||
" query=\"obama middle name\", api_key=api_key, search_kwargs={\"count\": 3}\n",
|
||||
")\n",
|
||||
"docs = loader.load()\n",
|
||||
"len(docs)"
|
||||
]
|
||||
|
||||
@@ -0,0 +1,81 @@
|
||||
{
|
||||
"cells": [
|
||||
{
|
||||
"attachments": {},
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# Browserless"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 1,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langchain.document_loaders import BrowserlessLoader"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"BROWSERLESS_API_TOKEN = \"YOUR_API_TOKEN\""
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 2,
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"<!DOCTYPE html><html class=\"client-js vector-feature-language-in-header-enabled vector-feature-language-in-main-page-header-disabled vector-feature-sticky-header-disabled vector-feature-page-tools-pinned-disabled vector-feature-toc-pinned-enabled vector-feature-main-menu-pinned-disabled vector-feature-limited-width-enabled vector-feature-limited-width-content-enabled vector-feature-zebra-design-disabled\" lang=\"en\" dir=\"ltr\"><head>\n",
|
||||
"<meta charset=\"UTF-8\">\n",
|
||||
"<title>Document classification - Wikipedia</title>\n",
|
||||
"<script>document.documentElement.className=\"client-js vector-feature-language-in-header-enabled vector-feature-language-in-main-page-header-disabled vector-feature-sticky-header-disabled vector-feature-page-tools-pinned-disabled vector-feature-toc-pinned-enabled vector-feature-main-menu-pinned-disabled vector-feature-limited-width-enabled vector-feature-limited-width-content-enabled vector-feature-zebra-design-disabled\";(function(){var cookie=document.cookie.match(/(?:^|; )enwikimwclien\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"loader = BrowserlessLoader(\n",
|
||||
" api_token=BROWSERLESS_API_TOKEN,\n",
|
||||
" urls=[\n",
|
||||
" \"https://en.wikipedia.org/wiki/Document_classification\",\n",
|
||||
" ],\n",
|
||||
")\n",
|
||||
"\n",
|
||||
"documents = loader.load()\n",
|
||||
"\n",
|
||||
"print(documents[0].page_content[:1000])"
|
||||
]
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
"kernelspec": {
|
||||
"display_name": "venv",
|
||||
"language": "python",
|
||||
"name": "python3"
|
||||
},
|
||||
"language_info": {
|
||||
"codemirror_mode": {
|
||||
"name": "ipython",
|
||||
"version": 3
|
||||
},
|
||||
"file_extension": ".py",
|
||||
"mimetype": "text/x-python",
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.11.1"
|
||||
},
|
||||
"orig_nbformat": 4
|
||||
},
|
||||
"nbformat": 4,
|
||||
"nbformat_minor": 2
|
||||
}
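Reviewer note: the Browserless notebook above prints the first 1000 characters of `page_content`, which is raw HTML. As a hedged illustration of one possible post-processing step, the sketch below pulls the `<title>` text out of such a string using only the standard library's `html.parser`. The `TitleExtractor` class and `extract_title` function are hypothetical helpers, not part of `BrowserlessLoader`, and the sample string is a shortened copy of the output shown in the notebook.

```python
from html.parser import HTMLParser


class TitleExtractor(HTMLParser):
    """Collect the text found between <title> and </title>."""

    def __init__(self) -> None:
        super().__init__()
        self._in_title = False
        self.title = ""

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        # Only keep text that appears while we are inside the <title> element.
        if self._in_title:
            self.title += data


def extract_title(html: str) -> str:
    """Return the stripped <title> text, or an empty string if none is found."""
    parser = TitleExtractor()
    parser.feed(html)
    parser.close()
    return parser.title.strip()


# Shortened copy of the HTML printed in the notebook output.
sample = (
    '<!DOCTYPE html><html lang="en" dir="ltr"><head>\n'
    '<meta charset="UTF-8">\n'
    "<title>Document classification - Wikipedia</title>\n"
)
print(extract_title(sample))  # Document classification - Wikipedia
```

With real loader output, the same function could be called as `extract_title(documents[0].page_content)`, assuming the page content is an HTML string as in the output above.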
|
||||
@@ -0,0 +1,96 @@
|
||||
{
|
||||
"cells": [
|
||||
{
|
||||
"attachments": {},
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# Datadog Logs\n",
|
||||
"\n",
|
||||
">[Datadog](https://www.datadoghq.com/) is a monitoring and analytics platform for cloud-scale applications.\n",
|
||||
"\n",
|
||||
"This loader fetches the logs from your applications in Datadog using the `datadog_api_client` Python package. You must initialize the loader with your `Datadog API key` and `APP key`, and you need to pass in the query to extract the desired logs."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langchain.document_loaders import DatadogLogsLoader"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"#!pip install datadog-api-client"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"query = \"service:agent status:error\"\n",
|
||||
"\n",
|
||||
"loader = DatadogLogsLoader(\n",
|
||||
" query=query,\n",
|
||||
" api_key=DD_API_KEY,\n",
|
||||
" app_key=DD_APP_KEY,\n",
|
||||
" from_time=1688732708951, # Optional, timestamp in milliseconds\n",
|
||||
" to_time=1688736308951, # Optional, timestamp in milliseconds\n",
|
||||
" limit=100, # Optional, default is 100\n",
|
||||
")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 11,
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"[Document(page_content='message: grep: /etc/datadog-agent/system-probe.yaml: No such file or directory', metadata={'id': 'AgAAAYkwpLImvkjRpQAAAAAAAAAYAAAAAEFZa3dwTUFsQUFEWmZfLU5QdElnM3dBWQAAACQAAAAAMDE4OTMwYTQtYzk3OS00MmJjLTlhNDAtOTY4N2EwY2I5ZDdk', 'status': 'error', 'service': 'agent', 'tags': ['accessible-from-goog-gke-node', 'allow-external-ingress-high-ports', 'allow-external-ingress-http', 'allow-external-ingress-https', 'container_id:c7d8ecd27b5b3cfdf3b0df04b8965af6f233f56b7c3c2ffabfab5e3b6ccbd6a5', 'container_name:lab_datadog_1', 'datadog.pipelines:false', 'datadog.submission_auth:private_api_key', 'docker_image:datadog/agent:7.41.1', 'env:dd101-dev', 'hostname:lab-host', 'image_name:datadog/agent', 'image_tag:7.41.1', 'instance-id:7497601202021312403', 'instance-type:custom-1-4096', 'instruqt_aws_accounts:', 'instruqt_azure_subscriptions:', 'instruqt_gcp_projects:', 'internal-hostname:lab-host.d4rjybavkary.svc.cluster.local', 'numeric_project_id:3390740675', 'p-d4rjybavkary', 'project:instruqt-prod', 'service:agent', 'short_image:agent', 'source:agent', 'zone:europe-west1-b'], 'timestamp': datetime.datetime(2023, 7, 7, 13, 57, 27, 206000, tzinfo=tzutc())}),\n",
|
||||
" Document(page_content='message: grep: /etc/datadog-agent/system-probe.yaml: No such file or directory', metadata={'id': 'AgAAAYkwpLImvkjRpgAAAAAAAAAYAAAAAEFZa3dwTUFsQUFEWmZfLU5QdElnM3dBWgAAACQAAAAAMDE4OTMwYTQtYzk3OS00MmJjLTlhNDAtOTY4N2EwY2I5ZDdk', 'status': 'error', 'service': 'agent', 'tags': ['accessible-from-goog-gke-node', 'allow-external-ingress-high-ports', 'allow-external-ingress-http', 'allow-external-ingress-https', 'container_id:c7d8ecd27b5b3cfdf3b0df04b8965af6f233f56b7c3c2ffabfab5e3b6ccbd6a5', 'container_name:lab_datadog_1', 'datadog.pipelines:false', 'datadog.submission_auth:private_api_key', 'docker_image:datadog/agent:7.41.1', 'env:dd101-dev', 'hostname:lab-host', 'image_name:datadog/agent', 'image_tag:7.41.1', 'instance-id:7497601202021312403', 'instance-type:custom-1-4096', 'instruqt_aws_accounts:', 'instruqt_azure_subscriptions:', 'instruqt_gcp_projects:', 'internal-hostname:lab-host.d4rjybavkary.svc.cluster.local', 'numeric_project_id:3390740675', 'p-d4rjybavkary', 'project:instruqt-prod', 'service:agent', 'short_image:agent', 'source:agent', 'zone:europe-west1-b'], 'timestamp': datetime.datetime(2023, 7, 7, 13, 57, 27, 206000, tzinfo=tzutc())})]"
|
||||
]
|
||||
},
|
||||
"execution_count": 11,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"documents = loader.load()\n",
|
||||
"documents"
|
||||
]
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
"kernelspec": {
|
||||
"display_name": ".venv",
|
||||
"language": "python",
|
||||
"name": "python3"
|
||||
},
|
||||
"language_info": {
|
||||
"codemirror_mode": {
|
||||
"name": "ipython",
|
||||
"version": 3
|
||||
},
|
||||
"file_extension": ".py",
|
||||
"mimetype": "text/x-python",
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.9.11"
|
||||
},
|
||||
"orig_nbformat": 4
|
||||
},
|
||||
"nbformat": 4,
|
||||
"nbformat_minor": 2
|
||||
}
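Reviewer note: the Datadog notebook above passes `from_time` and `to_time` as hard-coded integers, commented as timestamps in milliseconds. A minimal sketch of how such values could be derived from `datetime` objects follows. It assumes the loader expects UTC epoch milliseconds, which is an inference from the cell comments rather than something this diff confirms; `to_epoch_millis` is a hypothetical helper, not part of `DatadogLogsLoader`.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_epoch_millis(moment: datetime) -> int:
    """Convert a timezone-aware datetime to whole milliseconds since the Unix epoch."""
    if moment.tzinfo is None:
        raise ValueError("expected a timezone-aware datetime")
    # Floor-dividing two timedeltas yields an exact int, avoiding float rounding.
    return (moment - EPOCH) // timedelta(milliseconds=1)


# A one-hour window that reproduces the two literals used in the notebook cell.
window_start = datetime(2023, 7, 7, 12, 25, 8, 951000, tzinfo=timezone.utc)
window_end = window_start + timedelta(hours=1)

from_time = to_epoch_millis(window_start)
to_time = to_epoch_millis(window_end)
print(from_time, to_time)  # 1688732708951 1688736308951
```

Deriving the values this way makes the query window readable (here, one hour on 2023-07-07 UTC) instead of leaving it as opaque integers.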
|
||||
@@ -0,0 +1,5 @@
|
||||
Stanley Cups
|
||||
Team Location Stanley Cups
|
||||
Blues STL 1
|
||||
Flyers PHI 2
|
||||
Maple Leafs TOR 13
|
||||
|
@@ -1840,7 +1840,7 @@ This category contains articles that are incomplete and are tagged with the {{T|
|
||||
<username>FANDOM</username>
|
||||
<id>32769624</id>
|
||||
</contributor>
|
||||
<comment>Created page with "{{LicenseBox|text=''This work is licensed under the [https://opensource.org/licenses/MIT MIT License].''}}{{#ifeq: {{NAMESPACENUMBER}} | 0 | <includeonly>Category:MIT licens..."</comment>
|
||||
<comment>Created page with "{{LicenseBox|text=''This work is licensed under the [https://opensource.org/licenses/MIT MIT License].''}}{{#ifeq: {{NAMESPACENUMBER}} | 0 | <includeonly>Category:MIT license..."</comment>
|
||||
<origin>104</origin>
|
||||
<model>wikitext</model>
|
||||
<format>text/x-wiki</format>
|
||||
|
||||
@@ -126,11 +126,11 @@
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"file_id=\"1x9WBtFPWMEAdjcJzPScRsjpjQvpSo_kz\"\n",
|
||||
"file_id = \"1x9WBtFPWMEAdjcJzPScRsjpjQvpSo_kz\"\n",
|
||||
"loader = GoogleDriveLoader(\n",
|
||||
" file_ids=[file_id],\n",
|
||||
" file_loader_cls=UnstructuredFileIOLoader,\n",
|
||||
" file_loader_kwargs={\"mode\": \"elements\"}\n",
|
||||
" file_loader_kwargs={\"mode\": \"elements\"},\n",
|
||||
")"
|
||||
]
|
||||
},
|
||||
@@ -180,11 +180,11 @@
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"folder_id=\"1asMOHY1BqBS84JcRbOag5LOJac74gpmD\"\n",
|
||||
"folder_id = \"1asMOHY1BqBS84JcRbOag5LOJac74gpmD\"\n",
|
||||
"loader = GoogleDriveLoader(\n",
|
||||
" folder_id=folder_id,\n",
|
||||
" file_loader_cls=UnstructuredFileIOLoader,\n",
|
||||
" file_loader_kwargs={\"mode\": \"elements\"}\n",
|
||||
" file_loader_kwargs={\"mode\": \"elements\"},\n",
|
||||
")"
|
||||
]
|
||||
},
|
||||
|
||||
@@ -101,7 +101,7 @@
|
||||
" \"../Papers/\",\n",
|
||||
" glob=\"*\",\n",
|
||||
" suffixes=[\".pdf\"],\n",
|
||||
" parser= GrobidParser(segment_sentences=False)\n",
|
||||
" parser=GrobidParser(segment_sentences=False),\n",
|
||||
")\n",
|
||||
"docs = loader.load()"
|
||||
]
|
||||
|
||||
@@ -18,7 +18,10 @@
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langchain.document_loaders import WebBaseLoader\n",
|
||||
"loader_web = WebBaseLoader(\"https://github.com/basecamp/handbook/blob/master/37signals-is-you.md\")"
|
||||
"\n",
|
||||
"loader_web = WebBaseLoader(\n",
|
||||
" \"https://github.com/basecamp/handbook/blob/master/37signals-is-you.md\"\n",
|
||||
")"
|
||||
]
|
||||
},
|
||||
{
|
||||
@@ -29,6 +32,7 @@
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langchain.document_loaders import PyPDFLoader\n",
|
||||
"\n",
|
||||
"loader_pdf = PyPDFLoader(\"../MachineLearning-Lecture01.pdf\")"
|
||||
]
|
||||
},
|
||||
@@ -40,7 +44,8 @@
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langchain.document_loaders.merge import MergedDataLoader\n",
|
||||
"loader_all=MergedDataLoader(loaders=[loader_web,loader_pdf])"
|
||||
"\n",
|
||||
"loader_all = MergedDataLoader(loaders=[loader_web, loader_pdf])"
|
||||
]
|
||||
},
|
||||
{
|
||||
@@ -50,7 +55,7 @@
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"docs_all=loader_all.load()"
|
||||
"docs_all = loader_all.load()"
|
||||
]
|
||||
},
|
||||
{
|
||||
|
||||
@@ -36,7 +36,9 @@
|
||||
],
|
||||
"source": [
|
||||
"# Create a new loader object for the MHTML file\n",
|
||||
"loader = MHTMLLoader(file_path='../../../../../../tests/integration_tests/examples/example.mht')\n",
|
||||
"loader = MHTMLLoader(\n",
|
||||
" file_path=\"../../../../../../tests/integration_tests/examples/example.mht\"\n",
|
||||
")\n",
|
||||
"\n",
|
||||
"# Load the document from the file\n",
|
||||
"documents = loader.load()\n",
|
||||
|
||||
@@ -53,11 +53,9 @@
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"dataset = \"vw6y-z8j6\" # 311 data\n",
|
||||
"dataset = \"tmnf-yvry\" # crime data\n",
|
||||
"loader = OpenCityDataLoader(city_id=\"data.sfgov.org\",\n",
|
||||
" dataset_id=dataset,\n",
|
||||
" limit=2000)"
|
||||
"dataset = \"vw6y-z8j6\" # 311 data\n",
|
||||
"dataset = \"tmnf-yvry\" # crime data\n",
|
||||
"loader = OpenCityDataLoader(city_id=\"data.sfgov.org\", dataset_id=dataset, limit=2000)"
|
||||
]
|
||||
},
|
||||
{
|
||||
|
||||
@@ -33,9 +33,7 @@
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"loader = UnstructuredOrgModeLoader(\n",
|
||||
" file_path=\"example_data/README.org\", mode=\"elements\"\n",
|
||||
")\n",
|
||||
"loader = UnstructuredOrgModeLoader(file_path=\"example_data/README.org\", mode=\"elements\")\n",
|
||||
"docs = loader.load()"
|
||||
]
|
||||
},
|
||||
|
||||
@@ -239,7 +239,7 @@
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"# Use lazy load for larger table, which won't read the full table into memory \n",
|
||||
"# Use lazy load for larger table, which won't read the full table into memory\n",
|
||||
"for i in loader.lazy_load():\n",
|
||||
" print(i)"
|
||||
]
|
||||
|
||||
@@ -1,7 +1,6 @@
|
||||
{
|
||||
"cells": [
|
||||
{
|
||||
"attachments": {},
|
||||
"cell_type": "markdown",
|
||||
"id": "5a7cc773",
|
||||
"metadata": {},
|
||||
@@ -25,7 +24,7 @@
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 2,
|
||||
"execution_count": 1,
|
||||
"id": "2e3532b2",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
@@ -34,7 +33,6 @@
|
||||
]
|
||||
},
|
||||
{
|
||||
"attachments": {},
|
||||
"cell_type": "markdown",
|
||||
"id": "6384c057",
|
||||
"metadata": {},
|
||||
@@ -44,19 +42,19 @@
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 3,
|
||||
"execution_count": 2,
|
||||
"id": "d69e5620",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"url = 'https://js.langchain.com/docs/modules/memory/examples/'\n",
|
||||
"loader=RecursiveUrlLoader(url=url)\n",
|
||||
"docs=loader.load()"
|
||||
"url = \"https://js.langchain.com/docs/modules/memory/examples/\"\n",
|
||||
"loader = RecursiveUrlLoader(url=url)\n",
|
||||
"docs = loader.load()"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 4,
|
||||
"execution_count": 3,
|
||||
"id": "084fb2ce",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
@@ -66,7 +64,7 @@
|
||||
"12"
|
||||
]
|
||||
},
|
||||
"execution_count": 4,
|
||||
"execution_count": 3,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
@@ -77,17 +75,17 @@
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 5,
|
||||
"execution_count": 4,
|
||||
"id": "89355b7c",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"'\\n\\n\\n\\n\\nDynamoDB-Backed Chat Memory | \\uf8ffü¶úÔ∏è\\uf8ffüîó Lan'"
|
||||
"'\\n\\n\\n\\n\\nBuffer Window Memory | 🦜️🔗 Langchain\\n\\n\\n\\n\\n\\nSki'"
|
||||
]
|
||||
},
|
||||
"execution_count": 5,
|
||||
"execution_count": 4,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
@@ -98,20 +96,20 @@
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 6,
|
||||
"execution_count": 5,
|
||||
"id": "13bd7e16",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"{'source': 'https://js.langchain.com/docs/modules/memory/examples/dynamodb',\n",
|
||||
" 'title': 'DynamoDB-Backed Chat Memory | \\uf8ffü¶úÔ∏è\\uf8ffüîó Langchain',\n",
|
||||
" 'description': 'For longer-term persistence across chat sessions, you can swap out the default in-memory chatHistory that backs chat memory classes like BufferMemory for a DynamoDB instance.',\n",
|
||||
"{'source': 'https://js.langchain.com/docs/modules/memory/examples/buffer_window_memory',\n",
|
||||
" 'title': 'Buffer Window Memory | 🦜️🔗 Langchain',\n",
|
||||
" 'description': 'BufferWindowMemory keeps track of the back-and-forths in conversation, and then uses a window of size k to surface the last k back-and-forths to use as memory.',\n",
|
||||
" 'language': 'en'}"
|
||||
]
|
||||
},
|
||||
"execution_count": 6,
|
||||
"execution_count": 5,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
@@ -121,14 +119,29 @@
|
||||
]
|
||||
},
|
||||
{
|
||||
"attachments": {},
|
||||
"cell_type": "markdown",
|
||||
"id": "40fc13ef",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Now, let's try a more extensive example, the `docs` root dir.\n",
|
||||
"\n",
|
||||
"We will skip everything under `api`."
|
||||
"We will skip everything under `api`.\n",
|
||||
"\n",
|
||||
"For this, we can `lazy_load` each page as we crawl the tree, using `WebBaseLoader` to load each as we go."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "5c938b9f",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"url = \"https://js.langchain.com/docs/\"\n",
|
||||
"exclude_dirs = [\"https://js.langchain.com/docs/api/\"]\n",
|
||||
"loader = RecursiveUrlLoader(url=url, exclude_dirs=exclude_dirs)\n",
|
||||
"# Lazy load each\n",
|
||||
"docs = [print(doc) or doc for doc in loader.lazy_load()]"
|
||||
]
|
||||
},
|
||||
{
|
||||
@@ -138,22 +151,22 @@
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"url = 'https://js.langchain.com/docs/'\n",
|
||||
"exclude_dirs=['https://js.langchain.com/docs/api/']\n",
|
||||
"loader=RecursiveUrlLoader(url=url,exclude_dirs=exclude_dirs)\n",
|
||||
"docs=loader.load()"
|
||||
"# Load all pages\n",
|
||||
"docs = loader.load()"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 8,
|
||||
"id": "457e30f3",
|
||||
"metadata": {},
|
||||
"metadata": {
|
||||
"scrolled": true
|
||||
},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"176"
|
||||
"188"
|
||||
]
|
||||
},
|
||||
"execution_count": 8,
|
||||
@@ -174,7 +187,7 @@
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"'\\n\\n\\n\\n\\nHacker News | \\uf8ffü¶úÔ∏è\\uf8ffüîó Langchain\\n\\n\\n\\n\\n\\nSkip'"
|
||||
"'\\n\\n\\n\\n\\nAgent Simulations | 🦜️🔗 Langchain\\n\\n\\n\\n\\n\\nSkip t'"
|
||||
]
|
||||
},
|
||||
"execution_count": 9,
|
||||
@@ -195,9 +208,9 @@
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"{'source': 'https://js.langchain.com/docs/modules/indexes/document_loaders/examples/web_loaders/hn',\n",
|
||||
" 'title': 'Hacker News | \\uf8ffü¶úÔ∏è\\uf8ffüîó Langchain',\n",
|
||||
" 'description': 'This example goes over how to load data from the hacker news website, using Cheerio. One document will be created for each page.',\n",
|
||||
"{'source': 'https://js.langchain.com/docs/use_cases/agent_simulations/',\n",
|
||||
" 'title': 'Agent Simulations | 🦜️🔗 Langchain',\n",
|
||||
" 'description': 'Agent simulations involve taking multiple agents and having them interact with each other.',\n",
|
||||
" 'language': 'en'}"
|
||||
]
|
||||
},
|
||||
|
||||
@@ -33,9 +33,7 @@
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"loader = UnstructuredRSTLoader(\n",
|
||||
" file_path=\"example_data/README.rst\", mode=\"elements\"\n",
|
||||
")\n",
|
||||
"loader = UnstructuredRSTLoader(file_path=\"example_data/README.rst\", mode=\"elements\")\n",
|
||||
"docs = loader.load()"
|
||||
]
|
||||
},
|
||||
|
||||
@@ -30,7 +30,8 @@
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"import warnings\n",
|
||||
"warnings.filterwarnings('ignore')\n",
|
||||
"\n",
|
||||
"warnings.filterwarnings(\"ignore\")\n",
|
||||
"from pprint import pprint\n",
|
||||
"from langchain.text_splitter import Language\n",
|
||||
"from langchain.document_loaders.generic import GenericLoader\n",
|
||||
@@ -48,7 +49,7 @@
|
||||
" \"./example_data/source_code\",\n",
|
||||
" glob=\"*\",\n",
|
||||
" suffixes=[\".py\", \".js\"],\n",
|
||||
" parser=LanguageParser()\n",
|
||||
" parser=LanguageParser(),\n",
|
||||
")\n",
|
||||
"docs = loader.load()"
|
||||
]
|
||||
@@ -200,7 +201,7 @@
|
||||
" \"./example_data/source_code\",\n",
|
||||
" glob=\"*\",\n",
|
||||
" suffixes=[\".py\"],\n",
|
||||
" parser=LanguageParser(language=Language.PYTHON, parser_threshold=1000)\n",
|
||||
" parser=LanguageParser(language=Language.PYTHON, parser_threshold=1000),\n",
|
||||
")\n",
|
||||
"docs = loader.load()"
|
||||
]
|
||||
@@ -281,7 +282,7 @@
|
||||
" \"./example_data/source_code\",\n",
|
||||
" glob=\"*\",\n",
|
||||
" suffixes=[\".js\"],\n",
|
||||
" parser=LanguageParser(language=Language.JS)\n",
|
||||
" parser=LanguageParser(language=Language.JS),\n",
|
||||
")\n",
|
||||
"docs = loader.load()"
|
||||
]
|
||||
|
||||
@@ -43,10 +43,10 @@
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"conf = CosConfig(\n",
|
||||
" Region=\"your cos region\",\n",
|
||||
" SecretId=\"your cos secret_id\",\n",
|
||||
" SecretKey=\"your cos secret_key\",\n",
|
||||
" )\n",
|
||||
" Region=\"your cos region\",\n",
|
||||
" SecretId=\"your cos secret_id\",\n",
|
||||
" SecretKey=\"your cos secret_key\",\n",
|
||||
")\n",
|
||||
"loader = TencentCOSDirectoryLoader(conf=conf, bucket=\"your_cos_bucket\")"
|
||||
]
|
||||
},
|
||||
|
||||
@@ -43,10 +43,10 @@
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"conf = CosConfig(\n",
|
||||
" Region=\"your cos region\",\n",
|
||||
" SecretId=\"your cos secret_id\",\n",
|
||||
" SecretKey=\"your cos secret_key\",\n",
|
||||
" )\n",
|
||||
" Region=\"your cos region\",\n",
|
||||
" SecretId=\"your cos secret_id\",\n",
|
||||
" SecretKey=\"your cos secret_key\",\n",
|
||||
")\n",
|
||||
"loader = TencentCOSFileLoader(conf=conf, bucket=\"your_cos_bucket\", key=\"fake.docx\")"
|
||||
]
|
||||
},
|
||||
|
||||
@@ -0,0 +1,181 @@
|
||||
{
|
||||
"cells": [
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# TSV\n",
|
||||
"\n",
|
||||
">A [tab-separated values (TSV)](https://en.wikipedia.org/wiki/Tab-separated_values) file is a simple, text-based file format for storing tabular data. Records are separated by newlines, and values within a record are separated by tab characters."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## `UnstructuredTSVLoader`\n",
|
||||
"\n",
|
||||
"You can also load the table using the `UnstructuredTSVLoader`. One advantage of using `UnstructuredTSVLoader` is that if you use it in `\"elements\"` mode, an HTML representation of the table will be available in the metadata."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 1,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langchain.document_loaders.tsv import UnstructuredTSVLoader"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 2,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"loader = UnstructuredTSVLoader(\n",
|
||||
" file_path=\"example_data/mlb_teams_2012.csv\", mode=\"elements\"\n",
|
||||
")\n",
|
||||
"docs = loader.load()"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 3,
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"<table border=\"1\" class=\"dataframe\">\n",
|
||||
" <tbody>\n",
|
||||
" <tr>\n",
|
||||
" <td>Nationals, 81.34, 98</td>\n",
|
||||
" </tr>\n",
|
||||
" <tr>\n",
|
||||
" <td>Reds, 82.20, 97</td>\n",
|
||||
" </tr>\n",
|
||||
" <tr>\n",
|
||||
" <td>Yankees, 197.96, 95</td>\n",
|
||||
" </tr>\n",
|
||||
" <tr>\n",
|
||||
" <td>Giants, 117.62, 94</td>\n",
|
||||
" </tr>\n",
|
||||
" <tr>\n",
|
||||
" <td>Braves, 83.31, 94</td>\n",
|
||||
" </tr>\n",
|
||||
" <tr>\n",
|
||||
" <td>Athletics, 55.37, 94</td>\n",
|
||||
" </tr>\n",
|
||||
" <tr>\n",
|
||||
" <td>Rangers, 120.51, 93</td>\n",
|
||||
" </tr>\n",
|
||||
" <tr>\n",
|
||||
" <td>Orioles, 81.43, 93</td>\n",
|
||||
" </tr>\n",
|
||||
" <tr>\n",
|
||||
" <td>Rays, 64.17, 90</td>\n",
|
||||
" </tr>\n",
|
||||
" <tr>\n",
|
||||
" <td>Angels, 154.49, 89</td>\n",
|
||||
" </tr>\n",
|
||||
" <tr>\n",
|
||||
" <td>Tigers, 132.30, 88</td>\n",
|
||||
" </tr>\n",
|
||||
" <tr>\n",
|
||||
" <td>Cardinals, 110.30, 88</td>\n",
|
||||
" </tr>\n",
|
||||
" <tr>\n",
|
||||
" <td>Dodgers, 95.14, 86</td>\n",
|
||||
" </tr>\n",
|
||||
" <tr>\n",
|
||||
" <td>White Sox, 96.92, 85</td>\n",
|
||||
" </tr>\n",
|
||||
" <tr>\n",
|
||||
" <td>Brewers, 97.65, 83</td>\n",
|
||||
" </tr>\n",
|
||||
" <tr>\n",
|
||||
" <td>Phillies, 174.54, 81</td>\n",
|
||||
" </tr>\n",
|
||||
" <tr>\n",
|
||||
" <td>Diamondbacks, 74.28, 81</td>\n",
|
||||
" </tr>\n",
|
||||
" <tr>\n",
|
||||
" <td>Pirates, 63.43, 79</td>\n",
|
||||
" </tr>\n",
|
||||
" <tr>\n",
|
||||
" <td>Padres, 55.24, 76</td>\n",
|
||||
" </tr>\n",
|
||||
" <tr>\n",
|
||||
" <td>Mariners, 81.97, 75</td>\n",
|
||||
" </tr>\n",
|
||||
" <tr>\n",
|
||||
" <td>Mets, 93.35, 74</td>\n",
|
||||
" </tr>\n",
|
||||
" <tr>\n",
|
||||
" <td>Blue Jays, 75.48, 73</td>\n",
|
||||
" </tr>\n",
|
||||
" <tr>\n",
|
||||
" <td>Royals, 60.91, 72</td>\n",
|
||||
" </tr>\n",
|
||||
" <tr>\n",
|
||||
" <td>Marlins, 118.07, 69</td>\n",
|
||||
" </tr>\n",
|
||||
" <tr>\n",
|
||||
" <td>Red Sox, 173.18, 69</td>\n",
|
||||
" </tr>\n",
|
||||
" <tr>\n",
|
||||
" <td>Indians, 78.43, 68</td>\n",
|
||||
" </tr>\n",
|
||||
" <tr>\n",
|
||||
" <td>Twins, 94.08, 66</td>\n",
|
||||
" </tr>\n",
|
||||
" <tr>\n",
|
||||
" <td>Rockies, 78.06, 64</td>\n",
|
||||
" </tr>\n",
|
||||
" <tr>\n",
|
||||
" <td>Cubs, 88.19, 61</td>\n",
|
||||
" </tr>\n",
|
||||
" <tr>\n",
|
||||
" <td>Astros, 60.65, 55</td>\n",
|
||||
" </tr>\n",
|
||||
" </tbody>\n",
|
||||
"</table>\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"print(docs[0].metadata[\"text_as_html\"])"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": []
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
"kernelspec": {
|
||||
"display_name": "Python 3 (ipykernel)",
|
||||
"language": "python",
|
||||
"name": "python3"
|
||||
},
|
||||
"language_info": {
|
||||
"codemirror_mode": {
|
||||
"name": "ipython",
|
||||
"version": 3
|
||||
},
|
||||
"file_extension": ".py",
|
||||
"mimetype": "text/x-python",
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.8.13"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
"nbformat_minor": 4
|
||||
}
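Note on the TSV notebook above: the format it describes (records separated by newlines, values separated by tabs) can also be read with Python's standard `csv` module, independently of `UnstructuredTSVLoader`. The following is a minimal sketch under that assumption; `parse_tsv` is a hypothetical helper, not part of LangChain, and the two sample rows only echo team names and win counts from the notebook output.

```python
import csv
import io


def parse_tsv(text):
    """Parse tab-separated text into a list of dicts keyed by the header row."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    return list(reader)


# Illustrative sample: a header row followed by two tab-separated records.
sample = "Team\tWins\nNationals\t98\nReds\t97\n"
rows = parse_tsv(sample)
print(rows[0]["Team"], rows[0]["Wins"])
```

All values come back as strings, so numeric columns would need an explicit conversion before use.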
|
||||
@@ -233,7 +233,8 @@
|
||||
],
|
||||
"metadata": {
|
||||
"collapsed": false
|
||||
}
|
||||
},
|
||||
"id": "672264ad"
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
@@ -241,16 +242,18 @@
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"loader = WebBaseLoader(\n",
|
||||
" \"https://www.walmart.com/search?q=parrots\", proxies={\n",
|
||||
" \"https://www.walmart.com/search?q=parrots\",\n",
|
||||
" proxies={\n",
|
||||
" \"http\": \"http://{username}:{password}@proxy.service.com:6666/\",\n",
|
||||
" \"https\": \"https://{username}:{password}@proxy.service.com:6666/\"\n",
|
||||
" }\n",
|
||||
" \"https\": \"https://{username}:{password}@proxy.service.com:6666/\",\n",
|
||||
" },\n",
|
||||
")\n",
|
||||
"docs = loader.load()\n"
|
||||
"docs = loader.load()"
|
||||
],
|
||||
"metadata": {
|
||||
"collapsed": false
|
||||
}
|
||||
},
|
||||
"id": "9caf0310"
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
@@ -274,4 +277,4 @@
|
||||
},
|
||||
"nbformat": 4,
|
||||
"nbformat_minor": 5
|
||||
}
|
||||
}
|
||||
@@ -0,0 +1,304 @@
|
||||
{
|
||||
"cells": [
|
||||
{
|
||||
"attachments": {},
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# Xorbits Pandas DataFrame\n",
|
||||
"\n",
|
||||
"This notebook goes over how to load data from a [xorbits.pandas](https://doc.xorbits.io/en/latest/reference/pandas/frame.html) DataFrame."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 1,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"#!pip install xorbits"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 2,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"import xorbits.pandas as pd"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 3,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"df = pd.read_csv(\"example_data/mlb_teams_2012.csv\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 9,
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"application/vnd.jupyter.widget-view+json": {
|
||||
"model_id": "b0d1d84e23c04f1296f63b3ea3dd1e5b",
|
||||
"version_major": 2,
|
||||
"version_minor": 0
|
||||
},
|
||||
"text/plain": [
|
||||
" 0%| | 0.00/100 [00:00<?, ?it/s]"
|
||||
]
|
||||
},
|
||||
"metadata": {},
|
||||
"output_type": "display_data"
|
||||
},
|
||||
{
|
||||
"data": {
|
||||
"text/html": [
|
||||
"<div>\n",
|
||||
"<style scoped>\n",
|
||||
" .dataframe tbody tr th:only-of-type {\n",
|
||||
" vertical-align: middle;\n",
|
||||
" }\n",
|
||||
"\n",
|
||||
" .dataframe tbody tr th {\n",
|
||||
" vertical-align: top;\n",
|
||||
" }\n",
|
||||
"\n",
|
||||
" .dataframe thead th {\n",
|
||||
" text-align: right;\n",
|
||||
" }\n",
|
||||
"</style>\n",
|
||||
"<table border=\"1\" class=\"dataframe\">\n",
|
||||
" <thead>\n",
|
||||
" <tr style=\"text-align: right;\">\n",
|
||||
" <th></th>\n",
|
||||
" <th>Team</th>\n",
|
||||
" <th>\"Payroll (millions)\"</th>\n",
|
||||
" <th>\"Wins\"</th>\n",
|
||||
" </tr>\n",
|
||||
" </thead>\n",
|
||||
" <tbody>\n",
|
||||
" <tr>\n",
|
||||
" <th>0</th>\n",
|
||||
" <td>Nationals</td>\n",
|
||||
" <td>81.34</td>\n",
|
||||
" <td>98</td>\n",
|
||||
" </tr>\n",
|
||||
" <tr>\n",
|
||||
" <th>1</th>\n",
|
||||
" <td>Reds</td>\n",
|
||||
" <td>82.20</td>\n",
|
||||
" <td>97</td>\n",
|
||||
" </tr>\n",
|
||||
" <tr>\n",
|
||||
" <th>2</th>\n",
|
||||
" <td>Yankees</td>\n",
|
||||
" <td>197.96</td>\n",
|
||||
" <td>95</td>\n",
|
||||
" </tr>\n",
|
||||
" <tr>\n",
|
||||
" <th>3</th>\n",
|
||||
" <td>Giants</td>\n",
|
||||
" <td>117.62</td>\n",
|
||||
" <td>94</td>\n",
|
||||
" </tr>\n",
|
||||
" <tr>\n",
|
||||
" <th>4</th>\n",
|
||||
" <td>Braves</td>\n",
|
||||
" <td>83.31</td>\n",
|
||||
" <td>94</td>\n",
|
||||
" </tr>\n",
|
||||
" </tbody>\n",
|
||||
"</table>\n",
|
||||
"</div>"
|
||||
],
|
||||
"text/plain": [
|
||||
" Team \"Payroll (millions)\" \"Wins\"\n",
|
||||
"0 Nationals 81.34 98\n",
|
||||
"1 Reds 82.20 97\n",
|
||||
"2 Yankees 197.96 95\n",
|
||||
"3 Giants 117.62 94\n",
|
||||
"4 Braves 83.31 94"
|
||||
]
|
||||
},
|
||||
"execution_count": 9,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"df.head()"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 5,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langchain.document_loaders import XorbitsLoader"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 6,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"loader = XorbitsLoader(df, page_content_column=\"Team\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 7,
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"application/vnd.jupyter.widget-view+json": {
|
||||
"model_id": "c8c8b67f1aae4a3c9de7734bb6cf738e",
|
||||
"version_major": 2,
|
||||
"version_minor": 0
|
||||
},
|
||||
"text/plain": [
|
||||
" 0%| | 0.00/100 [00:00<?, ?it/s]"
|
||||
]
|
||||
},
|
||||
"metadata": {},
|
||||
"output_type": "display_data"
|
||||
},
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"[Document(page_content='Nationals', metadata={' \"Payroll (millions)\"': 81.34, ' \"Wins\"': 98}),\n",
|
||||
" Document(page_content='Reds', metadata={' \"Payroll (millions)\"': 82.2, ' \"Wins\"': 97}),\n",
|
||||
" Document(page_content='Yankees', metadata={' \"Payroll (millions)\"': 197.96, ' \"Wins\"': 95}),\n",
|
||||
" Document(page_content='Giants', metadata={' \"Payroll (millions)\"': 117.62, ' \"Wins\"': 94}),\n",
|
||||
" Document(page_content='Braves', metadata={' \"Payroll (millions)\"': 83.31, ' \"Wins\"': 94}),\n",
|
||||
" Document(page_content='Athletics', metadata={' \"Payroll (millions)\"': 55.37, ' \"Wins\"': 94}),\n",
|
||||
" Document(page_content='Rangers', metadata={' \"Payroll (millions)\"': 120.51, ' \"Wins\"': 93}),\n",
|
||||
" Document(page_content='Orioles', metadata={' \"Payroll (millions)\"': 81.43, ' \"Wins\"': 93}),\n",
|
||||
" Document(page_content='Rays', metadata={' \"Payroll (millions)\"': 64.17, ' \"Wins\"': 90}),\n",
|
||||
" Document(page_content='Angels', metadata={' \"Payroll (millions)\"': 154.49, ' \"Wins\"': 89}),\n",
|
||||
" Document(page_content='Tigers', metadata={' \"Payroll (millions)\"': 132.3, ' \"Wins\"': 88}),\n",
|
||||
" Document(page_content='Cardinals', metadata={' \"Payroll (millions)\"': 110.3, ' \"Wins\"': 88}),\n",
|
||||
" Document(page_content='Dodgers', metadata={' \"Payroll (millions)\"': 95.14, ' \"Wins\"': 86}),\n",
|
||||
" Document(page_content='White Sox', metadata={' \"Payroll (millions)\"': 96.92, ' \"Wins\"': 85}),\n",
|
||||
" Document(page_content='Brewers', metadata={' \"Payroll (millions)\"': 97.65, ' \"Wins\"': 83}),\n",
|
||||
" Document(page_content='Phillies', metadata={' \"Payroll (millions)\"': 174.54, ' \"Wins\"': 81}),\n",
|
||||
" Document(page_content='Diamondbacks', metadata={' \"Payroll (millions)\"': 74.28, ' \"Wins\"': 81}),\n",
|
||||
" Document(page_content='Pirates', metadata={' \"Payroll (millions)\"': 63.43, ' \"Wins\"': 79}),\n",
|
||||
" Document(page_content='Padres', metadata={' \"Payroll (millions)\"': 55.24, ' \"Wins\"': 76}),\n",
|
||||
" Document(page_content='Mariners', metadata={' \"Payroll (millions)\"': 81.97, ' \"Wins\"': 75}),\n",
|
||||
" Document(page_content='Mets', metadata={' \"Payroll (millions)\"': 93.35, ' \"Wins\"': 74}),\n",
|
||||
" Document(page_content='Blue Jays', metadata={' \"Payroll (millions)\"': 75.48, ' \"Wins\"': 73}),\n",
|
||||
" Document(page_content='Royals', metadata={' \"Payroll (millions)\"': 60.91, ' \"Wins\"': 72}),\n",
|
||||
" Document(page_content='Marlins', metadata={' \"Payroll (millions)\"': 118.07, ' \"Wins\"': 69}),\n",
|
||||
" Document(page_content='Red Sox', metadata={' \"Payroll (millions)\"': 173.18, ' \"Wins\"': 69}),\n",
|
||||
" Document(page_content='Indians', metadata={' \"Payroll (millions)\"': 78.43, ' \"Wins\"': 68}),\n",
|
||||
" Document(page_content='Twins', metadata={' \"Payroll (millions)\"': 94.08, ' \"Wins\"': 66}),\n",
|
||||
" Document(page_content='Rockies', metadata={' \"Payroll (millions)\"': 78.06, ' \"Wins\"': 64}),\n",
|
||||
" Document(page_content='Cubs', metadata={' \"Payroll (millions)\"': 88.19, ' \"Wins\"': 61}),\n",
|
||||
" Document(page_content='Astros', metadata={' \"Payroll (millions)\"': 60.65, ' \"Wins\"': 55})]"
|
||||
]
|
||||
},
|
||||
"execution_count": 7,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"loader.load()"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 8,
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"application/vnd.jupyter.widget-view+json": {
|
||||
"model_id": "fc85c9f59b3644689d05853159fbd358",
|
||||
"version_major": 2,
|
||||
"version_minor": 0
|
||||
},
|
||||
"text/plain": [
|
||||
" 0%| | 0.00/100 [00:00<?, ?it/s]"
|
||||
]
|
||||
},
|
||||
"metadata": {},
|
||||
"output_type": "display_data"
|
||||
},
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"page_content='Nationals' metadata={' \"Payroll (millions)\"': 81.34, ' \"Wins\"': 98}\n",
|
||||
"page_content='Reds' metadata={' \"Payroll (millions)\"': 82.2, ' \"Wins\"': 97}\n",
|
||||
"page_content='Yankees' metadata={' \"Payroll (millions)\"': 197.96, ' \"Wins\"': 95}\n",
|
||||
"page_content='Giants' metadata={' \"Payroll (millions)\"': 117.62, ' \"Wins\"': 94}\n",
|
||||
"page_content='Braves' metadata={' \"Payroll (millions)\"': 83.31, ' \"Wins\"': 94}\n",
|
||||
"page_content='Athletics' metadata={' \"Payroll (millions)\"': 55.37, ' \"Wins\"': 94}\n",
|
||||
"page_content='Rangers' metadata={' \"Payroll (millions)\"': 120.51, ' \"Wins\"': 93}\n",
|
||||
"page_content='Orioles' metadata={' \"Payroll (millions)\"': 81.43, ' \"Wins\"': 93}\n",
|
||||
"page_content='Rays' metadata={' \"Payroll (millions)\"': 64.17, ' \"Wins\"': 90}\n",
|
||||
"page_content='Angels' metadata={' \"Payroll (millions)\"': 154.49, ' \"Wins\"': 89}\n",
|
||||
"page_content='Tigers' metadata={' \"Payroll (millions)\"': 132.3, ' \"Wins\"': 88}\n",
|
||||
"page_content='Cardinals' metadata={' \"Payroll (millions)\"': 110.3, ' \"Wins\"': 88}\n",
|
||||
"page_content='Dodgers' metadata={' \"Payroll (millions)\"': 95.14, ' \"Wins\"': 86}\n",
|
||||
"page_content='White Sox' metadata={' \"Payroll (millions)\"': 96.92, ' \"Wins\"': 85}\n",
|
||||
"page_content='Brewers' metadata={' \"Payroll (millions)\"': 97.65, ' \"Wins\"': 83}\n",
|
||||
"page_content='Phillies' metadata={' \"Payroll (millions)\"': 174.54, ' \"Wins\"': 81}\n",
|
||||
"page_content='Diamondbacks' metadata={' \"Payroll (millions)\"': 74.28, ' \"Wins\"': 81}\n",
|
||||
"page_content='Pirates' metadata={' \"Payroll (millions)\"': 63.43, ' \"Wins\"': 79}\n",
|
||||
"page_content='Padres' metadata={' \"Payroll (millions)\"': 55.24, ' \"Wins\"': 76}\n",
|
||||
"page_content='Mariners' metadata={' \"Payroll (millions)\"': 81.97, ' \"Wins\"': 75}\n",
|
||||
"page_content='Mets' metadata={' \"Payroll (millions)\"': 93.35, ' \"Wins\"': 74}\n",
|
||||
"page_content='Blue Jays' metadata={' \"Payroll (millions)\"': 75.48, ' \"Wins\"': 73}\n",
|
||||
"page_content='Royals' metadata={' \"Payroll (millions)\"': 60.91, ' \"Wins\"': 72}\n",
|
||||
"page_content='Marlins' metadata={' \"Payroll (millions)\"': 118.07, ' \"Wins\"': 69}\n",
|
||||
"page_content='Red Sox' metadata={' \"Payroll (millions)\"': 173.18, ' \"Wins\"': 69}\n",
|
||||
"page_content='Indians' metadata={' \"Payroll (millions)\"': 78.43, ' \"Wins\"': 68}\n",
|
||||
"page_content='Twins' metadata={' \"Payroll (millions)\"': 94.08, ' \"Wins\"': 66}\n",
|
||||
"page_content='Rockies' metadata={' \"Payroll (millions)\"': 78.06, ' \"Wins\"': 64}\n",
|
||||
"page_content='Cubs' metadata={' \"Payroll (millions)\"': 88.19, ' \"Wins\"': 61}\n",
|
||||
"page_content='Astros' metadata={' \"Payroll (millions)\"': 60.65, ' \"Wins\"': 55}\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"# Use lazy load for larger table, which won't read the full table into memory\n",
|
||||
"for i in loader.lazy_load():\n",
|
||||
" print(i)"
|
||||
]
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
"kernelspec": {
|
||||
"display_name": "base",
|
||||
"language": "python",
|
||||
"name": "python3"
|
||||
},
|
||||
"language_info": {
|
||||
"codemirror_mode": {
|
||||
"name": "ipython",
|
||||
"version": 3
|
||||
},
|
||||
"file_extension": ".py",
|
||||
"mimetype": "text/x-python",
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.9.13"
|
||||
},
|
||||
"orig_nbformat": 4
|
||||
},
|
||||
"nbformat": 4,
|
||||
"nbformat_minor": 2
|
||||
}
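Note on the last cell of the Xorbits notebook above: the difference between `load()` and `lazy_load()` can be illustrated without LangChain or xorbits. The sketch below is only an analogy. `ToyLoader` is a hypothetical class, not a LangChain API; it mimics the pattern where `lazy_load` is a generator that produces one record at a time and `load` collects the whole stream into a list.

```python
from typing import Iterator, List


class ToyLoader:
    """Hypothetical stand-in for a document loader; not a LangChain class."""

    def __init__(self, rows: List[str]) -> None:
        self.rows = rows
        self.yielded = 0  # counts how many records have been produced so far

    def lazy_load(self) -> Iterator[str]:
        # Generator: each record is produced only when the caller asks for it.
        for row in self.rows:
            self.yielded += 1
            yield row

    def load(self) -> List[str]:
        # Eager variant: drains the generator into a list held in memory.
        return list(self.lazy_load())


loader = ToyLoader(["Nationals", "Reds", "Yankees"])
first = next(loader.lazy_load())  # only one record has been produced here
produced_after_first = loader.yielded
everything = loader.load()  # all records are materialized at once
```

With a large table, iterating over the generator keeps only the current record in scope, whereas the eager call holds every record in the returned list.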
|
||||
@@ -0,0 +1 @@
|
||||
label: 'Integrations'
|
||||
@@ -0,0 +1,269 @@
|
||||
{
|
||||
"cells": [
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# Doctran Extract Properties\n",
|
||||
"\n",
|
||||
"We can extract useful features of documents using the [Doctran](https://github.com/psychic-api/doctran) library, which uses OpenAI's function calling feature to extract specific metadata.\n",
|
||||
"\n",
|
||||
"Extracting metadata from documents is helpful for a variety of tasks, including:\n",
|
||||
"* Classification: Classify documents into different categories\n",
|
||||
"* Data mining: Extract structured data that can be used for data analysis\n",
|
||||
"* Style transfer: Change the way text is written to more closely match expected user input, improving vector search results"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"! pip install doctran"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 1,
|
||||
"metadata": {
|
||||
"scrolled": false
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"import json\n",
|
||||
"from langchain.schema import Document\n",
|
||||
"from langchain.document_transformers import DoctranPropertyExtractor"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 2,
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"True"
|
||||
]
|
||||
},
|
||||
"execution_count": 2,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"from dotenv import load_dotenv\n",
|
||||
"\n",
|
||||
"load_dotenv()"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Input\n",
|
||||
"This is the document we'll extract properties from."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 3,
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"[Generated with ChatGPT]\n",
|
||||
"\n",
|
||||
"Confidential Document - For Internal Use Only\n",
|
||||
"\n",
|
||||
"Date: July 1, 2023\n",
|
||||
"\n",
|
||||
"Subject: Updates and Discussions on Various Topics\n",
|
||||
"\n",
|
||||
"Dear Team,\n",
|
||||
"\n",
|
||||
"I hope this email finds you well. In this document, I would like to provide you with some important updates and discuss various topics that require our attention. Please treat the information contained herein as highly confidential.\n",
|
||||
"\n",
|
||||
"Security and Privacy Measures\n",
|
||||
"As part of our ongoing commitment to ensure the security and privacy of our customers' data, we have implemented robust measures across all our systems. We would like to commend John Doe (email: john.doe@example.com) from the IT department for his diligent work in enhancing our network security. Moving forward, we kindly remind everyone to strictly adhere to our data protection policies and guidelines. Additionally, if you come across any potential security risks or incidents, please report them immediately to our dedicated team at security@example.com.\n",
|
||||
"\n",
|
||||
"HR Updates and Employee Benefits\n",
|
||||
"Recently, we welcomed several new team members who have made significant contributions to their respective departments. I would like to recognize Jane Smith (SSN: 049-45-5928) for her outstanding performance in customer service. Jane has consistently received positive feedback from our clients. Furthermore, please remember that the open enrollment period for our employee benefits program is fast approaching. Should you have any questions or require assistance, please contact our HR representative, Michael Johnson (phone: 418-492-3850, email: michael.johnson@example.com).\n",
|
||||
"\n",
|
||||
"Marketing Initiatives and Campaigns\n",
|
||||
"Our marketing team has been actively working on developing new strategies to increase brand awareness and drive customer engagement. We would like to thank Sarah Thompson (phone: 415-555-1234) for her exceptional efforts in managing our social media platforms. Sarah has successfully increased our follower base by 20% in the past month alone. Moreover, please mark your calendars for the upcoming product launch event on July 15th. We encourage all team members to attend and support this exciting milestone for our company.\n",
|
||||
"\n",
|
||||
"Research and Development Projects\n",
|
||||
"In our pursuit of innovation, our research and development department has been working tirelessly on various projects. I would like to acknowledge the exceptional work of David Rodriguez (email: david.rodriguez@example.com) in his role as project lead. David's contributions to the development of our cutting-edge technology have been instrumental. Furthermore, we would like to remind everyone to share their ideas and suggestions for potential new projects during our monthly R&D brainstorming session, scheduled for July 10th.\n",
|
||||
"\n",
|
||||
"Please treat the information in this document with utmost confidentiality and ensure that it is not shared with unauthorized individuals. If you have any questions or concerns regarding the topics discussed, please do not hesitate to reach out to me directly.\n",
|
||||
"\n",
|
||||
"Thank you for your attention, and let's continue to work together to achieve our goals.\n",
|
||||
"\n",
|
||||
"Best regards,\n",
|
||||
"\n",
|
||||
"Jason Fan\n",
|
||||
"Cofounder & CEO\n",
|
||||
"Psychic\n",
|
||||
"jason@psychic.dev\n",
|
||||
"\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"sample_text = \"\"\"[Generated with ChatGPT]\n",
|
||||
"\n",
|
||||
"Confidential Document - For Internal Use Only\n",
|
||||
"\n",
|
||||
"Date: July 1, 2023\n",
|
||||
"\n",
|
||||
"Subject: Updates and Discussions on Various Topics\n",
|
||||
"\n",
|
||||
"Dear Team,\n",
|
||||
"\n",
|
||||
"I hope this email finds you well. In this document, I would like to provide you with some important updates and discuss various topics that require our attention. Please treat the information contained herein as highly confidential.\n",
|
||||
"\n",
|
||||
"Security and Privacy Measures\n",
|
||||
"As part of our ongoing commitment to ensure the security and privacy of our customers' data, we have implemented robust measures across all our systems. We would like to commend John Doe (email: john.doe@example.com) from the IT department for his diligent work in enhancing our network security. Moving forward, we kindly remind everyone to strictly adhere to our data protection policies and guidelines. Additionally, if you come across any potential security risks or incidents, please report them immediately to our dedicated team at security@example.com.\n",
|
||||
"\n",
|
||||
"HR Updates and Employee Benefits\n",
|
||||
"Recently, we welcomed several new team members who have made significant contributions to their respective departments. I would like to recognize Jane Smith (SSN: 049-45-5928) for her outstanding performance in customer service. Jane has consistently received positive feedback from our clients. Furthermore, please remember that the open enrollment period for our employee benefits program is fast approaching. Should you have any questions or require assistance, please contact our HR representative, Michael Johnson (phone: 418-492-3850, email: michael.johnson@example.com).\n",
|
||||
"\n",
|
||||
"Marketing Initiatives and Campaigns\n",
|
||||
"Our marketing team has been actively working on developing new strategies to increase brand awareness and drive customer engagement. We would like to thank Sarah Thompson (phone: 415-555-1234) for her exceptional efforts in managing our social media platforms. Sarah has successfully increased our follower base by 20% in the past month alone. Moreover, please mark your calendars for the upcoming product launch event on July 15th. We encourage all team members to attend and support this exciting milestone for our company.\n",
|
||||
"\n",
|
||||
"Research and Development Projects\n",
|
||||
"In our pursuit of innovation, our research and development department has been working tirelessly on various projects. I would like to acknowledge the exceptional work of David Rodriguez (email: david.rodriguez@example.com) in his role as project lead. David's contributions to the development of our cutting-edge technology have been instrumental. Furthermore, we would like to remind everyone to share their ideas and suggestions for potential new projects during our monthly R&D brainstorming session, scheduled for July 10th.\n",
|
||||
"\n",
|
||||
"Please treat the information in this document with utmost confidentiality and ensure that it is not shared with unauthorized individuals. If you have any questions or concerns regarding the topics discussed, please do not hesitate to reach out to me directly.\n",
|
||||
"\n",
|
||||
"Thank you for your attention, and let's continue to work together to achieve our goals.\n",
|
||||
"\n",
|
||||
"Best regards,\n",
|
||||
"\n",
|
||||
"Jason Fan\n",
|
||||
"Cofounder & CEO\n",
|
||||
"Psychic\n",
|
||||
"jason@psychic.dev\n",
|
||||
"\"\"\"\n",
|
||||
"print(sample_text)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 4,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"documents = [Document(page_content=sample_text)]\n",
|
||||
"properties = [\n",
|
||||
" {\n",
|
||||
" \"name\": \"category\",\n",
|
||||
" \"description\": \"What type of email this is.\",\n",
|
||||
" \"type\": \"string\",\n",
|
||||
" \"enum\": [\"update\", \"action_item\", \"customer_feedback\", \"announcement\", \"other\"],\n",
|
||||
" \"required\": True,\n",
|
||||
" },\n",
|
||||
" {\n",
|
||||
" \"name\": \"mentions\",\n",
|
||||
" \"description\": \"A list of all people mentioned in this email.\",\n",
|
||||
" \"type\": \"array\",\n",
|
||||
" \"items\": {\n",
|
||||
" \"name\": \"full_name\",\n",
|
||||
" \"description\": \"The full name of the person mentioned.\",\n",
|
||||
" \"type\": \"string\",\n",
|
||||
" },\n",
|
||||
" \"required\": True,\n",
|
||||
" },\n",
|
||||
" {\n",
|
||||
" \"name\": \"eli5\",\n",
|
||||
" \"description\": \"Explain this email to me like I'm 5 years old.\",\n",
|
||||
" \"type\": \"string\",\n",
|
||||
" \"required\": True,\n",
|
||||
" },\n",
|
||||
"]\n",
|
||||
"property_extractor = DoctranPropertyExtractor(properties=properties)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Output\n",
|
||||
"After extracting properties from a document, the result will be returned as a new document with properties provided in the metadata"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 5,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"extracted_document = await property_extractor.atransform_documents(\n",
|
||||
" documents, properties=properties\n",
|
||||
")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 6,
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"{\n",
|
||||
" \"extracted_properties\": {\n",
|
||||
" \"category\": \"update\",\n",
|
||||
" \"mentions\": [\n",
|
||||
" \"John Doe\",\n",
|
||||
" \"Jane Smith\",\n",
|
||||
" \"Michael Johnson\",\n",
|
||||
" \"Sarah Thompson\",\n",
|
||||
" \"David Rodriguez\",\n",
|
||||
" \"Jason Fan\"\n",
|
||||
" ],\n",
|
||||
" \"eli5\": \"This is an email from the CEO, Jason Fan, giving updates about different areas in the company. He talks about new security measures and praises John Doe for his work. He also mentions new hires and praises Jane Smith for her work in customer service. The CEO reminds everyone about the upcoming benefits enrollment and says to contact Michael Johnson with any questions. He talks about the marketing team's work and praises Sarah Thompson for increasing their social media followers. There's also a product launch event on July 15th. Lastly, he talks about the research and development projects and praises David Rodriguez for his work. There's a brainstorming session on July 10th.\"\n",
|
||||
" }\n",
|
||||
"}\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"print(json.dumps(extracted_document[0].metadata, indent=2))"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": []
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
"kernelspec": {
|
||||
"display_name": "Python 3 (ipykernel)",
|
||||
"language": "python",
|
||||
"name": "python3"
|
||||
},
|
||||
"language_info": {
|
||||
"codemirror_mode": {
|
||||
"name": "ipython",
|
||||
"version": 3
|
||||
},
|
||||
"file_extension": ".py",
|
||||
"mimetype": "text/x-python",
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.11.3"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
"nbformat_minor": 2
|
||||
}
|
||||
@@ -0,0 +1,266 @@
|
||||
{
|
||||
"cells": [
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# Doctran Interrogate Documents\n",
|
||||
"Documents used in a vector store knowledge base are typically stored in narrative or conversational format. However, most user queries are in question format. If we convert documents into Q&A format before vectorizing them, we can increase the liklihood of retrieving relevant documents, and decrease the liklihood of retrieving irrelevant documents.\n",
|
||||
"\n",
|
||||
"We can accomplish this using the [Doctran](https://github.com/psychic-api/doctran) library, which uses OpenAI's function calling feature to \"interrogate\" documents.\n",
|
||||
"\n",
|
||||
"See [this notebook](https://github.com/psychic-api/doctran/blob/main/benchmark.ipynb) for benchmarks on vector similarity scores for various queries based on raw documents versus interrogated documents."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"! pip install doctran"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 1,
|
||||
"metadata": {
|
||||
"scrolled": false
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"import json\n",
|
||||
"from langchain.schema import Document\n",
|
||||
"from langchain.document_transformers import DoctranQATransformer"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 2,
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"True"
|
||||
]
|
||||
},
|
||||
"execution_count": 2,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"from dotenv import load_dotenv\n",
|
||||
"\n",
|
||||
"load_dotenv()"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Input\n",
|
||||
"This is the document we'll interrogate"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 3,
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"[Generated with ChatGPT]\n",
|
||||
"\n",
|
||||
"Confidential Document - For Internal Use Only\n",
|
||||
"\n",
|
||||
"Date: July 1, 2023\n",
|
||||
"\n",
|
||||
"Subject: Updates and Discussions on Various Topics\n",
|
||||
"\n",
|
||||
"Dear Team,\n",
|
||||
"\n",
|
||||
"I hope this email finds you well. In this document, I would like to provide you with some important updates and discuss various topics that require our attention. Please treat the information contained herein as highly confidential.\n",
|
||||
"\n",
|
||||
"Security and Privacy Measures\n",
|
||||
"As part of our ongoing commitment to ensure the security and privacy of our customers' data, we have implemented robust measures across all our systems. We would like to commend John Doe (email: john.doe@example.com) from the IT department for his diligent work in enhancing our network security. Moving forward, we kindly remind everyone to strictly adhere to our data protection policies and guidelines. Additionally, if you come across any potential security risks or incidents, please report them immediately to our dedicated team at security@example.com.\n",
|
||||
"\n",
|
||||
"HR Updates and Employee Benefits\n",
|
||||
"Recently, we welcomed several new team members who have made significant contributions to their respective departments. I would like to recognize Jane Smith (SSN: 049-45-5928) for her outstanding performance in customer service. Jane has consistently received positive feedback from our clients. Furthermore, please remember that the open enrollment period for our employee benefits program is fast approaching. Should you have any questions or require assistance, please contact our HR representative, Michael Johnson (phone: 418-492-3850, email: michael.johnson@example.com).\n",
|
||||
"\n",
|
||||
"Marketing Initiatives and Campaigns\n",
|
||||
"Our marketing team has been actively working on developing new strategies to increase brand awareness and drive customer engagement. We would like to thank Sarah Thompson (phone: 415-555-1234) for her exceptional efforts in managing our social media platforms. Sarah has successfully increased our follower base by 20% in the past month alone. Moreover, please mark your calendars for the upcoming product launch event on July 15th. We encourage all team members to attend and support this exciting milestone for our company.\n",
|
||||
"\n",
|
||||
"Research and Development Projects\n",
|
||||
"In our pursuit of innovation, our research and development department has been working tirelessly on various projects. I would like to acknowledge the exceptional work of David Rodriguez (email: david.rodriguez@example.com) in his role as project lead. David's contributions to the development of our cutting-edge technology have been instrumental. Furthermore, we would like to remind everyone to share their ideas and suggestions for potential new projects during our monthly R&D brainstorming session, scheduled for July 10th.\n",
|
||||
"\n",
|
||||
"Please treat the information in this document with utmost confidentiality and ensure that it is not shared with unauthorized individuals. If you have any questions or concerns regarding the topics discussed, please do not hesitate to reach out to me directly.\n",
|
||||
"\n",
|
||||
"Thank you for your attention, and let's continue to work together to achieve our goals.\n",
|
||||
"\n",
|
||||
"Best regards,\n",
|
||||
"\n",
|
||||
"Jason Fan\n",
|
||||
"Cofounder & CEO\n",
|
||||
"Psychic\n",
|
||||
"jason@psychic.dev\n",
|
||||
"\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"sample_text = \"\"\"[Generated with ChatGPT]\n",
|
||||
"\n",
|
||||
"Confidential Document - For Internal Use Only\n",
|
||||
"\n",
|
||||
"Date: July 1, 2023\n",
|
||||
"\n",
|
||||
"Subject: Updates and Discussions on Various Topics\n",
|
||||
"\n",
|
||||
"Dear Team,\n",
|
||||
"\n",
|
||||
"I hope this email finds you well. In this document, I would like to provide you with some important updates and discuss various topics that require our attention. Please treat the information contained herein as highly confidential.\n",
|
||||
"\n",
|
||||
"Security and Privacy Measures\n",
|
||||
"As part of our ongoing commitment to ensure the security and privacy of our customers' data, we have implemented robust measures across all our systems. We would like to commend John Doe (email: john.doe@example.com) from the IT department for his diligent work in enhancing our network security. Moving forward, we kindly remind everyone to strictly adhere to our data protection policies and guidelines. Additionally, if you come across any potential security risks or incidents, please report them immediately to our dedicated team at security@example.com.\n",
|
||||
"\n",
|
||||
"HR Updates and Employee Benefits\n",
|
||||
"Recently, we welcomed several new team members who have made significant contributions to their respective departments. I would like to recognize Jane Smith (SSN: 049-45-5928) for her outstanding performance in customer service. Jane has consistently received positive feedback from our clients. Furthermore, please remember that the open enrollment period for our employee benefits program is fast approaching. Should you have any questions or require assistance, please contact our HR representative, Michael Johnson (phone: 418-492-3850, email: michael.johnson@example.com).\n",
|
||||
"\n",
|
||||
"Marketing Initiatives and Campaigns\n",
|
||||
"Our marketing team has been actively working on developing new strategies to increase brand awareness and drive customer engagement. We would like to thank Sarah Thompson (phone: 415-555-1234) for her exceptional efforts in managing our social media platforms. Sarah has successfully increased our follower base by 20% in the past month alone. Moreover, please mark your calendars for the upcoming product launch event on July 15th. We encourage all team members to attend and support this exciting milestone for our company.\n",
|
||||
"\n",
|
||||
"Research and Development Projects\n",
|
||||
"In our pursuit of innovation, our research and development department has been working tirelessly on various projects. I would like to acknowledge the exceptional work of David Rodriguez (email: david.rodriguez@example.com) in his role as project lead. David's contributions to the development of our cutting-edge technology have been instrumental. Furthermore, we would like to remind everyone to share their ideas and suggestions for potential new projects during our monthly R&D brainstorming session, scheduled for July 10th.\n",
|
||||
"\n",
|
||||
"Please treat the information in this document with utmost confidentiality and ensure that it is not shared with unauthorized individuals. If you have any questions or concerns regarding the topics discussed, please do not hesitate to reach out to me directly.\n",
|
||||
"\n",
|
||||
"Thank you for your attention, and let's continue to work together to achieve our goals.\n",
|
||||
"\n",
|
||||
"Best regards,\n",
|
||||
"\n",
|
||||
"Jason Fan\n",
|
||||
"Cofounder & CEO\n",
|
||||
"Psychic\n",
|
||||
"jason@psychic.dev\n",
|
||||
"\"\"\"\n",
|
||||
"print(sample_text)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 4,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"documents = [Document(page_content=sample_text)]\n",
|
||||
"qa_transformer = DoctranQATransformer()\n",
|
||||
"transformed_document = await qa_transformer.atransform_documents(documents)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Output\n",
|
||||
"After interrogating a document, the result will be returned as a new document with questions and answers provided in the metadata."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 5,
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"{\n",
|
||||
" \"questions_and_answers\": [\n",
|
||||
" {\n",
|
||||
" \"question\": \"What is the purpose of this document?\",\n",
|
||||
" \"answer\": \"The purpose of this document is to provide important updates and discuss various topics that require the team's attention.\"\n",
|
||||
" },\n",
|
||||
" {\n",
|
||||
" \"question\": \"Who is responsible for enhancing the network security?\",\n",
|
||||
" \"answer\": \"John Doe from the IT department is responsible for enhancing the network security.\"\n",
|
||||
" },\n",
|
||||
" {\n",
|
||||
" \"question\": \"Where should potential security risks or incidents be reported?\",\n",
|
||||
" \"answer\": \"Potential security risks or incidents should be reported to the dedicated team at security@example.com.\"\n",
|
||||
" },\n",
|
||||
" {\n",
|
||||
" \"question\": \"Who has been recognized for outstanding performance in customer service?\",\n",
|
||||
" \"answer\": \"Jane Smith has been recognized for her outstanding performance in customer service.\"\n",
|
||||
" },\n",
|
||||
" {\n",
|
||||
" \"question\": \"When is the open enrollment period for the employee benefits program?\",\n",
|
||||
" \"answer\": \"The document does not specify the exact dates for the open enrollment period for the employee benefits program, but it mentions that it is fast approaching.\"\n",
|
||||
" },\n",
|
||||
" {\n",
|
||||
" \"question\": \"Who should be contacted for questions or assistance regarding the employee benefits program?\",\n",
|
||||
" \"answer\": \"For questions or assistance regarding the employee benefits program, the HR representative, Michael Johnson, should be contacted.\"\n",
|
||||
" },\n",
|
||||
" {\n",
|
||||
" \"question\": \"Who has been acknowledged for managing the company's social media platforms?\",\n",
|
||||
" \"answer\": \"Sarah Thompson has been acknowledged for managing the company's social media platforms.\"\n",
|
||||
" },\n",
|
||||
" {\n",
|
||||
" \"question\": \"When is the upcoming product launch event?\",\n",
|
||||
" \"answer\": \"The upcoming product launch event is on July 15th.\"\n",
|
||||
" },\n",
|
||||
" {\n",
|
||||
" \"question\": \"Who has been recognized for their contributions to the development of the company's technology?\",\n",
|
||||
" \"answer\": \"David Rodriguez has been recognized for his contributions to the development of the company's technology.\"\n",
|
||||
" },\n",
|
||||
" {\n",
|
||||
" \"question\": \"When is the monthly R&D brainstorming session?\",\n",
|
||||
" \"answer\": \"The monthly R&D brainstorming session is scheduled for July 10th.\"\n",
|
||||
" },\n",
|
||||
" {\n",
|
||||
" \"question\": \"Who should be contacted for questions or concerns regarding the topics discussed in the document?\",\n",
|
||||
" \"answer\": \"For questions or concerns regarding the topics discussed in the document, Jason Fan, the Cofounder & CEO, should be contacted.\"\n",
|
||||
" }\n",
|
||||
" ]\n",
|
||||
"}\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"transformed_document = await qa_transformer.atransform_documents(documents)\n",
|
||||
"print(json.dumps(transformed_document[0].metadata, indent=2))"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": []
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
"kernelspec": {
|
||||
"display_name": "Python 3 (ipykernel)",
|
||||
"language": "python",
|
||||
"name": "python3"
|
||||
},
|
||||
"language_info": {
|
||||
"codemirror_mode": {
|
||||
"name": "ipython",
|
||||
"version": 3
|
||||
},
|
||||
"file_extension": ".py",
|
||||
"mimetype": "text/x-python",
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.11.3"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
"nbformat_minor": 2
|
||||
}
|
||||
@@ -0,0 +1,208 @@
|
||||
{
|
||||
"cells": [
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# Doctran Translate Documents\n",
|
||||
"Comparing documents through embeddings has the benefit of working across multiple languages. \"Harrison says hello\" and \"Harrison dice hola\" will occupy similar positions in the vector space because they have the same meaning semantically.\n",
|
||||
"\n",
|
||||
"However, it can still be useful to use a LLM translate documents into other languages before vectorizing them. This is especially helpful when users are expected to query the knowledge base in different languages, or when state of the art embeddings models are not available for a given language.\n",
|
||||
"\n",
|
||||
"We can accomplish this using the [Doctran](https://github.com/psychic-api/doctran) library, which uses OpenAI's function calling feature to translate documents between languages."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"! pip install doctran"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 1,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langchain.schema import Document\n",
|
||||
"from langchain.document_transformers import DoctranTextTranslator"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"True"
|
||||
]
|
||||
},
|
||||
"execution_count": 2,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"from dotenv import load_dotenv\n",
|
||||
"\n",
|
||||
"load_dotenv()"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Input\n",
|
||||
"This is the document we'll translate"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 3,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"sample_text = \"\"\"[Generated with ChatGPT]\n",
|
||||
"\n",
|
||||
"Confidential Document - For Internal Use Only\n",
|
||||
"\n",
|
||||
"Date: July 1, 2023\n",
|
||||
"\n",
|
||||
"Subject: Updates and Discussions on Various Topics\n",
|
||||
"\n",
|
||||
"Dear Team,\n",
|
||||
"\n",
|
||||
"I hope this email finds you well. In this document, I would like to provide you with some important updates and discuss various topics that require our attention. Please treat the information contained herein as highly confidential.\n",
|
||||
"\n",
|
||||
"Security and Privacy Measures\n",
|
||||
"As part of our ongoing commitment to ensure the security and privacy of our customers' data, we have implemented robust measures across all our systems. We would like to commend John Doe (email: john.doe@example.com) from the IT department for his diligent work in enhancing our network security. Moving forward, we kindly remind everyone to strictly adhere to our data protection policies and guidelines. Additionally, if you come across any potential security risks or incidents, please report them immediately to our dedicated team at security@example.com.\n",
|
||||
"\n",
|
||||
"HR Updates and Employee Benefits\n",
|
||||
"Recently, we welcomed several new team members who have made significant contributions to their respective departments. I would like to recognize Jane Smith (SSN: 049-45-5928) for her outstanding performance in customer service. Jane has consistently received positive feedback from our clients. Furthermore, please remember that the open enrollment period for our employee benefits program is fast approaching. Should you have any questions or require assistance, please contact our HR representative, Michael Johnson (phone: 418-492-3850, email: michael.johnson@example.com).\n",
|
||||
"\n",
|
||||
"Marketing Initiatives and Campaigns\n",
|
||||
"Our marketing team has been actively working on developing new strategies to increase brand awareness and drive customer engagement. We would like to thank Sarah Thompson (phone: 415-555-1234) for her exceptional efforts in managing our social media platforms. Sarah has successfully increased our follower base by 20% in the past month alone. Moreover, please mark your calendars for the upcoming product launch event on July 15th. We encourage all team members to attend and support this exciting milestone for our company.\n",
|
||||
"\n",
|
||||
"Research and Development Projects\n",
|
||||
"In our pursuit of innovation, our research and development department has been working tirelessly on various projects. I would like to acknowledge the exceptional work of David Rodriguez (email: david.rodriguez@example.com) in his role as project lead. David's contributions to the development of our cutting-edge technology have been instrumental. Furthermore, we would like to remind everyone to share their ideas and suggestions for potential new projects during our monthly R&D brainstorming session, scheduled for July 10th.\n",
|
||||
"\n",
|
||||
"Please treat the information in this document with utmost confidentiality and ensure that it is not shared with unauthorized individuals. If you have any questions or concerns regarding the topics discussed, please do not hesitate to reach out to me directly.\n",
|
||||
"\n",
|
||||
"Thank you for your attention, and let's continue to work together to achieve our goals.\n",
|
||||
"\n",
|
||||
"Best regards,\n",
|
||||
"\n",
|
||||
"Jason Fan\n",
|
||||
"Cofounder & CEO\n",
|
||||
"Psychic\n",
|
||||
"jason@psychic.dev\n",
|
||||
"\"\"\""
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 4,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"documents = [Document(page_content=sample_text)]\n",
|
||||
"qa_translator = DoctranTextTranslator(language=\"spanish\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Output\n",
|
||||
"After translating a document, the result will be returned as a new document with the page_content translated into the target language"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 7,
|
||||
"metadata": {
|
||||
"scrolled": false
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"translated_document = await qa_translator.atransform_documents(documents)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 8,
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"[Generado con ChatGPT]\n",
|
||||
"\n",
|
||||
"Documento confidencial - Solo para uso interno\n",
|
||||
"\n",
|
||||
"Fecha: 1 de julio de 2023\n",
|
||||
"\n",
|
||||
"Asunto: Actualizaciones y discusiones sobre varios temas\n",
|
||||
"\n",
|
||||
"Estimado equipo,\n",
|
||||
"\n",
|
||||
"Espero que este correo electrónico les encuentre bien. En este documento, me gustaría proporcionarles algunas actualizaciones importantes y discutir varios temas que requieren nuestra atención. Por favor, traten la información contenida aquí como altamente confidencial.\n",
|
||||
"\n",
|
||||
"Medidas de seguridad y privacidad\n",
|
||||
"Como parte de nuestro compromiso continuo para garantizar la seguridad y privacidad de los datos de nuestros clientes, hemos implementado medidas robustas en todos nuestros sistemas. Nos gustaría elogiar a John Doe (correo electrónico: john.doe@example.com) del departamento de TI por su diligente trabajo en mejorar nuestra seguridad de red. En adelante, recordamos amablemente a todos que se adhieran estrictamente a nuestras políticas y directrices de protección de datos. Además, si se encuentran con cualquier riesgo de seguridad o incidente potencial, por favor repórtelo inmediatamente a nuestro equipo dedicado en security@example.com.\n",
|
||||
"\n",
|
||||
"Actualizaciones de RRHH y beneficios para empleados\n",
|
||||
"Recientemente, dimos la bienvenida a varios nuevos miembros del equipo que han hecho contribuciones significativas a sus respectivos departamentos. Me gustaría reconocer a Jane Smith (SSN: 049-45-5928) por su sobresaliente rendimiento en el servicio al cliente. Jane ha recibido constantemente comentarios positivos de nuestros clientes. Además, recuerden que el período de inscripción abierta para nuestro programa de beneficios para empleados se acerca rápidamente. Si tienen alguna pregunta o necesitan asistencia, por favor contacten a nuestro representante de RRHH, Michael Johnson (teléfono: 418-492-3850, correo electrónico: michael.johnson@example.com).\n",
|
||||
"\n",
|
||||
"Iniciativas y campañas de marketing\n",
|
||||
"Nuestro equipo de marketing ha estado trabajando activamente en el desarrollo de nuevas estrategias para aumentar la conciencia de marca y fomentar la participación del cliente. Nos gustaría agradecer a Sarah Thompson (teléfono: 415-555-1234) por sus excepcionales esfuerzos en la gestión de nuestras plataformas de redes sociales. Sarah ha aumentado con éxito nuestra base de seguidores en un 20% solo en el último mes. Además, por favor marquen sus calendarios para el próximo evento de lanzamiento de producto el 15 de julio. Animamos a todos los miembros del equipo a asistir y apoyar este emocionante hito para nuestra empresa.\n",
|
||||
"\n",
|
||||
"Proyectos de investigación y desarrollo\n",
|
||||
"En nuestra búsqueda de la innovación, nuestro departamento de investigación y desarrollo ha estado trabajando incansablemente en varios proyectos. Me gustaría reconocer el excepcional trabajo de David Rodríguez (correo electrónico: david.rodriguez@example.com) en su papel de líder de proyecto. Las contribuciones de David al desarrollo de nuestra tecnología de vanguardia han sido fundamentales. Además, nos gustaría recordar a todos que compartan sus ideas y sugerencias para posibles nuevos proyectos durante nuestra sesión de lluvia de ideas de I+D mensual, programada para el 10 de julio.\n",
|
||||
"\n",
|
||||
"Por favor, traten la información de este documento con la máxima confidencialidad y asegúrense de que no se comparte con personas no autorizadas. Si tienen alguna pregunta o inquietud sobre los temas discutidos, no duden en ponerse en contacto conmigo directamente.\n",
|
||||
"\n",
|
||||
"Gracias por su atención, y sigamos trabajando juntos para alcanzar nuestros objetivos.\n",
|
||||
"\n",
|
||||
"Saludos cordiales,\n",
|
||||
"\n",
|
||||
"Jason Fan\n",
|
||||
"Cofundador y CEO\n",
|
||||
"Psychic\n",
|
||||
"jason@psychic.dev\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"print(translated_document[0].page_content)"
|
||||
]
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
"kernelspec": {
|
||||
"display_name": "Python 3 (ipykernel)",
|
||||
"language": "python",
|
||||
"name": "python3"
|
||||
},
|
||||
"language_info": {
|
||||
"codemirror_mode": {
|
||||
"name": "ipython",
|
||||
"version": 3
|
||||
},
|
||||
"file_extension": ".py",
|
||||
"mimetype": "text/x-python",
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.11.3"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
"nbformat_minor": 2
|
||||
}
|
||||
@@ -0,0 +1,261 @@
|
||||
{
|
||||
"cells": [
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# OpenAI Functions Metadata Tagger\n",
|
||||
"\n",
|
||||
"It can often be useful to tag ingested documents with structured metadata, such as the title, tone, or length of a document, to allow for more targeted similarity search later. However, for large numbers of documents, performing this labelling process manually can be tedious.\n",
|
||||
"\n",
|
||||
"The `OpenAIMetadataTagger` document transformer automates this process by extracting metadata from each provided document according to a provided schema. It uses a configurable OpenAI Functions-powered chain under the hood, so if you pass a custom LLM instance, it must be an OpenAI model with functions support. \n",
|
||||
"\n",
|
||||
"**Note:** This document transformer works best with complete documents, so run it on whole documents before doing any splitting or other processing!\n",
|
||||
"\n",
|
||||
"For example, let's say you wanted to index a set of movie reviews. You could initialize the document transformer with a valid JSON Schema object as follows:"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 1,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langchain.schema import Document\n",
|
||||
"from langchain.chat_models import ChatOpenAI\n",
|
||||
"from langchain.document_transformers.openai_functions import create_metadata_tagger"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 2,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"schema = {\n",
|
||||
" \"properties\": {\n",
|
||||
" \"movie_title\": {\"type\": \"string\"},\n",
|
||||
" \"critic\": {\"type\": \"string\"},\n",
|
||||
" \"tone\": {\"type\": \"string\", \"enum\": [\"positive\", \"negative\"]},\n",
|
||||
" \"rating\": {\n",
|
||||
" \"type\": \"integer\",\n",
|
||||
" \"description\": \"The number of stars the critic rated the movie\",\n",
|
||||
" },\n",
|
||||
" },\n",
|
||||
" \"required\": [\"movie_title\", \"critic\", \"tone\"],\n",
|
||||
"}\n",
|
||||
"\n",
|
||||
"# Must be an OpenAI model that supports functions\n",
|
||||
"llm = ChatOpenAI(temperature=0, model=\"gpt-3.5-turbo-0613\")\n",
|
||||
"\n",
|
||||
"document_transformer = create_metadata_tagger(metadata_schema=schema, llm=llm)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"You can then simply pass the document transformer a list of documents, and it will extract metadata from the contents:"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 3,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"original_documents = [\n",
|
||||
" Document(\n",
|
||||
" page_content=\"Review of The Bee Movie\\nBy Roger Ebert\\n\\nThis is the greatest movie ever made. 4 out of 5 stars.\"\n",
|
||||
" ),\n",
|
||||
" Document(\n",
|
||||
" page_content=\"Review of The Godfather\\nBy Anonymous\\n\\nThis movie was super boring. 1 out of 5 stars.\",\n",
|
||||
" metadata={\"reliable\": False},\n",
|
||||
" ),\n",
|
||||
"]\n",
|
||||
"\n",
|
||||
"enhanced_documents = document_transformer.transform_documents(original_documents)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 4,
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"Review of The Bee Movie\n",
|
||||
"By Roger Ebert\n",
|
||||
"\n",
|
||||
"This is the greatest movie ever made. 4 out of 5 stars.\n",
|
||||
"\n",
|
||||
"{\"movie_title\": \"The Bee Movie\", \"critic\": \"Roger Ebert\", \"tone\": \"positive\", \"rating\": 4}\n",
|
||||
"\n",
|
||||
"---------------\n",
|
||||
"\n",
|
||||
"Review of The Godfather\n",
|
||||
"By Anonymous\n",
|
||||
"\n",
|
||||
"This movie was super boring. 1 out of 5 stars.\n",
|
||||
"\n",
|
||||
"{\"movie_title\": \"The Godfather\", \"critic\": \"Anonymous\", \"tone\": \"negative\", \"rating\": 1, \"reliable\": false}\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"import json\n",
|
||||
"\n",
|
||||
"print(\n",
|
||||
" *[d.page_content + \"\\n\\n\" + json.dumps(d.metadata) for d in enhanced_documents],\n",
|
||||
" sep=\"\\n\\n---------------\\n\\n\"\n",
|
||||
")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"The new documents can then be further processed by a text splitter before being loaded into a vector store. Extracted fields will not overwrite existing metadata.\n",
|
||||
"\n",
|
||||
"You can also initialize the document transformer with a Pydantic schema:"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 5,
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"Review of The Bee Movie\n",
|
||||
"By Roger Ebert\n",
|
||||
"\n",
|
||||
"This is the greatest movie ever made. 4 out of 5 stars.\n",
|
||||
"\n",
|
||||
"{\"movie_title\": \"The Bee Movie\", \"critic\": \"Roger Ebert\", \"tone\": \"positive\", \"rating\": 4}\n",
|
||||
"\n",
|
||||
"---------------\n",
|
||||
"\n",
|
||||
"Review of The Godfather\n",
|
||||
"By Anonymous\n",
|
||||
"\n",
|
||||
"This movie was super boring. 1 out of 5 stars.\n",
|
||||
"\n",
|
||||
"{\"movie_title\": \"The Godfather\", \"critic\": \"Anonymous\", \"tone\": \"negative\", \"rating\": 1, \"reliable\": false}\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"from typing import Literal\n",
|
||||
"\n",
|
||||
"from pydantic import BaseModel, Field\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"class Properties(BaseModel):\n",
|
||||
" movie_title: str\n",
|
||||
" critic: str\n",
|
||||
" tone: Literal[\"positive\", \"negative\"]\n",
|
||||
" rating: int = Field(description=\"Rating out of 5 stars\")\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"document_transformer = create_metadata_tagger(Properties, llm)\n",
|
||||
"enhanced_documents = document_transformer.transform_documents(original_documents)\n",
|
||||
"\n",
|
||||
"print(\n",
|
||||
" *[d.page_content + \"\\n\\n\" + json.dumps(d.metadata) for d in enhanced_documents],\n",
|
||||
" sep=\"\\n\\n---------------\\n\\n\"\n",
|
||||
")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"\n",
|
||||
"\n",
|
||||
"## Customization\n",
|
||||
"\n",
|
||||
"You can pass the underlying tagging chain the standard LLMChain arguments in the document transformer constructor. For example, if you wanted to ask the LLM to focus on specific details in the input documents, or extract metadata in a certain style, you could pass in a custom prompt:"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 6,
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"Review of The Bee Movie\n",
|
||||
"By Roger Ebert\n",
|
||||
"\n",
|
||||
"This is the greatest movie ever made. 4 out of 5 stars.\n",
|
||||
"\n",
|
||||
"{\"movie_title\": \"The Bee Movie\", \"critic\": \"Roger Ebert\", \"tone\": \"positive\", \"rating\": 4}\n",
|
||||
"\n",
|
||||
"---------------\n",
|
||||
"\n",
|
||||
"Review of The Godfather\n",
|
||||
"By Anonymous\n",
|
||||
"\n",
|
||||
"This movie was super boring. 1 out of 5 stars.\n",
|
||||
"\n",
|
||||
"{\"movie_title\": \"The Godfather\", \"critic\": \"Roger Ebert\", \"tone\": \"negative\", \"rating\": 1, \"reliable\": false}\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"from langchain.prompts import ChatPromptTemplate\n",
|
||||
"\n",
|
||||
"prompt = ChatPromptTemplate.from_template(\n",
|
||||
" \"\"\"Extract relevant information from the following text.\n",
|
||||
"Anonymous critics are actually Roger Ebert.\n",
|
||||
"\n",
|
||||
"{input}\n",
|
||||
"\"\"\"\n",
|
||||
")\n",
|
||||
"\n",
|
||||
"document_transformer = create_metadata_tagger(schema, llm, prompt=prompt)\n",
|
||||
"enhanced_documents = document_transformer.transform_documents(original_documents)\n",
|
||||
"\n",
|
||||
"print(\n",
|
||||
" *[d.page_content + \"\\n\\n\" + json.dumps(d.metadata) for d in enhanced_documents],\n",
|
||||
" sep=\"\\n\\n---------------\\n\\n\"\n",
|
||||
")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": []
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
"kernelspec": {
|
||||
"display_name": "venv",
|
||||
"language": "python",
|
||||
"name": "venv"
|
||||
},
|
||||
"language_info": {
|
||||
"codemirror_mode": {
|
||||
"name": "ipython",
|
||||
"version": 3
|
||||
},
|
||||
"file_extension": ".py",
|
||||
"mimetype": "text/x-python",
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.11.3"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
"nbformat_minor": 2
|
||||
}
|
||||
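The notebook above notes that extracted fields will not overwrite existing metadata. A minimal plain-Python sketch of that merge rule follows. The helper name `merge_metadata` and the sample values are hypothetical and chosen for illustration; this is not LangChain's own code, only one way the stated behavior could be expressed.

```python
from typing import Any, Dict


def merge_metadata(extracted: Dict[str, Any], existing: Dict[str, Any]) -> Dict[str, Any]:
    """Combine extracted fields with existing metadata; existing values win."""
    # Later entries in a dict display override earlier ones, so unpacking
    # `existing` last keeps any value the document already carried.
    return {**extracted, **existing}


# Hypothetical case: the tagger proposes "reliable": True, but the document
# was already marked as unreliable by hand.
extracted = {"movie_title": "The Godfather", "critic": "Anonymous", "reliable": True}
existing = {"reliable": False}

merged = merge_metadata(extracted, existing)
print(merged)
```

Under this rule a field such as `reliable`, set on the original document, keeps its original value even if the extraction step proposes a different one.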
@@ -155,9 +155,12 @@
|
||||
"\n",
|
||||
"# Char-level splits\n",
|
||||
"from langchain.text_splitter import RecursiveCharacterTextSplitter\n",
|
||||
"\n",
|
||||
"chunk_size = 250\n",
|
||||
"chunk_overlap = 30\n",
|
||||
"text_splitter = RecursiveCharacterTextSplitter(chunk_size=chunk_size, chunk_overlap=chunk_overlap)\n",
|
||||
"text_splitter = RecursiveCharacterTextSplitter(\n",
|
||||
" chunk_size=chunk_size, chunk_overlap=chunk_overlap\n",
|
||||
")\n",
|
||||
"\n",
|
||||
"# Split\n",
|
||||
"splits = text_splitter.split_documents(md_header_splits)\n",
|
||||
|
||||
@@ -28,14 +28,14 @@
|
||||
"# Load blog post\n",
|
||||
"loader = WebBaseLoader(\"https://lilianweng.github.io/posts/2023-06-23-agent/\")\n",
|
||||
"data = loader.load()\n",
|
||||
" \n",
|
||||
"\n",
|
||||
"# Split\n",
|
||||
"text_splitter = RecursiveCharacterTextSplitter(chunk_size = 500, chunk_overlap = 0)\n",
|
||||
"text_splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=0)\n",
|
||||
"splits = text_splitter.split_documents(data)\n",
|
||||
"\n",
|
||||
"# VectorDB\n",
|
||||
"embedding = OpenAIEmbeddings()\n",
|
||||
"vectordb = Chroma.from_documents(documents=splits,embedding=embedding)"
|
||||
"vectordb = Chroma.from_documents(documents=splits, embedding=embedding)"
|
||||
]
|
||||
},
|
||||
{
|
||||
@@ -57,9 +57,12 @@
|
||||
"source": [
|
||||
"from langchain.chat_models import ChatOpenAI\n",
|
||||
"from langchain.retrievers.multi_query import MultiQueryRetriever\n",
|
||||
"question=\"What are the approaches to Task Decomposition?\"\n",
|
||||
"\n",
|
||||
"question = \"What are the approaches to Task Decomposition?\"\n",
|
||||
"llm = ChatOpenAI(temperature=0)\n",
|
||||
"retriever_from_llm = MultiQueryRetriever.from_llm(retriever=vectordb.as_retriever(),llm=llm)"
|
||||
"retriever_from_llm = MultiQueryRetriever.from_llm(\n",
|
||||
" retriever=vectordb.as_retriever(), llm=llm\n",
|
||||
")"
|
||||
]
|
||||
},
|
||||
{
|
||||
@@ -71,8 +74,9 @@
|
||||
"source": [
|
||||
"# Set logging for the queries\n",
|
||||
"import logging\n",
|
||||
"\n",
|
||||
"logging.basicConfig()\n",
|
||||
"logging.getLogger('langchain.retrievers.multi_query').setLevel(logging.INFO)"
|
||||
"logging.getLogger(\"langchain.retrievers.multi_query\").setLevel(logging.INFO)"
|
||||
]
|
||||
},
|
||||
{
|
||||
@@ -127,20 +131,24 @@
|
||||
"from langchain.prompts import PromptTemplate\n",
|
||||
"from langchain.output_parsers import PydanticOutputParser\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"# Output parser will split the LLM result into a list of queries\n",
|
||||
"class LineList(BaseModel):\n",
|
||||
" # \"lines\" is the key (attribute name) of the parsed output\n",
|
||||
" lines: List[str] = Field(description=\"Lines of text\")\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"class LineListOutputParser(PydanticOutputParser):\n",
|
||||
" def __init__(self) -> None:\n",
|
||||
" super().__init__(pydantic_object=LineList)\n",
|
||||
"\n",
|
||||
" def parse(self, text: str) -> LineList:\n",
|
||||
" lines = text.strip().split(\"\\n\")\n",
|
||||
" return LineList(lines=lines)\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"output_parser = LineListOutputParser()\n",
|
||||
" \n",
|
||||
"\n",
|
||||
"QUERY_PROMPT = PromptTemplate(\n",
|
||||
" input_variables=[\"question\"],\n",
|
||||
" template=\"\"\"You are an AI language model assistant. Your task is to generate five \n",
|
||||
@@ -153,10 +161,10 @@
|
||||
"llm = ChatOpenAI(temperature=0)\n",
|
||||
"\n",
|
||||
"# Chain\n",
|
||||
"llm_chain = LLMChain(llm=llm,prompt=QUERY_PROMPT,output_parser=output_parser)\n",
|
||||
" \n",
|
||||
"llm_chain = LLMChain(llm=llm, prompt=QUERY_PROMPT, output_parser=output_parser)\n",
|
||||
"\n",
|
||||
"# Other inputs\n",
|
||||
"question=\"What are the approaches to Task Decomposition?\""
|
||||
"question = \"What are the approaches to Task Decomposition?\""
|
||||
]
|
||||
},
|
||||
{
|
||||
@@ -185,12 +193,14 @@
|
||||
],
|
||||
"source": [
|
||||
"# Run\n",
|
||||
"retriever = MultiQueryRetriever(retriever=vectordb.as_retriever(), \n",
|
||||
" llm_chain=llm_chain,\n",
|
||||
" parser_key=\"lines\") # \"lines\" is the key (attribute name) of the parsed output\n",
|
||||
"retriever = MultiQueryRetriever(\n",
|
||||
" retriever=vectordb.as_retriever(), llm_chain=llm_chain, parser_key=\"lines\"\n",
|
||||
") # \"lines\" is the key (attribute name) of the parsed output\n",
|
||||
"\n",
|
||||
"# Results\n",
|
||||
"unique_docs = retriever.get_relevant_documents(query=\"What does the course say about regression?\")\n",
|
||||
"unique_docs = retriever.get_relevant_documents(\n",
|
||||
" query=\"What does the course say about regression?\"\n",
|
||||
")\n",
|
||||
"len(unique_docs)"
|
||||
]
|
||||
}
|
||||
|
||||
@@ -59,11 +59,11 @@
|
||||
"import os\n",
|
||||
"import getpass\n",
|
||||
"\n",
|
||||
"os.environ['OPENAI_API_KEY'] = getpass.getpass('OpenAI API Key:')\n",
|
||||
"os.environ['MYSCALE_HOST'] = getpass.getpass('MyScale URL:')\n",
|
||||
"os.environ['MYSCALE_PORT'] = getpass.getpass('MyScale Port:')\n",
|
||||
"os.environ['MYSCALE_USERNAME'] = getpass.getpass('MyScale Username:')\n",
|
||||
"os.environ['MYSCALE_PASSWORD'] = getpass.getpass('MyScale Password:')"
|
||||
"os.environ[\"OPENAI_API_KEY\"] = getpass.getpass(\"OpenAI API Key:\")\n",
|
||||
"os.environ[\"MYSCALE_HOST\"] = getpass.getpass(\"MyScale URL:\")\n",
|
||||
"os.environ[\"MYSCALE_PORT\"] = getpass.getpass(\"MyScale Port:\")\n",
|
||||
"os.environ[\"MYSCALE_USERNAME\"] = getpass.getpass(\"MyScale Username:\")\n",
|
||||
"os.environ[\"MYSCALE_PASSWORD\"] = getpass.getpass(\"MyScale Password:\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
@@ -103,16 +103,40 @@
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"docs = [\n",
|
||||
" Document(page_content=\"A bunch of scientists bring back dinosaurs and mayhem breaks loose\", metadata={\"date\": \"1993-07-02\", \"rating\": 7.7, \"genre\": [\"science fiction\"]}),\n",
|
||||
" Document(page_content=\"Leo DiCaprio gets lost in a dream within a dream within a dream within a ...\", metadata={\"date\": \"2010-12-30\", \"director\": \"Christopher Nolan\", \"rating\": 8.2}),\n",
|
||||
" Document(page_content=\"A psychologist / detective gets lost in a series of dreams within dreams within dreams and Inception reused the idea\", metadata={\"date\": \"2006-04-23\", \"director\": \"Satoshi Kon\", \"rating\": 8.6}),\n",
|
||||
" Document(page_content=\"A bunch of normal-sized women are supremely wholesome and some men pine after them\", metadata={\"date\": \"2019-08-22\", \"director\": \"Greta Gerwig\", \"rating\": 8.3}),\n",
|
||||
" Document(page_content=\"Toys come alive and have a blast doing so\", metadata={\"date\": \"1995-02-11\", \"genre\": [\"animated\"]}),\n",
|
||||
" Document(page_content=\"Three men walk into the Zone, three men walk out of the Zone\", metadata={\"date\": \"1979-09-10\", \"rating\": 9.9, \"director\": \"Andrei Tarkovsky\", \"genre\": [\"science fiction\", \"adventure\"], \"rating\": 9.9})\n",
|
||||
" Document(\n",
|
||||
" page_content=\"A bunch of scientists bring back dinosaurs and mayhem breaks loose\",\n",
|
||||
" metadata={\"date\": \"1993-07-02\", \"rating\": 7.7, \"genre\": [\"science fiction\"]},\n",
|
||||
" ),\n",
|
||||
" Document(\n",
|
||||
" page_content=\"Leo DiCaprio gets lost in a dream within a dream within a dream within a ...\",\n",
|
||||
" metadata={\"date\": \"2010-12-30\", \"director\": \"Christopher Nolan\", \"rating\": 8.2},\n",
|
||||
" ),\n",
|
||||
" Document(\n",
|
||||
" page_content=\"A psychologist / detective gets lost in a series of dreams within dreams within dreams and Inception reused the idea\",\n",
|
||||
" metadata={\"date\": \"2006-04-23\", \"director\": \"Satoshi Kon\", \"rating\": 8.6},\n",
|
||||
" ),\n",
|
||||
" Document(\n",
|
||||
" page_content=\"A bunch of normal-sized women are supremely wholesome and some men pine after them\",\n",
|
||||
" metadata={\"date\": \"2019-08-22\", \"director\": \"Greta Gerwig\", \"rating\": 8.3},\n",
|
||||
" ),\n",
|
||||
" Document(\n",
|
||||
" page_content=\"Toys come alive and have a blast doing so\",\n",
|
||||
" metadata={\"date\": \"1995-02-11\", \"genre\": [\"animated\"]},\n",
|
||||
" ),\n",
|
||||
" Document(\n",
|
||||
" page_content=\"Three men walk into the Zone, three men walk out of the Zone\",\n",
|
||||
" metadata={\n",
|
||||
" \"date\": \"1979-09-10\",\n",
|
||||
" \"rating\": 9.9,\n",
|
||||
" \"director\": \"Andrei Tarkovsky\",\n",
|
||||
" \"genre\": [\"science fiction\", \"adventure\"],\n",
|
||||
" },\n",
|
||||
" ),\n",
|
||||
"]\n",
|
||||
"vectorstore = MyScale.from_documents(\n",
|
||||
" docs, \n",
|
||||
" embeddings, \n",
|
||||
" docs,\n",
|
||||
" embeddings,\n",
|
||||
")"
|
||||
]
|
||||
},
|
||||
@@ -138,39 +162,39 @@
|
||||
"from langchain.retrievers.self_query.base import SelfQueryRetriever\n",
|
||||
"from langchain.chains.query_constructor.base import AttributeInfo\n",
|
||||
"\n",
|
||||
"metadata_field_info=[\n",
|
||||
"metadata_field_info = [\n",
|
||||
" AttributeInfo(\n",
|
||||
" name=\"genre\",\n",
|
||||
" description=\"The genres of the movie\", \n",
|
||||
" type=\"list[string]\", \n",
|
||||
" description=\"The genres of the movie\",\n",
|
||||
" type=\"list[string]\",\n",
|
||||
" ),\n",
|
||||
" # If you want to include the length of a list, just define it as a new column.\n",
|
||||
" # This will teach the LLM to use it as a column when constructing a filter.\n",
|
||||
" AttributeInfo(\n",
|
||||
" name=\"length(genre)\",\n",
|
||||
" description=\"The length of genres of the movie\", \n",
|
||||
" type=\"integer\", \n",
|
||||
" description=\"The length of genres of the movie\",\n",
|
||||
" type=\"integer\",\n",
|
||||
" ),\n",
|
||||
" # You can also define a column as a timestamp simply by setting its type to timestamp.\n",
|
||||
" AttributeInfo(\n",
|
||||
" name=\"date\",\n",
|
||||
" description=\"The date the movie was released\", \n",
|
||||
" type=\"timestamp\", \n",
|
||||
" description=\"The date the movie was released\",\n",
|
||||
" type=\"timestamp\",\n",
|
||||
" ),\n",
|
||||
" AttributeInfo(\n",
|
||||
" name=\"director\",\n",
|
||||
" description=\"The name of the movie director\", \n",
|
||||
" type=\"string\", \n",
|
||||
" description=\"The name of the movie director\",\n",
|
||||
" type=\"string\",\n",
|
||||
" ),\n",
|
||||
" AttributeInfo(\n",
|
||||
" name=\"rating\",\n",
|
||||
" description=\"A 1-10 rating for the movie\",\n",
|
||||
" type=\"float\"\n",
|
||||
" name=\"rating\", description=\"A 1-10 rating for the movie\", type=\"float\"\n",
|
||||
" ),\n",
|
||||
"]\n",
|
||||
"document_content_description = \"Brief summary of a movie\"\n",
|
||||
"llm = OpenAI(temperature=0)\n",
|
||||
"retriever = SelfQueryRetriever.from_llm(llm, vectorstore, document_content_description, metadata_field_info, verbose=True)"
|
||||
"retriever = SelfQueryRetriever.from_llm(\n",
|
||||
" llm, vectorstore, document_content_description, metadata_field_info, verbose=True\n",
|
||||
")"
|
||||
]
|
||||
},
|
||||
{
|
||||
@@ -225,7 +249,9 @@
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# This example specifies a composite filter\n",
|
||||
"retriever.get_relevant_documents(\"What's a highly rated (above 8.5) science fiction film?\")"
|
||||
"retriever.get_relevant_documents(\n",
|
||||
" \"What's a highly rated (above 8.5) science fiction film?\"\n",
|
||||
")"
|
||||
]
|
||||
},
|
||||
{
|
||||
@@ -236,7 +262,9 @@
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# This example specifies a query and composite filter\n",
|
||||
"retriever.get_relevant_documents(\"What's a movie after 1990 but before 2005 that's all about toys, and preferably is animated\")"
|
||||
"retriever.get_relevant_documents(\n",
|
||||
" \"What's a movie after 1990 but before 2005 that's all about toys, and preferably is animated\"\n",
|
||||
")"
|
||||
]
|
||||
},
|
||||
{
|
||||
@@ -290,7 +318,9 @@
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# Contain works for lists, so you can match a list with the contain comparator!\n",
|
||||
"retriever.get_relevant_documents(\"What's a movie who has genres science fiction and adventure?\")"
|
||||
"retriever.get_relevant_documents(\n",
|
||||
" \"What's a movie who has genres science fiction and adventure?\"\n",
|
||||
")"
|
||||
]
|
||||
},
|
||||
{
|
||||
@@ -315,12 +345,12 @@
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"retriever = SelfQueryRetriever.from_llm(\n",
|
||||
" llm, \n",
|
||||
" vectorstore, \n",
|
||||
" document_content_description, \n",
|
||||
" metadata_field_info, \n",
|
||||
" llm,\n",
|
||||
" vectorstore,\n",
|
||||
" document_content_description,\n",
|
||||
" metadata_field_info,\n",
|
||||
" enable_limit=True,\n",
|
||||
" verbose=True\n",
|
||||
" verbose=True,\n",
|
||||
")"
|
||||
]
|
||||
},
|
||||
|
||||
@@ -18,7 +18,7 @@
|
||||
"## Creating a Pinecone index\n",
|
||||
"First we'll want to create a `Pinecone` VectorStore and seed it with some data. We've created a small demo set of documents that contain summaries of movies.\n",
|
||||
"\n",
|
||||
"To use Pinecone, you to have `pinecone` package installed and you must have an API key and an Environment. Here are the [installation instructions](https://docs.pinecone.io/docs/quickstart).\n",
|
||||
"To use Pinecone, you have to have `pinecone` package installed and you must have an API key and an Environment. Here are the [installation instructions](https://docs.pinecone.io/docs/quickstart).\n",
|
||||
"\n",
|
||||
"NOTE: The self-query retriever requires you to have the `lark` package installed."
|
||||
]
|
||||
|
||||
@@ -53,9 +53,7 @@
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"retriever = AmazonKendraRetriever(\n",
|
||||
" index_id=\"c0806df7-e76b-4bce-9b5c-d5582f6b1a03\"\n",
|
||||
")"
|
||||
"retriever = AmazonKendraRetriever(index_id=\"c0806df7-e76b-4bce-9b5c-d5582f6b1a03\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
|
||||
@@ -91,7 +91,7 @@
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"retriever = AzureCognitiveSearchRetriever(content_key=\"content\")"
|
||||
"retriever = AzureCognitiveSearchRetriever(content_key=\"content\", top_k=10)"
|
||||
]
|
||||
},
|
||||
{
|
||||
@@ -111,6 +111,36 @@
|
||||
"source": [
|
||||
"retriever.get_relevant_documents(\"what is langchain\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "72eca08e",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"You can change the number of results returned with the `top_k` parameter. The default value is `None`, which returns all results. "
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "097146c5",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": []
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "6d9963f5",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": []
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "dc120696",
|
||||
"metadata": {},
|
||||
"source": []
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
|
||||
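The Azure Cognitive Search notebook above describes `top_k` as an optional limit whose default, `None`, returns all results. The sketch below illustrates only that "None means no limit" convention in plain Python. The helper `limit_results` is hypothetical; the real retriever presumably applies the limit on the service side rather than slicing a local list.

```python
from typing import List, Optional


def limit_results(results: List[str], top_k: Optional[int] = None) -> List[str]:
    """Return at most `top_k` results, or every result when `top_k` is None."""
    if top_k is None:
        # No limit requested: hand back a copy of the full list.
        return list(results)
    if top_k < 0:
        raise ValueError("top_k must be non-negative")
    return results[:top_k]


hits = ["doc-a", "doc-b", "doc-c"]
print(limit_results(hits))           # no limit
print(limit_results(hits, top_k=2))  # first two results only
```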
@@ -1,21 +1,31 @@
|
||||
{
|
||||
"cells": [
|
||||
{
|
||||
"attachments": {},
|
||||
"cell_type": "markdown",
|
||||
"id": "9fc6205b",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# Databerry\n",
|
||||
"# Chaindesk\n",
|
||||
"\n",
|
||||
">[Databerry platform](https://docs.databerry.ai/introduction) brings data from anywhere (Datsources: Text, PDF, Word, PowerPpoint, Excel, Notion, Airtable, Google Sheets, etc..) into Datastores (container of multiple Datasources).\n",
|
||||
"Then your Datastores can be connected to ChatGPT via Plugins or any other Large Langue Model (LLM) via the `Databerry API`.\n",
|
||||
">[Chaindesk platform](https://docs.chaindesk.ai/introduction) brings data from anywhere (Datasources: Text, PDF, Word, PowerPoint, Excel, Notion, Airtable, Google Sheets, etc.) into Datastores (containers of multiple Datasources).\n",
|
||||
"Then your Datastores can be connected to ChatGPT via Plugins or any other Large Language Model (LLM) via the `Chaindesk API`.\n",
|
||||
"\n",
|
||||
"This notebook shows how to use [Databerry's](https://www.databerry.ai/) retriever.\n",
|
||||
"This notebook shows how to use [Chaindesk's](https://www.chaindesk.ai/) retriever.\n",
|
||||
"\n",
|
||||
"First, you will need to sign up for Databerry, create a datastore, add some data and get your datastore api endpoint url. You need the [API Key](https://docs.databerry.ai/api-reference/authentication)."
|
||||
"First, you will need to sign up for Chaindesk, create a datastore, add some data, and get your datastore API endpoint URL. You also need the [API Key](https://docs.chaindesk.ai/api-reference/authentication)."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "3697b9fd",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": []
|
||||
},
|
||||
{
|
||||
"attachments": {},
|
||||
"cell_type": "markdown",
|
||||
"id": "944e172b",
|
||||
"metadata": {},
|
||||
@@ -34,7 +44,7 @@
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langchain.retrievers import DataberryRetriever"
|
||||
"from langchain.retrievers import ChaindeskRetriever"
|
||||
]
|
||||
},
|
||||
{
|
||||
@@ -46,9 +56,9 @@
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"retriever = DataberryRetriever(\n",
|
||||
" datastore_url=\"https://clg1xg2h80000l708dymr0fxc.databerry.ai/query\",\n",
|
||||
" # api_key=\"DATABERRY_API_KEY\", # optional if datastore is public\n",
|
||||
"retriever = ChaindeskRetriever(\n",
|
||||
" datastore_url=\"https://clg1xg2h80000l708dymr0fxc.chaindesk.ai/query\",\n",
|
||||
" # api_key=\"CHAINDESK_API_KEY\", # optional if datastore is public\n",
|
||||
" # top_k=10 # optional\n",
|
||||
")"
|
||||
]
|
||||
@@ -111,10 +111,10 @@
|
||||
"db.index(\n",
|
||||
" [\n",
|
||||
" MyDoc(\n",
|
||||
" title=f'My document {i}',\n",
|
||||
" title_embedding=embeddings.embed_query(f'query {i}'),\n",
|
||||
" title=f\"My document {i}\",\n",
|
||||
" title_embedding=embeddings.embed_query(f\"query {i}\"),\n",
|
||||
" year=i,\n",
|
||||
" color=random.choice(['red', 'green', 'blue']),\n",
|
||||
" color=random.choice([\"red\", \"green\", \"blue\"]),\n",
|
||||
" )\n",
|
||||
" for i in range(100)\n",
|
||||
" ]\n",
|
||||
@@ -142,15 +142,15 @@
|
||||
"source": [
|
||||
"# create a retriever\n",
|
||||
"retriever = DocArrayRetriever(\n",
|
||||
" index=db, \n",
|
||||
" embeddings=embeddings, \n",
|
||||
" search_field='title_embedding', \n",
|
||||
" content_field='title',\n",
|
||||
" index=db,\n",
|
||||
" embeddings=embeddings,\n",
|
||||
" search_field=\"title_embedding\",\n",
|
||||
" content_field=\"title\",\n",
|
||||
" filters=filter_query,\n",
|
||||
")\n",
|
||||
"\n",
|
||||
"# find the relevant document\n",
|
||||
"doc = retriever.get_relevant_documents('some query')\n",
|
||||
"doc = retriever.get_relevant_documents(\"some query\")\n",
|
||||
"print(doc)"
|
||||
]
|
||||
},
|
||||
@@ -179,16 +179,16 @@
|
||||
"\n",
|
||||
"\n",
|
||||
"# initialize the index\n",
|
||||
"db = HnswDocumentIndex[MyDoc](work_dir='hnsw_index')\n",
|
||||
"db = HnswDocumentIndex[MyDoc](work_dir=\"hnsw_index\")\n",
|
||||
"\n",
|
||||
"# index data\n",
|
||||
"db.index(\n",
|
||||
" [\n",
|
||||
" MyDoc(\n",
|
||||
" title=f'My document {i}',\n",
|
||||
" title_embedding=embeddings.embed_query(f'query {i}'),\n",
|
||||
" title=f\"My document {i}\",\n",
|
||||
" title_embedding=embeddings.embed_query(f\"query {i}\"),\n",
|
||||
" year=i,\n",
|
||||
" color=random.choice(['red', 'green', 'blue']),\n",
|
||||
" color=random.choice([\"red\", \"green\", \"blue\"]),\n",
|
||||
" )\n",
|
||||
" for i in range(100)\n",
|
||||
" ]\n",
|
||||
@@ -216,15 +216,15 @@
|
||||
"source": [
|
||||
"# create a retriever\n",
|
||||
"retriever = DocArrayRetriever(\n",
|
||||
" index=db, \n",
|
||||
" embeddings=embeddings, \n",
|
||||
" search_field='title_embedding', \n",
|
||||
" content_field='title',\n",
|
||||
" index=db,\n",
|
||||
" embeddings=embeddings,\n",
|
||||
" search_field=\"title_embedding\",\n",
|
||||
" content_field=\"title\",\n",
|
||||
" filters=filter_query,\n",
|
||||
")\n",
|
||||
"\n",
|
||||
"# find the relevant document\n",
|
||||
"doc = retriever.get_relevant_documents('some query')\n",
|
||||
"doc = retriever.get_relevant_documents(\"some query\")\n",
|
||||
"print(doc)"
|
||||
]
|
||||
},
|
||||
@@ -249,11 +249,12 @@
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# There's a small difference with the Weaviate backend compared to the others. \n",
|
||||
"# Here, you need to 'mark' the field used for vector search with 'is_embedding=True'. \n",
|
||||
"# There's a small difference with the Weaviate backend compared to the others.\n",
|
||||
"# Here, you need to 'mark' the field used for vector search with 'is_embedding=True'.\n",
|
||||
"# So, let's create a new schema for Weaviate that takes care of this requirement.\n",
|
||||
"\n",
|
||||
"from pydantic import Field \n",
|
||||
"from pydantic import Field\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"class WeaviateDoc(BaseDoc):\n",
|
||||
" title: str\n",
|
||||
@@ -275,19 +276,17 @@
|
||||
"\n",
|
||||
"\n",
|
||||
"# initialize the index\n",
|
||||
"dbconfig = WeaviateDocumentIndex.DBConfig(\n",
|
||||
" host=\"http://localhost:8080\"\n",
|
||||
")\n",
|
||||
"dbconfig = WeaviateDocumentIndex.DBConfig(host=\"http://localhost:8080\")\n",
|
||||
"db = WeaviateDocumentIndex[WeaviateDoc](db_config=dbconfig)\n",
|
||||
"\n",
|
||||
"# index data\n",
|
||||
"db.index(\n",
|
||||
" [\n",
|
||||
" MyDoc(\n",
|
||||
" title=f'My document {i}',\n",
|
||||
" title_embedding=embeddings.embed_query(f'query {i}'),\n",
|
||||
" title=f\"My document {i}\",\n",
|
||||
" title_embedding=embeddings.embed_query(f\"query {i}\"),\n",
|
||||
" year=i,\n",
|
||||
" color=random.choice(['red', 'green', 'blue']),\n",
|
||||
" color=random.choice([\"red\", \"green\", \"blue\"]),\n",
|
||||
" )\n",
|
||||
" for i in range(100)\n",
|
||||
" ]\n",
|
||||
@@ -315,15 +314,15 @@
|
||||
"source": [
|
||||
"# create a retriever\n",
|
||||
"retriever = DocArrayRetriever(\n",
|
||||
" index=db, \n",
|
||||
" embeddings=embeddings, \n",
|
||||
" search_field='title_embedding', \n",
|
||||
" content_field='title',\n",
|
||||
" index=db,\n",
|
||||
" embeddings=embeddings,\n",
|
||||
" search_field=\"title_embedding\",\n",
|
||||
" content_field=\"title\",\n",
|
||||
" filters=filter_query,\n",
|
||||
")\n",
|
||||
"\n",
|
||||
"# find the relevant document\n",
|
||||
"doc = retriever.get_relevant_documents('some query')\n",
|
||||
"doc = retriever.get_relevant_documents(\"some query\")\n",
|
||||
"print(doc)"
|
||||
]
|
||||
},
|
||||
@@ -353,18 +352,17 @@
|
||||
"\n",
|
||||
"# initialize the index\n",
|
||||
"db = ElasticDocIndex[MyDoc](\n",
|
||||
" hosts=\"http://localhost:9200\", \n",
|
||||
" index_name=\"docarray_retriever\"\n",
|
||||
" hosts=\"http://localhost:9200\", index_name=\"docarray_retriever\"\n",
|
||||
")\n",
|
||||
"\n",
|
||||
"# index data\n",
|
||||
"db.index(\n",
|
||||
" [\n",
|
||||
" MyDoc(\n",
|
||||
" title=f'My document {i}',\n",
|
||||
" title_embedding=embeddings.embed_query(f'query {i}'),\n",
|
||||
" title=f\"My document {i}\",\n",
|
||||
" title_embedding=embeddings.embed_query(f\"query {i}\"),\n",
|
||||
" year=i,\n",
|
||||
" color=random.choice(['red', 'green', 'blue']),\n",
|
||||
" color=random.choice([\"red\", \"green\", \"blue\"]),\n",
|
||||
" )\n",
|
||||
" for i in range(100)\n",
|
||||
" ]\n",
|
||||
@@ -392,15 +390,15 @@
|
||||
"source": [
|
||||
"# create a retriever\n",
|
||||
"retriever = DocArrayRetriever(\n",
|
||||
" index=db, \n",
|
||||
" embeddings=embeddings, \n",
|
||||
" search_field='title_embedding', \n",
|
||||
" content_field='title',\n",
|
||||
" index=db,\n",
|
||||
" embeddings=embeddings,\n",
|
||||
" search_field=\"title_embedding\",\n",
|
||||
" content_field=\"title\",\n",
|
||||
" filters=filter_query,\n",
|
||||
")\n",
|
||||
"\n",
|
||||
"# find the relevant document\n",
|
||||
"doc = retriever.get_relevant_documents('some query')\n",
|
||||
"doc = retriever.get_relevant_documents(\"some query\")\n",
|
||||
"print(doc)"
|
||||
]
|
||||
},
|
||||
@@ -445,10 +443,10 @@
|
||||
"db.index(\n",
|
||||
" [\n",
|
||||
" MyDoc(\n",
|
||||
" title=f'My document {i}',\n",
|
||||
" title_embedding=embeddings.embed_query(f'query {i}'),\n",
|
||||
" title=f\"My document {i}\",\n",
|
||||
" title_embedding=embeddings.embed_query(f\"query {i}\"),\n",
|
||||
" year=i,\n",
|
||||
" color=random.choice(['red', 'green', 'blue']),\n",
|
||||
" color=random.choice([\"red\", \"green\", \"blue\"]),\n",
|
||||
" )\n",
|
||||
" for i in range(100)\n",
|
||||
" ]\n",
|
||||
@@ -486,15 +484,15 @@
|
||||
"source": [
|
||||
"# create a retriever\n",
|
||||
"retriever = DocArrayRetriever(\n",
|
||||
" index=db, \n",
|
||||
" embeddings=embeddings, \n",
|
||||
" search_field='title_embedding', \n",
|
||||
" content_field='title',\n",
|
||||
" index=db,\n",
|
||||
" embeddings=embeddings,\n",
|
||||
" search_field=\"title_embedding\",\n",
|
||||
" content_field=\"title\",\n",
|
||||
" filters=filter_query,\n",
|
||||
")\n",
|
||||
"\n",
|
||||
"# find the relevant document\n",
|
||||
"doc = retriever.get_relevant_documents('some query')\n",
|
||||
"doc = retriever.get_relevant_documents(\"some query\")\n",
|
||||
"print(doc)"
|
||||
]
|
||||
},
|
||||
@@ -552,7 +550,7 @@
|
||||
" \"director\": \"Francis Ford Coppola\",\n",
|
||||
" \"rating\": 9.2,\n",
|
||||
" },\n",
|
||||
"]\n"
|
||||
"]"
|
||||
]
|
||||
},
|
||||
{
|
||||
@@ -573,9 +571,9 @@
|
||||
],
|
||||
"source": [
|
||||
"import getpass\n",
|
||||
"import os \n",
|
||||
"import os\n",
|
||||
"\n",
|
||||
"os.environ['OPENAI_API_KEY'] = getpass.getpass('OpenAI API Key:')"
|
||||
"os.environ[\"OPENAI_API_KEY\"] = getpass.getpass(\"OpenAI API Key:\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
@@ -591,6 +589,7 @@
|
||||
"from docarray.typing import NdArray\n",
|
||||
"from langchain.embeddings.openai import OpenAIEmbeddings\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"# define schema for your movie documents\n",
|
||||
"class MyDoc(BaseDoc):\n",
|
||||
" title: str\n",
|
||||
@@ -598,7 +597,7 @@
|
||||
" description_embedding: NdArray[1536]\n",
|
||||
" rating: float\n",
|
||||
" director: str\n",
|
||||
" \n",
|
||||
"\n",
|
||||
"\n",
|
||||
"embeddings = OpenAIEmbeddings()\n",
|
||||
"\n",
|
||||
@@ -626,7 +625,7 @@
|
||||
"from docarray.index import HnswDocumentIndex\n",
|
||||
"\n",
|
||||
"# initialize the index\n",
|
||||
"db = HnswDocumentIndex[MyDoc](work_dir='movie_search')\n",
|
||||
"db = HnswDocumentIndex[MyDoc](work_dir=\"movie_search\")\n",
|
||||
"\n",
|
||||
"# add data\n",
|
||||
"db.index(docs)"
|
||||
@@ -663,14 +662,14 @@
|
||||
"\n",
|
||||
"# create a retriever\n",
|
||||
"retriever = DocArrayRetriever(\n",
|
||||
" index=db, \n",
|
||||
" embeddings=embeddings, \n",
|
||||
" search_field='description_embedding', \n",
|
||||
" content_field='description'\n",
|
||||
" index=db,\n",
|
||||
" embeddings=embeddings,\n",
|
||||
" search_field=\"description_embedding\",\n",
|
||||
" content_field=\"description\",\n",
|
||||
")\n",
|
||||
"\n",
|
||||
"# find the relevant document\n",
|
||||
"doc = retriever.get_relevant_documents('movie about dreams')\n",
|
||||
"doc = retriever.get_relevant_documents(\"movie about dreams\")\n",
|
||||
"print(doc)"
|
||||
]
|
||||
},
|
||||
@@ -703,16 +702,16 @@
|
||||
"\n",
|
||||
"# create a retriever\n",
|
||||
"retriever = DocArrayRetriever(\n",
|
||||
" index=db, \n",
|
||||
" embeddings=embeddings, \n",
|
||||
" search_field='description_embedding', \n",
|
||||
" content_field='description',\n",
|
||||
" filters={'director': {'$eq': 'Christopher Nolan'}},\n",
|
||||
" index=db,\n",
|
||||
" embeddings=embeddings,\n",
|
||||
" search_field=\"description_embedding\",\n",
|
||||
" content_field=\"description\",\n",
|
||||
" filters={\"director\": {\"$eq\": \"Christopher Nolan\"}},\n",
|
||||
" top_k=2,\n",
|
||||
")\n",
|
||||
"\n",
|
||||
"# find relevant documents\n",
|
||||
"docs = retriever.get_relevant_documents('space travel')\n",
|
||||
"docs = retriever.get_relevant_documents(\"space travel\")\n",
|
||||
"print(docs)"
|
||||
]
|
||||
},
|
||||
@@ -745,17 +744,17 @@
|
||||
"\n",
|
||||
"# create a retriever\n",
|
||||
"retriever = DocArrayRetriever(\n",
|
||||
" index=db, \n",
|
||||
" embeddings=embeddings, \n",
|
||||
" search_field='description_embedding', \n",
|
||||
" content_field='description',\n",
|
||||
" filters={'rating': {'$gte': 8.7}},\n",
|
||||
" search_type='mmr',\n",
|
||||
" index=db,\n",
|
||||
" embeddings=embeddings,\n",
|
||||
" search_field=\"description_embedding\",\n",
|
||||
" content_field=\"description\",\n",
|
||||
" filters={\"rating\": {\"$gte\": 8.7}},\n",
|
||||
" search_type=\"mmr\",\n",
|
||||
" top_k=3,\n",
|
||||
")\n",
|
||||
"\n",
|
||||
"# find relevant documents\n",
|
||||
"docs = retriever.get_relevant_documents('action movies')\n",
|
||||
"docs = retriever.get_relevant_documents(\"action movies\")\n",
|
||||
"print(docs)"
|
||||
]
|
||||
},
|
||||
|
||||
@@ -1,6 +1,7 @@
|
||||
{
|
||||
"cells": [
|
||||
{
|
||||
"attachments": {},
|
||||
"cell_type": "markdown",
|
||||
"id": "fc0db1bc",
|
||||
"metadata": {},
|
||||
@@ -25,7 +26,10 @@
|
||||
"from langchain.vectorstores import Chroma\n",
|
||||
"from langchain.embeddings import HuggingFaceEmbeddings\n",
|
||||
"from langchain.embeddings import OpenAIEmbeddings\n",
|
||||
"from langchain.document_transformers import EmbeddingsRedundantFilter\n",
|
||||
"from langchain.document_transformers import (\n",
|
||||
" EmbeddingsRedundantFilter,\n",
|
||||
" EmbeddingsClusteringFilter,\n",
|
||||
")\n",
|
||||
"from langchain.retrievers.document_compressors import DocumentCompressorPipeline\n",
|
||||
"from langchain.retrievers import ContextualCompressionRetriever\n",
|
||||
"\n",
|
||||
@@ -70,6 +74,7 @@
|
||||
]
|
||||
},
|
||||
{
|
||||
"attachments": {},
|
||||
"cell_type": "markdown",
|
||||
"id": "c152339d",
|
||||
"metadata": {},
|
||||
@@ -92,6 +97,46 @@
|
||||
" base_compressor=pipeline, base_retriever=lotr\n",
|
||||
")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"attachments": {},
|
||||
"cell_type": "markdown",
|
||||
"id": "c10022fa",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Pick a representative sample of documents from the merged retrievers."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "b3885482",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# This filter will divide the documents vectors into clusters or \"centers\" of meaning.\n",
|
||||
"# Then it will pick the closest document to that center for the final results.\n",
|
||||
"# By default the result document will be ordered/grouped by clusters.\n",
|
||||
"filter_ordered_cluster = EmbeddingsClusteringFilter(\n",
|
||||
" embeddings=filter_embeddings,\n",
|
||||
" num_clusters=10,\n",
|
||||
" num_closest=1,\n",
|
||||
")\n",
|
||||
"\n",
|
||||
"# If you want the final document to be ordered by the original retriever scores\n",
|
||||
"# you need to add the \"sorted\" parameter.\n",
|
||||
"filter_ordered_by_retriever = EmbeddingsClusteringFilter(\n",
|
||||
" embeddings=filter_embeddings,\n",
|
||||
" num_clusters=10,\n",
|
||||
" num_closest=1,\n",
|
||||
" sorted=True,\n",
|
||||
")\n",
|
||||
"\n",
|
||||
"pipeline = DocumentCompressorPipeline(transformers=[filter_ordered_by_retriever])\n",
|
||||
"compression_retriever = ContextualCompressionRetriever(\n",
|
||||
" base_compressor=pipeline, base_retriever=lotr\n",
|
||||
")"
|
||||
]
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
|
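Reviewer note on the new `EmbeddingsClusteringFilter` cell above: the comments describe picking the document closest to each cluster "center", with results grouped by cluster by default and ordered by the original retriever ranking when `sorted=True`. The sketch below illustrates that selection step only. It is not the library's implementation: the helper name `closest_to_centers`, the `keep_input_order` flag, and the precomputed `centers` are all hypothetical. It also assumes the input list is already in retriever rank order.

```python
import math


def closest_to_centers(vectors, centers, keep_input_order=False):
    """Return the index of the nearest unused vector for each center.

    Hypothetical helper for illustration. `centers` are assumed to be
    precomputed; a real pipeline would get them from a clustering step.
    """
    picked = []
    for center in centers:
        # Only consider vectors that no earlier center has claimed.
        candidates = [i for i in range(len(vectors)) if i not in picked]
        if not candidates:
            break  # more centers than vectors: nothing left to pick
        best = min(candidates, key=lambda i: math.dist(vectors[i], center))
        picked.append(best)
    # Default: grouped by cluster (the order of `centers`).
    # keep_input_order=True: restore the original (assumed rank) order.
    return sorted(picked) if keep_input_order else picked


# Toy 2-D "embeddings", assumed to be listed in retriever rank order.
vectors = [(0.0, 0.1), (5.0, 5.0), (0.2, 0.0), (9.0, 9.1), (5.1, 4.9)]
# Made-up cluster centers standing in for the output of a clustering step.
centers = [(9.0, 9.0), (0.0, 0.0), (5.0, 5.0)]

by_cluster = closest_to_centers(vectors, centers)
by_rank = closest_to_centers(vectors, centers, keep_input_order=True)
```

With this toy data, `by_cluster` follows the order of `centers`, while `by_rank` contains the same indices rearranged into input order, mirroring the difference between `filter_ordered_cluster` and `filter_ordered_by_retriever` as the notebook comments describe it.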
||||
@@ -123,7 +123,7 @@
|
||||
"\n",
|
||||
"index_name = \"langchain-pinecone-hybrid-search\"\n",
|
||||
"\n",
|
||||
"pinecone.init(api_key=api_key, enviroment=env)\n",
|
||||
"pinecone.init(api_key=api_key, environment=env)\n",
|
||||
"pinecone.whoami()"
|
||||
]
|
||||
},
|
||||
|
||||
@@ -111,9 +111,7 @@
|
||||
"\n",
|
||||
"# Set up Zep Chat History. We'll use this to add chat histories to the memory store\n",
|
||||
"zep_chat_history = ZepChatMessageHistory(\n",
|
||||
" session_id=session_id,\n",
|
||||
" url=ZEP_API_URL,\n",
|
||||
" api_key=zep_api_key\n",
|
||||
" session_id=session_id, url=ZEP_API_URL, api_key=zep_api_key\n",
|
||||
")"
|
||||
]
|
||||
},
|
||||
@@ -247,7 +245,7 @@
|
||||
" session_id=session_id, # Ensure that you provide the session_id when instantiating the Retriever\n",
|
||||
" url=ZEP_API_URL,\n",
|
||||
" top_k=5,\n",
|
||||
" api_key=zep_api_key\n",
|
||||
" api_key=zep_api_key,\n",
|
||||
")\n",
|
||||
"\n",
|
||||
"await zep_retriever.aget_relevant_documents(\"Who wrote Parable of the Sower?\")"
|
||||
|
||||
@@ -65,7 +65,7 @@
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"# Please login and get your API key from https://clarifai.com/settings/security \n",
|
||||
"# Please login and get your API key from https://clarifai.com/settings/security\n",
|
||||
"from getpass import getpass\n",
|
||||
"\n",
|
||||
"CLARIFAI_PAT = getpass()"
|
||||
@@ -130,9 +130,9 @@
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"USER_ID = 'openai'\n",
|
||||
"APP_ID = 'embed'\n",
|
||||
"MODEL_ID = 'text-embedding-ada'\n",
|
||||
"USER_ID = \"openai\"\n",
|
||||
"APP_ID = \"embed\"\n",
|
||||
"MODEL_ID = \"text-embedding-ada\"\n",
|
||||
"\n",
|
||||
"# You can provide a specific model version as the model_version_id arg.\n",
|
||||
"# MODEL_VERSION_ID = \"MODEL_VERSION_ID\""
|
||||
@@ -148,7 +148,9 @@
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# Initialize a Clarifai embedding model\n",
|
||||
"embeddings = ClarifaiEmbeddings(pat=CLARIFAI_PAT, user_id=USER_ID, app_id=APP_ID, model_id=MODEL_ID)"
|
||||
"embeddings = ClarifaiEmbeddings(\n",
|
||||
" pat=CLARIFAI_PAT, user_id=USER_ID, app_id=APP_ID, model_id=MODEL_ID\n",
|
||||
")"
|
||||
]
|
||||
},
|
||||
{
|
||||
|
||||