Compare commits
343 Commits
bagatur/ne
...
bagatur/me
| Author | SHA1 | Date | |
|---|---|---|---|
|
|
5d62621b99 | ||
|
|
cb642ef658 | ||
|
|
5e2d0cf54e | ||
|
|
f97d3a76e7 | ||
|
|
5edf819524 | ||
|
|
610f46d83a | ||
|
|
c1badc1fa2 | ||
|
|
0d01cede03 | ||
|
|
63921e327d | ||
|
|
aab01b55db | ||
|
|
0da5803f5a | ||
|
|
a28eea5767 | ||
|
|
fa0b8f3368 | ||
|
|
12a373810c | ||
|
|
d57d08fd01 | ||
|
|
4339d21cf1 | ||
|
|
1960ac8d25 | ||
|
|
2ab04a4e32 | ||
|
|
985873c497 | ||
|
|
709a67d9bf | ||
|
|
9731ce5a40 | ||
|
|
cacaf487c3 | ||
|
|
135cb86215 | ||
|
|
d04fe0d3ea | ||
|
|
30151c99c7 | ||
|
|
ade482c17e | ||
|
|
87da56fb1e | ||
|
|
adb21782b8 | ||
|
|
3e5cda3405 | ||
|
|
dc30edf51c | ||
|
|
dff00ea91e | ||
|
|
0f48e6c36e | ||
|
|
a0800c9f15 | ||
|
|
2bcf581a23 | ||
|
|
22b6549a34 | ||
|
|
dacf96895a | ||
|
|
c37be7f5fb | ||
|
|
cf792891f1 | ||
|
|
f5ea725796 | ||
|
|
6bedfdf25a | ||
|
|
7cf5c582d2 | ||
|
|
9666e752b1 | ||
|
|
78ffcdd9a9 | ||
|
|
20d2c0571c | ||
|
|
9963b32e59 | ||
|
|
b048236c1a | ||
|
|
c19888c12c | ||
|
|
d0ff0db698 | ||
|
|
5990651070 | ||
|
|
25f2c82ae8 | ||
|
|
6283f3b63c | ||
|
|
9e1dbd4b49 | ||
|
|
b88dfcb42a | ||
|
|
c215481531 | ||
|
|
a9c86774da | ||
|
|
a8c916955f | ||
|
|
342087bdfa | ||
|
|
cbaea8d63b | ||
|
|
cd81e8a8f2 | ||
|
|
278ef0bdcf | ||
|
|
fa05e18278 | ||
|
|
20ce283fa7 | ||
|
|
6424b3cde0 | ||
|
|
da18e177f1 | ||
|
|
c326751085 | ||
|
|
6d19709b65 | ||
|
|
677da6a0fd | ||
|
|
64a958c85d | ||
|
|
1751fe114d | ||
|
|
882b97cfd2 | ||
|
|
3ddabe8b2c | ||
|
|
fdcd50aab4 | ||
|
|
9777c2801d | ||
|
|
93bbf67afc | ||
|
|
c184be5511 | ||
|
|
dacd5dcba8 | ||
|
|
80dd162e0d | ||
|
|
db4b256a28 | ||
|
|
3458489936 | ||
|
|
e420bf22b6 | ||
|
|
cc83f54694 | ||
|
|
d414d47c78 | ||
|
|
a40c12bb88 | ||
|
|
d8e2dd4c89 | ||
|
|
e2e582f1f6 | ||
|
|
5508baf1eb | ||
|
|
0154958243 | ||
|
|
a8e8a31b41 | ||
|
|
ef87affd4d | ||
|
|
1c64db575c | ||
|
|
ef2500584c | ||
|
|
8a03836160 | ||
|
|
f0ae10a20e | ||
|
|
39a5d02225 | ||
|
|
5b9bdcac1b | ||
|
|
eb92da84a1 | ||
|
|
2a06e7b216 | ||
|
|
f1072cc31f | ||
|
|
b379c5f9c8 | ||
|
|
e1f4f9ac3e | ||
|
|
b2d9970fc1 | ||
|
|
900c1f3e8d | ||
|
|
02545a54b3 | ||
|
|
632a83c48e | ||
|
|
fc64e6349e | ||
|
|
ca8232a3c1 | ||
|
|
f29312eb84 | ||
|
|
c06f34fa35 | ||
|
|
83986ea98a | ||
|
|
81163e3c0c | ||
|
|
f3ba9ce7f4 | ||
|
|
f1e602996a | ||
|
|
35812d0096 | ||
|
|
d564ec944c | ||
|
|
65e893b9cd | ||
|
|
64a54d8ad8 | ||
|
|
3c7cc4d440 | ||
|
|
3408810748 | ||
|
|
acb54d8b9d | ||
|
|
a1e89aa8d5 | ||
|
|
c75e1aa5ed | ||
|
|
2b663089b5 | ||
|
|
b868ef23bc | ||
|
|
e99ef12cb1 | ||
|
|
1720e99397 | ||
|
|
dfb9ff1079 | ||
|
|
1ea2f9adf4 | ||
|
|
d4c49b16e4 | ||
|
|
fba29f203a | ||
|
|
3c4f32c8b8 | ||
|
|
4d0b7bb8e1 | ||
|
|
033b874701 | ||
|
|
4e7e6bfe0a | ||
|
|
a9bf409a09 | ||
|
|
fa478638a9 | ||
|
|
182b059bf4 | ||
|
|
0fa4516ce4 | ||
|
|
04f2d69b83 | ||
|
|
0fea987dd2 | ||
|
|
00eff8c4a7 | ||
|
|
6c308aabae | ||
|
|
8bc1a3dca8 | ||
|
|
652c542b2f | ||
|
|
7c0b1b8171 | ||
|
|
d09cdb4880 | ||
|
|
3d1095218c | ||
|
|
949b2cf177 | ||
|
|
66a47d9a61 | ||
|
|
2a3758a98e | ||
|
|
dda5b1e370 | ||
|
|
de1f63505b | ||
|
|
4999e8af7e | ||
|
|
0565d81dc5 | ||
|
|
9f08d29bc8 | ||
|
|
249752e8ee | ||
|
|
973866c894 | ||
|
|
b2e6d01e8f | ||
|
|
875ea4b4c6 | ||
|
|
c7a5bb6031 | ||
|
|
28e1ee4891 | ||
|
|
a7eba8b006 | ||
|
|
d11841d760 | ||
|
|
05aa02005b | ||
|
|
f116e10d53 | ||
|
|
bb4f7936f9 | ||
|
|
a03003f5fd | ||
|
|
85a1c6d0b7 | ||
|
|
9930ddc555 | ||
|
|
02c5c13a6e | ||
|
|
fdbeb52756 | ||
|
|
0c8a88b3fa | ||
|
|
08feed3332 | ||
|
|
a758496236 | ||
|
|
103094286e | ||
|
|
fd8fe209cb | ||
|
|
e51bccdb28 | ||
|
|
09a92bb9bf | ||
|
|
a214fe8a2d | ||
|
|
a956b69720 | ||
|
|
d87cfd33e8 | ||
|
|
069c0a041f | ||
|
|
5cd244e9b7 | ||
|
|
be9bc62f8b | ||
|
|
0808949e54 | ||
|
|
129d056085 | ||
|
|
5b3dbf12a5 | ||
|
|
9f545825b7 | ||
|
|
616e728ef9 | ||
|
|
83d2a871eb | ||
|
|
292ae8468e | ||
|
|
b58d492e05 | ||
|
|
df8e35fd81 | ||
|
|
083726ecda | ||
|
|
82f28ca9ef | ||
|
|
82fb56b79c | ||
|
|
99e5eaa9b1 | ||
|
|
d4f790fd40 | ||
|
|
c29fbede59 | ||
|
|
eee0d1d0dd | ||
|
|
ade683c589 | ||
|
|
50b8f4dcc7 | ||
|
|
2b06792c81 | ||
|
|
61e4a06447 | ||
|
|
ead04487fd | ||
|
|
354c42afd2 | ||
|
|
8976483f3a | ||
|
|
4452314aab | ||
|
|
edcb03943e | ||
|
|
89a8121eaa | ||
|
|
d5eb228874 | ||
|
|
463019ac3e | ||
|
|
a3dd4dcadf | ||
|
|
9417961b17 | ||
|
|
d3f10d2f4f | ||
|
|
6ae58da668 | ||
|
|
ddcb4ff5fb | ||
|
|
1baedc4e18 | ||
|
|
46f3850794 | ||
|
|
24a197f96a | ||
|
|
8ddaaf3d41 | ||
|
|
a5e7dcec61 | ||
|
|
c1b1666ec8 | ||
|
|
7fe474d198 | ||
|
|
0689628489 | ||
|
|
ab21af71be | ||
|
|
6f69b19ff5 | ||
|
|
89bec58cbb | ||
|
|
9e906c39ba | ||
|
|
6b0a849f59 | ||
|
|
c447e9a854 | ||
|
|
0dd2c21089 | ||
|
|
589927e9e1 | ||
|
|
bd80cad6db | ||
|
|
8c1a528c71 | ||
|
|
25cbcd9374 | ||
|
|
5d60ced7b3 | ||
|
|
ce78877a87 | ||
|
|
0c4683ebcc | ||
|
|
b11c233304 | ||
|
|
8c986221e4 | ||
|
|
8f2d321dd0 | ||
|
|
019aa04b06 | ||
|
|
77b359edf5 | ||
|
|
7e63270e04 | ||
|
|
a69d1b84f4 | ||
|
|
f2560188ec | ||
|
|
c0d67420e5 | ||
|
|
c194828be0 | ||
|
|
c71afb46d1 | ||
|
|
995ef8a7fc | ||
|
|
de8dfde7f7 | ||
|
|
e842131425 | ||
|
|
6dedd94ba4 | ||
|
|
c5e23293f8 | ||
|
|
90d7c55343 | ||
|
|
3c8e9a9641 | ||
|
|
2673b3a314 | ||
|
|
4c2de2a7f2 | ||
|
|
1c089cadd7 | ||
|
|
2e8733cf54 | ||
|
|
b04e472acf | ||
|
|
090411842e | ||
|
|
84a97d55e1 | ||
|
|
09aa1eac03 | ||
|
|
0f9f213833 | ||
|
|
15f1af8ed6 | ||
|
|
8bebc9206f | ||
|
|
a3c79b1909 | ||
|
|
23928a3311 | ||
|
|
ba5fbaba70 | ||
|
|
3e6cea46e2 | ||
|
|
63601551b1 | ||
|
|
1d55141c50 | ||
|
|
2519580994 | ||
|
|
74a64cfbab | ||
|
|
b9ca5cc5ea | ||
|
|
afba2be3dc | ||
|
|
9abf60acb6 | ||
|
|
b30f449dae | ||
|
|
bfbb97b74c | ||
|
|
1b3942ba74 | ||
|
|
852722ea45 | ||
|
|
358562769a | ||
|
|
3eccd72382 | ||
|
|
76d09b4ed0 | ||
|
|
1aae77f26f | ||
|
|
cf17c58b47 | ||
|
|
a091b4bf4c | ||
|
|
e0162baa3b | ||
|
|
5e9687a196 | ||
|
|
0a04e63811 | ||
|
|
0470198fb5 | ||
|
|
e986afa13a | ||
|
|
4b505060bd | ||
|
|
664ff28cba | ||
|
|
08a8363fc6 | ||
|
|
5e43768f61 | ||
|
|
a8aa1aba1c | ||
|
|
68d8f73698 | ||
|
|
ef0664728e | ||
|
|
eac4ddb4bb | ||
|
|
71d5b7c9bf | ||
|
|
41279a3ae1 | ||
|
|
22858d99b5 | ||
|
|
249d7d06a2 | ||
|
|
9529483c2a | ||
|
|
969e1683de | ||
|
|
c478fc208e | ||
|
|
d0a0d560ad | ||
|
|
93dd499997 | ||
|
|
a69cb95850 | ||
|
|
7810ea5812 | ||
|
|
b0896210c7 | ||
|
|
7124f2ebfa | ||
|
|
17ae2998e7 | ||
|
|
3f601b5809 | ||
|
|
03ea0762a1 | ||
|
|
4f1feaca83 | ||
|
|
89be10f6b4 | ||
|
|
04bc5f3b18 | ||
|
|
feec422bf7 | ||
|
|
5935767056 | ||
|
|
b5a57acf6c | ||
|
|
49f1d8477c | ||
|
|
f11e5442d6 | ||
|
|
72f9150a50 | ||
|
|
c172f972ea | ||
|
|
8189dea0d8 | ||
|
|
d95eeaedbe | ||
|
|
621da3c164 | ||
|
|
0fa69d8988 | ||
|
|
9b24f0b067 | ||
|
|
8d69dacdf3 | ||
|
|
cdfe2c96c5 | ||
|
|
15a5002746 | ||
|
|
f8ed93e7bd | ||
|
|
05cdd22c39 | ||
|
|
eb0134fbb3 | ||
|
|
50b13ab938 | ||
|
|
5919c0f4a2 | ||
|
|
bcdf3be530 | ||
|
|
4806504ebc | ||
|
|
96843f3bd4 |
27
.github/CONTRIBUTING.md
vendored
@@ -33,7 +33,7 @@ best way to get our attention.
|
||||
### 🚩GitHub Issues
|
||||
|
||||
Our [issues](https://github.com/hwchase17/langchain/issues) page is kept up to date
|
||||
with bugs, improvements, and feature requests.
|
||||
with bugs, improvements, and feature requests.
|
||||
|
||||
There is a taxonomy of labels to help with sorting and discovery of issues of interest. Please use these to help
|
||||
organize issues.
|
||||
@@ -44,7 +44,7 @@ If you are adding an issue, please try to keep it focused on a single, modular b
|
||||
If two issues are related, or blocking, please link them rather than combining them.
|
||||
|
||||
We will try to keep these issues as up to date as possible, though
|
||||
with the rapid rate of develop in this field some may get out of date.
|
||||
with the rapid rate of development in this field some may get out of date.
|
||||
If you notice this happening, please let us know.
|
||||
|
||||
### 🙋Getting Help
|
||||
@@ -61,11 +61,11 @@ we do not want these to get in the way of getting good code into the codebase.
|
||||
|
||||
> **Note:** You can run this repository locally (which is described below) or in a [development container](https://containers.dev/) (which is described in the [.devcontainer folder](https://github.com/hwchase17/langchain/tree/master/.devcontainer)).
|
||||
|
||||
This project uses [Poetry](https://python-poetry.org/) as a dependency manager. Check out Poetry's [documentation on how to install it](https://python-poetry.org/docs/#installation) on your system before proceeding.
|
||||
This project uses [Poetry](https://python-poetry.org/) v1.5.1 as a dependency manager. Check out Poetry's [documentation on how to install it](https://python-poetry.org/docs/#installation) on your system before proceeding.
|
||||
|
||||
❗Note: If you use `Conda` or `Pyenv` as your environment / package manager, avoid dependency conflicts by doing the following first:
|
||||
1. *Before installing Poetry*, create and activate a new Conda env (e.g. `conda create -n langchain python=3.9`)
|
||||
2. Install Poetry (see above)
|
||||
2. Install Poetry v1.5.1 (see above)
|
||||
3. Tell Poetry to use the virtualenv python environment (`poetry config virtualenvs.prefer-active-python true`)
|
||||
4. Continue with the following steps.
|
||||
|
||||
@@ -73,21 +73,21 @@ There are two separate projects in this repository:
|
||||
- `langchain`: core langchain code, abstractions, and use cases
|
||||
- `langchain.experimental`: more experimental code
|
||||
|
||||
Each of these has their OWN development environment.
|
||||
Each of these has their OWN development environment.
|
||||
In order to run any of the commands below, please move into their respective directories.
|
||||
For example, to contribute to `langchain` run `cd libs/langchain` before getting started with the below.
|
||||
|
||||
To install requirements:
|
||||
|
||||
```bash
|
||||
poetry install -E all
|
||||
poetry install --with test
|
||||
```
|
||||
|
||||
This will install all requirements for running the package, examples, linting, formatting, tests, and coverage. Note the `-E all` flag will install all optional dependencies necessary for integration testing.
|
||||
This will install all requirements for running the package, examples, linting, formatting, tests, and coverage.
|
||||
|
||||
❗Note: If you're running Poetry 1.4.1 and receive a `WheelFileValidationError` for `debugpy` during installation, you can try either downgrading to Poetry 1.4.0 or disabling "modern installation" (`poetry config installer.modern-installation false`) and re-install requirements. See [this `debugpy` issue](https://github.com/microsoft/debugpy/issues/1246) for more details.
|
||||
❗Note: If during installation you receive a `WheelFileValidationError` for `debugpy`, please make sure you are running Poetry v1.5.1. This bug was present in older versions of Poetry (e.g. 1.4.1) and has been resolved in newer releases. If you are still seeing this bug on v1.5.1, you may also try disabling "modern installation" (`poetry config installer.modern-installation false`) and re-installing requirements. See [this `debugpy` issue](https://github.com/microsoft/debugpy/issues/1246) for more details.
|
||||
|
||||
Now, you should be able to run the common tasks in the following section. To double check, run `make test`, all tests should pass. If they don't you may need to pip install additional dependencies, such as `numexpr` and `openapi_schema_pydantic`.
|
||||
Now assuming `make` and `pytest` are installed, you should be able to run the common tasks in the following section. To double check, run `make test` under `libs/langchain`, all tests should pass. If they don't, you may need to pip install additional dependencies, such as `numexpr` and `openapi_schema_pydantic`.
|
||||
|
||||
## ✅ Common Tasks
|
||||
|
||||
@@ -134,7 +134,7 @@ We recognize linting can be annoying - if you do not want to do it, please conta
|
||||
### Spellcheck
|
||||
|
||||
Spellchecking for this project is done via [codespell](https://github.com/codespell-project/codespell).
|
||||
Note that `codespell` finds common typos, so could have false-positive (correctly spelled but rarely used) and false-negatives (not finding misspelled) words.
|
||||
Note that `codespell` finds common typos, so it could have false-positive (correctly spelled but rarely used) and false-negatives (not finding misspelled) words.
|
||||
|
||||
To check spelling for this project:
|
||||
|
||||
@@ -175,9 +175,9 @@ If you're adding a new dependency to Langchain, assume that it will be an option
|
||||
that most users won't have it installed.
|
||||
|
||||
Users that do not have the dependency installed should be able to **import** your code without
|
||||
any side effects (no warnings, no errors, no exceptions).
|
||||
any side effects (no warnings, no errors, no exceptions).
|
||||
|
||||
To introduce the dependency to the pyproject.toml file correctly, please do the following:
|
||||
To introduce the dependency to the pyproject.toml file correctly, please do the following:
|
||||
|
||||
1. Add the dependency to the main group as an optional dependency
|
||||
```bash
|
||||
@@ -220,7 +220,7 @@ If you add new logic, please add a unit test.
|
||||
|
||||
Integration tests cover logic that requires making calls to outside APIs (often integration with other services).
|
||||
|
||||
**warning** Almost no tests should be integration tests.
|
||||
**warning** Almost no tests should be integration tests.
|
||||
|
||||
Tests that require making network connections make it difficult for other
|
||||
developers to test the code.
|
||||
@@ -307,4 +307,3 @@ even patch releases may contain [non-backwards-compatible changes](https://semve
|
||||
|
||||
If your contribution has made its way into a release, we will want to give you credit on Twitter (only if you want though)!
|
||||
If you have a Twitter account you would like us to mention, please let us know in the PR or in another manner.
|
||||
|
||||
|
||||
2
.github/ISSUE_TEMPLATE/bug-report.yml
vendored
@@ -1,5 +1,5 @@
|
||||
name: "\U0001F41B Bug Report"
|
||||
description: Submit a bug report to help us improve LangChain
|
||||
description: Submit a bug report to help us improve LangChain. To report a security issue, please instead use the security option below.
|
||||
labels: ["02 Bug Report"]
|
||||
body:
|
||||
- type: markdown
|
||||
|
||||
66
.github/actions/poetry_setup/action.yml
vendored
@@ -15,19 +15,13 @@ inputs:
|
||||
description: Poetry version
|
||||
required: true
|
||||
|
||||
install-command:
|
||||
description: Command run for installing dependencies
|
||||
required: false
|
||||
default: poetry install
|
||||
|
||||
cache-key:
|
||||
description: Cache key to use for manual handling of caching
|
||||
required: true
|
||||
|
||||
working-directory:
|
||||
description: Directory to run install-command in
|
||||
required: false
|
||||
default: ""
|
||||
description: Directory whose poetry.lock file should be cached
|
||||
required: true
|
||||
|
||||
runs:
|
||||
using: composite
|
||||
@@ -38,41 +32,35 @@ runs:
|
||||
python-version: ${{ inputs.python-version }}
|
||||
|
||||
- uses: actions/cache@v3
|
||||
id: cache-pip
|
||||
name: Cache Pip ${{ inputs.python-version }}
|
||||
id: cache-bin-poetry
|
||||
name: Cache Poetry binary - Python ${{ inputs.python-version }}
|
||||
env:
|
||||
SEGMENT_DOWNLOAD_TIMEOUT_MIN: "15"
|
||||
SEGMENT_DOWNLOAD_TIMEOUT_MIN: "1"
|
||||
with:
|
||||
path: |
|
||||
/opt/pipx/venvs/poetry
|
||||
/opt/pipx_bin/poetry
|
||||
# This step caches the poetry installation, so make sure it's keyed on the poetry version as well.
|
||||
key: bin-poetry-${{ runner.os }}-${{ runner.arch }}-py-${{ inputs.python-version }}-${{ inputs.poetry-version }}
|
||||
|
||||
- name: Install poetry
|
||||
if: steps.cache-bin-poetry.outputs.cache-hit != 'true'
|
||||
shell: bash
|
||||
env:
|
||||
POETRY_VERSION: ${{ inputs.poetry-version }}
|
||||
PYTHON_VERSION: ${{ inputs.python-version }}
|
||||
run: pipx install "poetry==$POETRY_VERSION" --python "python$PYTHON_VERSION" --verbose
|
||||
|
||||
- name: Restore pip and poetry cached dependencies
|
||||
uses: actions/cache@v3
|
||||
env:
|
||||
SEGMENT_DOWNLOAD_TIMEOUT_MIN: "4"
|
||||
WORKDIR: ${{ inputs.working-directory == '' && '.' || inputs.working-directory }}
|
||||
with:
|
||||
path: |
|
||||
~/.cache/pip
|
||||
key: pip-${{ runner.os }}-${{ runner.arch }}-py-${{ inputs.python-version }}
|
||||
|
||||
- run: pipx install poetry==${{ inputs.poetry-version }} --python python${{ inputs.python-version }}
|
||||
shell: bash
|
||||
|
||||
- name: Check Poetry File
|
||||
shell: bash
|
||||
working-directory: ${{ inputs.working-directory }}
|
||||
run: |
|
||||
poetry check
|
||||
|
||||
- name: Check lock file
|
||||
shell: bash
|
||||
working-directory: ${{ inputs.working-directory }}
|
||||
run: |
|
||||
poetry lock --check
|
||||
|
||||
- uses: actions/cache@v3
|
||||
id: cache-poetry
|
||||
env:
|
||||
SEGMENT_DOWNLOAD_TIMEOUT_MIN: "15"
|
||||
with:
|
||||
path: |
|
||||
~/.cache/pypoetry/virtualenvs
|
||||
~/.cache/pypoetry/cache
|
||||
~/.cache/pypoetry/artifacts
|
||||
key: poetry-${{ runner.os }}-${{ runner.arch }}-py-${{ inputs.python-version }}-poetry-${{ inputs.poetry-version }}-${{ inputs.cache-key }}-${{ hashFiles('poetry.lock') }}
|
||||
|
||||
- run: ${{ inputs.install-command }}
|
||||
working-directory: ${{ inputs.working-directory }}
|
||||
shell: bash
|
||||
${{ env.WORKDIR }}/.venv
|
||||
key: py-deps-${{ runner.os }}-${{ runner.arch }}-py-${{ inputs.python-version }}-poetry-${{ inputs.poetry-version }}-${{ inputs.cache-key }}-${{ hashFiles(format('{0}/**/poetry.lock', env.WORKDIR)) }}
|
||||
|
||||
606
.github/tools/git-restore-mtime
vendored
Executable file
@@ -0,0 +1,606 @@
|
||||
#!/usr/bin/env python3
|
||||
#
|
||||
# git-restore-mtime - Change mtime of files based on commit date of last change
|
||||
#
|
||||
# Copyright (C) 2012 Rodrigo Silva (MestreLion) <linux@rodrigosilva.com>
|
||||
#
|
||||
# This program is free software: you can redistribute it and/or modify
|
||||
# it under the terms of the GNU General Public License as published by
|
||||
# the Free Software Foundation, either version 3 of the License, or
|
||||
# (at your option) any later version.
|
||||
#
|
||||
# This program is distributed in the hope that it will be useful,
|
||||
# but WITHOUT ANY WARRANTY; without even the implied warranty of
|
||||
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
|
||||
# GNU General Public License for more details.
|
||||
#
|
||||
# You should have received a copy of the GNU General Public License
|
||||
# along with this program. See <http://www.gnu.org/licenses/gpl.html>
|
||||
#
|
||||
# Source: https://github.com/MestreLion/git-tools
|
||||
# Version: July 13, 2023 (commit hash 5f832e72453e035fccae9d63a5056918d64476a2)
|
||||
"""
|
||||
Change the modification time (mtime) of files in work tree, based on the
|
||||
date of the most recent commit that modified the file, including renames.
|
||||
|
||||
Ignores untracked files and uncommitted deletions, additions and renames, and
|
||||
by default modifications too.
|
||||
---
|
||||
Useful prior to generating release tarballs, so each file is archived with a
|
||||
date that is similar to the date when the file was actually last modified,
|
||||
assuming the actual modification date and its commit date are close.
|
||||
"""
|
||||
|
||||
# TODO:
|
||||
# - Add -z on git whatchanged/ls-files, so we don't deal with filename decoding
|
||||
# - When Python is bumped to 3.7, use text instead of universal_newlines on subprocess
|
||||
# - Update "Statistics for some large projects" with modern hardware and repositories.
|
||||
# - Create a README.md for git-restore-mtime alone. It deserves extensive documentation
|
||||
# - Move Statistics there
|
||||
# - See git-extras as a good example on project structure and documentation
|
||||
|
||||
# FIXME:
|
||||
# - When current dir is outside the worktree, e.g. using --work-tree, `git ls-files`
|
||||
# assume any relative pathspecs are to worktree root, not the current dir. As such,
|
||||
# relative pathspecs may not work.
|
||||
# - Renames are tricky:
|
||||
# - R100 should not change mtime, but original name is not on filelist. Should
|
||||
# track renames until a valid (A, M) mtime found and then set on current name.
|
||||
# - Should set mtime for both current and original directories.
|
||||
# - Check mode changes with unchanged blobs?
|
||||
# - Check file (A, D) for the directory mtime is not sufficient:
|
||||
# - Renames also change dir mtime, unless rename was on a parent dir
|
||||
# - If most recent change of all files in a dir was a Modification (M),
|
||||
# dir might not be touched at all.
|
||||
# - Dirs containing only subdirectories but no direct files will also
|
||||
# not be touched. They're files' [grand]parent dir, but never their dirname().
|
||||
# - Some solutions:
|
||||
# - After files done, perform some dir processing for missing dirs, finding latest
|
||||
# file (A, D, R)
|
||||
# - Simple approach: dir mtime is the most recent child (dir or file) mtime
|
||||
# - Use a virtual concept of "created at most at" to fill missing info, bubble up
|
||||
# to parents and grandparents
|
||||
# - When handling [grand]parent dirs, stay inside <pathspec>
|
||||
# - Better handling of merge commits. `-m` is plain *wrong*. `-c/--cc` is perfect, but
|
||||
# painfully slow. First pass without merge commits is not accurate. Maybe add a new
|
||||
# `--accurate` mode for `--cc`?
|
||||
|
||||
if __name__ != "__main__":
|
||||
raise ImportError("{} should not be used as a module.".format(__name__))
|
||||
|
||||
import argparse
|
||||
import datetime
|
||||
import logging
|
||||
import os.path
|
||||
import shlex
|
||||
import signal
|
||||
import subprocess
|
||||
import sys
|
||||
import time
|
||||
|
||||
__version__ = "2022.12+dev"
|
||||
|
||||
# Update symlinks only if the platform supports not following them
|
||||
UPDATE_SYMLINKS = bool(os.utime in getattr(os, 'supports_follow_symlinks', []))
|
||||
|
||||
# Call os.path.normpath() only if not in a POSIX platform (Windows)
|
||||
NORMALIZE_PATHS = (os.path.sep != '/')
|
||||
|
||||
# How many files to process in each batch when re-trying merge commits
|
||||
STEPMISSING = 100
|
||||
|
||||
# (Extra) keywords for the os.utime() call performed by touch()
|
||||
UTIME_KWS = {} if not UPDATE_SYMLINKS else {'follow_symlinks': False}
|
||||
|
||||
|
||||
# Command-line interface ######################################################
|
||||
|
||||
def parse_args():
|
||||
parser = argparse.ArgumentParser(
|
||||
description=__doc__.split('\n---')[0])
|
||||
|
||||
group = parser.add_mutually_exclusive_group()
|
||||
group.add_argument('--quiet', '-q', dest='loglevel',
|
||||
action="store_const", const=logging.WARNING, default=logging.INFO,
|
||||
help="Suppress informative messages and summary statistics.")
|
||||
group.add_argument('--verbose', '-v', action="count", help="""
|
||||
Print additional information for each processed file.
|
||||
Specify twice to further increase verbosity.
|
||||
""")
|
||||
|
||||
parser.add_argument('--cwd', '-C', metavar="DIRECTORY", help="""
|
||||
Run as if %(prog)s was started in directory %(metavar)s.
|
||||
This affects how --work-tree, --git-dir and PATHSPEC arguments are handled.
|
||||
See 'man 1 git' or 'git --help' for more information.
|
||||
""")
|
||||
|
||||
parser.add_argument('--git-dir', dest='gitdir', metavar="GITDIR", help="""
|
||||
Path to the git repository, by default auto-discovered by searching
|
||||
the current directory and its parents for a .git/ subdirectory.
|
||||
""")
|
||||
|
||||
parser.add_argument('--work-tree', dest='workdir', metavar="WORKTREE", help="""
|
||||
Path to the work tree root, by default the parent of GITDIR if it's
|
||||
automatically discovered, or the current directory if GITDIR is set.
|
||||
""")
|
||||
|
||||
parser.add_argument('--force', '-f', default=False, action="store_true", help="""
|
||||
Force updating files with uncommitted modifications.
|
||||
Untracked files and uncommitted deletions, renames and additions are
|
||||
always ignored.
|
||||
""")
|
||||
|
||||
parser.add_argument('--merge', '-m', default=False, action="store_true", help="""
|
||||
Include merge commits.
|
||||
Leads to more recent times and more files per commit, thus with the same
|
||||
time, which may or may not be what you want.
|
||||
Including merge commits may lead to fewer commits being evaluated as files
|
||||
are found sooner, which can improve performance, sometimes substantially.
|
||||
But as merge commits are usually huge, processing them may also take longer.
|
||||
By default, merge commits are only used for files missing from regular commits.
|
||||
""")
|
||||
|
||||
parser.add_argument('--first-parent', default=False, action="store_true", help="""
|
||||
Consider only the first parent, the "main branch", when evaluating merge commits.
|
||||
Only effective when merge commits are processed, either when --merge is
|
||||
used or when finding missing files after the first regular log search.
|
||||
See --skip-missing.
|
||||
""")
|
||||
|
||||
parser.add_argument('--skip-missing', '-s', dest="missing", default=True,
|
||||
action="store_false", help="""
|
||||
Do not try to find missing files.
|
||||
If merge commits were not evaluated with --merge and some files were
|
||||
not found in regular commits, by default %(prog)s searches for these
|
||||
files again in the merge commits.
|
||||
This option disables this retry, so files found only in merge commits
|
||||
will not have their timestamp updated.
|
||||
""")
|
||||
|
||||
parser.add_argument('--no-directories', '-D', dest='dirs', default=True,
|
||||
action="store_false", help="""
|
||||
Do not update directory timestamps.
|
||||
By default, use the time of its most recently created, renamed or deleted file.
|
||||
Note that just modifying a file will NOT update its directory time.
|
||||
""")
|
||||
|
||||
parser.add_argument('--test', '-t', default=False, action="store_true",
|
||||
help="Test run: do not actually update any file timestamp.")
|
||||
|
||||
parser.add_argument('--commit-time', '-c', dest='commit_time', default=False,
|
||||
action='store_true', help="Use commit time instead of author time.")
|
||||
|
||||
parser.add_argument('--oldest-time', '-o', dest='reverse_order', default=False,
|
||||
action='store_true', help="""
|
||||
Update times based on the oldest, instead of the most recent commit of a file.
|
||||
This reverses the order in which the git log is processed to emulate a
|
||||
file "creation" date. Note this will be inaccurate for files deleted and
|
||||
re-created at later dates.
|
||||
""")
|
||||
|
||||
parser.add_argument('--skip-older-than', metavar='SECONDS', type=int, help="""
|
||||
Ignore files that are currently older than %(metavar)s.
|
||||
Useful in workflows that assume such files already have a correct timestamp,
|
||||
as it may improve performance by processing fewer files.
|
||||
""")
|
||||
|
||||
parser.add_argument('--skip-older-than-commit', '-N', default=False,
|
||||
action='store_true', help="""
|
||||
Ignore files older than the timestamp it would be updated to.
|
||||
Such files may be considered "original", likely in the author's repository.
|
||||
""")
|
||||
|
||||
parser.add_argument('--unique-times', default=False, action="store_true", help="""
|
||||
Set the microseconds to a unique value per commit.
|
||||
Allows telling apart changes that would otherwise have identical timestamps,
|
||||
as git's time accuracy is in seconds.
|
||||
""")
|
||||
|
||||
parser.add_argument('pathspec', nargs='*', metavar='PATHSPEC', help="""
|
||||
Only modify paths matching %(metavar)s, relative to current directory.
|
||||
By default, update all but untracked files and submodules.
|
||||
""")
|
||||
|
||||
parser.add_argument('--version', '-V', action='version',
|
||||
version='%(prog)s version {version}'.format(version=get_version()))
|
||||
|
||||
args_ = parser.parse_args()
|
||||
if args_.verbose:
|
||||
args_.loglevel = max(logging.TRACE, logging.DEBUG // args_.verbose)
|
||||
args_.debug = args_.loglevel <= logging.DEBUG
|
||||
return args_
|
||||
|
||||
|
||||
def get_version(version=__version__):
|
||||
if not version.endswith('+dev'):
|
||||
return version
|
||||
try:
|
||||
cwd = os.path.dirname(os.path.realpath(__file__))
|
||||
return Git(cwd=cwd, errors=False).describe().lstrip('v')
|
||||
except Git.Error:
|
||||
return '-'.join((version, "unknown"))
|
||||
|
||||
|
||||
# Helper functions ############################################################
|
||||
|
||||
def setup_logging():
|
||||
"""Add TRACE logging level and corresponding method, return the root logger"""
|
||||
logging.TRACE = TRACE = logging.DEBUG // 2
|
||||
logging.Logger.trace = lambda _, m, *a, **k: _.log(TRACE, m, *a, **k)
|
||||
return logging.getLogger()
|
||||
|
||||
|
||||
def normalize(path):
|
||||
r"""Normalize paths from git, handling non-ASCII characters.
|
||||
|
||||
Git stores paths as UTF-8 normalization form C.
|
||||
If path contains non-ASCII or non-printable characters, git outputs the UTF-8
|
||||
in octal-escaped notation, escaping double-quotes and backslashes, and then
|
||||
double-quoting the whole path.
|
||||
https://git-scm.com/docs/git-config#Documentation/git-config.txt-corequotePath
|
||||
|
||||
This function reverts this encoding, so:
|
||||
normalize(r'"Back\\slash_double\"quote_a\303\247a\303\255"') =>
|
||||
r'Back\slash_double"quote_açaí')
|
||||
|
||||
Paths with invalid UTF-8 encoding, such as single 0x80-0xFF bytes (e.g, from
|
||||
Latin1/Windows-1251 encoding) are decoded using surrogate escape, the same
|
||||
method used by Python for filesystem paths. So 0xE6 ("æ" in Latin1, r'\\346'
|
||||
from Git) is decoded as "\udce6". See https://peps.python.org/pep-0383/ and
|
||||
https://vstinner.github.io/painful-history-python-filesystem-encoding.html
|
||||
|
||||
Also see notes on `windows/non-ascii-paths.txt` about path encodings on
|
||||
non-UTF-8 platforms and filesystems.
|
||||
"""
|
||||
if path and path[0] == '"':
|
||||
# Python 2: path = path[1:-1].decode("string-escape")
|
||||
# Python 3: https://stackoverflow.com/a/46650050/624066
|
||||
path = (path[1:-1] # Remove enclosing double quotes
|
||||
.encode('latin1') # Convert to bytes, required by 'unicode-escape'
|
||||
.decode('unicode-escape') # Perform the actual octal-escaping decode
|
||||
.encode('latin1') # 1:1 mapping to bytes, UTF-8 encoded
|
||||
.decode('utf8', 'surrogateescape')) # Decode from UTF-8
|
||||
if NORMALIZE_PATHS:
|
||||
# Make sure the slash matches the OS; for Windows we need a backslash
|
||||
path = os.path.normpath(path)
|
||||
return path
|
||||
|
||||
|
||||
def dummy(*_args, **_kwargs):
|
||||
"""No-op function used in dry-run tests"""
|
||||
|
||||
|
||||
def touch(path, mtime):
|
||||
"""The actual mtime update"""
|
||||
os.utime(path, (mtime, mtime), **UTIME_KWS)
|
||||
|
||||
|
||||
def touch_ns(path, mtime_ns):
|
||||
"""The actual mtime update, using nanoseconds for unique timestamps"""
|
||||
os.utime(path, None, ns=(mtime_ns, mtime_ns), **UTIME_KWS)
|
||||
|
||||
|
||||
def isodate(secs: int):
|
||||
# time.localtime() accepts floats, but discards fractional part
|
||||
return time.strftime('%Y-%m-%d %H:%M:%S', time.localtime(secs))
|
||||
|
||||
|
||||
def isodate_ns(ns: int):
|
||||
# for integers fromtimestamp() is equivalent and ~16% slower than isodate()
|
||||
return datetime.datetime.fromtimestamp(ns / 1000000000).isoformat(sep=' ')
|
||||
|
||||
|
||||
def get_mtime_ns(secs: int, idx: int):
|
||||
# Time resolution for filesystems and functions:
|
||||
# ext-4 and other POSIX filesystems: 1 nanosecond
|
||||
# NTFS (Windows default): 100 nanoseconds
|
||||
# datetime.datetime() (due to 64-bit float epoch): 1 microsecond
|
||||
us = idx % 1000000 # 10**6
|
||||
return 1000 * (1000000 * secs + us)
|
||||
|
||||
|
||||
def get_mtime_path(path):
|
||||
return os.path.getmtime(path)
|
||||
|
||||
|
||||
# Git class and parse_log(), the heart of the script ##########################
|
||||
|
||||
class Git:
|
||||
def __init__(self, workdir=None, gitdir=None, cwd=None, errors=True):
|
||||
self.gitcmd = ['git']
|
||||
self.errors = errors
|
||||
self._proc = None
|
||||
if workdir: self.gitcmd.extend(('--work-tree', workdir))
|
||||
if gitdir: self.gitcmd.extend(('--git-dir', gitdir))
|
||||
if cwd: self.gitcmd.extend(('-C', cwd))
|
||||
self.workdir, self.gitdir = self._get_repo_dirs()
|
||||
|
||||
def ls_files(self, paths: list = None):
|
||||
return (normalize(_) for _ in self._run('ls-files --full-name', paths))
|
||||
|
||||
def ls_dirty(self, force=False):
|
||||
return (normalize(_[3:].split(' -> ', 1)[-1])
|
||||
for _ in self._run('status --porcelain')
|
||||
if _[:2] != '??' and (not force or (_[0] in ('R', 'A')
|
||||
or _[1] == 'D')))
|
||||
|
||||
def log(self, merge=False, first_parent=False, commit_time=False,
|
||||
reverse_order=False, paths: list = None):
|
||||
cmd = 'whatchanged --pretty={}'.format('%ct' if commit_time else '%at')
|
||||
if merge: cmd += ' -m'
|
||||
if first_parent: cmd += ' --first-parent'
|
||||
if reverse_order: cmd += ' --reverse'
|
||||
return self._run(cmd, paths)
|
||||
|
||||
def describe(self):
|
||||
return self._run('describe --tags', check=True)[0]
|
||||
|
||||
def terminate(self):
|
||||
if self._proc is None:
|
||||
return
|
||||
try:
|
||||
self._proc.terminate()
|
||||
except OSError:
|
||||
# Avoid errors on OpenBSD
|
||||
pass
|
||||
|
||||
def _get_repo_dirs(self):
|
||||
return (os.path.normpath(_) for _ in
|
||||
self._run('rev-parse --show-toplevel --absolute-git-dir', check=True))
|
||||
|
||||
def _run(self, cmdstr: str, paths: list = None, output=True, check=False):
|
||||
cmdlist = self.gitcmd + shlex.split(cmdstr)
|
||||
if paths:
|
||||
cmdlist.append('--')
|
||||
cmdlist.extend(paths)
|
||||
popen_args = dict(universal_newlines=True, encoding='utf8')
|
||||
if not self.errors:
|
||||
popen_args['stderr'] = subprocess.DEVNULL
|
||||
log.trace("Executing: %s", ' '.join(cmdlist))
|
||||
if not output:
|
||||
return subprocess.call(cmdlist, **popen_args)
|
||||
if check:
|
||||
try:
|
||||
stdout: str = subprocess.check_output(cmdlist, **popen_args)
|
||||
return stdout.splitlines()
|
||||
except subprocess.CalledProcessError as e:
|
||||
raise self.Error(e.returncode, e.cmd, e.output, e.stderr)
|
||||
self._proc = subprocess.Popen(cmdlist, stdout=subprocess.PIPE, **popen_args)
|
||||
return (_.rstrip() for _ in self._proc.stdout)
|
||||
|
||||
def __del__(self):
|
||||
self.terminate()
|
||||
|
||||
class Error(subprocess.CalledProcessError):
|
||||
"""Error from git executable"""
|
||||
|
||||
|
||||
def parse_log(filelist, dirlist, stats, git, merge=False, filterlist=None):
|
||||
mtime = 0
|
||||
datestr = isodate(0)
|
||||
for line in git.log(
|
||||
merge,
|
||||
args.first_parent,
|
||||
args.commit_time,
|
||||
args.reverse_order,
|
||||
filterlist
|
||||
):
|
||||
stats['loglines'] += 1
|
||||
|
||||
# Blank line between Date and list of files
|
||||
if not line:
|
||||
continue
|
||||
|
||||
# Date line
|
||||
if line[0] != ':': # Faster than `not line.startswith(':')`
|
||||
stats['commits'] += 1
|
||||
mtime = int(line)
|
||||
if args.unique_times:
|
||||
mtime = get_mtime_ns(mtime, stats['commits'])
|
||||
if args.debug:
|
||||
datestr = isodate(mtime)
|
||||
continue
|
||||
|
||||
# File line: three tokens if it describes a renaming, otherwise two
|
||||
tokens = line.split('\t')
|
||||
|
||||
# Possible statuses:
|
||||
# M: Modified (content changed)
|
||||
# A: Added (created)
|
||||
# D: Deleted
|
||||
# T: Type changed: to/from regular file, symlinks, submodules
|
||||
# R099: Renamed (moved), with % of unchanged content. 100 = pure rename
|
||||
# Not possible in log: C=Copied, U=Unmerged, X=Unknown, B=pairing Broken
|
||||
status = tokens[0].split(' ')[-1]
|
||||
file = tokens[-1]
|
||||
|
||||
# Handles non-ASCII chars and OS path separator
|
||||
file = normalize(file)
|
||||
|
||||
def do_file():
|
||||
if args.skip_older_than_commit and get_mtime_path(file) <= mtime:
|
||||
stats['skip'] += 1
|
||||
return
|
||||
if args.debug:
|
||||
log.debug("%d\t%d\t%d\t%s\t%s",
|
||||
stats['loglines'], stats['commits'], stats['files'],
|
||||
datestr, file)
|
||||
try:
|
||||
touch(os.path.join(git.workdir, file), mtime)
|
||||
stats['touches'] += 1
|
||||
except Exception as e:
|
||||
log.error("ERROR: %s: %s", e, file)
|
||||
stats['errors'] += 1
|
||||
|
||||
def do_dir():
|
||||
if args.debug:
|
||||
log.debug("%d\t%d\t-\t%s\t%s",
|
||||
stats['loglines'], stats['commits'],
|
||||
datestr, "{}/".format(dirname or '.'))
|
||||
try:
|
||||
touch(os.path.join(git.workdir, dirname), mtime)
|
||||
stats['dirtouches'] += 1
|
||||
except Exception as e:
|
||||
log.error("ERROR: %s: %s", e, dirname)
|
||||
stats['direrrors'] += 1
|
||||
|
||||
if file in filelist:
|
||||
stats['files'] -= 1
|
||||
filelist.remove(file)
|
||||
do_file()
|
||||
|
||||
if args.dirs and status in ('A', 'D'):
|
||||
dirname = os.path.dirname(file)
|
||||
if dirname in dirlist:
|
||||
dirlist.remove(dirname)
|
||||
do_dir()
|
||||
|
||||
# All files done?
|
||||
if not stats['files']:
|
||||
git.terminate()
|
||||
return
|
||||
|
||||
|
||||
# Main Logic ##################################################################
|
||||
|
||||
def main():
|
||||
start = time.time() # yes, Wall time. CPU time is not realistic for users.
|
||||
stats = {_: 0 for _ in ('loglines', 'commits', 'touches', 'skip', 'errors',
|
||||
'dirtouches', 'direrrors')}
|
||||
|
||||
logging.basicConfig(level=args.loglevel, format='%(message)s')
|
||||
log.trace("Arguments: %s", args)
|
||||
|
||||
# First things first: Where and Who are we?
|
||||
if args.cwd:
|
||||
log.debug("Changing directory: %s", args.cwd)
|
||||
try:
|
||||
os.chdir(args.cwd)
|
||||
except OSError as e:
|
||||
log.critical(e)
|
||||
return e.errno
|
||||
# Using both os.chdir() and `git -C` is redundant, but might prevent side effects
|
||||
# `git -C` alone could be enough if we make sure that:
|
||||
# - all paths, including args.pathspec, are processed by git: ls-files, rev-parse
|
||||
# - touch() / os.utime() path argument is always prepended with git.workdir
|
||||
try:
|
||||
git = Git(workdir=args.workdir, gitdir=args.gitdir, cwd=args.cwd)
|
||||
except Git.Error as e:
|
||||
# Not in a git repository, and git already informed user on stderr. So we just...
|
||||
return e.returncode
|
||||
|
||||
# Get the files managed by git and build file list to be processed
|
||||
if UPDATE_SYMLINKS and not args.skip_older_than:
|
||||
filelist = set(git.ls_files(args.pathspec))
|
||||
else:
|
||||
filelist = set()
|
||||
for path in git.ls_files(args.pathspec):
|
||||
fullpath = os.path.join(git.workdir, path)
|
||||
|
||||
# Symlink (to file, to dir or broken - git handles the same way)
|
||||
if not UPDATE_SYMLINKS and os.path.islink(fullpath):
|
||||
log.warning("WARNING: Skipping symlink, no OS support for updates: %s",
|
||||
path)
|
||||
continue
|
||||
|
||||
# skip files which are older than given threshold
|
||||
if (args.skip_older_than
|
||||
and start - get_mtime_path(fullpath) > args.skip_older_than):
|
||||
continue
|
||||
|
||||
# Always add files relative to worktree root
|
||||
filelist.add(path)
|
||||
|
||||
# If --force, silently ignore uncommitted deletions (not in the filesystem)
|
||||
# and renames / additions (will not be found in log anyway)
|
||||
if args.force:
|
||||
filelist -= set(git.ls_dirty(force=True))
|
||||
# Otherwise, ignore any dirty files
|
||||
else:
|
||||
dirty = set(git.ls_dirty())
|
||||
if dirty:
|
||||
log.warning("WARNING: Modified files in the working directory were ignored."
|
||||
"\nTo include such files, commit your changes or use --force.")
|
||||
filelist -= dirty
|
||||
|
||||
# Build dir list to be processed
|
||||
dirlist = set(os.path.dirname(_) for _ in filelist) if args.dirs else set()
|
||||
|
||||
stats['totalfiles'] = stats['files'] = len(filelist)
|
||||
log.info("{0:,} files to be processed in work dir".format(stats['totalfiles']))
|
||||
|
||||
if not filelist:
|
||||
# Nothing to do. Exit silently and without errors, just like git does
|
||||
return
|
||||
|
||||
# Process the log until all files are 'touched'
|
||||
log.debug("Line #\tLog #\tF.Left\tModification Time\tFile Name")
|
||||
parse_log(filelist, dirlist, stats, git, args.merge, args.pathspec)
|
||||
|
||||
# Missing files
|
||||
if filelist:
|
||||
# Try to find them in merge logs, if not done already
|
||||
# (usually HUGE, thus MUCH slower!)
|
||||
if args.missing and not args.merge:
|
||||
filterlist = list(filelist)
|
||||
missing = len(filterlist)
|
||||
log.info("{0:,} files not found in log, trying merge commits".format(missing))
|
||||
for i in range(0, missing, STEPMISSING):
|
||||
parse_log(filelist, dirlist, stats, git,
|
||||
merge=True, filterlist=filterlist[i:i + STEPMISSING])
|
||||
|
||||
# Still missing some?
|
||||
for file in filelist:
|
||||
log.warning("WARNING: not found in the log: %s", file)
|
||||
|
||||
# Final statistics
|
||||
# Suggestion: use git-log --before=mtime to brag about skipped log entries
|
||||
def log_info(msg, *a, width=13):
|
||||
ifmt = '{:%d,}' % (width,) # not using 'n' for consistency with ffmt
|
||||
ffmt = '{:%d,.2f}' % (width,)
|
||||
# %-formatting lacks a thousand separator, must pre-render with .format()
|
||||
log.info(msg.replace('%d', ifmt).replace('%f', ffmt).format(*a))
|
||||
|
||||
log_info(
|
||||
"Statistics:\n"
|
||||
"%f seconds\n"
|
||||
"%d log lines processed\n"
|
||||
"%d commits evaluated",
|
||||
time.time() - start, stats['loglines'], stats['commits'])
|
||||
|
||||
if args.dirs:
|
||||
if stats['direrrors']: log_info("%d directory update errors", stats['direrrors'])
|
||||
log_info("%d directories updated", stats['dirtouches'])
|
||||
|
||||
if stats['touches'] != stats['totalfiles']:
|
||||
log_info("%d files", stats['totalfiles'])
|
||||
if stats['skip']: log_info("%d files skipped", stats['skip'])
|
||||
if stats['files']: log_info("%d files missing", stats['files'])
|
||||
if stats['errors']: log_info("%d file update errors", stats['errors'])
|
||||
|
||||
log_info("%d files updated", stats['touches'])
|
||||
|
||||
if args.test:
|
||||
log.info("TEST RUN - No files modified!")
|
||||
|
||||
|
||||
# Keep only essential, global assignments here. Any other logic must be in main()
|
||||
log = setup_logging()
|
||||
args = parse_args()
|
||||
|
||||
# Set the actual touch() and other functions based on command-line arguments
|
||||
if args.unique_times:
|
||||
touch = touch_ns
|
||||
isodate = isodate_ns
|
||||
|
||||
# Make sure this is always set last to ensure --test behaves as intended
|
||||
if args.test:
|
||||
touch = dummy
|
||||
|
||||
# UI done, it's showtime!
|
||||
try:
|
||||
sys.exit(main())
|
||||
except KeyboardInterrupt:
|
||||
log.info("\nAborting")
|
||||
signal.signal(signal.SIGINT, signal.SIG_DFL)
|
||||
os.kill(os.getpid(), signal.SIGINT)
|
||||
120
.github/workflows/_lint.yml
vendored
@@ -9,38 +9,134 @@ on:
|
||||
description: "From which folder this pipeline executes"
|
||||
|
||||
env:
|
||||
POETRY_VERSION: "1.4.2"
|
||||
POETRY_VERSION: "1.5.1"
|
||||
WORKDIR: ${{ inputs.working-directory == '' && '.' || inputs.working-directory }}
|
||||
|
||||
jobs:
|
||||
build:
|
||||
defaults:
|
||||
run:
|
||||
working-directory: ${{ inputs.working-directory }}
|
||||
runs-on: ubuntu-latest
|
||||
env:
|
||||
# This number is set "by eye": we want it to be big enough
|
||||
# so that it's bigger than the number of commits in any reasonable PR,
|
||||
# and also as small as possible since increasing the number makes
|
||||
# the initial `git fetch` slower.
|
||||
FETCH_DEPTH: 50
|
||||
strategy:
|
||||
matrix:
|
||||
# Only lint on the min and max supported Python versions.
|
||||
# It's extremely unlikely that there's a lint issue on any version in between
|
||||
# that doesn't show up on the min or max versions.
|
||||
#
|
||||
# GitHub rate-limits how many jobs can be running at any one time.
|
||||
# Starting new jobs is also relatively slow,
|
||||
# so linting on fewer versions makes CI faster.
|
||||
python-version:
|
||||
- "3.8"
|
||||
- "3.9"
|
||||
- "3.10"
|
||||
- "3.11"
|
||||
steps:
|
||||
- uses: actions/checkout@v3
|
||||
- name: Install poetry
|
||||
with:
|
||||
# Fetch the last FETCH_DEPTH commits, so the mtime-changing script
|
||||
# can accurately set the mtimes of files modified in the last FETCH_DEPTH commits.
|
||||
fetch-depth: ${{ env.FETCH_DEPTH }}
|
||||
- name: Restore workdir file mtimes to last-edited commit date
|
||||
id: restore-mtimes
|
||||
# This is needed to make black caching work.
|
||||
# Black's cache uses file (mtime, size) to check whether a lookup is a cache hit.
|
||||
# Without this command, files in the repo would have the current time as the modified time,
|
||||
# since the previous action step just created them.
|
||||
# This command resets the mtime to the last time the files were modified in git instead,
|
||||
# which is a high-quality and stable representation of the last modification date.
|
||||
run: |
|
||||
pipx install poetry==$POETRY_VERSION
|
||||
- name: Set up Python ${{ matrix.python-version }}
|
||||
uses: actions/setup-python@v4
|
||||
# Important considerations:
|
||||
# - These commands run at base of the repo, since we never `cd` to the `WORKDIR`.
|
||||
# - We only want to alter mtimes for Python files, since that's all black checks.
|
||||
# - We don't need to alter mtimes for directories, since black doesn't look at those.
|
||||
# - We also only alter mtimes inside the `WORKDIR` since that's all we'll lint.
|
||||
# - This should run before `poetry install`, because poetry's venv also contains
|
||||
# Python files, and we don't want to alter their mtimes since they aren't linted.
|
||||
|
||||
# Ensure we fail on non-zero exits and on undefined variables.
|
||||
# Also print executed commands, for easier debugging.
|
||||
set -eux
|
||||
|
||||
# Restore the mtimes of Python files in the workdir based on git history.
|
||||
.github/tools/git-restore-mtime --no-directories "$WORKDIR/**/*.py"
|
||||
|
||||
# Since CI only does a partial fetch (to `FETCH_DEPTH`) for efficiency,
|
||||
# the local git repo doesn't have full history. There are probably files
|
||||
# that were last modified in a commit *older than* the oldest fetched commit.
|
||||
# After `git-restore-mtime`, such files have a mtime set to the oldest fetched commit.
|
||||
#
|
||||
# As new commits get added, that timestamp will keep moving forward.
|
||||
# If left unchanged, this will make `black` think that the files were edited
|
||||
# more recently than its cache suggests. Instead, we can set their mtime
|
||||
# to a fixed date in the far past that won't change and won't cause cache misses in black.
|
||||
#
|
||||
# For all workdir Python files modified in or before the oldest few fetched commits,
|
||||
# make their mtime be 2000-01-01 00:00:00.
|
||||
OLDEST_COMMIT="$(git log --reverse '--pretty=format:%H' | head -1)"
|
||||
OLDEST_COMMIT_TIME="$(git show -s '--format=%ai' "$OLDEST_COMMIT")"
|
||||
find "$WORKDIR" -name '*.py' -type f -not -newermt "$OLDEST_COMMIT_TIME" -exec touch -c -m -t '200001010000' '{}' '+'
|
||||
|
||||
echo "oldest-commit=$OLDEST_COMMIT" >> "$GITHUB_OUTPUT"
|
||||
|
||||
- name: Set up Python ${{ matrix.python-version }} + Poetry ${{ env.POETRY_VERSION }}
|
||||
uses: "./.github/actions/poetry_setup"
|
||||
with:
|
||||
python-version: ${{ matrix.python-version }}
|
||||
cache: poetry
|
||||
poetry-version: ${{ env.POETRY_VERSION }}
|
||||
working-directory: ${{ inputs.working-directory }}
|
||||
cache-key: lint
|
||||
|
||||
- name: Check Poetry File
|
||||
shell: bash
|
||||
working-directory: ${{ inputs.working-directory }}
|
||||
run: |
|
||||
poetry check
|
||||
|
||||
- name: Check lock file
|
||||
shell: bash
|
||||
working-directory: ${{ inputs.working-directory }}
|
||||
run: |
|
||||
poetry lock --check
|
||||
|
||||
- name: Install dependencies
|
||||
working-directory: ${{ inputs.working-directory }}
|
||||
run: |
|
||||
poetry install
|
||||
|
||||
- name: Install langchain editable
|
||||
if: ${{ inputs.working-directory != 'langchain' }}
|
||||
working-directory: ${{ inputs.working-directory }}
|
||||
if: ${{ inputs.working-directory != 'libs/langchain' }}
|
||||
run: |
|
||||
pip install -e ../langchain
|
||||
|
||||
- name: Restore black cache
|
||||
uses: actions/cache@v3
|
||||
env:
|
||||
CACHE_BASE: black-${{ runner.os }}-${{ runner.arch }}-py${{ matrix.python-version }}-${{ inputs.working-directory }}-${{ hashFiles(format('{0}/poetry.lock', env.WORKDIR)) }}
|
||||
SEGMENT_DOWNLOAD_TIMEOUT_MIN: "1"
|
||||
with:
|
||||
path: |
|
||||
${{ env.WORKDIR }}/.black_cache
|
||||
key: ${{ env.CACHE_BASE }}-${{ steps.restore-mtimes.outputs.oldest-commit }}
|
||||
restore-keys:
|
||||
# If we can't find an exact match for our cache key, accept any with this prefix.
|
||||
${{ env.CACHE_BASE }}-
|
||||
|
||||
- name: Get .mypy_cache to speed up mypy
|
||||
uses: actions/cache@v3
|
||||
env:
|
||||
SEGMENT_DOWNLOAD_TIMEOUT_MIN: "2"
|
||||
with:
|
||||
path: |
|
||||
${{ env.WORKDIR }}/.mypy_cache
|
||||
key: mypy-${{ runner.os }}-${{ runner.arch }}-py${{ matrix.python-version }}-${{ inputs.working-directory }}-${{ hashFiles(format('{0}/poetry.lock', env.WORKDIR)) }}
|
||||
|
||||
- name: Analysing the code with our lint
|
||||
working-directory: ${{ inputs.working-directory }}
|
||||
env:
|
||||
BLACK_CACHE_DIR: .black_cache
|
||||
run: |
|
||||
make lint
|
||||
|
||||
81
.github/workflows/_pydantic_compatibility.yml
vendored
Normal file
@@ -0,0 +1,81 @@
|
||||
name: pydantic v1/v2 compatibility
|
||||
|
||||
on:
|
||||
workflow_call:
|
||||
inputs:
|
||||
working-directory:
|
||||
required: true
|
||||
type: string
|
||||
description: "From which folder this pipeline executes"
|
||||
|
||||
env:
|
||||
POETRY_VERSION: "1.5.1"
|
||||
|
||||
jobs:
|
||||
build:
|
||||
defaults:
|
||||
run:
|
||||
working-directory: ${{ inputs.working-directory }}
|
||||
runs-on: ubuntu-latest
|
||||
strategy:
|
||||
matrix:
|
||||
python-version:
|
||||
- "3.8"
|
||||
- "3.9"
|
||||
- "3.10"
|
||||
- "3.11"
|
||||
name: Pydantic v1/v2 compatibility - Python ${{ matrix.python-version }}
|
||||
steps:
|
||||
- uses: actions/checkout@v3
|
||||
|
||||
- name: Set up Python ${{ matrix.python-version }} + Poetry ${{ env.POETRY_VERSION }}
|
||||
uses: "./.github/actions/poetry_setup"
|
||||
with:
|
||||
python-version: ${{ matrix.python-version }}
|
||||
poetry-version: ${{ env.POETRY_VERSION }}
|
||||
working-directory: ${{ inputs.working-directory }}
|
||||
cache-key: pydantic-cross-compat
|
||||
|
||||
- name: Install dependencies
|
||||
shell: bash
|
||||
run: poetry install
|
||||
|
||||
- name: Install the opposite major version of pydantic
|
||||
# If normal tests use pydantic v1, here we'll use v2, and vice versa.
|
||||
shell: bash
|
||||
run: |
|
||||
# Determine the major part of pydantic version
|
||||
REGULAR_VERSION=$(poetry run python -c "import pydantic; print(pydantic.__version__)" | cut -d. -f1)
|
||||
|
||||
if [[ "$REGULAR_VERSION" == "1" ]]; then
|
||||
PYDANTIC_DEP=">=2.1,<3"
|
||||
TEST_WITH_VERSION="2"
|
||||
elif [[ "$REGULAR_VERSION" == "2" ]]; then
|
||||
PYDANTIC_DEP="<2"
|
||||
TEST_WITH_VERSION="1"
|
||||
else
|
||||
echo "Unexpected pydantic major version '$REGULAR_VERSION', cannot determine which version to use for cross-compatibility test."
|
||||
exit 1
|
||||
fi
|
||||
|
||||
# Install via `pip` instead of `poetry add` to avoid changing lockfile,
|
||||
# which would prevent caching from working: the cache would get saved
|
||||
# to a different key than where it gets loaded from.
|
||||
poetry run pip install "pydantic${PYDANTIC_DEP}"
|
||||
|
||||
# Ensure that the correct pydantic is installed now.
|
||||
echo "Checking pydantic version... Expecting ${TEST_WITH_VERSION}"
|
||||
|
||||
# Determine the major part of pydantic version
|
||||
CURRENT_VERSION=$(poetry run python -c "import pydantic; print(pydantic.__version__)" | cut -d. -f1)
|
||||
|
||||
# Check that the major part of pydantic version is as expected, if not
|
||||
# raise an error
|
||||
if [[ "$CURRENT_VERSION" != "$TEST_WITH_VERSION" ]]; then
|
||||
echo "Error: expected pydantic version ${CURRENT_VERSION} to have been installed, but found: ${TEST_WITH_VERSION}"
|
||||
exit 1
|
||||
fi
|
||||
echo "Found pydantic version ${CURRENT_VERSION}, as expected"
|
||||
- name: Run pydantic compatibility tests
|
||||
shell: bash
|
||||
run: make test
|
||||
30
.github/workflows/_release.yml
vendored
@@ -9,21 +9,30 @@ on:
|
||||
description: "From which folder this pipeline executes"
|
||||
|
||||
env:
|
||||
POETRY_VERSION: "1.4.2"
|
||||
POETRY_VERSION: "1.5.1"
|
||||
|
||||
jobs:
|
||||
if_release:
|
||||
if: |
|
||||
${{ github.event.pull_request.merged == true }}
|
||||
&& ${{ contains(github.event.pull_request.labels.*.name, 'release') }}
|
||||
# Disallow publishing from branches that aren't `master`.
|
||||
if: github.ref == 'refs/heads/master'
|
||||
runs-on: ubuntu-latest
|
||||
permissions:
|
||||
# This permission is used for trusted publishing:
|
||||
# https://blog.pypi.org/posts/2023-04-20-introducing-trusted-publishers/
|
||||
#
|
||||
# Trusted publishing has to also be configured on PyPI for each package:
|
||||
# https://docs.pypi.org/trusted-publishers/adding-a-publisher/
|
||||
id-token: write
|
||||
|
||||
# This permission is needed by `ncipollo/release-action` to create the GitHub release.
|
||||
contents: write
|
||||
defaults:
|
||||
run:
|
||||
working-directory: ${{ inputs.working-directory }}
|
||||
steps:
|
||||
- uses: actions/checkout@v3
|
||||
- name: Install poetry
|
||||
run: pipx install poetry==$POETRY_VERSION
|
||||
run: pipx install "poetry==$POETRY_VERSION"
|
||||
- name: Set up Python 3.10
|
||||
uses: actions/setup-python@v4
|
||||
with:
|
||||
@@ -45,8 +54,9 @@ jobs:
|
||||
generateReleaseNotes: true
|
||||
tag: v${{ steps.check-version.outputs.version }}
|
||||
commit: master
|
||||
- name: Publish to PyPI
|
||||
env:
|
||||
POETRY_PYPI_TOKEN_PYPI: ${{ secrets.PYPI_API_TOKEN }}
|
||||
run: |
|
||||
poetry publish
|
||||
- name: Publish package distributions to PyPI
|
||||
uses: pypa/gh-action-pypi-publish@release/v1
|
||||
with:
|
||||
packages-dir: ${{ inputs.working-directory }}/dist/
|
||||
verbose: true
|
||||
print-hash: true
|
||||
|
||||
42
.github/workflows/_test.yml
vendored
@@ -7,13 +7,9 @@ on:
|
||||
required: true
|
||||
type: string
|
||||
description: "From which folder this pipeline executes"
|
||||
test_type:
|
||||
type: string
|
||||
description: "Test types to run"
|
||||
default: '["core", "extended"]'
|
||||
|
||||
env:
|
||||
POETRY_VERSION: "1.4.2"
|
||||
POETRY_VERSION: "1.5.1"
|
||||
|
||||
jobs:
|
||||
build:
|
||||
@@ -28,34 +24,22 @@ jobs:
|
||||
- "3.9"
|
||||
- "3.10"
|
||||
- "3.11"
|
||||
test_type: ${{ fromJSON(inputs.test_type) }}
|
||||
name: Python ${{ matrix.python-version }} ${{ matrix.test_type }}
|
||||
name: Python ${{ matrix.python-version }}
|
||||
steps:
|
||||
- uses: actions/checkout@v3
|
||||
- name: Set up Python ${{ matrix.python-version }}
|
||||
|
||||
- name: Set up Python ${{ matrix.python-version }} + Poetry ${{ env.POETRY_VERSION }}
|
||||
uses: "./.github/actions/poetry_setup"
|
||||
with:
|
||||
python-version: ${{ matrix.python-version }}
|
||||
poetry-version: ${{ env.POETRY_VERSION }}
|
||||
working-directory: ${{ inputs.working-directory }}
|
||||
poetry-version: "1.4.2"
|
||||
cache-key: ${{ matrix.test_type }}
|
||||
install-command: |
|
||||
if [ "${{ matrix.test_type }}" == "core" ]; then
|
||||
echo "Running core tests, installing dependencies with poetry..."
|
||||
poetry install
|
||||
else
|
||||
echo "Running extended tests, installing dependencies with poetry..."
|
||||
poetry install -E extended_testing
|
||||
fi
|
||||
- name: Install langchain editable
|
||||
if: ${{ inputs.working-directory != 'langchain' }}
|
||||
run: |
|
||||
pip install -e ../langchain
|
||||
- name: Run ${{matrix.test_type}} tests
|
||||
run: |
|
||||
if [ "${{ matrix.test_type }}" == "core" ]; then
|
||||
make test
|
||||
else
|
||||
make extended_tests
|
||||
fi
|
||||
cache-key: core
|
||||
|
||||
- name: Install dependencies
|
||||
shell: bash
|
||||
run: poetry install
|
||||
|
||||
- name: Run core tests
|
||||
shell: bash
|
||||
run: make test
|
||||
|
||||
58
.github/workflows/langchain_ci.yml
vendored
@@ -8,10 +8,25 @@ on:
|
||||
paths:
|
||||
- '.github/workflows/_lint.yml'
|
||||
- '.github/workflows/_test.yml'
|
||||
- '.github/workflows/_pydantic_compatibility.yml'
|
||||
- '.github/workflows/langchain_ci.yml'
|
||||
- 'libs/langchain/**'
|
||||
workflow_dispatch: # Allows to trigger the workflow manually in GitHub UI
|
||||
|
||||
# If another push to the same PR or branch happens while this workflow is still running,
|
||||
# cancel the earlier run in favor of the next run.
|
||||
#
|
||||
# There's no point in testing an outdated version of the code. GitHub only allows
|
||||
# a limited number of job runners to be active at the same time, so it's better to cancel
|
||||
# pointless jobs early so that more useful jobs can run sooner.
|
||||
concurrency:
|
||||
group: ${{ github.workflow }}-${{ github.ref }}
|
||||
cancel-in-progress: true
|
||||
|
||||
env:
|
||||
POETRY_VERSION: "1.5.1"
|
||||
WORKDIR: "libs/langchain"
|
||||
|
||||
jobs:
|
||||
lint:
|
||||
uses:
|
||||
@@ -19,9 +34,50 @@ jobs:
|
||||
with:
|
||||
working-directory: libs/langchain
|
||||
secrets: inherit
|
||||
|
||||
test:
|
||||
uses:
|
||||
./.github/workflows/_test.yml
|
||||
with:
|
||||
working-directory: libs/langchain
|
||||
secrets: inherit
|
||||
secrets: inherit
|
||||
|
||||
pydantic-compatibility:
|
||||
uses:
|
||||
./.github/workflows/_pydantic_compatibility.yml
|
||||
with:
|
||||
working-directory: libs/langchain
|
||||
secrets: inherit
|
||||
|
||||
extended-tests:
|
||||
runs-on: ubuntu-latest
|
||||
defaults:
|
||||
run:
|
||||
working-directory: ${{ env.WORKDIR }}
|
||||
strategy:
|
||||
matrix:
|
||||
python-version:
|
||||
- "3.8"
|
||||
- "3.9"
|
||||
- "3.10"
|
||||
- "3.11"
|
||||
name: Python ${{ matrix.python-version }} extended tests
|
||||
steps:
|
||||
- uses: actions/checkout@v3
|
||||
|
||||
- name: Set up Python ${{ matrix.python-version }} + Poetry ${{ env.POETRY_VERSION }}
|
||||
uses: "./.github/actions/poetry_setup"
|
||||
with:
|
||||
python-version: ${{ matrix.python-version }}
|
||||
poetry-version: ${{ env.POETRY_VERSION }}
|
||||
working-directory: libs/langchain
|
||||
cache-key: extended
|
||||
|
||||
- name: Install dependencies
|
||||
shell: bash
|
||||
run: |
|
||||
echo "Running extended tests, installing dependencies with poetry..."
|
||||
poetry install -E extended_testing
|
||||
|
||||
- name: Run extended tests
|
||||
run: make extended_tests
|
||||
|
||||
58
.github/workflows/langchain_experimental_ci.yml
vendored
@@ -13,6 +13,20 @@ on:
|
||||
- 'libs/experimental/**'
|
||||
workflow_dispatch: # Allows to trigger the workflow manually in GitHub UI
|
||||
|
||||
# If another push to the same PR or branch happens while this workflow is still running,
|
||||
# cancel the earlier run in favor of the next run.
|
||||
#
|
||||
# There's no point in testing an outdated version of the code. GitHub only allows
|
||||
# a limited number of job runners to be active at the same time, so it's better to cancel
|
||||
# pointless jobs early so that more useful jobs can run sooner.
|
||||
concurrency:
|
||||
group: ${{ github.workflow }}-${{ github.ref }}
|
||||
cancel-in-progress: true
|
||||
|
||||
env:
|
||||
POETRY_VERSION: "1.5.1"
|
||||
WORKDIR: "libs/experimental"
|
||||
|
||||
jobs:
|
||||
lint:
|
||||
uses:
|
||||
@@ -20,10 +34,50 @@ jobs:
|
||||
with:
|
||||
working-directory: libs/experimental
|
||||
secrets: inherit
|
||||
|
||||
test:
|
||||
uses:
|
||||
./.github/workflows/_test.yml
|
||||
with:
|
||||
working-directory: libs/experimental
|
||||
test_type: '["core"]'
|
||||
secrets: inherit
|
||||
secrets: inherit
|
||||
|
||||
# It's possible that langchain-experimental works fine with the latest *published* langchain,
|
||||
# but is broken with the langchain on `master`.
|
||||
#
|
||||
# We want to catch situations like that *before* releasing a new langchain, hence this test.
|
||||
test-with-latest-langchain:
|
||||
runs-on: ubuntu-latest
|
||||
defaults:
|
||||
run:
|
||||
working-directory: ${{ env.WORKDIR }}
|
||||
strategy:
|
||||
matrix:
|
||||
python-version:
|
||||
- "3.8"
|
||||
- "3.9"
|
||||
- "3.10"
|
||||
- "3.11"
|
||||
name: test with unpublished langchain - Python ${{ matrix.python-version }}
|
||||
steps:
|
||||
- uses: actions/checkout@v3
|
||||
|
||||
- name: Set up Python ${{ matrix.python-version }} + Poetry ${{ env.POETRY_VERSION }}
|
||||
uses: "./.github/actions/poetry_setup"
|
||||
with:
|
||||
python-version: ${{ matrix.python-version }}
|
||||
poetry-version: ${{ env.POETRY_VERSION }}
|
||||
working-directory: ${{ env.WORKDIR }}
|
||||
cache-key: unpublished-langchain
|
||||
|
||||
- name: Install dependencies
|
||||
shell: bash
|
||||
run: |
|
||||
echo "Running tests with unpublished langchain, installing dependencies with poetry..."
|
||||
poetry install
|
||||
|
||||
echo "Editably installing langchain outside of poetry, to avoid messing up lockfile..."
|
||||
poetry run pip install -e ../langchain
|
||||
|
||||
- name: Run tests
|
||||
run: make test
|
||||
|
||||
@@ -2,13 +2,6 @@
|
||||
name: libs/experimental Release
|
||||
|
||||
on:
|
||||
pull_request:
|
||||
types:
|
||||
- closed
|
||||
branches:
|
||||
- master
|
||||
paths:
|
||||
- 'libs/experimental/pyproject.toml'
|
||||
workflow_dispatch: # Allows to trigger the workflow manually in GitHub UI
|
||||
|
||||
jobs:
|
||||
@@ -17,4 +10,4 @@ jobs:
|
||||
./.github/workflows/_release.yml
|
||||
with:
|
||||
working-directory: libs/experimental
|
||||
secrets: inherit
|
||||
secrets: inherit
|
||||
|
||||
9
.github/workflows/langchain_release.yml
vendored
@@ -2,13 +2,6 @@
|
||||
name: libs/langchain Release
|
||||
|
||||
on:
|
||||
pull_request:
|
||||
types:
|
||||
- closed
|
||||
branches:
|
||||
- master
|
||||
paths:
|
||||
- 'libs/langchain/pyproject.toml'
|
||||
workflow_dispatch: # Allows to trigger the workflow manually in GitHub UI
|
||||
|
||||
jobs:
|
||||
@@ -17,4 +10,4 @@ jobs:
|
||||
./.github/workflows/_release.yml
|
||||
with:
|
||||
working-directory: libs/langchain
|
||||
secrets: inherit
|
||||
secrets: inherit
|
||||
|
||||
19
.github/workflows/scheduled_test.yml
vendored
@@ -6,7 +6,7 @@ on:
|
||||
- cron: '0 13 * * *'
|
||||
|
||||
env:
|
||||
POETRY_VERSION: "1.4.2"
|
||||
POETRY_VERSION: "1.5.1"
|
||||
|
||||
jobs:
|
||||
build:
|
||||
@@ -25,18 +25,25 @@ jobs:
|
||||
name: Python ${{ matrix.python-version }}
|
||||
steps:
|
||||
- uses: actions/checkout@v3
|
||||
|
||||
- name: Set up Python ${{ matrix.python-version }}
|
||||
uses: "./.github/actions/poetry_setup"
|
||||
with:
|
||||
python-version: ${{ matrix.python-version }}
|
||||
poetry-version: "1.4.2"
|
||||
poetry-version: ${{ env.POETRY_VERSION }}
|
||||
working-directory: libs/langchain
|
||||
install-command: |
|
||||
echo "Running scheduled tests, installing dependencies with poetry..."
|
||||
poetry install --with=test_integration
|
||||
cache-key: scheduled
|
||||
|
||||
- name: Install dependencies
|
||||
working-directory: libs/langchain
|
||||
shell: bash
|
||||
run: |
|
||||
echo "Running scheduled tests, installing dependencies with poetry..."
|
||||
poetry install --with=test_integration
|
||||
|
||||
- name: Run tests
|
||||
shell: bash
|
||||
env:
|
||||
OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
|
||||
run: |
|
||||
make scheduled_tests
|
||||
shell: bash
|
||||
|
||||
14
README.md
@@ -2,18 +2,18 @@
|
||||
|
||||
⚡ Building applications with LLMs through composability ⚡
|
||||
|
||||
[](https://github.com/hwchase17/langchain/releases)
|
||||
[](https://github.com/hwchase17/langchain/actions/workflows/langchain_ci.yml)
|
||||
[](https://github.com/hwchase17/langchain/actions/workflows/langchain_experimental_ci.yml)
|
||||
[](https://github.com/langchain-ai/langchain/releases)
|
||||
[](https://github.com/langchain-ai/langchain/actions/workflows/langchain_ci.yml)
|
||||
[](https://github.com/langchain-ai/langchain/actions/workflows/langchain_experimental_ci.yml)
|
||||
[](https://pepy.tech/project/langchain)
|
||||
[](https://opensource.org/licenses/MIT)
|
||||
[](https://twitter.com/langchainai)
|
||||
[](https://discord.gg/6adMQxSpJS)
|
||||
[](https://vscode.dev/redirect?url=vscode://ms-vscode-remote.remote-containers/cloneInVolume?url=https://github.com/hwchase17/langchain)
|
||||
[](https://codespaces.new/hwchase17/langchain)
|
||||
[](https://star-history.com/#hwchase17/langchain)
|
||||
[](https://vscode.dev/redirect?url=vscode://ms-vscode-remote.remote-containers/cloneInVolume?url=https://github.com/langchain-ai/langchain)
|
||||
[](https://codespaces.new/langchain-ai/langchain)
|
||||
[](https://star-history.com/#langchain-ai/langchain)
|
||||
[](https://libraries.io/github/langchain-ai/langchain)
|
||||
[](https://github.com/hwchase17/langchain/issues)
|
||||
[](https://github.com/langchain-ai/langchain/issues)
|
||||
|
||||
|
||||
Looking for the JS/TS version? Check out [LangChain.js](https://github.com/hwchase17/langchainjs).
|
||||
|
||||
6
SECURITY.md
Normal file
@@ -0,0 +1,6 @@
|
||||
# Security Policy
|
||||
|
||||
## Reporting a Vulnerability
|
||||
|
||||
Please report security vulnerabilities by email to `security@langchain.dev`.
|
||||
This email is an alias to a subset of our maintainers, and will ensure the issue is promptly triaged and acted upon as needed.
|
||||
@@ -156,7 +156,7 @@ html_context = {
|
||||
html_static_path = ["_static"]
|
||||
|
||||
# These paths are either relative to html_static_path
|
||||
# or fully qualified paths (eg. https://...)
|
||||
# or fully qualified paths (e.g. https://...)
|
||||
html_css_files = [
|
||||
"css/custom.css",
|
||||
]
|
||||
|
||||
@@ -150,7 +150,8 @@ def _load_package_modules(
|
||||
|
||||
relative_module_name = file_path.relative_to(package_path)
|
||||
|
||||
if relative_module_name.name.startswith("_"):
|
||||
# Skip if any module part starts with an underscore
|
||||
if any(part.startswith("_") for part in relative_module_name.parts):
|
||||
continue
|
||||
|
||||
# Get the full namespace of the module
|
||||
@@ -227,7 +228,7 @@ Classes
|
||||
:toctree: {module}
|
||||
"""
|
||||
|
||||
for class_ in classes:
|
||||
for class_ in sorted(classes, key=lambda c: c["qualified_name"]):
|
||||
if not class_["is_public"]:
|
||||
continue
|
||||
|
||||
|
||||
@@ -1,5 +1,6 @@
|
||||
-e libs/langchain
|
||||
-e libs/experimental
|
||||
pydantic<2
|
||||
autodoc_pydantic==1.8.0
|
||||
myst_parser
|
||||
nbsphinx==0.8.9
|
||||
|
||||
@@ -1,4 +1,4 @@
|
||||
# Community Navigator
|
||||
# Community navigator
|
||||
|
||||
Hi! Thanks for being here. We’re lucky to have a community of so many passionate developers building with LangChain–we have so much to teach and learn from each other. Community members contribute code, host meetups, write blog posts, amplify each other’s work, become each other's customers and collaborators, and so much more.
|
||||
|
||||
@@ -28,7 +28,7 @@ LangChain is the product of over 5,000+ contributions by 1,500+ contributors, an
|
||||
# 🌍 Meetups, Events, and Hackathons
|
||||
|
||||
One of our favorite things about working in AI is how much enthusiasm there is for building together. We want to help make that as easy and impactful for you as possible!
|
||||
- **Find a meetup, hackathon, or webinar:** you can find the one for you on on our [global events calendar](https://mirror-feeling-d80.notion.site/0bc81da76a184297b86ca8fc782ee9a3?v=0d80342540df465396546976a50cfb3f).
|
||||
- **Find a meetup, hackathon, or webinar:** you can find the one for you on our [global events calendar](https://mirror-feeling-d80.notion.site/0bc81da76a184297b86ca8fc782ee9a3?v=0d80342540df465396546976a50cfb3f).
|
||||
- **Submit an event to our calendar:** email us at events@langchain.dev with a link to your event page! We can also help you spread the word with our local communities.
|
||||
- **Host a meetup:** If you want to bring a group of builders together, we want to help! We can publicize your event on our event calendar/Twitter, share with our local communities in Discord, send swag, or potentially hook you up with a sponsor. Email us at events@langchain.dev to tell us about your event!
|
||||
- **Become a meetup sponsor:** we often hear from groups of builders that want to get together, but are blocked or limited on some dimension (space to host, budget for snacks, prizes to distribute, etc.). If you’d like to help, send us an email to events@langchain.dev we can share more about how it works!
|
||||
@@ -47,8 +47,8 @@ If you’re working on something you’re proud of, and think the LangChain comm
|
||||
|
||||
Here’s where our team hangs out, talks shop, spotlights cool work, and shares what we’re up to. We’d love to see you there too.
|
||||
|
||||
- **[Twitter](https://twitter.com/LangChainAI):** we post about what we’re working on and what cool things we’re seeing in the space. If you tag @langchainai in your post, we’ll almost certainly see it, and can snow you some love!
|
||||
- **[Discord](https://discord.gg/6adMQxSpJS):** connect with with >30k developers who are building with LangChain
|
||||
- **[Twitter](https://twitter.com/LangChainAI):** we post about what we’re working on and what cool things we’re seeing in the space. If you tag @langchainai in your post, we’ll almost certainly see it, and can show you some love!
|
||||
- **[Discord](https://discord.gg/6adMQxSpJS):** connect with >30k developers who are building with LangChain
|
||||
- **[GitHub](https://github.com/langchain-ai/langchain):** open pull requests, contribute to a discussion, and/or contribute
|
||||
- **[Subscribe to our bi-weekly Release Notes](https://6w1pwbss0py.typeform.com/to/KjZB1auB):** a twice/month email roundup of the coolest things going on in our orbit
|
||||
- **Slack:** if you’re building an application in production at your company, we’d love to get into a Slack channel together. Fill out [this form](https://airtable.com/appwQzlErAS2qiP0L/shrGtGaVBVAz7NcV2) and we’ll get in touch about setting one up.
|
||||
|
||||
@@ -28,7 +28,7 @@ LangChain provides standard, extendable interfaces and external integrations for
|
||||
|
||||
#### [Model I/O](/docs/modules/model_io/)
|
||||
Interface with language models
|
||||
#### [Data connection](/docs/modules/data_connection/)
|
||||
#### [Retrieval](/docs/modules/data_connection/)
|
||||
Interface with application-specific data
|
||||
#### [Chains](/docs/modules/chains/)
|
||||
Construct sequences of calls
|
||||
|
||||
@@ -107,7 +107,7 @@ import PromptTemplateChatModel from "@snippets/get_started/quickstart/prompt_tem
|
||||
<PromptTemplateLLM/>
|
||||
|
||||
However, the advantages of using these over raw string formatting are several.
|
||||
You can "partial" out variables - eg you can format only some of the variables at a time.
|
||||
You can "partial" out variables - e.g. you can format only some of the variables at a time.
|
||||
You can compose them together, easily combining different templates into a single prompt.
|
||||
For explanations of these functionalities, see the [section on prompts](/docs/modules/model_io/prompts) for more detail.
|
||||
|
||||
@@ -121,12 +121,12 @@ Let's take a look at this below:
|
||||
|
||||
ChatPromptTemplates can also include other things besides ChatMessageTemplates - see the [section on prompts](/docs/modules/model_io/prompts) for more detail.
|
||||
|
||||
## Output Parsers
|
||||
## Output parsers
|
||||
|
||||
OutputParsers convert the raw output of an LLM into a format that can be used downstream.
|
||||
There are few main type of OutputParsers, including:
|
||||
|
||||
- Convert text from LLM -> structured information (eg JSON)
|
||||
- Convert text from LLM -> structured information (e.g. JSON)
|
||||
- Convert a ChatMessage into just a string
|
||||
- Convert the extra information returned from a call besides the message (like OpenAI function invocation) into a string.
|
||||
|
||||
@@ -149,7 +149,7 @@ import LLMChain from "@snippets/get_started/quickstart/llm_chain.mdx"
|
||||
|
||||
<LLMChain/>
|
||||
|
||||
## Next Steps
|
||||
## Next steps
|
||||
|
||||
This is it!
|
||||
We've now gone over how to create the core building block of LangChain applications - the LLMChains.
|
||||
|
||||
@@ -3,7 +3,7 @@ sidebar_position: 3
|
||||
---
|
||||
# Comparison Evaluators
|
||||
|
||||
Comparison evaluators in LangChain help measure two different chain or LLM outputs. These evaluators are helpful for comparative analyses, such as A/B testing between two language models, or comparing different versions of the same model. They can also be useful for things like generating preference scores for ai-assisted reinforcement learning.
|
||||
Comparison evaluators in LangChain help measure two different chains or LLM outputs. These evaluators are helpful for comparative analyses, such as A/B testing between two language models, or comparing different versions of the same model. They can also be useful for things like generating preference scores for ai-assisted reinforcement learning.
|
||||
|
||||
These evaluators inherit from the `PairwiseStringEvaluator` class, providing a comparison interface for two strings - typically, the outputs from two different prompts or models, or two versions of the same model. In essence, a comparison evaluator performs an evaluation on a pair of strings and returns a dictionary containing the evaluation score and other relevant details.
|
||||
|
||||
@@ -16,7 +16,7 @@ Here's a summary of the key methods and properties of a comparison evaluator:
|
||||
- `requires_input`: This property indicates whether this evaluator requires an input string.
|
||||
- `requires_reference`: This property specifies whether this evaluator requires a reference label.
|
||||
|
||||
Detailed information about creating custom evaluators and the available built-in comparison evaluators are provided in the following sections.
|
||||
Detailed information about creating custom evaluators and the available built-in comparison evaluators is provided in the following sections.
|
||||
|
||||
import DocCardList from "@theme/DocCardList";
|
||||
|
||||
|
||||
1396
docs/docs_skeleton/docs/guides/safety/amazon_comprehend_chain.ipynb
Normal file
@@ -2,5 +2,5 @@
|
||||
|
||||
One of the key concerns with using LLMs is that they may generate harmful or unethical text. This is an area of active research in the field. Here we present some built-in chains inspired by this research, which are intended to make the outputs of LLMs safer.
|
||||
|
||||
- [Moderation chain](/docs/use_cases/safety/moderation): Explicitly check if any output text is harmful and flag it.
|
||||
- [Constitutional chain](/docs/use_cases/safety/constitutional_chain): Prompt the model with a set of principles which should guide it's behavior.
|
||||
- [Moderation chain](/docs/guides/safety/moderation): Explicitly check if any output text is harmful and flag it.
|
||||
- [Constitutional chain](/docs/guides/safety/constitutional_chain): Prompt the model with a set of principles which should guide it's behavior.
|
||||
|
||||
@@ -2,15 +2,60 @@
|
||||
sidebar_position: 1
|
||||
---
|
||||
|
||||
# Data connection
|
||||
# Retrieval
|
||||
|
||||
Many LLM applications require user-specific data that is not part of the model's training set. LangChain gives you the
|
||||
building blocks to load, transform, store and query your data via:
|
||||
Many LLM applications require user-specific data that is not part of the model's training set.
|
||||
The primary way of accomplishing this is through Retrieval Augmented Generation (RAG).
|
||||
In this process, external data is *retrieved* and then passed to the LLM when doing the *generation* step.
|
||||
|
||||
- [Document loaders](/docs/modules/data_connection/document_loaders/): Load documents from many different sources
|
||||
- [Document transformers](/docs/modules/data_connection/document_transformers/): Split documents, convert documents into Q&A format, drop redundant documents, and more
|
||||
- [Text embedding models](/docs/modules/data_connection/text_embedding/): Take unstructured text and turn it into a list of floating point numbers
|
||||
- [Vector stores](/docs/modules/data_connection/vectorstores/): Store and search over embedded data
|
||||
- [Retrievers](/docs/modules/data_connection/retrievers/): Query your data
|
||||
LangChain provides all the building blocks for RAG applications - from simple to complex.
|
||||
This section of the documentation covers everything related to the *retrieval* step - e.g. the fetching of the data.
|
||||
Although this sounds simple, it can be subtly complex.
|
||||
This encompasses several key modules.
|
||||
|
||||

|
||||
|
||||
**[Document loaders](/docs/modules/data_connection/document_loaders/)**
|
||||
|
||||
Load documents from many different sources.
|
||||
LangChain provides over a 100 different document loaders as well as integrations with other major providers in the space,
|
||||
like AirByte and Unstructured.
|
||||
We provide integrations to load all types of documents (html, PDF, code) from all types of locations (private s3 buckets, public websites).
|
||||
|
||||
**[Document transformers](/docs/modules/data_connection/document_transformers/)**
|
||||
|
||||
A key part of retrieval is fetching only the relevant parts of documents.
|
||||
This involves several transformation steps in order to best prepare the documents for retrieval.
|
||||
One of the primary ones here is splitting (or chunking) a large document into smaller chunks.
|
||||
LangChain provides several different algorithms for doing this, as well as logic optimized for specific document types (code, markdown, etc).
|
||||
|
||||
**[Text embedding models](/docs/modules/data_connection/text_embedding/)**
|
||||
|
||||
Another key part of retrieval has become creating embeddings for documents.
|
||||
Embeddings capture the semantic meaning of text, allowing you to quickly and
|
||||
efficiently find other pieces of text that are similar.
|
||||
LangChain provides integrations with over 25 different embedding providers and methods,
|
||||
from open-source to proprietary API,
|
||||
allowing you to choose the one best suited for your needs.
|
||||
LangChain exposes a standard interface, allowing you to easily swap between models.
|
||||
|
||||
**[Vector stores](/docs/modules/data_connection/vectorstores/)**
|
||||
|
||||
With the rise of embeddings, there has emerged a need for databases to support efficient storage and searching of these embeddings.
|
||||
LangChain provides integrations with over 50 different vectorstores, from open-source local ones to cloud-hosted proprietary ones,
|
||||
allowing you choose the one best suited for your needs.
|
||||
LangChain exposes a standard interface, allowing you to easily swap between vector stores.
|
||||
|
||||
**[Retrievers](/docs/modules/data_connection/retrievers/)**
|
||||
|
||||
Once the data is in the database, you still need to retrieve it.
|
||||
LangChain supports many different retrieval algorithms and is one of the places where we add the most value.
|
||||
We support basic methods that are easy to get started - namely simple semantic search.
|
||||
However, we have also added a collection of algorithms on top of this to increase performance.
|
||||
These include:
|
||||
|
||||
- [Parent Document Retriever](/docs/modules/data_connection/retrievers/parent_document_retriever): This allows you to create multiple embeddings per parent document, allowing you to look up smaller chunks but return larger context.
|
||||
- [Self Query Retriever](/docs/modules/data_connection/retrievers/self_query): User questions often contain reference to something that isn't just semantic, but rather expresses some logic that can best be represented as a metadata filter. Self-query allows you to parse out the *semantic* part of a query from other *metadata filters* present in the query
|
||||
- [Ensemble Retriever](/docs/modules/data_connection/retrievers/ensemble): Sometimes you may want to retrieve documents from multiple different sources, or using multiple different algorithms. The ensemble retriever allows you to easily do this.
|
||||
- And more!
|
||||
|
||||
|
||||
@@ -8,7 +8,7 @@ LangChain provides standard, extendable interfaces and external integrations for
|
||||
|
||||
#### [Model I/O](/docs/modules/model_io/)
|
||||
Interface with language models
|
||||
#### [Data connection](/docs/modules/data_connection/)
|
||||
#### [Retrieval](/docs/modules/data_connection/)
|
||||
Interface with application-specific data
|
||||
#### [Chains](/docs/modules/chains/)
|
||||
Construct sequences of calls
|
||||
|
||||
@@ -1,6 +1,6 @@
|
||||
# Few-shot prompt templates
|
||||
|
||||
In this tutorial, we'll learn how to create a prompt template that uses few shot examples. A few shot prompt template can be constructed from either a set of examples, or from an Example Selector object.
|
||||
In this tutorial, we'll learn how to create a prompt template that uses few-shot examples. A few-shot prompt template can be constructed from either a set of examples, or from an Example Selector object.
|
||||
|
||||
import Example from "@snippets/modules/model_io/prompts/prompt_templates/few_shot_examples.mdx"
|
||||
|
||||
|
||||
@@ -6,7 +6,7 @@ sidebar_position: 0
|
||||
|
||||
Prompt templates are pre-defined recipes for generating prompts for language models.
|
||||
|
||||
A template may include instructions, few shot examples, and specific context and
|
||||
A template may include instructions, few-shot examples, and specific context and
|
||||
questions appropriate for a given task.
|
||||
|
||||
LangChain provides tooling to create and work with prompt templates.
|
||||
|
||||
@@ -1,6 +1,6 @@
|
||||
# Partial prompt templates
|
||||
|
||||
Like other methods, it can make sense to "partial" a prompt template - eg pass in a subset of the required values, as to create a new prompt template which expects only the remaining subset of values.
|
||||
Like other methods, it can make sense to "partial" a prompt template - e.g. pass in a subset of the required values, as to create a new prompt template which expects only the remaining subset of values.
|
||||
|
||||
LangChain supports this in two ways:
|
||||
1. Partial formatting with string values.
|
||||
|
||||
@@ -2,8 +2,8 @@
|
||||
|
||||
This notebook goes over how to compose multiple prompts together. This can be useful when you want to reuse parts of prompts. This can be done with a PipelinePrompt. A PipelinePrompt consists of two main parts:
|
||||
|
||||
- Final prompt: This is the final prompt that is returned
|
||||
- Pipeline prompts: This is a list of tuples, consisting of a string name and a prompt template. Each prompt template will be formatted and then passed to future prompt templates as a variable with the same name.
|
||||
- Final prompt: The final prompt that is returned
|
||||
- Pipeline prompts: A list of tuples, consisting of a string name and a prompt template. Each prompt template will be formatted and then passed to future prompt templates as a variable with the same name.
|
||||
|
||||
import Example from "@snippets/modules/model_io/prompts/prompt_templates/prompt_composition.mdx"
|
||||
|
||||
|
||||
@@ -24,8 +24,7 @@ function Imports({ imports }) {
|
||||
<li key={imported}>
|
||||
<a href={docs}>
|
||||
<span>{imported}</span>
|
||||
</a>{" "}
|
||||
from <code>{source}</code>
|
||||
</a>
|
||||
</li>
|
||||
))}
|
||||
</ul>
|
||||
|
||||
BIN
docs/docs_skeleton/static/img/OSS_LLM_overview.png
Normal file
|
After Width: | Height: | Size: 288 KiB |
BIN
docs/docs_skeleton/static/img/ReAct.png
Normal file
|
After Width: | Height: | Size: 42 KiB |
BIN
docs/docs_skeleton/static/img/agents_use_case_1.png
Normal file
|
After Width: | Height: | Size: 236 KiB |
BIN
docs/docs_skeleton/static/img/agents_use_case_trace_1.png
Normal file
|
After Width: | Height: | Size: 74 KiB |
BIN
docs/docs_skeleton/static/img/agents_use_case_trace_2.png
Normal file
|
After Width: | Height: | Size: 166 KiB |
BIN
docs/docs_skeleton/static/img/agents_vs_chains.png
Normal file
|
After Width: | Height: | Size: 42 KiB |
BIN
docs/docs_skeleton/static/img/llama-memory-weights.png
Normal file
|
After Width: | Height: | Size: 44 KiB |
BIN
docs/docs_skeleton/static/img/llama_t_put.png
Normal file
|
After Width: | Height: | Size: 35 KiB |
BIN
docs/docs_skeleton/static/img/oai_function_agent.png
Normal file
|
After Width: | Height: | Size: 177 KiB |
@@ -1,15 +1,15 @@
|
||||
# Tutorials
|
||||
|
||||
Below are links to video tutorials and courses on LangChain. For written guides on common use cases for LangChain, check out the [use cases guides](/docs/use_cases).
|
||||
Below are links to tutorials and courses on LangChain. For written guides on common use cases for LangChain, check out the [use cases guides](/docs/use_cases).
|
||||
|
||||
⛓ icon marks a new addition [last update 2023-07-05]
|
||||
⛓ icon marks a new addition [last update 2023-08-20]
|
||||
|
||||
---------------------
|
||||
|
||||
### DeepLearning.AI courses
|
||||
by [Harrison Chase](https://github.com/hwchase17) and [Andrew Ng](https://en.wikipedia.org/wiki/Andrew_Ng)
|
||||
- [LangChain for LLM Application Development](https://learn.deeplearning.ai/langchain)
|
||||
- ⛓ [LangChain Chat with Your Data](https://learn.deeplearning.ai/langchain-chat-with-your-data)
|
||||
- [LangChain Chat with Your Data](https://learn.deeplearning.ai/langchain-chat-with-your-data)
|
||||
|
||||
### Handbook
|
||||
[LangChain AI Handbook](https://www.pinecone.io/learn/langchain/) By **James Briggs** and **Francisco Ingham**
|
||||
@@ -36,14 +36,14 @@ Below are links to video tutorials and courses on LangChain. For written guides
|
||||
- #8 [Create Custom Tools for Chatbots in LangChain](https://youtu.be/q-HNphrWsDE)
|
||||
- #9 [Build Conversational Agents with Vector DBs](https://youtu.be/H6bCqqw9xyI)
|
||||
- [Using NEW `MPT-7B` in Hugging Face and LangChain](https://youtu.be/DXpk9K7DgMo)
|
||||
- ⛓ [`MPT-30B` Chatbot with LangChain](https://youtu.be/pnem-EhT6VI)
|
||||
- [`MPT-30B` Chatbot with LangChain](https://youtu.be/pnem-EhT6VI)
|
||||
|
||||
|
||||
### [LangChain 101](https://www.youtube.com/playlist?list=PLqZXAkvF1bPNQER9mLmDbntNfSpzdDIU5) by [Greg Kamradt (Data Indy)](https://www.youtube.com/@DataIndependent)
|
||||
- [What Is LangChain? - LangChain + `ChatGPT` Overview](https://youtu.be/_v_fgW2SkkQ)
|
||||
- [Quickstart Guide](https://youtu.be/kYRB-vJFy38)
|
||||
- [Beginner Guide To 7 Essential Concepts](https://youtu.be/2xxziIWmaSA)
|
||||
- [Beginner Guide To 9 Use Cases](https://youtu.be/vGP4pQdCocw)
|
||||
- [Beginner's Guide To 7 Essential Concepts](https://youtu.be/2xxziIWmaSA)
|
||||
- [Beginner's Guide To 9 Use Cases](https://youtu.be/vGP4pQdCocw)
|
||||
- [Agents Overview + Google Searches](https://youtu.be/Jq9Sf68ozk0)
|
||||
- [`OpenAI` + `Wolfram Alpha`](https://youtu.be/UijbzCIJ99g)
|
||||
- [Ask Questions On Your Custom (or Private) Files](https://youtu.be/EnT-ZTrcPrg)
|
||||
@@ -63,7 +63,7 @@ Below are links to video tutorials and courses on LangChain. For written guides
|
||||
- [Build Your Own `AI Twitter Bot` Using LLMs](https://youtu.be/yLWLDjT01q8)
|
||||
- [ChatGPT made my interview questions for me (`Streamlit` + LangChain)](https://youtu.be/zvoAMx0WKkw)
|
||||
- [Function Calling via ChatGPT API - First Look With LangChain](https://youtu.be/0-zlUy7VUjg)
|
||||
- ⛓ [Extract Topics From Video/Audio With LLMs (Topic Modeling w/ LangChain)](https://youtu.be/pEkxRQFNAs4)
|
||||
- [Extract Topics From Video/Audio With LLMs (Topic Modeling w/ LangChain)](https://youtu.be/pEkxRQFNAs4)
|
||||
|
||||
|
||||
### [LangChain How to and guides](https://www.youtube.com/playlist?list=PL8motc6AQftk1Bs42EW45kwYbyJ4jOdiZ) by [Sam Witteveen](https://www.youtube.com/@samwitteveenai)
|
||||
@@ -73,7 +73,7 @@ Below are links to video tutorials and courses on LangChain. For written guides
|
||||
- [Conversations with Memory (explanation & code walkthrough)](https://youtu.be/X550Zbz_ROE)
|
||||
- [Chat with `Flan20B`](https://youtu.be/VW5LBavIfY4)
|
||||
- [Using `Hugging Face Models` locally (code walkthrough)](https://youtu.be/Kn7SX2Mx_Jk)
|
||||
- [`PAL` : Program-aided Language Models with LangChain code](https://youtu.be/dy7-LvDu-3s)
|
||||
- [`PAL`: Program-aided Language Models with LangChain code](https://youtu.be/dy7-LvDu-3s)
|
||||
- [Building a Summarization System with LangChain and `GPT-3` - Part 1](https://youtu.be/LNq_2s_H01Y)
|
||||
- [Building a Summarization System with LangChain and `GPT-3` - Part 2](https://youtu.be/d-yeHDLgKHw)
|
||||
- [Microsoft's `Visual ChatGPT` using LangChain](https://youtu.be/7YEiEyfPF5U)
|
||||
@@ -85,7 +85,7 @@ Below are links to video tutorials and courses on LangChain. For written guides
|
||||
- [`BabyAGI`: Discover the Power of Task-Driven Autonomous Agents!](https://youtu.be/QBcDLSE2ERA)
|
||||
- [Improve your `BabyAGI` with LangChain](https://youtu.be/DRgPyOXZ-oE)
|
||||
- [Master `PDF` Chat with LangChain - Your essential guide to queries on documents](https://youtu.be/ZzgUqFtxgXI)
|
||||
- [Using LangChain with `DuckDuckGO` `Wikipedia` & `PythonREPL` Tools](https://youtu.be/KerHlb8nuVc)
|
||||
- [Using LangChain with `DuckDuckGO`, `Wikipedia` & `PythonREPL` Tools](https://youtu.be/KerHlb8nuVc)
|
||||
- [Building Custom Tools and Agents with LangChain (gpt-3.5-turbo)](https://youtu.be/biS8G8x8DdA)
|
||||
- [LangChain Retrieval QA Over Multiple Files with `ChromaDB`](https://youtu.be/3yPBVii7Ct0)
|
||||
- [LangChain Retrieval QA with Instructor Embeddings & `ChromaDB` for PDFs](https://youtu.be/cFCGUjc33aU)
|
||||
@@ -99,7 +99,7 @@ Below are links to video tutorials and courses on LangChain. For written guides
|
||||
- [`OpenAI Functions` + LangChain : Building a Multi Tool Agent](https://youtu.be/4KXK6c6TVXQ)
|
||||
- [What can you do with 16K tokens in LangChain?](https://youtu.be/z2aCZBAtWXs)
|
||||
- [Tagging and Extraction - Classification using `OpenAI Functions`](https://youtu.be/a8hMgIcUEnE)
|
||||
- ⛓ [HOW to Make Conversational Form with LangChain](https://youtu.be/IT93On2LB5k)
|
||||
- [HOW to Make Conversational Form with LangChain](https://youtu.be/IT93On2LB5k)
|
||||
|
||||
|
||||
### [LangChain](https://www.youtube.com/playlist?list=PLVEEucA9MYhOu89CX8H3MBZqayTbcCTMr) by [Prompt Engineering](https://www.youtube.com/@engineerprompt)
|
||||
@@ -107,7 +107,7 @@ Below are links to video tutorials and courses on LangChain. For written guides
|
||||
- [Working with MULTIPLE `PDF` Files in LangChain: `ChatGPT` for your Data](https://youtu.be/s5LhRdh5fu4)
|
||||
- [`ChatGPT` for YOUR OWN `PDF` files with LangChain](https://youtu.be/TLf90ipMzfE)
|
||||
- [Talk to YOUR DATA without OpenAI APIs: LangChain](https://youtu.be/wrD-fZvT6UI)
|
||||
- [Langchain: PDF Chat App (GUI) | ChatGPT for Your PDF FILES](https://youtu.be/RIWbalZ7sTo)
|
||||
- [LangChain: PDF Chat App (GUI) | ChatGPT for Your PDF FILES](https://youtu.be/RIWbalZ7sTo)
|
||||
- [LangFlow: Build Chatbots without Writing Code](https://youtu.be/KJ-ux3hre4s)
|
||||
- [LangChain: Giving Memory to LLMs](https://youtu.be/dxO6pzlgJiY)
|
||||
- [BEST OPEN Alternative to `OPENAI's EMBEDDINGs` for Retrieval QA: LangChain](https://youtu.be/ogEalPMUCSY)
|
||||
@@ -121,5 +121,9 @@ Below are links to video tutorials and courses on LangChain. For written guides
|
||||
- [LangChain Agents: Build Personal Assistants For Your Data (Q&A with Harrison Chase and Mayo Oshin)](https://youtu.be/gVkF8cwfBLI)
|
||||
|
||||
|
||||
### Codebase Analysis
|
||||
- ⛓ [Codebase Analysis: Langchain Agents](https://carbonated-yacht-2c5.notion.site/Codebase-Analysis-Langchain-Agents-0b0587acd50647ca88aaae7cff5df1f2)
|
||||
|
||||
|
||||
---------------------
|
||||
⛓ icon marks a new addition [last update 2023-07-05]
|
||||
⛓ icon marks a new addition [last update 2023-08-20]
|
||||
|
||||
@@ -1,265 +1,376 @@
|
||||
# Dependents
|
||||
|
||||
Dependents stats for `hwchase17/langchain`
|
||||
Dependents stats for `langchain-ai/langchain`
|
||||
|
||||
[](https://github.com/hwchase17/langchain/network/dependents)
|
||||
[&message=244&color=informational&logo=slickpic)](https://github.com/hwchase17/langchain/network/dependents)
|
||||
[&message=9697&color=informational&logo=slickpic)](https://github.com/hwchase17/langchain/network/dependents)
|
||||
[&message=19827&color=informational&logo=slickpic)](https://github.com/hwchase17/langchain/network/dependents)
|
||||
[](https://github.com/langchain-ai/langchain/network/dependents)
|
||||
[&message=355&color=informational&logo=slickpic)](https://github.com/langchain-ai/langchain/network/dependents)
|
||||
[&message=19140&color=informational&logo=slickpic)](https://github.com/langchain-ai/langchain/network/dependents)
|
||||
[&message=22524&color=informational&logo=slickpic)](https://github.com/langchain-ai/langchain/network/dependents)
|
||||
|
||||
|
||||
[update: 2023-07-07; only dependent repositories with Stars > 100]
|
||||
[update: `2023-08-17`; only dependent repositories with Stars > 100]
|
||||
|
||||
|
||||
| Repository | Stars |
|
||||
| :-------- | -----: |
|
||||
|[openai/openai-cookbook](https://github.com/openai/openai-cookbook) | 41047 |
|
||||
|[LAION-AI/Open-Assistant](https://github.com/LAION-AI/Open-Assistant) | 33983 |
|
||||
|[microsoft/TaskMatrix](https://github.com/microsoft/TaskMatrix) | 33375 |
|
||||
|[imartinez/privateGPT](https://github.com/imartinez/privateGPT) | 31114 |
|
||||
|[hpcaitech/ColossalAI](https://github.com/hpcaitech/ColossalAI) | 30369 |
|
||||
|[reworkd/AgentGPT](https://github.com/reworkd/AgentGPT) | 24116 |
|
||||
|[OpenBB-finance/OpenBBTerminal](https://github.com/OpenBB-finance/OpenBBTerminal) | 22565 |
|
||||
|[openai/chatgpt-retrieval-plugin](https://github.com/openai/chatgpt-retrieval-plugin) | 18375 |
|
||||
|[jerryjliu/llama_index](https://github.com/jerryjliu/llama_index) | 17723 |
|
||||
|[mindsdb/mindsdb](https://github.com/mindsdb/mindsdb) | 16958 |
|
||||
|[mlflow/mlflow](https://github.com/mlflow/mlflow) | 14632 |
|
||||
|[GaiZhenbiao/ChuanhuChatGPT](https://github.com/GaiZhenbiao/ChuanhuChatGPT) | 11273 |
|
||||
|[openai/evals](https://github.com/openai/evals) | 10745 |
|
||||
|[databrickslabs/dolly](https://github.com/databrickslabs/dolly) | 10298 |
|
||||
|[imClumsyPanda/langchain-ChatGLM](https://github.com/imClumsyPanda/langchain-ChatGLM) | 9838 |
|
||||
|[logspace-ai/langflow](https://github.com/logspace-ai/langflow) | 9247 |
|
||||
|[AIGC-Audio/AudioGPT](https://github.com/AIGC-Audio/AudioGPT) | 8768 |
|
||||
|[PromtEngineer/localGPT](https://github.com/PromtEngineer/localGPT) | 8651 |
|
||||
|[StanGirard/quivr](https://github.com/StanGirard/quivr) | 8119 |
|
||||
|[go-skynet/LocalAI](https://github.com/go-skynet/LocalAI) | 7418 |
|
||||
|[gventuri/pandas-ai](https://github.com/gventuri/pandas-ai) | 7301 |
|
||||
|[PipedreamHQ/pipedream](https://github.com/PipedreamHQ/pipedream) | 6636 |
|
||||
|[arc53/DocsGPT](https://github.com/arc53/DocsGPT) | 5849 |
|
||||
|[e2b-dev/e2b](https://github.com/e2b-dev/e2b) | 5129 |
|
||||
|[langgenius/dify](https://github.com/langgenius/dify) | 4804 |
|
||||
|[serge-chat/serge](https://github.com/serge-chat/serge) | 4448 |
|
||||
|[csunny/DB-GPT](https://github.com/csunny/DB-GPT) | 4350 |
|
||||
|[wenda-LLM/wenda](https://github.com/wenda-LLM/wenda) | 4268 |
|
||||
|[zauberzeug/nicegui](https://github.com/zauberzeug/nicegui) | 4244 |
|
||||
|[intitni/CopilotForXcode](https://github.com/intitni/CopilotForXcode) | 4232 |
|
||||
|[GreyDGL/PentestGPT](https://github.com/GreyDGL/PentestGPT) | 4154 |
|
||||
|[madawei2699/myGPTReader](https://github.com/madawei2699/myGPTReader) | 4080 |
|
||||
|[zilliztech/GPTCache](https://github.com/zilliztech/GPTCache) | 3949 |
|
||||
|[gkamradt/langchain-tutorials](https://github.com/gkamradt/langchain-tutorials) | 3920 |
|
||||
|[bentoml/OpenLLM](https://github.com/bentoml/OpenLLM) | 3481 |
|
||||
|[MineDojo/Voyager](https://github.com/MineDojo/Voyager) | 3453 |
|
||||
|[mmabrouk/chatgpt-wrapper](https://github.com/mmabrouk/chatgpt-wrapper) | 3355 |
|
||||
|[postgresml/postgresml](https://github.com/postgresml/postgresml) | 3328 |
|
||||
|[marqo-ai/marqo](https://github.com/marqo-ai/marqo) | 3100 |
|
||||
|[kyegomez/tree-of-thoughts](https://github.com/kyegomez/tree-of-thoughts) | 3049 |
|
||||
|[PrefectHQ/marvin](https://github.com/PrefectHQ/marvin) | 2844 |
|
||||
|[project-baize/baize-chatbot](https://github.com/project-baize/baize-chatbot) | 2833 |
|
||||
|[h2oai/h2ogpt](https://github.com/h2oai/h2ogpt) | 2809 |
|
||||
|[hwchase17/chat-langchain](https://github.com/hwchase17/chat-langchain) | 2809 |
|
||||
|[whitead/paper-qa](https://github.com/whitead/paper-qa) | 2664 |
|
||||
|[Azure-Samples/azure-search-openai-demo](https://github.com/Azure-Samples/azure-search-openai-demo) | 2650 |
|
||||
|[OpenGVLab/InternGPT](https://github.com/OpenGVLab/InternGPT) | 2525 |
|
||||
|[GerevAI/gerev](https://github.com/GerevAI/gerev) | 2372 |
|
||||
|[ParisNeo/lollms-webui](https://github.com/ParisNeo/lollms-webui) | 2287 |
|
||||
|[OpenBMB/BMTools](https://github.com/OpenBMB/BMTools) | 2265 |
|
||||
|[SamurAIGPT/privateGPT](https://github.com/SamurAIGPT/privateGPT) | 2084 |
|
||||
|[Chainlit/chainlit](https://github.com/Chainlit/chainlit) | 1912 |
|
||||
|[Farama-Foundation/PettingZoo](https://github.com/Farama-Foundation/PettingZoo) | 1869 |
|
||||
|[OpenGVLab/Ask-Anything](https://github.com/OpenGVLab/Ask-Anything) | 1864 |
|
||||
|[IntelligenzaArtificiale/Free-Auto-GPT](https://github.com/IntelligenzaArtificiale/Free-Auto-GPT) | 1849 |
|
||||
|[Unstructured-IO/unstructured](https://github.com/Unstructured-IO/unstructured) | 1766 |
|
||||
|[yanqiangmiffy/Chinese-LangChain](https://github.com/yanqiangmiffy/Chinese-LangChain) | 1745 |
|
||||
|[NVIDIA/NeMo-Guardrails](https://github.com/NVIDIA/NeMo-Guardrails) | 1732 |
|
||||
|[hwchase17/notion-qa](https://github.com/hwchase17/notion-qa) | 1716 |
|
||||
|[paulpierre/RasaGPT](https://github.com/paulpierre/RasaGPT) | 1619 |
|
||||
|[pinterest/querybook](https://github.com/pinterest/querybook) | 1468 |
|
||||
|[vocodedev/vocode-python](https://github.com/vocodedev/vocode-python) | 1446 |
|
||||
|[thomas-yanxin/LangChain-ChatGLM-Webui](https://github.com/thomas-yanxin/LangChain-ChatGLM-Webui) | 1430 |
|
||||
|[Mintplex-Labs/anything-llm](https://github.com/Mintplex-Labs/anything-llm) | 1419 |
|
||||
|[Kav-K/GPTDiscord](https://github.com/Kav-K/GPTDiscord) | 1416 |
|
||||
|[lunasec-io/lunasec](https://github.com/lunasec-io/lunasec) | 1327 |
|
||||
|[psychic-api/psychic](https://github.com/psychic-api/psychic) | 1307 |
|
||||
|[jina-ai/thinkgpt](https://github.com/jina-ai/thinkgpt) | 1242 |
|
||||
|[agiresearch/OpenAGI](https://github.com/agiresearch/OpenAGI) | 1239 |
|
||||
|[ttengwang/Caption-Anything](https://github.com/ttengwang/Caption-Anything) | 1203 |
|
||||
|[jina-ai/dev-gpt](https://github.com/jina-ai/dev-gpt) | 1179 |
|
||||
|[keephq/keep](https://github.com/keephq/keep) | 1169 |
|
||||
|[greshake/llm-security](https://github.com/greshake/llm-security) | 1156 |
|
||||
|[richardyc/Chrome-GPT](https://github.com/richardyc/Chrome-GPT) | 1090 |
|
||||
|[jina-ai/langchain-serve](https://github.com/jina-ai/langchain-serve) | 1088 |
|
||||
|[mmz-001/knowledge_gpt](https://github.com/mmz-001/knowledge_gpt) | 1074 |
|
||||
|[juncongmoo/chatllama](https://github.com/juncongmoo/chatllama) | 1057 |
|
||||
|[noahshinn024/reflexion](https://github.com/noahshinn024/reflexion) | 1045 |
|
||||
|[visual-openllm/visual-openllm](https://github.com/visual-openllm/visual-openllm) | 1036 |
|
||||
|[101dotxyz/GPTeam](https://github.com/101dotxyz/GPTeam) | 999 |
|
||||
|[poe-platform/api-bot-tutorial](https://github.com/poe-platform/api-bot-tutorial) | 989 |
|
||||
|[irgolic/AutoPR](https://github.com/irgolic/AutoPR) | 974 |
|
||||
|[homanp/superagent](https://github.com/homanp/superagent) | 970 |
|
||||
|[microsoft/X-Decoder](https://github.com/microsoft/X-Decoder) | 941 |
|
||||
|[peterw/Chat-with-Github-Repo](https://github.com/peterw/Chat-with-Github-Repo) | 896 |
|
||||
|[SamurAIGPT/Camel-AutoGPT](https://github.com/SamurAIGPT/Camel-AutoGPT) | 856 |
|
||||
|[cirediatpl/FigmaChain](https://github.com/cirediatpl/FigmaChain) | 840 |
|
||||
|[chatarena/chatarena](https://github.com/chatarena/chatarena) | 829 |
|
||||
|[rlancemartin/auto-evaluator](https://github.com/rlancemartin/auto-evaluator) | 816 |
|
||||
|[seanpixel/Teenage-AGI](https://github.com/seanpixel/Teenage-AGI) | 816 |
|
||||
|[hashintel/hash](https://github.com/hashintel/hash) | 806 |
|
||||
|[corca-ai/EVAL](https://github.com/corca-ai/EVAL) | 790 |
|
||||
|[eyurtsev/kor](https://github.com/eyurtsev/kor) | 752 |
|
||||
|[cheshire-cat-ai/core](https://github.com/cheshire-cat-ai/core) | 713 |
|
||||
|[e-johnstonn/BriefGPT](https://github.com/e-johnstonn/BriefGPT) | 686 |
|
||||
|[run-llama/llama-lab](https://github.com/run-llama/llama-lab) | 685 |
|
||||
|[refuel-ai/autolabel](https://github.com/refuel-ai/autolabel) | 673 |
|
||||
|[griptape-ai/griptape](https://github.com/griptape-ai/griptape) | 617 |
|
||||
|[billxbf/ReWOO](https://github.com/billxbf/ReWOO) | 616 |
|
||||
|[Anil-matcha/ChatPDF](https://github.com/Anil-matcha/ChatPDF) | 609 |
|
||||
|[NimbleBoxAI/ChainFury](https://github.com/NimbleBoxAI/ChainFury) | 592 |
|
||||
|[getmetal/motorhead](https://github.com/getmetal/motorhead) | 581 |
|
||||
|[ajndkr/lanarky](https://github.com/ajndkr/lanarky) | 574 |
|
||||
|[namuan/dr-doc-search](https://github.com/namuan/dr-doc-search) | 572 |
|
||||
|[kreneskyp/ix](https://github.com/kreneskyp/ix) | 564 |
|
||||
|[akshata29/chatpdf](https://github.com/akshata29/chatpdf) | 540 |
|
||||
|[hwchase17/chat-your-data](https://github.com/hwchase17/chat-your-data) | 540 |
|
||||
|[whyiyhw/chatgpt-wechat](https://github.com/whyiyhw/chatgpt-wechat) | 537 |
|
||||
|[khoj-ai/khoj](https://github.com/khoj-ai/khoj) | 531 |
|
||||
|[SamurAIGPT/ChatGPT-Developer-Plugins](https://github.com/SamurAIGPT/ChatGPT-Developer-Plugins) | 528 |
|
||||
|[microsoft/PodcastCopilot](https://github.com/microsoft/PodcastCopilot) | 526 |
|
||||
|[ruoccofabrizio/azure-open-ai-embeddings-qna](https://github.com/ruoccofabrizio/azure-open-ai-embeddings-qna) | 515 |
|
||||
|[alexanderatallah/window.ai](https://github.com/alexanderatallah/window.ai) | 494 |
|
||||
|[StevenGrove/GPT4Tools](https://github.com/StevenGrove/GPT4Tools) | 483 |
|
||||
|[jina-ai/agentchain](https://github.com/jina-ai/agentchain) | 472 |
|
||||
|[mckaywrigley/repo-chat](https://github.com/mckaywrigley/repo-chat) | 465 |
|
||||
|[yeagerai/yeagerai-agent](https://github.com/yeagerai/yeagerai-agent) | 464 |
|
||||
|[langchain-ai/langchain-aiplugin](https://github.com/langchain-ai/langchain-aiplugin) | 464 |
|
||||
|[mpaepper/content-chatbot](https://github.com/mpaepper/content-chatbot) | 455 |
|
||||
|[michaelthwan/searchGPT](https://github.com/michaelthwan/searchGPT) | 455 |
|
||||
|[freddyaboulton/gradio-tools](https://github.com/freddyaboulton/gradio-tools) | 450 |
|
||||
|[amosjyng/langchain-visualizer](https://github.com/amosjyng/langchain-visualizer) | 446 |
|
||||
|[msoedov/langcorn](https://github.com/msoedov/langcorn) | 445 |
|
||||
|[plastic-labs/tutor-gpt](https://github.com/plastic-labs/tutor-gpt) | 426 |
|
||||
|[poe-platform/poe-protocol](https://github.com/poe-platform/poe-protocol) | 426 |
|
||||
|[jonra1993/fastapi-alembic-sqlmodel-async](https://github.com/jonra1993/fastapi-alembic-sqlmodel-async) | 418 |
|
||||
|[langchain-ai/auto-evaluator](https://github.com/langchain-ai/auto-evaluator) | 416 |
|
||||
|[steamship-core/steamship-langchain](https://github.com/steamship-core/steamship-langchain) | 401 |
|
||||
|[xuwenhao/geektime-ai-course](https://github.com/xuwenhao/geektime-ai-course) | 400 |
|
||||
|[continuum-llms/chatgpt-memory](https://github.com/continuum-llms/chatgpt-memory) | 386 |
|
||||
|[mtenenholtz/chat-twitter](https://github.com/mtenenholtz/chat-twitter) | 382 |
|
||||
|[explosion/spacy-llm](https://github.com/explosion/spacy-llm) | 368 |
|
||||
|[showlab/VLog](https://github.com/showlab/VLog) | 363 |
|
||||
|[yvann-hub/Robby-chatbot](https://github.com/yvann-hub/Robby-chatbot) | 363 |
|
||||
|[daodao97/chatdoc](https://github.com/daodao97/chatdoc) | 361 |
|
||||
|[opentensor/bittensor](https://github.com/opentensor/bittensor) | 360 |
|
||||
|[alejandro-ao/langchain-ask-pdf](https://github.com/alejandro-ao/langchain-ask-pdf) | 355 |
|
||||
|[logan-markewich/llama_index_starter_pack](https://github.com/logan-markewich/llama_index_starter_pack) | 351 |
|
||||
|[jupyterlab/jupyter-ai](https://github.com/jupyterlab/jupyter-ai) | 348 |
|
||||
|[alejandro-ao/ask-multiple-pdfs](https://github.com/alejandro-ao/ask-multiple-pdfs) | 321 |
|
||||
|[andylokandy/gpt-4-search](https://github.com/andylokandy/gpt-4-search) | 314 |
|
||||
|[mosaicml/examples](https://github.com/mosaicml/examples) | 313 |
|
||||
|[personoids/personoids-lite](https://github.com/personoids/personoids-lite) | 306 |
|
||||
|[itamargol/openai](https://github.com/itamargol/openai) | 304 |
|
||||
|[Anil-matcha/Website-to-Chatbot](https://github.com/Anil-matcha/Website-to-Chatbot) | 299 |
|
||||
|[momegas/megabots](https://github.com/momegas/megabots) | 299 |
|
||||
|[BlackHC/llm-strategy](https://github.com/BlackHC/llm-strategy) | 289 |
|
||||
|[daveebbelaar/langchain-experiments](https://github.com/daveebbelaar/langchain-experiments) | 283 |
|
||||
|[wandb/weave](https://github.com/wandb/weave) | 279 |
|
||||
|[Cheems-Seminar/grounded-segment-any-parts](https://github.com/Cheems-Seminar/grounded-segment-any-parts) | 273 |
|
||||
|[jerlendds/osintbuddy](https://github.com/jerlendds/osintbuddy) | 271 |
|
||||
|[OpenBMB/AgentVerse](https://github.com/OpenBMB/AgentVerse) | 270 |
|
||||
|[MagnivOrg/prompt-layer-library](https://github.com/MagnivOrg/prompt-layer-library) | 269 |
|
||||
|[sullivan-sean/chat-langchainjs](https://github.com/sullivan-sean/chat-langchainjs) | 259 |
|
||||
|[Azure-Samples/openai](https://github.com/Azure-Samples/openai) | 252 |
|
||||
|[bborn/howdoi.ai](https://github.com/bborn/howdoi.ai) | 248 |
|
||||
|[hnawaz007/pythondataanalysis](https://github.com/hnawaz007/pythondataanalysis) | 247 |
|
||||
|[conceptofmind/toolformer](https://github.com/conceptofmind/toolformer) | 243 |
|
||||
|[truera/trulens](https://github.com/truera/trulens) | 239 |
|
||||
|[ur-whitelab/exmol](https://github.com/ur-whitelab/exmol) | 238 |
|
||||
|[intel/intel-extension-for-transformers](https://github.com/intel/intel-extension-for-transformers) | 237 |
|
||||
|[monarch-initiative/ontogpt](https://github.com/monarch-initiative/ontogpt) | 236 |
|
||||
|[wandb/edu](https://github.com/wandb/edu) | 231 |
|
||||
|[recalign/RecAlign](https://github.com/recalign/RecAlign) | 229 |
|
||||
|[alvarosevilla95/autolang](https://github.com/alvarosevilla95/autolang) | 223 |
|
||||
|[kaleido-lab/dolphin](https://github.com/kaleido-lab/dolphin) | 221 |
|
||||
|[JohnSnowLabs/nlptest](https://github.com/JohnSnowLabs/nlptest) | 220 |
|
||||
|[paolorechia/learn-langchain](https://github.com/paolorechia/learn-langchain) | 219 |
|
||||
|[Safiullah-Rahu/CSV-AI](https://github.com/Safiullah-Rahu/CSV-AI) | 215 |
|
||||
|[Haste171/langchain-chatbot](https://github.com/Haste171/langchain-chatbot) | 215 |
|
||||
|[steamship-packages/langchain-agent-production-starter](https://github.com/steamship-packages/langchain-agent-production-starter) | 214 |
|
||||
|[airobotlab/KoChatGPT](https://github.com/airobotlab/KoChatGPT) | 213 |
|
||||
|[filip-michalsky/SalesGPT](https://github.com/filip-michalsky/SalesGPT) | 211 |
|
||||
|[marella/chatdocs](https://github.com/marella/chatdocs) | 207 |
|
||||
|[su77ungr/CASALIOY](https://github.com/su77ungr/CASALIOY) | 200 |
|
||||
|[shaman-ai/agent-actors](https://github.com/shaman-ai/agent-actors) | 195 |
|
||||
|[plchld/InsightFlow](https://github.com/plchld/InsightFlow) | 189 |
|
||||
|[jbrukh/gpt-jargon](https://github.com/jbrukh/gpt-jargon) | 186 |
|
||||
|[hwchase17/langchain-streamlit-template](https://github.com/hwchase17/langchain-streamlit-template) | 185 |
|
||||
|[huchenxucs/ChatDB](https://github.com/huchenxucs/ChatDB) | 179 |
|
||||
|[benthecoder/ClassGPT](https://github.com/benthecoder/ClassGPT) | 178 |
|
||||
|[hwchase17/chroma-langchain](https://github.com/hwchase17/chroma-langchain) | 178 |
|
||||
|[radi-cho/datasetGPT](https://github.com/radi-cho/datasetGPT) | 177 |
|
||||
|[jiran214/GPT-vup](https://github.com/jiran214/GPT-vup) | 176 |
|
||||
|[rsaryev/talk-codebase](https://github.com/rsaryev/talk-codebase) | 174 |
|
||||
|[edreisMD/plugnplai](https://github.com/edreisMD/plugnplai) | 174 |
|
||||
|[gia-guar/JARVIS-ChatGPT](https://github.com/gia-guar/JARVIS-ChatGPT) | 172 |
|
||||
|[hardbyte/qabot](https://github.com/hardbyte/qabot) | 171 |
|
||||
|[shamspias/customizable-gpt-chatbot](https://github.com/shamspias/customizable-gpt-chatbot) | 165 |
|
||||
|[gustavz/DataChad](https://github.com/gustavz/DataChad) | 164 |
|
||||
|[yasyf/compress-gpt](https://github.com/yasyf/compress-gpt) | 163 |
|
||||
|[SamPink/dev-gpt](https://github.com/SamPink/dev-gpt) | 161 |
|
||||
|[yuanjie-ai/ChatLLM](https://github.com/yuanjie-ai/ChatLLM) | 161 |
|
||||
|[pablomarin/GPT-Azure-Search-Engine](https://github.com/pablomarin/GPT-Azure-Search-Engine) | 160 |
|
||||
|[jondurbin/airoboros](https://github.com/jondurbin/airoboros) | 157 |
|
||||
|[fengyuli-dev/multimedia-gpt](https://github.com/fengyuli-dev/multimedia-gpt) | 157 |
|
||||
|[PradipNichite/Youtube-Tutorials](https://github.com/PradipNichite/Youtube-Tutorials) | 156 |
|
||||
|[nicknochnack/LangchainDocuments](https://github.com/nicknochnack/LangchainDocuments) | 155 |
|
||||
|[ethanyanjiali/minChatGPT](https://github.com/ethanyanjiali/minChatGPT) | 155 |
|
||||
|[ccurme/yolopandas](https://github.com/ccurme/yolopandas) | 154 |
|
||||
|[chakkaradeep/pyCodeAGI](https://github.com/chakkaradeep/pyCodeAGI) | 153 |
|
||||
|[preset-io/promptimize](https://github.com/preset-io/promptimize) | 150 |
|
||||
|[onlyphantom/llm-python](https://github.com/onlyphantom/llm-python) | 148 |
|
||||
|[Azure-Samples/azure-search-power-skills](https://github.com/Azure-Samples/azure-search-power-skills) | 146 |
|
||||
|[realminchoi/babyagi-ui](https://github.com/realminchoi/babyagi-ui) | 144 |
|
||||
|[microsoft/azure-openai-in-a-day-workshop](https://github.com/microsoft/azure-openai-in-a-day-workshop) | 144 |
|
||||
|[jmpaz/promptlib](https://github.com/jmpaz/promptlib) | 143 |
|
||||
|[shauryr/S2QA](https://github.com/shauryr/S2QA) | 142 |
|
||||
|[handrew/browserpilot](https://github.com/handrew/browserpilot) | 141 |
|
||||
|[Jaseci-Labs/jaseci](https://github.com/Jaseci-Labs/jaseci) | 140 |
|
||||
|[Klingefjord/chatgpt-telegram](https://github.com/Klingefjord/chatgpt-telegram) | 140 |
|
||||
|[WongSaang/chatgpt-ui-server](https://github.com/WongSaang/chatgpt-ui-server) | 139 |
|
||||
|[ibiscp/LLM-IMDB](https://github.com/ibiscp/LLM-IMDB) | 139 |
|
||||
|[menloparklab/langchain-cohere-qdrant-doc-retrieval](https://github.com/menloparklab/langchain-cohere-qdrant-doc-retrieval) | 138 |
|
||||
|[hirokidaichi/wanna](https://github.com/hirokidaichi/wanna) | 137 |
|
||||
|[steamship-core/vercel-examples](https://github.com/steamship-core/vercel-examples) | 137 |
|
||||
|[deeppavlov/dream](https://github.com/deeppavlov/dream) | 136 |
|
||||
|[miaoshouai/miaoshouai-assistant](https://github.com/miaoshouai/miaoshouai-assistant) | 135 |
|
||||
|[sugarforever/LangChain-Tutorials](https://github.com/sugarforever/LangChain-Tutorials) | 135 |
|
||||
|[yasyf/summ](https://github.com/yasyf/summ) | 135 |
|
||||
|[peterw/StoryStorm](https://github.com/peterw/StoryStorm) | 134 |
|
||||
|[vaibkumr/prompt-optimizer](https://github.com/vaibkumr/prompt-optimizer) | 132 |
|
||||
|[ju-bezdek/langchain-decorators](https://github.com/ju-bezdek/langchain-decorators) | 130 |
|
||||
|[homanp/vercel-langchain](https://github.com/homanp/vercel-langchain) | 128 |
|
||||
|[Teahouse-Studios/akari-bot](https://github.com/Teahouse-Studios/akari-bot) | 127 |
|
||||
|[petehunt/langchain-github-bot](https://github.com/petehunt/langchain-github-bot) | 125 |
|
||||
|[eunomia-bpf/GPTtrace](https://github.com/eunomia-bpf/GPTtrace) | 122 |
|
||||
|[fixie-ai/fixie-examples](https://github.com/fixie-ai/fixie-examples) | 122 |
|
||||
|[Aggregate-Intellect/practical-llms](https://github.com/Aggregate-Intellect/practical-llms) | 120 |
|
||||
|[davila7/file-gpt](https://github.com/davila7/file-gpt) | 120 |
|
||||
|[Azure-Samples/azure-search-openai-demo-csharp](https://github.com/Azure-Samples/azure-search-openai-demo-csharp) | 119 |
|
||||
|[prof-frink-lab/slangchain](https://github.com/prof-frink-lab/slangchain) | 117 |
|
||||
|[aurelio-labs/arxiv-bot](https://github.com/aurelio-labs/arxiv-bot) | 117 |
|
||||
|[zenml-io/zenml-projects](https://github.com/zenml-io/zenml-projects) | 116 |
|
||||
|[flurb18/AgentOoba](https://github.com/flurb18/AgentOoba) | 114 |
|
||||
|[kaarthik108/snowChat](https://github.com/kaarthik108/snowChat) | 112 |
|
||||
|[RedisVentures/redis-openai-qna](https://github.com/RedisVentures/redis-openai-qna) | 111 |
|
||||
|[solana-labs/chatgpt-plugin](https://github.com/solana-labs/chatgpt-plugin) | 111 |
|
||||
|[kulltc/chatgpt-sql](https://github.com/kulltc/chatgpt-sql) | 109 |
|
||||
|[summarizepaper/summarizepaper](https://github.com/summarizepaper/summarizepaper) | 109 |
|
||||
|[Azure-Samples/miyagi](https://github.com/Azure-Samples/miyagi) | 106 |
|
||||
|[ssheng/BentoChain](https://github.com/ssheng/BentoChain) | 106 |
|
||||
|[voxel51/voxelgpt](https://github.com/voxel51/voxelgpt) | 105 |
|
||||
|[mallahyari/drqa](https://github.com/mallahyari/drqa) | 103 |
|
||||
|[openai/openai-cookbook](https://github.com/openai/openai-cookbook) | 46276 |
|
||||
|[AntonOsika/gpt-engineer](https://github.com/AntonOsika/gpt-engineer) | 41497 |
|
||||
|[imartinez/privateGPT](https://github.com/imartinez/privateGPT) | 36296 |
|
||||
|[LAION-AI/Open-Assistant](https://github.com/LAION-AI/Open-Assistant) | 34861 |
|
||||
|[microsoft/TaskMatrix](https://github.com/microsoft/TaskMatrix) | 33906 |
|
||||
|[hpcaitech/ColossalAI](https://github.com/hpcaitech/ColossalAI) | 31654 |
|
||||
|[streamlit/streamlit](https://github.com/streamlit/streamlit) | 26571 |
|
||||
|[reworkd/AgentGPT](https://github.com/reworkd/AgentGPT) | 25819 |
|
||||
|[OpenBB-finance/OpenBBTerminal](https://github.com/OpenBB-finance/OpenBBTerminal) | 23180 |
|
||||
|[geekan/MetaGPT](https://github.com/geekan/MetaGPT) | 21968 |
|
||||
|[jerryjliu/llama_index](https://github.com/jerryjliu/llama_index) | 20204 |
|
||||
|[StanGirard/quivr](https://github.com/StanGirard/quivr) | 20142 |
|
||||
|[openai/chatgpt-retrieval-plugin](https://github.com/openai/chatgpt-retrieval-plugin) | 19215 |
|
||||
|[mindsdb/mindsdb](https://github.com/mindsdb/mindsdb) | 17580 |
|
||||
|[cube-js/cube](https://github.com/cube-js/cube) | 16003 |
|
||||
|[PromtEngineer/localGPT](https://github.com/PromtEngineer/localGPT) | 15134 |
|
||||
|[mlflow/mlflow](https://github.com/mlflow/mlflow) | 15027 |
|
||||
|[chatchat-space/Langchain-Chatchat](https://github.com/chatchat-space/Langchain-Chatchat) | 14024 |
|
||||
|[GaiZhenbiao/ChuanhuChatGPT](https://github.com/GaiZhenbiao/ChuanhuChatGPT) | 12020 |
|
||||
|[logspace-ai/langflow](https://github.com/logspace-ai/langflow) | 11599 |
|
||||
|[openai/evals](https://github.com/openai/evals) | 11509 |
|
||||
|[airbytehq/airbyte](https://github.com/airbytehq/airbyte) | 11493 |
|
||||
|[databrickslabs/dolly](https://github.com/databrickslabs/dolly) | 10531 |
|
||||
|[go-skynet/LocalAI](https://github.com/go-skynet/LocalAI) | 9955 |
|
||||
|[AIGC-Audio/AudioGPT](https://github.com/AIGC-Audio/AudioGPT) | 9081 |
|
||||
|[gventuri/pandas-ai](https://github.com/gventuri/pandas-ai) | 8201 |
|
||||
|[hwchase17/langchainjs](https://github.com/hwchase17/langchainjs) | 7754 |
|
||||
|[langgenius/dify](https://github.com/langgenius/dify) | 7348 |
|
||||
|[PipedreamHQ/pipedream](https://github.com/PipedreamHQ/pipedream) | 6950 |
|
||||
|[h2oai/h2ogpt](https://github.com/h2oai/h2ogpt) | 6858 |
|
||||
|[arc53/DocsGPT](https://github.com/arc53/DocsGPT) | 6300 |
|
||||
|[0xpayne/gpt-migrate](https://github.com/0xpayne/gpt-migrate) | 6193 |
|
||||
|[eosphoros-ai/DB-GPT](https://github.com/eosphoros-ai/DB-GPT) | 6026 |
|
||||
|[bentoml/OpenLLM](https://github.com/bentoml/OpenLLM) | 5641 |
|
||||
|[jmorganca/ollama](https://github.com/jmorganca/ollama) | 5448 |
|
||||
|[e2b-dev/e2b](https://github.com/e2b-dev/e2b) | 5365 |
|
||||
|[mage-ai/mage-ai](https://github.com/mage-ai/mage-ai) | 5352 |
|
||||
|[wenda-LLM/wenda](https://github.com/wenda-LLM/wenda) | 5192 |
|
||||
|[LangChain-Chinese-Getting-Started-Guide](https://github.com/liaokongVFX/LangChain-Chinese-Getting-Started-Guide) | 5129 |
|
||||
|[zilliztech/GPTCache](https://github.com/zilliztech/GPTCache) | 4993 |
|
||||
|[GreyDGL/PentestGPT](https://github.com/GreyDGL/PentestGPT) | 4831 |
|
||||
|[zauberzeug/nicegui](https://github.com/zauberzeug/nicegui) | 4824 |
|
||||
|[serge-chat/serge](https://github.com/serge-chat/serge) | 4783 |
|
||||
|[Shaunwei/RealChar](https://github.com/Shaunwei/RealChar) | 4779 |
|
||||
|[gkamradt/langchain-tutorials](https://github.com/gkamradt/langchain-tutorials) | 4752 |
|
||||
|[openchatai/OpenChat](https://github.com/openchatai/OpenChat) | 4452 |
|
||||
|[intel-analytics/BigDL](https://github.com/intel-analytics/BigDL) | 4286 |
|
||||
|[madawei2699/myGPTReader](https://github.com/madawei2699/myGPTReader) | 4167 |
|
||||
|[MineDojo/Voyager](https://github.com/MineDojo/Voyager) | 3952 |
|
||||
|[embedchain/embedchain](https://github.com/embedchain/embedchain) | 3887 |
|
||||
|[postgresml/postgresml](https://github.com/postgresml/postgresml) | 3636 |
|
||||
|[assafelovic/gpt-researcher](https://github.com/assafelovic/gpt-researcher) | 3480 |
|
||||
|[llm-workflow-engine/llm-workflow-engine](https://github.com/llm-workflow-engine/llm-workflow-engine) | 3445 |
|
||||
|[marqo-ai/marqo](https://github.com/marqo-ai/marqo) | 3397 |
|
||||
|[kyegomez/tree-of-thoughts](https://github.com/kyegomez/tree-of-thoughts) | 3366 |
|
||||
|[RayVentura/ShortGPT](https://github.com/RayVentura/ShortGPT) | 3335 |
|
||||
|[Azure-Samples/azure-search-openai-demo](https://github.com/Azure-Samples/azure-search-openai-demo) | 3316 |
|
||||
|[langchain-ai/chat-langchain](https://github.com/langchain-ai/chat-langchain) | 3270 |
|
||||
|[khoj-ai/khoj](https://github.com/khoj-ai/khoj) | 3266 |
|
||||
|[PrefectHQ/marvin](https://github.com/PrefectHQ/marvin) | 3176 |
|
||||
|[project-baize/baize-chatbot](https://github.com/project-baize/baize-chatbot) | 2999 |
|
||||
|[whitead/paper-qa](https://github.com/whitead/paper-qa) | 2932 |
|
||||
|[OpenGVLab/InternGPT](https://github.com/OpenGVLab/InternGPT) | 2816 |
|
||||
|[continuedev/continue](https://github.com/continuedev/continue) | 2803 |
|
||||
|[ParisNeo/lollms-webui](https://github.com/ParisNeo/lollms-webui) | 2679 |
|
||||
|[OpenBMB/ToolBench](https://github.com/OpenBMB/ToolBench) | 2673 |
|
||||
|[shroominic/codeinterpreter-api](https://github.com/shroominic/codeinterpreter-api) | 2492 |
|
||||
|[OpenBMB/BMTools](https://github.com/OpenBMB/BMTools) | 2486 |
|
||||
|[GerevAI/gerev](https://github.com/GerevAI/gerev) | 2450 |
|
||||
|[SamurAIGPT/EmbedAI](https://github.com/SamurAIGPT/EmbedAI) | 2448 |
|
||||
|[Unstructured-IO/unstructured](https://github.com/Unstructured-IO/unstructured) | 2255 |
|
||||
|[Mintplex-Labs/anything-llm](https://github.com/Mintplex-Labs/anything-llm) | 2216 |
|
||||
|[emptycrown/llama-hub](https://github.com/emptycrown/llama-hub) | 2198 |
|
||||
|[homanp/superagent](https://github.com/homanp/superagent) | 2177 |
|
||||
|[yanqiangmiffy/Chinese-LangChain](https://github.com/yanqiangmiffy/Chinese-LangChain) | 2144 |
|
||||
|[OpenGVLab/Ask-Anything](https://github.com/OpenGVLab/Ask-Anything) | 2092 |
|
||||
|[IntelligenzaArtificiale/Free-Auto-GPT](https://github.com/IntelligenzaArtificiale/Free-Auto-GPT) | 2060 |
|
||||
|[thomas-yanxin/LangChain-ChatGLM-Webui](https://github.com/thomas-yanxin/LangChain-ChatGLM-Webui) | 2039 |
|
||||
|[NVIDIA/NeMo-Guardrails](https://github.com/NVIDIA/NeMo-Guardrails) | 1992 |
|
||||
|[Farama-Foundation/PettingZoo](https://github.com/Farama-Foundation/PettingZoo) | 1949 |
|
||||
|[hwchase17/notion-qa](https://github.com/hwchase17/notion-qa) | 1915 |
|
||||
|[paulpierre/RasaGPT](https://github.com/paulpierre/RasaGPT) | 1783 |
|
||||
|[jupyterlab/jupyter-ai](https://github.com/jupyterlab/jupyter-ai) | 1761 |
|
||||
|[vocodedev/vocode-python](https://github.com/vocodedev/vocode-python) | 1627 |
|
||||
|[pinterest/querybook](https://github.com/pinterest/querybook) | 1509 |
|
||||
|[psychic-api/psychic](https://github.com/psychic-api/psychic) | 1499 |
|
||||
|[Kav-K/GPTDiscord](https://github.com/Kav-K/GPTDiscord) | 1476 |
|
||||
|[avinashkranjan/Amazing-Python-Scripts](https://github.com/avinashkranjan/Amazing-Python-Scripts) | 1471 |
|
||||
|[hegelai/prompttools](https://github.com/hegelai/prompttools) | 1392 |
|
||||
|[jina-ai/langchain-serve](https://github.com/jina-ai/langchain-serve) | 1370 |
|
||||
|[Forethought-Technologies/AutoChain](https://github.com/Forethought-Technologies/AutoChain) | 1360 |
|
||||
|[keephq/keep](https://github.com/keephq/keep) | 1357 |
|
||||
|[ttengwang/Caption-Anything](https://github.com/ttengwang/Caption-Anything) | 1345 |
|
||||
|[lunasec-io/lunasec](https://github.com/lunasec-io/lunasec) | 1342 |
|
||||
|[agiresearch/OpenAGI](https://github.com/agiresearch/OpenAGI) | 1332 |
|
||||
|[noahshinn024/reflexion](https://github.com/noahshinn024/reflexion) | 1314 |
|
||||
|[jina-ai/dev-gpt](https://github.com/jina-ai/dev-gpt) | 1314 |
|
||||
|[jina-ai/thinkgpt](https://github.com/jina-ai/thinkgpt) | 1313 |
|
||||
|[greshake/llm-security](https://github.com/greshake/llm-security) | 1299 |
|
||||
|[mmz-001/knowledge_gpt](https://github.com/mmz-001/knowledge_gpt) | 1237 |
|
||||
|[101dotxyz/GPTeam](https://github.com/101dotxyz/GPTeam) | 1232 |
|
||||
|[richardyc/Chrome-GPT](https://github.com/richardyc/Chrome-GPT) | 1223 |
|
||||
|[eyurtsev/kor](https://github.com/eyurtsev/kor) | 1192 |
|
||||
|[pluralsh/plural](https://github.com/pluralsh/plural) | 1126 |
|
||||
|[juncongmoo/chatllama](https://github.com/juncongmoo/chatllama) | 1117 |
|
||||
|[visual-openllm/visual-openllm](https://github.com/visual-openllm/visual-openllm) | 1110 |
|
||||
|[poe-platform/api-bot-tutorial](https://github.com/poe-platform/api-bot-tutorial) | 1096 |
|
||||
|[refuel-ai/autolabel](https://github.com/refuel-ai/autolabel) | 1080 |
|
||||
|[microsoft/X-Decoder](https://github.com/microsoft/X-Decoder) | 1075 |
|
||||
|[irgolic/AutoPR](https://github.com/irgolic/AutoPR) | 1068 |
|
||||
|[SamurAIGPT/Camel-AutoGPT](https://github.com/SamurAIGPT/Camel-AutoGPT) | 984 |
|
||||
|[peterw/Chat-with-Github-Repo](https://github.com/peterw/Chat-with-Github-Repo) | 957 |
|
||||
|[chatarena/chatarena](https://github.com/chatarena/chatarena) | 955 |
|
||||
|[griptape-ai/griptape](https://github.com/griptape-ai/griptape) | 944 |
|
||||
|[psychic-api/rag-stack](https://github.com/psychic-api/rag-stack) | 942 |
|
||||
|[nod-ai/SHARK](https://github.com/nod-ai/SHARK) | 909 |
|
||||
|[filip-michalsky/SalesGPT](https://github.com/filip-michalsky/SalesGPT) | 899 |
|
||||
|[melih-unsal/DemoGPT](https://github.com/melih-unsal/DemoGPT) | 896 |
|
||||
|[rlancemartin/auto-evaluator](https://github.com/rlancemartin/auto-evaluator) | 889 |
|
||||
|[cirediatpl/FigmaChain](https://github.com/cirediatpl/FigmaChain) | 868 |
|
||||
|[seanpixel/Teenage-AGI](https://github.com/seanpixel/Teenage-AGI) | 854 |
|
||||
|[cheshire-cat-ai/core](https://github.com/cheshire-cat-ai/core) | 847 |
|
||||
|[run-llama/llama-lab](https://github.com/run-llama/llama-lab) | 836 |
|
||||
|[corca-ai/EVAL](https://github.com/corca-ai/EVAL) | 818 |
|
||||
|[Anil-matcha/ChatPDF](https://github.com/Anil-matcha/ChatPDF) | 798 |
|
||||
|[alejandro-ao/ask-multiple-pdfs](https://github.com/alejandro-ao/ask-multiple-pdfs) | 782 |
|
||||
|[hwchase17/chat-your-data](https://github.com/hwchase17/chat-your-data) | 748 |
|
||||
|[LambdaLabsML/examples](https://github.com/LambdaLabsML/examples) | 741 |
|
||||
|[ajndkr/lanarky](https://github.com/ajndkr/lanarky) | 732 |
|
||||
|[microsoft/Llama-2-Onnx](https://github.com/microsoft/Llama-2-Onnx) | 722 |
|
||||
|[e-johnstonn/BriefGPT](https://github.com/e-johnstonn/BriefGPT) | 710 |
|
||||
|[billxbf/ReWOO](https://github.com/billxbf/ReWOO) | 710 |
|
||||
|[kennethleungty/Llama-2-Open-Source-LLM-CPU-Inference](https://github.com/kennethleungty/Llama-2-Open-Source-LLM-CPU-Inference) | 707 |
|
||||
|[databrickslabs/pyspark-ai](https://github.com/databrickslabs/pyspark-ai) | 704 |
|
||||
|[OpenBMB/AgentVerse](https://github.com/OpenBMB/AgentVerse) | 704 |
|
||||
|[kreneskyp/ix](https://github.com/kreneskyp/ix) | 692 |
|
||||
|[akshata29/entaoai](https://github.com/akshata29/entaoai) | 682 |
|
||||
|[promptfoo/promptfoo](https://github.com/promptfoo/promptfoo) | 670 |
|
||||
|[getmetal/motorhead](https://github.com/getmetal/motorhead) | 662 |
|
||||
|[ruoccofabrizio/azure-open-ai-embeddings-qna](https://github.com/ruoccofabrizio/azure-open-ai-embeddings-qna) | 650 |
|
||||
|[YiVal/YiVal](https://github.com/YiVal/YiVal) | 632 |
|
||||
|[whyiyhw/chatgpt-wechat](https://github.com/whyiyhw/chatgpt-wechat) | 624 |
|
||||
|[SamurAIGPT/ChatGPT-Developer-Plugins](https://github.com/SamurAIGPT/ChatGPT-Developer-Plugins) | 617 |
|
||||
|[dot-agent/openagent](https://github.com/dot-agent/openagent) | 602 |
|
||||
|[msoedov/langcorn](https://github.com/msoedov/langcorn) | 588 |
|
||||
|[namuan/dr-doc-search](https://github.com/namuan/dr-doc-search) | 585 |
|
||||
|[microsoft/PodcastCopilot](https://github.com/microsoft/PodcastCopilot) | 581 |
|
||||
|[alexanderatallah/window.ai](https://github.com/alexanderatallah/window.ai) | 569 |
|
||||
|[StevenGrove/GPT4Tools](https://github.com/StevenGrove/GPT4Tools) | 568 |
|
||||
|[xusenlinzy/api-for-open-llm](https://github.com/xusenlinzy/api-for-open-llm) | 559 |
|
||||
|[NoDataFound/hackGPT](https://github.com/NoDataFound/hackGPT) | 558 |
|
||||
|[langchain-ai/auto-evaluator](https://github.com/langchain-ai/auto-evaluator) | 554 |
|
||||
|[yeagerai/yeagerai-agent](https://github.com/yeagerai/yeagerai-agent) | 537 |
|
||||
|[FlagOpen/FlagEmbedding](https://github.com/FlagOpen/FlagEmbedding) | 534 |
|
||||
|[amosjyng/langchain-visualizer](https://github.com/amosjyng/langchain-visualizer) | 534 |
|
||||
|[OpenGenerativeAI/GenossGPT](https://github.com/OpenGenerativeAI/GenossGPT) | 524 |
|
||||
|[jina-ai/agentchain](https://github.com/jina-ai/agentchain) | 496 |
|
||||
|[mckaywrigley/repo-chat](https://github.com/mckaywrigley/repo-chat) | 495 |
|
||||
|[michaelthwan/searchGPT](https://github.com/michaelthwan/searchGPT) | 494 |
|
||||
|[explosion/spacy-llm](https://github.com/explosion/spacy-llm) | 492 |
|
||||
|[plastic-labs/tutor-gpt](https://github.com/plastic-labs/tutor-gpt) | 490 |
|
||||
|[freddyaboulton/gradio-tools](https://github.com/freddyaboulton/gradio-tools) | 488 |
|
||||
|[xuwenhao/geektime-ai-course](https://github.com/xuwenhao/geektime-ai-course) | 481 |
|
||||
|[tgscan-dev/tgscan](https://github.com/tgscan-dev/tgscan) | 480 |
|
||||
|[langchain-ai/langchain-aiplugin](https://github.com/langchain-ai/langchain-aiplugin) | 480 |
|
||||
|[mpaepper/content-chatbot](https://github.com/mpaepper/content-chatbot) | 473 |
|
||||
|[yvann-hub/Robby-chatbot](https://github.com/yvann-hub/Robby-chatbot) | 471 |
|
||||
|[steamship-core/steamship-langchain](https://github.com/steamship-core/steamship-langchain) | 467 |
|
||||
|[langchain-ai/streamlit-agent](https://github.com/langchain-ai/streamlit-agent) | 463 |
|
||||
|[jonra1993/fastapi-alembic-sqlmodel-async](https://github.com/jonra1993/fastapi-alembic-sqlmodel-async) | 463 |
|
||||
|[continuum-llms/chatgpt-memory](https://github.com/continuum-llms/chatgpt-memory) | 463 |
|
||||
|[poe-platform/poe-protocol](https://github.com/poe-platform/poe-protocol) | 441 |
|
||||
|[alejandro-ao/langchain-ask-pdf](https://github.com/alejandro-ao/langchain-ask-pdf) | 437 |
|
||||
|[Dicklesworthstone/llama_embeddings_fastapi_service](https://github.com/Dicklesworthstone/llama_embeddings_fastapi_service) | 432 |
|
||||
|[DataDog/dd-trace-py](https://github.com/DataDog/dd-trace-py) | 431 |
|
||||
|[daveebbelaar/langchain-experiments](https://github.com/daveebbelaar/langchain-experiments) | 431 |
|
||||
|[jiran214/GPT-vup](https://github.com/jiran214/GPT-vup) | 428 |
|
||||
|[Azure-Samples/openai](https://github.com/Azure-Samples/openai) | 419 |
|
||||
|[NimbleBoxAI/ChainFury](https://github.com/NimbleBoxAI/ChainFury) | 414 |
|
||||
|[CarperAI/OpenELM](https://github.com/CarperAI/OpenELM) | 411 |
|
||||
|[daodao97/chatdoc](https://github.com/daodao97/chatdoc) | 404 |
|
||||
|[MiuLab/Taiwan-LLaMa](https://github.com/MiuLab/Taiwan-LLaMa) | 402 |
|
||||
|[logan-markewich/llama_index_starter_pack](https://github.com/logan-markewich/llama_index_starter_pack) | 399 |
|
||||
|[mtenenholtz/chat-twitter](https://github.com/mtenenholtz/chat-twitter) | 394 |
|
||||
|[opentensor/bittensor](https://github.com/opentensor/bittensor) | 393 |
|
||||
|[showlab/VLog](https://github.com/showlab/VLog) | 392 |
|
||||
|[microsoft/sample-app-aoai-chatGPT](https://github.com/microsoft/sample-app-aoai-chatGPT) | 391 |
|
||||
|[truera/trulens](https://github.com/truera/trulens) | 390 |
|
||||
|[Anil-matcha/Chatbase](https://github.com/Anil-matcha/Chatbase) | 363 |
|
||||
|[marella/chatdocs](https://github.com/marella/chatdocs) | 360 |
|
||||
|[jondurbin/airoboros](https://github.com/jondurbin/airoboros) | 357 |
|
||||
|[mosaicml/examples](https://github.com/mosaicml/examples) | 353 |
|
||||
|[wandb/weave](https://github.com/wandb/weave) | 352 |
|
||||
|[huchenxucs/ChatDB](https://github.com/huchenxucs/ChatDB) | 350 |
|
||||
|[rsaryev/talk-codebase](https://github.com/rsaryev/talk-codebase) | 343 |
|
||||
|[steamship-packages/langchain-production-starter](https://github.com/steamship-packages/langchain-production-starter) | 335 |
|
||||
|[jerlendds/osintbuddy](https://github.com/jerlendds/osintbuddy) | 335 |
|
||||
|[andylokandy/gpt-4-search](https://github.com/andylokandy/gpt-4-search) | 329 |
|
||||
|[MagnivOrg/prompt-layer-library](https://github.com/MagnivOrg/prompt-layer-library) | 325 |
|
||||
|[personoids/personoids-lite](https://github.com/personoids/personoids-lite) | 319 |
|
||||
|[momegas/megabots](https://github.com/momegas/megabots) | 317 |
|
||||
|[itamargol/openai](https://github.com/itamargol/openai) | 312 |
|
||||
|[intel/intel-extension-for-transformers](https://github.com/intel/intel-extension-for-transformers) | 310 |
|
||||
|[monarch-initiative/ontogpt](https://github.com/monarch-initiative/ontogpt) | 310 |
|
||||
|[BlackHC/llm-strategy](https://github.com/BlackHC/llm-strategy) | 308 |
|
||||
|[Nuggt-dev/Nuggt](https://github.com/Nuggt-dev/Nuggt) | 305 |
|
||||
|[cofactoryai/textbase](https://github.com/cofactoryai/textbase) | 304 |
|
||||
|[Cheems-Seminar/grounded-segment-any-parts](https://github.com/Cheems-Seminar/grounded-segment-any-parts) | 296 |
|
||||
|[onlyphantom/llm-python](https://github.com/onlyphantom/llm-python) | 288 |
|
||||
|[morpheuslord/GPT_Vuln-analyzer](https://github.com/morpheuslord/GPT_Vuln-analyzer) | 285 |
|
||||
|[sullivan-sean/chat-langchainjs](https://github.com/sullivan-sean/chat-langchainjs) | 280 |
|
||||
|[wandb/edu](https://github.com/wandb/edu) | 277 |
|
||||
|[austin2035/chatpdf](https://github.com/austin2035/chatpdf) | 275 |
|
||||
|[liangwq/Chatglm_lora_multi-gpu](https://github.com/liangwq/Chatglm_lora_multi-gpu) | 273 |
|
||||
|[preset-io/promptimize](https://github.com/preset-io/promptimize) | 272 |
|
||||
|[Haste171/langchain-chatbot](https://github.com/Haste171/langchain-chatbot) | 271 |
|
||||
|[hnawaz007/pythondataanalysis](https://github.com/hnawaz007/pythondataanalysis) | 268 |
|
||||
|[JohnSnowLabs/langtest](https://github.com/JohnSnowLabs/langtest) | 268 |
|
||||
|[conceptofmind/toolformer](https://github.com/conceptofmind/toolformer) | 263 |
|
||||
|[sugarforever/LangChain-Tutorials](https://github.com/sugarforever/LangChain-Tutorials) | 260 |
|
||||
|[Safiullah-Rahu/CSV-AI](https://github.com/Safiullah-Rahu/CSV-AI) | 259 |
|
||||
|[artitw/text2text](https://github.com/artitw/text2text) | 257 |
|
||||
|[bborn/howdoi.ai](https://github.com/bborn/howdoi.ai) | 256 |
|
||||
|[JayZeeDesign/researcher-gpt](https://github.com/JayZeeDesign/researcher-gpt) | 252 |
|
||||
|[paolorechia/learn-langchain](https://github.com/paolorechia/learn-langchain) | 251 |
|
||||
|[ur-whitelab/exmol](https://github.com/ur-whitelab/exmol) | 251 |
|
||||
|[Azure-Samples/miyagi](https://github.com/Azure-Samples/miyagi) | 248 |
|
||||
|[recalign/RecAlign](https://github.com/recalign/RecAlign) | 243 |
|
||||
|[airobotlab/KoChatGPT](https://github.com/airobotlab/KoChatGPT) | 242 |
|
||||
|[explodinggradients/ragas](https://github.com/explodinggradients/ragas) | 232 |
|
||||
|[kaleido-lab/dolphin](https://github.com/kaleido-lab/dolphin) | 232 |
|
||||
|[hwchase17/chroma-langchain](https://github.com/hwchase17/chroma-langchain) | 230 |
|
||||
|[eosphoros-ai/DB-GPT-Hub](https://github.com/eosphoros-ai/DB-GPT-Hub) | 229 |
|
||||
|[shaman-ai/agent-actors](https://github.com/shaman-ai/agent-actors) | 227 |
|
||||
|[gia-guar/JARVIS-ChatGPT](https://github.com/gia-guar/JARVIS-ChatGPT) | 224 |
|
||||
|[shamspias/customizable-gpt-chatbot](https://github.com/shamspias/customizable-gpt-chatbot) | 223 |
|
||||
|[hwchase17/langchain-streamlit-template](https://github.com/hwchase17/langchain-streamlit-template) | 222 |
|
||||
|[alvarosevilla95/autolang](https://github.com/alvarosevilla95/autolang) | 221 |
|
||||
|[radi-cho/datasetGPT](https://github.com/radi-cho/datasetGPT) | 221 |
|
||||
|[gustavz/DataChad](https://github.com/gustavz/DataChad) | 219 |
|
||||
|[pablomarin/GPT-Azure-Search-Engine](https://github.com/pablomarin/GPT-Azure-Search-Engine) | 217 |
|
||||
|[su77ungr/CASALIOY](https://github.com/su77ungr/CASALIOY) | 217 |
|
||||
|[ennucore/clippinator](https://github.com/ennucore/clippinator) | 211 |
|
||||
|[edreisMD/plugnplai](https://github.com/edreisMD/plugnplai) | 210 |
|
||||
|[kaarthik108/snowChat](https://github.com/kaarthik108/snowChat) | 210 |
|
||||
|[PradipNichite/Youtube-Tutorials](https://github.com/PradipNichite/Youtube-Tutorials) | 206 |
|
||||
|[ur-whitelab/chemcrow-public](https://github.com/ur-whitelab/chemcrow-public) | 202 |
|
||||
|[CambioML/pykoi](https://github.com/CambioML/pykoi) | 199 |
|
||||
|[jbrukh/gpt-jargon](https://github.com/jbrukh/gpt-jargon) | 198 |
|
||||
|[LC1332/Chat-Haruhi-Suzumiya](https://github.com/LC1332/Chat-Haruhi-Suzumiya) | 196 |
|
||||
|[nicknochnack/LangchainDocuments](https://github.com/nicknochnack/LangchainDocuments) | 196 |
|
||||
|[yuanjie-ai/ChatLLM](https://github.com/yuanjie-ai/ChatLLM) | 196 |
|
||||
|[plchld/InsightFlow](https://github.com/plchld/InsightFlow) | 196 |
|
||||
|[yakami129/VirtualWife](https://github.com/yakami129/VirtualWife) | 194 |
|
||||
|[Mintplex-Labs/vector-admin](https://github.com/Mintplex-Labs/vector-admin) | 191 |
|
||||
|[SamPink/dev-gpt](https://github.com/SamPink/dev-gpt) | 190 |
|
||||
|[yasyf/compress-gpt](https://github.com/yasyf/compress-gpt) | 190 |
|
||||
|[benthecoder/ClassGPT](https://github.com/benthecoder/ClassGPT) | 190 |
|
||||
|[WongSaang/chatgpt-ui-server](https://github.com/WongSaang/chatgpt-ui-server) | 182 |
|
||||
|[voxel51/voxelgpt](https://github.com/voxel51/voxelgpt) | 181 |
|
||||
|[hardbyte/qabot](https://github.com/hardbyte/qabot) | 176 |
|
||||
|[orgexyz/BlockAGI](https://github.com/orgexyz/BlockAGI) | 174 |
|
||||
|[handrew/browserpilot](https://github.com/handrew/browserpilot) | 173 |
|
||||
|[miaoshouai/miaoshouai-assistant](https://github.com/miaoshouai/miaoshouai-assistant) | 172 |
|
||||
|[microsoft/azure-openai-in-a-day-workshop](https://github.com/microsoft/azure-openai-in-a-day-workshop) | 170 |
|
||||
|[kyegomez/swarms](https://github.com/kyegomez/swarms) | 169 |
|
||||
|[Azure-Samples/azure-search-power-skills](https://github.com/Azure-Samples/azure-search-power-skills) | 169 |
|
||||
|[chakkaradeep/pyCodeAGI](https://github.com/chakkaradeep/pyCodeAGI) | 169 |
|
||||
|[ethanyanjiali/minChatGPT](https://github.com/ethanyanjiali/minChatGPT) | 167 |
|
||||
|[ccurme/yolopandas](https://github.com/ccurme/yolopandas) | 166 |
|
||||
|[ju-bezdek/langchain-decorators](https://github.com/ju-bezdek/langchain-decorators) | 165 |
|
||||
|[Azure-Samples/azure-search-openai-demo-csharp](https://github.com/Azure-Samples/azure-search-openai-demo-csharp) | 164 |
|
||||
|[fengyuli-dev/multimedia-gpt](https://github.com/fengyuli-dev/multimedia-gpt) | 164 |
|
||||
|[grumpyp/aixplora](https://github.com/grumpyp/aixplora) | 162 |
|
||||
|[langchain-ai/web-explorer](https://github.com/langchain-ai/web-explorer) | 158 |
|
||||
|[JorisdeJong123/7-Days-of-LangChain](https://github.com/JorisdeJong123/7-Days-of-LangChain) | 158 |
|
||||
|[shauryr/S2QA](https://github.com/shauryr/S2QA) | 158 |
|
||||
|[Azure-Samples/jp-azureopenai-samples](https://github.com/Azure-Samples/jp-azureopenai-samples) | 157 |
|
||||
|[AkshitIreddy/Interactive-LLM-Powered-NPCs](https://github.com/AkshitIreddy/Interactive-LLM-Powered-NPCs) | 156 |
|
||||
|[ibiscp/LLM-IMDB](https://github.com/ibiscp/LLM-IMDB) | 156 |
|
||||
|[jmpaz/promptlib](https://github.com/jmpaz/promptlib) | 156 |
|
||||
|[mayooear/private-chatbot-mpt30b-langchain](https://github.com/mayooear/private-chatbot-mpt30b-langchain) | 155 |
|
||||
|[homanp/vercel-langchain](https://github.com/homanp/vercel-langchain) | 152 |
|
||||
|[mlops-for-all/mlops-for-all.github.io](https://github.com/mlops-for-all/mlops-for-all.github.io) | 151 |
|
||||
|[vaibkumr/prompt-optimizer](https://github.com/vaibkumr/prompt-optimizer) | 151 |
|
||||
|[Agenta-AI/agenta](https://github.com/Agenta-AI/agenta) | 150 |
|
||||
|[Klingefjord/chatgpt-telegram](https://github.com/Klingefjord/chatgpt-telegram) | 149 |
|
||||
|[menloparklab/falcon-langchain](https://github.com/menloparklab/falcon-langchain) | 148 |
|
||||
|[deeppavlov/dream](https://github.com/deeppavlov/dream) | 146 |
|
||||
|[positive666/Prompt-Can-Anything](https://github.com/positive666/Prompt-Can-Anything) | 145 |
|
||||
|[menloparklab/langchain-cohere-qdrant-doc-retrieval](https://github.com/menloparklab/langchain-cohere-qdrant-doc-retrieval) | 145 |
|
||||
|[realminchoi/babyagi-ui](https://github.com/realminchoi/babyagi-ui) | 145 |
|
||||
|[SpecterOps/Nemesis](https://github.com/SpecterOps/Nemesis) | 144 |
|
||||
|[Jaseci-Labs/jaseci](https://github.com/Jaseci-Labs/jaseci) | 144 |
|
||||
|[summarizepaper/summarizepaper](https://github.com/summarizepaper/summarizepaper) | 142 |
|
||||
|[peterw/StoryStorm](https://github.com/peterw/StoryStorm) | 141 |
|
||||
|[Aggregate-Intellect/practical-llms](https://github.com/Aggregate-Intellect/practical-llms) | 140 |
|
||||
|[streamlit/llm-examples](https://github.com/streamlit/llm-examples) | 140 |
|
||||
|[hirokidaichi/wanna](https://github.com/hirokidaichi/wanna) | 140 |
|
||||
|[Chainlit/cookbook](https://github.com/Chainlit/cookbook) | 139 |
|
||||
|[alphasecio/langchain-examples](https://github.com/alphasecio/langchain-examples) | 139 |
|
||||
|[flurb18/AgentOoba](https://github.com/flurb18/AgentOoba) | 139 |
|
||||
|[Teahouse-Studios/akari-bot](https://github.com/Teahouse-Studios/akari-bot) | 138 |
|
||||
|[yasyf/summ](https://github.com/yasyf/summ) | 138 |
|
||||
|[kulltc/chatgpt-sql](https://github.com/kulltc/chatgpt-sql) | 137 |
|
||||
|[v7labs/benchllm](https://github.com/v7labs/benchllm) | 135 |
|
||||
|[ray-project/langchain-ray](https://github.com/ray-project/langchain-ray) | 134 |
|
||||
|[petehunt/langchain-github-bot](https://github.com/petehunt/langchain-github-bot) | 134 |
|
||||
|[peterwnjenga/aigent](https://github.com/peterwnjenga/aigent) | 133 |
|
||||
|[jina-ai/fastapi-serve](https://github.com/jina-ai/fastapi-serve) | 133 |
|
||||
|[retr0reg/Ret2GPT](https://github.com/retr0reg/Ret2GPT) | 132 |
|
||||
|[agenthubdev/agenthub_operators](https://github.com/agenthubdev/agenthub_operators) | 131 |
|
||||
|[eunomia-bpf/GPTtrace](https://github.com/eunomia-bpf/GPTtrace) | 131 |
|
||||
|[solana-labs/chatgpt-plugin](https://github.com/solana-labs/chatgpt-plugin) | 130 |
|
||||
|[aurelio-labs/arxiv-bot](https://github.com/aurelio-labs/arxiv-bot) | 130 |
|
||||
|[ChuloAI/BrainChulo](https://github.com/ChuloAI/BrainChulo) | 128 |
|
||||
|[ssheng/BentoChain](https://github.com/ssheng/BentoChain) | 128 |
|
||||
|[mallahyari/drqa](https://github.com/mallahyari/drqa) | 127 |
|
||||
|[fixie-ai/fixie-examples](https://github.com/fixie-ai/fixie-examples) | 127 |
|
||||
|[davila7/file-gpt](https://github.com/davila7/file-gpt) | 127 |
|
||||
|[showlab/UniVTG](https://github.com/showlab/UniVTG) | 125 |
|
||||
|[zenml-io/zenml-projects](https://github.com/zenml-io/zenml-projects) | 125 |
|
||||
|[RedisVentures/redis-openai-qna](https://github.com/RedisVentures/redis-openai-qna) | 124 |
|
||||
|[PJLab-ADG/DriveLikeAHuman](https://github.com/PJLab-ADG/DriveLikeAHuman) | 122 |
|
||||
|[prof-frink-lab/slangchain](https://github.com/prof-frink-lab/slangchain) | 122 |
|
||||
|[Coding-Crashkurse/Langchain-Full-Course](https://github.com/Coding-Crashkurse/Langchain-Full-Course) | 121 |
|
||||
|[ciare-robotics/world-creator](https://github.com/ciare-robotics/world-creator) | 120 |
|
||||
|[blob42/Instrukt](https://github.com/blob42/Instrukt) | 120 |
|
||||
|[langchain-ai/langsmith-cookbook](https://github.com/langchain-ai/langsmith-cookbook) | 119 |
|
||||
|[OpenPluginACI/openplugin](https://github.com/OpenPluginACI/openplugin) | 118 |
|
||||
|[defenseunicorns/leapfrogai](https://github.com/defenseunicorns/leapfrogai) | 118 |
|
||||
|[sdaaron/QueryGPT](https://github.com/sdaaron/QueryGPT) | 117 |
|
||||
|[grumpyp/chroma-langchain-tutorial](https://github.com/grumpyp/chroma-langchain-tutorial) | 117 |
|
||||
|[3Alan/DocsMind](https://github.com/3Alan/DocsMind) | 116 |
|
||||
|[CodeAlchemyAI/ViLT-GPT](https://github.com/CodeAlchemyAI/ViLT-GPT) | 114 |
|
||||
|[emarco177/ice_breaker](https://github.com/emarco177/ice_breaker) | 113 |
|
||||
|[nftblackmagic/flask-langchain](https://github.com/nftblackmagic/flask-langchain) | 113 |
|
||||
|[log1stics/voice-generator-webui](https://github.com/log1stics/voice-generator-webui) | 112 |
|
||||
|[nrl-ai/pautobot](https://github.com/nrl-ai/pautobot) | 110 |
|
||||
|[Azure/business-process-automation](https://github.com/Azure/business-process-automation) | 110 |
|
||||
|[MedalCollector/Orator](https://github.com/MedalCollector/Orator) | 109 |
|
||||
|[wombyz/HormoziGPT](https://github.com/wombyz/HormoziGPT) | 108 |
|
||||
|[afaqueumer/DocQA](https://github.com/afaqueumer/DocQA) | 106 |
|
||||
|[mortium91/langchain-assistant](https://github.com/mortium91/langchain-assistant) | 106 |
|
||||
|[Azure/azure-sdk-tools](https://github.com/Azure/azure-sdk-tools) | 105 |
|
||||
|[yeagerai/genworlds](https://github.com/yeagerai/genworlds) | 105 |
|
||||
|[AmineDiro/cria](https://github.com/AmineDiro/cria) | 104 |
|
||||
|[langchain-ai/text-split-explorer](https://github.com/langchain-ai/text-split-explorer) | 104 |
|
||||
|[luisroque/large_laguage_models](https://github.com/luisroque/large_laguage_models) | 104 |
|
||||
|[xuwenhao/mactalk-ai-course](https://github.com/xuwenhao/mactalk-ai-course) | 104 |
|
||||
|[Open-Swarm-Net/GPT-Swarm](https://github.com/Open-Swarm-Net/GPT-Swarm) | 104 |
|
||||
|[langchain-ai/langchain-aws-template](https://github.com/langchain-ai/langchain-aws-template) | 104 |
|
||||
|[aws-samples/aws-genai-llm-chatbot](https://github.com/aws-samples/aws-genai-llm-chatbot) | 103 |
|
||||
|[crosleythomas/MirrorGPT](https://github.com/crosleythomas/MirrorGPT) | 103 |
|
||||
|[Dicklesworthstone/llama2_aided_tesseract](https://github.com/Dicklesworthstone/llama2_aided_tesseract) | 101 |
|
||||
|
||||
|
||||
|
||||
_Generated by [github-dependents-info](https://github.com/nvuillam/github-dependents-info)_
|
||||
|
||||
[github-dependents-info --repo hwchase17/langchain --markdownfile dependents.md --minstars 100 --sort stars]
|
||||
`github-dependents-info --repo langchain-ai/langchain --markdownfile dependents.md --minstars 100 --sort stars`
|
||||
|
||||
1
docs/extras/guides/adapters/_category_.yml
Normal file
@@ -0,0 +1 @@
|
||||
label: 'Adapters'
|
||||
@@ -8,7 +8,7 @@ Here's a few different tools and functionalities to aid in debugging.
|
||||
|
||||
## Tracing
|
||||
|
||||
Platforms with tracing capabilities like [LangSmith](/docs/guides/langsmith/) and [WandB](/docs/ecosystem/integrations/agent_with_wandb_tracing) are the most comprehensive solutions for debugging. These platforms make it easy to not only log and visualize LLM apps, but also to actively debug, test and refine them.
|
||||
Platforms with tracing capabilities like [LangSmith](/docs/guides/langsmith/) and [WandB](/docs/integrations/providers/wandb_tracing) are the most comprehensive solutions for debugging. These platforms make it easy to not only log and visualize LLM apps, but also to actively debug, test and refine them.
|
||||
|
||||
For anyone building production-grade LLM applications, we highly recommend using a platform like this.
|
||||
|
||||
|
||||
@@ -79,3 +79,7 @@ See OpenLLM's [integration doc](https://github.com/bentoml/OpenLLM#%EF%B8%8F-int
|
||||
## [Databutton](https://databutton.com/home?new-data-app=true)
|
||||
|
||||
These templates serve as examples of how to build, deploy, and share LangChain applications using Databutton. You can create user interfaces with Streamlit, automate tasks by scheduling Python code, and store files and data in the built-in store. Examples include a Chatbot interface with conversational memory, a Personal search engine, and a starter template for LangChain apps. Deploying and sharing is just one click away.
|
||||
|
||||
## [AzureML Online Endpoint](https://github.com/Azure/azureml-examples/blob/main/sdk/python/endpoints/online/llm/langchain/1_langchain_basic_deploy.ipynb)
|
||||
|
||||
A minimal example of how to deploy LangChain to an Azure Machine Learning Online Endpoint.
|
||||
@@ -1318,7 +1318,7 @@
|
||||
"source": [
|
||||
"template = \"\"\"Write some python code to solve the user's problem. \n",
|
||||
"\n",
|
||||
"Return only python code in Markdown format, eg:\n",
|
||||
"Return only python code in Markdown format, e.g.:\n",
|
||||
"\n",
|
||||
"```python\n",
|
||||
"....\n",
|
||||
@@ -1638,196 +1638,6 @@
|
||||
"source": [
|
||||
"moderated_chain.invoke({\"input\": \"you are stupid\"})"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "a0a85ba4-f782-47b8-b16f-8b7a61d6dab7",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"## Conversational Retrieval With Memory"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "92c87dd8-bb6f-4f32-a30d-8f5459ce6265",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Fallbacks\n",
|
||||
"\n",
|
||||
"With LCEL you can easily introduce fallbacks for any Runnable component, like an LLM."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 2,
|
||||
"id": "1b1cb744-31fc-4261-ab25-65fe1fcad559",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"AIMessage(content='To get to the other side.', additional_kwargs={}, example=False)"
|
||||
]
|
||||
},
|
||||
"execution_count": 2,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"from langchain.chat_models import ChatOpenAI\n",
|
||||
"\n",
|
||||
"bad_llm = ChatOpenAI(model_name=\"gpt-fake\")\n",
|
||||
"good_llm = ChatOpenAI(model_name=\"gpt-3.5-turbo\")\n",
|
||||
"llm = bad_llm.with_fallbacks([good_llm])\n",
|
||||
"\n",
|
||||
"llm.invoke(\"Why did the the chicken cross the road?\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "b8cf3982-03f6-49b3-8ff5-7cd12444f19c",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Looking at the trace, we can see that the first model failed but the second succeeded, so we still got an output: https://smith.langchain.com/public/dfaf0bf6-d86d-43e9-b084-dd16a56df15c/r\n",
|
||||
"\n",
|
||||
"We can add an arbitrary sequence of fallbacks, which will be executed in order:"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 3,
|
||||
"id": "31819be0-7f40-4e67-b5ab-61340027b948",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"AIMessage(content='To get to the other side.', additional_kwargs={}, example=False)"
|
||||
]
|
||||
},
|
||||
"execution_count": 3,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"llm = bad_llm.with_fallbacks([bad_llm, bad_llm, good_llm])\n",
|
||||
"\n",
|
||||
"llm.invoke(\"Why did the the chicken cross the road?\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "acad6e88-8046-450e-b005-db7e50f33b80",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Trace: https://smith.langchain.com/public/c09efd01-3184-4369-a225-c9da8efcaf47/r\n",
|
||||
"\n",
|
||||
"We can continue to use our Runnable with fallbacks the same way we use any Runnable, mean we can include it in sequences:"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 4,
|
||||
"id": "bab114a1-bb93-4b7e-a639-e7e00f21aebc",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"AIMessage(content='To show off its incredible jumping skills! Kangaroos are truly amazing creatures.', additional_kwargs={}, example=False)"
|
||||
]
|
||||
},
|
||||
"execution_count": 4,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"from langchain.prompts import ChatPromptTemplate\n",
|
||||
"\n",
|
||||
"prompt = ChatPromptTemplate.from_messages(\n",
|
||||
" [\n",
|
||||
" (\"system\", \"You're a nice assistant who always includes a compliment in your response\"),\n",
|
||||
" (\"human\", \"Why did the {animal} cross the road\"),\n",
|
||||
" ]\n",
|
||||
")\n",
|
||||
"chain = prompt | llm\n",
|
||||
"chain.invoke({\"animal\": \"kangaroo\"})"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "58340afa-8187-4ffe-9bd2-7912fb733a15",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Trace: https://smith.langchain.com/public/ba03895f-f8bd-4c70-81b7-8b930353eabd/r\n",
|
||||
"\n",
|
||||
"Note, since every sequence of Runnables is itself a Runnable, we can create fallbacks for whole Sequences. We can also continue using the full interface, including asynchronous calls, batched calls, and streams:"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 5,
|
||||
"id": "45aa3170-b2e6-430d-887b-bd879048060a",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"[\"\\n\\nAnswer: The rabbit crossed the road to get to the other side. That's quite clever of him!\",\n",
|
||||
" '\\n\\nAnswer: The turtle crossed the road to get to the other side. You must be pretty clever to come up with that riddle!']"
|
||||
]
|
||||
},
|
||||
"execution_count": 5,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"from langchain.llms import OpenAI\n",
|
||||
"from langchain.prompts import PromptTemplate\n",
|
||||
"\n",
|
||||
"chat_prompt = ChatPromptTemplate.from_messages(\n",
|
||||
" [\n",
|
||||
" (\"system\", \"You're a nice assistant who always includes a compliment in your response\"),\n",
|
||||
" (\"human\", \"Why did the {animal} cross the road\"),\n",
|
||||
" ]\n",
|
||||
")\n",
|
||||
"chat_model = ChatOpenAI(model_name=\"gpt-fake\")\n",
|
||||
"\n",
|
||||
"prompt_template = \"\"\"Instructions: You should always include a compliment in your response.\n",
|
||||
"\n",
|
||||
"Question: Why did the {animal} cross the road?\"\"\"\n",
|
||||
"prompt = PromptTemplate.from_template(prompt_template)\n",
|
||||
"llm = OpenAI()\n",
|
||||
"\n",
|
||||
"bad_chain = chat_prompt | chat_model\n",
|
||||
"good_chain = prompt | llm\n",
|
||||
"chain = bad_chain.with_fallbacks([good_chain])\n",
|
||||
"await chain.abatch([{\"animal\": \"rabbit\"}, {\"animal\": \"turtle\"}])"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "af6731c6-0c73-4b1d-a433-6e8f6ecce2bb",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Traces: \n",
|
||||
"1. https://smith.langchain.com/public/ccd73236-9ae5-48a6-94b5-41210be18a46/r\n",
|
||||
"2. https://smith.langchain.com/public/f43f608e-075c-45c7-bf73-b64e4d3f3082/r"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "3d2fe1fe-506b-4ee5-8056-8b9df801765f",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": []
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
@@ -1846,7 +1656,7 @@
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.9.1"
|
||||
"version": "3.10.1"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
|
||||
@@ -41,7 +41,7 @@
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 20,
|
||||
"execution_count": 1,
|
||||
"id": "466b65b3",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
@@ -256,6 +256,131 @@
|
||||
"source": [
|
||||
"await chain.abatch([{\"topic\": \"bears\"}])"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "0a1c409d",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Parallelism\n",
|
||||
"\n",
|
||||
"Let's take a look at how LangChain Expression Language support parralel requests as much as possible. For example, when using a RunnableMapping (often written as a dictionary) it executes each element in parralel."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 7,
|
||||
"id": "e3014c7a",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langchain.schema.runnable import RunnableMap\n",
|
||||
"chain1 = ChatPromptTemplate.from_template(\"tell me a joke about {topic}\") | model\n",
|
||||
"chain2 = ChatPromptTemplate.from_template(\"write a short (2 line) poem about {topic}\") | model\n",
|
||||
"combined = RunnableMap({\n",
|
||||
" \"joke\": chain1,\n",
|
||||
" \"poem\": chain2,\n",
|
||||
"})"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 11,
|
||||
"id": "08044c0a",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"CPU times: user 31.7 ms, sys: 8.59 ms, total: 40.3 ms\n",
|
||||
"Wall time: 1.05 s\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"AIMessage(content=\"Why don't bears like fast food?\\n\\nBecause they can't catch it!\", additional_kwargs={}, example=False)"
|
||||
]
|
||||
},
|
||||
"execution_count": 11,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"%%time\n",
|
||||
"chain1.invoke({\"topic\": \"bears\"})"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 12,
|
||||
"id": "22c56804",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"CPU times: user 42.9 ms, sys: 10.2 ms, total: 53 ms\n",
|
||||
"Wall time: 1.93 s\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"AIMessage(content=\"In forest's embrace, bears roam free,\\nSilent strength, nature's majesty.\", additional_kwargs={}, example=False)"
|
||||
]
|
||||
},
|
||||
"execution_count": 12,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"%%time\n",
|
||||
"chain2.invoke({\"topic\": \"bears\"})"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 13,
|
||||
"id": "4fff4cbb",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"CPU times: user 96.3 ms, sys: 20.4 ms, total: 117 ms\n",
|
||||
"Wall time: 1.1 s\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"{'joke': AIMessage(content=\"Why don't bears wear socks?\\n\\nBecause they have bear feet!\", additional_kwargs={}, example=False),\n",
|
||||
" 'poem': AIMessage(content=\"In forest's embrace,\\nMajestic bears leave their trace.\", additional_kwargs={}, example=False)}"
|
||||
]
|
||||
},
|
||||
"execution_count": 13,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"%%time\n",
|
||||
"combined.invoke({\"topic\": \"bears\"})"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "fab75d1d",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": []
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
@@ -274,7 +399,7 @@
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.9.1"
|
||||
"version": "3.10.1"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
|
||||
430
docs/extras/guides/fallbacks.ipynb
Normal file
@@ -0,0 +1,430 @@
|
||||
{
|
||||
"cells": [
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "19c9cbd6",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# Fallbacks\n",
|
||||
"\n",
|
||||
"When working with language models, you may often encounter issues from the underlying APIs, whether these be rate limiting or downtime. Therefore, as you go to move your LLM applications into production it becomes more and more important to safe guard against these. That's why we've introduced the concept of fallbacks.\n",
|
||||
"\n",
|
||||
"Crucially, fallbacks can be applied not only on the LLM level but on the whole runnable level. This is important because often times different models require different prompts. So if your call to OpenAI fails, you don't just want to send the same prompt to Anthropic - you probably want want to use a different prompt template and send a different version there."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "a6bb9ba9",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Handling LLM API Errors\n",
|
||||
"\n",
|
||||
"This is maybe the most common use case for fallbacks. A request to an LLM API can fail for a variety of reasons - the API could be down, you could have hit rate limits, any number of things. Therefore, using fallbacks can help protect against these types of things.\n",
|
||||
"\n",
|
||||
"IMPORTANT: By default, a lot of the LLM wrappers catch errors and retry. You will most likely want to turn those off when working with fallbacks. Otherwise the first wrapper will keep on retrying and not failing."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 18,
|
||||
"id": "d3e893bf",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langchain.chat_models import ChatOpenAI, ChatAnthropic"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "4847c82d",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"First, let's mock out what happens if we hit a RateLimitError from OpenAI"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 21,
|
||||
"id": "dfdd8bf5",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from unittest.mock import patch\n",
|
||||
"from openai.error import RateLimitError"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 24,
|
||||
"id": "e6fdffc1",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# Note that we set max_retries = 0 to avoid retrying on RateLimits, etc\n",
|
||||
"openai_llm = ChatOpenAI(max_retries=0)\n",
|
||||
"anthropic_llm = ChatAnthropic()\n",
|
||||
"llm = openai_llm.with_fallbacks([anthropic_llm])"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 27,
|
||||
"id": "584461ab",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"Hit error\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"# Let's use just the OpenAI LLm first, to show that we run into an error\n",
|
||||
"with patch('openai.ChatCompletion.create', side_effect=RateLimitError()):\n",
|
||||
" try:\n",
|
||||
" print(openai_llm.invoke(\"Why did the chicken cross the road?\"))\n",
|
||||
" except:\n",
|
||||
" print(\"Hit error\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 28,
|
||||
"id": "4fc1e673",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"content=' I don\\'t actually know why the chicken crossed the road, but here are some possible humorous answers:\\n\\n- To get to the other side!\\n\\n- It was too chicken to just stand there. \\n\\n- It wanted a change of scenery.\\n\\n- It wanted to show the possum it could be done.\\n\\n- It was on its way to a poultry farmers\\' convention.\\n\\nThe joke plays on the double meaning of \"the other side\" - literally crossing the road to the other side, or the \"other side\" meaning the afterlife. So it\\'s an anti-joke, with a silly or unexpected pun as the answer.' additional_kwargs={} example=False\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"# Now let's try with fallbacks to Anthropic\n",
|
||||
"with patch('openai.ChatCompletion.create', side_effect=RateLimitError()):\n",
|
||||
" try:\n",
|
||||
" print(llm.invoke(\"Why did the the chicken cross the road?\"))\n",
|
||||
" except:\n",
|
||||
" print(\"Hit error\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "f00bea25",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"We can use our \"LLM with Fallbacks\" as we would a normal LLM."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 29,
|
||||
"id": "4f8eaaa0",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"content=\" I don't actually know why the kangaroo crossed the road, but I can take a guess! Here are some possible reasons:\\n\\n- To get to the other side (the classic joke answer!)\\n\\n- It was trying to find some food or water \\n\\n- It was trying to find a mate during mating season\\n\\n- It was fleeing from a predator or perceived threat\\n\\n- It was disoriented and crossed accidentally \\n\\n- It was following a herd of other kangaroos who were crossing\\n\\n- It wanted a change of scenery or environment \\n\\n- It was trying to reach a new habitat or territory\\n\\nThe real reason is unknown without more context, but hopefully one of those potential explanations does the joke justice! Let me know if you have any other animal jokes I can try to decipher.\" additional_kwargs={} example=False\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"from langchain.prompts import ChatPromptTemplate\n",
|
||||
"\n",
|
||||
"prompt = ChatPromptTemplate.from_messages(\n",
|
||||
" [\n",
|
||||
" (\"system\", \"You're a nice assistant who always includes a compliment in your response\"),\n",
|
||||
" (\"human\", \"Why did the {animal} cross the road\"),\n",
|
||||
" ]\n",
|
||||
")\n",
|
||||
"chain = prompt | llm\n",
|
||||
"with patch('openai.ChatCompletion.create', side_effect=RateLimitError()):\n",
|
||||
" try:\n",
|
||||
" print(chain.invoke({\"animal\": \"kangaroo\"}))\n",
|
||||
" except:\n",
|
||||
" print(\"Hit error\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "8d62241b",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Fallbacks for Sequences\n",
|
||||
"\n",
|
||||
"We can also create fallbacks for sequences, that are sequences themselves. Here we do that with two different models: ChatOpenAI and then normal OpenAI (which does not use a chat model). Because OpenAI is NOT a chat model, you likely want a different prompt."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 30,
|
||||
"id": "6d0b8056",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# First let's create a chain with a ChatModel\n",
|
||||
"# We add in a string output parser here so the outputs between the two are the same type\n",
|
||||
"from langchain.schema.output_parser import StrOutputParser\n",
|
||||
"\n",
|
||||
"chat_prompt = ChatPromptTemplate.from_messages(\n",
|
||||
" [\n",
|
||||
" (\"system\", \"You're a nice assistant who always includes a compliment in your response\"),\n",
|
||||
" (\"human\", \"Why did the {animal} cross the road\"),\n",
|
||||
" ]\n",
|
||||
")\n",
|
||||
"# Here we're going to use a bad model name to easily create a chain that will error\n",
|
||||
"chat_model = ChatOpenAI(model_name=\"gpt-fake\")\n",
|
||||
"bad_chain = chat_prompt | chat_model | StrOutputParser()"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 31,
|
||||
"id": "8d1fc2a5",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# Now lets create a chain with the normal OpenAI model\n",
|
||||
"from langchain.llms import OpenAI\n",
|
||||
"from langchain.prompts import PromptTemplate\n",
|
||||
"\n",
|
||||
"prompt_template = \"\"\"Instructions: You should always include a compliment in your response.\n",
|
||||
"\n",
|
||||
"Question: Why did the {animal} cross the road?\"\"\"\n",
|
||||
"prompt = PromptTemplate.from_template(prompt_template)\n",
|
||||
"llm = OpenAI()\n",
|
||||
"good_chain = prompt | llm"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 32,
|
||||
"id": "283bfa44",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"'\\n\\nAnswer: The turtle crossed the road to get to the other side, and I have to say he had some impressive determination.'"
|
||||
]
|
||||
},
|
||||
"execution_count": 32,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"# We can now create a final chain which combines the two\n",
|
||||
"chain = bad_chain.with_fallbacks([good_chain])\n",
|
||||
"chain.invoke({\"animal\": \"turtle\"})"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "ec4685b4",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Handling Long Inputs\n",
|
||||
"\n",
|
||||
"One of the big limiting factors of LLMs in their context window. Usually you can count and track the length of prompts before sending them to an LLM, but in situations where that is hard/complicated you can fallback to a model with longer context length."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 34,
|
||||
"id": "564b84c9",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"short_llm = ChatOpenAI()\n",
|
||||
"long_llm = ChatOpenAI(model=\"gpt-3.5-turbo-16k\")\n",
|
||||
"llm = short_llm.with_fallbacks([long_llm])"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 38,
|
||||
"id": "5e27a775",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"inputs = \"What is the next number: \" + \", \".join([\"one\", \"two\"] * 3000)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 40,
|
||||
"id": "0a502731",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"This model's maximum context length is 4097 tokens. However, your messages resulted in 12012 tokens. Please reduce the length of the messages.\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"try:\n",
|
||||
" print(short_llm.invoke(inputs))\n",
|
||||
"except Exception as e:\n",
|
||||
" print(e)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 41,
|
||||
"id": "d91ba5d7",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"content='The next number in the sequence is two.' additional_kwargs={} example=False\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"try:\n",
|
||||
" print(llm.invoke(inputs))\n",
|
||||
"except Exception as e:\n",
|
||||
" print(e)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "2a6735df",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Fallback to Better Model\n",
|
||||
"\n",
|
||||
"Often times we ask models to output format in a specific format (like JSON). Models like GPT-3.5 can do this okay, but sometimes struggle. This naturally points to fallbacks - we can try with GPT-3.5 (faster, cheaper), but then if parsing fails we can use GPT-4."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 42,
|
||||
"id": "867a3793",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langchain.output_parsers import DatetimeOutputParser"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 67,
|
||||
"id": "b8d9959d",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"prompt = ChatPromptTemplate.from_template(\n",
|
||||
" \"what time was {event} (in %Y-%m-%dT%H:%M:%S.%fZ format - only return this value)\"\n",
|
||||
")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 75,
|
||||
"id": "98087a76",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# In this case we are going to do the fallbacks on the LLM + output parser level\n",
|
||||
"# Because the error will get raised in the OutputParser\n",
|
||||
"openai_35 = ChatOpenAI() | DatetimeOutputParser()\n",
|
||||
"openai_4 = ChatOpenAI(model=\"gpt-4\")| DatetimeOutputParser()"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 77,
|
||||
"id": "17ec9e8f",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"only_35 = prompt | openai_35 \n",
|
||||
"fallback_4 = prompt | openai_35.with_fallbacks([openai_4])"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 80,
|
||||
"id": "7e536f0b",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"Error: Could not parse datetime string: The Super Bowl in 1994 took place on January 30th at 3:30 PM local time. Converting this to the specified format (%Y-%m-%dT%H:%M:%S.%fZ) results in: 1994-01-30T15:30:00.000Z\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"try:\n",
|
||||
" print(only_35.invoke({\"event\": \"the superbowl in 1994\"}))\n",
|
||||
"except Exception as e:\n",
|
||||
" print(f\"Error: {e}\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 81,
|
||||
"id": "01355c5e",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"1994-01-30 15:30:00\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"try:\n",
|
||||
" print(fallback_4.invoke({\"event\": \"the superbowl in 1994\"}))\n",
|
||||
"except Exception as e:\n",
|
||||
" print(f\"Error: {e}\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "c537f9d0",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": []
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
"kernelspec": {
|
||||
"display_name": "Python 3 (ipykernel)",
|
||||
"language": "python",
|
||||
"name": "python3"
|
||||
},
|
||||
"language_info": {
|
||||
"codemirror_mode": {
|
||||
"name": "ipython",
|
||||
"version": 3
|
||||
},
|
||||
"file_extension": ".py",
|
||||
"mimetype": "text/x-python",
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.9.1"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
"nbformat_minor": 5
|
||||
}
|
||||
807
docs/extras/guides/local_llms.ipynb
Normal file
@@ -0,0 +1,807 @@
|
||||
{
|
||||
"cells": [
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "b8982428",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# Private, local, open source LLMs\n",
|
||||
"\n",
|
||||
"## Use case\n",
|
||||
"\n",
|
||||
"The popularity of projects like [PrivateGPT](https://github.com/imartinez/privateGPT), [llama.cpp](https://github.com/ggerganov/llama.cpp), and [GPT4All](https://github.com/nomic-ai/gpt4all) underscore the demand to run LLMs locally (on your own device).\n",
|
||||
"\n",
|
||||
"This has at least two important benefits:\n",
|
||||
"\n",
|
||||
"1. `Privacy`: Your data is not sent to a third party, and it is not subject to the terms of service of a commercial service\n",
|
||||
"2. `Cost`: There is no inference fee, which is important for token-intensive applications (e.g., [long-running simulations](https://twitter.com/RLanceMartin/status/1691097659262820352?s=20), summarization)\n",
|
||||
"\n",
|
||||
"## Overview\n",
|
||||
"\n",
|
||||
"Running an LLM locally requires a few things:\n",
|
||||
"\n",
|
||||
"1. `Open source LLM`: An open source LLM that can be freely modified and shared \n",
|
||||
"2. `Inference`: Ability to run this LLM on your device w/ acceptable latency\n",
|
||||
"\n",
|
||||
"### Open Source LLMs\n",
|
||||
"\n",
|
||||
"Users can now gain access to a rapidly growing set of [open source LLMs](https://cameronrwolfe.substack.com/p/the-history-of-open-source-llms-better). \n",
|
||||
"\n",
|
||||
"These LLMs can be assessed across at least two dimentions (see figure):\n",
|
||||
" \n",
|
||||
"1. `Base model`: What is the base-model and how was it trained?\n",
|
||||
"2. `Fine-tuning approach`: Was the base-model fine-tuned and, if so, what [set of instructions](https://cameronrwolfe.substack.com/p/beyond-llama-the-power-of-open-llms#%C2%A7alpaca-an-instruction-following-llama-model) was used?\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"The relative performance of these models can be assessed using several leaderboards, including:\n",
|
||||
"\n",
|
||||
"1. [LmSys](https://chat.lmsys.org/?arena)\n",
|
||||
"2. [GPT4All](https://gpt4all.io/index.html)\n",
|
||||
"3. [HuggingFace](https://huggingface.co/spaces/lmsys/chatbot-arena-leaderboard)\n",
|
||||
"\n",
|
||||
"### Inference\n",
|
||||
"\n",
|
||||
"A few frameworks for this have emerged to support inference of open source LLMs on various devices:\n",
|
||||
"\n",
|
||||
"1. [`llama.cpp`](https://github.com/ggerganov/llama.cpp): C++ implementation of llama inference code with [weight optimization / quantization](https://finbarr.ca/how-is-llama-cpp-possible/)\n",
|
||||
"2. [`gpt4all`](https://docs.gpt4all.io/index.html): Optimized C backend for inference\n",
|
||||
"3. [`Ollama`](https://ollama.ai/): Bundles model weights and environment into an app that runs on device and serves the LLM \n",
|
||||
"\n",
|
||||
"In general, these frameworks will do a few things:\n",
|
||||
"\n",
|
||||
"1. `Quantization`: Reduce the memory footprint of the raw model weights\n",
|
||||
"2. `Efficient implementation for inference`: Support inference on consumer hardware (e.g., CPU or laptop GPU)\n",
|
||||
"\n",
|
||||
"In particular, see [this excellent post](https://finbarr.ca/how-is-llama-cpp-possible/) on the importance of quantization.\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"With less precision, we radically decrease the memory needed to store the LLM in memory.\n",
|
||||
"\n",
|
||||
"In addition, we can see the importance of GPU memory bandwidth [sheet](https://docs.google.com/spreadsheets/d/1OehfHHNSn66BP2h3Bxp2NJTVX97icU0GmCXF6pK23H8/edit#gid=0)!\n",
|
||||
"\n",
|
||||
"A Mac M2 Max is 5-6x faster than a M1 for inference due to the larger GPU memory bandwidth.\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"## Quickstart\n",
|
||||
"\n",
|
||||
"[`Ollama`](https://ollama.ai/) is one way to easily run inference on macOS.\n",
|
||||
" \n",
|
||||
"The instructions [here](docs/integrations/llms/ollama) provide details, which we summarize:\n",
|
||||
" \n",
|
||||
"* [Download and run](https://ollama.ai/download) the app\n",
|
||||
"* From command line, fetch a model from this [list of options](https://github.com/jmorganca/ollama): e.g., `ollama pull llama2`\n",
|
||||
"* When the app is running, all models are automatically served on `localhost:11434`\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 2,
|
||||
"id": "86178adb",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"' The first man on the moon was Neil Armstrong, who landed on the moon on July 20, 1969 as part of the Apollo 11 mission. obviously.'"
|
||||
]
|
||||
},
|
||||
"execution_count": 2,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"from langchain.llms import Ollama\n",
|
||||
"llm = Ollama(model=\"llama2\")\n",
|
||||
"llm(\"The first man on the moon was ...\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "343ab645",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Stream tokens as they are being generated."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 40,
|
||||
"id": "9cd83603",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
" The first man to walk on the moon was Neil Armstrong, an American astronaut who was part of the Apollo 11 mission in 1969. февруари 20, 1969, Armstrong stepped out of the lunar module Eagle and onto the moon's surface, famously declaring \"That's one small step for man, one giant leap for mankind\" as he took his first steps. He was followed by fellow astronaut Edwin \"Buzz\" Aldrin, who also walked on the moon during the mission."
|
||||
]
|
||||
},
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"' The first man to walk on the moon was Neil Armstrong, an American astronaut who was part of the Apollo 11 mission in 1969. февруари 20, 1969, Armstrong stepped out of the lunar module Eagle and onto the moon\\'s surface, famously declaring \"That\\'s one small step for man, one giant leap for mankind\" as he took his first steps. He was followed by fellow astronaut Edwin \"Buzz\" Aldrin, who also walked on the moon during the mission.'"
|
||||
]
|
||||
},
|
||||
"execution_count": 40,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"from langchain.callbacks.manager import CallbackManager\n",
|
||||
"from langchain.callbacks.streaming_stdout import StreamingStdOutCallbackHandler \n",
|
||||
"llm = Ollama(model=\"llama2\", \n",
|
||||
" callback_manager = CallbackManager([StreamingStdOutCallbackHandler()]))\n",
|
||||
"llm(\"The first man on the moon was ...\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "5cb27414",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Environment\n",
|
||||
"\n",
|
||||
"Inference speed is a chllenge when running models locally (see above).\n",
|
||||
"\n",
|
||||
"To minimize latency, it is desiable to run models locally on GPU, which ships with many consumer laptops [e.g., Apple devices](https://www.apple.com/newsroom/2022/06/apple-unveils-m2-with-breakthrough-performance-and-capabilities/).\n",
|
||||
"\n",
|
||||
"And even with GPU, the available GPU memory bandwidth (as noted above) is important.\n",
|
||||
"\n",
|
||||
"### Running Apple silicon GPU\n",
|
||||
"\n",
|
||||
"`Ollama` will automatically utilize the GPU on Apple devices.\n",
|
||||
" \n",
|
||||
"Other frameworks require the user to set up the environment to utilize the Apple GPU.\n",
|
||||
"\n",
|
||||
"For example, `llama.cpp` python bindings can be configured to use the GPU via [Metal](https://developer.apple.com/metal/).\n",
|
||||
"\n",
|
||||
"Metal is a graphics and compute API created by Apple providing near-direct access to the GPU. \n",
|
||||
"\n",
|
||||
"See the [`llama.cpp`](docs/integrations/llms/llamacpp) setup [here](https://github.com/abetlen/llama-cpp-python/blob/main/docs/install/macos.md) to enable this.\n",
|
||||
"\n",
|
||||
"In particular, ensure that conda is using the correct virtual enviorment that you created (`miniforge3`).\n",
|
||||
"\n",
|
||||
"E.g., for me:\n",
|
||||
"\n",
|
||||
"```\n",
|
||||
"conda activate /Users/rlm/miniforge3/envs/llama\n",
|
||||
"```\n",
|
||||
"\n",
|
||||
"With the above confirmed, then:\n",
|
||||
"\n",
|
||||
"```\n",
|
||||
"CMAKE_ARGS=\"-DLLAMA_METAL=on\" FORCE_CMAKE=1 pip install -U llama-cpp-python --no-cache-dir\n",
|
||||
"```"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "c382e79a",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## LLMs\n",
|
||||
"\n",
|
||||
"There are various ways to gain access to quantized model weights.\n",
|
||||
"\n",
|
||||
"1. [`HuggingFace`](https://huggingface.co/TheBloke) - Many quantized model are available for download and can be run with framework such as [`llama.cpp`](https://github.com/ggerganov/llama.cpp)\n",
|
||||
"2. [`gpt4all`](https://gpt4all.io/index.html) - The model explorer offers a leaderboard of metrics and associated quantized models available for download \n",
|
||||
"3. [`Ollama`](https://github.com/jmorganca/ollama) - Several models can be accessed directly via `pull`\n",
|
||||
"\n",
|
||||
"### Ollama\n",
|
||||
"\n",
|
||||
"With [Ollama](docs/integrations/llms/ollama), fetch a model via `ollama pull <model family>:<tag>`:\n",
|
||||
"\n",
|
||||
"* E.g., for Llama-7b: `ollama pull llama2` will download the most basic version of the model (e.g., smallest # parameters and 4 bit quantization)\n",
|
||||
"* We can also specify a particular version from the [model list](https://github.com/jmorganca/ollama), e.g., `ollama pull llama2:13b`\n",
|
||||
"* See the full set of parameters on the [API reference page](https://api.python.langchain.com/en/latest/llms/langchain.llms.ollama.Ollama.html)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 42,
|
||||
"id": "8ecd2f78",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"' Sure! Here\\'s the answer, broken down step by step:\\n\\nThe first man on the moon was... Neil Armstrong.\\n\\nHere\\'s how I arrived at that answer:\\n\\n1. The first manned mission to land on the moon was Apollo 11.\\n2. The mission included three astronauts: Neil Armstrong, Edwin \"Buzz\" Aldrin, and Michael Collins.\\n3. Neil Armstrong was the mission commander and the first person to set foot on the moon.\\n4. On July 20, 1969, Armstrong stepped out of the lunar module Eagle and onto the moon\\'s surface, famously declaring \"That\\'s one small step for man, one giant leap for mankind.\"\\n\\nSo, the first man on the moon was Neil Armstrong!'"
|
||||
]
|
||||
},
|
||||
"execution_count": 42,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"from langchain.llms import Ollama\n",
|
||||
"llm = Ollama(model=\"llama2:13b\")\n",
|
||||
"llm(\"The first man on the moon was ... think step by step\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "07c8c0d1",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Llama.cpp\n",
|
||||
"\n",
|
||||
"Llama.cpp is compatible with a [broad set of models](https://github.com/ggerganov/llama.cpp).\n",
|
||||
"\n",
|
||||
"For example, below we run inference on `llama2-13b` with 4 bit quantization downloaded from [HuggingFace](https://huggingface.co/TheBloke/Llama-2-13B-GGML/tree/main).\n",
|
||||
"\n",
|
||||
"As noted above, see the [API reference](https://api.python.langchain.com/en/latest/llms/langchain.llms.llamacpp.LlamaCpp.html?highlight=llamacpp#langchain.llms.llamacpp.LlamaCpp) for the full set of parameters. \n",
|
||||
"\n",
|
||||
"From the [llama.cpp docs](https://python.langchain.com/docs/integrations/llms/llamacpp), a few are worth commenting on:\n",
|
||||
"\n",
|
||||
"`n_gpu_layers`: number of layers to be loaded into GPU memory\n",
|
||||
"\n",
|
||||
"* Value: 1\n",
|
||||
"* Meaning: Only one layer of the model will be loaded into GPU memory (1 is often sufficient).\n",
|
||||
"\n",
|
||||
"`n_batch`: number of tokens the model should process in parallel \n",
|
||||
"* Value: n_batch\n",
|
||||
"* Meaning: It's recommended to choose a value between 1 and n_ctx (which in this case is set to 2048)\n",
|
||||
"\n",
|
||||
"`n_ctx`: Token context window .\n",
|
||||
"* Value: 2048\n",
|
||||
"* Meaning: The model will consider a window of 2048 tokens at a time\n",
|
||||
"\n",
|
||||
"`f16_kv`: whether the model should use half-precision for the key/value cache\n",
|
||||
"* Value: True\n",
|
||||
"* Meaning: The model will use half-precision, which can be more memory efficient; Metal only support True."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "5eba38dc",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"pip install llama-cpp-python"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 43,
|
||||
"id": "9d5f94b5",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stderr",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"objc[10142]: Class GGMLMetalClass is implemented in both /Users/rlm/miniforge3/envs/llama/lib/python3.9/site-packages/gpt4all/llmodel_DO_NOT_MODIFY/build/libreplit-mainline-metal.dylib (0x2a0c4c208) and /Users/rlm/miniforge3/envs/llama/lib/python3.9/site-packages/llama_cpp/libllama.dylib (0x2c28bc208). One of the two will be used. Which one is undefined.\n",
|
||||
"llama.cpp: loading model from /Users/rlm/Desktop/Code/llama.cpp/llama-2-13b-chat.ggmlv3.q4_0.bin\n",
|
||||
"llama_model_load_internal: format = ggjt v3 (latest)\n",
|
||||
"llama_model_load_internal: n_vocab = 32000\n",
|
||||
"llama_model_load_internal: n_ctx = 2048\n",
|
||||
"llama_model_load_internal: n_embd = 5120\n",
|
||||
"llama_model_load_internal: n_mult = 256\n",
|
||||
"llama_model_load_internal: n_head = 40\n",
|
||||
"llama_model_load_internal: n_layer = 40\n",
|
||||
"llama_model_load_internal: n_rot = 128\n",
|
||||
"llama_model_load_internal: freq_base = 10000.0\n",
|
||||
"llama_model_load_internal: freq_scale = 1\n",
|
||||
"llama_model_load_internal: ftype = 2 (mostly Q4_0)\n",
|
||||
"llama_model_load_internal: n_ff = 13824\n",
|
||||
"llama_model_load_internal: model size = 13B\n",
|
||||
"llama_model_load_internal: ggml ctx size = 0.09 MB\n",
|
||||
"llama_model_load_internal: mem required = 8953.71 MB (+ 1608.00 MB per state)\n",
|
||||
"llama_new_context_with_model: kv self size = 1600.00 MB\n",
|
||||
"ggml_metal_init: allocating\n",
|
||||
"ggml_metal_init: using MPS\n",
|
||||
"ggml_metal_init: loading '/Users/rlm/miniforge3/envs/llama/lib/python3.9/site-packages/llama_cpp/ggml-metal.metal'\n",
|
||||
"ggml_metal_init: loaded kernel_add 0x47774af60\n",
|
||||
"ggml_metal_init: loaded kernel_mul 0x47774bc00\n",
|
||||
"ggml_metal_init: loaded kernel_mul_row 0x47774c230\n",
|
||||
"ggml_metal_init: loaded kernel_scale 0x47774c890\n",
|
||||
"ggml_metal_init: loaded kernel_silu 0x47774cef0\n",
|
||||
"ggml_metal_init: loaded kernel_relu 0x10e33e500\n",
|
||||
"ggml_metal_init: loaded kernel_gelu 0x47774b2f0\n",
|
||||
"ggml_metal_init: loaded kernel_soft_max 0x47771a580\n",
|
||||
"ggml_metal_init: loaded kernel_diag_mask_inf 0x47774dab0\n",
|
||||
"ggml_metal_init: loaded kernel_get_rows_f16 0x47774e110\n",
|
||||
"ggml_metal_init: loaded kernel_get_rows_q4_0 0x47774e7d0\n",
|
||||
"ggml_metal_init: loaded kernel_get_rows_q4_1 0x13efd7170\n",
|
||||
"ggml_metal_init: loaded kernel_get_rows_q2_K 0x13efd73d0\n",
|
||||
"ggml_metal_init: loaded kernel_get_rows_q3_K 0x13efd7630\n",
|
||||
"ggml_metal_init: loaded kernel_get_rows_q4_K 0x13efd7890\n",
|
||||
"ggml_metal_init: loaded kernel_get_rows_q5_K 0x4744c9740\n",
|
||||
"ggml_metal_init: loaded kernel_get_rows_q6_K 0x4744ca6b0\n",
|
||||
"ggml_metal_init: loaded kernel_rms_norm 0x4744cb250\n",
|
||||
"ggml_metal_init: loaded kernel_norm 0x4744cb970\n",
|
||||
"ggml_metal_init: loaded kernel_mul_mat_f16_f32 0x10e33f700\n",
|
||||
"ggml_metal_init: loaded kernel_mul_mat_q4_0_f32 0x10e33fcd0\n",
|
||||
"ggml_metal_init: loaded kernel_mul_mat_q4_1_f32 0x4744cc2d0\n",
|
||||
"ggml_metal_init: loaded kernel_mul_mat_q2_K_f32 0x4744cc6f0\n",
|
||||
"ggml_metal_init: loaded kernel_mul_mat_q3_K_f32 0x4744cd6b0\n",
|
||||
"ggml_metal_init: loaded kernel_mul_mat_q4_K_f32 0x4744cde20\n",
|
||||
"ggml_metal_init: loaded kernel_mul_mat_q5_K_f32 0x10e33ff30\n",
|
||||
"ggml_metal_init: loaded kernel_mul_mat_q6_K_f32 0x10e340190\n",
|
||||
"ggml_metal_init: loaded kernel_rope 0x10e3403f0\n",
|
||||
"ggml_metal_init: loaded kernel_alibi_f32 0x10e340de0\n",
|
||||
"ggml_metal_init: loaded kernel_cpy_f32_f16 0x10e3416d0\n",
|
||||
"ggml_metal_init: loaded kernel_cpy_f32_f32 0x10e342080\n",
|
||||
"ggml_metal_init: loaded kernel_cpy_f16_f16 0x10e342ca0\n",
|
||||
"ggml_metal_init: recommendedMaxWorkingSetSize = 21845.34 MB\n",
|
||||
"ggml_metal_init: hasUnifiedMemory = true\n",
|
||||
"ggml_metal_init: maxTransferRate = built-in GPU\n",
|
||||
"ggml_metal_add_buffer: allocated 'data ' buffer, size = 6984.06 MB, ( 6986.19 / 21845.34)\n",
|
||||
"ggml_metal_add_buffer: allocated 'eval ' buffer, size = 1032.00 MB, ( 8018.19 / 21845.34)\n",
|
||||
"ggml_metal_add_buffer: allocated 'kv ' buffer, size = 1602.00 MB, ( 9620.19 / 21845.34)\n",
|
||||
"ggml_metal_add_buffer: allocated 'scr0 ' buffer, size = 426.00 MB, (10046.19 / 21845.34)\n",
|
||||
"ggml_metal_add_buffer: allocated 'scr1 ' buffer, size = 512.00 MB, (10558.19 / 21845.34)\n",
|
||||
"AVX = 0 | AVX2 = 0 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 0 | NEON = 1 | ARM_FMA = 1 | F16C = 0 | FP16_VA = 1 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 0 | VSX = 0 | \n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"from langchain.llms import LlamaCpp\n",
|
||||
"llm = LlamaCpp(\n",
|
||||
" model_path=\"/Users/rlm/Desktop/Code/llama.cpp/llama-2-13b-chat.ggmlv3.q4_0.bin\",\n",
|
||||
" n_gpu_layers=1,\n",
|
||||
" n_batch=512,\n",
|
||||
" n_ctx=2048,\n",
|
||||
" f16_kv=True, \n",
|
||||
" callback_manager=CallbackManager([StreamingStdOutCallbackHandler()]),\n",
|
||||
" verbose=True,\n",
|
||||
")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "f56f5168",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"The console log will show the the below to indicate Metal was enabled properly from steps above:\n",
|
||||
"```\n",
|
||||
"ggml_metal_init: allocating\n",
|
||||
"ggml_metal_init: using MPS\n",
|
||||
"```"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 45,
|
||||
"id": "7890a077",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stderr",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"Llama.generate: prefix-match hit\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
" and use logical reasoning to figure out who the first man on the moon was.\n",
|
||||
"\n",
|
||||
"Here are some clues:\n",
|
||||
"\n",
|
||||
"1. The first man on the moon was an American.\n",
|
||||
"2. He was part of the Apollo 11 mission.\n",
|
||||
"3. He stepped out of the lunar module and became the first person to set foot on the moon's surface.\n",
|
||||
"4. His last name is Armstrong.\n",
|
||||
"\n",
|
||||
"Now, let's use our reasoning skills to figure out who the first man on the moon was. Based on clue #1, we know that the first man on the moon was an American. Clue #2 tells us that he was part of the Apollo 11 mission. Clue #3 reveals that he was the first person to set foot on the moon's surface. And finally, clue #4 gives us his last name: Armstrong.\n",
|
||||
"Therefore, the first man on the moon was Neil Armstrong!"
|
||||
]
|
||||
},
|
||||
{
|
||||
"name": "stderr",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"\n",
|
||||
"llama_print_timings: load time = 9623.21 ms\n",
|
||||
"llama_print_timings: sample time = 143.77 ms / 203 runs ( 0.71 ms per token, 1412.01 tokens per second)\n",
|
||||
"llama_print_timings: prompt eval time = 485.94 ms / 7 tokens ( 69.42 ms per token, 14.40 tokens per second)\n",
|
||||
"llama_print_timings: eval time = 6385.16 ms / 202 runs ( 31.61 ms per token, 31.64 tokens per second)\n",
|
||||
"llama_print_timings: total time = 7279.28 ms\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"\" and use logical reasoning to figure out who the first man on the moon was.\\n\\nHere are some clues:\\n\\n1. The first man on the moon was an American.\\n2. He was part of the Apollo 11 mission.\\n3. He stepped out of the lunar module and became the first person to set foot on the moon's surface.\\n4. His last name is Armstrong.\\n\\nNow, let's use our reasoning skills to figure out who the first man on the moon was. Based on clue #1, we know that the first man on the moon was an American. Clue #2 tells us that he was part of the Apollo 11 mission. Clue #3 reveals that he was the first person to set foot on the moon's surface. And finally, clue #4 gives us his last name: Armstrong.\\nTherefore, the first man on the moon was Neil Armstrong!\""
|
||||
]
|
||||
},
|
||||
"execution_count": 45,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"llm(\"The first man on the moon was ... Let's think step by step\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "831ddf7c",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### GPT4All\n",
|
||||
"\n",
|
||||
"We can use model weights downloaded from [GPT4All](https://python.langchain.com/docs/integrations/llms/gpt4all) model explorer.\n",
|
||||
"\n",
|
||||
"Similar to what is shown above, we can run inference and use [the API reference](https://api.python.langchain.com/en/latest/llms/langchain.llms.gpt4all.GPT4All.html?highlight=gpt4all#langchain.llms.gpt4all.GPT4All) to set parameters of interest."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "e27baf6e",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"pip install gpt4all"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 46,
|
||||
"id": "b55a2147",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"Found model file at /Users/rlm/Desktop/Code/gpt4all/models/nous-hermes-13b.ggmlv3.q4_0.bin\n",
|
||||
"llama_new_context_with_model: max tensor size = 87.89 MB\n",
|
||||
"llama_new_context_with_model: max tensor size = 87.89 MB\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"name": "stderr",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"llama.cpp: using Metal\n",
|
||||
"llama.cpp: loading model from /Users/rlm/Desktop/Code/gpt4all/models/nous-hermes-13b.ggmlv3.q4_0.bin\n",
|
||||
"llama_model_load_internal: format = ggjt v3 (latest)\n",
|
||||
"llama_model_load_internal: n_vocab = 32001\n",
|
||||
"llama_model_load_internal: n_ctx = 2048\n",
|
||||
"llama_model_load_internal: n_embd = 5120\n",
|
||||
"llama_model_load_internal: n_mult = 256\n",
|
||||
"llama_model_load_internal: n_head = 40\n",
|
||||
"llama_model_load_internal: n_layer = 40\n",
|
||||
"llama_model_load_internal: n_rot = 128\n",
|
||||
"llama_model_load_internal: ftype = 2 (mostly Q4_0)\n",
|
||||
"llama_model_load_internal: n_ff = 13824\n",
|
||||
"llama_model_load_internal: n_parts = 1\n",
|
||||
"llama_model_load_internal: model size = 13B\n",
|
||||
"llama_model_load_internal: ggml ctx size = 0.09 MB\n",
|
||||
"llama_model_load_internal: mem required = 9031.71 MB (+ 1608.00 MB per state)\n",
|
||||
"llama_new_context_with_model: kv self size = 1600.00 MB\n",
|
||||
"ggml_metal_init: allocating\n",
|
||||
"ggml_metal_init: using MPS\n",
|
||||
"ggml_metal_init: loading '/Users/rlm/miniforge3/envs/llama/lib/python3.9/site-packages/gpt4all/llmodel_DO_NOT_MODIFY/build/ggml-metal.metal'\n",
|
||||
"ggml_metal_init: loaded kernel_add 0x37944d850\n",
|
||||
"ggml_metal_init: loaded kernel_mul 0x37944f350\n",
|
||||
"ggml_metal_init: loaded kernel_mul_row 0x37944fdd0\n",
|
||||
"ggml_metal_init: loaded kernel_scale 0x3794505a0\n",
|
||||
"ggml_metal_init: loaded kernel_silu 0x379450800\n",
|
||||
"ggml_metal_init: loaded kernel_relu 0x379450a60\n",
|
||||
"ggml_metal_init: loaded kernel_gelu 0x379450cc0\n",
|
||||
"ggml_metal_init: loaded kernel_soft_max 0x379450ff0\n",
|
||||
"ggml_metal_init: loaded kernel_diag_mask_inf 0x379451250\n",
|
||||
"ggml_metal_init: loaded kernel_get_rows_f16 0x3794514b0\n",
|
||||
"ggml_metal_init: loaded kernel_get_rows_q4_0 0x379451710\n",
|
||||
"ggml_metal_init: loaded kernel_get_rows_q4_1 0x379451970\n",
|
||||
"ggml_metal_init: loaded kernel_get_rows_q2_k 0x379451bd0\n",
|
||||
"ggml_metal_init: loaded kernel_get_rows_q3_k 0x379451e30\n",
|
||||
"ggml_metal_init: loaded kernel_get_rows_q4_k 0x379452090\n",
|
||||
"ggml_metal_init: loaded kernel_get_rows_q5_k 0x3794522f0\n",
|
||||
"ggml_metal_init: loaded kernel_get_rows_q6_k 0x379452550\n",
|
||||
"ggml_metal_init: loaded kernel_rms_norm 0x3794527b0\n",
|
||||
"ggml_metal_init: loaded kernel_norm 0x379452a10\n",
|
||||
"ggml_metal_init: loaded kernel_mul_mat_f16_f32 0x379452c70\n",
|
||||
"ggml_metal_init: loaded kernel_mul_mat_q4_0_f32 0x379452ed0\n",
|
||||
"ggml_metal_init: loaded kernel_mul_mat_q4_1_f32 0x379453130\n",
|
||||
"ggml_metal_init: loaded kernel_mul_mat_q2_k_f32 0x379453390\n",
|
||||
"ggml_metal_init: loaded kernel_mul_mat_q3_k_f32 0x3794535f0\n",
|
||||
"ggml_metal_init: loaded kernel_mul_mat_q4_k_f32 0x379453850\n",
|
||||
"ggml_metal_init: loaded kernel_mul_mat_q5_k_f32 0x379453ab0\n",
|
||||
"ggml_metal_init: loaded kernel_mul_mat_q6_k_f32 0x379453d10\n",
|
||||
"ggml_metal_init: loaded kernel_rope 0x379453f70\n",
|
||||
"ggml_metal_init: loaded kernel_alibi_f32 0x3794541d0\n",
|
||||
"ggml_metal_init: loaded kernel_cpy_f32_f16 0x379454430\n",
|
||||
"ggml_metal_init: loaded kernel_cpy_f32_f32 0x379454690\n",
|
||||
"ggml_metal_init: loaded kernel_cpy_f16_f16 0x3794548f0\n",
|
||||
"ggml_metal_init: recommendedMaxWorkingSetSize = 21845.34 MB\n",
|
||||
"ggml_metal_init: hasUnifiedMemory = true\n",
|
||||
"ggml_metal_init: maxTransferRate = built-in GPU\n",
|
||||
"ggml_metal_add_buffer: allocated 'data ' buffer, size = 6984.06 MB, (17542.94 / 21845.34)\n",
|
||||
"ggml_metal_add_buffer: allocated 'eval ' buffer, size = 1024.00 MB, (18566.94 / 21845.34)\n",
|
||||
"ggml_metal_add_buffer: allocated 'kv ' buffer, size = 1602.00 MB, (20168.94 / 21845.34)\n",
|
||||
"ggml_metal_add_buffer: allocated 'scr0 ' buffer, size = 512.00 MB, (20680.94 / 21845.34)\n",
|
||||
"ggml_metal_add_buffer: allocated 'scr1 ' buffer, size = 512.00 MB, (21192.94 / 21845.34)\n",
|
||||
"ggml_metal_free: deallocating\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"from langchain.llms import GPT4All\n",
|
||||
"llm = GPT4All(model=\"/Users/rlm/Desktop/Code/gpt4all/models/nous-hermes-13b.ggmlv3.q4_0.bin\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 47,
|
||||
"id": "e3d4526f",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"\".\\n1) The United States decides to send a manned mission to the moon.2) They choose their best astronauts and train them for this specific mission.3) They build a spacecraft that can take humans to the moon, called the Lunar Module (LM).4) They also create a larger spacecraft, called the Saturn V rocket, which will launch both the LM and the Command Service Module (CSM), which will carry the astronauts into orbit.5) The mission is planned down to the smallest detail: from the trajectory of the rockets to the exact movements of the astronauts during their moon landing.6) On July 16, 1969, the Saturn V rocket launches from Kennedy Space Center in Florida, carrying the Apollo 11 mission crew into space.7) After one and a half orbits around the Earth, the LM separates from the CSM and begins its descent to the moon's surface.8) On July 20, 1969, at 2:56 pm EDT (GMT-4), Neil Armstrong becomes the first man on the moon. He speaks these\""
|
||||
]
|
||||
},
|
||||
"execution_count": 47,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"llm(\"The first man on the moon was ... Let's think step by step\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "6b84e543",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Prompts\n",
|
||||
"\n",
|
||||
"Some LLMs will benefit from specific prompts.\n",
|
||||
"\n",
|
||||
"For example, llama2 can use [special tokens](https://twitter.com/RLanceMartin/status/1681879318493003776?s=20).\n",
|
||||
"\n",
|
||||
"We can use `ConditionalPromptSelector` to set prompt based on the model type."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 57,
|
||||
"id": "d082b10a",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stderr",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"llama.cpp: loading model from /Users/rlm/Desktop/Code/llama.cpp/llama-2-13b-chat.ggmlv3.q4_0.bin\n",
|
||||
"llama_model_load_internal: format = ggjt v3 (latest)\n",
|
||||
"llama_model_load_internal: n_vocab = 32000\n",
|
||||
"llama_model_load_internal: n_ctx = 2048\n",
|
||||
"llama_model_load_internal: n_embd = 5120\n",
|
||||
"llama_model_load_internal: n_mult = 256\n",
|
||||
"llama_model_load_internal: n_head = 40\n",
|
||||
"llama_model_load_internal: n_layer = 40\n",
|
||||
"llama_model_load_internal: n_rot = 128\n",
|
||||
"llama_model_load_internal: freq_base = 10000.0\n",
|
||||
"llama_model_load_internal: freq_scale = 1\n",
|
||||
"llama_model_load_internal: ftype = 2 (mostly Q4_0)\n",
|
||||
"llama_model_load_internal: n_ff = 13824\n",
|
||||
"llama_model_load_internal: model size = 13B\n",
|
||||
"llama_model_load_internal: ggml ctx size = 0.09 MB\n",
|
||||
"llama_model_load_internal: mem required = 8953.71 MB (+ 1608.00 MB per state)\n",
|
||||
"llama_new_context_with_model: kv self size = 1600.00 MB\n",
|
||||
"ggml_metal_init: allocating\n",
|
||||
"ggml_metal_init: using MPS\n",
|
||||
"ggml_metal_init: loading '/Users/rlm/miniforge3/envs/llama/lib/python3.9/site-packages/llama_cpp/ggml-metal.metal'\n",
|
||||
"ggml_metal_init: loaded kernel_add 0x4744d09d0\n",
|
||||
"ggml_metal_init: loaded kernel_mul 0x3781cb3d0\n",
|
||||
"ggml_metal_init: loaded kernel_mul_row 0x37813bb60\n",
|
||||
"ggml_metal_init: loaded kernel_scale 0x474481080\n",
|
||||
"ggml_metal_init: loaded kernel_silu 0x4744d29f0\n",
|
||||
"ggml_metal_init: loaded kernel_relu 0x3781254c0\n",
|
||||
"ggml_metal_init: loaded kernel_gelu 0x47447f280\n",
|
||||
"ggml_metal_init: loaded kernel_soft_max 0x4744cf470\n",
|
||||
"ggml_metal_init: loaded kernel_diag_mask_inf 0x4744cf6d0\n",
|
||||
"ggml_metal_init: loaded kernel_get_rows_f16 0x4744cf930\n",
|
||||
"ggml_metal_init: loaded kernel_get_rows_q4_0 0x4744cfb90\n",
|
||||
"ggml_metal_init: loaded kernel_get_rows_q4_1 0x4744cfdf0\n",
|
||||
"ggml_metal_init: loaded kernel_get_rows_q2_K 0x4744d0050\n",
|
||||
"ggml_metal_init: loaded kernel_get_rows_q3_K 0x4744ce980\n",
|
||||
"ggml_metal_init: loaded kernel_get_rows_q4_K 0x4744cebe0\n",
|
||||
"ggml_metal_init: loaded kernel_get_rows_q5_K 0x4744cee40\n",
|
||||
"ggml_metal_init: loaded kernel_get_rows_q6_K 0x4744cf0a0\n",
|
||||
"ggml_metal_init: loaded kernel_rms_norm 0x474482450\n",
|
||||
"ggml_metal_init: loaded kernel_norm 0x4744826b0\n",
|
||||
"ggml_metal_init: loaded kernel_mul_mat_f16_f32 0x474482910\n",
|
||||
"ggml_metal_init: loaded kernel_mul_mat_q4_0_f32 0x474482b70\n",
|
||||
"ggml_metal_init: loaded kernel_mul_mat_q4_1_f32 0x474482dd0\n",
|
||||
"ggml_metal_init: loaded kernel_mul_mat_q2_K_f32 0x474483030\n",
|
||||
"ggml_metal_init: loaded kernel_mul_mat_q3_K_f32 0x474483290\n",
|
||||
"ggml_metal_init: loaded kernel_mul_mat_q4_K_f32 0x4744834f0\n",
|
||||
"ggml_metal_init: loaded kernel_mul_mat_q5_K_f32 0x474483750\n",
|
||||
"ggml_metal_init: loaded kernel_mul_mat_q6_K_f32 0x4744839b0\n",
|
||||
"ggml_metal_init: loaded kernel_rope 0x474483c10\n",
|
||||
"ggml_metal_init: loaded kernel_alibi_f32 0x474483e70\n",
|
||||
"ggml_metal_init: loaded kernel_cpy_f32_f16 0x4744840d0\n",
|
||||
"ggml_metal_init: loaded kernel_cpy_f32_f32 0x474484330\n",
|
||||
"ggml_metal_init: loaded kernel_cpy_f16_f16 0x474484590\n",
|
||||
"ggml_metal_init: recommendedMaxWorkingSetSize = 21845.34 MB\n",
|
||||
"ggml_metal_init: hasUnifiedMemory = true\n",
|
||||
"ggml_metal_init: maxTransferRate = built-in GPU\n",
|
||||
"ggml_metal_add_buffer: allocated 'data ' buffer, size = 6984.06 MB, ( 6986.94 / 21845.34)\n",
|
||||
"ggml_metal_add_buffer: allocated 'eval ' buffer, size = 1032.00 MB, ( 8018.94 / 21845.34)\n",
|
||||
"ggml_metal_add_buffer: allocated 'kv ' buffer, size = 1602.00 MB, ( 9620.94 / 21845.34)\n",
|
||||
"ggml_metal_add_buffer: allocated 'scr0 ' buffer, size = 426.00 MB, (10046.94 / 21845.34)\n",
|
||||
"ggml_metal_add_buffer: allocated 'scr1 ' buffer, size = 512.00 MB, (10558.94 / 21845.34)\n",
|
||||
"AVX = 0 | AVX2 = 0 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 0 | NEON = 1 | ARM_FMA = 1 | F16C = 0 | FP16_VA = 1 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 0 | VSX = 0 | \n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"# Set our LLM\n",
|
||||
"llm = LlamaCpp(\n",
|
||||
" model_path=\"/Users/rlm/Desktop/Code/llama.cpp/llama-2-13b-chat.ggmlv3.q4_0.bin\",\n",
|
||||
" n_gpu_layers=1,\n",
|
||||
" n_batch=512,\n",
|
||||
" n_ctx=2048,\n",
|
||||
" f16_kv=True, \n",
|
||||
" callback_manager=CallbackManager([StreamingStdOutCallbackHandler()]),\n",
|
||||
" verbose=True,\n",
|
||||
")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "66656084",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Set the associated prompt."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 58,
|
||||
"id": "8555f5bf",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"PromptTemplate(input_variables=['question'], output_parser=None, partial_variables={}, template='<<SYS>> \\n You are an assistant tasked with improving Google search results. \\n <</SYS>> \\n\\n [INST] Generate THREE Google search queries that are similar to this question. The output should be a numbered list of questions and each should have a question mark at the end: \\n\\n {question} [/INST]', template_format='f-string', validate_template=True)"
|
||||
]
|
||||
},
|
||||
"execution_count": 58,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"from langchain import PromptTemplate, LLMChain\n",
|
||||
"from langchain.chains.prompt_selector import ConditionalPromptSelector\n",
|
||||
"\n",
|
||||
"DEFAULT_LLAMA_SEARCH_PROMPT = PromptTemplate(\n",
|
||||
" input_variables=[\"question\"],\n",
|
||||
" template=\"\"\"<<SYS>> \\n You are an assistant tasked with improving Google search \\\n",
|
||||
"results. \\n <</SYS>> \\n\\n [INST] Generate THREE Google search queries that \\\n",
|
||||
"are similar to this question. The output should be a numbered list of questions \\\n",
|
||||
"and each should have a question mark at the end: \\n\\n {question} [/INST]\"\"\",\n",
|
||||
")\n",
|
||||
"\n",
|
||||
"DEFAULT_SEARCH_PROMPT = PromptTemplate(\n",
|
||||
" input_variables=[\"question\"],\n",
|
||||
" template=\"\"\"You are an assistant tasked with improving Google search \\\n",
|
||||
"results. Generate THREE Google search queries that are similar to \\\n",
|
||||
"this question. The output should be a numbered list of questions and each \\\n",
|
||||
"should have a question mark at the end: {question}\"\"\",\n",
|
||||
")\n",
|
||||
"\n",
|
||||
"QUESTION_PROMPT_SELECTOR = ConditionalPromptSelector(\n",
|
||||
" default_prompt=DEFAULT_SEARCH_PROMPT,\n",
|
||||
" conditionals=[\n",
|
||||
" (lambda llm: isinstance(llm, LlamaCpp), DEFAULT_LLAMA_SEARCH_PROMPT)\n",
|
||||
" ],\n",
|
||||
" )\n",
|
||||
"\n",
|
||||
"prompt = QUESTION_PROMPT_SELECTOR.get_prompt(llm)\n",
|
||||
"prompt"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 59,
|
||||
"id": "d0aedfd2",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
" Sure! Here are three similar search queries with a question mark at the end:\n",
|
||||
"\n",
|
||||
"1. Which NBA team did LeBron James lead to a championship in the year he was drafted?\n",
|
||||
"2. Who won the Grammy Awards for Best New Artist and Best Female Pop Vocal Performance in the same year that Lady Gaga was born?\n",
|
||||
"3. What MLB team did Babe Ruth play for when he hit 60 home runs in a single season?"
|
||||
]
|
||||
},
|
||||
{
|
||||
"name": "stderr",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"\n",
|
||||
"llama_print_timings: load time = 14943.19 ms\n",
|
||||
"llama_print_timings: sample time = 72.93 ms / 101 runs ( 0.72 ms per token, 1384.87 tokens per second)\n",
|
||||
"llama_print_timings: prompt eval time = 14942.95 ms / 93 tokens ( 160.68 ms per token, 6.22 tokens per second)\n",
|
||||
"llama_print_timings: eval time = 3430.85 ms / 100 runs ( 34.31 ms per token, 29.15 tokens per second)\n",
|
||||
"llama_print_timings: total time = 18578.26 ms\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"' Sure! Here are three similar search queries with a question mark at the end:\\n\\n1. Which NBA team did LeBron James lead to a championship in the year he was drafted?\\n2. Who won the Grammy Awards for Best New Artist and Best Female Pop Vocal Performance in the same year that Lady Gaga was born?\\n3. What MLB team did Babe Ruth play for when he hit 60 home runs in a single season?'"
|
||||
]
|
||||
},
|
||||
"execution_count": 59,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"# Chain\n",
|
||||
"llm_chain = LLMChain(prompt=prompt,llm=llm)\n",
|
||||
"question = \"What NFL team won the Super Bowl in the year that Justin Bieber was born?\"\n",
|
||||
"llm_chain.run({\"question\":question})"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "6ba66260",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Use cases\n",
|
||||
"\n",
|
||||
"Given an `llm` created from one of the models above, you can use it for [many use cases](docs/use_cases).\n",
|
||||
"\n",
|
||||
"For example, here is a guide to [RAG](docs/use_cases/question_answering/how_to/local_retrieval_qa) with local LLMs.\n",
|
||||
"\n",
|
||||
"In general, use cases for local model can be driven by at least two factors:\n",
|
||||
"\n",
|
||||
"* `Privacy`: private data (e.g., journals, etc) that a user does not want to share \n",
|
||||
"* `Cost`: text preprocessing (extraction/tagging), summarization, and agent simulations are token-use-intensive tasks\n",
|
||||
"\n",
|
||||
"There are a few approach to support specific use-cases: \n",
|
||||
"\n",
|
||||
"* Fine-tuning (e.g., [gpt-llm-trainer](https://github.com/mshumer/gpt-llm-trainer), [Anyscale](https://www.anyscale.com/blog/fine-tuning-llama-2-a-comprehensive-case-study-for-tailoring-models-to-unique-applications)) \n",
|
||||
"* [Function-calling](https://github.com/MeetKai/functionary/tree/main) for use-cases like extraction or tagging\n",
|
||||
"\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
"kernelspec": {
|
||||
"display_name": "Python 3 (ipykernel)",
|
||||
"language": "python",
|
||||
"name": "python3"
|
||||
},
|
||||
"language_info": {
|
||||
"codemirror_mode": {
|
||||
"name": "ipython",
|
||||
"version": 3
|
||||
},
|
||||
"file_extension": ".py",
|
||||
"mimetype": "text/x-python",
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.9.16"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
"nbformat_minor": 5
|
||||
}
|
||||
105
docs/extras/guides/pydantic_compatibility.md
Normal file
@@ -0,0 +1,105 @@
|
||||
# Pydantic Compatibility
|
||||
|
||||
- Pydantic v2 was released in June, 2023 (https://docs.pydantic.dev/2.0/blog/pydantic-v2-final/)
|
||||
- v2 contains has a number of breaking changes (https://docs.pydantic.dev/2.0/migration/)
|
||||
- Pydantic v2 and v1 are under the same package name, so both versions cannot be installed at the same time
|
||||
|
||||
## LangChain Pydantic Migration Plan
|
||||
|
||||
As of `langchain>=0.0.267`, LangChain will allow users to install either Pydantic V1 or V2.
|
||||
* Internally LangChain will continue to [use V1](https://docs.pydantic.dev/latest/migration/#continue-using-pydantic-v1-features).
|
||||
* During this time, users can pin their pydantic version to v1 to avoid breaking changes, or start a partial
|
||||
migration using pydantic v2 throughout their code, but avoiding mixing v1 and v2 code for LangChain (see below).
|
||||
|
||||
User can either pin to pydantic v1, and upgrade their code in one go once LangChain has migrated to v2 internally, or they can start a partial migration to v2, but must avoid mixing v1 and v2 code for LangChain.
|
||||
|
||||
Below are two examples of showing how to avoid mixing pydantic v1 and v2 code in
|
||||
the case of inheritance and in the case of passing objects to LangChain.
|
||||
|
||||
**Example 1: Extending via inheritance**
|
||||
|
||||
**YES**
|
||||
|
||||
```python
|
||||
from pydantic.v1 import root_validator, validator
|
||||
|
||||
class CustomTool(BaseTool): # BaseTool is v1 code
|
||||
x: int = Field(default=1)
|
||||
|
||||
def _run(*args, **kwargs):
|
||||
return "hello"
|
||||
|
||||
@validator('x') # v1 code
|
||||
@classmethod
|
||||
def validate_x(cls, x: int) -> int:
|
||||
return 1
|
||||
|
||||
|
||||
CustomTool(
|
||||
name='custom_tool',
|
||||
description="hello",
|
||||
x=1,
|
||||
)
|
||||
```
|
||||
|
||||
Mixing Pydantic v2 primitives with Pydantic v1 primitives can raise cryptic errors
|
||||
|
||||
**NO**
|
||||
|
||||
```python
|
||||
from pydantic import Field, field_validator # pydantic v2
|
||||
|
||||
class CustomTool(BaseTool): # BaseTool is v1 code
|
||||
x: int = Field(default=1)
|
||||
|
||||
def _run(*args, **kwargs):
|
||||
return "hello"
|
||||
|
||||
@field_validator('x') # v2 code
|
||||
@classmethod
|
||||
def validate_x(cls, x: int) -> int:
|
||||
return 1
|
||||
|
||||
|
||||
CustomTool(
|
||||
name='custom_tool',
|
||||
description="hello",
|
||||
x=1,
|
||||
)
|
||||
```
|
||||
|
||||
**Example 2: Passing objects to LangChain**
|
||||
|
||||
**YES**
|
||||
|
||||
```python
|
||||
from langchain.tools.base import Tool
|
||||
from pydantic.v1 import BaseModel, Field # <-- Uses v1 namespace
|
||||
|
||||
class CalculatorInput(BaseModel):
|
||||
question: str = Field()
|
||||
|
||||
Tool.from_function( # <-- tool uses v1 namespace
|
||||
func=lambda question: 'hello',
|
||||
name="Calculator",
|
||||
description="useful for when you need to answer questions about math",
|
||||
args_schema=CalculatorInput
|
||||
)
|
||||
```
|
||||
|
||||
**NO**
|
||||
|
||||
```python
|
||||
from langchain.tools.base import Tool
|
||||
from pydantic import BaseModel, Field # <-- Uses v2 namespace
|
||||
|
||||
class CalculatorInput(BaseModel):
|
||||
question: str = Field()
|
||||
|
||||
Tool.from_function( # <-- tool uses v1 namespace
|
||||
func=lambda question: 'hello',
|
||||
name="Calculator",
|
||||
description="useful for when you need to answer questions about math",
|
||||
args_schema=CalculatorInput
|
||||
)
|
||||
```
|
||||
@@ -7,12 +7,12 @@
|
||||
"source": [
|
||||
"# Context\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"[Context](https://getcontext.ai/) provides product analytics for AI chatbots.\n",
|
||||
"[Context](https://context.ai/) provides user analytics for LLM powered products and features.\n",
|
||||
"\n",
|
||||
"Context helps you understand how users are interacting with your AI chat products.\n",
|
||||
"Gain critical insights, optimise poor experiences, and minimise brand risks.\n"
|
||||
"With Context, you can start understanding your users and improving their experiences in less than 30 minutes.\n",
|
||||
"\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
@@ -55,7 +55,7 @@
|
||||
"\n",
|
||||
"To get your Context API token:\n",
|
||||
"\n",
|
||||
"1. Go to the settings page within your Context account (https://go.getcontext.ai/settings).\n",
|
||||
"1. Go to the settings page within your Context account (https://with.context.ai/settings).\n",
|
||||
"2. Generate a new API Token.\n",
|
||||
"3. Store this token somewhere secure."
|
||||
]
|
||||
@@ -207,7 +207,7 @@
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.11.3"
|
||||
"version": "3.9.1"
|
||||
},
|
||||
"vscode": {
|
||||
"interpreter": {
|
||||
|
||||
@@ -11,7 +11,7 @@
|
||||
"\n",
|
||||
"[PromptLayer](https://promptlayer.com) is a an LLM observability platform that lets you visualize requests, version prompts, and track usage. In this guide we will go over how to setup the `PromptLayerCallbackHandler`. \n",
|
||||
"\n",
|
||||
"While PromptLayer does have LLMs that integrate directly with LangChain (eg [`PromptLayerOpenAI`](https://python.langchain.com/docs/integrations/llms/promptlayer_openai)), this callback is the recommended way to integrate PromptLayer with LangChain.\n",
|
||||
"While PromptLayer does have LLMs that integrate directly with LangChain (e.g. [`PromptLayerOpenAI`](https://python.langchain.com/docs/integrations/llms/promptlayer_openai)), this callback is the recommended way to integrate PromptLayer with LangChain.\n",
|
||||
"\n",
|
||||
"See [our docs](https://docs.promptlayer.com/languages/langchain) for more information."
|
||||
]
|
||||
|
||||
88
docs/extras/integrations/chat/ernie.ipynb
Normal file
@@ -0,0 +1,88 @@
|
||||
{
|
||||
"cells": [
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# ERNIE-Bot Chat\n",
|
||||
"\n",
|
||||
"[ERNIE-Bot](https://cloud.baidu.com/doc/WENXINWORKSHOP/s/jlil56u11) is a large language model developed by Baidu, covering a huge amount of Chinese data.\n",
|
||||
"This notebook covers how to get started with ErnieBot chat models."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 7,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langchain.chat_models import ErnieBotChat\n",
|
||||
"from langchain.schema import HumanMessage"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 13,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"chat = ErnieBotChat(ernie_client_id='YOUR_CLIENT_ID', ernie_client_secret='YOUR_CLIENT_SECRET')"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"or you can set `client_id` and `client_secret` in your environment variables\n",
|
||||
"```bash\n",
|
||||
"export ERNIE_CLIENT_ID=YOUR_CLIENT_ID\n",
|
||||
"export ERNIE_CLIENT_SECRET=YOUR_CLIENT_SECRET\n",
|
||||
"```"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 15,
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"AIMessage(content='Hello, I am an artificial intelligence language model. My purpose is to help users answer questions or provide information. What can I do for you?', additional_kwargs={}, example=False)"
|
||||
]
|
||||
},
|
||||
"execution_count": 15,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"chat([\n",
|
||||
" HumanMessage(content='hello there, who are you?')\n",
|
||||
"])"
|
||||
]
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
"kernelspec": {
|
||||
"display_name": "Python 3 (ipykernel)",
|
||||
"language": "python",
|
||||
"name": "python3"
|
||||
},
|
||||
"language_info": {
|
||||
"codemirror_mode": {
|
||||
"name": "ipython",
|
||||
"version": 3
|
||||
},
|
||||
"file_extension": ".py",
|
||||
"mimetype": "text/x-python",
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.11.4"
|
||||
},
|
||||
"orig_nbformat": 4
|
||||
},
|
||||
"nbformat": 4,
|
||||
"nbformat_minor": 2
|
||||
}
|
||||
185
docs/extras/integrations/chat/litellm.ipynb
Normal file
@@ -0,0 +1,185 @@
|
||||
{
|
||||
"cells": [
|
||||
{
|
||||
"attachments": {},
|
||||
"cell_type": "markdown",
|
||||
"id": "bf733a38-db84-4363-89e2-de6735c37230",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# 🚅 LiteLLM\n",
|
||||
"\n",
|
||||
"[LiteLLM](https://github.com/BerriAI/litellm) is a library that simplifies calling Anthropic, Azure, Huggingface, Replicate, etc. \n",
|
||||
"\n",
|
||||
"This notebook covers how to get started with using Langchain + the LiteLLM I/O library. "
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 1,
|
||||
"id": "d4a7c55d-b235-4ca4-a579-c90cc9570da9",
|
||||
"metadata": {
|
||||
"tags": []
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langchain.chat_models import ChatLiteLLM\n",
|
||||
"from langchain.prompts.chat import (\n",
|
||||
" ChatPromptTemplate,\n",
|
||||
" SystemMessagePromptTemplate,\n",
|
||||
" AIMessagePromptTemplate,\n",
|
||||
" HumanMessagePromptTemplate,\n",
|
||||
")\n",
|
||||
"from langchain.schema import AIMessage, HumanMessage, SystemMessage"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 2,
|
||||
"id": "70cf04e8-423a-4ff6-8b09-f11fb711c817",
|
||||
"metadata": {
|
||||
"tags": []
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"chat = ChatLiteLLM(model=\"gpt-3.5-turbo\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 3,
|
||||
"id": "8199ef8f-eb8b-4253-9ea0-6c24a013ca4c",
|
||||
"metadata": {
|
||||
"tags": []
|
||||
},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"AIMessage(content=\" J'aime la programmation.\", additional_kwargs={}, example=False)"
|
||||
]
|
||||
},
|
||||
"execution_count": 3,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"messages = [\n",
|
||||
" HumanMessage(\n",
|
||||
" content=\"Translate this sentence from English to French. I love programming.\"\n",
|
||||
" )\n",
|
||||
"]\n",
|
||||
"chat(messages)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"attachments": {},
|
||||
"cell_type": "markdown",
|
||||
"id": "c361ab1e-8c0c-4206-9e3c-9d1424a12b9c",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## `ChatLiteLLM` also supports async and streaming functionality:"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 4,
|
||||
"id": "93a21c5c-6ef9-4688-be60-b2e1f94842fb",
|
||||
"metadata": {
|
||||
"tags": []
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langchain.callbacks.manager import CallbackManager\n",
|
||||
"from langchain.callbacks.streaming_stdout import StreamingStdOutCallbackHandler"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 5,
|
||||
"id": "c5fac0e9-05a4-4fc1-a3b3-e5bbb24b971b",
|
||||
"metadata": {
|
||||
"tags": []
|
||||
},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"LLMResult(generations=[[ChatGeneration(text=\" J'aime programmer.\", generation_info=None, message=AIMessage(content=\" J'aime programmer.\", additional_kwargs={}, example=False))]], llm_output={}, run=[RunInfo(run_id=UUID('8cc8fb68-1c35-439c-96a0-695036a93652'))])"
|
||||
]
|
||||
},
|
||||
"execution_count": 5,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"await chat.agenerate([messages])"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 6,
|
||||
"id": "025be980-e50d-4a68-93dc-c9c7b500ce34",
|
||||
"metadata": {
|
||||
"tags": []
|
||||
},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
" J'aime la programmation."
|
||||
]
|
||||
},
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"AIMessage(content=\" J'aime la programmation.\", additional_kwargs={}, example=False)"
|
||||
]
|
||||
},
|
||||
"execution_count": 6,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"chat = ChatLiteLLM(\n",
|
||||
" streaming=True,\n",
|
||||
" verbose=True,\n",
|
||||
" callback_manager=CallbackManager([StreamingStdOutCallbackHandler()]),\n",
|
||||
")\n",
|
||||
"chat(messages)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "c253883f",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": []
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
"kernelspec": {
|
||||
"display_name": "Python 3 (ipykernel)",
|
||||
"language": "python",
|
||||
"name": "python3"
|
||||
},
|
||||
"language_info": {
|
||||
"codemirror_mode": {
|
||||
"name": "ipython",
|
||||
"version": 3
|
||||
},
|
||||
"file_extension": ".py",
|
||||
"mimetype": "text/x-python",
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.9.1"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
"nbformat_minor": 5
|
||||
}
|
||||
382
docs/extras/integrations/chat/ollama.ipynb
Normal file
@@ -0,0 +1,382 @@
|
||||
{
|
||||
"cells": [
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# Ollama\n",
|
||||
"\n",
|
||||
"[Ollama](https://ollama.ai/) allows you to run open-source large language models, such as LLaMA2, locally.\n",
|
||||
"\n",
|
||||
"Ollama bundles model weights, configuration, and data into a single package, defined by a Modelfile. \n",
|
||||
"\n",
|
||||
"It optimizes setup and configuration details, including GPU usage.\n",
|
||||
"\n",
|
||||
"For a complete list of supported models and model variants, see the [Ollama model library](https://ollama.ai/library).\n",
|
||||
"\n",
|
||||
"## Setup\n",
|
||||
"\n",
|
||||
"First, follow [these instructions](https://github.com/jmorganca/ollama) to set up and run a local Ollama instance:\n",
|
||||
"\n",
|
||||
"* [Download](https://ollama.ai/download)\n",
|
||||
"* Fetch a model via `ollama pull <model family>`\n",
|
||||
"* e.g., for `Llama-7b`: `ollama pull llama2`\n",
|
||||
"* This will download the most basic version of the model (e.g., minimum # parameters and 4-bit quantization)\n",
|
||||
"* On Mac, it will download to:\n",
|
||||
"\n",
|
||||
"`~/.ollama/models/manifests/registry.ollama.ai/library/<model family>/latest`\n",
|
||||
"\n",
|
||||
"* And we can specify a particular version, e.g., for `ollama pull vicuna:13b-v1.5-16k-q4_0`\n",
|
||||
"* The file is here with the model version in place of `latest`\n",
|
||||
"\n",
|
||||
"`~/.ollama/models/manifests/registry.ollama.ai/library/vicuna/13b-v1.5-16k-q4_0`\n",
|
||||
"\n",
|
||||
"You can easily access models in a few ways:\n",
|
||||
"\n",
|
||||
"1/ if the app is running:\n",
|
||||
"* All of your local models are automatically served on `localhost:11434`\n",
|
||||
"* Select your model when setting `llm = Ollama(..., model=\"<model family>:<version>\")`\n",
|
||||
"* If you set `llm = Ollama(..., model=\"<model family\")` withoout a version it will simply look for `latest`\n",
|
||||
"\n",
|
||||
"2/ if building from source or just running the binary: \n",
|
||||
"* Then you must run `ollama serve`\n",
|
||||
"* All of your local models are automatically served on `localhost:11434`\n",
|
||||
"* Then, select as shown above\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"## Usage\n",
|
||||
"\n",
|
||||
"You can see a full list of supported parameters on the [API reference page](https://api.python.langchain.com/en/latest/llms/langchain.llms.ollama.Ollama.html).\n",
|
||||
"\n",
|
||||
"If you are using a LLaMA `chat` model (e.g., `ollama pull llama2:7b-chat`) then you can use the `ChatOllama` interface.\n",
|
||||
"\n",
|
||||
"This includes [special tokens](https://huggingface.co/blog/llama2#how-to-prompt-llama-2) for system message and user input."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 12,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langchain.chat_models import ChatOllama\n",
|
||||
"from langchain.callbacks.manager import CallbackManager\n",
|
||||
"from langchain.callbacks.streaming_stdout import StreamingStdOutCallbackHandler \n",
|
||||
"chat_model = ChatOllama(model=\"llama2:7b-chat\", \n",
|
||||
" callback_manager = CallbackManager([StreamingStdOutCallbackHandler()]))"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"With `StreamingStdOutCallbackHandler`, you will see tokens streamed."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 3,
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
" Artificial intelligence (AI) has a rich and varied history that spans several decades. Hinweis: The following is a brief overview of the major milestones in the history of AI, but it is by no means exhaustive.\n",
|
||||
"\n",
|
||||
"1. Early Beginnings (1950s-1960s): The term \"Artificial Intelligence\" was coined in 1956 by computer scientist John McCarthy. However, the concept of creating machines that can think and learn like humans dates back to ancient times. In the 1950s and 1960s, researchers began exploring the possibilities of AI using simple algorithms and machine learning techniques.\n",
|
||||
"2. Rule-Based Systems (1970s-1980s): In the 1970s and 1980s, AI research focused on developing rule-based systems, which use predefined rules to reason and make decisions. This led to the development of expert systems, which were designed to mimic the decision-making abilities of human experts in specific domains.\n",
|
||||
"3. Machine Learning (1980s-1990s): The 1980s saw a shift towards machine learning, which enables machines to learn from data without being explicitly programmed. This led to the development of algorithms such as decision trees, neural networks, and support vector machines.\n",
|
||||
"4. Deep Learning (2000s-present): In the early 2000s, deep learning emerged as a subfield of machine learning, focusing on neural networks with multiple layers. These networks can learn complex representations of data, leading to breakthroughs in image and speech recognition, natural language processing, and other areas.\n",
|
||||
"5. Natural Language Processing (NLP) (1980s-present): NLP has been an active area of research since the 1980s, with a focus on developing algorithms that can understand and generate human language. This has led to applications such as chatbots, voice assistants, and language translation systems.\n",
|
||||
"6. Robotics (1970s-present): The development of robotics has been closely tied to AI research, with a focus on creating machines that can perform tasks that typically require human intelligence, such as manipulation and locomotion.\n",
|
||||
"7. Computer Vision (1980s-present): Computer vision has been an active area of research since the 1980s, with a focus on enabling machines to interpret and understand visual data from the world around us. This has led to applications such as image recognition, object detection, and autonomous driving.\n",
|
||||
"8. Ethics and Society (1990s-present): As AI technology has become more advanced and integrated into various aspects of society, there has been a growing concern about the ethical implications of AI. This includes issues related to privacy, bias, and job displacement.\n",
|
||||
"9. Reinforcement Learning (2000s-present): Reinforcement learning is a subfield of machine learning that involves training machines to make decisions based on feedback from their environment. This has led to breakthroughs in areas such as game playing, robotics, and autonomous driving.\n",
|
||||
"10. Generative Models (2010s-present): Generative models are a class of AI algorithms that can generate new data that is similar to a given dataset. This has led to applications such as image synthesis, music generation, and language creation.\n",
|
||||
"\n",
|
||||
"These are just a few of the many developments in the history of AI. As the field continues to evolve, we can expect even more exciting breakthroughs and innovations in the years to come."
|
||||
]
|
||||
},
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"AIMessage(content=' Artificial intelligence (AI) has a rich and varied history that spans several decades. Hinweis: The following is a brief overview of the major milestones in the history of AI, but it is by no means exhaustive.\\n\\n1. Early Beginnings (1950s-1960s): The term \"Artificial Intelligence\" was coined in 1956 by computer scientist John McCarthy. However, the concept of creating machines that can think and learn like humans dates back to ancient times. In the 1950s and 1960s, researchers began exploring the possibilities of AI using simple algorithms and machine learning techniques.\\n2. Rule-Based Systems (1970s-1980s): In the 1970s and 1980s, AI research focused on developing rule-based systems, which use predefined rules to reason and make decisions. This led to the development of expert systems, which were designed to mimic the decision-making abilities of human experts in specific domains.\\n3. Machine Learning (1980s-1990s): The 1980s saw a shift towards machine learning, which enables machines to learn from data without being explicitly programmed. This led to the development of algorithms such as decision trees, neural networks, and support vector machines.\\n4. Deep Learning (2000s-present): In the early 2000s, deep learning emerged as a subfield of machine learning, focusing on neural networks with multiple layers. These networks can learn complex representations of data, leading to breakthroughs in image and speech recognition, natural language processing, and other areas.\\n5. Natural Language Processing (NLP) (1980s-present): NLP has been an active area of research since the 1980s, with a focus on developing algorithms that can understand and generate human language. This has led to applications such as chatbots, voice assistants, and language translation systems.\\n6. Robotics (1970s-present): The development of robotics has been closely tied to AI research, with a focus on creating machines that can perform tasks that typically require human intelligence, such as manipulation and locomotion.\\n7. Computer Vision (1980s-present): Computer vision has been an active area of research since the 1980s, with a focus on enabling machines to interpret and understand visual data from the world around us. This has led to applications such as image recognition, object detection, and autonomous driving.\\n8. Ethics and Society (1990s-present): As AI technology has become more advanced and integrated into various aspects of society, there has been a growing concern about the ethical implications of AI. This includes issues related to privacy, bias, and job displacement.\\n9. Reinforcement Learning (2000s-present): Reinforcement learning is a subfield of machine learning that involves training machines to make decisions based on feedback from their environment. This has led to breakthroughs in areas such as game playing, robotics, and autonomous driving.\\n10. Generative Models (2010s-present): Generative models are a class of AI algorithms that can generate new data that is similar to a given dataset. This has led to applications such as image synthesis, music generation, and language creation.\\n\\nThese are just a few of the many developments in the history of AI. As the field continues to evolve, we can expect even more exciting breakthroughs and innovations in the years to come.', additional_kwargs={}, example=False)"
|
||||
]
|
||||
},
|
||||
"execution_count": 3,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"from langchain.schema import HumanMessage\n",
|
||||
"\n",
|
||||
"messages = [\n",
|
||||
" HumanMessage(content=\"Tell me about the history of AI\")\n",
|
||||
"]\n",
|
||||
"chat_model(messages)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## RAG\n",
|
||||
"\n",
|
||||
"We can use Olama with RAG, [just as shown here](https://python.langchain.com/docs/use_cases/question_answering/how_to/local_retrieval_qa).\n",
|
||||
"\n",
|
||||
"Let's use the 13b model:\n",
|
||||
"\n",
|
||||
"```\n",
|
||||
"ollama pull llama2:13b\n",
|
||||
"```\n",
|
||||
"\n",
|
||||
"Or, the 13b-chat model:\n",
|
||||
"\n",
|
||||
"```\n",
|
||||
"ollama pull llama2:13b-chat\n",
|
||||
"```\n",
|
||||
"\n",
|
||||
"Let's also use local embeddings from `GPT4AllEmbeddings` and `Chroma`."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"! pip install gpt4all chromadb"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 4,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langchain.document_loaders import WebBaseLoader\n",
|
||||
"loader = WebBaseLoader(\"https://lilianweng.github.io/posts/2023-06-23-agent/\")\n",
|
||||
"data = loader.load()\n",
|
||||
"\n",
|
||||
"from langchain.text_splitter import RecursiveCharacterTextSplitter\n",
|
||||
"text_splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=0)\n",
|
||||
"all_splits = text_splitter.split_documents(data)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 6,
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"Found model file at /Users/rlm/.cache/gpt4all/ggml-all-MiniLM-L6-v2-f16.bin\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"from langchain.vectorstores import Chroma\n",
|
||||
"from langchain.embeddings import GPT4AllEmbeddings\n",
|
||||
"\n",
|
||||
"vectorstore = Chroma.from_documents(documents=all_splits, embedding=GPT4AllEmbeddings())"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 7,
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"4"
|
||||
]
|
||||
},
|
||||
"execution_count": 7,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"question = \"What are the approaches to Task Decomposition?\"\n",
|
||||
"docs = vectorstore.similarity_search(question)\n",
|
||||
"len(docs)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 8,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langchain import PromptTemplate\n",
|
||||
"\n",
|
||||
"# Prompt\n",
|
||||
"template = \"\"\"[INST] <<SYS>> Use the following pieces of context to answer the question at the end. \n",
|
||||
"If you don't know the answer, just say that you don't know, don't try to make up an answer. \n",
|
||||
"Use three sentences maximum and keep the answer as concise as possible. <</SYS>>\n",
|
||||
"{context}\n",
|
||||
"Question: {question}\n",
|
||||
"Helpful Answer:[/INST]\"\"\"\n",
|
||||
"QA_CHAIN_PROMPT = PromptTemplate(\n",
|
||||
" input_variables=[\"context\", \"question\"],\n",
|
||||
" template=template,\n",
|
||||
")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 13,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# Chat model\n",
|
||||
"from langchain.chat_models import ChatOllama\n",
|
||||
"from langchain.callbacks.manager import CallbackManager\n",
|
||||
"from langchain.callbacks.streaming_stdout import StreamingStdOutCallbackHandler\n",
|
||||
"chat_model = ChatOllama(model=\"llama2:13b-chat\",\n",
|
||||
" verbose=True,\n",
|
||||
" callback_manager=CallbackManager([StreamingStdOutCallbackHandler()]))"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 14,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# QA chain\n",
|
||||
"from langchain.chains import RetrievalQA\n",
|
||||
"qa_chain = RetrievalQA.from_chain_type(\n",
|
||||
" chat_model,\n",
|
||||
" retriever=vectorstore.as_retriever(),\n",
|
||||
" chain_type_kwargs={\"prompt\": QA_CHAIN_PROMPT},\n",
|
||||
")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 15,
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
" Based on the provided context, there are three approaches to task decomposition for AI agents:\n",
|
||||
"\n",
|
||||
"1. LLM with simple prompting, such as \"Steps for XYZ.\" or \"What are the subgoals for achieving XYZ?\"\n",
|
||||
"2. Task-specific instructions, such as \"Write a story outline\" for writing a novel.\n",
|
||||
"3. Human inputs."
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"question = \"What are the various approaches to Task Decomposition for AI Agents?\"\n",
|
||||
"result = qa_chain({\"query\": question})"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"You can also get logging for tokens."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 16,
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
" Based on the given context, here is the answer to the question \"What are the approaches to Task Decomposition?\"\n",
|
||||
"\n",
|
||||
"There are three approaches to task decomposition:\n",
|
||||
"\n",
|
||||
"1. LLM with simple prompting, such as \"Steps for XYZ.\" or \"What are the subgoals for achieving XYZ?\"\n",
|
||||
"2. Using task-specific instructions, like \"Write a story outline\" for writing a novel.\n",
|
||||
"3. With human inputs.{'model': 'llama2:13b-chat', 'created_at': '2023-08-23T15:37:51.469127Z', 'done': True, 'context': [1, 29871, 1, 29961, 25580, 29962, 518, 25580, 29962, 518, 25580, 29962, 3532, 14816, 29903, 6778, 4803, 278, 1494, 12785, 310, 3030, 304, 1234, 278, 1139, 472, 278, 1095, 29889, 29871, 13, 3644, 366, 1016, 29915, 29873, 1073, 278, 1234, 29892, 925, 1827, 393, 366, 1016, 29915, 29873, 1073, 29892, 1016, 29915, 29873, 1018, 304, 1207, 701, 385, 1234, 29889, 29871, 13, 11403, 2211, 25260, 7472, 322, 3013, 278, 1234, 408, 3022, 895, 408, 1950, 29889, 529, 829, 14816, 29903, 6778, 13, 5398, 26227, 508, 367, 2309, 313, 29896, 29897, 491, 365, 26369, 411, 2560, 9508, 292, 763, 376, 7789, 567, 363, 1060, 29979, 29999, 7790, 29876, 29896, 19602, 376, 5618, 526, 278, 1014, 1484, 1338, 363, 3657, 15387, 1060, 29979, 29999, 29973, 613, 313, 29906, 29897, 491, 773, 3414, 29899, 14940, 11994, 29936, 321, 29889, 29887, 29889, 376, 6113, 263, 5828, 27887, 1213, 363, 5007, 263, 9554, 29892, 470, 313, 29941, 29897, 411, 5199, 10970, 29889, 13, 13, 5398, 26227, 508, 367, 2309, 313, 29896, 29897, 491, 365, 26369, 411, 2560, 9508, 292, 763, 376, 7789, 567, 363, 1060, 29979, 29999, 7790, 29876, 29896, 19602, 376, 5618, 526, 278, 1014, 1484, 1338, 363, 3657, 15387, 1060, 29979, 29999, 29973, 613, 313, 29906, 29897, 491, 773, 3414, 29899, 14940, 11994, 29936, 321, 29889, 29887, 29889, 376, 6113, 263, 5828, 27887, 1213, 363, 5007, 263, 9554, 29892, 470, 313, 29941, 29897, 411, 5199, 10970, 29889, 13, 13, 1451, 16047, 267, 297, 1472, 29899, 8489, 18987, 322, 3414, 26227, 29901, 1858, 9450, 975, 263, 3309, 29891, 4955, 322, 17583, 3902, 8253, 278, 1650, 2913, 3933, 18066, 292, 29889, 365, 26369, 29879, 21117, 304, 10365, 13900, 746, 20050, 411, 15668, 4436, 29892, 3907, 963, 3109, 16424, 9401, 304, 25618, 1058, 5110, 515, 14260, 322, 1059, 29889, 13, 13, 1451, 16047, 267, 297, 1472, 29899, 8489, 18987, 322, 3414, 26227, 29901, 1858, 9450, 975, 263, 3309, 29891, 4955, 322, 17583, 3902, 8253, 278, 1650, 2913, 3933, 18066, 292, 29889, 365, 26369, 29879, 21117, 304, 10365, 13900, 746, 20050, 411, 15668, 4436, 29892, 3907, 963, 3109, 16424, 9401, 304, 25618, 1058, 5110, 515, 14260, 322, 1059, 29889, 13, 16492, 29901, 1724, 526, 278, 13501, 304, 9330, 897, 510, 3283, 29973, 13, 29648, 1319, 673, 10834, 29914, 25580, 29962, 518, 29914, 25580, 29962, 518, 29914, 25580, 29962, 29871, 16564, 373, 278, 2183, 3030, 29892, 1244, 338, 278, 1234, 304, 278, 1139, 376, 5618, 526, 278, 13501, 304, 9330, 897, 510, 3283, 3026, 13, 13, 8439, 526, 2211, 13501, 304, 3414, 26227, 29901, 13, 13, 29896, 29889, 365, 26369, 411, 2560, 9508, 292, 29892, 1316, 408, 376, 7789, 567, 363, 1060, 29979, 29999, 1213, 470, 376, 5618, 526, 278, 1014, 1484, 1338, 363, 3657, 15387, 1060, 29979, 29999, 3026, 13, 29906, 29889, 5293, 3414, 29899, 14940, 11994, 29892, 763, 376, 6113, 263, 5828, 27887, 29908, 363, 5007, 263, 9554, 29889, 13, 29941, 29889, 2973, 5199, 10970, 29889, 2], 'total_duration': 9514823750, 'load_duration': 795542, 'sample_count': 99, 'sample_duration': 68732000, 'prompt_eval_count': 146, 'prompt_eval_duration': 6206275000, 'eval_count': 98, 'eval_duration': 3229641000}\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"from langchain.schema import LLMResult\n",
|
||||
"from langchain.callbacks.base import BaseCallbackHandler\n",
|
||||
"\n",
|
||||
"class GenerationStatisticsCallback(BaseCallbackHandler):\n",
|
||||
" def on_llm_end(self, response: LLMResult, **kwargs) -> None:\n",
|
||||
" print(response.generations[0][0].generation_info)\n",
|
||||
" \n",
|
||||
"callback_manager = CallbackManager([StreamingStdOutCallbackHandler(), GenerationStatisticsCallback()])\n",
|
||||
"\n",
|
||||
"chat_model = ChatOllama(model=\"llama2:13b-chat\",\n",
|
||||
" verbose=True,\n",
|
||||
" callback_manager=callback_manager)\n",
|
||||
"\n",
|
||||
"qa_chain = RetrievalQA.from_chain_type(\n",
|
||||
" chat_model,\n",
|
||||
" retriever=vectorstore.as_retriever(),\n",
|
||||
" chain_type_kwargs={\"prompt\": QA_CHAIN_PROMPT},\n",
|
||||
")\n",
|
||||
"\n",
|
||||
"question = \"What are the approaches to Task Decomposition?\"\n",
|
||||
"result = qa_chain({\"query\": question})"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"`eval_count` / (`eval_duration`/10e9) gets `tok / s`"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 17,
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"30.343929867127645"
|
||||
]
|
||||
},
|
||||
"execution_count": 17,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"98 / (3229641000/1000/1000/1000)"
|
||||
]
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
"kernelspec": {
|
||||
"display_name": "Python 3 (ipykernel)",
|
||||
"language": "python",
|
||||
"name": "python3"
|
||||
},
|
||||
"language_info": {
|
||||
"codemirror_mode": {
|
||||
"name": "ipython",
|
||||
"version": 3
|
||||
},
|
||||
"file_extension": ".py",
|
||||
"mimetype": "text/x-python",
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.9.16"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
"nbformat_minor": 2
|
||||
}
|
||||
@@ -143,12 +143,39 @@
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "c095285d",
|
||||
"cell_type": "markdown",
|
||||
"id": "57e27714",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": []
|
||||
"source": [
|
||||
"## Fine-tuning\n",
|
||||
"\n",
|
||||
"You can call fine-tuned OpenAI models by passing in your corresponding `modelName` parameter.\n",
|
||||
"\n",
|
||||
"This generally takes the form of `ft:{OPENAI_MODEL_NAME}:{ORG_NAME}::{MODEL_ID}`. For example:"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 6,
|
||||
"id": "33c4a8b0",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"AIMessage(content=\"J'adore la programmation.\", additional_kwargs={}, example=False)"
|
||||
]
|
||||
},
|
||||
"execution_count": 6,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"fine_tuned_model = ChatOpenAI(temperature=0, model_name=\"ft:gpt-3.5-turbo-0613:langchain::7qTVM5AR\")\n",
|
||||
"\n",
|
||||
"fine_tuned_model(messages)"
|
||||
]
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
@@ -167,7 +194,7 @@
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.9.7"
|
||||
"version": "3.10.5"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
|
||||
325
docs/extras/integrations/chat_loaders/discord.ipynb
Normal file
@@ -0,0 +1,325 @@
|
||||
{
|
||||
"cells": [
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "c4ff9336-1cf3-459e-bd70-d1314c1da6a0",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# Discord\n",
|
||||
"\n",
|
||||
"This notebook shows how to create your own chat loader that works on copy-pasted messages (from dms) to a list of LangChain messages.\n",
|
||||
"\n",
|
||||
"The process has four steps:\n",
|
||||
"1. Create the chat .txt file by copying chats from the Discord app and pasting them in a file on your local computer\n",
|
||||
"2. Copy the chat loader definition from below to a local file.\n",
|
||||
"3. Initialize the `DiscordChatLoader` with the file path pointed to the text file.\n",
|
||||
"4. Call `loader.load()` (or `loader.lazy_load()`) to perform the conversion.\n",
|
||||
"\n",
|
||||
"## 1. Creat message dump\n",
|
||||
"\n",
|
||||
"Currently (2023/08/23) this loader only supports .txt files in the format generated by copying messages in the app to your clipboard and pasting in a file. Below is an example."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 1,
|
||||
"id": "e4ccfdfa-6869-4d67-90a0-ab99f01b7553",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"Overwriting discord_chats.txt\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"%%writefile discord_chats.txt\n",
|
||||
"talkingtower — 08/15/2023 11:10 AM\n",
|
||||
"Love music! Do you like jazz?\n",
|
||||
"reporterbob — 08/15/2023 9:27 PM\n",
|
||||
"Yes! Jazz is fantastic. Ever heard this one?\n",
|
||||
"Website\n",
|
||||
"Listen to classic jazz track...\n",
|
||||
"\n",
|
||||
"talkingtower — Yesterday at 5:03 AM\n",
|
||||
"Indeed! Great choice. 🎷\n",
|
||||
"reporterbob — Yesterday at 5:23 AM\n",
|
||||
"Thanks! How about some virtual sightseeing?\n",
|
||||
"Website\n",
|
||||
"Virtual tour of famous landmarks...\n",
|
||||
"\n",
|
||||
"talkingtower — Today at 2:38 PM\n",
|
||||
"Sounds fun! Let's explore.\n",
|
||||
"reporterbob — Today at 2:56 PM\n",
|
||||
"Enjoy the tour! See you around.\n",
|
||||
"talkingtower — Today at 3:00 PM\n",
|
||||
"Thank you! Goodbye! 👋\n",
|
||||
"reporterbob — Today at 3:02 PM\n",
|
||||
"Farewell! Happy exploring."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "359565a7-dad3-403c-a73c-6414b1295127",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## 2. Define chat loader\n",
|
||||
"\n",
|
||||
"LangChain currently does not support "
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 2,
|
||||
"id": "a429e0c4-4d7d-45f8-bbbb-c7fc5229f6af",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"import logging\n",
|
||||
"import re\n",
|
||||
"from typing import Iterator, List\n",
|
||||
"\n",
|
||||
"from langchain import schema\n",
|
||||
"from langchain.chat_loaders import base as chat_loaders\n",
|
||||
"\n",
|
||||
"logger = logging.getLogger()\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"class DiscordChatLoader(chat_loaders.BaseChatLoader):\n",
|
||||
" \n",
|
||||
" def __init__(self, path: str):\n",
|
||||
" \"\"\"\n",
|
||||
" Initialize the Discord chat loader.\n",
|
||||
"\n",
|
||||
" Args:\n",
|
||||
" path: Path to the exported Discord chat text file.\n",
|
||||
" \"\"\"\n",
|
||||
" self.path = path\n",
|
||||
" self._message_line_regex = re.compile(\n",
|
||||
" r\"(.+?) — (\\w{3,9} \\d{1,2}(?:st|nd|rd|th)?(?:, \\d{4})? \\d{1,2}:\\d{2} (?:AM|PM)|Today at \\d{1,2}:\\d{2} (?:AM|PM)|Yesterday at \\d{1,2}:\\d{2} (?:AM|PM))\", # noqa\n",
|
||||
" flags=re.DOTALL,\n",
|
||||
" )\n",
|
||||
"\n",
|
||||
" def _load_single_chat_session_from_txt(\n",
|
||||
" self, file_path: str\n",
|
||||
" ) -> chat_loaders.ChatSession:\n",
|
||||
" \"\"\"\n",
|
||||
" Load a single chat session from a text file.\n",
|
||||
"\n",
|
||||
" Args:\n",
|
||||
" file_path: Path to the text file containing the chat messages.\n",
|
||||
"\n",
|
||||
" Returns:\n",
|
||||
" A `ChatSession` object containing the loaded chat messages.\n",
|
||||
" \"\"\"\n",
|
||||
" with open(file_path, \"r\", encoding=\"utf-8\") as file:\n",
|
||||
" lines = file.readlines()\n",
|
||||
"\n",
|
||||
" results: List[schema.BaseMessage] = []\n",
|
||||
" current_sender = None\n",
|
||||
" current_timestamp = None\n",
|
||||
" current_content = []\n",
|
||||
" for line in lines:\n",
|
||||
" if re.match(\n",
|
||||
" r\".+? — (\\d{2}/\\d{2}/\\d{4} \\d{1,2}:\\d{2} (?:AM|PM)|Today at \\d{1,2}:\\d{2} (?:AM|PM)|Yesterday at \\d{1,2}:\\d{2} (?:AM|PM))\", # noqa\n",
|
||||
" line,\n",
|
||||
" ):\n",
|
||||
" if current_sender and current_content:\n",
|
||||
" results.append(\n",
|
||||
" schema.HumanMessage(\n",
|
||||
" content=\"\".join(current_content).strip(),\n",
|
||||
" additional_kwargs={\n",
|
||||
" \"sender\": current_sender,\n",
|
||||
" \"events\": [{\"message_time\": current_timestamp}],\n",
|
||||
" },\n",
|
||||
" )\n",
|
||||
" )\n",
|
||||
" current_sender, current_timestamp = line.split(\" — \")[:2]\n",
|
||||
" current_content = [\n",
|
||||
" line[len(current_sender) + len(current_timestamp) + 4 :].strip()\n",
|
||||
" ]\n",
|
||||
" elif re.match(r\"\\[\\d{1,2}:\\d{2} (?:AM|PM)\\]\", line.strip()):\n",
|
||||
" results.append(\n",
|
||||
" schema.HumanMessage(\n",
|
||||
" content=\"\".join(current_content).strip(),\n",
|
||||
" additional_kwargs={\n",
|
||||
" \"sender\": current_sender,\n",
|
||||
" \"events\": [{\"message_time\": current_timestamp}],\n",
|
||||
" },\n",
|
||||
" )\n",
|
||||
" )\n",
|
||||
" current_timestamp = line.strip()[1:-1]\n",
|
||||
" current_content = []\n",
|
||||
" else:\n",
|
||||
" current_content.append(\"\\n\" + line.strip())\n",
|
||||
"\n",
|
||||
" if current_sender and current_content:\n",
|
||||
" results.append(\n",
|
||||
" schema.HumanMessage(\n",
|
||||
" content=\"\".join(current_content).strip(),\n",
|
||||
" additional_kwargs={\n",
|
||||
" \"sender\": current_sender,\n",
|
||||
" \"events\": [{\"message_time\": current_timestamp}],\n",
|
||||
" },\n",
|
||||
" )\n",
|
||||
" )\n",
|
||||
"\n",
|
||||
" return chat_loaders.ChatSession(messages=results)\n",
|
||||
"\n",
|
||||
" def lazy_load(self) -> Iterator[chat_loaders.ChatSession]:\n",
|
||||
" \"\"\"\n",
|
||||
" Lazy load the messages from the chat file and yield them in the required format.\n",
|
||||
"\n",
|
||||
" Yields:\n",
|
||||
" A `ChatSession` object containing the loaded chat messages.\n",
|
||||
" \"\"\"\n",
|
||||
" yield self._load_single_chat_session_from_txt(self.path)\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "c8240393-48be-44d2-b0d6-52c215cd8ac2",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## 2. Create loader\n",
|
||||
"\n",
|
||||
"We will point to the file we just wrote to disk."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 3,
|
||||
"id": "1268de40-b0e5-445d-9cd8-54856cd0293a",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"loader = DiscordChatLoader(\n",
|
||||
" path=\"./discord_chats.txt\",\n",
|
||||
")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "4928df4b-ae31-48a7-bd76-be3ecee1f3e0",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## 3. Load Messages\n",
|
||||
"\n",
|
||||
"Assuming the format is correct, the loader will convert the chats to langchain messages."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 4,
|
||||
"id": "c8a0836d-4a22-4790-bfe9-97f2145bb0d6",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from typing import List\n",
|
||||
"from langchain.chat_loaders.base import ChatSession\n",
|
||||
"from langchain.chat_loaders.utils import (\n",
|
||||
" map_ai_messages,\n",
|
||||
" merge_chat_runs,\n",
|
||||
")\n",
|
||||
"\n",
|
||||
"raw_messages = loader.lazy_load()\n",
|
||||
"# Merge consecutive messages from the same sender into a single message\n",
|
||||
"merged_messages = merge_chat_runs(raw_messages)\n",
|
||||
"# Convert messages from \"talkingtower\" to AI messages\n",
|
||||
"messages: List[ChatSession] = list(map_ai_messages(merged_messages, sender=\"talkingtower\"))"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 5,
|
||||
"id": "1913963b-c44e-4f7a-aba7-0423c9b8bd59",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"[{'messages': [AIMessage(content='Love music! Do you like jazz?', additional_kwargs={'sender': 'talkingtower', 'events': [{'message_time': '08/15/2023 11:10 AM\\n'}]}, example=False),\n",
|
||||
" HumanMessage(content='Yes! Jazz is fantastic. Ever heard this one?\\nWebsite\\nListen to classic jazz track...', additional_kwargs={'sender': 'reporterbob', 'events': [{'message_time': '08/15/2023 9:27 PM\\n'}]}, example=False),\n",
|
||||
" AIMessage(content='Indeed! Great choice. 🎷', additional_kwargs={'sender': 'talkingtower', 'events': [{'message_time': 'Yesterday at 5:03 AM\\n'}]}, example=False),\n",
|
||||
" HumanMessage(content='Thanks! How about some virtual sightseeing?\\nWebsite\\nVirtual tour of famous landmarks...', additional_kwargs={'sender': 'reporterbob', 'events': [{'message_time': 'Yesterday at 5:23 AM\\n'}]}, example=False),\n",
|
||||
" AIMessage(content=\"Sounds fun! Let's explore.\", additional_kwargs={'sender': 'talkingtower', 'events': [{'message_time': 'Today at 2:38 PM\\n'}]}, example=False),\n",
|
||||
" HumanMessage(content='Enjoy the tour! See you around.', additional_kwargs={'sender': 'reporterbob', 'events': [{'message_time': 'Today at 2:56 PM\\n'}]}, example=False),\n",
|
||||
" AIMessage(content='Thank you! Goodbye! 👋', additional_kwargs={'sender': 'talkingtower', 'events': [{'message_time': 'Today at 3:00 PM\\n'}]}, example=False),\n",
|
||||
" HumanMessage(content='Farewell! Happy exploring.', additional_kwargs={'sender': 'reporterbob', 'events': [{'message_time': 'Today at 3:02 PM\\n'}]}, example=False)]}]"
|
||||
]
|
||||
},
|
||||
"execution_count": 5,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"messages"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "8595a518-5c89-44aa-94a7-ca51e7e2a5fa",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Next Steps\n",
|
||||
"\n",
|
||||
"You can then use these messages how you see fit, such as finetuning a model, few-shot example selection, or directly make predictions for the next message "
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 6,
|
||||
"id": "08ff0a1e-fca0-4da3-aacd-d7401f99d946",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"Thank you! Have a wonderful day! 🌟"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"from langchain.chat_models import ChatOpenAI\n",
|
||||
"\n",
|
||||
"llm = ChatOpenAI()\n",
|
||||
"\n",
|
||||
"for chunk in llm.stream(messages[0]['messages']):\n",
|
||||
" print(chunk.content, end=\"\", flush=True)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "50a5251f-074a-4a3c-a2b0-b1de85e0ac6a",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": []
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
"kernelspec": {
|
||||
"display_name": "Python 3 (ipykernel)",
|
||||
"language": "python",
|
||||
"name": "python3"
|
||||
},
|
||||
"language_info": {
|
||||
"codemirror_mode": {
|
||||
"name": "ipython",
|
||||
"version": 3
|
||||
},
|
||||
"file_extension": ".py",
|
||||
"mimetype": "text/x-python",
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.11.2"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
"nbformat_minor": 5
|
||||
}
|
||||
579
docs/extras/integrations/chat_loaders/facebook.ipynb
Normal file
@@ -0,0 +1,579 @@
|
||||
{
|
||||
"cells": [
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "e4bd269b",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# Facebook Messenger\n",
|
||||
"\n",
|
||||
"This notebook shows how to load data from Facebook in a format you can finetune on. The overall steps are:\n",
|
||||
"\n",
|
||||
"1. Download your messenger data to disk.\n",
|
||||
"2. Create the Chat Loader and call `loader.load()` (or `loader.lazy_load()`) to perform the conversion.\n",
|
||||
"3. Optionally use `merge_chat_runs` to combine message from the same sender in sequence, and/or `map_ai_messages` to convert messages from the specified sender to the \"AIMessage\" class. Once you've done this, call `convert_messages_for_finetuning` to prepare your data for fine-tuning.\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"Once this has been done, you can fine-tune your model. To do so you would complete the following steps:\n",
|
||||
"\n",
|
||||
"4. Upload your messages to OpenAI and run a fine-tuning job.\n",
|
||||
"6. Use the resulting model in your LangChain app!\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"Let's begin.\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"## 1. Download Data\n",
|
||||
"\n",
|
||||
"To download your own messenger data, following instructions [here](https://www.zapptales.com/en/download-facebook-messenger-chat-history-how-to/). IMPORTANT - make sure to download them in JSON format (not HTML).\n",
|
||||
"\n",
|
||||
"We are hosting an example dump at [this google drive link](https://drive.google.com/file/d/1rh1s1o2i7B-Sk1v9o8KNgivLVGwJ-osV/view?usp=sharing) that we will use in this walkthrough."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 2,
|
||||
"id": "647f2158-a42e-4634-b283-b8492caf542a",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"File file.zip downloaded.\n",
|
||||
"File file.zip has been unzipped.\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"# This uses some example data\n",
|
||||
"import requests\n",
|
||||
"import zipfile\n",
|
||||
"\n",
|
||||
"def download_and_unzip(url: str, output_path: str = 'file.zip') -> None:\n",
|
||||
" file_id = url.split('/')[-2]\n",
|
||||
" download_url = f'https://drive.google.com/uc?export=download&id={file_id}'\n",
|
||||
"\n",
|
||||
" response = requests.get(download_url)\n",
|
||||
" if response.status_code != 200:\n",
|
||||
" print('Failed to download the file.')\n",
|
||||
" return\n",
|
||||
"\n",
|
||||
" with open(output_path, 'wb') as file:\n",
|
||||
" file.write(response.content)\n",
|
||||
" print(f'File {output_path} downloaded.')\n",
|
||||
"\n",
|
||||
" with zipfile.ZipFile(output_path, 'r') as zip_ref:\n",
|
||||
" zip_ref.extractall()\n",
|
||||
" print(f'File {output_path} has been unzipped.')\n",
|
||||
"\n",
|
||||
"# URL of the file to download\n",
|
||||
"url = 'https://drive.google.com/file/d/1rh1s1o2i7B-Sk1v9o8KNgivLVGwJ-osV/view?usp=sharing'\n",
|
||||
"\n",
|
||||
"# Download and unzip\n",
|
||||
"download_and_unzip(url)\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "48ef8bb1-fc28-453c-835a-94a552f05a91",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## 2. Create Chat Loader\n",
|
||||
"\n",
|
||||
"We have 2 different `FacebookMessengerChatLoader` classes, one for an entire directory of chats, and one to load individual files. We"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 1,
|
||||
"id": "a0869bc6",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"directory_path = \"./hogwarts\""
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 3,
|
||||
"id": "0460bf25",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langchain.chat_loaders.facebook_messenger import (\n",
|
||||
" SingleFileFacebookMessengerChatLoader,\n",
|
||||
" FolderFacebookMessengerChatLoader,\n",
|
||||
")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 6,
|
||||
"id": "f61ee277",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"loader = SingleFileFacebookMessengerChatLoader(\n",
|
||||
" path=\"./hogwarts/inbox/HermioneGranger/messages_Hermione_Granger.json\",\n",
|
||||
")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 9,
|
||||
"id": "ec466ad7",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"[HumanMessage(content=\"Hi Hermione! How's your summer going so far?\", additional_kwargs={'sender': 'Harry Potter'}, example=False),\n",
|
||||
" HumanMessage(content=\"Harry! Lovely to hear from you. My summer is going well, though I do miss everyone. I'm spending most of my time going through my books and researching fascinating new topics. How about you?\", additional_kwargs={'sender': 'Hermione Granger'}, example=False),\n",
|
||||
" HumanMessage(content=\"I miss you all too. The Dursleys are being their usual unpleasant selves but I'm getting by. At least I can practice some spells in my room without them knowing. Let me know if you find anything good in your researching!\", additional_kwargs={'sender': 'Harry Potter'}, example=False)]"
|
||||
]
|
||||
},
|
||||
"execution_count": 9,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"chat_session = loader.load()[0]\n",
|
||||
"chat_session[\"messages\"][:3]"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 10,
|
||||
"id": "8a3ee473",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"loader = FolderFacebookMessengerChatLoader(\n",
|
||||
" path=\"./hogwarts\",\n",
|
||||
")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 12,
|
||||
"id": "9f41e122",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"9"
|
||||
]
|
||||
},
|
||||
"execution_count": 12,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"chat_sessions = loader.load()\n",
|
||||
"len(chat_sessions)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "d4aa3580-adc1-4b48-9bba-0e8e8d9f44ce",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## 3. Prepare for fine-tuning\n",
|
||||
"\n",
|
||||
"Calling `load()` returns all the chat messages we could extract as human messages. When conversing with chat bots, conversations typically follow a more strict alternating dialogue pattern relative to real conversations. \n",
|
||||
"\n",
|
||||
"You can choose to merge message \"runs\" (consecutive messages from the same sender) and select a sender to represent the \"AI\". The fine-tuned LLM will learn to generate these AI messages."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 14,
|
||||
"id": "5a78030d-b757-4bbe-8a6c-841056f46df7",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langchain.chat_loaders.utils import (\n",
|
||||
" merge_chat_runs,\n",
|
||||
" map_ai_messages,\n",
|
||||
")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 17,
|
||||
"id": "ff35b028-78bf-4c5b-9ec6-939fe67de7f7",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"merged_sessions = merge_chat_runs(chat_sessions)\n",
|
||||
"alternating_sessions = list(map_ai_messages(merged_sessions, \"Harry Potter\"))"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 19,
|
||||
"id": "4b11906e-a496-4d01-9f0d-1938c14147bf",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"[AIMessage(content=\"Professor Snape, I was hoping I could speak with you for a moment about something that's been concerning me lately.\", additional_kwargs={'sender': 'Harry Potter'}, example=False),\n",
|
||||
" HumanMessage(content=\"What is it, Potter? I'm quite busy at the moment.\", additional_kwargs={'sender': 'Severus Snape'}, example=False),\n",
|
||||
" AIMessage(content=\"I apologize for the interruption, sir. I'll be brief. I've noticed some strange activity around the school grounds at night. I saw a cloaked figure lurking near the Forbidden Forest last night. I'm worried someone may be plotting something sinister.\", additional_kwargs={'sender': 'Harry Potter'}, example=False)]"
|
||||
]
|
||||
},
|
||||
"execution_count": 19,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"# Now all of Harry Potter's messages will take the AI message class\n",
|
||||
"# which maps to the 'assistant' role in OpenAI's training format\n",
|
||||
"alternating_sessions[0]['messages'][:3]"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "d985478d-062e-47b9-ae9a-102f59be07c0",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"#### Now we can convert to OpenAI format dictionaries"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 20,
|
||||
"id": "21372331",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langchain.adapters.openai import convert_messages_for_finetuning"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 38,
|
||||
"id": "92c5ae7a",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"Prepared 9 dialogues for training\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"training_data = convert_messages_for_finetuning(alternating_sessions)\n",
|
||||
"print(f\"Prepared {len(training_data)} dialogues for training\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 33,
|
||||
"id": "dfcbd181",
|
||||
"metadata": {
|
||||
"scrolled": true
|
||||
},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"[{'role': 'assistant',\n",
|
||||
" 'content': \"Professor Snape, I was hoping I could speak with you for a moment about something that's been concerning me lately.\"},\n",
|
||||
" {'role': 'user',\n",
|
||||
" 'content': \"What is it, Potter? I'm quite busy at the moment.\"},\n",
|
||||
" {'role': 'assistant',\n",
|
||||
" 'content': \"I apologize for the interruption, sir. I'll be brief. I've noticed some strange activity around the school grounds at night. I saw a cloaked figure lurking near the Forbidden Forest last night. I'm worried someone may be plotting something sinister.\"}]"
|
||||
]
|
||||
},
|
||||
"execution_count": 33,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"training_data[0][:3]"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "f1a9fd64-4f9f-42d3-b5dc-2a340e51e9e7",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"OpenAI currently requires at least 10 training examples for a fine-tuning job, though they recommend between 50-100 for most tasks. Since we only have 9 chat sessions, we can subdivide them (optionally with some overlap) so that each training example is comprised of a portion of a whole conversation.\n",
|
||||
"\n",
|
||||
"Facebook chat sessions (1 per person) often span multiple days and conversations,\n",
|
||||
"so the long-range dependencies may not be that important to model anyhow."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 42,
|
||||
"id": "13cd290a-b1e9-4686-bb5e-d99de8b8612b",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"100"
|
||||
]
|
||||
},
|
||||
"execution_count": 42,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"# Our chat is alternating, we will make each datapoint a group of 8 messages,\n",
|
||||
"# with 2 messages overlapping\n",
|
||||
"chunk_size = 8\n",
|
||||
"overlap = 2\n",
|
||||
"\n",
|
||||
"training_examples = [\n",
|
||||
" conversation_messages[i: i + chunk_size] \n",
|
||||
" for conversation_messages in training_data\n",
|
||||
" for i in range(\n",
|
||||
" 0, len(conversation_messages) - chunk_size + 1, \n",
|
||||
" chunk_size - overlap)\n",
|
||||
"]\n",
|
||||
"\n",
|
||||
"len(training_examples)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "cc8baf41-ff07-4492-96bd-b2472ee7cef9",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## 4. Fine-tune the model\n",
|
||||
"\n",
|
||||
"It's time to fine-tune the model. Make sure you have `openai` installed\n",
|
||||
"and have set your `OPENAI_API_KEY` appropriately"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 43,
|
||||
"id": "95ce3f63-3c80-44b2-9060-534ad74e16fa",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# %pip install -U openai --quiet"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 58,
|
||||
"id": "ab9e28eb",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"File file-zCyNBeg4snpbBL7VkvsuhCz8 ready afer 30.55 seconds.\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"import json\n",
|
||||
"from io import BytesIO\n",
|
||||
"import time\n",
|
||||
"\n",
|
||||
"import openai\n",
|
||||
"\n",
|
||||
"# We will write the jsonl file in memory\n",
|
||||
"my_file = BytesIO()\n",
|
||||
"for m in training_examples:\n",
|
||||
" my_file.write((json.dumps({\"messages\": m}) + \"\\n\").encode('utf-8'))\n",
|
||||
"\n",
|
||||
"my_file.seek(0)\n",
|
||||
"training_file = openai.File.create(\n",
|
||||
" file=my_file,\n",
|
||||
" purpose='fine-tune'\n",
|
||||
")\n",
|
||||
"\n",
|
||||
"# OpenAI audits each training file for compliance reasons.\n",
|
||||
"# This make take a few minutes\n",
|
||||
"status = openai.File.retrieve(training_file.id).status\n",
|
||||
"start_time = time.time()\n",
|
||||
"while status != \"processed\":\n",
|
||||
" print(f\"Status=[{status}]... {time.time() - start_time:.2f}s\", end=\"\\r\", flush=True)\n",
|
||||
" time.sleep(5)\n",
|
||||
" status = openai.File.retrieve(training_file.id).status\n",
|
||||
"print(f\"File {training_file.id} ready after {time.time() - start_time:.2f} seconds.\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "759a7f51-fde9-4b75-aaa9-e600e6537bd1",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"With the file ready, it's time to kick off a training job."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 59,
|
||||
"id": "3f451425",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"job = openai.FineTuningJob.create(\n",
|
||||
" training_file=training_file.id,\n",
|
||||
" model=\"gpt-3.5-turbo\",\n",
|
||||
")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "489b23ef-5e14-42a9-bafb-44220ec6960b",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Grab a cup of tea while your model is being prepared. This may take some time!"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 60,
|
||||
"id": "bac1637a-c087-4523-ade1-c47f9bf4c6f4",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"Status=[running]... 908.87s\r"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"status = openai.FineTuningJob.retrieve(job.id).status\n",
|
||||
"start_time = time.time()\n",
|
||||
"while status != \"succeeded\":\n",
|
||||
" print(f\"Status=[{status}]... {time.time() - start_time:.2f}s\", end=\"\\r\", flush=True)\n",
|
||||
" time.sleep(5)\n",
|
||||
" job = openai.FineTuningJob.retrieve(job.id)\n",
|
||||
" status = job.status"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 66,
|
||||
"id": "535895e1-bc69-40e5-82ed-e24ed2baeeee",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"ft:gpt-3.5-turbo-0613:personal::7rDwkaOq\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"print(job.fine_tuned_model)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "502ff73b-f9e9-49ce-ba45-401811e57946",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## 5. Use in LangChain\n",
|
||||
"\n",
|
||||
"You can use the resulting model ID directly the `ChatOpenAI` model class."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 67,
|
||||
"id": "3925d60d",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langchain.chat_models import ChatOpenAI\n",
|
||||
"\n",
|
||||
"model = ChatOpenAI(\n",
|
||||
" model=job.fine_tuned_model,\n",
|
||||
" temperature=1,\n",
|
||||
")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 69,
|
||||
"id": "7190cf2e-ab34-4ceb-bdad-45f24f069c29",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langchain.prompts import ChatPromptTemplate\n",
|
||||
"from langchain.schema.output_parser import StrOutputParser\n",
|
||||
"\n",
|
||||
"prompt = ChatPromptTemplate.from_messages(\n",
|
||||
" [\n",
|
||||
" (\"human\", \"{input}\"),\n",
|
||||
" ]\n",
|
||||
")\n",
|
||||
"\n",
|
||||
"chain = prompt | model | StrOutputParser()"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 72,
|
||||
"id": "f02057e9-f914-40b1-9c9d-9432ff594b98",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"The usual - Potions, Transfiguration, Defense Against the Dark Arts. What about you?"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"for tok in chain.stream({\"input\": \"What classes are you taking?\"}):\n",
|
||||
" print(tok, end=\"\", flush=True)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "35331503-3cc6-4d64-955e-64afe6b5fef3",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": []
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
"kernelspec": {
|
||||
"display_name": "Python 3 (ipykernel)",
|
||||
"language": "python",
|
||||
"name": "python3"
|
||||
},
|
||||
"language_info": {
|
||||
"codemirror_mode": {
|
||||
"name": "ipython",
|
||||
"version": 3
|
||||
},
|
||||
"file_extension": ".py",
|
||||
"mimetype": "text/x-python",
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.10.1"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
"nbformat_minor": 5
|
||||
}
|
||||
179
docs/extras/integrations/chat_loaders/gmail.ipynb
Normal file
@@ -0,0 +1,179 @@
|
||||
{
|
||||
"cells": [
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "b3d1705d",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# GMail\n",
|
||||
"\n",
|
||||
"This loader goes over how to load data from GMail. There are many ways you could want to load data from GMail. This loader is currently fairly opionated in how to do so. The way it does it is it first looks for all messages that you have sent. It then looks for messages where you are responding to a previous email. It then fetches that previous email, and creates a training example of that email, followed by your email.\n",
|
||||
"\n",
|
||||
"Note that there are clear limitations here. For example, all examples created are only looking at the previous email for context.\n",
|
||||
"\n",
|
||||
"To use:\n",
|
||||
"\n",
|
||||
"- Set up a Google Developer Account: Go to the Google Developer Console, create a project, and enable the Gmail API for that project. This will give you a credentials.json file that you'll need later.\n",
|
||||
"\n",
|
||||
"- Install the Google Client Library: Run the following command to install the Google Client Library:"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "84578039",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"!pip install --upgrade google-auth google-auth-oauthlib google-auth-httplib2 google-api-python-client"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 6,
|
||||
"id": "be18f796",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"import os.path\n",
|
||||
"import base64\n",
|
||||
"import json\n",
|
||||
"import re\n",
|
||||
"import time\n",
|
||||
"from google.auth.transport.requests import Request\n",
|
||||
"from google.oauth2.credentials import Credentials\n",
|
||||
"from google_auth_oauthlib.flow import InstalledAppFlow\n",
|
||||
"from googleapiclient.discovery import build\n",
|
||||
"import logging\n",
|
||||
"import requests\n",
|
||||
"\n",
|
||||
"SCOPES = ['https://www.googleapis.com/auth/gmail.readonly']\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"creds = None\n",
|
||||
"# The file token.json stores the user's access and refresh tokens, and is\n",
|
||||
"# created automatically when the authorization flow completes for the first\n",
|
||||
"# time.\n",
|
||||
"if os.path.exists('email_token.json'):\n",
|
||||
" creds = Credentials.from_authorized_user_file('email_token.json', SCOPES)\n",
|
||||
"# If there are no (valid) credentials available, let the user log in.\n",
|
||||
"if not creds or not creds.valid:\n",
|
||||
" if creds and creds.expired and creds.refresh_token:\n",
|
||||
" creds.refresh(Request())\n",
|
||||
" else:\n",
|
||||
" flow = InstalledAppFlow.from_client_secrets_file( \n",
|
||||
" # your creds file here. Please create json file as here https://cloud.google.com/docs/authentication/getting-started\n",
|
||||
" 'creds.json', SCOPES)\n",
|
||||
" creds = flow.run_local_server(port=0)\n",
|
||||
" # Save the credentials for the next run\n",
|
||||
" with open('email_token.json', 'w') as token:\n",
|
||||
" token.write(creds.to_json())"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 7,
|
||||
"id": "a2793ba0",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langchain.chat_loaders.gmail import GMailLoader"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 10,
|
||||
"id": "2154597f",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"loader = GMailLoader(creds=creds, n=3)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 11,
|
||||
"id": "0b7d11bd",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"data = loader.load()"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 13,
|
||||
"id": "74764bc7",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"2"
|
||||
]
|
||||
},
|
||||
"execution_count": 13,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"# Sometimes there can be errors which we silently ignore\n",
|
||||
"len(data)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 14,
|
||||
"id": "d9360a85",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langchain.chat_loaders.utils import (\n",
|
||||
" map_ai_messages,\n",
|
||||
")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 17,
|
||||
"id": "a9646f7a",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# This makes messages sent by hchase@langchain.com the AI Messages\n",
|
||||
"# This means you will train an LLM to predict as if it's responding as hchase\n",
|
||||
"training_data = list(map_ai_messages(data, sender=\"Harrison Chase <hchase@langchain.com>\"))"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "d1a182f0",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": []
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
"kernelspec": {
|
||||
"display_name": "Python 3 (ipykernel)",
|
||||
"language": "python",
|
||||
"name": "python3"
|
||||
},
|
||||
"language_info": {
|
||||
"codemirror_mode": {
|
||||
"name": "ipython",
|
||||
"version": 3
|
||||
},
|
||||
"file_extension": ".py",
|
||||
"mimetype": "text/x-python",
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.10.1"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
"nbformat_minor": 5
|
||||
}
|
||||
188
docs/extras/integrations/chat_loaders/index.mdx
Normal file
@@ -0,0 +1,188 @@
|
||||
---
|
||||
sidebar_position: 0
|
||||
---
|
||||
|
||||
# Chat loaders
|
||||
|
||||
Like document loaders, chat loaders are utilities designed to help load conversations from popular communication platforms such as Facebook, Slack, Discord, etc. These are loaded into memory as LangChain chat message objects. Such utilities facilitate tasks such as fine-tuning a language model to match your personal style or voice.
|
||||
|
||||
This brief guide will illustrate the process using [OpenAI's fine-tuning API](https://platform.openai.com/docs/guides/fine-tuning) comprised of six steps:
|
||||
|
||||
1. Export your Facebook Messenger chat data in a compatible format for your intended chat loader.
|
||||
2. Load the chat data into memory as LangChain chat message objects. (_this is what is covered in each integration notebook in this section of the documentation_).
|
||||
- Assign a person to the "AI" role and optionally filter, group, and merge messages.
|
||||
3. Export these acquired messages in a format expected by the fine-tuning API.
|
||||
4. Upload this data to OpenAI.
|
||||
5. Fine-tune your model.
|
||||
6. Implement the fine-tuned model in LangChain.
|
||||
|
||||
This guide is not wholly comprehensive but is designed to take you through the fundamentals of going from raw data to fine-tuned model.
|
||||
|
||||
We will demonstrate the procedure through an example of fine-tuning a `gpt-3.5-turbo` model on Facebook Messenger data.
|
||||
|
||||
### 1. Export your chat data
|
||||
|
||||
To export your Facebook messenger data, you can follow the [instructions here](https://www.zapptales.com/en/download-facebook-messenger-chat-history-how-to/).
|
||||
|
||||
:::important JSON format
|
||||
You must select "JSON format" (instead of HTML) when exporting your data to be compatible with the current loader.
|
||||
:::
|
||||
|
||||
OpenAI requires at least 10 examples to fine-tune your model, but they recommend between 50-100 for more optimal results.
|
||||
You can use the example data stored at [this google drive link](https://drive.google.com/file/d/1rh1s1o2i7B-Sk1v9o8KNgivLVGwJ-osV/view?usp=sharing) to test the process.
|
||||
|
||||
### 2. Load the chat
|
||||
|
||||
Once you've obtained your chat data, you can load it into memory as LangChain chat message objects. Here’s an example of loading data using the Python code:
|
||||
|
||||
```python
|
||||
from langchain.chat_loaders.facebook_messenger import FolderFacebookMessengerChatLoader
|
||||
|
||||
loader = FolderFacebookMessengerChatLoader(
|
||||
path="./facebook_messenger_chats",
|
||||
)
|
||||
|
||||
chat_sessions = loader.load()
|
||||
```
|
||||
|
||||
In this snippet, we point the loader to a directory of Facebook chat dumps which are then loaded as multiple "sessions" of messages, one session per conversation file.
|
||||
|
||||
Once you've loaded the messages, you should decide which person you want to fine-tune the model to (usually yourself). You can also decide to merge consecutive messages from the same sender into a single chat message.
|
||||
For both of these tasks, you can use the chat_loaders utilities to do so:
|
||||
|
||||
```
|
||||
from langchain.chat_loaders.utils import (
|
||||
merge_chat_runs,
|
||||
map_ai_messages,
|
||||
)
|
||||
|
||||
merged_sessions = merge_chat_runs(chat_sessions)
|
||||
alternating_sessions = list(map_ai_messages(merged_sessions, "My Name"))
|
||||
```
|
||||
|
||||
### 3. Export messages to OpenAI format
|
||||
|
||||
Convert the chat messages to dictionaries using the `convert_messages_for_finetuning` function. Then, group the data into chunks for better context modeling and overlap management.
|
||||
|
||||
```python
|
||||
from langchain.adapters.openai import convert_messages_for_finetuning
|
||||
|
||||
openai_messages = convert_messages_for_finetuning(chat_sessions)
|
||||
```
|
||||
|
||||
At this point, the data is ready for upload to OpenAI. You can choose to split up conversations into smaller chunks for training if you
|
||||
do not have enough conversations to train on. Feel free to play around with different chunk sizes or with adding system messages to the fine-tuning data.
|
||||
|
||||
```python
|
||||
chunk_size = 8
|
||||
overlap = 2
|
||||
|
||||
message_groups = [
|
||||
conversation_messages[i: i + chunk_size]
|
||||
for conversation_messages in openai_messages
|
||||
for i in range(
|
||||
0, len(conversation_messages) - chunk_size + 1,
|
||||
chunk_size - overlap)
|
||||
]
|
||||
|
||||
len(message_groups)
|
||||
# 9
|
||||
```
|
||||
|
||||
### 4. Upload the data to OpenAI
|
||||
|
||||
Ensure you have set your OpenAI API key by following these [instructions](https://platform.openai.com/account/api-keys), then upload the training file.
|
||||
An audit is performed to ensure data compliance, so you may have to wait a few minutes for the dataset to become ready for use.
|
||||
|
||||
```python
|
||||
import time
|
||||
import json
|
||||
import io
|
||||
|
||||
import openai
|
||||
|
||||
my_file = io.BytesIO()
|
||||
for group in message_groups:
|
||||
my_file.write((json.dumps({"messages": group}) + "\n").encode('utf-8'))
|
||||
|
||||
my_file.seek(0)
|
||||
training_file = openai.File.create(
|
||||
file=my_file,
|
||||
purpose='fine-tune'
|
||||
)
|
||||
|
||||
# Wait while the file is processed
|
||||
status = openai.File.retrieve(training_file.id).status
|
||||
start_time = time.time()
|
||||
while status != "processed":
|
||||
print(f"Status=[{status}]... {time.time() - start_time:.2f}s", end="\r", flush=True)
|
||||
time.sleep(5)
|
||||
status = openai.File.retrieve(training_file.id).status
|
||||
print(f"File {training_file.id} ready after {time.time() - start_time:.2f} seconds.")
|
||||
```
|
||||
|
||||
Once this is done, you can proceed to the model training!
|
||||
|
||||
### 5. Fine-tune the model
|
||||
|
||||
Start the fine-tuning job with your chosen base model.
|
||||
|
||||
```python
|
||||
job = openai.FineTuningJob.create(
|
||||
training_file=training_file.id,
|
||||
model="gpt-3.5-turbo",
|
||||
)
|
||||
```
|
||||
|
||||
This might take a while. Check the status with `openai.FineTuningJob.retrieve(job.id).status` and wait for it to report `succeeded`.
|
||||
|
||||
```python
|
||||
# It may take 10-20+ minutes to complete training.
|
||||
status = openai.FineTuningJob.retrieve(job.id).status
|
||||
start_time = time.time()
|
||||
while status != "succeeded":
|
||||
print(f"Status=[{status}]... {time.time() - start_time:.2f}s", end="\r", flush=True)
|
||||
time.sleep(5)
|
||||
job = openai.FineTuningJob.retrieve(job.id)
|
||||
status = job.status
|
||||
```
|
||||
|
||||
### 6. Use the model in LangChain
|
||||
|
||||
You're almost there! Use the fine-tuned model in LangChain.
|
||||
|
||||
```python
|
||||
from langchain import chat_models
|
||||
|
||||
model_name = job.fine_tuned_model
|
||||
# Example: ft:gpt-3.5-turbo-0613:personal::5mty86jblapsed
|
||||
model = chat_models.ChatOpenAI(model=model_name)
|
||||
```
|
||||
|
||||
```python
|
||||
from langchain.prompts import ChatPromptTemplate
|
||||
from langchain.schema.output_parser import StrOutputParser
|
||||
|
||||
prompt = ChatPromptTemplate.from_messages(
|
||||
[
|
||||
("human", "{input}"),
|
||||
]
|
||||
)
|
||||
|
||||
chain = prompt | model | StrOutputParser()
|
||||
|
||||
for tok in chain.stream({"input": "What classes are you taking?"}):
|
||||
print(tok, end="", flush=True)
|
||||
|
||||
# The usual - Potions, Transfiguration, Defense Against the Dark Arts. What about you?
|
||||
```
|
||||
|
||||
And that's it! You've successfully fine-tuned a model and used it in LangChain.
|
||||
|
||||
## Supported Chat Loaders
|
||||
|
||||
LangChain currently supports the following chat loaders. Feel free to contribute more!
|
||||
|
||||
import DocCardList from "@theme/DocCardList";
|
||||
|
||||
<DocCardList />
|
||||
163
docs/extras/integrations/chat_loaders/slack.ipynb
Normal file
@@ -0,0 +1,163 @@
|
||||
{
|
||||
"cells": [
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "01fcfa2f-33a9-48f3-835a-b1956c394d6b",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# Slack\n",
|
||||
"\n",
|
||||
"This notebook shows how to use the Slack chat loader. This class helps map exported slack conversations to LangChain chat messages.\n",
|
||||
"\n",
|
||||
"The process has three steps:\n",
|
||||
"1. Export the desired conversation thread by following the [instructions here](https://slack.com/help/articles/1500001548241-Request-to-export-all-conversations).\n",
|
||||
"2. Create the `SlackChatLoader` with the file path pointed to the json file or directory of JSON files\n",
|
||||
"3. Call `loader.load()` (or `loader.lazy_load()`) to perform the conversion. Optionally use `merge_chat_runs` to combine message from the same sender in sequence, and/or `map_ai_messages` to convert messages from the specified sender to the \"AIMessage\" class.\n",
|
||||
"\n",
|
||||
"## 1. Creat message dump\n",
|
||||
"\n",
|
||||
"Currently (2023/08/23) this loader best supports a zip directory of files in the format generated by exporting your a direct message converstion from Slack. Follow up-to-date instructions from slack on how to do so.\n",
|
||||
"\n",
|
||||
"We have an example in the LangChain repo."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 1,
|
||||
"id": "a79d35bf-5f21-4063-84bf-a60845c1c51f",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"import requests\n",
|
||||
"\n",
|
||||
"permalink = \"https://raw.githubusercontent.com/langchain-ai/langchain/342087bdfa3ac31d622385d0f2d09cf5e06c8db3/libs/langchain/tests/integration_tests/examples/slack_export.zip\"\n",
|
||||
"response = requests.get(permalink)\n",
|
||||
"with open(\"slack_dump.zip\", \"wb\") as f:\n",
|
||||
" f.write(response.content)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "cf60f703-76f1-4602-a723-02c59535c1af",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## 2. Create the Chat Loader\n",
|
||||
"\n",
|
||||
"Provide the loader with the file path to the zip directory. You can optionally specify the user id that maps to an ai message as well an configure whether to merge message runs."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 2,
|
||||
"id": "4b8b432a-d2bc-49e1-b35f-761730a8fd6d",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langchain.chat_loaders.slack import SlackChatLoader"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 3,
|
||||
"id": "8ec6661b-0aca-48ae-9e2b-6412856c287b",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"loader = SlackChatLoader(\n",
|
||||
" path=\"slack_dump.zip\",\n",
|
||||
")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "8805a7c5-84b4-49f5-8989-0022f2054ace",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## 3. Load messages\n",
|
||||
"\n",
|
||||
"The `load()` (or `lazy_load`) methods return a list of \"ChatSessions\" that currently just contain a list of messages per loaded conversation."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 4,
|
||||
"id": "fcd69b3e-020d-4a15-8a0d-61c2d34e1ee1",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from typing import List\n",
|
||||
"from langchain.chat_loaders.base import ChatSession\n",
|
||||
"from langchain.chat_loaders.utils import (\n",
|
||||
" map_ai_messages,\n",
|
||||
" merge_chat_runs,\n",
|
||||
")\n",
|
||||
"\n",
|
||||
"raw_messages = loader.lazy_load()\n",
|
||||
"# Merge consecutive messages from the same sender into a single message\n",
|
||||
"merged_messages = merge_chat_runs(raw_messages)\n",
|
||||
"# Convert messages from \"U0500003428\" to AI messages\n",
|
||||
"messages: List[ChatSession] = list(map_ai_messages(merged_messages, sender=\"U0500003428\"))"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "7d033f87-cd0c-4f44-a753-41b871c1e919",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Next Steps\n",
|
||||
"\n",
|
||||
"You can then use these messages how you see fit, such as finetuning a model, few-shot example selection, or directly make predictions for the next message. "
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 5,
|
||||
"id": "7d8a1629-5d9e-49b3-b978-3add57027d59",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"Hi, \n",
|
||||
"\n",
|
||||
"I hope you're doing well. I wanted to reach out and ask if you'd be available to meet up for coffee sometime next week. I'd love to catch up and hear about what's been going on in your life. Let me know if you're interested and we can find a time that works for both of us. \n",
|
||||
"\n",
|
||||
"Looking forward to hearing from you!\n",
|
||||
"\n",
|
||||
"Best, [Your Name]"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"from langchain.chat_models import ChatOpenAI\n",
|
||||
"\n",
|
||||
"llm = ChatOpenAI()\n",
|
||||
"\n",
|
||||
"for chunk in llm.stream(messages[1]['messages']):\n",
|
||||
" print(chunk.content, end=\"\", flush=True)"
|
||||
]
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
"kernelspec": {
|
||||
"display_name": "Python 3 (ipykernel)",
|
||||
"language": "python",
|
||||
"name": "python3"
|
||||
},
|
||||
"language_info": {
|
||||
"codemirror_mode": {
|
||||
"name": "ipython",
|
||||
"version": 3
|
||||
},
|
||||
"file_extension": ".py",
|
||||
"mimetype": "text/x-python",
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.11.2"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
"nbformat_minor": 5
|
||||
}
|
||||
206
docs/extras/integrations/chat_loaders/telegram.ipynb
Normal file
@@ -0,0 +1,206 @@
|
||||
{
|
||||
"cells": [
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "735455a6-f82e-4252-b545-27385ef883f4",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# Telegram\n",
|
||||
"\n",
|
||||
"This notebook shows how to use the Telegram chat loader. This class helps map exported Telegram conversations to LangChain chat messages.\n",
|
||||
"\n",
|
||||
"The process has three steps:\n",
|
||||
"1. Export the chat .txt file by copying chats from the Discord app and pasting them in a file on your local computer\n",
|
||||
"2. Create the `TelegramChatLoader` with the file path pointed to the json file or directory of JSON files\n",
|
||||
"3. Call `loader.load()` (or `loader.lazy_load()`) to perform the conversion. Optionally use `merge_chat_runs` to combine message from the same sender in sequence, and/or `map_ai_messages` to convert messages from the specified sender to the \"AIMessage\" class.\n",
|
||||
"\n",
|
||||
"## 1. Creat message dump\n",
|
||||
"\n",
|
||||
"Currently (2023/08/23) this loader best supports json files in the format generated by exporting your chat history from the [Telegram Desktop App](https://desktop.telegram.org/).\n",
|
||||
"\n",
|
||||
"**Important:** There are 'lite' versions of telegram such as \"Telegram for MacOS\" that lack the export functionality. Please make sure you use the correct app to export the file.\n",
|
||||
"\n",
|
||||
"To make the export:\n",
|
||||
"1. Download and open telegram desktop\n",
|
||||
"2. Select a conversation\n",
|
||||
"3. Navigate to the conversation settings (currently the three dots in the top right corner)\n",
|
||||
"4. Click \"Export Chat History\"\n",
|
||||
"5. Unselect photos and other media. Select \"Machine-readable JSON\" format to export.\n",
|
||||
"\n",
|
||||
"An example is below: "
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 1,
|
||||
"id": "285f2044-0f58-4b92-addb-9f8569076734",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"Overwriting telegram_conversation.json\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"%%writefile telegram_conversation.json\n",
|
||||
"{\n",
|
||||
" \"name\": \"Jiminy\",\n",
|
||||
" \"type\": \"personal_chat\",\n",
|
||||
" \"id\": 5965280513,\n",
|
||||
" \"messages\": [\n",
|
||||
" {\n",
|
||||
" \"id\": 1,\n",
|
||||
" \"type\": \"message\",\n",
|
||||
" \"date\": \"2023-08-23T13:11:23\",\n",
|
||||
" \"date_unixtime\": \"1692821483\",\n",
|
||||
" \"from\": \"Jiminy Cricket\",\n",
|
||||
" \"from_id\": \"user123450513\",\n",
|
||||
" \"text\": \"You better trust your conscience\",\n",
|
||||
" \"text_entities\": [\n",
|
||||
" {\n",
|
||||
" \"type\": \"plain\",\n",
|
||||
" \"text\": \"You better trust your conscience\"\n",
|
||||
" }\n",
|
||||
" ]\n",
|
||||
" },\n",
|
||||
" {\n",
|
||||
" \"id\": 2,\n",
|
||||
" \"type\": \"message\",\n",
|
||||
" \"date\": \"2023-08-23T13:13:20\",\n",
|
||||
" \"date_unixtime\": \"1692821600\",\n",
|
||||
" \"from\": \"Batman & Robin\",\n",
|
||||
" \"from_id\": \"user6565661032\",\n",
|
||||
" \"text\": \"What did you just say?\",\n",
|
||||
" \"text_entities\": [\n",
|
||||
" {\n",
|
||||
" \"type\": \"plain\",\n",
|
||||
" \"text\": \"What did you just say?\"\n",
|
||||
" }\n",
|
||||
" ]\n",
|
||||
" }\n",
|
||||
" ]\n",
|
||||
"}"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "7cc109f4-4c92-4cd3-8143-c322776c3f03",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## 2. Create the Chat Loader\n",
|
||||
"\n",
|
||||
"All that's required is the file path. You can optionally specify the user name that maps to an ai message as well an configure whether to merge message runs."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 2,
|
||||
"id": "111f7767-573c-42d4-86f0-bd766bbaa071",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langchain.chat_loaders.telegram import TelegramChatLoader"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 3,
|
||||
"id": "a4226efa-2640-4990-a20c-6861d1887329",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"loader = TelegramChatLoader(\n",
|
||||
" path=\"./telegram_conversation.json\", \n",
|
||||
")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "71699fb7-7815-4c89-8d96-30e8fada6923",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## 3. Load messages\n",
|
||||
"\n",
|
||||
"The `load()` (or `lazy_load`) methods return a list of \"ChatSessions\" that currently just contain a list of messages per loaded conversation."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 4,
|
||||
"id": "81121efb-c875-4a77-ad1e-fe26b3d7e812",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from typing import List\n",
|
||||
"from langchain.chat_loaders.base import ChatSession\n",
|
||||
"from langchain.chat_loaders.utils import (\n",
|
||||
" map_ai_messages,\n",
|
||||
" merge_chat_runs,\n",
|
||||
")\n",
|
||||
"\n",
|
||||
"raw_messages = loader.lazy_load()\n",
|
||||
"# Merge consecutive messages from the same sender into a single message\n",
|
||||
"merged_messages = merge_chat_runs(raw_messages)\n",
|
||||
"# Convert messages from \"Jiminy Cricket\" to AI messages\n",
|
||||
"messages: List[ChatSession] = list(map_ai_messages(merged_messages, sender=\"Jiminy Cricket\"))"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "b9089c05-7375-41ca-a2f9-672a845314e4",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Next Steps\n",
|
||||
"\n",
|
||||
"You can then use these messages how you see fit, such as finetuning a model, few-shot example selection, or directly make predictions for the next message "
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 5,
|
||||
"id": "637a6f5d-6944-4722-9361-a76ef5e9dd2a",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"I said, \"You better trust your conscience.\""
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"from langchain.chat_models import ChatOpenAI\n",
|
||||
"\n",
|
||||
"llm = ChatOpenAI()\n",
|
||||
"\n",
|
||||
"for chunk in llm.stream(messages[0]['messages']):\n",
|
||||
" print(chunk.content, end=\"\", flush=True)"
|
||||
]
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
"kernelspec": {
|
||||
"display_name": "Python 3 (ipykernel)",
|
||||
"language": "python",
|
||||
"name": "python3"
|
||||
},
|
||||
"language_info": {
|
||||
"codemirror_mode": {
|
||||
"name": "ipython",
|
||||
"version": 3
|
||||
},
|
||||
"file_extension": ".py",
|
||||
"mimetype": "text/x-python",
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.11.2"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
"nbformat_minor": 5
|
||||
}
|
||||
77
docs/extras/integrations/chat_loaders/twitter.ipynb
Normal file
@@ -0,0 +1,77 @@
|
||||
{
|
||||
"cells": [
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "d86853d2",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# Twitter (via Apify)\n",
|
||||
"\n",
|
||||
"This notebook shows how to load chat messages from Twitter to finetune on. We do this by utilizing Apify. \n",
|
||||
"\n",
|
||||
"First, use Apify to export tweets. An example"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 1,
|
||||
"id": "e5034b4e",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"import json\n",
|
||||
"from langchain.schema import AIMessage\n",
|
||||
"from langchain.adapters.openai import convert_message_to_dict"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 2,
|
||||
"id": "8bf0fb93",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"with open('example_data/dataset_twitter-scraper_2023-08-23_22-13-19-740.json') as f:\n",
|
||||
" data = json.load(f)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 3,
|
||||
"id": "468124fa",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# Filter out tweets that reference other tweets, because it's a bit weird\n",
|
||||
"tweets = [d[\"full_text\"] for d in data if \"t.co\" not in d['full_text']]\n",
|
||||
"# Create them as AI messages\n",
|
||||
"messages = [AIMessage(content=t) for t in tweets]\n",
|
||||
"# Add in a system message at the start\n",
|
||||
"# TODO: we could try to extract the subject from the tweets, and put that in the system message.\n",
|
||||
"system_message = {\"role\": \"system\", \"content\": \"write a tweet\"}\n",
|
||||
"data = [[system_message, convert_message_to_dict(m)] for m in messages]"
|
||||
]
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
"kernelspec": {
|
||||
"display_name": "Python 3 (ipykernel)",
|
||||
"language": "python",
|
||||
"name": "python3"
|
||||
},
|
||||
"language_info": {
|
||||
"codemirror_mode": {
|
||||
"name": "ipython",
|
||||
"version": 3
|
||||
},
|
||||
"file_extension": ".py",
|
||||
"mimetype": "text/x-python",
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.10.1"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
"nbformat_minor": 5
|
||||
}
|
||||
204
docs/extras/integrations/chat_loaders/whatsapp.ipynb
Normal file
@@ -0,0 +1,204 @@
|
||||
{
|
||||
"cells": [
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "735455a6-f82e-4252-b545-27385ef883f4",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# WhatsApp\n",
|
||||
"\n",
|
||||
"This notebook shows how to use the WhatsApp chat loader. This class helps map exported Telegram conversations to LangChain chat messages.\n",
|
||||
"\n",
|
||||
"The process has three steps:\n",
|
||||
"1. Export the chat conversations to computer\n",
|
||||
"2. Create the `WhatsAppChatLoader` with the file path pointed to the json file or directory of JSON files\n",
|
||||
"3. Call `loader.load()` (or `loader.lazy_load()`) to perform the conversion.\n",
|
||||
"\n",
|
||||
"## 1. Creat message dump\n",
|
||||
"\n",
|
||||
"To make the export of your WhatsApp conversation(s), complete the following steps:\n",
|
||||
"\n",
|
||||
"1. Open the target conversation\n",
|
||||
"2. Click the three dots in the top right corner and select \"More\".\n",
|
||||
"3. Then select \"Export chat\" and choose \"Without media\".\n",
|
||||
"\n",
|
||||
"An example of the data format for each converation is below: "
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 1,
|
||||
"id": "285f2044-0f58-4b92-addb-9f8569076734",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"Writing whatsapp_chat.txt\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"%%writefile whatsapp_chat.txt\n",
|
||||
"[8/15/23, 9:12:33 AM] Dr. Feather: Messages and calls are end-to-end encrypted. No one outside of this chat, not even WhatsApp, can read or listen to them.\n",
|
||||
"[8/15/23, 9:12:43 AM] Dr. Feather: I spotted a rare Hyacinth Macaw yesterday in the Amazon Rainforest. Such a magnificent creature!\n",
|
||||
"[8/15/23, 9:12:48 AM] Dr. Feather: image omitted\n",
|
||||
"[8/15/23, 9:13:15 AM] Jungle Jane: That's stunning! Were you able to observe its behavior?\n",
|
||||
"[8/15/23, 9:13:23 AM] Dr. Feather: image omitted\n",
|
||||
"[8/15/23, 9:14:02 AM] Dr. Feather: Yes, it seemed quite social with other macaws. They're known for their playful nature.\n",
|
||||
"[8/15/23, 9:14:15 AM] Jungle Jane: How's the research going on parrot communication?\n",
|
||||
"[8/15/23, 9:14:30 AM] Dr. Feather: image omitted\n",
|
||||
"[8/15/23, 9:14:50 AM] Dr. Feather: It's progressing well. We're learning so much about how they use sound and color to communicate.\n",
|
||||
"[8/15/23, 9:15:10 AM] Jungle Jane: That's fascinating! Can't wait to read your paper on it.\n",
|
||||
"[8/15/23, 9:15:20 AM] Dr. Feather: Thank you! I'll send you a draft soon.\n",
|
||||
"[8/15/23, 9:25:16 PM] Jungle Jane: Looking forward to it! Keep up the great work."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "7cc109f4-4c92-4cd3-8143-c322776c3f03",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## 2. Create the Chat Loader\n",
|
||||
"\n",
|
||||
"The WhatsAppChatLoader accepts the resulting zip file, unzipped directory, or the path to any of the chat `.txt` files therein.\n",
|
||||
"\n",
|
||||
"Provide that as well as the user name you want to take on the role of \"AI\" when finetuning."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 7,
|
||||
"id": "111f7767-573c-42d4-86f0-bd766bbaa071",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langchain.chat_loaders.whatsapp import WhatsAppChatLoader"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 12,
|
||||
"id": "a4226efa-2640-4990-a20c-6861d1887329",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"loader = WhatsAppChatLoader(\n",
|
||||
" path=\"./whatsapp_chat.txt\", \n",
|
||||
")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "71699fb7-7815-4c89-8d96-30e8fada6923",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## 3. Load messages\n",
|
||||
"\n",
|
||||
"The `load()` (or `lazy_load`) methods return a list of \"ChatSessions\" that currently store the list of messages per loaded conversation."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 13,
|
||||
"id": "81121efb-c875-4a77-ad1e-fe26b3d7e812",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"[{'messages': [AIMessage(content='I spotted a rare Hyacinth Macaw yesterday in the Amazon Rainforest. Such a magnificent creature!', additional_kwargs={'sender': 'Dr. Feather', 'events': [{'message_time': '8/15/23, 9:12:43 AM'}]}, example=False),\n",
|
||||
" HumanMessage(content=\"That's stunning! Were you able to observe its behavior?\", additional_kwargs={'sender': 'Jungle Jane', 'events': [{'message_time': '8/15/23, 9:13:15 AM'}]}, example=False),\n",
|
||||
" AIMessage(content=\"Yes, it seemed quite social with other macaws. They're known for their playful nature.\", additional_kwargs={'sender': 'Dr. Feather', 'events': [{'message_time': '8/15/23, 9:14:02 AM'}]}, example=False),\n",
|
||||
" HumanMessage(content=\"How's the research going on parrot communication?\", additional_kwargs={'sender': 'Jungle Jane', 'events': [{'message_time': '8/15/23, 9:14:15 AM'}]}, example=False),\n",
|
||||
" AIMessage(content=\"It's progressing well. We're learning so much about how they use sound and color to communicate.\", additional_kwargs={'sender': 'Dr. Feather', 'events': [{'message_time': '8/15/23, 9:14:50 AM'}]}, example=False),\n",
|
||||
" HumanMessage(content=\"That's fascinating! Can't wait to read your paper on it.\", additional_kwargs={'sender': 'Jungle Jane', 'events': [{'message_time': '8/15/23, 9:15:10 AM'}]}, example=False),\n",
|
||||
" AIMessage(content=\"Thank you! I'll send you a draft soon.\", additional_kwargs={'sender': 'Dr. Feather', 'events': [{'message_time': '8/15/23, 9:15:20 AM'}]}, example=False),\n",
|
||||
" HumanMessage(content='Looking forward to it! Keep up the great work.', additional_kwargs={'sender': 'Jungle Jane', 'events': [{'message_time': '8/15/23, 9:25:16 PM'}]}, example=False)]}]"
|
||||
]
|
||||
},
|
||||
"execution_count": 13,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"from typing import List\n",
|
||||
"from langchain.chat_loaders.base import ChatSession\n",
|
||||
"from langchain.chat_loaders.utils import (\n",
|
||||
" map_ai_messages,\n",
|
||||
" merge_chat_runs,\n",
|
||||
")\n",
|
||||
"\n",
|
||||
"raw_messages = loader.lazy_load()\n",
|
||||
"# Merge consecutive messages from the same sender into a single message\n",
|
||||
"merged_messages = merge_chat_runs(raw_messages)\n",
|
||||
"# Convert messages from \"Dr. Feather\" to AI messages\n",
|
||||
"messages: List[ChatSession] = list(map_ai_messages(merged_messages, sender=\"Dr. Feather\"))"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "b9089c05-7375-41ca-a2f9-672a845314e4",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Next Steps\n",
|
||||
"\n",
|
||||
"You can then use these messages how you see fit, such as finetuning a model, few-shot example selection, or directly make predictions for the next message."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 14,
|
||||
"id": "637a6f5d-6944-4722-9361-a76ef5e9dd2a",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"Thank you for the encouragement! I'll do my best to continue studying and sharing fascinating insights about parrot communication."
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"from langchain.chat_models import ChatOpenAI\n",
|
||||
"\n",
|
||||
"llm = ChatOpenAI()\n",
|
||||
"\n",
|
||||
"for chunk in llm.stream(messages[0]['messages']):\n",
|
||||
" print(chunk.content, end=\"\", flush=True)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "16156643-cfbd-444f-b4ae-198eb44f0267",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": []
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
"kernelspec": {
|
||||
"display_name": "Python 3 (ipykernel)",
|
||||
"language": "python",
|
||||
"name": "python3"
|
||||
},
|
||||
"language_info": {
|
||||
"codemirror_mode": {
|
||||
"name": "ipython",
|
||||
"version": 3
|
||||
},
|
||||
"file_extension": ".py",
|
||||
"mimetype": "text/x-python",
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.11.2"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
"nbformat_minor": 5
|
||||
}
|
||||
@@ -5,7 +5,7 @@
|
||||
"id": "62359e08-cf80-4210-a30c-f450000e65b9",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# ArcGISLoader\n",
|
||||
"# ArcGIS\n",
|
||||
"\n",
|
||||
"This notebook demonstrates the use of the `langchain.document_loaders.ArcGISLoader` class.\n",
|
||||
"\n",
|
||||
@@ -39,8 +39,8 @@
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"CPU times: user 4.04 ms, sys: 1.63 ms, total: 5.67 ms\n",
|
||||
"Wall time: 644 ms\n"
|
||||
"CPU times: user 7.86 ms, sys: 0 ns, total: 7.86 ms\n",
|
||||
"Wall time: 802 ms\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
@@ -59,7 +59,179 @@
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"dict_keys(['url', 'layer_description', 'item_description', 'layer_properties'])"
|
||||
"{'accessed': '2023-08-15T04:30:41.689270+00:00Z',\n",
|
||||
" 'name': 'Beach Ramps',\n",
|
||||
" 'url': 'https://maps1.vcgov.org/arcgis/rest/services/Beaches/MapServer/7',\n",
|
||||
" 'layer_description': '(Not Provided)',\n",
|
||||
" 'item_description': '(Not Provided)',\n",
|
||||
" 'layer_properties': {\n",
|
||||
" \"currentVersion\": 10.81,\n",
|
||||
" \"id\": 7,\n",
|
||||
" \"name\": \"Beach Ramps\",\n",
|
||||
" \"type\": \"Feature Layer\",\n",
|
||||
" \"description\": \"\",\n",
|
||||
" \"geometryType\": \"esriGeometryPoint\",\n",
|
||||
" \"sourceSpatialReference\": {\n",
|
||||
" \"wkid\": 2881,\n",
|
||||
" \"latestWkid\": 2881\n",
|
||||
" },\n",
|
||||
" \"copyrightText\": \"\",\n",
|
||||
" \"parentLayer\": null,\n",
|
||||
" \"subLayers\": [],\n",
|
||||
" \"minScale\": 750000,\n",
|
||||
" \"maxScale\": 0,\n",
|
||||
" \"drawingInfo\": {\n",
|
||||
" \"renderer\": {\n",
|
||||
" \"type\": \"simple\",\n",
|
||||
" \"symbol\": {\n",
|
||||
" \"type\": \"esriPMS\",\n",
|
||||
" \"url\": \"9bb2e5ca499bb68aa3ee0d4e1ecc3849\",\n",
|
||||
" \"imageData\": \"iVBORw0KGgoAAAANSUhEUgAAABAAAAAQCAYAAAAf8/9hAAAAAXNSR0IB2cksfwAAAAlwSFlzAAAOxAAADsQBlSsOGwAAAJJJREFUOI3NkDEKg0AQRZ9kkSnSGBshR7DJqdJYeg7BMpcS0uQWQsqoCLExkcUJzGqT38zw2fcY1rEzbp7vjXz0EXC7gBxs1ABcG/8CYkCcDqwyLqsV+RlV0I/w7PzuJBArr1VB20H58Ls6h+xoFITkTwWpQJX7XSIBAnFwVj7MLAjJV/AC6G3QoAmK+74Lom04THTBEp/HCSc6AAAAAElFTkSuQmCC\",\n",
|
||||
" \"contentType\": \"image/png\",\n",
|
||||
" \"width\": 12,\n",
|
||||
" \"height\": 12,\n",
|
||||
" \"angle\": 0,\n",
|
||||
" \"xoffset\": 0,\n",
|
||||
" \"yoffset\": 0\n",
|
||||
" },\n",
|
||||
" \"label\": \"\",\n",
|
||||
" \"description\": \"\"\n",
|
||||
" },\n",
|
||||
" \"transparency\": 0,\n",
|
||||
" \"labelingInfo\": null\n",
|
||||
" },\n",
|
||||
" \"defaultVisibility\": true,\n",
|
||||
" \"extent\": {\n",
|
||||
" \"xmin\": -81.09480168806815,\n",
|
||||
" \"ymin\": 28.858349245353473,\n",
|
||||
" \"xmax\": -80.77512908572814,\n",
|
||||
" \"ymax\": 29.41078388840041,\n",
|
||||
" \"spatialReference\": {\n",
|
||||
" \"wkid\": 4326,\n",
|
||||
" \"latestWkid\": 4326\n",
|
||||
" }\n",
|
||||
" },\n",
|
||||
" \"hasAttachments\": false,\n",
|
||||
" \"htmlPopupType\": \"esriServerHTMLPopupTypeNone\",\n",
|
||||
" \"displayField\": \"AccessName\",\n",
|
||||
" \"typeIdField\": null,\n",
|
||||
" \"subtypeFieldName\": null,\n",
|
||||
" \"subtypeField\": null,\n",
|
||||
" \"defaultSubtypeCode\": null,\n",
|
||||
" \"fields\": [\n",
|
||||
" {\n",
|
||||
" \"name\": \"OBJECTID\",\n",
|
||||
" \"type\": \"esriFieldTypeOID\",\n",
|
||||
" \"alias\": \"OBJECTID\",\n",
|
||||
" \"domain\": null\n",
|
||||
" },\n",
|
||||
" {\n",
|
||||
" \"name\": \"Shape\",\n",
|
||||
" \"type\": \"esriFieldTypeGeometry\",\n",
|
||||
" \"alias\": \"Shape\",\n",
|
||||
" \"domain\": null\n",
|
||||
" },\n",
|
||||
" {\n",
|
||||
" \"name\": \"AccessName\",\n",
|
||||
" \"type\": \"esriFieldTypeString\",\n",
|
||||
" \"alias\": \"AccessName\",\n",
|
||||
" \"length\": 40,\n",
|
||||
" \"domain\": null\n",
|
||||
" },\n",
|
||||
" {\n",
|
||||
" \"name\": \"AccessID\",\n",
|
||||
" \"type\": \"esriFieldTypeString\",\n",
|
||||
" \"alias\": \"AccessID\",\n",
|
||||
" \"length\": 50,\n",
|
||||
" \"domain\": null\n",
|
||||
" },\n",
|
||||
" {\n",
|
||||
" \"name\": \"AccessType\",\n",
|
||||
" \"type\": \"esriFieldTypeString\",\n",
|
||||
" \"alias\": \"AccessType\",\n",
|
||||
" \"length\": 25,\n",
|
||||
" \"domain\": null\n",
|
||||
" },\n",
|
||||
" {\n",
|
||||
" \"name\": \"GeneralLoc\",\n",
|
||||
" \"type\": \"esriFieldTypeString\",\n",
|
||||
" \"alias\": \"GeneralLoc\",\n",
|
||||
" \"length\": 100,\n",
|
||||
" \"domain\": null\n",
|
||||
" },\n",
|
||||
" {\n",
|
||||
" \"name\": \"MilePost\",\n",
|
||||
" \"type\": \"esriFieldTypeDouble\",\n",
|
||||
" \"alias\": \"MilePost\",\n",
|
||||
" \"domain\": null\n",
|
||||
" },\n",
|
||||
" {\n",
|
||||
" \"name\": \"City\",\n",
|
||||
" \"type\": \"esriFieldTypeString\",\n",
|
||||
" \"alias\": \"City\",\n",
|
||||
" \"length\": 50,\n",
|
||||
" \"domain\": null\n",
|
||||
" },\n",
|
||||
" {\n",
|
||||
" \"name\": \"AccessStatus\",\n",
|
||||
" \"type\": \"esriFieldTypeString\",\n",
|
||||
" \"alias\": \"AccessStatus\",\n",
|
||||
" \"length\": 50,\n",
|
||||
" \"domain\": null\n",
|
||||
" },\n",
|
||||
" {\n",
|
||||
" \"name\": \"Entry_Date_Time\",\n",
|
||||
" \"type\": \"esriFieldTypeDate\",\n",
|
||||
" \"alias\": \"Entry_Date_Time\",\n",
|
||||
" \"length\": 8,\n",
|
||||
" \"domain\": null\n",
|
||||
" },\n",
|
||||
" {\n",
|
||||
" \"name\": \"DrivingZone\",\n",
|
||||
" \"type\": \"esriFieldTypeString\",\n",
|
||||
" \"alias\": \"DrivingZone\",\n",
|
||||
" \"length\": 50,\n",
|
||||
" \"domain\": null\n",
|
||||
" }\n",
|
||||
" ],\n",
|
||||
" \"geometryField\": {\n",
|
||||
" \"name\": \"Shape\",\n",
|
||||
" \"type\": \"esriFieldTypeGeometry\",\n",
|
||||
" \"alias\": \"Shape\"\n",
|
||||
" },\n",
|
||||
" \"indexes\": null,\n",
|
||||
" \"subtypes\": [],\n",
|
||||
" \"relationships\": [],\n",
|
||||
" \"canModifyLayer\": true,\n",
|
||||
" \"canScaleSymbols\": false,\n",
|
||||
" \"hasLabels\": false,\n",
|
||||
" \"capabilities\": \"Map,Query,Data\",\n",
|
||||
" \"maxRecordCount\": 1000,\n",
|
||||
" \"supportsStatistics\": true,\n",
|
||||
" \"supportsAdvancedQueries\": true,\n",
|
||||
" \"supportedQueryFormats\": \"JSON, geoJSON\",\n",
|
||||
" \"isDataVersioned\": false,\n",
|
||||
" \"ownershipBasedAccessControlForFeatures\": {\n",
|
||||
" \"allowOthersToQuery\": true\n",
|
||||
" },\n",
|
||||
" \"useStandardizedQueries\": true,\n",
|
||||
" \"advancedQueryCapabilities\": {\n",
|
||||
" \"useStandardizedQueries\": true,\n",
|
||||
" \"supportsStatistics\": true,\n",
|
||||
" \"supportsHavingClause\": true,\n",
|
||||
" \"supportsCountDistinct\": true,\n",
|
||||
" \"supportsOrderBy\": true,\n",
|
||||
" \"supportsDistinct\": true,\n",
|
||||
" \"supportsPagination\": true,\n",
|
||||
" \"supportsTrueCurve\": true,\n",
|
||||
" \"supportsReturningQueryExtent\": true,\n",
|
||||
" \"supportsQueryWithDistance\": true,\n",
|
||||
" \"supportsSqlExpression\": true\n",
|
||||
" },\n",
|
||||
" \"supportsDatumTransformation\": true,\n",
|
||||
" \"dateFieldsTimeReference\": null,\n",
|
||||
" \"supportsCoordinatesQuantization\": true\n",
|
||||
" }}"
|
||||
]
|
||||
},
|
||||
"execution_count": 3,
|
||||
@@ -68,200 +240,12 @@
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"docs[0].metadata.keys()"
|
||||
"docs[0].metadata"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 4,
|
||||
"id": "6b6e9107-6a80-4ef7-8149-3013faa2de76",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"KeysView({\n",
|
||||
" \"currentVersion\": 10.81,\n",
|
||||
" \"id\": 7,\n",
|
||||
" \"name\": \"Beach Ramps\",\n",
|
||||
" \"type\": \"Feature Layer\",\n",
|
||||
" \"description\": \"\",\n",
|
||||
" \"geometryType\": \"esriGeometryPoint\",\n",
|
||||
" \"sourceSpatialReference\": {\n",
|
||||
" \"wkid\": 2881,\n",
|
||||
" \"latestWkid\": 2881\n",
|
||||
" },\n",
|
||||
" \"copyrightText\": \"\",\n",
|
||||
" \"parentLayer\": null,\n",
|
||||
" \"subLayers\": [],\n",
|
||||
" \"minScale\": 750000,\n",
|
||||
" \"maxScale\": 0,\n",
|
||||
" \"drawingInfo\": {\n",
|
||||
" \"renderer\": {\n",
|
||||
" \"type\": \"simple\",\n",
|
||||
" \"symbol\": {\n",
|
||||
" \"type\": \"esriPMS\",\n",
|
||||
" \"url\": \"9bb2e5ca499bb68aa3ee0d4e1ecc3849\",\n",
|
||||
" \"imageData\": \"iVBORw0KGgoAAAANSUhEUgAAABAAAAAQCAYAAAAf8/9hAAAAAXNSR0IB2cksfwAAAAlwSFlzAAAOxAAADsQBlSsOGwAAAJJJREFUOI3NkDEKg0AQRZ9kkSnSGBshR7DJqdJYeg7BMpcS0uQWQsqoCLExkcUJzGqT38zw2fcY1rEzbp7vjXz0EXC7gBxs1ABcG/8CYkCcDqwyLqsV+RlV0I/w7PzuJBArr1VB20H58Ls6h+xoFITkTwWpQJX7XSIBAnFwVj7MLAjJV/AC6G3QoAmK+74Lom04THTBEp/HCSc6AAAAAElFTkSuQmCC\",\n",
|
||||
" \"contentType\": \"image/png\",\n",
|
||||
" \"width\": 12,\n",
|
||||
" \"height\": 12,\n",
|
||||
" \"angle\": 0,\n",
|
||||
" \"xoffset\": 0,\n",
|
||||
" \"yoffset\": 0\n",
|
||||
" },\n",
|
||||
" \"label\": \"\",\n",
|
||||
" \"description\": \"\"\n",
|
||||
" },\n",
|
||||
" \"transparency\": 0,\n",
|
||||
" \"labelingInfo\": null\n",
|
||||
" },\n",
|
||||
" \"defaultVisibility\": true,\n",
|
||||
" \"extent\": {\n",
|
||||
" \"xmin\": -81.09480168806815,\n",
|
||||
" \"ymin\": 28.858349245353473,\n",
|
||||
" \"xmax\": -80.77512908572814,\n",
|
||||
" \"ymax\": 29.41078388840041,\n",
|
||||
" \"spatialReference\": {\n",
|
||||
" \"wkid\": 4326,\n",
|
||||
" \"latestWkid\": 4326\n",
|
||||
" }\n",
|
||||
" },\n",
|
||||
" \"hasAttachments\": false,\n",
|
||||
" \"htmlPopupType\": \"esriServerHTMLPopupTypeNone\",\n",
|
||||
" \"displayField\": \"AccessName\",\n",
|
||||
" \"typeIdField\": null,\n",
|
||||
" \"subtypeFieldName\": null,\n",
|
||||
" \"subtypeField\": null,\n",
|
||||
" \"defaultSubtypeCode\": null,\n",
|
||||
" \"fields\": [\n",
|
||||
" {\n",
|
||||
" \"name\": \"OBJECTID\",\n",
|
||||
" \"type\": \"esriFieldTypeOID\",\n",
|
||||
" \"alias\": \"OBJECTID\",\n",
|
||||
" \"domain\": null\n",
|
||||
" },\n",
|
||||
" {\n",
|
||||
" \"name\": \"Shape\",\n",
|
||||
" \"type\": \"esriFieldTypeGeometry\",\n",
|
||||
" \"alias\": \"Shape\",\n",
|
||||
" \"domain\": null\n",
|
||||
" },\n",
|
||||
" {\n",
|
||||
" \"name\": \"AccessName\",\n",
|
||||
" \"type\": \"esriFieldTypeString\",\n",
|
||||
" \"alias\": \"AccessName\",\n",
|
||||
" \"length\": 40,\n",
|
||||
" \"domain\": null\n",
|
||||
" },\n",
|
||||
" {\n",
|
||||
" \"name\": \"AccessID\",\n",
|
||||
" \"type\": \"esriFieldTypeString\",\n",
|
||||
" \"alias\": \"AccessID\",\n",
|
||||
" \"length\": 50,\n",
|
||||
" \"domain\": null\n",
|
||||
" },\n",
|
||||
" {\n",
|
||||
" \"name\": \"AccessType\",\n",
|
||||
" \"type\": \"esriFieldTypeString\",\n",
|
||||
" \"alias\": \"AccessType\",\n",
|
||||
" \"length\": 25,\n",
|
||||
" \"domain\": null\n",
|
||||
" },\n",
|
||||
" {\n",
|
||||
" \"name\": \"GeneralLoc\",\n",
|
||||
" \"type\": \"esriFieldTypeString\",\n",
|
||||
" \"alias\": \"GeneralLoc\",\n",
|
||||
" \"length\": 100,\n",
|
||||
" \"domain\": null\n",
|
||||
" },\n",
|
||||
" {\n",
|
||||
" \"name\": \"MilePost\",\n",
|
||||
" \"type\": \"esriFieldTypeDouble\",\n",
|
||||
" \"alias\": \"MilePost\",\n",
|
||||
" \"domain\": null\n",
|
||||
" },\n",
|
||||
" {\n",
|
||||
" \"name\": \"City\",\n",
|
||||
" \"type\": \"esriFieldTypeString\",\n",
|
||||
" \"alias\": \"City\",\n",
|
||||
" \"length\": 50,\n",
|
||||
" \"domain\": null\n",
|
||||
" },\n",
|
||||
" {\n",
|
||||
" \"name\": \"AccessStatus\",\n",
|
||||
" \"type\": \"esriFieldTypeString\",\n",
|
||||
" \"alias\": \"AccessStatus\",\n",
|
||||
" \"length\": 50,\n",
|
||||
" \"domain\": null\n",
|
||||
" },\n",
|
||||
" {\n",
|
||||
" \"name\": \"Entry_Date_Time\",\n",
|
||||
" \"type\": \"esriFieldTypeDate\",\n",
|
||||
" \"alias\": \"Entry_Date_Time\",\n",
|
||||
" \"length\": 8,\n",
|
||||
" \"domain\": null\n",
|
||||
" },\n",
|
||||
" {\n",
|
||||
" \"name\": \"DrivingZone\",\n",
|
||||
" \"type\": \"esriFieldTypeString\",\n",
|
||||
" \"alias\": \"DrivingZone\",\n",
|
||||
" \"length\": 50,\n",
|
||||
" \"domain\": null\n",
|
||||
" }\n",
|
||||
" ],\n",
|
||||
" \"geometryField\": {\n",
|
||||
" \"name\": \"Shape\",\n",
|
||||
" \"type\": \"esriFieldTypeGeometry\",\n",
|
||||
" \"alias\": \"Shape\"\n",
|
||||
" },\n",
|
||||
" \"indexes\": null,\n",
|
||||
" \"subtypes\": [],\n",
|
||||
" \"relationships\": [],\n",
|
||||
" \"canModifyLayer\": true,\n",
|
||||
" \"canScaleSymbols\": false,\n",
|
||||
" \"hasLabels\": false,\n",
|
||||
" \"capabilities\": \"Map,Query,Data\",\n",
|
||||
" \"maxRecordCount\": 1000,\n",
|
||||
" \"supportsStatistics\": true,\n",
|
||||
" \"supportsAdvancedQueries\": true,\n",
|
||||
" \"supportedQueryFormats\": \"JSON, geoJSON\",\n",
|
||||
" \"isDataVersioned\": false,\n",
|
||||
" \"ownershipBasedAccessControlForFeatures\": {\n",
|
||||
" \"allowOthersToQuery\": true\n",
|
||||
" },\n",
|
||||
" \"useStandardizedQueries\": true,\n",
|
||||
" \"advancedQueryCapabilities\": {\n",
|
||||
" \"useStandardizedQueries\": true,\n",
|
||||
" \"supportsStatistics\": true,\n",
|
||||
" \"supportsHavingClause\": true,\n",
|
||||
" \"supportsCountDistinct\": true,\n",
|
||||
" \"supportsOrderBy\": true,\n",
|
||||
" \"supportsDistinct\": true,\n",
|
||||
" \"supportsPagination\": true,\n",
|
||||
" \"supportsTrueCurve\": true,\n",
|
||||
" \"supportsReturningQueryExtent\": true,\n",
|
||||
" \"supportsQueryWithDistance\": true,\n",
|
||||
" \"supportsSqlExpression\": true\n",
|
||||
" },\n",
|
||||
" \"supportsDatumTransformation\": true,\n",
|
||||
" \"dateFieldsTimeReference\": null,\n",
|
||||
" \"supportsCoordinatesQuantization\": true\n",
|
||||
"})"
|
||||
]
|
||||
},
|
||||
"execution_count": 4,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"docs[0].metadata['layer_properties'].keys()"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 5,
|
||||
"id": "1d132b7d-5a13-4d66-98e8-785ffdf87af0",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
@@ -269,29 +253,29 @@
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"{\"OBJECTID\": 2, \"AccessName\": \"27TH AV\", \"AccessID\": \"NS-141\", \"AccessType\": \"OPEN VEHICLE RAMP\", \"GeneralLoc\": \"3600 BLK S ATLANTIC AV\", \"MilePost\": 4.83, \"City\": \"NEW SMYRNA BEACH\", \"AccessStatus\": \"OPEN\", \"Entry_Date_Time\": 1691397348000, \"DrivingZone\": \"BOTH\"}\n",
|
||||
"{\"OBJECTID\": 7, \"AccessName\": \"BEACHWAY AV\", \"AccessID\": \"NS-106\", \"AccessType\": \"OPEN VEHICLE RAMP\", \"GeneralLoc\": \"1400 N ATLANTIC AV\", \"MilePost\": 1.57, \"City\": \"NEW SMYRNA BEACH\", \"AccessStatus\": \"OPEN\", \"Entry_Date_Time\": 1691397348000, \"DrivingZone\": \"YES\"}\n",
|
||||
"{\"OBJECTID\": 10, \"AccessName\": \"SEABREEZE BLVD\", \"AccessID\": \"DB-051\", \"AccessType\": \"OPEN VEHICLE RAMP\", \"GeneralLoc\": \"500 BLK N ATLANTIC AV\", \"MilePost\": 14.24, \"City\": \"DAYTONA BEACH\", \"AccessStatus\": \"OPEN\", \"Entry_Date_Time\": 1691394892000, \"DrivingZone\": \"BOTH\"}\n",
|
||||
"{\"OBJECTID\": 13, \"AccessName\": \"GRANADA BLVD\", \"AccessID\": \"OB-030\", \"AccessType\": \"OPEN VEHICLE RAMP\", \"GeneralLoc\": \"20 BLK OCEAN SHORE BLVD\", \"MilePost\": 10.02, \"City\": \"ORMOND BEACH\", \"AccessStatus\": \"4X4 ONLY\", \"Entry_Date_Time\": 1691394952000, \"DrivingZone\": \"BOTH\"}\n",
|
||||
"{\"OBJECTID\": 16, \"AccessName\": \"INTERNATIONAL SPEEDWAY BLVD\", \"AccessID\": \"DB-059\", \"AccessType\": \"OPEN VEHICLE RAMP\", \"GeneralLoc\": \"300 BLK S ATLANTIC AV\", \"MilePost\": 15.27, \"City\": \"DAYTONA BEACH\", \"AccessStatus\": \"OPEN\", \"Entry_Date_Time\": 1691395174000, \"DrivingZone\": \"BOTH\"}\n",
|
||||
"{\"OBJECTID\": 26, \"AccessName\": \"UNIVERSITY BLVD\", \"AccessID\": \"DB-048\", \"AccessType\": \"OPEN VEHICLE RAMP\", \"GeneralLoc\": \"900 BLK N ATLANTIC AV\", \"MilePost\": 13.74, \"City\": \"DAYTONA BEACH\", \"AccessStatus\": \"OPEN\", \"Entry_Date_Time\": 1691394892000, \"DrivingZone\": \"BOTH\"}\n",
|
||||
"{\"OBJECTID\": 36, \"AccessName\": \"BEACH ST\", \"AccessID\": \"PI-097\", \"AccessType\": \"OPEN VEHICLE RAMP\", \"GeneralLoc\": \"4890 BLK S ATLANTIC AV\", \"MilePost\": 25.85, \"City\": \"PONCE INLET\", \"AccessStatus\": \"OPEN\", \"Entry_Date_Time\": 1691397348000, \"DrivingZone\": \"BOTH\"}\n",
|
||||
"{\"OBJECTID\": 40, \"AccessName\": \"BOTEFUHR AV\", \"AccessID\": \"DBS-067\", \"AccessType\": \"OPEN VEHICLE RAMP\", \"GeneralLoc\": \"1900 BLK S ATLANTIC AV\", \"MilePost\": 16.68, \"City\": \"DAYTONA BEACH SHORES\", \"AccessStatus\": \"4X4 ONLY\", \"Entry_Date_Time\": 1691395124000, \"DrivingZone\": \"YES\"}\n",
|
||||
"{\"OBJECTID\": 41, \"AccessName\": \"SILVER BEACH AV\", \"AccessID\": \"DB-064\", \"AccessType\": \"OPEN VEHICLE RAMP\", \"GeneralLoc\": \"1000 BLK S ATLANTIC AV\", \"MilePost\": 15.98, \"City\": \"DAYTONA BEACH\", \"AccessStatus\": \"OPEN\", \"Entry_Date_Time\": 1691395174000, \"DrivingZone\": \"YES\"}\n",
|
||||
"{\"OBJECTID\": 50, \"AccessName\": \"3RD AV\", \"AccessID\": \"NS-118\", \"AccessType\": \"OPEN VEHICLE RAMP\", \"GeneralLoc\": \"1200 BLK HILL ST\", \"MilePost\": 3.25, \"City\": \"NEW SMYRNA BEACH\", \"AccessStatus\": \"OPEN\", \"Entry_Date_Time\": 1691397348000, \"DrivingZone\": \"YES\"}\n",
|
||||
"{\"OBJECTID\": 58, \"AccessName\": \"DUNLAWTON BLVD\", \"AccessID\": \"DBS-078\", \"AccessType\": \"OPEN VEHICLE RAMP\", \"GeneralLoc\": \"3400 BLK S ATLANTIC AV\", \"MilePost\": 20.61, \"City\": \"DAYTONA BEACH SHORES\", \"AccessStatus\": \"OPEN\", \"Entry_Date_Time\": 1691397348000, \"DrivingZone\": \"YES\"}\n",
|
||||
"{\"OBJECTID\": 63, \"AccessName\": \"MILSAP RD\", \"AccessID\": \"OB-037\", \"AccessType\": \"OPEN VEHICLE RAMP\", \"GeneralLoc\": \"700 BLK S ATLANTIC AV\", \"MilePost\": 11.52, \"City\": \"ORMOND BEACH\", \"AccessStatus\": \"4X4 ONLY\", \"Entry_Date_Time\": 1691394952000, \"DrivingZone\": \"YES\"}\n",
|
||||
"{\"OBJECTID\": 68, \"AccessName\": \"EMILIA AV\", \"AccessID\": \"DBS-082\", \"AccessType\": \"OPEN VEHICLE RAMP\", \"GeneralLoc\": \"3790 BLK S ATLANTIC AV\", \"MilePost\": 21.38, \"City\": \"DAYTONA BEACH SHORES\", \"AccessStatus\": \"OPEN\", \"Entry_Date_Time\": 1691397348000, \"DrivingZone\": \"BOTH\"}\n",
|
||||
"{\"OBJECTID\": 92, \"AccessName\": \"FLAGLER AV\", \"AccessID\": \"NS-110\", \"AccessType\": \"OPEN VEHICLE RAMP\", \"GeneralLoc\": \"500 BLK FLAGLER AV\", \"MilePost\": 2.57, \"City\": \"NEW SMYRNA BEACH\", \"AccessStatus\": \"OPEN\", \"Entry_Date_Time\": 1691397348000, \"DrivingZone\": \"YES\"}\n",
|
||||
"{\"OBJECTID\": 94, \"AccessName\": \"CRAWFORD RD\", \"AccessID\": \"NS-108\", \"AccessType\": \"OPEN VEHICLE RAMP - PASS\", \"GeneralLoc\": \"800 BLK N ATLANTIC AV\", \"MilePost\": 2.19, \"City\": \"NEW SMYRNA BEACH\", \"AccessStatus\": \"OPEN\", \"Entry_Date_Time\": 1691397348000, \"DrivingZone\": \"YES\"}\n",
|
||||
"{\"OBJECTID\": 122, \"AccessName\": \"HARTFORD AV\", \"AccessID\": \"DB-043\", \"AccessType\": \"OPEN VEHICLE RAMP\", \"GeneralLoc\": \"1890 BLK N ATLANTIC AV\", \"MilePost\": 12.76, \"City\": \"DAYTONA BEACH\", \"AccessStatus\": \"CLOSED - SEASONAL\", \"Entry_Date_Time\": 1691394832000, \"DrivingZone\": \"YES\"}\n",
|
||||
"{\"OBJECTID\": 125, \"AccessName\": \"WILLIAMS AV\", \"AccessID\": \"DB-042\", \"AccessType\": \"OPEN VEHICLE RAMP\", \"GeneralLoc\": \"2200 BLK N ATLANTIC AV\", \"MilePost\": 12.5, \"City\": \"DAYTONA BEACH\", \"AccessStatus\": \"4X4 ONLY\", \"Entry_Date_Time\": 1691394952000, \"DrivingZone\": \"YES\"}\n",
|
||||
"{\"OBJECTID\": 134, \"AccessName\": \"CARDINAL DR\", \"AccessID\": \"OB-036\", \"AccessType\": \"OPEN VEHICLE RAMP\", \"GeneralLoc\": \"600 BLK S ATLANTIC AV\", \"MilePost\": 11.27, \"City\": \"ORMOND BEACH\", \"AccessStatus\": \"4X4 ONLY\", \"Entry_Date_Time\": 1691394952000, \"DrivingZone\": \"YES\"}\n",
|
||||
"{\"OBJECTID\": 229, \"AccessName\": \"EL PORTAL ST\", \"AccessID\": \"DBS-076\", \"AccessType\": \"OPEN VEHICLE RAMP\", \"GeneralLoc\": \"3200 BLK S ATLANTIC AV\", \"MilePost\": 20.04, \"City\": \"DAYTONA BEACH SHORES\", \"AccessStatus\": \"OPEN\", \"Entry_Date_Time\": 1691397348000, \"DrivingZone\": \"YES\"}\n",
|
||||
"{\"OBJECTID\": 230, \"AccessName\": \"HARVARD DR\", \"AccessID\": \"OB-038\", \"AccessType\": \"OPEN VEHICLE RAMP\", \"GeneralLoc\": \"900 BLK S ATLANTIC AV\", \"MilePost\": 11.72, \"City\": \"ORMOND BEACH\", \"AccessStatus\": \"4X4 ONLY\", \"Entry_Date_Time\": 1691394952000, \"DrivingZone\": \"YES\"}\n",
|
||||
"{\"OBJECTID\": 232, \"AccessName\": \"VAN AV\", \"AccessID\": \"DBS-075\", \"AccessType\": \"OPEN VEHICLE RAMP\", \"GeneralLoc\": \"3100 BLK S ATLANTIC AV\", \"MilePost\": 19.6, \"City\": \"DAYTONA BEACH SHORES\", \"AccessStatus\": \"OPEN\", \"Entry_Date_Time\": 1691397348000, \"DrivingZone\": \"YES\"}\n",
|
||||
"{\"OBJECTID\": 233, \"AccessName\": \"ROCKEFELLER DR\", \"AccessID\": \"OB-034\", \"AccessType\": \"OPEN VEHICLE RAMP\", \"GeneralLoc\": \"400 BLK S ATLANTIC AV\", \"MilePost\": 10.9, \"City\": \"ORMOND BEACH\", \"AccessStatus\": \"CLOSED - SEASONAL\", \"Entry_Date_Time\": 1691394832000, \"DrivingZone\": \"YES\"}\n",
|
||||
"{\"OBJECTID\": 235, \"AccessName\": \"MINERVA RD\", \"AccessID\": \"DBS-069\", \"AccessType\": \"OPEN VEHICLE RAMP\", \"GeneralLoc\": \"2300 BLK S ATLANTIC AV\", \"MilePost\": 17.52, \"City\": \"DAYTONA BEACH SHORES\", \"AccessStatus\": \"4X4 ONLY\", \"Entry_Date_Time\": 1691395124000, \"DrivingZone\": \"YES\"}\n"
|
||||
"{\"OBJECTID\": 4, \"AccessName\": \"BEACHWAY AV\", \"AccessID\": \"NS-106\", \"AccessType\": \"OPEN VEHICLE RAMP\", \"GeneralLoc\": \"1400 N ATLANTIC AV\", \"MilePost\": 1.57, \"City\": \"NEW SMYRNA BEACH\", \"AccessStatus\": \"CLOSED\", \"Entry_Date_Time\": 1692039947000, \"DrivingZone\": \"YES\"}\n",
|
||||
"{\"OBJECTID\": 5, \"AccessName\": \"SEABREEZE BLVD\", \"AccessID\": \"DB-051\", \"AccessType\": \"OPEN VEHICLE RAMP\", \"GeneralLoc\": \"500 BLK N ATLANTIC AV\", \"MilePost\": 14.24, \"City\": \"DAYTONA BEACH\", \"AccessStatus\": \"CLOSED\", \"Entry_Date_Time\": 1692039947000, \"DrivingZone\": \"BOTH\"}\n",
|
||||
"{\"OBJECTID\": 6, \"AccessName\": \"27TH AV\", \"AccessID\": \"NS-141\", \"AccessType\": \"OPEN VEHICLE RAMP\", \"GeneralLoc\": \"3600 BLK S ATLANTIC AV\", \"MilePost\": 4.83, \"City\": \"NEW SMYRNA BEACH\", \"AccessStatus\": \"CLOSED\", \"Entry_Date_Time\": 1692039947000, \"DrivingZone\": \"BOTH\"}\n",
|
||||
"{\"OBJECTID\": 11, \"AccessName\": \"INTERNATIONAL SPEEDWAY BLVD\", \"AccessID\": \"DB-059\", \"AccessType\": \"OPEN VEHICLE RAMP\", \"GeneralLoc\": \"300 BLK S ATLANTIC AV\", \"MilePost\": 15.27, \"City\": \"DAYTONA BEACH\", \"AccessStatus\": \"CLOSED\", \"Entry_Date_Time\": 1692039947000, \"DrivingZone\": \"BOTH\"}\n",
|
||||
"{\"OBJECTID\": 14, \"AccessName\": \"GRANADA BLVD\", \"AccessID\": \"OB-030\", \"AccessType\": \"OPEN VEHICLE RAMP\", \"GeneralLoc\": \"20 BLK OCEAN SHORE BLVD\", \"MilePost\": 10.02, \"City\": \"ORMOND BEACH\", \"AccessStatus\": \"CLOSED\", \"Entry_Date_Time\": 1692039947000, \"DrivingZone\": \"BOTH\"}\n",
|
||||
"{\"OBJECTID\": 27, \"AccessName\": \"UNIVERSITY BLVD\", \"AccessID\": \"DB-048\", \"AccessType\": \"OPEN VEHICLE RAMP\", \"GeneralLoc\": \"900 BLK N ATLANTIC AV\", \"MilePost\": 13.74, \"City\": \"DAYTONA BEACH\", \"AccessStatus\": \"CLOSED\", \"Entry_Date_Time\": 1692039947000, \"DrivingZone\": \"BOTH\"}\n",
|
||||
"{\"OBJECTID\": 38, \"AccessName\": \"BEACH ST\", \"AccessID\": \"PI-097\", \"AccessType\": \"OPEN VEHICLE RAMP\", \"GeneralLoc\": \"4890 BLK S ATLANTIC AV\", \"MilePost\": 25.85, \"City\": \"PONCE INLET\", \"AccessStatus\": \"CLOSED\", \"Entry_Date_Time\": 1692039947000, \"DrivingZone\": \"BOTH\"}\n",
|
||||
"{\"OBJECTID\": 42, \"AccessName\": \"BOTEFUHR AV\", \"AccessID\": \"DBS-067\", \"AccessType\": \"OPEN VEHICLE RAMP\", \"GeneralLoc\": \"1900 BLK S ATLANTIC AV\", \"MilePost\": 16.68, \"City\": \"DAYTONA BEACH SHORES\", \"AccessStatus\": \"CLOSED\", \"Entry_Date_Time\": 1692039947000, \"DrivingZone\": \"YES\"}\n",
|
||||
"{\"OBJECTID\": 43, \"AccessName\": \"SILVER BEACH AV\", \"AccessID\": \"DB-064\", \"AccessType\": \"OPEN VEHICLE RAMP\", \"GeneralLoc\": \"1000 BLK S ATLANTIC AV\", \"MilePost\": 15.98, \"City\": \"DAYTONA BEACH\", \"AccessStatus\": \"CLOSED\", \"Entry_Date_Time\": 1692039947000, \"DrivingZone\": \"YES\"}\n",
|
||||
"{\"OBJECTID\": 45, \"AccessName\": \"MILSAP RD\", \"AccessID\": \"OB-037\", \"AccessType\": \"OPEN VEHICLE RAMP\", \"GeneralLoc\": \"700 BLK S ATLANTIC AV\", \"MilePost\": 11.52, \"City\": \"ORMOND BEACH\", \"AccessStatus\": \"CLOSED\", \"Entry_Date_Time\": 1692039947000, \"DrivingZone\": \"YES\"}\n",
|
||||
"{\"OBJECTID\": 56, \"AccessName\": \"3RD AV\", \"AccessID\": \"NS-118\", \"AccessType\": \"OPEN VEHICLE RAMP\", \"GeneralLoc\": \"1200 BLK HILL ST\", \"MilePost\": 3.25, \"City\": \"NEW SMYRNA BEACH\", \"AccessStatus\": \"CLOSED\", \"Entry_Date_Time\": 1692039947000, \"DrivingZone\": \"YES\"}\n",
|
||||
"{\"OBJECTID\": 64, \"AccessName\": \"DUNLAWTON BLVD\", \"AccessID\": \"DBS-078\", \"AccessType\": \"OPEN VEHICLE RAMP\", \"GeneralLoc\": \"3400 BLK S ATLANTIC AV\", \"MilePost\": 20.61, \"City\": \"DAYTONA BEACH SHORES\", \"AccessStatus\": \"CLOSED\", \"Entry_Date_Time\": 1692039947000, \"DrivingZone\": \"YES\"}\n",
|
||||
"{\"OBJECTID\": 69, \"AccessName\": \"EMILIA AV\", \"AccessID\": \"DBS-082\", \"AccessType\": \"OPEN VEHICLE RAMP\", \"GeneralLoc\": \"3790 BLK S ATLANTIC AV\", \"MilePost\": 21.38, \"City\": \"DAYTONA BEACH SHORES\", \"AccessStatus\": \"CLOSED\", \"Entry_Date_Time\": 1692039947000, \"DrivingZone\": \"BOTH\"}\n",
|
||||
"{\"OBJECTID\": 94, \"AccessName\": \"FLAGLER AV\", \"AccessID\": \"NS-110\", \"AccessType\": \"OPEN VEHICLE RAMP\", \"GeneralLoc\": \"500 BLK FLAGLER AV\", \"MilePost\": 2.57, \"City\": \"NEW SMYRNA BEACH\", \"AccessStatus\": \"CLOSED\", \"Entry_Date_Time\": 1692039947000, \"DrivingZone\": \"YES\"}\n",
|
||||
"{\"OBJECTID\": 96, \"AccessName\": \"CRAWFORD RD\", \"AccessID\": \"NS-108\", \"AccessType\": \"OPEN VEHICLE RAMP - PASS\", \"GeneralLoc\": \"800 BLK N ATLANTIC AV\", \"MilePost\": 2.19, \"City\": \"NEW SMYRNA BEACH\", \"AccessStatus\": \"CLOSED\", \"Entry_Date_Time\": 1692039947000, \"DrivingZone\": \"YES\"}\n",
|
||||
"{\"OBJECTID\": 124, \"AccessName\": \"HARTFORD AV\", \"AccessID\": \"DB-043\", \"AccessType\": \"OPEN VEHICLE RAMP\", \"GeneralLoc\": \"1890 BLK N ATLANTIC AV\", \"MilePost\": 12.76, \"City\": \"DAYTONA BEACH\", \"AccessStatus\": \"CLOSED\", \"Entry_Date_Time\": 1692039947000, \"DrivingZone\": \"YES\"}\n",
|
||||
"{\"OBJECTID\": 127, \"AccessName\": \"WILLIAMS AV\", \"AccessID\": \"DB-042\", \"AccessType\": \"OPEN VEHICLE RAMP\", \"GeneralLoc\": \"2200 BLK N ATLANTIC AV\", \"MilePost\": 12.5, \"City\": \"DAYTONA BEACH\", \"AccessStatus\": \"CLOSED\", \"Entry_Date_Time\": 1692039947000, \"DrivingZone\": \"YES\"}\n",
|
||||
"{\"OBJECTID\": 136, \"AccessName\": \"CARDINAL DR\", \"AccessID\": \"OB-036\", \"AccessType\": \"OPEN VEHICLE RAMP\", \"GeneralLoc\": \"600 BLK S ATLANTIC AV\", \"MilePost\": 11.27, \"City\": \"ORMOND BEACH\", \"AccessStatus\": \"CLOSED\", \"Entry_Date_Time\": 1692039947000, \"DrivingZone\": \"YES\"}\n",
|
||||
"{\"OBJECTID\": 229, \"AccessName\": \"EL PORTAL ST\", \"AccessID\": \"DBS-076\", \"AccessType\": \"OPEN VEHICLE RAMP\", \"GeneralLoc\": \"3200 BLK S ATLANTIC AV\", \"MilePost\": 20.04, \"City\": \"DAYTONA BEACH SHORES\", \"AccessStatus\": \"CLOSED\", \"Entry_Date_Time\": 1692039947000, \"DrivingZone\": \"YES\"}\n",
|
||||
"{\"OBJECTID\": 230, \"AccessName\": \"HARVARD DR\", \"AccessID\": \"OB-038\", \"AccessType\": \"OPEN VEHICLE RAMP\", \"GeneralLoc\": \"900 BLK S ATLANTIC AV\", \"MilePost\": 11.72, \"City\": \"ORMOND BEACH\", \"AccessStatus\": \"CLOSED\", \"Entry_Date_Time\": 1692039947000, \"DrivingZone\": \"YES\"}\n",
|
||||
"{\"OBJECTID\": 232, \"AccessName\": \"VAN AV\", \"AccessID\": \"DBS-075\", \"AccessType\": \"OPEN VEHICLE RAMP\", \"GeneralLoc\": \"3100 BLK S ATLANTIC AV\", \"MilePost\": 19.6, \"City\": \"DAYTONA BEACH SHORES\", \"AccessStatus\": \"CLOSED\", \"Entry_Date_Time\": 1692039947000, \"DrivingZone\": \"YES\"}\n",
|
||||
"{\"OBJECTID\": 234, \"AccessName\": \"ROCKEFELLER DR\", \"AccessID\": \"OB-034\", \"AccessType\": \"OPEN VEHICLE RAMP\", \"GeneralLoc\": \"400 BLK S ATLANTIC AV\", \"MilePost\": 10.9, \"City\": \"ORMOND BEACH\", \"AccessStatus\": \"CLOSED\", \"Entry_Date_Time\": 1692039947000, \"DrivingZone\": \"YES\"}\n",
|
||||
"{\"OBJECTID\": 235, \"AccessName\": \"MINERVA RD\", \"AccessID\": \"DBS-069\", \"AccessType\": \"OPEN VEHICLE RAMP\", \"GeneralLoc\": \"2300 BLK S ATLANTIC AV\", \"MilePost\": 17.52, \"City\": \"DAYTONA BEACH SHORES\", \"AccessStatus\": \"CLOSED\", \"Entry_Date_Time\": 1692039947000, \"DrivingZone\": \"YES\"}\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
|
||||
221
docs/extras/integrations/document_loaders/assemblyai.ipynb
Normal file
@@ -0,0 +1,221 @@
|
||||
{
|
||||
"cells": [
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# AssemblyAI Audio Transcripts\n",
|
||||
"\n",
|
||||
"The `AssemblyAIAudioTranscriptLoader` allows to transcribe audio files with the [AssemblyAI API](https://www.assemblyai.com) and loads the transcribed text into documents.\n",
|
||||
"\n",
|
||||
"To use it, you should have the `assemblyai` python package installed, and the\n",
|
||||
"environment variable `ASSEMBLYAI_API_KEY` set with your API key. Alternatively, the API key can also be passed as an argument.\n",
|
||||
"\n",
|
||||
"More info about AssemblyAI:\n",
|
||||
"\n",
|
||||
"- [Website](https://www.assemblyai.com/)\n",
|
||||
"- [Get a Free API key](https://www.assemblyai.com/dashboard/signup)\n",
|
||||
"- [AssemblyAI API Docs](https://www.assemblyai.com/docs)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Installation\n",
|
||||
"\n",
|
||||
"First, you need to install the `assemblyai` python package.\n",
|
||||
"\n",
|
||||
"You can find more info about it inside the [assemblyai-python-sdk GitHub repo](https://github.com/AssemblyAI/assemblyai-python-sdk)."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"#!pip install assemblyai"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Example\n",
|
||||
"\n",
|
||||
"The `AssemblyAIAudioTranscriptLoader` needs at least the `file_path` argument. Audio files can be specified as an URL or a local file path."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langchain.document_loaders import AssemblyAIAudioTranscriptLoader\n",
|
||||
"\n",
|
||||
"audio_file = \"https://storage.googleapis.com/aai-docs-samples/nbc.mp3\"\n",
|
||||
"# or a local file path: audio_file = \"./nbc.mp3\"\n",
|
||||
"\n",
|
||||
"loader = AssemblyAIAudioTranscriptLoader(file_path=audio_file)\n",
|
||||
"\n",
|
||||
"docs = loader.load()"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Note: Calling `loader.load()` blocks until the transcription is finished."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"The transcribed text is available in the `page_content`:"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"docs[0].page_content"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"```\n",
|
||||
"\"Load time, a new president and new congressional makeup. Same old ...\"\n",
|
||||
"```"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"The `metadata` contains the full JSON response with more meta information:"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"docs[0].metadata"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"```\n",
|
||||
"{'language_code': <LanguageCode.en_us: 'en_us'>,\n",
|
||||
" 'audio_url': 'https://storage.googleapis.com/aai-docs-samples/nbc.mp3',\n",
|
||||
" 'punctuate': True,\n",
|
||||
" 'format_text': True,\n",
|
||||
" ...\n",
|
||||
"}\n",
|
||||
"```"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Transcript Formats\n",
|
||||
"\n",
|
||||
"You can specify the `transcript_format` argument for different formats.\n",
|
||||
"\n",
|
||||
"Depending on the format, one or more documents are returned. These are the different `TranscriptFormat` options:\n",
|
||||
"\n",
|
||||
"- `TEXT`: One document with the transcription text\n",
|
||||
"- `SENTENCES`: Multiple documents, splits the transcription by each sentence\n",
|
||||
"- `PARAGRAPHS`: Multiple documents, splits the transcription by each paragraph\n",
|
||||
"- `SUBTITLES_SRT`: One document with the transcript exported in SRT subtitles format\n",
|
||||
"- `SUBTITLES_VTT`: One document with the transcript exported in VTT subtitles format"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langchain.document_loaders.assemblyai import TranscriptFormat\n",
|
||||
"\n",
|
||||
"loader = AssemblyAIAudioTranscriptLoader(\n",
|
||||
" file_path=\"./your_file.mp3\",\n",
|
||||
" transcript_format=TranscriptFormat.SENTENCES,\n",
|
||||
")\n",
|
||||
"\n",
|
||||
"docs = loader.load()"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Transcription Config\n",
|
||||
"\n",
|
||||
"You can also specify the `config` argument to use different audio intelligence models.\n",
|
||||
"\n",
|
||||
"Visit the [AssemblyAI API Documentation](https://www.assemblyai.com/docs) to get an overview of all available models!"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"import assemblyai as aai\n",
|
||||
"\n",
|
||||
"config = aai.TranscriptionConfig(speaker_labels=True,\n",
|
||||
" auto_chapters=True,\n",
|
||||
" entity_detection=True\n",
|
||||
")\n",
|
||||
"\n",
|
||||
"loader = AssemblyAIAudioTranscriptLoader(\n",
|
||||
" file_path=\"./your_file.mp3\",\n",
|
||||
" config=config\n",
|
||||
")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Pass the API Key as argument\n",
|
||||
"\n",
|
||||
"Next to setting the API key as environment variable `ASSEMBLYAI_API_KEY`, it is also possible to pass it as argument."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"loader = AssemblyAIAudioTranscriptLoader(\n",
|
||||
" file_path=\"./your_file.mp3\",\n",
|
||||
" api_key=\"YOUR_KEY\"\n",
|
||||
")"
|
||||
]
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
"language_info": {
|
||||
"name": "python"
|
||||
},
|
||||
"orig_nbformat": 4
|
||||
},
|
||||
"nbformat": 4,
|
||||
"nbformat_minor": 2
|
||||
}
|
||||
@@ -15,14 +15,14 @@
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"execution_count": 1,
|
||||
"metadata": {
|
||||
"tags": []
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# You need the lxml package to use the DocugamiLoader\n",
|
||||
"!pip install lxml"
|
||||
"# You need the lxml package to use the DocugamiLoader (run pip install directly without \"poetry run\" if you are not using poetry)\n",
|
||||
"!poetry run pip install lxml --quiet"
|
||||
]
|
||||
},
|
||||
{
|
||||
@@ -75,25 +75,25 @@
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"[Document(page_content='MUTUAL NON-DISCLOSURE AGREEMENT This Mutual Non-Disclosure Agreement (this “ Agreement ”) is entered into and made effective as of April 4 , 2018 between Docugami Inc. , a Delaware corporation , whose address is 150 Lake Street South , Suite 221 , Kirkland , Washington 98033 , and Caleb Divine , an individual, whose address is 1201 Rt 300 , Newburgh NY 12550 .', metadata={'xpath': '/docset:MutualNon-disclosure/docset:MutualNon-disclosure/docset:MUTUALNON-DISCLOSUREAGREEMENT-section/docset:MUTUALNON-DISCLOSUREAGREEMENT/docset:ThisMutualNon-disclosureAgreement', 'id': '43rj0ds7s0ur', 'name': 'NDA simple layout.docx', 'structure': 'p', 'tag': 'ThisMutualNon-disclosureAgreement'}),\n",
|
||||
" Document(page_content='The above named parties desire to engage in discussions regarding a potential agreement or other transaction between the parties (the “Purpose”). In connection with such discussions, it may be necessary for the parties to disclose to each other certain confidential information or materials to enable them to evaluate whether to enter into such agreement or transaction.', metadata={'xpath': '/docset:MutualNon-disclosure/docset:MutualNon-disclosure/docset:MUTUALNON-DISCLOSUREAGREEMENT-section/docset:MUTUALNON-DISCLOSUREAGREEMENT/docset:Discussions', 'id': '43rj0ds7s0ur', 'name': 'NDA simple layout.docx', 'structure': 'p', 'tag': 'Discussions'}),\n",
|
||||
" Document(page_content='In consideration of the foregoing, the parties agree as follows:', metadata={'xpath': '/docset:MutualNon-disclosure/docset:MutualNon-disclosure/docset:MUTUALNON-DISCLOSUREAGREEMENT-section/docset:MUTUALNON-DISCLOSUREAGREEMENT/docset:Consideration/docset:Consideration', 'id': '43rj0ds7s0ur', 'name': 'NDA simple layout.docx', 'structure': 'p', 'tag': 'Consideration'}),\n",
|
||||
" Document(page_content='1. Confidential Information . For purposes of this Agreement , “ Confidential Information ” means any information or materials disclosed by one party to the other party that: (i) if disclosed in writing or in the form of tangible materials, is marked “confidential” or “proprietary” at the time of such disclosure; (ii) if disclosed orally or by visual presentation, is identified as “confidential” or “proprietary” at the time of such disclosure, and is summarized in a writing sent by the disclosing party to the receiving party within thirty ( 30 ) days after any such disclosure; or (iii) due to its nature or the circumstances of its disclosure, a person exercising reasonable business judgment would understand to be confidential or proprietary.', metadata={'xpath': '/docset:MutualNon-disclosure/docset:MutualNon-disclosure/docset:MUTUALNON-DISCLOSUREAGREEMENT-section/docset:MUTUALNON-DISCLOSUREAGREEMENT/docset:Consideration/docset:Purposes/docset:Purposes/docset:ConfidentialInformation-section/docset:ConfidentialInformation[2]', 'id': '43rj0ds7s0ur', 'name': 'NDA simple layout.docx', 'structure': 'div', 'tag': 'ConfidentialInformation'}),\n",
|
||||
" Document(page_content=\"2. Obligations and Restrictions . Each party agrees: (i) to maintain the other party's Confidential Information in strict confidence; (ii) not to disclose such Confidential Information to any third party; and (iii) not to use such Confidential Information for any purpose except for the Purpose. Each party may disclose the other party’s Confidential Information to its employees and consultants who have a bona fide need to know such Confidential Information for the Purpose, but solely to the extent necessary to pursue the Purpose and for no other purpose; provided, that each such employee and consultant first executes a written agreement (or is otherwise already bound by a written agreement) that contains use and nondisclosure restrictions at least as protective of the other party’s Confidential Information as those set forth in this Agreement .\", metadata={'xpath': '/docset:MutualNon-disclosure/docset:MutualNon-disclosure/docset:MUTUALNON-DISCLOSUREAGREEMENT-section/docset:MUTUALNON-DISCLOSUREAGREEMENT/docset:Consideration/docset:Purposes/docset:Obligations/docset:ObligationsAndRestrictions-section/docset:ObligationsAndRestrictions', 'id': '43rj0ds7s0ur', 'name': 'NDA simple layout.docx', 'structure': 'div', 'tag': 'ObligationsAndRestrictions'}),\n",
|
||||
" Document(page_content='3. Exceptions. The obligations and restrictions in Section 2 will not apply to any information or materials that:', metadata={'xpath': '/docset:MutualNon-disclosure/docset:MutualNon-disclosure/docset:MUTUALNON-DISCLOSUREAGREEMENT-section/docset:MUTUALNON-DISCLOSUREAGREEMENT/docset:Consideration/docset:Purposes/docset:Exceptions/docset:Exceptions-section/docset:Exceptions[2]', 'id': '43rj0ds7s0ur', 'name': 'NDA simple layout.docx', 'structure': 'div', 'tag': 'Exceptions'}),\n",
|
||||
" Document(page_content='(i) were, at the date of disclosure, or have subsequently become, generally known or available to the public through no act or failure to act by the receiving party;', metadata={'xpath': '/docset:MutualNon-disclosure/docset:MutualNon-disclosure/docset:MUTUALNON-DISCLOSUREAGREEMENT-section/docset:MUTUALNON-DISCLOSUREAGREEMENT/docset:Consideration/docset:Purposes/docset:TheDate/docset:TheDate/docset:TheDate', 'id': '43rj0ds7s0ur', 'name': 'NDA simple layout.docx', 'structure': 'p', 'tag': 'TheDate'}),\n",
|
||||
" Document(page_content='(ii) were rightfully known by the receiving party prior to receiving such information or materials from the disclosing party;', metadata={'xpath': '/docset:MutualNon-disclosure/docset:MutualNon-disclosure/docset:MUTUALNON-DISCLOSUREAGREEMENT-section/docset:MUTUALNON-DISCLOSUREAGREEMENT/docset:Consideration/docset:Purposes/docset:TheDate/docset:SuchInformation/docset:TheReceivingParty', 'id': '43rj0ds7s0ur', 'name': 'NDA simple layout.docx', 'structure': 'p', 'tag': 'TheReceivingParty'}),\n",
|
||||
" Document(page_content='(iii) are rightfully acquired by the receiving party from a third party who has the right to disclose such information or materials without breach of any confidentiality obligation to the disclosing party;', metadata={'xpath': '/docset:MutualNon-disclosure/docset:MutualNon-disclosure/docset:MUTUALNON-DISCLOSUREAGREEMENT-section/docset:MUTUALNON-DISCLOSUREAGREEMENT/docset:Consideration/docset:Purposes/docset:TheDate/docset:TheReceivingParty/docset:TheReceivingParty', 'id': '43rj0ds7s0ur', 'name': 'NDA simple layout.docx', 'structure': 'p', 'tag': 'TheReceivingParty'}),\n",
|
||||
" Document(page_content='4. Compelled Disclosure . Nothing in this Agreement will be deemed to restrict a party from disclosing the other party’s Confidential Information to the extent required by any order, subpoena, law, statute or regulation; provided, that the party required to make such a disclosure uses reasonable efforts to give the other party reasonable advance notice of such required disclosure in order to enable the other party to prevent or limit such disclosure.', metadata={'xpath': '/docset:MutualNon-disclosure/docset:MutualNon-disclosure/docset:MUTUALNON-DISCLOSUREAGREEMENT-section/docset:MUTUALNON-DISCLOSUREAGREEMENT/docset:Consideration/docset:Purposes/docset:Disclosure/docset:CompelledDisclosure-section/docset:CompelledDisclosure', 'id': '43rj0ds7s0ur', 'name': 'NDA simple layout.docx', 'structure': 'div', 'tag': 'CompelledDisclosure'}),\n",
|
||||
" Document(page_content='5. Return of Confidential Information . Upon the completion or abandonment of the Purpose, and in any event upon the disclosing party’s request, the receiving party will promptly return to the disclosing party all tangible items and embodiments containing or consisting of the disclosing party’s Confidential Information and all copies thereof (including electronic copies), and any notes, analyses, compilations, studies, interpretations, memoranda or other documents (regardless of the form thereof) prepared by or on behalf of the receiving party that contain or are based upon the disclosing party’s Confidential Information .', metadata={'xpath': '/docset:MutualNon-disclosure/docset:MutualNon-disclosure/docset:MUTUALNON-DISCLOSUREAGREEMENT-section/docset:MUTUALNON-DISCLOSUREAGREEMENT/docset:Consideration/docset:Purposes/docset:TheCompletion/docset:ReturnofConfidentialInformation-section/docset:ReturnofConfidentialInformation', 'id': '43rj0ds7s0ur', 'name': 'NDA simple layout.docx', 'structure': 'div', 'tag': 'ReturnofConfidentialInformation'}),\n",
|
||||
" Document(page_content='6. No Obligations . Each party retains the right to determine whether to disclose any Confidential Information to the other party.', metadata={'xpath': '/docset:MutualNon-disclosure/docset:MutualNon-disclosure/docset:MUTUALNON-DISCLOSUREAGREEMENT-section/docset:MUTUALNON-DISCLOSUREAGREEMENT/docset:Consideration/docset:Purposes/docset:NoObligations/docset:NoObligations-section/docset:NoObligations[2]', 'id': '43rj0ds7s0ur', 'name': 'NDA simple layout.docx', 'structure': 'div', 'tag': 'NoObligations'}),\n",
|
||||
" Document(page_content='7. No Warranty. ALL CONFIDENTIAL INFORMATION IS PROVIDED BY THE DISCLOSING PARTY “AS IS ”.', metadata={'xpath': '/docset:MutualNon-disclosure/docset:MutualNon-disclosure/docset:MUTUALNON-DISCLOSUREAGREEMENT-section/docset:MUTUALNON-DISCLOSUREAGREEMENT/docset:Consideration/docset:Purposes/docset:NoWarranty/docset:NoWarranty-section/docset:NoWarranty[2]', 'id': '43rj0ds7s0ur', 'name': 'NDA simple layout.docx', 'structure': 'div', 'tag': 'NoWarranty'}),\n",
|
||||
" Document(page_content='8. Term. This Agreement will remain in effect for a period of seven ( 7 ) years from the date of last disclosure of Confidential Information by either party, at which time it will terminate.', metadata={'xpath': '/docset:MutualNon-disclosure/docset:MutualNon-disclosure/docset:MUTUALNON-DISCLOSUREAGREEMENT-section/docset:MUTUALNON-DISCLOSUREAGREEMENT/docset:Consideration/docset:Purposes/docset:ThisAgreement/docset:Term-section/docset:Term', 'id': '43rj0ds7s0ur', 'name': 'NDA simple layout.docx', 'structure': 'div', 'tag': 'Term'}),\n",
|
||||
" Document(page_content='9. Equitable Relief . Each party acknowledges that the unauthorized use or disclosure of the disclosing party’s Confidential Information may cause the disclosing party to incur irreparable harm and significant damages, the degree of which may be difficult to ascertain. Accordingly, each party agrees that the disclosing party will have the right to seek immediate equitable relief to enjoin any unauthorized use or disclosure of its Confidential Information , in addition to any other rights and remedies that it may have at law or otherwise.', metadata={'xpath': '/docset:MutualNon-disclosure/docset:MutualNon-disclosure/docset:MUTUALNON-DISCLOSUREAGREEMENT-section/docset:MUTUALNON-DISCLOSUREAGREEMENT/docset:Consideration/docset:Purposes/docset:EquitableRelief/docset:EquitableRelief-section/docset:EquitableRelief[2]', 'id': '43rj0ds7s0ur', 'name': 'NDA simple layout.docx', 'structure': 'div', 'tag': 'EquitableRelief'}),\n",
|
||||
" Document(page_content='10. Non-compete. To the maximum extent permitted by applicable law, during the Term of this Agreement and for a period of one ( 1 ) year thereafter, Caleb Divine may not market software products or do business that directly or indirectly competes with Docugami software products .', metadata={'xpath': '/docset:MutualNon-disclosure/docset:MutualNon-disclosure/docset:MUTUALNON-DISCLOSUREAGREEMENT-section/docset:MUTUALNON-DISCLOSUREAGREEMENT/docset:Consideration/docset:Purposes/docset:TheMaximumExtent/docset:Non-compete-section/docset:Non-compete', 'id': '43rj0ds7s0ur', 'name': 'NDA simple layout.docx', 'structure': 'div', 'tag': 'Non-compete'}),\n",
|
||||
" Document(page_content='11. Miscellaneous. This Agreement will be governed and construed in accordance with the laws of the State of Washington , excluding its body of law controlling conflict of laws. This Agreement is the complete and exclusive understanding and agreement between the parties regarding the subject matter of this Agreement and supersedes all prior agreements, understandings and communications, oral or written, between the parties regarding the subject matter of this Agreement . If any provision of this Agreement is held invalid or unenforceable by a court of competent jurisdiction, that provision of this Agreement will be enforced to the maximum extent permissible and the other provisions of this Agreement will remain in full force and effect. Neither party may assign this Agreement , in whole or in part, by operation of law or otherwise, without the other party’s prior written consent, and any attempted assignment without such consent will be void. This Agreement may be executed in counterparts, each of which will be deemed an original, but all of which together will constitute one and the same instrument.', metadata={'xpath': '/docset:MutualNon-disclosure/docset:MutualNon-disclosure/docset:MUTUALNON-DISCLOSUREAGREEMENT-section/docset:MUTUALNON-DISCLOSUREAGREEMENT/docset:Consideration/docset:Purposes/docset:Accordance/docset:Miscellaneous-section/docset:Miscellaneous', 'id': '43rj0ds7s0ur', 'name': 'NDA simple layout.docx', 'structure': 'div', 'tag': 'Miscellaneous'}),\n",
|
||||
" Document(page_content='[SIGNATURE PAGE FOLLOWS] IN WITNESS WHEREOF, the parties hereto have executed this Mutual Non-Disclosure Agreement by their duly authorized officers or representatives as of the date first set forth above.', metadata={'xpath': '/docset:MutualNon-disclosure/docset:Witness/docset:TheParties/docset:TheParties', 'id': '43rj0ds7s0ur', 'name': 'NDA simple layout.docx', 'structure': 'p', 'tag': 'TheParties'}),\n",
|
||||
" Document(page_content='DOCUGAMI INC . : \\n\\n Caleb Divine : \\n\\n Signature: Signature: Name: \\n\\n Jean Paoli Name: Title: \\n\\n CEO Title:', metadata={'xpath': '/docset:MutualNon-disclosure/docset:Witness/docset:TheParties/docset:DocugamiInc/docset:DocugamiInc/xhtml:table', 'id': '43rj0ds7s0ur', 'name': 'NDA simple layout.docx', 'structure': '', 'tag': 'table'})]"
|
||||
"[Document(page_content='MUTUAL NON-DISCLOSURE AGREEMENT This Mutual Non-Disclosure Agreement (this “ Agreement ”) is entered into and made effective as of April 4 , 2018 between Docugami Inc. , a Delaware corporation , whose address is 150 Lake Street South , Suite 221 , Kirkland , Washington 98033 , and Caleb Divine , an individual, whose address is 1201 Rt 300 , Newburgh NY 12550 .', metadata={'xpath': '/docset:MutualNon-disclosure/docset:MutualNon-disclosure/docset:MUTUALNON-DISCLOSUREAGREEMENT-section/docset:MUTUALNON-DISCLOSUREAGREEMENT/docset:ThisMutualNon-disclosureAgreement', 'id': '43rj0ds7s0ur', 'source': 'NDA simple layout.docx', 'structure': 'p', 'tag': 'ThisMutualNon-disclosureAgreement'}),\n",
|
||||
" Document(page_content='The above named parties desire to engage in discussions regarding a potential agreement or other transaction between the parties (the “Purpose”). In connection with such discussions, it may be necessary for the parties to disclose to each other certain confidential information or materials to enable them to evaluate whether to enter into such agreement or transaction.', metadata={'xpath': '/docset:MutualNon-disclosure/docset:MutualNon-disclosure/docset:MUTUALNON-DISCLOSUREAGREEMENT-section/docset:MUTUALNON-DISCLOSUREAGREEMENT/docset:Discussions', 'id': '43rj0ds7s0ur', 'source': 'NDA simple layout.docx', 'structure': 'p', 'tag': 'Discussions'}),\n",
|
||||
" Document(page_content='In consideration of the foregoing, the parties agree as follows:', metadata={'xpath': '/docset:MutualNon-disclosure/docset:MutualNon-disclosure/docset:MUTUALNON-DISCLOSUREAGREEMENT-section/docset:MUTUALNON-DISCLOSUREAGREEMENT/docset:Consideration/docset:Consideration', 'id': '43rj0ds7s0ur', 'source': 'NDA simple layout.docx', 'structure': 'p', 'tag': 'Consideration'}),\n",
|
||||
" Document(page_content='1. Confidential Information . For purposes of this Agreement , “ Confidential Information ” means any information or materials disclosed by one party to the other party that: (i) if disclosed in writing or in the form of tangible materials, is marked “confidential” or “proprietary” at the time of such disclosure; (ii) if disclosed orally or by visual presentation, is identified as “confidential” or “proprietary” at the time of such disclosure, and is summarized in a writing sent by the disclosing party to the receiving party within thirty ( 30 ) days after any such disclosure; or (iii) due to its nature or the circumstances of its disclosure, a person exercising reasonable business judgment would understand to be confidential or proprietary.', metadata={'xpath': '/docset:MutualNon-disclosure/docset:MutualNon-disclosure/docset:MUTUALNON-DISCLOSUREAGREEMENT-section/docset:MUTUALNON-DISCLOSUREAGREEMENT/docset:Consideration/docset:Purposes/docset:Purposes/docset:ConfidentialInformation-section/docset:ConfidentialInformation[2]', 'id': '43rj0ds7s0ur', 'source': 'NDA simple layout.docx', 'structure': 'div', 'tag': 'ConfidentialInformation'}),\n",
|
||||
" Document(page_content=\"2. Obligations and Restrictions . Each party agrees: (i) to maintain the other party's Confidential Information in strict confidence; (ii) not to disclose such Confidential Information to any third party; and (iii) not to use such Confidential Information for any purpose except for the Purpose. Each party may disclose the other party’s Confidential Information to its employees and consultants who have a bona fide need to know such Confidential Information for the Purpose, but solely to the extent necessary to pursue the Purpose and for no other purpose; provided, that each such employee and consultant first executes a written agreement (or is otherwise already bound by a written agreement) that contains use and nondisclosure restrictions at least as protective of the other party’s Confidential Information as those set forth in this Agreement .\", metadata={'xpath': '/docset:MutualNon-disclosure/docset:MutualNon-disclosure/docset:MUTUALNON-DISCLOSUREAGREEMENT-section/docset:MUTUALNON-DISCLOSUREAGREEMENT/docset:Consideration/docset:Purposes/docset:Obligations/docset:ObligationsAndRestrictions-section/docset:ObligationsAndRestrictions', 'id': '43rj0ds7s0ur', 'source': 'NDA simple layout.docx', 'structure': 'div', 'tag': 'ObligationsAndRestrictions'}),\n",
|
||||
" Document(page_content='3. Exceptions. The obligations and restrictions in Section 2 will not apply to any information or materials that:', metadata={'xpath': '/docset:MutualNon-disclosure/docset:MutualNon-disclosure/docset:MUTUALNON-DISCLOSUREAGREEMENT-section/docset:MUTUALNON-DISCLOSUREAGREEMENT/docset:Consideration/docset:Purposes/docset:Exceptions/docset:Exceptions-section/docset:Exceptions[2]', 'id': '43rj0ds7s0ur', 'source': 'NDA simple layout.docx', 'structure': 'div', 'tag': 'Exceptions'}),\n",
|
||||
" Document(page_content='(i) were, at the date of disclosure, or have subsequently become, generally known or available to the public through no act or failure to act by the receiving party;', metadata={'xpath': '/docset:MutualNon-disclosure/docset:MutualNon-disclosure/docset:MUTUALNON-DISCLOSUREAGREEMENT-section/docset:MUTUALNON-DISCLOSUREAGREEMENT/docset:Consideration/docset:Purposes/docset:TheDate/docset:TheDate/docset:TheDate', 'id': '43rj0ds7s0ur', 'source': 'NDA simple layout.docx', 'structure': 'p', 'tag': 'TheDate'}),\n",
|
||||
" Document(page_content='(ii) were rightfully known by the receiving party prior to receiving such information or materials from the disclosing party;', metadata={'xpath': '/docset:MutualNon-disclosure/docset:MutualNon-disclosure/docset:MUTUALNON-DISCLOSUREAGREEMENT-section/docset:MUTUALNON-DISCLOSUREAGREEMENT/docset:Consideration/docset:Purposes/docset:TheDate/docset:SuchInformation/docset:TheReceivingParty', 'id': '43rj0ds7s0ur', 'source': 'NDA simple layout.docx', 'structure': 'p', 'tag': 'TheReceivingParty'}),\n",
|
||||
" Document(page_content='(iii) are rightfully acquired by the receiving party from a third party who has the right to disclose such information or materials without breach of any confidentiality obligation to the disclosing party;', metadata={'xpath': '/docset:MutualNon-disclosure/docset:MutualNon-disclosure/docset:MUTUALNON-DISCLOSUREAGREEMENT-section/docset:MUTUALNON-DISCLOSUREAGREEMENT/docset:Consideration/docset:Purposes/docset:TheDate/docset:TheReceivingParty/docset:TheReceivingParty', 'id': '43rj0ds7s0ur', 'source': 'NDA simple layout.docx', 'structure': 'p', 'tag': 'TheReceivingParty'}),\n",
|
||||
" Document(page_content='4. Compelled Disclosure . Nothing in this Agreement will be deemed to restrict a party from disclosing the other party’s Confidential Information to the extent required by any order, subpoena, law, statute or regulation; provided, that the party required to make such a disclosure uses reasonable efforts to give the other party reasonable advance notice of such required disclosure in order to enable the other party to prevent or limit such disclosure.', metadata={'xpath': '/docset:MutualNon-disclosure/docset:MutualNon-disclosure/docset:MUTUALNON-DISCLOSUREAGREEMENT-section/docset:MUTUALNON-DISCLOSUREAGREEMENT/docset:Consideration/docset:Purposes/docset:Disclosure/docset:CompelledDisclosure-section/docset:CompelledDisclosure', 'id': '43rj0ds7s0ur', 'source': 'NDA simple layout.docx', 'structure': 'div', 'tag': 'CompelledDisclosure'}),\n",
|
||||
" Document(page_content='5. Return of Confidential Information . Upon the completion or abandonment of the Purpose, and in any event upon the disclosing party’s request, the receiving party will promptly return to the disclosing party all tangible items and embodiments containing or consisting of the disclosing party’s Confidential Information and all copies thereof (including electronic copies), and any notes, analyses, compilations, studies, interpretations, memoranda or other documents (regardless of the form thereof) prepared by or on behalf of the receiving party that contain or are based upon the disclosing party’s Confidential Information .', metadata={'xpath': '/docset:MutualNon-disclosure/docset:MutualNon-disclosure/docset:MUTUALNON-DISCLOSUREAGREEMENT-section/docset:MUTUALNON-DISCLOSUREAGREEMENT/docset:Consideration/docset:Purposes/docset:TheCompletion/docset:ReturnofConfidentialInformation-section/docset:ReturnofConfidentialInformation', 'id': '43rj0ds7s0ur', 'source': 'NDA simple layout.docx', 'structure': 'div', 'tag': 'ReturnofConfidentialInformation'}),\n",
|
||||
" Document(page_content='6. No Obligations . Each party retains the right to determine whether to disclose any Confidential Information to the other party.', metadata={'xpath': '/docset:MutualNon-disclosure/docset:MutualNon-disclosure/docset:MUTUALNON-DISCLOSUREAGREEMENT-section/docset:MUTUALNON-DISCLOSUREAGREEMENT/docset:Consideration/docset:Purposes/docset:NoObligations/docset:NoObligations-section/docset:NoObligations[2]', 'id': '43rj0ds7s0ur', 'source': 'NDA simple layout.docx', 'structure': 'div', 'tag': 'NoObligations'}),\n",
|
||||
" Document(page_content='7. No Warranty. ALL CONFIDENTIAL INFORMATION IS PROVIDED BY THE DISCLOSING PARTY “AS IS ”.', metadata={'xpath': '/docset:MutualNon-disclosure/docset:MutualNon-disclosure/docset:MUTUALNON-DISCLOSUREAGREEMENT-section/docset:MUTUALNON-DISCLOSUREAGREEMENT/docset:Consideration/docset:Purposes/docset:NoWarranty/docset:NoWarranty-section/docset:NoWarranty[2]', 'id': '43rj0ds7s0ur', 'source': 'NDA simple layout.docx', 'structure': 'div', 'tag': 'NoWarranty'}),\n",
|
||||
" Document(page_content='8. Term. This Agreement will remain in effect for a period of seven ( 7 ) years from the date of last disclosure of Confidential Information by either party, at which time it will terminate.', metadata={'xpath': '/docset:MutualNon-disclosure/docset:MutualNon-disclosure/docset:MUTUALNON-DISCLOSUREAGREEMENT-section/docset:MUTUALNON-DISCLOSUREAGREEMENT/docset:Consideration/docset:Purposes/docset:ThisAgreement/docset:Term-section/docset:Term', 'id': '43rj0ds7s0ur', 'source': 'NDA simple layout.docx', 'structure': 'div', 'tag': 'Term'}),\n",
|
||||
" Document(page_content='9. Equitable Relief . Each party acknowledges that the unauthorized use or disclosure of the disclosing party’s Confidential Information may cause the disclosing party to incur irreparable harm and significant damages, the degree of which may be difficult to ascertain. Accordingly, each party agrees that the disclosing party will have the right to seek immediate equitable relief to enjoin any unauthorized use or disclosure of its Confidential Information , in addition to any other rights and remedies that it may have at law or otherwise.', metadata={'xpath': '/docset:MutualNon-disclosure/docset:MutualNon-disclosure/docset:MUTUALNON-DISCLOSUREAGREEMENT-section/docset:MUTUALNON-DISCLOSUREAGREEMENT/docset:Consideration/docset:Purposes/docset:EquitableRelief/docset:EquitableRelief-section/docset:EquitableRelief[2]', 'id': '43rj0ds7s0ur', 'source': 'NDA simple layout.docx', 'structure': 'div', 'tag': 'EquitableRelief'}),\n",
|
||||
" Document(page_content='10. Non-compete. To the maximum extent permitted by applicable law, during the Term of this Agreement and for a period of one ( 1 ) year thereafter, Caleb Divine may not market software products or do business that directly or indirectly competes with Docugami software products .', metadata={'xpath': '/docset:MutualNon-disclosure/docset:MutualNon-disclosure/docset:MUTUALNON-DISCLOSUREAGREEMENT-section/docset:MUTUALNON-DISCLOSUREAGREEMENT/docset:Consideration/docset:Purposes/docset:TheMaximumExtent/docset:Non-compete-section/docset:Non-compete', 'id': '43rj0ds7s0ur', 'source': 'NDA simple layout.docx', 'structure': 'div', 'tag': 'Non-compete'}),\n",
|
||||
" Document(page_content='11. Miscellaneous. This Agreement will be governed and construed in accordance with the laws of the State of Washington , excluding its body of law controlling conflict of laws. This Agreement is the complete and exclusive understanding and agreement between the parties regarding the subject matter of this Agreement and supersedes all prior agreements, understandings and communications, oral or written, between the parties regarding the subject matter of this Agreement . If any provision of this Agreement is held invalid or unenforceable by a court of competent jurisdiction, that provision of this Agreement will be enforced to the maximum extent permissible and the other provisions of this Agreement will remain in full force and effect. Neither party may assign this Agreement , in whole or in part, by operation of law or otherwise, without the other party’s prior written consent, and any attempted assignment without such consent will be void. This Agreement may be executed in counterparts, each of which will be deemed an original, but all of which together will constitute one and the same instrument.', metadata={'xpath': '/docset:MutualNon-disclosure/docset:MutualNon-disclosure/docset:MUTUALNON-DISCLOSUREAGREEMENT-section/docset:MUTUALNON-DISCLOSUREAGREEMENT/docset:Consideration/docset:Purposes/docset:Accordance/docset:Miscellaneous-section/docset:Miscellaneous', 'id': '43rj0ds7s0ur', 'source': 'NDA simple layout.docx', 'structure': 'div', 'tag': 'Miscellaneous'}),\n",
|
||||
" Document(page_content='[SIGNATURE PAGE FOLLOWS] IN WITNESS WHEREOF, the parties hereto have executed this Mutual Non-Disclosure Agreement by their duly authorized officers or representatives as of the date first set forth above.', metadata={'xpath': '/docset:MutualNon-disclosure/docset:Witness/docset:TheParties/docset:TheParties', 'id': '43rj0ds7s0ur', 'source': 'NDA simple layout.docx', 'structure': 'p', 'tag': 'TheParties'}),\n",
|
||||
" Document(page_content='DOCUGAMI INC . : \\n\\n Caleb Divine : \\n\\n Signature: Signature: Name: \\n\\n Jean Paoli Name: Title: \\n\\n CEO Title:', metadata={'xpath': '/docset:MutualNon-disclosure/docset:Witness/docset:TheParties/docset:DocugamiInc/docset:DocugamiInc/xhtml:table', 'id': '43rj0ds7s0ur', 'source': 'NDA simple layout.docx', 'structure': '', 'tag': 'table'})]"
|
||||
]
|
||||
},
|
||||
"execution_count": 3,
|
||||
@@ -116,7 +116,7 @@
|
||||
"source": [
|
||||
"The `metadata` for each `Document` (really, a chunk of an actual PDF, DOC or DOCX) contains some useful additional information:\n",
|
||||
"\n",
|
||||
"1. **id and name:** ID and Name of the file (PDF, DOC or DOCX) the chunk is sourced from within Docugami.\n",
|
||||
"1. **id and source:** ID and Name of the file (PDF, DOC or DOCX) the chunk is sourced from within Docugami.\n",
|
||||
"2. **xpath:** XPath inside the XML representation of the document, for the chunk. Useful for source citations directly to the actual chunk inside the document XML.\n",
|
||||
"3. **structure:** Structural attributes of the chunk, e.g. h1, h2, div, table, td, etc. Useful to filter out certain kinds of chunks if needed by the caller.\n",
|
||||
"4. **tag:** Semantic tag for the chunk, using various generative and extractive techniques. More details here: https://github.com/docugami/DFM-benchmarks"
|
||||
@@ -133,7 +133,7 @@
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"execution_count": 4,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
@@ -142,7 +142,7 @@
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 4,
|
||||
"execution_count": 5,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
@@ -170,15 +170,7 @@
|
||||
"cell_type": "code",
|
||||
"execution_count": 6,
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stderr",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"Using embedded DuckDB without persistence: data will be transient\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"embedding = OpenAIEmbeddings()\n",
|
||||
"vectordb = Chroma.from_documents(documents=documents, embedding=embedding)\n",
|
||||
@@ -197,11 +189,11 @@
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"{'query': 'What can tenants do with signage on their properties?',\n",
|
||||
" 'result': ' Tenants may place signs (digital or otherwise) or other form of identification on the premises after receiving written permission from the landlord which shall not be unreasonably withheld. The tenant is responsible for any damage caused to the premises and must conform to any applicable laws, ordinances, etc. governing the same. The tenant must also remove and clean any window or glass identification promptly upon vacating the premises.',\n",
|
||||
" 'source_documents': [Document(page_content='ARTICLE VI SIGNAGE 6.01 Signage . Tenant may place or attach to the Premises signs (digital or otherwise) or other such identification as needed after receiving written permission from the Landlord , which permission shall not be unreasonably withheld. Any damage caused to the Premises by the Tenant ’s erecting or removing such signs shall be repaired promptly by the Tenant at the Tenant ’s expense . Any signs or other form of identification allowed must conform to all applicable laws, ordinances, etc. governing the same. Tenant also agrees to have any window or glass identification completely removed and cleaned at its expense promptly upon vacating the Premises.', metadata={'xpath': '/docset:OFFICELEASEAGREEMENT-section/docset:OFFICELEASEAGREEMENT/docset:Article/docset:ARTICLEVISIGNAGE-section/docset:_601Signage-section/docset:_601Signage', 'id': 'v1bvgaozfkak', 'name': 'TruTone Lane 2.docx', 'structure': 'div', 'tag': '_601Signage', 'Landlord': 'BUBBA CENTER PARTNERSHIP', 'Tenant': 'Truetone Lane LLC'}),\n",
|
||||
" Document(page_content='Signage. Tenant may place or attach to the Premises signs (digital or otherwise) or other such identification as needed after receiving written permission from the Landlord , which permission shall not be unreasonably withheld. Any damage caused to the Premises by the Tenant ’s erecting or removing such signs shall be repaired promptly by the Tenant at the Tenant ’s expense . Any signs or other form of identification allowed must conform to all applicable laws, ordinances, etc. governing the same. Tenant also agrees to have any window or glass identification completely removed and cleaned at its expense promptly upon vacating the Premises. \\n\\n ARTICLE VII UTILITIES 7.01', metadata={'xpath': '/docset:OFFICELEASEAGREEMENT-section/docset:OFFICELEASEAGREEMENT/docset:ThisOFFICELEASEAGREEMENTThis/docset:ArticleIBasic/docset:ArticleIiiUseAndCareOf/docset:ARTICLEIIIUSEANDCAREOFPREMISES-section/docset:ARTICLEIIIUSEANDCAREOFPREMISES/docset:NoOtherPurposes/docset:TenantsResponsibility/dg:chunk', 'id': 'g2fvhekmltza', 'name': 'TruTone Lane 6.pdf', 'structure': 'lim', 'tag': 'chunk', 'Landlord': 'GLORY ROAD LLC', 'Tenant': 'Truetone Lane LLC'}),\n",
|
||||
" Document(page_content='Landlord , its agents, servants, employees, licensees, invitees, and contractors during the last year of the term of this Lease at any and all times during regular business hours, after 24 hour notice to tenant, to pass and repass on and through the Premises, or such portion thereof as may be necessary, in order that they or any of them may gain access to the Premises for the purpose of showing the Premises to potential new tenants or real estate brokers. In addition, Landlord shall be entitled to place a \"FOR RENT \" or \"FOR LEASE\" sign (not exceeding 8.5 ” x 11 ”) in the front window of the Premises during the last six months of the term of this Lease .', metadata={'xpath': '/docset:Rider/docset:RIDERTOLEASE-section/docset:RIDERTOLEASE/docset:FixedRent/docset:TermYearPeriod/docset:Lease/docset:_42FLandlordSAccess-section/docset:_42FLandlordSAccess/docset:LandlordsRights/docset:Landlord', 'id': 'omvs4mysdk6b', 'name': 'TruTone Lane 1.docx', 'structure': 'p', 'tag': 'Landlord', 'Landlord': 'BIRCH STREET , LLC', 'Tenant': 'Trutone Lane LLC'}),\n",
|
||||
" Document(page_content=\"24. SIGNS . No signage shall be placed by Tenant on any portion of the Project . However, Tenant shall be permitted to place a sign bearing its name in a location approved by Landlord near the entrance to the Premises (at Tenant's cost ) and will be furnished a single listing of its name in the Building's directory (at Landlord 's cost ), all in accordance with the criteria adopted from time to time by Landlord for the Project . Any changes or additional listings in the directory shall be furnished (subject to availability of space) for the then Building Standard charge .\", metadata={'xpath': '/docset:OFFICELEASE-section/docset:OFFICELEASE/docset:THISOFFICELEASE/docset:WITNESSETH-section/docset:WITNESSETH/docset:GrossRentCreditTheRentCredit-section/docset:GrossRentCreditTheRentCredit/docset:Period/docset:ApplicableSalesTax/docset:PercentageRent/docset:TheTerms/docset:Indemnification/docset:INDEMNIFICATION-section/docset:INDEMNIFICATION/docset:Waiver/docset:Waiver/docset:Signs/docset:SIGNS-section/docset:SIGNS', 'id': 'qkn9cyqsiuch', 'name': 'Shorebucks LLC_AZ.pdf', 'structure': 'div', 'tag': 'SIGNS', 'Landlord': 'Menlo Group', 'Tenant': 'Shorebucks LLC'})]}"
|
||||
" 'result': \" Tenants can place or attach signs (digital or otherwise) to their premises with written permission from the landlord. The signs must conform to all applicable laws, ordinances, etc. governing the same. Tenants can also have their name listed in the building's directory at the landlord's cost.\",\n",
|
||||
" 'source_documents': [Document(page_content='ARTICLE VI SIGNAGE 6.01 Signage . Tenant may place or attach to the Premises signs (digital or otherwise) or other such identification as needed after receiving written permission from the Landlord , which permission shall not be unreasonably withheld. Any damage caused to the Premises by the Tenant ’s erecting or removing such signs shall be repaired promptly by the Tenant at the Tenant ’s expense . Any signs or other form of identification allowed must conform to all applicable laws, ordinances, etc. governing the same. Tenant also agrees to have any window or glass identification completely removed and cleaned at its expense promptly upon vacating the Premises.', metadata={'Landlord': 'BUBBA CENTER PARTNERSHIP', 'Lease Date': 'April 24 \\n\\n ,', 'Lease Parties': 'This OFFICE LEASE AGREEMENT (this \"Lease\") is made and entered into by and between BUBBA CENTER PARTNERSHIP (\" Landlord \"), and Truetone Lane LLC , a Delaware limited liability company (\" Tenant \").', 'Tenant': 'Truetone Lane LLC', 'id': 'v1bvgaozfkak', 'source': 'TruTone Lane 2.docx', 'structure': 'div', 'tag': '_601Signage', 'xpath': '/docset:OFFICELEASEAGREEMENT-section/docset:OFFICELEASEAGREEMENT/docset:Article/docset:ARTICLEVISIGNAGE-section/docset:_601Signage-section/docset:_601Signage'}),\n",
|
||||
" Document(page_content='Signage. Tenant may place or attach to the Premises signs (digital or otherwise) or other such identification as needed after receiving written permission from the Landlord , which permission shall not be unreasonably withheld. Any damage caused to the Premises by the Tenant ’s erecting or removing such signs shall be repaired promptly by the Tenant at the Tenant ’s expense . Any signs or other form of identification allowed must conform to all applicable laws, ordinances, etc. governing the same. Tenant also agrees to have any window or glass identification completely removed and cleaned at its expense promptly upon vacating the Premises. \\n\\n ARTICLE VII UTILITIES 7.01', metadata={'Landlord': 'GLORY ROAD LLC', 'Lease Date': 'April 30 , 2020', 'Lease Parties': 'This OFFICE LEASE AGREEMENT (this \"Lease\") is made and entered into by and between GLORY ROAD LLC (\" Landlord \"), and Truetone Lane LLC , a Delaware limited liability company (\" Tenant \").', 'Tenant': 'Truetone Lane LLC', 'id': 'g2fvhekmltza', 'source': 'TruTone Lane 6.pdf', 'structure': 'lim', 'tag': 'chunk', 'xpath': '/docset:OFFICELEASEAGREEMENT-section/docset:OFFICELEASEAGREEMENT/docset:Article/docset:ArticleIiiUse/docset:ARTICLEIIIUSEANDCAREOFPREMISES-section/docset:ARTICLEIIIUSEANDCAREOFPREMISES/docset:AnyTime/docset:Addition/dg:chunk'}),\n",
|
||||
" Document(page_content='Landlord , its agents, servants, employees, licensees, invitees, and contractors during the last year of the term of this Lease at any and all times during regular business hours, after 24 hour notice to tenant, to pass and repass on and through the Premises, or such portion thereof as may be necessary, in order that they or any of them may gain access to the Premises for the purpose of showing the Premises to potential new tenants or real estate brokers. In addition, Landlord shall be entitled to place a \"FOR RENT \" or \"FOR LEASE\" sign (not exceeding 8.5 ” x 11 ”) in the front window of the Premises during the last six months of the term of this Lease .', metadata={'Landlord': 'BIRCH STREET , LLC', 'Lease Date': 'October 15 , 2021', 'Lease Parties': 'The provisions of this rider are hereby incorporated into and made a part of the Lease dated as of October 15 , 2021 between BIRCH STREET , LLC , having an address at c/o Birch Palace , 6 Grace Avenue Suite 200 , Great Neck , New York 11021 (\" Landlord \"), and Trutone Lane LLC , having an address at 4 Pearl Street , New York , New York 10012 (\" Tenant \") of Premises known as the ground floor space and lower level space, as per floor plan annexed hereto and made a part hereof as Exhibit A (“Premises”) at 4 Pearl Street , New York , New York 10012 in the City of New York , Borough of Manhattan , to which this rider is annexed. If there is any conflict between the provisions of this rider and the remainder of this Lease , the provisions of this rider shall govern.', 'Tenant': 'Trutone Lane LLC', 'id': 'omvs4mysdk6b', 'source': 'TruTone Lane 1.docx', 'structure': 'p', 'tag': 'Landlord', 'xpath': '/docset:Rider/docset:RIDERTOLEASE-section/docset:RIDERTOLEASE/docset:FixedRent/docset:TermYearPeriod/docset:Lease/docset:_42FLandlordSAccess-section/docset:_42FLandlordSAccess/docset:LandlordsRights/docset:Landlord'}),\n",
|
||||
" Document(page_content=\"24. SIGNS . No signage shall be placed by Tenant on any portion of the Project . However, Tenant shall be permitted to place a sign bearing its name in a location approved by Landlord near the entrance to the Premises (at Tenant's cost ) and will be furnished a single listing of its name in the Building's directory (at Landlord 's cost ), all in accordance with the criteria adopted from time to time by Landlord for the Project . Any changes or additional listings in the directory shall be furnished (subject to availability of space) for the then Building Standard charge .\", metadata={'Landlord': 'Perry & Blair LLC', 'Lease Date': 'March 29th , 2019', 'Lease Parties': 'THIS OFFICE LEASE (the \"Lease\") is made and entered into as of March 29th , 2019 , by and between Landlord and Tenant . \"Date of this Lease\" shall mean the date on which the last one of the Landlord and Tenant has signed this Lease .', 'Tenant': 'Shorebucks LLC', 'id': 'dsyfhh4vpeyf', 'source': 'Shorebucks LLC_CO.pdf', 'structure': 'div', 'tag': 'SIGNS', 'xpath': '/docset:OFFICELEASE-section/docset:OFFICELEASE/docset:THISOFFICELEASE/docset:WITNESSETH-section/docset:WITNESSETH/docset:GrossRentCreditTheRentCredit-section/docset:GrossRentCreditTheRentCredit/docset:ThisLease-section/docset:ThisLease/docset:Guaranty-section/docset:Guaranty[2]/docset:TheTransfer/docset:TheTerms/docset:Indemnification/docset:INDEMNIFICATION-section/docset:INDEMNIFICATION/docset:Waiver/docset:Waiver/docset:Signs/docset:SIGNS-section/docset:SIGNS'})]}"
|
||||
]
|
||||
},
|
||||
"execution_count": 7,
|
||||
@@ -233,7 +225,7 @@
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"' 9,753 square feet'"
|
||||
"' 9,753 square feet.'"
|
||||
]
|
||||
},
|
||||
"execution_count": 8,
|
||||
@@ -243,31 +235,31 @@
|
||||
],
|
||||
"source": [
|
||||
"chain_response = qa_chain(\"What is rentable area for the property owned by DHA Group?\")\n",
|
||||
"chain_response[\"result\"] # the correct answer should be 13,500"
|
||||
"chain_response[\"result\"] # correct answer should be 13,500 sq ft"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"At first glance the answer may seem reasonable, but if you review the source chunks carefully for this answer, you will see that the chunking of the document did not end up putting the Landlord name and the rentable area in the same context, since they are far apart in the document. The retriever therefore ends up finding unrelated chunks from other documents not even related to the **Menlo Group** landlord. That landlord happens to be mentioned on the first page of the file **Shorebucks LLC_NJ.pdf** file, and while one of the source chunks used by the chain is indeed from that doc that contains the correct answer (**13,500**), other source chunks from different docs are included, and the answer is therefore incorrect."
|
||||
"At first glance the answer may seem reasonable, but if you review the source chunks carefully for this answer, you will see that the chunking of the document did not end up putting the Landlord name and the rentable area in the same context, since they are far apart in the document. The retriever therefore ends up finding unrelated chunks from other documents not even related to the **DHA Group** landlord. That landlord happens to be mentioned on the first page of the file **Shorebucks LLC_NJ.pdf** file, and while one of the source chunks used by the chain is indeed from that doc that contains the correct answer (**13,500**), other source chunks from different docs are included, and the answer is therefore incorrect."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 9,
|
||||
"execution_count": 13,
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"[Document(page_content='1.1 Landlord . DHA Group , a Delaware limited liability company authorized to transact business in New Jersey .', metadata={'xpath': '/docset:OFFICELEASE-section/docset:OFFICELEASE/docset:THISOFFICELEASE/docset:WITNESSETH-section/docset:WITNESSETH/docset:TheTerms/dg:chunk/docset:BasicLeaseInformation/docset:BASICLEASEINFORMATIONANDDEFINEDTERMS-section/docset:BASICLEASEINFORMATIONANDDEFINEDTERMS/docset:DhaGroup/docset:DhaGroup/docset:DhaGroup/docset:Landlord-section/docset:DhaGroup', 'id': 'md8rieecquyv', 'name': 'Shorebucks LLC_NJ.pdf', 'structure': 'div', 'tag': 'DhaGroup', 'Landlord': 'DHA Group', 'Tenant': 'Shorebucks LLC'}),\n",
|
||||
" Document(page_content='WITNESSES: LANDLORD: DHA Group , a Delaware limited liability company', metadata={'xpath': '/docset:OFFICELEASE-section/docset:OFFICELEASE/docset:THISOFFICELEASE/docset:WITNESSETH-section/docset:WITNESSETH/docset:GrossRentCreditTheRentCredit-section/docset:GrossRentCreditTheRentCredit/docset:Guaranty-section/docset:Guaranty[2]/docset:SIGNATURESONNEXTPAGE-section/docset:INWITNESSWHEREOF-section/docset:INWITNESSWHEREOF/docset:Behalf/docset:Witnesses/xhtml:table/xhtml:tbody/xhtml:tr[3]/xhtml:td[2]/docset:DhaGroup', 'id': 'md8rieecquyv', 'name': 'Shorebucks LLC_NJ.pdf', 'structure': 'p', 'tag': 'DhaGroup', 'Landlord': 'DHA Group', 'Tenant': 'Shorebucks LLC'}),\n",
|
||||
" Document(page_content=\"1.16 Landlord 's Notice Address . DHA Group , Suite 1010 , 111 Bauer Dr , Oakland , New Jersey , 07436 , with a copy to the Building Management Office at the Project , Attention: On - Site Property Manager .\", metadata={'xpath': '/docset:OFFICELEASE-section/docset:OFFICELEASE/docset:THISOFFICELEASE/docset:WITNESSETH-section/docset:WITNESSETH/docset:GrossRentCreditTheRentCredit-section/docset:GrossRentCreditTheRentCredit/docset:Period/docset:ApplicableSalesTax/docset:PercentageRent/docset:PercentageRent/docset:NoticeAddress[2]/docset:LandlordsNoticeAddress-section/docset:LandlordsNoticeAddress[2]', 'id': 'md8rieecquyv', 'name': 'Shorebucks LLC_NJ.pdf', 'structure': 'div', 'tag': 'LandlordsNoticeAddress', 'Landlord': 'DHA Group', 'Tenant': 'Shorebucks LLC'}),\n",
|
||||
" Document(page_content='1.6 Rentable Area of the Premises. 9,753 square feet . This square footage figure includes an add-on factor for Common Areas in the Building and has been agreed upon by the parties as final and correct and is not subject to challenge or dispute by either party.', metadata={'xpath': '/docset:OFFICELEASE-section/docset:OFFICELEASE/docset:THISOFFICELEASE/docset:WITNESSETH-section/docset:WITNESSETH/docset:TheTerms/dg:chunk/docset:BasicLeaseInformation/docset:BASICLEASEINFORMATIONANDDEFINEDTERMS-section/docset:BASICLEASEINFORMATIONANDDEFINEDTERMS/docset:PerryBlair/docset:PerryBlair/docset:Premises[2]/docset:RentableAreaofthePremises-section/docset:RentableAreaofthePremises', 'id': 'dsyfhh4vpeyf', 'name': 'Shorebucks LLC_CO.pdf', 'structure': 'div', 'tag': 'RentableAreaofthePremises', 'Landlord': 'Perry & Blair LLC', 'Tenant': 'Shorebucks LLC'})]"
|
||||
"[Document(page_content='1.1 Landlord . DHA Group , a Delaware limited liability company authorized to transact business in New Jersey .', metadata={'Landlord': 'DHA Group', 'Lease Date': 'March 29th , 2019', 'Lease Parties': 'THIS OFFICE LEASE (the \"Lease\") is made and entered into as of March 29th , 2019 , by and between Landlord and Tenant . \"Date of this Lease\" shall mean the date on which the last one of the Landlord and Tenant has signed this Lease .', 'Tenant': 'Shorebucks LLC', 'id': 'md8rieecquyv', 'source': 'Shorebucks LLC_NJ.pdf', 'structure': 'div', 'tag': 'DhaGroup', 'xpath': '/docset:OFFICELEASE-section/docset:OFFICELEASE/docset:THISOFFICELEASE/docset:WITNESSETH-section/docset:WITNESSETH/docset:TheTerms/dg:chunk/docset:BasicLeaseInformation/docset:BASICLEASEINFORMATIONANDDEFINEDTERMS-section/docset:BASICLEASEINFORMATIONANDDEFINEDTERMS/docset:DhaGroup/docset:DhaGroup/docset:DhaGroup/docset:Landlord-section/docset:DhaGroup'}),\n",
|
||||
" Document(page_content='WITNESSES: LANDLORD: DHA Group , a Delaware limited liability company', metadata={'Landlord': 'DHA Group', 'Lease Date': 'March 29th , 2019', 'Lease Parties': 'THIS OFFICE LEASE (the \"Lease\") is made and entered into as of March 29th , 2019 , by and between Landlord and Tenant . \"Date of this Lease\" shall mean the date on which the last one of the Landlord and Tenant has signed this Lease .', 'Tenant': 'Shorebucks LLC', 'id': 'md8rieecquyv', 'source': 'Shorebucks LLC_NJ.pdf', 'structure': 'p', 'tag': 'DhaGroup', 'xpath': '/docset:OFFICELEASE-section/docset:OFFICELEASE/docset:THISOFFICELEASE/docset:WITNESSETH-section/docset:WITNESSETH/docset:GrossRentCreditTheRentCredit-section/docset:GrossRentCreditTheRentCredit/docset:Guaranty-section/docset:Guaranty[2]/docset:SIGNATURESONNEXTPAGE-section/docset:INWITNESSWHEREOF-section/docset:INWITNESSWHEREOF/docset:Behalf/docset:Witnesses/xhtml:table/xhtml:tbody/xhtml:tr[3]/xhtml:td[2]/docset:DhaGroup'}),\n",
|
||||
" Document(page_content=\"1.16 Landlord 's Notice Address . DHA Group , Suite 1010 , 111 Bauer Dr , Oakland , New Jersey , 07436 , with a copy to the Building Management Office at the Project , Attention: On - Site Property Manager .\", metadata={'Landlord': 'DHA Group', 'Lease Date': 'March 29th , 2019', 'Lease Parties': 'THIS OFFICE LEASE (the \"Lease\") is made and entered into as of March 29th , 2019 , by and between Landlord and Tenant . \"Date of this Lease\" shall mean the date on which the last one of the Landlord and Tenant has signed this Lease .', 'Tenant': 'Shorebucks LLC', 'id': 'md8rieecquyv', 'source': 'Shorebucks LLC_NJ.pdf', 'structure': 'div', 'tag': 'LandlordsNoticeAddress', 'xpath': '/docset:OFFICELEASE-section/docset:OFFICELEASE/docset:THISOFFICELEASE/docset:WITNESSETH-section/docset:WITNESSETH/docset:GrossRentCreditTheRentCredit-section/docset:GrossRentCreditTheRentCredit/docset:Period/docset:ApplicableSalesTax/docset:PercentageRent/docset:PercentageRent/docset:NoticeAddress[2]/docset:LandlordsNoticeAddress-section/docset:LandlordsNoticeAddress[2]'}),\n",
|
||||
" Document(page_content='1.6 Rentable Area of the Premises. 9,753 square feet . This square footage figure includes an add-on factor for Common Areas in the Building and has been agreed upon by the parties as final and correct and is not subject to challenge or dispute by either party.', metadata={'Landlord': 'Perry & Blair LLC', 'Lease Date': 'March 29th , 2019', 'Lease Parties': 'THIS OFFICE LEASE (the \"Lease\") is made and entered into as of March 29th , 2019 , by and between Landlord and Tenant . \"Date of this Lease\" shall mean the date on which the last one of the Landlord and Tenant has signed this Lease .', 'Tenant': 'Shorebucks LLC', 'id': 'dsyfhh4vpeyf', 'source': 'Shorebucks LLC_CO.pdf', 'structure': 'div', 'tag': 'RentableAreaofthePremises', 'xpath': '/docset:OFFICELEASE-section/docset:OFFICELEASE/docset:THISOFFICELEASE/docset:WITNESSETH-section/docset:WITNESSETH/docset:TheTerms/dg:chunk/docset:BasicLeaseInformation/docset:BASICLEASEINFORMATIONANDDEFINEDTERMS-section/docset:BASICLEASEINFORMATIONANDDEFINEDTERMS/docset:PerryBlair/docset:PerryBlair/docset:Premises[2]/docset:RentableAreaofthePremises-section/docset:RentableAreaofthePremises'})]"
|
||||
]
|
||||
},
|
||||
"execution_count": 9,
|
||||
"execution_count": 13,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
@@ -287,22 +279,24 @@
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 10,
|
||||
"execution_count": 14,
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"{'xpath': '/docset:OFFICELEASEAGREEMENT-section/docset:OFFICELEASEAGREEMENT/docset:ThisOfficeLeaseAgreement',\n",
|
||||
"{'xpath': '/docset:OFFICELEASEAGREEMENT-section/docset:OFFICELEASEAGREEMENT/docset:LeaseParties',\n",
|
||||
" 'id': 'v1bvgaozfkak',\n",
|
||||
" 'name': 'TruTone Lane 2.docx',\n",
|
||||
" 'source': 'TruTone Lane 2.docx',\n",
|
||||
" 'structure': 'p',\n",
|
||||
" 'tag': 'ThisOfficeLeaseAgreement',\n",
|
||||
" 'tag': 'LeaseParties',\n",
|
||||
" 'Lease Date': 'April 24 \\n\\n ,',\n",
|
||||
" 'Landlord': 'BUBBA CENTER PARTNERSHIP',\n",
|
||||
" 'Tenant': 'Truetone Lane LLC'}"
|
||||
" 'Tenant': 'Truetone Lane LLC',\n",
|
||||
" 'Lease Parties': 'This OFFICE LEASE AGREEMENT (this \"Lease\") is made and entered into by and between BUBBA CENTER PARTNERSHIP (\" Landlord \"), and Truetone Lane LLC , a Delaware limited liability company (\" Tenant \").'}"
|
||||
]
|
||||
},
|
||||
"execution_count": 10,
|
||||
"execution_count": 14,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
@@ -322,17 +316,9 @@
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 11,
|
||||
"execution_count": 15,
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stderr",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"Using embedded DuckDB without persistence: data will be transient\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langchain.chains.query_constructor.schema import AttributeInfo\n",
|
||||
"from langchain.retrievers.self_query.base import SelfQueryRetriever\n",
|
||||
@@ -372,22 +358,30 @@
|
||||
"execution_count": 12,
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stderr",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"/root/Source/github/docugami.langchain/libs/langchain/langchain/chains/llm.py:275: UserWarning: The predict_and_parse method is deprecated, instead pass an output parser directly to LLMChain.\n",
|
||||
" warnings.warn(\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"query='rentable area' filter=Comparison(comparator=<Comparator.EQ: 'eq'>, attribute='Landlord', value='DHA Group')\n"
|
||||
"query='rentable area' filter=Comparison(comparator=<Comparator.EQ: 'eq'>, attribute='Landlord', value='DHA Group') limit=None\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"{'query': 'What is rentable area for the property owned by DHA Group?',\n",
|
||||
" 'result': ' 13,500 square feet.',\n",
|
||||
" 'source_documents': [Document(page_content='1.1 Landlord . DHA Group , a Delaware limited liability company authorized to transact business in New Jersey .', metadata={'xpath': '/docset:OFFICELEASE-section/docset:OFFICELEASE/docset:THISOFFICELEASE/docset:WITNESSETH-section/docset:WITNESSETH/docset:TheTerms/dg:chunk/docset:BasicLeaseInformation/docset:BASICLEASEINFORMATIONANDDEFINEDTERMS-section/docset:BASICLEASEINFORMATIONANDDEFINEDTERMS/docset:DhaGroup/docset:DhaGroup/docset:DhaGroup/docset:Landlord-section/docset:DhaGroup', 'id': 'md8rieecquyv', 'name': 'Shorebucks LLC_NJ.pdf', 'structure': 'div', 'tag': 'DhaGroup', 'Landlord': 'DHA Group', 'Tenant': 'Shorebucks LLC'}),\n",
|
||||
" Document(page_content='WITNESSES: LANDLORD: DHA Group , a Delaware limited liability company', metadata={'xpath': '/docset:OFFICELEASE-section/docset:OFFICELEASE/docset:THISOFFICELEASE/docset:WITNESSETH-section/docset:WITNESSETH/docset:GrossRentCreditTheRentCredit-section/docset:GrossRentCreditTheRentCredit/docset:Guaranty-section/docset:Guaranty[2]/docset:SIGNATURESONNEXTPAGE-section/docset:INWITNESSWHEREOF-section/docset:INWITNESSWHEREOF/docset:Behalf/docset:Witnesses/xhtml:table/xhtml:tbody/xhtml:tr[3]/xhtml:td[2]/docset:DhaGroup', 'id': 'md8rieecquyv', 'name': 'Shorebucks LLC_NJ.pdf', 'structure': 'p', 'tag': 'DhaGroup', 'Landlord': 'DHA Group', 'Tenant': 'Shorebucks LLC'}),\n",
|
||||
" Document(page_content=\"1.16 Landlord 's Notice Address . DHA Group , Suite 1010 , 111 Bauer Dr , Oakland , New Jersey , 07436 , with a copy to the Building Management Office at the Project , Attention: On - Site Property Manager .\", metadata={'xpath': '/docset:OFFICELEASE-section/docset:OFFICELEASE/docset:THISOFFICELEASE/docset:WITNESSETH-section/docset:WITNESSETH/docset:GrossRentCreditTheRentCredit-section/docset:GrossRentCreditTheRentCredit/docset:Period/docset:ApplicableSalesTax/docset:PercentageRent/docset:PercentageRent/docset:NoticeAddress[2]/docset:LandlordsNoticeAddress-section/docset:LandlordsNoticeAddress[2]', 'id': 'md8rieecquyv', 'name': 'Shorebucks LLC_NJ.pdf', 'structure': 'div', 'tag': 'LandlordsNoticeAddress', 'Landlord': 'DHA Group', 'Tenant': 'Shorebucks LLC'}),\n",
|
||||
" Document(page_content='1.6 Rentable Area of the Premises. 13,500 square feet . This square footage figure includes an add-on factor for Common Areas in the Building and has been agreed upon by the parties as final and correct and is not subject to challenge or dispute by either party.', metadata={'xpath': '/docset:OFFICELEASE-section/docset:OFFICELEASE/docset:THISOFFICELEASE/docset:WITNESSETH-section/docset:WITNESSETH/docset:TheTerms/dg:chunk/docset:BasicLeaseInformation/docset:BASICLEASEINFORMATIONANDDEFINEDTERMS-section/docset:BASICLEASEINFORMATIONANDDEFINEDTERMS/docset:DhaGroup/docset:DhaGroup/docset:Premises[2]/docset:RentableAreaofthePremises-section/docset:RentableAreaofthePremises', 'id': 'md8rieecquyv', 'name': 'Shorebucks LLC_NJ.pdf', 'structure': 'div', 'tag': 'RentableAreaofthePremises', 'Landlord': 'DHA Group', 'Tenant': 'Shorebucks LLC'})]}"
|
||||
" 'result': ' The rentable area for the property owned by DHA Group is 13,500 square feet.',\n",
|
||||
" 'source_documents': [Document(page_content='1.6 Rentable Area of the Premises. 13,500 square feet . This square footage figure includes an add-on factor for Common Areas in the Building and has been agreed upon by the parties as final and correct and is not subject to challenge or dispute by either party.', metadata={'Landlord': 'DHA Group', 'Lease Date': 'March 29th , 2019', 'Lease Parties': 'THIS OFFICE LEASE (the \"Lease\") is made and entered into as of March 29th , 2019 , by and between Landlord and Tenant . \"Date of this Lease\" shall mean the date on which the last one of the Landlord and Tenant has signed this Lease .', 'Tenant': 'Shorebucks LLC', 'id': 'md8rieecquyv', 'source': 'Shorebucks LLC_NJ.pdf', 'structure': 'div', 'tag': 'RentableAreaofthePremises', 'xpath': '/docset:OFFICELEASE-section/docset:OFFICELEASE/docset:THISOFFICELEASE/docset:WITNESSETH-section/docset:WITNESSETH/docset:TheTerms/dg:chunk/docset:BasicLeaseInformation/docset:BASICLEASEINFORMATIONANDDEFINEDTERMS-section/docset:BASICLEASEINFORMATIONANDDEFINEDTERMS/docset:DhaGroup/docset:DhaGroup/docset:Premises[2]/docset:RentableAreaofthePremises-section/docset:RentableAreaofthePremises'}),\n",
|
||||
" Document(page_content='1.6 Rentable Area of the Premises. 13,500 square feet . This square footage figure includes an add-on factor for Common Areas in the Building and has been agreed upon by the parties as final and correct and is not subject to challenge or dispute by either party.', metadata={'Landlord': 'DHA Group', 'Lease Date': 'March 29th , 2019', 'Lease Parties': 'THIS OFFICE LEASE (the \"Lease\") is made and entered into as of March 29th , 2019 , by and between Landlord and Tenant . \"Date of this Lease\" shall mean the date on which the last one of the Landlord and Tenant has signed this Lease .', 'Tenant': 'Shorebucks LLC', 'id': 'md8rieecquyv', 'source': 'Shorebucks LLC_NJ.pdf', 'structure': 'div', 'tag': 'RentableAreaofthePremises', 'xpath': '/docset:OFFICELEASE-section/docset:OFFICELEASE/docset:THISOFFICELEASE/docset:WITNESSETH-section/docset:WITNESSETH/docset:TheTerms/dg:chunk/docset:BasicLeaseInformation/docset:BASICLEASEINFORMATIONANDDEFINEDTERMS-section/docset:BASICLEASEINFORMATIONANDDEFINEDTERMS/docset:DhaGroup/docset:DhaGroup/docset:Premises[2]/docset:RentableAreaofthePremises-section/docset:RentableAreaofthePremises'}),\n",
|
||||
" Document(page_content='1.11 Percentage Rent . (a) 55 % of Gross Revenue to Landlord until Landlord receives Percentage Rent in an amount equal to the Annual Market Rent Hurdle (as escalated); and', metadata={'Landlord': 'DHA Group', 'Lease Date': 'March 29th , 2019', 'Lease Parties': 'THIS OFFICE LEASE (the \"Lease\") is made and entered into as of March 29th , 2019 , by and between Landlord and Tenant . \"Date of this Lease\" shall mean the date on which the last one of the Landlord and Tenant has signed this Lease .', 'Tenant': 'Shorebucks LLC', 'id': 'md8rieecquyv', 'source': 'Shorebucks LLC_NJ.pdf', 'structure': 'p', 'tag': 'GrossRevenue', 'xpath': '/docset:OFFICELEASE-section/docset:OFFICELEASE/docset:THISOFFICELEASE/docset:WITNESSETH-section/docset:WITNESSETH/docset:GrossRentCreditTheRentCredit-section/docset:GrossRentCreditTheRentCredit/docset:Period/docset:ApplicableSalesTax/docset:PercentageRent/docset:PercentageRent/docset:PercentageRent/docset:PercentageRent-section/docset:PercentageRent[2]/docset:PercentageRent/docset:GrossRevenue[1]/docset:GrossRevenue'}),\n",
|
||||
" Document(page_content='1.11 Percentage Rent . (a) 55 % of Gross Revenue to Landlord until Landlord receives Percentage Rent in an amount equal to the Annual Market Rent Hurdle (as escalated); and', metadata={'Landlord': 'DHA Group', 'Lease Date': 'March 29th , 2019', 'Lease Parties': 'THIS OFFICE LEASE (the \"Lease\") is made and entered into as of March 29th , 2019 , by and between Landlord and Tenant . \"Date of this Lease\" shall mean the date on which the last one of the Landlord and Tenant has signed this Lease .', 'Tenant': 'Shorebucks LLC', 'id': 'md8rieecquyv', 'source': 'Shorebucks LLC_NJ.pdf', 'structure': 'p', 'tag': 'GrossRevenue', 'xpath': '/docset:OFFICELEASE-section/docset:OFFICELEASE/docset:THISOFFICELEASE/docset:WITNESSETH-section/docset:WITNESSETH/docset:GrossRentCreditTheRentCredit-section/docset:GrossRentCreditTheRentCredit/docset:Period/docset:ApplicableSalesTax/docset:PercentageRent/docset:PercentageRent/docset:PercentageRent/docset:PercentageRent-section/docset:PercentageRent[2]/docset:PercentageRent/docset:GrossRevenue[1]/docset:GrossRevenue'})]}"
|
||||
]
|
||||
},
|
||||
"execution_count": 12,
|
||||
@@ -396,7 +390,9 @@
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"qa_chain(\"What is rentable area for the property owned by DHA Group?\")"
|
||||
"qa_chain(\n",
|
||||
" \"What is rentable area for the property owned by DHA Group?\"\n",
|
||||
") # correct answer should be 13,500 sq ft"
|
||||
]
|
||||
},
|
||||
{
|
||||
@@ -423,7 +419,7 @@
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.11.3"
|
||||
"version": "3.9.16"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
|
||||
@@ -173,7 +173,7 @@
|
||||
"source": [
|
||||
"from langchain.document_loaders import GitLoader\n",
|
||||
"\n",
|
||||
"# eg. loading only python files\n",
|
||||
"# e.g. loading only python files\n",
|
||||
"loader = GitLoader(\n",
|
||||
" repo_path=\"./example_data/test_repo1/\",\n",
|
||||
" file_filter=lambda file_path: file_path.endswith(\".py\"),\n",
|
||||
|
||||
@@ -0,0 +1,105 @@
|
||||
{
|
||||
"cells": [
|
||||
{
|
||||
"attachments": {},
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# Microsoft SharePoint\n",
|
||||
"\n",
|
||||
"> [Microsoft SharePoint](https://en.wikipedia.org/wiki/SharePoint) is a website-based collaboration system that uses workflow applications, “list” databases, and other web parts and security features to empower business teams to work together developed by Microsoft.\n",
|
||||
"\n",
|
||||
"This notebook covers how to load documents from the [SharePoint Document Library](https://support.microsoft.com/en-us/office/what-is-a-document-library-3b5976dd-65cf-4c9e-bf5a-713c10ca2872). Currently, only docx, doc, and pdf files are supported.\n",
|
||||
"\n",
|
||||
"## Prerequisites\n",
|
||||
"1. Register an application with the [Microsoft identity platform](https://learn.microsoft.com/en-us/azure/active-directory/develop/quickstart-register-app) instructions.\n",
|
||||
"2. When registration finishes, the Azure portal displays the app registration's Overview pane. You see the Application (client) ID. Also called the `client ID`, this value uniquely identifies your application in the Microsoft identity platform.\n",
|
||||
"3. During the steps you will be following at **item 1**, you can set the redirect URI as `https://login.microsoftonline.com/common/oauth2/nativeclient`\n",
|
||||
"4. During the steps you will be following at **item 1**, generate a new password (`client_secret`) under Application Secrets section.\n",
|
||||
"5. Follow the instructions at this [document](https://learn.microsoft.com/en-us/azure/active-directory/develop/quickstart-configure-app-expose-web-apis#add-a-scope) to add the following `SCOPES` (`offline_access` and `Sites.Read.All`) to your application.\n",
|
||||
"6. To retrieve files from your **Document Library**, you will need its ID. To obtain it, you will need values of `Tenant Name`, `Collection ID`, and `Subsite ID`.\n",
|
||||
"7. To find your `Tenant Name` follow the instructions at this [document](https://learn.microsoft.com/en-us/azure/active-directory-b2c/tenant-management-read-tenant-name). Once you got this, just remove `.onmicrosoft.com` from the value and hold the rest as your `Tenant Name`.\n",
|
||||
"8. To obtain your `Collection ID` and `Subsite ID`, you will need your **SharePoint** `site-name`. Your `SharePoint` site URL has the following format `https://<tenant-name>.sharepoint.com/sites/<site-name>`. The last part of this URL is the `site-name`.\n",
|
||||
"9. To Get the Site `Collection ID`, hit this URL in the browser: `https://<tenant>.sharepoint.com/sites/<site-name>/_api/site/id` and copy the value of the `Edm.Guid` property.\n",
|
||||
"10. To get the `Subsite ID` (or web ID) use: `https://<tenant>.sharepoint.com/<site-name>/_api/web/id` and copy the value of the `Edm.Guid` property.\n",
|
||||
"11. The `SharePoint site ID` has the following format: `<tenant-name>.sharepoint.com,<Collection ID>,<subsite ID>`. You can hold that value to use in the next step.\n",
|
||||
"12. Visit the [Graph Explorer Playground](https://developer.microsoft.com/en-us/graph/graph-explorer) to obtain your `Document Library ID`. The first step is to ensure you are logged in with the account associated with your **SharePoint** site. Then you need to make a request to `https://graph.microsoft.com/v1.0/sites/<SharePoint site ID>/drive` and the response will return a payload with a field `id` that holds the ID of your `Document Library ID`.\n",
|
||||
"\n",
|
||||
"## 🧑 Instructions for ingesting your documents from SharePoint Document Library\n",
|
||||
"\n",
|
||||
"### 🔑 Authentication\n",
|
||||
"\n",
|
||||
"By default, the `SharePointLoader` expects that the values of `CLIENT_ID` and `CLIENT_SECRET` must be stored as environment variables named `O365_CLIENT_ID` and `O365_CLIENT_SECRET` respectively. You could pass those environment variables through a `.env` file at the root of your application or using the following command in your script.\n",
|
||||
"\n",
|
||||
"```python\n",
|
||||
"os.environ['O365_CLIENT_ID'] = \"YOUR CLIENT ID\"\n",
|
||||
"os.environ['O365_CLIENT_SECRET'] = \"YOUR CLIENT SECRET\"\n",
|
||||
"```\n",
|
||||
"\n",
|
||||
"This loader uses an authentication called [*on behalf of a user*](https://learn.microsoft.com/en-us/graph/auth-v2-user?context=graph%2Fapi%2F1.0&view=graph-rest-1.0). It is a 2 step authentication with user consent. When you instantiate the loader, it will call will print a url that the user must visit to give consent to the app on the required permissions. The user must then visit this url and give consent to the application. Then the user must copy the resulting page url and paste it back on the console. The method will then return True if the login attempt was succesful.\n",
|
||||
"\n",
|
||||
"```python\n",
|
||||
"from langchain.document_loaders.sharepoint import SharePointLoader\n",
|
||||
"\n",
|
||||
"loader = SharePointLoader(document_library_id=\"YOUR DOCUMENT LIBRARY ID\")\n",
|
||||
"```\n",
|
||||
"\n",
|
||||
"Once the authentication has been done, the loader will store a token (`o365_token.txt`) at `~/.credentials/` folder. This token could be used later to authenticate without the copy/paste steps explained earlier. To use this token for authentication, you need to change the `auth_with_token` parameter to True in the instantiation of the loader.\n",
|
||||
"\n",
|
||||
"```python\n",
|
||||
"from langchain.document_loaders.sharepoint import SharePointLoader\n",
|
||||
"\n",
|
||||
"loader = SharePointLoader(document_library_id=\"YOUR DOCUMENT LIBRARY ID\", auth_with_token=True)\n",
|
||||
"```\n",
|
||||
"\n",
|
||||
"### 🗂️ Documents loader\n",
|
||||
"\n",
|
||||
"#### 📑 Loading documents from a Document Library Directory\n",
|
||||
"\n",
|
||||
"`SharePointLoader` can load documents from a specific folder within your Document Library. For instance, you want to load all documents that are stored at `Documents/marketing` folder within your Document Library.\n",
|
||||
"\n",
|
||||
"```python\n",
|
||||
"from langchain.document_loaders.sharepoint import SharePointLoader\n",
|
||||
"\n",
|
||||
"loader = SharePointLoader(document_library_id=\"YOUR DOCUMENT LIBRARY ID\", folder_path=\"Documents/marketing\", auth_with_token=True)\n",
|
||||
"documents = loader.load()\n",
|
||||
"```\n",
|
||||
"\n",
|
||||
"#### 📑 Loading documents from a list of Documents IDs\n",
|
||||
"\n",
|
||||
"Another possibility is to provide a list of `object_id` for each document you want to load. For that, you will need to query the [Microsoft Graph API](https://developer.microsoft.com/en-us/graph/graph-explorer) to find all the documents ID that you are interested in. This [link](https://learn.microsoft.com/en-us/graph/api/resources/onedrive?view=graph-rest-1.0#commonly-accessed-resources) provides a list of endpoints that will be helpful to retrieve the documents ID.\n",
|
||||
"\n",
|
||||
"For instance, to retrieve information about all objects that are stored at `data/finance/` folder, you need make a request to: `https://graph.microsoft.com/v1.0/drives/<document-library-id>/root:/data/finance:/children`. Once you have the list of IDs that you are interested in, then you can instantiate the loader with the following parameters.\n",
|
||||
"\n",
|
||||
"```python\n",
|
||||
"from langchain.document_loaders.sharepoint import SharePointLoader\n",
|
||||
"\n",
|
||||
"loader = SharePointLoader(document_library_id=\"YOUR DOCUMENT LIBRARY ID\", object_ids=[\"ID_1\", \"ID_2\"], auth_with_token=True)\n",
|
||||
"documents = loader.load()\n",
|
||||
"```\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
"kernelspec": {
|
||||
"display_name": ".venv",
|
||||
"language": "python",
|
||||
"name": "python3"
|
||||
},
|
||||
"language_info": {
|
||||
"codemirror_mode": {
|
||||
"name": "ipython",
|
||||
"version": 3
|
||||
},
|
||||
"file_extension": ".py",
|
||||
"mimetype": "text/x-python",
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.8.10"
|
||||
},
|
||||
"orig_nbformat": 4
|
||||
},
|
||||
"nbformat": 4,
|
||||
"nbformat_minor": 2
|
||||
}
|
||||
@@ -0,0 +1,878 @@
|
||||
{
|
||||
"cells": [
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "1f3cebbe-079a-4bfe-b1a1-07bdac882ce2",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# Amazon Textract \n",
|
||||
"\n",
|
||||
"Amazon Textract is a machine learning (ML) service that automatically extracts text, handwriting, and data from scanned documents. It goes beyond simple optical character recognition (OCR) to identify, understand, and extract data from forms and tables. Today, many companies manually extract data from scanned documents such as PDFs, images, tables, and forms, or through simple OCR software that requires manual configuration (which often must be updated when the form changes). To overcome these manual and expensive processes, Textract uses ML to read and process any type of document, accurately extracting text, handwriting, tables, and other data with no manual effort. You can quickly automate document processing and act on the information extracted, whether you’re automating loans processing or extracting information from invoices and receipts. Textract can extract the data in minutes instead of hours or days.\n",
|
||||
"\n",
|
||||
"This sample demonstrates the use of Amazon Textract in combination with LangChain as a DocumentLoader.\n",
|
||||
"\n",
|
||||
"Textract supports PDF, TIFF, PNG and JPEG format.\n",
|
||||
"\n",
|
||||
"Check https://docs.aws.amazon.com/textract/latest/dg/limits-document.html for supported document sizes, languages and characters."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 1,
|
||||
"id": "c049beaf-f904-4ce6-91ca-805da62084c2",
|
||||
"metadata": {
|
||||
"tags": []
|
||||
},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"\n",
|
||||
"\u001b[1m[\u001b[0m\u001b[34;49mnotice\u001b[0m\u001b[1;39;49m]\u001b[0m\u001b[39;49m A new release of pip is available: \u001b[0m\u001b[31;49m23.2\u001b[0m\u001b[39;49m -> \u001b[0m\u001b[32;49m23.2.1\u001b[0m\n",
|
||||
"\u001b[1m[\u001b[0m\u001b[34;49mnotice\u001b[0m\u001b[1;39;49m]\u001b[0m\u001b[39;49m To update, run: \u001b[0m\u001b[32;49mpython -m pip install --upgrade pip\u001b[0m\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"!pip install langchain boto3 openai tiktoken python-dotenv -q"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "400b25c6-befa-4730-a201-39ff112c8858",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Sample 1\n",
|
||||
"\n",
|
||||
"The first example uses a local file, which internally will be send to Amazon Textract sync API [DetectDocumentText](https://docs.aws.amazon.com/textract/latest/dg/API_DetectDocumentText.html). \n",
|
||||
"\n",
|
||||
"Local files or URL endpoints like HTTP:// are limited to one page documents for Textract.\n",
|
||||
"Multi-page documents have to reside on S3. This sample file is a jpeg."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 9,
|
||||
"id": "1becee92-e82f-42d4-9b4e-b23d77cbe88d",
|
||||
"metadata": {
|
||||
"tags": []
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langchain.document_loaders import AmazonTextractPDFLoader\n",
|
||||
"loader = AmazonTextractPDFLoader(\"example_data/alejandro_rosalez_sample-small.jpeg\")\n",
|
||||
"documents = loader.load()"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "d566dc56-c9a9-44ec-84fb-a81928f90d40",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Output from the file"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 10,
|
||||
"id": "1272ce8c-d298-4059-ac0a-780bf5f82302",
|
||||
"metadata": {
|
||||
"tags": []
|
||||
},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"[Document(page_content='Patient Information First Name: ALEJANDRO Last Name: ROSALEZ Date of Birth: 10/10/1982 Sex: M Marital Status: MARRIED Email Address: Address: 123 ANY STREET City: ANYTOWN State: CA Zip Code: 12345 Phone: 646-555-0111 Emergency Contact 1: First Name: CARLOS Last Name: SALAZAR Phone: 212-555-0150 Relationship to Patient: BROTHER Emergency Contact 2: First Name: JANE Last Name: DOE Phone: 650-555-0123 Relationship FRIEND to Patient: Did you feel fever or feverish lately? Yes No Are you having shortness of breath? Yes No Do you have a cough? Yes No Did you experience loss of taste or smell? Yes No Where you in contact with any confirmed COVID-19 positive patients? Yes No Did you travel in the past 14 days to any regions affected by COVID-19? Yes No Patient Information First Name: ALEJANDRO Last Name: ROSALEZ Date of Birth: 10/10/1982 Sex: M Marital Status: MARRIED Email Address: Address: 123 ANY STREET City: ANYTOWN State: CA Zip Code: 12345 Phone: 646-555-0111 Emergency Contact 1: First Name: CARLOS Last Name: SALAZAR Phone: 212-555-0150 Relationship to Patient: BROTHER Emergency Contact 2: First Name: JANE Last Name: DOE Phone: 650-555-0123 Relationship FRIEND to Patient: Did you feel fever or feverish lately? Yes No Are you having shortness of breath? Yes No Do you have a cough? Yes No Did you experience loss of taste or smell? Yes No Where you in contact with any confirmed COVID-19 positive patients? Yes No Did you travel in the past 14 days to any regions affected by COVID-19? Yes No ', metadata={'source': 'example_data/alejandro_rosalez_sample-small.jpeg', 'page': 1})]"
|
||||
]
|
||||
},
|
||||
"execution_count": 10,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"documents"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "4cf7f19c-3635-453a-9c76-4baf98b8d7f4",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Sample 2\n",
|
||||
"The next sample loads a file from an HTTPS endpoint. \n",
|
||||
"It has to be single page, as Amazon Textract requires all multi-page documents to be stored on S3."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 7,
|
||||
"id": "10374bfb-b325-451f-8bd0-c686710ab68c",
|
||||
"metadata": {
|
||||
"tags": []
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langchain.document_loaders import AmazonTextractPDFLoader\n",
|
||||
"loader = AmazonTextractPDFLoader(\"https://amazon-textract-public-content.s3.us-east-2.amazonaws.com/langchain/alejandro_rosalez_sample_1.jpg\")\n",
|
||||
"documents = loader.load()"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 11,
|
||||
"id": "16a2b6a3-7514-4c2c-a427-6847169af473",
|
||||
"metadata": {
|
||||
"tags": []
|
||||
},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"[Document(page_content='Patient Information First Name: ALEJANDRO Last Name: ROSALEZ Date of Birth: 10/10/1982 Sex: M Marital Status: MARRIED Email Address: Address: 123 ANY STREET City: ANYTOWN State: CA Zip Code: 12345 Phone: 646-555-0111 Emergency Contact 1: First Name: CARLOS Last Name: SALAZAR Phone: 212-555-0150 Relationship to Patient: BROTHER Emergency Contact 2: First Name: JANE Last Name: DOE Phone: 650-555-0123 Relationship FRIEND to Patient: Did you feel fever or feverish lately? Yes No Are you having shortness of breath? Yes No Do you have a cough? Yes No Did you experience loss of taste or smell? Yes No Where you in contact with any confirmed COVID-19 positive patients? Yes No Did you travel in the past 14 days to any regions affected by COVID-19? Yes No Patient Information First Name: ALEJANDRO Last Name: ROSALEZ Date of Birth: 10/10/1982 Sex: M Marital Status: MARRIED Email Address: Address: 123 ANY STREET City: ANYTOWN State: CA Zip Code: 12345 Phone: 646-555-0111 Emergency Contact 1: First Name: CARLOS Last Name: SALAZAR Phone: 212-555-0150 Relationship to Patient: BROTHER Emergency Contact 2: First Name: JANE Last Name: DOE Phone: 650-555-0123 Relationship FRIEND to Patient: Did you feel fever or feverish lately? Yes No Are you having shortness of breath? Yes No Do you have a cough? Yes No Did you experience loss of taste or smell? Yes No Where you in contact with any confirmed COVID-19 positive patients? Yes No Did you travel in the past 14 days to any regions affected by COVID-19? Yes No ', metadata={'source': 'example_data/alejandro_rosalez_sample-small.jpeg', 'page': 1})]"
|
||||
]
|
||||
},
|
||||
"execution_count": 11,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"documents"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "3a9cd8ec-e663-4dc7-9db1-d2f575253141",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Sample 3\n",
|
||||
"\n",
|
||||
"Processing a multi-page document requires the document to be on S3. The sample document resides in a bucket in us-east-2 and Textract needs to be called in that same region to be successful, so we set the region_name on the client and pass that in to the loader to ensure Textract is called from us-east-2. You could also to have your notebook running in us-east-2, setting the AWS_DEFAULT_REGION set to us-east-2 or when running in a different environment, pass in a boto3 Textract client with that region name like in the cell below."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 12,
|
||||
"id": "8185e3e6-9599-4a47-8969-d6dcef3e6404",
|
||||
"metadata": {
|
||||
"tags": []
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"import boto3\n",
|
||||
"textract_client = boto3.client('textract', region_name='us-east-2')\n",
|
||||
"\n",
|
||||
"file_path = \"s3://amazon-textract-public-content/langchain/layout-parser-paper.pdf\"\n",
|
||||
"loader = AmazonTextractPDFLoader(file_path, client=textract_client)\n",
|
||||
"documents = loader.load()"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "b8901eec-070d-4fd6-9d65-52211d332441",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Now getting the number of pages to validate the response (printing out the full response would be quite long...). We expect 16 pages."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 13,
|
||||
"id": "b23c01c8-cf69-4fe2-8141-4621edb7d79c",
|
||||
"metadata": {
|
||||
"tags": []
|
||||
},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"16"
|
||||
]
|
||||
},
|
||||
"execution_count": 13,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"len(documents)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "b3e41b4d-b159-4274-89be-80d8159134ef",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Using the AmazonTextractPDFLoader in an LangChain chain (e. g. OpenAI)\n",
|
||||
"\n",
|
||||
"The AmazonTextractPDFLoader can be used in a chain the same way the other loaders are used.\n",
|
||||
"Textract itself does have a [Query feature](https://docs.aws.amazon.com/textract/latest/dg/API_Query.html), which offers similar functionality to the QA chain in this sample, which is worth checking out as well."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 14,
|
||||
"id": "53c47b24-cc06-4256-9e5b-a82fc80bc55d",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# You can store your OPENAI_API_KEY in a .env file as well\n",
|
||||
"# import os \n",
|
||||
"# from dotenv import load_dotenv\n",
|
||||
"\n",
|
||||
"# load_dotenv()"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 15,
|
||||
"id": "a9ae004c-246c-4c7f-8458-191cd7424a9b",
|
||||
"metadata": {
|
||||
"tags": []
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# Or set the OpenAI key in the environment directly\n",
|
||||
"import os \n",
|
||||
"os.environ[\"OPENAI_API_KEY\"] = \"your-OpenAI-API-key\""
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 16,
|
||||
"id": "d52b089c-10ca-45fb-8669-8a1c5fee10d5",
|
||||
"metadata": {
|
||||
"tags": []
|
||||
},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"' The authors are Zejiang Shen, Ruochen Zhang, Melissa Dell, Benjamin Charles Germain Lee, Jacob Carlson, Weining Li, Gardner, M., Grus, J., Neumann, M., Tafjord, O., Dasigi, P., Liu, N., Peters, M., Schmitz, M., Zettlemoyer, L., Lukasz Garncarek, Powalski, R., Stanislawek, T., Topolski, B., Halama, P., Gralinski, F., Graves, A., Fernández, S., Gomez, F., Schmidhuber, J., Harley, A.W., Ufkes, A., Derpanis, K.G., He, K., Gkioxari, G., Dollár, P., Girshick, R., He, K., Zhang, X., Ren, S., Sun, J., Kay, A., Lamiroy, B., Lopresti, D., Mears, J., Jakeway, E., Ferriter, M., Adams, C., Yarasavage, N., Thomas, D., Zwaard, K., Li, M., Cui, L., Huang,'"
|
||||
]
|
||||
},
|
||||
"execution_count": 16,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"from langchain.llms import OpenAI\n",
|
||||
"from langchain.chains.question_answering import load_qa_chain\n",
|
||||
"\n",
|
||||
"chain = load_qa_chain(llm=OpenAI(), chain_type=\"map_reduce\")\n",
|
||||
"query = [\"Who are the autors?\"]\n",
|
||||
"\n",
|
||||
"chain.run(input_documents=documents, question=query)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "1a09d18b-ab7b-468e-ae66-f92abf666b9b",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": []
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
"availableInstances": [
|
||||
{
|
||||
"_defaultOrder": 0,
|
||||
"_isFastLaunch": true,
|
||||
"category": "General purpose",
|
||||
"gpuNum": 0,
|
||||
"hideHardwareSpecs": false,
|
||||
"memoryGiB": 4,
|
||||
"name": "ml.t3.medium",
|
||||
"vcpuNum": 2
|
||||
},
|
||||
{
|
||||
"_defaultOrder": 1,
|
||||
"_isFastLaunch": false,
|
||||
"category": "General purpose",
|
||||
"gpuNum": 0,
|
||||
"hideHardwareSpecs": false,
|
||||
"memoryGiB": 8,
|
||||
"name": "ml.t3.large",
|
||||
"vcpuNum": 2
|
||||
},
|
||||
{
|
||||
"_defaultOrder": 2,
|
||||
"_isFastLaunch": false,
|
||||
"category": "General purpose",
|
||||
"gpuNum": 0,
|
||||
"hideHardwareSpecs": false,
|
||||
"memoryGiB": 16,
|
||||
"name": "ml.t3.xlarge",
|
||||
"vcpuNum": 4
|
||||
},
|
||||
{
|
||||
"_defaultOrder": 3,
|
||||
"_isFastLaunch": false,
|
||||
"category": "General purpose",
|
||||
"gpuNum": 0,
|
||||
"hideHardwareSpecs": false,
|
||||
"memoryGiB": 32,
|
||||
"name": "ml.t3.2xlarge",
|
||||
"vcpuNum": 8
|
||||
},
|
||||
{
|
||||
"_defaultOrder": 4,
|
||||
"_isFastLaunch": true,
|
||||
"category": "General purpose",
|
||||
"gpuNum": 0,
|
||||
"hideHardwareSpecs": false,
|
||||
"memoryGiB": 8,
|
||||
"name": "ml.m5.large",
|
||||
"vcpuNum": 2
|
||||
},
|
||||
{
|
||||
"_defaultOrder": 5,
|
||||
"_isFastLaunch": false,
|
||||
"category": "General purpose",
|
||||
"gpuNum": 0,
|
||||
"hideHardwareSpecs": false,
|
||||
"memoryGiB": 16,
|
||||
"name": "ml.m5.xlarge",
|
||||
"vcpuNum": 4
|
||||
},
|
||||
{
|
||||
"_defaultOrder": 6,
|
||||
"_isFastLaunch": false,
|
||||
"category": "General purpose",
|
||||
"gpuNum": 0,
|
||||
"hideHardwareSpecs": false,
|
||||
"memoryGiB": 32,
|
||||
"name": "ml.m5.2xlarge",
|
||||
"vcpuNum": 8
|
||||
},
|
||||
{
|
||||
"_defaultOrder": 7,
|
||||
"_isFastLaunch": false,
|
||||
"category": "General purpose",
|
||||
"gpuNum": 0,
|
||||
"hideHardwareSpecs": false,
|
||||
"memoryGiB": 64,
|
||||
"name": "ml.m5.4xlarge",
|
||||
"vcpuNum": 16
|
||||
},
|
||||
{
|
||||
"_defaultOrder": 8,
|
||||
"_isFastLaunch": false,
|
||||
"category": "General purpose",
|
||||
"gpuNum": 0,
|
||||
"hideHardwareSpecs": false,
|
||||
"memoryGiB": 128,
|
||||
"name": "ml.m5.8xlarge",
|
||||
"vcpuNum": 32
|
||||
},
|
||||
{
|
||||
"_defaultOrder": 9,
|
||||
"_isFastLaunch": false,
|
||||
"category": "General purpose",
|
||||
"gpuNum": 0,
|
||||
"hideHardwareSpecs": false,
|
||||
"memoryGiB": 192,
|
||||
"name": "ml.m5.12xlarge",
|
||||
"vcpuNum": 48
|
||||
},
|
||||
{
|
||||
"_defaultOrder": 10,
|
||||
"_isFastLaunch": false,
|
||||
"category": "General purpose",
|
||||
"gpuNum": 0,
|
||||
"hideHardwareSpecs": false,
|
||||
"memoryGiB": 256,
|
||||
"name": "ml.m5.16xlarge",
|
||||
"vcpuNum": 64
|
||||
},
|
||||
{
|
||||
"_defaultOrder": 11,
|
||||
"_isFastLaunch": false,
|
||||
"category": "General purpose",
|
||||
"gpuNum": 0,
|
||||
"hideHardwareSpecs": false,
|
||||
"memoryGiB": 384,
|
||||
"name": "ml.m5.24xlarge",
|
||||
"vcpuNum": 96
|
||||
},
|
||||
{
|
||||
"_defaultOrder": 12,
|
||||
"_isFastLaunch": false,
|
||||
"category": "General purpose",
|
||||
"gpuNum": 0,
|
||||
"hideHardwareSpecs": false,
|
||||
"memoryGiB": 8,
|
||||
"name": "ml.m5d.large",
|
||||
"vcpuNum": 2
|
||||
},
|
||||
{
|
||||
"_defaultOrder": 13,
|
||||
"_isFastLaunch": false,
|
||||
"category": "General purpose",
|
||||
"gpuNum": 0,
|
||||
"hideHardwareSpecs": false,
|
||||
"memoryGiB": 16,
|
||||
"name": "ml.m5d.xlarge",
|
||||
"vcpuNum": 4
|
||||
},
|
||||
{
|
||||
"_defaultOrder": 14,
|
||||
"_isFastLaunch": false,
|
||||
"category": "General purpose",
|
||||
"gpuNum": 0,
|
||||
"hideHardwareSpecs": false,
|
||||
"memoryGiB": 32,
|
||||
"name": "ml.m5d.2xlarge",
|
||||
"vcpuNum": 8
|
||||
},
|
||||
{
|
||||
"_defaultOrder": 15,
|
||||
"_isFastLaunch": false,
|
||||
"category": "General purpose",
|
||||
"gpuNum": 0,
|
||||
"hideHardwareSpecs": false,
|
||||
"memoryGiB": 64,
|
||||
"name": "ml.m5d.4xlarge",
|
||||
"vcpuNum": 16
|
||||
},
|
||||
{
|
||||
"_defaultOrder": 16,
|
||||
"_isFastLaunch": false,
|
||||
"category": "General purpose",
|
||||
"gpuNum": 0,
|
||||
"hideHardwareSpecs": false,
|
||||
"memoryGiB": 128,
|
||||
"name": "ml.m5d.8xlarge",
|
||||
"vcpuNum": 32
|
||||
},
|
||||
{
|
||||
"_defaultOrder": 17,
|
||||
"_isFastLaunch": false,
|
||||
"category": "General purpose",
|
||||
"gpuNum": 0,
|
||||
"hideHardwareSpecs": false,
|
||||
"memoryGiB": 192,
|
||||
"name": "ml.m5d.12xlarge",
|
||||
"vcpuNum": 48
|
||||
},
|
||||
{
|
||||
"_defaultOrder": 18,
|
||||
"_isFastLaunch": false,
|
||||
"category": "General purpose",
|
||||
"gpuNum": 0,
|
||||
"hideHardwareSpecs": false,
|
||||
"memoryGiB": 256,
|
||||
"name": "ml.m5d.16xlarge",
|
||||
"vcpuNum": 64
|
||||
},
|
||||
{
|
||||
"_defaultOrder": 19,
|
||||
"_isFastLaunch": false,
|
||||
"category": "General purpose",
|
||||
"gpuNum": 0,
|
||||
"hideHardwareSpecs": false,
|
||||
"memoryGiB": 384,
|
||||
"name": "ml.m5d.24xlarge",
|
||||
"vcpuNum": 96
|
||||
},
|
||||
{
|
||||
"_defaultOrder": 20,
|
||||
"_isFastLaunch": false,
|
||||
"category": "General purpose",
|
||||
"gpuNum": 0,
|
||||
"hideHardwareSpecs": true,
|
||||
"memoryGiB": 0,
|
||||
"name": "ml.geospatial.interactive",
|
||||
"supportedImageNames": [
|
||||
"sagemaker-geospatial-v1-0"
|
||||
],
|
||||
"vcpuNum": 0
|
||||
},
|
||||
{
|
||||
"_defaultOrder": 21,
|
||||
"_isFastLaunch": true,
|
||||
"category": "Compute optimized",
|
||||
"gpuNum": 0,
|
||||
"hideHardwareSpecs": false,
|
||||
"memoryGiB": 4,
|
||||
"name": "ml.c5.large",
|
||||
"vcpuNum": 2
|
||||
},
|
||||
{
|
||||
"_defaultOrder": 22,
|
||||
"_isFastLaunch": false,
|
||||
"category": "Compute optimized",
|
||||
"gpuNum": 0,
|
||||
"hideHardwareSpecs": false,
|
||||
"memoryGiB": 8,
|
||||
"name": "ml.c5.xlarge",
|
||||
"vcpuNum": 4
|
||||
},
|
||||
{
|
||||
"_defaultOrder": 23,
|
||||
"_isFastLaunch": false,
|
||||
"category": "Compute optimized",
|
||||
"gpuNum": 0,
|
||||
"hideHardwareSpecs": false,
|
||||
"memoryGiB": 16,
|
||||
"name": "ml.c5.2xlarge",
|
||||
"vcpuNum": 8
|
||||
},
|
||||
{
|
||||
"_defaultOrder": 24,
|
||||
"_isFastLaunch": false,
|
||||
"category": "Compute optimized",
|
||||
"gpuNum": 0,
|
||||
"hideHardwareSpecs": false,
|
||||
"memoryGiB": 32,
|
||||
"name": "ml.c5.4xlarge",
|
||||
"vcpuNum": 16
|
||||
},
|
||||
{
|
||||
"_defaultOrder": 25,
|
||||
"_isFastLaunch": false,
|
||||
"category": "Compute optimized",
|
||||
"gpuNum": 0,
|
||||
"hideHardwareSpecs": false,
|
||||
"memoryGiB": 72,
|
||||
"name": "ml.c5.9xlarge",
|
||||
"vcpuNum": 36
|
||||
},
|
||||
{
|
||||
"_defaultOrder": 26,
|
||||
"_isFastLaunch": false,
|
||||
"category": "Compute optimized",
|
||||
"gpuNum": 0,
|
||||
"hideHardwareSpecs": false,
|
||||
"memoryGiB": 96,
|
||||
"name": "ml.c5.12xlarge",
|
||||
"vcpuNum": 48
|
||||
},
|
||||
{
|
||||
"_defaultOrder": 27,
|
||||
"_isFastLaunch": false,
|
||||
"category": "Compute optimized",
|
||||
"gpuNum": 0,
|
||||
"hideHardwareSpecs": false,
|
||||
"memoryGiB": 144,
|
||||
"name": "ml.c5.18xlarge",
|
||||
"vcpuNum": 72
|
||||
},
|
||||
{
|
||||
"_defaultOrder": 28,
|
||||
"_isFastLaunch": false,
|
||||
"category": "Compute optimized",
|
||||
"gpuNum": 0,
|
||||
"hideHardwareSpecs": false,
|
||||
"memoryGiB": 192,
|
||||
"name": "ml.c5.24xlarge",
|
||||
"vcpuNum": 96
|
||||
},
|
||||
{
|
||||
"_defaultOrder": 29,
|
||||
"_isFastLaunch": true,
|
||||
"category": "Accelerated computing",
|
||||
"gpuNum": 1,
|
||||
"hideHardwareSpecs": false,
|
||||
"memoryGiB": 16,
|
||||
"name": "ml.g4dn.xlarge",
|
||||
"vcpuNum": 4
|
||||
},
|
||||
{
|
||||
"_defaultOrder": 30,
|
||||
"_isFastLaunch": false,
|
||||
"category": "Accelerated computing",
|
||||
"gpuNum": 1,
|
||||
"hideHardwareSpecs": false,
|
||||
"memoryGiB": 32,
|
||||
"name": "ml.g4dn.2xlarge",
|
||||
"vcpuNum": 8
|
||||
},
|
||||
{
|
||||
"_defaultOrder": 31,
|
||||
"_isFastLaunch": false,
|
||||
"category": "Accelerated computing",
|
||||
"gpuNum": 1,
|
||||
"hideHardwareSpecs": false,
|
||||
"memoryGiB": 64,
|
||||
"name": "ml.g4dn.4xlarge",
|
||||
"vcpuNum": 16
|
||||
},
|
||||
{
|
||||
"_defaultOrder": 32,
|
||||
"_isFastLaunch": false,
|
||||
"category": "Accelerated computing",
|
||||
"gpuNum": 1,
|
||||
"hideHardwareSpecs": false,
|
||||
"memoryGiB": 128,
|
||||
"name": "ml.g4dn.8xlarge",
|
||||
"vcpuNum": 32
|
||||
},
|
||||
{
|
||||
"_defaultOrder": 33,
|
||||
"_isFastLaunch": false,
|
||||
"category": "Accelerated computing",
|
||||
"gpuNum": 4,
|
||||
"hideHardwareSpecs": false,
|
||||
"memoryGiB": 192,
|
||||
"name": "ml.g4dn.12xlarge",
|
||||
"vcpuNum": 48
|
||||
},
|
||||
{
|
||||
"_defaultOrder": 34,
|
||||
"_isFastLaunch": false,
|
||||
"category": "Accelerated computing",
|
||||
"gpuNum": 1,
|
||||
"hideHardwareSpecs": false,
|
||||
"memoryGiB": 256,
|
||||
"name": "ml.g4dn.16xlarge",
|
||||
"vcpuNum": 64
|
||||
},
|
||||
{
|
||||
"_defaultOrder": 35,
|
||||
"_isFastLaunch": false,
|
||||
"category": "Accelerated computing",
|
||||
"gpuNum": 1,
|
||||
"hideHardwareSpecs": false,
|
||||
"memoryGiB": 61,
|
||||
"name": "ml.p3.2xlarge",
|
||||
"vcpuNum": 8
|
||||
},
|
||||
{
|
||||
"_defaultOrder": 36,
|
||||
"_isFastLaunch": false,
|
||||
"category": "Accelerated computing",
|
||||
"gpuNum": 4,
|
||||
"hideHardwareSpecs": false,
|
||||
"memoryGiB": 244,
|
||||
"name": "ml.p3.8xlarge",
|
||||
"vcpuNum": 32
|
||||
},
|
||||
{
|
||||
"_defaultOrder": 37,
|
||||
"_isFastLaunch": false,
|
||||
"category": "Accelerated computing",
|
||||
"gpuNum": 8,
|
||||
"hideHardwareSpecs": false,
|
||||
"memoryGiB": 488,
|
||||
"name": "ml.p3.16xlarge",
|
||||
"vcpuNum": 64
|
||||
},
|
||||
{
|
||||
"_defaultOrder": 38,
|
||||
"_isFastLaunch": false,
|
||||
"category": "Accelerated computing",
|
||||
"gpuNum": 8,
|
||||
"hideHardwareSpecs": false,
|
||||
"memoryGiB": 768,
|
||||
"name": "ml.p3dn.24xlarge",
|
||||
"vcpuNum": 96
|
||||
},
|
||||
{
|
||||
"_defaultOrder": 39,
|
||||
"_isFastLaunch": false,
|
||||
"category": "Memory Optimized",
|
||||
"gpuNum": 0,
|
||||
"hideHardwareSpecs": false,
|
||||
"memoryGiB": 16,
|
||||
"name": "ml.r5.large",
|
||||
"vcpuNum": 2
|
||||
},
|
||||
{
|
||||
"_defaultOrder": 40,
|
||||
"_isFastLaunch": false,
|
||||
"category": "Memory Optimized",
|
||||
"gpuNum": 0,
|
||||
"hideHardwareSpecs": false,
|
||||
"memoryGiB": 32,
|
||||
"name": "ml.r5.xlarge",
|
||||
"vcpuNum": 4
|
||||
},
|
||||
{
|
||||
"_defaultOrder": 41,
|
||||
"_isFastLaunch": false,
|
||||
"category": "Memory Optimized",
|
||||
"gpuNum": 0,
|
||||
"hideHardwareSpecs": false,
|
||||
"memoryGiB": 64,
|
||||
"name": "ml.r5.2xlarge",
|
||||
"vcpuNum": 8
|
||||
},
|
||||
{
|
||||
"_defaultOrder": 42,
|
||||
"_isFastLaunch": false,
|
||||
"category": "Memory Optimized",
|
||||
"gpuNum": 0,
|
||||
"hideHardwareSpecs": false,
|
||||
"memoryGiB": 128,
|
||||
"name": "ml.r5.4xlarge",
|
||||
"vcpuNum": 16
|
||||
},
|
||||
{
|
||||
"_defaultOrder": 43,
|
||||
"_isFastLaunch": false,
|
||||
"category": "Memory Optimized",
|
||||
"gpuNum": 0,
|
||||
"hideHardwareSpecs": false,
|
||||
"memoryGiB": 256,
|
||||
"name": "ml.r5.8xlarge",
|
||||
"vcpuNum": 32
|
||||
},
|
||||
{
|
||||
"_defaultOrder": 44,
|
||||
"_isFastLaunch": false,
|
||||
"category": "Memory Optimized",
|
||||
"gpuNum": 0,
|
||||
"hideHardwareSpecs": false,
|
||||
"memoryGiB": 384,
|
||||
"name": "ml.r5.12xlarge",
|
||||
"vcpuNum": 48
|
||||
},
|
||||
{
|
||||
"_defaultOrder": 45,
|
||||
"_isFastLaunch": false,
|
||||
"category": "Memory Optimized",
|
||||
"gpuNum": 0,
|
||||
"hideHardwareSpecs": false,
|
||||
"memoryGiB": 512,
|
||||
"name": "ml.r5.16xlarge",
|
||||
"vcpuNum": 64
|
||||
},
|
||||
{
|
||||
"_defaultOrder": 46,
|
||||
"_isFastLaunch": false,
|
||||
"category": "Memory Optimized",
|
||||
"gpuNum": 0,
|
||||
"hideHardwareSpecs": false,
|
||||
"memoryGiB": 768,
|
||||
"name": "ml.r5.24xlarge",
|
||||
"vcpuNum": 96
|
||||
},
|
||||
{
|
||||
"_defaultOrder": 47,
|
||||
"_isFastLaunch": false,
|
||||
"category": "Accelerated computing",
|
||||
"gpuNum": 1,
|
||||
"hideHardwareSpecs": false,
|
||||
"memoryGiB": 16,
|
||||
"name": "ml.g5.xlarge",
|
||||
"vcpuNum": 4
|
||||
},
|
||||
{
|
||||
"_defaultOrder": 48,
|
||||
"_isFastLaunch": false,
|
||||
"category": "Accelerated computing",
|
||||
"gpuNum": 1,
|
||||
"hideHardwareSpecs": false,
|
||||
"memoryGiB": 32,
|
||||
"name": "ml.g5.2xlarge",
|
||||
"vcpuNum": 8
|
||||
},
|
||||
{
|
||||
"_defaultOrder": 49,
|
||||
"_isFastLaunch": false,
|
||||
"category": "Accelerated computing",
|
||||
"gpuNum": 1,
|
||||
"hideHardwareSpecs": false,
|
||||
"memoryGiB": 64,
|
||||
"name": "ml.g5.4xlarge",
|
||||
"vcpuNum": 16
|
||||
},
|
||||
{
|
||||
"_defaultOrder": 50,
|
||||
"_isFastLaunch": false,
|
||||
"category": "Accelerated computing",
|
||||
"gpuNum": 1,
|
||||
"hideHardwareSpecs": false,
|
||||
"memoryGiB": 128,
|
||||
"name": "ml.g5.8xlarge",
|
||||
"vcpuNum": 32
|
||||
},
|
||||
{
|
||||
"_defaultOrder": 51,
|
||||
"_isFastLaunch": false,
|
||||
"category": "Accelerated computing",
|
||||
"gpuNum": 1,
|
||||
"hideHardwareSpecs": false,
|
||||
"memoryGiB": 256,
|
||||
"name": "ml.g5.16xlarge",
|
||||
"vcpuNum": 64
|
||||
},
|
||||
{
|
||||
"_defaultOrder": 52,
|
||||
"_isFastLaunch": false,
|
||||
"category": "Accelerated computing",
|
||||
"gpuNum": 4,
|
||||
"hideHardwareSpecs": false,
|
||||
"memoryGiB": 192,
|
||||
"name": "ml.g5.12xlarge",
|
||||
"vcpuNum": 48
|
||||
},
|
||||
{
|
||||
"_defaultOrder": 53,
|
||||
"_isFastLaunch": false,
|
||||
"category": "Accelerated computing",
|
||||
"gpuNum": 4,
|
||||
"hideHardwareSpecs": false,
|
||||
"memoryGiB": 384,
|
||||
"name": "ml.g5.24xlarge",
|
||||
"vcpuNum": 96
|
||||
},
|
||||
{
|
||||
"_defaultOrder": 54,
|
||||
"_isFastLaunch": false,
|
||||
"category": "Accelerated computing",
|
||||
"gpuNum": 8,
|
||||
"hideHardwareSpecs": false,
|
||||
"memoryGiB": 768,
|
||||
"name": "ml.g5.48xlarge",
|
||||
"vcpuNum": 192
|
||||
},
|
||||
{
|
||||
"_defaultOrder": 55,
|
||||
"_isFastLaunch": false,
|
||||
"category": "Accelerated computing",
|
||||
"gpuNum": 8,
|
||||
"hideHardwareSpecs": false,
|
||||
"memoryGiB": 1152,
|
||||
"name": "ml.p4d.24xlarge",
|
||||
"vcpuNum": 96
|
||||
},
|
||||
{
|
||||
"_defaultOrder": 56,
|
||||
"_isFastLaunch": false,
|
||||
"category": "Accelerated computing",
|
||||
"gpuNum": 8,
|
||||
"hideHardwareSpecs": false,
|
||||
"memoryGiB": 1152,
|
||||
"name": "ml.p4de.24xlarge",
|
||||
"vcpuNum": 96
|
||||
}
|
||||
],
|
||||
"instance_type": "ml.t3.medium",
|
||||
"kernelspec": {
|
||||
"display_name": "Python 3 (ipykernel)",
|
||||
"language": "python",
|
||||
"name": "python3"
|
||||
},
|
||||
"language_info": {
|
||||
"codemirror_mode": {
|
||||
"name": "ipython",
|
||||
"version": 3
|
||||
},
|
||||
"file_extension": ".py",
|
||||
"mimetype": "text/x-python",
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.11.1"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
"nbformat_minor": 5
|
||||
}
|
||||
225
docs/extras/integrations/document_loaders/polars_dataframe.ipynb
Normal file
@@ -0,0 +1,225 @@
|
||||
{
|
||||
"cells": [
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "213a38a2",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# Polars DataFrame\n",
|
||||
"\n",
|
||||
"This notebook goes over how to load data from a [polars](https://pola-rs.github.io/polars-book/user-guide/) DataFrame."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 1,
|
||||
"id": "f6a7a9e4-80d6-486a-b2e3-636c568aa97c",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"#!pip install polars"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 2,
|
||||
"id": "79331964",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"import polars as pl"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 3,
|
||||
"id": "e487044c",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"df = pl.read_csv(\"example_data/mlb_teams_2012.csv\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 4,
|
||||
"id": "ac273ca1",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"text/html": [
|
||||
"<div><style>\n",
|
||||
".dataframe > thead > tr > th,\n",
|
||||
".dataframe > tbody > tr > td {\n",
|
||||
" text-align: right;\n",
|
||||
"}\n",
|
||||
"</style>\n",
|
||||
"<small>shape: (5, 3)</small><table border=\"1\" class=\"dataframe\"><thead><tr><th>Team</th><th> "Payroll (millions)"</th><th> "Wins"</th></tr><tr><td>str</td><td>f64</td><td>i64</td></tr></thead><tbody><tr><td>"Nationals"</td><td>81.34</td><td>98</td></tr><tr><td>"Reds"</td><td>82.2</td><td>97</td></tr><tr><td>"Yankees"</td><td>197.96</td><td>95</td></tr><tr><td>"Giants"</td><td>117.62</td><td>94</td></tr><tr><td>"Braves"</td><td>83.31</td><td>94</td></tr></tbody></table></div>"
|
||||
],
|
||||
"text/plain": [
|
||||
"shape: (5, 3)\n",
|
||||
"┌───────────┬───────────────────────┬─────────┐\n",
|
||||
"│ Team ┆ \"Payroll (millions)\" ┆ \"Wins\" │\n",
|
||||
"│ --- ┆ --- ┆ --- │\n",
|
||||
"│ str ┆ f64 ┆ i64 │\n",
|
||||
"╞═══════════╪═══════════════════════╪═════════╡\n",
|
||||
"│ Nationals ┆ 81.34 ┆ 98 │\n",
|
||||
"│ Reds ┆ 82.2 ┆ 97 │\n",
|
||||
"│ Yankees ┆ 197.96 ┆ 95 │\n",
|
||||
"│ Giants ┆ 117.62 ┆ 94 │\n",
|
||||
"│ Braves ┆ 83.31 ┆ 94 │\n",
|
||||
"└───────────┴───────────────────────┴─────────┘"
|
||||
]
|
||||
},
|
||||
"execution_count": 4,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"df.head()"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 5,
|
||||
"id": "66e47a13",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langchain.document_loaders import PolarsDataFrameLoader"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 6,
|
||||
"id": "2334caca",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"loader = PolarsDataFrameLoader(df, page_content_column=\"Team\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 7,
|
||||
"id": "d616c2b0",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"[Document(page_content='Nationals', metadata={' \"Payroll (millions)\"': 81.34, ' \"Wins\"': 98}),\n",
|
||||
" Document(page_content='Reds', metadata={' \"Payroll (millions)\"': 82.2, ' \"Wins\"': 97}),\n",
|
||||
" Document(page_content='Yankees', metadata={' \"Payroll (millions)\"': 197.96, ' \"Wins\"': 95}),\n",
|
||||
" Document(page_content='Giants', metadata={' \"Payroll (millions)\"': 117.62, ' \"Wins\"': 94}),\n",
|
||||
" Document(page_content='Braves', metadata={' \"Payroll (millions)\"': 83.31, ' \"Wins\"': 94}),\n",
|
||||
" Document(page_content='Athletics', metadata={' \"Payroll (millions)\"': 55.37, ' \"Wins\"': 94}),\n",
|
||||
" Document(page_content='Rangers', metadata={' \"Payroll (millions)\"': 120.51, ' \"Wins\"': 93}),\n",
|
||||
" Document(page_content='Orioles', metadata={' \"Payroll (millions)\"': 81.43, ' \"Wins\"': 93}),\n",
|
||||
" Document(page_content='Rays', metadata={' \"Payroll (millions)\"': 64.17, ' \"Wins\"': 90}),\n",
|
||||
" Document(page_content='Angels', metadata={' \"Payroll (millions)\"': 154.49, ' \"Wins\"': 89}),\n",
|
||||
" Document(page_content='Tigers', metadata={' \"Payroll (millions)\"': 132.3, ' \"Wins\"': 88}),\n",
|
||||
" Document(page_content='Cardinals', metadata={' \"Payroll (millions)\"': 110.3, ' \"Wins\"': 88}),\n",
|
||||
" Document(page_content='Dodgers', metadata={' \"Payroll (millions)\"': 95.14, ' \"Wins\"': 86}),\n",
|
||||
" Document(page_content='White Sox', metadata={' \"Payroll (millions)\"': 96.92, ' \"Wins\"': 85}),\n",
|
||||
" Document(page_content='Brewers', metadata={' \"Payroll (millions)\"': 97.65, ' \"Wins\"': 83}),\n",
|
||||
" Document(page_content='Phillies', metadata={' \"Payroll (millions)\"': 174.54, ' \"Wins\"': 81}),\n",
|
||||
" Document(page_content='Diamondbacks', metadata={' \"Payroll (millions)\"': 74.28, ' \"Wins\"': 81}),\n",
|
||||
" Document(page_content='Pirates', metadata={' \"Payroll (millions)\"': 63.43, ' \"Wins\"': 79}),\n",
|
||||
" Document(page_content='Padres', metadata={' \"Payroll (millions)\"': 55.24, ' \"Wins\"': 76}),\n",
|
||||
" Document(page_content='Mariners', metadata={' \"Payroll (millions)\"': 81.97, ' \"Wins\"': 75}),\n",
|
||||
" Document(page_content='Mets', metadata={' \"Payroll (millions)\"': 93.35, ' \"Wins\"': 74}),\n",
|
||||
" Document(page_content='Blue Jays', metadata={' \"Payroll (millions)\"': 75.48, ' \"Wins\"': 73}),\n",
|
||||
" Document(page_content='Royals', metadata={' \"Payroll (millions)\"': 60.91, ' \"Wins\"': 72}),\n",
|
||||
" Document(page_content='Marlins', metadata={' \"Payroll (millions)\"': 118.07, ' \"Wins\"': 69}),\n",
|
||||
" Document(page_content='Red Sox', metadata={' \"Payroll (millions)\"': 173.18, ' \"Wins\"': 69}),\n",
|
||||
" Document(page_content='Indians', metadata={' \"Payroll (millions)\"': 78.43, ' \"Wins\"': 68}),\n",
|
||||
" Document(page_content='Twins', metadata={' \"Payroll (millions)\"': 94.08, ' \"Wins\"': 66}),\n",
|
||||
" Document(page_content='Rockies', metadata={' \"Payroll (millions)\"': 78.06, ' \"Wins\"': 64}),\n",
|
||||
" Document(page_content='Cubs', metadata={' \"Payroll (millions)\"': 88.19, ' \"Wins\"': 61}),\n",
|
||||
" Document(page_content='Astros', metadata={' \"Payroll (millions)\"': 60.65, ' \"Wins\"': 55})]"
|
||||
]
|
||||
},
|
||||
"execution_count": 7,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"loader.load()"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 8,
|
||||
"id": "beb55c2f",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"page_content='Nationals' metadata={' \"Payroll (millions)\"': 81.34, ' \"Wins\"': 98}\n",
|
||||
"page_content='Reds' metadata={' \"Payroll (millions)\"': 82.2, ' \"Wins\"': 97}\n",
|
||||
"page_content='Yankees' metadata={' \"Payroll (millions)\"': 197.96, ' \"Wins\"': 95}\n",
|
||||
"page_content='Giants' metadata={' \"Payroll (millions)\"': 117.62, ' \"Wins\"': 94}\n",
|
||||
"page_content='Braves' metadata={' \"Payroll (millions)\"': 83.31, ' \"Wins\"': 94}\n",
|
||||
"page_content='Athletics' metadata={' \"Payroll (millions)\"': 55.37, ' \"Wins\"': 94}\n",
|
||||
"page_content='Rangers' metadata={' \"Payroll (millions)\"': 120.51, ' \"Wins\"': 93}\n",
|
||||
"page_content='Orioles' metadata={' \"Payroll (millions)\"': 81.43, ' \"Wins\"': 93}\n",
|
||||
"page_content='Rays' metadata={' \"Payroll (millions)\"': 64.17, ' \"Wins\"': 90}\n",
|
||||
"page_content='Angels' metadata={' \"Payroll (millions)\"': 154.49, ' \"Wins\"': 89}\n",
|
||||
"page_content='Tigers' metadata={' \"Payroll (millions)\"': 132.3, ' \"Wins\"': 88}\n",
|
||||
"page_content='Cardinals' metadata={' \"Payroll (millions)\"': 110.3, ' \"Wins\"': 88}\n",
|
||||
"page_content='Dodgers' metadata={' \"Payroll (millions)\"': 95.14, ' \"Wins\"': 86}\n",
|
||||
"page_content='White Sox' metadata={' \"Payroll (millions)\"': 96.92, ' \"Wins\"': 85}\n",
|
||||
"page_content='Brewers' metadata={' \"Payroll (millions)\"': 97.65, ' \"Wins\"': 83}\n",
|
||||
"page_content='Phillies' metadata={' \"Payroll (millions)\"': 174.54, ' \"Wins\"': 81}\n",
|
||||
"page_content='Diamondbacks' metadata={' \"Payroll (millions)\"': 74.28, ' \"Wins\"': 81}\n",
|
||||
"page_content='Pirates' metadata={' \"Payroll (millions)\"': 63.43, ' \"Wins\"': 79}\n",
|
||||
"page_content='Padres' metadata={' \"Payroll (millions)\"': 55.24, ' \"Wins\"': 76}\n",
|
||||
"page_content='Mariners' metadata={' \"Payroll (millions)\"': 81.97, ' \"Wins\"': 75}\n",
|
||||
"page_content='Mets' metadata={' \"Payroll (millions)\"': 93.35, ' \"Wins\"': 74}\n",
|
||||
"page_content='Blue Jays' metadata={' \"Payroll (millions)\"': 75.48, ' \"Wins\"': 73}\n",
|
||||
"page_content='Royals' metadata={' \"Payroll (millions)\"': 60.91, ' \"Wins\"': 72}\n",
|
||||
"page_content='Marlins' metadata={' \"Payroll (millions)\"': 118.07, ' \"Wins\"': 69}\n",
|
||||
"page_content='Red Sox' metadata={' \"Payroll (millions)\"': 173.18, ' \"Wins\"': 69}\n",
|
||||
"page_content='Indians' metadata={' \"Payroll (millions)\"': 78.43, ' \"Wins\"': 68}\n",
|
||||
"page_content='Twins' metadata={' \"Payroll (millions)\"': 94.08, ' \"Wins\"': 66}\n",
|
||||
"page_content='Rockies' metadata={' \"Payroll (millions)\"': 78.06, ' \"Wins\"': 64}\n",
|
||||
"page_content='Cubs' metadata={' \"Payroll (millions)\"': 88.19, ' \"Wins\"': 61}\n",
|
||||
"page_content='Astros' metadata={' \"Payroll (millions)\"': 60.65, ' \"Wins\"': 55}\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"# Use lazy load for larger table, which won't read the full table into memory\n",
|
||||
"for i in loader.lazy_load():\n",
|
||||
" print(i)"
|
||||
]
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
"kernelspec": {
|
||||
"display_name": "Python 3 (ipykernel)",
|
||||
"language": "python",
|
||||
"name": "python3"
|
||||
},
|
||||
"language_info": {
|
||||
"codemirror_mode": {
|
||||
"name": "ipython",
|
||||
"version": 3
|
||||
},
|
||||
"file_extension": ".py",
|
||||
"mimetype": "text/x-python",
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.11.3"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
"nbformat_minor": 5
|
||||
}
|
||||
@@ -299,7 +299,7 @@
|
||||
"id": "1cf27fc8",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"If you need to post process the `unstructured` elements after extraction, you can pass in a list of `Element` -> `Element` functions to the `post_processors` kwarg when you instantiate the `UnstructuredFileLoader`. This applies to other Unstructured loaders as well. Below is an example. Post processors are only applied if you run the loader in `\"elements\"` mode."
|
||||
"If you need to post process the `unstructured` elements after extraction, you can pass in a list of `str` -> `str` functions to the `post_processors` kwarg when you instantiate the `UnstructuredFileLoader`. This applies to other Unstructured loaders as well. Below is an example."
|
||||
]
|
||||
},
|
||||
{
|
||||
@@ -495,7 +495,7 @@
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.8.13"
|
||||
"version": "3.8.10"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
|
||||
283
docs/extras/integrations/document_transformers/docai.ipynb
Normal file
@@ -0,0 +1,283 @@
|
||||
{
|
||||
"cells": [
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 1,
|
||||
"id": "48438efb-9f0d-473b-a91c-9f1e29c2539d",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langchain.document_loaders.blob_loaders import Blob\n",
|
||||
"from langchain.document_loaders.parsers import DocAIParser"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "f95ac25b-f025-40c3-95b8-77919fc4da7f",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"DocAI is a Google Cloud platform to transform unstructured data from documents into structured data, making it easier to understand, analyze, and consume. You can read more about it: https://cloud.google.com/document-ai/docs/overview "
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "51946817-798c-4d11-abd6-db2ae53a0270",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"First, you need to set up a GCS bucket and create your own OCR processor as described here: https://cloud.google.com/document-ai/docs/create-processor\n",
|
||||
"The GCS_OUTPUT_PATH should be a path to a folder on GCS (starting with `gs://`) and a processor name should look like `projects/PROJECT_NUMBER/locations/LOCATION/processors/PROCESSOR_ID`. You can get it either programmatically or copy from the `Prediction endpoint` section of the `Processor details` tab in the Google Cloud Console."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 2,
|
||||
"id": "ac85f7f3-3ef6-41d5-920a-b55f2939c202",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"PROJECT = \"PUT_SOMETHING_HERE\"\n",
|
||||
"GCS_OUTPUT_PATH = \"PUT_SOMETHING_HERE\"\n",
|
||||
"PROCESSOR_NAME = \"PUT_SOMETHING_HERE\""
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "fad2bcca-1c0e-4888-b82d-15823ba57e60",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Now, let's create a parser:"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 3,
|
||||
"id": "dcc0c65a-86c5-448d-8b21-2e564b1903b7",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"parser = DocAIParser(location=\"us\", processor_name=PROCESSOR_NAME, gcs_output_path=GCS_OUTPUT_PATH)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "b8b5a3ff-650a-4ad3-a73a-395f86e4c9e1",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Let's go and parse an Alphabet's take from here: https://abc.xyz/assets/a7/5b/9e5ae0364b12b4c883f3cf748226/goog-exhibit-99-1-q1-2023-19.pdf. Copy it to your GCS bucket first, and adjust the path below."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 4,
|
||||
"id": "373cc18e-a311-4c8d-8180-47e4ade1d2ad",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"blob = Blob(path=\"gs://vertex-pgt/examples/goog-exhibit-99-1-q1-2023-19.pdf\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 5,
|
||||
"id": "6ef84fad-2981-456d-a6b4-3a6a1a46d511",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"docs = list(parser.lazy_parse(blob))"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "3f8e4ee1-e07d-4c29-a120-4d56aae91859",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"We'll get one document per page, 11 in total:"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 8,
|
||||
"id": "343919f5-35d2-47fb-9790-de464649ebdf",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"11\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"print(len(docs))"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "b104ae56-011b-4abe-ac07-e999c69494c5",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"You can run end-to-end parsing of a blob one-by-one. If you have many documents, it might be a better approach to batch them together and maybe even detach parsing from handling the results of parsing."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 9,
|
||||
"id": "9ecc1b99-5cef-47b0-a125-dbb2c41d2224",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"['projects/543079149601/locations/us/operations/16447136779727347991']\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"operations = parser.docai_parse([blob])\n",
|
||||
"print([op.operation.name for op in operations])"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "a2d24d63-c2c7-454c-9df3-2a9cf51309a6",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"You can check whether operations are finished:"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 10,
|
||||
"id": "ab11efb0-e514-4f44-9ba5-3d638a59c9e6",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"True"
|
||||
]
|
||||
},
|
||||
"execution_count": 10,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"parser.is_running(operations)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "602ca0bc-080a-4a4e-a413-0e705aeab189",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"And when they're finished, you can parse the results:"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 11,
|
||||
"id": "ec1e6041-bc10-47d4-ba64-d09055c14f27",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"False"
|
||||
]
|
||||
},
|
||||
"execution_count": 11,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"parser.is_running(operations)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 12,
|
||||
"id": "95d89da4-1c8a-413d-8473-ddd4a39375a5",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"DocAIParsingResults(source_path='gs://vertex-pgt/examples/goog-exhibit-99-1-q1-2023-19.pdf', parsed_path='gs://vertex-pgt/test/run1/16447136779727347991/0')\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"results = parser.get_results(operations)\n",
|
||||
"print(results[0])"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "87e5b606-1679-46c7-9577-4cf9bc93a752",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"And now we can finally generate Documents from parsed results:"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 15,
|
||||
"id": "08e8878d-889b-41ad-9500-2f772d38782f",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"docs = list(parser.parse_from_results(results))"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 16,
|
||||
"id": "c59525fb-448d-444b-8f12-c4aea791e19b",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"11\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"print(len(docs))"
|
||||
]
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
"environment": {
|
||||
"kernel": "python3",
|
||||
"name": "common-cpu.m109",
|
||||
"type": "gcloud",
|
||||
"uri": "gcr.io/deeplearning-platform-release/base-cpu:m109"
|
||||
},
|
||||
"kernelspec": {
|
||||
"display_name": "Python 3",
|
||||
"language": "python",
|
||||
"name": "python3"
|
||||
},
|
||||
"language_info": {
|
||||
"codemirror_mode": {
|
||||
"name": "ipython",
|
||||
"version": 3
|
||||
},
|
||||
"file_extension": ".py",
|
||||
"mimetype": "text/x-python",
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.10.11"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
"nbformat_minor": 5
|
||||
}
|
||||
167
docs/extras/integrations/llms/bittensor.ipynb
Normal file
@@ -0,0 +1,167 @@
|
||||
{
|
||||
"cells": [
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# NIBittensorLLM\n",
|
||||
"\n",
|
||||
"NIBittensorLLM is developed by [Neural Internet](https://neuralinternet.ai/), powered by [Bittensor](https://bittensor.com/).\n",
|
||||
"\n",
|
||||
"This LLM showcases true potential of decentralized AI by giving you the best response(s) from the Bittensor protocol, which consist of various AI models such as OpenAI, LLaMA2 etc.\n",
|
||||
"\n",
|
||||
"Users can view their logs, requests, and API keys on the [Validator Endpoint Frontend](https://api.neuralinternet.ai/). However, changes to the configuration are currently prohibited; otherwise, the user's queries will be blocked.\n",
|
||||
"\n",
|
||||
"If you encounter any difficulties or have any questions, please feel free to reach out to our developer on [GitHub](https://github.com/Kunj-2206), [Discord](https://discordapp.com/users/683542109248159777) or join our discord server for latest update and queries [Neural Internet](https://discord.gg/neuralinternet).\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Different Parameter and response handling for NIBittensorLLM "
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"import langchain\n",
|
||||
"from langchain.llms import NIBittensorLLM\n",
|
||||
"import json\n",
|
||||
"from pprint import pprint\n",
|
||||
"\n",
|
||||
"langchain.debug = True\n",
|
||||
"\n",
|
||||
"# System parameter in NIBittensorLLM is optional but you can set whatever you want to perform with model\n",
|
||||
"llm_sys = NIBittensorLLM(\n",
|
||||
" system_prompt=\"Your task is to determine response based on user prompt.Explain me like I am technical lead of a project\"\n",
|
||||
")\n",
|
||||
"sys_resp = llm_sys(\n",
|
||||
" \"What is bittensor and What are the potential benifits of decentralized AI?\"\n",
|
||||
")\n",
|
||||
"print(f\"Response provided by LLM with system prompt set is : {sys_resp}\")\n",
|
||||
"\n",
|
||||
"# The top_responses parameter can give multiple responses based on its parameter value\n",
|
||||
"# This below code retrive top 10 miner's response all the response are in format of json\n",
|
||||
"\n",
|
||||
"# Json response structure is\n",
|
||||
"\"\"\" {\n",
|
||||
" \"choices\": [\n",
|
||||
" {\"index\": Bittensor's Metagraph index number,\n",
|
||||
" \"uid\": Unique Identifier of a miner,\n",
|
||||
" \"responder_hotkey\": Hotkey of a miner,\n",
|
||||
" \"message\":{\"role\":\"assistant\",\"content\": Contains actual response},\n",
|
||||
" \"response_ms\": Time in millisecond required to fetch response from a miner} \n",
|
||||
" ]\n",
|
||||
" } \"\"\"\n",
|
||||
"\n",
|
||||
"multi_response_llm = NIBittensorLLM(top_responses=10)\n",
|
||||
"multi_resp = multi_response_llm(\"What is Neural Network Feeding Mechanism?\")\n",
|
||||
"json_multi_resp = json.loads(multi_resp)\n",
|
||||
"pprint(json_multi_resp)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Using NIBittensorLLM with LLMChain and PromptTemplate"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"import langchain\n",
|
||||
"from langchain import PromptTemplate, LLMChain\n",
|
||||
"from langchain.llms import NIBittensorLLM\n",
|
||||
"\n",
|
||||
"langchain.debug = True\n",
|
||||
"\n",
|
||||
"template = \"\"\"Question: {question}\n",
|
||||
"\n",
|
||||
"Answer: Let's think step by step.\"\"\"\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"prompt = PromptTemplate(template=template, input_variables=[\"question\"])\n",
|
||||
"\n",
|
||||
"# System parameter in NIBittensorLLM is optional but you can set whatever you want to perform with model\n",
|
||||
"llm = NIBittensorLLM(system_prompt=\"Your task is to determine response based on user prompt.\")\n",
|
||||
"\n",
|
||||
"llm_chain = LLMChain(prompt=prompt, llm=llm)\n",
|
||||
"question = \"What is bittensor?\"\n",
|
||||
"\n",
|
||||
"llm_chain.run(question)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Using NIBittensorLLM with Conversational Agent and Google Search Tool"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langchain.agents import (\n",
|
||||
" AgentType,\n",
|
||||
" initialize_agent,\n",
|
||||
" load_tools,\n",
|
||||
" ZeroShotAgent,\n",
|
||||
" Tool,\n",
|
||||
" AgentExecutor,\n",
|
||||
")\n",
|
||||
"from langchain.memory import ConversationBufferMemory\n",
|
||||
"from langchain import LLMChain, PromptTemplate\n",
|
||||
"from langchain.utilities import GoogleSearchAPIWrapper, SerpAPIWrapper\n",
|
||||
"from langchain.llms import NIBittensorLLM\n",
|
||||
"\n",
|
||||
"memory = ConversationBufferMemory(memory_key=\"chat_history\")\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"prefix = \"\"\"Answer prompt based on LLM if there is need to search something then use internet and observe internet result and give accurate reply of user questions also try to use authenticated sources\"\"\"\n",
|
||||
"suffix = \"\"\"Begin!\n",
|
||||
" {chat_history}\n",
|
||||
" Question: {input}\n",
|
||||
" {agent_scratchpad}\"\"\"\n",
|
||||
"\n",
|
||||
"prompt = ZeroShotAgent.create_prompt(\n",
|
||||
" tools,\n",
|
||||
" prefix=prefix,\n",
|
||||
" suffix=suffix,\n",
|
||||
" input_variables=[\"input\", \"chat_history\", \"agent_scratchpad\"],\n",
|
||||
")\n",
|
||||
"\n",
|
||||
"llm = NIBittensorLLM(system_prompt=\"Your task is to determine response based on user prompt\")\n",
|
||||
"\n",
|
||||
"llm_chain = LLMChain(llm=llm, prompt=prompt)\n",
|
||||
"\n",
|
||||
"memory = ConversationBufferMemory(memory_key=\"chat_history\")\n",
|
||||
"\n",
|
||||
"agent = ZeroShotAgent(llm_chain=llm_chain, tools=tools, verbose=True)\n",
|
||||
"agent_chain = AgentExecutor.from_agent_and_tools(\n",
|
||||
" agent=agent, tools=tools, verbose=True, memory=memory\n",
|
||||
")\n",
|
||||
"\n",
|
||||
"response = agent_chain.run(input=prompt)"
|
||||
]
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
"language_info": {
|
||||
"name": "python"
|
||||
},
|
||||
"orig_nbformat": 4
|
||||
},
|
||||
"nbformat": 4,
|
||||
"nbformat_minor": 2
|
||||
}
|
||||
78
docs/extras/integrations/llms/deepsparse.ipynb
Normal file
@@ -0,0 +1,78 @@
|
||||
{
|
||||
"cells": [
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "15d7ce70-8879-42a0-86d9-a3d604a3ec83",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# DeepSparse\n",
|
||||
"\n",
|
||||
"This page covers how to use the [DeepSparse](https://github.com/neuralmagic/deepsparse) inference runtime within LangChain.\n",
|
||||
"It is broken into two parts: installation and setup, and then examples of DeepSparse usage.\n",
|
||||
"\n",
|
||||
"## Installation and Setup\n",
|
||||
"\n",
|
||||
"- Install the Python package with `pip install deepsparse`\n",
|
||||
"- Choose a [SparseZoo model](https://sparsezoo.neuralmagic.com/?useCase=text_generation) or export a support model to ONNX [using Optimum](https://github.com/neuralmagic/notebooks/blob/main/notebooks/opt-text-generation-deepsparse-quickstart/OPT_Text_Generation_DeepSparse_Quickstart.ipynb)\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"There exists a DeepSparse LLM wrapper, that provides a unified interface for all models:"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "79d24d37-737a-428c-b6c5-84c1633070d7",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langchain.llms import DeepSparse\n",
|
||||
"\n",
|
||||
"llm = DeepSparse(model='zoo:nlg/text_generation/codegen_mono-350m/pytorch/huggingface/bigpython_bigquery_thepile/base-none')\n",
|
||||
"\n",
|
||||
"print(llm('def fib():'))"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "ea7ea674-d6b0-49d9-9c2b-014032973be6",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Additional parameters can be passed using the `config` parameter:"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "ff61b845-41e6-4457-8625-6e21a11bfe7c",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"config = {'max_generated_tokens': 256}\n",
|
||||
"\n",
|
||||
"llm = DeepSparse(model='zoo:nlg/text_generation/codegen_mono-350m/pytorch/huggingface/bigpython_bigquery_thepile/base-none', config=config)"
|
||||
]
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
"kernelspec": {
|
||||
"display_name": "Python 3 (ipykernel)",
|
||||
"language": "python",
|
||||
"name": "python3"
|
||||
},
|
||||
"language_info": {
|
||||
"codemirror_mode": {
|
||||
"name": "ipython",
|
||||
"version": 3
|
||||
},
|
||||
"file_extension": ".py",
|
||||
"mimetype": "text/x-python",
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.9.1"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
"nbformat_minor": 5
|
||||
}
|
||||
@@ -1,6 +1,7 @@
|
||||
{
|
||||
"cells": [
|
||||
{
|
||||
"attachments": {},
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
@@ -30,6 +31,14 @@
|
||||
"%pip install gpt4all > /dev/null"
|
||||
]
|
||||
},
|
||||
{
|
||||
"attachments": {},
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Import GPT4All"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 2,
|
||||
@@ -43,6 +52,14 @@
|
||||
"from langchain.callbacks.streaming_stdout import StreamingStdOutCallbackHandler"
|
||||
]
|
||||
},
|
||||
{
|
||||
"attachments": {},
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Set Up Question to pass to LLM"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 3,
|
||||
@@ -59,6 +76,7 @@
|
||||
]
|
||||
},
|
||||
{
|
||||
"attachments": {},
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
@@ -66,18 +84,14 @@
|
||||
"\n",
|
||||
"To run locally, download a compatible ggml-formatted model. \n",
|
||||
" \n",
|
||||
"**Download option 1**: The [gpt4all page](https://gpt4all.io/index.html) has a useful `Model Explorer` section:\n",
|
||||
"The [gpt4all page](https://gpt4all.io/index.html) has a useful `Model Explorer` section:\n",
|
||||
"\n",
|
||||
"* Select a model of interest\n",
|
||||
"* Download using the UI and move the `.bin` to the `local_path` (noted below)\n",
|
||||
"\n",
|
||||
"For more info, visit https://github.com/nomic-ai/gpt4all.\n",
|
||||
"\n",
|
||||
"--- \n",
|
||||
"\n",
|
||||
"**Download option 2**: Uncomment the below block to download a model. \n",
|
||||
"\n",
|
||||
"* You may want to update `url` to a new version, whih can be browsed using the [gpt4all page](https://gpt4all.io/index.html)."
|
||||
"---"
|
||||
]
|
||||
},
|
||||
{
|
||||
@@ -88,27 +102,7 @@
|
||||
"source": [
|
||||
"local_path = (\n",
|
||||
" \"./models/ggml-gpt4all-l13b-snoozy.bin\" # replace with your desired local file path\n",
|
||||
")\n",
|
||||
"\n",
|
||||
"# import requests\n",
|
||||
"\n",
|
||||
"# from pathlib import Path\n",
|
||||
"# from tqdm import tqdm\n",
|
||||
"\n",
|
||||
"# Path(local_path).parent.mkdir(parents=True, exist_ok=True)\n",
|
||||
"\n",
|
||||
"# # Example model. Check https://github.com/nomic-ai/gpt4all for the latest models.\n",
|
||||
"# url = 'http://gpt4all.io/models/ggml-gpt4all-l13b-snoozy.bin'\n",
|
||||
"\n",
|
||||
"# # send a GET request to the URL to download the file. Stream since it's large\n",
|
||||
"# response = requests.get(url, stream=True)\n",
|
||||
"\n",
|
||||
"# # open the file in binary mode and write the contents of the response to it in chunks\n",
|
||||
"# # This is a large file, so be prepared to wait.\n",
|
||||
"# with open(local_path, 'wb') as f:\n",
|
||||
"# for chunk in tqdm(response.iter_content(chunk_size=8192)):\n",
|
||||
"# if chunk:\n",
|
||||
"# f.write(chunk)"
|
||||
")"
|
||||
]
|
||||
},
|
||||
{
|
||||
@@ -147,6 +141,14 @@
|
||||
"\n",
|
||||
"llm_chain.run(question)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"attachments": {},
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Justin Bieber was born on March 1, 1994. In 1994, The Cowboys won Super Bowl XXVIII."
|
||||
]
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
|
||||
@@ -32,7 +32,7 @@
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 1,
|
||||
"execution_count": null,
|
||||
"id": "d772b637-de00-4663-bd77-9bc96d798db2",
|
||||
"metadata": {
|
||||
"tags": []
|
||||
@@ -135,7 +135,7 @@
|
||||
"id": "4c16fded-70d1-42af-8bfa-6ddda9f0bc63",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Flan, by Google"
|
||||
"### `Flan`, by `Google`"
|
||||
]
|
||||
},
|
||||
{
|
||||
@@ -178,7 +178,7 @@
|
||||
"id": "1a5c97af-89bc-4e59-95c1-223742a9160b",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Dolly, by Databricks\n",
|
||||
"### `Dolly`, by `Databricks`\n",
|
||||
"\n",
|
||||
"See [Databricks](https://huggingface.co/databricks) organization page for a list of available models."
|
||||
]
|
||||
@@ -225,14 +225,14 @@
|
||||
"id": "03f6ae52-b5f9-4de6-832c-551cb3fa11ae",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Camel, by Writer\n",
|
||||
"### `Camel`, by `Writer`\n",
|
||||
"\n",
|
||||
"See [Writer's](https://huggingface.co/Writer) organization page for a list of available models."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 13,
|
||||
"execution_count": 11,
|
||||
"id": "257a091d-750b-4910-ac08-fe1c7b3fd98b",
|
||||
"metadata": {
|
||||
"tags": []
|
||||
@@ -261,7 +261,7 @@
|
||||
"id": "2bf838eb-1083-402f-b099-b07c452418c8",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### XGen, by Salesforce\n",
|
||||
"### `XGen`, by `Salesforce`\n",
|
||||
"\n",
|
||||
"See [more information](https://github.com/salesforce/xgen)."
|
||||
]
|
||||
@@ -295,7 +295,7 @@
|
||||
"id": "0aca9f9e-f333-449c-97b2-10d1dbf17e75",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Falcon, by Technology Innovation Institute (TII)\n",
|
||||
"### `Falcon`, by `Technology Innovation Institute (TII)`\n",
|
||||
"\n",
|
||||
"See [more information](https://huggingface.co/tiiuae/falcon-40b)."
|
||||
]
|
||||
@@ -323,6 +323,86 @@
|
||||
"llm_chain = LLMChain(prompt=prompt, llm=llm)\n",
|
||||
"print(llm_chain.run(question))"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "7e15849b-5561-4bb9-86ec-6412ca10196a",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### `InternLM-Chat`, by `Shanghai AI Laboratory`\n",
|
||||
"\n",
|
||||
"See [more information](https://huggingface.co/internlm/internlm-7b)."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 19,
|
||||
"id": "3b533461-59f8-406e-907b-000841fa60a7",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"repo_id = \"internlm/internlm-chat-7b\""
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "c71210b9-5895-41a2-889a-f430d22fa1aa",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"llm = HuggingFaceHub(\n",
|
||||
" repo_id=repo_id, model_kwargs={\"max_length\": 128, \"temperature\": 0.8}\n",
|
||||
")\n",
|
||||
"llm_chain = LLMChain(prompt=prompt, llm=llm)\n",
|
||||
"print(llm_chain.run(question))"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "4f2e5132-1713-42d7-919a-8c313744ce95",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### `Qwen`, by `Alibaba Cloud`\n",
|
||||
"\n",
|
||||
">`Tongyi Qianwen-7B` (`Qwen-7B`) is a model with a scale of 7 billion parameters in the `Tongyi Qianwen` large model series developed by `Alibaba Cloud`. `Qwen-7B` is a large language model based on Transformer, which is trained on ultra-large-scale pre-training data.\n",
|
||||
"\n",
|
||||
"See [more information on HuggingFace](https://huggingface.co/Qwen/Qwen-7B) of on [GitHub](https://github.com/QwenLM/Qwen-7B).\n",
|
||||
"\n",
|
||||
"See here a [big example for LangChain integration and Qwen](https://github.com/QwenLM/Qwen-7B/blob/main/examples/langchain_tooluse.ipynb)."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 6,
|
||||
"id": "f598b1ca-77c7-40f1-a83f-c21ea9910c88",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"repo_id = \"Qwen/Qwen-7B\""
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "2c97f4e2-d401-44fb-9da7-b60b2e2cc663",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"llm = HuggingFaceHub(\n",
|
||||
" repo_id=repo_id, model_kwargs={\"max_length\": 128, \"temperature\": 0.5}\n",
|
||||
")\n",
|
||||
"llm_chain = LLMChain(prompt=prompt, llm=llm)\n",
|
||||
"print(llm_chain.run(question))"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "1dd67c1e-1efc-4def-bde4-2e5265725303",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": []
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
@@ -341,7 +421,7 @@
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.10.6"
|
||||
"version": "3.10.12"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
|
||||
@@ -21,19 +21,19 @@
|
||||
"tags": []
|
||||
},
|
||||
"source": [
|
||||
"To use, you should have the ``transformers`` python [package installed](https://pypi.org/project/transformers/)."
|
||||
"To use, you should have the ``transformers`` python [package installed](https://pypi.org/project/transformers/), as well as [pytorch](https://pytorch.org/get-started/locally/). You can also install `xformer` for a more memory-efficient attention implementation."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 1,
|
||||
"execution_count": null,
|
||||
"id": "d772b637-de00-4663-bd77-9bc96d798db2",
|
||||
"metadata": {
|
||||
"tags": []
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"!pip install transformers > /dev/null"
|
||||
"%pip install transformers --quiet"
|
||||
]
|
||||
},
|
||||
{
|
||||
@@ -46,22 +46,14 @@
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 2,
|
||||
"execution_count": 6,
|
||||
"id": "165ae236-962a-4763-8052-c4836d78a5d2",
|
||||
"metadata": {
|
||||
"tags": []
|
||||
},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stderr",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"WARNING:root:Failed to default session, using empty session: HTTPConnectionPool(host='localhost', port=8000): Max retries exceeded with url: /sessions (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x1117f9790>: Failed to establish a new connection: [Errno 61] Connection refused'))\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langchain import HuggingFacePipeline\n",
|
||||
"from langchain.llms import HuggingFacePipeline\n",
|
||||
"\n",
|
||||
"llm = HuggingFacePipeline.from_model_id(\n",
|
||||
" model_id=\"bigscience/bloom-1b7\",\n",
|
||||
@@ -75,24 +67,18 @@
|
||||
"id": "00104b27-0c15-4a97-b198-4512337ee211",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Integrate the model in an LLMChain"
|
||||
"### Create Chain\n",
|
||||
"\n",
|
||||
"With the model loaded into memory, you can compose it with a prompt to\n",
|
||||
"form a chain."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 3,
|
||||
"execution_count": 7,
|
||||
"id": "3acf0069",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stderr",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"/Users/wfh/code/lc/lckg/.venv/lib/python3.11/site-packages/transformers/generation/utils.py:1288: UserWarning: Using `max_length`'s default (64) to control the generation length. This behaviour is deprecated and will be removed from the config in v5 of Transformers -- we recommend using `max_new_tokens` to control the maximum length of the generation.\n",
|
||||
" warnings.warn(\n",
|
||||
"WARNING:root:Failed to persist run: HTTPConnectionPool(host='localhost', port=8000): Max retries exceeded with url: /chain-runs (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x144d06910>: Failed to establish a new connection: [Errno 61] Connection refused'))\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
@@ -102,27 +88,19 @@
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"from langchain import PromptTemplate, LLMChain\n",
|
||||
"from langchain.prompts import PromptTemplate\n",
|
||||
"\n",
|
||||
"template = \"\"\"Question: {question}\n",
|
||||
"\n",
|
||||
"Answer: Let's think step by step.\"\"\"\n",
|
||||
"prompt = PromptTemplate(template=template, input_variables=[\"question\"])\n",
|
||||
"prompt = PromptTemplate.from_template(template)\n",
|
||||
"\n",
|
||||
"llm_chain = LLMChain(prompt=prompt, llm=llm)\n",
|
||||
"chain = prompt | llm\n",
|
||||
"\n",
|
||||
"question = \"What is electroencephalography?\"\n",
|
||||
"\n",
|
||||
"print(llm_chain.run(question))"
|
||||
"print(chain.invoke({\"question\": question}))"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "843a3837",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": []
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
|
||||
@@ -74,7 +74,7 @@
|
||||
" typical_p=0.95,\n",
|
||||
" temperature=0.01,\n",
|
||||
" repetition_penalty=1.03,\n",
|
||||
" stream=True\n",
|
||||
" streaming=True\n",
|
||||
")\n",
|
||||
"llm(\"What did foo say about bar?\", callbacks=[StreamingStdOutCallbackHandler()])"
|
||||
]
|
||||
|
||||
@@ -241,7 +241,9 @@
|
||||
"# Make sure the model path is correct for your system!\n",
|
||||
"llm = LlamaCpp(\n",
|
||||
" model_path=\"/Users/rlm/Desktop/Code/llama/llama-2-7b-ggml/llama-2-7b-chat.ggmlv3.q4_0.bin\",\n",
|
||||
" input={\"temperature\": 0.75, \"max_length\": 2000, \"top_p\": 1},\n",
|
||||
" temperature=0.75,\n",
|
||||
" max_tokens=2000,\n",
|
||||
" top_p=1,\n",
|
||||
" callback_manager=callback_manager,\n",
|
||||
" verbose=True,\n",
|
||||
")"
|
||||
|
||||
@@ -63,7 +63,7 @@
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"llm = MosaicML(inject_instruction_format=True, model_kwargs={\"do_sample\": False})"
|
||||
"llm = MosaicML(inject_instruction_format=True, model_kwargs={\"max_new_tokens\": 128})"
|
||||
]
|
||||
},
|
||||
{
|
||||
|
||||
@@ -19,8 +19,29 @@
|
||||
"First, follow [these instructions](https://github.com/jmorganca/ollama) to set up and run a local Ollama instance:\n",
|
||||
"\n",
|
||||
"* [Download](https://ollama.ai/download)\n",
|
||||
"* Fetch a model, e.g., `Llama-7b`: `ollama pull llama2`\n",
|
||||
"* Run `ollama run llama2`\n",
|
||||
"* Fetch a model via `ollama pull <model family>`\n",
|
||||
"* e.g., for `Llama-7b`: `ollama pull llama2` (see full list [here](https://github.com/jmorganca/ollama))\n",
|
||||
"* This will download the most basic version of the model typically (e.g., smallest # parameters and `q4_0`)\n",
|
||||
"* On Mac, it will download to \n",
|
||||
"\n",
|
||||
"`~/.ollama/models/manifests/registry.ollama.ai/library/<model family>/latest`\n",
|
||||
"\n",
|
||||
"* And we specify a particular version, e.g., for `ollama pull vicuna:13b-v1.5-16k-q4_0`\n",
|
||||
"* The file is here with the model version in place of `latest`\n",
|
||||
"\n",
|
||||
"`~/.ollama/models/manifests/registry.ollama.ai/library/vicuna/13b-v1.5-16k-q4_0`\n",
|
||||
"\n",
|
||||
"You can easily access models in a few ways:\n",
|
||||
"\n",
|
||||
"1/ if the app is running:\n",
|
||||
"* All of your local models are automatically served on `localhost:11434`\n",
|
||||
"* Select your model when setting `llm = Ollama(..., model=\"<model family>:<version>\")`\n",
|
||||
"* If you set `llm = Ollama(..., model=\"<model family\")` withoout a version it will simply look for `latest`\n",
|
||||
"\n",
|
||||
"2/ if building from source or just running the binary: \n",
|
||||
"* Then you must run `ollama serve`\n",
|
||||
"* All of your local models are automatically served on `localhost:11434`\n",
|
||||
"* Then, select as shown above\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"## Usage\n",
|
||||
|
||||
214
docs/extras/integrations/llms/promptguard.ipynb
Normal file
@@ -0,0 +1,214 @@
|
||||
{
|
||||
"cells": [
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# PromptGuard\n",
|
||||
"\n",
|
||||
"[PromptGuard](https://promptguard.readthedocs.io/en/latest/) is a service that enables applications to leverage the power of language models without compromising user privacy. Designed for composability and ease of integration into existing applications and services, PromptGuard is consumable via a simple Python library as well as through LangChain. Perhaps more importantly, PromptGuard leverages the power of [confidential computing](https://en.wikipedia.org/wiki/Confidential_computing) to ensure that even the PromptGuard service itself cannot access the data it is protecting.\n",
|
||||
" \n",
|
||||
"\n",
|
||||
"This notebook goes over how to use LangChain to interact with `PromptGuard`."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# install the promptguard and langchain packages\n",
|
||||
"! pip install promptguard langchain"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Accessing the PromptGuard API requires an API key, which you can get by creating an account on [the PromptGuard website](https://promptguard.opaque.co/). Once you have an account, you can find your API key on [the API Keys page](https://promptguard.opaque.co/api-keys)."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"import os\n",
|
||||
"\n",
|
||||
"# Set API keys\n",
|
||||
"\n",
|
||||
"os.environ['PROMPTGUARD_API_KEY'] = \"<PROMPTGUARD_API_KEY>\"\n",
|
||||
"os.environ['OPENAI_API_KEY'] = \"<OPENAI_API_KEY>\""
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# Use PromptGuard LLM Wrapper\n",
|
||||
"\n",
|
||||
"Applying promptguard to your application could be as simple as wrapping your LLM using the PromptGuard class by replace `llm=OpenAI()` with `llm=PromptGuard(base_llm=OpenAI())`."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"import langchain\n",
|
||||
"from langchain import LLMChain, PromptTemplate\n",
|
||||
"from langchain.callbacks.stdout import StdOutCallbackHandler\n",
|
||||
"from langchain.llms import OpenAI\n",
|
||||
"from langchain.memory import ConversationBufferWindowMemory\n",
|
||||
"\n",
|
||||
"from langchain.llms import PromptGuard\n",
|
||||
"\n",
|
||||
"langchain.verbose = True\n",
|
||||
"langchain.debug = True\n",
|
||||
"\n",
|
||||
"prompt_template = \"\"\"\n",
|
||||
"As an AI assistant, you will answer questions according to given context.\n",
|
||||
"\n",
|
||||
"Sensitive personal information in the question is masked for privacy.\n",
|
||||
"For instance, if the original text says \"Giana is good,\" it will be changed\n",
|
||||
"to \"PERSON_998 is good.\" \n",
|
||||
"\n",
|
||||
"Here's how to handle these changes:\n",
|
||||
"* Consider these masked phrases just as placeholders, but still refer to\n",
|
||||
"them in a relevant way when answering.\n",
|
||||
"* It's possible that different masked terms might mean the same thing.\n",
|
||||
"Stick with the given term and don't modify it.\n",
|
||||
"* All masked terms follow the \"TYPE_ID\" pattern.\n",
|
||||
"* Please don't invent new masked terms. For instance, if you see \"PERSON_998,\"\n",
|
||||
"don't come up with \"PERSON_997\" or \"PERSON_999\" unless they're already in the question.\n",
|
||||
"\n",
|
||||
"Conversation History: ```{history}```\n",
|
||||
"Context : ```During our recent meeting on February 23, 2023, at 10:30 AM,\n",
|
||||
"John Doe provided me with his personal details. His email is johndoe@example.com\n",
|
||||
"and his contact number is 650-456-7890. He lives in New York City, USA, and\n",
|
||||
"belongs to the American nationality with Christian beliefs and a leaning towards\n",
|
||||
"the Democratic party. He mentioned that he recently made a transaction using his\n",
|
||||
"credit card 4111 1111 1111 1111 and transferred bitcoins to the wallet address\n",
|
||||
"1A1zP1eP5QGefi2DMPTfTL5SLmv7DivfNa. While discussing his European travels, he noted\n",
|
||||
"down his IBAN as GB29 NWBK 6016 1331 9268 19. Additionally, he provided his website\n",
|
||||
"as https://johndoeportfolio.com. John also discussed some of his US-specific details.\n",
|
||||
"He said his bank account number is 1234567890123456 and his drivers license is Y12345678.\n",
|
||||
"His ITIN is 987-65-4321, and he recently renewed his passport, the number for which is\n",
|
||||
"123456789. He emphasized not to share his SSN, which is 123-45-6789. Furthermore, he\n",
|
||||
"mentioned that he accesses his work files remotely through the IP 192.168.1.1 and has\n",
|
||||
"a medical license number MED-123456. ```\n",
|
||||
"Question: ```{question}```\n",
|
||||
"\n",
|
||||
"\"\"\"\n",
|
||||
"\n",
|
||||
"chain = LLMChain(\n",
|
||||
" prompt=PromptTemplate.from_template(prompt_template),\n",
|
||||
" llm=PromptGuard(base_llm=OpenAI()),\n",
|
||||
" memory=ConversationBufferWindowMemory(k=2),\n",
|
||||
" verbose=True,\n",
|
||||
")\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"print(\n",
|
||||
" chain.run(\n",
|
||||
" {\"question\": \"\"\"Write a message to remind John to do password reset for his website to stay secure.\"\"\"},\n",
|
||||
" callbacks=[StdOutCallbackHandler()],\n",
|
||||
" )\n",
|
||||
")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"From the output, you can see the following context from user input has sensitive data.\n",
|
||||
"\n",
|
||||
"``` \n",
|
||||
"# Context from user input\n",
|
||||
"\n",
|
||||
"During our recent meeting on February 23, 2023, at 10:30 AM, John Doe provided me with his personal details. His email is johndoe@example.com and his contact number is 650-456-7890. He lives in New York City, USA, and belongs to the American nationality with Christian beliefs and a leaning towards the Democratic party. He mentioned that he recently made a transaction using his credit card 4111 1111 1111 1111 and transferred bitcoins to the wallet address 1A1zP1eP5QGefi2DMPTfTL5SLmv7DivfNa. While discussing his European travels, he noted down his IBAN as GB29 NWBK 6016 1331 9268 19. Additionally, he provided his website as https://johndoeportfolio.com. John also discussed some of his US-specific details. He said his bank account number is 1234567890123456 and his drivers license is Y12345678. His ITIN is 987-65-4321, and he recently renewed his passport, the number for which is 123456789. He emphasized not to share his SSN, which is 669-45-6789. Furthermore, he mentioned that he accesses his work files remotely through the IP 192.168.1.1 and has a medical license number MED-123456.\n",
|
||||
"```\n",
|
||||
"\n",
|
||||
"PromptGuard will automatically detect the sensitive data and replace it with a placeholder. \n",
|
||||
"\n",
|
||||
"```\n",
|
||||
"# Context after PromptGuard\n",
|
||||
"\n",
|
||||
"During our recent meeting on DATE_TIME_3, at DATE_TIME_2, PERSON_3 provided me with his personal details. His email is EMAIL_ADDRESS_1 and his contact number is PHONE_NUMBER_1. He lives in LOCATION_3, LOCATION_2, and belongs to the NRP_3 nationality with NRP_2 beliefs and a leaning towards the Democratic party. He mentioned that he recently made a transaction using his credit card CREDIT_CARD_1 and transferred bitcoins to the wallet address CRYPTO_1. While discussing his NRP_1 travels, he noted down his IBAN as IBAN_CODE_1. Additionally, he provided his website as URL_1. PERSON_2 also discussed some of his LOCATION_1-specific details. He said his bank account number is US_BANK_NUMBER_1 and his drivers license is US_DRIVER_LICENSE_2. His ITIN is US_ITIN_1, and he recently renewed his passport, the number for which is DATE_TIME_1. He emphasized not to share his SSN, which is US_SSN_1. Furthermore, he mentioned that he accesses his work files remotely through the IP IP_ADDRESS_1 and has a medical license number MED-US_DRIVER_LICENSE_1.\n",
|
||||
"```\n",
|
||||
"\n",
|
||||
"Placeholder is used in the LLM response.\n",
|
||||
"\n",
|
||||
"```\n",
|
||||
"# response returned by LLM\n",
|
||||
"\n",
|
||||
"Hey PERSON_1, just wanted to remind you to do a password reset for your website URL_1 through your email EMAIL_ADDRESS_1. It's important to stay secure online, so don't forget to do it!\n",
|
||||
"```\n",
|
||||
"\n",
|
||||
"Response is desanitized by replacing the placeholder with the original sensitive data.\n",
|
||||
"\n",
|
||||
"```\n",
|
||||
"# desanitized LLM response from PromptGuard\n",
|
||||
"\n",
|
||||
"Hey John, just wanted to remind you to do a password reset for your website https://johndoeportfolio.com through your email johndoe@example.com. It's important to stay secure online, so don't forget to do it!\n",
|
||||
"```"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# Use PromptGuard in LangChain expression\n",
|
||||
"\n",
|
||||
"There are functions that can be used with LangChain expression as well if a drop-in replacement doesn't offer the flexibility you need. "
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"import langchain.utilities.promptguard as pgf\n",
|
||||
"from langchain.schema.runnable import RunnableMap\n",
|
||||
"from langchain.schema.output_parser import StrOutputParser\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"prompt=PromptTemplate.from_template(prompt_template), \n",
|
||||
"llm = OpenAI()\n",
|
||||
"pg_chain = (\n",
|
||||
" pgf.sanitize\n",
|
||||
" | RunnableMap(\n",
|
||||
" {\n",
|
||||
" \"response\": (lambda x: x[\"sanitized_input\"])\n",
|
||||
" | prompt\n",
|
||||
" | llm\n",
|
||||
" | StrOutputParser(),\n",
|
||||
" \"secure_context\": lambda x: x[\"secure_context\"],\n",
|
||||
" }\n",
|
||||
" )\n",
|
||||
" | (lambda x: pgf.desanitize(x[\"response\"], x[\"secure_context\"]))\n",
|
||||
")\n",
|
||||
"\n",
|
||||
"pg_chain.invoke({\"question\": \"Write a text message to remind John to do password reset for his website through his email to stay secure.\", \"history\": \"\"})"
|
||||
]
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
"kernelspec": {
|
||||
"display_name": "langchain",
|
||||
"language": "python",
|
||||
"name": "python3"
|
||||
},
|
||||
"language_info": {
|
||||
"name": "python",
|
||||
"version": "3.10.10"
|
||||
},
|
||||
"orig_nbformat": 4
|
||||
},
|
||||
"nbformat": 4,
|
||||
"nbformat_minor": 2
|
||||
}
|
||||
@@ -1,80 +1,83 @@
|
||||
{
|
||||
"cells": [
|
||||
{
|
||||
"attachments": {},
|
||||
"cell_type": "markdown",
|
||||
"id": "9597802c",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# Nebula\n",
|
||||
"# Nebula (Symbl.ai)\n",
|
||||
"[Nebula](https://symbl.ai/nebula/) is a large language model (LLM) built by [Symbl.ai](https://symbl.ai). It is trained to perform generative tasks on human conversations. Nebula excels at modeling the nuanced details of a conversation and performing tasks on the conversation.\n",
|
||||
"\n",
|
||||
"[Nebula](https://symbl.ai/nebula/) is a fully-managed Conversation platform, on which you can build, deploy, and manage scalable AI applications.\n",
|
||||
"Nebula documentation: https://docs.symbl.ai/docs/nebula-llm\n",
|
||||
"\n",
|
||||
"This example goes over how to use LangChain to interact with the [Nebula platform](https://docs.symbl.ai/docs/nebula-llm-overview). \n",
|
||||
"\n",
|
||||
"It will send the requests to Nebula Service endpoint, which concatenates `SYMBLAI_NEBULA_SERVICE_URL` and `SYMBLAI_NEBULA_SERVICE_PATH`, with a token defined in `SYMBLAI_NEBULA_SERVICE_TOKEN`"
|
||||
]
|
||||
"This example goes over how to use LangChain to interact with the [Nebula platform](https://docs.symbl.ai/docs/nebula-llm)."
|
||||
],
|
||||
"metadata": {
|
||||
"collapsed": false
|
||||
},
|
||||
"id": "bb8cd830db4a004e"
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "f15ebe0d",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Integrate with a LLMChain"
|
||||
]
|
||||
"Make sure you have API Key with you. If you don't have one please [request one](https://info.symbl.ai/Nebula_Private_Beta.html)."
|
||||
],
|
||||
"metadata": {
|
||||
"collapsed": false
|
||||
},
|
||||
"id": "519570b6539aa18c"
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "5472a7cd-af26-48ca-ae9b-5f6ae73c74d2",
|
||||
"metadata": {
|
||||
"tags": []
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"import os\n",
|
||||
"from langchain.llms.symblai_nebula import Nebula\n",
|
||||
"\n",
|
||||
"os.environ[\"NEBULA_SERVICE_URL\"] = NEBULA_SERVICE_URL\n",
|
||||
"os.environ[\"NEBULA_SERVICE_PATH\"] = NEBULA_SERVICE_PATH\n",
|
||||
"os.environ[\"NEBULA_SERVICE_API_KEY\"] = NEBULA_SERVICE_API_KEY"
|
||||
]
|
||||
"llm = Nebula(nebula_api_key='<your_api_key>')"
|
||||
],
|
||||
"metadata": {
|
||||
"collapsed": false
|
||||
},
|
||||
"id": "9f47bef45880aece"
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "6fb585dd",
|
||||
"metadata": {
|
||||
"tags": []
|
||||
},
|
||||
"outputs": [],
|
||||
"cell_type": "markdown",
|
||||
"source": [
|
||||
"from langchain.llms import OpenLLM\n",
|
||||
"\n",
|
||||
"llm = OpenLLM(\n",
|
||||
" conversation=\"<Drop your text conversation that you want to ask Nebula to analyze here>\",\n",
|
||||
")"
|
||||
]
|
||||
"Use a conversation transcript and instruction to construct a prompt."
|
||||
],
|
||||
"metadata": {
|
||||
"collapsed": false
|
||||
},
|
||||
"id": "88c6a516ef51c74b"
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "035dea0f",
|
||||
"metadata": {
|
||||
"tags": []
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langchain import PromptTemplate, LLMChain\n",
|
||||
"\n",
|
||||
"template = \"Identify the {count} main objectives or goals mentioned in this context concisely in less points. Emphasize on key intents.\"\n",
|
||||
"conversation = \"\"\"Sam: Good morning, team! Let's keep this standup concise. We'll go in the usual order: what you did yesterday, what you plan to do today, and any blockers. Alex, kick us off.\n",
|
||||
"Alex: Morning! Yesterday, I wrapped up the UI for the user dashboard. The new charts and widgets are now responsive. I also had a sync with the design team to ensure the final touchups are in line with the brand guidelines. Today, I'll start integrating the frontend with the new API endpoints Rhea was working on. The only blocker is waiting for some final API documentation, but I guess Rhea can update on that.\n",
|
||||
"Rhea: Hey, all! Yep, about the API documentation - I completed the majority of the backend work for user data retrieval yesterday. The endpoints are mostly set up, but I need to do a bit more testing today. I'll finalize the API documentation by noon, so that should unblock Alex. After that, I’ll be working on optimizing the database queries for faster data fetching. No other blockers on my end.\n",
|
||||
"Sam: Great, thanks Rhea. Do reach out if you need any testing assistance or if there are any hitches with the database. Now, my update: Yesterday, I coordinated with the client to get clarity on some feature requirements. Today, I'll be updating our project roadmap and timelines based on their feedback. Additionally, I'll be sitting with the QA team in the afternoon for preliminary testing. Blocker: I might need both of you to be available for a quick call in case the client wants to discuss the changes live.\n",
|
||||
"Alex: Sounds good, Sam. Just let us know a little in advance for the call.\n",
|
||||
"Rhea: Agreed. We can make time for that.\n",
|
||||
"Sam: Perfect! Let's keep the momentum going. Reach out if there are any sudden issues or support needed. Have a productive day!\n",
|
||||
"Alex: You too.\n",
|
||||
"Rhea: Thanks, bye!\"\"\"\n",
|
||||
"\n",
|
||||
"prompt = PromptTemplate(template=template, input_variables=[\"count\"])\n",
|
||||
"instruction = \"Identify the main objectives mentioned in this conversation.\"\n",
|
||||
"\n",
|
||||
"prompt = PromptTemplate.from_template(\"{instruction}\\n{conversation}\")\n",
|
||||
"\n",
|
||||
"llm_chain = LLMChain(prompt=prompt, llm=llm)\n",
|
||||
"\n",
|
||||
"generated = llm_chain.run(count=\"five\")\n",
|
||||
"print(generated)"
|
||||
]
|
||||
"llm_chain.run(instruction=instruction, conversation=conversation)"
|
||||
],
|
||||
"metadata": {
|
||||
"collapsed": false
|
||||
},
|
||||
"id": "5977ccc2d4432624"
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
|
||||
@@ -26,7 +26,7 @@
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 1,
|
||||
"execution_count": null,
|
||||
"metadata": {
|
||||
"tags": []
|
||||
},
|
||||
@@ -61,6 +61,71 @@
|
||||
"\n",
|
||||
"llm_chain.run(question)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Streaming Version"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"You should install websocket-client to use this feature.\n",
|
||||
"`pip install websocket-client`"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"model_url = \"ws://localhost:5005\""
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"import langchain\n",
|
||||
"from langchain import PromptTemplate, LLMChain\n",
|
||||
"from langchain.llms import TextGen\n",
|
||||
"from langchain.callbacks.streaming_stdout import StreamingStdOutCallbackHandler\n",
|
||||
"\n",
|
||||
"langchain.debug = True\n",
|
||||
"\n",
|
||||
"template = \"\"\"Question: {question}\n",
|
||||
"\n",
|
||||
"Answer: Let's think step by step.\"\"\"\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"prompt = PromptTemplate(template=template, input_variables=[\"question\"])\n",
|
||||
"llm = TextGen(model_url=model_url, streaming=True, callbacks=[StreamingStdOutCallbackHandler()])\n",
|
||||
"llm_chain = LLMChain(prompt=prompt, llm=llm)\n",
|
||||
"question = \"What NFL team won the Super Bowl in the year Justin Bieber was born?\"\n",
|
||||
"\n",
|
||||
"llm_chain.run(question)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"llm = TextGen(\n",
|
||||
" model_url = model_url,\n",
|
||||
" streaming=True\n",
|
||||
")\n",
|
||||
"for chunk in llm.stream(\"Ask 'Hi, how are you?' like a pirate:'\",\n",
|
||||
" stop=[\"'\",\"\\n\"]):\n",
|
||||
" print(chunk, end='', flush=True)"
|
||||
]
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
@@ -79,7 +144,7 @@
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.10.7"
|
||||
"version": "3.10.4"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
|
||||
@@ -170,6 +170,51 @@
|
||||
"\n",
|
||||
"llm(\"What is the future of AI?\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "64e89be0-6ad7-43a8-9dac-1324dcd4e851",
|
||||
"metadata": {
|
||||
"tags": []
|
||||
},
|
||||
"source": [
|
||||
"## OpenAI-Compatible Server\n",
|
||||
"\n",
|
||||
"vLLM can be deployed as a server that mimics the OpenAI API protocol. This allows vLLM to be used as a drop-in replacement for applications using OpenAI API.\n",
|
||||
"\n",
|
||||
"This server can be queried in the same format as OpenAI API.\n",
|
||||
"\n",
|
||||
"### OpenAI-Compatible Completion"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 3,
|
||||
"id": "c3cbc428-0bb8-422a-913e-1c6fef8b89d4",
|
||||
"metadata": {
|
||||
"tags": []
|
||||
},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
" a city that is filled with history, ancient buildings, and art around every corner\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"from langchain.llms import VLLMOpenAI\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"llm = VLLMOpenAI(\n",
|
||||
" openai_api_key=\"EMPTY\",\n",
|
||||
" openai_api_base=\"http://localhost:8000/v1\",\n",
|
||||
" model_name=\"tiiuae/falcon-7b\",\n",
|
||||
" model_kwargs={\"stop\": [\".\"]}\n",
|
||||
")\n",
|
||||
"print(llm(\"Rome is\"))"
|
||||
]
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
|
||||
326
docs/extras/integrations/memory/xata_chat_message_history.ipynb
Normal file
@@ -0,0 +1,326 @@
|
||||
{
|
||||
"cells": [
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# Xata chat memory\n",
|
||||
"\n",
|
||||
"[Xata](https://xata.io) is a serverless data platform, based on PostgreSQL and Elasticsearch. It provides a Python SDK for interacting with your database, and a UI for managing your data. With the `XataChatMessageHistory` class, you can use Xata databases for longer-term persistence of chat sessions.\n",
|
||||
"\n",
|
||||
"This notebook covers:\n",
|
||||
"\n",
|
||||
"* A simple example showing what `XataChatMessageHistory` does.\n",
|
||||
"* A more complex example using a REACT agent that answer questions based on a knowledge based or documentation (stored in Xata as a vector store) and also having a long-term searchable history of its past messages (stored in Xata as a memory store)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Setup\n",
|
||||
"\n",
|
||||
"### Create a database\n",
|
||||
"\n",
|
||||
"In the [Xata UI](https://app.xata.io) create a new database. You can name it whatever you want, in this notepad we'll use `langchain`. The Langchain integration can auto-create the table used for storying the memory, and this is what we'll use in this example. If you want to pre-create the table, ensure it has the right schema and set `create_table` to `False` when creating the class. Pre-creating the table saves one round-trip to the database during each session initialization."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Let's first install our dependencies:"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"!pip install xata==1.0.0rc0 openai langchain"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Next, we need to get the environment variables for Xata. You can create a new API key by visiting your [account settings](https://app.xata.io/settings). To find the database URL, go to the Settings page of the database that you have created. The database URL should look something like this: `https://demo-uni3q8.eu-west-1.xata.sh/db/langchain`."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"import getpass\n",
|
||||
"\n",
|
||||
"api_key = getpass.getpass(\"Xata API key: \")\n",
|
||||
"db_url = input(\"Xata database URL (copy it from your DB settings):\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Create a simple memory store\n",
|
||||
"\n",
|
||||
"To test the memory store functionality in isolation, let's use the following code snippet:"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langchain.memory import XataChatMessageHistory\n",
|
||||
"\n",
|
||||
"history = XataChatMessageHistory(\n",
|
||||
" session_id=\"session-1\",\n",
|
||||
" api_key=api_key,\n",
|
||||
" db_url=db_url,\n",
|
||||
" table_name=\"memory\"\n",
|
||||
")\n",
|
||||
"\n",
|
||||
"history.add_user_message(\"hi!\")\n",
|
||||
"\n",
|
||||
"history.add_ai_message(\"whats up?\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"The above code creates a session with the ID `session-1` and stores two messages in it. After running the above, if you visit the Xata UI, you should see a table named `memory` and the two messages added to it.\n",
|
||||
"\n",
|
||||
"You can retrieve the message history for a particular session with the following code:"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"history.messages"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Conversational Q&A chain on your data with memory\n",
|
||||
"\n",
|
||||
"Let's now see a more complex example in which we combine OpenAI, the Xata Vector Store integration, and the Xata memory store integration to create a Q&A chat bot on your data, with follow-up questions and history."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"We're going to need to access the OpenAI API, so let's configure the API key:"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"import os\n",
|
||||
"\n",
|
||||
"os.environ[\"OPENAI_API_KEY\"] = getpass.getpass(\"OpenAI API Key:\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"To store the documents that the chatbot will search for answers, add a table named `docs` to your `langchain` database using the Xata UI, and add the following columns:\n",
|
||||
"\n",
|
||||
"* `content` of type \"Text\". This is used to store the `Document.pageContent` values.\n",
|
||||
"* `embedding` of type \"Vector\". Use the dimension used by the model you plan to use. In this notebook we use OpenAI embeddings, which have 1536 dimensions."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Let's create the vector store and add some sample docs to it:"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langchain.embeddings.openai import OpenAIEmbeddings\n",
|
||||
"from langchain.vectorstores.xata import XataVectorStore\n",
|
||||
"\n",
|
||||
"embeddings = OpenAIEmbeddings()\n",
|
||||
"\n",
|
||||
"texts = [\n",
|
||||
" \"Xata is a Serverless Data platform based on PostgreSQL\",\n",
|
||||
" \"Xata offers a built-in vector type that can be used to store and query vectors\",\n",
|
||||
" \"Xata includes similarity search\"\n",
|
||||
"]\n",
|
||||
"\n",
|
||||
"vector_store = XataVectorStore.from_texts(texts, embeddings, api_key=api_key, db_url=db_url, table_name=\"docs\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"After running the above command, if you go to the Xata UI, you should see the documents loaded together with their embeddings in the `docs` table."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Let's now create a ConversationBufferMemory to store the chat messages from both the user and the AI."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langchain.memory import ConversationBufferMemory\n",
|
||||
"from uuid import uuid4\n",
|
||||
"\n",
|
||||
"chat_memory = XataChatMessageHistory(\n",
|
||||
" session_id=str(uuid4()), # needs to be unique per user session\n",
|
||||
" api_key=api_key,\n",
|
||||
" db_url=db_url,\n",
|
||||
" table_name=\"memory\"\n",
|
||||
")\n",
|
||||
"memory = ConversationBufferMemory(memory_key=\"chat_history\", chat_memory=chat_memory, return_messages=True)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Now it's time to create an Agent to use both the vector store and the chat memory together."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langchain.agents import initialize_agent, AgentType\n",
|
||||
"from langchain.agents.agent_toolkits import create_retriever_tool\n",
|
||||
"from langchain.chat_models import ChatOpenAI\n",
|
||||
"\n",
|
||||
"tool = create_retriever_tool(\n",
|
||||
" vector_store.as_retriever(), \n",
|
||||
" \"search_docs\",\n",
|
||||
" \"Searches and returns documents from the Xata manual. Useful when you need to answer questions about Xata.\"\n",
|
||||
")\n",
|
||||
"tools = [tool]\n",
|
||||
"\n",
|
||||
"llm = ChatOpenAI(temperature=0)\n",
|
||||
"\n",
|
||||
"agent = initialize_agent(\n",
|
||||
" tools,\n",
|
||||
" llm,\n",
|
||||
" agent=AgentType.CHAT_CONVERSATIONAL_REACT_DESCRIPTION,\n",
|
||||
" verbose=True,\n",
|
||||
" memory=memory)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"To test, let's tell the agent our name:"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"agent.run(input=\"My name is bob\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Now, let's now ask the agent some questions about Xata:"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"agent.run(input=\"What is xata?\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Notice that it answers based on the data stored in the document store. And now, let's ask a follow up question:"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"agent.run(input=\"Does it support similarity search?\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"And now let's test its memory:"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"agent.run(input=\"Did I tell you my name? What is it?\")"
|
||||
]
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
"kernelspec": {
|
||||
"display_name": "Python 3 (ipykernel)",
|
||||
"language": "python",
|
||||
"name": "python3"
|
||||
},
|
||||
"language_info": {
|
||||
"codemirror_mode": {
|
||||
"name": "ipython",
|
||||
"version": 3
|
||||
},
|
||||
"file_extension": ".py",
|
||||
"mimetype": "text/x-python",
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.10.9"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
"nbformat_minor": 4
|
||||
}
|
||||
@@ -1,4 +1,4 @@
|
||||
# Deep Lake
|
||||
# Activeloop Deep Lake
|
||||
This page covers how to use the Deep Lake ecosystem within LangChain.
|
||||
|
||||
## Why Deep Lake?
|
||||
@@ -6,9 +6,15 @@ This page covers how to use the Deep Lake ecosystem within LangChain.
|
||||
- Not only stores embeddings, but also the original data with automatic version control.
|
||||
- Truly serverless. Doesn't require another service and can be used with major cloud providers (AWS S3, GCS, etc.)
|
||||
|
||||
|
||||
Activeloop Deep Lake supports SelfQuery Retrieval:
|
||||
[Activeloop Deep Lake Self Query Retrieval](/docs/extras/modules/data_connection/retrievers/self_query/activeloop_deeplake_self_query)
|
||||
|
||||
|
||||
## More Resources
|
||||
1. [Ultimate Guide to LangChain & Deep Lake: Build ChatGPT to Answer Questions on Your Financial Data](https://www.activeloop.ai/resources/ultimate-guide-to-lang-chain-deep-lake-build-chat-gpt-to-answer-questions-on-your-financial-data/)
|
||||
2. [Twitter the-algorithm codebase analysis with Deep Lake](../use_cases/code/twitter-the-algorithm-analysis-deeplake.html)
|
||||
2. [Twitter the-algorithm codebase analysis with Deep Lake](/docs/use_cases/question_answering/how_to/code/twitter-the-algorithm-analysis-deeplake)
|
||||
4. [Code Understanding](/docs/modules/data_connection/retrievers/self_query/activeloop_deeplake_self_query)
|
||||
3. Here is [whitepaper](https://www.deeplake.ai/whitepaper) and [academic paper](https://arxiv.org/pdf/2209.10785.pdf) for Deep Lake
|
||||
4. Here is a set of additional resources available for review: [Deep Lake](https://github.com/activeloopai/deeplake), [Get started](https://docs.activeloop.ai/getting-started) and [Tutorials](https://docs.activeloop.ai/hub-tutorials)
|
||||
|
||||
@@ -27,4 +33,4 @@ from langchain.vectorstores import DeepLake
|
||||
```
|
||||
|
||||
|
||||
For a more detailed walkthrough of the Deep Lake wrapper, see [this notebook](/docs/integrations/vectorstores/deeplake.html)
|
||||
For a more detailed walkthrough of the Deep Lake wrapper, see [this notebook](/docs/integrations/vectorstores/activeloop_deeplake)
|
||||
@@ -13,7 +13,7 @@ pip install python-arango
|
||||
|
||||
Connect your ArangoDB Database with a Chat Model to get insights on your data.
|
||||
|
||||
See the notebook example [here](/docs/use_cases/graph/graph_arangodb_qa.html).
|
||||
See the notebook example [here](/docs/use_cases/more/graph/graph_arangodb_qa.html).
|
||||
|
||||
```python
|
||||
from arango import ArangoClient
|
||||
|
||||
@@ -1,27 +1,19 @@
|
||||
# AtlasDB
|
||||
# Atlas
|
||||
|
||||
>[Nomic Atlas](https://docs.nomic.ai/index.html) is a platform for interacting with both
|
||||
> small and internet scale unstructured datasets.
|
||||
|
||||
This page covers how to use Nomic's Atlas ecosystem within LangChain.
|
||||
It is broken into two parts: installation and setup, and then references to specific Atlas wrappers.
|
||||
|
||||
## Installation and Setup
|
||||
|
||||
- Install the Python package with `pip install nomic`
|
||||
- Nomic is also included in langchains poetry extras `poetry install -E all`
|
||||
|
||||
## Wrappers
|
||||
|
||||
### VectorStore
|
||||
|
||||
There exists a wrapper around the Atlas neural database, allowing you to use it as a vectorstore.
|
||||
This vectorstore also gives you full access to the underlying AtlasProject object, which will allow you to use the full range of Atlas map interactions, such as bulk tagging and automatic topic modeling.
|
||||
Please see [the Atlas docs](https://docs.nomic.ai/atlas_api.html) for more detailed information.
|
||||
- `Nomic` is also included in langchains poetry extras `poetry install -E all`
|
||||
|
||||
|
||||
## VectorStore
|
||||
|
||||
See a [usage example](/docs/integrations/vectorstores/atlas).
|
||||
|
||||
|
||||
To import this vectorstore:
|
||||
```python
|
||||
from langchain.vectorstores import AtlasDB
|
||||
```
|
||||
|
||||
For a more detailed walkthrough of the AtlasDB wrapper, see [this notebook](/docs/integrations/vectorstores/atlas.html)
|
||||
```
|
||||
37
docs/extras/integrations/providers/bittensor.mdx
Normal file
@@ -0,0 +1,37 @@
|
||||
# NIBittensor
|
||||
|
||||
This page covers how to use the BittensorLLM inference runtime within LangChain.
|
||||
It is broken into two parts: installation and setup, and then examples of NIBittensorLLM usage.
|
||||
|
||||
## Installation and Setup
|
||||
|
||||
- Install the Python package with `pip install langchain`
|
||||
|
||||
## Wrappers
|
||||
|
||||
### LLM
|
||||
|
||||
There exists a NIBittensor LLM wrapper, which you can access with:
|
||||
|
||||
```python
|
||||
from langchain.llms import NIBittensorLLM
|
||||
```
|
||||
|
||||
It provides a unified interface for all models:
|
||||
|
||||
```python
|
||||
llm = NIBittensorLLM(system_prompt="Your task is to provide consice and accurate response based on user prompt")
|
||||
|
||||
print(llm('Write a fibonacci function in python with golder ratio'))
|
||||
```
|
||||
|
||||
Multiple responses from top miners can be accessible using the `top_responses` parameter:
|
||||
|
||||
```python
|
||||
multi_response_llm = NIBittensorLLM(top_responses=10)
|
||||
multi_resp = multi_response_llm("What is Neural Network Feeding Mechanism?")
|
||||
json_multi_resp = json.loads(multi_resp)
|
||||
|
||||
print(json_multi_resp)
|
||||
```
|
||||
|
||||
@@ -37,7 +37,7 @@ There is a Clarifai Embedding model in LangChain, which you can access with:
|
||||
from langchain.embeddings import ClarifaiEmbeddings
|
||||
embeddings = ClarifaiEmbeddings(pat=CLARIFAI_PAT, user_id=USER_ID, app_id=APP_ID, model_id=MODEL_ID)
|
||||
```
|
||||
For more details, the docs on the Clarifai Embeddings wrapper provide a [detailed walthrough](/docs/integrations/text_embedding/clarifai.html).
|
||||
For more details, the docs on the Clarifai Embeddings wrapper provide a [detailed walkthrough](/docs/integrations/text_embedding/clarifai.html).
|
||||
|
||||
## Vectorstore
|
||||
|
||||
@@ -49,4 +49,4 @@ You an also add data directly from LangChain as well, and the auto-indexing will
|
||||
from langchain.vectorstores import Clarifai
|
||||
clarifai_vector_db = Clarifai.from_texts(user_id=USER_ID, app_id=APP_ID, texts=texts, pat=CLARIFAI_PAT, number_of_docs=NUMBER_OF_DOCS, metadatas = metadatas)
|
||||
```
|
||||
For more details, the docs on the Clarifai vector store provide a [detailed walthrough](/docs/integrations/text_embedding/clarifai.html).
|
||||
For more details, the docs on the Clarifai vector store provide a [detailed walkthrough](/docs/integrations/text_embedding/clarifai.html).
|
||||
|
||||
25
docs/extras/integrations/providers/clickhouse.mdx
Normal file
@@ -0,0 +1,25 @@
|
||||
# ClickHouse
|
||||
|
||||
> [ClickHouse](https://clickhouse.com/) is the fast and resource efficient open-source database for real-time
|
||||
> apps and analytics with full SQL support and a wide range of functions to assist users in writing analytical queries.
|
||||
> It has data structures and distance search functions (like `L2Distance`) as well as
|
||||
> [approximate nearest neighbor search indexes](https://clickhouse.com/docs/en/engines/table-engines/mergetree-family/annindexes)
|
||||
> That enables ClickHouse to be used as a high performance and scalable vector database to store and search vectors with SQL.
|
||||
|
||||
|
||||
## Installation and Setup
|
||||
|
||||
We need to install `clickhouse-connect` python package.
|
||||
|
||||
```bash
|
||||
pip install clickhouse-connect
|
||||
```
|
||||
|
||||
## Vector Store
|
||||
|
||||
See a [usage example](/docs/integrations/vectorstores/clickhouse).
|
||||
|
||||
```python
|
||||
from langchain.vectorstores import Clickhouse, ClickhouseSettings
|
||||
```
|
||||
|
||||
24
docs/extras/integrations/providers/dashvector.mdx
Normal file
@@ -0,0 +1,24 @@
|
||||
# DashVector
|
||||
|
||||
> [DashVector](https://help.aliyun.com/document_detail/2510225.html) is a fully-managed vectorDB service that supports high-dimension dense and sparse vectors, real-time insertion and filtered search. It is built to scale automatically and can adapt to different application requirements.
|
||||
|
||||
This document demonstrates to leverage DashVector within the LangChain ecosystem. In particular, it shows how to install DashVector, and how to use it as a VectorStore plugin in LangChain.
|
||||
It is broken into two parts: installation and setup, and then references to specific DashVector wrappers.
|
||||
|
||||
## Installation and Setup
|
||||
Install the Python SDK:
|
||||
```bash
|
||||
pip install dashvector
|
||||
```
|
||||
|
||||
## VectorStore
|
||||
|
||||
A DashVector Collection is wrapped as a familiar VectorStore for native usage within LangChain,
|
||||
which allows it to be readily used for various scenarios, such as semantic search or example selection.
|
||||
|
||||
You may import the vectorstore by:
|
||||
```python
|
||||
from langchain.vectorstores import DashVector
|
||||
```
|
||||
|
||||
For a detailed walkthrough of the DashVector wrapper, please refer to [this notebook](/docs/integrations/vectorstores/dashvector.html)
|
||||
@@ -52,7 +52,7 @@ Note that using `ddtrace-run` or `patch_all()` will also enable the `requests` a
|
||||
from ddtrace import config, patch
|
||||
|
||||
# Note: be sure to configure the integration before calling ``patch()``!
|
||||
# eg. config.langchain["logs_enabled"] = True
|
||||
# e.g. config.langchain["logs_enabled"] = True
|
||||
|
||||
patch(langchain=True)
|
||||
|
||||
|
||||
35
docs/extras/integrations/providers/deepsparse.mdx
Normal file
@@ -0,0 +1,35 @@
|
||||
# DeepSparse
|
||||
|
||||
This page covers how to use the [DeepSparse](https://github.com/neuralmagic/deepsparse) inference runtime within LangChain.
|
||||
It is broken into two parts: installation and setup, and then examples of DeepSparse usage.
|
||||
|
||||
## Installation and Setup
|
||||
|
||||
- Install the Python package with `pip install deepsparse`
|
||||
- Choose a [SparseZoo model](https://sparsezoo.neuralmagic.com/?useCase=text_generation) or export a support model to ONNX [using Optimum](https://github.com/neuralmagic/notebooks/blob/main/notebooks/opt-text-generation-deepsparse-quickstart/OPT_Text_Generation_DeepSparse_Quickstart.ipynb)
|
||||
|
||||
## Wrappers
|
||||
|
||||
### LLM
|
||||
|
||||
There exists a DeepSparse LLM wrapper, which you can access with:
|
||||
|
||||
```python
|
||||
from langchain.llms import DeepSparse
|
||||
```
|
||||
|
||||
It provides a unified interface for all models:
|
||||
|
||||
```python
|
||||
llm = DeepSparse(model='zoo:nlg/text_generation/codegen_mono-350m/pytorch/huggingface/bigpython_bigquery_thepile/base-none')
|
||||
|
||||
print(llm('def fib():'))
|
||||
```
|
||||
|
||||
Additional parameters can be passed using the `config` parameter:
|
||||
|
||||
```python
|
||||
config = {'max_generated_tokens': 256}
|
||||
|
||||
llm = DeepSparse(model='zoo:nlg/text_generation/codegen_mono-350m/pytorch/huggingface/bigpython_bigquery_thepile/base-none', config=config)
|
||||
```
|
||||
30
docs/extras/integrations/providers/docarray.mdx
Normal file
@@ -0,0 +1,30 @@
|
||||
# DocArray
|
||||
|
||||
> [DocArray](https://docarray.jina.ai/) is a library for nested, unstructured, multimodal data in transit,
|
||||
> including text, image, audio, video, 3D mesh, etc. It allows deep-learning engineers to efficiently process,
|
||||
> embed, search, recommend, store, and transfer multimodal data with a Pythonic API.
|
||||
|
||||
|
||||
## Installation and Setup
|
||||
|
||||
We need to install `docarray` python package.
|
||||
|
||||
```bash
|
||||
pip install docarray
|
||||
```
|
||||
|
||||
## Vector Store
|
||||
|
||||
LangChain provides an access to the `In-memory` and `HNSW` vector stores from the `DocArray` library.
|
||||
|
||||
See a [usage example](/docs/integrations/vectorstores/docarray_hnsw).
|
||||
|
||||
```python
|
||||
from langchain.vectorstores DocArrayHnswSearch
|
||||
```
|
||||
See a [usage example](/docs/integrations/vectorstores/docarray_in_memory).
|
||||
|
||||
```python
|
||||
from langchain.vectorstores DocArrayInMemorySearch
|
||||
```
|
||||
|
||||