mirror of
https://github.com/hwchase17/langchain.git
synced 2026-02-07 09:40:07 +00:00
Compare commits
5 Commits
harrison/d
...
wfh/tqdm_f
| Author | SHA1 | Date | |
|---|---|---|---|
|
|
85eca4c055 | ||
|
|
b27f9da12b | ||
|
|
4bf249f7a3 | ||
|
|
50413a9648 | ||
|
|
b3b17d76f3 |
29
.github/CONTRIBUTING.md
vendored
29
.github/CONTRIBUTING.md
vendored
@@ -23,7 +23,7 @@ It's essential that we maintain great documentation and testing. If you:
|
||||
- Update any affected example notebooks and documentation. These live in `docs`.
|
||||
- Update unit and integration tests when relevant.
|
||||
- Add a feature
|
||||
- Add a demo notebook in `docs/docs/`.
|
||||
- Add a demo notebook in `docs/modules`.
|
||||
- Add unit and integration tests.
|
||||
|
||||
We are a small, progress-oriented team. If there's something you'd like to add or change, opening a pull request is the
|
||||
@@ -72,10 +72,9 @@ tell Poetry to use the virtualenv python environment (`poetry config virtualenvs
|
||||
|
||||
### Core vs. Experimental
|
||||
|
||||
This repository contains three separate projects:
|
||||
This repository contains two separate projects:
|
||||
- `langchain`: core langchain code, abstractions, and use cases.
|
||||
- `langchain_core`: contain interfaces for key abstractions as well as logic for combining them in chains (LCEL).
|
||||
- `langchain_experimental`: see the [Experimental README](https://github.com/langchain-ai/langchain/tree/master/libs/experimental/README.md) for more information.
|
||||
- `langchain.experimental`: see the [Experimental README](https://github.com/langchain-ai/langchain/tree/master/libs/experimental/README.md) for more information.
|
||||
|
||||
Each of these has its own development environment. Docs are run from the top-level makefile, but development
|
||||
is split across separate test & release flows.
|
||||
@@ -129,24 +128,6 @@ make docker_tests
|
||||
|
||||
There are also [integration tests and code-coverage](https://github.com/langchain-ai/langchain/tree/master/libs/langchain/tests/README.md) available.
|
||||
|
||||
### Only develop langchain_core or langchain_experimental
|
||||
|
||||
If you are only developing `langchain_core` or `langchain_experimental`, you can simply install the dependencies for the respective projects and run tests:
|
||||
|
||||
```bash
|
||||
cd libs/core
|
||||
poetry install --with test
|
||||
make test
|
||||
```
|
||||
|
||||
Or:
|
||||
|
||||
```bash
|
||||
cd libs/experimental
|
||||
poetry install --with test
|
||||
make test
|
||||
```
|
||||
|
||||
### Formatting and Linting
|
||||
|
||||
Run these locally before submitting a PR; the CI system will check also.
|
||||
@@ -233,10 +214,6 @@ ignore-words-list = 'momento,collison,ned,foor,reworkd,parth,whats,aapply,mysogy
|
||||
|
||||
Langchain relies heavily on optional dependencies to keep the Langchain package lightweight.
|
||||
|
||||
You only need to add a new dependency if a **unit test** relies on the package.
|
||||
If your package is only required for **integration tests**, then you can skip these
|
||||
steps and leave all pyproject.toml and poetry.lock files alone.
|
||||
|
||||
If you're adding a new dependency to Langchain, assume that it will be an optional dependency, and
|
||||
that most users won't have it installed.
|
||||
|
||||
|
||||
12
.github/workflows/_compile_integration_test.yml
vendored
12
.github/workflows/_compile_integration_test.yml
vendored
@@ -7,6 +7,10 @@ on:
|
||||
required: true
|
||||
type: string
|
||||
description: "From which folder this pipeline executes"
|
||||
langchain-core-location:
|
||||
required: false
|
||||
type: string
|
||||
description: "Relative path to the langchain core library folder"
|
||||
|
||||
env:
|
||||
POETRY_VERSION: "1.6.1"
|
||||
@@ -40,6 +44,14 @@ jobs:
|
||||
shell: bash
|
||||
run: poetry install --with=test_integration
|
||||
|
||||
- name: Install langchain core editable
|
||||
working-directory: ${{ inputs.working-directory }}
|
||||
if: ${{ inputs.langchain-core-location }}
|
||||
env:
|
||||
LANGCHAIN_CORE_LOCATION: ${{ inputs.langchain-core-location }}
|
||||
run: |
|
||||
poetry run pip install -e "$LANGCHAIN_CORE_LOCATION"
|
||||
|
||||
- name: Check integration tests compile
|
||||
shell: bash
|
||||
run: poetry run pytest -m compile tests/integration_tests
|
||||
|
||||
14
.github/workflows/_lint.yml
vendored
14
.github/workflows/_lint.yml
vendored
@@ -11,6 +11,10 @@ on:
|
||||
required: false
|
||||
type: string
|
||||
description: "Relative path to the langchain library folder"
|
||||
langchain-core-location:
|
||||
required: false
|
||||
type: string
|
||||
description: "Relative path to the langchain core library folder"
|
||||
|
||||
env:
|
||||
POETRY_VERSION: "1.6.1"
|
||||
@@ -68,7 +72,7 @@ jobs:
|
||||
# It doesn't matter how you change it, any change will cause a cache-bust.
|
||||
working-directory: ${{ inputs.working-directory }}
|
||||
run: |
|
||||
poetry install --with lint,typing
|
||||
poetry install --with dev,lint,test,typing
|
||||
|
||||
- name: Install langchain editable
|
||||
working-directory: ${{ inputs.working-directory }}
|
||||
@@ -78,6 +82,14 @@ jobs:
|
||||
run: |
|
||||
poetry run pip install -e "$LANGCHAIN_LOCATION"
|
||||
|
||||
- name: Install langchain core editable
|
||||
working-directory: ${{ inputs.working-directory }}
|
||||
if: ${{ inputs.langchain-core-location }}
|
||||
env:
|
||||
LANGCHAIN_CORE_LOCATION: ${{ inputs.langchain-core-location }}
|
||||
run: |
|
||||
poetry run pip install -e "$LANGCHAIN_CORE_LOCATION"
|
||||
|
||||
- name: Get .mypy_cache to speed up mypy
|
||||
uses: actions/cache@v3
|
||||
env:
|
||||
|
||||
12
.github/workflows/_pydantic_compatibility.yml
vendored
12
.github/workflows/_pydantic_compatibility.yml
vendored
@@ -11,6 +11,10 @@ on:
|
||||
required: false
|
||||
type: string
|
||||
description: "Relative path to the langchain library folder"
|
||||
langchain-core-location:
|
||||
required: false
|
||||
type: string
|
||||
description: "Relative path to the langchain core library folder"
|
||||
|
||||
env:
|
||||
POETRY_VERSION: "1.6.1"
|
||||
@@ -52,6 +56,14 @@ jobs:
|
||||
run: |
|
||||
poetry run pip install -e "$LANGCHAIN_LOCATION"
|
||||
|
||||
- name: Install langchain core editable
|
||||
working-directory: ${{ inputs.working-directory }}
|
||||
if: ${{ inputs.langchain-core-location }}
|
||||
env:
|
||||
LANGCHAIN_CORE_LOCATION: ${{ inputs.langchain-core-location }}
|
||||
run: |
|
||||
poetry run pip install -e "$LANGCHAIN_CORE_LOCATION"
|
||||
|
||||
- name: Install the opposite major version of pydantic
|
||||
# If normal tests use pydantic v1, here we'll use v2, and vice versa.
|
||||
shell: bash
|
||||
|
||||
14
.github/workflows/_test.yml
vendored
14
.github/workflows/_test.yml
vendored
@@ -11,6 +11,10 @@ on:
|
||||
required: false
|
||||
type: string
|
||||
description: "Relative path to the langchain library folder"
|
||||
langchain-core-location:
|
||||
required: false
|
||||
type: string
|
||||
description: "Relative path to the langchain core library folder"
|
||||
|
||||
env:
|
||||
POETRY_VERSION: "1.6.1"
|
||||
@@ -42,7 +46,7 @@ jobs:
|
||||
|
||||
- name: Install dependencies
|
||||
shell: bash
|
||||
run: poetry install --with test
|
||||
run: poetry install
|
||||
|
||||
- name: Install langchain editable
|
||||
working-directory: ${{ inputs.working-directory }}
|
||||
@@ -52,6 +56,14 @@ jobs:
|
||||
run: |
|
||||
poetry run pip install -e "$LANGCHAIN_LOCATION"
|
||||
|
||||
- name: Install langchain core editable
|
||||
working-directory: ${{ inputs.working-directory }}
|
||||
if: ${{ inputs.langchain-core-location }}
|
||||
env:
|
||||
LANGCHAIN_CORE_LOCATION: ${{ inputs.langchain-core-location }}
|
||||
run: |
|
||||
poetry run pip install -e "$LANGCHAIN_CORE_LOCATION"
|
||||
|
||||
- name: Run core tests
|
||||
shell: bash
|
||||
run: |
|
||||
|
||||
77
.github/workflows/langchain_ci.yml
vendored
77
.github/workflows/langchain_ci.yml
vendored
@@ -3,19 +3,18 @@ name: libs/langchain CI
|
||||
|
||||
on:
|
||||
push:
|
||||
branches: [master]
|
||||
branches: [ master ]
|
||||
pull_request:
|
||||
paths:
|
||||
- ".github/actions/poetry_setup/action.yml"
|
||||
- ".github/tools/**"
|
||||
- ".github/workflows/_lint.yml"
|
||||
- ".github/workflows/_test.yml"
|
||||
- ".github/workflows/_pydantic_compatibility.yml"
|
||||
- ".github/workflows/langchain_ci.yml"
|
||||
- "libs/*"
|
||||
- "libs/langchain/**"
|
||||
- "libs/core/**"
|
||||
workflow_dispatch: # Allows to trigger the workflow manually in GitHub UI
|
||||
- '.github/actions/poetry_setup/action.yml'
|
||||
- '.github/tools/**'
|
||||
- '.github/workflows/_lint.yml'
|
||||
- '.github/workflows/_test.yml'
|
||||
- '.github/workflows/_pydantic_compatibility.yml'
|
||||
- '.github/workflows/langchain_ci.yml'
|
||||
- 'libs/*'
|
||||
- 'libs/langchain/**'
|
||||
workflow_dispatch: # Allows to trigger the workflow manually in GitHub UI
|
||||
|
||||
# If another push to the same PR or branch happens while this workflow is still running,
|
||||
# cancel the earlier run in favor of the next run.
|
||||
@@ -33,29 +32,77 @@ env:
|
||||
|
||||
jobs:
|
||||
lint:
|
||||
uses: ./.github/workflows/_lint.yml
|
||||
uses:
|
||||
./.github/workflows/_lint.yml
|
||||
with:
|
||||
working-directory: libs/langchain
|
||||
langchain-core-location: ../core
|
||||
secrets: inherit
|
||||
|
||||
test:
|
||||
uses: ./.github/workflows/_test.yml
|
||||
uses:
|
||||
./.github/workflows/_test.yml
|
||||
with:
|
||||
working-directory: libs/langchain
|
||||
langchain-core-location: ../core
|
||||
secrets: inherit
|
||||
|
||||
compile-integration-tests:
|
||||
uses: ./.github/workflows/_compile_integration_test.yml
|
||||
uses:
|
||||
./.github/workflows/_compile_integration_test.yml
|
||||
with:
|
||||
working-directory: libs/langchain
|
||||
langchain-core-location: ../core
|
||||
secrets: inherit
|
||||
|
||||
pydantic-compatibility:
|
||||
uses: ./.github/workflows/_pydantic_compatibility.yml
|
||||
uses:
|
||||
./.github/workflows/_pydantic_compatibility.yml
|
||||
with:
|
||||
working-directory: libs/langchain
|
||||
langchain-core-location: ../core
|
||||
secrets: inherit
|
||||
|
||||
# It's possible that langchain works fine with the latest *published* langchain-core,
|
||||
# but is broken with the langchain-core on `master`.
|
||||
#
|
||||
# We want to catch situations like that *before* releasing a new langchain-core, hence this test.
|
||||
test-with-latest-langchain-core:
|
||||
runs-on: ubuntu-latest
|
||||
defaults:
|
||||
run:
|
||||
working-directory: ${{ env.WORKDIR }}
|
||||
strategy:
|
||||
matrix:
|
||||
python-version:
|
||||
- "3.8"
|
||||
- "3.9"
|
||||
- "3.10"
|
||||
- "3.11"
|
||||
name: test with unpublished langchain-core - Python ${{ matrix.python-version }}
|
||||
steps:
|
||||
- uses: actions/checkout@v4
|
||||
|
||||
- name: Set up Python ${{ matrix.python-version }} + Poetry ${{ env.POETRY_VERSION }}
|
||||
uses: "./.github/actions/poetry_setup"
|
||||
with:
|
||||
python-version: ${{ matrix.python-version }}
|
||||
poetry-version: ${{ env.POETRY_VERSION }}
|
||||
working-directory: ${{ env.WORKDIR }}
|
||||
cache-key: unpublished-langchain-core
|
||||
|
||||
- name: Install dependencies
|
||||
shell: bash
|
||||
run: |
|
||||
echo "Running tests with unpublished langchain, installing dependencies with poetry..."
|
||||
poetry install
|
||||
|
||||
echo "Editably installing langchain-core outside of poetry, to avoid messing up lockfile..."
|
||||
poetry run pip install -e ../core
|
||||
|
||||
- name: Run tests
|
||||
run: make test
|
||||
|
||||
extended-tests:
|
||||
runs-on: ubuntu-latest
|
||||
defaults:
|
||||
|
||||
33
.github/workflows/langchain_experimental_ci.yml
vendored
33
.github/workflows/langchain_experimental_ci.yml
vendored
@@ -3,19 +3,17 @@ name: libs/experimental CI
|
||||
|
||||
on:
|
||||
push:
|
||||
branches: [master]
|
||||
branches: [ master ]
|
||||
pull_request:
|
||||
paths:
|
||||
- ".github/actions/poetry_setup/action.yml"
|
||||
- ".github/tools/**"
|
||||
- ".github/workflows/_lint.yml"
|
||||
- ".github/workflows/_test.yml"
|
||||
- ".github/workflows/langchain_experimental_ci.yml"
|
||||
- "libs/*"
|
||||
- "libs/experimental/**"
|
||||
- "libs/langchain/**"
|
||||
- "libs/core/**"
|
||||
workflow_dispatch: # Allows to trigger the workflow manually in GitHub UI
|
||||
- '.github/actions/poetry_setup/action.yml'
|
||||
- '.github/tools/**'
|
||||
- '.github/workflows/_lint.yml'
|
||||
- '.github/workflows/_test.yml'
|
||||
- '.github/workflows/langchain_experimental_ci.yml'
|
||||
- 'libs/*'
|
||||
- 'libs/experimental/**'
|
||||
workflow_dispatch: # Allows to trigger the workflow manually in GitHub UI
|
||||
|
||||
# If another push to the same PR or branch happens while this workflow is still running,
|
||||
# cancel the earlier run in favor of the next run.
|
||||
@@ -33,19 +31,26 @@ env:
|
||||
|
||||
jobs:
|
||||
lint:
|
||||
uses: ./.github/workflows/_lint.yml
|
||||
uses:
|
||||
./.github/workflows/_lint.yml
|
||||
with:
|
||||
working-directory: libs/experimental
|
||||
langchain-location: ../langchain
|
||||
langchain-core-location: ../core
|
||||
secrets: inherit
|
||||
|
||||
test:
|
||||
uses: ./.github/workflows/_test.yml
|
||||
uses:
|
||||
./.github/workflows/_test.yml
|
||||
with:
|
||||
working-directory: libs/experimental
|
||||
langchain-location: ../langchain
|
||||
langchain-core-location: ../core
|
||||
secrets: inherit
|
||||
|
||||
compile-integration-tests:
|
||||
uses: ./.github/workflows/_compile_integration_test.yml
|
||||
uses:
|
||||
./.github/workflows/_compile_integration_test.yml
|
||||
with:
|
||||
working-directory: libs/experimental
|
||||
secrets: inherit
|
||||
|
||||
12
LICENSE
12
LICENSE
@@ -1,6 +1,6 @@
|
||||
MIT License
|
||||
The MIT License
|
||||
|
||||
Copyright (c) LangChain, Inc.
|
||||
Copyright (c) Harrison Chase
|
||||
|
||||
Permission is hereby granted, free of charge, to any person obtaining a copy
|
||||
of this software and associated documentation files (the "Software"), to deal
|
||||
@@ -9,13 +9,13 @@ to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
|
||||
copies of the Software, and to permit persons to whom the Software is
|
||||
furnished to do so, subject to the following conditions:
|
||||
|
||||
The above copyright notice and this permission notice shall be included in all
|
||||
copies or substantial portions of the Software.
|
||||
The above copyright notice and this permission notice shall be included in
|
||||
all copies or substantial portions of the Software.
|
||||
|
||||
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
|
||||
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
|
||||
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
|
||||
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
|
||||
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
|
||||
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
|
||||
SOFTWARE.
|
||||
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
|
||||
THE SOFTWARE.
|
||||
1
Makefile
1
Makefile
@@ -44,7 +44,6 @@ spell_fix:
|
||||
lint:
|
||||
poetry run ruff docs templates cookbook
|
||||
poetry run ruff format docs templates cookbook --diff
|
||||
poetry run ruff --select I docs templates cookbook
|
||||
|
||||
format format_diff:
|
||||
poetry run ruff format docs templates cookbook
|
||||
|
||||
@@ -30,7 +30,7 @@ pip install langchain
|
||||
|
||||
With conda:
|
||||
```bash
|
||||
conda install langchain -c conda-forge
|
||||
pip install langsmith && conda install langchain -c conda-forge
|
||||
```
|
||||
|
||||
## 🤔 What is LangChain?
|
||||
@@ -104,7 +104,3 @@ Please see [here](https://python.langchain.com) for full documentation, which in
|
||||
As an open-source project in a rapidly developing field, we are extremely open to contributions, whether it be in the form of a new feature, improved infrastructure, or better documentation.
|
||||
|
||||
For detailed information on how to contribute, see [here](.github/CONTRIBUTING.md).
|
||||
|
||||
## 🌟 Contributors
|
||||
|
||||
[](https://github.com/langchain-ai/langchain/graphs/contributors)
|
||||
|
||||
@@ -34,12 +34,12 @@
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 16,
|
||||
"execution_count": null,
|
||||
"id": "5740fc70-c513-4ff4-9d72-cfc098f85fef",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"! pip install langchain docugami==0.0.8 dgml-utils==0.3.0 pydantic langchainhub chromadb hnswlib --upgrade --quiet"
|
||||
"! pip install langchain docugami==0.0.4 dgml-utils==0.2.0 pydantic langchainhub chromadb --upgrade --quiet"
|
||||
]
|
||||
},
|
||||
{
|
||||
@@ -76,7 +76,98 @@
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 3,
|
||||
"execution_count": 45,
|
||||
"id": "fc0767d4-9155-4591-855c-ef2e14e0e10f",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"import os\n",
|
||||
"import tempfile\n",
|
||||
"from pathlib import Path\n",
|
||||
"from pprint import pprint\n",
|
||||
"from time import sleep\n",
|
||||
"from typing import Dict, List\n",
|
||||
"\n",
|
||||
"import requests\n",
|
||||
"from docugami import Docugami\n",
|
||||
"from docugami.types import Document as DocugamiDocument\n",
|
||||
"\n",
|
||||
"api_key = os.environ.get(\"DOCUGAMI_API_KEY\")\n",
|
||||
"if not api_key:\n",
|
||||
" raise Exception(\"Please set Docugami API key environment variable\")\n",
|
||||
"\n",
|
||||
"client = Docugami()\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"def upload_files(local_paths: List[str], docset_name: str) -> List[DocugamiDocument]:\n",
|
||||
" docset_list_response = client.docsets.list(name=docset_name)\n",
|
||||
" if docset_list_response and docset_list_response.docsets:\n",
|
||||
" # Docset already exists with this name\n",
|
||||
" docset_id = docset_list_response.docsets[0]\n",
|
||||
" else:\n",
|
||||
" dg_docset = client.docsets.create(name=docset_name)\n",
|
||||
" docset_id = dg_docset.id\n",
|
||||
"\n",
|
||||
" document_list_response = client.documents.list(limit=int(1e5))\n",
|
||||
" dg_docs: List[DocugamiDocument] = []\n",
|
||||
" if document_list_response and document_list_response.documents:\n",
|
||||
" new_names = [Path(f).name for f in local_paths]\n",
|
||||
"\n",
|
||||
" dg_docs = [\n",
|
||||
" d\n",
|
||||
" for d in document_list_response.documents\n",
|
||||
" if Path(d.name).name in new_names\n",
|
||||
" ]\n",
|
||||
" existing_names = [Path(d.name).name for d in dg_docs]\n",
|
||||
"\n",
|
||||
" # Upload any files not previously uploaded\n",
|
||||
" for f in local_paths:\n",
|
||||
" if Path(f).name not in existing_names:\n",
|
||||
" dg_docs.append(\n",
|
||||
" client.documents.contents.upload(\n",
|
||||
" file=Path(f).absolute(),\n",
|
||||
" docset_id=docset_id,\n",
|
||||
" )\n",
|
||||
" )\n",
|
||||
" return dg_docs\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"def wait_for_xml(dg_docs: List[DocugamiDocument]) -> dict[str, str]:\n",
|
||||
" dgml_paths: dict[str, str] = {}\n",
|
||||
" while len(dgml_paths) < len(dg_docs):\n",
|
||||
" for doc in dg_docs:\n",
|
||||
" doc = client.documents.retrieve(doc.id) # update with latest\n",
|
||||
" current_status = doc.status\n",
|
||||
" if current_status == \"Error\":\n",
|
||||
" raise Exception(\n",
|
||||
" \"Document could not be processed, please confirm it is not a zero length, corrupt or password protected file\"\n",
|
||||
" )\n",
|
||||
" elif current_status == \"Ready\":\n",
|
||||
" dgml_url = doc.docset.url + f\"/documents/{doc.id}/dgml\"\n",
|
||||
" headers = {\"Authorization\": f\"Bearer {api_key}\"}\n",
|
||||
" dgml_response = requests.get(dgml_url, headers=headers)\n",
|
||||
" if not dgml_response.ok:\n",
|
||||
" raise Exception(\n",
|
||||
" f\"Could not download DGML artifact {dgml_url}: {dgml_response.status_code}\"\n",
|
||||
" )\n",
|
||||
" dgml_contents = dgml_response.text\n",
|
||||
" with tempfile.NamedTemporaryFile(delete=False, mode=\"w\") as temp_file:\n",
|
||||
" temp_file.write(dgml_contents)\n",
|
||||
" temp_file_path = temp_file.name\n",
|
||||
" dgml_paths[doc.name] = temp_file_path\n",
|
||||
"\n",
|
||||
" print(f\"{len(dgml_paths)} docs done processing out of {len(dg_docs)}...\")\n",
|
||||
"\n",
|
||||
" if len(dgml_paths) == len(dg_docs):\n",
|
||||
" # done\n",
|
||||
" return dgml_paths\n",
|
||||
" else:\n",
|
||||
" sleep(30) # try again in a bit"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 46,
|
||||
"id": "ce0b2b21-7623-46e7-ae2c-3a9f67e8b9b9",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
@@ -84,22 +175,18 @@
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"{'Report_CEN23LA277_192541.pdf': '/tmp/tmpa0c77x46',\n",
|
||||
" 'Report_CEN23LA338_192753.pdf': '/tmp/tmpaftfld2w',\n",
|
||||
" 'Report_CEN23LA363_192876.pdf': '/tmp/tmpn7gp6be2',\n",
|
||||
" 'Report_CEN23LA394_192995.pdf': '/tmp/tmp9udymprf',\n",
|
||||
" 'Report_ERA23LA114_106615.pdf': '/tmp/tmpxdjbh4r_',\n",
|
||||
" 'Report_WPR23LA254_192532.pdf': '/tmp/tmpz6h75a0h'}\n"
|
||||
"6 docs done processing out of 6...\n",
|
||||
"{'Report_CEN23LA277_192541.pdf': '/var/folders/0h/6cchx4k528bdj8cfcsdm0dqr0000gn/T/tmpel3o0rpg',\n",
|
||||
" 'Report_CEN23LA338_192753.pdf': '/var/folders/0h/6cchx4k528bdj8cfcsdm0dqr0000gn/T/tmpgugb9ut1',\n",
|
||||
" 'Report_CEN23LA363_192876.pdf': '/var/folders/0h/6cchx4k528bdj8cfcsdm0dqr0000gn/T/tmp3_gf2sky',\n",
|
||||
" 'Report_CEN23LA394_192995.pdf': '/var/folders/0h/6cchx4k528bdj8cfcsdm0dqr0000gn/T/tmpwmfgoxkl',\n",
|
||||
" 'Report_ERA23LA114_106615.pdf': '/var/folders/0h/6cchx4k528bdj8cfcsdm0dqr0000gn/T/tmptibrz2yu',\n",
|
||||
" 'Report_WPR23LA254_192532.pdf': '/var/folders/0h/6cchx4k528bdj8cfcsdm0dqr0000gn/T/tmpvazrbbsi'}\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"from pprint import pprint\n",
|
||||
"\n",
|
||||
"from docugami import Docugami\n",
|
||||
"from docugami.lib.upload import upload_to_named_docset, wait_for_dgml\n",
|
||||
"\n",
|
||||
"#### START DOCSET INFO (please change this values as needed)\n",
|
||||
"#### START DOCSET INFO (please change)\n",
|
||||
"DOCSET_NAME = \"NTSB Aviation Incident Reports\"\n",
|
||||
"FILE_PATHS = [\n",
|
||||
" \"/Users/tjaffri/ntsb/Report_CEN23LA277_192541.pdf\",\n",
|
||||
@@ -110,15 +197,13 @@
|
||||
" \"/Users/tjaffri/ntsb/Report_WPR23LA254_192532.pdf\",\n",
|
||||
"]\n",
|
||||
"\n",
|
||||
"# Note: Please specify ~6 (or more!) similar files to process together as a document set\n",
|
||||
"# This is currently a requirement for Docugami to automatically detect motifs\n",
|
||||
"# across the document set to generate a semantic XML Knowledge Graph.\n",
|
||||
"assert len(FILE_PATHS) > 5, \"Please provide at least 6 files\"\n",
|
||||
"assert (\n",
|
||||
" len(FILE_PATHS) > 5\n",
|
||||
") # Please specify ~6 (or more!) similar files to process together as a document set\n",
|
||||
"#### END DOCSET INFO\n",
|
||||
"\n",
|
||||
"dg_client = Docugami()\n",
|
||||
"dg_docs = upload_to_named_docset(dg_client, FILE_PATHS, DOCSET_NAME)\n",
|
||||
"dgml_paths = wait_for_dgml(dg_client, dg_docs)\n",
|
||||
"dg_docs = upload_files(FILE_PATHS, DOCSET_NAME)\n",
|
||||
"dgml_paths = wait_for_xml(dg_docs)\n",
|
||||
"\n",
|
||||
"pprint(dgml_paths)"
|
||||
]
|
||||
@@ -143,7 +228,7 @@
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 4,
|
||||
"execution_count": 47,
|
||||
"id": "05fcdd57-090f-44bf-a1fb-2c3609c80e34",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
@@ -152,13 +237,13 @@
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"found 30 chunks, here are the first few\n",
|
||||
"<AviationInvestigationFinalReport-section>Aviation </AviationInvestigationFinalReport-section>Investigation Final Report\n",
|
||||
"<table><tbody><tr><td>Location: </td> <td><Location><TownName>Elbert</TownName>, <USState>Colorado </USState></Location></td> <td>Accident Number: </td> <td><AccidentNumber>CEN23LA277 </AccidentNumber></td></tr> <tr><td><LocationDateTime>Date & Time: </LocationDateTime></td> <td><DateTime><EventDate>June 26, 2023</EventDate>, <EventTime>11:00 Local </EventTime></DateTime></td> <td><DateTimeAccidentNumber>Registration: </DateTimeAccidentNumber></td> <td><Registration>N23161 </Registration></td></tr> <tr><td><LocationAircraft>Aircraft: </LocationAircraft></td> <td><AircraftType>Piper <AircraftType>J3C-50 </AircraftType></AircraftType></td> <td><AircraftAccidentNumber>Aircraft Damage: </AircraftAccidentNumber></td> <td><AircraftDamage>Substantial </AircraftDamage></td></tr> <tr><td><LocationDefiningEvent>Defining Event: </LocationDefiningEvent></td> <td><DefiningEvent>Nose over/nose down </DefiningEvent></td> <td><DefiningEventAccidentNumber>Injuries: </DefiningEventAccidentNumber></td> <td><Injuries><Minor>1 </Minor>Minor </Injuries></td></tr> <tr><td><LocationFlightConductedUnder>Flight Conducted Under: </LocationFlightConductedUnder></td> <td><FlightConductedUnder><Part91-cell>Part <RegulationPart>91</RegulationPart>: General aviation - Personal </Part91-cell></FlightConductedUnder></td><td/><td><FlightConductedUnderCEN23LA277/></td></tr></tbody></table>\n",
|
||||
"Aviation Investigation Final Report\n",
|
||||
"<table><tbody><tr><td>Location: </td> <td><Location><TownName>Elbert</TownName>, <USState>Colorado </USState></Location></td> <td>Accident Number: </td> <td><AccidentNumber>CEN23LA277 </AccidentNumber></td></tr> <tr><td><LocationDateTime>Date & Time: </LocationDateTime></td> <td><DateTime><EventDate>June 26, 2023</EventDate>, <EventTime>11:00 Local </EventTime></DateTime></td> <td><DateTimeAccidentNumber>Registration: </DateTimeAccidentNumber></td> <td><Registration>N23161 </Registration></td></tr> <tr><td><LocationAircraft>Aircraft: </LocationAircraft></td> <td><Aircraft>Piper <AircraftType>J3C-50 </AircraftType></Aircraft></td> <td><AircraftAccidentNumber>Aircraft Damage: </AircraftAccidentNumber></td> <td><AircraftDamage>Substantial </AircraftDamage></td></tr> <tr><td><LocationDefiningEvent>Defining Event: </LocationDefiningEvent></td> <td><DefiningEvent>Nose over/nose down </DefiningEvent></td> <td><DefiningEventAccidentNumber>Injuries: </DefiningEventAccidentNumber></td> <td><Injuries><Minor>1 </Minor>Minor </Injuries></td></tr> <tr><td><LocationFlightConductedUnder>Flight Conducted Under: </LocationFlightConductedUnder></td> <td><Part91-cell>Part <RegulationPart>91</RegulationPart>: General aviation - Personal </Part91-cell></td><td/><td><FlightConductedUnderCEN23LA277/></td></tr></tbody></table>\n",
|
||||
"Analysis\n",
|
||||
"<TakeoffAccident> <Analysis>The pilot reported that, as the tail lifted during takeoff, the airplane veered left. He attempted to correct with full right rudder and full brakes. However, the airplane subsequently nosed over resulting in substantial damage to the fuselage, lift struts, rudder, and vertical stabilizer. </Analysis></TakeoffAccident>\n",
|
||||
"<TakeoffAccident> The pilot reported that, as the tail lifted during takeoff, the airplane veered left. He attempted to correct with full right rudder and full brakes. However, the airplane subsequently nosed over resulting in substantial damage to the fuselage, lift struts, rudder, and vertical stabilizer. </TakeoffAccident>\n",
|
||||
"<AircraftCondition> The pilot reported that there were no preaccident mechanical malfunctions or anomalies with the airplane that would have precluded normal operation. </AircraftCondition>\n",
|
||||
"<WindConditions> At about the time of the accident, wind was from <WindDirection>180</WindDirection>° at <WindConditions>5 </WindConditions>knots. The pilot decided to depart on runway <Runway>35 </Runway>due to the prevailing airport traffic. He stated that departing with “more favorable wind conditions” may have prevented the accident. </WindConditions>\n",
|
||||
"<ProbableCauseAndFindings-section>Probable Cause and Findings </ProbableCauseAndFindings-section>\n",
|
||||
"Probable Cause and Findings\n",
|
||||
"<ProbableCause> The <ProbableCause>National Transportation Safety Board </ProbableCause>determines the probable cause(s) of this accident to be: </ProbableCause>\n",
|
||||
"<AccidentCause> The pilot's loss of directional control during takeoff and subsequent excessive use of brakes which resulted in a nose-over. Contributing to the accident was his decision to takeoff downwind. </AccidentCause>\n",
|
||||
"Page 1 of <PageNumber>5 </PageNumber>\n"
|
||||
@@ -166,8 +251,6 @@
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"from pathlib import Path\n",
|
||||
"\n",
|
||||
"from dgml_utils.segmentation import get_chunks_str\n",
|
||||
"\n",
|
||||
"# Here we just read the first file, you can do the same for others\n",
|
||||
@@ -200,7 +283,7 @@
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 5,
|
||||
"execution_count": 48,
|
||||
"id": "8a4b49e0-de78-4790-a930-ad7cf324697a",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
@@ -260,7 +343,7 @@
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 6,
|
||||
"execution_count": 109,
|
||||
"id": "7b697d30-1e94-47f0-87e8-f81d4b180da2",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
@@ -270,14 +353,12 @@
|
||||
"39"
|
||||
]
|
||||
},
|
||||
"execution_count": 6,
|
||||
"execution_count": 109,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"import requests\n",
|
||||
"\n",
|
||||
"# Download XML from known URL\n",
|
||||
"dgml = requests.get(\n",
|
||||
" \"https://raw.githubusercontent.com/docugami/dgml-utils/main/python/tests/test_data/article/Jane%20Doe.xml\"\n",
|
||||
@@ -288,7 +369,7 @@
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 7,
|
||||
"execution_count": 98,
|
||||
"id": "14714576-6e1d-499b-bcc8-39140bb2fd78",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
@@ -298,7 +379,7 @@
|
||||
"{'h1': 9, 'div': 12, 'p': 3, 'lim h1': 9, 'lim': 1, 'table': 1, 'h1 div': 4}"
|
||||
]
|
||||
},
|
||||
"execution_count": 7,
|
||||
"execution_count": 98,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
@@ -319,7 +400,7 @@
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 8,
|
||||
"execution_count": 99,
|
||||
"id": "5462f29e-fd59-4e0e-9493-ea3b560e523e",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
@@ -352,7 +433,7 @@
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 9,
|
||||
"execution_count": 100,
|
||||
"id": "2b4ece00-2e43-4254-adc9-66dbb79139a6",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
@@ -390,7 +471,7 @@
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 10,
|
||||
"execution_count": 101,
|
||||
"id": "08350119-aa22-4ec1-8f65-b1316a0d4123",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
@@ -418,7 +499,7 @@
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 11,
|
||||
"execution_count": 112,
|
||||
"id": "bcac8294-c54a-4b6e-af9d-3911a69620b2",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
@@ -465,7 +546,7 @@
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 12,
|
||||
"execution_count": 113,
|
||||
"id": "8e275736-3408-4d7a-990e-4362c88e81f8",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
@@ -496,7 +577,7 @@
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 13,
|
||||
"execution_count": 114,
|
||||
"id": "1b12536a-1303-41ad-9948-4eb5a5f32614",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
@@ -513,7 +594,7 @@
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 14,
|
||||
"execution_count": 115,
|
||||
"id": "8d8b567c-b442-4bf0-b639-04bd89effc62",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
@@ -538,7 +619,7 @@
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 17,
|
||||
"execution_count": 116,
|
||||
"id": "346c3a02-8fea-4f75-a69e-fc9542b99dbc",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
@@ -600,7 +681,7 @@
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 18,
|
||||
"execution_count": 117,
|
||||
"id": "f2489de4-51e3-48b4-bbcd-ed9171deadf3",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
@@ -644,17 +725,10 @@
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 19,
|
||||
"execution_count": 120,
|
||||
"id": "636e992f-823b-496b-a082-8b4fcd479de5",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stderr",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"Number of requested results 4 is greater than number of elements in index 1, updating n_results = 1\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
@@ -696,7 +770,7 @@
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 20,
|
||||
"execution_count": 121,
|
||||
"id": "0e4a2f43-dd48-4ae3-8e27-7e87d169965f",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
@@ -706,7 +780,7 @@
|
||||
"669"
|
||||
]
|
||||
},
|
||||
"execution_count": 20,
|
||||
"execution_count": 121,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
@@ -721,7 +795,7 @@
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 21,
|
||||
"execution_count": 124,
|
||||
"id": "56b78fb3-603d-4343-ae72-be54a3c5dd72",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
@@ -746,7 +820,7 @@
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 22,
|
||||
"execution_count": 125,
|
||||
"id": "d3cc5ba9-8553-4eda-a5d1-b799751186af",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
@@ -758,7 +832,7 @@
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 23,
|
||||
"execution_count": 126,
|
||||
"id": "d7c73faf-74cb-400d-8059-b69e2493de38",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
@@ -770,7 +844,7 @@
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 24,
|
||||
"execution_count": 127,
|
||||
"id": "4c553722-be42-42ce-83b8-76a17f323f1c",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
@@ -780,7 +854,7 @@
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 25,
|
||||
"execution_count": 128,
|
||||
"id": "65dce40b-f1c3-494a-949e-69a9c9544ddb",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
@@ -790,7 +864,7 @@
|
||||
"'The number of training tokens for LLaMA2 is 2.0T for all parameter sizes.'"
|
||||
]
|
||||
},
|
||||
"execution_count": 25,
|
||||
"execution_count": 128,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
@@ -885,37 +959,14 @@
|
||||
" </tr>\n",
|
||||
" </tbody>\n",
|
||||
"</table>\n",
|
||||
"```"
|
||||
"``"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "867f8e11-384c-4aa1-8b3e-c59fb8d5fd7d",
|
||||
"id": "0879349e-7298-4f2c-b246-f1142e97a8e5",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Finally, you can ask other questions that rely on more subtle parsing of the table, e.g.:"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 26,
|
||||
"id": "d38f1459-7d2b-40df-8dcd-e747f85eb144",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"'The learning rate for LLaMA2 was 3.0 × 10−4 for the 7B and 13B models, and 1.5 × 10−4 for the 34B and 70B models.'"
|
||||
]
|
||||
},
|
||||
"execution_count": 26,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"llama2_chain.invoke(\"What was the learning rate for LLaMA2?\")"
|
||||
]
|
||||
"source": []
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
|
||||
@@ -9,16 +9,12 @@ SCRIPT_DIR="$(cd "$(dirname "$0")"; pwd)"
|
||||
cd "${SCRIPT_DIR}"
|
||||
|
||||
mkdir -p ../_dist
|
||||
rsync -ruv . ../_dist
|
||||
cp -r . ../_dist
|
||||
cd ../_dist
|
||||
poetry run python scripts/model_feat_table.py
|
||||
poetry run nbdoc_build --srcdir docs --pause 0
|
||||
mkdir docs/templates
|
||||
cp ../templates/docs/INDEX.md docs/templates/index.md
|
||||
poetry run nbdoc_build --srcdir docs
|
||||
cp ../cookbook/README.md src/pages/cookbook.mdx
|
||||
cp ../.github/CONTRIBUTING.md docs/contributing.md
|
||||
mkdir -p docs/templates
|
||||
cp ../templates/docs/INDEX.md docs/templates/index.md
|
||||
wget https://raw.githubusercontent.com/langchain-ai/langserve/main/README.md -O docs/langserve.md
|
||||
poetry run python scripts/generate_api_reference_links.py
|
||||
yarn install
|
||||
|
||||
@@ -296,7 +296,7 @@ def _document_langchain_experimental() -> None:
|
||||
def _document_langchain_core() -> None:
|
||||
"""Document the langchain_core package."""
|
||||
# Generate core_api_reference.rst
|
||||
core_members = _load_package_modules(CORE_DIR)
|
||||
core_members = _load_package_modules(EXP_DIR)
|
||||
core_doc = ".. _core_api_reference:\n\n" + _construct_doc(
|
||||
"langchain_core", core_members
|
||||
)
|
||||
|
||||
@@ -1,6 +1,5 @@
|
||||
-e libs/langchain
|
||||
-e libs/experimental
|
||||
-e libs/core
|
||||
pydantic<2
|
||||
autodoc_pydantic==1.8.0
|
||||
myst_parser
|
||||
|
||||
@@ -1,5 +1,5 @@
|
||||
---
|
||||
sidebar_position: 3
|
||||
sidebar_position: 2
|
||||
---
|
||||
|
||||
# Cookbook
|
||||
|
||||
@@ -146,7 +146,7 @@
|
||||
"source": [
|
||||
"### Branching and Merging\n",
|
||||
"\n",
|
||||
"You may want the output of one component to be processed by 2 or more other components. [RunnableParallels](https://api.python.langchain.com/en/latest/runnables/langchain_core.runnables.base.RunnableParallel.html#langchain_core.runnables.base.RunnableParallel) let you split or fork the chain so multiple components can process the input in parallel. Later, other components can join or merge the results to synthesize a final response. This type of chain creates a computation graph that looks like the following:\n",
|
||||
"You may want the output of one component to be processed by 2 or more other components. [RunnableMaps](https://api.python.langchain.com/en/latest/schema/langchain.schema.runnable.base.RunnableMap.html) let you split or fork the chain so multiple components can process the input in parallel. Later, other components can join or merge the results to synthesize a final response. This type of chain creates a computation graph that looks like the following:\n",
|
||||
"\n",
|
||||
"```text\n",
|
||||
" Input\n",
|
||||
|
||||
@@ -317,7 +317,7 @@
|
||||
"source": [
|
||||
"## Simplifying input\n",
|
||||
"\n",
|
||||
"To make invocation even simpler, we can add a `RunnableParallel` to take care of creating the prompt input dict for us:"
|
||||
"To make invocation even simpler, we can add a `RunnableMap` to take care of creating the prompt input dict for us:"
|
||||
]
|
||||
},
|
||||
{
|
||||
@@ -327,9 +327,9 @@
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langchain.schema.runnable import RunnableParallel, RunnablePassthrough\n",
|
||||
"from langchain.schema.runnable import RunnableMap, RunnablePassthrough\n",
|
||||
"\n",
|
||||
"map_ = RunnableParallel(foo=RunnablePassthrough())\n",
|
||||
"map_ = RunnableMap(foo=RunnablePassthrough())\n",
|
||||
"chain = (\n",
|
||||
" map_\n",
|
||||
" | prompt\n",
|
||||
|
||||
@@ -171,7 +171,7 @@
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langchain.schema import format_document\n",
|
||||
"from langchain.schema.runnable import RunnableParallel"
|
||||
"from langchain.schema.runnable import RunnableMap"
|
||||
]
|
||||
},
|
||||
{
|
||||
@@ -234,13 +234,7 @@
|
||||
"from typing import List, Tuple\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"def _format_chat_history(chat_history: List[Tuple[str, str]]) -> str:\n",
|
||||
" # chat history is of format:\n",
|
||||
" # [\n",
|
||||
" # (human_message_str, ai_message_str),\n",
|
||||
" # ...\n",
|
||||
" # ]\n",
|
||||
" # see below for an example of how it's invoked\n",
|
||||
"def _format_chat_history(chat_history: List[Tuple]) -> str:\n",
|
||||
" buffer = \"\"\n",
|
||||
" for dialogue_turn in chat_history:\n",
|
||||
" human = \"Human: \" + dialogue_turn[0]\n",
|
||||
@@ -256,7 +250,7 @@
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"_inputs = RunnableParallel(\n",
|
||||
"_inputs = RunnableMap(\n",
|
||||
" standalone_question=RunnablePassthrough.assign(\n",
|
||||
" chat_history=lambda x: _format_chat_history(x[\"chat_history\"])\n",
|
||||
" )\n",
|
||||
|
||||
@@ -1,493 +0,0 @@
|
||||
{
|
||||
"cells": [
|
||||
{
|
||||
"cell_type": "raw",
|
||||
"id": "366a0e68-fd67-4fe5-a292-5c33733339ea",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"---\n",
|
||||
"sidebar_position: 0\n",
|
||||
"title: Get started\n",
|
||||
"---"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "befa7fd1",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"LCEL makes it easy to build complex chains from basic components, and supports out of the box functionality such as streaming, parallelism, and logging."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "9a9acd2e",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Basic example: prompt + model + output parser\n",
|
||||
"\n",
|
||||
"The most basic and common use case is chaining a prompt template and a model together. To see how this works, let's create a chain that takes a topic and generates a joke:"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 7,
|
||||
"id": "466b65b3",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"\"Why did the ice cream go to therapy?\\n\\nBecause it had too many toppings and couldn't find its cone-fidence!\""
|
||||
]
|
||||
},
|
||||
"execution_count": 7,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"from langchain.chat_models import ChatOpenAI\n",
|
||||
"from langchain.prompts import ChatPromptTemplate\n",
|
||||
"from langchain.schema.output_parser import StrOutputParser\n",
|
||||
"\n",
|
||||
"prompt = ChatPromptTemplate.from_template(\"tell me a short joke about {topic}\")\n",
|
||||
"model = ChatOpenAI()\n",
|
||||
"output_parser = StrOutputParser()\n",
|
||||
"\n",
|
||||
"chain = prompt | model | output_parser\n",
|
||||
"\n",
|
||||
"chain.invoke({\"topic\": \"ice cream\"})"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "81c502c5-85ee-4f36-aaf4-d6e350b7792f",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Notice this line of this code, where we piece together then different components into a single chain using LCEL:\n",
|
||||
"\n",
|
||||
"```\n",
|
||||
"chain = prompt | model | output_parser\n",
|
||||
"```\n",
|
||||
"\n",
|
||||
"The `|` symbol is similar to a [unix pipe operator](https://en.wikipedia.org/wiki/Pipeline_(Unix)), which chains together the different components feeds the output from one component as input into the next component. \n",
|
||||
"\n",
|
||||
"In this chain the user input is passed to the prompt template, then the prompt template output is passed to the model, then the model output is passed to the output parser. Let's take a look at each component individually to really understand what's going on. "
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "aa1b77fa",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### 1. Prompt\n",
|
||||
"\n",
|
||||
"`prompt` is a `BasePromptTemplate`, which means it takes in a dictionary of template variables and produces a `PromptValue`. A `PromptValue` is a wrapper around a completed prompt that can be passed to either an `LLM` (which takes a string as input) or `ChatModel` (which takes a sequence of messages as input). It can work with either language model type because it defines logic both for producing `BaseMessage`s and for producing a string."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 8,
|
||||
"id": "b8656990",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"ChatPromptValue(messages=[HumanMessage(content='tell me a short joke about ice cream')])"
|
||||
]
|
||||
},
|
||||
"execution_count": 8,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"prompt_value = prompt.invoke({\"topic\": \"ice cream\"})\n",
|
||||
"prompt_value"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 9,
|
||||
"id": "e6034488",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"[HumanMessage(content='tell me a short joke about ice cream')]"
|
||||
]
|
||||
},
|
||||
"execution_count": 9,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"prompt_value.to_messages()"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 10,
|
||||
"id": "60565463",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"'Human: tell me a short joke about ice cream'"
|
||||
]
|
||||
},
|
||||
"execution_count": 10,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"prompt_value.to_string()"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "577f0f76",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### 2. Model\n",
|
||||
"\n",
|
||||
"The `PromptValue` is then passed to `model`. In this case our `model` is a `ChatModel`, meaning it will output a `BaseMessage`."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 11,
|
||||
"id": "33cf5f72",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"AIMessage(content=\"Why did the ice cream go to therapy? \\n\\nBecause it had too many toppings and couldn't find its cone-fidence!\")"
|
||||
]
|
||||
},
|
||||
"execution_count": 11,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"message = model.invoke(prompt_value)\n",
|
||||
"message"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "327e7db8",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"If our `model` was an `LLM`, it would output a string."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 12,
|
||||
"id": "8feb05da",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"'\\n\\nRobot: Why did the ice cream go to therapy? Because it had a rocky road.'"
|
||||
]
|
||||
},
|
||||
"execution_count": 12,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"from langchain.llms import OpenAI\n",
|
||||
"\n",
|
||||
"llm = OpenAI(model=\"gpt-3.5-turbo-instruct\")\n",
|
||||
"llm.invoke(prompt_value)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "91847478",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### 3. Output parser\n",
|
||||
"\n",
|
||||
"And lastly we pass our `model` output to the `output_parser`, which is a `BaseOutputParser` meaning it takes either a string or a \n",
|
||||
"`BaseMessage` as input. The `StrOutputParser` specifically simple converts any input into a string."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 13,
|
||||
"id": "533e59a8",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"\"Why did the ice cream go to therapy? \\n\\nBecause it had too many toppings and couldn't find its cone-fidence!\""
|
||||
]
|
||||
},
|
||||
"execution_count": 13,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"output_parser.invoke(message)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "9851e842",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### 4. Entire Pipeline\n",
|
||||
"\n",
|
||||
"To follow the steps along:\n",
|
||||
"\n",
|
||||
"1. We pass in user input on the desired topic as `{\"topic\": \"ice cream\"}`\n",
|
||||
"2. The `prompt` component takes the user input, which is then used to construct a PromptValue after using the `topic` to construct the prompt. \n",
|
||||
"3. The `model` component takes the generated prompt, and passes into the OpenAI LLM model for evaluation. The generated output from the model is a `ChatMessage` object. \n",
|
||||
"4. Finally, the `output_parser` component takes in a `ChatMessage`, and transforms this into a Python string, which is returned from the invoke method. \n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "c4873109",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"```mermaid\n",
|
||||
"graph LR\n",
|
||||
" A(Input: topic=ice cream) --> |Dict| B(PromptTemplate)\n",
|
||||
" B -->|PromptValue| C(ChatModel) \n",
|
||||
" C -->|ChatMessage| D(StrOutputParser)\n",
|
||||
" D --> |String| F(Result)\n",
|
||||
"```\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "fe63534d",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
":::info\n",
|
||||
"\n",
|
||||
"Note that if you’re curious about the output of any components, you can always test out a smaller version of the chain such as `prompt` or `prompt | model` to see the intermediate results:\n",
|
||||
"\n",
|
||||
":::"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "11089b6f-23f8-474f-97ec-8cae8d0ca6d4",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"input = {\"topic\": \"ice cream\"}\n",
|
||||
"\n",
|
||||
"prompt.invoke(input)\n",
|
||||
"# > ChatPromptValue(messages=[HumanMessage(content='tell me a short joke about ice cream')])\n",
|
||||
"\n",
|
||||
"(prompt | model).invoke(input)\n",
|
||||
"# > AIMessage(content=\"Why did the ice cream go to therapy?\\nBecause it had too many toppings and couldn't cone-trol itself!\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "cc7d3b9d-e400-4c9b-9188-f29dac73e6bb",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## RAG Search Example\n",
|
||||
"\n",
|
||||
"For our next example, we want to run a retrieval-augmented generation chain to add some context when responding to questions. "
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "662426e8-4316-41dc-8312-9b58edc7e0c9",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# Requires:\n",
|
||||
"# pip install langchain docarray\n",
|
||||
"\n",
|
||||
"from langchain.chat_models import ChatOpenAI\n",
|
||||
"from langchain.embeddings import OpenAIEmbeddings\n",
|
||||
"from langchain.prompts import ChatPromptTemplate\n",
|
||||
"from langchain.schema.output_parser import StrOutputParser\n",
|
||||
"from langchain.schema.runnable import RunnableParallel, RunnablePassthrough\n",
|
||||
"from langchain.vectorstores import DocArrayInMemorySearch\n",
|
||||
"\n",
|
||||
"vectorstore = DocArrayInMemorySearch.from_texts(\n",
|
||||
" [\"harrison worked at kensho\", \"bears like to eat honey\"],\n",
|
||||
" embedding=OpenAIEmbeddings(),\n",
|
||||
")\n",
|
||||
"retriever = vectorstore.as_retriever()\n",
|
||||
"\n",
|
||||
"template = \"\"\"Answer the question based only on the following context:\n",
|
||||
"{context}\n",
|
||||
"\n",
|
||||
"Question: {question}\n",
|
||||
"\"\"\"\n",
|
||||
"prompt = ChatPromptTemplate.from_template(template)\n",
|
||||
"model = ChatOpenAI()\n",
|
||||
"output_parser = StrOutputParser()\n",
|
||||
"\n",
|
||||
"setup_and_retrieval = RunnableParallel(\n",
|
||||
" {\"context\": retriever, \"question\": RunnablePassthrough()}\n",
|
||||
")\n",
|
||||
"chain = setup_and_retrieval | prompt | model | output_parser\n",
|
||||
"\n",
|
||||
"chain.invoke(\"where did harrison work?\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "f0999140-6001-423b-970b-adf1dfdb4dec",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"In this case, the composed chain is: "
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "5b88e9bb-f04a-4a56-87ec-19a0e6350763",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"chain = setup_and_retrieval | prompt | model | output_parser"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "6e929e15-40a5-4569-8969-384f636cab87",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"To explain this, we first can see that the prompt template above takes in `context` and `question` as values to be substituted in the prompt. Before building the prompt template, we want to retrieve relevant documents to the search and include them as part of the context. \n",
|
||||
"\n",
|
||||
"As a preliminary step, we’ve setup the retriever using an in memory store, which can retrieve documents based on a query. This is a runnable component as well that can be chained together with other components, but you can also try to run it separately:"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "a7319ef6-613b-4638-ad7d-4a2183702c1d",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"retriever.invoke(\"where did harrison work?\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "e6833844-f1c4-444c-a3d2-31b3c6b31d46",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"We then use the `RunnableParallel` to prepare the expected inputs into the prompt by using the entries for the retrieved documents as well as the original user question, using the retriever for document search, and RunnablePassthrough to pass the user’s question:"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "dcbca26b-d6b9-4c24-806c-1ec8fdaab4ed",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"setup_and_retrieval = RunnableParallel(\n",
|
||||
" {\"context\": retriever, \"question\": RunnablePassthrough()}\n",
|
||||
")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "68c721c1-048b-4a64-9d78-df54fe465992",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"To review, the complete chain is:"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "1d5115a7-7b8e-458b-b936-26cc87ee81c4",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"setup_and_retrieval = RunnableParallel(\n",
|
||||
" {\"context\": retriever, \"question\": RunnablePassthrough()}\n",
|
||||
")\n",
|
||||
"chain = setup_and_retrieval | prompt | model | output_parser"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "5c6f5f74-b387-48a0-bedd-1fae202cd10a",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"With the flow being:\n",
|
||||
"\n",
|
||||
"1. The first steps create a `RunnableParallel` object with two entries. The first entry, `context` will include the document results fetched by the retriever. The second entry, `question` will contain the user’s original question. To pass on the question, we use `RunnablePassthrough` to copy this entry. \n",
|
||||
"2. Feed the dictionary from the step above to the `prompt` component. It then takes the user input which is `question` as well as the retrieved document which is `context` to construct a prompt and output a PromptValue. \n",
|
||||
"3. The `model` component takes the generated prompt, and passes into the OpenAI LLM model for evaluation. The generated output from the model is a `ChatMessage` object. \n",
|
||||
"4. Finally, the `output_parser` component takes in a `ChatMessage`, and transforms this into a Python string, which is returned from the invoke method.\n",
|
||||
"\n",
|
||||
"```mermaid\n",
|
||||
"graph LR\n",
|
||||
" A(Question) --> B(RunnableParallel)\n",
|
||||
" B -->|Question| C(Retriever)\n",
|
||||
" B -->|Question| D(RunnablePassThrough)\n",
|
||||
" C -->|context=retrieved docs| E(PromptTemplate)\n",
|
||||
" D -->|question=Question| E\n",
|
||||
" E -->|PromptValue| F(ChatModel) \n",
|
||||
" F -->|ChatMessage| G(StrOutputParser)\n",
|
||||
" G --> |String| H(Result)\n",
|
||||
"```\n",
|
||||
"\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "8c2438df-164e-4bbe-b5f4-461695e45b0f",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Next steps\n",
|
||||
"\n",
|
||||
"We recommend reading our [Why use LCEL](/docs/expression_language/why) section next to see a side-by-side comparison of the code needed to produce common functionality with and without LCEL."
|
||||
]
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
"kernelspec": {
|
||||
"display_name": "Python 3 (ipykernel)",
|
||||
"language": "python",
|
||||
"name": "python3"
|
||||
},
|
||||
"language_info": {
|
||||
"codemirror_mode": {
|
||||
"name": "ipython",
|
||||
"version": 3
|
||||
},
|
||||
"file_extension": ".py",
|
||||
"mimetype": "text/x-python",
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.9.1"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
"nbformat_minor": 5
|
||||
}
|
||||
@@ -26,7 +26,7 @@
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 1,
|
||||
"execution_count": 2,
|
||||
"id": "d3e893bf",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
@@ -44,24 +44,19 @@
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 2,
|
||||
"execution_count": 3,
|
||||
"id": "dfdd8bf5",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from unittest.mock import patch\n",
|
||||
"\n",
|
||||
"import httpx\n",
|
||||
"from openai import RateLimitError\n",
|
||||
"\n",
|
||||
"request = httpx.Request(\"GET\", \"/\")\n",
|
||||
"response = httpx.Response(200, request=request)\n",
|
||||
"error = RateLimitError(\"rate limit\", response=response, body=\"\")"
|
||||
"from openai.error import RateLimitError"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 3,
|
||||
"execution_count": 5,
|
||||
"id": "e6fdffc1",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
@@ -74,7 +69,7 @@
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 4,
|
||||
"execution_count": 27,
|
||||
"id": "584461ab",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
@@ -88,7 +83,7 @@
|
||||
],
|
||||
"source": [
|
||||
"# Let's use just the OpenAI LLm first, to show that we run into an error\n",
|
||||
"with patch(\"openai.resources.chat.completions.Completions.create\", side_effect=error):\n",
|
||||
"with patch(\"openai.ChatCompletion.create\", side_effect=RateLimitError()):\n",
|
||||
" try:\n",
|
||||
" print(openai_llm.invoke(\"Why did the chicken cross the road?\"))\n",
|
||||
" except:\n",
|
||||
@@ -111,7 +106,7 @@
|
||||
],
|
||||
"source": [
|
||||
"# Now let's try with fallbacks to Anthropic\n",
|
||||
"with patch(\"openai.resources.chat.completions.Completions.create\", side_effect=error):\n",
|
||||
"with patch(\"openai.ChatCompletion.create\", side_effect=RateLimitError()):\n",
|
||||
" try:\n",
|
||||
" print(llm.invoke(\"Why did the chicken cross the road?\"))\n",
|
||||
" except:\n",
|
||||
@@ -153,7 +148,7 @@
|
||||
" ]\n",
|
||||
")\n",
|
||||
"chain = prompt | llm\n",
|
||||
"with patch(\"openai.resources.chat.completions.Completions.create\", side_effect=error):\n",
|
||||
"with patch(\"openai.ChatCompletion.create\", side_effect=RateLimitError()):\n",
|
||||
" try:\n",
|
||||
" print(chain.invoke({\"animal\": \"kangaroo\"}))\n",
|
||||
" except:\n",
|
||||
@@ -291,7 +286,7 @@
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.11.5"
|
||||
"version": "3.10.1"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
|
||||
@@ -82,7 +82,7 @@
|
||||
"source": [
|
||||
"## Accepting a Runnable Config\n",
|
||||
"\n",
|
||||
"Runnable lambdas can optionally accept a [RunnableConfig](https://api.python.langchain.com/en/latest/runnables/langchain_core.runnables.config.RunnableConfig.html#langchain_core.runnables.config.RunnableConfig), which they can use to pass callbacks, tags, and other configuration information to nested runs."
|
||||
"Runnable lambdas can optionally accept a [RunnableConfig](https://api.python.langchain.com/en/latest/schema/langchain.schema.runnable.config.RunnableConfig.html?highlight=runnableconfig#langchain.schema.runnable.config.RunnableConfig), which they can use to pass callbacks, tags, and other configuration information to nested runs."
|
||||
]
|
||||
},
|
||||
{
|
||||
|
||||
@@ -1,5 +1,5 @@
|
||||
---
|
||||
sidebar_position: 2
|
||||
sidebar_position: 1
|
||||
---
|
||||
|
||||
# How to
|
||||
|
||||
@@ -104,7 +104,7 @@
|
||||
"source": [
|
||||
"Here the input to prompt is expected to be a map with keys \"context\" and \"question\". The user input is just the question. So we need to get the context using our retriever and passthrough the user input under the \"question\" key.\n",
|
||||
"\n",
|
||||
"Note that when composing a RunnableParallel with another Runnable we don't even need to wrap our dictionary in the RunnableParallel class — the type conversion is handled for us."
|
||||
"Note that when composing a RunnableMap when another Runnable we don't even need to wrap our dictionary in the RunnableMap class — the type conversion is handled for us."
|
||||
]
|
||||
},
|
||||
{
|
||||
@@ -114,7 +114,7 @@
|
||||
"source": [
|
||||
"## Parallelism\n",
|
||||
"\n",
|
||||
"RunnableParallel are also useful for running independent processes in parallel, since each Runnable in the map is executed in parallel. For example, we can see our earlier `joke_chain`, `poem_chain` and `map_chain` all have about the same runtime, even though `map_chain` executes both of the other two."
|
||||
"RunnableMaps are also useful for running independent processes in parallel, since each Runnable in the map is executed in parallel. For example, we can see our earlier `joke_chain`, `poem_chain` and `map_chain` all have about the same runtime, even though `map_chain` executes both of the other two."
|
||||
]
|
||||
},
|
||||
{
|
||||
|
||||
@@ -290,9 +290,9 @@
|
||||
],
|
||||
"source": [
|
||||
"from langchain.schema.messages import HumanMessage\n",
|
||||
"from langchain.schema.runnable import RunnableParallel\n",
|
||||
"from langchain.schema.runnable import RunnableMap\n",
|
||||
"\n",
|
||||
"chain = RunnableParallel({\"output_message\": ChatAnthropic(model=\"claude-2\")})\n",
|
||||
"chain = RunnableMap({\"output_message\": ChatAnthropic(model=\"claude-2\")})\n",
|
||||
"chain_with_history = RunnableWithMessageHistory(\n",
|
||||
" chain,\n",
|
||||
" lambda session_id: RedisChatMessageHistory(session_id, url=REDIS_URL),\n",
|
||||
|
||||
@@ -20,7 +20,7 @@ Whenever your LCEL chains have steps that can be executed in parallel (eg if you
|
||||
Configure retries and fallbacks for any part of your LCEL chain. This is a great way to make your chains more reliable at scale. We’re currently working on adding streaming support for retries/fallbacks, so you can get the added reliability without any latency cost.
|
||||
|
||||
**Access intermediate results**
|
||||
For more complex chains it’s often very useful to access the results of intermediate steps even before the final output is produced. This can be used to let end-users know something is happening, or even just to debug your chain. You can stream intermediate results, and it’s available on every [LangServe](/docs/langserve) server.
|
||||
For more complex chains it’s often very useful to access the results of intermediate steps even before the final output is produced. This can be used let end-users know something is happening, or even just to debug your chain. You can stream intermediate results, and it’s available on every [LangServe](/docs/langserve) server.
|
||||
|
||||
**Input and output schemas**
|
||||
Input and output schemas give every LCEL chain Pydantic and JSONSchema schemas inferred from the structure of your chain. This can be used for validation of inputs and outputs, and is an integral part of LangServe.
|
||||
@@ -30,4 +30,4 @@ As your chains get more and more complex, it becomes increasingly important to u
|
||||
With LCEL, **all** steps are automatically logged to [LangSmith](/docs/langsmith/) for maximum observability and debuggability.
|
||||
|
||||
**Seamless LangServe deployment integration**
|
||||
Any chain created with LCEL can be easily deployed using [LangServe](/docs/langserve).
|
||||
Any chain created with LCEL can be easily deployed using [LangServe](/docs/langserve).
|
||||
@@ -6,7 +6,7 @@
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"---\n",
|
||||
"sidebar_position: 1\n",
|
||||
"sidebar_position: 0\n",
|
||||
"title: Interface\n",
|
||||
"---"
|
||||
]
|
||||
@@ -16,7 +16,7 @@
|
||||
"id": "9a9acd2e",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"To make it as easy as possible to create custom chains, we've implemented a [\"Runnable\"](https://api.python.langchain.com/en/stable/runnables/langchain_core.runnables.base.Runnable.html#langchain_core.runnables.base.Runnable) protocol. The `Runnable` protocol is implemented for most components. \n",
|
||||
"To make it as easy as possible to create custom chains, we've implemented a [\"Runnable\"](https://api.python.langchain.com/en/latest/schema/langchain.schema.runnable.base.Runnable.html#langchain.schema.runnable.base.Runnable) protocol. The `Runnable` protocol is implemented for most components. \n",
|
||||
"This is a standard interface, which makes it easy to define custom chains as well as invoke them in a standard way. \n",
|
||||
"The standard interface includes:\n",
|
||||
"\n",
|
||||
|
||||
File diff suppressed because it is too large
Load Diff
@@ -79,7 +79,7 @@ Walkthroughs and techniques for common end-to-end use cases, like:
|
||||
### [Integrations](/docs/integrations/providers/)
|
||||
LangChain is part of a rich ecosystem of tools that integrate with our framework and build on top of it. Check out our growing list of [integrations](/docs/integrations/providers/).
|
||||
|
||||
### [Guides](/docs/guides/guides/debugging)
|
||||
### [Guides](/docs/guides/adapters/openai)
|
||||
Best practices for developing with LangChain.
|
||||
|
||||
### [API reference](https://api.python.langchain.com)
|
||||
|
||||
@@ -344,7 +344,7 @@ category_chain = chat_prompt | ChatOpenAI() | CommaSeparatedListOutputParser()
|
||||
app = FastAPI(
|
||||
title="LangChain Server",
|
||||
version="1.0",
|
||||
description="A simple API server using LangChain's Runnable interfaces",
|
||||
description="A simple api server using Langchain's Runnable interfaces",
|
||||
)
|
||||
|
||||
# 3. Adding chain route
|
||||
|
||||
@@ -5,9 +5,7 @@
|
||||
"id": "700a516b",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# OpenAI Adapter(Old)\n",
|
||||
"\n",
|
||||
"**Please ensure OpenAI library is less than 1.0.0; otherwise, refer to the newer doc [OpenAI Adapter](./openai.ipynb).**\n",
|
||||
"# OpenAI Adapter\n",
|
||||
"\n",
|
||||
"A lot of people get started with OpenAI but want to explore other models. LangChain's integrations with many model providers make this easy to do so. While LangChain has it's own message and model APIs, we've also made it as easy as possible to explore other models by exposing an adapter to adapt LangChain models to the OpenAI api.\n",
|
||||
"\n",
|
||||
@@ -51,6 +49,18 @@
|
||||
"Original OpenAI call"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 14,
|
||||
"id": "e1d27dfa",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"result = openai.ChatCompletion.create(\n",
|
||||
" messages=messages, model=\"gpt-3.5-turbo\", temperature=0\n",
|
||||
")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 15,
|
||||
@@ -69,9 +79,6 @@
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"result = openai.ChatCompletion.create(\n",
|
||||
" messages=messages, model=\"gpt-3.5-turbo\", temperature=0\n",
|
||||
")\n",
|
||||
"result[\"choices\"][0][\"message\"].to_dict_recursive()"
|
||||
]
|
||||
},
|
||||
@@ -83,6 +90,18 @@
|
||||
"LangChain OpenAI wrapper call"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 16,
|
||||
"id": "87c2d515",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"lc_result = lc_openai.ChatCompletion.create(\n",
|
||||
" messages=messages, model=\"gpt-3.5-turbo\", temperature=0\n",
|
||||
")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 17,
|
||||
@@ -101,9 +120,6 @@
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"lc_result = lc_openai.ChatCompletion.create(\n",
|
||||
" messages=messages, model=\"gpt-3.5-turbo\", temperature=0\n",
|
||||
")\n",
|
||||
"lc_result[\"choices\"][0][\"message\"]"
|
||||
]
|
||||
},
|
||||
@@ -115,6 +131,18 @@
|
||||
"Swapping out model providers"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 18,
|
||||
"id": "7a2c011c",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"lc_result = lc_openai.ChatCompletion.create(\n",
|
||||
" messages=messages, model=\"claude-2\", temperature=0, provider=\"ChatAnthropic\"\n",
|
||||
")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 19,
|
||||
@@ -133,9 +161,6 @@
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"lc_result = lc_openai.ChatCompletion.create(\n",
|
||||
" messages=messages, model=\"claude-2\", temperature=0, provider=\"ChatAnthropic\"\n",
|
||||
")\n",
|
||||
"lc_result[\"choices\"][0][\"message\"]"
|
||||
]
|
||||
},
|
||||
@@ -277,7 +302,7 @@
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.11.5"
|
||||
"version": "3.10.1"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
@@ -28,7 +28,7 @@
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 1,
|
||||
"execution_count": 18,
|
||||
"id": "d3e893bf",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
@@ -46,24 +46,19 @@
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 2,
|
||||
"execution_count": 21,
|
||||
"id": "dfdd8bf5",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from unittest.mock import patch\n",
|
||||
"\n",
|
||||
"import httpx\n",
|
||||
"from openai import RateLimitError\n",
|
||||
"\n",
|
||||
"request = httpx.Request(\"GET\", \"/\")\n",
|
||||
"response = httpx.Response(200, request=request)\n",
|
||||
"error = RateLimitError(\"rate limit\", response=response, body=\"\")"
|
||||
"from openai.error import RateLimitError"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 3,
|
||||
"execution_count": 24,
|
||||
"id": "e6fdffc1",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
@@ -76,7 +71,7 @@
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 4,
|
||||
"execution_count": 27,
|
||||
"id": "584461ab",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
@@ -90,7 +85,7 @@
|
||||
],
|
||||
"source": [
|
||||
"# Let's use just the OpenAI LLm first, to show that we run into an error\n",
|
||||
"with patch(\"openai.resources.chat.completions.Completions.create\", side_effect=error):\n",
|
||||
"with patch(\"openai.ChatCompletion.create\", side_effect=RateLimitError()):\n",
|
||||
" try:\n",
|
||||
" print(openai_llm.invoke(\"Why did the chicken cross the road?\"))\n",
|
||||
" except:\n",
|
||||
@@ -113,7 +108,7 @@
|
||||
],
|
||||
"source": [
|
||||
"# Now let's try with fallbacks to Anthropic\n",
|
||||
"with patch(\"openai.resources.chat.completions.Completions.create\", side_effect=error):\n",
|
||||
"with patch(\"openai.ChatCompletion.create\", side_effect=RateLimitError()):\n",
|
||||
" try:\n",
|
||||
" print(llm.invoke(\"Why did the chicken cross the road?\"))\n",
|
||||
" except:\n",
|
||||
@@ -155,7 +150,7 @@
|
||||
" ]\n",
|
||||
")\n",
|
||||
"chain = prompt | llm\n",
|
||||
"with patch(\"openai.resources.chat.completions.Completions.create\", side_effect=error):\n",
|
||||
"with patch(\"openai.ChatCompletion.create\", side_effect=RateLimitError()):\n",
|
||||
" try:\n",
|
||||
" print(chain.invoke({\"animal\": \"kangaroo\"}))\n",
|
||||
" except:\n",
|
||||
@@ -436,7 +431,7 @@
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.11.5"
|
||||
"version": "3.10.12"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
|
||||
@@ -60,7 +60,7 @@
|
||||
"\n",
|
||||
" Firstly, the wallet contains my credit card with number 4111 1111 1111 1111, which is registered under my name and linked to my bank account, PL61109010140000071219812874.\n",
|
||||
"\n",
|
||||
" Additionally, the wallet had a driver's license - DL No: 999000680 issued to my name. It also houses my Social Security Number, 602-76-4532.\n",
|
||||
" Additionally, the wallet had a driver's license - DL No: 999000680 issued to my name. It also houses my Social Security Number, 602-76-4532. \n",
|
||||
"\n",
|
||||
" What's more, I had my polish identity card there, with the number ABC123456.\n",
|
||||
"\n",
|
||||
@@ -68,7 +68,7 @@
|
||||
"\n",
|
||||
" In case any information arises regarding my wallet, please reach out to me on my phone number, 999-888-7777, or through my personal email, johndoe@example.com.\n",
|
||||
"\n",
|
||||
" Please consider this information to be highly confidential and respect my privacy.\n",
|
||||
" Please consider this information to be highly confidential and respect my privacy. \n",
|
||||
"\n",
|
||||
" The bank has been informed about the stolen credit card and necessary actions have been taken from their end. They will be reachable at their official email, support@bankname.com.\n",
|
||||
" My representative there is Victoria Cherry (her business phone: 987-654-3210).\n",
|
||||
@@ -667,11 +667,7 @@
|
||||
"from langchain.chat_models.openai import ChatOpenAI\n",
|
||||
"from langchain.prompts import ChatPromptTemplate\n",
|
||||
"from langchain.schema.output_parser import StrOutputParser\n",
|
||||
"from langchain.schema.runnable import (\n",
|
||||
" RunnableLambda,\n",
|
||||
" RunnableParallel,\n",
|
||||
" RunnablePassthrough,\n",
|
||||
")\n",
|
||||
"from langchain.schema.runnable import RunnableLambda, RunnableMap, RunnablePassthrough\n",
|
||||
"\n",
|
||||
"# 6. Create anonymizer chain\n",
|
||||
"template = \"\"\"Answer the question based only on the following context:\n",
|
||||
@@ -684,7 +680,7 @@
|
||||
"model = ChatOpenAI(temperature=0.3)\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"_inputs = RunnableParallel(\n",
|
||||
"_inputs = RunnableMap(\n",
|
||||
" question=RunnablePassthrough(),\n",
|
||||
" # It is important to remember about question anonymization\n",
|
||||
" anonymized_question=RunnableLambda(anonymizer.anonymize),\n",
|
||||
@@ -886,7 +882,7 @@
|
||||
"\n",
|
||||
"\n",
|
||||
"chain_with_deanonymization = (\n",
|
||||
" RunnableParallel({\"question\": RunnablePassthrough()})\n",
|
||||
" RunnableMap({\"question\": RunnablePassthrough()})\n",
|
||||
" | {\n",
|
||||
" \"context\": itemgetter(\"question\")\n",
|
||||
" | retriever\n",
|
||||
|
||||
@@ -7,9 +7,7 @@
|
||||
"source": [
|
||||
"# Amazon Comprehend Moderation Chain\n",
|
||||
"\n",
|
||||
">[Amazon Comprehend](https://aws.amazon.com/comprehend/) is a natural-language processing (NLP) service that uses machine learning to uncover valuable insights and connections in text.\n",
|
||||
"\n",
|
||||
"This notebook shows how to use `Amazon Comprehend` to detect and handle `Personally Identifiable Information` (`PII`) and toxicity.\n",
|
||||
"This notebook shows how to use [Amazon Comprehend](https://aws.amazon.com/comprehend/) to detect and handle `Personally Identifiable Information` (`PII`) and toxicity.\n",
|
||||
"\n",
|
||||
"## Setting up"
|
||||
]
|
||||
@@ -1419,7 +1417,7 @@
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.10.12"
|
||||
"version": "3.9.1"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
|
||||
@@ -8,7 +8,7 @@
|
||||
"# Hugging Face prompt injection identification\n",
|
||||
"\n",
|
||||
"This notebook shows how to prevent prompt injection attacks using the text classification model from `HuggingFace`.\n",
|
||||
"By default it uses a *deberta* model trained to identify prompt injections. In this walkthrough we'll use https://huggingface.co/laiyer/deberta-v3-base-prompt-injection."
|
||||
"It exploits the *deberta* model trained to identify prompt injections: https://huggingface.co/deepset/deberta-v3-base-injection"
|
||||
]
|
||||
},
|
||||
{
|
||||
@@ -21,37 +21,19 @@
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"execution_count": 1,
|
||||
"id": "aea25588-3c3f-4506-9094-221b3a0d519b",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"application/vnd.jupyter.widget-view+json": {
|
||||
"model_id": "58ab3557623a495d8cc3c3e32a61938f",
|
||||
"version_major": 2,
|
||||
"version_minor": 0
|
||||
},
|
||||
"text/plain": [
|
||||
"Downloading config.json: 0%| | 0.00/994 [00:00<?, ?B/s]"
|
||||
"'hugging_face_injection_identifier'"
|
||||
]
|
||||
},
|
||||
"execution_count": 1,
|
||||
"metadata": {},
|
||||
"output_type": "display_data"
|
||||
},
|
||||
{
|
||||
"data": {
|
||||
"application/vnd.jupyter.widget-view+json": {
|
||||
"model_id": "3bf062f02d304ab5a485a2a228b4cf41",
|
||||
"version_major": 2,
|
||||
"version_minor": 0
|
||||
},
|
||||
"text/plain": [
|
||||
"Downloading model.safetensors: 0%| | 0.00/738M [00:00<?, ?B/s]"
|
||||
]
|
||||
},
|
||||
"metadata": {},
|
||||
"output_type": "display_data"
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
@@ -59,10 +41,7 @@
|
||||
" HuggingFaceInjectionIdentifier,\n",
|
||||
")\n",
|
||||
"\n",
|
||||
"# Using https://huggingface.co/laiyer/deberta-v3-base-prompt-injection\n",
|
||||
"injection_identifier = HuggingFaceInjectionIdentifier(\n",
|
||||
" model=\"laiyer/deberta-v3-base-prompt-injection\"\n",
|
||||
")\n",
|
||||
"injection_identifier = HuggingFaceInjectionIdentifier()\n",
|
||||
"injection_identifier.name"
|
||||
]
|
||||
},
|
||||
@@ -320,9 +299,9 @@
|
||||
],
|
||||
"metadata": {
|
||||
"kernelspec": {
|
||||
"display_name": "poetry-venv",
|
||||
"display_name": "Python 3 (ipykernel)",
|
||||
"language": "python",
|
||||
"name": "poetry-venv"
|
||||
"name": "python3"
|
||||
},
|
||||
"language_info": {
|
||||
"codemirror_mode": {
|
||||
@@ -334,7 +313,7 @@
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.9.1"
|
||||
"version": "3.10.12"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
|
||||
@@ -1,318 +0,0 @@
|
||||
{
|
||||
"cells": [
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "700a516b",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# OpenAI Adapter\n",
|
||||
"\n",
|
||||
"**Please ensure OpenAI library is version 1.0.0 or higher; otherwise, refer to the older doc [OpenAI Adapter(Old)](./openai-old.ipynb).**\n",
|
||||
"\n",
|
||||
"A lot of people get started with OpenAI but want to explore other models. LangChain's integrations with many model providers make this easy to do so. While LangChain has it's own message and model APIs, we've also made it as easy as possible to explore other models by exposing an adapter to adapt LangChain models to the OpenAI api.\n",
|
||||
"\n",
|
||||
"At the moment this only deals with output and does not return other information (token counts, stop reasons, etc)."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 1,
|
||||
"id": "6017f26a",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"import openai\n",
|
||||
"from langchain.adapters import openai as lc_openai"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "b522ceda",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## chat.completions.create"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 2,
|
||||
"id": "1d22eb61",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"messages = [{\"role\": \"user\", \"content\": \"hi\"}]"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "d550d3ad",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Original OpenAI call"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 3,
|
||||
"id": "012d81ae",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"{'content': 'Hello! How can I assist you today?',\n",
|
||||
" 'role': 'assistant',\n",
|
||||
" 'function_call': None,\n",
|
||||
" 'tool_calls': None}"
|
||||
]
|
||||
},
|
||||
"execution_count": 3,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"result = openai.chat.completions.create(\n",
|
||||
" messages=messages, model=\"gpt-3.5-turbo\", temperature=0\n",
|
||||
")\n",
|
||||
"result.choices[0].message.model_dump()"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "db5b5500",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"LangChain OpenAI wrapper call"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 4,
|
||||
"id": "c67a5ac8",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"{'role': 'assistant', 'content': 'Hello! How can I help you today?'}"
|
||||
]
|
||||
},
|
||||
"execution_count": 4,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"lc_result = lc_openai.chat.completions.create(\n",
|
||||
" messages=messages, model=\"gpt-3.5-turbo\", temperature=0\n",
|
||||
")\n",
|
||||
"\n",
|
||||
"lc_result.choices[0].message # Attribute access"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 5,
|
||||
"id": "37a6e461-8608-47f6-ac45-12ad753c062a",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"{'role': 'assistant', 'content': 'Hello! How can I help you today?'}"
|
||||
]
|
||||
},
|
||||
"execution_count": 5,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"lc_result[\"choices\"][0][\"message\"] # Also compatible with index access"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "034ba845",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Swapping out model providers"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 6,
|
||||
"id": "f7c94827",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"{'role': 'assistant', 'content': 'Hello! How can I assist you today?'}"
|
||||
]
|
||||
},
|
||||
"execution_count": 6,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"lc_result = lc_openai.chat.completions.create(\n",
|
||||
" messages=messages, model=\"claude-2\", temperature=0, provider=\"ChatAnthropic\"\n",
|
||||
")\n",
|
||||
"lc_result.choices[0].message"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "cb3f181d",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## chat.completions.stream"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "f7b8cd18",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Original OpenAI call"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 7,
|
||||
"id": "fd8cb1ea",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"{'content': '', 'function_call': None, 'role': 'assistant', 'tool_calls': None}\n",
|
||||
"{'content': 'Hello', 'function_call': None, 'role': None, 'tool_calls': None}\n",
|
||||
"{'content': '!', 'function_call': None, 'role': None, 'tool_calls': None}\n",
|
||||
"{'content': ' How', 'function_call': None, 'role': None, 'tool_calls': None}\n",
|
||||
"{'content': ' can', 'function_call': None, 'role': None, 'tool_calls': None}\n",
|
||||
"{'content': ' I', 'function_call': None, 'role': None, 'tool_calls': None}\n",
|
||||
"{'content': ' assist', 'function_call': None, 'role': None, 'tool_calls': None}\n",
|
||||
"{'content': ' you', 'function_call': None, 'role': None, 'tool_calls': None}\n",
|
||||
"{'content': ' today', 'function_call': None, 'role': None, 'tool_calls': None}\n",
|
||||
"{'content': '?', 'function_call': None, 'role': None, 'tool_calls': None}\n",
|
||||
"{'content': None, 'function_call': None, 'role': None, 'tool_calls': None}\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"for c in openai.chat.completions.create(\n",
|
||||
" messages=messages, model=\"gpt-3.5-turbo\", temperature=0, stream=True\n",
|
||||
"):\n",
|
||||
" print(c.choices[0].delta.model_dump())"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "0b2a076b",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"LangChain OpenAI wrapper call"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 8,
|
||||
"id": "9521218c",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"{'role': 'assistant', 'content': ''}\n",
|
||||
"{'content': 'Hello'}\n",
|
||||
"{'content': '!'}\n",
|
||||
"{'content': ' How'}\n",
|
||||
"{'content': ' can'}\n",
|
||||
"{'content': ' I'}\n",
|
||||
"{'content': ' assist'}\n",
|
||||
"{'content': ' you'}\n",
|
||||
"{'content': ' today'}\n",
|
||||
"{'content': '?'}\n",
|
||||
"{}\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"for c in lc_openai.chat.completions.create(\n",
|
||||
" messages=messages, model=\"gpt-3.5-turbo\", temperature=0, stream=True\n",
|
||||
"):\n",
|
||||
" print(c.choices[0].delta)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "0fc39750",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Swapping out model providers"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 9,
|
||||
"id": "68f0214e",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"{'role': 'assistant', 'content': ''}\n",
|
||||
"{'content': 'Hello'}\n",
|
||||
"{'content': '!'}\n",
|
||||
"{'content': ' How'}\n",
|
||||
"{'content': ' can'}\n",
|
||||
"{'content': ' I'}\n",
|
||||
"{'content': ' assist'}\n",
|
||||
"{'content': ' you'}\n",
|
||||
"{'content': ' today'}\n",
|
||||
"{'content': '?'}\n",
|
||||
"{}\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"for c in lc_openai.chat.completions.create(\n",
|
||||
" messages=messages,\n",
|
||||
" model=\"claude-2\",\n",
|
||||
" temperature=0,\n",
|
||||
" stream=True,\n",
|
||||
" provider=\"ChatAnthropic\",\n",
|
||||
"):\n",
|
||||
" print(c[\"choices\"][0][\"delta\"])"
|
||||
]
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
"kernelspec": {
|
||||
"display_name": "Python 3 (ipykernel)",
|
||||
"language": "python",
|
||||
"name": "python3"
|
||||
},
|
||||
"language_info": {
|
||||
"codemirror_mode": {
|
||||
"name": "ipython",
|
||||
"version": 3
|
||||
},
|
||||
"file_extension": ".py",
|
||||
"mimetype": "text/x-python",
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.11.5"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
"nbformat_minor": 5
|
||||
}
|
||||
@@ -7,15 +7,7 @@
|
||||
"source": [
|
||||
"# Bedrock Chat\n",
|
||||
"\n",
|
||||
">[Amazon Bedrock](https://aws.amazon.com/bedrock/) is a fully managed service that offers a choice of \n",
|
||||
"> high-performing foundation models (FMs) from leading AI companies like `AI21 Labs`, `Anthropic`, `Cohere`, \n",
|
||||
"> `Meta`, `Stability AI`, and `Amazon` via a single API, along with a broad set of capabilities you need to \n",
|
||||
"> build generative AI applications with security, privacy, and responsible AI. Using `Amazon Bedrock`, \n",
|
||||
"> you can easily experiment with and evaluate top FMs for your use case, privately customize them with \n",
|
||||
"> your data using techniques such as fine-tuning and `Retrieval Augmented Generation` (`RAG`), and build \n",
|
||||
"> agents that execute tasks using your enterprise systems and data sources. Since `Amazon Bedrock` is \n",
|
||||
"> serverless, you don't have to manage any infrastructure, and you can securely integrate and deploy \n",
|
||||
"> generative AI capabilities into your applications using the AWS services you are already familiar with.\n"
|
||||
"[Amazon Bedrock](https://aws.amazon.com/bedrock/) is a fully managed service that makes FMs from leading AI startups and Amazon available via an API, so you can choose from a wide range of FMs to find the model that is best suited for your use case"
|
||||
]
|
||||
},
|
||||
{
|
||||
@@ -139,7 +131,7 @@
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.10.12"
|
||||
"version": "3.10.9"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
|
||||
@@ -7,19 +7,7 @@
|
||||
"# ERNIE-Bot Chat\n",
|
||||
"\n",
|
||||
"[ERNIE-Bot](https://cloud.baidu.com/doc/WENXINWORKSHOP/s/jlil56u11) is a large language model developed by Baidu, covering a huge amount of Chinese data.\n",
|
||||
"This notebook covers how to get started with ErnieBot chat models.\n",
|
||||
"\n",
|
||||
"**Note:** We recommend users using this class to switch to [Baidu Qianfan](./baidu_qianfan_endpoint). they are 3 why we recommend users to use `QianfanChatEndpoint`:\n",
|
||||
"1. `QianfanChatEndpoint` support more LLM in the Qianfan platform.\n",
|
||||
"2. `QianfanChatEndpoint` support streaming mode.\n",
|
||||
"3. `QianfanChatEndpoint` support function calling usgage.\n",
|
||||
"\n",
|
||||
"Some tips for migration:\n",
|
||||
"- change `ernie_client_id` to `qianfan_ak`, also change `ernie_client_secret` to `qianfan_sk`.\n",
|
||||
"- install `qianfan` package. \n",
|
||||
" ```\n",
|
||||
" pip install qianfan\n",
|
||||
" ```"
|
||||
"This notebook covers how to get started with ErnieBot chat models."
|
||||
]
|
||||
},
|
||||
{
|
||||
|
||||
@@ -119,159 +119,6 @@
|
||||
"chat_model(messages)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Extraction\n",
|
||||
" \n",
|
||||
"Update your version of Ollama and supply the [`format`](https://github.com/jmorganca/ollama/blob/main/docs/api.md#json-mode) flag.\n",
|
||||
"\n",
|
||||
"We can enforce the model to produce JSON.\n",
|
||||
"\n",
|
||||
"**Note:** You can also try out the experimental [OllamaFunctions](https://python.langchain.com/docs/integrations/chat/ollama_functions) wrapper for convenience."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 8,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langchain.callbacks.manager import CallbackManager\n",
|
||||
"from langchain.callbacks.streaming_stdout import StreamingStdOutCallbackHandler\n",
|
||||
"from langchain.chat_models import ChatOllama\n",
|
||||
"\n",
|
||||
"chat_model = ChatOllama(\n",
|
||||
" model=\"llama2\",\n",
|
||||
" format=\"json\",\n",
|
||||
" callback_manager=CallbackManager([StreamingStdOutCallbackHandler()]),\n",
|
||||
")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 11,
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
" Sure! Here's a JSON response with the colors of the sky at different times of the day:\n",
|
||||
" Begriffe und Abkürzungen:\n",
|
||||
"\n",
|
||||
"* `time`: The time of day (in 24-hour format)\n",
|
||||
"* `sky_color`: The color of the sky at that time (as a hex code)\n",
|
||||
"\n",
|
||||
"Here are the colors of the sky at different times of the day:\n",
|
||||
"```json\n",
|
||||
"[\n",
|
||||
" {\n",
|
||||
" \"time\": \"6am\",\n",
|
||||
" \"sky_color\": \"#0080c0\"\n",
|
||||
" },\n",
|
||||
" {\n",
|
||||
" \"time\": \"9am\",\n",
|
||||
" \"sky_color\": \"#3498db\"\n",
|
||||
" },\n",
|
||||
" {\n",
|
||||
" \"time\": \"12pm\",\n",
|
||||
" \"sky_color\": \"#ef7c00\"\n",
|
||||
" },\n",
|
||||
" {\n",
|
||||
" \"time\": \"3pm\",\n",
|
||||
" \"sky_color\": \"#9564b6\"\n",
|
||||
" },\n",
|
||||
" {\n",
|
||||
" \"time\": \"6pm\",\n",
|
||||
" \"sky_color\": \"#e78ac3\"\n",
|
||||
" },\n",
|
||||
" {\n",
|
||||
" \"time\": \"9pm\",\n",
|
||||
" \"sky_color\": \"#5f006a\"\n",
|
||||
" }\n",
|
||||
"]\n",
|
||||
"```\n",
|
||||
"In this response, the `time` property is a string in 24-hour format, representing the time of day. The `sky_color` property is a hex code representing the color of the sky at that time. For example, at 6am, the sky is blue (#0080c0), while at 9pm, it's dark blue (#5f006a)."
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"from langchain.schema import HumanMessage\n",
|
||||
"\n",
|
||||
"messages = [\n",
|
||||
" HumanMessage(\n",
|
||||
" content=\"What color is the sky at different times of the day? Respond using JSON\"\n",
|
||||
" )\n",
|
||||
"]\n",
|
||||
"\n",
|
||||
"chat_model_response = chat_model(messages)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 9,
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
" Sure! Based on the JSON schema you provided, here's the information we can gather about a person named John who is 35 years old and loves pizza:\n",
|
||||
"\n",
|
||||
"**Name:** John\n",
|
||||
"\n",
|
||||
"**Age:** 35 (integer)\n",
|
||||
"\n",
|
||||
"**Favorite food:** Pizza (string)\n",
|
||||
"\n",
|
||||
"So, the JSON object for John would look like this:\n",
|
||||
"```json\n",
|
||||
"{\n",
|
||||
" \"name\": \"John\",\n",
|
||||
" \"age\": 35,\n",
|
||||
" \"fav_food\": \"pizza\"\n",
|
||||
"}\n",
|
||||
"```\n",
|
||||
"Note that we cannot provide additional information about John beyond what is specified in the schema. For example, we do not have any information about his gender, occupation, or address, as those fields are not included in the schema."
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"import json\n",
|
||||
"\n",
|
||||
"from langchain.schema import HumanMessage\n",
|
||||
"\n",
|
||||
"json_schema = {\n",
|
||||
" \"title\": \"Person\",\n",
|
||||
" \"description\": \"Identifying information about a person.\",\n",
|
||||
" \"type\": \"object\",\n",
|
||||
" \"properties\": {\n",
|
||||
" \"name\": {\"title\": \"Name\", \"description\": \"The person's name\", \"type\": \"string\"},\n",
|
||||
" \"age\": {\"title\": \"Age\", \"description\": \"The person's age\", \"type\": \"integer\"},\n",
|
||||
" \"fav_food\": {\n",
|
||||
" \"title\": \"Fav Food\",\n",
|
||||
" \"description\": \"The person's favorite food\",\n",
|
||||
" \"type\": \"string\",\n",
|
||||
" },\n",
|
||||
" },\n",
|
||||
" \"required\": [\"name\", \"age\"],\n",
|
||||
"}\n",
|
||||
"\n",
|
||||
"messages = [\n",
|
||||
" HumanMessage(\n",
|
||||
" content=\"Please tell me about a person using the following JSON schema:\"\n",
|
||||
" ),\n",
|
||||
" HumanMessage(content=json.dumps(json_schema, indent=2)),\n",
|
||||
" HumanMessage(\n",
|
||||
" content=\"Now, considering the schema, tell me about a person named John who is 35 years old and loves pizza.\"\n",
|
||||
" ),\n",
|
||||
"]\n",
|
||||
"\n",
|
||||
"chat_model_response = chat_model(messages)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
@@ -528,5 +375,5 @@
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
"nbformat_minor": 4
|
||||
"nbformat_minor": 2
|
||||
}
|
||||
|
||||
@@ -1,171 +0,0 @@
|
||||
{
|
||||
"cells": [
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# Ollama Functions\n",
|
||||
"\n",
|
||||
"This notebook shows how to use an experimental wrapper around Ollama that gives it the same API as OpenAI Functions.\n",
|
||||
"\n",
|
||||
"Note that more powerful and capable models will perform better with complex schema and/or multiple functions. The examples below use Mistral.\n",
|
||||
"For a complete list of supported models and model variants, see the [Ollama model library](https://ollama.ai/library).\n",
|
||||
"\n",
|
||||
"## Setup\n",
|
||||
"\n",
|
||||
"Follow [these instructions](https://github.com/jmorganca/ollama) to set up and run a local Ollama instance.\n",
|
||||
"\n",
|
||||
"## Usage\n",
|
||||
"\n",
|
||||
"You can initialize OllamaFunctions in a similar way to how you'd initialize a standard ChatOllama instance:"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 1,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langchain_experimental.llms.ollama_functions import OllamaFunctions\n",
|
||||
"\n",
|
||||
"model = OllamaFunctions(model=\"mistral\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"You can then bind functions defined with JSON Schema parameters and a `function_call` parameter to force the model to call the given function:"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 2,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"model = model.bind(\n",
|
||||
" functions=[\n",
|
||||
" {\n",
|
||||
" \"name\": \"get_current_weather\",\n",
|
||||
" \"description\": \"Get the current weather in a given location\",\n",
|
||||
" \"parameters\": {\n",
|
||||
" \"type\": \"object\",\n",
|
||||
" \"properties\": {\n",
|
||||
" \"location\": {\n",
|
||||
" \"type\": \"string\",\n",
|
||||
" \"description\": \"The city and state, \" \"e.g. San Francisco, CA\",\n",
|
||||
" },\n",
|
||||
" \"unit\": {\n",
|
||||
" \"type\": \"string\",\n",
|
||||
" \"enum\": [\"celsius\", \"fahrenheit\"],\n",
|
||||
" },\n",
|
||||
" },\n",
|
||||
" \"required\": [\"location\"],\n",
|
||||
" },\n",
|
||||
" }\n",
|
||||
" ],\n",
|
||||
" function_call={\"name\": \"get_current_weather\"},\n",
|
||||
")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Calling a function with this model then results in JSON output matching the provided schema:"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 3,
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"AIMessage(content='', additional_kwargs={'function_call': {'name': 'get_current_weather', 'arguments': '{\"location\": \"Boston, MA\", \"unit\": \"celsius\"}'}})"
|
||||
]
|
||||
},
|
||||
"execution_count": 3,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"from langchain.schema import HumanMessage\n",
|
||||
"\n",
|
||||
"model.invoke(\"what is the weather in Boston?\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Using for extraction\n",
|
||||
"\n",
|
||||
"One useful thing you can do with function calling here is extracting properties from a given input in a structured format:"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 4,
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"[{'name': 'Alex', 'height': 5, 'hair_color': 'blonde'},\n",
|
||||
" {'name': 'Claudia', 'height': 6, 'hair_color': 'brunette'}]"
|
||||
]
|
||||
},
|
||||
"execution_count": 4,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"from langchain.chains import create_extraction_chain\n",
|
||||
"\n",
|
||||
"# Schema\n",
|
||||
"schema = {\n",
|
||||
" \"properties\": {\n",
|
||||
" \"name\": {\"type\": \"string\"},\n",
|
||||
" \"height\": {\"type\": \"integer\"},\n",
|
||||
" \"hair_color\": {\"type\": \"string\"},\n",
|
||||
" },\n",
|
||||
" \"required\": [\"name\", \"height\"],\n",
|
||||
"}\n",
|
||||
"\n",
|
||||
"# Input\n",
|
||||
"input = \"\"\"Alex is 5 feet tall. Claudia is 1 feet taller than Alex and jumps higher than him. Claudia is a brunette and Alex is blonde.\"\"\"\n",
|
||||
"\n",
|
||||
"# Run chain\n",
|
||||
"llm = OllamaFunctions(model=\"mistral\", temperature=0)\n",
|
||||
"chain = create_extraction_chain(schema, llm)\n",
|
||||
"chain.run(input)"
|
||||
]
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
"kernelspec": {
|
||||
"display_name": ".venv",
|
||||
"language": "python",
|
||||
"name": "python3"
|
||||
},
|
||||
"language_info": {
|
||||
"codemirror_mode": {
|
||||
"name": "ipython",
|
||||
"version": 3
|
||||
},
|
||||
"file_extension": ".py",
|
||||
"mimetype": "text/x-python",
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.10.5"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
"nbformat_minor": 2
|
||||
}
|
||||
@@ -1,177 +0,0 @@
|
||||
{
|
||||
"cells": [
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "404758628c7b20f6",
|
||||
"metadata": {
|
||||
"collapsed": false
|
||||
},
|
||||
"source": [
|
||||
"# Volc Engine Maas\n",
|
||||
"\n",
|
||||
"This notebook provides you with a guide on how to get started with volc engine maas chat models."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "2cd2ebd9d023c4d3",
|
||||
"metadata": {
|
||||
"collapsed": false
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# Install the package\n",
|
||||
"!pip install volcengine"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 20,
|
||||
"id": "51e7f967cb78f5b7",
|
||||
"metadata": {
|
||||
"ExecuteTime": {
|
||||
"end_time": "2023-11-27T10:43:37.131292Z",
|
||||
"start_time": "2023-11-27T10:43:37.127250Z"
|
||||
},
|
||||
"collapsed": false
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langchain.chat_models import VolcEngineMaasChat\n",
|
||||
"from langchain.schema import HumanMessage"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 25,
|
||||
"id": "139667d44689f9e0",
|
||||
"metadata": {
|
||||
"ExecuteTime": {
|
||||
"end_time": "2023-11-27T10:43:49.911867Z",
|
||||
"start_time": "2023-11-27T10:43:49.908329Z"
|
||||
},
|
||||
"collapsed": false
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"chat = VolcEngineMaasChat(volc_engine_maas_ak=\"your ak\", volc_engine_maas_sk=\"your sk\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "e84ebc4feedcc739",
|
||||
"metadata": {
|
||||
"collapsed": false
|
||||
},
|
||||
"source": [
|
||||
"or you can set access_key and secret_key in your environment variables\n",
|
||||
"```bash\n",
|
||||
"export VOLC_ACCESSKEY=YOUR_AK\n",
|
||||
"export VOLC_SECRETKEY=YOUR_SK\n",
|
||||
"```"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 26,
|
||||
"id": "35da18414ad17aa0",
|
||||
"metadata": {
|
||||
"ExecuteTime": {
|
||||
"end_time": "2023-11-27T10:43:53.101852Z",
|
||||
"start_time": "2023-11-27T10:43:51.741041Z"
|
||||
},
|
||||
"collapsed": false
|
||||
},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"text/plain": "AIMessage(content='好的,这是一个笑话:\\n\\n为什么鸟儿不会玩电脑游戏?\\n\\n因为它们没有翅膀!')"
|
||||
},
|
||||
"execution_count": 26,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"chat([HumanMessage(content=\"给我讲个笑话\")])"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "a55e5a9ed80ec49e",
|
||||
"metadata": {
|
||||
"collapsed": false
|
||||
},
|
||||
"source": [
|
||||
"# volc engine maas chat with stream"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 27,
|
||||
"id": "b4e4049980ac68ef",
|
||||
"metadata": {
|
||||
"ExecuteTime": {
|
||||
"end_time": "2023-11-27T10:43:55.120405Z",
|
||||
"start_time": "2023-11-27T10:43:55.114707Z"
|
||||
},
|
||||
"collapsed": false
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"chat = VolcEngineMaasChat(\n",
|
||||
" volc_engine_maas_ak=\"your ak\",\n",
|
||||
" volc_engine_maas_sk=\"your sk\",\n",
|
||||
" streaming=True,\n",
|
||||
")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 28,
|
||||
"id": "fe709a4ffb5c811d",
|
||||
"metadata": {
|
||||
"ExecuteTime": {
|
||||
"end_time": "2023-11-27T10:43:58.775294Z",
|
||||
"start_time": "2023-11-27T10:43:56.799401Z"
|
||||
},
|
||||
"collapsed": false
|
||||
},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"text/plain": "AIMessage(content='好的,这是一个笑话:\\n\\n三岁的女儿说她会造句了,妈妈让她用“年轻”造句,女儿说:“妈妈减肥,一年轻了好几斤”。')"
|
||||
},
|
||||
"execution_count": 28,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"chat([HumanMessage(content=\"给我讲个笑话\")])"
|
||||
]
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
"kernelspec": {
|
||||
"display_name": "Python 3",
|
||||
"language": "python",
|
||||
"name": "python3"
|
||||
},
|
||||
"language_info": {
|
||||
"codemirror_mode": {
|
||||
"name": "ipython",
|
||||
"version": 2
|
||||
},
|
||||
"file_extension": ".py",
|
||||
"mimetype": "text/x-python",
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython2",
|
||||
"version": "2.7.6"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
"nbformat_minor": 5
|
||||
}
|
||||
@@ -30,7 +30,7 @@
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"Writing discord_chats.txt\n"
|
||||
"Overwriting discord_chats.txt\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
@@ -240,14 +240,14 @@
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"[{'messages': [AIMessage(content='Love music! Do you like jazz?', additional_kwargs={'sender': 'talkingtower', 'events': [{'message_time': '08/15/2023 11:10 AM\\n'}]}),\n",
|
||||
" HumanMessage(content='Yes! Jazz is fantastic. Ever heard this one?\\nWebsite\\nListen to classic jazz track...', additional_kwargs={'sender': 'reporterbob', 'events': [{'message_time': '08/15/2023 9:27 PM\\n'}]}),\n",
|
||||
" AIMessage(content='Indeed! Great choice. 🎷', additional_kwargs={'sender': 'talkingtower', 'events': [{'message_time': 'Yesterday at 5:03 AM\\n'}]}),\n",
|
||||
" HumanMessage(content='Thanks! How about some virtual sightseeing?\\nWebsite\\nVirtual tour of famous landmarks...', additional_kwargs={'sender': 'reporterbob', 'events': [{'message_time': 'Yesterday at 5:23 AM\\n'}]}),\n",
|
||||
" AIMessage(content=\"Sounds fun! Let's explore.\", additional_kwargs={'sender': 'talkingtower', 'events': [{'message_time': 'Today at 2:38 PM\\n'}]}),\n",
|
||||
" HumanMessage(content='Enjoy the tour! See you around.', additional_kwargs={'sender': 'reporterbob', 'events': [{'message_time': 'Today at 2:56 PM\\n'}]}),\n",
|
||||
" AIMessage(content='Thank you! Goodbye! 👋', additional_kwargs={'sender': 'talkingtower', 'events': [{'message_time': 'Today at 3:00 PM\\n'}]}),\n",
|
||||
" HumanMessage(content='Farewell! Happy exploring.', additional_kwargs={'sender': 'reporterbob', 'events': [{'message_time': 'Today at 3:02 PM\\n'}]})]}]"
|
||||
"[{'messages': [AIMessage(content='Love music! Do you like jazz?', additional_kwargs={'sender': 'talkingtower', 'events': [{'message_time': '08/15/2023 11:10 AM\\n'}]}, example=False),\n",
|
||||
" HumanMessage(content='Yes! Jazz is fantastic. Ever heard this one?\\nWebsite\\nListen to classic jazz track...', additional_kwargs={'sender': 'reporterbob', 'events': [{'message_time': '08/15/2023 9:27 PM\\n'}]}, example=False),\n",
|
||||
" AIMessage(content='Indeed! Great choice. 🎷', additional_kwargs={'sender': 'talkingtower', 'events': [{'message_time': 'Yesterday at 5:03 AM\\n'}]}, example=False),\n",
|
||||
" HumanMessage(content='Thanks! How about some virtual sightseeing?\\nWebsite\\nVirtual tour of famous landmarks...', additional_kwargs={'sender': 'reporterbob', 'events': [{'message_time': 'Yesterday at 5:23 AM\\n'}]}, example=False),\n",
|
||||
" AIMessage(content=\"Sounds fun! Let's explore.\", additional_kwargs={'sender': 'talkingtower', 'events': [{'message_time': 'Today at 2:38 PM\\n'}]}, example=False),\n",
|
||||
" HumanMessage(content='Enjoy the tour! See you around.', additional_kwargs={'sender': 'reporterbob', 'events': [{'message_time': 'Today at 2:56 PM\\n'}]}, example=False),\n",
|
||||
" AIMessage(content='Thank you! Goodbye! 👋', additional_kwargs={'sender': 'talkingtower', 'events': [{'message_time': 'Today at 3:00 PM\\n'}]}, example=False),\n",
|
||||
" HumanMessage(content='Farewell! Happy exploring.', additional_kwargs={'sender': 'reporterbob', 'events': [{'message_time': 'Today at 3:02 PM\\n'}]}, example=False)]}]"
|
||||
]
|
||||
},
|
||||
"execution_count": 5,
|
||||
@@ -279,7 +279,7 @@
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"Thank you! Have a great day!"
|
||||
"Thank you! Have a wonderful day! 🌟"
|
||||
]
|
||||
}
|
||||
],
|
||||
@@ -317,7 +317,7 @@
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.11.5"
|
||||
"version": "3.11.2"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
|
||||
@@ -32,7 +32,7 @@
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 1,
|
||||
"execution_count": 2,
|
||||
"id": "647f2158-a42e-4634-b283-b8492caf542a",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
@@ -91,7 +91,7 @@
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 2,
|
||||
"execution_count": 1,
|
||||
"id": "a0869bc6",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
@@ -114,7 +114,7 @@
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 4,
|
||||
"execution_count": 6,
|
||||
"id": "f61ee277",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
@@ -126,19 +126,19 @@
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 5,
|
||||
"execution_count": 9,
|
||||
"id": "ec466ad7",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"[HumanMessage(content=\"Hi Hermione! How's your summer going so far?\", additional_kwargs={'sender': 'Harry Potter'}),\n",
|
||||
" HumanMessage(content=\"Harry! Lovely to hear from you. My summer is going well, though I do miss everyone. I'm spending most of my time going through my books and researching fascinating new topics. How about you?\", additional_kwargs={'sender': 'Hermione Granger'}),\n",
|
||||
" HumanMessage(content=\"I miss you all too. The Dursleys are being their usual unpleasant selves but I'm getting by. At least I can practice some spells in my room without them knowing. Let me know if you find anything good in your researching!\", additional_kwargs={'sender': 'Harry Potter'})]"
|
||||
"[HumanMessage(content=\"Hi Hermione! How's your summer going so far?\", additional_kwargs={'sender': 'Harry Potter'}, example=False),\n",
|
||||
" HumanMessage(content=\"Harry! Lovely to hear from you. My summer is going well, though I do miss everyone. I'm spending most of my time going through my books and researching fascinating new topics. How about you?\", additional_kwargs={'sender': 'Hermione Granger'}, example=False),\n",
|
||||
" HumanMessage(content=\"I miss you all too. The Dursleys are being their usual unpleasant selves but I'm getting by. At least I can practice some spells in my room without them knowing. Let me know if you find anything good in your researching!\", additional_kwargs={'sender': 'Harry Potter'}, example=False)]"
|
||||
]
|
||||
},
|
||||
"execution_count": 5,
|
||||
"execution_count": 9,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
@@ -150,7 +150,7 @@
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 6,
|
||||
"execution_count": 10,
|
||||
"id": "8a3ee473",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
@@ -162,7 +162,7 @@
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 7,
|
||||
"execution_count": 12,
|
||||
"id": "9f41e122",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
@@ -172,7 +172,7 @@
|
||||
"9"
|
||||
]
|
||||
},
|
||||
"execution_count": 7,
|
||||
"execution_count": 12,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
@@ -196,7 +196,7 @@
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 8,
|
||||
"execution_count": 14,
|
||||
"id": "5a78030d-b757-4bbe-8a6c-841056f46df7",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
@@ -209,7 +209,7 @@
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 9,
|
||||
"execution_count": 17,
|
||||
"id": "ff35b028-78bf-4c5b-9ec6-939fe67de7f7",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
@@ -220,19 +220,19 @@
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 10,
|
||||
"execution_count": 19,
|
||||
"id": "4b11906e-a496-4d01-9f0d-1938c14147bf",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"[AIMessage(content=\"Professor Snape, I was hoping I could speak with you for a moment about something that's been concerning me lately.\", additional_kwargs={'sender': 'Harry Potter'}),\n",
|
||||
" HumanMessage(content=\"What is it, Potter? I'm quite busy at the moment.\", additional_kwargs={'sender': 'Severus Snape'}),\n",
|
||||
" AIMessage(content=\"I apologize for the interruption, sir. I'll be brief. I've noticed some strange activity around the school grounds at night. I saw a cloaked figure lurking near the Forbidden Forest last night. I'm worried someone may be plotting something sinister.\", additional_kwargs={'sender': 'Harry Potter'})]"
|
||||
"[AIMessage(content=\"Professor Snape, I was hoping I could speak with you for a moment about something that's been concerning me lately.\", additional_kwargs={'sender': 'Harry Potter'}, example=False),\n",
|
||||
" HumanMessage(content=\"What is it, Potter? I'm quite busy at the moment.\", additional_kwargs={'sender': 'Severus Snape'}, example=False),\n",
|
||||
" AIMessage(content=\"I apologize for the interruption, sir. I'll be brief. I've noticed some strange activity around the school grounds at night. I saw a cloaked figure lurking near the Forbidden Forest last night. I'm worried someone may be plotting something sinister.\", additional_kwargs={'sender': 'Harry Potter'}, example=False)]"
|
||||
]
|
||||
},
|
||||
"execution_count": 10,
|
||||
"execution_count": 19,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
@@ -253,7 +253,7 @@
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 11,
|
||||
"execution_count": 20,
|
||||
"id": "21372331",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
@@ -263,7 +263,7 @@
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 12,
|
||||
"execution_count": 38,
|
||||
"id": "92c5ae7a",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
@@ -282,7 +282,7 @@
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 13,
|
||||
"execution_count": 33,
|
||||
"id": "dfcbd181",
|
||||
"metadata": {
|
||||
"scrolled": true
|
||||
@@ -299,7 +299,7 @@
|
||||
" 'content': \"I apologize for the interruption, sir. I'll be brief. I've noticed some strange activity around the school grounds at night. I saw a cloaked figure lurking near the Forbidden Forest last night. I'm worried someone may be plotting something sinister.\"}]"
|
||||
]
|
||||
},
|
||||
"execution_count": 13,
|
||||
"execution_count": 33,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
@@ -321,7 +321,7 @@
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 14,
|
||||
"execution_count": 42,
|
||||
"id": "13cd290a-b1e9-4686-bb5e-d99de8b8612b",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
@@ -331,7 +331,7 @@
|
||||
"100"
|
||||
]
|
||||
},
|
||||
"execution_count": 14,
|
||||
"execution_count": 42,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
@@ -364,7 +364,7 @@
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 15,
|
||||
"execution_count": 43,
|
||||
"id": "95ce3f63-3c80-44b2-9060-534ad74e16fa",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
@@ -374,7 +374,7 @@
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 16,
|
||||
"execution_count": 58,
|
||||
"id": "ab9e28eb",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
@@ -382,7 +382,7 @@
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"File file-ULumAXLEFw3vB6bb9uy6DNVC ready after 0.00 seconds.\n"
|
||||
"File file-zCyNBeg4snpbBL7VkvsuhCz8 ready afer 30.55 seconds.\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
@@ -399,16 +399,16 @@
|
||||
" my_file.write((json.dumps({\"messages\": m}) + \"\\n\").encode(\"utf-8\"))\n",
|
||||
"\n",
|
||||
"my_file.seek(0)\n",
|
||||
"training_file = openai.files.create(file=my_file, purpose=\"fine-tune\")\n",
|
||||
"training_file = openai.File.create(file=my_file, purpose=\"fine-tune\")\n",
|
||||
"\n",
|
||||
"# OpenAI audits each training file for compliance reasons.\n",
|
||||
"# This make take a few minutes\n",
|
||||
"status = openai.files.retrieve(training_file.id).status\n",
|
||||
"status = openai.File.retrieve(training_file.id).status\n",
|
||||
"start_time = time.time()\n",
|
||||
"while status != \"processed\":\n",
|
||||
" print(f\"Status=[{status}]... {time.time() - start_time:.2f}s\", end=\"\\r\", flush=True)\n",
|
||||
" time.sleep(5)\n",
|
||||
" status = openai.files.retrieve(training_file.id).status\n",
|
||||
" status = openai.File.retrieve(training_file.id).status\n",
|
||||
"print(f\"File {training_file.id} ready after {time.time() - start_time:.2f} seconds.\")"
|
||||
]
|
||||
},
|
||||
@@ -422,12 +422,12 @@
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 17,
|
||||
"execution_count": 59,
|
||||
"id": "3f451425",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"job = openai.fine_tuning.jobs.create(\n",
|
||||
"job = openai.FineTuningJob.create(\n",
|
||||
" training_file=training_file.id,\n",
|
||||
" model=\"gpt-3.5-turbo\",\n",
|
||||
")"
|
||||
@@ -443,7 +443,7 @@
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 18,
|
||||
"execution_count": 60,
|
||||
"id": "bac1637a-c087-4523-ade1-c47f9bf4c6f4",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
@@ -451,23 +451,23 @@
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"Status=[running]... 874.29s. 56.93s\r"
|
||||
"Status=[running]... 908.87s\r"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"status = openai.fine_tuning.jobs.retrieve(job.id).status\n",
|
||||
"status = openai.FineTuningJob.retrieve(job.id).status\n",
|
||||
"start_time = time.time()\n",
|
||||
"while status != \"succeeded\":\n",
|
||||
" print(f\"Status=[{status}]... {time.time() - start_time:.2f}s\", end=\"\\r\", flush=True)\n",
|
||||
" time.sleep(5)\n",
|
||||
" job = openai.fine_tuning.jobs.retrieve(job.id)\n",
|
||||
" job = openai.FineTuningJob.retrieve(job.id)\n",
|
||||
" status = job.status"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 19,
|
||||
"execution_count": 66,
|
||||
"id": "535895e1-bc69-40e5-82ed-e24ed2baeeee",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
@@ -475,7 +475,7 @@
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"ft:gpt-3.5-turbo-0613:personal::8QnAzWMr\n"
|
||||
"ft:gpt-3.5-turbo-0613:personal::7rDwkaOq\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
@@ -495,7 +495,7 @@
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 20,
|
||||
"execution_count": 67,
|
||||
"id": "3925d60d",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
@@ -510,7 +510,7 @@
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 21,
|
||||
"execution_count": 69,
|
||||
"id": "7190cf2e-ab34-4ceb-bdad-45f24f069c29",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
@@ -529,7 +529,7 @@
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 22,
|
||||
"execution_count": 72,
|
||||
"id": "f02057e9-f914-40b1-9c9d-9432ff594b98",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
@@ -537,7 +537,7 @@
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"I'm taking Charms, Defense Against the Dark Arts, Herbology, Potions, Transfiguration, and Ancient Runes. How about you?"
|
||||
"The usual - Potions, Transfiguration, Defense Against the Dark Arts. What about you?"
|
||||
]
|
||||
}
|
||||
],
|
||||
@@ -545,6 +545,14 @@
|
||||
"for tok in chain.stream({\"input\": \"What classes are you taking?\"}):\n",
|
||||
" print(tok, end=\"\", flush=True)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "35331503-3cc6-4d64-955e-64afe6b5fef3",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": []
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
@@ -563,7 +571,7 @@
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.11.5"
|
||||
"version": "3.10.1"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
|
||||
@@ -243,16 +243,16 @@
|
||||
" my_file.write((json.dumps({\"messages\": m}) + \"\\n\").encode(\"utf-8\"))\n",
|
||||
"\n",
|
||||
"my_file.seek(0)\n",
|
||||
"training_file = openai.files.create(file=my_file, purpose=\"fine-tune\")\n",
|
||||
"training_file = openai.File.create(file=my_file, purpose=\"fine-tune\")\n",
|
||||
"\n",
|
||||
"# OpenAI audits each training file for compliance reasons.\n",
|
||||
"# This make take a few minutes\n",
|
||||
"status = openai.files.retrieve(training_file.id).status\n",
|
||||
"status = openai.File.retrieve(training_file.id).status\n",
|
||||
"start_time = time.time()\n",
|
||||
"while status != \"processed\":\n",
|
||||
" print(f\"Status=[{status}]... {time.time() - start_time:.2f}s\", end=\"\\r\", flush=True)\n",
|
||||
" time.sleep(5)\n",
|
||||
" status = openai.files.retrieve(training_file.id).status\n",
|
||||
" status = openai.File.retrieve(training_file.id).status\n",
|
||||
"print(f\"File {training_file.id} ready after {time.time() - start_time:.2f} seconds.\")"
|
||||
]
|
||||
},
|
||||
@@ -271,7 +271,7 @@
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"job = openai.fine_tuning.jobs.create(\n",
|
||||
"job = openai.FineTuningJob.create(\n",
|
||||
" training_file=training_file.id,\n",
|
||||
" model=\"gpt-3.5-turbo\",\n",
|
||||
")"
|
||||
@@ -300,12 +300,12 @@
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"status = openai.fine_tuning.jobs.retrieve(job.id).status\n",
|
||||
"status = openai.FineTuningJob.retrieve(job.id).status\n",
|
||||
"start_time = time.time()\n",
|
||||
"while status != \"succeeded\":\n",
|
||||
" print(f\"Status=[{status}]... {time.time() - start_time:.2f}s\", end=\"\\r\", flush=True)\n",
|
||||
" time.sleep(5)\n",
|
||||
" job = openai.fine_tuning.jobs.retrieve(job.id)\n",
|
||||
" job = openai.FineTuningJob.retrieve(job.id)\n",
|
||||
" status = job.status"
|
||||
]
|
||||
},
|
||||
@@ -416,7 +416,7 @@
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.11.5"
|
||||
"version": "3.11.2"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
|
||||
@@ -123,7 +123,7 @@
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 6,
|
||||
"execution_count": 10,
|
||||
"id": "817bc077-c18a-473b-94a4-a7d810d583a8",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
@@ -145,7 +145,7 @@
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 7,
|
||||
"execution_count": 11,
|
||||
"id": "9e5ac127-b094-4584-9159-5a6d3d7315c7",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
@@ -166,7 +166,7 @@
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 8,
|
||||
"execution_count": null,
|
||||
"id": "11d19e28-be49-4801-8065-1a58d13cd192",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
@@ -174,7 +174,7 @@
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"Status=[running]... 429.55s. 46.34s\r"
|
||||
"Status=[running]... 302.42s. 143.85s\r"
|
||||
]
|
||||
}
|
||||
],
|
||||
@@ -190,20 +190,20 @@
|
||||
" my_file.write((json.dumps({\"messages\": dialog}) + \"\\n\").encode(\"utf-8\"))\n",
|
||||
"\n",
|
||||
"my_file.seek(0)\n",
|
||||
"training_file = openai.files.create(file=my_file, purpose=\"fine-tune\")\n",
|
||||
"training_file = openai.File.create(file=my_file, purpose=\"fine-tune\")\n",
|
||||
"\n",
|
||||
"job = openai.fine_tuning.jobs.create(\n",
|
||||
"job = openai.FineTuningJob.create(\n",
|
||||
" training_file=training_file.id,\n",
|
||||
" model=\"gpt-3.5-turbo\",\n",
|
||||
")\n",
|
||||
"\n",
|
||||
"# Wait for the fine-tuning to complete (this may take some time)\n",
|
||||
"status = openai.fine_tuning.jobs.retrieve(job.id).status\n",
|
||||
"status = openai.FineTuningJob.retrieve(job.id).status\n",
|
||||
"start_time = time.time()\n",
|
||||
"while status != \"succeeded\":\n",
|
||||
" print(f\"Status=[{status}]... {time.time() - start_time:.2f}s\", end=\"\\r\", flush=True)\n",
|
||||
" time.sleep(5)\n",
|
||||
" status = openai.fine_tuning.jobs.retrieve(job.id).status\n",
|
||||
" status = openai.FineTuningJob.retrieve(job.id).status\n",
|
||||
"\n",
|
||||
"# Now your model is fine-tuned!"
|
||||
]
|
||||
@@ -220,18 +220,16 @@
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 10,
|
||||
"execution_count": null,
|
||||
"id": "3f472ca4-fa9b-485d-bd37-8ce3c59c44db",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# Get the fine-tuned model ID\n",
|
||||
"job = openai.fine_tuning.jobs.retrieve(job.id)\n",
|
||||
"job = openai.FineTuningJob.retrieve(job.id)\n",
|
||||
"model_id = job.fine_tuned_model\n",
|
||||
"\n",
|
||||
"# Use the fine-tuned model in LangChain\n",
|
||||
"from langchain.chat_models import ChatOpenAI\n",
|
||||
"\n",
|
||||
"model = ChatOpenAI(\n",
|
||||
" model=model_id,\n",
|
||||
" temperature=1,\n",
|
||||
@@ -240,21 +238,10 @@
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 11,
|
||||
"execution_count": null,
|
||||
"id": "7d3b5845-6385-42d1-9f7d-5ea798dc2cd9",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"AIMessage(content='[{\"s\": \"There were three ravens\", \"object\": \"tree\", \"relation\": \"sat on\"}, {\"s\": \"three ravens\", \"object\": \"a tree\", \"relation\": \"sat on\"}]')"
|
||||
]
|
||||
},
|
||||
"execution_count": 11,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"model.invoke(\"There were three ravens sat on a tree.\")"
|
||||
]
|
||||
@@ -284,7 +271,7 @@
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.11.5"
|
||||
"version": "3.9.1"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
|
||||
@@ -35,7 +35,7 @@
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 1,
|
||||
"execution_count": 2,
|
||||
"id": "473adce5-c863-49e6-85c3-049e0ec2222e",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
@@ -65,7 +65,7 @@
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 2,
|
||||
"execution_count": 3,
|
||||
"id": "9a36d27f-2f3b-4148-b94a-9436fe8b00e0",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
@@ -105,7 +105,7 @@
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 3,
|
||||
"execution_count": 4,
|
||||
"id": "89bcc676-27e8-40dc-a4d6-92cf28e0db58",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
@@ -144,7 +144,7 @@
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 4,
|
||||
"execution_count": 5,
|
||||
"id": "cd44ff01-22cf-431a-8bf4-29a758d1fcff",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
@@ -169,10 +169,18 @@
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 5,
|
||||
"execution_count": 6,
|
||||
"id": "62da7d8f-5cfc-45a6-946e-2bcda2b0ba1f",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stderr",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"Retrying langchain.chat_models.openai.ChatOpenAI.completion_with_retry.<locals>._completion_with_retry in 4.0 seconds as it raised ServiceUnavailableError: The server is overloaded or not ready yet..\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"math_questions = [\n",
|
||||
" \"What's 45/9?\",\n",
|
||||
@@ -211,7 +219,7 @@
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 6,
|
||||
"execution_count": 7,
|
||||
"id": "d6037992-050d-4ada-a061-860c124f0bf1",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
@@ -223,7 +231,7 @@
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 7,
|
||||
"execution_count": 8,
|
||||
"id": "0444919a-6f5a-4726-9916-4603b1420d0e",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
@@ -258,7 +266,7 @@
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 8,
|
||||
"execution_count": 9,
|
||||
"id": "817bc077-c18a-473b-94a4-a7d810d583a8",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
@@ -280,7 +288,7 @@
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 9,
|
||||
"execution_count": 10,
|
||||
"id": "9e5ac127-b094-4584-9159-5a6d3d7315c7",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
@@ -301,7 +309,7 @@
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 10,
|
||||
"execution_count": 11,
|
||||
"id": "11d19e28-be49-4801-8065-1a58d13cd192",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
@@ -309,7 +317,7 @@
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"Status=[running]... 349.84s. 17.72s\r"
|
||||
"Status=[running]... 346.26s. 31.70s\r"
|
||||
]
|
||||
}
|
||||
],
|
||||
@@ -325,20 +333,20 @@
|
||||
" my_file.write((json.dumps({\"messages\": dialog}) + \"\\n\").encode(\"utf-8\"))\n",
|
||||
"\n",
|
||||
"my_file.seek(0)\n",
|
||||
"training_file = openai.files.create(file=my_file, purpose=\"fine-tune\")\n",
|
||||
"training_file = openai.File.create(file=my_file, purpose=\"fine-tune\")\n",
|
||||
"\n",
|
||||
"job = openai.fine_tuning.jobs.create(\n",
|
||||
"job = openai.FineTuningJob.create(\n",
|
||||
" training_file=training_file.id,\n",
|
||||
" model=\"gpt-3.5-turbo\",\n",
|
||||
")\n",
|
||||
"\n",
|
||||
"# Wait for the fine-tuning to complete (this may take some time)\n",
|
||||
"status = openai.fine_tuning.jobs.retrieve(job.id).status\n",
|
||||
"status = openai.FineTuningJob.retrieve(job.id).status\n",
|
||||
"start_time = time.time()\n",
|
||||
"while status != \"succeeded\":\n",
|
||||
" print(f\"Status=[{status}]... {time.time() - start_time:.2f}s\", end=\"\\r\", flush=True)\n",
|
||||
" time.sleep(5)\n",
|
||||
" status = openai.fine_tuning.jobs.retrieve(job.id).status\n",
|
||||
" status = openai.FineTuningJob.retrieve(job.id).status\n",
|
||||
"\n",
|
||||
"# Now your model is fine-tuned!"
|
||||
]
|
||||
@@ -355,18 +363,16 @@
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 11,
|
||||
"execution_count": 12,
|
||||
"id": "7f45b281-1dfa-43cb-bd28-99fa7e9f45d1",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# Get the fine-tuned model ID\n",
|
||||
"job = openai.fine_tuning.jobs.retrieve(job.id)\n",
|
||||
"job = openai.FineTuningJob.retrieve(job.id)\n",
|
||||
"model_id = job.fine_tuned_model\n",
|
||||
"\n",
|
||||
"# Use the fine-tuned model in LangChain\n",
|
||||
"from langchain.chat_models import ChatOpenAI\n",
|
||||
"\n",
|
||||
"model = ChatOpenAI(\n",
|
||||
" model=model_id,\n",
|
||||
" temperature=1,\n",
|
||||
@@ -375,17 +381,17 @@
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 12,
|
||||
"execution_count": 18,
|
||||
"id": "7d3b5845-6385-42d1-9f7d-5ea798dc2cd9",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"AIMessage(content='Let me calculate that for you.')"
|
||||
"AIMessage(content='{\\n \"num1\": 56,\\n \"num2\": 7,\\n \"operation\": \"/\"\\n}')"
|
||||
]
|
||||
},
|
||||
"execution_count": 12,
|
||||
"execution_count": 18,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
@@ -419,7 +425,7 @@
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.11.5"
|
||||
"version": "3.9.1"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
|
||||
@@ -1,884 +0,0 @@
|
||||
{
|
||||
"cells": [
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "1f3cebbe-079a-4bfe-b1a1-07bdac882ce2",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# Amazon Textract \n",
|
||||
"\n",
|
||||
">[Amazon Textract](https://docs.aws.amazon.com/managedservices/latest/userguide/textract.html) is a machine learning (ML) service that automatically extracts text, handwriting, and data from scanned documents.\n",
|
||||
">\n",
|
||||
">It goes beyond simple optical character recognition (OCR) to identify, understand, and extract data from forms and tables. Today, many companies manually extract data from scanned documents such as PDFs, images, tables, and forms, or through simple OCR software that requires manual configuration (which often must be updated when the form changes). To overcome these manual and expensive processes, `Textract` uses ML to read and process any type of document, accurately extracting text, handwriting, tables, and other data with no manual effort. \n",
|
||||
"\n",
|
||||
"This sample demonstrates the use of `Amazon Textract` in combination with LangChain as a DocumentLoader.\n",
|
||||
"\n",
|
||||
"`Textract` supports`PDF`, `TIF`F, `PNG` and `JPEG` format.\n",
|
||||
"\n",
|
||||
"`Textract` supports these [document sizes, languages and characters](https://docs.aws.amazon.com/textract/latest/dg/limits-document.html)."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 1,
|
||||
"id": "a1aa66d4-85f2-42ad-a8d3-de7cea8d6c35",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"#!pip install boto3 openai tiktoken python-dotenv"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 2,
|
||||
"id": "e4305a0d-37da-41f9-a52c-7d166d7dbabf",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"#!pip install \"amazon-textract-caller>=0.2.0\""
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "400b25c6-befa-4730-a201-39ff112c8858",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Sample 1\n",
|
||||
"\n",
|
||||
"The first example uses a local file, which internally will be send to Amazon Textract sync API [DetectDocumentText](https://docs.aws.amazon.com/textract/latest/dg/API_DetectDocumentText.html). \n",
|
||||
"\n",
|
||||
"Local files or URL endpoints like HTTP:// are limited to one page documents for Textract.\n",
|
||||
"Multi-page documents have to reside on S3. This sample file is a jpeg."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "1becee92-e82f-42d4-9b4e-b23d77cbe88d",
|
||||
"metadata": {
|
||||
"tags": []
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langchain.document_loaders import AmazonTextractPDFLoader\n",
|
||||
"\n",
|
||||
"loader = AmazonTextractPDFLoader(\"example_data/alejandro_rosalez_sample-small.jpeg\")\n",
|
||||
"documents = loader.load()"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "d566dc56-c9a9-44ec-84fb-a81928f90d40",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Output from the file"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 10,
|
||||
"id": "1272ce8c-d298-4059-ac0a-780bf5f82302",
|
||||
"metadata": {
|
||||
"tags": []
|
||||
},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"[Document(page_content='Patient Information First Name: ALEJANDRO Last Name: ROSALEZ Date of Birth: 10/10/1982 Sex: M Marital Status: MARRIED Email Address: Address: 123 ANY STREET City: ANYTOWN State: CA Zip Code: 12345 Phone: 646-555-0111 Emergency Contact 1: First Name: CARLOS Last Name: SALAZAR Phone: 212-555-0150 Relationship to Patient: BROTHER Emergency Contact 2: First Name: JANE Last Name: DOE Phone: 650-555-0123 Relationship FRIEND to Patient: Did you feel fever or feverish lately? Yes No Are you having shortness of breath? Yes No Do you have a cough? Yes No Did you experience loss of taste or smell? Yes No Where you in contact with any confirmed COVID-19 positive patients? Yes No Did you travel in the past 14 days to any regions affected by COVID-19? Yes No Patient Information First Name: ALEJANDRO Last Name: ROSALEZ Date of Birth: 10/10/1982 Sex: M Marital Status: MARRIED Email Address: Address: 123 ANY STREET City: ANYTOWN State: CA Zip Code: 12345 Phone: 646-555-0111 Emergency Contact 1: First Name: CARLOS Last Name: SALAZAR Phone: 212-555-0150 Relationship to Patient: BROTHER Emergency Contact 2: First Name: JANE Last Name: DOE Phone: 650-555-0123 Relationship FRIEND to Patient: Did you feel fever or feverish lately? Yes No Are you having shortness of breath? Yes No Do you have a cough? Yes No Did you experience loss of taste or smell? Yes No Where you in contact with any confirmed COVID-19 positive patients? Yes No Did you travel in the past 14 days to any regions affected by COVID-19? Yes No ', metadata={'source': 'example_data/alejandro_rosalez_sample-small.jpeg', 'page': 1})]"
|
||||
]
|
||||
},
|
||||
"execution_count": 10,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"documents"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "4cf7f19c-3635-453a-9c76-4baf98b8d7f4",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Sample 2\n",
|
||||
"The next sample loads a file from an HTTPS endpoint. \n",
|
||||
"It has to be single page, as Amazon Textract requires all multi-page documents to be stored on S3."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 7,
|
||||
"id": "10374bfb-b325-451f-8bd0-c686710ab68c",
|
||||
"metadata": {
|
||||
"tags": []
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langchain.document_loaders import AmazonTextractPDFLoader\n",
|
||||
"\n",
|
||||
"loader = AmazonTextractPDFLoader(\n",
|
||||
" \"https://amazon-textract-public-content.s3.us-east-2.amazonaws.com/langchain/alejandro_rosalez_sample_1.jpg\"\n",
|
||||
")\n",
|
||||
"documents = loader.load()"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 11,
|
||||
"id": "16a2b6a3-7514-4c2c-a427-6847169af473",
|
||||
"metadata": {
|
||||
"tags": []
|
||||
},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"[Document(page_content='Patient Information First Name: ALEJANDRO Last Name: ROSALEZ Date of Birth: 10/10/1982 Sex: M Marital Status: MARRIED Email Address: Address: 123 ANY STREET City: ANYTOWN State: CA Zip Code: 12345 Phone: 646-555-0111 Emergency Contact 1: First Name: CARLOS Last Name: SALAZAR Phone: 212-555-0150 Relationship to Patient: BROTHER Emergency Contact 2: First Name: JANE Last Name: DOE Phone: 650-555-0123 Relationship FRIEND to Patient: Did you feel fever or feverish lately? Yes No Are you having shortness of breath? Yes No Do you have a cough? Yes No Did you experience loss of taste or smell? Yes No Where you in contact with any confirmed COVID-19 positive patients? Yes No Did you travel in the past 14 days to any regions affected by COVID-19? Yes No Patient Information First Name: ALEJANDRO Last Name: ROSALEZ Date of Birth: 10/10/1982 Sex: M Marital Status: MARRIED Email Address: Address: 123 ANY STREET City: ANYTOWN State: CA Zip Code: 12345 Phone: 646-555-0111 Emergency Contact 1: First Name: CARLOS Last Name: SALAZAR Phone: 212-555-0150 Relationship to Patient: BROTHER Emergency Contact 2: First Name: JANE Last Name: DOE Phone: 650-555-0123 Relationship FRIEND to Patient: Did you feel fever or feverish lately? Yes No Are you having shortness of breath? Yes No Do you have a cough? Yes No Did you experience loss of taste or smell? Yes No Where you in contact with any confirmed COVID-19 positive patients? Yes No Did you travel in the past 14 days to any regions affected by COVID-19? Yes No ', metadata={'source': 'example_data/alejandro_rosalez_sample-small.jpeg', 'page': 1})]"
|
||||
]
|
||||
},
|
||||
"execution_count": 11,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"documents"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "3a9cd8ec-e663-4dc7-9db1-d2f575253141",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Sample 3\n",
|
||||
"\n",
|
||||
"Processing a multi-page document requires the document to be on S3. The sample document resides in a bucket in us-east-2 and Textract needs to be called in that same region to be successful, so we set the region_name on the client and pass that in to the loader to ensure Textract is called from us-east-2. You could also to have your notebook running in us-east-2, setting the AWS_DEFAULT_REGION set to us-east-2 or when running in a different environment, pass in a boto3 Textract client with that region name like in the cell below."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 12,
|
||||
"id": "8185e3e6-9599-4a47-8969-d6dcef3e6404",
|
||||
"metadata": {
|
||||
"tags": []
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"import boto3\n",
|
||||
"\n",
|
||||
"textract_client = boto3.client(\"textract\", region_name=\"us-east-2\")\n",
|
||||
"\n",
|
||||
"file_path = \"s3://amazon-textract-public-content/langchain/layout-parser-paper.pdf\"\n",
|
||||
"loader = AmazonTextractPDFLoader(file_path, client=textract_client)\n",
|
||||
"documents = loader.load()"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "b8901eec-070d-4fd6-9d65-52211d332441",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Now getting the number of pages to validate the response (printing out the full response would be quite long...). We expect 16 pages."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 13,
|
||||
"id": "b23c01c8-cf69-4fe2-8141-4621edb7d79c",
|
||||
"metadata": {
|
||||
"tags": []
|
||||
},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"16"
|
||||
]
|
||||
},
|
||||
"execution_count": 13,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"len(documents)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "b3e41b4d-b159-4274-89be-80d8159134ef",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Using the AmazonTextractPDFLoader in an LangChain chain (e. g. OpenAI)\n",
|
||||
"\n",
|
||||
"The AmazonTextractPDFLoader can be used in a chain the same way the other loaders are used.\n",
|
||||
"Textract itself does have a [Query feature](https://docs.aws.amazon.com/textract/latest/dg/API_Query.html), which offers similar functionality to the QA chain in this sample, which is worth checking out as well."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 14,
|
||||
"id": "53c47b24-cc06-4256-9e5b-a82fc80bc55d",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# You can store your OPENAI_API_KEY in a .env file as well\n",
|
||||
"# import os\n",
|
||||
"# from dotenv import load_dotenv\n",
|
||||
"\n",
|
||||
"# load_dotenv()"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 15,
|
||||
"id": "a9ae004c-246c-4c7f-8458-191cd7424a9b",
|
||||
"metadata": {
|
||||
"tags": []
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# Or set the OpenAI key in the environment directly\n",
|
||||
"import os\n",
|
||||
"\n",
|
||||
"os.environ[\"OPENAI_API_KEY\"] = \"your-OpenAI-API-key\""
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 16,
|
||||
"id": "d52b089c-10ca-45fb-8669-8a1c5fee10d5",
|
||||
"metadata": {
|
||||
"tags": []
|
||||
},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"' The authors are Zejiang Shen, Ruochen Zhang, Melissa Dell, Benjamin Charles Germain Lee, Jacob Carlson, Weining Li, Gardner, M., Grus, J., Neumann, M., Tafjord, O., Dasigi, P., Liu, N., Peters, M., Schmitz, M., Zettlemoyer, L., Lukasz Garncarek, Powalski, R., Stanislawek, T., Topolski, B., Halama, P., Gralinski, F., Graves, A., Fernández, S., Gomez, F., Schmidhuber, J., Harley, A.W., Ufkes, A., Derpanis, K.G., He, K., Gkioxari, G., Dollár, P., Girshick, R., He, K., Zhang, X., Ren, S., Sun, J., Kay, A., Lamiroy, B., Lopresti, D., Mears, J., Jakeway, E., Ferriter, M., Adams, C., Yarasavage, N., Thomas, D., Zwaard, K., Li, M., Cui, L., Huang,'"
|
||||
]
|
||||
},
|
||||
"execution_count": 16,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"from langchain.chains.question_answering import load_qa_chain\n",
|
||||
"from langchain.llms import OpenAI\n",
|
||||
"\n",
|
||||
"chain = load_qa_chain(llm=OpenAI(), chain_type=\"map_reduce\")\n",
|
||||
"query = [\"Who are the autors?\"]\n",
|
||||
"\n",
|
||||
"chain.run(input_documents=documents, question=query)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "1a09d18b-ab7b-468e-ae66-f92abf666b9b",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": []
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
"availableInstances": [
|
||||
{
|
||||
"_defaultOrder": 0,
|
||||
"_isFastLaunch": true,
|
||||
"category": "General purpose",
|
||||
"gpuNum": 0,
|
||||
"hideHardwareSpecs": false,
|
||||
"memoryGiB": 4,
|
||||
"name": "ml.t3.medium",
|
||||
"vcpuNum": 2
|
||||
},
|
||||
{
|
||||
"_defaultOrder": 1,
|
||||
"_isFastLaunch": false,
|
||||
"category": "General purpose",
|
||||
"gpuNum": 0,
|
||||
"hideHardwareSpecs": false,
|
||||
"memoryGiB": 8,
|
||||
"name": "ml.t3.large",
|
||||
"vcpuNum": 2
|
||||
},
|
||||
{
|
||||
"_defaultOrder": 2,
|
||||
"_isFastLaunch": false,
|
||||
"category": "General purpose",
|
||||
"gpuNum": 0,
|
||||
"hideHardwareSpecs": false,
|
||||
"memoryGiB": 16,
|
||||
"name": "ml.t3.xlarge",
|
||||
"vcpuNum": 4
|
||||
},
|
||||
{
|
||||
"_defaultOrder": 3,
|
||||
"_isFastLaunch": false,
|
||||
"category": "General purpose",
|
||||
"gpuNum": 0,
|
||||
"hideHardwareSpecs": false,
|
||||
"memoryGiB": 32,
|
||||
"name": "ml.t3.2xlarge",
|
||||
"vcpuNum": 8
|
||||
},
|
||||
{
|
||||
"_defaultOrder": 4,
|
||||
"_isFastLaunch": true,
|
||||
"category": "General purpose",
|
||||
"gpuNum": 0,
|
||||
"hideHardwareSpecs": false,
|
||||
"memoryGiB": 8,
|
||||
"name": "ml.m5.large",
|
||||
"vcpuNum": 2
|
||||
},
|
||||
{
|
||||
"_defaultOrder": 5,
|
||||
"_isFastLaunch": false,
|
||||
"category": "General purpose",
|
||||
"gpuNum": 0,
|
||||
"hideHardwareSpecs": false,
|
||||
"memoryGiB": 16,
|
||||
"name": "ml.m5.xlarge",
|
||||
"vcpuNum": 4
|
||||
},
|
||||
{
|
||||
"_defaultOrder": 6,
|
||||
"_isFastLaunch": false,
|
||||
"category": "General purpose",
|
||||
"gpuNum": 0,
|
||||
"hideHardwareSpecs": false,
|
||||
"memoryGiB": 32,
|
||||
"name": "ml.m5.2xlarge",
|
||||
"vcpuNum": 8
|
||||
},
|
||||
{
|
||||
"_defaultOrder": 7,
|
||||
"_isFastLaunch": false,
|
||||
"category": "General purpose",
|
||||
"gpuNum": 0,
|
||||
"hideHardwareSpecs": false,
|
||||
"memoryGiB": 64,
|
||||
"name": "ml.m5.4xlarge",
|
||||
"vcpuNum": 16
|
||||
},
|
||||
{
|
||||
"_defaultOrder": 8,
|
||||
"_isFastLaunch": false,
|
||||
"category": "General purpose",
|
||||
"gpuNum": 0,
|
||||
"hideHardwareSpecs": false,
|
||||
"memoryGiB": 128,
|
||||
"name": "ml.m5.8xlarge",
|
||||
"vcpuNum": 32
|
||||
},
|
||||
{
|
||||
"_defaultOrder": 9,
|
||||
"_isFastLaunch": false,
|
||||
"category": "General purpose",
|
||||
"gpuNum": 0,
|
||||
"hideHardwareSpecs": false,
|
||||
"memoryGiB": 192,
|
||||
"name": "ml.m5.12xlarge",
|
||||
"vcpuNum": 48
|
||||
},
|
||||
{
|
||||
"_defaultOrder": 10,
|
||||
"_isFastLaunch": false,
|
||||
"category": "General purpose",
|
||||
"gpuNum": 0,
|
||||
"hideHardwareSpecs": false,
|
||||
"memoryGiB": 256,
|
||||
"name": "ml.m5.16xlarge",
|
||||
"vcpuNum": 64
|
||||
},
|
||||
{
|
||||
"_defaultOrder": 11,
|
||||
"_isFastLaunch": false,
|
||||
"category": "General purpose",
|
||||
"gpuNum": 0,
|
||||
"hideHardwareSpecs": false,
|
||||
"memoryGiB": 384,
|
||||
"name": "ml.m5.24xlarge",
|
||||
"vcpuNum": 96
|
||||
},
|
||||
{
|
||||
"_defaultOrder": 12,
|
||||
"_isFastLaunch": false,
|
||||
"category": "General purpose",
|
||||
"gpuNum": 0,
|
||||
"hideHardwareSpecs": false,
|
||||
"memoryGiB": 8,
|
||||
"name": "ml.m5d.large",
|
||||
"vcpuNum": 2
|
||||
},
|
||||
{
|
||||
"_defaultOrder": 13,
|
||||
"_isFastLaunch": false,
|
||||
"category": "General purpose",
|
||||
"gpuNum": 0,
|
||||
"hideHardwareSpecs": false,
|
||||
"memoryGiB": 16,
|
||||
"name": "ml.m5d.xlarge",
|
||||
"vcpuNum": 4
|
||||
},
|
||||
{
|
||||
"_defaultOrder": 14,
|
||||
"_isFastLaunch": false,
|
||||
"category": "General purpose",
|
||||
"gpuNum": 0,
|
||||
"hideHardwareSpecs": false,
|
||||
"memoryGiB": 32,
|
||||
"name": "ml.m5d.2xlarge",
|
||||
"vcpuNum": 8
|
||||
},
|
||||
{
|
||||
"_defaultOrder": 15,
|
||||
"_isFastLaunch": false,
|
||||
"category": "General purpose",
|
||||
"gpuNum": 0,
|
||||
"hideHardwareSpecs": false,
|
||||
"memoryGiB": 64,
|
||||
"name": "ml.m5d.4xlarge",
|
||||
"vcpuNum": 16
|
||||
},
|
||||
{
|
||||
"_defaultOrder": 16,
|
||||
"_isFastLaunch": false,
|
||||
"category": "General purpose",
|
||||
"gpuNum": 0,
|
||||
"hideHardwareSpecs": false,
|
||||
"memoryGiB": 128,
|
||||
"name": "ml.m5d.8xlarge",
|
||||
"vcpuNum": 32
|
||||
},
|
||||
{
|
||||
"_defaultOrder": 17,
|
||||
"_isFastLaunch": false,
|
||||
"category": "General purpose",
|
||||
"gpuNum": 0,
|
||||
"hideHardwareSpecs": false,
|
||||
"memoryGiB": 192,
|
||||
"name": "ml.m5d.12xlarge",
|
||||
"vcpuNum": 48
|
||||
},
|
||||
{
|
||||
"_defaultOrder": 18,
|
||||
"_isFastLaunch": false,
|
||||
"category": "General purpose",
|
||||
"gpuNum": 0,
|
||||
"hideHardwareSpecs": false,
|
||||
"memoryGiB": 256,
|
||||
"name": "ml.m5d.16xlarge",
|
||||
"vcpuNum": 64
|
||||
},
|
||||
{
|
||||
"_defaultOrder": 19,
|
||||
"_isFastLaunch": false,
|
||||
"category": "General purpose",
|
||||
"gpuNum": 0,
|
||||
"hideHardwareSpecs": false,
|
||||
"memoryGiB": 384,
|
||||
"name": "ml.m5d.24xlarge",
|
||||
"vcpuNum": 96
|
||||
},
|
||||
{
|
||||
"_defaultOrder": 20,
|
||||
"_isFastLaunch": false,
|
||||
"category": "General purpose",
|
||||
"gpuNum": 0,
|
||||
"hideHardwareSpecs": true,
|
||||
"memoryGiB": 0,
|
||||
"name": "ml.geospatial.interactive",
|
||||
"supportedImageNames": [
|
||||
"sagemaker-geospatial-v1-0"
|
||||
],
|
||||
"vcpuNum": 0
|
||||
},
|
||||
{
|
||||
"_defaultOrder": 21,
|
||||
"_isFastLaunch": true,
|
||||
"category": "Compute optimized",
|
||||
"gpuNum": 0,
|
||||
"hideHardwareSpecs": false,
|
||||
"memoryGiB": 4,
|
||||
"name": "ml.c5.large",
|
||||
"vcpuNum": 2
|
||||
},
|
||||
{
|
||||
"_defaultOrder": 22,
|
||||
"_isFastLaunch": false,
|
||||
"category": "Compute optimized",
|
||||
"gpuNum": 0,
|
||||
"hideHardwareSpecs": false,
|
||||
"memoryGiB": 8,
|
||||
"name": "ml.c5.xlarge",
|
||||
"vcpuNum": 4
|
||||
},
|
||||
{
|
||||
"_defaultOrder": 23,
|
||||
"_isFastLaunch": false,
|
||||
"category": "Compute optimized",
|
||||
"gpuNum": 0,
|
||||
"hideHardwareSpecs": false,
|
||||
"memoryGiB": 16,
|
||||
"name": "ml.c5.2xlarge",
|
||||
"vcpuNum": 8
|
||||
},
|
||||
{
|
||||
"_defaultOrder": 24,
|
||||
"_isFastLaunch": false,
|
||||
"category": "Compute optimized",
|
||||
"gpuNum": 0,
|
||||
"hideHardwareSpecs": false,
|
||||
"memoryGiB": 32,
|
||||
"name": "ml.c5.4xlarge",
|
||||
"vcpuNum": 16
|
||||
},
|
||||
{
|
||||
"_defaultOrder": 25,
|
||||
"_isFastLaunch": false,
|
||||
"category": "Compute optimized",
|
||||
"gpuNum": 0,
|
||||
"hideHardwareSpecs": false,
|
||||
"memoryGiB": 72,
|
||||
"name": "ml.c5.9xlarge",
|
||||
"vcpuNum": 36
|
||||
},
|
||||
{
|
||||
"_defaultOrder": 26,
|
||||
"_isFastLaunch": false,
|
||||
"category": "Compute optimized",
|
||||
"gpuNum": 0,
|
||||
"hideHardwareSpecs": false,
|
||||
"memoryGiB": 96,
|
||||
"name": "ml.c5.12xlarge",
|
||||
"vcpuNum": 48
|
||||
},
|
||||
{
|
||||
"_defaultOrder": 27,
|
||||
"_isFastLaunch": false,
|
||||
"category": "Compute optimized",
|
||||
"gpuNum": 0,
|
||||
"hideHardwareSpecs": false,
|
||||
"memoryGiB": 144,
|
||||
"name": "ml.c5.18xlarge",
|
||||
"vcpuNum": 72
|
||||
},
|
||||
{
|
||||
"_defaultOrder": 28,
|
||||
"_isFastLaunch": false,
|
||||
"category": "Compute optimized",
|
||||
"gpuNum": 0,
|
||||
"hideHardwareSpecs": false,
|
||||
"memoryGiB": 192,
|
||||
"name": "ml.c5.24xlarge",
|
||||
"vcpuNum": 96
|
||||
},
|
||||
{
|
||||
"_defaultOrder": 29,
|
||||
"_isFastLaunch": true,
|
||||
"category": "Accelerated computing",
|
||||
"gpuNum": 1,
|
||||
"hideHardwareSpecs": false,
|
||||
"memoryGiB": 16,
|
||||
"name": "ml.g4dn.xlarge",
|
||||
"vcpuNum": 4
|
||||
},
|
||||
{
|
||||
"_defaultOrder": 30,
|
||||
"_isFastLaunch": false,
|
||||
"category": "Accelerated computing",
|
||||
"gpuNum": 1,
|
||||
"hideHardwareSpecs": false,
|
||||
"memoryGiB": 32,
|
||||
"name": "ml.g4dn.2xlarge",
|
||||
"vcpuNum": 8
|
||||
},
|
||||
{
|
||||
"_defaultOrder": 31,
|
||||
"_isFastLaunch": false,
|
||||
"category": "Accelerated computing",
|
||||
"gpuNum": 1,
|
||||
"hideHardwareSpecs": false,
|
||||
"memoryGiB": 64,
|
||||
"name": "ml.g4dn.4xlarge",
|
||||
"vcpuNum": 16
|
||||
},
|
||||
{
|
||||
"_defaultOrder": 32,
|
||||
"_isFastLaunch": false,
|
||||
"category": "Accelerated computing",
|
||||
"gpuNum": 1,
|
||||
"hideHardwareSpecs": false,
|
||||
"memoryGiB": 128,
|
||||
"name": "ml.g4dn.8xlarge",
|
||||
"vcpuNum": 32
|
||||
},
|
||||
{
|
||||
"_defaultOrder": 33,
|
||||
"_isFastLaunch": false,
|
||||
"category": "Accelerated computing",
|
||||
"gpuNum": 4,
|
||||
"hideHardwareSpecs": false,
|
||||
"memoryGiB": 192,
|
||||
"name": "ml.g4dn.12xlarge",
|
||||
"vcpuNum": 48
|
||||
},
|
||||
{
|
||||
"_defaultOrder": 34,
|
||||
"_isFastLaunch": false,
|
||||
"category": "Accelerated computing",
|
||||
"gpuNum": 1,
|
||||
"hideHardwareSpecs": false,
|
||||
"memoryGiB": 256,
|
||||
"name": "ml.g4dn.16xlarge",
|
||||
"vcpuNum": 64
|
||||
},
|
||||
{
|
||||
"_defaultOrder": 35,
|
||||
"_isFastLaunch": false,
|
||||
"category": "Accelerated computing",
|
||||
"gpuNum": 1,
|
||||
"hideHardwareSpecs": false,
|
||||
"memoryGiB": 61,
|
||||
"name": "ml.p3.2xlarge",
|
||||
"vcpuNum": 8
|
||||
},
|
||||
{
|
||||
"_defaultOrder": 36,
|
||||
"_isFastLaunch": false,
|
||||
"category": "Accelerated computing",
|
||||
"gpuNum": 4,
|
||||
"hideHardwareSpecs": false,
|
||||
"memoryGiB": 244,
|
||||
"name": "ml.p3.8xlarge",
|
||||
"vcpuNum": 32
|
||||
},
|
||||
{
|
||||
"_defaultOrder": 37,
|
||||
"_isFastLaunch": false,
|
||||
"category": "Accelerated computing",
|
||||
"gpuNum": 8,
|
||||
"hideHardwareSpecs": false,
|
||||
"memoryGiB": 488,
|
||||
"name": "ml.p3.16xlarge",
|
||||
"vcpuNum": 64
|
||||
},
|
||||
{
|
||||
"_defaultOrder": 38,
|
||||
"_isFastLaunch": false,
|
||||
"category": "Accelerated computing",
|
||||
"gpuNum": 8,
|
||||
"hideHardwareSpecs": false,
|
||||
"memoryGiB": 768,
|
||||
"name": "ml.p3dn.24xlarge",
|
||||
"vcpuNum": 96
|
||||
},
|
||||
{
|
||||
"_defaultOrder": 39,
|
||||
"_isFastLaunch": false,
|
||||
"category": "Memory Optimized",
|
||||
"gpuNum": 0,
|
||||
"hideHardwareSpecs": false,
|
||||
"memoryGiB": 16,
|
||||
"name": "ml.r5.large",
|
||||
"vcpuNum": 2
|
||||
},
|
||||
{
|
||||
"_defaultOrder": 40,
|
||||
"_isFastLaunch": false,
|
||||
"category": "Memory Optimized",
|
||||
"gpuNum": 0,
|
||||
"hideHardwareSpecs": false,
|
||||
"memoryGiB": 32,
|
||||
"name": "ml.r5.xlarge",
|
||||
"vcpuNum": 4
|
||||
},
|
||||
{
|
||||
"_defaultOrder": 41,
|
||||
"_isFastLaunch": false,
|
||||
"category": "Memory Optimized",
|
||||
"gpuNum": 0,
|
||||
"hideHardwareSpecs": false,
|
||||
"memoryGiB": 64,
|
||||
"name": "ml.r5.2xlarge",
|
||||
"vcpuNum": 8
|
||||
},
|
||||
{
|
||||
"_defaultOrder": 42,
|
||||
"_isFastLaunch": false,
|
||||
"category": "Memory Optimized",
|
||||
"gpuNum": 0,
|
||||
"hideHardwareSpecs": false,
|
||||
"memoryGiB": 128,
|
||||
"name": "ml.r5.4xlarge",
|
||||
"vcpuNum": 16
|
||||
},
|
||||
{
|
||||
"_defaultOrder": 43,
|
||||
"_isFastLaunch": false,
|
||||
"category": "Memory Optimized",
|
||||
"gpuNum": 0,
|
||||
"hideHardwareSpecs": false,
|
||||
"memoryGiB": 256,
|
||||
"name": "ml.r5.8xlarge",
|
||||
"vcpuNum": 32
|
||||
},
|
||||
{
|
||||
"_defaultOrder": 44,
|
||||
"_isFastLaunch": false,
|
||||
"category": "Memory Optimized",
|
||||
"gpuNum": 0,
|
||||
"hideHardwareSpecs": false,
|
||||
"memoryGiB": 384,
|
||||
"name": "ml.r5.12xlarge",
|
||||
"vcpuNum": 48
|
||||
},
|
||||
{
|
||||
"_defaultOrder": 45,
|
||||
"_isFastLaunch": false,
|
||||
"category": "Memory Optimized",
|
||||
"gpuNum": 0,
|
||||
"hideHardwareSpecs": false,
|
||||
"memoryGiB": 512,
|
||||
"name": "ml.r5.16xlarge",
|
||||
"vcpuNum": 64
|
||||
},
|
||||
{
|
||||
"_defaultOrder": 46,
|
||||
"_isFastLaunch": false,
|
||||
"category": "Memory Optimized",
|
||||
"gpuNum": 0,
|
||||
"hideHardwareSpecs": false,
|
||||
"memoryGiB": 768,
|
||||
"name": "ml.r5.24xlarge",
|
||||
"vcpuNum": 96
|
||||
},
|
||||
{
|
||||
"_defaultOrder": 47,
|
||||
"_isFastLaunch": false,
|
||||
"category": "Accelerated computing",
|
||||
"gpuNum": 1,
|
||||
"hideHardwareSpecs": false,
|
||||
"memoryGiB": 16,
|
||||
"name": "ml.g5.xlarge",
|
||||
"vcpuNum": 4
|
||||
},
|
||||
{
|
||||
"_defaultOrder": 48,
|
||||
"_isFastLaunch": false,
|
||||
"category": "Accelerated computing",
|
||||
"gpuNum": 1,
|
||||
"hideHardwareSpecs": false,
|
||||
"memoryGiB": 32,
|
||||
"name": "ml.g5.2xlarge",
|
||||
"vcpuNum": 8
|
||||
},
|
||||
{
|
||||
"_defaultOrder": 49,
|
||||
"_isFastLaunch": false,
|
||||
"category": "Accelerated computing",
|
||||
"gpuNum": 1,
|
||||
"hideHardwareSpecs": false,
|
||||
"memoryGiB": 64,
|
||||
"name": "ml.g5.4xlarge",
|
||||
"vcpuNum": 16
|
||||
},
|
||||
{
|
||||
"_defaultOrder": 50,
|
||||
"_isFastLaunch": false,
|
||||
"category": "Accelerated computing",
|
||||
"gpuNum": 1,
|
||||
"hideHardwareSpecs": false,
|
||||
"memoryGiB": 128,
|
||||
"name": "ml.g5.8xlarge",
|
||||
"vcpuNum": 32
|
||||
},
|
||||
{
|
||||
"_defaultOrder": 51,
|
||||
"_isFastLaunch": false,
|
||||
"category": "Accelerated computing",
|
||||
"gpuNum": 1,
|
||||
"hideHardwareSpecs": false,
|
||||
"memoryGiB": 256,
|
||||
"name": "ml.g5.16xlarge",
|
||||
"vcpuNum": 64
|
||||
},
|
||||
{
|
||||
"_defaultOrder": 52,
|
||||
"_isFastLaunch": false,
|
||||
"category": "Accelerated computing",
|
||||
"gpuNum": 4,
|
||||
"hideHardwareSpecs": false,
|
||||
"memoryGiB": 192,
|
||||
"name": "ml.g5.12xlarge",
|
||||
"vcpuNum": 48
|
||||
},
|
||||
{
|
||||
"_defaultOrder": 53,
|
||||
"_isFastLaunch": false,
|
||||
"category": "Accelerated computing",
|
||||
"gpuNum": 4,
|
||||
"hideHardwareSpecs": false,
|
||||
"memoryGiB": 384,
|
||||
"name": "ml.g5.24xlarge",
|
||||
"vcpuNum": 96
|
||||
},
|
||||
{
|
||||
"_defaultOrder": 54,
|
||||
"_isFastLaunch": false,
|
||||
"category": "Accelerated computing",
|
||||
"gpuNum": 8,
|
||||
"hideHardwareSpecs": false,
|
||||
"memoryGiB": 768,
|
||||
"name": "ml.g5.48xlarge",
|
||||
"vcpuNum": 192
|
||||
},
|
||||
{
|
||||
"_defaultOrder": 55,
|
||||
"_isFastLaunch": false,
|
||||
"category": "Accelerated computing",
|
||||
"gpuNum": 8,
|
||||
"hideHardwareSpecs": false,
|
||||
"memoryGiB": 1152,
|
||||
"name": "ml.p4d.24xlarge",
|
||||
"vcpuNum": 96
|
||||
},
|
||||
{
|
||||
"_defaultOrder": 56,
|
||||
"_isFastLaunch": false,
|
||||
"category": "Accelerated computing",
|
||||
"gpuNum": 8,
|
||||
"hideHardwareSpecs": false,
|
||||
"memoryGiB": 1152,
|
||||
"name": "ml.p4de.24xlarge",
|
||||
"vcpuNum": 96
|
||||
}
|
||||
],
|
||||
"instance_type": "ml.t3.medium",
|
||||
"kernelspec": {
|
||||
"display_name": "Python 3 (ipykernel)",
|
||||
"language": "python",
|
||||
"name": "python3"
|
||||
},
|
||||
"language_info": {
|
||||
"codemirror_mode": {
|
||||
"name": "ipython",
|
||||
"version": 3
|
||||
},
|
||||
"file_extension": ".py",
|
||||
"mimetype": "text/x-python",
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.10.12"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
"nbformat_minor": 5
|
||||
}
|
||||
@@ -1,174 +0,0 @@
|
||||
{
|
||||
"cells": [
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "a634365e",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# Azure AI Data\n",
|
||||
"\n",
|
||||
">[Azure AI Studio](https://ai.azure.com/) provides the capability to upload data assets to cloud storage and register existing data assets from the following sources:\n",
|
||||
"\n",
|
||||
"- Microsoft OneLake\n",
|
||||
"- Azure Blob Storage\n",
|
||||
"- Azure Data Lake gen 2\n",
|
||||
"\n",
|
||||
"The benefit of this approach over `AzureBlobStorageContainerLoader` and `AzureBlobStorageFileLoader` is that authentication is handled seamlessly to cloud storage. You can use either *identity-based* data access control to the data or *credential-based* (e.g. SAS token, account key). In the case of credential-based data access you do not need to specify secrets in your code or set up key vaults - the system handles that for you.\n",
|
||||
"\n",
|
||||
"This notebook covers how to load document objects from a data asset in AI Studio."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "49815096",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"#!pip install azureml-fsspec, azure-ai-generative"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 1,
|
||||
"id": "2f0cd6a5",
|
||||
"metadata": {
|
||||
"tags": []
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from azure.ai.resources.client import AIClient\n",
|
||||
"from azure.identity import DefaultAzureCredential\n",
|
||||
"from langchain.document_loaders import AzureAIDataLoader"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "08d40b11-e87a-426e-a6b0-89f24e47ce2c",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# Create a connection to your project\n",
|
||||
"client = AIClient(\n",
|
||||
" credential=DefaultAzureCredential(),\n",
|
||||
" subscription_id=\"<subscription_id>\",\n",
|
||||
" resource_group_name=\"<resource_group_name>\",\n",
|
||||
" project_name=\"<project_name>\",\n",
|
||||
")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 3,
|
||||
"id": "321cc7f1",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# get the latest version of your data asset\n",
|
||||
"data_asset = client.data.get(name=\"<data_asset_name>\", label=\"latest\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "25d91cea-c5f2-4a53-ac19-442810451ec6",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# load the data asset\n",
|
||||
"loader = AzureAIDataLoader(url=data_asset.path)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 4,
|
||||
"id": "2b11d155",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"[Document(page_content='Lorem ipsum dolor sit amet.', lookup_str='', metadata={'source': '/var/folders/y6/8_bzdg295ld6s1_97_12m4lr0000gn/T/tmpaa9xl6ch/fake.docx'}, lookup_index=0)]"
|
||||
]
|
||||
},
|
||||
"execution_count": 4,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"loader.load()"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "0690c40a",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Specifying a glob pattern\n",
|
||||
"You can also specify a glob pattern for more finegrained control over what files to load. In the example below, only files with a `pdf` extension will be loaded."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 5,
|
||||
"id": "72d44781",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"loader = AzureAIDataLoader(url=data_asset.path, glob=\"*.pdf\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 6,
|
||||
"id": "2d3c32db",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"[Document(page_content='Lorem ipsum dolor sit amet.', lookup_str='', metadata={'source': '/var/folders/y6/8_bzdg295ld6s1_97_12m4lr0000gn/T/tmpujbkzf_l/fake.docx'}, lookup_index=0)]"
|
||||
]
|
||||
},
|
||||
"execution_count": 6,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"loader.load()"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "885dc280",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": []
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
"kernelspec": {
|
||||
"display_name": "Python 3 (ipykernel)",
|
||||
"language": "python",
|
||||
"name": "python3"
|
||||
},
|
||||
"language_info": {
|
||||
"codemirror_mode": {
|
||||
"name": "ipython",
|
||||
"version": 3
|
||||
},
|
||||
"file_extension": ".py",
|
||||
"mimetype": "text/x-python",
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.11.6"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
"nbformat_minor": 5
|
||||
}
|
||||
File diff suppressed because one or more lines are too long
@@ -21,8 +21,8 @@
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# You need the dgml-utils package to use the DocugamiLoader (run pip install directly without \"poetry run\" if you are not using poetry)\n",
|
||||
"!poetry run pip install dgml-utils==0.3.0 --upgrade --quiet"
|
||||
"# You need the lxml package to use the DocugamiLoader (run pip install directly without \"poetry run\" if you are not using poetry)\n",
|
||||
"!poetry run pip install lxml --quiet"
|
||||
]
|
||||
},
|
||||
{
|
||||
@@ -43,8 +43,8 @@
|
||||
"Appropriate chunking of your documents is critical for retrieval from documents. Many chunking techniques exist, including simple ones that rely on whitespace and recursive chunk splitting based on character length. Docugami offers a different approach:\n",
|
||||
"\n",
|
||||
"1. **Intelligent Chunking:** Docugami breaks down every document into a hierarchical semantic XML tree of chunks of varying sizes, from single words or numerical values to entire sections. These chunks follow the semantic contours of the document, providing a more meaningful representation than arbitrary length or simple whitespace-based chunking.\n",
|
||||
"2. **Semantic Annotations:** Chunks are annotated with semantic tags that are coherent across the document set, facilitating consistent hierarchical queries across multiple documents, even if they are written and formatted differently. For example, in set of lease agreements, you can easily identify key provisions like the Landlord, Tenant, or Renewal Date, as well as more complex information such as the wording of any sub-lease provision or whether a specific jurisdiction has an exception section within a Termination Clause.\n",
|
||||
"3. **Structured Representation:** In addition, the XML tree indicates the structural contours of every document, using attributes denoting headings, paragraphs, lists, tables, and other common elements, and does that consistently across all supported document formats, such as scanned PDFs or DOCX files. It appropriately handles long-form document characteristics like page headers/footers or multi-column flows for clean text extraction.\n",
|
||||
"2. **Structured Representation:** In addition, the XML tree indicates the structural contours of every document, using attributes denoting headings, paragraphs, lists, tables, and other common elements, and does that consistently across all supported document formats, such as scanned PDFs or DOCX files. It appropriately handles long-form document characteristics like page headers/footers or multi-column flows for clean text extraction.\n",
|
||||
"3. **Semantic Annotations:** Chunks are annotated with semantic tags that are coherent across the document set, facilitating consistent hierarchical queries across multiple documents, even if they are written and formatted differently. For example, in set of lease agreements, you can easily identify key provisions like the Landlord, Tenant, or Renewal Date, as well as more complex information such as the wording of any sub-lease provision or whether a specific jurisdiction has an exception section within a Termination Clause.\n",
|
||||
"4. **Additional Metadata:** Chunks are also annotated with additional metadata, if a user has been using Docugami. This additional metadata can be used for high-accuracy Document QA without context window restrictions. See detailed code walk-through below.\n"
|
||||
]
|
||||
},
|
||||
@@ -65,42 +65,52 @@
|
||||
"source": [
|
||||
"## Load Documents\n",
|
||||
"\n",
|
||||
"If the DOCUGAMI_API_KEY environment variable is set, there is no need to pass it in to the loader explicitly otherwise you can pass it in as the `access_token` parameter."
|
||||
"If the DOCUGAMI_API_KEY environment variable is set, there is no need to pass it in to the loader explicitly otherwise you can pass it in as the `access_token` parameter.\n",
|
||||
"\n",
|
||||
"The DocugamiLoader has a default minimum chunk size of 32. Chunks smaller than that are appended to subsequent chunks. Set min_chunk_size to 0 to get all structural chunks regardless of size."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 3,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"DOCUGAMI_API_KEY = os.environ.get(\"DOCUGAMI_API_KEY\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 4,
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"120"
|
||||
"[Document(page_content='MUTUAL NON-DISCLOSURE AGREEMENT This Mutual Non-Disclosure Agreement (this “ Agreement ”) is entered into and made effective as of April 4 , 2018 between Docugami Inc. , a Delaware corporation , whose address is 150 Lake Street South , Suite 221 , Kirkland , Washington 98033 , and Caleb Divine , an individual, whose address is 1201 Rt 300 , Newburgh NY 12550 .', metadata={'xpath': '/docset:MutualNon-disclosure/docset:MutualNon-disclosure/docset:MUTUALNON-DISCLOSUREAGREEMENT-section/docset:MUTUALNON-DISCLOSUREAGREEMENT/docset:ThisMutualNon-disclosureAgreement', 'id': '43rj0ds7s0ur', 'source': 'NDA simple layout.docx', 'structure': 'p', 'tag': 'ThisMutualNon-disclosureAgreement'}),\n",
|
||||
" Document(page_content='The above named parties desire to engage in discussions regarding a potential agreement or other transaction between the parties (the “Purpose”). In connection with such discussions, it may be necessary for the parties to disclose to each other certain confidential information or materials to enable them to evaluate whether to enter into such agreement or transaction.', metadata={'xpath': '/docset:MutualNon-disclosure/docset:MutualNon-disclosure/docset:MUTUALNON-DISCLOSUREAGREEMENT-section/docset:MUTUALNON-DISCLOSUREAGREEMENT/docset:Discussions', 'id': '43rj0ds7s0ur', 'source': 'NDA simple layout.docx', 'structure': 'p', 'tag': 'Discussions'}),\n",
|
||||
" Document(page_content='In consideration of the foregoing, the parties agree as follows:', metadata={'xpath': '/docset:MutualNon-disclosure/docset:MutualNon-disclosure/docset:MUTUALNON-DISCLOSUREAGREEMENT-section/docset:MUTUALNON-DISCLOSUREAGREEMENT/docset:Consideration/docset:Consideration', 'id': '43rj0ds7s0ur', 'source': 'NDA simple layout.docx', 'structure': 'p', 'tag': 'Consideration'}),\n",
|
||||
" Document(page_content='1. Confidential Information . For purposes of this Agreement , “ Confidential Information ” means any information or materials disclosed by one party to the other party that: (i) if disclosed in writing or in the form of tangible materials, is marked “confidential” or “proprietary” at the time of such disclosure; (ii) if disclosed orally or by visual presentation, is identified as “confidential” or “proprietary” at the time of such disclosure, and is summarized in a writing sent by the disclosing party to the receiving party within thirty ( 30 ) days after any such disclosure; or (iii) due to its nature or the circumstances of its disclosure, a person exercising reasonable business judgment would understand to be confidential or proprietary.', metadata={'xpath': '/docset:MutualNon-disclosure/docset:MutualNon-disclosure/docset:MUTUALNON-DISCLOSUREAGREEMENT-section/docset:MUTUALNON-DISCLOSUREAGREEMENT/docset:Consideration/docset:Purposes/docset:Purposes/docset:ConfidentialInformation-section/docset:ConfidentialInformation[2]', 'id': '43rj0ds7s0ur', 'source': 'NDA simple layout.docx', 'structure': 'div', 'tag': 'ConfidentialInformation'}),\n",
|
||||
" Document(page_content=\"2. Obligations and Restrictions . Each party agrees: (i) to maintain the other party's Confidential Information in strict confidence; (ii) not to disclose such Confidential Information to any third party; and (iii) not to use such Confidential Information for any purpose except for the Purpose. Each party may disclose the other party’s Confidential Information to its employees and consultants who have a bona fide need to know such Confidential Information for the Purpose, but solely to the extent necessary to pursue the Purpose and for no other purpose; provided, that each such employee and consultant first executes a written agreement (or is otherwise already bound by a written agreement) that contains use and nondisclosure restrictions at least as protective of the other party’s Confidential Information as those set forth in this Agreement .\", metadata={'xpath': '/docset:MutualNon-disclosure/docset:MutualNon-disclosure/docset:MUTUALNON-DISCLOSUREAGREEMENT-section/docset:MUTUALNON-DISCLOSUREAGREEMENT/docset:Consideration/docset:Purposes/docset:Obligations/docset:ObligationsAndRestrictions-section/docset:ObligationsAndRestrictions', 'id': '43rj0ds7s0ur', 'source': 'NDA simple layout.docx', 'structure': 'div', 'tag': 'ObligationsAndRestrictions'}),\n",
|
||||
" Document(page_content='3. Exceptions. The obligations and restrictions in Section 2 will not apply to any information or materials that:', metadata={'xpath': '/docset:MutualNon-disclosure/docset:MutualNon-disclosure/docset:MUTUALNON-DISCLOSUREAGREEMENT-section/docset:MUTUALNON-DISCLOSUREAGREEMENT/docset:Consideration/docset:Purposes/docset:Exceptions/docset:Exceptions-section/docset:Exceptions[2]', 'id': '43rj0ds7s0ur', 'source': 'NDA simple layout.docx', 'structure': 'div', 'tag': 'Exceptions'}),\n",
|
||||
" Document(page_content='(i) were, at the date of disclosure, or have subsequently become, generally known or available to the public through no act or failure to act by the receiving party;', metadata={'xpath': '/docset:MutualNon-disclosure/docset:MutualNon-disclosure/docset:MUTUALNON-DISCLOSUREAGREEMENT-section/docset:MUTUALNON-DISCLOSUREAGREEMENT/docset:Consideration/docset:Purposes/docset:TheDate/docset:TheDate/docset:TheDate', 'id': '43rj0ds7s0ur', 'source': 'NDA simple layout.docx', 'structure': 'p', 'tag': 'TheDate'}),\n",
|
||||
" Document(page_content='(ii) were rightfully known by the receiving party prior to receiving such information or materials from the disclosing party;', metadata={'xpath': '/docset:MutualNon-disclosure/docset:MutualNon-disclosure/docset:MUTUALNON-DISCLOSUREAGREEMENT-section/docset:MUTUALNON-DISCLOSUREAGREEMENT/docset:Consideration/docset:Purposes/docset:TheDate/docset:SuchInformation/docset:TheReceivingParty', 'id': '43rj0ds7s0ur', 'source': 'NDA simple layout.docx', 'structure': 'p', 'tag': 'TheReceivingParty'}),\n",
|
||||
" Document(page_content='(iii) are rightfully acquired by the receiving party from a third party who has the right to disclose such information or materials without breach of any confidentiality obligation to the disclosing party;', metadata={'xpath': '/docset:MutualNon-disclosure/docset:MutualNon-disclosure/docset:MUTUALNON-DISCLOSUREAGREEMENT-section/docset:MUTUALNON-DISCLOSUREAGREEMENT/docset:Consideration/docset:Purposes/docset:TheDate/docset:TheReceivingParty/docset:TheReceivingParty', 'id': '43rj0ds7s0ur', 'source': 'NDA simple layout.docx', 'structure': 'p', 'tag': 'TheReceivingParty'}),\n",
|
||||
" Document(page_content='4. Compelled Disclosure . Nothing in this Agreement will be deemed to restrict a party from disclosing the other party’s Confidential Information to the extent required by any order, subpoena, law, statute or regulation; provided, that the party required to make such a disclosure uses reasonable efforts to give the other party reasonable advance notice of such required disclosure in order to enable the other party to prevent or limit such disclosure.', metadata={'xpath': '/docset:MutualNon-disclosure/docset:MutualNon-disclosure/docset:MUTUALNON-DISCLOSUREAGREEMENT-section/docset:MUTUALNON-DISCLOSUREAGREEMENT/docset:Consideration/docset:Purposes/docset:Disclosure/docset:CompelledDisclosure-section/docset:CompelledDisclosure', 'id': '43rj0ds7s0ur', 'source': 'NDA simple layout.docx', 'structure': 'div', 'tag': 'CompelledDisclosure'}),\n",
|
||||
" Document(page_content='5. Return of Confidential Information . Upon the completion or abandonment of the Purpose, and in any event upon the disclosing party’s request, the receiving party will promptly return to the disclosing party all tangible items and embodiments containing or consisting of the disclosing party’s Confidential Information and all copies thereof (including electronic copies), and any notes, analyses, compilations, studies, interpretations, memoranda or other documents (regardless of the form thereof) prepared by or on behalf of the receiving party that contain or are based upon the disclosing party’s Confidential Information .', metadata={'xpath': '/docset:MutualNon-disclosure/docset:MutualNon-disclosure/docset:MUTUALNON-DISCLOSUREAGREEMENT-section/docset:MUTUALNON-DISCLOSUREAGREEMENT/docset:Consideration/docset:Purposes/docset:TheCompletion/docset:ReturnofConfidentialInformation-section/docset:ReturnofConfidentialInformation', 'id': '43rj0ds7s0ur', 'source': 'NDA simple layout.docx', 'structure': 'div', 'tag': 'ReturnofConfidentialInformation'}),\n",
|
||||
" Document(page_content='6. No Obligations . Each party retains the right to determine whether to disclose any Confidential Information to the other party.', metadata={'xpath': '/docset:MutualNon-disclosure/docset:MutualNon-disclosure/docset:MUTUALNON-DISCLOSUREAGREEMENT-section/docset:MUTUALNON-DISCLOSUREAGREEMENT/docset:Consideration/docset:Purposes/docset:NoObligations/docset:NoObligations-section/docset:NoObligations[2]', 'id': '43rj0ds7s0ur', 'source': 'NDA simple layout.docx', 'structure': 'div', 'tag': 'NoObligations'}),\n",
|
||||
" Document(page_content='7. No Warranty. ALL CONFIDENTIAL INFORMATION IS PROVIDED BY THE DISCLOSING PARTY “AS IS ”.', metadata={'xpath': '/docset:MutualNon-disclosure/docset:MutualNon-disclosure/docset:MUTUALNON-DISCLOSUREAGREEMENT-section/docset:MUTUALNON-DISCLOSUREAGREEMENT/docset:Consideration/docset:Purposes/docset:NoWarranty/docset:NoWarranty-section/docset:NoWarranty[2]', 'id': '43rj0ds7s0ur', 'source': 'NDA simple layout.docx', 'structure': 'div', 'tag': 'NoWarranty'}),\n",
|
||||
" Document(page_content='8. Term. This Agreement will remain in effect for a period of seven ( 7 ) years from the date of last disclosure of Confidential Information by either party, at which time it will terminate.', metadata={'xpath': '/docset:MutualNon-disclosure/docset:MutualNon-disclosure/docset:MUTUALNON-DISCLOSUREAGREEMENT-section/docset:MUTUALNON-DISCLOSUREAGREEMENT/docset:Consideration/docset:Purposes/docset:ThisAgreement/docset:Term-section/docset:Term', 'id': '43rj0ds7s0ur', 'source': 'NDA simple layout.docx', 'structure': 'div', 'tag': 'Term'}),\n",
|
||||
" Document(page_content='9. Equitable Relief . Each party acknowledges that the unauthorized use or disclosure of the disclosing party’s Confidential Information may cause the disclosing party to incur irreparable harm and significant damages, the degree of which may be difficult to ascertain. Accordingly, each party agrees that the disclosing party will have the right to seek immediate equitable relief to enjoin any unauthorized use or disclosure of its Confidential Information , in addition to any other rights and remedies that it may have at law or otherwise.', metadata={'xpath': '/docset:MutualNon-disclosure/docset:MutualNon-disclosure/docset:MUTUALNON-DISCLOSUREAGREEMENT-section/docset:MUTUALNON-DISCLOSUREAGREEMENT/docset:Consideration/docset:Purposes/docset:EquitableRelief/docset:EquitableRelief-section/docset:EquitableRelief[2]', 'id': '43rj0ds7s0ur', 'source': 'NDA simple layout.docx', 'structure': 'div', 'tag': 'EquitableRelief'}),\n",
|
||||
" Document(page_content='10. Non-compete. To the maximum extent permitted by applicable law, during the Term of this Agreement and for a period of one ( 1 ) year thereafter, Caleb Divine may not market software products or do business that directly or indirectly competes with Docugami software products .', metadata={'xpath': '/docset:MutualNon-disclosure/docset:MutualNon-disclosure/docset:MUTUALNON-DISCLOSUREAGREEMENT-section/docset:MUTUALNON-DISCLOSUREAGREEMENT/docset:Consideration/docset:Purposes/docset:TheMaximumExtent/docset:Non-compete-section/docset:Non-compete', 'id': '43rj0ds7s0ur', 'source': 'NDA simple layout.docx', 'structure': 'div', 'tag': 'Non-compete'}),\n",
|
||||
" Document(page_content='11. Miscellaneous. This Agreement will be governed and construed in accordance with the laws of the State of Washington , excluding its body of law controlling conflict of laws. This Agreement is the complete and exclusive understanding and agreement between the parties regarding the subject matter of this Agreement and supersedes all prior agreements, understandings and communications, oral or written, between the parties regarding the subject matter of this Agreement . If any provision of this Agreement is held invalid or unenforceable by a court of competent jurisdiction, that provision of this Agreement will be enforced to the maximum extent permissible and the other provisions of this Agreement will remain in full force and effect. Neither party may assign this Agreement , in whole or in part, by operation of law or otherwise, without the other party’s prior written consent, and any attempted assignment without such consent will be void. This Agreement may be executed in counterparts, each of which will be deemed an original, but all of which together will constitute one and the same instrument.', metadata={'xpath': '/docset:MutualNon-disclosure/docset:MutualNon-disclosure/docset:MUTUALNON-DISCLOSUREAGREEMENT-section/docset:MUTUALNON-DISCLOSUREAGREEMENT/docset:Consideration/docset:Purposes/docset:Accordance/docset:Miscellaneous-section/docset:Miscellaneous', 'id': '43rj0ds7s0ur', 'source': 'NDA simple layout.docx', 'structure': 'div', 'tag': 'Miscellaneous'}),\n",
|
||||
" Document(page_content='[SIGNATURE PAGE FOLLOWS] IN WITNESS WHEREOF, the parties hereto have executed this Mutual Non-Disclosure Agreement by their duly authorized officers or representatives as of the date first set forth above.', metadata={'xpath': '/docset:MutualNon-disclosure/docset:Witness/docset:TheParties/docset:TheParties', 'id': '43rj0ds7s0ur', 'source': 'NDA simple layout.docx', 'structure': 'p', 'tag': 'TheParties'}),\n",
|
||||
" Document(page_content='DOCUGAMI INC . : \\n\\n Caleb Divine : \\n\\n Signature: Signature: Name: \\n\\n Jean Paoli Name: Title: \\n\\n CEO Title:', metadata={'xpath': '/docset:MutualNon-disclosure/docset:Witness/docset:TheParties/docset:DocugamiInc/docset:DocugamiInc/xhtml:table', 'id': '43rj0ds7s0ur', 'source': 'NDA simple layout.docx', 'structure': '', 'tag': 'table'})]"
|
||||
]
|
||||
},
|
||||
"execution_count": 4,
|
||||
"execution_count": 3,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"docset_id = \"26xpy3aes7xp\"\n",
|
||||
"document_ids = [\"d7jqdzcj50sj\", \"cgd1eacfkchw\"]\n",
|
||||
"DOCUGAMI_API_KEY = os.environ.get(\"DOCUGAMI_API_KEY\")\n",
|
||||
"\n",
|
||||
"# To load all docs in the given docset ID, just don't provide document_ids\n",
|
||||
"loader = DocugamiLoader(docset_id=docset_id, document_ids=document_ids)\n",
|
||||
"chunks = loader.load()\n",
|
||||
"len(chunks)"
|
||||
"loader = DocugamiLoader(docset_id=\"ecxqpipcoe2p\", document_ids=[\"43rj0ds7s0ur\"])\n",
|
||||
"docs = loader.load()\n",
|
||||
"docs"
|
||||
]
|
||||
},
|
||||
{
|
||||
@@ -112,39 +122,7 @@
|
||||
"1. **id and source:** ID and Name of the file (PDF, DOC or DOCX) the chunk is sourced from within Docugami.\n",
|
||||
"2. **xpath:** XPath inside the XML representation of the document, for the chunk. Useful for source citations directly to the actual chunk inside the document XML.\n",
|
||||
"3. **structure:** Structural attributes of the chunk, e.g. h1, h2, div, table, td, etc. Useful to filter out certain kinds of chunks if needed by the caller.\n",
|
||||
"4. **tag:** Semantic tag for the chunk, using various generative and extractive techniques. More details here: https://github.com/docugami/DFM-benchmarks\n",
|
||||
"\n",
|
||||
"You can control chunking behavior by setting the following properties on the `DocugamiLoader` instance:\n",
|
||||
"\n",
|
||||
"1. You can set min and max chunk size, which the system tries to adhere to with minimal truncation. You can set `loader.min_text_length` and `loader.max_text_length` to control these.\n",
|
||||
"2. By default, only the text for chunks is returned. However, Docugami's XML knowledge graph has additional rich information including semantic tags for entities inside the chunk. Set `loader.include_xml_tags = True` if you want the additional xml metadata on the returned chunks.\n",
|
||||
"3. In addition, you can set `loader.parent_hierarchy_levels` if you want Docugami to return parent chunks in the chunks it returns. The child chunks point to the parent chunks via the `loader.parent_id_key` value. This is useful e.g. with the [MultiVector Retriever](https://python.langchain.com/docs/modules/data_connection/retrievers/multi_vector) for [small-to-big](https://www.youtube.com/watch?v=ihSiRrOUwmg) retrieval. See detailed example later in this notebook."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 5,
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"page_content='MASTER SERVICES AGREEMENT\\n <ThisServicesAgreement> This Services Agreement (the “Agreement”) sets forth terms under which <Company>MagicSoft, Inc. </Company>a <Org><USState>Washington </USState>Corporation </Org>(“Company”) located at <CompanyAddress><CompanyStreetAddress><Company>600 </Company><Company>4th Ave</Company></CompanyStreetAddress>, <Company>Seattle</Company>, <Client>WA </Client><ProvideServices>98104 </ProvideServices></CompanyAddress>shall provide services to <Client>Daltech, Inc.</Client>, a <Company><USState>Washington </USState>Corporation </Company>(the “Client”) located at <ClientAddress><ClientStreetAddress><Client>701 </Client><Client>1st St</Client></ClientStreetAddress>, <Client>Kirkland</Client>, <State>WA </State><Client>98033</Client></ClientAddress>. This Agreement is effective as of <EffectiveDate>February 15, 2021 </EffectiveDate>(“Effective Date”). </ThisServicesAgreement>' metadata={'xpath': '/dg:chunk/docset:MASTERSERVICESAGREEMENT-section/dg:chunk', 'id': 'c28554d0af5114e2b102e6fc4dcbbde5', 'name': 'Master Services Agreement - Daltech.docx', 'source': 'Master Services Agreement - Daltech.docx', 'structure': 'h1 p', 'tag': 'chunk ThisServicesAgreement', 'Liability': '', 'Workers Compensation Insurance': '$1,000,000', 'Limit': '$1,000,000', 'Commercial General Liability Insurance': '$2,000,000', 'Technology Professional Liability Errors Omissions Policy': '$5,000,000', 'Excess Liability Umbrella Coverage': '$9,000,000', 'Client': 'Daltech, Inc.', 'Services Agreement Date': 'INITIAL STATEMENT OF WORK (SOW) The purpose of this SOW is to describe the Software and Services that Company will initially provide to Daltech, Inc. the “Client”) under the terms and conditions of the Services Agreement entered into between the parties on June 15, 2021', 'Completion of the Services by Company Date': 'February 15, 2022', 'Charge': 'one hundred percent (100%)', 'Company': 'MagicSoft, Inc.', 'Effective Date': 'February 15, 2021', 'Start Date': '03/15/2021', 'Scheduled Onsite Visits Are Cancelled': 'ten (10) working days', 'Limit on Liability': '', 'Liability Cap': '', 'Business Automobile Liability': 'Business Automobile Liability covering all vehicles that Company owns, hires or leases with a limit of no less than $1,000,000 (combined single limit for bodily injury and property damage) for each accident.', 'Contractual Liability Coverage': 'Commercial General Liability insurance including Contractual Liability Coverage , with coverage for products liability, completed operations, property damage and bodily injury, including death , with an aggregate limit of no less than $2,000,000 . This policy shall name Client as an additional insured with respect to the provision of services provided under this Agreement. This policy shall include a waiver of subrogation against Client.', 'Technology Professional Liability Errors Omissions': 'Technology Professional Liability Errors & Omissions policy (which includes Cyber Risk coverage and Computer Security and Privacy Liability coverage) with a limit of no less than $5,000,000 per occurrence and in the aggregate.'}\n",
|
||||
"page_content='A. STANDARD SOFTWARE AND SERVICES AGREEMENT\\n 1. Deliverables.\\n Company shall provide Client with software, technical support, product management, development, and <_testRef>testing </_testRef>services (“Services”) to the Client as described on one or more Statements of Work signed by Company and Client that reference this Agreement (“SOW” or “Statement of Work”). Company shall perform Services in a prompt manner and have the final product or service (“Deliverable”) ready for Client no later than the due date specified in the applicable SOW (“Completion Date”). This due date is subject to change in accordance with the Change Order process defined in the applicable SOW. Client shall assist Company by promptly providing all information requests known or available and relevant to the Services in a timely manner.' metadata={'xpath': '/dg:chunk/docset:MASTERSERVICESAGREEMENT-section/docset:MASTERSERVICESAGREEMENT/dg:chunk[1]/docset:Standard/dg:chunk[1]/dg:chunk[1]', 'id': 'de60160d328df10fa2637637c803d2d4', 'name': 'Master Services Agreement - Daltech.docx', 'source': 'Master Services Agreement - Daltech.docx', 'structure': 'lim h1 lim h1 div', 'tag': 'chunk', 'Liability': '', 'Workers Compensation Insurance': '$1,000,000', 'Limit': '$1,000,000', 'Commercial General Liability Insurance': '$2,000,000', 'Technology Professional Liability Errors Omissions Policy': '$5,000,000', 'Excess Liability Umbrella Coverage': '$9,000,000', 'Client': 'Daltech, Inc.', 'Services Agreement Date': 'INITIAL STATEMENT OF WORK (SOW) The purpose of this SOW is to describe the Software and Services that Company will initially provide to Daltech, Inc. the “Client”) under the terms and conditions of the Services Agreement entered into between the parties on June 15, 2021', 'Completion of the Services by Company Date': 'February 15, 2022', 'Charge': 'one hundred percent (100%)', 'Company': 'MagicSoft, Inc.', 'Effective Date': 'February 15, 2021', 'Start Date': '03/15/2021', 'Scheduled Onsite Visits Are Cancelled': 'ten (10) working days', 'Limit on Liability': '', 'Liability Cap': '', 'Business Automobile Liability': 'Business Automobile Liability covering all vehicles that Company owns, hires or leases with a limit of no less than $1,000,000 (combined single limit for bodily injury and property damage) for each accident.', 'Contractual Liability Coverage': 'Commercial General Liability insurance including Contractual Liability Coverage , with coverage for products liability, completed operations, property damage and bodily injury, including death , with an aggregate limit of no less than $2,000,000 . This policy shall name Client as an additional insured with respect to the provision of services provided under this Agreement. This policy shall include a waiver of subrogation against Client.', 'Technology Professional Liability Errors Omissions': 'Technology Professional Liability Errors & Omissions policy (which includes Cyber Risk coverage and Computer Security and Privacy Liability coverage) with a limit of no less than $5,000,000 per occurrence and in the aggregate.'}\n",
|
||||
"page_content='2. Onsite Services.\\n 2.1 Onsite visits will be charged on a <Frequency>daily </Frequency>basis (minimum <OnsiteVisits>8 hours</OnsiteVisits>).' metadata={'xpath': '/dg:chunk/docset:MASTERSERVICESAGREEMENT-section/docset:MASTERSERVICESAGREEMENT/dg:chunk[1]/docset:Standard/dg:chunk[3]/dg:chunk[1]', 'id': 'db18315b437ac2de6b555d2d8ef8f893', 'name': 'Master Services Agreement - Daltech.docx', 'source': 'Master Services Agreement - Daltech.docx', 'structure': 'lim h1 lim p', 'tag': 'chunk', 'Liability': '', 'Workers Compensation Insurance': '$1,000,000', 'Limit': '$1,000,000', 'Commercial General Liability Insurance': '$2,000,000', 'Technology Professional Liability Errors Omissions Policy': '$5,000,000', 'Excess Liability Umbrella Coverage': '$9,000,000', 'Client': 'Daltech, Inc.', 'Services Agreement Date': 'INITIAL STATEMENT OF WORK (SOW) The purpose of this SOW is to describe the Software and Services that Company will initially provide to Daltech, Inc. the “Client”) under the terms and conditions of the Services Agreement entered into between the parties on June 15, 2021', 'Completion of the Services by Company Date': 'February 15, 2022', 'Charge': 'one hundred percent (100%)', 'Company': 'MagicSoft, Inc.', 'Effective Date': 'February 15, 2021', 'Start Date': '03/15/2021', 'Scheduled Onsite Visits Are Cancelled': 'ten (10) working days', 'Limit on Liability': '', 'Liability Cap': '', 'Business Automobile Liability': 'Business Automobile Liability covering all vehicles that Company owns, hires or leases with a limit of no less than $1,000,000 (combined single limit for bodily injury and property damage) for each accident.', 'Contractual Liability Coverage': 'Commercial General Liability insurance including Contractual Liability Coverage , with coverage for products liability, completed operations, property damage and bodily injury, including death , with an aggregate limit of no less than $2,000,000 . This policy shall name Client as an additional insured with respect to the provision of services provided under this Agreement. This policy shall include a waiver of subrogation against Client.', 'Technology Professional Liability Errors Omissions': 'Technology Professional Liability Errors & Omissions policy (which includes Cyber Risk coverage and Computer Security and Privacy Liability coverage) with a limit of no less than $5,000,000 per occurrence and in the aggregate.'}\n",
|
||||
"page_content='2.2 <Expenses>Time and expenses will be charged based on actuals unless otherwise described in an Order Form or accompanying SOW. </Expenses>' metadata={'xpath': '/dg:chunk/docset:MASTERSERVICESAGREEMENT-section/docset:MASTERSERVICESAGREEMENT/dg:chunk[1]/docset:Standard/dg:chunk[3]/dg:chunk[2]/docset:ADailyBasis/dg:chunk[2]/dg:chunk', 'id': '506220fa472d5c48c8ee3db78c1122c1', 'name': 'Master Services Agreement - Daltech.docx', 'source': 'Master Services Agreement - Daltech.docx', 'structure': 'lim p', 'tag': 'chunk Expenses', 'Liability': '', 'Workers Compensation Insurance': '$1,000,000', 'Limit': '$1,000,000', 'Commercial General Liability Insurance': '$2,000,000', 'Technology Professional Liability Errors Omissions Policy': '$5,000,000', 'Excess Liability Umbrella Coverage': '$9,000,000', 'Client': 'Daltech, Inc.', 'Services Agreement Date': 'INITIAL STATEMENT OF WORK (SOW) The purpose of this SOW is to describe the Software and Services that Company will initially provide to Daltech, Inc. the “Client”) under the terms and conditions of the Services Agreement entered into between the parties on June 15, 2021', 'Completion of the Services by Company Date': 'February 15, 2022', 'Charge': 'one hundred percent (100%)', 'Company': 'MagicSoft, Inc.', 'Effective Date': 'February 15, 2021', 'Start Date': '03/15/2021', 'Scheduled Onsite Visits Are Cancelled': 'ten (10) working days', 'Limit on Liability': '', 'Liability Cap': '', 'Business Automobile Liability': 'Business Automobile Liability covering all vehicles that Company owns, hires or leases with a limit of no less than $1,000,000 (combined single limit for bodily injury and property damage) for each accident.', 'Contractual Liability Coverage': 'Commercial General Liability insurance including Contractual Liability Coverage , with coverage for products liability, completed operations, property damage and bodily injury, including death , with an aggregate limit of no less than $2,000,000 . This policy shall name Client as an additional insured with respect to the provision of services provided under this Agreement. This policy shall include a waiver of subrogation against Client.', 'Technology Professional Liability Errors Omissions': 'Technology Professional Liability Errors & Omissions policy (which includes Cyber Risk coverage and Computer Security and Privacy Liability coverage) with a limit of no less than $5,000,000 per occurrence and in the aggregate.'}\n",
|
||||
"page_content='2.3 <RegularWorkingHours>All work will be executed during regular working hours <RegularWorkingHours>Monday</RegularWorkingHours>-<Weekday>Friday </Weekday><RegularWorkingHours><RegularWorkingHours>0800</RegularWorkingHours>-<Number>1900</Number></RegularWorkingHours>. For work outside of these hours on weekdays, Company will charge <Charge>one hundred percent (100%) </Charge>of the regular hourly rate and <Charge>two hundred percent (200%) </Charge>for Saturdays, Sundays and public holidays applicable to Company. </RegularWorkingHours>' metadata={'xpath': '/dg:chunk/docset:MASTERSERVICESAGREEMENT-section/docset:MASTERSERVICESAGREEMENT/dg:chunk[1]/docset:Standard/dg:chunk[3]/dg:chunk[2]/docset:ADailyBasis/dg:chunk[3]/dg:chunk', 'id': 'dac7a3ded61b5c4f3e59771243ea46c1', 'name': 'Master Services Agreement - Daltech.docx', 'source': 'Master Services Agreement - Daltech.docx', 'structure': 'lim p', 'tag': 'chunk RegularWorkingHours', 'Liability': '', 'Workers Compensation Insurance': '$1,000,000', 'Limit': '$1,000,000', 'Commercial General Liability Insurance': '$2,000,000', 'Technology Professional Liability Errors Omissions Policy': '$5,000,000', 'Excess Liability Umbrella Coverage': '$9,000,000', 'Client': 'Daltech, Inc.', 'Services Agreement Date': 'INITIAL STATEMENT OF WORK (SOW) The purpose of this SOW is to describe the Software and Services that Company will initially provide to Daltech, Inc. the “Client”) under the terms and conditions of the Services Agreement entered into between the parties on June 15, 2021', 'Completion of the Services by Company Date': 'February 15, 2022', 'Charge': 'one hundred percent (100%)', 'Company': 'MagicSoft, Inc.', 'Effective Date': 'February 15, 2021', 'Start Date': '03/15/2021', 'Scheduled Onsite Visits Are Cancelled': 'ten (10) working days', 'Limit on Liability': '', 'Liability Cap': '', 'Business Automobile Liability': 'Business Automobile Liability covering all vehicles that Company owns, hires or leases with a limit of no less than $1,000,000 (combined single limit for bodily injury and property damage) for each accident.', 'Contractual Liability Coverage': 'Commercial General Liability insurance including Contractual Liability Coverage , with coverage for products liability, completed operations, property damage and bodily injury, including death , with an aggregate limit of no less than $2,000,000 . This policy shall name Client as an additional insured with respect to the provision of services provided under this Agreement. This policy shall include a waiver of subrogation against Client.', 'Technology Professional Liability Errors Omissions': 'Technology Professional Liability Errors & Omissions policy (which includes Cyber Risk coverage and Computer Security and Privacy Liability coverage) with a limit of no less than $5,000,000 per occurrence and in the aggregate.'}\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"loader.min_text_length = 64\n",
|
||||
"loader.include_xml_tags = True\n",
|
||||
"chunks = loader.load()\n",
|
||||
"\n",
|
||||
"for chunk in chunks[:5]:\n",
|
||||
" print(chunk)"
|
||||
"4. **tag:** Semantic tag for the chunk, using various generative and extractive techniques. More details here: https://github.com/docugami/DFM-benchmarks"
|
||||
]
|
||||
},
|
||||
{
|
||||
@@ -158,41 +136,27 @@
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 6,
|
||||
"execution_count": 4,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"!poetry run pip install --upgrade openai tiktoken chromadb hnswlib --quiet"
|
||||
"!poetry run pip -q install openai tiktoken chromadb"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 7,
|
||||
"execution_count": 5,
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"4674\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langchain.chains import RetrievalQA\n",
|
||||
"from langchain.embeddings import OpenAIEmbeddings\n",
|
||||
"from langchain.llms import OpenAI\n",
|
||||
"from langchain.vectorstores import Chroma\n",
|
||||
"\n",
|
||||
"# For this example, we already have a processed docset for a set of lease documents\n",
|
||||
"loader = DocugamiLoader(docset_id=\"zo954yqy53wp\")\n",
|
||||
"chunks = loader.load()\n",
|
||||
"\n",
|
||||
"# strip semantic metadata intentionally, to test how things work without semantic metadata\n",
|
||||
"for chunk in chunks:\n",
|
||||
" stripped_metadata = chunk.metadata.copy()\n",
|
||||
" for key in chunk.metadata:\n",
|
||||
" if key not in [\"name\", \"xpath\", \"id\", \"structure\"]:\n",
|
||||
" # remove semantic metadata\n",
|
||||
" del stripped_metadata[key]\n",
|
||||
" chunk.metadata = stripped_metadata\n",
|
||||
"\n",
|
||||
"print(len(chunks))"
|
||||
"loader = DocugamiLoader(docset_id=\"wh2kned25uqm\")\n",
|
||||
"documents = loader.load()"
|
||||
]
|
||||
},
|
||||
{
|
||||
@@ -206,17 +170,12 @@
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 8,
|
||||
"execution_count": 6,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langchain.chains import RetrievalQA\n",
|
||||
"from langchain.embeddings import OpenAIEmbeddings\n",
|
||||
"from langchain.llms.openai import OpenAI\n",
|
||||
"from langchain.vectorstores.chroma import Chroma\n",
|
||||
"\n",
|
||||
"embedding = OpenAIEmbeddings()\n",
|
||||
"vectordb = Chroma.from_documents(documents=chunks, embedding=embedding)\n",
|
||||
"vectordb = Chroma.from_documents(documents=documents, embedding=embedding)\n",
|
||||
"retriever = vectordb.as_retriever()\n",
|
||||
"qa_chain = RetrievalQA.from_chain_type(\n",
|
||||
" llm=OpenAI(), chain_type=\"stuff\", retriever=retriever, return_source_documents=True\n",
|
||||
@@ -225,21 +184,21 @@
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 9,
|
||||
"execution_count": 7,
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"{'query': 'What can tenants do with signage on their properties?',\n",
|
||||
" 'result': ' Tenants can place or attach signage (digital or otherwise) to their property after receiving written permission from the landlord, which permission shall not be unreasonably withheld. The signage must conform to all applicable laws, ordinances, etc. governing the same, and tenants must remove all such signs by the termination of the lease.',\n",
|
||||
" 'source_documents': [Document(page_content='6.01 Signage. Tenant may place or attach to the Premises signs (digital or otherwise) or other such identification as needed after receiving written permission from the Landlord, which permission shall not be unreasonably withheld. Any damage caused to the Premises by the Tenant’s erecting or removing such signs shall be repaired promptly by the Tenant at the Tenant’s expense. Any signs or other form of identification allowed must conform to all applicable laws, ordinances, etc. governing the same. Tenant also agrees to have any window or glass identification completely removed and cleaned at its expense promptly upon vacating the Premises. ARTICLE VII UTILITIES', metadata={'id': '1c290eea05915ba0f24c4a1ffc05d6f3', 'name': 'Sample Commercial Leases/TruTone Lane 6.pdf', 'structure': 'lim h1', 'xpath': '/dg:chunk/dg:chunk/dg:chunk[2]/dg:chunk[1]/docset:TheApprovedUse/dg:chunk[12]/dg:chunk[1]'}),\n",
|
||||
" Document(page_content='6.01 Signage. Tenant may place or attach to the Premises signs (digital or otherwise) or other such identification as needed after receiving written permission from the Landlord, which permission shall not be unreasonably withheld. Any damage caused to the Premises by the Tenant’s erecting or removing such signs shall be repaired promptly by the Tenant at the Tenant’s expense. Any signs or other form of identification allowed must conform to all applicable laws, ordinances, etc. governing the same. Tenant also agrees to have any window or glass identification completely removed and cleaned at its expense promptly upon vacating the Premises. ARTICLE VII UTILITIES', metadata={'id': '1c290eea05915ba0f24c4a1ffc05d6f3', 'name': 'Sample Commercial Leases/TruTone Lane 2.pdf', 'structure': 'lim h1', 'xpath': '/dg:chunk/dg:chunk/dg:chunk[2]/dg:chunk[1]/docset:TheApprovedUse/dg:chunk[12]/dg:chunk[1]'}),\n",
|
||||
" Document(page_content='Tenant may place or attach to the Premises signs (digital or otherwise) or other such identification as needed after receiving written permission from the Landlord, which permission shall not be unreasonably withheld. Any damage caused to the Premises by the Tenant’s erecting or removing such signs shall be repaired promptly by the Tenant at the Tenant’s expense. Any signs or other form of identification allowed must conform to all applicable laws, ordinances, etc. governing the same. Tenant also agrees to have any window or glass identification completely removed and cleaned at its expense promptly upon vacating the Premises.', metadata={'id': '58d268162ecc36d8633b7bc364afcb8c', 'name': 'Sample Commercial Leases/TruTone Lane 2.docx', 'structure': 'div', 'xpath': '/docset:OFFICELEASEAGREEMENT-section/docset:OFFICELEASEAGREEMENT/dg:chunk/docset:ARTICLEVISIGNAGE-section/docset:ARTICLEVISIGNAGE/docset:_601Signage'}),\n",
|
||||
" Document(page_content='8. SIGNS:\\n Tenant shall not install signs upon the Premises without Landlord’s prior written approval, which approval shall not be unreasonably withheld or delayed, and any such signage shall be subject to any applicable governmental laws, ordinances, regulations, and other requirements. Tenant shall remove all such signs by the terminations of this Lease. Such installations and removals shall be made in such a manner as to avoid injury or defacement of the Building and other improvements, and Tenant shall repair any injury or defacement, including without limitation discoloration caused by such installations and/or removal.', metadata={'id': '6b7d88f0c979c65d5db088fc177fa81f', 'name': 'Lease Agreements/Bioplex, Inc.pdf', 'structure': 'lim h1 div', 'xpath': '/dg:chunk/docset:WITNESSETH-section/docset:WITNESSETH/dg:chunk/docset:TheObligation/dg:chunk[8]/dg:chunk'})]}"
|
||||
" 'result': \" Tenants can place or attach signs (digital or otherwise) to their premises with written permission from the landlord. The signs must conform to all applicable laws, ordinances, etc. governing the same. Tenants can also have their name listed in the building's directory at the landlord's cost.\",\n",
|
||||
" 'source_documents': [Document(page_content='ARTICLE VI SIGNAGE 6.01 Signage . Tenant may place or attach to the Premises signs (digital or otherwise) or other such identification as needed after receiving written permission from the Landlord , which permission shall not be unreasonably withheld. Any damage caused to the Premises by the Tenant ’s erecting or removing such signs shall be repaired promptly by the Tenant at the Tenant ’s expense . Any signs or other form of identification allowed must conform to all applicable laws, ordinances, etc. governing the same. Tenant also agrees to have any window or glass identification completely removed and cleaned at its expense promptly upon vacating the Premises.', metadata={'Landlord': 'BUBBA CENTER PARTNERSHIP', 'Lease Date': 'April 24 \\n\\n ,', 'Lease Parties': 'This OFFICE LEASE AGREEMENT (this \"Lease\") is made and entered into by and between BUBBA CENTER PARTNERSHIP (\" Landlord \"), and Truetone Lane LLC , a Delaware limited liability company (\" Tenant \").', 'Tenant': 'Truetone Lane LLC', 'id': 'v1bvgaozfkak', 'source': 'TruTone Lane 2.docx', 'structure': 'div', 'tag': '_601Signage', 'xpath': '/docset:OFFICELEASEAGREEMENT-section/docset:OFFICELEASEAGREEMENT/docset:Article/docset:ARTICLEVISIGNAGE-section/docset:_601Signage-section/docset:_601Signage'}),\n",
|
||||
" Document(page_content='Signage. Tenant may place or attach to the Premises signs (digital or otherwise) or other such identification as needed after receiving written permission from the Landlord , which permission shall not be unreasonably withheld. Any damage caused to the Premises by the Tenant ’s erecting or removing such signs shall be repaired promptly by the Tenant at the Tenant ’s expense . Any signs or other form of identification allowed must conform to all applicable laws, ordinances, etc. governing the same. Tenant also agrees to have any window or glass identification completely removed and cleaned at its expense promptly upon vacating the Premises. \\n\\n ARTICLE VII UTILITIES 7.01', metadata={'Landlord': 'GLORY ROAD LLC', 'Lease Date': 'April 30 , 2020', 'Lease Parties': 'This OFFICE LEASE AGREEMENT (this \"Lease\") is made and entered into by and between GLORY ROAD LLC (\" Landlord \"), and Truetone Lane LLC , a Delaware limited liability company (\" Tenant \").', 'Tenant': 'Truetone Lane LLC', 'id': 'g2fvhekmltza', 'source': 'TruTone Lane 6.pdf', 'structure': 'lim', 'tag': 'chunk', 'xpath': '/docset:OFFICELEASEAGREEMENT-section/docset:OFFICELEASEAGREEMENT/docset:Article/docset:ArticleIiiUse/docset:ARTICLEIIIUSEANDCAREOFPREMISES-section/docset:ARTICLEIIIUSEANDCAREOFPREMISES/docset:AnyTime/docset:Addition/dg:chunk'}),\n",
|
||||
" Document(page_content='Landlord , its agents, servants, employees, licensees, invitees, and contractors during the last year of the term of this Lease at any and all times during regular business hours, after 24 hour notice to tenant, to pass and repass on and through the Premises, or such portion thereof as may be necessary, in order that they or any of them may gain access to the Premises for the purpose of showing the Premises to potential new tenants or real estate brokers. In addition, Landlord shall be entitled to place a \"FOR RENT \" or \"FOR LEASE\" sign (not exceeding 8.5 ” x 11 ”) in the front window of the Premises during the last six months of the term of this Lease .', metadata={'Landlord': 'BIRCH STREET , LLC', 'Lease Date': 'October 15 , 2021', 'Lease Parties': 'The provisions of this rider are hereby incorporated into and made a part of the Lease dated as of October 15 , 2021 between BIRCH STREET , LLC , having an address at c/o Birch Palace , 6 Grace Avenue Suite 200 , Great Neck , New York 11021 (\" Landlord \"), and Trutone Lane LLC , having an address at 4 Pearl Street , New York , New York 10012 (\" Tenant \") of Premises known as the ground floor space and lower level space, as per floor plan annexed hereto and made a part hereof as Exhibit A (“Premises”) at 4 Pearl Street , New York , New York 10012 in the City of New York , Borough of Manhattan , to which this rider is annexed. If there is any conflict between the provisions of this rider and the remainder of this Lease , the provisions of this rider shall govern.', 'Tenant': 'Trutone Lane LLC', 'id': 'omvs4mysdk6b', 'source': 'TruTone Lane 1.docx', 'structure': 'p', 'tag': 'Landlord', 'xpath': '/docset:Rider/docset:RIDERTOLEASE-section/docset:RIDERTOLEASE/docset:FixedRent/docset:TermYearPeriod/docset:Lease/docset:_42FLandlordSAccess-section/docset:_42FLandlordSAccess/docset:LandlordsRights/docset:Landlord'}),\n",
|
||||
" Document(page_content=\"24. SIGNS . No signage shall be placed by Tenant on any portion of the Project . However, Tenant shall be permitted to place a sign bearing its name in a location approved by Landlord near the entrance to the Premises (at Tenant's cost ) and will be furnished a single listing of its name in the Building's directory (at Landlord 's cost ), all in accordance with the criteria adopted from time to time by Landlord for the Project . Any changes or additional listings in the directory shall be furnished (subject to availability of space) for the then Building Standard charge .\", metadata={'Landlord': 'Perry & Blair LLC', 'Lease Date': 'March 29th , 2019', 'Lease Parties': 'THIS OFFICE LEASE (the \"Lease\") is made and entered into as of March 29th , 2019 , by and between Landlord and Tenant . \"Date of this Lease\" shall mean the date on which the last one of the Landlord and Tenant has signed this Lease .', 'Tenant': 'Shorebucks LLC', 'id': 'dsyfhh4vpeyf', 'source': 'Shorebucks LLC_CO.pdf', 'structure': 'div', 'tag': 'SIGNS', 'xpath': '/docset:OFFICELEASE-section/docset:OFFICELEASE/docset:THISOFFICELEASE/docset:WITNESSETH-section/docset:WITNESSETH/docset:GrossRentCreditTheRentCredit-section/docset:GrossRentCreditTheRentCredit/docset:ThisLease-section/docset:ThisLease/docset:Guaranty-section/docset:Guaranty[2]/docset:TheTransfer/docset:TheTerms/docset:Indemnification/docset:INDEMNIFICATION-section/docset:INDEMNIFICATION/docset:Waiver/docset:Waiver/docset:Signs/docset:SIGNS-section/docset:SIGNS'})]}"
|
||||
]
|
||||
},
|
||||
"execution_count": 9,
|
||||
"execution_count": 7,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
@@ -253,7 +212,7 @@
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Using Docugami Knowledge Graph for High Accuracy Document QA\n",
|
||||
"## Using Docugami to Add Metadata to Chunks for High Accuracy Document QA\n",
|
||||
"\n",
|
||||
"One issue with large documents is that the correct answer to your question may depend on chunks that are far apart in the document. Typical chunking techniques, even with overlap, will struggle with providing the LLM sufficent context to answer such questions. With upcoming very large context LLMs, it may be possible to stuff a lot of tokens, perhaps even entire documents, inside the context but this will still hit limits at some point with very long documents, or a lot of documents.\n",
|
||||
"\n",
|
||||
@@ -262,16 +221,16 @@
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 10,
|
||||
"execution_count": 8,
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"\" I don't know.\""
|
||||
"' 9,753 square feet.'"
|
||||
]
|
||||
},
|
||||
"execution_count": 10,
|
||||
"execution_count": 8,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
@@ -281,21 +240,28 @@
|
||||
"chain_response[\"result\"] # correct answer should be 13,500 sq ft"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"At first glance the answer may seem reasonable, but if you review the source chunks carefully for this answer, you will see that the chunking of the document did not end up putting the Landlord name and the rentable area in the same context, since they are far apart in the document. The retriever therefore ends up finding unrelated chunks from other documents not even related to the **DHA Group** landlord. That landlord happens to be mentioned on the first page of the file **Shorebucks LLC_NJ.pdf** file, and while one of the source chunks used by the chain is indeed from that doc that contains the correct answer (**13,500**), other source chunks from different docs are included, and the answer is therefore incorrect."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 11,
|
||||
"execution_count": 13,
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"[Document(page_content='1.6 Rentable Area of the Premises.', metadata={'id': '5b39a1ae84d51682328dca1467be211f', 'name': 'Sample Commercial Leases/Shorebucks LLC_WA.pdf', 'structure': 'lim h1', 'xpath': '/docset:OFFICELEASE-section/docset:OFFICELEASE-section/docset:OFFICELEASE/docset:WITNESSETH-section/docset:WITNESSETH/dg:chunk/dg:chunk/docset:BasicLeaseInformation/docset:BASICLEASEINFORMATIONANDDEFINEDTERMS-section/docset:BASICLEASEINFORMATIONANDDEFINEDTERMS/docset:CatalystGroup/dg:chunk[6]/dg:chunk'}),\n",
|
||||
" Document(page_content='1.6 Rentable Area of the Premises.', metadata={'id': '5b39a1ae84d51682328dca1467be211f', 'name': 'Sample Commercial Leases/Shorebucks LLC_AZ.pdf', 'structure': 'lim h1', 'xpath': '/docset:OFFICELEASE-section/docset:OFFICELEASE-section/docset:WITNESSETH-section/docset:WITNESSETH/dg:chunk/dg:chunk/docset:BasicLeaseInformation/docset:BASICLEASEINFORMATIONANDDEFINEDTERMS-section/docset:BASICLEASEINFORMATIONANDDEFINEDTERMS/docset:MenloGroup/dg:chunk[6]/dg:chunk'}),\n",
|
||||
" Document(page_content='1.6 Rentable Area of the Premises.', metadata={'id': '5b39a1ae84d51682328dca1467be211f', 'name': 'Sample Commercial Leases/Shorebucks LLC_FL.pdf', 'structure': 'lim h1', 'xpath': '/docset:OFFICELEASE-section/docset:OFFICELEASE/docset:WITNESSETH-section/docset:WITNESSETH/docset:Florida-section/docset:Florida/docset:Shorebucks/dg:chunk[5]/dg:chunk'}),\n",
|
||||
" Document(page_content='1.6 Rentable Area of the Premises.', metadata={'id': '5b39a1ae84d51682328dca1467be211f', 'name': 'Sample Commercial Leases/Shorebucks LLC_TX.pdf', 'structure': 'lim h1', 'xpath': '/docset:OFFICELEASE-section/docset:OFFICELEASE/docset:WITNESSETH-section/docset:WITNESSETH/dg:chunk/dg:chunk/docset:BasicLeaseInformation/docset:BASICLEASEINFORMATIONANDDEFINEDTERMS-section/docset:BASICLEASEINFORMATIONANDDEFINEDTERMS/docset:LandmarkLlc/dg:chunk[6]/dg:chunk'})]"
|
||||
"[Document(page_content='1.1 Landlord . DHA Group , a Delaware limited liability company authorized to transact business in New Jersey .', metadata={'Landlord': 'DHA Group', 'Lease Date': 'March 29th , 2019', 'Lease Parties': 'THIS OFFICE LEASE (the \"Lease\") is made and entered into as of March 29th , 2019 , by and between Landlord and Tenant . \"Date of this Lease\" shall mean the date on which the last one of the Landlord and Tenant has signed this Lease .', 'Tenant': 'Shorebucks LLC', 'id': 'md8rieecquyv', 'source': 'Shorebucks LLC_NJ.pdf', 'structure': 'div', 'tag': 'DhaGroup', 'xpath': '/docset:OFFICELEASE-section/docset:OFFICELEASE/docset:THISOFFICELEASE/docset:WITNESSETH-section/docset:WITNESSETH/docset:TheTerms/dg:chunk/docset:BasicLeaseInformation/docset:BASICLEASEINFORMATIONANDDEFINEDTERMS-section/docset:BASICLEASEINFORMATIONANDDEFINEDTERMS/docset:DhaGroup/docset:DhaGroup/docset:DhaGroup/docset:Landlord-section/docset:DhaGroup'}),\n",
|
||||
" Document(page_content='WITNESSES: LANDLORD: DHA Group , a Delaware limited liability company', metadata={'Landlord': 'DHA Group', 'Lease Date': 'March 29th , 2019', 'Lease Parties': 'THIS OFFICE LEASE (the \"Lease\") is made and entered into as of March 29th , 2019 , by and between Landlord and Tenant . \"Date of this Lease\" shall mean the date on which the last one of the Landlord and Tenant has signed this Lease .', 'Tenant': 'Shorebucks LLC', 'id': 'md8rieecquyv', 'source': 'Shorebucks LLC_NJ.pdf', 'structure': 'p', 'tag': 'DhaGroup', 'xpath': '/docset:OFFICELEASE-section/docset:OFFICELEASE/docset:THISOFFICELEASE/docset:WITNESSETH-section/docset:WITNESSETH/docset:GrossRentCreditTheRentCredit-section/docset:GrossRentCreditTheRentCredit/docset:Guaranty-section/docset:Guaranty[2]/docset:SIGNATURESONNEXTPAGE-section/docset:INWITNESSWHEREOF-section/docset:INWITNESSWHEREOF/docset:Behalf/docset:Witnesses/xhtml:table/xhtml:tbody/xhtml:tr[3]/xhtml:td[2]/docset:DhaGroup'}),\n",
|
||||
" Document(page_content=\"1.16 Landlord 's Notice Address . DHA Group , Suite 1010 , 111 Bauer Dr , Oakland , New Jersey , 07436 , with a copy to the Building Management Office at the Project , Attention: On - Site Property Manager .\", metadata={'Landlord': 'DHA Group', 'Lease Date': 'March 29th , 2019', 'Lease Parties': 'THIS OFFICE LEASE (the \"Lease\") is made and entered into as of March 29th , 2019 , by and between Landlord and Tenant . \"Date of this Lease\" shall mean the date on which the last one of the Landlord and Tenant has signed this Lease .', 'Tenant': 'Shorebucks LLC', 'id': 'md8rieecquyv', 'source': 'Shorebucks LLC_NJ.pdf', 'structure': 'div', 'tag': 'LandlordsNoticeAddress', 'xpath': '/docset:OFFICELEASE-section/docset:OFFICELEASE/docset:THISOFFICELEASE/docset:WITNESSETH-section/docset:WITNESSETH/docset:GrossRentCreditTheRentCredit-section/docset:GrossRentCreditTheRentCredit/docset:Period/docset:ApplicableSalesTax/docset:PercentageRent/docset:PercentageRent/docset:NoticeAddress[2]/docset:LandlordsNoticeAddress-section/docset:LandlordsNoticeAddress[2]'}),\n",
|
||||
" Document(page_content='1.6 Rentable Area of the Premises. 9,753 square feet . This square footage figure includes an add-on factor for Common Areas in the Building and has been agreed upon by the parties as final and correct and is not subject to challenge or dispute by either party.', metadata={'Landlord': 'Perry & Blair LLC', 'Lease Date': 'March 29th , 2019', 'Lease Parties': 'THIS OFFICE LEASE (the \"Lease\") is made and entered into as of March 29th , 2019 , by and between Landlord and Tenant . \"Date of this Lease\" shall mean the date on which the last one of the Landlord and Tenant has signed this Lease .', 'Tenant': 'Shorebucks LLC', 'id': 'dsyfhh4vpeyf', 'source': 'Shorebucks LLC_CO.pdf', 'structure': 'div', 'tag': 'RentableAreaofthePremises', 'xpath': '/docset:OFFICELEASE-section/docset:OFFICELEASE/docset:THISOFFICELEASE/docset:WITNESSETH-section/docset:WITNESSETH/docset:TheTerms/dg:chunk/docset:BasicLeaseInformation/docset:BASICLEASEINFORMATIONANDDEFINEDTERMS-section/docset:BASICLEASEINFORMATIONANDDEFINEDTERMS/docset:PerryBlair/docset:PerryBlair/docset:Premises[2]/docset:RentableAreaofthePremises-section/docset:RentableAreaofthePremises'})]"
|
||||
]
|
||||
},
|
||||
"execution_count": 11,
|
||||
"execution_count": 13,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
@@ -304,42 +270,43 @@
|
||||
"chain_response[\"source_documents\"]"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"At first glance the answer may seem reasonable, but it is incorrect. If you review the source chunks carefully for this answer, you will see that the chunking of the document did not end up putting the Landlord name and the rentable area in the same context, and produced irrelevant chunks therefore the answer is incorrect (should be **13,500 sq ft**)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Docugami can help here. Chunks are annotated with additional metadata created using different techniques if a user has been [using Docugami](https://help.docugami.com/home/reports). More technical approaches will be added later.\n",
|
||||
"\n",
|
||||
"Specifically, let's ask Docugami to return XML tags on its output, as well as additional metadata:"
|
||||
"Specifically, let's look at the additional metadata that is returned on the documents returned by docugami, in the form of some simple key/value pairs on all the text chunks:"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 12,
|
||||
"execution_count": 14,
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"{'xpath': '/docset:OFFICELEASE-section/dg:chunk', 'id': '47297e277e556f3ce8b570047304560b', 'name': 'Sample Commercial Leases/Shorebucks LLC_AZ.pdf', 'source': 'Sample Commercial Leases/Shorebucks LLC_AZ.pdf', 'structure': 'h1 h1 p', 'tag': 'chunk Lease', 'Lease Date': 'March 29th , 2019', 'Landlord': 'Menlo Group', 'Tenant': 'Shorebucks LLC', 'Premises Address': '1564 E Broadway Rd , Tempe , Arizona 85282', 'Term of Lease': '96 full calendar months', 'Square Feet': '16,159'}\n"
|
||||
]
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"{'xpath': '/docset:OFFICELEASEAGREEMENT-section/docset:OFFICELEASEAGREEMENT/docset:LeaseParties',\n",
|
||||
" 'id': 'v1bvgaozfkak',\n",
|
||||
" 'source': 'TruTone Lane 2.docx',\n",
|
||||
" 'structure': 'p',\n",
|
||||
" 'tag': 'LeaseParties',\n",
|
||||
" 'Lease Date': 'April 24 \\n\\n ,',\n",
|
||||
" 'Landlord': 'BUBBA CENTER PARTNERSHIP',\n",
|
||||
" 'Tenant': 'Truetone Lane LLC',\n",
|
||||
" 'Lease Parties': 'This OFFICE LEASE AGREEMENT (this \"Lease\") is made and entered into by and between BUBBA CENTER PARTNERSHIP (\" Landlord \"), and Truetone Lane LLC , a Delaware limited liability company (\" Tenant \").'}"
|
||||
]
|
||||
},
|
||||
"execution_count": 14,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"loader = DocugamiLoader(docset_id=\"zo954yqy53wp\")\n",
|
||||
"loader.include_xml_tags = (\n",
|
||||
" True # for additional semantics from the Docugami knowledge graph\n",
|
||||
")\n",
|
||||
"chunks = loader.load()\n",
|
||||
"print(chunks[0].metadata)"
|
||||
"loader = DocugamiLoader(docset_id=\"wh2kned25uqm\")\n",
|
||||
"documents = loader.load()\n",
|
||||
"documents[0].metadata"
|
||||
]
|
||||
},
|
||||
{
|
||||
@@ -351,22 +318,12 @@
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 13,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"!poetry run pip install --upgrade lark --quiet"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 14,
|
||||
"execution_count": 15,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langchain.chains.query_constructor.schema import AttributeInfo\n",
|
||||
"from langchain.retrievers.self_query.base import SelfQueryRetriever\n",
|
||||
"from langchain.vectorstores.chroma import Chroma\n",
|
||||
"\n",
|
||||
"EXCLUDE_KEYS = [\"id\", \"xpath\", \"structure\"]\n",
|
||||
"metadata_field_info = [\n",
|
||||
@@ -375,23 +332,19 @@
|
||||
" description=f\"The {key} for this chunk\",\n",
|
||||
" type=\"string\",\n",
|
||||
" )\n",
|
||||
" for key in chunks[0].metadata\n",
|
||||
" for key in documents[0].metadata\n",
|
||||
" if key.lower() not in EXCLUDE_KEYS\n",
|
||||
"]\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"document_content_description = \"Contents of this chunk\"\n",
|
||||
"llm = OpenAI(temperature=0)\n",
|
||||
"\n",
|
||||
"vectordb = Chroma.from_documents(documents=chunks, embedding=embedding)\n",
|
||||
"vectordb = Chroma.from_documents(documents=documents, embedding=embedding)\n",
|
||||
"retriever = SelfQueryRetriever.from_llm(\n",
|
||||
" llm, vectordb, document_content_description, metadata_field_info, verbose=True\n",
|
||||
")\n",
|
||||
"qa_chain = RetrievalQA.from_chain_type(\n",
|
||||
" llm=OpenAI(),\n",
|
||||
" chain_type=\"stuff\",\n",
|
||||
" retriever=retriever,\n",
|
||||
" return_source_documents=True,\n",
|
||||
" verbose=True,\n",
|
||||
" llm=OpenAI(), chain_type=\"stuff\", retriever=retriever, return_source_documents=True\n",
|
||||
")"
|
||||
]
|
||||
},
|
||||
@@ -404,32 +357,36 @@
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 15,
|
||||
"execution_count": 12,
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stderr",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"/root/Source/github/docugami.langchain/libs/langchain/langchain/chains/llm.py:275: UserWarning: The predict_and_parse method is deprecated, instead pass an output parser directly to LLMChain.\n",
|
||||
" warnings.warn(\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"\n",
|
||||
"\n",
|
||||
"\u001b[1m> Entering new RetrievalQA chain...\u001b[0m\n",
|
||||
"\n",
|
||||
"\u001b[1m> Finished chain.\u001b[0m\n"
|
||||
"query='rentable area' filter=Comparison(comparator=<Comparator.EQ: 'eq'>, attribute='Landlord', value='DHA Group') limit=None\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"{'query': 'What is rentable area for the property owned by DHA Group?',\n",
|
||||
" 'result': ' The rentable area of the property owned by DHA Group is 13,500 square feet.',\n",
|
||||
" 'source_documents': [Document(page_content='1.6 Rentable Area of the Premises.', metadata={'Landlord': 'DHA Group', 'Lease Date': 'March 29th , 2019', 'Premises Address': '111 Bauer Dr , Oakland , New Jersey , 07436', 'Square Feet': '13,500', 'Tenant': 'Shorebucks LLC', 'Term of Lease': '84 full calendar months', 'id': '5b39a1ae84d51682328dca1467be211f', 'name': 'Sample Commercial Leases/Shorebucks LLC_NJ.pdf', 'source': 'Sample Commercial Leases/Shorebucks LLC_NJ.pdf', 'structure': 'lim h1', 'tag': 'chunk', 'xpath': '/docset:OFFICELEASE-section/docset:OFFICELEASE-section/docset:OFFICELEASE/docset:WITNESSETH-section/docset:WITNESSETH/dg:chunk/dg:chunk/docset:BasicLeaseInformation/docset:BASICLEASEINFORMATIONANDDEFINEDTERMS-section/docset:BASICLEASEINFORMATIONANDDEFINEDTERMS/docset:DhaGroup/dg:chunk[6]/dg:chunk'}),\n",
|
||||
" Document(page_content='<RentableAreaofthePremises><SquareFeet>13,500 </SquareFeet>square feet. This square footage figure includes an add-on factor for Common Areas in the Building and has been agreed upon by the parties as final and correct and is not subject to challenge or dispute by either party. </RentableAreaofthePremises>', metadata={'Landlord': 'DHA Group', 'Lease Date': 'March 29th , 2019', 'Premises Address': '111 Bauer Dr , Oakland , New Jersey , 07436', 'Square Feet': '13,500', 'Tenant': 'Shorebucks LLC', 'Term of Lease': '84 full calendar months', 'id': '4c06903d087f5a83e486ee42cd702d31', 'name': 'Sample Commercial Leases/Shorebucks LLC_NJ.pdf', 'source': 'Sample Commercial Leases/Shorebucks LLC_NJ.pdf', 'structure': 'div', 'tag': 'RentableAreaofthePremises', 'xpath': '/docset:OFFICELEASE-section/docset:OFFICELEASE-section/docset:OFFICELEASE/docset:WITNESSETH-section/docset:WITNESSETH/dg:chunk/dg:chunk/docset:BasicLeaseInformation/docset:BASICLEASEINFORMATIONANDDEFINEDTERMS-section/docset:BASICLEASEINFORMATIONANDDEFINEDTERMS/docset:DhaGroup/dg:chunk[6]/docset:RentableAreaofthePremises-section/docset:RentableAreaofthePremises'}),\n",
|
||||
" Document(page_content='<TheTermAnnualMarketRent>shall mean (i) for the initial Lease Year (“Year 1”) <Money>$2,239,748.00 </Money>per year (i.e., the product of the Rentable Area of the Premises multiplied by <Money>$82.00</Money>) (the “Year 1 Market Rent Hurdle”); (ii) for the Lease Year thereafter, <Percent>one hundred three percent (103%) </Percent>of the Year 1 Market Rent Hurdle, and (iii) for each Lease Year thereafter until the termination or expiration of this Lease, the Annual Market Rent Threshold shall be <AnnualMarketRentThreshold>one hundred three percent (103%) </AnnualMarketRentThreshold>of the Annual Market Rent Threshold for the immediately prior Lease Year. </TheTermAnnualMarketRent>', metadata={'Landlord': 'DHA Group', 'Lease Date': 'March 29th , 2019', 'Premises Address': '111 Bauer Dr , Oakland , New Jersey , 07436', 'Square Feet': '13,500', 'Tenant': 'Shorebucks LLC', 'Term of Lease': '84 full calendar months', 'id': '6b90beeadace5d4d12b25706fb48e631', 'name': 'Sample Commercial Leases/Shorebucks LLC_NJ.pdf', 'source': 'Sample Commercial Leases/Shorebucks LLC_NJ.pdf', 'structure': 'div', 'tag': 'TheTermAnnualMarketRent', 'xpath': '/docset:OFFICELEASE-section/docset:OFFICELEASE-section/docset:OFFICELEASE/docset:WITNESSETH-section/docset:WITNESSETH/docset:GrossRentCredit-section/docset:GrossRentCredit/dg:chunk/dg:chunk/dg:chunk/dg:chunk[2]/docset:PercentageRent/dg:chunk[2]/dg:chunk[2]/docset:TenantSRevenue/dg:chunk[2]/docset:TenantSRevenue/dg:chunk[3]/docset:TheTermAnnualMarketRent-section/docset:TheTermAnnualMarketRent'}),\n",
|
||||
" Document(page_content='1.11 Percentage Rent.\\n (a) <GrossRevenue><Percent>55% </Percent>of Gross Revenue to Landlord until Landlord receives Percentage Rent in an amount equal to the Annual Market Rent Hurdle (as escalated); and </GrossRevenue>', metadata={'Landlord': 'DHA Group', 'Lease Date': 'March 29th , 2019', 'Premises Address': '111 Bauer Dr , Oakland , New Jersey , 07436', 'Square Feet': '13,500', 'Tenant': 'Shorebucks LLC', 'Term of Lease': '84 full calendar months', 'id': 'c8bb9cbedf65a578d9db3f25f519dd3d', 'name': 'Sample Commercial Leases/Shorebucks LLC_NJ.pdf', 'source': 'Sample Commercial Leases/Shorebucks LLC_NJ.pdf', 'structure': 'lim h1 lim p', 'tag': 'chunk GrossRevenue', 'xpath': '/docset:OFFICELEASE-section/docset:OFFICELEASE-section/docset:OFFICELEASE/docset:WITNESSETH-section/docset:WITNESSETH/docset:GrossRentCredit-section/docset:GrossRentCredit/dg:chunk/dg:chunk/dg:chunk/docset:PercentageRent/dg:chunk[1]/dg:chunk[1]'})]}"
|
||||
" 'result': ' The rentable area for the property owned by DHA Group is 13,500 square feet.',\n",
|
||||
" 'source_documents': [Document(page_content='1.6 Rentable Area of the Premises. 13,500 square feet . This square footage figure includes an add-on factor for Common Areas in the Building and has been agreed upon by the parties as final and correct and is not subject to challenge or dispute by either party.', metadata={'Landlord': 'DHA Group', 'Lease Date': 'March 29th , 2019', 'Lease Parties': 'THIS OFFICE LEASE (the \"Lease\") is made and entered into as of March 29th , 2019 , by and between Landlord and Tenant . \"Date of this Lease\" shall mean the date on which the last one of the Landlord and Tenant has signed this Lease .', 'Tenant': 'Shorebucks LLC', 'id': 'md8rieecquyv', 'source': 'Shorebucks LLC_NJ.pdf', 'structure': 'div', 'tag': 'RentableAreaofthePremises', 'xpath': '/docset:OFFICELEASE-section/docset:OFFICELEASE/docset:THISOFFICELEASE/docset:WITNESSETH-section/docset:WITNESSETH/docset:TheTerms/dg:chunk/docset:BasicLeaseInformation/docset:BASICLEASEINFORMATIONANDDEFINEDTERMS-section/docset:BASICLEASEINFORMATIONANDDEFINEDTERMS/docset:DhaGroup/docset:DhaGroup/docset:Premises[2]/docset:RentableAreaofthePremises-section/docset:RentableAreaofthePremises'}),\n",
|
||||
" Document(page_content='1.6 Rentable Area of the Premises. 13,500 square feet . This square footage figure includes an add-on factor for Common Areas in the Building and has been agreed upon by the parties as final and correct and is not subject to challenge or dispute by either party.', metadata={'Landlord': 'DHA Group', 'Lease Date': 'March 29th , 2019', 'Lease Parties': 'THIS OFFICE LEASE (the \"Lease\") is made and entered into as of March 29th , 2019 , by and between Landlord and Tenant . \"Date of this Lease\" shall mean the date on which the last one of the Landlord and Tenant has signed this Lease .', 'Tenant': 'Shorebucks LLC', 'id': 'md8rieecquyv', 'source': 'Shorebucks LLC_NJ.pdf', 'structure': 'div', 'tag': 'RentableAreaofthePremises', 'xpath': '/docset:OFFICELEASE-section/docset:OFFICELEASE/docset:THISOFFICELEASE/docset:WITNESSETH-section/docset:WITNESSETH/docset:TheTerms/dg:chunk/docset:BasicLeaseInformation/docset:BASICLEASEINFORMATIONANDDEFINEDTERMS-section/docset:BASICLEASEINFORMATIONANDDEFINEDTERMS/docset:DhaGroup/docset:DhaGroup/docset:Premises[2]/docset:RentableAreaofthePremises-section/docset:RentableAreaofthePremises'}),\n",
|
||||
" Document(page_content='1.11 Percentage Rent . (a) 55 % of Gross Revenue to Landlord until Landlord receives Percentage Rent in an amount equal to the Annual Market Rent Hurdle (as escalated); and', metadata={'Landlord': 'DHA Group', 'Lease Date': 'March 29th , 2019', 'Lease Parties': 'THIS OFFICE LEASE (the \"Lease\") is made and entered into as of March 29th , 2019 , by and between Landlord and Tenant . \"Date of this Lease\" shall mean the date on which the last one of the Landlord and Tenant has signed this Lease .', 'Tenant': 'Shorebucks LLC', 'id': 'md8rieecquyv', 'source': 'Shorebucks LLC_NJ.pdf', 'structure': 'p', 'tag': 'GrossRevenue', 'xpath': '/docset:OFFICELEASE-section/docset:OFFICELEASE/docset:THISOFFICELEASE/docset:WITNESSETH-section/docset:WITNESSETH/docset:GrossRentCreditTheRentCredit-section/docset:GrossRentCreditTheRentCredit/docset:Period/docset:ApplicableSalesTax/docset:PercentageRent/docset:PercentageRent/docset:PercentageRent/docset:PercentageRent-section/docset:PercentageRent[2]/docset:PercentageRent/docset:GrossRevenue[1]/docset:GrossRevenue'}),\n",
|
||||
" Document(page_content='1.11 Percentage Rent . (a) 55 % of Gross Revenue to Landlord until Landlord receives Percentage Rent in an amount equal to the Annual Market Rent Hurdle (as escalated); and', metadata={'Landlord': 'DHA Group', 'Lease Date': 'March 29th , 2019', 'Lease Parties': 'THIS OFFICE LEASE (the \"Lease\") is made and entered into as of March 29th , 2019 , by and between Landlord and Tenant . \"Date of this Lease\" shall mean the date on which the last one of the Landlord and Tenant has signed this Lease .', 'Tenant': 'Shorebucks LLC', 'id': 'md8rieecquyv', 'source': 'Shorebucks LLC_NJ.pdf', 'structure': 'p', 'tag': 'GrossRevenue', 'xpath': '/docset:OFFICELEASE-section/docset:OFFICELEASE/docset:THISOFFICELEASE/docset:WITNESSETH-section/docset:WITNESSETH/docset:GrossRentCreditTheRentCredit-section/docset:GrossRentCreditTheRentCredit/docset:Period/docset:ApplicableSalesTax/docset:PercentageRent/docset:PercentageRent/docset:PercentageRent/docset:PercentageRent-section/docset:PercentageRent[2]/docset:PercentageRent/docset:GrossRevenue[1]/docset:GrossRevenue'})]}"
|
||||
]
|
||||
},
|
||||
"execution_count": 15,
|
||||
"execution_count": 12,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
@@ -446,198 +403,6 @@
|
||||
"source": [
|
||||
"This time the answer is correct, since the self-querying retriever created a filter on the landlord attribute of the metadata, correctly filtering to document that specifically is about the DHA Group landlord. The resulting source chunks are all relevant to this landlord, and this improves answer accuracy even though the landlord is not directly mentioned in the specific chunk that contains the correct answer."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# Advanced Topic: Small-to-Big Retrieval with Document Knowledge Graph Hierarchy"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Documents are inherently semi-structured and the DocugamiLoader is able to navigate the semantic and structural contours of the document to provide parent chunk references on the chunks it returns. This is useful e.g. with the [MultiVector Retriever](https://python.langchain.com/docs/modules/data_connection/retrievers/multi_vector) for [small-to-big](https://www.youtube.com/watch?v=ihSiRrOUwmg) retrieval.\n",
|
||||
"\n",
|
||||
"To get parent chunk references, you can set `loader.parent_hierarchy_levels` to a non-zero value."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 16,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from typing import Dict, List\n",
|
||||
"\n",
|
||||
"from langchain.document_loaders import DocugamiLoader\n",
|
||||
"from langchain.schema.document import Document\n",
|
||||
"\n",
|
||||
"loader = DocugamiLoader(docset_id=\"zo954yqy53wp\")\n",
|
||||
"loader.include_xml_tags = (\n",
|
||||
" True # for additional semantics from the Docugami knowledge graph\n",
|
||||
")\n",
|
||||
"loader.parent_hierarchy_levels = 3 # for expanded context\n",
|
||||
"loader.max_text_length = (\n",
|
||||
" 1024 * 8\n",
|
||||
") # 8K chars are roughly 2K tokens (ref: https://help.openai.com/en/articles/4936856-what-are-tokens-and-how-to-count-them)\n",
|
||||
"loader.include_project_metadata_in_doc_metadata = (\n",
|
||||
" False # Not filtering on vector metadata, so remove to lighten the vectors\n",
|
||||
")\n",
|
||||
"chunks: List[Document] = loader.load()\n",
|
||||
"\n",
|
||||
"# build separate maps of parent and child chunks\n",
|
||||
"parents_by_id: Dict[str, Document] = {}\n",
|
||||
"children_by_id: Dict[str, Document] = {}\n",
|
||||
"for chunk in chunks:\n",
|
||||
" chunk_id = chunk.metadata.get(\"id\")\n",
|
||||
" parent_chunk_id = chunk.metadata.get(loader.parent_id_key)\n",
|
||||
" if not parent_chunk_id:\n",
|
||||
" # parent chunk\n",
|
||||
" parents_by_id[chunk_id] = chunk\n",
|
||||
" else:\n",
|
||||
" # child chunk\n",
|
||||
" children_by_id[chunk_id] = chunk"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 17,
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"PARENT CHUNK 7df09fbfc65bb8377054808aac2d16fd: page_content='OFFICE LEASE\\n THIS OFFICE LEASE\\n <Lease>(the \"Lease\") is made and entered into as of <LeaseDate>March 29th, 2019</LeaseDate>, by and between Landlord and Tenant. \"Date of this Lease\" shall mean the date on which the last one of the Landlord and Tenant has signed this Lease. </Lease>\\nW I T N E S S E T H\\n <TheTerms> Subject to and on the terms and conditions of this Lease, Landlord leases to Tenant and Tenant hires from Landlord the Premises. </TheTerms>\\n1. BASIC LEASE INFORMATION AND DEFINED TERMS.\\nThe key business terms of this Lease and the defined terms used in this Lease are as follows:' metadata={'xpath': '/docset:OFFICELEASE-section/dg:chunk', 'id': '7df09fbfc65bb8377054808aac2d16fd', 'name': 'Sample Commercial Leases/Shorebucks LLC_NJ.pdf', 'source': 'Sample Commercial Leases/Shorebucks LLC_NJ.pdf', 'structure': 'h1 h1 p h1 p lim h1 p', 'tag': 'chunk Lease chunk TheTerms'}\n",
|
||||
"CHUNK 47297e277e556f3ce8b570047304560b: page_content='OFFICE LEASE\\n THIS OFFICE LEASE\\n <Lease>(the \"Lease\") is made and entered into as of <LeaseDate>March 29th, 2019</LeaseDate>, by and between Landlord and Tenant. \"Date of this Lease\" shall mean the date on which the last one of the Landlord and Tenant has signed this Lease. </Lease>' metadata={'xpath': '/docset:OFFICELEASE-section/dg:chunk', 'id': '47297e277e556f3ce8b570047304560b', 'name': 'Sample Commercial Leases/Shorebucks LLC_NJ.pdf', 'source': 'Sample Commercial Leases/Shorebucks LLC_NJ.pdf', 'structure': 'h1 h1 p', 'tag': 'chunk Lease', 'doc_id': '7df09fbfc65bb8377054808aac2d16fd'}\n",
|
||||
"PARENT CHUNK bb84925da3bed22c30ea1bdc173ff54f: page_content='OFFICE LEASE\\n THIS OFFICE LEASE\\n <Lease>(the \"Lease\") is made and entered into as of <LeaseDate>January 8th, 2018</LeaseDate>, by and between Landlord and Tenant. \"Date of this Lease\" shall mean the date on which the last one of the Landlord and Tenant has signed this Lease. </Lease>\\nW I T N E S S E T H\\n <TheTerms> Subject to and on the terms and conditions of this Lease, Landlord leases to Tenant and Tenant hires from Landlord the Premises. </TheTerms>\\n1. BASIC LEASE INFORMATION AND DEFINED TERMS.\\nThe key business terms of this Lease and the defined terms used in this Lease are as follows:\\n1.1 Landlord.\\n <Landlord>Catalyst Group LLC </Landlord>' metadata={'xpath': '/docset:OFFICELEASE-section/dg:chunk', 'id': 'bb84925da3bed22c30ea1bdc173ff54f', 'name': 'Sample Commercial Leases/Shorebucks LLC_WA.pdf', 'source': 'Sample Commercial Leases/Shorebucks LLC_WA.pdf', 'structure': 'h1 h1 p h1 p lim h1 p lim h1 div', 'tag': 'chunk Lease chunk TheTerms chunk Landlord'}\n",
|
||||
"CHUNK 2f1746cbd546d1d61a9250c50de7a7fa: page_content='W I T N E S S E T H\\n <TheTerms> Subject to and on the terms and conditions of this Lease, Landlord leases to Tenant and Tenant hires from Landlord the Premises. </TheTerms>' metadata={'xpath': '/docset:OFFICELEASE-section/docset:OFFICELEASE-section/docset:OFFICELEASE/docset:WITNESSETH-section/dg:chunk', 'id': '2f1746cbd546d1d61a9250c50de7a7fa', 'name': 'Sample Commercial Leases/Shorebucks LLC_WA.pdf', 'source': 'Sample Commercial Leases/Shorebucks LLC_WA.pdf', 'structure': 'h1 p', 'tag': 'chunk TheTerms', 'doc_id': 'bb84925da3bed22c30ea1bdc173ff54f'}\n",
|
||||
"PARENT CHUNK 0b0d765b6e504a6ba54fa76b203e62ec: page_content='OFFICE LEASE\\n THIS OFFICE LEASE\\n <Lease>(the \"Lease\") is made and entered into as of <LeaseDate>January 8th, 2018</LeaseDate>, by and between Landlord and Tenant. \"Date of this Lease\" shall mean the date on which the last one of the Landlord and Tenant has signed this Lease. </Lease>\\nW I T N E S S E T H\\n <TheTerms> Subject to and on the terms and conditions of this Lease, Landlord leases to Tenant and Tenant hires from Landlord the Premises. </TheTerms>\\n1. BASIC LEASE INFORMATION AND DEFINED TERMS.\\nThe key business terms of this Lease and the defined terms used in this Lease are as follows:\\n1.1 Landlord.\\n <Landlord>Catalyst Group LLC </Landlord>\\n1.2 Tenant.\\n <Tenant>Shorebucks LLC </Tenant>' metadata={'xpath': '/docset:OFFICELEASE-section/dg:chunk', 'id': '0b0d765b6e504a6ba54fa76b203e62ec', 'name': 'Sample Commercial Leases/Shorebucks LLC_WA.pdf', 'source': 'Sample Commercial Leases/Shorebucks LLC_WA.pdf', 'structure': 'h1 h1 p h1 p lim h1 p lim h1 div lim h1 div', 'tag': 'chunk Lease chunk TheTerms chunk Landlord chunk Tenant'}\n",
|
||||
"CHUNK b362dfe776ec5a7a66451a8c7c220b59: page_content='1. BASIC LEASE INFORMATION AND DEFINED TERMS.' metadata={'xpath': '/docset:OFFICELEASE-section/docset:OFFICELEASE-section/docset:OFFICELEASE/docset:WITNESSETH-section/docset:WITNESSETH/dg:chunk/dg:chunk/docset:BasicLeaseInformation/dg:chunk', 'id': 'b362dfe776ec5a7a66451a8c7c220b59', 'name': 'Sample Commercial Leases/Shorebucks LLC_WA.pdf', 'source': 'Sample Commercial Leases/Shorebucks LLC_WA.pdf', 'structure': 'lim h1', 'tag': 'chunk', 'doc_id': '0b0d765b6e504a6ba54fa76b203e62ec'}\n",
|
||||
"PARENT CHUNK c942010baaf76aa4d4657769492f6edb: page_content='OFFICE LEASE\\n THIS OFFICE LEASE\\n <Lease>(the \"Lease\") is made and entered into as of <LeaseDate>January 8th, 2018</LeaseDate>, by and between Landlord and Tenant. \"Date of this Lease\" shall mean the date on which the last one of the Landlord and Tenant has signed this Lease. </Lease>\\nW I T N E S S E T H\\n <TheTerms> Subject to and on the terms and conditions of this Lease, Landlord leases to Tenant and Tenant hires from Landlord the Premises. </TheTerms>\\n1. BASIC LEASE INFORMATION AND DEFINED TERMS.\\nThe key business terms of this Lease and the defined terms used in this Lease are as follows:\\n1.1 Landlord.\\n <Landlord>Catalyst Group LLC </Landlord>\\n1.2 Tenant.\\n <Tenant>Shorebucks LLC </Tenant>\\n1.3 Building.\\n <Building>The building containing the Premises located at <PremisesAddress><PremisesStreetAddress><MainStreet>600 </MainStreet><StreetName>Main Street</StreetName></PremisesStreetAddress>, <City>Bellevue</City>, <State>WA</State>, <Premises>98004</Premises></PremisesAddress>. The Building is located within the Project. </Building>' metadata={'xpath': '/docset:OFFICELEASE-section/dg:chunk', 'id': 'c942010baaf76aa4d4657769492f6edb', 'name': 'Sample Commercial Leases/Shorebucks LLC_WA.pdf', 'source': 'Sample Commercial Leases/Shorebucks LLC_WA.pdf', 'structure': 'h1 h1 p h1 p lim h1 p lim h1 div lim h1 div lim h1 div', 'tag': 'chunk Lease chunk TheTerms chunk Landlord chunk Tenant chunk Building'}\n",
|
||||
"CHUNK a95971d693b7aa0f6640df1fbd18c2ba: page_content='The key business terms of this Lease and the defined terms used in this Lease are as follows:' metadata={'xpath': '/docset:OFFICELEASE-section/docset:OFFICELEASE-section/docset:OFFICELEASE/docset:WITNESSETH-section/docset:WITNESSETH/dg:chunk/dg:chunk/docset:BasicLeaseInformation/docset:BASICLEASEINFORMATIONANDDEFINEDTERMS-section/docset:BASICLEASEINFORMATIONANDDEFINEDTERMS/dg:chunk', 'id': 'a95971d693b7aa0f6640df1fbd18c2ba', 'name': 'Sample Commercial Leases/Shorebucks LLC_WA.pdf', 'source': 'Sample Commercial Leases/Shorebucks LLC_WA.pdf', 'structure': 'p', 'tag': 'chunk', 'doc_id': 'c942010baaf76aa4d4657769492f6edb'}\n",
|
||||
"PARENT CHUNK f34b649cde7fc4ae156849a56d690495: page_content='W I T N E S S E T H\\n <TheTerms> Subject to and on the terms and conditions of this Lease, Landlord leases to Tenant and Tenant hires from Landlord the Premises. </TheTerms>\\n1. BASIC LEASE INFORMATION AND DEFINED TERMS.\\n<BASICLEASEINFORMATIONANDDEFINEDTERMS>The key business terms of this Lease and the defined terms used in this Lease are as follows: </BASICLEASEINFORMATIONANDDEFINEDTERMS>\\n1.1 Landlord.\\n <Landlord><Landlord>Menlo Group</Landlord>, a <USState>Delaware </USState>limited liability company authorized to transact business in <USState>Arizona</USState>. </Landlord>\\n1.2 Tenant.\\n <Tenant>Shorebucks LLC </Tenant>\\n1.3 Building.\\n <Building>The building containing the Premises located at <PremisesAddress><PremisesStreetAddress><Premises>1564 </Premises><Premises>E Broadway Rd</Premises></PremisesStreetAddress>, <City>Tempe</City>, <USState>Arizona </USState><Premises>85282</Premises></PremisesAddress>. The Building is located within the Project. </Building>\\n1.4 Project.\\n <Project>The parcel of land and the buildings and improvements located on such land known as Shorebucks Office <ShorebucksOfficeAddress><ShorebucksOfficeStreetAddress><ShorebucksOffice>6 </ShorebucksOffice><ShorebucksOffice6>located at <Number>1564 </Number>E Broadway Rd</ShorebucksOffice6></ShorebucksOfficeStreetAddress>, <City>Tempe</City>, <USState>Arizona </USState><Number>85282</Number></ShorebucksOfficeAddress>. The Project is legally described in EXHIBIT \"A\" to this Lease. </Project>' metadata={'xpath': '/dg:chunk/docset:WITNESSETH-section/dg:chunk', 'id': 'f34b649cde7fc4ae156849a56d690495', 'name': 'Sample Commercial Leases/Shorebucks LLC_AZ.docx', 'source': 'Sample Commercial Leases/Shorebucks LLC_AZ.docx', 'structure': 'h1 p lim h1 div lim h1 div lim h1 div lim h1 div lim h1 div', 'tag': 'chunk TheTerms BASICLEASEINFORMATIONANDDEFINEDTERMS chunk Landlord chunk Tenant chunk Building chunk Project'}\n",
|
||||
"CHUNK 21b4d9517f7ccdc0e3a028ce5043a2a0: page_content='1.1 Landlord.\\n <Landlord><Landlord>Menlo Group</Landlord>, a <USState>Delaware </USState>limited liability company authorized to transact business in <USState>Arizona</USState>. </Landlord>' metadata={'xpath': '/dg:chunk/docset:WITNESSETH-section/docset:WITNESSETH/dg:chunk[1]/dg:chunk[1]/dg:chunk/dg:chunk[2]/dg:chunk', 'id': '21b4d9517f7ccdc0e3a028ce5043a2a0', 'name': 'Sample Commercial Leases/Shorebucks LLC_AZ.docx', 'source': 'Sample Commercial Leases/Shorebucks LLC_AZ.docx', 'structure': 'lim h1 div', 'tag': 'chunk Landlord', 'doc_id': 'f34b649cde7fc4ae156849a56d690495'}\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"# Explore some of the parent chunk relationships\n",
|
||||
"for id, chunk in list(children_by_id.items())[:5]:\n",
|
||||
" parent_chunk_id = chunk.metadata.get(loader.parent_id_key)\n",
|
||||
" if parent_chunk_id:\n",
|
||||
" # child chunks have the parent chunk id set\n",
|
||||
" print(f\"PARENT CHUNK {parent_chunk_id}: {parents_by_id[parent_chunk_id]}\")\n",
|
||||
" print(f\"CHUNK {id}: {chunk}\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 18,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langchain.embeddings import OpenAIEmbeddings\n",
|
||||
"from langchain.retrievers.multi_vector import MultiVectorRetriever, SearchType\n",
|
||||
"from langchain.storage import InMemoryStore\n",
|
||||
"from langchain.vectorstores.chroma import Chroma\n",
|
||||
"\n",
|
||||
"# The vectorstore to use to index the child chunks\n",
|
||||
"vectorstore = Chroma(collection_name=\"big2small\", embedding_function=OpenAIEmbeddings())\n",
|
||||
"\n",
|
||||
"# The storage layer for the parent documents\n",
|
||||
"store = InMemoryStore()\n",
|
||||
"\n",
|
||||
"# The retriever (empty to start)\n",
|
||||
"retriever = MultiVectorRetriever(\n",
|
||||
" vectorstore=vectorstore,\n",
|
||||
" docstore=store,\n",
|
||||
" search_type=SearchType.mmr, # use max marginal relevance search\n",
|
||||
" search_kwargs={\"k\": 2},\n",
|
||||
")\n",
|
||||
"\n",
|
||||
"# Add child chunks to vector store\n",
|
||||
"retriever.vectorstore.add_documents(list(children_by_id.values()))\n",
|
||||
"\n",
|
||||
"# Add parent chunks to docstore\n",
|
||||
"retriever.docstore.mset(parents_by_id.items())"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 19,
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"24. SIGNS.\n",
|
||||
" <SIGNS>No signage shall be placed by Tenant on any portion of the Project. However, Tenant shall be permitted to place a sign bearing its name in a location approved by Landlord near the entrance to the Premises (at Tenant's cost) and will be furnished a single listing of its name in the Building's directory (at Landlord's cost), all in accordance with the criteria adopted <Frequency>from time to time </Frequency>by Landlord for the Project. Any changes or additional listings in the directory shall be furnished (subject to availability of space) for the then Building Standard charge. </SIGNS>\n",
|
||||
"43090337ed2409e0da24ee07e2adbe94\n",
|
||||
"<TheExterior> Tenant agrees that all signs, awnings, protective gates, security devices and other installations visible from the exterior of the Premises shall be subject to Landlord's prior written approval, shall be subject to the prior approval of the <Org>Landmarks </Org><Landmarks>Preservation Commission </Landmarks>of the City of <USState>New <Org>York</Org></USState>, if required, and shall not interfere with or block either of the adjacent stores, provided, however, that Landlord shall not unreasonably withhold consent for signs that Tenant desires to install. Tenant agrees that any permitted signs, awnings, protective gates, security devices, and other installations shall be installed at Tenant’s sole cost and expense professionally prepared and dignified and subject to Landlord's prior written approval, which shall not be unreasonably withheld, delayed or conditioned, and subject to such reasonable rules and restrictions as Landlord <Frequency>from time to time </Frequency>may impose. Tenant shall submit to Landlord drawings of the proposed signs and other installations, showing the size, color, illumination and general appearance thereof, together with a statement of the manner in which the same are to be affixed to the Premises. Tenant shall not commence the installation of the proposed signs and other installations unless and until Landlord shall have approved the same in writing. . Tenant shall not install any neon sign. The aforesaid signs shall be used solely for the purpose of identifying Tenant's business. No changes shall be made in the signs and other installations without first obtaining Landlord's prior written consent thereto, which consent shall not be unreasonably withheld, delayed or conditioned. Tenant shall, at its own cost and expense, obtain and exhibit to Landlord such permits or certificates of approval as Tenant may be required to obtain from any and all City, State and other authorities having jurisdiction covering the erection, installation, maintenance or use of said signs or other installations, and Tenant shall maintain the said signs and other installations together with any appurtenances thereto in good order and condition and to the satisfaction of the Landlord and in accordance with any and all orders, regulations, requirements and rules of any public authorities having jurisdiction thereover. Landlord consents to Tenant’s Initial Signage described in annexed Exhibit D. </TheExterior>\n",
|
||||
"54ddfc3e47f41af7e747b2bc439ea96b\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"# Query vector store directly, should return chunks\n",
|
||||
"found_chunks = vectorstore.similarity_search(\n",
|
||||
" \"what signs does Birch Street allow on their property?\", k=2\n",
|
||||
")\n",
|
||||
"\n",
|
||||
"for chunk in found_chunks:\n",
|
||||
" print(chunk.page_content)\n",
|
||||
" print(chunk.metadata[loader.parent_id_key])"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 20,
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"21. SERVICES AND UTILITIES.\n",
|
||||
" <SERVICESANDUTILITIES>Landlord shall have no obligation to provide any utilities or services to the Premises other than passenger elevator service to the Premises. Tenant shall be solely responsible for and shall promptly pay all charges for water, electricity, or any other utility used or consumed in the Premises, including all costs associated with separately metering for the Premises. Tenant shall be responsible for repairs and maintenance to exit lighting, emergency lighting, and fire extinguishers for the Premises. Tenant is responsible for interior janitorial, pest control, and waste removal services. Landlord may at any time change the electrical utility provider for the Building. Tenant’s use of electrical, HVAC, or other services furnished by Landlord shall not exceed, either in voltage, rated capacity, use, or overall load, that which Landlord deems to be standard for the Building. In no event shall Landlord be liable for damages resulting from the failure to furnish any service, and any interruption or failure shall in no manner entitle Tenant to any remedies including abatement of Rent. If at any time during the Lease Term the Project has any type of card access system for the Parking Areas or the Building, Tenant shall purchase access cards for all occupants of the Premises from Landlord at a Building Standard charge and shall comply with Building Standard terms relating to access to the Parking Areas and the Building. </SERVICESANDUTILITIES>\n",
|
||||
"22. SECURITY DEPOSIT.\n",
|
||||
" <SECURITYDEPOSIT>The Security Deposit shall be held by Landlord as security for Tenant's full and faithful performance of this Lease including the payment of Rent. Tenant grants Landlord a security interest in the Security Deposit. The Security Deposit may be commingled with other funds of Landlord and Landlord shall have no liability for payment of any interest on the Security Deposit. Landlord may apply the Security Deposit to the extent required to cure any default by Tenant. If Landlord so applies the Security Deposit, Tenant shall deliver to Landlord the amount necessary to replenish the Security Deposit to its original sum within <Deliver>five days </Deliver>after notice from Landlord. The Security Deposit shall not be deemed an advance payment of Rent or a measure of damages for any default by Tenant, nor shall it be a defense to any action that Landlord may bring against Tenant. </SECURITYDEPOSIT>\n",
|
||||
"23. GOVERNMENTAL REGULATIONS.\n",
|
||||
" <GOVERNMENTALREGULATIONS>Tenant, at Tenant's sole cost and expense, shall promptly comply (and shall cause all subtenants and licensees to comply) with all laws, codes, and ordinances of governmental authorities, including the Americans with Disabilities Act of <AmericanswithDisabilitiesActDate>1990 </AmericanswithDisabilitiesActDate>as amended (the \"ADA\"), and all recorded covenants and restrictions affecting the Project, pertaining to Tenant, its conduct of business, and its use and occupancy of the Premises, including the performance of any work to the Common Areas required because of Tenant's specific use (as opposed to general office use) of the Premises or Alterations to the Premises made by Tenant. </GOVERNMENTALREGULATIONS>\n",
|
||||
"24. SIGNS.\n",
|
||||
" <SIGNS>No signage shall be placed by Tenant on any portion of the Project. However, Tenant shall be permitted to place a sign bearing its name in a location approved by Landlord near the entrance to the Premises (at Tenant's cost) and will be furnished a single listing of its name in the Building's directory (at Landlord's cost), all in accordance with the criteria adopted <Frequency>from time to time </Frequency>by Landlord for the Project. Any changes or additional listings in the directory shall be furnished (subject to availability of space) for the then Building Standard charge. </SIGNS>\n",
|
||||
"25. BROKER.\n",
|
||||
" <BROKER>Landlord and Tenant each represent and warrant that they have neither consulted nor negotiated with any broker or finder regarding the Premises, except the Landlord's Broker and Tenant's Broker. Tenant shall indemnify, defend, and hold Landlord harmless from and against any claims for commissions from any real estate broker other than Landlord's Broker and Tenant's Broker with whom Tenant has dealt in connection with this Lease. Landlord shall indemnify, defend, and hold Tenant harmless from and against payment of any leasing commission due Landlord's Broker and Tenant's Broker in connection with this Lease and any claims for commissions from any real estate broker other than Landlord's Broker and Tenant's Broker with whom Landlord has dealt in connection with this Lease. The terms of this article shall survive the expiration or earlier termination of this Lease. </BROKER>\n",
|
||||
"26. END OF TERM.\n",
|
||||
" <ENDOFTERM>Tenant shall surrender the Premises to Landlord at the expiration or sooner termination of this Lease or Tenant's right of possession in good order and condition, broom-clean, except for reasonable wear and tear. All Alterations made by Landlord or Tenant to the Premises shall become Landlord's property on the expiration or sooner termination of the Lease Term. On the expiration or sooner termination of the Lease Term, Tenant, at its expense, shall remove from the Premises all of Tenant's personal property, all computer and telecommunications wiring, and all Alterations that Landlord designates by notice to Tenant. Tenant shall also repair any damage to the Premises caused by the removal. Any items of Tenant's property that shall remain in the Premises after the expiration or sooner termination of the Lease Term, may, at the option of Landlord and without notice, be deemed to have been abandoned, and in that case, those items may be retained by Landlord as its property to be disposed of by Landlord, without accountability or notice to Tenant or any other party, in the manner Landlord shall determine, at Tenant's expense. </ENDOFTERM>\n",
|
||||
"27. ATTORNEYS' FEES.\n",
|
||||
" <ATTORNEYSFEES>Except as otherwise provided in this Lease, the prevailing party in any litigation or other dispute resolution proceeding, including arbitration, arising out of or in any manner based on or relating to this Lease, including tort actions and actions for injunctive, declaratory, and provisional relief, shall be entitled to recover from the losing party actual attorneys' fees and costs, including fees for litigating the entitlement to or amount of fees or costs owed under this provision, and fees in connection with bankruptcy, appellate, or collection proceedings. No person or entity other than Landlord or Tenant has any right to recover fees under this paragraph. In addition, if Landlord becomes a party to any suit or proceeding affecting the Premises or involving this Lease or Tenant's interest under this Lease, other than a suit between Landlord and Tenant, or if Landlord engages counsel to collect any of the amounts owed under this Lease, or to enforce performance of any of the agreements, conditions, covenants, provisions, or stipulations of this Lease, without commencing litigation, then the costs, expenses, and reasonable attorneys' fees and disbursements incurred by Landlord shall be paid to Landlord by Tenant. </ATTORNEYSFEES>\n",
|
||||
"43090337ed2409e0da24ee07e2adbe94\n",
|
||||
"<TenantsSoleCost> Tenant, at Tenant's sole cost and expense, shall be responsible for the removal and disposal of all of garbage, waste, and refuse from the Premises on a <Frequency>daily </Frequency>basis. Tenant shall cause all garbage, waste and refuse to be stored within the Premises until <Stored>thirty (30) minutes </Stored>before closing, except that Tenant shall be permitted, to the extent permitted by law, to place garbage outside the Premises after the time specified in the immediately preceding sentence for pick up prior to <PickUp>6:00 A.M. </PickUp>next following. Garbage shall be placed at the edge of the sidewalk in front of the Premises at the location furthest from he main entrance to the Building or such other location in front of the Building as may be specified by Landlord. </TenantsSoleCost>\n",
|
||||
"<ItsSoleCost> Tenant, at its sole cost and expense, agrees to use all reasonable diligence in accordance with the best prevailing methods for the prevention and extermination of vermin, rats, and mice, mold, fungus, allergens, <Bacterium>bacteria </Bacterium>and all other similar conditions in the Premises. Tenant, at Tenant's expense, shall cause the Premises to be exterminated <Exterminated>from time to time </Exterminated>to the reasonable satisfaction of Landlord and shall employ licensed exterminating companies. Landlord shall not be responsible for any cleaning, waste removal, janitorial, or similar services for the Premises, and Tenant sha ll not be entitled to seek any abatement, setoff or credit from the Landlord in the event any conditions described in this Article are found to exist in the Premises. </ItsSoleCost>\n",
|
||||
"42B. Sidewalk Use and Maintenance\n",
|
||||
"<TheSidewalk> Tenant shall, at its sole cost and expense, keep the sidewalk in front of the Premises 18 inches into the street from the curb clean free of garbage, waste, refuse, excess water, snow, and ice and Tenant shall pay, as additional rent, any fine, cost, or expense caused by Tenant's failure to do so. In the event Tenant operates a sidewalk café, Tenant shall, at its sole cost and expense, maintain, repair, and replace as necessary, the sidewalk in front of the Premises and the metal trapdoor leading to the basement of the Premises, if any. Tenant shall post warning signs and cones on all sides of any side door when in use and attach a safety bar across any such door at all times when open. </TheSidewalk>\n",
|
||||
"<Display> In no event shall Tenant use, or permit to be used, the space adjacent to or any other space outside of the Premises, for display, sale or any other similar undertaking; except [1] in the event of a legal and licensed “street fair” type program or [<Number>2</Number>] if the local zoning, Community Board [if applicable] and other municipal laws, rules and regulations, allow for sidewalk café use and, if such I s the case, said operation shall be in strict accordance with all of the aforesaid requirements and conditions. . In no event shall Tenant use, or permit to be used, any advertising medium and/or loud speaker and/or sound amplifier and/or radio or television broadcast which may be heard outside of the Premises or which does not comply with the reasonable rules and regulations of Landlord which then will be in effect. </Display>\n",
|
||||
"42C. Store Front Maintenance\n",
|
||||
" <TheBulkheadAndSecurityGate> Tenant agrees to wash the storefront, including the bulkhead and security gate, from the top to the ground, monthly or more often as Landlord reasonably requests and make all repairs and replacements as and when deemed necessary by Landlord, to all windows and plate and ot her glass in or about the Premises and the security gate, if any. In case of any default by Tenant in maintaining the storefront as herein provided, Landlord may do so at its own expense and bill the cost thereof to Tenant as additional rent. </TheBulkheadAndSecurityGate>\n",
|
||||
"42D. Music, Noise, and Vibration\n",
|
||||
"4474c92ae7ccec9184ed2fef9f072734\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"# Query retriever, should return parents (using MMR since that was set as search_type above)\n",
|
||||
"retrieved_parent_docs = retriever.get_relevant_documents(\n",
|
||||
" \"what signs does Birch Street allow on their property?\"\n",
|
||||
")\n",
|
||||
"for chunk in retrieved_parent_docs:\n",
|
||||
" print(chunk.page_content)\n",
|
||||
" print(chunk.metadata[\"id\"])"
|
||||
]
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
|
||||
@@ -15,7 +15,7 @@
|
||||
"1. Create a Dropbox app.\n",
|
||||
"2. Give the app these scope permissions: `files.metadata.read` and `files.content.read`.\n",
|
||||
"3. Generate access token: https://www.dropbox.com/developers/apps/create.\n",
|
||||
"4. `pip install dropbox` (requires `pip install \"unstructured[pdf]\"` for PDF filetype).\n",
|
||||
"4. `pip install dropbox` (requires `pip install unstructured` for PDF filetype).\n",
|
||||
"\n",
|
||||
"## Instructions\n",
|
||||
"\n",
|
||||
|
||||
167
docs/docs/integrations/document_loaders/embaas.ipynb
Normal file
167
docs/docs/integrations/document_loaders/embaas.ipynb
Normal file
@@ -0,0 +1,167 @@
|
||||
{
|
||||
"cells": [
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"collapsed": false
|
||||
},
|
||||
"source": [
|
||||
"# Embaas\n",
|
||||
"[embaas](https://embaas.io) is a fully managed NLP API service that offers features like embedding generation, document text extraction, document to embeddings and more. You can choose a [variety of pre-trained models](https://embaas.io/docs/models/embeddings).\n",
|
||||
"\n",
|
||||
"### Prerequisites\n",
|
||||
"Create a free embaas account at [https://embaas.io/register](https://embaas.io/register) and generate an [API key](https://embaas.io/dashboard/api-keys)\n",
|
||||
"\n",
|
||||
"### Document Text Extraction API\n",
|
||||
"The document text extraction API allows you to extract the text from a given document. The API supports a variety of document formats, including PDF, mp3, mp4 and more. For a full list of supported formats, check out the API docs (link below)."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {
|
||||
"collapsed": false
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# Set API key\n",
|
||||
"embaas_api_key = \"YOUR_API_KEY\"\n",
|
||||
"# or set environment variable\n",
|
||||
"os.environ[\"EMBAAS_API_KEY\"] = \"YOUR_API_KEY\""
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"collapsed": false
|
||||
},
|
||||
"source": [
|
||||
"#### Using a blob (bytes)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {
|
||||
"collapsed": false
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langchain.document_loaders.blob_loaders import Blob\n",
|
||||
"from langchain.document_loaders.embaas import EmbaasBlobLoader"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {
|
||||
"collapsed": false
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"blob_loader = EmbaasBlobLoader()\n",
|
||||
"blob = Blob.from_path(\"example.pdf\")\n",
|
||||
"documents = blob_loader.load(blob)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {
|
||||
"ExecuteTime": {
|
||||
"end_time": "2023-06-12T22:19:48.380467Z",
|
||||
"start_time": "2023-06-12T22:19:48.366886Z"
|
||||
},
|
||||
"collapsed": false
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# You can also directly create embeddings with your preferred embeddings model\n",
|
||||
"blob_loader = EmbaasBlobLoader(params={\"model\": \"e5-large-v2\", \"should_embed\": True})\n",
|
||||
"blob = Blob.from_path(\"example.pdf\")\n",
|
||||
"documents = blob_loader.load(blob)\n",
|
||||
"\n",
|
||||
"print(documents[0][\"metadata\"][\"embedding\"])"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"collapsed": false
|
||||
},
|
||||
"source": [
|
||||
"#### Using a file"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {
|
||||
"collapsed": false
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langchain.document_loaders.embaas import EmbaasLoader"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {
|
||||
"collapsed": false
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"file_loader = EmbaasLoader(file_path=\"example.pdf\")\n",
|
||||
"documents = file_loader.load()"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 15,
|
||||
"metadata": {
|
||||
"ExecuteTime": {
|
||||
"end_time": "2023-06-12T22:24:31.894665Z",
|
||||
"start_time": "2023-06-12T22:24:31.880857Z"
|
||||
},
|
||||
"collapsed": false
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# Disable automatic text splitting\n",
|
||||
"file_loader = EmbaasLoader(file_path=\"example.mp3\", params={\"should_chunk\": False})\n",
|
||||
"documents = file_loader.load()"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"collapsed": false
|
||||
},
|
||||
"source": [
|
||||
"For more detailed information about the embaas document text extraction API, please refer to [the official embaas API documentation](https://embaas.io/api-reference)."
|
||||
]
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
"kernelspec": {
|
||||
"display_name": "Python 3",
|
||||
"language": "python",
|
||||
"name": "python3"
|
||||
},
|
||||
"language_info": {
|
||||
"codemirror_mode": {
|
||||
"name": "ipython",
|
||||
"version": 2
|
||||
},
|
||||
"file_extension": ".py",
|
||||
"mimetype": "text/x-python",
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython2",
|
||||
"version": "2.7.6"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
"nbformat_minor": 0
|
||||
}
|
||||
@@ -1,118 +0,0 @@
|
||||
{
|
||||
"cells": [
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "6125a85e",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# Microsoft OneNote\n",
|
||||
"\n",
|
||||
"This notebook covers how to load documents from `OneNote`.\n",
|
||||
"\n",
|
||||
"## Prerequisites\n",
|
||||
"1. Register an application with the [Microsoft identity platform](https://learn.microsoft.com/en-us/azure/active-directory/develop/quickstart-register-app) instructions.\n",
|
||||
"2. When registration finishes, the Azure portal displays the app registration's Overview pane. You see the Application (client) ID. Also called the `client ID`, this value uniquely identifies your application in the Microsoft identity platform.\n",
|
||||
"3. During the steps you will be following at **item 1**, you can set the redirect URI as `http://localhost:8000/callback`\n",
|
||||
"4. During the steps you will be following at **item 1**, generate a new password (`client_secret`) under Application Secrets section.\n",
|
||||
"5. Follow the instructions at this [document](https://learn.microsoft.com/en-us/azure/active-directory/develop/quickstart-configure-app-expose-web-apis#add-a-scope) to add the following `SCOPES` (`Notes.Read`) to your application.\n",
|
||||
"6. You need to install the msal and bs4 packages using the commands `pip install msal` and `pip install beautifulsoup4`.\n",
|
||||
"7. At the end of the steps you must have the following values: \n",
|
||||
"- `CLIENT_ID`\n",
|
||||
"- `CLIENT_SECRET`\n",
|
||||
"\n",
|
||||
"## 🧑 Instructions for ingesting your documents from OneNote\n",
|
||||
"\n",
|
||||
"### 🔑 Authentication\n",
|
||||
"\n",
|
||||
"By default, the `OneNoteLoader` expects that the values of `CLIENT_ID` and `CLIENT_SECRET` must be stored as environment variables named `MS_GRAPH_CLIENT_ID` and `MS_GRAPH_CLIENT_SECRET` respectively. You could pass those environment variables through a `.env` file at the root of your application or using the following command in your script.\n",
|
||||
"\n",
|
||||
"```python\n",
|
||||
"os.environ['MS_GRAPH_CLIENT_ID'] = \"YOUR CLIENT ID\"\n",
|
||||
"os.environ['MS_GRAPH_CLIENT_SECRET'] = \"YOUR CLIENT SECRET\"\n",
|
||||
"```\n",
|
||||
"\n",
|
||||
"This loader uses an authentication called [*on behalf of a user*](https://learn.microsoft.com/en-us/graph/auth-v2-user?context=graph%2Fapi%2F1.0&view=graph-rest-1.0). It is a 2 step authentication with user consent. When you instantiate the loader, it will call will print a url that the user must visit to give consent to the app on the required permissions. The user must then visit this url and give consent to the application. Then the user must copy the resulting page url and paste it back on the console. The method will then return True if the login attempt was successful.\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"```python\n",
|
||||
"from langchain.document_loaders.onenote import OneNoteLoader\n",
|
||||
"\n",
|
||||
"loader = OneNoteLoader(notebook_name=\"NOTEBOOK NAME\", section_name=\"SECTION NAME\", page_title=\"PAGE TITLE\")\n",
|
||||
"```\n",
|
||||
"\n",
|
||||
"Once the authentication has been done, the loader will store a token (`onenote_graph_token.txt`) at `~/.credentials/` folder. This token could be used later to authenticate without the copy/paste steps explained earlier. To use this token for authentication, you need to change the `auth_with_token` parameter to True in the instantiation of the loader.\n",
|
||||
"\n",
|
||||
"```python\n",
|
||||
"from langchain.document_loaders.onenote import OneNoteLoader\n",
|
||||
"\n",
|
||||
"loader = OneNoteLoader(notebook_name=\"NOTEBOOK NAME\", section_name=\"SECTION NAME\", page_title=\"PAGE TITLE\", auth_with_token=True)\n",
|
||||
"```\n",
|
||||
"\n",
|
||||
"Alternatively, you can also pass the token directly to the loader. This is useful when you want to authenticate with a token that was generated by another application. For instance, you can use the [Microsoft Graph Explorer](https://developer.microsoft.com/en-us/graph/graph-explorer) to generate a token and then pass it to the loader.\n",
|
||||
"\n",
|
||||
"```python\n",
|
||||
"from langchain.document_loaders.onenote import OneNoteLoader\n",
|
||||
"\n",
|
||||
"loader = OneNoteLoader(notebook_name=\"NOTEBOOK NAME\", section_name=\"SECTION NAME\", page_title=\"PAGE TITLE\", access_token=\"TOKEN\")\n",
|
||||
"```\n",
|
||||
"\n",
|
||||
"### 🗂️ Documents loader\n",
|
||||
"\n",
|
||||
"#### 📑 Loading pages from a OneNote Notebook\n",
|
||||
"\n",
|
||||
"`OneNoteLoader` can load pages from OneNote notebooks stored in OneDrive. You can specify any combination of `notebook_name`, `section_name`, `page_title` to filter for pages under a specific notebook, under a specific section, or with a specific title respectively. For instance, you want to load all pages that are stored under a section called `Recipes` within any of your notebooks OneDrive.\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"```python\n",
|
||||
"from langchain.document_loaders.onenote import OneNoteLoader\n",
|
||||
"\n",
|
||||
"loader = OneNoteLoader(section_name=\"Recipes\", auth_with_token=True)\n",
|
||||
"documents = loader.load()\n",
|
||||
"```\n",
|
||||
"\n",
|
||||
"#### 📑 Loading pages from a list of Page IDs\n",
|
||||
"\n",
|
||||
"Another possibility is to provide a list of `object_ids` for each page you want to load. For that, you will need to query the [Microsoft Graph API](https://developer.microsoft.com/en-us/graph/graph-explorer) to find all the documents ID that you are interested in. This [link](https://learn.microsoft.com/en-us/graph/onenote-get-content#page-collection) provides a list of endpoints that will be helpful to retrieve the documents ID.\n",
|
||||
"\n",
|
||||
"For instance, to retrieve information about all pages that are stored in your notebooks, you need make a request to: `https://graph.microsoft.com/v1.0/me/onenote/pages`. Once you have the list of IDs that you are interested in, then you can instantiate the loader with the following parameters.\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"```python\n",
|
||||
"from langchain.document_loaders.onenote import OneNoteLoader\n",
|
||||
"\n",
|
||||
"loader = OneNoteLoader(object_ids=[\"ID_1\", \"ID_2\"], auth_with_token=True)\n",
|
||||
"documents = loader.load()\n",
|
||||
"```\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "bb36fe41",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": []
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
"kernelspec": {
|
||||
"display_name": "Python 3 (ipykernel)",
|
||||
"language": "python",
|
||||
"name": "python3"
|
||||
},
|
||||
"language_info": {
|
||||
"codemirror_mode": {
|
||||
"name": "ipython",
|
||||
"version": 3
|
||||
},
|
||||
"file_extension": ".py",
|
||||
"mimetype": "text/x-python",
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.11.5"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
"nbformat_minor": 5
|
||||
}
|
||||
File diff suppressed because it is too large
Load Diff
@@ -11,9 +11,9 @@
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
">[Amazon API Gateway](https://aws.amazon.com/api-gateway/) is a fully managed service that makes it easy for developers to create, publish, maintain, monitor, and secure APIs at any >scale. APIs act as the \"front door\" for applications to access data, business logic, or functionality from your backend services. Using `API Gateway`, you can create RESTful APIs and >WebSocket APIs that enable real-time two-way communication applications. API Gateway supports containerized and serverless workloads, as well as web applications.\n",
|
||||
"[Amazon API Gateway](https://aws.amazon.com/api-gateway/) is a fully managed service that makes it easy for developers to create, publish, maintain, monitor, and secure APIs at any scale. APIs act as the \"front door\" for applications to access data, business logic, or functionality from your backend services. Using API Gateway, you can create RESTful APIs and WebSocket APIs that enable real-time two-way communication applications. API Gateway supports containerized and serverless workloads, as well as web applications.\n",
|
||||
"\n",
|
||||
">`API Gateway` handles all the tasks involved in accepting and processing up to hundreds of thousands of concurrent API calls, including traffic management, CORS support, authorization >and access control, throttling, monitoring, and API version management. `API Gateway` has no minimum fees or startup costs. You pay for the API calls you receive and the amount of data >transferred out and, with the `API Gateway` tiered pricing model, you can reduce your cost as your API usage scales."
|
||||
"API Gateway handles all the tasks involved in accepting and processing up to hundreds of thousands of concurrent API calls, including traffic management, CORS support, authorization and access control, throttling, monitoring, and API version management. API Gateway has no minimum fees or startup costs. You pay for the API calls you receive and the amount of data transferred out and, with the API Gateway tiered pricing model, you can reduce your cost as your API usage scales."
|
||||
]
|
||||
},
|
||||
{
|
||||
|
||||
@@ -11,15 +11,7 @@
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
">[Amazon Bedrock](https://aws.amazon.com/bedrock/) is a fully managed service that offers a choice of \n",
|
||||
"> high-performing foundation models (FMs) from leading AI companies like `AI21 Labs`, `Anthropic`, `Cohere`, \n",
|
||||
"> `Meta`, `Stability AI`, and `Amazon` via a single API, along with a broad set of capabilities you need to \n",
|
||||
"> build generative AI applications with security, privacy, and responsible AI. Using `Amazon Bedrock`, \n",
|
||||
"> you can easily experiment with and evaluate top FMs for your use case, privately customize them with \n",
|
||||
"> your data using techniques such as fine-tuning and `Retrieval Augmented Generation` (`RAG`), and build \n",
|
||||
"> agents that execute tasks using your enterprise systems and data sources. Since `Amazon Bedrock` is \n",
|
||||
"> serverless, you don't have to manage any infrastructure, and you can securely integrate and deploy \n",
|
||||
"> generative AI capabilities into your applications using the AWS services you are already familiar with.\n"
|
||||
"[Amazon Bedrock](https://aws.amazon.com/bedrock/) is a fully managed service that makes FMs from leading AI startups and Amazon available via an API, so you can choose from a wide range of FMs to find the model that is best suited for your use case"
|
||||
]
|
||||
},
|
||||
{
|
||||
@@ -124,7 +116,7 @@
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.10.12"
|
||||
"version": "3.10.11"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
|
||||
@@ -20,121 +20,27 @@
|
||||
"This example notebook shows how to wrap Databricks endpoints as LLMs in LangChain.\n",
|
||||
"It supports two endpoint types:\n",
|
||||
"* Serving endpoint, recommended for production and development,\n",
|
||||
"* Cluster driver proxy app, recommended for interactive development."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Installation\n",
|
||||
"\n",
|
||||
"`mlflow >= 2.9 ` is required to run the code in this notebook. If it's not installed, please install it using this command:\n",
|
||||
"\n",
|
||||
"```\n",
|
||||
"pip install mlflow>=2.9\n",
|
||||
"```"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Wrapping a serving endpoint: External model\n",
|
||||
"\n",
|
||||
"Prerequisite:\n",
|
||||
"\n",
|
||||
"- Register an OpenAI API key as a secret:\n",
|
||||
"\n",
|
||||
" ```bash\n",
|
||||
" databricks secrets create-scope <scope>\n",
|
||||
" databricks secrets put-secret <scope> openai-api-key --string-value $OPENAI_API_KEY\n",
|
||||
" ```"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"The following code creates a new serving endpoint with OpenAI's GPT-4 model for chat and generates a response using the endpoint."
|
||||
"* Cluster driver proxy app, recommended for iteractive development."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 1,
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"content='Hello! How can I assist you today?'\n"
|
||||
]
|
||||
"execution_count": null,
|
||||
"metadata": {
|
||||
"application/vnd.databricks.v1+cell": {
|
||||
"cellMetadata": {
|
||||
"byteLimit": 2048000,
|
||||
"rowLimit": 10000
|
||||
},
|
||||
"inputWidgets": {},
|
||||
"nuid": "bf07455f-aac9-4873-a8e7-7952af0f8c82",
|
||||
"showTitle": false,
|
||||
"title": ""
|
||||
}
|
||||
],
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langchain.chat_models import ChatDatabricks\n",
|
||||
"from langchain.schema.messages import HumanMessage\n",
|
||||
"from mlflow.deployments import get_deploy_client\n",
|
||||
"\n",
|
||||
"client = get_deploy_client(\"databricks\")\n",
|
||||
"\n",
|
||||
"secret = \"secrets/<scope>/openai-api-key\" # replace `<scope>` with your scope\n",
|
||||
"name = \"my-chat\" # rename this if my-chat already exists\n",
|
||||
"client.create_endpoint(\n",
|
||||
" name=name,\n",
|
||||
" config={\n",
|
||||
" \"served_entities\": [\n",
|
||||
" {\n",
|
||||
" \"name\": \"my-chat\",\n",
|
||||
" \"external_model\": {\n",
|
||||
" \"name\": \"gpt-4\",\n",
|
||||
" \"provider\": \"openai\",\n",
|
||||
" \"task\": \"llm/v1/chat\",\n",
|
||||
" \"openai_config\": {\n",
|
||||
" \"openai_api_key\": \"{{\" + secret + \"}}\",\n",
|
||||
" },\n",
|
||||
" },\n",
|
||||
" }\n",
|
||||
" ],\n",
|
||||
" },\n",
|
||||
")\n",
|
||||
"\n",
|
||||
"chat = ChatDatabricks(\n",
|
||||
" target_uri=\"databricks\",\n",
|
||||
" endpoint=name,\n",
|
||||
" temperature=0.1,\n",
|
||||
")\n",
|
||||
"chat([HumanMessage(content=\"hello\")])"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Wrapping a serving endpoint: Foundation model\n",
|
||||
"\n",
|
||||
"The following code uses the `databricks-bge-large-en` serving endpoint (no endpoint creation is required) to generate embeddings from input text."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 2,
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"[0.051055908203125, 0.007221221923828125, 0.003879547119140625]\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"from langchain.embeddings import DatabricksEmbeddings\n",
|
||||
"\n",
|
||||
"embeddings = DatabricksEmbeddings(endpoint=\"databricks-bge-large-en\")\n",
|
||||
"embeddings.embed_query(\"hello\")[:3]"
|
||||
"from langchain.llms import Databricks"
|
||||
]
|
||||
},
|
||||
{
|
||||
@@ -150,7 +56,7 @@
|
||||
}
|
||||
},
|
||||
"source": [
|
||||
"## Wrapping a serving endpoint: Custom model\n",
|
||||
"## Wrapping a serving endpoint\n",
|
||||
"\n",
|
||||
"Prerequisites:\n",
|
||||
"* An LLM was registered and deployed to [a Databricks serving endpoint](https://docs.databricks.com/machine-learning/model-serving/index.html).\n",
|
||||
@@ -191,8 +97,6 @@
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"from langchain.llms import Databricks\n",
|
||||
"\n",
|
||||
"# If running a Databricks notebook attached to an interactive cluster in \"single user\"\n",
|
||||
"# or \"no isolation shared\" mode, you only need to specify the endpoint name to create\n",
|
||||
"# a `Databricks` instance to query a serving endpoint in the same workspace.\n",
|
||||
@@ -620,7 +524,7 @@
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.9.18"
|
||||
"version": "3.10.10"
|
||||
},
|
||||
"orig_nbformat": 4
|
||||
},
|
||||
|
||||
@@ -550,7 +550,7 @@
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"In the first example, supply the path to the specified `json.gbnf` file in order to produce JSON:"
|
||||
"In the first example, supply the path to the specifed `json.gbnf` file in order to produce JSON:"
|
||||
]
|
||||
},
|
||||
{
|
||||
|
||||
@@ -912,7 +912,7 @@
|
||||
"source": [
|
||||
"## `Cassandra` caches\n",
|
||||
"\n",
|
||||
"You can use Cassandra / Astra DB through CQL for caching LLM responses, choosing from the exact-match `CassandraCache` or the (vector-similarity-based) `CassandraSemanticCache`.\n",
|
||||
"You can use Cassandra / Astra DB for caching LLM responses, choosing from the exact-match `CassandraCache` or the (vector-similarity-based) `CassandraSemanticCache`.\n",
|
||||
"\n",
|
||||
"Let's see both in action in the following cells."
|
||||
]
|
||||
@@ -924,7 +924,7 @@
|
||||
"source": [
|
||||
"#### Connect to the DB\n",
|
||||
"\n",
|
||||
"First you need to establish a `Session` to the DB and to specify a _keyspace_ for the cache table(s). The following gets you connected to Astra DB through CQL (see e.g. [here](https://cassio.org/start_here/#vector-database) for more backends and connection options)."
|
||||
"First you need to establish a `Session` to the DB and to specify a _keyspace_ for the cache table(s). The following gets you started with an Astra DB instance (see e.g. [here](https://cassio.org/start_here/#vector-database) for more backends and connection options)."
|
||||
]
|
||||
},
|
||||
{
|
||||
@@ -1132,214 +1132,6 @@
|
||||
"print(llm(\"How come we always see one face of the moon?\"))"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "8712f8fc-bb89-4164-beb9-c672778bbd91",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## `Astra DB` Caches"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "173041d9-e4af-4f68-8461-d302bfc7e1bd",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"You can easily use [Astra DB](https://docs.datastax.com/en/astra/home/astra.html) as an LLM cache, with either the \"exact\" or the \"semantic-based\" cache.\n",
|
||||
"\n",
|
||||
"Make sure you have a running database (it must be a Vector-enabled database to use the Semantic cache) and get the required credentials on your Astra dashboard:\n",
|
||||
"\n",
|
||||
"- the API Endpoint looks like `https://01234567-89ab-cdef-0123-456789abcdef-us-east1.apps.astra.datastax.com`\n",
|
||||
"- the Token looks like `AstraCS:6gBhNmsk135....`"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 3,
|
||||
"id": "feb510b6-99a3-4228-8e11-563051f8178e",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdin",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"ASTRA_DB_API_ENDPOINT = https://01234567-89ab-cdef-0123-456789abcdef-us-east1.apps.astra.datastax.com\n",
|
||||
"ASTRA_DB_APPLICATION_TOKEN = ········\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"import getpass\n",
|
||||
"\n",
|
||||
"ASTRA_DB_API_ENDPOINT = input(\"ASTRA_DB_API_ENDPOINT = \")\n",
|
||||
"ASTRA_DB_APPLICATION_TOKEN = getpass.getpass(\"ASTRA_DB_APPLICATION_TOKEN = \")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "ee6d587f-4b7c-43f4-9e90-5129c842a143",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Astra DB exact LLM cache\n",
|
||||
"\n",
|
||||
"This will avoid invoking the LLM when the supplied prompt is _exactly_ the same as one encountered already:"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 7,
|
||||
"id": "ad63c146-ee41-4896-90ee-29fcc39f0ed5",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langchain.cache import AstraDBCache\n",
|
||||
"from langchain.globals import set_llm_cache\n",
|
||||
"\n",
|
||||
"set_llm_cache(\n",
|
||||
" AstraDBCache(\n",
|
||||
" api_endpoint=ASTRA_DB_API_ENDPOINT,\n",
|
||||
" token=ASTRA_DB_APPLICATION_TOKEN,\n",
|
||||
" )\n",
|
||||
")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 8,
|
||||
"id": "83e0fb02-e8eb-4483-9eb1-55b5e14c4487",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"\n",
|
||||
"\n",
|
||||
"There is no definitive answer to this question as it depends on the interpretation of the terms \"true fakery\" and \"fake truth\". However, one possible interpretation is that a true fakery is a counterfeit or imitation that is intended to deceive, whereas a fake truth is a false statement that is presented as if it were true.\n",
|
||||
"CPU times: user 70.8 ms, sys: 4.13 ms, total: 74.9 ms\n",
|
||||
"Wall time: 2.06 s\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"%%time\n",
|
||||
"\n",
|
||||
"print(llm(\"Is a true fakery the same as a fake truth?\"))"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 9,
|
||||
"id": "4d20d498-fe28-4e26-8531-2b31c52ee687",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"\n",
|
||||
"\n",
|
||||
"There is no definitive answer to this question as it depends on the interpretation of the terms \"true fakery\" and \"fake truth\". However, one possible interpretation is that a true fakery is a counterfeit or imitation that is intended to deceive, whereas a fake truth is a false statement that is presented as if it were true.\n",
|
||||
"CPU times: user 15.1 ms, sys: 3.7 ms, total: 18.8 ms\n",
|
||||
"Wall time: 531 ms\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"%%time\n",
|
||||
"\n",
|
||||
"print(llm(\"Is a true fakery the same as a fake truth?\"))"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "524b94fa-6162-4880-884d-d008749d14e2",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Astra DB Semantic cache\n",
|
||||
"\n",
|
||||
"This cache will do a semantic similarity search and return a hit if it finds a cached entry that is similar enough, For this, you need to provide an `Embeddings` instance of your choice."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 10,
|
||||
"id": "dc329c55-1cc4-4b74-94f9-61f8990fb214",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langchain.embeddings import OpenAIEmbeddings\n",
|
||||
"\n",
|
||||
"embedding = OpenAIEmbeddings()"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 11,
|
||||
"id": "83952a90-ab14-4e59-87c0-d2bdc1d43e43",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langchain.cache import AstraDBSemanticCache\n",
|
||||
"\n",
|
||||
"set_llm_cache(\n",
|
||||
" AstraDBSemanticCache(\n",
|
||||
" api_endpoint=ASTRA_DB_API_ENDPOINT,\n",
|
||||
" token=ASTRA_DB_APPLICATION_TOKEN,\n",
|
||||
" embedding=embedding,\n",
|
||||
" collection_name=\"demo_semantic_cache\",\n",
|
||||
" )\n",
|
||||
")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 12,
|
||||
"id": "d74b249a-94d5-42d0-af74-f7565a994dea",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"\n",
|
||||
"\n",
|
||||
"There is no definitive answer to this question since it presupposes a great deal about the nature of truth itself, which is a matter of considerable philosophical debate. It is possible, however, to construct scenarios in which something could be considered true despite being false, such as if someone sincerely believes something to be true even though it is not.\n",
|
||||
"CPU times: user 65.6 ms, sys: 15.3 ms, total: 80.9 ms\n",
|
||||
"Wall time: 2.72 s\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"%%time\n",
|
||||
"\n",
|
||||
"print(llm(\"Are there truths that are false?\"))"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 13,
|
||||
"id": "11973d73-d2f4-46bd-b229-1c589df9b788",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"\n",
|
||||
"\n",
|
||||
"There is no definitive answer to this question since it presupposes a great deal about the nature of truth itself, which is a matter of considerable philosophical debate. It is possible, however, to construct scenarios in which something could be considered true despite being false, such as if someone sincerely believes something to be true even though it is not.\n",
|
||||
"CPU times: user 29.3 ms, sys: 6.21 ms, total: 35.5 ms\n",
|
||||
"Wall time: 1.03 s\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"%%time\n",
|
||||
"\n",
|
||||
"print(llm(\"Is is possible that something false can be also true?\"))"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "0c69d84d",
|
||||
|
||||
@@ -1,124 +0,0 @@
|
||||
{
|
||||
"cells": [
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "404758628c7b20f6",
|
||||
"metadata": {
|
||||
"collapsed": false
|
||||
},
|
||||
"source": [
|
||||
"# Volc Engine Maas\n",
|
||||
"\n",
|
||||
"This notebook provides you with a guide on how to get started with Volc Engine's MaaS llm models."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "946db204b33c2ef7",
|
||||
"metadata": {
|
||||
"collapsed": false
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# Install the package\n",
|
||||
"!pip install volcengine"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 2,
|
||||
"id": "51e7f967cb78f5b7",
|
||||
"metadata": {
|
||||
"ExecuteTime": {
|
||||
"end_time": "2023-11-27T10:40:26.897649Z",
|
||||
"start_time": "2023-11-27T10:40:26.552589Z"
|
||||
},
|
||||
"collapsed": false
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langchain.llms import VolcEngineMaasLLM\n",
|
||||
"from langchain.prompts import PromptTemplate\n",
|
||||
"from langchain.schema.output_parser import StrOutputParser"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 3,
|
||||
"id": "139667d44689f9e0",
|
||||
"metadata": {
|
||||
"ExecuteTime": {
|
||||
"end_time": "2023-11-27T10:40:27.938517Z",
|
||||
"start_time": "2023-11-27T10:40:27.861324Z"
|
||||
},
|
||||
"collapsed": false
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"llm = VolcEngineMaasLLM(volc_engine_maas_ak=\"your ak\", volc_engine_maas_sk=\"your sk\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "e84ebc4feedcc739",
|
||||
"metadata": {
|
||||
"collapsed": false
|
||||
},
|
||||
"source": [
|
||||
"or you can set access_key and secret_key in your environment variables\n",
|
||||
"```bash\n",
|
||||
"export VOLC_ACCESSKEY=YOUR_AK\n",
|
||||
"export VOLC_SECRETKEY=YOUR_SK\n",
|
||||
"```"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 8,
|
||||
"id": "35da18414ad17aa0",
|
||||
"metadata": {
|
||||
"ExecuteTime": {
|
||||
"end_time": "2023-11-27T10:41:35.528526Z",
|
||||
"start_time": "2023-11-27T10:41:32.562238Z"
|
||||
},
|
||||
"collapsed": false
|
||||
},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"text/plain": "'好的,下面是一个笑话:\\n\\n大学暑假我配了隐形眼镜,回家给爷爷说,我现在配了隐形眼镜。\\n爷爷让我给他看看,于是,我用小镊子夹了一片给爷爷看。\\n爷爷看完便准备出门,边走还边说:“真高级啊,还真是隐形眼镜!”\\n等爷爷出去后我才发现,我刚没夹起来!'"
|
||||
},
|
||||
"execution_count": 8,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"chain = PromptTemplate.from_template(\"给我讲个笑话\") | llm | StrOutputParser()\n",
|
||||
"chain.invoke({})"
|
||||
]
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
"kernelspec": {
|
||||
"display_name": "Python 3",
|
||||
"language": "python",
|
||||
"name": "python3"
|
||||
},
|
||||
"language_info": {
|
||||
"codemirror_mode": {
|
||||
"name": "ipython",
|
||||
"version": 2
|
||||
},
|
||||
"file_extension": ".py",
|
||||
"mimetype": "text/x-python",
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython2",
|
||||
"version": "2.7.6"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
"nbformat_minor": 5
|
||||
}
|
||||
@@ -1,297 +0,0 @@
|
||||
{
|
||||
"cells": [
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "70996d8a",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# WatsonxLLM\n",
|
||||
"\n",
|
||||
"[WatsonxLLM](https://ibm.github.io/watson-machine-learning-sdk/fm_extensions.html) is wrapper for IBM [watsonx.ai](https://www.ibm.com/products/watsonx-ai) foundation models.\n",
|
||||
"This example shows how to communicate with watsonx.ai models using LangChain."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "ea35b2b7",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Install the package [`ibm_watson_machine_learning`](https://ibm.github.io/watson-machine-learning-sdk/install.html)."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "2f1fff4e",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"%pip install ibm_watson_machine_learning"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "f406e092",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"This cell defines the WML credentials required to work with watsonx Foundation Model inferencing.\n",
|
||||
"\n",
|
||||
"**Action:** Provide the IBM Cloud user API key. For details, see\n",
|
||||
"[documentation](https://cloud.ibm.com/docs/account?topic=account-userapikey&interface=ui)."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 1,
|
||||
"id": "11d572a1",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"import os\n",
|
||||
"from getpass import getpass\n",
|
||||
"\n",
|
||||
"watsonx_api_key = getpass()\n",
|
||||
"os.environ[\"WATSONX_APIKEY\"] = watsonx_api_key"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "e36acbef",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Load the model\n",
|
||||
"You might need to adjust model `parameters` for different models or tasks, to do so please refer to [documentation](https://ibm.github.io/watson-machine-learning-sdk/model.html#metanames.GenTextParamsMetaNames)."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 2,
|
||||
"id": "407cd500",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from ibm_watson_machine_learning.metanames import GenTextParamsMetaNames as GenParams\n",
|
||||
"\n",
|
||||
"parameters = {\n",
|
||||
" GenParams.DECODING_METHOD: \"sample\",\n",
|
||||
" GenParams.MAX_NEW_TOKENS: 100,\n",
|
||||
" GenParams.MIN_NEW_TOKENS: 1,\n",
|
||||
" GenParams.TEMPERATURE: 0.5,\n",
|
||||
" GenParams.TOP_K: 50,\n",
|
||||
" GenParams.TOP_P: 1,\n",
|
||||
"}"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "2b586538",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Initialize the `WatsonxLLM` class with previous set params."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 25,
|
||||
"id": "359898de",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langchain.llms import WatsonxLLM\n",
|
||||
"\n",
|
||||
"watsonx_llm = WatsonxLLM(\n",
|
||||
" model_id=\"google/flan-ul2\",\n",
|
||||
" url=\"https://us-south.ml.cloud.ibm.com\",\n",
|
||||
" project_id=\"***\",\n",
|
||||
" params=parameters,\n",
|
||||
")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "2202f4e0",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Alternatively you can use Cloud Pak for Data credentials. For details, see [documentation](https://ibm.github.io/watson-machine-learning-sdk/setup_cpd.html).\n",
|
||||
"```\n",
|
||||
"watsonx_llm = WatsonxLLM(\n",
|
||||
" model_id='google/flan-ul2',\n",
|
||||
" url=\"***\",\n",
|
||||
" username=\"***\",\n",
|
||||
" password=\"***\",\n",
|
||||
" instance_id=\"openshift\",\n",
|
||||
" version=\"4.8\",\n",
|
||||
" project_id='***',\n",
|
||||
" params=parameters\n",
|
||||
")\n",
|
||||
"``` "
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "c25ecbd1",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Create Chain\n",
|
||||
"Create `PromptTemplate` objects which will be responsible for creating a random question."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 4,
|
||||
"id": "c7d80c05",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langchain.prompts import PromptTemplate\n",
|
||||
"\n",
|
||||
"template = \"Generate a random question about {topic}: Question: \"\n",
|
||||
"prompt = PromptTemplate.from_template(template)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "79056d8e",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Provide a topic and run the `LLMChain`."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 5,
|
||||
"id": "dc076c56",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"'How many breeds of dog are there?'"
|
||||
]
|
||||
},
|
||||
"execution_count": 5,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"from langchain.chains import LLMChain\n",
|
||||
"\n",
|
||||
"llm_chain = LLMChain(prompt=prompt, llm=watsonx_llm)\n",
|
||||
"llm_chain.run(\"dog\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "f571001d",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Calling the Model Directly\n",
|
||||
"To obtain completions, you can can the model directly using string prompt."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 6,
|
||||
"id": "beea2b5b",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"'dog'"
|
||||
]
|
||||
},
|
||||
"execution_count": 6,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"# Calling a single prompt\n",
|
||||
"\n",
|
||||
"watsonx_llm(\"Who is man's best friend?\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 9,
|
||||
"id": "8ab1a25a",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"LLMResult(generations=[[Generation(text='greyhounds', generation_info={'generated_token_count': 4, 'input_token_count': 8, 'finish_reason': 'eos_token'})], [Generation(text='The Basenji is a dog breed from South Africa.', generation_info={'generated_token_count': 13, 'input_token_count': 7, 'finish_reason': 'eos_token'})]], llm_output={'model_id': 'google/flan-ul2'}, run=[RunInfo(run_id=UUID('03c73a42-db68-428e-ab8d-8ae10abc84fc')), RunInfo(run_id=UUID('c289f67a-87d6-4c8b-a8b7-0b5012c94ca8'))])"
|
||||
]
|
||||
},
|
||||
"execution_count": 9,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"# Calling multiple prompts\n",
|
||||
"\n",
|
||||
"watsonx_llm.generate(\n",
|
||||
" [\n",
|
||||
" \"The fastest dog in the world?\",\n",
|
||||
" \"Describe your chosen dog breed\",\n",
|
||||
" ]\n",
|
||||
")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "d2c9da33",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Streaming the Model output \n",
|
||||
"\n",
|
||||
"You can stream the model output."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 45,
|
||||
"id": "3f63166a",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"The golden retriever is my favorite dog because it is very friendly and good with children."
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"for chunk in watsonx_llm.stream(\n",
|
||||
" \"Describe your favorite breed of dog and why it is your favorite.\"\n",
|
||||
"):\n",
|
||||
" print(chunk, end=\"\")"
|
||||
]
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
"kernelspec": {
|
||||
"display_name": "Python 3 (ipykernel)",
|
||||
"language": "python",
|
||||
"name": "python3"
|
||||
},
|
||||
"language_info": {
|
||||
"codemirror_mode": {
|
||||
"name": "ipython",
|
||||
"version": 3
|
||||
},
|
||||
"file_extension": ".py",
|
||||
"mimetype": "text/x-python",
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.9.18"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
"nbformat_minor": 5
|
||||
}
|
||||
@@ -1,147 +0,0 @@
|
||||
{
|
||||
"cells": [
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "90cd3ded",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# Astra DB \n",
|
||||
"\n",
|
||||
"> DataStax [Astra DB](https://docs.datastax.com/en/astra/home/astra.html) is a serverless vector-capable database built on Cassandra and made conveniently available through an easy-to-use JSON API.\n",
|
||||
"\n",
|
||||
"This notebook goes over how to use Astra DB to store chat message history."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "f507f58b-bf22-4a48-8daf-68d869bcd1ba",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Setting up\n",
|
||||
"\n",
|
||||
"To run this notebook you need a running Astra DB. Get the connection secrets on your Astra dashboard:\n",
|
||||
"\n",
|
||||
"- the API Endpoint looks like `https://01234567-89ab-cdef-0123-456789abcdef-us-east1.apps.astra.datastax.com`;\n",
|
||||
"- the Token looks like `AstraCS:6gBhNmsk135...`."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "d7092199",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"!pip install --quiet \"astrapy>=0.6.2\""
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "e3d97b65",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Set up the database connection parameters and secrets"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 1,
|
||||
"id": "163d97f0",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdin",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"ASTRA_DB_API_ENDPOINT = https://01234567-89ab-cdef-0123-456789abcdef-us-east1.apps.astra.datastax.com\n",
|
||||
"ASTRA_DB_APPLICATION_TOKEN = ········\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"import getpass\n",
|
||||
"\n",
|
||||
"ASTRA_DB_API_ENDPOINT = input(\"ASTRA_DB_API_ENDPOINT = \")\n",
|
||||
"ASTRA_DB_APPLICATION_TOKEN = getpass.getpass(\"ASTRA_DB_APPLICATION_TOKEN = \")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "55860b2d",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Depending on whether local or cloud-based Astra DB, create the corresponding database connection \"Session\" object."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "36c163e8",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Example"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 2,
|
||||
"id": "d15e3302",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langchain.memory import AstraDBChatMessageHistory\n",
|
||||
"\n",
|
||||
"message_history = AstraDBChatMessageHistory(\n",
|
||||
" session_id=\"test-session\",\n",
|
||||
" api_endpoint=ASTRA_DB_API_ENDPOINT,\n",
|
||||
" token=ASTRA_DB_APPLICATION_TOKEN,\n",
|
||||
")\n",
|
||||
"\n",
|
||||
"message_history.add_user_message(\"hi!\")\n",
|
||||
"\n",
|
||||
"message_history.add_ai_message(\"whats up?\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 3,
|
||||
"id": "64fc465e",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"[HumanMessage(content='hi!'), AIMessage(content='whats up?')]"
|
||||
]
|
||||
},
|
||||
"execution_count": 3,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"message_history.messages"
|
||||
]
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
"kernelspec": {
|
||||
"display_name": "Python 3 (ipykernel)",
|
||||
"language": "python",
|
||||
"name": "python3"
|
||||
},
|
||||
"language_info": {
|
||||
"codemirror_mode": {
|
||||
"name": "ipython",
|
||||
"version": 3
|
||||
},
|
||||
"file_extension": ".py",
|
||||
"mimetype": "text/x-python",
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.10.12"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
"nbformat_minor": 5
|
||||
}
|
||||
@@ -1,22 +1,11 @@
|
||||
# AWS
|
||||
|
||||
The `LangChain` integrations related to [Amazon AWS](https://aws.amazon.com/) platform.
|
||||
All functionality related to [Amazon AWS](https://aws.amazon.com/) platform
|
||||
|
||||
## LLMs
|
||||
|
||||
### Bedrock
|
||||
|
||||
>[Amazon Bedrock](https://aws.amazon.com/bedrock/) is a fully managed service that offers a choice of
|
||||
> high-performing foundation models (FMs) from leading AI companies like `AI21 Labs`, `Anthropic`, `Cohere`,
|
||||
> `Meta`, `Stability AI`, and `Amazon` via a single API, along with a broad set of capabilities you need to
|
||||
> build generative AI applications with security, privacy, and responsible AI. Using `Amazon Bedrock`,
|
||||
> you can easily experiment with and evaluate top FMs for your use case, privately customize them with
|
||||
> your data using techniques such as fine-tuning and `Retrieval Augmented Generation` (`RAG`), and build
|
||||
> agents that execute tasks using your enterprise systems and data sources. Since `Amazon Bedrock` is
|
||||
> serverless, you don't have to manage any infrastructure, and you can securely integrate and deploy
|
||||
> generative AI capabilities into your applications using the AWS services you are already familiar with.
|
||||
|
||||
|
||||
See a [usage example](/docs/integrations/llms/bedrock).
|
||||
|
||||
```python
|
||||
@@ -25,28 +14,32 @@ from langchain.llms.bedrock import Bedrock
|
||||
|
||||
### Amazon API Gateway
|
||||
|
||||
>[Amazon API Gateway](https://aws.amazon.com/api-gateway/) is a fully managed service that makes it easy for
|
||||
> developers to create, publish, maintain, monitor, and secure APIs at any scale. APIs act as the "front door"
|
||||
> for applications to access data, business logic, or functionality from your backend services. Using
|
||||
> `API Gateway`, you can create RESTful APIs and WebSocket APIs that enable real-time two-way communication
|
||||
> applications. `API Gateway` supports containerized and serverless workloads, as well as web applications.
|
||||
>
|
||||
> `API Gateway` handles all the tasks involved in accepting and processing up to hundreds of thousands of
|
||||
> concurrent API calls, including traffic management, CORS support, authorization and access control,
|
||||
> throttling, monitoring, and API version management. `API Gateway` has no minimum fees or startup costs.
|
||||
> You pay for the API calls you receive and the amount of data transferred out and, with the `API Gateway`
|
||||
> tiered pricing model, you can reduce your cost as your API usage scales.
|
||||
[Amazon API Gateway](https://aws.amazon.com/api-gateway/) is a fully managed service that makes it easy for developers to create, publish, maintain, monitor, and secure APIs at any scale. APIs act as the "front door" for applications to access data, business logic, or functionality from your backend services. Using API Gateway, you can create RESTful APIs and WebSocket APIs that enable real-time two-way communication applications. API Gateway supports containerized and serverless workloads, as well as web applications.
|
||||
|
||||
See a [usage example](/docs/integrations/llms/amazon_api_gateway).
|
||||
API Gateway handles all the tasks involved in accepting and processing up to hundreds of thousands of concurrent API calls, including traffic management, CORS support, authorization and access control, throttling, monitoring, and API version management. API Gateway has no minimum fees or startup costs. You pay for the API calls you receive and the amount of data transferred out and, with the API Gateway tiered pricing model, you can reduce your cost as your API usage scales.
|
||||
|
||||
See a [usage example](/docs/integrations/llms/amazon_api_gateway_example).
|
||||
|
||||
```python
|
||||
from langchain.llms import AmazonAPIGateway
|
||||
|
||||
api_url = "https://<api_gateway_id>.execute-api.<region>.amazonaws.com/LATEST/HF"
|
||||
# These are sample parameters for Falcon 40B Instruct Deployed from Amazon SageMaker JumpStart
|
||||
model_kwargs = {
|
||||
"max_new_tokens": 100,
|
||||
"num_return_sequences": 1,
|
||||
"top_k": 50,
|
||||
"top_p": 0.95,
|
||||
"do_sample": False,
|
||||
"return_full_text": True,
|
||||
"temperature": 0.2,
|
||||
}
|
||||
llm = AmazonAPIGateway(api_url=api_url, model_kwargs=model_kwargs)
|
||||
```
|
||||
|
||||
### SageMaker Endpoint
|
||||
|
||||
>[Amazon SageMaker](https://aws.amazon.com/sagemaker/) is a system that can build, train, and deploy
|
||||
> machine learning (ML) models with fully managed infrastructure, tools, and workflows.
|
||||
>[Amazon SageMaker](https://aws.amazon.com/sagemaker/) is a system that can build, train, and deploy machine learning (ML) models with fully managed infrastructure, tools, and workflows.
|
||||
|
||||
We use `SageMaker` to host our model and expose it as the `SageMaker Endpoint`.
|
||||
|
||||
@@ -57,16 +50,6 @@ from langchain.llms import SagemakerEndpoint
|
||||
from langchain.llms.sagemaker_endpoint import LLMContentHandler
|
||||
```
|
||||
|
||||
## Chat models
|
||||
|
||||
### Bedrock Chat
|
||||
|
||||
See a [usage example](/docs/integrations/chat/bedrock).
|
||||
|
||||
```python
|
||||
from langchain.chat_models import BedrockChat
|
||||
```
|
||||
|
||||
## Text Embedding Models
|
||||
|
||||
### Bedrock
|
||||
@@ -84,32 +67,11 @@ from langchain.embeddings import SagemakerEndpointEmbeddings
|
||||
from langchain.llms.sagemaker_endpoint import ContentHandlerBase
|
||||
```
|
||||
|
||||
## Chains
|
||||
|
||||
### Amazon Comprehend Moderation Chain
|
||||
|
||||
>[Amazon Comprehend](https://aws.amazon.com/comprehend/) is a natural-language processing (NLP) service that
|
||||
> uses machine learning to uncover valuable insights and connections in text.
|
||||
|
||||
|
||||
We need to install the `boto3` and `nltk` libraries.
|
||||
|
||||
```bash
|
||||
pip install boto3 nltk
|
||||
```
|
||||
|
||||
See a [usage example](/docs/guides/safety/amazon_comprehend_chain).
|
||||
|
||||
```python
|
||||
from langchain_experimental.comprehend_moderation import AmazonComprehendModerationChain
|
||||
```
|
||||
|
||||
## Document loaders
|
||||
|
||||
### AWS S3 Directory and File
|
||||
|
||||
>[Amazon Simple Storage Service (Amazon S3)](https://docs.aws.amazon.com/AmazonS3/latest/userguide/using-folders.html)
|
||||
> is an object storage service.
|
||||
>[Amazon Simple Storage Service (Amazon S3)](https://docs.aws.amazon.com/AmazonS3/latest/userguide/using-folders.html) is an object storage service.
|
||||
>[AWS S3 Directory](https://docs.aws.amazon.com/AmazonS3/latest/userguide/using-folders.html)
|
||||
>[AWS S3 Buckets](https://docs.aws.amazon.com/AmazonS3/latest/userguide/UsingBucket.html)
|
||||
|
||||
@@ -121,17 +83,6 @@ See a [usage example for S3FileLoader](/docs/integrations/document_loaders/aws_s
|
||||
from langchain.document_loaders import S3DirectoryLoader, S3FileLoader
|
||||
```
|
||||
|
||||
### Amazon Textract
|
||||
|
||||
>[Amazon Textract](https://docs.aws.amazon.com/managedservices/latest/userguide/textract.html) is a machine
|
||||
> learning (ML) service that automatically extracts text, handwriting, and data from scanned documents.
|
||||
|
||||
See a [usage example](/docs/integrations/document_loaders/amazon_textract).
|
||||
|
||||
```python
|
||||
from langchain.document_loaders import AmazonTextractPDFLoader
|
||||
```
|
||||
|
||||
## Memory
|
||||
|
||||
### AWS DynamoDB
|
||||
@@ -152,112 +103,3 @@ See a [usage example](/docs/integrations/memory/aws_dynamodb).
|
||||
```python
|
||||
from langchain.memory import DynamoDBChatMessageHistory
|
||||
```
|
||||
|
||||
## Retrievers
|
||||
|
||||
### Amazon Kendra
|
||||
|
||||
> [Amazon Kendra](https://docs.aws.amazon.com/kendra/latest/dg/what-is-kendra.html) is an intelligent search service
|
||||
> provided by `Amazon Web Services` (`AWS`). It utilizes advanced natural language processing (NLP) and machine
|
||||
> learning algorithms to enable powerful search capabilities across various data sources within an organization.
|
||||
> `Kendra` is designed to help users find the information they need quickly and accurately,
|
||||
> improving productivity and decision-making.
|
||||
|
||||
> With `Kendra`, we can search across a wide range of content types, including documents, FAQs, knowledge bases,
|
||||
> manuals, and websites. It supports multiple languages and can understand complex queries, synonyms, and
|
||||
> contextual meanings to provide highly relevant search results.
|
||||
|
||||
We need to install the `boto3` library.
|
||||
|
||||
```bash
|
||||
pip install boto3
|
||||
```
|
||||
|
||||
See a [usage example](/docs/integrations/retrievers/amazon_kendra_retriever).
|
||||
|
||||
```python
|
||||
from langchain.retrievers import AmazonKendraRetriever
|
||||
```
|
||||
|
||||
### Amazon Bedrock (Knowledge Bases)
|
||||
|
||||
> [Knowledge bases for Amazon Bedrock](https://aws.amazon.com/bedrock/knowledge-bases/) is an
|
||||
> `Amazon Web Services` (`AWS`) offering which lets you quickly build RAG applications by using your
|
||||
> private data to customize foundation model response.
|
||||
|
||||
We need to install the `boto3` library.
|
||||
|
||||
```bash
|
||||
pip install boto3
|
||||
```
|
||||
|
||||
See a [usage example](/docs/integrations/retrievers/amazon_bedrock_knowledge_bases).
|
||||
|
||||
```python
|
||||
from langchain.retrievers import AmazonKnowledgeBasesRetriever
|
||||
```
|
||||
|
||||
## Vector stores
|
||||
|
||||
### Amazon OpenSearch Service
|
||||
|
||||
> [Amazon OpenSearch Service](https://aws.amazon.com/opensearch-service/) performs
|
||||
> interactive log analytics, real-time application monitoring, website search, and more. `OpenSearch` is
|
||||
> an open source,
|
||||
> distributed search and analytics suite derived from `Elasticsearch`. `Amazon OpenSearch Service` offers the
|
||||
> latest versions of `OpenSearch`, support for many versions of `Elasticsearch`, as well as
|
||||
> visualization capabilities powered by `OpenSearch Dashboards` and `Kibana`.
|
||||
|
||||
We need to install several python libraries.
|
||||
|
||||
```bash
|
||||
pip install boto3 requests requests-aws4auth
|
||||
```
|
||||
|
||||
See a [usage example](/docs/integrations/vectorstores/opensearch#using-aos-amazon-opensearch-service).
|
||||
|
||||
```python
|
||||
from langchain.vectorstores import OpenSearchVectorSearch
|
||||
```
|
||||
|
||||
## Tools
|
||||
|
||||
### AWS Lambda
|
||||
|
||||
>[`Amazon AWS Lambda`](https://aws.amazon.com/pm/lambda/) is a serverless computing service provided by
|
||||
> `Amazon Web Services` (`AWS`). It helps developers to build and run applications and services without
|
||||
> provisioning or managing servers. This serverless architecture enables you to focus on writing and
|
||||
> deploying code, while AWS automatically takes care of scaling, patching, and managing the
|
||||
> infrastructure required to run your applications.
|
||||
|
||||
We need to install `boto3` python library.
|
||||
|
||||
```bash
|
||||
pip install boto3
|
||||
```
|
||||
|
||||
See a [usage example](/docs/integrations/tools/awslambda).
|
||||
|
||||
|
||||
## Callbacks
|
||||
|
||||
### SageMaker Tracking
|
||||
|
||||
>[Amazon SageMaker](https://aws.amazon.com/sagemaker/) is a fully managed service that is used to quickly
|
||||
> and easily build, train and deploy machine learning (ML) models.
|
||||
|
||||
>[Amazon SageMaker Experiments](https://docs.aws.amazon.com/sagemaker/latest/dg/experiments.html) is a capability
|
||||
> of `Amazon SageMaker` that lets you organize, track,
|
||||
> compare and evaluate ML experiments and model versions.
|
||||
|
||||
We need to install several python libraries.
|
||||
|
||||
```bash
|
||||
pip install google-search-results sagemaker
|
||||
```
|
||||
|
||||
See a [usage example](/docs/integrations/callbacks/sagemaker_tracking).
|
||||
|
||||
```python
|
||||
from langchain.callbacks import SageMakerCallbackHandler
|
||||
```
|
||||
|
||||
@@ -1,136 +0,0 @@
|
||||
# Hugging Face
|
||||
|
||||
All functionality related to the [Hugging Face Platform](https://huggingface.co/).
|
||||
|
||||
## LLMs
|
||||
|
||||
### Hugging Face Hub
|
||||
|
||||
>The [Hugging Face Hub](https://huggingface.co/docs/hub/index) is a platform
|
||||
> with over 350k models, 75k datasets, and 150k demo apps (Spaces), all open source
|
||||
> and publicly available, in an online platform where people can easily
|
||||
> collaborate and build ML together. The Hub works as a central place where anyone
|
||||
> can explore, experiment, collaborate, and build technology with Machine Learning.
|
||||
|
||||
To use, we should have the `huggingface_hub` python [package installed](https://huggingface.co/docs/huggingface_hub/installation).
|
||||
|
||||
```bash
|
||||
pip install huggingface_hub
|
||||
```
|
||||
|
||||
See a [usage example](/docs/integrations/llms/huggingface_hub).
|
||||
|
||||
```python
|
||||
from langchain.llms import HuggingFaceHub
|
||||
```
|
||||
|
||||
### Hugging Face Local Pipelines
|
||||
|
||||
Hugging Face models can be run locally through the `HuggingFacePipeline` class.
|
||||
|
||||
We need to install `transformers` python package.
|
||||
|
||||
```bash
|
||||
pip install transformers
|
||||
```
|
||||
|
||||
See a [usage example](/docs/integrations/llms/huggingface_pipelines).
|
||||
|
||||
```python
|
||||
from langchain.llms.huggingface_pipeline import HuggingFacePipeline
|
||||
```
|
||||
|
||||
### Hugging Face TextGen Inference
|
||||
|
||||
>[Text Generation Inference](https://github.com/huggingface/text-generation-inference) is
|
||||
> a Rust, Python and gRPC server for text generation inference. Used in production at
|
||||
> [HuggingFace](https://huggingface.co/) to power LLMs api-inference widgets.
|
||||
|
||||
We need to install `text_generation` python package.
|
||||
|
||||
```bash
|
||||
pip install text_generation
|
||||
```
|
||||
|
||||
See a [usage example](/docs/integrations/llms/huggingface_textgen_inference).
|
||||
|
||||
```python
|
||||
from langchain.llms import HuggingFaceTextGenInference
|
||||
```
|
||||
|
||||
|
||||
|
||||
## Document Loaders
|
||||
|
||||
### Hugging Face dataset
|
||||
|
||||
>[Hugging Face Hub](https://huggingface.co/docs/hub/index) is home to over 75,000
|
||||
> [datasets](https://huggingface.co/docs/hub/index#datasets) in more than 100 languages
|
||||
> that can be used for a broad range of tasks across NLP, Computer Vision, and Audio.
|
||||
> They used for a diverse range of tasks such as translation, automatic speech
|
||||
> recognition, and image classification.
|
||||
|
||||
We need to install `datasets` python package.
|
||||
|
||||
```bash
|
||||
pip install datasets
|
||||
```
|
||||
|
||||
See a [usage example](/docs/integrations/document_loaders/hugging_face_dataset).
|
||||
|
||||
```python
|
||||
from langchain.document_loaders.hugging_face_dataset import HuggingFaceDatasetLoader
|
||||
```
|
||||
|
||||
|
||||
## Embedding Models
|
||||
|
||||
### Hugging Face Hub
|
||||
|
||||
>The [Hugging Face Hub](https://huggingface.co/docs/hub/index) is a platform
|
||||
> with over 350k models, 75k datasets, and 150k demo apps (Spaces), all open source
|
||||
> and publicly available, in an online platform where people can easily
|
||||
> collaborate and build ML together. The Hub works as a central place where anyone
|
||||
> can explore, experiment, collaborate, and build technology with Machine Learning.
|
||||
|
||||
We need to install the `sentence_transformers` python package.
|
||||
|
||||
```bash
|
||||
pip install sentence_transformers
|
||||
```
|
||||
|
||||
|
||||
#### HuggingFaceEmbeddings
|
||||
|
||||
See a [usage example](/docs/integrations/text_embedding/huggingfacehub).
|
||||
|
||||
```python
|
||||
from langchain.embeddings import HuggingFaceEmbeddings
|
||||
```
|
||||
#### HuggingFaceInstructEmbeddings
|
||||
|
||||
See a [usage example](/docs/integrations/text_embedding/instruct_embeddings).
|
||||
|
||||
```python
|
||||
from langchain.embeddings import HuggingFaceInstructEmbeddings
|
||||
```
|
||||
|
||||
|
||||
## Tools
|
||||
|
||||
### Hugging Face Hub Tools
|
||||
|
||||
>[Hugging Face Tools](https://huggingface.co/docs/transformers/v4.29.0/en/custom_tools)
|
||||
> support text I/O and are loaded using the `load_huggingface_tool` function.
|
||||
|
||||
We need to install several python packages.
|
||||
|
||||
```bash
|
||||
pip install transformers huggingface_hub
|
||||
```
|
||||
|
||||
See a [usage example](/docs/integrations/tools/huggingface_tools).
|
||||
|
||||
```python
|
||||
from langchain.agents import load_huggingface_tool
|
||||
```
|
||||
@@ -81,7 +81,6 @@ See a [usage example for the Azure Files](/docs/integrations/document_loaders/az
|
||||
from langchain.document_loaders import AzureBlobStorageFileLoader
|
||||
```
|
||||
|
||||
|
||||
### Microsoft OneDrive
|
||||
|
||||
>[Microsoft OneDrive](https://en.wikipedia.org/wiki/OneDrive) (formerly `SkyDrive`) is a file-hosting service operated by Microsoft.
|
||||
@@ -98,7 +97,6 @@ See a [usage example](/docs/integrations/document_loaders/microsoft_onedrive).
|
||||
from langchain.document_loaders import OneDriveLoader
|
||||
```
|
||||
|
||||
|
||||
### Microsoft Word
|
||||
|
||||
>[Microsoft Word](https://www.microsoft.com/en-us/microsoft-365/word) is a word processor developed by Microsoft.
|
||||
@@ -110,48 +108,6 @@ from langchain.document_loaders import UnstructuredWordDocumentLoader
|
||||
```
|
||||
|
||||
|
||||
### Microsoft Excel
|
||||
|
||||
>[Microsoft Excel](https://en.wikipedia.org/wiki/Microsoft_Excel) is a spreadsheet editor developed by
|
||||
> Microsoft for Windows, macOS, Android, iOS and iPadOS.
|
||||
> It features calculation or computation capabilities, graphing tools, pivot tables, and a macro programming
|
||||
> language called Visual Basic for Applications (VBA). Excel forms part of the Microsoft 365 suite of software.
|
||||
|
||||
The `UnstructuredExcelLoader` is used to load `Microsoft Excel` files. The loader works with both `.xlsx` and `.xls` files.
|
||||
The page content will be the raw text of the Excel file. If you use the loader in `"elements"` mode, an HTML
|
||||
representation of the Excel file will be available in the document metadata under the `text_as_html` key.
|
||||
|
||||
See a [usage example](/docs/integrations/document_loaders/excel).
|
||||
|
||||
```python
|
||||
from langchain.document_loaders import UnstructuredExcelLoader
|
||||
```
|
||||
|
||||
|
||||
### Microsoft SharePoint
|
||||
|
||||
>[Microsoft SharePoint](https://en.wikipedia.org/wiki/SharePoint) is a website-based collaboration system
|
||||
> that uses workflow applications, “list” databases, and other web parts and security features to
|
||||
> empower business teams to work together developed by Microsoft.
|
||||
|
||||
See a [usage example](/docs/integrations/document_loaders/microsoft_sharepoint).
|
||||
|
||||
```python
|
||||
from langchain.document_loaders.sharepoint import SharePointLoader
|
||||
```
|
||||
|
||||
|
||||
### Microsoft PowerPoint
|
||||
|
||||
>[Microsoft PowerPoint](https://en.wikipedia.org/wiki/Microsoft_PowerPoint) is a presentation program by Microsoft.
|
||||
|
||||
See a [usage example](/docs/integrations/document_loaders/microsoft_powerpoint).
|
||||
|
||||
```python
|
||||
from langchain.document_loaders import UnstructuredPowerPointLoader
|
||||
```
|
||||
|
||||
|
||||
## Vector stores
|
||||
|
||||
### Azure Cosmos DB
|
||||
|
||||
@@ -99,10 +99,3 @@ See a [usage example](/docs/guides/safety/moderation).
|
||||
from langchain.chains import OpenAIModerationChain
|
||||
```
|
||||
|
||||
## Adapter
|
||||
|
||||
See a [usage example](/docs/integrations/adapters/openai).
|
||||
|
||||
```python
|
||||
from langchain.adapters import openai as lc_openai
|
||||
```
|
||||
|
||||
@@ -29,47 +29,6 @@ vector_store = AstraDB(
|
||||
|
||||
Learn more in the [example notebook](/docs/integrations/vectorstores/astradb).
|
||||
|
||||
### LLM Cache
|
||||
|
||||
```python
|
||||
from langchain.globals import set_llm_cache
|
||||
from langchain.cache import AstraDBCache
|
||||
set_llm_cache(AstraDBCache(
|
||||
api_endpoint="...",
|
||||
token="...",
|
||||
))
|
||||
```
|
||||
|
||||
Learn more in the [example notebook](/docs/integrations/llms/llm_caching) (scroll to the Astra DB section).
|
||||
|
||||
|
||||
### Semantic LLM Cache
|
||||
|
||||
```python
|
||||
from langchain.globals import set_llm_cache
|
||||
from langchain.cache import AstraDBSemanticCache
|
||||
set_llm_cache(AstraDBSemanticCache(
|
||||
embedding=my_embedding,
|
||||
api_endpoint="...",
|
||||
token="...",
|
||||
))
|
||||
```
|
||||
|
||||
Learn more in the [example notebook](/docs/integrations/llms/llm_caching) (scroll to the appropriate section).
|
||||
|
||||
### Chat message history
|
||||
|
||||
```python
|
||||
from langchain.memory import AstraDBChatMessageHistory
|
||||
message_history = AstraDBChatMessageHistory(
|
||||
session_id="test-session"
|
||||
api_endpoint="...",
|
||||
token="...",
|
||||
)
|
||||
```
|
||||
|
||||
Learn more in the [example notebook](/docs/integrations/memory/astradb_chat_message_history).
|
||||
|
||||
|
||||
## Apache Cassandra and Astra DB through CQL
|
||||
|
||||
|
||||
@@ -7,8 +7,9 @@ Databricks embraces the LangChain ecosystem in various ways:
|
||||
|
||||
1. Databricks connector for the SQLDatabase Chain: SQLDatabase.from_databricks() provides an easy way to query your data on Databricks through LangChain
|
||||
2. Databricks MLflow integrates with LangChain: Tracking and serving LangChain applications with fewer steps
|
||||
3. Databricks as an LLM provider: Deploy your fine-tuned LLMs on Databricks via serving endpoints or cluster driver proxy apps, and query it as langchain.llms.Databricks
|
||||
4. Databricks Dolly: Databricks open-sourced Dolly which allows for commercial use, and can be accessed through the Hugging Face Hub
|
||||
3. Databricks MLflow AI Gateway
|
||||
4. Databricks as an LLM provider: Deploy your fine-tuned LLMs on Databricks via serving endpoints or cluster driver proxy apps, and query it as langchain.llms.Databricks
|
||||
5. Databricks Dolly: Databricks open-sourced Dolly which allows for commercial use, and can be accessed through the Hugging Face Hub
|
||||
|
||||
Databricks connector for the SQLDatabase Chain
|
||||
----------------------------------------------
|
||||
@@ -24,58 +25,19 @@ Databricks provides a fully managed and hosted version of MLflow integrated with
|
||||
|
||||
Databricks MLflow makes it more convenient to develop LangChain applications on Databricks. For MLflow tracking, you don't need to set the tracking uri. For MLflow Model Serving, you can save LangChain Chains in the MLflow langchain flavor, and then register and serve the Chain with a few clicks on Databricks, with credentials securely managed by MLflow Model Serving.
|
||||
|
||||
Databricks External Models
|
||||
--------------------------
|
||||
Databricks MLflow AI Gateway
|
||||
----------------------------
|
||||
|
||||
[Databricks External Models](https://docs.databricks.com/generative-ai/external-models/index.html) is a service that is designed to streamline the usage and management of various large language model (LLM) providers, such as OpenAI and Anthropic, within an organization. It offers a high-level interface that simplifies the interaction with these services by providing a unified endpoint to handle specific LLM related requests. The following example creates an endpoint that serves OpenAI's GPT-4 model and generates a chat response from it:
|
||||
|
||||
```python
|
||||
from langchain.chat_models import ChatDatabricks
|
||||
from langchain.schema.messages import HumanMessage
|
||||
from mlflow.deployments import get_deploy_client
|
||||
|
||||
|
||||
client = get_deploy_client("databricks")
|
||||
name = f"chat"
|
||||
client.create_endpoint(
|
||||
name=name,
|
||||
config={
|
||||
"served_entities": [
|
||||
{
|
||||
"name": "test",
|
||||
"external_model": {
|
||||
"name": "gpt-4",
|
||||
"provider": "openai",
|
||||
"task": "llm/v1/chat",
|
||||
"openai_config": {
|
||||
"openai_api_key": "{{secrets/<scope>/<key>}}",
|
||||
},
|
||||
},
|
||||
}
|
||||
],
|
||||
},
|
||||
)
|
||||
chat = ChatDatabricks(endpoint=name, temperature=0.1)
|
||||
print(chat([HumanMessage(content="hello")]))
|
||||
# -> content='Hello! How can I assist you today?'
|
||||
```
|
||||
|
||||
Databricks Foundation Model APIs
|
||||
--------------------------------
|
||||
|
||||
[Databricks Foundation Model APIs](https://docs.databricks.com/machine-learning/foundation-models/index.html) allow you to access and query state-of-the-art open source models from dedicated serving endpoints. With Foundation Model APIs, developers can quickly and easily build applications that leverage a high-quality generative AI model without maintaining their own model deployment. The following example uses the `databricks-bge-large-en` endpoint to generate embeddings from text:
|
||||
|
||||
```python
|
||||
from langchain.llms import DatabricksEmbeddings
|
||||
|
||||
|
||||
embeddings = DatabricksEmbeddings(endpoint="databricks-bge-large-en")
|
||||
print(embeddings.embed_query("hello")[:3])
|
||||
# -> [0.051055908203125, 0.007221221923828125, 0.003879547119140625, ...]
|
||||
```
|
||||
See [MLflow AI Gateway](/docs/integrations/providers/mlflow_ai_gateway).
|
||||
|
||||
Databricks as an LLM provider
|
||||
-----------------------------
|
||||
|
||||
The notebook [Wrap Databricks endpoints as LLMs](/docs/integrations/llms/databricks#wrapping-a-serving-endpoint-custom-model) demonstrates how to serve a custom model that has been registered by MLflow as a Databricks endpoint.
|
||||
It supports two types of endpoints: the serving endpoint, which is recommended for both production and development, and the cluster driver proxy app, which is recommended for interactive development.
|
||||
The notebook [Wrap Databricks endpoints as LLMs](/docs/integrations/llms/databricks) illustrates the method to wrap Databricks endpoints as LLMs in LangChain. It supports two types of endpoints: the serving endpoint, which is recommended for both production and development, and the cluster driver proxy app, which is recommended for interactive development.
|
||||
|
||||
Databricks endpoints support Dolly, but are also great for hosting models like MPT-7B or any other models from the Hugging Face ecosystem. Databricks endpoints can also be used with proprietary models like OpenAI to provide a governance layer for enterprises.
|
||||
|
||||
Databricks Dolly
|
||||
----------------
|
||||
|
||||
Databricks’ Dolly is an instruction-following large language model trained on the Databricks machine learning platform that is licensed for commercial use. The model is available on Hugging Face Hub as databricks/dolly-v2-12b. See the notebook [Hugging Face Hub](/docs/integrations/llms/huggingface_hub) for instructions to access it through the Hugging Face Hub integration with LangChain.
|
||||
|
||||
@@ -5,117 +5,31 @@ It is broken into two parts: installation and setup, and then examples of DeepSp
|
||||
|
||||
## Installation and Setup
|
||||
|
||||
- Install the Python packages with `pip install deepsparse-nightly[llm] langchain`
|
||||
- Install the Python package with `pip install deepsparse`
|
||||
- Choose a [SparseZoo model](https://sparsezoo.neuralmagic.com/?useCase=text_generation) or export a support model to ONNX [using Optimum](https://github.com/neuralmagic/notebooks/blob/main/notebooks/opt-text-generation-deepsparse-quickstart/OPT_Text_Generation_DeepSparse_Quickstart.ipynb)
|
||||
- Models hosted on HuggingFace are also supported by prepending `"hf:"` to the model id, such as [`"hf:mgoin/TinyStories-33M-quant-deepsparse"`](https://huggingface.co/mgoin/TinyStories-33M-quant-deepsparse)
|
||||
|
||||
## Using DeepSparse With LangChain
|
||||
## Wrappers
|
||||
|
||||
There is a DeepSparse LLM wrapper, which you can access with:
|
||||
### LLM
|
||||
|
||||
There exists a DeepSparse LLM wrapper, which you can access with:
|
||||
|
||||
```python
|
||||
from langchain.llms import DeepSparse
|
||||
```
|
||||
|
||||
It provides a simple, unified interface for all models:
|
||||
It provides a unified interface for all models:
|
||||
|
||||
```python
|
||||
from langchain.llms import DeepSparse
|
||||
llm = DeepSparse(model='zoo:nlg/text_generation/codegen_mono-350m/pytorch/huggingface/bigpython_bigquery_thepile/base-none')
|
||||
|
||||
print(llm('def fib():'))
|
||||
"""
|
||||
a, b = 0, 1
|
||||
while True:
|
||||
yield a
|
||||
a, b = b, a + b
|
||||
|
||||
def fib2(n):
|
||||
a, b = 0, 1
|
||||
while a < n:
|
||||
yield a
|
||||
a, b = b, a + b
|
||||
|
||||
def primes():
|
||||
yield 2
|
||||
it = fib()
|
||||
while True:
|
||||
try:
|
||||
yield next(it)
|
||||
except StopIteration:
|
||||
return
|
||||
"""
|
||||
```
|
||||
## Streaming
|
||||
The DeepSparse LangChain wrapper also supports per token output streaming:
|
||||
|
||||
Additional parameters can be passed using the `config` parameter:
|
||||
|
||||
```python
|
||||
from langchain.llms import DeepSparse
|
||||
llm = DeepSparse(
|
||||
model="hf:neuralmagic/mpt-7b-chat-pruned50-quant",
|
||||
streaming=True
|
||||
)
|
||||
for chunk in llm.stream("Tell me a joke", stop=["'","\n"]):
|
||||
print(chunk, end='', flush=True)
|
||||
config = {'max_generated_tokens': 256}
|
||||
|
||||
llm = DeepSparse(model='zoo:nlg/text_generation/codegen_mono-350m/pytorch/huggingface/bigpython_bigquery_thepile/base-none', config=config)
|
||||
```
|
||||
## Using Instruction Fine-tune Models With DeepSparse
|
||||
Here's an example of how to prompt an instruction fine-tuned model using DeepSparse and the MPT-Instruct model:
|
||||
```python
|
||||
prompt="""
|
||||
Below is an instruction that describes a task. Write a response that appropriately completes the request. ### Instruction: what is quantization? ### Response:
|
||||
"""
|
||||
llm = DeepSparse(model='zoo:mpt-7b-dolly_mpt_pretrain-pruned50_quantized')
|
||||
print(llm(prompt))
|
||||
"""
|
||||
In physics, the term "quantization" refers to the process of transforming a continuous variable into a set of discrete values. In the context of quantum mechanics, this process is used to describe the restriction of the degrees of freedom of a system to a set of discrete values. In other words, it is the process of transforming the continuous spectrum of a physical quantity into a set of discrete, or "quantized", values.
|
||||
"""
|
||||
```
|
||||
You can also do all the other things you are used to doing in LangChain such as using `PromptTemplete`s and parsing outputs:
|
||||
```python
|
||||
from langchain.prompts import PromptTemplate
|
||||
from langchain.output_parsers import CommaSeparatedListOutputParser
|
||||
|
||||
llm_parser = CommaSeparatedListOutputParser()
|
||||
llm = DeepSparse(model='hf:neuralmagic/mpt-7b-chat-pruned50-quant')
|
||||
|
||||
prompt = PromptTemplate(
|
||||
template="List how to {do}",
|
||||
input_variables=["do"])
|
||||
|
||||
output = llm.predict(text=prompt.format(do="Become a great software engineer"))
|
||||
|
||||
print(output)
|
||||
"""
|
||||
List how to Become a great software engineer
|
||||
By TechRadar Staff
|
||||
Here are some tips on how to become a great software engineer:
|
||||
1. Develop good programming skills: To become a great software engineer, you need to have a strong understanding of programming concepts and techniques. You should be able to write clean, efficient code that meets the requirements of the project.
|
||||
2. Learn new technologies: To stay up-to in the field, you should be familiar with new technologies and programming languages. You should also be able to adapt to new environments and work with different tools and platforms.
|
||||
3. Build a portfolio: To showcase your skills, you should build a portfolio of your work. This will help you showcase your skills and abilities to potential employers.
|
||||
4. Network: Networking is an important aspect of your career. You should attend industry events and conferences to meet other professionals in the field.
|
||||
5. Stay up-to-date with industry trends: Stay up-to-date with industry trends and developments. This will help you stay relevant in your field and help you stay ahead of your competition.
|
||||
6. Take courses and certifications: Taking courses and certifications can help you gain new skills and knowledge. This will help you stay ahead of your competition and help you grow in your career.
|
||||
7. Practice and refine your skills: Practice and refine your skills by working on projects and solving problems. This will help you develop your skills and help you grow in your career.
|
||||
"""
|
||||
|
||||
```
|
||||
## Configuration
|
||||
|
||||
The DeepSparse LangChain integration has arguments to control the model loaded, any configs for how the model should be loaded, configs to control how tokens are generated, and then whether to return all tokens at once or to stream them one-by-one.
|
||||
|
||||
```python
|
||||
model: str
|
||||
"""The path to a model file or directory or the name of a SparseZoo model stub."""
|
||||
|
||||
model_config: Optional[Dict[str, Any]] = None
|
||||
"""Keyword arguments passed to the pipeline construction.
|
||||
Common parameters are sequence_length, prompt_sequence_length"""
|
||||
|
||||
generation_config: Union[None, str, Dict] = None
|
||||
"""GenerationConfig dictionary consisting of parameters used to control
|
||||
sequences generated for each prompt. Common parameters are:
|
||||
max_length, max_new_tokens, num_return_sequences, output_scores,
|
||||
top_p, top_k, repetition_penalty."""
|
||||
|
||||
streaming: bool = False
|
||||
"""Whether to stream the results, token by token."""
|
||||
```
|
||||
@@ -8,7 +8,7 @@
|
||||
|
||||
|
||||
```bash
|
||||
pip install dgml-utils
|
||||
pip install lxml
|
||||
```
|
||||
|
||||
## Document Loader
|
||||
|
||||
@@ -11,7 +11,7 @@
|
||||
Click [here](https://www.alibabacloud.com/zh/product/hologres) to fast deploy a Hologres cloud instance.
|
||||
|
||||
```bash
|
||||
pip install hologres-vector
|
||||
pip install psycopg2
|
||||
```
|
||||
|
||||
## Vector Store
|
||||
|
||||
69
docs/docs/integrations/providers/huggingface.mdx
Normal file
69
docs/docs/integrations/providers/huggingface.mdx
Normal file
@@ -0,0 +1,69 @@
|
||||
# Hugging Face
|
||||
|
||||
This page covers how to use the Hugging Face ecosystem (including the [Hugging Face Hub](https://huggingface.co)) within LangChain.
|
||||
It is broken into two parts: installation and setup, and then references to specific Hugging Face wrappers.
|
||||
|
||||
## Installation and Setup
|
||||
|
||||
If you want to work with the Hugging Face Hub:
|
||||
- Install the Hub client library with `pip install huggingface_hub`
|
||||
- Create a Hugging Face account (it's free!)
|
||||
- Create an [access token](https://huggingface.co/docs/hub/security-tokens) and set it as an environment variable (`HUGGINGFACEHUB_API_TOKEN`)
|
||||
|
||||
If you want work with the Hugging Face Python libraries:
|
||||
- Install `pip install transformers` for working with models and tokenizers
|
||||
- Install `pip install datasets` for working with datasets
|
||||
|
||||
## Wrappers
|
||||
|
||||
### LLM
|
||||
|
||||
There exists two Hugging Face LLM wrappers, one for a local pipeline and one for a model hosted on Hugging Face Hub.
|
||||
Note that these wrappers only work for models that support the following tasks: [`text2text-generation`](https://huggingface.co/models?library=transformers&pipeline_tag=text2text-generation&sort=downloads), [`text-generation`](https://huggingface.co/models?library=transformers&pipeline_tag=text-classification&sort=downloads)
|
||||
|
||||
To use the local pipeline wrapper:
|
||||
```python
|
||||
from langchain.llms import HuggingFacePipeline
|
||||
```
|
||||
|
||||
To use a the wrapper for a model hosted on Hugging Face Hub:
|
||||
```python
|
||||
from langchain.llms import HuggingFaceHub
|
||||
```
|
||||
For a more detailed walkthrough of the Hugging Face Hub wrapper, see [this notebook](/docs/integrations/llms/huggingface_hub)
|
||||
|
||||
|
||||
### Embeddings
|
||||
|
||||
There exists two Hugging Face Embeddings wrappers, one for a local model and one for a model hosted on Hugging Face Hub.
|
||||
Note that these wrappers only work for [`sentence-transformers` models](https://huggingface.co/models?library=sentence-transformers&sort=downloads).
|
||||
|
||||
To use the local pipeline wrapper:
|
||||
```python
|
||||
from langchain.embeddings import HuggingFaceEmbeddings
|
||||
```
|
||||
|
||||
To use a the wrapper for a model hosted on Hugging Face Hub:
|
||||
```python
|
||||
from langchain.embeddings import HuggingFaceHubEmbeddings
|
||||
```
|
||||
For a more detailed walkthrough of this, see [this notebook](/docs/integrations/text_embedding/huggingfacehub)
|
||||
|
||||
### Tokenizer
|
||||
|
||||
There are several places you can use tokenizers available through the `transformers` package.
|
||||
By default, it is used to count tokens for all LLMs.
|
||||
|
||||
You can also use it to count tokens when splitting documents with
|
||||
```python
|
||||
from langchain.text_splitter import CharacterTextSplitter
|
||||
CharacterTextSplitter.from_huggingface_tokenizer(...)
|
||||
```
|
||||
For a more detailed walkthrough of this, see [this notebook](/docs/modules/data_connection/document_transformers/text_splitters/huggingface_length_function)
|
||||
|
||||
|
||||
### Datasets
|
||||
|
||||
The Hugging Face Hub has lots of great [datasets](https://huggingface.co/datasets) that can be used to evaluate your LLM chains.
|
||||
|
||||
For a detailed walkthrough of how to use them to do so, see [this notebook](/docs/integrations/document_loaders/hugging_face_dataset)
|
||||
@@ -1,11 +0,0 @@
|
||||
# Infinity
|
||||
|
||||
>[Infinity](https://github.com/michaelfeil/infinity) allows the creation of text embeddings.
|
||||
|
||||
## Text Embedding Model
|
||||
|
||||
There exists an infinity Embedding model, which you can access with
|
||||
```python
|
||||
from langchain.embeddings import InfinityEmbeddings
|
||||
```
|
||||
For a more detailed walkthrough of this, see [this notebook](/docs/integrations/text_embedding/infinity)
|
||||
@@ -1,119 +0,0 @@
|
||||
# MLflow Deployments for LLMs
|
||||
|
||||
>[The MLflow Deployments for LLMs](https://www.mlflow.org/docs/latest/llms/deployments/index.html) is a powerful tool designed to streamline the usage and management of various large
|
||||
> language model (LLM) providers, such as OpenAI and Anthropic, within an organization. It offers a high-level interface
|
||||
> that simplifies the interaction with these services by providing a unified endpoint to handle specific LLM related requests.
|
||||
|
||||
## Installation and Setup
|
||||
|
||||
Install `mlflow` with MLflow Deployments dependencies:
|
||||
|
||||
```sh
|
||||
pip install 'mlflow[genai]'
|
||||
```
|
||||
|
||||
Set the OpenAI API key as an environment variable:
|
||||
|
||||
```sh
|
||||
export OPENAI_API_KEY=...
|
||||
```
|
||||
|
||||
Create a configuration file:
|
||||
|
||||
```yaml
|
||||
endpoints:
|
||||
- name: completions
|
||||
endpoint_type: llm/v1/completions
|
||||
model:
|
||||
provider: openai
|
||||
name: text-davinci-003
|
||||
config:
|
||||
openai_api_key: $OPENAI_API_KEY
|
||||
|
||||
- name: embeddings
|
||||
endpoint_type: llm/v1/embeddings
|
||||
model:
|
||||
provider: openai
|
||||
name: text-embedding-ada-002
|
||||
config:
|
||||
openai_api_key: $OPENAI_API_KEY
|
||||
```
|
||||
|
||||
Start the deployments server:
|
||||
|
||||
```sh
|
||||
mlflow deployments start-server --config-path /path/to/config.yaml
|
||||
```
|
||||
|
||||
## Example provided by `MLflow`
|
||||
|
||||
>The `mlflow.langchain` module provides an API for logging and loading `LangChain` models.
|
||||
> This module exports multivariate LangChain models in the langchain flavor and univariate LangChain
|
||||
> models in the pyfunc flavor.
|
||||
|
||||
See the [API documentation and examples](https://www.mlflow.org/docs/latest/python_api/mlflow.langchain) for more information.
|
||||
|
||||
## Completions Example
|
||||
|
||||
```python
|
||||
import mlflow
|
||||
from langchain.chains import LLMChain, PromptTemplate
|
||||
from langchain.llms import Mlflow
|
||||
|
||||
llm = Mlflow(
|
||||
target_uri="http://127.0.0.1:5000",
|
||||
endpoint="completions",
|
||||
)
|
||||
|
||||
llm_chain = LLMChain(
|
||||
llm=Mlflow,
|
||||
prompt=PromptTemplate(
|
||||
input_variables=["adjective"],
|
||||
template="Tell me a {adjective} joke",
|
||||
),
|
||||
)
|
||||
result = llm_chain.run(adjective="funny")
|
||||
print(result)
|
||||
|
||||
with mlflow.start_run():
|
||||
model_info = mlflow.langchain.log_model(chain, "model")
|
||||
|
||||
model = mlflow.pyfunc.load_model(model_info.model_uri)
|
||||
print(model.predict([{"adjective": "funny"}]))
|
||||
```
|
||||
|
||||
## Embeddings Example
|
||||
|
||||
```python
|
||||
from langchain.embeddings import MlflowEmbeddings
|
||||
|
||||
embeddings = MlflowEmbeddings(
|
||||
target_uri="http://127.0.0.1:5000",
|
||||
endpoint="embeddings",
|
||||
)
|
||||
|
||||
print(embeddings.embed_query("hello"))
|
||||
print(embeddings.embed_documents(["hello"]))
|
||||
```
|
||||
|
||||
## Chat Example
|
||||
|
||||
```python
|
||||
from langchain.chat_models import ChatMlflow
|
||||
from langchain.schema import HumanMessage, SystemMessage
|
||||
|
||||
chat = ChatMlflow(
|
||||
target_uri="http://127.0.0.1:5000",
|
||||
endpoint="chat",
|
||||
)
|
||||
|
||||
messages = [
|
||||
SystemMessage(
|
||||
content="You are a helpful assistant that translates English to French."
|
||||
),
|
||||
HumanMessage(
|
||||
content="Translate this sentence from English to French: I love programming."
|
||||
),
|
||||
]
|
||||
print(chat(messages))
|
||||
```
|
||||
@@ -1,11 +1,5 @@
|
||||
# MLflow AI Gateway
|
||||
|
||||
:::warning
|
||||
|
||||
MLflow AI Gateway has been deprecated. Please use [MLflow Deployments for LLMs](./mlflow) instead.
|
||||
|
||||
:::
|
||||
|
||||
>[The MLflow AI Gateway](https://www.mlflow.org/docs/latest/gateway/index) service is a powerful tool designed to streamline the usage and management of various large
|
||||
> language model (LLM) providers, such as OpenAI and Anthropic, within an organization. It offers a high-level interface
|
||||
> that simplifies the interaction with these services by providing a unified endpoint to handle specific LLM related requests.
|
||||
|
||||
@@ -1,22 +0,0 @@
|
||||
# Outline
|
||||
|
||||
> [Outline](https://www.getoutline.com/) is an open-source collaborative knowledge base platform designed for team information sharing.
|
||||
|
||||
## Setup
|
||||
|
||||
You first need to [create an api key](https://www.getoutline.com/developers#section/Authentication) for your Outline instance. Then you need to set the following environment variables:
|
||||
|
||||
```python
|
||||
import os
|
||||
|
||||
os.environ["OUTLINE_API_KEY"] = "xxx"
|
||||
os.environ["OUTLINE_INSTANCE_URL"] = "https://app.getoutline.com"
|
||||
```
|
||||
|
||||
## Retriever
|
||||
|
||||
See a [usage example](/docs/integrations/retrievers/outline).
|
||||
|
||||
```python
|
||||
from langchain.retrievers import OutlineRetriever
|
||||
```
|
||||
@@ -1,36 +0,0 @@
|
||||
# Stack Exchange
|
||||
|
||||
>[Stack Exchange](https://en.wikipedia.org/wiki/Stack_Exchange) is a network of
|
||||
question-and-answer (Q&A) websites on topics in diverse fields, each site covering
|
||||
a specific topic, where questions, answers, and users are subject to a reputation award process.
|
||||
|
||||
This page covers how to use the `Stack Exchange API` within LangChain.
|
||||
|
||||
## Installation and Setup
|
||||
- Install requirements with
|
||||
```bash
|
||||
pip install stackapi
|
||||
```
|
||||
|
||||
## Wrappers
|
||||
|
||||
### Utility
|
||||
|
||||
There exists a StackExchangeAPIWrapper utility which wraps this API. To import this utility:
|
||||
|
||||
```python
|
||||
from langchain.utilities import StackExchangeAPIWrapper
|
||||
```
|
||||
|
||||
For a more detailed walkthrough of this wrapper, see [this notebook](/docs/integrations/tools/stackexchange).
|
||||
|
||||
### Tool
|
||||
|
||||
You can also easily load this wrapper as a Tool (to use with an Agent).
|
||||
You can do this with:
|
||||
```python
|
||||
from langchain.agents import load_tools
|
||||
tools = load_tools(["stackexchange"])
|
||||
```
|
||||
|
||||
For more information on tools, see [this page](/docs/modules/agents/tools/).
|
||||
@@ -7,9 +7,9 @@
|
||||
"source": [
|
||||
"# Amazon Kendra\n",
|
||||
"\n",
|
||||
"> [Amazon Kendra](https://docs.aws.amazon.com/kendra/latest/dg/what-is-kendra.html) is an intelligent search service provided by `Amazon Web Services` (`AWS`). It utilizes advanced natural language processing (NLP) and machine learning algorithms to enable powerful search capabilities across various data sources within an organization. `Kendra` is designed to help users find the information they need quickly and accurately, improving productivity and decision-making.\n",
|
||||
"> Amazon Kendra is an intelligent search service provided by Amazon Web Services (AWS). It utilizes advanced natural language processing (NLP) and machine learning algorithms to enable powerful search capabilities across various data sources within an organization. Kendra is designed to help users find the information they need quickly and accurately, improving productivity and decision-making.\n",
|
||||
"\n",
|
||||
"> With `Kendra`, users can search across a wide range of content types, including documents, FAQs, knowledge bases, manuals, and websites. It supports multiple languages and can understand complex queries, synonyms, and contextual meanings to provide highly relevant search results."
|
||||
"> With Kendra, users can search across a wide range of content types, including documents, FAQs, knowledge bases, manuals, and websites. It supports multiple languages and can understand complex queries, synonyms, and contextual meanings to provide highly relevant search results."
|
||||
]
|
||||
},
|
||||
{
|
||||
@@ -74,24 +74,11 @@
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
"kernelspec": {
|
||||
"display_name": "Python 3 (ipykernel)",
|
||||
"language": "python",
|
||||
"name": "python3"
|
||||
},
|
||||
"language_info": {
|
||||
"codemirror_mode": {
|
||||
"name": "ipython",
|
||||
"version": 3
|
||||
},
|
||||
"file_extension": ".py",
|
||||
"mimetype": "text/x-python",
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.10.12"
|
||||
}
|
||||
"name": "python"
|
||||
},
|
||||
"orig_nbformat": 4
|
||||
},
|
||||
"nbformat": 4,
|
||||
"nbformat_minor": 4
|
||||
"nbformat_minor": 2
|
||||
}
|
||||
|
||||
@@ -1,116 +0,0 @@
|
||||
{
|
||||
"cells": [
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "b6636c27-35da-4ba7-8313-eca21660cab3",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# Bedrock (Knowledge Bases)\n",
|
||||
"\n",
|
||||
"> [Knowledge bases for Amazon Bedrock](https://aws.amazon.com/bedrock/knowledge-bases/) is an Amazon Web Services (AWS) offering which lets you quickly build RAG applications by using your private data to customize FM response.\n",
|
||||
"\n",
|
||||
"> Implementing `RAG` requires organizations to perform several cumbersome steps to convert data into embeddings (vectors), store the embeddings in a specialized vector database, and build custom integrations into the database to search and retrieve text relevant to the user’s query. This can be time-consuming and inefficient.\n",
|
||||
"\n",
|
||||
"> With `Knowledge Bases for Amazon Bedrock`, simply point to the location of your data in `Amazon S3`, and `Knowledge Bases for Amazon Bedrock` takes care of the entire ingestion workflow into your vector database. If you do not have an existing vector database, Amazon Bedrock creates an Amazon OpenSearch Serverless vector store for you. For retrievals, use the Langchain - Amazon Bedrock integration via the Retrieve API to retrieve relevant results for a user query from knowledge bases.\n",
|
||||
"\n",
|
||||
"> Knowledge base can be configured through [AWS Console](https://aws.amazon.com/console/) or by using [AWS SDKs](https://aws.amazon.com/developer/tools/)."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "b34c8cbe-c6e5-4398-adf1-4925204bcaed",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Using the Knowledge Bases Retriever"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "26c97d36-911c-4fe0-a478-546192728f30",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"%pip install boto3"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 1,
|
||||
"id": "30337664-8844-4dfe-97db-077abb51af68",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langchain.retrievers import AmazonKnowledgeBasesRetriever\n",
|
||||
"\n",
|
||||
"retriever = AmazonKnowledgeBasesRetriever(\n",
|
||||
" knowledge_base_id=\"PUIJP4EQUA\",\n",
|
||||
" retrieval_config={\"vectorSearchConfiguration\": {\"numberOfResults\": 4}},\n",
|
||||
")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "f9fefa50-f0fb-40e3-b4e4-67c5b232a090",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"query = \"What did the president say about Ketanji Brown?\"\n",
|
||||
"\n",
|
||||
"retriever.get_relevant_documents(query=query)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "7de9b61b-597b-4aba-95fb-49d11e84510e",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Using in a QA Chain"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "0fd71709-aaed-42b5-a990-e3067bfa7143",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from botocore.client import Config\n",
|
||||
"from langchain.chains import RetrievalQA\n",
|
||||
"from langchain.llms import Bedrock\n",
|
||||
"\n",
|
||||
"model_kwargs_claude = {\"temperature\": 0, \"top_k\": 10, \"max_tokens_to_sample\": 3000}\n",
|
||||
"\n",
|
||||
"llm = Bedrock(model_id=\"anthropic.claude-v2\", model_kwargs=model_kwargs_claude)\n",
|
||||
"\n",
|
||||
"qa = RetrievalQA.from_chain_type(\n",
|
||||
" llm=llm, retriever=retriever, return_source_documents=True\n",
|
||||
")\n",
|
||||
"\n",
|
||||
"qa(query)"
|
||||
]
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
"kernelspec": {
|
||||
"display_name": "Python 3 (ipykernel)",
|
||||
"language": "python",
|
||||
"name": "python3"
|
||||
},
|
||||
"language_info": {
|
||||
"codemirror_mode": {
|
||||
"name": "ipython",
|
||||
"version": 3
|
||||
},
|
||||
"file_extension": ".py",
|
||||
"mimetype": "text/x-python",
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.10.12"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
"nbformat_minor": 5
|
||||
}
|
||||
@@ -1,182 +0,0 @@
|
||||
{
|
||||
"cells": [
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# Outline\n",
|
||||
"\n",
|
||||
">[Outline](https://www.getoutline.com/) is an open-source collaborative knowledge base platform designed for team information sharing.\n",
|
||||
"\n",
|
||||
"This notebook shows how to retrieve documents from your Outline instance into the Document format that is used downstream."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Setup"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"You first need to [create an api key](https://www.getoutline.com/developers#section/Authentication) for your Outline instance. Then you need to set the following environment variables:"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 1,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"import os\n",
|
||||
"\n",
|
||||
"os.environ[\"OUTLINE_API_KEY\"] = \"xxx\"\n",
|
||||
"os.environ[\"OUTLINE_INSTANCE_URL\"] = \"https://app.getoutline.com\""
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"`OutlineRetriever` has these arguments:\n",
|
||||
"- optional `top_k_results`: default=3. Use it to limit number of documents retrieved.\n",
|
||||
"- optional `load_all_available_meta`: default=False. By default only the most important fields retrieved: `title`, `source` (the url of the document). If True, other fields also retrieved.\n",
|
||||
"- optional `doc_content_chars_max` default=4000. Use it to limit the number of characters for each document retrieved.\n",
|
||||
"\n",
|
||||
"`get_relevant_documents()` has one argument, `query`: free text which used to find documents in your Outline instance."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Examples"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Running retriever"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 2,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langchain.retrievers import OutlineRetriever"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 3,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"retriever = OutlineRetriever()"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 4,
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"[Document(page_content='This walkthrough demonstrates how to use an agent optimized for conversation. Other agents are often optimized for using tools to figure out the best response, which is not ideal in a conversational setting where you may want the agent to be able to chat with the user as well.\\n\\nIf we compare it to the standard ReAct agent, the main difference is the prompt. We want it to be much more conversational.\\n\\nfrom langchain.agents import AgentType, Tool, initialize_agent\\n\\nfrom langchain.llms import OpenAI\\n\\nfrom langchain.memory import ConversationBufferMemory\\n\\nfrom langchain.utilities import SerpAPIWrapper\\n\\nsearch = SerpAPIWrapper() tools = \\\\[ Tool( name=\"Current Search\", func=search.run, description=\"useful for when you need to answer questions about current events or the current state of the world\", ), \\\\]\\n\\n\\\\\\nllm = OpenAI(temperature=0)\\n\\nUsing LCEL\\n\\nWe will first show how to create this agent using LCEL\\n\\nfrom langchain import hub\\n\\nfrom langchain.agents.format_scratchpad import format_log_to_str\\n\\nfrom langchain.agents.output_parsers import ReActSingleInputOutputParser\\n\\nfrom langchain.tools.render import render_text_description\\n\\nprompt = hub.pull(\"hwchase17/react-chat\")\\n\\nprompt = prompt.partial( tools=render_text_description(tools), tool_names=\", \".join(\\\\[[t.name](http://t.name) for t in tools\\\\]), )\\n\\nllm_with_stop = llm.bind(stop=\\\\[\"\\\\nObservation\"\\\\])\\n\\nagent = ( { \"input\": lambda x: x\\\\[\"input\"\\\\], \"agent_scratchpad\": lambda x: format_log_to_str(x\\\\[\"intermediate_steps\"\\\\]), \"chat_history\": lambda x: x\\\\[\"chat_history\"\\\\], } | prompt | llm_with_stop | ReActSingleInputOutputParser() )\\n\\nfrom langchain.agents import AgentExecutor\\n\\nmemory = ConversationBufferMemory(memory_key=\"chat_history\") agent_executor = AgentExecutor(agent=agent, tools=tools, verbose=True, memory=memory)\\n\\nagent_executor.invoke({\"input\": \"hi, i am bob\"})\\\\[\"output\"\\\\]\\n\\n```\\n> Entering new AgentExecutor chain...\\n\\nThought: Do I need to use a tool? No\\nFinal Answer: Hi Bob, nice to meet you! How can I help you today?\\n\\n> Finished chain.\\n```\\n\\n\\\\\\n\\'Hi Bob, nice to meet you! How can I help you today?\\'\\n\\nagent_executor.invoke({\"input\": \"whats my name?\"})\\\\[\"output\"\\\\]\\n\\n```\\n> Entering new AgentExecutor chain...\\n\\nThought: Do I need to use a tool? No\\nFinal Answer: Your name is Bob.\\n\\n> Finished chain.\\n```\\n\\n\\\\\\n\\'Your name is Bob.\\'\\n\\nagent_executor.invoke({\"input\": \"what are some movies showing 9/21/2023?\"})\\\\[\"output\"\\\\]\\n\\n```\\n> Entering new AgentExecutor chain...\\n\\nThought: Do I need to use a tool? Yes\\nAction: Current Search\\nAction Input: Movies showing 9/21/2023[\\'September 2023 Movies: The Creator • Dumb Money • Expend4bles • The Kill Room • The Inventor • The Equalizer 3 • PAW Patrol: The Mighty Movie, ...\\'] Do I need to use a tool? No\\nFinal Answer: According to current search, some movies showing on 9/21/2023 are The Creator, Dumb Money, Expend4bles, The Kill Room, The Inventor, The Equalizer 3, and PAW Patrol: The Mighty Movie.\\n\\n> Finished chain.\\n```\\n\\n\\\\\\n\\'According to current search, some movies showing on 9/21/2023 are The Creator, Dumb Money, Expend4bles, The Kill Room, The Inventor, The Equalizer 3, and PAW Patrol: The Mighty Movie.\\'\\n\\n\\\\\\nUse the off-the-shelf agent\\n\\nWe can also create this agent using the off-the-shelf agent class\\n\\nagent_executor = initialize_agent( tools, llm, agent=AgentType.CONVERSATIONAL_REACT_DESCRIPTION, verbose=True, memory=memory, )\\n\\nUse a chat model\\n\\nWe can also use a chat model here. The main difference here is in the prompts used.\\n\\nfrom langchain import hub\\n\\nfrom langchain.chat_models import ChatOpenAI\\n\\nprompt = hub.pull(\"hwchase17/react-chat-json\") chat_model = ChatOpenAI(temperature=0, model=\"gpt-4\")\\n\\nprompt = prompt.partial( tools=render_text_description(tools), tool_names=\", \".join(\\\\[[t.name](http://t.name) for t in tools\\\\]), )\\n\\nchat_model_with_stop = chat_model.bind(stop=\\\\[\"\\\\nObservation\"\\\\])\\n\\nfrom langchain.agents.format_scratchpad import format_log_to_messages\\n\\nfrom langchain.agents.output_parsers import JSONAgentOutputParser\\n\\n# We need some extra steering, or the c', metadata={'title': 'Conversational', 'source': 'https://d01.getoutline.com/doc/conversational-B5dBkUgQ4b'}),\n",
|
||||
" Document(page_content='Quickstart\\n\\nIn this quickstart we\\'ll show you how to:\\n\\nGet setup with LangChain, LangSmith and LangServe\\n\\nUse the most basic and common components of LangChain: prompt templates, models, and output parsers\\n\\nUse LangChain Expression Language, the protocol that LangChain is built on and which facilitates component chaining\\n\\nBuild a simple application with LangChain\\n\\nTrace your application with LangSmith\\n\\nServe your application with LangServe\\n\\nThat\\'s a fair amount to cover! Let\\'s dive in.\\n\\nSetup\\n\\nInstallation\\n\\nTo install LangChain run:\\n\\nPip\\n\\nConda\\n\\npip install langchain\\n\\nFor more details, see our Installation guide.\\n\\nEnvironment\\n\\nUsing LangChain will usually require integrations with one or more model providers, data stores, APIs, etc. For this example, we\\'ll use OpenAI\\'s model APIs.\\n\\nFirst we\\'ll need to install their Python package:\\n\\npip install openai\\n\\nAccessing the API requires an API key, which you can get by creating an account and heading here. Once we have a key we\\'ll want to set it as an environment variable by running:\\n\\nexport OPENAI_API_KEY=\"...\"\\n\\nIf you\\'d prefer not to set an environment variable you can pass the key in directly via the openai_api_key named parameter when initiating the OpenAI LLM class:\\n\\nfrom langchain.chat_models import ChatOpenAI\\n\\nllm = ChatOpenAI(openai_api_key=\"...\")\\n\\nLangSmith\\n\\nMany of the applications you build with LangChain will contain multiple steps with multiple invocations of LLM calls. As these applications get more and more complex, it becomes crucial to be able to inspect what exactly is going on inside your chain or agent. The best way to do this is with LangSmith.\\n\\nNote that LangSmith is not needed, but it is helpful. If you do want to use LangSmith, after you sign up at the link above, make sure to set your environment variables to start logging traces:\\n\\nexport LANGCHAIN_TRACING_V2=\"true\" export LANGCHAIN_API_KEY=...\\n\\nLangServe\\n\\nLangServe helps developers deploy LangChain chains as a REST API. You do not need to use LangServe to use LangChain, but in this guide we\\'ll show how you can deploy your app with LangServe.\\n\\nInstall with:\\n\\npip install \"langserve\\\\[all\\\\]\"\\n\\nBuilding with LangChain\\n\\nLangChain provides many modules that can be used to build language model applications. Modules can be used as standalones in simple applications and they can be composed for more complex use cases. Composition is powered by LangChain Expression Language (LCEL), which defines a unified Runnable interface that many modules implement, making it possible to seamlessly chain components.\\n\\nThe simplest and most common chain contains three things:\\n\\nLLM/Chat Model: The language model is the core reasoning engine here. In order to work with LangChain, you need to understand the different types of language models and how to work with them. Prompt Template: This provides instructions to the language model. This controls what the language model outputs, so understanding how to construct prompts and different prompting strategies is crucial. Output Parser: These translate the raw response from the language model to a more workable format, making it easy to use the output downstream. In this guide we\\'ll cover those three components individually, and then go over how to combine them. Understanding these concepts will set you up well for being able to use and customize LangChain applications. Most LangChain applications allow you to configure the model and/or the prompt, so knowing how to take advantage of this will be a big enabler.\\n\\nLLM / Chat Model\\n\\nThere are two types of language models:\\n\\nLLM: underlying model takes a string as input and returns a string\\n\\nChatModel: underlying model takes a list of messages as input and returns a message\\n\\nStrings are simple, but what exactly are messages? The base message interface is defined by BaseMessage, which has two required attributes:\\n\\ncontent: The content of the message. Usually a string. role: The entity from which the BaseMessage is coming. LangChain provides several ob', metadata={'title': 'Quick Start', 'source': 'https://d01.getoutline.com/doc/quick-start-jGuGGGOTuL'}),\n",
|
||||
" Document(page_content='This walkthrough showcases using an agent to implement the [ReAct](https://react-lm.github.io/) logic.\\n\\n```javascript\\nfrom langchain.agents import AgentType, initialize_agent, load_tools\\nfrom langchain.llms import OpenAI\\n```\\n\\nFirst, let\\'s load the language model we\\'re going to use to control the agent.\\n\\n```javascript\\nllm = OpenAI(temperature=0)\\n```\\n\\nNext, let\\'s load some tools to use. Note that the llm-math tool uses an LLM, so we need to pass that in.\\n\\n```javascript\\ntools = load_tools([\"serpapi\", \"llm-math\"], llm=llm)\\n```\\n\\n## Using LCEL[\\u200b](https://python.langchain.com/docs/modules/agents/agent_types/react#using-lcel \"Direct link to Using LCEL\")\\n\\nWe will first show how to create the agent using LCEL\\n\\n```javascript\\nfrom langchain import hub\\nfrom langchain.agents.format_scratchpad import format_log_to_str\\nfrom langchain.agents.output_parsers import ReActSingleInputOutputParser\\nfrom langchain.tools.render import render_text_description\\n```\\n\\n```javascript\\nprompt = hub.pull(\"hwchase17/react\")\\nprompt = prompt.partial(\\n tools=render_text_description(tools),\\n tool_names=\", \".join([t.name for t in tools]),\\n)\\n```\\n\\n```javascript\\nllm_with_stop = llm.bind(stop=[\"\\\\nObservation\"])\\n```\\n\\n```javascript\\nagent = (\\n {\\n \"input\": lambda x: x[\"input\"],\\n \"agent_scratchpad\": lambda x: format_log_to_str(x[\"intermediate_steps\"]),\\n }\\n | prompt\\n | llm_with_stop\\n | ReActSingleInputOutputParser()\\n)\\n```\\n\\n```javascript\\nfrom langchain.agents import AgentExecutor\\n```\\n\\n```javascript\\nagent_executor = AgentExecutor(agent=agent, tools=tools, verbose=True)\\n```\\n\\n```javascript\\nagent_executor.invoke(\\n {\\n \"input\": \"Who is Leo DiCaprio\\'s girlfriend? What is her current age raised to the 0.43 power?\"\\n }\\n)\\n```\\n\\n```javascript\\n \\n \\n > Entering new AgentExecutor chain...\\n I need to find out who Leo DiCaprio\\'s girlfriend is and then calculate her age raised to the 0.43 power.\\n Action: Search\\n Action Input: \"Leo DiCaprio girlfriend\"model Vittoria Ceretti I need to find out Vittoria Ceretti\\'s age\\n Action: Search\\n Action Input: \"Vittoria Ceretti age\"25 years I need to calculate 25 raised to the 0.43 power\\n Action: Calculator\\n Action Input: 25^0.43Answer: 3.991298452658078 I now know the final answer\\n Final Answer: Leo DiCaprio\\'s girlfriend is Vittoria Ceretti and her current age raised to the 0.43 power is 3.991298452658078.\\n \\n > Finished chain.\\n\\n\\n\\n\\n\\n {\\'input\\': \"Who is Leo DiCaprio\\'s girlfriend? What is her current age raised to the 0.43 power?\",\\n \\'output\\': \"Leo DiCaprio\\'s girlfriend is Vittoria Ceretti and her current age raised to the 0.43 power is 3.991298452658078.\"}\\n```\\n\\n## Using ZeroShotReactAgent[\\u200b](https://python.langchain.com/docs/modules/agents/agent_types/react#using-zeroshotreactagent \"Direct link to Using ZeroShotReactAgent\")\\n\\nWe will now show how to use the agent with an off-the-shelf agent implementation\\n\\n```javascript\\nagent_executor = initialize_agent(\\n tools, llm, agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION, verbose=True\\n)\\n```\\n\\n```javascript\\nagent_executor.invoke(\\n {\\n \"input\": \"Who is Leo DiCaprio\\'s girlfriend? What is her current age raised to the 0.43 power?\"\\n }\\n)\\n```\\n\\n```javascript\\n \\n \\n > Entering new AgentExecutor chain...\\n I need to find out who Leo DiCaprio\\'s girlfriend is and then calculate her age raised to the 0.43 power.\\n Action: Search\\n Action Input: \"Leo DiCaprio girlfriend\"\\n Observation: model Vittoria Ceretti\\n Thought: I need to find out Vittoria Ceretti\\'s age\\n Action: Search\\n Action Input: \"Vittoria Ceretti age\"\\n Observation: 25 years\\n Thought: I need to calculate 25 raised to the 0.43 power\\n Action: Calculator\\n Action Input: 25^0.43\\n Observation: Answer: 3.991298452658078\\n Thought: I now know the final answer\\n Final Answer: Leo DiCaprio\\'s girlfriend is Vittoria Ceretti and her current age raised to the 0.43 power is 3.991298452658078.\\n \\n > Finished chain.\\n\\n\\n\\n\\n\\n {\\'input\\': \"Who is L', metadata={'title': 'ReAct', 'source': 'https://d01.getoutline.com/doc/react-d6rxRS1MHk'})]"
|
||||
]
|
||||
},
|
||||
"execution_count": 4,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"retriever.get_relevant_documents(query=\"LangChain\", doc_content_chars_max=100)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Answering Questions on Outline Documents"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 5,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"import os\n",
|
||||
"from getpass import getpass\n",
|
||||
"\n",
|
||||
"os.environ[\"OPENAI_API_KEY\"] = getpass(\"OpenAI API Key:\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 6,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langchain.chains import ConversationalRetrievalChain\n",
|
||||
"from langchain.chat_models import ChatOpenAI\n",
|
||||
"\n",
|
||||
"model = ChatOpenAI(model_name=\"gpt-3.5-turbo\")\n",
|
||||
"qa = ConversationalRetrievalChain.from_llm(model, retriever=retriever)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 7,
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"{'question': 'what is langchain?',\n",
|
||||
" 'chat_history': {},\n",
|
||||
" 'answer': \"LangChain is a framework for developing applications powered by language models. It provides a set of libraries and tools that enable developers to build context-aware and reasoning-based applications. LangChain allows you to connect language models to various sources of context, such as prompt instructions, few-shot examples, and content, to enhance the model's responses. It also supports the composition of multiple language model components using LangChain Expression Language (LCEL). Additionally, LangChain offers off-the-shelf chains, templates, and integrations for easy application development. LangChain can be used in conjunction with LangSmith for debugging and monitoring chains, and with LangServe for deploying applications as a REST API.\"}"
|
||||
]
|
||||
},
|
||||
"execution_count": 7,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"qa({\"question\": \"what is langchain?\", \"chat_history\": {}})"
|
||||
]
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
"kernelspec": {
|
||||
"display_name": "venv",
|
||||
"language": "python",
|
||||
"name": "python3"
|
||||
},
|
||||
"language_info": {
|
||||
"codemirror_mode": {
|
||||
"name": "ipython",
|
||||
"version": 3
|
||||
},
|
||||
"file_extension": ".py",
|
||||
"mimetype": "text/x-python",
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.9.4"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
"nbformat_minor": 2
|
||||
}
|
||||
@@ -1,321 +0,0 @@
|
||||
{
|
||||
"cells": [
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# MongoDB Atlas\n",
|
||||
"\n",
|
||||
"[MongoDB Atlas](https://www.mongodb.com/) is a document database that can be \n",
|
||||
"used as a vector databse.\n",
|
||||
"\n",
|
||||
"In the walkthrough, we'll demo the `SelfQueryRetriever` with a `MongoDB Atlas` vector store."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Creating a MongoDB Atlas vectorstore\n",
|
||||
"First we'll want to create a MongoDB Atlas VectorStore and seed it with some data. We've created a small demo set of documents that contain summaries of movies.\n",
|
||||
"\n",
|
||||
"NOTE: The self-query retriever requires you to have `lark` installed (`pip install lark`). We also need the `pymongo` package."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 5,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"#!pip install lark pymongo"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"We want to use `OpenAIEmbeddings` so we have to get the OpenAI API Key."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"import os\n",
|
||||
"\n",
|
||||
"OPENAI_API_KEY = \"Use your OpenAI key\"\n",
|
||||
"\n",
|
||||
"os.environ[\"OPENAI_API_KEY\"] = OPENAI_API_KEY"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langchain.embeddings.openai import OpenAIEmbeddings\n",
|
||||
"from langchain.schema import Document\n",
|
||||
"from langchain.vectorstores import MongoDBAtlasVectorSearch\n",
|
||||
"from pymongo import MongoClient\n",
|
||||
"\n",
|
||||
"CONNECTION_STRING = \"Use your MongoDB Atlas connection string\"\n",
|
||||
"DB_NAME = \"Name of your MongoDB Atlas database\"\n",
|
||||
"COLLECTION_NAME = \"Name of your collection in the database\"\n",
|
||||
"INDEX_NAME = \"Name of a search index defined on the collection\"\n",
|
||||
"\n",
|
||||
"MongoClient = MongoClient(CONNECTION_STRING)\n",
|
||||
"collection = MongoClient[DB_NAME][COLLECTION_NAME]\n",
|
||||
"\n",
|
||||
"embeddings = OpenAIEmbeddings()"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"docs = [\n",
|
||||
" Document(\n",
|
||||
" page_content=\"A bunch of scientists bring back dinosaurs and mayhem breaks loose\",\n",
|
||||
" metadata={\"year\": 1993, \"rating\": 7.7, \"genre\": \"action\"},\n",
|
||||
" ),\n",
|
||||
" Document(\n",
|
||||
" page_content=\"Leo DiCaprio gets lost in a dream within a dream within a dream within a ...\",\n",
|
||||
" metadata={\"year\": 2010, \"genre\": \"thriller\", \"rating\": 8.2},\n",
|
||||
" ),\n",
|
||||
" Document(\n",
|
||||
" page_content=\"A bunch of normal-sized women are supremely wholesome and some men pine after them\",\n",
|
||||
" metadata={\"year\": 2019, \"rating\": 8.3, \"genre\": \"drama\"},\n",
|
||||
" ),\n",
|
||||
" Document(\n",
|
||||
" page_content=\"Three men walk into the Zone, three men walk out of the Zone\",\n",
|
||||
" metadata={\"year\": 1979, \"rating\": 9.9, \"genre\": \"science fiction\"},\n",
|
||||
" ),\n",
|
||||
" Document(\n",
|
||||
" page_content=\"A psychologist / detective gets lost in a series of dreams within dreams within dreams and Inception reused the idea\",\n",
|
||||
" metadata={\"year\": 2006, \"genre\": \"thriller\", \"rating\": 9.0},\n",
|
||||
" ),\n",
|
||||
" Document(\n",
|
||||
" page_content=\"Toys come alive and have a blast doing so\",\n",
|
||||
" metadata={\"year\": 1995, \"genre\": \"animated\", \"rating\": 9.3},\n",
|
||||
" ),\n",
|
||||
"]\n",
|
||||
"\n",
|
||||
"vectorstore = MongoDBAtlasVectorSearch.from_documents(\n",
|
||||
" docs,\n",
|
||||
" embeddings,\n",
|
||||
" collection=collection,\n",
|
||||
" index_name=INDEX_NAME,\n",
|
||||
")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Now, let's create a vector search index on your cluster. In the below example, `embedding` is the name of the field that contains the embedding vector. Please refer to the [documentation](https://www.mongodb.com/docs/atlas/atlas-search/field-types/knn-vector) to get more details on how to define an Atlas Vector Search index.\n",
|
||||
"You can name the index `{COLLECTION_NAME}` and create the index on the namespace `{DB_NAME}.{COLLECTION_NAME}`. Finally, write the following definition in the JSON editor on MongoDB Atlas:\n",
|
||||
"\n",
|
||||
"```json\n",
|
||||
"{\n",
|
||||
" \"mappings\": {\n",
|
||||
" \"dynamic\": true,\n",
|
||||
" \"fields\": {\n",
|
||||
" \"embedding\": {\n",
|
||||
" \"dimensions\": 1536,\n",
|
||||
" \"similarity\": \"cosine\",\n",
|
||||
" \"type\": \"knnVector\"\n",
|
||||
" },\n",
|
||||
" \"genre\": {\n",
|
||||
" \"type\": \"token\"\n",
|
||||
" },\n",
|
||||
" \"ratings\": {\n",
|
||||
" \"type\": \"number\"\n",
|
||||
" },\n",
|
||||
" \"year\": {\n",
|
||||
" \"type\": \"number\"\n",
|
||||
" }\n",
|
||||
" }\n",
|
||||
" }\n",
|
||||
"}\n",
|
||||
"```"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Creating our self-querying retriever\n",
|
||||
"Now we can instantiate our retriever. To do this we'll need to provide some information upfront about the metadata fields that our documents support and a short description of the document contents."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langchain.chains.query_constructor.base import AttributeInfo\n",
|
||||
"from langchain.llms import OpenAI\n",
|
||||
"from langchain.retrievers.self_query.base import SelfQueryRetriever\n",
|
||||
"\n",
|
||||
"metadata_field_info = [\n",
|
||||
" AttributeInfo(\n",
|
||||
" name=\"genre\",\n",
|
||||
" description=\"The genre of the movie\",\n",
|
||||
" type=\"string\",\n",
|
||||
" ),\n",
|
||||
" AttributeInfo(\n",
|
||||
" name=\"year\",\n",
|
||||
" description=\"The year the movie was released\",\n",
|
||||
" type=\"integer\",\n",
|
||||
" ),\n",
|
||||
" AttributeInfo(\n",
|
||||
" name=\"rating\", description=\"A 1-10 rating for the movie\", type=\"float\"\n",
|
||||
" ),\n",
|
||||
"]\n",
|
||||
"document_content_description = \"Brief summary of a movie\""
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"llm = OpenAI(temperature=0)\n",
|
||||
"retriever = SelfQueryRetriever.from_llm(\n",
|
||||
" llm, vectorstore, document_content_description, metadata_field_info, verbose=True\n",
|
||||
")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Testing it out\n",
|
||||
"And now we can try actually using our retriever!"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# This example only specifies a relevant query\n",
|
||||
"retriever.get_relevant_documents(\"What are some movies about dinosaurs\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# This example specifies a filter\n",
|
||||
"retriever.get_relevant_documents(\"What are some highly rated movies (above 9)?\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# This example only specifies a query and a filter\n",
|
||||
"retriever.get_relevant_documents(\n",
|
||||
" \"I want to watch a movie about toys rated higher than 9\"\n",
|
||||
")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# This example specifies a composite filter\n",
|
||||
"retriever.get_relevant_documents(\n",
|
||||
" \"What's a highly rated (above or equal 9) thriller film?\"\n",
|
||||
")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# This example specifies a query and composite filter\n",
|
||||
"retriever.get_relevant_documents(\n",
|
||||
" \"What's a movie after 1990 but before 2005 that's all about dinosaurs, \\\n",
|
||||
" and preferably has a lot of action\"\n",
|
||||
")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Filter k\n",
|
||||
"\n",
|
||||
"We can also use the self query retriever to specify `k`: the number of documents to fetch.\n",
|
||||
"\n",
|
||||
"We can do this by passing `enable_limit=True` to the constructor."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"retriever = SelfQueryRetriever.from_llm(\n",
|
||||
" llm,\n",
|
||||
" vectorstore,\n",
|
||||
" document_content_description,\n",
|
||||
" metadata_field_info,\n",
|
||||
" verbose=True,\n",
|
||||
" enable_limit=True,\n",
|
||||
")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# This example only specifies a relevant query\n",
|
||||
"retriever.get_relevant_documents(\"What are two movies about dinosaurs?\")"
|
||||
]
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
"kernelspec": {
|
||||
"display_name": ".venv",
|
||||
"language": "python",
|
||||
"name": "python3"
|
||||
},
|
||||
"language_info": {
|
||||
"codemirror_mode": {
|
||||
"name": "ipython",
|
||||
"version": 3
|
||||
},
|
||||
"file_extension": ".py",
|
||||
"mimetype": "text/x-python",
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.11.5"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
"nbformat_minor": 2
|
||||
}
|
||||
@@ -7,16 +7,7 @@
|
||||
"source": [
|
||||
"# Bedrock\n",
|
||||
"\n",
|
||||
">[Amazon Bedrock](https://aws.amazon.com/bedrock/) is a fully managed service that offers a choice of \n",
|
||||
"> high-performing foundation models (FMs) from leading AI companies like `AI21 Labs`, `Anthropic`, `Cohere`, \n",
|
||||
"> `Meta`, `Stability AI`, and `Amazon` via a single API, along with a broad set of capabilities you need to \n",
|
||||
"> build generative AI applications with security, privacy, and responsible AI. Using `Amazon Bedrock`, \n",
|
||||
"> you can easily experiment with and evaluate top FMs for your use case, privately customize them with \n",
|
||||
"> your data using techniques such as fine-tuning and `Retrieval Augmented Generation` (`RAG`), and build \n",
|
||||
"> agents that execute tasks using your enterprise systems and data sources. Since `Amazon Bedrock` is \n",
|
||||
"> serverless, you don't have to manage any infrastructure, and you can securely integrate and deploy \n",
|
||||
"> generative AI capabilities into your applications using the AWS services you are already familiar with.\n",
|
||||
"\n"
|
||||
">[Amazon Bedrock](https://aws.amazon.com/bedrock/) is a fully managed service that makes FMs from leading AI startups and Amazon available via an API, so you can choose from a wide range of FMs to find the model that is best suited for your use case.\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
|
||||
File diff suppressed because one or more lines are too long
@@ -1,191 +0,0 @@
|
||||
{
|
||||
"cells": [
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# Infinity\n",
|
||||
"\n",
|
||||
"`Infinity` allows to create `Embeddings` using a MIT-licensed Embedding Server. \n",
|
||||
"\n",
|
||||
"This notebook goes over how to use Langchain with Embeddings with the [Infinity Github Project](https://github.com/michaelfeil/infinity).\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Imports"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 1,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langchain.embeddings import InfinityEmbeddings"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Optional: Make sure to start the Infinity instance\n",
|
||||
"\n",
|
||||
"To install infinity use the following command. For further details check out the [Docs on Github](https://github.com/michaelfeil/infinity).\n",
|
||||
"```bash\n",
|
||||
"pip install infinity_emb[all]\n",
|
||||
"```"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 6,
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"Requirement already satisfied: infinity_emb[cli] in /home/michi/langchain/.venv/lib/python3.10/site-packages (0.0.8)\n",
|
||||
"\u001b[33mWARNING: infinity-emb 0.0.8 does not provide the extra 'cli'\u001b[0m\u001b[33m\n",
|
||||
"\u001b[0mRequirement already satisfied: numpy>=1.20.0 in /home/michi/langchain/.venv/lib/python3.10/site-packages (from infinity_emb[cli]) (1.24.4)\n",
|
||||
"\u001b[33mWARNING: There was an error checking the latest version of pip.\u001b[0m\u001b[33m\n",
|
||||
"\u001b[0m"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"# Install the infinity package\n",
|
||||
"!pip install infinity_emb[cli,torch]"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Start up the server - best to be done from a separate terminal, not inside Jupyter Notebook\n",
|
||||
"\n",
|
||||
"```bash\n",
|
||||
"model=sentence-transformers/all-MiniLM-L6-v2\n",
|
||||
"port=7797\n",
|
||||
"infinity_emb --port $port --model-name-or-path $model\n",
|
||||
"```\n",
|
||||
"\n",
|
||||
"or alternativley just use docker:\n",
|
||||
"```bash\n",
|
||||
"model=sentence-transformers/all-MiniLM-L6-v2\n",
|
||||
"port=7797\n",
|
||||
"docker run -it --gpus all -p $port:$port michaelf34/infinity:latest --model-name-or-path $model --port $port\n",
|
||||
"```"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Embed your documents using your Infinity instance "
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 2,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"documents = [\n",
|
||||
" \"Baguette is a dish.\",\n",
|
||||
" \"Paris is the capital of France.\",\n",
|
||||
" \"numpy is a lib for linear algebra\",\n",
|
||||
" \"You escaped what I've escaped - You'd be in Paris getting fucked up too\",\n",
|
||||
"]\n",
|
||||
"query = \"Where is Paris?\""
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 4,
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"embeddings created successful\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"#\n",
|
||||
"infinity_api_url = \"http://localhost:7797/v1\"\n",
|
||||
"# model is currently not validated.\n",
|
||||
"embeddings = InfinityEmbeddings(\n",
|
||||
" model=\"sentence-transformers/all-MiniLM-L6-v2\", infinity_api_url=infinity_api_url\n",
|
||||
")\n",
|
||||
"try:\n",
|
||||
" documents_embedded = embeddings.embed_documents(documents)\n",
|
||||
" query_result = embeddings.embed_query(query)\n",
|
||||
" print(\"embeddings created successful\")\n",
|
||||
"except Exception as ex:\n",
|
||||
" print(\n",
|
||||
" \"Make sure the infinity instance is running. Verify by clicking on \"\n",
|
||||
" f\"{infinity_api_url.replace('v1','docs')} Exception: {ex}. \"\n",
|
||||
" )"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 5,
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"{'Baguette is a dish.': 0.31344215908661155,\n",
|
||||
" 'Paris is the capital of France.': 0.8148670296896388,\n",
|
||||
" 'numpy is a lib for linear algebra': 0.004429399861302009,\n",
|
||||
" \"You escaped what I've escaped - You'd be in Paris getting fucked up too\": 0.5088476180154582}"
|
||||
]
|
||||
},
|
||||
"execution_count": 5,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"# (demo) compute similarity\n",
|
||||
"import numpy as np\n",
|
||||
"\n",
|
||||
"scores = np.array(documents_embedded) @ np.array(query_result).T\n",
|
||||
"dict(zip(documents, scores))"
|
||||
]
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
"kernelspec": {
|
||||
"display_name": "Python 3 (ipykernel)",
|
||||
"language": "python",
|
||||
"name": "python3"
|
||||
},
|
||||
"language_info": {
|
||||
"codemirror_mode": {
|
||||
"name": "ipython",
|
||||
"version": 3
|
||||
},
|
||||
"file_extension": ".py",
|
||||
"mimetype": "text/x-python",
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.10.12"
|
||||
},
|
||||
"vscode": {
|
||||
"interpreter": {
|
||||
"hash": "a0a0263b650d907a3bfe41c0f8d6a63a071b884df3cfdc1579f00cdc1aed6b03"
|
||||
}
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
"nbformat_minor": 4
|
||||
}
|
||||
@@ -6,15 +6,9 @@
|
||||
"source": [
|
||||
"# Office365\n",
|
||||
"\n",
|
||||
">[Microsoft 365](https://www.office.com/) is a product family of productivity software, collaboration and cloud-based services owned by `Microsoft`.\n",
|
||||
">\n",
|
||||
">Note: `Office 365` was rebranded as `Microsoft 365`.\n",
|
||||
"\n",
|
||||
"This notebook walks through connecting LangChain to `Office365` email and calendar.\n",
|
||||
"\n",
|
||||
"To use this toolkit, you need to set up your credentials explained in the [Microsoft Graph authentication and authorization overview](https://learn.microsoft.com/en-us/graph/auth/). Once you've received a CLIENT_ID and CLIENT_SECRET, you can input them as environmental variables below.\n",
|
||||
"\n",
|
||||
"You can also use the [authentication instructions from here](https://o365.github.io/python-o365/latest/getting_started.html#oauth-setup-pre-requisite)."
|
||||
"To use this toolkit, you will need to set up your credentials explained in the [Microsoft Graph authentication and authorization overview](https://learn.microsoft.com/en-us/graph/auth/). Once you've received a CLIENT_ID and CLIENT_SECRET, you can input them as environmental variables below."
|
||||
]
|
||||
},
|
||||
{
|
||||
@@ -23,8 +17,8 @@
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"!pip install --upgrade O365\n",
|
||||
"!pip install beautifulsoup4 # This is optional but is useful for parsing HTML messages"
|
||||
"!pip install --upgrade O365 > /dev/null\n",
|
||||
"!pip install beautifulsoup4 > /dev/null # This is optional but is useful for parsing HTML messages"
|
||||
]
|
||||
},
|
||||
{
|
||||
@@ -33,7 +27,7 @@
|
||||
"source": [
|
||||
"## Assign Environmental Variables\n",
|
||||
"\n",
|
||||
"The toolkit will read the `CLIENT_ID` and `CLIENT_SECRET` environmental variables to authenticate the user so you need to set them here. You will also need to set your `OPENAI_API_KEY` to use the agent later."
|
||||
"The toolkit will read the CLIENT_ID and CLIENT_SECRET environmental variables to authenticate the user so you need to set them here. You will also need to set your OPENAI_API_KEY to use the agent later."
|
||||
]
|
||||
},
|
||||
{
|
||||
|
||||
@@ -93,9 +93,12 @@
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"!wget https://raw.githubusercontent.com/openai/openai-openapi/master/openapi.yaml -O openai_openapi.yaml\n",
|
||||
"!wget https://www.klarna.com/us/shopping/public/openai/v0/api-docs -O klarna_openapi.yaml\n",
|
||||
"!wget https://raw.githubusercontent.com/APIs-guru/openapi-directory/main/APIs/spotify.com/1.0.0/openapi.yaml -O spotify_openapi.yaml"
|
||||
"!wget https://raw.githubusercontent.com/openai/openai-openapi/master/openapi.yaml\n",
|
||||
"!mv openapi.yaml openai_openapi.yaml\n",
|
||||
"!wget https://www.klarna.com/us/shopping/public/openai/v0/api-docs\n",
|
||||
"!mv api-docs klarna_openapi.yaml\n",
|
||||
"!wget https://raw.githubusercontent.com/APIs-guru/openapi-directory/main/APIs/spotify.com/1.0.0/openapi.yaml\n",
|
||||
"!mv openapi.yaml spotify_openapi.yaml"
|
||||
]
|
||||
},
|
||||
{
|
||||
|
||||
@@ -1,147 +0,0 @@
|
||||
{
|
||||
"cells": [
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# Slack\n",
|
||||
"\n",
|
||||
"This notebook walks through connecting LangChain to your `Slack` account.\n",
|
||||
"\n",
|
||||
"To use this toolkit, you will need to get a token explained in the [Slack API docs](https://api.slack.com/tutorials/tracks/getting-a-token). Once you've received a SLACK_USER_TOKEN, you can input it as an environmental variable below."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"!pip install --upgrade slack_sdk > /dev/null\n",
|
||||
"!pip install beautifulsoup4 > /dev/null # This is optional but is useful for parsing HTML messages"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Assign Environmental Variables\n",
|
||||
"\n",
|
||||
"The toolkit will read the SLACK_USER_TOKEN environmental variable to authenticate the user so you need to set them here. You will also need to set your OPENAI_API_KEY to use the agent later."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# Set environmental variables here"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Create the Toolkit and Get Tools\n",
|
||||
"\n",
|
||||
"To start, you need to create the toolkit, so you can access its tools later."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langchain.agents.agent_toolkits import SlackToolkit\n",
|
||||
"\n",
|
||||
"toolkit = SlackToolkit()\n",
|
||||
"tools = toolkit.get_tools()\n",
|
||||
"tools"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Use within an Agent"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langchain.agents import AgentType, initialize_agent\n",
|
||||
"from langchain.llms import OpenAI"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"llm = OpenAI(temperature=0)\n",
|
||||
"agent = initialize_agent(\n",
|
||||
" tools=toolkit.get_tools(),\n",
|
||||
" llm=llm,\n",
|
||||
" verbose=False,\n",
|
||||
" agent=AgentType.STRUCTURED_CHAT_ZERO_SHOT_REACT_DESCRIPTION,\n",
|
||||
")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"agent.run(\"Send a greeting to my coworkers in the #general channel.\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"agent.run(\"How many channels are in the workspace? Please list out their names.\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"agent.run(\n",
|
||||
" \"Tell me the number of messages sent in the #introductions channel from the past month.\"\n",
|
||||
")"
|
||||
]
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
"kernelspec": {
|
||||
"display_name": "Python 3 (ipykernel)",
|
||||
"language": "python",
|
||||
"name": "python3"
|
||||
},
|
||||
"language_info": {
|
||||
"codemirror_mode": {
|
||||
"name": "ipython",
|
||||
"version": 3
|
||||
},
|
||||
"file_extension": ".py",
|
||||
"mimetype": "text/x-python",
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.11.6"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
"nbformat_minor": 4
|
||||
}
|
||||
@@ -1,105 +0,0 @@
|
||||
{
|
||||
"cells": [
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# Steam Game Recommendation & Game Details Tool"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"import os\n",
|
||||
"\n",
|
||||
"from langchain.agents import AgentType, initialize_agent\n",
|
||||
"from langchain.agents.agent_toolkits.steam.toolkit import SteamToolkit\n",
|
||||
"from langchain.llms import OpenAI\n",
|
||||
"from langchain.utilities.steam import SteamWebAPIWrapper"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 6,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"os.environ[\"STEAM_KEY\"] = \"xyz\"\n",
|
||||
"os.environ[\"STEAM_ID\"] = \"123\"\n",
|
||||
"os.environ[\"OPENAI_API_KEY\"] = \"abc\""
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 7,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"llm = OpenAI(temperature=0)\n",
|
||||
"Steam = SteamWebAPIWrapper()\n",
|
||||
"toolkit = SteamToolkit.from_steam_api_wrapper(Steam)\n",
|
||||
"agent = initialize_agent(\n",
|
||||
" toolkit.get_tools(), llm, agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION, verbose=True\n",
|
||||
")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 8,
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"\n",
|
||||
"\n",
|
||||
"\u001b[1m> Entering new AgentExecutor chain...\u001b[0m\n",
|
||||
"\u001b[32;1m\u001b[1;3m I need to find the game details\n",
|
||||
"Action: Get Games Details\n",
|
||||
"Action Input: Terraria\u001b[0m\n",
|
||||
"Observation: \u001b[36;1m\u001b[1;3mThe id is: 105600\n",
|
||||
"The link is: https://store.steampowered.com/app/105600/Terraria/?snr=1_7_15__13\n",
|
||||
"The price is: $9.99\n",
|
||||
"The summary of the game is: Dig, Fight, Explore, Build: The very world is at your fingertips as you fight for survival, fortune, and glory. Will you delve deep into cavernous expanses in search of treasure and raw materials with which to craft ever-evolving gear, machinery, and aesthetics? Perhaps you will choose instead to seek out ever-greater foes to test your mettle in combat? Maybe you will decide to construct your own city to house the host of mysterious allies you may encounter along your travels? In the World of Terraria, the choice is yours!Blending elements of classic action games with the freedom of sandbox-style creativity, Terraria is a unique gaming experience where both the journey and the destination are completely in the player’s control. The Terraria adventure is truly as unique as the players themselves! Are you up for the monumental task of exploring, creating, and defending a world of your own? Key features: Sandbox Play Randomly generated worlds Free Content Updates \n",
|
||||
"The supported languages of the game are: English, French, Italian, German, Spanish - Spain, Polish, Portuguese - Brazil, Russian, Simplified Chinese\n",
|
||||
"\u001b[0m\n",
|
||||
"Thought:\u001b[32;1m\u001b[1;3m I now know the final answer\n",
|
||||
"Final Answer: Terraria is a game with an id of 105600, a link of https://store.steampowered.com/app/105600/Terraria/?snr=1_7_15__13, a price of $9.99, a summary of \"Dig, Fight, Explore, Build: The very world is at your fingertips as you fight for survival, fortune, and glory. Will you delve deep into cavernous expanses in search of treasure and raw materials with which to craft ever-evolving gear, machinery, and aesthetics? Perhaps you will choose instead to seek out ever-greater foes to test your mettle in combat? Maybe you will decide to construct your own city to house the host of mysterious allies you may encounter along your travels? In the World of Terraria, the choice is yours!Blending elements of classic action games with the freedom of sandbox-style creativity, Terraria is a unique gaming experience where both the journey and the destination are completely in the player’s control. The Terraria adventure is truly as unique as the players themselves! Are you up for the monumental task of exploring, creating, and defending a\u001b[0m\n",
|
||||
"\n",
|
||||
"\u001b[1m> Finished chain.\u001b[0m\n",
|
||||
"{'input': 'can you give the information about the game Terraria', 'output': 'Terraria is a game with an id of 105600, a link of https://store.steampowered.com/app/105600/Terraria/?snr=1_7_15__13, a price of $9.99, a summary of \"Dig, Fight, Explore, Build: The very world is at your fingertips as you fight for survival, fortune, and glory. Will you delve deep into cavernous expanses in search of treasure and raw materials with which to craft ever-evolving gear, machinery, and aesthetics? Perhaps you will choose instead to seek out ever-greater foes to test your mettle in combat? Maybe you will decide to construct your own city to house the host of mysterious allies you may encounter along your travels? In the World of Terraria, the choice is yours!Blending elements of classic action games with the freedom of sandbox-style creativity, Terraria is a unique gaming experience where both the journey and the destination are completely in the player’s control. The Terraria adventure is truly as unique as the players themselves! Are you up for the monumental task of exploring, creating, and defending a'}\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"out = agent(\"can you give the information about the game Terraria\")\n",
|
||||
"print(out)"
|
||||
]
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
"kernelspec": {
|
||||
"display_name": "Python 3 (ipykernel)",
|
||||
"language": "python",
|
||||
"name": "python3"
|
||||
},
|
||||
"language_info": {
|
||||
"codemirror_mode": {
|
||||
"name": "ipython",
|
||||
"version": 3
|
||||
},
|
||||
"file_extension": ".py",
|
||||
"mimetype": "text/x-python",
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.10.1"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
"nbformat_minor": 2
|
||||
}
|
||||
@@ -11,11 +11,11 @@
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
">[`Amazon AWS Lambda`](https://aws.amazon.com/pm/lambda/) is a serverless computing service provided by `Amazon Web Services` (`AWS`). It helps developers to build and run applications and services without provisioning or managing servers. This serverless architecture enables you to focus on writing and deploying code, while AWS automatically takes care of scaling, patching, and managing the infrastructure required to run your applications.\n",
|
||||
">`Amazon AWS Lambda` is a serverless computing service provided by `Amazon Web Services` (`AWS`). It helps developers to build and run applications and services without provisioning or managing servers. This serverless architecture enables you to focus on writing and deploying code, while AWS automatically takes care of scaling, patching, and managing the infrastructure required to run your applications.\n",
|
||||
"\n",
|
||||
"This notebook goes over how to use the `AWS Lambda` Tool.\n",
|
||||
"\n",
|
||||
"By including the `AWS Lambda` in the list of tools provided to an Agent, you can grant your Agent the ability to invoke code running in your AWS Cloud for whatever purposes you need.\n",
|
||||
"By including a `awslambda` in the list of tools provided to an Agent, you can grant your Agent the ability to invoke code running in your AWS Cloud for whatever purposes you need.\n",
|
||||
"\n",
|
||||
"When an Agent uses the `AWS Lambda` tool, it will provide an argument of type string which will in turn be passed into the Lambda function via the event parameter.\n",
|
||||
"\n",
|
||||
|
||||
@@ -1,112 +0,0 @@
|
||||
{
|
||||
"cells": [
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# Google Finance\n",
|
||||
"\n",
|
||||
"This notebook goes over how to use the Google Finance Tool to get information from the Google Finance page\n",
|
||||
"\n",
|
||||
"To get an SerpApi key key, sign up at: https://serpapi.com/users/sign_up.\n",
|
||||
"\n",
|
||||
"Then install google-search-results with the command: \n",
|
||||
"\n",
|
||||
"pip install google-search-results\n",
|
||||
"\n",
|
||||
"Then set the environment variable SERPAPI_API_KEY to your SerpApi key\n",
|
||||
"\n",
|
||||
"Or pass the key in as a argument to the wrapper serp_api_key=\"your secret key\""
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Use the Tool"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"!pip install google-search-results"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 9,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"import os\n",
|
||||
"\n",
|
||||
"from langchain.tools.google_finance import GoogleFinanceQueryRun\n",
|
||||
"from langchain.utilities.google_finance import GoogleFinanceAPIWrapper\n",
|
||||
"\n",
|
||||
"os.environ[\"SERPAPI_API_KEY\"] = \"\"\n",
|
||||
"tool = GoogleFinanceQueryRun(api_wrapper=GoogleFinanceAPIWrapper())"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"tool.run(\"Google\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Using it with Langchain"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"import os\n",
|
||||
"\n",
|
||||
"from langchain.agents import AgentType, initialize_agent, load_tools\n",
|
||||
"from langchain.llms import OpenAI\n",
|
||||
"\n",
|
||||
"os.environ[\"OPENAI_API_KEY\"] = \"\"\n",
|
||||
"os.environ[\"SERP_API_KEY\"] = \"\"\n",
|
||||
"llm = OpenAI()\n",
|
||||
"tools = load_tools([\"google-scholar\", \"google-finance\"], llm=llm)\n",
|
||||
"agent = initialize_agent(\n",
|
||||
" tools, llm, agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION, verbose=True\n",
|
||||
")\n",
|
||||
"agent.run(\"what is google's stock\")"
|
||||
]
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
"kernelspec": {
|
||||
"display_name": "Python 3",
|
||||
"language": "python",
|
||||
"name": "python3"
|
||||
},
|
||||
"language_info": {
|
||||
"codemirror_mode": {
|
||||
"name": "ipython",
|
||||
"version": 3
|
||||
},
|
||||
"file_extension": ".py",
|
||||
"mimetype": "text/x-python",
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.9.5"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
"nbformat_minor": 2
|
||||
}
|
||||
File diff suppressed because one or more lines are too long
File diff suppressed because one or more lines are too long
@@ -1,109 +0,0 @@
|
||||
{
|
||||
"cells": [
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# Google Trends\n",
|
||||
"\n",
|
||||
"This notebook goes over how to use the Google Trends Tool to fetch trends information.\n",
|
||||
"\n",
|
||||
"First, you need to sign up for an `SerpApi key` key at: https://serpapi.com/users/sign_up.\n",
|
||||
"\n",
|
||||
"Then you must install `google-search-results` with the command:\n",
|
||||
"\n",
|
||||
"`pip install google-search-results`\n",
|
||||
"\n",
|
||||
"Then you will need to set the environment variable `SERPAPI_API_KEY` to your `SerpApi key`\n",
|
||||
"\n",
|
||||
"[Alternatively you can pass the key in as a argument to the wrapper `serp_api_key=\"your secret key\"`]\n",
|
||||
"\n",
|
||||
"## Use the Tool"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 1,
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"Requirement already satisfied: google-search-results in c:\\python311\\lib\\site-packages (2.4.2)\n",
|
||||
"Requirement already satisfied: requests in c:\\python311\\lib\\site-packages (from google-search-results) (2.31.0)\n",
|
||||
"Requirement already satisfied: charset-normalizer<4,>=2 in c:\\python311\\lib\\site-packages (from requests->google-search-results) (3.3.2)\n",
|
||||
"Requirement already satisfied: idna<4,>=2.5 in c:\\python311\\lib\\site-packages (from requests->google-search-results) (3.4)\n",
|
||||
"Requirement already satisfied: urllib3<3,>=1.21.1 in c:\\python311\\lib\\site-packages (from requests->google-search-results) (2.1.0)\n",
|
||||
"Requirement already satisfied: certifi>=2017.4.17 in c:\\python311\\lib\\site-packages (from requests->google-search-results) (2023.7.22)\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"!pip install google-search-results"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 2,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"import os\n",
|
||||
"\n",
|
||||
"from langchain.tools.google_trends import GoogleTrendsQueryRun\n",
|
||||
"from langchain.utilities.google_trends import GoogleTrendsAPIWrapper\n",
|
||||
"\n",
|
||||
"os.environ[\"SERPAPI_API_KEY\"] = \"\"\n",
|
||||
"tool = GoogleTrendsQueryRun(api_wrapper=GoogleTrendsAPIWrapper())"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 5,
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"'Query: Water\\nDate From: Nov 20, 2022\\nDate To: Nov 11, 2023\\nMin Value: 72\\nMax Value: 100\\nAverage Value: 84.25490196078431\\nPrecent Change: 5.555555555555555%\\nTrend values: 72, 72, 74, 77, 86, 80, 82, 88, 79, 79, 85, 82, 81, 84, 83, 77, 80, 85, 82, 80, 88, 84, 82, 84, 83, 85, 92, 92, 100, 92, 100, 96, 94, 95, 94, 98, 96, 84, 86, 84, 85, 83, 83, 76, 81, 85, 78, 77, 81, 75, 76\\nRising Related Queries: avatar way of water, avatar the way of water, owala water bottle, air up water bottle, lake mead water level\\nTop Related Queries: water park, water bottle, water heater, water filter, water tank, water bill, water world, avatar way of water, avatar the way of water, coconut water, deep water, water cycle, water dispenser, water purifier, water pollution, distilled water, hot water heater, water cooler, sparkling water, american water, micellar water, density of water, tankless water heater, tonic water, water jug'"
|
||||
]
|
||||
},
|
||||
"execution_count": 5,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"tool.run(\"Water\")"
|
||||
]
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
"kernelspec": {
|
||||
"display_name": "Python 3.9.16 ('langchain')",
|
||||
"language": "python",
|
||||
"name": "python3"
|
||||
},
|
||||
"language_info": {
|
||||
"codemirror_mode": {
|
||||
"name": "ipython",
|
||||
"version": 3
|
||||
},
|
||||
"file_extension": ".py",
|
||||
"mimetype": "text/x-python",
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.11.4"
|
||||
},
|
||||
"orig_nbformat": 4,
|
||||
"vscode": {
|
||||
"interpreter": {
|
||||
"hash": "15e58ce194949b77a891bd4339ce3d86a9bd138e905926019517993f97db9e6c"
|
||||
}
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
"nbformat_minor": 2
|
||||
}
|
||||
File diff suppressed because one or more lines are too long
@@ -1,262 +0,0 @@
|
||||
{
|
||||
"cells": [
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# Reddit Search \n",
|
||||
"\n",
|
||||
"In this notebook, we learn how the Reddit search tool works. \n",
|
||||
"First make sure that you have installed praw with the command below: "
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {
|
||||
"vscode": {
|
||||
"languageId": "shellscript"
|
||||
}
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"!pip install praw"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Then you need to set you need to set up the proper API keys and environment variables. You would need to create a Reddit user account and get credentials. So, create a Reddit user account by going to https://www.reddit.com and signing up. \n",
|
||||
"Then get your credentials by going to https://www.reddit.com/prefs/apps and creating an app. \n",
|
||||
"You should have your client_id and secret from creating the app. Now, you can paste those strings in client_id and client_secret variable. \n",
|
||||
"Note: You can put any string for user_agent "
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"client_id = \"\"\n",
|
||||
"client_secret = \"\"\n",
|
||||
"user_agent = \"\""
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langchain.tools.reddit_search.tool import RedditSearchRun\n",
|
||||
"from langchain.utilities.reddit_search import RedditSearchAPIWrapper\n",
|
||||
"\n",
|
||||
"search = RedditSearchRun(\n",
|
||||
" api_wrapper=RedditSearchAPIWrapper(\n",
|
||||
" reddit_client_id=client_id,\n",
|
||||
" reddit_client_secret=client_secret,\n",
|
||||
" reddit_user_agent=user_agent,\n",
|
||||
" )\n",
|
||||
")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"You can then set your queries for example, what subreddit you want to query, how many posts you want to be returned, how you would like the result to be sorted etc."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langchain.tools.reddit_search.tool import RedditSearchSchema\n",
|
||||
"\n",
|
||||
"search_params = RedditSearchSchema(\n",
|
||||
" query=\"beginner\", sort=\"new\", time_filter=\"week\", subreddit=\"python\", limit=\"2\"\n",
|
||||
")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Finally run the search and get your results"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"result = search.run(tool_input=search_params.dict())"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"print(result)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Here is an example of printing the result. \n",
|
||||
"Note: You may get different output depending on the newest post in the subreddit but the formatting should be similar."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"\n",
|
||||
"> Searching r/python found 2 posts:\n",
|
||||
"> Post Title: 'Setup Github Copilot in Visual Studio Code'\n",
|
||||
"> User: Feisty-Recording-715\n",
|
||||
"> Subreddit: r/Python:\n",
|
||||
"> Text body: 🛠️ This tutorial is perfect for beginners looking to strengthen their understanding of version control or for experienced developers seeking a quick reference for GitHub setup in Visual Studio Code.\n",
|
||||
">\n",
|
||||
">🎓 By the end of this video, you'll be equipped with the skills to confidently manage your codebase, collaborate with others, and contribute to open-source projects on GitHub.\n",
|
||||
">\n",
|
||||
">\n",
|
||||
">Video link: https://youtu.be/IdT1BhrSfdo?si=mV7xVpiyuhlD8Zrw\n",
|
||||
">\n",
|
||||
">Your feedback is welcome\n",
|
||||
"> Post URL: https://www.reddit.com/r/Python/comments/1823wr7/setup_github_copilot_in_visual_studio_code/\n",
|
||||
"> Post Category: N/A.\n",
|
||||
"> Score: 0\n",
|
||||
">\n",
|
||||
">Post Title: 'A Chinese Checkers game made with pygame and PySide6, with custom bots support'\n",
|
||||
">User: HenryChess\n",
|
||||
">Subreddit: r/Python:\n",
|
||||
"> Text body: GitHub link: https://github.com/henrychess/pygame-chinese-checkers\n",
|
||||
">\n",
|
||||
">I'm not sure if this counts as beginner or intermediate. I think I'm still in the beginner zone, so I flair it as beginner.\n",
|
||||
">\n",
|
||||
">This is a Chinese Checkers (aka Sternhalma) game for 2 to 3 players. The bots I wrote are easy to beat, as they're mainly for debugging the game logic part of the code. However, you can write up your own custom bots. There is a guide at the github page.\n",
|
||||
"> Post URL: https://www.reddit.com/r/Python/comments/181xq0u/a_chinese_checkers_game_made_with_pygame_and/\n",
|
||||
"> Post Category: N/A.\n",
|
||||
" > Score: 1\n",
|
||||
"\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Using tool with an agent chain\n",
|
||||
"\n",
|
||||
"Reddit search functionality is also provided as a multi-input tool. In this example, we adapt [existing code from the docs](https://python.langchain.com/docs/modules/agents/how_to/sharedmemory_for_tools), and use ChatOpenAI to create an agent chain with memory. This agent chain is able to pull information from Reddit and use these posts to respond to subsequent input. \n",
|
||||
"\n",
|
||||
"To run the example, add your reddit API access information and also get an OpenAI key from the [OpenAI API](https://help.openai.com/en/articles/4936850-where-do-i-find-my-api-key)."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# Adapted code from https://python.langchain.com/docs/modules/agents/how_to/sharedmemory_for_tools\n",
|
||||
"\n",
|
||||
"from langchain.agents import AgentExecutor, StructuredChatAgent, Tool\n",
|
||||
"from langchain.chains import LLMChain\n",
|
||||
"from langchain.chat_models import ChatOpenAI\n",
|
||||
"from langchain.memory import ConversationBufferMemory, ReadOnlySharedMemory\n",
|
||||
"from langchain.prompts import PromptTemplate\n",
|
||||
"from langchain.tools.reddit_search.tool import RedditSearchRun\n",
|
||||
"from langchain.utilities.reddit_search import RedditSearchAPIWrapper\n",
|
||||
"\n",
|
||||
"# Provide keys for Reddit\n",
|
||||
"client_id = \"\"\n",
|
||||
"client_secret = \"\"\n",
|
||||
"user_agent = \"\"\n",
|
||||
"# Provide key for OpenAI\n",
|
||||
"openai_api_key = \"\"\n",
|
||||
"\n",
|
||||
"template = \"\"\"This is a conversation between a human and a bot:\n",
|
||||
"\n",
|
||||
"{chat_history}\n",
|
||||
"\n",
|
||||
"Write a summary of the conversation for {input}:\n",
|
||||
"\"\"\"\n",
|
||||
"\n",
|
||||
"prompt = PromptTemplate(input_variables=[\"input\", \"chat_history\"], template=template)\n",
|
||||
"memory = ConversationBufferMemory(memory_key=\"chat_history\")\n",
|
||||
"\n",
|
||||
"prefix = \"\"\"Have a conversation with a human, answering the following questions as best you can. You have access to the following tools:\"\"\"\n",
|
||||
"suffix = \"\"\"Begin!\"\n",
|
||||
"\n",
|
||||
"{chat_history}\n",
|
||||
"Question: {input}\n",
|
||||
"{agent_scratchpad}\"\"\"\n",
|
||||
"\n",
|
||||
"tools = [\n",
|
||||
" RedditSearchRun(\n",
|
||||
" api_wrapper=RedditSearchAPIWrapper(\n",
|
||||
" reddit_client_id=client_id,\n",
|
||||
" reddit_client_secret=client_secret,\n",
|
||||
" reddit_user_agent=user_agent,\n",
|
||||
" )\n",
|
||||
" )\n",
|
||||
"]\n",
|
||||
"\n",
|
||||
"prompt = StructuredChatAgent.create_prompt(\n",
|
||||
" prefix=prefix,\n",
|
||||
" tools=tools,\n",
|
||||
" suffix=suffix,\n",
|
||||
" input_variables=[\"input\", \"chat_history\", \"agent_scratchpad\"],\n",
|
||||
")\n",
|
||||
"\n",
|
||||
"llm = ChatOpenAI(temperature=0, openai_api_key=openai_api_key)\n",
|
||||
"\n",
|
||||
"llm_chain = LLMChain(llm=llm, prompt=prompt)\n",
|
||||
"agent = StructuredChatAgent(llm_chain=llm_chain, verbose=True, tools=tools)\n",
|
||||
"agent_chain = AgentExecutor.from_agent_and_tools(\n",
|
||||
" agent=agent, verbose=True, memory=memory, tools=tools\n",
|
||||
")\n",
|
||||
"\n",
|
||||
"# Answering the first prompt requires usage of the Reddit search tool.\n",
|
||||
"agent_chain.run(input=\"What is the newest post on r/langchain for the week?\")\n",
|
||||
"# Answering the subsequent prompt uses memory.\n",
|
||||
"agent_chain.run(input=\"Who is the author of the post?\")"
|
||||
]
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
"kernelspec": {
|
||||
"display_name": "Python 3.11.5 64-bit ('langchaindev')",
|
||||
"language": "python",
|
||||
"name": "python3"
|
||||
},
|
||||
"language_info": {
|
||||
"codemirror_mode": {
|
||||
"name": "ipython",
|
||||
"version": 3
|
||||
},
|
||||
"file_extension": ".py",
|
||||
"mimetype": "text/x-python",
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.11.5"
|
||||
},
|
||||
"vscode": {
|
||||
"interpreter": {
|
||||
"hash": "3929050b09828356c9f5ebaf862d05c053d8228eddbc70f990c168e54dd824ba"
|
||||
}
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
"nbformat_minor": 2
|
||||
}
|
||||
@@ -1,74 +0,0 @@
|
||||
{
|
||||
"cells": [
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# StackExchange\n",
|
||||
"\n",
|
||||
"This notebook goes over how to use the stack exchange component.\n",
|
||||
"\n",
|
||||
"All you need to do is install stackapi:\n",
|
||||
"1. pip install stackapi\n",
|
||||
"\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"pip install stackapi"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langchain.utilities import StackExchangeAPIWrapper"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"stackexchange = StackExchangeAPIWrapper()"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"stackexchange.run(\"zsh: command not found: python\")"
|
||||
]
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
"kernelspec": {
|
||||
"display_name": "Python 3",
|
||||
"language": "python",
|
||||
"name": "python3"
|
||||
},
|
||||
"language_info": {
|
||||
"codemirror_mode": {
|
||||
"name": "ipython",
|
||||
"version": 3
|
||||
},
|
||||
"file_extension": ".py",
|
||||
"mimetype": "text/x-python",
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.8.8"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
"nbformat_minor": 2
|
||||
}
|
||||
@@ -22,7 +22,7 @@
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"!pip install hologres-vector"
|
||||
"#!pip install psycopg2"
|
||||
]
|
||||
},
|
||||
{
|
||||
|
||||
@@ -23,7 +23,7 @@
|
||||
"\n",
|
||||
"You will need a running Meilisearch instance to use as your vector store. You can run [Meilisearch in local](https://www.meilisearch.com/docs/learn/getting_started/installation#local-installation) or create a [Meilisearch Cloud](https://cloud.meilisearch.com/) account.\n",
|
||||
"\n",
|
||||
"As of Meilisearch v1.3, vector storage is an experimental feature. After launching your Meilisearch instance, you need to **enable vector storage**. For self-hosted Meilisearch, read the docs on [enabling experimental features](https://www.meilisearch.com/docs/learn/experimental/overview). On **Meilisearch Cloud**, enable _Vector Store_ via your project _Settings_ page.\n",
|
||||
"As of Meilisearch v1.3, vector storage is an experimental feature. After launching your Meilisearch instance, you need to **enable vector storage**. For self-hosted Meilisearch, read the docs on [enabling experimental features](https://www.meilisearch.com/docs/learn/experimental/vector-search). On **Meilisearch Cloud**, enable _Vector Store_ via your project _Settings_ page.\n",
|
||||
"\n",
|
||||
"You should now have a running Meilisearch instance with vector storage enabled. 🎉\n",
|
||||
"\n",
|
||||
|
||||
Some files were not shown because too many files have changed in this diff Show More
Reference in New Issue
Block a user